Is there a way to merge different CSV files in a folder that have different entries but share a common column?
I have six CSV files of training data, with details as follows:
1 chefmozaccepts.csv
Instances: 1314
Attributes: 2
placeID: Nominal
Rpayment: Nominal, 12 [cash,VISA,MasterCard-Eurocard,American_Express,bank_debit_cards,checks,Discover,Carte_Blanche,Diners_Club,Visa,Japan_Credit_Bureau,gift_certificates]
%---
2 chefmozcuisine.csv
Instances: 916
Attributes: 2
placeID: Nominal
Rcuisine: Nominal, 59 [Afghan,African,American,Armenian,Asian,Bagels,Bakery,Bar,Bar_Pub_Brewery,Barbecue,Brazilian,Breakfast-Brunch,Burgers,Cafe-Coffee_Shop, Cafeteria,California,Caribbean,Chinese,Contemporary,Continental-European,Deli-Sandwiches,Dessert-Ice_Cream,Diner,Dutch-Belgian,Eastern_European,Ethiopian,Family,Fast_Food,Fine_Dining,French,,Game,German,Greek,Hot_Dogs, International,Italian,Japanese,Juice,Korean,Latin_American,Mediterranean,Mexican,Mongolian,Organic-Healthy,Persian, Pizzeria,Polish,Regional,Seafood,Soup,Southern,Southwestern,Spanish,Steaks,Sushi,Thai,Turkish,Vegetarian,Vietnamese]
%---
3 chefmozhours4.csv
Instances: 2339
Attributes: 3
placeID: Nominal
hours: Nominal, Range:00:00-23:30
days:Nominal, 7 [Mon;Tue;Wed;Thu;Fri;Sat;Sun]
%---
4 chefmozparking.csv
Instances: 702
Attributes: 2
placeID: Nominal
parking_lot:Nominal, 7[public,none,yes,valet_parking,free,street,validated_parking]
%---
5 geoplaces2.csv
Instances: 130
Attributes: 21
placeID: Nominal
latitude: Numeric
longitude: Numeric
the_geom_meter: Nominal (Geospatial)
name: Nominal
address: Nominal,Missing: 27
city: Nominal, Missing: 18
state: Nominal, Missing: 18
country: Nominal, Missing: 28
fax: Numeric, Missing: 130
zip: Nominal,Missing: 74
alcohol: Nominal, Values: 3 [No_Alcohol_Served,Wine_Beer,Full_Bar]
%---
6 rating_final.csv
Instances: 1161
Attributes: 5
userID: Nominal
placeID: Nominal
rating: Numeric, 3 [0,1,2]
food_rating: Numeric, 3 [0,1,2]
service_rating: Numeric, 3 [0,1,2]
%---
7 usercuisine.csv
Instances: 330
Attributes: 2
userID: Nominal
Rcuisine: Nominal, 103
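As you can see, the files share one common column, placeID, but the number of instances differs in each file.
I need to combine all the CSV files into one final CSV keyed on placeID. For files that have more instances per placeID, I want the data spread out so that every column ends up evenly filled, with the remaining metadata duplicated across the rows where the instance counts do not match.
Sample input:
File 1: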
placeID Rpayment
135110 cash
135110 VISA
135110 MasterCard-Eurocard
135110 American_Express
135110 bank_debit_cards
135109 cash
135107 cash
135107 VISA
135107 MasterCard-Eurocard
135107 American_Express
135107 bank_debit_cards
135106 cash
135106 VISA
135106 MasterCard-Eurocard
135105 cash
File 2:
placeID Rcuisine
135110 Spanish
135109 Italian
135107 Latin_American
135106 Mexican
135105 Fast_Food
135104 Mexican
135103 Burgers
135103 Dessert-Ice_Cream
135103 Fast_Food
135103 Hot_Dogs
File 3:
placeID hours days
135110 08:00-19:00; Mon;Tue;Wed;Thu;Fri;
135110 00:00-00:00; Sat;
135110 00:00-00:00; Sun;
135109 08:00-21:00; Mon;Tue;Wed;Thu;Fri;
135109 08:00-21:00; Sat;
135109 08:00-21:00; Sun;
135108 00:00-23:30; Mon;Tue;Wed;Thu;Fri;
File 4:
placeID parking_lot
135110 public
135109 none
135108 none
135107 none
135106 none
135105 none
File 5:
placeID latitude longitude name address city state country fax zip alcohol smoking_area dress_code accessibility price url Rambience franchise area other_services
135109 18.9217848 -99.2353499 Paniroles ? ? ? ? ? ? Wine-Beer not permitted informal no_accessibility medium ? quiet f closed Internet
135107 22.1362534 -100.9335852 Potzocalli Carretera Central Sn San Luis Potosi ? ? ? ? No_Alcohol_Served none informal completely low ? familiar f closed none
135106 22.1497088 -100.9760928 El Rincón de San Francisco Universidad 169 San Luis Potosi San Luis Potosi Mexico ? 78000 Wine-Beer only at bar informal partially medium ? familiar f open none
Sample output:
placeID payment Cuisine parking_lot hours days latitude longitude name address city state country fax zip alcohol smoking_area dress_code accessibility price url ambience franchise area other_services
135110 cash Spanish public 08:00-19:00; Mon;Tue;Wed;Thu;Fri;
135110 VISA Spanish public 00:00-00:00; Sat;
135110 MasterCard-Eurocard Spanish public 00:00-00:00; Sun;
135110 American_Express Spanish public 08:00-19:00; Mon;Tue;Wed;Thu;Fri;
135110 bank_debit_cards Spanish public 00:00-00:00; Sat;
135110 bank_debit_cards Spanish public 00:00-00:00; Sun;
135109 cash Italian none 08:00-21:00; Mon;Tue;Wed;Thu;Fri; 18.9217848 -99.2353499 Paniroles ? ? ? ? ? ? Wine-Beer not permitted informal no_accessibility medium ? quiet f closed Internet
135109 cash Italian none 08:00-21:00; Sat; 18.9217848 -99.2353499 Paniroles ? ? ? ? ? ? Wine-Beer not permitted informal no_accessibility medium ? quiet f closed Internet
135109 cash Italian none 08:00-21:00; Sun; 18.9217848 -99.2353499 Paniroles ? ? ? ? ? ? Wine-Beer not permitted informal no_accessibility medium ? quiet f closed Internet
135107 cash Latin_American none 07:00-23:30; Mon;Tue;Wed;Thu;Fri; 22.1362534 -100.9335852 Potzocalli Carretera Central Sn San Luis Potosi ? ? ? ? No_Alcohol_Served none informal completely low ? familiar f closed none
135107 VISA Latin_American none 07:00-23:30; Sat; 22.1362534 -100.9335852 Potzocalli Carretera Central Sn San Luis Potosi ? ? ? ? No_Alcohol_Served none informal completely low ? familiar f closed none
135107 MasterCard-Eurocard Latin_American none 07:00-23:30; Sun; 22.1362534 -100.9335852 Potzocalli Carretera Central Sn San Luis Potosi ? ? ? ? No_Alcohol_Served none informal completely low ? familiar f closed none
135107 American_Express Latin_American none 07:00-23:30; Mon;Tue;Wed;Thu;Fri; 22.1362534 -100.9335852 Potzocalli Carretera Central Sn San Luis Potosi ? ? ? ? No_Alcohol_Served none informal completely low ? familiar f closed none
135107 bank_debit_cards Latin_American none 07:00-23:30; Sat; 22.1362534 -100.9335852 Potzocalli Carretera Central Sn San Luis Potosi ? ? ? ? No_Alcohol_Served none informal completely low ? familiar f closed none
135107 MasterCard-Eurocard Latin_American none 07:00-23:30; Sun; 22.1362534 -100.9335852 Potzocalli Carretera Central Sn San Luis Potosi ? ? ? ? No_Alcohol_Served none informal completely low ? familiar f closed none
135106 cash Mexican none 18:00-23:30; Mon;Tue;Wed;Thu;Fri; 22.1497088 -100.9760928 El Rincón de San Francisco Universidad 169 San Luis Potosi San Luis Potosi Mexico ? 78000 Wine-Beer only at bar informal partially medium ? familiar f open none
135106 VISA Mexican none 18:00-23:30; Sat; 22.1497088 -100.9760928 El Rincón de San Francisco Universidad 169 San Luis Potosi San Luis Potosi Mexico ? 78000 Wine-Beer only at bar informal partially medium ? familiar f open none
135106 MasterCard-Eurocard Mexican none 18:00-21:00; Sun; 22.1497088 -100.9760928 El Rincón de San Francisco Universidad 169 San Luis Potosi San Luis Potosi Mexico ? 78000 Wine-Beer only at bar informal partially medium ? familiar f open none
I know this is a tedious task, but any help would be appreciated. I am trying to do this with pandas, not csvreader.
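For reference, the basic placeID join described above can be sketched roughly as follows. This is only a minimal sketch, assuming pandas is installed and every file has a `placeID` column; the helper name `merge_on_place_id` and the tiny in-memory tables are made up for illustration and stand in for the real files.

```python
from functools import reduce
from io import StringIO

import pandas as pd


def merge_on_place_id(frames, how="left"):
    """Merge a list of DataFrames left to right on the shared placeID column."""
    return reduce(
        lambda left, right: left.merge(right, on="placeID", how=how),
        frames,
    )


# Small in-memory stand-ins for the real files. With files on disk, the list
# could instead be built with something like
# [pd.read_csv(p) for p in sorted(glob.glob("data/*.csv"))]  (hypothetical path).
payments = pd.read_csv(
    StringIO("placeID,Rpayment\n135110,cash\n135110,VISA\n135109,cash\n")
)
cuisine = pd.read_csv(StringIO("placeID,Rcuisine\n135110,Spanish\n135109,Italian\n"))
parking = pd.read_csv(StringIO("placeID,parking_lot\n135110,public\n135109,none\n"))

merged = merge_on_place_id([payments, cuisine, parking])
print(merged)
```

Note that when more than one file has several rows for the same placeID (for example payments and hours), `merge` produces every combination of those rows, which is not the same as the cycled pairing shown in the sample output, so that part would need extra handling.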
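After merging, some data is missing. For example, placeID 135110 has no entry in file 5; I want those fields left blank but the row still merged. Your code works, but rows with no matching data are dropped, so the final output does not contain placeID 135110 and its related data... – lightyagami96
If you change `how='inner'` to `how='left'`, it will work the way you described. – brunormoreira
Thank you so much. – lightyagami96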
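To illustrate the `how='inner'` versus `how='left'` difference mentioned in the comments, here is a small sketch, assuming pandas; the two tiny tables are cut-down, made-up stand-ins for files 1 and 5, where placeID 135110 has no row in the second table.

```python
import pandas as pd

# Cut-down stand-ins for file 1 (payments) and file 5 (place details).
payments = pd.DataFrame({"placeID": [135110, 135109], "Rpayment": ["cash", "cash"]})
places = pd.DataFrame({"placeID": [135109], "name": ["Paniroles"]})

# Inner join: keeps only placeIDs that appear in both tables.
inner = payments.merge(places, on="placeID", how="inner")

# Left join: keeps every row of the left table and fills the gaps with NaN.
left = payments.merge(places, on="placeID", how="left")

print(inner)
print(left)
```

With the left join, the row for 135110 survives with a missing `name`, and `DataFrame.to_csv` writes missing values as empty fields by default, which matches the "leave it blank" requirement.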