2016-06-17 15 views

How do I handle a KeyError in Python?

I am trying to do some data analysis in Python: using some Twitter data, I want to count the number of tweets from different countries. This is the code I was using:

import json 
import pandas as pd 
import matplotlib.pyplot as plt 

tweets_data=[] 
with open('/home/surya/tweet.txt','r') as f: 
    for line in f:
        try:
            tweet = json.loads(line)
            tweets_data.append(tweet)
        except:
            continue

tweet_table= pd.DataFrame() 
tweet_table['country']= map(lambda tweet: tweet["place"]["country"] if tweet["place"] != None else None, tweets_data) 

tweets_by_country = tweet_table['country'].value_counts() 

fig, ax = plt.subplots() 
ax.tick_params(axis='x', labelsize=15) 
ax.tick_params(axis='y', labelsize=10) 
ax.set_xlabel('Countries', fontsize=15) 
ax.set_ylabel('Number of tweets' , fontsize=15) 
ax.set_title('Top 5 countries', fontsize=15, fontweight='bold') 
tweets_by_country[:5].plot(ax=ax, kind='bar', color='blue') 

This code was generating the error

KeyError "place"

so I changed the code to something like this:

import json 
import pandas as pd 
import matplotlib.pyplot as plt 
tweets_data=[] 

def keyCheck(key,arr,default):
    if key in arr.keys():
        return arr[key]
    else:
        return default

with open('/home/surya/tweet.txt','r') as f:
    for line in f:
        try:
            tweet = json.loads(line)
            tweets_data.append(tweet)
        except:
            continue

tweet_table= pd.DataFrame() 
tweet_table['country']= map(lambda tweet: tweet["place"]["country"] if keyCheck("place",tweet,"#default") != None else None, tweets_data) 

tweets_by_country = tweet_table['country'].value_counts() 

fig, ax = plt.subplots() 
ax.tick_params(axis='x', labelsize=15) 
ax.tick_params(axis='y', labelsize=10) 
ax.set_xlabel('Countries', fontsize=15) 
ax.set_ylabel('Number of tweets' , fontsize=15) 
ax.set_title('Top 5 countries', fontsize=15, fontweight='bold') 
tweets_by_country[:5].plot(ax=ax, kind='bar', color='blue') 

But this resulted in the error

AttributeError: list object has no attribute "keys" 

My data format looks like this:

{"created_at":"Thu Jun 16 13:15:13 +0000 2016","id":743431739238932480,"id_str":"743431739238932480","text":"I fucking hate Ramsey #ENGWAL #EURO2016 https:\/\/t.co\/wkFqOu8iwf","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":343618050,"id_str":"343618050","name":"SamuEars","screen_name":"S88Griff","location":"Derbados","url":null,"description":"27 years old, @RocesterFC1876 footballer, genuine, chilled out, opinionated, but most of all, wind up merchant","protected":false,"verified":false,"followers_count":496,"friends_count":272,"listed_count":1,"favourites_count":1915,"statuses_count":5505,"created_at":"Wed Jul 27 20:53:02 +0000 2011","utc_offset":null,"time_zone":null,"geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"0084B4","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/636136111191031809\/aQyj3bgK_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/636136111191031809\/aQyj3bgK_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/343618050\/1409857726","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":{"id":"232163114ebb8671","url":"https:\/\/api.twitter.com\/1.1\/geo\/id\/232163114ebb8671.json","plac
e_type":"city","name":"Etwall","full_name":"Etwall, England","country_code":"GB","country":"United Kingdom","bounding_box":{"type":"Polygon","coordinates":[[[-1.608732,52.874969],[-1.608732,52.887677],[-1.594409,52.887677],[-1.594409,52.874969]]]},"attributes":{}},"contributors":null,"is_quote_status":false,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"ENGWAL","indices":[22,29]},{"text":"EURO2016","indices":[30,39]}],"urls":[],"user_mentions":[],"symbols":[],"media":[{"id":743431733853433856,"id_str":"743431733853433856","indices":[40,63],"media_url":"http:\/\/pbs.twimg.com\/media\/ClEzORsWMAAlcQ3.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/ClEzORsWMAAlcQ3.jpg","url":"https:\/\/t.co\/wkFqOu8iwf","display_url":"pic.twitter.com\/wkFqOu8iwf","expanded_url":"http:\/\/twitter.com\/S88Griff\/status\/743431739238932480\/photo\/1","type":"photo","sizes":{"small":{"w":680,"h":517,"resize":"fit"},"medium":{"w":1178,"h":896,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"},"large":{"w":1178,"h":896,"resize":"fit"}}}]},"extended_entities":{"media":[{"id":743431733853433856,"id_str":"743431733853433856","indices":[40,63],"media_url":"http:\/\/pbs.twimg.com\/media\/ClEzORsWMAAlcQ3.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/ClEzORsWMAAlcQ3.jpg","url":"https:\/\/t.co\/wkFqOu8iwf","display_url":"pic.twitter.com\/wkFqOu8iwf","expanded_url":"http:\/\/twitter.com\/S88Griff\/status\/743431739238932480\/photo\/1","type":"photo","sizes":{"small":{"w":680,"h":517,"resize":"fit"},"medium":{"w":1178,"h":896,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"},"large":{"w":1178,"h":896,"resize":"fit"}}}]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1466082913585"} 

The problem is that the "place" key is not available somewhere in my tweet.txt file.

Can someone suggest a solution, or point out exactly where I am going wrong?

EDIT

I just updated the code to this:

import json 
import pandas as pd 
import matplotlib.pyplot as plt

tweets_data=[] 
with open('/home/surya/tweet.txt','r') as f: 
    for line in f:
        try:
            tweet = json.loads(line)
            tweets_data.append(tweet)
        except:
            continue

tweet_table= pd.DataFrame() 
tweet_table['country'] = map(lambda tweet: tweet['place']['country'] if 'place' in tweet is not None and 'country' in tweet['place'] is not None else None, tweets_data) 

tweets_by_country = tweet_table['country'].value_counts() 

fig, ax = plt.subplots() 
ax.tick_params(axis='x', labelsize=15) 
ax.tick_params(axis='y', labelsize=10) 
ax.set_xlabel('Countries', fontsize=15) 
ax.set_ylabel('Number of tweets' , fontsize=15) 
ax.set_title('Top 5 countries', fontsize=15, fontweight='bold') 
tweets_by_country[:5].plot(ax=ax, kind='bar', color='blue') 

and now I get this error:

TypeError: argument of type 'NoneType' is not iterable 
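
A note on why this version still fails, as a minimal sketch rather than a definitive diagnosis: in Python, `in` and `is not` are both comparison operators, so `'place' in tweet is not None` is a chained comparison meaning `('place' in tweet) and (tweet is not None)`. It never checks whether the value stored under "place" is None. The second test, `'country' in tweet['place'] is not None`, therefore runs a membership test against None whenever the place is null, which raises the TypeError. The sample tweet and the helper `country_or_none` below are hypothetical names used only for illustration.

```python
tweet = {"place": None}  # hypothetical tweet whose "place" is null

# `in` and `is not` chain like `a < b < c`:
# 'place' in tweet is not None  ==  ('place' in tweet) and (tweet is not None)
chained = 'place' in tweet is not None
explicit = ('place' in tweet) and (tweet is not None)


def raises_type_error(tweet):
    """Return True if the membership test from the EDIT code raises TypeError."""
    try:
        'country' in tweet['place']  # membership test against None
    except TypeError:
        return True
    return False


def country_or_none(tweet):
    """Hypothetical helper: check each level explicitly before indexing."""
    place = tweet.get('place') if isinstance(tweet, dict) else None
    if isinstance(place, dict):
        return place.get('country')
    return None
```

With this helper, the lambda in the code above could be replaced by `country_or_none`, since it returns None for a missing key, a null place, or a parsed line that is not a dict.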

UPDATE

Just found the solution. You only have to append a tweet to tweets_data when its "place" is not None. That is how I fixed the problem:

import json 
import pandas as pd 
import matplotlib.pyplot as plt 

tweets_data=[] 
with open('/home/surya/tweet.txt','r') as f: 
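    for line in f:
        try:
            tweet = json.loads(line)
            # Keep only tweets whose "place" value exists and is not None.
            # ('place' in tweet is not None would only test that the key exists.)
            if tweet.get('place') is not None:
                tweets_data.append(tweet)
        except:
            continue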

tweet_table= pd.DataFrame() 
tweet_table['country'] = [tweet['place']['country'] for tweet in tweets_data if tweet['place']] 

tweets_by_country = tweet_table['country'].value_counts() 

fig, ax = plt.subplots() 
ax.tick_params(axis='x', labelsize=15) 
ax.tick_params(axis='y', labelsize=10) 
ax.set_xlabel('Countries', fontsize=15) 
ax.set_ylabel('Number of tweets' , fontsize=15) 
ax.set_title('Top 5 countries', fontsize=15, fontweight='bold') 
tweets_by_country[:5].plot(ax=ax, kind='bar', color='blue') 

plt.show() 

if 'place' in tweet –


With 'place' in tweet, tweet['place'] gives: 'NoneType' object has no attribute '__getitem__' –


The entire keyCheck function is already implemented in Python. Simply use dict_name.get("key", "defaultValue") – Keatinge
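
To illustrate the comment about `dict.get`, here is a small sketch comparing it with a helper shaped like the question's keyCheck. The three sample dicts are made up for the example. One caveat worth showing: the default passed to `get` is used only when the key is missing, not when the key is present with a None value, so a null "place" still needs its own check.

```python
def key_check(key, mapping, default):
    """Same behaviour as the question's keyCheck helper."""
    if key in mapping.keys():
        return mapping[key]
    return default


# Hypothetical sample records, not taken from the real tweet.txt
with_place = {"place": {"country": "United Kingdom"}}
null_place = {"place": None}
no_place = {"text": "hello"}

# dict.get gives the same result as the hand-written helper
same_when_present = with_place.get("place", "#default") == key_check("place", with_place, "#default")
same_when_missing = no_place.get("place", "#default") == key_check("place", no_place, "#default")

# The default is NOT applied when the key exists but holds None
null_result = null_place.get("place", "#default")


def country_of(tweet):
    """Return the country name, or None if "place" is missing or null."""
    # `or {}` swaps a None place for an empty dict before the second lookup
    return (tweet.get("place") or {}).get("country")


countries = [country_of(t) for t in (with_place, null_place, no_place)]
```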

Answers

import json 
import pandas as pd 
import matplotlib.pyplot as plt 


with open("testfile.txt", "r") as f: 
    tweet_data = [json.loads(line) for line in f] 

tweet_table= pd.DataFrame() 

tweet_table['country'] = [tweet['place']['country'] for tweet in tweet_data if tweet['place']] 

tweets_by_country = tweet_table['country'].value_counts() 

fig, ax = plt.subplots() 
ax.tick_params(axis='x', labelsize=15) 
ax.tick_params(axis='y', labelsize=10) 
ax.set_xlabel('Countries', fontsize=15) 
ax.set_ylabel('Number of tweets' , fontsize=15) 
ax.set_title('Top 5 countries', fontsize=15, fontweight='bold') 
tweets_by_country[:5].plot(ax=ax, kind='bar', color='blue') 

plt.show() 

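Well, the thing is that a tweet always contains place as a key in the JSON; when there is no place, the value is just null. Make sure place is not null before you try to access country. This works for me, but I only have one data point, so I just get United Kingdom.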
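
The errors reported in the question (a KeyError for "place" and a list with no "keys" attribute) suggest that not every line in the file is a tweet object with a "place" key. A more defensive variant of the loading step might look like the sketch below. It is only an illustration under assumptions: `countries_from_lines` is a hypothetical helper, the sample lines are invented, and `collections.Counter` stands in for pandas' `value_counts()` so the example runs with the standard library alone.

```python
import json
from collections import Counter


def countries_from_lines(lines):
    """Collect the country of every line that is a tweet with a non-null place."""
    countries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except ValueError:  # json.JSONDecodeError is a ValueError subclass
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue  # skip JSON arrays, numbers, strings
        place = record.get("place")  # None if the key is missing or null
        if isinstance(place, dict) and place.get("country"):
            countries.append(place["country"])
    return countries


# Invented sample input covering the cases seen in the question
sample_lines = [
    '{"text": "a", "place": {"country": "United Kingdom"}}',
    '{"text": "b", "place": null}',
    '{"limit": {"track": 1}}',
    '[1, 2, 3]',
    'not json',
    '',
    '{"text": "c", "place": {"country": "France"}}',
    '{"text": "d", "place": {"country": "United Kingdom"}}',
]

countries = countries_from_lines(sample_lines)
top_five = Counter(countries).most_common(5)
```

To reuse the plotting code above, the resulting list could be passed to pandas instead, for example `pd.Series(countries).value_counts()`, and the file handle from `open(...)` can be passed directly as `lines`.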
