Issue with non-UTF-8 or non-ASCII characters in the twitteR package in R

In a previous question, I asked about downloading a large number of Twitter followers (along with their location, creation date, number of followers, etc.) from the Haaretz Twitter feed (@haaretzcom) using the twitteR package in R (see "Work around rate limit for extracting large list of user information using twitteR package in R"). The feed has over 90,000 followers. Using the code below, I was able to download the full list of followers without a problem.

    require(twitteR)
    require(ROAuth)
    # Loading the Twitter OAuthorization
    load("~/Dropbox/Twitter/my_oauth")

    # Confirming the OAuth
    registerTwitterOAuth(my_oauth)

    # Opening list to download
    haaretz_followers <- getUser("haaretzcom")$getFollowerIDs(retryOnRateLimit = 9999999)

    for (follower in haaretz_followers) {
        Sys.sleep(5)
        haaretz_followers_info <- lookupUsers(haaretz_followers)

        haaretz_followers_full <- twListToDF(haaretz_followers_info)

        # Export data to csv
        write.table(haaretz_followers_full, file = "haaretz_twitter_followers.csv", sep = ",")
    }

This code works for extracting many of the users. However, whenever I hit certain users, I get the following error:

Error in twFromJSON(out) : 
RMate stopped at line 51 
Error: Malformed response from server, was not JSON. 
RMate stopped at line 51 
The most likely cause of this error is Twitter returning a character which 
can't be properly parsed by R. Generally the only remedy is to wait long 
enough for the offending character to disappear from searches (e.g. if 
using searchTwitter()). 
Calls: twListToDF ... lookupUsers -> lapply -> FUN -> <Anonymous> -> twFromJSON 
Execution halted

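I run into this problem even if I load the RJSONIO package after the twitteR package. It seems that the twitteR and RJSONIO packages have trouble parsing non-UTF-8 or non-ASCII characters (Arabic, for example): http://lists.hexdump.org/pipermail/twitter-users-hexdump.org/2013-May/000335.html

Is there a way, within the code I have, to simply ignore the non-UTF-8 or non-ASCII characters and still extract all of the follower information? Any help would be greatly appreciated.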
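One way to keep a single unparseable user from stopping the whole run is to look up the IDs in batches and wrap each call in `tryCatch`. The sketch below is only an illustration under stated assumptions: `lookup_in_batches` and `fake_lookup` are hypothetical names, and the stub stands in for the real lookup call (with twitteR that would presumably be `lookupUsers`, which is not verified here). Retrying a failed batch one ID at a time costs extra requests, so rate limits still apply.

```r
# Hypothetical helper: look up IDs in batches and skip entries that fail.
# `lookup_fn` is any function that takes a vector of IDs and returns a list;
# passing twitteR's lookupUsers here is an assumption, not tested in this sketch.
lookup_in_batches <- function(ids, lookup_fn, batch_size = 100) {
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  results <- list()
  failed <- c()
  for (batch in batches) {
    out <- tryCatch(lookup_fn(batch), error = function(e) NULL)
    if (!is.null(out)) {
      results <- c(results, out)
      next
    }
    # The whole batch failed: retry one ID at a time to isolate the offenders.
    for (id in batch) {
      one <- tryCatch(lookup_fn(id), error = function(e) NULL)
      if (is.null(one)) {
        failed <- c(failed, id)
      } else {
        results <- c(results, one)
      }
    }
  }
  list(results = results, failed = failed)
}

# Stub that mimics the parse error for one "bad" ID (illustration only).
fake_lookup <- function(ids) {
  if ("bad" %in% ids) stop("Malformed response from server, was not JSON.")
  as.list(toupper(ids))
}

res <- lookup_in_batches(c("a", "b", "bad", "c"), fake_lookup, batch_size = 2)
print(res$failed)
```

The `failed` vector records which IDs could not be parsed, so they can be inspected or retried later instead of aborting the export.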

Source

2013-05-15 Thomas

Can you grab the tweets at all? Is it only the parsing that fails, or can you not download the tweets? If it is the former, you can use `readLines` and strip out the offending characters.
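If the raw response text can be read in (for example with `readLines` on a saved response, which is an assumption about the workflow), the stripping step suggested above could look roughly like this. `strip_non_ascii` is a hypothetical helper name; it relies on base R's `iconv` with `sub = ""`, which drops any bytes that cannot be represented in ASCII.

```r
# Hypothetical helper: drop every byte that cannot be converted to ASCII.
# Assumes the input is (mostly) UTF-8 text, e.g. lines read with readLines().
strip_non_ascii <- function(x) {
  iconv(x, from = "UTF-8", to = "ASCII", sub = "")
}

# Illustrative input only, not real Twitter data.
raw_lines <- c("caf\u00e9 ok", "plain text")
clean_lines <- strip_non_ascii(raw_lines)
print(clean_lines)
```

Note that this is lossy: names or descriptions written entirely in Arabic or Hebrew would come out empty, so it only helps when losing those fields is acceptable.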

@RicardoSaporta Unfortunately, it does not let me download the tweets at all. The loop breaks as soon as it reaches the information for the problematic user. – Thomas

@Thomas: Still no answer? I run into this every time I try to do anything with twitteR... – Heisenberg

There is a package update (1.1.7) that addresses this issue. https://github.com/geoffjentry/twitteR/blob/master/NEWS
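To see whether the installed copy of twitteR predates that update, a small version check along these lines may help. This is a sketch only: `needs_update` is a hypothetical helper, and the threshold simply reuses the 1.1.7 figure quoted above.

```r
# Hypothetical helper: TRUE if `installed` is older than `required`.
needs_update <- function(installed, required = "1.1.7") {
  numeric_version(as.character(installed)) < numeric_version(required)
}

if (requireNamespace("twitteR", quietly = TRUE)) {
  installed <- utils::packageVersion("twitteR")
  if (needs_update(installed)) {
    message("twitteR ", installed, " is older than 1.1.7; consider updating.")
  } else {
    message("twitteR ", installed, " is at least 1.1.7.")
  }
} else {
  message("twitteR is not installed.")
}
```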

Source

2013-08-17 15:48:38 SPi
