Incrementally processing the Twitter Streaming API with Apache HttpClient?

I'm using Apache HttpClient 4 to connect to the Twitter Streaming API with default-level access. It works perfectly at first, but after a few minutes of retrieving data it bails out with this error:

2012-03-28 16:17:00,040 DEBUG org.apache.http.impl.conn.SingleClientConnManager: Get connection for route HttpRoute[{tls}->http://myproxy:80->https://stream.twitter.com:443]
2012-03-28 16:17:00,040 WARN com.cloudera.flume.core.connector.DirectDriver: Exception in source: TestTwitterSource
java.lang.IllegalStateException: Invalid use of SingleClientConnManager: connection still allocated.
Make sure to release the connection before allocating another one.
    at org.apache.http.impl.conn.SingleClientConnManager.getConnection(SingleClientConnManager.java:216)
    at org.apache.http.impl.conn.SingleClientConnManager$1.getConnection(SingleClientConnManager.java:190)

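I understand why I'm facing this issue. I'm trying to use this HttpClient as a Flume source in a Flume cluster. The code looks like this; I'm buffering 30,000 characters of the response stream in a StringBuffer and returning that as the received data: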

public Event next() throws IOException, InterruptedException {
    HttpHost target = new HttpHost("stream.twitter.com", 443, "https");
    HttpPost httpPost = new HttpPost("/1/statuses/filter.json");
    StringEntity postEntity = new StringEntity("track=birthday", "UTF-8");
    postEntity.setContentType("application/x-www-form-urlencoded");
    httpPost.setEntity(postEntity);

    // A new streaming request is issued on every call to next().
    HttpResponse response = httpClient.execute(target, httpPost,
            new BasicHttpContext());
    BufferedReader reader = new BufferedReader(new InputStreamReader(
            response.getEntity().getContent(), "UTF-8"));

    // Buffer roughly 30,000 characters of tweets and return them as one event.
    String line;
    StringBuffer buffer = new StringBuffer();
    while ((line = reader.readLine()) != null) {
        buffer.append(line);
        if (buffer.length() > 30000) break;
    }
    // The response stream is never closed or aborted here, so the connection
    // is still allocated when the next call to next() asks for another one.
    return new EventImpl(buffer.toString().getBytes("UTF-8"));
}

I'm obviously not closing the connection, but I guess I don't want to close it either. Twitter's dev guide talks about this here; it reads:


Some HTTP client libraries only return the response body after the connection has been closed by the server. These clients will not work for accessing the Streaming API. You must use an HTTP client that will return response data incrementally. Most robust HTTP client libraries will provide this functionality. The Apache HttpClient will handle this use case, for example.

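It clearly says that HttpClient returns response data incrementally. I've gone through the examples and tutorials but haven't found anything close to this. If you have used an HTTP client (Apache or otherwise) to read the Twitter Streaming API incrementally, please let me know how you achieved this. Those who haven't, feel free to contribute answers too. TIA.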
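A minimal sketch of the incremental pattern the guide describes, assuming the response body is available as an InputStream (for example from response.getEntity().getContent()) that is opened once and kept for the lifetime of the source. The class name TweetStreamReader and its methods are hypothetical, not part of HttpClient or Flume; the sketch also assumes tweets are newline-delimited and treats blank lines as keep-alive padding to skip.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: wraps one long-lived response stream and hands out
// one line (one tweet) per call, without waiting for the server to close.
class TweetStreamReader implements Closeable {
    private final BufferedReader reader;

    // Create once (e.g. in the Flume source's open()) and keep it.
    TweetStreamReader(InputStream body) {
        this.reader = new BufferedReader(
                new InputStreamReader(body, StandardCharsets.UTF_8));
    }

    /** Blocks until the next non-blank line arrives; returns null at end of stream. */
    String nextTweet() throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.trim().isEmpty()) {   // skip assumed keep-alive blank lines
                return line;
            }
        }
        return null;                        // the server closed the connection
    }

    @Override
    public void close() throws IOException {
        reader.close();                     // also closes the underlying stream
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a live response body.
        String fake = "{\"text\":\"a\"}\r\n\r\n{\"text\":\"b\"}\r\n";
        try (TweetStreamReader r = new TweetStreamReader(
                new ByteArrayInputStream(fake.getBytes(StandardCharsets.UTF_8)))) {
            String tweet;
            while ((tweet = r.nextTweet()) != null) {
                System.out.println(tweet);
            }
        }
    }
}
```

Each next() call on the source would then return one nextTweet() result as an event, so a single request stays open instead of a new one being executed per event.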

UPDATE

I tried this: 1) I moved obtaining the stream handle into the Flume source's open method. 2) I use a plain input stream and read the data into a byte buffer. So here is what the method body looks like now:

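byte[] buffer = new byte[30000];

while (true) {
    // read() blocks until some bytes arrive and returns how many it stored
    int count = instream.read(buffer);
    if (count == -1)
        continue;   // -1 means end of stream; retrying here would spin forever
    else
        break;
}
// The whole 30,000-byte array is returned even if read() filled only
// `count` bytes, so the unused tail is still zero-filled.
return new EventImpl(buffer);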

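This sort of works: I get tweets and they are written nicely to the destination. The problem is the return value of instream.read(buffer). Even when there is no data on the stream, the buffer still holds its default \u0000 bytes, all 30,000 of them, and they get written to the destination. So the destination file looks like this: "tweets..tweets..tweets.. \u0000\u0000\u0000\u0000\u0000\u0000\u0000 ...tweets". I understand that count won't return -1 because this is a never-ending stream, so how do I determine whether the buffer holds new content from the read call?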
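A sketch of one way to use the return value, assuming the same long-lived instream as in the update: InputStream.read(byte[]) returns how many bytes it actually stored, so only that prefix of the array is new data, and -1 means the server has closed the stream. The helper name readChunk is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ChunkReader {
    /**
     * Blocks until at least one byte is available, then returns exactly the
     * bytes that were read, or null once the stream has ended.
     */
    static byte[] readChunk(InputStream instream, int maxBytes) throws IOException {
        byte[] buffer = new byte[maxBytes];
        int count = instream.read(buffer);   // never 0 when maxBytes > 0
        if (count == -1) {
            return null;                     // stream closed by the server
        }
        return Arrays.copyOf(buffer, count); // drop the unused zero-filled tail
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "tweet1\ntweet2\n".getBytes(StandardCharsets.UTF_8));
        byte[] chunk = readChunk(in, 30000);
        System.out.println(chunk.length + " bytes: "
                + new String(chunk, StandardCharsets.UTF_8).trim());
    }
}
```

Note that a chunk can end in the middle of a tweet, because read() returns whatever happens to be available; a line-oriented reader is usually a better fit when each event should contain whole tweets.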


2012-03-28 Jay

Did you try catching the I/O exception thrown by the #close method? I have updated my answer accordingly. – oleg

Also, the \u0000\u0000... bytes (null bytes) are not in the stream: when you instantiate a 30k buffer these are its default bytes, and if the stream content is shorter than 30k characters, the remaining bytes are null. – Jay

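This turned out to be a Flume issue. Flume is optimized to transfer events of 32 KB, and it bails out on anything larger (the workaround is to tune the event size above 32 KB). So I changed the code to stop buffering once it passes 20,000 characters. It sort of works, but it isn't foolproof: it can still fail if the buffer grows past 32 KB, although it hasn't failed so far in an hour of testing. I suspect this is because Twitter doesn't send much data on the public stream.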

while ((line = reader.readLine()) != null) {
    buffer.append(line);
    if (buffer.length() > 20000) break;
}
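Because the limit is in bytes while the loop above counts characters (and the last line can push the buffer well past 20,000), a sketch that measures the UTF-8 encoded size and stops before a batch would exceed the limit may be less fragile. The 32 KB figure comes from the comment above; the class and method names (EventBatcher, nextBatch) are hypothetical. A line that does not fit is held back for the next event rather than dropped, and in this sketch a single line larger than the limit is still returned on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: groups whole lines into events capped by UTF-8 size.
class EventBatcher {
    private final BufferedReader reader;
    private final int maxBytes;   // e.g. 32 * 1024, the assumed Flume event limit
    private String pending;       // line already read that did not fit last time

    EventBatcher(BufferedReader reader, int maxBytes) {
        this.reader = reader;
        this.maxBytes = maxBytes;
    }

    /** Next batch of newline-terminated lines, or null once the stream ends. */
    byte[] nextBatch() throws IOException {
        StringBuilder batch = new StringBuilder();
        int size = 0;
        while (true) {
            String line = (pending != null) ? pending : reader.readLine();
            pending = null;
            if (line == null) break;          // end of stream
            if (line.isEmpty()) continue;     // skip blank (keep-alive) lines
            int lineBytes = (line + "\n").getBytes(StandardCharsets.UTF_8).length;
            if (size > 0 && size + lineBytes > maxBytes) {
                pending = line;               // carry over to the next event
                break;
            }
            batch.append(line).append('\n');
            size += lineBytes;
            if (size >= maxBytes) break;      // full, or one oversized line
        }
        return size == 0 ? null : batch.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        EventBatcher batcher = new EventBatcher(
                new BufferedReader(new StringReader("aaaa\nbbbb\ncccc\n")), 10);
        byte[] event;
        while ((event = batcher.nextBatch()) != null) {
            System.out.println(new String(event, StandardCharsets.UTF_8).replace("\n", "|"));
        }
    }
}
```

Unlike the loop above, this keeps a newline after each tweet, so the consumer can split an event back into individual tweets.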


2012-04-01 13:25:48 Jay

The problem is that your code is leaking connections. Make sure that, no matter what, you either close the content stream or abort the request.

InputStream instream = response.getEntity().getContent();
try {
    BufferedReader reader = new BufferedReader(
            new InputStreamReader(instream));
    String line = null;
    StringBuffer buffer = new StringBuffer();
    while ((line = reader.readLine()) != null) {
        buffer.append(line);
        if (buffer.length() > 30000) {
            httpPost.abort();
            // connection will not be re-used
            break;
        }
    }
    return new EventImpl(buffer.toString().getBytes());
} finally {
    // if request is not aborted the connection can be re-used
    try {
        instream.close();
    } catch (IOException ex) {
        // log or ignore
    }
}


2012-03-28 19:56:19 oleg

Nope, not working. Flume reports that the stream has been closed; the exception is raised before processing even starts. – Jay

The exception is thrown by the #close() method and is safe to ignore. – oleg
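A small sketch of what "safe to ignore" can look like in code: a hypothetical closeQuietly helper (not an HttpClient API) that the finally block could call, reporting and swallowing the IOException a stream may throw on close, for example after its request was aborted. Closing the content stream also ends the streaming connection, so in a long-lived source this would typically run in the source's close() rather than on every next() call.

```java
import java.io.Closeable;
import java.io.IOException;

final class Streams {
    private Streams() {}

    /** Closes c, reporting but swallowing any IOException; true if close succeeded. */
    static boolean closeQuietly(Closeable c) {
        if (c == null) {
            return true;                 // nothing to close
        }
        try {
            c.close();
            return true;
        } catch (IOException ex) {
            // e.g. the stream of an aborted request: log or ignore
            System.err.println("ignored on close: " + ex.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in for a content stream whose close() fails.
        Closeable failing = () -> { throw new IOException("connection aborted"); };
        System.out.println(closeQuietly(failing));
    }
}
```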
