Read a file and group its text

I have a file that contains some text, with a number at the end of each entry. The file looks like this:

to Polyxena. Achilles appears in the in the novel The Firebrand by Marion 
the firebrand 14852520 
fantasy novelist David Gemmell omic book hero Captain Marvel is endowed with the courage of Achilles, as well 
captain marvel 403585 
the city its central theme and 
corfu 45462

What I want is to group all the text up to and including each number. For example:

" to Polyxena. Achilles appears in the in the novel The Firebrand by Marion the firebrand 14852520" 

" fantasy novelist David Gemmell omic book hero Captain Marvel is endowed with the courage of Achilles, as well captain marvel 403585"

I noticed that each text group starts with a whitespace, but I'm having trouble figuring out how to group them. This is what I have coded so far:

String line; 
String s = " "; 
char whiteSpace = s.charAt(0); 

ArrayList<String> lines = new ArrayList<>(); 
BufferedReader in = new BufferedReader(new FileReader(args[0])); 
while((line = in.readLine()) != null) 
{ 
    if (whiteSpace == line.charAt(0)){ //start of sentence 
     lines.add(line);    
    } 
} 
in.close();

Source

2016-10-29 yaylitzis

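You're really only adding the lines with a leading whitespace to the ArrayList 'lines', right? So, for example, _the firebrand 14852520_ never makes it into the list? Maybe try it with an index, so that all lines between two leading-whitespace lines are collected under one index. Whenever a line starts with a whitespace, increment the index. – theoretisch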
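The index idea from the comment above could be sketched roughly as follows. This is only an illustration: the class name `GroupByLeadingSpace` and the method `group` are hypothetical, and it assumes every group starts with a line whose first character is a space, as in the sample data.

```java
import java.util.ArrayList;
import java.util.List;

public class GroupByLeadingSpace {

    // Hypothetical helper: starts a new group whenever a line begins with a space,
    // otherwise appends the line to the most recent group.
    static List<String> group(List<String> lines) {
        List<String> groups = new ArrayList<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // charAt(0) would throw on an empty line
            }
            if (line.charAt(0) == ' ' || groups.isEmpty()) {
                groups.add(line); // a leading space marks the start of a new group
            } else {
                int last = groups.size() - 1;
                groups.set(last, groups.get(last) + line);
            }
        }
        return groups;
    }
}
```

Unlike the original loop, lines without a leading space are kept and attached to the group that precedes them.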

You can follow this algorithm:

- Create an empty buffer
- For each line:
  - Append the line to the buffer
  - If the line ends with a number:
    - Add the buffer to the list
    - Empty the buffer

Something like this:

String text = " to Polyxena. Achilles appears in the in the novel The Firebrand by Marion \n" + 
     "the firebrand 14852520\n" + 
     " fantasy novelist David Gemmell omic book hero Captain Marvel is endowed with the courage of Achilles, as well \n" + 
     "captain marvel 403585\n" + 
     " the city its central theme and \n" + 
     "corfu 45462"; 
Scanner scanner = new Scanner(text); 

List<String> lines = new ArrayList<>(); 
StringBuilder buffer = new StringBuilder(); 

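while (scanner.hasNextLine()) { 
    String line = scanner.nextLine(); 
    buffer.append(line); 
    // A line that ends with digits closes the current group
    if (line.matches(".*\\d+$")) { 
        lines.add(buffer.toString()); 
        buffer.setLength(0); // reset the buffer for the next group
    } 
}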
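Since the question reads from a file rather than from a string, the same buffering idea could be adapted to a `BufferedReader`. The following is a minimal sketch, not a definitive implementation: the class name `GroupLines` and the method `group` are hypothetical, and the extra `\\s*` in the pattern is an assumption that lines in the real file may carry trailing whitespace after the number.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class GroupLines {

    // Hypothetical helper: concatenates lines until one ends with digits,
    // then stores the accumulated text as one group.
    static List<String> group(BufferedReader reader) throws IOException {
        List<String> groups = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            buffer.append(line);
            // Assumption: allow optional trailing whitespace after the number
            if (line.matches(".*\\d+\\s*$")) {
                groups.add(buffer.toString());
                buffer.setLength(0);
            }
        }
        // Text after the last number (if any) is left in the buffer and dropped
        return groups;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java GroupLines <file>");
            return;
        }
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            for (String g : group(in)) {
                System.out.println(g);
            }
        }
    }
}
```

Taking a `BufferedReader` as the parameter keeps the grouping logic independent of where the text comes from, so it can be exercised with a `StringReader` as well as a file.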

Source

2016-10-29 16:39:33 janos

It works! Great! But could you explain '.*\\d+$' to me? – yaylitzis

It's a regular expression. '\\d+' means one or more digits, '$' means the end of the line, and '.*' means any number of any characters. – janos
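To see how the pattern behaves on individual lines, here is a small sketch. The class name `RegexDemo` and the helper `endsWithNumber` are made up for illustration; the sample strings are taken from the question's data.

```java
public class RegexDemo {

    // Hypothetical helper wrapping the pattern from the answer.
    // String.matches requires the whole string to match the pattern.
    static boolean endsWithNumber(String line) {
        return line.matches(".*\\d+$");
    }

    public static void main(String[] args) {
        // Ends with digits: prints true
        System.out.println(endsWithNumber("the firebrand 14852520"));
        // No digits at all: prints false
        System.out.println(endsWithNumber(" the city its central theme and "));
        // A space follows the digits: prints false
        System.out.println(endsWithNumber("corfu 45462 "));
    }
}
```

The last case shows that a trailing space after the number prevents a match, so a file with trailing whitespace would need the lines trimmed or the pattern relaxed.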
