Java - Printing Unicode escapes from a text file does not output the corresponding UTF-8 characters

I have a text file containing many Unicode escapes, and I am trying to print the corresponding UTF-8 characters to the console. If I copy any of the values and paste it into System.out, it works fine, but not when I read them from the text file.

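Below is the code that reads the file, which contains lines with values like \u00C0, \u00C1, \u00C2, \u00C3. The escapes themselves are printed to the console, which is not the output I want.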

private void printFileContents() throws IOException { 
    Path encoding = Paths.get("unicode.txt"); 
    try (Stream<String> stream = Files.lines(encoding)) { 

     stream.forEach(v -> { System.out.println(v); }); 

    } catch (IOException e) { 
     e.printStackTrace(); 
    } 
}
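The difference between the pasted literal and the file contents comes down to when the escape is interpreted: the Java compiler translates Unicode escapes in source code, whereas text read from a file at run time is left as ordinary characters. The sketch below illustrates this; the class name EscapeDemo and the helper isLiteralEscape are hypothetical names chosen for the example, not part of the original code.

```java
class EscapeDemo {
    // Returns true when the string is a backslash, the letter u, and four hex digits.
    static boolean isLiteralEscape(String s) {
        return s.matches("\\\\u[0-9A-Fa-f]{4}");
    }

    public static void main(String[] args) {
        // The compiler has already turned this escape into a single character.
        String compiled = "\u00C0";
        // This is what one line read from unicode.txt holds: six ordinary characters.
        String fromFile = "\\u00C0";

        System.out.println(compiled.length());           // 1
        System.out.println(fromFile.length());           // 6
        System.out.println(isLiteralEscape(compiled));   // false
        System.out.println(isLiteralEscape(fromFile));   // true
    }
}
```

So System.out.println is doing its job in both cases; the file simply never contained the decoded characters.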

This is the method I use to parse the HTML that contained the Unicode values in the first place.

private void parseGermanEncoding() { 

    try 
    { 
     File encoding = new File("encoding.html"); 

     Document document = Jsoup.parse(encoding, "UTF-8", "http://example.com/"); 

     Element table = document.getElementsByClass("codetable").first(); 

     Path f = Paths.get("unicode.txt"); 

     try (BufferedWriter wr = new BufferedWriter(new FileWriter(f.toFile()))) 
     { 
      for (Element row : table.select("tr")) 
      { 
       Elements tds = row.select("td"); 

       String unicode = tds.get(0).text(); 

       if (unicode.startsWith("U+")) 
       { 
        unicode = unicode.substring(2); 
       } 

       wr.write("\\u" + unicode); 
       wr.newLine(); 

      } 
      wr.flush(); 
      wr.close(); 
     } 

    } catch (IOException e) 
    { 
     e.printStackTrace(); 
    } 
}
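One alternative, which is not what the method above does, would be to decode each code point while parsing and write the actual character to the file in UTF-8, so that no unescaping is needed when reading it back. The following is a minimal sketch under that assumption; CodePointWriter, toCharacter, and writeCharacters are hypothetical names, and the Jsoup parsing is replaced by a plain list of cell values.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class CodePointWriter {
    // Converts a table cell such as "U+00C0" (or bare "00C0") into the character it names.
    static String toCharacter(String cell) {
        String hex = cell.startsWith("U+") ? cell.substring(2) : cell;
        int codePoint = Integer.parseInt(hex.trim(), 16);
        // Character.toChars also handles code points above U+FFFF (surrogate pairs).
        return new String(Character.toChars(codePoint));
    }

    // Writes one decoded character per line, explicitly encoded as UTF-8.
    static void writeCharacters(Path target, List<String> cells) throws IOException {
        try (BufferedWriter wr = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (String cell : cells) {
                wr.write(toCharacter(cell));
                wr.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for unicode.txt in this sketch.
        Path out = Files.createTempFile("unicode", ".txt");
        writeCharacters(out, Arrays.asList("U+00C0", "U+00C1", "U+00C2"));
        System.out.println(Files.readAllLines(out, StandardCharsets.UTF_8));
    }
}
```

Naming the charset explicitly avoids depending on the platform default encoding that FileWriter would otherwise use. Whether the characters then display correctly still depends on the console's own encoding.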

Source

2017-07-08 sean le roy

Did you literally write '\u00C2' and so on into your file? Please show part of the text file.

The text file looks like this: '\u00C0 \u00C1 \u00C2 \u00C3 \u00C4 \u00C5 \u00C6 \u00C7 \u00C8 \u00C9 \u00CA \u00CB \u00CC \u00CD \u00CE \u00CF \u00D0 \u00D1 \u00D2 \u00D3 \u00D4'

Sorry, that did not format correctly. Basically, each of those values is on its own line.

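Thanks to OTM's comment above, I was able to find a working solution. Take the Unicode string, parse its digits as hexadecimal with Integer.parseInt(), and finally cast the result to char to get the actual character. This solution is based on the post that OTM pointed to: How to convert a string with Unicode encoding to a string of letters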

private void printFileContents() throws IOException {
    Path encoding = Paths.get("unicode.txt");

    try (Stream<String> stream = Files.lines(encoding)) {
        stream.forEach(v -> {
            // Drop the leading backslash-u prefix, if present, so only hex digits remain
            String hex = v.startsWith("\\u") ? v.substring(2) : v;

            // Parse the hex digits into the numeric value of the character
            int parse = Integer.parseInt(hex.trim(), 16);

            // Cast the numeric value to char to get the actual character
            String output = String.valueOf((char) parse);

            System.out.println(output);
        });
    } catch (IOException e) {
        e.printStackTrace();
    }
}
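The method above handles one escape per line. If a line could hold several escapes mixed with ordinary text (as in the "Acme, Inc." string from the comments), a regular expression can replace each escape in place. This is only a sketch of that idea; UnicodeUnescaper and unescape are hypothetical names, and it handles four-digit escapes only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeUnescaper {
    // A backslash, the letter u, then exactly four hex digits (captured as group 1).
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Replaces every four-digit escape in the text with the character it names.
    static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps a decoded '$' or backslash from being
            // treated as a group reference in the replacement string.
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // The doubled backslashes make these literal escapes, as they would be in the file.
        System.out.println(unescape("\\u00D9\\u00FC\\u00C2\\u00C7 Acme, Inc."));
    }
}
```

Text without any escapes passes through unchanged, so the same function could be applied to every line of the stream.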

Source

2017-07-09 19:49:41

You need to convert the string from a Unicode-escaped string to a UTF-8 encoded string. You can do it in two steps: 1. convert the string to a byte array with myString.getBytes("UTF-8"), and 2. get the UTF-8 encoded string with new String(byteArray, "UTF-8"). The code block needs to be wrapped in a try/catch for UnsupportedEncodingException.

Source

2017-07-08 17:47:20 OTM

Still not working. My method now looks like this: 'Path encoding = Paths.get("unicode.txt"); System.out.println("\u00D9\u00FC\u00C2\u00C7 Acme, Inc."); try (Stream<String> stream = Files.lines(encoding)) { stream.forEach(v -> { try { byte[] bytes = v.getBytes("UTF-8"); String str = new String(bytes, "UTF-8"); System.out.println(str); } catch (UnsupportedEncodingException e) { e.printStackTrace(); } })'

The code does not format well in a comment either. I included an extra System.out in there; that first one prints the correct characters I want.

In the original code from the post, could you try stream.forEach(System.out::println)? – OTM
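The "still not working" result is consistent with how the byte round trip behaves: encoding a string to UTF-8 and decoding it straight back yields the same characters, so a literal backslash-u sequence stays a literal. A small sketch, with RoundTripDemo and roundTrip as hypothetical names:

```java
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    // Encodes the string to UTF-8 bytes and decodes those bytes straight back.
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Six literal characters, as they would be read from the file.
        String line = "\\u00C0";
        System.out.println(roundTrip(line).equals(line)); // true
        System.out.println(roundTrip(line).length());     // 6
    }
}
```

Using StandardCharsets.UTF_8 instead of the "UTF-8" string also removes the need for the UnsupportedEncodingException try/catch.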
