How to parse a UTF-8 Chinese string

I am trying to parse a std::string that may contain Chinese characters. For example, for a string containing

哈囉hi你好hello

I want to split it into six strings: 哈, 囉, hi, 你, 好, hello. At the moment the string is read from a text file with getline(). Following the post "How to use boost::spirit to parse UTF-8?", here is my current code:

#include <boost/regex/pending/unicode_iterator.hpp> 
#include <boost/spirit/include/qi.hpp> 
#include <boost/range.hpp> 
#include <iterator> 
#include <iostream> 
#include <ostream> 
#include <cstdint> 
#include <string> 
#include <vector> 

using namespace boost; 
using namespace std; 
using namespace std::string_literals; 

int main() 
{ 
    string str = u8"哈囉hi你好hello"; //actually got from getline() 
    auto &&utf8_text = str; 

    u8_to_u32_iterator<const char*> 
     tbegin(begin(utf8_text)), tend(end(utf8_text)); 

    vector<uint32_t> result; 
    spirit::qi::parse(tbegin, tend, *spirit::standard_wide::char_, result); 
    for(auto &&code_point : result) { 
     cout << code_point << ";"; 
    } 
}

However, I get an error: call to 'begin' and 'end' is ambiguous. Declaring auto &&utf8_text = u8"哈囉hi你好hello" directly works, but I cannot write it that way because the string contents come from getline().

I also tried this:

auto str = u8"你好，世界！"; 
auto &&utf8_text = str;

but I still got an error: no matching function for call to 'begin' and 'end'.

Source

2016-12-11 Jean Chen

Have you tried 'u8_to_u32_iterator'? 'begin()' and 'end()' are not guaranteed to return pointers. – codicodi

Thank you. It works now.
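
To illustrate the comment above: std::begin and std::end applied to a std::string return std::string iterators, which are not guaranteed to be const char*, so an adapter instantiated for const char* cannot be constructed from them. With Boost, the analogous change would presumably be to instantiate u8_to_u32_iterator with std::string::const_iterator, and to write std::begin and std::end with explicit qualification, since the ambiguity most likely comes from having both using-directives in scope. The sketch below shows the same idea with the standard library only. decode_utf8 is a hypothetical helper, not a Boost function, and it assumes well-formed UTF-8.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): decodes UTF-8 bytes from any
// iterator range into code points. Assumes the input is well-formed UTF-8.
template <typename ByteIterator>
std::vector<std::uint32_t> decode_utf8(ByteIterator first, ByteIterator last) {
    std::vector<std::uint32_t> code_points;
    while (first != last) {
        const unsigned char lead = static_cast<unsigned char>(*first++);
        std::uint32_t cp = 0;
        int extra = 0;  // number of continuation bytes that follow the lead byte
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        for (; extra > 0 && first != last; --extra) {
            // Each continuation byte contributes its low six bits.
            cp = (cp << 6) | (static_cast<unsigned char>(*first++) & 0x3Fu);
        }
        code_points.push_back(cp);
    }
    return code_points;
}

// Convenience overload: the iterator type is the string's own iterator,
// so no conversion to const char* is required.
inline std::vector<std::uint32_t> decode_utf8(const std::string& text) {
    return decode_utf8(text.cbegin(), text.cend());
}
```

Because the helper is templated on the iterator type, it accepts both text.cbegin()/text.cend() and a pointer pair such as text.data() and text.data() + text.size().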

Using auto with a string literal gives you a char pointer, not a std::string. If you want a std::string, write the type out explicitly.
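
A small sketch of the difference, using only the standard library. The function names here are hypothetical and exist only for illustration. A plain (non-u8) literal is used so the snippet does not depend on how a given C++ standard types u8 literals.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <type_traits>

// Hypothetical helper: counts the elements between std::begin and std::end.
template <typename Range>
std::size_t element_count(const Range& range) {
    return static_cast<std::size_t>(
        std::distance(std::begin(range), std::end(range)));
}

// Returns true: plain `auto` deduces a pointer type from a string literal,
// and std::begin/std::end have no overload for a bare pointer.
inline bool auto_literal_is_pointer() {
    auto text = "hello";
    (void)text;  // only the deduced type is inspected
    return std::is_pointer<decltype(text)>::value;
}

// Spelling the type out gives a real std::string, which std::begin and
// std::end accept.
inline std::size_t explicit_string_size() {
    std::string text = "hello";
    return element_count(text);
}
```

By contrast, auto && bound to a literal deduces a reference to the character array rather than a pointer, and std::begin and std::end do accept arrays, which is consistent with that form compiling in the question. Note that an array range includes the terminating null character.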

Source

2016-12-11 09:25:15 deviantfan
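
For the goal stated in the question (哈, 囉, hi, 你, 好, hello), here is a minimal sketch that does not use Boost. It assumes a simple rule that is only inferred from the example: every non-ASCII character becomes its own token, and each run of consecutive ASCII bytes becomes one token. Both function names are hypothetical, and the input is assumed to be well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: byte length of the UTF-8 sequence that starts with
// `lead`. Assumes `lead` is a valid lead byte.
inline std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;
    if (lead < 0xE0) return 2;
    if (lead < 0xF0) return 3;
    return 4;
}

// Splits text so that every non-ASCII character is its own token and each
// run of consecutive ASCII bytes forms a single token.
inline std::vector<std::string> split_mixed_text(const std::string& text) {
    std::vector<std::string> tokens;
    std::string ascii_run;
    std::size_t i = 0;
    while (i < text.size()) {
        const unsigned char lead = static_cast<unsigned char>(text[i]);
        if (lead < 0x80) {
            ascii_run += text[i];  // keep extending the current ASCII run
            ++i;
            continue;
        }
        if (!ascii_run.empty()) {  // a multi-byte character ends the run
            tokens.push_back(ascii_run);
            ascii_run.clear();
        }
        const std::size_t length = utf8_sequence_length(lead);
        tokens.push_back(text.substr(i, length));  // substr clamps at the end
        i += length;
    }
    if (!ascii_run.empty()) tokens.push_back(ascii_run);
    return tokens;
}
```

Each line returned by getline() can be passed to split_mixed_text directly, since the function works on the raw bytes of the std::string and never needs a pointer or a wide-character conversion.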
