I have a problem parsing a JSON file in Python: ValueError: Invalid control character at: line 115076 column 173
Here is my code:
import json
from pprint import pprint
with open('review_sample.json') as data_file:
    data = json.load(data_file)
pprint(data)
Here is the format of the JSON file:
{
"table": "TempTable",
"rows":
[
{
"comment_id": "R1KLDHE77IOLUM",
"crawl_time": "2015-07-17 22:55:16",
"title": "Excellent TV, excellent price... but look out for bugs.",
"overall_rating": "5",
"purchase": "Verified Purchase",
"comment": "This is an excellent TV at an excellent price. For those who say that you can't tell the difference between 4k and ***p, I disagree. I compared this side by side to my *** LG 55' ***p set, and the resolution and sharpness of the image is just no comparison. Can you see an individual pixel from a normal viewing distance on either set? Of course not. But you can see when things start to get fuzzy and pixelated with a large ***p set, and that simply is not an issue with 4K. Picture quality is outstanding but you will want to tweak picture settings - I find that 'Standard' and 'Photo' modes are the best right out of the box, but worth customizing. I also turned off TruMotion, which seemed to be creating some lag when gaming, and is also a bit unsettling for movies and TV (which are usually filmed in 24 and 30 FPS, respectively, rather than 120 FPS TruMotion). 4K playback from Netflix and Amazon Instant Video are superb, as is upscaling from a ***p source. I was surprised how great Battlefield Hardline looked when upscaled to 4K. Overall, WebOS 2.0 is a joy to use, though I'm not a huge fan of the Smart Remote - just clunky to use and not really necessary. I had a bit of a scare when suddenly every 20th vertical row of pixels started bugging out in rainbow colors - see photos. I cycled the power and everything was fine, so I suspect that this was a software bug in the upscaling process (was playing Xbox One at ***p at the time). Will update this review if it happens again. Build quality feels good and the TV looks great - very sleek, slim, and minimal bezels.",
"site": "amazon",
"brand": "lg",
"country_code": "us",
"product_group_name": "tv",
"product_name": "smarttv",
"model_name": "4k",
"model_code": "*UF7600"
}
]
}
With only a few reviews there is no problem. But when I load the full JSON file (with many reviews), I get a ValueError. Here is the error message:
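A likely explanation for why only the full file fails: by default Python's `json` decoder is strict and rejects raw control characters (code points 0x00 to 0x1F, such as a literal tab or newline) inside string values, so a single review whose `comment` contains one is enough to break the whole load. A minimal reproduction, assuming Python 3 (where the error is `json.JSONDecodeError`, a subclass of `ValueError`); the sample strings are made up for illustration:

```python
import json

# A well-formed review: the line break is escaped as \n, so this is legal JSON.
good = '{"rows": [{"comment": "Great TV.\\nWould buy again."}]}'

# The same review with a raw newline character inside the string value.
bad = '{"rows": [{"comment": "Great TV.\nWould buy again."}]}'

print(json.loads(good)["rows"][0]["comment"])

try:
    json.loads(bad)
except ValueError as e:  # json.JSONDecodeError subclasses ValueError
    print("strict mode:", e)

# strict=False lets control characters through inside strings.
print(repr(json.loads(bad, strict=False)["rows"][0]["comment"]))
```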
Traceback (most recent call last):
  File "D:/kaggle/word2vec/server.py", line 11, in <module>
    data = json.load(data_file)
  File "C:\Anaconda2\lib\json\__init__.py", line 291, in load
    **kw)
  File "C:\Anaconda2\lib\json\__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "C:\Anaconda2\lib\json\decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Anaconda2\lib\json\decoder.py", line 380, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Invalid control character at: line 115076 column 173 (char 8301811)
Process finished with exit code 1
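Given this traceback, one common fix is to pass `strict=False`, which `json.load` forwards to `json.JSONDecoder`; it makes the decoder accept control characters inside strings instead of raising. The sketch below assumes Python 3 and a UTF-8 file; `load_reviews` is a hypothetical helper name, and the temporary file is a stand-in for `review_sample.json`. On Python 2 (the traceback shows Anaconda2), `json.load(data_file, strict=False)` is accepted as well, but the built-in `open()` there has no `encoding` parameter (`io.open` does).

```python
import json
import os
import tempfile

def load_reviews(path):
    """Load a JSON file, tolerating raw control characters inside strings."""
    with open(path, encoding="utf-8") as data_file:
        # strict=False is passed through to json.JSONDecoder.
        return json.load(data_file, strict=False)

# Usage: write a small file whose comment contains a raw tab, then load it.
sample = '{"table": "TempTable", "rows": [{"comment": "col1\tcol2"}]}'
fd, path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write(sample)

data = load_reviews(path)
print(repr(data["rows"][0]["comment"]))  # 'col1\tcol2'
os.remove(path)
```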
Please help.
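So it works on the example we can see, but not on the much larger one? You can open the file in a text editor, go to line 115076, and check whether something hard to spot, such as a stray comma, is there. Failing that, delete entries up to line 115076 and test whether it still fails. – tdelaney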
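As a scripted alternative to scrolling to line 115076 in an editor, the sketch below reads only the reported line and returns the text around the reported column through `repr()`, so invisible control characters show up as escapes such as `\x1a`. It assumes Python 3 and a UTF-8 file; `show_line_context` and the small stand-in file are hypothetical, and lines and columns are treated as 1-based, as in the error message.

```python
import os
import tempfile

def show_line_context(path, lineno, colno, width=40):
    """Return repr() of the text around (lineno, colno), both 1-based,
    so control characters appear as escapes like '\\x1a'."""
    with open(path, encoding="utf-8") as f:
        for current, line in enumerate(f, start=1):
            if current == lineno:
                start = max(colno - 1 - width, 0)
                return repr(line[start:colno - 1 + width])
    return None  # the file has fewer than lineno lines

# Usage with a tiny stand-in file; for the real file you would call
# show_line_context("review_sample.json", 115076, 173).
fd, path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write('{\n"comment": "bad\x1achar"\n}\n')
context = show_line_context(path, 2, 16)
print(context)
os.remove(path)
```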
You can also print the area around the problem to see if anything obvious stands out, e.g. run `f = open('review_sample.json', encoding='utf-8'); f.seek(8301811-200); print(f.read(300))`. Try it and see whether that gives you enough context to tell what the invalid character is. – tdelaney
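One caveat on the `seek` approach: `char 8301811` is an index into the decoded text, while seeking a Python 3 text file is only reliable with values obtained from `tell()`, so the offset may not land where expected. A sketch of an alternative, assuming Python 3.5+ where `json.JSONDecodeError` carries `pos`, `lineno` and `colno`: parse the whole string and slice around `pos`. The helper name `find_bad_spot` and the sample document are hypothetical.

```python
import json

def find_bad_spot(text, width=20):
    """Try to parse text; on failure return (lineno, colno, snippet), where
    snippet is repr() of the characters around the error position."""
    try:
        json.loads(text)
    except json.JSONDecodeError as e:
        start = max(e.pos - width, 0)
        return e.lineno, e.colno, repr(text[start:e.pos + width])
    return None  # parsed fine

# With the real file you would read it once and pass the whole string:
#   with open("review_sample.json", encoding="utf-8") as f:
#       print(find_bad_spot(f.read()))
doc = '{"rows": [{"comment": "ok"}, {"comment": "oops\x00here"}]}'
result = find_bad_spot(doc)
print(result)
```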
@tdelaney It's an `EOF` character. Why use `json` directly? To ignore unreadable characters, clean the string variable before passing it to `json.loads()`. – dsgdfg
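The "clean the string first" suggestion could look like the sketch below, assuming Python 3. It replaces every character in the 0x00 to 0x1F range with a space before parsing. Outside strings a space is ordinary JSON whitespace, so the structure is unaffected; inside strings it changes the text (for example, line breaks in a review become spaces), unlike `strict=False`, which keeps the characters as they are. `CONTROL_CHARS` and `clean_and_load` are hypothetical names.

```python
import json
import re

# Every C0 control character (U+0000 to U+001F), including tab, CR and LF.
CONTROL_CHARS = re.compile(r"[\x00-\x1f]")

def clean_and_load(text):
    """Replace control characters with spaces, then parse the JSON text."""
    return json.loads(CONTROL_CHARS.sub(" ", text))

raw = '{\n  "comment": "line one\nline two\x1a"\n}\x1a'
data = clean_and_load(raw)
print(repr(data["comment"]))  # 'line one line two '
```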