Converting a Wikipedia dump to text using python -m gensim.scripts.make_wiki
I want to convert a Wikipedia dump to plain text with gensim, using the python -m gensim.scripts.make_wiki script. I run it like this:
python -m gensim.scripts.make_wiki ./enwiki-latest-pages-articles.xml.bz2 ./results
At the very end, it fails with this error:
2016-04-06 20:43:46,471 : INFO : storing corpus in Matrix Market format to ./results/_bow.mm
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/lib/python2.7/dist-packages/gensim-0.12.3-py2.7-linux-x86_64.egg/gensim/scripts/make_wiki.py", line 88, in <module>
    MmCorpus.serialize(outp + '_bow.mm', wiki, progress_cnt=10000) # another ~9h
  File "/usr/local/lib/python2.7/dist-packages/gensim-0.12.3-py2.7-linux-x86_64.egg/gensim/corpora/indexedcorpus.py", line 89, in serialize
    offsets = serializer.save_corpus(fname, corpus, id2word, progress_cnt=progress_cnt, metadata=metadata)
  File "/usr/local/lib/python2.7/dist-packages/gensim-0.12.3-py2.7-linux-x86_64.egg/gensim/corpora/mmcorpus.py", line 49, in save_corpus
    return matutils.MmWriter.write_corpus(fname, corpus, num_terms=num_terms, index=True, progress_cnt=progress_cnt, metadata=metadata)
  File "/usr/local/lib/python2.7/dist-packages/gensim-0.12.3-py2.7-linux-x86_64.egg/gensim/matutils.py", line 486, in write_corpus
    mw = MmWriter(fname)
  File "/usr/local/lib/python2.7/dist-packages/gensim-0.12.3-py2.7-linux-x86_64.egg/gensim/matutils.py", line 436, in __init__
    self.fout = utils.smart_open(self.fname, 'wb+') # open for both reading and writing
  File "build/bdist.linux-x86_64/egg/smart_open/smart_open_lib.py", line 111, in smart_open
NotImplementedError: unknown file mode wb+
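As a sanity check, the mode string 'wb+' is accepted by Python's built-in open, which suggests that the rejection comes from smart_open's own mode handling and not from Python itself. A minimal sketch using only the standard library (the helper name roundtrip_wb_plus and the temporary file are illustrative assumptions, not part of gensim or smart_open):

```python
import os
import tempfile


def roundtrip_wb_plus(payload):
    """Write payload with mode 'wb+', then read it back through the same handle."""
    # Create a throwaway file; the '.mm' suffix only mirrors the Matrix Market output name.
    fd, path = tempfile.mkstemp(suffix=".mm")
    os.close(fd)
    try:
        # 'wb+' truncates the file and opens it for both writing and reading in binary mode.
        with open(path, "wb+") as fout:
            fout.write(payload)
            fout.seek(0)  # rewind so the read starts at the beginning
            return fout.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(roundtrip_wb_plus(b"%%MatrixMarket matrix coordinate real general\n"))
```

The built-in open handles the same mode that smart_open refuses in the traceback, so the installed smart_open version may simply not support 'wb+'.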
Does anyone know what is going on?