2016-11-25 17 views
0

Python csv writer adding an escape character after every word

I am trying to write CSV data, but I keep getting the escape character right after every word in the CSV file.

Setup:

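import csv
from itertools import izip_longest  # Python 2; named zip_longest in Python 3

with open('gibber.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile, delimiter=",", quoting=csv.QUOTE_NONE, escapechar=" ")
    for values in izip_longest(*csv_data, fillvalue="-,-"):
        writer.writerow([unicode(s).encode("utf-8") for s in values])
# The with block closes the file, so no explicit csvfile.close() is needed.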

If I print out what is passed to writer.writerow(...) above, the following line is a sample:

['dipey,1', 'you have,2', 'at the beginning,1', 'brilliant charles brown truly,1', 'great the first also was,1', 'identical to this one as far,1', 'be when pie mood mark lake a,1', 'shardely uptown is you free on a stone,1', 'let it rest and sun it those it super,1'] 

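I have searched for pretty much everything I can think of and tried many things like this, but why is the csv writer putting an escape character after every word?

My desired output should look something like this: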

-------------------------------------------------------------------------- 
word1 | word_count1 | word2 | word_count2 | .. wordN | word_countN 
-------------------------------------------------------------------------- 
word |  3  | word word |  7  | .............. N 

Instead, using a space as my escapechar, I get something like this:

[] = escapecharacter 
-------------------------------------------------------------------------- 
word1 | word_count1 | word2 | word_count2 | .. wordN | word_countN 
-------------------------------------------------------------------------- 
word[]|  3  |word[] word[]| 7  | .............. N 

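I get an extra space after every word. If I use a tab or a newline, the row/column layout is broken. If I use a single letter, a digit, or \, that escape character is placed at the right end of the row item, but the double spaces go away.

The sample list posted above is what gets passed to writer.writerow(...).

Test data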
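For reference, the behaviour described above can be reproduced in isolation. The following is a minimal sketch, assuming Python 3 (the question uses Python 2) and an in-memory buffer instead of a file; the helper name `render` is made up for illustration. With `quoting=csv.QUOTE_NONE` the writer cannot quote a field, so it puts the escape character in front of every delimiter and every occurrence of the escape character itself inside a field. Because each item is a single string such as 'you have,2', its comma gets escaped and, with a space as `escapechar`, so do its spaces.

```python
import csv
import io


def render(row, escapechar):
    """Write one row with QUOTE_NONE and return the raw text produced."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, delimiter=",", quoting=csv.QUOTE_NONE,
                        escapechar=escapechar)
    writer.writerow(row)
    return buf.getvalue()


row = ["dipey,1", "you have,2"]

# Space as escapechar: every comma AND every space inside a field is
# prefixed with a space, which shows up as doubled spaces.
with_space = render(row, " ")
print(repr(with_space))

# Backslash as escapechar: only the embedded commas need escaping,
# so the doubled spaces disappear.
with_backslash = render(row, "\\")
print(repr(with_backslash))
```

Comparing the two printed strings shows that the extra character always sits in front of something the writer had to escape, which suggests the embedded commas, not the writer itself, are the root of the problem.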

data0 = unicode("Rainforests are forests characterized by high rainfall, with annual rainfall between 250 and 450 centimetres (98 and 177 in).[1] There are two types of rainforest: tropical rainforest and temperate rainforest. The monsoon trough, alternatively known as the intertropical convergence zone, plays a significant role in creating the climatic conditions necessary for the Earth's tropical rainforests. Around 40% to 75% of all biotic species are indigenous to the rainforests.[2] It has been estimated that there may be many millions of species of plants, insects and microorganisms still undiscovered in tropical rainforests. Tropical rainforests have been called the \"jewels of the Earth\" and the \"world's largest pharmacy\", because over one quarter of natural medicines have been discovered there.[3] Rainforests are also responsible for 28% of the world's oxygen turnover, sometimes misnamed oxygen production,[4] processing it through photosynthesis from carbon dioxide and consuming it through respiration. The undergrowth in some areas of a rainforest can be restricted by poor penetration of sunlight to ground level. If the leaf canopy is destroyed or thinned, the ground beneath is soon colonized by a dense, tangled growth of vines, shrubs and small trees, called a jungle. The term jungle is also sometimes applied to tropical rainforests generally.", "utf-8") 

data1 = unicode("Tropical rainforests are characterized by a warm and wet climate with no substantial dry season: typically found within 10 degrees north and south of the equator. Mean monthly temperatures exceed 18 °C (64 °F) during all months of the year.[5] Average annual rainfall is no less than 168 cm (66 in) and can exceed 1,000 cm (390 in) although it typically lies between 175 cm (69 in) and 200 cm (79 in).[6] Many of the world's tropical forests are associated with the location of the monsoon trough, also known as the intertropical convergence zone.[7] The broader category of tropical moist forests are located in the equatorial zone between the Tropic of Cancer and Tropic of Capricorn. Tropical rainforests exist in Southeast Asia (from Myanmar (Burma) to the Philippines, Malaysia, Indonesia, Papua New Guinea, Sri Lanka, Sub-Saharan Africa from Cameroon to the Congo (Congo Rainforest), South America (e.g. the Amazon Rainforest), Central America (e.g. Bosawás, southern Yucatán Peninsula-El Peten-Belize-Calakmul), Many Australia, and on many of the Pacific Islands (such as Hawaiʻi). Tropical forests have been called the \"Earth's lungs\", although it is now known that rainforests contribute little net oxygen addition to the atmosphere through photosynthesis", "utf-8") 

data2 = unicode("Tropical forests cover many a large part of the globe, but temperate rainforests only occur in few regions around the world. Temperate rainforests are rainforests in temperate regions. They occur in North America (in the Pacific Northwest in Alaska, British Columbia, Washington, Oregon and California), in Europe (parts of the British Isles such as the coastal areas of Ireland and Scotland, southern Norway, parts of the western Balkans along the Adriatic coast, as well as in Galicia and coastal areas of the eastern Black Sea, including Georgia and coastal Turkey), in East Asia (in southern China, Highlands of Taiwan, much of Japan and Korea, and on Sakhalin Island and the adjacent Russian Far East coast), in South America (southern Chile) and also in Australia and New Zealand.[10]", "utf-8") 

Here is a sample of csv_data, an example of the list I pass in. See the full data here.

import pprint
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(csv_data)

[ [ u'shrubs,1', 
     u'chile,1', 
     u'equatorial,1', 
     u'china,1', 
     u'may,1', 
     u'zone7,1'], 
    [ u'washington oregon,1', 
     u'new zealand10,1', 
     u'moist forests,1', 
     u'biotic species,1', 
     u'and tropic,1', 
     u'term jungle,1', 
     u'sometimes misnamed,1', 
     u'japan and,1', 
     u'the world,1', 
     u'200 cm,1', 
     u'between the,1', 
     u'canopy is,1', 
     u'as hawaii,1', 
     u'and temperate,1', 
     u'many australia,1', 
     u'but temperate,1'], 
    [ u'cancer and tropic,1', 
     u'black sea including,1', 
     u'asia in southern,1', 
     u'some areas of,1', 
     u'also known as,1', 
     u'as well as,1', 
     u'areas of a,1', 
     u'central america eg,1', 
     u'250 and 450,1'], 
    [ u'rainforest the monsoon trough,1', 
     u'shrubs and small trees,1', 
     u'dense tangled growth of,1', 
     u'of the british isles,1'], 
    [ u'sometimes misnamed oxygen production4 processing,1', 
     u'a significant role in creating,1', 
     and,1', 
     u'are also responsible for 28 of the worlds oxygen,1', 
     u'the climatic conditions necessary for the earths tropical rainforests,1', 
     u'growth of vines shrubs and small trees called a,1', 
     u'columbia washington oregon and california in europe parts of,1']] 

As the sample data above shows, csv_data can be transposed and each resulting row written out.

Edit

This is how I build the data that I want to turn into rows.
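The transposition mentioned here is what `izip_longest(*csv_data, fillvalue=...)` performs: it takes the i-th item of every inner list and pads the shorter lists with the fill value. A small sketch, assuming Python 3 (where the function is named `itertools.zip_longest`) and a shortened stand-in for csv_data taken from the sample above:

```python
from itertools import zip_longest

# Shortened stand-in for csv_data: one inner list per phrase length.
columns = [
    ["shrubs,1", "chile,1", "may,1"],
    ["washington oregon,1", "moist forests,1"],
]

# Each tuple holds the i-th item of every inner list; shorter lists are padded.
rows = list(zip_longest(*columns, fillvalue="-,0"))
for row in rows:
    print(row)
```

Each printed tuple corresponds to one line of the output file.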

csv_data = []
for index, item in enumerate(package.count_set[0]):
    payload = []
    phrase = item[0]  # phrase appears to be the Counter: word tuple -> count
    for pindex, pitem in enumerate(phrase):  # pitem is one key of that Counter
        # print(index, pindex, " ".join(pitem), phrase[pitem])
        _str = " ".join(pitem)
        _cnt = phrase[pitem]
        _data = _str + ",%d" % _cnt
        payload.append(_data)
    csv_data.append(payload)

So this creates a list of items like this: [ "word,count,", "word1,count1,", "word2,count2,", "wordN,countN," ]

I have also tried it without the trailing comma: [ "word,count", "word1,count1", "word2,count2", "wordN,countN" ]

That is how I build this list and append it to csv_data. Could that be what causes the problem?
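One possible way to keep the csv module is to hand writerow one field per cell rather than pre-joined "word,count" strings, so that no field contains the delimiter and nothing has to be escaped. This is a sketch under assumptions, not a tested fix for the original program: it targets Python 3, writes to an in-memory buffer, and the function name `write_pairs` is hypothetical.

```python
import csv
import io
from itertools import zip_longest


def write_pairs(out, columns, fillvalue="-,0"):
    """Write transposed 'phrase,count' items as separate CSV fields."""
    writer = csv.writer(out)  # default dialect, QUOTE_MINIMAL
    for values in zip_longest(*columns, fillvalue=fillvalue):
        row = []
        for item in values:
            # Split on the last comma only, in case a phrase contains one.
            phrase, count = item.rsplit(",", 1)
            row.extend([phrase, count])
        writer.writerow(row)


columns = [
    ["shrubs,1", "chile,1"],
    ["washington oregon,1"],
]
buf = io.StringIO(newline="")
write_pairs(buf, columns)
output = buf.getvalue()
print(output)
```

Building the payload as (phrase, count) pairs in the first place would avoid the split step entirely; the split is used here only so the sketch accepts the list format shown above.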

+1

What is the sample input? What is the expected output? What is the actual output? –

+0

@MarkTolonen I have edited in some more details. Why does a space escape character put a space to the right of every word, doubling the spacing in the output? It writes "hello  word" instead of "hello word" – user1610950

+0

There seems to be a syntax error in your code ('writer.writerow([unicode(s).encode("utf-8") for values])values))'). Also, please provide the input (in particular, what is 'csvdata'?) –

Answers

0

I don't usually like answering my own questions, but simply building the strings myself and writing them to the file solved the problem.

from itertools import izip_longest  # Python 2; named zip_longest in Python 3

_range = files_to_load + 1  # files_to_load and csv_data are defined elsewhere
with open('data.csv', 'wb') as csvfile:
    header = ["%d word phrase, phrase count" % i for i in range(1, _range)]

    header_line = ""
    for item in header:
        word, count = item.split(",")
        # Pluralize "phrase" for every column except the 1-word one.
        # Parse the whole leading number so that 10 and above also work.
        if int(word.split()[0]) > 1:
            word = word.replace("phrase", "phrases")

        header_line += word + "," + count + ","
    header_line = header_line[:-1] + "\n"  # drop the trailing comma
    csvfile.write(header_line)

    for values in izip_longest(*csv_data, fillvalue="-,0"):
        line_list = [unicode(s).encode("utf-8") for s in values]
        line_str = ""
        for item in line_list:
            word, count = item.split(",")
            line_str += word + "," + count + ","
        line_str = line_str[:-1] + "\n"

        csvfile.write(line_str)
# The with block closes the file, so no explicit csvfile.close() is needed.

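The code above could probably be cleaned up a lot, but no matter what I did, I could not get Python's csv module to work correctly with my data.

This is probably user error and an oversight on my part. Still, the code above writes out what I need in CSV format, without any strange artifacts.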
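As one possible cleanup of the header-building loop, the same header line can be produced with str.join. This sketch assumes Python 3; `build_header` is a hypothetical helper, and its argument corresponds to files_to_load in the code above.

```python
def build_header(files_to_load):
    """Return the header line for phrase lengths 1..files_to_load."""
    cells = []
    for n in range(1, files_to_load + 1):
        noun = "phrase" if n <= 1 else "phrases"
        cells.append("%d word %s" % (n, noun))
        cells.append(" phrase count")  # leading space kept, as in the original
    return ",".join(cells) + "\n"


header_line = build_header(3)
print(header_line, end="")
```

Joining a list of cells avoids having to trim a trailing comma afterwards.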
