Python: unique values per column in each row of a CSV file

I have been crunching on this for a long time. Is there an easy way to do it with NumPy or Pandas, or a way to fix my code, so that I get the unique values within each column of a row, separated by "|"?

For example, given this data:

"id","fname","lname","education","gradyear","attributes" 
"1","john","smith","mit|harvard|ft|ft|ft","2003|207|212|212|212","qa|admin,co|master|NULL|NULL" 
"2","john","doe","htw","2000","dev"

The output should look like this:

"id","fname","lname","education","gradyear","attributes" 
"1","john","smith","mit|harvard|ft","2003|207|212","qa|admin,co|master|NULL" 
"2","john","doe","htw","2000","dev"

My broken code:

import csv 
import pprint 

your_list = csv.reader(open('out.csv')) 
your_list = list(your_list) 

#pprint.pprint(your_list) 
string = "|" 
cols_no=6 
for line in your_list:
    i=0
    for col in line:
        if i==cols_no:
            print "\n"
            i=0
        if string in col:
            values = col.split("|")
            myset = set(values)
            items = list()
            for item in myset:
                items.append(item)
            print items
        else:
            print col+",",
        i=i+1

It outputs:

id, fname, lname, education, gradyear, attributes, 1, john, smith, ['harvard', 'ft', 'mit'] 
['2003', '212', '207'] 
['qa', 'admin,co', 'NULL', 'master'] 
2, john, doe, htw, 2000, dev,

Thank you!

Source

2016-09-15 android_dev

See http://stackoverflow.com/questions/39504079/take-column-of-string-data-in-pandas-dataframe-and-split-into-separate-columns and http://stackoverflow.com/questions/39500258/pandas-how-to-get-the-unique-values-of-a-column-that-contains-a-list-of-values – danio

numpy/pandas is a bit overkill for something you can achieve with csv.DictReader and csv.DictWriter plus collections.OrderedDict, for example:

import csv 
from collections import OrderedDict 

# If using Python 2.x, use `open('output.csv', 'wb')` instead (and drop newline='')
with open('input.csv', newline='') as fin, open('output.csv', 'w', newline='') as fout:
    csvin = csv.DictReader(fin)
    csvout = csv.DictWriter(fout, fieldnames=csvin.fieldnames, quoting=csv.QUOTE_ALL)
    csvout.writeheader()
    for row in csvin:
        # Split each cell on '|', drop repeats while keeping first-seen order, rejoin
        for k, v in row.items():
            row[k] = '|'.join(OrderedDict.fromkeys(v.split('|')))
        csvout.writerow(row)

This gives you:

"id","fname","lname","education","gradyear","attributes" 
"1","john","smith","mit|harvard|ft","2003|207|212","qa|admin,co|master|NULL" 
"2","john","doe","htw","2000","dev"
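The same idea can be sketched without the dict-based reader and writer. This is a minimal sketch, assuming Python 3.7+ (where a plain dict keeps insertion order) and working on an in-memory string rather than files; the names dedupe_cell and dedupe_csv are hypothetical helpers, not part of the answer above.

```python
import csv
import io


def dedupe_cell(cell, sep="|"):
    """Drop repeated sep-separated items, keeping the first occurrence of each."""
    # dict.fromkeys keeps insertion order on Python 3.7+.
    return sep.join(dict.fromkeys(cell.split(sep)))


def dedupe_csv(text):
    """Return the CSV text with every cell de-duplicated and all fields quoted."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        writer.writerow([dedupe_cell(cell) for cell in row])
    return out.getvalue()


# Sample data copied from the question.
sample = (
    '"id","fname","lname","education","gradyear","attributes"\n'
    '"1","john","smith","mit|harvard|ft|ft|ft","2003|207|212|212|212",'
    '"qa|admin,co|master|NULL|NULL"\n'
    '"2","john","doe","htw","2000","dev"\n'
)
cleaned = dedupe_csv(sample)
print(cleaned, end="")
```

Because the csv module does the parsing, the comma inside "admin,co" stays part of its cell instead of splitting the column.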

Source

2016-09-15 11:38:31

Thanks!!! Your Python skills rock!

If you have many items separated by |, this works as long as you don't care about the order:

lst = ["id","fname","lname","education","gradyear","attributes", 
"1","john","smith","mit|harvard|ft|ft|ft","2003|207|212|212|212","qa|admin,co|master|NULL|NULL", 
"2","john","doe","htw","2000","dev"] 

def no_duplicate(string): 
    return "|".join(set(string.split("|"))) 

# On Python 3, map() returns an iterator, so wrap it in list()
result = list(map(no_duplicate, lst))

print(result)

Result (the order inside each group can vary, since set is unordered):

['id', 'fname', 'lname', 'education', 'gradyear', 'attributes', '1', 'john', 'smith', 'ft|harvard|mit', '2003|207|212', 'NULL|admin,co|master|qa', '2', 'john', 'doe', 'htw', '2000', 'dev']

Source

2016-09-15 11:31:30 Julien

If you do care about the order, you can use http://stackoverflow.com/a/480227/12663 instead of set() inside no_duplicate – danio

Thanks for your answer
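The order-preserving variant suggested in danio's comment could look roughly like this. It is a sketch using a seen-set loop, which is one common way to de-duplicate while keeping order and is not necessarily identical to the linked answer; the name no_duplicate_ordered is made up for illustration.

```python
def no_duplicate_ordered(string, sep="|"):
    """Remove repeated sep-separated items, keeping first-seen order."""
    seen = set()
    unique = []
    for item in string.split(sep):
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return sep.join(unique)


# A few cells taken from the sample data in the question.
lst = ["mit|harvard|ft|ft|ft", "2003|207|212|212|212",
       "qa|admin,co|master|NULL|NULL", "htw"]

result = [no_duplicate_ordered(s) for s in lst]
print(result)
# ['mit|harvard|ft', '2003|207|212', 'qa|admin,co|master|NULL', 'htw']
```

Unlike the set() version, the output order here is deterministic and matches the order in which items first appear.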
