csv
モジュールは、この問題のためにcsv snifferを使用することをお勧めしているようです。
これらは、次の例を示しています。
with open('example.csv', 'rb') as csvfile: # python 3: 'r',newline=""
dialect = csv.Sniffer().sniff(csvfile.read(1024), delimiters=";,")
csvfile.seek(0)
reader = csv.reader(csvfile, dialect)
# ... process CSV file contents here ...
のは、それを試してみましょう。
[9:13am][[email protected] /tmp] cat example
#!/usr/bin/env python
import csv
def parse(filename):
with open(filename, 'rb') as csvfile:
dialect = csv.Sniffer().sniff(csvfile.read(), delimiters=';,')
csvfile.seek(0)
reader = csv.reader(csvfile, dialect)
for line in reader:
print line
def main():
print 'Comma Version:'
parse('comma_separated.csv')
print
print 'Semicolon Version:'
parse('semicolon_separated.csv')
print
print 'An example from the question (kingdom.csv)'
parse('kingdom.csv')
if __name__ == '__main__':
main()
そして、我々のサンプル入力
[9:13am][[email protected] /tmp] cat comma_separated.csv
test,box,foo
round,the,bend
[9:13am][[email protected] /tmp] cat semicolon_separated.csv
round;the;bend
who;are;you
[9:22am][[email protected] /tmp] cat kingdom.csv
ReleveAnnee;ReleveMois;NoOrdre;TitreRMC;AdopCSRegleVote;AdopCSAbs;AdoptCSContre;NoCELEX;ProposAnnee;ProposChrono;ProposOrigine;NoUniqueAnnee;NoUniqueType;NoUniqueChrono;PropoSplittee;Suite2LecturePE;Council PATH;Notes
1999;1;1;1999/83/EC: Council Decision of 18 January 1999 authorising the Kingdom of Denmark to apply or to continue to apply reductions in, or exemptions from, excise duties on certain mineral oils used for specific purposes, in accordance with the procedure provided for in Article 8(4) of Directive 92/81/EEC;U;;;31999D0083;1998;577;COM;NULL;CS;NULL;;;;Propos* are missing on Celex document
1999;1;2;1999/81/EC: Council Decision of 18 January 1999 authorising the Kingdom of Spain to apply a measure derogating from Articles 2 and 28a(1) of the Sixth Directive (77/388/EEC) on the harmonisation of the laws of the Member States relating to turnover taxes;U;;;31999D0081;1998;184;COM;NULL;CS;NULL;;;;Propos* are missing on Celex document
そして、我々は例のプログラム実行した場合:
[9:14am][[email protected] /tmp] ./example
Comma Version:
['test', 'box', 'foo']
['round', 'the', 'bend']
Semicolon Version:
['round', 'the', 'bend']
['who', 'are', 'you']
An example from the question (kingdom.csv)
['ReleveAnnee', 'ReleveMois', 'NoOrdre', 'TitreRMC', 'AdopCSRegleVote', 'AdopCSAbs', 'AdoptCSContre', 'NoCELEX', 'ProposAnnee', 'ProposChrono', 'ProposOrigine', 'NoUniqueAnnee', 'NoUniqueType', 'NoUniqueChrono', 'PropoSplittee', 'Suite2LecturePE', 'Council PATH', 'Notes']
['1999', '1', '1', '1999/83/EC: Council Decision of 18 January 1999 authorising the Kingdom of Denmark to apply or to continue to apply reductions in, or exemptions from, excise duties on certain mineral oils used for specific purposes, in accordance with the procedure provided for in Article 8(4) of Directive 92/81/EEC', 'U', '', '', '31999D0083', '1998', '577', 'COM', 'NULL', 'CS', 'NULL', '', '', '', 'Propos* are missing on Celex document']
['1999', '1', '2', '1999/81/EC: Council Decision of 18 January 1999 authorising the Kingdom of Spain to apply a measure derogating from Articles 2 and 28a(1) of the Sixth Directive (77/388/EEC) on the harmonisation of the laws of the Member States relating to turnover taxes', 'U', '', '', '31999D0081', '1998', '184', 'COM', 'NULL', 'CS', 'NULL', '', '', '', 'Propos* are missing on Celex document']
をそれはおそらく私が使用しているのpythonのバージョンは注目に値します。私はこれを完全に一般解(区切り文字は私のデータフィールドの一部が;
を含めることができるようにする必要があるということであると私は,
を使用するかもしれない理由の一つが存在することができるとは思わない
[9:20am][[email protected] /tmp] python -V
Python 2.7.2
こんにちは、より一般的な議論(非pythonで)https://stackoverflow.com/questions/2789695/how-to-programmatically-guess-whether-a-csv-file-is-comma-or-セミコロン区切り – Lorenzo