2017-08-14 14 views
1

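Pandas: how to add column names to a dataframe from a CSV file

I'm new to Python but excited about it, and I need your advice. I came up with the following code to compare two CSV files based on nmap scans: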

import pandas as pd
from pandas import DataFrame
import os

# Ask for the two nmap CSV exports to compare
file = input('\nEnter the Old CSV file: ')
file1 = input('\nEnter the New CSV file: ')
# Read column 0 of each file into a set
A = set(pd.read_csv(file, index_col=False, header=None)[0])
B = set(pd.read_csv(file1, index_col=False, header=None)[0])
# Entries present in the old file but not in the new one
final = list(A - B)
df = pd.DataFrame(final, columns=["host"])
df.to_csv('DIFF_' + file)

print("Completed!")

When I run it, I get the following result:

host 
0,82.214.228.71;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3; 
1,82.214.228.70;dsl-radius-01.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3; 

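My question is how to add column names such as hostname, port, port name, state, etc. I tried df['hostname'] = range(1, len(df) + 1), but when I open the file in Excel, hostname is added to the first column together with host.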

Do you want to compare only the first column, or all columns? – jezrael

Answers

2

I think you need the parameter sep=';' in read_csv, and you should first define the column names and pass them via names. This should work:

import pandas as pd

file = input('\nEnter the Old CSV file: ')
file1 = input('\nEnter the New CSV file: ')

# One label per ';'-separated field (the list is elided in the original answer)
cols = ['hostname','port','portname', ...]
A = pd.read_csv(file, index_col=False, header=None, sep=';', names=cols)
B = pd.read_csv(file1, index_col=False, header=None, sep=';', names=cols)

Then, if you need to compare all columns, use merge and filter the result with boolean indexing:

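# Outer merge on all columns; '_merge' marks where each row was found
df = pd.merge(A, B, how='outer', indicator=True)
# Keep only the rows that appear in A (the old file) but not in B
df = df[df['_merge']=='left_only'].drop('_merge',axis=1)

df.to_csv('DIFF_'+file)

print("Completed!")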

Sample:

import pandas as pd 
from io import StringIO

temp=u"""82.214.228.71;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3; 
82.214.228.70;dsl-radius-01.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3; 
82.214.228.74;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3; 
82.214.228.75;dsl-radius-01.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;""" 
# after testing, replace StringIO(temp) with 'filename.csv'
cols = ['hostname','port','portname', 'a','b','c','d','e','f','g','h','i', 'j'] 
A = pd.read_csv(StringIO(temp), sep=";", names=cols) 
print (A) 
        hostname                         port  portname    a    b        c  \
0  82.214.228.71  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind
1  82.214.228.70  dsl-radius-01.direcpceu.com       PTR  tcp  111  rpcbind
2  82.214.228.74  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind
3  82.214.228.75  dsl-radius-01.direcpceu.com       PTR  tcp  111  rpcbind

      d    e    f        g    h  i    j
0  open  NaN  NaN  syn-ack  NaN  3  NaN
1  open  NaN  NaN  syn-ack  NaN  3  NaN
2  open  NaN  NaN  syn-ack  NaN  3  NaN
3  open  NaN  NaN  syn-ack  NaN  3  NaN

temp=u"""82.214.228.75;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3; 
82.214.228.70;dsl-radius-01.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3; 
82.214.228.77;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3; 
""" 
# after testing, replace StringIO(temp) with 'filename.csv'
cols = ['hostname','port','portname', 'a','b','c','d','e','f','g','h','i', 'j'] 
B = pd.read_csv(StringIO(temp), sep=";", names=cols) 
print (B) 
        hostname                         port  portname    a    b        c  \
0  82.214.228.75  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind
1  82.214.228.70  dsl-radius-01.direcpceu.com       PTR  tcp  111  rpcbind
2  82.214.228.77  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind

      d    e    f        g    h  i    j
0  open  NaN  NaN  syn-ack  NaN  3  NaN
1  open  NaN  NaN  syn-ack  NaN  3  NaN
2  open  NaN  NaN  syn-ack  NaN  3  NaN

df1 = pd.merge(A, B, how='outer', indicator=True) 

print (df1) 

        hostname                         port  portname    a    b        c  \
0  82.214.228.71  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind
1  82.214.228.70  dsl-radius-01.direcpceu.com       PTR  tcp  111  rpcbind
2  82.214.228.74  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind
3  82.214.228.75  dsl-radius-01.direcpceu.com       PTR  tcp  111  rpcbind
4  82.214.228.75  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind
5  82.214.228.77  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind

      d    e    f        g    h  i    j      _merge
0  open  NaN  NaN  syn-ack  NaN  3  NaN   left_only
1  open  NaN  NaN  syn-ack  NaN  3  NaN        both
2  open  NaN  NaN  syn-ack  NaN  3  NaN   left_only
3  open  NaN  NaN  syn-ack  NaN  3  NaN   left_only
4  open  NaN  NaN  syn-ack  NaN  3  NaN  right_only
5  open  NaN  NaN  syn-ack  NaN  3  NaN  right_only
#only values in A 
df1 = df1[df1['_merge']=='left_only'].drop('_merge',axis=1) 
print (df1) 
        hostname                         port  portname    a    b        c  \
0  82.214.228.71  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind
2  82.214.228.74  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind
3  82.214.228.75  dsl-radius-01.direcpceu.com       PTR  tcp  111  rpcbind

      d    e    f        g    h  i    j
0  open  NaN  NaN  syn-ack  NaN  3  NaN
2  open  NaN  NaN  syn-ack  NaN  3  NaN
3  open  NaN  NaN  syn-ack  NaN  3  NaN
#only values in B 
df1 = pd.merge(A, B, how='outer', indicator=True) 
df11 = df1[df1['_merge']=='right_only'].drop('_merge',axis=1) 
print (df11) 
        hostname                         port  portname    a    b        c  \
4  82.214.228.75  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind
5  82.214.228.77  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind

      d    e    f        g    h  i    j
4  open  NaN  NaN  syn-ack  NaN  3  NaN
5  open  NaN  NaN  syn-ack  NaN  3  NaN
#same values in both dataframes 
df12 = df1[df1['_merge']=='both'].drop('_merge',axis=1) 
print (df12) 
        hostname                         port  portname    a    b        c  \
1  82.214.228.70  dsl-radius-01.direcpceu.com       PTR  tcp  111  rpcbind

      d    e    f        g    h  i    j
1  open  NaN  NaN  syn-ack  NaN  3  NaN

But if you need to compare only the first column, hostname, use isin with ~ to invert the mask for boolean indexing:

df2 = A[~A['hostname'].isin(B['hostname'])] 
print (df2) 
        hostname                         port  portname    a    b        c  \
0  82.214.228.71  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind
2  82.214.228.74  dsl-radius-02.direcpceu.com       PTR  tcp  111  rpcbind

      d    e    f        g    h  i    j
0  open  NaN  NaN  syn-ack  NaN  3  NaN
2  open  NaN  NaN  syn-ack  NaN  3  NaN
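
Putting the pieces above together, here is a minimal sketch of the whole old-versus-new comparison as one function. The function name diff_csv, the COLS labels (the first five follow the asker's final output, the rest are placeholders) and the dtype=str / keep_default_na=False options are assumptions of this sketch, not part of the answer itself. Reading everything as text simply keeps empty fields as '' so that the merge compares plain strings.

```python
import pandas as pd

# Hypothetical column labels: the first five follow the asker's final output,
# the remaining names are placeholders to be replaced with the real ones.
COLS = ['host', 'hostname', 'hostname_type', 'protocol', 'port',
        'name', 'state'] + ['extra_%d' % i for i in range(1, 7)]


def diff_csv(old_path, new_path, out_path, cols=COLS, sep=';'):
    """Write the rows found only in old_path to out_path and return them."""
    # dtype=str and keep_default_na=False keep every field as plain text,
    # so empty fields stay '' instead of becoming NaN.
    opts = dict(sep=sep, header=None, names=cols,
                dtype=str, keep_default_na=False)
    old = pd.read_csv(old_path, **opts)
    new = pd.read_csv(new_path, **opts)
    # Outer merge on all shared columns; '_merge' records each row's origin.
    merged = pd.merge(old, new, how='outer', indicator=True)
    only_old = merged[merged['_merge'] == 'left_only'].drop('_merge', axis=1)
    # index=False leaves out the 0, 1, 2... column seen in the question.
    only_old.to_csv(out_path, sep=sep, index=False)
    return only_old
```

Called as diff_csv('old.csv', 'new.csv', 'DIFF_old.csv') (hypothetical file names), it writes the left_only rows with ';' as the separator and without the index column.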

Hey Jez. Thanks! I'll give it another try. –

Yes, sure. A small note: if the CSV files have a header row, remove the parameters header=None and names. – jezrael
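
As a small illustration of that note, and assuming a hypothetical file whose first line already holds the labels, read_csv picks the column names up on its own once header=None and names are left out:

```python
from io import StringIO

import pandas as pd

# Hypothetical file content whose first line already names the columns.
text = u"""host;hostname;hostname_type;protocol;port
82.214.228.71;dsl-radius-02.direcpceu.com;PTR;tcp;111
82.214.228.70;dsl-radius-01.direcpceu.com;PTR;tcp;111
"""

# No header=None and no names: pandas takes the labels from the first line.
df = pd.read_csv(StringIO(text), sep=';')
print(df.columns.tolist())
```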

Perfect, Jez! Worked like a charm! I only had to add sep=';' to the to_csv call, df.to_csv('DIFF_' + file, sep=';'), and I got what I wanted :) If you don't mind, host hostname hostname_type protocol port \ 24 82.214.228.70 dsl-radius-01.direcpceu.com PTR tcp 111 32 82.214.228.71 dsl-radius-02.direcpceu.com PTR tcp 111 –

1

You can add the labels where you define the dataframe. For example:

df = pd.DataFrame(final, columns=["host"].append([x for x in range(1, len(df) + 1)])) 
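
For comparison, here is a minimal sketch of labelling the columns at construction time when final still holds raw ';'-joined lines, as in the question: split each line first, then pass one label per field. The labels and the two sample sets are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical labels; replace them with the real nmap field names.
cols = ['host', 'hostname', 'hostname_type', 'protocol', 'port',
        'name', 'state'] + ['extra_%d' % i for i in range(1, 7)]

# Sample sets of raw lines, standing in for the A and B sets of the question.
old_lines = {
    "82.214.228.71;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;",
    "82.214.228.70;dsl-radius-01.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;",
}
new_lines = {
    "82.214.228.70;dsl-radius-01.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;",
}

# Same set difference as in the question, sorted for a stable row order.
final = sorted(old_lines - new_lines)

# Split each line on ';' so that every field lands in its own labelled column.
df = pd.DataFrame([line.split(';') for line in final], columns=cols)
print(df[['host', 'hostname', 'port']])
```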

Thanks, Amit! I'll give it a try. –

Thanks, Amit. This is good too! –

@IvanMadolev Thanks for the feedback. – Amit
