Subtract and combine two dataframes, one with more columns than the other

OK, I know the title might be a bit confusing, but I'll try to explain this in detail:

I'm using Python 3.5.2.

I have two .csv files that I read in via pandas and convert into two separate dataframes. The first dataframe (coming from XYZ.csv) looks like this:

ip    community 
10.0.0.1  OL123 
. 
. 
. 
123.12.5.31 IK753

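The second one (export.csv) simply has an "ip" column.

Now, what I want to do:

I want to compare the two dataframes and, as a result, get a third dataframe (or list) that contains all the IP addresses that are in the first dataframe but not in the other, together with their associated communities. So far I have been able to compare the two and get the right result, as long as both dataframes contain the communities. I created that second export.csv by hand, though, and unfortunately I can't automate this, so I have to work without a second dataframe that contains the communities. All I need now is something that compares the two lists I created and keeps each "ip" together with the "community" it is assigned to.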

import ipaddress
import time

import pandas as pd
# Assumption: OrderedSet comes from the third-party ordered-set package;
# the original post does not show its imports.
from ordered_set import OrderedSet


def compare_csvs():
    timestamp = time.strftime("%Y-%m-%d")

    # Reads XYZ.csv and creates a list that contains all ip addresses in integer format.
    A = pd.read_csv("XYZ.csv", index_col=False, header=0)
    ips1 = A.ip.tolist()
    comu1 = A.ro_community.tolist()
    AIP = []
    for element1 in ips1:
        AIP.append(int(ipaddress.IPv4Address(element1)))
    IPACOM1 = zip(AIP, comu1)

    # Reads export.csv and creates a list that contains all ip addresses in integer format.
    B = pd.read_csv("export" + timestamp + ".csv", index_col=False, header=0)
    ips2 = B.ip.tolist()
    comu2 = B.ro_community.tolist()
    BIP = []
    for element2 in ips2:
        BIP.append(int(ipaddress.IPv4Address(element2)))
    IPACOM2 = zip(BIP, comu2)

    # Creates a set of all (ip, community) pairs (ip in integer format) that exist
    # inside XYZ.csv but not inside export.csv.
    DeltaInt = OrderedSet(IPACOM1) - OrderedSet(IPACOM2)
    List = list(DeltaInt)
    UnzippedIP, UnzippedCommunity = zip(*List)

    # Changes the integers of the DeltaInt set back to readable IPv4 addresses.
    DeltaIP = []
    for element3 in UnzippedIP:
        DeltaIP.append(str(ipaddress.IPv4Address(element3)))

    IPandCommunity = zip(DeltaIP, UnzippedCommunity)

That is my code so far. I have tried a lot of things, but I can't get anything to work. Maybe I just have a problem with the logic here. Thanks for any help!

Also, sorry for the messy code. Once I actually get it working, I'll put everything together and clean it up.
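One possible reading of the logic problem described above: the set difference compares whole (ip, community) pairs, so it cannot match anything once the second file has no community column. A minimal sketch that compares on the integer IP only and carries the community along might look like this. The inline lists stand in for the two CSV files and are assumptions for illustration, not the real data.

```python
import ipaddress

# Hypothetical stand-ins for the rows of XYZ.csv and the ip column of export.csv.
xyz_rows = [
    ("10.0.0.1", "OL123"),
    ("10.1.1.1", "ACLSH"),
    ("10.9.8.7", "OKUAJ1"),
    ("123.12.5.31", "IK753"),
]
export_ips = ["10.1.1.1", "123.12.5.31", "0.0.0.0"]


def to_int(ip):
    """Convert a dotted IPv4 string to its integer form."""
    return int(ipaddress.IPv4Address(ip))


# Build the lookup set from the IPs alone; no community is needed on this side.
export_int = {to_int(ip) for ip in export_ips}

# Keep every (ip, community) pair whose IP is missing from the export,
# preserving the order of the first file.
delta = [(ip, community) for ip, community in xyz_rows
         if to_int(ip) not in export_int]

print(delta)
```

Filtering the pairs against a set of keys, rather than subtracting two sets of pairs, is what lets the second input have fewer columns than the first.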

Source

2016-11-14 JaWi

Here is some dummy data to play with:

This is df:

ip    community 
10.0.0.1  OL123 
10.1.1.1  ACLSH 
10.9.8.7  OKUAJ1 
123.12.5.31  IK753 

df = pd.read_clipboard()

This is export.csv:

s_export = pd.Series(name='ip', data=['10.1.1.1', '123.12.5.31', '0.0.0.0'])

s_export 

0  10.1.1.1 
1 123.12.5.31 
2  0.0.0.0 
Name: ip, dtype: object

To select the rows whose ip is not in the export, use isin():

# ~ means 'not', so here that's "find df.ip that is NOT in s_export" 
# Store result in a dataframe 
df_exclude = df[~df.ip.isin(s_export)] 


df_exclude 
     ip community 
0 10.0.0.1  OL123 
2 10.9.8.7 OKUAJ1

This just uses boolean indexing with isin().
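As an alternative to isin(), the same "in the first but not in the second" selection can be written as a left merge with indicator=True, keeping only the rows marked left_only. This is a sketch built on the dummy data above, not part of the original answer; the frame is constructed inline here instead of via read_clipboard() so that it runs on its own.

```python
import pandas as pd

# Same dummy data as above, built inline rather than read from the clipboard.
df = pd.DataFrame({
    "ip": ["10.0.0.1", "10.1.1.1", "10.9.8.7", "123.12.5.31"],
    "community": ["OL123", "ACLSH", "OKUAJ1", "IK753"],
})
s_export = pd.Series(name="ip", data=["10.1.1.1", "123.12.5.31", "0.0.0.0"])

# A left merge keeps every row of df; the _merge column records whether
# a matching ip was found in the export ("both") or not ("left_only").
merged = df.merge(s_export.to_frame(), on="ip", how="left", indicator=True)

# Keep only the rows that had no match, and drop the helper column.
df_exclude = merged.loc[merged["_merge"] == "left_only", ["ip", "community"]]

print(df_exclude)
```

Note that a merge would repeat rows of df if the export contained duplicate IPs, so calling drop_duplicates() on the export first may be needed; isin() does not have that issue.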

Source

2016-11-14 10:40:43

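The isin() concept looks useful, but unfortunately I don't get the correct result. The result simply contains the entire dataframe "df"; the addresses from export.csv are not excluded. – JaWi

Did you try running the concrete example I gave you? I am 100% sure it works. –

Ah, I found the problem. As soon as I convert the "B" dataframe into a Series with the parameter "squeeze=True", it works. As output I got a dataframe containing the relevant IP addresses and their communities. That is a big step in the right direction, thank you! – JaWi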
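A note for readers on current pandas: the squeeze=True argument of read_csv mentioned in the last comment was deprecated in pandas 1.4 and removed in 2.0. The underlying point still holds: isin() should be given the column of addresses (a Series or list), not the whole one-column dataframe. The likely reason the dataframe version excluded nothing is that iterating a dataframe yields its column labels, so the comparison ran against the label "ip" rather than the addresses. A minimal sketch follows, with inline CSV text standing in for XYZ.csv and export.csv (an assumption for illustration):

```python
import io

import pandas as pd

# Hypothetical inline stand-ins for XYZ.csv and export.csv.
xyz_csv = io.StringIO(
    "ip,community\n"
    "10.0.0.1,OL123\n"
    "10.1.1.1,ACLSH\n"
    "10.9.8.7,OKUAJ1\n"
    "123.12.5.31,IK753\n"
)
export_csv = io.StringIO("ip\n10.1.1.1\n123.12.5.31\n0.0.0.0\n")

A = pd.read_csv(xyz_csv)
B = pd.read_csv(export_csv)  # a DataFrame with a single "ip" column

# Select the column explicitly so isin() receives a Series of addresses.
# B.squeeze("columns") would give the same Series for a one-column frame.
export_ips = B["ip"]

missing = A[~A["ip"].isin(export_ips)]

print(missing)
```

Selecting the column by name also keeps working if export.csv later gains extra columns, which squeeze would not.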
