2つのファイル(最初のファイルから2番目のファイルを検索して2番目のファイル全体を検索)を比較し、fileA.txtからfileB.txtの終わりまでの行を書き留めます。2つのファイルを比較してPythonの違いを比較します
import difflib
file1 = "fileA.txt"
file2 = "fileB.txt"
diff = difflib.ndiff(open(file1).readlines(),open(file2).readlines())
print ''.join(diff),
が、結果には、私は、各ラインに適したタグを持つ2つのファイルの組み合わせを持っている:最初の時点で私はこのような簡単なプログラムアボと思ったので、私のpythonに新しいです。私はタグ " - "で行開始を探してからfileB.txtファイルの最後に書き込むことができますが、膨大なファイル(〜100 MB)でこのメソッドは非効率的です。誰かがプログラムを改善するために私を助けることができますか?
ファイルの構造は次のようになります。
入力:
fileA.txt
Oct 9 13:25:31 user sshd[12844]: Accepted password for root from 213.XXX.XXX.XX7 port 33254 ssh2
Oct 9 13:25:31 user sshd[12844]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct 9 13:35:48 user sshd[12868]: Accepted password for root from 213.XXX.XXX.XX7 port 33574 ssh2
Oct 9 13:35:48 user sshd[12868]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct 9 13:46:58 user sshd[12844]: Received disconnect from 213.XXX.XXX.XX7: 11: disconnected by user
Oct 9 13:46:58 user sshd[12844]: pam_unix(sshd:session): session closed for user root
Oct 9 15:47:58 user sshd[12868]: pam_unix(sshd:session): session closed for user root
Oct 11 22:17:31 user sshd[2655]: Accepted password for root from 17X.XXX.XXX.X19 port 5567 ssh2
Oct 11 22:17:31 user sshd[2655]: pam_unix(sshd:session): session opened for user root by (uid=0)
fileB.txt
Oct 9 12:19:16 user sshd[12744]: Accepted password for root from 213.XXX.XXX.XX7 port 60554 ssh2
Oct 9 12:19:16 user sshd[12744]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct 9 13:24:42 user sshd[12744]: Received disconnect from 213.XXX.XXX.XX7: 11: disconnected by user
Oct 9 13:24:42 user sshd[12744]: pam_unix(sshd:session): session closed for user root
Oct 9 13:25:31 user sshd[12844]: Accepted password for root from 213.XXX.XXX.XX7 port 33254 ssh2
Oct 9 13:25:31 user sshd[12844]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct 9 13:35:48 user sshd[12868]: Accepted password for root from 213.XXX.XXX.XX7 port 33574 ssh2
Oct 9 13:35:48 user sshd[12868]: pam_unix(sshd:session): session opened for user root by (uid=0)
出力:
FILEB
Oct 9 12:19:16 user sshd[12744]: Accepted password for root from 213.XXX.XXX.XX7 port 60554 ssh2
Oct 9 12:19:16 user sshd[12744]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct 9 13:24:42 user sshd[12744]: Received disconnect from 213.XXX.XXX.XX7: 11: disconnected by user
Oct 9 13:24:42 user sshd[12744]: pam_unix(sshd:session): session closed for user root
Oct 9 13:25:31 user sshd[12844]: Accepted password for root from 213.XXX.XXX.XX7 port 33254 ssh2
Oct 9 13:25:31 user sshd[12844]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct 9 13:35:48 user sshd[12868]: Accepted password for root from 213.XXX.XXX.XX7 port 33574 ssh2
Oct 9 13:35:48 user sshd[12868]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct 9 13:46:58 user sshd[12844]: Received disconnect from 213.XXX.XXX.XX7: 11: disconnected by user
Oct 9 13:46:58 user sshd[12844]: pam_unix(sshd:session): session closed for user root
Oct 9 15:47:58 user sshd[12868]: pam_unix(sshd:session): session closed for user root
Oct 11 22:17:31 user sshd[2655]: Accepted password for root from 17X.XXX.XXX.X19 port 5567 ssh2
Oct 11 22:17:31 user sshd[2655]: pam_unix(sshd:session): session opened for user root by (uid=0)
だから、基本的には、マージしたいです2つのテキストファイルが重複を維持しない? – MooingRawr