Python Pandas append dataframe

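I have a case where I add a UUID column to .csv files. At the same time, I check the source files and compare them with the already processed files: if a source file has additional rows, I want to append those new rows to the destination file. The reason I need to append rather than overwrite the file is that the UUIDs of previously processed rows must stay the same.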

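To append rows, I check whether the source and destination files have the same number of rows. If they do not, I create a new dataframe holding the data (from the source file) that starts at the row number equal to the destination file's row count.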

At that point I try to append the newly created dataframe to the destination dataframe, but it keeps failing. I receive the following error:

> RuntimeWarning: '<' not supported between instances of 'int' and 
> 'str', sort order is undefined for incomparable objects result = 
> result.union(other)

The code is as follows:

import os, uuid
import pandas as pd


def process_files():
    source_dir = "C:\\Projects\\test\\raw"
    destination_dir = "C:\\Projects\\test\\processed"

    for file_name in os.listdir(source_dir):
        if file_name.endswith((".csv", ".new")):
            df_source = pd.read_csv(source_dir + "/" + file_name, sep=";")

            if os.path.isfile(destination_dir + "/" + file_name):
                df_destination = pd.read_csv(destination_dir + "/" + file_name, sep=",", header=None)

                if df_source.shape[0] != (df_destination.shape[0]):
                    df_newlines = pd.read_csv(source_dir + "/" + file_name, sep=";", skiprows=df_destination.shape[0], header=None)
                    df_newlines.insert(0, "uu_id", pd.Series([uuid.uuid4() for i in range(len(df_newlines))]))
                    df_destination.append(df_newlines, ignore_index=True)
                    df_destination.to_csv(destination_dir + "/" + file_name, sep=",", header=False, mode="w", index=False)
                else:
                    continue
            else:
                df_source.insert(0,"uu_id", pd.Series([uuid.uuid4() for i in range(len(df_source))]))
                df_source.to_csv(destination_dir + "/" + file_name, sep=",", header=False, mode="w", index=False)
        else:
            continue


process_files()

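I have checked the dtypes of both dataframes, and they match column by column. I also forced the column names to be the same strings, but that does not do the trick. I do not know what I am doing wrong with append (the script runs without problems if I comment out the append line).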


2017-11-30 Bostjan

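Hello Bostjan, and welcome.

Disclaimer: due to a lack of reputation points, I am not allowed to comment.

append does not operate in place; it returns a new DataFrame. Therefore I would say

df_destination = df_destination.append(df_newlines, ignore_index=True)

is what you are looking for.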
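The point about assigning the result back can be illustrated with a small sketch. It assumes a recent pandas version, where DataFrame.append is no longer available (it was deprecated in 1.4 and removed in 2.0) and pd.concat is the usual replacement. The two small frames are hypothetical stand-ins for df_destination and df_newlines, not the asker's data.

```python
import pandas as pd

# Hypothetical stand-ins for df_destination and df_newlines.
df_destination = pd.DataFrame({"uu_id": ["a", "b"], "value": [1, 2]})
df_newlines = pd.DataFrame({"uu_id": ["c"], "value": [3]})

# pd.concat returns a new DataFrame and leaves its inputs unchanged,
# so the result has to be assigned to a name to be kept.
combined = pd.concat([df_destination, df_newlines], ignore_index=True)

print(len(df_destination))  # 2, the original frame is untouched
print(len(combined))        # 3
```

On older pandas versions, the same pattern applies to df_destination.append(...): the returned frame has to be assigned, otherwise it is discarded.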

Apart from that, I would recommend using os.walk and fnmatch to browse the files.
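One way the os.walk and fnmatch suggestion might look is sketched below. The helper name find_files and its default patterns are assumptions made for illustration (the patterns mirror the ".csv" and ".new" suffixes in the question). Note that os.walk also descends into subdirectories, which os.listdir does not.

```python
import fnmatch
import os


def find_files(root_dir, patterns=("*.csv", "*.new")):
    """Yield paths of files under root_dir whose names match any pattern.

    find_files is a hypothetical helper, not part of the original script.
    """
    for dir_path, _dir_names, file_names in os.walk(root_dir):
        for file_name in file_names:
            # fnmatch compares the bare file name against shell-style patterns.
            if any(fnmatch.fnmatch(file_name, pattern) for pattern in patterns):
                yield os.path.join(dir_path, file_name)
```

Calling list(find_files(source_dir)) would then give full paths, so no manual joining of directory and file name is needed.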


2017-12-14 15:50:55 Eulenfuchswiesel

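Hello! Thank you very much for your help. In the meantime I worked out a workaround (in case anyone finds it useful). Instead of using append(), I created a new dataframe containing the missing rows and used .to_csv() with the mode parameter set to "a". Best, Bostjan – Bostjan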
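The workaround described in this comment could look roughly like the following sketch. The function name append_rows is an assumption for illustration, and the UUIDs are stored as strings here, which is a small departure from the question's code.

```python
import uuid

import pandas as pd


def append_rows(df_new, destination_path):
    """Append df_new to an existing header-less CSV, adding a uu_id column.

    append_rows is a hypothetical helper sketching the workaround.
    """
    df_out = df_new.copy()
    df_out.insert(0, "uu_id", [str(uuid.uuid4()) for _ in range(len(df_out))])
    # mode="a" appends to the file instead of overwriting it, so rows that
    # were written earlier keep their UUIDs; header=False keeps the column
    # names from being written as a data row.
    df_out.to_csv(destination_path, sep=",", header=False, mode="a", index=False)
```

Because only the new rows are written, the destination dataframe does not need to be rebuilt at all, which avoids the append step entirely.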
