Loading CSV -> Postgres on AWS using odo

I'm trying to do something fairly simple, but either odo is broken or I don't understand how datashapes work in the context of this package.

The CSV file:

email,dob 
[email protected],1982-07-13 
[email protected],1997-01-01 
...

The code:

from odo import odo 
import pandas as pd 

df = pd.read_csv("...") 
connection_str = "postgresql+psycopg2:// ... " 

t = odo('path/to/data.csv', connection_str, dshape='var * {email: string, dob: datetime}')

The error:

AssertionError: datashape must be Record type, got 0 * {email: string, dob: datetime}

I get the same error if I try to go directly from a DataFrame -> Postgres as well:

t = odo(df, connection_str, dshape='var * {email: string, dob: datetime}')

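Other things that don't fix the problem: 1) removing the header line from the CSV file, 2) changing var to the actual number of rows in the DataFrame.

What am I doing wrong here?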

Source

2017-09-18 lollercoaster

Have you tried DataFrame.to_sql from pandas? It looks like you're trying to save a CSV to a postgres table? https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html – wkzhu

Yes, it's really slow. 'odo' is supposed to use the Postgres COPY internals to do it much faster: http://odo.pydata.org/en/latest/perf.html – lollercoaster

I'm not familiar with 'odo', but you can do the fast loading yourself: https://stackoverflow.com/questions/41875817/write-fast-pandas-dataframe-to-postgres/ – Michael
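For readers who want to try the do-it-yourself route mentioned in the comments, here is a minimal sketch of streaming a CSV file into Postgres with COPY through psycopg2's cursor.copy_expert. It assumes the target table already exists with matching columns; the helper names (quote_ident, build_copy_sql, copy_csv) and the connection string in the usage comment are hypothetical and not part of odo or the original thread.

```python
def quote_ident(name):
    """Double-quote a SQL identifier, escaping any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_copy_sql(table, columns):
    """Build a COPY ... FROM STDIN statement for a CSV file with a header row."""
    cols = ", ".join(quote_ident(c) for c in columns)
    return "COPY {} ({}) FROM STDIN WITH (FORMAT csv, HEADER true)".format(
        quote_ident(table), cols
    )


def copy_csv(conn, csv_file, table, columns):
    """Stream an open CSV file object into an existing table.

    `conn` is assumed to be a psycopg2 connection; the table is not created here.
    """
    sql = build_copy_sql(table, columns)
    with conn.cursor() as cur:
        # copy_expert reads from the file object and feeds it to COPY's STDIN.
        cur.copy_expert(sql, csv_file)
    conn.commit()


# Usage (placeholder connection string; requires psycopg2 and an existing table):
# import psycopg2
# conn = psycopg2.connect("dbname=your_database_name")
# with open("path/to/data.csv") as f:
#     copy_csv(conn, f, "data", ["email", "dob"])
```

Passing the file object straight to COPY avoids row-by-row INSERT statements, which is the usual reason DataFrame.to_sql feels slow on large inputs.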

Does your connection_str include a table name? When I ran into a similar problem using a sqlite database, adding one fixed it.

It should be something like:

connection_str = "postgresql+psycopg2://your_database_name::data"
t = odo(df, connection_str, dshape='var * {email: string, dob: datetime}')

where 'data' in 'connection_str' is the name of your new table.
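As a small illustration of the fix, the sketch below appends the ::table suffix to a SQLAlchemy-style URL when it is missing, so that odo receives a table target rather than a bare database. The helper name with_table and the credentials in the usage comment are hypothetical placeholders; the check is deliberately naive and only looks for "::" anywhere in the string.

```python
def with_table(sqlalchemy_url, table):
    """Return a URI of the form '<url>::<table>', leaving it unchanged if a
    '::' separator is already present (naive check, for illustration only)."""
    if "::" in sqlalchemy_url:
        return sqlalchemy_url
    return "{}::{}".format(sqlalchemy_url, table)


# Usage (placeholder URL; requires odo and a reachable database):
# from odo import odo
# uri = with_table("postgresql+psycopg2://user:password@host:5432/your_database_name", "data")
# t = odo(df, uri, dshape='var * {email: string, dob: datetime}')
```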

See also:

python odo sql AssertionError: datashape must be Record type, got 0 * {...}

https://github.com/blaze/odo/issues/580

Source

2017-10-03 23:36:52 mbyim
