Best way to transform a pandas DataFrame column

I'm new to pandas and have a question about the best way to perform this data transformation. The approach below works, but I feel it could be done more cleanly and efficiently.

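I have office information that can take any of these forms:

"<building>/<office>"
"<building>"
<building number> (int)
'' (empty string)
None

I would like to convert this into Building and Office columns.

Given the data: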

import pandas

df = pandas.DataFrame({"Office": ["Building Foo/10", "Building Only", None, 100, ""]})
df

            Office
0  Building Foo/10
1    Building Only
2             None
3              100
4

I can convert it via:

items = [ (str(row["Office"]) or '').rsplit('/', 1) for _, row in df.iterrows() ] 
items = [ item if len(item) == 2 else (item[0] or None, None) for item in items ] 

df["Building"], df["Office"] = zip(*items) 
df 

  Office       Building
0     10   Building Foo
1   None  Building Only
2   None           None
3   None            100
4   None           None

What is the best way to do this with pandas?

Thanks!

Source

2017-05-03 David Brownell

IMO your DataFrame is wrong. Each office should have its own row in the frame. Do you know what a Series is? A DataFrame is a table of _n_ Series. – Elmex80s

There is probably no single best way to do this, but here is one that is good enough:

import pandas as pd

# Empty/None -> (None, None); int -> office only; otherwise split on "/".
pd.DataFrame([(None, None) if not o else
              (None, o) if isinstance(o, int) else
              tuple(o.split("/")) for o in df.Office],
             columns=("Building", "Office"))
#         Building Office
# 0   Building Foo     10
# 1  Building Only   None
# 2           None   None
# 3           None    100
# 4           None   None
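
Note that the output above puts the integer 100 in Office, whereas the expected output in the question puts a bare building number in Building. The following is a minimal sketch that follows the question's convention instead. The helper name `split_office` is hypothetical (it is not part of pandas), and `rsplit("/", 1)` is borrowed from the question's own code.

```python
import pandas as pd


def split_office(value):
    """Return a (building, office) pair for one raw Office value.

    Hypothetical helper: assumes, as the question does, that a bare
    integer is a building number.
    """
    if value is None or value == "":
        return (None, None)           # missing or empty: nothing known
    if isinstance(value, int):
        return (value, None)          # bare building number
    parts = value.rsplit("/", 1)      # split on the last "/" only
    if len(parts) == 2:
        return (parts[0], parts[1])
    return (parts[0], None)           # building without an office


df = pd.DataFrame({"Office": ["Building Foo/10", "Building Only", None, 100, ""]})
pairs = [split_office(v) for v in df["Office"]]

# Passing index=df.index keeps the original row labels.
result = pd.DataFrame(pairs, columns=["Building", "Office"], index=df.index)
print(result)
```

Every pair has exactly two items, so the DataFrame constructor does not need to pad short rows.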

You can also get the same result as the pd.DataFrame(...) construction above with apply. The apply version preserves the row index.

df['Office'].apply(lambda x:
                   pd.Series((None, None) if not x else
                             (None, x) if isinstance(x, int) else
                             tuple(x.split("/"))))
#                0     1
# 0   Building Foo    10
# 1  Building Only   NaN
# 2           None  None
# 3           None   100
# 4           None  None

(Don't forget to rename the columns.)
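
As a rough sketch of the renaming step: a frame built without column names gets the positional labels 0 and 1, and those can be replaced afterwards. To keep the example simple, a list comprehension with an explicit `index=df.index` stands in for apply here. The helper `to_pair` and the index labels "a" to "e" are illustrative assumptions, not part of the answer.

```python
import pandas as pd

# Non-default index labels make it visible that the row index is kept.
df = pd.DataFrame(
    {"Office": ["Building Foo/10", "Building Only", None, 100, ""]},
    index=list("abcde"),
)


def to_pair(x):
    # Same rules as the answer's expression, padded to exactly two items.
    if not x:
        return (None, None)
    if isinstance(x, int):
        return (None, x)
    building, _, office = x.partition("/")
    return (building, office or None)


# No column names given, so the columns are labelled 0 and 1.
parts = pd.DataFrame([to_pair(x) for x in df["Office"]], index=df.index)

# Rename the positional columns.
parts.columns = ["Building", "Office"]
print(parts)
```

Assigning to `parts.columns` renames in place; `parts.rename(columns={0: "Building", 1: "Office"})` would return a renamed copy instead.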

Source

2017-05-03 19:26:21 DyZ

I would do it this way:

In [99]: df.Office = df.Office.astype(str)

# regex=True is needed on pandas >= 2.0, where str.replace matches literally by default
In [100]: df[['Building','Office']] = \
       df.Office.str.replace(r'(\d+)', r'/\1', regex=True).str.split(r'\/+', expand=True)

In [101]: df
Out[101]:
  Office       Building
0     10   Building Foo
1   None  Building Only
2   None           None
3    100
4   None
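
To make the regex approach easier to follow, here is a sketch that runs the same steps one at a time. The intermediate names `as_text`, `marked` and `parts` are illustrative only. It shows why a bare integer such as 100 ends up in Office with an empty-string Building, as in the output above.

```python
import pandas as pd

df = pd.DataFrame({"Office": ["Building Foo/10", "Building Only", None, 100, ""]})

# Step 1: convert every value to text, so 100 becomes "100".
as_text = df["Office"].astype(str)

# Step 2: insert a "/" before every run of digits, e.g.
# "Building Foo/10" -> "Building Foo//10" and "100" -> "/100".
marked = as_text.str.replace(r"(\d+)", r"/\1", regex=True)

# Step 3: split on one or more "/" and expand the pieces into columns 0 and 1.
parts = marked.str.split(r"/+", expand=True)
parts.columns = ["Building", "Office"]
print(parts)
```

Because step 2 puts a slash in front of any digits, a building name that itself contains digits would also be split at that point, so this approach assumes building names are digit-free.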

Source

2017-05-03 19:48:40 MaxU
