Collapse a column and insert new rows?
My data:
df
Out[79]:
    INC Theme Theme_Hat TRAIN_TEST
0   123     A       NaN      TRAIN
1   124     A       NaN      TRAIN
2   125     A       NaN      TRAIN
3   126     A       NaN      TRAIN
4   127     A       NaN      TRAIN
5   128     A       NaN      TRAIN
6   129     A       NaN      TRAIN
7   130     A       NaN      TRAIN
8   131     B       NaN      TRAIN
9   132     B         B       TEST
10  133     B         A       TEST
11  134     B         A       TEST
12  135     B         A       TEST
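I'm trying to collapse the Theme_Hat column into the Theme column while preserving the TRAIN_TEST indicator. I used the for loop below, but my gut tells me there should be a more pandas-esque solution. The attempt below also doesn't reach my desired output: instead of the TRAIN information being preserved, the TEST rows end up duplicated consecutively in df. Here is my desired output: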
Out[81]:
    INC Theme TRAIN_TEST
0   123     A      TRAIN
1   124     A      TRAIN
2   125     A      TRAIN
3   126     A      TRAIN
4   127     A      TRAIN
5   128     A      TRAIN
6   129     A      TRAIN
7   130     A      TRAIN
8   131     B      TRAIN
9   132     B      TRAIN
10  132     B       TEST
11  133     B      TRAIN
12  133     A       TEST
13  134     B      TRAIN
14  134     A       TEST
15  135     B      TRAIN
16  135     A       TEST
Here is what I've done so far:
import pandas as pd

# copy so we can reference the original dataframe as rows are inserted into df
df2 = df.copy(deep=True)
no_nulls = df2[df2['Theme_Hat'].notnull()]
# get rid of the Theme_Hat column for final dataframe (since we're migrating that info into Theme)
df.drop('Theme_Hat', inplace=True, axis=1)
# I'm sure there's some pandas built-in functionality that
# can handle this better than a for loop
for idx in no_nulls.index:
    # reference the unchanged df2 for INC, Theme_Hat, and TRAIN_TEST info
    new_row = pd.DataFrame({"INC": df2.loc[idx, 'INC'],
                            "Theme": df2.loc[idx, 'Theme_Hat'],
                            "TRAIN_TEST": df2.loc[idx, 'TRAIN_TEST']}, index=[idx + 1])
    print(new_row, '\n\n')
    # insert the new row right after the row at the current index
    # (.loc replaces the removed .ix indexer; label slices are inclusive)
    df = pd.concat([df.loc[:idx], new_row, df.loc[idx + 1:]]).reset_index(drop=True)
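For what it's worth, here is a rough sketch of what a loop-free version might look like. It rests on one assumption read off the desired output: every original row keeps its Theme and is labelled TRAIN, and each row with a non-null Theme_Hat contributes one extra row that carries Theme_Hat as its Theme and keeps the original TRAIN_TEST value. The names `base`, `hat_rows` and `result` are made up for illustration. Treat it as one possible approach, not a definitive answer.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "INC": range(123, 136),
    "Theme": ["A"] * 8 + ["B"] * 5,
    "Theme_Hat": [np.nan] * 9 + ["B", "A", "A", "A"],
    "TRAIN_TEST": ["TRAIN"] * 9 + ["TEST"] * 4,
})

# Assumption: all original rows are labelled TRAIN in the final frame.
base = df[["INC", "Theme"]].assign(TRAIN_TEST="TRAIN")

# Rows with a Theme_Hat value become extra rows, with Theme_Hat renamed to Theme.
hat_rows = (
    df.loc[df["Theme_Hat"].notna(), ["INC", "Theme_Hat", "TRAIN_TEST"]]
      .rename(columns={"Theme_Hat": "Theme"})
)

# Stack both pieces; a stable sort on the original index puts each extra row
# directly after the row it came from, then the index is renumbered.
result = (
    pd.concat([base, hat_rows])
      .sort_index(kind="mergesort")
      .reset_index(drop=True)
)

print(result)
```

The stable sort matters here: both pieces share index labels 9 through 12, and `kind="mergesort"` keeps the `base` row ahead of its `hat_rows` partner because `base` comes first in the `concat`.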