2017-12-23

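Pandas - check whether a column label occurs in the value of another column and update the column

I have a long glossary and want to check whether a passage contains any of its terms. For example: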

>>> import pandas as pd
>>> glossary = ['phrase 1', 'phrase 2', 'phrase 3']
>>> glossary
['phrase 1', 'phrase 2', 'phrase 3']

>>> df = pd.DataFrame(['This is a phrase 1 and phrase 2', 'phrase 1',
...                    'phrase 3', 'phrase 1 & phrase 2. phrase 3 as well'],
...                   columns=['text'])
>>> df
                                    text
0        This is a phrase 1 and phrase 2
1                               phrase 1
2                               phrase 3
3  phrase 1 & phrase 2. phrase 3 as well


With the glossary terms added as empty columns, the DataFrame looks like this:

                                    text  phrase 1  phrase 2  phrase 3
0        This is a phrase 1 and phrase 2       NaN       NaN       NaN
1                               phrase 1       NaN       NaN       NaN
2                               phrase 3       NaN       NaN       NaN
3  phrase 1 & phrase 2. phrase 3 as well       NaN       NaN       NaN


The result I want marks each term with 1 if the text contains it and 0 otherwise:

                                    text  phrase 1  phrase 2  phrase 3
0        This is a phrase 1 and phrase 2         1         1         0
1                               phrase 1         1         0         0
2                               phrase 3         0         0         1
3  phrase 1 & phrase 2. phrase 3 as well         1         1         1


You can use a list comprehension with str.contains, combine the results with concat, and cast to int to get a DataFrame of 0s and 1s:



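# One boolean Series per glossary term: True where the text contains it
L = [df['text'].str.contains(x) for x in glossary]
# Put the Series side by side, name the columns after the terms, cast to 0/1
df1 = pd.concat(L, axis=1, keys=glossary).astype(int)
print(df1)
   phrase 1  phrase 2  phrase 3
0         1         1         0
1         1         0         0
2         0         0         1
3         1         1         1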
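One caveat with the step above: str.contains interprets its pattern as a regular expression by default, so glossary terms containing characters such as `$`, `(` or `.` may not match literally. A minimal sketch of literal matching with `regex=False` follows; the sample text and the two terms are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical data: terms that contain regex metacharacters
df = pd.DataFrame({'text': ['price is $5 (approx.)', 'no match here']})
glossary = ['$5', '(approx.)']

# regex=False makes str.contains search for the literal substring
flags = pd.concat(
    [df['text'].str.contains(term, regex=False) for term in glossary],
    axis=1, keys=glossary,
).astype(int)
print(flags)
```

For the plain phrases in the question the default behaves the same, so this only matters if the real glossary contains punctuation.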


Finally, join the result back to the original DataFrame:

df = df.join(df1)
print(df)
                                    text  phrase 1  phrase 2  phrase 3
0        This is a phrase 1 and phrase 2         1         1         0
1                               phrase 1         1         0         0
2                               phrase 3         0         0         1
3  phrase 1 & phrase 2. phrase 3 as well         1         1         1
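The join above assumes df still has only the text column. If the NaN placeholder columns from the question have already been added, join would complain about overlapping column names, so one option is to overwrite those columns in place. The sketch below is one way to do that; the helper name `fill_glossary_flags` and its `text_col` parameter are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd

glossary = ['phrase 1', 'phrase 2', 'phrase 3']
df = pd.DataFrame({'text': ['This is a phrase 1 and phrase 2', 'phrase 1',
                            'phrase 3', 'phrase 1 & phrase 2. phrase 3 as well']})

# Placeholder columns, as in the question's intermediate table
for term in glossary:
    df[term] = np.nan

def fill_glossary_flags(frame, terms, text_col='text'):
    """Hypothetical helper: return a copy with one 0/1 column per term."""
    out = frame.copy()
    for term in terms:
        # Overwrite (or create) the column named after the term
        out[term] = out[text_col].str.contains(term, regex=False).astype(int)
    return out

result = fill_glossary_flags(df, glossary)
print(result)
```

The loop is less compact than the concat version, but it works whether or not the term columns already exist.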