Python: Count the frequency of words in a column and store the results in another column of the dataframe

I want to count how many times each word appears in each row of one column and store the result in a new column ('word') of my dataframe, headlamp. I am trying the code below, but I get an error.

for i in range(0,len(headlamp)): 
    headlamp['word'].apply(lambda text: Counter(" ".join(headlamp['Comment'][i].astype(str)).split(" ")).items()) 
--------------------------------------------------------------------------- 
KeyError         Traceback (most recent call last) 
<ipython-input-16-a0c20291b4f5> in <module>() 
    1 for i in range(0,len(headlamp)): 
    ----> 2  headlamp['word'].apply(lambda text: Counter(" ".join(headlamp['Comment'][i].astype(str)).split(" ")).items()) 

    C:\Users\Rafael\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key) 
    1995    return self._getitem_multilevel(key) 
    1996   else: 
    -> 1997    return self._getitem_column(key) 
    1998 
    1999  def _getitem_column(self, key): 

    C:\Users\Rafael\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\frame.pyc in _getitem_column(self, key) 
    2002   # get column 
    2003   if self.columns.is_unique: 
    -> 2004    return self._get_item_cache(key) 
    2005 
    2006   # duplicate columns & possible reduce dimensionality 

    C:\Users\Rafael\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\generic.pyc in _get_item_cache(self, item) 
    1348   res = cache.get(item) 
    1349   if res is None: 
    -> 1350    values = self._data.get(item) 
    1351    res = self._box_item_values(item, values) 
    1352    cache[item] = res 

    C:\Users\Rafael\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\internals.pyc in get(self, item, fastpath) 
    3288 
    3289    if not isnull(item): 
    -> 3290     loc = self.items.get_loc(item) 
    3291    else: 
    3292     indexer = np.arange(len(self.items))[isnull(self.items)] 

    C:\Users\Rafael\Anaconda2\envs\gl-env\lib\site-packages\pandas\indexes\base.pyc in get_loc(self, key, method, tolerance) 
    1945     return self._engine.get_loc(key) 
    1946    except KeyError: 
    -> 1947     return self._engine.get_loc(self._maybe_cast_indexer(key)) 
    1948 
    1949   indexer = self.get_indexer([key], method=method, tolerance=tolerance) 

    pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4154)() 

    pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4018)() 

    pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12368)() 

    pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12322)() 

    KeyError: 'word'

Any help would be greatly appreciated.

Source

2016-10-15 Rafael Rodrigues Santos

Hi, what is the expected format of the column that stores the frequency of each word? A 'dict', or one column per word? – Romain

Can you post the head of the dataframe? You are getting 'KeyError: 'word'' because the code tries to look up 'headlamp['word']', a column that does not exist yet. –
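A minimal sketch of what this comment describes, assuming the dataframe only has a 'Comment' column (the sample rows are made up, since the real headlamp data was not posted): reading a column that does not exist raises KeyError, while assigning to that name creates the column.

```python
import pandas as pd

# Hypothetical sample data; the real headlamp dataframe was not posted.
headlamp = pd.DataFrame({'Comment': ['lamp broken', 'lamp flickers']})

# Reading a column that does not exist raises KeyError.
try:
    headlamp['word']
    raised = False
except KeyError:
    raised = True

# Assigning to the same name creates the column instead.
headlamp['word'] = headlamp['Comment'].str.split()
```

This is probably why the loop in the question fails before any counting happens: `headlamp['word'].apply(...)` reads the column first.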

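Thanks for your reply, @rfw. I want to put the count of every word in the 'Comment' column into a new column, 'word', so this new column 'word' would be created. I want to know how many times particular words appear in each comment in order to identify problems related to headlamps (a car part). Here is the dataframe –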
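If the goal is to track a handful of particular words, one possible approach is a column per keyword. The sketch below is only an illustration: the comments, the `keywords` list, and the `n_` column prefix are all assumptions, not taken from the poster's data.

```python
import pandas as pd
from collections import Counter

# Hypothetical comments and keywords, for illustration only.
headlamp = pd.DataFrame({'Comment': ['lamp broken and lamp dim',
                                     'water in lamp',
                                     'bulb broken']})
keywords = ['lamp', 'broken']

# One Counter per comment; lowercasing makes the match case-insensitive.
counts = headlamp['Comment'].apply(lambda text: Counter(str(text).lower().split()))

# A Counter returns 0 for words it has not seen, so absent keywords need no special case.
for kw in keywords:
    headlamp['n_' + kw] = counts.apply(lambda c: c[kw])
```

With one numeric column per keyword, the counts can be summed or filtered directly, which is harder when the frequencies are stored as a string or a dict.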

You can try this:

headlamp['word'] = headlamp['Comment'].apply(lambda x: len(x.split()))

Example:

headlamp = pd.DataFrame({'Comment': ['hello world','world','foo','foo and bar']}) 
print(headlamp) 
       Comment
0  hello world
1        world
2          foo
3  foo and bar

headlamp['word'] = headlamp['Comment'].apply(lambda x: len(x.split())) 
print(headlamp) 
       Comment  word
0  hello world     2
1        world     1
2          foo     1
3  foo and bar     3
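Note that `len(x.split())` stores the total number of words in each comment, not the frequency of each word. If per-word frequencies are wanted, a small variation (a sketch, not the answerer's code) is to store a dict built from `collections.Counter` in each row. The last sample row is added here as an assumption, to show a repeated word.

```python
import pandas as pd
from collections import Counter

# Same sample data as above, plus one hypothetical row with a repeated word.
headlamp = pd.DataFrame({'Comment': ['hello world', 'world', 'foo',
                                     'foo and bar', 'foo foo bar']})

# Each cell of 'word' holds a dict mapping word -> count for that comment.
headlamp['word'] = headlamp['Comment'].apply(lambda x: dict(Counter(x.split())))
```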

Source

2016-10-15 14:06:32

You can achieve what you want using the most_common() method.

Feel free to use this piece of code:

import pandas as pd 
from collections import Counter 

df = pd.DataFrame({'Comment': ['This has has words words words that are written twice twice', 'This is a comment without repetitions', 'This comment, has ponctuations!']}, index = [0, 1, 2]) 

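# you must create the new column before trying to assign any value 
df['Words'] = "" 

# counting frequencies 
for i, row in zip(df.index, df['Comment']): 
    # .loc sets the cell directly; chained indexing (df['Words'][i] = ...) 
    # can trigger SettingWithCopyWarning 
    df.loc[i, 'Words'] = str(Counter(row.split()).most_common()) 

print(df)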

Output:

           Comment \ 
0 This has has words words words that are writte... 
1    This is a comment without repetitions 
2     This comment, has ponctuations! 

               Words 
0 [('words', 3), ('twice', 2), ('has', 2), ('tha... 
1 [('a', 1), ('comment', 1), ('This', 1), ('is',... 
2 [('This', 1), ('comment,', 1), ('has', 1), ('p...
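In the output above, 'comment,' is counted together with its comma, because `split()` only splits on whitespace. If punctuation and letter case should not matter, one option is to tokenize with a regular expression first. This is a sketch under that assumption; `word_counts` is a hypothetical helper name, and the `\w+` pattern is just one reasonable definition of a word.

```python
import re
import pandas as pd
from collections import Counter

df = pd.DataFrame({'Comment': [
    'This has has words words words that are written twice twice',
    'This comment, has ponctuations!',
]})

def word_counts(text):
    """Return (word, count) pairs, ignoring case and punctuation."""
    # \w+ keeps runs of letters, digits and underscores and drops the rest.
    tokens = re.findall(r"\w+", str(text).lower())
    return Counter(tokens).most_common()

# apply() runs the helper once per row, so no explicit loop or index is needed.
df['Words'] = df['Comment'].apply(word_counts)
```

Here the column holds actual lists of tuples rather than their string form, so the counts stay usable in later code.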

Source

2016-10-15 14:11:50

Thank you for your help, @rfw. –

However, @rfw, I could not reach the result. When I created the new column I got a warning: \Users\Rafael\Anaconda2\envs\gl-env\lib\site-packages\ipykernel\__main__.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead. See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy if __name__ == '__main__': –
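This warning commonly appears when the dataframe being modified was itself obtained by filtering or slicing another dataframe. Whether that is the case here is an assumption, since the thread does not show how headlamp was built. Under that assumption, a sketch of one way to avoid it is to take an explicit copy before adding the column. The `parts` dataframe and its 'Part' column are hypothetical.

```python
import pandas as pd
from collections import Counter

# Hypothetical parent dataframe; 'Part' and its values are assumptions.
parts = pd.DataFrame({'Part': ['headlamp', 'mirror', 'headlamp'],
                      'Comment': ['lamp lamp dim', 'cracked', 'water in lamp']})

# .copy() makes headlamp an independent dataframe instead of a slice of parts,
# so adding a column to it is unambiguous.
headlamp = parts[parts['Part'] == 'headlamp'].copy()

headlamp['Words'] = headlamp['Comment'].apply(
    lambda text: str(Counter(text.split()).most_common()))
```

The parent dataframe is left untouched; only the copy gains the 'Words' column.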
