Remove stop words from a dataframe

dataframe['Text'] = dataframe['Text'].apply(lambda x : ' '.join([item for item in string.split(x.lower()) if item not in stopwords]))

I am removing stop words from a dataframe. The logic works correctly, but it raises an error when there is an empty row.

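I used dropna(), but that is not what I want, because those rows have data in other columns. How can I add a condition to the logic above so that it only runs when the Text column is not null?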

2017-04-13 lucy

What do you mean by "empty row"? NaN, an empty string? What output do you get in that case? – FLab

Please provide an example that other users can try. – mhoff

What are you going to do with the cleaned text afterwards? Maybe you should check the CountVectorizer/TfidfVectorizer methods; they can do it "on the fly"... – MaxU

You can drop the rows where the Text column is null before running your logic:

dataframe.dropna(subset=['Text'], how='all')
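As a minimal sketch of what this call does (the sample data below is made up for illustration), `dropna` with `subset=['Text']` removes only the rows whose `Text` value is missing, whatever the other columns contain. It returns a new frame rather than modifying the original, so the result has to be assigned:

```python
import pandas as pd

# Hypothetical sample data: the middle row has no text but still has a label.
dataframe = pd.DataFrame({
    "Text": ["I love this car", None, "This view is amazing"],
    "col2": ["positive", "negative", "positive"],
})

# Drop only the rows where 'Text' is missing; other columns are not inspected.
cleaned = dataframe.dropna(subset=["Text"])

print(len(dataframe), len(cleaned))
print(cleaned.index.tolist())
```

If the rows with missing text have to be kept, dropping them is not an option and the missing values need to be handled inside the stop-word logic instead.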

2017-04-13 11:56:08

You can replace NaN with an empty list, which is not straightforward. Use mask or combine_first with a Series built from empty lists:

import pandas as pd

pos_tweets = [('I love this car', 'positive'),
              ('This view is amazing', 'positive'),
              ('I feel great this morning', 'positive'),
              ('I am so excited about the concert', 'positive'),
              (None, 'positive')]

df = pd.DataFrame(pos_tweets, columns=["Text", "col2"])
print (df)
                                Text      col2
0                    I love this car  positive
1               This view is amazing  positive
2          I feel great this morning  positive
3  I am so excited about the concert  positive
4                               None  positive

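stopwords = ['love','car','amazing']
# build one empty list per row so the Series length matches the index
s = pd.Series([[] for _ in range(len(df))], index=df.index)
# lowercase and split the text, then put an empty list wherever Text was missing
df["Text"] = df["Text"].str.lower().str.split().mask(df["Text"].isnull(), s)
print (df)
                                        Text      col2
0                       [i, love, this, car]  positive
1                  [this, view, is, amazing]  positive
2            [i, feel, great, this, morning]  positive
3  [i, am, so, excited, about, the, concert]  positive
4                                         []  positive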

df['Text'] = df['Text'].apply(lambda x: ' '.join([item for item in x if item not in stopwords]))
print (df)
                                Text      col2
0                             i this  positive
1                       this view is  positive
2          i feel great this morning  positive
3  i am so excited about the concert  positive
4                                     positive

Another solution:

stopwords = ['love','car','amazing']
# combine_first fills the missing entries from a Series holding one empty list per row
empty = pd.Series([[] for _ in range(len(df))], index=df.index)
df["Text"] = df["Text"].str.lower().str.split().combine_first(empty)
print (df)
                                        Text      col2
0                       [i, love, this, car]  positive
1                  [this, view, is, amazing]  positive
2            [i, feel, great, this, morning]  positive
3  [i, am, so, excited, about, the, concert]  positive
4                                         []  positive
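If the goal is only to stop the original one-liner from failing on missing values, a guard inside the function passed to `apply` is another option. The following is a minimal sketch, not taken from the answer above: the helper name `remove_stopwords` and the choice to turn missing values into an empty string are assumptions made for illustration.

```python
import pandas as pd

stopwords = {"love", "car", "amazing"}

def remove_stopwords(text):
    # Hypothetical helper: anything that is not a string (None, NaN) becomes ''.
    if not isinstance(text, str):
        return ""
    # Lowercase, split on whitespace, and keep only the non-stop words.
    return " ".join(word for word in text.lower().split() if word not in stopwords)

df = pd.DataFrame({
    "Text": ["I love this car", "This view is amazing", None],
    "col2": ["positive", "positive", "positive"],
})

# Rows with missing text are kept; only their Text value is normalised.
df["Text"] = df["Text"].apply(remove_stopwords)
print(df["Text"].tolist())
```

Using a set for the stop words keeps each membership test cheap, which matters once the stop-word list grows beyond a handful of entries.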

2017-04-13 12:05:16 jezrael
