How do I add strings to a pandas DataFrame?

I have the following code, which extracts sentences from a directory of text files. How do I add those sentences to a pandas DataFrame?

# -*- coding: utf-8 -*-
import os

from nltk.tokenize import sent_tokenize
import pandas as pd

directory_in_str = "E:\\Extracted\\"
directory = os.fsencode(directory_in_str)

for file in os.listdir(directory):
    filename = os.fsdecode(file)
    with open(os.path.join(directory_in_str, filename), encoding="utf8") as f_in:
        for line in f_in:
            # Split the line into sentences (rebinds `sentences` on every line)
            sentences = sent_tokenize(line)

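I would like to build a pandas DataFrame and add the sentences to it, so that I can compute n-gram frequency counts for the sentences as described in How to find ngram frequency of a column in a pandas dataframe?. Once the sentences are in the df DataFrame, the code would be: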

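import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Count unigrams and bigrams in the 'description' column
word_vectorizer = CountVectorizer(ngram_range=(1, 2), analyzer='word')
sparse_matrix = word_vectorizer.fit_transform(df['description'])
# Column sums = total count of each n-gram over all rows
frequencies = np.asarray(sparse_matrix.sum(axis=0)).ravel()
pd.DataFrame(frequencies, index=word_vectorizer.get_feature_names_out(), columns=['frequency'])  # get_feature_names() before scikit-learn 1.0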

What do I need to do to add the sentences to df = pd.DataFrame([], columns=['description']) so that I can then run the code above?


2017-09-26 Superdooperhero

You need to change your extraction code slightly. Declare sentences outside the loop and keep extending it as you go:

sentences = []  # collect the sentences from all files here
for file in os.listdir(directory):
    filename = os.fsdecode(file)
    with open(os.path.join(directory_in_str, filename), encoding="utf8") as f_in:
        for line in f_in:
            sentences.extend(sent_tokenize(line))
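The accumulate-with-extend pattern above can be tried without NLTK or the E:\ directory. This is a sketch under stated assumptions: naive_sent_tokenize is a hypothetical regex splitter substituted for nltk's sent_tokenize (it is much cruder, e.g. it also splits after abbreviations such as "Dr."), and the two input files are made up. All sentences go into one list first and the DataFrame is built once at the end; growing a DataFrame row by row is slower, and DataFrame.append, which older answers use for that, was removed in pandas 2.0.

```python
import os
import re
import tempfile

import pandas as pd


def naive_sent_tokenize(text):
    """Hypothetical stand-in for nltk's sent_tokenize: split after . ! or ?"""
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]


def collect_sentences(directory_in_str):
    """Read every file in the directory and gather all of its sentences."""
    sentences = []  # declared once, outside the loops
    # sorted() only makes the file order deterministic for this demo
    for filename in sorted(os.listdir(directory_in_str)):
        path = os.path.join(directory_in_str, filename)
        with open(path, encoding="utf8") as f_in:
            for line in f_in:
                # extend keeps earlier sentences; plain `=` would discard them
                sentences.extend(naive_sent_tokenize(line))
    return sentences


with tempfile.TemporaryDirectory() as tmp:
    # Two small hypothetical input files
    with open(os.path.join(tmp, "a.txt"), "w", encoding="utf8") as f:
        f.write("First sentence. Second one!\n")
    with open(os.path.join(tmp, "b.txt"), "w", encoding="utf8") as f:
        f.write("Is this the third?\n")
    sentences = collect_sentences(tmp)

# One DataFrame built from the complete list
df = pd.DataFrame({'description': sentences})
print(df)
```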

Once that is done, simply initialize your df like this:

df = pd.DataFrame({'description': sentences})  # column name must match df['description'] in the question

2017-09-26 21:20:54

If I do `ngram_freq = pd.DataFrame(frequencies, index=word_vectorizer.get_feature_names(), columns=['frequency'])` and `df.index.name = 'ngram'`, then `ngram_freq[ngram_freq.ngram == 'youtube']` does not give me the frequency count for youtube. Any idea how to do that? – Superdooperhero

@Superdooperhero What do you mean? `ngram_freq[ngram_freq.index == 'youtube']`? –

Yes, sorry. It should be `ngram_freq.index.name = 'ngram'`. – Superdooperhero
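On the comment thread: the n-grams are stored in the index of ngram_freq, not in a column, and pandas attribute access such as ngram_freq.ngram resolves column names, so naming the index 'ngram' does not by itself make ngram_freq.ngram work. A minimal sketch with made-up counts (the numbers are purely illustrative) showing three ways to get the youtube count:

```python
import pandas as pd

# Hypothetical n-gram counts shaped like the ngram_freq frame in the question
ngram_freq = pd.DataFrame({'frequency': [3, 1, 2]},
                          index=['youtube', 'youtube video', 'video'])
ngram_freq.index.name = 'ngram'

# Option 1: boolean mask on the index (the commenter's suggestion)
print(ngram_freq[ngram_freq.index == 'youtube'])

# Option 2: label lookup with .loc returns the count itself
print(ngram_freq.loc['youtube', 'frequency'])

# Option 3: turn the index into a regular 'ngram' column, then filter on it
flat = ngram_freq.reset_index()
print(flat[flat.ngram == 'youtube'])
```

Option 2 is usually the most direct when only one n-gram is needed; Option 3 is closest to the syntax tried in the comment.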
