Searching for a keyword in a document with Python

I am trying to write a Python script that searches a document for a keyword and retrieves the whole sentence the keyword appears in. From my research I saw that acora could be used for this, but I still found it unsuccessful.

Source

2011-06-30 Ryan

$ cat document.txt | grep "keyword" –

@Franklin That is completely different from what he asked. He wants the sentence. –

Yes, I am aware that grep "keyword" only finds the "keyword" itself. What I am looking for is this: if the keyword appears, grab the whole sentence that contains it. Any ideas? – Ryan

This is how you can do it simply in the shell. You should write it into a script yourself.

>>> text = '''this is sentence 1. and that is sentence 
       2. and sometimes sentences are good. 
       when that's sentence 4, there's a good reason. and that's 
       sentence 5.''' 
>>> for line in text.split('.'): 
...     if 'and' in line: 
...         # collapse the newlines and indentation left inside each piece 
...         print(' '.join(line.split())) 
... 
and that is sentence 2 
and sometimes sentences are good 
and that's sentence 5

Here I split text with .split('.') and iterate over the pieces, then check each piece for the word and, printing it if the word is present.

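You should also consider that this is case-sensitive. There are many other things to consider in your solution, such as that things ending with ! and ? are also sentences (but sometimes they are not).

This is a sentence (ha?) or do you think (!) so?

is going to be split as

This is a sentence (ha
) or do you think (
) so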
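
The caveats above (case sensitivity, sentences ending in ! or ?) could be handled roughly like this. It is only a minimal sketch, not a real sentence splitter: the name `sentences_with` is made up for illustration, and the rule "a sentence ends at ., ! or ? followed by whitespace" is an assumption that still misfires on abbreviations such as "e.g. this".

```python
import re

def sentences_with(keyword, text):
    """Return the sentences of text that contain keyword as a whole word, ignoring case."""
    # Assumption: a sentence ends at ., ! or ? followed by whitespace.
    # Splitting only at that whitespace keeps the punctuation with its sentence.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    # \b keeps 'and' from matching inside 'Sand'; re.escape guards special characters.
    pattern = re.compile(r'\b%s\b' % re.escape(keyword), re.IGNORECASE)
    # Collapse internal newlines and runs of spaces before returning.
    return [' '.join(s.split()) for s in sentences if pattern.search(s)]

text = '''This is a sentence (ha?) or do you think (!) so?
And that is another one. Sand is not a keyword!'''
print(sentences_with('and', text))
```

Because the split happens only where the punctuation is followed by whitespace, the "(ha?)" and "(!)" in the first sentence no longer break it apart.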

Source

2011-06-30 06:32:21

>>> text = """Hello, this is the first sentence. This is the second. 
And this may or may not be the third. Am I right? No? lol...""" 

>>> import re 
>>> s = re.split(r'[.?!:]+', text) 
>>> def search(word, sentences): 
...     # \b matches whole words only; re.escape guards regex metacharacters in word 
...     return [i for i in sentences if re.search(r'\b%s\b' % re.escape(word), i)] 

>>> search('is', s) 
['Hello, this is the first sentence', ' This is the second']

Source

2011-06-30 06:35:55 JBernardo

-1: Your function matched the third sentence even though it does not contain the word "is". The word "this" contains the *sequence* "is". – Blair

@Blair oh yeah. Didn't realize that. It can be fixed very easily, and by that logic you should also downvote all the other answers. – JBernardo

@Blair I can't believe you really did that. Let's be nice to each other. – JBernardo

I don't have much experience with this, but you might be looking for nltk.

Try this: use span_tokenize to find the span that the index of your word falls in, then look up that sentence.
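
The span idea can be sketched with the standard library alone. This is a stand-in, not nltk itself: `sentence_spans` and `sentence_at` are hypothetical names, and the regex-based sentence rule is an assumption that a trained tokenizer would improve on. With nltk, a sentence tokenizer's `span_tokenize` would supply the (start, end) pairs instead.

```python
import re
from bisect import bisect_right

def sentence_spans(text):
    """Yield (start, end) character offsets for each sentence-like chunk."""
    # Assumption: a sentence is a run of text up to and including ., ! or ?
    # (or the end of the text).
    for m in re.finditer(r'[^.!?]+[.!?]*', text):
        yield m.start(), m.end()

def sentence_at(text, keyword):
    """Return the sentence containing the first occurrence of keyword, or None."""
    hit = re.search(r'\b%s\b' % re.escape(keyword), text)
    if hit is None:
        return None
    spans = list(sentence_spans(text))
    starts = [start for start, _ in spans]
    # The last span starting at or before the hit is the one that contains it.
    start, end = spans[bisect_right(starts, hit.start()) - 1]
    return text[start:end].strip()

text = "Hello, this is the first sentence. This is the second. Am I right? No?"
print(sentence_at(text, "second"))
```

Looking the hit up by character offset means the text is only scanned once for the keyword, and the surrounding sentence is recovered from the span table afterwards.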

Source

2011-06-30 06:36:46 nattofriends

Using the grep or egrep command through Python's subprocess module may help.

For example:

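import subprocess 

# passing the arguments as a list avoids shell=True and its quoting pitfalls 
result = subprocess.run(["grep", "-e", "word1", "document.txt"], 
                        capture_output=True, text=True) 
# to search 2 different words: 
# result = subprocess.run(["grep", "-E", "word1|word2", "document.txt"], 
#                         capture_output=True, text=True) 
lines = result.stdout.splitlines()  # one matching line per list element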
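
Note that grep returns matching lines, not sentences, so a pure-Python variant may be closer to what the question asks for. The following is a rough sketch under assumptions: `find_sentences` is a hypothetical name, the file is assumed to be UTF-8, and sentences are assumed to end at ., ! or ? followed by whitespace.

```python
import re

def find_sentences(path, words):
    """Return sentences from the file at path that contain any of words."""
    with open(path, encoding='utf-8') as f:
        text = f.read()
    # Same idea as egrep 'word1|word2', but restricted to whole words.
    pattern = re.compile(r'\b(?:%s)\b' % '|'.join(map(re.escape, words)))
    # Assumption: sentences end at ., ! or ? followed by whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', text)
    # Collapse line breaks inside a sentence before returning it.
    return [' '.join(s.split()) for s in sentences if pattern.search(s)]
```

A call such as `find_sentences('document.txt', ['word1', 'word2'])` would then return the matching sentences as a list, without spawning an external process.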

Source

2011-06-30 09:16:39 Yajushi
