2017-03-25 15 views
1

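Replace apostrophe/short words in Python

I am using Python to clean up a given sentence. Suppose my sentence is:

What's the best way to ensure this?

I want to convert:

What's -> What is

Similarly:

must've -> must have

Also, verbs to their original form:

told -> tell

singular to plural, and so on.

I am currently looking into textblob, but I cannot do all of the above with it.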

+0

You have not actually asked a question. If you are asking for a library recommendation, though, that is off-topic on SO. –

Answers

4

For your first question, there is no module that does this directly, so you will have to build your own. First, you need a contraction dictionary like this:

contractions = { 
"ain't": "am not/are not", 
"aren't": "are not/am not", 
"can't": "cannot", 
"can't've": "cannot have", 
"'cause": "because", 
"could've": "could have", 
"couldn't": "could not", 
"couldn't've": "could not have", 
"didn't": "did not", 
"doesn't": "does not", 
"don't": "do not", 
"hadn't": "had not", 
"hadn't've": "had not have", 
"hasn't": "has not", 
"haven't": "have not", 
"he'd": "he had/he would", 
"he'd've": "he would have", 
"he'll": "he shall/he will", 
"he'll've": "he shall have/he will have", 
"he's": "he has/he is", 
"how'd": "how did", 
"how'd'y": "how do you", 
"how'll": "how will", 
"how's": "how has/how is", 
"i'd": "I had/I would", 
"i'd've": "I would have", 
"i'll": "I shall/I will", 
"i'll've": "I shall have/I will have", 
"i'm": "I am", 
"i've": "I have", 
"isn't": "is not", 
"it'd": "it had/it would", 
"it'd've": "it would have", 
"it'll": "it shall/it will", 
"it'll've": "it shall have/it will have", 
"it's": "it has/it is", 
"let's": "let us", 
"ma'am": "madam", 
"mayn't": "may not", 
"might've": "might have", 
"mightn't": "might not", 
"mightn't've": "might not have", 
"must've": "must have", 
"mustn't": "must not", 
"mustn't've": "must not have", 
"needn't": "need not", 
"needn't've": "need not have", 
"o'clock": "of the clock", 
"oughtn't": "ought not", 
"oughtn't've": "ought not have", 
"shan't": "shall not", 
"sha'n't": "shall not", 
"shan't've": "shall not have", 
"she'd": "she had/she would", 
"she'd've": "she would have", 
"she'll": "she shall/she will", 
"she'll've": "she shall have/she will have", 
"she's": "she has/she is", 
"should've": "should have", 
"shouldn't": "should not", 
"shouldn't've": "should not have", 
"so've": "so have", 
"so's": "so as/so is", 
"that'd": "that would/that had", 
"that'd've": "that would have", 
"that's": "that has/that is", 
"there'd": "there had/there would", 
"there'd've": "there would have", 
"there's": "there has/there is", 
"they'd": "they had/they would", 
"they'd've": "they would have", 
"they'll": "they shall/they will", 
"they'll've": "they shall have/they will have", 
"they're": "they are", 
"they've": "they have", 
"to've": "to have", 
"wasn't": "was not", 
"we'd": "we had/we would", 
"we'd've": "we would have", 
"we'll": "we will", 
"we'll've": "we will have", 
"we're": "we are", 
"we've": "we have", 
"weren't": "were not", 
"what'll": "what shall/what will", 
"what'll've": "what shall have/what will have", 
"what're": "what are", 
"what's": "what has/what is", 
"what've": "what have", 
"when's": "when has/when is", 
"when've": "when have", 
"where'd": "where did", 
"where's": "where has/where is", 
"where've": "where have", 
"who'll": "who shall/who will", 
"who'll've": "who shall have/who will have", 
"who's": "who has/who is", 
"who've": "who have", 
"why's": "why has/why is", 
"why've": "why have", 
"will've": "will have", 
"won't": "will not", 
"won't've": "will not have", 
"would've": "would have", 
"wouldn't": "would not", 
"wouldn't've": "would not have", 
"y'all": "you all", 
"y'all'd": "you all would", 
"y'all'd've": "you all would have", 
"y'all're": "you all are", 
"y'all've": "you all have", 
"you'd": "you had/you would", 
"you'd've": "you would have", 
"you'll": "you shall/you will", 
"you'll've": "you shall have/you will have", 
"you're": "you are", 
"you've": "you have" 
} 

Then write some code that modifies your text according to the dictionary, something like this:

text = "What's the best way to ensure this?"
for word in text.split():
    # look up each whitespace-separated token, ignoring case
    if word.lower() in contractions:
        text = text.replace(word, contractions[word.lower()])
print(text)
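One limitation of splitting on whitespace is that a token such as "can't," (with trailing punctuation) never matches a dictionary key, and the original capitalisation is lost. A minimal sketch of one way around this, assuming a small subset of the dictionary with a single expansion chosen per key (the names `CONTRACTIONS` and `expand_contractions` are illustrative, not part of any library):

```python
import re

# Illustrative subset of the contractions dictionary, with one expansion
# picked per key (an assumption made for this sketch).
CONTRACTIONS = {
    "what's": "what is",
    "must've": "must have",
    "can't": "cannot",
    "won't": "will not",
}

# One alternation over all keys; longest keys first so that a longer
# contraction is preferred over a shorter one it contains.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(CONTRACTIONS, key=len, reverse=True))
    + r")\b",
    flags=re.IGNORECASE,
)

def expand_contractions(text):
    def _replace(match):
        word = match.group(0)
        expanded = CONTRACTIONS[word.lower()]
        # Keep a leading capital letter if the original word had one.
        if word[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded
    return _PATTERN.sub(_replace, text)

print(expand_contractions("What's the best way to ensure this?"))
# What is the best way to ensure this?
```

Keys that begin with an apostrophe (such as "'cause") would need a different boundary check than `\b`, so they are left out of this sketch.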

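For your second question, about changing verb tense, nodebox's linguistics library is very popular and highly recommended for such tasks. After downloading their zip file, unzip it and copy it to Python's site-packages directory. Once that is done, you can write something like this: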

import en
for word in text.split():
    # replace each verb with its present-tense form
    if en.is_verb(word.lower()):
        text = text.replace(word, en.verb.present(word.lower()))
print text  # Python 2 print statement

Note: this library is for Python 2 only, as it does not support Python 3 yet.

1

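If you want to roll your own, you can use this for the contraction mapping:

http://alicebot.blogspot.com/2009/03/english-contractions-and-expansions.html

and this for the verb replacement:

http://www.lexically.net/downloads/BNC_wordlists/e_lemma.txt

In the latter case, you will probably want to generate a reverse dictionary that maps every conjugated form back to its original form (keeping in mind that ambiguous forms may exist, and handling them appropriately).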
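The reverse dictionary described above could be built roughly as follows. This is only a sketch: it assumes each line of the word list has the layout `lemma -> form1,form2,...`, and the `SAMPLE` text below is a hypothetical excerpt written for illustration, not copied from the file. Ambiguous forms are kept as a set of candidate lemmas rather than resolved.

```python
from collections import defaultdict

# Hypothetical excerpt in the assumed "lemma -> form1,form2" layout.
SAMPLE = """\
tell -> tells,telling,told
be -> am,are,is,was,were,been,being
leave -> leaves,leaving,left
leaf -> leaves
"""

def build_reverse_lemma_map(lines):
    """Map every inflected form to the set of lemmas it may come from."""
    reverse = defaultdict(set)
    for line in lines:
        line = line.strip()
        # Skip blank lines and anything without the "->" separator.
        if not line or "->" not in line:
            continue
        lemma, forms = line.split("->", 1)
        lemma = lemma.strip()
        for form in forms.split(","):
            form = form.strip()
            if form:
                reverse[form].add(lemma)
    return reverse

reverse_map = build_reverse_lemma_map(SAMPLE.splitlines())
print(sorted(reverse_map["told"]))    # ['tell']
print(sorted(reverse_map["leaves"]))  # ['leaf', 'leave']
```

A form that maps to more than one lemma (like "leaves" here) needs extra context, such as a part-of-speech tag, to pick the right one.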

+0

Thank you. I would like to use the English contractions and expansions list. – learner

1

The answers above work perfectly well and may be better for ambiguous contractions (although I would argue that there are not that many ambiguous cases).

import re 

def decontracted(phrase): 
    # specific 
    phrase = re.sub(r"won't", "will not", phrase) 
    phrase = re.sub(r"can\'t", "can not", phrase) 

    # general 
    phrase = re.sub(r"n\'t", " not", phrase) 
    phrase = re.sub(r"\'re", " are", phrase) 
    phrase = re.sub(r"\'s", " is", phrase) 
    phrase = re.sub(r"\'d", " would", phrase) 
    phrase = re.sub(r"\'ll", " will", phrase) 
    phrase = re.sub(r"\'t", " not", phrase) 
    phrase = re.sub(r"\'ve", " have", phrase) 
    phrase = re.sub(r"\'m", " am", phrase) 
    return phrase 


test = "Hey I'm Yann, how're you and how's it going ? That's interesting: I'd love to hear more about it." 
print(decontracted(test)) 
# Hey I am Yann, how are you and how is it going ? That is interesting: I would love to hear more about it. 

There may be some flaws I have not thought of.

+0

+1 I like this approach. One bug I found, which is easy to fix, is that "can't" turns into "ca not". A possible fix is to add a 'phrase = re...' line for it to the specific section. – gionni

+0

@gionni Thanks. I must have assumed this case would be covered by 'phrase = re.sub(r"\'t", " not", phrase)'. But you are absolutely right: you end up with "ca" because of the first general case. Thanks for pointing it out, I have updated my answer! –
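The ordering issue raised in these comments can be reproduced with a small sketch. The rule lists and the `apply_rules` helper below are illustrative names, not part of the answer's code. They only show that irregular contractions have to be rewritten before the general "n't" rule runs.

```python
import re

# Irregular contractions that the general rules would split incorrectly.
SPECIFIC = [(r"won't", "will not"), (r"can't", "can not")]
# General suffix rules, applied to whatever is left.
GENERAL = [(r"n't", " not"), (r"'re", " are")]

def apply_rules(phrase, rules):
    # Apply each (pattern, replacement) pair in the order given.
    for pattern, replacement in rules:
        phrase = re.sub(pattern, replacement, phrase)
    return phrase

print(apply_rules("I can't go", GENERAL))             # I ca not go
print(apply_rules("I can't go", SPECIFIC + GENERAL))  # I can not go
```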
