Regex substitution: keep only the numbers

Here is a sample of the text I'm trying to extract from:

initiated to address the deviation to SOP-020583v11.0 Section SOP-016248v2.0 john doe, john doe SOP-020583 fake text, this is all fake

Ideally, the text would look like this:

initiated to address the deviation to 020583 Section 016248 john doe, john doe 020583 fake text, this is all fake

Here is my code so far:

def dashrepl(matchobj): 
    print (type(matchobj)) 
    return re.findall('[0-9]',matchobj) 

re.sub(SOP, dashrepl, long_desc_text[22])

But I'm getting the following error:

TypeError: expected string or buffer
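The error most likely comes from passing the Match object itself to 're.findall', which expects a string. The sketch below reproduces it in Python 3 (where the message reads "expected string or bytes-like object" rather than "expected string or buffer") and shows that 'matchobj.group(0)' gives the matched text as a string. The pattern 'SOP-\d+' is a hypothetical stand-in for the question's 'SOP' variable, which isn't shown.

```python
import re

# Hypothetical stand-in for the question's SOP pattern, which isn't shown.
SOP = r"SOP-\d+"

match = re.search(SOP, "deviation to SOP-020583v11.0 Section")

# A Match object is not a string, so re.findall rejects it.
try:
    re.findall("[0-9]", match)
    error = None
except TypeError as exc:
    error = exc
    print("TypeError:", exc)

# group(0) returns the whole matched text as a str, which findall accepts.
print(match.group(0))                       # SOP-020583
print(re.findall("[0-9]", match.group(0)))  # ['0', '2', '0', '5', '8', '3']
```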

EDIT / update:

long_desc_text[22]

SOP-020583v11.0 Section 8.4.On 17Jan2016 at ATO Site, SOP-016248v2.0 was due for periodic review but the periodic SOP-016248 revision is not tied to any change control records. SOP-020583 tied to a change control record" and notified ID63718 notifiedID22359 of the event. SOP-020583v11.0, fake text fake text
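On this longer text, a pattern anchored on the "SOP-" prefix only touches the SOP identifiers; other digits such as "8.4", "17Jan2016", and "ID63718" are left alone. A minimal sketch, assuming the pattern 'SOP-(\d+)(?:v\d+\.\d+)?' (the SOP number captured, followed by an optional "vX.Y" version suffix that is dropped):

```python
import re

# The longer sample from the edit (it contains a '"', so single quotes are used).
long_desc = (
    'SOP-020583v11.0 Section 8.4.On 17Jan2016 at ATO Site, SOP-016248v2.0 was due '
    'for periodic review but the periodic SOP-016248 revision is not tied to any '
    'change control records. SOP-020583 tied to a change control record" and '
    'notified ID63718 notifiedID22359 of the event. SOP-020583v11.0, fake text fake text'
)

# Assumed pattern: "SOP-", the number (group 1), then an optional "vX.Y" suffix.
SOP_ID = re.compile(r"SOP-(\d+)(?:v\d+\.\d+)?")

# Replace every SOP reference with just its captured number.
cleaned = SOP_ID.sub(r"\1", long_desc)
print(cleaned)
```

Because the pattern requires the literal "SOP-" prefix, digits elsewhere in the text never match.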


2017-12-13 madsthaks

I think the second argument to 'findall' is wrong. Shouldn't it be a string? –

Yes, it should, but then I get this error: 'TypeError: sequence item 1: expected string, list found' – madsthaks

'matchobj' must be a string. –
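To tie the comments together: a callback passed to 're.sub' receives a Match object and must return a string, so returning the list from 'findall' is most likely what triggers the "sequence item" error. A minimal sketch of a working 'dashrepl', assuming a hypothetical pattern with a capture group around the number. It also shows why joining every digit of the whole match is not enough: the version digits come along too.

```python
import re

# Assumed pattern: capture the SOP number, consume an optional "vX.Y" suffix.
SOP = r"SOP-(\d+)(?:v\d+\.\d+)?"

def dashrepl(matchobj):
    # matchobj is a re.Match; return the captured number as a str.
    return matchobj.group(1)

def all_digits(matchobj):
    # Joining findall's list into a str avoids the TypeError,
    # but it also keeps the digits of the version suffix.
    return "".join(re.findall("[0-9]", matchobj.group(0)))

text = "deviation to SOP-020583v11.0 Section SOP-016248v2.0 john doe"
print(re.sub(SOP, dashrepl, text))    # deviation to 020583 Section 016248 john doe
print(re.sub(SOP, all_digits, text))  # deviation to 020583110 Section 01624820 john doe
```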

So, here is my code:

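import re

test = "initiated to address the deviation to SOP-020583v11.0 Section SOP-016248v2.0 john doe, john doe SOP-020583 fake text, this is all fake"

# "SOP-", then the number (capture group 1), then an optional version suffix like "v11.0"
regexp = r"SOP-(\d+)(?:v\d+\.\d)?"

# re.subn returns a tuple: (new_string, number_of_substitutions)
result, count = re.subn(regexp, r"\1", test)

print(result)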

That produces:
initiated to address the deviation to 020583 Section 016248 john doe, john doe 020583 fake text, this is all fake

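'subn' is the Python 're' function that finds every occurrence of the pattern and replaces it with the given string, in this case the first capture group ('\1'). It returns a tuple: the new string and the number of substitutions made. The "r" before the string literal makes it a raw string, so backslashes such as those in '\d' and '\1' are passed to the regex engine unchanged; it does not create a regex object.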

For reference, I also used this link.
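The two details in the explanation can be checked directly: 're.subn' returns a '(new_string, count)' tuple, so the string is at index 0, and the "r" prefix only produces a raw string literal, which is an ordinary 'str' with its backslashes kept.

```python
import re

text = "SOP-020583v11.0 Section SOP-016248v2.0"
pattern = r"SOP-(\d+)(?:v\d+\.\d)?"

# re.subn returns (new_string, number_of_substitutions).
new_text, count = re.subn(pattern, r"\1", text)
print(new_text)  # 020583 Section 016248
print(count)     # 2

# re.sub returns just the new string.
print(re.sub(pattern, r"\1", text) == new_text)  # True

# A raw string is a plain str: r"\1" is the same two characters as "\\1".
print(type(pattern).__name__)  # str
print(r"\1" == "\\1")          # True
```

Use 're.sub' when only the string is needed and 're.subn' when the replacement count is also useful.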

I hope this information helps.


2017-12-13 05:22:42 Chromane

This isn't what I originally asked, but what would I do if I wanted to encode the numbers as letters instead? For example, a = 0, b = 1, and so on. – madsthaks
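For the follow-up (writing each digit as a letter, a = 0, b = 1, ...), one possible sketch keeps the capture-group approach and translates the captured digits inside a replacement callback using 'str.maketrans' and 'str.translate'. The pattern and the 'encode_sop' helper are illustrative assumptions, not part of the original answer.

```python
import re
import string

# Map "0".."9" to "a".."j" (a = 0, b = 1, ..., j = 9).
DIGIT_TO_LETTER = str.maketrans(string.digits, string.ascii_lowercase[:10])

# Assumed pattern: SOP number captured, optional "vX.Y" version suffix dropped.
SOP = re.compile(r"SOP-(\d+)(?:v\d+\.\d+)?")

def encode_sop(matchobj):
    # Hypothetical helper: translate each digit of the captured number to its letter.
    return matchobj.group(1).translate(DIGIT_TO_LETTER)

test = "deviation to SOP-020583v11.0 Section SOP-016248v2.0 john doe"
print(SOP.sub(encode_sop, test))  # deviation to acafid Section abgcei john doe
```

The translation table is built once and reused for every match.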
