How to remove duplicate regular expression (RE) results in Python

I have a string:

str = 'Please Contact Prof. Zheng Zhao: <a href="mailto:[email protected]">[email protected]</a> for details, or our HR: [email protected]'

I wanted to extract all of the email addresses in that string, so I wrote:

import re

p = r'[\w\.][email protected][\w\.]+'
re.findall(p, str)

The result is:

['[email protected]', '[email protected]', '[email protected]']

Obviously, the first and second results are duplicates. How can I prevent this?

Source

2017-09-20 Abe Wong

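You can use a set to remove the duplicates. A set is like a list that cannot contain duplicates. The email addresses here are case-insensitive, so lowercase the results first and the duplicates will be detected correctly.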

import re 

s = 'Please Contact Prof. Zheng Zhao: <a href="mailto:[email protected]">[email protected]</a> for details, or our HR: [email protected]' 

p = r'[\w\.][email protected][\w\.]+' 
# Lowercase each match so case variants compare equal, then deduplicate with a set
list(set(result.lower() for result in re.findall(p, s)))
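
One thing to keep in mind is that a set does not preserve order, so the list above may come back in any order. If the order of first appearance matters, something along these lines might work instead. This is only a sketch: the addresses in `text` are made up for illustration, the pattern `[\w.]+@[\w.]+` is an assumed simple email pattern, and `unique_emails` is a hypothetical helper name.

```python
import re

# Hypothetical sample text; these addresses are made up for illustration.
text = ('Please contact Prof. Example: '
        '<a href="mailto:Jane.Doe@example.com">jane.doe@example.com</a> '
        'for details, or our HR: hr@example.com')

# Assumed pattern: one or more word characters or dots on each side of "@".
pattern = r'[\w.]+@[\w.]+'


def unique_emails(text, pattern=pattern):
    """Return lowercased matches without duplicates, in order of first appearance."""
    matches = (m.lower() for m in re.findall(pattern, text))
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(matches))


print(unique_emails(text))
```

`dict.fromkeys` keeps only the first occurrence of each key, so the result has the same deduplication effect as `set` while staying in the order the addresses appear in the string.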

Source

2017-09-20 02:59:21
