Splitting a string URL into words using Python

How can I get the individual words out of a string (a URL) in Python? For example, from a URL like:

http://www.sample.com/level1/level2/index.html?id=1234

I want to get words like:

http, www, sample, com, level1, level2, index, html, id, 1234

Is there any solution using Python for URLs like this?

Thank you.

Source

2017-01-30 Shakoor Ab

I will store the results in a list so that I can use them. –

This is how you can do it for any URL:

import re

def getWordsFromURL(url):
    # Split on runs of the separators : / ? = - &
    return re.compile(r'[\:/?=\-&]+', re.UNICODE).split(url)

Now you can use it as:

url = "http://www.sample.com/level1/level2/index.html?id=1234" 
words = getWordsFromURL(url)
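
To see what this returns, here is a small self-contained check. Note that `.` is not in the separator set, so dotted parts stay together. The second function, `get_words_from_url_with_dots`, is a hypothetical variant that is not part of the answer; it only adds `.` to the character class as one possible adjustment.

```python
import re

def getWordsFromURL(url):
    # Same pattern as in the answer above.
    return re.compile(r'[\:/?=\-&]+', re.UNICODE).split(url)

def get_words_from_url_with_dots(url):
    # Hypothetical variant (an assumption, not from the answer): '.' is also a separator.
    return re.split(r'[:/?=\-&.]+', url)

url = "http://www.sample.com/level1/level2/index.html?id=1234"

print(getWordsFromURL(url))
# ['http', 'www.sample.com', 'level1', 'level2', 'index.html', 'id', '1234']

print(get_words_from_url_with_dots(url))
# ['http', 'www', 'sample', 'com', 'level1', 'level2', 'index', 'html', 'id', '1234']
```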

Source

2017-01-30 12:21:01

Thanks... it worked. –

You get ['http', 'www.sample.com', 'level1', 'level2', 'index.html?id', '1234'] rather than ['http', 'www', 'sample', 'com', 'level1', 'level2', 'index', 'html', 'id', '1234'] –

@Jean-François Fabre I compiled it with re.UNICODE. http://stackoverflow.com/questions/41935748/splitting-a-string-url-into-words-using-python –

Just split with a regex on the longest runs of non-alphanumeric characters:

import re 
l = re.split(r"\W+","http://www.sample.com/level1/level2/index.html?id=1234") 
print(l)

yields:

['http', 'www', 'sample', 'com', 'level1', 'level2', 'index', 'html', 'id', '1234']
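
As a quick check of how `\W+` treats hyphens and underscores, the sketch below runs the same split on this question's own URL and on a made-up `example.com` URL (an assumption, used only for illustration). In Python's `re`, `\W` matches the hyphen but not the underscore.

```python
import re

# URL of this question: hyphens are non-word characters, so the slug is split apart.
so_url = "http://stackoverflow.com/questions/41935748/splitting-a-string-url-into-words-using-python"
so_words = re.split(r"\W+", so_url)
print(so_words)
# ['http', 'stackoverflow', 'com', 'questions', '41935748', 'splitting', 'a',
#  'string', 'url', 'into', 'words', 'using', 'python']

# Made-up URL: the underscore counts as a word character, so 'my_page' stays whole.
demo_words = re.split(r"\W+", "http://example.com/my_page-name")
print(demo_words)
# ['http', 'example', 'com', 'my_page', 'name']
```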

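This is simple but, as someone pointed out, it doesn't work when there are characters such as _ or - in the URL names. So the less fun solution is to list all the possible tokens that can separate path parts, like this: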

l = re.split(r"[/:\.?=&]+", "http://stackoverflow.com/questions/41935748/splitting-a-string-url-into-words-using-python")

(I admit I may have forgotten some separator symbols.)
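
One way to reduce the risk of forgetting a separator is to let the standard library's `urllib.parse.urlsplit` break the URL into components first and then split each component on its own delimiters. This is a different technique from the single regex above, offered as a minimal sketch; the helper name `url_words` and the delimiter choices for each component are assumptions, not part of the answer.

```python
import re
from urllib.parse import urlsplit

def url_words(url):
    """Split a URL into words, component by component (hypothetical helper)."""
    parts = urlsplit(url)
    words = [parts.scheme] if parts.scheme else []
    # Host: split on dots only, so hyphenated host labels stay intact.
    words += [w for w in (parts.hostname or "").split(".") if w]
    # Path: split on slashes and dots; hyphens and underscores are kept.
    words += [w for w in re.split(r"[/.]+", parts.path) if w]
    # Query string: split on the usual key/value delimiters.
    words += [w for w in re.split(r"[&=;]+", parts.query) if w]
    return words

print(url_words("http://www.sample.com/level1/level2/index.html?id=1234"))
# ['http', 'www', 'sample', 'com', 'level1', 'level2', 'index', 'html', 'id', '1234']

print(url_words("http://stackoverflow.com/questions/41935748/splitting-a-string-url-into-words-using-python"))
# ['http', 'stackoverflow', 'com', 'questions', '41935748',
#  'splitting-a-string-url-into-words-using-python']
```

Whether the hyphenated slug should stay as one token or be split further depends on what counts as a "word" for the task; adding `-` to the path pattern would split it.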

Source

2017-01-30 12:19:00

It doesn't work for URLs like 'http://stackoverflow.com/questions/41935748/split-a-string-url-into-words-using-python' – Himal

@Himal check my answer, it –

It is not ['http', 'www.sample.com', 'level1', 'level2', 'index.html?id', '1234'] –
