How do I count the words in a list?

from bs4 import BeautifulSoup 
import urllib2 
# Imported libraries for future use. 
response = urllib2.urlopen('http://www.nytimes.com').read() 
soup = BeautifulSoup(response,"lxml") 

host = [] 
#created empty list to append future words extracted from data set. 
for story_heading in soup.find_all(class_="story-heading"): 
    story_title = story_heading.text.replace("\n", " ").strip() 
    new_story_title = story_title.encode('utf-8') 


    parts = new_story_title.split()[0] 

    i=['a','A','an','An','the','The','from','From','to','To','when','When','what','What','on','On','for','For'] 
    if parts not in i: 
     host.append(parts) 
    else: 
     pass 
#now i have to calculate the number of repeated words in the file and calcute the number of repeatation.  
print host

How can I count the repeated words in the list I have built? I am actually quite confused about the code above as well. If someone could explain what I did wrong, I would appreciate it.

Source

2016-04-26 vikhaf

You can do this out of the box with [Counter](https://docs.python.org/2/library/collections.html#collections.Counter) – r3ign

Is it case-sensitive? Is "What" == "what", or should they be treated as different values? –
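
One way to settle the case question is to lowercase each first word before filtering it, so that the stopword list only needs one spelling per word. This is a minimal sketch, not the asker's code: the `headlines` values and the `STOPWORDS` name are made up for illustration and stand in for the scraped story titles.

```python
# Hypothetical headlines standing in for the scraped story titles.
headlines = [
    "The Court Rules on Voting",
    "What Trump Said",
    "North Korea Tests Missile",
    "north korea responds",
    "Trump Wins Again",
    "",  # an empty title, for which split()[0] would raise IndexError
]

# Lowercase stopwords only; a set gives fast membership tests.
STOPWORDS = {"a", "an", "the", "from", "to", "when", "what", "on", "for"}

host = []
for title in headlines:
    words = title.split()
    if not words:
        # Skip empty titles instead of indexing into an empty list.
        continue
    first = words[0].lower()
    if first not in STOPWORDS:
        host.append(first)

print(host)
```

With the words normalized this way, "North" and "north" end up as the same entry when they are counted later.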

You can do this with count:

d = {i: host.count(i) for i in set(host)} 
print(d)

Source

2016-04-26 06:09:08 salomonderossi

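Note that this counts duplicate items over and over. The result is the same, but it takes longer. To improve performance, iterate over the set of host's elements (a collection without duplicates) instead of the host list itself, especially when many items occur more than once. –

@ByteCommander Thanks for the hint. I have edited my answer – salomonderossi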
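
The repeated scanning mentioned in the comment can also be avoided entirely by counting in a single pass, instead of calling count once per distinct word. A minimal sketch, assuming a made-up `host` list in place of the scraped one:

```python
# Hypothetical sample data in place of the scraped first words.
host = ["North", "Trump", "North", "U.S.", "Trump", "North"]

counts = {}
for word in host:
    # dict.get returns 0 for a word that has not been seen yet.
    counts[word] = counts.get(word, 0) + 1

print(counts)
```

Each element is visited once, so the work grows with the length of the list rather than with the length times the number of distinct words.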

Use the Counter class from the collections module:

from bs4 import BeautifulSoup 
from collections import Counter 
import urllib2 
# Imported libraries for future use. 
response = urllib2.urlopen('http://www.nytimes.com').read() 
soup = BeautifulSoup(response,"lxml") 

host = [] 
#created empty list to append future words extracted from data set. 
for story_heading in soup.find_all(class_="story-heading"): 
    story_title = story_heading.text.replace("\n", " ").strip() 
    new_story_title = story_title.encode('utf-8') 


    parts = new_story_title.split()[0] 

    i=['a','A','an','An','the','The','from','From','to','To','when','When','what','What','on','On','for','For'] 
    if parts not in i: 
     host.append(parts) 
    else: 
     pass 
#now i have to calculate the number of repeated words in the file and calcute the number of repeatation.  
print Counter(host)

Output:

>>> ================================ RESTART ================================ 
>>> 
Counter({'North': 2, 'Trump': 1, 'U.S.': 1, 'Kasich-Cruz': 1, '8': 1, 'Court': 1, 'Where': 1, 'Your': 1, 'Forget': 1}) 
>>>
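
Since the question asks specifically about repeated words, the Counter result can be filtered down to entries that occur more than once. A minimal sketch, assuming a small made-up `host` list rather than live data from the page:

```python
from collections import Counter

# Hypothetical sample data in place of the scraped first words.
host = ["North", "Trump", "U.S.", "North", "Court", "Where"]
counts = Counter(host)

# Keep only the words that occur more than once.
repeated = {word: n for word, n in counts.items() if n > 1}
print(repeated)

# most_common(k) returns the k most frequent (word, count) pairs.
print(counts.most_common(1))
```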

Source

2016-04-26 06:09:09 EbraHim

You can look at the code below, which does not use a comprehension. It should be easier to understand.

host = ['Hello','foo','bar','World','foo','Hello'] 
dict1 = {} 
host_unique = list(set(host)) 
for i in host_unique: 
    dict1[i] = host.count(i)

Source

2016-04-26 06:20:55 Nagaraj

You are missing the indentation. –

Use a dictionary comprehension that iterates over the set of elements.

Case-sensitive version ("What" != "what"):
```
occurrences = { item: host.count(item) for item in set(host) } 
```
Case-insensitive version ("What" == "what"):
```
lowered = [item.lower() for item in host]
occurrences = { item: lowered.count(item) for item in set(lowered) }
```
In this case the dictionary keys are the lowercased elements as well.
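
Case-insensitive counting can also be done in one pass by feeding lowercased words to collections.Counter. A minimal sketch, assuming a made-up `host` list:

```python
from collections import Counter

# Hypothetical sample data with mixed capitalization.
host = ["What", "what", "North", "north", "Trump"]

# Lowercase each word as it is counted, so "What" and "what" share a key.
occurrences = Counter(word.lower() for word in host)
print(occurrences)
```

As with the comprehension, the keys of the result are the lowercased words.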

Source

2016-04-26 06:37:40

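Use a set to count the distinct words, ignoring case:

lst = ['hi', 'Hio', 'Hi', 'hello', 'there' ] 
s = set() 
# Python 2 only: map() runs eagerly there. In Python 3 map() is lazy, so s would stay empty.
map(lambda x: s.add(x.lower()), lst) 
print(len(s))

Or, with an explicit loop, which works on both Python 2 and Python 3:

lst = ['hi', 'Hio', 'Hi', 'hello', 'there' ] 
s = set() 
for item in lst: 
    s.add(item.lower()) 
print(len(s))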
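
The same distinct count can be written as a set comprehension, which behaves the same on Python 2.7 and Python 3. A minimal sketch reusing the list from this answer (the `distinct` name is introduced here for illustration):

```python
lst = ['hi', 'Hio', 'Hi', 'hello', 'there']

# Build the set of lowercased words in one expression.
distinct = {item.lower() for item in lst}
print(len(distinct))
```

Note that this gives the number of different words, not how often each one occurs.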

Source

2016-04-27 03:37:33
