Python web scraper using BeautifulSoup 4

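I wanted to build a database of commonly used words. Right now the script works when I run it, but my biggest issue is that I need all of the words in a single column. I feel that what I have is more of a hack than a real fix. Using BeautifulSoup, can you print everything into one column without the extra blank lines?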

import re
import requests
from bs4 import BeautifulSoup

# Website you want to scrape info from
res = requests.get("https://github.com/first20hours/google-10000-english/blob/master/google-10000-english-usa.txt")
# Getting just the content using bs4
soup = BeautifulSoup(res.content, "lxml")

# Creating the CSV file (text mode, because str values are written to it)
with open('common_words.csv', 'w') as commonFile:
    # Grabbing the lines you want
    for node in soup.find_all("tr"):
        # Getting just the text and removing the html
        words = ''.join(node.find_all(string=True))
        # Removing the extra lines
        ID = re.sub(r'[\t\r\n]', '', words)
        # Needed to add a break in the line to make the rows
        update = ID + '\n'
        # Now we add this to the file
        commonFile.write(update)

Source

2016-04-27 Valerie Sharp

How about this?

import csv
import requests
from bs4 import BeautifulSoup

# Website you want to scrape info from
res = requests.get("https://github.com/first20hours/google-10000-english/blob/master/google-10000-english-usa.txt")
# Getting just the content using bs4
soup = BeautifulSoup(res.content, "lxml")

# Each table row inside the file view holds one word
words = soup.select('div[class=file] tr')

# newline="" keeps the csv module from writing blank rows on Windows
with open("common_words.csv", "w", newline="") as csv_file:
    f = csv.writer(csv_file)
    f.writerow(["common_words"])
    for row in words:
        # Drop embedded newlines so each word stays on a single row
        f.writerow([row.text.replace('\n', '')])

Source

2016-04-28 22:53:01 ahmadhas
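
For anyone who wants to try the one-column idea without hitting the network, here is a minimal sketch. It is not the answer's code: SAMPLE_HTML is a made-up stand-in for the fetched page, extract_words and write_column are hypothetical helper names, and it uses the built-in "html.parser" instead of "lxml" so that only bs4 is required. It relies on get_text(strip=True) to trim whitespace and skips rows that end up empty, which is one way to avoid the blank lines.

```python
import csv
import io

from bs4 import BeautifulSoup

# Made-up sample markup (an assumption, not the real page): one word per
# table row, padded with whitespace, plus one row that contains no text.
SAMPLE_HTML = """
<div class="file">
  <table>
    <tr><td data-line-number="1"></td><td>
      the
    </td></tr>
    <tr><td data-line-number="2"></td><td>
      of
    </td></tr>
    <tr><td data-line-number="3"></td><td>   </td></tr>
  </table>
</div>
"""


def extract_words(html):
    """Return the stripped text of each table row, skipping empty rows."""
    soup = BeautifulSoup(html, "html.parser")
    words = []
    for row in soup.select("div.file tr"):
        # strip=True trims every text fragment and drops whitespace-only ones
        text = row.get_text(strip=True)
        if text:
            words.append(text)
    return words


def write_column(words, handle):
    """Write a header plus one word per row to an open text handle."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["common_words"])
    writer.writerows([word] for word in words)


# Write to an in-memory buffer so the sketch leaves no file behind
buffer = io.StringIO()
write_column(extract_words(SAMPLE_HTML), buffer)
print(buffer.getvalue())
```

To write a real file instead of the buffer, pass a handle opened with open("common_words.csv", "w", newline=""), which is how the csv module documentation recommends opening files for a writer.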
