2016-04-27

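Python web scraper using BeautifulSoup 4

I wanted to build a database of commonly used words. The script below runs, but my biggest problem is that I need all of the words in a single column. What I have feels more like a hack than a real fix. Using BeautifulSoup, is there a way to write everything into one column without the extra blank lines?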

import requests 
import re 
from bs4 import BeautifulSoup 

# Website you want to scrape info from
res = requests.get("https://github.com/first20hours/google-10000-english/blob/master/google-10000-english-usa.txt") 
# Getting just the content using bs4 
soup = BeautifulSoup(res.content, "lxml") 

# Creating the CSV file 
commonFile = open('common_words.csv', 'w')

# Grabbing the lines you want 
for node in soup.findAll("tr"):
    # Getting just the text and removing the html 
    words = ''.join(node.findAll(text=True)) 
    # Removing the extra lines 
    ID = re.sub(r'[\t\r\n]', '', words) 
    # Needed to add a break in the line to make the rows 
    update = ''.join(ID)+'\n' 
    # Now we add this to the file 
    commonFile.write(update) 
commonFile.close() 

Answers

How about this?

import requests 
import csv 
from bs4 import BeautifulSoup 

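# Website you want to scrape info from
res = requests.get("https://github.com/first20hours/google-10000-english/blob/master/google-10000-english-usa.txt")
# Getting just the content using bs4
soup = BeautifulSoup(res.content, "lxml")

# One <tr> per line of the file view on the page
words = soup.select('div[class=file] tr')

# newline="" lets the csv module handle line endings itself (Python 3),
# which avoids blank rows between entries on Windows
with open("common_words.csv", "w", newline="") as out:
    f = csv.writer(out)
    f.writerow(["common_words"])
    for row in words:
        # Drop the newlines inside each row so only the word remains
        f.writerow([row.text.replace('\n', '')])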
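
If blank rows still show up, one possible variation is to separate the cleanup from the scraping and skip any row that is empty after stripping whitespace. The sketch below is only an illustration, not part of the code above: `clean_words`, `write_single_column`, the sample strings and the demo file name are all made up for the example, and it uses only the standard library so it can be tried without hitting the website.

```python
import csv


def clean_words(raw_rows):
    """Strip whitespace from each scraped row and drop rows that end up empty."""
    cleaned = []
    for row in raw_rows:
        word = row.strip()
        if word:  # skip rows that contained only whitespace
            cleaned.append(word)
    return cleaned


def write_single_column(words, path, header="common_words"):
    """Write one word per row under a single header column."""
    # newline="" lets the csv module control line endings itself (Python 3)
    with open(path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow([header])
        for word in words:
            writer.writerow([word])


if __name__ == "__main__":
    # Hypothetical sample strings standing in for the text of scraped <tr> elements
    sample_rows = ["\n\nthe\n", "\n", "\tof \r\n", "", "and"]
    write_single_column(clean_words(sample_rows), "common_words_demo.csv")
```

With the scraping code above, the input would be something like `[row.text for row in words]`. Filtering on the stripped text, rather than only deleting newline characters, means a row that held nothing but whitespace never reaches the file.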