2017-01-02 23 views
-1

誰かがウェブサイトを傷つけることができる方法を知っていますか.txtからURLリストIEを読み込み、.txtからaの名前を使って.txtに各URLの結果を書き出します。したがって、コードが読み取るURLと名前ファイルがあり、.txtファイルの各行に本文が書き込まれます。私が見つけたもっとも近いものはコードの下にありますが、その名前は可変ではなく固定の名前である1つの.txtファイルにすべて保存されています。そのURLをリストから読み取ることです。私はループが最も良い方法だと思っていますが、私はこのタイプの仕事のためにコードや助けを見たことがありません。テキストファイルから変数を使用してPythonウェブサイトを掻き集める

import requests 
from bs4 import BeautifulSoup 
from collections import Counter 
urls = ["http://en.wikipedia.org/wiki/Wolfgang_Amadeus_Mozart","http://en.wikipedia.org/wiki/Golf"] 

with open('thisisanew.txt', 'w', encoding='utf-8') as outfile: 
    for url in urls: 
    website = requests.get(url) 
    soup = BeautifulSoup(website.content) 
    text = [''.join(s.findAll(text=True))for s in soup.findAll('p')] 
    for item in text: 
      print(item ,file=outfile,) 

ご協力いただきありがとうございます。

+0

だから、自分で任意のコードを書くことを試みていない、とだけを探しています誰かあなたのためにそれを行うには? –

+0

google pythonを読むと、行ごとにtxtファイルが読み込まれます。 pythonファイルにデータを書き込みます。 – Bobby

+0

1. [ask] 2. [mcve] – MYGz

答えて

0
import requests 
from bs4 import BeautifulSoup 
from pathlib import Path 
urls = ["http://en.wikipedia.org/wiki/Wolfgang_Amadeus_Mozart","http://en.wikipedia.org/wiki/Golf"] 
for url in urls: 
    # make requests 
    response = requests.get(url) 
    soup = BeautifulSoup(response.text, 'lxml') 
    text = '\n'.join(s.get_text() for s in soup.findAll('p')) # add '\n' betweent each p 
    # write to file 
    filename = Path(url).name # get the last part of url 
    out_file = Path(filename).with_suffix('.txt') # add.txt to filename 
    with out_file.open(mode='w') as f: 
     f.writelines(text) 

アウト:

Golf.txt:

Golf is a club and ball sport in which players use various clubs to hit balls into a series of holes on a course in as few strokes as possible. 
Golf, unlike most ball games, does not require a standardized playing area. The game is played on a course with an arranged progression of either nine or 18 holes. Each hole on the course must contain a tee box to start from, and a putting green containing the actual hole or cup (4.25 inches in width). There are other standard forms of terrain in between, such as the fairway, rough (long grass), sand traps, and hazards (water, rocks, fescue) but each hole on a course is unique in its specific layout and arrangement. 
Golf is played for the lowest number of strokes by an individual, known as stroke play, or the lowest score on the most individual holes in a complete round by an individual or team, known as match play. Stroke play is the most commonly seen format at all levels. 

Wolfgang_Amadeus_Mozart.txt

Wolfgang Amadeus Mozart (/ˈwʊlfɡæŋ æməˈdeɪəs ˈmoʊtsɑːrt/; MOHT-sart;[1] German: [ˈvɔlfɡaŋ amaˈdeːʊs ˈmoːtsaʁt]; 27 January 1756 – 5 December 1791), baptised as Johannes Chrysostomus Wolfgangus Theophilus Mozart,[2] was a prolific and influential composer of the Classical era. 
Born in Salzburg, he showed prodigious ability from his earliest childhood. Already competent on keyboard and violin, he composed from the age of five and performed before European royalty. At 17, Mozart was engaged as a musician at the Salzburg court, but grew restless and traveled in search of a better position. While visiting Vienna in 1781, he was dismissed from his Salzburg position. He chose to stay in the capital, where he achieved fame but little financial security. During his final years in Vienna, he composed many of his best-known symphonies, concertos, and operas, and portions of the Requiem, which was largely unfinished at the time of his death. The circumstances of his early death have been much mythologized. He was survived by his wife Constanze and two sons. 
関連する問題