Python Web Crawler: can a function be called from inside a for loop?

I am trying to write a web crawler that collects the URL of every title, then finds every chapter link under each title, and finally finds every section link under each chapter.

The problem is that in this tutorial, https://github.com/buckyroberts/Source-Code-from-Tutorials/blob/master/Python/27_workingsolution_python.py, the author was able to call the second function before defining it, which I find really confusing.

I tried the same approach, but I get the error: name 'leveltwo' is not defined. My question is: how can I pass the links obtained by the first function as the argument to the second function?

My code:

import requests
from bs4 import BeautifulSoup, SoupStrainer
import re


######################################Titles###############################
def levelone(url):
    r = requests.get(url)
    for links in BeautifulSoup(r.content, "html.parser", parse_only=SoupStrainer('a')):
        if links.has_attr('href'):
            if 'title' in links['href']:
                titlelinks = "http://law.justia.com" + links.get('href')
                # titlelinks = "\n" + str(titlelinks)
                leveltwo(titlelinks)
                # print(titlelinks)


base_url = "http://law.justia.com/codes/alabama/2015/"
levelone(base_url)  # fails here: leveltwo has not been defined yet


########################################Chapters##########################
def leveltwo(item_url):
    r = requests.get(item_url)
    for sublinks in BeautifulSoup(r.content, "html.parser", parse_only=SoupStrainer('a')):
        if sublinks.has_attr('href'):
            if 'chapt' in sublinks['href']:
                chapterlinks = "http://law.justia.com" + sublinks.get('href')
                # chapterlinks = "\n" + str(chapterlinks)

                levelthree(chapterlinks)
                # print(chapterlinks)

# leveltwo(titlelinks)  ### I tried calling the function right here, but titlelinks is not defined.

########################################Sections##########################
def levelthree(item2_url):
    r = requests.get(item2_url)
    for sectionlinks in BeautifulSoup(r.content, "html.parser", parse_only=SoupStrainer('a')):
        if sectionlinks.has_attr('href'):
            if 'section' in sectionlinks['href']:
                href = "http://law.justia.com" + sectionlinks.get('href')
                href = "\n" + str(href)
                print(href)

Source

2016-04-09 CHballer

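Define the functions first, and only then call them. Python looks up a function name when the call actually executes, not when the enclosing function is defined. So levelone may refer to leveltwo before leveltwo appears in the file, provided levelone itself is not called until every function it reaches has been defined. In your script, levelone(base_url) runs before the def leveltwo statement has executed, which is why the name is not defined. The tutorial presumably works because its only top-level call comes after all of its function definitions.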
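
A minimal sketch of that ordering rule in isolation. The names first and second are hypothetical and are not part of the crawler; they only stand in for levelone and leveltwo.

```python
def first():
    # `second` is looked up when this line executes, not when `first` is defined.
    return second() + 1


# Calling first() at this point fails, because `second` does not exist yet.
try:
    first()
    early_call_failed = False
except NameError:
    early_call_failed = True


def second():
    return 41


# Both functions now exist, so the same call succeeds.
result = first()
print(early_call_failed, result)
```

The body of first never changes between the two calls; only the moment of the call does.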

import requests
from bs4 import BeautifulSoup, SoupStrainer
import re

########################################Sections##########################
def levelthree(item2_url):
    r = requests.get(item2_url)
    for sectionlinks in BeautifulSoup(r.content, "html.parser", parse_only=SoupStrainer('a')):
        if sectionlinks.has_attr('href'):
            if 'section' in sectionlinks['href']:
                href = "http://law.justia.com" + sectionlinks.get('href')
                href = "\n" + str(href)
                print(href)

########################################Chapters##########################
def leveltwo(item_url):
    r = requests.get(item_url)
    for sublinks in BeautifulSoup(r.content, "html.parser", parse_only=SoupStrainer('a')):
        if sublinks.has_attr('href'):
            if 'chapt' in sublinks['href']:
                chapterlinks = "http://law.justia.com" + sublinks.get('href')
                # chapterlinks = "\n" + str(chapterlinks)

                levelthree(chapterlinks)
                # print(chapterlinks)

######################################Titles###############################
def levelone(url):
    r = requests.get(url)
    for links in BeautifulSoup(r.content, "html.parser", parse_only=SoupStrainer('a')):
        if links.has_attr('href'):
            if 'title' in links['href']:
                titlelinks = "http://law.justia.com" + links.get('href')
                # titlelinks = "\n" + str(titlelinks)
                leveltwo(titlelinks)
                # print(titlelinks)

###########################################################################
base_url = "http://law.justia.com/codes/alabama/2015/"
levelone(base_url)
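
On the second part of the question, passing links from one level to the next: one possible variation is to have each level return its links as a list, so the caller decides what to do with them. The sketch below is an assumption-laden illustration, not the code from the question. It uses the standard library's html.parser and urllib.parse.urljoin instead of BeautifulSoup so that it runs without a network connection, and the names LinkCollector, extract_links, and sample_html are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags whose href contains a keyword."""

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.keyword in href:
            self.hrefs.append(href)


def extract_links(html, base_url, keyword):
    """Return absolute URLs for every matching link found in `html`."""
    collector = LinkCollector(keyword)
    collector.feed(html)
    return [urljoin(base_url, href) for href in collector.hrefs]


# Hypothetical page fragment standing in for a downloaded response body.
sample_html = (
    '<a href="/codes/alabama/2015/title-1/">Title 1</a>'
    '<a href="/about">About</a>'
)
title_links = extract_links(
    sample_html, "http://law.justia.com/codes/alabama/2015/", "title"
)
print(title_links)
```

In a real crawler, each URL in title_links would be fetched and its body passed to extract_links again with the keyword for the next level ('chapt', then 'section').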

Source

2016-04-09 08:04:11
