Iterating over and using the HTML files in a directory - python

I need to loop over the .html files in a given directory and scrape data from them. Here is my code so far; how can I access the script inside each file?

import os

directory = '/Users/xxxxx/Documents/sample/'
for filename in os.listdir(directory):
    if filename.endswith('.html'):
        print(os.path.join(directory, filename))
    else:
        continue

(System: Mac / Python 3.x)

Source

2016-12-06 reuben

You can do something like this:

import os
from bs4 import BeautifulSoup

directory = '/Users/xxxxx/Documents/sample/'
for filename in os.listdir(directory):
    if filename.endswith('.html'):
        fname = os.path.join(directory, filename)
        with open(fname, 'r') as f:
            soup = BeautifulSoup(f.read(), 'html.parser')
            # parse the html as you wish
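
If "the script inside" means the contents of the <script> tags, the "parse the html as you wish" step could look roughly like the sketch below. It deliberately swaps BeautifulSoup for the standard library's html.parser so it runs without extra packages. The names ScriptCollector and scripts_in_directory are made up for illustration, and it assumes the files are UTF-8 encoded.

```python
from html.parser import HTMLParser
from pathlib import Path


class ScriptCollector(HTMLParser):
    """Hypothetical helper: collects the text inside every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")  # one entry per <script> tag

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current entry.
        if self._in_script:
            self.scripts[-1] += data


def scripts_in_directory(directory):
    """Map each .html file name in `directory` to a list of its script bodies."""
    results = {}
    for path in sorted(Path(directory).glob("*.html")):
        collector = ScriptCollector()
        # Assumption: files are UTF-8; adjust the encoding if yours differ.
        collector.feed(path.read_text(encoding="utf-8"))
        collector.close()
        results[path.name] = collector.scripts
    return results
```

Calling scripts_in_directory('/Users/xxxxx/Documents/sample/') would return a dict keyed by file name. A <script src="..."> tag with no inline code shows up as an empty string. With BeautifulSoup, soup.find_all('script') gives you the same elements.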

Source

2016-12-06 21:28:30 Den1al
