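The following code scrapes the brand name and product name from a list of URLs. The URLs are stored in an .xlsx file, and the output is written to an .xls file. How can I add column headers to the output file using Python?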
```python
import requests
from bs4 import BeautifulSoup
import xlrd
import xlwt

# Read the list of product URLs from the first column of the input workbook.
# Note: xlrd 2.0 and later only read .xls files; opening an .xlsx file this way
# requires an older xlrd release (< 2.0).
file_location = "C:/Users/Nitin Kansal/Desktop/Facets Project/Jabong ALL/Jabong/input.xlsx"
workbook = xlrd.open_workbook(file_location)
sheet = workbook.sheet_by_index(0)
products = []
for r in range(sheet.nrows):
    products.append(sheet.cell_value(r, 0))

# Create the output workbook (.xls, written by xlwt).
book = xlwt.Workbook(encoding="utf-8", style_compression=0)
sheet = book.add_sheet("Sheet11", cell_overwrite_ok=True)

for index, url in enumerate(products):
    source = requests.get(url)
    data = source.content
    soup = BeautifulSoup(data, "lxml")

    # Column 0: the URL itself
    sheet.write(index, 0, url)

    # Column 1: brand name (empty if the page has no .brand element)
    try:
        Brand = soup.select(".brand")[0].text
        sheet.write(index, 1, Brand)
    except Exception:
        sheet.write(index, 1, "")

    # Column 2: product name (empty if the page has no .product-title element)
    try:
        Product_Name = soup.select(".product-title")[0].text
        sheet.write(index, 2, Product_Name)
    except Exception:
        sheet.write(index, 2, "")

book.save("Jabong Output.xls")
```
The output is as follows:
```
http://www.jabong.com/belle-fille-Grey-Solid-Winter-Jacket-1310773.html Belle Fille Grey Solid Winter Jacket
http://www.jabong.com/Femella-Red-Solid-Winter-Jacket-2880302.html Femella Red Solid Winter Jacket
http://www.jabong.com/Style-Quotient-Fuchsia-Striped-Sweatshirt-2765328.html Style Quotient Fuchsia Striped Sweatshirt
```
I need to add a header row to the output so that it looks like this:
```
URL Brand Product_Name
http://www.jabong.com/belle-fille-Grey-Solid-Winter-Jacket-1310773.html Belle Fille Grey Solid Winter Jacket
http://www.jabong.com/Femella-Red-Solid-Winter-Jacket-2880302.html Femella Red Solid Winter Jacket
http://www.jabong.com/Style-Quotient-Fuchsia-Striped-Sweatshirt-2765328.html Style Quotient Fuchsia Striped Sweatshirt
```
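One way to get this layout, sketched minimally and not the only approach: write the header strings into row 0 before writing any data, then shift every data row down by one so it does not overwrite the header. The names `HEADERS`, `header_and_data_cells` and `write_cells` below are hypothetical helpers for illustration; the only spreadsheet call assumed is `sheet.write(row, col, value)`, which the question's xlwt code already uses.

```python
# Hypothetical header titles, matching the desired output above.
HEADERS = ("URL", "Brand", "Product_Name")


def header_and_data_cells(rows, headers=HEADERS):
    """Return (row, col, value) triples: headers in row 0, data from row 1 on."""
    cells = [(0, col, title) for col, title in enumerate(headers)]
    for index, row in enumerate(rows):
        for col, value in enumerate(row):
            # Shift each data row down by one so the header row stays intact.
            cells.append((index + 1, col, value))
    return cells


def write_cells(sheet, cells):
    """Write the triples to any object with write(row, col, value),
    such as an xlwt Worksheet."""
    for row, col, value in cells:
        sheet.write(row, col, value)
```

Applied directly to the script in the question, this amounts to calling `sheet.write(0, col, title)` for each header title before the `for index, url in enumerate(products):` loop and changing `index` to `index + 1` in every `sheet.write` call inside that loop.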