2016-04-01 12 views
1

私はいくつかのYouTubeチャンネルから新しい音楽を自動的に確認してダウンロードする小さなプログラムに取り組んでいます。私は現在、各チャンネルにアップロードされたすべての動画のリンクを取得する方法を検討しています。これはスクレーパーのようにしています。 (はい、YouTubeのAPIは、おそらく行くための適切な方法だろうが、私はまだちゃんとそれを使用する方法がわからない。)Python拡張YouTubeアップロード

from __future__ import unicode_literals 
from bs4 import BeautifulSoup 
import urllib.request 

ytlink = 'https://www.youtube.com/channel/UCUvoulvwzCnUVk7yoduI_Gw/videos' 
r = urllib.request.urlopen(ytlink).read() 
soup = BeautifulSoup(r, "html.parser") 
links = soup.find_all('a', {"class": "yt-uix-sessionlink yt-uix-tile-link spf-link yt-ui-ellipsis yt-ui-ellipsis-2"}) 

for tag in links: 
    link = tag.get('href', None) 
    if link is not None: 
     print(link) 

これは私が現在持っているもので、問題は、それが現在だけであり、彼らはスクリーン上の唯一のものなので、最初の30ビデオリンクをつかむ。私はすでに"Load More"ボタンが押されると、いくつかのJavaScriptによって開始されるいくつかのAjaxを実行することを見てきました。私の質問です:すべてのアップロードが表示されるまで、"Load More"ボタンをトリガーし続けるようにPythonを取得するにはどうすればよいですか?

+0

あなたがすることはできません。私たちはまったく同じ出力を参照してください実行する場合

from selenium import webdriver from selenium.common.exceptions import NoSuchElementException,StaleElementReferenceException ytlink = 'https://www.youtube.com/channel/UCUvoulvwzCnUVk7yoduI_Gw/videos' hrefs = "a.yt-uix-sessionlink.yt-uix-tile-link.spf-link.yt-ui-ellipsis.yt-ui-ellipsis-2" ajax= "button[data-uix-load-more-href]" dr = webdriver.PhantomJS() dr.get(ytlink) while True: try: load_mode_b = dr.find_element_by_css_selector(ajax) load_mode_b.click() except StaleElementReferenceException as e: print(e) except NoSuchElementException as e: print(e) break 

あなたは次のようになりますどのPhantomJsseleniumを使用することができますブラウザなどがあります。 BeautifulSoupは実際にJSをあなたのために解析する機能を持っていないので、SeleniumやPython Mechanizeは、これらのようなタスクを探す必要があります。 –

+0

まあ、それはあなたが運が良ければ、AJAX呼び出しが送られるURLを見つけることができますが、それは非常に厄介なものになります。 –

+0

申し訳ありませんが、私はそれを調べるでしょう。ありがとう。 –

答えて

1

あなたは簡単にAjaxの呼び出しを模倣し、JSONの出力が返さ解析することができ、私達はちょうど/browse_ajax?action_continuation=... URLを引くと、それはJSONでもはやなるまで要求し続けるんする必要が返さ:

あなたを与えるだろう
from bs4 import BeautifulSoup 
import requests 
from urlparse import urljoin # python 3 -> from urllib.parse import urljoin 


def get_links(): 
    # cretate all css selectors 
    ytlink = 'https://www.youtube.com/channel/UCUvoulvwzCnUVk7yoduI_Gw/videos' 
    ajax_css = "button[data-uix-load-more-href]" 
    link_css = "a.yt-uix-sessionlink.yt-uix-tile-link.spf-link.yt-ui-ellipsis.yt-ui-ellipsis-2" 
    base = "https://www.youtube.com/" 


    r = requests.get(ytlink).content 
    soup = BeautifulSoup(r, "lxml") 

    # yield first visible links 
    for link in soup.select(link_css): 
     yield urljoin(base, link["href"]) 

    # Load more button 
    ajax = soup.select(ajax_css)[0]["data-uix-load-more-href"] 

    while True: 
     print(ajax) 
     r = requests.get(urljoin('https://www.youtube.com/', ajax)) 

     # next html is stored in the json.values() 
     soup = BeautifulSoup("".join(r.json().values()), "lxml") 
     for link in soup.select(link_css): 
      yield urljoin(base, link["href"]) 

     ajax = soup.select(ajax_css) 
     # if empty "Load more" button would be gone 
     if not ajax: 
      break 
     ajax = ajax[0]["data-uix-load-more-href"] 

をすべての87リンク。

In [26]: links = list(get_links()) 
/browse_ajax?action_continuation=1&continuation=4qmFsgJAEhhVQ1V2b3Vsdnd6Q25VVms3eW9kdUlfR3caJEVnWjJhV1JsYjNNZ0FEZ0JZQUZxQUhvQk1yZ0JBQSUzRCUzRA%253D%253D 
/browse_ajax?action_continuation=1&continuation=4qmFsgJAEhhVQ1V2b3Vsdnd6Q25VVms3eW9kdUlfR3caJEVnWjJhV1JsYjNNZ0FEZ0JZQUZxQUhvQk03Z0JBQSUzRCUzRA%253D%253D 

In [27]: len(links) 
Out[27]: 87 

In [28]: print(links) 
['https://www.youtube.com/watch?v=kjmzIu4VJEY', 'https://www.youtube.com/watch?v=ecRpNV8Xob8', 'https://www.youtube.com/watch?v=mdHoaoAhnMo', 'https://www.youtube.com/watch?v=3oqBKEvdrqE', 'https://www.youtube.com/watch?v=VIbvfOd34-A', 'https://www.youtube.com/watch?v=x4G8ge1VO5s', 'https://www.youtube.com/watch?v=EkW0f2iUOCc', 'https://www.youtube.com/watch?v=Ex2NIeXfYl8', 'https://www.youtube.com/watch?v=XMd4pSX-aVs', 'https://www.youtube.com/watch?v=ZS7KjUjlLWA', 'https://www.youtube.com/watch?v=ZEq9sQJLOgg', 'https://www.youtube.com/watch?v=nSgaCowC5TY', 'https://www.youtube.com/watch?v=nV5Ive_zJT4', 'https://www.youtube.com/watch?v=snThWzMroaA', 'https://www.youtube.com/watch?v=Ud6YhBCucPg', 'https://www.youtube.com/watch?v=1nSfyivyxdg', 'https://www.youtube.com/watch?v=b7hf2wqpUY4', 'https://www.youtube.com/watch?v=cVBvxkVt9wc', 'https://www.youtube.com/watch?v=pcI25yU9yso', 'https://www.youtube.com/watch?v=EMIZZS8HY8A', 'https://www.youtube.com/watch?v=xWD3Zi23rIs', 'https://www.youtube.com/watch?v=M-IbllcTi64', 'https://www.youtube.com/watch?v=U_tW_UxG8bM', 'https://www.youtube.com/watch?v=vQd0mopVnQg', 'https://www.youtube.com/watch?v=mG8NJlsg4rI', 'https://www.youtube.com/watch?v=PsaNY6xpnKY', 'https://www.youtube.com/watch?v=839h3eZMSWA', 'https://www.youtube.com/watch?v=Q_yytPtWmP0', 'https://www.youtube.com/watch?v=oGESQfB9dYM', 'https://www.youtube.com/watch?v=mO5R-1uTJhg', 'https://www.youtube.com/watch?v=wgqLck9SFOc', 'https://www.youtube.com/watch?v=GCaFEsxd-Y8', 'https://www.youtube.com/watch?v=VlpMbnOqP20', 'https://www.youtube.com/watch?v=bj1QT5bxFlA', 'https://www.youtube.com/watch?v=SMtKCu6a7gQ', 'https://www.youtube.com/watch?v=RV6x33mf4WI', 'https://www.youtube.com/watch?v=WhlXuTtmNqE', 'https://www.youtube.com/watch?v=7TWN1G5e-tg', 'https://www.youtube.com/watch?v=jgjeYTkROyk', 'https://www.youtube.com/watch?v=0hFkFoOf-aA', 'https://www.youtube.com/watch?v=yH1u_KQapfw', 'https://www.youtube.com/watch?v=5-l-FGDsbjw', 'https://www.youtube.com/watch?v=sFSgyE64Jjw', 'https://www.youtube.com/watch?v=OhDBtfvv2BM', 'https://www.youtube.com/watch?v=uFgPFi04oTo', 'https://www.youtube.com/watch?v=58a45EfYv1g', 'https://www.youtube.com/watch?v=jtYl5TbK2nc', 'https://www.youtube.com/watch?v=TI-1qxoDRnw', 'https://www.youtube.com/watch?v=Q0M90HqibHI', 'https://www.youtube.com/watch?v=Llb19v7QiXU', 'https://www.youtube.com/watch?v=sqhL_Ms6vuY', 'https://www.youtube.com/watch?v=YFFRgAjXs1Y', 'https://www.youtube.com/watch?v=8eHFG5AACHI', 'https://www.youtube.com/watch?v=_eVOx8Sw9Jg', 'https://www.youtube.com/watch?v=9s_XvG3M-UI', 'https://www.youtube.com/watch?v=lzdO01_tKFo', 'https://www.youtube.com/watch?v=uA2KkxfSW_U', 'https://www.youtube.com/watch?v=29Lt1LQtp5k', 'https://www.youtube.com/watch?v=nfJ9p5iJGz8', 'https://www.youtube.com/watch?v=cjMHd1xVlS0', 'https://www.youtube.com/watch?v=tkZ0FISTxkk', 'https://www.youtube.com/watch?v=bkhD8kYi4MI', 'https://www.youtube.com/watch?v=_bQajpTnOrY', 'https://www.youtube.com/watch?v=XglzEbcjP8c', 'https://www.youtube.com/watch?v=KBszbh6Qwag', 'https://www.youtube.com/watch?v=rVGWndVjCYg', 'https://www.youtube.com/watch?v=AgJxj2cUoyQ', 'https://www.youtube.com/watch?v=TaEVwakp_rI', 'https://www.youtube.com/watch?v=-YnpS-IaYCw', 'https://www.youtube.com/watch?v=sEFSFU2a9CY', 'https://www.youtube.com/watch?v=Jc2aVD4pwnk', 'https://www.youtube.com/watch?v=aY1dOJEv4j4', 'https://www.youtube.com/watch?v=bwjXt2pWoBE', 'https://www.youtube.com/watch?v=Dqn26tWxNsI', 'https://www.youtube.com/watch?v=wiv6JqGhcCU', 'https://www.youtube.com/watch?v=IFi47HLPqoM', 'https://www.youtube.com/watch?v=N1zdWugNdy0', 'https://www.youtube.com/watch?v=ngOBscDs3T4', 'https://www.youtube.com/watch?v=RT5dQVZ-VQY', 'https://www.youtube.com/watch?v=bifExgZW7k0', 'https://www.youtube.com/watch?v=fBEbaEgox1Y', 'https://www.youtube.com/watch?v=wDy9aGFngkY', 'https://www.youtube.com/watch?v=i06Iv0k5fVY', 'https://www.youtube.com/watch?v=2NaRXV7uyPE', 'https://www.youtube.com/watch?v=Hl0nIoLJUU0', 'https://www.youtube.com/watch?v=iXo0T4dRdgA', 'https://www.youtube.com/watch?v=i-7H5Wq0_2Y'] 

私はあなたがそれがどのように変化するかを見ることができるでprint(ajax)コールを残しました。あなたはヘッドレスを使用しない限り、

In [32]: l = [a.get_attribute("href") for a in dr.find_elements_by_css_selector(hrefs)] 

In [33]: len(l) 
Out[33]: 87 

In [34]: print(l) 
[u'https://www.youtube.com/watch?v=kjmzIu4VJEY', u'https://www.youtube.com/watch?v=ecRpNV8Xob8', u'https://www.youtube.com/watch?v=mdHoaoAhnMo', u'https://www.youtube.com/watch?v=3oqBKEvdrqE', u'https://www.youtube.com/watch?v=VIbvfOd34-A', u'https://www.youtube.com/watch?v=x4G8ge1VO5s', u'https://www.youtube.com/watch?v=EkW0f2iUOCc', u'https://www.youtube.com/watch?v=Ex2NIeXfYl8', u'https://www.youtube.com/watch?v=XMd4pSX-aVs', u'https://www.youtube.com/watch?v=ZS7KjUjlLWA', u'https://www.youtube.com/watch?v=ZEq9sQJLOgg', u'https://www.youtube.com/watch?v=nSgaCowC5TY', u'https://www.youtube.com/watch?v=nV5Ive_zJT4', u'https://www.youtube.com/watch?v=snThWzMroaA', u'https://www.youtube.com/watch?v=Ud6YhBCucPg', u'https://www.youtube.com/watch?v=1nSfyivyxdg', u'https://www.youtube.com/watch?v=b7hf2wqpUY4', u'https://www.youtube.com/watch?v=cVBvxkVt9wc', u'https://www.youtube.com/watch?v=pcI25yU9yso', u'https://www.youtube.com/watch?v=EMIZZS8HY8A', u'https://www.youtube.com/watch?v=xWD3Zi23rIs', u'https://www.youtube.com/watch?v=M-IbllcTi64', u'https://www.youtube.com/watch?v=U_tW_UxG8bM', u'https://www.youtube.com/watch?v=vQd0mopVnQg', u'https://www.youtube.com/watch?v=mG8NJlsg4rI', u'https://www.youtube.com/watch?v=PsaNY6xpnKY', u'https://www.youtube.com/watch?v=839h3eZMSWA', u'https://www.youtube.com/watch?v=Q_yytPtWmP0', u'https://www.youtube.com/watch?v=oGESQfB9dYM', u'https://www.youtube.com/watch?v=mO5R-1uTJhg', u'https://www.youtube.com/watch?v=wgqLck9SFOc', u'https://www.youtube.com/watch?v=GCaFEsxd-Y8', u'https://www.youtube.com/watch?v=VlpMbnOqP20', u'https://www.youtube.com/watch?v=bj1QT5bxFlA', u'https://www.youtube.com/watch?v=SMtKCu6a7gQ', u'https://www.youtube.com/watch?v=RV6x33mf4WI', u'https://www.youtube.com/watch?v=WhlXuTtmNqE', u'https://www.youtube.com/watch?v=7TWN1G5e-tg', u'https://www.youtube.com/watch?v=jgjeYTkROyk', u'https://www.youtube.com/watch?v=0hFkFoOf-aA', u'https://www.youtube.com/watch?v=yH1u_KQapfw', u'https://www.youtube.com/watch?v=5-l-FGDsbjw', u'https://www.youtube.com/watch?v=sFSgyE64Jjw', u'https://www.youtube.com/watch?v=OhDBtfvv2BM', u'https://www.youtube.com/watch?v=uFgPFi04oTo', u'https://www.youtube.com/watch?v=58a45EfYv1g', u'https://www.youtube.com/watch?v=jtYl5TbK2nc', u'https://www.youtube.com/watch?v=TI-1qxoDRnw', u'https://www.youtube.com/watch?v=Q0M90HqibHI', u'https://www.youtube.com/watch?v=Llb19v7QiXU', u'https://www.youtube.com/watch?v=sqhL_Ms6vuY', u'https://www.youtube.com/watch?v=YFFRgAjXs1Y', u'https://www.youtube.com/watch?v=8eHFG5AACHI', u'https://www.youtube.com/watch?v=_eVOx8Sw9Jg', u'https://www.youtube.com/watch?v=9s_XvG3M-UI', u'https://www.youtube.com/watch?v=lzdO01_tKFo', u'https://www.youtube.com/watch?v=uA2KkxfSW_U', u'https://www.youtube.com/watch?v=29Lt1LQtp5k', u'https://www.youtube.com/watch?v=nfJ9p5iJGz8', u'https://www.youtube.com/watch?v=cjMHd1xVlS0', u'https://www.youtube.com/watch?v=tkZ0FISTxkk', u'https://www.youtube.com/watch?v=bkhD8kYi4MI', u'https://www.youtube.com/watch?v=_bQajpTnOrY', u'https://www.youtube.com/watch?v=XglzEbcjP8c', u'https://www.youtube.com/watch?v=KBszbh6Qwag', u'https://www.youtube.com/watch?v=rVGWndVjCYg', u'https://www.youtube.com/watch?v=AgJxj2cUoyQ', u'https://www.youtube.com/watch?v=TaEVwakp_rI', u'https://www.youtube.com/watch?v=-YnpS-IaYCw', u'https://www.youtube.com/watch?v=sEFSFU2a9CY', u'https://www.youtube.com/watch?v=Jc2aVD4pwnk', u'https://www.youtube.com/watch?v=aY1dOJEv4j4', u'https://www.youtube.com/watch?v=bwjXt2pWoBE', u'https://www.youtube.com/watch?v=Dqn26tWxNsI', u'https://www.youtube.com/watch?v=wiv6JqGhcCU', u'https://www.youtube.com/watch?v=IFi47HLPqoM', u'https://www.youtube.com/watch?v=N1zdWugNdy0', u'https://www.youtube.com/watch?v=ngOBscDs3T4', u'https://www.youtube.com/watch?v=RT5dQVZ-VQY', u'https://www.youtube.com/watch?v=bifExgZW7k0', u'https://www.youtube.com/watch?v=fBEbaEgox1Y', u'https://www.youtube.com/watch?v=wDy9aGFngkY', u'https://www.youtube.com/watch?v=i06Iv0k5fVY', u'https://www.youtube.com/watch?v=2NaRXV7uyPE', u'https://www.youtube.com/watch?v=Hl0nIoLJUU0', u'https://www.youtube.com/watch?v=iXo0T4dRdgA', u'https://www.youtube.com/watch?v=i-7H5Wq0_2Y'] 
関連する問題