How do I use threads correctly when the number of cities is very large?


warning: conflicting chdir during another chdir block

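How do I fetch all the locations in a city, optimize the code properly, and correctly create the folder that holds the output file? How do I check whether the folder and the file already exist, and append new text to an existing file?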

require 'open-uri' 
require 'json' 
require 'thread' 

def scrape_instagram_city_page(page) 
    cityArray = [] 
    id = 0 
    begin 
     instagram_source = open(page).read 
     content = JSON.parse(instagram_source.split("window._sharedData = ")[1].split(";</script>")[0]) 
     locationName = content['entry_data']['LocationsDirectoryPage'][0]['city_info']['name'] 
     nextpage = content['entry_data']['LocationsDirectoryPage'][0]['next_page'] 
     Dir.mkdir("#{locationName}") 
     loop do 
      id +=1 
      instagram_source = open(page+"?page=#{id}").read 
      content = JSON.parse(instagram_source.split("window._sharedData = ")[1].split(";</script>")[0]) 
      locationsList = content['entry_data']['LocationsDirectoryPage'][0]['location_list'] 
      locationsList.each do |hash| 
       cityArray.push(hash['id'].to_i) 
      end 
      if nextpage == "null" 
       break 
      end 
     Dir.chdir("#{locationName}") do 
      fileName = "#{locationName}.txt" 
      File.open(fileName, 'w') do |file| 
       cityArray.each do |item| 
        file << "https://www.instagram.com/explore/locations/#{item}/\n" 
       end 
      end 
     end 
     end 
    rescue Exception => e 
     return nil 
    end 
end 

threads = [] 
city = ["https://www.instagram.com/explore/locations/c2269433/dhewng-thailand/","https://www.instagram.com/explore/locations/c2260532/ban-poek-thailand/","https://www.instagram.com/explore/locations/c2267999/ban-wang-takrai-thailand/","https://www.instagram.com/explore/locations/c2255595/ban-nong-kho-thailand/","https://www.instagram.com/explore/locations/c2252832/ban-na-khum-thailand/","https://www.instagram.com/explore/locations/c2267577/ban-wang-khaen-thailand/","https://www.instagram.com/explore/locations/c2248064/ban-khung-mae-luk-on-thailand/","https://www.instagram.com/explore/locations/c2243370/ban-hua-dong-kheng-thailand/","https://www.instagram.com/explore/locations/c2269271/chieng-sean-thailand/","https://www.instagram.com/explore/locations/c2256442/ban-nong-phiman-thailand/","https://www.instagram.com/explore/locations/c2246490/ban-khlong-khwang-thai-thailand/"] 
city.each do |page| 
    threads << Thread.new do 
     scrape_instagram_city_page "#{page}" 
    end 
end 

threads.each(&:join)

Source

2017-11-20 Gtufc92

Before answering the question: scraping a site often violates that site's terms of service. Check them and make sure you are not doing anything illegal.

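The "current directory" changed by chdir is a process-wide setting shared by all threads. That is why an exception is raised when two threads try to change it at the same time. It has nothing to do with how many threads you create.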
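The process-wide nature of the current directory can be seen in a small experiment. This is only an illustrative sketch: the helper name `cwd_seen_after_thread_chdir` is made up for the example, and it uses a temporary directory instead of the city folders from the question.

```ruby
require 'tmpdir'

# Hypothetical helper: a worker thread calls Dir.chdir, then the main thread
# reads Dir.pwd. Returns [directory the worker switched to, directory main saw].
def cwd_seen_after_thread_chdir
  original = Dir.pwd
  Dir.mktmpdir do |tmp|
    target = File.realpath(tmp)            # resolve symlinked temp paths
    Thread.new { Dir.chdir(target) }.join  # only the worker thread calls chdir
    seen = Dir.pwd                         # read from the main thread
    Dir.chdir(original)                    # restore before the temp dir is removed
    [target, seen]
  end
end

target, seen = cwd_seen_after_thread_chdir
puts "worker switched to: #{target}"
puts "main thread sees:   #{seen}"
```

Both lines print the same path: the main thread never called chdir itself, yet its working directory moved with the worker's.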

To avoid the problem, do not change the current directory; build the file path from the folder name instead.

def scrape_instagram_city_page(page)
    cityArray = []
    id = 0
    begin
     # URI.open is the open-uri entry point on Ruby 3.0+ (plain open no longer accepts URLs)
     instagram_source = URI.open(page).read
     content = JSON.parse(instagram_source.split("window._sharedData = ")[1].split(";</script>")[0])
     locationName = content['entry_data']['LocationsDirectoryPage'][0]['city_info']['name']
     # Dir.mkdir raises Errno::EEXIST when the folder is already there
     Dir.mkdir(locationName) unless Dir.exist?(locationName)
     # Build the path instead of calling Dir.chdir, which is process-wide
     fileName = File.join(locationName, "#{locationName}.txt")
     loop do
      id += 1
      instagram_source = URI.open(page + "?page=#{id}").read
      content = JSON.parse(instagram_source.split("window._sharedData = ")[1].split(";</script>")[0])
      directory = content['entry_data']['LocationsDirectoryPage'][0]
      directory['location_list'].each do |hash|
       cityArray.push(hash['id'].to_i)
      end
      # Rewrite the file with everything collected so far
      File.open(fileName, 'w') do |file|
       cityArray.each do |item|
        file << "https://www.instagram.com/explore/locations/#{item}/\n"
       end
      end
      # JSON null parses to nil, never the string "null"; this assumes every
      # page carries next_page the way the first page does
      break if directory['next_page'].nil?
     end
    rescue StandardError => e
     # Report the failure instead of silently swallowing it
     warn "#{page}: #{e.message}"
     return nil
    end
end
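The question also asks how to check for an existing folder and append to an existing file. One possible approach, sketched below, is `FileUtils.mkdir_p` (which does nothing if the folder already exists) together with the `'a'` file mode (which appends, and creates the file when it is missing). The helper name `append_location_urls` and its `base_dir` parameter are assumptions made for this example and are not part of the original code.

```ruby
require 'fileutils'

# Hypothetical helper: append location URLs to <base_dir>/<city_name>/<city_name>.txt.
# Returns the path of the file that was written.
def append_location_urls(city_name, ids, base_dir: '.')
  dir = File.join(base_dir, city_name)
  FileUtils.mkdir_p(dir)                   # no error if the folder already exists
  path = File.join(dir, "#{city_name}.txt")
  File.open(path, 'a') do |file|           # 'a' appends; creates the file if missing
    ids.each do |id|
      file.puts "https://www.instagram.com/explore/locations/#{id}/"
    end
  end
  path
end
```

Calling it twice for the same city keeps the lines written by the first call and adds the new ones after them. No chdir is involved, so the helper is safe to call from several threads as long as each thread writes to a different city file.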

Source

2017-11-21 17:53:26

How can I use threads effectively?

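That is very subjective and broad, and cannot be answered briefly here. For example, your main performance bottleneck is very likely to be either your internet connection or the site you are connecting to.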

What is the maximum, or the effective, number of threads I can use in this case?
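There is no single correct thread count, but a common way to keep it bounded no matter how many cities there are is a fixed-size worker pool fed from a queue. The sketch below is one such arrangement, not a tuned solution. The helper name `each_in_pool` and the default `pool_size` of 4 are arbitrary assumptions, and the right size has to be found by measuring against the network and the remote site.

```ruby
# Hypothetical helper: call the block once per item, using at most pool_size
# threads. Returns an array of [item, result] pairs in completion order.
# Items must not be nil or false, because a nil pop marks the end of the queue.
def each_in_pool(items, pool_size: 4, &block)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close                              # pop returns nil once the queue is drained
  results = Queue.new
  workers = Array.new(pool_size) do
    Thread.new do
      while (item = queue.pop)
        results << [item, block.call(item)]
      end
    end
  end
  workers.each(&:join)
  Array.new(results.size) { results.pop }
end

# With the names from the question, usage might look like:
#   each_in_pool(city, pool_size: 4) { |page| scrape_instagram_city_page(page) }
```

With this layout the number of threads stays at `pool_size` even for thousands of URLs, whereas starting one thread per city grows without limit.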
