How do I scrape a website that requires credentials (SSL)?

I was wondering if someone could point me in the right direction. I want to pull the html/text content from an SSL-enabled website (https in the URL). The site's file system has multiple branches.

My question is this:

How do I supply credentials for the external website from within my Rails application?

Thanks!

Source

2012-09-25 Symba

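require 'httpclient'
require 'nokogiri'

client = HTTPClient.new

# Register the credentials; the first argument should cover the URL fetched below.
client.set_auth("http://example.com", "username", "password")

# Fetch the page with the same client and parse it with Nokogiri.
doc = Nokogiri::HTML(client.get_content("http://example.com"))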

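Hey everyone, sorry for the late response, I have been swamped with a few things. The code above worked for me (after a lot of tangoing with mechanize and other Nokogiri-based gems). Some of the other gems, such as open-uri and mechanize, were giving me errors like "MD5 Unknown hashing algorithm". Thanks for your time and help.

Source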

2012-11-06 23:10:24 Symba

Use the Typhoeus gem.

I've had this problem before as well.

ANSWER

With Typhoeus it looks like this:

1.9.3p194 :001 > Typhoeus # Checking that Typhoeus gem is being used. 
=> Typhoeus 
1.9.3p194 :002 > url = "https://twitter.com/" 
=> "https://twitter.com/" 
1.9.3p194 :003 > response = Typhoeus::Request.get(url, :timeout => 5000) 

=> #<Typhoeus::Response:0x007fdd8cc00488 @code=200, @curl_return_code=0, @curl_error_message="No error", @status_message=nil, @http_version=nil, @headers="HTTP/1.1 200 OK\r\nDate: Tue, 25 Sep 2012 23:56:32 GMT\r\nStatus: 200 OK\r\nX-Runtime: 0.08814\r\nX-MID: 0cfcab7a410834bf31115f9a5cd7fb62651aa568\r\nStrict-Transport-Security: max-age=631138519\r\nCache-Control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0\r\nContent-Type: text/html; charset=utf-8\r\nX-Frame-Options: SAMEORIGIN\r\nLast-Modified: Tue, 25 Sep 2012 23:56:32 GMT\r\nETag: \"95db45f50f8dc1a45be3895e03a23d53\"\r\nExpires: Tue, 31 Mar 1981 05:00:00 GMT\r\nX-Transaction: 72253ef75f0755e1\r\nPragma: no-cache\r\nSet-Cookie: k=10.35.35.113.1348617392068257; path=/; expires=Tue, 02-Oct-12 23:56:32 GMT; domain=.twitter.com\r\nSet-Cookie: guest_id=v1%3A134861739271966362; domain=.twitter.com; path=/; expires=Fri, 26-Sep-2014 11:56:32 GMT\r\nSet-Cookie: _twitter_sess=BAh7CToPY3JlYXRlZF9hdGwrCFBS3P85AToMY3NyZl9pZCIlNTY2MzNjOTM0%250AOTIyMDE4ZmNkY2E4NjViZmE3ZTBkMDAiCmZsYXNoSUM6J0FjdGlvbkNvbnRy%250Ab2xsZXI6OkZsYXNoOjpGbGFzaEhhc2h7AAY6CkB1c2VkewA6B2lkIiViYjAw%250AY2Q1YWZkMDAwNmExNWJhNjAyYmNiNzBhOTA0Yg%253D%253D--5ffbea931432fe65a2128be90048e3bb6fc9dbca; domain=.twitter.com; path=/; HttpOnly\r\nX-XSS-Protection: 1; mode=block\r\nVary: Accept-Encoding\r\nContent-Encoding: gzip\r\nContent-Length: 13733\r\nServer: tfe\r\n\r\n", @body="<!DOCTYPE html>\n<html lang=\"en\">\n <head>\n <meta charset=\"utf-8\">\n \n <script>document.domain='twitter.com'</script>\n\n  <title>Twitter</title>\n\n <meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge,chrome=1\">\n \n  <meta name=\"description\" content=\"Instantly connect to what&#39;s most important to you. 
Follow your friends, experts, favorite celebrities, and breaking news.\">\n \n \n  <link href=\"/favicons/favicon.ico\" rel=\"shortcut icon\" type=\"image/x-icon\">\n \n \n   <link rel=\"stylesheet\" href=\"https://twimg0-a.akamaihd.net/a/1348559220/t1/css/t1_core_logged_out.bundle.css\" type=\"text/css\" media=\"screen\">\n \n  <link rel=\"stylesheet\" href=\"https://twimg0-a.akamaihd.net/a/13485592 

1.9.3p194 :005 > response.body # returns html document 
=> "<!DOCTYPE html>\n<html lang=\"en\">\n <head>\n <meta charset=\"utf-8\">\n \n <script>document.domain='twitter.com'</script>\n\n  <title>Twitter</title>\n\n <meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge,chrome=1\">\n \n  <meta name=\"description\" content=\"Instantly connect to what&#39;s most important to you. Follow your friends, experts, favorite celebrities, and breaking news.\">\n \n \n  <link href=\"/favicons/favicon.ico\" rel=\"shortcut icon\" type=\"image/x-icon\">\n \n \n   <link rel=\"stylesheet\" href=\"https://twimg0-a.akamaihd.net/a/1348559220/t1/css/t1_core_logged_out.bundle.css\" type=\"text/css\" media=\"screen\">\n \n  <link rel=\"stylesheet\" href=\"https://twimg0-a.akamaihd.net/a/1348559220/t1/css/t1_more.bundle.css\" type=\"text/css\" media=\"screen\">\n \n   <script>\n  (function() {\n  function getPhxPath(){var a=l.href.match(/#(.)(.*)$/);return a&&a[1]==\"!\"&&a[2]}function getEvent(a){return a?(a=a.replace(/^#|\\/$/,\"\").toLowerCase(),a.match(/^[a-z0-9_]+$/)?a:!1):!1}function redirectEventPath(a){var a=getEvent(a);if(a){var b=document.referrer||\"none\",c=\"ev_redir_\"+a+\"=\"+b+\"; path=/\";document.cookie=c,l.replace(\"/hashtag/\"+a)}}function resolveInlineRedirects(){var a=getPhxPath();a&&l.replace(a),l.hash!=\"\"&&redirectEventPath(l.hash.substr(1).toLowerCase())}var l=window.location;resolveInlineRedirects(),window.addEventListener?window.addEventListener(\"hashchange\",resolveInlineRedirects,!1):window.attachEvent&&window.attachEvent(\"onhashchange\",resolveInlineRedirects);\n  }());\n  </script>\n \n <script>\n  \n  \n  (func

Good luck if you use it!

Source

2012-09-26 00:01:35

Thanks, I'll give this a try! – Symba

Net/http does support https, but you need to set the use_ssl flag when making the request –

Thanks Frederick. Removed it now. –
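The use_ssl flag mentioned in the comment above is set on the Net::HTTP connection object. Here is a minimal sketch using only the standard library, assuming the site uses HTTP Basic authentication. The URL, user name and password are placeholders, and build_request and fetch are hypothetical helper names, not part of any library.

```ruby
require "net/http"
require "uri"

# Build a connection and a GET request carrying Basic credentials.
# Nothing is sent over the network until `fetch` is called.
def build_request(url, user, password)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https") # required for https URLs
  request = Net::HTTP::Get.new(uri.request_uri)
  request.basic_auth(user, password)     # adds the Authorization header
  [http, request]
end

# Send the request and return the response body (performs network I/O).
def fetch(url, user, password)
  http, request = build_request(url, user, password)
  http.request(request).body
end

# Placeholder URL and credentials, not a real endpoint.
http, request = build_request("https://example.com/private", "username", "password")
puts http.use_ssl?
puts request["Authorization"]
```

The body returned by fetch can be handed to Nokogiri::HTML in the same way as in the httpclient answer.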

I can help with this. It's actually not that hard.

open("http://...", :http_basic_authentication=>[user, password])
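On Ruby 3.0 and later, Kernel#open no longer opens URLs, so the same call has to go through URI.open. The following is a small sketch of that variant; fetch_with_basic_auth is a hypothetical helper name and the URL and credentials in the usage comment are placeholders.

```ruby
require "open-uri"

# Fetch a page protected by HTTP Basic authentication and return its body.
# open-uri sends the credentials in the Authorization header.
def fetch_with_basic_auth(url, user, password)
  URI.open(url, http_basic_authentication: [user, password]) do |io|
    io.read
  end
end

# Usage (placeholder values, performs network I/O when uncommented):
# html = fetch_with_basic_auth("https://example.com/private", "username", "password")
```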

If you want to parse the pages, you could also adapt my crawler. Here is its main method.

require "open-uri"
require "timeout"
require "zlib"

SHINSO_HEADERS = {
  'Accept'          => '*/*',
  'Accept-Charset'  => 'utf-8, windows-1251;q=0.7, *;q=0.6',
  'Accept-Encoding' => 'gzip,deflate',
  'Accept-Language' => 'bg-BG, bg;q=0.8, en;q=0.7, *;q=0.6',
  'Connection'      => 'keep-alive',
  'Cookie'          => '',
  'From'            => '[email protected]',
  'Referer'         => 'http://svejo.net/',
  'User-Agent'      => 'Your user agent'
}

# Assumes the enclosing class defines `attr_accessor :errors`.
def crawl(url_address)
  self.errors = []
  begin
    url_address = URI.parse(url_address)
  rescue URI::InvalidURIError
    # URI.decode/URI.encode were removed in Ruby 3.0; the RFC 2396 parser
    # offers the same unescape/escape behaviour.
    parser = URI::RFC2396_Parser.new
    url_address = URI.parse(parser.escape(parser.unescape(url_address)))
  end
  url_address.normalize!
  # Give up if the server does not answer within 8 seconds.
  stream = Timeout.timeout(8) { url_address.open(SHINSO_HEADERS) }
  if stream.size > 0
    # base_uri is the final URL after any redirects.
    url_crawled = URI.parse(stream.base_uri.to_s)
  else
    self.errors << "Server said status 200 OK but document file is zero bytes."
    nil
  end
rescue StandardError => exception
  self.errors << exception
  nil
end

url_crawled is what you ultimately need.

Use this address for testing: https://developer.mozilla.org/en-US/docs/HTTP_access_control

Servers are sometimes misconfigured, so check that the site's certificate is actually valid.
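One way to act on the certificate advice is to make verification explicit, so that a misconfigured server fails loudly instead of being silently trusted. This is a sketch using the standard library, assuming the system's CA bundle is the right trust source. verified_connection and certificate_ok? are hypothetical helper names, and example.com is a placeholder.

```ruby
require "net/http"
require "openssl"
require "uri"

# Build an HTTPS connection that insists on a valid certificate chain.
def verified_connection(url)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER # reject invalid certificates
  store = OpenSSL::X509::Store.new
  store.set_default_paths                       # trust the system CA certificates
  http.cert_store = store
  http
end

# Perform a TLS handshake and report whether verification succeeded.
# This does network I/O, so it is only defined here, not called.
def certificate_ok?(url)
  verified_connection(url).start { true }
rescue OpenSSL::SSL::SSLError => e
  warn "Certificate problem: #{e.message}"
  false
end

http = verified_connection("https://example.com/")
puts http.verify_mode == OpenSSL::SSL::VERIFY_PEER
```

Calling certificate_ok? with the test address above performs a real handshake and returns false when the certificate does not verify.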

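Also, if you run into serious parsing problems, consider reading the content with Zlib, guessing its charset with the CharGuess gem, and converting the problematic parts with Iconv. Here is an example.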

if stream.content_encoding.include?('gzip')
    document = Zlib::GzipReader.new(stream).read
elsif stream.content_encoding.include?('deflate')
    # Inflate (decompress) the body; Zlib::Deflate would compress it instead.
    document = Zlib::Inflate.inflate(stream.read)
#elsif stream.content_encoding.include?('x-gzip') or
#elsif stream.content_encoding.include?('compress')
else
    document = stream.read
end
self.charset_guess = CharGuess.guess(document)

Then use Iconv on the content.
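Iconv was removed from Ruby's standard library in Ruby 2.0, so on current Rubies the same conversion is usually done with String#encode. Here is a minimal sketch of that substitute, assuming the guessed charset is windows-1251 (one of the charsets listed in the Accept-Charset header of the crawler). to_utf8 is a hypothetical helper name and the sample bytes are made up for illustration.

```ruby
# Convert raw bytes in a known (or guessed) charset to UTF-8.
# Bytes that cannot be converted are replaced with "?" instead of raising.
def to_utf8(bytes, source_charset)
  bytes.dup
       .force_encoding(source_charset)
       .encode("UTF-8", invalid: :replace, undef: :replace, replace: "?")
end

# A short Cyrillic word encoded as windows-1251 bytes (sample data).
raw = "\xCF\xF0\xE8\xE2\xE5\xF2".b
puts to_utf8(raw, "windows-1251")
```

In the crawler, document would take the place of raw and the value returned by CharGuess would take the place of the hard-coded charset name.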

Hope this helps.

Regards, Yavor

Source

2012-09-26 15:04:40
