2017-03-15
0

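Python regular expression not working

I'm working on a Python script that can download images from Flickr, among other sites. I use the Flickr API to retrieve the available sizes of the image I want to download and then identify the URL of the original size. At least, that is what I'm trying to do. Here is my code so far: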

URL = {a Flickr link} 

flickr = re.match(r".*flickr\.com\/photos\/([^\/]+)\/([0-9^\/]+)\/", URL) 
URL = "https://api.flickr.com/services/rest/?method=flickr.photos.getSizes&api_key=6002c84e96ff95c1a861eafafa4284ba&photo_id=" + flickr.group(2) + "&format=json&nojsoncallback=1" 

request = requests.get(URL) 
result = request.text 

parsed = re.match(r".\"Original\".*\"source\"\: \"([^\"]+)", result) 
URL = parsed.group(1) 

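That's what I have so far. From print() statements placed throughout my code, I know that the first regex works correctly (it parses the original Flickr URL to identify the photo ID) and that the API request works: for example, with the URL https://www.flickr.com/photos/matbellphotography/33413612735/sizes/h/ it returns the following result: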

{ "sizes": { "canblog": 0, "canprint": 0, "candownload": 1, 
"size": [ 
    { "label": "Square", "width": 75, "height": 75, "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_s.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/sq\/", "media": "photo" }, 
    { "label": "Large Square", "width": "150", "height": "150", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_q.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/q\/", "media": "photo" }, 
    { "label": "Thumbnail", "width": 100, "height": 67, "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_t.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/t\/", "media": "photo" }, 
    { "label": "Small", "width": "240", "height": "160", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_m.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/s\/", "media": "photo" }, 
    { "label": "Small 320", "width": "320", "height": "213", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_n.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/n\/", "media": "photo" }, 
    { "label": "Medium", "width": "500", "height": "333", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/m\/", "media": "photo" }, 
    { "label": "Medium 640", "width": "640", "height": "427", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_z.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/z\/", "media": "photo" }, 
    { "label": "Medium 800", "width": "800", "height": "534", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_c.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/c\/", "media": "photo" }, 
    { "label": "Large", "width": "1024", "height": "683", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_b.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/l\/", "media": "photo" }, 
    { "label": "Large 1600", "width": "1600", "height": "1067", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_4d92e2f70d_h.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/h\/", "media": "photo" }, 
    { "label": "Large 2048", "width": "2048", "height": "1365", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_81441ed1da_k.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/k\/", "media": "photo" }, 
    { "label": "Original", "width": "5760", "height": "3840", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_34cbc172c1_o.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/o\/", "media": "photo" } 
] }, "stat": "ok" } 

My code clearly breaks down after that. The second regex is meant to identify the download URL of the image at its original size, but it finds no match. According to yet another print() statement:

parsed.group(1) = none 

I built the expression with RegExr, where it identifies exactly what I need from the JSON result. What did I do wrong?

+1

I think you'd be better off using 're.search' instead of 're.match'. –

+0

@Rawing At this point I'll try almost any option! Why would that work better in this case? – Andrew
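One way to see the difference the comment above points to: 're.match' only tries the pattern at the very start of the string, while 're.search' scans forward to the first position where the pattern matches. Below is a minimal sketch. The `text` value is a shortened, hypothetical stand-in for the API response, and `pattern` is a simplified version of the second regex, not the exact expression from the question.

```python
import re

# Shortened, hypothetical stand-in for the API response text.
text = '{ "label": "Original", "width": "5760", "source": "https:\\/\\/example.com\\/photo_o.jpg" }'

# Simplified version of the second regex from the question.
pattern = r'"Original".*"source": "([^"]+)'

# re.match only tries position 0, where the text starts with '{', so it fails.
matched = re.match(pattern, text)

# re.search scans forward and matches wherever the pattern first occurs.
found = re.search(pattern, text)

print(matched)          # None, so calling matched.group(1) would raise AttributeError
print(found.group(1))   # the captured URL, still containing JSON's \/ escapes
```

Even when the search succeeds, the captured URL keeps the backslash escapes from the raw JSON text, so it would still need cleaning before use.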

+2

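Why not use a JSON parser? Then you just access the attribute that holds the data you need, and the escaping is handled for you automatically. In fact, you're already using the requests library: you can use '.json()' instead of '.text'. – Shadow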
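To illustrate the point about escaping, here is a small sketch using the standard json module. The `raw` string is a hypothetical single entry in the same shape as the API response, with example.com as a placeholder host.

```python
import json

# One entry in the shape of the API response (example.com is a placeholder).
raw = '{"label": "Original", "source": "https:\\/\\/example.com\\/photo_o.jpg"}'

entry = json.loads(raw)

# json.loads decodes the JSON escape \/ back into a plain slash.
print(entry["source"])
```

The decoded value is a normal URL string, whereas a regex run over the raw text would capture the backslashes too.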

Answers

3

Your requests.Response object has a json() method that you can call directly. Alternatively, import json, parse request.content with json.loads, and use the returned dictionary. For example:

>>> import json 
>>> json_response = """ 
... { "sizes": { "canblog": 0, "canprint": 0, "candownload": 1, 
... "size": [ 
... { "label": "Square", "width": 75, "height": 75, "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_s.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/sq\/", "media": "photo" }, 
... { "label": "Large Square", "width": "150", "height": "150", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_q.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/q\/", "media": "photo" }, 
... { "label": "Thumbnail", "width": 100, "height": 67, "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_t.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/t\/", "media": "photo" }, 
... { "label": "Small", "width": "240", "height": "160", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_m.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/s\/", "media": "photo" }, 
... { "label": "Small 320", "width": "320", "height": "213", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_n.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/n\/", "media": "photo" }, 
... { "label": "Medium", "width": "500", "height": "333", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/m\/", "media": "photo" }, 
... { "label": "Medium 640", "width": "640", "height": "427", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_z.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/z\/", "media": "photo" }, 
... { "label": "Medium 800", "width": "800", "height": "534", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_c.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/c\/", "media": "photo" }, 
... { "label": "Large", "width": "1024", "height": "683", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_645397d6a5_b.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/l\/", "media": "photo" }, 
... { "label": "Large 1600", "width": "1600", "height": "1067", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_4d92e2f70d_h.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/h\/", "media": "photo" }, 
... { "label": "Large 2048", "width": "2048", "height": "1365", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_81441ed1da_k.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/k\/", "media": "photo" }, 
... { "label": "Original", "width": "5760", "height": "3840", "source": "https:\/\/farm3.staticflickr.com\/2855\/33413612735_34cbc172c1_o.jpg", "url": "https:\/\/www.flickr.com\/photos\/matbellphotography\/33413612735\/sizes\/o\/", "media": "photo" } 
... ] }, "stat": "ok" }""" 
>>> 
>>> json_parsed = json.loads(json_response) 
>>> for img in json_parsed["sizes"]["size"]: 
...  print img.get("source") 
... 
https://farm3.staticflickr.com/2855/33413612735_645397d6a5_s.jpg 
https://farm3.staticflickr.com/2855/33413612735_645397d6a5_q.jpg 
https://farm3.staticflickr.com/2855/33413612735_645397d6a5_t.jpg 
https://farm3.staticflickr.com/2855/33413612735_645397d6a5_m.jpg 
https://farm3.staticflickr.com/2855/33413612735_645397d6a5_n.jpg 
https://farm3.staticflickr.com/2855/33413612735_645397d6a5.jpg 
https://farm3.staticflickr.com/2855/33413612735_645397d6a5_z.jpg 
https://farm3.staticflickr.com/2855/33413612735_645397d6a5_c.jpg 
https://farm3.staticflickr.com/2855/33413612735_645397d6a5_b.jpg 
https://farm3.staticflickr.com/2855/33413612735_4d92e2f70d_h.jpg 
https://farm3.staticflickr.com/2855/33413612735_81441ed1da_k.jpg 
https://farm3.staticflickr.com/2855/33413612735_34cbc172c1_o.jpg 
>>> 
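With requests, the same parsing can be done without importing json, since the response object's json() method returns the decoded dictionary. The following is only a sketch: `source_urls` is a hypothetical helper name, and `FakeResponse` is a stand-in for requests.Response so that the example runs without a network call or an API key.

```python
class FakeResponse:
    """Stand-in for requests.Response, exposing only the json() method."""

    def __init__(self, payload):
        self._payload = payload

    def json(self):
        return self._payload


def source_urls(response):
    """Return the "source" URL of every size in a getSizes-style response."""
    return [size["source"] for size in response.json()["sizes"]["size"]]


# In the real script this would be: response = requests.get(URL)
response = FakeResponse({
    "sizes": {"size": [
        {"label": "Square", "source": "https://example.com/photo_s.jpg"},
        {"label": "Original", "source": "https://example.com/photo_o.jpg"},
    ]},
    "stat": "ok",
})

for url in source_urls(response):
    print(url)
```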
+0

That worked! I had to make one small change to the print statement: adding parentheses. – Andrew

+0

Is there a way to have it identify only the URL of the entry with the label "Original"? – Andrew

+1

@Andrew: It's just a 'list' of 'dict's. Use the 'dict' for which 'what['label'] == 'Original''. –
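The suggestion above can be sketched as follows. `original_source` is a hypothetical helper name, and the `sizes` list is a shortened placeholder for the parsed "size" list from the API response.

```python
def original_source(sizes):
    """Return the "source" of the entry labelled "Original", or None if absent."""
    return next(
        (size["source"] for size in sizes if size.get("label") == "Original"),
        None,
    )


# Shortened placeholder for json_parsed["sizes"]["size"].
sizes = [
    {"label": "Large", "source": "https://example.com/photo_b.jpg"},
    {"label": "Original", "source": "https://example.com/photo_o.jpg"},
]

print(original_source(sizes))
```

Returning None when no entry carries that label lets the caller decide what to do, for instance falling back to the largest size that is listed.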
