Django query with annotations and conditional count too slow

I have this query with annotations, aggregations and conditional expressions, and it runs very slowly, taking forever.
I have two models, one that stores Instagram publications and another that stores Twitter publications. Each publication also has a FK to another model that represents a hexagonal geographic area inside a city.
Publication [FK] -> HexagonalCityArea
TwitterPublication [FK] -> HexagonalCityArea
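For context, a minimal sketch of the models involved, inferred from the generated SQL further down; field types beyond the column names are assumptions:

    # Minimal model sketch reconstructed from the generated SQL below.
    # Only the fields visible in the SQL are shown; the geometry type is
    # an assumption, and the location FKs are omitted for brevity.
    from django.contrib.gis.db import models

    class City(models.Model):
        name = models.CharField(max_length=255)

    class HexagonalCityArea(models.Model):
        geom = models.PolygonField()  # assumed GeoDjango polygon
        city = models.ForeignKey(City, on_delete=models.CASCADE)

    class Publication(models.Model):  # Instagram publications
        publication_date = models.DateTimeField()
        hexagon = models.ForeignKey(HexagonalCityArea, on_delete=models.CASCADE)

    class TwitterPublication(models.Model):
        publication_date = models.DateTimeField()
        hexagon = models.ForeignKey(HexagonalCityArea, on_delete=models.CASCADE)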
I am trying to count the publications that fall in each hexagon, where the publications have been pre-filtered on other fields such as the date. This is the code:
from django.db.models import Case, Count, IntegerField, When

instagram_publications_ids = list(instagram_publications.values_list('id', flat=True))
twitter_publications_ids = list(twitter_publications.values_list('id', flat=True))
print "\n[HEXAGONS QUERY]> List of publications ids insta\n %s \n" % instagram_publications.query
print instagram_publications.explain()
print "\n[HEXAGONS QUERY]> List of publications ids twitter\n %s \n" % twitter_publications.query
print twitter_publications.explain()
# Get count of publications by hexagon
resultant_hexagons = HexagonalCityArea.objects.filter(city=city).annotate(
    instagram_count=Count(Case(
        When(publication__id__in=instagram_publications_ids, then=1),
        output_field=IntegerField(),
    ))
).annotate(
    twitter_count=Count(Case(
        When(twitterpublication__id__in=twitter_publications_ids, then=1),
        output_field=IntegerField(),
    ))
)  # .filter(instagram_count__gt=0).filter(twitter_count__gt=0)  # Discard empty hexagons
# For debug only
print "\n[HEXAGONS QUERY]> Count of publications\n %s \n" % resultant_hexagons.query
print resultant_hexagons.explain()
resultant_hexagons_list = list(resultant_hexagons)
# Iterate remaining hexagons
city_hexagons = [h for h in resultant_hexagons_list if h.instagram_count > 0 or h.twitter_count > 0]
As you can see, first I get the list of ids of the selected publications, and later I use them to count only those publications.
One problem I see is that the list of ids is very long, around 28000 elements. But if I don't use the list of ids I don't get the desired result: the count conditions don't work correctly and all the publications in the city get counted.
I tried this to avoid using the list of ids:
resultant_hexagons = HexagonalCityArea.objects.filter(city=city).annotate(
    instagram_count=Count(Case(
        When(publication__in=instagram_publications, then=1),
        output_field=IntegerField(),
    ))
).annotate(
    twitter_count=Count(Case(
        When(twitterpublication__in=twitter_publications, then=1),
        output_field=IntegerField(),
    ))
).filter(instagram_count__gt=0).filter(twitter_count__gt=0)  # Discard empty hexagons
# For debug only
print "\n[HEXAGONS QUERY]> Count of publications\n %s \n" % resultant_hexagons.query
print resultant_hexagons.explain()
Here is the generated SQL (note that `str(queryset.query)` prints query parameters unquoted, which is why the city name and the dates appear without quotes):
SELECT
"instanalysis_hexagonalcityarea"."id",
"instanalysis_hexagonalcityarea"."created",
"instanalysis_hexagonalcityarea"."modified",
"instanalysis_hexagonalcityarea"."geom",
"instanalysis_hexagonalcityarea"."city_id",
COUNT(
CASE
WHEN
"instanalysis_publication"."id" IN
(
SELECT
U0."id"
FROM
"instanalysis_publication" U0
INNER JOIN
"instanalysis_instagramlocation" U1
ON (U0."location_id" = U1."id")
INNER JOIN
"instanalysis_spot" U2
ON (U1."spot_id" = U2."id")
INNER JOIN
"instanalysis_city" U3
ON (U2."city_id" = U3."id")
WHERE
(
U3."name" = Durban
AND U0."publication_date" >= 2016 - 12 - 01 00:00:00 + 01:00
AND U0."publication_date" <= 2016 - 12 - 11 00:00:00 + 01:00
)
)
THEN
1
ELSE
NULL
END
) AS "instagram_count", COUNT(
CASE
WHEN
"instanalysis_twitterpublication"."id" IN
(
SELECT
U0."id"
FROM
"instanalysis_twitterpublication" U0
INNER JOIN
"instanalysis_twitterlocation" U1
ON (U0."location_id" = U1."id")
INNER JOIN
"instanalysis_spot" U2
ON (U1."spot_id" = U2."id")
INNER JOIN
"instanalysis_city" U3
ON (U2."city_id" = U3."id")
WHERE
(
U3."name" = Durban
AND U0."publication_date" >= 2016 - 12 - 01 00:00:00 + 01:00
AND U0."publication_date" <= 2016 - 12 - 11 00:00:00 + 01:00
)
)
THEN
1
ELSE
NULL
END
) AS "twitter_count"
FROM
"instanalysis_hexagonalcityarea"
LEFT OUTER JOIN
"instanalysis_publication"
ON ("instanalysis_hexagonalcityarea"."id" = "instanalysis_publication"."hexagon_id")
LEFT OUTER JOIN
"instanalysis_twitterpublication"
ON ("instanalysis_hexagonalcityarea"."id" = "instanalysis_twitterpublication"."hexagon_id")
WHERE
"instanalysis_hexagonalcityarea"."city_id" = 7
GROUP BY
"instanalysis_hexagonalcityarea"."id"
HAVING
(COUNT(
CASE
WHEN
"instanalysis_publication"."id" IN
(
SELECT
U0."id"
FROM
"instanalysis_publication" U0
INNER JOIN
"instanalysis_instagramlocation" U1
ON (U0."location_id" = U1."id")
INNER JOIN
"instanalysis_spot" U2
ON (U1."spot_id" = U2."id")
INNER JOIN
"instanalysis_city" U3
ON (U2."city_id" = U3."id")
WHERE
(
U3."name" = Durban
AND U0."publication_date" >= 2016 - 12 - 01 00:00:00 + 01:00
AND U0."publication_date" <= 2016 - 12 - 11 00:00:00 + 01:00
)
)
THEN
1
ELSE
NULL
END
) > 0
AND COUNT(
CASE
WHEN
"instanalysis_twitterpublication"."id" IN
(
SELECT
U0."id"
FROM
"instanalysis_twitterpublication" U0
INNER JOIN
"instanalysis_twitterlocation" U1
ON (U0."location_id" = U1."id")
INNER JOIN
"instanalysis_spot" U2
ON (U1."spot_id" = U2."id")
INNER JOIN
"instanalysis_city" U3
ON (U2."city_id" = U3."id")
WHERE
(
U3."name" = Durban
AND U0."publication_date" >= 2016 - 12 - 01 00:00:00 + 01:00
AND U0."publication_date" <= 2016 - 12 - 11 00:00:00 + 01:00
)
)
THEN
1
ELSE
NULL
END
) > 0)
This runs much faster; see the EXPLAIN ANALYZE output:
GroupAggregate (cost=1.14..743590.08 rows=3300 width=184) (actual time=5186.606..46907.530 rows=334 loops=1)
Group Key: instanalysis_hexagonalcityarea.id
Filter: ((count(CASE WHEN (hashed SubPlan 3) THEN 1 ELSE NULL::integer END) > 0) AND (count(CASE WHEN (hashed SubPlan 4) THEN 1 ELSE NULL::integer END) > 0))
Rows Removed by Filter: 2966
-> Merge Left Join (cost=1.14..320194.96 rows=7166797 width=184) (actual time=4851.792..17369.232 rows=70436610 loops=1)
Merge Cond: (instanalysis_hexagonalcityarea.id = instanalysis_publication.hexagon_id)
-> Merge Left Join (cost=0.71..21686.40 rows=49328 width=180) (actual time=109.033..164.451 rows=30857 loops=1)
Merge Cond: (instanalysis_hexagonalcityarea.id = instanalysis_twitterpublication.hexagon_id)
-> Index Scan using instanalysis_hexagonalcityarea_pkey on instanalysis_hexagonalcityarea (cost=0.29..591.47 rows=3300 width=176) (actual time=22.783..23.878 rows=3300 loops=1)
Filter: (city_id = 7)
Rows Removed by Filter: 7282
-> Index Scan using instanalysis_twitterpublication_5c78aecb on instanalysis_twitterpublication (cost=0.42..64392.25 rows=504291 width=8) (actual time=0.018..111.677 rows=170305 loops=1)
-> Materialize (cost=0.43..501402.61 rows=3754731 width=8) (actual time=0.011..6788.670 rows=71922153 loops=1)
-> Index Scan using instanalysis_publication_5c78aecb on instanalysis_publication (cost=0.43..492015.78 rows=3754731 width=8) (actual time=0.005..4034.838 rows=1778030 loops=1)
SubPlan 1
-> Nested Loop (cost=0.72..105061.24 rows=27624 width=4) (actual time=0.326..74.024 rows=21824 loops=1)
-> Nested Loop (cost=0.29..620.11 rows=2767 width=4) (actual time=0.024..2.915 rows=3374 loops=1)
-> Nested Loop (cost=0.00..143.13 rows=504 width=4) (actual time=0.016..0.618 rows=829 loops=1)
Join Filter: (u2.city_id = u3.id)
Rows Removed by Join Filter: 3350
-> Seq Scan on instanalysis_city u3 (cost=0.00..1.10 rows=1 width=4) (actual time=0.004..0.006 rows=1 loops=1)
Filter: ((name)::text = 'Durban'::text)
Rows Removed by Filter: 7
-> Seq Scan on instanalysis_spot u2 (cost=0.00..89.79 rows=4179 width=8) (actual time=0.001..0.242 rows=4179 loops=1)
-> Index Scan using instanalysis_instagramlocation_e72b53d4 on instanalysis_instagramlocation u1 (cost=0.29..0.89 rows=6 width=8) (actual time=0.001..0.002 rows=4 loops=829)
Index Cond: (spot_id = u2.id)
-> Index Scan using instanalysis_publication_e274a5da on instanalysis_publication u0 (cost=0.43..37.45 rows=30 width=8) (actual time=0.006..0.021 rows=6 loops=3374)
Index Cond: (location_id = u1.id)
Filter: ((publication_date >= '2016-11-30 23:00:00+00'::timestamp with time zone) AND (publication_date <= '2016-12-10 23:00:00+00'::timestamp with time zone))
Rows Removed by Filter: 80
SubPlan 2
-> Hash Join (cost=2595.62..25893.51 rows=9013 width=4) (actual time=22.511..73.141 rows=6220 loops=1)
Hash Cond: (u0_1.location_id = u1_1.id)
-> Seq Scan on instanalysis_twitterpublication u0_1 (cost=0.00..22927.36 rows=74772 width=8) (actual time=15.212..59.628 rows=75775 loops=1)
Filter: ((publication_date >= '2016-11-30 23:00:00+00'::timestamp with time zone) AND (publication_date <= '2016-12-10 23:00:00+00'::timestamp with time zone))
Rows Removed by Filter: 428516
-> Hash (cost=2348.24..2348.24 rows=19790 width=4) (actual time=6.538..6.538 rows=15589 loops=1)
Buckets: 32768 Batches: 1 Memory Usage: 805kB
-> Nested Loop (cost=0.70..2348.24 rows=19790 width=4) (actual time=0.023..5.052 rows=15589 loops=1)
-> Nested Loop (cost=0.28..39.28 rows=504 width=4) (actual time=0.015..0.186 rows=829 loops=1)
-> Seq Scan on instanalysis_city u3_1 (cost=0.00..1.10 rows=1 width=4) (actual time=0.003..0.004 rows=1 loops=1)
Filter: ((name)::text = 'Durban'::text)
Rows Removed by Filter: 7
-> Index Scan using instanalysis_spot_c7141997 on instanalysis_spot u2_1 (cost=0.28..33.14 rows=504 width=8) (actual time=0.010..0.124 rows=829 loops=1)
Index Cond: (city_id = u3_1.id)
-> Index Scan using instanalysis_twitterlocation_e72b53d4 on instanalysis_twitterlocation u1_1 (cost=0.42..3.93 rows=65 width=8) (actual time=0.001..0.004 rows=19 loops=829)
Index Cond: (spot_id = u2_1.id)
SubPlan 3
-> Nested Loop (cost=0.72..105061.24 rows=27624 width=4) (actual time=0.348..80.863 rows=21824 loops=1)
-> Nested Loop (cost=0.29..620.11 rows=2767 width=4) (actual time=0.028..3.507 rows=3374 loops=1)
-> Nested Loop (cost=0.00..143.13 rows=504 width=4) (actual time=0.016..0.646 rows=829 loops=1)
Join Filter: (u2_2.city_id = u3_2.id)
Rows Removed by Join Filter: 3350
-> Seq Scan on instanalysis_city u3_2 (cost=0.00..1.10 rows=1 width=4) (actual time=0.003..0.004 rows=1 loops=1)
Filter: ((name)::text = 'Durban'::text)
Rows Removed by Filter: 7
-> Seq Scan on instanalysis_spot u2_2 (cost=0.00..89.79 rows=4179 width=8) (actual time=0.001..0.276 rows=4179 loops=1)
-> Index Scan using instanalysis_instagramlocation_e72b53d4 on instanalysis_instagramlocation u1_2 (cost=0.29..0.89 rows=6 width=8) (actual time=0.001..0.003 rows=4 loops=829)
Index Cond: (spot_id = u2_2.id)
-> Index Scan using instanalysis_publication_e274a5da on instanalysis_publication u0_2 (cost=0.43..37.45 rows=30 width=8) (actual time=0.007..0.022 rows=6 loops=3374)
Index Cond: (location_id = u1_2.id)
Filter: ((publication_date >= '2016-11-30 23:00:00+00'::timestamp with time zone) AND (publication_date <= '2016-12-10 23:00:00+00'::timestamp with time zone))
Rows Removed by Filter: 80
SubPlan 4
-> Hash Join (cost=2595.62..25893.51 rows=9013 width=4) (actual time=41.392..92.680 rows=6220 loops=1)
Hash Cond: (u0_3.location_id = u1_3.id)
-> Seq Scan on instanalysis_twitterpublication u0_3 (cost=0.00..22927.36 rows=74772 width=8) (actual time=32.641..78.020 rows=75775 loops=1)
Filter: ((publication_date >= '2016-11-30 23:00:00+00'::timestamp with time zone) AND (publication_date <= '2016-12-10 23:00:00+00'::timestamp with time zone))
Rows Removed by Filter: 428516
-> Hash (cost=2348.24..2348.24 rows=19790 width=4) (actual time=7.907..7.907 rows=15589 loops=1)
Buckets: 32768 Batches: 1 Memory Usage: 805kB
-> Nested Loop (cost=0.70..2348.24 rows=19790 width=4) (actual time=0.044..6.136 rows=15589 loops=1)
-> Nested Loop (cost=0.28..39.28 rows=504 width=4) (actual time=0.026..0.220 rows=829 loops=1)
-> Seq Scan on instanalysis_city u3_3 (cost=0.00..1.10 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=1)
Filter: ((name)::text = 'Durban'::text)
Rows Removed by Filter: 7
-> Index Scan using instanalysis_spot_c7141997 on instanalysis_spot u2_3 (cost=0.28..33.14 rows=504 width=8) (actual time=0.016..0.135 rows=829 loops=1)
Index Cond: (city_id = u3_3.id)
-> Index Scan using instanalysis_twitterlocation_e72b53d4 on instanalysis_twitterlocation u1_3 (cost=0.42..3.93 rows=65 width=8) (actual time=0.001..0.005 rows=19 loops=829)
Index Cond: (spot_id = u2_3.id)
Planning time: 50.735 ms
Execution time: 46908.482 ms
The problem is that I don't get what I want: it seems to be counting many more publications. The publications were pre-filtered by date, and only the filtered publications should be counted in each hexagon; but it looks as if all the publications were being counted in each hexagon, as if the When clauses were not working.
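For reference, Django's aggregation docs warn that using annotate() across more than one multi-valued relation multiplies rows through the joins and inflates counts. A minimal sketch of a distinct, filtered count, assuming Django 2.0+ for the `filter` argument to aggregates (my project targets 1.10, so this is untested here):

    from django.db.models import Count, Q

    # Sketch only: the ``filter`` argument to aggregates needs Django 2.0+.
    # ``distinct=True`` counts each publication once, even though the two
    # LEFT JOINs multiply the rows per hexagon.
    resultant_hexagons = HexagonalCityArea.objects.filter(city=city).annotate(
        instagram_count=Count(
            'publication',
            filter=Q(publication__in=instagram_publications),
            distinct=True,
        ),
        twitter_count=Count(
            'twitterpublication',
            filter=Q(twitterpublication__in=twitter_publications),
            distinct=True,
        ),
    )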
Thank you.
Why isn't [count aggregation](https://docs.djangoproject.com/en/1.10/topics/db/aggregation/#generating-aggregates-for-each-item-in-a-queryset) an option? In theory, two aggregation queries using `count` should be more efficient than a union query with IN clauses – Marat
Thanks for your comment @Marat. Your method is much faster, but the problem is that I get the wrong results. I have updated the post with the SQL and the explain analyze. –
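For completeness, a minimal sketch of the two separate aggregation queries Marat describes, grouping each pre-filtered publication queryset by its `hexagon` FK (the `hexagon_id` column in the SQL above) and merging the counts in Python:

    from django.db.models import Count

    # One GROUP BY query per publication model; no CASE/WHEN, no giant IN list.
    insta_counts = dict(
        instagram_publications.values('hexagon')
                              .annotate(n=Count('id'))
                              .values_list('hexagon', 'n')
    )
    twitter_counts = dict(
        twitter_publications.values('hexagon')
                            .annotate(n=Count('id'))
                            .values_list('hexagon', 'n')
    )

    # Attach the counts to the hexagons and discard the empty ones.
    city_hexagons = []
    for h in HexagonalCityArea.objects.filter(city=city):
        h.instagram_count = insta_counts.get(h.id, 0)
        h.twitter_count = twitter_counts.get(h.id, 0)
        if h.instagram_count or h.twitter_count:
            city_hexagons.append(h)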