Postgresql - Order by number of matches in array

I am trying to get a list of the items that best match a given list of tags. With the following data:

DROP TABLE IF EXISTS testing_items; 
CREATE TEMP TABLE testing_items(
    id bigserial primary key, 
    tags text[] 
); 
CREATE INDEX ON testing_items using gin (tags); 

INSERT INTO testing_items (tags) VALUES ('{123,456, abc}'); 
INSERT INTO testing_items (tags) VALUES ('{222,333}'); 
INSERT INTO testing_items (tags) VALUES ('{222,555}'); 
INSERT INTO testing_items (tags) VALUES ('{222,123}'); 
INSERT INTO testing_items (tags) VALUES ('{222,123,555,666}');

I have the tags 222, 555 and 666. How can I get a list like this?

id   matches
--   -------
5    3
3    2
2    1
4    1

ps: I need to use a GIN index because there is a huge number of records.

Edit: id 1 should not be in the list, since it doesn't match any tag. Check it here: http://rextester.com/UTGO74511

Source

2017-02-28 Eduardo

If you are using a GIN index, use &&:

select * 
from testing_items 
where not (ARRAY['333','555','666'] && tags); 


 id | tags
----+---------------
  1 | {123,456,abc}
  4 | {222,123}


Source

2017-02-28 23:18:57 McNets

This only returns the items that match all the tags. – Eduardo

Sorry, I misunderstood the question. – McNets

Take a look at: http://stackoverflow.com/a/24330181/3270427 – McNets
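Dropping the NOT turns the same operator into a filter for the original question. A minimal sketch, assuming the sample table from the question (recreated here as a temporary table); this is not McNets' answer. && is one of the array operators covered by the default GIN operator class for arrays, so it can prefilter the rows that share at least one search tag (which also drops id 1, as the question's edit requires), and a correlated subquery then counts the distinct matching tags per row. Whether the planner actually picks the index depends on table size and statistics.

```sql
-- Sample data from the question (temporary table, gone at session end).
DROP TABLE IF EXISTS testing_items;
CREATE TEMP TABLE testing_items (
    id   bigserial PRIMARY KEY,
    tags text[]
);
CREATE INDEX ON testing_items USING gin (tags);
INSERT INTO testing_items (tags) VALUES
    ('{123,456,abc}'), ('{222,333}'), ('{222,555}'),
    ('{222,123}'), ('{222,123,555,666}');

-- Rank items by how many of the search tags they contain.
SELECT i.id,
       (SELECT count(DISTINCT u.tag)          -- a tag repeated inside one item counts once
          FROM unnest(i.tags) AS u(tag)
         WHERE u.tag = ANY (ARRAY['222', '555', '666'])) AS matches
FROM testing_items AS i
WHERE i.tags && ARRAY['222', '555', '666']    -- overlap: GIN-indexable, removes rows with no match
ORDER BY matches DESC, i.id;
```

On the sample data this returns ids 5, 3, 2 and 4 with 3, 2, 1 and 1 matches respectively.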

Unnest the tags, filter the unnested elements and aggregate the remaining ones:

select id, count(distinct u) as matches 
from (
    select id, u 
    from testing_items, 
    lateral unnest(tags) u 
    where u in ('222', '555', '666') 
    ) s 
group by 1 
order by 2 desc, 1;

 id | matches 
----+---------
  5 |       3
  3 |       2
  2 |       1
  4 |       1
(4 rows)

Taking all the answers into account, this query seems to combine the good aspects of each of them:

select id, count(*) 
from testing_items, 
unnest(array['11','5','8']) u 
where tags @> array[u] 
group by id 
order by 2 desc, 1;

It performs best in Eduardo's test.

Source

2017-02-28 23:21:17 klin

I think the IN clause does not use the GIN index, due to the lack of the @> operator. – Eduardo

Right. In paqash's version (nice try), the order would have to be reversed for the index to be used. – klin
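Which predicates can use the GIN index is easiest to check with EXPLAIN on the installation at hand. A sketch on the sample data, assuming the table from the question: because a five-row table would be scanned sequentially anyway, the block turns off enable_seqscan for the session (a planner setting intended for experiments like this one, not for production). The exact plan text differs between PostgreSQL versions.

```sql
-- Sample data from the question.
DROP TABLE IF EXISTS testing_items;
CREATE TEMP TABLE testing_items (id bigserial PRIMARY KEY, tags text[]);
CREATE INDEX ON testing_items USING gin (tags);
INSERT INTO testing_items (tags) VALUES
    ('{123,456,abc}'), ('{222,333}'), ('{222,555}'),
    ('{222,123}'), ('{222,123,555,666}');

-- Make sequential scans look prohibitively expensive, so an index path shows up if one exists.
SET enable_seqscan = off;

-- Array operator on the indexed column with a constant right-hand side:
-- GIN indexes are read through a Bitmap Index Scan.
EXPLAIN SELECT id FROM testing_items WHERE tags @> ARRAY['222'];

-- The IN condition applies to the unnested element u, not to the indexed column tags,
-- so the GIN index cannot serve it and no index condition on tags appears in the plan.
EXPLAIN SELECT id FROM testing_items, unnest(tags) AS u WHERE u IN ('222', '555', '666');

RESET enable_seqscan;
```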

Here are my 2 cents, using unnest and array contains (@>):

select id, count(*) 
from (
    select unnest(array['222','555','666']) as tag, * 
    from testing_items 
) as w 
where tags @> array[tag] 
group by id 
order by 2 desc

Result:

+----+-------+
| id | count |
|----+-------|
|  5 |     3 |
|  3 |     2 |
|  2 |     1 |
|  4 |     1 |
+----+-------+

Source

2017-02-28 23:33:03 paqash

This is how I tested it, with 10 million records of 3 tags each, every tag a random number between 1 and 99:

BEGIN; 
LOCK TABLE testing_items IN EXCLUSIVE MODE; 
INSERT INTO testing_items (tags) SELECT (ARRAY[trunc(random() * 99 + 1), trunc(random() * 99 + 1), trunc(random() * 99 + 1)]) FROM generate_series(1, 10000000) s; 
COMMIT;
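For a smaller run that gives the same data every time, a sketch of a variant of this setup (assumptions: 100,000 rows is an arbitrary size, and reproducibility holds only on the same server version, since the random() algorithm can change between releases). setseed() fixes the sequence random() produces for the session, and an explicit ANALYZE is included because autovacuum does not process temporary tables. The LOCK TABLE step is left out because a temporary table is visible only to the session that created it.

```sql
-- Repeatable, smaller variant of the benchmark data.
DROP TABLE IF EXISTS testing_items;
CREATE TEMP TABLE testing_items (
    id   bigserial PRIMARY KEY,
    tags text[]
);
CREATE INDEX ON testing_items USING gin (tags);

-- Seed random() so the generated tags are identical on every run.
SELECT setseed(0.42);

-- Three tags per row, each a random integer from 1 to 99, cast to text to match the column.
INSERT INTO testing_items (tags)
SELECT ARRAY[(trunc(random() * 99) + 1)::int::text,
             (trunc(random() * 99) + 1)::int::text,
             (trunc(random() * 99) + 1)::int::text]
FROM generate_series(1, 100000);

-- Autovacuum never analyzes temporary tables, so gather planner statistics by hand.
ANALYZE testing_items;
```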

I added ORDER BY c DESC, id LIMIT 5 so as not to wait for huge result sets.

The @paqash and @klin solutions have similar performance: my laptop runs them in 12 seconds with the tags 11, 8 and 5.

But this one runs in 4.6 seconds:

SELECT id, count(*) as c 
FROM (
SELECT id FROM testing_items WHERE tags @> '{11}' 
UNION ALL 
SELECT id FROM testing_items WHERE tags @> '{8}' 
UNION ALL 
SELECT id FROM testing_items WHERE tags @> '{5}' 
) as items 
GROUP BY id 
ORDER BY c DESC, id 
LIMIT 5

But I still think there must be a faster way.

Source

2017-03-01 02:26:14 Eduardo

I think you have found the right way. Unnest is costly, whereas simple queries with a constant on the right-hand side make the best use of the GIN index. – klin

@> doesn't use the GIN index. – McNets

EXPLAIN says it does. – Eduardo
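Eduardo's fastest query hard-codes one UNION ALL branch per tag. A sketch of one way to generalize it to an arbitrary tag list, assuming the table from the question: a hypothetical PL/pgSQL function, pg_temp.top_matches (name and parameters invented for illustration), builds the same query text with format() and runs it with EXECUTE, so every branch still has a constant on the right-hand side of @>. It has not been timed against the queries above.

```sql
-- Sample data from the question.
DROP TABLE IF EXISTS testing_items;
CREATE TEMP TABLE testing_items (id bigserial PRIMARY KEY, tags text[]);
CREATE INDEX ON testing_items USING gin (tags);
INSERT INTO testing_items (tags) VALUES
    ('{123,456,abc}'), ('{222,333}'), ('{222,555}'),
    ('{222,123}'), ('{222,123,555,666}');

-- Hypothetical helper: one "tags @> ARRAY[<constant>]" branch per search tag,
-- combined with UNION ALL; each id is counted once per branch it appears in.
CREATE OR REPLACE FUNCTION pg_temp.top_matches(p_tags text[], p_limit int DEFAULT 5)
RETURNS TABLE (id bigint, matches bigint)
LANGUAGE plpgsql AS $$
DECLARE
    branches text;
BEGIN
    -- Distinct, non-NULL tags only; %L quotes each tag as a literal, e.g.
    -- SELECT id FROM testing_items WHERE tags @> ARRAY['222']
    SELECT string_agg(
               format('SELECT id FROM testing_items WHERE tags @> ARRAY[%L]', d.tag),
               ' UNION ALL ')
      INTO branches
      FROM (SELECT DISTINCT u.tag
              FROM unnest(p_tags) AS u(tag)
             WHERE u.tag IS NOT NULL) AS d;

    IF branches IS NULL THEN  -- empty or NULL tag list: return no rows
        RETURN;
    END IF;

    -- $1 is bound to p_limit via USING; LIMIT NULL means no limit.
    RETURN QUERY EXECUTE
        format('SELECT id, count(*) AS matches FROM (%s) AS items ', branches)
        || 'GROUP BY id ORDER BY matches DESC, id LIMIT $1'
    USING p_limit;
END
$$;

SELECT * FROM pg_temp.top_matches(ARRAY['222', '555', '666'], 10);
```

On the sample data the final SELECT returns 5|3, 3|2, 2|1, 4|1. Creating the function in pg_temp keeps it session-local (it must then be called schema-qualified), and because EXECUTE plans the generated text on each call, the planner sees the actual tag constants.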
