MemSQL 5.1 is used for a web analytics project. There are about 80M records, and about 0.5M records per day. A simple query runs in about 5 seconds: the count of received data per domain, geo, and lang. I believe it should be possible to reduce that time, but I can't find a way. How do I get below 5 seconds per SELECT over 80M records on a single MemSQL cluster like this?
CREATE TABLE `domains` (
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`geo` varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
`lang` char(5) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
`browser` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
`os` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
`device` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
`domain` varchar(200) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
`ref` varchar(200) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
`blk_cnt` int(11) DEFAULT NULL,
KEY `date` (`date`,`geo`,`lang`,`domain`) /*!90619 USING CLUSTERED COLUMNSTORE */
/*!90618 , SHARD KEY() */
)
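The plan below shows a Repartition step with shard_key:[domain, geo, lang]: each partition pre-aggregates locally, then the partial aggregates are reshuffled across the network for the final GROUP BY. The timings at the end mention a "timestamp + shard key" variant, but the post doesn't show that DDL, so the following is a sketch under the assumption that the shard key was set to the GROUP BY columns (MemSQL can't change a shard key in place, so this would mean recreating and reloading the table):

CREATE TABLE `domains` (
  `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  -- ... same columns as above ...
  `blk_cnt` int(11) DEFAULT NULL,
  KEY `date` (`date`,`geo`,`lang`,`domain`) USING CLUSTERED COLUMNSTORE,
  -- Assumed: sharding on the GROUP BY columns keeps each group on one
  -- partition, so the aggregation no longer needs a network repartition.
  SHARD KEY (`domain`,`geo`,`lang`)
);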
A typical query and its plan:
memsql> explain SELECT domain, geo, lang, avg(blk_cnt) as blk_cnt, count(*) as cnt FROM domains WHERE date BETWEEN '2016-07-31 0:00' AND '2016-08-01 0:00' GROUP BY domain, geo, lang ORDER BY blk_cnt ASC limit 40;
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                             |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| Project [r0.domain, r0.geo, r0.lang, $0/CAST(COALESCE($1,0) AS SIGNED) AS blk_cnt, CAST(COALESCE($2,0) AS SIGNED) AS cnt]                           |
| Top limit:40                                                                                                                                        |
| GatherMerge [SUM(r0.s)/CAST(COALESCE(SUM(r0.c),0) AS SIGNED)] partitions:all est_rows:40                                                            |
| Project [r0.domain, r0.geo, r0.lang, s/CAST(COALESCE(c,0) AS SIGNED) AS blk_cnt, CAST(COALESCE(cnt_1,0) AS SIGNED) AS cnt, s, c, cnt_1] est_rows:40 |
| TopSort limit:40 [SUM(r0.s)/CAST(COALESCE(SUM(r0.c),0) AS SIGNED)]                                                                                  |
| HashGroupBy [SUM(r0.s) AS s, SUM(r0.c) AS c, SUM(r0.cnt) AS cnt_1] groups:[r0.domain, r0.geo, r0.lang]                                              |
| TableScan r0 storage:list stream:no                                                                                                                 |
| Repartition [domains.domain, domains.geo, domains.lang, cnt, s, c] AS r0 shard_key:[domain, geo, lang] est_rows:40 est_select_cost:144350216        |
| HashGroupBy [COUNT(*) AS cnt, SUM(domains.blk_cnt) AS s, COUNT(domains.blk_cnt) AS c] groups:[domains.domain, domains.geo, domains.lang]            |
| Filter [domains.date >= '2016-07-31 0:00' AND domains.date <= '2016-08-01 0:00']                                                                    |
| ColumnStoreScan scan_js_data.domains, KEY date (date, geo, lang, domain) USING CLUSTERED COLUMNSTORE est_table_rows:72175108 est_filtered:18043777  |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
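The update below mentions a "timestamp optimization" without showing the rewritten query. One plausible reading, given the Filter step above, is a half-open range with full-format literals: `date` is the leading column of the clustered columnstore key, so a tight range filter lets the scan skip whole segments, and BETWEEN is inclusive on both ends, so rows at exactly 2016-08-01 00:00:00 would otherwise be counted in two adjacent days. This rewrite is an assumption, not the poster's confirmed change:

SELECT domain, geo, lang, AVG(blk_cnt) AS blk_cnt, COUNT(*) AS cnt
FROM domains
WHERE date >= '2016-07-31 00:00:00'  -- inclusive start of the day
  AND date <  '2016-08-01 00:00:00'  -- exclusive end; BETWEEN would also match this midnight
GROUP BY domain, geo, lang
ORDER BY blk_cnt ASC
LIMIT 40;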
After applying the recommendations to the table:
- time of the original query: 5 s
- with the timestamp optimization: 3.7 s
- with timestamp + shard key: 2.6 s
Thank you very much!
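For anyone reproducing this, the shard-key change should be visible in the plan: with SHARD KEY (domain, geo, lang) every group lives on a single partition, so the Repartition operator (est_select_cost:144350216 above) should drop out and the GROUP BY runs locally on each leaf. This is the expected plan shape, not output from the original cluster:

EXPLAIN SELECT domain, geo, lang, AVG(blk_cnt) AS blk_cnt, COUNT(*) AS cnt
FROM domains
WHERE date >= '2016-07-31 00:00:00' AND date < '2016-08-01 00:00:00'
GROUP BY domain, geo, lang
ORDER BY blk_cnt ASC
LIMIT 40;
-- Expected: GatherMerge directly over a per-partition HashGroupBy, with no Repartition step.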
We also added more vCPUs. Time of the original query: 5 s; with the timestamp optimization: 3.7 s; with timestamp + shard key: 2.6 s. Thank you very much! – Georgy