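This is a very simple SPARQL query that takes a very long time (10 seconds) to run in MarkLogic (8.0-6.4). What can I do to speed it up? How can I improve indexing for large SPARQL datasets?
The data is based on a subset of GeoNames and is of the same order of magnitude in size (it looks like about 22 million triples).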
PREFIX gj: <http://mycompany.com/geonames-jurisdiction/1.0/schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX gn: <http://www.geonames.org/ontology#>
SELECT *
FROM <http://mycompany.com/geonames-jurisdiction/1.0/data>
FROM <http://mycompany.com/geonames-jurisdiction/1.0/rule-data>
WHERE
{ ?this_0 rdf:type gj:LocalCounty ;
gn:name ?name_1 .
}
ORDER BY ASC(?name_1)
LIMIT 100
Update
Per MarkLogic's suggestion, I inserted a new property into the DB that is specific to local counties by running this query:
INSERT {
GRAPH <http://mycompany.com/geonames-jurisdiction/1.0/rule-data> {
?this gj:localCountyName ?name .
}
}
WHERE {
?this a gj:LocalCounty .
?this gn:name ?name .
}
I also made some of the suggested modifications to the query:
PREFIX gj: <http://mycompany.com/geonames-jurisdiction/1.0/schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX gn: <http://www.geonames.org/ontology#>
SELECT ?this_0 ?name_1
FROM <http://mycompany.com/geonames-jurisdiction/1.0/data>
FROM <http://mycompany.com/geonames-jurisdiction/1.0/rule-data>
WHERE
{ ?this_0 rdf:type gj:LocalCounty ;
gj:localCountyName ?name_1 .
}
ORDER BY ?name_1
LIMIT 20
This reduces the total query time to ~4 seconds, which is better but still very slow.
Trace information from the above query (timings depend on hardware: memory, CPU, disk):
2017-05-04 12:00:18.684 Info: <triple-value-statistics count="147540458" unique-subjects="25064012" unique-predicates="81" unique-objects="67600843" xmlns="cts:triple-value-statistics">
2017-05-04 12:00:18.684 Info: <triple-value-entries>
2017-05-04 12:00:18.684 Info: <triple-value-entry count="8385355">
2017-05-04 12:00:18.684 Info: <triple-value>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</triple-value>
2017-05-04 12:00:18.684 Info: <subject-statistics count="0" unique-predicates="0" unique-objects="0"/>
2017-05-04 12:00:18.684 Info: <predicate-statistics count="8356279" unique-subjects="8341989" unique-objects="13"/>
2017-05-04 12:00:18.684 Info: <object-statistics count="0" unique-subjects="0" unique-predicates="0"/>
2017-05-04 12:00:18.684 Info: </triple-value-entry>
2017-05-04 12:00:18.684 Info: <triple-value-entry count="29204">
2017-05-04 12:00:18.684 Info: <triple-value>http://mycompany.com/geonames-jurisdiction/1.0/schema#LocalCounty</triple-value>
2017-05-04 12:00:18.684 Info: <subject-statistics count="2" unique-predicates="2" unique-objects="2"/>
2017-05-04 12:00:18.684 Info: <predicate-statistics count="0" unique-subjects="0" unique-objects="0"/>
2017-05-04 12:00:18.684 Info: <object-statistics count="29202" unique-subjects="29202" unique-predicates="3"/>
2017-05-04 12:00:18.684 Info: </triple-value-entry>
2017-05-04 12:00:18.684 Info: <triple-value-entry count="29201">
2017-05-04 12:00:18.684 Info: <triple-value>http://mycompany.com/geonames-jurisdiction/1.0/schema#localCountyName</triple-value>
2017-05-04 12:00:18.684 Info: <subject-statistics count="0" unique-predicates="0" unique-objects="0"/>
2017-05-04 12:00:18.684 Info: <predicate-statistics count="29201" unique-subjects="29201" unique-objects="26692"/>
2017-05-04 12:00:18.684 Info: <object-statistics count="0" unique-subjects="0" unique-predicates="0"/>
2017-05-04 12:00:18.684 Info: </triple-value-entry>
2017-05-04 12:00:18.684 Info: </triple-value-entries>
2017-05-04 12:00:18.684 Info: </triple-value-statistics>
2017-05-04 12:00:18.684 Info: [Event:id=SPARQL AST] sessionKey=7777437449602930525
2017-05-04 12:00:18.684 Info: initialPlan=SPARQLModule[
2017-05-04 12:00:18.684 Info: Prolog[]
2017-05-04 12:00:18.684 Info: SPARQLSelect[SPARQLLimit[
2017-05-04 12:00:18.684 Info: LIMIT GraphNode[Literal "20"^^<http://www.w3.org/2001/XMLSchema#integer>]
2017-05-04 12:00:18.684 Info: SPARQLProject[order(1)
2017-05-04 12:00:18.684 Info: GraphNode[Var this_0 0]
2017-05-04 12:00:18.684 Info: GraphNode[Var name_1 1]
2017-05-04 12:00:18.684 Info: SPARQLOrder[order(1) UNSORTED
2017-05-04 12:00:18.684 Info: OrderSpec[
2017-05-04 12:00:18.684 Info: Variable[QName[(Unknown) name_1] 1]
2017-05-04 12:00:18.684 Info: ASCENDING EMPTY MIN]
2017-05-04 12:00:18.684 Info: SPARQLMergeJoin[order(0) hash(0==0) scatter()
2017-05-04 12:00:18.684 Info: TriplePattern[order(0,1) PSO
2017-05-04 12:00:18.684 Info: GraphNode[Var this_0 0]
2017-05-04 12:00:18.684 Info: GraphNode[IRI <http://mycompany.com/geonames-jurisdiction/1.0/schema#localCountyName>]
2017-05-04 12:00:18.684 Info: GraphNode[Var name_1 1]]
2017-05-04 12:00:18.684 Info: TriplePattern[order(0) OPS
2017-05-04 12:00:18.684 Info: GraphNode[Var this_0 0]
2017-05-04 12:00:18.684 Info: GraphNode[IRI <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>]
2017-05-04 12:00:18.684 Info: GraphNode[IRI <http://mycompany.com/geonames-jurisdiction/1.0/schema#LocalCounty>]]]]]]]]
2017-05-04 12:00:18.684 Info: [Event:id=SPARQL Cost Analysis] sessionKey=7777437449602930525 optimize=1 r=3 t=1.28811 os=360 is=15 mutations=9 seed=15212683942933123635
2017-05-04 12:00:18.684 Info: initialCost=(m:6.02656e+07,r:0,io:(52.931/1.20414e+07/0),cpu(2):(0/116805/0),mem:87603,c:20,crd:[20,20])
2017-05-04 12:00:18.726 Info: [Event:id=SPARQL Cost Analysis] sessionKey=7777437449602930525 diff=0 diff%=0 r=0
2017-05-04 12:00:18.726 Info: cost=(m:6.02656e+07,r:0,io:(52.931/1.20414e+07/0),cpu(2):(0/116805/0),mem:87603,c:20,crd:[20,20])
2017-05-04 12:00:18.726 Info: [Event:id=SPARQL Cost Analysis] sessionKey=7777437449602930525 diff=0 diff%=0 r=1
2017-05-04 12:00:18.726 Info: cost=(m:6.02656e+07,r:0,io:(52.931/1.20414e+07/0),cpu(2):(0/116805/0),mem:87603,c:20,crd:[20,20])
2017-05-04 12:00:18.728 Info: [Event:id=SPARQL Cost Analysis] sessionKey=7777437449602930525 diff=0 diff%=0 r=2
2017-05-04 12:00:18.728 Info: cost=(m:6.02656e+07,r:0,io:(52.931/1.20414e+07/0),cpu(2):(0/116805/0),mem:87603,c:20,crd:[20,20])
2017-05-04 12:00:18.728 Info: [Event:id=SPARQL Cost Analysis] sessionKey=7777437449602930525
2017-05-04 12:00:18.728 Info: bestCost=(m:6.02656e+07,r:0,io:(52.931/1.20414e+07/0),cpu(2):(0/116805/0),mem:87603,c:20,crd:[20,20])
2017-05-04 12:00:18.729 Info: [Event:id=SPARQL AST] sessionKey=7777437449602930525
2017-05-04 12:00:18.729 Info: plan=SPARQLModule[
2017-05-04 12:00:18.729 Info: Prolog[]
2017-05-04 12:00:18.729 Info: SPARQLSelect[SPARQLLimit[
2017-05-04 12:00:18.729 Info: LIMIT GraphNode[Literal "20"^^<http://www.w3.org/2001/XMLSchema#integer>]
2017-05-04 12:00:18.729 Info: SPARQLProject[order(1)
2017-05-04 12:00:18.729 Info: GraphNode[Var this_0 0]
2017-05-04 12:00:18.729 Info: GraphNode[Var name_1 1]
2017-05-04 12:00:18.729 Info: SPARQLOrder[order(1) UNSORTED
2017-05-04 12:00:18.729 Info: OrderSpec[
2017-05-04 12:00:18.729 Info: Variable[QName[(Unknown) name_1] 1]
2017-05-04 12:00:18.729 Info: ASCENDING EMPTY MIN]
2017-05-04 12:00:18.729 Info: SPARQLMergeJoin[order(0) hash(0==0) scatter()
2017-05-04 12:00:18.729 Info: TriplePattern[order(0,1) PSO
2017-05-04 12:00:18.729 Info: GraphNode[Var this_0 0]
2017-05-04 12:00:18.729 Info: GraphNode[IRI <http://mycompany.com/geonames-jurisdiction/1.0/schema#localCountyName>]
2017-05-04 12:00:18.729 Info: GraphNode[Var name_1 1]]
2017-05-04 12:00:18.729 Info: TriplePattern[order(0) OPS
2017-05-04 12:00:18.729 Info: GraphNode[Var this_0 0]
2017-05-04 12:00:18.729 Info: GraphNode[IRI <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>]
2017-05-04 12:00:18.729 Info: GraphNode[IRI <http://mycompany.com/geonames-jurisdiction/1.0/schema#LocalCounty>]]]]]]]]
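The plan above merge-joins the gj:localCountyName pattern with the rdf:type pattern and only then sorts the result (SPARQLOrder ... UNSORTED). Since the INSERT in the update writes gj:localCountyName only for subjects typed gj:LocalCounty, one variant that may be worth timing drops the rdf:type pattern. This is a sketch, not a confirmed fix: it assumes no other subjects carry gj:localCountyName and that the types have not changed since the INSERT ran, and whether it is actually faster would need to be measured.

```sparql
PREFIX gj: <http://mycompany.com/geonames-jurisdiction/1.0/schema#>

# Assumption: gj:localCountyName exists only on gj:LocalCounty subjects,
# so the rdf:type pattern (and the join it causes) can be left out.
SELECT ?this_0 ?name_1
FROM <http://mycompany.com/geonames-jurisdiction/1.0/data>
FROM <http://mycompany.com/geonames-jurisdiction/1.0/rule-data>
WHERE
{ ?this_0 gj:localCountyName ?name_1 .
}
ORDER BY ?name_1
LIMIT 20
```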
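What is the total number of results matching the query? And how does it perform without 'ORDER BY'? With ordering, basically all of the data matching the graph pattern has to be processed. – AKSW
If I remove the LIMIT clause and do a count, it counts ~29,000 triples. – RMorrisey
Ok, and what about removing the 'ORDER BY'? This should be much faster. – AKSW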
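A count like the one mentioned in this comment could be written roughly as follows. The exact query that was run is not shown in the thread, so this is only an assumed form; the variable name ?matches is made up for illustration.

```sparql
PREFIX gj: <http://mycompany.com/geonames-jurisdiction/1.0/schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# Counts every solution of the graph pattern; no ORDER BY or LIMIT.
# ?matches is a hypothetical name chosen for this sketch.
SELECT (COUNT(*) AS ?matches)
FROM <http://mycompany.com/geonames-jurisdiction/1.0/data>
FROM <http://mycompany.com/geonames-jurisdiction/1.0/rule-data>
WHERE
{ ?this_0 rdf:type gj:LocalCounty ;
gj:localCountyName ?name_1 .
}
```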
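The test suggested in this comment would look something like the query below: the updated query from the question with only the ORDER BY clause removed. Note that without ORDER BY the 20 rows returned are arbitrary, so this is useful for isolating the cost of sorting rather than as a replacement query.

```sparql
PREFIX gj: <http://mycompany.com/geonames-jurisdiction/1.0/schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# Same pattern as the updated query, but with no ORDER BY,
# so the engine can stop after the first 20 solutions.
SELECT ?this_0 ?name_1
FROM <http://mycompany.com/geonames-jurisdiction/1.0/data>
FROM <http://mycompany.com/geonames-jurisdiction/1.0/rule-data>
WHERE
{ ?this_0 rdf:type gj:LocalCounty ;
gj:localCountyName ?name_1 .
}
LIMIT 20
```

If this returns almost immediately, the remaining ~4 seconds are likely spent materializing and sorting all ~29,000 matches before the LIMIT is applied.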