How to efficiently retrieve all _ids in ElasticSearch

What is the fastest way to get all _ids of a specific index from ElasticSearch? Is it possible with a simple query? My index contains about 20,000 documents.

2013-07-05 Mahoni

[this](https://github.com/elastic/elasticsearch/issues/17159) is very helpful. – shellbye

Edit: please also read @Aleck Landgraf's answer.

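Do you want only the Elasticsearch-internal _id field, or an id field inside your documents? For the former, try

curl -H 'Content-Type: application/json' 'http://localhost:9200/index/type/_search?pretty=true' -d '
{
    "query" : {
        "match_all" : {}
    },
    "stored_fields": []
}
'

Note (2017 update): this post originally used "fields": [], but that option has since been renamed and stored_fields is the new name.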

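For the latter, if you want fields from your documents included in the result, simply add them to the fields array:

curl -H 'Content-Type: application/json' 'http://localhost:9200/index/type/_search?pretty=true' -d '
{
    "query" : {
        "match_all" : {}
    },
    "fields": ["document_field_to_be_returned"]
}
'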

In the former case, the result contains only the "metadata" of your documents:

{
    "took" : 7,
    "timed_out" : false,
    "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
    },
    "hits" : {
        "total" : 4,
        "max_score" : 1.0,
        "hits" : [ {
            "_index" : "index",
            "_type" : "type",
            "_id" : "36",
            "_score" : 1.0
        }, {
            "_index" : "index",
            "_type" : "type",
            "_id" : "38",
            "_score" : 1.0
        }, {
            "_index" : "index",
            "_type" : "type",
            "_id" : "39",
            "_score" : 1.0
        }, {
            "_index" : "index",
            "_type" : "type",
            "_id" : "34",
            "_score" : 1.0
        } ]
    }
}
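For a metadata-only response like the one above, collecting the IDs is a single pass over `hits.hits`. A minimal sketch, assuming the curl output has already been parsed into a Python dict with `json.loads`; `extract_ids` is a hypothetical helper name, not part of any client library. Keep in mind that a search without an explicit `size` returns only the first 10 hits.

```python
import json


def extract_ids(response):
    """Return the _id of every hit in a parsed _search response."""
    # Each hit lives under response["hits"]["hits"]; only its "_id" is needed.
    return [hit["_id"] for hit in response["hits"]["hits"]]


# Trimmed-down copy of the metadata-only response shown above.
raw = """
{
  "hits": {
    "total": 4,
    "hits": [
      {"_index": "index", "_type": "type", "_id": "36", "_score": 1.0},
      {"_index": "index", "_type": "type", "_id": "38", "_score": 1.0},
      {"_index": "index", "_type": "type", "_id": "39", "_score": 1.0},
      {"_index": "index", "_type": "type", "_id": "34", "_score": 1.0}
    ]
  }
}
"""

print(extract_ids(json.loads(raw)))  # ['36', '38', '39', '34']
```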

2013-07-05 22:07:28 Thorsten

Doesn't this return only 10 results? –

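Running a straight query is not the most efficient way to do this. When you run a query, the results have to be sorted before they are returned. Scroll and scan, mentioned in the answers below, are much more efficient because they don't sort the result set before returning it. – aamiri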

This no longer works in 5.x: the 'fields' option was removed. Add '"_source": false' instead. –

Another option:

curl 'http://localhost:9200/index/type/_search?pretty=true&fields='

This returns _index, _type, _id and _score.

2014-08-18 06:43:44

-1 Better to use scan and scroll when accessing more than a few documents. This is a "quick way" to do it, but it performs poorly and may even fail on large indices. – PhaedrusTheGreek

6.2: "request contains unrecognized parameter: [fields]" –
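The URL-only variant can be adapted for versions that reject `fields=` (as the comment above reports for 6.2) by asking for `_source=false` plus an explicit `size`. A sketch that only builds the URL, assuming a local node and an index called `index`; `build_ids_url` is a hypothetical helper, and the path omits the mapping type, which later versions removed. The `size` is still subject to `index.max_result_window` (10,000 by default from 2.1 on).

```python
from urllib.parse import urlencode


def build_ids_url(host, index, size):
    """Build a URI-search URL that asks for hit metadata only (no _source)."""
    # dicts keep insertion order, so the query string is predictable.
    params = {"_source": "false", "size": size, "pretty": "true"}
    return f"{host}/{index}/_search?{urlencode(params)}"


print(build_ids_url("http://localhost:9200", "index", 100))
# http://localhost:9200/index/_search?_source=false&size=100&pretty=true
```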

You can also do it in Python, which gives you a proper list:

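import elasticsearch
es = elasticsearch.Elasticsearch()

res = es.search(
    index="your_index",
    # "_source": False returns only metadata; the original "fields" option was removed in 5.x.
    # Note: from ES 2.1 on, a size above index.max_result_window (default 10000) is rejected.
    body={"query": {"match_all": {}}, "size": 30000, "_source": False})

ids = [d['_id'] for d in res['hits']['hits']]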

2015-05-28 07:24:19
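The asker's index (about 20,000 documents) is larger than the default `index.max_result_window` of 10,000 that Elasticsearch enforces from 2.1 on, so a single request with `size: 30000` like the one above is rejected on those versions. A hedged sketch that counts first and only issues the single request when it fits; it assumes an elasticsearch-py style client with `count(index=...)` and `search(index=..., body=...)` (the 7.x keyword style), and `ids_in_one_request` is a hypothetical name.

```python
def ids_in_one_request(client, index, max_result_window=10000):
    """Fetch every _id with a single search, if the index is small enough.

    `client` is assumed to behave like an elasticsearch-py client:
    count(index=...) -> {"count": n}; search(index=..., body=...) -> dict.
    """
    total = client.count(index=index)["count"]
    if total > max_result_window:
        # from + size beyond index.max_result_window is rejected by the
        # server, so a larger index needs scroll instead.
        raise ValueError(f"{total} documents exceed max_result_window; use scroll")
    body = {"query": {"match_all": {}}, "_source": False, "size": total}
    resp = client.search(index=index, body=body)
    return [hit["_id"] for hit in resp["hits"]["hits"]]
```

Documents indexed between the count and the search can be missed; for anything larger or still changing, the scroll-based answers are the safer route.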

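Better to use scroll and scan to get the result list, so that Elasticsearch doesn't have to rank and sort the results. With the elasticsearch-dsl Python library this can be accomplished with: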

from elasticsearch import Elasticsearch 
from elasticsearch_dsl import Search 

es = Elasticsearch() 
s = Search(using=es, index=ES_INDEX, doc_type=DOC_TYPE) 

s = s.fields([]) # only get ids, otherwise `fields` takes a list of field names 
ids = [h.meta.id for h in s.scan()]

Console log:

GET http://localhost:9200/my_index/my_doc/_search?search_type=scan&scroll=5m [status:200 request:0.003s] 
GET http://localhost:9200/_search/scroll?scroll=5m [status:200 request:0.005s] 
GET http://localhost:9200/_search/scroll?scroll=5m [status:200 request:0.005s] 
GET http://localhost:9200/_search/scroll?scroll=5m [status:200 request:0.003s] 
GET http://localhost:9200/_search/scroll?scroll=5m [status:200 request:0.005s] 
...

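Note: scroll pulls batches of results from a query and keeps the cursor open for a given amount of time (1 minute, 2 minutes, which you can update); scan disables sorting. The scan helper function returns a Python generator that can be safely iterated over.

Elaborating on the two answers by @Aleck-Landgraf and @Robert-Lujo: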

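from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es = Elasticsearch()
# "_source": False replaces the removed "fields": []; doc_type is omitted
# because mapping types were removed in later Elasticsearch versions.
for dobj in scan(es,
                 query={"query": {"match_all": {}}, "_source": False},
                 index="your-index-name"):
    print(dobj["_id"])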

2015-06-15 21:57:48

The 'fields' method was removed in version 5.0.0 (see the [changelog](https://elasticsearch-dsl.readthedocs.io/en/latest/Changelog.html?highlight=fields#id2)); use 's = s.source([])' instead. – illagrenan

The given link is unavailable (404). –

search_type=scan is deprecated since 2.1 (https://www.elastic.co/guide/en/elasticsearch/reference/2.1/breaking_21_search_changes.html) – aleha
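Because `search_type=scan` is deprecated from 2.1 on, a plain scroll sorted by `_doc` is the usual substitute: Elasticsearch documents that `_doc`-sorted scrolls resume from where the previous batch stopped, with the same unsorted efficiency scan provided. Below is a rough hand-rolled version of such a loop, similar in spirit to what a scan helper does. It is a sketch assuming an elasticsearch-py style client (`search`, `scroll`, `clear_scroll` with 7.x-style keyword arguments); `scroll_ids` is a hypothetical name.

```python
def scroll_ids(client, index, batch_size=1000, keep_alive="2m"):
    """Yield every _id in `index` using the scroll API, sorted by _doc.

    `client` is assumed to behave like an elasticsearch-py client whose
    search/scroll/clear_scroll methods return plain dicts.
    """
    body = {
        "query": {"match_all": {}},
        "_source": False,   # metadata only, no document bodies
        "sort": ["_doc"],   # cheapest order; stands in for search_type=scan
        "size": batch_size,
    }
    resp = client.search(index=index, body=body, scroll=keep_alive)
    scroll_id = resp["_scroll_id"]
    try:
        while True:
            hits = resp["hits"]["hits"]
            if not hits:
                break  # an empty batch means the scroll is exhausted
            for hit in hits:
                yield hit["_id"]
            # Fetch the next batch; this also renews the keep-alive.
            resp = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp["_scroll_id"]
    finally:
        # Free the server-side search context instead of waiting for it to expire.
        client.clear_scroll(scroll_id=scroll_id)
```

With a real client it would be consumed as `list(scroll_ids(es, "your-index-name"))`. Because it is a generator, the `finally` block also releases the scroll context if the caller stops iterating early.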

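For me, using the scan function directly from the standard elasticsearch Python API worked (someone with the permissions is welcome to move this to a comment). If you don't want to print from the returned generator but want everything in a list, here is what I use: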

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(hosts=[YOUR_ES_HOST])
a = helpers.scan(es, query={"query": {"match_all": {}}}, scroll='1m', index=INDEX_NAME)  # like the others so far

IDs = [aa['_id'] for aa in a]

2016-01-16 22:39:47
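If the IDs are only going to be written somewhere, the generator returned by `helpers.scan` can also be consumed one hit at a time instead of being collected into a list first. A small sketch; `write_ids` is a hypothetical helper, and the generator of `{"_id": ...}` dicts below merely stands in for real scan output.

```python
import io


def write_ids(hits, out):
    """Write one _id per line from an iterable of hit dicts; return the count.

    `hits` can be the generator a scan/scroll helper returns, so hits are
    consumed lazily rather than held in memory all at once.
    """
    count = 0
    for hit in hits:
        out.write(hit["_id"] + "\n")
        count += 1
    return count


# Stand-in for the generator a scan helper would return.
fake_hits = ({"_id": str(n)} for n in range(3))
buf = io.StringIO()
print(write_ids(fake_hits, buf))  # 3
print(buf.getvalue().split())     # ['0', '1', '2']
```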

Inspired by @Aleck-Landgraf's answer.

2016-02-10 17:16:31

Url -> http://localhost:9200/<index>/<type>/_search
http method -> GET
Query -> {"query": {"match_all": {}}, "size": 30000, "fields": ["_id"]}

2016-10-04 08:47:50

In Elasticsearch 5.x you can use the "_source" option:

GET /_search 
{ 
    "_source": false, 
    "query" : { 
     "term" : { "user" : "kimchy" } 
    } 
}

"fields"は廃止されました。（エラー：「フィールド[fields]はサポートされなくなりました。[stored_fields]を使用して保存されたフィールドを取得するか、フィールドが保存されていない場合は_sourceフィルタリングを使用してください）

2016-11-14 04:25:52 Nav

Bonus points for including the error text; Elasticsearch error messages mostly don't seem to be very helpful. – AmericanUmlaut
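Old request bodies can be migrated mechanically along the lines of the 5.x error message quoted above: an empty `fields` list becomes `"_source": false`, and named document fields move to `_source` filtering, which works whether or not the fields are stored (as long as `_source` is enabled). A sketch only; `upgrade_fields` is a hypothetical helper, and treating every name that starts with `_` as a metadata field is an assumption.

```python
def upgrade_fields(body):
    """Return a copy of a pre-5.x search body with "fields" replaced."""
    new_body = dict(body)  # shallow copy; the caller's dict is left untouched
    fields = new_body.pop("fields", None)
    if fields is None:
        return new_body
    # Assumption: names starting with "_" are metadata (_id, _type, ...),
    # which come back with every hit anyway and are not part of _source.
    wanted = [f for f in fields if not f.startswith("_")]
    new_body["_source"] = wanted if wanted else False
    return new_body


print(upgrade_fields({"query": {"match_all": {}}, "fields": ["_id"]}))
# {'query': {'match_all': {}}, '_source': False}
```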
