I am trying to use Apache Beam's Google Datastore API to read entities with ReadFromDatastore:

p = beam.Pipeline(options=options) 
(p 
| 'Read from Datastore' >> ReadFromDatastore(gcloud_options.project, query) 
| 'reformat'   >> beam.Map(reformat) 
| 'Write To Datastore' >> WriteToDatastore(gcloud_options.project)) 

The object that gets passed to my reformat function is of type google.cloud.proto.datastore.v1.entity_pb2.Entity.

It is in a protobuf format that is hard to modify or read.

I think I can convert an entity_pb2.Entity to a dict with

entity = dict(google.cloud.datastore.helpers._property_tuples(entity_pb))

but for some reason importing both of the following libraries gives me errors:

import google.cloud.datastore.helpers 
from apache_beam.io.gcp.datastore.v1.datastoreio import ReadFromDatastore 

Error:

Traceback (most recent call last): 
    File "/home/nburn42/MotoGarage/MotoGarage/MotoGarageBackgroundJobs/format_data.py", line 16, in <module> 
    import google.cloud.datastore.helpers 
    File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore/__init__.py", line 57, in <module> 
    from google.cloud.datastore.batch import Batch 
    File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore/batch.py", line 24, in <module> 
    from google.cloud.datastore import helpers 
    File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore/helpers.py", line 29, in <module> 
    from google.cloud.grpc.datastore.v1 import entity_pb2 as _entity_pb2 
    File "/usr/local/lib/python2.7/dist-packages/google/cloud/grpc/datastore/v1/entity_pb2.py", line 28, in <module> 
    dependencies=[google_dot_api_dot_annotations__pb2.DESCRIPTOR,google_dot_protobuf_dot_struct__pb2.DESCRIPTOR,google_dot_protobuf_dot_timestamp__pb2.DESCRIPTOR,google_dot_type_dot_latlng__pb2.DESCRIPTOR,]) 
    File "/usr/local/lib/python2.7/dist-packages/google/protobuf/descriptor.py", line 824, in __new__ 
    return _message.default_pool.AddSerializedFile(serialized_pb) 
TypeError: Couldn't build proto file into descriptor pool! 
Invalid proto descriptor for file "google/cloud/grpc/datastore/v1/entity.proto": 
    google.datastore.v1.PartitionId.project_id: "google.datastore.v1.PartitionId.project_id" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.PartitionId.namespace_id: "google.datastore.v1.PartitionId.namespace_id" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.PartitionId: "google.datastore.v1.PartitionId" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Key.partition_id: "google.datastore.v1.Key.partition_id" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Key.path: "google.datastore.v1.Key.path" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Key.PathElement.id_type: "google.datastore.v1.Key.PathElement.id_type" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Key.PathElement.kind: "google.datastore.v1.Key.PathElement.kind" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Key.PathElement.id: "google.datastore.v1.Key.PathElement.id" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Key.PathElement.name: "google.datastore.v1.Key.PathElement.name" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Key.PathElement: "google.datastore.v1.Key.PathElement" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Key: "google.datastore.v1.Key" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.ArrayValue.values: "google.datastore.v1.ArrayValue.values" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.ArrayValue: "google.datastore.v1.ArrayValue" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.value_type: "google.datastore.v1.Value.value_type" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.null_value: "google.datastore.v1.Value.null_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.boolean_value: "google.datastore.v1.Value.boolean_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.integer_value: "google.datastore.v1.Value.integer_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.double_value: "google.datastore.v1.Value.double_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.timestamp_value: "google.datastore.v1.Value.timestamp_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.key_value: "google.datastore.v1.Value.key_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.string_value: "google.datastore.v1.Value.string_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.blob_value: "google.datastore.v1.Value.blob_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.geo_point_value: "google.datastore.v1.Value.geo_point_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.entity_value: "google.datastore.v1.Value.entity_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.array_value: "google.datastore.v1.Value.array_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.meaning: "google.datastore.v1.Value.meaning" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value.exclude_from_indexes: "google.datastore.v1.Value.exclude_from_indexes" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Value: "google.datastore.v1.Value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Entity.key: "google.datastore.v1.Entity.key" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Entity.properties: "google.datastore.v1.Entity.properties" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Entity.PropertiesEntry.key: "google.datastore.v1.Entity.PropertiesEntry.key" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Entity.PropertiesEntry.value: "google.datastore.v1.Entity.PropertiesEntry.value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Entity.PropertiesEntry: "google.datastore.v1.Entity.PropertiesEntry" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Entity: "google.datastore.v1.Entity" is already defined in file "google/cloud/proto/datastore/v1/entity.proto". 
    google.datastore.v1.Key.partition_id: "google.datastore.v1.PartitionId" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto". To use it here, please add the necessary import. 
    google.datastore.v1.Key.path: "google.datastore.v1.Key.PathElement" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto". To use it here, please add the necessary import. 
    google.datastore.v1.ArrayValue.values: "google.datastore.v1.Value" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto". To use it here, please add the necessary import. 
    google.datastore.v1.Value.key_value: "google.datastore.v1.Key" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto". To use it here, please add the necessary import. 
    google.datastore.v1.Value.entity_value: "google.datastore.v1.Entity" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto". To use it here, please add the necessary import. 
    google.datastore.v1.Value.array_value: "google.datastore.v1.ArrayValue" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto". To use it here, please add the necessary import. 
    google.datastore.v1.Entity.PropertiesEntry.value: "google.datastore.v1.Value" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto". To use it here, please add the necessary import. 
    google.datastore.v1.Entity.key: "google.datastore.v1.Key" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto". To use it here, please add the necessary import. 
    google.datastore.v1.Entity.properties: "google.datastore.v1.Entity.PropertiesEntry" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto". To use it here, please add the necessary import. 

Is there a way to convert an entity_pb2.Entity into something usable?
Is ReadFromDatastore too new to actually use?
Should I be using a different method?

Thanks,
Nathan

Take a look at the 'com.google.datastore.v1' package and –

Answer

You can use the function google.cloud.datastore.helpers.entity_from_protobuf to convert an entity_pb2.Entity into a google.cloud.datastore.entity.Entity.

google.cloud.datastore.entity.Entity is a subclass of dict and gives you the usability you need.
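
As a minimal sketch of the conversion described in this answer, the reformat step from the question could look roughly like the following. The converter parameter is a hypothetical addition (not part of any library) so the function can be exercised without Datastore installed, and importing inside the function is an assumption of this sketch rather than something the answer prescribes. It does not address the import conflict shown in the question's traceback.

```python
def reformat(entity_pb, converter=None):
    """Turn a Datastore protobuf entity into a plain dict of its properties.

    `converter` is a hypothetical hook for illustration; when it is omitted,
    google.cloud.datastore.helpers.entity_from_protobuf is used.
    """
    if converter is None:
        # Imported lazily so this module still loads where the
        # google-cloud-datastore package is not installed.
        from google.cloud.datastore.helpers import entity_from_protobuf
        converter = entity_from_protobuf

    # entity_from_protobuf returns a google.cloud.datastore.entity.Entity,
    # which is a dict subclass, so dict() copies its properties.
    entity = converter(entity_pb)
    return dict(entity)
```

Note that the pipeline in the question ends with WriteToDatastore, which expects protobuf entities, so a dict produced this way would presumably have to be converted back before writing; the same helpers module offers entity_to_protobuf for that direction.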

This solution works fine with Apache Beam running on a local machine. However, when I push the job to the DataflowRunner, the job fails saying it cannot find 'datastore' in 'google.cloud'. Is this because the Apache Beam Python SDK's support for Datastore is still in beta? –

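When you deploy a pipeline to Google Dataflow workers with Apache Beam, your Python environment is not replicated on the worker machines. The problem is that, unlike your local environment, the worker machines do not have the google-cloud-datastore package installed by default. See https://cloud.google.com/dataflow/pipelines/dependencies-python for how to specify PyPI dependencies. –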
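
To illustrate the comment above, here is a rough sketch of declaring the PyPI dependency through a requirements file and the --requirements_file pipeline option. The helper names, the project id 'my-project-id' and the temporary directory are all assumptions made for this example, and a real Dataflow run would need further options (staging and temp locations, for instance) that are left out here.

```python
import os
import tempfile


def write_requirements(directory, packages):
    """Write a requirements.txt listing the given PyPI packages (hypothetical helper)."""
    path = os.path.join(directory, 'requirements.txt')
    with open(path, 'w') as handle:
        handle.write('\n'.join(packages) + '\n')
    return path


def build_dataflow_args(project, requirements_path):
    """Build command-line style pipeline options (hypothetical helper)."""
    return [
        '--runner=DataflowRunner',
        '--project=%s' % project,
        # Asks the runner to install the listed packages on each worker.
        '--requirements_file=%s' % requirements_path,
    ]


workdir = tempfile.mkdtemp()
requirements_path = write_requirements(workdir, ['google-cloud-datastore'])
pipeline_args = build_dataflow_args('my-project-id', requirements_path)

# pipeline_args would then be handed to Beam, for example:
#   options = PipelineOptions(pipeline_args)
#   p = beam.Pipeline(options=options)
```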
