SQLAlchemy: results of iteration + count and func.count() do not match

I have classes defined like this. (Note: I am extending an existing database through automap, which is why some columns referenced below do not appear in the classes here. The real query that combines them is fairly complex; it is slimmed down in this post.)

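from sqlalchemy import Column, Float, ForeignKey, Integer, LargeBinary, Text
from sqlalchemy.orm import deferred, relationship

# Base is the automap base reflecting the existing database (setup not shown)


class VariantAssociation(Base):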

    __tablename__ = "sample_variant_association" 

    vid = Column(Integer, ForeignKey("variants.variant_id"), 
       primary_key=True, index=True) 
    sid = Column(Integer, ForeignKey("samples.sample_id"), 
       primary_key=True, index=True) 

    vdepth = Column(Integer, index=True) 
    valt_depth = Column(Integer, index=True) 
    gt = Column(Text) 
    gt_type = Column(Integer) 
    fraction = Column(Float, index=True) 

    variant = relationship("Variant", back_populates="samples") 
    sample = relationship("Samples", back_populates="variants") 


class Variant(Base): 

    __tablename__ = "variants" 

    variant_id = Column(Integer, primary_key=True) 
    info = deferred(Column(LargeBinary)) 

    samples = relationship("VariantAssociation", 
         back_populates="variant") 

    def __repr__(self):
        # chrom, start, end, gene, ref, alt and type are reflected columns
        data = "<Variant {chrom}:{start}-{end} {gene} {ref}/{alt} {type}>"

        return data.format(chrom=self.chrom,
                           start=self.start,
                           end=self.end,
                           gene=self.gene,
                           ref=self.ref,
                           alt=self.alt,
                           type=self.type)


class Samples(Base): 

    __tablename__ = "samples" 

    sample_id = Column(Integer, primary_key=True, index=True) 
    name = Column(Text, index=True) 
    variants = relationship("VariantAssociation", 
          back_populates="sample")

Now, given this query:

query = session.query(Variant).join(VariantAssociation).join(Samples)
query = query.filter(VariantAssociation.vdepth >= 60)

I want to count the combinations of two columns, ref and alt.

I thought it would be as simple as:

query = query.with_entities(Variant.ref, Variant.alt,
    func.count()).distinct().group_by(Variant.ref, Variant.alt)

which yields (one example row):

('A', 'C', 308)

However, if I just iterate over the query and count:

from collections import defaultdict, Counter 
counts = defaultdict(Counter) 
for row in query.with_entities(Variant.ref, Variant.alt): 
    counts[f"{row.ref}>{row.alt}"].update(["present"])

this gives me:

'A>C': Counter({'present': 155})

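That is almost half of what count reports. I know that the latter is correct and the former is not. However, I would like to use the former, because the latter is very slow (it is a large SQLite database).

Am I miscounting?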

EDIT: As requested, here is the full query for the count (including a couple more filters, taken from the DB itself):

SELECT DISTINCT variants.ref AS variants_ref, variants.alt AS variants_alt, count(*) AS count_1 
FROM variants JOIN sample_variant_association ON variants.variant_id = sample_variant_association.vid JOIN 
samples ON samples.sample_id = sample_variant_association.sid 
WHERE sample_variant_association.gt_type != ? AND variants.impact NOT IN (?, ?, ?, ?) AND 
sample_variant_association.vdepth >= ? AND sample_variant_association.fraction >= ? AND variants.chrom NOT IN (?, 
?) AND variants.aaf_1kg_eur < ? AND variants.type = ? AND sample_variant_association.fraction >= ? AND 
sample_variant_association.vdepth >= ? GROUP BY variants.ref, variants.alt

And the one used for iterating:

SELECT DISTINCT variants.ref AS variants_ref, variants.alt AS variants_alt 
FROM variants JOIN sample_variant_association ON variants.variant_id = sample_variant_association.vid JOIN 
samples ON samples.sample_id = sample_variant_association.sid 
WHERE sample_variant_association.gt_type != ? AND variants.impact NOT IN (?, ?, ?, ?) AND 
sample_variant_association.vdepth >= ? AND sample_variant_association.fraction >= ? AND variants.chrom NOT IN (?, 
?) AND variants.aaf_1kg_eur < ? AND variants.type = ? AND sample_variant_association.fraction >= ? AND 
sample_variant_association.vdepth >= ?

EDIT 2: I have confirmed that the base query contains duplicate variant_ids:

query.with_entities(gemini.Variant.variant_id).count() 
18288 
query.with_entities(gemini.Variant.variant_id).distinct().count() 
14437

So the problem is different from what I originally thought. Somehow the duplicate records are taken care of in the loop, but not in func.count().
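One way such duplicates can arise is the join itself: a variant that matches several rows in sample_variant_association appears once per matching row, so count(*) counts sample/variant pairs rather than variants. The following is a minimal sketch of that effect using the standard-library sqlite3 module. The cut-down schema and all rows are invented for illustration and are not the real data.

```python
import sqlite3

# Toy stand-in for the real database; every row here is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE variants (variant_id INTEGER PRIMARY KEY, ref TEXT, alt TEXT);
    CREATE TABLE sample_variant_association (vid INTEGER, sid INTEGER, vdepth INTEGER);
    INSERT INTO variants VALUES (1, 'A', 'C'), (2, 'A', 'C'), (3, 'G', 'T');
    -- variant 1 is seen in two samples, variants 2 and 3 in one sample each
    INSERT INTO sample_variant_association VALUES
        (1, 10, 80), (1, 11, 75), (2, 10, 90), (3, 11, 65);
""")

# Shared FROM/JOIN/WHERE part, mirroring the slimmed-down query.
join = """
    FROM variants v
    JOIN sample_variant_association a ON v.variant_id = a.vid
    WHERE a.vdepth >= 60
"""

# Which variants are repeated by the join, and how many times?
fan_out = conn.execute(
    f"SELECT v.variant_id, COUNT(*) {join} GROUP BY v.variant_id HAVING COUNT(*) > 1"
).fetchall()

# COUNT(*) per (ref, alt) counts joined rows, not distinct variants.
row_counts = conn.execute(
    f"SELECT v.ref, v.alt, COUNT(*) {join} GROUP BY v.ref, v.alt ORDER BY v.ref"
).fetchall()

print(fan_out)     # [(1, 2)]
print(row_counts)  # [('A', 'C', 3), ('G', 'T', 1)]
```

In this toy data there are only two A>C variants, but the grouped count(*) reports 3 because variant 1 is joined to two samples.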

Source

2017-06-30 Einar

Can you share the query that is generated? You can get it with 'str(query)'. – shanmuga

As @shanmuga said, please share the queries used in both cases. –

I have added both queries. Thank you. – Einar

Using a subquery to remove the duplicates first worked:

id_subquery = query.with_entities(Variant.variant_id).distinct().subquery()

Then the actual data retrieval:

c_query = session.query(Variant.ref, Variant.alt, func.count(1)) 
c_query = c_query.filter(Variant.variant_id.in_(id_subquery)) 
c_query = c_query.group_by(Variant.ref, Variant.alt) 

c_query.first() 
('A', 'C', 155)

Source

2017-06-30 11:52:52 Einar
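As a possible alternative to the subquery, each variant can be counted once per group with COUNT(DISTINCT ...) in the grouped query itself. The sketch below compares both forms at the SQL level with the standard-library sqlite3 module, on an invented toy schema and rows. In SQLAlchemy the equivalent would presumably be func.count(Variant.variant_id.distinct()) in place of func.count(); that is an assumption and has not been tested against the database in the question.

```python
import sqlite3

# Toy stand-in for the real database; every row here is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE variants (variant_id INTEGER PRIMARY KEY, ref TEXT, alt TEXT);
    CREATE TABLE sample_variant_association (vid INTEGER, sid INTEGER, vdepth INTEGER);
    INSERT INTO variants VALUES (1, 'A', 'C'), (2, 'A', 'C'), (3, 'G', 'T');
    -- variant 1 passes the depth filter in two samples; (3, 10) fails it
    INSERT INTO sample_variant_association VALUES
        (1, 10, 80), (1, 11, 75), (2, 10, 90), (3, 10, 40), (3, 11, 65);
""")

# Alternative: count each variant once per (ref, alt) group.
distinct_counts = conn.execute("""
    SELECT v.ref, v.alt, COUNT(DISTINCT v.variant_id)
    FROM variants v
    JOIN sample_variant_association a ON v.variant_id = a.vid
    WHERE a.vdepth >= 60
    GROUP BY v.ref, v.alt
    ORDER BY v.ref
""").fetchall()

# Subquery form, as in the answer: deduplicate the ids first, then group.
subquery_counts = conn.execute("""
    SELECT ref, alt, COUNT(1)
    FROM variants
    WHERE variant_id IN (
        SELECT DISTINCT v.variant_id
        FROM variants v
        JOIN sample_variant_association a ON v.variant_id = a.vid
        WHERE a.vdepth >= 60
    )
    GROUP BY ref, alt
    ORDER BY ref
""").fetchall()

print(distinct_counts)                     # [('A', 'C', 2), ('G', 'T', 1)]
print(distinct_counts == subquery_counts)  # True
```

Both forms agree on this toy data. Which one is faster on a large SQLite file depends on the indexes and the query planner, so it is worth timing both.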
