2016-08-24 9 views
1

Using a DataFrame with columns as named arguments to str.format()

I have a DataFrame like the one below, and I know I can produce the output I want with string concatenation:

import pandas as pd 
df = pd.DataFrame({'author':["Melville","Hemingway","Faulkner"], 
        'title':["Moby Dick","The Sun Also Rises","The Sound and the Fury"], 
        'subject':["whaling","bullfighting","a messed-up family"] 
        }) 

# produces desired output     
("Some guy " + df['author'] + " wrote a book called " + 
    df['title'] + " that uses " + df['subject'] + 
    " as a metaphor for the human condition.") 

But is it possible to write this more clearly using str.format(), something along the lines of:

# returns KeyError:'author' 
["Some guy {author} wrote a book called {title} that uses " 
    "{subject} as a metaphor for the human condition.".format(x) 
     for x in df.itertuples(index=False)] 

Answers

3
>>> ["Some guy {author} wrote a book called {title} that uses " 
    "{subject} as a metaphor for the human condition.".format(**x._asdict()) 
     for x in df.itertuples(index=False)] 

['Some guy Melville wrote a book called Moby Dick that uses whaling as a metaphor for the human condition.', 'Some guy Hemingway wrote a book called The Sun Also Rises that uses bullfighting as a metaphor for the human condition.', 'Some guy Faulkner wrote a book called The Sound and the Fury that uses a messed-up family as a metaphor for the human condition.'] 
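The KeyError in the question comes from format(x) passing the whole row as a single positional argument, so no keyword named author ever reaches the template; `**` unpacks a mapping into keyword arguments instead. A minimal sketch of the difference, using a plain namedtuple (the `Book` type is a hypothetical stand-in for the rows that itertuples() yields):

```python
from collections import namedtuple

# Hypothetical stand-in for one row produced by df.itertuples(index=False)
Book = namedtuple("Book", ["author", "title"])
row = Book("Melville", "Moby Dick")

template = "{author} wrote {title}"

# Passing the tuple itself: it becomes positional argument 0,
# so the named field {author} cannot be resolved.
try:
    template.format(row)
    raised = False
except KeyError:
    raised = True

# Unpacking the row's mapping supplies author=... and title=... as keywords.
unpacked = template.format(**row._asdict())
print(raised, unpacked)
```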
Note that _asdict() is not part of the public API, so relying on it may break in a future pandas update.

You could do this instead:

>>> ["Some guy {} wrote a book called {} that uses " 
    "{} as a metaphor for the human condition.".format(*x) 
     for x in df.values] 
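For what it is worth, the standard library documents `_asdict()` as a regular namedtuple method; the leading underscore is there to avoid clashes with field names. If you would still rather not call it and want to keep named fields, two other options are sketched below, assuming the column names are valid Python identifiers: attribute lookup on the positional argument, and DataFrame.to_dict("records"). The shortened template and two-row frame are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "author": ["Melville", "Hemingway"],
    "title": ["Moby Dick", "The Sun Also Rises"],
})

# Option 1: pass each namedtuple row as positional argument 0
# and let the format string do the attribute lookup.
by_attr = ["{0.title} by {0.author}".format(row)
           for row in df.itertuples(index=False)]

# Option 2: turn each row into a plain dict and unpack it as keywords.
by_record = ["{title} by {author}".format(**rec)
             for rec in df.to_dict("records")]

print(by_attr)
print(by_record)
```

Both variants keep the template independent of column order, like the keyword-based version above.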
+0

There it is: the '*' handles the tuple part for me. Brilliant, thanks. Not sure why someone downvoted us – C8H10N4O2

0

You could also use DataFrame.iterrows() like this:

["The book {title} by {author} uses " 
    "{subject} as a metaphor for the human condition.".format(**x) 
    for i, x in df.iterrows()] 

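This is nice if you want to:

  • use named arguments, so the order of use does not have to match the order of the columns (as in the example above)
  • avoid an internal function like _asdict()

Timing: the fastest appears to be M. Klugerford's second solution, even if we heed the warning about caching and take the slowest run.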

# example 
%%timeit 
("Some guy " + df['author'] + " wrote a book called " + 
    df['title'] + " that uses " + df['subject'] + 
    " as a metaphor for the human condition.") 
# 1000 loops, best of 3: 883 µs per loop 

%%timeit 
    ["Some guy {author} wrote a book called {title} that uses " 
     "{subject} as a metaphor for the human condition.".format(**x._asdict()) 
      for x in df.itertuples(index=False)] 
#1000 loops, best of 3: 962 µs per loop 

%%timeit 
    ["Some guy {} wrote a book called {} that uses " 
    "{} as a metaphor for the human condition.".format(*x) 
      for x in df.values] 
#The slowest run took 5.90 times longer than the fastest. This could mean that an intermediate result is being cached. 
#10000 loops, best of 3: 18.9 µs per loop 

%%timeit 
    from collections import OrderedDict 
    ["The book {title} by {author} uses " 
     "{subject} as a metaphor for the human condition.".format(**x) 
     for x in [OrderedDict(row) for i, row in df.iterrows()]] 
#1000 loops, best of 3: 308 µs per loop    

%%timeit 
    ["The book {title} by {author} uses " 
     "{subject} as a metaphor for the human condition.".format(**x) 
     for i, x in df.iterrows()] 
#1000 loops, best of 3: 413 µs per loop   

Why the next-to-last one is faster than the last one is beyond me.
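To re-run comparisons like these outside IPython, the standard-library timeit module can stand in for %%timeit. A rough sketch follows; the function names and the shortened templates are hypothetical, the column order is pinned explicitly so the positional variant is well defined, and the numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame(
    {
        "author": ["Melville", "Hemingway", "Faulkner"],
        "title": ["Moby Dick", "The Sun Also Rises", "The Sound and the Fury"],
        "subject": ["whaling", "bullfighting", "a messed-up family"],
    },
    columns=["author", "title", "subject"],  # fix the order for format(*x)
)


def via_values():
    # Positional fields: depends on the column order fixed above.
    return ["{} wrote {} about {}".format(*x) for x in df.values]


def via_iterrows():
    # Named fields: each row is a Series unpacked as keyword arguments.
    return ["{author} wrote {title} about {subject}".format(**x)
            for _, x in df.iterrows()]


for fn in (via_values, via_iterrows):
    runs = 20
    # Take the best of three repeats, as %%timeit reports "best of 3".
    best = min(timeit.repeat(fn, number=runs, repeat=3))
    print("{}: {:.1f} µs per call".format(fn.__name__, best / runs * 1e6))
```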
