Pandas iterrows: row number and percentage

I am iterating over a dataframe with thousands of rows. Ideally I would like to know the progress of my loop, i.e. how many rows it has completed, what percentage of the total rows it has completed, and so on.

Is there a way to print the row number, or even better, the percentage of rows iterated over?

My code currently is below. Printing `row` as shown gives some kind of tuple/list, but all I need is the row number. This is probably simple.

for row in testDF.iterrows():
    print("Currently on row: " + str(row))

Ideal printed output:

Currently on row 1; Currently iterrated 1% of rows 
Currently on row 2; Currently iterrated 2% of rows 
Currently on row 3; Currently iterrated 3% of rows 
Currently on row 4; Currently iterrated 4% of rows 
Currently on row 5; Currently iterrated 5% of rows

Source

2017-07-02 christaylor

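Why are you using a loop to begin with? There is probably a better way. If you must, you can easily calculate the progress with `enumerate`, which returns the position of the current row (along with the row itself); dividing that by the total number of rows gives `progress = index/len(testDF)`. – DeepSpace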

I am using an iterrows loop because I am creating a new column with geocoded data. Most geocoding services have rate limits, so I have added a 0.1 second delay to my loop. – christaylor
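
The setup described in this comment can be sketched roughly as follows. `fake_geocode` is a hypothetical stand-in for a real geocoding call, and the `address`/`geocoded` column names and the `delay` parameter are assumptions for illustration; only the 0.1 second pause comes from the comment itself.

```python
import time

import pandas as pd


def fake_geocode(address):
    """Hypothetical stand-in for a real, rate-limited geocoding service."""
    return "geocoded:" + address


def geocode_with_progress(df, delay=0.1):
    """Return one result per row, printing the progress after each row."""
    total = len(df)
    results = []
    # enumerate counts positions from 1, independent of the index labels
    for position, (label, row) in enumerate(df.iterrows(), start=1):
        results.append(fake_geocode(row["address"]))
        print("Currently on row {}; iterated {:.0f}% of rows".format(
            position, 100 * position / total))
        time.sleep(delay)  # pause between calls to respect the rate limit
    return results


addresses = pd.DataFrame({"address": ["a", "b", "c", "d"]},
                         index=[10, 20, 30, 40])
addresses["geocoded"] = geocode_with_progress(addresses, delay=0)
```

Collecting the results in a list and assigning the column once after the loop avoids modifying the frame while iterating over it.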

One possible solution, using `format`, if the index is unique and monotonic (0, 1, 2, ...):

for i, row in testDF.iterrows(): 
     print("Currently on row: {}; Currently iterrated {}% of rows".format(i, (i + 1)/len(testDF.index) * 100))

Sample:

import numpy as np
import pandas as pd

np.random.seed(1332)
testDF = pd.DataFrame(np.random.randint(10, size=(10, 3)))
print(testDF)
    0 1 2 
0 8 1 9 
1 4 3 5 
2 0 1 3 
3 1 8 6 
4 7 4 7 
5 7 5 3 
6 7 9 9 
7 0 1 2 
8 1 3 4 
9 0 0 3 

for i, row in testDF.iterrows(): 
     print("Currently on row: {}; Currently iterrated {}% of rows".format(i, (i + 1)/len(testDF.index) * 100)) 
Currently on row: 0; Currently iterrated 10.0% of rows 
Currently on row: 1; Currently iterrated 20.0% of rows 
Currently on row: 2; Currently iterrated 30.0% of rows 
Currently on row: 3; Currently iterrated 40.0% of rows 
Currently on row: 4; Currently iterrated 50.0% of rows 
Currently on row: 5; Currently iterrated 60.0% of rows 
Currently on row: 6; Currently iterrated 70.0% of rows 
Currently on row: 7; Currently iterrated 80.0% of rows 
Currently on row: 8; Currently iterrated 90.0% of rows 
Currently on row: 9; Currently iterrated 100.0% of rows

EDIT: If the index contains custom values, a solution is to `zip` the `iterrows` output with `numpy.arange` over the length of the index, which is the same as the length of the df:

np.random.seed(1332) 
testDF = pd.DataFrame(np.random.randint(10, size=(10, 3)), index=[2,4,5,6,7,8,2,1,3,5]) 
print (testDF) 
    0 1 2 
2 8 1 9 
4 4 3 5 
5 0 1 3 
6 1 8 6 
7 7 4 7 
8 7 5 3 
2 7 9 9 
1 0 1 2 
3 1 3 4 
5 0 0 3 

for i, (idx, row) in zip(np.arange(len(testDF.index)), testDF.iterrows()): 
    print("Currently on row: {}; Currently iterrated {}% of rows".format(idx, (i + 1)/len(testDF.index) * 100)) 

Currently on row: 2; Currently iterrated 10.0% of rows 
Currently on row: 4; Currently iterrated 20.0% of rows 
Currently on row: 5; Currently iterrated 30.0% of rows 
Currently on row: 6; Currently iterrated 40.0% of rows 
Currently on row: 7; Currently iterrated 50.0% of rows 
Currently on row: 8; Currently iterrated 60.0% of rows 
Currently on row: 2; Currently iterrated 70.0% of rows 
Currently on row: 1; Currently iterrated 80.0% of rows 
Currently on row: 3; Currently iterrated 90.0% of rows 
Currently on row: 5; Currently iterrated 100.0% of rows

Source

2017-07-02 13:45:19 jezrael

Wouldn't it be better to print it like this? `print('Currently on row', i, 'iterated through', 100*i/testDF.shape[0], '%')`. Thanks for your answer –

@RayhaneMama - I think yours works too; there are many possible ways. I prefer `len(df.index)` because it is the fastest way. – jezrael

Here `i` is the index label of each row. This works if the index contains the integers 0 to `len(df)-1`, but it does not work if `testDF` uses custom index values. –
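
The difference is easy to see on a small frame. The following sketch (the frame and the variable names are made up for illustration) computes the percentage once from the index label and once from the row position:

```python
import pandas as pd

# Illustrative frame with custom, non-sequential index labels
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=[20, 40, 60, 80])

from_label = []
from_position = []
for position, (label, row) in enumerate(df.iterrows()):
    # Using the label treats 20, 40, ... as if they were row counts
    from_label.append(float((label + 1) / len(df.index) * 100))
    # Using the position counts rows 0, 1, 2, ... whatever the labels are
    from_position.append(float((position + 1) / len(df.index) * 100))

print(from_label)     # [525.0, 1025.0, 1525.0, 2025.0]
print(from_position)  # [25.0, 50.0, 75.0, 100.0]
```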

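First of all, `iterrows` yields tuples of (index, row). So the proper code is

for index, row in testDF.iterrows():

In the general case the index is not the row number; it is some identifier. (This is part of the power of pandas, but it causes some confusion, because it does not behave like an ordinary Python list, where the index is the position of the item.) That is why the row number has to be calculated independently. We could introduce line_number = 0 and increment it on each iteration with line_number += 1, but Python provides a ready-made tool for this: enumerate, which yields tuples of (line_number, value) instead of just value. So we arrive at this code: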

for (line_number, (index, row)) in enumerate(testDF.iterrows()): 
    print("Currently on row: {}; Currently iterrated {}% of rows".format(
      line_number, 100*(line_number + 1)/len(testDF)))

P.S. Python 2 returns an integer when you divide integers. That is why 999/1000 = 0, which is not what you expect. So you can either force a float, or move the 100* to the front to get integer percentages.
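
As a small illustration of this point, the sketch below uses `//` (floor division, which acts on integers the same way in Python 2 and Python 3) to compare the two orderings; `total` and `done` are made-up values.

```python
total = 1000
done = 999

# Dividing first truncates to 0 before the multiplication happens
divide_first = done // total * 100

# Multiplying by 100 first keeps the integer percentage
multiply_first = 100 * done // total

# Forcing a float gives the fractional percentage instead
as_float = 100 * float(done) / total

print(divide_first, multiply_first, as_float)  # 0 99 99.9
```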

Source

2017-07-02 14:04:41

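For a large dataframe it is better to limit the printing, since printing is time-consuming. Here is a way to do that, printing only when a whole percent has been completed: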

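import numpy as np
import pandas as pd

dftest = pd.DataFrame(np.random.rand(10**5, 5))

percent = 0
n = len(dftest) // 100  # rows per percentage point (assumes at least 100 rows)

for i, row in dftest.iterrows():
    # i is the index label; this relies on the default 0..len-1 index
    if (i + 1) // n > percent:
        percent += 1
        print(percent, "% realized")
    dftest.iloc[i] = 2 * row  # a job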
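
A variant of the same idea, as a sketch only: the hypothetical helper `iterrows_with_progress` below counts rows with `enumerate` instead of relying on the index labels, so it should also work with a custom index and with fewer than 100 rows (where `n` above would be 0). The helper name and its `report` parameter are assumptions for illustration, not part of pandas.

```python
import numpy as np
import pandas as pd


def iterrows_with_progress(df, report=print):
    """Yield (label, row) pairs, reporting whenever the integer percent rises."""
    total = len(df)
    last_percent = 0
    for position, (label, row) in enumerate(df.iterrows(), start=1):
        percent = 100 * position // total
        if percent > last_percent:
            last_percent = percent
            report("{}% realized".format(percent))
        yield label, row


# Small frame with a non-numeric index; messages are collected, not printed
messages = []
small = pd.DataFrame(np.arange(10).reshape(5, 2), index=list("abcde"))
doubled = [2 * row for _, row in iterrows_with_progress(small, report=messages.append)]
```

With the default `report=print`, a frame of 10**5 rows produces at most 100 lines of output.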

Source

2017-07-02 14:36:09
