2016-11-01

How to read a series of tables with pandas read_html and flatten/normalize them?

I am working on extracting some tables from the web, so I was reading about the pandas read_html function. When I do:

import pandas as pd 
url_mcc = 'link.com.html' 
dfs = pd.read_html(url_mcc) 
dfs 

I get the following list:

[          Presentation \ 
0 0.4 mg/mL, 1 mL single-dose vial, package of 2... 
1 1 mg/mL, 1 mL single-dose vial, package of 25 ... 

    Availability and Estimated Shortage Duration \ 
0    Available for NDC 00517-0401-25. 
1         Available 

            Related Information \ 
0 American Regent is currently releasing the 0.4... 
1 American Regent is currently releasing the 1mg... 

    Shortage Reason (per FDASIA) 
0 Demand increase for the drug 
1       Other , 
             Presentation \ 
0 0.1 mg/mL; 10 mL Luer-Jet Prefilled Syringe (N... 

    Availability and Estimated Shortage Duration Related Information \ 
0       Product available     NaN 

    Shortage Reason (per FDASIA) 
0 Demand increase for the drug , 
             Presentation \ 
0 0.1 mg/mL; 10 mL Ansyr syringe (NDC 0409-1630-10) 
1 0.05 mg/mL; 5 mL Ansyr syringe (NDC 0409-9630-05) 
2 0.1 mg/mL; 5 mL Lifeshield syringe (NDC 0409-4... 
3 0.1 mg/mL; 10 mL Lifeshield syringe (NDC 0409-... 

     Availability and Estimated Shortage Duration \ 
0 Next delivery: Late October. Estimated recover... 
1   Next delivery: TBD Estimated recovery: TBD 
2           Available 
3           Available 

            Related Information \ 
0 Please check with your wholesaler for availabl... 
1 Please check with your wholesaler for availabl... 
2    Shortage per Manufacturer: Available 
3    Shortage per Manufacturer: Available 

    Shortage Reason (per FDASIA) 
0      Other 
1      Other 
2      Other 
3      Other , 
           Presentation \ 
0 0.4 mg/mL, 20 mL vial (NDC 0641-6006-10) 

    Availability and Estimated Shortage Duration \ 
0   West-Ward has available inventory. 

            Related Information \ 
0 Additional lots are scheduled to be manufactur... 

    Shortage Reason (per FDASIA) 
0 Demand increase for the drug ] 

As you can see, the same columns repeat throughout the list (or tables?): Presentation, Availability and Estimated Shortage Duration, Related Information and Shortage Reason (per FDASIA), because this website has three different tables with the same columns. So my question is: how can I flatten, or more or less normalize, all the different tables or lists into a single one, like this:

[          Presentation \ 
0 0.4 mg/mL, 1 mL single-dose vial, package of 2... 
1 1 mg/mL, 1 mL single-dose vial, package of 25 ... 
2 1 mg/mL; 10 mL Luer-Jet Prefilled Syringe (N... 
3 0.1 mg/mL; 10 mL Ansyr syringe (NDC 0409-1630-10) 
4 0.05 mg/mL; 5 mL Ansyr syringe (NDC 0409-9630-05) 
5 0.1 mg/mL; 5 mL Lifeshield syringe (NDC 0409-4... 
6 0.1 mg/mL; 10 mL Lifeshield syringe (NDC 0409-... 



    Availability and Estimated Shortage Duration \ 
0    Available for NDC 00517-0401-25. 
1         Available 
2       Product available     NaN 
0 Next delivery: Late October. Estimated recover... 
1   Next delivery: TBD Estimated recovery: TBD 
2           Available 
3           Available 
0 0.4 mg/mL, 20 mL vial (NDC 0641-6006-10) 

    Availability and Estimated Shortage Duration \ 
0   West-Ward has available inventory. 


    Shortage Reason (per FDASIA) 
0 Demand increase for the drug 


            Related Information \ 
0 American Regent is currently releasing the 0.4... 
1 American Regent is currently releasing the 1mg... 
0 Please check with your wholesaler for availabl... 
1 Please check with your wholesaler for availabl... 
2    Shortage per Manufacturer: Available 
3    Shortage per Manufacturer: Available 
0 Additional lots are scheduled to be manufactur... 


    Shortage Reason (per FDASIA) 
0 Demand increase for the drug 
1       Other , 



    Shortage Reason (per FDASIA) 
0 Demand increase for the drug , 
0      Other 
1      Other 
2      Other 
3      Other , 

Answer


I think you need concat, if dfs is a list of DataFrames:

df = pd.concat(dfs) 
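To see what pd.concat does with a list shaped like the one read_html returned, here is a minimal sketch with two small hypothetical tables that reuse the question's column names (the cell values are made up for illustration). Rows are stacked and columns are aligned by name, but each table keeps its own 0-based index:

```python
import numpy as np
import pandas as pd

cols = ["Presentation",
        "Availability and Estimated Shortage Duration",
        "Related Information",
        "Shortage Reason (per FDASIA)"]

# Hypothetical stand-ins for two of the tables returned by read_html
t1 = pd.DataFrame([["vial A", "Available", "info A", "Other"],
                   ["vial B", "Available", "info B", "Other"]], columns=cols)
t2 = pd.DataFrame([["syringe C", "Product available", np.nan,
                    "Demand increase for the drug"]], columns=cols)

dfs = [t1, t2]

# Stack the tables vertically; columns are matched by name
df = pd.concat(dfs)
print(df.index.tolist())  # prints [0, 1, 0]: each table's row labels are kept
```

The repeated labels 0, 1, 0 are why the ignore_index option below is useful.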

You can also use the parameter ignore_index=True to avoid duplicate values in the index:

df = pd.concat(dfs, ignore_index=True) 
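If you also want to remember which table each row came from, pd.concat accepts keys= and names=. A sketch, assuming that numbering the tables 0, 1, 2, ... is good enough (the tables are hypothetical, and the level names "table" and "row" are arbitrary choices):

```python
import pandas as pd

# Hypothetical tables with identical columns, as in the question
dfs = [pd.DataFrame({"Presentation": ["vial A", "vial B"],
                     "Shortage Reason (per FDASIA)": ["Other", "Other"]}),
       pd.DataFrame({"Presentation": ["syringe C"],
                     "Shortage Reason (per FDASIA)": ["Demand increase for the drug"]})]

# keys= adds an outer index level labelling the source table
df = pd.concat(dfs, keys=list(range(len(dfs))), names=["table", "row"])

# Move the table label into an ordinary column, then renumber rows 0..n-1
flat = df.reset_index(level="table").reset_index(drop=True)
print(flat)
```

Unlike ignore_index=True alone, this keeps the information about where each table started.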

Sample:

import pandas as pd

df1 = pd.DataFrame({'A':[1,2,3],
        'B':[4,5,6], 
        'C':[7,8,9]}) 

#print (df1) 

df2 = pd.DataFrame({'A':[3,4,6], 
        'B':[2,3,4], 
        'C':[3,6,0]}) 

#print (df2) 

df3 = pd.DataFrame({'A':[4,7,9], 
        'B':[3,4,5], 
        'C':[5,1,9]}) 

#print (df3) 

dfs = [df1,df2,df3] 
print (dfs) 
[ A B C 
0 1 4 7 
1 2 5 8 
2 3 6 9, A B C 
0 3 2 3 
1 4 3 6 
2 6 4 0, A B C 
0 4 3 5 
1 7 4 1 
2 9 5 9] 
df = pd.concat(dfs) 
print (df) 
    A B C 
0 1 4 7 
1 2 5 8 
2 3 6 9 
0 3 2 3 
1 4 3 6 
2 6 4 0 
0 4 3 5 
1 7 4 1 
2 9 5 9 

df1 = pd.concat(dfs, ignore_index=True) 
print (df1) 
    A B C 
0 1 4 7 
1 2 5 8 
2 3 6 9 
3 3 2 3 
4 4 3 6 
5 6 4 0 
6 4 3 5 
7 7 4 1 
8 9 5 9 
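For a scraped page the same idea can be wrapped in a small helper. This is a sketch, not a tested scraper: flatten_tables and read_flat_tables are hypothetical names, the match="Presentation" filter assumes every table you want contains that header text, and stripping whitespace from the headers is an extra normalization step in case the HTML headers differ only in surrounding spaces. pd.read_html itself needs an HTML parser such as lxml (or BeautifulSoup4 together with html5lib) installed.

```python
import pandas as pd

def flatten_tables(dfs):
    """Concatenate a list of DataFrames into one with a fresh 0..n-1 index."""
    # Strip stray whitespace from header text so 'Presentation ' and
    # 'Presentation' end up in the same column after concatenation
    cleaned = [df.rename(columns=lambda c: str(c).strip()) for df in dfs]
    # Columns missing from a table are filled with NaN for that table's rows
    return pd.concat(cleaned, ignore_index=True)

def read_flat_tables(url):
    """Read every <table> on the page whose text matches 'Presentation'."""
    # read_html returns a list of DataFrames, one per matching table
    dfs = pd.read_html(url, match="Presentation")
    return flatten_tables(dfs)

# Usage with hypothetical in-memory tables instead of a real URL
a = pd.DataFrame({"Presentation ": ["vial A"],
                  "Shortage Reason (per FDASIA)": ["Other"]})
b = pd.DataFrame({"Presentation": ["syringe C"]})
flat = flatten_tables([a, b])
print(flat)
```

With the question's code this would be called as read_flat_tables(url_mcc).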