Using multiple custom classes with Pipeline in sklearn (Python)

I'm trying to put together a Pipeline tutorial for my students, but I'm stuck. I'm not an expert, but I'm trying to improve. Thank you for your indulgence.

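In practice, when I prepare a DataFrame for a classifier, I run several steps, and I am trying to do them in a pipeline:

Step 1: describe the DataFrame
Step 2: fill in NaN values
Step 3: transform categorical values into numbers

Here is my code: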

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder

class Descr_df(object):

    def transform(self, X):
        print("Structure of the data: \n {}".format(X.head(5)))
        print("Features names: \n {}".format(X.columns))
        print("Target: \n {}".format(X.columns[0]))
        print("Shape of the data: \n {}".format(X.shape))

    def fit(self, X, y=None):
        return self

class Fillna(object):

    def transform(self, X):
        non_numerics_columns = X.columns.difference(X._get_numeric_data().columns)
        for column in X.columns:
            if column in non_numerics_columns:
                # non-numeric column: fill with its most frequent value
                X[column] = X[column].fillna(X[column].value_counts().idxmax())
            else:
                # numeric column: fill with the column mean
                X[column] = X[column].fillna(X[column].mean())
        return X

    def fit(self, X, y=None):
        return self

class Categorical_to_numerical(object):

    def transform(self, X):
        non_numerics_columns = X.columns.difference(X._get_numeric_data().columns)
        le = LabelEncoder()
        for column in non_numerics_columns:
            X[column] = X[column].fillna(X[column].value_counts().idxmax())
            # encode each category of the column as an integer label
            le.fit(X[column])
            X[column] = le.transform(X[column]).astype(int)
        return X

    def fit(self, X, y=None):
        return self

If I run steps 1 and 2, or steps 1 and 3, it works, but if I run steps 1, 2 and 3 together I get this error:

pipeline = Pipeline([('df_intropesction', Descr_df()), ('fillna',Fillna()), ('Categorical_to_numerical', Categorical_to_numerical())]) 
pipeline.fit(X, y) 
AttributeError: 'NoneType' object has no attribute 'columns'


2017-04-19 Jeremie Guez

Probably one of them is None: 'X' or 'y'. Please post the full stack trace. – sergzach

You get this error because the output of the first estimator in the pipeline goes to the second, then the output of the second estimator goes to the third, and so on. From the documentation of Pipeline:

Fit all the transforms one after the other and transform the data, then fit the transformed data using the final estimator.

So for your pipeline, the steps of execution are as follows:

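Descr_df.fit(X) -> does nothing and returns itself.
newX = Descr_df.transform(X) -> should return some value to assign to newX, which is then passed to the next estimator, but your definition returns nothing (it only prints). So None is returned implicitly.
Fillna.fit(newX) -> does nothing and returns itself.
Fillna.transform(newX) -> accesses newX.columns. But newX is None from step 2, hence the error.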
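The execution order above can be imitated in a few lines of plain Python. This is a simplified sketch, not scikit-learn's actual implementation: the helper `simple_pipeline_fit` and the classes `DescrNoReturn` and `NeedsColumns` are hypothetical, and a `SimpleNamespace` with a `columns` attribute stands in for a DataFrame. It also suggests why the two-step pipelines appeared to work: the final step only has `fit` called on it, so its `transform` never touches the `None`.

```python
from types import SimpleNamespace


class DescrNoReturn:
    """Mimics the original Descr_df: transform prints but returns nothing."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        print("columns:", X.columns)  # no return statement, so None is returned


class NeedsColumns:
    """Mimics Fillna: transform reads X.columns."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return SimpleNamespace(columns=list(X.columns))


def simple_pipeline_fit(steps, X, y=None):
    """Simplified model of Pipeline.fit: every step except the last is
    fitted and then transforms the data; the last step is only fitted."""
    Xt = X
    for step in steps[:-1]:
        Xt = step.fit(Xt, y).transform(Xt)
    steps[-1].fit(Xt, y)
    return Xt  # the data that reached the final step


data = SimpleNamespace(columns=["target", "age"])

# Two steps: the final step only receives fit(None), so nothing fails.
print(simple_pipeline_fit([DescrNoReturn(), NeedsColumns()], data))

# Three steps: NeedsColumns.transform(None) is called on the middle step and fails.
try:
    simple_pipeline_fit([DescrNoReturn(), NeedsColumns(), NeedsColumns()], data)
except AttributeError as err:
    print(err)  # 'NoneType' object has no attribute 'columns'
```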

Solution: change the transform method of Descr_df so that it returns the DataFrame unchanged.

def transform(self, X):
    print("Structure of the data: \n {}".format(X.head(5)))
    print("Features names: \n {}".format(X.columns))
    print("Target: \n {}".format(X.columns[0]))
    print("Shape of the data: \n {}".format(X.shape))
    # return the DataFrame unchanged so the next step in the pipeline receives it
    return X
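With that change, the three-step pipeline should fit without the AttributeError. Below is a minimal end-to-end check, assuming pandas and scikit-learn are installed. The small DataFrame is made-up sample data, Descr_df is shortened to a single print, and the other two classes are condensed versions of the question's (using `X[column]` rather than `df[column]` in Fillna, and `LabelEncoder.fit_transform` in one call).

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder


class Descr_df(object):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        print("Shape of the data: \n {}".format(X.shape))
        return X  # the fix: hand the DataFrame on to the next step


class Fillna(object):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        non_numerics_columns = X.columns.difference(X._get_numeric_data().columns)
        for column in X.columns:
            if column in non_numerics_columns:
                X[column] = X[column].fillna(X[column].value_counts().idxmax())
            else:
                X[column] = X[column].fillna(X[column].mean())
        return X


class Categorical_to_numerical(object):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        non_numerics_columns = X.columns.difference(X._get_numeric_data().columns)
        le = LabelEncoder()
        for column in non_numerics_columns:
            X[column] = le.fit_transform(X[column]).astype(int)
        return X


# Hypothetical sample data with one missing value per column.
df = pd.DataFrame({"age": [20.0, np.nan, 40.0, 30.0],
                   "city": ["Paris", "Lyon", "Paris", None]})

pipeline = Pipeline([("df_intropesction", Descr_df()),
                     ("fillna", Fillna()),
                     ("Categorical_to_numerical", Categorical_to_numerical())])

# These transformers modify X in place, so pass copies to keep df intact.
pipeline.fit(df.copy())                 # no AttributeError any more
result = pipeline.transform(df.copy())  # runs transform on all three steps
print(result)
```

Note that `pipeline.fit` alone never calls the last step's `transform`; the encoded output only appears when `transform` (or `fit_transform`) is called on the pipeline.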

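Suggestion: it is good practice to make your classes inherit from scikit-learn's base estimator and transformer classes, so that they follow its conventions.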

That is, change class Descr_df(object) to class Descr_df(BaseEstimator, TransformerMixin), and Fillna(object) to Fillna(BaseEstimator, TransformerMixin) (both base classes are imported from sklearn.base).
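As a small sketch of what that inheritance provides (assuming a reasonably recent scikit-learn and pandas): TransformerMixin supplies `fit_transform` built from your `fit` and `transform`, and BaseEstimator supplies `get_params`/`set_params` derived from the `__init__` arguments, which is what `clone` and tools such as `GridSearchCV` rely on. The `fill_numeric` parameter is a hypothetical addition for illustration, not part of the question's class.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone


class Fillna(BaseEstimator, TransformerMixin):
    """Fill NaNs: most frequent value for non-numeric columns,
    mean (or another Series reduction) for numeric columns."""

    def __init__(self, fill_numeric="mean"):  # hypothetical parameter
        self.fill_numeric = fill_numeric

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = X.copy()  # leave the caller's DataFrame untouched
        numeric = X.select_dtypes(include="number").columns
        for column in X.columns:
            if column in numeric:
                value = getattr(X[column], self.fill_numeric)()
            else:
                value = X[column].value_counts().idxmax()
            X[column] = X[column].fillna(value)
        return X


step = Fillna(fill_numeric="median")
print(step.get_params())         # {'fill_numeric': 'median'}, from BaseEstimator
print(clone(step).fill_numeric)  # median

df = pd.DataFrame({"age": [20.0, np.nan, 40.0, 100.0],
                   "city": ["Paris", "Lyon", "Paris", None]})
print(step.fit_transform(df))    # fit_transform comes from TransformerMixin
```

Copying inside `transform` is a design choice in this sketch; the question's version fills the caller's DataFrame in place.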

For more details on custom classes in a pipeline, see this example:

http://scikit-learn.org/stable/auto_examples/hetero_feature_union.html#sphx-glr-auto-examples-hetero-feature-union-py


2017-04-19 16:19:38

I'll take a look and let you know. Your answer looks very interesting and helpful. Thank you! –

@JeremieGuez Please try the solution. If it solves your problem, consider accepting this answer. –

Thanks –
