Sklearn: Categorical Imputer?

Is there a way to impute categorical values using an sklearn.preprocessing object? Ultimately, I would like to create a preprocessing object that I can apply to new data so that it is transformed the same way as the old data.

I am looking for a way to do it so that I can use it this way.


2017-03-16 user1367204

You should add more explanation, along with your data: what do you want to do with it? –

Copying and modifying this answer, I made an imputer for a pandas.Series object:

import numpy
import pandas

from sklearn.base import BaseEstimator, TransformerMixin


class SeriesImputer(BaseEstimator, TransformerMixin):
    """Impute missing values in a pandas.Series.

    If the Series is numeric, impute with the mean.
    Otherwise (e.g. dtype object), impute with the most frequent value.
    """

    def fit(self, X, y=None):
        # Learn the fill value from the data passed to fit() only
        if pandas.api.types.is_numeric_dtype(X):
            self.fill = X.mean()
        else:
            # value_counts() ignores NaN and sorts by count, most frequent first
            self.fill = X.value_counts().index[0]
        return self

    def transform(self, X, y=None):
        # Apply the stored fill value to this (or any new) Series
        return X.fillna(self.fill)

To use it, you would do:

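# Make a Series with one missing value
# (numpy.nan, not numpy.NaN: the NaN alias was removed in NumPy 2.0)
s1 = pandas.Series(['k', 'i', 't', 't', 'e', numpy.nan])

a = SeriesImputer()   # Initialize the imputer
a.fit(s1)             # Fit the imputer: learns the most frequent value, 't'
s2 = a.transform(s1)  # Get a new Series with the NaN replaced by 't'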
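The question asks for an object that is fitted once on old data and then reused on new data. Below is a minimal sketch of that workflow, extending the same idea to every column of a DataFrame. The class name DataFrameImputer is hypothetical, and splitting columns into numeric versus everything else with pandas.api.types.is_numeric_dtype is an assumption, not part of the answer above.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype
from sklearn.base import BaseEstimator, TransformerMixin


class DataFrameImputer(BaseEstimator, TransformerMixin):
    """Hypothetical helper: per-column imputation for a pandas DataFrame.

    Numeric columns are filled with their training mean; all other
    columns (object/string, categorical) with their most frequent value.
    """

    def fit(self, X, y=None):
        fills = {}
        for col in X.columns:
            if is_numeric_dtype(X[col]):
                fills[col] = X[col].mean()
            else:
                # value_counts() drops NaN and sorts by frequency, descending
                fills[col] = X[col].value_counts().index[0]
        # One fill value per column, learned from the training data only
        self.fill_ = fills
        return self

    def transform(self, X, y=None):
        # fillna with a dict fills each column with the value under its label
        return X.fillna(self.fill_)


# Fit on the old (training) data ...
train = pd.DataFrame({
    "letter": ["k", "i", "t", "t", "e", np.nan],
    "n": [1.0, 2.0, np.nan, 3.0, 4.0, 2.0],
})
imputer = DataFrameImputer().fit(train)

# ... then transform new data with the same learned fill values
new = pd.DataFrame({"letter": [np.nan, "x"], "n": [np.nan, 5.0]})
result = imputer.transform(new)
print(result)
```

Because the fill values are stored on the fitted object, the NaN in the new data is filled with 't' and the training mean of n, regardless of what the new batch itself contains.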


2017-03-17 13:51:53 user1367204

Yes, it is possible. For example, you can use sklearn.preprocessing.Imputer with the parameter strategy='most_frequent'.

Apply it to the old data (training set) with the fit_transform method, then apply transform to the new data (test set).
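Note that sklearn.preprocessing.Imputer was deprecated in scikit-learn 0.20 and removed in 0.22. Its replacement, sklearn.impute.SimpleImputer, also accepts string (object dtype) columns with strategy='most_frequent'. A minimal sketch of the fit_transform/transform workflow described above, using made-up toy data:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Old data (training set): one categorical column stored as strings
X_train = np.array([["red"], ["blue"], ["red"], [np.nan]], dtype=object)
# New data (test set) with a missing value of its own
X_test = np.array([[np.nan], ["blue"]], dtype=object)

# 'most_frequent' works on object/string columns as well as numeric ones
imputer = SimpleImputer(missing_values=np.nan, strategy="most_frequent")

train_filled = imputer.fit_transform(X_train)  # learns 'red' from X_train
test_filled = imputer.transform(X_test)        # reuses 'red' for X_test

print(imputer.statistics_)  # learned fill value per column
print(test_filled)
```

The fill value comes only from the training set, so the test set is transformed exactly the way the training set was.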


2017-03-17 11:33:42 slonopotam

I don't think Imputer works with strings. – user1367204

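Well, I assumed it was already numeric (integers). If the categorical data is in string form, you first need to convert it to numbers with sklearn.preprocessing.LabelEncoder. – slonopotam

With LabelEncoder, the numpy.NaN fields are lost: they are converted to numbers before the Imputer is used in the next step. – user1367204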
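A sketch of one way around the NaN problem raised in these comments, assuming a recent scikit-learn: impute while the missing values are still NaN, and only then encode the strings as numbers. It swaps in sklearn.preprocessing.OrdinalEncoder for LabelEncoder, since LabelEncoder is intended for target labels (y) rather than feature columns; the toy data is made up.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# Impute first, while NaN is still recognisable, then encode strings as numbers
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OrdinalEncoder()),
])

X_train = np.array([["red"], ["green"], ["red"], [np.nan], ["blue"]], dtype=object)
X_test = np.array([[np.nan], ["green"]], dtype=object)

train_encoded = pipeline.fit_transform(X_train)  # NaN -> 'red' -> code
test_encoded = pipeline.transform(X_test)        # same mapping on new data

# Categories are sorted: ['blue', 'green', 'red'] -> codes 0, 1, 2
print(pipeline.named_steps["encode"].categories_)
print(test_encoded)
```

Keeping both steps in one Pipeline means a single fitted object carries the imputation and the encoding to new data.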
