2017-06-11

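I'd like to use GridSearchCV over a range of alphas (the Laplace smoothing parameter) to find out which value gives the best accuracy for a Bernoulli Naive Bayes model. Here is the code leading up to the GridSearchCV initialization: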

def binarize_pixels(data, threshold=0.784): 
    # Initialize a new feature array with the same shape as the original data. 
    binarized_data = np.zeros(data.shape) 

    # Apply a threshold to each feature. 
    for feature in range(data.shape[1]): 
        binarized_data[:, feature] = data[:, feature] > threshold
    return binarized_data 

binarized_train_data = binarize_pixels(mini_train_data) 

def BNB(): 
    clf = BernoulliNB() 
    clf.fit(binarized_train_data, mini_train_labels) 
    scoring = clf.score(mini_train_data, mini_train_labels) 
    predsNB = clf.predict(dev_data) 
    print("Bernoulli binarized model accuracy: {:.4}".format(np.mean(predsNB == dev_labels)))

The model itself works fine, but my GridSearch cross-validation does not:

pipeline = Pipeline([('classifier', BNB())]) 
def P8(alphas): 
    gs_clf = GridSearchCV(pipeline, param_grid = alphas, refit=True) 
    y_predictions = gs_clf.best_estimator_.predict(dev_data) 
    print(classification_report(dev_labels, y_predictions))
alphas = {'alpha' : [0.0, 0.0001, 0.001, 0.01, 0.1, 0.5, 1.0, 2.0, 10.0]} 
P8(alphas) 

I get AttributeError: 'GridSearchCV' object has no attribute 'best_estimator_'

Answer

The "'GridSearchCV' object has no attribute 'best_estimator_'" error comes from the following two lines:

gs_clf = GridSearchCV(pipeline, param_grid = alphas, refit=True) 
y_predictions = gs_clf.best_estimator_.predict(dev_data) 

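Before you can predict, the grid search has to be fitted; that is, you need to call gs_clf.fit first. The best_estimator_ attribute only exists after fitting. See the following example from the documentation: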

>>> from sklearn import svm, datasets 
>>> from sklearn.model_selection import GridSearchCV 
>>> iris = datasets.load_iris() 
>>> parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]} 
>>> svr = svm.SVC() 
>>> clf = GridSearchCV(svr, parameters) 
>>> clf.fit(iris.data, iris.target) 
...        
GridSearchCV(cv=None, error_score=..., 
     estimator=SVC(C=1.0, cache_size=..., class_weight=..., coef0=..., 
        decision_function_shape=None, degree=..., gamma=..., 
        kernel='rbf', max_iter=-1, probability=False, 
        random_state=None, shrinking=True, tol=..., 
        verbose=False), 
     fit_params={}, iid=..., n_jobs=1, 
     param_grid=..., pre_dispatch=..., refit=..., return_train_score=..., 
     scoring=..., verbose=...) 
>>> sorted(clf.cv_results_.keys()) 
...        
['mean_fit_time', 'mean_score_time', 'mean_test_score',... 
'mean_train_score', 'param_C', 'param_kernel', 'params',... 
'rank_test_score', 'split0_test_score',... 
'split0_train_score', 'split1_test_score', 'split1_train_score',... 
'split2_test_score', 'split2_train_score',... 
'std_fit_time', 'std_score_time', 'std_test_score', 'std_train_score'...]
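
Applied to the question's code, the fix could look roughly like the sketch below. It rests on several assumptions that are not in the original post: the data is a small synthetic stand-in for mini_train_data and dev_data, BernoulliNB's own binarize threshold replaces the manual binarize_pixels step, and cv=3 is picked arbitrarily. It also puts a BernoulliNB instance into the pipeline directly (BNB() in the question returns None) and names the grid key 'classifier__alpha', since parameters of a pipeline step are addressed as '<step name>__<parameter>'. The value 0.0 is left out of the grid because zero smoothing can trigger warnings in scikit-learn.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (an assumption; the question uses its own pixel data).
rng = np.random.RandomState(0)
train_labels = rng.randint(0, 2, size=200)
# "Pixels" in [0, 1]; rows of class 1 are brighter on average.
train_data = np.clip(rng.rand(200, 20) + 0.3 * train_labels[:, None], 0.0, 1.0)
dev_labels = rng.randint(0, 2, size=50)
dev_data = np.clip(rng.rand(50, 20) + 0.3 * dev_labels[:, None], 0.0, 1.0)

# The pipeline step must be an estimator object.
# binarize=0.784 thresholds the features inside the model for fit and predict.
pipeline = Pipeline([('classifier', BernoulliNB(binarize=0.784))])

# Step parameters are addressed as '<step name>__<parameter>'.
alphas = {'classifier__alpha': [0.0001, 0.001, 0.01, 0.1, 0.5, 1.0, 2.0, 10.0]}

gs_clf = GridSearchCV(pipeline, param_grid=alphas, refit=True, cv=3)
gs_clf.fit(train_data, train_labels)  # best_estimator_ exists only after this call

best_alpha = gs_clf.best_params_['classifier__alpha']
y_predictions = gs_clf.best_estimator_.predict(dev_data)
dev_accuracy = np.mean(y_predictions == dev_labels)
print("best alpha: {}, dev accuracy: {:.4}".format(best_alpha, dev_accuracy))
```

With refit=True, calling gs_clf.predict(dev_data) directly should give the same predictions, because the fitted search delegates to the refitted best estimator.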