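ValueError when predicting San Francisco crime
I got this error while running my project: ValueError: Found arrays with inconsistent numbers of samples: [878049 884262]
It appears when I try to run the knn classifier below. I have read up on it, and I understand that my X and y do not have the same number of samples: X has shape (878049, 2) and y has shape (884262,).
How can I fix this error so that the two match?
Code: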
# drop features that we won't be using
# train.head()
df = train.drop(['Descript', 'Resolution', 'Address'], axis=1)
df2 = test.drop(['Address'], axis=1)
# trying to see the times during a day a particular crime occurs, for example
# rapes occur more from 12am-4am during the weekend.
# example below
dow = {
'Monday':0,
'Tuesday':1,
'Wednesday':2,
'Thursday':3,
'Friday':4,
'Saturday':5,
'Sunday':6
}
df['DOW'] = df.DayOfWeek.map(dow)
# Add column containing time of day
df['Hour'] = pd.to_datetime(df.Dates).dt.hour
# making my feature column
feature_cols = ['DOW', 'Hour']
X = df[feature_cols]
df2['DOW'] = df2.DayOfWeek.map(dow)
y = df2['DOW']
# number of rows (samples) in X and y don't match
print(X.shape)
print(y.shape)
print(y.head())
print(X.head())
# Knn classifier
k = 5
my_knn_for_cs4661 = KNeighborsClassifier(n_neighbors=k)
my_knn_for_cs4661.fit(X, y)
# KNN (with k=5), Decision Tree accuracy
y_predict = my_knn_for_cs4661.predict(X)
print('\n')
score = accuracy_score(y, y_predict)
print("K=",k,"Has ",score, "Accuracy")
results = pd.DataFrame()
results['actual'] = y
results['prediction'] = y_predict
print(results.head(10))
Stack trace:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-11-5a002c1fd668> in <module>()
7 k = 5
8 my_knn_for_cs4661 = KNeighborsClassifier(n_neighbors=k)
----> 9 my_knn_for_cs4661.fit(X, y)
10 #KNN (with k=5), Decision Tree accuracy
11 y_predict = my_knn_for_cs4661.predict(X)
C:\Users\Michael\Anaconda3\lib\site-packages\sklearn\neighbors\base.py in fit(self, X, y)
776 """
777 if not isinstance(X, (KDTree, BallTree)):
--> 778 X, y = check_X_y(X, y, "csr", multi_output=True)
779
780 if y.ndim == 1 or y.ndim == 2 and y.shape[1] == 1:
C:\Users\Michael\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
518 y = y.astype(np.float64)
519
--> 520 check_consistent_length(X, y)
521
522 return X, y
C:\Users\Michael\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
174 if len(uniques) > 1:
175 raise ValueError("Found arrays with inconsistent numbers of samples: "
--> 176 "%s" % str(uniques))
177
178
ValueError: Found arrays with inconsistent numbers of samples: [878049 884262]
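The last frame of the trace shows the check that fails: the lengths of the arrays passed to fit are collected, and a ValueError is raised if more than one distinct length is found. A simplified pure-Python stand-in for that check is sketched below. The name check_same_length and the toy lists are hypothetical and only illustrate the idea; this is not the library's actual code.

```python
def check_same_length(*arrays):
    """Simplified stand-in for the length check seen in the traceback."""
    # Collect the distinct lengths of all inputs.
    lengths = sorted({len(a) for a in arrays})
    if len(lengths) > 1:
        raise ValueError(
            "Found arrays with inconsistent numbers of samples: %s" % lengths
        )


# Toy data: three feature rows but only two labels.
X_rows = [[0, 1], [2, 3], [4, 5]]
y_labels = [0, 1]

try:
    check_same_length(X_rows, y_labels)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

So the two numbers in the error message are simply the row count of X and the length of y.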
Can you add the stack trace? –
@SayaliSonawane OK, I have added it – lupejuares
Check the shapes of X and y with X.shape and y.shape. The stack trace says that X and y have different numbers of instances. –
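In the posted code, X is built from df (derived from train) while y is built from df2 (derived from test), which would explain the two different counts. A minimal sketch of one way to make them line up is to take the features and the target from the same DataFrame. The toy rows below are made up, and using Category as the target column is an assumption, since the post does not say which column is meant to be predicted.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Toy stand-in for the training data. Column names follow the post;
# 'Category' as the target is an assumption, not stated in the question.
train = pd.DataFrame({
    "Dates": ["2015-05-13 23:53:00", "2015-05-13 02:10:00",
              "2015-05-12 14:00:00", "2015-05-11 09:30:00"],
    "DayOfWeek": ["Wednesday", "Wednesday", "Tuesday", "Monday"],
    "Category": ["WARRANTS", "ASSAULT", "LARCENY/THEFT", "ASSAULT"],
})

dow = {"Monday": 0, "Tuesday": 1, "Wednesday": 2, "Thursday": 3,
       "Friday": 4, "Saturday": 5, "Sunday": 6}

df = train.copy()
df["DOW"] = df["DayOfWeek"].map(dow)
df["Hour"] = pd.to_datetime(df["Dates"]).dt.hour

# Features and target both come from df, so the row counts agree.
X = df[["DOW", "Hour"]]
y = df["Category"]
print(X.shape, y.shape)

# n_neighbors=1 only because the toy frame has four rows.
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y)
pred = knn.predict(X)
```

The test set would then only be passed to predict, after building the same DOW and Hour columns on it, and never used as y in fit.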