Titanic Kaggle dataset: Naive Bayes classifier error in R

I am trying to train a Naive Bayes classifier on the Kaggle Titanic dataset ("train.csv" and "test.csv"). However, 'output' ends up containing nothing at all.
library(e1071)
train_d <- read.csv("train.csv", stringsAsFactors = TRUE)
# columns chosen for training data-
# colnames(train_d) OR names(train_d)
# "Survived", "Pclass", "Sex", "Age", "SibSp", "Parch","Embarked"
train_data <- train_d[, c(2:3, 5:8, 12)]
# to find out which columns contain NA (missing values)-
colnames(train_data)[apply(is.na(train_data), 2, any)]
# mean(train_data$Age, na.rm = TRUE) # to find mean of 'Age' which contains 'NA'
# which(is.na(train_data$Age))
# fill in missing value (NA) with mean of 'Age' column-
train_data$Age[which(is.na(train_data$Age))] <- mean(train_data$Age, na.rm = TRUE)
# check whether there are any existing NAs-
which(is.na(train_data$Age))
# OR-
colnames(train_data)[apply(is.na(train_data), 2, any)]
test_d <- read.csv("test.csv", stringsAsFactors = TRUE)
# columns chosen for test data-
# "Pclass", "Sex", "Age", "SibSp", "Parch", "Embarked"
test_data <- test_d[, c(2, 4:7, 11)]
# find out missing values (NA)-
colnames(test_data)[apply(is.na(test_data), 2, any)]
# fill in missing value (NA) with mean of 'Age' column-
test_data$Age[which(is.na(test_data$Age))] <- mean(test_data$Age, na.rm = TRUE)
# check whether there are any existing NAs-
which(is.na(test_data$Age))
# OR-
colnames(test_data)[apply(is.na(test_data), 2, any)]
# training a naive-bayes classifier-
titanic_nb <- naiveBayes(Survived ~ Pclass + Sex + Age + SibSp + Parch + Embarked, data = train_data)
# predict using trained naive-bayes classifier-
output <- predict(titanic_nb, test_data, type = "class")
The above is the code I have so far. Printing the 'output' variable gives:
> output
factor(0)
Levels:
What could be going wrong?
Thank you!
Perhaps [this](https://stackoverflow.com/questions/17904190/why-does-naivebayes-return-all-nas-for-multiclass-classification-in-r) helps, after converting all strings to factors – akrun
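
A minimal sketch of one likely cause, in the spirit of the comment above. It assumes that `read.csv` reads 'Survived' as an integer column (0/1), and that, as far as I can tell, `naiveBayes` from e1071 takes its class labels from the factor levels of the response, so a non-factor response leaves `predict(..., type = "class")` with no levels to return. The toy data frame is made up for illustration (it is not the Kaggle data) and the `toy_*` names are hypothetical; the only step that matters is the `as.factor()` call.

```r
library(e1071)

# Toy stand-in for a few Titanic columns (made-up values, illustration only).
toy_train <- data.frame(
  Survived = c(0, 1, 1, 0, 1, 0, 0, 1),
  Pclass   = c(3, 1, 2, 3, 1, 3, 2, 1),
  Sex      = factor(c("male", "female", "female", "male",
                      "female", "male", "female", "male")),
  Age      = c(22, 38, 26, 35, 54, 20, 40, 29)
)
toy_test <- data.frame(
  Pclass = c(3, 1),
  Sex    = factor(c("male", "female"), levels = levels(toy_train$Sex)),
  Age    = c(30, 25)
)

# The response was created as numeric, so it has no factor levels.
is.factor(toy_train$Survived)   # FALSE

# Assumed fix: make the response a factor before training.
toy_train$Survived <- as.factor(toy_train$Survived)

toy_nb   <- naiveBayes(Survived ~ Pclass + Sex + Age, data = toy_train)
toy_pred <- predict(toy_nb, toy_test, type = "class")

length(toy_pred)   # one prediction per row of toy_test
levels(toy_pred)   # "0" "1"
```

In the question's code the equivalent change would be `train_data$Survived <- as.factor(train_data$Survived)` placed before the call to `naiveBayes`; whether that alone resolves the empty output on the real data is an assumption worth checking with `str(train_data)`.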