2017-11-09 7 views

Answers

0

Using base R, you can do the following:

set.seed(12345) 
# draw one set of training indices: 20% of the data (20 out of 100)
# the same indices are used for x and y so the pairs stay aligned
train <- sample(1:100, 20)

# simulating random data
x <- rnorm(100)
y <- rnorm(100)

# sub-setting the x data
training.x.data <- x[train]
testing.x.data <- x[-train]

# sub-setting the y data
training.y.data <- y[train]
testing.y.data <- y[-train]
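If the data lives in a data frame rather than in separate vectors, the same idea can be wrapped in a small helper. This is only a sketch: the name split_train_test and its arguments are made up for illustration and are not part of base R or of the answer above.

```r
# Hypothetical helper (illustration only): split a data frame into
# training and testing sets using one shared set of row indices.
split_train_test <- function(df, train_frac = 0.2, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- nrow(df)
  # sample.int draws row numbers without replacement
  train_idx <- sample.int(n, size = round(train_frac * n))
  # setdiff keeps every row that was not drawn for training
  test_idx <- setdiff(seq_len(n), train_idx)
  list(train = df[train_idx, , drop = FALSE],
       test  = df[test_idx, , drop = FALSE])
}

df <- data.frame(x = rnorm(100), y = rnorm(100))
parts <- split_train_test(df, train_frac = 0.2, seed = 12345)
nrow(parts$train)  # 20
nrow(parts$test)   # 80
```

Because both pieces are cut from the same index vector, each x value stays paired with its y value, and no row ends up in both sets.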
0

You can do this with the createDataPartition function from caret:

library(caret) 

# Make example data 
X = data.frame(matrix(rnorm(200), nrow = 100)) 
y = rnorm(100) 

#Extract random sample of indices for test data 
set.seed(42) #equivalent to python's random_state arg 
test_inds = createDataPartition(y = 1:length(y), p = 0.2, list = FALSE) 

# Split data into test/train using indices 
X_test = X[test_inds, ]; y_test = y[test_inds] 
X_train = X[-test_inds, ]; y_train = y[-test_inds] 

You can also create test_inds from scratch using 'test_inds = sample(1:length(y), ceiling(length(y) * 0.2))'.

3

#read in iris dataset 
data(iris) 
library(caret) #this package has the createDataPartition function 

set.seed(123) #make the random split reproducible 

#creating indices 
trainIndex <- createDataPartition(iris$Species,p=0.75,list=FALSE) 

#splitting data into training/testing data using the trainIndex object 
IRIS_TRAIN <- iris[trainIndex,] #training data (75% of data) 

IRIS_TEST <- iris[-trainIndex,] #testing data (25% of data) 
This is probably the easiest way to do it.
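If caret is not installed, a stratified split of iris by Species can be approximated in base R by sampling within each class. This is a rough sketch of the general idea, not caret's implementation; the names idx_by_class, iris_train and iris_test are chosen here for illustration.

```r
data(iris)
set.seed(123)

# group the row numbers by species
idx_by_class <- split(seq_len(nrow(iris)), iris$Species)

# draw 75% of the rows within each species
train_idx <- unlist(lapply(idx_by_class, function(idx) {
  idx[sample.int(length(idx), size = ceiling(0.75 * length(idx)))]
}))

iris_train <- iris[train_idx, ]
iris_test <- iris[-train_idx, ]

# class counts in each set
table(iris_train$Species)
table(iris_test$Species)
```

Sampling per class keeps the species balanced in both sets, which a plain sample() over all rows does not guarantee.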