NbClust error: not enough objects to cluster

I'm trying to determine the optimal number of clusters for a cluster analysis with the NbClust method in R, following the approach in the Manning book.

However, I get the following error message:

Error in hclust(md, method = "average") : must have n >= 2 objects to cluster

Running hclust on its own seems to work, though. So I assume the problem is that NbClust tries to form groups that contain only one object (which is also what the error message suggests).

My dataset can be found here, and this is my code:

mydata = read.table("PLR_2016_WM_55_5_Familienstand_aufbereitet.csv", skip = 0, sep = ";", header = TRUE) 

mydata <- mydata[-1] # Drop the first column (int)
data.transformed <- t(mydata) # Transformation of matrix 
data.scale <- scale(data.transformed) # Scaling of table 
data.dist <- dist(data.scale) # Calculates distances between points 

fit.average <- hclust(data.dist, method = "average") 
plot(fit.average, hang = -1, cex = .8, main = "Average Linkage Clustering") 

library(NbClust) 
nc <- NbClust(data.scale, distance="euclidean", 
      min.nc=2, max.nc=15, method="average")
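A minimal sketch of how this error can arise, using a small made-up table (not the real CSV) that ends in empty rows, the situation the answer below points out. It assumes that NbClust drops incomplete rows with `na.omit()` before computing distances; that is an assumption about its internals, so the sketch calls `na.omit()`, `dist()` and `hclust()` directly.

```r
# Hypothetical toy data: 3 rows x 4 variables, plus 2 completely empty
# trailing rows like the ones at the end of the real CSV
toy <- data.frame(
  a = c(1, 4, 2, NA, NA),
  b = c(3, 1, 5, NA, NA),
  c = c(2, 2, 7, NA, NA),
  d = c(6, 0, 1, NA, NA)
)

# Same steps as in the question: transpose, then scale
toy.scale <- scale(t(toy))

# After transposing, every row contains missing values (one per empty
# original row), so dropping incomplete rows leaves nothing to cluster
complete <- na.omit(toy.scale)
print(nrow(complete))  # 0

# hclust() refuses to cluster fewer than two objects
err <- tryCatch(
  hclust(dist(complete), method = "average"),
  error = function(e) conditionMessage(e)
)
print(err)
```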

I found a similar problem here, but I wasn't able to adapt the code.


2017-06-28 Hannah

There are a few problems with your dataset.
The last 4 rows contain no data, so they have to be removed:

mydata <- read.table("PLR_2016_WM_55_5_Familienstand_aufbereitet.csv", skip = 0, sep = ";", header = TRUE) 
mydata <- mydata[1:(nrow(mydata)-4),]  # drop the last 4 rows, which contain no data
mydata[,1] <- as.numeric(mydata[,1])   # convert the first column to numeric
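If the number of empty trailing rows might change between files, one alternative is to drop every row that is completely empty instead of hard-coding 4. A sketch on a hypothetical toy table, assuming the empty cells are read in as `NA` (in a character column, `read.table()` may read blank cells as `""` instead, which this check would not catch):

```r
# Hypothetical toy table: an ID column plus two measurements, with two
# trailing rows that are completely empty (read in as NA)
toy <- data.frame(
  id = c("1011", "1012", "1013", NA, NA),
  x  = c(5, 3, 8, NA, NA),
  y  = c(2, 9, 4, NA, NA),
  stringsAsFactors = FALSE
)

# Keep only rows with at least one non-missing value
empty <- rowSums(is.na(toy)) == ncol(toy)
toy <- toy[!empty, ]

# Going through as.character() avoids getting factor level codes
# if the column was read in as a factor
toy$id <- as.numeric(as.character(toy$id))

print(dim(toy))  # 3 3
```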

Now rescale the dataset:

data.transformed <- t(mydata) # Transformation of matrix 
data.scale <- scale(data.transformed) # Scaling of table

For some reason, data.scale is not a full-rank matrix:

dim(data.scale) 
# [1] 72 447 
qr(data.scale)$rank 
# [1] 71
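The rank deficiency follows from the scaling step: `scale()` centres every column, so the 72 rows of the centred matrix add up to the zero vector and at most 71 of them can be linearly independent. A sketch with random made-up values of the same shape (72 x 447), not the real data, reproduces the numbers above:

```r
set.seed(1)
# Made-up data with the same shape as t(mydata): 72 x 447
m <- matrix(rnorm(72 * 447), nrow = 72, ncol = 447)

s <- scale(m)  # centre and scale every column

# Each column now sums to (numerically) zero, i.e. the rows add up to
# the zero vector and are therefore linearly dependent
print(max(abs(colSums(s))))  # effectively zero

print(dim(s))      # 72 447
print(qr(s)$rank)  # 71
```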

Hence, we drop one row from data.scale and transpose the result:

data.scale <- t(data.scale[-72,])
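A quick check on made-up data of the same shape that this step yields a 447 x 71 matrix with full column rank, so its 71 x 71 cross-product matrix is positive definite. Because the rows of the centred matrix sum to zero, removing any single row leaves the rank unchanged. Which NbClust indices actually need this is an assumption, not something shown above.

```r
set.seed(1)
# Made-up stand-in for the scaled 72 x 447 matrix
s <- scale(matrix(rnorm(72 * 447), nrow = 72, ncol = 447))

# Drop the last row and transpose, as in the answer
x <- t(s[-72, ])
print(dim(x))  # 447 71

# The 71 columns are linearly independent ...
print(qr(x)$rank)  # 71

# ... so the cross-product matrix has only positive eigenvalues
ev <- eigen(crossprod(x), symmetric = TRUE, only.values = TRUE)$values
print(min(ev) > 0)  # TRUE
```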

The dataset is now ready for NbClust:

library(NbClust) 
nc <- NbClust(data=data.scale, distance="euclidean", 
      min.nc=2, max.nc=15, method="average")

The output is:

[1] "Frey index : No clustering structure in this data set" 
*** : The Hubert index is a graphical method of determining the number of clusters. 
       In the plot of Hubert index, we seek a significant knee that corresponds to a 
       significant increase of the value of the measure i.e the significant peak in Hubert 
       index second differences plot. 

*** : The D index is a graphical method of determining the number of clusters. 
       In the plot of D index, we seek a significant knee (the significant peak in Dindex 
       second differences plot) that corresponds to a significant increase of the value of 
       the measure. 

******************************************************************* 
* Among all indices:             
* 8 proposed 2 as the best number of clusters 
* 4 proposed 3 as the best number of clusters 
* 8 proposed 4 as the best number of clusters 
* 1 proposed 5 as the best number of clusters 
* 1 proposed 8 as the best number of clusters 
* 1 proposed 11 as the best number of clusters 

        ***** Conclusion *****        

* According to the majority rule, the best number of clusters is 2 

*******************************************************************
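The conclusion is a vote count across the indices, and 2 and 4 clusters each received 8 votes, so the majority here is actually a tie. A small sketch that tallies the counts reported above; `which.max()` returns the first maximum, which picks 2 (whether NbClust breaks ties the same way is an assumption, the output does not say):

```r
# Votes as reported in the NbClust summary above: the number of
# clusters proposed by each of the 23 indices that gave an answer
votes <- rep(c(2, 3, 4, 5, 8, 11), times = c(8, 4, 8, 1, 1, 1))

counts <- table(votes)
print(counts)

# All cluster numbers that share the highest vote count
tied <- as.numeric(names(counts)[counts == max(counts)])
print(tied)  # 2 4

# which.max() returns the first maximum, i.e. the smallest tied value
best <- as.numeric(names(which.max(counts)))
print(best)  # 2
```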


2017-06-28 11:25:37

Thank you, your answer helped a lot. – Hannah
