2017-09-01

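Scipy: labels for linkage output Z

I compute distances between words/sentences and run them through scipy's linkage function, but I need to know how to map the result back to the original input. That is, because the linkage function does not accept labels, I lose my labels along the way.

tl;dr: I don't know how to associate my labels (var X) with the output of the linkage function.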

import numpy as np
from scipy.cluster.hierarchy import linkage

X = [
    "the weather is good", 
    "it is a rainy day", 
    "it is raining today", 
    "This has something to do with today", 
    "This has something to do with tomorrow", 
] 

# my magic function 

result_set = [['this has something to do with today', 'this has something to do with tomorrow', 0.95044514149501169], 
    ['this has something to do with today', 'it is a rainy day', 0.27315656750393491], 
    ['this has something to do with today', 'it is raining today', 0.21404567560988952], 
    ['this has something to do with today', 'the weather is good', 0.12284646267479128], 
    ['this has something to do with tomorrow', 'it is a rainy day', 0.28564020977046212], 
    ['this has something to do with tomorrow', 'it is raining today', 0.19174771483161279], 
    ['this has something to do with tomorrow', 'the weather is good', 0.12920110156248313], 
    ['it is a rainy day', 'it is raining today', 0.54390124565447373], 
    ['it is a rainy day', 'the weather is good', 0.20843820300588964], 
    ['it is raining today', 'the weather is good', 0.19278767792873652]] 

sims = np.array(result_set)[:, 2]
# sims is now:
# ['0.950445141495' '0.273156567504' '0.21404567561' '0.122846462675'
#  '0.28564020977' '0.191747714832' '0.129201101562' '0.543901245654'
#  '0.208438203006' '0.192787677929']

Z = linkage(sims, 'ward')
# Z is now:
# [[ 0.   4.   0.12284646 2.  ]
#  [ 1.   3.   0.19174771 2.  ]
#  [ 2.   5.   0.27143491 3.  ]
#  [ 6.   7.   0.70328415 5.  ]]

Answer

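I was feeding similarities into a function that expects distances, so I simply inverted the results, and then the output made sense. With the following call, the labels are displayed correctly: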

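import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram

dendrogram(
    Z,
    labels=X,            # one label per original observation, in input order
    orientation="right",
    leaf_rotation=0,     # rotation of the leaf labels, in degrees
    leaf_font_size=8,    # font size of the leaf labels
)
plt.show()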
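
To make the label bookkeeping concrete, here is a minimal sketch, not the asker's actual code. It assumes that "inverting" means using 1 - similarity as the distance and that sentences are matched case-insensitively. It also swaps in method="average", since SciPy documents ward as correctly defined only for Euclidean distances. The helper describe_merges is a hypothetical name introduced for illustration. Two facts carry the mapping: the condensed vector must list pairs in the order (0,1), (0,2), ..., (n-2,n-1) of the label list, and in Z an index below n refers to the original observation at that position, while index n + i refers to the cluster created in row i. Note that the pairs in result_set are not listed in the order of X, so the sketch places each value by label rather than by position.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

X = [
    "the weather is good",
    "it is a rainy day",
    "it is raining today",
    "This has something to do with today",
    "This has something to do with tomorrow",
]

# Pairwise similarities copied from the question.
result_set = [
    ['this has something to do with today', 'this has something to do with tomorrow', 0.95044514149501169],
    ['this has something to do with today', 'it is a rainy day', 0.27315656750393491],
    ['this has something to do with today', 'it is raining today', 0.21404567560988952],
    ['this has something to do with today', 'the weather is good', 0.12284646267479128],
    ['this has something to do with tomorrow', 'it is a rainy day', 0.28564020977046212],
    ['this has something to do with tomorrow', 'it is raining today', 0.19174771483161279],
    ['this has something to do with tomorrow', 'the weather is good', 0.12920110156248313],
    ['it is a rainy day', 'it is raining today', 0.54390124565447373],
    ['it is a rainy day', 'the weather is good', 0.20843820300588964],
    ['it is raining today', 'the weather is good', 0.19278767792873652],
]

# Position of each (lower-cased) sentence in X, so values land by label.
index = {label.lower(): i for i, label in enumerate(X)}
n = len(X)

# Fill a symmetric distance matrix whose row/column order matches X.
dist = np.zeros((n, n))
for a, b, sim in result_set:
    i, j = index[a], index[b]
    # Assumption: similarities lie in [0, 1], so 1 - sim serves as a distance.
    dist[i, j] = dist[j, i] = 1.0 - sim

# Convert to the condensed form that linkage expects.
condensed = squareform(dist)

# Assumption: average linkage instead of the question's 'ward'.
Z = linkage(condensed, method="average")


def describe_merges(Z, labels):
    """For each row of Z, list the original labels inside the new cluster."""
    n = len(labels)
    # Clusters 0..n-1 are the original observations.
    members = {i: [labels[i]] for i in range(n)}
    merges = []
    for step, (left, right, height, _count) in enumerate(Z):
        merged = members[int(left)] + members[int(right)]
        # Row `step` of Z creates the cluster with index n + step.
        members[n + step] = merged
        merges.append((merged, float(height)))
    return merges


merges = describe_merges(Z, X)
for merged, height in merges:
    print(f"{height:.4f}  {merged}")
```

If only the leaf order is needed, dendrogram(Z, labels=X, no_plot=True)["ivl"] should return the labels in the order they appear along the leaf axis, without drawing anything.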