Annotate outliers on a seaborn jointplot

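When plotting the "tips" dataset as a jointplot, I would like to label the top 10 outliers (or the top n outliers) on the graph with their index from the "tips" dataframe. To find the outliers I compute the residuals (the vertical distance of each point from the fitted regression line). Please ignore the merits of this outlier detection method; I just want to annotate the graph according to the specification.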
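The residual here is the observed tip minus the value of the least-squares line at that bill, y - (slope * x + intercept). A minimal sketch using NumPy's polyfit, which does not depend on pd.ols; the helper name add_residuals and the five data rows are hypothetical stand-ins for the tips data:

```python
import numpy as np
import pandas as pd


def add_residuals(df, x, y):
    """Return a copy of df with a 'resid' column: y minus the fitted line's value."""
    # With deg=1, np.polyfit returns [slope, intercept] of the least-squares line
    slope, intercept = np.polyfit(df[x], df[y], deg=1)
    out = df.copy()
    out["resid"] = df[y] - (slope * df[x] + intercept)
    return out


# Hypothetical stand-in for the tips data (values made up for illustration)
demo = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0, 50.0],
    "tip": [1.5, 3.5, 4.0, 9.0, 7.0],
})
demo = add_residuals(demo, "total_bill", "tip")
print(demo)
```

Because the fitted line includes an intercept, the residuals sum to (numerically) zero. Positive values are points above the line and negative values are points below it.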

import pandas as pd
import seaborn as sns
sns.set(style="darkgrid", color_codes=True)

tips = sns.load_dataset("tips")
# pd.ols exists only in pandas < 0.20
model = pd.ols(y=tips.tip, x=tips.total_bill)
tips['resid'] = model.resid

# indices to annotate: the 5 largest residuals...
tips.sort_values(by=['resid'], ascending=[False]).head(5)

# ...and the 5 smallest residuals
tips.sort_values(by=['resid'], ascending=[False]).tail(5)

%matplotlib inline
g = sns.jointplot("total_bill", "tip", data=tips, kind="reg",
        xlim=(0, 60), ylim=(0, 12), color="r", size=7)

How can I annotate the top 10 outliers on the graph (the 5 largest and the 5 smallest residuals) with each point's index value?
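Selecting "the 5 largest and the 5 smallest residuals" can also be done without sorting the whole frame: pandas' nlargest and nsmallest return the same rows as sort_values(...).head(n) and the bottom n of the sort (ties aside; nsmallest lists them in ascending order). A minimal sketch; the helper extreme_residuals and its sample values are hypothetical:

```python
import pandas as pd


def extreme_residuals(df, n=5, col="resid"):
    """Return the rows with the n largest and the n smallest values of `col`."""
    # nlargest/nsmallest pick the extremes without a full sort_values + head/tail
    largest = df.nlargest(n, col)
    smallest = df.nsmallest(n, col)
    return pd.concat([largest, smallest])


# Hypothetical residuals; n=2 because the toy frame is small
demo = pd.DataFrame({"resid": [0.5, -2.0, 3.0, -0.1, 1.2, -4.0, 0.0]})
picked = extreme_residuals(demo, n=2)
print(list(picked.index))
```

The index of the returned frame holds exactly the labels to annotate.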


2017-03-24 Thomas Matthew

You can use matplotlib's annotate to create an annotation for a point. The idea is to iterate over the rows of the dataframe and place an annotation at each position given by the "total_bill" and "tip" columns.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style="darkgrid", color_codes=True)

tips = sns.load_dataset("tips")
model = pd.ols(y=tips.tip, x=tips.total_bill)  # pandas < 0.20 only
tips['resid'] = model.resid

g = sns.jointplot("total_bill", "tip", data=tips, kind="reg",
        xlim=(0, 60), ylim=(0, 12), color="r", size=7)

# indices to annotate: the 5 largest and the 5 smallest residuals
head = tips.sort_values(by=['resid'], ascending=[False]).head(5)
tail = tips.sort_values(by=['resid'], ascending=[False]).tail(5)

def ann(row):
    # iterrows() yields (index, Series) pairs
    ind, r = row
    # draw on the joint axes explicitly instead of relying on plt.gca()
    g.ax_joint.annotate(str(ind), xy=(r["total_bill"], r["tip"]),
                        xytext=(2, 2), textcoords="offset points")

for row in head.iterrows():
    ann(row)
for row in tail.iterrows():
    ann(row)

plt.show()

Note that pandas.ols has been removed as of pandas version 0.20. To replace it, you can use the OLS model from statsmodels. The corresponding lines would then look like this:

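import statsmodels.api as sm
# sm.OLS does not add an intercept on its own; add_constant adds the column of ones
X = sm.add_constant(tips.total_bill)
model = sm.OLS(tips.tip, X)
tips['resid'] = model.fit().resid

Without sm.add_constant the model is fitted through the origin, so the residuals come out somewhat different from those of pd.ols, which fits an intercept by default. With the constant added, the residuals are measured from the same kind of line (slope plus intercept) that kind="reg" draws.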
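Whether the statsmodels residuals match pd.ols comes down to the intercept: sm.OLS(endog, exog) fits only the columns it is given and does not add a constant by itself. The effect can be reproduced with plain NumPy least squares. A minimal sketch; the helper ols_residuals and the five data points are hypothetical:

```python
import numpy as np


def ols_residuals(x, y, intercept=True):
    """Least-squares residuals; intercept=False mimics sm.OLS(y, x) without add_constant."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones (if requested) plus the regressor
    X = np.column_stack([np.ones_like(x), x]) if intercept else x[:, None]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


x = [10.0, 20.0, 30.0, 40.0, 50.0]
y = [1.5, 3.5, 4.0, 9.0, 7.0]
with_const = ols_residuals(x, y, intercept=True)
through_origin = ols_residuals(x, y, intercept=False)
print(with_const, with_const.sum())
print(through_origin, through_origin.sum())
```

With the intercept column the residuals sum to zero. Forced through the origin they generally do not, and the ranking of the most extreme points can shift as well.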


2017-03-24 23:35:09 ImportanceOfBeingErnest

Sorting the iterables and truncating them with 'head' and 'tail' is great for cutting down the number of iterations on a large dataframe like my real dataset. Thanks –

This is really cool. Well done! – Charlie

I have updated the answer with a solution for newer versions of pandas. – ImportanceOfBeingErnest
