python sklearn multiple linear regression display r-squared

I have calculated my multiple linear regression equation and I would like to see the adjusted R-squared. I know that the score function gives me the R-squared, but it is not the adjusted value.

import pandas as pd  # import the pandas module 
import numpy as np 
df = pd.read_csv('/Users/jeangelj/Documents/training/linexdata.csv', sep=',') 
df 
    AverageNumberofTickets  NumberofEmployees  ValueofContract       Industry
0                        1                 51            25750         Retail
1                        9                 68            25000       Services
2                       20                 67            40000       Services
3                        1                124            35000         Retail
4                        8                124            25000  Manufacturing
5                       30                134            50000       Services
6                       20                157            48000         Retail
7                        8                190            32000         Retail
8                       20                205            70000         Retail
9                       50                230            75000  Manufacturing
10                      35                265            50000  Manufacturing
11                      65                296            75000       Services
12                      35                336            50000  Manufacturing
13                      60                359            75000  Manufacturing
14                      85                403            81000       Services
15                      40                418            60000         Retail
16                      75                437            53000       Services
17                      85                451            90000       Services
18                      65                465            70000         Retail
19                      95                491           100000       Services

from sklearn.linear_model import LinearRegression 
model = LinearRegression() 
X, y = df[['NumberofEmployees','ValueofContract']], df.AverageNumberofTickets 
model.fit(X, y) 
model.score(X, y) 
# 0.87764337132340009 

I checked it manually: 0.87764 is the R-squared, and 0.863248 is the adjusted R-squared.
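For reference, the adjusted value depends only on the R-squared, the number of samples n and the number of predictors p, so the manual check can be reproduced in a couple of lines (a quick sketch using the numbers quoted above):

# adjusted R-squared = 1 - (1 - R^2) * (n - 1) / (n - p - 1) 
r2 = 0.87764337132340009  # value returned by model.score(X, y) 
n, p = 20, 2              # 20 rows, 2 predictors (NumberofEmployees, ValueofContract) 
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1) 
print(adj_r2)             # ~0.863248 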

Answer


There are different ways to compute the adjusted R^2; the following are just a few of them (computed on the data you provided):

from sklearn.linear_model import LinearRegression 
model = LinearRegression() 
X, y = df[['NumberofEmployees','ValueofContract']], df.AverageNumberofTickets 
model.fit(X, y) 

# compute with formulas from the theory 
yhat = model.predict(X) 
SS_Residual = sum((y - yhat)**2) 
SS_Total = sum((y - np.mean(y))**2) 
r_squared = 1 - float(SS_Residual) / SS_Total 
adjusted_r_squared = 1 - (1 - r_squared) * (len(y) - 1) / (len(y) - X.shape[1] - 1) 
print(r_squared, adjusted_r_squared) 
# 0.877643371323 0.863248473832 

# compute with sklearn linear_model; I could not find any function in its documentation that returns the adjusted R-squared directly 
print(model.score(X, y), 1 - (1 - model.score(X, y)) * (len(y) - 1) / (len(y) - X.shape[1] - 1)) 
# 0.877643371323 0.863248473832 

# compute with statsmodels, adding the intercept manually 
import statsmodels.api as sm 
X1 = sm.add_constant(X) 
result = sm.OLS(y, X1).fit() 
# print(dir(result)) 
print(result.rsquared, result.rsquared_adj) 
# 0.877643371323 0.863248473832 

# compute with statsmodels another way, using the formula interface 
import statsmodels.formula.api as smf 
result = smf.ols(formula="AverageNumberofTickets ~ NumberofEmployees + ValueofContract", data=df).fit() 
# print(result.summary()) 
print(result.rsquared, result.rsquared_adj) 
# 0.877643371323 0.863248473832 
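A small additional sketch: sklearn.metrics.r2_score returns the same unadjusted value as model.score for a regressor, so the same adjustment can be applied to it as well:

# compute R-squared with sklearn.metrics, then apply the adjustment manually 
from sklearn.metrics import r2_score 
r2 = r2_score(y, model.predict(X))  # same value as model.score(X, y) 
n, p = X.shape                      # n samples, p predictors 
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1) 
print(r2, adj_r2) 
# 0.877643371323 0.863248473832 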

Very impressive, FYI – jeangelj


Thanks. You can use the model's .coef_ in the formula instead of X.shape[1]. Could you explain that approach a bit more? –
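Following up on that comment: for a fitted LinearRegression, model.coef_ holds one coefficient per predictor (the intercept is stored separately in model.intercept_), so its length can stand in for X.shape[1] in the adjustment. A small sketch of that variant:

# use the number of fitted coefficients as the number of predictors p 
p = len(model.coef_)  # 2 here: NumberofEmployees and ValueofContract 
n = len(y)            # 20 observations 
r2 = model.score(X, y) 
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1) 
print(adj_r2) 
# 0.863248473832 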