2017-05-03 13 views

I am trying to run a Welch two-sample t-test on two variables in a pandas DataFrame. Both variables are strings. I am using a Jupyter notebook, and scipy.stats.ttest_ind() fails with TypeError: unsupported operand type(s) for /: 'unicode' and 'int'.

camis | cuisine_description | dba | boro | zipcode | record_date | inspection_date | score | grade | critical_flag | action | violation_code | violation_description | inspection_type
40372466 | American | MURALS ON 54/RANDOLPHS'S | MANHATTAN | 10019 | 2017-04-26T06:00:59.000 | 2016-03-10T00:00:00.000 | 10 | A | Critical | Violations were cited in the following area(s). | 02H | Food not cooled by an approved method whereby ... | Cycle Inspection/Re-inspection
50 | Jewish/Kosher | SUSHI FUSSION | QUEENS | 11375 | 2017-04-26T06:00:59.000 | 2015-12-08T00:00:00.000 | 20 | B | Not Critical | Violations were cited in the following area(s). | 10I | Single service item reused, improperly stored,... | Cycle Inspection/Re-inspection
41028194 | Chinese | SAI'S CAFE | BROOKLYN | 11219 | 2017-04-26T06:00:59.000 | 2015-01-02T00:00:00.000 | 13 | A | Not Critical | Violations were cited in the following area(s). | 10I | Single service item reused, improperly stored,... | Cycle Inspection/Re-inspection

TypeError         Traceback (most recent call last) 
<ipython-input-228-5ba9bcaf819c> in <module>() 
    1 from scipy import stats 
----> 2 print(scipy.stats.ttest_ind(gradeRm['inspection_type'], gradeRm['grade'])) 

/Users/sharonmorris/anaconda/lib/python2.7/site-packages/scipy/stats/stats.pyc in ttest_ind(a, b, axis, equal_var, nan_policy) 
    4058   return Ttest_indResult(np.nan, np.nan) 
    4059 
-> 4060  v1 = np.var(a, axis, ddof=1) 
    4061  v2 = np.var(b, axis, ddof=1) 
    4062  n1 = a.shape[axis] 

/Users/sharonmorris/anaconda/lib/python2.7/site-packages/numpy/core/fromnumeric.pyc in var(a, axis, dtype, out, ddof, keepdims) 
    3124 
    3125  return _methods._var(a, axis=axis, dtype=dtype, out=out, ddof=ddof, 
-> 3126       **kwargs) 

/Users/sharonmorris/anaconda/lib/python2.7/site-packages/numpy/core/_methods.pyc in _var(a, axis, dtype, out, ddof, keepdims) 
103  if isinstance(arrmean, mu.ndarray): 
104   arrmean = um.true_divide(
--> 105     arrmean, rcount, out=arrmean, casting='unsafe', subok=False) 
106  else: 
107   arrmean = arrmean.dtype.type(arrmean/rcount) 

TypeError: unsupported operand type(s) for /: 'unicode' and 'int' 
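
The traceback shows the failure happening inside np.var, which ttest_ind calls on each sample. A minimal sketch that reproduces the same kind of error without the dataset, assuming a small made-up array of grade letters in place of gradeRm['grade']:

```python
import numpy as np

# Hypothetical stand-in for a string column such as gradeRm['grade'].
grades = np.array(["A", "B", "A"], dtype=object)

try:
    # The variance needs a mean, and a mean needs division,
    # which is not defined for strings.
    np.var(grades, ddof=1)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

The exact wording of the message depends on the Python and NumPy versions, but the cause is the same: a string ends up on the left side of a division.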

You are passing unicode strings to a math function. Depending on what those strings look like, you probably just need to change the dtype. Can you post a snippet of your DataFrame? – pshep123
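
As a rough sketch of what changing the dtype could look like, assuming the numeric-looking values (such as score) arrive as strings after loading the JSON. The tiny DataFrame below is made up for illustration and is not the real dataset:

```python
import pandas as pd

# Hypothetical miniature of the inspection data, with every value stored as text.
df = pd.DataFrame({
    "score": ["10", "20", "13"],
    "grade": ["A", "B", "A"],
})
print(df.dtypes)  # inspect the column types before converting

# Convert the numeric-looking column; entries that cannot be parsed become NaN.
df["score"] = pd.to_numeric(df["score"], errors="coerce")
print(df.dtypes)
```

Columns such as grade or inspection_type hold categories rather than numbers, so a dtype change does not turn them into something a t-test can use.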

Normalize the gradeRm['inspection_type'] and gradeRm['grade'] data. For example: import unicodedata; unicodedata.normalize('NFKD', gradeRm['inspection_type']).encode('ascii', 'ignore') – vikasmcajnu
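
One caveat on this suggestion: unicodedata.normalize takes a single string, not a whole column, so it would have to be applied element by element. A sketch of that, using made-up sample values rather than the real inspection_type column:

```python
import unicodedata
import pandas as pd

# Hypothetical sample values standing in for gradeRm['inspection_type'].
col = pd.Series([u"Cycle Inspection / Re-inspection", u"Caf\u00e9 Inspection"])

# normalize() accepts one string at a time, so map it over the column.
# NFKD splits accented characters; encoding to ASCII then drops the accents.
ascii_col = col.map(
    lambda s: unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
)
print(ascii_col.tolist())
```

The result is still text, so on its own this would not make the column usable in ttest_ind.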

Answers

The arguments must be numeric. The code that builds the DataFrame and triggers the error:

import urllib 
import json 
import pandas as pd 
import numpy as np 
from collections import Counter 
import scipy.stats 
from scipy import stats 

url = "https://data.cityofnewyork.us/resource/9w7m-hzhe.json" 
response = urllib.urlopen(url) 
data = json.loads(response.read()) 
pdData = pd.DataFrame(data) 

#remove na  
dataB = pdData.dropna() 

#remove unnecessary values 
gradeYes = ['A', 'B', 'C'] 
gradeRm = dataB.query('grade in @gradeYes') 

print(scipy.stats.ttest_ind(gradeRm['inspection_type'], gradeRm['grade'])) 

Trying different variations of this snippet will not help: the function compares the mean values of the two samples, so it cannot be applied to strings.
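
For comparison, here is a sketch of a Welch test on numeric input. It assumes the quantity of interest is score, compared between two grade groups. That choice is an assumption for illustration, not something stated in the question, and the values are made up:

```python
import pandas as pd
from scipy import stats

# Hypothetical scores; in the real data they would come from the 'score' column.
df = pd.DataFrame({
    "grade": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "score": ["10", "13", "12", "9", "20", "24", "19", "22"],
})
df["score"] = pd.to_numeric(df["score"], errors="coerce")

# One numeric sample per group.
scores_a = df.loc[df["grade"] == "A", "score"].dropna()
scores_b = df.loc[df["grade"] == "B", "score"].dropna()

# equal_var=False selects Welch's t-test (no equal-variance assumption).
t_stat, p_value = stats.ttest_ind(scores_a, scores_b, equal_var=False)
print(t_stat, p_value)
```

The categorical column only decides which rows go into which sample; the values handed to ttest_ind are numbers.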
