I'm trying to run a Welch Two Sample t-test on two variables in a pandas DataFrame. Both variables are strings. I'm using a Jupyter notebook, and scipy.stats.ttest_ind() raises TypeError: unsupported operand type(s) for /: 'unicode' and 'int'
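For reference, Welch's two-sample statistic is built from each sample's mean $\bar{x}_i$, sample variance $s_i^2$ (with $n_i - 1$ in the denominator, i.e. `ddof=1`) and size $n_i$, so both inputs have to be numeric. The traceback below fails in exactly that variance step, `np.var(a, axis, ddof=1)`:

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
s_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i} \left(x_{ij} - \bar{x}_i\right)^2,
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}
```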

camis | cuisine_description | dba | boro | zipcode | record_date | inspection_date | score | grade | critical_flag | action | violation_code | violation_description | inspection_type
40372466 | American | MURALS ON 54/RANDOLPHS'S | MANHATTAN | 10019 | 2017-04-26T06:00:59.000 | 2016-03-10T00:00:00.000 | 10 | A | Critical | Violations were cited in the following area(s). | 02H | Food not cooled by an approved method whereby ... | Cycle Inspection/Re-inspection
50 | Jewish/Kosher | SUSHI FUSSION | QUEENS | 11375 | 2017-04-26T06:00:59.000 | 2015-12-08T00:00:00.000 | 20 | B | Not Critical | Violations were cited in the following area(s). | 10I | Single service item reused, improperly stored,... | Cycle Inspection/Re-inspection
41028194 | Chinese | SAI'S CAFE | BROOKLYN | 11219 | 2017-04-26T06:00:59.000 | 2015-01-02T00:00:00.000 | 13 | A | Not Critical | Violations were cited in the following area(s). | 10I | Single service item reused, improperly stored,... | Cycle Inspection/Re-inspection

TypeError                                 Traceback (most recent call last)
<ipython-input-228-5ba9bcaf819c> in <module>()
      1 from scipy import stats
----> 2 print(scipy.stats.ttest_ind(gradeRm['inspection_type'], gradeRm['grade']))

/Users/sharonmorris/anaconda/lib/python2.7/site-packages/scipy/stats/stats.pyc in ttest_ind(a, b, axis, equal_var, nan_policy)
   4058         return Ttest_indResult(np.nan, np.nan)
-> 4060     v1 = np.var(a, axis, ddof=1)
   4061     v2 = np.var(b, axis, ddof=1)
   4062     n1 = a.shape[axis]

/Users/sharonmorris/anaconda/lib/python2.7/site-packages/numpy/core/fromnumeric.pyc in var(a, axis, dtype, out, ddof, keepdims)
   3125     return _methods._var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
-> 3126                          **kwargs)

/Users/sharonmorris/anaconda/lib/python2.7/site-packages/numpy/core/_methods.pyc in _var(a, axis, dtype, out, ddof, keepdims)
    103     if isinstance(arrmean, mu.ndarray):
    104         arrmean = um.true_divide(
--> 105                 arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
    106     else:
    107         arrmean = arrmean.dtype.type(arrmean/rcount)

TypeError: unsupported operand type(s) for /: 'unicode' and 'int'

You're passing unicode strings to a math function. Depending on what those strings look like, you probably just need to change the dtype. Can you post a snippet of the DataFrame? – pshep123
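A minimal sketch of the point above, assuming a small hypothetical DataFrame that mimics the snippet (the `score` values are stored as text, the way they would be if the API returns strings). NumPy's variance on an object array of strings first "sums" them by concatenation and then divides by the count, which raises the same kind of TypeError; converting the column with `pd.to_numeric` gives a numeric dtype that variance (and therefore `ttest_ind`) can work with. The helper name `variance_or_error` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame mimicking the snippet; values are illustrative only.
# Scores are kept as text to reproduce the problem.
toy = pd.DataFrame({
    "grade": ["A", "B", "A"],
    "score": ["10", "20", "13"],
})


def variance_or_error(values):
    """Return the sample variance, or the TypeError message if it fails."""
    try:
        return np.var(np.asarray(values, dtype=object), ddof=1)
    except TypeError as exc:
        return "TypeError: {}".format(exc)


# Strings: the mean step concatenates ("10" + "20" + "13" -> "102013"),
# then dividing that string by the count 3 raises TypeError.
print(variance_or_error(toy["score"].tolist()))

# Convert the text column to numbers before doing any statistics.
toy["score"] = pd.to_numeric(toy["score"])
print(toy["score"].dtype)   # a numeric dtype (int64 here)
print(toy["score"].var())   # sample variance; pandas uses ddof=1 by default
```

Only columns whose text actually looks like numbers (such as `score`) can be converted this way; labels like `grade` or `inspection_type` cannot.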


Normalize the gradeRm['inspection_type'] and gradeRm['grade'] data, e.g. import unicodedata and then unicodedata.normalize('NFKD', gradeRm['inspection_type']).encode('ascii', 'ignore') – vikasmcajnu
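`unicodedata.normalize` takes a single string, so applying this suggestion to a whole column has to go element by element. A minimal sketch, using a hypothetical Series in place of `gradeRm['inspection_type']`, shows both pandas' vectorised `.str.normalize` / `.str.encode` / `.str.decode` chain and an equivalent `.map` over `unicodedata.normalize`. Note that the result is still text, so on its own it does not give `ttest_ind` numbers to average.

```python
import unicodedata

import pandas as pd

# Hypothetical column; the second value contains a non-ASCII character.
inspection_type = pd.Series(["Cycle Inspection/Re-inspection", "Caf\u00e9"])

# Vectorised: NFKD-normalise each string, drop non-ASCII code points,
# then decode the resulting bytes back to str.
ascii_only = (
    inspection_type.str.normalize("NFKD")
    .str.encode("ascii", "ignore")
    .str.decode("ascii")
)

# Equivalent element-by-element version of the comment's suggestion.
ascii_only_map = inspection_type.map(
    lambda s: unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
)

print(ascii_only.tolist())
print(pd.api.types.is_numeric_dtype(ascii_only))  # still text, not numbers
```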




import urllib  # Python 2.7 API; on Python 3 use urllib.request.urlopen
import json
import pandas as pd
import numpy as np
from collections import Counter
import scipy.stats
from scipy import stats

# Download the inspection records and load them into a DataFrame
url = "https://data.cityofnewyork.us/resource/9w7m-hzhe.json"
response = urllib.urlopen(url)
data = json.loads(response.read())
pdData = pd.DataFrame(data)

# Drop rows that contain missing values
dataB = pdData.dropna()

# Keep only rows graded A, B or C
gradeYes = ['A', 'B', 'C']
gradeRm = dataB.query('grade in @gradeYes')

# Both columns hold text labels, which is what raises the TypeError above
print(scipy.stats.ttest_ind(gradeRm['inspection_type'], gradeRm['grade']))
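A Welch test compares a numeric measurement between two groups, whereas `inspection_type` and `grade` are both text labels. One possible setup, which is an assumption about the intended analysis rather than something stated in the question, is to compare the numeric `score` between two values of a categorical column. The sketch below uses a small hypothetical frame in place of `gradeRm` (the `boro` grouping and the extra score values are illustrative), converts `score` with `pd.to_numeric`, and passes `equal_var=False` to `scipy.stats.ttest_ind`. That flag is what selects Welch's test; the default, `equal_var=True`, is Student's pooled-variance test.

```python
import pandas as pd
from scipy import stats

# Hypothetical stand-in for gradeRm: scores stored as text, two boroughs.
toy = pd.DataFrame({
    "boro": ["MANHATTAN", "MANHATTAN", "MANHATTAN", "QUEENS", "QUEENS", "QUEENS"],
    "score": ["10", "12", "7", "20", "13", "25"],
})

# A t-test needs numbers: convert the text column first.
toy["score"] = pd.to_numeric(toy["score"])

# Split the numeric column into the two groups being compared.
manhattan = toy.loc[toy["boro"] == "MANHATTAN", "score"]
queens = toy.loc[toy["boro"] == "QUEENS", "score"]

# equal_var=False -> Welch's unequal-variances t-test.
result = stats.ttest_ind(manhattan, queens, equal_var=False)
print(result.statistic, result.pvalue)
```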

