2017-01-27

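TypeError: object of type 'map' has no len() (Python 3)

I'm trying to implement the k-means algorithm with PySpark, but I get the error above on the last line of my while loop. The code works fine outside the loop; the error only appeared after I put it inside the loop. How can I fix this?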

# Find K Means of Loudacre device status locations 
# 
# Input data: file(s) with device status data (delimited by '|') 
# including latitude (13th field) and longitude (14th field) of device locations 
# (lat,lon of 0,0 indicates unknown location) 
# NOTE: Copy to pyspark using %paste 

# for a point p and an array of points, return the index in the array of the point closest to p 
def closestPoint(p, points):
    bestIndex = 0
    closest = float("+inf")
    # for each point in the array, calculate the distance to the test point, then return
    # the index of the array point with the smallest distance
    for i in range(len(points)):
        dist = distanceSquared(p, points[i])
        if dist < closest:
            closest = dist
            bestIndex = i
    return bestIndex

# The squared distance between two points
def distanceSquared(p1,p2): 
    return (p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2 

# The sum of two points 
def addPoints(p1,p2): 
    return [p1[0] + p2[0], p1[1] + p2[1]] 

# The files with device status data 
filename = "/loudacre/devicestatus_etl/*" 

# K is the number of means (center points of clusters) to find 
K = 5 

# ConvergeDist -- the threshold "distance" between iterations at which we decide we are done 
convergeDist=.1 

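# Load the device status files (assumed: rdd1 is created from filename like this)
rdd1 = sc.textFile(filename)
# Parse device status records into (latitude, longitude)
rdd2 = rdd1.map(lambda line: (float(line.split(",")[3]), float(line.split(",")[4])))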
# Filter out records where lat/long is unavailable -- ie: 0/0 points 
# TODO 
filterd=rdd2.filter(lambda x:x!=(0,0)) 
# start with K randomly selected points from the dataset 
# TODO 
sample=filterd.takeSample(False,K,42) 
# loop until the total distance between one iteration's points and the next is less than the convergence distance specified 
tempDist =float("+inf") 
while tempDist > convergeDist: 
    # for each point, find the index of the closest kpoint. map to (index, (point,1)) 
    # TODO 
    indexed = filterd.map(lambda p: (closestPoint(p, sample), (p, 1)))

    # For each key (k-point index), reduce by adding the coordinates and number of points 

    reduced=indexed.reduceByKey(lambda x,y: ((x[0][0]+y[0][0],x[0][1]+y[0][1]),x[1]+y[1])) 
    # For each key (k-point index), find a new point by calculating the average of each closest point 
    # TODO 
    newCenters=reduced.mapValues(lambda x1: [x1[0][0]/x1[1], x1[0][1]/x1[1]]).sortByKey() 
    # calculate the total of the distance between the current points and new points 
    newSample=newCenters.collect() #new centers as a list 
    samples=zip(newSample,sample) #sample=> old centers 
    samples1=sc.parallelize(samples) 
    totalDistance=samples1.map(lambda x:distanceSquared(x[0][1],x[1])) 
    tempDist=totalDistance.sum()
    # Copy the new points to the kPoints array for the next iteration
    sample=map(lambda x:x[1],samples) #new sample for next iteration as list
sample 
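For comparison, here is a minimal driver-side sketch of the same loop in plain Python (no Spark), keeping every collection a real list so that `len()` and indexing always work. It is an illustration under assumptions, not a drop-in replacement: the `reduceByKey` step is replaced by a dict of running sums, the `seed` and `maxIter` parameters are hypothetical, and keeping the old center when a cluster receives no points is my own choice.

```python
import random

def distanceSquared(p1, p2):
    # Squared Euclidean distance between two (lat, lon) points, as in the question
    return (p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2

def closestPoint(p, points):
    # Index of the nearest center; points must be a list/tuple (len() and indexing)
    return min(range(len(points)), key=lambda i: distanceSquared(p, points[i]))

def kmeans(points, k, convergeDist=0.1, seed=42, maxIter=100):
    """Plain-Python sketch of the question's loop (no Spark); returns a list of centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # a real list, like takeSample's result
    for _ in range(maxIter):
        # (index, (point, 1)) + reduceByKey, done with a dict of running sums
        sums = {}
        for p in points:
            i = closestPoint(p, centers)
            (sx, sy), n = sums.get(i, ((0.0, 0.0), 0))
            sums[i] = ((sx + p[0], sy + p[1]), n + 1)
        # Average per index; a center with no points keeps its old position (assumption)
        newCenters = [
            (sums[i][0][0] / sums[i][1], sums[i][0][1] / sums[i][1]) if i in sums else centers[i]
            for i in range(k)
        ]
        # Total squared movement of the centers, computed locally instead of via parallelize
        tempDist = sum(distanceSquared(new, old) for new, old in zip(newCenters, centers))
        centers = newCenters  # the new centers drive the next iteration
        if tempDist <= convergeDist:
            break
    return centers
```

With the question's data this would only make sense as `kmeans(filterd.collect(), K)` if all points fit in driver memory; otherwise keep the Spark version and just fix how `sample` is rebuilt.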

The error message looks pretty clear to me: in Python 3, `map` returns a lazy iterator (a `map` object), not a list as it did in Python 2. – miradulo
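The same applies to `zip`, which the question also uses: in Python 3 both return single-pass iterators. In the question, `samples = zip(...)` is read once by `sc.parallelize(samples)` and again by the final `map(...)`. Assuming `sc.parallelize` materializes a non-list argument into a list internally (I believe it does, but treat that as an assumption to verify), the second read would find the iterator already exhausted. A small stand-alone demonstration with made-up values:

```python
# In Python 3, map() and zip() return lazy, single-pass iterators rather than lists.
old_centers = [(0.0, 0.0), (5.0, 5.0)]
new_centers = [(0, (1.0, 1.0)), (1, (6.0, 6.0))]  # shaped like (key, center) pairs

pairs = zip(new_centers, old_centers)
print(type(pairs).__name__)   # zip, not list

first_pass = list(pairs)      # consumes the iterator
second_pass = list(pairs)     # nothing is left
print(len(first_pass), len(second_pass))  # 2 0

# Materialize once with list() when the result is needed more than once.
pairs = list(zip(new_centers, old_centers))
```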


Please post the stack trace. You posted 100 lines of code without saying which line has the problem. – tdelaney


Related (possible dupe?): http://stackoverflow.com/a/12319034/748858 – mgilson

Answer


You are getting this error because you are calling `len` on a `map` object, which is a lazy iterator and does not support `len`. For example:

>>> x = [[1, 'a'], [2, 'b'], [3, 'c']] 

# `map` returns object of map type 
>>> map(lambda a: a[0], x) 
<map object at 0x101b75ba8> 

# on doing `len`, raises error 
>>> len(map(lambda a: a[0], x)) 
Traceback (most recent call last): 
  File "<stdin>", line 1, in <module>
TypeError: object of type 'map' has no len() 

To get its length, convert the `map` object to a `list` (or `tuple`) first; then you can call `len` on it. For example:

>>> len(list(map(lambda a: a[0], x))) 
3 
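In the question's code, `closestPoint` needs more than `len`: it also reads `points[i]`, and a `map` object supports neither. Converting once with `list()` and reusing the resulting list covers both (calling `list()` consumes the `map`, so keep the list rather than the iterator). A minimal sketch with hypothetical centers, reusing the question's `closestPoint` logic:

```python
def distanceSquared(p1, p2):
    return (p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2

def closestPoint(p, points):
    # Uses len(points) and points[i], so points must be a sequence (list or tuple)
    bestIndex, closest = 0, float("+inf")
    for i in range(len(points)):
        dist = distanceSquared(p, points[i])
        if dist < closest:
            closest, bestIndex = dist, i
    return bestIndex

raw = [[1.0, 1.0], [6.0, 6.0]]          # hypothetical centers

lazy_centers = map(tuple, raw)          # a map object: no len(), no indexing
try:
    closestPoint((5.5, 5.5), lazy_centers)
except TypeError as err:
    print(err)                          # object of type 'map' has no len()

centers = list(map(tuple, raw))         # materialize once, then reuse
print(len(centers), closestPoint((5.5, 5.5), centers))  # 2 1
```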

Or, even better, build the list directly with a list comprehension instead of `map`:

>>> my_list = [a[0] for a in x] 

# since it is a `list`, you can take its length 
>>> len(my_list) 
3 

As a list comprehension: `sample = [x[1] for x in samples]` –
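The list comprehension removes the `TypeError`, since `sample` becomes a real list. One more thing worth checking, based on the argument order of the `zip` call: each element of `samples` is `((index, newCenter), oldCenter)`, so `x[1]` is the old center and `sample` would stay the same on every pass, while `x[0][1]` is the new center. The sketch below uses made-up values (hypothetical, not real device locations) shaped like the question's `newSample` and `sample`, and builds `samples` with `list()` so it can be read more than once.

```python
def distanceSquared(p1, p2):
    return (p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2

# Hypothetical driver-side values, shaped like the question's variables:
# newSample = newCenters.collect() -> [(index, [lat, lon]), ...] sorted by index
# sample    = old centers          -> [(lat, lon), ...]
newSample = [(0, [34.1, -118.2]), (1, [37.8, -122.4])]
sample = [(34.0, -118.0), (37.7, -122.5)]

samples = list(zip(newSample, sample))  # a list, so it can be iterated more than once

# Each element is ((index, newCenter), oldCenter); summed locally, no parallelize needed
tempDist = sum(distanceSquared(new[1], old) for new, old in samples)

# x[1] would be the OLD center; x[0][1] is the new one
sample = [x[0][1] for x in samples]
```

This pairing also assumes every center kept at least one point: if a cluster ends up empty, `newSample` has fewer than K entries and `zip` pairs the wrong centers.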