Python to Cython - Improving the performance of iterating over a large array

I have the following function, which uses a generator to loop over a large array of coordinates. Performance is critical, so I am trying to convert it to Cython.

Are there any other changes that might improve performance? Perhaps declaring arrays, such as CPython arrays or something else?

geometry_converter.pyx:

def esriJson_to_CV(geometry, geometry_type):
    def compress_geometry(coords):
        # Yield the first point as-is, then each later point as a delta
        # (previous - current) relative to the point before it.
        cdef int previous_x, previous_y, current_x, current_y
        iterator = iter(coords)
        # next(iterator) works on both Python 2 and 3; iterator.next() is Python 2 only.
        previous_x, previous_y = next(iterator)
        yield previous_x
        yield previous_y
        for current_x, current_y in iterator:
            yield previous_x - current_x
            yield previous_y - current_y
            previous_x, previous_y = current_x, current_y

    if geometry_type == "POINT":
        converted_geometry = [int(geometry["x"]), int(geometry["y"])]
    elif geometry_type == "POLYLINE":
        converted_geometry = [list(compress_geometry(path)) for path in geometry["paths"]]
    elif geometry_type == "POLYGON":
        converted_geometry = [list(compress_geometry(ring)) for ring in geometry["rings"]]
    else:
        raise Exception("geometry_converter.esriJson_to_CV - {} geometry type not supported".format(geometry_type))

    return converted_geometry
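For checking any Cython port against known output, the delta logic of compress_geometry can be sketched in plain Python. This is only a reference sketch: the name delta_encode and the sample ring are made up for illustration, and returning an empty list for an empty ring is an assumption about the desired behaviour.

```python
def delta_encode(coords):
    """Return [x0, y0, dx1, dy1, ...] where each delta is previous - current."""
    iterator = iter(coords)
    try:
        previous_x, previous_y = next(iterator)
    except StopIteration:
        return []  # assumption: an empty path or ring encodes to an empty list
    encoded = [previous_x, previous_y]
    for current_x, current_y in iterator:
        encoded.append(previous_x - current_x)
        encoded.append(previous_y - current_y)
        previous_x, previous_y = current_x, current_y
    return encoded


# Hypothetical three-point ring with integer coordinates.
ring = [(10, 20), (12, 25), (11, 21)]
print(delta_encode(ring))  # [10, 20, -2, -5, 1, 4]
```

The first pair is stored unchanged; every later pair is stored as the difference from the point before it, with the same sign convention as the code above.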

Benchmark test:

import time
from functools import wraps
import numpy as np
import geometry_converter as gc

def timethis(func):
    '''
    Decorator that reports the execution time.
    '''
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        end = time.time()
        print(func.__name__, end - start)
        return result
    return wrapper


def prepare_data(featCount, size):
    """Create a list of polygon dicts, each with one ring of random coordinates."""
    features = []  # renamed from `input` to avoid shadowing the builtin
    for i in range(featCount):
        polygon = {"rings": []}
        ys = np.random.uniform(0.0, 89.0, size).tolist()
        xs = np.random.uniform(-179.0, 179.0, size).tolist()
        # list(...) so each ring is a real list on Python 3 as well
        polygon["rings"].append(list(zip(xs, ys)))
        features.append(polygon)
    return features

@timethis
def process_data(data):
    output = [gc.esriJson_to_CV(x, "POLYGON") for x in data]
    return output


data = prepare_data(1000, 1000000)
out = process_data(data)
print(out[0][0][0:10])
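A single time.time() measurement can be noisy, so a repeated measurement may give steadier numbers. The following is a minimal sketch using the standard-library timeit module. The helpers make_ring and compress are hypothetical stand-ins (pure Python, integer coordinates, a much smaller ring than in the benchmark above), not the compiled module.

```python
import random
import timeit


def make_ring(size, seed=0):
    """Build a reproducible ring of integer (x, y) pairs (illustrative data only)."""
    rng = random.Random(seed)
    return [(rng.randint(-179, 179), rng.randint(0, 89)) for _ in range(size)]


def compress(coords):
    """Pure-Python stand-in with the same delta logic as compress_geometry."""
    iterator = iter(coords)
    previous_x, previous_y = next(iterator)
    encoded = [previous_x, previous_y]
    for current_x, current_y in iterator:
        encoded.append(previous_x - current_x)
        encoded.append(previous_y - current_y)
        previous_x, previous_y = current_x, current_y
    return encoded


ring = make_ring(10000)
calls_per_run = 20
# Five runs of twenty calls each; the minimum is the least disturbed run.
timings = timeit.repeat(lambda: compress(ring), number=calls_per_run, repeat=5)
best = min(timings) / calls_per_run
print("best time per call: {:.6f} s".format(best))
```

To compare implementations, the same timeit.repeat call can be pointed at the compiled function with identical input data.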

Source

2016-12-06 Below the Radar

I would expect to see the Cython implementation and benchmark numbers in this kind of question.

Anyone genuinely interested in a question like this will want to see the Cython implementation. The same goes for the actual numbers and how they were measured.

So you changed the code in the question to match the suggestion in the first answer, but did not say whether performance improved as a result. – jsbueno

Cython is not magic. In most cases, Cython gives no meaningful performance gain unless you use static types.

To get a significant performance improvement, you need to use Cython type declarations. For example, instead of writing:

x = int()

you would write:

cdef int x

The Cython documentation has a full explanation of how to use them.

Source

2016-12-06 13:22:32 user312016

Thanks for your suggestion. Would it be faster to declare 'converted_geometry' as an array like that, instead of using the list comprehensions in the if statement?

@BelowtheRadar If you use Cython arrays, yes. – user312016
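One possible reading of "use arrays" is to write the deltas into a preallocated flat buffer by index instead of yielding Python objects one at a time. Below is a plain-Python sketch of that loop shape using the standard-library array module. The function name delta_encode_flat and the layout (two separate integer arrays, xs and ys) are assumptions made for illustration, and no speedup is claimed or measured here.

```python
from array import array


def delta_encode_flat(xs, ys):
    """Fill a preallocated array with [x0, y0, dx1, dy1, ...], delta = previous - current."""
    n = len(xs)
    out = array("l", [0]) * (2 * n)  # one allocation, signed C longs
    if n == 0:
        return out
    out[0] = xs[0]
    out[1] = ys[0]
    for i in range(1, n):
        out[2 * i] = xs[i - 1] - xs[i]
        out[2 * i + 1] = ys[i - 1] - ys[i]
    return out


# Hypothetical integer coordinates, stored as two flat arrays.
xs = array("l", [10, 12, 11])
ys = array("l", [20, 25, 21])
print(delta_encode_flat(xs, ys).tolist())  # [10, 20, -2, -5, 1, 4]
```

In a .pyx file, the index and length would typically be declared with cdef and the buffers accessed through typed memoryviews, so the loop body can compile to C arithmetic without creating a Python object per element. Whether that pays off for this workload would need to be benchmarked.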
