2017-03-24

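Find regions adjacent to values above 5 in a pandas Series

I want to select all regions of values above 1 that are adjacent to a value above 5. For example, for the following dataset: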

pd.Series(data = [0,2,0,2,3,6,3,0]) 

The output should be:

pd.Series(data = [False,False,False,True,True,True,True,False]) 

I solved it myself, but in an ugly way.

The second 2 is not adjacent to a value greater than 5. Can you clarify the definition? –


Is this clearer now? –


Does "above 1" mean strictly greater than 1, or greater than or equal to 1? – FLab

Answers


Well, it looks like I found a one-liner using the pandas groupby function:

import pandas as pd 

ts = pd.Series(data = [0,2,0,2,3,6,3,0]) 

# The flag column identifies the sequences. Each 0 is included at the start
# of a "sequence", but as the next step shows, that doesn't matter
df = pd.concat([ts, (ts==0).cumsum()], axis = 1, keys = ['val', 'flag']) 

# val flag 
#0 0  1 
#1 2  1 
#2 0  2 
#3 2  2 
#4 3  2 
#5 6  2 
#6 3  2 
#7 0  3 

# For each group (rows sharing the same flag), take the boolean AND of two conditions:
# the value is above 1 (which excludes the zeros) AND the group contains a value above 5
df.groupby('flag').transform(lambda x: (x > 1) & (x > 5).any()) 

#Out[32]: 
#  val 
#0 False 
#1 False 
#2 False 
#3 True 
#4 True 
#5 True 
#6 True 
#7 False 

If you prefer, you can collapse everything into a single line:

ts.groupby((ts == 0).cumsum()).transform(lambda x: (x > 1) & (x > 5).any()) 
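A variant of the same grouping idea is sketched below, under an assumption that the question does not settle. The assumption is that any value not above 1 (not only 0) should end a region, in line with the "strictly greater than 1?" comment above. Under that assumption, the group key is built from `ts > low` instead of `ts == 0`. The built-in `transform("max")` then replaces the lambda, because "the group contains a value above 5" is the same as "the group maximum is above 5". The helper name `flag_regions` and its `low`/`high` parameters are hypothetical.

```python
import pandas as pd

def flag_regions(ts: pd.Series, low: float = 1, high: float = 5) -> pd.Series:
    """Hypothetical helper: True for values > low lying in a run of
    consecutive values > low that contains at least one value > high."""
    in_region = ts > low
    # A new region id starts at every value that is NOT above `low`,
    # so any value <= low (not only 0) separates two regions.
    region_id = (~in_region).cumsum()
    # Largest value in each region, broadcast back to every row. The leading
    # delimiter (<= low) is in the group too, but cannot exceed `high`
    # as long as low <= high.
    region_max = ts.groupby(region_id).transform("max")
    return in_region & (region_max > high)

ts = pd.Series(data=[0, 2, 0, 2, 3, 6, 3, 0])
print(flag_regions(ts).tolist())
# [False, False, False, True, True, True, True, False]
```

For the data in the question this gives the same result as the one-liner. The two differ only when a region is broken by a value between 0 and 1 inclusive, e.g. `[2, 1, 6]`.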

I'll still leave my first approach here for reference:

import itertools 
import pandas as pd 

def flatten(l): 
    # Util function to flatten a list of lists 
    # e.g. [[1], [2,3]] -> [1,2,3] 
    return list(itertools.chain(*l)) 

ts = pd.Series(data = [0,2,0,2,3,6,3,0]) 
#Get data as list 
values = ts.values.tolist() 

# From what I understand, the 0s delimit the subsequences (numbers are not
# connected if they are separated by a 0)

# Get the locations of the zeros
gap_loc = [idx for (idx, el) in enumerate(values) if el==0] 
# Zeros are never part of a region, so mark them False in their own Series
gap_series = pd.Series(False, index = gap_loc) 

# Get the values and locations of the subsequences (i.e. separated by zeros).
# Note: only the stretches between two zeros are covered, so this assumes
# the Series starts and ends with a 0
valid_loc = [range(prev_gap+1,gap) for prev_gap, gap in zip(gap_loc[:-1],gap_loc[1:])] 
list_seq = [values[prev_gap+1:gap] for prev_gap, gap in zip(gap_loc[:-1],gap_loc[1:])] 
# list_seq = [[2], [2, 3, 6, 3]] 

# Verify your condition 
check_condition = [[el>1 and any(map(lambda x: x>5, sublist)) for el in sublist] 
        for sublist in list_seq] 
# Put results back into a pandas Series 
valid_series = pd.Series(flatten(check_condition), index = flatten(valid_loc)) 

# Put everything together: 
result = pd.concat([gap_series, valid_series], axis = 0).sort_index() 

#result 
#Out[101]: 
#0 False 
#1 False 
#2 False 
#3  True 
#4  True 
#5  True 
#6  True 
#7 False 
#dtype: bool 
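The same "split into subsequences" idea can also be sketched with `itertools.groupby` from the standard library. It walks the values in runs of "above 1" / "not above 1", so no index bookkeeping is needed. It also covers regions that touch the start or end of the Series, which the zero-delimited version above skips. The helper name `flag_regions_runs` and its `low`/`high` parameters are hypothetical, and the sketch assumes (as an interpretation) that any value not above `low` ends a region.

```python
from itertools import groupby
import pandas as pd

def flag_regions_runs(ts: pd.Series, low: float = 1, high: float = 5) -> pd.Series:
    """Hypothetical helper: same rule as above, computed run by run."""
    out = []
    # groupby yields consecutive runs sharing the same key (above `low` or not)
    for above, run in groupby(ts.tolist(), key=lambda v: v > low):
        run = list(run)
        # A run is kept only if it is above `low` and contains a value above `high`
        keep = above and any(v > high for v in run)
        out.extend([keep] * len(run))
    return pd.Series(out, index=ts.index)

ts = pd.Series(data=[0, 2, 0, 2, 3, 6, 3, 0])
print(flag_regions_runs(ts).tolist())
# [False, False, False, True, True, True, True, False]
```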

You may want to check out the new one-liner solution. – FLab


My solution is below, but I would still like to know whether there is a better way to do this.

import pandas as pd

test_series = pd.Series(data = [0,2,0,2,3,6,3,0]) 

# Make a boolean DataFrame.
# Column 0 flags values between 1 and 5 (1 < value <= 5), column 1 flags values above 5.
# The boolean Series we are looking for is column 1 after the loop below has modified it.
bool_df = pd.DataFrame(data= [(test_series>1), (test_series>5)]).T 
bool_df.loc[:,0] = (bool_df.loc[:,0])&(~bool_df.loc[:,1]) 

k=0 # k indexes the rows of bool_df whose value is between 1 and 5 (column 0 is True)
while k < len(bool_df.loc[bool_df.loc[:,0],0]): 
    i = bool_df.loc[bool_df.loc[:,0],0].index[k] # the bool_df index corresponding to k 
    if i > 0: # avoid negative indices 
        if bool_df.loc[i-1,1]: # check whether the previous entry had a value above 5 
            bool_df.loc[i,1] = True 
            k+=1 
        else: 
            j=i 
            while bool_df.loc[j,0]: # find the end of the streak of values with 1 < value <= 5 
                j+=1 
            # set the whole streak to the value found right after it, either >5 or <=1
            bool_df.loc[i:j,1] = bool_df.loc[j,1] 
            k = sum(bool_df.loc[bool_df.loc[:,0],0].index<j) 
    else: 
        k+=1

result = bool_df.loc[:,1]