2017-10-03 9 views
3

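How to create a nested dictionary from a csv file with N rows in Python

I was looking for a way to read a csv file with an unknown number of columns into a nested dictionary. That is, for input of the form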

file.csv: 
1, 2, 3, 4 
1, 6, 7, 8 
9, 10, 11, 12 

I want a dictionary of the form:

{1:{2:{3:4}, 6:{7:8}}, 9:{10:{11:12}}} 

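This is to allow O(1) lookup of values from the csv file. Building the dictionary may take a relatively long time; in my application I build it only once but look values up millions of times.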
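To illustrate the kind of lookup this layout is meant to support, here is a minimal sketch. The helper name `lookup` is hypothetical and not part of the question. Each level costs one dictionary access (O(1) on average), so a full lookup costs one access per column rather than a scan over the rows.

```python
from functools import reduce

# The target dictionary from the example above
nested = {1: {2: {3: 4}, 6: {7: 8}}, 9: {10: {11: 12}}}


def lookup(d, keys):
    # Hypothetical helper: walk down one dictionary level per key.
    # Raises KeyError if any key along the path is missing.
    return reduce(lambda level, key: level[key], keys, d)


print(lookup(nested, [1, 6, 7]))  # 8
print(nested[9][10][11])          # 12
```

Plain chained indexing, as in the last line, does the same thing when the depth is known in advance.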

Answers

0

Here is what I came up with. I also wanted the option to name the relevant columns, so that the unneeded ones can be ignored. Please comment and suggest improvements.

import csv 
import itertools 

def list_to_dict(lst): 
    # Takes a list, and recursively turns it into a nested dictionary, where 
    # the first element is a key, whose value is the dictionary created from the 
    # rest of the list. the last element in the list will be the value of the 
    # innermost dictionary 
    # INPUTS: 
    # lst - a list (e.g. of strings or floats) 
    # OUTPUT: 
    # A nested dictionary 
    # EXAMPLE RUN: 
    # >>> lst = [1, 2, 3, 4] 
    # >>> list_to_dict(lst) 
    # {1:{2:{3:4}}} 
    if len(lst) == 1:
        return lst[0]
    else:
        data_dict = {lst[-2]: lst[-1]}
        lst.pop()
        lst[-1] = data_dict
        return list_to_dict(lst)


def dict_combine(d1, d2): 
    # Combines two nested dictionaries into one. 
    # INPUTS: 
    # d1, d2: Two nested dictionaries. The function might change d1 and d2, 
    #   therefore if the input dictionaries are not to be mutated, 
    #   you should pass copies of d1 and d2. 
    #   Note that the function works more efficiently if d1 is the 
    #   bigger dictionary. 
    # OUTPUT: 
    # The combined dictionary 
    # EXAMPLE RUN: 
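    # >>> d1 = {1: {2: {3: 4, 5: 6}}}
    # >>> d2 = {1: {2: {7: 8}, 9: {10: 11}}}
    # >>> dict_combine(d1, d2)
    # {1: {2: {3: 4, 5: 6, 7: 8}, 9: {10: 11}}}

    for key in d2:
        if key in d1 and isinstance(d1[key], dict) and isinstance(d2[key], dict):
            # Both sides are dictionaries: merge them recursively
            d1[key] = dict_combine(d1[key], d2[key])
        else:
            # New key, or a leaf value on either side: the value from d2 wins
            d1[key] = d2[key]
    return d1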


def csv_to_dict(csv_file_path, params=None, n_row_max=None): 
    # NAME: csv_to_dict 
    # 
    # DESCRIPTION: Reads a csv file and turns relevant columns into a nested 
    #    dictionary. 
    # 
    # INPUTS: 
    # csv_file_path: The full path to the data file 
    # params:  A list of relevant column names. The resulting dictionary 
    #     will be nested in the same order as parameters in 'params'. 
    #     Default is None (read all columns) 
    # n_row_max:  The maximum number of rows to read. Default is None 
    #     (read all rows) 
    # 
    # OUTPUT: 
    # A nested dictionary containing all the relevant csv data 

    csv_dictionary = {} 

    with open(csv_file_path, 'r') as csv_file:
        csv_data = csv.reader(csv_file, delimiter=',')
        names = next(csv_data)  # Read title line
        if not params:
            # Use every column, including the last one
            relevant_param_indices = list(range(len(names)))
        else:
            # A list of column indices to read from csv
            relevant_param_indices = []
            for name in params:
                if name not in names:
                    # Parameter name is not found in title line
                    raise ValueError('Could not find {} in csv file'.format(name))
                # Get indices of the relevant columns
                relevant_param_indices.append(names.index(name))
        # The title line is already consumed, so start at the first data row
        for row in itertools.islice(csv_data, n_row_max):
            # Get a list containing relevant columns only
            relevant_cols = [row[i] for i in relevant_param_indices]
            # Turn the strings into numbers. Not necessary
            float_row = [float(element) for element in relevant_cols]
            # Build nested dictionary
            csv_dictionary = dict_combine(csv_dictionary, list_to_dict(float_row))

    return csv_dictionary
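
For comparison, the same result can be built in a single pass without creating and merging one dictionary per row. This is only a sketch under stated assumptions, not the code above: the function name `rows_to_nested` is hypothetical, selecting columns by name through `csv.DictReader` is a swapped-in technique, and the input is assumed to have a title line and at least two selected columns.

```python
import csv
import io


def rows_to_nested(csv_file, params=None):
    # Hypothetical alternative: descend with setdefault instead of
    # building a dictionary per row and merging it into the result.
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    columns = params if params else reader.fieldnames
    missing = [name for name in columns if name not in reader.fieldnames]
    if missing:
        raise ValueError('Could not find {} in csv file'.format(missing))
    result = {}
    for row in reader:
        values = [float(row[name]) for name in columns]
        level = result
        # All but the last two values select (or create) nested levels
        for key in values[:-2]:
            level = level.setdefault(key, {})
        # The last two values form the innermost key/value pair
        level[values[-2]] = values[-1]
    return result


# Fake a file with a title line; the column names are made up
sample = io.StringIO("a, b, c, d\n1, 2, 3, 4\n1, 6, 7, 8\n9, 10, 11, 12\n")
print(rows_to_nested(sample, params=['a', 'c', 'd']))
```

Because each row is written straight into the final dictionary, the cost per row depends on the number of columns rather than on the size of the dictionary built so far.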
0

Here is a simple approach, albeit a brittle one:

>>> import csv, io
>>> s = """1, 2, 3, 4
... 1, 6, 7, 8
... 9, 10, 11, 12"""
>>> d = {}
>>> with io.StringIO(s) as f:  # fake a file
...     reader = csv.reader(f)
...     for row in reader:
...         nested = d
...         for val in map(int, row[:-2]):
...             nested = nested.setdefault(val, {})
...         k, v = map(int, row[-2:])  # this will fail if you don't have enough columns
...         nested[k] = v
...
>>> d
{1: {2: {3: 4}, 6: {7: 8}}, 9: {10: {11: 12}}}

However, this assumes that the number of columns is at least 2.
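
One way to make that assumption explicit is to check each row before descending. A minimal sketch, in which the function name `build_nested`, the skipping of blank lines, and the choice to raise `ValueError` are assumptions for illustration rather than part of the answer:

```python
import csv
import io


def build_nested(f):
    d = {}
    for line_no, row in enumerate(csv.reader(f), start=1):
        if not row:
            continue  # csv.reader yields [] for blank lines; skip them
        if len(row) < 2:
            raise ValueError('row {} has fewer than 2 columns'.format(line_no))
        # Everything before the last two values is the path of outer keys
        *path, k, v = map(int, row)
        nested = d
        for val in path:
            nested = nested.setdefault(val, {})
        nested[k] = v
    return d


print(build_nested(io.StringIO("1, 2, 3, 4\n1, 6, 7, 8\n9, 10, 11, 12\n")))
# {1: {2: {3: 4}, 6: {7: 8}}, 9: {10: {11: 12}}}
```

A short row then fails with a message naming the offending line instead of an unpacking error.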