ソートアルゴリズムにより、大きなデータセットを処理する際にスタックオーバーフローが発生しますか？

私は以下のタイプのデータをより良い方法でソートする方法を模索しています。以下ここでは、このソートアルゴリズムにより、大きなデータセットを処理する際にスタックオーバーフローが発生しますか？

public class AttributeItem 
{ 
    public string AttributeType { get; set; } 
    public string Title { get; set; } 
    public string Value { get; set; } 
    public int ObjectID { get; set; } 
    public bool CanModify { get; set; } 
    public bool CanDelete { get; set; } 
    public bool? IsParent { get; set; } 
    public int SortID { get; set; } 
    public int? ParentSortID { get; set; } 
    public bool Deleted { get; set; } 
} 

public class AttributeItemNode 
{ 
    public AttributeItem Item {get;set;} 
    public int Depth {get;set;} 

    public AttributeItemNode(AttributeItem item , int Depth) 
    { 
     this.Item = item ; 
     this.Depth = Depth; 
    } 
}

のように見えているデータを保持しているその場所での構造（一部のシステムで他の9000の2000年）より小さなデータセットのために正常に動作しますが、大きいものを処理するときにStackOverflowが発生し

深度を示すintを持つ単一のオブジェクトにソートする必要があるデータの例です。子どもたちのレベルは次のように予想される出力は次のようになり、データ例に示される3つのレベル

var items = new List<AttributeItem>(); 
items.Add(new AttributeItem{Title ="Parent1", ObjectID=1,SortID =1, IsParent= true, ParentSortID = Int32.MinValue}); 

items.Add(new AttributeItem{Title ="FooChild", ObjectID=2,SortID =2, IsParent= false, ParentSortID = 1}); 

items.Add(new AttributeItem{Title ="Parent2", ObjectID=4,SortID =4, IsParent= true, ParentSortID = Int32.MinValue}); 

items.Add(new AttributeItem{ Title ="Parent2Child1", ObjectID=5,SortID =5, IsParent= false, ParentSortID = 4}); 

items.Add(new AttributeItem{Title ="Parent2Child2", ObjectID=7,SortID =7, IsParent= false, ParentSortID = 4}); 

items.Add(new AttributeItem{Title ="Parent2Child2Child1", ObjectID=6,SortID =6, IsParent= false, ParentSortID = 5});

より深く行くことが（私は読みやすさを支援するために、オブジェクトから無関係なデータを削除した）

可能ですここで

Depth = 0 Title ="Parent1" 
Depth = 1 Title ="FooChild" 
Depth = 0 Title ="Parent2" 
Depth = 1 Title ="Parent2Child1" 
Depth = 2 Title ="Parent2Child2Child1" 
Depth = 1 Title ="Parent2Child2"

は、実際のソートコード

public static IList<AttributeItemNode> SortAttributeItems(IList<AttributeItem> list) 
    { 
     List<AttributeItemNode> newList = new List<AttributeItemNode>(); 
     SortAttributeItems(list, null, 0, newList); 

     return newList; 
    } 

    private static void SortAttributeItems(IList<AttributeItem> list, AttributeItem currentItem, int depth, List<AttributeItemNode> newList) 
    { 
     AttributeItem newItem = null; 
     // look for children 
     if (currentItem != null) 
     { 
      foreach (AttributeItem item in list) 
      { 
       if (item.ParentSortID.HasValue && item.ParentSortID.Value != Int32.MinValue && item.ParentSortID.Value == currentItem.SortID) 
       { 
        newList.Add(new AttributeItemNode(item, (depth + 1))); 
        SortAttributeItems(list, item, depth + 1, newList); 
       } 
      } 
     } 

     if (depth == 0) 
     { 
      foreach (AttributeItem item in list) 
      { 
       if (!item.ParentSortID.HasValue || item.ParentSortID.Value == Int32.MinValue) 
       { 
        if (currentItem == null || item.SortID >= currentItem.SortID) 
        { 
         if (newItem == null || newItem.SortID >= item.SortID) 
         { 
          newItem = item; 
         } 
        } 
       } 
      } 
     } 

     if (newItem != null) 
     { 
      newList.Add(new AttributeItemNode(newItem, depth)); 
      list.Remove(newItem); 
      SortAttributeItems(list, newItem, depth, newList); 
     } 

    }

出典

2016-10-01 Kevin Kunderman

私はあなたのポイントを見ますが、質問で述べたように。それは大量のデータで失敗します。最適化を探しているだけです。大きなデータセットで実際に遅い場合は、私は同意します。 –

タイトルを編集して、*改善*以外のものを求めてください。*はより効果的なものを作ることを意味します*と、最初の段落は*よりうまくいくと尋ねます。それを改善*。 –

ありがとうございます。私の議決は取り下げられました。 :-) –

再帰を使用しないで効率的に問題を解決できます。それは2つの部分に分割することができます - ツリー構造を作成し、反復pre-order Depth First Traversalを使用してツリーを平坦化し、各レベルをソートします。

最初の部分については、O（N）時にはParentSortIDで高速ルックアップ構造を作成するためにLINQ ToLookupメソッドを使用できます。

DRYの原則に従うと、私はHow to flatten tree via LINQ?への私の答えの汎用メソッドを使用して、項目と深度からのカスタム結果に投影できるオーバーロードを作成します（これは私が既に見るように）：

public static class TreeUtils 
{ 
    public static IEnumerable<TResult> Expand<T, TResult>(
     this IEnumerable<T> source, Func<T, IEnumerable<T>> elementSelector, Func<T, int, TResult> resultSelector) 
    { 
     var stack = new Stack<IEnumerator<T>>(); 
     var e = source.GetEnumerator(); 
     try 
     { 
      while (true) 
      { 
       while (e.MoveNext()) 
       { 
        var item = e.Current; 
        yield return resultSelector(item, stack.Count); 
        var elements = elementSelector(item); 
        if (elements == null) continue; 
        stack.Push(e); 
        e = elements.GetEnumerator(); 
       } 
       if (stack.Count == 0) break; 
       e.Dispose(); 
       e = stack.Pop(); 
      } 
     } 
     finally 
     { 
      e.Dispose(); 
      while (stack.Count != 0) stack.Pop().Dispose(); 
     } 
    } 
}

そして、ここで問題になっているメソッドの実装です：

public static IList<AttributeItemNode> SortAttributeItems(IList<AttributeItem> list) 
{ 
    var childrenMap = list.ToLookup(e => e.ParentSortID ?? int.MinValue); 
    return childrenMap[int.MinValue].OrderBy(item => item.SortID) 
     .Expand(parent => childrenMap[parent.SortID].OrderBy(item => item.SortID), 
      (item, depth) => new AttributeItemNode(item, depth)) 
     .ToList(); 
}

出典

2016-10-01 16:00:13

非常に良い解決です、ありがとう –

' Expand'メソッド呼び出し（Item、depth）=>新しいAttributeItemNode（item、depth） 'のように' resultSelector'を指定しているところで、深さ値は '

@MrinalKamboj 'Expand'メソッドから来ています（再帰メソッドの' depth'変数に似ています）。内部に明示的な 'stack'を使用しているので、' stack.Count'は現在の深さです。 –

あなたは、単に深さを計算するために親のポインタをたどることができない何らかの理由で？

あなたは今、各AttributeItem itemを取り、行うことができますキーとしてSortIdとDictionary<int,AttributeItem> mapに入れた場合：

int depth = 0; 
var current = item; 
while (!current.IsParent) 
{ 
    depth++; 
    current = map[current.ParentSortId; 
}

あなたは木やグラフの多くNugetパッケージのいずれかを使用した場合、あなたがこれを行うことができますそれが有効でありサイクルを含まないかどうかをチェックすることを含む、あなたのデータに関する多くの他のグラフ操作。

IsParentがありますが、ParentSortIdにマーカー値があるという2つの方法で同じ情報を表示しないことをお勧めします。これらが一致しないとどうなりますか？

出典

2016-10-01 03:11:29

'Parent'にトラバースすることで、子に深度を割り当てることができますが、期待どおりに階層を作成するためには再帰が必要になります。実際に私が提供しているソリューションは、類似の辞書ベースの戦略を使用しています –

public class AttributeItemNode : IComparable<AttributeNode> { 

    public int CompareTo(AttributeItemNode other) { 
     // compare the Ids in appropriate order 
    } 
} 

public class NodeCollection { 
    protected List<AttributeItemNode> nodes; 

    public void AddNode() { } 

    public void Sort() { 
     nodes.Sort(); 
     this.CalcDepth(); 
    } 

    protected void CalcDepth { 
     foreach (var node in nodes) 
      if (node.IsParent) { node.Depth = 0; break; } 

      //use the various Ids that are now in sorted order 
      // and calculate the item's Depth. 
    } 
}

AttributeItemは、すでにソートに必要なすべてを備えています。上記のCompareTo()を実装するには、IsParent（多分？）、SortId、およびParentSortIdを使用してください。

Calc深度はソート後のみであり、再帰の必要はありません。その後

：

myNodeCollection.Sort()

List.Sort() .NETがインテリジェントに、いくつかのソートアルゴリズムのどれを使用することを決定。

出典

2016-10-01 04:54:49 radarbob

これは、確実にOPが求めているものを解決しません、 'AttributeItem List'の子供を親にリンクさせる最初の部分は完了せず、特定のサブ階層では、各サブパーツでは実行できないので、あなたのソリューションは、IComparable を使用して達成されるものです。 –

質問は再帰によるスタックオーバーフローの回避と深さの計算については避けています。キーオーダーですべてが必要です。何らかの種類のツリーまたはリンクされたリストが必要な場合は、ソートされたリストをトラバースしてビルドします。それでも再帰は避けられます。このソリューションは、大きなデータボリュームを処理する可能性があります。 – radarbob

あなたの提案に基づいてソリューションを実装しようとすると、期待どおりに動作しない場所を理解し、OPによって提供されたデータを試してみてください。 'IComparer'を使ったソートを使った' Parents'と 'Children'の配置は問題を単純化しすぎて、期待通りのツリー構造にならないでしょう –

私は期待通りの結果を達成するために、親と子どもと再帰をマッピングするために辞書を使用してコードの修正版をチェックアウト（私の中でバージョンを管理するために、クリーンで簡単です）

重要な仮定

すべての子要素、トップレベルの親がParentSortId
のObjectIDが与えられた要素
ParentSortID再帰的方法はList<AttributeItemNode>を取って見ることができるように親

深い再帰レベルの重要な変形

のオブジェクトID（識別子）であり、そしてDictionary<int?, List<AttributeItem>>を入力として標準クラス変数にすることで回避することができます。それ以外の場合、すべての再帰的cすべて、これはStackOverflow Exceptionにつながる、より深いレベルの再帰の問題を引き起こす可能性があります。 PrimarySortMethodに

、ただ結果を疑問に示すList<AttributeItem>を供給し、私は次ソート方法

private static List<AttributeItemNode> SortAttributeItems(List<AttributeItem> attributeItemList) 
{ 
    // Final Result list 
    List<AttributeItemNode> returnAttributeItemNodeList = new List<AttributeItemNode>(); 

    // Item dictionary to map the children of each AttributeItem 
    var attributeItemDictionary = new Dictionary<int?, List<AttributeItem>>(); 

    // Only Main/Top Level Parents (depth = 0) 
    var parentsMainLevelAttributeItems = attributeItemList.Where(i => i.ParentSortID == Int32.MinValue); 

    // Create a Dictionary mapping/grouping between Parents and Children 
    foreach (var item in attributeItemList) 
    { 
     if (item.ParentSortID != Int32.MinValue) 
     { 
      if (attributeItemDictionary.ContainsKey(item.ParentSortID)) 
       attributeItemDictionary[item.ParentSortID].Add(item); 
      else 
       attributeItemDictionary.Add(item.ParentSortID, new List<AttributeItem> { item }); 
     } 
    } 

    // Add the parents and their children to the result list 
    foreach (var mainItem in parentsMainLevelAttributeItems) 
    { 
     returnAttributeItemNodeList.Add(new AttributeItemNode(mainItem,0)); 

     SortAttributeItems(returnAttributeItemNodeList,attributeItemDictionary,mainItem.ObjectID,0); 
    } 

    return (returnAttributeItemNodeList); 
}

再帰Sortメソッドその作業

をテストしている

private static void SortAttributeItems(List<AttributeItemNode> attributeItemNodeList, Dictionary<int?, List<AttributeItem>> attributeItemDictionary, int currentParentId, int depth) { // Traverse through all the Children and Add them Recursively foreach (var childItem in attributeItemDictionary[currentParentId]) { attributeItemNodeList.Add(new AttributeItemNode(childItem,depth+1)); if(!attributeItemDictionary.ContainsKey(childItem.ObjectID) || !attributeItemDictionary[childItem.ObjectID].Any()) break; else SortAttributeItems(attributeItemNodeList,attributeItemDictionary,childItem.ObjectID,depth+1); } }

出典

2016-10-01 10:05:28

ソートアルゴリズムにより、大きなデータセットを処理する際にスタックオーバーフローが発生しますか？

答えて

関連する問題