2017-10-19

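Spark SQL UNION - ORDER BY column not in SELECT

I'm doing a UNION of two temp tables and trying to order by a column, but Spark complains that the column I'm ordering by cannot be resolved. Is this a bug, or am I missing something? The same ORDER BY on a column that is not in the SELECT clause works when there is no UNION, and prints: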

+-------------+---------------+
|           id|           name|
+-------------+---------------+
|old_order_id1|old_order_name1|
|old_order_id2|old_order_name2|
+-------------+---------------+


org.apache.spark.sql.AnalysisException: cannot resolve '`oo.is_old`' given input columns: [id, name]; line 5 pos 9;
'Sort ['oo.is_old ASC NULLS FIRST], true
+- Distinct
   +- Union
      :- Project [id#121, name#122]
      :  +- SubqueryAlias oo
      :     +- SubqueryAlias old_orders
      :        +- LogicalRDD [id#121, name#122, is_old#123]
      +- Project [id#131, name#132]
         +- SubqueryAlias no
            +- SubqueryAlias new_orders
               +- LogicalRDD [id#131, name#132, is_old#133]

As soon as I UNION the two tables, it fails with the error above. The code that reproduces both cases:

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructType}

lazy val spark: SparkSession = SparkSession.builder.master("local[*]").getOrCreate()

val oldOrders = Seq(
  Seq("old_order_id1", "old_order_name1", "true"),
  Seq("old_order_id2", "old_order_name2", "true")
)

val newOrders = Seq(
  Seq("new_order_id1", "new_order_name1", "false"),
  Seq("new_order_id2", "new_order_name2", "false")
)

// Both tables share the same three string columns.
val schema = new StructType()
  .add("id", StringType)
  .add("name", StringType)
  .add("is_old", StringType)

val oldOrdersDF = spark.createDataFrame(spark.sparkContext.makeRDD(oldOrders.map(x => Row(x: _*))), schema)
val newOrdersDF = spark.createDataFrame(spark.sparkContext.makeRDD(newOrders.map(x => Row(x: _*))), schema)

oldOrdersDF.createOrReplaceTempView("old_orders")
newOrdersDF.createOrReplaceTempView("new_orders")

// Ordering by a column that is not in the SELECT works if I'm not doing a UNION
spark.sql(
  """
    |SELECT oo.id, oo.name FROM old_orders oo
    |ORDER BY oo.is_old
  """.stripMargin).show()

// Ordering by a column that is not in the SELECT fails when I'm doing a UNION
spark.sql(
  """
    |SELECT oo.id, oo.name FROM old_orders oo
    |UNION
    |SELECT no.id, no.name FROM new_orders no
    |ORDER BY oo.is_old
  """.stripMargin).show()

The first query prints the table shown earlier; the second, with the UNION of the two tables, throws the AnalysisException.

Answers

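Even though the syntax of Spark SQL is very similar to standard SQL, the two work quite differently. Under the hood, Spark is all about RDDs/DataFrames.
The ORDER BY applies to the result of the whole UNION, not to the first SELECT. The plan in the error message shows this: the Sort sits on top of the Union, whose input columns are only [id, name].
After the UNION, a new DataFrame is generated, and we can no longer refer to fields of the original tables/DataFrames that we did not select.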
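
One way to check this, as a minimal sketch: run the UNION without the ORDER BY and look at the columns of the result. The object name `UnionColumnsSketch` and the helper `unionColumns` are made up for illustration, and the one-row tables are a cut-down stand-in for the data in the question.

```scala
import org.apache.spark.sql.SparkSession

object UnionColumnsSketch {
  // Hypothetical helper: registers cut-down versions of the two temp views and
  // returns the column names of the UNION from the question, without the ORDER BY.
  def unionColumns(spark: SparkSession): Seq[String] = {
    import spark.implicits._
    Seq(("old_order_id1", "old_order_name1", "true"))
      .toDF("id", "name", "is_old").createOrReplaceTempView("old_orders")
    Seq(("new_order_id1", "new_order_name1", "false"))
      .toDF("id", "name", "is_old").createOrReplaceTempView("new_orders")

    spark.sql(
      """
        |SELECT oo.id, oo.name FROM old_orders oo
        |UNION
        |SELECT id, name FROM new_orders
      """.stripMargin).columns.toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("union-columns-sketch").getOrCreate()
    // Only the selected columns survive the UNION, so a sort on top of it cannot see is_old.
    println(unionColumns(spark).mkString(", ")) // prints: id, name
    spark.stop()
  }
}
```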

// How to fix: select is_old inside the UNION so that ORDER BY can resolve it,
// then drop it again in the outer SELECT.
spark.sql(
  """
    |SELECT id, name
    |FROM (
    |  SELECT oo.id, oo.name, oo.is_old FROM old_orders oo
    |  UNION
    |  SELECT no.id, no.name, no.is_old FROM new_orders no
    |  ORDER BY oo.is_old
    |) t
  """.stripMargin).show()

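A variation on the query above, offered as a sketch and not as the only way to do it: keep `is_old` inside the UNION, but sort in the outer query on the unqualified column name. The alias `oo` belongs to only one branch of the UNION, and SQL in general does not promise that an ordering applied inside a subquery carries over to the outer result, so this form may be more portable. The names `OrderedUnionSketch`, `registerViews` and `orderedUnion` are hypothetical, and the secondary sort on `id` is my own addition to make the row order deterministic.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object OrderedUnionSketch {
  // Hypothetical helper: registers the question's two temp views.
  def registerViews(spark: SparkSession): Unit = {
    import spark.implicits._
    Seq(
      ("old_order_id1", "old_order_name1", "true"),
      ("old_order_id2", "old_order_name2", "true")
    ).toDF("id", "name", "is_old").createOrReplaceTempView("old_orders")
    Seq(
      ("new_order_id1", "new_order_name1", "false"),
      ("new_order_id2", "new_order_name2", "false")
    ).toDF("id", "name", "is_old").createOrReplaceTempView("new_orders")
  }

  // Carries is_old through the UNION, sorts on it in the outer query,
  // and returns only id and name.
  def orderedUnion(spark: SparkSession): DataFrame =
    spark.sql(
      """
        |SELECT id, name
        |FROM (
        |  SELECT id, name, is_old FROM old_orders
        |  UNION
        |  SELECT id, name, is_old FROM new_orders
        |) t
        |ORDER BY is_old, id
      """.stripMargin)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("ordered-union-sketch").getOrCreate()
    registerViews(spark)
    // is_old is a string column here, so "false" sorts before "true":
    // the new orders come first, then the old ones.
    orderedUnion(spark).show()
    spark.stop()
  }
}
```

The outer ORDER BY can refer to `is_old` even though it is not in the outer SELECT, for the same reason the first query in the question works: the column is still present in the relation being selected from.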
Thank you.
