2017-02-18

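I have loaded JSON data into HDFS and created a table in a MySQL database with the required columns, as shown below. How can I use Sqoop to insert the JSON data from HDFS into MySQL?

How can I create the table with a row format that accepts JSON?

My HDFS data: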

{ 
"Employees" : [ 
{ 
"userId":"rirani", 
"jobTitleName":"Developer", 
"firstName":"Romin", 
"lastName":"Irani", 
"preferredFullName":"Romin Irani", 
"employeeCode":"E1", 
"region":"CA", 
"phoneNumber":"408-1234567", 
"emailAddress":"[email protected]" 
}, 
{ 
"userId":"nirani", 
"jobTitleName":"Developer", 
"firstName":"Neil", 
"lastName":"Irani", 
"preferredFullName":"Neil Irani", 
"employeeCode":"E2", 
"region":"CA", 
"phoneNumber":"408-1111111", 
"emailAddress":"[email protected]" 
}, 
{ 
"userId":"thanks", 
"jobTitleName":"Program Directory", 
"firstName":"Tom", 
"lastName":"Hanks", 
"preferredFullName":"Tom Hanks", 
"employeeCode":"E3", 
"region":"CA", 
"phoneNumber":"408-2222222", 
"emailAddress":"[email protected]" 
} 
] 
} 

I am trying to load the data from HDFS into MySQL with the following sqoop export command:

sqoop export --connect jdbc:mysql://localhost/emp_scheme --username root --password adithyan --table employee --export-dir /user/adithyan/filesystem/employee.txt

The structure of the target MySQL table is:

mysql> create table employee(userid int,jobTitleName varchar(20),firstName varchar(20),lastName varchar(20),preferrredFullName varchar(20),employeeCode varchar(20),region varchar(20),phoneNumber varchar(20), emailAddress varchar(20),modifiedDate timestamp DEFAULT CURRENT_TIMESTAMP); 
mysql> desc employee; 
+--------------------+-------------+------+-----+-------------------+-------+ 
| Field    | Type  | Null | Key | Default   | Extra | 
+--------------------+-------------+------+-----+-------------------+-------+ 
| userid    | int(11)  | YES |  | NULL    |  | 
| jobTitleName  | varchar(20) | YES |  | NULL    |  | 
| firstName   | varchar(20) | YES |  | NULL    |  | 
| lastName   | varchar(20) | YES |  | NULL    |  | 
| preferrredFullName | varchar(20) | YES |  | NULL    |  | 
| employeeCode  | varchar(20) | YES |  | NULL    |  | 
| region    | varchar(20) | YES |  | NULL    |  | 
| phoneNumber  | varchar(20) | YES |  | NULL    |  | 
| emailAddress  | varchar(20) | YES |  | NULL    |  | 
| modifiedDate  | timestamp | NO |  | CURRENT_TIMESTAMP |  | 
+--------------------+-------------+------+-----+-------------------+-------+ 
10 rows in set (0.00 sec) 

The export fails with the following exception:

17/02/18 19:35:35 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6 
17/02/18 19:35:35 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 
17/02/18 19:35:35 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset. 
17/02/18 19:35:35 INFO tool.CodeGenTool: Beginning code generation 
17/02/18 19:35:36 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1 
17/02/18 19:35:36 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1 
17/02/18 19:35:36 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/adithyan/hadoop_dir/hadoop-1.2.1 
Note: /tmp/sqoop-adithyan/compile/35afadf151a1dd1626a3658577cbc2dd/employee.java uses or overrides a deprecated API. 
Note: Recompile with -Xlint:deprecation for details. 
17/02/18 19:35:41 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-adithyan/compile/35afadf151a1dd1626a3658577cbc2dd/employee.jar 
17/02/18 19:35:41 INFO mapreduce.ExportJobBase: Beginning export of employee 
17/02/18 19:35:45 INFO input.FileInputFormat: Total input paths to process : 1 
17/02/18 19:35:45 INFO input.FileInputFormat: Total input paths to process : 1 
17/02/18 19:35:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library 
17/02/18 19:35:45 WARN snappy.LoadSnappy: Snappy native library not loaded 
17/02/18 19:35:46 INFO mapred.JobClient: Running job: job_201702181051_0002 
17/02/18 19:35:47 INFO mapred.JobClient: map 0% reduce 0% 
17/02/18 19:36:17 INFO mapred.JobClient: Task Id : attempt_201702181051_0002_m_000000_0, Status : FAILED 
java.io.IOException: Can't export data, please check failed map task logs 
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112) 
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39) 
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) 
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364) 
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:422) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) 
    at org.apache.hadoop.mapred.Child.main(Child.java:249) 
Caused by: java.lang.RuntimeException: Can't parse input data: '"firstName":"Tom"' 
    at employee.__loadFromFields(employee.java:596) 
    at employee.parse(employee.java:499) 
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83) 
    ... 10 more 
Caused by: java.lang.NumberFormatException: For input string: ""firstName":"Tom"" 
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) 
    at java.lang.Integer.parseInt(Integer.java:569) 
    at java.lang.Integer.valueOf(Integer.java:766) 
    at employee.__loadFromFields(employee.java:548) 
    ... 12 more 

17/02/18 19:36:18 INFO mapred.JobClient: Task Id : attempt_201702181051_0002_m_000001_0, Status : FAILED 
java.io.IOException: Can't export data, please check failed map task logs 
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112) 
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39) 
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) 
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364) 
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:422) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) 
    at org.apache.hadoop.mapred.Child.main(Child.java:249) 
Caused by: java.lang.RuntimeException: Can't parse input data: '{' 
    at employee.__loadFromFields(employee.java:596) 
    at employee.parse(employee.java:499) 
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83) 
    ... 10 more 
Caused by: java.lang.NumberFormatException: For input string: "{" 
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) 
    at java.lang.Integer.parseInt(Integer.java:580) 
    at java.lang.Integer.valueOf(Integer.java:766) 
    at employee.__loadFromFields(employee.java:548) 
    ... 12 more 

17/02/18 19:36:29 INFO mapred.JobClient: Task Id : attempt_201702181051_0002_m_000000_1, Status : FAILED 
java.io.IOException: Can't export data, please check failed map task logs 
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112) 
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39) 
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) 
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364) 
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:422) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) 
    at org.apache.hadoop.mapred.Child.main(Child.java:249) 
Caused by: java.lang.RuntimeException: Can't parse input data: '"firstName":"Tom"' 
    at employee.__loadFromFields(employee.java:596) 
    at employee.parse(employee.java:499) 
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83) 
    ... 10 more 
Caused by: java.lang.NumberFormatException: For input string: ""firstName":"Tom"" 
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) 
    at java.lang.Integer.parseInt(Integer.java:569) 
    at java.lang.Integer.valueOf(Integer.java:766) 
    at employee.__loadFromFields(employee.java:548) 
    ... 12 more 

17/02/18 19:36:29 INFO mapred.JobClient: Task Id : attempt_201702181051_0002_m_000001_1, Status : FAILED 
java.io.IOException: Can't export data, please check failed map task logs 
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112) 
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39) 
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) 
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364) 
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:422) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) 
    at org.apache.hadoop.mapred.Child.main(Child.java:249) 
Caused by: java.lang.RuntimeException: Can't parse input data: '{' 
    at employee.__loadFromFields(employee.java:596) 
    at employee.parse(employee.java:499) 
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83) 
    ... 10 more 
Caused by: java.lang.NumberFormatException: For input string: "{" 
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) 
    at java.lang.Integer.parseInt(Integer.java:580) 
    at java.lang.Integer.valueOf(Integer.java:766) 
    at employee.__loadFromFields(employee.java:548) 
    ... 12 more 

17/02/18 19:36:42 INFO mapred.JobClient: Task Id : attempt_201702181051_0002_m_000000_2, Status : FAILED 
java.io.IOException: Can't export data, please check failed map task logs 
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112) 
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39) 
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) 
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364) 
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:422) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) 
    at org.apache.hadoop.mapred.Child.main(Child.java:249) 
Caused by: java.lang.RuntimeException: Can't parse input data: '"firstName":"Tom"' 
    at employee.__loadFromFields(employee.java:596) 
    at employee.parse(employee.java:499) 
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83) 
    ... 10 more 
Caused by: java.lang.NumberFormatException: For input string: ""firstName":"Tom"" 
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) 
    at java.lang.Integer.parseInt(Integer.java:569) 
    at java.lang.Integer.valueOf(Integer.java:766) 
    at employee.__loadFromFields(employee.java:548) 
    ... 12 more 

17/02/18 19:36:42 INFO mapred.JobClient: Task Id : attempt_201702181051_0002_m_000001_2, Status : FAILED 
java.io.IOException: Can't export data, please check failed map task logs 
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112) 
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39) 
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) 
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364) 
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:422) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) 
    at org.apache.hadoop.mapred.Child.main(Child.java:249) 
Caused by: java.lang.RuntimeException: Can't parse input data: '{' 
    at employee.__loadFromFields(employee.java:596) 
    at employee.parse(employee.java:499) 
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83) 
    ... 10 more 
Caused by: java.lang.NumberFormatException: For input string: "{" 
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) 
    at java.lang.Integer.parseInt(Integer.java:580) 
    at java.lang.Integer.valueOf(Integer.java:766) 
    at employee.__loadFromFields(employee.java:548) 

Can anyone help me?

First, try changing userid to varchar(20). It is clearly not an int in your JSON. – Igor

I changed userid to varchar(20), but the export still fails: it cannot parse the input fields of the JSON data.

Answers

-1

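You may need to explore several options. One is MySQL's JSON_SET/JSON_REPLACE/JSON_INSERT functions, although Sqoop may not support them directly yet.

Another option is to preprocess the data with Pig and stage the result in HDFS before exporting it to the RDBMS with Sqoop.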
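
The stack trace in the question shows why some preprocessing is needed: TextExportMapper treats every line of the export file as one delimited record, so a line such as '{' or '"firstName":"Tom"' is parsed as the first column of a row. The answer suggests Pig for the flattening step; the same idea is sketched below in plain Python instead, only as an illustration. The function name flatten_employees and the column order are assumptions made for this sketch, chosen to match the first nine columns of the employee table (modifiedDate is left to its default).

```python
import csv
import io
import json

# Assumed column order: the first nine columns of the `employee` table.
# modifiedDate is omitted so MySQL can fill in CURRENT_TIMESTAMP.
COLUMNS = [
    "userId", "jobTitleName", "firstName", "lastName", "preferredFullName",
    "employeeCode", "region", "phoneNumber", "emailAddress",
]


def flatten_employees(json_text):
    """Convert the {"Employees": [...]} document into one CSV line per employee."""
    records = json.loads(json_text)["Employees"]
    out = io.StringIO()
    # Fields containing a comma are wrapped in double quotes by csv.writer.
    writer = csv.writer(out, lineterminator="\n")
    for record in records:
        writer.writerow([record.get(column, "") for column in COLUMNS])
    return out.getvalue()


# Small inline sample with a placeholder e-mail address (hypothetical data).
SAMPLE = """
{ "Employees": [
  { "userId": "rirani", "jobTitleName": "Developer", "firstName": "Romin",
    "lastName": "Irani", "preferredFullName": "Romin Irani",
    "employeeCode": "E1", "region": "CA", "phoneNumber": "408-1234567",
    "emailAddress": "user@example.com" }
] }
"""

print(flatten_employees(SAMPLE), end="")
```

The flattened file would then be staged in a separate HDFS directory and passed to sqoop export via --export-dir, together with --columns listing the nine column names and, if any value contains a comma, --input-optionally-enclosed-by '"'. As the comment above notes, userid has to be a varchar column for values such as "rirani" to load, and varchar(20) may be too short for some of the e-mail addresses.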
