Multiple files with different data structure formats as input to Spring Batch

Based on my research, Spring Batch provides APIs for handling many different kinds of data file formats.

However, I need clarification on how to supply multiple files with different formats to a single chunk/tasklet.

I know that MultiResourceItemReader can process multiple files, but AFAIK all of the files must have the same format and data structure.

Question: how can we provide multiple files with different data formats as input to a tasklet?

2016-12-13 Shravya Reddy

:-) Do you have any indicator of which data format each file uses?

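I don't think there is a ready-made Spring Batch reader for multiple input formats.

You'll have to build your own. Of course, you can reuse the existing FileItemReaders as delegates inside your custom file reader, using the right one for each file type/format.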
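A rough, framework-free illustration of the delegate idea in plain Java, assuming the format can be told from the file extension (an assumption; the question does not say how formats are identified). `FormatDelegatingParser` and the extension-to-parser map are hypothetical names, not Spring Batch API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a "delegating" reader: the file extension selects which
// per-format line parser (the delegate) handles the file's lines.
class FormatDelegatingParser {

    private final Map<String, Function<String, String[]>> parsersByExtension;

    FormatDelegatingParser(Map<String, Function<String, String[]>> parsersByExtension) {
        this.parsersByExtension = parsersByExtension;
    }

    List<String[]> parse(String fileName, List<String> lines) {
        // Extension after the last '.', lower-cased so "A.CSV" and "a.csv" match
        String ext = fileName.substring(fileName.lastIndexOf('.') + 1).toLowerCase(Locale.ROOT);
        Function<String, String[]> delegate = parsersByExtension.get(ext);
        if (delegate == null) {
            throw new IllegalArgumentException("No parser registered for extension: " + ext);
        }
        List<String[]> records = new ArrayList<>();
        for (String line : lines) {
            records.add(delegate.apply(line)); // each format keeps its own parsing rules
        }
        return records;
    }
}
```

In a real Spring Batch job the map values would more likely be differently configured `FlatFileItemReader`s or `LineMapper`s; that wiring is not shown here.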


2016-12-13 16:43:19 Asoub

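Asoub is right: there is no out-of-the-box, "read-it-all" Spring Batch reader. However, with just a handful of fairly simple classes you can build a Java-configured Spring Batch application that processes different files with different file formats.

I had a similar use case in my application. I wrote a number of fairly simple, direct implementations and extensions of the Spring Batch framework to create what I call a "generic" reader. To answer your question: below you will find the code I used to read different kinds of file formats with Spring Batch. Obviously it is a stripped-down implementation, but it should get you going in the right direction.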

Each line is represented by a Record:

public class Record { 

    private Object[] columns; 

    public void setColumnByIndex(Object candidate, int index) { 
     columns[index] = candidate; 
    } 

    public Object getColumnByIndex(int index){ 
     return columns[index]; 
    } 

    public void setColumns(Object[] columns) { 
     this.columns = columns; 
    } 
}

Each line contains multiple columns separated by a delimiter. It does not matter if file1 contains 10 columns while file2 contains only 3.

The reader below simply maps each line to a Record:
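To make the point concrete, here is a small plain-Java sketch showing that one splitting routine yields arrays of different lengths per line, which is why a record holding an array of any width works for every file. The `VariableWidthSplitter` helper is hypothetical, and quoting is ignored.

```java
import java.util.regex.Pattern;

// Hypothetical helper: split one delimited line into as many columns as it has.
// Quoting is deliberately ignored to keep the example short.
class VariableWidthSplitter {

    static String[] toColumns(String line, char delimiter) {
        // Pattern.quote makes "|" (a regex metacharacter) a literal delimiter;
        // limit -1 keeps trailing empty columns such as the last one in "a|b|".
        return line.split(Pattern.quote(String.valueOf(delimiter)), -1);
    }
}
```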

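import java.io.File;

import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.core.io.FileSystemResource;
import org.springframework.stereotype.Component;

@Component
public class GenericReader {

    @Autowired
    private GenericLineMapper genericLineMapper;

    // A new reader instance is created for every file, so each step gets its own
    @SuppressWarnings("unchecked")
    public FlatFileItemReader<Record> reader(File file) {
        FlatFileItemReader<Record> reader = new FlatFileItemReader<>();
        reader.setResource(new FileSystemResource(file));
        reader.setLineMapper(genericLineMapper.defaultLineMapper());
        return reader;
    }
}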
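The reader above relies on the standard `ItemReader` contract: every `read()` call returns the next item, and `null` means the input is exhausted. The sketch below imitates that contract with plain `java.io` classes; `SimpleItemReader` and `LineItemReader` are hypothetical stand-ins, not Spring Batch types.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the ItemReader contract: next item, or null at end.
interface SimpleItemReader<T> {
    T read();
}

// Reads one line per call and maps it to its columns, like a FlatFileItemReader
// configured with a delimited LineMapper (much simplified).
class LineItemReader implements SimpleItemReader<String[]> {

    private final BufferedReader in;
    private final String delimiterRegex;

    LineItemReader(BufferedReader in, String delimiterRegex) {
        this.in = in;
        this.delimiterRegex = delimiterRegex;
    }

    @Override
    public String[] read() {
        try {
            String line = in.readLine();
            // readLine() returns null at end of file; pass that on as "no more items"
            return line == null ? null : line.split(delimiterRegex, -1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A step keeps calling `read()` until it gets `null`, which is what ends the step's input for that file.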

The mapper takes a line and converts it into an array of objects:

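import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class GenericLineMapper {

    @Autowired
    private ApplicationConfiguration applicationConfiguration;

    // Line -> FieldSet (tokenizer) -> Record (field set mapper)
    public DefaultLineMapper<Record> defaultLineMapper() {
        DefaultLineMapper<Record> lineMapper = new DefaultLineMapper<>();
        lineMapper.setLineTokenizer(tokenizer());
        lineMapper.setFieldSetMapper(new CustomFieldSetMapper());
        return lineMapper;
    }

    // Splits on the configured delimiter; delimiters inside quoted columns are kept
    private DelimitedLineTokenizer tokenizer() {
        DelimitedLineTokenizer tokenize = new DelimitedLineTokenizer();
        tokenize.setDelimiter(Character.toString(applicationConfiguration.getDelimiter()));
        tokenize.setQuoteCharacter(applicationConfiguration.getQuote());
        return tokenize;
    }
}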
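For readers unfamiliar with the quote setting: with a quote character configured, a delimiter inside a quoted column should not split it. The simplified tokenizer below is a hypothetical plain-Java illustration of that behavior (including the common `""` escape for a literal quote), not the actual `DelimitedLineTokenizer` implementation, whose edge-case handling may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified quote-aware tokenizer for illustration only.
class QuoteAwareTokenizer {

    static List<String> tokenize(String line, char delimiter, char quote) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == quote) {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == quote) {
                    current.append(quote); // doubled quote inside quotes -> one literal quote
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote, not part of the value
                }
            } else if (c == delimiter && !inQuotes) {
                tokens.add(current.toString()); // delimiter outside quotes ends a column
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        tokens.add(current.toString()); // last column
        return tokens;
    }
}
```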

The "magic" of converting a line into the columns of a Record happens in the FieldSetMapper:

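import java.util.Arrays;

import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.stereotype.Component;
import org.springframework.validation.BindException;

@Component
public class CustomFieldSetMapper implements FieldSetMapper<Record> {

    @Override
    public Record mapFieldSet(FieldSet fieldSet) throws BindException {
        Record record = new Record();
        // Fetch the tokens once and copy them into a real Object[] for the record
        String[] values = fieldSet.getValues();
        record.setColumns(Arrays.copyOf(values, values.length, Object[].class));
        return record;
    }
}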
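Copying the values into a fresh `Object[]`, rather than storing the `String[]` itself, avoids a Java array pitfall: arrays are covariant, so a `String[]` can sit behind an `Object[]` reference, yet writing a non-String into it later (for example via `setColumnByIndex`) throws `ArrayStoreException` at runtime. The `ColumnCopyDemo` class is a hypothetical demonstration.

```java
import java.util.Arrays;

// Hypothetical demo of array covariance and why the mapper copies into Object[].
class ColumnCopyDemo {

    // Tries to store a non-String value into column 0; reports whether it worked.
    static boolean canStoreInteger(Object[] columns) {
        try {
            columns[0] = 42;
            return true;
        } catch (ArrayStoreException e) {
            return false; // the runtime type of the array is String[]
        }
    }

    // Arrays.copyOf with Object[].class creates an array whose runtime type is Object[].
    static Object[] copyAsObjects(String[] values) {
        return Arrays.copyOf(values, values.length, Object[].class);
    }
}
```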

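Through YAML configuration, the user supplies the input directory and the list of file names, together with the appropriate delimiter and the character used to quote a column that contains the delimiter. The configuration is bound to the following class: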

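import java.util.List;

import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;

@Component
@ConfigurationProperties
public class ApplicationConfiguration {

    private String inputDir;
    private List<String> fileNames;
    private char delimiter;
    private char quote;

    // getters and setters omitted
}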

Next, the application.yml:

input-dir: src/main/resources/ 
file-names: [yourfile1.csv, yourfile2.csv, yourfile3.csv] 
delimiter: "|" 
quote: "\""
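If `input-dir` is joined to each file name by plain string concatenation, the trailing slash in the YAML value becomes load-bearing. A sketch of a more forgiving alternative using `java.nio.file` (the `InputFiles` helper is hypothetical):

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: resolve configured file names against the input directory.
class InputFiles {

    static List<Path> resolve(String inputDir, List<String> fileNames) {
        Path dir = Paths.get(inputDir); // a trailing slash is normalized away
        List<Path> files = new ArrayList<>();
        for (String name : fileNames) {
            files.add(dir.resolve(name)); // inserts the separator when needed
        }
        return files;
    }
}
```

With this, `src/main/resources` and `src/main/resources/` resolve to the same files.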

And last but not least, putting it all together:

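import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.job.builder.FlowBuilder;
import org.springframework.batch.core.job.flow.Flow;
import org.springframework.batch.core.job.flow.support.SimpleFlow;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;
    @Autowired
    public StepBuilderFactory stepBuilderFactory;
    @Autowired
    private GenericReader genericReader;
    @Autowired
    private NoOpWriter noOpWriter;
    @Autowired
    private ApplicationConfiguration applicationConfiguration;

    @Bean
    public Job yourJobName() {
        // One step per configured file; File(parent, child) works with or without a trailing slash
        List<Step> steps = new ArrayList<>();
        applicationConfiguration.getFileNames().forEach(
                f -> steps.add(loadStep(new File(applicationConfiguration.getInputDir(), f))));

        return jobBuilderFactory.get("yourjobName")
                .start(createParallelFlow(steps))
                .end()
                .build();
    }

    @SuppressWarnings("unchecked")
    public Step loadStep(File file) {
        return stepBuilderFactory.get("step-" + file.getName())
                .<Record, Record> chunk(10)
                .reader(genericReader.reader(file))
                .writer(noOpWriter)
                .build();
    }

    // Wraps each step in its own flow and runs all flows as one split on the task executor
    private Flow createParallelFlow(List<Step> steps) {
        SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
        // -1 = unlimited concurrency, 1 = one step at a time, steps.size() = one thread per file
        taskExecutor.setConcurrencyLimit(1);

        List<Flow> flows = steps.stream()
                .map(step -> new FlowBuilder<Flow>("flow_" + step.getName()).start(step).build())
                .collect(Collectors.toList());

        return new FlowBuilder<SimpleFlow>("parallelStepsFlow")
                .split(taskExecutor)
                .add(flows.toArray(new Flow[flows.size()]))
                .build();
    }
}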
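Conceptually, the split flow is a fan-out/join: one unit of work per file, a cap on how many run at once, and a wait for all of them before the job continues. The plain-Java analogy below uses an `ExecutorService` instead of Spring Batch (`ParallelFileRunner` is hypothetical); note that a fixed thread pool needs at least one thread, whereas `SimpleAsyncTaskExecutor` also accepts `-1` for unlimited concurrency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Plain-Java analogy of the split flow (hypothetical class, not Spring Batch API).
class ParallelFileRunner {

    // Runs perFile once per file with at most maxConcurrent (>= 1) tasks at a time
    // and returns the results in the same order as the input list.
    static <R> List<R> runAll(List<String> files, int maxConcurrent, Function<String, R> perFile) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (String file : files) {
                futures.add(pool.submit(() -> perFile.apply(file))); // fan out: one task per file
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                try {
                    results.add(future.get()); // join: wait for this file's task
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown(); // let worker threads exit once all tasks are done
        }
    }
}
```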

For demo purposes, you can simply put all the classes in one package. The NoOpWriter just prints the second column of the test files:

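import java.util.List;

import org.springframework.batch.item.ItemWriter;
import org.springframework.stereotype.Component;

@Component
public class NoOpWriter implements ItemWriter<Record> {

    @Override
    public void write(List<? extends Record> items) throws Exception {
        // Print the second column (index 1) of each record; nothing is persisted
        items.forEach(i -> System.out.println(i.getColumnByIndex(1)));
    }
}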
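`chunk(10)` in `loadStep` means items are read one at a time and handed to the writer in lists of up to 10, which is why `NoOpWriter.write` receives a `List`. A deliberately simplified plain-Java sketch of that loop, assuming an `Iterator` as the item source (`ChunkLoop` is hypothetical, and Spring Batch's transactions, retry, and restart handling are left out):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified chunk-oriented loop: buffer items, write full chunks.
class ChunkLoop {

    // Returns how many times the writer was called.
    static <T> int run(Iterator<T> reader, int chunkSize, Consumer<List<T>> writer) {
        int writes = 0;
        List<T> chunk = new ArrayList<>(chunkSize);
        while (reader.hasNext()) {
            chunk.add(reader.next());
            if (chunk.size() == chunkSize) {
                writer.accept(new ArrayList<>(chunk)); // hand over a copy, then reuse the buffer
                chunk.clear();
                writes++;
            }
        }
        if (!chunk.isEmpty()) { // last, partially filled chunk
            writer.accept(chunk);
            writes++;
        }
        return writes;
    }
}
```

With 25 items and a chunk size of 10, the writer is called three times, with 10, 10, and 5 items.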

Good luck!


2016-12-13 20:59:45
