
Spring Batch tutorial with example




Spring Batch is used to create and process batch jobs. It provides features such as logging, job statistics, transaction management, and job restart. It is very helpful for processing large but finite volumes of data.
In this tutorial we will learn how to create and execute a Spring Batch job. In our example we will create a job which imports all the words from a text file into a database and then, as a last step, prints the total number of words available in the database.
Below is the project structure.

[Image: Spring Batch project structure]

Creating the batch job

Sample text file to import

Below is the content of the text file we use for importing the words.
The list below gives you the 1000 most frequently used English words in alphabetical order.
Once you've mastered the shorter vocabulary lists, this is the next step.
It would take time to learn the entire list from scratch, but you are probably already familiar with some of these words.
Feel free to copy this list into your online flashcard management tool, an app, or print it out to make paper flashcards.
You will have to look up the definitions on your own either in English or in your own language. Good luck improving your English vocabulary!

a
ability
able
about
above
accept
according
account

Maven dependency

We need to add the below dependencies for Spring Batch, the H2 database, and Spring Data JPA.
      <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-batch</artifactId>
      </dependency>

      <dependency>
        <groupId>com.h2database</groupId>
        <artifactId>h2</artifactId>
        <scope>runtime</scope>
      </dependency>

      <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-jpa</artifactId>
      </dependency>
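
One more dependency is needed that the original list omits: the REST endpoint we create later to launch the job requires spring-boot-starter-web.
      <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
      </dependency>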

application.properties configuration

The below properties configure the database. The spring.jpa.generate-ddl property tells Hibernate to create the tables automatically from the defined entity beans. The jdbcUrl entry appears alongside url because the DataSourceBuilder-based configuration shown later binds the connection pool's jdbcUrl property, while url is used by Spring Boot itself (for example by the H2 console).
spring.datasource.url=jdbc:h2:mem:app-data
spring.datasource.jdbcUrl=jdbc:h2:mem:app-data
spring.datasource.driverClassName=org.h2.Driver
spring.datasource.username=sa
spring.datasource.password=

spring.jpa.database-platform=org.hibernate.dialect.H2Dialect
spring.jpa.generate-ddl=true
The below property enables the H2 database console (available at /h2-console by default), so we can inspect tables and other objects as in any SQL editor.
spring.h2.console.enabled=true
Set the below property if you don't want your batch jobs to run automatically on every start of the application; by default, all defined jobs run each time the application starts.
spring.batch.job.enabled=false

Data source configuration

In this example we will use the same database for both the application and the batch job. You can check my other post on how to use multiple data sources with a Spring Boot and batch application.


Data source and repository bean configuration

import javax.sql.DataSource;

import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.boot.jdbc.DataSourceBuilder;
import org.springframework.boot.orm.jpa.EntityManagerFactoryBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.jpa.repository.config.EnableJpaRepositories;
import org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean;
import org.springframework.transaction.annotation.EnableTransactionManagement;

@Configuration
@EnableJpaRepositories(
        entityManagerFactoryRef = "appEntityManagerFactory",
        basePackages = "com.ttj.app.repository"
)
@EnableTransactionManagement
public class AppDataSourceConfig {

    @Bean
    @ConfigurationProperties(prefix = "spring.datasource")
    public DataSource appDataSource(){
        // binds the spring.datasource.* properties (including jdbcUrl) to the pool
        return DataSourceBuilder.create().build();
    }

    @Bean(name = "appEntityManagerFactory")
    public LocalContainerEntityManagerFactoryBean appEntityManagerFactory(EntityManagerFactoryBuilder builder,
            @Qualifier("appDataSource") DataSource appDataSource){

        return builder
                .dataSource(appDataSource)
                .packages("com.ttj.app.domain")
                .persistenceUnit("app")
                .build();
    }
}
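
The repositories above will look for a transaction manager bean named transactionManager by default. Spring Boot typically auto-configures one; if it does not in your setup, here is a minimal sketch of defining it explicitly (the bean name and wiring are assumptions, not shown in the original post):

    @Bean(name = "transactionManager")
    public PlatformTransactionManager transactionManager(
            @Qualifier("appEntityManagerFactory") EntityManagerFactory entityManagerFactory) {
        // JPA transaction manager bound to the entity manager factory defined above
        return new JpaTransactionManager(entityManagerFactory);
    }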

Repository class

package com.ttj.app.repository;

import com.ttj.app.domain.Word;
import org.springframework.data.repository.CrudRepository;

public interface WordRepository extends CrudRepository<Word, Long> {}

Domain object (Entity)

package com.ttj.app.domain;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.EnumType;
import javax.persistence.Enumerated;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name="WORDS")
public class Word {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Column
    private String text;

    @Enumerated(EnumType.STRING)
    private Language language;

    protected Word() {
        // no-arg constructor required by JPA
    }

    public Word(String text, Language language) {
        this.text = text;
        this.language = language;
    }

    //getter methods
    //setter methods

}
Below is the enum class used by the above entity class.
public enum Language {
    EN, HI;
}

Batch job configuration

A batch job is a collection of steps executed in a specified order. In our case the job breaks down into the following steps.
  1. Import words
    1. Read the text file line by line.
    2. Extract the words from each line.
    3. Write the words to the database in chunks.
  2. Finally, print the total number of words available in the database.

Steps can be created in two ways: one using a Tasklet and the other using a chain of reader, processor, and writer. We will implement the import step with a reader, processor, and writer, and the final step with a Tasklet.

Step 1 - Import words

In this step we want to perform a chain of tasks: read the file, extract the words, and write them to the database. So for this step we will use reader, processor, and writer implementations.
The below bean defines the reader, which reads the text file line by line from the classpath. The line mapper could also build some other object from each line, but here we simply pass each line through as a string.
    @Bean
    public FlatFileItemReader<String> reader() {
        return new FlatFileItemReaderBuilder<String>()
                .name("fileReader")
                .resource(new ClassPathResource("words.txt"))
                // pass each line through unchanged
                .lineMapper((line, lineNumber) -> line)
                .build();
    }
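Since the mapper simply returns each line unchanged, Spring Batch's built-in PassThroughLineMapper (org.springframework.batch.item.file.mapping.PassThroughLineMapper) could be used in place of the lambda:

                .lineMapper(new PassThroughLineMapper())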
In the below processor definition we transform a single line into a list of Word objects.
    @Bean
    public ItemProcessor<String, List<Word>> processor() {
        return line -> {
            if (line == null || line.isEmpty()) {
                return null;
            }
            // split on whitespace and the punctuation present in the file
            List<Word> list = new ArrayList<>();
            for (String token : line.split("[\\s,=\\.*]")) {
                if (!token.isEmpty()) {
                    list.add(new Word(token, Language.EN));
                }
            }
            return list;
        };
    }
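Note that returning null from an ItemProcessor tells Spring Batch to filter that item out, so blank lines never reach the writer.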
The writer receives the items returned by the processor. Since the processor transforms each line into a list of words, the writer receives a list of lists of words and processes them in chunks. The chunk size is defined in the step configuration, shown after the writer below.
    @Bean
    public ItemWriter<List<Word>> writer() {
        // Each item is itself a list of words (one per input line),
        // so save each inner list through the repository; the chunk is
        // written within the step's transaction.
        return items -> items.forEach(wordRepository::saveAll);
    }
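
The job configuration later wires these pieces into a step named importStep. The original post does not show this bean, so here is a minimal sketch of how it can be assembled from the reader, processor, and writer above (the chunk size of 10 is an assumed value):

    @Bean
    public Step importStep(FlatFileItemReader<String> reader,
                           ItemProcessor<String, List<Word>> processor,
                           ItemWriter<List<Word>> writer) {
        return stepBuilderFactory.get("importStep")
                // process and write the lines in chunks of 10
                .<String, List<Word>>chunk(10)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }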

Step 2 - Print total count of words

In this step we need to print the count of total words in the database after the import, so we will create a Tasklet-based step as given below.
    @Bean
    public Step totalCountStep(){
        return stepBuilderFactory.get("totalCountStep")
                // Tasklet is a functional interface, so a lambda works here
                .tasklet((contribution, chunkContext) -> {
                    System.out.println("Total word count: " + wordRepository.count());
                    return RepeatStatus.FINISHED;
                }).build();
    }

Create the job configuration using the above steps

Now we will define the bean for our Job using the above steps.
    @Bean
    public Job importWordsJob(Step importStep, Step totalCountStep) {
        return jobBuilderFactory.get("importWordsJob")
                .incrementer(new RunIdIncrementer())
                .flow(importStep)
                .next(totalCountStep)
                .end()
                .build();
    }
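The flow(...).end() form builds a FlowJob (as seen in the logs later); for a purely sequential job like this one, start(importStep).next(totalCountStep).build() would behave equivalently.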

Autowiring required dependencies

Below are the JobBuilderFactory, StepBuilderFactory, and repository dependencies required to define the above beans. We don't need to declare these beans ourselves: the builder factories are provided by @EnableBatchProcessing and the repository by Spring Data's auto-configuration.
    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    private WordRepository wordRepository;

Spring boot main class annotations

Below is our main class with the required annotations. The @EnableBatchProcessing annotation configures the beans batch needs, such as the builder factories used above.
@EnableBatchProcessing
@SpringBootApplication
@ComponentScan("com.ttj")
public class BatchTutorialApplication {

    public static void main(String[] args) {
        SpringApplication.run(BatchTutorialApplication.class, args);
    }
}

Executing the batch job

There are multiple ways to run a batch job, such as enabling job execution on application startup or registering it with a Spring Cloud Data Flow server (see my other posts on setting up a Data Flow server and executing batch jobs with it).

Another way is to execute it through the JobLauncher, which is what we will do in this example. We will create a REST endpoint which invokes the batch job, so the job can be triggered from any browser or HTTP client.

REST Service class to execute the batch job using web URL

Below is the code of our REST service, with the job launcher and job bean autowired so the job can be run through the launcher. We pass a job parameter containing the current date-time string purely to make each launch unique; Spring Batch will not run the same job instance (job name plus identical parameters) twice, and the RunIdIncrementer configured on the job is not applied when launching through JobLauncher directly.
@RestController
@RequestMapping("/jobs")
public class JobController {

    @Autowired
    JobLauncher jobLauncher;

    @Autowired
    private Job importWordsJob;

    @GetMapping("/importWords")
    public void runJob(){
        try {
            JobParametersBuilder builder = new JobParametersBuilder();
            builder.addString("startDate", LocalDateTime.now().toString());

            jobLauncher.run(importWordsJob, builder.toJobParameters());
        }catch(Exception e){
            e.printStackTrace();
        }
    }
}
Now our service is ready to run and execute the batch job. Execute the below command in the project root directory to run the application.
mvn clean spring-boot:run
Now hit the service URL http://localhost:8080/jobs/importWords in your web browser.
You will see below result in the application log or console.
2019-12-21 16:38:58.542  INFO 7156 --- [nio-8080-exec-1] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=importWordsJob]] launched with the following parameters: [{startDate=2019-12-21T16:38:58.495}]
2019-12-21 16:38:58.576  WARN 7156 --- [nio-8080-exec-1] o.s.c.t.b.l.TaskBatchExecutionListener   : This job was executed outside the scope of a task but still used the task listener.
2019-12-21 16:38:58.587  INFO 7156 --- [nio-8080-exec-1] o.s.batch.core.job.SimpleStepHandler     : Executing step: [importStep]
2019-12-21 16:38:58.857  INFO 7156 --- [nio-8080-exec-1] o.s.batch.core.step.AbstractStep         : Step: [importStep] executed in 270ms
2019-12-21 16:38:58.875  INFO 7156 --- [nio-8080-exec-1] o.s.batch.core.job.SimpleStepHandler     : Executing step: [totalCountStep]
Total word count: 104
2019-12-21 16:38:59.011  INFO 7156 --- [nio-8080-exec-1] o.s.batch.core.step.AbstractStep         : Step: [totalCountStep] executed in 135ms
2019-12-21 16:38:59.018  INFO 7156 --- [nio-8080-exec-1] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=importWordsJob]] completed with the following parameters: [{startDate=2019-12-21T16:38:58.495}] and the following status: [COMPLETED] in 449ms
Now we will execute this job one more time using the same service URL, and you will see the below lines added to the logs. Note the doubled word count: the import step inserts all the words again on each run.
2019-12-21 16:47:17.261  INFO 7156 --- [nio-8080-exec-4] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=importWordsJob]] launched with the following parameters: [{startDate=2019-12-21T16:47:17.252}]
2019-12-21 16:47:17.264  WARN 7156 --- [nio-8080-exec-4] o.s.c.t.b.l.TaskBatchExecutionListener   : This job was executed outside the scope of a task but still used the task listener.
2019-12-21 16:47:17.271  INFO 7156 --- [nio-8080-exec-4] o.s.batch.core.job.SimpleStepHandler     : Executing step: [importStep]
2019-12-21 16:47:17.322  INFO 7156 --- [nio-8080-exec-4] o.s.batch.core.step.AbstractStep         : Step: [importStep] executed in 51ms
2019-12-21 16:47:17.327  INFO 7156 --- [nio-8080-exec-4] o.s.batch.core.job.SimpleStepHandler     : Executing step: [totalCountStep]
Total word count: 208
2019-12-21 16:47:17.336  INFO 7156 --- [nio-8080-exec-4] o.s.batch.core.step.AbstractStep         : Step: [totalCountStep] executed in 9ms
2019-12-21 16:47:17.338  INFO 7156 --- [nio-8080-exec-4] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=importWordsJob]] completed with the following parameters: [{startDate=2019-12-21T16:47:17.252}] and the following status: [COMPLETED] in 75ms

Git source code

You can find the complete source code at the below Git location. This source code also includes the multiple data source configuration with cloud task configuration.
https://github.com/thetechnojournals/spring-tutorials/tree/master/SpringBatchTutorial

