Dagster supports data backfills over all partitions of a job or over a subset of them. After defining a partitioned job, you can use a backfill to submit a run for each partition in the selected set.
You can launch and monitor backfills of a job using the Partitions tab.
To launch a backfill, click the "Launch backfill" button at the top center of the Partitions tab. This opens the "Launch backfill" modal, which lets you select the set of partitions to launch the backfill over. A run will be launched for each partition.
Click the button at the bottom right of the modal to submit the runs. What happens next depends on your run coordinator. With the default run coordinator, the modal closes after all runs have been launched. With the queued run coordinator, the modal closes after all runs have been queued.
After all the runs have been submitted, you'll be returned to the partitions page, with a filter for runs inside the backfill. This refreshes periodically and allows you to see how the backfill is progressing. Boxes become green or red as steps in the backfill runs succeed or fail.
You can also launch backfills using the backfill CLI.
On the Partitions concept page, we defined a partitioned job called do_stuff_partitioned that had date partitions.
Having done so, we can run the command dagster job backfill to execute the backfill.
$ dagster job backfill -p do_stuff_partitioned
This will display a list of all the partitions in the job, ask you if you want to proceed, and then launch a run for each partition.
You can also backfill a subset of the job's partitions.
You can specify the --partitions argument and provide a comma-separated list of the partition names you want to backfill:
$ dagster job backfill -p do_stuff_partitioned --partitions 2021-04-01,2021-04-02
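When the list of partitions comes from a script rather than being typed by hand, the comma-separated value can be assembled programmatically. The following is a minimal sketch, not part of Dagster: build_backfill_command and failed_partitions are hypothetical names introduced for illustration, and the script only prints the command instead of running it.

```python
import shlex


def build_backfill_command(job_name, partition_keys):
    """Return the argument list for a backfill over the given partition keys.

    Hypothetical helper for illustration; it is not a Dagster API.
    """
    if not partition_keys:
        raise ValueError("at least one partition key is required")
    # --partitions takes a single comma-separated value with no spaces.
    return [
        "dagster", "job", "backfill",
        "-p", job_name,
        "--partitions", ",".join(partition_keys),
    ]


# Hypothetical list of partitions that need to be backfilled.
failed_partitions = ["2021-04-01", "2021-04-02"]
command = build_backfill_command("do_stuff_partitioned", failed_partitions)

# Print the shell-quoted command so it can be reviewed before running it.
print(shlex.join(command))
```

Because the CLI asks for confirmation before launching runs, a command built this way is probably best printed and run from a terminal rather than executed unattended.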
Alternatively, you can specify a range of partitions using the --from and --to arguments:
$ dagster job backfill -p do_stuff_partitioned --from 2021-04-01 --to 2021-05-01
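To get a feel for how many runs a range like this launches, the partition names it covers can be enumerated ahead of time. The sketch below assumes daily partitions named by ISO dates and assumes that both the --from and --to partitions are included in the backfill. daily_partition_keys is a hypothetical helper, not a Dagster function.

```python
from datetime import date, timedelta


def daily_partition_keys(start, end):
    """List daily partition names from start to end, both inclusive.

    Assumes partitions are named by ISO dates such as "2021-04-01".
    """
    first = date.fromisoformat(start)
    last = date.fromisoformat(end)
    if last < first:
        raise ValueError("end must not be before start")
    days = (last - first).days
    # range(days + 1) includes the end date itself.
    return [(first + timedelta(days=i)).isoformat() for i in range(days + 1)]


keys = daily_partition_keys("2021-04-01", "2021-05-01")
print(len(keys), keys[0], keys[-1])  # 31 2021-04-01 2021-05-01
```

Under these assumptions, the range above covers 31 partitions (30 days in April plus May 1), so the backfill would launch 31 runs.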