Treating data like code is gaining momentum, with tools such as lakeFS and Oxen.ai offering Git-like version control for data. In a batch job system, each job execution can branch off the main branch, which gives it a safe place to manipulate data and record metadata before anyone decides whether to merge the result back. Test executions can run on their own branches so they never affect production data, while experiment branches hold experimental data for longer-term analysis without ever being merged into main. Branches for multi-step jobs help break complex operations into separate stages. Taken together, these branching strategies resemble database transactions with ACID-style guarantees: work stays isolated on a branch and reaches main only if it is merged. The result is a structured approach to data management in batch job-based software systems.
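To make the branch-per-job idea concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the lakeFS or Oxen.ai API: `ToyRepo`, `run_job`, the `job-<id>` branch naming, and the "source wins" merge are all assumptions made for illustration. It shows a job writing to its own branch, a validation step deciding whether to merge, and a failed job leaving main untouched.

```python
from typing import Callable, Dict

Snapshot = Dict[str, str]  # maps an object path to its content


class ToyRepo:
    """Hypothetical in-memory stand-in for a versioned data store."""

    def __init__(self) -> None:
        self.branches: Dict[str, Snapshot] = {"main": {}}

    def create_branch(self, name: str, source: str = "main") -> None:
        # A new branch starts as a copy of the source branch's current state.
        self.branches[name] = dict(self.branches[source])

    def write(self, branch: str, path: str, content: str) -> None:
        self.branches[branch][path] = content

    def merge(self, source: str, target: str = "main") -> None:
        # Naive merge (assumption): the source branch wins on every path it
        # contains. There is no conflict detection and deletions are ignored.
        self.branches[target].update(self.branches[source])

    def delete_branch(self, name: str) -> None:
        del self.branches[name]


def run_job(
    repo: ToyRepo,
    job_id: str,
    job: Callable[[ToyRepo, str], None],
    validate: Callable[[Snapshot], bool],
) -> bool:
    """Run `job` on its own branch and merge into main only if it validates."""
    branch = f"job-{job_id}"
    repo.create_branch(branch)
    try:
        job(repo, branch)
        ok = validate(repo.branches[branch])
    except Exception:
        ok = False  # a crashed job is treated like a failed validation
    if ok:
        repo.merge(branch)
        repo.delete_branch(branch)
    # On failure the branch is kept so the bad output can be inspected.
    return ok


def good_job(repo: ToyRepo, branch: str) -> None:
    repo.write(branch, "report.csv", "totals")


def bad_job(repo: ToyRepo, branch: str) -> None:
    repo.write(branch, "users.csv", "corrupted")
    raise RuntimeError("job failed halfway through")


repo = ToyRepo()
repo.write("main", "users.csv", "v1")

first_ok = run_job(repo, "001", good_job, lambda snap: "report.csv" in snap)
second_ok = run_job(repo, "002", bad_job, lambda snap: True)
```

After both runs, main contains the report from the first job and the original `users.csv`, while the partial output of the failed job survives only on its `job-002` branch. A real system would replace the dictionary copy with the store's own branch and merge operations, which typically avoid copying data.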
https://isaacjordan.me/blog/2025/01/data-branching-for-batch-job-systems