Incremental Append Only Load with Airbyte Demo
Date:
An append-only incremental upload is a data loading pattern where new records are continually added (appended) to a dataset without modifying or deleting existing records. This approach is often used when historical data must be preserved for audit, analysis, or other purposes.
Data is preserved and every new batch of data is added to the existing dataset without altering or removing past data. This is a simple method and is straightforward to implement since it avoids the complexity of handling updates and deletions. It is ideal for systems where data grows over time and past records need to be preserved. Very often we would need an audit trail for historical data analytics and this method maintains a complete history of all data, which is useful for auditing and tracking changes over time. Common Use Cases for Append-Only Incremental Upload
Set up.
Detailed configuration and setup of Airbyte and incremental apppend + duplicate is already covered and I will be continuing from there.
The most import configuraiton is we need to identify the cursor similar to incremental apppend + duplicate, and setting the increment as append only.
Data Setup
We will be using a more simplyfied dataset with only 10 rows on a fresh database both at the Postgress source ans Snowflake target.
1. Full Load
Running the Airbyte Syc
Validating snowflake
We can see that there are 10 rows of the inital load which has been upoloaded into Snowflake.
2. Inserted new rows.
Running Sync for the new rows
Validating the syc of the 2 new rows.
We can see a new syn job in Snowflake with job id 38.
When we query for the row inserted with the job 38, we find the 2 rows which were loaded as expected.
We can also see there 10 original and 2 new rows, totalling 12 rows.
3. Update/Delete Rows
Data setup for Update/delete
Updating Postgress Source
Running the Sync job for the Update/delete rows
15 ## Validating the syc of the 2 new rows. We can see a new syn job in Snowflake with job id 39.
We can see the 2 rows. One for the update and one with the delete.
When we look the at count of rows, updates create new rows unlike the append+deduplicate patter.
- One row updating a anount values with ones
- One row to update the note as DELETE to indicate that the row is deleted.
Append-only incremental uploads are a simple and effective but this needs maintenance and validation regularly. By following best practices for efficient indexing, partitioning, data validation, backup and recovery, monitoring, and data retention we can run a stable extract and load.