Incremental Append Only Load with Airbyte Demo

Date:

image

An append-only incremental upload is a data loading pattern where new records are continually added (appended) to a dataset without modifying or deleting existing records. This approach is often used when historical data must be preserved for audit, analysis, or other purposes.

Data is preserved and every new batch of data is added to the existing dataset without altering or removing past data. This is a simple method and is straightforward to implement since it avoids the complexity of handling updates and deletions. It is ideal for systems where data grows over time and past records need to be preserved. Very often we would need an audit trail for historical data analytics and this method maintains a complete history of all data, which is useful for auditing and tracking changes over time. Common Use Cases for Append-Only Incremental Upload

Set up.

Detailed configuration and setup of Airbyte and incremental apppend + duplicate is already covered and I will be continuing from there.

The most import configuraiton is we need to identify the cursor similar to incremental apppend + duplicate, and setting the increment as append only.

image

Data Setup

We will be using a more simplyfied dataset with only 10 rows on a fresh database both at the Postgress source ans Snowflake target.

image

1. Full Load

Running the Airbyte Syc

image

Validating snowflake

We can see that there are 10 rows of the inital load which has been upoloaded into Snowflake.

image

image

2. Inserted new rows.

image

Running Sync for the new rows

image

Validating the syc of the 2 new rows.

We can see a new syn job in Snowflake with job id 38.

image

When we query for the row inserted with the job 38, we find the 2 rows which were loaded as expected.

image

We can also see there 10 original and 2 new rows, totalling 12 rows.

image

3. Update/Delete Rows

Data setup for Update/delete

image

Updating Postgress Source

image

Running the Sync job for the Update/delete rows

image

15 ## Validating the syc of the 2 new rows. We can see a new syn job in Snowflake with job id 39.

image

We can see the 2 rows. One for the update and one with the delete.

image

When we look the at count of rows, updates create new rows unlike the append+deduplicate patter.

image

  • One row updating a anount values with ones
  • One row to update the note as DELETE to indicate that the row is deleted.

image

Append-only incremental uploads are a simple and effective but this needs maintenance and validation regularly. By following best practices for efficient indexing, partitioning, data validation, backup and recovery, monitoring, and data retention we can run a stable extract and load.