Solve technical problems, increase efficiency and productivity, and improve systems
We need repeatable processes and patterns for designing, building, and maintaining systems that enable the architecture to collect, store, and process large volumes of data. These data pipelines must ensure that data is accessible, reliable, and efficiently available for analysis and decision-making. My goal is to demonstrate how tools and technologies can implement these processes and patterns throughout the data lifecycle, from source to destination, while maintaining quality, scalability, and security.
10 Terraform best practices - Simple enough that every project should include them
Infrastructure as Code, Terraform, Azure
Iceberg on AWS: Part 3 - Glue Spark Evolves Schema
Lakehouse, Glue Spark, Iceberg, AWS
Column Transformations for Staging
Lakehouse, DBT, Snowflake, AWS
Slowly Changing Dimensions - Type 2 with Glue, Pyspark and Iceberg
Data Modeling, SCD2, AWS
Incremental (Append and Deduplicate) Load with Airbyte Demo
, Delta, Airbyte
Incremental Append-Only Load with Airbyte Demo
, Delta, Airbyte
Parquet: Best practices demonstration
An often overlooked feature of Parquet is its support for interoperability, which is key to enterprise data platforms that serve different tools and systems, facilitating data exchange and integration. This is my take on Parquet best practices, and I have used python-pyarrow to demonstrate them.
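Below is a minimal sketch of the kind of practice the demonstration covers, written with PyArrow: an explicit schema, an explicit compression codec, bounded row groups, and column pruning on read. The file name, column names, and values are illustrative only and not taken from the article.

```python
from datetime import datetime, timezone
from decimal import Decimal

import pyarrow as pa
import pyarrow.parquet as pq

# Declare an explicit schema instead of relying on type inference:
# stable, documented types are what make Parquet files portable across engines.
schema = pa.schema([
    pa.field("order_id", pa.int64(), nullable=False),
    pa.field("customer_id", pa.int64()),
    pa.field("order_ts", pa.timestamp("us", tz="UTC")),
    pa.field("amount", pa.decimal128(12, 2)),
])

table = pa.table(
    {
        "order_id": [1, 2, 3],
        "customer_id": [10, 11, 10],
        "order_ts": [
            datetime(2024, 1, 1, tzinfo=timezone.utc),
            datetime(2024, 1, 2, tzinfo=timezone.utc),
            datetime(2024, 1, 3, tzinfo=timezone.utc),
        ],
        "amount": [Decimal("19.99"), Decimal("5.00"), Decimal("42.50")],
    },
    schema=schema,
)

# Write with explicit compression, dictionary encoding, and a bounded row
# group size so downstream readers can prune and parallelise effectively.
pq.write_table(
    table,
    "orders.parquet",
    compression="zstd",
    use_dictionary=True,
    row_group_size=128_000,
)

# Readers only pay for the columns they ask for (column pruning).
subset = pq.read_table("orders.parquet", columns=["order_id", "amount"])
print(subset.schema)
```

Writing with an explicit schema up front also makes schema drift visible at write time rather than surfacing later as subtle type mismatches in consuming tools.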
Cloud Storage: Best practices
- Bucket names: I am split between descriptive names that clearly signal the intent of a bucket and its files, and the security concerns that such names can raise. If there is a need to hide the intent of buckets from potential attackers, we would need to manage and enforce catalogs instead. However, I have seen the worst of both worlds, where the naming gives away enough while the buckets are not cataloged. I would recommend a naming convention, or rules that catalog bucket names, together with audits to ensure compliance; a rough sketch of such an audit follows below.
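As an illustration of the audit idea above, here is a minimal sketch assuming S3 buckets listed via boto3. The naming pattern, the layer names, and the audit_bucket_names helper are hypothetical choices for the example, not an existing standard or API.

```python
import re

import boto3

# Hypothetical convention: <org>-<team>-<layer>-<purpose>, e.g. "acme-sales-raw-orders".
NAMING_PATTERN = re.compile(r"^[a-z0-9]+-[a-z0-9]+-(raw|staging|curated)-[a-z0-9-]+$")


def audit_bucket_names() -> list[str]:
    """Return the names of buckets that do not match the agreed convention."""
    s3 = boto3.client("s3")
    buckets = s3.list_buckets()["Buckets"]
    return [b["Name"] for b in buckets if not NAMING_PATTERN.match(b["Name"])]


if __name__ == "__main__":
    for name in audit_bucket_names():
        print(f"non-compliant bucket name: {name}")
```

A check like this can run on a schedule, with non-compliant names reported to the team that owns the catalog.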
Snowflake Implementation Notes
Virtual Warehouses
Data Engineering Project Initiation Checklist
Some upfront work is required to ensure the success of data engineering projects. I have used this checklist to provide a framework for collaborating with multiple stakeholders to define clear requirements and designs.