Palantir Simple Ingest from Postgres

Palantir, Palantir, 2026

Some initial thoughts

  • We will use Python (Lightweight) for most ingestion use cases. It is simpler, starts up faster (no Spark overhead), and handles the vast majority of table sizes. This is what we will build now with @external_systems + psycopg2.
  • We will use PySpark only when we need Spark’s distributed JDBC reader to parallelize reads across partitions for very large tables. Spark can split a single table read into multiple parallel queries using a partition column. We will try to demonstrate how to achieve this in a later use case.
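To make the partition-column idea concrete, here is a minimal sketch of how one table read can be split into several non-overlapping queries. It mirrors the general idea behind Spark's JDBC options partitionColumn, lowerBound, upperBound and numPartitions, but it is not Spark's own code. The function name partition_predicates and the table name my_table are hypothetical, and the sketch assumes an integer partition column.

```python
from typing import List


def partition_predicates(column: str, lower: int, upper: int, num_partitions: int) -> List[str]:
    """Split the range [lower, upper) on `column` into one WHERE predicate per partition.

    The first partition also picks up NULLs and values below `lower`; the last
    one picks up values at or above `upper`, so no row is skipped or read twice.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if upper <= lower:
        raise ValueError("upper must be greater than lower")

    # Never create more partitions than there are distinct integer values.
    num_partitions = min(num_partitions, upper - lower)
    if num_partitions == 1:
        return ["1=1"]  # a single query reads the whole table

    stride = (upper - lower) // num_partitions
    predicates = []
    for i in range(num_partitions):
        start = lower + i * stride
        end = start + stride
        if i == 0:
            # Open-ended on the low side, and the only partition that takes NULLs.
            predicates.append(f"{column} < {end} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Open-ended on the high side.
            predicates.append(f"{column} >= {start}")
        else:
            predicates.append(f"{column} >= {start} AND {column} < {end}")
    return predicates


if __name__ == "__main__":
    # `my_table` and `id` are placeholder names for illustration only.
    for predicate in partition_predicates("id", 0, 100, 4):
        print(f"SELECT * FROM my_table WHERE {predicate}")
```

Each predicate can then be run as its own query in parallel. The bounds only determine the stride; they do not filter rows, which is why the first and last partitions are left open-ended.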

We have already covered how to create a data connection.

Code Repository

Code Repository in Foundry

  • A Code Repository is a Git-backed project that contains the logic for data transformations, functions, models, or applications. It’s the fundamental unit where all “code” lives in Foundry.

  • What It Contains

    code-repository/
    ├── transforms-python/
    │   ├── conda_recipe/
    │   │   └── meta.yaml              ← Dependencies (packages)
    │   └── src/
    │       ├── myproject/
    │       │   ├── pipeline.py         ← Registers transforms for discovery
    │       │   └── datasets/
    │       │       └── my_transform.py ← Your transform logic
    │       ├── setup.py                ← Entry point declaration
    │       └── test/                   ← Unit tests
    ├── build.gradle                    ← Build configuration
    └── ci.yml                          ← CI pipeline definition
    

Give it a name and choose the folder where it should reside.