Why we built this:
Pradnesh and I have been building tooling for data engineers for three years: the dbt Power User and Datamates VS Code extensions, with a combined 750k+ installs, running against real Fortune 500 data estates. The pattern we kept seeing: general-purpose agents can write SQL, but they have no model of what the SQL does. No lineage. No schema context. No understanding of what's in a dbt manifest. That's not a prompt problem; it's a missing-tool-layer problem.
The numbers make it concrete: 27–33% of AI-generated SQL references tables that don't exist. 78% of errors are silent wrong joins: queries that compile, run, and return confidently incorrect data. One team got a $5k bill from a single Cortex AI query their resource monitors never caught. This isn't a model quality problem. It's a missing harness problem, and we proved it.
Claude Code and Cursor are genuinely good for software engineering. But when you point them at a data stack, they hallucinate column names, ignore partition keys, and have no concept of data contracts or quality rules in your models. From building tooling against real data estates, we knew exactly what was missing at the tool level.
We forked from OpenCode for the agentic scaffolding. What we added is the entire data layer: compiled Rust engines, purpose-built skills, and the harness that wires them together.
What Altimate Code does that general agents can't:
- Live column-level lineage: traces any column through joins, CTEs, and subqueries deterministically. 100% edge match on 500K benchmark queries at 0.26ms/query, and not from a cached manifest. Manifests go stale within hours on active pipelines, which makes cached lineage unreliable for anything agentic.
- SQL anti-pattern detection: 26 rules, zero false positives, 0.48ms/query.
- Local SQL validation: interrogates your schema catalog in 2ms without touching your warehouse. Wrong table? Caught with a fuzzy-matched fix suggestion before the LLM goes into a fix loop. That's 10ms for 5 fix cycles vs. 2.5 minutes of Snowflake round-trips.
- Purpose-built skills for dbt development, testing, troubleshooting, documentation, SQL optimization, and migration.
- Three agent modes (Builder, Analyst, Planner) with compiled permission enforcement. Analyst mode enforces read-only at the engine level, not just in the prompt. That distinction is what makes it safe to run against production.
- Persistent memory: cross-session, with two scopes (global preferences and project knowledge), versioned in git and inherited by the team on git pull.
- PII detection, SQL injection scanning, and permission enforcement, all at the engine level, not the prompt.
- 10 data connectors: Snowflake, BigQuery, Databricks, PostgreSQL, Redshift, DuckDB, MySQL, SQL Server, and more.
- Local tracer: every LLM call, tool invocation, and warehouse credit traced locally. No external services.
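To make the local-validation bullet concrete, here's a toy sketch of the idea: check a referenced table against an in-memory catalog and, on a miss, hand the LLM a fuzzy-matched suggestion instead of letting it discover the error via a warehouse round-trip. Everything here (the catalog contents, the function name, the use of Python's `difflib`) is a hypothetical illustration, not Altimate's actual Rust engine:

```python
import difflib

# Hypothetical in-memory schema catalog (table -> columns); a real engine
# would load this from the warehouse's information schema.
CATALOG = {
    "orders": ["order_id", "customer_id", "amount"],
    "customers": ["customer_id", "name", "region"],
}

def validate_table(name):
    """Return (is_valid, suggestion): a catalog hit, or the closest
    fuzzy match to feed back to the LLM, with no warehouse call."""
    if name in CATALOG:
        return True, None
    close = difflib.get_close_matches(name, CATALOG, n=1, cutoff=0.6)
    return False, (close[0] if close else None)

print(validate_table("order"))  # typo'd table name -> (False, 'orders')
```

Because the whole loop is local, a retry costs microseconds rather than a network round-trip, which is where the 10ms-vs-2.5-minutes gap in the bullet above comes from.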
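And a toy illustration of what "read-only at the engine level" means: every statement is gated before it ever reaches the warehouse, regardless of what the prompt says. This keyword check is a deliberately naive stand-in for a real SQL parser (it would miss, say, a CTE wrapping a DELETE), and none of these names come from Altimate's code:

```python
# Statement starters a read-only mode might allow; everything else is
# rejected before any warehouse connection is opened.
READ_ONLY_STARTERS = {"select", "with", "show", "describe", "explain"}

def analyst_mode_allows(sql):
    """Naive engine-level gate: pass only read-style statements."""
    stripped = sql.lstrip()
    if not stripped:
        return False
    first_keyword = stripped.split(None, 1)[0].strip(";").lower()
    return first_keyword in READ_ONLY_STARTERS

print(analyst_mode_allows("SELECT * FROM orders"))  # True
print(analyst_mode_allows("DROP TABLE orders"))     # False
```

The point of the design is that this check lives in compiled code the model can't talk its way around, unlike a "please don't modify anything" instruction in the system prompt.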
On the benchmarks:
We ran ADE-bench, the open standard from dbt Labs:
Altimate Code (Sonnet 4.6) → 74.4%
Cortex Code (Snowflake, Opus 4.6) → 65%
Claude Code baseline (Sonnet 4.6) → ~40%
A cheaper model with compiled tools outperformed a more expensive model without them. The gap is the harness. Full methodology is in the launch post and linked from the README.
To try it:
- `npm install -g @altimateai/altimate-code`
- `altimate`
- `altimate /discover`

`/discover` interrogates your dbt projects, warehouse connections, and installed tools automatically.
GitHub: https://github.com/AltimateAI/altimate-code · Docs: [altimate-code.sh](http://altimate-code.sh)
There's a `/feedback` command that files a GitHub issue directly. If something breaks or doesn't behave the way you'd expect, use that or reply here. I'll be in this thread.