Blog
Technical deep-dives, case studies, and field notes from working with customers on cloud and database engineering.
-
How AI Actually Helps You Fix PostgreSQL Performance Problems (and Where It Lies)
AI won’t replace your EXPLAIN ANALYZE instincts — but grounded in real stats, it compresses the diagnose-to-fix loop from an hour to minutes. A DBA’s field guide to where it helps and where it lies.
-
AI Agent Evals: Production Readiness Guide
Benchmarks tell you whether an agent can solve a task. Production evals tell you whether it will behave safely when the task gets messy.
-
Build the Eval System: Three Graders, 38 Tasks, and the $3-8 Safety Net (Part 2 of 2)
The complete practitioner’s guide: three grader types ($0/$0/$$), four task patterns, CI architecture, three real regressions caught, and a 4-week playbook — all for $3-8 per eval run.
-
AI Agent Evals: Why SWE-bench Isn't Enough Before Production (Part 1 of 2)
Your AI agent scores 78% on SWE-bench. It also just told a developer it deployed infrastructure — without calling a single tool. Here’s what benchmarks miss, and the $0 eval that catches it.
-
Spend Fewer Tokens, Get Better Code: A Context Engineering Guide for AI Code Assistants (Part 1 of 3)
Anthropic cut tool context by 85%. Accuracy improved from 49% to 74%. Five context engineering practices that make your AI code assistant produce better output — while spending fewer tokens.
-
Invisible Compound Savings: Caching, Workflow Discipline, and the Habits That Add Up (Part 2 of 3)
90% of your AI prompt context repeats across every request. Prompt caching gives you 90% off. The retry tax costs you 1.4x. Here is how structural habits compound into invisible savings.
-
The 120x Spread: Understanding What You Pay For and When It Matters (Part 3 of 3)
The cheapest AI model costs 0.25x. The most expensive costs 30x. A three-tier task taxonomy for matching model capability to task complexity, plus the complete three-layer optimization playbook.
-
PostgreSQL EXPLAIN BUFFERS: How We Cut Checkout Latency 96%
A real-world e-commerce case study: one word added to EXPLAIN ANALYZE diagnosed a checkout regression from 50ms to 1.2s that three days of network debugging missed.