Ravo Research
Engineering

Rebuilding our indexing pipeline for daily OpenAlex syncs.

What broke at a few million records, and the queue design that fixed it.

By the Ravo team · Feb 8, 2026 · 9 min read

Daily syncs sound simple until the dataset is millions of records and every upstream change has to propagate without downtime.

Idempotent jobs, narrow batches

We rebuilt the pipeline around idempotent jobs and narrow batches with explicit checkpoints, so a failed sync resumes from its last checkpoint instead of restarting from scratch. Throughput tripled, and the on-call pages stopped.
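The post doesn't show the pipeline's code, so here is a minimal sketch of the pattern, not Ravo's implementation. It makes several assumptions. Records arrive in a stable order (for example, sorted by OpenAlex work ID), so an offset can serve as a resume point. The checkpoint lives in a local file, and a plain dict stands in for the search index. The names `run_sync`, `upsert_batch`, the checkpoint helpers, and the batch size of 500 are all hypothetical.

```python
import json
import os
import tempfile
from typing import Callable

def load_checkpoint(path: str) -> int:
    """Return how many records are already committed (0 if there is no checkpoint yet)."""
    try:
        with open(path) as f:
            return json.load(f)["offset"]
    except FileNotFoundError:
        return 0

def save_checkpoint(path: str, offset: int) -> None:
    """Write the checkpoint atomically: temp file in the same directory, then os.replace."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"offset": offset}, f)
    os.replace(tmp_path, path)  # a crash never leaves a half-written checkpoint

def upsert_batch(index: dict, batch: list[dict]) -> None:
    """Idempotent write (hypothetical index): keyed on record ID, so replaying a batch is a no-op."""
    for record in batch:
        index[record["id"]] = record

def run_sync(records: list[dict], write_batch: Callable[[list[dict]], None],
             checkpoint_path: str, batch_size: int = 500) -> int:
    """Process records in narrow batches, advancing the checkpoint only after each write succeeds.

    Assumes `records` is in a stable order, so the offset identifies where to resume.
    """
    offset = load_checkpoint(checkpoint_path)
    while offset < len(records):
        batch = records[offset:offset + batch_size]
        write_batch(batch)  # if this raises, the checkpoint still points at this batch
        offset += len(batch)
        save_checkpoint(checkpoint_path, offset)
    return offset

if __name__ == "__main__":
    records = [{"id": f"W{i}", "title": f"work {i}"} for i in range(1_200)]
    index: dict = {}
    ckpt = os.path.join(tempfile.mkdtemp(), "sync.ckpt")
    calls = {"n": 0}

    def flaky_write(batch: list[dict]) -> None:
        calls["n"] += 1
        if calls["n"] == 2:
            raise RuntimeError("simulated crash on the second batch")
        upsert_batch(index, batch)

    try:
        run_sync(records, flaky_write, ckpt)
    except RuntimeError:
        pass
    print(load_checkpoint(ckpt))  # 500
    final = run_sync(records, lambda b: upsert_batch(index, b), ckpt)
    print(final, len(index))  # 1200 1200
```

The checkpoint advances only after a batch is written. A crash between the write and the checkpoint therefore replays at most one batch, and the keyed upsert makes that replay harmless. Keeping batches narrow bounds how much work any single replay repeats.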