Engineering
Rebuilding our indexing pipeline for daily OpenAlex syncs.
What broke at a few million records, and the queue design that fixed it.
By the Ravo team · Feb 8, 2026 · 9 min read
Daily syncs sound simple until the dataset is millions of records and every upstream change has to propagate without downtime.
Idempotent jobs, narrow batches
We rebuilt around idempotent jobs and narrow batches with explicit checkpoints, so a failed sync resumes instead of restarting. Throughput tripled and the on-call pages stopped.
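To make the pattern concrete, here is a minimal Python sketch of a checkpointed sync, under stated assumptions. The names (`InMemoryIndex`, `run_sync`, the `next_batch` checkpoint key), the in-memory store, and the `fail_on_batch` switch are all hypothetical and exist only for illustration; this is not the production pipeline. It shows the two properties described above: each batch is an upsert keyed by record id, so replaying it is harmless, and the checkpoint advances only after a batch has been applied, so a rerun picks up at the first unfinished batch.

```python
from typing import Dict, Iterator, List, Optional, Sequence

Record = Dict[str, str]


def batches(records: Sequence[Record], size: int) -> Iterator[List[Record]]:
    """Split records into narrow, fixed-size batches in a stable order."""
    for start in range(0, len(records), size):
        yield list(records[start:start + size])


class InMemoryIndex:
    """Hypothetical stand-in for the index; upserts are keyed by record id."""

    def __init__(self) -> None:
        self.docs: Dict[str, Record] = {}

    def upsert(self, batch: List[Record]) -> None:
        # Writing the same record twice leaves the same state (idempotent).
        for record in batch:
            self.docs[record["id"]] = record


def run_sync(
    records: Sequence[Record],
    index: InMemoryIndex,
    checkpoint: Dict[str, int],
    batch_size: int = 2,
    fail_on_batch: Optional[int] = None,  # assumption: simulates a crash
) -> None:
    """Apply batches starting from the checkpoint, advancing it per batch."""
    start = checkpoint.get("next_batch", 0)
    for number, batch in enumerate(batches(records, batch_size)):
        if number < start:
            continue  # already applied in an earlier run
        if number == fail_on_batch:
            raise RuntimeError(f"simulated failure in batch {number}")
        index.upsert(batch)
        # Record progress only after the batch has been applied.
        checkpoint["next_batch"] = number + 1
```

In this sketch, a run that fails partway leaves `checkpoint["next_batch"]` pointing at the failed batch, and calling `run_sync` again with the same checkpoint skips the finished batches and continues from there. A real pipeline would persist the checkpoint durably rather than keep it in a dict, and resuming this way assumes the batch boundaries are the same on every run.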