The Challenge
Indexing billions of web pages typically requires hundreds of servers. Google uses millions. We do it with one.
Here's how.
The Secret: Not All Pages Are Equal
We use a tiered architecture:

- Tier 1 (Hot): top 100M pages, with full vector embeddings for instant search
- Tier 2 (Warm): billions of pages, text index only, keyword search
- Tier 3 (Cold): archive, metadata only, fetched on demand
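The routing decision behind the tiers can be sketched as a small dispatcher. This is an illustration only, not our actual service code; the names `Tier` and `routeQuery` are hypothetical:

```go
package main

import "fmt"

// Tier identifies which storage layer serves a query.
type Tier int

const (
	Hot  Tier = iota // top 100M pages: vector embeddings, instant search
	Warm             // billions of pages: text index only, keyword search
	Cold             // archive: metadata only, on-demand fetch
)

// routeQuery is a hypothetical dispatcher: semantic queries need
// embeddings, so they are only answered from the hot tier; keyword
// queries fall through to the warm tier; everything else goes cold.
func routeQuery(semantic, inHotTier bool) Tier {
	switch {
	case semantic && inHotTier:
		return Hot
	case !semantic:
		return Warm
	default:
		return Cold // no embedding available: fetch on demand
	}
}

func main() {
	fmt.Println(routeQuery(true, true))   // prints 0 (Hot)
	fmt.Println(routeQuery(false, false)) // prints 1 (Warm)
}
```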
Binary Quantization: 512x Compression
A standard embedding vector (1,024 dims, float32) takes 4,096 bytes. We apply Matryoshka truncation to keep only the first 64 dimensions, then binary quantization (1 bit per dimension): 64 bits = 8 bytes per vector.
That's 512x compression. 1 billion vectors = 8 GB instead of 4 TB.
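The whole trick fits in a few lines. A minimal sketch, assuming embeddings whose leading dimensions carry the most signal (the Matryoshka property) and a sign-based binarization rule; the function names are illustrative:

```go
package main

import (
	"fmt"
	"math/bits"
)

// quantize truncates a float32 embedding to its first 64 dimensions
// (Matryoshka-style truncation) and binarizes it: each dimension
// becomes one bit (1 if positive, 0 otherwise). The 64 bits pack
// into a single uint64, i.e. 8 bytes per vector.
func quantize(embedding []float32) uint64 {
	var code uint64
	for i := 0; i < 64 && i < len(embedding); i++ {
		if embedding[i] > 0 {
			code |= 1 << uint(i)
		}
	}
	return code
}

// hamming is the distance between two quantized vectors: the number
// of differing bits, computed with a single XOR plus popcount.
func hamming(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

func main() {
	v := make([]float32, 1024) // original: 1024 dims * 4 bytes = 4096 bytes
	v[0], v[3] = 0.7, -0.2
	q := quantize(v)
	fmt.Printf("code=%b, 8 bytes instead of %d\n", q, len(v)*4) // prints code=1, 8 bytes instead of 4096
	fmt.Println(hamming(q, 0))                                  // prints 1
}
```

Distance between quantized vectors is a Hamming distance, which modern CPUs compute in a couple of instructions per pair.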
Qdrant: One Database, Three Search Modes
We use Qdrant as our single database for everything:

- Vector similarity for semantic search
- Text payload indexes for keyword search
- Scalar quantization for RAM efficiency
No Elasticsearch. No Meilisearch. One database.
The Numbers
| Metric | Our Setup |
|--------|-----------|
| Server | AMD Ryzen 9, 128 GB RAM, 4 TB NVMe |
| Pages indexed | 2 billion |
| Keyword latency | 3 ms |
| Semantic latency | 300 ms |
| Hybrid latency | 350 ms (cached: 2 ms) |
| Monthly cost | $109 |
Embedding: Qwen3 via API
We use Qwen3-Embedding-0.6B via API. No local GPU needed. The embedding service runs remotely, and we cache results in Redis (70% hit rate).
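The cache-aside pattern around the embedding API looks roughly like this. In production the cache is Redis; here an in-memory map stands in so the sketch is self-contained, and all names (`cachedEmbedder`, `cacheKey`) are illustrative:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// embedFunc stands in for the remote embedding API call
// (Qwen3-Embedding-0.6B in our setup). Injecting it keeps the
// cache logic testable without network access.
type embedFunc func(text string) []float32

// cachedEmbedder is a cache-aside sketch: check the cache first,
// call the remote API only on a miss, then store the result.
type cachedEmbedder struct {
	cache map[string][]float32 // Redis in production
	embed embedFunc
	hits  int
	total int
}

func (c *cachedEmbedder) Embed(text string) []float32 {
	key := cacheKey(text)
	c.total++
	if v, ok := c.cache[key]; ok {
		c.hits++ // at our traffic mix this path fires ~70% of the time
		return v
	}
	v := c.embed(text) // cache miss: one remote API call
	c.cache[key] = v
	return v
}

// cacheKey hashes the input text so keys stay short and uniform.
func cacheKey(text string) string {
	h := sha256.Sum256([]byte(text))
	return "emb:" + hex.EncodeToString(h[:8])
}

func main() {
	calls := 0
	c := &cachedEmbedder{
		cache: map[string][]float32{},
		embed: func(string) []float32 { calls++; return []float32{0.1, 0.2} },
	}
	c.Embed("hello")
	c.Embed("hello") // served from cache; no second API call
	fmt.Println(calls) // prints 1
}
```

With a 70% hit rate, seven out of ten queries never leave the box.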
Common Crawl: Free Data
3.5 billion pages, pre-crawled, free to download. We stream WET files (the extracted plain text), filter for quality, and index directly into Qdrant. No intermediate disk storage for the crawl data: a pure streaming pipeline.
Open Source
Our entire stack is open source: Go microservices, Qdrant, NATS, gateway-a, Nuxt frontend. Built for AI agents from day one.