The Challenge
Indexing billions of web pages typically requires hundreds of servers. Google uses millions. We do it with one.
Here's how.
The Secret: Not All Pages Are Equal
We use a tiered architecture:

- Tier 1 (Hot): top 100M pages, with full vector embeddings for instant search
- Tier 2 (Warm): billions of pages, text index only, keyword search
- Tier 3 (Cold): archive, metadata only, fetched on demand
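The routing decision behind the tiers can be sketched as a small dispatcher. This is an illustration only, not our actual service code; the names `Tier` and `routeQuery` are hypothetical:

```go
package main

import "fmt"

// Tier identifies which storage layer serves a query.
type Tier int

const (
	Hot  Tier = iota // top 100M pages: vector embeddings, instant search
	Warm             // billions of pages: text index only, keyword search
	Cold             // archive: metadata only, on-demand fetch
)

// routeQuery is a hypothetical dispatcher: semantic queries need
// embeddings, so they are only answered from the hot tier; keyword
// queries fall through to the warm tier; everything else goes cold.
func routeQuery(semantic, inHotTier bool) Tier {
	switch {
	case semantic && inHotTier:
		return Hot
	case !semantic:
		return Warm
	default:
		return Cold // no embedding available: fetch on demand
	}
}

func main() {
	fmt.Println(routeQuery(true, true))   // prints 0 (Hot)
	fmt.Println(routeQuery(false, false)) // prints 1 (Warm)
}
```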
Binary Quantization: 512x Compression
A standard embedding vector (1,024 dims, float32) takes 4,096 bytes. We apply Matryoshka truncation to keep only the first 64 dimensions, then binary quantization (1 bit per dimension): 64 bits = 8 bytes per vector.
That's 512x compression. 1 billion vectors = 8 GB instead of 4 TB.
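The whole trick fits in a few lines. A minimal sketch, assuming embeddings whose leading dimensions carry the most signal (the Matryoshka property) and a sign-based binarization rule; the function names are illustrative:

```go
package main

import (
	"fmt"
	"math/bits"
)

// quantize truncates a float32 embedding to its first 64 dimensions
// (Matryoshka-style truncation) and binarizes it: each dimension
// becomes one bit (1 if positive, 0 otherwise). The 64 bits pack
// into a single uint64, i.e. 8 bytes per vector.
func quantize(embedding []float32) uint64 {
	var code uint64
	for i := 0; i < 64 && i < len(embedding); i++ {
		if embedding[i] > 0 {
			code |= 1 << uint(i)
		}
	}
	return code
}

// hamming is the distance between two quantized vectors: the number
// of differing bits, computed with a single XOR plus popcount.
func hamming(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

func main() {
	v := make([]float32, 1024) // original: 1024 dims * 4 bytes = 4096 bytes
	v[0], v[3] = 0.7, -0.2
	q := quantize(v)
	fmt.Printf("code=%b, 8 bytes instead of %d\n", q, len(v)*4) // prints code=1, 8 bytes instead of 4096
	fmt.Println(hamming(q, 0))                                  // prints 1
}
```

Distance between quantized vectors is a Hamming distance, which modern CPUs compute in a couple of instructions per pair.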
Qdrant: One Database, Three Search Modes
We use Qdrant as our single database for everything:

- Vector similarity for semantic search
- Text payload indexes for keyword search
- Scalar quantization for RAM efficiency
No Elasticsearch. No Meilisearch. One database.
The Numbers
| Metric | Our Setup |
|--------|-----------|
| Server | AMD Ryzen 9, 128 GB RAM, 4 TB NVMe |
| Pages indexed | 2 billion |
| Keyword latency | 3 ms |
| Semantic latency | 300 ms |
| Hybrid latency | 350 ms (cached: 2 ms) |
| Monthly cost | $109 |
Embedding: Qwen3 via API
We use Qwen3-Embedding-0.6B via API. No local GPU needed. The embedding service runs remotely, and we cache results in Redis (70% hit rate).
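The cache-aside pattern around the embedding API looks roughly like this. In production the cache is Redis; here an in-memory map stands in so the sketch is self-contained, and all names (`cachedEmbedder`, `cacheKey`) are illustrative:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// embedFunc stands in for the remote embedding API call
// (Qwen3-Embedding-0.6B in our setup). Injecting it keeps the
// cache logic testable without network access.
type embedFunc func(text string) []float32

// cachedEmbedder is a cache-aside sketch: check the cache first,
// call the remote API only on a miss, then store the result.
type cachedEmbedder struct {
	cache map[string][]float32 // Redis in production
	embed embedFunc
	hits  int
	total int
}

func (c *cachedEmbedder) Embed(text string) []float32 {
	key := cacheKey(text)
	c.total++
	if v, ok := c.cache[key]; ok {
		c.hits++ // at our traffic mix this path fires ~70% of the time
		return v
	}
	v := c.embed(text) // cache miss: one remote API call
	c.cache[key] = v
	return v
}

// cacheKey hashes the input text so keys stay short and uniform.
func cacheKey(text string) string {
	h := sha256.Sum256([]byte(text))
	return "emb:" + hex.EncodeToString(h[:8])
}

func main() {
	calls := 0
	c := &cachedEmbedder{
		cache: map[string][]float32{},
		embed: func(string) []float32 { calls++; return []float32{0.1, 0.2} },
	}
	c.Embed("hello")
	c.Embed("hello") // served from cache; no second API call
	fmt.Println(calls) // prints 1
}
```

With a 70% hit rate, seven out of ten queries never leave the box.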
Common Crawl: Free Data
3.5 billion pages, pre-crawled, free to download. We stream WET files (the extracted plain text), filter for quality, and index directly into Qdrant. No intermediate disk storage for the crawl data: a pure streaming pipeline.
Open Source
Our entire stack is open source: Go microservices, Qdrant, NATS, gateway-a, Nuxt frontend. Built for AI agents from day one.