## Overview
pg_textsearch benchmarks measure full-text search performance using real-world datasets.
Benchmarks run nightly on the main branch and on-demand for feature branches
via GitHub Actions.
## Datasets

### MS MARCO (Primary)
The MS MARCO passage ranking dataset is our primary benchmark. It contains 8.8 million passages extracted from web documents, paired with real search queries from Bing users.
| Metric | Value |
|---|---|
| Documents | 8,841,823 |
| Average document length | ~35 tokens |
| Test queries | 800 (100 per token bucket) |
| Query token buckets | 1, 2, 3, 4, 5, 6, 7, 8+ |
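Presumably the passages are loaded into a simple id/text table before indexing. Below is a minimal sketch, assuming the standard `collection.tsv` layout (passage id, passage text) and a hypothetical table name; the real `load.sql` may differ:

```sql
-- Hypothetical schema; benchmarks/datasets/msmarco/load.sql may use
-- different names and types.
CREATE TABLE msmarco_passages (
    passage_id integer PRIMARY KEY,
    passage    text NOT NULL
);

-- Bulk load inside psql (tab-separated input):
-- \copy msmarco_passages FROM 'collection.tsv' WITH (FORMAT text)
```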
### Wikipedia
Wikipedia article abstracts provide longer documents for testing scalability. Available in 10K, 100K, 1M, and full (~6M) configurations.
### Cranfield
The classic Cranfield collection (1,400 documents) is used for quick regression testing.
## Environment

### GitHub Actions Runner
| Component | Specification |
|---|---|
| Platform | Ubuntu 24.04 (ubuntu-latest) |
| CPU | 2-core AMD EPYC (GitHub-hosted) |
| Memory | 7 GB RAM |
| Storage | ~30 GB available after cleanup |
| Postgres | PostgreSQL 17 |
### Postgres Configuration

```
shared_buffers = 4GB
maintenance_work_mem = 1GB
work_mem = 256MB
effective_cache_size = 6GB
random_page_cost = 1.1
```
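One way to apply these settings before a run is sketched below; the CI workflow may instead write them into `postgresql.conf` before starting the server:

```sql
ALTER SYSTEM SET shared_buffers = '4GB';         -- takes effect after a restart
ALTER SYSTEM SET maintenance_work_mem = '1GB';
ALTER SYSTEM SET work_mem = '256MB';
ALTER SYSTEM SET effective_cache_size = '6GB';
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();                         -- picks up the reloadable settings
```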
## Metrics Collected

### Index Build
- Build time: Wall-clock time to create the BM25 index
- Index size: On-disk size of the index (from `pg_relation_size`)
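Both numbers can be captured with stock psql and Postgres functions. A sketch follows; the index access method and column are placeholders, since the actual BM25 index DDL is defined by the extension:

```sql
-- Build time: with \timing on, psql prints wall-clock time per statement.
\timing on
CREATE INDEX msmarco_bm25_idx ON msmarco_passages USING bm25 (passage);  -- placeholder DDL

-- Index size: on-disk size of the index relation, via pg_relation_size.
SELECT pg_size_pretty(pg_relation_size('msmarco_bm25_idx'));
```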
### Query Latency
Queries are grouped by token count (1-8+). For each bucket, we run 100 queries and report:
- p50: Median latency
- p95: 95th percentile latency
- p99: 99th percentile latency
- avg: Mean latency
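If the per-query timings are collected into a table, these statistics fall out of Postgres's ordered-set aggregates. A sketch against a hypothetical `query_timings(bucket, latency_ms)` table:

```sql
-- query_timings is hypothetical: one row per timed query execution,
-- where bucket is the token-count group (1..7, with 8 standing for "8+").
SELECT bucket,
       percentile_cont(0.50) WITHIN GROUP (ORDER BY latency_ms) AS p50_ms,
       percentile_cont(0.95) WITHIN GROUP (ORDER BY latency_ms) AS p95_ms,
       percentile_cont(0.99) WITHIN GROUP (ORDER BY latency_ms) AS p99_ms,
       avg(latency_ms)                                          AS avg_ms
FROM query_timings
GROUP BY bucket
ORDER BY bucket;
```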
### Throughput
Total time to execute all 800 test queries sequentially, reported as average milliseconds per query.
## Benchmark Procedure
- Setup: Start fresh Postgres instance, create extension
- Load: Bulk load dataset using `\copy`
- Index: Create BM25 index, force spill to segments
- Warmup: Run each query once to warm caches
- Measure: Run timed queries, collect statistics
- Report: Extract metrics, publish to dashboard
All test queries use `LIMIT 10` to simulate typical search result pages. The Block-Max WAND optimization is enabled by default.
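Put together, a single run reduces to roughly the following psql session. This is a sketch, not the runner script itself: the index DDL and the ranking operator are placeholders for whatever `pg_textsearch` actually defines, and the extension-specific step that forces a spill to segments is omitted:

```sql
-- 1. Setup: fresh instance, create the extension.
CREATE EXTENSION pg_textsearch;

-- 2. Load: bulk load via \copy (see the Datasets sketch above).

-- 3. Index: build the BM25 index (placeholder DDL, as in the Index Build sketch).
CREATE INDEX msmarco_bm25_idx ON msmarco_passages USING bm25 (passage);

-- 4./5. Warmup, then measure: each test query is a top-10 search, run once
-- to warm caches and again under \timing. The ORDER BY uses a hypothetical
-- ranking operator standing in for the extension's real query syntax.
\timing on
SELECT passage_id
FROM msmarco_passages
ORDER BY passage <&> 'example query terms'
LIMIT 10;
```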
## System X Comparison
We run identical benchmarks against System X (a competitive Postgres BM25 extension) to provide context for our performance numbers. Both extensions:
- Run on the same GitHub Actions runner
- Use identical Postgres configuration
- Process the same dataset and queries
- Use their default configurations
See the detailed comparison for the latest results.
## Reproducibility

All benchmark code is in the `benchmarks/` directory. To run locally:

```bash
# MS MARCO benchmark
cd benchmarks/datasets/msmarco
./download.sh full
psql -f load.sql

# Or use the runner script
./benchmarks/runner/run_benchmark.sh msmarco
```
## Limitations
- GitHub Actions runners have variable performance; expect ~10% variance
- Single-threaded query execution (no concurrent load testing)
- Cold start not measured (cache warmup before timing)
- Network latency not included (local socket connection)