Benchmark Methodology

Overview

pg_textsearch benchmarks measure full-text search performance using real-world datasets. Benchmarks run nightly on the main branch and on-demand for feature branches via GitHub Actions.

Datasets

MS MARCO (Primary)

The MS MARCO passage ranking dataset is our primary benchmark. It contains 8.8 million passages from web documents with real search queries from Bing users.

Metric                    Value
Documents                 8,841,823
Average document length   ~35 tokens
Test queries              800 (100 per token bucket)
Query token buckets       1, 2, 3, 4, 5, 6, 7, 8+

MS MARCO v2

The MS MARCO v2 passage ranking dataset scales the primary benchmark to 138 million passages for large-corpus testing.

Metric                    Value
Documents                 138,364,198
Test queries              691 (sampled from dev set)
Query token buckets       1, 2, 3, 4, 5, 6, 7, 8+

Wikipedia

Wikipedia article abstracts provide longer documents for testing scalability. Available in 10K, 100K, 1M, and full (~6M) configurations.

Cranfield

The classic Cranfield collection (1,400 documents) is used for quick regression testing.

Environment

GitHub Actions Runner

Component     Specification
Platform      Ubuntu 24.04 (ubuntu-latest)
CPU           2-core AMD EPYC (GitHub-hosted)
Memory        7 GB RAM
Storage       ~30 GB available after cleanup
Postgres      PostgreSQL 17

Postgres Configuration

shared_buffers = 4GB
maintenance_work_mem = 1GB
work_mem = 256MB
effective_cache_size = 6GB
random_page_cost = 1.1

Metrics Collected

Index Build

Query Latency

Queries are grouped by token count into buckets (1 through 8+). For each bucket, we run 100 queries and report per-bucket latency statistics, including the p50 used by the weighted-average metric.
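
As an illustration, per-bucket statistics can be computed with a small stdlib helper. This is a sketch: p50 is the only statistic the document explicitly ties to reporting (via the weighted-average metric); the other percentiles shown here are assumptions.

```python
import statistics

def bucket_stats(latencies_ms):
    """Summarize one bucket of per-query latencies (in milliseconds)."""
    # quantiles(n=100) returns 99 cut points; index 94 ~ p95, index 98 ~ p99
    qs = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "p50": statistics.median(latencies_ms),
        "p95": qs[94],
        "p99": qs[98],
        "mean": statistics.fmean(latencies_ms),
    }

# Example: 100 simulated latencies for one bucket, 5.0 .. 14.9 ms
lat = [5 + 0.1 * i for i in range(100)]
print(bucket_stats(lat))
```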

Throughput

Total time to execute all test queries sequentially, reported as average milliseconds per query.
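
For example, with illustrative (made-up) numbers:

```python
# Hypothetical wall-clock time for one sequential pass over all test queries
total_seconds = 12.4
n_queries = 800          # MS MARCO test query count from the table above

avg_ms_per_query = total_seconds * 1000 / n_queries
print(round(avg_ms_per_query, 2))
```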

Weighted-Average Latency

A single summary metric that weights per-bucket p50 latencies by the query-length distribution observed in MS MARCO v1 (1,010,905 queries after tokenization with to_tsvector). This reflects realistic workload performance more accurately than an equal-weight average across buckets.

Token bucket   Queries    Weight
1              35,638     3.5%
2              165,033    16.3%
3              304,887    30.2%
4              264,177    26.1%
5              143,765    14.2%
6              59,558     5.9%
7              22,595     2.2%
8+             15,252     1.5%

Formula: weighted_p50 = Σ(bucket_p50 × weight) / Σ(weight)
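
The formula can be computed directly from the table above. In this sketch, only the bucket weights come from the document; the per-bucket p50 latencies are made-up placeholders:

```python
# Bucket weights from the observed MS MARCO v1 query-length distribution
weights = {
    "1": 0.035, "2": 0.163, "3": 0.302, "4": 0.261,
    "5": 0.142, "6": 0.059, "7": 0.022, "8+": 0.015,
}

def weighted_p50(bucket_p50):
    """weighted_p50 = sum(bucket_p50 * weight) / sum(weight)."""
    num = sum(bucket_p50[b] * w for b, w in weights.items())
    den = sum(weights.values())  # ~0.999 due to rounding; division normalizes
    return num / den

# Hypothetical per-bucket p50 latencies in milliseconds (illustrative only)
p50s = {"1": 2.0, "2": 3.1, "3": 4.5, "4": 5.8,
        "5": 7.0, "6": 8.3, "7": 9.1, "8+": 11.0}
print(round(weighted_p50(p50s), 2))
```

Dividing by the sum of the weights keeps the metric correct even though the rounded percentages do not total exactly 100%.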

Benchmark Procedure

  1. Setup: Start fresh Postgres instance, create extension
  2. Load: Bulk load dataset using \copy
  3. Index: Create BM25 index, force spill to segments
  4. Warmup: Run each query once to warm caches
  5. Measure: Run timed queries, collect statistics
  6. Report: Extract metrics, publish to dashboard

Note: All queries use LIMIT 10 to simulate typical search result pages. The Block-Max WAND optimization is enabled by default.

System X Comparison

We run identical benchmarks against System X (a competing Postgres BM25 extension) to provide context for our performance numbers. Both extensions are measured with the same datasets, queries, Postgres configuration, and hardware.

See the detailed comparison for latest results.

Reproducibility

All benchmark code is in the benchmarks/ directory. To run locally:

# MS MARCO benchmark
cd benchmarks/datasets/msmarco
./download.sh full
psql -f load.sql

# Or use the runner script
./benchmarks/runner/run_benchmark.sh msmarco

Limitations