Current Status
pg_textsearch today
- 3.9x faster overall query throughput
- Faster on all query lengths (1-8+ tokens)
- Smaller index (no positions stored)*
- Parallel index build (4 workers)
- Native Postgres integration
System X v0.21.6
- Faster index build (1.6x)
- Phrase queries supported
- Larger feature set (facets, etc.)
Recent Improvements
- BMW (Block-Max WAND) cache optimizations - Cached skip entries and reusable decompression buffers reduce per-block overhead by 20–25% (PR #274)
- SIMD-accelerated decoding - Bitpack decoding with SIMD intrinsics (PR #250)
- Stack-allocated decode buffers - Reduced allocation overhead (PR #253)
- BMW term state optimization - Pointer indirection for ordering (PR #249)
- Arena allocator - Rewritten index build with parallel page pool (PR #231)
- Overall throughput - pg_textsearch now 3.9x faster than System X on 8.8M dataset (up from 3.2x on March 3)
Index Size & Build Time
| Metric | pg_textsearch | System X | Difference |
|---|---|---|---|
| Index Size | 1,215 MB | 1,503 MB | -19% |
| Build Time | 233.5 sec | 142.5 sec | +64% |
| Documents | 8,841,823 | - | - |
*pg_textsearch does not store term positions, so it cannot serve phrase queries such as
"quick brown fox". System X stores positions by default, which adds significant
overhead but enables phrase search. This accounts for most of the index size
difference; it is a feature tradeoff, not a compression advantage.
Query Latency (p50)
Median latency in milliseconds. Lower is better.
| Query Tokens | pg_textsearch | System X | Difference |
|---|---|---|---|
| 1 token | 0.70 ms | 18.05 ms | -96% |
| 2 tokens | 1.31 ms | 18.51 ms | -93% |
| 3 tokens | 2.44 ms | 24.69 ms | -90% |
| 4 tokens | 3.73 ms | 27.35 ms | -86% |
| 5 tokens | 6.07 ms | 28.94 ms | -79% |
| 6 tokens | 8.76 ms | 35.34 ms | -75% |
| 7 tokens | 13.05 ms | 37.29 ms | -65% |
| 8+ tokens | 19.86 ms | 44.98 ms | -56% |
Query Latency (p95)
95th percentile latency in milliseconds. Lower is better.
| Query Tokens | pg_textsearch | System X | Difference |
|---|---|---|---|
| 1 token | 1.62 ms | 24.20 ms | -93% |
| 2 tokens | 3.62 ms | 32.18 ms | -89% |
| 3 tokens | 6.97 ms | 37.57 ms | -81% |
| 4 tokens | 10.86 ms | 36.17 ms | -70% |
| 5 tokens | 17.76 ms | 39.34 ms | -55% |
| 6 tokens | 21.24 ms | 59.09 ms | -64% |
| 7 tokens | 32.55 ms | 67.40 ms | -52% |
| 8+ tokens | 43.94 ms | 70.92 ms | -38% |
Throughput
Total time to execute 800 test queries sequentially.
| Metric | pg_textsearch | System X | Difference |
|---|---|---|---|
| Total time | 6.46 sec | 25.20 sec | -74% |
| Avg ms/query | 8.08 ms | 31.50 ms | -74% |
Analysis
Query latency: pg_textsearch faster across all token counts
pg_textsearch is faster on all 8 token buckets at both p50 and p95, ranging from 26x faster on single-token queries to 2.3x faster on 8+ token queries at p50. SIMD-accelerated bitpack decoding (PR #250), stack-allocated decode buffers (PR #253), and BMW cache optimizations (PR #274) drove improvements across the board.
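To make the decoding work concrete, here is a scalar sketch of fixed-width bitpack encoding and decoding. The LSB-first layout is an assumption for illustration, not pg_textsearch's actual on-disk format; the SIMD version in PR #250 does the same per-value extraction across many lanes per instruction.

```python
def bitpack(values, bit_width):
    """Pack small integers into bytes at a fixed bit width, LSB-first
    (illustrative layout, not the extension's wire format)."""
    acc = 0
    nbits = 0
    out = bytearray()
    for v in values:
        acc |= v << nbits
        nbits += bit_width
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)
    return bytes(out)

def bitunpack(data, bit_width, count):
    """Scalar reference decoder: accumulate bytes, peel off bit_width-sized values."""
    mask = (1 << bit_width) - 1
    acc = 0
    nbits = 0
    out = []
    for byte in data:
        acc |= byte << nbits
        nbits += 8
        while nbits >= bit_width and len(out) < count:
            out.append(acc & mask)
            acc >>= bit_width
            nbits -= bit_width
    return out
```

The two functions round-trip: `bitunpack(bitpack([3, 7, 1, 5, 2], 3), 3, 5)` returns `[3, 7, 1, 5, 2]`. The inner while loop is the hot path a SIMD implementation vectorizes.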
Overall throughput: pg_textsearch 3.9x faster
pg_textsearch completes 800 queries in 6.5s vs 25.2s for System X, a 3.9x throughput advantage. This is up from 3.2x on March 3, driven by continued scoring path optimizations.
Index build: System X 1.6x faster
System X builds its index in 143s vs 234s for pg_textsearch (1.6x faster). The arena allocator rewrite (PR #231) and leader-only merge (PR #244) previously cut build time from 270s to 234s.
Methodology
Both extensions benchmarked on identical GitHub Actions runners with the same Postgres configuration. See full methodology for details.
MS-MARCO v2 — 138M Passages
Large-Scale Benchmark
Environment
| Component | Specification |
|---|---|
| CPU | Intel Xeon Platinum 8375C @ 2.90 GHz, 8 cores / 16 threads |
| RAM | 123 GB |
| Storage | NVMe SSD (885 GB) |
| Postgres | 17.7, shared_buffers = 31 GB, data on NVMe |
| Table size | 47 GB (87 GB with TOAST) |
Current Status (138M)
pg_textsearch
- 2.3x faster weighted p50 query latency
- 4.7x higher concurrent throughput (16 clients)
- Faster on all 8 token buckets at p50
- 26% smaller index on disk
- Block-Max WAND with cached skip entries
- SIMD-accelerated bitpack decoding
System X v0.21.6
- 1.9x faster index build
- Phrase queries supported
- Larger feature set (facets, etc.)
Index Build (138M)
| Metric | pg_textsearch | System X | Difference |
|---|---|---|---|
| Build time | 17 min 37 s | 8 min 55 s | 1.9x slower |
| Parallel workers | 15 | 14 | - |
| Index size | 17 GB | 23 GB | -26% |
| Documents | 138,364,158 | - | - |
| Unique terms | 17,373,764 | - | - |
Single-Client Query Latency (138M)
Top-10 results (LIMIT 10), BMW optimization enabled.
691 queries sampled across 8 token-count buckets.
Median Latency (p50)
| Query Tokens | pg_textsearch | System X | Speedup |
|---|---|---|---|
| 1 token | 5.11 ms | 59.83 ms | 11.7x |
| 2 tokens | 9.14 ms | 59.65 ms | 6.5x |
| 3 tokens | 20.04 ms | 77.62 ms | 3.9x |
| 4 tokens | 41.92 ms | 98.89 ms | 2.4x |
| 5 tokens | 67.76 ms | 125.38 ms | 1.9x |
| 6 tokens | 102.82 ms | 148.78 ms | 1.4x |
| 7 tokens | 159.37 ms | 169.65 ms | 1.1x |
| 8+ tokens | 177.95 ms | 190.47 ms | 1.1x |
95th Percentile Latency (p95)
| Query Tokens | pg_textsearch | System X | Speedup |
|---|---|---|---|
| 1 token | 6.43 ms | 68.34 ms | 10.6x |
| 2 tokens | 32.63 ms | 103.17 ms | 3.2x |
| 3 tokens | 51.51 ms | 114.79 ms | 2.2x |
| 4 tokens | 124.17 ms | 147.32 ms | 1.2x |
| 5 tokens | 167.05 ms | 190.07 ms | 1.1x |
| 6 tokens | 262.07 ms | 201.76 ms | 0.77x |
| 7 tokens | 311.58 ms | 291.09 ms | 0.94x |
| 8+ tokens | 404.95 ms | 310.68 ms | 0.77x |
Weighted-Average Latency
Weighted by observed query-length distribution from 1,010,916 MS-MARCO v1 Bing queries after English stopword removal and stemming (mean 3.7 lexemes, mode 3).
Query length distribution
```
MS-MARCO Query Lexeme Count Distribution (1,010,916 queries)
Lexemes = distinct stems after English stopword removal

lexemes   queries      %  distribution
───────  ────────  ─────  ──────────────────────────────────────────────────
      0        11   0.0%  ▏
      1    35,638   3.5%  ▓▓▓▓▓
      2   165,033  16.3%  ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
      3   304,887  30.2%  ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
      4   264,177  26.1%  ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
      5   143,765  14.2%  ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
      6    59,558   5.9%  ▓▓▓▓▓▓▓▓▓
      7    22,595   2.2%  ▓▓▓
      8     8,627   0.9%  ▓
      9     3,395   0.3%  ▏
     10     1,555   0.2%  ▏
     11       721   0.1%  ▏
     12       402   0.0%  ▏
     13       235   0.0%  ▏
     14       123   0.0%  ▏
    15+       193   0.0%  ▏

Total: 1,010,916 queries
Mean:  3.7 lexemes
Mode:  3 lexemes (30.2%)
```
72.6% of queries have 2-4 lexemes.
96.2% of queries have 1-6 lexemes.
Benchmark buckets 1–7 contain 100 queries each; bucket 8+ contains 38 queries covering all lengths ≥8. Weights applied to each bucket match the distribution above.
| Metric | pg_textsearch | System X | Speedup |
|---|---|---|---|
| Weighted p50 | 40.61 ms | 94.36 ms | 2.3x |
| Weighted avg | 46.69 ms | 101.66 ms | 2.2x |
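The weighted medians can be reproduced from the per-bucket p50 tables and the query-length distribution above, with the 8+ bucket weighted by the sum of all counts for lengths ≥ 8:

```python
# Per-bucket p50 latency (ms) from the tables above: buckets 1..7 tokens and 8+.
pg_p50   = [5.11, 9.14, 20.04, 41.92, 67.76, 102.82, 159.37, 177.95]
sysx_p50 = [59.83, 59.65, 77.62, 98.89, 125.38, 148.78, 169.65, 190.47]

# Query counts per lexeme count from the MS-MARCO distribution (lengths >= 8 pooled).
counts = [35_638, 165_033, 304_887, 264_177, 143_765, 59_558, 22_595,
          8_627 + 3_395 + 1_555 + 721 + 402 + 235 + 123 + 193]

def weighted(latencies, weights):
    """Weighted mean of per-bucket latencies under the observed query-length mix."""
    return sum(l * w for l, w in zip(latencies, weights)) / sum(weights)

print(round(weighted(pg_p50, counts), 2))    # ~40.61 ms
print(round(weighted(sysx_p50, counts), 2))  # ~94.36 ms
```

This matches the table's 40.61 ms and 94.36 ms, confirming the weighting uses raw query counts rather than rounded percentages.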
Throughput (138M)
Single-Client Sequential
691 queries run 3 times; median iteration reported.
| Metric | pg_textsearch | System X | Speedup |
|---|---|---|---|
| Avg ms/query | 62.92 ms | 106.53 ms | 1.7x |
| Total (691 queries) | 43.5 s | 73.6 s | 1.7x |
Concurrent (pgbench, 16 clients, 60 s)
| Metric | pg_textsearch | System X | Ratio |
|---|---|---|---|
| Transactions/sec (TPS) | 91.4 | 19.4 | 4.7x |
| Avg latency | 175 ms | 823 ms | 4.7x |
| Transactions (60 s) | 5,526 | 1,180 | 4.7x |
Analysis (138M)
Query latency: pg_textsearch faster across all token counts
pg_textsearch is faster on all 8 token buckets at p50, ranging from 11.7x faster on single-token queries to 1.1x faster on 8+ token queries. Cached skip entries and reusable decompression buffers (PR #274) reduced per-block overhead in the WAND inner loop by 20–25%, closing the gap on high-token queries. The weighted p50 advantage is 2.3x.
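To illustrate the skipping idea behind Block-Max WAND, here is a deliberately simplified single-term sketch: each postings block carries a cached maximum score, and any block whose maximum cannot beat the current k-th best score is skipped without decompression. The real implementation operates on compressed multi-term postings; this is only the control-flow skeleton.

```python
import heapq

def bmw_top_k(blocks, k):
    """Simplified single-term Block-Max WAND scan.

    blocks: list of (block_max, [(doc_id, score), ...]) posting blocks.
    Returns the top-k (score, doc_id) pairs, skipping blocks whose cached
    max score cannot beat the current k-th best score.
    """
    heap = []          # min-heap of the k best (score, doc_id) seen so far
    threshold = 0.0    # k-th best score once the heap is full
    for block_max, postings in blocks:
        if len(heap) == k and block_max <= threshold:
            continue   # skip: no doc in this block can enter the top-k
        for doc_id, score in postings:
            if len(heap) < k:
                heapq.heappush(heap, (score, doc_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc_id))
            threshold = heap[0][0] if len(heap) == k else 0.0
    return sorted(heap, reverse=True)
```

With `blocks = [(5.0, [(1, 5.0), (2, 3.0)]), (2.0, [(3, 2.0)]), (9.0, [(4, 9.0)])]` and `k = 2`, the second block is skipped entirely because its cached max (2.0) is below the running threshold (3.0). PR #274's contribution is making that `block_max` lookup cheap by caching skip entries instead of re-reading them per block.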
Tail latency: improved but mixed at p95
pg_textsearch has tighter tail latency on 1–5 token queries at p95. On 6–8+ token queries, System X still has tighter tails. The p95 gap narrowed significantly with the cache optimizations (e.g., 5-token p95 went from 200 ms to 167 ms, now faster than System X's 190 ms). Further tail latency optimization on long queries remains an active area of work.
Concurrent throughput: pg_textsearch 4.7x higher TPS
Under 16-client concurrent load, pg_textsearch achieves 91.4 TPS vs 19.4 TPS for System X — a 4.7x advantage. This is significantly wider than the 1.7x single-client gap, indicating that pg_textsearch scales much better under concurrency. pg_textsearch uses native Postgres buffer management and shared memory, avoiding the external process coordination overhead present in System X's architecture.
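The TPS and latency figures above are internally consistent with Little's law (throughput ≈ concurrent clients / average latency), a useful sanity check when reading pgbench output:

```python
clients = 16

# Average per-query latency under 16-client load, in seconds (from the table above).
pg_latency, sysx_latency = 0.175, 0.823

# Little's law: steady-state throughput = concurrency / latency.
pg_tps = clients / pg_latency      # ~91 TPS, matching the measured 91.4
sysx_tps = clients / sysx_latency  # ~19 TPS, matching the measured 19.4
print(round(pg_tps, 1), round(sysx_tps, 1))
```

The match indicates both systems kept all 16 clients busy for the full 60 s run, so the 4.7x gap reflects per-query cost under contention, not idle clients.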
Index build: System X 1.9x faster
System X builds its index in 8 min 55 s vs 17 min 37 s for pg_textsearch (1.9x faster). pg_textsearch's parallel build uses 15 workers for the scan phase, but the subsequent merge phase is single-threaded and I/O-bound, accounting for the majority of the build time. Despite the slower build, pg_textsearch produces a 26% smaller index (17 GB vs 23 GB).
Methodology (138M)
Both extensions benchmarked on the same dedicated EC2 instance
(c6i.4xlarge), same Postgres 17.7 installation, same dataset. The
table was loaded once; each extension built its index from scratch with
page cache dropped before each build. Query benchmarks include warmup
passes. The pgbench concurrency test uses -M prepared mode with
random query selection from the 691 benchmark queries.