Overview
Note: This feature is currently a work in progress. Index building and maintenance are complete; scan implementation is partially implemented. Currently in the branches: SPR-1035-GIN-support-2 (build), SPR-1035-GIN-support-2-scan (scan)GIN (Generalized Inverted Index) support enables efficient text similarity searches using trigram-based indexing. The implementation tokenizes text column values into 3-character trigrams and stores them in an inverted index structure, mapping each trigram to the rows containing it. The index schema stores entries as
(column_position, token, internal_row_id) tuples, allowing multi-column GIN indexes where each column’s trigrams are distinguished by position.
Currently, only gin_trgm_ops opclass is supported, enabling LIKE and ILIKE query operators.
Key Components
Index Building
| Component | Location | Purpose |
|---|---|---|
| Trigram Helpers | src/pg_ext/trgm_helpers.cc | Extracts trigrams from text values using opclass extractValue |
| GIN Schema Builder | src/sys_tbl_mgr/schema_helpers.cc | Creates the 3-column GIN index schema |
| GIN Index Root | src/sys_tbl_mgr/mutable_table.cc | Initializes BTree configured for GIN storage |
| Index Builder | src/pg_log_mgr/indexer.cc | Full build with reconciliation |
Index Scanning (in-progress)
| Component | Location | Purpose |
|---|---|---|
| Query Helpers | src/pg_fdw/trgm_query_helpers.cc | Extracts trigrams from query using opclass extractQuery |
| GINSecondary Iterator | src/sys_tbl_mgr/table.cc | Iterates GIN index with row deduplication |
| FDW Integration | src/pg_fdw/pg_fdw_mgr.cc | Routes LIKE/ILIKE operators to GIN index scan |
GIN Index Schema
| Field | Type | Description |
|---|---|---|
__springtail_idx_position | UINT32 | Column position in the table |
__springtail_gin_idx_token | TEXT | Trigram token (3-byte string) |
__springtail_internal_row_id | UINT64 | Reference to the source row |
Supported Operators
| Operator | Strategy Number | Description |
|---|---|---|
LIKE (~~) | 3 | Case-sensitive pattern matching |
ILIKE (~~*) | 4 | Case-insensitive pattern matching |
Data Flow
Index Building
Index Scanning (in-progress)
Implementation
Index Creation
GIN index creation validates opclass before persisting metadata:Trigram Extraction (Build)
Unpacks PostgreSQL packed trigram integers to 3-byte strings:Full Index Build and reconciliation
Iterates table rows and inserts trigram tuples:Incremental Maintenance
Mutation handler distinguishes GIN from BTree indexes:Schema Creation
GIN index schema defines three key columns:Trigram Extraction (Query)
Invokes opclassextractQuery for search pattern:
GIN Iterator (in-progress)
Deduplicates rows and resolves physical location:FDW Query Routing (in-progress)
Routes LIKE/ILIKE operators to GIN index:Path to Completion
The following work remains to complete GIN index support:-
Scan Implementation - The
GINSecondaryiterator currently iterates all index entries. It needs to:- Filter entries to only those matching the extracted query tokens
- Implement proper token intersection logic (all query trigrams must match)
-
Query Optimization - The FDW currently uses hardcoded
gin_trgm_opsand collation. This should be derived from the index metadata. - Testing - End-to-end testing of LIKE/ILIKE queries using GIN indexes.