Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.springtail.io/llms.txt

Use this file to discover all available pages before exploring further.

Overview

Note: This feature is currently a work in progress. Index building and maintenance are complete; scan implementation is partially implemented. Currently in the branches: SPR-1035-GIN-support-2 (build), SPR-1035-GIN-support-2-scan (scan)
GIN (Generalized Inverted Index) support enables efficient text similarity searches using trigram-based indexing. The implementation tokenizes text column values into 3-character trigrams and stores them in an inverted index structure, mapping each trigram to the rows containing it. The index schema stores entries as (column_position, token, internal_row_id) tuples, allowing multi-column GIN indexes where each column’s trigrams are distinguished by position. Currently, only gin_trgm_ops opclass is supported, enabling LIKE and ILIKE query operators.

Key Components

Index Building

ComponentLocationPurpose
Trigram Helperssrc/pg_ext/trgm_helpers.ccExtracts trigrams from text values using opclass extractValue
GIN Schema Buildersrc/sys_tbl_mgr/schema_helpers.ccCreates the 3-column GIN index schema
GIN Index Rootsrc/sys_tbl_mgr/mutable_table.ccInitializes BTree configured for GIN storage
Index Buildersrc/pg_log_mgr/indexer.ccFull build with reconciliation

Index Scanning (in-progress)

ComponentLocationPurpose
Query Helperssrc/pg_fdw/trgm_query_helpers.ccExtracts trigrams from query using opclass extractQuery
GINSecondary Iteratorsrc/sys_tbl_mgr/table.ccIterates GIN index with row deduplication
FDW Integrationsrc/pg_fdw/pg_fdw_mgr.ccRoutes LIKE/ILIKE operators to GIN index scan

GIN Index Schema

FieldTypeDescription
__springtail_idx_positionUINT32Column position in the table
__springtail_gin_idx_tokenTEXTTrigram token (3-byte string)
__springtail_internal_row_idUINT64Reference to the source row

Supported Operators

OperatorStrategy NumberDescription
LIKE (~~)3Case-sensitive pattern matching
ILIKE (~~*)4Case-insensitive pattern matching

Data Flow

Index Building

INDEX CREATION


Server::_create_index()
  → _check_gin_index_columns() validates opclass
  → _upsert_index_name() persists metadata


Indexer::_build_index()
  → Detects INDEX_TYPE_GIN
  → Calls _build_gin_index()


Indexer::_build_gin_index()
  (builds index at the XID x1 in a separate thread in Indexer)
  → create_gin_index_root() initializes BTree
  → For each row, for each indexed column:
      → extract_trgm_from_value() extracts trigrams
      → Insert (position, token, row_id) into BTree

RECONCILIATION


  → committer triggers process_index_reconciliation()
      → Calls _reconcile_index()
      → Reconciles index using new mutations happed post XID x1
      → Insert (position, token, row_id) into BTree for inserts/updates
      → Remove (position, token, row_id) from BTree for deletes

INCREMENTAL MAINTENANCE


MutableTable::apply_mutation<INSERT/DELETE>()
  → index_mutation_handler() checks index type via _index_lookup
  → For GIN: extracts trigrams, inserts/removes tuples
  → For BTree: standard key-value operation

Index Scanning (in-progress)

QUERY: SELECT * FROM t WHERE col LIKE '%pattern%'


FDW::_init_quals()
  → Selects GIN index with matching column


FDW::_set_scan_iterators()
  → extract_gin_keys_from_string() extracts query trigrams
  → Table::begin(index_id, tokens)


Table::Iterator (GINSecondary)
  → Iterates BTree entries matching tokens
  → Deduplicates rows via _visited_internal_row_ids
  → Resolves row location via look_aside_index
  → Returns matching table rows

Implementation

Index Creation

GIN index creation validates opclass before persisting metadata:
// server.cc - Server::_check_gin_index_columns()
for (auto& idx_column : index_info.columns()) {
    if (std::ranges::find(ALLOWED_GIN_OPS, idx_column.opclass()) == std::end(ALLOWED_GIN_OPS)) {
        LOG_ERROR("Unsupported opclass '{}' for GIN index column", idx_column.opclass());
        return false;
    }
}

Trigram Extraction (Build)

Unpacks PostgreSQL packed trigram integers to 3-byte strings:
// trgm_helpers.cc - unpack_trigram_int_to_string()
std::string unpack_trigram_int_to_string(uint32_t v) {
    unsigned char b1 = (v >> 16) & 0xFF;
    unsigned char b2 = (v >> 8) & 0xFF;
    unsigned char b3 = v & 0xFF;
    return std::string({b1, b2, b3});
}

// trgm_helpers.cc - extract_trgm_from_value()
auto&& extract_func = PgExtnRegistry::get_instance()
    ->get_opclass_method_func_ptr_by_method_name(opclass, method_strategy_number);
Datum result = extract_func(fcinfo);
Datum *entries = reinterpret_cast<Datum *>(DatumGetPointer(result));

Full Index Build and reconciliation

Iterates table rows and inserts trigram tuples:
// indexer.cc - Indexer::_build_gin_index()
for (auto row_i = table->begin(); row_i != table->end(); ++row_i) {
    auto&& row = *row_i;
    for (int i = 0; i < idx_cols.size(); i++) {
        auto&& tokens = extract_trgm_from_value(col_field->get_text(&row),
                column.opclass(), GIN_EXTRACTVALUE);

        for (auto& token : tokens) {
            // Key: (column_position, trigram_token, internal_row_id)
            key_fields->at(0) = std::make_shared<ConstTypeField<uint32_t>>(pos);
            key_fields->at(1) = std::make_shared<ConstTypeField<std::string>>(token);
            key_fields->at(2) = std::make_shared<ConstTypeField<uint64_t>>(internal_row_id);
            root->insert(std::make_shared<FieldTuple>(key_fields, nullptr));
        }
    }
}

Reconciliation
// indexer.cc - Indexer::_reconcile_index()
auto&& tokens = extract_trgm_from_value(std::string(col_field->get_text(&row)), column.opclass(), GIN_EXTRACTVALUE);

for (auto& token: tokens) {
    // Set idx_position, token, internal_row_id
    key_fields->at(0) = std::make_shared<ConstTypeField<uint32_t>>(pos);
    key_fields->at(1) = std::make_shared<ConstTypeField<std::string>>(token);
    key_fields->at(2) = std::make_shared<ConstTypeField<uint64_t>>(internal_row_id_f->get_uint64(&row));
    auto tuple = std::make_shared<FieldTuple>(key_fields, nullptr);
    ---
    // for inserts/updates
    idx_state._root->insert(tuple);
      and 
    // For deletes
    idx_state._root->remove(tuple);
}

Incremental Maintenance

Mutation handler distinguishes GIN from BTree indexes:
// mutable_table.cc - index_mutation_handler()
if (index_lookup.at(index_id).index_type == constant::INDEX_TYPE_GIN) {
    auto&& tokens = extract_trgm_from_value(col_field->get_text(&row),
            column.opclass, GIN_EXTRACTVALUE);
    for (auto& token : tokens) {
        key_fields->at(0) = std::make_shared<ConstTypeField<uint32_t>>(pos);
        key_fields->at(1) = std::make_shared<ConstTypeField<std::string>>(token);
        key_fields->at(2) = std::make_shared<ConstTypeField<uint64_t>>(internal_row_id);
        if constexpr (op == IndexOperation::Insert)
            idx.first->insert(tuple);
        else
            idx.first->remove(tuple);
    }
}

Schema Creation

GIN index schema defines three key columns:
// schema_helpers.cc - create_gin_index_schema()
SchemaColumn idx_position_c(constant::INDEX_POSITION_FIELD, 0, SchemaType::UINT32, 0, false);
SchemaColumn idx_gin_token_c(constant::INDEX_GIN_TOKEN_FIELD, 0, SchemaType::TEXT, 0, false);
SchemaColumn internal_row_id(constant::INTERNAL_ROW_ID, 0, SchemaType::UINT64, 0, false);

return base_schema->create_index_schema({},
    { idx_position_c, idx_gin_token_c, internal_row_id },
    gin_index_keys, extension_callback);

Trigram Extraction (Query)

Invokes opclass extractQuery for search pattern:
// trgm_query_helpers.cc - extract_gin_keys_from_string()
Oid procOid = get_opfamily_proc(opfamily, inputType, inputType, GIN_EXTRACTQUERY_PROC);

Datum *keys = (Datum *) DatumGetPointer(
    FunctionCall7Coll(&flinfo, collation, queryDatum,
        PointerGetDatum(&nkeys),
        UInt16GetDatum(op_strategy_number),
        PointerGetDatum(&partial_matches),
        PointerGetDatum(&extra_data),
        PointerGetDatum(&nullFlags),
        PointerGetDatum(&searchMode)));

GIN Iterator (in-progress)

Deduplicates rows and resolves physical location:
// table.cc - GINSecondary::update_page()
while (true) {
    auto&& index_row = *_btree_i;
    internal_row_id = _internal_row_id_f->get_uint64(&index_row);

    // Skip already visited rows (same row may match multiple trigrams)
    if (!_visited_internal_row_ids.contains(internal_row_id)) {
        _visited_internal_row_ids.emplace(internal_row_id);
        break;
    }
    ++_btree_i;
}

// Resolve row location via look-aside index
_look_aside_key_fields->at(0) = std::make_shared<ConstTypeField<uint64_t>>(internal_row_id);
auto lookup_tuple = std::make_shared<FieldTuple>(_look_aside_key_fields, nullptr);
auto&& lookup_i = look_aside_index->lower_bound(lookup_tuple);

FDW Query Routing (in-progress)

Routes LIKE/ILIKE operators to GIN index:
// pg_fdw_mgr.cc - _iter_start()
case LIKE:
case ILIKE:
    auto search_key = tuple->to_string();
    auto tokens = extract_gin_keys_from_string(search_key, "gin_trgm_ops",
            100, TRGM_LIKE_STRATEGY_NUMBER);
    state->iter_start.emplace(state->table->begin(state->index->id,
            state->index_only_scan, tokens));
    state->iter_end.emplace(state->table->end(state->index->id, state->index_only_scan));
    break;

// pg_fdw_mgr.cc - _is_valid_qual()
case TEXTOID:
    return (op == EQUALS || op == NOT_EQUALS || op == LIKE || op == ILIKE);

Path to Completion

The following work remains to complete GIN index support:
  1. Scan Implementation - The GINSecondary iterator currently iterates all index entries. It needs to:
    • Filter entries to only those matching the extracted query tokens
    • Implement proper token intersection logic (all query trigrams must match)
  2. Query Optimization - The FDW currently uses hardcoded gin_trgm_ops and collation. This should be derived from the index metadata.
  3. Testing - End-to-end testing of LIKE/ILIKE queries using GIN indexes.