
# GIN Index support

## Overview

> **Note:** This feature is a work in progress. Index building and maintenance are complete; the scan is only partially implemented. The work currently lives in two branches: `SPR-1035-GIN-support-2` (build) and `SPR-1035-GIN-support-2-scan` (scan).

GIN (Generalized Inverted Index) support enables efficient pattern-matching searches on text columns using trigram-based indexing. The implementation tokenizes text column values into 3-character trigrams and stores them in an inverted index structure that maps each trigram to the rows containing it.
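To make the tokenization step concrete, here is a simplified, ASCII-only sketch of pg_trgm-style trigram extraction. It is not the `extractValue` code Springtail calls: `extract_trigrams` is a hypothetical name, and the rules it encodes (lower-casing, words made of alphanumeric characters, two leading spaces and one trailing space of padding per word) are assumptions modeled on pg_trgm's documented behavior. Multibyte characters are not handled.

```cpp theme={null}
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Simplified, ASCII-only sketch of pg_trgm-style trigram extraction.
// Assumptions: words are runs of alphanumeric characters, text is
// lower-cased, and each word is padded with two leading spaces and one
// trailing space. pg_trgm's multibyte handling is not modeled.
std::set<std::string> extract_trigrams(const std::string& text) {
    std::set<std::string> trigrams;  // std::set drops duplicate trigrams
    std::string word;

    // Emits every 3-character window of the padded word, then resets it.
    auto flush_word = [&]() {
        if (word.empty()) return;
        std::string padded = "  " + word + " ";
        for (std::size_t i = 0; i + 3 <= padded.size(); ++i) {
            trigrams.insert(padded.substr(i, 3));
        }
        word.clear();
    };

    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            word.push_back(static_cast<char>(std::tolower(c)));
        } else {
            flush_word();  // any non-alphanumeric character ends a word
        }
    }
    flush_word();
    return trigrams;
}
```

For `"Cat"` this yields `"  c"`, `" ca"`, `"cat"` and `"at "`, the same set the pg_trgm documentation lists for `show_trgm('cat')`.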

The index schema stores entries as `(column_position, token, internal_row_id)` tuples, allowing multi-column GIN indexes where each column's trigrams are distinguished by position.
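A minimal in-memory model of that tuple layout, assuming a `std::set` of `std::tuple`s in place of the BTree (`GinEntry`, `GinIndex` and `add_column_tokens` are hypothetical names, and the tokens are passed in rather than extracted):

```cpp theme={null}
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical in-memory model of the GIN index: one entry per
// (column_position, token, internal_row_id). std::set keeps entries sorted
// and unique, standing in for the BTree.
using GinEntry = std::tuple<uint32_t, std::string, uint64_t>;
using GinIndex = std::set<GinEntry>;

// Adds one entry per token for a single column value of a row.
void add_column_tokens(GinIndex& index, uint32_t column_position,
                       const std::vector<std::string>& tokens,
                       uint64_t internal_row_id) {
    for (const auto& token : tokens) {
        index.emplace(column_position, token, internal_row_id);
    }
}
```

The same trigram indexed from two different columns of one row produces two distinct entries, because the column position is part of the key.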

Currently, only the `gin_trgm_ops` opclass is supported; it enables the `LIKE` and `ILIKE` operators.

***

## Key Components

### Index Building

| Component          | Location                            | Purpose                                                         |
| ------------------ | ----------------------------------- | --------------------------------------------------------------- |
| Trigram Helpers    | `src/pg_ext/trgm_helpers.cc`        | Extracts trigrams from text values using opclass `extractValue` |
| GIN Schema Builder | `src/sys_tbl_mgr/schema_helpers.cc` | Creates the 3-column GIN index schema                           |
| GIN Index Root     | `src/sys_tbl_mgr/mutable_table.cc`  | Initializes BTree configured for GIN storage                    |
| Index Builder      | `src/pg_log_mgr/indexer.cc`         | Full build with reconciliation                                  |

### Index Scanning (in-progress)

| Component             | Location                           | Purpose                                                   |
| --------------------- | ---------------------------------- | --------------------------------------------------------- |
| Query Helpers         | `src/pg_fdw/trgm_query_helpers.cc` | Extracts trigrams from query using opclass `extractQuery` |
| GINSecondary Iterator | `src/sys_tbl_mgr/table.cc`         | Iterates GIN index with row deduplication                 |
| FDW Integration       | `src/pg_fdw/pg_fdw_mgr.cc`         | Routes LIKE/ILIKE operators to GIN index scan             |

### GIN Index Schema

| Field                          | Type   | Description                   |
| ------------------------------ | ------ | ----------------------------- |
| `__springtail_idx_position`    | UINT32 | Column position in the table  |
| `__springtail_gin_idx_token`   | TEXT   | Trigram token (3-byte string) |
| `__springtail_internal_row_id` | UINT64 | Reference to the source row   |

### Supported Operators

| Operator        | Strategy Number | Description                       |
| --------------- | --------------- | --------------------------------- |
| `LIKE` (`~~`)   | 3               | Case-sensitive pattern matching   |
| `ILIKE` (`~~*`) | 4               | Case-insensitive pattern matching |

***

## Data Flow

### Index Building

```mermaid theme={null}
flowchart TD
    subgraph Creation [INDEX CREATION]
        direction TB
        Create["Server::_create_index()<br/>→ _check_gin_index_columns() validates opclass<br/>→ _upsert_index_name() persists metadata"]
        BuildIdx["Indexer::_build_index()<br/>→ Detects INDEX_TYPE_GIN<br/>→ Calls _build_gin_index()"]
        BuildGin["Indexer::_build_gin_index()<br/>(builds index at XID x1 in a separate Indexer thread)<br/>→ create_gin_index_root() initializes BTree<br/>→ For each row, for each indexed column:<br/>→ extract_trgm_from_value() extracts trigrams<br/>→ Insert (position, token, row_id) into BTree"]
        Create --> BuildIdx --> BuildGin
    end

    subgraph Recon [RECONCILIATION]
        direction TB
        ReconStep["committer triggers process_index_reconciliation()<br/>→ Calls _reconcile_index()<br/>→ Reconciles index using new mutations after XID x1<br/>→ Insert (position, token, row_id) for inserts/updates<br/>→ Remove (position, token, row_id) for deletes"]
    end

    subgraph Maint [INCREMENTAL MAINTENANCE]
        direction TB
        Apply["MutableTable::apply_mutation&lt;INSERT/DELETE&gt;()<br/>→ index_mutation_handler() checks index type via _index_lookup<br/>→ For GIN: extracts trigrams, inserts/removes tuples<br/>→ For BTree: standard key-value operation"]
    end

    Creation --> Recon --> Maint
```

### Index Scanning (in-progress)

```mermaid theme={null}
flowchart TD
    Query["QUERY: SELECT * FROM t WHERE col LIKE '%pattern%'"]
    InitQuals["FDW::_init_quals()<br/>→ Selects GIN index with matching column"]
    SetIter["FDW::_set_scan_iterators()<br/>→ extract_gin_keys_from_string() extracts query trigrams<br/>→ Table::begin(index_id, tokens)"]
    Iter["Table::Iterator (GINSecondary)<br/>→ Iterates BTree entries matching tokens<br/>→ Deduplicates rows via _visited_internal_row_ids<br/>→ Resolves row location via look_aside_index<br/>→ Returns matching table rows"]

    Query --> InitQuals --> SetIter --> Iter
```

***

## Implementation

### Index Creation

GIN index creation validates each column's opclass before persisting the index metadata:

```cpp theme={null}
// server.cc - Server::_check_gin_index_columns()
for (auto& idx_column : index_info.columns()) {
    if (std::ranges::find(ALLOWED_GIN_OPS, idx_column.opclass()) == std::end(ALLOWED_GIN_OPS)) {
        LOG_ERROR("Unsupported opclass '{}' for GIN index column", idx_column.opclass());
        return false;
    }
}
```
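The same check, reduced to a self-contained sketch. It assumes `ALLOWED_GIN_OPS` contains only `"gin_trgm_ops"` and takes plain opclass names instead of `index_info.columns()`; `check_gin_opclasses` is a hypothetical stand-in, not the server's function, and it uses C++17 `std::all_of`/`std::find` rather than the `std::ranges` call above.

```cpp theme={null}
#include <algorithm>
#include <array>
#include <cassert>
#include <string_view>
#include <vector>

// Assumption: the allow-list holds only the one opclass this document
// says is supported.
constexpr std::array<std::string_view, 1> ALLOWED_GIN_OPS = {"gin_trgm_ops"};

// Returns true only if every indexed column uses an allowed GIN opclass.
bool check_gin_opclasses(const std::vector<std::string_view>& opclasses) {
    return std::all_of(opclasses.begin(), opclasses.end(),
                       [](std::string_view opclass) {
                           return std::find(ALLOWED_GIN_OPS.begin(),
                                            ALLOWED_GIN_OPS.end(),
                                            opclass) != ALLOWED_GIN_OPS.end();
                       });
}
```

A single unsupported column rejects the whole index, matching the early `return false` in the excerpt.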

### Trigram Extraction (Build)

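`extract_trgm_from_value()` calls the opclass `extractValue` function, which returns trigrams packed into integers; these are converted back to 3-byte strings with `unpack_trigram_int_to_string()`: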

```cpp theme={null}
// trgm_helpers.cc - unpack_trigram_int_to_string()
std::string unpack_trigram_int_to_string(uint32_t v) {
    unsigned char b1 = (v >> 16) & 0xFF;
    unsigned char b2 = (v >> 8) & 0xFF;
    unsigned char b3 = v & 0xFF;
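    // Explicit casts: list-initializing std::string from unsigned char values is a narrowing conversion
    return std::string{static_cast<char>(b1), static_cast<char>(b2), static_cast<char>(b3)};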
}

// trgm_helpers.cc - extract_trgm_from_value()
auto&& extract_func = PgExtnRegistry::get_instance()
    ->get_opclass_method_func_ptr_by_method_name(opclass, method_strategy_number);
Datum result = extract_func(fcinfo);
Datum *entries = reinterpret_cast<Datum *>(DatumGetPointer(result));
```
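The packed form places the first trigram byte in bits 16-23, the second in bits 8-15 and the third in bits 0-7. That is the byte order `unpack_trigram_int_to_string()` assumes and, as we understand it, the order pg_trgm's `trgm2int()` produces. A round-trip sketch with hypothetical `pack_trigram`/`unpack_trigram` helpers:

```cpp theme={null}
#include <cassert>
#include <cstdint>
#include <string>

// Packs a 3-byte trigram into an integer, first byte most significant.
uint32_t pack_trigram(const std::string& trgm) {
    uint32_t v = 0;
    for (int i = 0; i < 3; ++i) {
        // at() throws std::out_of_range if the trigram is shorter than 3 bytes
        v = (v << 8) | static_cast<unsigned char>(trgm.at(i));
    }
    return v;
}

// Inverse of pack_trigram(), same byte order as unpack_trigram_int_to_string().
std::string unpack_trigram(uint32_t v) {
    return std::string{static_cast<char>((v >> 16) & 0xFF),
                       static_cast<char>((v >> 8) & 0xFF),
                       static_cast<char>(v & 0xFF)};
}
```

For example, `pack_trigram("cat")` is `0x636174` (`'c'` = 0x63, `'a'` = 0x61, `'t'` = 0x74).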

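### Full Index Build and Reconciliation

`Indexer::_build_gin_index()` iterates the table rows and inserts one tuple per trigram. `Indexer::_reconcile_index()` then applies mutations committed after the build XID, inserting tuples for inserts/updates and removing them for deletes: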

```cpp theme={null}
// indexer.cc - Indexer::_build_gin_index()
for (auto row_i = table->begin(); row_i != table->end(); ++row_i) {
    auto&& row = *row_i;
    for (size_t i = 0; i < idx_cols.size(); i++) {
        // column, col_field, pos and internal_row_id are resolved per column/row (elided)
        auto&& tokens = extract_trgm_from_value(col_field->get_text(&row),
                column.opclass(), GIN_EXTRACTVALUE);

        for (auto& token : tokens) {
            // Key: (column_position, trigram_token, internal_row_id)
            key_fields->at(0) = std::make_shared<ConstTypeField<uint32_t>>(pos);
            key_fields->at(1) = std::make_shared<ConstTypeField<std::string>>(token);
            key_fields->at(2) = std::make_shared<ConstTypeField<uint64_t>>(internal_row_id);
            root->insert(std::make_shared<FieldTuple>(key_fields, nullptr));
        }
    }
}

// indexer.cc - Indexer::_reconcile_index()
auto&& tokens = extract_trgm_from_value(std::string(col_field->get_text(&row)),
        column.opclass(), GIN_EXTRACTVALUE);

for (auto& token : tokens) {
    // Key: (column_position, trigram_token, internal_row_id)
    key_fields->at(0) = std::make_shared<ConstTypeField<uint32_t>>(pos);
    key_fields->at(1) = std::make_shared<ConstTypeField<std::string>>(token);
    key_fields->at(2) = std::make_shared<ConstTypeField<uint64_t>>(internal_row_id_f->get_uint64(&row));
    auto tuple = std::make_shared<FieldTuple>(key_fields, nullptr);

    // is_delete is a placeholder for the mutation-type check (elided here)
    if (is_delete) {
        idx_state._root->remove(tuple);   // deletes
    } else {
        idx_state._root->insert(tuple);   // inserts/updates
    }
}
```
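The ordering that reconciliation relies on can be sketched as follows: the build captures the table at XID x1, and only mutations with a later XID are replayed on top of it. The `Mutation` record, its pre-extracted `tokens`, and the `std::set` standing in for the BTree are assumptions made for illustration, not Springtail's types.

```cpp theme={null}
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <tuple>
#include <vector>

using GinEntry = std::tuple<uint32_t, std::string, uint64_t>;  // (position, token, row_id)

// Hypothetical mutation record; tokens are pre-extracted for brevity.
struct Mutation {
    uint64_t xid;
    bool is_delete;
    uint32_t position;
    std::vector<std::string> tokens;
    uint64_t row_id;
};

// Replays only mutations committed after the build XID: inserts/updates add
// entries, deletes remove them.
void reconcile(std::set<GinEntry>& index, uint64_t build_xid,
               const std::vector<Mutation>& log) {
    for (const auto& m : log) {
        if (m.xid <= build_xid) continue;  // already reflected in the build
        for (const auto& token : m.tokens) {
            GinEntry entry{m.position, token, m.row_id};
            if (m.is_delete) {
                index.erase(entry);
            } else {
                index.insert(entry);
            }
        }
    }
}
```

Mutations at or before x1 are skipped because the build already reflects them.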

### Incremental Maintenance

The mutation handler distinguishes GIN indexes from BTree indexes:

```cpp theme={null}
// mutable_table.cc - index_mutation_handler()
if (index_lookup.at(index_id).index_type == constant::INDEX_TYPE_GIN) {
    auto&& tokens = extract_trgm_from_value(col_field->get_text(&row),
            column.opclass, GIN_EXTRACTVALUE);
    for (auto& token : tokens) {
        key_fields->at(0) = std::make_shared<ConstTypeField<uint32_t>>(pos);
        key_fields->at(1) = std::make_shared<ConstTypeField<std::string>>(token);
        key_fields->at(2) = std::make_shared<ConstTypeField<uint64_t>>(internal_row_id);
        auto tuple = std::make_shared<FieldTuple>(key_fields, nullptr);
        if constexpr (op == IndexOperation::Insert)
            idx.first->insert(tuple);
        else
            idx.first->remove(tuple);
    }
}
```
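`index_mutation_handler()` selects insert or remove with `if constexpr`, so the choice is fixed at compile time for each instantiation. A self-contained sketch of that pattern, assuming an `IndexOperation` enum with `Insert` and `Remove` members (`Remove` is a guess; the excerpt only shows `Insert`) and a `std::set` standing in for the BTree:

```cpp theme={null}
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

enum class IndexOperation { Insert, Remove };

using GinEntry = std::tuple<uint32_t, std::string, uint64_t>;

// The operation is a template parameter, so `if constexpr` discards the
// unused branch and no runtime check on the operation remains.
template <IndexOperation op>
void apply_gin_tokens(std::set<GinEntry>& index, uint32_t position,
                      const std::vector<std::string>& tokens, uint64_t row_id) {
    for (const auto& token : tokens) {
        GinEntry entry{position, token, row_id};
        if constexpr (op == IndexOperation::Insert) {
            index.insert(std::move(entry));
        } else {
            index.erase(entry);
        }
    }
}
```

An update can then be expressed as a `Remove` instantiation for the old value followed by an `Insert` instantiation for the new one.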

### Schema Creation

GIN index schema defines three key columns:

```cpp theme={null}
// schema_helpers.cc - create_gin_index_schema()
SchemaColumn idx_position_c(constant::INDEX_POSITION_FIELD, 0, SchemaType::UINT32, 0, false);
SchemaColumn idx_gin_token_c(constant::INDEX_GIN_TOKEN_FIELD, 0, SchemaType::TEXT, 0, false);
SchemaColumn internal_row_id(constant::INTERNAL_ROW_ID, 0, SchemaType::UINT64, 0, false);

return base_schema->create_index_schema({},
    { idx_position_c, idx_gin_token_c, internal_row_id },
    gin_index_keys, extension_callback);
```
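Assuming the BTree orders keys lexicographically in the field order used above, `(position, token, internal_row_id)`, all row IDs for one token in one column are contiguous, so a lookup is a bounded range scan. A sketch over `std::set` (`rows_for_token` is hypothetical):

```cpp theme={null}
#include <cassert>
#include <cstdint>
#include <limits>
#include <set>
#include <string>
#include <tuple>
#include <vector>

using GinEntry = std::tuple<uint32_t, std::string, uint64_t>;

// Returns every internal_row_id stored under (position, token) by scanning
// from the smallest to the largest possible row id for that prefix.
std::vector<uint64_t> rows_for_token(const std::set<GinEntry>& index,
                                     uint32_t position, const std::string& token) {
    auto first = index.lower_bound(GinEntry{position, token, 0});
    auto last = index.upper_bound(
        GinEntry{position, token, std::numeric_limits<uint64_t>::max()});
    std::vector<uint64_t> rows;
    for (auto it = first; it != last; ++it) {
        rows.push_back(std::get<2>(*it));
    }
    return rows;
}
```

Putting the column position first keeps each column's entries in a separate key range, and putting the row ID last makes it the only field that varies within a token's range.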

### Trigram Extraction (Query)

Invokes the opclass `extractQuery` support function on the search pattern:

```cpp theme={null}
// trgm_query_helpers.cc - extract_gin_keys_from_string()
Oid procOid = get_opfamily_proc(opfamily, inputType, inputType, GIN_EXTRACTQUERY_PROC);

Datum *keys = (Datum *) DatumGetPointer(
    FunctionCall7Coll(&flinfo, collation, queryDatum,
        PointerGetDatum(&nkeys),
        UInt16GetDatum(op_strategy_number),
        PointerGetDatum(&partial_matches),
        PointerGetDatum(&extra_data),
        PointerGetDatum(&nullFlags),
        PointerGetDatum(&searchMode)));
```
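For a `LIKE` pattern, only the literal runs between wildcards can contribute trigrams, and padding applies only where the pattern pins a word boundary. The following is a simplified, ASCII-only approximation of that behavior, not the pg_trgm code: `like_pattern_trigrams` is hypothetical, escapes are ignored, and pg_trgm's exact rules may differ in edge cases.

```cpp theme={null}
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Assumptions: '%' and '_' are the only wildcards and escapes are ignored.
// A word is padded only on a side that is a known word boundary (pattern
// start/end or a non-alphanumeric literal), never next to a wildcard.
std::set<std::string> like_pattern_trigrams(const std::string& pattern) {
    std::set<std::string> trigrams;
    auto is_wildcard = [](char c) { return c == '%' || c == '_'; };
    auto is_word_char = [](char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0; };

    std::size_t i = 0;
    while (i < pattern.size()) {
        if (!is_word_char(pattern[i])) { ++i; continue; }

        std::size_t start = i;
        std::string word;
        while (i < pattern.size() && is_word_char(pattern[i])) {
            word.push_back(static_cast<char>(
                std::tolower(static_cast<unsigned char>(pattern[i]))));
            ++i;
        }
        bool left_boundary = start == 0 || !is_wildcard(pattern[start - 1]);
        bool right_boundary = i == pattern.size() || !is_wildcard(pattern[i]);

        std::string padded = (left_boundary ? "  " : "") + word + (right_boundary ? " " : "");
        for (std::size_t j = 0; j + 3 <= padded.size(); ++j) {
            trigrams.insert(padded.substr(j, 3));
        }
    }
    return trigrams;
}
```

Here `'%pattern%'` yields only `pat`, `att`, `tte`, `ter` and `ern`, while `'foo%'` also yields the padded `"  f"` and `" fo"`. A pattern such as `'%a%'` yields no trigrams at all, so the index cannot narrow that search.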

### GIN Iterator (in-progress)

Deduplicates rows and resolves each row's physical location:

```cpp theme={null}
// table.cc - GINSecondary::update_page()
while (true) {
    auto&& index_row = *_btree_i;
    internal_row_id = _internal_row_id_f->get_uint64(&index_row);

    // Skip already visited rows (same row may match multiple trigrams)
    if (!_visited_internal_row_ids.contains(internal_row_id)) {
        _visited_internal_row_ids.emplace(internal_row_id);
        break;
    }
    ++_btree_i;
}

// Resolve row location via look-aside index
_look_aside_key_fields->at(0) = std::make_shared<ConstTypeField<uint64_t>>(internal_row_id);
auto lookup_tuple = std::make_shared<FieldTuple>(_look_aside_key_fields, nullptr);
auto&& lookup_i = look_aside_index->lower_bound(lookup_tuple);
```
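The deduplicate-then-resolve pattern in isolation, as a sketch: a hash set tracks `internal_row_id`s already returned, and a hash map stands in for the look-aside index (the excerpt above uses a BTree `lower_bound` instead). `RowLocation`, its fields, and `resolve_unique_rows` are hypothetical.

```cpp theme={null}
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical physical location of a row.
struct RowLocation {
    uint64_t extent_id;
    uint32_t row_offset;
};

// `entries` lists the internal_row_id of each index entry in scan order;
// a row appears once per matching trigram, so duplicates are skipped.
std::vector<RowLocation> resolve_unique_rows(
        const std::vector<uint64_t>& entries,
        const std::unordered_map<uint64_t, RowLocation>& look_aside) {
    std::unordered_set<uint64_t> visited;
    std::vector<RowLocation> out;
    for (uint64_t row_id : entries) {
        // insert().second is false when this row was already returned
        if (!visited.insert(row_id).second) continue;
        auto it = look_aside.find(row_id);
        if (it != look_aside.end()) out.push_back(it->second);
    }
    return out;
}
```

The visited set grows with the number of distinct matching rows, which is the memory cost of deduplicating inside the iterator.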

### FDW Query Routing (in-progress)

Routes LIKE/ILIKE operators to GIN index:

```cpp theme={null}
// pg_fdw_mgr.cc - _iter_start()
case LIKE:
case ILIKE: {
    // Braces scope the locals so later case labels do not jump over their initialization
    auto search_key = tuple->to_string();
    // Opclass and collation (100 = DEFAULT_COLLATION_OID) are currently hardcoded
    auto tokens = extract_gin_keys_from_string(search_key, "gin_trgm_ops",
            100, TRGM_LIKE_STRATEGY_NUMBER);
    state->iter_start.emplace(state->table->begin(state->index->id,
            state->index_only_scan, tokens));
    state->iter_end.emplace(state->table->end(state->index->id, state->index_only_scan));
    break;
}

// pg_fdw_mgr.cc - _is_valid_qual()
case TEXTOID:
    return (op == EQUALS || op == NOT_EQUALS || op == LIKE || op == ILIKE);
```

***

## Path to Completion

The following work remains to complete GIN index support:

1. **Scan Implementation** - The `GINSecondary` iterator currently iterates all index entries. It needs to:
   * Filter entries to only those matching the extracted query tokens
   * Implement proper token intersection logic (all query trigrams must match)

2. **Query Optimization** - The FDW currently hardcodes the `gin_trgm_ops` opclass and the collation. Both should be derived from the index metadata.

3. **Testing** - End-to-end testing of LIKE/ILIKE queries using GIN indexes.
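The intersection rule in item 1 can be sketched as a per-row count of distinct matched query tokens. This is one possible approach, not the planned implementation: `rows_matching_all` is hypothetical and assumes the entries were already restricted to a single column position. Rows that pass are only candidates; because trigram matching is lossy, pg_trgm also has PostgreSQL recheck the `LIKE` predicate against the actual value.

```cpp theme={null}
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Keeps rows whose entries cover every query token. Each input pair stands
// in for a (token, internal_row_id) index entry for one column position.
std::vector<uint64_t> rows_matching_all(
        const std::vector<std::pair<std::string, uint64_t>>& entries,
        const std::set<std::string>& query_tokens) {
    std::map<uint64_t, std::set<std::string>> seen;  // row -> matched query tokens
    for (const auto& [token, row_id] : entries) {
        if (query_tokens.count(token)) seen[row_id].insert(token);
    }
    std::vector<uint64_t> result;
    for (const auto& [row_id, tokens] : seen) {
        if (tokens.size() == query_tokens.size()) result.push_back(row_id);
    }
    return result;  // candidates only; the LIKE predicate still needs a recheck
}
```

With an empty token set (for example `'%a%'`) this sketch returns no rows. As we understand it, pg_trgm handles that case by asking GIN to match every entry (`GIN_SEARCH_MODE_ALL`) and relying on the recheck, so an equivalent fallback would be needed.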
