Documentation Index
Fetch the complete documentation index at: https://docs.springtail.io/llms.txt
Use this file to discover all available pages before exploring further.
Schema Management Architecture
Table of Contents
- Overview
- System Tables
- Schema Information Management
- Caching Architecture
- System Table Manager
- Data Flow
- Key Design Patterns
Overview
The Springtail schema management system provides versioned metadata tracking for all database objects including tables, schemas, indexes, and user-defined types. The architecture is designed to support:- Multi-Version Concurrency Control (MVCC): Every metadata entry is versioned by transaction ID (XID) and Log Sequence Number (LSN)
- Multi-Process Caching: Shared memory caches enable efficient metadata sharing across processes
- DDL Evolution Tracking: Complete history of schema changes over time
- Client-Server Architecture: Centralized management via gRPC with distributed caching
- System Tables: Persistent storage of metadata on disk
- Caching Layers: Multi-level caches (in-process and shared memory)
- System Table Manager: Service layer managing reads, writes, and synchronization
System Tables
System tables are special metadata tables that track information about user tables, schemas, and indexes. They are stored on disk and managed by the storage engine like regular tables, but contain metadata rather than user data.The Eight System Tables
Defined ininclude/sys_tbl_mgr/system_tables.hh:
1. TableNames (ID: 1)
Tracks all tables in the database. Data Columns:namespace_id- Schema/namespace IDname- Table nametable_id- Unique table identifierxid,lsn- Transaction versionexists- Deletion marker (soft delete)parent_table_id- For partitioned tablespartition_key- Partition key expressionpartition_bound- Partition boundsrls_enabled- Row-level security flagrls_forced- Force RLS for table ownerinternal_row_id- Internal identifier
(table_id, xid, lsn)
Secondary Index: (namespace_id, name, xid, lsn) - For lookup by qualified name
2. TableRoots (ID: 2)
Stores B-tree root extent IDs and table statistics at each XID. Data Columns:table_id- Table identifierindex_id- Index identifier (0 = primary)xid- Transaction versionextent_id- Root extent ID of the B-treesnapshot_xid- Snapshot identifier for consistencyend_offset- Data file offset after syncinternal_row_id- Internal identifier
(table_id, index_id, xid)
Purpose: Enables accessing a table’s B-tree root at any historical XID.
3. Indexes (ID: 3)
Maps which columns participate in which indexes. Data Columns:table_id- Table identifierindex_id- Index identifierxid,lsn- Transaction versionposition- Position in index (0-based)column_id- Column position in tableinternal_row_id- Internal identifier
(table_id, index_id, xid, lsn, position)
Purpose: Multiple rows per index define the column ordering.
4. Schemas (ID: 4)
Column definitions for all tables. Data Columns:table_id- Table identifierposition- Column position (can have gaps)xid,lsn- Transaction versionexists- Column active/dropped flagname- Column nametype- Springtail internal type (SchemaType)pg_type- PostgreSQL type OIDnullable- NULL constraintdefault- Default value expressionupdate_type- Type of change (ADD, DROP, MODIFY)internal_row_id- Internal identifier
(table_id, position, xid, lsn)
Purpose: Tracks column history, allowing schema evolution tracking.
5. TableStats (ID: 5)
Table statistics at each XID. Data Columns:table_id- Table identifierxid- Transaction versionrow_count- Number of rowslast_internal_row_id- Last assigned row IDinternal_row_id- Internal identifier
(table_id, xid)
Purpose: Used for query planning and optimization.
6. IndexNames (ID: 6)
Secondary index metadata and lifecycle state. Data Columns:table_id- Table identifierindex_id- Index identifierxid,lsn- Transaction versionnamespace_id- Schema/namespace IDname- Index namestate- Lifecycle state (NOT_READY, READY, DELETED, BEING_DELETED)is_unique- Uniqueness constraintinternal_row_id- Internal identifier
(table_id, index_id, xid, lsn)
Purpose: Tracks index build status and metadata.
7. NamespaceNames (ID: 7)
Database schemas/namespaces from PostgreSQL. Data Columns:namespace_id- Unique namespace identifiername- Namespace name (e.g., “public”)xid,lsn- Transaction versionexists- Deletion markerinternal_row_id- Internal identifier
(namespace_id, xid, lsn)
Secondary Index: (name, xid, lsn) - For lookup by name
8. UserTypes (ID: 8)
User-defined types (primarily enums). Data Columns:type_id- Type identifiernamespace_id- Schema containing the typename- Type namevalue- JSON-encoded type valuesxid,lsn- Transaction versiontype- Type category (‘E’ for enum)exists- Deletion markerinternal_row_id- Internal identifier
(type_id, xid, lsn)
Purpose: Stores enum values and other user-defined type information.
Key Concepts
XID/LSN Versioning:- Every metadata entry is stamped with
(xid, lsn)pair - Enables querying metadata at any historical transaction
- Critical for MVCC and consistency
- The
existsflag marks objects as deleted without physical removal - Allows historical queries to see deleted objects at past XIDs
- Physical cleanup can happen during maintenance
- Stored in TableRoots and TableNames for partitioned tables
- Represents the XID at which table data was synchronized
- Used for schema version matching during data reads
Schema Information Management
Schema information describes the structure of a table (columns, types, indexes) at a specific point in time.SchemaMetadata Structure
Defined ininclude/storage/schema.hh:
SchemaColumn
Each column is described by:- Identity:
name,position,xid,lsn - Type Information:
type(internal),pg_type(PostgreSQL OID) - Constraints:
nullable,pkey_position,default_value - Type Metadata:
type_name,type_namespace,collation,type_category - Lifecycle:
exists,update_type
Index Structure
Index::Column contains:
idx_position- Position in index (0-based)position- Column position in table
Schema Change Tracking
Theupdate_type field tracks the nature of each change:
NEW_COLUMN- Column addedREMOVE_COLUMN- Column droppedNAME_CHANGE- Column renamedNULLABLE_CHANGE- NULL constraint changedRESYNC- Table resynchronizedNEW_INDEX- Index addedDROP_INDEX- Index droppedNO_CHANGE- No modification
- The
historyvector inSchemaMetadatatracks all changes in XID order - Each entry represents a schema modification event
- Enables replaying schema evolution for synchronization
Caching Architecture
The system employs a multi-level caching strategy to minimize disk I/O and RPC overhead.1. Shared Memory Cache (ShmCache)
Location:include/sys_tbl_mgr/shm_cache.hh, src/sys_tbl_mgr/shm_cache.cc
Purpose: Cross-process caching of serialized metadata using Boost interprocess shared memory.
Architecture
Technology Stack:boost::interprocess::managed_shared_memory- Shared memory segmentboost::interprocess::named_sharable_mutex- Cross-process lockingboost::multi_index_container- LRU eviction tracking
SHM_CACHE_ROOTS("springtail.roots") - Table roots and statisticsSHM_CACHE_SCHEMAS("springtail.schemas") - Schema metadataSHM_CACHE_USERTYPES("springtail.usertypes") - User-defined typesSHM_CACHE_TABLE_IDS("springtail.table_ids") - Table ID lookupsSHM_CACHE_EXTENTS("springtail.extents") - Extent metadata
Cache Data Structure
- Serialized Storage: Stores protobuf-serialized messages for portability
- XID Versioning: Multiple versions per object, sorted by XID/LSN
- LRU Eviction: Automatically evicts least-recently-used entries when memory fills
- Dropped Markers: Can mark objects as dropped without removal
- Memory Management: Auto-evicts when free memory falls below 30% (target: 50%)
XID Lifecycle Management
Committed XID Tracking:- Tracks the last committed XID per database
- Records whether schema changes occurred
- Maintains XID history to prevent accessing stale schemas
- Must call
keep_alive()orupdate_committed_xid()every 60ms - Prevents using stale committed XIDs from crashed processes
- Timestamp-based freshness checking
- Tracks XIDs that have modified metadata but haven’t yet committed
- Used during crash recovery and consistency checks
XID History
Optionally tracks schema change history:- Survives
finalize()to track schema evolution - Used to find appropriate committed XID for a given schema XID
- Cleaned up via
cleanup_xid_history()
2. In-Process Schema Cache (SchemaCache)
Location:include/sys_tbl_mgr/schema_cache.hh, src/sys_tbl_mgr/schema_cache.cc
Purpose: In-process LRU cache of constructed SchemaMetadata objects (not serialized).
Architecture
Cache Entry:(db_id, table_id)
Capacity: Default 128 entries (configurable)
Features
- LRU Eviction: Removes least-recently-used schemas when capacity exceeded
- Lazy Loading: Populates on demand via callback function
- Invalidation: Marks schemas invalid when DDL detected
- Index Mapping: Tracks
(db_id, index_id) -> table_idfor drop-index invalidation - Concurrent Fetch Coordination: Uses condition variables to prevent duplicate fetches
Invalidation Strategies
Table Invalidation:- Marks the schema entry as ending at the provided XID
- Future accesses beyond this XID will trigger re-fetch
- Uses index-to-table mapping to find affected table
- Invalidates table schema
- Invalidates all tables in the database
- Called when DDL changes detected at FDW level
3. Generic Message Cache (MsgCache)
Location:include/sys_tbl_mgr/msg_cache.hh
Purpose: Template-based foundation for ShmCache, providing generic serialized message caching.
Design Pattern:
- Uses traits-based design for customization
- Supports any allocator (regular or shared memory)
- LRU eviction via Boost multi-index container
- Thread-safe via template mutex parameter
Caching Strategy Summary
| Cache Level | Scope | Data Format | Eviction | Use Case |
|---|---|---|---|---|
| ShmCache | Cross-process | Protobuf serialized | LRU + memory threshold | Share metadata between FDW workers |
| SchemaCache | Single process | Constructed objects | LRU (128 entries) | Fast in-memory access to schemas |
| Server Uncommitted Caches | Server only | Native structures | Manual (on finalize) | Track pending DDL changes |
- Client checks in-process SchemaCache
- On miss, checks ShmCache
- On miss, RPCs to server
- Server checks uncommitted caches, then disk
- Response propagates back: Server → SHM → SchemaCache → Client
System Table Manager
Thesys_tbl_mgr is a gRPC service that manages reading and writing system tables.
Architecture - Two Modes
Server-Side (SystemTableMgrServer)
Location: include/sys_tbl_mgr/server.hh, src/sys_tbl_mgr/server.cc
Purpose: Manages read/write access to system tables within the server process.
Key Responsibilities:
- Handle CREATE/ALTER/DROP table/index/namespace/type operations
- Maintain uncommitted caches for pending transactions
- Persist system tables to disk on finalize
- Manage XID progression and synchronization
- Serve gRPC requests from clients
Client-Side (SystemTableMgrClient)
Location: include/sys_tbl_mgr/client.hh, src/sys_tbl_mgr/client.cc
Purpose: Remote access to system tables via gRPC from FDW processes.
Key Responsibilities:
- Proxy read requests to server
- Maintain local caches (SchemaCache + SHM caches)
- Handle invalidation notifications
- Coordinate with multiple worker processes
gRPC Service Definition
Location:proto/sys_tbl_mgr.proto
Ping: Health checkGetRoots: Fetch table roots and stats at XIDGetSchema: Fetch table schema at XIDGetTargetSchema: Fetch schema with change history between XIDsExists: Check if table exists at XIDGetUserType: Fetch user-defined type at XID
Server-Side Uncommitted Caches
The server maintains multiple in-memory caches for uncommitted DDL changes:1. Table Cache (_table_cache)
DB → Table ID → XID/LSN → TableCacheRecord
Purpose: Track table metadata during CREATE/ALTER TABLE before commit.
2. Roots Cache (_roots_cache)
DB → Table ID → XID/LSN → RootsCacheRecord
Purpose: Track table roots and statistics during data synchronization.
3. Schema Cache (_schema_cache)
DB → Table ID → Column ID → vector<ColumnHistory>
Purpose: Track column additions, drops, and modifications during ALTER TABLE.
4. Index Cache (_index_cache)
DB → Table ID → Index ID → vector<IndexCacheItem>
Purpose: Track index definitions during CREATE/DROP INDEX.
5. Namespace Caches
_namespace_id_cache: MapDB → Namespace ID → XID/LSN → NamespaceRecord_namespace_name_cache: MapDB → Namespace Name → XID/LSN → NamespaceRecord
6. User Type Cache (_usertype_id_cache)
Map: DB → Type ID → XID/LSN → UserTypeRecord
Purpose: Track user-defined type changes during CREATE/ALTER/DROP TYPE.
7. Table Existence Cache (_table_existence_cache)
Special Property: Persists across finalize() calls
DB → Table ID → vector<TableExistenceRange>
Purpose:
- Track table lifecycle across drops and recreates (resync operations)
- Survives finalize to support historical queries
- Protected by dedicated
_table_existence_cache_mutex
Server Lifecycle Operations
DDL Operations
- Populate uncommitted caches
- Do NOT write to disk
- Return DDL JSON for DDL manager
Synchronization Operations
Transaction Finalization
- Write all uncommitted caches to system tables on disk
- Flush system tables to disk (if
call_sync=trueor via separatesync()) - Update committed XID in SHM caches with schema change flags
- Clear uncommitted caches (except
_table_existence_cache)
finalize() can be called without sync() for async persistence.
Transaction Abort
- Discard all uncommitted changes for the given XID
- Clear entries from all uncommitted caches
- No disk writes
Cache Invalidation
Client-Side Operations
TheClient singleton provides a simplified interface:
- Check in-process SchemaCache
- On miss, check SHM cache
- On miss, issue gRPC call to server
- Cache response in both SHM and SchemaCache
- Return to caller
Data Flow
Schema Read Flow (FDW → Server)
Table Roots Read Flow
DDL Write Flow (CREATE TABLE)
Cache Invalidation Flow
When DDL changes occur, caches must be invalidated:- Server updates SHM cache during
finalize()with new schema - Clients invalidate local SchemaCache when notified
- Next access automatically fetches new schema from SHM or server
- Multi-process coordination via SHM ensures consistency
Key Design Patterns
1. Two-Level Client Caching
Pattern: Client maintains both SHM cache (cross-process, serialized) and SchemaCache (in-process, deserialized). Benefits:- SHM cache enables sharing between FDW worker processes
- SchemaCache provides fast in-memory access without deserialization
- Reduces RPC overhead significantly
2. Uncommitted vs. Committed Data Separation
Pattern: Server maintains separate caches for uncommitted DDL changes. Benefits:- Allows querying pending changes without disk I/O
- Clean separation between in-flight and committed metadata
- Enables atomic commit via
finalize()
- Uncommitted:
_table_cache,_schema_cache,_index_cache, etc. - Committed: System tables on disk, SHM caches
- Cleared on
finalize()orrevert()
3. XID/LSN Versioning
Pattern: Every metadata entry tagged with(xid, lsn).
Benefits:
- Enables querying schema at any historical point (MVCC)
- Supports concurrent transactions without locking
- Critical for consistency in distributed system
4. LRU Eviction with XID History Preservation
Pattern: SHM cache uses LRU for memory management but preserves XID history. Benefits:- Prevents memory exhaustion
- Retains critical XID commit information
- Allows detecting dropped objects without full history
- Message data evicted via LRU
- XID history retained via
_xid_history_map keep_alive()ensures timestamp freshness
5. Lazy Evaluation
Pattern: SchemaCache populates on-demand via callback. Benefits:- Defers expensive construction until needed
- Allows server to control fetch logic
- Reduces memory footprint for unused schemas
6. Persistent Table Existence Cache
Pattern:_table_existence_cache survives finalize() calls.
Benefits:
- Supports resync operations (drop and recreate table)
- Enables fast existence checks without disk I/O
- Tracks complete table lifecycle across multiple creation/deletion cycles
[start_xid, end_xid).
7. Index-to-Table Mapping
Pattern: SchemaCache maintains(db, index_id) -> table_id mapping.
Benefits:
- Enables invalidation during DROP INDEX when table_id not provided
- PostgreSQL event trigger doesn’t provide table_id for index drops
- Efficient schema cache invalidation
8. Soft Deletes with exists Flag
Pattern: Mark objects as deleted with exists=false rather than physical deletion.
Benefits:
- Historical queries can see deleted objects at past XIDs
- Simplifies MVCC implementation
- Avoids complex deletion cascade logic
9. Write-Through Caching
Pattern: Server writes to both uncommitted cache AND system tables simultaneously. Benefits:- Ensures consistency between cache and disk
- Simplifies finalize logic (just flush to disk)
- Enables fast queries during transaction
10. XID Heartbeat Mechanism
Pattern: Require periodickeep_alive() calls to validate committed XIDs.
Benefits:
- Prevents using stale XIDs from crashed processes
- Simple liveness detection
- No complex distributed consensus required
XID_KEEP_ALIVE_PERIOD = 60msis_alive()checks timestamp freshnessget_committed_xid()fails if not alive
Appendix: Key File Reference
Core Headers
include/sys_tbl_mgr/system_tables.hh- System table schemas and helper classesinclude/sys_tbl_mgr/shm_cache.hh- Shared memory cache interfaceinclude/sys_tbl_mgr/schema_cache.hh- In-process schema cacheinclude/sys_tbl_mgr/server.hh- Server-side managementinclude/sys_tbl_mgr/client.hh- Client-side RPC interfaceinclude/sys_tbl_mgr/table.hh- Table interface and SchemaMetadata definitioninclude/sys_tbl_mgr/msg_cache.hh- Generic message cache templateinclude/storage/schema.hh- SchemaColumn, Index, SchemaMetadata definitions
Implementations
src/sys_tbl_mgr/system_tables.cc- System table schema definitionssrc/sys_tbl_mgr/shm_cache.cc- Shared memory cache implementationsrc/sys_tbl_mgr/schema_cache.cc- In-process cache implementationsrc/sys_tbl_mgr/server.cc- Server logic and DDL handlingsrc/sys_tbl_mgr/client.cc- Client logic and RPC implementation
Protocol Definitions
proto/sys_tbl_mgr.proto- gRPC service and message definitions
Summary
The Springtail schema management architecture provides a robust, versioned metadata system supporting MVCC, DDL evolution tracking, and efficient multi-process access. Key strengths include:- Multi-Version Concurrency Control: Every metadata entry versioned by XID/LSN
- Three-Tier Caching: SHM (cross-process) → SchemaCache (in-process) → Disk
- Uncommitted Change Tracking: Server-side caches for pending transactions
- Historical Queries: Access schema at any past XID via version tracking
- Efficient Invalidation: Targeted cache invalidation on DDL changes
- Cross-Process Coordination: Shared memory enables worker process efficiency