Real Database Testing Strategy¶

Comprehensive testing approach using actual database systems and infrastructure

Overview¶

Testing database functionality requires real database systems rather than mocks or simulators. The mcp_ohmy_sql project implements a sophisticated testing strategy that provisions and manages actual database instances across multiple technologies, ensuring comprehensive validation of database operations, schema extraction, and query execution.

The testing strategy balances realism with practicality by using different approaches for different database types:

SQLite: Local files for fast, isolated testing
PostgreSQL: Docker containers for realistic SQL engine testing
AWS Redshift: Real cloud infrastructure for production-like validation

Key Principles:

Real Databases: Use actual database systems, not mocks
Automated Provisioning: Scripted setup and teardown of test environments
Reusable Fixtures: pytest fixtures that manage database lifecycle
Sample Data: Consistent Chinook dataset across all database types
Infrastructure as Code: CDK for cloud resource provisioning

Test Infrastructure Architecture¶

The testing system follows a layered fixture approach that coordinates database provisioning, data loading, and test execution:

Test Execution
├── pytest Fixtures (conftest.py)
│   ├── Database Provisioning
│   ├── Schema Setup
│   └── Data Loading
├── Sample Data System (chinook/)
│   ├── Data Models (SQLAlchemy ORM)
│   ├── Data Loader (JSON to DataFrame)
│   └── Data Files (Chinook dataset)
├── Database Setup Scripts
│   ├── setup_relational_database.py
│   └── setup_aws_redshift_database.py
└── Infrastructure (aws/stacks/)
    └── CDK Stack for AWS Resources

Reusable Fixture System: conftest.py¶

Purpose: Centralized pytest fixture management for database testing

Location: mcp_ohmy_sql/tests/conftest.py

The conftest.py file implements a sophisticated fixture hierarchy that manages the complete lifecycle of test databases:

Data Container Classes:

@dataclasses.dataclass
class SaEngineObjs:
    engine: sa.Engine
    metadata: sa.MetaData
    db_type: DbTypeEnum

These dataclasses provide structured access to database objects and metadata, enabling consistent testing patterns across different database types.

Fixture Categories:

In-Memory SQLite Fixtures: Fast, isolated fixtures for unit testing that create fresh SQLite databases for each test function.
Real Database Factories: Sophisticated factory fixtures that provision and configure actual database instances with proper cleanup.
Configuration and Adapter Fixtures: Class-scoped fixtures that provide configured adapter instances for integration testing.
AWS Redshift Fixtures: Cloud-based fixtures that manage real Redshift Serverless instances for production-like testing.

Key Fixture Patterns:

@pytest.fixture(scope="function")
def in_memory_sqlite_engine_objs():
    # Create fresh SQLite database for each test
    engine = sa.create_engine("sqlite:///:memory:")
    create_all_tables(engine=engine, metadata=Base.metadata, drop_first=False)
    create_all_views(engine=engine, db_type=DbTypeEnum.SQLITE)

    # Yield database for testing
    yield SaEngineObjs(engine=engine, metadata=metadata, db_type=DbTypeEnum.SQLITE)

    # Cleanup after test
    drop_all_views(engine=engine, db_type=DbTypeEnum.SQLITE)
    drop_all_tables(engine=engine, metadata=Base.metadata)

Sample Data System: chinook/¶

Purpose: Provide consistent test data across all database systems

Location: mcp_ohmy_sql/tests/chinook/

The Chinook sample dataset provides a realistic music store database that includes:

Multiple tables with foreign key relationships
Sample views for testing complex queries
Varied data types (strings, integers, decimals, dates)
Realistic data volume for performance testing

ChinookDataLoader: ChinookDataLoader

The data loader implements a sophisticated JSON-to-DataFrame pipeline:

class ChinookDataLoader:
    @cached_property
    def data(self) -> dict:
        return json.loads(path_ChinookData_json.read_text(encoding="utf-8"))

    def get_table_df(self, table_name: str) -> pl.DataFrame:
        # Load raw JSON data
        df = pl.DataFrame(self.data[table_name])

        # Apply database-specific type conversions
        for col_name, col in table.columns.items():
            if isinstance(col.type, sa.Integer):
                df = df.with_columns(pl.col(col_name).cast(pl.Int32))
            elif isinstance(col.type, sa.DateTime):
                df = df.with_columns(pl.col(col_name).str.strptime(...))

Benefits:

Type Safety: Automatic conversion between JSON and database types
Performance: Polars DataFrames for efficient data processing
Consistency: Same data across SQLite, PostgreSQL, and Redshift
Flexibility: Easy to add new tables or modify existing data

Relational Database Setup: setup_relational_database.py¶

Purpose: Automated setup for SQLAlchemy-compatible databases

Location: mcp_ohmy_sql/tests/setup_relational_database.py

Key Functions: create_all_tables(), create_all_views(), insert_all_data()

The relational setup script provides idempotent database provisioning:

Schema Management:

def create_all_tables(engine: sa.Engine, metadata: sa.MetaData, drop_first: bool = True):
    if drop_first:
        drop_all_tables(engine=engine, metadata=metadata)
    metadata.create_all(engine, checkfirst=True)

View Creation: Uses the per-SDK utilities to generate database-specific CREATE VIEW statements, handling SQL dialect differences across database systems.

Data Population: Leverages the ChinookDataLoader to populate tables with consistent sample data across different database types.

Benefits:

Database Agnostic: Works with SQLite, PostgreSQL, MySQL, etc.
Idempotent: Safe to run multiple times
Complete: Tables, views, and data in single operation
Fast: Optimized for test execution speed

AWS Redshift Setup: setup_aws_redshift_database.py¶

Purpose: Specialized setup for AWS Redshift cloud data warehouse

Location: mcp_ohmy_sql/tests/setup_aws_redshift_database.py

Key Functions: create_all_redshift_tables(), insert_all_data_to_redshift()

AWS Redshift requires specialized handling due to its cloud-native architecture:

Table Creation: Uses Redshift-specific SQL DDL statements optimized for columnar storage and distributed architecture.

Data Loading Strategy:

def insert_all_data_to_redshift(conn_or_engine: T_CONN_OR_ENGINE):
    for table in Base.metadata.sorted_tables:
        insert_data_to_one_table(conn_or_engine=conn_or_engine, table=table)

Dual Loading Methods:

Direct INSERT: Fast for small datasets using parameterized queries
S3 COPY: Scalable for large datasets using S3 staging and COPY commands

Cloud Integration: - S3 Staging: Uses S3 for efficient bulk data loading - IAM Roles: Proper authentication between Redshift and S3 - Parquet Format: Optimized data format for Redshift ingestion

Benefits:

Production-Like: Tests against real Redshift infrastructure
Performance: Optimized loading strategies for different data sizes
Security: Proper IAM role-based authentication
Scalability: Handles both small test datasets and larger validation data

Configuration-Driven Testing: test_adapter.py¶

Purpose: Demonstrate real-world adapter usage with actual databases

Location: mcp_ohmy_sql/tests/test_adapter.py

The test adapter module shows how the complete system integrates:

from .test_config import DatabaseEnum, test_config
test_adapter = Adapter(config=test_config)

# Provision SQLite database
sqlite_database = DatabaseEnum.chinook_sqlite
setup_relational_database(
    engine=sqlite_database.connection.sa_engine,
    metadata=Base.metadata,
    db_type=sqlite_database.db_type_enum,
)

Integration Benefits:

Real Configuration: Uses actual configuration objects from test_config.py
Multiple Databases: Can test across SQLite, PostgreSQL, and Redshift simultaneously
Adapter Testing: Validates the complete configuration → adapter → database pipeline
Production Simulation: Tests the exact same code paths used in production

Infrastructure as Code: aws/stacks/¶

Purpose: Provision real AWS infrastructure for Redshift testing

Location: mcp_ohmy_sql/tests/aws/stacks/mcp_ohmy_sql_stack/

CDK Infrastructure: Stack

The CDK stack provisions complete AWS infrastructure for testing:

Infrastructure Components:

def create_workgroup(self):
    self.workgroup = redshiftserverless.CfnWorkgroup(
        workgroup_name=self.workgroup_name,
        namespace_name=self.namespace_name,
        base_capacity=8,  # minimal capacity 8 RPUs
        publicly_accessible=True,
        subnet_ids=subnet_ids,
        security_group_ids=[self.sg.security_group_id],
    )

Security Configuration: - VPC Integration: Uses existing VPC with proper subnet configuration - Security Groups: Restricts access to developer IP addresses - IAM Roles: Proper permissions for Redshift to access S3

Cost Optimization: - Serverless: Pay-per-use pricing model - Minimal Capacity: 8 RPU minimum for cost control - Automatic Cleanup: CDK removal policies for resource cleanup

Benefits:

Reproducible: Infrastructure defined as code
Secure: Proper network and IAM security
Cost-Effective: Minimal resource usage for testing
Realistic: Production-like infrastructure for validation

Testing Workflow Integration¶

The complete testing workflow demonstrates how all components work together:

Local Development: 1. Run unit tests with in-memory SQLite fixtures 2. Validate against local PostgreSQL container 3. Execute integration tests with real AWS Redshift

CI/CD Pipeline: 1. Provision AWS infrastructure using CDK 2. Run test suite against all database types 3. Clean up infrastructure after testing

Test Categories:

Unit Tests: Fast tests using in-memory SQLite with fixture-managed lifecycle
Integration Tests: Medium-speed tests using containerized PostgreSQL for realistic SQL engine behavior
End-to-End Tests: Comprehensive tests using real AWS Redshift for production validation
Performance Tests: Load testing using the full Chinook dataset across all database types

Architecture Benefits¶

For Development:

Fast Feedback: In-memory SQLite for rapid unit testing
Realistic Testing: Actual database engines catch real-world issues
Multi-Database: Validates compatibility across different systems
Automated Setup: No manual database configuration required

For Quality Assurance:

Production Parity: Tests against same infrastructure as production
Comprehensive Coverage: All database types and operations tested
Performance Validation: Real query performance measurement
Security Testing: Actual authentication and authorization mechanisms

For Operations:

Infrastructure Validation: CDK ensures consistent environment provisioning
Cost Control: Automated cleanup prevents resource waste
Monitoring: Real AWS metrics and logging
Scalability Testing: Validate performance under realistic loads

The real database testing strategy ensures that the MCP server works reliably across diverse database environments while maintaining development velocity through intelligent use of different testing approaches for different scenarios.