Real Database Testing Strategy

Comprehensive testing approach using actual database systems and infrastructure

Overview

Testing database functionality requires real database systems rather than mocks or simulators. The mcp_ohmy_sql project implements a sophisticated testing strategy that provisions and manages actual database instances across multiple technologies, ensuring comprehensive validation of database operations, schema extraction, and query execution.

The testing strategy balances realism with practicality by using different approaches for different database types:

  • SQLite: Local files for fast, isolated testing

  • PostgreSQL: Docker containers for realistic SQL engine testing

  • AWS Redshift: Real cloud infrastructure for production-like validation

Key Principles:

  • Real Databases: Use actual database systems, not mocks

  • Automated Provisioning: Scripted setup and teardown of test environments

  • Reusable Fixtures: pytest fixtures that manage database lifecycle

  • Sample Data: Consistent Chinook dataset across all database types

  • Infrastructure as Code: CDK for cloud resource provisioning

Test Infrastructure Architecture

The testing system follows a layered fixture approach that coordinates database provisioning, data loading, and test execution:

Test Execution
├── pytest Fixtures (conftest.py)
│   ├── Database Provisioning
│   ├── Schema Setup
│   └── Data Loading
├── Sample Data System (chinook/)
│   ├── Data Models (SQLAlchemy ORM)
│   ├── Data Loader (JSON to DataFrame)
│   └── Data Files (Chinook dataset)
├── Database Setup Scripts
│   ├── setup_relational_database.py
│   └── setup_aws_redshift_database.py
└── Infrastructure (aws/stacks/)
    └── CDK Stack for AWS Resources

Reusable Fixture System: conftest.py

Purpose: Centralized pytest fixture management for database testing

Location: mcp_ohmy_sql/tests/conftest.py

The conftest.py file implements a sophisticated fixture hierarchy that manages the complete lifecycle of test databases:

Data Container Classes:

@dataclasses.dataclass
class SaEngineObjs:
    engine: sa.Engine
    metadata: sa.MetaData
    db_type: DbTypeEnum

These dataclasses provide structured access to database objects and metadata, enabling consistent testing patterns across different database types.

Fixture Categories:

In-Memory SQLite Fixtures

Fast, isolated fixtures for unit testing that create fresh SQLite databases for each test function.

Real Database Factories

Sophisticated factory fixtures that provision and configure actual database instances with proper cleanup.

Configuration and Adapter Fixtures

Class-scoped fixtures that provide configured adapter instances for integration testing.

AWS Redshift Fixtures

Cloud-based fixtures that manage real Redshift Serverless instances for production-like testing.

Key Fixture Patterns:

@pytest.fixture(scope="function")
def in_memory_sqlite_engine_objs():
    # Create fresh SQLite database for each test
    engine = sa.create_engine("sqlite:///:memory:")
    create_all_tables(engine=engine, metadata=Base.metadata, drop_first=False)
    create_all_views(engine=engine, db_type=DbTypeEnum.SQLITE)

    # Yield database for testing
    yield SaEngineObjs(engine=engine, metadata=metadata, db_type=DbTypeEnum.SQLITE)

    # Cleanup after test
    drop_all_views(engine=engine, db_type=DbTypeEnum.SQLITE)
    drop_all_tables(engine=engine, metadata=Base.metadata)

Sample Data System: chinook/

Purpose: Provide consistent test data across all database systems

Location: mcp_ohmy_sql/tests/chinook/

The Chinook sample dataset provides a realistic music store database that includes:

  • Multiple tables with foreign key relationships

  • Sample views for testing complex queries

  • Varied data types (strings, integers, decimals, dates)

  • Realistic data volume for performance testing

ChinookDataLoader: ChinookDataLoader

The data loader implements a sophisticated JSON-to-DataFrame pipeline:

class ChinookDataLoader:
    @cached_property
    def data(self) -> dict:
        return json.loads(path_ChinookData_json.read_text(encoding="utf-8"))

    def get_table_df(self, table_name: str) -> pl.DataFrame:
        # Load raw JSON data
        df = pl.DataFrame(self.data[table_name])

        # Apply database-specific type conversions
        for col_name, col in table.columns.items():
            if isinstance(col.type, sa.Integer):
                df = df.with_columns(pl.col(col_name).cast(pl.Int32))
            elif isinstance(col.type, sa.DateTime):
                df = df.with_columns(pl.col(col_name).str.strptime(...))

Benefits:

  • Type Safety: Automatic conversion between JSON and database types

  • Performance: Polars DataFrames for efficient data processing

  • Consistency: Same data across SQLite, PostgreSQL, and Redshift

  • Flexibility: Easy to add new tables or modify existing data

Relational Database Setup: setup_relational_database.py

Purpose: Automated setup for SQLAlchemy-compatible databases

Location: mcp_ohmy_sql/tests/setup_relational_database.py

Key Functions: create_all_tables(), create_all_views(), insert_all_data()

The relational setup script provides idempotent database provisioning:

Schema Management:

def create_all_tables(engine: sa.Engine, metadata: sa.MetaData, drop_first: bool = True):
    if drop_first:
        drop_all_tables(engine=engine, metadata=metadata)
    metadata.create_all(engine, checkfirst=True)

View Creation: Uses the per-SDK utilities to generate database-specific CREATE VIEW statements, handling SQL dialect differences across database systems.

Data Population: Leverages the ChinookDataLoader to populate tables with consistent sample data across different database types.

Benefits:

  • Database Agnostic: Works with SQLite, PostgreSQL, MySQL, etc.

  • Idempotent: Safe to run multiple times

  • Complete: Tables, views, and data in single operation

  • Fast: Optimized for test execution speed

AWS Redshift Setup: setup_aws_redshift_database.py

Purpose: Specialized setup for AWS Redshift cloud data warehouse

Location: mcp_ohmy_sql/tests/setup_aws_redshift_database.py

Key Functions: create_all_redshift_tables(), insert_all_data_to_redshift()

AWS Redshift requires specialized handling due to its cloud-native architecture:

Table Creation: Uses Redshift-specific SQL DDL statements optimized for columnar storage and distributed architecture.

Data Loading Strategy:

def insert_all_data_to_redshift(conn_or_engine: T_CONN_OR_ENGINE):
    for table in Base.metadata.sorted_tables:
        insert_data_to_one_table(conn_or_engine=conn_or_engine, table=table)

Dual Loading Methods:

  1. Direct INSERT: Fast for small datasets using parameterized queries

  2. S3 COPY: Scalable for large datasets using S3 staging and COPY commands

Cloud Integration: - S3 Staging: Uses S3 for efficient bulk data loading - IAM Roles: Proper authentication between Redshift and S3 - Parquet Format: Optimized data format for Redshift ingestion

Benefits:

  • Production-Like: Tests against real Redshift infrastructure

  • Performance: Optimized loading strategies for different data sizes

  • Security: Proper IAM role-based authentication

  • Scalability: Handles both small test datasets and larger validation data

Configuration-Driven Testing: test_adapter.py

Purpose: Demonstrate real-world adapter usage with actual databases

Location: mcp_ohmy_sql/tests/test_adapter.py

The test adapter module shows how the complete system integrates:

from .test_config import DatabaseEnum, test_config
test_adapter = Adapter(config=test_config)

# Provision SQLite database
sqlite_database = DatabaseEnum.chinook_sqlite
setup_relational_database(
    engine=sqlite_database.connection.sa_engine,
    metadata=Base.metadata,
    db_type=sqlite_database.db_type_enum,
)

Integration Benefits:

  • Real Configuration: Uses actual configuration objects from test_config.py

  • Multiple Databases: Can test across SQLite, PostgreSQL, and Redshift simultaneously

  • Adapter Testing: Validates the complete configuration → adapter → database pipeline

  • Production Simulation: Tests the exact same code paths used in production

Infrastructure as Code: aws/stacks/

Purpose: Provision real AWS infrastructure for Redshift testing

Location: mcp_ohmy_sql/tests/aws/stacks/mcp_ohmy_sql_stack/

CDK Infrastructure: Stack

The CDK stack provisions complete AWS infrastructure for testing:

Infrastructure Components:

def create_workgroup(self):
    self.workgroup = redshiftserverless.CfnWorkgroup(
        workgroup_name=self.workgroup_name,
        namespace_name=self.namespace_name,
        base_capacity=8,  # minimal capacity 8 RPUs
        publicly_accessible=True,
        subnet_ids=subnet_ids,
        security_group_ids=[self.sg.security_group_id],
    )

Security Configuration: - VPC Integration: Uses existing VPC with proper subnet configuration - Security Groups: Restricts access to developer IP addresses - IAM Roles: Proper permissions for Redshift to access S3

Cost Optimization: - Serverless: Pay-per-use pricing model - Minimal Capacity: 8 RPU minimum for cost control - Automatic Cleanup: CDK removal policies for resource cleanup

Benefits:

  • Reproducible: Infrastructure defined as code

  • Secure: Proper network and IAM security

  • Cost-Effective: Minimal resource usage for testing

  • Realistic: Production-like infrastructure for validation

Testing Workflow Integration

The complete testing workflow demonstrates how all components work together:

Local Development: 1. Run unit tests with in-memory SQLite fixtures 2. Validate against local PostgreSQL container 3. Execute integration tests with real AWS Redshift

CI/CD Pipeline: 1. Provision AWS infrastructure using CDK 2. Run test suite against all database types 3. Clean up infrastructure after testing

Test Categories:

Unit Tests

Fast tests using in-memory SQLite with fixture-managed lifecycle

Integration Tests

Medium-speed tests using containerized PostgreSQL for realistic SQL engine behavior

End-to-End Tests

Comprehensive tests using real AWS Redshift for production validation

Performance Tests

Load testing using the full Chinook dataset across all database types

Architecture Benefits

For Development:

  • Fast Feedback: In-memory SQLite for rapid unit testing

  • Realistic Testing: Actual database engines catch real-world issues

  • Multi-Database: Validates compatibility across different systems

  • Automated Setup: No manual database configuration required

For Quality Assurance:

  • Production Parity: Tests against same infrastructure as production

  • Comprehensive Coverage: All database types and operations tested

  • Performance Validation: Real query performance measurement

  • Security Testing: Actual authentication and authorization mechanisms

For Operations:

  • Infrastructure Validation: CDK ensures consistent environment provisioning

  • Cost Control: Automated cleanup prevents resource waste

  • Monitoring: Real AWS metrics and logging

  • Scalability Testing: Validate performance under realistic loads

The real database testing strategy ensures that the MCP server works reliably across diverse database environments while maintaining development velocity through intelligent use of different testing approaches for different scenarios.