Real Database Testing Strategy¶
Comprehensive testing approach using actual database systems and infrastructure
Overview¶
Testing database functionality requires real database systems rather than mocks or simulators. The mcp_ohmy_sql project implements a sophisticated testing strategy that provisions and manages actual database instances across multiple technologies, ensuring comprehensive validation of database operations, schema extraction, and query execution.
The testing strategy balances realism with practicality by using different approaches for different database types:
SQLite: Local files for fast, isolated testing
PostgreSQL: Docker containers for realistic SQL engine testing
AWS Redshift: Real cloud infrastructure for production-like validation
Key Principles:
Real Databases: Use actual database systems, not mocks
Automated Provisioning: Scripted setup and teardown of test environments
Reusable Fixtures: pytest fixtures that manage database lifecycle
Sample Data: Consistent Chinook dataset across all database types
Infrastructure as Code: CDK for cloud resource provisioning
Test Infrastructure Architecture¶
The testing system follows a layered fixture approach that coordinates database provisioning, data loading, and test execution:
Test Execution
├── pytest Fixtures (conftest.py)
│ ├── Database Provisioning
│ ├── Schema Setup
│ └── Data Loading
├── Sample Data System (chinook/)
│ ├── Data Models (SQLAlchemy ORM)
│ ├── Data Loader (JSON to DataFrame)
│ └── Data Files (Chinook dataset)
├── Database Setup Scripts
│ ├── setup_relational_database.py
│ └── setup_aws_redshift_database.py
└── Infrastructure (aws/stacks/)
└── CDK Stack for AWS Resources
Reusable Fixture System: conftest.py¶
Purpose: Centralized pytest fixture management for database testing
Location: mcp_ohmy_sql/tests/conftest.py
The conftest.py file implements a sophisticated fixture hierarchy that manages the complete lifecycle of test databases:
Data Container Classes:
@dataclasses.dataclass
class SaEngineObjs:
engine: sa.Engine
metadata: sa.MetaData
db_type: DbTypeEnum
These dataclasses provide structured access to database objects and metadata, enabling consistent testing patterns across different database types.
Fixture Categories:
- In-Memory SQLite Fixtures
Fast, isolated fixtures for unit testing that create fresh SQLite databases for each test function.
- Real Database Factories
Sophisticated factory fixtures that provision and configure actual database instances with proper cleanup.
- Configuration and Adapter Fixtures
Class-scoped fixtures that provide configured adapter instances for integration testing.
- AWS Redshift Fixtures
Cloud-based fixtures that manage real Redshift Serverless instances for production-like testing.
Key Fixture Patterns:
@pytest.fixture(scope="function")
def in_memory_sqlite_engine_objs():
# Create fresh SQLite database for each test
engine = sa.create_engine("sqlite:///:memory:")
create_all_tables(engine=engine, metadata=Base.metadata, drop_first=False)
create_all_views(engine=engine, db_type=DbTypeEnum.SQLITE)
# Yield database for testing
yield SaEngineObjs(engine=engine, metadata=metadata, db_type=DbTypeEnum.SQLITE)
# Cleanup after test
drop_all_views(engine=engine, db_type=DbTypeEnum.SQLITE)
drop_all_tables(engine=engine, metadata=Base.metadata)
Sample Data System: chinook/¶
Purpose: Provide consistent test data across all database systems
Location: mcp_ohmy_sql/tests/chinook/
The Chinook sample dataset provides a realistic music store database that includes:
Multiple tables with foreign key relationships
Sample views for testing complex queries
Varied data types (strings, integers, decimals, dates)
Realistic data volume for performance testing
ChinookDataLoader: ChinookDataLoader
The data loader implements a sophisticated JSON-to-DataFrame pipeline:
class ChinookDataLoader:
@cached_property
def data(self) -> dict:
return json.loads(path_ChinookData_json.read_text(encoding="utf-8"))
def get_table_df(self, table_name: str) -> pl.DataFrame:
# Load raw JSON data
df = pl.DataFrame(self.data[table_name])
# Apply database-specific type conversions
for col_name, col in table.columns.items():
if isinstance(col.type, sa.Integer):
df = df.with_columns(pl.col(col_name).cast(pl.Int32))
elif isinstance(col.type, sa.DateTime):
df = df.with_columns(pl.col(col_name).str.strptime(...))
Benefits:
Type Safety: Automatic conversion between JSON and database types
Performance: Polars DataFrames for efficient data processing
Consistency: Same data across SQLite, PostgreSQL, and Redshift
Flexibility: Easy to add new tables or modify existing data
Relational Database Setup: setup_relational_database.py¶
Purpose: Automated setup for SQLAlchemy-compatible databases
Location: mcp_ohmy_sql/tests/setup_relational_database.py
Key Functions: create_all_tables(), create_all_views(), insert_all_data()
The relational setup script provides idempotent database provisioning:
Schema Management:
def create_all_tables(engine: sa.Engine, metadata: sa.MetaData, drop_first: bool = True):
if drop_first:
drop_all_tables(engine=engine, metadata=metadata)
metadata.create_all(engine, checkfirst=True)
View Creation: Uses the per-SDK utilities to generate database-specific CREATE VIEW statements, handling SQL dialect differences across database systems.
Data Population: Leverages the ChinookDataLoader to populate tables with consistent sample data across different database types.
Benefits:
Database Agnostic: Works with SQLite, PostgreSQL, MySQL, etc.
Idempotent: Safe to run multiple times
Complete: Tables, views, and data in single operation
Fast: Optimized for test execution speed
AWS Redshift Setup: setup_aws_redshift_database.py¶
Purpose: Specialized setup for AWS Redshift cloud data warehouse
Location: mcp_ohmy_sql/tests/setup_aws_redshift_database.py
Key Functions: create_all_redshift_tables(), insert_all_data_to_redshift()
AWS Redshift requires specialized handling due to its cloud-native architecture:
Table Creation: Uses Redshift-specific SQL DDL statements optimized for columnar storage and distributed architecture.
Data Loading Strategy:
def insert_all_data_to_redshift(conn_or_engine: T_CONN_OR_ENGINE):
for table in Base.metadata.sorted_tables:
insert_data_to_one_table(conn_or_engine=conn_or_engine, table=table)
Dual Loading Methods:
Direct INSERT: Fast for small datasets using parameterized queries
S3 COPY: Scalable for large datasets using S3 staging and COPY commands
Cloud Integration: - S3 Staging: Uses S3 for efficient bulk data loading - IAM Roles: Proper authentication between Redshift and S3 - Parquet Format: Optimized data format for Redshift ingestion
Benefits:
Production-Like: Tests against real Redshift infrastructure
Performance: Optimized loading strategies for different data sizes
Security: Proper IAM role-based authentication
Scalability: Handles both small test datasets and larger validation data
Configuration-Driven Testing: test_adapter.py¶
Purpose: Demonstrate real-world adapter usage with actual databases
Location: mcp_ohmy_sql/tests/test_adapter.py
The test adapter module shows how the complete system integrates:
from .test_config import DatabaseEnum, test_config
test_adapter = Adapter(config=test_config)
# Provision SQLite database
sqlite_database = DatabaseEnum.chinook_sqlite
setup_relational_database(
engine=sqlite_database.connection.sa_engine,
metadata=Base.metadata,
db_type=sqlite_database.db_type_enum,
)
Integration Benefits:
Real Configuration: Uses actual configuration objects from test_config.py
Multiple Databases: Can test across SQLite, PostgreSQL, and Redshift simultaneously
Adapter Testing: Validates the complete configuration → adapter → database pipeline
Production Simulation: Tests the exact same code paths used in production
Infrastructure as Code: aws/stacks/¶
Purpose: Provision real AWS infrastructure for Redshift testing
Location: mcp_ohmy_sql/tests/aws/stacks/mcp_ohmy_sql_stack/
CDK Infrastructure: Stack
The CDK stack provisions complete AWS infrastructure for testing:
Infrastructure Components:
def create_workgroup(self):
self.workgroup = redshiftserverless.CfnWorkgroup(
workgroup_name=self.workgroup_name,
namespace_name=self.namespace_name,
base_capacity=8, # minimal capacity 8 RPUs
publicly_accessible=True,
subnet_ids=subnet_ids,
security_group_ids=[self.sg.security_group_id],
)
Security Configuration: - VPC Integration: Uses existing VPC with proper subnet configuration - Security Groups: Restricts access to developer IP addresses - IAM Roles: Proper permissions for Redshift to access S3
Cost Optimization: - Serverless: Pay-per-use pricing model - Minimal Capacity: 8 RPU minimum for cost control - Automatic Cleanup: CDK removal policies for resource cleanup
Benefits:
Reproducible: Infrastructure defined as code
Secure: Proper network and IAM security
Cost-Effective: Minimal resource usage for testing
Realistic: Production-like infrastructure for validation
Testing Workflow Integration¶
The complete testing workflow demonstrates how all components work together:
Local Development: 1. Run unit tests with in-memory SQLite fixtures 2. Validate against local PostgreSQL container 3. Execute integration tests with real AWS Redshift
CI/CD Pipeline: 1. Provision AWS infrastructure using CDK 2. Run test suite against all database types 3. Clean up infrastructure after testing
Test Categories:
- Unit Tests
Fast tests using in-memory SQLite with fixture-managed lifecycle
- Integration Tests
Medium-speed tests using containerized PostgreSQL for realistic SQL engine behavior
- End-to-End Tests
Comprehensive tests using real AWS Redshift for production validation
- Performance Tests
Load testing using the full Chinook dataset across all database types
Architecture Benefits¶
For Development:
Fast Feedback: In-memory SQLite for rapid unit testing
Realistic Testing: Actual database engines catch real-world issues
Multi-Database: Validates compatibility across different systems
Automated Setup: No manual database configuration required
For Quality Assurance:
Production Parity: Tests against same infrastructure as production
Comprehensive Coverage: All database types and operations tested
Performance Validation: Real query performance measurement
Security Testing: Actual authentication and authorization mechanisms
For Operations:
Infrastructure Validation: CDK ensures consistent environment provisioning
Cost Control: Automated cleanup prevents resource waste
Monitoring: Real AWS metrics and logging
Scalability Testing: Validate performance under realistic loads
The real database testing strategy ensures that the MCP server works reliably across diverse database environments while maintaining development velocity through intelligent use of different testing approaches for different scenarios.