.. _schema-encoding:

Schema Encoding
==============================================================================

This proof of concept (POC) explores different methods of encoding database
schema information so that large language models (LLMs) can process it
efficiently. We evaluated three encoding formats:

1. **Hierarchy-encoded format**: A custom-designed structure that maximizes
   token efficiency while preserving all schema details.
2. **Formatted JSON (human-readable)**: A standard indented JSON format that
   is easy for humans to read and understand.
3. **Compact JSON (machine-optimized)**: A minified, single-line JSON format
   optimized for machine parsing.

We used OpenAI's Tokenizer to count the tokens each format produces and
compared the resulting sizes:

Token and Character Comparison:

==================== ======= ============
Encoding Format      Tokens  Characters
==================== ======= ============
hierarchy            596     1980
json formatted       8241    49581
json compact         6437    22319
==================== ======= ============
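
The difference between the two JSON formats can be reproduced on a small
scale with the Python standard library. The following is a minimal sketch,
not the code used in the POC: the ``schema`` dictionary and the helper names
``encode_formatted`` and ``encode_compact`` are hypothetical, and the
indentation width of the formatted variant is an assumption. It compares
character counts only. Token counts depend on the tokenizer and are not
computed here, and the hierarchy-encoded format is not shown because its
layout is not specified in this section.

.. code-block:: python

    import json

    # Hypothetical schema for illustration only; it is NOT the schema that
    # was measured in the POC.
    schema = {
        "tables": [
            {
                "name": "customers",
                "columns": [
                    {"name": "id", "type": "integer", "primary_key": True},
                    {"name": "email", "type": "text", "primary_key": False},
                ],
            }
        ]
    }


    def encode_formatted(obj):
        """Human-readable JSON: one key per line, indented (width assumed)."""
        return json.dumps(obj, indent=4)


    def encode_compact(obj):
        """Minified single-line JSON: no spaces after ',' or ':'."""
        return json.dumps(obj, separators=(",", ":"))


    formatted = encode_formatted(schema)
    compact = encode_compact(schema)

    # Both strings decode to the same schema; only the whitespace differs.
    print("formatted characters:", len(formatted))
    print("compact characters:  ", len(compact))

Both encodings carry identical information, so any size difference between
them comes purely from whitespace, which is consistent with the gap between
the two JSON rows in the table.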