Extract Metadata from AWS Redshift¶
本文详细介绍如何从 AWS Redshift 的 System Table 中提取所有 Schema, Table, Column 的完整元数据信息, 以便将这些结构化数据编码后提供给 LLM, 帮助我们生成更准确的 SQL 查询语句.
Setup Helpers¶
在开始元数据提取工作之前, 我们首先配置必要的工具和连接环境. 这些工具将帮助我们在 Notebook 中便捷地执行 SQL 查询并直观地预览查询结果.
[35]:
import polars as pl
import sqlalchemy as sa
from rich import print as rprint
from rich.console import Console
from rich.syntax import Syntax
import simple_aws_redshift.api as aws_rs
from boto_session_manager import BotoSesManager
from mcp_ohmy_sql.tests.aws.constants import aws_profile, database_name, namespace_name, workgroup_name
接下来创建与 Redshift Serverless 的数据库连接:
[36]:
# Create sqlalchemy engine.
bsm = BotoSesManager(profile_name=aws_profile)
params = aws_rs.RedshiftServerlessConnectionParams.new(
redshift_serverless_client=bsm.redshiftserverless_client,
namespace_name=namespace_name,
workgroup_name=workgroup_name,
)
engine = params.get_engine()
为了提高开发效率, 我们定义一个通用的 SQL 执行和结果展示函数:
[37]:
# 定义一个用来运行 SQL 以及展示结果的 helper 函数
console = Console()
pl.Config(tbl_rows=20)
def run_sql(sql: str)-> pl.DataFrame:
syntax = Syntax(sql, "sql", line_numbers=True)
console.print(syntax)
with engine.connect() as conn:
rows = conn.execute(sa.text(sql)).fetchall()
df = pl.DataFrame(rows)
return df
sql = "SELECT 1 as value;"
run_sql(sql)
1 SELECT 1 as value;
[37]:
| value |
|---|
| i64 |
| 1 |
Explore Key System Tables¶
Redshift 继承了 PostgreSQL 的架构设计, 采用 Schema → Table → Column 的三层数据组织结构. 我们的目标是找到三个核心 SQL 查询, 分别用于获取:
Schema 信息 - 每行代表一个 Schema 及其属性
Table 信息 - 每行代表一个 Table 及其属性
Column 信息 - 每行代表一个 Column 及其详细属性
在编写复杂的元数据查询之前, 我们需要先了解 Redshift 中可用的系统表及其数据结构. 值得注意的是, Redshift 在 PostgreSQL 原有系统表基础上进行了扩展, 添加了许多 _info 后缀的视图 (如 pg_user_info, pg_class_info), 这些视图对 Redshift 用户更加友好, 我们将优先使用这些增强版本.
pg_user_info¶
pg_user_info 是 Amazon Redshift 提供的系统视图, 显示用户信息包括用户 ID 和密码过期时间等详细信息. 需要注意的是, 只有超级用户才能访问 PG_USER_INFO 视图. 对于我们的元数据提取需求, 使用基础的 pg_user 表就足够了, 因为我们主要需要获取用户名信息用于权限过滤.
[38]:
query = """
SELECT id, name
FROM users
WHERE active = 1;
"""
[39]:
sql = """
SELECT
*
FROM pg_user t
LIMIT 10;
"""
run_sql(sql)
1 2 SELECT 3 * 4 FROM pg_user t 5 LIMIT 10; 6
[39]:
| usename | usesysid | usecreatedb | usesuper | usecatupd | passwd | valuntil | useconfig |
|---|---|---|---|---|---|---|---|
| str | i64 | bool | bool | bool | str | str | null |
| "rdsdb" | 1 | true | true | true | "********" | "infinity" | null |
| "admin" | 100 | true | true | false | "********" | null | null |
| "IAMR:cdk-hnb659fds-cfn-exec-ro… | 101 | true | true | false | "********" | null | null |
| "IAM:sanhe" | 102 | false | false | false | "********" | null | null |
pg_description¶
pg_description 表是 PostgreSQL 的标准系统表, 用于存储数据库对象的注释信息 (也称为描述信息). 这些注释通过 SQL 的 COMMENT 语句定义, 相当于将文档直接嵌入到数据库结构中.
[40]:
sql = """
SELECT
*
FROM pg_description t
LIMIT 10;
"""
run_sql(sql)
1 2 SELECT 3 * 4 FROM pg_description t 5 LIMIT 10; 6
[40]:
| objoid | classoid | objsubid | description |
|---|---|---|---|
| i64 | i64 | i64 | str |
| 1242 | 1255 | 0 | "I/O" |
| 1243 | 1255 | 0 | "I/O" |
| 1244 | 1255 | 0 | "I/O" |
| 31 | 1255 | 0 | "I/O" |
| 1245 | 1255 | 0 | "I/O" |
| 33 | 1255 | 0 | "I/O" |
| 34 | 1255 | 0 | "I/O" |
| 35 | 1255 | 0 | "I/O" |
| 38 | 1255 | 0 | "I/O" |
| 39 | 1255 | 0 | "I/O" |
我们可以通过 JOIN 操作将 pg_description 与其他系统表结合, 获取带有描述信息的完整元数据:
[41]:
sql = """
-- oid is the hidden database object id
SELECT
pg_namespace.oid as object_id,
pg_namespace.nspname as schema_name,
pg_description.description as description
FROM pg_namespace
JOIN pg_description
ON pg_namespace.oid = pg_description.objoid
WHERE
pg_namespace.nspname NOT IN ('information_schema', 'catalog_history')
AND pg_namespace.nspname NOT LIKE 'pg_%'
LIMIT 20;
"""
run_sql(sql)
1 2 -- oid is the hidden database object id 3 SELECT 4 pg_namespace.oid as object_id, 5 pg_namespace.nspname as schema_name, 6 pg_description.description as description 7 FROM pg_namespace 8 JOIN pg_description 9 ON pg_namespace.oid = pg_description.objoid 10 WHERE 11 pg_namespace.nspname NOT IN ('information_schema', 'catalog_history') 12 AND pg_namespace.nspname NOT LIKE 'pg_%' 13 LIMIT 20; 14
[41]:
| object_id | schema_name | description |
|---|---|---|
| i64 | str | str |
| 2200 | "public" | "Standard public schema" |
pg_namespace¶
pg_namespace 是 PostgreSQL 的核心系统表. 在数据库理论中, namespace 是 SQL schema 的底层实现结构, 每个 namespace 可以包含独立的关系 (表), 类型等对象, 且不会产生命名冲突. 重要特征:
来源: PostgreSQL 原生系统表
用户友好性: 是, 自动根据当前用户权限过滤结果
数据粒度: 表中每一行代表一个 Schema
[42]:
sql = """
SELECT *
FROM pg_namespace t
LIMIT 20;
"""
df = run_sql(sql)
df
1 2 SELECT * 3 FROM pg_namespace t 4 LIMIT 20; 5
[42]:
| nspname | nspowner | nspacl |
|---|---|---|
| str | i64 | str |
| "pg_toast" | 1 | null |
| "pg_internal" | 1 | null |
| "catalog_history" | 1 | null |
| "pg_mv" | 1 | null |
| "pg_automv" | 1 | null |
| "pg_auto_copy" | 1 | null |
| "pg_s3" | 1 | null |
| "pg_catalog" | 1 | "{rdsdb=UCDA/rdsdb,=U/rdsdb}" |
| "public" | 1 | "{rdsdb=UCDA/rdsdb,=UC/rdsdb}" |
| "information_schema" | 1 | "{rdsdb=UCDA/rdsdb,=U/rdsdb}" |
为了获取用户自定义的 Schema (排除系统 Schema), 我们使用以下过滤条件:
[43]:
sql = """
-- oid is the hidden database object id
SELECT
t.oid as object_id,
t.*
FROM pg_namespace t
WHERE
t.nspname NOT IN ('information_schema', 'catalog_history')
AND t.nspname NOT LIKE 'pg_%'
LIMIT 20;
"""
run_sql(sql)
1 2 -- oid is the hidden database object id 3 SELECT 4 t.oid as object_id, 5 t.* 6 FROM pg_namespace t 7 WHERE 8 t.nspname NOT IN ('information_schema', 'catalog_history') 9 AND t.nspname NOT LIKE 'pg_%' 10 LIMIT 20; 11
[43]:
| object_id | nspname | nspowner | nspacl |
|---|---|---|---|
| i64 | str | i64 | str |
| 2200 | "public" | 1 | "{rdsdb=UCDA/rdsdb,=UC/rdsdb}" |
这个 SQL 就是我们要的, 给出了所有的 Schema 和 Description.
[44]:
sql = """
SELECT
pg_namespace.nspname as schema_name,
pg_description.description as description
FROM pg_namespace
JOIN pg_description
ON pg_namespace.oid = pg_description.objoid
WHERE
pg_namespace.nspname NOT IN ('information_schema', 'catalog_history')
AND pg_namespace.nspname NOT LIKE 'pg_%'
;
"""
run_sql(sql)
1 2 SELECT 3 pg_namespace.nspname as schema_name, 4 pg_description.description as description 5 FROM pg_namespace 6 JOIN pg_description 7 ON pg_namespace.oid = pg_description.objoid 8 WHERE 9 pg_namespace.nspname NOT IN ('information_schema', 'catalog_history') 10 AND pg_namespace.nspname NOT LIKE 'pg_%' 11 ; 12
[44]:
| schema_name | description |
|---|---|
| str | str |
| "public" | "Standard public schema" |
pg_class_info¶
PG_CLASS_INFO 是 Amazon Redshift 专有的增强视图, 包含表的创建时间和当前分布样式等详细信息. 重要特征:
来源: Redshift 专有 (非 PostgreSQL 原生)
用户友好性: 是, 只显示当前用户有权限访问的表
数据粒度: 表中每一行代表一个 Table 及其属性
[45]:
sql = """
SELECT *
FROM PG_CLASS_INFO
LIMIT 5;
"""
run_sql(sql)
1 2 SELECT * 3 FROM PG_CLASS_INFO 4 LIMIT 5; 5
[45]:
| reloid | relname | relnamespace | reltype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | reltoastrelid | reltoastidxid | relhasindex | relisshared | relkind | relnatts | relexternid | relisreplicated | relispinned | reldiststyle | relprojbaseid | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | relacl | releffectivediststyle | relcreationtime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| i64 | str | i64 | i64 | i64 | i64 | i64 | i64 | i64 | f64 | i64 | i64 | bool | bool | str | i64 | i64 | bool | bool | i64 | i64 | i64 | i64 | i64 | i64 | i64 | bool | bool | bool | bool | null | i64 | null |
| 376 | "pg_xactlock" | 11 | 0 | 1 | 0 | 0 | 1664 | 0 | 0.0 | 0 | 0 | false | true | "s" | 1 | 4294967295 | false | false | 0 | 0 | 0 | 0 | 0 | 0 | 0 | false | false | false | false | null | 0 | null |
| 16774 | "pg_aggregate_fnoid_index" | 11 | 0 | 1 | 403 | 16774 | 0 | 0 | 0.0 | 0 | 0 | false | false | "i" | 1 | 4294967295 | false | false | 0 | 0 | 0 | 0 | 0 | 0 | 0 | false | false | false | false | null | 0 | null |
| 16775 | "pg_am_name_index" | 11 | 0 | 1 | 403 | 16775 | 0 | 0 | 0.0 | 0 | 0 | false | false | "i" | 1 | 4294967295 | false | false | 0 | 0 | 0 | 0 | 0 | 0 | 0 | false | false | false | false | null | 0 | null |
| 16776 | "pg_am_oid_index" | 11 | 0 | 1 | 403 | 16776 | 0 | 0 | 0.0 | 0 | 0 | false | false | "i" | 1 | 4294967295 | false | false | 0 | 0 | 0 | 0 | 0 | 0 | 0 | false | false | false | false | null | 0 | null |
| 16777 | "pg_amop_opc_strat_index" | 11 | 0 | 1 | 403 | 16777 | 0 | 0 | 0.0 | 0 | 0 | false | false | "i" | 3 | 4294967295 | false | false | 0 | 0 | 0 | 0 | 0 | 0 | 0 | false | false | false | false | null | 0 | null |
通过与 pg_namespace 进行 JOIN 操作, 我们可以获得完整的 schema_name.table_name 信息, 并且包含 Redshift 特有的分布键信息:
[46]:
sql = """
SELECT
pg_class_info.reloid AS table_id,
trim(pg_namespace.nspname) AS schema_name,
trim(pg_class_info.relname) AS table_name,
pg_class_info.reldiststyle,
pg_class_info.releffectivediststyle,
CASE WHEN pg_class_info.reldiststyle = 0 THEN 'EVEN'::text
WHEN pg_class_info.reldiststyle = 1 THEN 'KEY'::text
WHEN pg_class_info.reldiststyle = 8 THEN 'ALL'::text
WHEN pg_class_info.releffectivediststyle = 10 THEN 'AUTO(ALL)'::text
WHEN pg_class_info.releffectivediststyle = 11 THEN 'AUTO(EVEN)'::text
WHEN pg_class_info.releffectivediststyle = 12 THEN 'AUTO(KEY)'::text ELSE '<<UNKNOWN>>'::text END AS diststyle,
pg_class_info.relcreationtime AS creation_time,
pg_class_info.relowner AS owner
FROM pg_class_info
LEFT JOIN pg_namespace
ON pg_class_info.relnamespace = pg_namespace.oid
WHERE
pg_namespace.nspname NOT IN ('information_schema', 'catalog_history')
AND pg_namespace.nspname NOT LIKE 'pg_%'
LIMIT 100;
"""
run_sql(sql)
1 2 SELECT 3 pg_class_info.reloid AS table_id, 4 trim(pg_namespace.nspname) AS schema_name, 5 trim(pg_class_info.relname) AS table_name, 6 pg_class_info.reldiststyle, 7 pg_class_info.releffectivediststyle, 8 CASE WHEN pg_class_info.reldiststyle = 0 THEN 'EVEN'::text 9 WHEN pg_class_info.reldiststyle = 1 THEN 'KEY'::text 10 WHEN pg_class_info.reldiststyle = 8 THEN 'ALL'::text 11 WHEN pg_class_info.releffectivediststyle = 10 THEN 'AUTO(ALL)'::text 12 WHEN pg_class_info.releffectivediststyle = 11 THEN 'AUTO(EVEN)'::text 13 WHEN pg_class_info.releffectivediststyle = 12 THEN 'AUTO(KEY)'::text ELSE '<<UNKNOWN>>'::text END AS d 14 pg_class_info.relcreationtime AS creation_time, 15 pg_class_info.relowner AS owner 16 FROM pg_class_info 17 LEFT JOIN pg_namespace 18 ON pg_class_info.relnamespace = pg_namespace.oid 19 WHERE 20 pg_namespace.nspname NOT IN ('information_schema', 'catalog_history') 21 AND pg_namespace.nspname NOT LIKE 'pg_%' 22 LIMIT 100; 23
[46]:
| table_id | schema_name | table_name | reldiststyle | releffectivediststyle | diststyle | creation_time | owner |
|---|---|---|---|---|---|---|---|
| i64 | str | str | i64 | i64 | str | datetime[μs] | i64 |
| 135770 | "public" | "artist_pkey" | 0 | 0 | "EVEN" | null | 102 |
| 135774 | "public" | "album_pkey" | 0 | 0 | "EVEN" | null | 102 |
| 135782 | "public" | "genre_pkey" | 0 | 0 | "EVEN" | null | 102 |
| 135786 | "public" | "mediatype_pkey" | 0 | 0 | "EVEN" | null | 102 |
| 135790 | "public" | "track_pkey" | 0 | 0 | "EVEN" | null | 102 |
| 135772 | "public" | "album" | 1 | 1 | "KEY" | 2025-06-16 20:18:56.956046 | 102 |
| 135784 | "public" | "mediatype" | 8 | 8 | "ALL" | 2025-06-16 20:18:57.083479 | 102 |
| 135780 | "public" | "genre" | 8 | 8 | "ALL" | 2025-06-16 20:18:57.024100 | 102 |
| 135806 | "public" | "playlist_pkey" | 0 | 0 | "EVEN" | null | 102 |
| 135810 | "public" | "playlisttrack_pkey" | 0 | 0 | "EVEN" | null | 102 |
| … | … | … | … | … | … | … | … |
| 135830 | "public" | "customer_pkey" | 0 | 0 | "EVEN" | null | 102 |
| 135820 | "public" | "employee" | 8 | 8 | "ALL" | 2025-06-16 20:18:57.340479 | 102 |
| 135838 | "public" | "invoice_pkey" | 0 | 0 | "EVEN" | null | 102 |
| 135828 | "public" | "customer" | 1 | 1 | "KEY" | 2025-06-16 20:18:57.409248 | 102 |
| 135846 | "public" | "invoiceline_pkey" | 0 | 0 | "EVEN" | null | 102 |
| 135836 | "public" | "invoice" | 1 | 1 | "KEY" | 2025-06-16 20:18:57.477920 | 102 |
| 135844 | "public" | "invoiceline" | 1 | 1 | "KEY" | 2025-06-16 20:18:57.536679 | 102 |
| 135788 | "public" | "track" | 1 | 1 | "KEY" | 2025-06-16 20:18:57.143263 | 102 |
| 135856 | "public" | "albumsalesstats" | 0 | 0 | "EVEN" | null | 102 |
| 135768 | "public" | "artist" | 8 | 8 | "ALL" | 2025-06-16 20:18:56.862195 | 102 |
这个 SQL 就是我们要的, 给出了所有的 Table, Distribution Key 信息, 和 Owner 信息
[47]:
sql = """
SELECT
trim(pg_namespace.nspname) AS schema_name,
trim(pg_class_info.relname) AS table_name,
CASE WHEN pg_class_info.reldiststyle = 0 THEN 'EVEN'::text
WHEN pg_class_info.reldiststyle = 1 THEN 'KEY'::text
WHEN pg_class_info.reldiststyle = 8 THEN 'ALL'::text
WHEN pg_class_info.releffectivediststyle = 10 THEN 'AUTO(ALL)'::text
WHEN pg_class_info.releffectivediststyle = 11 THEN 'AUTO(EVEN)'::text
WHEN pg_class_info.releffectivediststyle = 12 THEN 'AUTO(KEY)'::text ELSE '<<UNKNOWN>>'::text END AS diststyle,
pg_user.usename AS owner_name,
pg_description.description as description
FROM pg_class_info
LEFT JOIN pg_namespace
ON pg_class_info.relnamespace = pg_namespace.oid
LEFT JOIN pg_description
ON pg_namespace.oid = pg_description.objoid
LEFT JOIN pg_user
ON pg_class_info.relowner = pg_user.usesysid
WHERE
pg_namespace.nspname NOT IN ('information_schema', 'catalog_history')
AND pg_namespace.nspname NOT LIKE 'pg_%'
LIMIT 100;
"""
run_sql(sql)
1 2 SELECT 3 trim(pg_namespace.nspname) AS schema_name, 4 trim(pg_class_info.relname) AS table_name, 5 CASE WHEN pg_class_info.reldiststyle = 0 THEN 'EVEN'::text 6 WHEN pg_class_info.reldiststyle = 1 THEN 'KEY'::text 7 WHEN pg_class_info.reldiststyle = 8 THEN 'ALL'::text 8 WHEN pg_class_info.releffectivediststyle = 10 THEN 'AUTO(ALL)'::text 9 WHEN pg_class_info.releffectivediststyle = 11 THEN 'AUTO(EVEN)'::text 10 WHEN pg_class_info.releffectivediststyle = 12 THEN 'AUTO(KEY)'::text ELSE '<<UNKNOWN>>'::text END AS d 11 pg_user.usename AS owner_name, 12 pg_description.description as description 13 FROM pg_class_info 14 LEFT JOIN pg_namespace 15 ON pg_class_info.relnamespace = pg_namespace.oid 16 LEFT JOIN pg_description 17 ON pg_namespace.oid = pg_description.objoid 18 LEFT JOIN pg_user 19 ON pg_class_info.relowner = pg_user.usesysid 20 WHERE 21 pg_namespace.nspname NOT IN ('information_schema', 'catalog_history') 22 AND pg_namespace.nspname NOT LIKE 'pg_%' 23 LIMIT 100; 24
[47]:
| schema_name | table_name | diststyle | owner_name | description |
|---|---|---|---|---|
| str | str | str | str | str |
| "public" | "artist_pkey" | "EVEN" | "IAM:sanhe" | "Standard public schema" |
| "public" | "album_pkey" | "EVEN" | "IAM:sanhe" | "Standard public schema" |
| "public" | "genre_pkey" | "EVEN" | "IAM:sanhe" | "Standard public schema" |
| "public" | "mediatype_pkey" | "EVEN" | "IAM:sanhe" | "Standard public schema" |
| "public" | "track_pkey" | "EVEN" | "IAM:sanhe" | "Standard public schema" |
| "public" | "album" | "KEY" | "IAM:sanhe" | "Standard public schema" |
| "public" | "mediatype" | "ALL" | "IAM:sanhe" | "Standard public schema" |
| "public" | "genre" | "ALL" | "IAM:sanhe" | "Standard public schema" |
| "public" | "playlist_pkey" | "EVEN" | "IAM:sanhe" | "Standard public schema" |
| "public" | "playlisttrack_pkey" | "EVEN" | "IAM:sanhe" | "Standard public schema" |
| … | … | … | … | … |
| "public" | "customer_pkey" | "EVEN" | "IAM:sanhe" | "Standard public schema" |
| "public" | "employee" | "ALL" | "IAM:sanhe" | "Standard public schema" |
| "public" | "invoice_pkey" | "EVEN" | "IAM:sanhe" | "Standard public schema" |
| "public" | "customer" | "KEY" | "IAM:sanhe" | "Standard public schema" |
| "public" | "invoiceline_pkey" | "EVEN" | "IAM:sanhe" | "Standard public schema" |
| "public" | "invoice" | "KEY" | "IAM:sanhe" | "Standard public schema" |
| "public" | "invoiceline" | "KEY" | "IAM:sanhe" | "Standard public schema" |
| "public" | "track" | "KEY" | "IAM:sanhe" | "Standard public schema" |
| "public" | "albumsalesstats" | "EVEN" | "IAM:sanhe" | "Standard public schema" |
| "public" | "artist" | "ALL" | "IAM:sanhe" | "Standard public schema" |
pg_table_def¶
pg_table_def 是 Redshift 中最重要的元数据表之一, 包含了所有表和列的详细定义信息. 重要特征:
来源: Redshift 专有 (非 PostgreSQL 原生)
用户友好性: 是, 只显示当前用户有权限访问的表
数据粒度: 表中每一行代表一个 Column 的详细信息
这个表是我们进行列级元数据提取的核心数据源, 下面这个 Query 就已经够用了.
[48]:
sql = """
SELECT
td.schemaname AS schema_name,
td.tablename AS table_name,
td.column AS column_name,
td.type AS column_type,
td.encoding AS column_encoding,
td.distkey AS is_column_a_distkey,
td.sortkey AS sortkey_position,
td.notnull AS is_column_notnull
FROM pg_table_def td
WHERE td.schemaname NOT IN ('pg_catalog', 'information_schema')
LIMIT 20;
"""
run_sql(sql)
1 2 SELECT 3 td.schemaname AS schema_name, 4 td.tablename AS table_name, 5 td.column AS column_name, 6 td.type AS column_type, 7 td.encoding AS column_encoding, 8 td.distkey AS is_column_a_distkey, 9 td.sortkey AS sortkey_position, 10 td.notnull AS is_column_notnull 11 FROM pg_table_def td 12 WHERE td.schemaname NOT IN ('pg_catalog', 'information_schema') 13 LIMIT 20; 14
[48]:
| schema_name | table_name | column_name | column_type | column_encoding | is_column_a_distkey | sortkey_position | is_column_notnull |
|---|---|---|---|---|---|---|---|
| str | str | str | str | str | bool | i64 | bool |
| "public" | "album" | "albumid" | "integer" | "none" | false | 1 | true |
| "public" | "album" | "title" | "character varying(255)" | "lzo" | false | 0 | true |
| "public" | "album" | "artistid" | "integer" | "none" | true | 2 | true |
| "public" | "albumsalesstats" | "albumid" | "integer" | "none" | false | 0 | false |
| "public" | "albumsalesstats" | "albumtitle" | "character varying(255)" | "none" | false | 0 | false |
| "public" | "albumsalesstats" | "artistname" | "character varying(255)" | "none" | false | 0 | false |
| "public" | "albumsalesstats" | "totalsales" | "integer" | "none" | false | 0 | false |
| "public" | "albumsalesstats" | "totalquantity" | "integer" | "none" | false | 0 | false |
| "public" | "albumsalesstats" | "totalrevenue" | "numeric(10,2)" | "none" | false | 0 | false |
| "public" | "albumsalesstats" | "avgtrackprice" | "numeric(10,2)" | "none" | false | 0 | false |
| "public" | "albumsalesstats" | "tracksinalbum" | "integer" | "none" | false | 0 | false |
| "public" | "artist" | "artistid" | "integer" | "none" | false | 1 | true |
| "public" | "artist" | "name" | "character varying(255)" | "lzo" | false | 0 | false |
| "public" | "customer" | "customerid" | "integer" | "none" | true | 1 | true |
| "public" | "customer" | "firstname" | "character varying(255)" | "lzo" | false | 0 | true |
| "public" | "customer" | "lastname" | "character varying(255)" | "lzo" | false | 0 | true |
| "public" | "customer" | "company" | "character varying(255)" | "lzo" | false | 0 | false |
| "public" | "customer" | "address" | "character varying(500)" | "lzo" | false | 0 | false |
| "public" | "customer" | "city" | "character varying(100)" | "none" | false | 3 | false |
| "public" | "customer" | "state" | "character varying(100)" | "lzo" | false | 0 | false |
Summary¶
通过探索这些关键的系统表, 我们现在掌握了从 Redshift 中提取完整元数据的方法:
Schema 层级: 使用 pg_namespace 获取所有用户定义的 Schema
Table 层级: 使用 pg_class_info 结合 pg_namespace 获取表的完整信息
Column 层级: 使用 pg_table_def 获取列的详细定义
这些查询的结果可以被结构化处理后提供给 LLM, 帮助生成更准确和上下文相关的 SQL 语句. 在后续的实现中, 我们将基于这些基础查询构建完整的元数据提取流程.
[ ]: