schematools.factories module

Module for SQLAlchemy-based database table creation.

schematools.factories.index_factory(dataset_table: DatasetTableSchema, metadata: MetaData | None = None, db_table_name: str | None = None, db_schema_name: str | None = None, is_versioned_dataset: bool = False) → dict[str, list[Index]]

Generates one or more SQLAlchemy Index objects to work with the JSON Schema.

Parameters

dataset_table – The Amsterdam Schema definition of the table
metadata – SQLAlchemy schema metadata that groups all tables to a single connection.
db_table_name – Optional table name, which is otherwise inferred from the schema name.
db_schema_name – Optional database schema name, which is otherwise None and defaults to “public”
is_versioned_dataset – Indicate whether the indices should be created in a private DB schema with a version in their name. See also: BaseImporter.is_versioned_dataset. The private schema name will be derived from the dataset ID, unless overridden by the db_schema_name parameter.

The returned Index objects are grouped by table names.
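As an illustration of the return shape only, the following minimal sketch groups SQLAlchemy Index objects by the name of the table they belong to. It does not call index_factory(); the table, the index names, and the helper group_indexes_by_table() are hypothetical and exist only for this example.

```python
from collections import defaultdict

from sqlalchemy import Column, Index, Integer, MetaData, String, Table

metadata = MetaData()

# Hypothetical table; schematools would derive this from the schema definition.
buildings = Table(
    "example_buildings",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
    Column("district_id", Integer),
)


def group_indexes_by_table(indexes):
    """Group Index objects by table name (hypothetical helper, not schematools API)."""
    grouped = defaultdict(list)
    for index in indexes:
        # An Index built from a table's columns is bound to that table.
        grouped[index.table.name].append(index)
    return dict(grouped)


indexes = [
    Index("example_buildings_name_idx", buildings.c.name),
    Index("example_buildings_district_id_idx", buildings.c.district_id),
]
indexes_by_table = group_indexes_by_table(indexes)
```

Here indexes_by_table has the same dict[str, list[Index]] shape as the documented return type, with a single key for the example table.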

schematools.factories.tables_factory(dataset: DatasetSchema, metadata: MetaData | None = None, db_table_names: dict[str, str | None] | None = None, db_schema_names: dict[str, str | None] | None = None, limit_tables_to: set | None = None, is_versioned_dataset: bool = False) → dict[str, Table]

Generate the SQLAlchemy Table objects based on a DatasetSchema definition.

Parameters

dataset – The Amsterdam Schema definition of the dataset
metadata – SQLAlchemy schema metadata that groups all tables to a single connection.
db_table_names – Optional SQL table names, keyed on dataset_table_id. If not given, db_table_names are inferred from the schema name.
db_schema_names – Optional database schema names, keyed on dataset_table_id. If not given, schema names default to public.
limit_tables_to – Only process the indicated tables (based on table.id).
is_versioned_dataset – Indicate whether the tables should be created in a private DB schema with a version in their name. See also: BaseImporter.is_versioned_dataset. The private schema name will be derived from the dataset ID, unless overridden by the db_schema_names parameter.

The returned tables are keyed on the name of the dataset and table. SQLAlchemy Table objects are also created for the junction tables that are needed for relations.

The nested and through tables that are generated on-the-fly are also taken into account. Special care is needed to add postfixes to table names if the parent table of these tables has a non-default name. In that case, these on-the-fly tables also need to get an extra postfix in their db name.

One caveat: the current assumption is that “overridden” table names are used only because postfixes were added, and for no other reason.
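The postfix rule described above can be sketched as follows. This is a minimal illustration under the stated assumption that an overridden name is the default name plus a postfix; it is not the schematools implementation, and the helpers derive_postfix() and child_db_name() as well as the table names are hypothetical.

```python
from __future__ import annotations


def derive_postfix(default_name: str, db_name: str | None) -> str:
    """Return the postfix that turns default_name into db_name ("" if none)."""
    if db_name is None or db_name == default_name:
        return ""
    if not db_name.startswith(default_name):
        # The documented caveat: overrides are assumed to be postfixes only.
        raise ValueError(f"{db_name!r} is not {default_name!r} plus a postfix")
    return db_name[len(default_name):]


def child_db_name(
    default_child_name: str, parent_default_name: str, parent_db_name: str | None
) -> str:
    """Give a nested or through table the same postfix as its parent table."""
    return default_child_name + derive_postfix(parent_default_name, parent_db_name)


# Parent table overridden with a "_new" postfix: the child gets it too.
postfixed = child_db_name("parks_trees_measurements", "parks_trees", "parks_trees_new")

# No override on the parent: the child keeps its default name.
unchanged = child_db_name("parks_trees_measurements", "parks_trees", None)
```

An override that is not a pure postfix raises ValueError in this sketch, which makes the caveat explicit instead of silently producing a mismatched name.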

schematools.factories.views_factory(dataset: DatasetSchema, tables: dict[str, sqlalchemy.sql.schema.Table]) → dict[str, psycopg2.sql.SQL]

Create VIEW statements.

The views defined by these statements provide the illusion that the tables they wrap:

are not versioned
live in the “public” schema

This illusion is needed for backwards compatibility reasons.

Parameters

dataset – The dataset currently being processed
tables – The tables as generated by tables_factory()

Returns: A dict mapping table names to VIEW statements.
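To make the idea concrete, here is a dependency-free sketch of the kind of statement such a view could use. It builds plain strings, whereas views_factory() returns psycopg2.sql.SQL objects. The use of CREATE OR REPLACE VIEW with SELECT *, the versioned table name, the private schema name, and the helpers quote_ident() and view_statement() are all assumptions made for illustration.

```python
def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def view_statement(view_name: str, private_schema: str, versioned_table: str) -> str:
    """Build a statement for a "public" view over a versioned table (hypothetical)."""
    return (
        f"CREATE OR REPLACE VIEW {quote_ident('public')}.{quote_ident(view_name)} "
        f"AS SELECT * FROM {quote_ident(private_schema)}.{quote_ident(versioned_table)}"
    )


# Hypothetical names: an unversioned view "trees" over "parks"."trees_v1".
views = {"trees": view_statement("trees", "parks", "trees_v1")}
```

The resulting dict maps a table name to its statement, mirroring the documented return shape. With psycopg2, identifiers would normally be composed with its sql module rather than quoted by hand.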