schematools.types module

Python types for the Amsterdam Schema JSON file contents.

class schematools.types.AdditionalRelationSchema(_id: str, _parent_table: DatasetTableSchema | None = None, **kwargs)

Bases: DatasetType

Data class describing the additional relation block.

__init__(_id: str, _parent_table: DatasetTableSchema | None = None, **kwargs)

property format

“summary” or “embedded”.

Type: Format

property id

is_reverse_relation(field: DatasetFieldSchema): See whether this relation

property name: str

property parent_table

property python_name: str

property related_field: DatasetFieldSchema: Return the field this reverse relation queries to find objects.

property related_table: DatasetTableSchema: Return the table this relation references.

property relation: str: Relation identifier.

class schematools.types.DatasetFieldSchema(*args: Any, _parent_table: DatasetTableSchema | None, _parent_field: DatasetFieldSchema | None = None, _required: bool = False, _temporal_range: bool = False, **kwargs: Any)

Bases: DatasetType

A single field (column) in a table.

__init__(*args: Any, _parent_table: DatasetTableSchema | None, _parent_field: DatasetFieldSchema | None = None, _required: bool = False, _temporal_range: bool = False, **kwargs: Any) → None

property auth: frozenset[str]: Auth of the field, or OPENBAAR.

property crs: str | None: CRS for this field, or None if not a geo field.

property db_name: str: Return the name that is being used in the database. This can be a different name then the internal name when the field is a relation, or has a short-name.

property description: str | None

property faker: str | None: Return faker name and properties used for mocking data.

property field_items: Json | None: Return the item definition for an array type.

property format: str | None

get_field_by_id = <methodtools._LruCacheWire object>

property has_shortname: bool

Reports whether this field has a shortname.

You should never have to call this: name returns the shortname, if present.

property id: str

The id of a field uniquely identifies it among the fields of a table.

Note that comparisons against id should be avoided when fields are retrieved using .get_fields(include_subfields=True). In such case, a subfield with a similar ID will match with the top-level field.

property is_array: bool: Checks if field is an array field.

property is_array_of_objects: bool: Checks if field is an array of objects.

property is_array_of_scalars: bool: Checks if field is an array of scalars.

property is_autoincrement: bool

property is_composite_key: Tell whether the relation uses a composite key

property is_geo: bool: Tell whether the field references a geo object.

property is_identifier_part: bool: Tell whether the field is part of the composite primary key

property is_loose_relation: Determine if relation is loose or not.

property is_nested_object: bool: Checks if field is a possible nested object definition.

property is_nested_table: bool: Checks if field is a possible nested table.

property is_object: bool: Tell whether the field references an object. This might also be a relation, with a composite key. In both cases, the object subfields could be inlined in the main SQL table. See also: is_nested_object and is_composite_key.

property is_primary: bool: When name is ‘id’ the field should be the primary key For composite keys (table.identifier has > 1 item), an ‘id’ field is autogenerated.

property is_relation_temporal: Tell whether the 1-N relationship is modelled by an intermediate table. This allows tracking multiple versions of the relationship.

property is_scalar: bool: Tell whether the field is a scalar.

property is_subfield: bool: Tell whether this field is part of an embedded object (e.g. temporal relation)

property is_temporal_range: bool: Tell whether the field is used to store the range of a temporal dimension. (e.g. beginGeldigheid or eindGeldigheid).

property is_through_table: bool

Checks if field is a possible through table.

NM tables are always through tables. For 1N tables, there is a through table if the target of the relation is temporal.

property multipleof: float | None

property name: str: The name as it is shown to the external world, camel-cased. In general, the “id” field is already camel-cased, but in case that didn’t happen this property will fix that.

property nested_table: DatasetTableSchema | None: Access the nested table that this field needs to store its data.

property nm_relation: str | None

Give the N:M relation, if it exists (called M2M in Django).

property parent_field: DatasetFieldSchema | None: Provide access to the top-level field where it is a property for.

property provenance: str | None: Get the provenance info, if available, or None.

property python_name: str: The name as its used internally in Python or an ORM, snake cased

property qualified_id: str: The fully qualified ID (for debugging)

property related_field_ids: list[str] | None

For a relation field, returns the identifiers of the referenced fields.

The returned list contains only the fields, e.g., [“id”, “volgnummer”]. These are fields on the table self.related_table.

For loose relations, it will only return the first field of the related table.

If self is not a relation field, the return value is None.

property related_fields: list[DatasetFieldSchema] | None: Convenience property that returns the related field schemas.

property related_table: DatasetTableSchema | None: If this field is a relation, return the table this relation references.

property relation: str | None

Give the 1:N relation, if it exists.

property required: bool

property reverse_relation: AdditionalRelationSchema | None

Find the opposite description of a relation.

When there is a relation, this only returns a description when the linked table also describes the other end of relationship.

property shortname: str: The shorter name if present, otherwise the ID. Note this is only used to generate human-readable database table names.

property srid: int | None: The integer value for the spatial reference ID (for geometry fields).

property subfields: list[schematools.types.DatasetFieldSchema]

Return the subfields for a nested structure.

For a nested object, fields are based on its properties, for an array of objects, fields are based on the properties of the “items” field.

When subfields are added as part of a 1m-relation, those subfields need to be prefixed with the name of the relation field. However, this is not the case for the so-called dimension fields of a temporal relation (e.g. beginGeldigheid and eindGeldigheid).

If self is not an object or array, the return value is an empty iterator.

property table: DatasetTableSchema | None: The table that this field is a part of

property through_table: DatasetTableSchema | None: Access the through table that this fields needs to store its data.

property title: str | None: Title of the field.

property type: str

Returns the type of this field.

The type is one of the JSON Schema types “string”, “integer”, “number”, “object”, “array” or “boolean”, or the URL of a schema defining a type (for geo types). “null” is never used by Amsterdam Schemas.

Dates and URLs have type “string”. Check the format to distinguish them from free-form text.

See https://schemas.data.amsterdam.nl/docs/ams-schema-spec.html#data-types for details.

class schematools.types.DatasetSchema(data: dict, dataset_collection: CachedSchemaLoader | None = None)

Bases: SchemaType

The schema of a dataset.

This is a collection of JSON Schemas within a single file.

class Status(value)

Bases: Enum

The allowed status values according to the Amsterdam schema spec.

beschikbaar = 'beschikbaar'

niet_beschikbaar = 'niet_beschikbaar'

__init__(data: dict, dataset_collection: CachedSchemaLoader | None = None) → None

When initializing a dataset, a cache of related datasets can be added (at class level). This makes it possible to get (temporal) info about the related datasets.

Parameters

data – The JSON data from the file.
dataset_collection – The shared collection that the dataset should become part of. This is used to resolve relations between different datasets.

property auth: frozenset[str]: Auth of the dataset, or OPENBAAR.

build_nested_table(field: DatasetFieldSchema) → DatasetTableSchema: Construct an in-line table object for a nested field.

build_through_table(field: DatasetFieldSchema) → DatasetTableSchema

Build the through table.

The through tables are not defined separately in a schema. The fact that an M2M relation needs an extra table is an implementation aspect. However, the through (a.k.a. junction) table schema is needed for the dynamic model generation and for data importing.

FK relations also have an additional through table, because the temporal information of the relation needs to be stored somewhere.

For relations with an object-type definition of the relation, the fields for the source and target side of the relation are stored separately in the through table. E.g. for an M2M relation like this:

"bestaatUitBuurten": {
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "identificatie": {
        "type": "string"
      },
      "volgnummer": {
        "type": "integer"
      }
    }
  },
  "relation": "gebieden:buurten",
  "description": "De buurten waaruit het object bestaat."
}

The through table has the following fields:

ggwgebieden_id
buurten_id
ggwgebieden_identificatie
ggwgebieden_volgnummer
bestaat_uit_buurten_identificatie
bestaat_uit_buurten_volgnummer
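The naming pattern of these six fields can be reproduced with a short standalone sketch. The function through_table_columns and its parameters are hypothetical and only mirror the example above; this is not how build_through_table is implemented. It assumes the source table is called ggwgebieden (inferred from the field names) and that both sides of the relation use the same key subfields.

```python
import re


def to_snake_case(identifier: str) -> str:
    """Convert a camelCase identifier to snake_case (illustrative only)."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", identifier).lower()


def through_table_columns(source_table, target_table, relation_field, key_fields):
    """List the column names of a through table, following the example above."""
    source = to_snake_case(source_table)
    target = to_snake_case(target_table)
    relation = to_snake_case(relation_field)
    # One foreign key column for each side of the relation.
    columns = [f"{source}_id", f"{target}_id"]
    # Source side: key parts prefixed with the source table name.
    columns += [f"{source}_{to_snake_case(key)}" for key in key_fields]
    # Target side: key parts prefixed with the relation field name.
    columns += [f"{relation}_{to_snake_case(key)}" for key in key_fields]
    return columns


columns = through_table_columns(
    "ggwgebieden", "buurten", "bestaatUitBuurten", ["identificatie", "volgnummer"]
)
print("\n".join(columns))
```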

property default_version: str: Default version for this schema.

property description: str | None: Description of the dataset (if set).

classmethod from_dict(obj: dict[str, Any], dataset_collection: CachedSchemaLoader | None = None) → DatasetSchema: Parses given dict and validates the given schema.

get_table_by_id = <methodtools._LruCacheWire object>

get_tables(include_nested: bool = False, include_through: bool = False) → list[schematools.types.DatasetTableSchema]: List tables, including nested.

property identifier: str: Which fields acts as identifier. (default is Django “pk” field).

property is_default_version: bool: Tell whether this is the default dataset version. Defaults to True, in order to stay backwards compatible.

json(inline_tables: bool = False) → str: Overwritten JSON logic that inlines tables by default.

json_data(inline_tables: bool = False) → Union[str, int, float, bool, None, Dict[str, Any], List[Any]]: Overwritten logic that inlines tables

property license: str | None: The license of the table as stated in the schema.

property nested_tables: list[schematools.types.DatasetTableSchema]: Access list of nested tables.

property python_name: str: The ‘id’, but camel cased like a class name.

property related_dataset_schema_ids: set[str]

Access the list of related schema ids.

This property calculates the related data that are needed, so the users of this dataset can preload these datasets. This can also include the current dataset, for relations that point to other tables within the same dataset.

property status: Status

property table_versions: dict[str, schematools.types.TableVersions]: Access different versions of the table, as mentioned in the dataset file.

property tables: list[schematools.types.DatasetTableSchema]: Access the tables within the file.

property through_tables: list[schematools.types.DatasetTableSchema]: Access list of through_tables, for n-m relations.

property title: str | None: Title of the dataset (if set)

property version: str: Dataset Schema version.

class schematools.types.DatasetTableSchema(*args: Any, parent_schema: DatasetSchema, _parent_table: DatasetTableSchema | None = None, nested_table: bool = False, through_table: bool = False, **kwargs: Any)

Bases: SchemaType

The table within a dataset. This table definition follows the JSON Schema spec.

This class has an id property (inherited from SchemaType) to uniquely address this dataset-table in the scope of the DatasetSchema. This id is used in lots of places in the dynamic model generation in Django.

There is also a db_name property, which is used for the auto-generation of database table names. It also reads the shortname, to define a human-readable abbreviation that fits inside the maximum database table name length.

__init__(*args: Any, parent_schema: DatasetSchema, _parent_table: DatasetTableSchema | None = None, nested_table: bool = False, through_table: bool = False, **kwargs: Any) → None

property additional_relations: list[schematools.types.AdditionalRelationSchema]

Fetch list of additional (backwards or N-N) relations.

This is a dictionary of names for existing forward relations in other tables with either the ‘embedded’ or ‘summary’ property

property auth: frozenset[str]: Auth of the table, or OPENBAAR.

property crs: str

property dataset: DatasetSchema: The dataset that this table is part of.

property db_name: str

Return the standard database name for the table.

For some custom situations (e.g. importer, or handling table versions), use db_name_variant().

db_name_variant(*, with_dataset_prefix: bool = True, with_version: bool = False, postfix: str = '', check_assert: bool = True) → str

Return derived table name for DB usage.

Parameters

with_dataset_prefix – if True, include dataset ID as a prefix to the table name.
with_version – if True, include the major and minor version number in the table name.
postfix – An optional postfix to append to the table name
check_assert – Check the maximum table name length. Can be turned off to have the check done by validation code (with much better error reporting).

Returns

A derived table name suitable for DB usage.

property description: str | None: The description of the table as stated in the schema.

property display_field: DatasetFieldSchema | None: Tell which fields can be used as display field.

property fields: list[schematools.types.DatasetFieldSchema]

All the fields of the table.

This returns the direct fields that are part of the table. Fields that have “type=object” can define nested fields, which are not included here. These fields can either be read using field.subfields, or be inlined using get_fields(include_subfields=True).

classmethod from_dict(obj: Union[str, int, float, bool, None, Dict[str, Any], List[Any]]) → DTS

get_additional_relation_by_id = <methodtools._LruCacheWire object>

get_dataset_schema(dataset_id: str) → DatasetSchema | None: Return the associated parent datasetschema for this table.

get_field_by_id = <methodtools._LruCacheWire object>

get_fields(include_subfields: bool = False) → Iterator[DatasetFieldSchema]

Get the fields for this table.

Parameters: include_subfields – This includes the subfields of object fields, so those can be inlined in the main table. This is useful for ORM and SQL databases, that can’t support a nested structure.

get_reverse_relation(field: DatasetFieldSchema) → AdditionalRelationSchema | None: Find the description of a reverse relation for a field.

property has_composite_key: bool: Tell whether the table uses multiple attributes together as its identifier.

property has_geometry_fields: bool

property has_parent_table: bool: For nested or through tables, there is a parent table.

property has_shortname: bool

property identifier: list[str]: The main identifier field, if there is an identifier field available. Default to “id” for existing schemas without an identifier field.

property identifier_fields: list[schematools.types.DatasetFieldSchema]: Return the field schemas for the identifier fields.

property is_autoincrement: bool: Return bool indicating autoincrement behaviour of the table identifier.

property is_nested_table: bool: Indicates if table is a nested table.

property is_temporal: bool: Indicates if this is a table with temporal characteristics

property is_through_table: bool

Indicate if table is an intersection table (n:m relation table) or base table.

property main_geometry: str: The main geometry field, if there is a geometry field available. Default to “geometry” for existing schemas without a mainGeometry field.

property main_geometry_field: DatasetFieldSchema: The main geometry as field object

property parent_table: DatasetTableSchema | None

The parent table of this table.

For nested and through tables, the parent table is available.

property parent_table_field: DatasetFieldSchema | None: Provide the NM-relation that generated this through table.

property python_name: str: The ‘id’, but camel cased like a class name.

property qualified_id: str: The fully qualified ID (for debugging)

property related_dataset_ids: set[str]

Tell which dataset IDs the relations point to.

This can also include the current dataset, for relations that point to other tables within the same dataset.

property shortname: str: The shorter name if present, otherwise the ID. This is only used to generate human-readable database table names.

property temporal: Temporal | None

The temporal property of a Table. Describes validity of objects for tables where different versions of objects are valid over time.

Temporal has an identifier property that refers to the attribute of objects in the table that uniquely identifies a specific version of an object from among other versions of the same object.

Temporal also has a dimensions property, which gives the attributes of objects that determine for what (time)period an object is valid.

property through_fields: tuple[DatasetFieldSchema, DatasetFieldSchema] | None

Return the left and right side of an M2M through table.

This only returns results when the table describes the intermediate table of an M2M relation (is_through_table is true).

property title: str | None: Title of the table.

validate(row: Union[str, int, float, bool, None, Dict[str, Any], List[Any]]) → None: Validate a record against the schema.

property version: SemVer: Get table version.

class schematools.types.DatasetType(dict=None, /, **kwargs)

Bases: JsonDict

Base class for child elements of the schema.

class schematools.types.Faker(name: str, properties: dict[str, typing.Any] = <factory>)

Bases: object

Name and properties that can be used for mock data.

__init__(name: str, properties: dict[str, typing.Any] = <factory>) → None

name: str

properties: dict[str, Any]

class schematools.types.JsonDict(dict=None, /, **kwargs)

Bases: UserDict

json() → str

json_data() → Union[str, int, float, bool, None, Dict[str, Any], List[Any]]

class schematools.types.Permission(level: PermissionLevel, sub_value: str | None = None, source: str | None = None)

Bases: object

The result of an authorisation check.

The extra fields in this dataclass are mainly provided for debugging purposes. The dataclass can also be ordered; they get sorted by access level.

__init__(level: PermissionLevel, sub_value: str | None = None, source: str | None = None) → None

classmethod from_string(value: str | None, source: str | None = None) → Permission: Cast the string value to a permission level object.

level: PermissionLevel: The permission level given by the profile

none = Permission(level=<PermissionLevel.NONE: 0>, sub_value=None, source='schema')

source: str | None = None: Who authenticated this (added for easier debugging. typically tested against)

sub_value: str | None = None

The extra parameter for the level (e.g. “letters:3”)

transform_function() → Callable[[Json], Json] | None: Adjust the value, when the permission level requires this. This is needed for “letters:3”, and things like “encoded”.

class schematools.types.PermissionLevel(value)

Bases: Enum

The various levels that can be provided on specific fields.

ENCODED = 40

LETTERS = 10

NONE = 0

RANDOM = 30

READ = 50

SUBOBJECTS_ONLY = 1

classmethod from_string(value: str | None) → PermissionLevel: Cast the string value to a permission level object.

highest = 50
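How a string such as “letters:3” maps to a level plus an extra parameter, and how levels compare by access, can be sketched without the library. The names Level, Grant, parse_grant and strongest are hypothetical stand-ins for this example. Only the numeric values are taken from the listing above; the exact strings the real from_string methods accept are an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """Stand-in enum reusing the numeric values listed above."""

    NONE = 0
    SUBOBJECTS_ONLY = 1
    LETTERS = 10
    RANDOM = 30
    ENCODED = 40
    READ = 50


@dataclass(frozen=True)
class Grant:
    """Stand-in for a permission: a level plus an optional extra parameter."""

    level: Level
    sub_value: str | None = None


def parse_grant(value: str | None) -> Grant:
    """Split 'level:parameter' into its parts (assumed string format)."""
    if not value:
        return Grant(Level.NONE)
    name, _, sub_value = value.partition(":")
    return Grant(Level[name.upper()], sub_value or None)


def strongest(grants: list[Grant]) -> Grant:
    """Pick the grant with the highest access level."""
    return max(grants, key=lambda grant: grant.level.value)


grants = [parse_grant("letters:3"), parse_grant("read"), parse_grant("encoded")]
print(strongest(grants))
```

Comparing on the numeric value keeps the sketch independent of how the real dataclass implements its ordering.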

class schematools.types.ProfileDatasetSchema(_id: str, _parent_schema: ProfileSchema, data: Union[str, int, float, bool, None, Dict[str, Any], List[Any]])

Bases: DatasetType

A schema inside the profile dataset.

It grants permissions to a dataset on a global level, or more fine-grained permissions to specific tables.

__init__(_id: str, _parent_schema: ProfileSchema, data: Union[str, int, float, bool, None, Dict[str, Any], List[Any]]) → None

property id: str

property permissions: Permission: Global permissions that are granted to the dataset. e.g. “read”.

property profile: ProfileSchema | None: The profile that this definition is part of.

property tables: dict[str, schematools.types.ProfileTableSchema]: The tables that this profile provides additional access rules for.

class schematools.types.ProfileSchema(dict=None, /, **kwargs)

Bases: SchemaType

The complete profile object.

It contains the scopes that the user should match, and definitions for various datasets.

property datasets: dict[str, schematools.types.ProfileDatasetSchema]: The datasets that this profile provides additional access rules for.

classmethod from_dict(obj: Union[str, int, float, bool, None, Dict[str, Any], List[Any]]) → ProfileSchema: Parses given dict and validates the given schema

classmethod from_file(filename: str) → ProfileSchema: Open an Amsterdam schema from a file.

property name: str | None: Name of Profile (if set)

property scopes: frozenset[str]: All these scopes should match in order to activate the profile.

class schematools.types.ProfileTableSchema(_id: str, _parent_schema: ProfileDatasetSchema, data: Union[str, int, float, bool, None, Dict[str, Any], List[Any]])

Bases: DatasetType

A single table in the profile.

This grants permissions to a specific table, or more fine-grained permissions to specific fields. When mandatory_filtersets is defined, the table may only be queried when specific search query parameters are issued.

__init__(_id: str, _parent_schema: ProfileDatasetSchema, data: Union[str, int, float, bool, None, Dict[str, Any], List[Any]]) → None

property dataset: ProfileDatasetSchema | None: The profile that this definition is part of.

property fields: dict[str, schematools.types.Permission]

The fields with their granted permission level.

This can be “read” or things like “letters:3”.

property id: str

property mandatory_filtersets: list[list[str]]

Tell whether the listing can only be requested with certain inputs.

E.g., an API user may only list data when they supply the lastname + birthdate.

Example value:

[
  ["bsn", "lastname"],
  ["postcode", "regimes.aantal[gte]"]
]
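One way to read this value is that a listing request is allowed when every parameter of at least one inner list is supplied. The sketch below follows that reading; it is an interpretation of the description above, not the library's own check. The function satisfies_filtersets is hypothetical, and parameter names (including operator suffixes such as "[gte]") are matched literally.

```python
def satisfies_filtersets(query_params, mandatory_filtersets):
    """Return True if the query supplies all parameters of at least one set."""
    if not mandatory_filtersets:
        # Assumption: no mandatory filtersets means no restriction.
        return True
    # Only count parameters that actually carry a value.
    supplied = {
        name for name, value in query_params.items() if value not in (None, "")
    }
    return any(set(required) <= supplied for required in mandatory_filtersets)


filtersets = [
    ["bsn", "lastname"],
    ["postcode", "regimes.aantal[gte]"],
]

print(satisfies_filtersets({"bsn": "123", "lastname": "Jansen"}, filtersets))
print(satisfies_filtersets({"postcode": "1011AB"}, filtersets))
```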

property permissions: Permission: Global permissions that are granted for the table, e.g. “read”.

class schematools.types.SchemaType(dict=None, /, **kwargs)

Bases: JsonDict

Base class for top-level schema objects (dataset, table, profile).

property db_name: str: The object name in a database-compatible format.

classmethod from_dict(obj: Union[str, int, float, bool, None, Dict[str, Any], List[Any]]) → ST

property id: str

property python_name: str: The ‘id’, but snake-cased like a python variable. Some object types (e.g. dataset and table) may override this in classname notation.

property type: str

class schematools.types.SemVer(version: str)

Bases: str

Semantic version numbers.

Semantic version numbers take the form X.Y.Z where X, Y, and Z are non-negative integers, and MUST NOT contain leading zeroes. X is the major version, Y is the minor version, and Z is the patch version. Each element MUST increase numerically. For instance: 1.9.0 -> 1.10.0 -> 1.11.0.

See also: https://semver.org/ (where the above “definition” was taken from)

This class allows semantic version numbers to be prefixed with “v”, e.g. “v1.11.0”. However, their canonical form, as output by the __str__() and __repr__() methods, will not include that prefix.

In addition, the minor and patch version can be left unspecified. SemVer will assume them to be equal to 0 in that case.

This class was specifically made a subclass of str to allow for seamless JSON serialization.
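The parsing rules described above (optional “v” prefix, minor and patch defaulting to 0, numeric rather than textual ordering) can be sketched in a few lines. This is a minimal standalone illustration, not the class's actual code: parse_version, significant and _PATTERN are hypothetical names, the pattern is a simplified one written for this example, and it does not reject leading zeroes.

```python
from __future__ import annotations

import re

# Simplified pattern: optional "v", a major number, optional minor and patch.
_PATTERN = re.compile(
    r"^v?(?P<major>\d+)(?:\.(?P<minor>\d+)(?:\.(?P<patch>\d+))?)?$"
)


def parse_version(version: str) -> tuple[int, int, int]:
    """Parse a version string into a (major, minor, patch) tuple of ints."""
    match = _PATTERN.match(version)
    if match is None:
        raise ValueError(f"not a semantic version number: {version!r}")
    # Parts that were left out are treated as 0.
    return (
        int(match["major"]),
        int(match["minor"] or 0),
        int(match["patch"] or 0),
    )


def significant(version: str) -> str:
    """Join the major and minor numbers with an underscore."""
    major, minor, _patch = parse_version(version)
    return f"{major}_{minor}"


versions = ["1.10.0", "v1.9.0", "1.11.0"]
# Sorting on the parsed tuples orders 1.9.0 before 1.10.0, unlike a text sort.
print(sorted(versions, key=parse_version))
print(significant("v3.9.0"))
```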

PAT: ClassVar[Pattern[str]] = re.compile("\n ^v? # Optionally start with a 'v' (for version)\n (?P<major>\\d+) # A major version number is compulsory\n (?:\\. # Optionally f, re.VERBOSE)

__init__(version: str) → None

Create a SemVer using a str that could be interpreted as a semantic version number.

Examples

>>> SemVer("1.0.0")
SemVer("1.0.0")

>>> SemVer("v54")
SemVer("54.0.0")

>>> SemVer("v3.9.0")
SemVer("3.9.0")

Parameters: version – A semantic version number, optionally prefixed with a “v”.
Raises: ValueError – if the string supplied is not a semantic version number.

major: int

minor: int

patch: int

property signif: str

Return stringified significant part of SemVer.

Significant being the version numbers without the patch level. Stringified as in both significant numbers as a string separated by an underscore.

Examples

>>> SemVer("v3.9.0").signif
"3_9"

class schematools.types.TableVersions(table_id: str, default_version: str, version_paths: dict[str, str], parent_dataset: DatasetSchema)

Bases: Mapping[str, DatasetTableSchema]

Lazy evaluated dict that provides access to other table versions.

__init__(table_id: str, default_version: str, version_paths: dict[str, str], parent_dataset: DatasetSchema)

class schematools.types.Temporal(identifier: str, identifier_field: ~schematools.types.DatasetFieldSchema, dimensions: dict[str, schematools.types.TemporalDimensionFields] = <factory>)

Bases: object

The temporal property of a Table.

Describes validity of objects for tables where different versions of objects are valid over time.

identifier

The key to the property that uniquely identifies a specific version of an object from among other versions of the same object.

This property combined with the fixed identifier forms a unique key for an object.

These identifier properties are non-contiguous increasing integers. The latest version of an object will have the highest value for identifier.

Type: str

dimensions

Contains the attributes of objects that determine for what (time)period an object is valid.

Dimensions is of type dict. A dimension is a tuple of the form “(‘valid_start’, ‘valid_end’)”, describing a closed set along the dimension for which an object is valid.

Example

With dimensions = {“time”:(‘valid_start’, ‘valid_end’)} an_object will be valid on some_time if: an_object.valid_start <= some_time < an_object.valid_end

Type: dict[str, schematools.types.TemporalDimensionFields]
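The validity rule from the example above can be written as a small standalone check. The function is_valid_on is hypothetical and works on plain date values rather than on schema objects. Treating a missing end value as "still valid" is an assumption added for this sketch and is not stated in the documentation above.

```python
from __future__ import annotations

from datetime import date


def is_valid_on(valid_start: date, valid_end: date | None, when: date) -> bool:
    """Apply the rule valid_start <= when < valid_end from the example."""
    if valid_end is None:
        # Assumption: no end value means the object has not expired yet.
        return valid_start <= when
    return valid_start <= when < valid_end


start, end = date(2020, 1, 1), date(2021, 1, 1)
print(is_valid_on(start, end, date(2020, 6, 1)))  # a date inside the range
print(is_valid_on(start, end, end))  # the end date itself
```

Under the rule as written in the example, the start date is included and the end date is excluded.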

__init__(identifier: str, identifier_field: ~schematools.types.DatasetFieldSchema, dimensions: dict[str, schematools.types.TemporalDimensionFields] = <factory>) → None

dimensions: dict[str, schematools.types.TemporalDimensionFields]

identifier: str

identifier_field: DatasetFieldSchema

class schematools.types.TemporalDimensionFields(start: DatasetFieldSchema, end: DatasetFieldSchema)

Bases: NamedTuple

A tuple that describes the fields for start field and end field of a range.

This could be something like ("beginGeldigheid", "eindGeldigheid").

end: DatasetFieldSchema: Alias for field number 1

start: DatasetFieldSchema: Alias for field number 0