schematools.types module

Python types for the Amsterdam Schema JSON file contents.

class schematools.types.AdditionalRelationSchema(_id: str, _parent_table: DatasetTableSchema | None = None, **kwargs)

Bases: DatasetType

Data class describing the additional relation block.

__init__(_id: str, _parent_table: DatasetTableSchema | None = None, **kwargs)
property format

Format of the relation: “summary” or “embedded”.

property id
is_reverse_relation(field: DatasetFieldSchema)

Tell whether this relation is the reverse relation of the given field.

property name: str
property parent_table
property python_name: str
property related_field: DatasetFieldSchema

Return the field this reverse relation queries to find objects.

property related_table: DatasetTableSchema

Return the table this relation references.

property relation: str

Relation identifier.

class schematools.types.DatasetFieldSchema(*args: Any, _parent_table: DatasetTableSchema | None, _parent_field: DatasetFieldSchema | None = None, _required: bool = False, _temporal_range: bool = False, **kwargs: Any)

Bases: DatasetType

A single field (column) in a table.

__init__(*args: Any, _parent_table: DatasetTableSchema | None, _parent_field: DatasetFieldSchema | None = None, _required: bool = False, _temporal_range: bool = False, **kwargs: Any) None
property auth: frozenset[str]

Auth of the field, or OPENBAAR.

property crs: str | None

CRS for this field, or None if not a geo field.

property db_name: str

Return the name that is used in the database. This can differ from the internal name when the field is a relation or has a shortname.

property description: str | None
property faker: str | None

Return faker name and properties used for mocking data.

property field_items: Json | None

Return the item definition for an array type.

property format: str | None
get_field_by_id = <methodtools._LruCacheWire object>
property has_shortname: bool

Reports whether this field has a shortname.

You should never have to call this: name returns the shortname, if present.

property id: str

The id of a field uniquely identifies it among the fields of a table.

Note that comparisons against id should be avoided when fields are retrieved using .get_fields(include_subfields=True). In that case, a subfield with the same ID will match the top-level field.

property is_array: bool

Checks if field is an array field.

property is_array_of_objects: bool

Checks if field is an array of objects.

property is_array_of_scalars: bool

Checks if field is an array of scalars.
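The three array checks above can be sketched against raw field definitions. This is a minimal sketch that assumes a field is a plain JSON dict with a "type" key and an optional "items" key; the function names and the SCALAR_TYPES set are hypothetical and not part of schematools.

```python
from typing import Any

# Hypothetical: the JSON Schema types treated as scalars in this sketch.
SCALAR_TYPES = {"string", "integer", "number", "boolean"}


def is_array(field: dict[str, Any]) -> bool:
    # An array field declares the JSON Schema type "array".
    return field.get("type") == "array"


def is_array_of_objects(field: dict[str, Any]) -> bool:
    # The "items" definition describes an object, e.g. a relation with a composite key.
    return is_array(field) and field.get("items", {}).get("type") == "object"


def is_array_of_scalars(field: dict[str, Any]) -> bool:
    # The "items" definition describes a plain value such as a string or an integer.
    return is_array(field) and field.get("items", {}).get("type") in SCALAR_TYPES
```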

property is_autoincrement: bool
property is_composite_key

Tell whether the relation uses a composite key.

property is_geo: bool

Tell whether the field references a geo object.

property is_identifier_part: bool

Tell whether the field is part of the composite primary key.

property is_loose_relation

Determine if relation is loose or not.

property is_nested_object: bool

Checks if field is a possible nested object definition.

property is_nested_table: bool

Checks if field is a possible nested table.

property is_object: bool

Tell whether the field references an object. This might also be a relation, with a composite key. In both cases, the object subfields could be inlined in the main SQL table. See also: is_nested_object and is_composite_key.

property is_primary: bool

When the name is ‘id’, the field should be the primary key. For composite keys (table.identifier has more than one item), an ‘id’ field is autogenerated.

property is_relation_temporal

Tell whether the 1-N relationship is modelled by an intermediate table. This allows tracking multiple versions of the relationship.

property is_scalar: bool

Tell whether the field is a scalar.

property is_subfield: bool

Tell whether this field is part of an embedded object (e.g. a temporal relation).

property is_temporal_range: bool

Tell whether the field is used to store the range of a temporal dimension (e.g. beginGeldigheid or eindGeldigheid).

property is_through_table: bool

Checks if field is a possible through table.

N-M relations always have a through table. For 1-N relations, there is a through table if the target of the relation is temporal.

property multipleof: float | None
property name: str

The name as it is shown to the external world, camel-cased. In general, the “id” field is already camel-cased, but in case that didn’t happen this property will fix that.

property nested_table: DatasetTableSchema | None

Access the nested table that this field needs to store its data.

property nm_relation: str | None

Give the N:M relation, if it exists (called M2M in Django).

property parent_field: DatasetFieldSchema | None

Provide access to the top-level field where it is a property for.

property provenance: str | None

Get the provenance info, if available, or None.

property python_name: str

The name as it’s used internally in Python or an ORM, snake-cased.
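The camelCase to snake_case conversion behind python_name can be sketched with a single regular expression. The helper to_snake_case below is hypothetical; the conversion rules inside schematools may cover more edge cases (acronyms, digits followed by capitals, and so on).

```python
import re


def to_snake_case(name: str) -> str:
    # Insert "_" before every uppercase letter that follows a lowercase letter
    # or digit, then lowercase everything: "bestaatUitBuurten" -> "bestaat_uit_buurten".
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()
```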

property qualified_id: str

The fully qualified ID (for debugging)

property related_field_ids: list[str] | None

For a relation field, returns the identifiers of the referenced fields.

The returned list contains only the fields, e.g., [“id”, “volgnummer”]. These are fields on the table self.related_table.

For loose relations, it will only return the first field of the related table.

If self is not a relation field, the return value is None.

property related_fields: list[DatasetFieldSchema] | None

Convenience property that returns the related field schemas.

property related_table: DatasetTableSchema | None

If this field is a relation, return the table this relation references.

property relation: str | None

Give the 1:N relation, if it exists.

property required: bool
property reverse_relation: AdditionalRelationSchema | None

Find the opposite description of a relation.

When there is a relation, this only returns a description when the linked table also describes the other end of the relationship.

property shortname: str

The shorter name if present, otherwise the ID. Note this is only used to generate human-readable database table names.

property srid: int | None

The integer value for the spatial reference ID (for geometry fields).
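The relation between crs and srid can be sketched as parsing the numeric code out of the CRS string. This sketch assumes the CRS is written as "EPSG:<code>" (e.g. "EPSG:28992", the Dutch RD New system); the helper srid_from_crs is hypothetical and not the schematools implementation.

```python
from __future__ import annotations


def srid_from_crs(crs: str | None) -> int | None:
    # Non-geo fields have no CRS, so they have no SRID either.
    if crs is None:
        return None
    # Assumption: the CRS uses the "EPSG:<code>" notation.
    authority, _, code = crs.partition(":")
    if authority.upper() != "EPSG" or not code.isdigit():
        raise ValueError(f"Unsupported CRS notation: {crs!r}")
    return int(code)
```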

property subfields: list[schematools.types.DatasetFieldSchema]

Return the subfields for a nested structure.

For a nested object, fields are based on its properties, for an array of objects, fields are based on the properties of the “items” field.

When subfields are added as part of a 1:N relation, those subfields need to be prefixed with the name of the relation field. However, this is not the case for the so-called dimension fields of a temporal relation (e.g. beginGeldigheid and eindGeldigheid).

If self is not an object or array, the return value is an empty iterator.

property table: DatasetTableSchema | None

The table that this field is a part of.

property through_table: DatasetTableSchema | None

Access the through table that this field needs to store its data.

property title: str | None

Title of the field.

property type: str

Returns the type of this field.

The type is one of the JSON Schema types “string”, “integer”, “number”, “object”, “array” or “boolean”, or the URL of a schema defining a type (for geo types). “null” is never used by Amsterdam Schemas.

Dates and URLs have type “string”. Check the format to distinguish them from free-form text.

See https://schemas.data.amsterdam.nl/docs/ams-schema-spec.html#data-types for details.
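Because dates and URLs share the type “string”, only the format tells them apart. The sketch below assumes the standard JSON Schema format names ("date", "date-time", "time", "uri"); the exact set Amsterdam Schema allows should be checked in the spec linked above. The function describe_string_field and its return labels are hypothetical.

```python
from typing import Any


def describe_string_field(field: dict[str, Any]) -> str:
    if field.get("type") != "string":
        return "not a string"
    fmt = field.get("format")
    # Dates and URLs have type "string"; the format distinguishes them from free text.
    if fmt in ("date", "date-time", "time"):
        return "temporal"
    if fmt == "uri":
        return "url"
    return "free-form text"
```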

class schematools.types.DatasetSchema(data: dict, dataset_collection: CachedSchemaLoader | None = None)

Bases: SchemaType

The schema of a dataset.

This is a collection of JSON Schemas within a single file.

class Status(value)

Bases: Enum

The allowed status values according to the Amsterdam schema spec.

beschikbaar = 'beschikbaar'
niet_beschikbaar = 'niet_beschikbaar'
__init__(data: dict, dataset_collection: CachedSchemaLoader | None = None) None

When initializing a dataset, a cache of related datasets can be added (at class level). This makes (temporal) information about the related datasets available.

Parameters
  • data – The JSON data from the file.

  • dataset_collection – The shared collection that the dataset should become part of. This is used to resolve relations between different datasets.

property auth: frozenset[str]

Auth of the dataset, or OPENBAAR.

build_nested_table(field: DatasetFieldSchema) DatasetTableSchema

Construct an in-line table object for a nested field.

build_through_table(field: DatasetFieldSchema) DatasetTableSchema

Build the through table.

The through tables are not defined separately in a schema. The fact that an M2M relation needs an extra table is an implementation aspect. However, the through (aka junction) table schema is needed for dynamic model generation and for data importing.

FK relations also have an additional through table, because the temporal information of the relation needs to be stored somewhere.

For relations with an object-type definition of the relation, the fields for the source and target side of the relation are stored separately in the through table. E.g. for an M2M relation like this:

"bestaatUitBuurten": {
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "identificatie": {
        "type": "string"
      },
      "volgnummer": {
        "type": "integer"
      }
    }
  },
  "relation": "gebieden:buurten",
  "description": "De buurten waaruit het object bestaat."
}

The through table has the following fields:
  • ggwgebieden_id

  • buurten_id

  • ggwgebieden_identificatie

  • ggwgebieden_volgnummer

  • bestaat_uit_buurten_identificatie

  • bestaat_uit_buurten_volgnummer
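The field names listed above follow a visible pattern: one foreign key per side, then the source-side identifier fields prefixed with the parent table ID, then the relation’s subfields prefixed with the snake-cased relation name. The sketch below only reproduces that naming for this example. It assumes the source-side fields come from the parent table’s identifier; the function through_field_names is hypothetical and build_through_table may derive names differently in other cases.

```python
import re


def to_snake_case(name: str) -> str:
    # "bestaatUitBuurten" -> "bestaat_uit_buurten"
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


def through_field_names(
    parent_table: str,
    relation_field: str,
    relation: str,
    parent_identifier: list[str],
    relation_subfields: list[str],
) -> list[str]:
    # The target table is the part after the colon in e.g. "gebieden:buurten".
    target_table = relation.partition(":")[2]
    # Foreign keys to both sides of the relation.
    names = [f"{parent_table}_id", f"{target_table}_id"]
    # Source side: the parent table's identifier fields (an assumption of this sketch).
    names += [f"{parent_table}_{field}" for field in parent_identifier]
    # Target side: the relation's subfields, prefixed with the relation field name.
    prefix = to_snake_case(relation_field)
    names += [f"{prefix}_{sub}" for sub in relation_subfields]
    return names
```

For the bestaatUitBuurten example, calling through_field_names("ggwgebieden", "bestaatUitBuurten", "gebieden:buurten", ["identificatie", "volgnummer"], ["identificatie", "volgnummer"]) yields the six names in the list above.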

property default_version: str

Default version for this schema.

property description: str | None

Description of the dataset (if set).

classmethod from_dict(obj: dict[str, Any], dataset_collection: CachedSchemaLoader | None = None) DatasetSchema

Parses given dict and validates the given schema.

get_table_by_id = <methodtools._LruCacheWire object>
get_tables(include_nested: bool = False, include_through: bool = False) list[schematools.types.DatasetTableSchema]

List the tables, optionally including nested and through tables.

property identifier: str

Which field acts as the identifier (the default is the Django “pk” field).

property is_default_version: bool

Whether this is the default dataset version. Defaults to True, to stay backwards compatible.

json(inline_tables: bool = False) str

Overridden JSON logic that can inline the tables (see inline_tables).

json_data(inline_tables: bool = False) Union[str, int, float, bool, None, Dict[str, Any], List[Any]]

Overridden logic that can inline the tables (see inline_tables).

property license: str | None

The license of the dataset as stated in the schema.

property nested_tables: list[schematools.types.DatasetTableSchema]

Access list of nested tables.

property python_name: str

The ‘id’, but camel cased like a class name.

property related_dataset_schema_ids: set[str]

Access the set of related schema IDs.

This property calculates which related datasets are needed, so users of this dataset can preload them. This can also include the current dataset, for relations that point to other tables within the same dataset.
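Since relation values take the form "dataset:table" (e.g. "gebieden:buurten"), the related dataset IDs can be sketched as the set of prefixes before the colon. This minimal sketch works on plain field dicts rather than schematools objects; the function related_dataset_ids is hypothetical.

```python
from collections.abc import Iterable
from typing import Any


def related_dataset_ids(fields: Iterable[dict[str, Any]]) -> set[str]:
    ids: set[str] = set()
    for field in fields:
        relation = field.get("relation")
        if relation:
            # "gebieden:buurten" -> dataset "gebieden" (table "buurten" is ignored here)
            dataset_id, _, _table_id = relation.partition(":")
            ids.add(dataset_id)
    return ids
```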

property status: Status
property table_versions: dict[str, schematools.types.TableVersions]

Access different versions of the table, as mentioned in the dataset file.

property tables: list[schematools.types.DatasetTableSchema]

Access the tables within the file.

property through_tables: list[schematools.types.DatasetTableSchema]

Access list of through_tables, for n-m relations.

property title: str | None

Title of the dataset (if set)

property version: str

Dataset Schema version.

class schematools.types.DatasetTableSchema(*args: Any, parent_schema: DatasetSchema, _parent_table: DatasetTableSchema | None = None, nested_table: bool = False, through_table: bool = False, **kwargs: Any)

Bases: SchemaType

The table within a dataset. This table definition follows the JSON Schema spec.

This class has an id property (inherited from SchemaType) to uniquely address this dataset-table in the scope of the DatasetSchema. This id is used in lots of places in the dynamic model generation in Django.

There is also a db_name property, which is used to auto-generate database table names. It also reads the shortname, which defines a human-readable abbreviation that fits within the maximum database table name length.

__init__(*args: Any, parent_schema: DatasetSchema, _parent_table: DatasetTableSchema | None = None, nested_table: bool = False, through_table: bool = False, **kwargs: Any) None
property additional_relations: list[schematools.types.AdditionalRelationSchema]

Fetch the list of additional (backwards or N-N) relations.

In the schema, this is a dictionary of names for existing forward relations in other tables, each with either the ‘embedded’ or ‘summary’ format.

property auth: frozenset[str]

Auth of the table, or OPENBAAR.

property crs: str
property dataset: DatasetSchema

The dataset that this table is part of.

property db_name: str

Return the standard database name for the table.

For some custom situations (e.g. importer, or handling table versions), use db_name_variant().

db_name_variant(*, with_dataset_prefix: bool = True, with_version: bool = False, postfix: str = '', check_assert: bool = True) str

Return derived table name for DB usage.

Parameters
  • with_dataset_prefix – if True, include dataset ID as a prefix to the table name.

  • with_version – if True, include the major and minor version number in the table name.

  • postfix – An optional postfix to append to the table name

  • check_assert – Check the maximum table name length. Can be turned off to have the check done by validation code (with much better error reporting).

Returns

A derived table name suitable for DB usage.
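The parameters above can be illustrated with a name-building sketch. The naming scheme here (parts joined with "_", a "v<major>_<minor>" version suffix) is hypothetical, as is the 63-character limit, which is PostgreSQL’s default identifier length and only assumed to be the relevant maximum. Unlike the real method, this sketch raises ValueError instead of relying on an assert.

```python
from __future__ import annotations

# Assumption: PostgreSQL's default maximum identifier length (NAMEDATALEN - 1).
MAX_TABLE_NAME_LENGTH = 63


def db_name_variant(
    dataset_id: str,
    table_shortname: str,
    version: tuple[int, int] = (1, 0),
    *,
    with_dataset_prefix: bool = True,
    with_version: bool = False,
    postfix: str = "",
    check_assert: bool = True,
) -> str:
    parts = [dataset_id] if with_dataset_prefix else []
    parts.append(table_shortname)
    if with_version:
        # Hypothetical format for the major and minor version number.
        parts.append(f"v{version[0]}_{version[1]}")
    name = "_".join(parts) + postfix
    if check_assert and len(name) > MAX_TABLE_NAME_LENGTH:
        raise ValueError(f"Table name {name!r} exceeds {MAX_TABLE_NAME_LENGTH} characters")
    return name
```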

property description: str | None

The description of the table as stated in the schema.

property display_field: DatasetFieldSchema | None

Tell which field can be used as the display field.

property fields: list[schematools.types.DatasetFieldSchema]

All the fields of the table.

This returns the direct fields that are part of the table. Fields that have “type=object” can define nested fields, which are not included here. These fields can either be read using field.subfields, or be inlined using get_fields(include_subfields=True).

classmethod from_dict(obj: Union[str, int, float, bool, None, Dict[str, Any], List[Any]]) DTS
get_additional_relation_by_id = <methodtools._LruCacheWire object>
get_dataset_schema(dataset_id: str) DatasetSchema | None

Return the associated parent datasetschema for this table.

get_field_by_id = <methodtools._LruCacheWire object>
get_fields(include_subfields: bool = False) Iterator[DatasetFieldSchema]

Get the fields for this table.

Parameters

include_subfields – This includes the subfields of object fields, so those can be inlined in the main table. This is useful for ORM and SQL databases, that can’t support a nested structure.
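The effect of include_subfields can be sketched on a raw "properties" dict: top-level fields are always yielded, and the properties of object fields are inlined after their parent. The dotted names are only for illustration (the real method yields DatasetFieldSchema objects), and the function below is a hypothetical stand-in.

```python
from collections.abc import Iterator
from typing import Any


def get_fields(
    properties: dict[str, dict[str, Any]], include_subfields: bool = False
) -> Iterator[str]:
    for field_id, field in properties.items():
        yield field_id
        # Inline the subfields of object fields, e.g. for a flat SQL table.
        if include_subfields and field.get("type") == "object":
            for sub_id in field.get("properties", {}):
                yield f"{field_id}.{sub_id}"
```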

get_reverse_relation(field: DatasetFieldSchema) AdditionalRelationSchema | None

Find the description of a reverse relation for a field.

property has_composite_key: bool

Tell whether the table uses multiple attributes together as its identifier.

property has_geometry_fields: bool
property has_parent_table: bool

For nested or through tables, there is a parent table.

property has_shortname: bool
property identifier: list[str]

The identifier field(s), if an identifier is defined. Defaults to “id” for existing schemas without an identifier field.

property identifier_fields: list[schematools.types.DatasetFieldSchema]

Return the field schemas for the identifier fields.
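The identifier default and the composite-key check can be sketched together. This sketch assumes the raw "identifier" value in a table definition may be either a single string or a list of strings; table_identifier and has_composite_key here are hypothetical helpers working on plain dicts.

```python
from typing import Any


def table_identifier(table: dict[str, Any]) -> list[str]:
    # Older schemas without an identifier fall back to "id".
    identifier = table.get("identifier", "id")
    # Assumption: a single field may be given as a plain string.
    return [identifier] if isinstance(identifier, str) else list(identifier)


def has_composite_key(table: dict[str, Any]) -> bool:
    # More than one identifier field means a composite key.
    return len(table_identifier(table)) > 1
```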

property is_autoincrement: bool

Return bool indicating autoincrement behaviour of the table identifier.

property is_nested_table: bool

Indicates if table is a nested table.

property is_temporal: bool

Indicates if this is a table with temporal characteristics.

property is_through_table: bool

Indicate whether the table is an intersection table (N:M relation table) or a base table.

property main_geometry: str

The main geometry field, if there is a geometry field available. Defaults to “geometry” for existing schemas without a mainGeometry field.

property main_geometry_field: DatasetFieldSchema

The main geometry as a field object.

property parent_table: DatasetTableSchema | None

The parent table of this table.

For nested and through tables, the parent table is available.

property parent_table_field: DatasetFieldSchema | None

Provide the NM-relation that generated this through table.

property python_name: str

The ‘id’, but camel cased like a class name.

property qualified_id: str

The fully qualified ID (for debugging)

property related_dataset_ids: set[str]

Tell which dataset IDs the relations point to.

This can also include the current dataset, for relations that point to other tables within the same dataset.

property shortname: str

The shorter name if present, otherwise the ID. This is only used to generate human-readable database table names.

property temporal: Temporal | None

The temporal property of a Table. Describes validity of objects for tables where different versions of objects are valid over time.

Temporal has an identifier property that refers to the attribute of objects in the table that uniquely identifies a specific version of an object from among other versions of the same object.

Temporal also has a dimensions property, which gives the attributes of objects that determine for what (time)period an object is valid.

property through_fields: tuple[DatasetFieldSchema, DatasetFieldSchema] | None

Return the left and right side of an M2M through table.

This only returns results when the table describes the intermediate table of an M2M relation (is_through_table is true).

property title: str | None

Title of the table.

validate(row: Union[str, int, float, bool, None, Dict[str, Any], List[Any]]) None

Validate a record against the schema.

property version: SemVer

Get table version.

class schematools.types.DatasetType(dict=None, /, **kwargs)

Bases: JsonDict

Base class for child elements of the schema.

class schematools.types.Faker(name: str, properties: dict[str, typing.Any] = <factory>)

Bases: object

Name and properties that can be used for mock data.

__init__(name: str, properties: dict[str, typing.Any] = <factory>) None
name: str
properties: dict[str, Any]
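The <factory> in the signature indicates that properties has a default factory, so each Faker gets its own dict. A stand-in dataclass mirroring that signature shows why this matters. The class name FakerSpec and the "pyint"/min_value values are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakerSpec:
    name: str
    # default_factory gives every instance a fresh dict instead of one shared default.
    properties: dict[str, Any] = field(default_factory=dict)
```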
class schematools.types.JsonDict(dict=None, /, **kwargs)

Bases: UserDict

json() str
json_data() Union[str, int, float, bool, None, Dict[str, Any], List[Any]]
class schematools.types.Permission(level: PermissionLevel, sub_value: str | None = None, source: str | None = None)

Bases: object

The result of an authorisation check.

The extra fields in this dataclass are mainly provided for debugging purposes. Instances can also be ordered; they are sorted by access level.

__init__(level: PermissionLevel, sub_value: str | None = None, source: str | None = None) None
classmethod from_string(value: str | None, source: str | None = None) Permission

Cast the string value to a permission level object.

level: PermissionLevel

The permission level given by the profile

none = Permission(level=<PermissionLevel.NONE: 0>, sub_value=None, source='schema')
source: str | None = None

Who authenticated this (added for easier debugging. typically tested against)

sub_value: str | None = None

The extra parameter for the level (e.g. “letters:3”).

transform_function() Callable[[Json], Json] | None

Adjust the value, when the permission level requires this. This is needed for “letters:3”, and things like “encoded”.
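A transform for “letters:3” can be sketched as a closure that truncates string values. The behaviour shown (keeping only the first N characters) is an assumption based on the level name, not confirmed by this reference; “encoded” and “random” are left out because their behaviour is not described here. The function make_transform is hypothetical.

```python
from __future__ import annotations

from typing import Any, Callable


def make_transform(level: str, sub_value: str | None = None) -> Callable[[Any], Any] | None:
    if level == "letters":
        if sub_value is None:
            raise ValueError("'letters' needs a length, e.g. 'letters:3'")
        length = int(sub_value)
        # Assumed behaviour: only the first `length` characters stay visible.
        return lambda value: value[:length] if isinstance(value, str) else value
    # Levels such as "read" pass values through unchanged, so no function is needed.
    return None
```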

class schematools.types.PermissionLevel(value)

Bases: Enum

The various levels that can be provided on specific fields.

ENCODED = 40
LETTERS = 10
NONE = 0
RANDOM = 30
READ = 50
SUBOBJECTS_ONLY = 1
classmethod from_string(value: str | None) PermissionLevel

Cast the string value to a permission level object.

highest = 50
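Parsing a permission string and ordering the result by level can be sketched with an enum that mirrors the values listed above. The "<level>:<sub_value>" notation is inferred from the “letters:3” example; the names Level and parse_permission are hypothetical stand-ins for PermissionLevel.from_string and Permission.from_string.

```python
from __future__ import annotations

from enum import Enum


class Level(Enum):
    # Values copied from PermissionLevel; a higher value grants more access.
    NONE = 0
    SUBOBJECTS_ONLY = 1
    LETTERS = 10
    RANDOM = 30
    ENCODED = 40
    READ = 50


def parse_permission(value: str | None) -> tuple[Level, str | None]:
    if not value:
        return Level.NONE, None
    # "letters:3" -> ("letters", "3"); "read" -> ("read", "")
    name, _, sub_value = value.partition(":")
    try:
        level = Level[name.upper()]
    except KeyError:
        raise ValueError(f"Unknown permission level: {value!r}") from None
    return level, sub_value or None
```

Ordering by Level.value then mirrors the “sorted by access level” behaviour of Permission, e.g. max(levels, key=lambda level: level.value) picks the most permissive grant.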
class schematools.types.ProfileDatasetSchema(_id: str, _parent_schema: ProfileSchema, data: Union[str, int, float, bool, None, Dict[str, Any], List[Any]])

Bases: DatasetType

A schema inside the profile dataset.

It grants permissions to a dataset on a global level, or more fine-grained permissions to specific tables.

__init__(_id: str, _parent_schema: ProfileSchema, data: Union[str, int, float, bool, None, Dict[str, Any], List[Any]]) None
property id: str
property permissions: Permission

Global permissions that are granted to the dataset. e.g. “read”.

property profile: ProfileSchema | None

The profile that this definition is part of.

property tables: dict[str, schematools.types.ProfileTableSchema]

The tables that this profile provides additional access rules for.

class schematools.types.ProfileSchema(dict=None, /, **kwargs)

Bases: SchemaType

The complete profile object.

It contains the scopes that the user should match, and definitions for various datasets.

property datasets: dict[str, schematools.types.ProfileDatasetSchema]

The datasets that this profile provides additional access rules for.

classmethod from_dict(obj: Union[str, int, float, bool, None, Dict[str, Any], List[Any]]) ProfileSchema

Parses given dict and validates the given schema

classmethod from_file(filename: str) ProfileSchema

Open an Amsterdam schema from a file.

property name: str | None

Name of Profile (if set)

property scopes: frozenset[str]

All these scopes should match in order to activate the profile.

class schematools.types.ProfileTableSchema(_id: str, _parent_schema: ProfileDatasetSchema, data: Union[str, int, float, bool, None, Dict[str, Any], List[Any]])

Bases: DatasetType

A single table in the profile.

This grants permissions to a specific table, or more fine-grained permissions to specific fields. When mandatory_filtersets is defined, the table may only be queried when specific search query parameters are supplied.

__init__(_id: str, _parent_schema: ProfileDatasetSchema, data: Union[str, int, float, bool, None, Dict[str, Any], List[Any]]) None
property dataset: ProfileDatasetSchema | None

The profile dataset that this definition is part of.

property fields: dict[str, schematools.types.Permission]

The fields with their granted permission level.

This can be “read” or things like “letters:3”.

property id: str
property mandatory_filtersets: list[list[str]]

Tell whether the listing can only be requested with certain inputs.

E.g., an API user may only list data when they supply the lastname + birthdate.

Example value:

[
  ["bsn", "lastname"],
  ["postcode", "regimes.aantal[gte]"]
]
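A listing check against mandatory_filtersets can be sketched as “at least one inner list must be fully present in the query”. That reading of the example value is an assumption; the function may_list is hypothetical and does not reflect how the API actually enforces this.

```python
from collections.abc import Container


def may_list(mandatory_filtersets: list[list[str]], query_params: Container[str]) -> bool:
    # Without mandatory filtersets, listing is unrestricted.
    if not mandatory_filtersets:
        return True
    # Assumption: every parameter of at least one filterset must be supplied.
    return any(
        all(param in query_params for param in filterset)
        for filterset in mandatory_filtersets
    )
```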
property permissions: Permission

Global permissions that are granted for the table, e.g. “read”.

class schematools.types.SchemaType(dict=None, /, **kwargs)

Bases: JsonDict

Base class for top-level schema objects (dataset, table, profile).

property db_name: str

The object name in a database-compatible format.

classmethod from_dict(obj: Union[str, int, float, bool, None, Dict[str, Any], List[Any]]) ST
property id: str
property python_name: str

The ‘id’, but snake-cased like a python variable. Some object types (e.g. dataset and table) may override this in classname notation.

property type: str
class schematools.types.SemVer(version: str)

Bases: str

Semantic version numbers.

Semantic version numbers take the form X.Y.Z where X, Y, and Z are non-negative integers, and MUST NOT contain leading zeroes. X is the major version, Y is the minor version, and Z is the patch version. Each element MUST increase numerically. For instance: 1.9.0 -> 1.10.0 -> 1.11.0.

See also: https://semver.org/ (where the above “definition” was taken from)

This class allows semantic version numbers to be prefixed with “v”, e.g. “v1.11.0”. However, their canonical form, as output by the __str__() and __repr__() methods, will not include that prefix.

In addition, the minor and patch version can be left unspecified. SemVer will assume them to be equal to 0 in that case.

This class was specifically made a subclass of str to allow for seamless JSON serialization.

PAT: ClassVar[Pattern[str]] = re.compile("\n        ^v?                     # Optionally start with a 'v' (for version)\n        (?P<major>\\d+)          # A major version number is compulsory\n        (?:\\.                   # Optionally f, re.VERBOSE)
__init__(version: str) None

Create a SemVer from a str that can be interpreted as a semantic version number.

Examples

>>> SemVer("1.0.0")
SemVer("1.0.0")
>>> SemVer("v54")
SemVer("54.0.0")
>>> SemVer("v3.9.0")
SemVer("3.9.0")
Parameters

version – A semantic version number, optionally prefixed with a “v”.

Raises

ValueError if the string supplied is not a semantic version number.

major: int
minor: int
patch: int
property signif: str

Return stringified significant part of SemVer.

Significant means the version numbers without the patch level. Stringified means both significant numbers joined by an underscore.

Examples

>>> SemVer("v3.9.0").signif
"3_9"
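The parsing rules described for SemVer (optional “v” prefix, minor and patch defaulting to 0, no leading zeroes) and the signif format can be sketched with a small regular expression. This is not the PAT pattern of the class (which is truncated above); parse_semver and signif are hypothetical helpers.

```python
import re

# Each number is either 0 or has no leading zero; minor and patch are optional.
_SEMVER = re.compile(
    r"v?(?P<major>0|[1-9]\d*)(?:\.(?P<minor>0|[1-9]\d*)(?:\.(?P<patch>0|[1-9]\d*))?)?"
)


def parse_semver(version: str) -> tuple[int, int, int]:
    match = _SEMVER.fullmatch(version)
    if match is None:
        raise ValueError(f"{version!r} is not a semantic version number")
    # Missing minor or patch parts default to 0.
    major, minor, patch = (int(match.group(g) or 0) for g in ("major", "minor", "patch"))
    return major, minor, patch


def signif(version: str) -> str:
    # Significant part: major and minor joined by an underscore.
    major, minor, _ = parse_semver(version)
    return f"{major}_{minor}"
```

Comparing the returned tuples gives the numeric ordering from the definition, so 1.9.0 sorts before 1.10.0.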
class schematools.types.TableVersions(table_id: str, default_version: str, version_paths: dict[str, str], parent_dataset: DatasetSchema)

Bases: Mapping[str, DatasetTableSchema]

Lazily evaluated dict that provides access to other table versions.

__init__(table_id: str, default_version: str, version_paths: dict[str, str], parent_dataset: DatasetSchema)
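A lazily evaluated mapping can be sketched with collections.abc.Mapping: only __getitem__, __iter__ and __len__ are required, and the loader runs on first access. The class LazyTableVersions, the loader callback and the example paths are hypothetical; the real class loads DatasetTableSchema objects through its parent dataset.

```python
from __future__ import annotations

from collections.abc import Callable, Iterator, Mapping
from typing import Any


class LazyTableVersions(Mapping[str, Any]):
    """Hypothetical stand-in: loads a table version only when it is first requested."""

    def __init__(
        self, default_version: str, version_paths: dict[str, str], loader: Callable[[str], Any]
    ) -> None:
        self.default_version = default_version
        self._version_paths = version_paths
        self._loader = loader
        self._cache: dict[str, Any] = {}

    def __getitem__(self, version: str) -> Any:
        if version not in self._cache:
            # Raises KeyError for unknown versions, as the Mapping protocol expects.
            path = self._version_paths[version]
            self._cache[version] = self._loader(path)
        return self._cache[version]

    def __contains__(self, version: object) -> bool:
        # The inherited Mapping.__contains__ calls __getitem__, which would trigger a load.
        return version in self._version_paths

    def __iter__(self) -> Iterator[str]:
        return iter(self._version_paths)

    def __len__(self) -> int:
        return len(self._version_paths)
```

Overriding __contains__ keeps membership tests cheap; without it, "v2" in versions would load the file just to answer yes or no.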
class schematools.types.Temporal(identifier: str, identifier_field: ~schematools.types.DatasetFieldSchema, dimensions: dict[str, schematools.types.TemporalDimensionFields] = <factory>)

Bases: object

The temporal property of a Table.

Describes validity of objects for tables where different versions of objects are valid over time.

identifier

The key to the property that uniquely identifies a specific version of an object from among other versions of the same object.

This property combined with the fixed identifier forms a unique key for an object.

These identifier properties are non-contiguous increasing integers. The latest version of an object will have the highest value for identifier.

Type

str
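Because the latest version of an object has the highest identifier value, and the values need not be contiguous, selecting the current version is a per-object maximum rather than a count. The field names "identificatie" and "volgnummer" are taken from the examples in this module; the function latest_versions is a hypothetical sketch over plain dict rows.

```python
from typing import Any


def latest_versions(
    rows: list[dict[str, Any]],
    id_key: str = "identificatie",
    version_key: str = "volgnummer",
) -> dict[Any, dict[str, Any]]:
    latest: dict[Any, dict[str, Any]] = {}
    for row in rows:
        current = latest.get(row[id_key])
        # Keep the row with the highest version number; gaps in numbering are fine.
        if current is None or row[version_key] > current[version_key]:
            latest[row[id_key]] = row
    return latest
```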

dimensions

Contains the attributes of objects that determine for what (time)period an object is valid.

Dimensions is of type dict. A dimension is a tuple of the form (‘valid_start’, ‘valid_end’), describing a half-open interval along the dimension for which an object is valid.

Example

With dimensions = {"time": ("valid_start", "valid_end")}, an_object is valid at some_time if: an_object.valid_start <= some_time < an_object.valid_end

Type

dict[str, schematools.types.TemporalDimensionFields]
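The validity rule in the example above (start inclusive, end exclusive) can be sketched for the beginGeldigheid/eindGeldigheid pair. Treating a missing end value as “still valid” is an assumption of this sketch, not something stated here; the function is_valid_on is hypothetical.

```python
from __future__ import annotations

from datetime import date
from typing import Any


def is_valid_on(
    obj: dict[str, Any],
    moment: date,
    start_key: str = "beginGeldigheid",
    end_key: str = "eindGeldigheid",
) -> bool:
    start = obj[start_key]
    end = obj.get(end_key)
    # Half-open interval: start <= moment < end.
    # Assumption: an empty end means the object is still valid.
    return start <= moment and (end is None or moment < end)
```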

__init__(identifier: str, identifier_field: ~schematools.types.DatasetFieldSchema, dimensions: dict[str, schematools.types.TemporalDimensionFields] = <factory>) None
dimensions: dict[str, schematools.types.TemporalDimensionFields]
identifier: str
identifier_field: DatasetFieldSchema
class schematools.types.TemporalDimensionFields(start: DatasetFieldSchema, end: DatasetFieldSchema)

Bases: NamedTuple

A tuple that describes the fields for start field and end field of a range.

This could be something like ("beginGeldigheid", "eindGeldigheid").

end: DatasetFieldSchema

Alias for field number 1

start: DatasetFieldSchema

Alias for field number 0