schematools.loaders module

class schematools.loaders.CachedSchemaLoader(loader: SchemaLoader | None)

Bases: SchemaLoader

Base class for a loader that caches the results.

__init__(loader: SchemaLoader | None): Initialize the cache. When the loader is not defined, this acts as a simple cache.

add_dataset(dataset: DatasetSchema) → None: Add a dataset to the cache.

clear() → None: Clear the cache.

get_all_datasets() → dict[str, schematools.types.DatasetSchema]: Load all datasets and fill the cache.

get_dataset(dataset_id: str, prefetch_related: bool = False) → DatasetSchema

Gets a dataset by id from the cache.

If not available, load the dataset from the SCHEMA_URL location. Note: because dataset schemas are imported into the PostgreSQL database by the DSO API, the dataset loaded from SCHEMA_URL may differ from the definition stored in the PostgreSQL database.
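The cache-then-fallback lookup described above can be sketched as follows. This is a minimal illustration, not the schematools implementation: DictLoader and CachingLoader are hypothetical stand-ins, schemas are plain dicts rather than DatasetSchema objects, and the choice of LookupError for a cache miss without a wrapped loader is an assumption.

```python
from __future__ import annotations


class DictLoader:
    """Hypothetical stand-in for a SchemaLoader; serves schemas from a dict."""

    def __init__(self, schemas: dict[str, dict]) -> None:
        self.schemas = schemas
        self.calls = 0  # counts how often the wrapped loader is actually hit

    def get_dataset(self, dataset_id: str) -> dict:
        self.calls += 1
        return self.schemas[dataset_id]


class CachingLoader:
    """Hypothetical cache wrapper that mirrors the documented behaviour."""

    def __init__(self, loader: DictLoader | None = None) -> None:
        # Without a wrapped loader, this object is only a simple cache.
        self._loader = loader
        self._cache: dict[str, dict] = {}

    def add_dataset(self, schema: dict) -> None:
        self._cache[schema["id"]] = schema

    def clear(self) -> None:
        self._cache.clear()

    def get_dataset(self, dataset_id: str) -> dict:
        try:
            return self._cache[dataset_id]
        except KeyError:
            if self._loader is None:
                # Assumption: a miss without a fallback loader is an error.
                raise LookupError(f"Dataset {dataset_id!r} is not cached") from None
            schema = self._loader.get_dataset(dataset_id)
            self._cache[dataset_id] = schema
            return schema


backend = DictLoader({"bag": {"id": "bag"}})
cached = CachingLoader(backend)
first = cached.get_dataset("bag")   # miss: delegated to the wrapped loader
second = cached.get_dataset("bag")  # hit: served from the cache
```

In this sketch the second lookup never reaches the wrapped loader, which is why a cached schema can drift from what another consumer has stored.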

get_dataset_path(dataset_id) → str: Find the relative path of a dataset within the location

get_table(dataset: DatasetSchema, table_ref: str) → DatasetTableSchema: Retrieves a versioned table by reference

class schematools.loaders.FileSystemSchemaLoader(schema_url: Path | str, *, loaded_callback: Callable[[DatasetSchema], None] | None = None)

Bases: _FileBasedSchemaLoader

Loader that loads dataset schemas from the filesystem.

__init__(schema_url: Path | str, *, loaded_callback: Callable[[DatasetSchema], None] | None = None): Initialize the loader with a folder where it needs to search for datasets. For the convenience of importing a selected subset, it’s possible to point to a subfolder of the datasets repository.

classmethod from_file(dataset_file: Path | str, **kwargs): Helper function to support the old pattern of loading arbitrary files as schemas.

get_dataset_from_file(dataset_file: Path | str, prefetch_related: bool = False)

Extra method, to read a dataset directly from a JSON file. This is mainly a helper function for testing.

Normally, datasets are only detected when they use the format folder/dataset.json. This method allows a more free-format naming convention for experimenting with files (e.g. useful for unit testing). However, it will not be possible to resolve relations to other datasets when those datasets use the same free-format naming for their files.

In conclusion, a relation will only resolve when:

the related dataset follows the name/dataset.json convention;
or the related dataset has also been read and cached by the same loader instance.
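To make the folder/dataset.json convention concrete, here is a small sketch of how such a discovery step could work. It is an assumption about the detection logic, not the library's code: find_datasets is a hypothetical helper, and the file names are made up for the example.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def find_datasets(root: Path) -> dict[str, Path]:
    """Map relative dataset paths (e.g. "foo/bar") to their dataset.json file."""
    return {
        path.parent.relative_to(root).as_posix(): path
        for path in sorted(root.rglob("dataset.json"))
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Follows the convention, so it is discoverable by its folder name.
    (root / "bag").mkdir()
    (root / "bag" / "dataset.json").write_text(json.dumps({"id": "bag"}))
    # Free-format file name: a convention-based scan does not see it.
    (root / "experiment.json").write_text(json.dumps({"id": "experiment"}))
    found = sorted(find_datasets(root))
```

Here only "bag" is found. A free-format file such as experiment.json would have to be loaded explicitly before other datasets could resolve relations to it.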

classmethod get_root(dataset_file: Path | str) → Path: Resolve the real root folder. This makes sure that the datasets-path value will be correct, even when a subfolder is selected for importing files.

class schematools.loaders.SchemaLoader

Bases: object

Interface that defines what a schema loader should provide.

get_all_datasets() → dict[str, schematools.types.DatasetSchema]

Gets all datasets from the schema_url location.

The return value maps dataset paths (foo/bar) to schemas.

get_dataset(dataset_id: str, prefetch_related: bool = False) → DatasetSchema: Gets a dataset for dataset_id.

get_dataset_path(dataset_id: str) → str: Find the relative path of a dataset within the location

get_table(dataset: DatasetSchema, table_ref: str) → DatasetTableSchema: Retrieves a versioned table by reference

class schematools.loaders.URLSchemaLoader(schema_url: URL | str | None = None, *, loaded_callback: Callable[[DatasetSchema], None] | None = None)

Bases: _SharedConnectionMixin, _FileBasedSchemaLoader

Loader that loads dataset schemas from a URL.

__init__(schema_url: URL | str | None = None, *, loaded_callback: Callable[[DatasetSchema], None] | None = None)

get_all_datasets() → dict[str, schematools.types.DatasetSchema]: Gets all datasets from a web URL based on the self.schema_url path.

get_dataset(dataset_id: str, prefetch_related: bool = True) → DatasetSchema: Retrieve a dataset and its contents with a single connection.

schematools.loaders.get_schema_loader(schema_url: URL | str | None = None, **kwargs) → SchemaLoader

Initialize the schema loader based on the given location.

schema_url: Location where the schemas can be found. This can be a web URL or a filesystem path.
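One plausible way to tell a web URL from a filesystem path is to inspect the URL scheme. The sketch below only illustrates that idea: loader_kind is a hypothetical helper, the example locations are made up, and the actual selection logic inside get_schema_loader may differ.

```python
from __future__ import annotations

from pathlib import Path
from urllib.parse import urlsplit


def loader_kind(schema_url: str | Path) -> str:
    """Return "url" for http(s) locations and "filesystem" for anything else."""
    if isinstance(schema_url, Path):
        return "filesystem"
    scheme = urlsplit(schema_url).scheme.lower()
    # Assumption: only http and https count as web locations.
    return "url" if scheme in ("http", "https") else "filesystem"


web = loader_kind("https://example.org/datasets/")
local = loader_kind("/srv/schemas/datasets")
as_path = loader_kind(Path("datasets"))
```

Under this assumption, the first location would map to a URLSchemaLoader and the other two to a FileSystemSchemaLoader.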