schematools.loaders module

class schematools.loaders.CachedSchemaLoader(loader: SchemaLoader | None)

Bases: SchemaLoader

Base class for a loader that caches the results.

__init__(loader: SchemaLoader | None)

Initialize the cache. When the loader is not defined, this acts as a simple cache.

add_dataset(dataset: DatasetSchema) None

Add a dataset to the cache.

clear() None

Clear the cache.

get_all_datasets() dict[str, schematools.types.DatasetSchema]

Load all datasets and fill the cache.

get_dataset(dataset_id: str, prefetch_related: bool = False) DatasetSchema

Gets a dataset by id from the cache.

If the dataset is not cached, it is loaded from the SCHEMA_URL location. NB: Because the DSO API imports dataset schemas into the PostgreSQL database, the dataset loaded from SCHEMA_URL may differ from the definition stored in the PostgreSQL database.
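The cache-first lookup described above can be pictured with a small stand-in. This is a minimal sketch and not the schematools implementation: `SketchBackend`, `SketchCachedLoader`, and the plain dicts used in place of `DatasetSchema` objects are hypothetical names invented for illustration.

```python
from __future__ import annotations


class SketchBackend:
    """Hypothetical stand-in for a wrapped loader; counts how often it is hit."""

    def __init__(self, datasets: dict[str, dict]) -> None:
        self._datasets = datasets
        self.calls = 0

    def get_dataset(self, dataset_id: str) -> dict:
        self.calls += 1
        return self._datasets[dataset_id]


class SketchCachedLoader:
    """Hypothetical cache-first wrapper (an assumption, not the real class)."""

    def __init__(self, loader: SketchBackend | None = None) -> None:
        self._loader = loader
        self._cache: dict[str, dict] = {}

    def add_dataset(self, dataset: dict) -> None:
        # Store the dataset under its own id.
        self._cache[dataset["id"]] = dataset

    def clear(self) -> None:
        self._cache.clear()

    def get_dataset(self, dataset_id: str) -> dict:
        try:
            return self._cache[dataset_id]
        except KeyError:
            # Without a wrapped loader this object is only a simple cache.
            if self._loader is None:
                raise LookupError(f"Dataset {dataset_id!r} is not cached") from None
            dataset = self._loader.get_dataset(dataset_id)
            self._cache[dataset_id] = dataset
            return dataset


backend = SketchBackend({"bomen": {"id": "bomen"}})
cached = SketchCachedLoader(backend)
first = cached.get_dataset("bomen")   # cache miss: asks the backend
second = cached.get_dataset("bomen")  # cache hit: backend is not asked again
```

In this sketch the second call never reaches the backend, which is also why a cached definition can drift from whatever another system has stored for the same dataset.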

get_dataset_path(dataset_id) str

Find the relative path of a dataset within the location.

get_table(dataset: DatasetSchema, table_ref: str) DatasetTableSchema

Retrieves a versioned table by reference.

class schematools.loaders.FileSystemSchemaLoader(schema_url: Path | str, *, loaded_callback: Callable[[DatasetSchema], None] | None = None)

Bases: _FileBasedSchemaLoader

Loader that loads dataset schemas from the filesystem.

__init__(schema_url: Path | str, *, loaded_callback: Callable[[DatasetSchema], None] | None = None)

Initialize the loader with a folder where it needs to search for datasets. For the convenience of importing a selected subset, it’s possible to point to a subfolder of the datasets repository.

classmethod from_file(dataset_file: Path | str, **kwargs)

Helper function to support old patterns of loading arbitrary files as schemas.

get_dataset_from_file(dataset_file: Path | str, prefetch_related: bool = False)

Extra method, to read a dataset directly from a JSON file. This is mainly a helper function for testing.

Normally, datasets are only detected when they use the format folder/dataset.json. This method allows a freer naming convention for experimenting with files (e.g. useful for unit testing). It will, however, not be possible to resolve relations to other datasets when those datasets use the same free-format naming for their files.

In short, relations will only resolve when:

  • the dataset follows the name/dataset.json convention;

  • or when the related dataset is also read and cached by the same loader instance.
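The naming convention behind the first condition can be checked with a few lines of `pathlib`. This is a minimal sketch under stated assumptions: `follows_dataset_convention` is a hypothetical helper written for illustration, not part of schematools, and it only inspects the path string without touching the filesystem.

```python
from pathlib import PurePosixPath


def follows_dataset_convention(dataset_file):
    """Return True when the path looks like ``<name>/dataset.json``.

    Hypothetical helper: the real loader's detection logic may differ.
    """
    path = PurePosixPath(dataset_file)
    # A bare "dataset.json" has no parent folder to act as the dataset name.
    return path.name == "dataset.json" and path.parent.name != ""


conventional = follows_dataset_convention("bomen/dataset.json")
free_format = follows_dataset_convention("tests/files/my_experiment.json")
```

A file such as `tests/files/my_experiment.json` fails this check, so under the rules above a relation pointing at it would only resolve if the same loader instance had already read and cached it.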

classmethod get_root(dataset_file: Path | str) Path

Resolve the real root folder. This makes sure that the datasets-path value will be correct, even when a subfolder is selected for importing files.
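One way to picture the root resolution is to walk up the parent folders until the repository root is found. The sketch below assumes that the root is a folder literally named `datasets`; that name, the helper `find_datasets_root`, and the example paths are all assumptions made for illustration, and the actual method may use a different rule.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def find_datasets_root(
    dataset_file: PurePosixPath | str, root_name: str = "datasets"
) -> PurePosixPath:
    """Walk upwards until a folder called ``root_name`` is found.

    Hypothetical helper; the folder name is an assumption.
    """
    path = PurePosixPath(dataset_file)
    for folder in (path, *path.parents):
        if folder.name == root_name:
            return folder
    raise ValueError(f"No {root_name!r} folder found above {path}")


selected = PurePosixPath("/repo/datasets/beheer/bomen/dataset.json")
root = find_datasets_root(selected)
# The dataset path is taken relative to the real root, not the selected subfolder.
relative_path = selected.parent.relative_to(root)
```

Even when only the `beheer` subfolder is selected for import, the dataset path in this sketch stays `beheer/bomen`, because it is computed against the resolved root.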

class schematools.loaders.SchemaLoader

Bases: object

Interface that defines what a schema loader should provide.

get_all_datasets() dict[str, schematools.types.DatasetSchema]

Gets all datasets from the schema_url location.

The return value maps dataset paths (foo/bar) to schemas.

get_dataset(dataset_id: str, prefetch_related: bool = False) DatasetSchema

Gets a dataset for dataset_id.

get_dataset_path(dataset_id: str) str

Find the relative path of a dataset within the location.

get_table(dataset: DatasetSchema, table_ref: str) DatasetTableSchema

Retrieves a versioned table by reference.

class schematools.loaders.URLSchemaLoader(schema_url: URL | str | None = None, *, loaded_callback: Callable[[DatasetSchema], None] | None = None)

Bases: _SharedConnectionMixin, _FileBasedSchemaLoader

Loader that loads dataset schemas from a URL.

__init__(schema_url: URL | str | None = None, *, loaded_callback: Callable[[DatasetSchema], None] | None = None)

get_all_datasets() dict[str, schematools.types.DatasetSchema]

Gets all datasets from a web URL based on the self.schema_url path.

get_dataset(dataset_id: str, prefetch_related: bool = True) DatasetSchema

Retrieve a dataset and its contents with a single connection.

schematools.loaders.get_schema_loader(schema_url: URL | str | None = None, **kwargs) SchemaLoader

Initialize the schema loader based on the given location.

schema_url:

Location where the schemas can be found. This can be a web URL or a filesystem path.
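Choosing between a URL-based and a filesystem-based loader comes down to classifying the location. How the function makes that decision is not stated here, so the following is only a sketch of one plausible rule: `guess_location_kind` is a hypothetical helper, and the example address is a placeholder.

```python
from urllib.parse import urlsplit


def guess_location_kind(schema_url):
    """Classify a location as "url" or "filesystem".

    Hypothetical rule: only http(s) schemes count as web URLs;
    everything else is treated as a filesystem path.
    """
    scheme = urlsplit(str(schema_url)).scheme.lower()
    return "url" if scheme in ("http", "https") else "filesystem"


web_kind = guess_location_kind("https://schemas.example.org/datasets/")
local_kind = guess_location_kind("/home/user/schemas/datasets")
```

A caller could then construct URLSchemaLoader for the first kind and FileSystemSchemaLoader for the second, passing any extra keyword arguments through.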