schematools.loaders module
- class schematools.loaders.CachedSchemaLoader(loader: SchemaLoader | None)
Bases: SchemaLoader
Base class for a loader that caches the results.
- __init__(loader: SchemaLoader | None)
Initialize the cache. When loader is None, this acts as a simple in-memory cache.
- add_dataset(dataset: DatasetSchema) → None
Add a dataset to the cache.
- get_all_datasets() → dict[str, schematools.types.DatasetSchema]
Load all datasets and fill the cache.
- get_dataset(dataset_id: str, prefetch_related: bool = False) → DatasetSchema
Gets a dataset by id from the cache.
If it is not cached, load the dataset from the SCHEMA_URL location. Note: because the DSO API imports dataset schemas into the PostgreSQL database, the dataset loaded from SCHEMA_URL may differ from the definition stored in the PostgreSQL database.
- get_table(dataset: DatasetSchema, table_ref: str) → DatasetTableSchema
Retrieves a versioned table by reference.
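The caching behaviour described above can be sketched in plain Python. This is a minimal sketch, not the schematools implementation: datasets are plain dicts with an "id" key instead of DatasetSchema objects, and SimpleCachedLoader and the wrapped loader's get_dataset() call are illustrative assumptions.

```python
from typing import Optional, Protocol


class Loader(Protocol):
    """Hypothetical minimal loader: anything with get_dataset(dataset_id)."""

    def get_dataset(self, dataset_id: str) -> dict: ...


class SimpleCachedLoader:
    """Sketch of the caching pattern (datasets are dicts, not DatasetSchema)."""

    def __init__(self, loader: Optional[Loader] = None) -> None:
        self._loader = loader  # None: behave as a plain in-memory cache
        self._cache: dict[str, dict] = {}

    def add_dataset(self, dataset: dict) -> None:
        # Store under the dataset's own id, mirroring add_dataset(dataset).
        self._cache[dataset["id"]] = dataset

    def get_dataset(self, dataset_id: str) -> dict:
        if dataset_id in self._cache:
            return self._cache[dataset_id]  # cache hit
        if self._loader is None:
            raise KeyError(dataset_id)  # pure cache: nothing to fall back on
        dataset = self._loader.get_dataset(dataset_id)
        self._cache[dataset_id] = dataset  # fill the cache for the next call
        return dataset
```

Wrapping another loader means each dataset is fetched at most once per cache instance, which is also why a cached copy can drift from a definition that was later changed elsewhere.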
- class schematools.loaders.FileSystemSchemaLoader(schema_url: Path | str, *, loaded_callback: Callable[[DatasetSchema], None] | None = None)
Bases: _FileBasedSchemaLoader
Loader that loads dataset schemas from the filesystem.
- __init__(schema_url: Path | str, *, loaded_callback: Callable[[DatasetSchema], None] | None = None)
Initialize the loader with a folder where it needs to search for datasets. For the convenience of importing a selected subset, it’s possible to point to a subfolder of the datasets repository.
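The folder convention can be illustrated with a small discovery sketch. find_dataset_files is a hypothetical helper, not part of schematools; it assumes that every dataset lives in a folder containing a dataset.json file and that the dataset path (such as foo/bar) is that folder relative to the root, matching the dataset paths that get_all_datasets() is documented to return.

```python
from pathlib import Path


def find_dataset_files(root: Path) -> dict[str, Path]:
    """Hypothetical helper: map dataset paths (e.g. "foo/bar") to dataset.json files."""
    found: dict[str, Path] = {}
    for path in sorted(root.rglob("dataset.json")):
        # The dataset path is the folder holding dataset.json, relative to root,
        # always written with forward slashes.
        dataset_path = path.parent.relative_to(root).as_posix()
        found[dataset_path] = path
    return found
```

Because the search is recursive, pointing root at a subfolder of the repository simply limits the result to the datasets below that folder.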
- classmethod from_file(dataset_file: Path | str, **kwargs)
Helper function that supports the older pattern of loading an arbitrary file as a schema.
- get_dataset_from_file(dataset_file: Path | str, prefetch_related: bool = False)
Extra method to read a dataset directly from a JSON file. This is mainly a helper function for testing.
Normally, datasets are only detected when they follow the layout folder/dataset.json. This method allows a free-format naming convention for experimenting with files (e.g. in unit tests). However, relations to other datasets cannot be resolved when those datasets also use free-format file names. In short, a relation only resolves when (a) the related dataset follows the name/dataset.json convention, or (b) the related dataset has also been read and cached by the same loader instance.
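The two resolution rules can be expressed as a lookup order. The sketch below is hypothetical: resolve_related, the dict-based cache, and the assumption that a dataset id equals its folder name (root/<id>/dataset.json) are illustrative and not the loader's actual code.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_related(
    dataset_id: str, root: Path, cache: dict[str, dict]
) -> Optional[Union[dict, Path]]:
    """Hypothetical lookup order for a related dataset.

    1. A dataset already read by this loader instance (e.g. from a free-format file).
    2. A file following the <name>/dataset.json convention under root.
    Otherwise the relation cannot be resolved and None is returned.
    """
    if dataset_id in cache:
        return cache[dataset_id]
    candidate = root / dataset_id / "dataset.json"  # assumes id == folder name
    return candidate if candidate.is_file() else None
```

A free-format file such as experiment_bag.json is never found by rule 2, so it only becomes resolvable once it has been read into the cache.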
- class schematools.loaders.SchemaLoader
Bases: object
Interface that defines what a schema loader should provide.
- get_all_datasets() → dict[str, schematools.types.DatasetSchema]
Gets all datasets from the schema_url location.
The return value maps dataset paths (foo/bar) to schemas.
- get_dataset(dataset_id: str, prefetch_related: bool = False) → DatasetSchema
Gets a dataset for dataset_id.
- get_table(dataset: DatasetSchema, table_ref: str) → DatasetTableSchema
Retrieves a versioned table by reference.
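As a sketch of what implementing this interface involves, the contract can be restated with abc.ABC. This is an illustration only: the documented SchemaLoader derives from object rather than ABC, datasets are plain dicts here, and the hypothetical InMemoryLoader treats table_ref as a simple dictionary key because the reference format is not specified on this page.

```python
from abc import ABC, abstractmethod


class LoaderInterface(ABC):
    """Illustrative restatement of the SchemaLoader contract (datasets as dicts)."""

    @abstractmethod
    def get_all_datasets(self) -> dict[str, dict]: ...

    @abstractmethod
    def get_dataset(self, dataset_id: str, prefetch_related: bool = False) -> dict: ...

    @abstractmethod
    def get_table(self, dataset: dict, table_ref: str) -> dict: ...


class InMemoryLoader(LoaderInterface):
    """Hypothetical implementation backed by a dict of dataset path -> dataset."""

    def __init__(self, datasets: dict[str, dict]) -> None:
        self._datasets = datasets

    def get_all_datasets(self) -> dict[str, dict]:
        return dict(self._datasets)  # copy, so callers cannot mutate our state

    def get_dataset(self, dataset_id: str, prefetch_related: bool = False) -> dict:
        return self._datasets[dataset_id]

    def get_table(self, dataset: dict, table_ref: str) -> dict:
        # Assumption: table_ref is a plain key into the dataset's "tables" dict.
        return dataset["tables"][table_ref]
```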
- class schematools.loaders.URLSchemaLoader(schema_url: URL | str | None = None, *, loaded_callback: Callable[[DatasetSchema], None] | None = None)
Bases: _SharedConnectionMixin, _FileBasedSchemaLoader
Loader that loads dataset schemas from a URL.
- __init__(schema_url: URL | str | None = None, *, loaded_callback: Callable[[DatasetSchema], None] | None = None)
- get_all_datasets() → dict[str, schematools.types.DatasetSchema]
Gets all datasets from a web URL based on the self.schema_url path.
- get_dataset(dataset_id: str, prefetch_related: bool = True) → DatasetSchema
Retrieve a dataset and its contents with a single connection.
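One way to read "with a single connection" is that the nested fetches for a dataset and its contents reuse one open connection instead of opening a new one per request. The sketch below illustrates that pattern under stated assumptions; SharedConnection and its open_connection factory are hypothetical and are not the actual _SharedConnectionMixin.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator, Optional


class SharedConnection:
    """Hypothetical sketch: nested users share one connection; the outermost closes it.

    open_connection is an assumed factory returning an object with a close() method.
    """

    def __init__(self, open_connection: Callable[[], Any]) -> None:
        self._open = open_connection
        self._conn: Optional[Any] = None
        self._depth = 0

    @contextmanager
    def connection(self) -> Iterator[Any]:
        if self._depth == 0:
            self._conn = self._open()  # only the outermost caller opens it
        self._depth += 1
        try:
            yield self._conn
        finally:
            self._depth -= 1
            if self._depth == 0:
                self._conn.close()  # closed once, after all nested fetches finish
                self._conn = None
```

A requests.Session would fit the factory shape, since a session can be reused across requests and has a close() method.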
- schematools.loaders.get_schema_loader(schema_url: URL | str | None = None, **kwargs) → SchemaLoader
Initialize the schema loader based on the given location.
- schema_url:
Location where the schemas can be found. This can be a web URL or a filesystem path.
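The choice between a web URL and a filesystem path can be sketched with urllib.parse. pick_loader_kind is a hypothetical helper: it assumes that the http and https schemes indicate a URL location (URLSchemaLoader) and that everything else is a filesystem path (FileSystemSchemaLoader). This page does not describe how the real get_schema_loader decides, or what it does when schema_url is None.

```python
from urllib.parse import urlparse


def pick_loader_kind(schema_url: str) -> str:
    """Hypothetical helper: "url" for http(s) locations, "filesystem" otherwise."""
    scheme = urlparse(schema_url).scheme  # "" for plain paths like ../datasets
    if scheme in ("http", "https"):
        return "url"
    return "filesystem"
```

Checking for an explicit http(s) scheme, rather than for any scheme, keeps Windows paths such as C:\schemas classified as filesystem paths, because urlparse reports the drive letter as a scheme.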