webknossos.dataset.dataset

Dataset

Dataset(
    dataset_path: str | PathLike | UPath,
    voxel_size: tuple[float, float, float] | None = None,
    name: str | None = None,
    exist_ok: bool = False,
    *,
    voxel_size_with_unit: VoxelSize | None = None,
    read_only: bool = False
)

Bases: AbstractDataset[Layer, SegmentationLayer]

A dataset is the entry point of the Dataset API.

An existing dataset on disk can be opened or new datasets can be created.

A dataset stores the data in .wkw files on disk with metadata in datasource-properties.json. The information in those files is kept in sync with the object.

Each dataset consists of one or more layers (webknossos.dataset.layer.Layer), which themselves can comprise multiple magnifications (webknossos.dataset.mag_view.MagView).

Examples:

Create a new dataset:

ds = Dataset("path/to/dataset", voxel_size=(11.2, 11.2, 25))

Open an existing dataset:

ds = Dataset.open("path/to/dataset")

Open a remote dataset:

ds = RemoteDataset.open("my_dataset", "organization_id")

Create a new dataset or open an existing one.

Creates a new dataset and the associated datasource-properties.json if one does not exist. If the dataset already exists and exist_ok is True, it is opened (the provided voxel_size and name are asserted to match the existing dataset).

Please use Dataset.open if you intend to open an existing dataset and don't want/need the creation behavior.

Parameters:

  • dataset_path (str | PathLike | UPath) –

    Path where the dataset should be created/opened

  • voxel_size (tuple[float, float, float] | None, default: None ) –

    Optional tuple of floats (x, y, z) specifying voxel size in nanometers

  • name (str | None, default: None ) –

    Optional name for the dataset, defaults to last part of dataset_path if not provided

  • exist_ok (bool, default: False ) –

    Whether to open an existing dataset at the path rather than failing

  • voxel_size_with_unit (VoxelSize | None, default: None ) –

    Optional voxel size with unit specification

  • read_only (bool, default: False ) –

    Whether to open dataset in read-only mode

Raises:

  • RuntimeError

    If dataset exists and exist_ok=False

  • AssertionError

    If opening existing dataset with mismatched voxel size or name
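
Examples:

Create the dataset on first use and re-open it on subsequent runs (the path and voxel size are placeholders):

ds = Dataset(
    "path/to/dataset",
    voxel_size=(11.2, 11.2, 25),
    exist_ok=True,
)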

default_view_configuration property writable

default_view_configuration: DatasetViewConfiguration | None

Default view configuration for this dataset in webknossos.

Controls how the dataset is displayed in webknossos when first opened by a user, including position, zoom level, rotation etc.

Returns:

  • DatasetViewConfiguration | None –

    The view configuration if set, otherwise None

Examples:

ds.default_view_configuration = DatasetViewConfiguration(
    zoom=1.5,
    position=(100, 100, 100)
)

layers property

layers: Mapping[str, LayerType]

Dictionary containing all layers of this dataset.

Returns:

  • Mapping[str, LayerType]

    dict[str, Layer]: Dictionary mapping layer names to Layer objects

Examples:

for layer_name, layer in ds.layers.items():
   print(layer_name)

name property writable

name: str

Name of this dataset as specified in datasource-properties.json.

Can be modified to rename the dataset. Changes are persisted to the properties file.

Returns:

  • str ( str ) –

    Current dataset name

Examples:

ds.name = "my_renamed_dataset"  # Updates the name in properties file

path instance-attribute

path: UPath = path

read_only property

read_only: bool

Whether this dataset is opened in read-only mode.

When True, operations that would modify the dataset (adding layers, changing properties, etc.) are not allowed and will raise RuntimeError.

Returns:

  • bool ( bool ) –

    True if dataset is read-only, False otherwise
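
Examples:

Open a dataset without allowing modifications (the path is a placeholder):

ds = Dataset.open("path/to/dataset", read_only=True)
print(ds.read_only)  # True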

resolved_path property

resolved_path: UPath

voxel_size property

voxel_size: tuple[float, float, float]

Size of each voxel in nanometers along each dimension (x, y, z).

Returns:

  • tuple[float, float, float]

    tuple[float, float, float]: Size of each voxel in nanometers for x,y,z dimensions

Examples:

vx, vy, vz = ds.voxel_size
print(f"X resolution is {vx}nm")

voxel_size_with_unit property

voxel_size_with_unit: VoxelSize

Size of voxels including unit information.

Size of each voxel along each dimension (x, y, z), including unit specification. The default unit is nanometers.

Returns:

  • VoxelSize ( VoxelSize ) –

    Object containing voxel sizes and their units
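
Examples:

A minimal sketch, assuming VoxelSize exposes factor and unit attributes:

vs = ds.voxel_size_with_unit
print(vs.factor, vs.unit)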

ConversionLayerMapping

Bases: Enum

Strategies for mapping file paths to layers when importing images.

These strategies determine how input image files are grouped into layers during dataset creation using Dataset.from_images(). If no strategy is provided, INSPECT_SINGLE_FILE is used as the default.

If none of the pre-defined strategies fit your needs, you can provide a custom callable that takes a Path and returns a layer name string.

Examples:

Using default strategy:

ds = Dataset.from_images("images/", "dataset/")

Explicit strategy:

ds = Dataset.from_images(
    "images/",
    "dataset/",
    map_filepath_to_layer_name=ConversionLayerMapping.ENFORCE_SINGLE_LAYER
)

Custom mapping function:

ds = Dataset.from_images(
    "images/",
    "dataset/",
    map_filepath_to_layer_name=lambda p: p.stem
)

ENFORCE_LAYER_PER_FILE class-attribute instance-attribute

ENFORCE_LAYER_PER_FILE = 'enforce_layer_per_file'

Creates a new layer for each input file. Useful for converting multiple 3D images or when each 2D image should become its own layer.

ENFORCE_LAYER_PER_FOLDER class-attribute instance-attribute

ENFORCE_LAYER_PER_FOLDER = 'enforce_layer_per_folder'

Groups files by their containing folder. Each folder becomes one layer. Useful for organized 2D image stacks.

ENFORCE_LAYER_PER_TOPLEVEL_FOLDER class-attribute instance-attribute

ENFORCE_LAYER_PER_TOPLEVEL_FOLDER = (
    "enforce_layer_per_toplevel_folder"
)

Groups files by their top-level folder. Useful when multiple layers each have their stacks split across subfolders.

ENFORCE_SINGLE_LAYER class-attribute instance-attribute

ENFORCE_SINGLE_LAYER = 'enforce_single_layer'

Combines all input files into a single layer. Only useful when all images are 2D slices that should be combined.

INSPECT_EVERY_FILE class-attribute instance-attribute

INSPECT_EVERY_FILE = 'inspect_every_file'

Like INSPECT_SINGLE_FILE but determines strategy separately for each file. More flexible but slower for many files.

INSPECT_SINGLE_FILE class-attribute instance-attribute

INSPECT_SINGLE_FILE = 'inspect_single_file'

Default strategy. Inspects first image file to determine if data is 2D or 3D. For 2D data uses ENFORCE_LAYER_PER_FOLDER, for 3D uses ENFORCE_LAYER_PER_FILE.

add_copy_layer

add_copy_layer(
    foreign_layer: (
        str | PathLike | UPath | Layer | RemoteLayer
    ),
    new_layer_name: str | None = None,
    *,
    chunk_shape: Vec3IntLike | int | None = None,
    shard_shape: Vec3IntLike | int | None = None,
    chunks_per_shard: Vec3IntLike | int | None = None,
    data_format: str | DataFormat | None = None,
    compress: bool | None = None,
    exists_ok: bool = False,
    executor: Executor | None = None,
    with_attachments: bool = True
) -> Layer

Deprecated. Use Dataset.add_layer_as_copy instead.

add_fs_copy_layer

add_fs_copy_layer(
    foreign_layer: str | PathLike | UPath | Layer,
    new_layer_name: str | None = None,
) -> Layer

Deprecated. File-based copy is automatically used in Dataset.add_layer_as_copy.

Copies the files of foreign_layer, which belongs to another dataset, to the current dataset via the filesystem. The relevant information from the other dataset's datasource-properties.json is copied as well. If new_layer_name is None, the name of the foreign layer is used.

add_layer

add_layer(
    layer_name: str,
    category: LayerCategoryType,
    *,
    dtype_per_layer: DTypeLike | None = None,
    dtype_per_channel: DTypeLike | None = None,
    num_channels: int | None = None,
    data_format: str | DataFormat = DEFAULT_DATA_FORMAT,
    bounding_box: NDBoundingBox | None = None,
    **kwargs: Any
) -> Layer

Create a new layer in the dataset.

Creates a new layer with the given name, category, and data type.

Parameters:

  • layer_name (str) –

    Name for the new layer

  • category (LayerCategoryType) –

    Either 'color' or 'segmentation'

  • dtype_per_layer (DTypeLike | None, default: None ) –

    Deprecated, use dtype_per_channel. Optional data type for entire layer, e.g. np.uint8

  • dtype_per_channel (DTypeLike | None, default: None ) –

    Optional data type per channel, e.g. np.uint8

  • num_channels (int | None, default: None ) –

    Number of channels (default 1)

  • data_format (str | DataFormat, default: DEFAULT_DATA_FORMAT ) –

    Format to store data ('wkw', 'zarr', 'zarr3')

  • bounding_box (NDBoundingBox | None, default: None ) –

    Optional initial bounding box of layer

  • **kwargs (Any, default: {} ) –

    Additional arguments:

      • largest_segment_id: For segmentation layers, initial largest ID
      • mappings: For segmentation layers, optional ID mappings

Returns:

  • Layer ( Layer ) –

    The newly created layer

Raises:

  • IndexError

    If layer with given name already exists

  • RuntimeError

    If invalid category specified

  • AttributeError

    If both dtype_per_layer and dtype_per_channel specified

  • AssertionError

    If invalid layer name or WKW format used with remote dataset

Examples:

Create color layer:

layer = ds.add_layer(
    "my_raw_microscopy_layer",
    LayerCategoryType.COLOR_CATEGORY,
    dtype_per_channel=np.uint8,
)

Create segmentation layer:

layer = ds.add_layer(
    "my_segmentation_labels",
    LayerCategoryType.SEGMENTATION_CATEGORY,
    dtype_per_channel=np.uint64
)
Note

The dtype can be specified either per layer or per channel, but not both. If neither is specified, uint8 per channel is used by default. WKW format can only be used with local datasets.

add_layer_as_copy

add_layer_as_copy(
    foreign_layer: (
        str | PathLike | UPath | Layer | RemoteLayer
    ),
    new_layer_name: str | None = None,
    *,
    chunk_shape: Vec3IntLike | int | None = None,
    shard_shape: Vec3IntLike | int | None = None,
    chunks_per_shard: Vec3IntLike | int | None = None,
    data_format: str | DataFormat | None = None,
    compress: bool | Zarr3Config | None = None,
    exists_ok: bool = False,
    executor: Executor | None = None,
    with_attachments: bool = True
) -> Layer

Copy layer from another dataset to this one.

Creates a new layer in this dataset by copying data and metadata from a layer in another dataset.

Parameters:

  • foreign_layer (str | PathLike | UPath | Layer | RemoteLayer) –

    Layer to copy (path or Layer object)

  • new_layer_name (str | None, default: None ) –

    Optional name for the new layer, uses original name if None

  • chunk_shape (Vec3IntLike | int | None, default: None ) –

    Optional shape of chunks for storage

  • shard_shape (Vec3IntLike | int | None, default: None ) –

    Optional shape of shards for storage

  • chunks_per_shard (Vec3IntLike | int | None, default: None ) –

    Deprecated, use shard_shape. Optional number of chunks per shard

  • data_format (str | DataFormat | None, default: None ) –

    Optional format to store copied data ('wkw', 'zarr', etc.)

  • compress (bool | Zarr3Config | None, default: None ) –

    Optional whether to compress copied data

  • exists_ok (bool, default: False ) –

    Whether to overwrite existing layers

  • executor (Executor | None, default: None ) –

    Optional executor for parallel copying

Returns:

  • Layer ( Layer ) –

    The newly created copy of the layer

Raises:

  • IndexError

    If target layer name already exists

  • RuntimeError

    If dataset is read-only

Examples:

Copy layer keeping same name:

other_ds = Dataset.open("other/dataset")
copied = ds.add_layer_as_copy(other_ds.get_layer("color"))

Copy with new name:

copied = ds.add_layer_as_copy(
    other_ds.get_layer("color"),
    new_layer_name="color_copy",
    compress=True
)

add_layer_as_ref

add_layer_as_ref(
    foreign_layer: (
        str | PathLike | UPath | Layer | RemoteLayer
    ),
    new_layer_name: str | None = None,
) -> Layer

Add a layer from another dataset by reference.

Creates a layer that references data from a remote dataset. The image data will be streamed on-demand when accessed.

Parameters:

  • foreign_layer (str | PathLike | UPath | Layer | RemoteLayer) –

    Foreign layer to add (path or Layer object)

  • new_layer_name (str | None, default: None ) –

    Optional name for the new layer, uses original name if None

Returns:

  • Layer ( Layer ) –

    The newly created remote layer referencing the foreign data

Raises:

  • IndexError

    If target layer name already exists

  • AssertionError

    If the foreign layer is not remote or originates from this same dataset

  • RuntimeError

    If dataset is read-only

Examples:

ds = Dataset.open("other/dataset")
remote_ds = RemoteDataset.open("my_dataset", "my_org_id")
new_layer = ds.add_layer_as_ref(
    remote_ds.get_layer("color")
)
Note

Changes to the original layer's properties afterwards won't affect this dataset. Data is only referenced, not copied.

add_layer_for_existing_files

add_layer_for_existing_files(
    layer_name: str,
    category: LayerCategoryType,
    **kwargs: Any
) -> Layer

Create a new layer from existing data files.

Adds a layer by discovering and incorporating existing data files that were created externally, rather than creating new ones. The layer properties are inferred from the existing files unless overridden.

Parameters:

  • layer_name (str) –

    Name for the new layer

  • category (LayerCategoryType) –

    Layer category ('color' or 'segmentation')

  • **kwargs (Any, default: {} ) –

    Additional arguments:

      • num_channels: Override detected number of channels
      • dtype_per_channel: Override detected data type
      • data_format: Override detected data format
      • bounding_box: Override detected bounding box

Returns:

  • Layer ( Layer ) –

    The newly created layer referencing the existing files

Raises:

  • AssertionError

    If layer already exists or no valid files found

  • RuntimeError

    If dataset is read-only

Examples:

Basic usage:

layer = ds.add_layer_for_existing_files(
    "external_data",
    "color"
)

Override properties:

layer = ds.add_layer_for_existing_files(
    "segmentation_data",
    "segmentation",
    dtype_per_channel=np.uint64
)
Note

The data files must already exist in the dataset directory under the layer name. Files are analyzed to determine properties like data type and number of channels. Magnifications are discovered automatically.

add_layer_from_images

add_layer_from_images(
    images: Union[
        str, FramesSequence, list[str | PathLike | UPath]
    ],
    layer_name: str,
    category: LayerCategoryType | None = "color",
    *,
    data_format: str | DataFormat = DEFAULT_DATA_FORMAT,
    mag: MagLike = Mag(1),
    chunk_shape: Vec3IntLike | int | None = None,
    shard_shape: Vec3IntLike | int | None = None,
    chunks_per_shard: int | Vec3IntLike | None = None,
    compress: bool = True,
    topleft: VecIntLike = zeros(),
    swap_xy: bool = False,
    flip_x: bool = False,
    flip_y: bool = False,
    flip_z: bool = False,
    dtype: DTypeLike | None = None,
    use_bioformats: bool | None = None,
    channel: int | None = None,
    timepoint: int | None = None,
    czi_channel: int | None = None,
    batch_size: int | None = None,
    allow_multiple_layers: bool = False,
    max_layers: int = 20,
    truncate_rgba_to_rgb: bool = True,
    executor: Executor | None = None
) -> Layer

Creates a new layer called layer_name with mag mag from images. images can be one of the following:

  • glob-string
  • list of paths
  • pims.FramesSequence instance

Please see the pims docs for more information.

This method needs extra packages like tifffile or pylibczirw. Please install the respective extras, e.g. using python -m pip install "webknossos[all]".

Further Arguments:

  • category: color by default, may be set to "segmentation"
  • data_format: by default zarr3 files are written, may be set to "wkw" or "zarr" to write in these formats.
  • mag: magnification to use for the written data
  • chunk_shape, chunks_per_shard, shard_shape, compress: adjust how the data is stored on disk
  • topleft: set an offset in Mag(1) to start writing the data, only affecting the output
  • swap_xy: set to True to interchange x and y axis before writing to disk
  • flip_x, flip_y, flip_z: set to True to reverse the respective axis before writing to disk
  • dtype: the read image data will be converted to this dtype using numpy.ndarray.astype
  • use_bioformats: set to True to use only the pims bioformats adapter (requires a JVM), set to False to forbid the bioformats adapter; by default it is tried as a last option
  • channel: may be used to select a single channel, if multiple are available
  • timepoint: for timeseries, select a timepoint to use by specifying it as an int, starting from 0
  • czi_channel: may be used to select a channel for .czi images, which differs from normal color-channels
  • batch_size: number of images to process per batch (influences RAM consumption); must be a multiple of the chunk shape's z-extent for uncompressed layers and of the shard shape's z-extent for compressed layers; defaults to the chunk or shard z-extent respectively
  • allow_multiple_layers: set to True if timepoints or channels may result in multiple layers being added (only the first is returned)
  • max_layers: only applies if allow_multiple_layers=True, limits the number of layers added via different channels or timepoints
  • truncate_rgba_to_rgb: only applies if allow_multiple_layers=True, set to False to write four channels into layers instead of an RGB channel
  • executor: pass a ClusterExecutor instance to parallelize the conversion jobs across the batches
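
Examples:

A minimal sketch for importing a stack of 2D TIFF slices (the glob pattern and layer name are placeholders; requires the image-reading extras mentioned above):

layer = ds.add_layer_from_images(
    "stack/*.tif",
    layer_name="color",
    category="color",
)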

add_layer_like

add_layer_like(
    other_layer: Layer | RemoteLayer, layer_name: str
) -> Layer
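
Creates a new empty layer named layer_name that presumably mirrors the properties (category, data type, number of channels, data format) of other_layer. A minimal sketch (layer names are placeholders):

template = ds.get_layer("color")
new_layer = ds.add_layer_like(template, "color_2")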

add_remote_layer

add_remote_layer(
    foreign_layer: (
        str | PathLike | UPath | Layer | RemoteLayer
    ),
    new_layer_name: str | None = None,
) -> Layer

Deprecated. Use Dataset.add_layer_as_ref instead.

add_symlink_layer

add_symlink_layer(
    foreign_layer: str | PathLike | UPath | Layer,
    new_layer_name: str | None = None,
    *,
    make_relative: bool = False
) -> Layer

Deprecated. Use Dataset.add_layer_as_ref instead.

Create symbolic link to layer from another dataset.

Instead of copying data, creates a symbolic link to the original layer's data and copies only the layer metadata. Later changes to the original layer's properties (e.g. its bounding box) won't affect this dataset, and vice versa.

Parameters:

  • foreign_layer (str | PathLike | UPath | Layer) –

    Layer to link to (path or Layer object)

  • make_relative (bool, default: False ) –

    Whether to create relative symlinks

  • new_layer_name (str | None, default: None ) –

    Optional name for the linked layer, uses original name if None

Returns:

  • Layer ( Layer ) –

    The newly created symbolic link layer

Raises:

  • IndexError

    If target layer name already exists

  • AssertionError

    If trying to create symlinks in/to remote datasets

  • RuntimeError

    If dataset is read-only

Examples:

other_ds = Dataset.open("other/dataset")
linked = ds.add_symlink_layer(
    other_ds.get_layer("color"),
    make_relative=True
)
Note

Only works with local file systems, cannot link remote datasets or create symlinks in remote datasets.

calculate_bounding_box

calculate_bounding_box() -> NDBoundingBox

Calculate the enclosing bounding box of all layers.

Finds the smallest box that contains all data from all layers in the dataset.

Returns:

  • NDBoundingBox ( NDBoundingBox ) –

    Bounding box containing all layer data

Examples:

bbox = ds.calculate_bounding_box()
print(f"Dataset spans {bbox.size} voxels")
print(f"Dataset starts at {bbox.topleft}")

compress

compress(*, executor: Executor | None = None) -> None

Compress all uncompressed magnifications in-place.

Compresses the data of all magnification levels that aren't already compressed, for all layers in the dataset.

Parameters:

  • executor (Executor | None, default: None ) –

    Optional executor for parallel compression

Raises:

  • RuntimeError

    If dataset is read-only

Examples:

ds.compress()
Note

If data is already compressed, this will have no effect.

copy_dataset

copy_dataset(
    new_dataset_path: str | PathLike | UPath,
    *,
    voxel_size: tuple[float, float, float] | None = None,
    chunk_shape: Vec3IntLike | int | None = None,
    shard_shape: Vec3IntLike | int | None = None,
    chunks_per_shard: Vec3IntLike | int | None = None,
    data_format: str | DataFormat | None = None,
    compress: bool | None = None,
    exists_ok: bool = False,
    executor: Executor | None = None,
    voxel_size_with_unit: VoxelSize | None = None,
    layers_to_ignore: Iterable[str] | None = None
) -> Dataset

Creates an independent copy of the dataset with all layers at a new location. Data storage parameters can be customized for the copied dataset.

Parameters:

  • new_dataset_path (str | PathLike | UPath) –

    Path where new dataset should be created

  • voxel_size (tuple[float, float, float] | None, default: None ) –

    Optional tuple of floats (x,y,z) specifying voxel size in nanometers

  • chunk_shape (Vec3IntLike | int | None, default: None ) –

    Optional shape of chunks for data storage

  • shard_shape (Vec3IntLike | int | None, default: None ) –

    Optional shape of shards for data storage

  • chunks_per_shard (Vec3IntLike | int | None, default: None ) –

    Deprecated, use shard_shape. Optional number of chunks per shard

  • data_format (str | DataFormat | None, default: None ) –

    Optional format to store data ('wkw', 'zarr', 'zarr3')

  • compress (bool | None, default: None ) –

    Optional whether to compress data

  • exists_ok (bool, default: False ) –

    Whether to overwrite existing datasets and layers

  • executor (Executor | None, default: None ) –

    Optional executor for parallel copying

  • voxel_size_with_unit (VoxelSize | None, default: None ) –

    Optional voxel size specification with units

  • layers_to_ignore (Iterable[str] | None, default: None ) –

    List of layer names to exclude from the copy

Returns:

  • Dataset ( Dataset ) –

    The newly created copy

Raises:

  • AssertionError

    If trying to copy WKW layers to remote dataset

Examples:

Basic copy:

copied = ds.copy_dataset("path/to/copy")

Copy with different storage:

copied = ds.copy_dataset(
    "path/to/copy",
    data_format="zarr",
    compress=True
)
Note

WKW layers can only be copied to datasets on local file systems. For remote datasets, use data_format='zarr3'.

delete_layer

delete_layer(layer_name: str) -> None

Delete a layer from the dataset.

Removes the layer's data and metadata from disk completely. This deletes both the datasource-properties.json entry and all data files for the layer.

Parameters:

  • layer_name (str) –

    Name of layer to delete

Raises:

  • IndexError

    If no layer with the given name exists

  • RuntimeError

    If dataset is read-only

Examples:

ds.delete_layer("old_layer")
print("Remaining layers:", list(ds.layers))

download classmethod

download(
    dataset_name_or_url: str,
    *,
    organization_id: str | None = None,
    sharing_token: str | None = None,
    webknossos_url: str | None = None,
    bbox: BoundingBox | None = None,
    layers: list[str] | str | None = None,
    mags: list[Mag] | None = None,
    path: PathLike | UPath | str | None = None,
    exist_ok: bool = False
) -> Dataset

Downloads a dataset and returns the Dataset instance.

  • dataset_name_or_url may be a dataset name or a full URL to a dataset view, e.g. https://webknossos.org/datasets/scalable_minds/l4_sample_dev/view. If a URL is used, organization_id, webknossos_url and sharing_token must not be set.
  • organization_id may be supplied if a dataset name was used in the previous argument; it defaults to your current organization from the webknossos_context.
  • sharing_token may be supplied if a dataset name was used and can specify a sharing token.
  • webknossos_url may be supplied if a dataset name was used and allows specifying in which webknossos instance to search for the dataset. It defaults to the url from your current webknossos_context, using https://webknossos.org as a fallback.
  • bbox, layers, and mags specify which parts of the dataset to download. If nothing is specified the whole image, all layers, and all mags are downloaded respectively.
  • path and exist_ok specify where to save the downloaded dataset and whether to overwrite if the path exists.
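
Examples:

Download by URL, using the example URL above (the target path and layer name are placeholders):

ds = Dataset.download(
    "https://webknossos.org/datasets/scalable_minds/l4_sample_dev/view",
    path="datasets/l4_sample_dev",
    layers=["color"],
)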

downsample

downsample(
    *,
    sampling_mode: SamplingModes = ANISOTROPIC,
    coarsest_mag: Mag | None = None,
    interpolation_mode: str = "default",
    compress: bool | Zarr3Config = True,
    executor: Executor | None = None
) -> None

Generate downsampled magnifications for all layers.

Creates lower resolution versions (coarser magnifications) of all layers that are not yet downsampled, up to the specified coarsest magnification.

Parameters:

  • sampling_mode (SamplingModes, default: ANISOTROPIC ) –

    Strategy for downsampling (e.g. ANISOTROPIC, MAX)

  • coarsest_mag (Mag | None, default: None ) –

    Optional maximum/coarsest magnification to generate

  • interpolation_mode (str, default: 'default' ) –

    Interpolation method to use. Defaults to "default" (= "mode" for segmentation, "median" for color).

  • compress (bool | Zarr3Config, default: True ) –

    Whether to compress generated magnifications. For Zarr3 datasets, codec configuration and chunk key encoding may also be supplied. Defaults to True.

  • executor (Executor | None, default: None ) –

    Optional executor for parallel processing

Raises:

  • RuntimeError

    If dataset is read-only

Examples:

Basic downsampling:

ds.downsample()

With custom parameters:

ds.downsample(
    sampling_mode=SamplingModes.ANISOTROPIC,
    coarsest_mag=Mag(8),
)
Note

  • ANISOTROPIC sampling downsamples the dimensions unevenly until the voxels are (approximately) isotropic
  • Other modes like MAX or CONSTANT create regular downsampling patterns
  • Magnifications that already exist are not regenerated

from_images classmethod

from_images(
    input_path: str | PathLike | UPath,
    output_path: str | PathLike | UPath,
    voxel_size: tuple[float, float, float] | None = None,
    name: str | None = None,
    *,
    map_filepath_to_layer_name: (
        ConversionLayerMapping | Callable[[UPath], str]
    ) = INSPECT_SINGLE_FILE,
    z_slices_sort_key: Callable[
        [UPath], Any
    ] = natsort_keygen(),
    voxel_size_with_unit: VoxelSize | None = None,
    layer_name: str | None = None,
    layer_category: LayerCategoryType | None = None,
    data_format: str | DataFormat = DEFAULT_DATA_FORMAT,
    chunk_shape: Vec3IntLike | int | None = None,
    shard_shape: Vec3IntLike | int | None = None,
    chunks_per_shard: int | Vec3IntLike | None = None,
    compress: bool = True,
    swap_xy: bool = False,
    flip_x: bool = False,
    flip_y: bool = False,
    flip_z: bool = False,
    use_bioformats: bool | None = None,
    max_layers: int = 20,
    batch_size: int | None = None,
    executor: Executor | None = None
) -> Dataset

This method imports image data from a folder or from a single file as a webknossos dataset.

The image data can be 3D images (such as multipage tiffs) or stacks of 2D images. Multiple 3D images or image stacks are mapped to different layers based on the mapping strategy.

The exact mapping is handled by the argument map_filepath_to_layer_name, which can be a pre-defined strategy from the enum ConversionLayerMapping, or a custom callable, taking a path of an image file and returning the corresponding layer name. All files belonging to the same layer name are then grouped. In case of multiple files per layer, those are usually mapped to the z-dimension. The order of the z-slices can be customized by setting z_slices_sort_key.

For more fine-grained control, please create an empty dataset and use add_layer_from_images.

Parameters:

  • input_path (str | PathLike | UPath) –

    Path to input image files

  • output_path (str | PathLike | UPath) –

    Output path for created dataset

  • voxel_size (tuple[float, float, float] | None, default: None ) –

    Optional tuple of floats (x,y,z) for voxel size in nm

  • name (str | None, default: None ) –

    Optional name for dataset

  • map_filepath_to_layer_name (ConversionLayerMapping | Callable[[UPath], str], default: INSPECT_SINGLE_FILE ) –

    Strategy for mapping files to layers, either a ConversionLayerMapping enum value or callable taking Path and returning str

  • z_slices_sort_key (Callable[[UPath], Any], default: natsort_keygen() ) –

    Optional key function for sorting z-slices

  • voxel_size_with_unit (VoxelSize | None, default: None ) –

    Optional voxel size with unit specification

  • layer_name (str | None, default: None ) –

    Optional name for layer(s)

  • layer_category (LayerCategoryType | None, default: None ) –

    Optional category override (LayerCategoryType.color / LayerCategoryType.segmentation)

  • data_format (str | DataFormat, default: DEFAULT_DATA_FORMAT ) –

    Format to store data in ('wkw', 'zarr', 'zarr3')

  • chunk_shape (Vec3IntLike | int | None, default: None ) –

    Optional. Shape of chunks to store data in

  • shard_shape (Vec3IntLike | int | None, default: None ) –

    Optional. Shape of shards to store data in

  • chunks_per_shard (int | Vec3IntLike | None, default: None ) –

    Deprecated, use shard_shape. Optional. number of chunks per shard

  • compress (bool, default: True ) –

    Whether to compress the data

  • swap_xy (bool, default: False ) –

    Whether to swap x and y axes

  • flip_x (bool, default: False ) –

    Whether to flip the x axis

  • flip_y (bool, default: False ) –

    Whether to flip the y axis

  • flip_z (bool, default: False ) –

    Whether to flip the z axis

  • use_bioformats (bool | None, default: None ) –

    Whether to use bioformats for reading

  • max_layers (int, default: 20 ) –

    Maximum number of layers to create

  • batch_size (int | None, default: None ) –

    Size of batches for processing

  • executor (Executor | None, default: None ) –

    Optional executor for parallelization

Returns:

  • Dataset ( Dataset ) –

    The created dataset instance

Examples:

ds = Dataset.from_images("path/to/images/",
                        "path/to/dataset/",
                        voxel_size=(1, 1, 1))
Note

This method needs extra packages like tifffile or pylibczirw. Install with pip install "webknossos[all]" and pip install --extra-index-url https://pypi.scm.io/simple/ "webknossos[czi]".

fs_copy_dataset

fs_copy_dataset(
    new_dataset_path: str | PathLike | UPath,
    *,
    exists_ok: bool = False,
    layers_to_ignore: Iterable[str] | None = None
) -> Dataset

Deprecated. File-based copy is automatically used by Dataset.copy_dataset.

Creates an independent copy of the dataset with all layers at a new location.

This method copies the files of the dataset as is and, therefore, might be faster than Dataset.copy_dataset, which decodes and encodes all the data. If you wish to change the data storage parameters, use Dataset.copy_dataset.

Parameters:

  • new_dataset_path (str | PathLike | UPath) –

    Path where new dataset should be created

  • exists_ok (bool, default: False ) –

    Whether to overwrite existing datasets and layers

  • layers_to_ignore (Iterable[str] | None, default: None ) –

    List of layer names to exclude from the copy

Returns:

  • Dataset ( Dataset ) –

    The newly created copy

Raises:

  • AssertionError

    If trying to copy WKW layers to remote dataset

Examples:

Basic copy:

copied = ds.fs_copy_dataset("path/to/copy")
Note

WKW layers can only be copied to datasets on local file systems.

get_color_layers

get_color_layers() -> list[LayerType]

Get all color layers in the dataset.

Provides access to all layers with category 'color'. Useful when a dataset contains multiple color layers.

Returns:

  • list[LayerType]

    list[Layer]: List of all color layers in order

Examples:

Print all color layer names:

for layer in ds.get_color_layers():
    print(layer.name)
Note

If you need only a single color layer, consider using get_layer() with the specific layer name instead.

get_layer

get_layer(layer_name: str) -> LayerType

Get a specific layer from this dataset.

Parameters:

  • layer_name (str) –

    Name of the layer to retrieve

Returns:

  • Layer ( LayerType ) –

    The requested layer object

Raises:

  • IndexError

    If no layer with the given name exists

Examples:

color_layer = ds.get_layer("color")
seg_layer = ds.get_layer("segmentation")
Note

Use layers property to access all layers at once.

get_or_add_layer

get_or_add_layer(
    layer_name: str,
    category: LayerCategoryType,
    *,
    dtype_per_layer: DTypeLike | None = None,
    dtype_per_channel: DTypeLike | None = None,
    num_channels: int | None = None,
    data_format: str | DataFormat = DEFAULT_DATA_FORMAT,
    **kwargs: Any
) -> Layer

Get an existing layer or create a new one.

Gets a layer with the given name if it exists, otherwise creates a new layer with the specified parameters.

Parameters:

  • layer_name (str) –

    Name of the layer to get or create

  • category (LayerCategoryType) –

    Layer category ('color' or 'segmentation')

  • dtype_per_layer (DTypeLike | None, default: None ) –

    Deprecated, use dtype_per_channel. Optional data type for entire layer

  • dtype_per_channel (DTypeLike | None, default: None ) –

    Optional data type per channel

  • num_channels (int | None, default: None ) –

    Optional number of channels

  • data_format (str | DataFormat, default: DEFAULT_DATA_FORMAT ) –

    Format to store data ('wkw', 'zarr', etc.)

  • **kwargs (Any, default: {} ) –

    Additional arguments passed to add_layer()

Returns:

  • Layer ( Layer ) –

    The existing or newly created layer

Raises:

  • AssertionError

    If existing layer's properties don't match specified parameters

  • ValueError

    If both dtype_per_layer and dtype_per_channel specified

  • RuntimeError

    If invalid category specified

Examples:

layer = ds.get_or_add_layer(
    "segmentation",
    LayerCategoryType.SEGMENTATION_CATEGORY,
    dtype_per_channel=np.uint64,
)
Note

The dtype can be specified either per layer or per channel, but not both. For existing layers, the parameters are validated against the layer properties.

get_remote_datasets staticmethod

get_remote_datasets(
    *,
    organization_id: str | None = None,
    tags: str | Sequence[str] | None = None,
    name: str | None = None,
    folder_id: RemoteFolder | str | None = None
) -> Mapping[str, RemoteDataset]
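
A minimal sketch, assuming an authenticated webknossos_context (the tag value is a placeholder):

for name, remote_ds in Dataset.get_remote_datasets(tags="published").items():
    print(name)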

get_segmentation_layer

get_segmentation_layer(
    layer_name: str,
) -> SegmentationLayerType

Get a segmentation layer by name.

Parameters:

  • layer_name (str) –

    Name of the layer to get

Returns:

  • SegmentationLayer ( SegmentationLayerType ) –

    The segmentation layer
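
Examples:

Get a segmentation layer by its name (the name is a placeholder):

seg_layer = ds.get_segmentation_layer("segmentation")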

get_segmentation_layers

get_segmentation_layers() -> list[SegmentationLayerType]

Get all segmentation layers in the dataset.

Provides access to all layers with category 'segmentation'. Useful when a dataset contains multiple segmentation layers.

Returns:

  • list[SegmentationLayerType]

    list[SegmentationLayer]: List of all segmentation layers in order

Examples:

Print all segmentation layer names:

for layer in ds.get_segmentation_layers():
    print(layer.name)
Note

If you need only a single segmentation layer, consider using get_layer() with the specific layer name instead.

open classmethod

open(
    dataset_path: str | PathLike | UPath,
    read_only: bool = False,
) -> Dataset

To open an existing dataset on disk, simply call Dataset.open("your_path"). This requires datasource-properties.json to exist in this folder. Based on the datasource-properties.json, a dataset object is constructed. Only layers and magnifications that are listed in the properties are loaded (even though there might exist more layers or magnifications on disk).

The dataset_path refers to the top level directory of the dataset (excluding layer or magnification names).
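
Examples:

Open an existing dataset, optionally in read-only mode (the path is a placeholder):

ds = Dataset.open("path/to/dataset")
ds_readonly = Dataset.open("path/to/dataset", read_only=True)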

open_remote classmethod

open_remote(
    dataset_name_or_url: str | None = None,
    organization_id: str | None = None,
    sharing_token: str | None = None,
    webknossos_url: str | None = None,
    dataset_id: str | None = None,
    annotation_id: str | None = None,
    use_zarr_streaming: bool = True,
    read_only: bool = False,
) -> RemoteDataset
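
A minimal sketch, assuming a configured webknossos_context (dataset and organization names are placeholders):

remote_ds = Dataset.open_remote("l4_sample", organization_id="scalable_minds")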

publish_to_preliminary_dataset

publish_to_preliminary_dataset(
    dataset_id: str,
    path_prefix: str | None = None,
    symlink_data_instead_of_copy: bool = False,
) -> None

Copies or symlinks the data to paths returned by WEBKNOSSOS.

The dataset needs to be in status "uploading": it already exists in WEBKNOSSOS but has no dataset_properties yet. With the dataset_properties, WEBKNOSSOS can reserve the paths.

Parameters:

  • dataset_id (str) –

    The dataset_id of the already existing dataset

  • path_prefix (str | None, default: None ) –

    The prefix of the storage path, can be used to select one of the storage path options

  • symlink_data_instead_of_copy (bool, default: False ) –

    Set to True if the client has access to the same file system as the WEBKNOSSOS datastore

shallow_copy_dataset

shallow_copy_dataset(
    new_dataset_path: str | PathLike | UPath,
    *,
    name: str | None = None,
    layers_to_ignore: Iterable[str] | None = None,
    make_relative: bool | None = None
) -> Dataset

Create a new dataset that contains references to the layers, mags and attachments of another dataset.

Useful for creating alternative views or exposing datasets to WEBKNOSSOS.

Parameters:

  • new_dataset_path (str | PathLike | UPath) –

    Path where new dataset should be created

  • name (str | None, default: None ) –

    Optional name for the new dataset, uses original name if None

  • layers_to_ignore (Iterable[str] | None, default: None ) –

    Optional iterable of layer names to exclude

  • make_relative (bool | None, default: None ) –

    Whether the created references should use relative paths

Returns:

  • Dataset ( Dataset ) –

    The newly created dataset with linked layers

Raises:

  • RuntimeError

    If dataset is read-only

Examples:

Basic shallow copy:

linked = ds.shallow_copy_dataset("path/to/link")

With relative links excluding layers:

linked = ds.shallow_copy_dataset(
    "path/to/link",
    make_relative=True,
    layers_to_ignore=["temp_layer"]
)

trigger_dataset_import classmethod

trigger_dataset_import(
    directory_name: str,
    organization: str,
    token: str | None = None,
) -> None

Deprecated. Use Dataset.trigger_reload_in_datastore instead.

trigger_reload_in_datastore classmethod

trigger_reload_in_datastore(
    dataset_name_or_url: str | None = None,
    organization_id: str | None = None,
    webknossos_url: str | None = None,
    dataset_id: str | None = None,
    organization: str | None = None,
    token: str | None = None,
    datastore_url: str | None = None,
) -> None

upload

upload(
    new_dataset_name: str | None = None,
    initial_team_ids: list[str] | None = None,
    folder_id: str | RemoteFolder | None = None,
    require_unique_name: bool = False,
    layers_to_link: (
        list[LayerToLink | RemoteLayer] | None
    ) = None,
    upload_directly_to_common_storage: bool = False,
    jobs: int | None = None,
    common_storage_path_prefix: str | None = None,
    symlink_data_instead_of_copy: bool = False,
) -> RemoteDataset

Upload this dataset to webknossos.

Creates database entries and sets access rights on the webknossos instance before the actual data upload. The client then copies the data directly to the returned paths.

Parameters:

  • new_dataset_name (str | None, default: None ) –

    Name for the new dataset; defaults to the current name

  • initial_team_ids (list[str] | None, default: None ) –

    Optional list of team IDs to grant initial access

  • folder_id (str | RemoteFolder | None, default: None ) –

    Optional ID of folder where dataset should be placed

  • require_unique_name (bool, default: False ) –

    Whether to make the request fail if a dataset with the same name already exists

  • layers_to_link (list[LayerToLink | RemoteLayer] | None, default: None ) –

    Optional list of LayerToLink to link already published layers to the dataset.

  • upload_directly_to_common_storage (bool, default: False ) –

    Set this to true when the client has access to the same storage system as the WEBKNOSSOS datastore (file system or cloud storage).

  • jobs (int | None, default: None ) –

    Optional number of jobs to use for uploading the data.

  • common_storage_path_prefix (str | None, default: None ) –

    Optional path prefix used when upload_directly_to_common_storage is true to select one of the available mount points for the dataset folder.

  • symlink_data_instead_of_copy (bool, default: False ) –

    Only considered when upload_directly_to_common_storage is True. Set this to True when the client has access to the same file system as the WEBKNOSSOS datastore.

Returns:

  • RemoteDataset ( RemoteDataset ) –

    Reference to the newly created remote dataset

Note

upload_directly_to_common_storage is typically only used by administrators with direct file system or S3 access to the WEBKNOSSOS datastore. Most users should leave upload_directly_to_common_storage at its default of False.

Examples:

remote_ds = ds.upload(
    "my_dataset",
    ["team_a", "team_b"],
    "folder_123"
)
print(remote_ds.url)

Link existing layers:

link = LayerToLink.from_remote_layer(existing_layer)
remote_ds = ds.upload(layers_to_link=[link])

write_layer

write_layer(
    layer_name: str,
    category: LayerCategoryType,
    data: ndarray,
    *,
    data_format: str | DataFormat = DEFAULT_DATA_FORMAT,
    downsample: bool = True,
    chunk_shape: Vec3IntLike | int | None = None,
    shard_shape: Vec3IntLike | int | None = None,
    chunks_per_shard: Vec3IntLike | int | None = None,
    axes: Iterable[str] | None = None,
    absolute_offset: Vec3IntLike | VecIntLike | None = None,
    mag: MagLike = Mag(1)
) -> Layer

Write a numpy array to a new layer and downsample.

Parameters:

  • layer_name (str) –

    Name of the new layer.

  • category (LayerCategoryType) –

    Category of the new layer.

  • data (ndarray) –

    The data to write.

  • data_format (str | DataFormat, default: DEFAULT_DATA_FORMAT ) –

    Format to store the data. Defaults to zarr3.

  • downsample (bool, default: True ) –

    Whether to downsample the data. Defaults to True.

  • chunk_shape (Vec3IntLike | int | None, default: None ) –

    Shape of chunks for storage. Recommended (32,32,32) or (64,64,64). Defaults to (32,32,32).

  • shard_shape (Vec3IntLike | int | None, default: None ) –

    Shape of shards for storage. Must be a multiple of chunk_shape. If specified, chunks_per_shard must not be specified. Defaults to (1024, 1024, 1024).

  • chunks_per_shard (Vec3IntLike | int | None, default: None ) –

    Deprecated, use shard_shape. Number of chunks per shard. If specified, shard_shape must not be specified.

  • axes (Iterable[str] | None, default: None ) –

    The axes of the data for non-3D data.

  • absolute_offset (Vec3IntLike | VecIntLike | None, default: None ) –

    The offset of the data. Specified in Mag 1.

  • mag (MagLike, default: Mag(1) ) –

    Magnification to write the data at.
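
Examples:

A minimal sketch writing a small channel-less (x, y, z) uint8 volume (shape and layer name are placeholders):

import numpy as np

data = np.zeros((256, 256, 64), dtype=np.uint8)
layer = ds.write_layer("raw", "color", data)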