webknossos.dataset.dataset

Dataset

Dataset(
    dataset_path: str | PathLike | UPath,
    voxel_size: tuple[float, float, float] | None = None,
    name: str | None = None,
    exist_ok: bool = False,
    *,
    voxel_size_with_unit: VoxelSize | None = None,
    read_only: bool = False
)

Bases: AbstractDataset[Layer, SegmentationLayer]

A dataset is the entry point of the Dataset API.

An existing dataset on disk can be opened or new datasets can be created.

A dataset stores the data in .wkw files on disk with metadata in datasource-properties.json. The information in those files is kept in sync with the object.

Each dataset consists of one or more layers (webknossos.dataset.layer.Layer), which themselves can comprise multiple magnifications (webknossos.dataset.mag_view.MagView).

Examples:

Create a new dataset:

ds = Dataset("path/to/dataset", voxel_size=(11.2, 11.2, 25))

Open an existing dataset:

ds = Dataset.open("path/to/dataset")

Open a remote dataset:

ds = RemoteDataset.open("my_dataset", "organization_id")

Create a new dataset or open an existing one.

Creates a new dataset and the associated datasource-properties.json if one does not exist. If the dataset already exists and exist_ok is True, it is opened (the provided voxel_size and name are asserted to match the existing dataset).

Please use Dataset.open if you intend to open an existing dataset and don't want/need the creation behavior.

Parameters:

  • dataset_path (str | PathLike | UPath) –

    Path where the dataset should be created/opened

  • voxel_size (tuple[float, float, float] | None, default: None ) –

    Optional tuple of floats (x, y, z) specifying voxel size in nanometers

  • name (str | None, default: None ) –

    Optional name for the dataset, defaults to last part of dataset_path if not provided

  • exist_ok (bool, default: False ) –

    Whether to open an existing dataset at the path rather than failing

  • voxel_size_with_unit (VoxelSize | None, default: None ) –

    Optional voxel size with unit specification

  • read_only (bool, default: False ) –

    Whether to open dataset in read-only mode

Raises:

  • RuntimeError

    If dataset exists and exist_ok=False

  • AssertionError

    If opening existing dataset with mismatched voxel size or name
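
Examples:

Create the dataset on first use and re-open it on subsequent runs (the path and voxel size are placeholders):

ds = Dataset(
    "path/to/dataset",
    voxel_size=(11.2, 11.2, 25),
    exist_ok=True,
)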

default_view_configuration property writable

default_view_configuration: DatasetViewConfiguration | None

Default view configuration for this dataset in webknossos.

Controls how the dataset is displayed in webknossos when first opened by a user, including position, zoom level, rotation etc.

Returns:

  • DatasetViewConfiguration | None –

    The view configuration if set, otherwise None

Examples:

ds.default_view_configuration = DatasetViewConfiguration(
    zoom=1.5,
    position=(100, 100, 100)
)

layers property

layers: Mapping[str, LayerType]

Dictionary containing all layers of this dataset.

Returns:

  • Mapping[str, LayerType]

    dict[str, Layer]: Dictionary mapping layer names to Layer objects

Examples:

for layer_name, layer in ds.layers.items():
   print(layer_name)

name property writable

name: str

Name of this dataset as specified in datasource-properties.json.

Can be modified to rename the dataset. Changes are persisted to the properties file.

Returns:

  • str ( str ) –

    Current dataset name

Examples:

ds.name = "my_renamed_dataset"  # Updates the name in properties file

path instance-attribute

path: UPath = path

read_only property

read_only: bool

Whether this dataset is opened in read-only mode.

When True, operations that would modify the dataset (adding layers, changing properties, etc.) are not allowed and will raise RuntimeError.

Returns:

  • bool ( bool ) –

    True if dataset is read-only, False otherwise
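
Examples:

Open a dataset without allowing modifications (the path is a placeholder):

ds = Dataset.open("path/to/dataset", read_only=True)
print(ds.read_only)  # True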

resolved_path property

resolved_path: UPath

voxel_size property

voxel_size: tuple[float, float, float]

Size of each voxel in nanometers along each dimension (x, y, z).

Returns:

  • tuple[float, float, float]

    tuple[float, float, float]: Size of each voxel in nanometers for x,y,z dimensions

Examples:

vx, vy, vz = ds.voxel_size
print(f"X resolution is {vx}nm")

voxel_size_with_unit property

voxel_size_with_unit: VoxelSize

Size of voxels including unit information.

Size of each voxel along each dimension (x, y, z), including unit specification. The default unit is nanometers.

Returns:

  • VoxelSize ( VoxelSize ) –

    Object containing voxel sizes and their units
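
Examples:

A minimal sketch, assuming VoxelSize exposes factor and unit attributes:

vs = ds.voxel_size_with_unit
print(vs.factor, vs.unit)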

ConversionLayerMapping

Bases: Enum

Strategies for mapping file paths to layers when importing images.

These strategies determine how input image files are grouped into layers during dataset creation using Dataset.from_images(). If no strategy is provided, INSPECT_SINGLE_FILE is used as the default.

If none of the pre-defined strategies fit your needs, you can provide a custom callable that takes a Path and returns a layer name string.

Examples:

Using default strategy:

ds = Dataset.from_images("images/", "dataset/")

Explicit strategy:

ds = Dataset.from_images(
    "images/",
    "dataset/",
    map_filepath_to_layer_name=ConversionLayerMapping.ENFORCE_SINGLE_LAYER
)

Custom mapping function:

ds = Dataset.from_images(
    "images/",
    "dataset/",
    map_filepath_to_layer_name=lambda p: p.stem
)

ENFORCE_LAYER_PER_FILE class-attribute instance-attribute

ENFORCE_LAYER_PER_FILE = 'enforce_layer_per_file'

Creates a new layer for each input file. Useful for converting multiple 3D images or when each 2D image should become its own layer.

ENFORCE_LAYER_PER_FOLDER class-attribute instance-attribute

ENFORCE_LAYER_PER_FOLDER = 'enforce_layer_per_folder'

Groups files by their containing folder. Each folder becomes one layer. Useful for organized 2D image stacks.

ENFORCE_LAYER_PER_TOPLEVEL_FOLDER class-attribute instance-attribute

ENFORCE_LAYER_PER_TOPLEVEL_FOLDER = (
    "enforce_layer_per_toplevel_folder"
)

Groups files by their top-level folder. Useful when multiple layers each have their stacks split across subfolders.

ENFORCE_SINGLE_LAYER class-attribute instance-attribute

ENFORCE_SINGLE_LAYER = 'enforce_single_layer'

Combines all input files into a single layer. Only useful when all images are 2D slices that should be combined.

INSPECT_EVERY_FILE class-attribute instance-attribute

INSPECT_EVERY_FILE = 'inspect_every_file'

Like INSPECT_SINGLE_FILE but determines strategy separately for each file. More flexible but slower for many files.

INSPECT_SINGLE_FILE class-attribute instance-attribute

INSPECT_SINGLE_FILE = 'inspect_single_file'

Default strategy. Inspects first image file to determine if data is 2D or 3D. For 2D data uses ENFORCE_LAYER_PER_FOLDER, for 3D uses ENFORCE_LAYER_PER_FILE.

add_copy_layer

add_copy_layer(
    foreign_layer: (
        str | PathLike | UPath | Layer | RemoteLayer
    ),
    new_layer_name: str | None = None,
    *,
    chunk_shape: Vec3IntLike | int | None = None,
    shard_shape: Vec3IntLike | int | None = None,
    chunks_per_shard: Vec3IntLike | int | None = None,
    data_format: str | DataFormat | None = None,
    compress: bool | None = None,
    exists_ok: bool = False,
    executor: Executor | None = None,
    with_attachments: bool = True
) -> Layer

Deprecated. Use Dataset.add_layer_as_copy instead.

add_fs_copy_layer

add_fs_copy_layer(
    foreign_layer: str | PathLike | UPath | Layer,
    new_layer_name: str | None = None,
) -> Layer

Deprecated. File-based copy is automatically used in Dataset.add_layer_as_copy.

Copies the files of foreign_layer, which belongs to another dataset, to the current dataset via the filesystem. The relevant information from the other dataset's datasource-properties.json is copied as well. If new_layer_name is None, the name of the foreign layer is used.

add_layer

add_layer(
    layer_name: str,
    category: LayerCategoryType,
    *,
    dtype_per_layer: DTypeLike | None = None,
    dtype_per_channel: DTypeLike | None = None,
    num_channels: int | None = None,
    data_format: str | DataFormat = DEFAULT_DATA_FORMAT,
    bounding_box: NDBoundingBox | None = None,
    **kwargs: Any
) -> Layer

Create a new layer in the dataset.

Creates a new layer with the given name, category, and data type.

Parameters:

  • layer_name (str) –

    Name for the new layer

  • category (LayerCategoryType) –

    Either 'color' or 'segmentation'

  • dtype_per_layer (DTypeLike | None, default: None ) –

    Deprecated, use dtype_per_channel. Optional data type for entire layer, e.g. np.uint8

  • dtype_per_channel (DTypeLike | None, default: None ) –

    Optional data type per channel, e.g. np.uint8

  • num_channels (int | None, default: None ) –

    Number of channels (default 1)

  • data_format (str | DataFormat, default: DEFAULT_DATA_FORMAT ) –

    Format to store data ('wkw', 'zarr', 'zarr3')

  • bounding_box (NDBoundingBox | None, default: None ) –

    Optional initial bounding box of layer

  • **kwargs (Any, default: {} ) –

    Additional arguments:

      • largest_segment_id: For segmentation layers, initial largest ID
      • mappings: For segmentation layers, optional ID mappings

Returns:

  • Layer ( Layer ) –

    The newly created layer

Raises:

  • IndexError

    If layer with given name already exists

  • RuntimeError

    If invalid category specified

  • AttributeError

    If both dtype_per_layer and dtype_per_channel specified

  • AssertionError

    If invalid layer name or WKW format used with remote dataset

Examples:

Create color layer:

layer = ds.add_layer(
    "my_raw_microscopy_layer",
    LayerCategoryType.COLOR_CATEGORY,
    dtype_per_channel=np.uint8,
)

Create segmentation layer:

layer = ds.add_layer(
    "my_segmentation_labels",
    LayerCategoryType.SEGMENTATION_CATEGORY,
    dtype_per_channel=np.uint64
)
Note

The dtype can be specified either per layer or per channel, but not both. If neither is specified, uint8 per channel is used by default. WKW format can only be used with local datasets.

add_layer_as_copy

add_layer_as_copy(
    foreign_layer: (
        str | PathLike | UPath | Layer | RemoteLayer
    ),
    new_layer_name: str | None = None,
    *,
    chunk_shape: Vec3IntLike | int | None = None,
    shard_shape: Vec3IntLike | int | None = None,
    chunks_per_shard: Vec3IntLike | int | None = None,
    data_format: str | DataFormat | None = None,
    compress: bool | Zarr3Config | None = None,
    exists_ok: bool = False,
    executor: Executor | None = None,
    with_attachments: bool = True
) -> Layer

Copy layer from another dataset to this one.

Creates a new layer in this dataset by copying data and metadata from a layer in another dataset.

Parameters:

  • foreign_layer (str | PathLike | UPath | Layer | RemoteLayer) –

    Layer to copy (path or Layer object)

  • new_layer_name (str | None, default: None ) –

    Optional name for the new layer, uses original name if None

  • chunk_shape (Vec3IntLike | int | None, default: None ) –

    Optional shape of chunks for storage

  • shard_shape (Vec3IntLike | int | None, default: None ) –

    Optional shape of shards for storage

  • chunks_per_shard (Vec3IntLike | int | None, default: None ) –

    Deprecated, use shard_shape. Optional number of chunks per shard

  • data_format (str | DataFormat | None, default: None ) –

    Optional format to store copied data ('wkw', 'zarr', etc.)

  • compress (bool | Zarr3Config | None, default: None ) –

    Optional whether to compress copied data

  • exists_ok (bool, default: False ) –

    Whether to overwrite existing layers

  • executor (Executor | None, default: None ) –

    Optional executor for parallel copying

Returns:

  • Layer ( Layer ) –

    The newly created copy of the layer

Raises:

  • IndexError

    If target layer name already exists

  • RuntimeError

    If dataset is read-only

Examples:

Copy layer keeping same name:

other_ds = Dataset.open("other/dataset")
copied = ds.add_layer_as_copy(other_ds.get_layer("color"))

Copy with new name:

copied = ds.add_layer_as_copy(
    other_ds.get_layer("color"),
    new_layer_name="color_copy",
    compress=True
)

add_layer_as_ref

add_layer_as_ref(
    foreign_layer: (
        str | PathLike | UPath | Layer | RemoteLayer
    ),
    new_layer_name: str | None = None,
) -> Layer

Add a layer from another dataset by reference.

Creates a layer that references data from a remote dataset. The image data will be streamed on-demand when accessed.

Parameters:

  • foreign_layer (str | PathLike | UPath | Layer | RemoteLayer) –

    Foreign layer to add (path or Layer object)

  • new_layer_name (str | None, default: None ) –

    Optional name for the new layer, uses original name if None

Returns:

  • Layer ( Layer ) –

    The newly created remote layer referencing the foreign data

Raises:

  • IndexError

    If target layer name already exists

  • AssertionError

    If the foreign layer is not remote or originates from this same dataset

  • RuntimeError

    If dataset is read-only

Examples:

ds = Dataset.open("other/dataset")
remote_ds = RemoteDataset.open("my_dataset", "my_org_id")
new_layer = ds.add_layer_as_ref(
    remote_ds.get_layer("color")
)
Note

Changes to the original layer's properties afterwards won't affect this dataset. Data is only referenced, not copied.

add_layer_for_existing_files

add_layer_for_existing_files(
    layer_name: str,
    category: LayerCategoryType,
    **kwargs: Any
) -> Layer

Create a new layer from existing data files.

Adds a layer by discovering and incorporating existing data files that were created externally, rather than creating new ones. The layer properties are inferred from the existing files unless overridden.

Parameters:

  • layer_name (str) –

    Name for the new layer

  • category (LayerCategoryType) –

    Layer category ('color' or 'segmentation')

  • **kwargs (Any, default: {} ) –

    Additional arguments:

      • num_channels: Override detected number of channels
      • dtype_per_channel: Override detected data type
      • data_format: Override detected data format
      • bounding_box: Override detected bounding box

Returns:

  • Layer ( Layer ) –

    The newly created layer referencing the existing files

Raises:

  • AssertionError

    If layer already exists or no valid files found

  • RuntimeError

    If dataset is read-only

Examples:

Basic usage:

layer = ds.add_layer_for_existing_files(
    "external_data",
    "color"
)

Override properties:

layer = ds.add_layer_for_existing_files(
    "segmentation_data",
    "segmentation",
    dtype_per_channel=np.uint64
)
Note

The data files must already exist in the dataset directory under the layer name. Files are analyzed to determine properties like data type and number of channels. Magnifications are discovered automatically.

add_layer_from_images

add_layer_from_images(
    images: Union[
        str, FramesSequence, list[str | PathLike | UPath]
    ],
    layer_name: str,
    category: LayerCategoryType | None = "color",
    *,
    data_format: str | DataFormat = DEFAULT_DATA_FORMAT,
    mag: MagLike = Mag(1),
    chunk_shape: Vec3IntLike | int | None = None,
    shard_shape: Vec3IntLike | int | None = None,
    chunks_per_shard: int | Vec3IntLike | None = None,
    compress: bool = True,
    topleft: VecIntLike = zeros(),
    swap_xy: bool = False,
    flip_x: bool = False,
    flip_y: bool = False,
    flip_z: bool = False,
    dtype: DTypeLike | None = None,
    use_bioformats: bool | None = None,
    channel: int | None = None,
    timepoint: int | None = None,
    czi_channel: int | None = None,
    batch_size: int | None = None,
    allow_multiple_layers: bool = False,
    max_layers: int = 20,
    truncate_rgba_to_rgb: bool = True,
    executor: Executor | None = None
) -> Layer

Creates a new layer called layer_name with mag mag from images. images can be one of the following:

  • glob-string
  • list of paths
  • pims.FramesSequence instance

Please see the pims docs for more information.

This method needs extra packages like tifffile or pylibczirw. Please install the respective extras, e.g. using python -m pip install "webknossos[all]".

Further Arguments:

  • category: color by default, may be set to "segmentation"
  • data_format: by default zarr3 files are written, may be set to "wkw" or "zarr" to write in these formats.
  • mag: magnification to use for the written data
  • chunk_shape, chunks_per_shard, shard_shape, compress: adjust how the data is stored on disk
  • topleft: set an offset in Mag(1) to start writing the data, only affecting the output
  • swap_xy: set to True to interchange x and y axis before writing to disk
  • flip_x, flip_y, flip_z: set to True to reverse the respective axis before writing to disk
  • dtype: the read image data will be converted to this dtype using numpy.ndarray.astype
  • use_bioformats: set to True to use only the pims bioformats adapter (requires a JVM), set to False to forbid the bioformats adapter; by default it is tried as a last option
  • channel: may be used to select a single channel, if multiple are available
  • timepoint: for timeseries, select a timepoint to use by specifying it as an int, starting from 0
  • czi_channel: may be used to select a channel for .czi images, which differs from normal color-channels
  • batch_size: number of images to process per batch (influences RAM consumption); must be a multiple of the chunk shape's z-extent for uncompressed layers and of the shard shape's z-extent for compressed layers; defaults to the chunk or shard z-extent respectively
  • allow_multiple_layers: set to True if timepoints or channels may result in multiple layers being added (only the first is returned)
  • max_layers: only applies if allow_multiple_layers=True, limits the number of layers added via different channels or timepoints
  • truncate_rgba_to_rgb: only applies if allow_multiple_layers=True, set to False to write four channels into layers instead of an RGB channel
  • executor: pass a ClusterExecutor instance to parallelize the conversion jobs across the batches
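
Examples:

A minimal sketch for importing a stack of 2D TIFF slices (the glob pattern and layer name are placeholders; requires the image-reading extras mentioned above):

layer = ds.add_layer_from_images(
    "stack/*.tif",
    layer_name="color",
    category="color",
)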

add_layer_like

add_layer_like(
    other_layer: Layer | RemoteLayer, layer_name: str
) -> Layer
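
Creates a new empty layer named layer_name that presumably mirrors the properties (category, data type, number of channels, data format) of other_layer. A minimal sketch (layer names are placeholders):

template = ds.get_layer("color")
new_layer = ds.add_layer_like(template, "color_2")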

add_remote_layer

add_remote_layer(
    foreign_layer: (
        str | PathLike | UPath | Layer | RemoteLayer
    ),
    new_layer_name: str | None = None,
) -> Layer

Deprecated. Use Dataset.add_layer_as_ref instead.

add_symlink_layer

add_symlink_layer(
    foreign_layer: str | PathLike | UPath | Layer,
    new_layer_name: str | None = None,
    *,
    make_relative: bool = False
) -> Layer

Deprecated. Use Dataset.add_layer_as_ref instead.

Create symbolic link to layer from another dataset.

Instead of copying data, creates a symbolic link to the original layer's data and copies only the layer metadata. Later changes to the original layer's properties (e.g. its bounding box) won't affect this dataset, and vice versa.

Parameters:

  • foreign_layer (str | PathLike | UPath | Layer) –

    Layer to link to (path or Layer object)

  • make_relative (bool, default: False ) –

    Whether to create relative symlinks

  • new_layer_name (str | None, default: None ) –

    Optional name for the linked layer, uses original name if None

Returns:

  • Layer ( Layer ) –

    The newly created symbolic link layer

Raises:

  • IndexError

    If target layer name already exists

  • AssertionError

    If trying to create symlinks in/to remote datasets

  • RuntimeError

    If dataset is read-only

Examples:

other_ds = Dataset.open("other/dataset")
linked = ds.add_symlink_layer(
    other_ds.get_layer("color"),
    make_relative=True
)
Note

Only works with local file systems, cannot link remote datasets or create symlinks in remote datasets.

calculate_bounding_box

calculate_bounding_box() -> NDBoundingBox

Calculate the enclosing bounding box of all layers.

Finds the smallest box that contains all data from all layers in the dataset.

Returns:

  • NDBoundingBox ( NDBoundingBox ) –

    Bounding box containing all layer data

Examples:

bbox = ds.calculate_bounding_box()
print(f"Dataset spans {bbox.size} voxels")
print(f"Dataset starts at {bbox.topleft}")

compress

compress(*, executor: Executor | None = None) -> None

Compress all uncompressed magnifications in-place.

Compresses the data of all magnification levels that aren't already compressed, for all layers in the dataset.

Parameters:

  • executor (Executor | None, default: None ) –

    Optional executor for parallel compression

Raises:

  • RuntimeError

    If dataset is read-only

Examples:

ds.compress()
Note

If data is already compressed, this will have no effect.

copy_dataset

copy_dataset(
    new_dataset_path: str | PathLike | UPath,
    *,
    voxel_size: tuple[float, float, float] | None = None,
    chunk_shape: Vec3IntLike | int | None = None,
    shard_shape: Vec3IntLike | int | None = None,
    chunks_per_shard: Vec3IntLike | int | None = None,
    data_format: str | DataFormat | None = None,
    compress: bool | None = None,
    exists_ok: bool = False,
    executor: Executor | None = None,
    voxel_size_with_unit: VoxelSize | None = None,
    layers_to_ignore: Iterable[str] | None = None
) -> Dataset

Creates an independent copy of the dataset with all layers at a new location. Data storage parameters can be customized for the copied dataset.

Parameters:

  • new_dataset_path (str | PathLike | UPath) –

    Path where new dataset should be created

  • voxel_size (tuple[float, float, float] | None, default: None ) –

    Optional tuple of floats (x,y,z) specifying voxel size in nanometers

  • chunk_shape (Vec3IntLike | int | None, default: None ) –

    Optional shape of chunks for data storage

  • shard_shape (Vec3IntLike | int | None, default: None ) –

    Optional shape of shards for data storage

  • chunks_per_shard (Vec3IntLike | int | None, default: None ) –

    Deprecated, use shard_shape. Optional number of chunks per shard

  • data_format (str | DataFormat | None, default: None ) –

    Optional format to store data ('wkw', 'zarr', 'zarr3')

  • compress (bool | None, default: None ) –

    Optional whether to compress data

  • exists_ok (bool, default: False ) –

    Whether to overwrite existing datasets and layers

  • executor (Executor | None, default: None ) –

    Optional executor for parallel copying

  • voxel_size_with_unit (VoxelSize | None, default: None ) –

    Optional voxel size specification with units

  • layers_to_ignore (Iterable[str] | None, default: None ) –

    List of layer names to exclude from the copy

Returns:

  • Dataset ( Dataset ) –

    The newly created copy

Raises:

  • AssertionError

    If trying to copy WKW layers to remote dataset

Examples:

Basic copy:

copied = ds.copy_dataset("path/to/copy")

Copy with different storage:

copied = ds.copy_dataset(
    "path/to/copy",
    data_format="zarr",
    compress=True
)
Note

WKW layers can only be copied to datasets on local file systems. For remote datasets, use data_format='zarr3'.

delete_layer

delete_layer(layer_name: str) -> None

Delete a layer from the dataset.

Removes the layer's data and metadata from disk completely. This deletes both the datasource-properties.json entry and all data files for the layer.

Parameters:

  • layer_name (str) –

    Name of layer to delete

Raises:

  • IndexError

    If no layer with the given name exists

  • RuntimeError

    If dataset is read-only

Examples:

ds.delete_layer("old_layer")
print("Remaining layers:", list(ds.layers))

download classmethod

download(
    dataset_name_or_url: str,
    *,
    organization_id: str | None = None,
    sharing_token: str | None = None,
    webknossos_url: str | None = None,
    bbox: BoundingBox | None = None,
    layers: list[str] | str | None = None,
    mags: list[Mag] | None = None,
    path: PathLike | UPath | str | None = None,
    exist_ok: bool = False
) -> Dataset

Downloads a dataset and returns the Dataset instance.

  • dataset_name_or_url may be a dataset name or a full URL to a dataset view, e.g. https://webknossos.org/datasets/scalable_minds/l4_sample_dev/view. If a URL is used, organization_id, webknossos_url and sharing_token must not be set.
  • organization_id may be supplied if a dataset name was used in the previous argument; it defaults to your current organization from the webknossos_context.
  • sharing_token may be supplied if a dataset name was used and can specify a sharing token.
  • webknossos_url may be supplied if a dataset name was used and allows specifying in which webknossos instance to search for the dataset. It defaults to the url from your current webknossos_context, using https://webknossos.org as a fallback.
  • bbox, layers, and mags specify which parts of the dataset to download. If nothing is specified the whole image, all layers, and all mags are downloaded respectively.
  • path and exist_ok specify where to save the downloaded dataset and whether to overwrite if the path exists.
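
Examples:

Download by URL, using the example URL above (the target path and layer name are placeholders):

ds = Dataset.download(
    "https://webknossos.org/datasets/scalable_minds/l4_sample_dev/view",
    path="datasets/l4_sample_dev",
    layers=["color"],
)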

downsample

downsample(
    *,
    sampling_mode: SamplingModes = ANISOTROPIC,
    coarsest_mag: Mag | None = None,
    interpolation_mode: str = "default",
    compress: bool | Zarr3Config = True,
    executor: Executor | None = None
) -> None

Generate downsampled magnifications for all layers.

Creates lower resolution versions (coarser magnifications) of all layers that are not yet downsampled, up to the specified coarsest magnification.

Parameters:

  • sampling_mode (SamplingModes, default: ANISOTROPIC ) –

    Strategy for downsampling (e.g. ANISOTROPIC, MAX)

  • coarsest_mag (Mag | None, default: None ) –

    Optional maximum/coarsest magnification to generate

  • interpolation_mode (str, default: 'default' ) –

    Interpolation method to use. Defaults to "default" (= "mode" for segmentation, "median" for color).

  • compress (bool | Zarr3Config, default: True ) –

    Whether to compress generated magnifications. For Zarr3 datasets, codec configuration and chunk key encoding may also be supplied. Defaults to True.

  • executor (Executor | None, default: None ) –

    Optional executor for parallel processing

Raises:

  • RuntimeError

    If dataset is read-only

Examples:

Basic downsampling:

ds.downsample()

With custom parameters:

ds.downsample(
    sampling_mode=SamplingModes.ANISOTROPIC,
    coarsest_mag=Mag(8),
)
Note

  • ANISOTROPIC sampling downsamples the dimensions unevenly until the voxels are (approximately) isotropic
  • Other modes like MAX or CONSTANT create regular downsampling patterns
  • Magnifications that already exist are not regenerated

from_images classmethod

from_images(
    input_path: str | PathLike | UPath,
    output_path: str | PathLike | UPath,
    voxel_size: tuple[float, float, float] | None = None,
    name: str | None = None,
    *,
    map_filepath_to_layer_name: (
        ConversionLayerMapping | Callable[[UPath], str]
    ) = INSPECT_SINGLE_FILE,
    z_slices_sort_key: Callable[
        [UPath], Any
    ] = natsort_keygen(),
    voxel_size_with_unit: VoxelSize | None = None,
    layer_name: str | None = None,
    layer_category: LayerCategoryType | None = None,
    data_format: str | DataFormat = DEFAULT_DATA_FORMAT,
    chunk_shape: Vec3IntLike | int | None = None,
    shard_shape: Vec3IntLike | int | None = None,
    chunks_per_shard: int | Vec3IntLike | None = None,
    compress: bool = True,
    swap_xy: bool = False,
    flip_x: bool = False,
    flip_y: bool = False,
    flip_z: bool = False,
    use_bioformats: bool | None = None,
    max_layers: int = 20,
    batch_size: int | None = None,
    executor: Executor | None = None
) -> Dataset

This method imports image data from a folder or from a single file as a webknossos dataset.

The image data can be 3D images (such as multipage tiffs) or stacks of 2D images. Multiple 3D images or image stacks are mapped to different layers based on the mapping strategy.

The exact mapping is handled by the argument map_filepath_to_layer_name, which can be a pre-defined strategy from the enum ConversionLayerMapping, or a custom callable, taking a path of an image file and returning the corresponding layer name. All files belonging to the same layer name are then grouped. In case of multiple files per layer, those are usually mapped to the z-dimension. The order of the z-slices can be customized by setting z_slices_sort_key.

For more fine-grained control, please create an empty dataset and use add_layer_from_images.

Parameters:

  • input_path (str | PathLike | UPath) –

    Path to input image files

  • output_path (str | PathLike | UPath) –

    Output path for created dataset

  • voxel_size (tuple[float, float, float] | None, default: None ) –

    Optional tuple of floats (x,y,z) for voxel size in nm

  • name (str | None, default: None ) –

    Optional name for dataset

  • map_filepath_to_layer_name (ConversionLayerMapping | Callable[[UPath], str], default: INSPECT_SINGLE_FILE ) –

    Strategy for mapping files to layers, either a ConversionLayerMapping enum value or callable taking Path and returning str

  • z_slices_sort_key (Callable[[UPath], Any], default: natsort_keygen() ) –

    Optional key function for sorting z-slices

  • voxel_size_with_unit (VoxelSize | None, default: None ) –

    Optional voxel size with unit specification

  • layer_name (str | None, default: None ) –

    Optional name for layer(s)

  • layer_category (LayerCategoryType | None, default: None ) –

    Optional category override (LayerCategoryType.color / LayerCategoryType.segmentation)

  • data_format (str | DataFormat, default: DEFAULT_DATA_FORMAT ) –

    Format to store data in ('wkw', 'zarr', 'zarr3')

  • chunk_shape (Vec3IntLike | int | None, default: None ) –

    Optional. Shape of chunks to store data in

  • shard_shape (Vec3IntLike | int | None, default: None ) –

    Optional. Shape of shards to store data in

  • chunks_per_shard (int | Vec3IntLike | None, default: None ) –

    Deprecated, use shard_shape. Optional. number of chunks per shard

  • compress (bool, default: True ) –

    Whether to compress the data

  • swap_xy (bool, default: False ) –

    Whether to swap x and y axes

  • flip_x (bool, default: False ) –

    Whether to flip the x axis

  • flip_y (bool, default: False ) –

    Whether to flip the y axis

  • flip_z (bool, default: False ) –

    Whether to flip the z axis

  • use_bioformats (bool | None, default: None ) –

    Whether to use bioformats for reading

  • max_layers (int, default: 20 ) –

    Maximum number of layers to create

  • batch_size (int | None, default: None ) –

    Size of batches for processing

  • executor (Executor | None, default: None ) –

    Optional executor for parallelization

Returns:

  • Dataset ( Dataset ) –

    The created dataset instance

Examples:

ds = Dataset.from_images("path/to/images/",
                        "path/to/dataset/",
                        voxel_size=(1, 1, 1))
Note

This method needs extra packages like tifffile or pylibczirw. Install with pip install "webknossos[all]" and pip install --extra-index-url https://pypi.scm.io/simple/ "webknossos[czi]".

fs_copy_dataset

fs_copy_dataset(
    new_dataset_path: str | PathLike | UPath,
    *,
    exists_ok: bool = False,
    layers_to_ignore: Iterable[str] | None = None
) -> Dataset

Deprecated. File-based copy is automatically used by Dataset.copy_dataset.

Creates an independent copy of the dataset with all layers at a new location.

This method copies the files of the dataset as is and, therefore, might be faster than Dataset.copy_dataset, which decodes and encodes all the data. If you wish to change the data storage parameters, use Dataset.copy_dataset.

Parameters:

  • new_dataset_path (str | PathLike | UPath) –

    Path where new dataset should be created

  • exists_ok (bool, default: False ) –

    Whether to overwrite existing datasets and layers

  • layers_to_ignore (Iterable[str] | None, default: None ) –

    List of layer names to exclude from the copy

Returns:

  • Dataset ( Dataset ) –

    The newly created copy

Raises:

  • AssertionError

    If trying to copy WKW layers to remote dataset

Examples:

Basic copy:

copied = ds.fs_copy_dataset("path/to/copy")
Note

WKW layers can only be copied to datasets on local file systems.

get_color_layers

get_color_layers() -> list[LayerType]

Get all color layers in the dataset.

Provides access to all layers with category 'color'. Useful when a dataset contains multiple color layers.

Returns:

  • list[LayerType]

    list[Layer]: List of all color layers in order

Examples:

Print all color layer names:

for layer in ds.get_color_layers():
    print(layer.name)
Note

If you need only a single color layer, consider using get_layer() with the specific layer name instead.

get_layer

get_layer(layer_name: str) -> LayerType

Get a specific layer from this dataset.

Parameters:

  • layer_name (str) –

    Name of the layer to retrieve

Returns:

  • Layer ( LayerType ) –

    The requested layer object

Raises:

  • IndexError

    If no layer with the given name exists

Examples:

color_layer = ds.get_layer("color")
seg_layer = ds.get_layer("segmentation")
Note

Use layers property to access all layers at once.

get_or_add_layer

get_or_add_layer(
    layer_name: str,
    category: LayerCategoryType,
    *,
    dtype_per_layer: DTypeLike | None = None,
    dtype_per_channel: DTypeLike | None = None,
    num_channels: int | None = None,
    data_format: str | DataFormat = DEFAULT_DATA_FORMAT,
    **kwargs: Any
) -> Layer

Get an existing layer or create a new one.

Gets a layer with the given name if it exists, otherwise creates a new layer with the specified parameters.

Parameters:

  • layer_name (str) –

    Name of the layer to get or create

  • category (LayerCategoryType) –

    Layer category ('color' or 'segmentation')

  • dtype_per_layer (DTypeLike | None, default: None ) –

    Deprecated, use dtype_per_channel. Optional data type for entire layer

  • dtype_per_channel (DTypeLike | None, default: None ) –

    Optional data type per channel

  • num_channels (int | None, default: None ) –

    Optional number of channels

  • data_format (str | DataFormat, default: DEFAULT_DATA_FORMAT ) –

    Format to store data ('wkw', 'zarr', etc.)

  • **kwargs (Any, default: {} ) –

    Additional arguments passed to add_layer()

Returns:

  • Layer ( Layer ) –

    The existing or newly created layer

Raises:

  • AssertionError

    If existing layer's properties don't match specified parameters

  • ValueError

    If both dtype_per_layer and dtype_per_channel specified

  • RuntimeError

    If invalid category specified

Examples:

layer = ds.get_or_add_layer(
    "segmentation",
    LayerCategoryType.SEGMENTATION_CATEGORY,
    dtype_per_channel=np.uint64,
)
Note

The dtype can be specified either per layer or per channel, but not both. For existing layers, the parameters are validated against the layer properties.

get_remote_datasets staticmethod

get_remote_datasets(
    *,
    organization_id: str | None = None,
    tags: str | Sequence[str] | None = None,
    name: str | None = None,
    folder_id: RemoteFolder | str | None = None
) -> Mapping[str, RemoteDataset]
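
A minimal sketch, assuming an authenticated webknossos_context (the tag value is a placeholder):

for name, remote_ds in Dataset.get_remote_datasets(tags="published").items():
    print(name)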

get_segmentation_layer

get_segmentation_layer(
    layer_name: str,
) -> SegmentationLayerType

Get a segmentation layer by name.

Parameters:

  • layer_name (str) –

    Name of the layer to get

Returns:

  • SegmentationLayer ( SegmentationLayerType ) –

    The segmentation layer
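
Examples:

Get a segmentation layer by its name (the name is a placeholder):

seg_layer = ds.get_segmentation_layer("segmentation")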

get_segmentation_layers

get_segmentation_layers() -> list[SegmentationLayerType]

Get all segmentation layers in the dataset.

Provides access to all layers with category 'segmentation'. Useful when a dataset contains multiple segmentation layers.

Returns:

  • list[SegmentationLayerType]

    list[SegmentationLayer]: List of all segmentation layers in order

Examples:

Print all segmentation layer names:

for layer in ds.get_segmentation_layers():
    print(layer.name)
Note

If you need only a single segmentation layer, consider using get_layer() with the specific layer name instead.

open classmethod

open(
    dataset_path: str | PathLike | UPath,
    read_only: bool = False,
) -> Dataset

To open an existing dataset on disk, simply call Dataset.open("your_path"). This requires datasource-properties.json to exist in this folder. Based on the datasource-properties.json, a dataset object is constructed. Only layers and magnifications that are listed in the properties are loaded (even though there might exist more layers or magnifications on disk).

The dataset_path refers to the top level directory of the dataset (excluding layer or magnification names).
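
Examples:

Open an existing dataset, optionally in read-only mode (the path is a placeholder):

ds = Dataset.open("path/to/dataset")
ds_readonly = Dataset.open("path/to/dataset", read_only=True)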

open_remote classmethod

open_remote(
    dataset_name_or_url: str | None = None,
    organization_id: str | None = None,
    sharing_token: str | None = None,
    webknossos_url: str | None = None,
    dataset_id: str | None = None,
    annotation_id: str | None = None,
    use_zarr_streaming: bool = True,
    read_only: bool = False,
) -> RemoteDataset
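
A minimal sketch, assuming a configured webknossos_context (dataset and organization names are placeholders):

remote_ds = Dataset.open_remote("l4_sample", organization_id="scalable_minds")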

publish_to_preliminary_dataset

publish_to_preliminary_dataset(
    dataset_id: str,
    path_prefix: str | None = None,
    symlink_data_instead_of_copy: bool = False,
) -> None

Copies or symlinks the data to paths returned by WEBKNOSSOS.

The dataset needs to be in status "uploading": it already exists in WEBKNOSSOS but has no dataset_properties yet. With the dataset_properties, WEBKNOSSOS can reserve the paths.

Parameters:

  • dataset_id (str) –

    The dataset_id of the already existing dataset

  • path_prefix (str | None, default: None ) –

    The prefix of the storage path, can be used to select one of the storage path options

  • symlink_data_instead_of_copy (bool, default: False ) –

    Set to True if the client has access to the same file system as the WEBKNOSSOS datastore

shallow_copy_dataset

shallow_copy_dataset(
    new_dataset_path: str | PathLike | UPath,
    *,
    name: str | None = None,
    layers_to_ignore: Iterable[str] | None = None,
    make_relative: bool | None = None
) -> Dataset

Create a new dataset that contains references to the layers, mags and attachments of another dataset.

Useful for creating alternative views or exposing datasets to WEBKNOSSOS.

Parameters:

  • new_dataset_path (str | PathLike | UPath) –

    Path where new dataset should be created

  • name (str | None, default: None ) –

    Optional name for the new dataset, uses original name if None

  • layers_to_ignore (Iterable[str] | None, default: None ) –

    Optional iterable of layer names to exclude

  • make_relative (bool | None, default: None ) –

    Whether the created references should use relative paths

Returns:

  • Dataset ( Dataset ) –

    The newly created dataset with linked layers

Raises:

  • RuntimeError

    If dataset is read-only

Examples:

Basic shallow copy:

linked = ds.shallow_copy_dataset("path/to/link")

With relative links excluding layers:

linked = ds.shallow_copy_dataset(
    "path/to/link",
    make_relative=True,
    layers_to_ignore=["temp_layer"]
)

trigger_dataset_import classmethod

trigger_dataset_import(
    directory_name: str,
    organization: str,
    token: str | None = None,
) -> None

Deprecated. Use Dataset.trigger_reload_in_datastore instead.

trigger_reload_in_datastore classmethod

trigger_reload_in_datastore(
    dataset_name_or_url: str | None = None,
    organization_id: str | None = None,
    webknossos_url: str | None = None,
    dataset_id: str | None = None,
    organization: str | None = None,
    token: str | None = None,
    datastore_url: str | None = None,
) -> None

upload

upload(
    new_dataset_name: str | None = None,
    initial_team_ids: list[str] | None = None,
    folder_id: str | RemoteFolder | None = None,
    require_unique_name: bool = False,
    layers_to_link: (
        list[LayerToLink | RemoteLayer] | None
    ) = None,
    upload_directly_to_common_storage: bool = False,
    jobs: int | None = None,
    common_storage_path_prefix: str | None = None,
    symlink_data_instead_of_copy: bool = False,
) -> RemoteDataset

Upload this dataset to webknossos.

Creates database entries and sets access rights on the webknossos instance before the actual data upload. The client then copies the data directly to the returned paths.

Parameters:

  • new_dataset_name (str | None, default: None ) –

    Name for the new dataset; defaults to the current name

  • initial_team_ids (list[str] | None, default: None ) –

    Optional list of team IDs to grant initial access

  • folder_id (str | RemoteFolder | None, default: None ) –

    Optional ID of folder where dataset should be placed

  • require_unique_name (bool, default: False ) –

    Whether to make the request fail if a dataset with the same name already exists

  • layers_to_link (list[LayerToLink | RemoteLayer] | None, default: None ) –

    Optional list of LayerToLink to link already published layers to the dataset.

  • upload_directly_to_common_storage (bool, default: False ) –

    Set this to true when the client has access to the same storage system as the WEBKNOSSOS datastore (file system or cloud storage).

  • jobs (int | None, default: None ) –

    Optional number of jobs to use for uploading the data.

  • common_storage_path_prefix (str | None, default: None ) –

    Optional path prefix used when upload_directly_to_common_storage is true to select one of the available mount points for the dataset folder.

  • symlink_data_instead_of_copy (bool, default: False ) –

    Only considered when upload_directly_to_common_storage is True. Set this to True when the client has access to the same file system as the WEBKNOSSOS datastore.

Returns:

  • RemoteDataset ( RemoteDataset ) –

    Reference to the newly created remote dataset

Note

upload_directly_to_common_storage is typically only used by administrators with direct file system or S3 access to the WEBKNOSSOS datastore. Most users should leave upload_directly_to_common_storage at its default of False.

Examples:

remote_ds = ds.upload(
    "my_dataset",
    ["team_a", "team_b"],
    "folder_123"
)
print(remote_ds.url)

Link existing layers:

link = LayerToLink.from_remote_layer(existing_layer)
remote_ds = ds.upload(layers_to_link=[link])

write_layer

write_layer(
    layer_name: str,
    category: LayerCategoryType,
    data: ndarray,
    *,
    data_format: str | DataFormat = DEFAULT_DATA_FORMAT,
    downsample: bool = True,
    chunk_shape: Vec3IntLike | int | None = None,
    shard_shape: Vec3IntLike | int | None = None,
    chunks_per_shard: Vec3IntLike | int | None = None,
    axes: Iterable[str] | None = None,
    absolute_offset: Vec3IntLike | VecIntLike | None = None,
    mag: MagLike = Mag(1)
) -> Layer

Write a numpy array to a new layer and downsample.

Parameters:

  • layer_name (str) –

    Name of the new layer.

  • category (LayerCategoryType) –

    Category of the new layer.

  • data (ndarray) –

    The data to write.

  • data_format (str | DataFormat, default: DEFAULT_DATA_FORMAT ) –

    Format to store the data. Defaults to zarr3.

  • downsample (bool, default: True ) –

    Whether to downsample the data. Defaults to True.

  • chunk_shape (Vec3IntLike | int | None, default: None ) –

    Shape of chunks for storage. Recommended (32,32,32) or (64,64,64). Defaults to (32,32,32).

  • shard_shape (Vec3IntLike | int | None, default: None ) –

    Shape of shards for storage. Must be a multiple of chunk_shape. If specified, chunks_per_shard must not be specified. Defaults to (1024, 1024, 1024).

  • chunks_per_shard (Vec3IntLike | int | None, default: None ) –

    Deprecated, use shard_shape. Number of chunks per shard. If specified, shard_shape must not be specified.

  • axes (Iterable[str] | None, default: None ) –

    The axes of the data for non-3D data.

  • absolute_offset (Vec3IntLike | VecIntLike | None, default: None ) –

    The offset of the data. Specified in Mag 1.

  • mag (MagLike, default: Mag(1) ) –

    Magnification to write the data at.
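
Examples:

A minimal sketch writing a small channel-less (x, y, z) uint8 volume (shape and layer name are placeholders):

import numpy as np

data = np.zeros((256, 256, 64), dtype=np.uint8)
layer = ds.write_layer("raw", "color", data)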