webknossos.dataset.dataset
¶
Dataset
¶
Dataset(
dataset_path: str | PathLike | UPath,
voxel_size: tuple[float, float, float] | None = None,
name: str | None = None,
exist_ok: bool = False,
*,
voxel_size_with_unit: VoxelSize | None = None,
read_only: bool = False
)
Bases: AbstractDataset[Layer, SegmentationLayer]
A dataset is the entry point of the Dataset API.
An existing dataset on disk can be opened or new datasets can be created.
A dataset stores the data in .wkw files on disk, with metadata in datasource-properties.json.
The information in those files is kept in sync with the object.
Each dataset consists of one or more layers (webknossos.dataset.layer.Layer), which themselves can comprise multiple magnifications (webknossos.dataset.mag_view.MagView).
Examples:
Create a new dataset:
ds = Dataset("path/to/dataset", voxel_size=(11.2, 11.2, 25))
Open an existing dataset:
ds = Dataset.open("path/to/dataset")
Open a remote dataset:
ds = RemoteDataset.open("my_dataset", "organization_id")
Create a new dataset or open an existing one.
Creates a new dataset and the associated datasource-properties.json
if one does not exist.
If the dataset already exists and exist_ok is True, it is opened (the provided voxel_size
and name are asserted to match the existing dataset).
Please use Dataset.open
if you intend to open an existing dataset and don't want/need
the creation behavior.
Parameters:
- dataset_path (str | PathLike | UPath) – Path where the dataset should be created/opened
- voxel_size (tuple[float, float, float] | None, default: None) – Optional tuple of floats (x, y, z) specifying voxel size in nanometers
- name (str | None, default: None) – Optional name for the dataset, defaults to last part of dataset_path if not provided
- exist_ok (bool, default: False) – Whether to open an existing dataset at the path rather than failing
- voxel_size_with_unit (VoxelSize | None, default: None) – Optional voxel size with unit specification
- read_only (bool, default: False) – Whether to open dataset in read-only mode
Raises:
- RuntimeError – If dataset exists and exist_ok=False
- AssertionError – If opening existing dataset with mismatched voxel size or name
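Examples:
Opening-or-creating with exist_ok (a minimal sketch; the path and voxel size are illustrative and must match an existing dataset if one is present):
ds = Dataset("path/to/dataset", voxel_size=(11.2, 11.2, 25), exist_ok=True)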
default_view_configuration
property
writable
¶
default_view_configuration: DatasetViewConfiguration | None
Default view configuration for this dataset in webknossos.
Controls how the dataset is displayed in webknossos when first opened by a user, including position, zoom level, rotation etc.
Returns:
- DatasetViewConfiguration | None – Current view configuration if set
Examples:
ds.default_view_configuration = DatasetViewConfiguration(
zoom=1.5,
position=(100, 100, 100)
)
layers
property
¶
layers: Mapping[str, LayerType]
Dictionary containing all layers of this dataset.
Returns:
- Mapping[str, LayerType] – Dictionary mapping layer names to Layer objects
Examples:
for layer_name, layer in ds.layers.items():
    print(layer_name)
name
property
writable
¶
name: str
Name of this dataset as specified in datasource-properties.json.
Can be modified to rename the dataset. Changes are persisted to the properties file.
Returns:
- str – Current dataset name
Examples:
ds.name = "my_renamed_dataset" # Updates the name in properties file
read_only
property
¶
read_only: bool
Whether this dataset is opened in read-only mode.
When True, operations that would modify the dataset (adding layers, changing properties, etc.) are not allowed and will raise RuntimeError.
Returns:
- bool – True if dataset is read-only, False otherwise
voxel_size
property
¶
voxel_size: tuple[float, float, float]
Size of each voxel in nanometers along each dimension (x, y, z).
Returns:
- tuple[float, float, float] – Size of each voxel in nanometers for x, y, z dimensions
Examples:
vx, vy, vz = ds.voxel_size
print(f"X resolution is {vx}nm")
voxel_size_with_unit
property
¶
voxel_size_with_unit: VoxelSize
Size of voxels including unit information.
Size of each voxel along each dimension (x, y, z), including unit specification. The default unit is nanometers.
Returns:
- VoxelSize – Object containing voxel sizes and their units
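Examples:
A brief sketch printing the voxel size together with its unit (nanometer by default):
vs = ds.voxel_size_with_unit
print(vs)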
ConversionLayerMapping
¶
Bases: Enum
Strategies for mapping file paths to layers when importing images.
These strategies determine how input image files are grouped into layers during
dataset creation using Dataset.from_images()
. If no strategy is provided,
INSPECT_SINGLE_FILE
is used as the default.
If none of the pre-defined strategies fit your needs, you can provide a custom callable that takes a Path and returns a layer name string.
Examples:
Using default strategy:
ds = Dataset.from_images("images/", "dataset/")
Explicit strategy:
ds = Dataset.from_images(
"images/",
"dataset/",
map_filepath_to_layer_name=ConversionLayerMapping.ENFORCE_SINGLE_LAYER
)
Custom mapping function:
ds = Dataset.from_images(
"images/",
"dataset/",
map_filepath_to_layer_name=lambda p: p.stem
)
ENFORCE_LAYER_PER_FILE
class-attribute
instance-attribute
¶
ENFORCE_LAYER_PER_FILE = 'enforce_layer_per_file'
Creates a new layer for each input file. Useful for converting multiple 3D images or when each 2D image should become its own layer.
ENFORCE_LAYER_PER_FOLDER
class-attribute
instance-attribute
¶
ENFORCE_LAYER_PER_FOLDER = 'enforce_layer_per_folder'
Groups files by their containing folder. Each folder becomes one layer. Useful for organized 2D image stacks.
ENFORCE_LAYER_PER_TOPLEVEL_FOLDER
class-attribute
instance-attribute
¶
ENFORCE_LAYER_PER_TOPLEVEL_FOLDER = (
"enforce_layer_per_toplevel_folder"
)
Groups files by their top-level folder. Useful when multiple layers each have their stacks split across subfolders.
ENFORCE_SINGLE_LAYER
class-attribute
instance-attribute
¶
ENFORCE_SINGLE_LAYER = 'enforce_single_layer'
Combines all input files into a single layer. Only useful when all images are 2D slices that should be combined.
INSPECT_EVERY_FILE
class-attribute
instance-attribute
¶
INSPECT_EVERY_FILE = 'inspect_every_file'
Like INSPECT_SINGLE_FILE but determines strategy separately for each file. More flexible but slower for many files.
INSPECT_SINGLE_FILE
class-attribute
instance-attribute
¶
INSPECT_SINGLE_FILE = 'inspect_single_file'
Default strategy. Inspects first image file to determine if data is 2D or 3D. For 2D data uses ENFORCE_LAYER_PER_FOLDER, for 3D uses ENFORCE_LAYER_PER_FILE.
add_copy_layer
¶
add_copy_layer(
foreign_layer: (
str | PathLike | UPath | Layer | RemoteLayer
),
new_layer_name: str | None = None,
*,
chunk_shape: Vec3IntLike | int | None = None,
shard_shape: Vec3IntLike | int | None = None,
chunks_per_shard: Vec3IntLike | int | None = None,
data_format: str | DataFormat | None = None,
compress: bool | None = None,
exists_ok: bool = False,
executor: Executor | None = None,
with_attachments: bool = True
) -> Layer
Deprecated. Use Dataset.add_layer_as_copy
instead.
add_fs_copy_layer
¶
add_fs_copy_layer(
foreign_layer: str | PathLike | UPath | Layer,
new_layer_name: str | None = None,
) -> Layer
Deprecated. File-based copy is automatically used in Dataset.add_layer_as_copy.
Copies the files of foreign_layer, which belongs to another dataset, to the current dataset via the filesystem. Additionally, the relevant information from the other dataset's datasource-properties.json is copied too. If new_layer_name is None, the name of the foreign
layer is used.
add_layer
¶
add_layer(
layer_name: str,
category: LayerCategoryType,
*,
dtype_per_layer: DTypeLike | None = None,
dtype_per_channel: DTypeLike | None = None,
num_channels: int | None = None,
data_format: str | DataFormat = DEFAULT_DATA_FORMAT,
bounding_box: NDBoundingBox | None = None,
**kwargs: Any
) -> Layer
Create a new layer in the dataset.
Creates a new layer with the given name, category, and data type.
Parameters:
- layer_name (str) – Name for the new layer
- category (LayerCategoryType) – Either 'color' or 'segmentation'
- dtype_per_layer (DTypeLike | None, default: None) – Deprecated, use dtype_per_channel. Optional data type for entire layer, e.g. np.uint8
- dtype_per_channel (DTypeLike | None, default: None) – Optional data type per channel, e.g. np.uint8
- num_channels (int | None, default: None) – Number of channels (default 1)
- data_format (str | DataFormat, default: DEFAULT_DATA_FORMAT) – Format to store data ('wkw', 'zarr', 'zarr3')
- bounding_box (NDBoundingBox | None, default: None) – Optional initial bounding box of layer
- **kwargs (Any, default: {}) – Additional arguments: largest_segment_id (for segmentation layers, initial largest ID), mappings (for segmentation layers, optional ID mappings)
Returns:
- Layer – The newly created layer
Raises:
- IndexError – If layer with given name already exists
- RuntimeError – If invalid category specified
- AttributeError – If both dtype_per_layer and dtype_per_channel specified
- AssertionError – If invalid layer name or WKW format used with remote dataset
Examples:
Create color layer:
layer = ds.add_layer(
"my_raw_microscopy_layer",
LayerCategoryType.COLOR_CATEGORY,
dtype_per_channel=np.uint8,
)
Create segmentation layer:
layer = ds.add_layer(
"my_segmentation_labels",
LayerCategoryType.SEGMENTATION_CATEGORY,
dtype_per_channel=np.uint64
)
Note
The dtype can be specified either per layer or per channel, but not both. If neither is specified, uint8 per channel is used by default. WKW format can only be used with local datasets.
add_layer_as_copy
¶
add_layer_as_copy(
foreign_layer: (
str | PathLike | UPath | Layer | RemoteLayer
),
new_layer_name: str | None = None,
*,
chunk_shape: Vec3IntLike | int | None = None,
shard_shape: Vec3IntLike | int | None = None,
chunks_per_shard: Vec3IntLike | int | None = None,
data_format: str | DataFormat | None = None,
compress: bool | Zarr3Config | None = None,
exists_ok: bool = False,
executor: Executor | None = None,
with_attachments: bool = True
) -> Layer
Copy layer from another dataset to this one.
Creates a new layer in this dataset by copying data and metadata from a layer in another dataset.
Parameters:
- foreign_layer (str | PathLike | UPath | Layer | RemoteLayer) – Layer to copy (path or Layer object)
- new_layer_name (str | None, default: None) – Optional name for the new layer, uses original name if None
- chunk_shape (Vec3IntLike | int | None, default: None) – Optional shape of chunks for storage
- shard_shape (Vec3IntLike | int | None, default: None) – Optional shape of shards for storage
- chunks_per_shard (Vec3IntLike | int | None, default: None) – Deprecated, use shard_shape. Optional number of chunks per shard
- data_format (str | DataFormat | None, default: None) – Optional format to store copied data ('wkw', 'zarr', etc.)
- compress (bool | Zarr3Config | None, default: None) – Optional whether to compress copied data
- exists_ok (bool, default: False) – Whether to overwrite existing layers
- executor (Executor | None, default: None) – Optional executor for parallel copying
Returns:
- Layer – The newly created copy of the layer
Raises:
- IndexError – If target layer name already exists
- RuntimeError – If dataset is read-only
Examples:
Copy layer keeping same name:
other_ds = Dataset.open("other/dataset")
copied = ds.add_layer_as_copy(other_ds.get_layer("color"))
Copy with new name:
copied = ds.add_layer_as_copy(
other_ds.get_layer("color"),
new_layer_name="color_copy",
compress=True
)
add_layer_as_ref
¶
add_layer_as_ref(
foreign_layer: (
str | PathLike | UPath | Layer | RemoteLayer
),
new_layer_name: str | None = None,
) -> Layer
Add a layer from another dataset by reference.
Creates a layer that references data from a remote dataset. The image data will be streamed on-demand when accessed.
Parameters:
- foreign_layer (str | PathLike | UPath | Layer | RemoteLayer) – Foreign layer to add (path or Layer object)
- new_layer_name (str | None, default: None) – Optional name for the new layer, uses original name if None
Returns:
- Layer – The newly created remote layer referencing the foreign data
Raises:
- IndexError – If target layer name already exists
- AssertionError – If trying to add non-remote layer or same origin dataset
- RuntimeError – If dataset is read-only
Examples:
ds = Dataset.open("other/dataset")
remote_ds = RemoteDataset.open("my_dataset", "my_org_id")
new_layer = ds.add_layer_as_ref(
remote_ds.get_layer("color")
)
Note
Changes to the original layer's properties afterwards won't affect this dataset. Data is only referenced, not copied.
add_layer_for_existing_files
¶
add_layer_for_existing_files(
layer_name: str,
category: LayerCategoryType,
**kwargs: Any
) -> Layer
Create a new layer from existing data files.
Adds a layer by discovering and incorporating existing data files that were created externally, rather than creating new ones. The layer properties are inferred from the existing files unless overridden.
Parameters:
- layer_name (str) – Name for the new layer
- category (LayerCategoryType) – Layer category ('color' or 'segmentation')
- **kwargs (Any, default: {}) – Additional arguments: num_channels (override detected number of channels), dtype_per_channel (override detected data type), data_format (override detected data format), bounding_box (override detected bounding box)
Returns:
- Layer – The newly created layer referencing the existing files
Raises:
- AssertionError – If layer already exists or no valid files found
- RuntimeError – If dataset is read-only
Examples:
Basic usage:
layer = ds.add_layer_for_existing_files(
"external_data",
"color"
)
Override properties:
layer = ds.add_layer_for_existing_files(
"segmentation_data",
"segmentation",
dtype_per_channel=np.uint64
)
Note
The data files must already exist in the dataset directory under the layer name. Files are analyzed to determine properties like data type and number of channels. Magnifications are discovered automatically.
add_layer_from_images
¶
add_layer_from_images(
images: Union[
str, FramesSequence, list[str | PathLike | UPath]
],
layer_name: str,
category: LayerCategoryType | None = "color",
*,
data_format: str | DataFormat = DEFAULT_DATA_FORMAT,
mag: MagLike = Mag(1),
chunk_shape: Vec3IntLike | int | None = None,
shard_shape: Vec3IntLike | int | None = None,
chunks_per_shard: int | Vec3IntLike | None = None,
compress: bool = True,
topleft: VecIntLike = zeros(),
swap_xy: bool = False,
flip_x: bool = False,
flip_y: bool = False,
flip_z: bool = False,
dtype: DTypeLike | None = None,
use_bioformats: bool | None = None,
channel: int | None = None,
timepoint: int | None = None,
czi_channel: int | None = None,
batch_size: int | None = None,
allow_multiple_layers: bool = False,
max_layers: int = 20,
truncate_rgba_to_rgb: bool = True,
executor: Executor | None = None
) -> Layer
Creates a new layer called layer_name with mag mag from images.
images can be one of the following:
- a glob string
- a list of paths
- a pims.FramesSequence instance
Please see the pims docs for more information.
This method needs extra packages like tifffile or pylibczirw. Please install the respective extras, e.g. using python -m pip install "webknossos[all]".
Further Arguments:
- category: "color" by default, may be set to "segmentation"
- data_format: by default zarr3 files are written, may be set to "wkw" or "zarr" to write in these formats
- mag: magnification to use for the written data
- chunk_shape, chunks_per_shard, shard_shape, compress: adjust how the data is stored on disk
- topleft: set an offset in Mag(1) to start writing the data, only affecting the output
- swap_xy: set to True to interchange the x and y axes before writing to disk
- flip_x, flip_y, flip_z: set to True to reverse the respective axis before writing to disk
- dtype: the read image data will be converted to this dtype using numpy.ndarray.astype
- use_bioformats: set to True to only use the pims bioformats adapter directly (needs a JVM), set to False to forbid using the bioformats adapter; by default it is tried as a last option
- channel: may be used to select a single channel, if multiple are available
- timepoint: for timeseries, select a timepoint to use by specifying it as an int, starting from 0
- czi_channel: may be used to select a channel for .czi images, which differs from normal color channels
- batch_size: size to process the images (influences RAM consumption), must be a multiple of the chunk-size z-axis for uncompressed and the shard-size z-axis for compressed layers; default is the chunk-size or shard-size respectively
- allow_multiple_layers: set to True if timepoints or channels may result in multiple layers being added (only the first is returned)
- max_layers: only applies if allow_multiple_layers=True, limits the number of layers added via different channels or timepoints
- truncate_rgba_to_rgb: only applies if allow_multiple_layers=True, set to False to write four channels into layers instead of an RGB channel
- executor: pass a ClusterExecutor instance to parallelize the conversion jobs across the batches
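Examples:
A minimal sketch assuming a folder of 2D tiff slices and the tifffile extra installed (the glob string and layer name are illustrative):
ds = Dataset("path/to/dataset", voxel_size=(11.2, 11.2, 25))
layer = ds.add_layer_from_images(
    "images/*.tif",
    layer_name="color_from_tiffs",
    category="color",
)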
add_remote_layer
¶
add_remote_layer(
foreign_layer: (
str | PathLike | UPath | Layer | RemoteLayer
),
new_layer_name: str | None = None,
) -> Layer
Deprecated. Use Dataset.add_layer_as_ref
instead.
add_symlink_layer
¶
add_symlink_layer(
foreign_layer: str | PathLike | UPath | Layer,
new_layer_name: str | None = None,
*,
make_relative: bool = False
) -> Layer
Deprecated. Use Dataset.add_layer_as_ref
instead.
Create symbolic link to layer from another dataset.
Instead of copying data, creates a symbolic link to the original layer's data and copies only the layer metadata. Changes to the original layer's properties, e.g. bounding box, afterwards won't affect this dataset and vice-versa.
Parameters:
- foreign_layer (str | PathLike | UPath | Layer) – Layer to link to (path or Layer object)
- make_relative (bool, default: False) – Whether to create relative symlinks
- new_layer_name (str | None, default: None) – Optional name for the linked layer, uses original name if None
Returns:
- Layer – The newly created symbolic link layer
Raises:
- IndexError – If target layer name already exists
- AssertionError – If trying to create symlinks in/to remote datasets
- RuntimeError – If dataset is read-only
Examples:
other_ds = Dataset.open("other/dataset")
linked = ds.add_symlink_layer(
other_ds.get_layer("color"),
make_relative=True
)
Note
Only works with local file systems, cannot link remote datasets or create symlinks in remote datasets.
calculate_bounding_box
¶
calculate_bounding_box() -> NDBoundingBox
Calculate the enclosing bounding box of all layers.
Finds the smallest box that contains all data from all layers in the dataset.
Returns:
- NDBoundingBox – Bounding box containing all layer data
Examples:
bbox = ds.calculate_bounding_box()
print(f"Dataset spans {bbox.size} voxels")
print(f"Dataset starts at {bbox.topleft}")
compress
¶
compress(*, executor: Executor | None = None) -> None
Compress all uncompressed magnifications in-place.
Compresses the data of all magnification levels that aren't already compressed, for all layers in the dataset.
Parameters:
- executor (Executor | None, default: None) – Optional executor for parallel compression
Raises:
- RuntimeError – If dataset is read-only
Examples:
ds.compress()
Note
If data is already compressed, this will have no effect.
copy_dataset
¶
copy_dataset(
new_dataset_path: str | PathLike | UPath,
*,
voxel_size: tuple[float, float, float] | None = None,
chunk_shape: Vec3IntLike | int | None = None,
shard_shape: Vec3IntLike | int | None = None,
chunks_per_shard: Vec3IntLike | int | None = None,
data_format: str | DataFormat | None = None,
compress: bool | None = None,
exists_ok: bool = False,
executor: Executor | None = None,
voxel_size_with_unit: VoxelSize | None = None,
layers_to_ignore: Iterable[str] | None = None
) -> Dataset
Creates an independent copy of the dataset with all layers at a new location. Data storage parameters can be customized for the copied dataset.
Parameters:
- new_dataset_path (str | PathLike | UPath) – Path where new dataset should be created
- voxel_size (tuple[float, float, float] | None, default: None) – Optional tuple of floats (x, y, z) specifying voxel size in nanometers
- chunk_shape (Vec3IntLike | int | None, default: None) – Optional shape of chunks for data storage
- shard_shape (Vec3IntLike | int | None, default: None) – Optional shape of shards for data storage
- chunks_per_shard (Vec3IntLike | int | None, default: None) – Deprecated, use shard_shape. Optional number of chunks per shard
- data_format (str | DataFormat | None, default: None) – Optional format to store data ('wkw', 'zarr', 'zarr3')
- compress (bool | None, default: None) – Optional whether to compress data
- exists_ok (bool, default: False) – Whether to overwrite existing datasets and layers
- executor (Executor | None, default: None) – Optional executor for parallel copying
- voxel_size_with_unit (VoxelSize | None, default: None) – Optional voxel size specification with units
- layers_to_ignore (Iterable[str] | None, default: None) – List of layer names to exclude from the copy
Returns:
- Dataset – The newly created copy
Raises:
- AssertionError – If trying to copy WKW layers to remote dataset
Examples:
Basic copy:
copied = ds.copy_dataset("path/to/copy")
Copy with different storage:
copied = ds.copy_dataset(
"path/to/copy",
data_format="zarr",
compress=True
)
Note
WKW layers can only be copied to datasets on local file systems. For remote datasets, use data_format='zarr3'.
delete_layer
¶
delete_layer(layer_name: str) -> None
Delete a layer from the dataset.
Removes the layer's data and metadata from disk completely. This deletes both the datasource-properties.json entry and all data files for the layer.
Parameters:
- layer_name (str) – Name of layer to delete
Raises:
- IndexError – If no layer with the given name exists
- RuntimeError – If dataset is read-only
Examples:
ds.delete_layer("old_layer")
print("Remaining layers:", list(ds.layers))
download
classmethod
¶
download(
dataset_name_or_url: str,
*,
organization_id: str | None = None,
sharing_token: str | None = None,
webknossos_url: str | None = None,
bbox: BoundingBox | None = None,
layers: list[str] | str | None = None,
mags: list[Mag] | None = None,
path: PathLike | UPath | str | None = None,
exist_ok: bool = False
) -> Dataset
Downloads a dataset and returns the Dataset instance.
- dataset_name_or_url may be a dataset name or a full URL to a dataset view, e.g. https://webknossos.org/datasets/scalable_minds/l4_sample_dev/view. If a URL is used, organization_id, webknossos_url and sharing_token must not be set.
- organization_id may be supplied if a dataset name was used in the previous argument; it defaults to your current organization from the webknossos_context. You can find your organization_id here.
- sharing_token may be supplied if a dataset name was used and can specify a sharing token.
- webknossos_url may be supplied if a dataset name was used, and allows specifying in which webknossos instance to search for the dataset. It defaults to the url from your current webknossos_context, using https://webknossos.org as a fallback.
- bbox, layers, and mags specify which parts of the dataset to download. If nothing is specified, the whole image, all layers, and all mags are downloaded respectively.
- path and exist_ok specify where to save the downloaded dataset and whether to overwrite if the path exists.
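Examples:
A minimal sketch downloading a dataset by name (the dataset name, organization id, and target path are illustrative; an authenticated webknossos_context may be required):
ds = Dataset.download(
    "l4_sample",
    organization_id="scalable_minds",
    path="path/to/local_copy",
)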
downsample
¶
downsample(
*,
sampling_mode: SamplingModes = ANISOTROPIC,
coarsest_mag: Mag | None = None,
interpolation_mode: str = "default",
compress: bool | Zarr3Config = True,
executor: Executor | None = None
) -> None
Generate downsampled magnifications for all layers.
Creates lower resolution versions (coarser magnifications) of all layers that are not yet downsampled, up to the specified coarsest magnification.
Parameters:
- sampling_mode (SamplingModes, default: ANISOTROPIC) – Strategy for downsampling (e.g. ANISOTROPIC, MAX)
- coarsest_mag (Mag | None, default: None) – Optional maximum/coarsest magnification to generate
- interpolation_mode (str, default: 'default') – Interpolation method to use. Defaults to "default" (= "mode" for segmentation, "median" for color).
- compress (bool | Zarr3Config, default: True) – Whether to compress generated magnifications. For Zarr3 datasets, codec configuration and chunk key encoding may also be supplied. Defaults to True.
- executor (Executor | None, default: None) – Optional executor for parallel processing
Raises:
- RuntimeError – If dataset is read-only
Examples:
Basic downsampling:
ds.downsample()
With custom parameters:
ds.downsample(
sampling_mode=SamplingModes.ANISOTROPIC,
coarsest_mag=Mag(8),
)
Note
- ANISOTROPIC sampling creates anisotropic downsampling until dataset is isotropic
- Other modes like MAX, CONSTANT etc create regular downsampling patterns
- If magnifications already exist they will not be regenerated
from_images
classmethod
¶
from_images(
input_path: str | PathLike | UPath,
output_path: str | PathLike | UPath,
voxel_size: tuple[float, float, float] | None = None,
name: str | None = None,
*,
map_filepath_to_layer_name: (
ConversionLayerMapping | Callable[[UPath], str]
) = INSPECT_SINGLE_FILE,
z_slices_sort_key: Callable[
[UPath], Any
] = natsort_keygen(),
voxel_size_with_unit: VoxelSize | None = None,
layer_name: str | None = None,
layer_category: LayerCategoryType | None = None,
data_format: str | DataFormat = DEFAULT_DATA_FORMAT,
chunk_shape: Vec3IntLike | int | None = None,
shard_shape: Vec3IntLike | int | None = None,
chunks_per_shard: int | Vec3IntLike | None = None,
compress: bool = True,
swap_xy: bool = False,
flip_x: bool = False,
flip_y: bool = False,
flip_z: bool = False,
use_bioformats: bool | None = None,
max_layers: int = 20,
batch_size: int | None = None,
executor: Executor | None = None
) -> Dataset
This method imports image data in a folder or from a file as a webknossos dataset.
The image data can be 3D images (such as multipage tiffs) or stacks of 2D images. Multiple 3D images or image stacks are mapped to different layers based on the mapping strategy.
The exact mapping is handled by the argument map_filepath_to_layer_name
, which can be a pre-defined
strategy from the enum ConversionLayerMapping
, or a custom callable, taking
a path of an image file and returning the corresponding layer name. All
files belonging to the same layer name are then grouped. In case of
multiple files per layer, those are usually mapped to the z-dimension.
The order of the z-slices can be customized by setting
z_slices_sort_key
.
For more fine-grained control, please create an empty dataset and use add_layer_from_images
.
Parameters:
- input_path (str | PathLike | UPath) – Path to input image files
- output_path (str | PathLike | UPath) – Output path for created dataset
- voxel_size (tuple[float, float, float] | None, default: None) – Optional tuple of floats (x, y, z) for voxel size in nm
- name (str | None, default: None) – Optional name for dataset
- map_filepath_to_layer_name (ConversionLayerMapping | Callable[[UPath], str], default: INSPECT_SINGLE_FILE) – Strategy for mapping files to layers, either a ConversionLayerMapping enum value or a callable taking a Path and returning a str
- z_slices_sort_key (Callable[[UPath], Any], default: natsort_keygen()) – Optional key function for sorting z-slices
- voxel_size_with_unit (VoxelSize | None, default: None) – Optional voxel size with unit specification
- layer_name (str | None, default: None) – Optional name for layer(s)
- layer_category (LayerCategoryType | None, default: None) – Optional category override (LayerCategoryType.color / LayerCategoryType.segmentation)
- data_format (str | DataFormat, default: DEFAULT_DATA_FORMAT) – Format to store data in ('wkw', 'zarr', 'zarr3')
- chunk_shape (Vec3IntLike | int | None, default: None) – Optional shape of chunks to store data in
- shard_shape (Vec3IntLike | int | None, default: None) – Optional shape of shards to store data in
- chunks_per_shard (int | Vec3IntLike | None, default: None) – Deprecated, use shard_shape. Optional number of chunks per shard
- compress (bool, default: True) – Whether to compress the data
- swap_xy (bool, default: False) – Whether to swap x and y axes
- flip_x (bool, default: False) – Whether to flip the x axis
- flip_y (bool, default: False) – Whether to flip the y axis
- flip_z (bool, default: False) – Whether to flip the z axis
- use_bioformats (bool | None, default: None) – Whether to use bioformats for reading
- max_layers (int, default: 20) – Maximum number of layers to create
- batch_size (int | None, default: None) – Size of batches for processing
- executor (Executor | None, default: None) – Optional executor for parallelization
Returns:
- Dataset – The created dataset instance
Examples:
ds = Dataset.from_images("path/to/images/",
"path/to/dataset/",
voxel_size=(1, 1, 1))
Note
This method needs extra packages like tifffile or pylibczirw.
Install with pip install "webknossos[all]"
and pip install --extra-index-url https://pypi.scm.io/simple/ "webknossos[czi]"
.
fs_copy_dataset
¶
fs_copy_dataset(
new_dataset_path: str | PathLike | UPath,
*,
exists_ok: bool = False,
layers_to_ignore: Iterable[str] | None = None
) -> Dataset
Deprecated. File-based copy is automatically used by Dataset.copy_dataset
.
Creates an independent copy of the dataset with all layers at a new location.
This method copies the files of the dataset as is and, therefore, might be faster than Dataset.copy_dataset, which decodes and encodes all the data. If you wish to change the data storage parameters, use Dataset.copy_dataset.
Parameters:
- new_dataset_path (str | PathLike | UPath) – Path where new dataset should be created
- exists_ok (bool, default: False) – Whether to overwrite existing datasets and layers
- layers_to_ignore (Iterable[str] | None, default: None) – List of layer names to exclude from the copy
Returns:
- Dataset – The newly created copy
Raises:
- AssertionError – If trying to copy WKW layers to remote dataset
Examples:
Basic copy:
copied = ds.fs_copy_dataset("path/to/copy")
Note
WKW layers can only be copied to datasets on local file systems.
get_color_layers
¶
get_color_layers() -> list[LayerType]
Get all color layers in the dataset.
Provides access to all layers with category 'color'. Useful when a dataset contains multiple color layers.
Returns:
- list[LayerType] – List of all color layers in order
Examples:
Print all color layer names:
for layer in ds.get_color_layers():
    print(layer.name)
Note
If you need only a single color layer, consider using
get_layer()
with the specific layer name instead.
get_layer
¶
get_layer(layer_name: str) -> LayerType
Get a specific layer from this dataset.
Parameters:
- layer_name (str) – Name of the layer to retrieve
Returns:
- Layer – The requested layer object
Raises:
- IndexError – If no layer with the given name exists
Examples:
color_layer = ds.get_layer("color")
seg_layer = ds.get_layer("segmentation")
Note
Use layers
property to access all layers at once.
get_or_add_layer
¶
get_or_add_layer(
layer_name: str,
category: LayerCategoryType,
*,
dtype_per_layer: DTypeLike | None = None,
dtype_per_channel: DTypeLike | None = None,
num_channels: int | None = None,
data_format: str | DataFormat = DEFAULT_DATA_FORMAT,
**kwargs: Any
) -> Layer
Get an existing layer or create a new one.
Gets a layer with the given name if it exists, otherwise creates a new layer with the specified parameters.
Parameters:
- layer_name (str) – Name of the layer to get or create
- category (LayerCategoryType) – Layer category ('color' or 'segmentation')
- dtype_per_layer (DTypeLike | None, default: None) – Deprecated, use dtype_per_channel. Optional data type for entire layer
- dtype_per_channel (DTypeLike | None, default: None) – Optional data type per channel
- num_channels (int | None, default: None) – Optional number of channels
- data_format (str | DataFormat, default: DEFAULT_DATA_FORMAT) – Format to store data ('wkw', 'zarr', etc.)
- **kwargs (Any, default: {}) – Additional arguments passed to add_layer()
Returns:
- Layer – The existing or newly created layer
Raises:
- AssertionError – If existing layer's properties don't match specified parameters
- ValueError – If both dtype_per_layer and dtype_per_channel specified
- RuntimeError – If invalid category specified
Examples:
layer = ds.get_or_add_layer(
"segmentation",
LayerCategoryType.SEGMENTATION_CATEGORY,
dtype_per_channel=np.uint64,
)
Note
The dtype can be specified either per layer or per channel, but not both. For existing layers, the parameters are validated against the layer properties.
get_remote_datasets
staticmethod
¶
get_remote_datasets(
*,
organization_id: str | None = None,
tags: str | Sequence[str] | None = None,
name: str | None = None,
folder_id: RemoteFolder | str | None = None
) -> Mapping[str, RemoteDataset]
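Examples:
A brief sketch listing remote datasets visible to the current webknossos_context (the tag filter is illustrative):
remote_datasets = Dataset.get_remote_datasets(tags="published")
for ds_key, remote_ds in remote_datasets.items():
    print(ds_key, remote_ds.url)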
get_segmentation_layer
¶
get_segmentation_layer(
layer_name: str,
) -> SegmentationLayerType
Get a segmentation layer by name.
Parameters:
- layer_name (str) – Name of the layer to get
Returns:
- SegmentationLayer – The segmentation layer
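Examples:
A brief sketch (the layer name "segmentation" is illustrative):
seg_layer = ds.get_segmentation_layer("segmentation")
print(seg_layer.largest_segment_id)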
get_segmentation_layers
¶
get_segmentation_layers() -> list[SegmentationLayerType]
Get all segmentation layers in the dataset.
Provides access to all layers with category 'segmentation'. Useful when a dataset contains multiple segmentation layers.
Returns:
- list[SegmentationLayerType] – List of all segmentation layers in order
Examples:
Print all segmentation layer names:
for layer in ds.get_segmentation_layers():
    print(layer.name)
Note
If you need only a single segmentation layer, consider using
get_layer()
with the specific layer name instead.
open
classmethod
¶
open(
dataset_path: str | PathLike | UPath,
read_only: bool = False,
) -> Dataset
To open an existing dataset on disk, simply call Dataset.open("your_path")
.
This requires datasource-properties.json
to exist in this folder. Based on the datasource-properties.json
,
a dataset object is constructed. Only layers and magnifications that are listed in the properties are loaded
(even though there might exist more layers or magnifications on disk).
The dataset_path
refers to the top level directory of the dataset (excluding layer or magnification names).
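Examples:
A brief sketch (the path is illustrative):
ds = Dataset.open("path/to/dataset", read_only=True)
print(ds.voxel_size)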
open_remote
classmethod
¶
open_remote(
dataset_name_or_url: str | None = None,
organization_id: str | None = None,
sharing_token: str | None = None,
webknossos_url: str | None = None,
dataset_id: str | None = None,
annotation_id: str | None = None,
use_zarr_streaming: bool = True,
read_only: bool = False,
) -> RemoteDataset
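Examples:
A minimal sketch assuming a dataset name and organization id (both illustrative); an authenticated webknossos_context may be required for private datasets:
remote_ds = Dataset.open_remote(
    "l4_sample",
    organization_id="scalable_minds",
)
print(remote_ds.url)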
publish_to_preliminary_dataset
¶
publish_to_preliminary_dataset(
dataset_id: str,
path_prefix: str | None = None,
symlink_data_instead_of_copy: bool = False,
) -> None
Copies or symlinks the data to paths returned by WEBKNOSSOS. The dataset needs to be in status "uploading". The dataset already exists in WEBKNOSSOS but has no dataset_properties. With the dataset_properties, WEBKNOSSOS can reserve the paths.
Parameters:
- dataset_id – The dataset_id of the already existing dataset
- path_prefix – The prefix of the storage path, can be used to select one of the storage path options
- symlink_data_instead_of_copy – Set to true if the client has access to the same file system as the WEBKNOSSOS datastore
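Examples:
A hedged sketch; the dataset id is a placeholder, and the target dataset must already exist in WEBKNOSSOS in status "uploading":
ds.publish_to_preliminary_dataset("<dataset-id>")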
shallow_copy_dataset
¶
shallow_copy_dataset(
new_dataset_path: str | PathLike | UPath,
*,
name: str | None = None,
layers_to_ignore: Iterable[str] | None = None,
make_relative: bool | None = None
) -> Dataset
Create a new dataset that contains references to the layers, mags and attachments of another dataset.
Useful for creating alternative views or exposing datasets to WEBKNOSSOS.
Parameters:
- new_dataset_path (str | PathLike | UPath) – Path where new dataset should be created
- name (str | None, default: None) – Optional name for the new dataset, uses original name if None
- layers_to_ignore (Iterable[str] | None, default: None) – Optional iterable of layer names to exclude
- executor – Optional executor for copy operations
Returns:
- Dataset – The newly created dataset with linked layers
Raises:
- RuntimeError – If dataset is read-only
Examples:
Basic shallow copy:
linked = ds.shallow_copy_dataset("path/to/link")
With relative links excluding layers:
linked = ds.shallow_copy_dataset(
"path/to/link",
make_relative=True,
layers_to_ignore=["temp_layer"]
)
trigger_dataset_import
classmethod
¶
trigger_dataset_import(
directory_name: str,
organization: str,
token: str | None = None,
) -> None
Deprecated. Use Dataset.trigger_reload_in_datastore
instead.
trigger_reload_in_datastore
classmethod
¶
trigger_reload_in_datastore(
dataset_name_or_url: str | None = None,
organization_id: str | None = None,
webknossos_url: str | None = None,
dataset_id: str | None = None,
organization: str | None = None,
token: str | None = None,
datastore_url: str | None = None,
) -> None
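Examples:
A hedged sketch assuming a dataset name and organization id (both illustrative); appropriate permissions in the current webknossos_context are required:
Dataset.trigger_reload_in_datastore("my_dataset", "my_org_id")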
upload
¶
upload(
new_dataset_name: str | None = None,
initial_team_ids: list[str] | None = None,
folder_id: str | RemoteFolder | None = None,
require_unique_name: bool = False,
layers_to_link: (
list[LayerToLink | RemoteLayer] | None
) = None,
upload_directly_to_common_storage: bool = False,
jobs: int | None = None,
common_storage_path_prefix: str | None = None,
symlink_data_instead_of_copy: bool = False,
) -> RemoteDataset
Upload this dataset to webknossos.
Creates database entries and sets access rights on the webknossos instance before the actual data upload. The client then copies the data directly to the returned paths.
Parameters:
- new_dataset_name (str | None, default: None) – Name for the new dataset, defaults to the current name
- initial_team_ids (list[str] | None, default: None) – Optional list of team IDs to grant initial access
- folder_id (str | RemoteFolder | None, default: None) – Optional ID of folder where dataset should be placed
- require_unique_name (bool, default: False) – Whether to make request fail in case a dataset with the name already exists
- layers_to_link (list[LayerToLink | RemoteLayer] | None, default: None) – Optional list of LayerToLink to link already published layers to the dataset
- upload_directly_to_common_storage (bool, default: False) – Set this to true when the client has access to the same storage system as the WEBKNOSSOS datastore (file system or cloud storage)
- jobs (int | None, default: None) – Optional number of jobs to use for uploading the data
- common_storage_path_prefix (str | None, default: None) – Optional path prefix used when upload_directly_to_common_storage is true to select one of the available mount points for the dataset folder
- symlink_data_instead_of_copy (bool, default: False) – Considered when upload_directly_to_common_storage is True. Set this to true when the client has access to the same file system as the WEBKNOSSOS datastore
Returns:
- RemoteDataset – Reference to the newly created remote dataset
Note
upload_directly_to_common_storage is typically only used by administrators with direct file system or S3 access to the WEBKNOSSOS datastore. Most users should leave upload_directly_to_common_storage at its default of False.
Examples:
remote_ds = ds.upload(
"my_dataset",
["team_a", "team_b"],
"folder_123"
)
print(remote_ds.url)
link = LayerToLink.from_remote_layer(existing_layer)
remote_ds = ds.upload(layers_to_link=[link])
write_layer
¶
write_layer(
layer_name: str,
category: LayerCategoryType,
data: ndarray,
*,
data_format: str | DataFormat = DEFAULT_DATA_FORMAT,
downsample: bool = True,
chunk_shape: Vec3IntLike | int | None = None,
shard_shape: Vec3IntLike | int | None = None,
chunks_per_shard: Vec3IntLike | int | None = None,
axes: Iterable[str] | None = None,
absolute_offset: Vec3IntLike | VecIntLike | None = None,
mag: MagLike = Mag(1)
) -> Layer
Write a numpy array to a new layer and downsample.
Parameters:
- layer_name (str) – Name of the new layer.
- category (LayerCategoryType) – Category of the new layer.
- data (ndarray) – The data to write.
- data_format (str | DataFormat, default: DEFAULT_DATA_FORMAT) – Format to store the data. Defaults to zarr3.
- downsample (bool, default: True) – Whether to downsample the data. Defaults to True.
- chunk_shape (Vec3IntLike | int | None, default: None) – Shape of chunks for storage. Recommended (32, 32, 32) or (64, 64, 64). Defaults to (32, 32, 32).
- shard_shape (Vec3IntLike | int | None, default: None) – Shape of shards for storage. Must be a multiple of chunk_shape. If specified, chunks_per_shard must not be specified. Defaults to (1024, 1024, 1024).
- chunks_per_shard (Vec3IntLike | int | None, default: None) – Deprecated, use shard_shape. Number of chunks per shard. If specified, shard_shape must not be specified.
- axes (Iterable[str] | None, default: None) – The axes of the data for non-3D data.
- absolute_offset (Vec3IntLike | VecIntLike | None, default: None) – The offset of the data. Specified in Mag 1.
- mag (MagLike, default: Mag(1)) – Magnification to write the data at.
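Examples:
A minimal sketch writing random single-channel data; the layer name is illustrative and the array is assumed to be laid out as (x, y, z):
import numpy as np

data = np.random.randint(0, 255, size=(128, 128, 64), dtype=np.uint8)
layer = ds.write_layer(
    "my_written_layer",
    "color",
    data,
)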