Dataset Usage
The high-level dataset API lets you interact with datasets while automatically maintaining their metadata, such as the datasource-properties.json. The Dataset class is the entry point for this API. A dataset stores its data on disk in .wkw files. Each dataset consists of one or more layers, which in turn can comprise multiple magnifications, represented by MagViews.
import numpy as np
import webknossos as wk
# ruff: noqa: F841 unused-variable
def main() -> None:
    #####################
    # Opening a dataset #
    #####################
    dataset = wk.Dataset.open("testdata/simple_wkw_dataset")
    # Assuming that the dataset has a layer "color"
    # and the layer has the magnification 1
    layer = dataset.get_layer("color")
    mag1 = layer.get_mag("1")
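    # The hierarchy can also be inspected programmatically. This is a
    # sketch: dataset.layers and layer.mags are assumed here to be
    # dict-like mappings (layer name -> Layer, mag -> MagView); check the
    # API reference for details.
    for layer_name, each_layer in dataset.layers.items():
        print(layer_name, list(each_layer.mags.keys()))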
    ######################
    # Creating a dataset #
    ######################
    dataset = wk.Dataset("testoutput/my_new_dataset", voxel_size=(1, 1, 1))
    layer = dataset.add_layer(
        layer_name="color",
        category="color",
        dtype_per_channel="uint8",
        num_channels=3,
        bounding_box=wk.BoundingBox((10, 20, 30), (512, 512, 32)),
    )
    mag1 = layer.add_mag("1")
    mag2 = layer.add_mag("2")
    ##########################
    # Writing into a dataset #
    ##########################
    # The properties are updated automatically
    # when the written data exceeds the bounding box in the properties
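    # allow_unaligned=True is used below because the offset (10, 20, 30)
    # is not aligned with shard borders; the library is then assumed to
    # perform the necessary read-modify-write, which is fine here because
    # only a single process is writing.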
    mag1.write(
        absolute_offset=(10, 20, 30),
        # assuming the layer has 3 channels:
        data=(np.random.rand(3, 512, 512, 32) * 255).astype(np.uint8),
        allow_unaligned=True,
    )
    mag2.write(
        absolute_offset=(10, 20, 30),
        data=(np.random.rand(3, 256, 256, 16) * 255).astype(np.uint8),
        allow_unaligned=True,
    )
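    # Coarser magnifications can typically also be derived from Mag 1 via
    # the layer's downsampling helper instead of being written by hand.
    # Sketch only, not run here because mag 2 was already written above;
    # check the exact signature of Layer.downsample() in the API reference:
    # layer.downsample(coarsest_mag=wk.Mag("2"))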
    ##########################
    # Reading from a dataset #
    ##########################
    data_in_mag1 = mag1.read()  # the offset and size from the properties are used
    data_in_mag1_subset = mag1.read(absolute_offset=(10, 20, 30), size=(512, 512, 32))
    data_in_mag2 = mag2.read()
    data_in_mag2_subset = mag2.read(absolute_offset=(10, 20, 30), size=(512, 512, 32))
    assert data_in_mag2_subset.shape == (3, 256, 256, 16)
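    # read() returns channel-first arrays of shape (channels, x, y, z);
    # mag1.read() above therefore covers the full bounding box from the
    # properties:
    assert data_in_mag1.shape == (3, 512, 512, 32)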
    #####################
    # Copying a dataset #
    #####################
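    # chunk_shape is the smallest unit that is read and written (and, for
    # compressed data, the unit of compression); shard_shape controls how
    # many chunks are grouped into one file on disk. The values below are
    # illustrative.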
    copy_of_dataset = dataset.copy_dataset(
        "testoutput/copy_of_dataset",
        chunk_shape=(32, 32, 32),
        shard_shape=(64, 64, 64),
        compress=True,
    )
    new_layer = dataset.add_layer(
        layer_name="segmentation",
        category="segmentation",
        dtype_per_channel="uint8",
        largest_segment_id=0,
    )
    # Link a layer of the initial dataset to the copy:
    sym_layer = copy_of_dataset.add_symlink_layer(new_layer)
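    # add_symlink_layer references the original data instead of copying it.
    # If an actual copy is wanted, Dataset.add_copy_layer is the usual
    # alternative (sketch only; new_layer_name is used because the name
    # "segmentation" is already taken by the symlinked layer):
    # copied_layer = copy_of_dataset.add_copy_layer(
    #     new_layer, new_layer_name="segmentation_copy"
    # )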
if __name__ == "__main__":
    main()
Parallel Access of WEBKNOSSOS Datasets
Please consider these restrictions when accessing a WEBKNOSSOS dataset in a multiprocessing context:
- When writing shards in parallel, json_update_allowed should be set to False to disable the automatic update of the bounding box metadata. Otherwise, race conditions may happen. The user is responsible for updating the bounding box manually (see the sketch after this list).
- When writing to chunks in shards, one chunk may only be written to by one actor at any time.
- When writing to compressed shards, one shard may only be written to by one actor at any time.
- For Zarr datasets, parallel write access to shards is not allowed at all.
- Reading in parallel without concurrent writes is fine.
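Below is a minimal sketch of the first pattern, assuming WKW storage and the json_update_allowed parameter of write() referred to above; the path, chunk and shard shapes, and offsets are illustrative only. The bounding box is maintained manually by setting it once, before the parallel writes:

import multiprocessing

import numpy as np
import webknossos as wk

DATASET_PATH = "testoutput/parallel_dataset"  # hypothetical path

def write_block(offset: tuple[int, int, int]) -> None:
    # Each worker opens the dataset itself and writes one shard-aligned
    # block. json_update_allowed=False prevents the worker from touching
    # datasource-properties.json, avoiding race conditions on the metadata.
    mag1 = wk.Dataset.open(DATASET_PATH).get_layer("color").get_mag("1")
    data = (np.random.rand(3, 64, 64, 64) * 255).astype(np.uint8)
    mag1.write(absolute_offset=offset, data=data, json_update_allowed=False)

def main() -> None:
    dataset = wk.Dataset(DATASET_PATH, voxel_size=(1, 1, 1))
    layer = dataset.add_layer(
        layer_name="color",
        category="color",
        dtype_per_channel="uint8",
        num_channels=3,
        # The bounding box is set up front so no per-write update is needed.
        bounding_box=wk.BoundingBox((0, 0, 0), (128, 128, 64)),
    )
    # Small shards, chosen so each 64x64x64 block below is exactly one shard.
    layer.add_mag("1", chunk_shape=(32, 32, 32), shard_shape=(64, 64, 64))
    offsets = [(0, 0, 0), (64, 0, 0), (0, 64, 0), (64, 64, 0)]
    with multiprocessing.Pool(4) as pool:
        pool.map(write_block, offsets)

if __name__ == "__main__":
    main()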
 