copy-dataset
The copy-dataset command creates a copy of a WEBKNOSSOS dataset. It works with local paths as well as remote paths (e.g., HTTP or S3), and provides options to customize the data format, chunking, sharding, and parallel execution of the copy.
Usage
webknossos copy-dataset [OPTIONS] SOURCE TARGET
Arguments
- SOURCE: Path to the source WEBKNOSSOS dataset.
  Example: /path/to/source/dataset or s3://bucket-name/source-dataset.
- TARGET: Path to the target WEBKNOSSOS dataset.
  Example: /path/to/target/dataset or s3://bucket-name/target-dataset.
Options
- --data-format: Data format in which to store the target dataset.
  Options: wkw, zarr, zarr3
  Example: --data-format zarr3
- --chunk-shape: Shape of each chunk in the target dataset, in voxels per dimension.
  Example: --chunk-shape 32,32,32
- --shard-shape: Shape of each shard in the target dataset, in voxels per dimension.
  Example: --shard-shape 1024,1024,1024
- --exists-ok: Allow overwriting an existing dataset.
  Default: False
- --jobs: Number of processes to spawn for parallel execution.
  Default: number of CPU cores.
- --distribution-strategy: Strategy used to distribute the work across CPUs or cluster nodes.
  Options: multiprocessing, slurm, kubernetes, sequential
  Default: multiprocessing
- --job-resources: Resources to request for each job when using the SLURM distribution strategy.
  Example: --job-resources '{"mem": "10M"}'
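In both WKW and Zarr v3, a shard is made up of whole chunks, so each shard dimension is expected to be a multiple of the corresponding chunk dimension; we assume copy-dataset rejects other combinations. The sketch below is a hypothetical Bash helper, shape_divides (not part of the webknossos CLI), that checks a --chunk-shape/--shard-shape pair up front and then prints the resulting command as a dry run instead of executing it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the webknossos CLI): succeeds only if both
# shapes have the same number of dimensions and every shard dimension is a
# positive whole multiple of the matching chunk dimension.
shape_divides() {
  local chunk="$1" shard="$2" i
  local -a c s
  IFS=',' read -r -a c <<< "$chunk"
  IFS=',' read -r -a s <<< "$shard"
  # Require at least one dimension and matching dimensionality.
  [ "${#c[@]}" -gt 0 ] && [ "${#c[@]}" -eq "${#s[@]}" ] || return 1
  for i in "${!c[@]}"; do
    (( c[i] > 0 && s[i] > 0 )) || return 1
    (( s[i] % c[i] == 0 )) || return 1
  done
  return 0
}

CHUNK=32,32,32
SHARD=1024,1024,1024

# Dry run: print the command instead of executing it (paths are placeholders).
if shape_divides "$CHUNK" "$SHARD"; then
  echo webknossos copy-dataset --data-format zarr3 \
    --chunk-shape "$CHUNK" --shard-shape "$SHARD" \
    /path/to/source/dataset /path/to/target/dataset
else
  echo "shard shape $SHARD is not a multiple of chunk shape $CHUNK" >&2
fi
```

With the values from the option examples above, each shard holds 32 × 32 × 32 chunks.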
Environment Variables for Remote Paths
When using remote paths (e.g., HTTP or S3), set the following environment variables:
HTTP Basic Authentication:
- HTTP_BASIC_USER
- HTTP_BASIC_PASSWORD
S3 Configuration:
- S3_ENDPOINT_URL
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
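Checking the environment before launching a long-running copy can save a failed run. The following hypothetical Bash function, missing_remote_env (not part of the webknossos CLI), maps a path's scheme to the variables listed above and prints any that are not exported to child processes. It assumes that S3 paths always need AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY and that HTTP(S) paths always use basic authentication; S3_ENDPOINT_URL is left out on the assumption that it is only needed for S3-compatible storage other than AWS.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the webknossos CLI): prints the
# name of every credential variable the given path needs but that is not
# exported to child processes.
missing_remote_env() {
  local path="$1" var
  local -a needed=()
  case "$path" in
    s3://*) needed=(AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY) ;;
    # Assumption: the HTTP server requires basic authentication.
    http://*|https://*) needed=(HTTP_BASIC_USER HTTP_BASIC_PASSWORD) ;;
    *) return 0 ;;  # local path: nothing to check
  esac
  for var in "${needed[@]}"; do
    # printenv exits non-zero if the variable is not in the environment.
    printenv "$var" >/dev/null || echo "$var"
  done
}

# Example: check both paths of a local-to-S3 copy before running it.
for p in data/source s3://webknossos-bucket/target; do
  missing_remote_env "$p"
done
```

Using printenv rather than a plain shell-variable test means a variable that is set but not exported is reported as missing, since the CLI would not see it either.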
Example Commands
Copy a dataset locally:
webknossos copy-dataset /path/to/source/dataset /path/to/target/dataset
Copy a local dataset to S3 storage, converting it to zarr3 with 4 parallel jobs:
AWS_ACCESS_KEY_ID=XXX AWS_SECRET_ACCESS_KEY=XXX \
webknossos copy-dataset \
--data-format zarr3 \
--jobs 4 \
data/source s3://webknossos-bucket/target
Copy a dataset with parallel execution:
webknossos copy-dataset --jobs 4 /path/to/source/dataset /path/to/target/dataset
Copy a dataset using SLURM with custom job resources:
webknossos copy-dataset --distribution-strategy slurm --job-resources '{"mem": "10M"}' /path/to/source/dataset /path/to/target/dataset
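The SLURM example passes --job-resources as a JSON object in single quotes. Assuming the CLI parses this value as JSON (as the example suggests), a quick local check catches quoting mistakes before any jobs are submitted. The sketch below uses a hypothetical helper, valid_job_resources, and assumes python3 is on the PATH; it prints the command as a dry run rather than executing it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the webknossos CLI): succeeds only if the
# argument parses as a JSON object. Requires python3; if python3 is missing,
# the check fails (exit status 127) and the value is treated as invalid.
valid_job_resources() {
  python3 -c '
import json, sys
try:
    value = json.loads(sys.argv[1])
except ValueError:
    sys.exit(1)
sys.exit(0 if isinstance(value, dict) else 1)
' "$1"
}

RESOURCES='{"mem": "10M"}'

# Dry run: print the SLURM command with the JSON re-quoted for copy-pasting.
if valid_job_resources "$RESOURCES"; then
  printf "webknossos copy-dataset --distribution-strategy slurm --job-resources '%s' %s %s\n" \
    "$RESOURCES" /path/to/source/dataset /path/to/target/dataset
else
  echo "invalid --job-resources value: $RESOURCES" >&2
fi
```

A value such as {mem: 10M} (unquoted keys) or a JSON list is rejected by this check.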
Notes
- Ensure that the source path is readable and the target path is writable.
- Use the --exists-ok option to overwrite an existing target dataset if necessary.
- For remote paths, make sure the required environment variables are set.
- The command is designed for large datasets: the copy can be parallelized with --jobs and --distribution-strategy, and the storage layout can be tuned with --data-format, --chunk-shape, and --shard-shape.