
AgglomerateAttachment Specification

Current version: 4

An AgglomerateAttachment stores the agglomeration graph for a segmentation layer. It maps every segment to an agglomerate and stores, for each agglomerate, its constituent segments, the edges between them, the affinity scores of those edges, and a representative voxel position per segment. A segmentation layer usually has several agglomerate attachments, each representing a different degree of agglomeration.

File Format

The artifact is stored as a Zarr v3 hierarchy on disk.

Directory Structure

{attachment_name}/
  zarr.json                            # group metadata (version, class)
  segment_to_agglomerate/              # array
  agglomerate_to_segments_offsets/     # array
  agglomerate_to_segments/             # array
  agglomerate_to_edges_offsets/        # array
  agglomerate_to_edges/                # array
  agglomerate_to_affinities/           # array
  agglomerate_to_positions/            # array

The attachment name is arbitrary. When computed with Voxelytics it is usually agglomerate_view_{mapping_id}, where mapping_id is either an integer (commonly a percentile of the agglomeration score) or a string identifier. The directory is referenced from the layer's datasource-properties.json via AttachmentsProperties.agglomerates.


Group Metadata (zarr.json)

The group zarr.json contains the standard Zarr fields plus the artifact metadata, which is stored under the attributes.voxelytics key:

| Key | Value |
| --- | --- |
| zarr_format | 3 |
| node_type | "group" |
| attributes.voxelytics.artifact_schema_version | 4 |
| attributes.voxelytics.artifact_class | "AgglomerateViewArtifact" |

Example:

{
  "zarr_format": 3,
  "node_type": "group",
  "attributes": {
    "voxelytics": {
      "artifact_schema_version": 4,
      "artifact_class": "AgglomerateViewArtifact"
    }
  }
}
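A reader may want to confirm these fields before touching any array. The following is a minimal sketch using only the standard library; the function name `check_group_metadata` and the constants are hypothetical and not part of the specification.

```python
import json

# Values taken from the table above.
EXPECTED_CLASS = "AgglomerateViewArtifact"
EXPECTED_VERSION = 4


def check_group_metadata(raw: str) -> dict:
    """Parse a group zarr.json document and return its voxelytics attributes.

    Raises ValueError if any of the fields listed above has an unexpected value.
    """
    meta = json.loads(raw)
    if meta.get("zarr_format") != 3:
        raise ValueError("expected zarr_format 3")
    if meta.get("node_type") != "group":
        raise ValueError("expected node_type 'group'")
    voxelytics = meta.get("attributes", {}).get("voxelytics", {})
    if voxelytics.get("artifact_class") != EXPECTED_CLASS:
        raise ValueError("unexpected artifact_class")
    if voxelytics.get("artifact_schema_version") != EXPECTED_VERSION:
        raise ValueError("unexpected artifact_schema_version")
    return voxelytics


example = """
{
  "zarr_format": 3,
  "node_type": "group",
  "attributes": {
    "voxelytics": {
      "artifact_schema_version": 4,
      "artifact_class": "AgglomerateViewArtifact"
    }
  }
}
"""
attrs = check_group_metadata(example)
```

In practice the string would be read from `{attachment_name}/zarr.json`; whether a reader should reject or merely warn on other schema versions is not stated here and is left open in this sketch.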

Notation

Let:

  • n_segments = total number of segments (segment IDs are 1-based and dense: every integer from 1 to n_segments must be present; segment 0 is the background)
  • n_agglomerates = number of real agglomerates (agglomerate 0 is reserved and always empty)
  • n_edges = total number of edges across all agglomerates

The segmentation_dtype must be uint32 or uint64 and must match the dtype of the corresponding segmentation layer.


Arrays

segment_to_agglomerate

| Property | Value |
| --- | --- |
| Shape | (n_segments + 1,) |
| Dtype | uint64 |

Maps each segment ID to its agglomerate ID. Index 0 is the background segment and maps to agglomerate 0.

Example: [0, 1, 1, 1, 1, 2, 2, 1] — segments 1–4 and 7 belong to agglomerate 1, segments 5–6 belong to agglomerate 2.


agglomerate_to_segments_offsets

| Property | Value |
| --- | --- |
| Shape | (n_agglomerates + 2,) |
| Dtype | uint64 |

CSR-style offset array into agglomerate_to_segments. The segments belonging to agglomerate i are at indices [offsets[i], offsets[i+1]) in agglomerate_to_segments.

Agglomerate 0 is always empty: offsets[0] == offsets[1] == 0. The last entry equals n_segments.

Example: [0, 0, 5, 7] — agglomerate 0 is empty, agglomerate 1 has 5 segments (indices 0–4), agglomerate 2 has 2 segments (indices 5–6).


agglomerate_to_segments

| Property | Value |
| --- | --- |
| Shape | (n_segments,) |
| Dtype | segmentation_dtype |

All segment IDs, grouped by agglomerate. The segments for agglomerate i occupy agglomerate_to_segments[offsets[i]:offsets[i+1]] and are sorted in ascending order within each agglomerate.

Example: [1, 2, 3, 4, 7, 5, 6] — agglomerate 1 contains segments {1, 2, 3, 4, 7} and agglomerate 2 contains {5, 6}.
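The offset lookup described above can be sketched with plain Python lists standing in for the Zarr arrays. The helper name `segments_of` is hypothetical; the data is the example from this section.

```python
# Example data from this section; plain lists stand in for the Zarr arrays.
agglomerate_to_segments_offsets = [0, 0, 5, 7]
agglomerate_to_segments = [1, 2, 3, 4, 7, 5, 6]


def segments_of(agglomerate_id: int) -> list:
    """Return the segment IDs of one agglomerate via its CSR-style offsets."""
    start = agglomerate_to_segments_offsets[agglomerate_id]
    end = agglomerate_to_segments_offsets[agglomerate_id + 1]
    return agglomerate_to_segments[start:end]


print(segments_of(0))  # [] (agglomerate 0 is always empty)
print(segments_of(1))  # [1, 2, 3, 4, 7]
print(segments_of(2))  # [5, 6]
```

Because only two offset entries are read per lookup, a reader can fetch a single agglomerate without loading the whole segment list.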


agglomerate_to_edges_offsets

| Property | Value |
| --- | --- |
| Shape | (n_agglomerates + 2,) |
| Dtype | uint64 |

CSR-style offset array into agglomerate_to_edges and agglomerate_to_affinities. The edges for agglomerate i are at indices [offsets[i], offsets[i+1]). The last entry equals n_edges.

Agglomerate 0 is always empty: offsets[0] == offsets[1] == 0.

Example: [0, 0, 4, 5] — agglomerate 1 has 4 edges, agglomerate 2 has 1 edge.


agglomerate_to_edges

| Property | Value |
| --- | --- |
| Shape | (n_edges, 2) |
| Dtype | segmentation_dtype |

All edges, grouped by agglomerate. Values are zero-based local node indices within each agglomerate (i.e. positions within the agglomerate's slice of agglomerate_to_segments).

Edges (n1, n2) obey two ordering rules:

  • Within an edge, n1 < n2.
  • Within each agglomerate, edges are sorted lexicographically: first by n1, then by n2.

Example: [[0,1], [0,4], [1,2], [2,3], [0,1]]

To convert local indices to global segment IDs, index into agglomerate_to_segments using the agglomerate's offset from agglomerate_to_segments_offsets.
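As a sketch of that conversion, again with plain lists in place of the Zarr arrays and a hypothetical helper name `global_edges`:

```python
# Example data from this specification; plain lists stand in for the Zarr arrays.
agglomerate_to_segments_offsets = [0, 0, 5, 7]
agglomerate_to_segments = [1, 2, 3, 4, 7, 5, 6]
agglomerate_to_edges_offsets = [0, 0, 4, 5]
agglomerate_to_edges = [[0, 1], [0, 4], [1, 2], [2, 3], [0, 1]]


def global_edges(agglomerate_id: int) -> list:
    """Return the edges of one agglomerate as pairs of global segment IDs."""
    # Local node indices are relative to the start of the agglomerate's segment slice.
    seg_start = agglomerate_to_segments_offsets[agglomerate_id]
    edge_start = agglomerate_to_edges_offsets[agglomerate_id]
    edge_end = agglomerate_to_edges_offsets[agglomerate_id + 1]
    return [
        (agglomerate_to_segments[seg_start + n1], agglomerate_to_segments[seg_start + n2])
        for n1, n2 in agglomerate_to_edges[edge_start:edge_end]
    ]


print(global_edges(1))  # [(1, 2), (1, 7), (2, 3), (3, 4)]
print(global_edges(2))  # [(5, 6)]
```

Note that the local edge [0, 1] appears in both agglomerates and resolves to different segment pairs, which is why the segment offset must always be applied.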


agglomerate_to_affinities

| Property | Value |
| --- | --- |
| Shape | (n_edges,) |
| Dtype | float32 |

Affinity score for each edge, co-indexed with agglomerate_to_edges. Higher values indicate stronger evidence for merging.

Example: [124.0, 65.5, 0.0, 250.5, 80.0]


agglomerate_to_positions

| Property | Value |
| --- | --- |
| Shape | (n_segments, 3) |
| Dtype | int32 |

Representative voxel position (x, y, z) for each segment, co-indexed with agglomerate_to_segments. The positions for agglomerate i occupy agglomerate_to_positions[offsets[i]:offsets[i+1]] where offsets is agglomerate_to_segments_offsets.
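One way to look up the position of a single segment is to find its agglomerate, then binary-search the agglomerate's sorted segment slice for the row index. The sketch below assumes plain lists in place of the Zarr arrays; the helper name `position_of` is hypothetical and the coordinates are made-up placeholders, not real data.

```python
from bisect import bisect_left

# Example arrays from this specification.
segment_to_agglomerate = [0, 1, 1, 1, 1, 2, 2, 1]
agglomerate_to_segments_offsets = [0, 0, 5, 7]
agglomerate_to_segments = [1, 2, 3, 4, 7, 5, 6]
# Placeholder coordinates (illustrative only), one row per entry of agglomerate_to_segments.
agglomerate_to_positions = [
    [10, 11, 12], [20, 21, 22], [30, 31, 32], [40, 41, 42],
    [70, 71, 72], [50, 51, 52], [60, 61, 62],
]


def position_of(segment_id: int) -> tuple:
    """Return the (x, y, z) position stored for a segment."""
    agglomerate_id = segment_to_agglomerate[segment_id]
    start = agglomerate_to_segments_offsets[agglomerate_id]
    end = agglomerate_to_segments_offsets[agglomerate_id + 1]
    # Segments are sorted within each agglomerate, so a binary search finds the row.
    row = bisect_left(agglomerate_to_segments, segment_id, start, end)
    if row == end or agglomerate_to_segments[row] != segment_id:
        raise KeyError(segment_id)  # e.g. the background segment 0
    return tuple(agglomerate_to_positions[row])


print(position_of(7))  # (70, 71, 72)
print(position_of(5))  # (50, 51, 52)
```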


All arrays are written with Zarr v3 sharding (sharding_indexed codec). The recommended chunk and shard sizes are derived from the array's shape and dtype to approximate the targets below.

| Array | Target chunk size | Target shard size |
| --- | --- | --- |
| segment_to_agglomerate | 256 KB | 1 GB |
| agglomerate_to_segments | 256 KB | 1 GB |
| agglomerate_to_segments_offsets | 64 KB | 256 MB |
| agglomerate_to_edges_offsets | 64 KB | 256 MB |
| agglomerate_to_edges | 256 KB | 1 GB |
| agglomerate_to_affinities | 256 KB | 1 GB |
| agglomerate_to_positions | 256 KB | 1 GB |

The first axis is used as the "row" axis for size calculations; all remaining axes are kept whole in every chunk and shard. The shard shape is always rounded up to the nearest multiple of the chunk shape.
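The exact derivation is not spelled out here, so the following is only one plausible sketch of it. It assumes binary units (1 KB = 1024 bytes), rounds the chunk row count down to fit the target, and does not clamp the shard to the array length; the function name `chunk_and_shard_shape` is hypothetical.

```python
import math

# Assumption: binary units. The specification does not say whether KB means 1000 or 1024 bytes.
KB = 1024
MB = 1024 ** 2
GB = 1024 ** 3


def chunk_and_shard_shape(shape, itemsize, target_chunk_bytes, target_shard_bytes):
    """Derive (chunk_shape, shard_shape) from byte targets along the first axis."""
    # All axes after the first are kept whole, so one "row" spans them entirely.
    row_bytes = itemsize * math.prod(shape[1:])
    chunk_rows = max(1, target_chunk_bytes // row_bytes)
    shard_rows = max(chunk_rows, target_shard_bytes // row_bytes)
    # Round the shard up to the nearest multiple of the chunk row count.
    shard_rows = (shard_rows + chunk_rows - 1) // chunk_rows * chunk_rows
    tail = tuple(shape[1:])
    return (chunk_rows, *tail), (shard_rows, *tail)


# agglomerate_to_edges with a uint32 segmentation: 2 values x 4 bytes = 8 bytes per row.
chunk, shard = chunk_and_shard_shape((1_000_000, 2), 4, 256 * KB, 1 * GB)
print(chunk)  # (32768, 2)
print(shard)  # (134217728, 2)
```

For rows whose byte size does not divide the targets evenly, such as the 12-byte rows of agglomerate_to_positions, the rounding step is what keeps the shard shape a whole multiple of the chunk shape.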

Codec stack (inner chunks): bytes (little-endian) → zstd (level 5, checksum enabled)

Shard index codecs: bytes (little-endian) → crc32c


Invariants

  • Segment IDs are 1-based and dense: every integer from 1 to n_segments must appear as a node.
  • Agglomerate 0 is always empty: no segments and no edges.
  • Segment IDs within each agglomerate are sorted in ascending order.
  • For each edge (n1, n2): n1 < n2.
  • Edges within each agglomerate are sorted lexicographically (n1, n2).
  • Edge node indices are zero-based local indices into the agglomerate's segment list.
  • agglomerate_to_positions is co-indexed with agglomerate_to_segments.
  • All offset arrays have shape (n_agglomerates + 2,); the last entry equals the total element count of the corresponding data array.
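These invariants can be checked mechanically. The sketch below works on plain Python lists (a real reader would load them from the Zarr arrays) and raises on the first violation; the function name `check_invariants` and its parameter names are hypothetical.

```python
def check_invariants(seg_to_agg, seg_offsets, segs, edge_offsets, edges, affinities, positions):
    """Raise ValueError on the first violated invariant; return None otherwise."""

    def require(condition, message):
        if not condition:
            raise ValueError(message)

    n_segments = len(seg_to_agg) - 1
    n_agglomerates = len(seg_offsets) - 2

    require(len(edge_offsets) == n_agglomerates + 2, "offset arrays differ in length")
    require(seg_offsets[0] == seg_offsets[1] == 0, "agglomerate 0 has segments")
    require(edge_offsets[0] == edge_offsets[1] == 0, "agglomerate 0 has edges")
    require(seg_offsets[-1] == len(segs) == n_segments, "segment count mismatch")
    require(len(positions) == len(segs), "positions not co-indexed with segments")
    require(edge_offsets[-1] == len(edges) == len(affinities), "edge count mismatch")
    require(sorted(segs) == list(range(1, n_segments + 1)), "segment IDs are not dense")

    for agg in range(1, n_agglomerates + 1):
        members = segs[seg_offsets[agg]:seg_offsets[agg + 1]]
        require(members == sorted(members), f"segments of agglomerate {agg} not sorted")
        agg_edges = [tuple(e) for e in edges[edge_offsets[agg]:edge_offsets[agg + 1]]]
        require(agg_edges == sorted(agg_edges), f"edges of agglomerate {agg} not sorted")
        # Each edge must satisfy n1 < n2 and stay inside the agglomerate's segment list.
        require(
            all(0 <= n1 < n2 < len(members) for n1, n2 in agg_edges),
            f"invalid local edge index in agglomerate {agg}",
        )


# The example arrays from this specification pass the check (positions are placeholders).
check_invariants(
    seg_to_agg=[0, 1, 1, 1, 1, 2, 2, 1],
    seg_offsets=[0, 0, 5, 7],
    segs=[1, 2, 3, 4, 7, 5, 6],
    edge_offsets=[0, 0, 4, 5],
    edges=[[0, 1], [0, 4], [1, 2], [2, 3], [0, 1]],
    affinities=[124.0, 65.5, 0.0, 250.5, 80.0],
    positions=[[0, 0, 0]] * 7,
)
```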

Example

Consider the following 7 segments and 5 edges:

Segments 1, 2, 3, 4, 5, 6, 7
Edges (1, 2), (2, 3), (3, 4), (5, 6), (1, 7)

These result in two agglomerates:

| Agglomerate | Segments |
| --- | --- |
| Agglomerate 1 | 1, 2, 3, 4, 7 |
| Agglomerate 2 | 5, 6 |

Now, let's rewrite the segment IDs to local indices:

| Segment | Agglomerate | Localized Segment |
| --- | --- | --- |
| 1 | 1 | 0 |
| 2 | 1 | 1 |
| 3 | 1 | 2 |
| 4 | 1 | 3 |
| 5 | 2 | 0 |
| 6 | 2 | 1 |
| 7 | 1 | 4 |

With this, we can rewrite and sort the edges:

| Edge | Agglomerate | Localized Edge |
| --- | --- | --- |
| (1, 2) | 1 | (0, 1) |
| (1, 7) | 1 | (0, 4) |
| (2, 3) | 1 | (1, 2) |
| (3, 4) | 1 | (2, 3) |
| (5, 6) | 2 | (0, 1) |

This would be the content of the arrays:

| Array | Content | Shape |
| --- | --- | --- |
| segment_to_agglomerate | [0, 1, 1, 1, 1, 2, 2, 1] | (8,) |
| agglomerate_to_segments_offsets | [0, 0, 5, 7] | (4,) |
| agglomerate_to_segments | [1, 2, 3, 4, 7, 5, 6] | (7,) |
| agglomerate_to_edges_offsets | [0, 0, 4, 5] | (4,) |
| agglomerate_to_edges | [[0, 1], [0, 4], [1, 2], [2, 3], [0, 1]] | (5, 2) |
| agglomerate_to_affinities | [124.0, 65.5, 0.0, 250.5, 80.0] | (5,) |
| agglomerate_to_positions | [[x1, y1, z1], ..., [x7, y7, z7]] | (7, 3) |
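The walkthrough can be reproduced in code. The sketch below is not the Voxelytics implementation: it groups segments with a simple union-find, and it assumes agglomerate IDs are assigned in order of each agglomerate's smallest segment ID, which matches this example but is not stated as a rule. The function name `build_arrays` is hypothetical, and agglomerate_to_positions is omitted because it requires voxel data.

```python
def build_arrays(n_segments, edges_with_affinity):
    """Build the attachment arrays (except positions) from global (a, b, affinity) edges."""
    parent = list(range(n_segments + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, _ in edges_with_affinity:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # the smallest ID stays the root

    # Assumption: agglomerates are numbered by their smallest segment ID.
    root_to_agg = {}
    seg_to_agg = [0]  # background segment 0 maps to agglomerate 0
    for s in range(1, n_segments + 1):
        seg_to_agg.append(root_to_agg.setdefault(find(s), len(root_to_agg) + 1))
    n_agglomerates = len(root_to_agg)

    members = [[] for _ in range(n_agglomerates + 1)]
    for s in range(1, n_segments + 1):
        members[seg_to_agg[s]].append(s)  # ascending, since s increases
    local = {s: i for group in members for i, s in enumerate(group)}

    agg_edges = [[] for _ in range(n_agglomerates + 1)]
    for a, b, affinity in edges_with_affinity:
        n1, n2 = sorted((local[a], local[b]))
        agg_edges[seg_to_agg[a]].append((n1, n2, affinity))

    seg_offsets, edge_offsets, segs, edges, affinities = [0], [0], [], [], []
    for agg in range(n_agglomerates + 1):
        segs.extend(members[agg])
        for n1, n2, affinity in sorted(agg_edges[agg]):  # lexicographic by (n1, n2)
            edges.append([n1, n2])
            affinities.append(affinity)
        seg_offsets.append(len(segs))
        edge_offsets.append(len(edges))

    return {
        "segment_to_agglomerate": seg_to_agg,
        "agglomerate_to_segments_offsets": seg_offsets,
        "agglomerate_to_segments": segs,
        "agglomerate_to_edges_offsets": edge_offsets,
        "agglomerate_to_edges": edges,
        "agglomerate_to_affinities": affinities,
    }


# Input edges of the example, paired with the affinities from the table above.
arrays = build_arrays(
    7, [(1, 2, 124.0), (2, 3, 0.0), (3, 4, 250.5), (5, 6, 80.0), (1, 7, 65.5)]
)
for name, content in arrays.items():
    print(name, content)
```

Running it on the example input yields the same contents as the table above for all arrays except agglomerate_to_positions.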