HDF5 Output Format¶
The ngimager pipeline writes a single HDF5 file per run. The file is designed to be:
- self-describing — contains a config snapshot and a README,
- traceable — you can go from pixels → cones → events → hits (and back),
- diagnostic-friendly — counters from each stage are stored.
This page documents the current layout.
Top-level structure¶
At the top level, the file contains:
- attributes:
  - format_version (str): ng-imager HDF5 schema version.
  - created_utc (str): ISO-8601 timestamp of file creation.
  - software (str): software tag, e.g. "ng-imager 0.3.1", derived from the installed package metadata.
  - run_command (str, optional): shell-style command string used to launch the run, e.g. python -m ngimager.pipelines.core examples/configs/foo.toml --fast.
  - run_command_argv_json (str, optional): JSON-encoded argv list corresponding to run_command, e.g. ["python","-m","ngimager.pipelines.core","examples/configs/foo.toml","--fast"].
  - config_text (str): verbatim TOML config snapshot as used.
- groups:
  - /meta
  - /images
  - /cones
  - /lm (list-mode and event/hit info; always present, richer in list mode)
/meta¶
/meta holds geometry, run flags, human-readable text, and counters.
Geometry and run info¶
Group attributes:
- plane.P0: [3] float64, plane origin in cm
- plane.n: [3] float64, plane normal
- plane.eu: [3] float64, plane u-basis
- plane.ev: [3] float64, plane v-basis
- grid.u_min, grid.u_max, grid.du
- grid.v_min, grid.v_max, grid.dv
- grid.nu, grid.nv
Run flags:
- run_fast: bool
- run_list: bool
- run_neutron: bool
- run_gamma: bool
Config snapshot and README¶
Datasets:
- /meta/config_toml: the full TOML configuration used for this run, stored as a single variable-length UTF-8 string.
- /meta/readme: a short README (array of strings) summarizing the layout and pointing to the online documentation. (You are here, welcome.)
/meta/config_effective¶
A structured snapshot of the effective configuration used for the run, including any CLI overrides that were applied on top of the TOML file.
It is represented as three parallel 1D datasets:
- /meta/config_effective/keys (str[N]) – dotted field paths such as "run.fast", "run.list", "io.input.path", "run.meta.beam".
- /meta/config_effective/types (str[N]) – Python type names for each entry, e.g. "bool", "int", "float", "str", "list", etc.
- /meta/config_effective/values (str[N]) – JSON-encoded representation of the value when possible (e.g. true, "foo", ["a","b"]), otherwise a stringified repr(...) of the value.
This table mirrors cfg.model_dump() at the time of writing the HDF5 file and therefore reflects the actual settings used, not just the raw TOML contents.
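The three parallel arrays can be zipped back into a dictionary. The sketch below is only an illustration, not part of ng-imager: decode_config_effective is a hypothetical helper, plain Python lists stand in for the arrays read from the file, the example path and values are made up, and it assumes that any value that does not parse as JSON should be kept as its raw string (the repr(...) fallback described above).

```python
import json


def decode_config_effective(keys, types, values):
    """Rebuild {dotted_key: (type_name, value)} from the three parallel arrays."""

    def as_text(x):
        # HDF5 readers may hand back bytes for variable-length strings.
        return x.decode("utf-8") if isinstance(x, bytes) else str(x)

    out = {}
    for key, type_name, raw in zip(keys, types, values):
        text = as_text(raw)
        try:
            value = json.loads(text)
        except json.JSONDecodeError:
            # Not valid JSON: keep the stringified repr(...) unchanged.
            value = text
        out[as_text(key)] = (as_text(type_name), value)
    return out


# Hypothetical entries, for illustration only.
cfg = decode_config_effective(
    ["run.fast", "io.input.path", "run.meta.beam"],
    ["bool", "str", "str"],
    ["true", '"data/run.root"', b'"175 MeV proton"'],
)
```

Keeping the type name next to each decoded value lets a consumer tell a genuine string apart from a repr(...) fallback.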
Extra text metadata¶
If [io.extra_text_files] is set in the TOML config, ng-imager will
embed those files under:
/meta/extra_text/{key}
Each dataset is a single variable-length UTF-8 string containing the full contents of the corresponding file.
For example, with:
[io.extra_text_files]
phits_input = "phits/input/deck.inp"
daq_config = "daq/config/daq_settings.txt"
you would get:
- /meta/extra_text/phits_input
- /meta/extra_text/daq_config
storing the PHITS input deck and DAQ config text, respectively.
Run metadata from config¶
For convenience, ng-imager can carry high-level run descriptors from
the TOML into the HDF5 file. When a [run.meta] table is present in
the config, all of its key–value pairs are mirrored into:
/meta/run_meta: group
Each key in [run.meta] becomes a string attribute on this group. For
example, with:
[run]
plot_label = "175 MeV p, target B, det config 3"
[run.meta]
beam = "175 MeV proton"
target = "Geometry B"
det_setup = "Arrangement 3"
facility = "OncoRay"
the HDF5 file will contain:
- group /meta/run_meta with attributes:
  - beam = "175 MeV proton"
  - target = "Geometry B"
  - det_setup = "Arrangement 3"
  - facility = "OncoRay"
If [run].plot_label is set, its value is also stored as the
plot_label attribute on /meta/run_meta. Visualization tools
(including the built-in pipeline PNG export and the ng-viz CLI) may
use this string as a figure title or annotation.
The /meta/run_meta group is deliberately free-form: consumers should
treat it as optional and robust to extra keys. It is intended to
capture human-facing run descriptions that travel with the file rather
than physics-critical configuration (which remains in
/meta/config_toml).
Counters¶
Counters are stored as simple datasets under /meta/counters:
- /meta/counters/s1_raw_events_total
- /meta/counters/s1_hits_after_filters
- /meta/counters/s2_events_after_filters
- /meta/counters/s3_cones_kept_n
- /meta/counters/s3_cones_kept_g
- /meta/counters/s3_cones_rejected_delta_theta
- … etc.
The exact key set may grow over time, but the naming convention is:
- s1_*: Stage 1 (raw events → hits)
- s2_*: Stage 2 (hits → shaped/typed events → event filters)
- s3_*: Stage 3 (events → cones + cone filters)
- s4_*: Stage 4 (cones → images)
All counters are int64 and are intended for quick QA and experiment
diagnostics.
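For quick QA it can help to group the counters by stage prefix. The following is a sketch under stated assumptions: read_counters and group_counters_by_stage are hypothetical helper names, h5py is assumed to be installed, each counter is assumed to be a scalar dataset, and the numbers in the example are made up.

```python
from collections import defaultdict


def read_counters(path):
    """Read every dataset under /meta/counters into a {name: int} dict."""
    import h5py  # third-party; assumed to be installed

    with h5py.File(path, "r") as f:
        # Assumes each counter is a scalar dataset, so ds[()] yields one value.
        return {name: int(ds[()]) for name, ds in f["/meta/counters"].items()}


def group_counters_by_stage(counters):
    """Group counters by their s1/s2/s3/s4 prefix; anything else goes to 'other'."""
    stages = defaultdict(dict)
    for name, value in counters.items():
        prefix, sep, _ = name.partition("_")
        is_stage = bool(sep) and prefix.startswith("s") and prefix[1:].isdigit()
        stages[prefix if is_stage else "other"][name] = int(value)
    return dict(stages)


# Made-up values, for illustration only.
example = {
    "s1_raw_events_total": 1000,
    "s1_hits_after_filters": 640,
    "s2_events_after_filters": 200,
    "s3_cones_kept_n": 90,
    "s3_cones_kept_g": 30,
}
by_stage = group_counters_by_stage(example)
```

Because the key set may grow, the grouping relies only on the prefix convention and never on a fixed list of counter names.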
/images/summed¶
Summed images live under /images/summed:
- /images/summed/n: [nv, nu] float32, neutron-only SBP image
- /images/summed/g: [nv, nu] float32, gamma-only SBP image (if run.gammas = true)
- /images/summed/all: [nv, nu] float32, n + g, only written when both species are present and produce non-zero images.
These datasets use gzip compression and match the grid described under /meta.
/images/summed/projections¶
When [vis.projections].enabled = true in the TOML config, the pipeline writes 1D u/v projections of each summed image under /images/summed/projections.
Layout:
/images/summed/projections/n/u # [nu] float32, sum over v (rows)
/images/summed/projections/n/v # [nv] float32, sum over u (cols)
/images/summed/projections/n/u_roi # [nu] float32, ROI-limited (optional)
/images/summed/projections/n/v_roi # [nv] float32, ROI-limited (optional)
# similarly for "g" and "all"
Here:
- u[i] is the sum of all pixels in column i of /images/summed/n (i.e. integrated along the v-axis),
- v[j] is the sum of all pixels in row j (integrated along u),
- nu and nv match the imaging grid (grid.nu, grid.nv) stored in /meta,
- the mapping from index → coordinate uses the same grid as the 2D images (grid.u_min, grid.du, grid.v_min, grid.dv in /meta).
If a rectangular ROI is configured via [vis.projections], the ROI-limited projections (u_roi, v_roi) are computed by summing only pixels whose centers fall inside the ROI. The ROI bounds (in cm) are stored as attributes on each species group: roi_u_min_cm, roi_u_max_cm, roi_v_min_cm, roi_v_max_cm.
These projections provide a convenient sanity check for the u/v orientation and are intended to simplify downstream 1D analyses (e.g. peak finding, edge detection) without needing to re-derive them from the 2D images.
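The relation between a summed image and its two projections can be checked in a few lines. This is a minimal sketch, not the pipeline's code: project_uv is a hypothetical helper and a nested Python list stands in for the [nv, nu] image read from the file.

```python
def project_uv(image):
    """image holds nv rows of nu values; returns the (u, v) projections."""
    u = [sum(column) for column in zip(*image)]  # length nu, summed over v (rows)
    v = [sum(row) for row in image]              # length nv, summed over u (cols)
    return u, v


# Toy image with nv = 2 rows and nu = 3 columns.
image = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]
u, v = project_uv(image)
```

Both projections must integrate to the same total as the image, which makes a cheap consistency check when reading a file.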
/images/summed/projections/*/metrics — Projection Statistics and Fitting Results¶
When [vis.projections.metrics] is enabled, ng-imager computes and
stores per-axis (u, v) metrics for each species (n, g, all).
Metrics are written under:
/images/summed/projections/{species}/metrics/u
/images/summed/projections/{species}/metrics/v
If a rectangular ROI is configured, additional groups exist:
/images/summed/projections/{species}/metrics/u_roi
/images/summed/projections/{species}/metrics/v_roi
Each of these groups contains scalar (0D) datasets describing the corresponding 1D projection curve.
Summary statistics (compute_summary = true)¶
Scalar datasets:
- total_counts (float64) – sum of all counts in the projection
- mean_cm (float64) – weighted mean coordinate (cm)
- median_cm (float64) – median coordinate (cm) from the CDF
- std_cm (float64) – standard deviation (cm) about the mean
- summary_ok (bool) – validity flag (true if enough counts, etc.)
summary_ok is set to false and the coordinate-valued metrics are
set to NaN if the total counts fall below min_counts.
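As an illustration of the mean and standard deviation above, the sketch below recomputes them from a projection curve. It is not ng-imager's implementation: summarize_projection is a hypothetical name, the standard deviation is assumed to be the count-weighted population form, and the median is left out because its exact CDF interpolation is not specified here.

```python
import math


def summarize_projection(coords_cm, counts, min_counts=1.0):
    """Count-weighted mean and standard deviation of a 1D projection curve."""
    total = float(sum(counts))
    if total <= 0.0 or total < min_counts:
        # Too few counts: flag as invalid and return NaN coordinates.
        return {"total_counts": total, "mean_cm": math.nan,
                "std_cm": math.nan, "summary_ok": False}
    mean = sum(c * x for x, c in zip(coords_cm, counts)) / total
    variance = sum(c * (x - mean) ** 2 for x, c in zip(coords_cm, counts)) / total
    return {"total_counts": total, "mean_cm": mean,
            "std_cm": math.sqrt(variance), "summary_ok": True}


stats = summarize_projection([-1.0, 0.0, 1.0], [1.0, 2.0, 1.0])
sparse = summarize_projection([-1.0, 0.0, 1.0], [1.0, 2.0, 1.0], min_counts=10.0)
```

Readers of the file should test summary_ok before using the coordinate values, since NaN propagates silently through later arithmetic.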
Peak metrics (compute_peak = true)¶
Peak metrics are currently based on the location of the maximum bin of the projection (no Gaussian fit yet):
- peak_pos_cm (float64) – coordinate of the maximum bin (cm)
- peak_value (float64) – value of the maximum bin (counts)
- peak_ok (bool) – validity flag
Fractional edges (compute_edges = true)¶
Edges are derived from the normalized cumulative integral of the projection, using configuration:
- edge_low_frac (float64, attribute on metrics/u and metrics/v)
- edge_high_frac (float64, attribute on metrics/u and metrics/v)
- min_counts (float64, attribute)
Scalar datasets in each metrics group:
- edge_low_cm (float64) – coordinate where CDF crosses edge_low_frac
- edge_high_cm (float64) – coordinate where CDF crosses edge_high_frac
- edge_width_cm (float64) – edge_high_cm - edge_low_cm
- edges_ok (bool) – validity flag
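One way to picture the CDF crossing is sketched below. The exact interpolation ng-imager uses is not stated on this page, so this example assumes the cumulative sum is evaluated at bin centers and interpolated linearly between them; cdf_crossing is a hypothetical helper and the curve is made up.

```python
def cdf_crossing(coords_cm, counts, frac):
    """Coordinate where the normalized cumulative sum first reaches frac."""
    total = float(sum(counts))
    if total <= 0.0:
        return None  # empty projection: no meaningful edge
    target = frac * total
    running = 0.0
    prev_x = None
    for x, c in zip(coords_cm, counts):
        new_running = running + c
        if new_running >= target:
            if prev_x is None or new_running == running:
                return float(x)
            # Linear interpolation between the previous and current bin centers.
            t = (target - running) / (new_running - running)
            return prev_x + t * (x - prev_x)
        running, prev_x = new_running, x
    return float(coords_cm[-1])


coords = [0.0, 1.0, 2.0, 3.0]
counts = [1.0, 1.0, 1.0, 1.0]
edge_low_cm = cdf_crossing(coords, counts, 0.25)
edge_high_cm = cdf_crossing(coords, counts, 0.75)
edge_width_cm = edge_high_cm - edge_low_cm
```

A different interpolation convention (for example, using bin edges rather than centers) would shift the results by up to one bin width, so values recomputed this way should be compared with the stored ones before being relied upon.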
ROI metrics¶
If an ROI is defined via [vis.projections], the ROI-limited metrics
are stored under:
/images/summed/projections/{species}/metrics/u_roi
/images/summed/projections/{species}/metrics/v_roi
These groups contain the same dataset names and semantics as the non-ROI metrics. ROI projections use the full u/v index range; bins outside the ROI are zero.
2D centroid (compute_centroid = true)¶
Stored at:
/images/summed/projections/{species}/metrics/centroid
Datasets:
- u_centroid (float64)
- v_centroid (float64)
- total_counts (float64)
These describe the centroid of the full 2D reconstructed image.
Additional notes for downstream tools¶
- Metrics are always datasets, not attributes (except the edge configuration, which lives as attributes on metrics/u and metrics/v).
- Tools should detect metric availability by checking dataset existence.
- Coordinate values are stored in centimeters, regardless of plot-unit setting.
- ROI bounds appear as attributes under each species projection group: roi_u_min_cm, roi_u_max_cm, roi_v_min_cm, roi_v_max_cm.
/cones¶
/cones stores the geometric and physical properties of each cone.
Datasets:
- /cones/cone_id: [N_cones] uint32
- /cones/apex_xyz_cm: [N_cones, 3] float32
- /cones/axis_xyz: [N_cones, 3] float32 (unit vectors)
- /cones/theta_rad: [N_cones] float32
- /cones/incident_energy_MeV: [N_cones] float32
- /cones/event_index: [N_cones] int32
  - Row index into /lm/event_type and /lm/hit_* for the event that produced this cone.
- /cones/gamma_hit_order: [N_cones, 3] int8
  - For gamma cones (those with species == 1), each row is a triple (i0, i1, i2) giving the slot indices into /lm/hit_*[event_index] that correspond to the (first scatter, second scatter, third point) used to build the Compton cone. For neutron cones (species == 0), the row is (-1, -1, -1) and should be ignored.
Classification:
- /cones/species: [N_cones] uint8
- /cones/species_labels: string array legend, e.g. ["0=neutron", "1=gamma"]
- /cones/recoil_code: [N_cones] uint8
- /cones/recoil_code_labels: string array legend, e.g. ["0=NA/gamma/unknown", "1=proton", "2=carbon"]
Interpretation:
- species distinguishes neutron and gamma cones.
- recoil_code distinguishes proton vs carbon recoils for neutron cones.
- incident_energy_MeV is the kinematically inferred incident particle energy for that cone:
  - for neutrons, from ToF + deposited energy at the first scatter;
  - for gammas, from Compton kinematics.
- event_index + gamma_hit_order allow you to recover, for each gamma cone, exactly which of the three stored hits in /lm/hit_* were interpreted as first/second/third in the selected Compton ordering.
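The hit reordering described above can be sketched as follows. ordered_gamma_hits is a hypothetical helper, and the three positions are made-up stand-ins for one row of /lm/hit_pos_cm selected with the cone's event_index.

```python
def ordered_gamma_hits(event_hits, hit_order):
    """Return the event's hits in (first, second, third) order.

    Returns None when the row is (-1, -1, -1), i.e. for neutron cones.
    """
    if any(int(i) < 0 for i in hit_order):
        return None
    return [event_hits[int(i)] for i in hit_order]


# Made-up hit positions in storage (slot) order for one gamma event.
event_hits = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (5.0, 5.0, 0.0)]

gamma_sequence = ordered_gamma_hits(event_hits, (2, 0, 1))
neutron_sequence = ordered_gamma_hits(event_hits, (-1, -1, -1))
```

The same index triple can be applied to any of the per-slot arrays (times, light output, detector IDs) of that event.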
/lm: events and hits¶
The /lm group contains per-event and per-hit data used for list-mode analysis.
It is written regardless of run.list; list-mode imaging adds extra datasets.
Event-level datasets¶
- /lm/event_type: [N_events] uint8 (0=neutron, 1=gamma)
- /lm/event_type_labels: legend array, e.g. ["0=neutron", "1=gamma"]
- /lm/event_meta_run_id: [N_events] int32 (optional provenance)
- /lm/event_meta_file_ix: [N_events] int32 (optional provenance)
An “event” here means a fully typed NeutronEvent or GammaEvent that has
survived hit-level and event-level filters (Stage 1–2).
Hit-level datasets¶
Hits are stored in fixed slots per event:
- /lm/hit_pos_cm: [N_events, 3, 3] float32
- /lm/hit_t_ns: [N_events, 3] float64
- /lm/hit_L_mevee: [N_events, 3] float32
- /lm/hit_det_id: [N_events, 3] int32
- /lm/hit_material_id: [N_events, 3] int16
Conventions:
- Neutron events use slots 0 and 1; slot 2 is filled with NaNs / -1.
- Gamma events use all three slots.
- hit_L_mevee is Hit.L (calibrated light in MeVee for real data or Edep in MeV for PHITS-style sources).
Material labels:
- /lm/material_id_labels: string array mapping hit_material_id values back to human-readable material names ("OGS", "M600", etc.).
/lm: list-mode imaging (optional)¶
When run.list = true, the SBP reconstruction also tracks which pixels each
cone contributes to. This information is written to /lm in a form that
makes it easy to map:
pixels → cones → events → hits
Per-cone pixel indices¶
- /lm/cone_pixel_indices: [K, 2] uint32
  - Canonical location.
- /images/list_mode/cone_pixel_indices: alias (HDF5 soft link) to /lm/cone_pixel_indices
Each row is (cone_id, flat_pixel_index), where:
- cone_id is an index into /cones/cone_id and the other cone arrays, and
- flat_pixel_index is a flattened (u, v) index.
You can recover (u, v) from flat using divmod(flat, nu).
Only cones that actually intersect the imaging plane appear in this dataset.
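A short sketch of unpacking this table follows. pixels_by_cone is a hypothetical helper and the rows are made up. It assumes the flat index is row-major over the [nv, nu] image, so that divmod(flat, nu) yields (row, column), i.e. the v index first and the u index second; that ordering should be verified against a known image before it is relied on.

```python
from collections import defaultdict


def pixels_by_cone(cone_pixel_rows, nu):
    """Map each cone_id to the (row, col) pixels it touches."""
    result = defaultdict(list)
    for cone_id, flat in cone_pixel_rows:
        # Assumption: flat = row * nu + col, with row along v and col along u.
        row, col = divmod(int(flat), nu)
        result[int(cone_id)].append((row, col))
    return dict(result)


# Made-up rows of (cone_id, flat_pixel_index) for a grid with nu = 4 columns.
rows = [(0, 5), (0, 6), (2, 11)]
touched = pixels_by_cone(rows, nu=4)
```

With this layout, a pixel (row, col) indexes the summed images directly as image[row, col].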
Event survival table¶
To make the pipeline’s decisions transparent, a small “survival table” is stored:
/lm/event_survival:[N_events, 3]int32
Columns:
- event_index: row index into the /lm event/hit datasets.
- first_cone_index: index into /cones/* for the first cone built from this event, or -1 if the event never produced a cone.
- first_imaged_cone_index: cone index for the first cone that both was built and intersected the plane (i.e. appeared in /lm/cone_pixel_indices), or -1 if none of that event's cones hit the plane.
Together, the mapping looks like:
hit (lm/hit_*) ← event_index ← event_survival[:, 0]
↘ first_cone_index → cones/*
↘ first_imaged_cone_index → cone_pixel_indices
If you need more detailed correlations (e.g. many cones per event), you can walk:
- /cones/event_index to find all cones belonging to an event;
- /lm/cone_pixel_indices to find all pixels touched by each cone.
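Walking the chain in the other direction (pixel → cones → events) can be sketched like this. events_touching_pixel is a hypothetical helper, plain lists stand in for /lm/cone_pixel_indices and /cones/event_index, and the values are made up; it assumes cone_id values can be used directly as row positions in the /cones arrays, as described above.

```python
def events_touching_pixel(cone_pixel_rows, cone_event_index, flat_pixel):
    """Return (cone_ids, event_indices) that contribute to one flat pixel."""
    cones = sorted({int(c) for c, p in cone_pixel_rows if int(p) == flat_pixel})
    # Several cones may come from the same event, so de-duplicate the events.
    events = sorted({int(cone_event_index[c]) for c in cones})
    return cones, events


# Made-up data: three cones, two of them built from event 10.
rows = [(0, 5), (1, 5), (2, 7)]
cone_event_index = [10, 10, 12]

cones_at_5, events_at_5 = events_touching_pixel(rows, cone_event_index, 5)
```

The returned event indices can then be used as row indices into /lm/hit_* to inspect the underlying hits.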
Adapter-specific extras¶
Adapters are allowed to attach additional, source-specific information under /meta and elsewhere, as long as the core layout described above remains stable.
At the moment there are two main adapter families with extra payloads:
PHITS adapter (phits_usrdef)¶
The PHITS adapter may populate ragged list-mode datasets under /lm/hits and /lm/events as a more PHITS-like representation of the input, mainly for debugging and cross-checks:
- /lm/hits
  - event_ptr: CSR-style pointer array (length = N_events + 1)
  - x_cm, y_cm, z_cm, t_ns, Edep_MeV, reg: flat per-hit arrays
- /lm/events
  - event_type: 0 = unknown, 1 = n, 2 = g, 3 = mixed
  - iomp, batch, history, no, name: PHITS event bookkeeping
These datasets are optional and may be omitted when the PHITS adapter is not used.
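With a CSR-style pointer array, the hits of event i occupy a contiguous slice of every flat per-hit array. The sketch below assumes the usual CSR convention, in which event_ptr[i] is the start offset and event_ptr[i + 1] the end offset; hits_for_event is a hypothetical helper and the numbers are made up.

```python
def hits_for_event(event_ptr, per_hit_values, event_i):
    """Slice one event's entries out of a flat per-hit array."""
    start, stop = int(event_ptr[event_i]), int(event_ptr[event_i + 1])
    return per_hit_values[start:stop]


# Made-up data: three events with 2, 0 and 3 hits respectively.
event_ptr = [0, 2, 2, 5]
edep_mev = [0.5, 1.2, 0.3, 0.7, 2.0]

first = hits_for_event(event_ptr, edep_mev, 0)
empty = hits_for_event(event_ptr, edep_mev, 1)
last = hits_for_event(event_ptr, edep_mev, 2)
```

The same (start, stop) pair applies to x_cm, y_cm, z_cm, t_ns, Edep_MeV and reg, so it only needs to be computed once per event.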
NOVO DDAQ ROOT adapter (root_novo_ddaq)¶
When the NOVO DDAQ ROOT adapter is used, additional run-level metadata from the ROOT meta TTree are persisted under:
/meta/root_novo_ddaq: group
Run-level scalar fields are stored as group attributes, mirroring the ROOT metadata where available:
- InputFileName
- OutputFileName
- CDFFileName
- PSDCutsFileName
- SampleRate
- NumDet
- NumThreads
- WriteHistograms
- MergeMode
- CardOffsetChannelUsePositionVeto
In addition, a normalized integer run identifier is exposed as:
run_number(attribute on/meta/root_novo_ddaq)
This is intended to capture the “run number” used in NOVO data taking (e.g. ..._000041.root → run_number = 41). When the adapter cannot infer a run number, it may omit this attribute or use a sentinel value.
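For readers who want to reproduce such a number from a filename, one possible approach is sketched below. This is not the adapter's actual logic: infer_run_number is a hypothetical helper, the filename is made up, and it assumes the run number is the last underscore-separated block of digits before the .root extension.

```python
import re


def infer_run_number(filename):
    """Return the trailing integer of a name like '<prefix>_000041.root', or None."""
    match = re.search(r"_(\d+)\.root$", filename)
    return int(match.group(1)) if match else None


run_number = infer_run_number("run_000041.root")  # hypothetical filename
missing = infer_run_number("notes.txt")
```

Returning None for unrecognized names mirrors the case where the attribute is omitted; the sentinel value, if the adapter uses one, is not specified here.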
Per-detector geometric and timing metadata from the ROOT file are collected into a small table under:
/meta/root_novo_ddaq/detectors: group
with the following datasets (length = NumDet):
- det_id: [NumDet] int32
- pos: [NumDet, 3] float32 – detector position (PosX, PosY, PosZ) in mm
- dim: [NumDet, 3] float32 – detector dimensions (DimX, DimY, DimZ) in mm
- rot_deg: [NumDet, 3] float32 – detector rotations (RotX, RotY, RotZ) in degrees
- local_time_offset: [NumDet] float32 – local timing offset in ns
- global_time_offset: [NumDet] float32 – global timing offset in ns
- pos_cal_file: [NumDet] string – per-detector position calibration filenames
- energy_cal_file: [NumDet] string – per-detector energy calibration filenames
- is_start_det: [NumDet] int8 – 1 if marked as a start detector, else 0
- is_laser_det: [NumDet] int8 – 1 if marked as a laser detector, else 0
Each numeric dataset carries a units attribute ("mm", "deg", "ns") to make downstream interpretation explicit.
The /meta/root_novo_ddaq layout is intended to be forward compatible: new attributes or datasets may be added in future versions, but existing ones should not be removed or change meaning.
Versioning and compatibility¶
The root attribute format_version records the HDF5 schema version. Minor
additions (e.g. new counters or label arrays) are made in a way that keeps
existing code working; consumers should always check that a dataset
exists before reading it.
If you are scripting against the file format, consider:
- gating on format_version,
- using the label arrays (*_labels) rather than hard-coding integer codes, and
- relying on /meta/config_toml to reproduce or document the run settings.
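As an example of using the label arrays, the legend strings in the "code=name" form shown earlier (e.g. species_labels, recoil_code_labels) can be turned into a lookup table. parse_labels is a hypothetical helper; it does not apply to arrays that store bare names without a numeric prefix.

```python
def parse_labels(labels):
    """Turn ['0=neutron', '1=gamma'] into {0: 'neutron', 1: 'gamma'}."""
    mapping = {}
    for item in labels:
        # HDF5 readers may return bytes for string arrays.
        text = item.decode("utf-8") if isinstance(item, bytes) else str(item)
        code, _, name = text.partition("=")
        mapping[int(code)] = name
    return mapping


species_names = parse_labels(["0=neutron", "1=gamma"])
recoil_names = parse_labels([b"0=NA/gamma/unknown", b"1=proton", b"2=carbon"])
```

Looking names up through the parsed legend keeps analysis scripts working if new codes are appended in a later schema version.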