How to inspect data from past runs#

Bluesky runs are stored in a Tiled catalog. This page shows how to read them back at the IPython prompt.

The catalog object#

The session-level catalog client is bound to cat:

cat                # <Container ...>  -- top-level Tiled client
len(cat)           # number of runs available
cat[-1]            # most recent run
cat[-5:]           # last five runs
cat["<uid>"]       # a specific run by UID

You can also instantiate a fresh client without restarting your session:

from tiled.client import from_uri
fresh = from_uri("http://sn.xray.aps.anl.gov:8000")
fresh                # /raw under that server

Structure of a run#

A Bluesky run is a small tree of streams. The two you will see most often are primary (the main event stream the scan emits) and baseline (the once-at-start, once-at-stop snapshot of every device tagged baseline):

run = cat[-1]
list(run)
# ['primary', 'baseline']

primary = run.primary
list(primary)
# ['data', 'config', 'time', 'seq_num', ...]

The primary.data group holds the actual readings; everything else is bookkeeping.

Reading the data#

run.primary.read() returns an xarray.Dataset – a labeled multi-dimensional array container that plays well with pandas, numpy, and matplotlib:

ds = run.primary.read()
ds
# <xarray.Dataset>
# Dimensions:  (time: 11)
# Coordinates:
#   * time     (time) datetime64[ns] ...
# Data variables:
#     sample_stage_xprime  (time) float64 0.0 1.0 2.0 ...
#     scaler_chan01    (time) float64 1234.0 1180.0 ...

ds.to_pandas()
# pandas DataFrame

ds["scaler_chan01"].plot()
# matplotlib plot

For single columns: ds["sample_stage_xprime"] returns an xarray.DataArray.

Metadata#

Every run carries metadata accessible without reading the bulk data:

run.metadata["start"]
# {'uid': '...', 'time': 1719000000.0, 'scan_id': 1, 'plan_name': 'scan',
#  'plan_args': {...}, 'plan_type': 'generator',
#  'detectors': ['scaler'], 'motors': ['sample_stage_xprime'], 'num_points': 11,
#  ...}

run.metadata["stop"]
# {'uid': '...', 'time': ..., 'exit_status': 'success', 'num_events': {...}}

run.metadata["start"]["scan_id"]
# 1

Baseline stream#

If the device is baseline-labeled in devices.yml, it appears in the baseline stream automatically:

run.baseline.read()
# <xarray.Dataset>
# Dimensions: (time: 2)
# Data variables (one entry per baseline device, both at start and at end):
#     sample_stage_xprime  (time) float64 ...
#     sample_stage_base_y  (time) float64 ...
#     ...

Useful for “what was the rest of the instrument doing while I scanned this one motor?”.

Filtering and searching#

Tiled supports basic filtering on metadata:

from tiled.queries import Key

# Runs by plan name:
cat.search(Key("plan_name") == "scan")

# Runs from the last hour (server-side time field is unix epoch):
import time
cat.search(Key("time") > time.time() - 3600)

Image (area-detector) data#

If a scan included an area detector like the Eiger2, the image data is referenced from the run’s master HDF5 file via HDF5 external links. Reading it from cat[-1] should return a dask-backed array, but as of the date of this writing the end-to-end path is not yet validated at 3-ID-C. See How to visualize HDF5 for the current state.

Common pitfalls#

  • Wrapping reads in RE(...) is wrong; cat[-1].primary.read() is not a plan. See The RunEngine.

  • Reading a very large image array eagerly. read() will pull the entire dataset into memory. For Eiger-class data, slice first: run.primary["eiger2_image"][0] to get the first frame.

  • Stale cat after a long session. Tiled clients cache schemas; if a new run is not showing up, refresh with cat.refresh().

See also#