How to Search for Bluesky Data?#

Show how to search for Bluesky data from a databroker catalog.

The databroker search and lookup tutorial is a great way to learn how to search for Bluesky data. This How-To guide continues from the tutorial, using additional support from apstools. Custom queries are expressed in the MongoDB query language. Help with its operators and syntax may be found online; the content from w3schools is both informative and compact.

What is a catalog?#

A catalog is a group of Bluesky measurements from the databroker.

Here, we create a catalog of the Bluesky runs in the class_2021_03 database.

[1]:
import databroker

print(f"catalogs available: {list(databroker.catalog)=}")

cat = databroker.catalog["class_2021_03"].v2
cat
catalogs available: list(databroker.catalog)=['bdp2022', 'class_2021_03', '6idb_export', 'apstools_test', 'class_data_examples', 'usaxs_test', 'korts202106', 'training']
class_2021_03:
  args:
    asset_registry_db: mongodb://localhost:27017/class_2021_03-bluesky
    metadatastore_db: mongodb://localhost:27017/class_2021_03-bluesky
    name: class_2021_03
  description: ''
  driver: databroker._drivers.mongo_normalized.BlueskyMongoCatalog
  metadata:
    catalog_dir: /home/prjemian/.local/share/intake/

What does the cat object describe?

[2]:
print(f"{cat.name=}")
print(f"{len(cat)=} measurements (runs) in the catalog")
print(f"{cat.metadata=}")
cat.name='class_2021_03'
len(cat)=40 measurements (runs) in the catalog
cat.metadata={'catalog_dir': '/home/prjemian/.local/share/intake/'}

NOTE: A search on a catalog object returns a new catalog as filtered by the search parameters.

What is a run?#

Bluesky uses the term run to describe a single measurement in a catalog. A run contains the data and metadata from one data acquisition, such as a count or a scan.

A run consists of several parts. Each part has its own type of document, as follows:

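==========  ==============================================================
document    description
==========  ==============================================================
start       Initial information about the measurement, including metadata.
descriptor  A description of the data to be collected.
event       The measurement data.
stop        A final summary of the measurement.
==========  ==============================================================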
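
As a rough illustration of how these documents relate to each other, the sketch below models one run with plain Python dictionaries. The start and stop values are copied from scan 192 as it appears elsewhere in this guide. The descriptor and event contents (their uid values, the data key, and the reading) are made-up placeholders, and real documents carry more keys than shown here.

```python
# Minimal sketch: the four document types of one run, as plain dictionaries.
start = {
    "uid": "e3862991-688d-43dc-8442-d85ccbb3d6c8",
    "scan_id": 192,
    "plan_name": "rel_scan",
    "time": 1621455723.0701044,
}
descriptor = {
    "uid": "descriptor-uid-placeholder",  # hypothetical value
    "run_start": start["uid"],  # points back to the start document
    "name": "primary",  # name of the data stream
    "data_keys": {"scaler1_time": {"dtype": "number"}},  # hypothetical key
}
event = {
    "uid": "event-uid-placeholder",  # hypothetical value
    "descriptor": descriptor["uid"],  # points back to the descriptor
    "seq_num": 1,
    "data": {"scaler1_time": 1.0},  # hypothetical reading
}
stop = {
    "uid": "70b9a95f-59aa-408d-8819-1463f3eebac5",
    "run_start": start["uid"],  # points back to the start document
    "exit_status": "success",
    "num_events": {"baseline": 2, "primary": 8},
    "time": 1621455727.087438,
}

# The start and stop documents bracket the run in time.
duration = stop["time"] - start["time"]
print(f"run {start['scan_id']} lasted {duration:.3f} s")
```

Every document after the first carries a reference to an earlier one, which is how the separate documents are tied together into a single run.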

How to retrieve a run?#

Retrieve any run from the catalog using one of three references:

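=========  ==============  ================  ==============================================================
reference  example         type              description
=========  ==============  ================  ==============================================================
scan_id    cat[192]        positive integer  not necessarily unique, returns most recent
relative   cat[-1]         negative integer  -1 is most recent run, -2 is the run before, …
uid        cat["abc1234"]  UUID string       unique, matches the first characters of the start document uid
=========  ==============  ================  ==============================================================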

A uid is created and returned by the Bluesky RunEngine after it executes a plan (that starts a new run).

While a full uid provides a unique reference to a run, to a human it looks like a random sequence of hexadecimal characters with hyphens at irregular intervals. A partial uid is allowed, matching from the start of the full uid. The short uid must include enough characters to make a unique match in the catalog, and it must extend through the first non-numeric character so that it is not misinterpreted as an integer. You will be advised if the short uid is not a unique match.
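
The rules for interpreting a key can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not the code databroker uses. The function name classify_reference is invented for this example, and treating zero as a scan_id is an assumption.

```python
def classify_reference(key):
    """Return (kind, value) for a catalog key, following the rules above."""
    if isinstance(key, str):
        try:
            key = int(key)  # "192" and "-1" are treated as integers
        except ValueError:
            return ("uid", key)  # any other text is a (partial) uid
    if key < 0:
        return ("relative", key)
    return ("scan_id", key)  # assumption: zero is handled like a scan_id


for key in (192, "192", -1, "-1", "e3", "e3862991-688d", "3703202"):
    print(f"{key!r:18} -> {classify_reference(key)}")
```

The last key, "3703202", is the first seven characters of a uid that appears in the custom report at the end of this guide. Because it contains only digits, it is read as an integer, which is why a short uid must extend through the first non-numeric character.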

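The first seven characters (uid7) are often sufficient as a short uid. Seven hexadecimal characters allow \(16^7\) (about 268 million) different values, so two particular runs are unlikely to share a uid7, although the chance of some match grows with the number of runs in the catalog.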
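
To put a number on that, the sketch below applies the standard birthday-problem approximation to the two catalog sizes that appear in this guide (40 and 8116 runs). It assumes that uids behave as uniformly random hexadecimal strings. The function name prob_any_collision is made up for this illustration.

```python
import math


def prob_any_collision(n_runs, n_chars=7):
    """
    Estimate the chance that at least two of n_runs random uids
    share their first n_chars hexadecimal characters.
    """
    space = 16**n_chars  # number of distinct short uids
    pairs = n_runs * (n_runs - 1) / 2  # number of run pairs to compare
    return 1.0 - math.exp(-pairs / space)  # birthday-problem approximation


for n_runs in (40, 8116):
    print(f"{n_runs:5d} runs: {prob_any_collision(n_runs):.6f}")
```

For the small catalog the estimate is negligible. For the larger one it is a little over 10%, so a uid7 remains convenient for display, but a longer prefix is the safer lookup key in a big catalog.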

Each of these references retrieves the same run:

[3]:
print(f"scan_id: {cat[192]=}")
print(f"scan_id: {cat['192']=}")  # treated as an integer by databroker
print(f"relative: {cat[-1]=}")
print(f"relative: {cat['-1']=}")  # treated as an integer by databroker
print(f"short uid: {cat['e3']=}")  # shortest version that works
print(f"short uid: {cat['e3862991-688d']=}")  # must include the hyphen
scan_id: cat[192]=<BlueskyRun uid='e3862991-688d-43dc-8442-d85ccbb3d6c8'>
scan_id: cat['192']=<BlueskyRun uid='e3862991-688d-43dc-8442-d85ccbb3d6c8'>
relative: cat[-1]=<BlueskyRun uid='e3862991-688d-43dc-8442-d85ccbb3d6c8'>
relative: cat['-1']=<BlueskyRun uid='e3862991-688d-43dc-8442-d85ccbb3d6c8'>
short uid: cat['e3']=<BlueskyRun uid='e3862991-688d-43dc-8442-d85ccbb3d6c8'>
short uid: cat['e3862991-688d']=<BlueskyRun uid='e3862991-688d-43dc-8442-d85ccbb3d6c8'>

Show the run has the expected scan_id.

[4]:
print(f"scan_id: {cat[192].metadata['start']['scan_id']=}")
cat[-1]
scan_id: cat[192].metadata['start']['scan_id']=192
[4]:
BlueskyRun
  uid='e3862991-688d-43dc-8442-d85ccbb3d6c8'
  exit_status='success'
  2021-05-19 15:22:03.070 -- 2021-05-19 15:22:07.087
  Streams:
    * baseline
    * primary

There is one more way to retrieve runs from the catalog: iterate over the catalog to get the full uid of each run, then use cat[uid] to get the run object. This technique might be more useful on smaller catalogs.

Here, we break after the first one (to limit output):

[5]:
for uid in cat:
    print(f"{uid=}  {cat[uid]=}")
    break
uid='e3862991-688d-43dc-8442-d85ccbb3d6c8'  cat[uid]=<BlueskyRun uid='e3862991-688d-43dc-8442-d85ccbb3d6c8'>

Search with listruns()#

Q: What are the most recent runs?

The apstools.utils.listruns() function from apstools provides a listing of the most recent runs. Taking all the default settings, listruns() shows (up to) the 20 most recent runs in the catalog.

Here, the first column is an index number (and can be ignored); the remaining columns are labeled. Note that the time column includes both the yyyy-mm-dd date and the HH:MM:SS time of day (24-hour time).

[6]:
from apstools.utils import listruns

listruns()
/home/prjemian/.conda/envs/bluesky_2022_3/lib/python3.9/site-packages/databroker/queries.py:89: PytzUsageWarning: The zone attribute is specific to pytz's interface; please migrate to a new time zone provider. For more details on how to do so, see https://pytz-deprecation-shim.readthedocs.io/en/latest/migration.html
  timezone = lz.zone
catalog: class_2021_03
    scan_id                 time plan_name                   detectors
0       192  2021-05-19 15:22:03  rel_scan  [scaler1, adsimdet, zaxis]
1       191  2021-05-19 15:19:27  rel_scan         [scaler1, adsimdet]
2       190  2021-05-19 15:18:59  rel_scan                   [scaler1]
3       189  2021-05-19 15:18:28  rel_scan      [scaler1, temperature]
4       188  2021-05-19 15:14:56  rel_scan                   [scaler1]
5       187  2021-05-19 15:13:47  rel_scan                   [scaler1]
6        33  2021-04-07 15:55:09      scan                     [noisy]
7        33  2021-03-17 00:32:55     count                  [adsimdet]
8        32  2021-03-17 00:31:44     count               [temperature]
9        31  2021-03-17 00:31:24     count               [temperature]
10       30  2021-03-17 00:31:23     count               [temperature]
11       29  2021-03-17 00:31:23     count               [temperature]
12       28  2021-03-17 00:30:33  rel_scan                     [noisy]
13       27  2021-03-17 00:30:27  rel_scan                     [noisy]
14       26  2021-03-17 00:30:21  rel_scan                     [noisy]
15       25  2021-03-17 00:30:04  rel_scan                     [noisy]
16       24  2021-03-17 00:29:57  rel_scan                     [noisy]
17       23  2021-03-17 00:29:40  rel_scan                     [noisy]
18       22  2021-03-17 00:27:55     count            [scaler1, noisy]
19       21  2021-03-17 00:27:23     count                   [scaler1]

There are many options for listruns(). Show them using the listruns? syntax:

[7]:
listruns?
Signature:
listruns(
    cat=None,
    keys=None,
    missing='',
    num=20,
    printing='smart',
    reverse=True,
    since=None,
    sortby='time',
    tablefmt='dataframe',
    timefmt='%Y-%m-%d %H:%M:%S',
    until=None,
    ids=None,
    **query,
)
Docstring:
List runs from catalog.

This function provides a thin interface to the highly-reconfigurable
``ListRuns()`` class in this package.

PARAMETERS

cat
    *object* :
    Instance of databroker v1 or v2 catalog.
keys
    *str* or *[str]* or None:
    Include these additional keys from the start document.
    (default: ``None`` means ``"scan_id time plan_name detectors"``)
missing
    *str*:
    Test to report when a value is not available.
    (default: ``""``)
ids
    *[int]* or *[str]*:
    List of ``uid`` or ``scan_id`` value(s).
    Can mix different kinds in the same list.
    Also can specify offsets (e.g., ``-1``).
    According to the rules for ``databroker`` catalogs,
    a string is a ``uid`` (partial representations allowed),
    an int is ``scan_id`` if positive or an offset if negative.
    (default: ``None``)
num
    *int* :
    Make the table include the ``num`` most recent runs.
    (default: ``20``)
printing
    *bool* or ``"smart"``:
    If ``True``, print the table to stdout.
    If ``"smart"``, then act as shown below.
    (default: ``True``)

    ================  ===================
    session           action(s)
    ================  ===================
    python session    print and return ``None``
    Ipython console   return ``DataFrame`` object
    Jupyter notebook  return ``DataFrame`` object
    ================  ===================

reverse
    *bool* :
    If ``True``, sort in descending order by ``sortby``.
    (default: ``True``)
since
    *str* :
    include runs that started on or after this ISO8601 time
    (default: ``"1995-01-01"``)
sortby
    *str* :
    Sort columns by this key, found by exact match in either
    the ``start`` or ``stop`` document.
    (default: ``"time"``)
tablefmt
    *str* :
    When returning an object, specify which type
    of object to return.
    (default: ``"dataframe",``)

    ========== ==============
    value      object
    ========== ==============
    dataframe  ``pandas.DataFrame``
    table      ``str(pyRestTable.Table)``
    ========== ==============

timefmt
    *str* :
    The ``time`` key (also includes keys ``"start.time"`` and  ``"stop.time"``)
    will be formatted by the ``self.timefmt`` value.
    See https://strftime.org/ for examples.  The special ``timefmt="raw"``
    is used to report time as the raw value (floating point time as used in
    python's ``time.time()``).
    (default: ``"%Y-%m-%d %H:%M:%S",``)
until
    *str* :
    include runs that started before this ISO8601 time
    (default: ``2100-12-31``)
``**query``
    *dict* :
    Any additional keyword arguments will be passed to
    the databroker to refine the search for matching runs
    using the ``mongoquery`` package.

RETURNS

object:
    ``None`` or ``str`` or ``pd.DataFrame()`` object

EXAMPLE::

    TODO

(new in release 1.5.0)
File:      ~/Documents/projects/BCDA-APS/apstools/apstools/utils/list_runs.py
Type:      function
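
The timefmt option is an ordinary strftime pattern applied to the floating-point time stored in a document. As a small sketch of that conversion, the code below formats the start time of scan 192 (taken from its start document, shown elsewhere in this guide) with the default pattern. The fixed offset of 5 hours behind UTC is an assumption about the clock of the workstation that recorded the run.

```python
import datetime

start_time = 1621455723.0701044  # 'time' key from the start document
timefmt = "%Y-%m-%d %H:%M:%S"  # the listruns() default

# Assumption: local time was 5 hours behind UTC when the run was recorded.
local_zone = datetime.timezone(datetime.timedelta(hours=-5))

dt = datetime.datetime.fromtimestamp(start_time, tz=local_zone)
formatted = dt.strftime(timefmt)
print(formatted)
```

With timefmt="raw", as the docstring describes, the floating-point value would be reported without this conversion.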

Search within a range of dates#

Q: What are the runs between certain dates?

To find runs that started since a particular date, use listruns(since="yyyy-mm-dd hh:mm"), where yyyy-mm-dd hh:mm shows the general form of the value. You only need to supply the parts that matter, so “2:00” would find all runs that started after 2 AM today.

Here, we look for runs since the beginning of April 2021. (Because the date is incomplete, it is implied that the full specification is 2021-04-01 00:00:00.0000000.)

[8]:
listruns(since="2021-04")
catalog: class_2021_03
   scan_id                 time plan_name                   detectors
0      192  2021-05-19 15:22:03  rel_scan  [scaler1, adsimdet, zaxis]
1      191  2021-05-19 15:19:27  rel_scan         [scaler1, adsimdet]
2      190  2021-05-19 15:18:59  rel_scan                   [scaler1]
3      189  2021-05-19 15:18:28  rel_scan      [scaler1, temperature]
4      188  2021-05-19 15:14:56  rel_scan                   [scaler1]
5      187  2021-05-19 15:13:47  rel_scan                   [scaler1]
6       33  2021-04-07 15:55:09      scan                     [noisy]
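
How an incomplete date such as "2021-04" is completed can be sketched with the standard library. This is not the parser that databroker uses. It is a minimal illustration, with an invented function name (expand_partial_date), that handles only date-first forms and does not handle a time-only value such as "2:00".

```python
import datetime

# Candidate formats, from most to least specific.
FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d %H:%M",
    "%Y-%m-%d",
    "%Y-%m",
    "%Y",
)


def expand_partial_date(text):
    """Fill the missing parts of a partial date with their earliest values."""
    for fmt in FORMATS:
        try:
            # strptime() supplies day=1 and 00:00:00 for any omitted fields
            return datetime.datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next, less specific format
    raise ValueError(f"unrecognized date: {text!r}")


print(expand_partial_date("2021-04"))
print(expand_partial_date("2021-05-19 15:22"))
```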

Likewise, find runs that started before a particular date and time using listruns(until="yyyy-mm-dd hh:mm").

Here, the specification will include any date until (before) the beginning of May 2021. Only the 20 most recent matches are listed, since num=20 by default.

[9]:
listruns(until="2021-05")
catalog: class_2021_03
    scan_id                 time plan_name               detectors
0        33  2021-04-07 15:55:09      scan                 [noisy]
1        33  2021-03-17 00:32:55     count              [adsimdet]
2        32  2021-03-17 00:31:44     count           [temperature]
3        31  2021-03-17 00:31:24     count           [temperature]
4        30  2021-03-17 00:31:23     count           [temperature]
5        29  2021-03-17 00:31:23     count           [temperature]
6        28  2021-03-17 00:30:33  rel_scan                 [noisy]
7        27  2021-03-17 00:30:27  rel_scan                 [noisy]
8        26  2021-03-17 00:30:21  rel_scan                 [noisy]
9        25  2021-03-17 00:30:04  rel_scan                 [noisy]
10       24  2021-03-17 00:29:57  rel_scan                 [noisy]
11       23  2021-03-17 00:29:40  rel_scan                 [noisy]
12       22  2021-03-17 00:27:55     count        [scaler1, noisy]
13       21  2021-03-17 00:27:23     count               [scaler1]
14       20  2021-03-17 00:27:22     count               [scaler1]
15       19  2021-03-16 17:06:01      scan  [scaler1, temperature]
16       18  2021-03-16 17:05:51      scan  [scaler1, temperature]
17       17  2021-03-16 17:05:42      scan  [scaler1, temperature]
18       16  2021-03-16 17:05:32      scan  [scaler1, temperature]
19       15  2021-03-16 17:05:03      scan  [scaler1, temperature]

You can combine them:

[10]:
listruns(since="2021-04", until="2021-05")
catalog: class_2021_03
   scan_id                 time plan_name detectors
0       33  2021-04-07 15:55:09      scan   [noisy]
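
Following the listruns() documentation, since includes runs that started on or after the given time and until includes runs that started before it, so the combination selects a half-open interval. The sketch below applies that test in plain Python to a few rows copied from the first listing in this guide. It shows the selection logic only and does not call databroker.

```python
import datetime

# (scan_id, start time) pairs copied from the first listruns() table.
runs = [
    (192, "2021-05-19 15:22:03"),
    (187, "2021-05-19 15:13:47"),
    (33, "2021-04-07 15:55:09"),
    (33, "2021-03-17 00:32:55"),
    (32, "2021-03-17 00:31:44"),
]

since = datetime.datetime(2021, 4, 1)  # "2021-04"
until = datetime.datetime(2021, 5, 1)  # "2021-05"

# Keep runs with since <= start time < until.
selected = [
    (scan_id, started)
    for scan_id, started in runs
    if since <= datetime.datetime.fromisoformat(started) < until
]
print(selected)
```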

Search metadata keys#

Q: How to search for runs matching certain metadata?

Bluesky stores a run’s metadata in the start document. Any of its keys may be searched by adding a keyword argument to listruns(), where the keyword is the metadata key and the value is the content to match.

Since only a few metadata keys are standard (such as time and uid), you should be prepared for the possibility that any particular key may not be found.

Let’s take a look at the metadata of the most recent run in the catalog:

[11]:
run = cat[-1]
run.metadata
[11]:
{'start': Start({'beamline_id': 'APS_Python_training_2021',
 'detectors': ['scaler1', 'adsimdet', 'zaxis'],
 'hints': {'dimensions': [[['zaxis_h'], 'primary']]},
 'instrument_name': 'class_2021_03',
 'login_id': 'prjemian@zap',
 'motors': ['zaxis_h'],
 'notebook': 'UB_autosave',
 'num_intervals': 7,
 'num_points': 8,
 'objective': 'Demonstrate UB matrix save & restore',
 'pid': 3712584,
 'plan_args': {'args': ["PseudoSingle(prefix='', name='zaxis_h', "
                        "parent='zaxis', settle_time=0.0, timeout=None, "
                        "egu='', limits=(0, 0), source='computed', "
                        "read_attrs=['readback', 'setpoint'], "
                        'configuration_attrs=[], idx=0)',
                        -0.1,
                        0.1],
               'detectors': ["ScalerCH(prefix='gp:scaler1', name='scaler1', "
                             "read_attrs=['channels', 'channels.chan01', "
                             "'channels.chan01.s', 'channels.chan02', "
                             "'channels.chan02.s', 'channels.chan03', "
                             "'channels.chan03.s', 'channels.chan04', "
                             "'channels.chan04.s', 'time'], "
                             "configuration_attrs=['channels', "
                             "'channels.chan01', 'channels.chan01.chname', "
                             "'channels.chan01.preset', "
                             "'channels.chan01.gate', 'channels.chan02', "
                             "'channels.chan02.chname', "
                             "'channels.chan02.preset', "
                             "'channels.chan02.gate', 'channels.chan03', "
                             "'channels.chan03.chname', "
                             "'channels.chan03.preset', "
                             "'channels.chan03.gate', 'channels.chan04', "
                             "'channels.chan04.chname', "
                             "'channels.chan04.preset', "
                             "'channels.chan04.gate', 'count_mode', 'delay', "
                             "'auto_count_delay', 'freq', 'preset_time', "
                             "'auto_count_time', 'egu'])",
                             "MySimDetector(prefix='ad:', name='adsimdet', "
                             "read_attrs=['hdf1'], configuration_attrs=['cam', "
                             "'cam.acquire_period', 'cam.acquire_time', "
                             "'cam.image_mode', 'cam.manufacturer', "
                             "'cam.model', 'cam.num_exposures', "
                             "'cam.num_images', 'cam.trigger_mode', 'hdf1'])",
                             "MyZaxis(prefix='', name='zaxis', "
                             "settle_time=0.0, timeout=None, egu='', "
                             "limits=(0, 0), source='computed', "
                             "read_attrs=['h', 'h.readback', 'h.setpoint', "
                             "'k', 'k.readback', 'k.setpoint', 'l', "
                             "'l.readback', 'l.setpoint', 'mu', 'omega', "
                             "'delta', 'gamma'], "
                             "configuration_attrs=['energy', 'geometry_name', "
                             "'class_name', 'UB', 'reflections_details', 'h', "
                             "'k', 'l'], concurrent=True)"],
               'num': 8,
               'per_step': 'None'},
 'plan_name': 'rel_scan',
 'plan_pattern': 'inner_product',
 'plan_pattern_args': {'args': ["PseudoSingle(prefix='', name='zaxis_h', "
                                "parent='zaxis', settle_time=0.0, "
                                "timeout=None, egu='', limits=(0, 0), "
                                "source='computed', read_attrs=['readback', "
                                "'setpoint'], configuration_attrs=[], idx=0)",
                                -0.1,
                                0.1],
                       'num': 8},
 'plan_pattern_module': 'bluesky.plan_patterns',
 'plan_type': 'generator',
 'proposal_id': 'training',
 'scan_id': 192,
 'time': 1621455723.0701044,
 'uid': 'e3862991-688d-43dc-8442-d85ccbb3d6c8',
 'versions': {'apstools': '1.5.0rc1',
              'bluesky': '1.6.7',
              'databroker': '1.2.2',
              'epics': '3.4.3',
              'h5py': '3.2.1',
              'intake': '0.6.2',
              'matplotlib': '3.3.4',
              'numpy': '1.20.1',
              'ophyd': '1.6.1',
              'pyRestTable': '2020.0.3',
              'spec2nexus': '2021.1.8'}}),
 'stop': Stop({'exit_status': 'success',
 'num_events': {'baseline': 2, 'primary': 8},
 'reason': '',
 'run_start': 'e3862991-688d-43dc-8442-d85ccbb3d6c8',
 'time': 1621455727.087438,
 'uid': '70b9a95f-59aa-408d-8819-1463f3eebac5'}),
 'catalog_dir': None}

A plan’s name is stored in the run’s metadata as the plan_name key. (There is no guarantee that a run will have this metadata key.) To list runs measured with the count plan, use listruns(plan_name="count").

[12]:
listruns(plan_name="count")
catalog: class_2021_03
    scan_id                 time plan_name         detectors
0        33  2021-03-17 00:32:55     count        [adsimdet]
1        32  2021-03-17 00:31:44     count     [temperature]
2        31  2021-03-17 00:31:24     count     [temperature]
3        30  2021-03-17 00:31:23     count     [temperature]
4        29  2021-03-17 00:31:23     count     [temperature]
5        22  2021-03-17 00:27:55     count  [scaler1, noisy]
6        21  2021-03-17 00:27:23     count         [scaler1]
7        20  2021-03-17 00:27:22     count         [scaler1]
8         5  2021-03-15 11:54:56     count     [temperature]
9         4  2021-03-15 11:49:42     count     [temperature]
10        3  2021-03-15 11:46:21     count     [temperature]
11        2  2021-03-15 11:44:21     count     [temperature]
12        1  2021-03-15 00:52:29     count     [temperature]

You can search on more than one metadata key at once, such as any count run with scan_id 20.

[13]:
listruns(plan_name="count", scan_id=20)
catalog: class_2021_03
   scan_id                 time plan_name  detectors
0       20  2021-03-17 00:27:22     count  [scaler1]

Search metadata keys using MongoDB Query#

Q: How to search for a range of some metadata key?

Searching for one specific scan_id at a time is awkward when you want a range of runs. Since we know that scan_id is stored as a number, we can apply range limits (such as 30 <= scan_id < 100). That type of search requires the syntax from MongoDB Query.

A Query is built as a Python dictionary where the comparison operators are the keys and the comparison values are the corresponding values. These are the terms we need for this query:

========  =============
operator  MongoDB Query
========  =============
>=        "$gte"
<         "$lt"
========  =============

The full Query is scan_id={"$gte": 30, "$lt": 100}.

Here, search for scan_id matching that Query and a count plan:

[14]:
listruns(scan_id={"$gte": 30, "$lt": 100}, plan_name="count")
catalog: class_2021_03
   scan_id                 time plan_name      detectors
0       33  2021-03-17 00:32:55     count     [adsimdet]
1       32  2021-03-17 00:31:44     count  [temperature]
2       31  2021-03-17 00:31:24     count  [temperature]
3       30  2021-03-17 00:31:23     count  [temperature]

Note that the plan_name="count" expression here is equivalent to the MongoDB Query expression plan_name={"$eq": "count"}.
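
To make the meaning of these operators concrete, here is a small pure-Python sketch of how such a query could be evaluated against start documents. It is an illustration only: it is neither MongoDB nor the matcher used by databroker, it supports only a handful of comparison operators, and it simply rejects documents that lack a queried key. The name matches is invented for this example, and the documents are reduced copies of rows from the listings above.

```python
import operator

# The comparison operators supported by this sketch.
OPERATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(document, query):
    """Return True if document satisfies every term of query (terms are ANDed)."""
    for key, condition in query.items():
        if key not in document:
            return False  # simplification: a missing key never matches
        if not isinstance(condition, dict):
            condition = {"$eq": condition}  # a bare value means equality
        for op, target in condition.items():
            if not OPERATORS[op](document[key], target):
                return False
    return True


starts = [
    {"scan_id": 33, "plan_name": "scan"},
    {"scan_id": 33, "plan_name": "count"},
    {"scan_id": 32, "plan_name": "count"},
    {"scan_id": 31, "plan_name": "count"},
    {"scan_id": 30, "plan_name": "count"},
    {"scan_id": 29, "plan_name": "count"},
    {"scan_id": 28, "plan_name": "rel_scan"},
]

query = dict(scan_id={"$gte": 30, "$lt": 100}, plan_name="count")
found = [doc["scan_id"] for doc in starts if matches(doc, query)]
print(found)
```

All terms of a query must be satisfied together, which is why the scan run with scan_id 33 and the count run with scan_id 29 are both excluded.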

Search with a Filtered catalog#

As an alternative to listruns(), you can create a filtered catalog by applying MongoDB Query searches to an existing catalog.

Let’s start with a larger initial catalog to demonstrate.

[15]:
cat = databroker.catalog["training"].v2
print(f"There are {len(cat)} runs in the '{cat.name}' catalog.")
There are 8116 runs in the 'training' catalog.

Get runs from the count() plan#

Q: How many runs use the count plan?

To answer this, we must know how to apply the MongoDB Query to the catalog. The catalog has a .search(query) method, where the query term is a dictionary similar to the one used above. Instead of being given as a keyword argument, though, the metadata key is itself a key in the dictionary.

[16]:
query = {"plan_name":{"$eq": "count"}}
print(f"{query=}")
filtered_cat = cat.search(query)
print(f"There are {len(filtered_cat)} runs collected by the 'count()' plan.")
query={'plan_name': {'$eq': 'count'}}
There are 295 runs collected by the 'count()' plan.
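
The keyword form used with listruns() and the dictionary form used with .search() carry the same information. The sketch below shows the correspondence using a stand-in function, as_query, which is invented for this example and only collects its keyword arguments. It does not call databroker.

```python
def as_query(**kwargs):
    """Collect keyword arguments into a query dictionary."""
    return dict(kwargs)


keyword_style = as_query(plan_name="count", scan_id={"$gte": 30, "$lt": 100})
dict_style = {"plan_name": "count", "scan_id": {"$gte": 30, "$lt": 100}}
print(keyword_style == dict_style)

# A key that is not a valid Python identifier, such as a dotted key,
# cannot be typed as a keyword. It can still be passed by unpacking
# a dictionary with ** in the call.
dotted = as_query(**{"versions.apstools": "1.5.0rc1"})
print(dotted)
```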

Searches can get more detailed. Find all count runs with scan_id between 70 and 80, exclusive. (This catalog has duplicates in this range.) Display the filtered catalog using listruns().

[17]:
filtered_cat = cat.search({"plan_name":{"$eq": "count"}, "scan_id": {"$gt": 70, "$lt": 80}})
print(f"{len(filtered_cat)=}")
listruns(filtered_cat)
len(filtered_cat)=18
catalog: search results
    scan_id                 time plan_name                    detectors
0        79  2022-05-28 13:09:24     count                     [simdet]
1        78  2022-05-28 13:06:23     count                   [adsimdet]
2        77  2022-05-27 12:37:54     count                   [adsimdet]
3        76  2022-05-27 12:28:50     count                   [adsimdet]
4        75  2022-05-27 12:27:19     count                     [simdet]
5        74  2022-05-26 17:29:10     count                     [simdet]
6        73  2022-05-26 17:25:14     count                     [simdet]
7        72  2022-05-26 17:24:33     count                     [simdet]
8        71  2022-05-26 17:23:51     count                     [simdet]
9        79  2021-04-15 09:00:24     count  [scaler1, count_difference]
10       78  2021-04-15 09:00:22     count  [scaler1, count_difference]
11       77  2021-04-15 08:59:20     count  [scaler1, count_difference]
12       76  2021-04-15 08:59:15     count  [scaler1, count_difference]
13       75  2021-04-15 08:59:10     count  [scaler1, count_difference]
14       74  2021-04-15 08:59:04     count  [scaler1, count_difference]
15       73  2021-04-15 08:48:09     count                    [scaler1]
16       72  2021-04-15 08:48:05     count                    [scaler1]
17       71  2021-04-15 08:48:00     count                    [scaler1]
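
The duplicates are easy to confirm by counting. As a minimal sketch, the code below tallies the scan_id column copied from the listing above. In practice the values would be read from each run's start document.

```python
from collections import Counter

# scan_id column copied from the listing above, in the order shown.
scan_ids = [
    79, 78, 77, 76, 75, 74, 73, 72, 71,  # runs from 2022-05
    79, 78, 77, 76, 75, 74, 73, 72, 71,  # runs from 2021-04
]

counts = Counter(scan_ids)
duplicates = sorted(scan_id for scan_id, n in counts.items() if n > 1)
print(f"{len(scan_ids)} runs, {len(counts)} distinct scan_id values")
print(f"{duplicates=}")
```

Since a scan_id lookup such as cat[75] returns only the most recent match, the older run of each pair must be retrieved by its uid.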

Custom report#

Q: How to make a custom report?

The goal is a list of recent runs that used the scan (or rel_scan) plan, reporting these items:

  • plan

  • detectors

  • start position

  • end position

  • number of points

  • metadata

All of this information is available from the run but some of it is not presented in the metadata with individually-named keys. Because some of the information to be reported (specifically, arguments to the plan, including start and end position) is not easily extracted with MongoDB, custom code is needed; the listruns() function cannot be used.

Make a function that matches a pre-determined set of search terms and prints a report.

[19]:
import datetime
import pyRestTable
import time

def custom_list(cat, n_runs=20, start_md_keys=None):
    """
    List n_runs most recent, successful (rel_)scan runs.

    PARAMETERS

    cat *object*:
        Databroker catalog to be searched.
    n_runs *int*:
        Maximum number of runs to list in table. (default: 20)
    start_md_keys *[str]*:
        List of additional (start document) metadata keys to report.
        (default: 'beamline_id, instrument_name, login_id, proposal_id')
    """
    start_md_keys = start_md_keys or """
        beamline_id instrument_name login_id proposal_id
    """.split()

    ts = time.time() - 60*60*24*7*(52/2)  # ~6 months ago
    dt = datetime.datetime.fromtimestamp(ts)
    since = f"{dt.year}-{dt.month:02d}"  # start of that month
    print(f"{since=}")

    query = dict(
        plan_name={"$in": ["scan", "rel_scan"]},
        detectors=["noisy"],  # only this detector
    )
    query.update(databroker.queries.TimeRange(since=since))
    filtered_cat = cat.v2.search(query)
    print(f"{len(filtered_cat)=}")

    table = pyRestTable.Table()
    table.labels = """
        scan_id plan detectors motor start end n_points date uid7
    """.split()
    table.labels += start_md_keys

    def get_name_from_device_repr(text):
        """
        Dig out the name of the motor object from the 'plan_args'.

        "MyEpicsMotor(prefix='gp:m1', name='m1', ...)"
        From this string, only the `name='m1'` part is interesting here.
        """
        s = text.find("name='") + 6
        f = text[s:].find("'") + s
        return text[s:f]

    for uid in filtered_cat:
        run = filtered_cat[uid]

        success = run.metadata['stop']['exit_status'] == "success"
        if not success:
            continue

        uid7 = uid[:7]
        date = datetime.datetime.fromtimestamp(round(run.metadata['start']['time'], 3))
        scan_id = run.metadata['start']['scan_id']
        detectors = run.metadata['start']['detectors']
        # motors = run.metadata['start']['motors']
        plan = run.metadata['start']['plan_name']
        # n_points = run.metadata['start']['num_points']  # requested
        plan_args = run.metadata['start']['plan_args']['args']
        n_points = run.metadata['stop']['num_events']['primary']  # recorded

        # This data can't be extracted using only MongoDB Query
        p_start, p_end = plan_args[1:3]  # as defined in the bp.scan() plan
        # TODO: this assumes only 1 motor is scanned!
        motors = [get_name_from_device_repr(plan_args[0])]
        # Multi-axis scans are: motor, start, finish triples for each axis, in order

        # build the table row
        row = [
            scan_id,
            plan,
            ", ".join(detectors),
            ", ".join(motors),
            round(p_start, 5),
            round(p_end, 5),
            n_points,
            date,
            uid7,
        ]
        row += [run.metadata['start'].get(k, "") for k in start_md_keys]

        table.rows.append(row)
        if len(table.rows) >= n_runs:
            break

    print(table)

custom_list(cat, 20)
since='2022-05'
len(filtered_cat)=129
======= ======== ========= ===== ======== ======= ======== ========================== ======= ================ =========================== ============ ===========
scan_id plan     detectors motor start    end     n_points date                       uid7    beamline_id      instrument_name             login_id     proposal_id
======= ======== ========= ===== ======== ======= ======== ========================== ======= ================ =========================== ============ ===========
299     scan     noisy     m1    -1.2     1.2     21       2022-10-25 14:10:06.464000 3703202 Bluesky_training BCDA EPICS Bluesky training prjemian@zap training
2       rel_scan noisy     m1    -0.27632 0.27632 23       2022-10-12 15:23:32.336000 4704d7f Bluesky_training BCDA EPICS Bluesky training prjemian@zap training
1       rel_scan noisy     m1    -2       2       23       2022-10-12 15:23:26.220000 a5ada81 Bluesky_training BCDA EPICS Bluesky training prjemian@zap training
271     rel_scan noisy     m1    -0.56269 0.56269 21       2022-09-07 14:40:31.996000 c53365c Bluesky_training BCDA EPICS Bluesky training prjemian@zap training
270     rel_scan noisy     m1    -5       5       21       2022-09-07 14:40:03.057000 166dc23 Bluesky_training BCDA EPICS Bluesky training prjemian@zap training
269     rel_scan noisy     m1    -1.2     1.2     21       2022-09-07 14:39:39.521000 11542a9 Bluesky_training BCDA EPICS Bluesky training prjemian@zap training
268     rel_scan noisy     m1    -0.28913 0.28913 21       2022-09-07 14:38:30.682000 c51776e Bluesky_training BCDA EPICS Bluesky training prjemian@zap training
267     rel_scan noisy     m1    -1.2     1.2     21       2022-09-07 14:38:18.649000 4bb967f Bluesky_training BCDA EPICS Bluesky training prjemian@zap training
266     rel_scan noisy     m1    -1.2     1.2     21       2022-09-07 14:37:51.675000 7fe7229 Bluesky_training BCDA EPICS Bluesky training prjemian@zap training
265     rel_scan noisy     m1    -1.2     1.2     21       2022-09-07 14:37:32.813000 7969b6e Bluesky_training BCDA EPICS Bluesky training prjemian@zap training
263     scan     noisy     m1    -1.2     1.2     21       2022-09-07 11:27:13.747000 c7b10f1 Bluesky_training BCDA EPICS Bluesky training prjemian@zap training
260     scan     noisy     m1    -1.2     1.2     21       2022-06-29 09:42:54.977000 0783d90 Bluesky_training BCDA EPICS Bluesky training prjemian@zap training
259     scan     noisy     m1    -1.2     1.2     21       2022-06-29 09:40:13.666000 2a487dd Bluesky_training BCDA EPICS Bluesky training prjemian@zap training
10      rel_scan noisy     m1    -0.22238 0.22238 23       2022-05-24 15:05:32.791000 72faef4
9       rel_scan noisy     m1    -2       2       23       2022-05-24 15:05:16.337000 ad2bac4
8       rel_scan noisy     m1    -0.06626 0.06626 23       2022-05-24 15:05:11.307000 9102fe6
7       rel_scan noisy     m1    -0.08168 0.08168 23       2022-05-24 15:05:08.414000 dcafde6
6       rel_scan noisy     m1    -0.25194 0.25194 23       2022-05-24 15:05:02.797000 d57a79a
5       rel_scan noisy     m1    -2.1     2.1     23       2022-05-24 15:04:45.942000 c82643b
4       rel_scan noisy     m1    -0.17233 0.17233 23       2022-05-24 15:04:38.514000 318ae81
======= ======== ========= ===== ======== ======= ======== ========================== ======= ================ =========================== ============ ===========
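
The function above assumes a single scanned motor. One possible way to handle the multi-axis case noted in its comments is sketched here: a regular expression picks out the name from each device repr, and the argument list is split into (motor, start, end) triples. The function names device_name and scan_axes are invented for this sketch. The first repr is a shortened copy of the one in the metadata shown earlier, and the second axis is hypothetical.

```python
import re

# Match name='...' but not longer keywords that merely end in "name".
NAME_PATTERN = re.compile(r"\bname='([^']*)'")


def device_name(text):
    """Return the name='...' value from a device repr, or None if absent."""
    match = NAME_PATTERN.search(text)
    return match.group(1) if match else None


def scan_axes(plan_args):
    """Split [motor, start, end, motor, start, end, ...] into triples."""
    if len(plan_args) % 3 != 0:
        raise ValueError("expected (motor, start, end) triples")
    return [
        (device_name(plan_args[i]), plan_args[i + 1], plan_args[i + 2])
        for i in range(0, len(plan_args), 3)
    ]


plan_args = [
    "PseudoSingle(prefix='', name='zaxis_h', parent='zaxis', idx=0)",
    -0.1,
    0.1,
    "MyEpicsMotor(prefix='gp:m1', name='m1')",  # hypothetical second axis
    -1.2,
    1.2,
]
for motor, p_start, p_end in scan_axes(plan_args):
    print(f"{motor}: {p_start} to {p_end}")
```

Unlike the string search in get_name_from_device_repr(), device_name() returns None when no name is present, so a malformed entry can be detected rather than silently producing a wrong label.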


Other searches:#

TODO:

Some of these searches may need additional Python code to complete. Others may be expedited by additional MongoDB Query constructs.

  • What version of bluesky was used 6 months ago?

  • When was apstools version 1.2 - 1.4 used?

  • Find all runs with sample “xyz” measured with detector adsimdet. List the most recent ones.

    • Do we have any APS catalogs with sample name as metadata?

    • similar: listruns(detectors={"$in": ["adsimdet"]}, plan_name="count")

  • Restrict that list to runs in APS cycle 2021-3 (if no APS cycle info, the last 4 months of 2021).

  • Find runs where “Joe User” appears somewhere in the metadata.

  • Find runs where “silver behenate” appears somewhere in the metadata.

  • Find runs that failed for some reason. Is there any indication why? (run.metadata['stop']['reason'])

    listruns(
        cat,
        keys="uid scan_id stop.exit_status stop.reason",
        **{
            "stop.exit_status": {"$ne": "success"},  # FIXME: not working
        }
    )
    
  • Plot centroid and width of all successful scans of specific y vs. x with more than one data point for the last week.

  • What about these?

    • using start or stop document metadata

    • user

    • sample

    • Proposal or ESAF ID

    • fuzzy or misspelled terms

    • combination searches
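
Several of the questions above, such as finding failed runs or reading a version number, can be answered with plain Python once each run's metadata is in hand, in the same way that custom_list() checks exit_status. The sketch below works on ordinary dictionaries shaped like run.metadata. The three sample records and the helper name get_dotted are hypothetical, and the record with no stop document stands for a run that never finished, which is an assumption about how such runs appear.

```python
from collections.abc import Mapping


def get_dotted(document, dotted_key, default=""):
    """Look up 'a.b.c' in nested mappings, returning default if any part is missing."""
    value = document
    for part in dotted_key.split("."):
        if not isinstance(value, Mapping) or part not in value:
            return default
        value = value[part]
    return value


# Hypothetical metadata records, shaped like run.metadata.
runs = [
    {
        "start": {"scan_id": 1, "versions": {"bluesky": "1.6.7"}},
        "stop": {"exit_status": "success", "reason": ""},
    },
    {
        "start": {"scan_id": 2},
        "stop": {"exit_status": "fail", "reason": "motor limit"},
    },
    {"start": {"scan_id": 3}, "stop": None},  # assumption: run never finished
]

# Report every run that did not end with exit_status == "success".
failed = [
    (
        get_dotted(md, "start.scan_id"),
        get_dotted(md, "stop.exit_status", "unknown"),
        get_dotted(md, "stop.reason"),
    )
    for md in runs
    if get_dotted(md, "stop.exit_status", "unknown") != "success"
]
for scan_id, status, reason in failed:
    print(f"scan {scan_id}: {status} {reason!r}")

print(get_dotted(runs[0], "start.versions.bluesky"))
```

Filtering in Python reads every run in the catalog, so it is best applied to a catalog that a date range or start-document query has already narrowed.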