Online Operations Manual
Table of Contents
Setting up an analysis
This page is a tutorial for deploying a GstLAL low-latency analysis. For onboarding, users can follow these instructions to launch an analysis with a small BBH-only template bank. The workflow for production analyses is the same.
1. Clusters and accounts
To launch your own analysis you can use your albert.einstein account at any LDG-compatible cluster.
Note: This tutorial is specific to an analysis on the Penn State cluster at ICDS. The production GstLAL analyses are run using shared accounts, see the overview page.
On CIT and ICDS the singularity bind paths are different. Bind paths are paths on the host filesystem that you want to still be available for reading and writing from within the container.
On ICDS we always bind /ligo, e.g.:
singularity run -B /ligo <build-dir>
On CIT you don’t need a bind path, but we often bind $TMPDIR, e.g.:
singularity run -B $TMPDIR <build-dir>
Additionally, on CIT you will need a different profile than what is included in the o4a-containers:main container used below.
FIXME: currently we don’t have a dedicated profile for people to launch test analyses on CIT.
2. Build
The environments needed to run GstLAL low-latency analyses are defined by docker containers managed in the various branches at https://git.ligo.org/gstlal/o4a-containers.
These containers include an installation of GstLAL and all the dependencies needed for running the online analysis.
Each branch contains a configuration file specific to a particular analysis type (eg BBH-only analysis, early warning analysis, and so on).
The main branch of o4a-containers can be used to start up the small BBH-only analysis.
The docker containers are defined by the Dockerfile in each branch and are regularly published by the gitlab CI to stay up to date with the branch of GstLAL they are based on.
The container defined by the main branch is based off of the GstLAL master branch.
To build the docker image, run the following:
singularity build --fix-perms --sandbox <build-dir> docker://containers.ligo.org/gstlal/o4a-containers:main
This will pull a writable container into the directory specified by <build-dir>.
Note: we use singularity to pull and run containers instead of docker. They are mostly interchangeable, but docker requires root privileges, which the average user won’t have on LDG clusters.
By making the image writable you will be able to make changes to the code installed in the image.
This is useful if you plan to do any dev work with the image.
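For example, to get an interactive, writable shell inside the sandbox for development work, a minimal sketch (the -B /ligo bind follows the ICDS convention above; adjust for your cluster):
singularity shell --writable -B /ligo <build-dir>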
3. Set up
Now that you have a build ready to use, it’s time to set up a run directory for your online analysis.
We will refer to this directory path as <run-dir>.
All of the necessary files to get started are included in the build.
Copy them over using this command:
cp -r <build-dir>/online-analysis/* <run-dir>
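As a quick sanity check, you can list the copied files. The exact contents may differ between branches; at a minimum you should see the files referenced in the following steps (this listing is an assumption based on those steps, not an exhaustive manifest):
ls <run-dir>
# expect to see, among other files: config.yml, profiles/, web/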
4. Authentication
Next, you need to set up influx credentials. InfluxDB is used to store data from the analysis, mainly for monitoring purposes. More information on InfluxDB can be found on the Monitoring page.
Choose a username and password, as well as a name for your influx database. These will be added to configuration files in the next section.
You will need to send your database name and Influx username/password combination to an Influx admin so that they can set up your new database:
INFLUX_USERNAME: <ANALYSIS SPECIFIC>
INFLUX_PASSWORD: <ANALYSIS SPECIFIC>
If you have a Grafana dashboard, you also need to send the database name:
INFLUX_USERNAME: <ANALYSIS SPECIFIC>
INFLUX_PASSWORD: <ANALYSIS SPECIFIC>
database_name: <ANALYSIS SPECIFIC>
If your analysis will run on CIT or PSU (ICDS), send them to ron.tapia, rebecca.ewing, or shomik.adhicary in a Mattermost direct message. If your analysis will run on UWM (Nemo), send them instead to duncan.meacher.
The admin will ensure that:
1. The Influx database is created.
2. The Influx user exists.
3. The Grafana reader, `gstlalreader`, can access the Influx database.
4. (If requested) Your user on [the grafana website](https://gstlal.ligo.caltech.edu/grafana) is an "editor" so that you can edit dashboards.
You can continue setting up the analysis while the database and credentials are being provisioned, but the main analysis can’t be launched (step 9) until this is ready.
If you are running the analysis from your personal account, you will also need to create a suitable LIGO proxy:
ligo-proxy-init albert.einstein
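If you want to confirm where the proxy was written (you will need its full path for X509_USER_CERT and X509_USER_KEY in the next step), a minimal sketch assuming the standard default proxy location:
ls -l /tmp/x509up_u$(id -u)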
5. Configuration
After copying files from the build in the Set up step above, you will have a config.yml which defines all of the input options of the analysis.
At a minimum you will need to change the following in your config.yml:
- tag: a unique string to identify your analysis
- accounting-group-user: your albert.einstein name
- singularity-image: your <build-path>
- INFLUX_USERNAME and INFLUX_PASSWORD: the Influx credentials you made above
- X509_USER_CERT and X509_USER_KEY: the full path to your x509 proxy
If you know what you’re doing you can change other options in the config, but it’s not necessary.
Add the name of your influx database to the web/inspiral.yml:
db: REPLACE THIS WITH YOUR DB NAME
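A quick, hedged check that the required fields were actually updated (the key names here are the ones listed above; adjust them if your config.yml uses different names):
grep -E 'tag:|accounting-group-user:|singularity-image:|INFLUX_USERNAME|X509_USER' config.yml
grep 'db:' web/inspiral.yml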
6. Generate Makefile
Now that your configuration is set you can generate a Makefile which will be used for the remaining steps in the tutorial.
singularity exec -B /ligo <build-dir> gstlal_ll_inspiral_workflow init -c config.yml
Profiles include specifications for the execute nodes your jobs will run on. They need to be installed on each cluster once. Check if your profile is installed:
singularity exec -B /ligo <build-dir> gstlal_grid_profile list
If it’s not, install it by running the following:
singularity exec -B /ligo <build-dir> gstlal_grid_profile install profiles/<profile.yml>
Note: You should ONLY use the profile which was included in the container you pulled. This is because resources are allocated specifically for each analysis type.
7. Pre-generated data products
You will need to download a template bank, mass model, reference PSD, and pastro model file.
The DCC document number and version are stored in the config.yml.
For the small BBH analysis, the DCC page is here: https://dcc.ligo.org/T2300144
Download the files by running the following:
source /cvmfs/oasis.opensciencegrid.org/ligo/sw/conda/etc/profile.d/conda.sh && conda activate igwn
make dcc-files
conda deactivate
Note: You will be prompted for your albert.einstein username and password in order to download the files.
8. Launch set up DAG
In this step you will generate a DAG which is used to create the following data products:
- Time slide files
- Split bank files (i.e., splitting the full template bank into smaller regions of parameter space, for use in generating SVD bins)
- SVD bank decomposition
- Ranking data files
To generate the DAG:
singularity exec -B /ligo <build-dir> make setup
And then launch the DAG:
condor_submit_dag online_setup_<analysis_tag>.dag
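You can monitor the setup DAG while it runs, for example by tailing its dagman log (mirroring the monitoring command shown later for the analysis DAG):
tail -f online_setup_<analysis_tag>.dag.dagman.out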
9. Launch analysis DAG
After the set up DAG completes, it’s time to generate and launch the main analysis DAG.
singularity exec -B /ligo <build-dir> make dag
And launch the DAG:
condor_submit_dag online_inspiral_<analysis_tag>.dag
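To confirm that the analysis jobs are running, you can check the condor queue (the same command used in the relaunch instructions below):
condor_q -dag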
10. Setup Grafana monitoring
FIXME
(Optional) Launch GW low-latency test suite
The Integrated Test Suite is a way to monitor the injection recovery of a low-latency analysis. This gives us a sense of the scientific validity of the analysis (for example, that sky localizations and source classifications match the expectations) as well as a measure of the sensitivity (for example, the surveyed volume-time, or SNR missed-found plots.) Monitoring the Test Suite dashboard can help us to identify problems that may arise with the analysis.
Once you have a low-latency analysis running (which processes injections and uploads them to GraceDB), you can follow the instructions in the Test Suite config repo to launch a Test Suite DAG and corresponding Grafana dashboard.
Monthly update of live injection sets
The injection cache frame is updated monthly on streaming (see section 9, low-latency injections, for details). To match up the live-injection events in each month, we need to update our injection table in the test suite. Therefore, at the end of each month, the injection database needs to be updated with the next month’s injection file. The injection files for all months (from July 2024 to April 2025) have been converted from HDF5 format to XML format and placed in ~/<analysis-dir>/test-suite/injections/ for Edward, Jacob, Rick, and Bob. If you want to get the original HDF5 files generated by Reed, you can find them at /scratch/reed.essick/rates+pop/rpo4-injections/online-injections/.
Move all the submit files and DAG-related files to the backup folder:
mkdir backup/backup_YYYYMMDD
mv *sub test_suite_* backup/backup_YYYYMMDD
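A small sketch that avoids typing the date twice (assuming you want today's date in the backup directory name):
backup_dir=backup/backup_$(date +%Y%m%d)
mkdir -p "$backup_dir"
mv *sub test_suite_* "$backup_dir"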
Update the injection database in config.yml:
injections: injections/rpo4-<year>_<month>-1401235456-2591232/rpo4-<year>_<month>_000-1401235456-2591232.xml.gz
e.g. for July 2024, the format is rpo4-2024_07-1401235456-2591232/rpo4-2024_07_000-1401235456-2591232.xml.gz
Remake and launch the dag
make launch
Relaunch Online Analysis
The low-latency analysis should be removed once per week in order to compress files (reducing memory usage) and update the credentials. For this, or any other time you want to remove and relaunch the analysis, follow these steps:
cd <run-dir>
condor_q -dag
This will give you the DAG ID of your analysis. Then:
condor_rm <dag-id>
To re-launch:
condor_submit_dag online_inspiral_<analysis_tag>.dag
You can tail the output of the DAG to check for possible failures:
tail -f online_inspiral_<analysis_tag>.dag.dagman.out
Update Online Analysis after SCCB Approval (as of O4a)
Tagging the production branch
After SCCB approval for your changes, you must make a new tag of the running branch.
To do so, head to the tags page and click New Tag in the top right. Then name your tag and set the origin to the hash of the last commit on the branch you want to tag.
We have been using the convention MAJOR_CHANGE.FEATURE_UPDATE.BUG_FIX for the 0.0.0 tag numbers, with an appropriate string in front. For example, when adding the dtdphi plotter feature, the o4a branch went from allsky-o4a-online-v1.0.4 to allsky-o4a-online-v1.1.0, while adding a bugfix to reading frames from disk moved the tag from allsky-o4a-online-v1.0.1 to allsky-o4a-online-v1.0.2.
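If you prefer the command line over the web UI, an equivalent sketch using git directly (hypothetical tag and commit hash, assuming you have push access to the repository):
git tag allsky-o4a-online-v1.1.0 <commit-hash>
git push origin allsky-o4a-online-v1.1.0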
Updating the container repo
After tagging the branch, you also have to update the singularity container repo, so that the new changes will be reflected in the container you pull when you go to deploy your changes.
The O4a container repo is here. Choose the branch corresponding to the analysis you want to update. Changes to all-sky analyses have to be applied to both the Jacob and Edward branches.
To update the tag name, go to the appropriate branch, and change the first line in the Dockerfile to point to the new tag. In our previous example of adding the dtdphi plotter, the line would get changed as follows:
Original
FROM containers.ligo.org/lscsoft/gstlal:allsky-o4a-online-v1.0.4
Updated version
FROM containers.ligo.org/lscsoft/gstlal:allsky-o4a-online-v1.1.0
Changes to the container repo will propagate to the clusters every midnight, or you can force an update by running the CI pipeline.
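As an illustrative sketch (using the hypothetical tag names from the example above), the Dockerfile edit can be done with a one-line sed before committing and pushing to the branch:
sed -i 's|allsky-o4a-online-v1.0.4|allsky-o4a-online-v1.1.0|' Dockerfile
git commit -am "Point container at allsky-o4a-online-v1.1.0"
git push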
Re-deploying your changes
Now that all of the updates are in place, you may want to actually deploy your changes in the running analyses. To do so, first make a new singularity image in the builds directory using the new container you just changed. Builds should be named as ANALYSIS-TAG_GIT-HASH-OF-TAG, where GIT-HASH-OF-TAG is the shortened hash of the commit you just tagged. For example: gstlal_jacob_o4a_2cfb318a.
Then head to your analysis dir, change the singularity-image line in the config, and remake the DAG before relaunching. By remaking the DAG, the new singularity image will be used by all of the jobs.
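Putting the redeployment steps together, a minimal sketch (here <builds-dir>, <branch>, and the build name gstlal_jacob_o4a_2cfb318a are placeholder values; substitute your own):
# pull the updated container for your analysis branch into a new build directory
singularity build --fix-perms --sandbox <builds-dir>/gstlal_jacob_o4a_2cfb318a docker://containers.ligo.org/gstlal/o4a-containers:<branch>
cd <run-dir>
# edit the singularity-image line in config.yml to point at the new build, then:
singularity exec -B /ligo <builds-dir>/gstlal_jacob_o4a_2cfb318a make dag
# remove the running DAG first (see the Relaunch Online Analysis section), then resubmit:
condor_submit_dag online_inspiral_<analysis_tag>.dag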
Update the tag history in the Gantt chart
We keep track of the tag history in the Gantt chart under the gstlal namespace of GitLab, i.e. here. Anyone who has deployed a new tagged build is responsible for updating the Gantt chart.
- Add a new epic associated with the new tag in a parent epic for each of all-sky, SSM, and Early Warning. Make sure the new epic has the o4-code-tags label, otherwise it won’t show up in the Gantt chart.
- Paste a link to the relevant SCCB ticket and set the start date to be the day when the build gets deployed.
- Set the end date of the epic with the previous build to be the same day or the day before.
- Double check that the Gantt chart is updated appropriately.