Online Operations Manual
Table of Contents
- Introduction
- Analysis diagram
- Programs used in this analysis
- gstlal_inspiral
- gstlal_inspiral_marginalize_likelihoods_online
- gstlal_ll_dq
- gstlal_ll_inspiral_event_plotter
- gstlal_ll_inspiral_event_uploader
- gstlal_ll_inspiral_pastro_uploader
- gstlal_ll_inspiral_trigger_counter
- scald_event_collector
- scald_metric_collector
- Kafka topics
- On disk layout
- HTTP traffic
- Important References and Resources
- Software and service stack
Introduction
The low-latency GstLAL-based compact binary analysis implements a mixed time-domain, frequency-domain filtering scheme to provide extremely low-latency gravitational-wave detection. Its purpose is to discover gravitational waves from merging neutron stars and black holes within seconds of the waves arriving at Earth.
There is an initial configuration procedure that must be executed first. During this stage, the template bank is decomposed into SVD bins and initial dist stats are computed. Once set up, the analysis is designed to run continuously throughout an observing period. This page provides an ‘as built’ overview of the entire analysis. If you are simply looking to start an analysis from scratch, there are step-by-step instructions in the rest of this manual starting with Configuration.
Analysis diagram
Below is a diagram relating the various workflows (dashed line boxes) and communication layers (HTTP, Kafka, File I/O) for a functioning low-latency compact binary search. You can click on the diagram to learn more about each component.
Programs used in this analysis
FIXME
- config
- doc
- source
- Disk I/O
SVD bank files. Multiple SVD bank files can be given per job, in order to analyze data from multiple IFOs. These should each correspond to the same SVD bin.
A reference PSD file. We use a file checked into the repo as a starting point, but always use the `--track-psd` option so that the PSD is periodically updated to reflect the current state of the detector noise. What reference PSD do we use? What time range does it correspond to? Is it updated throughout the run at all?
Ranking statistic input file. This contains likelihood-ratio ranking statistic data according to our signal and noise models, created by the `create_prior_dist_stats` jobs in the set-up stage. These files are kept in the `dist_stats` directory and follow the naming convention `{IFOs}-{SVD_GROUP_NUM}_GSTLAL_DIST_STATS-0-0.xml.gz` (the file-naming conventions in this list are collected in the sketch below). For injection jobs, this input file is taken from the one used by the non-injection job of the same SVD bin, so that the noise model is consistent between the non-injection/injection twin jobs. The rankingstat data is not updated or overwritten in this case (technically, counts are added internally, but they are overwritten by the next snapshot of the rankingstat file, which is what gets used for ranking-statistic evaluation).
Ranking stat PDF file. This is used to compute the FAP and FAR of triggers. This file is made by the `gstlal_inspiral_marginalize_likelihoods_online` job. It is kept in the `dist_stat_pdfs` directory and is named `{IFOs}-GSTLAL_DIST_STAT_PDFS-0-0.xml.gz`.
Time slide xml file. This is made in the Makefile using `lalapps_gen_timeslide`.
Filename to write ranking statistic data out to. When the output filename is set to be the same as the input filename (which is usually the case for the online analysis), this overwrites/updates the same file that was given as the ranking statistic input (i.e. the one made by `create_prior_dist_stats` in the `dist_stats` directory). For injection jobs, the rankingstat data is not written out anywhere because the collected background is meaningless.
Zero-lag ranking stat PDF. This is a histogram of the likelihood ratios of zero-lag triggers collected during filtering. It gets written at start-up and updated as the job runs. These go in the `zerolag_dist_stat_pdfs` directory and are named `{IFOs}-{SVD_GROUP_NUM}_GSTLAL_ZEROLAG_DIST_STAT_PDFS-0-0.xml.gz`.
Trigger files get written out to directories named `gracedb_uploads/{GPS_TIME_STAMP}/`. FIXME LINK TO TABLE DEFINITIONS
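To make the naming conventions above concrete, here is an illustrative helper (not part of the pipeline; the IFO string and SVD bin number are just example values) that maps one SVD bin to the paths described in this list, relative to the run directory:

```python
def per_bin_paths(ifos="H1L1V1", svd_bin="0918"):
    """Illustrative only: collect the file-naming conventions listed above."""
    return {
        # ranking statistic input/output (overwritten in place by the job)
        "dist stats": f"dist_stats/{ifos}-{svd_bin}_GSTLAL_DIST_STATS-0-0.xml.gz",
        # marginalized ranking stat PDF, shared by all bins
        "dist stat PDF": f"dist_stat_pdfs/{ifos}-GSTLAL_DIST_STAT_PDFS-0-0.xml.gz",
        # zero-lag likelihood-ratio histogram for this bin
        "zerolag PDF": (
            f"zerolag_dist_stat_pdfs/"
            f"{ifos}-{svd_bin}_GSTLAL_ZEROLAG_DIST_STAT_PDFS-0-0.xml.gz"
        ),
        # trigger files, grouped by GPS time stamp
        "triggers": "gracedb_uploads/{GPS_TIME_STAMP}/",
    }
```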
- HTTP requests
The job advertises its URL in a registry file in the top level of the analysis run directory. Data can be requested from the job via this URL; the requests are served using bottle.
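A toy sketch of that mechanism is below. It is not the job's actual web interface: the port, the served resource name, and the job number in the registry filename are placeholders (the registry naming convention `{JOB_NUM}_noninj_registry.txt` is described under the marginalize likelihoods job).

```python
import socket
import bottle

app = bottle.Bottle()

@app.route("/ranking_data.xml")                    # resource name is illustrative
def ranking_data():
    bottle.response.content_type = "application/xml"
    with open("ranking_data.xml", "rb") as f:      # current rankingstat snapshot
        return f.read()

host, port = socket.gethostname(), 8080            # placeholder port

# Advertise the URL in a registry file at the top of the run directory so that
# other jobs (e.g. the marginalize likelihoods job) can find this server.
with open("0000_noninj_registry.txt", "w") as f:   # JOB_NUM "0000" is a placeholder
    f.write(f"http://{host}:{port}\n")

bottle.run(app, host=host, port=port)
```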
- Kafka topics
Output topics: all scientific metric topics, data quality metric topics, latency metric topics, and monitoring topics. See https://gwsci.org/ops/diagrams#kafka.
Notes:
- Data source - what are all the different options for this? FIXME LINK TO OPS PAGE SOURCE OF TRUTH
- Channel names FIXME LINK TO OPS PAGE SOURCE OF TRUTH
- State channel names FIXME LINK TO OPS PAGE SOURCE OF TRUTH
- DQ channel names FIXME LINK TO OPS PAGE SOURCE OF TRUTH
- State vector on and off bits FIXME LINK TO OPS PAGE SOURCE OF TRUTH
- Shared memory partition FIXME LINK TO OPS PAGE SOURCE OF TRUTH
- FAR threshold for uploads FIXME LINK TO OPS PAGE SOURCE OF TRUTH
- Group, pipeline, search FIXME LINK TO OPS PAGE SOURCE OF TRUTH
- Labels FIXME LINK TO OPS PAGE SOURCE OF TRUTH
- Service URL (GraceDb, playground, test, etc.) FIXME LINK TO OPS PAGE SOURCE OF TRUTH
- config
- doc
- source
- Disk I/O
Input: text files containing the URL of a web server from which to retrieve likelihood data from a particular job. These are named `{JOB_NUM}_noninj_registry.txt` (there are also `{JOB_NUM}_inj_registry.txt` files for the injection gstlal_inspiral jobs, but these are not used by the marginalize likelihoods job) and are kept in the top level of the run directory.
Output: name of an xml file to write marginalized ranking statistic PDFs out to. This file is written to the `dist_stat_pdfs` directory with the naming convention `{IFOs}-GSTLAL_DIST_STAT_PDFS-0-0.xml.gz`. This histogram contains noise and zerolag triggers collected from all the inspiral jobs. These zerolag counts are used to apply the extinction model (simulate the clustering effect on the ranking statistic distribution based on the zerolag counts and apply that to the noise counts) when assigning FAP/FAR in the online configuration.
- HTTP requests
Registry files for all gstlal_inspiral (non-injection) jobs provide a URL to request data from. The ranking_data.xml files are requested from these URLs.
Notes:
- Ranking stats from each gstlal inspiral job are gathered via HTTP request.
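The collection step can be pictured with the following sketch (standard library only; the registry layout follows the conventions above, but the helper itself and the exact resource path are illustrative, not the program's code):

```python
import glob
from urllib.request import urlopen

def fetch_ranking_data(run_dir="."):
    """Read every non-injection registry file and fetch each job's ranking data."""
    payloads = {}
    for registry in sorted(glob.glob(f"{run_dir}/*_noninj_registry.txt")):
        with open(registry) as f:
            url = f.read().strip()                 # the registry holds the job's URL
        # The manual says the ranking_data.xml files are requested from these URLs;
        # the exact resource path is an assumption here.
        with urlopen(f"{url}/ranking_data.xml", timeout=10) as resp:
            payloads[registry] = resp.read()
    return payloads
```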
Ingest messages from the uploads and ranking stat Kafka topics and store the event info in a dictionary keyed by event GPS time and SVD bank bin. Handle the stored event messages: upload all of the auxiliary files and plots to the event on GraceDb. This includes:
- The ranking statistic data file (ranking_data.xml.gz)
- Ranking statistic plots (background (noise) PDF, injection (signal) PDF, zero lag (candidates) PDF, LR, likelihood ratio CCDF, horizon distance vs. time, rates). These are made by the functions in plots/far.py
- PSD plots. These are made by the functions in plots/psd.py
- SNR timeseries plots
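The bookkeeping described above can be sketched as follows (the topic and field names here are assumptions, not the program's actual message schema):

```python
from collections import defaultdict

events = defaultdict(dict)                # keyed by (event GPS time, SVD bank bin)

def ingest(topic, payload):
    """Store one Kafka message and act once a candidate's messages are complete."""
    key = (payload["time"], payload["bin"])
    events[key][topic] = payload
    if {"uploads", "ranking_stat"} <= set(events[key]):
        handle(key, events.pop(key))      # make plots, upload files to GraceDb

def handle(key, data):
    """Placeholder for the real work: ranking data file, FAR/PSD plots, SNR series."""
```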
Notes:
- Kafka URL
- GraceDb group, pipeline, search and service URL to use
- No file outputs, unless `output-path` is set by the user, in which case plots are saved to disk in addition to being uploaded to GraceDb.
Aggregate candidate events received from the inspiral jobs and select a favored event for each event window (composite event aggregation is also supported). Send the favored event information (time, SNR, FAR, PSD and coinc files) to a Kafka topic. Upload the event to GraceDb and send a message with the GraceDB event ID, coinc file, and event time to the Kafka uploads topic.
Notes:
- Kafka URL and topic to consume messages from. We consume messages from the events topic which are sent by the inspiral jobs.
- GraceDb group, pipeline, search and service URL to use in uploading events.
- Trials factor on the FAR. The effective FAR threshold is the FAR threshold divided by the trials factor, where the trials factor usually corresponds to the number of independent online pipelines, i.e. 5 (CWB, GstLAL, MBTA, PYCBC, SPIIR). FIXME not used.
- Upload cadence - determines how long to wait between sending multiple events for the same event window.
- No file outputs.
- Produces Kafka messages to the `favored_events` and `uploads` topics.
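As a rough illustration of the upload-and-announce step, the sketch below creates a GraceDb event from a coinc file and publishes the resulting event ID to the uploads topic. The group/pipeline/search/service-URL values and the message field names are placeholders; the real values come from the analysis configuration.

```python
import json
from ligo.gracedb.rest import GraceDb     # GraceDb REST client
from kafka import KafkaProducer           # kafka-python

def upload_favored_event(coinc_path, event_time, analysis_tag, kafka_server,
                         group="CBC", pipeline="gstlal", search="AllSky",
                         service_url="https://gracedb-playground.ligo.org/api/"):
    """Illustrative only: upload a coinc file and announce it on the uploads topic."""
    client = GraceDb(service_url)
    resp = client.createEvent(group, pipeline, coinc_path, search=search)
    graceid = resp.json()["graceid"]

    producer = KafkaProducer(
        bootstrap_servers=kafka_server,
        value_serializer=lambda d: json.dumps(d).encode(),
    )
    with open(coinc_path) as f:
        coinc = f.read()
    # Field names are assumptions; the manual only says the message carries the
    # GraceDB event ID, the coinc file, and the event time.
    producer.send(f"gstlal.inspiral_{analysis_tag}.uploads",
                  {"gid": graceid, "time": event_time, "coinc": coinc})
    producer.flush()
    return graceid
```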
- config
- doc
- source
- Disk I/O
Input file: p(astro) model file, including some pre-computed data necessary to compute p(astro) values.
Input file: marginalized ranking stat PDF - this file is read in every four hours so that the p(astro) model can be updated with the latest ranking stat signal and noise model.
Output file: p(astro) model file - written out to disk every time the ranking stat information is updated.
- Kafka topics
Input topics: gstlal.<analysis_tag>.inj_uploads OR gstlal.<analysis_tag>.uploads
For each event consumed from these topics, upload `gstlal.p_astro.json` to the event on GraceDB and apply the label `PASTRO_READY` to the event.
Notes:
- Kafka URL
- GraceDb group, pipeline, search and service URL to use
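A minimal sketch of that GraceDb interaction, using the ligo.gracedb REST client (the log message and the example probabilities are illustrative; the filename and label come from the description above):

```python
import json
from ligo.gracedb.rest import GraceDb

def upload_p_astro(graceid, p_astro,
                   service_url="https://gracedb-playground.ligo.org/api/"):
    """Attach p(astro) to an existing GraceDb event and label it (illustrative)."""
    client = GraceDb(service_url)
    # Upload the probabilities as gstlal.p_astro.json with a short log message.
    client.writeLog(graceid, "GstLAL p(astro)",
                    filename="gstlal.p_astro.json",
                    filecontents=json.dumps(p_astro).encode())
    # Apply the PASTRO_READY label so downstream consumers know the file exists.
    client.writeLabel(graceid, "PASTRO_READY")

# e.g. upload_p_astro("G123456",
#                     {"BNS": 0.9, "NSBH": 0.05, "BBH": 0.04, "Terrestrial": 0.01})
```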
Notes:
- Kafka URL and topic to consume messages from.
- Specify the output period, how often to write out the zero-lag counts histogram to disk.
- Bootstrap file. The program first tries to load the specified output file to get initial counts; if that file doesn't exist, it uses the bootstrap file to start up.
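That start-up choice amounts to something like this (a sketch; the real program then loads the chosen zero-lag counts file):

```python
import os

def initial_counts_path(output_path, bootstrap_path):
    """Pick which file to seed the zero-lag counts from: prefer the job's own
    previous output so a restart resumes where it left off, otherwise fall back
    to the bootstrap file."""
    return output_path if os.path.exists(output_path) else bootstrap_path
```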
Notes:
- YAML configuration file from the `web` directory; sets dashboard and plotting options, the data backend (InfluxDB, where metrics are stored), and schemas
- Kafka URI to get messages from
- Data type. Always triggers
- topics to subscribe to
- one schema per topic indicating metrics to aggregate
- No file outputs.
The metrics consumed here are produced to Kafka by the gstlal_inspiral jobs; see the update function in EyeCandy (part of the LLOIDTracker).
Notes:
- YAML configuration file from the `web` directory; sets dashboard and plotting options, the data backend (InfluxDB, where metrics are stored), and schemas (metrics to aggregate, e.g. FAR history or SNR history, along with the aggregation type (min, max), etc.)
- Kafka URI to get messages from, topics to subscribe to, one schema per topic indicating metrics to aggregate
- No file outputs.
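To illustrate the kind of reduction a schema describes (e.g. the max of the SNR history over fixed time spans), here is a standalone sketch; the real aggregation is performed by the scald collectors themselves, not by this function:

```python
from collections import defaultdict

def aggregate(samples, span=1.0, mode="max"):
    """Reduce (gps_time, value) samples to one value per `span`-second bucket."""
    buckets = defaultdict(list)
    for t, v in samples:
        buckets[int(t // span)].append(v)
    reducer = {"min": min, "max": max}[mode]
    return {k * span: reducer(vals) for k, vals in sorted(buckets.items())}

# aggregate([(0.1, 5.0), (0.7, 9.0), (1.2, 3.0)], mode="max") -> {0.0: 9.0, 1.0: 3.0}
```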
Kafka topics
Notes:
- <analysis tag> is a user-provided string, e.g. "mario_MDC04", which gives topic prefixes like "gstlal.inspiral_mario_MDC04"
- <inj> is either the optional string "inj_" or nothing, which distinguishes an injection run from a non-injection run
- <ifo> is a particular detector, e.g., “L1”, or “K1”
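Putting those placeholders together, a topic name for the common "gstlal.inspiral_" topics can be built like this (illustrative helper; it does not cover the testsuite topics):

```python
def topic(metric, analysis_tag, ifo=None, inj=False):
    """Build a topic name such as gstlal.inspiral_mario_MDC04.inj_L1_snr_history."""
    parts = []
    if inj:
        parts.append("inj")       # injection-run topics carry the inj_ prefix
    if ifo is not None:
        parts.append(ifo)         # per-detector topics carry the IFO, e.g. L1
    parts.append(metric)
    return f"gstlal.inspiral_{analysis_tag}." + "_".join(parts)

# topic("snr_history", "mario_MDC04", ifo="L1", inj=True)
#   -> "gstlal.inspiral_mario_MDC04.inj_L1_snr_history"
```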
Scientific metric topics:
- gstlal.inspiral_<analysis tag>.far_history:
- gstlal.inspiral_<analysis tag>.likelihood_history:
- gstlal.inspiral_<analysis tag>.inj_likelihood_history:
- gstlal.inspiral_<analysis tag>.snr_history:
- gstlal.inspiral_<analysis tag>.inj_snr_history:
- gstlal.inspiral_<analysis tag>.<ifo>_snr_history:
- gstlal.inspiral_<analysis tag>.inj_<ifo>_snr_history:
- gstlal.testsuite_<analysis tag>.<ifo>_psd:
- gstlal.inspiral_<analysis tag>.coinc:
Data quality metric topics:
- gstlal.inspiral_<analysis tag>.inj_<ifo>_inj_dqvectorsegments:
- gstlal.inspiral_<analysis tag>.<ifo>_dqvectorsegments:
- gstlal.inspiral_<analysis tag>.inj_<ifo>_dqvectorsegments:
- gstlal.testsuite_<analysis tag>.<ifo>_dqvectorsegments:
- gstlal.inspiral_<analysis tag>.inj_<ifo>_whitehtsegments:
- gstlal.inspiral_<analysis tag>.inj_<ifo>_inj_whitehtsegments:
- gstlal.inspiral_<analysis tag>.inj_<ifo>_inj_statevectorsegments:
- gstlal.inspiral_<analysis tag>.inj_<ifo>_statevectorsegments:
- gstlal.inspiral_<analysis tag>.<ifo>_statevectorsegments:
- gstlal.testsuite_<analysis tag>.<ifo>_statevectorsegments:
- gstlal.inspiral_<analysis tag>.<ifo>_strain_dropped:
- gstlal.inspiral_<analysis tag>.inj_<ifo>_strain_dropped:
- gstlal.inspiral_<analysis tag>.<ifo>_noise:
Latency metric topics:
- gstlal.inspiral_<analysis tag>.<ifo>_snrSlice_latency:
- gstlal.inspiral_<analysis tag>.inj_<ifo>_snrSlice_latency:
- gstlal.inspiral_<analysis tag>.<ifo>_datasource_latency:
- gstlal.inspiral_<analysis tag>.inj_<ifo>_datasource_latency:
- gstlal.inspiral_<analysis tag>.inj_latency_history:
- gstlal.inspiral_<analysis tag>.latency_history:
- gstlal.inspiral_<analysis tag>.<ifo>_whitening_latency:
- gstlal.inspiral_<analysis tag>.inj_<ifo>_whitening_latency:
- gstlal.inspiral_<analysis tag>.inj_all_itacac_latency:
- gstlal.inspiral_<analysis tag>.all_itacac_latency:
Event topics:
- gstlal.inspiral_<analysis tag>.favored_events:
- gstlal.inspiral_<analysis tag>.inj_events:
- gstlal.inspiral_<analysis tag>.uploads:
- gstlal.inspiral_<analysis tag>.events:
- gstlal.inspiral_<analysis tag>.p_astro:
- gstlal.inspiral_<analysis tag>.ranking_stat:
Monitoring topics:
- gstlal.inspiral_<analysis tag>.inj_ram_history:
- gstlal.inspiral_<analysis tag>.ram_history:
- gstlal.inspiral_<analysis tag>.inj_uptime:
- gstlal.inspiral_<analysis tag>.uptime:
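Any of the topics above can be monitored with a standard Kafka client. This sketch (using kafka-python; the broker address and analysis tag are placeholders) tails the latency history of a non-injection analysis:

```python
import json
from kafka import KafkaConsumer

ANALYSIS_TAG = "mario_MDC04"                      # placeholder analysis tag
TOPIC = f"gstlal.inspiral_{ANALYSIS_TAG}.latency_history"

consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers="kafka.example.org:9092",   # placeholder broker
    value_deserializer=lambda m: json.loads(m.decode()),
    auto_offset_reset="latest",
)
for message in consumer:
    # The payload schema is set by the producing gstlal_inspiral job and is not
    # reproduced here, so just print whatever arrives.
    print(message.topic, message.value)
```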
On disk layout
A shared file system is used to store configuration data, archives of trigger outputs, and occasionally to pass information between running jobs (though the low-latency information is typically passed via HTTP or Kafka).
- archive: empty?
- dtdphi: Makefile only?
- mass_model: H1L1V1-GSTLAL_MASS_MODEL-0-0.xml.gz: wrong file extension??? and Makefile
- profiles: empty?
- svd:
- cit_mario_online.yml
- ics_online.yml
- psd: H1L1V1-GSTLAL_REFERENCE_PSD-0-0.xml.gz Makefile
- influx_creds.sh
- bank: bbh_low_q.xml.gz bns.xml.gz imbh_low_q.xml.gz Makefile mario_bros_offline.xml.gz nsbh.xml.gz other_bbh.xml.gz
- svd_bank: empty???
- H1L1V1-GSTLAL_REFERENCE_PSD-0-0.xml.gz
- Makefile
- env.sh
- H1L1V1-GSTLAL_SVD_MANIFEST-0-0.json
- tisi.xml
- filter: contains e.g., svd_bank/H1-0358_GSTLAL_SVD_BANK-0-0.xml.gz
- nohup.out
- split_bank: contains e.g., H1L1V1-0191_GSTLAL_SPLIT_BANK_0577-0-0.xml.gz
- aggregator: contains e.g., /1/3/3/6/2/7/V1-PSD-1336279900-100.hdf5 do we need these at all?
- 13362: contains e.g., H1L1V1-0918_inj_mdc04_LLOID-1336273048-14933.xml.gz H1L1V1-0918_inj_mdc04_LLOID-1336273048-60.xml.gz H1L1V1-0918_inj_mdc04_SEGMENTS-1336273048-9028.xml.gz H1L1V1-0918_inj_mdc04_SEGMENTS-1336273108-0.xml.gz H1L1V1-0918_noninj_LLOID-1336273064-14864.xml.gz H1L1V1-0918_noninj_LLOID-1336273064-9.xml.gz H1L1V1-0918_noninj_LLOID-1336287922-14406.xml.gz H1L1V1-0918_noninj_LLOID_DISTSTATS-1336273064-10.xml.gz H1L1V1-0918_noninj_LLOID_DISTSTATS-1336273064-14865.xml.gz H1L1V1-0918_noninj_LLOID_DISTSTATS-1336287922-14407.xml.gz H1L1V1-0918_noninj_SEGMENTS-1336273064-9012.xml.gz H1L1V1-0918_noninj_SEGMENTS-1336273074-0.xml.gz H1L1V1-0918_noninj_SEGMENTS-1336287922-14399.xml.gz
- plots: contains e.g., COMBINED-GSTLAL_INSPIRAL_PLOT_BACKGROUND_ALL_NOISE_LIKELIHOOD_RATIO_CCDF_CLOSED_BOX-1336272983-114190.png
- config.yml
- web: contains e.g., inspiral.yml online_dashboard.json
- test-suite: is this the test suite dag? Is this the preferred way to run it? Can we point to test-suite specific documentation?
- dist_stat_pdfs: contains e.g., H1L1V1-GSTLAL_DIST_STAT_PDFS-0-0.xml.gz
- gracedb_uploads: contains e.g., 13372/H1L1V1-GSTLAL_0621_inj_mdc04_7_945_CBC_AllSky_0621_RankingData-1337299900-1.xml.gz 13372/H1L1V1-GSTLAL_0621_inj_mdc04_7_945_CBC_AllSky-1337299900-1.xml Where is pastro??
- dist_stats: contains e.g., H1L1V1-0915_GSTLAL_DIST_STATS-0-0.xml.gz
- logs
- zerolag_dist_stat_pdfs: contains e.g., H1L1V1-0406_GSTLAL_ZEROLAG_DIST_STAT_PDFS-0-0.xml.gz
HTTP traffic
FIXME
Important References and Resources
Software and service stack
The GstLAL online analysis relies on several open-source software libraries and services. Some of these are available in the gwsci container, but some are not; the following services are required beyond the software in the gwsci container:
- Kafka - used to stream data products for I/O between different processes
- InfluxDB - used to store metric data
- Grafana - used to visualize metric data
Additionally, there is an implicit assumption that you are deploying this analysis on an LDG-compatible site running HTCondor, with low-latency data services available (a few different flavors are supported).