SAGUARO: Time-domain Infrastructure for the Fourth Gravitational-wave Observing Run and Beyond

We present upgraded infrastructure for Searches After Gravitational waves Using ARizona Observatories (SAGUARO) during LIGO, Virgo, and KAGRA’s fourth gravitational-wave (GW) observing run (O4). These upgrades implement many of the lessons we learned after a comprehensive analysis of potential electromagnetic counterparts to the GWs discovered during the previous observing run. We have developed a new web-based target and observation manager (TOM) that allows us to coordinate sky surveys, vet potential counterparts, and trigger follow-up observations from one centralized portal. The TOM includes software that aggregates all publicly available information on the light curves and possible host galaxies of targets, allowing us to rule out potential contaminants like active galactic nuclei, variable stars, solar system objects, and preexisting supernovae, as well as to assess the viability of any plausible counterparts. We have also upgraded our image-subtraction pipeline by assembling deeper reference images and training a new neural-network-based real–bogus classifier. These infrastructure upgrades will aid coordination by enabling the prompt reporting of observations, discoveries, and analysis to the GW follow-up community, and put SAGUARO in an advantageous position to discover kilonovae in the remainder of O4 and beyond. Many elements of our open-source software stack have broad utility beyond multimessenger astronomy, and will be particularly relevant in the “big data” era of transient discoveries by the Vera C. Rubin Observatory.


INTRODUCTION
The discovery of a γ-ray burst (GRB) and a kilonova associated with gravitational waves (GWs) from the binary neutronstar (BNS) merger GW170817 ushered in a new era of multimessenger astronomy (Arcavi et al. 2017;Coulter et al. 2017; Corresponding author: Griffin Hosseinzadeh griffin0@arizona.edu* LSSTC Catalyst Fellow LIGO Scientific Collaboration et al. 2017c;Lipunov et al. 2017;Soares-Santos et al. 2017;Tanvir et al. 2017;Valenti et al. 2017).Among many other accomplishments, these observations confirmed that at least some GRBs come from neutronstar mergers (Goldstein et al. 2017;LIGO Scientific Collaboration et al. 2017b;Savchenko et al. 2017), revealed evidence of r-process nucleosynthesis in compact-object mergers (Chornock et al. 2017;McCully et al. 2017;Nicholl et al. 2017;Pian et al. 2017;Shappee et al. 2017;Smartt et al. 2017;Tanvir et al. 2017;Watson et al. 2019), and yielded the first measurement of the Hubble constant via the standard-siren method (LIGO Scientific Collaboration et al. 2017a).Despite this breakthrough, little is known observationally about what diversity exists among kilonovae, caused either by differences in the progenitor system or by viewing-angle effects.The handful of kilonovae discovered in association with GRBs, and therefore viewed nearly pole-on, hints at an intrinsic luminosity span of up to two orders of magnitude (Gompertz et al. 2018;Rastinejad et al. 2021Rastinejad et al. , 2022a;;Troja et al. 2022;Yang et al. 2022Yang et al. , 2023;;Gillanders et al. 2023;Levan et al. 2023).A larger sample of off-axis kilonovae, discovered in association with GW events, is needed to fill in this parameter space.
Our EM follow-up program, Searches After Gravitational waves Using ARizona Observatories (SAGUARO), surveys the localization regions of GW events by partnering with the Catalina Sky Survey (CSS), which routinely searches for near-Earth asteroids (Christensen et al. 2019).CSS's 1.5 m telescope on Mt.Lemmon, Arizona, surveys fixed 5 deg 2 fields with a typical unfiltered limiting magnitude of ∼21.5 mag.In the event of a GW alert requiring urgent follow-up, we have the ability to send a ranked list of fields to the CSS's scheduling software, which will interrupt normal survey operations on the 1.5 m to observe the GW localization as soon as possible.
Serendipitous CSS observations may also cover parts of GW localization regions without our request.
If we or others discover a plausible EM counterpart, we will trigger targeted follow-up observations under our approved program on any or all of the following observatories: 1. the 2×8.4 m Large Binocular Telescope on Mt.Graham, Arizona, USA, 2. the 6.5 m Magellan Telescopes at Las Campanas Observatory, Chile, 3. the 6.5 m MMT on Mt.Hopkins, Arizona, USA, 4. the 2.3 m Bok Telescope on Kitt Peak, Arizona, USA, 5. the 1.8 m Vatican Advanced Technology Telescope on Mt.Graham, Arizona, USA, and 6. the 1.5 m Kuiper Telescope on Mt.Bigelow, Arizona, USA.We also participate in active GW follow-up programs on several facilities worldwide, which protects our effort against bad weather at a single site and provides us the ability to obtain follow-up photometry and spectroscopy of potential kilonovae within minutes to hours of discovery.Lundquist et al. (2019) present an overview of the SAGUARO's infrastructure as it stood at the beginning of O3, and Paterson et al. (2021) present an analysis of the O3 followup effort, including 17 GW events with CSS data either triggered by our program or observed serendipitously as part of the normal near-Earth asteroid survey.We continue this series of papers by describing new and improved hardware and software infrastructure to be used by SAGUARO during LIGO, Virgo, and KAGRA's fourth observing run (O4), which started on 2023 May 24.In Section 2, we present our new target and observation manager (TOM), an interactive web front end and database back end for carrying out the project.In Section 3, we describe updates to our image-subtraction pipeline and machine-learning algorithm for real-bogus classification.In Section 4, we outline our future plans, including progress toward adding a new instrument to our program.In Section 5, we conclude by discussing our prospects for discovering the second kilonova during O4.All of our software, including installation instructions, is publicly available on GitHub1 and archived on Zenodo (Daly et al. 2023;Hosseinzadeh et al. 2023b;Hosseinzadeh & Shrestha 2023;Paterson et al. 2023;Rastinejad & Hosseinzadeh 2023).

TARGET AND OBSERVATION MANAGER
The complexity of our observing program and the necessity of responding in real time to alerts and incoming data require that we centralize all of our tools for planning observations, monitoring data, vetting targets, and reporting results in a single location that is accessible to all team members at all times.Current transient surveys and large follow-up programs use similar systems, including the Asteroid Terrestrial-impact Last Alert System (ATLAS) Transient Science Server (Smith et al. 2020)  Real-Time Alerts Final Catalog Gravitational-Wave Events in O3 Figure 1.Sankey diagram showing the outcome of real-time alerts and the provenance of astrophysical GW events during LIGO and Virgo's third observing run, using data from the Gravitational-Wave Candidate Event Database and the Gravitational Wave Open Science Center.Compact binary coalescence alerts are labeled by their highest probability source classification.Events with data-quality issues identified in real time are labeled as "retracted."Events with a probability of astrophysical origin pastro < 50% after further analysis are labeled as "marginal."Events with a false-alarm rate of FAR > 2 yr −1 after further analysis are not included in the final catalog (LIGO Scientific Collaboration & Virgo Collaboration 2021a; LIGO Scientific Collaboration et al. 2023).Only three confirmed and two marginal events containing an NS had alerts allowing follow-up in real time.SAGUARO covered a portion of the localization region for four of these events and discovered a single possible counterpart, which was unlikely to be a kilonova based on its luminosity (Paterson et al. 2021).An interactive version of this figure that includes counts and event names for each flow is available.We call our portal a target and observation manager (TOM), after Street et al. (2018).The SAGUARO TOM is a browserbased front end on top of a PostgreSQL 14 database, 2 built using the Django 3 -based TOM Toolkit (Lindstrom et al. 2022) and its affiliated packages.The TOM is hosted on a dedicated server located on the University of Arizona campus and is available over the web to all members of our team with login credentials.In the following sections, we describe the various aspects 2 https://www.postgresql.org/ 3https://www.djangoproject.com/ of our observing campaign that have now been integrated into the TOM.

Receiving GW Alerts
Our workflow begins with the receipt of an alert from the LIGO/Virgo/KAGRA Collaboration notifying us of a GW event.Starting in O4, the International Gravitational Wave Observatory Network4 has been sending alerts over Hopskotch,5 an Apache Kafka6 -based alert platform developed by the Scalable Cyberinfrastructure to support Multi-Messenger Astrophysics (SCIMMA) project. 7Two packages affiliated with the TOM Toolkit allow us to receive and handle these alerts.The tom-alertstreams app 8 provides a management command readstreams that listens for Hopskotch alerts and passes the alert contents to a user-defined handler function.The tom_nonlocalizedevents app 9 provides basic handler functions that store alert information in the TOM database and web templates to display information for a given GW event, including the contents of related circulars from the General Coordinates Network (GCN)10 retrieved via SCIMMA's Hermes message exchange service11 (Figure 2; see also Section 2.7.3).
We customized the alert handler functions in our TOM to provide additional functionality.We have the ability to send human-readable summaries of each event over short message service (SMS) through Twilio 12 and Slack through a custom app13 (Figure 3).We divide incoming alerts into five streams, with the conditions in parentheses evaluated in the following order: 1. test alerts (if the event name starts with "MS"), 2. subthreshold alerts (if significant=False), 3. burst alerts (if group='Burst'), 4. BBH alerts (if HasNS < 1% and NSBH < 1% and BNS < 1%), and 5. NS alerts (otherwise).Each user can choose which SMS streams to subscribe to via their profile page in the TOM, and streams 2-5 each have dedicated channels in our Slack workspace.NS alerts push notifications to all members of the channel with the @channel command.
The new Kafka alerts contain multiorder coverage (MOC; Fernique et al. 2014) maps of the localization area in the Hierarchical Equal Area isoLatitude Pixelization (HEALPix) format (Górski et al. 2005).The tom_nonlocalizedevents package stores these maps in the TOM database with the help of HEALPix Alchemy (Singer et al. 2022).For alerts containing a new sky map, including maps generated by external coincidence with a GRB, we calculate and store the innermost credible region (among 25%, 50%, 75%, 90%, and 95%) that contains the coordinates of each linked target (see Section 2.4) and the center of each survey field.For computational efficiency, we use the center of the field as a proxy for all future sources we discover in that field.We also calculate and store the 90% credible region contour itself, which we use for display (see Section 2.2).Last, we calculate and store the probability contained within the footprints of each of our survey fields, which we use to prioritize our observation requests.This last calculation requires rasterization of the MOC map using healpy (Zonca et al. 2019) and ligo.skymap(Singer & Price 2016).

Planning and Triggering a Search
Once we decide to follow up a GW event, we use the TOM to plan our survey and send instructions to the CSS 1.5 m telescope.CSS observes in groups of 12 adjacent fields as part of their near-Earth asteroid survey, so we generate survey plans using the following algorithm: starting with the highest prob-ability field in the localization region, we select the next highest probability adjacent field that is observable between now and the next sunrise (using 12 • twilight), taking into account an airmass limit of 1.75 (to allow time for the full 12-field pattern to complete) and a phase-dependent moon exclusion region of (3 + 42ϕ) • while the moon is above the horizon, where ϕ ∈ [0, 1] is the moon phase.We use the Astropy coordinates package (Astropy Collaboration 2022) for visibility calculations and Astroplan (Morris et al. 2018) to look up the moon phase.We iterate, excluding fields for which we do not have reference images (see Section 3.1), until we reach 12 fields (60 deg 2 ).Depending on the size and position of the localization region, we may generate and submit multiple sets of 12 fields.To visualize our survey plans, we embed version 3 of the Aladin Lite sky viewer (Baumann et al. 2022), which natively reads HEALPix maps, into our TOM's survey planning page.An example is shown in Figure 4.
Survey plans can be manually edited by selecting or deselecting fields using checkboxes in the table.Users can then click the yellow "Submit" button on the survey planning page to submit the list of fields to CSS.This button opens one or more files on CSS's server using Paramiko14 and writes lists of 12 field names.These fields are then ingested by CSS's scheduling software and observed as soon as they are visible.Users can also click the teal "Export" button to download a copy of the field lists to their local computer.
As we have plans to expand to other telescopes (see Section 4), we have taken care to make our survey planning infrastructure as general as possible.Additional survey fields can be added for different facilities, including fields centered on galaxies for imagers with small fields of view.We plan to release this module as a tom_surveys app for the TOM Toolkit.The core of this app consists of the SurveyField and SurveyObservationRecord models, which are analogous to the Target and ObservationRecord models included with the base TOM Toolkit but without the connection to a single pair of coordinates implied by a "target."

Scanning Survey Candidates
We copy all CSS images to our data-reduction server at 5 minute intervals, regardless of whether we requested them.In some cases, their ongoing near-Earth asteroid survey will overlap with GW localization regions by coincidence; we refer to these as "serendipitous" observations.Paterson et al. (2021) describe our data-reduction pipeline in detail; it performs image subtraction and detects new sources in the difference images, which we refer to as "survey candidates" because, at this stage, they may either be real transients or artifacts from the subtraction.We use the optimal image-subtraction algorithm of Zackay et al. (2016)  (called S corr ).We refer to this set of four images as "image quadruplets" below.Updates to the reference images and realbogus machine-learning classifier are discussed in Section 3.
Once detections are in the database, they can be viewed, filtered, and sorted in the TOM.The survey candidates page (Figure 5) shows the observed fields and candidate point sources overlaid on the localization region in the Aladin sky map for a chosen GW event, followed by a table of sources with their real-bogus scores (see Section 3.2) and image quadruplets (see Section 3.3).By default, we display sources within the 95% localization region of the event observed between 0 and 3 days after the merger time, sorted by decreasing "real" score, although further filtering and sorting is available via the form above the table.It is also possible to remove the filtering by localization region and view all detections from the previous night, as would be done for a typical all-sky time-domain survey.If a detection looks real, the user can link it to the GW event by clicking the blue "Link" button below its temporary target name, effectively bookmarking it for further vetting.

Vetting Potential Kilonovae
Motivated by the large number of contaminating sources found during searches following O3 merger events, we incorporate a number of publicly available tools into our infrastructure that automatically vet potential kilonovae, from both our own survey and others.In Rastinejad et al. (2022b), we evaluated the performance of each tool on potential O3 kilonovae and made recommendations for including automatic vetting in O4 follow-up.Here, we describe the components of our Kilonova Candidate Vetting package, 15 which implements the procedures described in our previous work (Rastinejad et al. 2022 b) for SAGUARO's O4 follow-up effort.
We have downloaded and ingested each of the catalogs we use for vetting into the TOM database, so that we can quickly run queries for large numbers of targets, and indexed each catalog table using the Quad Tree Cube (Q3C; Koposov & Bartunov 2006) PostgreSQL extension for efficient cone searches.The entire procedure below, with the exception of ATLAS forced photometry, has been wrapped into a single function, which is run automatically in two situations: when a button is pressed on the target detail page, useful for potential kilonovae from our own survey that already exist in the TOM, or when a new target is added through the TOM web interface, useful for potential kilonovae reported by other surveys.Because ATLAS forced photometry can take several minutes, we run it asynchronously on the press of a separate button.

Point-source Matching
We begin by crossmatching each target with several pointsource catalogs to identify known variable sources (namely, quasars, active nuclei, and variable stars) that may masquerade as kilonovae.We perform a cone search on each catalog using a 2 ′′ radius, chosen to match the typical seeing of ground-based surveys, and then apply additional cuts based on the specification of each catalog.Where a match is determined, we display the angular offset, catalog probability score (where available), and source class (e.g., variable star type, where available) on the target detail page (see Figure 6).This page also includes a view of the field around the target via an embedded Aladin Lite viewer (version 2; Boch & Fernique 2014), which we can use to identify nearby point sources or false matches by eye.
To identify quasar matches, we utilize the Million Quasars Catalog (Milliquas; Flesch 2021) compilation, which includes quasars from SDSS DR16 (Ahumada et al. 2020).Following Flesch (2015) we rule out targets as potential kilonovae if they have a >97% probability of being a quasar.To search   2023) but then classified as a Type Ia supernova by Anand et al. (2023).The planned fields are listed in a table at the bottom of the page, with their probability, rank order in our plan, time requested (if any), and time first observable in UT.
for variable star contaminants, we employ three unique catalogs.First, we look for matches within the Gaia Data Release 3 variable star catalog (DR3; Gaia Collaboration 2023; Rimoldini et al. 2023; updated from using eDR3 in Rastinejad et al. 2022b).In addition, we crossmatch with the All-Sky Automated Survey for Supernovae (ASAS-SN; Jayasinghe et al. 2019) variable star catalog.We rule out all targets as kilonovae that have matches in either catalog.Finally, we search the Pan-STARRS1 (PS1) 3π Survey (Chambers et al. 2016) pointsource catalog (Tachibana & Miller 2018).The catalog provides a point-source probability score for nearly all sources in PS1 using a random-forest machine-learning (ML) algorithm (Breiman 2001).Following Tachibana & Miller (2018), we utilize an ML score cutoff of 0.83 to determine if matches are point sources and thus unrelated to the GW merger event.

Host-galaxy Matching
We incorporate five individual galaxy catalogs to further contextualize each target and determine if its distance and luminosity are consistent with expectations of a kilonova counterpart.In Rastinejad et al. (2022b) we demonstrated that host-galaxy information was powerful in determining if targets were associated with GW events, as it ruled out >30% of potential O3 kilonovae without the need for any follow-up observations.We begin by querying each catalog for a match within 3. ′ 44 of the target.This radius translates to a 100 kpc offset at a distance of 100 Mpc, consistent with the upper end of short GRB offsets (though the median offset lies ∼8 kpc; e.g., Fong et al. 2022).To determine the likelihood the target is associated with each source, we utilize the probability of chance coincidence method (P cc ; Bloom et al. 2002).P cc combines the galaxy's offset and magnitude, and is used to associate GRBs (including NS merger-origin GRBs) to host galaxies (e.g., Bloom et al. 2002).Where available, we calculate P cc using the galaxy's r-band magnitude, as it is generally the most widely available filter, though in cases where this filter is not provided we employ a band with similar wavelength coverage.We display the 10 galaxies with the lowest P cc values and P cc < 0.8 on each target's page and circle them in the Aladin viewer.For each potential host galaxy, we display its name, P cc , offset from the target, distance or redshift (depending on what the catalog provides), magnitude, and catalog source (see Figure 6, bottom right).
As we expect counterparts to an O4 GW event to originate in a low-redshift (z ≲ 0.05) galaxy, we employ three galaxy catalogs specifically designed for the local Universe.Using the method described above, we query the Gravitational Wave Galaxy Catalog (GWGC; White et al. 2011) The "Photometry" tab shows a plot of all known photometric detections and upper limits for this target, including public data from the ATLAS forced photometry server (obtained by pressing the teal "Get ATLAS phot."button), the Transient Name Server, and the Zwicky Transient Facility.
Because this target has been linked with GW candidate S230518h (note the chain icon under the target names), the time of merger of that event is indicated by a vertical line on the photometry plot.The "Host Galaxies" tab (inset) shows a table of potential host galaxies, ordered by increasing probability of chance coincidence (Pcc).These are also marked on the Aladin viewer at the bottom left of the page.Finally, we search for and display any additional survey photometry for each target.This survey data is useful for determining if the transient was detected prior to the merger (ruling it out as a counterpart) and provides information on the color and temporal evolution.We again employ a search radius of 2 ′′ to match the typical seeing at ZTF's Palomar Observatory.We automatically ingest photometry from the public ZTF alert stream (Bellm et al. 2019;Masci et al. 2019;Patterson et al. 2019) and from the Transient Name Server (TNS; Gal-Yam 2021).We also gather ATLAS data using their forced photometry service (Tonry et al. 2018a;Smith et al. 2020;Shingles et al. 2021), and stack and sigma-clip observations from the same night using the script they provide. 16Each of these streams provides photometry from difference images.

Coordinating Follow-up
Once the target has been vetted, rapid photometric and spectroscopic follow-up is crucial to confirm the association of the EM counterpart and gather early data for later analysis of the kilonova.The TOM Toolkit already includes the infrastructure to submit observations to all facilities in the Astrophysical Events Observatories Network (AEON; Street et al. 2020).Currently, this includes optical and near-infrared imagers and spectrographs on Las Cumbres Observatory (Brown et al. 2013), SOAR (Clemens et al. 2004;Schlawin et al. 2014), and Gemini (Hook et al. 2004;Eikenberry et al. 2006;Elias et al. 2006a,b), all of which have active GW follow-up programs in which we participate (see Section 1).
In our TOM, we have added the ability to submit observations to two imaging spectrographs on the MMT: Binospec in the optical (Fabricant et al. 2019) and the MMT and Magellan Infrared Spectrograph in the near-infrared (MMIRS; McLeod et al. 2012).The tom_mmt app 17 is an "observatory module" for the TOM Toolkit based on PyMMT (Wyatt et al. 2023;Shrestha et al. 2024), a recently released Python package that communicates with the application programming interface (API) of the MMT Observatory Manager (Gibson & Porter 2018).The tom_mmt module provides observation request forms (Figure 7, bottom), which are prefilled with default settings based on observations of AT 2017gfo (Coulter et al. 2017) and theoretical models of kilonovae (e.g., Kasen et al.   16 https://gist.github.com/thespacedoctor/86777fa5a9567b7939e8d84fd8cf6a76 17https://github.com/griffin-h/tommmt 2017), along with the most recent magnitude of the target in question.The module also adds the MMT to the TOM's airmass charts (Figure 7, top) and "Facility Status" page, allowing the user to quickly determine which instrument is available for the upcoming night.Last, it allows the user to check the status of requested observations without leaving the TOM.

Visualizing Data
Regardless of whether they were requested through the TOM or if they were obtained by us or others, we have the ability to upload reduced data products for a given target.They will then be displayed on interactive light-curve and spectral plots on the target detail page, produced using Plotly's Python graphing library. 18These allow us to keep track of the evolution of targets in real time and plan or cancel follow-up accordingly.We have tested this functionality by using it to manually aggregate GRB data from GCN circulars (e.g., Figure 8).However, we strongly encourage other observers to switch to machinereadable communications as soon as possible, so that they can be ingested into TOMs automatically (see Section 2.7.3).In the future, we hope to include preliminary analysis tools, such as spectral-line plotting and comparison to light-curve or spectral templates, directly in the TOM.

Treasure Map
The Gravitational Wave Treasure Map is a tool to coordinate, visualize, and assess the EM follow-up of GW events (Wyatt et al. 2020).We report CSS pointings within the 95% localization region of any "active" GW events to the Treasure Map.We consider a GW event to be active during a given pointing if 1. the event is astrophysically significant, 2. the event has not been retracted, and 3. the pointing occurred between 0 and 3 days after the most recent estimate of the merger time.Since the beginning of 2023 September, when we restarted our reporting, we have sent 287 reports of 283 serendipitous pointings for 12 BBH mergers to the Treasure Map (some observations apply to multiple events), corresponding to 1435 deg 2 of localization-region coverage.
To accomplish this, we have developed the tom_ treasuremap app, which, when used alongside the tom_ nonlocalizedevents (Section 2.1) and tom_surveys (Section 2.2) apps, provides a TreasureMapPointing model connecting a NonLocalizedEvent to a SurveyObservationRecord and labeling them with a Treasure Map pointing ID number and a status (planned or completed).When we submit observation requests to CSS, we automatically send these to the Treasure Map as planned pointings and store the pointing IDs in our database.The app also provides the management command report_pointings, which queries the database for completed observations (re- quested or serendipitous) in the localization regions of any active GW events and reports them to the Treasure Map.We run this command hourly via a cronjob on the TOM server.

Transient Name Server
The IAU Transient Name Server (TNS) is the database of record for the discovery and classification of astronomical transients (Gal-Yam 2021).When we discover real transients in the course of our GW follow-up survey, whether or not they are real EM counterparts to GW events, we wish to report their coordinates and magnitude to the TNS and receive a permanent astronomical transient (AT) name for them.To facilitate this, we have built into our TOM a web form (Figure 9, top), accessed by clicking the gray "Report" button on the target detail page, that sends a discovery report to the TNS API without leaving the TOM.If the transient was discovered by our pipeline, the form will automatically be filled out with all the required infor- mation.Users can review the prefilled form and add authors or comments before submitting.After submission, the target will have been renamed with a permanent AT name in the TOM.
Likewise, if we obtain a classification spectrum of a transient and upload it to the TOM, we will want to report the classification to the TNS.If a target name starts with "AT," a gray "Classify" button will replace the "Report" button on the target detail page (e.g., Figure 6).Clicking this button leads to a classification report form, which is also prefilled to the extent possible (Figure 9, bottom).Submitting the form will send the report and upload its associated files to the TNS.After submission, the target's classification and redshift, if provided, will appear on the target detail page.If the target's prefix is changed (e.g., from "AT" to "SN") as a result of the classification, its name will also be updated.

Hermes
In addition to programmatic reporting of observations to these two databases, the GW follow-up community has traditionally relied on free-text GCN circulars to communicate information about potential EM counterparts, including their discovery, their likelihood of association with particular GW events, and the results of any vetting or follow-up observations.SCIMMA's Hermes service 19 is designed to supplement GCN circulars by providing several semistructured message schemata for communicating targets and data in a machinereadable format, alongside free text.Messages sent through Hermes can also be reduced to plain text and sent as GCNs 19 https://hermes.lco.global/ at the user's request.We plan to integrate Hermes messaging into the SAGUARO TOM in a similar method to our TNS reporting-providing a form prefilled with details of targets and data from the TOM that can be quickly reviewed and sent to the Hermes API.Eventually, this will be our preferred method of composing GCN circulars, but as Hermes has been under active development during the same time period as our TOM, we have not made this a priority for early O4.Until this is integrated into the TOM, we plan to send any GCN circulars manually through the Hermes web interface.

Deeper Reference Images
With the accumulation of more than 4 yr of data since the creation of templates for SAGUARO in 2019, we look to create new templates for image subtraction using the additional data for use during O4.Previously, we had access to a total of 5385 fields, arrayed in the sky accessible in southern Arizona, excluding areas near the Galactic plane (see Figure 1 of Lundquist et al. 2019) with reference templates.We have new data for 4916 of these fields, allowing us to build deeper templates for 91% of our previous fields, plus additional data for 25 new fields, expanding the number of templates.We build these new templates using the same criteria as our prior work, requiring the number of stars detected in an image to be greater than 2000, and the sky brightness lower than 20 mag (except in the cases where the total number of images is less than four, for which we implement no cuts).Compared to the previous sample of image templates that were created with a median of 60 individual images (see Lundquist et al. 2019 for details), the new sample of image templates (which includes the deeper template for fields that had additional data, the old templates for the fields that were not observed again since 2019, and the new additional fields) were created with a median of 149 images.On the low end, those with fewer than five images make up less than 1% (compared to 9% previously) and 74% of templates were made with >90 images (compared to the previous 5%).A histogram of the number of individual images that make up both the old and new templates is shown in Figure 10.We expect these deeper reference images to result in fewer imagesubtraction artifacts and allow the recovery fainter transients.We leave a full analysis of their effect for future work, after they have been in use for a longer period.

Machine Learning
Previously, SAGUARO made use of a random-forest ML algorithm (Breiman 2001), as implemented in scikit-learn (Pedregosa et al. 2011), to classify candidates from the image subtraction as real (astrophysical transients, variable stars, or slow-moving objects) or bogus (artifacts caused by bad subtraction, bright stars, or detector effects).The random forest was trained on 10 × 10 pixels taken from the difference images centered on the source (equal to a 15. ′′ 4 × 15. ′′ 4 box; see Paterson et al. 2021 for details) and was able to recover 94% of the Figure 9. Forms for sending transient discovery (top) and classification (bottom) reports to the TNS from within the SAGUARO TOM.These forms are prefilled with all relevant information that is available within the TOM, including photometric and spectroscopic data.This allows the user to quickly review and submit the report with little to no typing.real transients used for validation (where the recovery threshold was set to 0.7).In an attempt to improve the real-bogus score for O4, we looked at enhancing the method used for the ML.First we made use of the Zooniverse portal20 to classify good images for the transients and moving objects (through a private project).We found a sample of 383 and 2480 images for each, respectively.To increase the number of these images in our training set, we included each image rotated by 90 • , 180 • , and 270 • , as well as the relative inversions through the xand y-axes, which also ensures the algorithm is not sensitive to the orientation of the candidate.We then extracted a subset of images flagged as variable stars from our previous observations to add to the transient and moving-object sample of real detections, as well as a subset of images flagged as bad for the bogus sample in our training set.
Convolutional neural networks (CNNs; Fukushima 1980) are often used for image classification problems and have been used extensively in astronomy for object classification, including the real-bogus classification for transients (e.g., Cabrera-Vives et al. 2017;Gieseke et al. 2017;Turpin et al. 2020;Goode et al. 2022;Acero-Cuellar et al. 2023;Chen et al. 2023;Liu et al. 2023).Using TensorFlow (Abadi et al. 2016b,a), we set up a basic CNN consisting of multiple 2D convolution layers, maxpooling, dropouts, and Dense layers (all with the relu activation), and finally ending with a Dense layer with a sigmoid activation.For the training data, we tried a combination of arrays extracted from the new, reference, difference, and corrected significance (S corr ) images derived from the image subtraction.We tried training on the full 64 × 64 pixel (99 ′′ × 99 ′′ ) thumbnails created from the image-subtraction pipeline, as well as on the inner 16 × 16 pixels (25 ′′ × 25 ′′ ) centered on the source.Using the CNN, we found using the arrays just from the S corr images provided the highest accuracy, while including the other arrays often resulted in a drop of accuracy due to overfitting.While the initial results were similar for the 64×64 and 16 × 16 pixel cutouts, further testing found that using the full 64 × 64 pixel array increased the false positive rate for images where bright stars were present on the edge of the cutout, while the 16 × 16 pixel array avoided this.We thus decided to use 16 × 16 pixels from the S corr image as the final training data.We then looked to optimize the hyperparameters of the CNN model using this data.The final CNN model comprises two blocks of two 2D convolution layers of increasing filter size (16 for the first block and 32 for the second), followed by a 2 × 2 maxpooling, a 0.25 dropout, two 2D convolution layers with a filter size of 64, a reshaping filter to flatten the array, and two Dense layers with values of 64 and 32.All of these layers are applied using the relu activation.A final Dense layer with a value of 2 and a sigmoid activation is then applied, with a final dropout of 0.5.The final CNN returns an array of two values containing the probability (from 0 to 1) of the candidate being real and the probability of the candidate being bogus.While the classification of real-bogus is set by whichever class has the highest probability, looking at both values can provide additional insight into candidates that the ML had trouble classifying.Using the highest probability value to classify the candidates, we correctly recover 94% of real objects and label 98% of the bad data as bogus (see Figure 11 for the confusion matrix).

Candidate Ingestion
Although the postprocessing procedure has not changed significantly compared to the steps described by Lundquist et al. (2019) and Paterson et al. (2021), we have upgraded this portion of the pipeline to increase its speed and decrease its memory usage.We typically run the pipeline in a multithreaded mode so that we can process up to 16 images at once.In refactoring the pipeline, we have moved the reading of large files (i.e., the trained random-forest classifier and the moving-object catalog) outside of the threads, so that they are only accessed once at the beginning of the night.The trained CNN classifier is reloaded for each image, due to limitations of TensorFlow in the Python multiprocessing environment.We have also vector-ized the postprocessing operations to the greatest extent possible to avoid overheads associated with Python loops.With the exception of thumbnail creation, which requires writing to the hard disk, the duration of these steps is now nearly independent of the length of the source catalog.We describe each step below.
After extracting sources from the difference images using Source Extractor (Bertin & Arnouts 1996), the pipeline crossmatches the source catalog with 1.3 million solar system objects known to the Minor Planet Center.We download an upto-date catalog of orbital elements at the beginning of each night and use PyEphem (Rhodes 2011) to calculate the equatorial coordinates of each body at the time of each image.Then we use the Astropy coordinates module (Astropy Collaboration 2022) to crossmatch the source catalog with the solar system catalog, using a matching radius of 25 ′′ to account for uncertainties in the orbits.In total, this step takes about 50 s per image, almost all of which is spent calculating the ephemerides.Any sources with matches in the solar system catalog are marked as moving objects in our database and not displayed by default on the survey candidates page.
We then run both our old random-forest and our new CNN real-bogus classifiers, each of which takes ≲1 s per source catalog.Finally, for each source in the catalog, we produce thumbnails of the image quadruplet (see Section 2.3), which are saved on the data-reduction server and hosted with a lightweight Flask 21 web server so that they can be displayed in the TOM.Saving the thumbnails takes a few milliseconds per source, for a total of 1-50 s per image, depending on the length of the catalog.
The pipeline records our observations in four places in the TOM database: 1. the observation record table, which stores the field name, time of observation, and other metadata for each image, 2. the targets table, which keep tracks of distinct pairs of coordinates using a 2 ′′ matching radius, 3. a custom candidates table, which logs each detection of a point source along with its moving-object match and real-bogus scores, and 4. the reduced data table, which records a photometry point for each detection.The new source catalog is crossmatched with the existing targets table in PostgreSQL using the Q3C extension (Koposov & Bartunov 2006) and insertions are done in bulk for each source catalog, allowing us to complete this step in a few seconds per catalog.
In total, the ingestion of candidates typically takes 1-2 minutes per image.We have also begun logging the time at which candidates are ingested into the TOM database.We find that, absent technical or network issues, the typical delay from shut- ter open of the first exposure to the appearance of candidates on the web page is ≲30 minutes (Figure 12).

Bok 90Prime Upgrades
While the workhorse CSS 1.5 m telescope is a very efficient survey machine, its typical limiting magnitude (∼21.5 mag) is not much deeper than the expected brightness of kilonovae (∼20.5 mag) at LIGO's distance horizon for BNS mergers in O4 (∼200 Mpc).Therefore, we are in the process of adding to our program a second wide-field imager on a larger telescope: 90Prime, the prime-focus imager on the 2.3 m (90 in) Bok Telescope on Kitt Peak, Arizona (Williams et al. 2004).90Prime has a ∼1 deg 2 field of view and has been used for deep survey programs like the Beijing-Arizona Sky Survey (BASS; Zou et al. 2017).Equipped with standard SDSS ugriz filters, it also has the advantage of providing color information of transients.Over the past several years, 90Prime has undergone an upgrade, replacing its CCDs with higher-quality, blue-sensitive chips, along with its liquid nitrogen dewar, shutter, and guider camera.Along with these hardware upgrades come upgrades to the instrument control software, as well as a new pipeline for image preprocessing and data-quality monitoring, which we discuss in Section 4.2.The upgraded 90Prime achieved engineering first light in Spring of 2023 but is now undergoing further electronics work before being released to observers.D. Sand et al. (2024, in preparation) will present a complete description of the 90Prime upgrade in a future work.

Image Processing
CSS provides SAGUARO with images that have already been bias-subtracted, flat-fielded, astrometrically aligned, and photometrically calibrated.We need the equivalent preprocessing for 90Prime and any other instruments we wish to add to our survey.For this, we have adopted a customized version of Beautiful Algorithms for Normalizing Zillions of Astronomical Images (BANZAI; McCully et al. 2018b), the preprocessing pipeline developed and used by Las Cumbres Observatory.BANZAI is designed to be flexible so that it can process data from Las Cumbres' several different imagers, and our usage is a proof of concept that it can also be applied to extramural instruments with minimal modification (described below).
In addition to installing the pipeline itself, deploying BAN-ZAI on Kitt Peak involves setting up three Docker containers (Merkel 2014) containing a PostgreSQL database to log observations and processed data, an astrometry service based on the Astrometry.netframework (Lang et al. 2010) with index files generated from Gaia Data Release 2 (Gaia Collaboration 2018; Lindegren et al. 2018), and a photometric calibration service based on the ATLAS All-sky Stellar Reference Catalog (Tonry et al. 2018b).The main customization to the pipeline for 90Prime relates to the fact that the 90Prime focal plane consists of four chips, each with four amplifiers, producing a 16-extension FITS image, whereas all Las Cumbres imagers contain a single four-amplifier chip.Our solution involves splitting each image into four four-extension FITS files and passing them to BANZAI to be processed in parallel.Relatedly, 90Prime's depth and large field of view can produce very crowded fields, which were not always a good match for the Astrometry.netconfiguration included with BANZAI by default.We will refine these settings further as we test the pipeline on a larger quantity of images.

Survey Coordination
Once the upgraded 90Prime is regularly available, we will begin to integrate it into our survey strategy.This will involve reoptimizing our field selection algorithm when 90Prime is available (usually for 1-3 weeks per lunation) to simultaneously plan observations on two telescopes.As discussed in Section 2.2, we have kept the tom_surveys app as general as possible with this future in mind.
Likewise, we will have to adapt our image-subtraction pipeline to run on images from a new instrument, possibly using reference images from BASS.Our initial plan is to run a second instance of the pipeline on the BANZAI server on Kitt Peak to avoid the bottleneck of transferring data off the mountain.In that case, the pipeline will remotely ingest survey candidates into our TOM database and transfer only the thumbnail images off the mountain.Because of the database schema in which "targets" (coordinates) are tracked separately from "candidates" (detections; see Section 3.3), multiple instances of the pipeline can detect the same transient, and these will automatically be associated in the TOM if they are within 2 ′′ of each other.et al. (2022b) and Section 2.4 describe the utility of public photometry sources (e.g., ATLAS, TNS, ZTF) for vetting potential kilonovae.After several years of processing CSS data as part of SAGUARO, we ourselves have a large archive of images that we can use to rule out associations with GW events through premerger detections.We plan to set up a forced photometry service for SAGUARO-CSS images and integrate this into the TOM as part of our standard kilonova vetting procedure.

Additional Follow-up Facilities
In the longer term, we hope to add functionality to the TOM to trigger additional facilities accessible via API.First, we plan to add the remainder of the queue-scheduled instruments on the MMT (Hectospec and Hectochelle; Fabricant et al. 2005;Szentgyorgyi et al. 2011) to the tom_mmt app.There is also ongoing work to build a TOM Toolkit facility module around the target-of-opportunity API22 for the Neil Gehrels Swift Observatory (Gehrels et al. 2004), which would be particularly useful for constraining kilonova models via very early UV photometry (Arcavi 2018).The 4 m Víctor M. Blanco Telescope at CTIO, Chile, may also become part of AEON, allowing it to be scheduled through Las Cumbres Observatory's observing portal, providing a critical publicly accessible survey instrument in the southern hemisphere.

SUMMARY AND OUTLOOK
We have presented our completed and ongoing upgrades to SAGUARO infrastructure for optical follow-up of GW events.These include the development of a TOM that lets users receive alerts, plan and trigger a survey, scan for new survey discoveries, vet potential EM counterparts, coordinate follow-up observations, visualize reduced data, and report results.We have also described improvements to our image-subtraction pipeline, including deeper reference images, a new CNN real-bogus classifier, and streamlined candidate ingestion.Last, we described upgrades to the 90Prime wide-field imager on the 2.3 m Bok Telescope, including improvements to camera hardware, computing hardware, and software, and outlined our plans to integrate it into our search when upgrades are completed.Together, these upgrades place SAGUARO in a good position to discover and characterize EM counterparts to any BNS or NSBH mergers in the remainder of O4 and beyond, if they are visible from Arizona.
The infrastructure we have developed here has applications far beyond GW follow-up.During our design and testing, we have used the SAGUARO TOM to observe young supernovae (Shrestha et al. 2024) and GRB afterglows (Shrestha et al. 2023), and we plan to add astrophysical neutrino localizations (e.g., IceCube Collaboration et al. 2017) soon.As with the other follow-up systems we mention in Section 2, the SAGUARO TOM will grow into a centralized portal for all transient science in our research collaboration.Likewise, we plan to extend our implementation of BANZAI to run on smaller Arizona telescopes without dedicated image processing pipelines.
We also view the SAGUARO TOM as a case study in the type of software infrastructure that will be necessary for timedomain science in the era of Vera C. Rubin Observatory (Ivezić et al. 2019), when millions of transients will be discovered each year.While brokers like Alert Management, Photometry, and Evaluation of Light Curves (Nordin et al. 2019), Arizona-NOIRLab Temporal Analysis and Response to Events System (Matheson et al. 2021), Automatic Learning for the Rapid Classification of Events (Förster et al. 2021), Fink (Möller et al. 2021), and Lasair (Smith et al. 2019) will handle alert distribution, classification, and filtering, TOMs will play a critical role in managing the follow-up observations and data gathered in response.Thoughtful architecture of these systems, in coordination with other time-domain researchers, will help us maximize the scientific return from both large and small observatories.
, which takes in a new image and a deep reference image of the same field and produces a raw difference image and a significance image corrected for astrometric noise

Figure 2 .
Figure 2. GW event detail page from the SAGUARO TOM, showing the source classification probabilities, false-alarm rate, localization areas, and GCN circulars for event S230627c.The tabbed interface switches between different versions of the alert information.The active and retired candidates lists at the bottom of the page link to targets that have been suggested as low-probability EM counterparts to the GW event by Anumarlapudi et al. (2023).

Figure 3 .
Figure3.Real-time human-readable alert summary sent by the SAGUARO TOM, as viewed on the Slack mobile app.The message time stamp shows that the first alert arrived within ∼1 minute of the detection.The summary contains helpful information for determining whether follow-up is appropriate, including the inverse of the falsealarm rate (1/FAR), the distance to the source, the areas of the 50% and 90% localization regions, and, for compact binary coalescences, the probabilities of the progenitor system containing at least one object with mass <3 M⊙ (Has NS) or 3−5 M⊙ (Has Mass Gap), the event having an EM counterpart (Has Remnant), and the signal coming from a BNS merger, NSBH merger, BBH merger, or terrestrial noise source.More information is easily accessed by following the hyperlinks at the bottom of the summary.

Figure 4 .
Figure4.Survey planning page on the SAGUARO TOM, including an Aladin Lite viewer showing three groups of 12 planned fields (red, green, and blue), the 90% credible region contour for S230904n (purple), and the current positions of the sun (gold) and moon (silver).The radii of the sun and moon markers roughly represent their respective exclusion regions, although these are distorted with respect to the Mollweide projection.The purple dot near the top left of the map shows AT 2023rkw, suggested as a possible EM counterpart byNecker et al. (2023) but then classified as a Type Ia supernova byAnand et al. (2023).The planned fields are listed in a table at the bottom of the page, with their probability, rank order in our plan, time requested (if any), and time first observable in UT.

Figure 5 .
Figure 5. Survey candidates page for S230904n in the SAGUARO TOM.As on the survey planning page in Figure4, the Aladin sky map shows the 90% localization region (purple) and current sun (gold) and moon (silver) position.Observed fields and detections are shown in white.Below, the survey candidates table shows a temporary coordinate-based name, the observation month and day, magnitude, full width at half maximum (FWHM), signal-to-noise ratio (S/N), three real-bogus machine-learning (ML) scores, and thumbnails of the image quadruplet.The blue "Link" buttons allow the user to bookmark promising sources for further vetting in association with the chosen GW event.

Figure 6 .
Figure6.Two tabs of the target detail page for AT 2023ixg, which was suggested as a possible EM counterpart to GW event S230518h byJohnston et al. (2023).The teal "Vet" button near the top left of the page runs our kilonova vetting code.The results are displayed in the "Catalog Crossmatching" section at the center left of the page, which indicates no point-source matches but several galaxy matches in this case.The "Photometry" tab shows a plot of all known photometric detections and upper limits for this target, including public data from the ATLAS forced photometry server (obtained by pressing the teal "Get ATLAS phot."button), the Transient Name Server, and the Zwicky Transient Facility.Because this target has been linked with GW candidate S230518h (note the chain icon under the target names), the time of merger of that event is indicated by a vertical line on the photometry plot.The "Host Galaxies" tab (inset) shows a table of potential host galaxies, ordered by increasing probability of chance coincidence (Pcc).These are also marked on the Aladin viewer at the bottom left of the page.

Figure 7 .
Figure 7. MMT observation request page for AT 2023rkw on the SAGUARO TOM, provided by our tom_mmt module.AT 2023rkw was suggested as a possible EM counterpart to S230904n by Necker et al. (2023) but then classified as a Type Ia supernova by Anand et al. (2023).Therefore, the link to the GW event at the left of the page is struck through.An airmass chart (top) and lunar distance chart (bottom left) show the visibility of the target.The form at the bottom right of the page allows the user to specify the instrument settings, choose a program, upload a finder chart, and provide free-text notes.The tabbed interface provides four forms, for imaging and spectroscopy on Binospec and MMIRS.

Figure 8 .
Figure 8.Light curve of GRB221009A as analyzed by Shrestha et al.(2023).These data were aggregated from GCN circulars and uploaded to the TOM in real time.An interactive version of this figure, extracted from the SAGUARO TOM, is available.The plot area can be zoomed and panned, and specific data series can be turned on or off by clicking in the legend.

Figure 10 .
Figure 10.Histograms of the number of individual images that were used to create the reference templates for the old templates vs. the newly rebuilt templates.Dashed lines indicate the median values.

Figure 11 .
Figure 11.Confusion matrix for the new CNN ML algorithm, based on a highest probability classification; 98% of bogus and 94% of real detections are classified correctly.

Figure 12 .
Figure 12.Histogram of the delay time between shutter open of the first of four exposures and the appearance of survey candidates in the TOM in our updated pipeline.The median delay is under 30 minutes.
Beck et al. 2021)2021)L = 170 Mpc(Kovlakas et al. 2021).In addition, to identify contaminants that may be associated with higher-redshift galaxies, we use photometric redshifts from the Sloan Digital Sky Survey Data Release 12 (SDSS DR12;Alam et al. 2015)and the Pan-STARRS Source Types and Redshifts with Machine Learning (PS1-STRM;Beck et al. 2021)catalog.
2022), and the Heraklion Extragalactic Catalog (HECATE; Kovlakas et al. 2021) for galaxies with known distances.Using B-band luminosity density, the GLADE+ catalog is complete out to d L = 47 +4 −2 Mpc (Dálya et al. 2022) while HECATE is