Preparing to discover the unknown with Rubin LSST -- I: Time domain

Perhaps the most exciting promise of the Rubin Observatory Legacy Survey of Space and Time (LSST) is its capability to discover phenomena never before seen or predicted from theory: true astrophysical novelties. The ability of LSST to make these discoveries will, however, depend on the survey strategy. Evaluating candidate strategies for true novelties is a challenge both practically and conceptually: unlike traditional astrophysical tracers such as supernovae or exoplanets, for anomalous objects the template signal is by definition unknown. We present our approach to this problem: we assess survey completeness in a phase space defined by object color and flux (and their evolution), and quantify the volume explored by integrating metrics within this space with the observation depth, survey footprint, and stellar density. With these metrics, we explore recent simulations of the Rubin LSST observing strategy across the entire observed footprint and in specific regions in the Local Volume: the Galactic Plane and Magellanic Clouds. Under our metrics, observing strategies with a greater diversity of exposures and time gaps tend to be more sensitive to genuinely new phenomena, particularly over time-gap ranges left relatively unexplored by previous surveys. To assist the community, we have made all the tools developed for this work publicly available. Extension of the scheme to include proper motions and the detection of associations or populations of interest will be presented in Paper II of this series. This paper was written with the support of the Vera C. Rubin LSST Transients and Variable Stars and Stars, Milky Way, Local Volume Science Collaborations.


INTRODUCTION
The Rubin Observatory Legacy Survey of Space and Time (hereafter LSST) is an ambitious project that promises to monitor the entire southern hemisphere sky over a continuous ten-year interval starting in 2024. It will deliver high sensitivity, high (seeing-limited) spatial resolution, and high temporal cadence (≥ 1 image per night, with a repeat visit to each field every few days); other surveys have stretched into only one or two directions of this feature space. The Rubin LSST observing strategy is designed to accomplish several science goals within four science themes: (1) probing dark energy and dark matter; (2) taking an inventory of the solar system; (3) exploring the transient optical sky; (4) mapping the Milky Way. These diverse goals lead to strict interlocking constraints, including requirements on image quality, depth (both single-visit depth and number of visits per field), filter system, and total sky coverage. A detailed description of the science drivers and technical requirements can be found in Ivezić et al. (2019, hereafter I19). While the survey strategy (and indeed facility design) is thus mostly specified by the main science goals, these constraints still allow for significant flexibility in the details. For example, while the reference design (e.g. Claver et al. 2014; Kahn et al. 2010; I19) leads to a revisit time of 3 days on average for 18,000 deg² of sky, with two visits per night, this still allows for a large distribution, and even a significant range of median values, for the inter-night time gaps, as seen in Figure 1.
LSST will include several "surveys," each helping to address the four key science pillars as well as other science goals in different ways. The majority of the 10 years will be spent on a survey designed explicitly to meet the requirements specified in Ivezić et al. (2019): the "Wide-Fast-Deep" survey (hereafter WFD), which is expected to take between 75% and 85% of the time on-sky. The remaining time will be spent on "special programs", including "minisurveys" (special coverage of extended areas of sky), "Deep Drilling Fields" (single pointings that will be visited periodically at an enhanced cadence, reaching a higher cumulative depth in the stacked images), and potentially "Target of Opportunity" follow-up of multi-messenger triggers.
This loose division of LSST into the different flavors of (sub-)surveys implies different levels of flexibility in the observing strategies for different regions of the sky. For example, while the expected range of per-visit exposure times in the WFD region is tightly constrained (∼ 30 seconds) to achieve the goals of the four LSST science pillars and given observing efficiency constraints (I19), some minisurveys may be better served by (or even require) different exposure times, extending to exposures potentially much shorter and/or longer than the WFD program exposure time.
Perhaps uniquely among modern surveys, Rubin Observatory has embedded community involvement in the design of the survey (Bianco et al. 2021) and to that end has shared its extensive simulations framework with the scientific community at a very high level of detail, including (but not limited to): detailed hardware specifications, facility operations models (including detailed observatory and instrument overheads), atmospheric transmission, and models for astrophysical populations and interstellar dust which allow simulated recovery of tracer populations (Connolly et al. 2014). For most users in the scientific community, it is the metadata of the predicted observing strategies (e.g. observing time, expected seeing, instantaneous 5σ depth) that is most relevant to the evaluation of survey strategies: Rubin has made a large number (many hundreds to date, see Bianco et al. 2021) of simulated LSST surveys available to the community. The Operations Simulator (Delgado et al. 2014) generates the metadata for a full ten-year period of operation under specified desiderata for the run characteristics.
Led by Lynne Jones & Peter Yoachim at the University of Washington, the project has also developed a dedicated Metrics Analysis Framework (MAF: Jones et al. 2014), and continues to work with the community on developing tools to extract the scientific utility of the OpSims for various scientific cases. Standard metrics run on all OpSims by the project fall under the main sims_maf package, while community-contributed metrics are curated in the maf-contrib project. We discuss the Operations Simulator and MAF in a little more detail, from the point of view of the true novelties we seek, in subsection 2.1; see also Bianco et al. (2021) for more details.
Recent community input on the LSST survey strategy roughly divides into three phases. The first phase concluded with the development of the "Community Observing Strategy Evaluation Paper" (COSEP; LSST Science Collaboration et al. 2017), which attempted to distill the requirements of a wide range of science cases into specifications (and in some cases evaluations) of simple quantitative measures of scientific effectiveness that could be compared between science cases. In the second phase, the community was asked by the project to prepare cadence whitepapers to suggest alternatives to the baseline cadence; 46 whitepapers were ultimately submitted. 5 In the current phase, the community and Rubin are working to implement the quantitative scoring for the various science cases to allow them to be compared on a timescale commensurate with the ultimate decisions by the project on the survey strategy to adopt. This paper forms part of this third phase of community input.
A word on notation is in order. We refer to a simulated 10-year survey as an OpSim. Following the naming convention of the COSEP, we use the acronym "MAF" (or sometimes "metric") to refer to a piece of code that measures properties of an OpSim on a per-field basis. The overall evaluation of a strategy requires assessing its power over a large number of scientific goals. Therefore, in order to be useful for comparison, the MAFs must themselves be summarized into Figures of Merit (FoMs): single numbers that convey the power of a survey (as simulated) to achieve a specific science goal. An example of a metric might be a characterization of the time gaps between repeat observations of each field in a particular filter pair of interest, while the associated FoM would collapse the distribution into a single number that captures the sensitivity of the strategy to the detection of transients in some range of parameter space. For more on the operational definitions of metrics and FoMs, see the COSEP.
As discussed in the introduction to the COSEP, ideally the FoMs would be measured in bits of information that the survey would contribute in excess of the previously-available information on a phenomenon. While clear in principle, this information-theory inspired definition of a FoM is challenging to achieve in practice. Not all science cases easily translate into a measurement on a quantity. For cosmology, for example, one could conceivably quantify the scientific power of a survey by the decrease in the uncertainty on the scientific parameter of interest, for example H0. However, the survey power even for identifying particular tracers becomes more ambiguous. The power of a survey in identifying progenitors of supernovae, for example, is less easily quantifiable: as additional qualifiers are placed on the phenomena to be measured (for example: sensitivity to different types of progenitor), the translation of survey sensitivity into bits of additional information becomes increasingly difficult. Following this logic, measuring the power of a survey to discover truly novel phenomena would be impossible. The assessment of the ability of a survey realization (OpSim run) to discover true novelties requires a model-free approach (otherwise we would by default limit ourselves to unobserved, but predicted, phenomena: Chandola et al. 2009).
We have set out to define metrics and FoMs that will allow comparison of simulated LSST strategies based on their potential to discover true novelties, in terms of the discovery parameter-space that is well covered (or not!) by the simulated surveys. We make these MAFs and FoMs publicly available to aid Rubin survey strategy decisions.
The OpSims are continually under development based on input from the project and the scientific community, and thus the suite of available simulations is continuously evolving. Improvements made between releases include general strategy updates (such as changes to the recommended exposure time per visit), improvements in the implementation of engineering constraints (such as the time required for filter changes), and improvements in the implementation of the observing strategies themselves (such as the way in which special cadences are implemented). Bianco et al. (2021) provides more detail. We remain agnostic on the accuracy with which the OpSims actually implement the desired strategies, and focus instead on the output: how well the resulting OpSims support the detection of true anomalies as quantified in our metrics and figures of merit. We selected the OpSim v1.5 family of simulations (a major release from 2020 May with 86 simulated strategies) to develop and demonstrate the metrics and figures of merit, as it contains sufficient variety among the simulations to elucidate the various requirements for detecting true anomalies.
As the simulations evolve, application of the figures of merit to more recent releases is then straightforward.
As an example, we present the evaluation of our figures of merit to the OpSim v1.7 (2021 January) and OpSim v1.7.1 (2021 April) releases, which implement an updated exposure time per visit (2×15s instead of the 1×30s used in OpSim v1.5).
This publication is one of a pair: in this communication (Paper I) we focus on detecting individual objects of interest in a multidimensional feature space that includes time coverage, filter coverage, star density, and total footprint on the sky. Inclusion of constraints from proper motion, which is rather more involved and also lends itself naturally to the detection of previously-unknown populations and structures, is deferred to Ragosta et al. (Paper II). The present paper therefore does NOT directly address proper motion anomalies: the reader is referred to Paper II for those issues. This paper is organized as follows: Section 2 summarizes the simulations and methods, and describes the feature space we use. We then present the metrics and figures of merit we have developed, and evaluate them over the wide-fast-deep main survey area for a wide range of simulated observing strategies, considering the following metrics: color and time evolution (Section 3), integrated depth (Section 4), and spatial footprint (Section 5). Section 6 then applies the set of figures of merit to the OpSims chosen, first to the WFD region (subsection 6.1), then to the minisurveys (subsection 6.2). The application of the figures of merit to the more recent OpSim v1.7 set of simulations is presented in subsection 6.3. In Section 7 we conclude with some recommendations on the usage and interpretation of the metrics and figures of merit we have developed. Interactive tools we have developed to facilitate exploration of the multidimensional feature space are presented in the Appendix.

METHODOLOGY
Here we summarize the simulations and methods used. Subsection 2.1 summarizes the Operations Simulator and the Metrics Analysis Framework, both provided by the Rubin Observatory, in the context of our work. Subsection 2.2 briefly discusses the tools by which we accomplish spatial selection, to isolate regions such as the Galactic Plane and Magellanic Clouds. The usage of feature space to identify discovery space for true novelties is introduced in subsection 2.3, and the output metadata produced by OpSim in this context is summarized in subsection 2.4.

MAF and OpSim
The Operations Simulator software (OpSim; https://www.lsst.org/scientists/simulations/opsim) allows the generation of a simulated strategy based on a series of strategy requirements (for example, the total number of images per field per filter), including simulated weather, telescope downtime, etc. The inputs of an OpSim run are the survey requirements (the survey strategy) and the output is a database of observations with associated characteristics (e.g. 5σ depth) that specify a sequence of simulated observations for the 10-year survey. The Rubin OpSim has gone through several versions since its initial creation (Coffey et al. 2006), which primarily differ in the optimization scheme.
The Metric Analysis Framework (MAF; Jones et al. 2014; https://www.lsst.org/scientists/simulations/maf) is a software package created by Rubin Observatory to facilitate the evaluation of the ability of simulated LSST strategies to achieve specific science goals, as measured by each strategy's ability to obtain observations with specified characteristics. The MAF interacts with the OpSim databases, and has been public since its creation to facilitate community input into the strategy design. The MAF enables selections of observations within an OpSim primarily by SQL constraints, which allow the user to select, for example, filters or time ranges (e.g. the first year of the survey). Further, the choice of slicers allows the user to group observations: for example, one may "slice" the survey into equal-area spatial regions, using the HEALPix scheme of Górski et al. (2005). Throughout, we choose a HealpixelSlicer with resolution parameter NSIDE=16, corresponding to 3072 pixels of ≈ 13.4 deg² each.
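For reference, the pixel area implied by NSIDE=16 follows directly from the HEALPix definition, in which the sphere is divided into 12 × NSIDE² equal-area pixels. A minimal sketch (pure standard library; in practice one would use healpy's nside2pixarea):

```python
import math

def healpix_pixel_area_deg2(nside: int) -> float:
    """Area of one HEALPix pixel in square degrees.

    HEALPix divides the sphere into 12 * nside**2 equal-area pixels;
    the full sky covers 4*pi steradians ~ 41252.96 deg^2.
    """
    full_sky_deg2 = 4.0 * math.pi * (180.0 / math.pi) ** 2
    return full_sky_deg2 / (12 * nside ** 2)

# NSIDE=16 -> 3072 pixels of ~13.4 deg^2 each
area = healpix_pixel_area_deg2(16)
```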

Spatial selection
In practice, OpSim generates the synthetic observations using "proposals" for various assumed programs, including Deep Drilling Fields (DDFs), WFD, and "special" programs such as the Galactic mid-plane, Magellanic Clouds, and the North Ecliptic Spur (NES, which affords greater sensitivity to Solar System objects; e.g. COSEP, Chapter 2), with the "proposalID" parameter preserved in the OpSim output. When evaluating our figures of merit for the WFD region, we use this "proposalID" to select the relevant observations.
When evaluating our FoMs for the mini-survey regions (subsection 6.2), we select entries spatially, as the science that can be extracted from observations of a particular spatial region depends only on what was observed, and not on the proposalID with which each observation was originally identified. This is particularly relevant when considering simulated strategies that extend the WFD region to encompass regions that would be classified as "minisurveys" in the other simulated strategies: Olsen et al. 2018a, for example, discusses some possible strategies that would do this. The spatial selector we have developed is quite flexible -- regions can be specified programmatically or by hand -- and we have made it publicly available.
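A minimal sketch of the kind of spatial selection described above. The rectangular RA/Dec box and the coordinate values are purely illustrative (our actual selector supports arbitrary regions); the example box sits roughly around the Small Magellanic Cloud:

```python
def in_region(ra_deg, dec_deg, ra_range, dec_range):
    """Return True if (ra, dec) falls in a simple RA/Dec box.

    RA wrap-around at 0/360 deg is handled by allowing ra_range
    with ra_min > ra_max (a box that crosses RA = 0).
    """
    ra_min, ra_max = ra_range
    dec_min, dec_max = dec_range
    if not (dec_min <= dec_deg <= dec_max):
        return False
    if ra_min <= ra_max:
        return ra_min <= ra_deg <= ra_max
    return ra_deg >= ra_min or ra_deg <= ra_max  # box wraps through RA = 0

# (RA, Dec) of three simulated visits, in degrees -- illustrative values
visits = [(13.2, -72.8), (80.9, -69.8), (200.0, 10.0)]
smc = [v for v in visits if in_region(v[0], v[1], (5.0, 20.0), (-76.0, -70.0))]
```

Selecting on coordinates rather than proposalID keeps the selection valid across OpSims that assign the same sky area to different proposals.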

Feature space
Anomaly detection is an important field of research with deep methodological ramifications (Chandola et al. 2009; Martínez-Galarza et al. 2020). Notable advances in the field have been achieved in recent years across disciplines: from threat detection in defense and security (e.g. Sultani et al. 2018), to astrophysics (Soraisam et al. 2020; Pruzhinskaya et al. 2019; Ishida et al. 2019; Aleo et al. 2020; Vafaei Sadr et al. 2019; Martínez-Galarza et al. 2020; Lochner & Bassett 2020; Doorenbos et al. 2020) with the discovery of rare and possibly unique astrophysical phenomena (Lintott et al. 2009; Micheli et al. 2018; Boyajian et al. 2018, although we note that two of these "true novelties" were detected through crowd-sourced data analysis). Anomaly detection is generally approached through either unsupervised or supervised learning techniques (e.g. Yang et al. 2006; Bishop 2006; Hastie et al. 2009). In unsupervised learning, or clustering, a similarity metric is defined in the available feature space, enabling the grouping of similar objects, as well as the identification of objects that do not belong to any existing group (the anomalous objects). Alternatively, the supervised approach identifies groups in a latent lower-dimensional space based on known classifications in the original feature space derived by domain experts (typically, deep learning approaches to anomaly detection belong to this category).
Both of these approaches implicitly rely on the completeness of measurements in the original feature space: gaps in the observing strategy affect the discovery of true novelties by both increasing the risk that an anomaly would go undetected, if it falls in a gap, and making its anomalous nature harder to assess. In this series of papers, we focus on survey design to maximize the throughput of algorithms for anomaly detection, regardless of the nature of the algorithmic approach.
As measured by imaging surveys, astronomical objects are characterized by brightness, brightness ratio in different portions of the energy spectrum (color), position, shape, and the rate and direction of change in any of those features. The collection of properties defines a multidimensional phase space, with different categories of phenomena lying in different regions of this phase space (see Figure 2). Accordingly, we identified the features that can be measured in the Rubin Observatory data: color and flux, and their evolution in time. We set morphology aside, as the power of the survey to measure morphological anomalies largely does not depend on the survey strategy, but rather on the imaging system design (e.g., resolution and depth). We assume that measuring anomalous associations depends on our accuracy in measuring the properties of each object.
Measuring dynamical anomalies in a completely model-independent way proves to be more involved, because it requires comparison of measured proper motions to expectations from established Galactic dynamical models. Motion is thus deferred to Paper II, where we also develop a Figure of Merit for the detection of previously unknown populations.
Having identified features that can be extracted from the Rubin Observatory data, and the LSST in particular, such as color information or lightcurve evolution, we measure the completeness of the survey in a hypercube in the feature space as a model-independent measure of the power to detect novel transients or novel modes of variability. Generally, we define transients as objects whose observational and physical properties are changed by some event, usually as the result of some kind of eruption, explosion or collision, whereas variables are objects whose nature is not altered significantly by the event (e.g., flaring stars). Furthermore, some objects vary not because they are intrinsically variable, but because some aspect of their viewing geometry causes them to vary (e.g., eclipsing binaries).
One further parameter that influences our ability to detect anomalies is the sky footprint. Trivially, a larger sky footprint leads to a higher event rate for anomalies. To maximise the chance of detecting extragalactic anomalies, a larger footprint is favorable, while the probability of detecting Galactic anomalies scales with the density of objects in the sky; both scale with the depth over which the footprint is observed.
Ultimately, we define a set of metrics that can simply be summed to generate a FoM for true novelties:

FoM = w_c c + w_s s + w_d d + w_A A_sky + w_D D_Star,    (1)

where c, s, d, A_sky, D_Star represent the color, lightcurve shape, magnitude depth, footprint, and star density metrics respectively, and the w are weights that can be assigned to favor the discovery of, for example, transients over non-evolving objects, or Galactic over extragalactic transients. The weights allow the investigator to imprint their own judgement on the relative scientific importance of the different metrics. Because we wish to remain as phenomenon-agnostic as possible, we refrain from assigning weights. Instead, we normalize each MAF to the best of our ability to a 0-1 range where 1 is optimal, so as to provide a "neutral" comparison of the existing LSST simulations.
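Schematically, the combination of normalized metric components might look like the following sketch (the metric values and dictionary keys are illustrative placeholders, not outputs of our actual MAFs; as stated above, our own evaluation leaves all weights at 1):

```python
def combine_fom(metrics: dict, weights: dict = None) -> float:
    """Weighted sum of metric components, each pre-normalized to [0, 1].

    metrics -- per-component values, e.g. color, lightcurve shape, depth,
               footprint, star density.
    weights -- optional per-component weights; defaults to 1 (agnostic).
    """
    weights = weights or {}
    for name, value in metrics.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"metric {name!r} not normalized to [0, 1]: {value}")
    return sum(weights.get(name, 1.0) * value for name, value in metrics.items())

# all weights default to 1: a "neutral" comparison
fom = combine_fom({'color': 0.7, 'shape': 0.5, 'depth': 0.9,
                   'footprint': 0.8, 'star_density': 0.6})
```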

OpSim Data
We base our results primarily on OpSim v1.5, a recent OpSim release that contains 86 databases in 20 families, as listed in Table 1. (Note: descriptions of these OpSims can be found in the release notes of OpSim v1.5.)

A more detailed description and discussion of the simulations can be found in Bianco et al. (2021) and on the Community LSST discussion forum. Here we simply note that baseline refers to the straightforward implementation of the requirements in Ivezić & the LSST Science Collaboration (2005); the acronyms used in this work to refer to different surveys within LSST, such as WFD (Wide-Fast-Deep) and DDF (Deep Drilling Fields), are mirrored in the names of the OpSim families. We note that roll refers to "rolling cadence": a WFD strategy implementation in which fields are not observed homogeneously in time over the survey lifetime; rather, some fields are observed more frequently early in the survey and, to different degrees, abandoned later on in favor of other fields. These strategies provide a denser cadence on each field for some fraction of the survey time, while preserving the overall cumulative depth requirements, and are generally beneficial to the study of rapid-time-scale transients (including supernovae). The footprint family modifies the survey footprint according to different recommendations (see https://www.lsst.org/call-whitepaper-2018). The pair strategy family of OpSims explores different approaches to pairing observations in time and across filters. The filterdist family varies the filter distribution across the WFD, and third adds a third observation at the end of the night. The goodseeing family explores different seeing requirements for observations to proceed.
The remaining OpSim families explore exposure time (e.g., short or var expt), specific observing times (exclusively or in combination with the regular surveys) such as twilight, observing phenomena that can be enhanced by cadence choices, such as dcr (differential chromatic refraction), or synergy with other surveys, such as Euclid. Lastly, u60 uses longer exposures in the u band (60 seconds compared to the standard 30 seconds), and some OpSims explore modifications of the single-exposure time by implementing a single 30-second observation instead of 2×15 s "snaps" (which are combined into a single image to produce the standard Rubin data products; see also subsection 6.3 for an assessment of the impact of this choice across OpSims).

COLOR AND TIME EVOLUTION
Astrophysical transients and variable phenomena have captured humanity's curiosity throughout the history of science. Modern astrophysics, and particularly the use of digital equipment in the last half-century, has enabled extremely fast-paced advances in this field. Figure 2, reproduced and updated from Ivezić et al. (2019), shows the phase space of known astrophysical transients: transient and variable phenomena occupy different regions of this phase space of intrinsic brightness vs. characteristic time scale. At the beginning of the 20th century, essentially only supernovae were known to exist, and the phase space has rapidly populated with many different classes of transients since. It is worth noting the gap at timescales shorter than ∼ 1 day: while it is possible that this region is intrinsically scarcely populated, it is also true that an observational bias impairs discovery in this region: to be effective in discovery and characterization at these time scales, surveys need to reach high depth and high cadence simultaneously, while also surveying a large volume if phenomena in this region of the phase space are truly rare.
Due to their diversity in time scales, colors, and evolution, the study of transients, and particularly studies that aspire to discover new transient phenomena, requires dense coverage in space and time. The LSST has both high photometric sensitivity and a large footprint, enabling the surveying of a large volume of the Universe and offering tremendous opportunities to study the variable sky. LSST's capability to discover novel transients then largely depends on its observation cadence. Different transients will benefit from different observation strategies because of the different phenomenological expression of their intrinsic physics. To ensure the observation strategies under design maximize our chances to discover any novel transient, we created the filterTGapsMetric. This MAF evaluates the ability of LSST's observation strategies to capture information about color and its time evolution at multiple time scales. We know some timescales remain unexplored in the present collection of LSST simulations, as discussed for example in Bellm et al. 2021 and Bianco et al. 2019.

The filterTGapsMetric
Rubin LSST will image the sky in six filter bands: u, g, r, i, z, y. The filterTGapsMetric measures all time gaps between observations in each pair of filters in an OpSim (ug, gr, ri, and so on). The filterTGapsMetric FoM evaluates the coverage of time gaps for each filter pair.
On a field-by-field basis, for each filter pair, the metric and FoM are evaluated as follows:
• select the survey (e.g. WFD in this paper) and the observation time range using SQL constraints, and slice the sky with the HealpixelSlicer (see subsection 2.1);
• fetch the observation times for each field for all visits in either of the two filters;
• compute all possible time gaps that can be constructed from pairs of visits.

Figure 3. The distribution of all time gaps for the baseline v1.5 OpSim. The triangle of plots on the left shows all time gaps between different filters (which enable the measurement of color) within 1.5 hours. The column of plots on the right shows the distribution of time gaps in the same filters for the 10-year survey, which enables the measurement of brightness changes. The filters are indicated in each quadrant: from u to y moving from top to bottom and left to right. All histograms are normalized, but the intensity of the color is proportional to the total number of observations in that filter pair, as indicated by the color bar. In each quadrant the value of DKL is reported (see subsection 3.1). We note that the majority of observations are taken with adjacent filters, which gives narrow leverage on the spectral energy distribution (SED), and less power to measure color. Color is in fact better measured with filters that are more separated in wavelength, for example g-i or r-z, as described in Bianco et al. (2019).

Figure 3 shows the distribution of time gaps for all filter pairs for the baseline v1.5.
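The per-field gap computation in the steps above can be sketched as follows (visit times in days; the function name and input values are illustrative, not the MAF implementation itself):

```python
from itertools import product

def filter_pair_tgaps(times_a, times_b):
    """All time gaps between visits in two filters of a filter pair.

    For a same-filter "pair" (times_a is the same list as times_b),
    each distinct pair of visits is counted once.
    """
    if times_a is times_b:
        t = sorted(times_a)
        return [t[j] - t[i] for i in range(len(t)) for j in range(i + 1, len(t))]
    return [abs(tb - ta) for ta, tb in product(times_a, times_b)]

# g and r visit times (days) for one healpixel -- illustrative values
gaps_gr = filter_pair_tgaps([0.0, 1.0], [0.5, 3.0])
```

The resulting per-field gap lists are then binned into the distributions compared against the "ideal" distribution below.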
Armed with field-by-field time-gap distributions, the FoM for the entire candidate survey strategy is then computed by measuring how well the distribution of time gaps matches an ideal distribution. We use the Kullback-Leibler (KL) divergence (or relative entropy; Kullback & Leibler 1951) to measure the discrepancy between the ideal and observed distributions.
The KL divergence provides an information-criterion-based measure of the difference between two distributions: the KL divergence from Q to P is defined as D_KL(P||Q) = Σ_i P_i log(P_i/Q_i). The KL divergence is not a distance (in the sense that it does not satisfy the triangle inequality), is in general not symmetric (under exchange of Q and P), and is not normalized. To derive a normalized quantity from D_KL we use e^(−D_KL): two identical distributions, with D_KL = 0, contribute 1 to the sum, while D_KL > 0 contributes < 1. Thus a larger FoM indicates a lower discrepancy from the "ideal" distribution and thus a scientifically preferable simulation. This FoM is naturally normalized between 0 and 1 for each field.
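The discrepancy score can be sketched as below; the small regularization against empty bins in the ideal distribution is an implementation choice for the sketch, not part of the KL definition:

```python
import math

def kl_score(p_counts, q_counts, eps=1e-12):
    """exp(-D_KL(P||Q)) for two binned distributions given as raw counts.

    Returns 1.0 for identical distributions, and values approaching 0
    as the observed distribution P departs from the ideal Q.
    """
    p_tot = sum(p_counts)
    q_tot = sum(q_counts)
    p = [c / p_tot for c in p_counts]
    q = [max(c / q_tot, eps) for c in q_counts]  # guard against empty ideal bins
    # bins with p_i = 0 contribute 0 to the sum, by convention
    d_kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return math.exp(-d_kl)

# identical shapes (counts differ only by a constant factor) -> score of 1
score_same = kl_score([10, 20, 30], [1, 2, 3])
```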
All that remains is to choose the "ideal" distribution of time-gaps against which candidate strategies will be compared. We choose different "ideal" distributions depending on whether color evolution or the lightcurve shape is being probed (Figure 4 shows an example). Bianco et al. 2019 have shown that color can be measured reliably even for rapid explosive transients, for time-gaps as long as 1.5 hours. Of course this is not necessarily true for novel phenomena, but we will use this as a fiducial time interval and take the ideal distribution to be a uniform distribution between 0 and 1.5 hours. To probe lightcurve shapes via pairs of observations in the same filter, we want to measure evolution at all time scales, ideally down to the minimum possible repeat time of a few seconds set by the shutter and readout electronics. For observation-pairs in the same filter, then, we adopt a uniform distribution in log 10 (∆t) for the entire 10-year survey.
The steps of the calculation of the FoM for time gaps are thus: • Compute the discrepancy measure e^(−D_KL) between the distribution of time gaps and an "ideal" distribution, for each filter pair. This step is shown in Figure 3.
• Sum the discrepancy measures over the filter pairs, weighted by the number of visit-pairs over the whole sky in each filter pair, N_k, and optionally by a "scientific" weight factor w that allows certain filter pairs and/or spatial fields to be (de)emphasized.
This weighted sum, over the filter pairs and over the positions in the sky, is the FoM for the OpSim of interest. The process is summarized in the relation:

FoM_tGaps = ( Σ_k Σ_i w_k,i N_k e^(−D_KL,k,i) ) / ( Σ_k Σ_i N_k ),    (2)

where 0 ≤ w_k,i ≤ 1.0, N_k stands for the number of visits in the OpSim for each of the filter pairs, and the index i runs through the healpixels (subsection 2.1). In practice, we use a simplified version of the above relationship, in which the metrics are summed over the sky for each filter pair before computing the KL divergence, since we will embed preferences in the pointing with subsequent components of the FoM (see Section 5):

FoM_tGaps = ( Σ_k w_k N_k e^(−D_KL,k) ) / ( Σ_k N_k ).    (3)

As indicated earlier, we do not choose any weights: the value of w_k is always set to 1 in our calculations. Some filters and filter combinations may well be more useful than others to discover anomalies; trivially, the value of w_k could be set by the limiting magnitude for the shallowest filter in a filter pair.

Figure 5. The plot on the left (a) shows the FoM for repeat visits in the same filter. The plot on the right (b) shows the value for observations in pairs of different filters. Each OpSim is presented as a bar whose length corresponds to the value of the FoM; the FoMs for different filters are concatenated horizontally. For example, on the left the different color bars represent the time-gap FoM for different filters from u to y. The OpSims are sorted by the total FoM. In subplot (a) the bars grow toward the left, in (b) toward the right, so that asymmetries in the plot give intuition on the overall distribution of the two different metrics across the set of OpSims. See subsection 3.1.

Figure 5 shows the FoM_tGaps calculated in Equation 3 for all OpSim runs in OpSim v1.5.
Because the "ideal" comparison distribution is different for the color (different-filter) and lightcurve shape (same-filter) pairs, expression (3) is evaluated twice for each OpSim: once over the 15 different-filter pairs (for color) and once over the six same-filter pairs, with the results presented separately.
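The two steps above can be sketched in a few lines of Python. This is a minimal illustration with hypothetical names and data layout; the N_k-weighted averaging used here for normalization is our assumption, standing in for the exact form of Equation 3:

```python
import numpy as np

def tgaps_fom(observed_gaps, bins, ideal_hist, n_pairs, weights=None):
    """Sketch of FoM_tGaps: for each filter-pair k, compare the observed
    time-gap distribution to an 'ideal' one via e^{-D_KL}, then average
    over filter-pairs weighted by the number of visit-pairs N_k (and an
    optional scientific weight w_k, default 1)."""
    q = np.asarray(ideal_hist, dtype=float)
    q = q / q.sum()  # normalize the ideal histogram to a probability mass
    total = norm = 0.0
    for k, gaps in observed_gaps.items():
        counts, _ = np.histogram(gaps, bins=bins)
        p = counts / counts.sum()
        eps = 1e-12  # guard against log(0)
        d_kl = float(np.sum(np.where(p > 0,
                                     p * np.log((p + eps) / (q + eps)), 0.0)))
        w = 1.0 if weights is None else weights[k]
        total += w * n_pairs[k] * np.exp(-d_kl)
        norm += w * n_pairs[k]
    return total / norm
```

An OpSim whose time-gap histogram matches the ideal one scores close to 1; a distribution concentrated in a single bin is exponentially suppressed.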

Results
Two families of OpSims rise to the top of the list when ranked by FoM_tGaps for the color diagnostics: short and twilight (panel a of Figure 5). This can be explained by the fact that these OpSims contain short exposures that fill in the distributions at short time scales. After a significant performance step, we then see the filterdist, rolling, and dcr families as the next best options.
The light curve shape FoM_tGaps (panel b of Figure 5) shows the rolling family of OpSims rising among the top performers: a rolling strategy naturally provides log-like coverage, which supports the discovery and study of transients at multiple time scales. All of the top-10-performing surveys, from the point of view of lightcurve-shape characterization, are rolling or short-cadence OpSims, although we see a smooth performance decline with no sharp transition.

DEPTH METRICS
Because our time-gaps metrics are essentially based on the number of images in an OpSim that meet some criteria, it is important to ensure that the images being counted all meet some quality standard. In particular, we need to include information about the image depth (i.e., limiting magnitude), so that we compare the discovery potential within the same volume of the Universe. Some OpSims augment the WFD survey with short exposures (see subsection 2.4). In fact, we noted in the FoM_tGaps analysis (Section 3) that OpSims including short exposures rise to the top of the ranked list: while these OpSims meet the nominal criteria and provide valuable image pairs at short time gaps, they may fail to extend the survey volume to unexplored regions, which is the most important contribution LSST will make in the anomaly-discovery space. To account for this, we add a metric component that measures the depth of the images collected by an OpSim.
We inspect the depth distribution of the OpSims for each filter, comparing them to the apparent magnitude limits specified in the Science Requirements Document (Table 6 in Ivezić & the LSST Science Collaboration 2005). In practice, the main contributor to the difference in limiting depth between the OpSims seems to be the time allocated to short exposures. Short exposures are typically designed for specific purposes, such as the detection of Near Earth Objects (NEOs) (e.g., the twilight neo family) or decreasing the saturation limit so as to enable calibrations with shallower surveys (Gizis 2019). We want to penalize surveys where these short exposures come at the cost of deeper images. Figure 6 compares the per-image limiting magnitude (at 5σ) distribution for two implementations of short exposures (twilight neo mod1 and short exp 2ns 1expt) with the baseline survey (blue filled histogram). The OpSims including short exposures show a bimodal distribution of limiting magnitude. The short exposures contribute to a cluster that peaks at magnitudes brighter than 21 in all bands (u=20.45, g=20.95, r=20.95, i=20.95, z=20.75, y=19.95 for short 2ns, and r=20.95, i=20.85, z=20.25, y=20.95 for twilight neo mod1). However, while for short exp 2ns 1expt the distribution of faint (fainter than magnitude ∼21.5) images is not substantially different from that of the baseline survey, twilight neo mod1 has fewer faint images in the r, i, and z bands, and more in the y band.
The related FoM is then the difference between the median of the distribution and the survey specification in the Science Requirements Document (Ivezić & the LSST Science Collaboration 2005):

FoM_depth = Σ_k n_k [median(m_{5,k}) − m_{SRD,k}],    (4)

where the sum extends over the six filters and n_k is a numerical factor that scales the range of the FoM for each filter to [0,1]. This has the effect of treating the contributions from each filter equally. An OpSim must therefore rank at the top in all filters simultaneously to achieve FoM_depth = 1.0.
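A minimal sketch of this component, assuming illustrative per-filter depth targets (the authoritative values are in the Science Requirements Document) and a per-filter min-max rescaling over the OpSims for n_k, averaged over the six filters so that a simultaneously top-ranked OpSim scores exactly 1.0:

```python
import numpy as np

# Illustrative single-visit 5-sigma depth targets; see the SRD for the
# actual specifications.
M_SRD = {"u": 23.9, "g": 25.0, "r": 24.7, "i": 24.0, "z": 23.3, "y": 22.1}

def depth_fom(m5_per_opsim):
    """m5_per_opsim: dict opsim_name -> {filter: array of per-image m5}.
    Returns dict opsim_name -> FoM_depth, with each filter's
    (median - target) offset rescaled to [0, 1] across the OpSims."""
    filters = list(M_SRD)
    # median offset from the target depth, per OpSim and filter
    offsets = {name: {f: np.median(m5[f]) - M_SRD[f] for f in filters}
               for name, m5 in m5_per_opsim.items()}
    fom = {}
    for f in filters:
        vals = np.array([offsets[n][f] for n in offsets])
        lo, hi = vals.min(), vals.max()
        span = hi - lo if hi > lo else 1.0  # degenerate case: single OpSim
        for n in offsets:
            fom[n] = fom.get(n, 0.0) + (offsets[n][f] - lo) / span
    # average over the six filters so a top-ranked-everywhere OpSim scores 1.0
    return {n: v / len(filters) for n, v in fom.items()}
```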
This leads to the ranking of the OpSims shown in Figure 7: the short-exposure family ranks low, compensating for the high rank conferred by the higher number of images in the earlier FoM. Aside from the u60, short, and twilight families, all other OpSims have a similar score, between ∼0.8 and ∼0.9. The u60 OpSim, which takes 60-second u-band exposures instead of the standard 30 seconds, ranks near the top. We also note that the rolling and footprint big sky families are penalized in this metric. This may be a consequence of the added constraints on pointing competing with the constraints on image quality (which relate to weather, airmass, etc.).

FOOTPRINT
Footprint coverage is another important factor which plays a crucial role in determining LSST's ability to discover anomalous and unusual phenomena.
For the purpose of our analysis we define "footprint" as the extent of the sky (the number of fields) that is "well observed" for each filter or filter-pair of interest. This approach is agnostic about the location of the fields in the sky, as we do not know where true novelties may be.

Figure 7. Ranking of OpSims based on the depth of the exposures, as discussed in Section 4. u60, which takes 60-second u-band exposures instead of the standard 30 seconds, extends the observed volume slightly in the u band and ranks highly in this metric, but performed poorly in both FoM_tGaps (Figure 5, a and b). Otherwise, for the most part, families of OpSims are clustered together in this diagram, all with similar FoM_depth scores: 90% of the OpSims generate values within 10% of each other in this metric.

To decide if a field is "well observed," we compare the number of relevant observations to the number obtained in a chosen baseline LSST implementation (here, baseline 1.5), under the motivation that the strategy ultimately adopted by the project should outperform this baseline. In this context, "relevant observations" are defined slightly differently depending on whether one is measuring brightness evolution or color. For single filters, which measure brightness evolution, all observations in that filter are relevant (so the comparison count is just the number of observations in the 10-year survey). For filter pairs, which measure color, observations in a pair are relevant only if they occur within two days of each other (so the comparison count is the number of observation pairs constructed from images in different filters collected within two days of each other).
For the WFD in baseline 1.5 this results in the thresholds for every filter pair listed in Table 2. We acknowledge that this choice of threshold is somewhat arbitrary and that it will influence the result of this component of our FoM. We will return to the choice of threshold, and its impact on the science figures of merit, when we extend our analysis to other versions of the OpSim strategies in subsection 6.3. For the present, we emphasize that this thresholding is entirely relative to the baseline simulation: we are not imposing a requirement that the threshold guarantee a significant probability of detection. Consider for example the u − y filter pair: since there are no u − y observations in the baseline 1.5 survey, 14 a field with nonzero u − y pairs would be considered "well observed" by us for that filter combination. By choosing a threshold relative to a fiducial implementation of the LSST survey, we seek to identify survey strategies that expand the potential of LSST. With these considerations in mind, the footprint figures of merit are generically calculated following the steps below.

Figure 9. The figures of merit FoM_EG (a) and FoM_Gal (b) for all OpSim runs (for the WFD survey, selected as proposalId = 1 in the SQL query, see Section 2) based on footprint coverage and star count with image pairs in the same filter (measuring lightcurve shape) as described in Section 5 (Equation 5). Colors and symbols denote filter combinations using the same conventions as in Figure 5. The two FoMs go hand in hand, with small differences in the ranking.

Figure 10. As Figure 9 but for image pairs in different filters (measuring color) as described in Section 5 (Equation 5). Colors and symbols denote filter combinations using the same conventions as in Figure 5.
For each filter (or filter-pair) k:
• count the number of visits or visit-pairs in each field; for same-filter pairs, consider all possible pairs; for different-filter pairs, consider only time gaps within 2 days;
• compute the median of this count in baseline 1.5 (call this N_median,k);
• check whether N_k > N_median,k;
• sum over all fields that pass this requirement.
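The steps above might be sketched as follows for a single filter-pair (names and data layout are hypothetical; a real implementation would query the OpSim database through MAF):

```python
import numpy as np

def well_observed_count(visits, baseline_visits, max_gap_days=2.0):
    """Count fields 'well observed' in one filter-pair, relative to a baseline.

    visits, baseline_visits : dict field_id -> (sorted array of MJDs in
    filter 1, sorted array of MJDs in filter 2). For same-filter pairs,
    pass the same array twice and set max_gap_days=None (all pairs count).
    """
    def n_pairs(t1, t2):
        if max_gap_days is None:
            return len(t1) * len(t2)
        # pairs of visits in the two filters within max_gap_days of each other
        return sum(int(np.sum(np.abs(t2 - t) <= max_gap_days)) for t in t1)

    # threshold: median relevant-pair count over the baseline fields
    n_median = np.median([n_pairs(*v) for v in baseline_visits.values()])
    return sum(1 for v in visits.values() if n_pairs(*v) > n_median)
```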
Depending on whether a scientist's focus is on extragalactic or Galactic anomalies, the preferred footprint would be different: for extragalactic anomalies one would simply want to maximize the sky coverage, whereas for Galactic science the probability of discovering an anomalous object or phenomenon would scale with the number of objects in the Galaxy in that observing field. Therefore, in addition to the FoM just described, which focuses on extragalactic science and which we call FoM_EG hereafter, we include one further footprint figure of merit, FoM_Gal, that scales with the field's star density: FoM_Gal is the sum over each field that meets the requirements described above, multiplied by the number of stars in that field (itself obtained from a realization of the TRILEGAL models of Girardi et al. 2005, accessed via MAF).
For an OpSim these FoMs are therefore defined as:

FoM_EG = Σ_k n_k Σ_i p_{i,k},    FoM_Gal = Σ_k n_k Σ_i s_i p_{i,k},    (5)

where i is an index that ranges over all observed fields, s_i is the star density (obtained from existing MAF functions) for the i-th field, and p_{i,k} is set to 1 or 0 depending on whether the field meets the minimum-visit requirement for that filter or filter-pair. Similarly to the depth figure of merit (Section 4), the renormalization factor n_k is the reciprocal of the maximum value (over all the OpSims) of the sum over fields in the k-th filter. This renormalization serves to treat all the filters (or filter-pairs) on an equal footing: an OpSim must be simultaneously top-ranked in all filters under consideration to achieve a FoM value of 1.0. These figures of merit for all 86 simulations in OpSim v1.5 are plotted in Figure 9 and Figure 10. While some OpSims were designed to cover a large footprint (such as footprint bigsky), other OpSims perform better under the footprint figure of merit we develop here, which includes visit-count thresholding in addition to simply evaluating the area covered. So we see again the short and rolling cadences rising to the top.
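For a single OpSim, Equation 5 reduces to a pair of weighted sums. In this sketch the function name and the division by the number of filters are our assumptions, the latter chosen so that a simultaneously top-ranked OpSim lands at exactly 1.0:

```python
import numpy as np

def footprint_foms(p, star_density, n_eg, n_gal):
    """Sketch of FoM_EG and FoM_Gal for one OpSim.

    p[k, i]         : 1 if field i passes the minimum-visit threshold for
                      filter(-pair) k, else 0
    star_density[i] : s_i for field i (e.g., from TRILEGAL via MAF)
    n_eg, n_gal     : per-filter renormalization factors n_k (reciprocal of
                      the maximum per-filter sum over all OpSims)
    """
    p = np.asarray(p, dtype=float)
    fom_eg = float(np.sum(n_eg * p.sum(axis=1))) / p.shape[0]
    fom_gal = float(np.sum(n_gal * (p * star_density).sum(axis=1))) / p.shape[0]
    return fom_eg, fom_gal
```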

DISCUSSION
We have created a series of MAFs and FoMs to assess the ability of Rubin Observatory LSST to discover completely novel astrophysical objects and phenomena. In an attempt to remain agnostic as to which specific characteristic may render an object or phenomenon anomalous, and thus which kind of anomalies we could discover, we choose to assess the completeness of coverage achieved in a phase space quantified by figures of merit exploring the following observables:
1. Flux change, parameterized as FoM_tGaps-magnitude;
2. Color, parameterized as FoM_tGaps-color;
3. Depth, parameterized as FoM_depth;
4. Sky footprint, parameterized as FoM_EG;
5. Star counts, parameterized as FoM_Gal.
(In contrast to Figure 9 and Figure 10, in this Section the footprint and star-count figures of merit above are evaluated over all combinations of filters.) The five elements enumerated above are added straightforwardly to one another (Equation 1), although the final FoM could be fine-tuned to some phenomenological expectations (for example, to favor the discovery of Galactic, as opposed to extragalactic, transients) by choosing the weights in the sum over the FoM components.
We note that the weights are thus formally somewhat arbitrary but scientifically crucial: they set the balance of science considerations imprinted on the summed figure of merit by the investigator. Remaining "agnostic," we opt to strive for balance in the normalization and relative weighting of each element of the FoM. The individual figures of merit are each normalized so that they essentially rank all the OpSims on a 0-1 scale for that particular dimension in feature space, where an OpSim must be top-ranked simultaneously in each filter (or filter-pair) to achieve the maximum score of 1.0 (Sections 3-5). We then choose weighting factors (w_i in Equation 1) to weight each of the five FoMs equally.
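The equal-weight combination (Equation 1) is then a one-liner; the `weights` argument in this sketch is a hypothetical hook for the re-balancing discussed above:

```python
def combined_fom(components, weights=None):
    """Weighted sum of the FoM components (Equation 1 sketch).
    components: dict name -> value in [0, 1] for one OpSim.
    With no weights given, each component gets w_i = 1/N, so an OpSim
    that is top-ranked in every component scores exactly 1.0."""
    if weights is None:
        weights = {name: 1.0 / len(components) for name in components}
    return sum(weights[name] * value for name, value in components.items())
```

A Galactic-focused investigator could, for example, pass weights that up-weight the star-count component relative to the extragalactic footprint.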

Main Survey
First, we summarize some considerations arising from our analysis of the performance of different LSST simulations from OpSim v1.5. These considerations, however, should be read in light of the discussion of the different OpSim versions in Bianco et al. (2021), and rely on the reader's familiarity with the OpSim v1.5 set, as described therein and in more detail on the Rubin Community forum 15 . We will also briefly extend this discussion to other versions of OpSims in subsection 6.3.
The bar charts included in this work (Figure 5 - Figure 10) provide an intuitive way to understand how sensitive a FoM is to observing-cadence choices. We note that:
• The FoM_tGaps for flux evolution (Figure 5 a) is very sensitive to OpSim details: OpSims that include short exposures are critically improved, as these provide visibility into time scales that are otherwise not accessible to the survey. Going back to the phase space of transients presented in Figure 2 and the discussion of existing observational biases, rapid evolutionary time scales are quite likely to host unobserved, unexpected phenomena: true novelties. Our metric reflects this expectation.

15 https://community.lsst.org/t/fbs-1-5-release-may-update-bonus-fbs-1-5-release/4139
• This effect is mitigated by the depth metric, which down-weights OpSims where the short exposures come at a cost to overall survey depth. Otherwise, this metric does not differ much across most OpSims, as the median observation depth is well constrained by the requirements of Ivezić et al. (2019).
• The Galactic and extragalactic footprint metrics as defined by us are somewhat less sensitive to observing choices, as indicated by the gentler slope of the silhouette of the bar charts in Figure 9 and Figure 10. However, in Figure 9 three regimes are visible: OpSims that include short exposures (twilight, short exp, and some wfd depth implementations with a large fraction of observations included in the WFD survey, i.e., a large "scale" parameter) rise to the top. A number of specific implementations from nearly all families, however, sink to the bottom and perform very poorly (some footprint implementations and wfd depth surveys with a small scale parameter). The ranking of the OpSims is similar in Figure 9 and Figure 10.

Figure 11 shows the performance for the combined FoM as described above, organized by OpSim family. Observations associated with the WFD proposal are shown in Figure 11-a (the results including the mini-surveys are shown in panel b; the mini-surveys themselves are discussed in more detail in subsection 6.2). This visualization provides a synoptic look at our FoM. Individual components of the FoM can still be identified by the color of the bar element. Furthermore, this visualization allows us to assess the performance of a family of OpSims, providing a more intuitive way to understand why OpSims may rank differently, and also how the details of an implementation affect results. For example, the short and twilight families are among the top performers, with little sensitivity to the details of the implementation. Conversely, the wfd and footprint families (the former ranking third overall, the latter seventh, in the middle of the ranking) produce a range of results, from excellent to poor, depending on the implementation details. In both cases, the performance is dominated by the footprint FoM (the purple portion of the bar).

For the wfd family, the result of both footprint FoMs scales with the "scale" parameter, the number of visits allocated to the WFD survey (but see also subsection 6.2). It should be noted that these are core families of simulations, with a range of implementation details that can be tweaked, so it is not surprising that they produce a range of measured performance.

Figure 11. Bar plot showing the performance for our final five-fold FoM, ranked by each family's top-performing OpSim: (a) for WFD observations selected by setting proposalId=1; (b) all observations not identified with DDFs. Arrows point to the OpSims that are also shown in the radar plots in Figure 12. This plot is discussed in Section 6. See subsection 2.4, Table 1, and Bianco et al. (2021) for more detail.
In subsection 6.2 we will discuss Figure 11-b and address the question of what the minisurveys add to the science performed in the WFD regions, by considering together all the exposures not identified with a deepdrilling field.
Applying our metrics to OpSim v1.5, we note that:
• The FoM_tGaps magnitude-evolution component (see also Section 3) pushes entire families of OpSims to the top, namely those that include short observations and thus expand the LSST feature space to short time scales.
• Within an OpSim family, the most significant contributions in determining the ranking of OpSims are FoM_EG and FoM_Gal, which are, however, strongly correlated (see also Section 5).
• Overall, the top-performing OpSims in each family are all within a score of ∼0.3 of each other, demonstrating that all OpSim families have the potential of being implemented in a way that is favorable to the discovery of true novelties, with the exception of specialized surveys such as bulge and alt dust: these families, which typically allocate visits to focus areas of the sky, are penalized in the footprint portion of our FoM. We refrain from discussing the rolling family of OpSims until subsection 6.3.

Figure 12. […] DDFs. An interactive version of this plot is available at https://xiaolng.github.io/widgets/radar.html, see Appendix A. A radar plot shows the metrics at the vertices of a polygon, with the metric value mapped to the distance from the center of the polygon. With multiple OpSims plotted in the same radar plot, we can compare the tensions between FoM components, while the total area inside the polygon is a measure of the overall quality of the OpSim. For our FoM, which is the simple sum of five components, this visualization is well suited to provide a synoptic view. This plot is discussed in subsection 6.1.
The radar plot in Figure 12 shows selected OpSims, and can be used to tune the balance in the design of the final strategy. In Figure 12, the best-performing OpSims for each of the top four families identified above are plotted (panel a for the WFD). With this visualization, we can see the substantial impact on the overall result of the flux-change component of the metric, which measures completeness in pairs of observations in the same filter, and how it is compensated, in the case of short exp, by the depth FoM_depth, leaving twilight neo mod2 as the most balanced OpSim for our set of metrics.
We provide an interactive widget that allows the reader to explore the radar plot for our and other sets of metrics in Appendix A.

Minisurveys
In addition to the primary WFD survey, LSST has the capability of conducting mini-surveys, including but not limited to the Galactic Plane, the Magellanic Clouds, and the Deep Drilling Fields. These mini-surveys enhance science cases that yield greater science return with a greater density of targets, including (but not limited to) the detection of stellar-mass black holes, dwarf novae, Type Ia supernova progenitors, and gravitational microlensing at various timescales. Because the mini-survey regions tend to cover areas of high stellar density, observations of them are more likely to reveal phenomena never observed before.
To assess the coverage achieved in areas of interest to the mini-surveys, we select observations by spatial footprint (rather than by proposalID), as discussed in Section 2. Figure 13 shows the adopted mini-survey regions: Galactic Plane (GP), Large Magellanic Cloud (LMC), and Small Magellanic Cloud (SMC). The adopted GP footprint is a cosine function of Galactic longitude, with amplitude |b| = 10° and first zero at |l| = ±85°, plus a strip of constant thickness |b| ≤ 2.5° to accommodate the thin disk at all longitudes. 16 For the Magellanic Clouds, we select all healpix fields (resolution parameter NSIDE=16) within the 3.5° field of view of the twelve fields covering the clouds' main bodies proposed in the Olsen et al. (2018b) cadence whitepaper.
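The GP membership test is easy to express in code; this sketch (function name ours) implements the cosine envelope plus the constant strip:

```python
import numpy as np

def in_gp_footprint(l_deg, b_deg):
    """Galactic Plane footprint test, sketching the adopted definition:
    a cosine envelope in longitude with amplitude |b| = 10 deg and first
    zero at |l| = 85 deg, plus a constant |b| <= 2.5 deg strip at all l."""
    # wrap longitudes to [-180, 180) so the envelope is symmetric about l=0
    l = (np.asarray(l_deg) + 180.0) % 360.0 - 180.0
    b = np.abs(np.asarray(b_deg))
    envelope = np.where(np.abs(l) <= 85.0,
                        10.0 * np.cos(np.pi / 2.0 * l / 85.0), 0.0)
    return (b <= envelope) | (b <= 2.5)
```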
We also provide code for the user to choose a specific region of the sky of their interest (see Appendix A) either by setting a formula from coordinate parameters, or interactively selecting pixels.
The individual figures of merit in these regions are normalized following similar schemes as for the main survey, but with thresholds or maximum values evaluated over the spatial regions of interest (see Tables 3-5 for the comparison N_median,k counts for the mini-survey spatial regions). We normalize the FoM of the time-gap metric by its maximum value across the OpSims. For the footprint, as in Section 5, we choose the median number of visits from baseline v1.5 within the defined footprint, normalized by the total number of selected fields within the region (254 for the Galactic Plane, 12 for the LMC, and 5 for the SMC), as the threshold to decide whether to classify a field as "well observed" (Tables 3-5). Figure 14 shows the figure-of-merit evaluation for the three spatial regions for fields that are allocated WFD coverage (i.e., observations with proposalID=1). This demonstrates quite dramatically that the Magellanic Clouds are not allocated WFD-like coverage in most of the strategies considered. Figure 15 widens the evaluation to include any exposures not associated with deep drilling fields. Strong variation is apparent between the families of OpSims, as expected for families that experiment with the areas of coverage on-sky. The footprint family shows strong variation depending on which region is favored: footprint gp smooth performs best for the Galactic Plane, but is in the bottom quartile (of all the OpSims) for the Magellanic Clouds. Conversely, footprint add mag clouds is at or near the top for the Magellanic Clouds, but near the middle for the Galactic Plane regions.
The AltSched implementations perform quite badly for the Galactic Plane regions, but allocate favorable observations to the Magellanic Clouds. Curiously, the bulges family of OpSims are among the worst-performing families for the Galactic Plane regions, though in the top three for the Magellanic Clouds. The baseline strategies appear near the middle of the distribution for the minisurvey regions.
Since the regions are to some extent competing with each other in terms of allocation, Figure 14 and Figure 15 may be best interpreted in terms of which OpSims to avoid because they are problematic for particular regions of scientific importance. From that perspective, the OpSims alt dust and footprint new are unlikely to satisfy those interested in the Galactic Plane, while the filterdist family serves the Magellanic Clouds particularly poorly.
By comparing panels a and b of Figure 15 we can address the question of what the mini-surveys add to the science performed in the WFD regions, by considering together all the exposures not identified with a deep-drilling field (Figure 15-b). Most of the FoMs remain relatively unchanged by the inclusion of the mini-survey exposures. The exception is the "star density" FoM, which returns systematically lower values for each OpSim when the mini-surveys are included. This is probably an artefact of normalization and thresholding: OpSims that cover regions of high stellar density well will increase the upper bound of the range of this FoM across the OpSims. If one or two OpSims stand out from the rest in this regard, the standouts will return renormalized FoMs near 1.0 while the rest sink to lower values. Inclusion of the mini-survey-identified observations in the FoM computation changes the ordering of the families somewhat, though not radically: the baseline strategies, for example, remain in the bottom quartile of the OpSims when ranked by family (see panel b of Figure 15). The mini-surveys do seem to reduce the contrast somewhat between OpSims within a given family, and even between the families.
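A toy example makes this normalization artefact concrete: adding a single standout OpSim to the max-normalization pool drags every other renormalized value down, even though the raw values are unchanged. The numbers below are invented purely for illustration:

```python
import numpy as np

def renormalize(raw):
    """Rescale a raw star-count FoM by its maximum across the OpSims,
    as the n_k factors do."""
    raw = np.asarray(raw, dtype=float)
    return raw / raw.max()

# without minisurveys the OpSims cover similar stellar densities ...
close = renormalize([90, 100, 95])
# ... with minisurveys, one OpSim covers a very dense field and stands out
standout = renormalize([90, 100, 95, 400])
```

In the first case all values stay near 1; in the second, the same three OpSims sink to a quarter of the scale or below.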

Comparison with v1.7
Our work is based on OpSim v1.5, the version of the OpSim simulations released in May 2020; more simulations have been released since. We briefly inspected the performance of OpSim v1.7 (74 simulations) and OpSim v1.7.1 (10 simulations), the most recent releases at the time of writing.
It is important to note some key differences between OpSim v1.5, OpSim v1.7, and OpSim v1.7.1 (a thorough description of these simulations is outside the scope of this paper; details are available on the Rubin Community web forum 17 ). OpSim v1.5 uses 1×30-second exposures for almost all simulations, while OpSim v1.7 and OpSim v1.7.1 use 2×15-second exposures per visit. It is estimated that this leads to a loss of efficiency of ∼7%. 18 It is also expected that the rolling family of OpSims would display significant changes compared to OpSim v1.5, due to improvements in the way rolling cadences are implemented to more closely match their specifications. Versions 1.7 and later of the rolling OpSims are considered a more reliable implementation of rolling cadence than v1.5 (Lynne Jones, private communication). Figure 16 and Figure 17 show our FoM for all OpSims of the three OpSim versions side by side, color-coded by version. Figure 16 shows the results for observations identified with the WFD survey (proposalId=1), while Figure 17 shows the results for all observations not identified with Deep Drilling Fields (thus addressing the impact of the inclusion of the mini-surveys on the overall science figures of merit).
When run on our final FoM, OpSim v1.5 leads in general to larger FoM values (and thus suggests greater scientific yield). In Figure 16-a, we can observe how almost all OpSim v1.5 simulations (blue) outperform OpSim v1.7 (orange), while OpSim v1.7.1 simulations populate all regions of the chart, with six stripe scale0.90 nslice6 fpw0.9 nmw0.0 outperforming all others. This is a rolling cadence, with six declination stripes as the rolling scheme. This OpSim performs well on all components of our metrics except the piece that measures flux change (FoM_tGaps in the same filter), where it is outperformed, as discussed in Section 3 and Section 6, by OpSims that include short exposures. However, the performance on measuring color (i.e., the number of observations within 1.5 hours in different filters) and the footprint components of the FoM are sufficient to compensate for this and place the OpSim at the top.

18 See for example https://community.lsst.org/t/october-2019-update-fbs-1-3-runs/3885
We suspect that much of the difference between OpSim versions may be due to the sensitivity of the figures of merit to the total number of observations collected, through the ∼ 7% reduction in the number of observations per field noted above (particularly considering that our footprint F oM s are based on a threshold).
To test this hypothesis, we scale down by 7% the number of visits in the calculations of the F oM tGaps and F oM footprint elements applied to OpSim v1.5. The results of this exercise are shown in panel b of Figure 16 and Figure 17.
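The rescaling itself is trivial; this sketch (names ours) simply shows where the ∼7% enters before the count-based FoM components are recomputed:

```python
def scaled_visit_count(n_visits, efficiency_loss=0.07):
    """Scale a v1.5 visit count down by the ~7% efficiency loss expected
    from switching 1x30 s visits to 2x15 s, before recomputing the
    count-based FoM components (FoM_tGaps and the footprint FoMs)."""
    return int(round(n_visits * (1.0 - efficiency_loss)))
```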
This in fact does mitigate the almost binary split in performance between the two OpSim versions seen in panel a, although the twilight and short simulations from version OpSim v1.5 continue to be at the top.
Correcting for the 7% depth effect, the OpSim v1.7.1 release also improves relative to OpSim v1.5 (as expected), and now nine of the 10 simulations in OpSim v1.7.1 are in the top 50%. Indeed, once we control for the overall number of observations, the OpSim v1.7 and OpSim v1.7.1 evaluations tend to populate the upper half of the distribution. However, most of the very highest-performing OpSims by our FoMs still belong to OpSim v1.5: we note that seven of the eight top-performing OpSims in OpSim v1.5 include exploration of short exposures (Figure 16-b).

Figure 14. Bar plot, as Figure 11, showing OpSims ranked by family, but this time for three selected spatial regions: (a) the Galactic Plane, (b) the LMC, (c) the SMC. Only visits allocated to the WFD (labeled as proposalId=1) are counted. Arrows point to the OpSims that are also shown in the radar plots in Figure 12. The twilight and short families of OpSims perform best on the Galactic Plane, as they did over the entire WFD footprint, while wfd depth, formerly ranked third, is now ranked seventh. But in reality the top-performing OpSims in most families all perform similarly; the main differences are generally driven by FoM_tGaps in the same filter. Only eight OpSims cover the SMC, and only five cover both the LMC and SMC, with WFD-identified observations. See subsection 6.2.
Inclusion of the minisurveys seems to mitigate slightly the preference for 1×30s exposures, with a handful of OpSims from OpSim v1.7 now appearing in the top quartile (Figure 17-a). As with the WFD-only observations, the overall number of exposures seems to explain most of the discrepancy between OpSim v1.5 and the newer releases (Figure 17-b).

CONCLUSION
Rubin LSST is designed to transform entire fields of astronomy by collecting an unprecedentedly large and rich photometric data set. Yet one of the most exciting promises of LSST is its potential to discover completely novel phenomena, never before observed or predicted from theory. We created a five-fold FoM, relying on a set of MAFs, that assesses the ability of Rubin Observatory LSST to discover novel astrophysical objects. But instead of selecting known anomalies (e.g., Boyajian et al. 2018) or theoretically predicted unusual phenomena to benchmark our results, as is more commonly done in the field (Soraisam et al. 2020; Pruzhinskaya et al. 2019; Ishida et al. 2019; Aleo et al. 2020; Vafaei Sadr et al. 2019; Martínez-Galarza et al. 2020; Lochner & Bassett 2020; Doorenbos et al. 2020), we attempted to remain true to the premise that a true novelty is something that fundamentally cannot be predicted. This exercise is conceptually difficult, as by definition we do not know what we are looking for. We can, however, rely on the completeness of the feature space derived from the survey's data: if all measurable features are exhaustively sampled, anomalies can be detected.
We thus created a series of MAFs and F oM s that measure the completeness in the space of observables derived from LSST data. Completeness to color and magnitude (and their evolution) was probed by measuring the number of observations and time gaps between observations in pairs of different filters and in the same filter (respectively). We scaled a survey quality by the survey's sky coverage, choosing to benchmark this component of the metric to a fiducial implementation of LSST, baseline v1.5, and by the number of objects observed, scaling the footprint itself by the number of stars in each field. These metrics were then summed into a single F oM . Finally, since the F oM so far assembled largely relies on number of observations, an F oM element was needed that considers the quality of the observations. For this we added F oM depth to measure the LSST 10-year stacks magnitude depth, penalizing for example OpSims that include short-exposure observations if these take time from high-quality, deep observations, but only in this case. Proper motion considerations are reserved for paper II.  Figure 16. Bar plot showing the ranking of OpSims based on our five-fold F oM for WFD visits (selected as proposalId=1). All simulations from OpSim v1.5, OpSim v1.7, and OpSim v1.7.1 are included. (a) shows the result of our F oM while (b) shows the result after scaling the number of visits in OpSim v1.5 by 7% to isolate the impact of small differences in survey efficiency associated with the single visit collection strategy (1×30 seconds vs 2×15 seconds). The contribution of each component of our F oM is shown in the same order as in Figure 11, Figure 14, and Figure 15: flux change, color, depth, footprint, and star count from left to right. This plot is discussed in subsection 6.3  Figure 17. As Figure 16 but for all regions of the sky that do not correspond to DDFs. 
This plot is discussed in subsection 6.3.

While the main purpose of this paper is to conceptualize a non-parametric way to explore a survey's potential for anomaly detection, these considerations will ultimately need to be applied to current and future Rubin LSST candidate strategies. To illustrate how this can be done, we performed these comparisons for recent suites of simulations.
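The assembly of metric components into a single figure of merit described above can be sketched in a few lines. This is an illustrative sketch, not the paper's actual code: the component names, the baseline values, and the equal-weight normalization are assumptions for illustration only.

```python
# Illustrative sketch (not the paper's actual code) of assembling a
# five-fold figure of merit: each component is normalized against a
# fiducial baseline run, and the normalized values are summed.

def five_fold_fom(components, baseline):
    """Sum metric components after normalizing each to the value
    obtained by a fiducial OpSim (e.g., baseline v1.5)."""
    return sum(components[k] / baseline[k] for k in baseline)

# Hypothetical baseline values for the five components.
baseline = {"flux_change": 120.0, "color": 80.0, "depth": 26.5,
            "footprint": 18000.0, "star_count": 1.0e9}

# An OpSim matching the baseline in every component scores exactly 5.0.
fom = five_fold_fom(baseline, baseline)
```

With this normalization, an OpSim that outperforms the fiducial run in a given component contributes more than one unit for that component, so the total directly measures performance relative to baseline v1.5.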
We identified some high-performing families within OpSim v1.5 and justified their high rank as measured by our FoM (subsection 6.1 and subsection 6.2). Generally, families of OpSims that maximize the diversity of the observations (in terms of time gaps, footprint, and exposure time) seem to be preferred, but there is considerable variation within each family.
To first order, as expected, the mini-surveys seem to be driven by footprint considerations. Since the allocation of observations to mini-survey regions is fundamentally a zero-sum game, we point out that there are high-performing OpSims for the mini-surveys in which allocating a modest number of exposures to the mini-surveys does not seriously impact the scientific goals of the main survey.
We briefly inspected the most recent (at the time of writing) versions of the OpSims, OpSim v1.7 and OpSim v1.7.1, and found that their performance is generally impacted by collecting each visit in two snaps (2×15 seconds vs 1×30 seconds).
However, even correcting for this, some families of OpSim v1.5 simulations remain the best performers for our science case: namely, those that provide visibility into additional time scales by adding short exposures to the observing plan, scheduling them when long exposures are unfeasible (e.g., in twilight) so that they do not come at the cost of an overall loss of survey depth. We point out that any extension of the feature space is advantageous to the discovery of true novelties; we are therefore not bound to the minimum allocation of short exposures required for other goals (such as cross-calibration of LSST against external catalogs with brighter saturation limits; e.g., Gizis 2019). Those considerations are beyond the scope of this paper.
A further comment on the issue of 1×30 vs 2×15 seconds is in order. While we see the effects of increased survey efficiency in our metrics, it should be emphasized that none of our metrics considers the impact of this choice on image quality, or on the capability to open up intra-visit timescales by treating the two exposures in a visit separately. Combining the impact of Rubin LSST's data volume with visibility into short time scales is potentially transformational for rare phenomena (e.g., relativistic explosions; see Figure 2). We note, however, that any analysis based on the individual snaps that make up the 30-second visit would require custom pipelines.
A comprehensive discussion of the detailed reasons why a specific OpSim achieves a certain performance is, however, beyond the scope of this paper. We instead encourage the use of our metrics to evaluate existing and new OpSims, in order to implement an LSST survey that maximizes the throughput of the Rubin Observatory in its four science pillars while paying particular care to the discovery of novelties, which has the potential to advance or transform all of these fields.
The code on which this analysis is based is available in its entirety in a dedicated GitHub repository 19. This paper was created in the nursery of the Vera C. Rubin Legacy Survey of Space and Time Science Collaborations 20, and particularly of the Transients and Variable Stars Science Collaboration 21 (TVS SC) and the Stars, Milky Way, and Local Volume Science Collaboration 22 (SMWLV SC).
The authors acknowledge the support of the Vera C. Rubin Legacy Survey of Space and Time TVS SC and SMWLV SC, which provided opportunities for collaboration and the exchange of ideas and knowledge. The authors are thankful for the support provided by the Vera C. Rubin Observatory MAF team in the creation and implementation of MAFs. The authors acknowledge the support of the LSST Corporation, which enabled the organization of many workshops and hackathons throughout the cadence optimization process through private fundraising.
The authors thank Dr. Edward Ajhar, who emphasized the importance of an evaluation of the effectiveness of the Rubin survey strategy in the discovery of unknown phenomena at the 2019 LSST (Rubin) Project Community Workshop.
We used the following packages:
• python, including
• d3.js 24 to create spatial selection tools and interactive radar/parallel plots.

A. INTERACTIVE TOOLS
We provide three interactive tools that support the analysis performed in this work. We make JavaScript D3 (Bostock et al. 2011) interactive versions of two synoptic visualizations of the results of our FoM available: a parallel coordinate plot and a radar plot. The parallel coordinate plot 25 (Figure A.18, panel a) allows the user to follow the performance of an OpSim across the components of the FoM. By toggling between families of OpSims to highlight the OpSims within each, while keeping all other OpSims in the background, the user can easily identify "standout" OpSims by FoM element. By selecting the "cumulative" option, the viewer can follow the evolution of an OpSim across components of the FoM while retaining information about its overall performance.
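The "cumulative" view can be sketched as a running sum over the FoM elements, so the last value traced by the plot equals the full FoM. The five element values below are illustrative only.

```python
# Minimal sketch of the "cumulative" option: each FoM element is
# added to a running total, so the final value is the full FoM.
from itertools import accumulate

elements = [0.9, 1.1, 0.8, 1.0, 1.2]      # illustrative per-element scores
cumulative = list(accumulate(elements))    # what the cumulative view traces
```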
The radar plot 26 (Figure A.18, panel b; also discussed in Section 6) is a synoptic visualization that maps multiple elements of a FoM to a polygon, with the distance of each vertex from the center representing the result of the corresponding FoM element. It allows the user to visualize tensions between components of the FoM as well as the overall quality of an OpSim, which maps to the area of the polygon. It is, however, hard to include many OpSims in the same radar plot without compromising readability. This widget therefore allows the reader to toggle between OpSims, which are color-coded by family.
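The mapping from FoM elements to polygon area can be made concrete with a short sketch: the i-th vertex sits at distance r_i along the i-th of n equally spaced spokes, and the polygon area follows from the cross-product formula for adjacent vertices. The element values are illustrative.

```python
# Sketch of how "overall quality" reads off a radar plot: the area of
# the polygon whose i-th vertex lies at distance r_i (the i-th FoM
# element) along equally spaced spokes.
import math

def radar_area(values):
    """Area of the radar-plot polygon, summing the triangle areas
    0.5 * r_i * r_{i+1} * sin(2*pi/n) between adjacent spokes."""
    n = len(values)
    dtheta = 2.0 * math.pi / n
    return 0.5 * math.sin(dtheta) * sum(
        values[i] * values[(i + 1) % n] for i in range(n))

# Five equal components of 1.0 give a regular pentagon inscribed in
# the unit circle (area ~ 2.378); shrinking any component shrinks it.
area = radar_area([1.0] * 5)
```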
In both widgets it is also possible to select the survey or sky area that the user wants to inspect (e.g., WFD, LMC, etc.; see subsection 2.3).
While both widgets come pre-loaded with the metrics developed in this paper, the user can easily visualize their own metrics by uploading a comma-separated-value file with the results of the MAFs, containing the following columns: db (the database name), m 1 (numerical value of the first element of the metric), m 2 (numerical value of the second element of the metric), ..., m n (numerical value of the last element of the metric).
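A minimal sketch of that upload format, and of parsing it the way a widget would: one row per OpSim database, one column per FoM element. The database names and metric values below are illustrative only.

```python
# Minimal sketch of the expected CSV upload format: a "db" column
# with the database name, then one numeric column per FoM element.
import csv
import io

text = (
    "db,m1,m2,m3\n"
    "baseline_v1.5_10yrs,0.93,1.00,0.88\n"   # illustrative names/values
    "example_opsim,0.91,0.97,0.90\n"
)

# Parse as a widget would: db stays a string, elements become floats.
rows = []
for row in csv.DictReader(io.StringIO(text)):
    rows.append({k: (v if k == "db" else float(v)) for k, v in row.items()})
```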
We offer a Python-based widget to select regions of sky based on a specific pixelization (e.g., healpix), which was used to select the Galactic Plane, LMC, and SMC regions in subsection 6.2. This tool is available in a dedicated GitHub repository 27, as both a Jupyter notebook and an interactive webtool (Li 2021).
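The idea behind such a region selection can be sketched without the healpix machinery: flag sky positions within an angular radius of a region center via the great-circle separation. The LMC and SMC coordinates and the 5-degree radius used here are assumptions for illustration; the actual tool operates on healpix pixels.

```python
# Simplified sketch of on-sky region selection. The actual widget
# works on a healpix pixelization; here we just flag positions within
# an assumed angular radius of an assumed LMC center.
import math

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation between two sky positions, in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    cos_sep = (math.sin(dec1) * math.sin(dec2)
               + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_sep))))

def in_lmc(ra, dec, center=(80.9, -69.8), radius_deg=5.0):
    """True if (ra, dec) falls within radius_deg of the assumed center."""
    return angular_sep_deg(ra, dec, *center) <= radius_deg

# The assumed LMC center is selected; an assumed SMC position is not.
selected = [in_lmc(80.9, -69.8), in_lmc(13.2, -72.8)]
```

A healpix-based version would instead test pixel centers (or use a disc query around the region center) so the selection aligns with the MAF sky binning.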