Measuring the propagation speed of gravitational waves with LISA

The propagation speed of gravitational waves, $c_T$, has been tightly constrained by the binary neutron star merger GW170817 and its electromagnetic counterpart, under the assumption of a frequency-independent $c_T$. Drawing upon arguments from Effective Field Theory and quantum gravity, we discuss the possibility that modifications of General Relativity allow for transient deviations of $c_T$ from the speed of light at frequencies well below the band of current ground-based detectors. We motivate two representative Ans\"atze for $c_T(f)$, and study their impact upon the gravitational waveforms of massive black hole binary mergers detectable by the LISA mission. We forecast the constraints on $c_T(f)$ obtainable from individual systems and a population of sources, from both inspiral and a full inspiral-merger-ringdown waveform. We show that LISA will enable us to place stringent independent bounds on departures from General Relativity in unexplored low-frequency regimes, even in the absence of an electromagnetic counterpart.


Introduction
The detection of gravitational waves (GWs) by the LIGO-Virgo Collaboration has set stringent constraints on deviations from General Relativity (GR) [2][3][4][5] and offered additional probes of fundamental physics (e.g. [1]). The future spaceborne detector LISA [6] will be able to further test properties of GW propagation over large cosmological distances, by measuring low-frequency GWs emitted by coalescing massive black hole binaries. The specific aim of this work is to investigate what LISA can teach us about the speed of gravitational waves, by means of analysis of GW waveforms only. Our goal is part of a wider search for general, frequency-dependent modifications of GW propagation, which can be tested by the next generation of GW experiments (see e.g. [7,8]). The propagation speed of GWs, c T , was most recently measured by the LIGO-Virgo collaboration using observations of the binary neutron star merger GW170817 [9][10][11][12]. This impressively precise bound 1 of −3×10 −15 ≤ c T −1 ≤ 7×10 −16 (in c = 1 units) was translated into a constraint on the landscape of dark energy and extended gravity models in [13][14][15][16][17][18], where it proved fatal for a handful of theories.
Indeed, the constraint from GW170817 is widely considered a major challenge to extended gravity theories predicting a non-standard GW propagation speed. However, it can also inform discussions on properties required for these gravity models to possess a healthy ultraviolet (UV) completion. This is the viewpoint of [19], which added a degree of subtlety to the interpretation of the data that has not yet been considered widely in the literature (though see e.g. [20] for further theoretical work on the topic). In [19], compelling arguments and examples are presented suggesting that the speed of propagation of GWs may vary as a function of the energy scale. The starting point is the observation that at low energies, most theories spontaneously break Lorentz invariance through a time-dependent vacuum expectation value of an additional field(s). Such a time-dependent vacuum expectation value is essential for driving cosmic acceleration, but it usually leads to a tensor speed c T < 1 due to non-minimal couplings between extra fields and gravity. Explicit examples of this phenomenon arise in the context of Horndeski theories and their extensions, Beyond Horndeski or DHOST [21][22][23][24][25][26][27].
On the other hand, if the UV completion of an extended gravity theory is required to be Lorentz invariant (as is usually the case), then necessarily the graviton speed becomes luminal at high energies. The transition between non-luminal and luminal speed is likely to occur well before (or at most, around) the strong-coupling scale of the theory, which for Horndeskilike theories is typically Λ = (M Pl H 2 0 ) 1/3 ∼ 260 Hz. This is within the frequency band of ground-based GW detectors: as a consequence, ground measurements might correspond to the frequency range for which the Lorentz invariance of the theory has already enforced luminal propagation speed. At lower frequencies, for example in the LISA frequency band (∼ 10 −5 − 0.1 Hz), the speed of GWs may instead be different from one.
In a broader context, an intriguing picture about sub-and super-luminality of GWs is emerging from recent literature on so-called positivity bounds. Such a programme aims at using criteria of unitarity, causality, locality (and Lorentz invariance) to ascertain whether low-energy effective theories admit a standard UV completion. In the cosmological context or near black holes, it has often been assumed that the speed of GWs ought to be (sub-or at the most) luminal, leading to theoretical constraints on several models beyond Einstein gravity on a Friedmann-Robertson-Walker (FRW) background [28][29][30][31][32][33].
These criteria are an extension of, or rather an extrapolation from, seminal results on causality bounds derived for flat spacetime. The issue is subtler in curved spacetimes (FRW being the key example here), as the QED case studied in [34] demonstrates. Whenever curvature becomes important, super-luminality of GWs does not imply a lack of causality. In curved spacetime, the speed itself 2 may be frame-dependent (see, e.g., [36] for an example in the cosmological context) and therefore not a reliable indicator of causality. Remarkably, in the standard EFT treatment of GR one finds that loop contributions from massive fields lead to a non-luminal speed of GWs on cosmological backgrounds; positivity arguments suggest a super-luminal speed of GWs at low energies [35,37]. Such findings are not at all in conflict with causality, and have in several examples been shown to be necessary precisely to guarantee causality. In [35], a notion of causality 3 more reminiscent of the standard lore has been shown to be more than compatible with positivity bounds whenever a well-defined decoupling limit of the (helicity-2 modes of the) theory exists 4 .
A frequency-dependent propagation speed can also arise in any scenario of gravity where the spectral dimension of spacetime changes with the probed scale. This scale-dependent behaviour of geometry is typical of a broad class of theories of quantum gravity [39][40][41][42][43][44] and is due to the presence of at least one fundamental scale in the texture of spacetime (see also [1,[45][46][47][48]; also we make some further comments in §E.1). The ensuing dispersion relation features a non-trivial mixing between time and momentum and leads to a mixed redshift-frequency dependence of c T (z, f ). Also, a frequency dependent GW speed arises in brane-world models motivated by string theory [49].
Lastly, we should mention that a massive graviton (or the related bigravity) scenario can lead to a frequency-dependent GW velocity, with interesting and testable consequences for GW waveforms (as first pointed out in [50]). We refer the reader to the recent [51], and references therein, for thorough analysis of this case.
Our aim in this work is to develop a general theoretical and numerical toolkit for quantifying the perspective of LISA to measure a frequency-dependent c T only through its effects on GW waveforms from merging massive black hole (MBH) binaries, without relying on specific modified gravity scenarios 5 . We implement two representative Ansätze for a frequencydependent GW propagation velocity. The first Ansatz is motivated from a perturbative expansion in powers of (f /f ), with f a fiducial frequency controlling the onset of deviations from GR. The second Ansatz describes scenarios with rapid changes in c T , which smoothly change from c T = 1 at small frequencies to c T = 1 at larger frequencies. For both Ansätze we derive how the GW waveforms are modified with respect to GR. The tools we develop, although applied to two representative scenarios, are very flexible, and can be used in future for testing any new theoretical models predicting transitory variations of c T as function of frequency.
We will show that LISA can obtain good constraints on both the GR and new parameters involved, even without electromagnetic (EM) counterparts. In fact, a major advantage of our work is that it does not rely on detection of unique EM counterparts for LISA sources. Whilst LISA standard sirens can serve as a further tool to test gravity (see e.g. [13,[56][57][58][59][60][61]), the rate of EM counterparts adds a further layer of uncertainty to that already coming from the massive black hole population models. Furthermore, constraints from standard sirens can only be obtained very close to or after the merger, when the sky localisation is good enough to narrow down candidate host galaxies. In principle, one can imagine the analysis we present here being performed on-the-fly as a system inspirals, as done for regular GR parameters in [62].
This work is organized as follows. §2 develops general theoretical considerations regarding the effects of a frequency-dependent c T (f ) on GW propagation and corresponding observables, and presents the two Ansätze for c T (f ) that will be used in our analysis. In §3 we carefully derive the expressions for the GW waveforms in this context. We make use of a Post Newtonian (PN) expansion for describing the inspiral phase, and we adapt the PhenomA waveform [63] and ppE approach of [64] to describe the merger and ringdown epochs. In §4 we discuss the GW data analysis tools we implement for our forecasts. We compare Fisher forecast techniques with Monte Carlo Markov chains, showing that a Fisher analysis is adequate in this context. §5 presents the Fisher forecasts: we derive the prospective constraints on GR parameters and our Ansätze parameters from GW detection of MBH binaries. §6 contains our conclusions, and it is followed by five technical appendixes. Appendix A and B collect details on the Fisher forecast analysis; appendix C contains some theoretical motivations on one of our Ansätze; appendix D is an analysis on the conditions to meet for recovering a luminal c T at high frequencies. Finally, appendix E discusses future directions for further extending and developing our results; moreover, it makes more explicit the relation among our parametrizations and the ppE framework.

Theoretical framework
We assume that the dynamics of GW at emission and detection is described by GR -possibly thanks to screening mechanisms, see. e.g. [65,66] for reviews (but see also [67] for a different point of view). Deviations from GR can occur during the propagation of GW through cosmological space-time from source to observation. We focus on exploring consequences of a frequency-dependent speed of GW propagation c T = c T (f ). Except in appendix E, this is the only modification that we will allow with respect to the standard propagation equations of GR. In this paper, we will be agnostic with respect to the origin of these deformations and will collectively refer to them as modified gravity. This term includes any model where the gravitational sector is altered with respect to GR, from purely ad hoc phenomenological models and EFT results to models embedded into, or at least motivated by, a fundamental, self-consistent, predictive theory (e.g. UV completion of existing low-energy scenarios, quantum gravity, emergent gravity). We start in §2.1 with general kinematic considerations on the consequences of a c T (f ) for GW observables. We then present in §2.2 two Ansätze for c T (f ) that will constitute benchmark scenarios for our analysis 6 .

Preliminary considerations
We assume that GW are massless, and propagate freely through a cosmological background from their source -an inspiralling binary -to detection. We consider the following quadratic action for the linearized transverse-traceless GW modes with M Pl the reduced Planck mass, andᾱ a dimensionless normalization constant that we will fix with appropriate physical considerations in what comes next. It is straightforward to prove that the linearized evolution equation obtained from eq. (2.1) describes a free GW, propagating through a cosmological space-time with arbitrary speed c T (f ). The frequency dependence of c T (f ) appearing in eq. (2.1) is physically interpreted as the frequency of GW as emitted by an inspiralling binary process. We can then make the hypothesis that f = f (t) with t related to the coalescence time (up to a constant shift). Hence all quantities in eq. (2.1) depend on time only. We do not need to make any further assumptions about the functional dependence of c T (f ) in this subsection. It is convenient to distinguish three notions of time for the system under consideration (see e.g. [68]): -Time t o as measured by ticks of a distant observer's clock -Time t s as measured by clock ticks near the source region (local wave zone) -Time t e when the signal is emitted (a cosmological time scale).
The frequency of GW at emission, f s , can be different from the frequency at detection, f o , due to both the expansion of the universe and to modified gravity effects. Let us study this phenomenon in the system at hand.
The action (2.1) describes a free GW travelling through a geodesics in a Friedmann-Lemaitre-Robertson-Walker (FRW) metric, characterized by a line element This is an effective metric which we use for describing the propagation of the GW [60]. In fact, denoting the associated metric tensorg µν , the Lagrangian density for a free spin-2 field propagating through it reads corresponding to the Lagrangian density in the integrand of eq. (2.1). With the help of eq. (2.2) we write comoving and physical distances as thus linking the present discussion with scenarios studied in [60]. and r GW phys (t) = a(t) c We make the hypothesis that, in proximity of the source, modified gravity effects have no time to develop, i.e.
We can use relation (2.5) to find how the typical time scale related with the evolution of the GW phase differs between source and at detection. For two signals emitted at the same physical distance, one has The relation between frequencies at source (f s ) and at detection (f o ), which scale as the inverse of time differences (f ∼ 1/∆t), reads where z = z e is the redshift of the source. Notice that, in the frequency regimes where c T (f ) is frequency-independent, we find which is the standard relation connecting frequencies at emission and at detection. In general, however, a frequency-dependent GW velocity requires to generalize eq. (2.11) to eq. (2.10).
It is convenient to define a dimensionless quantity ∆ that measures the deviation from the standard relation (2.11) for GWs propagating through cosmological distances: ∆ can be expressed as function of f s , or of f o , depending on which is more convenient. A value ∆ = 0 indicates that c T is a non-constant function of frequency. Using the parameter ∆, the clock ticks at source and observer are related by then integrating . (2.16) Simple manipulations lead to the equality or equivalently As an immediate, general application of the formulas we derived, we conclude this subsection by deriving an expression for the GW luminosity distance in scenarios with ∆ = 0, following the arguments of [60]. We call F the energy flux at observer position: where Area= 4π(r GW phys ) 2 . Then we introduce the luminosity at the source position, L: where (2.15) has been used. The luminosity distance d GW L is defined in terms of the following relation (2.21) Using these formulas, as well as relation (2.8) to connect comoving and physical distance, we obtain so the effects of a c T varying with frequency are contained in the dependence on ∆ as defined in (2.12). As we will learn in §3, the luminosity distance d GW L and other relations we derived here play an important role for characterizing the properties of the GW waveforms.

Two Ansätze for c T (f )
After the previous considerations, in this subsection we discuss two representative Ansätze for c T . They will represent our benchmark scenarios for the LISA forecasts developed in the next sections. In fact, after discussing the Ansatz functional forms, we briefly anticipate the level of constraints we will be able to obtain with LISA on the parameters characterizing them. Importantly, these Ansätze aim to discuss possible ways to parametrize deviations from c T = 1 around LISA frequencies, and are not built for automatically satisfying at the same time constraints on c T within ground-based frequency ranges. To do so, further corrections to their frequency dependence might be needed in the intermediate frequency band between LISA and ground-based experiments. We will comment on this point through the text, and above all in Appendix D.

Polynomial Ansatz
Inspired by the scale-dependent choice originally put forward in [69], our first model parameterizes c T (f ) as a polynomial in frequency: (2.23) Here n = 0 can be a positive or negative integer, β n is a set of parameters controlling deviations from the limiting speed c 0 , and f * is a fixed frequency scale controlling the onset of the deviations. In what follows we study both positive and negative values of n as separate cases. Note that, for simplicity, we do not allow β n to be function of time; this possibility will nevertheless be explored in appendix E. Notice that our Ansatz (2.23) includes more than one free parameter, hence it goes beyond the one-parameter parametrization proposed in [45] 7 . In the negative power (n < 0) case, it is natural to set c 0 = 1 such that consistency with GR is enforced at high frequencies. In the positive power (n > 0) case c 0 is the (unknown) low-frequency GW propagation speed. If c 0 < 1 (c 0 > 1) then the GW speed rises (falls) back towards the GR value of c for β n > 0 (β n < 0). This case turns out to be the mathematically simplest model we study; however, it implicitly requires that some additional physics terminates the power-law growth of c T between the LISA band and the band of ground-based detectors, again to maintain consistency with current bounds (see Appendix D). In the next subsection we will see an example of such additional physics in the context of an EFT-inspired model, for which eq.(2.23) is a low-frequency Taylor expansion.
In practical terms, we will see in §3.2.1 that values of c 0 = 1 only results in a rescaling of β n coefficients in the phase of the waveform. The only other change it brings is a scaling of the waveform amplitude by a factor of c 1/6 0 . As such, constraints on c 0 are largely nondegenerate with other parameters in our forecasts, and including it will only weaken them by a small fraction (see Appendix A for exact results). Therefore, without loss of generality, we will evaluate the constraining power of LISA using c 0 = 1 in both polynomial cases for simplicity. Our results remain valid for values of c 0 = 1 with negligible changes.
In both the positive-power and negative-power cases alike, we assume (f /f * ) sgn(n) to be a small quantity, allowing us to truncate our polynomial series for c T (f ) (assuming that the β n are not large enough to violate the validity of the expansion). We will see later that expanding c T (f ) up to quadratic order will prove sufficient to study the dominant corrections to the waveform that may be detectable with LISA. In our forecast in §5, we mainly consider MBH binaries with total masses between 10 4 and 10 7 M , as these generally give signal-tonoise ratio (SNR) > 10 in LISA (see Figure 11). The frequency range for these waveforms is between ∼ 10 −5 and ∼ 10 −1 Hz, so f * is required to stay outside this range in order for (f /f * ) sgn(n) to remain stmall. In addition, f * should be lower than the LIGO lower sensitivity bound of ∼ 10 Hz. Therefore the typical 'safe' values of f * we use in the positiveand negative-power cases are 2 Hz and 2 × 10 −7 Hz, respectively; in this context, safe means that the deviations from GR will remain small for any astrophysical system detectable by LISA. Values of f * within the LISA band can be considered, and will result in tighter parameter constraints, but also imply that some LISA systems could show non-perturbative departures from GR. Such non-perturbative effects lie beyond the scope of the current work. It is worth noting that constraints on eq. (2.23) are degenerate in β n /f n * and so constraints on β n can be translated from one f * to another (Appendix. D).
The negative-power case is arguably the most natural prescription of deviations from GR here, because at high frequencies c T /c → 1 without the need to invoke additional physics. However, the bounds on |c T /c − 1| from GW170817 are so impressively tight that they are hard to satisfy even in this model. Using the values of f * we discussed above, and assuming no finely-tuned cancellations between the n = −1 and n = −2 terms, formally we need |β 1 | 10 −4 to satisfy the existing bounds (β 2 remains virtually unconstrained). However, recognising that our power-law models would at best be only crude representations of the underlying physics, we do not apply the latter prior on β 1 in most of this work. In §5.3 we present results with only β 2 allowed to vary, which require no further assumptions to be consistent with GW170817.

An EFT-inspired Ansatz
The second parametrization we consider has the property of rapidly changing from a value of c T smaller than one at small frequencies to c T = 1 at high frequencies, as broadly motivated by the scenarios discussed in the Introduction (see Figure 1): The parametrization (2.24) is controlled by two free parameters: a fiducial frequency f around which c T changes rapidly, and a low-frequency speed c 0 with 0 < c 0 ≤ 1. Ansatz (2.24) is motivated by the analysis in [19] of an UV completion of a scalar field theory, where the scalar velocity depends on the energy, and smoothly (but rapidly) connects from c 0 to 1 as the energy increases. The transition from c 0 to unity occurs within a relatively small interval as the frequency increases; the width of the transition is not a free parameter and depends entirely on c 0 . See Appendix C for more details on theoretical characterization of this Ansatz and Appendix D for a discussion of its compatibility with the GW170817 bound.
Further model-dependent choices of c T with similar properties might be considered, and their consequences for LISA can be analyzed with the tools we develop in this work. The convenient, 2-parameter parametrization of (2.24) is in some sense a UV-complete version of the positive polynomial model described above. It already contains the highfrequency transition back to c T = 1 without further intervention. In fact, one can show that the positive-power polynomial case in equation (2.23) represents a low-energy Taylor expansion of equation (2.24) in the limit β 1 = 0 and β 2 = (1 − c 2 0 ) 2 /4c 0 . A frequency profile for c T (f ) as (2.24) implies that all the frequency-dependent effects studied in §2.1 occur in a relatively small frequency band centered around f . One can easily compute numerically the function ∆(f ), introduced in (2.12), which is the important quantity that controls the deviations from GR. We plot ∆(f ) in Figure 2 for representative choices of parameters. We notice that this function has a pronounced peak, whose maximal value ∆ max depends on c 0 , but also on the redshift z at which the GW source event occurs. To understand better how ∆(f ) evolves over the z − c 0 parameter space, we evaluate the amplitude and the position of the maximum of the function for redshifts log-uniformly distributed from 0.1 to 10, and values of c 0 uniformly distributed between 0.1 and 0.9, see Figure 3. We see that maximum deviation from GR occurs at frequencies of the order f and for small c 0 and large z, as expected. We numerically found a simple phenomenological fit relating ∆ max to c 0 and z that is valid up to large redshifts (z = 15): For more details on the expression above we refer the reader to Appendix C. This relation suggests that if we were able to measure with good precision deviations from GR induced by Ansatz (2.24), we might then be able to extract independent information on the redshift of the source, which might be helpful to build a Hubble diagram with GW sirens. We leave the exploration of this idea to future work.
The two parameters f and c 0 controlling the location and height of the transition (with c 0 = 1 corresponding to the GR case) can indeed be constrained very well with LISA. In §5 we forecast LISA capabilities to measure these quantities, and find that both parameters influence considerably GW waveforms. We conclude that for MBH binaries in specific mass ranges (around M tot ∼ 10 5 M ), the parameters f and c 0 characterizing Ansatz (2.24), can be constrained to a fractional error of order percent level or better, with respect to their fiducial values.

Waveform computation
In this section we compute how gravitational waveforms are modified in models where c T is a function of frequency, making use of the two Ansätze discussed in the previous section. Both the waveform amplitude and the phase are affected. We combine methods first introduced in [50] in the context of a massive graviton with tools motivated by the standard post-Newtonian approach to GW observables. We start in §3.1 and §3.2 by discussing how the waveform amplitude and phase are sensitive to a frequency-dependent c T , focussing on the inspiral epoch only. Then in §3.3 we take an additional step and consider extended gravitational waveforms that include also the merger and ringdown epochs. We adopt the frequencydomain PhenomA waveforms of [63], and follow similar lines to the ppE approach of [64] for the phase of a system.

GW luminosity distance and GW amplitude
As we learned in §2.1, eq. (2.22), when c T is function of frequency the GW luminosity distance is given by while the relation between frequencies at source and detection is where in the second equality we define f z = f s /(1 + z) as the redshifted frequency as in GR. We plot in Figure 4 the GW luminosity distance versus z in GR, the polynomial Ansatz and the EFT-inspired Ansatz respectively. The values of d GW L in the polynomial case are larger than in GR for positive values of parameters β 1 and β 2 (and vice-versa for negative β 1 and β 2 ). For the EFT-inspired case d GW L is suppressed with respect to its GR behaviour. The two helicities of the GW waveform for the binary compact object inspiral in Fourier space are given by (see e.g. [73]) where A(f ) is the amplitude of the waveform and Ψ(f ) the phase (to be discussed in the next section). ι is the inclination angle of the orbit relative to the line of sight. The GW amplitude in GR, without accounting for the redshift, is given by It is derived from the time-dependent GW amplitude using the stationary phase approximation in the Fourier transform of the waveform [68]. M s is the chirp mass of the binary system at the source, defined by M s = M tot η 3/5 , with M tot the binary total mass, η = m 1 m 2 /M tot the reduced mass parameter, and m 1 , m 2 the two component masses. Since the signal observed by the detector is redshifted, we rewrite the waveform using the redshifted chirp mass M z = (1 + z)M s , redshifted frequency f z = f s /(1 + z), and using 1/a(t s ) = (1 + z). The redshifted GW waveform amplitude is then given by In modified gravity, the quantities involved in GW propagation are not only scaled by redshift, but also scaled by c T (f o )/c T (f s ). Hence we define the observed chirp mass as We can replace the physical distance (1 + z)r com by d GW L using eq. (3.1), and replace M z by M o , so to finally obtain the modified GW amplitude as The amplitudes of the characteristic strains (defined by 2f o |h(f o )| [74]) in GR as well as the positive-and the negative-power polynomial cases are plotted in Figure 5, with exaggerated values of β 1 and β 2 . Also plotted is the effective sensitivity curve of LISA with angular averaging over the sky and the polarisation angle adopted from reference [75]. It shows that the modified amplitudes deviate from their GR equivalents as f o approaches f * . Note that the amplitudes in the figure extend to the merger and the ringdown phases using the PhenomA waveform, which we discuss in §3.3. Since f * for the positive and the negativepower polynomial cases are in opposite extrema of the LISA band, the modification effects are more manifest in systems with different total masses in the two cases. Lighter systems are preferred for detecting beyond Einstein models described by the positive-power polynimal Ansatz, and heavier systems for the negative-power polynomial Ansatz.

Phase
The phase of the GW during inspiral can be computed analytically using methods based on the Post-Newtonian (PN) expansion. We first set up the calculation using a general c T (f ), and then we specialise our results to the polynomial and EFT-inspired Ansätze described in §2. As the focus of our work is on GW propagation effects, we do not consider modifications to the physics of the merging process at the source position. As such, we expect the rate of change of GW frequency in the source frame to match that of GR. This is the starting point of our calculation, and is given by (we expand up to 2.5 PN order): where u is defined as and is frame-independent, while ψ k (k = 1, 1.5, 2, 2.5) are the PN phase parameters. In this work we specialise to non-spinning binary systems on circular orbits, recognising that parameters associated with spin/non-circular orbits will be included in a later analysis, for example following the methods of [76,77]. In this case, the PN coefficients read [78]:  where η = m 1 m 2 /(m 1 + m 2 ) 2 is the symmetric mass ratio. We include up to the 2.5 PN term, as this is dominant for the latest stage of inspiral phase we consider - Figure 6 shows this for GR and the negative polynomial Ansatz (we do not show the positive polynomial Ansatz as its deviations from GR are less pronounced). We verified that the 3 PN term remains subdominant in all our calculations.
terizing deviations from GR. Making use of formulas (2.17) and (2.18), we find Note that the mass appearing in the line above is now the redshifted chirp mass, and all references to source-frame quantities have been eliminated. The next step of the calculation is to integrate this expression twice, to find the time to coalescence and then the GW phase. At this point, we separate the discussion for the polynomial and EFT Ansätze.

Polynomial parametrization models
For the polynomial case only, we make an additional simplification by setting c T (f s ) = 1. LISA binaries are located inside galaxies, a region where existing observations [54,[79][80][81] constrain gravity to be very close to GR. We will coarsely model this behaviour by fixing c T to unity at the starting point of the GW extragalactic path as well. We do not attempt to model what happens when the GW exits or enters a galaxy, as our simple Ansatz in eq. (2.23) contains no environmental dependence. However, we assume that entrance to a screened region does not completely erase the accumulated beyond-GR changes to the signal 8 . Then eq. (3.15) becomes: Rearranging for t o and integrating over the observed frequency we have where t c and f c are the cutoff time and frequency that mark the end of the first inspiral phase. In this paper we take f c to be twice the frequency of the inner-most stable circular orbit (ISCO) [68]. To simplify the computation, we convert the integral eq. (3.17) to be with so that the integration of the time interval, when expressed with the help of the function ∆(f ), reduces to and we Taylor expanded the square bracket in eq. (3.17) to second order, since u 1. Under the stationary phase approximation [50], the phase of the gravitational waveform is then computed as where t c and Ψ c are nuisance parameters that mark the time and the phase at the end of the inspiral phase. Note that the integration in the phase above depends on c T (u). At first glance, the expansion of c T (f ) in eq. (2.23) is not easily converted to u, due to the appearance of c T itself in eq. (3.10). However, we can justify that the following form for c T (u) is equivalent to c T (f ) up to n = 2: where we define a new fixed scale u * ≡ πM z f * . Consider the following quantity where we have expanded 1/c T (f ) since |c T (f ) − 1| 1. Then eq. (3.22) can be expanded to be Notice that eq. (3.25) has precisely the same form as eq. (2.23) up to n = 2, simply with a redefinition of one of the coefficients. Hence we will use c T (u) from here on; we will drop the tilde onβ 2 since it is operationally equivalent to the non-tilde quantity.
Following similar arguments, we can replace the denominator in eq. (3.21) as: We expand eq. (3.21) a final time, and carry out the integration. We arrive at the modified inspiral phase of the GW waveform, for the positive-power case: We have dropped the terms where the order of u is higher than 2.5 PN order in the curly bracket. This is since we only expand up to 2.5 PN, and higher-order terms will be integrated into the 3 PN term. We have verified that the new 3 PN term is still insignificant. Note that only the 1.5 PN term is modified from GR by the appearance of β 1 .
If we use the setup of c T (f ) with the additional parameter c 0 as in eq. (2.23), we can pull the factor c 0 out by and then rename β 1 /c 0 and β 2 /c 0 as the new β 1 and β 2 , so that c 0 just works as a scaling of β 1 and β 2 . The factor c 0 outside the square bracket in eq. (3.28) is then cancelled by 1/c T (u) in eq. (3.19), and following the same derivation in eq.
in eq.(3.21) remains unchanged as well. Hence the phase is actually unchanged up to a redefinition of β 1 and β 2 . Repeating our steps in the negative-power polynomial case, we find a significantly lengthier expression: In this case, negative PN terms are generated, but no terms higher than 2.5 PN appear. We drop terms lower than −3 PN order, to keep the lowest order the same as in c T . Recall that   (blue) for binaries with different total masses at z = 1. We use exaggerated values β 1 = β 2 = 20 for the positive-power case and β 1 = β 2 = 100 for the negative-power case to visualize the modified gravity effect. We use f * = 2 Hz for the positive-power case, and f * = 2 × 10 −7 Hz for the negative-power one.
for this model, u * is chosen below the LISA band, i.e., it is a small number. Terms with order of u below −3 PN also come with higher powers of u * , so that they are suppressed.
The modified early inspiral phase is dominated by the 0 PN term, and the late inspiral phase is dominated by the 2.5 PN terms, as in GR in the positive-power polynomial case. The scenario is the same in the negative-power case when β 1 and β 2 are small. However, the -1.5 PN and the -3 PN terms take over the early inspiral stage when β 1 and β 2 are large, as shown in Figure 6. We plot the total phases of the inspiral waveform for different total masses in both polynomial cases in Figure 7, with exaggerated β 1 and β 2 , and appropriate f * as in Figure 5. Like Figure 5, Figure 7 shows deviations of the modified phases from their GR correspondences as f o approaches f * . The deviations are larger for lighter binary systems in the positive-power case, and heavier binary systems in the negative-power case.
A feature this calculation highlights is that a power-law departure from c T = c leads to a strong modification of gravity. The PN phase expansion operates naturally in powers of f 1/3 , so corrections proportional to f effectively 'jump' three PN orders at once. It is for this reason that Ψ pos is only minimally modified, and the corrections are rapidly pushed beyond f c .
However, as we will see in §5, detectability of these effects is determined by trade-off in number of binary orbits and SNR. High PN order corrections only dominate the GW phase for a very short period of time (few orbits), but they are also the regime in which the greatest SNR is accrued. For the negative power-law case, one needs to measure hundred (thousands) of orbits over a timescale of months (years) to detect modifications to a noisy signal.

EFT-inspired model
In the EFT Ansatz, c T (f s ) can be less than one at low frequency (i.e. we do not impose GR propagation near the source), so we need to work on the more general expression in eq. (3.15). The integration of the time interval becomes The phase is then computed as : We evaluate the derivatives of ∆ with respect to f o numerically, and compute the phase by numerically integrating eqs. (3.30) and (3.31). The amplitudes and the phases for systems with different total masses in the EFT Ansatz with f * = 5 × 10 −4 Hz are plotted in Figure 8.
Recall that in this model, f * sets the position of the rapid growth of c T (f ). We notice that at frequencies much higher than f * , both amplitudes and phases are the same as in GR. The modified gravity effects start to become manifest when the observed frequency approaches f * , resulting in a different c T at the source and observer. The modified amplitudes show constant offsets from their GR equivalences at low frequencies much smaller than f * . This is because the comoving distance is modified by a factor of c T (f ), and c T (f ) c 0 when f f * , so that the amplitude is suppressed by a factor of 1/c 0 at low frequencies. The modified phase for the high mass system seems equivalent to GR values, but actually the deviation from GR of the phase does not vanish at low frequencies. The weakening of the deviation in the figure is caused by the fact that the deviation becomes less significant compared to the large values of the phase at low frequencies.

IMR extension
The inspiral waveform we have discussed above starts to become invalid above f c ∼ 2f ISCO . In GR, extended template waveforms that include the complete Inspiral-Merger-Ringdown (IMR) phases can be obtained from numerical studies of binary black hole mergers. Here we lay out an approximate analytic prescription for adding the merger and ringdown to the inspiral waveforms obtained in the previous section. Note that this is possible because we are not modifying the intrinsic strong-field dynamics of the source, but only modulations that affect its propagation. However, our treatment involves some degree of approximation in the merger phase, the full extent of which can only be tested with dedicated with numerical relativity simulations. For this reason, in §5 we will present results using both the inspiral-only and full IMR waveform described below; these can be considered conservative and optimistic versions of our analysis, respectively.
We adopt a modified version of the frequency-domain PhenomA waveform from Ajith et al. [63]. As our work neglects component spins, we do not require a more sophisticated waveform such as PhenomD [83]. By fitting to a suite of numerical relativity simulations, the amplitude of the PhenomA waveform in GR is constructed piecewise as:

33)
A ring (f ) = CωL(f, f ring , σ), (3.34) where the prefactor is and The Lorentzian function in the ringdown phase reads The ends of inspiral, merger and ringdown phase are marked by f merg , f ring and f cut respectively. Their values, along with the value of σ, are computed via expressions of the form f k = (a k η 2 + b k η + c k )/πM tot , with the values of a k , b k and c k are read from Table I in [63].
One can verify the continuity of the above amplitudes across the IMR phase boundaries. We have verified that our computations in §3.1 & §3.2 match the inspiral section of the PhenomA template when c T = 1, up to 2.5 PN order. Since we are studying the effects of modified GW propagation, we assume that the PhenomA waveform in GR is valid at the source. This implies the modification of the inspiral amplitude we derived in §3.1 -which accounts for effects of varying c T during the propagation to the observer -can be applied to all three pieces of the PhenomA amplitude. This allows us to replace in eqs. (3.32)- (3.34). This produces the amplitude of the modified PhenomA template we use in our analysis. For the phase: we already have the modified phase for the inspiral from our previous computations in §3.2. For the merger part of the signal, there is no straightforward prescription for how to adapt the GR PhenomA phase to our modified scenario. Hence, we will discard the merger phase of the PhenomA and replace it as follows.
We will expand the phase as a power series in the frequency variable u = πM o f o . This step is analogous to what is done in the Parameterized Post-Einsteinian (ppE) framework [1,64], which describes deviations from GR in the waveform (see §E.5 for further details on ppE). For prompt mergers, it is sufficient to truncate the series at linear order [64]: wheret c andΨ c are new constants. These are determined by enforcing the continuity of the the phase and its derivative at f merg , that is, Ψ ins (f merg ) = Ψ merg (f merg ). Explicitly this fixes the barred quantities tō Hence the merger phase is fully determined by consistency with our inspiral calcuations. We will set the phase during the ringdown epoch to zero, as is done in ppE [64] (modelling of quasinormal modes during ringdown lies beyond the scope of this work). Since the modified PhenomA waveform allows us to use the full GW signal, the SNR of our detections will increase. In addition, more cycles in the late inspiral epoch, and a merger era in the phase are included. These help tighten the constraints on our modified gravity parameters. We expect the additional amplitude information during the ringdown to be only weakly constraining.

GW data analysis
We employ data analysis techniques in order to find constraints both on GR quantities and on modified gravity parameters impacting the amplitude and phase of the waveform. In this section, we give a brief review of the process of measuring GWs and how to constrain parameters controlling sources and GW propagation. We start the discussion in general terms (i.e., we will not focus on any particular detector). Then, in order to specialize our analysis to the case of LISA, we choose the noise model and a particular set of sources.
Since in the following sections we are interested in producing Fisher forecasts on a given set of parameters, we will conclude this section by comparing this approach to a Monte Carlo Markov Chain (MCMC) exploration of the parameter space. By construction, Fisher forecasts give the correct constraints on the model parameters assuming that the likelihood is Gaussian around the best fit, and that injected values are exactly recovered by the parameter estimation procedure. Since these assumptions are not trivially satisfied (for example one necessary condition for these to be true is for the signal to be sufficiently loud) and the Fisher approach can be misleading [84], we perform this test to validate results of our Fisher analysis. In particular, we are able to define approximate thresholds (corresponding to identifying regions of the parameter space) where we can assume our Fisher forecasts to be trustworthy. Readers eager to see the constraints themselves could skip ahead to §5.
Let us consider a single detector data stream which has already been cleaned from glitches and from other non-stationary effects beyond the GW signal we are interested in measuring. Under these assumptions, the data stream d(t) in time domain can be expressed as a combination of signal s(t) and noise n(t) as d(t) = s(t) + n(t). By considering a data segment of length T (the observation time can be chosen appropriately for each source we will try to resolve), we can then perform a Fourier transform 9 to express the data in frequency domain as (4.1) Let us proceed by assuming that the noise is Gaussian and with zero mean. If n(t) is also stationary, then the ensemble average of its Fourier modesñ(f ) obeys where N (f ) is the single-sided noise power spectrum 10 and, likeñ(f ), it has dimension Hz −1 . N (f ) is real and positive and, since d(t) is real, it obeys N (f ) = N (−f ). While the noise is a stochastic variable with zero mean and some variance, if a resolvable GW signal is present we have d (f ) = s(f ) . This means that we can write a likelihood defined using the standard matched filtering techniques, to describe the noise residuals as wheres th (f, θ) denotes the theoretical model for the signal (which at this level still contains both the waveform and the detector response function) which depends on a set of parameters θ, and (a|b) denotes the noise weighted inner product: Since in the following we will not be interested in estimating the source sky localization parameters, we can use a sky-averaged detector response function 11 ; for LISA, this contributes an overall factor of ∼ 10/3 to the strain sensitivity curve, see [73]. After factoring this averaged detector response function out of the data (which are redefined asd i ), we can define our likelihood as whereh th (f, θ) is the theoretical model for the GW waveform, and the inner product is now weighted with the detector strain sensitivity S n (f ). It is straightforward to show that the parameters θ 0 that maximize this likelihood are defined by solving: We proceed by introducing the Fisher matrix, which provides a Gaussian approximation of the likelihood around its maximum, which is defined as: As customary, the confidence intervals on the model parameters θ can be drawn from the covariance matrix C lk which is obtained by inverting F lk .
We only consider a subset of the parameters which are expected to significantly impact the GW waveform. In particular, we focus on five parameters common with GR: ln η, ln M z , ln z, t c and Ψ c defined as in §3, meaning we will not consider the sky localization, the inclination nor the orientation of the binary as well as the spins of the two objects. Beyond these five parameters, we have the a set of modified gravity parameters θ MG , which can either represent the β 1 and β 2 defined as in eq. (3.22) or alternatively the f * , c 0 of eq. (2.24), so that our full parameter vector is: In the following, we will restrict our analysis to the case of LISA, so that the noise power spectra and the strain sensitivity can be taken for example from [75,85,86]. Since we are using a sky-averaged response function, we can also consider static arms lengths only and and do not require orbital information. We will adopt a procedure similar to the one employed in Sec. 2.4 of [75]; we will use an effective combination of the three TDI channels 12 in the AET basis (see for example [87]), where the noise is diagonal, defined as: Notice that this is formally equivalent to combining three likelihoods with the form given in eq. (4.5). We are assuming here that measurements in the three independent channels do not break any degeneracies among the signal or noise parameters. This is not strictly true, as in general the three detectors have different angular responses, and this could help to break degeneracies in, for example, sky localization. As a consequence, our assumption can be seen as a pessimistic one, and the constraints could be improved in a fully consistent, though more complicated, analysis which is postponed to future works on this topic. We conclude this section by presenting a direct comparison between the constraints on θ obtained from F lk and the ones obtained by directly evaluating eq. (4.5). This comparison is carried out for two sources: a high SNR case, where the two approaches are expected to match exactly and a low SNR one, where the real constraints obtained from the MCMC sampling of eq. (4.5) are expected to start diverging from the Fisher ones. In both cases, we start by estimating the in-band time using [68] T = fc where f i is the initial frequency (i.e., the smallest frequency in the LISA band at which the source emits) and f c is the cutoff frequency. In practice, since most of the SNR comes from the high-frequency part of the signal, we cut the low-frequency part of the GW spectrum by choosing f i so that T 10 days 13 . We then generate a Gaussian realization of the noise on top of which we inject the signal, given by h th (f, θ), to get thed to be used to evaluate eq. (4.5). The sampling of the parameter space is performed using Polychord [88,89] via its interface with Cobaya [90].
In Figure 9 we show the comparison between our Fisher forecast and a full MCMC sampling of the parameter space for a loud source with SNR 1020. It is manifest that for this event the two approaches lead to very similar results, confirming the validity of the Fisher approximations for similarly loud signals. On the other hand, in Figure 10, we compare the Fisher forecasts with the MCMC constraints for an event with SNR 42. Since in this case the event is much fainter, the validity of the Fisher approximation starts to break. As is clearly visible from the 1-dimensional marginalized constraints, the order of magnitude of the forecasted error bars are still accurate for all the parameters. However, we notice some deviations between the Fisher analysis and the real structure of the parameter space. This can be appreciated for example from displacement between the injected parameters (i.e. the centers of the Fisher ellipses) and the best fit values recovered through the MCMC procedure. Nevertheless, since the recovered values are always 2σ-compatible with the injected values, these displacements should not be interpreted as problematic. Events with similar SNR should be considered as thresholds to assume the Fisher forecasts are sufficiently robust.

Forecasts
After confirming that Fisher forecasts give satisfactory results for the parameters of interest, we focus on a Fisher analysis in what follows. We will present forecasts for how well a fouryear LISA mission can constrain both the polynomial and EFT-inspired Ansätze for c T (f ) described in §2.2, both with a single MBH merger ( §5.1) and a population of MBH mergers ( §5.3). 13 The main reason for this choice is that reducing the total observation time corresponds to reducing the frequency resolution. In practice this leads to a faster evaluation of the waveform and thus of the likelihood. Naively one might expect that the best constraints on our modified gravity parameters will be obtained from systems with the highest total SNR. Figure 11 displays SNR contours for LISA detections within GR in terms of MBH total mass and redshift, using only the inspiral portion of the signal (left panel) and full inspiral-merger-ringdown signal (right panel). We notice that systems between 10 5 and 10 7 M provide the highest SNR detections in both cases. We will show that these are not necessarily the optimal systems for bounding c T (f ), due to the frequency-dependent nature of our corrections.

Polynomial model
We consider MBH binary mergers with total masses between 10 4 and 10 7 M , at redshifts of z = 1, 2, 3 for each mass. We aim to understand the effect of total mass and SNR on the parameter constraints. The component masses in these binary systems are equal, so η = 0.25. We perform Fisher matrix analysis with the modified inspiral waveform, using 30 days of observation time. Although some sources will be detectable earlier than this (particularly when re-processing the data post-merger), the SNR of typical sources starts to rise steadily above ∼ 10 at ∼ 30 days before merger, and accumulates most of its final value after this [62].
As with the phase computation of §3.2.1, an advantage of our polynomial Ansatz is that it allows us fully analytical calculations. For this model we compute analytic derivatives of the waveform with respect to the parameters; we then verified our results with numerical derivatives. When computing the Fisher matrix, we put flat priors on t c in (−50, 50), Ψ c in (−π, π), β 1 in (−20, 20), and β 2 in (−1000, 1000). As per the discussion of §2.2, we fix f * = 2 Hz for the positive-power case, and f * = 2 × 10 −7 Hz for the negative-power case. However, we note again that the forecasted constraints on β n can be translated to other values of f * (Appendix D). Given the prior on β 2 and the values of f * , we find that the frequency range of the waveform needs to be lower than 6 × 10 −2 Hz for the positive-power case, and higher than 6 × 10 −6 Hz for the negative-power case, so that the perturbation β 2 (f /f * ) 2 < 1. This will cut off a part of the late-inspiral waveform for the cases with M tot = 10 4 M and smaller, which weakens the constraints in low-mass cases. The fiducial values we use for the modified gravity parameters are the GR values of β 1 = β 2 = 0. The fiducial nuisance parameters are set to be t c = Ψ c = 0. Instead of constraining the value of η, M z and z, we constrain their fractional errors in order to avoid incomparable magnitudes among the waveform derivatives that cause large off-diagonality in the Fisher matrix.
For the positive power model, we show the forecast constraint contours for the cases with total mass of 10 5 M at different redshifts in Figure 12. As expected, the constraints are weaker for higher redshift due to an overall lowering of the SNR. Our results reflect a common difficulty of this analysis [50], namely that some of the parameters in the modified waveform are highly degenerate, e.g. the parameter pairs (ln η, Ψ c ), (ln η, β 1 ) and (Ψ c , β 1 ). A possible reason for this result is that, apart from z, all other parameters are contained in the phase, so they are highly correlated with oneanother.
In addition to this plot, the constraints on parameters for all the models using the inspiral waveform in GR, positive and negative-power polynomial cases are listed in the tables in Appendix A. A comparison with the GR case shows that the presence of β 1 and β 2 weakens the constraints on the GR parameters. We find that the constraints are controlled by the SNR and the total mass of the system. Some GR parameters tend to be better constrained when the SNR is higher, such as η, z and Ψ c . But the constraints on M z and t c are tighter for systems with lower masses. This is expected since signals from lower mass systems stay longer in the LISA band, so that more inspiral cycles are available for constraining the parameters (see Figure 13).
The modified gravity parameters β 1 and β 2 are special, in the sense that they are better constrained when the signals extend to frequencies closer to f * . This means that the positive-power case is best constrained by systems with M tot < 10 5 M . Note also that these systems have the final stages of their inspiral -where the modified PN terms of §3.2 are most significant -around the peak sensitivity region of the LISA power spectral density (PSD). The left panel of Figure 14 shows the marginalised constraints on β 1 for both positive-and negative-power polynomial cases as a function of {M tot , z}, with SNR overlaid in red. In particular for the positive power case, we notice that the shape of the SNR and σ β 1 contours are considerably different.
For the negative-power case, the constraint contours on β 1 in Figure 14 (lower left panel) show greater similarity to the SNR contours. Here we expect the greatest deviation from GR to manifest in the heaviest systems, but this is compensated for by the rapidly rising PSD  [62]). We count cycles from the time the system exceeds SNR=8 until the end of the inspiral phase, which we take to be at f = 2f ISCO .
(and hence decreasing SNR) at low frequencies.
In general we learn that β 2 is challenging to constrain; this is not unexpected, given it represents a second-order correction to c T (f ). For the positive-power case using only the inspiral waveform ( Table 5 in Appendix A), the prior on β 2 is saturated for the M tot = 10 6 M cases, indicating that we fail to obtain a meaningful constraint. Hence in right panels of Figure 14 we show only results that include the PhenomA merger-ringdown extension (see next subsection).
We stress that the constraints obtained are quite sensitive to the value of f * . Here we have adopted a maximally conservative approach, by setting f * completely outside the LISA band for both positive-and negative-power polynomial cases. Alternatively, for a given {M tot , z}, f * can adopt any value provided that (f /f * ) n << 1, such the expansion used in §3 remains valid. An example of this can be seen in Figure 9, where a lower value of f * = 0.1 Hz was used; an improvement of one and two orders of magnitude results for the constraints on β 1 and β 2 , respectively.
Finally, the degeneracies between the parameters analysed in this section could be reduced by replacing the highly correlated parameters with new ones, for example treating the complete amplitude as a single parameter, so that parameters in the phase will not correlate to the amplitude. However, significant correlations between the phase parameters would likely remain, and we would lose the ability to estimate ∆z -which could be significant for associating the merger to a host galaxy.

EFT-inspired Model
We now move on to Fisher forecasts for the EFT case, where the beyond Einstein parameters of interest are f * and c 0 . Recall that these parameters control the location and height of the transition seen in Figure 1, with c 0 = 1 corresponding to the GR case (no transition).
Since the phase of the waveform in the EFT case is computed numerically, we also numerically compute the derivatives of the waveform used in the Fisher matrix. We use c 0 = 0.99 instead of c 0 = 1 for the fiducial model, because the numerical derivatives with respect to c 0 and f * become unstable when c 0 = 1. This is easily understood, as when there is no transition f * has no effect on the waveform, and becomes impossible to constrain. For  Figure 15 shows the width of ∆(f o ) is quite large compared to the inspiral range of a LISA binary, meaning our detections are likely to probe only one side of the peak in ∆. We show the constraints for the EFT-inspired model with total mass of 10 5 M at different redshifts in Figure 16. We see that the constraints on GR parameters are roughly as tight as the ones in the polynomial case. However, we now obtain tight constraints on the modified gravity parameters c 0 and f * (compared to the fairly weak constraints on β 1 and β 2 in the polynomial models). This greater sensitivity likely comes from i) having the deviations from GR strongest in the mid-inspiral phase, where both the number of cycles and SNR accumulation are reasonable, and ii) having both parameters play comparable roles in modifying the waveform (compared to the polynomial case where β 2 is significantly subdominant to β 1 in the LISA band).  Figure 17 shows the contours of constraints on c 0 − 1 and fractional constraints on f * in the {M tot , z} plane. The lowest total mass we consider here is 10 3.5 M , because the numerical derivatives with respect to c 0 and f * become unstable for systems with lower total masses. This can be explained by the fact that lower-mass systems evolve into frequencies much higher than f * , at which the effects of the transition in c T are very small -this makes the derivatives with respect to the beyond Einstein parameters noisy. We can see from the plots that the tightest constraints on c 0 and f * are found by systems with M tot ∼ 10 4.5 − 10 5 M . This is likely because our fiducial choice of f * is located within the early inspiral stage of such systems, so that most of the transition of c T from 1 to c 0 takes place within the detectable signal. For lower-mass systems, only a small portion of their inspiral waveform is affected by the c T transition.
On the other hand, for more massive systems, f * is above their inspiral frequency ranges, so that similarly most of their inspiral is also not affected by the transition of c T . Similarly, varying the values of f * also has an effect on the constraints on c 0 and f * . Taking the system with M tot = 10 6 M and z = 1 as an example, the fiducial value f * = 3 × 10 −4 Hz sits in the middle of its inspiral waveform, as shown in Figure 8. For a much larger or much smaller f * , such as order of 10 −2 or 10 −6 Hz, the constraints are worsened by 4 orders of magnitude and 1 order of magnitude respectively. The reason is that the transition of c T (f ) takes place at frequencies higher or lower than the inspiral waveform, and thus little information is obtained from the waveform to constrain c 0 and f * .
Given the constraints presented in §5, one may ask what is the corresponding error on the propagation speed c T (f ). In reality this is a source-, ansatz-and frequency-dependent statement. However, as an example, we consider an optimal source (using constraints 10 −8 . Whilst not as tight at the bound from GW170817 and its counterpart, we stress that result uses data from an entirely distinct regime, and does not rely on the presence of any EM signals.

Connection to other data
We note here that if an electromagnetic counterpart for at least one LISA event is unambiguously observed, in principle this can inform the prior on c 0 in the EFT-style model. A time-of-flight measurement, similar that performed with GW170817, could yield constraints of order |c 0 | 10 −10 . Although the electromagnetic counterpart may appear considerably delayed after the merger (e.g. a month, relative to seconds for a BNS merger), this is counteracted by the greater propagation distance of LISA sources. In this eventuality, the tight bound on c 0 would effectively slice through the ellipse contours in Figure 16. Due to the correlation with Ψ c and t c , this would lead to improved constraints in all parameters. The constraint on f * would also improve, though potentially not as significantly as Figure 16 might suggest; this is because as c 0 → 1 (step height in c T is decreased), the location of f * becomes harder to measure. A major stumbling block in this improved method is that it will likely be hard to associate electromagnetic signatures to MBH mergers with a high degree of confidence, due to both the extended delay before their appearance, and the intrinsic electromagnetic variability of galaxies themselves. Furthermore, due to the separation between the merger itself and the emitting gas, e.g. in a circumbinary disk, an EM counterpart may not be highly luminous. A thorough review of possible MBH counterpart mechanisms can be found in [91]. Given these uncertainties, we present constraints without assuming any prior information from a LISA EM counterpart.

Inclusion of ringdown-merger signal
For models where beyond Einstein effects grow with frequency, we naturally expect that including the merger and ringdown phases of the waveform will enhance our constraining power. For cases where the opposite is true (like the negative-power polynomial model) one might guess that such additions will be irrelevant. In fact this turns out not to be correct: including the merger and ringdown still tightens constraints on the GR parameters, and due to the correlations between parameters, this leads to a mild improvement in β 1 and β 2 . Given that the constraints on the EFT-inspired model are already strong from the inspiral alone, we will focus our attention here solely on the polynomial models.
We perform the Fisher forecast on the same sets of systems as in §5.1, but with the modified PhenomA waveform of §3.3. We present the full constraints in Appendix A, in com-parison with the constraints from only the inspiral waveform that ends at 2f ISCO . It is clear that the merger and ringdown phases increase the SNR values (and hence tighten constraints for all parameters), especially for intermediate and heavy mass systems which otherwise have only a short inspiral track in the LISA band. The constraint on β 2 is considerably improved in the positive power case for all systems, except for a 10 6 M binary at z = 2, 3.
We also place the Fisher forecast contours with the modified PhenomA waveform on top of those with the inspiral waveform for systems of 10 5 and 10 6 M at z = 1 in Figure 18. We can see that the inclusion of merger and ringdown phases shrinks the ellipse contours, and the effect is greater for the heavier mass system. Most striking is that many ellipses change their orientation for different total masses, and some even flip their signs. In Appendix B we present a deeper investigation into this phenomenon. The short summary of this is that the merger and ringdown phases add dominant contributions to the integrands that yield the entries of the Fisher matrix. The parameter dependency of these new contributions can be different from the PN expansion of the inspiral, and hence the marginalised ellipses get re-oriented as per Figure 18.
The righthand panels of Figure 14 show the constraints on β 2 with the PhenomA waveform as a function of {M tot , z}. No prior bounds on β 1 and β 2 are put in the Fisher forecast for these plots. We see that the constraints on β 2 in the negative-power case remain worse than in the positive-power case at most points, with a preference for higher-mass systems whose inspirals start closest to f * (=2 × 10 −7 Hz in this case). Now that all four panels of Figure 14 have been introduced, we can consider how the areas of best constraint relate to expectations for MBHB merger rates. The three starshaped markers in the plots mark the locations of peak merger rates predicted by three different population models from [92,93] (see those works for detailed descriptions). The PopIII model predicts that the MBH merger rate peaks at low mass and high redshift, while the Q3-nodelay (Q3-nod) and Q3-delay (Q3-d) models predict it peaks at intermediate mass and low/intermediate redshifts. The best candidates for our constraints are events of M tot = 10 4 − 10 5 M for the positive-power case, since they have good constraints and relatively high SNR. Unfortunately they are quite far from the peak of MBHB merger density for any population model. The Q3-nod and Q3-d models favour heavier MBHBs, which also lie in a region of high SNR -hence these may favour constraints on the negative-power case. That said, there is much more information underlying these MBHB population models than can be summarised through a single peak point. In the next section we estimate the constraints that could be obtained by combining multiple merger observations in each population model.
As a final exploration of the extended IMR constraints, we investigate a negative power case in which only β 2 = 0. This case is special since it naturally fits within the constraint from GW170817, without the need to invoke further physics. We show the corresponding constraint contours of β 2 with the PhenomA waveform in Figure 19. In general, the constraints on β 2 are tighter than the ones in which both β 1 and β 2 are varied, by around one order of magnitude. However, they still remain fairly weak, indicating that strongly-evolving deviations from GR will be challenging to detect at low frequencies, even with LISA.

Multiple sources
The constraints obtained in the previous sections are derived from single event detections; often these were picked to be relatively ideal sources, as determined by Figures 14 and 19. In reality, during the full LISA mission duration we will receive a sample of events with some mass and redshift distribution, as determined by the physics of the MBH population [92][93][94]. When considering a population of sources, the merger parameters are independent, whilst the beyond Einstein parameters will be common to all events 14 . As a consequence, the constraints on θ MG obtained from all these events could be combined to significantly improve their determination. Let us briefly outline the procedure employed to generate realizations of the mass and redshift distribution of MBHBs for the different models (popIII, Q3-delay and Q3-nodelay), discussed in [93], which correspond to different seeds and evolution for the MBH population. Taking the catalogues from [95], we obtained 15 the mass and redshift distribution for the three population models. Given the histograms, we proceed by defining a smooth interpolation which can be used as a probability distribution function, and hence can be sampled to generate a new catalogue. The number of events in each realization of these catalogues is set by the integral of the merger rate 16 for the three catalogues of [95] over the three effective years of the LISA mission (corresponding to 4 years with 75% efficiency). In Figure 20 we show the event distributions in the z-log 10 (M tot /M ) plane after averaging over 50 realizations for each population. The bin sizes for both z and log 10 (M tot /M ) are chosen to be 0.25. The total number of events in the three cases are respectively 511, 24 and 356 events.
To find the systems which will contribute the most to the combined constraint, we select the bins of each population that sit above the SNR= 10 contour, and have at least one event expected per bin. Since the Q3-delay population model cannot satisfy the last criteria (every bin has an expectation value < 1), we discard this case. For the two remaining models, we compute the Fisher matrix for N bin combined events in each cell of interest and invert it to obtain the covariance matrix for that cell (note this is is just the single-event covariance with errors scaled by 1/ √ N bin .) We then remove all columns except for those corresponding to  θ MG and invert again to obtain the marginalised Fisher matrix for just the beyond Einstein parameters. We sum the marginalised Fisher matrices from each cell and finally invert again to obtain the combined covariance matrix for θ MG from all the selected cells 17 . Table 1 shows the results of this process for β 1 and β 2 of the polynomial models. We see that for β 1 in both the positive-and negative-power cases, the combined constraint is approximately equal to the best-constrained region of non-vanishing size in Figure 14. In essence, this means that having a population of typical GW LISA mergers is roughly equivalent to a single 'golden' event in the ideal part of the {M tot , z} parameter space.
The picture for β 2 is a little more mixed, with PopIII model performing well in the positive-power case, and the Q3-nod model favouring the negative-power case. We see that the PopIII model, which generally produces lighter MBHs, yields good constraints on the positive-power model -this is precisely in line with the discussion of §5.1.
Of course, in reality we will have to work with whatever population of MBH mergers Nature gives us. If it closely resembles the Q3-delay model, for example, we will be dependent on a rare golden system to carry out the constraints forecast in this work. However, it is reassuring to see that in most cases our method has some robustness against realistic population models. Hence tests of gravity at low frequency can be carried out with LISA in (almost) any scenario.

Conclusions
The development of cosmological modified gravity theories has shown that infrared departures from GR are theoretically possible. The clearest demonstration of this is screening effects, where departures from GR manifest on large scales -a weak-field, low-density arena -whilst being strongly suppressed in other regimes (see [65,96,97] for reviews). At the same time, deviations of the propagation speed of gravitational waves are a common signature of new gravitational physics. As such, it is clear that the value of c T should be probed at low energy scales, independently of existing constraints at higher frequencies.
That said, the current tests of gravity from ground-based detectors are a force to be reckoned with. We find it is not simple to construct a function for c T (f ) which satisfies the LIGO-Virgo bounds whilst modifying the millihertz regime significantly. Sharp transitions for c T (f ) are needed in the frequency band between LISA and LIGO frequencies, to ensure consistency with the results from GW170817. Future theoretical work will be needed to explore more sophisticated models for c T (f ), built from first principles, that do not rely on this workaround.
Nevertheless, our work has established a theoretical and numerical toolkit for exploring the detectability of modified GW propagation with LISA. We implemented two Ansätze for frequency-dependent GW propagation speed, and computed the resulting modifications to the GW amplitude and (non-spinning) phase at 2.5PN order. The first Ansatz proposed departures of the GW propagation speed as a polynomial series in frequency for c T , in which the powers can be positive or negative. The second Ansatz represented a smooth transition in c T from some lower value to c 0 , taking place inside or close to the LISA band. We then performed a Fisher matrix analysis to forecast the constraints on five GR parameters and two modified gravity parameters. We compared the Fisher forecast with MCMC inference and found good agreement between them for the forecast parameter bounds, even for signals of comparatively low SNR.
Our use of inspiral-only and a full IMR waveform represent analyses with different theoretical assumptions. If considering departures from GR, one may wish to allow for the strong-field regime itself to be modified as well; then using a (modified) PhenomA waveform, which derives from GR simulations, is not appropriate. Our inspiral-only ( §5.1) results represent this conservative case. However, if one is confident that the strong-field regime is identical to GR (the screened case), then our approach allows the continuation of GW propagation effects into the merger and ringdown regime. Our results using the full waveform in §5.3 represent this more optimistic case. We used here a simple IMR waveform (PhenomA); this should be extended to more sophisticated, spinning waveforms for use with real data. Figures 14 and 17 represent the major results of our work, showing how the constraining power of LISA for our c T (f ) models is sensitive to the total mass and redshift of a MBH system. These raise the possibility that a single 'golden' source may be as useful a population of less optimal systems. However, this statement clearly depends on the expected rate of MBH mergers, which is still poorly known. This sensitivity to the underlying MBH population increases further if redshift-dependent or cumulative corrections to GW propagation are considered. In this work we focused exclusively on the frequency dependence of c T ; as a result, our constraints are (unsurprisingly) always tightest from low-redshift sources. If instead the beyond Einstein effects accumulated with propagation distances -as happens for some modified gravity models (see e.g. [60,98] and references therein) -then the redshift location of peaks in Figure 20 would also play a role in determining the constraints. For these reasons, and in view of future analyses, we developed in appendix E formulae that extend the discussion of the main text to include non-standard friction effects in the GW propagation.
Our method in this work has been distinctively different from that used to measure the propagation speed of GWs with event GW170817. We do not rely on the presence of an EM counterpart: for long-duration sources our analysis could be applied on-the-fly months or years before merger. This may open the possibility of multiband analyses for some sources, as considered in, e.g. [99][100][101][102]. This is not the first time modified propagation effects on the GW phase and amplitude have been computed. The ppE framework [64] is a well-established formalism that shares many of the goals of this work. In fact, ppE is sufficiently general to include distinct modifications at each PN order of the phase, and can also encapsulate departures from the GR generation of GWs (not just propagation effects, as in the present work). The price paid for this powerful generality is an increased number of modified gravity parameters, such that these are usually varied and constrained one by one (see [103] for recent discussion). By focussing on a modification to c T alone, our work effectively links amplitude and various PN phase terms to vary in concert, creating a distinct signal. A mapping between our beyond Einstein parameters and those of ppE is discussed in appendix E.5.
The rate of ground-based GW detections will continue to rise sharply over the next decade, leading to tight constraints on gravity at the frequency of terrestrial detectors (or very exciting new results in gravitational physics). Nevertheless, LISA has a crucial role to play by opening the door to the unexplored millihertz GW regime. In this work we have developed the first tools for probing new phenomenology we may find there. Motivated by the direction of current theoretical ideas, this represents the first step of a continuing program to explore frequency-dependent effects in GW cosmology.

Acknowledgments
It is a pleasure to thank David Bacon, Enrico Barausse, Enis Belgacem, Emilio Bellini, Jose Maria Ezquiaga, Stefano Foffa, Noemi Frusciante, Michele Maggiore, Nicola Tamanini, Filippo Vernizzi, and Miguel Zumalacarregui for useful discussions. We also thank the present and past LISA Cosmology Working Group Chairs -Robert Caldwell, Chiara Caprini, Germano Nardini, Marco Peloso, Nicola Tamanini -for their support. We thank Aurélien Hees for acting as internal referee within the LISA Consortium, as well as Nelson Chistensen for his help within the LISA PPC. T.

A Fisher forecasts for all models
In this appendix we present the parameter constraints for all the models we considered in this work. The Fisher matrix analysis for the polynomial case uses flat priors on t c in (−50, 50), Ψ c in (−π, π), β 1 in (−20, 20), and β 2 in (−1000, 1000). The positive-power cases use f * = 2 Hz, while the negative-power cases use f * = 2 × 10 −7 Hz. The length of the signal is 30 days.        The constraints for the EFT-inspired case are shown below. We apply flat priors on t c in (−50, 50) and Ψ c in (−π, π). We do not apply prior bounds on c 0 and f * , since the constraints are already strong.

B Behaviour of Fisher integrands
In this appendix we plot the Fisher matrix integrands for the positive-power case with M tot = 10 5 and 10 6 M in Figure 21. These assist with analysing the features seen in Fisher matrix ellipse contours, for example, the rotations of different cases. The integrand is defined as where θ i is the parameter of interest. The overall factor of f is included so that the Fisher matrix element is computed by integrating the integrand in log f space. Figure 21 shows that the integrands for different pairs of parameters.For the parameter pairs (ln z, t c ) and (ln z, Ψ c ) the integrand is effectively zero, with only a small contribution arising from numerical noise and imperfect numerical derivatives. This is because z only features in the amplitude of the waveform, while t c and Ψ c are only contained in the phase. Therefore we have and similarly for (ln z, Ψ c ).
In Figure 21 we observe that the shapes of the integrand can change substantially with the mass of the system. We further see that including the merger-ringdown part of the signal (dashed lines) can add a significant extra contribution, and in some cases may dominate the final integral that enters the Fisher matrix. The oscillatory features seen at high frequencies are due to resonances in the LISA PSD, and the sharp endpoints come from the merger-ringdown model visible in Figure 5. These substantial changes are responsible for the contour variations in seen in Figure  18. Extending the integrand in the IMR case results in larger Fisher matrix elements, and hence correspondingly tighter forecast ellipses relative to the inspiral-only cases.
In general the areas under the green curves (M tot = 10 6 M ) are less than the areas under the blue curves (M tot = 10 5 M case; this results in the correlation coefficient |ρ| being smaller for the more massive system, and hence generally less diagonal Fisher ellipses. This occurs for the parameter pairs of (ln η, ln M z ), (ln η, β 2 ), (t c , β 1 ), (t c , β 2 ), (Ψ c , β 1 ), (Ψ c , β 2 ) and (β 1 , β 2 ). However, because the Fisher matrix has sizeable off-diagonal elements, it's inverse (the covariance) is not always trivial to predict; this could be the reason the behaviour described above is not universal.

C Theoretical motivations for the EFT Ansatz
We are motivated by the arguments of [19]: suppose there exists a scalar theory valid up to a strong coupling scale Λ, with new physics (e.g extra degrees of freedom) entering at the scale M ≤ Λ. Let us assume a homogeneous scalar background φ 0 (t) that spontaneously breaks Lorentz invariance, φ 0 (t) = α Λ t, parameterised with a constant parameter α (although it may be mildly time-dependent, with |α/α| ≤ H). The spontaneous breaking of Lorentz invariance typically leads to a scalar speed different to that of light. We consider for example the partial UV completion of Eq. (6) in [19]. It leads to a dispersion relation, The propagation speed is defined through the dispersion relation Therefore (C.1) leads to a scalar speed given by Although motivated by scalar theories, we adopt this expression in the tensor case for simplicity. Here showing consistency with GR at large cases. Rewriting tensor speed in (C.3) in terms of frequency (f ≡ 2π k), one obtains as presented in (2.24). We can analytically compute the slope of the speed There exists an inflection point at which is an increasing function of c 0 . At the inflection point the slope of c T (f ) is maximal, resulting which is a decreasing function of c 0 . To investigate deviations from GR with the above speed profile, we compute the dimensionless quantity ∆ defined in (2.13). For a given c 0 , the maximum GR deviation depends on redshift We find that parameters A and n both decrease linearly with c 0 . We perform least squares polynomial fits to obtain the expression in (2.25). These results hint at possible independent redshift mapping of GW sources, provided we can accurately estimate ∆ max .
D Recovering a luminal c T at high frequencies In comparison, in §5, we found for the positive and negative powers that for the PopIII and Q3-nod cases, respectively. Hence, if the functional forms are maintained to LIGO scales, these constraints are weaker than that of GW170817. In the case of the EFT-inspired c T (f ) function (2.24), we note that for f * 10 Hz, c T (f ) reduces in the LIGO band to a negative power law with n = −1 and β 1 = 2(1 − c 2 0 )/2 ≈ √ 1 − c 0 for c 0 ≈ 1. Thus, in the LIGO band, our constraint from §5 can roughly be interpreted as which is also weaker than eq. (D.1). Observable modifications introduced with the functional forms of c T (f ) in eqs. (2.23) and (2.24) can thus not be suppressed efficiently enough to satisfy the GW170817 bound. However, higher-order corrections may in principle kick in to suppress the remaining deviations in the LIGO band. We shall briefly inspect here some requirements on the functional forms of c T (f ) that a more complete UV description of a theory should satisfy to remain observable in the LISA band while remaining compatible with the LIGO constraint. For this purpose we shall consider a power-law and exponential suppression of the tensor sound speed of the forms respectively. The parametersf * and p shall be chosen such that c T (f ), given by eqs. (2.23) or (2.24), is valid in the LISA band and (c − c T )/c < 10 −15 for LIGO. For simplicity, we shall focus only on the EFT Ansatz, which in the high-frequency limit can however also be interpreted in terms of a n = −1 power-law Ansatz. Figure 22 shows how a deviation of (c − c 0 )/c = 10 −4 in eq. (2.24) with f * = 3 × 10 −4 Hz, motivated by our forecasts in §5, can be efficiently suppressed with eqs. (D.5) or (D.6) in the LIGO band. Particularly, we find that for an exponential or power-law suppression with p 1 /2 or p 2, our forecasts remain valid for a potential signature detectable in the LISA band that is hidden to LIGO.

E Future directions: general parametrization of GW propagation
In §2.2, we motivated a frequency-dependent group velocity c T (f ) from the fact that, in many models of modified gravity (including quantum gravity), the modification of the dispersion relation can be written as a modified dispersion relation ω 2 − k 2 → F (ω, k) = 0. There, we assumed that all the time-or redshift-dependence of c T was implicit in the frequency f . Now we relax that assumption and consider a non-trivial function c T (z, f ) of the redshift and the frequency, also including a non-trivial modification to the cosmological friction term. In fact, modified theories of gravity usually predict modifications not only the propagation speed, but also the cosmological friction experienced by GWs as well as their luminosity distance. While redshift-dependent deformations of the friction and of the luminosity distance have already been considered in the literature for all these aspects of GW propagation (see [60] and references therein), the case of frequency-dependent of mixed redshift-frequency modifications is almost virgin territory (see [104,105] for scenarios motivating this possibility). Here, we will take some steps in this direction, generalizing the parametrization of [69] to a polynomial parametrization including all these corrections.

E.1 Equation of motion and amplitude
Theories of modified gravity can deform the evolution equation of tensor modes in a variety of ways. Here we are especially interested to new contributions that depend on the GW momentum. A possible structure for the modified GW evolution equation in Fourier space is (omitting the GW polarization index) where primes denote derivatives of conformal time dη = dt/a and H = a /a. Eq. (E.1) describes propagating massless modes in scenarios with modifications of both the cosmological friction term and of the graviton dispersion relation. The momentum-dependent modifications of the graviton dispersion relations are controlled by the quantity Γ β (η, k) in eq. (E.1). Such function contributes to the GW speed as which generalizes eq. (2.23). The momentum-dependent modifications of the friction term is controlled by the quantity Γ α (η, k) and generalizes the Mukhanov-Sasaki equation stemming from (2.1). Its homogeneous part, often denoted as −δ(η) [60], is well studied and is expected, for example, in scenarios with non-minimal couplings between scalars and gravity. The momentum-dependent part of Γ α has received much less attention, especially for what respects its consequences for modified GW waveforms. A time-and momentum-dependent Γ α (η, k) could be motivated by graviton decay into dark energy, Lorentz-violation models and quantum-gravity scenarios such as non-commutative spacetimes. However, while it is not especially difficult to build models with infrared deformations from classical modified gravity scenarios, it is more challenging to obtain them from theories of quantum gravity, where the main effects are expected to happen at very short, UV scales [106,107]. In particular, non-commutative spacetimes do modify the propagation of GWs at the non-commutativity scale characterizing time-space uncertainty. In fact, it is possible to re-interpret the corrections to the GW equation of motion (E.1) in terms of an effective, scale-dependent scale factor as it appears in non-commutative settings (Appendix E.6). However, the non-commutative scale is O(l Pl ) and the effect on the propagation of GWs at cosmological scales is totally negligible. For instance, the deviation from the GR luminosity distance is ∼ O(10 −120 ) [107]. A possibility could be to devise a non-commutative model where the uncertainty scale is the Hubble radius H −1 , so that corrections could take place at infrared distances. We will not make an attempt to build a viable and robust model along this line, since our approach here will be mainly phenomenological.
It is not difficult to formally solve the evolution equation at short wavelengths much smaller than the horizon size λ H −1 , using, e.g., the techniques of [60,69]. One finds that the mode function h(η, k) reads where the integration constants are chosen such to match with the GW mode at emission when η = η e . The terms in the first line control how the GW amplitude is modified with respect to the one at emission by the GW expansion (notice the scale factor at the denominator) and by modifications of gravity. The term in the second line controls the evolution of the GW phase, which is sensitive to c T . Importantly, to derive this solution we did not have to make any assumption on the momentum dependence of the functions Γ α , Γ β : these functions could have rich momentum-dependent profiles with sudden changes, and transitorily large first derivatives as a function of the GW frequency. We do not make any explicit assumption on screening mechanisms and the value of c T nearby the source but, in order to satisfy the stringent GW170817 constraints, we assume that the speed of GWs is equal to one at the frequencies f gb ∼ 10 Hz of ground-based detectors such as LIGO-Virgo: while at much lower frequencies it can be different from one.

E.2 Parametrizations for Γ α and Γ β
The momentum dependence in Γ α and Γ β eventually translates to corrections in physical observables and a choice of parametrization is needed. One can decide to parametrize directly the observables, but from a theoretical as well as a model-building point of view it may be more convenient to start with a parametrization for Γ α and Γ β and then work one's way through the observables. We will follow this route. Nishizawa [69] proposed a momentum-dependent parametrization given by constant coefficients and positive powers of momentum. In our notation and extending this parametrization to, redshift-or time-dependent coefficients α n , where k * is a reference scale that we can take to be the typical frequency f * = k * /(2π) to which the detector is sensitive (in analogy with CMB parametrizations via a pivot scale). In where we used z o = 0. Plugging (E. 10), one has where higher-order coefficients can be calculated explicitly. For our purposes, it will be sufficient to expand up to the next-to-leading term in the observed frequency. For the series of positive powers, while for the series of negative powers one has . (E.14) These coefficients simplify if we assume (E.8): .
From now on, we ignore PN corrections but the reader should keep in mind that the full expressions to be compared with simulated or data should include these terms and may display degeneracies.

E.3 Luminosity distance
In terms of Γ α and Γ β , the GW luminosity distance reads ( §2.1): where we set a(t o ) = 1 and f = f o is the frequency as measured by the observer. When Γ α (z, f ) = −δ(z) and Γ β = 0, we get the very same result of [60], while when Γ α = 0 we obtain eq. (3.1). The parametrization (E.18) in terms of Γ α,β can be further manipulated without loss of generality under the assumption that corrections to general relativity are small. To make notation more compact and connect with the observable studied in [60] for standard sirens, we also assume that the propagation of photons is unmodified, so that the optical luminosity distance is the usual d EM L (z) = (1 + z) r com , r com := tem dt a(t) .
(E. 19) In particular, the ratio between the comoving distance of a GW source and its optical counterpart is r GW com r com = 1 + Γ β = 1 + which stems from the modified dispersion relation The case p = 1/2 is less trivial: so that there is a frequency dependence at the next-to-leading order: and one recovers the case (E.28). In order to remove this degeneracy between models at the theoretical level, one has to calculate the higher-order coefficients γ 2 and b 2 but, in practice, at the experimental level the two models are indistinguishable.

E.4 Phase
Let us see how the perturbative approach can be used to evaluate the phase in eq. (3.20). For example, we assume that the time-dependence c T at present times is very mild, and negligible. Eq. (3.15) at lowest order in the PN expansion is 5π M 2 s (1 + z) 2 . (E.39) Inverting this expression and integrating yields the formal relation for the frequency dependence of the time as measured by the observer: Notice that the coefficients of the leading correction in u ±1 vanish identically in the approximation of redshift independence, where γ 0 = 1, γ ±1 = 0, β 0 = 0 and β 1 = const. However, this does not imply that the z-dependent case introduces extra terms in the u-expansion with respect to the simplest case analyzed in this paper. In fact, it is easy to convince oneself that switching on redshift dependence only changes the coefficients of the expansion, but not the expansion itself. Comparing, for instance, eqs. (3.19) and (E.42) for the positive-power case at zero order in the PN expansion and using eqs.

E.5 Comparison with the ppE framework
We conclude comparing our polynomial expansion with the coefficients of the Parametrized Post-Einsteinian (ppE) framework [1,64]. In the latter case, the waveform is written as where the phase is parametrized as Ψ ppE (u) = 3 128 where we stopped the series at the 2.5 PN level and the parameters of the expansion are the coefficients φ ppE i . Although the amplitude can also be expanded in terms of the frequency [7,69], we focus here on the comparison of the phase, as for our analysis this holds information on the most parameters. A mapping of the amplitude coefficients between the positive-power case and the ppE formalism can be found in [69] (where u ∝ f 1/3 ). Comparing eqs. (E.48) and (E.49) with (E.51), we can map our expansion into the ppE one. The coefficients φ ± i (z) can be read off from eqs. (E.46) and (E.47) and their higher-order generalization.
A difference with respect to the ppE framework is in the expansion itself, which is carried out in multiples of u (n−5)/3 in the ppE case at any order. This expansion is the same as in the positive-power case but only up to the n = 4 term. The n = 5 term has, in our case, an extra ln u factor. Therefore, the degeneracy between the two expansions is broken at the 2.5 PN approximation. In the negative-power case, the difference lies in all the negative-power terms of the tower.
Third, our coefficients are redshift-dependent while usually the φ ppE i are not, although in this paper we focused on z-independent cases for simplicity.
Fourth, it is clear that, modulo the discrepancy in the powers, our expansion and the ppE one are physically equivalent because they both encode beyond-Einstein-gravity effects as a perturbative frequency expansion. However, while the starting point of the ppE formalism is the waveform (E.50), ours is the Mukhanov-Sasaki equation (E.1), where modifications to GR appear in the friction term and in the propagation speed of GWs. Therefore, while the end result in terms of waveform-related, perturbatively expanded observables is essentially the same, from our perspective we are able to connect these corrections more directly to physical models, where the input from the theory appears at the level of the GW action and propagation equation. This is also the reason why the basic expansion for us is in terms of f and is later recast in terms of the less fundamental but much more convenient variable u, while the ppE framework is already cast in terms of u. As we have seen, however, this extra step on our side is not particularly difficult because the f and u expansions differ by negligible terms.
Along the same lines, we note that in our positive-power expansion the n = 1 term is always zero, while in the ppE framework this term is as free as the others. This is an important characterization of our expansion because it is a consequence of implementing a simple Ansatz (a polynomial of positive powers) to a basic object such as the propagation equation (E.1) of GWs. In this respect, our expansion could be thought as unveiling the fundamental structure of the theory more directly than the ppE framework. However, in the negative-power case the n = 1 term is non-vanishing, a reminder that the "prediction" φ + 1 = 0 strongly depends on how the frequency expansion of the propagation speed is defined. Fifth, it may be worth mentioning a departure point not intrinsic to our formalism but, rather, in the practical way we handled it. Typically, ppE parameters are constrained one at a time, since varying all at once produces in very weak constraints. In contrast, here we varied multiple parameters together without marginalizing. where a k = a k (η) is an effective scale factor that depends both on time and on momentum. Then, eq. (E.3) simplifies to h(η, k) = h(η e , k) a k (η e ) c T (η e , k) a k (η) c T (η, k) exp i k η ηe dη c T (η, k) . (E.53) The advantage of using a k would be about interpretation. On one hand, it can be seen as an effective time-and momentum-dependent effective mass M 2 k (η) = M 2 Pl a 2 k in the action for the perturbation mode. In quantum gravity as well as in particular in non-commutative models where time and space coordinates obey an uncertainty relation [108], one is usually able to write down a four-dimensional action of the following form, where τ is a time parameter not necessarily equal to conformal time η: The equation of motion for the Mukhanov-Sasaki variable u k = a k h k is where the function a k (τ ) depends on the theory or model. On the other hand, a momentum-dependent scale factor admits a neat physical interpretation if we extend the separate-universe approach [109] to the case of a propagating gravitational wave. In this case, at the linear order in perturbation theory one regards the wave as a local deformation of a FRW spacetime, so that 1 − 1/a k is the redshift measured along the trajectory of the wave.
F Constraints on modified GW dispersion relations, following [1] Yunes et al [1] develop an argument for obtaining a rough bound on modified GW dispersion relations, focussing on the GW frequencies of ground-based GW detectors. The same argument can be generalised, and applied to space-based detectors as LISA. In this brief Appendix, we make use of the argument of [1] to derive rough bounds on some of the parameters controlling deviations from c T = 1, as they appear in our polynomial Ansatz 2.23.
We start briefly discussing the general idea behind the method of [1]. Then, we apply the argument to the case of LISA, and translate it to special cases of our polynomial Ansatz of eq (2.23). The starting point of [1] is to introduce an Ansatz of for the modified dispersion relation of GW: which depends on two free parameters A and α. A modified dispersion as (F.1) affects the speed of GW: its deviation from the speed of light can be parameterised in terms of the quantity δ g as where, as in the main text, c T is the GW speed.
We used the fact that the GW energy E appearing in eq (F.1) is related to its frequency by with h the Planck constant, expressed as h = 4.1 × 10 −13 eV 100Hz . (F.4) The idea of [1] is to make use 18 of existing bounds on δ g for deriving bounds on A as a function of α. [1] was published before the GW170817 event, and makes use of existing bounds on the graviton mass [1] as starting point. Here, we make use of the GW170817 bound on δ g from the delay between EM and GW signals δ delay g ≤ 10 −15 . (F.5) This bound applies to frequencies of ground-based detectors, around 100 Hz, and corresponds to the choice α = 2 in the expression (F.2). Inverting eq (F.2), we can write the equality Substituting the value of h from eq (F.4), and the bound on δ g in eq (F.5), we obtain by extrapolation the following upper bound on the parameter A: for the quantity A as a function of α.
While the arguments of [1] discussed so far are designed for the parameterisation (F.1) of the modified GW dispersion relations, they can be translated with some assumptions into bounds on the dimensionless parameters β i of the parametrization in eq (2.23) of the main text. For example, let us focus on the case of positive power n = 2. Hence, we switch on the parameter β 2 in eq (2.23), and assume c 0 = 1 and β i = 0 for i = 2. We expect very stringent bounds on β 2 . In fact, the quantity δ g now reduces to (F.9) Comparing eqs (F.9) and (F.2), we find they coincide when choosing α = 4, when identifying β 2 ≡ (3/2) A f 2 h 2 , with f a pivot scale higher than the LISA band, say f = 1 Hz. We can then use eq (F.8) to derive an upper bound on the dimensionless quantity β 2 as