Keck Integral-Field Spectroscopy of M87 Reveals an Intrinsically Triaxial Galaxy and a Revised Black Hole Mass

The three-dimensional intrinsic shape of a galaxy and the mass of the central supermassive black hole provide key insight into the galaxy's growth history over cosmic time. Standard assumptions of a spherical or axisymmetric shape can be simplistic and can bias the black hole mass inferred from the motions of stars within a galaxy. Here we present spatially-resolved stellar kinematics of M87 over a two-dimensional $250\mbox{$^{\prime\prime}$} \times 300\mbox{$^{\prime\prime}$}$ contiguous field covering a radial range of 50 pc-12 kpc from integral-field spectroscopic observations at the Keck II Telescope. From about 5 kpc and outward, we detect a prominent 25 $\mathrm{km~s}^{-1}$ rotational pattern, in which the kinematic axis (connecting the maximal receding and approaching velocities) is $40^\circ$ misaligned with the photometric major axis of M87. The rotational amplitude and misalignment angle both decrease in the inner 5 kpc. Such misaligned and twisted velocity fields are a hallmark of triaxiality, indicating that M87 is not an axisymmetrically shaped galaxy. Triaxial Schwarzschild orbit modeling with more than 4000 observational constraints enabled us to determine simultaneously the shape and mass parameters. The models incorporate a radially declining profile for the stellar mass-to-light ratio suggested by stellar population studies. We find that M87 is strongly triaxial, with ratios of $p=0.845$ for the middle-to-long principal axes and $q=0.722$ for the short-to-long principal axes, and determine the black hole mass to be $(5.37^{+0.37}_{-0.25}\pm 0.22)\times 10^9 M_\odot$, where the second error indicates the systematic uncertainty associated with the distance to M87.


INTRODUCTION
Some of the earliest dynamical evidence for the presence of a supermassive black hole (SMBH) came from M87 (Sargent et al. 1978). A bright asymmetric ring of radio emission around the M87 SMBH was imaged in 2019 (Event Horizon Telescope Collaboration et al. 2019a). The black hole mass (M BH ) inferred from the ring features is consistent with the value determined from stellar dynamics based on axisymmetric orbit modeling ), but it is nearly twice the mass inferred from dynamics of a gas disk around the hole (Walsh et al. 2013).
M87 is classified as an elliptical galaxy based on the two-dimensional shape of the stellar light projected on the sky. However, its three-dimensional intrinsic shape has never been determined. A galaxy's intrinsic shape is a fundamental property that encodes the galaxy's past merger history and provides information about the mass ratios of the progenitor galaxies, the merger orbital parameters, gas fractions, and fraction of stars formed exsitu. Whether a galaxy is intrinsically spherical, axisymmetric, or triaxial also impacts dynamical determinations of its SMBH mass and stellar mass, as well as any mass reconstructions based on the method of gravitational lensing.
Thus far, almost all information about galaxy intrinsic shapes has been inferred statistically by inverting distributions of observed galaxy properties (Franx et al. 1991;Weijmans et al. 2014;Foster et al. 2017;Ene et al. 2018; arXiv:2302.07884v2  Li et al. 2018). Here we use the Keck Cosmic Web Imager (KCWI; Morrissey et al. 2018) on the 10 m Keck II telescope to obtain a spatially-resolved two-dimensional map of the stellar kinematics of M87 over a 250 × 300 field of view. The resulting kinematics span a radial range of ∼0. 6-150 , corresponding to a physical range of 50 pc-12 kpc at a distance of 16.8 ± 0.7 Mpc to M87, the value adopted in Event Horizon Telescope Collaboration et al. (2019b) and in this work (an angular size of 1 corresponds to a physical length of 81.1±3.3 pc). We perform triaxial Schwarzschild orbit modeling using the detailed stellar kinematic measurements as constraints to determine M87's shape and mass parameters. Our models include a radially declining profile for the stellar mass-to-light ratio (M * /L) inferred from stellar population measurements (Sarzi et al. 2018).

KECK OBSERVATIONS OF M87
We observed M87 with Keck KCWI in May 2020, May 2021, March 2022, and April 2022. With the large slicer and BL grating of the integral-field unit (IFU), we obtained spectra between 3500 and 5600Å at 62 pointings, which provide contiguous two-dimensional spatial coverage of the nucleus and the outer parts of M87 (Figure 1). The data span about 20 kpc (250 ) across the photometric major axis (−25 • east of north) and about 24 kpc (300 ) across the photometric minor axis (−115 • east of north).
We co-add spectra from individual KCWI spaxels to reach high signal-to-noise ratios (S/Ns), forming 461 spatial bins. Within each spatial aperture, we measure the line-of-sight stellar velocity distributions (LOSVDs) from the shapes of the absorption lines. Further details about the observations, data reduction procedures, spectral fitting processes, and stellar kinematic determination are provided in Appendices A and B.

Misalignment between kinematic and photometric axes
The KCWI map for the line-of-sight velocity V (left panel of Figure 1) shows a prominent rotational pattern at large radii, in which the northeast side of the galaxy is blueshifted and the southwest side is redshifted. The kinematic axis that connects the maximal receding and approaching velocities, however, is not aligned with the photometric major axis, as it would be for an axisymmetric rotating galaxy.
To quantify the amplitude and axis of rotation, we model the velocity field as a cosine function, with  The amplitude of rotation, V1(R), increases with radius and reaches 25 km s −1 around 6 kpc. (Lower right) The phase of the velocity function, Θ0, measures the orientation of the kinematic axis and varies significantly with radius. It plateaus to −165 • beyond 6 kpc, indicating a 40 • misalignment between the kinematic axis and the photometric major axis (red dashed curves; Kormendy et al. 2009) where R is the projected radius from the galaxy's center and Θ is the azimuthal angle on the sky. The model parameters V 1 (R) and Θ 0 (R) are the the amplitude of rotation and the position angle (PA) of the kinematic axis at radius R, respectively. With increasing radius, the velocity curve shows a systematic shift in phase and an increase in rotational amplitude (Figure 2). Within a radius of 3 kpc, the PA of the kinematic axis changes rapidly clockwise with radius (lower right panel of Figure 2), representing the kinematically distinct core mapped out by the Multi Unit Spectroscopic Explorer (MUSE) on the Very Large Telescope . Beyond 3 kpc, where the MUSE data end (at about 35 ), we find that the PA of the kinematic axis continues to change clockwise and crosses the PA of the photometric minor axis, plateau-ing at −165 • between 6 and 12 kpc. Hence, there is a 40 • misalignment between the stellar kinematic axis and photometric major axis in M87.

Stellar velocity dispersion
The KCWI map (right panel of Figure 1) and radial profile (Figure 3) of the stellar velocity dispersion σ exhibit several features. Towards the center of M87, σ increases rapidly from 250 km s −1 at a radius of 2 kpc to 370 km s −1 at 100 pc from the nucleus. This is a clear signature of the gravitational influence of the central black hole on the motions of the stars in its vicinity. The velocity dispersion stays at about 250 km s −1 between 2 and 5 kpc and then shows a gentle 10% decline between 5 kpc and the outermost reach of our data at 12 kpc. The stellar σ at the edge of our field connects   Sarzi et al. 2018; yellow and orange respectively), while the MUSE values are 10-20 km s −1 larger than KCWI between 1 and 3 kpc. At 4.5 kpc, our KCWI measurements match the single data point (red) from an independent KCWI observation (Forbes et al. 2020). The VIRUS-P values (Murphy et al. 2011;grey), which were used in the axisymmetric stellar-dynamical measurement of the M87 black hole , are 30-50 km s −1 higher than all other measurements. Murphy et al. (2011) had noted a similar offset between their values and earlier IFU measurements (Emsellem et al. 2004) in the inner 2 kpc. (Bottom) Red globular clusters have similar σ (red) as stars and appear to belong to M87's stellar halo (Zhang et al. 2015), whereas the intra-cluster component of planetary nebulae have sharply rising σ (Longobardi et al. 2018;green). smoothly to the latest determinations of the velocity dispersions of discrete dynamical tracers (lower panel of Figure 3) such as red globular clusters and planetary nebulae in the outer parts of M87 (Zhang et al. 2015;Longobardi et al. 2018). Beyond about 10 kpc, subpopulations of planetary nebulae have been reported to have distinct kinematics (Longobardi et al. 2018): σ of "intra-cluster" planetary nebulae rises to 800 km s −1 at 100 kpc, whereas those in the galaxy halo component have a relatively flat σ profile out to 100 kpc, similar to that of the red population of globular clusters (Côté et al. 2001;Strader et al. 2011;Zhang et al. 2015).

DETERMINATION OF MASS AND SHAPE PARAMETERS FROM TRIAXIAL SCHWARZSCHILD MODELING
We use the full LOSVDs from Keck KCWI, along with photometric observations from the Hubble Space Telescope (HST) and ground-based telescopes (Kormendy et al. 2009), to measure M87's mass distribution and intrinsic shape. We perform triaxial Schwarzschild orbit modeling with the TriOS code (Quenneville et al. 2021(Quenneville et al. , 2022 based on an earlier code (van den Bosch et al. 2008), and use more than 4000 observational constraints to simultaneously determine six parameters: M BH , M * /L, dark matter content, and the threedimensional intrinsic shape. As described below, we implement a new capability in the code to model spatial variations in M * /L and use a radially declining M * /L profile that closely approximates the variation inferred from stellar population and dynamics studies of M87 (Oldham & Auger 2018;Sarzi et al. 2018).

Galaxy model and orbit sampling
Each galaxy model has three mass components: a central SMBH, stars, and a dark matter halo. The threedimensional stellar density in the TriOS code is represented as a sum of multiple Gaussian functions of differing widths and axial ratios. To determine these functions, we first fit a two-dimensional Multi-Gaussian Expansion (MGE; Cappellari 2002) to the surface brightness distribution of M87 (see Appendix C). Each MGE component is allowed an independent flattening parameter (q in Table 2) to model any radially changing ellipticity observed on the sky.
For a given set of three angles, θ, φ, and ψ, that relate the intrinsic and projected coordinate systems of a galaxy (Binney 1985), we deproject each MGE component, multiply by a radially varying M * /L (see below), and add the deprojected Gaussians to obtain the three-dimensional stellar density. Each deprojected MGE component can have its own axis ratios p, q, and u, where p = b/a is the intrinsic middle-to-long axis ratio, q = c/a is the intrinsic short-to-long axis ratio, and u is the apparent-to-intrinsic long axis ratio. When the best-fit p, q, and u are quoted below, each value is luminosity averaged over the MGE components. Further details of the relations between the apparent and intrinsic shape parameters and the deprojections can be found in Section 2 of Quenneville et al. (2022).
The M * /L we use to obtain the stellar density varies radially, following a logistic curve given by where δ is the ratio of the inner and outer M * /L, and R 0 and k parameterize the location and sharpness of the transition. We choose δ = 2.5, R 0 = 10 , and k = 2, which together well approximate ( Figure D1) the spatial M * /L profile of M87 determined from Sarzi et al. (2018). We leave the overall normalization-the outer M * /L-as a free parameter. A similar form as Equation (1) was used in an axisymmetric Jeans dynamical study of M87 globular cluster and stellar kinematics data (Oldham & Auger 2018). We implement this spatial variation in our models by choosing distinct M * /L ratios for each component of the MGE such that the profile is reproduced. The dark matter halo is described by a generalized Navarro-Frenk-White density profile (Navarro et al. 1996) ρ where ρ 0 is the density scale factor and r s is the scale radius. This form of the dark matter halo is used by Li et al. (2020) when fitting axisymmetric Jeans models to M87 globular cluster and stellar kinematics data. They determine that r s = 15.7 +2.3 −2.0 kpc for the cored γ = 0 model but find no significant preference for γ = 0 over γ = 1. Oldham & Auger (2016), on the other hand, find a strong preference for flat cores with γ 0.13. We have tested models with γ = 0, 0.5, and 1, and find that the models with a γ = 0 halo are a better description of the data, with the goodness of fit (χ 2 ) lower by at least 100. We therefore adopt the flat core, γ = 0 dark matter halo. Since the KCWI stellar kinematics extend to a projected radius of 12 kpc, we expect r s and ρ 0 to be quite degenerate; we choose to fix r s = 15 kpc and keep ρ 0 as a free parameter in the models.
For each galaxy model, we compute the trajectories of a library of around 500,000 stellar orbits that sample 120 values of energy, 54 and 27 values of the second integral of motion for the loop and box orbit libraries, and 27 values of the third integral of motion over logarithmically spaced radii from 0. 01 to 316 . The loop and box orbits are integrated for 2000 and 200 dynamical times, respectively. We project the stellar orbits onto the sky and compute the LOSVDs, accounting for the KCWI point-spread function (PSF) and spatial binning. Using a non-negative least-squares optimization, we determine the orbital weights such that the linear superposition of orbits reproduces the luminous mass (to an accuracy of 1%) and the observed kinematics in each spatial bin. As described below, the procedure is repeated for a large suite of galaxy models to determine the best combination of the galaxy model parameters.

Parameter search
The best-fit model parameters and uncertainties are determined as follows. We use an iterative grid-free Latin hypercube scheme to select sampling points in the six-dimensional model parameter space (Liepold et al. 2020;Quenneville et al. 2022;Pilawa et al. 2022). In each iteration, the TriOS code is run to assess the χ 2 of each of the sampled galaxy models. The χ 2 of a model is determined by comparing the data and uncertainties for the lowest eight kinematic moments in each of the 461 spatial bins to the model predictions. An additional set of constraints is imposed on kinematic moments h 9 to h 12 , in which the value of each moment is required to be zero with error bars comparable to the errors in h 3 to h 8 . As shown in Liepold et al. (2020), these additional constraints help eliminate spurious behavior in the LOSVDs predicted by the models.
The goodness-of-fit landscape is then approximated using Gaussian process regression (GPR; Rasmussen & Williams 2006;Pedregosa et al. 2011) with a Matérn covariance kernel. To map the high-likelihood region in finer detail, we run the TriOS code again for a next set of models selected by uniformly sampling a zoom-in volume that lies within the 3-σ confidence level for six parameters in the previous regression surface. A more accurate GPR surface is then obtained from all the available models. After multiple iterations we again use GPR to construct a smooth likelihood surface from all available models (nearly 20,000 in total). Finally, we use the dynamic nested sampler dynesty (Speagle 2020) to sample from this surface to produce Bayesian posteriors assuming a uniform prior for all parameters.
Following Quenneville et al. (2022), we search over a different set of shape parameters, T , T maj , and T min , instead of angles θ, φ, and ψ. Such a parameterization maps the deprojectable volume in the viewing-angle space into a unit cube in the shape-parameter space, allowing for simpler and more efficient searches. The definitions of (T, T maj , T min ) and the relationships with Average middle-to-long axis ratio p 0.845 ± 0.004 Average short-to-long axis ratio q 0.722 ± 0.007 Average apparent-to-intrinsic long axis ratio u 0.935 ± 0.004 (θ, φ, ψ) are given in Section 3 of Quenneville et al. (2022). The final posterior distributions yield clear constraints on all six model parameters: M BH , outer M * /L, dark matter density ρ 0 , T , T maj , and T min (Figure 4). Instead of the halo density parameter ρ 0 , we describe the dark matter halo in terms of the ratio of dark matter to total matter enclosed within 10 kpc, f 10 . The posterior distributions for the more intuitive (luminosity-averaged) axis ratios p, q, and u are also shown. The best-fit parameters are summarized in Table 1.

Black hole mass and stellar mass-to-light ratio
The mass of the M87 black hole has been determined with two other independent methods (Walsh et al. 2013;Event Horizon Telescope Collaboration et al. 2019a) in addition to the stellar-dynamical method used here. Compared to our value M BH = (5.37 +0.37 −0.25 ± 0.22) × 10 9 M , the value M BH = (6.5 ± 0.2 ± 0.7) × 10 9 M inferred from the crescent diameter by the Event Horizon Telescope (EHT) team (Event Horizon Telescope Collaboration et al. 2019b) is 21% higher, but the difference is within 1.5-σ of their uncertainties. A recent re-analysis of EHT observations (Broderick et al. 2022) revised the black hole mass to M BH = (7.13 ± 0.39) × 10 9 M , which is 33% above our value, but Tiede et al. (2022) cautioned the false-positive tendency of the method used in the re-analysis and found that significant systematic uncertainties were not taken into account. The ionized gas-dynamical determination of M BH = (3.45 +0.85 −0.26 ) × 10 9 M (after scaling to our adopted distance of 16.8 Mpc) is 36% below our value (Walsh et al. 2013;Event Horizon Telescope Collaboration et al. 2019b).
Before this work, the most recent mass measurement of the M87 black hole that also used orbit-based stellar dynamics obtained Event Horizon Telescope Collaboration et al. 2019b) M BH = (6.14 +1.07 −0.62 ) × 10 9 M (after scaling to our adopted distance of 16.8 Mpc), which is 14% above our value. Despite the apparent consistency, there are many differences between the two measurements. In this work, the stellar spectra are obtained in a homogeneous manner from the latest IFU at the Keck Telescope over a contiguous 250 × 300 field and have S/N of around 100 perÅ for the outermost bins and above 200 perÅ for central bins. The observed stellar velocity dispersions used to constrain the orbit models in this work are about 20% lower than Gebhardt et al. (2011);Murphy et al. (2011) beyond 1 kpc (Figure 3; top panel), but this work is in broad agreement with other recent measurements Sarzi et al. 2018;Forbes et al. 2020). The orbit modeling in this work allows for triaxiality, and the M BH is obtained from a full six-dimensional model parameter search with posteriors measured using a Bayesian framework. Furthermore, Gebhardt et al. (2011) adopts a spatially constant V -band M * /L of 9.7 M /L (scaled to our distance of 16.8 Mpc). However, a recent detailed stellar population analysis of M87 reported a negative radial gradient due to a changing stellar initial mass function (Sarzi et al. 2018). When incorporating the shape of this M * /L gradient into our stellar-dynamical models, we find the Vband M * /L declines from 8.65 M /L at the center to an outer value of 3.46 M /L .

Dark matter mass
At the outer reach of our data, at a radius of 10 kpc, we find the enclosed dark matter mass to be M DM (< 10 kpc) = (3.88 ± 0.12) × 10 11 M , constituting about 67% of the total mass of the galaxy (f 10 in Table 1). A similar dark matter fraction (73% at 14.2 kpc) is obtained from Jeans modeling of the kinematics of globular clusters (Li et al. 2020). A lower dark matter fraction (about 30% at 11 kpc) is estimated from axisymmetric orbit-based modeling of the kinematics from stars and globular clusters (Murphy et al. 2011). This lower fraction arises mainly from their high estimate of M * /L discussed in the previous paragraph.
Our inferred total mass of M87 within 10 kpc is M tot (< 10 kpc) = (5.77 ± 0.12) × 10 11 M . Dynamical modeling of globular clusters under the assumption of spherical symmetry yields very similar value at the same radius (Wu & Tremaine 2006;Romanowsky & Kochanek 2001) but with large modeling uncertainties (Wu & Tremaine 2006). Estimates from axisymmetric orbit models find a 15% lower value (Murphy et al. 2011). Jeans modeling studies (Oldham & Auger 2016;Li et al. 2020) incorporating a radially declining M * /L find a total mass enclosed within 10 kpc to be in the range of (3-7.5) × 10 11 M .

M87's intrinsic shape
Our orbit modeling results show that M87 is strongly triaxial, where the lengths of the short and middle principal axes are 72% and 85% of the length of the long axis, corresponding to q and p, respectively. A triaxiality parameter often used to quantity the ratios of a galaxy's principal axes is T = (1 − p 2 )/(1 − q 2 ) = (a 2 −b 2 )/(a 2 −c 2 ). This parameter ranges between T = 0 for an oblate axisymmetric shape (p = 1 or a = b) and T = 1 for a prolate axisymmetric shape (p = q or b = c), with values between 0 and 1 indicating a triaxial shape. Our inferred value for M87 is T = 0.65 ± 0.02, strongly excluding the possibility that M87 is an axisymmetric galaxy.
The shape parameters p, q, and u in Table 1 are related to a set of angles θ, φ, and ψ that uniquely specify the orientation of M87's intrinsic axes with respect to its projected axes on the sky (van den Bosch et al. 2008;Quenneville et al. 2022). The angles θ and φ specify the direction of the line-of-sight from M87 to the observer; they are the usual polar angles in M87's intrinsic coordinate system. The inclination angle θ = 0 • corresponds to a face-on view of M87 along its intrinsic short axis, and θ = 90 • corresponds to an edge-on view with the short axis in the sky plane. The azimuthal angle φ = 0 • places the intrinsic middle axis in the sky plane and φ = 90 • places the intrinsic long axis in the sky plane. Once the line of sight is described by θ and φ, the third angle ψ specifies the remaining degree of freedom for the rotation about the line of sight. Our best-fit angles for M87 are (θ, φ, ψ) = (48. • 9, 37. • 5, −61. • 3). Thus, we are viewing M87 from a direction that is roughly equidistant from all three principal axes.

Angular momentum vector and origin of kinematic misalignment
To gain physical insight into the origin of the observed misalignment between the kinematic axis and photometric major axis of M87 on the sky (Figure 2; lower right), we examine the direction of the total angular momentum vector, L, of the stars predicted by our best-fit orbit model and how it would be projected on the sky. To do this, we sum the individual contributions to the angular momentum from the superposition of stellar orbits and compute the total L. Among the three major orbital types computed in the TriOS code, the box orbits supported by a triaxial gravitational potential, by construction, have zero angular momentum, whereas the short-axis and long-axis tube orbits have net L along the intrinsic short axis and long axis, respectively (Schwarzschild 1979;van den Bosch et al. 2008;Quenneville et al. 2022). The direction of the total L is therefore determined by the relative contributions from the two types of tube orbits (Franx et al. 1991).
The rotational velocity of M87 reaches sufficiently high amplitudes beyond about 5 kpc (Figures 1 and 2) for us to determine the direction of L robustly. We find it to point approximately 60 • off of the intrinsic short axis. Using the best-fit viewing angles to project L on the sky, we find it to lie at a PA of approximately −60 • . Because the projected L is orthogonal to the kinematic axis of the projected velocity field, this simple calculation indicates that the PA of the kinematic axis predicted by the model is around −150 • , very similar to the observed kinematic axis. The observed kinematic misalignment of M87 on the sky is therefore a result of both projection effects of a triaxial galaxy and a physical offset between the total angular momentum vector and the intrinsic short axis of the galaxy.

CONCLUSIONS
With 4000 constraints from Keck KCWI and our latest triaxial orbit modeling code and procedure for sampling high-dimensional parameter spaces even with computationally intensive models, we are able to relax the common assumption of axisymmetry and present the most comprehensive stellar-dynamical study of the M87 galaxy and its central black hole. This work is one of only a small number of studies that have produced constraints on all three intrinsic shape parameters for individual galaxies (Jin et al. 2020;Santucci et al. 2022). Even fewer galaxies have been observed with sufficient angular resolution, field of view, spectral coverage, and S/N for a simultaneous determination of the intrinsic shape, supermassive black hole mass, and galaxy mass (van den Bosch & de Zeeuw 2010; Walsh et al. 2012;den Brok et al. 2021;Quenneville et al. 2022;Pilawa et al. 2022). As demonstrated in this work, further advancements have only been made possible by the installations of wide-field and highly sensitive IFUs on large groundbased telescopes.
Moving forward, it is crucial to apply triaxial stellardynamical orbit models to larger samples of galaxies, thereby advancing this method from a rarity to a standard technique. This is especially pertinent for massive elliptical galaxies such as M87 because the majority of them-when a rotational pattern can be detected in the stellar velocity field-show some degree of misalignment between the kinematic and photometric major axes, extending to the half-light radius and beyond (Ene et al. 2018;Krajnović et al. 2018;Ene et al. 2020). Such an off-set indicates triaxiality (Binney 1985;Franx et al. 1991); an axisymmetric galaxy would, by symmetry, produce only aligned kinematic and photometric major axes.
When direct comparisons between axisymmetric and triaxial modeling were made on the same galaxy, the black hole mass from axisymmetric models has ranged from about 50% (van den Bosch & de Zeeuw 2010) to 170% (Pilawa et al. 2022) of the mass when triaxiality was allowed; and in two galaxies, the black hole mass did not change appreciably (van den Bosch & de Zeeuw 2010; Liepold et al. 2020;Quenneville et al. 2022). Overall, triaxial models were able to match the observed stellar kinematics significantly better than axisymmetric models (Quenneville et al. 2022;Pilawa et al. 2022).
More secure black hole masses could result in significant changes to the local black hole census and the shapes of the scaling relations between black holes and host galaxies, thereby impacting our understanding of black hole fueling and feedback physics, as well as binary black hole merger physics used to forecast and eventually interpret gravitational wave signals for Pulsar Timing Arrays (Taylor 2021) and space-based detectors (Amaro-Seoane et al. 2022). In terms of black hole imaging studies, since the photon ring diameter ranges from about 9.6 to 10.4 gravitational radii depending on the black hole spin (Event Horizon Telescope Collaboration et al. 2019b), future analyses combining direct imaging with stellar kinematic measurements such as that presented this paper have the potential to significantly improve the prospects for measuring black hole spins.

ACKNOWLEDGMENTS
We dedicate this work to the late Wal Sargent, who reported the first observational evidence for the M87 black hole and was a mentor to generations of sci- We observed M87 using the integral-field spectrograph KCWI on Keck. We used the BL grating centered on 4600Å and the Kblue filter to obtain the widest wavelength coverage, reducing possible template mismatch during the subsequent extraction of the stellar kinematics. The integration time per exposure varied from 300 s for the central pointings to 1500 s for the outermost pointings with low surface brightness. We periodically acquired offset sky exposures in between the on-source galaxy exposures, each roughly half the integration time of the adjacent galaxy exposures. Only data taken in good observing conditions are used in this analysis; the on-source and sky exposure times total 13 hr and 2.8 hr, respectively.

A.1. Data reduction
The KCWI Data Extraction and Reduction Pipeline (Morrissey et al. 2018) is actively maintained on a publicly accessible GitHub repository. We use the IDL version of the pipeline with its default settings to reduce each frame. The main steps include overscan and bias removal, cosmic ray rejection, dark and scattered light subtraction, solving for the geometric distortion and wavelength solution, flat-fielding, correction for vignetting and the illumination pattern, sky subtraction, and the generation of datacubes using the spatial and spectral mappings determined previously. The pipeline then corrects for differential atmospheric refraction and applies a flux calibration using a standard star.
In addition to the default pipeline, we perform custom steps to improve the quality of the processed data. Some cosmic rays are improperly removed by the KCWI pipeline, leaving sharp features at certain wavelengths in a small number of spaxels in our datacubes. We therefore scan through each wavelength slice of the cubes, mask the impacted pixels, and perform an interpolation to replace their values with those of neighboring pixels. Furthermore, beyond about 100 , the KCWI spectra are sky-dominated and subtle mis-subtraction of the sky can result in significant reduction of the S/N of the galaxy spectra. The sky subtraction stage of the KCWI pipeline uses b-spline interpolation to build a "noise-free" model of the sky in each pixel that is subtracted from the corresponding object exposure. We find that this routine does not capture highly space-or time-variant sky features, so we further remove residual sky features using the combination of a principal component analysis (PCA) and the penalized pixel-fitting (pPXF; Cappellari 2017) method, as described in Appendices A.4 and B.
In the final step, we merge the on-source M87 datacubes. Roughly half of the pointings were taken with the long axis of KCWI aligned with a PA of −25 • and half were oriented perpendicular to this with a PA of −115 • . We construct a pair of datacubes, one for each of the two orientations using the nifcube and gemcube IRAF tasks that are part of Gemini's data reduction software. We input the fully calibrated KCWI datacubes (the " icubes.fits" files) and map the cubes onto a shared grid with a spacing of 0. 3 × 1. 4 × 1Å. This choice of spaxel size matches the native scale of the individual KCWI datacubes for our observational setup.

A.2. Line-spread function
We find that our selected spectrograph configuration produces a line-spread function (LSF) that is distinctly non-Gaussian ( Figure A1). The LSF is instead well described by the convolution of a Gaussian function and a top-hat function of the form where Π(x) = 1 if |x| ≤ 1/2 and 0 otherwise, ∆ is the full width of the top-hat component, and σ is the standard deviation of the Gaussian. To measure the widths −5 −2.5 0 2.5 5 Wavelength offset (Å) Normalized Flux (Arbitary) Figure A1.
Line Spread Function of Keck KCWI with BL grating. We find the LSF of KCWI BL grating to be well approximated by a Gaussian function convolved with a top-hat function, as shown in Equation A1. To measure the shape of the LSF, we simultaneously fit 31 lines of an FeAr lamp spectrum as described in Appendix A.2. Here we plot a superposition of the nine most prominent of those lines. Black points mark the flux in the lamp spectrum around each line after normalizing for each line's amplitude. Our best-fit LSF model (green) has a top-hat function of width ∆ = 5.105Å convolved with a Gaussian function of σ = 0.627Å. A single Gaussian function, as is typically assumed, would provide a very poor fit to the KCWI LSF (red).
of the Gaussian and top-hat components of the LSF, we simultaneously fit 31 lines of an FeAr arc lamp spectrum between 4500 and 5000Å and determine ∆ = 5.105Å and σ = 0.627Å. Repeating this procedure on different spectral or spatial regions yields comparable best-fit parameters.

A.3. Point-spread function
During the first night of observations, we took KCWI data of the inner region of M87 and the atmospheric seeing was estimated to be 0. 63 by the differential image motion monitor at the nearby Canada France Hawaii Telescope weather station. This estimate is consistent with the broadening of point sources measured from exposures taken with the guider camera. During the other four nights, we observed the outer regions of M87 and measured similar seeing. While running stellardynamical models, described in Section 4, we use a PSF that is a Gaussian with a full width at half maximum (FWHM) of 0. 63 (σ = 0. 28).

A.4. PCA decomposition of sky features
As part of the process to remove residual sky features seen in the reduced M87 datacubes, we perform a PCA decomposition of the sky spectra. For each sky cube, we apply a conservative spatial masking of possible sources in the field and coadd the unmasked pixels to obtain a high S/N sky spectrum. A weighted expectationmaximization PCA (Bailey 2012) is then applied to each of the sky spectra between 3800 and 5650Å. Since the amplitudes of the 4861Å Hβ, 5200Å [N I], and the 5577 A [O I] lines are highly variable and are not well captured with a PCA decomposition (van Dokkum et al. 2019), we mask these features. The first PCA component is effectively the mean sky spectrum. The second and fourth components capture slight variations in the shape of the continuum and the Ca H and K features. The third and fifth components capture variations in the numerous OH lines. While we obtain measurements of the first ten components, the fifth component and beyond are consistent with noise. A similar routine was previously applied to KCWI observations (van Dokkum et al. 2019) and this method is similar in spirit to the Zurich Atmospheric Purge (ZAP; Soto et al. 2016) used for MUSE observations.

A.5. Spectral and spatial masking
We mask nine spectral features, which together span a total of 274Å ( Figure A2). The masked features include emission lines that are prominent at the nucleus, as well as the 4861Å Hβ, 5200Å [N I], and 5577Å [O I] lines that are masked in the PCA decomposition. The Mg i b region (5184-5234Å) is also masked because it is contaminated by the 5200Å [N I] skyline and is coincident with Fe emission features at M87's redshift.
We also apply a spatial mask to exclude potentially contaminant spaxels. This is done by collapsing the datacubes spectrally, flagging regions of spaxels with substantially higher surface brightness than their surroundings, and then masking the brightest spaxels in those regions. This process removes the spaxels that are contaminated by the prominent jet, the central ∼0. 85 that is affected by the active galactic nucleus (AGN), and numerous bright globular clusters.

A.6. Spatial binning
We use the vorbin package (Cappellari & Copin 2003) to construct spatial bins and obtain coadded KCWI stellar spectra with uniformly high S/N. By default, vorbin calculates the S/N of each coadded spectrum based on values of the signal and the noise of the individual spaxel spectrum given by the user, adding the signals linearly and the noise in quadrature. Instead of this default setting, we modify the sn func() routine in vorbin's voronoi 2d binning to nonanalytically recompute the S/N from the M87 datacube while binning. This approach improves the uniformity of the resultant S/N across the bins as it naturally incorporates spatial correlations in the signal and noise between spaxels. We  Figure A2. Representative KCWI spectra of M87. Skysubtracted galaxy spectra (black curves) for ten representative spatial bins located at projected radii from 1 to 130 are shown. A total of 461 binned spectra are used in this work. The S/N of these co-added spectra range from about 100 to 200 perÅ. The stellar template broadened by the best-fit LOSVD is overlaid (red curves) on each spectrum. estimate the S/N by first smoothing the spectrum with a Gaussian kernel with FWHM = 4Å, comparable to the LSF. The noise is taken to be the root-mean-square (rms) difference between the raw and smoothed spectra, while the signal is taken to be the median flux of the raw spectrum. We apply the spectral masks described above before smoothing to avoid contamination from sharp features in the spectra. This procedure results in 461 spatial bins and a coadded spectrum for each of the bins. The S/N perÅ ranges from about 200 in the central regions to about 100 in the outer regions. Figure A2 shows a series of representative KCWI spectra (black curves) for ten of the 461 spatial bins located at projected radii of 1 -130 .

B. STELLAR KINEMATIC DETERMINATION
We measure the stellar LOSVD for each of the 461 binned spectra using pPXF (Cappellari 2017). With pPXF, we convolve a linear combination of template stars with an LOSVD, parameterized by V , σ, and highorder Gauss-Hermite moments h 3 -h 8 that account for asymmetric and symmetric deviations from a Gaussian velocity distribution (van der Marel & Franx 1993). The S/N of our data enable the measurement of high-order Gauss-Hermite moments. We find that truncating the series at h 4 (or h 6 ) results in elevated values for h 4 (or h 6 ), but when fitting to h 8 or h 12 , the values of h 4 and h 6 converge and the highest extracted moments become consistent with 0, as seen in past work (Pilawa et al. 2022;Liepold et al. 2020). In addition, we find it important to constrain the kinematic moments beyond h 4 in dynamical modeling. When those moments are not constrained in orbit models, the models are prone to producing LOSVDs with unphysical features due to large values in the high-order moments, potentially biasing the preferred model parameters (Quenneville et al. 2021;Liepold et al. 2020).
For stellar templates, we use the MILES library (Falcón- Barroso et al. 2011;Sánchez-Blázquez et al. 2006) but select 485 spectra out of the full 985 templates that have well-identified spectral types and luminosity classifications. These stellar templates have a higher spectral resolution than our observations and are degraded to match the KCWI (non-Gaussian) LSF before fitting with pPXF.
During the kinematic fit, we use an additive polynomial of degree one and a multiplicative polynomial of degree 15 to model the stellar continuum. We also supply the PCA components that describe the sky background to pPXF. This procedure results in a weighted combination of the PCA components, which is included as an additional additive term to match the residual sky features in the M87 spectra that remained after the KCWI pipeline's default sky subtraction. Ultimately, we use the first ten PCA components, but find that the extracted Gauss-Hermite moments are unchanged as long as at least the first five PCA components are included in the fit.
Because we excluded three highly variable sky lines during the PCA decomposition process, we also mask those spectral regions when running pPXF, as well as emission lines associated with M87 and the Mg i b region, as described previously. In contrast to the other masked regions, we find that the extracted Gauss-Hermite moments depend strongly on the endpoints of Mg i b mask and only stabilize once the entire 5184-5234 A region is excluded from the fit.
Altogether, we fit the Gauss-Hermite moments, polynomial coefficients, template weights, and sky weights simultaneously. The stellar templates broadened by the best-fit LOSVD provide excellent fits to each of the ob-  Table 2. Best-fit MGE parameters for the surface brightness of M87. For each of the 11 two-dimensional Gaussian components, the first column lists the central surface brightness density, the middle column lists the dispersion of the Gaussian, and the last column lists the axis ratio, where primed variables denote projected quantities. We obtain the MGE by fitting to the V -band light profile in Kormendy et al. (2009). To impose a M * /L gradient in the dynamical models, the I k values are adjusted to reproduce the profile in Figure D1.
served spectra, as illustrated by the red curves for the ten representative spectra shown in Figure A2. The measurement uncertainties on the LOSVDs are determined as follows. After an initial fit to each binned spectrum, we perturb the spectrum at a given wavelength by drawing a random number from a Gaussian distribution centered on the spectrum and with a dispersion equal to the rms of the pPXF residuals from the preliminary fit at that wavelength. We perform 1000 such perturbed fits with the pPXF bias parameter set to 0 and determine the mean and standard deviation of each moment over those 1000 realizations, which we adopt as the kinematic value and its 1-σ uncertainty. For bins in the central 100 ×100 region, the mean error on V is 2.6 km s −1 and on σ is 3.0 km s −1 . The mean errors on h 3 through h 8 are similar, spanning from 0.009 to 0.016. The typical errors in the outer bins are slightly larger with mean errors on V , σ, and h 3 through h 8 of 2.6 km s −1 , 3.4 km s −1 , and 0.012-0.022, respectively.

C. SURFACE BRIGHTNESS OF M87
Besides the stellar kinematics, another constraint used in the dynamical models is the galaxy's luminosity density. We use a previously published V -band light profile, along with measurements of the ellipticity and PA of the isophotes (Kormendy et al. 2009). The profile extends from 0. 017 to 2400 and comes from a combination of ground-based data and high-resolution HST Logistic approximation with δ = 2.5 Sarzi+2018 Figure D1.
Radial profile of M * /L ratio used in this work. The logistic approximation (red) used in our modeling, given by Equation (1), is chosen to match the shape of the r-band M * /L (black) in Figure 11 of Sarzi et al. (2018). The inner M * /L is δ = 2.5 times the outer M * /L ratio, and the transition is centered around 10 arcsec. Our dynamical model prefers an outer V -band M * /L of 3.46 +0.04 images, which have been deconvolved to remove the effects of the PSF as well as the AGN. We fit the sum of multiple two-dimensional Gaussians to the composite surface photometry. These MGE (Cappellari 2002) approximations are commonly used because they are able to match the surface brightnesses of galaxies while also enabling analytical deprojections to obtain intrinsic luminosity densities. Our best-fit MGE reproduces the surface brightness between 0. 1 and 500 within 10%. This MGE has 11 Gaussian components that share the same center and PA of −25 • . While the value of the PA in Kormendy et al. (2009) varies within 50 , the isophotes between 1 and 50 are very round with ellipticity 0.08; using a constant PA in our MGE therefore does not affect the quality of the fit. The MGE parameters are given in Table 2.

D. ORBIT MODELING
Radial profiles of the M * /L ratio and stellar kinematics used in the orbit modeling in this work are shown in Figure D1 and Figure D2 Figure D2. Radial profiles of the first eight moments of the stellar LOSVDs. The observed Keck KCWI moments (gray) are well matched by the moments predicted by the best-fit model (red) given by Table 1. The triaxial orbit models produce point-symmetric LOSVDs, so we have pointsymmetrized the kinematic moments before fitting.