No self-shadowing instability in 2D radiation-hydrodynamical models of irradiated protoplanetary disks

Theoretical models of protoplanetary disks including stellar irradiation often show a spontaneous amplification of scale height perturbations, produced by the enhanced absorption of starlight in enlarged regions. In turn, such regions cast shadows on adjacent zones that consequently cool down and shrink, eventually leading to an alternating pattern of overheated and shadowed regions. Previous investigations have proposed this to be a real self-sustained process, the so-called self-shadowing or thermal wave instability, which could naturally form frequently observed disk structures such as rings and gaps, and even potentially enhance the formation of planetesimals. All of these, however, have assumed in one way or another vertical hydrostatic equilibrium and instantaneous radiative diffusion throughout the disk. In this work we present the first study of the stability of accretion disks to self-shadowing that relaxes these assumptions, relying instead on radiation-hydrodynamical simulations. We first construct hydrostatic disk configurations by means of an iterative procedure and show that the formation of a pattern of enlarged and shadowed regions is a direct consequence of assuming instantaneous radiative diffusion. We then let these solutions evolve in time, which leads to a fast damping of the initial shadowing features in layers close to the disk surface. These thermally relaxed layers grow towards the midplane until all temperature extrema in the radial direction are erased in the entire disk. Our results suggest that radiative cooling and gas advection at the disk surface prevent a self-shadowing instability from forming, by damping temperature perturbations before these reach lower, optically thick regions.


INTRODUCTION
Understanding the different physical processes behind the structure and evolution of protoplanetary disks is of fundamental importance to explain the formation of planets. Gas dynamics in disks around young stars affects the transport and accumulation of dust, thereby shaping the resulting planetary systems (a recent review on hydro-, magnetohydro-, and gasdust dynamics in protoplanetary disks can be found in Lesur et al. 2022). Given that these disks are largely cold and poorly ionized (Dzyurkevich et al. 2013;Desch & Turner 2015), it is currently unclear to what extent magnetic phenomena such as the magnetorotational instability (MRI) (Velikhov 1959;Chandrasekhar cal shear instability (VSI) (Urpin & Brandenburg 1998;Nelson et al. 2013;Stoll & Kley 2014;Flores-Rivera et al. 2020;Manger et al. 2020Manger et al. , 2021, and the recently discovered local VSI (Klahr et al. 2022, submitted) and nonlinear symmetric instability (Klahr & Baehr 2022, submitted). Besides producing turbulence and limiting dust settling (see, e.g., Birnstiel et al. 2010;Flock et al. 2020), hydrodynamical instabilities may trigger the formation of long-lived structures such as vortices and zonal flows (see, e.g., Manger & Klahr 2018;Pfeil & Klahr 2021) that due to their pressure structure can accumulate dust and pebbles, potentially accelerating planetesimal formation via gravitational collapse (Dittrich et al. 2013;Raettig et al. 2015;Gerbig et al. 2020;Raettig et al. 2021). Furthermore, a clear understanding of these phenomena is crucial to explain current observations of structures in protoplanetary disks (Andrews et al. 2018; Barraza-Alfaro et al. 2021) and help confirm or rule out the detection of exoplanets, as it is often unclear if planetary perturbations are required to explain such features (Calcino et al. 2020;Pyerin et al. 2021;Brown-Sevilla et al. 2021).
An important, yet unanswered question in this picture is if protoplanetary disks are globally stable to local perturbations in their scale height. Local regions of enlarged aspect ratio absorb stellar radiation at a higher rate than their surroundings, potentially heating up and further inducing a local growth of the disk's width. Conversely, shadowed regions cool down, which causes them to shrink and absorb even less starlight. In all models of this process, perturbations move inwards due to their asymmetric irradiation: any given enlarged region absorbs more starlight on their inner face, i.e., closer to the star, than on their shadowed outer side. As a consequence, the inner side grows and the outer one shrinks, shifting the center of the enlarged region towards the star. It is currently unknown if this feedback process can be efficient enough to trigger a self-sustained instability or if, on the contrary, the formation of scale height perturbations is damped by even faster radiation-hydrodynamical processes.
This formation and evolution of temperature bumps (or thermal waves) has received different names in literature, such as thermal wave instability (Watanabe & Lin 2008;Ueda et al. 2021), iradiation instability (Wu & Lithwick 2021), and self-shadowing instability (Dullemond 2000). We shall use the latter as we find it the most self-explanatory. It was argued by several authors that such an instability could have major consequences for planet formation, either by affecting the composition of formed planetesimals by inducing tem-poral variations in snow line locations (Ueda et al. 2021;Okuzumi et al. 2022), boosting the formation of planetesimals by creating dust traps (Okuzumi et al. 2022), or triggering hydrodynamical instabilities in the dead zone (Watanabe & Lin 2008). On the other hand, as suggested by Siebenmorgen & Heymann (2012) and Ueda et al. (2021), such an instability could naturally produce rings and gaps such as those observed in dust continuum (Andrews et al. 2018) and scattered light (Avenhaus et al. 2018) images of protoplanetary disks.
The stability of circumstellar disks to self-shadowing was first addressed by D' Alessio et al. (1999), who studied temperature perturbations in vertically isothermal disks. Using typical parameters for accreting T-Tauri stars, these authors assumed instantaneous vertical hydrostatic equilibrium in the inner regions of the disk (r 2 au), where they estimated the hydrostatic equilibrium timescale t dyn to be much shorter than the radiative cooling timescale t therm . Assuming axisymmetry, they constructed and numerically solved a 1D model for the evolution of the midplane temperature, considering absorption of stellar irradiation at an altitude z s above the midplane and radiative cooling. This analysis concluded that temperature perturbations travel inward as they get damped, which suggested that disks are stable under such circumstances. Shortly after, Dullemond (2000) studied the opposite limit of this model, i.e., t dyn t therm , therefore computing the midplane temperature from a thermal equilibrium condition and constructing a simplified equation for the acceleration of the pressure scale height due to its departure from equlibrium. A linear analysis of that model showed that inward-propagating modes grow exponentially, which supported the instability hypothesis.
A new take on the hydrostatic case (t dyn t therm ) was conducted by Watanabe & Lin (2008), who relaxed the assumption made by the previous authors that the z s /H ratio, where H is the pressure scale height, is constant throughout the disk. Instead, these authors obtained z s numerically by computing the optical depth τ s at the star's temperature along straight lines from the star and defining the irradiation surface as the location of space verifying τ s = 1. The main results of this model contradicted the conclusions of D' Alessio et al. (1999), as they showed a sustained spontaneous formation of inward-propagating temperature waves within 100 au from the central star. As argued by the authors and later supported by the linear analysis by (Wu & Lithwick 2021), this different behavior is directly caused by the variation of the z s /H ratio with temperature perturbations.
Later improvements to this 1D model were made by Ueda et al. (2021), Wu & Lithwick (2021), and Okuzumi et al. (2022). The first of these authors considered a quite similar model including external heating from the surrounding cloud, which suppressed the instability at large radii for low-enough dust content (dust-togas mass ratio of ∼ 10 −4 in their case). On the other hand, Wu & Lithwick (2021) computed the stellar irradiation heating flux via the Stefan-Boltzmann law with a temperature resulting from 2D Monte Carlo simulations carried out at each time step, and increased the thermal timescale used in their evolution equation to mimic the effect of vertical radiative diffusion. Using density values consistent with observed systems, they obtained temperature waves within 30 au from the star (∼ 100 au if dust settling was included), outside of which the instability was suppressed by radial radiation transport. Finally, Okuzumi et al. (2022) included the same timescale correction for vertical diffusion as in Wu & Lithwick (2021), and computed stellar heating by means of an approximate radiative transfer model based on an integration of the incident flux from the entire irradiation surface onto each element of the reprocessed radiation photosphere. Their results are overall similar to those obtained by the other authors, showing inward-moving temperature waves up to 30 au with sharper variations than in previous works due to the different radiative transfer scheme. As suggested by Ueda et al. (2021) and supported by the linear analysis of Wu & Lithwick (2021), the strength and growth rate of the instability in these models increase for higher z s /H, which is naturally verified for high dust contents and therefore high opacities. Moreover, high opacities reduce radial radiation transport, which otherwise damps the formation of temperature bumps.
Similar unstable behavior is observed in the 2D hydrostatic models of protoplanetary disks by Dullemond & Dominik (2004a); Min et al. (2009) ;Siebenmorgen & Heymann (2012); Ueda et al. (2019), with the difference that these are not time-dependent. Instead, these authors produced hydrostatic configurations by iteratively alternating the computation of temperature distributions via Monte Carlo radiative transfer and the solution of hydrostatic equilibrium equations. As shown in these works, this procedure does not necessarily converge, as the resulting solutions exhibit bumps in the scale height that get shifted towards the star after each iteration. This phenomenon becomes more important for increasing disk masses (Min et al. 2009) and is more prominent in T Tauri stars than in Herbig Ae/Be stars (Dullemond & Dominik 2004a;Siebenmorgen & Heymann 2012). Contrarily to the suggestion made by these authors that such temperature oscillations are related to the self-shadowing instability, Wu & Lithwick (2021) argue that these may also be influenced by an intrinsic numerical instability caused by the iterative procedure, and that disks that are unstable to self-shadowing in iterative models may be stable in reality. Similar shadowing features can also be seen in the Monte Carlo-based iterative method applied in Isella & Turner (2018), in which the 3D gas distribution is reconstructed at each step from 2D hydrodynamics simulations of the midplane density assuming vertical hydrostatic equilibrium.
Several effects disregarded in all of these works could potentially prevent the formation of the self-shadowing instability. The assumption of a vertically hydrostatic disk has been so far justified on timescales estimations, but none of these studies involved the solution of 3D (or 2D axisymmetric) hydrodynamics equations. Therefore, meridional advection of mass and energy, which could in principle kill off any large-scale perturbation via sound waves, has so far been disregarded. On the other hand, all models described above assume in one way or another instant information, meaning that the midplane temperature is at any point in time immediately affected by the temperature at the photosphere or even at distant radii. This is at odds with the fact that energy is transported via radiative transfer and advection over finite times. It is then unclear if surface temperature perturbations reach the midplane fast enough to cause a fast, substantial growth in the local scale height before they change at the surface. Thus, hydrostatic models may enhance self-shadowing effects by neglecting such delay. On this regard, a linear analysis made by Wu & Lithwick (2021) shows that vertical temperature waves due to surface perturbations are damped on their way to the midplane, although this study does not conclusively answer if that is enough to suppress the instability. It appears clear that answering if self-shadowing can lead to a physical instability requires a simultaneous treatment of stellar irradiation, radiative transfer of the disk's own thermal emission, and hydrodynamics.
In this work we address this issue by solving the equations of radiation hydrodynamics (Rad-HD) with stellar irradiation. Similar simulations in 3D and in optically thin disk regions (> 20 au), where selfshadowing is not expected, have been made by Flock et al. (2017b), obtaining VSI-induced turbulence. We assume axisymmetry and study the disk's stability in 2D simulations of the region between 0.4 and 100 au. In this way, we do not attempt to make a self-consistent description of the disk's inner rim, which we assume to be much closer to the central star. We note that 2D axisymmetric Rad-HD simulations of the inner rim have been made in Flock et al. (2016) including temperaturedependent viscosity, obtaining steady state configurations that remain stable for thousands of dynamical timescales. Further 3D studies of the inner rim employing nonideal radiation magnetohydrodynamics without viscosity (Flock et al. 2017a) obtained steady-state MRI-induced turbulence and vortex formation via the Rossby wave instability. In this work we neglect magnetic fields in order to focus on instability solely due to shadowing effects, and we include no viscosity to reduce damping factors as much as possible. We employ for this the two-moment Rad-HD scheme described in Melon Fuksman et al. (2021), which is able to maintain the direction of freely streaming radiation flows. This avoids the artificial spreading of radiation energy produced by simpler moment-based radiative transfer schemes such as flux-limited diffusion (Hayes & Norman 2003), which could in principle enhance the damping of thermal waves. This paper is organized as follows. In Section 2, we summarize the equations of our model and the employed numerical methods. In Section 3, we construct hydrostatic models of protoplanetary disks and characterize the development of features produced by selfshadowing. In Section 4 we run Rad-HD simulations to study the time evolution of our hydrostatic configurations and discuss the implications and limitations of our results in the context of the disk's stability to selfshadowing. In Section 5, we summarize the main results of our work. Additional implementation details and tests at increased resolution are included in the Appendix.

RADIATION HYDRODYNAMICS
We model the gas in the circumstellar disk and the absorption and emission of radiation by means of the nonrelativistic Rad-HD module (Melon Fuksman et al. 2021) implemented in the PLUTO code (Mignone et al. 2007). In the configuration used in this work, the Rad-HD equations read: where ρ, p, and v are the gas density, pressure and velocity, while E r , F r , and P r are respectively the radiation energy, flux, and pressure tensor. The gas energy is computed as where ρ is the gas internal energy density. This quantity is determined as a function of the pressure by applying the ideal gas equation of state where we take the specific heat ratio Γ to be constant. The radiation pressure tensor is computed as a function of E r and F r by means of the M1 closure (Levermore 1984), as where the Eddington tensor D ij is defined as where n = F r /||F r ||, f = ||F r ||/E r , and δ ij is the Kronecker delta. We have normalized the radiation flux by c in such a way that both E r and F r are measured in energy density units. The quantities c andĉ in Eq. (1) correspond respectively to the real and reduced values of the speed of light. These are chosen in such a way thatĉ/c < 1 in order to reduce the scale disparity between gas and radiation characteristic speeds, thus avoiding the employment of far too small time steps to achieve reasonable run times (see details in Melon Fuksman et al. 2021).
We assume axial symmetry around the disk polar axis and solve the Rad-HD equations in 2D using spherical coordinates (r, θ), where r is the spherical radius and θ is the polar angle. However, we compute the evolution of all three components of the vectorial fields v and F r , i.e., including the azimuthal components (φdirection). We define in these coordinates the gravitational potential of a star centered at r = 0 as where M s and G are respectively the stellar mass and the gravitational constant.
We include absorption of stellar irradiation by small dust particles modeling the star as a point source and computing the irradiation flux as where T s is the star temperature, R s is the star radius, B ν (T) = (2hν 3 /c 2 )/(e hν/k B T − 1) is the Planck radiative intensity at a temperature T and frequency ν, and h and k B are the Planck and Boltzmann constants, respectively. The frequency-dependent optical depth is computed along radial trajectories as where ρ d and κ ν are respectively the dust density and the frequency-dependent absorption opacity, while r min is the minimum radius of the computational domain. Since we do not self-consistently model the disk's inner rim, we include in this computation a factor τ 0 (r min , θ, ν) corresponding to the optical depth of the disk portion that is left out of the domain (see Appendix A). For both the absorption of stellar irradiation at optical and near-infrared wavelengths and the transfer of reprocessed infrared radiation, dust opacity is dominated by small (< 1 µm) grains. We assume that these are small enough to neglect vertical settling (we discuss this assumption in Section 4.3) and model the dust distribution of small grains by assuming a constant dust-to-gas mass ratio f dg in the entire domain, thereby computing the dust density as ρ d = f dg ρ. We compute the absorption coefficients employing the same tabulated values as in Krieger & Wolf (2020, which are obtained for spherical grains composed of 62.5% silicate and 37.5% graphite with a density of 2.5 g cm −3 and a size distribution dn ∼ a −3.5 da with radii in the range a ∈ [5, 250] nm. We use 132 κ ν values logarithmically sampled in the frequency range [ν min , ν max ] = [1.5 × 10 11 , 6 × 10 15 ] Hz, shown in Fig. 1. The radiation-matter interaction terms (G 0 , G) are computed by boosting into the laboratory frame (see Eq. (B8)) their comoving values, given bỹ where tildes indicate comoving quantities. In this expression a r = 4σ SB /c is the radiation constant, σ SB is the Stefan-Boltzmann constant, and κ P and κ R are, respectively, the Planck-and Rosseland-averaged absorption opacities. We thereby neglect scattering as an approxi- mation. The opacity coefficients are computed as functions of the gas temperature T, as where we use the same κ ν values as for stellar irradiation. The resulting values as a function of T are shown in Fig. 1. The gas temperature is computed according to the ideal gas law where µ and u are respectively the gas mean molecular weight and the atomic mass unit. We assume immediate temperature equilibrium between dust and gas, and therefore this expression holds as well for the temperature of dust grains. We also neglect dust sublimation and gas opacities, since the temperature in our simulations remains everywhere well below the 1000 − 1600 K required for silicate sublimation at our considered den-sities (Isella & Natta 2005) and the ∼ 1500 K required for the sublimation of carbon grains (Duschl et al. 1996). We integrate Eq. (1) by evolving matter and radiation fields separately in an operator-split way, using substepping for the latter. We employ implicit-explicit (IMEX) schemes for the explicit integration of radiation fluxes and the implicit integration of interaction terms, the last of which is carried out by means of the implicit method described in Appendix B.

CONSTRUCTION OF HYDROSTATIC DISK MODELS
We construct our models of passive irradiated protoplanetary disks by means of an iterative procedure based on Flock et al. (2013), which yields density and temperature distributions under the assumption of hydrostatic equilibrium. Starting with a guess for both distributions, we update the temperature via twomoment radiative transfer (Section 3.1), and recompute the density by solving hydrostatic equilibrium equations (Section 3.2). These steps are then repeated a number of times, using the previously obtained density configurations to compute new equilibrium temperatures. A convergence criterion is not applied to the resulting distributions since this process does not always lead to convergence, as we show in Section 3.3, where we analyze the obtained solutions with different parameters. In Section 3.4 we make an analysis of the disk's thermal relaxation timescale which is useful to later interpret the results of Section 4.

Temperature update
We update the temperature distribution by modifying the Rad-HD module in order to describe radiative transfer in a stationary gas distribution, thus neglecting changes in the gas density and momentum. This results in the following system of equations: together with the equations of constant mass and momentum densities. Equation (13) is solved by only applying the radiation substeps in the Strang-split method implemented in the code, and ignoring the hydro step (see Melon Fuksman et al. 2021). The implicit integration scheme in Appendix B is also modified in such a way that the momentum density is not updated. We assume assume a vertically isothermal disk in local thermal equilibrium (LTE) as initial condition for the first iteration and integrate Eq. (13) for a total time t iter per iteration. After updating the density field for a new run, the fields ρ , E r , and F r are initialized to their final values at the previous iteration.
We assume for the computation of the irradiation flux the same stellar parameters as e.g. Watanabe & Lin (2008); Ueda et al. (2021), corresponding to a T Tauri star with T s = 4000 K, Ms = M , and Rs = 2.086R , in such a way that the total luminosity is L . We take Γ = 1.41, which is typical for solar composition (Decampli et al. 1978). We define the computational domain as (r, θ) ∈ [0.4, 100] au × [π/2 − 0.3, π/2 + 0.3], and assume r 0 = 10 R s for the computation of τ 0 (Appendix A). Our discretization uses N r × N θ = 512 × 200 cells logarithmically and uniformly distributed in the rand θ-directions, respectively, in such a way that the cell aspect ratio is always ∆r/r∆θ ≈ 3.6. We employ zero-gradient boundary conditions for the radiation fields at the radial boundaries, and prevent radiation inflow by reflecting the radiation flux at the ghost cells if it is directed towards the domain. The same is done with the flux at the θ-boundaries, while E r is set to a small value given by E r = a R (10 K) 4 at the θ-ghost cells to guarantee that radiation escapes the domain. We use in every case second-order piecewise linear reconstruction with the modified Van Leer-like limiter by Mignone (2014), and integrate Eq. (13) by means of the IMEX1 scheme in Melon Fuksman & Mignone (2019) which extends the second-order total variation diminishing (TVD) Runge-Kutta method by Gottlieb & Shu (1996). We use the HLLC Riemann solver by Melon Fuksman & Mignone (2019) to compute the intercell radiation fluxes.

Hydrostatic equilibrium
Given each temperature distribution, we obtain radial and vertical hydrostatic equilibrium conditions by setting all time derivatives to zero in Eq. (1), as well as v r , v θ . Additionally, we neglect radiative momentum exchange by setting G to zero in the momentum equation (second in Eq. (1)), which is justified given that the components of G in all of our simulations are everywhere smaller than those of ∇Φ, ∇p, and ∇ · (ρvv) by several orders of magnitude. Under these assumptions, the (r, θ)-components of the momentum equation become ∂p ∂r These equations can be rewritten in terms of ρ, v φ ,and The only unknowns in these equations are ρ and v φ , which can be computed by defining ρ for all r at a given θ = θ 0 . The second equation in Eq. (15) can then be integrated in the θ-direction starting from θ 0 , computing v φ at every step via the first equation. We use for this integration a second-order Runge Kutta method (midpoint rule). We assume the following functional form for ρ(r, θ) at θ 0 = π/2, valid for thin, vertically isothermal disks: where Σ 0 (R) is the initial vertically integrated density, which is kept constant among iterations, while H(R) = c s (R)/Ω K is the midplane pressure scale height, where R = r sin θ is the cylindrical radius. The latter is defined in terms of the midplane isothermal sound speed c s (R) = T(r, π/2) and the Keplerian angular veloc- ity Ω K = G M s /r 3 . Even though the disk is not everywhere vertically isothermal close to the midplane (later shown in Fig. 6), where most of the mass is located, the surface density computed by vertically integrating Eq. (16) stays close to Σ 0 (R) within 1% in all iterations. We assume Σ 0 (R) ∝ R −1 as in Wu & Lithwick (2021), which is shallower than in the minimum-mass solar nebula model by Hayashi (1981) considered in other works on this topic (e.g., Watanabe & Lin 2008) but consistent with observed disks around T Tauri stars. We take Σ 0 (R) = 200 g cm −2 (R/au) −1 , which is similar to the distribution estimated e.g. in van Boekel et al. (2017) and an order of magnitude below that in Watanabe & Lin (2008); Ueda et al. (2021) at 1 au. As in Wu & Lithwick (2021), we define the initial temperature distribution in such a way that the initial aspect ratio is H/R = 0.0248(R/au) 0.275 taking µ = 2.3.
For the integration of Eq. (15), we impose a density floor of ρ min = 10 −27 g cm −3 to avoid errors in smallnumber operations when integrating Eq. (15) over several orders of magnitude in density. Since this is only applied in optically thin regions of space at the disk atmosphere, it does not affect the computed temperatures, as these are independent from the density for τ 1.

Resulting models
We obtained a series of hydrostatic configurations characterized by the iteration time, the dust-to-gas mass ratio, and the reduced speed of light. As summarized in Table 1, simulations are labeled following the nomenclature dgN dg tN t , where N dg and N t are defined as f dg = 10 −N dg and t iter = N t yr, respectively. We considered f dg = 10 −4 , 10 −3 , and 10 −2 . A vertically integrated mass ratio of 10 −2 spanning all grain sizes is typical for protoplanetary disks, while the small dust grains can typically contribute between 1 and 10% of the total dust mass (see, e.g., Birnstiel et al. 2012). We then regard f dg = 10 −2 as a case of extremely high dust content, which we consider in order to study the disk stability in a regime that favors self-shadowing. We usê c/c = 10 −4 in most cases, and otherwise indicate this value with an additional _cN c label (ĉ/c = 10 −N c ). Figure 2 shows the resulting temperature distributions for all employed f dg and t iter . Results are shown after 35 iterations for t iter ≥ 1 yr and after 400 for t iter = 0.1 yr. In the latter case, we do this to give the system enough time to depart from the initial condition and reach a state that repeats itself after a certain number of iterations. We focus for now on the longest t iter runs and consider the rest of them in Section (3.4). In such cases, the solutions for f dg = 10 −2 and 10 −3 do not converge, and instead exhibit peaks or "bumps" in the midplane temperature which are more prominent for higher dust content. For f dg = 10 −4 , the solutions converge to a constant distribution after 12 iter-Figure 2. 2D temperature profiles in log scale for all hydrostatic runs withĉ = 10 −4 . Distributions are shown after 35 iterations for t iter = 100, 10, and 1 yr (top three rows) and 400 iterations for t iter = 0.1 (bottom row). Videos made with snapshots of these runs at different iterations for t iter = 1 and 100 yr can be found in https://youtu.be/RT8IFe8W13g. Fig. 3, temperature bumps in the t iter = 100 yr solutions are continuously formed and shifted towards the star among iterations, reaching the same configuration every 10 iterations. This saturated behavior is reached after 10 iterations and maintained thereafter without further increasing the maximum amplitude of the temperature peaks. As mentioned in the introduction, the radial shifting is caused by the heating and expansion of the bump's inner side, which is exposed to the star, and the cooling and contraction of the shadowed regions. This asymmetric heating can be seen in Fig. 4, where density contours are shown together with power irradiation maps for t iter = 100 yr. For f dg = 10 −4 , the entire disk surface is illuminated by the star, and there are no hints of such shadowing after convergence.

ations. As shown in
As in Wu & Lithwick (2021) and Okuzumi et al. (2022), we do not observe a formation of scale height perturbations at a few tenths of au. The reason for this is that the disk is optically thin in such regions, and any local temperature perturbation is immediately damped by radial radiation transport. This can be noted by taking a look at the distribution of the reduced flux f = ||F r ||/E r , which is much smaller than 1 in the diffusion regime, particularly in the shadowed regions, and tends to 1 in the free-streaming regime. We show this quantity and the flux direction in Fig. (5). For small radii ( 10 au), the radiation flux transitions between the diffusion regime in the disk interior and free streaming in the optically thin atmosphere. In the transition layer between both regimes, the flux emanates predominantly from the overheated regions, and escapes the domain through the θ-boundary. At larger radii, radiation freely streams in the radial direction at the midplane. For f dg = 10 −2 and 10 −3 , temperature bumps only start forming in the diffusion-dominated optically thick region of the disk. The extent of this region decreases for smaller f dg , ending at approximately 30, 10, and 2 au for f dg = 10 −2 , 10 −3 , and 10 −4 respectively. In the latter case, the disk is not optically 10 0 10 1 10 2 r (au) 10 1 10 2 T (K) dg2t100 10 0 10 1 10 2 r (au) dg3t100 10 0 10 1 10 2 r (au)  thick enough for self-shadowed temperature peaks to form. Moreover, the radial radiation flux at the midplane increases for smaller f dg , with a minimum f scaling roughly as 3 × 10 −7 / f dg .
The boundary between the optically thick and optically thin regions is approximately delimited by the surfaces of unity vertical optical depth (τ = 1) computed integrating κ P (T)ρ d dz from the upper and lower boundaries. The location of these surfaces is shown in Fig. 7 for all simulations with t iter = 100 yr, together with the T and f distributions. The scale height z = H(R) and the irradiation τ = 1 surfaces are shown on the same figure, where the latter is computed for each frequency using Eq. (9). It can be seen that the ratio z s /H between the approximate height of the irra-diation layer and H varies at the temperature bumps, which is the condition for instability derived in Wu & Lithwick (2021). This ratio increases with f dg , which is argued in the same work to favor the instability.
We note in these figures that the obtained f values decrease near the irradiation surfaces due to the local increase of E r . Above these surfaces, radiation is transported out of the domain in the vertical direction. Below, part of the energy heats up the optically thick disk regions, and the rest is transported radially through the optically thin atmosphere. In the optically thin regions at large radii, this radial flux converges with the flux towards the midplane caused by irradiation heating. This convergence explains why the T(θ) profiles depart in such regions from the approximately vertically isothermal distribution verified in the optically thick regions, as shown in Fig. 6. This phenomenon is a direct consequence of solving radiative transport using a fluid approach, since convergent fluxes do not interact in a real scenario. More refined radiative transport schemes, such as methods relying on an angular discretization of the radiation specific intensity (see, e.g., Davis et al. 2012;Jiang 2021), should be able to yield more accurate radiation fluxes at large radii. However, this effect has a small impact on the radiation flux and the disk temperature, and most importantly, it does not alter the conclusions of this paper.   Figure 2 shows that the shadowing-induced temperature perturbations are not vertically uniform for sufficiently short t iter . The temperature distributions shown for runs dg2t10 and dg2t1 evidence that, at the end of each iteration, not enough time has passed to adjust the midplane temperature to a new density configuration. This occurs because most of the stellar irradiation is ab-sorbed far from the midplane at the irradiation layers. After each density update, the regions close to those surfaces are updated first, and the temperature is adjusted to the new configuration via diffusion from the outer layers to the midplane. As a consequence, if t iter is short enough, there is a middle layer close to the midplane where the temperature has not yet adjusted. Naturally, the thickness of that middle layer grows when t iter is reduced, since that means that there is less time for temperature perturbations to diffuse into the disk. We can also see that the temperature peaks at the midplane are located at larger radii with respect to those at the surface, as they trace the temperature maxima at the previous iterations. In these cases, the temperature perturbations at the midplane are smaller than at the surface, as can be seen in Fig. 8, where we show the temperature in run dg2t1 as a function of radius at two different altitudes, θ − π/2 = 0 and 0.07, indicated on Fig. 9. This occurs because thermal perturbations do not reach the midplane via diffusion before overheated regions become shadowed regions and vice versa, which averages out the cooling and heating produced at the midplane. This effect, combined with the growth of the middle layer as t iter is reduced, causes temperature perturbations to almost entirely disappear for short enough t iter , as shown in Fig. 2 for dg2t0.1 and dg3t1. For t iter = 0.1 yr, the maximum relative temperature changes between iterations are below 10 −4 in all cases after 400 iterations. It is likely that the reason why no outer perturbed layers remain for short enough t iter is that these have shrunk enough to be entirely confined in optically thin regions, where temperature perturbations are damped out by radiative transport.

Thermal relaxation time
An important issue that must be addressed is if this time-dependent diffusion is affected by the employment of a reduced speed of light. Fig. 9 shows that this is not the case for our chosenĉ values. We show in this figure the temperature distributions for dg2t1, dg2t1_c3, and dg2t1_c2 after 35 iterations. We only observe slight differences in the optically thin region in the features caused by convergent radiation fluxes (Section 3.3) forĉ/c = 10 −4 . In the optically thick region, the vertical extent and temperature distribution of the middle layer that is not efficiently heated through vertical diffusion are the same in all three cases. The reason for this is that the value ofĉ controls the adjustment time of the radiation fields to each new density distribution, but disappears from the equations once these reach a quasistatic configuration, as can be seen by ne- Table 2. Parameters of shown Rad-HD simulations: dust-togas mass ratio f dg andĉ/c ratio between the reduced and real values of the speed of light.

Label
f dgĉ /c rhdg2_c3 10 −2 10 −3 rhdg2_c4 10 −2 10 −4 rhdg3_c3 10 −3 10 −3 rhdg3_c4 10 −3 10 −4 rhdg4_c3 10 −4 10 −3 rhdg4_c4 10 −4 10 −4 glecting the terms 1 c ∂ t E r and 1 c ∂ t F r in Eq. (13). After a fast readjustment of the radiation fields, the cooling time of the disk is actually regulated by the c G 0 term on the right-hand side of Eq. (13), which depends on the real value of c. We can therefore rely on the results of this section to estimate the characteristic cooling time t cool in which surface temperature perturbations reach the midplane via diffusion. From Fig. 2, we obtain t cool ∼ 100 yr for f dg = 10 −2 and ∼ 10 yr for f dg = 10 −3 .
The just described effect is not seen in other works on this topic because of the assumption of instant information mentioned in the introduction. Even studies that replace a detailed modeling of time-dependent vertical diffusion with an increase of the cooling timescale, such as that in Okuzumi et al. (2022), do not account for the delay between the surface and midplane temperature distributions. On the other hand, iterative methods relying on Monte Carlo simulations cannot exhibit features due to vertical diffusion in finite time since they correspond to the t iter → ∞ limit, and can only be compared to our t iter = 100 yr runs. In other words, our longest-t iter runs mimic the result of assuming instantaneous thermal relaxation.
The results of this section suggest that a selfshadowing instability such as that described in literature can only be possible if density and temperature perturbations at the disk surface propagate towards the midplane faster than they change at the surface via either cooling or dynamical processes. Identifying which of these competing effects prevails requires a time-dependent analysis, such as the one we conduct in the next section.

Rad-HD simulations
Until now we have verified that assuming hydrostatic equilibrium and instantaneous vertical diffusion naturally leads to the spontaneous formation of temperature bumps for high enough dust content. We now focus on whether such temperature perturbations survive and keep forming if gas advection is switched back on. With this purpose, we ran Rad-HD simulations starting from the longest-t iter solutions obtained in the previous section. Given the suggestion by Wu & Lithwick (2021) that iterative procedures like the one used to construct such solutions might enhance the formation of temperature perturbations in comparison with time-dependent simulations, we can regard these initial conditions as the most favorable case for the development of a self-shadowing instability.
We summarize the list of our time-dependent Rad-HD runs in Table 2. Simulations are characterized by the employed values of f dg andĉ/c and labeled as rhdgN dg _cN c , where f dg = 10 −N dg andĉ/c = 10 −N c . We define the domain as (r, θ) ∈ [0.41, 95] au × [π/2 − 0.27, π/2 + 0.27] and use a resolution of N r × N θ = 512 × 200, which results in an aspect ratio of ∆r/r∆θ ≈ 3.25. This corresponds to 2 − 8 cells in the r-direction and 6 − 30 cells in the θ-direction per scale height H throughout the domain. This resolution is high enough to reproduce the thermal evolution of the selfshadowing features, as demonstrated by the resolution study shown in Appendix C.
The domain of these simulations is slightly smaller than in the hydrostatic runs, in such a way that we can simultaneously exclude the zones where the density floor has been applied, where artificial waves are otherwise generated, and impose Dirichlet boundary conditions for all fields by fixing previously calculated values in regions that are now covered by ghost cells. For each f dg , we use as initial condition the hydrostatic solutions with t iter = 100 yr obtained after 35 iterations. In each case, we fix at the boundaries the corresponding hydrostatic solutions with t iter = 0.1 yr. We verified that choosing a different boundary condition, for instance using instead the largest-t iter hydrostatic solutions at the ghost cells, does not affect the main results of this section, and simply introduces artificial features close to the boundaries which are minimized if the shortest- t iter solutions are employed. Our results also hold if the energy is instead set as E r = a R (10 K) 4 at the ghost cells. To avoid artificial waves continuously forming at the left boundary and at the domain corners, where the density is several orders of magnitude lower than in the rest of the domain, we impose a damping layer in the region {r < 0.5 au} ∪ {|θ − π/2| > 0.12 + 0.13 r/au} (see Fig. 10), in which the irradiation flux is not updated from its boundary condition distribution and all other fields are exponentially damped to that solution with a characteristic time of 0.1/Ω(R). We solve Eqs. (1) using third-order Runge-Kutta time integration for the gas fields and its corresponding IMEX version (Melon Fuksman & Mignone 2019) for the radiation fields. We compute fluxes for hydrodynamics and radiation fields using the HLLC solvers by Toro (2009) andMignone (2019), respectively. Spatial reconstruction is performed by means of the thirdorder WENO method by Yamaleev & Carpenter (2009) modified as in Mignone (2014) for spherical grids. We compute the evolution of the system in each case up to at least 500 yr, using a Courant factor of 0.3 for the time step computation. Fig. 10 shows a series of time snapshots in all of our simulations withĉ/c = 10 −3 showing the temperature distribution at different points in the disk's evolution. For f dg = 10 −4 , the temperature remains at all times approximately equal to its initial state, since the employed initial distribution was nearly convergent. Some differences of around 10% with respect to the initial temperature can be seen near the outer radial boundary in Fig. 11, in which we show for the same snapshots the ratio between current and initial temperature values. This behavior is expected, since the initial state is not exactly at equilibrium. We also observe in this run a transient formation of sound waves at the boundaries due to the initial relaxation of the system, which we further describe later in this section.
Much more relevant is the behavior in rhdg2_c3 and rhdg3_c3, which have shadowed regions at t = 0. As soon as simulations are started, even before sound waves are produced at the boundary, we observe an immediate thermal relaxation of the outer, optically thin disk layers. Fig. 11 shows that overheated regions start to cool down and shadowed regions heat up, which blurs the initial temperature perturbations at the disk surface. This causes an immediate reconfiguration of the radiation flux emitted at the overheated regions, which redirects itself towards the previously shadowed ones. This can be seen in Fig. 12, where we show the evolution of the reduced flux and the flux direction. The same figure shows that the vertical temperature gradient produced by the reconfiguration of the outer layers leads to diffusion towards the midplane in shadowed regions and away from it in overheated ones. At approximately 0.1 yr, outward-traveling sound waves start being emitted at the inner radial boundary due to the system's initial relaxation, becoming shocks at the disk surface (see Fig. 13). These waves carry more energy as in the f dg = 10 −4 runs and are even visible in the temperature distributions, since solutions are farther apart from the final equilibrium configuration as in that case.
No signs of propagation of the temperature bumps towards the star are observed either before or after the formation of the relaxation-induced sound waves. Instead, outward-traveling waves keep forming as the thermally relaxed regions grow towards the midplane. The relaxed layers reach the midplane faster for larger radii (see Fig. 11), due to the lower vertical optical depth an consequently shorter diffusion times away from the star. All radial temperature extrema are eventually erased after approximately 100 and 10 yr for runs rhdg2_c3 and rhdg3_c3 respectively (see Fig. 10). These times coincide in each case with the relaxation timescale t cool estimated in Section 3.4 for surface temperature perturbations reaching the midplane. After 500 yr in rhdg2_c3 and 100 yr in rhdg3_c3, the temperature distributions look like the hydrostatic solutions obtained for t iter = 0.1 yr and no hints of formation of new temperature maxima can be seen.
We now take a closer look at the gas motion during the system's initial relaxation, focusing both on the damping of the initial temperature extrema and on the disk's hydrodynamical stability. We highlight the regions of space where the kinetic energy is concentrated by computing its component due to meridional motion, given by k = 1 2 ρ(v 2 r + v 2 θ ). Snapshots of this quantity are shown in Fig. 13, where we can see that the thermal relaxation seen in Fig. 11 is accompanied by a reconfiguration of the gas structure via advection in Figure 13. Same snapshots as Fig. 11, showing the time evolution of log 10 k , where k = 1 2 ρ(v 2 r + v 2 θ ).
the entire disk before this is crossed by the relaxation sound waves. We also see that these waves stop being emitted as soon as the disk reaches its final equilibrium configuration. After t ∼ t cool , the remaining waves are slowly dispersed as they travel outwards. The additional boundary-and damping-induced features occurring close to the inner radial boundary at larger times are of no relevance for the conclusions of this paper, as they do not affect the overall disk structure. During the disk relaxation, the gas motion shows velocity fluctuations at the disk atmosphere. Some of these are caused by the evolution of the disk towards a new equilibrium configuration, but these could also be signposts for an evolving hydrodynamical instability, as it occurs at much higher resolution, e.g., in the 3D studies by Flock et al. (2017b) of the VSI in optically thin regions (> 20 au) of irradiated disks. We have explored this possibility by first verifying that the disk is always radially Rayleigh-stable in the entire domain, since the squared epicyclic frequency κ 2 R for radial perturbations, computed as κ 2 R = 1 R 3 ∂ R (R 4 Ω 2 ), is always positive. However, due to its nontrivial radial and vertical temperature stratification, the disk can still be COS-and VSI-unstable at various levels in the entire domain (Klahr et al. 2022, submitted). Assuming ei-ther instant relaxation at the atmosphere or that radiative diffusion defines an optimally cooling wavenumber (Latter & Papaloizou 2018), we can get an idea of the order of magnitude of the local expected VSI growth rate Γ VSI using the expression for the fastestgrowing local VSI mode for instant relaxation given by Γ VSI = |κ 2 z | 2Ω (Klahr et al. 2022, submitted), where the squared vertical epicyclic frequency κ 2 z can be obtained as κ 2 z = 1 R 3 ∂ z (R 4 Ω 2 ). We show in Fig. 14 snapshots of Γ VSI normalized by the local Ω. In rhdg2_c3 and rhdg3_c3, the transient stratification caused by the crossing waves leads to high growth rates at the atmosphere (Γ VSI /Ω ∼ 5), which makes it possible that the observed velocity fluctuations are linked to VSI growth. In rhdg4_c3, the growth rates are smaller and no strong fluctuations are observed. In all cases, regions of large growth rates vanish at large times, and only boundaryand damping-induced features remain. The resulting stability is most likely caused by the fact that our employed radial resolution of 2 − 8 cells per scale height does not suffice to obtain spontaneous VSI growth arising from a hydrostatic stratification within our simulated run times (see, e.g., the resolution study in Flores-Rivera et al. 2020). Especially since a diffusive cool- ing process occurs in the optically thicker parts, the necessary small wavelengths that could still show significant growth are not resolved (Latter & Papaloizou 2018). Higher-resolution studies of the VSI employing the same techniques as in this work, currently in preparation, will be the matter of a next article.
A conclusion regarding the disk's cooling can be drawn from the fact that the vertical extent of the thermally relaxed layers in Figs. 10 and 11 does not trace the relaxation-induced waves, which suggests that it is unlikely that these and the mentioned velocity fluctuations play an important role in the vertical transport of energy. Such process seems to be instead entirely dominated by the joint effect of radiative cooling and gas advection, which starts smoothing out temperature extrema even before waves are formed. It is possible, however, that radial transport of energy due to either dissipation of kinetic energy or advection is enhanced by the radially travelling waves.

Reduced speed of light approximation
An issue that must be carefully treated is if the disk evolution is affected by our choice of ofĉ. A criterion for the validity of the reduced speed of light approximation was derived in Skinner & Ostriker (2013), who suggested thatĉ must always be larger than the maximum fluid velocity v max and that radiation diffusion timescales must always be smaller than typical dynamical timescales. This condition can be summarized aŝ where τ L is the optical depth along a characteristic length of the system. If we take v max ∼ c s , i.e., the gas sound speed, we get that forĉ/c = 10 −3 the ratioĉ/c s is always of the order of 10 2 − 10 3 , so the first condition is fulfilled. However, it is harder to define a correct characteristic length for a computation of τ L that is relevant for this particular problem, even more considering that different radiative transport regimes dominate in different disk regions. Nevertheless, Eq. (17) is an approximate relation, and the validity of this approach can only be safely assessed through careful testing of  the problem at hand (see, e.g., Melon Fuksman et al. 2021). We therefore follow the approach of Section 3.4 and directly compare results using different values ofĉ. Fig. 15 shows a time series of the midplane temperature as a function of radius for all simulations with f dg = 10 −2 and 10 −3 , computed withĉ/c = 10 −3 and 10 −4 . For each f dg , the functional form of these profiles is roughly the same at all times independently ofĉ, with slight differences in the perturbations caused by the sound waves. At t ∼ t cool for allĉ, the midplane temperature reaches a similar power-law functional dependence with radius as that used at the beginning of the iterative procedure in Section 3 (T ∝ r −0.45 ), with some deviations at the radial boundaries. We have also compared the full 2D temperature distributions at different times for different values ofĉ, shown for f dg = 10 −2 in Fig. 16. These profiles look approximately the same independently ofĉ, again with slight differences induced by the waves. Both simulations have at all times thermally relaxed surface layers of the same vertical extent, and reach their final configuration at the same time. As in Section 3.4, this happens because, provided the adjustment of the radiation fields is rapid enough, gas cooling only depends on the c G 0 term in Eq. (1). Lastly, we have made a space-averaged comparison of the f dg = 10 −2 and 10 −3 runs for all times by computing the L 1 norm of T − T ref , where, for each f dg , T ref is the temperature in the t = 500 yr snapshot of the corresponding run withĉ/c = 10 −3 . We show in Fig. 17 the values of L 1 (T(t) − T ref ) as a function of time normalized by L 1 (T(0) − T ref ). The obtained curves are almost identical for bothĉ values in all cases. We conclude that the observed suppression of temperature extrema in all cases is not affected by our choice ofĉ.

Discussion
We saw in Section 3 that a self-shadowing instability would require surface temperature perturbations to reach the midplane faster than the gas adjusts to ther-mal changes. On the contrary, the results of this section show that temperature extrema are smoothed out at the surface long before the temperature is altered at the midplane, and therefore an instability producing temperature perturbations throughout the full vertical extent of the disk as in dg2t100 and dg3t100 can never occur. Following this argument, the only remaining possibility would be that such an instability occurred in outer disk layers that are thin enough to transmit temperature perturbations in the vertical direction before the gas readjusts its structure, causing local increases in the scale height similar to those seen, e.g., in the hydrostatic run dg2t1. However, this process is likely disallowed by radial radiative transfer, as such layers would simultaneously have to be optically thin enough to allow for fast vertical diffusion and optically thick enough to prevent radiative transport from damping temperature extrema in the radial direction. As discussed in Section 3.4, this is supported by the fact that no such outer layers are seen for short enough t iter . Therefore, our results support the hypothesis that circumstellar disks are stable to self-shadowing due to the fact that surface temperature perturbations are damped by radiative cooling and gas advection before they affect lower, optically thick regions via diffusion. This also justifies the use of simplified power-law prescriptions for the midplane temperature in 1D disk models. Furthermore, our results imply that non-convergent hydrostatic disk models obtained through iterative procedures may not reflect the actual disk structure if they underestimate the vertical diffusion time, as it is the case if the temperature is computed by means of Monte Carlo methods.
It remains to be seen if self-shadowing effects can occur if self-sustained scale height perturbations are induced by external factors, such as planetary perturbers, large-scale structures formed by hydrodynamical instabilities, or complex dust distributions inducing local changes in the opacity. In particular, opacity transitions at different ice lines can cause sharp, sustained temperature variations that could potentially evolve through self-shadowing. We intend to study this process in future investigations. Future studies will also have to account for the damping of long-standing temperature perturbations by turbulent mixing and viscosity, which we have not studied in this work. On the other hand, irradiation-induced structures unrelated to self-shadowing can still occur in disks around variable sources such as EX Lupi and FU Orionis stars, in which sudden outbursts can excite outward-traveling sound and gravity waves (Schneider et al. 2018).
Our assumption of a constant f dg does not account for vertical settling and depletion of dust in the disk atmosphere, which could in principle reduce the height of the different τ = 1 surfaces shown in Fig 7. Even though the Stokes number remains smaller than 1 for all considered particle sizes well above the τ = 1 surfaces for irradiation, the actual dust distribution in the atmosphere depends on the balance between turbulent stirring and vertical settling. The dust distribution away from the midplane is better captured by hydrodynamical simulations than by simple prescriptions relying on a parameterization of the turbulence strength (see,e.g., Dubrulle et al. 1995;Dullemond & Dominik 2004b), since it depends on the particular source of turbulence. An example of this is the anisotropic turbulence produced by the VSI, whose vertically elongated modes significantly increase the vertical stirring of dust with respect to models of homogeneous isotropic turbulence (Stoll & Kley 2016;Stoll et al. 2017). We note that these authors obtain a vertical Gaussian dust distribution of micron-sized particles with the same scale height as the gas in the entirety of a domain reaching z = 5H at 5 au, which supports the validity of our prescription. On the other hand, dust sedimentation of small grains in outer regions can even be affected by non-ideal magnetohydrodynamical effects and magnetized winds (Riols & Lesur 2018;Hutchison & Clarke 2021;Booth & Clarke 2021). In principle, we do not expect dust settling effects to be relevant for our work as long as they occur in optically thin regions where radiative transport prevents any shadowing from occurring. Future disk models could test this hypothesis by studying the stability of disks in which the opacity is computed as a function of a dynamically evolving small dust distribution. We have also assumed instantaneous thermal equilibrium of gas and dust particles, which is not verified at high altitudes as collisions between dust and gas particles become less frequent for smaller densities (see, e.g. Malygin et al. 2017;Pfeil & Klahr 2021). We do not expect this effect to alter our results, since at most it could increase the time it takes to form temperature perturbations in such regions.

CONCLUSIONS
We conducted the first 2D radiation-hydrodynamical simulations exploring the so-called self-shadowing instability in protoplanetary disks, using the twomoment Rad-HD module by Melon Fuksman et al. (2021) integrated in the finite-volume PLUTO code. Our model includes frequency-dependent stellar irradiation, and employs typical star and disk parameters for a T Tauri system. All opacities were computed assum-ing a uniform dust-to-gas mass ratio f dg for sub-micron dust grains. We explored the amplification of thermal perturbations in hydrostatic models for different f dg values and their thermal relaxation when advection is switched back on. We summarize our main conclusions as follows: • Our iterative procedure for the construction of hydrostatic configurations does not converge for high enough f dg and long enough time t iter used for the thermal evolution between consecutive iterations. In such cases, inward-traveling vertically isothermal temperature perturbations are formed in the entire optically thick region of the disk due to self-shadowing. These results are consistent with other works employing similar iterative methods.
• When t iter is shorter than the vertical diffusion time, the temperature perturbations produced at the irradiation surface do not reach the midplane before a new iteration is started. This effect cannot be captured by models that assume instantaneous radiative diffusion. As a consequence, a middle layer is formed in which temperature variations are smaller than at the surface. The thickness of the middle layer increases for decreasing t iter and scale height perturbations are not formed for small enough t iter , which shows that a self-shadowing instability is only possible if thermal perturbations reach the optically thick disk region faster than the disk evolves at the surface layers.
• We ran Rad-HD simulations taking as initial conditions the hydrostatic solutions constructed with the highest t iter values once temperature bumps are fully formed. As soon as simulations start, these perturbations begin to be damped in the outer disk layers through radiative transport and gas advection. The thermally relaxed layers grow towards the midplane until a vertically isothermal hydrostatic configuration is reached after a vertical diffusion time of about 100 yr for f dg = 10 −2 and 10 yr for f dg = 10 −3 . Radial temperature maxima are entirely smoothed out close to the surface before the thermally relaxed layers reach the midplane, which shows that surface relaxation is much faster than vertical diffusion.
• Outward-traveling waves are formed during the system relaxation without substantially intervening in the vertical transport of energy. During this transient phase the gas shows velocity fluctuations at the atmosphere, which are possibly linked to the triggering of VSI by the density and pressure stratification produced by the waves. However, no VSI is seen once the system relaxes, due to the low employed resolution in the radial direction.
• None of our results depend on our chosen value of the reduced speed of lightĉ, since after the radiation field quickly reaches a stationary configuration, gas cooling is entirely regulated by the source term in the gas energy equation, which uses the physical value of c. We verified this by comparing the results of the same Rad-HD simulations run with differentĉ values.
Our findings suggest that circumstellar disks are stable to self-shadowing due to the damping of temperature perturbations at the surface through radiative cooling and advection before these affect optically thick disk regions via diffusion. This also justifies the use of disk models where the midplane temperature scales with the distance to the star following a simple power law. Furthermore, we showed that iterative methods used to obtain hydrostatic models of irradiated disks may induce a formation of scale height perturbations if they underestimate the vertical diffusion time, as it is the case for non-convergent Monte Carlo-based models used in previous works. Further work is needed to test our results if the assumption of a constant dust-to-gas mass ratio is replaced by a more realistic model for the evolution of dust in a turbulent disk.

APPENDIX A. OPTICAL DEPTH COMPUTATION
In the hydrostatic runs in Section 3, we approximate τ 0 in Eq. (9) by assuming that the part of the disk left outside of the domain occupies a radial extent r ∈ [r 0 , r min ]. We assume in that region a midplane temperature T ∝ R q and a midplane density ρ(r, π/2) ∝ R p , where p and q are the same exponents used to compute the density and temperature profiles at the first iteration. The gas density distribution for a vertically isothermal disk with these parameters is which in spherical coordinates becomes where H = H 0 R (q+3)/2 and all lengths are normalized by 1 au. Evaluating this expression at (r min , θ) and (r min , π/2), we obtain: ρ(r min , θ) = ρ 0 r p min sin p θ exp sin θ − 1 H 2 0 r q+1 min sin q+1 θ ρ(r min , π/2) = ρ 0 r p min . (A3) Using these equations, the density can be rewritten using only its value at r min : We use this expression to estimate the density in the entire region r ∈ [r 0 , r min ], and compute τ 0 for all frequencies as where ρ d = f dg ρ. We compute this integral analytically by defining a radial grid in the region r ∈ [r 0 , r min ]. On the other hand, in the Rad-HD simulations, τ 0 is obtained by adding its already computed value in the hydrostatic runs to the optical depth of the new left-out portion of the disk when the domain is reduced. The resulting τ 0 (r min , θ, ν) is assumed to remain constant in time, and therefore this computation is only carried out at the beginning of each run. Eqs. (A4) and (A5) yield a better prescription for τ 0 than, e.g., τ 0 (r min , θ, ν) = κ ν ρ(r min , θ)(r min − r 0 ) (Flock et al. 2013), since the resulting τ 0 is exact for vertically isothermal disks. However, real disks have a sharp transition to the atmosphere's temperature at a few scale heights above the midplane (Figs. 6, 7), where the τ = 1 surfaces for irradiation are located. We tested the accuracy of our prescription in such cases by using Eq. (A4) to compute τ 0 in a chosen radial range for a density distribution with a vertical temperature gradient. We used as test distribution a hydrostatic configuration obtained as in Section 3 with f dg = 10 −2 and r min = 0.1 au, i.e., with a minimum radius close to to r 0 . We computed the optical depth in this solution between 0.1 and 0.4 au, which would correspond to τ 0 in a simulation with r min = 0.4 au. We also performed the same calculation taking ρ from Eq. (A4) with r min = 0.4 au. Unlike for the computation of F Irr , we used in both cases κ ν = κ P (T s ). This allows us to estimate the location of the irradiation surface by equating τ 0 = 1 and to evaluate the error of the prescription, which only depends on the integral of ρ d . We show a comparison of both τ 0 (θ) curves and their relative difference in Fig. 18. We see that our prescription leads to relative τ 0 differences close to 1 at the τ = 1 surfaces. However, the value of τ 0 decreases so sharply with altitude that this barely changes the location of the τ = 1 surface an amount ∆θ ≈ 0.003 coinciding with our grid resolution, since both τ 0 curves are almost identical. This justifies the computation of τ 0 using Eq. (A4) even for disks with a vertical temperature gradient.

B. IMPLICIT METHOD
In all IMEX schemes in the Rad-HD module, radiation-matter source terms are implicitly integrated by solving an equation system of the form together with the modified conservation equation where s is a constant, ∆t is the current time step, and primed quantities are computed at the previous step. The full form of the source terms in the laboratory frame is G 0 = ρκ E r − a R T 4 − 2β · F r + ρχ β · (F r − E r β − β · P r ) G = ρκ E r − a R T 4 − 2β · F r β + ρχ (F r − E r β − β · P r ) , where β = v/c and κ and χ are the absorption and total opacity coefficients, computed as general functions of the Rad-HD fields (in this work, κ = f dg κ P (T) and χ = f dg κ R (T)). In Melon Fuksman et al. (2021), Eq. (B6) is implicitly solved via a four-dimensional root-finding algorithm, either fixed-point or Newton's method. Here we have implemented a faster method similar to the implicit scheme in Skinner & Ostriker (2013), in which all terms of order β and β 2 are explicitly integrated during the explicit step of the IMEX scheme, as they are small enough to stably do so. The remaining equation to be implicitly solved is together with Eq. (B7). The first of these equations can be written as a polynomial in the specific internal energy (Eq. (3)) as with where we have defined η =ĉρκs∆t. We solve this system iteratively by first updating the guess for F r using Eq. B9, as after which we update v using Eq. (B7). This is done only once if constant opacities are used, in which case this expression satisfies Eq. (B9) exactly. We then compute the polynomial coefficients using Eq. (B11) and apply Newton's method to update our guess of , for which we neglect the derivatives of A, B and C with . This yields a reasonable approximation to the derivative of the left-hand side of Eq. (B10) as long as κ and χ do not change abruptly with in the range spanned during the implicit step. This process is then repeated updating the opacities after every step if necessary until and v converge. Finally, E is retrieved from and v using Eq.

C. RESOLUTION STUDY
We tested if our Rad-HD simulations are affected by the employment of low resolutions per scale height in some regions of the disk, which could enhance the damping of forming radial structures. We considered for this the runs with the highest f dg , which are the most favorable ones for the potential development of shadowing structures. We repeated run rhdg2_c4 at the resolutions of N r × N θ = 1024 × 400 and 2048 × 800, the last of which corresponds to 8 − 32 cells in the r-direction and 24 − 120 cells in the θ-direction per scale height. The valueĉ/c = 10 −4 was chosen for numerical efficiency since, as shown in Section 4.2, it is high enough not to alter the results. Figure 19 shows a comparison of the temperature distributions in rhdg2_c4 and the corresponding higherresolution runs at t = 1, 10, and 100 yr. These distributions are almost indistinguishable at all times. In particular, the vertical extent of the thermally relaxed layers does not depend on the resolution, which shows that the vertical diffusion time is the same in all cases. This resolution independence indicates that numerical diffusion can only be of marginal relevance for the disk relaxation compared to the radiative and advection processes described in Section 4.1. This is also supported by the fact that radial structures formed through self-shadowing in the hydrostatic runs, occurring over typical lengthscales of a few au, are well resolved by our nominal resolution. As shown in Section 3.3, that resolution also suffices to reproduce the scale height variations in the θ-direction, which are typically of order H. Figure 19. Temperature distributions as a function of time for run rhdg2_c4 (left column) and the corresponding higher-resolution runs (middle and right columns for N r × N θ = 1024 × 400 and 2048 × 800, respectively).
We thank Mario Flock for several helpful and insightful comments and discussions during this project. The research of J.D.M.F. and H.K. is supported by the German Science Foundation (DFG) under the priority program SPP 1992: "Exoplanet Diversity" under contract KL 1469/16-1/2. We thank our collaboration partners on this project in Kiel Sebastian Wolf and Anton Krieger, under contract WO 857/17-1/2, for providing us with the employed tabulated opacity coefficients. All numerical simulations were run on the ISAAC and VERA clusters of the MPIA and the COBRA cluster of the Max Planck Society, all of these hosted at the Max-Planck Computing and Data Facility in Garching (Germany). We also thank the anonymous referee for constructive comments that helped to improve the quality of this work.