Large-scale variation in phytoplankton community composition of >1000 lakes across the USA

Although environmental impacts on the biodiversity and species composition of lakes have been studied in great detail at local and regional scales, unraveling the big picture of how lake communities respond to environmental variation across large spatial scales has received less attention. We performed a comprehensive analysis to assess how the phytoplankton community composition varies among >1000 lakes across the conterminous United States of America. Our results show that lake-to-lake similarity in species composition was low even at the local scale, and slightly decreased with geographical distance. Analysis of the compositional data by Dirichlet regression revealed that the geographical variation in phytoplankton community composition was best explained by total phosphorus (TP), water temperature, pH, and lake size. High TP concentrations were associated with high relative abundances of cyanobacteria and euglenophytes at the expense of other phytoplankton groups. High lake temperatures stimulated cyanobacteria, dinoflagellates, desmids and euglenophytes, whereas cryptophytes, golden algae and diatoms were relatively more abundant in colder lakes. Low lake pH correlated with high dissolved CO2 concentrations, which may explain why it benefitted phytoplankton groups with inefficient carbon concentrating mechanisms such as golden algae and euglenophytes. Conversely, the relative abundance of cyanobacteria showed a pronounced increase with lake pH. Large lakes showed higher relative abundances of cyanobacteria and diatoms, whereas small lakes showed higher relative abundances of chlorophytes, desmids and euglenophytes. Biodiversity increased with lake temperature, but decreased at high TP concentrations and pH. The key environmental variables identified by our study (high phosphorus loads, warm temperature, low pH) are associated with anthropogenic pressures such as eutrophication, global warming and rising atmospheric CO2 concentration. 
Hence, our results provide a comprehensive illustration of the major impact of these anthropogenic pressures on the biodiversity and taxonomic composition of lake phytoplankton communities.


Introduction
One of the major challenges in the aquatic sciences is to understand and predict how the species composition of lakes will respond to environmental changes induced by eutrophication, global warming and other anthropogenic pressures. A range of methods are available to address this challenge, including the development of mathematical models (Elliott 2010, Brauer et al 2012, Janssen et al 2019), analysis of the genetic and physiological traits of species using sophisticated molecular tools and laboratory experiments (Steffen et al 2014, Sandrini et al 2016), dedicated lake experiments (Schindler 1974, Huisman et al 2004), and extensive monitoring studies (Nõges et al 2008, Wagner and Adrian 2009, Pollard et al 2018, Zhang et al 2018). Each of these approaches has its own advantages and disadvantages, and integration of multiple approaches will be needed to obtain the full picture.
For example, laboratory studies have greatly contributed to our understanding of the physiological and ecological traits of different phytoplankton taxa, such as their temperature responses (Dauta et al 1990, Butterwick et al 2005, Lürling et al 2013, Paerl and Otten 2013), and nutritional requirements (Rhee and Gotham 1980, Tilman et al 1981, Sommer 1985, Spijkerman et al 2005, Passarge et al 2006, Ji et al 2017). However, laboratory studies typically investigate only a limited number of species grown under controlled conditions. It is less clear to what extent results obtained from laboratory isolates can be extrapolated to complex natural plankton communities with a high diversity of interacting species exposed to a wide range of natural and human-induced environmental pressures (Burford et al 2020).
An approach that can be very fruitful in this respect is the analysis of large-scale lake surveys (Mantzouki et al 2018, Pollard et al 2018). In these large-scale lake surveys, sampling, sample analysis and data processing follow the same standards. This approach has several interesting characteristics. (a) As species responses can deviate strongly between lakes in different settings, surveys that cover a large geographical range with many different types of lakes are well suited to identify important environmental drivers. (b) Large-scale surveys allow space-for-time substitution, where instead of a long-term temporal study in a single lake, many lakes are studied across large spatial scales (Pickett 1989). For instance, lakes in warmer climates may provide a proxy for the impact of global warming on lakes in colder climates. (c) Large-scale lake surveys provide an important benchmark for the applicability of lab findings and the predictive capacity of model studies.
Prominent examples of large-scale lake surveys include the European REBECCA dataset (Ptacnik et al 2008a), the European Multi Lake Survey (Mantzouki et al 2018), a survey of Iowa lakes in the U.S.A. (Filstrup et al 2016), as well as other lake surveys in South America (Kosten et al 2012), Canada (Vogt et al 2017, Hansson et al 2019) and China (Tong et al 2017, Tao et al 2020). The most extensive survey is probably the National Lakes Assessment (NLA) of the United States Environmental Protection Agency (US EPA), conducted in the summers of 2007, 2012 and 2017 across the conterminous USA. In each of these years, biological, chemical and physical measurements and samples were taken in approximately 1000 lakes, and the datasets have been used for a wide range of limnological studies (Pollard et al 2018).
Together, these large-scale lake surveys have provided new insights and broadened our knowledge of aquatic ecosystems. For example, these studies have identified nutrients and temperature as the most important environmental drivers of phytoplankton community composition (Watson et al 1997, Ptacnik et al 2008a, Filstrup et al 2016, Sodré et al 2020). High nutrient loads generally increase phytoplankton biomass (Vollenweider et al 1974) and favor cyanobacterial dominance (Watson et al 1997, Ptacnik et al 2008a, Beaulieu et al 2013, Taranu et al 2015, Filstrup et al 2016, Beaver et al 2018, Ho and Michalak 2020, Sodré et al 2020). These large-scale findings align with numerous local and regional lake studies in which eutrophication promoted cyanobacterial abundance (Schindler 1974, Paerl et al 2011), often at the expense of chrysophytes (Ptacnik et al 2008a) and desmids (Coesel 1982). Such shifts in phytoplankton community composition with increasing nutrient loads may also impact phytoplankton biodiversity. Specifically, phytoplankton biodiversity tends to increase with increasing nutrient load, but may level off or even decline at very high nutrient loads (Jeppesen et al 2000, Stomp et al 2011).
Temperature impacts the thermal stratification of lakes, which suppresses vertical mixing. Thermal stratification is therefore expected to favor buoyant species including several bloom-forming cyanobacteria (Huisman et al 2004, Elliott 2010). Conversely, large diatom species may be at a disadvantage in stratified lakes, as they sink to the sediments due to their relatively heavy silica frustules (Huisman et al 2002, Winder et al 2009, Rühland et al 2015). Moreover, although there is considerable interspecific variation, cyanobacteria tend to have higher temperature optima than diatoms (Dauta et al 1990, Butterwick et al 2005, Paerl and Otten 2013). Consequently, many lake studies have reported increasing abundances of cyanobacteria (Jöhnk et al 2008, Wagner and Adrian 2009, Kosten et al 2012, Beaulieu et al 2013) and decreasing abundances of large diatoms at higher temperatures (Winder et al 2009, Rühland et al 2015, Sodré et al 2020).
Many of the large-scale lake surveys have focused on specific taxonomic groups, particularly on cyanobacteria (Kosten et al 2012, Beaulieu et al 2013, Rigosi et al 2014, Taranu et al 2015, Beaver et al 2018, Ho and Michalak 2020). The interest in cyanobacteria is understandable, because they can produce toxic blooms which have negative impacts on water quality, and hence are highly relevant in lake management (Chorus and Bartram 1999, Meriluoto et al 2017, Huisman et al 2018). However, in addition to cyanobacteria, other freshwater phytoplankton taxa, such as chrysophytes (Hiltunen et al 2012) and euglenophytes (Zimba et al 2017), are also capable of toxin production. Moreover, other phytoplankton groups may play key roles in biogeochemical cycles, such as silica drawdown by diatoms (Smol and Stoermer 2010), or may have unexpected impacts on food web structure such as mixotrophic taxa capable of both photoautotrophic and heterotrophic growth (Flynn et al 2013, Hansson et al 2019). It is therefore important to gain a better understanding of how entire phytoplankton communities respond to environmental change, including phytoplankton groups that have received less attention in previous large-scale studies.
In this study, we analyze how phytoplankton community composition and biodiversity vary with major environmental drivers across the conterminous USA. The data are obtained from the NLA of the US EPA in the summer of 2012, which provides a unique data set in which environmental variables and phytoplankton community composition of >1000 lakes were sampled according to standardized protocols (Pollard et al 2018). Our analysis aims at 'the big picture', and will undoubtedly overlook details of relevance for the distribution of specific taxa or the community composition of specific lakes. We first investigate correlation patterns among the major environmental variables measured in the lakes, and assess which of these environmental variables best explain the observed variation in community composition. Next, we investigate how each of the selected environmental variables impacts the relative abundance of all major phytoplankton groups, and their most common genera, and how these variables affect phytoplankton biodiversity and dominance patterns.

Data collection
All data used for this study were obtained from the NLA 2012 of the US EPA (2016). A total of 1235 lakes from the conterminous USA with a size greater than 1 ha were sampled once or twice during summer (May-September) (US EPA 2011a). In this study, we investigated one sample per lake and therefore used only the data from the first visit for those lakes that were visited twice. At each lake, an extensive suite of abiotic and biotic parameters was measured. We describe the methods only briefly below, and refer to the US EPA reports for more extensive methodological details of the sampling and analyses (US EPA 2011b, 2012).
Depth profiles of water temperature, pH, and dissolved oxygen were measured with a water quality probe or multi-parameter sonde. We used these data to calculate the average temperature, pH, and dissolved oxygen of the surface water between 0 and 2 m depth. Furthermore, we calculated the maximum buoyancy frequency (N²) as a proxy of the stability of lake stratification (Denman and Gargett 1983). Specifically, buoyancy frequency was calculated from depth profiles of water density as a function of temperature and salinity (Chen and Millero 1986), where salinity was calculated from conductivity (Lewis 1980).
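As an illustration of the stratification proxy, the following Python sketch computes the maximum buoyancy frequency from a density profile. It is not the code used in the study: it assumes the common finite-difference form N² = (g/ρ)(dρ/dz) with depth z positive downward, takes densities as given rather than deriving them from temperature and salinity, and the function name and profile values are hypothetical.

```python
G = 9.81  # gravitational acceleration (m s^-2)


def max_buoyancy_frequency(depths_m, densities_kg_m3):
    """Return the maximum N^2 (s^-2) over a depth profile.

    Uses N^2 = (g / rho) * d(rho)/dz with depth z positive downward,
    approximated by finite differences between adjacent measurements.
    """
    n2_values = []
    for i in range(len(depths_m) - 1):
        dz = depths_m[i + 1] - depths_m[i]
        drho = densities_kg_m3[i + 1] - densities_kg_m3[i]
        rho_mean = 0.5 * (densities_kg_m3[i] + densities_kg_m3[i + 1])
        n2_values.append(G / rho_mean * drho / dz)
    return max(n2_values)


# Hypothetical profile: lighter surface water over denser deep water.
depths = [0.0, 2.0, 4.0, 6.0, 8.0]
densities = [997.0, 997.1, 998.5, 999.6, 999.8]
n2_max = max_buoyancy_frequency(depths, densities)
```

In this made-up profile the steepest density gradient lies between 2 and 4 m, so that layer sets the maximum N².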
Furthermore, water samples were collected from the surface water, at the location where the lake reached its deepest point, using a 2 m integrated sampler. If the euphotic zone (estimated from Secchi disk measurements) was less than 2 m deep, water samples were taken only within the euphotic zone (US EPA 2011b). Dissolved organic carbon (DOC) was measured using UV-promoted persulfate oxidation to CO₂ with infrared detection. Dissolved silica (DSi) was measured using flow injection analysis. Total nitrogen (TN) and total phosphorus (TP) were measured using flow injection analysis after persulfate digestion. Acid neutralizing capacity (ANC) was determined using automated acidimetric titration to pH ≤ 3.5 with modified Gran plot analysis. Conductivity was measured at 25 °C using Man-Tech TitraSip automated analysis (Man-Tech, Guelph, Canada) (US EPA 2012). Turbidity was measured using Man-Tech TitraSip analysis (Man-Tech, Guelph, Canada) or manual analysis using a HACH turbidimeter for high turbidity samples. Chlorophyll-a (Chl a) content was extracted with 90% acetone and analyzed by fluorometry. For details on measurement precision and quality control, see the Laboratory Operations Manual (US EPA 2012).
Dissolved CO₂ and HCO₃⁻ concentrations were not measured in the lakes. Therefore, we calculated dissolved CO₂ and HCO₃⁻ concentrations from pH and ANC (Stumm and Morgan 2012), and then transformed dissolved CO₂ concentrations to partial pressure (pCO₂, in ppm) based on Henry's law using equilibrium constants corrected for salinity and temperature (Weiss 1974, Millero et al 2006).
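The calculation of carbon speciation from pH and ANC can be sketched as follows. This is a simplified illustration, not the procedure used in the study. It assumes that ANC consists of carbonate alkalinity only (ANC = [HCO₃⁻] + 2[CO₃²⁻] + [OH⁻] − [H⁺]) and uses fixed textbook constants for fresh water at 25 °C, whereas the study corrected the constants for salinity and temperature. The function name and default values are assumptions.

```python
def carbonate_from_ph_anc(ph, anc_eq_per_l,
                          pk1=6.35, pk2=10.33, pkw=14.0, kh=0.034):
    """Estimate dissolved CO2 and HCO3- (mol/L) and pCO2 (ppm).

    Assumed constants (25 degC, fresh water): pK1 and pK2 of carbonic
    acid, pKw of water, and Henry's constant kh in mol L^-1 atm^-1.
    ANC is treated as pure carbonate alkalinity, so organic acids are
    ignored; the sketch is not meant for lakes with negative ANC.
    """
    h = 10.0 ** (-ph)
    k1 = 10.0 ** (-pk1)
    k2 = 10.0 ** (-pk2)
    oh = 10.0 ** (-pkw) / h
    # ANC = [HCO3-] * (1 + 2*K2/[H+]) + [OH-] - [H+], solved for [HCO3-]
    hco3 = (anc_eq_per_l - oh + h) / (1.0 + 2.0 * k2 / h)
    co2 = h * hco3 / k1            # first dissociation equilibrium
    pco2_ppm = co2 / kh * 1e6      # Henry's law, atm converted to ppm
    return co2, hco3, pco2_ppm


# Hypothetical lake: pH 8 and ANC of 1000 microequivalents per litre.
co2, hco3, pco2 = carbonate_from_ph_anc(8.0, 1e-3)
```

At a fixed ANC, the estimated pCO₂ falls steeply as pH rises, consistent with the pH dependence of carbon speciation.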
Phytoplankton samples were collected from the upper 2 m of the water column (or from the euphotic zone if the euphotic zone was less than 2 m deep) using an integrated sampler, at the same location where the other parameters were measured (US EPA 2011b). The samples were preserved with Lugol's iodine. Phytoplankton was identified to the lowest taxonomic level (usually the species or genus level), and counted with an inverted microscope using Utermöhl sedimentation chambers. A total of 400 natural algal units per sample was counted in 8-100 randomly selected fields. Phytoplankton biovolume of each taxon was calculated from cellular dimensions and geometric shapes measured by microscopic examination (Hillebrand et al 1999). A detailed description of the procedures can be found in US EPA (2012). For each lake, we calculated 'relative abundances' of the different taxa, defined as the relative contributions of these taxa to the total phytoplankton biovolume. We focused our analysis on the relative abundances of the eight most abundant phytoplankton groups, which together comprised >98% of the total phytoplankton biovolume when averaged over all lakes.
We used the following classification: cyanobacteria, diatoms, chlorophytes, desmids, dinoflagellates, cryptophytes, euglenophytes and golden algae (which include chrysophytes and synurophytes). We also analyzed relative abundances at the genus level, for those phytoplankton genera that appeared in more than 100 of the lakes selected for regression analysis. We used the genus names as they were given in the NLA dataset.

Biodiversity and similarity indices
We calculated taxon richness and the Shannon-Wiener diversity index to assess phytoplankton biodiversity of each lake. The Shannon-Wiener index (H) was calculated as:
$$H = -\sum_{i=1}^{R} q_i \ln q_i$$
where taxon richness R is the total number of taxa in a lake, and q_i is the proportion of individuals belonging to the ith taxon in this lake.
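A minimal Python sketch of both biodiversity measures is given below. It assumes the standard form of the Shannon-Wiener index with the natural logarithm, H = −Σ q_i ln q_i; the base of the logarithm is an assumption. The taxon counts are hypothetical and were chosen to sum to 400 counted units.

```python
import math


def proportions(counts):
    """Convert counts per taxon into proportions q_i, dropping absent taxa."""
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items() if c > 0}


def shannon_wiener(q_values):
    """Shannon-Wiener index H = -sum(q_i * ln q_i) over the taxa present."""
    return -sum(q * math.log(q) for q in q_values if q > 0)


# Hypothetical counts of natural algal units in a single lake.
lake_counts = {"Aphanizomenon": 240, "Cryptomonas": 100, "Asterionella": 60}
q = proportions(lake_counts)
richness = len(q)                 # taxon richness R
H = shannon_wiener(q.values())
```

For a given richness R, H reaches its maximum ln R when all taxa are equally abundant, and it drops toward zero as a single taxon becomes dominant.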
We also calculated the similarity in phytoplankton community composition between each pair of lakes using the Sørensen and Bray-Curtis similarity indices. For this purpose, we used all phytoplankton taxa identified to at least the genus level, which comprised 90% of the counted taxa, and ∼94% of the counted biovolume. The Sørensen similarity index only considers presence/absence data, whereas the Bray-Curtis similarity index takes phytoplankton abundance into account. Both similarity indices (S) are defined as:
$$S = \frac{2C}{A + B}$$
For the Sørensen similarity index, A and B are the number of taxa in lake A and lake B, and C is the number of taxa shared by the two lakes. For the Bray-Curtis similarity index, A and B represent the sum of the relative abundances of the phytoplankton taxa in lake A and in lake B, and C is the sum of the lesser relative abundances for those taxa shared by the two lakes. The similarity indices were calculated using the function vegdist from the vegan package in R. For comparison with the similarity data, we also quantified the environmental variation among lakes by calculating the (absolute) difference between the values of environmental variables for each pair of lakes. Furthermore, we calculated the geographical distance between each pair of lakes using the function distVincentyEllipsoid from the geosphere package in R (Karney 2013).
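The two similarity indices can be illustrated with a short Python sketch that follows the verbal definitions of A, B and C given in the text, assuming the common form S = 2C/(A + B). It is an illustration rather than the vegan routine used in the study, and the two lake compositions are hypothetical.

```python
def sorensen_similarity(taxa_a, taxa_b):
    """Presence/absence similarity: 2C / (A + B), with C the shared taxa."""
    a, b = set(taxa_a), set(taxa_b)
    return 2 * len(a & b) / (len(a) + len(b))


def bray_curtis_similarity(abund_a, abund_b):
    """Abundance-based similarity: 2C / (A + B), where C sums the lesser
    relative abundance of every taxon shared by the two lakes."""
    shared = set(abund_a) & set(abund_b)
    c = sum(min(abund_a[t], abund_b[t]) for t in shared)
    return 2 * c / (sum(abund_a.values()) + sum(abund_b.values()))


# Hypothetical relative abundances (each lake sums to one).
lake_a = {"Microcystis": 0.5, "Cryptomonas": 0.3, "Fragilaria": 0.2}
lake_b = {"Microcystis": 0.1, "Cryptomonas": 0.4, "Ceratium": 0.5}

s_sorensen = sorensen_similarity(lake_a, lake_b)
s_bray_curtis = bray_curtis_similarity(lake_a, lake_b)
```

Because relative abundances sum to one in each lake, the Bray-Curtis similarity reduces to C in this case. Note that vegdist returns dissimilarities, so a similarity corresponds to one minus its output.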
To test whether the similarity in community composition and the environmental variation changed with increasing distance between lakes, we performed two-sided Jonckheere-Terpstra tests (a non-parametric trend test) using the R package clinfun, with 3000 permutations for the reference distribution.

Statistical analyses
We first tested whether all explanatory variables and their log-transformed values followed a normal distribution using the Shapiro-Wilk test. If log-transformation improved normality and/or homoscedasticity, log10-transformed data were used for further analysis. We note that buoyancy frequency values were transformed as log10(x + 1), and ANC values were transformed as log10(x + 1000). Correlations between environmental variables were investigated using Pearson's product-moment correlation, with p values corrected for multiple hypothesis testing using the false discovery rate (FDR: Benjamini and Hochberg 1995).
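The FDR correction of Benjamini and Hochberg (1995) can be sketched in a few lines of Python. This is a generic implementation of the step-up adjustment, not code from the study, and the p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order.

    Each p value is multiplied by m / rank (rank 1 = smallest p value),
    and monotonicity is enforced by taking a running minimum from the
    largest p value downward.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values from four correlation tests.
raw_p = [0.01, 0.04, 0.03, 0.20]
adjusted_p = benjamini_hochberg(raw_p)
```

A test is then considered significant at a chosen FDR level if its adjusted p value falls below that level.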
For our further analysis, we standardized all explanatory variables using z-transformation to obtain standardized effect sizes (Feld et al 2016). We calculated the variance inflation factor (VIF) to quantify multicollinearity of the explanatory variables, and selected variables with a VIF <5 in a stepwise manner using the R package usdm (Naimi et al 2014). We then assessed the relation between the relative abundances of the phytoplankton groups and the selected explanatory variables using multiple regression analysis.
Relative abundances of phytoplankton taxa represent compositional data, consisting of continuous proportions or percentages that sum to a total of one or 100% (Aitchison 1982, Greenacre 2021). Compositional data may show spurious correlations between taxa because the total sum of relative abundances is constrained. Furthermore, compositional data cannot be analyzed by standard linear regression models, because the data are not normally distributed and the variance is not homogeneous across the range from zero to one. These issues are often addressed by transforming the relative abundances using, for instance, a logit or log-ratio transformation (e.g. the centered log-ratio (clr) transform), which changes the data from values bounded between zero and one to unconstrained values. Subsequently, the transformed data can be analyzed by standard statistical techniques, such as ordinary least-squares regression. However, data transformations have some limitations that may introduce biases or hamper the interpretation. For example, a logit transformation will strongly increase the error for relative abundances close to zero (as is the case for many phytoplankton taxa in our dataset) or one. Furthermore, the clr transform lacks subcompositional coherence. This means that changes in only a subset of a multispecies community (e.g. competitive replacement of species A by species B) will alter the clr transform of all species in the community, even if the relative abundances of the other species were not affected. Therefore, instead of a data transformation, we preferred a regression analysis that uses the original scale of measurement. We chose Dirichlet regression, which is specifically tailored for compositional data consisting of multiple categories (multiple taxa) (Acevedo-Trejos et al 2013, Douma and Weedon 2019).
To account for frequent zero-valued observations, the relative abundances were first adjusted according to the following equation (Douma and Weedon 2019):
$$p_i^{*} = \frac{p_i (n - 1) + 1/G}{n}$$
where p_i^{*} is the adjusted relative abundance. In our analysis, p_i is the relative abundance of phytoplankton group i, n is the total number of lakes used in the analysis (n = 1041), and G is the number of phytoplankton groups (G = 8). This adjustment implies that zero values are replaced by a very small value (1/(Gn)), while non-zero values are nearly identical to the original data because of the high number of lakes, and the total relative abundances still sum to one. We performed a stepwise multiple Dirichlet regression including all phytoplankton groups, as a function of the environmental variables, using the R package DirichletReg (Maier 2014). We selected the environmental variables that contributed most to the explanatory power of the regression model based on the Bayesian information criterion (BIC: Schwarz 1978), where the smallest value indicates the best model fit. We chose BIC over other model selection criteria (e.g. Akaike information criterion), because BIC penalizes parameters more strongly and therefore tends to select the most parsimonious set of explanatory variables (Raftery 1995). The four environmental variables selected by the Dirichlet regression were log(TP), water temperature, pH and log(lake area).
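The zero adjustment can be illustrated with a short Python sketch. It assumes the transformation p* = (p(n − 1) + 1/G)/n, which is consistent with the properties described in the text: zeros become 1/(Gn) and the adjusted values still sum to one. The function name and the example composition are hypothetical.

```python
def adjust_composition(p, n_obs):
    """Shrink a composition slightly toward 1/G so that no value is zero.

    p      relative abundances of the G groups in one lake (sums to one)
    n_obs  total number of lakes (observations) in the analysis
    """
    g = len(p)
    return [(p_i * (n_obs - 1) + 1.0 / g) / n_obs for p_i in p]


# Hypothetical lake in which six of the eight groups are absent.
composition = [0.7, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
adjusted = adjust_composition(composition, n_obs=1041)
```

With n = 1041 and G = 8, a zero is replaced by about 1.2 × 10⁻⁴, while the value 0.7 changes only in the fourth decimal place.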
Next, we investigated the relation between the relative abundances of the phytoplankton groups and these four environmental variables in further detail. For this purpose, the TP concentration, temperature, pH and area of the lakes were each classified into four different categories, where we ensured that each category comprised at least 100 lakes (table 1).
Relative abundances of the phytoplankton groups, the Shannon-Wiener diversity index and taxon richness were not normally distributed over the lakes, as confirmed by the Shapiro-Wilk test. We therefore used non-parametric Kruskal-Wallis one-way analysis of variance to assess whether relative abundances of phytoplankton groups, taxon richness and the Shannon-Wiener diversity index differed among lakes with different TP concentrations, water temperature, pH and lake area. The Kruskal-Wallis test was followed by Dunn's post hoc pairwise comparisons, and the p values were corrected for multiple hypothesis testing using the FDR (Benjamini and Hochberg 1995).
Furthermore, we calculated the 'probability of dominance' of each phytoplankton group. The probability of dominance was defined as the percentage of lakes in which a given phytoplankton group comprised >75% of the total phytoplankton biovolume. We investigated whether the probability of dominance of the phytoplankton groups differed among lakes with different TP concentrations, temperature, pH and lake area using Chi-square tests. If Chi-square tests were significant, we used the two-proportions Z-test for post hoc pairwise comparisons of the probabilities of dominance. The p-values of the Chi-square tests and Z-tests were corrected for multiple hypothesis testing using the FDR (Benjamini and Hochberg 1995). All statistical analyses were performed in R (version 4.0.2; R Core Team 2020).
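The definition of the probability of dominance translates directly into code. The Python sketch below uses hypothetical lake compositions and a hypothetical function name, and it covers only the descriptive step, not the Chi-square or Z-tests.

```python
def probability_of_dominance(lakes, group, threshold=0.75):
    """Percentage of lakes in which `group` exceeds `threshold` of the
    total phytoplankton biovolume (relative abundance between 0 and 1)."""
    dominated = sum(1 for comp in lakes if comp.get(group, 0.0) > threshold)
    return 100.0 * dominated / len(lakes)


# Hypothetical relative abundances in four lakes of one TP category.
lakes_high_tp = [
    {"cyanobacteria": 0.90, "diatoms": 0.10},
    {"cyanobacteria": 0.60, "chlorophytes": 0.40},
    {"diatoms": 0.80, "cryptophytes": 0.20},
    {"cyanobacteria": 0.75, "euglenophytes": 0.25},
]
p_dom_cyano = probability_of_dominance(lakes_high_tp, "cyanobacteria")
```

The comparison is strict, so a group at exactly 75% of the biovolume does not count as dominant under this reading of the definition.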

Environmental variation among lakes
The lakes cover a wide range of environmental conditions, with summer water temperatures ranging from <10 to >35 °C, TN and TP concentrations spanning three orders of magnitude, and pH values ranging from 3 to 11 (figure 1). Several of the environmental variables measured in the lakes were strongly associated with each other (figure 1, table S1). TP, TN, turbidity, DOC and pH were all positively correlated, illustrating that lakes with high nutrient concentrations tend to have high levels of aquatic primary production, which increases turbidity, DOC production and pH. Lakes with high pH had low pCO₂ and high HCO₃⁻ concentrations, in agreement with the pH dependence of carbon speciation. Furthermore, pCO₂ showed weak negative correlations and pH showed weak positive correlations with TN and TP in this data set (figure 1). Some variables, such as water temperature and lake area, only showed very weak or insignificant correlations with most other variables (figure 1). Lake area correlated positively with lake depth, however, illustrating that large lakes tend to be deeper (Cael et al 2017).
To remove the redundancy among lake variables, we calculated VIFs. This led to removal of the variables log(pCO₂) and log(HCO₃⁻) which had a high collinearity with pH and log(ANC), log(Secchi depth) which had a high collinearity with log(turbidity), and log(conductivity) which had a high collinearity with log(ANC). Although log(TN) and log(TP) also showed a high collinearity, we decided to maintain both log(TN) and log(TP) in our analysis, because of the major effects of both N and P limitation on phytoplankton communities (Smith 1983, Elser et al 1990, Conley et al 2009). Hence, based on the VIFs, the explanatory variables selected for further analysis were dissolved oxygen, water temperature, pH, log(ANC), log(lake depth), log(lake area), log(DOC), log(TP), log(TN), log(DSi), log(turbidity), and log(N²). After removing all missing observations from the dataset, a total of 1041 lakes remained that included all selected variables.

Geographical distribution of phytoplankton composition
All major phytoplankton groups occurred throughout the conterminous USA, yet some biogeographical patterns were apparent (figure 2). Cyanobacteria occurred in almost all lakes and often dominated in the Midwest (figure 2 [...]). The phytoplankton community composition varied not only at the regional scale, but in many cases it also differed strongly between nearby lakes at local scales (figure 2). Consequently, Sørensen and Bray-Curtis similarity indices were relatively low even for lakes in the same area, and decreased only slightly but significantly with increasing geographical distance between lakes (figure 3, Sørensen similarity, two-sided Jonckheere-Terpstra test, T_JT = 4.59 × 10^10, p < 0.001; Bray-Curtis similarity, two-sided Jonckheere-Terpstra test, T_JT = 4.83 × 10^10, p < 0.001). The decrease in similarity in phytoplankton composition corresponded with a significant increase in variation in TP concentrations, water temperature and pH with increasing geographical distance between lakes (figure S1).

Environmental variables explaining phytoplankton composition
Subsequently, we analyzed which environmental variables provided the most parsimonious explanation for the observed variation in phytoplankton community composition. Stepwise selection of environmental variables by Dirichlet multiple regression showed that a model incorporating TP, water temperature, pH, and lake area best explained the relative abundances of the phytoplankton groups across the 1041 lakes (tables S2 and S3). Our further analysis therefore focused on these four explanatory variables.

TP
TP concentrations varied among lakes at both local and regional scales. Lakes in the Appalachian Highlands, Rocky Mountains, Great Lakes area and Pacific Northwest generally had the lowest TP concentrations, while many lakes in the Great Plains area had high TP concentrations (figure S2(A)). Phytoplankton community composition varied strongly with the TP concentration (figure 4, table S4). Relative abundances of cyanobacteria and euglenophytes were highest in phosphorus-rich lakes (figures 4(A) and (B)), whereas dinoflagellates, cryptophytes, desmids and golden algae reached their highest relative abundances in phosphorus-poor lakes (figures 4(D), (E), (G) and (H)). Relative abundances of diatoms and chlorophytes did not show major variation with the TP concentration (figures 4(C) and (F)).
[Figure caption fragment: ... (a, b, c, d) indicate significant differences between lakes with different total phosphorus concentrations, temperature, pH, or lake area, as tested by nonparametric Kruskal-Wallis tests (table S4) followed by post-hoc comparison of the relative abundances using Dunn's test.]
At the genus level, the response to TP largely reflected that of the corresponding phytoplankton group, except for the diatoms and chlorophytes where some genera increased, others decreased, and again others showed no response to TP (figure 5, table S5).These opposite responses of individual genera may explain the lack of response of chlorophytes and diatoms at the phytoplankton group level.

Lake temperature
In line with expectation, lakes were considerably warmer in the Southeast and Southern California than in the Rocky Mountains and Pacific Northwest (figure S2(B)). Phytoplankton community composition varied significantly with lake temperature (figure 4, table S4). Although cyanobacteria and dinoflagellates showed only a modest increase with temperature (figures 4(A) and (D)), relative abundances of desmids and euglenophytes were much higher in warm than in cold lakes (figures 4(B) and (G)). Conversely, relative abundances of cryptophytes and golden algae were highest in cold lakes (figures 4(E) and (H)). Relative abundances of diatoms and chlorophytes did not show major variation in response to lake temperature (figures 4(C) and (F)). For cyanobacteria, chlorophytes and diatoms, patterns at the genus level did not always reflect the temperature response of the corresponding phytoplankton group (figure 6, table S6). Among the cyanobacteria, for example, relative abundances of Cylindrospermopsis and Pseudanabaena increased but the relative abundance of Aphanizomenon decreased significantly with lake temperature (figure 6(A)).
[Figure caption fragment: ... (table S5) followed by post-hoc comparison of the relative abundances using Dunn's test.]

Lake pH
Many lakes along the East Coast, Great Lakes area and Pacific Northwest are moderately acidic, whereas pH exceeds 8 in most lakes in the semi-arid regions of the Great Plains and Southwest (figure S2(C)). This pattern is largely associated with geographical variation in precipitation, with higher rainfall leaching alkaline elements from the soil, causing lower pH (Slessarev et al 2016). Decay of organic matter and the production of humic acids in peatlands and forested areas may also contribute to low pH. Phytoplankton community composition varied significantly with lake pH (figure 4, table S4). Relative abundance of cyanobacteria was highest in alkaline lakes (figure 4(A)), whereas relative abundances of euglenophytes, dinoflagellates, cryptophytes, desmids and golden algae were all highest in acidic lakes (figures 4(B), (D), (E), (G) and (H)). Diatoms and chlorophytes did not show a clear pattern in response to lake pH (figures 4(C) and (F)). At the genus level, many genera showed either a pH response similar to that of the corresponding phytoplankton group, or no clear response to pH (figure 7, table S7). Relative abundances of the filamentous nitrogen-fixing cyanobacteria Anabaena, Aphanizomenon and Cylindrospermopsis increased particularly strongly with pH (figure 7(A)).
[Figure caption fragment: ... (table S6) followed by post-hoc comparison of the relative abundances using Dunn's test.]

Lake area
Lake area varied over more than four orders of magnitude, but the geographical distribution of lake area did not show a distinct large-scale pattern in this dataset (figure S2(D)). Relative abundances of cyanobacteria and diatoms were significantly higher in large lakes (figures 4(A) and (C)), whereas relative abundances of euglenophytes and chlorophytes were significantly higher in small lakes (figures 4(B) and (F)). Patterns at the genus level largely reflected patterns at the level of the phytoplankton groups (figure 8, table S8).
[Figure caption fragment: ... (a, b, c, d) indicate significant differences between lakes with different pH, as tested by nonparametric Kruskal-Wallis tests (table S7) followed by post-hoc comparison of the relative abundances using Dunn's test.]

Dominant phytoplankton groups
We also investigated under which environmental conditions specific phytoplankton groups became dominant. For this purpose, the 'probability of dominance' was calculated as the percentage of lakes in which a phytoplankton group comprised >75% of the total phytoplankton biomass. For cyanobacteria, the probability of dominance showed a strong increase with TP concentration and lake pH (figure S3(A); see table S9 for the statistics). Specifically, cyanobacteria were dominant in <5% of the lakes with a low TP concentration or a pH < 7, whereas they were dominant in >30% of the lakes with a high TP concentration or a pH > 9. The probability of dominance of diatoms was highest in cold waters (figure S3(C)), the probability of dominance of dinoflagellates and golden algae was highest in relatively acidic lakes (figures S3(D) and (H)), and the probability of dominance of chlorophytes was highest in small ponds (figure S3(F)).
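The definition of the probability of dominance given above can be illustrated with a minimal sketch. The function name, the data layout (one dictionary of group biomasses per lake) and the toy values are assumptions made for illustration only; they are not taken from the survey data or from the analysis scripts of this study.

```python
def probability_of_dominance(lakes, group, threshold=0.75):
    """Percentage of lakes in which `group` exceeds `threshold` of total biomass.

    `lakes` is a list of dicts mapping group name -> biomass (hypothetical layout).
    """
    if not lakes:
        raise ValueError("need at least one lake")
    dominant = 0
    for biomass_by_group in lakes:
        total = sum(biomass_by_group.values())
        # A group is dominant when its share of total biomass is above the threshold.
        if total > 0 and biomass_by_group.get(group, 0.0) / total > threshold:
            dominant += 1
    return 100.0 * dominant / len(lakes)


# Hypothetical biomass values in arbitrary units, for illustration only.
toy_lakes = [
    {"cyanobacteria": 80.0, "diatoms": 15.0, "chlorophytes": 5.0},
    {"cyanobacteria": 40.0, "diatoms": 50.0, "chlorophytes": 10.0},
    {"cyanobacteria": 10.0, "diatoms": 85.0, "chlorophytes": 5.0},
    {"cyanobacteria": 76.0, "diatoms": 12.0, "chlorophytes": 12.0},
]

print(probability_of_dominance(toy_lakes, "cyanobacteria"))
```

In this sketch the percentage is taken over whichever subset of lakes is passed in, so applying it separately to the lakes of each TP, temperature, pH or lake-area category would yield one value per category.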

Biodiversity patterns
Phytoplankton biodiversity, expressed as taxon richness and by the Shannon-Wiener diversity index, varied among lakes at both local and regional scales (figure S4). Biodiversity was significantly lower in lakes with high TP concentrations and high pH, and increased significantly with lake temperature (figure 9; tables S10 and S11). Biodiversity did not vary significantly with lake area (figures 9(D) and (H); table S10). The same lack of a significant relationship between biodiversity and lake area was obtained when we applied linear regression to all n = 1041 lakes (table S11), indicating that this result was robust.
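For readers less familiar with the two biodiversity measures, the following sketch computes taxon richness and the Shannon-Wiener index, H' = -Σ p_i ln p_i, for a single lake. The use of the natural logarithm and of biovolume as the abundance measure are assumptions of this example; this section does not specify either choice, and the function names are hypothetical.

```python
import math


def taxon_richness(abundances):
    """Number of taxa with a non-zero abundance."""
    return sum(1 for a in abundances if a > 0)


def shannon_wiener(abundances):
    """Shannon-Wiener index H' = -sum(p_i * ln(p_i)) over taxa that are present."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total  # relative abundance of this taxon
            h -= p * math.log(p)
    return h


# Hypothetical abundances of four taxa in one lake (arbitrary units).
even_lake = [2.0, 2.0, 2.0, 2.0]
skewed_lake = [97.0, 1.0, 1.0, 1.0]

print(taxon_richness(even_lake), shannon_wiener(even_lake))
print(taxon_richness(skewed_lake), shannon_wiener(skewed_lake))
```

Both toy lakes have the same richness, but the index is lower for the lake dominated by a single taxon, which is why the two measures are reported side by side.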

Large-scale patterns in community composition
The more than 1000 lakes analyzed in this study span a wide variety of geographical regions and environmental conditions, from oligotrophic to hypertrophic lakes, from cold mountain waters to warm subtropical lakes, and from acidic to highly alkaline lakes (figures 1 and 2). Phytoplankton community composition varied considerably between lakes, and not only at large spatial scales. Even phytoplankton communities of nearby lakes in the same local area often had a very different taxonomic composition. Yet, despite this local variation, large-scale patterns in phytoplankton community composition in lakes across the USA can also be discerned. Our analysis shows that the variation in phytoplankton community composition is best explained, in a statistical sense, by TP, temperature, pH and lake size. For example, cyanobacteria dominated the phytoplankton in many lakes with high phosphorus concentrations and high pH in the Midwest. Euglenophytes were abundant in several warm lakes in the South, diatoms preferred the colder lakes of, for example, the Rocky Mountains, and dinoflagellates, desmids and golden algae tended to prefer the more oligotrophic and less alkaline lakes (figures 2 and 4).
The selection of TP, temperature, pH and lake size by our statistical analysis does not necessarily mean that these specific variables offer a causal explanation for the observed variation in phytoplankton community composition, since correlation does not necessarily reflect causation. For example, TP, TN and turbidity showed high collinearity (figure 1). TP was only a slightly better predictor than turbidity and TN (see the one-parameter models in table S2), and their high collinearity makes it difficult to disentangle effects of phosphorus, nitrogen and light on community composition from these lake data. This implies that TP represents a simple proxy of the overall trophic status of the lakes, including concomitant variation in nitrogen and light levels. Similarly, lake pH showed a strong negative correlation with pCO2 and a strong positive correlation with the bicarbonate concentration (figure 1), and therefore variation in pH also represents variation in carbon speciation among the lakes. This implies that the lake data cannot distinguish whether shifts in community composition in response to the pH gradient represent direct pH effects on the species or indirect effects mediated by the availability and speciation of inorganic carbon. Hence, TP, temperature, pH and lake area can best be interpreted as the four major 'summary variables' that capture the underlying large-scale variation in a whole suite of environmental conditions governing phytoplankton community composition.
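The collinearity discussed above refers to Pearson correlations between log-transformed variables (figure 1). As a rough sketch of that calculation, the example below correlates log10-transformed TP and TN. The base-10 logarithm, the function name and the concentrations are assumptions of this example; the values are invented for illustration and are not survey measurements.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical concentrations (arbitrary units) for four lakes, illustration only.
toy_tp = [5.0, 20.0, 80.0, 320.0]
toy_tn = [250.0, 380.0, 900.0, 1500.0]

log_tp = [math.log10(v) for v in toy_tp]
log_tn = [math.log10(v) for v in toy_tn]
r_tp_tn = pearson_r(log_tp, log_tn)
print(round(r_tp_tn, 3))
```

When two predictors are this strongly correlated, a model containing either one fits almost equally well, which is the reason TP is best read as a proxy for overall trophic status rather than as a uniquely identified driver.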

Cyanobacteria
Our results show that cyanobacteria reached their highest relative abundances in phosphorus-rich lakes with high pH, and showed a weaker relationship with lake temperature and lake area (figures 4 and S3). The increase of cyanobacteria with TP concentration is in agreement with many previous studies, which have shown that high nutrient loads stimulate the development of cyanobacterial blooms (Downing et al 2001, Paerl et al 2001, O'Neil et al 2012, Beaver et al 2018, Huisman et al 2018, Ho and Michalak 2020).
High relative abundances of cyanobacteria in lakes with high pH can be interpreted as a preference of cyanobacteria for alkaline waters with relatively high bicarbonate concentrations. Bicarbonate is the dominant inorganic carbon species at pH > 6.5 and cyanobacteria use a sophisticated CO2 concentrating mechanism (CCM) including several bicarbonate uptake systems with different kinetic properties, which makes them highly effective in bicarbonate acquisition in waters at high pH (Price 2011, Sandrini et al 2014, Burnap et al 2015, Ji et al 2020). For example, Ji et al (2020) showed that in CO2-depleted waters of high pH the cyanobacterium Microcystis aeruginosa strongly down-regulates its CO2 uptake capacity and fully relies on bicarbonate uptake to cover its photosynthetic carbon fixation. Moreover, the photosynthetic activity of dense cyanobacterial blooms can be responsible for the drawdown of pCO2, as indicated by the negative correlation between pCO2 and chlorophyll a in lakes dominated by cyanobacteria (figure S5). This creates a positive feedback loop, where cyanobacterial dominance is not only favoured by high pH, but CO2 depletion by dense cyanobacterial blooms in nutrient-rich waters also contributes to a high pH and a shift to bicarbonate as the dominant inorganic carbon species (Verspagen et al 2014, Visser et al 2016).
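The link between pH and inorganic carbon speciation can be made concrete with the standard equilibrium expressions for the carbonate system. The sketch below is an illustration under stated assumptions, not part of the analysis in this study: it uses approximate textbook dissociation constants for dilute water at 25 °C (pK1 ≈ 6.35, pK2 ≈ 10.33), whereas the true constants vary with temperature and ionic strength, so the exact crossover pH will differ somewhat between lakes. The function name is hypothetical.

```python
def carbon_fractions(ph, pk1=6.35, pk2=10.33):
    """Fractions of dissolved inorganic carbon present as CO2, HCO3- and CO3--.

    pk1 and pk2 are assumed textbook values for dilute water at 25 degrees C.
    """
    h = 10.0 ** (-ph)    # proton concentration
    k1 = 10.0 ** (-pk1)  # first dissociation constant (CO2 <-> HCO3-)
    k2 = 10.0 ** (-pk2)  # second dissociation constant (HCO3- <-> CO3--)
    denom = h * h + k1 * h + k1 * k2
    return {
        "CO2": h * h / denom,
        "HCO3-": k1 * h / denom,
        "CO3--": k1 * k2 / denom,
    }


for ph in (5.0, 6.35, 8.0, 9.5):
    fractions = carbon_fractions(ph)
    print(ph, {name: round(value, 3) for name, value in fractions.items()})
```

Under these assumed constants, dissolved CO2 makes up most of the inorganic carbon in acidic water and only a small fraction at pH 8 and above, which is consistent with the argument that taxa with efficient bicarbonate uptake are favoured in alkaline lakes.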
The positive relation between relative cyanobacterial abundance and lake area appears to be a robust result (figures 4 and S3, table S3). Interestingly, this relationship is particularly pronounced for the nitrogen-fixing genera Anabaena (now known as Dolichospermum) and Aphanizomenon (figure 8(A)). Indeed, nitrogen-fixing cyanobacteria such as Aphanizomenon, Anabaena and Nodularia are also common genera in several other large waters, such as the Baltic Sea and Lake Peipsi (Stal et al 2003, Nõges et al 2008, Kahru et al 2020).
Although many studies have pointed out that cyanobacterial blooms will benefit from high temperatures (Paerl and Huisman 2008, Wagner and Adrian 2009, Kosten et al 2012, O'Neil et al 2012, Visser et al 2016, Huisman et al 2018, Ho and Michalak 2020), our results showed only a weak positive relationship of lake temperature with the relative abundance and no relationship with the dominance of cyanobacteria (figures 4 and S3). Previous analyses of the lake data of the US EPA collected in the summers of 2007 and 2012 did find positive effects of temperature on cyanobacteria (Beaulieu et al 2013, Rigosi et al 2014). Similar to our results, however, these studies also pointed out that nutrients had stronger effects on cyanobacterial abundance than temperature (Beaulieu et al 2013, Rigosi et al 2014). At the genus level, temperature affected the relative abundances of some cyanobacteria more strongly (figure 6(A)). In particular, relative abundances of the bloom-forming genera Cylindrospermopsis, Planktolyngbya and Pseudanabaena were significantly higher in warm than in cold lakes. Cylindrospermopsis is a genus generally found in warmer waters that has rapidly expanded from tropical to temperate zones in recent decades (Padisák 1997), presumably in response to global warming (Briand et al 2004). Also, Pseudanabaena is known to prefer relatively warm waters (Gao et al 2018). Other genera such as Aphanizomenon decreased with lake temperature, however (figure 6(A)). These opposite temperature effects on different cyanobacterial genera could potentially cancel out temperature effects at higher levels of taxonomic aggregation, and hence may explain the weak temperature response for cyanobacteria as a whole.

Diatoms
The probability of dominance of diatoms was highest in lakes with low water temperature and low TP concentration (figure S3(C)). These results are in line with the common observation that, as a functional group, freshwater diatoms often dominate in cold and relatively nutrient-poor waters (Reynolds 2006, Edwards et al 2012). Furthermore, the relative abundance of diatoms increased with lake area (figure 4). This pattern was particularly pronounced for the genera Aulacoseira and Fragilaria (figure 8(C)), which consist of several large and heavily silicified taxa that require a considerable degree of turbulence to remain suspended in the water column (Sherman et al 1998, Ptacnik et al 2003, Rühland et al 2008, 2015). Hence, the positive association of these genera with lake area is in line with the observation that larger lakes are often more exposed to wind action and tend to experience deeper and more extensive vertical mixing (Fee et al 1996). Another possible explanation for the positive association between diatoms and lake area is that large lakes tended to have low DOC concentrations (figure 1), which results in relatively clear waters dominated by blue and green wavelengths that provide a suitable light environment for the fucoxanthin pigments of diatoms (Holtrop et al 2021).
At the aggregated level of the entire functional group, the relative abundance of diatoms did not show a clear relationship with TP and lake temperature (figure 4). However, cell size, temperature dependence and nutrient physiology vary considerably among diatom taxa, which may cause substantial changes in the taxonomic composition of diatom assemblages along temperature and nutrient gradients (Kilham et al 1996, Winder et al 2009, Rühland et al 2015). This is also illustrated by our results, which show large variation in temperature and nutrient responses among different diatom taxa (figures 5 and 6).

Chlorophytes and desmids
Chlorophytes did not display clear patterns in relation to TP, lake temperature and pH, but showed a preference for small lakes. Our results match previous studies reporting no significant response of relative chlorophyte abundance to TP (Watson et al 1997), although some studies report an increase (Jensen et al 1994). Similar to diatoms, the data show large variation in the response to TP, lake temperature and lake pH among different chlorophyte taxa (figures 5-7), which likely explains the lack of major trends at the level of the entire functional group. Many chlorophyte genera with high relative abundances in small lakes (e.g. Gloeocystis, Oocystis, Scenedesmus, Tetraedron; figure 8(B)) are common members of the periphyton (Eminson and Moss 1980, McCormick et al 1996), which may explain why chlorophytes as a group showed a higher relative abundance and higher probability of dominance in small lakes (figures 4 and S3).
Relative abundances of desmids were highest in small lakes with low TP concentration, low pH, and high temperature. These findings are in line with the observation that many desmid species inhabit the oligotrophic and relatively acidic waters of bogs and fens (Brook 1981, Coesel 1982), although some taxa prefer more eutrophic and alkaline waters (Spijkerman et al 2005). The preference of desmids for warm waters is in good agreement with laboratory experiments showing that a variety of desmid species isolated from temperate climates all had high temperature optima in the range of 25 °C-30 °C (Coesel and Wardenaar 1990, Stamenković and Hanelt 2013). Indeed, although desmids have a cosmopolitan distribution, their taxonomic diversity and quantitative abundance are generally highest in tropical regions (Coesel 1996, Barbosa and Padisák 2002), and most desmids in the temperate zone reach peak abundances during the summer months (Canter and Lund 1966, Coesel and Kooijman-Van Blokland 1994).

Dinoflagellates
Dinoflagellates reached their highest relative abundances in small- to medium-sized lakes with low TP concentration, low pH, and relatively high temperature. Vertical migration of dinoflagellates provides a major competitive advantage in stratified waters with a nutrient-depleted surface layer but higher nutrient availability at depth (Lieberman et al 1994, Doblin et al 2006, Peacock and Kudela 2014). However, the motility of dinoflagellates is sensitive to shear stress generated by high turbulence (Thomas and Gibson 1990, Berdalet et al 2007), although the sensitivity varies among taxa (Smayda 2002, Sullivan and Swift 2003). This likely explains why dinoflagellates are less common in large lakes that are more exposed to wind stress and experience deeper vertical mixing (Fee et al 1996). High temperatures strengthen the thermal stratification of lakes and enable a faster swimming speed of dinoflagellates, which may further improve conditions for the motile behavior of flagellated phytoplankton species (Kamykowski and McCollum 1986, Huisman et al 1999, Winder and Sommer 2012). Hence, although dinoflagellates display considerable physiological, morphological and behavioral variation, our results align with the common view that as a group they are generally well adapted to thermally stratified waters with a relatively low nutrient availability in the epilimnion.
In our study, the relative abundance and probability of dominance of dinoflagellates decreased strongly with pH (figures 4 and S3). One possible explanation for this pronounced pattern is that many dinoflagellate species can grow mixotrophically, combining photosynthesis and bacterivory (Stoecker 1999, Flynn et al 2013). Several studies have indicated that mixotrophic taxa are especially abundant in humic waters with high DOC concentrations and bacterial abundances (Jansson et al 1996, Hansson et al 2019). Humic waters also tend to have a relatively low pH, which may explain the prevalence of dinoflagellates in lakes with pH < 7.

Golden algae, cryptophytes and euglenophytes
According to the phycological literature, golden algae (including chrysophytes and synurophytes) are typically associated with oligotrophic or mesotrophic waters with a slightly acidic to neutral pH, cryptophytes can be found in a wide range of aquatic ecosystems, and euglenophytes are particularly abundant in nutrient-rich waters and also in acidic environments (De Huszar and Caraco 1998, Reynolds 2006, Wehr et al 2015). This is in good agreement with the distribution patterns observed in our study.
In general, less attention has been given to the CCMs of golden algae, cryptophytes and euglenophytes than to the CCMs of other phytoplankton groups. For cryptophytes, in particular, the existence of a CCM has not yet been well investigated (Kroth 2015). Chrysophytes can take up CO2 but not bicarbonate, and hence appear to lack a CCM (Maberly et al 2009). Euglenophytes have either no CCM or a relatively ineffective CCM, depending on the species (Colman and Balkos 2005). Hence, golden algae and euglenophytes are likely to benefit from high CO2 availability at low pH values. This is confirmed by our results, where relative abundances of golden algae and euglenophytes were highest in acidic lakes (figure 4), but remained low in alkaline lakes where bicarbonate is the predominant inorganic carbon source and CO2 concentrations are low.
The temperature response of golden algae, cryptophytes and euglenophytes has also received relatively little attention in the literature. Yet, all three groups showed a pronounced temperature response in the lake data. Specifically, golden algae and cryptophytes had higher relative abundances in cold lakes, whereas euglenophytes reached high relative abundance in warm lakes (figure 4). Indeed, several experimental and field studies have shown that golden algae prefer relatively cold waters and can be sensitive to high temperatures (Sandgren et al 1995, Butterwick et al 2005, Bergkemper et al 2018). Cryptophytes are common in temperate, boreal and polar lakes (McKnight et al 2000, Lepistö and Holopainen 2003), where they often dominate during or shortly after ice cover (Wiedner and Nixdorf 1998, Vehmaa and Salonen 2009). Conversely, blooms of euglenophytes are quite common in (sub)tropical climates and several other studies also found a positive relationship between euglenophyte abundances and water temperature (Chattopadhyay and Banerjee 2007, Rahman et al 2007).

Biodiversity patterns
Biodiversity is generally considered to be an indication of ecosystem functioning and resilience (Loreau et al 2001, Folke et al 2004, Ptacnik et al 2008b, Tilman et al 2014, Bestion et al 2021). Consistent with other lake studies, we found that phytoplankton biodiversity increased with temperature (Stomp et al 2011, Segura et al 2015), which supports the common observation of a latitudinal diversity gradient with highest species richness at low latitudes (Willig et al 2003, Hillebrand 2004, Pontarp et al 2019). Furthermore, our analysis showed a significantly lower taxon richness and Shannon-Wiener diversity index in lakes with high TP concentrations, which is in line with many other observations that the biodiversity of aquatic and terrestrial primary producers often declines at high nutrient levels, resulting in a low biodiversity of eutrophic ecosystems (Leibold 1999, Mittelbach et al 2001, Stevens et al 2004, Hautier et al 2009, Fraser et al 2015).
A classic prediction derived from the theory of island biogeography is that biodiversity will increase with habitat size (MacArthur and Wilson 1967). Since lakes represent 'aquatic islands' in a terrestrial landscape, this prediction may also apply to lake size. Indeed, a study on phytoplankton species richness across aquatic systems that spanned >15 orders of magnitude in spatial extent found that phytoplankton biodiversity increased with habitat area (Smith et al 2005). However, a study on phytoplankton species richness across USA lakes spanning a narrower spatial extent similar to our study (4 orders of magnitude) only found a very weak species-area relationship (Stomp et al 2011). In our analysis of >1000 lakes, neither phytoplankton taxon richness nor the Shannon-Wiener diversity index increased with lake area. Hence, at this continental scale, water temperature and nutrient status seemed to be the key determinants of phytoplankton biodiversity.

Caveats
Like any study, this study has a number of caveats. For example, the lake samples collected by the US EPA and analysed in our study were restricted to one point in time, and only considered one location in each lake. Furthermore, samples were taken from the top 2 m of the surface water, which may underrepresent species that prefer deeper water layers. That said, the elegant simplicity of this sampling protocol has enabled the collection of data from >1000 lakes spread across the continent, all using the exact same procedures, which is essential for a coherent analysis of large-scale patterns in community composition.
Another caveat is that microscopic analysis of lake samples reveals larger phytoplankton taxa, while picophytoplankton is often overlooked. Picophytoplankton are common in lakes, and their diversity and abundance can vary strongly with environmental variables (Callieri and Stockner 2002, Stomp et al 2007, Schiaffino et al 2013). The use of dedicated techniques such as flow cytometric analysis (Schiaffino et al 2013, Mojica et al 2015) and high-throughput sequencing of 16S and 18S ribosomal RNA (Pierella Karlusich et al 2020, Vuorio et al 2020) may help to identify picophytoplankton in future lake surveys and elucidate their contribution to the phytoplankton community composition across large spatial scales.

Conclusions
Overall, our analysis of this large-scale data set shows that TP, temperature, pH and lake size are major determinants of the phytoplankton community composition in lakes. Interestingly, three of these four environmental parameters are associated with major human impacts on aquatic ecosystems, i.e. eutrophication results in higher phosphorus inputs into lakes, global warming is predicted to increase lake temperature, and rising atmospheric CO2 concentrations in combination with enhanced DOC inputs from the surrounding watershed may lower the pH. Our results are based on correlations, which do not necessarily reflect causal relationships, as phytoplankton communities may be influenced by TP, temperature and pH through a variety of pathways and mechanisms. Moreover, the observed lake-to-lake variation in taxonomic composition is substantial and not all of this variation can be captured by just four environmental variables. Despite these cautionary notes, the comprehensive patterns in this extensive data set nicely illustrate how phytoplankton community structure varies across large spatial scales, and provide a tentative indication of which phytoplankton taxa are likely to be among the winners and losers in response to anthropogenic pressures such as eutrophication and global warming.

Figure 1. Correlation matrix of environmental variables measured in 1041 lakes across the conterminous USA. Left panels show scatter plots of the data; each dot represents an individual lake. All environmental variables except pH, temperature and dissolved oxygen were log-transformed to improve homoscedasticity of the data. Right panels report Pearson correlation coefficients, where the font size increases with the magnitude of the correlation coefficients to highlight the strongest correlations. n.s. = not significant.
(A)). Euglenophytes had high relative abundances in the South and along the East Coast (figure 2(B)). Diatoms had high relative abundances in many lakes throughout the USA, and particularly in the Rocky Mountains and the Northeast (figure 2(C)). Dinoflagellates were less abundant in the Great Plains and Southwest than in other regions (figure 2(D)). Cryptophytes and chlorophytes occurred throughout the

Figure 3. Similarity in phytoplankton taxon composition as a function of the distance between lakes. (A) Sørensen similarity index, (B) Bray-Curtis similarity index. The boxplots show the median value ± IQR. The data are based on pairwise comparison of 1041 lakes, which yields n = (1041² − 1041)/2 = 541 320 values of each similarity index.
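The two similarity indices and the number of pairwise comparisons can be sketched as follows. This example uses the standard textbook definitions, with Sørensen similarity based on presence or absence of taxa and Bray-Curtis similarity based on abundances. The function names, the dictionary layout and the toy lakes are assumptions of this example, and whether the study computed Bray-Curtis on raw or relative abundances is not stated in this section.

```python
def sorensen_similarity(lake_a, lake_b):
    """2 * shared taxa / (taxa in A + taxa in B), using presence/absence only."""
    taxa_a = {taxon for taxon, value in lake_a.items() if value > 0}
    taxa_b = {taxon for taxon, value in lake_b.items() if value > 0}
    if not taxa_a and not taxa_b:
        raise ValueError("both lakes are empty")
    return 2.0 * len(taxa_a & taxa_b) / (len(taxa_a) + len(taxa_b))


def bray_curtis_similarity(lake_a, lake_b):
    """2 * sum of shared (minimum) abundance / total abundance of both lakes."""
    taxa = set(lake_a) | set(lake_b)
    shared = sum(min(lake_a.get(t, 0.0), lake_b.get(t, 0.0)) for t in taxa)
    total = sum(lake_a.values()) + sum(lake_b.values())
    if total <= 0:
        raise ValueError("both lakes are empty")
    return 2.0 * shared / total


def number_of_pairs(n_lakes):
    """Number of unordered lake pairs: (n^2 - n) / 2."""
    return n_lakes * (n_lakes - 1) // 2


# Hypothetical abundances (arbitrary units) for two lakes, illustration only.
lake_a = {"Anabaena": 3.0, "Fragilaria": 1.0}
lake_b = {"Anabaena": 1.0, "Oocystis": 1.0}

print(sorensen_similarity(lake_a, lake_b))
print(bray_curtis_similarity(lake_a, lake_b))
print(number_of_pairs(1041))
```

The two toy lakes share one of their two taxa each, yet their Bray-Curtis similarity is lower than their Sørensen similarity because the shared taxon differs in abundance between the lakes.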

Figure 4. Relative abundances of phytoplankton groups in lakes with different total phosphorus concentrations (TP), temperature, pH, and lake area. (A) Cyanobacteria, (B) euglenophytes, (C) diatoms, (D) dinoflagellates, (E) cryptophytes, (F) chlorophytes, (G) desmids, and (H) golden algae. Relative abundances were calculated on the basis of phytoplankton biovolume, and expressed as median values ± IQR. The environmental variables were each classified into four categories according to table 1. Bars labeled with different letters (a, b, c, d) indicate significant differences between lakes with different total phosphorus concentrations, temperature, pH, or lake area, as tested by nonparametric Kruskal-Wallis tests (table S4) followed by post-hoc comparison of the relative abundances using Dunn's test.
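As background to the Kruskal-Wallis tests mentioned in the figure captions, the sketch below computes the rank-based H statistic with the usual correction for ties. It is a plain illustration of the textbook formula, not the code used in this study. The p-value (from a chi-square distribution with k − 1 degrees of freedom for k categories) and Dunn's post-hoc comparisons are omitted, and would in practice be obtained from a statistics library. Function names and input values are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted_rank_sums = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_rank_sums += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * weighted_rank_sums - 3.0 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    correction = 1.0 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    return h / correction


# Hypothetical relative abundances (%) in three environmental categories.
toy_categories = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(kruskal_wallis_h(toy_categories))
```

Because the test works on ranks rather than raw values, it makes no assumption of normality, which suits relative abundances that are bounded and often strongly skewed.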

Figure 5. Relative abundances of phytoplankton genera in lakes with different total phosphorus (TP) concentrations. (A) Cyanobacteria, (B) chlorophytes, (C) diatoms, (D) golden algae, (E) cryptophytes, (F) dinoflagellates, (G) euglenophytes, and (H) desmids. All genera present in >100 lakes are shown. Relative abundances were calculated on the basis of phytoplankton biovolume, and expressed as mean percentage ± SE. Total phosphorus concentrations were classified into four categories according to table 1. Bars labeled with different letters (a, b, c) indicate significant differences between lakes with different total phosphorus levels, as tested by nonparametric Kruskal-Wallis tests (table S5) followed by post-hoc comparison of the relative abundances using Dunn's test.

Figure 6. Relative abundances of phytoplankton genera in lakes with different water temperatures. (A) Cyanobacteria, (B) chlorophytes, (C) diatoms, (D) golden algae, (E) cryptophytes, (F) dinoflagellates, (G) euglenophytes, and (H) desmids. All genera present in >100 lakes are shown. Relative abundances were calculated on the basis of phytoplankton biovolume, and expressed as mean percentage ± SE. The temperature of the lakes was classified into four categories according to table 1. Bars labeled with different letters (a, b, c, d) indicate significant differences between lakes of different temperatures, as tested by nonparametric Kruskal-Wallis tests (table S6) followed by post-hoc comparison of the relative abundances using Dunn's test.

Figure 7. Relative abundances of phytoplankton genera in lakes with different pH. (A) Cyanobacteria, (B) chlorophytes, (C) diatoms, (D) golden algae, (E) cryptophytes, (F) dinoflagellates, (G) euglenophytes, and (H) desmids. All genera present in >100 lakes are shown. Relative abundances were calculated on the basis of phytoplankton biovolume, and expressed as mean percentage ± SE. The pH of the lakes was classified into four categories according to table 1. Bars labeled with different letters (a, b, c, d) indicate significant differences between lakes with different pH, as tested by nonparametric Kruskal-Wallis tests (table S7) followed by post-hoc comparison of the relative abundances using Dunn's test.

Figure 8. Relative abundances of phytoplankton genera in lakes of different sizes. (A) Cyanobacteria, (B) chlorophytes, (C) diatoms, (D) golden algae, (E) cryptophytes, (F) dinoflagellates, (G) euglenophytes, and (H) desmids. All genera present in >100 lakes are shown. Relative abundances were calculated on the basis of phytoplankton biovolume, and expressed as mean percentage ± SE. Lake area was classified into four categories according to table 1. Bars labeled with different letters (a, b, c) indicate significant differences between lakes with different lake areas, as tested by nonparametric Kruskal-Wallis tests (table S8) followed by post-hoc comparison of the relative abundances using Dunn's test.

Figure 9. Biodiversity of phytoplankton in 1041 lakes. (A)-(D) Taxon richness in lakes with different (A) total phosphorus concentrations (TP), (B) temperature, (C) pH, and (D) lake area. (E)-(H) Shannon-Wiener diversity index in lakes with different (E) total phosphorus concentrations (TP), (F) temperature, (G) pH, and (H) lake area. The biodiversity estimates are expressed as median values ± IQR. The environmental variables were each classified into four categories according to table 1. Bars labeled with different letters (a, b, c) indicate significant differences between lakes with different total phosphorus concentrations, temperature, or pH, as tested by nonparametric Kruskal-Wallis tests (table S10) followed by post-hoc comparison of the relative abundances using Dunn's test.

Table 1. Classification of lakes into four categories according to total phosphorus concentration, water temperature, lake pH and lake area. The number of lakes (n) in each category is indicated between brackets.