Are African irrigation dam projects for large-scale agribusiness or small-scale farmers?

The economic development of rural economies across the global south is often related to access to water and the development of water infrastructure. It has been argued that the construction of new dams would unleash the agricultural potential of African nations that are exposed to seasonal water scarcity, strong interannual rainfall variability, and associated uncertainties in water availability. While water security is often presented as the pathway to poverty alleviation and invoked to justify large dam projects for irrigation, it is still unclear to what extent small holders will benefit from them. Are large dams built to the benefit of subsistence farmers or of large-scale commercial agriculture? Here we use remote sensing imagery in conjunction with advanced machine learning algorithms to map the irrigated areas (or ‘command areas’) that have appeared in the surroundings of 18 major dams built across the African continent between 2000 and 2015. We quantify the expansion of irrigation afforded by those dams, the associated changes in population density, forest cover, and farm size. We find that, while in the case of nine dams in the year 2000 there were no detectable farming patterns, in 2015 a substantial fraction of the command area (ranging between 8.5% and 96.7%) was taken by large-scale farms (i.e., parcels >200 ha). Seven of the remaining 9 dams showed a significant increase in average farm size and number of farms between 2000 and 2015, with large-scale farming accounting for anywhere between 5.2% and 76.7% of the command area. Collectively, these results indicate that many recent dam projects in Africa are associated either with the establishment of large-scale farming or a transition from small-scale to mid-to-large scale agriculture.


Introduction
Water availability is often viewed as a major factor constraining global food production and other economic activities under climate change and population growth (D'Odorico et al 2018, Villani et al 2018, Deines et al 2019). Indeed, water security is crucial to poverty alleviation and economic development, particularly in marginalized rural communities that rely on agriculture (e.g. Doss et al 2014). Research from the last decade has emphasized that the sustainable development of agriculture should prevent further cropland expansion to avoid habitat destruction and associated biodiversity losses (Godfray et al 2010, Foley et al 2011). In other words, an environmentally sustainable increase in crop production should not expand the global farmland footprint but rely on the intensification of agriculture, which often requires bringing irrigation to cultivated areas that are presently rainfed (Jägermeyr et al 2017, Rosa et al 2018). While irrigated agriculture is essential to sustaining growing trends in food production (Abdullah 2006, Rosa et al 2018, Deines et al 2019), it often relies on limited water resources and claims the largest share of human water consumption worldwide (e.g. Nino et al 2016). To improve the use of the limited freshwater resources available for irrigation, humankind has often relied on water storages, including groundwater stocks and surface water bodies, ranging from small farm-scale ponds to major reservoirs upstream from large dams (Van Der Zaag and Gupta 2008). These storages can make land a preferential target for large-scale land acquisitions, as already reported in the case of other infrastructure such as roadways (Grandia 2013).
These questions are difficult to address because of the lack of data and methods to investigate the socioenvironmental impacts of dam construction. Access to records of land use, concessions, permits and licenses is often difficult, as is documenting the drivers of eviction, relocation or resettlement of local communities (D'Odorico and Rulli 2014). Socioeconomic responses to dam construction likely depend on where the land is located because upstream and downstream cropped areas are expected to be impacted differently. In fact, while upstream areas seldom change in crop productivity (except for the areas that are flooded by the reservoir), the response of downstream regions depends on the proximity to the dam (Strobl and Strobl 2011). Therefore, here we focus on active command areas, defined as the areas irrigated using water from the dam. Such areas are here determined as the newly irrigated areas appearing in the surroundings of the dam after its commissioning. Unlike global studies that have mapped irrigated areas at coarse scales (e.g. Thenkabail et al 2009, Siebert et al 2015, Nagaraj et al 2021), here we use Landsat imagery (30 m) to map irrigated areas during crop growing periods at the local scale. We propose methods based on machine learning algorithms as an alternative to the hydrologic calculations proposed by Rufin et al (2018) for those regions where detailed hydrologic data are not available.
We concentrated on a collection of georeferenced irrigation dams in Africa built in the years after 2000, when the recent land rush took place in Africa and other regions of the world. We investigate the evolution of farm sizes and actual command areas and quantify the environmental and human impacts.

Study areas
We focused on African dams built after the year 2000, a period that has seen a rise in large-scale land investments globally. We used the georeferenced dam registry from the Food and Agriculture Organization (FAO). This selection (table A1 (available online at stacks.iop.org/ERC/4/015005/mmedia)) was based on the year our study period ended, 2015, and on when the dam started to operate. We concentrated only on dams for irrigation use as denoted in the AQUASTAT dam list; dams denoted as planned or lacking an opening date were not included. Dams for hydropower generation or other non-agricultural and mixed uses were not included. These criteria led to a list of 18 dams in 7 countries in total (figure 1, table A1, supplementary materials). A point shapefile of these dams was created and exported to Google Earth Engine (GEE), a cloud-based platform that includes a rich library of remote-sensing and other geospatial datasets along with several sophisticated tools for geospatial and remote sensing data analysis in a JavaScript API environment. The image processing methods are described in the following section and a roadmap is shown in figure A1 in the supplementary materials.

Data and methods
In addition to the data described in section 2, the MODIS Terra Land Surface Temperature collection (Wan et al 2015) was used to investigate temperature changes in the study areas over the 2000-2015 period (table A2 in supplementary materials), and the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) (Funk et al 2015) was used to infer mean precipitation from daily data at all dams (table A2). Interestingly, there were only minimal interannual temperature changes through the study period. Similarly, precipitation data were extracted to evaluate whether changes in greenness (NDVI) in the command areas were driven by changes in precipitation. A Mann-Whitney test was used to compare mean land surface temperature and mean precipitation between the period leading to 2000 and the period leading to 2015 (table A2).
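As an illustration of the comparison described above, the following is a minimal sketch of a two-sided Mann-Whitney U test in plain Python. It is not the authors' code: the function name `mann_whitney_u` is hypothetical, the p-value relies on the normal approximation with a tie correction (no continuity correction), and the precipitation values in the usage lines are invented purely to show the call.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p_value), where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)

    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p_value


# Invented annual precipitation values (mm), for illustration only.
precip_before = [610.0, 580.0, 640.0, 600.0, 595.0]
precip_after = [605.0, 590.0, 630.0, 615.0, 585.0]
u_stat, p = mann_whitney_u(precip_before, precip_after)
print(f"U = {u_stat}, p = {p:.3f}")
```

A large p-value in such a comparison would be consistent with the interpretation in the text that greenness changes are not explained by precipitation changes; for small samples an exact test would be preferable to the normal approximation used here.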
Image processing
GEE enables high-performance and quick remote sensing analyses (Gorelick et al 2017). Using the newest cloud-free images from the Landsat 8 Operational Land Imager (OLI), a rectangular area of interest that fully contained a watershed, delineated using the dam as the outlet point, was drawn to include all the immediate irrigated areas and all upstream inundated areas. On GEE, Shuttle Radar Topography Mission (SRTM) (Farr et al 2007) 30 m elevation imagery was used for the delineation of the upstream watershed and the downstream floodplains. The size of the rectangle was guided by the delineated watershed boundary.
Ground truth data had previously been collected in 2015 at two dams each in South Africa and Ethiopia (four in total); the rest of the training points were determined visually using very high-resolution imagery on Google Earth Pro. On Google Earth Pro, training data were acquired by setting the time slider to the year of interest and identifying land cover classes within the previously identified command area of interest. These were later exported to ArcGIS as KML files and converted in ArcGIS 10.7 into shapefiles that could be used as training data. Training data were collected for both years (i.e., 2000 and 2015) to help differentiate land cover classes that did not exist in 2000, especially irrigation. Training data for the year 2000 were delineated using only clearly discernible features, to avoid mislabeling classes.
However, even though this task was carefully performed, in cases where no clear classification was possible because vegetation types were mixed, or changed among sparse shrubs, mixed tree-shrub-grass plains, and sparse non-photosynthetic vegetation on bare ground, we labeled the class based on the one that looked dominant. This choice, however, did not affect our results, as the focus of this study is on irrigated croplands.
To understand the difference in spectral signatures between these irrigated areas and other land cover classes, we plotted a spectral band chart for each feature. Known cultivated areas showed higher NDVI and SWIR values; however, irrigated areas showed even higher NDVI than their rainfed counterparts. Irrigated areas are expected to have lower surface temperature than the adjacent non-irrigated areas because a greater fraction of the incoming solar irradiance is spent in evapotranspiration (i.e., it is turned into latent heat fluxes) (e.g. Droogers and Bastiaanssen 2002). Therefore, a correspondingly smaller fraction of the incoming solar energy is available to increase the sensible heat of the soil surface (Sacks et al 2009). This was in accordance with past work designed to identify irrigated croplands (Wu and De Pauw 2011).
Because our study period spans more than one Landsat mission, the 30 m data came from two sensors: Landsat 5 Thematic Mapper (TM) data were used for the year 2000 and Landsat 8 data were used for the year 2015. Our criteria for image selection accounted for crop seasonality, based on FAO's crop calendars for each country (http://www.fao.org/agriculture/seed/cropcalendar/welcome.do). The occurrence and timing of multiple growing seasons were detected by looking at the temporal variability of crop vigor based on the NDVI signal. Specifically, we used all images available for each area of interest and assumed that multiple spikes in NDVI depicted multiple growth periods (figure 2(a(i))).
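The idea of reading growing seasons from spikes in the NDVI signal can be sketched as follows. This is a simplified illustration rather than the procedure actually run on GEE: the function names are hypothetical, the `min_ndvi` cutoff of 0.4 is an assumed value chosen only for the example, and the time series is invented.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0


def find_growth_peaks(series, min_ndvi=0.4):
    """Return indices of local maxima in an NDVI time series.

    A point counts as a peak when it is strictly higher than the previous
    value, at least as high as the next one, and at least `min_ndvi`
    (an assumed cutoff, not a value taken from the study).
    """
    peaks = []
    for i in range(1, len(series) - 1):
        is_local_max = series[i - 1] < series[i] >= series[i + 1]
        if is_local_max and series[i] >= min_ndvi:
            peaks.append(i)
    return peaks


# Invented NDVI values for one area of interest, one value per image date.
series = [0.20, 0.30, 0.60, 0.40, 0.20, 0.25, 0.55, 0.70, 0.50, 0.20]
print(find_growth_peaks(series))  # indices of candidate growing-season peaks
```

Two separated peaks in such a series would be read as two growing periods in the same year; real series are noisier and would normally be smoothed or cloud-masked first.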

Imagery classification and command area mapping
Our classification goal was to identify land cover classes and, subsequently, land use in years 2000 and 2015 as two snapshots representative of land use conditions existing before and after dam construction, respectively. In a pilot study based on known irrigated areas in Bergrivier and Injaka dams (South Africa), and Tendaho and Amerti dams (Ethiopia), actual ground truth data was collected and spectral plots of bands of the different classes were computed to determine what bands were the most important in identifying these land cover classes.
Further identification of training points was aided by visually identifying features on Google Earth Pro and locating them on actual Landsat imagery.
A random forest classifier was then employed to classify imagery in the years 2000 and 2015, for the periods identified as crop growing seasons only. Classifications were carried out on a case-by-case basis because crop growing seasons differ among regions. The random forest classifier was chosen because it showed better accuracy than classification and regression trees (CART) on the GEE platform. Guided by our research questions and the 30 m image resolution, we chose the following land cover classes: water, buildings, rainfed agriculture, irrigated agriculture, riparian vegetation, bare areas, woody vegetation, grass, and fallowed regions. Data were split 70/30 for training and validation. To avoid overfitting, the mean absolute error was evaluated for maximum leaf node counts of 5, 50, 200 and 500, and the tree size with the least mean absolute error (200) was chosen. This model was then saved and used in Python to classify the rest of the imagery. Command area determination was mostly based on irrigated land cover classes and their proximity to the dam and/or canals that extend from a dam, aided by visual examination of Google Earth Pro imagery. Likewise, the areas irrigated prior to the construction of the dam were determined using imagery from 2000, adopting the same methods as for 2015. In GEE, changes in land cover and land use in the command areas as well as in the flooded areas upstream from the dam were investigated. To that end, forest cover in 2000 and forest loss per year between 2000 and 2015 were quantified using the Hansen Global Forest Cover dataset (Hansen et al 2013).
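The 70/30 split and the choice of tree size by least mean absolute error can be sketched in a library-independent way. The sketch below is an assumption-laden illustration, not the authors' pipeline: all function names are hypothetical, and `validation_error` stands in for whatever routine trains a model with a given maximum number of leaf nodes and returns its validation error. Only the candidate values 5, 50, 200 and 500 come from the text.

```python
import random

CANDIDATE_LEAF_NODES = [5, 50, 200, 500]  # candidate sizes quoted in the text


def split_train_validation(samples, train_fraction=0.7, seed=42):
    """Shuffle a copy of `samples` and split it into training and validation parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def select_leaf_nodes(candidates, validation_error):
    """Return the candidate with the smallest validation error.

    `validation_error` is a hypothetical callable mapping a candidate
    maximum leaf node count to the error measured on the validation set.
    """
    return min(candidates, key=validation_error)


# Usage with placeholder data: ten dummy samples and a stand-in error function.
train, valid = split_train_validation(range(10))
print(len(train), len(valid))
```

In practice `validation_error` would fit the classifier on the training part and score it on the validation part; keeping it as a callable makes the selection step independent of the machine learning library used.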
During initial data exploration, we plotted spectral bands for the different land cover classes identified (figure 3). The NIR and SWIR1 bands proved to be the most important in identifying cultivated areas, and although the SWIR1 signature did not vary much between irrigated and rainfed land cover classes, the random forest classifier was able to identify this difference.
Similarly, human population in 2000 and 2015 was extracted from the Global Human Settlement Layers, Population Grid (GHSL) dataset (European Commission et al 2015) for both the command areas and the upstream flooded areas.

Farm size determination
Using the resampled land cover classes for irrigated and rainfed farming, combined with NDVI thresholds for the two forms of farming in the two years, a Canny edge detection algorithm (Ali and Clausi 2001) was used to determine farm edges and, consequently, farm sizes. Pre-determined median NDVI values were used, with a minimum NDVI threshold of 0.3 in 2000, to include both rainfed and irrigated farms, and 0.5 in 2015, to assess only farms influenced by irrigation. We ran post edge detection cleaning to remove areas with riparian vegetation stands in clear cases where they were delineated together with farm parcels (figure A3). Using the 200 ha cutoff for large-scale farms (Land Matrix 2020), we classified the land parcels in the command area as large- or small-scale farms. We also looked at the entire farm size distribution, as shown in the results section. Assessing farm sizes based on 30 m Landsat pixels prevents features smaller than 30 m from showing in our analyses. In other words, our results can be affected by the minimum mapping unit in the estimation of farm edges and, consequently, farm sizes. This means that farms that are close together could easily be consolidated into one, and plots within a farm that are far from each other could be taken to be individual farms. Because of these limitations in the resolution of the farm size assessment, the results of our analysis are here used to evaluate a general trend in land parcel size change rather than to provide an accurate high-resolution estimate of farm size. Mean farm size in 2000 and 2015 was computed for each dam, and the significance of the difference in means was tested using the Mann-Whitney test. We also investigated the presence of large-scale farms using the same 200 ha threshold. Changes in the mean size of large-scale farms between the two years were determined only for command areas in which large-scale farms existed in 2000.
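To make the area arithmetic concrete, the sketch below converts groups of 30 m pixels into hectares and computes the share of the mapped area held by parcels above the 200 ha cutoff. It deliberately swaps in a simpler technique than the one described above: parcels are found by 4-connected component labeling of a binary cropland mask rather than by Canny edge detection, so it illustrates the bookkeeping only. The function names and the toy mask are hypothetical.

```python
from collections import deque

PIXEL_AREA_HA = 30 * 30 / 10_000  # one 30 m x 30 m Landsat pixel = 0.09 ha
LARGE_FARM_HA = 200               # large-scale cutoff used in the text


def parcel_areas_ha(mask):
    """Areas (ha) of 4-connected groups of truthy cells in a 2-D mask."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search over one connected parcel.
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = 0
            while queue:
                y, x = queue.popleft()
                pixels += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(pixels * PIXEL_AREA_HA)
    return areas


def large_farm_share(areas):
    """Fraction of the total parcel area held by parcels above the cutoff."""
    total = sum(areas)
    if total == 0:
        return 0.0
    return sum(a for a in areas if a > LARGE_FARM_HA) / total


# Toy mask with two separate two-pixel parcels (0.18 ha each).
toy_mask = [
    [1, 1, 0],
    [0, 0, 0],
    [0, 1, 1],
]
print(parcel_areas_ha(toy_mask))
```

At this resolution a 200 ha parcel corresponds to roughly 2223 pixels, and the sketch shows the same limitation noted in the text: adjacent farms touching by a single pixel are merged into one parcel.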

Results
Most dam cases included in this study exhibit an expansion of irrigation after the construction of the dam with the appearance of command areas that by far exceed in size the pre-existing irrigated areas (table 1) as shown in figure 2 for two illustrative cases. In 10 of these dams there was almost no irrigation before the construction of the dam (though these areas were often cultivated with rainfed farming), while in the other 9 cases less than 50% of the command areas was irrigated in the year 2000, mostly in the form of floodplain irrigation. Kissir and Brezina dams of Algeria show no sign of irrigation implementation (table 1, figure 3). Overall, these results show how, as expected, dams lead to the emergence of new irrigated areas, known as 'command areas' (Rufin et al 2018). To evaluate the impact these new or expanded irrigation projects have on the environment and local communities, we look at changes in forest cover, population density, the number of farms and their size. Our results do not show any discernable dependence of increase in command area with factors such as dam height, expected dam capacities or any other dam physical properties, suggesting that other factors related to the implementation of irrigation matter more. Our analysis of precipitation shows no major changes between the period leading to 2000 and 2015 (table A2) that could have confounded the mapping of the command areas by mixing the effects of irrigation on land surface attributes to those of precipitation.
Our image classification results show high overall accuracies (>90%) in areas with few or no woody crops such as grapevines and orchards. Regions with fruit trees and grapevines show overall accuracies above 80%; the lower accuracy is due to difficulties in distinguishing irrigated tree plantations from green riparian areas (table 5). Changes in forest cover are modest (<1%) in all command areas. Moreover, some of these areas did not exhibit any forest cover and/or loss at all. While changes in tree cover are negligible, command areas exhibit important changes in land use. In addition to the expansion of irrigation (figures 2(i) and (ii), table 1), we observed a change in farm size, with evidence of an overall transition from small-scale subsistence farming to large-scale commercial agriculture (table 3). Results of the non-parametric Mann-Whitney U-test comparison of mean farm size (μ_A) between 2000 and 2015 show a statistically significant difference (except for Steelpoort, South Africa), with an increase in mean farm size during the study period. In the case of 9 dams there were no detectable farming patterns in 2000 and therefore we reported a zero average farm size. Overall, the number, n, of farms has dramatically increased after the construction of these dams, except for the case of Injaka, where it slightly decreased because of a process of consolidation and merging of land parcels. The increase in mean farm size, number of farms (table 3), and overall irrigated areas (table 1) indicates not only that farming has expanded within the command areas (i.e., the total farmed area, μ_A × n, has increased) but also that many new farms have appeared and that they are bigger in size (hence the increase in μ_A). Thus, mid- to large-scale farming is benefiting from new irrigation opportunities emerging in areas that were either only partly used for agriculture or not farmed at all prior to the construction of the dam.
Indeed, in the command areas of Talo in Mali, Tendaho and Koga in Ethiopia, Zitemba, Taksebt, Fontaine and El Agrem in Algeria, and Hassan in Morocco there was no recognizable sign of farming prior to dam construction, but large-scale farms (i.e., farms greater than 200 ha) had been established in these areas by 2015. Altogether, large-scale farms account for a substantial fraction of the command area, ranging between 8.5% and 96.7% (table 3). Interestingly, the number of these very large farms is only a small percentage (<0.25%) of the total number of farms, even though their total area is a substantial fraction of the command area, which suggests that individual large farms cover very large areas (table 3).
Because the threshold of 200 ha commonly used to characterize large-scale farming (e.g., Land Matrix 2020) is somewhat arbitrary, we also considered the entire distribution of farm size (table 4). The case of Bergrivier is an exception to the general pattern of increase in the area occupied by large farms between 2000 and 2015, since this dam is shown to have been built upstream of previously irrigated areas. In fact, in this case there is an increase in both small (<1 ha) and mid-sized (10-50 ha) farms, while the area of bigger farms (>50 ha) dropped to zero between 2000 and 2015. In Injaka and Taksebt, the farm size distribution has not dramatically changed, though in Injaka there has been a slight increase in both small (<1 ha) and small to mid-sized farms (1-10 ha) at the expense of bigger farms. Of note is again the case of Steelpoort, where the area occupied by small farms has increased from 7.5% to 47.2% while the total area of mid-sized farms (10-50 ha) has decreased from 79.8% to 34.7%, indicating that this dam project was realized to the benefit of smallholders. Interestingly, the farm sizes detected by our analysis are consistent with the values reported in the FAO family farming platform dataset (Lowder et al 2016). The asterisk (*) denotes statistically different mean farm sizes between 2000 and 2015 (p-value < 0.001), while 'ns' refers to differences in mean values that are not statistically significant.

Discussion and conclusions
The current controversy about large-scale land acquisitions by foreign investors has raised questions regarding the world's future development and the associated impacts, especially on developing countries (Robertson and Pinstrup-Andersen 2010, de Schutter 2011, Liversage 2010). This debate has opened important international discussions on how to improve land administration systems and investment in agriculture and infrastructure, so that the land rights and livelihoods of smallholder farmers, pastoralists and other vulnerable groups are protected (de Schutter 2011). Liversage (2010) notes that, in our attempt to understand the impacts of foreign investment and the development trajectories of poor communities, we should not divert our attention from the influence of local administration systems and the positive influence that foreign investment might have on an area of interest. As with large transnational land deals, the construction of dams has been viewed as the result of non-democratic decision making (Franco et al 2013), resulting in what is critically known as 'water grabbing' (Mehta et al 2012, Dell'Angelo et al 2018). This term refers to a condition in which local populations or ecosystems lose their rights (either formal or informal) to use water or are forced to share it with powerful, self-interested actors who take control of water resources in that specific area (Franco et al 2013).
Often forgotten in the debate on large-scale land deals, water can be a major target and driver of land investments (Mehta et al 2012). Similarly, our results demonstrate how the construction of new dams for irrigation can favor large-scale farming in the areas that will benefit from the improved access to irrigation water, thereby contributing to a transition from small-scale subsistence farming to large-scale commercial agriculture. The construction of dams has often been a controversial approach to the sustainable development of agriculture (WCD 2000, Scudder 2012). Proponents of dams invoke the social and economic benefits provided by these infrastructures, including irrigation, hydro-electric power generation, water supply and other uses. Our results show that irrigated areas increased by >55% in the command areas of dams that showed signs of irrigation in 2000, while in the command areas that showed no irrigation in 2000 the irrigated area increased from 0 to a maximum of ∼9000 ha in 15 years (table 2). We need to stress that this does not depict the actual rate of command area growth, since the dams studied here were built in different years between 2000 and 2015 and their ages ranged from 3 to 14 years. It is important to note that the implementation of dam and irrigation projects does not follow a defined timeline that can be used to infer how much of an irrigation project will be completed or when. We therefore find the expansion of command areas to vary among dams.
Interestingly, we find that, while less than 2% of the farms are greater than 200 ha, in many cases they claim a substantial fraction of the command area (tables 4 and 5). Overall, large dam projects tend to favor the establishment of large to mid-sized farms, though some exceptions exist (e.g., Steelpoort and, to some extent, Bergrivier). This outcome is also illustrated by cases such as that of Talo dam, Mali, which was initially meant to allocate small-scale and medium-scale farms to rural communities (Meierotto 2009), while our results show that more than 96% of the irrigated area belongs to large-scale farms (even though large farms accounted for less than 1% of the total number of farms) (table 4). Overall, in most command areas the mean farm sizes in 2000 and 2015 were significantly different (table 4), supporting the hypothesis of a transition to larger scale farming. According to the AQUASTAT dam lists (http://www.fao.org/aquastat/en/databases/dams), 53 dams have been planned for future construction in Africa for agriculture. The methods and analyses developed in this manuscript can be used to monitor changes in cultivation, farming practices (including irrigation) and land tenure in the areas downstream from those dams and to investigate the possible impacts on local communities and rural livelihoods. Specifically, it will be important to evaluate the extent to which the construction of these water infrastructures will contribute to the expansion or the intensification of agriculture (or both) and to the dispossession of local farmers, fishing and pastoralist communities.

Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: https://doi.org/10.6084/m9.figshare.14745456.