Food flows between counties in the United States from 2007 to 2017

Food supply chains are essential for distributing goods from production to consumption points. These complex supply chains are important for food security and availability. Recent research has developed novel methods to estimate food flows with high spatial resolution, but we do not currently understand how fine-grained food supply chains vary in time. In this study, we use an improved version of the Food Flow Model to estimate food flows (kg) between all county pairs across all food commodity groups for the years 2007, 2012, and 2017 (which requires estimating 206.3 million links). We then determine the core counties to the US food flow networks through time with a multi-criteria decision analysis technique. Our estimates of county-to-county food flows in time are freely available with this paper and could be useful for future research, policy, and decision-making.


Introduction
Food supply chains are complex systems that incorporate production, distribution, intermediate processing, and consumption of food commodities (Porkka et al 2013, MacDonald et al 2015). Recent studies have estimated food flows with high subnational spatial resolution (Lin et al 2019), but we do not currently understand how fine-grained food supply chains vary in time. Food supply chains propagate and attenuate shocks (Heslin et al 2020, Gomez et al 2021, Karakoc and Konar 2021), embody resources (Weber and Matthews 2008, Dang et al 2015, Robinson et al 2016, Metson et al 2020), and depend upon infrastructure (Attavanich et al 2013). Estimates of how high-resolution food flows vary in time would enable an assessment of spatiotemporal risks in food supply chains, critical infrastructure, and environmental footprints. The goal of this paper is to estimate food flows between counties in the United States for multiple years (2007, 2012, and 2017) and identify the counties that are core to the network.
The United States is an important nation in the global food system. It is a major producer, consumer, and trade power in agri-food commodities (Xu et al 2011, Konar et al 2018). The US produces over 50% of the world's soybean and 30% of the world's corn (2020b). The US also accounts for significant fractions of the world's export market for sorghum and wheat (70% and 25%, respectively) (2020b). The US is also a key nation for global processed food trade. In fact, the US is the top exporter of processed food commodities with an average of 16.19% of market share between 1980 and 2012 (Baiardi et al 2015). It is also the top importer nation for processed foods with 13% of market share in 2018 (Suanin 2021). Climate change is likely to influence domestic grain transport in both volumes and modes within the US (Attavanich et al 2013), making it important to understand current patterns.
The US also has widely available data to enable data-driven studies of its food supply chain. Of particular importance, the commodity flow survey (CFS) and freight analysis framework (FAF) provide comprehensive data on freight movement among the 132 FAF zones within the US, which represent states and major metropolitan areas. Note that the CFS/FAF data is only available every 5 years (years ending with '2' and '7') and for coarse commodity categories, rather than individual items. The availability of CFS/FAF data within the US enabled Lin et al (2019) to develop the Food Flow Model to estimate food flows between counties. The Food Flow Model is a data-driven approach to estimate food flows between counties in the US that integrates machine learning, network properties, production and consumption statistics, mass balance constraints, and linear programming (Lin et al 2019). The Food Flow Model was initially developed for a single year (2012). In this paper, we extend the Food Flow Model in time with some improvements to the original model.
This study builds upon the previous literature of agri-food flow modeling within the United States. Smith et al (2017) estimated the movement of corn used as biofuel among the counties of the US with a transportation optimization model. Lin et al (2014) analyzed the network properties of the state-level US food flows, finding a power-law relationship between node betweenness centrality and node degree, indicating potential network vulnerability to the disturbance of key nodes. Konar et al (2018) studied how the statistical network properties of US food flows compare with global food trade and village-scale food sharing. Konar et al (2018) concluded that nodal mass flux follows a Gamma distribution across the full range of spatial scales, which means that there is high heterogeneity in the distribution of food mass, such that the majority of nodes exchange small masses of food, while some outlier nodes exchange large quantities of food. This observation was a key insight in the development of the Food Flow Model in Lin et al (2019), which preserves this high mass flux heterogeneity through maintenance of the Gamma distribution of mass flux at the county spatial resolution (Lin et al 2019). Note that all of these studies examined spatial patterns and did not consider the time trends in domestic food flows. We build on this literature by continuing to focus on fine-grained spatial fluxes of food, but with the additional consideration of time.
The goal of this study is to estimate food flows between all county pairs in the United States through time. To do this, we apply an improved version of the Food Flow Model (Lin et al 2019), which is a data-driven model, to quantify high spatial resolution agri-food commodity fluxes within the US for the years 2007, 2012, and 2017. We then use our estimates to determine the core counties to the US food flow networks over time. The research questions that guide this study of food flows in the United States are: (a) How do the food flow network properties change over time? (b) Which counties and links participate in food flows over time? (c) What are the core counties over the study period? We briefly present our data, an overview of the Food Flow Model, and our improvements in section 2. Our findings are presented in section 3. We discuss our results in section 4 and conclude in section 5.

Methods
We extend the Food Flow Model developed by Lin et al (2019) to estimate food flows between US counties for the years 2007, 2012, and 2017. The Food Flow Model is a data-driven approach that incorporates logistic regression, gamma regression (with a gravity model structure), mass balance, and linear programming. A schematic of the Food Flow Model is provided in figure 1 (see the supporting information for a more detailed description of the original Food Flow Model). We introduce three key improvements: (a) systematic handling of estimator selection, (b) smoothing distance data, and (c) a quantitative approach to select the core nodes. We provide a brief overview of the Food Flow Model below; the interested reader is referred to Lin et al (2019) for a full description. We describe our model improvements in more detail below.

Input data
The Food Flow Model relies on empirical data to estimate county-level food flows. Input data include multiple factors such as the geography of production, transportation, input-output requirements, and consumption that combine to determine food transport (Lin et al 2019). In the model, consumption is not restricted to the purchase of final goods by households, but instead also accounts for the intermediate stages in supply chain production and processing. Hence, the transformation of commodities from raw to more refined items is also considered to be consumption. For example, live animals that are sent to a slaughterhouse are transformed into meat, so the counties containing the slaughterhouses consume the live animals and produce meat.
The FAF data is a key input to the Food Flow Model (2020b). FAF information is available for the study years 2007, 2012, and 2017. FAF provides information on the transfer of commodities among the 132 FAF zones of the US. The Food Flow Model relies on the FAF database, which reports commodity flows by the standard classification of transported goods (SCTG). Thus, the Food Flow Model inherits the SCTG commodity categories and definitions. The agricultural and food commodities included in this study are listed in table 1.
Table 1. Agricultural and food commodity groups included in this study, by SCTG code.
SCTG code  Commodity group
01  Live animal and fish
02  Cereal grains
03  Agricultural products (except for animal feed, cereal grains, and forage products)
04  Animal feed, eggs, honey, and other products of animal origin
05  Meat, poultry, fish, seafood, and their preparations
06  Milled grain products and preparations, and bakery products
07  Other prepared foodstuffs, fats, and oils

Statistical information on the production, population, and personal income for each county and study year is used. For unprocessed agricultural commodities, production data is obtained from the US Department of Agriculture (2020a). For processed food items, per industry revenue in thousand US dollars is computed for each county by combining the SCTG-NAICS crosswalk table (Lin et al 2019), NAICS codes, and I-O accounts table (2020b). Since processed foods (SCTG 05-07) require industry inputs, we considered the corresponding SCTG 05-07 production industry revenues as supply level indicators of counties. The last set of required data is the geodesic distance between counties obtained from Oak Ridge National Laboratory (2020a) and port trade data from the US Bureau of Transportation Statistics (2020). The geodesic distance is a simplification of real-world transportation path lengths that is commonly used as the distance measure in the gravity model of trade (De Benedictis and Tajoli 2011, Shepherd 2013). Port data is used to boost the food flows to/from the counties that contain the ports, as they are assumed to be the main transit hubs for import and export. All input data along with a brief description of how they are utilized in the Food Flow Model are listed in table 2. Additional details regarding input data are provided in the SI (available online at stacks.iop.org/ERL/17/034035/mmedia).
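The distance input described above can be illustrated with a short sketch. It computes a great-circle (haversine) distance between two county centroids given as latitude and longitude. This is only an illustration under stated assumptions: the study obtains its distances from Oak Ridge National Laboratory, the spherical-Earth radius and the function name `great_circle_distance_m` are choices made here, and the centroid coordinates are hypothetical.

```python
import math

# Mean Earth radius in metres; a spherical Earth is an assumption of this sketch.
EARTH_RADIUS_M = 6_371_000.0


def great_circle_distance_m(lat1, lon1, lat2, lon2):
    """Haversine distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical county centroids (latitude, longitude), for illustration only.
centroids = {"county_A": (34.0, -118.0), "county_B": (41.0, -88.0)}

# Distance for every ordered pair of distinct counties.
distances = {
    (o, d): great_circle_distance_m(*centroids[o], *centroids[d])
    for o in centroids for d in centroids if o != d
}
```

Self-loops (a county paired with itself) have zero centroid distance, which is why the model needs a separate distance measure for them.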

Improvements to the food flow model
We introduce three key improvements to the original Food Flow Model: (a) Systematic handling of estimator selection. A grid-based search is applied to the logistic and gamma regression components to select the number of estimators (see table 3). (b) Smoothing distance data. Trade flows are negatively correlated with distance in the gravity model (Shepherd 2013). We winsorize the inter-county distance values (Ghosh and Vogt 2012) to avoid high flow estimates between small counties. Winsorization is a common technique in economics to modify the values of outliers to bring them closer to the other sample values (Hwang et al 2011, Orth 2013). Additionally, large self-loop flows were driven by counties with small areas in the original Food Flow Model. To avoid this issue, the self-loop distance is now set equal to the mean of all self-loop distances to remove the effect of extremely small counties. Flow assignment is now mainly driven by the other variables, such as production and income (see SI for a more detailed explanation). (c) Quantitative approach to select the core nodes.
To determine the core nodes over time we adopt TOPSIS, a multi-criteria decision analysis technique. TOPSIS is a commonly used approach to determine the importance of network components (Du et al 2014, Hu et al 2016, Karakoc et al 2020), as it ranks the components based on their performance on pre-determined criteria (Hwang and Yoon 1981). We use node betweenness centrality and degree (Ercsey-Ravasz et al 2012, Gaur et al 2020) (see section 2.3) to assess the core counties in each study year (see SI for more details). Table 3 lists the original and improved components of the Food Flow Model. Additional details are provided in the SI.

Table 2. Input data and how they are used in the Food Flow Model.

Data: I-O Accounts
Description: This supply chain data represents one industry's requirement for another industry's output, i.e. the input required to produce one dollar of output.
Use in model: The requirement coefficients of SCTG 05-07 are multiplied by production data of SCTG 01-04 to determine the need of each industry per commodity. The sum of industry and end consumer input needs per commodity represents a county's total consumption. This is used in the gamma mixture hurdle model for link prediction and flow amount estimation.

Data: Port Trade
Source: US Bureau of Transportation Statistics (2020)
Description: Data for commodity trade from sea, air, and land ports in the United States.
Use in model: The counties where these ports are located are considered the transit hubs for import and export. Port trade values of SCTG commodities (in US dollars) are used in the gamma regression models to boost the flows of food to/from these counties.

Data: North American Industry Classification System (NAICS)
Source: United States Census Bureau (2020c)
Description: This production-oriented NAICS data groups industries according to similarity in their production processes.
Use in model: Combined with the SCTG-NAICS crosswalk table and the I-O Accounts to compute per-industry revenue for each county (see section 2.1).

Data: Geodesic distance
Source: Oak Ridge National Laboratory (2020a)
Description: The geodesic distance between county centroids based on their latitude and longitude information.
Use in model: The regression models consider distances between all county pairs to estimate the existence and strengths of the links. The linear programming algorithm also uses the distance matrix to assign the food flows to the shortest paths.

Data: Land Area Coverage Per County
Source: United States Census Bureau (2020b)
Description: The land area coverage in square miles per US census area.
Use in model: In the regression models, the land area per county is used as the distance measure for self-loops. The area in square miles is converted into meters.

Table 3. Original and improved components of the Food Flow Model.

Original model components

Logistic regression: Binary logistic regression is used to estimate the existence of flow links between FAF zones based on the available independent variables. It estimates two possible outcomes, and the outcome values are coded as either '1' or '0', indicating the 'existence' or 'non-existence' of a link between any two FAF zones, respectively. Once the desired level of model accuracy is achieved at the FAF scale, it is implemented at the county scale.

Gamma regression: Since food mass fluxes follow the gamma distribution (Konar et al 2018), gamma regression is used to estimate the flow capacities (i.e. weights) of the links between counties based on the available independent variables. The shape of the gamma distribution could be interpreted as the effective units of food commodity that are actually delivered from origin to destination, apart from the amount wasted during transit (Lin et al 2019). Through the gamma regression model, between 3% and 10% of link weights across all SCTG commodities are underestimated. To satisfy the total mass balance of FAF data, a separate gamma regression model is developed for underestimated links to boost their weight and achieve mass balance. To boost the weight estimates, available port data is introduced to the second gamma regression as a separate variable. Similar to logistic regression, the two gamma regressions are first implemented at the FAF scale and then at the county scale.

Linear programming: The linear programming component aims to minimize the total transportation cost (i.e. travel distance per unit commodity) in the county food flow network. It is a common approach in supply chain transportation studies (Klein 1967). As it is based on the gravity model, linear programming also assigns food flows between counties to the link with the largest capacity. It is another common approach in supply chain studies to deliver the goods among origin and destination pairs in the most efficient way (Chen et al 1999, Schrijver 2002).

Mass balance: Mass balance is introduced as a constraint in the linear programming component of the Food Flow Model. The sum of the total outflow/inflow of counties that are located in a single FAF zone is matched with the total outflow/inflow of the corresponding FAF zone. Additionally, mass balance is included in the gamma regression. The sum of the estimated link weights is compared with the empirical total flows per FAF zone. Once the underestimated link weights are identified, a second gamma regression is implemented on them to match the total FAF mass in the estimations.

Gravity model: The gravity model of international trade proposes that trade flows are inversely correlated with the distances between origin and destination pairs (Disdier and Head 2008). Both the regression and linear programming components of the Food Flow Model are based on the gravity model of trade. Hence, shorter distances between counties are assigned higher flow capacities.

Improved model components

Grid-based search: A grid-based search is introduced to the logistic and gamma regression components of the Food Flow Model. The grid-based search balances the trade-off between overfitting due to a high number of estimators and underfitting due to a low number of estimators. An accuracy measure (AUC) in the range 0.78-0.94 is achieved with 7-10 estimators for each commodity for each study year.

Winsorizing: To avoid extremely high flow estimates between small counties, a winsorizing technique is introduced to the Food Flow Model. With winsorizing, the values of extremely small outliers (1.5% of the inter-county distances) are brought up to the first quantile of the distribution. Hence, the effect of extremely small distances on flow estimation is overpowered by other estimators, i.e. production, population, etc.

TOPSIS: A multi-criteria decision analysis technique, TOPSIS, is introduced to the Food Flow Model in order to methodologically identify core nodes. Counties are ranked based on their score for two criteria, node degree and betweenness centrality. For each commodity network in each study year, counties with the highest aggregated score for degree and betweenness are determined as core.
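The distance smoothing described in section 2.2 can be sketched in a few lines. The sketch below raises the smallest 1.5% of inter-county distances to the value at that cutoff and replaces all self-loop distances with their mean. The nearest-rank cutoff rule and the function names `winsorize_lower` and `constant_self_loop_distance` are assumptions made for illustration; the exact quantile convention used in the study may differ.

```python
def winsorize_lower(values, lower_fraction=0.015):
    """Raise the smallest `lower_fraction` of values to the cutoff value."""
    if not values:
        return []
    ordered = sorted(values)
    # Nearest-rank index of the cutoff (an assumption of this sketch).
    k = min(int(lower_fraction * len(ordered)), len(ordered) - 1)
    cutoff = ordered[k]
    # Values above the cutoff are left unchanged; only the lower tail moves.
    return [max(v, cutoff) for v in values]


def constant_self_loop_distance(self_loop_distances):
    """Replace every self-loop distance with the mean self-loop distance."""
    mean = sum(self_loop_distances) / len(self_loop_distances)
    return [mean] * len(self_loop_distances)


# Illustrative values only: two very short distances get pulled up to the cutoff.
smoothed = winsorize_lower([5, 1, 9, 2, 7, 3, 8, 4], lower_fraction=0.25)
```

Only the lower tail is modified because the concern in the text is unrealistically short distances inflating flow estimates, not unusually long ones.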

Food flow networks
We construct food flow networks for FAF data and our county-scale estimates. Nodes ($N$) are the spatial locations that serve as the origin and destination of food flows (e.g. FAF zones, counties). Links ($L$) indicate connections between origin ($o$) and destination ($d$) nodes. Link weight is the mass flux between nodes. Density ($d$) is the ratio of existing links over the potential number of links, including self-loops: $d = L / (N \times N)$. The core nodes are defined to be those with both high total degree and betweenness centrality. Total degree is $c_o^{total} = \sum_{d}^{N} l_{od} + \sum_{d}^{N} l_{do}$ (Barabási 2016). Node betweenness centrality is the portion of network shortest paths, $\sigma$, that pass through that node over all potential shortest paths in the network and is given by $\sum_{s \neq t} \sigma_{st}(i) / \sigma_{st}$, where $\sigma_{st}$ is the number of shortest paths between nodes $s$ and $t$ and $\sigma_{st}(i)$ is the number of those paths that pass through node $i$. Nodes with higher betweenness centrality are located on more shortest paths in the network and are more central to the national network structure (White and Borgatti 1994). We explain our multi-criteria decision approach to core node selection in the SI.
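As a rough illustration of how two criteria such as total degree and betweenness centrality can be combined into a single ranking, the following sketch implements a basic TOPSIS score. It assumes equal criterion weights, vector normalization, and that both criteria are "benefit" criteria (higher is better); these choices, the function name `topsis_scores`, and the node values are assumptions made here, and the configuration actually used in the study is described in the SI.

```python
import math


def topsis_scores(rows, weights=None):
    """TOPSIS closeness scores for rows of benefit criteria (higher is better)."""
    n_criteria = len(rows[0])
    if weights is None:
        # Equal weights are an assumption of this sketch.
        weights = [1.0 / n_criteria] * n_criteria
    # Vector-normalize each criterion column, then apply the weights.
    norms = [math.sqrt(sum(r[j] ** 2 for r in rows)) or 1.0
             for j in range(n_criteria)]
    weighted = [[weights[j] * r[j] / norms[j] for j in range(n_criteria)]
                for r in rows]
    ideal = [max(col) for col in zip(*weighted)]  # best value per criterion
    anti = [min(col) for col in zip(*weighted)]   # worst value per criterion
    scores = []
    for w in weighted:
        d_best = math.dist(w, ideal)
        d_worst = math.dist(w, anti)
        total = d_best + d_worst
        scores.append(d_worst / total if total > 0 else 0.0)
    return scores


# Hypothetical nodes with (total degree, betweenness centrality).
nodes = {"A": (10, 0.5), "B": (2, 0.1), "C": (6, 0.3)}
scores = topsis_scores(list(nodes.values()))
ranking = sorted(nodes, key=lambda name: scores[list(nodes).index(name)], reverse=True)
```

A node that is best on every criterion scores 1, a node that is worst on every criterion scores 0, and the highest-scoring nodes would be the candidates for the core.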

Results
In this section, we answer the research questions listed in section 1. We compare county estimates with FAF data to address each question. These observations taken together indicate that FAF networks have become more spatially concentrated over time, with fewer links carrying larger quantities of food. The mass flux trend is preserved in the county flow networks due to our methodological constraint of mass balance between county and FAF flows. County networks are sparser than FAF networks. However, county network density mostly increases with time, with some variation. The density of SCTG 01, 03, 04, 06, and 07 increases from 2007 to 2017, while the density of SCTG 02, 05, and all commodities summed decreases over the same period. The density is the most variable in the drought year 2012. We explore this drought year in more detail in section 4.2.

Figure 3 illustrates our estimates for 2017 county-level food flow networks broken down by each SCTG commodity code. Table 4 lists the locations with the largest net mass flow (= total inflows − total outflows) (see the SI for the breakdown by SCTG). ... Orleans port, which is a major port for the trade of agri-food commodities via the Mississippi River (NOL 2020). Sedgwick, KS has a large agricultural economy that is built on food processing (KDA 2020). Platte County, NE is in the list of top 5 Nebraska counties for agricultural sales (NDA 2020).

Table 5 provides the top 10 links in county and FAF flow networks for all SCTG commodities summed (see the SI for a breakdown by SCTG commodity). Many of the top 10 links are self-loops, which is in line with previous results presented in Lin et al (2019). Many of the largest county-level links are located in the FAF zones with the largest food flows. However, some county-level links exceed their respective FAF-level ranking. For example, Los Angeles County, California to Los Angeles County, California is consistently in the top 10, although Los Angeles-Long Beach CFS Area is not always in the top FAF links. This is sensible because counties in California are bigger in area than counties in the Midwest and Eastern parts of the US, which mechanically means that allocating to them will lead to a larger mass.

Figure 4 presents a heatmap of FAF and county-scale net flows over time. Changes in county-scale net flows between study years are also illustrated. FAF zones and counties that import more have higher net flows, whereas the ones that export more have lower net flows. Southern and Eastern FAF zones bring in more food (high net flows) and this spatial pattern is constant through time. In contrast, more rural FAF zones in the Midwest and Northwest send out more food (low net flows). Similar trends are observable in the county-scale maps. Wealthy population centers such as Los Angeles, CA, Chicago, IL, and New York City, NY have higher net flows through time. (Refer to the SI for heatmaps of net flow by SCTG, as well as separate heatmaps for inflows and outflows through time.)

Figure 5 maps the change in county food flows over time. By design, the county-scale networks capture the spatial patterns of the FAF data (e.g. compare figure 5(A) with (E); figure 5(B) with (F)). Link-level changes in mass flux between counties align well with the FAF-level data (e.g. compare figure 5(C) with (G); figure 5(D) with (H)). For example, the Mississippi River band experiences some of the largest fluctuations in food flows across spatial scales. There is also a relatively high increase in food flows within the counties of Florida and inter-county flows of Washington. These maps align with the heatmaps of county inflow and outflow changes. Food flow network maps broken down by SCTG commodity are provided in the SI.

Table 6 lists the FAF zones and counties that are core to the national network through time. Core locations tend to remain so through time, which indicates persistence in their importance. The consistently core FAF zones are: Los Angeles, CA CFS Area, Chicago-Naperville CFS Area (IL Part), and Remainder of Texas. The consistently core counties are: Los Angeles County, CA, Cook County, IL, Maricopa County, AZ, Shelby County, TN, Riverside County, CA, Bexar County, TX and Harris County, TX. Core FAF zones and counties, which are defined in terms of their topological importance, are also major movers of food mass (see table 16 in SI). This means that these FAF zones and counties are important in terms of both their contribution to network structure and mass flux. Refer to the SI for the core FAF zones and counties by SCTG commodity.

Figure 6 illustrates that a power-law relationship exists between node degree and betweenness centrality. The power-law relationship is stronger in dense networks and weaker in sparse networks, such as SCTG 01 and 02. This means that the identification of core counties is less clear for SCTG 01 and SCTG 02. Refer to the SI for the power-law fit by SCTG commodity. Importantly, the power-law relationship, which has been observed in empirical food flow data (Lin et al 2014, Konar et al 2018), was not predetermined by our modeling approach. This means that a power-law relationship between node degree and betweenness centrality naturally arises through our algorithm. The self-arising power-law indicates that our model captures the critical attributes of food flow networks and gives us more confidence in our approach.
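One simple way to check for a power-law relationship between node degree and betweenness centrality is an ordinary least-squares fit on log-log axes. The sketch below is offered only as an illustration of that idea: the function name `fit_power_law` and the input values are hypothetical, and the fitting procedure used for figure 6 is not specified in the text and may differ.

```python
import math


def fit_power_law(x, y):
    """Least-squares fit of y = a * x**b on log-log axes; returns (a, b)."""
    # Only strictly positive pairs can be log-transformed.
    pairs = [(math.log(xi), math.log(yi))
             for xi, yi in zip(x, y) if xi > 0 and yi > 0]
    n = len(pairs)
    mean_x = sum(p[0] for p in pairs) / n
    mean_y = sum(p[1] for p in pairs) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in pairs)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pairs)
    b = sxy / sxx                         # exponent: slope in log-log space
    a = math.exp(mean_y - b * mean_x)     # prefactor: exponentiated intercept
    return a, b


# Hypothetical node degrees and betweenness values following y = 3 * x**2.
degree = [1, 2, 4, 8]
betweenness = [3 * k ** 2 for k in degree]
a, b = fit_power_law(degree, betweenness)
```

Nodes with zero betweenness are dropped by the log transform, which matters in sparse networks where many nodes lie on no shortest paths.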

Discussion
Here, we discuss the advantages and limitations of our model, estimates during the 2012 Corn Belt drought, and future research directions.

Advantages and limitations of the food flow model
The Food Flow Model is a data-driven approach to estimate food flows at a fine spatial resolution through time. The data-driven nature of our approach is both a limitation and an advantage. It is a limitation because we do not explicitly include mechanisms that would enable us to address 'why' questions about food flows. Rather, we describe the who, what, and where of food flows with time. Yet, the empirical patterns that the Food Flow Model incorporates have mechanistic explanations in certain cases, such as the gravity model of trade (Anderson 1979), which is incorporated in our methodology (see section 2). The data-driven nature of our model also means that input data availability represents a limitation to its scope. This is the reason that our study time series is restricted to the years 2007, 2012, and 2017, for example.
The main shortcoming of our study is the absence of ground truth data to validate the county food flow estimates. Yet, our data-driven approach employs a variety of measures to ensure that our county-level estimates are bounded by reality (e.g. mass balance requirement). Additionally, we introduced model improvements to limit human error and enhance realism. Grid-based search was used to automate variable selection and TOPSIS was used for core node selection (see table 3). This reduces the potential for human bias and error by applying a systematic method to these model components. The additional improvements that we made in the handling of outliers (i.e. winsorizing distances and setting self-loop distances constant) further enhance the realism of model outputs. Flows of relatively large counties (in terms of income, population, production, and other distance-unrelated characteristics) are now estimated to be higher than those of relatively small counties. This represents an improvement to the original Food Flow Model, in which the flow values of some counties were likely overestimated due to their very small size and the previous way that self-loop distance was handled. Additionally, the self-arising power-law relationship between degree and betweenness centrality (see section 3.3) provides additional confidence in model performance.
The main advantage of our model is the provision of county-level estimates of food flows. We provide an example of the output of our model in figure 7 for Los Angeles County, CA and Hillsborough County, FL, which are both in the list of core nodes (see table 6). Figure 7 illustrates that we are able to map the food mass inflows and outflows for each county in the US per commodity per study year. Similar maps could be generated for each of the 3134 counties by SCTG-year. Researchers and policy makers could evaluate a specific location of interest over time. However, our estimates are best suited for national-level analyses, and local-level decision-makers may want to augment our estimates with additional site-specific information.
The total mass in/outflows of counties are constrained to sum to the mass in/outflow of their corresponding FAF zone. However, the mass in/outflow is heterogeneously allocated to links between counties according to our regression models. Regression models are fit to each SCTG-year and provided in the 'Regression Models and Network Statistics.xlsx' spreadsheet in the SI. For example, $\ldots 15 \log(S1_d) + 0.14 \log(C1_d) - 0.06 \log(T3_d) + 0.24 \log(LIVE_d)$. This means that grain flows will be allocated to inter-county links according to county-level regressor values for the distance between counties ($D$), personal income of the origin county ($GDP_o$), grain production of the origin county ($P_o$), revenue for accommodation in the destination county ($A1_d$), revenue for drinking places (alcoholic beverages) in the destination county ($D3_d$), revenue for scientific research and development services industries in the destination county ($S1_d$), cattle population in the destination county ($C1_d$), turkey population in the destination county ($T3_d$), and total livestock population in the destination county ($LIVE_d$). The heterogeneity in the spatial and temporal distribution of regressor variables explains the differences in food flows between counties and FAF zones with time. Maps of all regressor variables are provided in the SI.
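The idea that county-level link weights must add up to the corresponding FAF total can be illustrated with a deliberately simplified sketch. It rescales predicted link weights proportionally so that they sum to a given zone total. This proportional rescaling is a stand-in chosen here for brevity; the Food Flow Model itself enforces mass balance through a second gamma regression and a linear programming constraint. The function name `rescale_to_zone_total` and all values are hypothetical.

```python
def rescale_to_zone_total(link_weights, zone_total):
    """Scale predicted county link weights so they sum to the FAF total.

    Proportional scaling is a simplification used only in this sketch.
    """
    predicted_total = sum(link_weights.values())
    if predicted_total <= 0:
        raise ValueError("predicted link weights must sum to a positive value")
    factor = zone_total / predicted_total
    # Relative shares between links are preserved; only the overall mass changes.
    return {link: weight * factor for link, weight in link_weights.items()}


# Hypothetical regression-based weights (kg) for two county links in one FAF zone pair.
predicted = {("county_a", "county_b"): 1.0, ("county_a", "county_c"): 3.0}
balanced = rescale_to_zone_total(predicted, zone_total=8.0)
```

The relative allocation across links still comes from the regressors, while the total is pinned to the observed FAF flow.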

Impact of the 2012 Corn-Belt drought on grain flows
In 2012 a severe drought hit the US Corn-Belt, which is a highly productive region for grain (Boyer et al 2013). The 2012 drought led to a 55% variation in corn yield across the region (Wan et al 2015). Illinois, Iowa, Indiana, Minnesota, and Nebraska were the main states impacted by the drought (Wu et al 2015, Prokopy et al 2017). Here, we examine our estimates of grain (SCTG 02) flow changes during the drought for the Corn-Belt. The drought effect is captured in our model since the total mass flux of grain in 2012 is the lowest among all study years (see figure 2). The Corn-Belt remains a top region for grain outflow in 2012, despite the drought (see table 18 in SI). However, the mass of grain outflows is lower in 2012 due to decreased grain production during the drought. Grain outflows from the Corn-Belt decreased by $3.29 \times 10^{11}$ kg (53.18%) from 2007 to 2012, while inflows also decreased ($2.84 \times 10^{11}$ kg; 53.67%). Across FAF zones, the mass of grain outflow is 15.28%-68.05% lower in 2012 than in 2007. As shown in figure 40 in the SI, grain inflows also decrease from 2007 to 2012. Corn-Belt grain outflows and inflows rebound following the drought in 2017: outflows increase by 11.67%-129.65% and inflows increase by 30.72%-114.22% across FAF zones from 2012 to 2017, primarily due to an increase in self-loops (see table 25 in SI). This indicates the importance of grain production and processing within the regional agricultural economy.
Link-level changes in grain flows are mapped in figure 8. The largest mass decreases in grain links are concentrated in the Corn-Belt FAF zones (see figure 8(A)), which indicates more internal movement in a non-drought year. From 2007 to 2012, mass increases in cereal grain flows to/from the Corn-Belt are connected with the rest of the nation, such as California, Texas, and Mississippi FAF zones (figure 8(B)).

Future research
There are many potential future research applications of this study. The spatially detailed maps of food flows that we have generated in this study (and make freely available with the paper) could be paired with footprint estimates to quantify embodied resources (e.g. water, carbon) in future work. Our work could also be used to develop understanding of county-level food security, dietary preferences, and agricultural sustainability. Additionally, more realistic transportation distance measures could be considered to develop mode-specific estimates of food flows. For example, the Food Flow Model could incorporate mode-specific travel times (Weiss et al 2018, Nelson et al 2019). This would move us closer to assessing the critical infrastructure that undergirds the production, processing, transport, and storage of agricultural and food flows within the US. Another important avenue of future research is to determine the vulnerabilities and resiliencies that exist in the national food supply network. These findings may be useful for researchers and decision makers interested in food systems security within the United States. The Food Flow Model is a framework that could be applied to other locations. However, since it is a data-driven model, the necessary input data would need to be available in other locations. A major constraint that we foresee as a likely impediment to implementing the Food Flow Model in other locations is the lack of sub-national commodity flow information. The US government collects information every 5 years in the US Census and uses that information to build the CFS (and subsequent FAF database), which is a key data requirement of the Food Flow Model described here. So, other countries or world regions with comparable coarse-scale commodity flow information to downscale from could implement the Food Flow Model to estimate fine-grained food flows.

Conclusion
We estimated food flows between US counties through time with an improved version of the Food Flow Model. We provide 206.3 million data points with this paper (9 821 956 links including zeros for each of the 7 SCTG agri-food commodities and 3 study years). Our estimates show good general agreement with FAF data (by design), and capture a self-arising power-law relationship between node degree and betweenness centrality. The 2012 Corn-Belt drought is also evident in our estimates and our core counties are mainly consistent through time. The core counties represent some of the major transit hubs, such as Houston, TX, Chicago, IL, and Los Angeles, CA. Thus, our time-series estimates of food flows between US counties contribute to a more comprehensive picture of our national food system for researchers and policymakers.

Data availability statement
The data that support the findings of this study will be openly available at the following URL/DOI: https://doi.org/10.13012/B2IDB-9585947_V1. Data will be available from 7 February 2022.