Complexity of increasing knowledge flows: the 2022 Southwest Airlines Scheduling Crisis

The 2022 Southwest Airlines Scheduling Crisis, resulting in approximately 15 000 flight cancellations, demonstrates the challenges of structuring infrastructure systems and their knowledge-making processes for increasingly disruptive conditions. While the point-to-point configuration was the focus of immediate assessments of the failure, it became rapidly evident that the crew-assignment software was unable to operate effectively due to the scale of disruption. The airline failed to recognize environmental shifts associated with internal and external complexity, leaving operations vulnerable to a known potential risk: computer and telecommunications failures due to an extreme weather event resulting in knowledge systems failures. The cascading failures of the crisis emphasize the necessity to invest in adaptive capacity prior to catastrophic events and provide a lesson to other infrastructure managers pursuing resilience in the face of increasingly uncertain environments.


Introduction
In December 2022, Southwest Airlines had to cancel an unparalleled number of flights. The precipitating event was a historic winter storm and resulting cold wave across the majority of the United States, including blizzard conditions in the Midwest and Northeast. Bad weather is nothing new as a cause of cancellations, and other large airlines such as American and Delta had cancellation levels commensurate with Southwest's in the early days of the storm. Southwest's cancellations, however, persisted for five to six days beyond the initial weather event, totaling approximately 15 000 flight cancellations from December 21st to 29th (Olson 2022, Stiles and Hickey 2022). These numbers correspond to Southwest canceling over 60% of their flights per day between December 26th and 28th, while their competitor airlines were in single-digit percentage cancellations (Stiles and Hickey 2022). It is important to note that while the Southwest Airlines Scheduling Crisis is used as an example herein, many organizations (airlines and otherwise) are susceptible to similar crises should they not recognize environmental shifts associated with internal and external complexity.
Infrastructure systems (i.e., physical networks, digital technologies, and governing institutions of infrastructure, including sectors such as power, water, transportation) are experiencing increasingly disruptive conditions that challenge their reliability. These disruptions manifest across internal environments (e.g., aging infrastructure, accretion) and external environments (e.g., climate change, emerging technologies), where the environment is defined broadly to include not only environmental change but also social and technological change. While Southwest was initially dealing with an external disturbance (the winter storm), they soon were managing compounding internal disturbances. Initially, the cascading cancellations of flights were blamed on Southwest's network structure, a decentralized point-to-point model, which differs from that of the other large network carriers that operate a hub-and-spoke model (which routes flights through an intermediary central hub that can handle large traffic flows, minimizing the total number of unique routes (Rodrigue 2020)). A point-to-point network decentralizes crews and generally assigns them to a 'tour' throughout the day. One canceled flight, therefore, propagates delay through the network as each flight leg is dependent on the aircraft and crew from the upstream legs of the tour. Because a point-to-point model decentralizes operations rather than consolidating them at a few airports, most airports do not have a standby fleet and crew available (Chiarito et al 2022, Cramer and Leveson 2022, Krugman 2022). Compare Southwest to, for example, Delta Airlines, which operates a hub-and-spoke model and has strong hubs with crew bases and spare aircraft (Ryerson and Kim 2013, Gaggero and Luttmann 2023).
Southwest does operate a few 'focus cities' creating regional and small-scale hubs; but since the aftermath of the COVID-19 pandemic, Southwest has been expanding the number of destinations in its network, stretching its network and decentralizing further (Olson 2022, Southwest Airlines 2022a). The 2022 Southwest Airlines Scheduling Crisis highlights the internal complexity within centralized and decentralized configurations in an increasingly information-dependent world. This growing complexity can hinder infrastructure resilience, defined by the ability of infrastructure to withstand disturbance events, recover quickly if a failure occurs, operate at boundary conditions, and sustain adaptation over long temporal scales (Woods 2015). Infrastructure resilience literature often suggests, frequently without evidence, that a decentralized configuration should fare better in periods of instability (Helmrich et al 2021). More specifically, this literature asserts that a decentralized configuration (such as point-to-point) may limit cascading failures by quickly recognizing and isolating the failure (Gleick 2003, Goldthau 2014, Zodrow et al 2017, Helmrich et al 2021). The degree, however, of centralization or decentralization in infrastructure networks and institutions is dependent upon the level of internal and external complexity. Helmrich et al (2021) found that 'increased centralization appears beneficial when circumstances require greater levels of coordination', which supports the partial point-to-point configuration Southwest Airlines maintains. It would be untenable to manage direct flights between 121 destinations (Southwest Airlines 2023).
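To give a rough sense of why direct service between every pair of 121 destinations would be untenable, the following minimal sketch counts unique routes under two idealized configurations. The function names and the multi-hub assumption (each non-hub destination served from exactly one hub, with hubs fully interconnected) are illustrative simplifications, not a description of any airline's actual network.

```python
def point_to_point_routes(n: int) -> int:
    """Unique undirected city pairs if every destination is linked directly."""
    return n * (n - 1) // 2


def hub_and_spoke_routes(n: int, hubs: int = 1) -> int:
    """Unique routes in an idealized hub-and-spoke network.

    Assumption (illustrative only): each non-hub destination connects to
    exactly one hub, and all hubs are directly connected to one another.
    """
    spokes = n - hubs
    hub_links = hubs * (hubs - 1) // 2
    return spokes + hub_links


if __name__ == "__main__":
    n = 121  # destination count cited in the text
    print(point_to_point_routes(n))    # 7260
    print(hub_and_spoke_routes(n))     # 120
    print(hub_and_spoke_routes(n, 5))  # 126
```

Under these assumptions, a fully connected network grows quadratically with the number of destinations while a hub-and-spoke network grows roughly linearly, which is consistent with a partial point-to-point configuration being the practical middle ground.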
The designation between centralized and decentralized configurations in aviation is unique as the airlines share the same airports but differ in how they route and schedule flights. Operations are critically dependent upon information flows to function, unlike, for example, water distribution, where the flow of the service is largely directed by physical assets. Aviation, therefore, appears to have a critical interdependency with information systems that are managed by aviation institutions (e.g., Federal Aviation Administration, airlines, regulators). The sensing of disturbances is thus crucial for repositioning organizational resources as instability unfolds and is dependent upon institutional processes given the long design lives of physical infrastructure (Helmrich et al 2021). Yet, Southwest saw a myriad of cascading failures (i.e., failures due to a failure of another interconnected system component, specifically canceled flights in this case) despite their 'more resilient' point-to-point structure. These cascading failures suggest that sensemaking (e.g., knowledge-making, decision-making) plays a significant role in infrastructure resilience; yet, the role of institutions is frequently overlooked in infrastructure literature (Helmrich and Chester 2022).
By operating a decentralized network structure and not consolidating operations at hubs, Southwest found itself with crews (pilots and flight attendants) dispersed throughout their network with no clear direction in terms of their next assignment. Should they wait until they could fly their originally scheduled flight, or do they need to be repositioned to another airport? In a near instant, Southwest's operations team was flooded with information requests from their crews; the system was overwhelmed and ground to a halt. The disruption of operations revealed that Southwest's internal systems were unable to locate and reposition crews to maintain service, resulting in significant cancellations to make 'proactive schedule adjustments' by 'rebalancing the airline and repositioning Crews and [the] fleet' (Southwest Airlines 2022a).
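The dependency of each flight leg on the aircraft and crew arriving from upstream legs can be illustrated with a small sketch. This is a toy model under stated assumptions, not a representation of Southwest's scheduling logic: a tour is an ordered list of legs, one leg is canceled, and downstream legs fly only if a reserve aircraft and crew happen to be stationed at a later origin airport. The function name, the airport codes, and the reserve counts are all hypothetical.

```python
from collections import Counter


def surviving_legs(tour, cancelled_index, reserves):
    """Return one boolean per leg: True if the leg flies.

    tour            -- ordered list of (origin, destination) pairs
    cancelled_index -- index of the leg canceled by the disturbance
    reserves        -- mapping of airport -> number of standby aircraft/crew
                       (hypothetical values for illustration)
    """
    reserves = Counter(reserves)  # local copy; missing airports count as 0
    resources_in_place = True     # is an aircraft and crew at the next origin?
    flies = []
    for i, (origin, _dest) in enumerate(tour):
        if i == cancelled_index:
            # The canceled leg strands its aircraft and crew upstream.
            resources_in_place = False
            flies.append(False)
            continue
        if not resources_in_place and reserves[origin] > 0:
            # A standby aircraft and crew at this airport restarts the tour.
            reserves[origin] -= 1
            resources_in_place = True
        flies.append(resources_in_place)
    return flies


if __name__ == "__main__":
    # Hypothetical four-leg tour; airport codes are placeholders.
    tour = [("DEN", "MDW"), ("MDW", "BNA"), ("BNA", "BWI"), ("BWI", "MCO")]
    print(surviving_legs(tour, 1, {}))          # [True, False, False, False]
    print(surviving_legs(tour, 1, {"BNA": 1}))  # [True, False, True, True]
```

In this toy model a single cancellation removes every downstream leg of the tour unless standby resources exist at a later airport, which mirrors the contrast drawn in the text between dispersed crews and hubs with crew bases and spare aircraft.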
Direct pathways of disruption (e.g., an airport closing due to a winter storm) are often not the largest system vulnerability despite receiving the most attention from infrastructure managers (Markolf et al 2019); it is indirect and non-physical pathways of disruption (e.g., operation management software failing to facilitate the coordination of fleet and crews) that impede knowledge and sensemaking and can have major socio-economic impacts. In the case of the 2022 Southwest Airlines Scheduling Crisis, the dated crew-assignment software, Network Crew Optimization (formerly known as SkySolver), was unable to triage the mismatch of resources at such a large scale (Chokshi 2023). This is a logical interdependency that results from two infrastructures (digital technologies for crew-assignment and aviation transportation networks, in this case) dependent upon the state of the other via a non-physical, non-cyber, and non-geographic mechanism (Rinaldi 2001). While this could be described as a cyber interdependency failure since Southwest had to revert to manual scheduling (Lin 2023), it is critical to recognize that the crew-assignment software was not designed to operate at its current scale, and with such widespread disruption (Chokshi 2023, Wile 2023). Fundamentally, the system failed because of aging infrastructure and accretion, emphasizing a need for infrastructure managers to invest in software modernization and highlighting the interconnectedness of digital technologies in infrastructure systems. Southwest Airlines experienced a problem known as 'technical debt,' or the implied cost of relying upon outdated digital technologies while not upgrading (Tufekci 2022, Sider 2023). Southwest Airlines had been incrementally updating the Network Crew Optimization software, developed over twenty years ago and adopted when they managed 58 destinations (Southwest 2000, GE 2021, Lin 2023, Sider 2023).
However, significant operational system upgrades were overlooked in favor of upgrading customer-facing software (e.g., ticketing systems) as well as staying in compliance with federal safety regulations (Sider 2023). Additionally, previous crises (e.g., Boeing's 737 MAX Crisis (Bhattacharya and Nisha 2020, Herkert et al 2020)) may have demanded competing resources. This emphasizes the need for holistic sensemaking processes in complex systems so that informed tradeoffs can be made. In this case, Southwest Airlines had been aware of the critical role of technology, stating to their shareholders (Southwest Airlines 2022b, p 37): 'The Company is increasingly dependent on technology to operate its business and continues to implement substantial changes to its information systems; any failure, disruption, breach, or delay in implementation of the Company's information systems could materially adversely affect its operations.'
Additionally, they explicitly state in this report that their technologies and systems are potentially vulnerable to 'unforeseeable' extreme weather events and that such disturbances may cause computer and telecommunication failures (Southwest Airlines 2022b). This demonstrates that the event was not an unknown unknown; rather, the knowledge was not appropriately acted upon (Luft and Ingham 1955, Snowden and Boone 2007).
Most infrastructure systems are becoming increasingly interconnected through digital technologies, multiplying information pathways with new opportunities but also increasing the opportunity for disruption (Chester and Allenby 2020). For example, on January 11, 2023, the Federal Aviation Administration had an outage within the preflight safety notification system (NOTAM), resulting in the cancellation of 1300 flights and delaying approximately 10 000 more (Brown and Fadel 2023, FAA 2023, Wile 2023). Despite the risk, digital technologies also allow infrastructure to be better attuned to the increasingly complex environment and respond appropriately. Consider that air traffic controllers, pilots, and airline dispatchers have long communicated over frequencies such that all those using the same airspace could (and are required to) hear all exchanges of information. This situational awareness is critical such that all those in the airspace can be prepared, without any added workload for additional messages and information sharing, should they need to respond to an emergent disturbance. A more modern example is the traffic management advisor technology produced by the Department of Transportation, which allows air traffic controllers at different spatial levels (based on proximity to the ground and airport) to understand the sequence of traffic flow and any real-time perturbations, and allows those involved in airport operational decisions to make those decisions and communicate them broadly. These bidirectional information pathways create a distributed system (i.e., decentralized elements coordinated by digital technologies), empowering diverse stakeholders and supporting adaptive capacity (Helmrich et al 2021).
The 2022 Southwest Airlines Scheduling Crisis emphasizes the necessity of sensemaking in infrastructure organizations and the value of cybertechnology. An abundance of information pathways creates increased opportunities for infrastructure managers to develop novel sensing and anticipating processes to inform responses (Thomas et al 2017). Airlines manage a massive volume and flow of real-time data including, but not limited to, fleet maintenance and routing, crew scheduling and location, weather, regulation compliance (e.g., timing out), and passenger (and their luggage) location (Lin 2023). Cybertechnology is a tool for infrastructure managers to develop sensing and anticipating processes, but it adds its own layer of internal complexity that must be monitored (e.g., accretion, technical debt, vulnerability to cyberattacks, or simple mistakes such as the accidental file deletion that caused the January 11th NOTAM failure event (FAA 2023)). Cybertechnology also requires investment (people, time, money) because as infrastructure systems become more tightly coupled with digital technologies, they will experience coinciding failures. Yet, cybertechnology integration provides infrastructure managers with a tool to increase system adaptive capacity, a characteristic that will be vital to maintaining critical services during future foreseeable and unforeseeable disturbances.

Repositioning with resilience
To respond to accelerating complexity and uncertainty, infrastructure systems must be capable of responding to perturbations and also willing to invest in resilience capacity during periods of stability while managing day-to-day operations of coordinating physical assets and people within regulatory constraints in a highly competitive market. Biggs et al (2012) provide seven principles of resilience that can help reimagine infrastructure configurations and processes (Helmrich et al 2021): maintain diversity and redundancy, manage connectivity, manage slow variables and feedbacks, foster complex adaptive systems thinking, encourage learning, broaden participation, and promote polycentric governance. Helmrich et al (2021) describe how oftentimes the capacity for a centralized or decentralized configuration to support resilience depends upon context. The examination of these principles in the context of the 2022 Southwest Airlines Scheduling Crisis provides a valuable scenario to examine centralized and decentralized configuration choices and management of information pathways for a particular infrastructure.
Centralized systems are dominant in infrastructure and oftentimes seen as advantageous in periods of stability; yet, during the winter storm, centralized hub-and-spoke configurations fared better than Southwest's more decentralized, partial point-to-point configuration. In this scenario, the hub-and-spoke configuration was able to maintain redundancy of fleet and crews by having both readily available, and easily accessible, at hub locations. While Southwest may have had some redundancy available through idle fleet and crew (notably, Southwest only flies one model of plane allowing any hired crew to fly), they were unable to locate them due to the volume of requests overwhelming the crew-assignment software (Tufekci 2022, Sider 2023, Chokshi 2023). Southwest did have a backup process, manual crew assignment, but it was unable to function at the speed and scale required to meet the disruption of the weather event (Chokshi 2023, Lin 2023). The high connectivity of Southwest's operations (i.e., a greater number of unique routes to manage) likely accelerated the impacts of the disturbance, quickly scaling the crew-assignment problem during the extreme weather event and causing the crisis.
The inability to manage slow variables (e.g., climate change, technology acceleration) and feedback transitioned the disaster's primary influence from that of the system's configuration (i.e., centralized or decentralized) to the management of information pathways. Most infrastructure systems are not organizationally structured to foster complex, adaptive systems thinking, and Southwest was likely similar. The winter storm was not unforeseeable given long-term climate patterns and Southwest's own assessment of information system deficiencies. In short, this disturbance (specifically a compounding of technology failures due to extreme weather) was a known unknown at best (Luft and Ingham 1955, Snowden and Boone 2007). This is validated by Southwest's 2021 shareholder report that acknowledged the vulnerability of computer systems and telecommunications as well as prior events that resulted in failures of similar form but on smaller scales (Chokshi and Murphy Marcos 2021, SWAPA 2023). Southwest employees and their union representatives have expressed concern about leadership ignoring shortcomings in operational decision-making such as the crew-assignment software (Lonero 2022, SWAPA 2023). The truncated communication implies that the organizational structure of Southwest leans vertical. Vertical organizational structures concentrate power on select managers who delegate decisions and tasks through a chain of authority (Mintzberg 1979), and this structure is common in infrastructure systems. The inability of leadership in any infrastructure organization to recognize environmental shifts associated with accelerating internal and external complexity means the organization will be unable to effectively learn and adapt.
Helmrich et al (2021) discuss the value of nurturing enabling leadership (Uhl-Bien and Arena 2018) within organizations, which can empower operators to aid in decision-making; however, operators' expertise is more likely to be heard in horizontal institutional structures (e.g., polycentric governance) that have processes in place to broaden participation by positioning individuals with knowledge, regardless of assigned role, in decision-making positions.

Conclusion
Southwest Airlines failed to learn: from previous mistakes, from mistakes of others (e.g., JetBlue (Katcher 2022)), and from operator warnings. They had openly acknowledged their technical debt as a vulnerability (Southwest Airlines 2022b). They are not the only infrastructure managers facing the reality of increasing complexity, compounded by legacy systems that are increasingly decoupled from internal and external environments (Chester and Allenby 2022). Time will tell if the 2022 Southwest Airlines Scheduling Crisis will be the tipping point for Southwest to begin investing in adaptive capacity. Ideally, other infrastructure systems (in and beyond aviation) will learn from this event and begin to build adaptive capacity prior to catastrophic events and subsequent failures within their own organizations. In doing so they must invest in adaptive capacity for surprise events that spans both physical networks and governing institutions. The sensing and anticipating processes that cybertechnology can support within an organization provide a foundation for engaging with adaptation and learning processes. These four reiterative and recursive processes (sensing, anticipating, adapting, and learning) play a critical role in enabling the resilience of socio-technical systems (Thomas et al 2017). The engagement, and even empowerment, of infrastructure managers, operators, and community members can help bring localized expertise to those with decision-making power. It is evident that the change in the environment is outpacing change in infrastructure systems, and infrastructure managers must embrace instability as the new norm.

Data availability statement
No new data were created or analyzed in this study.