Connecting physical and social science datasets: challenges and pathways forward

The integration of physical and social science data can enable novel frameworks, methodologies, and innovative solutions important for addressing complex socio-environmental problems. Unfortunately, many technical, procedural, and institutional challenges hamper effective data integration, detracting from interdisciplinary socio-environmental research and broader public impact. This paper reports on the experiences and challenges of social and physical data integration, as experienced by diverse Early Career Researchers (ECRs), and offers strategies for coping with and addressing these challenges. Through a workshop convened by the National Center for Atmospheric Research (NCAR) Innovator Program, 33 participants from different disciplines, career stages, and institutions across the United States identified four thematic data integration challenges related to complexity and uncertainty, communication, scale, and institutional barriers. They further recommended individual, departmental, and institutional scale responses to cope with and address these integration challenges. These recommendations seek to inform faculty and department support for ECRs, who are often encouraged, and even expected, to engage in integrative, problem-focused, and solutions-oriented research.


Introduction
The understanding of environmental problems has experienced a paradigm shift over the last several decades. Pressing challenges such as climate change, food insecurity, and water scarcity are complex and 'wicked' socio-environmental problems (Rittel and Webber 1973, Funtowicz and Ravetz 1993, Levin et al 2013, Head and Alford 2015): neither simple, predictable, nor solely environmental in nature. Multiple drivers of change operating at different scales shape the interactions within and between socio-environmental systems, generating [...] contribution, inclusive of both challenges and pathways forward, will inform faculty and departmental support for Early Career Researchers (ECRs), who are often expected to engage in integrative research without adequate support and guidance. This is timely given major funding agencies' recent prioritization of convergence-based research approaches (NSF 2019), even though these approaches do not yet receive the institutional support they deserve within academic settings.

Methodology
The challenges and recommendations presented below were originally formulated at the workshop, Connecting Physical and Social Science Datasets, held on July 21, 2022 at NCAR in Boulder, Colorado (United States). The workshop explored the challenges, limitations, and opportunities for integrating physical and social science datasets to address complex socio-environmental challenges. Participation in the workshop was voluntary and co-authorship was offered to all 33 participants. Further, participants were informed and agreed that their responses within the workshop could be used in the manuscript presented here. Those who attended the workshop (in-person and online) were provided a short survey, which gathered information about their reasons for participating, the challenges they face when performing data integration, and certain demographic information. Participants were asked for consent before providing this information and were informed that their responses were anonymous and may be used in this manuscript to ensure the conclusions adequately represented the diversity of experiences in the workshop. Different career stages, disciplines, demographic backgrounds, and data integration knowledge and experiences were represented by the 33 workshop participants (figures 1-2).

Prior to the session, participants were asked to brainstorm one challenge (perceived or experienced) of integrating different datasets. To accommodate participants, the session was held in a hybrid format with in-person and virtual components. The workshop opened with an outline of the objectives and an icebreaker. Then, workshop organizers, who have expertise in environmental hazards and risk communication, global climate modeling and human migration, and human dimensions of climate change vulnerability, provided a three-minute overview of how they were conceptualizing and practicing data integration in their research. This was followed by a collective brainstorming activity, where participants were asked to write the most prominent challenges (perceived or experienced) to data integration on Post-it® Notes (in-person) or on Google's 'Jamboard', an interactive online whiteboard for collaboration. Workshop organizers developed four inductive and data-driven 'challenge' themes from this exercise. Participants elaborated these four themes and, through collaborative brainstorming, identified pathways for coping with and addressing them.

Results
This section describes challenges and pathways forward to enhance data integration, as identified in the workshop. The citations provided below represent theories underlying our experiences, and similarities recognized in broader efforts to cultivate interdisciplinary and convergence research. We recognize many pathways to cope with or directly address specific challenges are constrained by other challenge areas. As a result, while we describe these challenge areas independently, we stress the need for holistic consideration of academic systems governing and rewarding certain forms of research and knowledge production. Table 1 summarizes the results.

Research processes, uncertainty, and complexity
Integrating physical and social science datasets requires clarifying and understanding the assumptions and values that shape data collection, analysis, and interpretation, including how uncertainty and complexity are represented and addressed.

Challenges
Epistemological and methodological approaches used to generate physical and social science datasets commonly differ. For instance, research into Earth system science frequently prioritizes instrument-collected or computer-model-generated quantitative data and statistical analyses, often assumed to be objective and disconnected from the researcher's positionality (Cockburn 2022). Social science research includes diverse ontological, epistemological, and methodological approaches, from quantitative surveys undergirded by assumptions of objectivity to constructivist epistemologies that understand the researcher's interpretation of, for example, oral histories or discourses, as part of the data generation and results themselves (Haraway 1988, Nightingale 2003, Lengwiler 2008, Bernard et al 2016, Freese and Peterson 2017). These diverse approaches pose challenges for data integration. First, researchers must agree on the purpose and uses of data, including the roles of frontline communities in processes of research co-design, data use, and data ownership and control (Reyes-Garcia et al 2022). Second, researchers must determine how to identify, represent, and contextualize different forms of uncertainty across physical and social datasets (Sharmina et al 2019). Characterizing and quantifying uncertainties is a non-trivial task: merging research products and datasets can compound the uncertainties inherent in physical and social datasets (e.g., Tate 2013, Doll and Romero-Lankao 2017). Third, researchers must agree on what makes data reliable, including whether it needs to be (or can be) reproducible, or requires a certain quantity and quality of observations and validation processes.

Pathways forward
Actions for addressing these challenges include, but are not limited to:
• Establishing Early Working Relationships: Establish interdisciplinary partnerships early to consider how social and physical datasets may be collected and integrated across different phases of research design and analysis (Cockburn 2022). This may require larger team structures and longer project durations (Bukvic et al 2022).
• Developing Shared Understandings of Assumptions, Biases, and Uncertainties: Build in time across all research phases for communicating the underlying assumptions, complexities, and uncertainties associated with data collection, analyses, and interpretation (Halvorsen et al 2016, Morss et al 2021, Bukvic et al 2022, Richter et al 2022, Mahmoudi et al 2022). This includes feedback and learning during data integration processes, and developing recognition for how uncertainty and complexity may be communicated upon integration. For example, the National Socio-Environmental Synthesis Center (SESYNC) encourages meeting facilitation practices that recognize team-building is advanced through 'repeated cycles of divergence and convergence as individuals advance their positions, listen and learn from the rest of the team, and ultimately come to a shared understanding' (Graef et al 2021: 17, citing Bennett et al 2018 on 'storming'; see also: Wallen et al 2019).

Table 1. Summary of challenges and pathways forward identified in the workshop.

Research processes, uncertainty, and complexity
Analyzing and interpreting physical and social datasets together, or in relation to each other, is complicated by differences in data generation, collection methods, and approaches and assumptions for handling uncertainty and complexity.
• Design projects from the beginning with both physical and social science contributions.
• Develop shared understandings of how uncertainty is accounted for across methods.
• Recognize the importance and value of data generated through community science.
• Provide detailed descriptions of the methodology used and better data sharing mechanisms.

Communication
When communication is reduced to knowledge 'dissemination', the knowledge generation of communities is erased; power dynamics centering academic structures of knowledge production are reified; and opportunities to co-produce knowledge are foreclosed.
• Value communication transparency, learning, and reflexivity for developing partnerships between researchers and communities.
• Emphasize knowledge co-generation processes as knowledge production partnerships.
• Establish clear ethical standards and incentive structures for participatory research and knowledge co-production.

Scale
Temporal and spatial scale mismatches between different datasets present challenges to integration, and when uncritically integrated, can result in inaccurate or unusable conclusions.
• Clearly describe the temporal and spatial scale of data collection and analysis in relation to the problem of interest.
• Design data collection procedures at multiple scales to generate data that informs advances in physical and social science and stakeholder priorities.
• Apply conceptual frameworks and modes of analysis that fit multi-scale research contexts.
• Make decisions about trade-offs across scales regarding data quality and uncertainties.

Institutional Barriers
Limited institutional support for convergence research strengthens silos between disciplines.
Directed efforts that enhance data accessibility and usability are better positioned to invite novel interdisciplinary collaborations through available data products.

Communication
Integrating physical and social science datasets requires effective communication between researchers, policymakers, and communities across all research phases, including moving beyond equating communication with 'knowledge dissemination'.

Challenges
Communication across the research process is central to all aspects of convergence (Peek et al 2020, Roque et al 2022a). To this end, communication must not be reductively understood as 'knowledge dissemination' between academics (i.e., as 'knowledge producers') and communities and policymakers (i.e., as 'knowledge-users' or 'knowledge-consumers'). When framed as such, the knowledge generation of collaborating communities is erased; power dynamics centering academic structures of knowledge production are reified; and opportunities to co-produce knowledge are foreclosed. Instead, researchers must recognize communication as an on-going, iterative, and open exercise between communities and policymakers, integral to co-developing research objectives and outcomes. Significant challenges remain in implementing such models for data and knowledge integration. These include how integration is communicated and how it may reassert hierarchical relationships between researchers and communities (Klenk and Meehan 2015, Klenk et al 2017); the perceived roles of researchers and communities; and data control, privacy, and access considerations (e.g. Finn et al 2022).

Pathways forward
Actions for addressing these challenges include, but are not limited to:
• Communication Transparency and Learning: Establishing shared understandings between researchers and communities around ontologies, epistemologies, and methodology, and their importance in conducting research for particular aims, requires commitments to communication, learning, and reflexivity (Palmer et al 2016, Finn et al 2022, Bukvic et al 2022; see above: Graef et al 2021, Bennett et al 2018).

Scale
Integrating physical and social science datasets requires addressing temporal and spatial scale mismatches, and evaluating their integrative credibility and usability, including implications for policy- and decision-making (Finn et al 2022).

Challenges
No inherent scale mismatch exists between physical and social sciences data; both can be generated at granular and coarse spatial and temporal scales. However, certain physical science datasets, such as climate data, widely seen as central for addressing complex socio-environmental problems, often emerge from larger, geophysical scales. Much of the community-science data discussed above is often collected within certain administrative or jurisdictional boundaries. Both kinds of data can reflect short- and long-term records (e.g., biogeochemical scales, intergenerational oral histories, etc.). Therefore, researchers must be cognizant of the potential for physical and social science data to be produced at different spatial, organizational, and temporal boundaries and scales. As one prominent example, physical climate data are primarily conveyed on gridded regional or global scales, and are often measured at coarse temporal intervals, such as years or decades (Eyring et al 2016). As the spatial resolution of climate models is refined, however, uncertainty expands (Chen et al 2011, Deser et al 2012). Attempts to engage in convergence research can be thwarted when the spatial and temporal scales of these projections do not align with more granular-scale risk assessments (Finn et al 2022). Researchers who focus on local contexts of climate or other environmental change must build capacity for using physical and social science data in ways that support local decision-making and priorities.

Pathways forward
Actions for addressing multi-scale processes and interactions include, but are not limited to:
• Explicit Identification of Scalar Units and Cross-scale Interactions: Begin with clear identification of the multiple units or levels of analyses required to address the problem(s) of interest. This can facilitate matching of existing datasets to the chosen scales or levels of analysis using aggregation or disaggregation methods.
• Strategic Primary Data Collection with Attention to Scale: When research is based mainly on primary data sources, design data collection procedures at multiple scales to generate data that informs advances in both physical and social science, as well as stakeholder-driven priorities.
• Incorporating Uncertainties into Conceptual and Methodological Frameworks: Apply conceptual frameworks and modes of analysis that fit multi-scale research contexts, and make decisions about trade-offs across scales regarding data quality and uncertainties. For instance, internal climate variability can easily overwhelm any climate change signal on the scale of a single U.S. county. A useful starting point, therefore, may be to present future event scenarios in the absence of assigned event likelihoods (Dessai and Hulme 2004, Shepherd et al 2018).
• Considering Building-up from Scales with the Greatest Understood Certainty: Researchers should consider, carefully, whether to begin integrating and analyzing data at scales matching the greatest levels of environmental certainty. For example, upon having access to similar data (including proxies) at multiple scales, performing analyses across varied levels may provide critical information as to how data behaves when uncertainty increases, and how sensitive the results are to that uncertainty.
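
The aggregation and cross-scale comparison described in the bullets above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method used in the workshop: the grid values, the `cell_to_county` lookup, and the `aggregate` function are all hypothetical, and an unweighted mean stands in for the area- or population-weighted aggregation that a real analysis would more likely need.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical gridded physical variable (e.g. an anomaly value per grid cell).
grid_values = {
    "cell_a": 1.2, "cell_b": 0.8, "cell_c": 1.5,
    "cell_d": 0.4, "cell_e": 1.1, "cell_f": 0.9,
}

# Hypothetical lookup assigning each grid cell to an administrative unit.
cell_to_county = {
    "cell_a": "county_1", "cell_b": "county_1", "cell_c": "county_1",
    "cell_d": "county_2", "cell_e": "county_2", "cell_f": "county_2",
}


def aggregate(values, lookup):
    """Group cell values by unit and return mean, sample spread, and count."""
    groups = defaultdict(list)
    for cell, value in values.items():
        groups[lookup[cell]].append(value)
    return {
        unit: {
            "mean": mean(vals),
            # Sample standard deviation; undefined for a single value, so use 0.0.
            "spread": stdev(vals) if len(vals) > 1 else 0.0,
            "n": len(vals),
        }
        for unit, vals in groups.items()
    }


# Finer scale: one summary per (hypothetical) county.
county_summary = aggregate(grid_values, cell_to_county)

# Coarser scale: treat every cell as part of a single region.
region_summary = aggregate(grid_values, {cell: "region" for cell in grid_values})

for label, summary in (("county", county_summary), ("region", region_summary)):
    for unit, stats in summary.items():
        print(label, unit, stats)
```

Comparing `county_summary` with `region_summary` gives a simple view of how the mean and spread shift as units are merged, which is one way to gauge how sensitive a result is to the chosen scale.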

Institutional barriers
Integrating physical and social science datasets requires de-centering long-standing cultures of siloed and disciplinary research and training, and intransigent rewards systems that, in large part, disincentivize convergence research. Barriers include disciplinary undergraduate and graduate training, tenure review processes, criteria of research productivity, the peer-review process, and availability of funding mechanisms, among other challenges.

Challenges
The systemic colonial foundation of academic knowledge production obstructs convergence research in a myriad of ways (e.g. Bell and Lewis Jr 2022). First, major funding bodies have prioritized physical and natural sciences, often assuming social sciences are additive, or exist to enhance knowledge dissemination of the 'hard sciences'. Researchers and institutions must avoid conflating social science with science communication, which otherwise risks centering the primacy of physical sciences over convergence-based approaches. Second, the time required to build (and repair) relationships and trust with communities (particularly marginalized and underrepresented ones) must be recognized. The time, resources, training, and additional workload of interdisciplinary research are insufficiently acknowledged, despite this research often having a greater capacity for scientific and broader social impact. Training at this nexus is scarce and rarely integrated within undergraduate and graduate curricula, which prevents future generations from developing skills for integrating data across fields. Third, dissemination remains challenging given academia's siloed structure and disciplinary emphasis on refereed academic journals and books.

Pathways forward
Advancing science that integrates physical and social science data requires deep institutional change. Funding agencies, government entities, journal editors, and academic units are examples of key actors who operate at a range of institutional scales and have a critical role in changing individual behavior and reducing barriers to novel, practical, and robust socio-environmental research. Actions that may assist with this progression include, but are not limited to:
• Developing Social Science-led Funding Opportunities: Design and publicize models, such as the NCAR Innovator Program, that rewrite the narrative that social science, and in particular qualitative research, serves as an 'add on' or holds lesser value than other scientific approaches.
• Customizing Structures for Funding Convergence Research: Lengthen funding periods and require community compensation as opportunities for prioritizing the merging of social and physical sciences, and for involving communities, respectively.Refocusing interdisciplinary funding to better align with the realities of complex socio-environmental datasets may support more innovative and impactful research while demonstrating a commitment to advancing convergence approaches.
• Prioritizing Institutional Hiring, Professional Development, and Career Advancement Processes that Value Interdisciplinarity: Ensure new generations of researchers positioned to conduct convergence research will not only be trained, but retained, long-term. This spans teaching, mentoring, and research, and includes interdisciplinary undergraduate and graduate academic majors and curricula. This will benefit the next generation of scientists and help fill broader needs in the scientific community, such as peer reviewers.

Conclusions
In this paper, ECRs from diverse disciplines, and their collaborators, identified shared data integration challenges for effective interdisciplinary and convergence research. Further described were important strategies ECRs proposed to cope with and respond to these barriers prior to initiating research, during data collection and analysis phases, and outside of the research process, more broadly. A common theme across all challenges discussed in the workshop is that successful convergence research requires more time and resources than disciplinary approaches.

Figure 1. Word cloud of the most common self-identified fields and sub-fields of expertise from workshop participants.

Figure 2. Current position, gender, and data integration experience of workshop participants. Note: Graphs represent 22 of the 33 workshop participants.
[...] al 2021: 17, citing Bennett et al 2018 on 'storming'; see also: Wallen et al 2019).
• Incorporating Community Science Research Approaches: Complement long-term observations from scientific instruments (e.g., about physical landscape change) with local communities' knowledge and expertise. Incorporate, where appropriate, community science, data, and observations to better understand local contexts, especially if evidence-based policymaking is an outcome and physical science data is of low resolution (Bélisle et al 2018, Eddy et al 2017, O'Lenick et al 2019, Fraisl et al 2022). Community science can, further, reform existing instruments. This can occur by honoring, centering, and integrating community knowledge into the co-design of evaluation and measurement products, such as public health assessments, designed for community deployment and use (Roque et al 2022b, Roque et al 2023). Last, other examples include creative and interactive research methodologies, including scenario development, simulations, and online narrative games (e.g. Survive the Century; https://survivethecentury.net/) (Pereira et al 2021; elsewhere: Sanga et al 2021), that engage communities and broader diverse audiences around problem- and solution-framing.
• Communicating Underlying Assumptions and Methodologies: Incorporate well-described workflows, assumptions, functional code, and accessible user guidelines into publicly available datasets (Campbell 2005, Brock 2019, Devare et al 2021, Reich et al 2021), making use of repositories such as DesignSafe, the Dataverse Project, the Qualitative Data Repository, and Inter-university Consortium for Political and Social Research (ICPSR).
[...] al 2016, Finn et al 2022, Bukvic et al 2022; see above: Graef et al 2021, Bennett et al 2018).
[...] encompass a focus on Reciprocity in Hazards and Disaster Research (West et al 2021), Positionality in Hazards and Disaster Research and Practice (Evans et al 2023), and Cultural Competence in Hazards and Disaster Research (Wu et al 2019).
• Supportive Incentive Structures: Conventional metrics of academic success rely on citation numbers, publishing outlets, and an array of quantitative metrics (e.g., H-Index; i10-Index). Incentive structures that value, prioritize, and support ethical and partnership-based approaches may better encourage convergence research. Establishing broader metrics for what counts as scholarship and institutionalizing community-defined success metrics when defining research impact are examples (Corbin et al 2015, Staub and Maharramli 2021, Bell and Lewis Jr 2022). As one example, the Faculty Senate at the University of Washington recently approved legislation that amends the Faculty Code to include 'community-engaged' research and teaching activities in the list of scholarly achievements (University of Washington 2023).
• Updating Publishing Processes and Rewards: Review interdisciplinary and field-specific journal aims and consider how convergence research intersects with the work they publish. Value and reward a diversity of research and data products (Corbin et al 2015, Bell and Lewis Jr 2022) consistent with the applied value of convergence research.
[...] datasets and methods, and establish clear lines of communication all prolong such timelines (Finn et al 2022, Bukvic et al 2022). As a result, funding structures and academic reward systems must reflect these needs. Another key takeaway is that pathways to effective data integration will require simultaneous and intentional change at individual, department, and institutional levels (Morss et al 2021). Who has the power to enact change depends on the scale in question. Whether it is changing individual behavior or reducing systemic barriers to socio-environmental research, successfully challenging disciplinary priorities and the implied hierarchy between physical and social science demands concerted efforts from actors at different scales. To this end, we stress that interdisciplinary and convergence approaches do not require researchers to hold highly specialized and overlapping knowledge in disparate disciplines (NRC 2015, Peek et al 2020, Fiore 2021). Rather, 'complementary expertise' (NRC 2015: 23) paired with practices of research engagement, which commit to communicating, processing, and being open to the ontological, methodological, and epistemological ways in which other researchers, policymakers, and communities understand the world around them, is required (e.g., Stokols et al 2008a, NRC 2015, Bennett et al 2018, Graef et al 2021, Fiore 2021). Reflexivity and reflection, a part of this process, further enable opportunities for strengthening learning and collaboration between communities and universities (e.g. Ostrander and Chapin-Hogue 2011). Related multi-scalar challenges and opportunities associated with high functioning interdisciplinary collaborations are elaborated in the Science of Team Science (SciTS) scholarship (see Stokols et al 2008b, Börner et al 2010, NRC 2015, Hall et al 2018). Overall, practices of research engagement can enable the parties involved in convergence research to integrate and transcend disciplinary mental models of critical problems, without being specialists in multiple fields. Cultivating such practices is already emphasized in interdisciplinary education (e.g. course projects) and fellowship training programs (e.g., Wallen et al 2019). However, they must be explicitly emphasized in disciplinary undergraduate and graduate education, with supportive actions including, but not limited to, faculty hiring in joint-appointment positions, convergence-focused research cohort opportunities for students (undergraduate and graduate) and postdoctoral scholars, and course requirements in interdisciplinary fields. In sum, while our results (table 1) are not necessarily novel, and in fact corroborate scholarship identifying obstacles for interdisciplinary and convergent team research (Palmer et al 2016, Peek et al 2020, Morss et al 2021, Finn et al 2022, Bukvic et al 2022), the exercise adds to a critical mass of academics advocating for multi-levelled change to foster meaningful and impactful research activities.