What topics of peer interactions correlate with student performance in physics courses?

Research suggests that interacting with more peers about physics course material is correlated with higher student performance. Some studies, however, have demonstrated that different topics of peer interactions may correlate with their performance in different ways, or possibly not at all. In this study, we probe both the peers with whom students interact about their physics course and the particular aspects of the course material about which they interacted in six different introductory physics courses: four lecture courses and two lab courses. Drawing on social network analysis methods, we replicate prior work demonstrating that, on average, students who interact with more peers in their physics courses have higher final course grades. Expanding on this result, we find that students discuss a wide range of aspects of course material with their peers: concepts, small-group work, assessments, lecture, and homework. We observe that in the lecture courses, interacting with peers about concepts is most strongly correlated with final course grade, with smaller correlations also arising for small-group work and homework. In the lab courses, on the other hand, small-group work is the only interaction topic that significantly correlates with final course grade. We use these findings to discuss how course structures (e.g. grading schemes and weekly course schedules) may shape student interactions and add nuance to prior work by identifying how specific types of student interactions are associated (or not) with performance.


Introduction
Research has demonstrated that interacting with more peers about a science course is linked to increases in students' self-efficacy, sense of belonging, self-confidence, and academic achievement [1][2][3][4][5][6][7][8][9][10]. In regard to academic achievement in particular, interacting with peers about specific course material has been shown to be central to students' learning of that material [11][12][13]. Peer interactions afford students the opportunity to exchange information with one another in the dynamic process of co-constructing their understanding [14][15][16][17]. Collaboration with others also provides opportunities for students to individually reflect on their own understanding.
Most of these studies, however, examine how the number of peer interactions in which a student engages, rather than the specific topics about which they interact, relates to student performance. One study, conducted by Bruun and colleagues [7], separately analyzed networks of student interactions about problem solving and about physics concepts. They found that two groups of students tend to earn higher grades: those who are connected to well-connected others in the problem-solving network, and those who are connected to many other people in general in the physics concepts network. Students' interactions about different aspects of a course, therefore, may correlate with their performance in different ways, or possibly not at all. In the current study, we similarly investigate whether and how student engagement in peer interactions about various topics correlates with their performance. Unlike Bruun and colleagues, we determine these topics through an emergent coding scheme using students' open-ended survey responses about their peer interactions.
We also disentangle the relationships between interaction topics and student performance in the instructional contexts of lab and lecture separately, motivated by our prior work [28]. Given the different learning objectives, course structures, and grading schemes of these two contexts, we hypothesize that students may interact with peers about different topics in each context and that the relationship between interacting about certain topics and performing well in the course may vary between contexts. Physics labs, for example, often center around students' experimental investigations with small groups of peers, both in terms of course schedule (students often attend lab sessions for a few hours per week) and grading scheme (lab grades are often based on deliverables related to students' small-group work, such as lab reports) [30]. Physics lecture courses, in contrast, place emphasis on multiple forms of course engagement: students attend both lectures and small-group problem solving sessions each week and are largely graded on exams and longer homework assignments. Thus, we both probe and analyze student interactions about these two instructional contexts separately.
This study aims to address the following research questions:
(i) To what extent is the number of peers with whom a student interacts related to their final course grade in distinct lab and lecture physics courses?
(ii) About what topics do students interact with their physics peers in distinct lab and lecture courses?
(iii) Which, if any, peer interaction topics correlate with students' final course grades in distinct lab and lecture physics courses?
The first question allows us to compare our data set to those in prior work by performing a similar analysis, while the second and third questions expand on the existing body of literature.
We collected survey data from six different introductory physics courses (four lecture courses and two lab courses) at Cornell University, asking students to self-report their peers with whom they interacted about the course and to describe the aspects of the course material they discussed in these interactions. We find, in most courses, that students who interact with more peers about the course material tend to earn higher final course grades, consistent with prior work. From the written explanations, we observe that the topics about which students interact with their peers closely align with the class structures: in lab, students primarily mention interacting about small-group work, while in lecture, students mostly mention interacting about homework, but they also describe interacting about specific physics concepts and lecture material. Furthermore, different topics of interactions correlate with students' final grades in lab versus lecture courses. In lab, small-group work is the only interaction topic that significantly correlates with students' final grades. In lecture, on the other hand, interacting with peers about physics concepts correlates most strongly with final course grade. Interacting about small-group work and homework also significantly correlates with final course grade in lecture, though with smaller effects than interacting about concepts. These results add nuance to prior work by identifying the specific types of peer interactions, in addition to the number of peer interactions, that are associated with stronger student performance.

Methods
In this section, we summarize the instructional context of our study and then describe the data collection and analysis methods.

Courses and participants
The data for this study came from two offerings (fall and spring of the same academic year) of three in-person introductory physics courses (two lecture courses and one lab course) at Cornell University, giving six courses in total (Table 1).
Table 1. Summary of survey response rates and the self-reported gender, race or ethnicity, intended major, and academic year of students in each course. Survey response rate is calculated as the percent of students enrolled in the course who completed the survey. For demographic information, the percentages are out of the number of students included in our analysis. Students are categorized as non-URM (underrepresented and minoritized) if they only self-identified as White and/or Asian or Asian American and as URM if they self-identified as at least one of the following: American Indian or Alaska Native, Black or African American, Hispanic or Latinx, and Native Hawaiian or other Pacific Islander.
Both lecture courses were calculus-based mechanics courses. One lecture course was primarily designed to serve non-physics majors (predominantly engineering students and other science majors) and the other lecture course was designed to serve physics majors. However, students were encouraged to take courses according to their preferences and academic preparation, leading to some variety in students' majors between each course (Table 1). The lecture course for engineers was larger, with 300-500 students per semester, and was taught in a "flipped-classroom" format. Students in this course attended three 50 min lectures each week. Before each lecture session, students were assigned textbook readings and a short reading quiz. Each lecture section contained 150-300 students in a stadium-style lecture hall, where students engaged in active learning activities such as review of the pre-class reading material, clicker questions, and demonstrations. Students also attended two 50 min discussion sections each week. Each discussion section contained 20-25 students and was led by a graduate teaching assistant and often a supporting undergraduate teaching assistant. In discussion sections, students worked in self-selected small groups of three or four to complete ungraded practice problems. These small groups remained consistent throughout the semester.
The lecture course for physics majors, in contrast, contained 30-50 students and was taught predominantly through traditional lectures in stadium-style classrooms, with a few clicker questions per lecture that students answered individually. Students in this course also attended three 50 min lectures each week. Similar to the other lecture course, students attended two 50 min discussion sections containing 20-25 students each week, where they completed ungraded practice problems in self-selected small groups (which were consistent over the course of the semester) or followed along as a graduate teaching assistant demonstrated problem solutions. Once per week, students completed a short, graded quiz during discussion. In both lecture courses, students completed independent problem sets each week for homework. Students were offered optional, course-facilitated homework sessions outside of class supported by the course instructor or graduate and undergraduate teaching assistants. Each lecture course had two midterm exams and a final exam. The grading scheme for each lecture course is shown in Table 2.
The lab course was offered as a distinct course (i.e., separate course code and final grade) in which the students in the two lecture courses described above typically co-enrolled. The lab course was larger than the combination of the two lecture courses (400-600 students) because students who had transfer or AP credits for the lecture course were still required to take the lab course. The lab course focused on teaching experimental skills, as in Refs. [31][32][33][34], with experiment topics spanning both mechanics and electromagnetism. Students attended one 50 min lecture each week, where each lecture section contained 200-300 students in a stadium-style lecture hall. These lectures focused on experimental and statistical analysis topics and students participated in collaborative active learning activities including small group discussion and clicker questions. Students also attended a 2 h lab session each week, which contained 20-25 students and was facilitated by a graduate teaching assistant and often a supporting undergraduate teaching assistant. Students worked in small groups of two to four to complete open-ended experimental investigations. At the end of each session, each group submitted lab notes for a group grade. Periodically throughout the semester, groups delivered oral presentations about their lab projects to the rest of their lab section. Lab groups were assigned by the graduate teaching assistant, with consideration given to student preferences indicated on an online survey at the beginning of the semester. The teaching assistants were also advised to avoid creating groups with a lone woman. Lab groups stayed consistent throughout the whole semester. Students took one midterm quiz and one final quiz in this course. Students also completed independent homework assignments in Jupyter Notebook focused on data analysis techniques and concepts, as well as reflection exercises about ethics, collaboration, and experimental design [35]. The grading scheme for the lab course is shown in Table 2.
Students who needed to take calculus-based introductory mechanics were advised to co-enroll in one of the two lecture courses and the lab course. Most students co-enrolled, but it was ultimately up to each student whether to take both courses together. Because our data collection took place solely in the lab course, there may have been students in the lecture courses that our data collection missed. The survey response rates were all above 90% of enrolled students, however, suggesting that the proportion of students our data collection missed was small (Table 1). On the other hand, between 10% and 30% of students in the lab course were not enrolled in one of the two lecture courses analyzed in this study. These students were likely only taking the lab course (e.g., if they had AP credit for the lecture course) or were enrolled in a different lecture course than the two lecture courses analyzed here (such as the next course in the sequence focused on electricity and magnetism).

Data collection
We administered an online survey via Qualtrics as part of a homework assignment in the lab course. The survey was given in the middle of the 15-week semester, after students had completed at least one assessment in both the lab and lecture courses. The survey asked students to list peers in each of their physics courses (lab and lecture) with whom they recently had a meaningful interaction using a survey prompt adapted from prior work [19,20,25,28,36]. For each peer that they listed, students were also asked to explain what aspects of the course material they discussed with that peer. The survey prompts were as follows: Please . We asked students about who they interacted with "this week" to capture interactions that students were regularly having with their peers throughout the course while reducing the possibility of recall bias (e.g., if we asked them to recall all peers with whom they have interacted throughout the semester). This phrasing may have captured a few one-off interactions that only occurred the week of the survey; however, such interactions likely represent a small fraction of the reported interactions.
In our data cleaning, responses containing misspelled names, nicknames, or only a first or last name were manually compared to the course roster by the first and second authors. If only a first or last name was listed that was not unique within the course roster, that particular interaction was dropped from the analysis. We were able to match at least 90% of all nominations made in the survey to the roster. Additionally, response rates for each course in the analysis were at least 90% (Table 1). Applying methods of social network analysis to this data set, therefore, is reliable because we have less than 30% missing data [37].
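The uniqueness rule described above can be illustrated with a minimal Python sketch. The matching in this study was done manually by two authors, so the function `match_nomination`, the roster format, and the example names below are all hypothetical; the sketch only shows the logic of keeping a nomination when it resolves to exactly one roster entry and dropping it otherwise.

```python
def match_nomination(raw_name, roster):
    """Return the single roster entry matching raw_name, or None.

    roster is a list of (first_name, last_name) tuples (hypothetical format).
    A nomination is dropped (None) when it matches no one or is ambiguous.
    """
    tokens = raw_name.strip().lower().split()
    if not tokens:
        return None
    if len(tokens) >= 2:
        # Full name given: compare first and last tokens, ignoring case.
        candidates = [s for s in roster
                      if (s[0].lower(), s[1].lower()) == (tokens[0], tokens[-1])]
    else:
        # Only one name given: it may be either a first or a last name.
        candidates = [s for s in roster
                      if tokens[0] in (s[0].lower(), s[1].lower())]
    # Keep the nomination only if exactly one student fits.
    return candidates[0] if len(candidates) == 1 else None


# Hypothetical roster with two students sharing a first name.
roster = [("Ada", "Lovelace"), ("Ada", "Yonath"), ("Emmy", "Noether")]
unique_last = match_nomination("Noether", roster)    # resolves to one student
ambiguous_first = match_nomination("Ada", roster)    # two candidates: dropped
```

A real cleaning pass would also need to handle misspellings and nicknames, which is why the comparison in this study was carried out by hand.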
The survey also asked students to self-disclose their demographic information, including gender, race or ethnicity, academic major, and year (Table 1). Both the lecture course for engineers and the lab course contained approximately even proportions of men and women, while a majority of the students in the lecture course for physics majors were men. Additionally, the proportion of underrepresented and minoritized (URM) students doubled from approximately 15% to 30% between the fall and spring offerings of each of the three analyzed courses. The lab course contained a majority of engineering students. In the lecture courses, the composition of academic majors generally followed the expectation (i.e., majority of engineering majors in the lecture course for engineers and majority of physics majors in the lecture course for physics majors), with slight variation between semesters. All six courses also contained at least 80% first-year students.

Data analysis
We conducted three stages of analysis: determining the structure of the peer interaction networks, measuring the relationship between students' position in the interaction networks and their course grade, and identifying which kinds of interactions are correlated (or not) with students' course grades.

Network structure
We first used methods of social network analysis [10,36,38] to understand the broad structural features of our six interaction networks (one for each of two offerings of the three courses). We converted the self-reported peer interactions into directed networks. Each student was considered a node in the network and each reported interaction from the survey was considered an edge. Edges pointed from the nominating student to the student with whom they reported an interaction. A one-way edge indicated that one student reported having a meaningful interaction with another student, while a two-way edge indicated that two students reported having a meaningful interaction with each other. Interactions conceptually indicate mutual communication and involvement, thus one could interpret all reported interactions as inherently two-way edges. However, one-way interactions may indicate two forms of survey bias. One possibility is recall bias, where one student does not remember the interaction or the name of the other student. The other possibility is an over-reporting of interactions, where the nominator listed many cursory interactions and the nominated student did not consider the interaction meaningful. These biases respectively produce an under-representation or over-representation of meaningful interactions in the course. We treated the networks as directed in our analysis, therefore, such that mutually reported edges are weighted more than one-way edges, but all reported edges are still considered. This treatment helped provide a middle ground between both possible biases.
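As an illustration of this construction, the following minimal Python sketch converts survey nominations into directed edges and separates mutually reported (two-way) pairs from one-way edges. The function names and the toy nominations are hypothetical, and ignoring self-nominations is an assumption of the sketch rather than a step reported in the study.

```python
def build_directed_edges(nominations):
    """nominations maps each nominator to the peers they listed.

    Returns a set of (source, target) pairs pointing from the nominating
    student to the nominated student.
    """
    edges = set()
    for source, targets in nominations.items():
        for target in targets:
            if target != source:  # assumption: self-nominations are ignored
                edges.add((source, target))
    return edges


def split_by_reciprocity(edges):
    """Separate two-way pairs (both students reported) from one-way edges."""
    mutual = {frozenset(e) for e in edges if (e[1], e[0]) in edges}
    one_way = {e for e in edges if (e[1], e[0]) not in edges}
    return mutual, one_way


# Toy survey responses: A and B nominate each other; other edges are one-way.
nominations = {"A": ["B", "C"], "B": ["A"], "C": [], "D": ["A"]}
edges = build_directed_edges(nominations)
mutual, one_way = split_by_reciprocity(edges)
```

Keeping the direction of each edge is what lets a mutually reported interaction count as two edges while a one-way report still counts as one.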
For each network, we measured five different network statistics to describe the overall structure:
(i) Density: the number of edges in the observed network as a proportion of all possible edges that could exist in the network
(ii) Transitivity: the tendency of nodes in the network to cluster together, measured as the proportion of two-paths (two edges connecting three nodes) that are closed by a third edge to form a triangle
(iii) Number of clusters: the number of groups of nodes that are connected to each other but not to any other node in the network
(iv) Giant component: the number of nodes contained in the largest cluster of the network
(v) Number of isolates: the number of nodes in the network that are not connected to any other nodes (i.e., nodes with zero adjacent edges)
We determined the standard errors of the density and transitivity values via bootstrapping with the snowboot package in R [39] to get a sense of uncertainty in the statistics. Each observed network was re-sampled in 10,000 bootstrap trials. We calculated a given network statistic for each sampled network and found the standard error of the statistic across the distribution of all sampled networks.
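The statistics defined above can be sketched in plain Python for a toy network. This is not the analysis code used in the study (which relied on R); the function name is hypothetical. Two assumptions are made and marked in the comments: edge direction is ignored when computing transitivity and clusters, and isolates are counted separately from clusters.

```python
def network_statistics(nodes, edges):
    """Compute density, transitivity, clusters, giant component, isolates.

    nodes: iterable of node labels; edges: set of directed (source, target).
    """
    nodes = list(nodes)
    n = len(nodes)
    # Directed density: observed edges over the n(n-1) possible directed edges.
    density = len(edges) / (n * (n - 1)) if n > 1 else 0.0

    # Assumption: direction is ignored for transitivity and clustering.
    neighbors = {v: set() for v in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Transitivity: share of two-paths whose end points are also connected.
    two_paths = closed = 0
    for center in nodes:
        nbrs = list(neighbors[center])
        for i in range(len(nbrs)):
            for j in range(i + 1, len(nbrs)):
                two_paths += 1
                if nbrs[j] in neighbors[nbrs[i]]:
                    closed += 1
    transitivity = closed / two_paths if two_paths else 0.0

    # Connected components via depth-first search.
    seen, sizes = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            v = stack.pop()
            size += 1
            for w in neighbors[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        sizes.append(size)

    # Assumption: a cluster has at least two nodes; isolates are separate.
    return {
        "density": density,
        "transitivity": transitivity,
        "n_clusters": sum(1 for s in sizes if s > 1),
        "giant_component": max(sizes) if sizes else 0,
        "n_isolates": sum(1 for s in sizes if s == 1),
    }


# Toy network: a triangle (A, B, C), a pair (D, E), and an isolate (F).
toy_nodes = ["A", "B", "C", "D", "E", "F"]
toy_edges = {("A", "B"), ("B", "A"), ("B", "C"), ("C", "A"), ("D", "E")}
stats = network_statistics(toy_nodes, toy_edges)
```

In the toy network, every two-path lies inside the triangle and is closed, so the transitivity is 1 even though the density is low.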

Relationship between interaction networks and grades
Next, we used exponential random graph models (ERGMs) to understand the relationship between students' number of peer interactions and their final grades, controlling for other measurable variables in the networks. ERGMs assume that networks form from a series of social processes and that this formation is often related to the qualities of the members of the network. Therefore, the model considers ways that the nodes of a network might self-organize on a structural level and how attributes of those nodes (e.g., grades) are related to the way they organize.
The model assumes that an observed network is one of a large exponential distribution of potential networks that could form from the given set of nodes. The ERGM models this distribution of possible network structures and determines if patterns of organization (i.e., students with higher grades having more central positions in the network) are significantly more present in the observed network than would occur by random chance [40,41]. The goal is to use k predictor variables or network statistics, $g_k(y)$, and their corresponding coefficients $\theta_k$ to predict the structure of the random (observed) network $Y$. The model takes the form:
$$P_\theta(Y = y) = \frac{\exp\left(\sum_k \theta_k g_k(y)\right)}{\psi},$$
where $y$ is a realization of the random network $Y$ and $\psi = \sum_y \exp\left(\sum_k \theta_k g_k(y)\right)$, with the outer sum running over all possible networks, is a normalization constant that ensures that the probability sums to one. Given an observed network $y$, the coefficients of the model are estimated using Maximum Likelihood Estimation (MLE). Due to the dependence between the network edges, the MLE is commonly approximated with Markov Chain Monte Carlo (MCMC) techniques [42].
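To make the role of the normalization constant concrete, the sketch below enumerates every possible directed network on three nodes and computes the ERGM probability of each one by brute force. It uses only two illustrative statistics (edge count and number of reciprocated pairs) with arbitrary coefficient values; the function name and coefficients are hypothetical and unrelated to the fitted models in this study. Exhaustive enumeration is only feasible for tiny networks, which is why MCMC approximations are needed in practice.

```python
from itertools import product
from math import exp


def ergm_distribution(n_nodes, theta_edges, theta_mutual):
    """Return ({network: probability}, psi) over all directed networks.

    Each network is a frozenset of directed (i, j) edges.
    """
    dyads = [(i, j) for i in range(n_nodes) for j in range(n_nodes) if i != j]
    weights = {}
    for present in product([0, 1], repeat=len(dyads)):
        y = frozenset(d for d, on in zip(dyads, present) if on)
        g_edges = len(y)  # statistic 1: number of edges
        # statistic 2: number of reciprocated pairs, counted once per pair
        g_mutual = sum(1 for (i, j) in y if i < j and (j, i) in y)
        weights[y] = exp(theta_edges * g_edges + theta_mutual * g_mutual)
    psi = sum(weights.values())  # normalization constant
    return {y: w / psi for y, w in weights.items()}, psi


# Illustrative coefficients: edges are rare, reciprocated pairs are favored.
dist, psi = ergm_distribution(3, theta_edges=-1.0, theta_mutual=1.5)
reciprocal_pair = frozenset({(0, 1), (1, 0)})
two_one_way = frozenset({(0, 1), (1, 2)})
odds_ratio = dist[reciprocal_pair] / dist[two_one_way]
```

Both example networks contain two edges, so their probabilities differ only through the reciprocity term, and the ratio equals the exponential of that coefficient.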
The coefficients $\theta_k$ represent log-odds of tie formation and can be interpreted as a weighting of the importance of each modeled configuration for the realized network, where positive coefficients show that the configuration is observed more frequently than by chance after accounting for all other configurations that are modeled, and vice versa for negative coefficients. We chose a set of predictor variables that incorporated both structural variables and nodal variables, similar to our prior work [28]. Our final model included the following predictor variables:
(i) Edges: main intercept term measuring the number of observed edges
(ii) Reciprocity: measure of reciprocal or two-way edges (e.g., student A reports an interaction with student B and student B reports an interaction with student A)
(iii) Geometrically-weighted out-degree (GWOD; decay parameter = 0.7): measure of the distribution of outgoing edges of each node
(iv) Homophily on lab section: measure of edges occurring between students in the same lab section
(v) Homophily on discussion section: measure of edges occurring between students in the same discussion section
(vi) Homophily on lab group: measure of edges occurring between students in the same lab group
(vii) Homophily on gender: measure of edges occurring between students of the same gender
(viii) Main effect of gender on degree (woman): measure comparing women's total number of adjacent edges to men's total number of adjacent edges
(ix) Homophily on race or ethnicity: measure of edges occurring between students of the same URM status
(x) Main effect of race or ethnicity on degree (URM): measure comparing URM students' total number of adjacent edges to non-URM students' total number of adjacent edges
(xi) Main effect of final course grade on degree: measure of correlation between students' total number of adjacent edges and their final course grade
The first three predictor variables measured structural features of each network, while the remaining eight predictor variables measured the relationship of node-level attributes to the formation of edges in the network. In this study, we focused on the main effect of final grade on degree variable to investigate the relationship between students' position in the interaction networks and their performance in the course. We included the other predictor variables to control for other aspects of students' identities and participation that likely affect network formation. Removing these terms from the model may have led to different results for the relationship between grade and network degree through omitted variable bias [43].
We note that this list of variables differs slightly from that used in our previous work [28]. First, we added a term to measure the tendency for students to nominate peers in their same lab group, beyond just those in their same lab or discussion section. We also found that models using the term for geometrically-weighted edgewise shared partners (GWESP) did not converge for our observed networks. Thus, we replaced this term with the geometrically-weighted out-degree (GWOD) term, a term similar to GWESP that aids in model convergence and helps prevent model degeneracy [44]. The GWOD term accounts for the out-degree distribution for all nodes in the network, where a node's out-degree is its number of outgoing edges (i.e., the number of nominations that student reported on the survey). More weight is placed on nodes with lower out-degrees because such distributions are often highly skewed [45,46]. Including this term allowed for model convergence and improvements to the goodness-of-fit diagnostics (see Fig. 5 in the Appendix) because our observed networks had a large proportion of students who did not nominate others.
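For readers unfamiliar with geometrically weighted degree terms, the following Python sketch computes one common form of the statistic. It assumes the parameterization $e^{\alpha} \sum_{i=1}^{n-1} [1 - (1 - e^{-\alpha})^i] D_i(y)$, where $D_i(y)$ is the number of nodes with out-degree $i$ and $\alpha$ is the decay parameter; this is an assumption about the form used in standard ERGM software, not something stated in the text above. The function name and example degree sequences are hypothetical.

```python
from collections import Counter
from math import exp


def gwod_statistic(out_degrees, decay=0.7):
    """Geometrically weighted out-degree statistic (assumed parameterization).

    out_degrees: list with the number of nominations made by each node.
    """
    n = len(out_degrees)
    counts = Counter(out_degrees)  # D_i: number of nodes with out-degree i
    ratio = 1.0 - exp(-decay)
    # Each node with out-degree i contributes 1 + ratio + ... + ratio**(i-1),
    # so every additional nomination adds less than the previous one.
    return exp(decay) * sum((1.0 - ratio ** i) * counts[i] for i in range(1, n))


# Two nominations spread over two students versus concentrated in one.
spread = gwod_statistic([1, 1, 0])
concentrated = gwod_statistic([2, 0, 0])
```

Under this form, spreading two nominations across two students yields a larger statistic than concentrating both in one student, which reflects the diminishing weight given to higher out-degrees.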
We also note that the main effect of final course grade on degree variable cannot handle missing data; therefore, nodes and associated edges that are missing grade data (i.e., students that may have dropped or withdrawn from the course after completing the network survey) were not included in the ERGM analysis. In all cases, at least 90% of enrolled students were retained in the ERGM analysis.

Types of interactions and their relationship with grades
To further examine the correlation between student interactions and their final course grades, we analyzed students' responses to the question: "What aspects of the course material did you discuss with this person?" We performed a thematic coding analysis to identify the main aspects of the courses being discussed among students. The first and third authors initially read all of the student responses across the six analyzed courses to get a sense of the data as a whole [47]. These authors then identified common themes in the explanations and defined a preliminary codebook. The same authors then iteratively coded a subset of responses independently, met to discuss coding disagreements, and modified code definitions [48]. Modifications to the codebook were also discussed with the full project team between iterations.
Once the coding scheme was finalized, the two authors coded a random sample of 100 of the 1,982 total reported interactions (1,177 in lab and 805 in lecture) across all six courses to determine interrater reliability. We stratified the random sample by instructional context (50 in lab and 50 in lecture) because the courses were structured differently and had different learning objectives. We calculated Fuzzy Kappa [49] to determine interrater reliability between the two coders because multiple codes could be applied to each response. Fuzzy Kappa was 0.93, exceeding the reliability threshold of 0.80 [49]. After establishing reliability, the first author coded the remaining explanations.
We then created histograms of the code frequencies to determine which aspects of the lecture and lab courses students discussed with each other most often. When determining the frequencies of each code, each reported interaction was counted separately. That is, if two students mutually reported an interaction with one another, the codes from each of their reported interactions were counted. We also combined data from the lecture course for physics majors and the lecture course for engineers because there were not large differences in the code distributions in each of these courses. Within each instructional context, lab and lecture, we aggregated the data from the fall and spring offerings because there were not substantial differences in the code distributions when considering each offering separately. This larger data set helped to reduce possible noise in our statistical analysis.
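The counting rule described above, in which each reported interaction contributes its own codes even when the interaction was mutually reported, can be sketched as follows. The function name, the tuple format, and the toy data are hypothetical; the code labels are the interaction topics named in this paper.

```python
from collections import Counter


def code_frequencies(reported_interactions):
    """Count how many reported interactions received each code.

    reported_interactions: list of (nominator, nominee, codes) tuples, one
    per survey nomination. A mutually reported interaction appears twice,
    once from each student, and both reports are counted.
    """
    counts = Counter()
    for _nominator, _nominee, codes in reported_interactions:
        counts.update(set(codes))  # a code counts once per reported interaction
    return counts


# Toy data: A and B report each other; C reports an interaction with A.
toy_reports = [
    ("A", "B", {"homework", "concepts"}),
    ("B", "A", {"homework"}),
    ("C", "A", {"small-group work"}),
]
frequencies = code_frequencies(toy_reports)
```

Because A and B each described their shared interaction, the homework code is counted twice in this toy example.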
We employed linear mixed models, or hierarchical linear models [50], to understand how the topics of students' interactions, identified by our coding scheme, related to their course performance. We ran a linear mixed model for each context, lab and lecture, because the relationships between interaction topics and final grades may be different in each context. We included a random effect in each model to account for variations (e.g., instructor, instructional style, and students' prior preparation) in the different lecture courses students were taking: fall offering of the lecture course for engineers, fall offering of the lecture course for physics majors, fall offering of any other lecture course, spring offering of the lecture course for engineers, spring offering of the lecture course for physics majors, and spring offering of any other lecture course. The "any other lecture course" categories were only used in the lab model, where there were students who were not enrolled in one of the two lecture courses analyzed in this study. We calculated the intraclass correlation coefficient (ICC) for each model to check whether including this random effect was justified [50]. ICC quantifies the fraction of the total variance in the student-level data that can be attributed to variance between each offering of the lecture courses. Typically, a random effect should be included in the model if the ICC for that effect is at least 0.05. The ICC for course was 0.02 and 0.11 in the lab and lecture models, respectively. Thus, the ICC for the lecture model surpasses the common threshold, while the ICC for the lab model does not. We opted to use the same linear mixed model for both the lab and lecture contexts for consistency. We also checked that using a single-level linear regression (i.e., without the random effect) for the lab context produced the same overall results as the linear mixed model.
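As a rough illustration of what the ICC measures, the Python sketch below estimates it from grouped data using a one-way analysis-of-variance estimator for equally sized groups. This is a simplification and an assumption of the sketch: in the study, the ICC came from the fitted mixed models, and the course offerings were not equally sized. The function name and the toy grade values are hypothetical.

```python
def anova_icc(groups):
    """Estimate the ICC as between-group variance over total variance.

    groups: list of equally sized lists of outcomes (e.g., GPA points),
    one list per course offering. Balanced groups are assumed.
    """
    k = len(groups)
    m = len(groups[0]) if groups else 0
    if k < 2 or m < 2 or any(len(g) != m for g in groups):
        raise ValueError("expects at least two equally sized groups of size >= 2")
    group_means = [sum(g) / m for g in groups]
    grand_mean = sum(group_means) / k
    # Mean squares between and within groups.
    ms_between = m * sum((gm - grand_mean) ** 2 for gm in group_means) / (k - 1)
    ms_within = sum((y - gm) ** 2
                    for g, gm in zip(groups, group_means)
                    for y in g) / (k * (m - 1))
    # Between-group variance component, truncated at zero.
    var_between = max((ms_between - ms_within) / m, 0.0)
    total = var_between + ms_within
    return var_between / total if total > 0 else 0.0


# Toy grades for two hypothetical course offerings.
toy_icc = anova_icc([[2.0, 3.0], [3.0, 4.0]])
include_random_effect = toy_icc >= 0.05  # threshold quoted in the text
```

An ICC near zero means that grades vary about as much within an offering as across offerings, in which case the random effect adds little to the model.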
In the models, students' final grades were the dependent variable and whether or not they interacted with at least one peer about each interaction code comprised the binary predictor variables. Students' final course grades were converted from letter grades to grade point average (GPA) points. We found that this produced enough discrete values to be approximated as a continuous variable. We also checked that using an ordinal logistic regression produced the same overall results, but report the results of the linear regression because they are more interpretable.
For the predictor variables, we considered a student as interacting about a given code if they had at least one interaction (either an incoming or outgoing edge) that received that code. We decided not to consider the number of times each student interacted about each code because the distributions of codes per student were highly skewed, with many students having zero interactions and very few students having at least two interactions about each code. Grouping together students who had at least one interaction about each code, therefore, allowed for more comparable sample sizes of students with zero and at least one interaction. Additionally, considering the number of times each student interacted about each code would simultaneously measure the effect of the number of interactions the student had and the topic of interaction. Considering only whether the student had at least one interaction with each code more explicitly addressed our third research question. That is, this treatment of the predictor variables directly determined whether or not students who interact with peers about a given topic, regardless of the number of these peers, tend to receive a higher course grade than students who do not interact with any peers about that topic.
Similar to our ERGMs, the linear mixed models did not include students with no final course grade (less than 10% of enrolled students). Interactions reported by a student who was dropped from the analysis, however, still counted for the student who remained in the analysis.
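The construction of the binary predictor variables can be sketched in Python as follows. The function name, data format, and toy data are hypothetical. The sketch follows the two rules stated above: a student counts as interacting about a code if any incoming or outgoing edge carries that code, and an edge reported by a student who was dropped from the analysis still counts for the retained student on the other end.

```python
def topic_indicators(students, coded_edges, codes):
    """Build 0/1 indicators of whether each student interacted about each code.

    students: students retained in the analysis (those with a final grade).
    coded_edges: list of (nominator, nominee, codes_for_this_interaction).
    codes: the interaction codes used as predictors.
    """
    indicators = {s: {c: 0 for c in codes} for s in students}
    for source, target, edge_codes in coded_edges:
        # Both ends of the edge are credited: outgoing for the nominator,
        # incoming for the nominee. Students not retained are skipped.
        for student in (source, target):
            if student in indicators:
                for code in edge_codes:
                    if code in indicators[student]:
                        indicators[student][code] = 1
    return indicators


# Toy data: student D has no final grade and is not in the retained list,
# but D's reported interaction with C still counts for C.
retained = ["A", "B", "C"]
toy_edges = [
    ("A", "B", {"concepts"}),
    ("D", "C", {"homework"}),
    ("B", "A", {"homework"}),
]
predictors = topic_indicators(retained, toy_edges, ["concepts", "homework"])
```

Each row of the resulting table would then serve as the set of binary predictors for one student's final grade in the regression.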

Results
In this section, we present the results for each stage of analysis.

Network structure
The network diagrams and network-level statistics for all six interaction networks are shown in Fig. 1 and Table 3, respectively. In Fig. 1, each student is represented as a node and the nodes are colored by students' final course grades, with darker blue indicating lower grades and lighter green and yellow indicating higher grades. Nodes are also sized by total degree (sum of incoming and outgoing edges), with larger nodes having more connections in the network than smaller nodes. Each of the connections (edges) between nodes represents a reported interaction, with thin lines representing one-way edges (only one student reported the interaction) and thick lines representing two-way edges (both students reported the interaction). We observe that most of the networks are quite interconnected, containing many edges. Network densities cannot be directly compared across networks of vastly different sizes because this measure does not scale linearly with the number of nodes in a network; however, we see that the densities for both offerings of the lab course and the lecture course for engineers (courses with relatively similar class sizes) are comparable in magnitude. The densities of interaction networks in the lecture course for physics majors are larger in magnitude because there are fewer nodes and thus fewer possible edges.
The networks also contain relatively large giant components (the largest interconnected cluster of students in a given network).Specifically, in four out of six courses, more than 60% of all students in the network are connected within this giant component.We suspect that the smaller proportion of students in the giant component in the spring offerings of the lab course and the lecture course for engineers is due to the class sizes increasing by 1.5 times in the lab course and 2.2 times in the lecture course compared to the fall offerings.Indeed, the raw number of students in the giant components of these two networks is comparable to the number of students in the giant components of the fall offerings.Similarly, students in the spring offerings of the lab course and the lecture course for engineers form more individual clusters (more than 50) than students in the other four courses (21 or fewer).These observations are likely due to students having a fixed capacity for knowing and interacting with peers regardless of the size of the class in which they are enrolled.
The transitivity values are also similar across all six networks, indicating that students in all of the analyzed courses have similar tendencies to form small groups of peers with whom they interact.The proportions of nodes that are isolates are also similar across all six networks, with 14% to 30% of students in each course having zero adjacent edges.Isolated students did not report any interactions on the survey and no other students in the class reported interacting with them.
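The giant component and isolate fractions can be computed by grouping nodes into connected clusters. The sketch below is one way to do this, assuming edge direction is ignored when deciding whether two students are connected (an assumption, since the text does not specify this); the node and edge lists are hypothetical.

```python
from collections import defaultdict, deque


def components(nodes, edges):
    """Group nodes into connected clusters, ignoring edge direction."""
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    unvisited, found = set(nodes), []
    while unvisited:
        start = unvisited.pop()
        cluster, queue = {start}, deque([start])
        while queue:
            # Breadth-first search outward from the starting node.
            for peer in neighbors[queue.popleft()]:
                if peer in unvisited:
                    unvisited.remove(peer)
                    cluster.add(peer)
                    queue.append(peer)
        found.append(cluster)
    return found


# Hypothetical network with six students and three reported interactions.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]

clusters = components(nodes, edges)
giant = max(clusters, key=len)
isolates = [c for c in clusters if len(c) == 1]
giant_fraction = len(giant) / len(nodes)
isolate_fraction = len(isolates) / len(nodes)
```

In this toy network the giant component holds half of the students and one student is an isolate, with both fractions taken over all nodes as in Table 3.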

Relationship between interaction networks and grades
Visually, we observe in Fig. 1 that many of the large, well-connected nodes in these networks tend to have lighter colors, indicating higher grades. There are, however, some exceptions to this general trend. For example, the spring offering of the lecture course for engineers contains many large dark blue nodes, indicating lower grades, that are connected to many other nodes.
The ERGMs allow us to quantitatively measure this relationship between students' position in the interaction network and their final grade. Controlling for other structural features of the network, we find that there is a significant, positive correlation between students' degree in the interaction network and their final course grade in both offerings of the lab course and both offerings of the lecture course for engineers (Fig. 2 and Table 5 in the Appendix). In both offerings of the lecture course for physics majors, however, we see no significant correlation between students' grades and their interaction network degree.

Table 4 (fragment recovered from extraction). Example explanations: "Attended [the] study hall and discussed strategies for the homework", "We live together so we make sure to discuss the reading chapters." Other: Response is vague, captures ideas that are too infrequent to warrant a separate code, or is entirely blank. Example: "General well-being".

Types of interactions and their relationship with grades
Our coding scheme captures common topics that students report interacting about with their peers (Table 4). The coding scheme closely aligns with the class structure, assignments, and grading scheme (Table 2) of each course (i.e., small-group work, assessments, lecture, and homework), with the exception of concepts, which indicates interactions about specific physics content students are learning. In the lab courses (top panel of Fig. 3), small-group work is the most common interaction topic, with students mentioning this topic in about 60% of the explanations. Students also mention concepts, lecture, and other topics in 10%-20% of the reported interactions. In the lecture courses (bottom panel of Fig. 3), interactions about homework occur the most often (about 40%), while the remaining five codes are similarly frequent (between 10% and 25%).
Linear mixed models indicate which of these interaction topics are correlated with students' final grades in each course. In the lab course, interacting about small-group work correlates most strongly with students' final grades (top panel of Fig. 4 and Table 6 in the Appendix). Interacting with at least one person about small-group work corresponds to an increase in final course grade by 0.2 GPA points as compared with students who did not interact with any peers about small-group work. Interacting with at least one peer about the lab lecture also positively correlates with students' final grades (corresponding to an increase in final course grade by about 0.1 GPA points), though the effect is not statistically distinguishable from zero.
In the lecture courses, we see a wider range of interaction topics that significantly correlate with students' final course grades (bottom panel of Fig. 4 and Table 6 in the Appendix). Interacting with at least one peer about concepts correlates most strongly with students' final grades in the lecture courses, corresponding to an increase in final course grade by about 0.2 GPA points as compared with students who did not interact with any peers about concepts. Additionally, interacting with at least one peer about small-group work or homework positively correlates with final course grade, each corresponding to an increase in final course grade by about 0.15 GPA points. Similar to the lab courses, interacting with peers about lecture positively correlates with final course grade (corresponding to an increase in final course grade by about 0.1 GPA points) in the lecture courses, but this effect is not statistically distinguishable from zero.
Interestingly, the extent to which interaction topics correlate with final course grade (Fig. 4) mirrors the frequencies of the topics (Fig. 3) in the lab course, with more frequently mentioned topics generally having a higher correlation with final grade in this course. In the lecture courses, on the other hand, the interaction topic frequencies do not closely mirror their relationship with student grades. While there is a breadth of interaction topics that students frequently discuss and that significantly correlate with grades in the lecture courses, there are some exceptions to this pattern. Homework, for example, is by far the most discussed topic, but is only moderately correlated with grades.

Discussion
In this study, we identified the specific course topics about which students interact with their peers and determined whether and how engagement in interactions about these topics is correlated with students' final course grades. In the remainder of this section, we synthesize the findings for our three research questions within each instructional context, lab and lecture, separately and then discuss the limitations of our study.

Relationship between peer interactions and student performance in lab
Regarding our first research question, we replicate prior work finding that engagement in more peer interactions within a lab course is positively correlated with students' lab grades [16,29]. We did not find such a correlation in our previous work investigating remote physics labs, possibly because the lab material was part of the larger lecture course, rather than a distinct course, and only accounted for between 10% and 20% of students' final course grades [28]. The current result is still somewhat surprising because we expected, due to the collaborative nature of the lab assignments, that all students would interact with each other during in-person lab sessions. A small range in students' number of peer interactions would reduce the possibility of finding a statistically significant correlation between interactions and grades. Instead, however, our results point to the variability in the extent to which students engage in and perceive meaningful small group interactions during lab. For example, there are students who may show up to a different section than their own on any given week (e.g., due to illness) and work with an entirely new group of peers with whom they are not familiar. There may also be situations where group dynamics lead to students being excluded from the conversation. Still other students may choose to disengage from the activity entirely and opt to do other work on their phones or laptops.
Regarding our second and third research questions, we build on prior work by also examining the topics about which students interact related to lab instruction and how those interaction topics relate to student performance. Our analysis revealed that the majority of peer interactions in the lab course were related to the small-group work that takes place during lab sessions, as one might expect. To a much lesser extent, students also talked to their lab peers about physics concepts, lab lecture, assessments, homework, and other topics. These topic frequencies mirrored the strength of correlations between students interacting about a given topic and their final course grade: small-group work was the only topic that significantly correlated with student performance in the lab course.
These results are likely due to features of the lab course structure such as the time allotted to each course component, the grading scheme, and the nature of the learning activities. In this lab course, in-class time was primarily allotted to small-group work (2 h per week), and lab notes and presentations completed during this small-group work comprised 54% of students' final course grades (see Table 2). The open-ended nature of the experimental investigations performed during lab sessions also necessitated peer interactions to, for example, make experimental decisions and write up and submit the lab notes for a group grade, more so than in a traditional physics lab [21]. The large amount of class time and large fraction of the final course grade dedicated to small-group work, and the collaborative nature of the small-group work, therefore, likely explain why this topic was both the most common and the most strongly correlated with student performance in the lab course.
In contrast, students only attended lab lectures for 50 min per week and lecture attendance and participation only accounted for 18% of students' final course grades. While the lectures made use of active learning strategies such as clicker questions, the collaborative nature of lecture activities likely did not outweigh the relatively low time and grade weight allotted to this course component. Likewise, homework and assessments (quizzes) did not take up a lot of time relative to other coursework, made up a small fraction of the course grade (15% and 12%, respectively), and were mostly completed individually.
Though not directly reflected in the course structure or grading scheme, students did not report interacting about concepts very frequently in the lab course nor did interacting about concepts correlate with student performance. With regard to physics concepts (e.g., angular momentum), this finding is consistent with the learning goals of the lab course, which focused on experimental skills rather than content reinforcement [31][32][33][34]. The concepts code, however, also captured concepts specific to the lab course, such as data analysis concepts, that students applied in their small-group work, homework, and assessments. We suspect that students' interactions about such concepts may have been conflated in their reports of these other interaction topics (e.g., students may have interacted about concepts during the small-group work to which they referred in their written explanation). Alternatively, the lack of correlation may again reflect the overall grading scheme, where assessment of these concepts (through homework and quizzes) made up a relatively small proportion of students' grades.

Relationship between peer interactions and student performance in lecture
Regarding the first research question, we again replicate previous findings that students' number of peer interactions is positively correlated with their course grades in the lecture course for engineers [5][6][7][8][9][10][28]. Interestingly, however, this effect was not statistically distinguishable from zero in the lecture course for physics majors. An instinctive explanation is that this result is attributable to small sample sizes: the lecture course for physics majors contained 45 and 36 students in the fall and spring, respectively. These samples were likely sufficiently large, however, because we found other statistically significant relationships in the ERGMs for these networks. For example, we found significant gender effects in the fall lecture course for physics majors despite the small fraction of women in that course (main effect of gender on degree variable in Table 5 in the Appendix).
Instead, our findings for this course could be due to range restriction, for example low variability in either students' number of peer interactions or students' final course grade, which would reduce the possibility of finding a significant correlation between the two variables. The range of students' network degree in the lecture courses for physics majors (zero to nine), however, is comparable to that in the lecture courses for engineers, so the variability in students' number of peer interactions was likely sufficient. Final course grades, however, were less variable. While final grades ranged from C+ to A+ in the spring lecture course for physics majors, final grades only ranged from B to A+ in the fall offering (compared to a range of D- to A+ in the lecture course for engineers). The limited variability in student grades, therefore, is a plausible explanation for our results in the fall, but not the spring, offering of the lecture course for physics majors.
Alternatively, these results may be due to a truly non-meaningful relationship between student interactions and final grade in this course. It is plausible that physics majors engage in peer interactions about their physics course because they are interested in the subject, and that the extent to which they engage in such interactions is more related to interest than performance. Future work, therefore, should aim to further understand the relationship between students' interaction network degree and their final grade in physics lecture courses across different class sizes, instructional styles, and student populations.
Expanding on these findings for our second and third research questions, we observed that the frequencies of the six different interaction topics were more uniform in the lecture courses than in the lab course. The most common interaction topic was homework, followed by (in descending order) concepts, lecture, small-group work, assessments, and other topics. Our statistical analysis, in turn, indicated that concepts, small-group work, and homework were the three interaction topics most strongly correlated with student performance. The remaining topics did not exhibit a significant relationship with final course grade. These results are consistent with Bruun and colleagues' demonstration of the significant impacts of peer interactions about both concepts and problem solving on performance, though we now explicitly link this pattern to different aspects of the course (i.e., problem solving is specifically about student participation in small-group work during discussion sections) [7].
Similar to the lab course, these findings for the lecture courses are likely attributable to a combination of the time allotted to each course component, the grading schemes, and the nature of the learning activities. Teaching concepts, for example, was the primary aim of the lecture courses and the central focus of all of the course components (small-group work, assessments, lecture, and homework). Students' interactions with peers about such concepts, in turn, were strongly related to their performance. Students also engaged in small-group work during two 50 min discussion sections each week and this participation accounted for 20% of their final course grade in the lecture course for engineers. This small-group work was also collaboration-oriented, such that students were prompted to work together with their peers on physics problems. Interestingly, homework was an important interaction topic but only comprised 5% of the final course grade in the lecture course for engineers (30% in the lecture course for physics majors).
We suspect the prevalence of interactions about homework is related to both time and the nature of homework assignments because students spent a handful of hours per week outside of class on the homework assignments, including attending homework help sessions where students often worked together. The homework problems were also similar to those on the exams (worth 65% or 70% of the final grade), motivating students to complete and understand them.
Surprisingly, though talking about lecture was not rare, such interactions were not strongly correlated with student performance. Lectures were the primary component of in-class time in both of the lecture courses (three 50 min sessions per week) and often implemented clicker questions discussed in small groups. Participation in lecture, however, hardly contributed to students' grades (5% or 0%). Still, this result is contrary to prior work showing that student engagement in in-class activities is linked to their conceptual understanding [51], which we would expect to be reflected in higher course grades, warranting future research.
Finally, assessments were the least frequent interaction topic in the lecture courses and did not strongly correlate with students' grades. As mentioned above, this result could be due to students' interactions about concepts, rather than about the exams themselves, relating to their performance on exams, which comprise the majority of the final course grades (65% or 70%). Alternatively, this finding may be attributable to the timing of our data collection. The survey was not administered close to a midterm exam in all but one of the courses, which may explain the low frequency of this topic and weak correlation with overall performance. Future research should examine how the timing of such a survey impacts both the topics about which students report interacting and the relationship between interaction topics and final course grades.

Limitations and future work
We conclude by acknowledging the limitations to our study that motivate additional follow-up. First, our network survey may not have captured all interactions between students, for example due to recall bias where students do not remember a peer interaction and/or do not remember their peers' names to report. We also asked students about peers with whom they interacted "this week." While this prompt likely captured many interactions that happen consistently week-to-week, it may also have captured one-off interactions that only occurred during the week of the survey. Future work should seek to disentangle these two kinds of peer interactions in terms of the number of peers students report and whether one kind of relationship is more impactful for student performance than another. Future studies should also explore the impact of different strategies for measuring students' interactions, such as providing full rosters of student names, and how these methods may affect the reported interactions and relationships with outcomes.
In our linear mixed models, we chose to simplify our data by only considering whether or not students interacted about each topic. This decision allowed our analysis to isolate the relationship between interaction topics and grades, intentionally removing information about the number of peers with whom students interacted about each topic. Future work should examine whether and how the number and topics of peer interactions are related to student outcomes. For example, there could be thresholds of numbers of interactions above which the relationship with course grade plateaus and these thresholds may be different for each interaction topic.
In addition, as with any correlational analysis about student interactions, we cannot quantitatively disentangle whether peer interactions lead to higher grades or whether students with higher grades interact more and/or about different topics. Future work to disentangle causation could probe student interactions over time, control for incoming course performance, or conduct student interviews about the roles of their various peer interactions.
We also examined the correlations between students' interaction topics and their course grades for all students combined. Prior work, however, has found mixed results as to whether students from different demographic groups engage in peer interactions to a similar extent [4-6, 26, 27] and that course grading schemes can differentially impact the final grades of students from different demographic groups [52][53][54]. Future work, therefore, should determine whether the role of interaction topics in student performance varies by student gender and race or ethnicity. Doing so would likely require much larger sample sizes than those included in this study, in order to have sufficient statistical power for multiple interaction terms.
Finally, this study was conducted at one institution with only a few instructional styles. Future research should examine these relationships with different student populations and in different instructional contexts, particularly those with different course structures and grading schemes as these factors seem to be strongly related to interaction patterns.

Conclusion
We have built on the body of evidence suggesting that student interactions about different aspects of a physics course may be related to their performance in different ways. We also found that these relationships vary across instructional contexts: lab and lecture. Importantly, the interaction topics that most strongly correlated with students' grades mapped onto the assignments given the most weight in the course structures, through both the instructional time allotted and the grading schemes, and the nature of the assignments, whether collaborative or individually completed. These findings indicate that patterns of peer interactions, including which kinds of interactions are important for students' performance, are largely shaped by the instructional design of a course. Both instructors and researchers should consider these effects of course design on peer networks in order to ensure that all students have the opportunity to interact with their peers about meaningful topics that could impact their performance.

ERGM coefficient estimates
The coefficient estimates of the ERGM for each network are summarized in Table 5. We interpret the coefficient estimates as log-odds of edge formation. For example, the coefficient estimate for the homophily on lab section variable for the fall offering of the lab course is 1.66. This means that the log-odds of an edge forming in the network increases by 1.66 for each additional edge connecting students in the same lab section, holding the rest of the network the same. In other words, edges connecting students in the same lab section are more probable than edges connecting students in different lab sections, even after accounting for the other configurations included in the model.
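To make the log-odds scale concrete, the short sketch below converts the coefficient of 1.66 quoted above into an odds ratio. The baseline log-odds used to illustrate the change in probability is a hypothetical value, not an estimate from Table 5, and the function names are illustrative only.

```python
import math


def odds_ratio(coefficient):
    """Multiplicative change in the odds of an edge for a one-unit change in the statistic."""
    return math.exp(coefficient)


def probability(log_odds):
    """Convert conditional log-odds of an edge into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


same_section = 1.66                # coefficient quoted in the text
ratio = odds_ratio(same_section)   # about 5.26

baseline = -4.0                    # hypothetical log-odds for a cross-section edge
p_cross_section = probability(baseline)
p_same_section = probability(baseline + same_section)
```

Under this reading, and holding the rest of the network fixed, the odds of an edge between two students in the same lab section are roughly five times the odds of an otherwise equivalent edge between students in different sections.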

ERGMs: Goodness of fit
The goodness-of-fit of an ERGM can be evaluated by comparing our observed network to a distribution of random networks simulated using the model coefficients. Figure 5 shows this comparison for one of the observed networks in this study for three different network measures: indegree (the number of incoming edges), outdegree (the number of outgoing edges), and edge-wise shared partners (a measure of triadic closure, or small-group clustering). The boxplots represent the distribution of frequencies of these measures for 10 simulated networks. We see that our observed network, represented by the black line, falls within the distribution of simulated networks, indicating that our statistical model sufficiently represents the observed network. We observed similar goodness-of-fit plots for the remaining five networks in the study as well.

Linear mixed model results
The coefficient estimates of both linear mixed models presented in the main text are provided in Table 6. Coefficients represent the mean increase in final course grade for a student interacting with at least one other peer about an interaction topic, as compared to a student who does not interact with any other peers about that topic. For example, a student who interacts with at least one other student about small-group work in the lab courses has, on average, a final grade 0.2 GPA points higher than a student who does not interact with any peers about small-group work.

Model diagnostics for linear mixed models
Here we assess the model diagnostics for the linear mixed models used in our analysis.
Variance inflation factors (VIFs)
Variance inflation factors (VIFs) are a measure of multicollinearity of variables in a linear regression model, which can affect the model's precision. Multicollinearity indicates that two or more of the predictor variables vary closely with each other, reducing our ability to distinguish the significance of each variable on its own. VIFs measure the ratio of the variance of the coefficient of a variable in the full model to the variance of the coefficient of that variable in a model containing only that variable. For example, a VIF of two indicates that the variance of a coefficient in the full model is twice what it would be in a model containing only that variable (equivalently, its standard error is about 1.4 times larger). VIF values less than two suggest adequate model precision and reliability. The VIFs of the predictor variables in our linear mixed models are all close to one (Table 7), suggesting sufficient precision of our estimated effects.
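As a worked illustration, the sketch below computes a VIF using the standard formula VIF = 1/(1 - R^2), where R^2 comes from regressing one predictor on the others. To stay self-contained it covers only the two-predictor case, where R^2 reduces to the squared Pearson correlation; the indicator vectors are hypothetical and this is not the diagnostic code used for Table 7. The square root of the VIF gives the corresponding inflation of the standard error.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x, y):
    """VIF = 1 / (1 - R^2); with two predictors, R^2 is the squared correlation."""
    r = pearson(x, y)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical 0/1 indicators for two interaction topics across eight students.
homework = [1, 1, 1, 1, 0, 0, 0, 0]
concepts = [1, 1, 1, 0, 0, 0, 0, 1]

vif = vif_two_predictors(homework, concepts)
se_inflation = math.sqrt(vif)
```

A VIF close to one, as reported for the models in this study, means the predictors are nearly uncorrelated and the coefficient uncertainty is barely inflated.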
Checking model assumptions
We checked the three main assumptions of linear mixed models: homoscedasticity of residuals, normality of residuals, and homogeneity of variance of residuals (Fig. 6). The assumption of homoscedasticity requires that residuals are randomly scattered about zero across the range of predicted values of the dependent variable. We do not observe any strong trends in the residual plots (top row of Fig. 6), though we note that the discrete rows of residuals and the upper bound to the distribution are due to the nature of the dependent variable: final course grades are measured in discrete GPA values and capped at 4.3. Quantile-quantile plots are used to compare the distribution of residuals to a normal distribution. We mostly observe a normal distribution for our models, with some departure from normality at the tails of the distribution (middle row of Fig. 6). This is a common pattern, however, and the regression results are still valid when the dependent variable is not normally distributed if the sample size is sufficiently large, as we have in this study [55].
Lastly, the homogeneity of variance assumption requires that there are not significant differences in the distribution of residuals for each value of the random effect variable, in this case the different lecture courses in which students were enrolled. The boxplots of the residuals within each lecture course show fairly consistent medians and interquartile ranges (bottom row of Fig. 6). One-way ANOVAs comparing the residuals across the lecture courses also do not suggest a significant difference between the variances of residuals (lab: p = 0.96, lecture: p = 0.96).

Figure 1. Diagrams of interaction networks for all six courses. Nodes are colored by final course grade and sized proportional to total degree (number of edges connected to each node). Thick edges represent reciprocal edges (students A and B both reported interacting with one another) and thin edges represent one-way edges (student A reported interacting with student B, but student B did not report interacting with student A).

Figure 2. Plot of ERGM coefficients for the main effect of final course grade on degree variable for each observed network. A more positive (negative) coefficient estimate indicates that students with higher final course grades have more (fewer) total connections in the network than students with lower final course grades. Error bars indicate the standard error for each estimate and asterisks indicate statistical significance.

Figure 3. Histograms showing the frequencies of each interaction topic in each instructional context. The bars within each context may add up to more than one because each explanation could receive more than one code.

Figure 4. Linear mixed model results. Each coefficient compares the final course grades (on a GPA scale) of students who interact with at least one peer about a given topic to the grades of students who do not interact with any peers about that topic.

Figure 6. Model diagnostic plots for the linear mixed models. The plots in the top row visualize the residuals across the fitted values. The middle row shows quantile-quantile plots comparing the distribution of standardized residuals to a normal distribution. The bottom row shows the distribution of residuals at each value of the random effect variable (lecture course).

Table 2. Grading schemes for each of the three courses analyzed in this study. The grading schemes were consistent between the fall and spring semesters in each course.

Table 3. Summary of network-level statistics for the observed interaction networks. Standard errors of the last digit of density and transitivity are shown in parentheses. The percentages for size of giant component and isolates are calculated as a fraction of all nodes in a given network.

Table 4. Definitions and examples of the coding scheme characterizing students' explanations of what they talk about with other students. The same codes were applied in all courses.

Table 5. Coefficient estimates for our ERGM fit to the six observed networks. Standard errors of the coefficient estimates are in parentheses below. Asterisks indicate statistical significance (*p < 0.05; **p < 0.01; ***p < 0.001).

Table 7. Variance inflation factors for the linear mixed models.