Investigating the epistemology of physics students while reflecting on solutions

Reflecting on one’s solution is widely recognized as an important part of the problem-solving process, though the cognitive processes underlying this action are not well understood. In previous work, we identified certain strategies students used while reflecting on a solution but found that strategies most often used by experts were not often used by students and were often used incorrectly. In this paper, we present the results of a study that more carefully examines why this could be the case. We conducted think-aloud interviews with students from a variety of physics backgrounds and asked them to check the answer to a particular static equilibrium problem (there were two different problems, but each student saw only one). We found that students’ strategy use varied both by problem features and level of experience. We found students’ epistemological framing to be more stable across problem-difficulty but still correlated with experience. We further noticed more frequent shifting of epistemological frames among intermediate students. Altogether, the results point to an epistemological transition from solution reflection as an algorithmic procedure to be performed using whatever strategy is most useful, to a conceptual procedure more aligned with sense-making as students gain more experience with physics. This will be useful for instructors when thinking about the best ways to encourage novice students to engage in meaningful solution reflection or mathematical sense-making more broadly.

A seminal work on reflection comes from Dewey [16] and was built upon by Schoen [17]. Dewey characterizes reflection as a process of arriving at an idea of what is missing based upon what is known; he also characterizes it as an active and deliberate process. Schoen elaborates on this and argues that reflection allows practitioners to make tacit knowledge explicit; he also dichotomizes reflection into 'reflection-in-action' and 'reflection-on-action'. The framework of Price et al identifies both reflection-in-action and reflection-on-action as critical parts of the problem-solving process. For the purposes of this study, students will be engaging in 'reflection-in-action,' as they are being asked to process information in real-time and actively construct arguments about physical phenomena.
In a previous study [18], we characterized how physics students decide to check whether an answer makes sense. Students were given two static equilibrium problems in two different formats and asked to verify the (given) answers to the problems without solving the problem themselves. From this prompt, we identified categories of strategies students employed to make sense of their answers, some of which reflected more expert-like thinking than others:
1. Identifying dependency. This group of strategies involves identifying what factors (e.g. variables) should affect the solution to a problem. In many textbook problems, this strategy consists of simply processing the problem statement and identifying which variables are specified and which are not. This is often not a cognitively demanding task, though there are common and notable exceptions, such as when students must decide whether the mass of an object will matter in a dynamics problem.
2. Identifying functional relationships. This group of strategies relies on either a mathematical or physical sense of how two quantities should be related. For example, identifying that the orbital velocity of a satellite should increase with the mass of the object it is orbiting. The cognitive demands of these strategies can vary. The strategy could be as simple as identifying the covariational relationship described above or could be as sophisticated as carrying out dimensional analysis to determine the precise power of the relationship. The commonly prescribed strategy of checking the units of an expression would fall into this category.
3. Making predictions. These strategies involve using mental models of a system to determine how it will behave under certain conditions. For example, identifying what happens when the angle of an incline is zero or 90 degrees. The most common strategy in this category is to evaluate limiting cases of an expression. These strategies are often the most cognitively demanding because they rely on simultaneous physical and mathematical reasoning.
4. Rederiving the solution. In the laboratory environment, students were given the solution to a problem and would often attempt to derive the solution themselves from first principles. We hypothesize that, in a more ecologically valid environment in which the student derives the answer themselves from the start, this would be replaced with checking their math. The cognitive difficulty associated with this strategy is highly variable depending on the features of the problem.
This previous work provides one proposed ontology for reflecting on a solution in the context of introductory physics, though we hypothesize that these categories of strategies are likely to manifest in other contexts and be quite general. For example, if one is reviewing a historical argument, one needs to identify important individuals, locations, and events (Identify dependency), how those variables are related (Identify functional relationships), and predict how these factors might influence current events, or predict how changing one of the historical factors would have altered the outcomes to assess the quality of the argument. Our categories are similar to the strategies for sensemaking identified by Hahn et al [19], but we believe that our strategies are more general than the strategies they identified. As discussed above, one can map our strategies into different disciplines outside of quantitative physics problems. The strategies identified by Hahn et al were more specific to physics (e.g. finding a 'limiting case' or identifying 'fundamental dimensions'). It is known that novice students tend not to reflect on their solutions automatically [20], and even have difficulty checking their answers to problems [15,18]. From a cognitive resources perspective [21], this poses the question of whether this 'difficulty' is because students lack the appropriate mental models and content knowledge required to carry out various solution reflection strategies (they lack the resources), or whether students simply do not use these strategies because they do not perceive them as germane to the task of reflecting on one's solutions (the resources are not activated). Investigating this question thoroughly is complex due to the dynamic nature of epistemology within an individual as well as across individuals within a system [22]. 
Previous studies have investigated how assessment features activate local coherences of epistemic resources which influence how students frame a particular task [23].
We are interested in the epistemology of reflection on a solution: moving beyond what actions characterize this reflection to why students may approach reflection tasks the way that they do. We are interested in variation within individuals, as well as how it manifests across individuals with similar physics backgrounds. This paper presents a pair of analyses that examine in detail student epistemologies while reflecting on the solution to a physics problem. In the first analysis, we focus on the prevalence of different answer-checking strategies across student levels and different problem types. This is to provide some evidence as to what common factors might impact student epistemology of solution reflection. The second analysis then investigates the epistemological framing associated with answer checking to provide a more detailed picture of this epistemology both within and across individuals.

Analysis 1
In a previous study [18], we carried out an experiment to explore what types of solution reflection strategies 'transitioning novice' students used. In that work, a transitioning novice was defined as someone who had some explicit training in expert problem-solving practices but was far from an expert in their behaviors. Students were given two static equilibrium problems in two different formats and asked to verify the answers to the problems without solving the problem themselves. The problems were chosen because they involved multiple concepts (both force and torque balances), and were posed in a way that is more authentic than typical physics textbook problems. For example, the problems asked about design specifications for the systems instead of just calculating a force or a distance. Additionally, the problems were chosen because their solutions would have multiple unspecified quantities that one could conceivably vary in a real system (e.g. angle, mass, length, friction coefficient) and thus there would be multiple opportunities for students to engage in sensemaking even if they could not reason about a specific feature of the problem. From this prompt, we identified a limited set of strategies students employed to make sense of their answers, some of which reflected more expert-like mental models than others (see above).
Two strategies that we identified, used almost ubiquitously by experts when checking their answers, were evaluating limiting cases of an expression and checking the units of an expression. In our previous study, we found that 22%-38% of the 78 students who completed the study checked the units of an answer without being prompted, compared with 73%-80% of students who checked their units when explicitly told to. In both cases, most of the students were able to use this strategy correctly. Checking limits, however, was far less prevalent and much more difficult. Only 11% of students checked limits without prompting (8.1% correctly) and 27% checked limits when told to check the angular dependency of a solution (12% correctly).
In that work, we hypothesized that students did not check limiting cases because students lacked the appropriate cognitive resources to execute that strategy. Checking limiting cases of an expression requires the student to have a sophisticated enough mental model of a problem to make predictions about how it will behave under certain conditions. This requires a deep conceptual understanding of physics and mathematical understanding of the relationship between quantities. However, we also found evidence that students' strategy choices may be linked to their perceptions of the task. Most of the students in this study were able to successfully check whether an answer was correct using simpler strategies like checking units or calculating force components. It was thus also possible that students did not check limiting cases because they did not see that as a necessary tool in verifying the answer to the problem (the resources were not activated).
In this first analysis, we aimed to investigate barriers to reflecting on one's solution by systematically varying assessment features [23] as well as students' physics background. Variation of assessment features was chosen to investigate the activation of cognitive resources, while students' physics background was used as a rough proxy for the availability of cognitive resources. We asked: 1. Does the mathematical and physical complexity of a problem scenario affect students' choice of solution reflection strategies? 2. Do we see variations in students' strategy use by prior physics experience?
We hypothesized that, if the assessment features were affecting students' strategy choice, we would see a difference in students' strategy use between the two problems. This would provide evidence to support the idea that strategy choice is affected by the activation of certain resources. If the barriers to checking answers in a more expert-like way were only because students lacked the cognitive resources to execute those strategies, we would likely see variation across student physics backgrounds but not across problem types. Variation across both student physics backgrounds and problem types would provide evidence that both resource possession and activation are important for strategy selection.

Methods
Data were collected in the fall quarter of 2019 and winter quarter of 2020 from undergraduate and graduate students enrolled at Stanford University (all interviews were conducted prior to the start of the COVID-19 pandemic). Graduate students (hereafter: Grad) in physics and physics majors (hereafter: Major) were recruited through emails to department mailing lists. Students who had previously taken an introductory physics course (hereafter: Intro) were recruited through emails from the instructors of those courses. Students who had not yet completed a physics course and had not taken physics in high school (hereafter: None) were recruited from Physics 41E, a course taught by E. W. B [24]. All students were offered $25 for an hour of their time. In all, we recruited 37 students (the breakdown by physics level is given in table 1).
We note that there were fewer physics majors and graduate students because those populations were smaller at Stanford University than the other groups of students. This means that the conclusions that can be drawn about the behaviors of physics majors and graduate students may be somewhat limited, as these samples may not be completely representative.
In the interview, students were given 6 multiple-choice questions from the PIQL [25], as well as 6 open-ended questions probing their ability to use and interpret math in physics concepts. The open-ended questions were all written by E. W. B. Students were asked to think-aloud as they solved all of the problems, and all the interviews were transcribed by L. F. B. and E. W. B. For the purposes of this study, we only considered the portions of the transcripts where students were solving open-ended problem 4. In this problem, students were presented with a situation where two fictional students, Sarah and Jessica, were studying for a midterm exam and came across a particular static equilibrium problem. They were presented with the text of the problem, as well as the two students' answers (one of which was correct, and one of which was incorrect). They were then asked to determine which of the solutions, if either, was correct. There were two versions of this problem, the Chandelier problem and the Ladder problem, which are shown in figures 1 and 2. Students were randomly assigned one version of this problem in the interview (see table 1). There was no statistically significant difference in which version of the problem students saw by their physics background (Fisher's exact test, p = 0.77). In both cases the 'correct' student was Sarah.
We chose to use the language 'correct' as we found this to be a clearer prompt for students than other sensemaking tasks we attempted during the same interviews. For the ladder problem one cannot really determine which solution is correct without attempting to rederive the answer. Indeed, a limiting case of the 'correct' answer makes a prediction that makes little physical sense. We found that only the graduate students noticed this, however, and so we gave students credit for either identifying this issue or saying that Sarah was correct.
The first version of the problem is called the 'Chandelier problem,' and is mathematically and physically simpler than the ladder problem (see figure 1). In this problem, the fictional students were asked to find the required strength of a rope to hang a chandelier from the ceiling. The chandelier is attached with two ropes to the ceiling at equal angles. Solving this problem simply requires the students to calculate the vertical component of tension and then do a one-dimensional force balance. The second and more difficult problem is the 'Ladder problem' (see figure 2). In this problem, the fictional students are asked to determine the maximum height off the ground at which you can stand on a ladder before it slides out from underneath you. Solving this problem requires multiple force and torque balances in two dimensions, as well as some more sophisticated trigonometry.
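For concreteness, the one-dimensional force balance for the Chandelier problem can be sketched as follows. This is an illustration under stated assumptions, not the expression shown to students: $\theta$ is taken to be the angle between each rope and the ceiling, $T$ the tension in each rope, $m$ the mass of the chandelier, and the ropes are treated as massless.

```latex
% Vertical force balance: two equal tensions support the weight
2T\sin\theta - mg = 0
\quad\Longrightarrow\quad
T = \frac{mg}{2\sin\theta}
```

Under these assumptions, $T = mg/2$ when the ropes hang vertically ($\theta = 90^\circ$), and $T$ grows without bound as $\theta \to 0$.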
The interview transcripts were all coded according to an a priori coding scheme. Interviews were broken down into units determined to be discrete chunks of a thought. This often included multiple sentences. We coded each chunk for what strategy students were using to check their answers [18]. The interviews were independently coded by E. W. B. and O. C. M. in batches of 3 interviews. If during the coding process we felt that a chunk reflected multiple strategies simultaneously, we broke the chunk up into smaller pieces. Students' solution reflection strategies were coded according to the findings of [18] (see table 2 for examples and definitions). The two raters met after each batch was coded and discussed and resolved all disagreements. In early batches, Cohen's kappa for interrater reliability was typically between 0.6 and 0.65. Disagreements between raters typically probed new edge cases of how the codes should be defined. As the definitions of codes were refined at each discussion point, previous coding was adjusted to fit the new rubric. In the later batches of coding, Cohen's kappa rose to 0.8-0.85, indicating less variation between the two raters. As Hammer and Berland point out [26], it is important not to treat this coding as error-free. To provide the reader with more context, we present in the appendix the full text of several example students' responses to illustrate the findings obtained by quantifying the qualitative data. We note that the organization of our strategies is different from that presented by Hahn et al [19]. Their 'special case' and 'limiting case' would correspond to our 'Making a prediction,' provided there was physical sensemaking that accompanied it. Their 'functional dependence' we have broken down into identifying physical dependence and identifying mathematical dependence. Their 'fundamental dimensions' code would typically fall under 'identify functional relationship.'
We note that our coding scheme is likely organized somewhat differently because of the different nature of the tasks employed in this study versus their study. However, the codes employed in that study can be mapped onto the codes used in this study. We were primarily interested in how strategy use varied by the assessment features and students' level of background physics knowledge. To investigate this, we thus counted instances of each strategy and computed what fraction of the responses used that strategy. We also counted the total number of times across all students within a group/question type that the strategy shifted [22] to investigate the within-subjects variability in the epistemology of solution reflection.
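The interrater statistic reported above, Cohen's kappa, compares the observed agreement between two raters with the agreement expected by chance. The following is a minimal sketch of that calculation, not the authors' analysis code; the function name, code labels, and the ten example chunks are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same chunks."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: fraction of chunks given the same code.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal code frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical code throughout
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes for ten chunks (not data from the study).
rater_1 = ["predict", "predict", "rederive", "relate", "relate",
           "rederive", "predict", "relate", "rederive", "predict"]
rater_2 = ["predict", "relate", "rederive", "relate", "relate",
           "rederive", "predict", "predict", "rederive", "predict"]

kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 2))
```

In this made-up example the raters agree on 8 of 10 chunks, but kappa is lower than 0.8 because some of that agreement would be expected by chance.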

Results
The total code counts by student physics background level and question difficulty may be found in table 3. Percentages reported are the percentage of total coded lines in each category. For example, across the 4 graduate students, there were 23 codable lines, and 57% of those lines were associated with making a prediction. Note that we have included a code indicating whether a student arrived at an answer that was correct because in both problems, one of the alternative solutions we provided was correct and one was wrong. This indicates what percentage of students chose the correct answer from the two alternatives. In the following sections, we first address differences by problem type, then by student level.
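The percentages described here are simple fractions of coded lines. As a hedged sketch of that bookkeeping: the count of 13 prediction lines below is inferred from the reported 57% of 23 lines, while the split of the remaining 10 lines and the function name are hypothetical.

```python
from collections import Counter


def strategy_percentages(codes):
    """Percentage of coded lines assigned to each strategy."""
    counts = Counter(codes)
    total = sum(counts.values())
    return {strategy: 100 * n / total for strategy, n in counts.items()}


# 23 coded lines for one group. Only the total (23) and the share of
# predictions (about 57%) come from the text; the rest is made up.
grad_codes = (
    ["Making a prediction"] * 13
    + ["Identifying functional relationships"] * 7
    + ["Rederiving the solution"] * 3
)

percentages = strategy_percentages(grad_codes)
for strategy, pct in percentages.items():
    print(f"{strategy}: {pct:.0f}%")
```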

Differences by question type
There was a statistically significant difference in strategy use across the two questions (Fisher's exact test, p < 0.0001, see figure 3). Students were more likely to make predictions and identify functional relationships in the ladder problem, while they were more likely to attempt to rederive the answer in the chandelier problem. We note that most students tried to make a prediction, identify a quantitative relationship, or solve the problem in both the ladder and chandelier problems and that students from all levels were able to execute the more sophisticated strategies of making a prediction. Essentially no students tried to simply identify the relevant variables or remember the answer. We also see that fewer students solved the ladder problem correctly compared with the chandelier problem.

Table 2. Coding scheme for strategy use including code name, code definition, and an example of each code from the data.
Strategy: Making a prediction
Definition: A statement about how the equation/system should behave in a certain instance. This most often includes identifying limiting cases of an expression but can also include predicting the average behavior of a system (see example). This often includes a mathematical calculation of the limit but also includes a physical explanation of how that limit manifests.

To illustrate these differences, we include two examples. Consider the following student attempting to check the answer to the Chandelier problem:
'Okay, well, ooh, so, we have one component that is going downwards. Okay, we have tension force going up at an angle. Correct. Now, if we are using this angle now the only question we have to break it into its x and y components, right?'
This student sees the problem and immediately starts drawing a force diagram and breaking forces into components. This is a procedure that they have committed to memory as what you are supposed to do when you see any problem involving forces. Prior research has shown that training students to do this can interfere with their overall problem-solving ability [23]. To contrast this, consider this student solving the ladder problem:
'In Jessica's answer if the coefficient of friction is zero, then the height is equal to l cosine squared alpha over sine alpha, which I am pretty sure is not always zero, and so I am gonna say Jessica can't be correct, so maybe Sarah is correct.'
This student has predicted that the answer should go to zero when the coefficient of friction is zero and is checking that prediction. This is a more sophisticated strategy that students in the Intro and None groups typically do not use [13]. Though the physics and math used to derive the answer to the ladder problem are far more complex, this illustrates that the task is not necessarily very cognitively demanding. Rather, the features of the problem statement may affect resource use and activation among students.
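The limiting-case check in the ladder quote can be written out explicitly. The sketch below uses only what the student states: with the coefficient of friction set to zero, Jessica's expression reduces to $l\cos^2\alpha/\sin\alpha$. The symbols $h_J$ (the height in Jessica's answer), $\mu$, $l$, and $\alpha$ are labels assumed here for illustration.

```latex
% Frictionless limit of Jessica's answer, as stated by the student
h_J\big|_{\mu = 0} = \frac{l\cos^2\alpha}{\sin\alpha} > 0
\quad \text{for } 0^\circ < \alpha < 90^\circ
```

Because this is nonzero for every angle short of vertical, it conflicts with the student's prediction that the height should go to zero without friction, which is the basis for rejecting Jessica's answer.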

Differences by student level
There was significant variation in strategy use by student groups ($\chi^2(12) = 72.9$, $p < 0.0001$, see figure 4) as well as the fraction of students arriving at a correct answer. Graduate students and physics majors spent most of their time attempting to make predictions about how the system would behave in various limiting cases, and some of their time identifying quantitative relationships between variables. Students in the Intro and None groups split their time between identifying quantitative relationships and attempting to rederive the answer to the problem themselves. However, some of the former introductory students did attempt to make predictions about limiting cases of the expressions. The percentage of introductory students making a prediction is almost identical to Hahn et al's number from their techniques of theoretical mechanics course.
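A chi-squared test of independence of the kind used for this group comparison can be sketched as follows. This is an illustration only: the counts are hypothetical, and the layout of four student groups by five strategy codes is an assumption about how the contingency table was built. The function returns the statistic and its degrees of freedom; turning these into a p-value additionally needs the chi-squared distribution (for example `scipy.stats.chi2.sf`).

```python
def chi_squared_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for a table.

    `table` is a list of rows of observed counts; every row and column
    total must be nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and strategy were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts of coded lines (rows: Grad, Major, Intro, None;
# columns: five strategy codes). Not data from the study.
counts = [
    [13, 6, 1, 2, 1],
    [20, 10, 2, 5, 1],
    [10, 25, 3, 30, 2],
    [4, 15, 2, 25, 1],
]

statistic, dof = chi_squared_independence(counts)
print(f"chi-squared = {statistic:.1f}, dof = {dof}")
```

With some cells this small, the chi-squared approximation is rough, which is one reason an exact test may be preferred for sparse tables.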

Shifts in strategy
In table 4, we count the number of strategy shifts by student level and problem type. We see that, overall, students do not shift strategies very often. We see a small number of students who shift strategies often (all rates are 0.6-1 strategy shift per student with large standard deviations), but most students stay with the same strategy.
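Counting strategy shifts amounts to counting how often consecutive coded chunks from one student carry different codes. A minimal sketch, assuming each interview is stored as an ordered list of codes; the student labels and code sequences are hypothetical.

```python
def count_shifts(codes):
    """Number of times the code changes between consecutive chunks."""
    return sum(1 for prev, cur in zip(codes, codes[1:]) if prev != cur)


# Hypothetical ordered code sequences for three students.
interviews = {
    "student_a": ["rederive", "rederive", "predict", "predict", "rederive"],
    "student_b": ["predict", "predict", "predict"],
    "student_c": ["relate", "rederive"],
}

shifts = {name: count_shifts(codes) for name, codes in interviews.items()}
mean_shifts = sum(shifts.values()) / len(shifts)
print(shifts, mean_shifts)
```

A per-group rate like those quoted above is then the mean of these per-student counts within the group.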

Discussion
There are several notable results from this first analysis, many of which align with expectations. First, we found that students used more sophisticated strategies in solving the ladder problem compared with the chandelier problem. This aligns with our previous finding that students often adopt the simplest productive strategy to check their answers. The chandelier problem is quite simple to solve (a single force balance), and indeed it is simpler to rederive the answer than to predict limiting behavior. The ladder problem, however, is much more difficult to solve, and so it is simpler to check limiting cases than to attempt to rederive the answer. This also explains why we saw essentially no students trying to use 'identify dependency' in the ladder problem. Though this is an easier strategy to use, both answers that students were given contained all the relevant variables, so it was not productive in terms of determining which expression was correct.
It also makes sense that more advanced students used more sophisticated strategies to check their answers. Physics majors and physics graduate students have spent a lot of time practicing checking the limits of expressions throughout their coursework and research and are more comfortable with this task. Introductory physics students are sometimes told to check limits, but it is often not required of them and thus they do not get as much practice with this skill. They rely on skills they practiced often, such as identifying components of forces and solving force and torque balances.
We note that many of the introductory students did not exactly follow directions and attempted to solve the problem themselves. We do not consider this a failure of methodology or failure to follow instructions on the part of these students. Rather this reflects the unusual nature of a sensemaking task for novice students. Many of them attempted to start sensemaking, but in the process of doing so, resorted to the more familiar algorithmic strategies (see below) they associate with physics.
Based on our hypotheses, these results are consistent with strategy choice in solution reflection being affected by both the availability and the activation of the necessary cognitive resources. Students are more likely to use more sophisticated strategies regardless of prior physics knowledge when it is required of them by the features of the problem, supporting the idea that students must perceive the task to require these more sophisticated strategies. At the same time, however, more advanced students are more likely to use these sophisticated strategies. This does not necessarily suggest that lower-level students lack the resources necessary to make predictions. It is possible that more advanced students use these strategies because they have been conditioned to believe that this is part of physics problem-solving, while more novice students have not been conditioned that way. This is in line with the findings of Gupta and Elby [27], who showed that a first-year student had all the resources necessary for sense-making, but they were not activated due to epistemological barriers.
The results of the first analysis strongly suggest that epistemology plays a role in students' strategy selection while checking answers. Students' strategy selection (that is, what it means to them to reflect on the solution to a physics problem) is clearly affected by both features of the assessment and students' experiences with physics. We thus want to investigate the epistemological framing while checking answers to physics problems to provide a more detailed picture of what information we communicate to students when we ask them to reflect on the solution to a problem. That is the purpose of analysis 2.

Analysis 2
In this analysis, we adopt the theoretical framing of student epistemologies presented by Shar et al [22]. In that work, Shar et al investigated how assessment features impacted student engagement with the assessment problems. From think-aloud interviews with undergraduate students in introductory physics courses, they specifically explored how students frame assessments and what knowledge resources regulate those frames. Within the context of assessment features, Shar et al further documented the stabilities and dynamics in resources and frames. To evaluate the ways in which student understandings of knowledge and learning are present in their engagement in assessments, Shar et al developed a theoretical framework on epistemology. This framework involves epistemological framing, which answers 'what is it that is going on here', and epistemological resources, the smaller elements that make up the framing.
Rather than a stable model of epistemology, Shar et al adopted a model of epistemology that is contextual and dynamic to characterize the approach students take toward learning-based activities. Particularly when engaging in learning physics, undergraduate students have been found to adopt a variety of frames depending on whether they are drawing on mathematics or physics knowledge or whether they are engaged in algorithmic or conceptual thinking [24]. Moreover, students may transition between multiple epistemological frames during an assessment rather than adopting a single epistemology [22]. The epistemological frames that we adopt from Shar et al's work can be identified by student behavior and have been recognized in the literature by Chari et al [29]. These epistemological frames include Conceptual Physics, Algorithmic Physics, Conceptual Math, and Algorithmic Math (table 5).
In order to examine and document the dynamics of epistemological framing, Shar et al identified epistemological resources as the behavioral clusters that are activated together in context to make up epistemological frames as defined in Hammer and Elby's work [16]. With frames defined as local coherences of resources, we can expect a fairly consistent set of epistemological resources to be associated with a particular epistemological frame as identified by behaviors. Likewise, we can expect a set of associated behaviors, or frames, to be associated with a group of resources [22]. The epistemological resources Shar et al coded for included nature of knowledge, source of knowledge, epistemic activity, and epistemic source, which we also adopted in our work.
In our work, we apply this theoretical framework of epistemological framing as described in Shar et al's work. With this framework we aim to examine student epistemological framing when making a specific problem-solving decision, i.e. deciding how well a solution holds (or solution reflection). Our focus is thus to examine how assessment features or prompts and student physics background change epistemological framing. Our research question for this study was: how do epistemologies of solution reflection vary within students and across students of similar physics backgrounds?
Table 5. Epistemic frames and resource definitions adapted from [20], including examples of each code from the current data set.

Methods
We used the same interview questions from the same student population and the same interview transcripts. Interviews were broken down into discrete chunks (see above) and were coded for students' epistemic frames [28].
Students' epistemic frames were coded according to the scheme established in [22]. The interviews were independently coded by E. W. B. and O. C. M. in batches of 3 interviews. The two raters met after each batch was coded and discussed and resolved all disagreements. Interrater reliability was similar to that found in Analysis 1 and, we acknowledge, the coding is not free of bias or error. Following the results of Shar et al [22], we also investigated instances in which students shifted frames. They found essentially no frame shifting in their investigation, but as we detail below, we find many instances of frame shifting. During the coding process, the two raters felt that math and physics frames frequently blended together and were often difficult to distinguish. Though we present examples in table 5 for instances in which we could distinguish between math and physics frames, we only present the results in terms of conceptual versus algorithmic thinking.

Results
Differences by problem type. The results detailing students' epistemic frames are listed in table 6. Interestingly, there were no statistically significant differences by problem type in students' overall epistemic framing (Chi-squared test, p > 0.10). Though students were more likely to use more sophisticated strategies in the ladder problem compared to the chandelier problem, that did not seem to translate into differences in epistemological framing that we were able to measure from the think-aloud protocol.
Differences by student level. There was more variation in student epistemic framing by students' background physics experience. Intro and None students were more likely to frame the solution reflection task as an algorithmic physics or math procedure, while physics majors and graduate students were more likely to view it as a conceptual task ($\chi^2(3) = 28.0$, $p < 0.0001$). Consider two students both solving the chandelier problem:
'Another way of seeing this is that if you take theta to zero you do something like this which is like really hard to hold, so it should blow up and that's what happens.' (Grad)
'Okay, so if we are trying to determine the force of the rope, the net force would be something like the force in the rope plus the weight, minus the weight.' (None)
The graduate student is constructing a physical model in their head and making predictions about the system; they are in the conceptual physics frame. The student who has not yet completed a physics course, on the other hand, immediately tries to construct equations based on a force diagram and is in the algorithmic physics frame.
Epistemic frame and strategy shifts. We identified a number of instances of students shifting frames in our investigation, averaging between 0.7 and 3 frame shifts per student across the different groups (see table 7). Interestingly, graduate students and the most novice students were the least likely to shift frames, while intermediate students were more likely to shift frames. This stands contrary to Shar et al's findings of students seldom switching frames [22]. It is possible that shifting frames is more inherent to the task of reflecting on one's solution as compared to the problem-solving task presented in the work of Shar et al. If a student is struggling to solve a problem in one frame, they should ideally be able to smoothly switch to another frame, as experts do [30].
Alternatively, the lowest level students may not shift frames when it is required of them, resulting in difficulties in problem-solving, whereas the most advanced students do not shift frames because they are able to select an appropriate frame from the beginning of their problem-solving. This would seem to align with the percentage of novice and advanced students who arrived at a correct answer.
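The group comparisons reported in this section are chi-squared tests of independence. The sketch below shows how such a test can be computed for a groups-by-frames contingency table using only the Python standard library. The counts in `example_table` are hypothetical placeholders rather than the study's data, and the function names are our own.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form finite sum.
        total = sum(half**i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * total
    # Odd df: complementary error function plus a finite correction sum.
    total = sum(
        half ** (i - 0.5) / math.gamma(i + 0.5)
        for i in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


def chi2_independence(table):
    """Return (statistic, df, p) for a table of counts (rows = groups)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts (NOT the study's data).
# Rows: None, Intro, Major, Grad. Columns: algorithmic, conceptual.
example_table = [[18, 4], [20, 6], [9, 15], [5, 17]]
stat, df, p = chi2_independence(example_table)
```

With four student groups and two frame categories the test has three degrees of freedom, matching the statistic reported above. In practice a library routine such as `scipy.stats.chi2_contingency` would typically be used instead of a hand-written version.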

Discussion
It is not surprising that less advanced students frame solution reflection as a task that relies on algorithmic procedures, while more advanced students frame it as something requiring deep conceptual knowledge and that more advanced students use qualitative reasoning more often. Indeed, typical introductory physics instruction often does not result in students acquiring an adequate conceptual understanding of the content covered (e.g. [31]), and thus it is unsurprising that these students would not perceive physics problems as conceptual problems.
The finding that students more often shifted epistemological frames when engaging in solution reflection tasks contrasts with Shar et al's findings of students mostly working within the same frames. However, the nature of solution reflection could require students to engage in different frames depending on what strategy they use. For example, making a prediction can involve both algorithmic procedures (calculating the limit) and conceptual thinking (explaining what that limit means). However, based on the strategies the advanced students use and their overwhelmingly conceptual framing, it seems that frame shifting can be associated with answer checking, but is not always necessary.
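To illustrate how a single limit check can span both frames, consider a worked sketch for the chandelier problem. It assumes the two ropes are symmetric and that $\theta$ is measured between each rope and the ceiling, with $T$ denoting the tension in each rope. The symbols and the angle convention are assumptions made for illustration; the candidate answers shown to students are not reproduced here.

```latex
\begin{align}
2T\sin\theta &= W && \text{(vertical force balance)} \\
T &= \frac{W}{2\sin\theta} \\
\lim_{\theta \to 0^{+}} T &= \infty, \qquad T\big|_{\theta = \pi/2} = \frac{W}{2}
\end{align}
```

Evaluating the limit is the algorithmic step. Recognizing that nearly horizontal ropes cannot support the weight, so that the required tension must diverge, is the conceptual one.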
The highest-level students did not often shift frames or strategies. This suggests they were able to select an approach to the problem that worked for them and resulted in a satisfactory answer. The lowest level students shifted strategies often, but not frames, which could suggest some fundamental difficulty with the task: they perceive physics problems in a certain way and have certain strategies they use, but those strategies do not work for checking the answer to a problem. Thus, they shift strategies often without fundamentally changing the way they think about the problem. The physics majors appear to shift frames often but not strategies. This suggests they have appropriate strategies to check their answers but may have to shift their thinking about how those strategies apply to the problem at hand because they are still learning to use them. We note that frame shifts do not correspond one-to-one with strategy shifts. As detailed above, our hypothesis is that certain strategies may themselves require shifting frames. We counted only two instances of frame shifts that coincided with strategy shifts, which supports this explanation. This would add to the literature suggesting that sense-making involves the coordination of mathematical and physical resources.

Conclusions, limitations & future work
These two analyses provide rich data describing students' epistemologies while reflecting on the solution to a physics problem. Broadly speaking, the data support (1) the influence of both the availability and activation of cognitive resources on students' strategy choice in solution reflection and (2) the instability of epistemology for intermediate students, whereas more seasoned physics students and the most novice physics students exhibit more stable epistemologies.
We saw significant variations in student strategy use across both the two problems and the different levels of physics background. If strategy choice were affected only by the availability of cognitive resources, we would not expect to see differences in strategy use for different answer-checking problems. The variation in strategy use across student levels could be an indicator of both the availability and activation of resources. More advanced students typically use more sophisticated strategies, but this could be either because they have been conditioned to use those strategies in that context or because they have more strategies available to them to reflect on solutions. To check this, we analyzed the strategy use on a different sense-making task from the same interview and found no difference in strategy use between different student backgrounds, suggesting that in different contexts, more 'novice' strategies can be activated in the more advanced student groups as well. Overall, this seems to align with findings that strategy choice is determined by the activation of resources [e.g. 21].
Our more detailed investigation of epistemology of answer checking showed that advanced students are more likely to perceive solution reflection as a conceptual task. They also associate solution reflection with particular strategies, like checking limits and identifying functional behavior, or making predictions about how a system will behave. Based on the small number of participants, it appears that the correlation between strategy use and problem difficulty is biased by more novice students choosing easier strategies for the easier problem. They have a certain perception of solution reflection as an algorithmic task that involves checking their force components and their calculations. When these strategies fail, they can turn to other ones, but still largely view those as algorithmic procedures to apply in order to arrive at a solution. The data seem to suggest that the resources that instructors want them to use are not often activated by the problem context.
The intermediate students show some variation in strategy use by problem difficulty, but also a larger variation in epistemological framing of the solution reflection task. This seems to represent a transition between a stable, but often unproductive, epistemology of solution reflection as an algorithmic procedure to be performed using basic problem-solving strategies they have acquired during physics courses, to another stable epistemology where solution reflection is more about conceptual understanding and sense-making.
In all, the data seem to show that it is often difficult to distinguish between physical and mathematical sense-making during answer checking. There is also evidence of many students shifting between algorithmic and conceptual thinking during answer checking. We presented an example of how this might manifest in the context of checking limits of an expression above. We note, however, that there are some advanced students who are able to check their answers without shifting frames. For example, some of the graduate students are able to reason about an answer entirely conceptually, without mechanically calculating the limit of an expression. This did not, however, make them more successful than the physics majors who often shifted frames.
Though we believe these results are interesting, there are some methodological limitations to this work. First, we use think-aloud protocols as measures of epistemology. We are thus inferring how students perceive a task based on verbal reports [32]. Another measure of epistemology would be to allow students to complete the think-aloud task, and then ask follow-up questions about why they made certain strategy choices or what they thought the question was asking them to do. This would have been impractical with this protocol, as we were attempting to characterize other aspects of mathematical reasoning in physics at the same time and did not want to influence students' behavior by asking probing questions.
The second limitation is that our solution reflection task is limited to reflection in action and does not probe reflection on action [17]: it asks students to make sense out of a solution given to them rather than a solution they derived themselves. Future work should examine whether strategy use and epistemology change when students check their own answer rather than someone else's because, as seen here and by Shar et al [22], assessment features are important in activating various cognitive resources that students draw upon in problem-solving.
Though qualitative in nature, this study provides some directions for future instruction. First, it is important to recognize that intermediate physics students have unstable epistemologies. These students, as they are starting to engage in more conceptual thinking and sensemaking, can be easily provided with feedback by an instructor when they are engaging in productive sensemaking strategies. This may speed the development of more expert-like sensemaking strategies.
An obvious direction of future inquiry to the authors, aside from solidifying our understanding of the epistemology of student solution reflection in more authentic contexts, is to study ways we can shape not only the cognitive aspects of problem-solving for students but also the epistemological aspects. It is clearly not enough to tell introductory students to check limits or units of an expression and still expect them to engage in the kind of sense-making which physicists associate with those activities. We hypothesize that a targeted reflection activity (e.g. [33]) could have some impact on how students perceive the solution reflection task if repeated frequently throughout the introductory course. Learning science suggests that instructors can model 'doing physics' for students in order to communicate expectations. If we are to communicate to students that physics is about building and understanding simplified models of real systems, those ideas must be integrated into our lessons and assessments.

Ethical statement
This work was approved by the Stanford University Institutional Review Board, protocol 48006. All participants were over the age of 18 and gave written consent to participate in the interviews.

Data availability statement
The data cannot be made publicly available upon publication due to legal restrictions preventing unrestricted public distribution. The data that support the findings of this study are available upon reasonable request from the authors.

Appendix. Complete transcripts for selection of students
None
Okay great so, since they are both labeled as theta they are equal to one another right? Great. Alright so this will make this a triangle. Without solving the problem yourself determine whether Jessica's answer, Sarah's answer, or neither is correct. Okay
This student is from the group with no prior physics experience, and they are checking the answer to the chandelier problem. The student starts out by writing down force balances and drawing a force diagram, essentially following a memorized procedure until they remind themselves of what the task at hand is. They then start to parse the expression a bit more carefully (why is it divided by the angle), but again they are connecting to their memories of what the answers to problems like this are supposed to look like. They then cycle back and forth between these three behaviors, eventually arriving at the correct solution. Interestingly though, the logic for justifying the solution is still quite algorithmic: it has to be sine theta because that is the y-component of the force. At no point during this episode do we see evidence of conceptual manipulation of the system or expressions. The strategy shifts are frequent, but the overall framing of the task remains relatively consistent.

Intro
Two students, Sarah and Jessica are working together on studying for their physics midterm. On the practice exam they come across this question. You have to change a lightbulb in your basement. To reach the lightbulb you lean a lightweight aluminum ladder of length l against the wall such that it makes an angle alpha with the floor. The floor and the wall are both made of concrete which has a coefficient of friction mu with the aluminum ladder. What is the maximum height off the ground h to which you can climb before the ladder starts to slide down and away from the wall. They decide to solve the problem independently, they compare their answers. Their answers are: the height equals coefficient of friction times l over one over coefficient of friction squared sine squared alpha over cosine alpha plus mu sine alpha. H is equal to l over 1 plus the coefficient of friction squared cosine squared alpha over sine alpha plus mu cosine alpha. Without solving the problem yourself determine if Jessica's answer, Sarah's answer, or neither is correct. Without solving the problem. Hmm. I guess the further, the higher up you are l would increase. Okay so look at the sine squared over cosine. The height would matter. Would the height matter more or the distance away? Height would matter more. At the same angle, the greater height would result in mu against the wall being higher. There would also be translational. Okay, so, so there's both at the bottom and at the top. Is Jessica or Sarah correct? I think Jessica is wrong, so. As l increases, so as l increase it would increase an angle I think of sine squared alpha. As l increases h increases by sin squared alpha not cosine squared alpha because the higher you are up the more force you apply at that spot.
Unlike the student with no prior physics experience, this student starts off by thinking conceptually about the expression, rather than by drawing force diagrams and writing down equations. They think about how quantities should vary with respect to one another, as well as what variables should be in the final expression. They are very much thinking about the physical model that underlies this system but appear to be having trouble converting that into an appropriate mathematical expression. They ultimately rely on the shape of the sine versus cosine graphs but do not take into account the entirety of the expression; they focus on only one small piece of it, which leads them to an incorrect conclusion. Interestingly, this student does shift between algorithmic and conceptual framing of the task but does not really shift their strategy at any point.

Major
Alright two students are working together. A lot of studying going on. Okay you need to change a lightbulb, you lean a ... coefficient of friction mu. Oh, okay makes an angle alpha which, oh it has a coefficient of friction on both sides. Okay, what's the maximum height you can climb. So you are, you are climbing up the ladder okay. This, uh, determine who is right. Mu times okay so we know cosine of pi over 2 is zero. We know sine of pi over 2 is one. So we know that, and then we know that h of pi over 2 is, should like, okay wait. The limit as alpha goes to uh pi over 2 of h is infinity. You should be able to go up forever because it does not, it won't slip. So then we know that let us do Sarah's thing. Okay, so we can actually just see which one's right, we know that Sarah's right because it is over cosine.
This physics major clearly demonstrates that they are thinking about limiting cases of the expression and making predictions about how the system behaves. However, they start this approach in a very algorithmic way, plugging pi/2 and 0 into the expression and calculating the limits before making any physical predictions. After they get the results of their calculations, they then bring in physical reasoning to make a correct conclusion about the solution in front of them. This again represents a stable strategy with some frame shifting, suggesting that these frame shifts may be necessary for these types of tasks.

Grad
Two students, Sarah and Jessica, are working together on studying for their physics midterm. On the practice exam, they come across this question: Consider a chandelier of weight W that is attached to the ceiling by two ropes of equal length, as shown below. One end of each rope is attached to the center of the chandelier, and the other end is attached to the ceiling. And theta. How strong does the rope have to be as a function of the weight of the chandelier? They decided to solve the problem independently. Okay, so obviously the strength of the rope will be proportional to the weight. So based upon the numerator, I would say that both of them have possibly be answering incorrect. Now, as far as that is concerned, if you are at a smaller theta. That means a smaller component of the tension is balancing out in the vertical direction with the mass, not with the weight of the block. And since you have a smaller component of tension. Balancing out of the weight of the block, that means your tension overall has to be much larger to balance out the weight of the block. So what I am trying to say is that are small theta, the tension would have to be much greater in both ropes and therefore we require much much more strength. Um, and so what I would expect is I would expect this f rope to increase with decreasing theta. So I would expect F to increase or decrease in theta. Yes. Denominator. Expect Frope to increase. With decreasing theta. And the only one that does that is Sarah's answer because cos theta. No is that right? Sarah's answer is correct. Answer is correct. Because sine sin theta increases with theta at least between zero and 90 degrees. Whereas cos theta decreases with theta in the same range.
This graduate student again relies on making predictions to explain how the system will behave. They are able to immediately identify that the answer should be proportional to the weight of the object because that is the only force in the problem. They then think about how cosine and sine vary with angle and how that is related to the variation in tension that would physically occur. Unlike the physics major, at no point do they start to plug in different values; they use entirely variational reasoning. There is no shift out of the conceptual frame, nor is there a shift of strategy.