Particulate matter(s!) – Evaluation of a learning environment regarding scientific investigations by data exploration

How students plan and conduct experimental inquiry has been a major focus in science education research. However, experimental inquiry is not representative of all scientific investigations, so a focus on experimental inquiry can create the impression that there is a “single scientific method”. We are currently developing learning environments that focus on another type of inquiry using environmental data from an online data repository. We call this scientific investigation by data exploration. Due to the increased variability in environmental data, ideas of inferential statistics are of great importance because causal relationships cannot be directly derived. Hence, our learning environment aims to support students’ skills that are relevant for performing scientific investigations by data exploration. The students’ main goal during the intervention is to identify factors influencing the particulate matter concentration in an Austrian city. In this article, we report the evaluation of our intervention with a cohort of 27 secondary school students. The evaluation shows that students regard particulate matter as a highly interesting topic. Furthermore, students self-report a high intrinsic motivation during the intervention and feel more informed about the environmental issue of particulate matter after the intervention. However, a few starting points for further improvement of the learning environment were identified and are discussed in this article.


Introduction
IOP Publishing, doi: 10.1088/1742-6596/1929/1/012040
Interpreting and drawing inferences from data plays a crucial role in today's everyday life. Overall, there is a need for scientifically literate citizens who are able to make justified decisions not only on a personal level, but on a societal level as well. Hence, there is a need to train students to develop an understanding of how evidence stemming from data is used to construct, support and evaluate claims [1][2][3]. To address these needs, the Next Generation Science Standards [4], for example, directly emphasize practices that involve using or interpreting data, such as engaging in argument from evidence and critically evaluating information. However, several studies have shown that students frequently struggle with skills related to the interpretation and usage of statistical information [5], reasoning based on empirical evidence [6], approaches to statistical investigations [7], understanding of variability in data [8][9][10] and extracting important information from graphical representations [11,12]. Nevertheless, there are only a few examples of existing learning environments or professional development programs (e.g. [1]) that address these aspects. Hence, in the context of science education, we developed a learning environment using environmental data from an online data repository, which is intended to support secondary school students' skills in the aforementioned aspects. We chose a context-oriented approach using particulate matter concentration in an Austrian city since this topic is relevant to students: on the one hand, because the students live in the area of the aforementioned Austrian city, and on the other hand, because air pollution is a relevant topic for our society, of which students are future contributing citizens. In our approach, we do not only situate the learning in a certain context, but the students are also actors in a given scenario.
During the intervention, they take the role of experts in the "department for air monitoring" of the municipal government and their goal is to identify and reason about factors which influence the particulate matter concentration in their city [13]. In the next section, the theoretical background of the learning environment as well as a short description of the intervention are outlined, followed by the results of a first evaluation of the intervention with Austrian secondary school students.

Theoretical Framework
When looking at students' ideas about scientific inquiry, we often find a limited perspective, reduced to "the single scientific method". According to Lederman et al. [14], this may be due to the overemphasis of the classical experimental design in science instruction, which is not representative of all scientific investigations; nor is there any "prototypical type". Yet, in general, it is possible to categorise at least three different types of investigations: descriptive, correlational and experimental. The design and undertaking of experimental investigations have been the focus of educational research for the past decades, but there is a lack of knowledge about how students are able to plan and conduct correlational investigations, for example with meteorological or environmental data. When doing so, students need to interpret data while taking into account the uncertainty present in the data. However, methods of inferential statistics like regression analysis are often not accessible to students in secondary schools or to students enrolling in science-related studies. Hence, there is a need to facilitate students' ability to interpret empirical data without the use of formal inferential statistics. For this purpose, we introduce the concept of scientific investigations by data exploration using already given datasets. The theoretical framework underpinning this concept is described in the next section. In statistics education literature, informal statistical inference or informal inferential reasoning has received increased attention [15] when it comes to interpreting empirical data in a meaningful way.
Hence, informal inferential reasoning serves as a pillar for the design of the learning environment. Scientific investigations by data exploration must meet the requirements of both scientific investigations and statistical investigations. Due to this nature of scientific investigations by data exploration, [...] [17] that describes the cognitive processes in a scientific investigation cycle. Each of these models involves both steps that are appropriate and steps that are inappropriate for our framework. The QAIC-cycle for scientific investigations by data exploration consists of four successive phases: Question - Analysis - Interpretation - Conclusion. During the phase Question, students should generate research questions in relation to the context (in our case particulate matter). These research questions need to be investigable with the data at hand. Furthermore, the students should name relevant variables, and they may also generate and justify a hypothesis in the form of a presumption regarding the expected results. In the phase Analysis, students need to represent the data appropriately in the form of a graph and use special techniques of exploratory data analysis [18] like the transformation and variation of graphs. During the phase Interpretation, students should describe graphs and the data represented therein. In the phase Conclusion, the learners draw conclusions from their data-based interpretations and thereby answer their research question. Additionally, they should justify their conclusions explicitly by referring to contextual knowledge or the interpretation of graphs. Another important aspect of the phase Conclusion is the discussion of the uncertainty present in the data and how it relates to their conclusion. The QAIC-cycle serves as our second pillar for the design of this learning environment.
Still, we want to mention that this model is merely a prototypical set of phases involved in scientific investigations by data exploration and does not necessarily describe a student's actual procedures. Furthermore, we use TinkerPlots [19] as data analysis software since it excels in interactivity, in the possibility to generate and manipulate graphical representations and especially in speed of analysis for the user [20].

Description of the learning environment
The basic idea of this learning environment can be used in different contexts, for example for the introduction of multiple regression analysis, but also to address the role of statistical correlational investigations within physics as a subject. It can be used with pre-service teachers, but also with secondary school students. Within this article, we describe the learning environment as it has been used with 36 secondary school students (age 16 to 18) from Austria. Within the intervention, students investigate factors influencing the particulate matter concentration of an Austrian city. During the intervention, students learn how to carry out scientific investigations by data exploration using innovative technology like TinkerPlots. Additionally, they practise presenting the findings of such investigations. In total, the learning environment consists of four lessons of about two hours each. The sequencing of our learning environment is shown in Figure 2; the lessons were taught over the span of four weeks. In the first lesson, students are introduced to the topic of particulate matter as a whole, focusing on the specific circumstances of the Austrian city and on the reasons why particulate matter poses a problem to its citizens. For our study, this introduction was given by an expert of the municipal government of Styria. In the second lesson, students are introduced to the software TinkerPlots and subsequently have time to become familiar with it using an alternative dataset. They also have time to ask questions about different representations of data (boxplots, histograms, scatterplots, …) and the handling of the software. In the second half of the second lesson, students are introduced to the dataset containing data from an online data repository of the municipal government.
This dataset consists of measurements from three different meteorological stations, including particulate matter concentration, air temperature, humidity and other variables. For a detailed description of the dataset used and possible results from the data analysis, see [13]. During the third lesson of the learning environment, students conduct their own investigations based on the dataset. Their task is to investigate which variables influence the particulate matter concentration in the Austrian city. They are additionally provided with scaffolding material containing facts about particulate matter, which can help them to formulate research questions and hypotheses or to justify conclusions. At the beginning of this lesson, students get an assignment for the last lesson: they have to prepare a presentation (5-7 min) of their main conclusions, with a focus on how the empirical data support their conclusions. In the fourth and last lesson, students present the conclusions of their investigations. Each group then gets feedback from their peers as well as from the course instructors, focusing on the line of argumentation used in their presentation.
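The kind of comparison students can build graphically in TinkerPlots during the third lesson can also be sketched in code. The following minimal Python example, using only the standard library, computes the values a boxplot displays for the particulate matter concentration on two groups of days. The records, the variable names and the temperature threshold are hypothetical and are not taken from the dataset described in [13].

```python
from statistics import median, quantiles

# Hypothetical daily records: (PM10 in µg/m³, air temperature in °C).
# These values are invented for illustration only.
records = [
    (48, -2.0), (55, 0.5), (39, 3.1), (62, -4.2), (44, 1.8),
    (21, 12.4), (18, 15.0), (27, 9.7), (15, 18.3), (24, 11.1),
]

def box_summary(values):
    """Return the quantities a boxplot displays: min, Q1, median, Q3, max."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median(values), q3, max(values)

TEMP_THRESHOLD = 5.0  # assumed cut-off between "cold" and "mild" days

cold = [pm for pm, temp in records if temp < TEMP_THRESHOLD]
mild = [pm for pm, temp in records if temp >= TEMP_THRESHOLD]

cold_box = box_summary(cold)
mild_box = box_summary(mild)
print("cold days:", cold_box)
print("mild days:", mild_box)
```

Comparing the two summaries side by side corresponds to the phases Analysis and Interpretation; whether a difference between the groups supports a claim about an influencing factor is left to the phase Conclusion.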

Research Design and Research Methods
The learning environment was part of an extracurricular course on "insights into climate research" at an Austrian secondary school, in which students from 10th and 11th grade could participate. In total, 36 students took part in our study, 27 of whom filled in both the pre- and the post-questionnaire. Among those were 15 female and 12 male students with an average age of 16,5 ± 0,7 years. In this article, we focus on the following aspects of our research project: First, we want to evaluate the learning environment with respect to the intrinsic motivation during the intervention. Additionally, we want to find out how students perceive the influence of the intervention on their contextual knowledge about particulate matter. Furthermore, we are interested in whether the students interpret variability in data differently before and after the intervention. In the PISA 2015 study, Austrian students showed the biggest gender gap regarding performance in science of all participating countries [21]. Hence, we additionally analysed whether gender is related to how students perceived the learning environment. The research questions guiding the analysis of the collected data are as follows:
1. How intrinsically motivated are the participating students during the intervention?
2. How do students assess their own knowledge of particulate matter before and after the intervention?
3. Do students reason differently about variability in data in graphs before and after the intervention?
The research design was based on a pre- and a post-questionnaire (the latter administered three weeks after the pre-questionnaire and directly after the last unit of the intervention). In the pre-questionnaire only, we asked the students about demographic data (sex, age and whether they live in Graz or not) and used a four-point Likert scale regarding their interest in science [22]. Both the pre- and the post-questionnaire comprised five questions (four-point Likert scale) asking how informed the students are concerning environmental issues (global warming, genetically modified organisms, acid rain, deforestation and particulate matter), three specific open-ended questions regarding particulate matter (sources of particulate matter, events of high particulate matter concentration and possible actions against particulate matter concentration) and two questions regarding variability in data (the first question is shown in Figure 3). The post-questionnaire additionally contained 12 items of the intrinsic motivation inventory [23] (7 items on interest/enjoyment and 5 items on effort/importance), which had been adapted to the setting of the learning environment. A five-point Likert scale was used, ranging from 1 (low intrinsic motivation) to 5 (high intrinsic motivation). We used already existing scales for all variables measured; since we kept the original scaling, this resulted in different Likert scales (four-point and five-point) for the variables. We also administered three general feedback questions.
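To illustrate how subscale values such as those for interest/enjoyment and effort/importance can be obtained from Likert items, the following minimal Python sketch averages the items of each subscale for one student. The answers and the item labels are invented, and the reverse-scoring of one item is an assumption made for illustration; the article does not state how the adapted items were coded.

```python
from statistics import mean

# Hypothetical answers of one student on a five-point Likert scale
# (1 = low, 5 = high intrinsic motivation). Item keys are invented labels.
answers = {
    "interest_1": 4, "interest_2": 5, "interest_3": 3, "interest_4": 4,
    "interest_5": 4, "interest_6": 2, "interest_7": 4,
    "effort_1": 5, "effort_2": 4, "effort_3": 4, "effort_4": 3, "effort_5": 5,
}

# Assumption: this item is negatively worded and must be reverse-scored.
REVERSED = {"interest_6"}
SCALE_MIN, SCALE_MAX = 1, 5

def item_score(key, value):
    """Reverse-score negatively worded items so that 5 always means 'high'."""
    if key in REVERSED:
        return SCALE_MAX + SCALE_MIN - value
    return value

def subscale_mean(prefix):
    """Average all (re-coded) items whose key starts with the given prefix."""
    return mean(item_score(k, v) for k, v in answers.items() if k.startswith(prefix))

interest = subscale_mean("interest_")
effort = subscale_mean("effort_")
print(f"interest/enjoyment: {interest:.2f}, effort/importance: {effort:.2f}")
```

Averaging per student first and then reporting mean ± standard deviation across students keeps every subscale on the original 1 to 5 range, regardless of how many items it contains.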

Results of the Pre-Questionnaire
In the pre-questionnaire, the students showed a mean interest in science of 3,24 ± 0,57 (on a scale from 1 to 4). The distribution is shown in Figure 4. Mean values and standard deviations for the questions about how well informed the students are about environmental issues are shown in Table 1.
The results show that our sample was already rather well informed about environmental issues and showed a rather high interest in science, as shown in Table 1.
Figure 4: Distribution of the participants' interest in science, ranging from 1 (low interest) to 4 (high interest)

Intrinsic motivation
Addressing research question 1, our results show that the intrinsic motivation of the students (using the interest/enjoyment scale as a self-report measure of intrinsic motivation) was quite high (mean = 3,70 ± 0,56). The distribution of the intrinsic motivation of the participating students is shown in Figure 5; a Shapiro-Wilk test gives no indication of a deviation from a normal distribution (W = 0,97; p = 0,70). A t-test revealed no significant difference in intrinsic motivation between genders (t(21) = 0,13; p = 0,90). The analysis of the subscale effort/importance revealed that the students generally showed very high effort, with a mean of 4,01 ± 0,67. A Wilcoxon signed-rank test showed that the effort did not depend on gender (Z = 71,5; p = 0,37). The separate analysis of each item shows that the students especially "made an effort to do well on their own investigations" (mean = 4,19 ± 0,83) and "found the topic of particulate matter interesting" (mean = 4,04 ± 1,06). The lowest value, although still considerably high, was found for the item "I was very curious about the results of the investigations of the other groups" with a value of 3,37 ± 1,04.
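For readers who want to see how a group comparison of this kind is computed, the following Python sketch calculates a t statistic and its degrees of freedom for two independent groups. The article does not state which variant of the t-test was used; the sketch assumes Welch's unequal-variance version, and the scores are invented rather than taken from the study.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical interest/enjoyment scores (scale 1 to 5); not the study's data.
group_a = [3.4, 4.1, 3.9, 3.0, 4.3, 3.7]
group_b = [3.6, 3.1, 4.4, 3.8, 3.3]

def welch_t(x, y):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    # Squared standard errors of the two group means
    vx = stdev(x) ** 2 / len(x)
    vy = stdev(y) ** 2 / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

t_value, df_value = welch_t(group_a, group_b)
print(f"mean A = {mean(group_a):.2f} ± {stdev(group_a):.2f}")
print(f"mean B = {mean(group_b):.2f} ± {stdev(group_b):.2f}")
print(f"t({df_value:.1f}) = {t_value:.2f}")
```

The p-value would additionally require the t distribution, which is not part of the Python standard library and is therefore omitted here.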

Awareness about environmental issues after the intervention
The mean values and standard deviations for the questions about how well informed the students are about environmental issues after the intervention are shown in Table 2. Additionally, the p-values of a paired Wilcoxon test are presented. A p-value lower than 0,05 indicates that the students view themselves as better informed after the intervention. The results shown in Table 2 indicate that the students tend to feel generally more informed about atmospheric environmental issues after the intervention. The significant increase regarding the issue of acid rain could be due to the introductory presentation in the first lesson, in which the expert also mentioned acid rain. At this point, we want to mention that the increase in individually perceived knowledge may not be attributable to the intervention alone, as we did not check for any other learning opportunities during the four-week intervention. However, the results show that the greatest difference between the means concerns the issue of particulate matter. This may be seen as an indication that the students feel they have gained knowledge regarding particulate matter during the intervention.
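The statistic behind a paired Wilcoxon test can be sketched in a few lines. The following Python example computes the sums of positive and negative signed ranks for paired pre/post ratings; the ratings are invented, and the handling of zero differences (dropped) and ties (average ranks) follows one common convention, which is an assumption and not necessarily the procedure used for Table 2.

```python
def signed_rank_sums(pre, post):
    """Return (W_plus, W_minus) for paired samples.

    Zero differences are dropped and tied absolute differences
    receive the average of the ranks they occupy.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    ordered = sorted(diffs, key=abs)
    w_plus = w_minus = 0.0
    i = 0
    while i < len(ordered):
        # Find the end of the block of equal absolute differences
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the ranks i+1 .. j
        for d in ordered[i:j]:
            if d > 0:
                w_plus += avg_rank
            else:
                w_minus += avg_rank
        i = j
    return w_plus, w_minus

# Hypothetical four-point self-ratings of eight students (not the study's data)
pre = [2, 3, 2, 1, 3, 2, 2, 3]
post = [3, 3, 4, 2, 3, 1, 3, 4]

w_plus, w_minus = signed_rank_sums(pre, post)
print("W+ =", w_plus, "W- =", w_minus)
```

A large imbalance between the two rank sums points to a systematic shift between pre- and post-ratings; turning it into a p-value requires the null distribution of the statistic, which is not shown here.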

Handling variability in data
In this section, we discuss students' awareness of the variability in data (see item in Figure 3). Students' written answers were analysed with qualitative content analysis [24]. Figure 6 shows the relative frequency per student of the inductively formed categories. Before the intervention, most students (89%) interpret the diagram in the item in terms of a positive relationship between air humidity and particulate matter concentration. Although a chi-squared test does not reveal a significant difference between the distributions in the pre- and post-test (χ² = 2,85; p = 0,91), this incorrect interpretation of the data becomes less frequent: after the intervention, only 67% identify a positive correlation. This is especially astonishing and positive, as we did not address this issue directly in our learning environment.
Figure 6: Relative frequencies of the categories before and after the intervention
A similar result can be found regarding the correct interpretation that the variability in particulate matter concentration increases for greater air humidity. While only three students made this interpretation before the intervention, the frequency of this category increased to eight afterwards. However, this change is also not significant (χ² = 2,85; p = 0,91).
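The distinction between a positive relationship and an increase in variability can be made concrete with a small numerical sketch. The following Python example uses invented values in which the centre of the particulate matter concentration stays the same at every humidity level while the spread grows; it is not the data of the item in Figure 3.

```python
from statistics import mean, stdev

# Invented (air humidity in % -> PM10 values in µg/m³): same centre, growing spread.
data = {40: [28, 30, 32], 60: [24, 30, 36], 80: [15, 30, 45]}

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Flatten the grouped data into paired lists of humidity and PM10 values
xs = [h for h, pms in data.items() for _ in pms]
ys = [pm for pms in data.values() for pm in pms]

r = pearson_r(xs, ys)
spread = {h: stdev(pms) for h, pms in data.items()}
print("correlation:", r)
print("spread per humidity level:", spread)
```

In this constructed case the correlation is zero although the largest individual values occur at high humidity, which is the pattern that can be mistaken for a positive relationship.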

General feedback for the learning environment
The overall feedback of the students regarding the learning environment was generally positive. We asked students what they particularly liked and what they did not like about the learning environment. The analysis of the open answers revealed five main positive aspects of the learning environment. The frequencies of these five aspects are shown in Figure 7. TinkerPlots, the software used for the data analysis, was especially highlighted by 56% of the students. Seven students especially enjoyed that they could choose their own research questions and were free to perform their own analysis. Five students particularly liked that they were working with "real data", meaning that the data stemmed from meteorological stations in an Austrian city and that the number of measurements exceeded the number they usually deal with.
Regarding the question of what the students did not like about the learning environment, 15% of the students mentioned that the organisation of the fourth lesson (presentation of findings with PowerPoint) needs improvement. Since the students were completely free in choosing their research questions, parts of individual presentations were redundant. One student expressed the wish for an even bigger database, one for a longer intervention, and one wanted a more detailed introduction to the software.
Figure 7: Aspects emerging from answers to the question "What did you particularly like about the learning environment?"

Discussion and Outlook
Overall, the results show that the learning environment on scientific investigations by data exploration using environmental data works well from an intrinsic motivation point of view. Students show a high intrinsic motivation and an even higher involvement in the intervention. However, we need to mention that the intervention was carried out in an elective subject. Hence, our sample represents a positive selection regarding students' interest in science.
The results show that the participants feel generally better informed about environmental issues concerning the Earth's atmosphere after the intervention. Their perceived knowledge gain for the topic of atmospheric particulate matter is even higher. Additionally, the topic of particulate matter is recognized as relevant and interesting by the students. Regarding the understanding of the variability in data, a few conclusions need to be emphasized. Although no significant changes can be found when comparing pre- and post-test, fewer students confused the increase in variability with the presence of a positive correlation between two variables after the intervention. However, for the evaluation of future versions of the learning environment, this item should be revised and additional items should be added. Furthermore, we want to emphasize that this finding regarding students' understanding of variability in data cannot be generalized; further investigations with bigger sample sizes should be conducted to ensure that it did not arise by mere chance. Students' feedback on the learning environment clearly shows that the software TinkerPlots was especially appreciated. Furthermore, the students liked being free to choose their own topics of investigation and they enjoyed working with real data. For the next iteration of the intervention, however, a few modifications are needed. In particular, the format of the presentation of the students' findings should be changed in order to avoid thematic overlaps. One approach would be to split the students into two rooms for the presentation, so that there are no thematic overlaps within each group. Another approach would be to change the "presentation mode" into a "research paper mode with peer review". The research paper mode would require students to write a short article describing their findings, which is then peer-reviewed by their classmates.
Following a design-based research approach, a next step in our project is to redesign the learning environment based on the findings presented here. Additionally, we are adapting this learning environment for a different audience, namely pre-service teachers. In this version, we focus on argumentation and its assessment.