
Exploring the explaining quality of physics online explanatory videos

Christoph Kulgemeyer and Cord H Peters

Published 20 September 2016 © 2016 IOP Publishing Ltd
Citation: Christoph Kulgemeyer and Cord H Peters 2016 Eur. J. Phys. 37 065705. DOI: 10.1088/0143-0807/37/6/065705


Abstract

Explaining skills are among the most important skills educators possess, and they have been researched extensively in recent years. During the same period, another medium has emerged and become a popular source of information for learners: online explanatory videos, chiefly from the online video sharing website YouTube. Their content and explaining quality remain to this day mostly unmonitored, as is their educational impact in formal contexts such as schools or universities. In this study, a framework for explaining quality, which emerged from surveying explaining skills in expert-novice face-to-face dialogues, was used to explore the explaining quality of such videos (36 YouTube explanatory videos on Kepler's laws and 15 videos on Newton's third law). The framework consists of 45 categories derived from physics education research that deal with explanation techniques. YouTube provides its own 'quality measures' based on surface features, including 'likes', views, and comments for each video. The question is whether or not these measures provide valid information for educators and students who have to decide which video to use. We compared the explaining quality with those measures. Our results suggest that there is a correlation between explaining quality and only one of these measures: the number of content-related comments.


1. Introduction

1.1. Focus of the study

YouTube is among the most popular websites and, apart from entertainment, it also offers explanatory videos on a broad range of everyday and science-related topics ranging from knitting to the Higgs boson (Wolf and Kratzer 2015). Even though many people, most likely including teachers and university lecturers, use YouTube videos, there are only a few studies that deal with YouTube explanatory videos (e.g. Chandra and Watters (2012), Wolf (2015), Kleinhanß (2015)). Welbourne and Grant (2015) examined factors that cause video popularity. Our work takes a first step towards giving viewers and educators valid means to determine the explaining quality of such videos.

Teachers and university lecturers have to judge the quality of the explanatory videos they include in their learning environments, whether in the conventional classroom or in a flipped classroom setting (e.g. Schmidt and Ralph 2016). Because the science education literature offers no suitable frameworks, teachers who evaluate which existing videos suit their purpose more or less have to judge the explaining quality without reliable means apart from their experience and the measures provided by YouTube (such as 'likes' and 'comments'). But do those measures really reflect the explaining quality appropriately? In addition, students use explanatory videos for different reasons and purposes such as free tutoring, assistance during homework, or preparing and revising for exams (Wolf and Kratzer 2015). They also need information to decide which video to use. In particular, students probably rely on measures such as 'likes' and 'dislikes'. That is why we apply an established measure of explaining skills (Kulgemeyer and Schecker 2013, Kulgemeyer and Tomczyszyn 2015) to online explanatory videos on Kepler's laws of planetary motion and Newton's third law. We want to find first hints as to whether or not the measures provided by YouTube, such as 'likes' and 'comments', are an appropriate measure of explaining quality. Can students and teachers rely on them?

1.2. Explaining in science teaching

Explaining is often attributed to teachers/university lecturers and their professional competence (Osborne and Patterson 2011, Geelan 2012). To explain something well it is crucial to anticipate the prior knowledge of the explainee. Producers of explanatory videos have to anticipate the audience's prior knowledge and adapt their language just as teachers do. Consequently, the explaining skills used by professional educators and by laymen publishing explanations on YouTube overlap in certain key aspects that appear essential to giving a successful explanation. That makes it possible to take a measure derived from teachers' explanations and apply it to a more diverse and general medium, in this particular case explanatory videos, in order to examine the explaining quality (Wolf and Kulgemeyer 2016). Those videos can be seen as short self-made films that explain how something is done or how something works (Bullock 2015, Wolf 2015).

For this study, some terms have to be clarified. In this context, the term to explain usually refers to the process of giving an explanation and as such includes addressee-adequacy: considering the addressee's prior knowledge, attitudes and skills (Brown 2006) with the objective of leading to an understanding of certain occurrences (Kulgemeyer and Schecker 2013). Explanations aim to make scientific topics understandable to a certain audience, and a dominating factor in this process is the consideration of what exactly the audience needs to know in order to grasp a new concept (Treagust and Harrison 1999, Kulgemeyer and Tomczyszyn 2015). The process of explaining aims to make matters more comprehensible for a certain audience, and during this process an explanation might be altered based on the audience's reaction. In this sense, explaining should be understood as a constructivist process: good explaining makes it more likely for someone to construct the meaning of the explained topic. It certainly does not, however, amount to a 'direct transfer' of knowledge from an explainer to an explainee (Kulgemeyer and Schecker 2013).

Brown and Armstrong (1984) report that better explanations have more keys, such as focusing statements that emphasize the crucial points. Better explainers also varied the cognitive demands on the pupils (Brown 2006). To measure something as volatile as explaining skills, Kulgemeyer and Tomczyszyn (2015) therefore developed an assessment method that observes not only the product but also the process of explaining. They developed a model of dialogic explaining with a focus on physics. The model consists of four main parts: the explainer, the explanation, the explainee, and the explainee's feedback (see figure 1). The explainer has to decide what is to be explained (science content) and to whom (addressee's needs), whereas the explainee's role is to evaluate whether the content of the explanation is interesting and comprehensible and to give feedback either verbally or non-verbally. Furthermore, the explainer can vary the explanation on four levels based on this feedback, ranging from the language code, the graphic representation form and the mathematical code to the use of examples and analogies. It is at the core of good explaining to evaluate the feedback and to adapt the explanation along these four variables accordingly. This model has been used to analyze students' explanations (Kulgemeyer and Schecker 2009, Kulgemeyer and Schecker 2012, Kulgemeyer and Schecker 2014) as well as teachers' explanations (Kulgemeyer and Tomczyszyn 2015).


Figure 1. Communication model for explaining physics (Kulgemeyer and Schecker 2013).


2. Methods

2.1. Research questions

Analysing videos with the complete set of 45 categories that Kulgemeyer and Tomczyszyn (2015) proposed based on their model would take at least five times the duration of the video plus prior training. This is not feasible for teachers or lecturers. If the surface features on YouTube provided information about the explaining quality, it would be much easier for teachers to find videos that meet the needs of their students. Even more importantly, students in particular probably rely on these features to find videos; can teachers consider this easy strategy suitable? The main questions are therefore as follows.

  1. Research question 1: do explanatory video surface features such as views, likes and average view duration provided by YouTube correlate with an established measure for explaining quality?
  2. Research question 2: do the viewers' comments correlate with the explaining quality of online explanatory videos on YouTube?

Basically, the aim of our study is a first exploration as to whether any of the information provided by YouTube can be a meaningful tool for teachers and students to gain insights into the explaining quality. We will reflect on the scientific correctness as well, but the focus lies on explaining quality.

2.2. Sample

YouTube is the primary choice for data collection, not only due to its extensive amount of video material but also due to its popularity, the free access to its video content, and its policy of letting viewers participate and contribute actively (Welbourne and Grant 2015). We decided on two topics to analyze: Kepler's laws of planetary motion and Newton's third law. Both topics are short enough for a single explanatory video to provide first insights and both are very common topics in many physics curricula. Furthermore, for both topics YouTube provides a lot of different videos, which should increase the chance of finding videos with a broad range of explaining quality. The videos were found by using the query field in English and German on YouTube's own search engine as well as YouTube's Up-Next feature, resulting in a sighting of more than 100 unique videos for Kepler's laws of planetary motion and more than 300 for Newton's third law.

The final sample consists of 37 videos on Kepler's laws of planetary motion in both German (6) and English (31) and 15 videos on Newton's third law, all of them in English. These videos were selected because of their comparable run-times. The original sample of more than 400 videos also contained recorded lectures, which do not share the core intention of publishing a concise explanatory video whose content is accessible to a wider range of people. A lecture is more than an explanation. The mean duration of the final selection of videos is about M = 6.2 min with a standard deviation of approximately SD = 3.4 min. For Kepler's laws of planetary motion it is noteworthy that, apart from minor mistakes that are corrected during the run-time of the video, no video shows any noteworthy mistakes in the science it presents. This might be due to the fact that Kepler's laws themselves do not require mathematical derivations, as they are findings based on observations. That is why the explanations mainly vary in their depth of mentioning implications and in their range of figures, mathematics, and structuring elements such as summaries and reviews. In fact, that was a reason why the topic was chosen: it makes sense to compare only the explaining quality of scientifically correct explanations. For Newton's third law we also found only minor mistakes. However, there is one common mistake worth mentioning. In some of the videos, the explainers give examples of Newton's third law that deal with an equilibrium of forces, e.g. standing on the ground. Of course, Newton's third law plays an important role in those examples. However, keeping in mind a very common misconception about Newton's third law, such examples could lead to misunderstanding. Students tend to think that action and reaction forces act on the same object, and some students mistake Newton's third law for an equilibrium of forces anyway. These examples, therefore, might strengthen this misconception, even if the explanation itself is correct. Other videos mention this misconception explicitly.

2.3. Data collection

The data needed for the surface features are provided by YouTube. The comments appear below every video, the 'likes' and 'dislikes' directly beneath it. Not only are the comments and surface features the only channels that allow communication between explainer and addressee, but this kind of communication is also mainly asynchronous, i.e. temporally and spatially separated, resulting in a delay in responses and feedback.

We will explain the surface features and comments in the following and argue why there might or might not be a relationship to the explaining quality of the video.

Views. The views are among the most easily recognized surface features. YouTube's own definition of a 'view' is 'a viewer-initiated, intentional play of a video' (Parsons 2014), either on YouTube.com or embedded in a third party's website. YouTube counts a view if a video is started and watched for as few as five seconds. View counts below 300 are not audited by the video-sharing provider and as such might be artificially increased. We assume that the number of views is influenced more by the time a video has been online and the popularity of the YouTube channel than by the explaining quality.

Likes/dislikes. The viewer decides whether he or she likes or dislikes a video by clicking either a thumb-up or a thumb-down symbol. For this action an account is necessary, to prevent fraudulent manipulation and to allow just one vote per viewer. The 'likes' are YouTube's primary tool to represent video quality, apart from views, as everyone who has an account has one vote per video. Whether or not these votes correlate with the actual explaining quality is an open question, but they are probably among the measures both students and teachers use to find high-quality videos. The likes probably also correlate with the time online and the popularity of the YouTube channel; they are, however, more likely to reflect the explaining quality than the views mentioned above.

Average view duration. This surface feature can be found under Video Statistics behind the button labeled 'More', followed by the tab labeled 'Time Watched'. The publisher can decide via his or her video settings whether to make the average view duration public, hence some videos do not publish this surface feature. This feature is possibly interesting when it comes to choosing the video with the best explaining quality, since a higher average view duration can be assumed to correlate with a more interesting video because the viewer is more fully engaged with the video and its content. An important factor of an interesting video could thus be the explaining quality.

Comments. The comments provide by far the most intense communication channel between explainer and addressees, since the explainer can reply directly to an addressee's comment. Here, new topics are suggested and constructive criticism is given by anyone who wishes to comment on the video. Sometimes a whole dialogue can be found, discussing the topic, improving the explanation or answering questions.

2.4. Evaluation of the explaining quality

Kulgemeyer and Schecker (2013) and Kulgemeyer and Tomczyszyn (2015) report how to use their model (figure 1) to analyse explaining quality. All in all, their analysis of the quality of explanations is based on 45 categories. Applying the 45 categories to the 36 explanatory videos left 31 subcategories that could be assigned; categories that deal with involving the addressee, such as 'Medium–Verbal: Interrogation', 'Call-For-Action', and 'Medium–Non-Verbal: Structuring' (which describes interruptions that can be cut out in post-production of the videos), were neglected. Kulgemeyer and Tomczyszyn (2015) used their categories to analyse explanatory dialogues and therefore needed more categories. For a complete list of the remaining 31 subcategories, see table 1.

Table 1.  The 31 categories used for all 36 videos.

Main category Category
Content Scientific mistake (-)
  Mistake corrected
Structure Giving an outlook
  Giving a review
  Giving a summary
  Ignoring students' comment (-)
  Emphasizing important points
  Open justification of the explaining approach
  Addressing common misconceptions
Use of language Paraphrasing technical terms
  Comment technical term with everyday language
  Comment technical term with other technical terms
  Leaving new technical term uncommented (-)
Contexts and examples Addressing explainee
  Example close to everyday life
  Abstract example
  Without context (-)
  Connecting at least two examples by showing analogies
  Connecting example to explained topic by showing analogies
Mathematics Providing numerical example for formula
  Using formula
  Describing relationships by use of 'the more... the less/more' relations
  Using mathematical terms and idealisations
Interrogation Asking further questions
Non-verbal elements Using realistic figures (such as photos)
  Using analogical figures
  Using logical figures (such as diagrams)
  Using experiments
  Connecting non-verbal elements
  Using writings
  Draw/amend figures

The measure for explaining quality is derived by awarding one point (+1) for each subcategory the explainer uses. This approach is identical to the approach Kulgemeyer and Tomczyszyn (2015) take to analyze teacher explanations. Each category is treated equally in this process and subsequent uses of the same category are not counted, since repetitions of the same wording or repeated use of a similar explaining aid without any variation are not considered a rich and varied explanation. As a result, Kulgemeyer and Tomczyszyn (2015) apply the following assumption, which we use as well: the better the explaining skill, the more diverse the explanation, and the more categories used. Hence a simplified frequency analysis is applied to count the emergence of a subcategory. However, some categories are considered by Kulgemeyer and Tomczyszyn (2015) to decrease the explaining quality and are therefore assigned a negative point (−1) for their occurrence, as they are not considered beneficial to the explaining. These categories encompass (scientific) mistake, ignoring students' comments, and leaving new technical terms uncommented. They are also marked by the symbol '(-)' in the category list found in table 1. Using the results of the analysis, an ordinal performance number can be calculated for each video, abbreviated by CP(s), which stands for category point(s):

CP = ΣX+ − ΣX−, where X+ stands for all positive and X− for all negative categories, as noted by '(-)' in table 1. The subtraction of one point due to the category mistake can be eliminated by correcting the error or fault; a subsequently corrected mistake basically neutralizes that category. All other subtractions cannot be canceled out. Using the framework we reached an inter-rater reliability (Cohen's kappa) of κ = 0.860 between two raters familiar with the model for explaining quality, which can be considered very good (Altman 1997). The measure reaches a Cronbach's alpha of α = 0.690, which we consider a satisfactory result even though the measure encompasses a high number of categories, because explaining quality is a complex construct. The reliability is similar to the value Kulgemeyer and Tomczyszyn (2015) report for analyzing teacher explanations (α = 0.772). The measure for explaining quality can therefore be used for further analysis, as it can be considered objective and satisfactorily reliable. The validity of this measure has been examined in terms of content validity (do the categories cover the underlying communication model?) and construct validity (the measure for explaining quality correlates with the pedagogical content knowledge and content knowledge of the explainers, and with students' views on the explaining quality) (Kulgemeyer and Tomczyszyn 2015, Riese et al 2015).
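
To make the scoring concrete, the following minimal Python sketch illustrates how category points could be computed for a single video. It is not taken from the paper; the category names are a small subset of table 1 and the example data are invented.

# Minimal sketch (not from the paper) of the category-point scoring described above.
POSITIVE = {"giving a summary", "emphasizing important points",
            "paraphrasing technical terms", "using formula"}
NEGATIVE = {"scientific mistake", "ignoring students' comment",
            "leaving new technical term uncommented"}

def category_points(observed, mistake_corrected=False):
    # +1 for each distinct positive category used, -1 for each distinct negative one;
    # repeated uses of the same category are not counted again.
    used = {c.lower() for c in observed}
    plus = len(used & POSITIVE)
    minus = used & NEGATIVE
    if mistake_corrected:
        # a subsequently corrected mistake neutralizes the penalty for that category
        minus = minus - {"scientific mistake"}
    return plus - len(minus)

# Example: three positive categories and one uncorrected mistake yield CP = 2
print(category_points(["Giving a summary", "Using formula",
                       "Emphasizing important points", "Scientific mistake"]))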

For Kepler's laws of planetary motion, the median of the explaining quality measure (of all videos, N = 36) is M = 8 CPs, ranging from 2 to 15. For Newton's third law the median (N = 15) is M = 7 CPs, ranging from 3 to 16. Both topics are therefore comparable in their explaining quality. All four communication variables from the model in figure 1 (mathematics, code, examples and analogies, and graphic presentation) are still represented among the 31 categories found, which is an important argument for a content-valid interpretation: the model for explaining is still reflected in the categories.

2.5. Analysis carried out to answer research question 1

Having collected the CPs and the surface data, such as views, average view duration, and likes, all at the same time, correlations are determined using Pearson's correlation coefficient since the data are of metric scale. We assume that a simple correlation is the best way to analyze the data for a first exploration. Correlation should of course not be confused with a causal relationship; for a first exploration, however, this method seems suitable.
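
As an illustration, a correlation of this kind could be computed as in the following short Python sketch using scipy; the values shown are invented placeholders, not the study's data.

# Illustrative sketch of the correlation analysis; the values are placeholders.
from scipy.stats import pearsonr

cps   = [8, 12, 5, 15, 9, 3, 11]           # explaining quality in category points
likes = [120, 340, 15, 410, 60, 25, 300]   # surface feature taken from YouTube

r, p = pearsonr(cps, likes)
print(f"r = {r:.2f}, p = {p:.3f}")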

2.6. Analysis carried out to answer research question 2

Our second approach to comparing YouTube criteria with criteria for explaining quality is a more in-depth analysis. We analyze the comments given below every video and compare them to the results of the analysis of explaining quality. In order to find content-related or relevant comments, which may be connected to explaining quality, inductive categorizing is used. In this context, a 'relevant comment' is a statement given by the viewer (and sometimes a reply from the explainer) that can be sorted into one of the four categories that emerged during a qualitative content analysis (Mayring 2000) of the comments. These categories are (1) Comment on Content, (2) Comment on Explainer's Style, (3) Comment on Explanation, and (4) Comment on Use, which states the viewer's use of the video, e.g. revising, preparing a talk or learning for a test. Category (1) includes further questions or comments on notation, (2) encompasses comments on the style including a reason, and (3) covers constructive criticism and requests for more videos. The first three categories were defined before the analysis (deductive analysis); the need for a fourth category emerged after a couple of videos and it was included in the coding manual, which also gives examples for each category in order to ease the allocation. These categories were confirmed in the feedback stage after 15 videos, following Mayring's step-by-step procedure for qualitative content analysis. The 'subcategories' provide further specification and emerged from an inductive analysis. In table 2 we present sample comments to illustrate the concept of 'relevant comments'. This qualitative content analysis draws exclusively on the comments submitted by YouTube viewers. The data were collected on 2 May 2015 for Kepler's laws of planetary motion and on 5 July 2016 for Newton's third law. All in all, 1365 comments were analyzed, of which 392 were labeled as relevant comments, excluding comments such as 'Thanks' or other comments not related in any way to the content or explanation. After reviewing all comments and assigning them to their categories, the relevant comments per video were totaled to obtain the value needed for the correlation calculations (see the sketch below). We therefore also use a correlation analysis for this research question, comparing the number of comments (or relevant comments) to the explaining quality. For the further analysis we consider just the relevant comments and do not differentiate between the categories or subcategories: we cannot think of a reason why one of these subcategories should have a stronger correlation with explaining quality than the others, and our research question deals with the relevant comments in general.
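
As a simple illustration of the tallying step, and assuming each comment has already been coded by hand, the relevant comments per video could be counted as in the following Python sketch; the video IDs and labels are invented examples, not the study's data.

# Illustrative sketch of totaling relevant comments per video after manual coding.
from collections import Counter

# (video_id, assigned_category); None means the comment was not relevant
coded_comments = [
    ("video_A", "Comment on Content"),
    ("video_A", None),                      # e.g. a plain 'Thanks'
    ("video_A", "Comment on Explanation"),
    ("video_B", "Comment on Use"),
]

relevant_per_video = Counter(vid for vid, cat in coded_comments if cat is not None)
print(relevant_per_video)   # Counter({'video_A': 2, 'video_B': 1})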

Table 2.  Sample comments and the categories used in our analysis. All of these categories are indicators of 'relevant comments'. We do not differentiate between them in the further analysis.

Category Subcategory Example (summarized from YouTube)
Comment on Content Comprehension Question 'Does the Moon obey Kepler's second law?'
  Comment on Notation 'I have a question: For centripetal force, you wrote Fc, for centripetal acceleration, you wrote acp, so is it c or cp???'
  Further Question 'Is it really the Sun's center that's located at the focus? Or is it the barycenter of the planet-sun system?'
Comment on Explainer's Style Good or Bad (incl. Reason) 'Thank you very much. Not only are your explanation of the concepts clear, but your narrations help us to understand the intuition that leads to these ideas'.
Comment on Explanation Good or Bad (incl. Reason) 'I thought the first two laws were explained quite well... the 3rd law was overly elaborate.'
  More Videos to Come 'hey can you explain the other 2? please i really need'
  Constructive Criticism 'I just have one criticism of this Keppler hero. Fact is Comets go past earth repeatedly and therefore they have some type of orbit. They are bound same as the rest of us'
Comment on Use Used for ... 'This video helped me so much in writing my paper on Kepler's three laws! Very clear and understandable!'

3. Results

The main questions are whether the measure of explaining quality shows significant correlations with (1) the surface features of online explanatory videos and (2) the comments. At this point it is important to understand that it is not our objective to find the best explanation among 51 YouTube videos on Kepler's laws and Newton's third law, as learning is far too complex a process and too bound to the individual. There is no such thing as an ideal explanation for all individuals. The objective is rather to distinguish between rich and varied explanations on the one hand and those with fewer variations on the other. Explanations with fewer variations may be less suitable for a wider range of viewers, as some learners' needs may not be considered. From this selective and constructivist view, the video with the more varied explanation is labeled as superior. The results can be found in table 3. As the videos offer a great range of settings and run-times, and as quantifying explaining skill is still in its infancy, mainly tendencies can be observed.

Table 3.  This table shows Pearson's correlation coefficient r of explaining quality (measured by category points) with various surface features and the relevant comments. Significance is denoted by asterisks (*: p < 0.05; **: p < 0.01). High correlation: r > 0.5; medium correlation: 0.3 < r < 0.5; low correlation: 0.1 < r < 0.3. The number of included cases is given in parentheses.

  Time online (month) Views Average view duration Likes Dislikes Relevant comments
Explaining quality (both, N = 51) −0.05 −0.26 0.28 (N = 23) 0.21 −0.09 0.38**
Explaining quality (Kepler, N = 36) 0.02 −0.05 0.33 (N = 13) 0.13 −0.07 0.35*
Explaining quality (Newton, N = 15) −0.16 −0.01 0.50 (N = 10) 0.25 −0.13 0.42

The analysis shows that a moderate but significant correlation can be determined for the relevant comments, as these are derived with the same focus on explaining quality, whereas dislikes and views, for instance, might be influenced more strongly by other factors than by quality, such as the background design, the speaker's likeability, and the presentation form; this hints at discriminant validity. Also, those data are provided ready-made by YouTube without any way of evaluating how they are produced, so their quality remains questionable. To ensure that the correlation between CPs and relevant comments is not an effect of the time online, we conducted an additional partial correlation. Controlling for time online in the relationship between CPs and relevant comments for all videos, we find an increased partial correlation: r = 0.40**, p = 0.004. It is noteworthy that the described results hold true for both topics.
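
For illustration, a partial correlation of this kind can be computed via the residual method, as in the following Python sketch; all values are invented placeholders, not the study's data.

# Illustrative sketch of a partial correlation controlling for time online.
import numpy as np
from scipy.stats import pearsonr

cps          = np.array([8, 12, 5, 15, 9, 3, 11], dtype=float)     # category points
rel_comments = np.array([4, 10, 1, 14, 6, 2, 9], dtype=float)      # relevant comments
time_online  = np.array([20, 35, 8, 40, 22, 10, 30], dtype=float)  # months online

def residuals(y, x):
    # remove the linear effect of the control variable x from y
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# correlate what is left of CPs and relevant comments once time online is partialled out
r, p = pearsonr(residuals(cps, time_online), residuals(rel_comments, time_online))
print(f"partial r = {r:.2f}, p = {p:.3f}")
# (pearsonr's p-value on residuals is a close approximation; the exact
# partial-correlation test uses one fewer degree of freedom)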

4. Discussion

As noted in section 2.3, not all surface features provided by YouTube can be considered (explaining) quality-related. We assumed that views might be less quality-related than likes, which in turn might be even less related to explaining quality than average view duration or comments. From this perspective, it is not surprising that the measure for explaining quality correlates at r = 0.38 (p = 0.003) with the relevant comments, which were derived by a qualitative content analysis filtering out the comments that relate to the explainers' explanations. This correlation is not large, but from our point of view it is still large enough to regard the relevant comments as a promising measure that reflects explaining quality, or at least to regard explaining quality as having an influence on the number of relevant comments. This assumption gets support from the additional partial correlation analysis. We controlled for the time online because videos that have been online longer are probably more likely to have more comments in general and therefore also more relevant comments. Controlling for time online, we find a significant partial correlation: r = 0.40**, p = 0.004. This comes close to a high correlation (r > 0.5). The correlation between explaining quality and the relevant comments, therefore, might indeed be meaningful. We want to highlight that the correlations are nearly the same for both topics, Kepler's laws of planetary motion and Newton's third law, which supports the assumption that relevant comments are an important measure across topics. An interview study with YouTube users might provide more insights.

We did not find a correlation with the likes. We would have expected a small correlation, as the likes are likely to be confounded by, e.g., the popularity of the YouTube channel. A possible explanation could be that some users might feel satisfied by an explanation that objectively is wrong or at least not complete. This so-called 'illusion of understanding' is sometimes found when students work on self-explanations of a topic (e.g. Chi et al 1994): students do not realise the possible inconsistencies in their understanding and feel as if they have understood a topic. That might encourage them to 'like' a video that oversimplifies the matter. Also, Wolf and Kulgemeyer (2016) report that students value not only the explaining quality of online explanatory videos but also whether the explainers appear likable or the use of media is impressive (cf Welbourne and Grant 2015). It is probably not a good idea for a teacher to let students choose explanatory videos on their own, as it is likely that they value aspects other than explaining quality (or possibly correctness). This effect should also be researched with interview studies.

We want to highlight that the average view time appears to be a very promising candidate for revealing quality-related information, but in our study the sample size was simply too small to corroborate this claim: we could not find a significant correlation, and further analysis is needed. All other surface features show no such correlation, reflecting their lack of relation to quality in the sense of Kulgemeyer and Tomczyszyn (2015). Besides, YouTube's surface features are to be treated with caution: YouTube's data collection method could not be controlled or evaluated, which calls its quality into question, especially considering YouTube's deliberate manipulation of the appearance of videos to boost advertisement sales.

We would argue that the relevant comments are a fair first indicator of explaining quality. Learning something new and adding knowledge to one's individual construct of the world is a highly demanding task that requires cognitive activity, which at this point cannot be linked to surface features like views, likes, dislikes or average view duration of explanatory videos. Some viewers, though, develop a need to contact the producer and/or explainer of an explanatory video via YouTube's comment interface in order to talk, discuss or state questions. That could mean that the viewers who give content-relevant comments are more cognitively activated by this explanatory video than by others. Hence, videos that accumulate plenty of those relevant comments are more successful in catching viewers' attention, either because they use a more stimulating explanation or because the explanation delivered is taken as a starting point for further learning. Conversely, this lack of connection to cognitive activation might be the reason why likes, dislikes, views, and average view time are not directly connected to the explaining quality of a video's content: viewers locate videos with an appropriate explanation by trial and error, and views are accumulated for every video watched, however briefly.

5. Conclusion

In conclusion, this research shows that dislikes, views and average view duration are likely to be far less linked to the explaining quality of online explanatory videos than the number of relevant comments found underneath a video, however tempting their use seems due to their easy access. The greater the number of those content-related comments, the more likely it is that the explaining quality is also high. One could say: the comments given by the viewers are relatively trustworthy when it comes to explaining quality. It can be said that YouTube offers high-quality explanatory videos, yet finding those extraordinary videos takes some searching. Still, comments may be used as a first lead to finding those superior videos among the enormous number of videos on offer.

But can we recommend that teachers and students analyze the comments if they wish to find out about the explaining quality? It is definitely not the only thing they should rely on, but it gives valuable hints about which videos ought to be watched more carefully. Since YouTube provides so many videos on all different topics, a first hint that helps narrow the number of videos down to a few is very useful for teachers. Furthermore, our results imply that surface features provided by YouTube, such as the number of likes, might not be a trustworthy indicator. Teachers as well as students should be aware of that fact, even more so since these features claim to represent the quality of a video. Maybe the most important result of this study is the empirical support for a fact that teachers need to stress in science education: if students are looking for an online explanatory video, they should not simply trust the likes or dislikes. Likes or dislikes might appear to be a measure of quality, but most likely they are not.

However, the most important limitation of our research should be mentioned explicitly: we were only able to analyze a small number of videos due to the large number of comments we had to analyze and the effort that a detailed analysis of explaining quality requires. Based on our data, we cannot say that our results hold true for all explanatory videos; rather, they should be treated as a well-reasoned hypothesis. Considering the small sample size, only large effects could be identified as significant in the correlation analysis. But these are the effects teachers need for their decision-making after all (Hattie 2009). Furthermore, a correlation does not imply a causal relationship: we cannot be sure that a high explaining quality actually results in a higher number of relevant comments. Further studies might confirm this by using experimental studies with explanatory videos and interview studies with viewers. Considering the limitations of our study, the results ought not to be overestimated; we consider them merely a starting point. Online explanatory videos are likely to play a more important role in the near future and we would like to strongly encourage further research, particularly in science education, into both their quality and ways to implement them into teaching most effectively.

Finally, effective use of explanatory videos does not only mean finding videos with a high explaining quality. It also means integrating them successfully into instruction (Wolf and Kulgemeyer 2016). Ultimately, finding an explanatory video is just the beginning. For effective learning it is essential that a video is followed by two stages. First, teachers have to evaluate students' understanding by asking questions and by giving students the opportunity to ask questions of their own. Secondly, teachers should prepare tasks that challenge their students with problems that require the information explained in the video. It is crucial that students actually work with this information on their own; otherwise, the information is likely to be forgotten soon. Instructional explanations often do not work (Wittwer and Renkl 2008), and one of the main reasons for this is that teachers tend to rely too much on the impact of a sound explanation and too little on appropriate learning tasks afterwards.
