Visualising relativity: assessing high school students’ understanding of complex physics concepts through AI-generated images

This study investigates how students utilized artificial intelligence (AI)-generated images to represent their understanding of general relativity concepts. Ten high school students participated in an extracurricular course on relativity theory. Using an AI chatbot, these students created visual representations of ‘relativity’ before and after the course. The produced images, the accompanying prompts, student interviews, and their test scores were analysed to examine students’ conceptual understanding and interactions with AI. Students with a clearer understanding of relativity tended to focus their prompts on more central concepts like spacetime deformation. In contrast, those with a weaker understanding leaned towards more tangential ideas. The clarity of their prompts was directly linked to more effective AI interactions, leading to more meaningful image generation. Despite this, some students faced challenges in crafting coherent prompts, resulting in less relevant images, indicating that understanding the concept does not always translate into successful AI engagement. The study underscores the potential of AI-generated images as a tool to illuminate student conceptualisation and interaction skills with AI in the context of complex physics concepts, offering a novel approach to evaluating understanding in advanced scientific topics.


Introduction
With the current advent of generative artificial intelligence (AI), its usage in educational fields has fostered diverse new possibilities [1] which change how we can approach the teaching and learning of complex concepts, such as Einstein's relativity theory (RT).
RT currently provides the most accepted and accurate description of the universe [2]. However, its concepts challenge our intuitive understanding of the world because relativistic phenomena are not observed directly in everyday life [3,4]. Grasping these counterintuitive concepts therefore requires strong reasoning skills and more abstract and flexible conceptions of reality. Visualization skills are crucial for abstract reasoning [5], and students' many difficulties with the visualization of relativistic phenomena [6,7] present a challenge for educators. In this context, research on students' visualization abilities is essential for a deeper understanding of the theory.
Therefore, resources that aid both the visualization of relativistic effects and the externalization of students' imagery processes become significant, and one tool with remarkable potential is AI-generated images. This new resource not only helps students' visualization, but also provides a window into their cognitive and imagery processes, thereby helping educators to identify students' difficulties and their conceptions.
There are already some studies involving the use of student-selected or student-generated images. By exploring students' visual literacy skills and the use of images in academic work, Matusiak et al [8] found that students lack skills in selecting, evaluating, and using images. The use of student-made images can also promote engagement and motivation, because the use of visual thinking has been found to improve student satisfaction and learning outcomes [9]. Moreover, visual thinking provides alternative assessment methods [10] that can allow students to demonstrate their understanding in a creative and engaging manner.
Student-created images can also improve students' observation skills [11]. This approach also allows students to express their thinking through the use of images, promoting higher-order thinking skills. Moreover, artistic reflection and image creation have been found to be effective in fostering students' reflective thinking skills, such as critical analysis and evaluation [12].
Consequently, the use of student-made images can be a valuable educational approach. In this context, AI-generated images bring a new perspective and can be a powerful tool for learning purposes. However, it is important to note that, beyond visual literacy and thinking skills, a completely new set of skills is required, because students are interacting with new external resources [13]. Students must be able to interact adequately with the AI through the chatbot; that is, they must know how to write the prompts and how to modify and interpret the generated images. Even though the interaction with the chatbot happens through natural language (NL), a programming-type thinking is necessary [14].
Considering the scenario discussed, this study deals with students' use of AI-generated images about a complex topic, RT. The present research investigates the concepts on which students focused when generating their AI images. In doing so, we aim to answer the following question: 'How do students approach AI to represent and express their understanding of general relativity?' Moreover, we also analysed the factors that influenced the quality of students' images.

Research context and intervention
The present study was developed with ten year-12 students at a Brazilian public school. The students were invited by the first author, who was their physics teacher, to participate in a short extracurricular course called 'Einstein's Relativity: from GPS to black holes'. Pre-tests and post-tests were answered by students before and after the course activities. The tests consisted of 11 conceptual questions, ten multiple choice and one open, that were validated by three experts: one in physics and two in physics education.
The test was focused on the concept of relative spacetime, covering Special and General Relativity. The 11 questions were divided into two sections. Questions 1-6 explored Special Relativity scenarios, addressing space contraction (Questions 1 and 2) and time dilation (Questions 3-6). Questions 7-11 delved into General Relativity, covering gravitational time dilation (Questions 7 and 8), space deformation (Questions 9 and 10), and spacetime curvature and gravity (Question 11).
Each multiple choice question had five answer options: one with the scientifically correct answer, two with scientifically correct elements but also including alternative conceptions, and two containing only elements of alternative conceptions.
After the end of the course, all students were interviewed individually through the Report Aloud protocol [15]. Using this protocol, a constant dialogue between the interviewer and interviewee was developed, where the students described what they were thinking at the time they performed some task. The focus consisted of investigating students' reasoning processes during specific activities.
The interviews were recorded and fully transcribed. For the present work, the excerpts dealing with AI-image generation were first literally translated into English and then adjusted for sentence structure. This process allowed a smoother reading while maintaining the meaning of the transcript.
The objective of the developed course was to promote learning about RT by using multiple representations [16]. Therefore, the students interacted with different resources, namely videos, images, computer simulations, experiments, group activities and generative AI, to approach relativistic phenomena. In the present paper, we discuss one of the generative AI activities in detail as well as the main results.

AI image-generation activity
Students were also engaged in an activity using generative AI right before and after the course. Using Bing AI through their own smartphones [17], the students were asked to generate an image for the concept 'relativity'. For that, they received a written guide on how to use the tool. The students were prompted to think about what they imagined for the concept, and then they described their mental image in writing on the worksheet. Using their description as input, the students asked Bing AI to generate an image. The AI, by default, generates four images. From these, the students selected the one that most accurately mirrored their envisioned concept.
After the image generation, the students analysed how the selected image aligned with or diverged from their initial expectations. In instances where the outcome differed from what they anticipated, they were encouraged to critically evaluate their descriptive prompts, consider potential modifications, and regenerate images. After the activity, they submitted their final chosen images and handed over the completed worksheet to the teacher for review.

Data analysis
This analysis places greater emphasis on the post-test results (questionnaire and images), based on the information provided by students after the activities. In this regard, students' written descriptions (prompts) were examined, focusing on the clarity and the main concepts used.
Structured prompts that show clarity and specificity and provide the necessary context can be considered good prompts [18]. However, as we aimed to investigate the meaning of the images and concepts involved, how accurately the images represented students' thoughts was also considered.
In this sense, a prompt was considered effective if the student expressed satisfaction with the resulting image. Since the focus was on whether students could use prompts to externalize their conceptual understanding through images, success was defined as whether the resulting image matched what they had imagined.
Therefore, to identify what students meant with their images based on the concepts involved, and if these images were accurate (considering what the students imagined), the interview excerpts were also analysed. The students' explanations were compared to their prompts and images.
Consequently, a match between a student's expectations and the AI output indicates an ability to effectively create prompts to guide the AI, even if the conceptual accuracy is limited. If the resulting image met the student's expectations, this meant that he or she could create a good prompt.
To identify the concepts that students focused on in the process, the keywords used by the students during the interview explanation and in the prompt were highlighted. Afterwards, these keywords were compared to the key concepts of relativity. Finally, looking at the images, it was possible to assess students' conceptual understanding.
After the image analysis, students' test scores were calculated. The score analysis focused only on identifying and differentiating image generation processes and prompts from students with higher and lower scores.
To calculate the scores, only the ten multiple choice questions were considered. Of the five alternatives in each question, the correct answer was worth five points, each of the two partially correct answers three points, and each of the two completely wrong answers no points. The maximum score of the test was 50 points.

Results and discussion
Through the results it was possible to identify different focuses and interactions between the students and the AI image-generator. To illustrate these scenarios, three exemplary students are discussed here. To protect the students' identities, pseudonyms were employed for reference.

Images and prompts
The first student, Luke, generated an image similar to the representations commonly used to deal with General Relativity. In the image generated (figure 1) there is a golden grid in which stars are reflected, and the bright light source in the centre has its rays bent. It is possible to identify in the image the key concept of relativity, that mass can deform spacetime, which makes it a meaningful image concerning RT.
Looking at the prompt used by Luke, the image seems coherent with the student's description because he mentioned the curved spacetime and light bending in a concise sentence.
PromptLuke: "I imagine a sheet representing the space with massive stars bending the space and changing the light trajectory".
Moreover, the student was satisfied with the result obtained, as he mentioned in the interview that it was even better than he had expected. Luke related the word 'relativity' to the four-dimensional universe deformed by massive objects, as he explained: "It was much more related to space, there were many more stars. The spacetime sheet [gave me] the idea of a four-dimensional universe that included the time being bent by massive objects."
Analysing the student's prompt and speech, we could identify that he focused on spacetime deformation caused by massive objects. He used this idea to generate the image, with the key concepts of GR [2]. As he could externalize and describe his thoughts adequately, Luke had the desired output from the AI.
Moreover, Luke showed a good understanding of RT, so he could articulate the concepts related to the theory to construct a coherent prompt and generate an image with physical meaning. As evidenced by the positive result obtained, Luke had adequate skills to communicate in NL what he meant and interacted successfully with the AI.
Another student, Sam, focused on concepts similar to Luke's; however, the results were completely different. The first prompt provided by her was quite vague.

PromptSam(a): "The relativity is a theory about space and time".
As a result, Sam was not satisfied with the first image, which did not represent the concepts that she had thought of as 'relativity', namely space and time. Therefore, Sam tried to modify the image using the second prompt:

PromptSam(b): "I think of a plane deformed sphere comparing to the Earth".
According to her, the resulting image (figure 2) still was not as she had expected: "… [relativity] would be like a trampoline but the AI didn't understand what I wanted… The trampoline is bent because the black hole deforms spacetime as shown on this trampoline, as I imagined. But then I couldn't describe this [deformation and] it didn't understand me. I also couldn't express what I was imagining."
Therefore, analysing Sam's description during the interview, her focus during the image generation was the spacetime deformation caused by massive objects. These were the same concepts used by Luke, but expressed through a trampoline analogy instead of the lycra sheet.
However, the AI-generated image was completely different. It shows the Earth and a black object beside it, possibly the student's attempt to represent the space distortion using 'plane deformed sphere'. As a 'plane deformed' object seems contradictory, the generated image was somewhat undefined, with no significant physical meaning. Depending on how one looks at it, the image resembles a planet or a hole in space.
Looking at Sam's explanation during the interview, a more accurate description could be 'a sphere deforming the plane', which would be more related to the trampoline analogy made by her. However, Sam could not adequately express and describe what she was thinking to prompt the AI, even after asking for modifications of the image generated.
Even though the second generated image was better than the first one, it still did not represent what Sam was imagining. This unsatisfactory result reflects the difficulty students have in expressing their ideas verbally and generating adequate prompts to communicate with the AI.
It is important to note that this student showed a good understanding of RT and focused on the key concept to generate the image: the deformation of spacetime by mass. During the interview, she could also explain these concepts correctly. However, due to the lack of ability to develop prompts using them [13], the resulting image does not adequately represent 'relativity' according to the student.
There are also students who focused on different concepts, such as light-speed. In these cases, the images were completely different from the previous ones. For example, Tom's image (figure 3) showed some colourful and bright lines, in an abstract way. During the image generation process, he listed some concepts.
Even though these concepts are related to RT, Tom could not articulate them to capture the main point of the theory, the curved spacetime. Moreover, he could not explain these concepts adequately during the interview. In this sense, he did not describe something concrete, and this was reflected in the image generated.
The image provided a sense of movement, related to the 'light-speed' and 'curved lines' used by him and reflecting the ideas he focused on in the process: "Actually, I just thought about the light-speed. According to the movies light is just a trail, right, and luminous. So, I thought about this actually happening, how it would be in the image".
As Tom mentioned during the interview, he related 'relativity' to the light-speed, thinking about a 'luminous trail'. In this sense, the image generated represented what he meant; Tom also affirmed this during the interview. This result shows that this student had the ability to interact with the AI using prompts to obtain the desired output from it.
However, it is possible to note, by the description and the interview, that the student did not have clear ideas about 'relativity'. Tom also did not demonstrate a reasonable understanding of RT and could not explain the main concepts of the theory during the interview [7].
Therefore, as in Tom's case, even if a student possesses the skills to externalize his or her thoughts and generate effective prompts, that is, to obtain the expected result, other difficulties remain. There may be a challenge if the student has a poor conceptual understanding with no clearly defined image. In such cases, AI-generated images may fail to represent the main conceptual ideas of RT, representing the student's fuzzy ideas with no significant conceptual meaning.

Score patterns
The students' post-test scores were compared to their respective AI-generated images, prompts and explanations (figure 4). It was possible to identify some patterns. The students who presented the higher scores (>28) had a greater focus on 'spacetime deformation' and 'gravity', which are among the most important concepts concerning General Relativity. Moreover, even when the image was not as expected, as in Sam's case, students tried to represent these concepts on a concrete basis, indicating their clear ideas.
The only exception was Tom, who admitted that he had 'guessed' a lot of the test answers, and could not provide reasonable explanations for them during the whole interview. Thus, he was considered a 'low score student' despite scoring at a reasonable level on the test.
On the other hand, the four students with lower scores (<28) focused on different concepts, mainly the light-speed. Looking at their prompts and explanations during the interview, it was possible to note that these students did not possess clear ideas about relativity.
Even though some students such as Leo and Lara mentioned concepts related to the theory, they did not articulate them coherently in the prompt or explain them during the interview. Therefore, as they did not provide concrete descriptions, the generated images were also abstract with no conceptual meaning.
In this sense, these students could not generate meaningful images because they did not have a reasonable conceptual understanding of RT. Even the students among them with the skills to develop good prompts and obtain the expected result generated fuzzy images due to this lack of understanding.

Conclusion
The emergence of generative AI tools creates new opportunities for teaching complex subjects like RT. GR's complex nature makes it inherently challenging to represent visually in traditional representations like diagrammatic sketches. This investigation reveals that AI-generated images could serve as a valuable educational medium: prompting students to visualise and engage with complex ideas while offering teachers a glimpse of students' conceptualisation of challenging concepts [6,7].
Our analysis, encompassing not only the images but also the prompts and explanations, centred on students' conceptualisation of the key ideas during the image creation process. The aim was to discern the intent behind their representations and their satisfaction with the outcomes. The focus of our analysis was not on the scientific precision of the images in depicting RT but rather on what these images reveal about students' conceptual understandings and reflections. This approach underscores the potential of AI-generated images as a tool for exploring student conceptions, offering insights into their understanding and interpretive abilities.
This study also highlights a broader issue: the challenge of effectively translating understanding into coherent AI prompts for image generation. This difficulty is not limited to students with a limited understanding; even those with a better conceptual understanding may struggle to capture their thoughts in the concise language required for AI interaction [13]. It is crucial for educators to recognize and address the challenges students face in generating effective prompts. Guidance in this area can mitigate frustration and enhance the learning experience. Many students struggled to articulate their thoughts as precise instructions for the AI [19], underscoring the need for skills in conceptual understanding, clear textual expression, and concise prompt formulation.
This study also reveals the potential of using AI-generated images to evaluate and foster students' communicative skills in interpreting and conveying scientific concepts, a methodology that can be extended to other intricate subjects like Quantum Physics [20]. While our findings are promising, they are preliminary and highlight the need for further research. This exploratory study opens the door to a novel and engaging way of teaching and learning, one that intertwines technological innovation with educational practice.

Figure 1. Luke's AI-generated image, where it is possible to observe the idea of curved spacetime and light bending.

Figure 2. Sam's AI-generated image; the black object beside the Earth is possibly an attempt to represent a deformed space.

Figure 3. Tom's AI-generated image, where it is possible to note the relation to the 'light-speed' concept.

Figure 4. Summary of students' prompts, generated images, explanations of the images and post-test scores. The main concepts used by each student are highlighted.