Exploring the role of large language models in radiation emergency response

In recent times, the field of artificial intelligence (AI) has been transformed by the introduction of large language models (LLMs). These models, popularized by OpenAI's GPT-3, have demonstrated emergent capabilities in comprehending and producing human-like text, which has helped them transform several industries. However, their role has yet to be explored in the nuclear industry, specifically in managing radiation emergencies. The present work explores LLMs' contextual awareness, natural language interaction, and their capacity to comprehend diverse queries in a radiation emergency response setting. In this study we identify different user types and their specific LLM use-cases in radiation emergencies. Their possible interactions with ChatGPT, a popular LLM, have also been simulated and preliminary results are presented. Drawing on the insights gained from this exercise, and to address concerns of reliability and misinformation, this study advocates for expert-guided, domain-specific LLMs trained on radiation safety protocols and historical data. This study aims to guide radiation emergency management practitioners and decision-makers in effectively incorporating LLMs into their decision support frameworks.


Introduction
In the recent past, large language models (LLMs) have emerged as transformative tools in various domains. Interest grew especially after the introduction of OpenAI's ChatGPT in November 2022 [1], which triggered a frenzy of internet search activity for more information on LLMs (figure 1). This emergence of LLMs, including OpenAI's GPT-4 [2] and Google's PaLM 2 [3], with their intuitive conversational mode of interaction, has witnessed a surge in their utilization across diverse industries [4] due to their remarkable language generation capabilities and contextual understanding.
But new technologies are often touted as double-edged swords, and artificial intelligence (AI) in general, and LLMs in particular, are no different [5]. On the one hand, their ability to process vast amounts of textual data and generate human-like responses [6,7] is a potential game-changer in natural language understanding [8], sentiment analysis [9][10][11], and question-answering tasks [6,12]. On the other hand, the difficulty of distinguishing between human-written and machine-generated text [13,14], the potential for mass production of misleading and deceitful content [6,13], and questions about the reliability and truthfulness of generated content [13,15,16] have raised concerns.
Understanding the two sides of LLMs becomes especially important when this technology is used in a sensitive domain such as disaster or emergency response in the nuclear and radiological industry. The infamous nuclear accidents at Chernobyl and Fukushima, and the recent attacks on the Zaporizhzhia nuclear power plant in Ukraine, have raised apprehensions in the minds of the public about all things radiation. In such an environment, it is critical that a potentially disruptive technology such as LLMs be examined comprehensively, for use by practitioners as well as the lay person. As far as we know, this paper is the first of its kind to explore the application of LLMs in the domain of radiation emergency response management. The rest of the paper is structured as follows: sections 2 and 3 introduce the domain of radiation emergency management and the decision support tools traditionally used in it. The scope of using LLMs as decision support tools is discussed in section 4. Potential users of this application and their uses of LLMs in a radiation emergency are explored in sections 5 and 6, respectively. Section 6 also presents preliminary results of a user-ChatGPT simulation exercise and discusses the challenges and precautions to observe while using LLMs in radiation emergency management. Conclusions are drawn in section 7.

Radiation emergency management
Radiation emergencies or accidents, while relatively infrequent, are challenging events to respond to because they involve dealing with radioactive sources and their potential dispersal, handling specialised equipment, coordinating between different stakeholders, and managing public wariness of radiation. The challenge lies in responding effectively in the face of high uncertainty and public fear. Accurate and rapid decision-making in such accidents is critical and can significantly mitigate the extent of harm incurred.
A radiation emergency can encompass a range of situations, from smaller incidents involving radioactive sources [17] to larger nuclear accidents like a core meltdown in a nuclear reactor [18]. Within this spectrum, various individuals play distinct roles. These roles include victims, first responders, incident commanders, medical doctors, public relations officers, and concerned citizens. Each stakeholder has specific concerns, which vary depending on the scale of the emergency. For instance:
• Victims of acutely high radiation exposure incidents would be worried about any symptoms they may experience;
• First responders would seek guidance on conducting radiation monitoring in hazardous environments;
• Incident commanders would want efficient methods to manage their responding team's radiation exposure;
• Medical doctors would require information on distinguishing radiation sickness symptoms from other diseases;
• Public relations officers would want to convey information without causing unnecessary alarm among the public;
• Concerned citizens would seek comprehensive knowledge about the radiation emergency and the hazards associated with radiation.
This diverse set of participants would look towards any tool that could alleviate their concerns and add value to the decisions they make. Among the experts and practitioners of emergency response management, decision support tools and systems are commonly employed to address their concerns during and after disasters [19]. These tools serve as valuable resources for enriching decision-making, leveraging predictive capabilities, accessing domain-specific information quickly, managing resources, and easing the cognitive burden associated with planning and comparative assessments [20].
On the other hand, non-experts and citizens often rely on popular search engines such as Google or Microsoft Bing for information, primarily due to their accessibility, user-friendliness, and convenience [21]. For these stakeholders, the underlying motivations typically revolve around accuracy of information, knowing what to do, self-preservation, concern for loved ones, and curiosity [22]. This is where the free versions of popular LLMs like ChatGPT or Bard can become potential one-stop destinations for answering queries. But their versatility in addressing technical questions from various domains may also attract practitioners in radiation emergency management to use them for some of their tasks. This will be discussed in detail in sections 5 and 6.

Traditional decision support systems (DSS)
One way to look at radiation emergencies is as a juxtaposition of uncertain radiological consequences with varying weather patterns, distributed demography and topography, and a variety of human habits. All these factors together make these situations complex.
In such an event, a DSS does exactly what its name suggests: it supports decisions, especially those of the people responsible for managing the response to such emergencies. Most of the available literature in this domain involves decision support for nuclear accidents, simply because of their higher hazards when compared to radiological incidents that typically involve one or more radioactive sources. Some popular DSS for nuclear emergency management are given in table 1.
Most of these DSS are modular in architecture and have several modules that cater to different parts of emergency management, such as:
i. Data integration: data from several monitoring systems are integrated into one common platform, keeping in mind the different data types;
ii. Display of real-time data: real-time data from monitoring systems, such as radiation monitoring systems at the site boundary, are displayed to the operators;
iii. Modelling and prediction: different models for accident progression, source term estimation, atmospheric dispersion and dose prediction are incorporated;
iv. Consequence assessment: dose projections, risk estimation, dose aversion through different protective actions, variation of protective action parameters, etc., are also incorporated;
v. Response planning: multi-criteria decision-making methods are incorporated to rank different strategies.
In general, most of these DSS run their models and provide output based on the rules of physics and any other rules defined by the DSS developers. However, some of them have modules that incorporate a level of intelligence. For example, the Evaluation Subsystem (ESY) of RODOS compares different response strategies and provides a natural language report [29]. This 'intelligence' in the ESY is based on expert systems [30], one of the earlier forms of AI [31]. There was no 'learning' in these classical expert systems, unlike in modern AI-powered tools. Though the output provided by the DSS enriched the decisions of the decision makers, it was limited by its modelling architecture.
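The modular flow of modules (i)-(v) can be sketched as a simple pipeline. This is a minimal illustration, not the architecture of any actual DSS such as RODOS; the class, field names, scaling model and action level are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field

@dataclass
class EmergencyState:
    """Illustrative container passed between the DSS modules (i)-(v)."""
    raw_feeds: dict = field(default_factory=dict)      # (i) integrated monitor data
    display: dict = field(default_factory=dict)        # (ii) real-time operator view
    projections: dict = field(default_factory=dict)    # (iii) model predictions
    consequences: dict = field(default_factory=dict)   # (iv) dose/risk assessment
    ranked_plans: list = field(default_factory=list)   # (v) ranked response priorities

def run_dss_pipeline(monitor_feeds: dict) -> EmergencyState:
    """monitor_feeds maps a location to its time series of dose-rate readings."""
    state = EmergencyState(raw_feeds=monitor_feeds)
    # (ii) display: show the latest reading from each monitor to the operator
    state.display = {loc: series[-1] for loc, series in monitor_feeds.items()}
    # (iii) modelling: a trivial placeholder projection (real DSS use dispersion models)
    state.projections = {loc: series[-1] * 1.5 for loc, series in monitor_feeds.items()}
    # (iv) consequence assessment: flag locations above an assumed action level of 1.0
    state.consequences = {loc: p > 1.0 for loc, p in state.projections.items()}
    # (v) response planning: rank locations so flagged ones are handled first
    state.ranked_plans = sorted(state.consequences, key=state.consequences.get, reverse=True)
    return state
```

Each module only reads the fields filled in by the modules before it, mirroring the one-directional data flow described above.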
Another important feature of these DSS is their usability and user base. Many of them have a graphical user interface for easy access by the user, who would typically be a decision maker in a nuclear emergency. These tools are therefore built on the supposition that the users have a certain level of knowledge about the various aspects of emergency management. They function more as technical tools to support the user and less as general-purpose decision support tools. Also, of the DSS listed in table 1, only RODOS has a natural language component, in the form of an output report written in natural language [30]. However, none of them have a conversational feature that allows the user to interact with the DSS through a natural language query-response model.

LLM as a decision support tool
As seen in the previous section, decisions in radiation incidents have been based on expert knowledge, established protocols, and the application of predictive models. However, the complexity of such scenarios, coupled with the vast amounts of data generated during an unfolding event, may overwhelm traditional DSS. Furthermore, the scarcity of real-world incidents from which to learn makes it challenging to train human responders to anticipate every possible scenario. Thus, there is an increasing need for more robust, adaptable, and sophisticated decision-aiding tools. Recent advances in LLMs, notably Google's PaLM and OpenAI's GPT, provide a novel avenue to explore these models as decision support tools.
LLMs are self-supervised learning models trained on a large corpus of textual data. At its core, the transformer architecture of an LLM can process such voluminous data, identify intricate patterns in word relationships, and subsequently predict appropriate continuations. This allows them to generate the most probable output, depending on the input from the stakeholders [32]. This output mimics human language in a strikingly natural manner, reflecting the effectiveness of the underlying foundational models [33]. Thus, they can perform a multitude of tasks, from composing emails to discerning the context of a textual conversation.
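The core prediction step described above can be illustrated with a toy sketch: the model assigns a score (logit) to every candidate next token, these scores are normalized into probabilities via softmax, and a continuation is chosen. The vocabulary and logits below are invented for illustration; a real transformer computes its logits from billions of parameters.

```python
import math

def softmax(logits):
    """Convert raw token scores into a probability distribution."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def predict_next(logits):
    """Greedy decoding: pick the highest-probability continuation."""
    probs = softmax(logits)
    return max(probs, key=probs.get)

# Toy logits for the continuation of "The incident commander decided to ..."
toy_logits = {"evacuate": 2.1, "shelter": 1.3, "banana": -4.0}
```

In practice, models like ChatGPT sample from this distribution rather than always taking the maximum, which is why repeated runs of the same prompt can differ.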
LLMs have their limitations, particularly because of the mechanism of their response generation. Owing to the inherent nature of their learning, LLMs predict the next token (word or phrase), which may or may not be factually true. Despite these constraints, recent experiments with ChatGPT taking standardised tests have yielded remarkable results [34][35][36]. This demonstrates that ChatGPT, and LLMs in general, have the emergent ability to perform critical reasoning and answer complex questions.
A bibliographic search on 'large language models' and 'decision support' in the ScienceDirect literature database primarily illustrated the use of LLMs in medical and healthcare fields. This is likely because the large volumes of textual data in the form of medical reports are ideal inputs for LLMs to work with. Reference [37] explored whether ChatGPT can generate useful suggestions for improving clinical decision support; of the 20 suggestions that scored the highest, 9 were generated by ChatGPT. Reference [38] examined the accuracy and reproducibility of ChatGPT in answering questions regarding knowledge, management, and emotional support for cirrhosis and hepatocellular carcinoma (HCC). In [39] it was found that ChatGPT correctly answered 74% of trivia questions related to heart diseases; specifically, ChatGPT scored impressively in the domains of coronary artery disease (80%), pulmonary and venous thrombotic embolism (80%), atrial fibrillation (70%), heart failure (80%) and cardiovascular risk management (60%). Reference [40] evaluated ChatGPT as a support tool for breast tumor board decision making, [41] assessed ChatGPT's capacity for clinical decision support in paediatrics, and [7] evaluated ChatGPT as a clinical decision support in triaging patients for appropriate imaging services. Reference [42] performed a comparative analysis of humans and LLMs in decision-making abilities and found a moderate level of agreement between the decisions of humans and LLMs.
All this points towards a growing interest in adopting LLMs as decision support tools in critical applications. Their conversational or chatbot feature also makes them attractive even for a general user or non-expert, who need not rely on technical language to pose queries. This greatly enhances the usability of the tool, as users can write queries in their own way and the deep learning architecture of these LLMs is able to understand them and provide a sensible output [6]. But it is important to understand that, while LLMs are trained on a huge corpus of data, they do not have specific modules to carry out most of the required functions mentioned in sections 3(i)-(v).
Hence, while LLMs hold promise in providing valuable assistance to all participants in a radiation emergency, it is also important to address the following crucial questions: 1. Can the responses from LLMs be trusted as reliable and factual? 2. Where do LLMs fit in the broader landscape of emergency response management? These questions are essential to understand the true potential and limitations of LLMs in the context of managing radiation emergencies effectively. But due to the breadth of this field, we acknowledge that it may not be possible to comprehensively cover all emergency response aspects in a single paper.

Potential users of LLMs in a radiation emergency
While the various functional applications of LLMs are well discussed [43][44][45][46], we have identified four major use-cases and their potential users in a radiation emergency (figure 2).

Chatbot and smart assistant
LLMs are perfectly poised to take on the role of chatbots or smart assistants in an emergency [47]. Their ability to comprehend questions and their contextual awareness mean that the user need not waste time creating an elaborate query. First responders, incident commanders and medical doctors are expected to be the primary users. First responders could use LLMs in certain radiation monitoring tasks, such as feeding in a series of monitored data points and querying for a statistical analysis, or describing an area with a high radiation field and querying for an optimum monitoring route.
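The first-responder interaction above amounts to embedding field data in a natural language query. A minimal sketch of such a prompt builder follows; the function name, wording and units are illustrative assumptions, not part of any LLM's API.

```python
def monitoring_query(readings_usv_h, location):
    """Assemble a natural-language query embedding dose-rate readings (in µSv/h)."""
    data = ", ".join(f"{r:.1f}" for r in readings_usv_h)
    return (
        f"I am a first responder at {location}. "
        f"My last dose-rate readings in µSv/h were: {data}. "
        "Give me the mean, maximum and trend of these readings, and suggest "
        "a monitoring route that minimises my exposure."
    )
```

The resulting string would be sent to the LLM as-is; the model's contextual awareness is what turns the raw numbers into an analysis.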
Incident commanders would perhaps find the greatest application in the form of a smart assistant. LLMs can function as an all-round advisor among human experts during a radiological emergency. In comparison to first responders, incident commanders may be more efficient in using LLMs, simply because they are away from the line of action and can delegate the job of interacting with the LLM. One application could be feeding in real-time scenario observations and querying for standard operating procedures (SOPs) or simple action plans for the different teams to follow [48] (figure 3). Another could be describing the monitored radiation levels and team composition and querying for an optimum scheduling plan that minimizes radiation exposure per person and for the team as a whole.
Medical doctors receiving patients with radioactive contamination, or patients exposed to high radiation levels, can also use LLMs as smart assistants. By feeding in patient history and symptoms, they can use LLMs to narrow down the possible causes of exposure or contamination. In case the hospital staff are not trained to handle contaminated victims of a radiation emergency, LLMs can be used to create radiation-specific SOPs and highlight the dos and don'ts of their treatment.

Text summarization
Another excellent use-case for LLMs in a crisis is summarizing streams of text, whether from social media [49], crisis reports [50] or even medical records [51]. The ability to extract the most contextually relevant parts of a text makes LLMs an attractive option as text summarizers. Incident commanders can use this ability to compress a long compilation of team reports, and even team recordings, into a brief day-to-day summary of events or an action-taken report.
Doctors can use the summarization feature to prepare handy notes about a contaminated patient's history, or use the patient's description of their symptoms to generate a report. The LLM that helped generate the report can also be used to analyze the same report for clues about possible radiation exposure or contamination. Another important user could be public information officers, who could use LLMs to provide periodic event and action summaries based on inputs from the incident commander and other relevant stakeholders. The ability of LLMs to perform sentiment analysis can also be leveraged to create a media briefing (figure 4) that is positive and reassuring in tone yet conveys the facts as well.
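The summarization-with-tone use-case above boils down to a prompt that concatenates the source texts and states the desired register. A minimal sketch, with illustrative wording and a hypothetical default tone:

```python
def briefing_prompt(team_reports, tone="calm and reassuring"):
    """Build a summarization prompt for a public media briefing (illustrative)."""
    joined = "\n---\n".join(team_reports)  # separate reports so the model sees boundaries
    return (
        "Summarise the following field reports into a short media briefing. "
        f"Keep the tone {tone}, state the facts plainly, and avoid technical jargon.\n\n"
        + joined
    )
```

The same pattern serves the incident commander's action-taken reports by swapping the instruction sentence.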

Question-Answering
An extension of the LLMs' chatbot-like use-case could be answering queries [52] raised by concerned citizens and crisis victims. Victims of a radiological emergency would naturally be concerned about their symptoms and course of treatment. They could interact with LLMs to ask questions about their symptoms, what to do next, or how to make sense of their treatment. Concerned citizens could also ask LLMs questions to understand the nature of the radiological emergency; their queries could cover the hazards of radiation exposure, what to do in case of contamination, and the dos and don'ts of going outside. Doctors could also use LLMs for getting answers, rather than as a chatbot or smart assistant. This could be the case for doctors or physicians who are not trained in radiation medicine and whose queries would more likely be generic rather than specific to the exposure situation. Their questions could be similar in nature to those of concerned citizens, but posed with some medical jargon.

Language translation
Another major application of LLMs could be their ability to translate text into multiple languages [53]. At the time of a radiation emergency, instructions may have to be given to people by the first responders, sometimes in a language not their own. This is likely in a country such as India, where multiple languages are spoken across geographic areas. In another instance, incident commanders may need the translating services of LLMs to interact with foreign volunteers in a crisis. Public relations officers may also need to generate media briefings or press releases about the event in multiple languages (figure 5). In all these instances, the translation capabilities of LLMs would be a typical use-case. However, LLMs would have to be trained on a large corpus of local languages to expand their domestic usability.
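Generating a multilingual press release, as described above, is simply one translation prompt per target language. A minimal sketch under the assumption that each prompt is sent to the LLM separately; names and wording are illustrative:

```python
def translation_requests(press_release, languages):
    """Build one translation prompt per target language (each to be sent to the LLM)."""
    return {
        lang: (
            f"Translate the following press release into {lang}, "
            f"preserving its meaning and reassuring tone:\n\n{press_release}"
        )
        for lang in languages
    }
```

For example, `translation_requests(text, ["Hindi", "Tamil", "Bengali"])` yields three independent prompts that can be dispatched in parallel.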

Application of LLMs in radiation emergency management
In the preceding section, we examined major LLM use-cases catering to various participants in a radiation emergency. In this section we explore how this usage fits into the larger scheme of emergency response management by describing the context of the emergency. To do this, we consider a nuclear accident scenario and a radiological incident scenario where the LLM use-cases can fit well.

Urgent phase of a nuclear accident
Not every accident at a nuclear reactor leads to the release of radioactive material into the environment. Robust safety systems, operating redundantly and diversely, are in place to mitigate accidents. If an accident progresses, it does so in a sequential manner, with safety systems failing systematically until radioactive material is released. Operators often prevent such releases by halting the accident's progression and initiating a safe shutdown. However, in rare cases like Chernobyl and Fukushima, releases may occur. The period from the occurrence of the accident until the time when the releases become insignificant is typically considered the urgent phase of the emergency [54]. If plant operators have confirmed that the accident could progress to off-site releases, they typically notify the local administration to start taking protective measures [55].
The urgent phase can span from a few hours to a few days and is generally characterized by high uncertainty in the radiological conditions, both within the plant and outside. Amidst such uncertainty, different participants look towards decision support tools to aid their decision making. Plant operators could use LLM-based decision support tools as chatbots/smart assistants to make sense of critical plant parameters.
The LLMs could generate natural language reports of the plant status, highlighting critical parameters that need troubleshooting. LLMs could also be used by the incident commander of the district housing the affected nuclear reactor. As stated in section 5, the incident commander and the various first responder teams could use LLMs to plan their response actions, optimize their resources and develop action reports. The citizens in the affected areas could look towards LLMs to answer many of their concerns regarding the situation. Though the currently available LLMs are general-purpose tools, LLMs tailor-made for nuclear emergency response could easily be used by the general public as a sort of risk communication and warning tool.

Lost radioactive source
Conventionally, when a radioactive source goes out of regulatory control, it triggers a response mechanism that involves the licensee of the source, the first responders and the radiation emergency response manager. These stakeholders typically have well-defined roles in a national communication, coordination and response mechanism for radiological incidents. The licensee is trained in handling the radioactive source as part of the licensing requirements and is informed of certain basic dos and don'ts in the event of the source going missing. The first responder is also trained in responding to such events, which involves following certain standard operating procedures that consider the hazards of the source involved. The emergency response manager lends support to the search, response and recovery operation by the first responders, in the form of technical inputs such as dose assessment, consequence analysis, etc.
The entire emergency management system comprising these different players is built on expert knowledge of the radioactive source, its hazards, response capabilities and effective training. But the moment a source is found missing, time is crucial. Every minute spent not recovering the source is a potential increase in radiation exposure to unsuspecting members of the public. The emergency management system in place may be effective, but there is a certain response time delay that simply cannot be avoided. Moreover, the urgent phase of the incident (like the one described in section 6.1) is typically marked by high uncertainty regarding the state and location of the lost source. So how can this time delay and uncertainty be reduced?
The licensee can use LLMs as chatbots to identify possible locations at which to begin searching for the lost source. By feeding in relevant pieces of information about the source type, quantity, last known location and other observations, the licensee can tease out possible areas where the source may be found. Information on whom to call and notify about the lost source can also be extracted from the LLM, especially one tailor-made to the country's specific response management system. The incident commander can use LLMs as chatbots to understand the extent of the hazard of the lost source, its possible dispersal once the area is identified, efficient monitoring protocols, etc. The incident commander can also use LLMs as text summarizers to create and send periodic incident reports to higher management at all steps of the source search and recovery operation. The first responder teams can plan their resource allocation using LLMs by providing input on the number of persons, equipment, allowed dose values, etc., in natural language. The licensee and their parent company can use LLMs as text summarizers to create press releases. Concerned citizens can query LLMs on precautions to take, based on these press releases, social media information, etc. They can also gather information on the possible hazards of a lost source by asking queries in natural language, which would otherwise be difficult in traditional search engines.
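The licensee's first query above, feeding in the source details, can be sketched as rendering structured metadata into a natural language question. The field names and example values here are purely illustrative, not drawn from any actual incident record.

```python
def lost_source_query(source):
    """Render structured source details into a search-planning query (illustrative fields)."""
    return (
        f"A {source['activity']} {source['nuclide']} source in a {source['device']} "
        f"was last seen at {source['last_location']} on {source['last_seen']}. "
        "List the most likely places it could have ended up and whom I should notify first."
    )
```

Keeping the details structured also means the same record can feed the incident commander's hazard and monitoring queries without retyping.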

Simulating the LLM interaction in a radiation emergency
To simulate the use of LLMs in a radiological emergency, we conducted a theoretical exercise centred around a past radiological emergency: the incident in Samut Prakarn, Thailand in 2000. During this event, a radioactive source was accidentally broken by a scrap worker, leading to substantial exposure.

Method
The timeline of the accident was extracted from the International Atomic Energy Agency (IAEA) report [56] and the key participants involved were identified. For a better understanding of users' interactions, we divided the accident sequence into distinct stages, each with its corresponding participants. With this structure in place, questions were formulated from the perspective of each participant, and OpenAI's ChatGPT was used to simulate the interaction a user might have with the LLM. A sample of the query dataset is shown in table 2. Our aim was to evaluate ChatGPT's responses in two ways: (a) using the default ChatGPT, without contextual priming, and (b) using ChatGPT after priming it in the context of radiological emergencies, though not training it in the traditional sense.
'Priming the model' consisted of providing it with the textual content of a few past incident reports (Chilca, 2012 [17] and Cochabamba, 2004 [57]), allowing it to absorb and comprehend the information. We then presented both the default and the primed ChatGPT with the same queries. Here we present preliminary results highlighting the opportunities and challenges of using ChatGPT as a decision support tool for emergency management. Readers trying to replicate this exercise's prompts should note that ChatGPT will not give the same responses, due to the stochastic nature of its token prediction architecture.
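Priming of this kind can be sketched as prepending the report texts to the conversation before the user's query. The sketch below assumes the common chat-API message convention of `system` and `user` roles; the exact wording of the system instruction is our illustrative choice, not the prompt used in the exercise.

```python
def primed_messages(context_reports, user_query):
    """Prepend past incident reports as system context before the user's query."""
    context = "\n\n".join(context_reports)
    return [
        {"role": "system",
         "content": ("You assist in radiological emergencies. Use these past "
                     "incident reports as background:\n\n" + context)},
        {"role": "user", "content": user_query},
    ]
```

The default-model condition corresponds to sending the same query without the system message; the primed condition sends the full message list.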

Victims and doctors
For queries of a medical nature, such as those posed by exposure victims, the output of both the primed and default models displayed a disclaimer element, i.e. they emphasised that they are not medical professionals. This observation is depicted in figure 6. Surprisingly, the disclaimer element appears only when the question includes a phrase such as 'what is wrong with me?'. For example, when an exposure victim poses a query such as 'I am feeling nauseous and I have a severe headache. What is wrong with me?', both models begin with the disclaimer 'I am not a doctor, but…'. But when the query is posed as 'I am feeling nauseous and I have a severe headache. List out the possible causes for my symptoms', the disclaimer element is absent. While a cautious user may take the model response with a pinch of salt, whether the disclaimer is there or not, an unsuspecting user may give it serious consideration.
When faced with a rather generic query such as 'I am feeling dizzy, have bad headaches and also started vomiting. What is wrong with me?', both models showed a predictive element that gave a diagnosis based on the given condition (figure 7). The default model gave a broad spectrum of causative factors, which did not include radiation exposure, while the primed model delved into the possibility of linking the symptoms to acute radiation exposure. However, the primed model, while acknowledging that the symptoms were non-specific and could be associated with a wide range of medical conditions, also exhibited contextual awareness and noted that its response was in the context of its priming. Further, the primed model also emphasized the need for more detailed information on recent activities, potential exposure to hazardous material, etc., before reaching any conclusion.
The models also displayed an advisory and a precaution element that directed the user to seek medical attention and listed precautions to take in case of any future work involving similar hazards (figure 6). All this points towards the contextual awareness of both models.
From the point of view of a doctor, when a detailed query was posted including the exposure victim's symptoms and work history, both models recognized the severity of the patient's condition, highlighting the need for immediate medical attention due to symptoms potentially linked to chemical or radiological exposure. While the default model tended to focus on chemical exposure, detailing how the patient's symptoms correlate with chemical effects like burns, toxic inhalation, and systemic toxicity, the primed model correctly identified radiological exposure as an additional concern. It explained how the symptoms aligned with radiation effects, suggesting immediate consultation with radiological experts for assessment and guidance. The primed model even emphasized investigating the patient's occupational history and the nature of the suspected material for an accurate diagnosis. It also recommended informing the relevant authorities if radiological exposure is suspected and monitoring the health of fellow workers who might be at risk.
Both responses stressed the importance of further evaluation through diagnostic tests and specialist consultations to determine the underlying cause of the symptoms. Given the potential seriousness of the situation, both suggested prompt and thorough actions for patient care and safety. Here we would like to direct attention to the importance of a primed model versus a default one. If the doctor were to rely only on the general-purpose default ChatGPT, then without any prior training in radiation medicine, the doctor would be unlikely to consider radiation exposure as a possibility. This would most likely result in improper treatment of the patient and a delay in informing the authorities, which could result in more inadvertent exposures to the unsuspecting public at the incident site. On the other hand, the primed ChatGPT may always be biased towards signs of radiation exposure, and therefore a certain level of judgement is needed from the user while using this LLM.
The importance of having a primed decision support tool is further apparent when, despite being presented with all the tell-tale evidence of a radiation exposure, the default ChatGPT was unable to narrow down the cause (figure 7). Nausea, vomiting, hair loss, skin burns and low white blood cell counts are signatures of acute radiation exposure [58]. The default model was unable to attribute these symptoms to acute radiation exposure, whereas the primed model was able to do so immediately.
But when a query was posed affirming a possible radiation exposure and asking for the next course of action, both models gave similar responses. Both underscored the urgency of providing specialized medical care for acute radiation exposure, incorporating measures such as hydration, pain management, and supportive interventions. They concurred on the necessity of consulting radiological experts for tailored guidance and involving the authorities responsible for radiological safety and public health. Collaboration with these authorities was stressed for investigation, containment efforts, and evaluation of the potential risks to other exposed individuals, including co-workers and family. Long-term medical monitoring for late radiation effects was also emphasized by both models.

First responders and incident commanders
When a query was posed as a first responder asking for a plan of action to locate possible sources of exposure at the incident site, both models gave a broad outline of actions that was reasonably specific to the task. Both stressed wearing personal protective equipment, choosing appropriate equipment, adequately training the responders, and coordinating with radiation safety experts. The primed model was somewhat more specific in its recommendations, for instance recommending certain types of radiation monitors and deciding on safe distances and exposure limits. While this is expected of a primed model, the default model also surprised us by providing useful insights such as the importance of using remote tools, establishing safety zones, and marking areas of elevated radiation. Both models exhibited contextual awareness, and their recommendations were highly aligned with what is typically recommended and practised in radiation response.
But a significant difference between the two models was observed in response to a query on evacuation. For example, consider the query posed from the point of view of the incident commander: 'There is high radiation level at certain points (of the order of 1 Sv/h) in the junkyard and there is no radioactive contamination. As the incident commander, should I recommend evacuation of the people living around the junkyard? Please take into account the dense population living in the vicinity of the junkyard.' Both models identified and acknowledged the seriousness of a situation involving high radiation levels and the potential risks it posed to the nearby population. Both recommended consulting radiological experts and authorities to assess the situation and make informed decisions, while also emphasizing the importance of immediate safety measures, including cordoning off the affected area and assessing the extent of the radiation field. They also suggested evaluating factors such as the type and intensity of radiation, distance from the source, shielding, duration of exposure, and population density when considering the potential health impact and the need for evacuation. On these technical aspects, the two models provided useful insights to the incident commander.
However, on the question of evacuation itself, the default model strongly recommended evacuation due to the extremely high radiation levels, whereas the primed model approached the decision as a more balanced consideration. The default model justified evacuation by focusing on the severe radiological hazard and the potential health effects of prolonged exposure, while the primed model emphasized evaluating the risk factors and potential health impact, and weighing the risks of exposure against the challenges of evacuating a densely populated area. The primed model's responses specifically highlighted the need to balance the risks of radiation exposure against the potential disruption and challenges of evacuation. Once again, the importance of a primed LLM becomes apparent, as it offers a more balanced view on the controversial decision of evacuation. This balancing of conflicting factors is an essential part of decision support.

Concerned citizens and public information officer
We considered a situation where a curious onlooker observes the response activities at the incident site from her house. She then writes a query on ChatGPT describing the event and asking whether she would be safe from radiation and how she could protect herself. Both models correctly identified the importance of shielding and ventilation in this context and recommended staying indoors with windows and doors closed. The models even suggested avoiding unnecessary exposure by refraining from going outside. Both stressed staying informed about the event through news, social media and official press releases. The tone of the responses was cautious and informative, and there were few differences between the two models' responses.
We also considered how a public information officer might use ChatGPT in this situation. As a public information officer, a query was posed to ChatGPT to write a press release after describing the latest updates about the incident. The query also insisted that the press release convey the facts and sound reassuring. Both models generated similar press releases which were remarkably informative, reassuring and instructive in nature. Coupled with expert oversight, ChatGPT could help make the media briefing process more efficient.
The full comprehensive results of this exercise will be published separately.

Discussion
Through our simulation of the individual users and LLM use-cases in the radiation emergency context, we believe that LLMs hold tremendous scope as decision support tools in radiation emergency management. Their remarkable contextual awareness, natural language interface and ability to comprehend a variety of queries make them very valuable for decision support. But this technology is still in its nascent stages, and its achievements must be tempered with some hard realities: the issues of ethics, misinformation, reliability, authenticity, toxicity, and plagiarism of LLM outputs [13,59]. Among these, we believe two aspects warrant specific caution in the context of radiation emergency management: reliability and misinformation. Unreliable information from LLMs may have unwarranted consequences in a radiation emergency, especially with untrained users. For example, consider the case of a lost radiography camera, where the licensee of the device provides incorrect prompts to a freely available default LLM such as ChatGPT. This may result in a downgraded risk assessment, which may tempt the licensee to attempt the recovery herself. This is not acceptable: such a licensee is not trained for recovery operations, the input prompts were incorrect, and any action based on this output may cause unnecessary overexposure. This is analogous to a sick user who enters his symptoms into an internet search engine and decides to self-medicate based on the search results. The reliability of outputs becomes an even greater concern for users such as the incident commander or the responders, for whom the stakes are higher. The data on which general-purpose LLMs are trained may not cover their particular use cases, and so these field users may not receive reliable information on field-specific queries.
Having a domain-specific LLM is one way to improve the reliability of outputs. The value added by training on a large corpus of historic reports and response protocols is tremendous. Conversely, a general-purpose LLM may be more suitable for a concerned citizen or exposure victim who may not understand the more technical responses of a trained LLM. In addition to having a trained LLM, the reliability of LLM output could be improved by robust in-built checks and balances [60].
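As an illustration, one simple form such in-built checks and balances could take is a post-generation screen that flags physically implausible figures or responses that fail to defer to experts before the output reaches the user. The sketch below is hypothetical: the plausibility bound and required phrases are illustrative assumptions made by us for this example, not values drawn from any published protocol.

```python
import re

# Illustrative post-generation sanity check on an LLM response.
# The dose-rate bound and required phrases are assumptions for this
# sketch, not values from any radiation safety standard.

DOSE_RATE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(mSv|Sv)\s*/\s*h", re.IGNORECASE)
REQUIRED_PHRASES = ("consult", "expert")  # response should defer to experts


def check_response(text: str, max_sv_per_h: float = 100.0) -> list[str]:
    """Return a list of warnings about an LLM-generated response."""
    warnings = []
    # Flag dose rates beyond a plausible bound (possible hallucination).
    for value, unit in DOSE_RATE_PATTERN.findall(text):
        sv = float(value) * (0.001 if unit.lower() == "msv" else 1.0)
        if sv > max_sv_per_h:
            warnings.append(f"Implausible dose rate: {value} {unit}/h")
    # Require that the response defers to radiological experts.
    if not any(p in text.lower() for p in REQUIRED_PHRASES):
        warnings.append("Response does not defer to radiological experts")
    return warnings
```

A response that passes returns an empty warning list; anything else would be held back for expert review rather than shown directly to the user.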
Then there is the problem of misinformation: LLMs, particularly ChatGPT, are known to exhibit 'hallucinations' [61][62][63], i.e. they produce statements that sound plausible but are false. This is especially concerning in the context of radiation emergency management. For example, if a responder were to receive a 'hallucinated' action recommendation from the LLM, it could result in a downplayed radiological risk and an inadvertent radiation exposure. Similarly, an exposure victim may not receive the appropriate course of treatment if the physician acts upon factually incorrect recommendations from a 'hallucinating' LLM.
The only means of preventing this is to improve the contextual understanding, or 'common sense', of the LLM architecture and to keep the training database updated with current information. In light of these two aspects, we recommend that any LLM used as a decision support tool in a radiation emergency be (a) primed on specific response protocols, radiation safety principles and historic incident reports and (b) used under expert supervision, preferably by an 'AI whisperer', i.e. someone well versed in providing usable prompts to the LLM to extract maximum value.
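In its simplest form, the priming recommended above can be approximated by prepending a domain context to every user query, following the common system/user message format used by chat-style LLM APIs. The sketch below is a minimal illustration under that assumption; the priming text is a placeholder written by us, not an actual response protocol.

```python
# Minimal sketch of "priming" a general-purpose chat LLM with domain
# context before it sees user queries. The context string below is an
# illustrative placeholder, not an actual radiation response protocol.

PRIMING_CONTEXT = (
    "You are a decision-support assistant for radiation emergency "
    "response. Ground every answer in established radiation safety "
    "principles (time, distance, shielding), applicable response "
    "protocols, and lessons from historic incident reports. "
    "Always recommend consulting qualified radiological experts."
)


def build_primed_messages(user_query: str) -> list[dict]:
    """Prepend the domain priming context to a user query."""
    return [
        {"role": "system", "content": PRIMING_CONTEXT},
        {"role": "user", "content": user_query},
    ]
```

The resulting message list would then be passed to a chat-completion endpoint, with an expert reviewing both the prompts and the responses. A production decision support tool would go further, e.g. by fine-tuning on protocol documents rather than relying on a system prompt alone.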

Conclusion
The emergence of LLMs has brought transformative capabilities to various industries, owing to their powerful language generation and contextual understanding. However, the dual nature of LLMs as potential game-changers and sources of concern highlights the need for comprehensive examination, particularly when they are applied to critical areas like radiation emergency response.
The purpose of this paper was to explore the application of LLMs in radiation emergency response management. This is particularly relevant in a world where incidents like Chernobyl and Fukushima have heightened public apprehension about radiation, making it paramount to assess the potential benefits and risks of utilizing LLMs in such a context. Our exploration mainly delved into the role of LLMs as decision support tools in radiation emergency response management. We examined their contextual awareness, natural language interface, and potential to comprehend a diverse range of queries, revealing promising possibilities for their integration. We also identified the different use-cases and users of LLMs in a radiation emergency. Through user-ChatGPT simulation exercises, we established their scope for providing valuable insights and assistance.
However, in the realm of radiation emergency response, reliability and misinformation emerge as critical concerns. We discussed instances where unreliable information or 'hallucinations' from LLMs could lead to improper actions, endangering responders and victims alike. To mitigate these challenges, the development of domain-specific LLMs trained on radiation safety protocols, response principles, and historic incident data is essential. Such LLMs can provide reliable, contextually accurate responses. We also recommend employing LLMs under expert supervision, enhancing their contextual understanding, and updating their training databases regularly.
Through this paper, we hope to provide initial guidance and insights to practitioners and decision-makers in radiation emergency response management who are seeking to incorporate LLMs into their decision support framework.

Figure 1 .
Figure 1. Dramatic rise in popularity of the keywords 'Bard', 'ChatGPT' and 'large language model' in Google searches from January 2022 to July 2023 (data source: Google Trends™).

Figure 2 .
Figure 2. Illustrating how different individuals in a radiation emergency scenario may use LLM capabilities.

Figure 3 .
Figure 3. Snapshot of ChatGPT's response to a query on creating a plan of action for a radiation emergency (generated by the authors using the free version of a popular LLM, OpenAI's ChatGPT version 3.5).

Figure 4 .
Figure 4. Snapshot of ChatGPT generating a press release for a radiation emergency (generated by the authors using the free version of a popular LLM, OpenAI's ChatGPT version 3.5).

Figure 5 .
Figure 5. Snapshot depicting ChatGPT's language translation capability (here in French) in a radiation emergency (generated by the authors using the free version of a popular LLM, OpenAI's ChatGPT version 3.5).

Figure 6 .
Figure 6. ChatGPT's response (untrained model) to a victim's query has many supportive elements: a disclaimer element that reinforces its limited knowledge, a prediction element that supports diagnosis, an advisory element that provides guidance on the next course of action, and a precaution element that gives valuable insights on any future actions (generated by the authors using the free version of a popular LLM, OpenAI's ChatGPT version 3.5).

Figure 7 .
Figure 7. Comparing the responses of the default and primed ChatGPT models for the same prompts (generated by the authors using the free version of a popular LLM, OpenAI's ChatGPT version 3.5).

Table 1 .
Some DSS used for nuclear and radiological emergencies.

Table 2 .
A sample of the query dataset used in the simulation exercise.