Fast, Simple, and Accurate Time Series Analysis with Large Language Models: An Example of Mean-motion Resonances Identification

Classical machine learning has been actively utilized in astronomy to address various challenges, including predicting orbital stability, classifying asteroids, galaxies, and other objects, and analyzing images. However, the emerging trend in artificial intelligence involves the use of large language models such as GPT-4 and ChatGPT. These models are trained on a large corpus of text and can perform a wide range of natural language processing tasks, including text generation, translation, summarization, and classification. Surprisingly, these capabilities present significant potential for application in astronomy. This paper demonstrates how the new model gpt-4-vision-preview can analyze visual patterns and accurately classify asteroids as resonant or nonresonant with high accuracy. This process requires no training, fine-tuning, or coding beyond writing the appropriate prompt in natural language. Moreover, this approach can be extended to other common problems within astronomy.


Introduction
Machine-learning (ML) techniques are now actively used in astronomy and celestial mechanics.Using ML, it is possible to identify asteroids trapped in mean-motion resonances (MMRs; Smirnov & Markov 2017), predict orbital stability (Liu et al. 2021) or long-term stability of planetary systems (Lam & Kipping 2018;Tamayo et al. 2020), find new asteroid families members (Carruba et al. 2020), classify Kuiper Belt objects (Smullen & Volk 2020), detect exoplanets (Malik et al. 2022), and analyze images (Carruba et al. 2021b).Primarily, these study utilize what is often called classical ML, which uses statistical algorithms or models to analyze data.
There are several types of astronomical problems that can be solved using classical ML.These could be classification problems where multiple objects are classified into different categories.Such classification is performed either without prior data (in a so-called clustering task) or with prior data (in so-called supervised learning).The former aims to find the hidden patterns in the data, while the latter aims to boost prediction, for example, when a non-ML approach requires significant computer resources.An example of clustering task is asteroid families classification (Carruba et al. 2020), while an example of reducing computations is predictions of stability (Tamayo et al. 2020) or resonance identification (Smirnov & Markov 2017).
Another type of problem that can be solved using classical ML is related to big data.For example, when there are many images and it is necessary to find outliers or anomalies there, the ML approach can achieve good results.Baron & Poznanski (2017) were able to find multiple abnormal galaxies by using a simple clustering method (unsupervised random forest).
Classical ML can be understood as "statistics on steroids."It is based on the analysis and utilization of the data and the application of statistical algorithms that are able to find patterns in the data.For example, one of the most popular methods, k-nearest neighbors (kNN), is based on the idea that similar objects are close to each other in the feature space.If there is an object that has to be classified, the algorithm will find the k nearest objects and assigns the class that is the most frequent among them.A researcher can vary multiple parameters, such as the number of neighbors, the distance metric, or the weights of the neighbors.This algorithm is used in multiple astronomical problems, such as the classification of asteroids (Smirnov & Markov 2017).
Another popular method is a decision tree.It is based on the idea of splitting the data into subsets based on the values of the features.The algorithm finds the best feature and the best value to split the data into several tree branches.This process is repeated until the subsets are pure or until the maximum depth of the tree is reached.
It is possible to use a combination of several "weak" classifiers, such as kNN or decision trees.This is called an ensemble method.The most popular ensemble method is random forests, which creates multiple decision trees and averages their results.This method is used in multiple astronomical problems, such as the detection of moving objects (Lin et al. 2018) or the identification of secular (Carruba et al. 2021a) or MMRs (Carruba et al. 2021b).
Another common ML approach is deep learning, which is based on neural networks, such as artificial neural networks (ANNs) and convolutional neural networks.They are the models capable of automatically learning feature representations from data.Thus, there is no need for manual feature optimization.ANNs are inspired by the biological neural networks and composed of layers of "neurons" that can learn to represent complex patterns through training.They can handle a wide range of tasks from regression to classification.A convolutional neural network is a specialized type of neural network designed for processing special kind of data, such as images.Carruba et al. (2021b) used a deep-learning approach to classify the images representing the resonant angles of the asteroids that can be trapped into MMRs.The same authors performed a comprehensive review of the astronomical papers dedicated to ML (Carruba et al. 2022).
All methods of the classical ML (and even deep learning) are based on the input data.Often, supervised learning (or, in other words, learning with a teacher) is used.This means that the algorithm should be trained on the data with known labels, which implies the existence of the training set and enough computational resources to build a reliable model, which might differ from one task to another.For example, having a model that can predict whether or not an asteroid is trapped in the specific resonance does not mean that the same model can check whether it is trapped in another resonance (Smirnov & Markov 2017).It is possible to build more universal models, but this will dramatically decrease the accuracy.One of the reason why it happens is due to the nature of these methods: they are based on the statistical algorithms.There is no background knowledge of the underlying astronomical or mathematical problem.
In contrast, the latest trend in artificial intelligence (AI) is the use of large language models (LLMs) such as GPT-4 (OpenAI et al. 2023) and ChatGPT (OpenAI 2024).These models are trained on a large corpus of text and can perform a wide range of natural language processing (NLP) tasks, including text generation, translation, summarization, and classification.What is more important, they can be used for decision-making tasks, where the human expertise is required because they can behave like human beings.This is a new approach that is not based on the statistical algorithms, but on the NLP.If one needs to classify an object, it is enough to write a prompt in natural language, and the model will provide the answer.This approach is called zero-shot learning, as it does not require any training or fine-tuning.Thus, it makes such an approach universal.In other words, while an LLM might have no information about what it means "to be trapped in a resonance," but it still can classify whether an asteroid is resonant if there is an explanation written in natural language.
Usually, it is assumed that LLMs are suitable for social sciences, humanities, and other fields where the data are in the form of text.However, this is not quite accurate: it is possible to use LLMs in natural sciences.More specifically, they can be used for any task where human expertise and decision-making are required.A good example of such a task is pattern recognition in time series, which is often required in multiple astronomical problems.Other examples include outlier detection, image recognition, or object detection.While these tasks are often addressed with classical ML techniques, the application of LLMs introduces a novel approach that relies on NLP and explicit task definitions, rather than on statistical algorithms or probabilistic models.Thus, it does not require any prior knowledge of the algorithms, training, or fine-tuning.
Let us demonstrate on the case of MMRs.The distribution of asteroids in the main belt over the semimajor axis has a number of gaps.These gaps are caused by the MMRs with planets, which represent a commensurability between the mean motions of the asteroid and a planet (or planets): where N is the number of bodies involved in the resonance, λ i are their mean longitudes, and m i are integers.To determine whether an asteroid is trapped in an eccentricity-type resonance, it is necessary to calculate the resonant angle σ: where λ i and ϖ i are mean longitudes and longitudes of periapsis of the bodies and m i and p i are integers following the D'Alembert rule: ∑ i m i + pi = 0.The resonant angle must librate if the asteroid is trapped in the resonance.The example of libration is in Figures The classification of the behavior of the resonant angle is not a simple task.Of course, one can manually identify it by the direct visual analysis.However, even for a human, there are multiple edge cases (see Figure 2) when the decision can be subjective, which challenges the interrater reliability of the classification.
Automation of this task is possible.A simple automatic algorithm, introduced by Smirnov & Shevchenko (2013), which is based on the analysis of how many times the resonant angle crosses the limits of −π and π, and the duration of its stay within these limits, can achieve 80% accuracy (Smirnov & Shevchenko 2013).More complicated analysis can be done by using Lomb-Scargle periodograms (Lomb 1976;Scargle 1982;Zechmeister & Kürster 2009).However, this method is not perfect too and achieves only 90% accuracy (Smirnov & Dovgalev 2018).Moreover, it requires special knowledge and the implementation of complex algorithms, which is a timeconsuming task.Carruba et al. (2021b) tried to use a neural network to perform such a classification.It required training and fine-tuning, but the overall accuracy was still limited to approximately the same numbers.
In this paper, it is shown that it is possible to develop a fully automatic solution using LLMs with an exceptional accuracy (almost 100% on the example task) without prior knowledge, a testing set, fine-tuning, complex algorithms, and other things requiring direct human involvement.The new model gpt-4vision-preview1 is used to analyze the visual patterns and classify resonant or nonresonant asteroids.

Experiments
Let us start with the formulation of the problem.Given an image of the resonant angle (i.e., Figure 1), the goal is to determine what pattern it has.In other words, the inputs are images and the outputs are labels.No other data, such as a training set or knowledge of algorithms, are required.
There are three possible outcomes: pure libration, which means that the resonant angle librates all the time, circulation, which means that the resonant angle circulates all the time, and the mixed (transient) case, which means that the resonant angle librates for some time2 and circulates for the rest of the time (Smirnov & Shevchenko 2013).
Let us design an experiment to test the capabilities of the LLM in this task.The most well-known and widely used LLM is ChatGPT by OpenAI.It supports images as inputs and can perform a wide range of tasks, including classification.Thus, conducting initial tests with it is a logical step.The goal of such an exercise is to check whether the LLM can differentiate between pure libration, circulation, and the mixed (transient) case.
Preliminary experiments using the online version of ChatGPT (accessed on 2024 February 14) indicate that after some prompt engineering (see the final prompt in Appendix), the LLM can differentiate between pure libration, circulation, and the mixed (transient) case.In five requests, it was able to correctly identify the outcome.Therefore, it is plausible to run a fully automatic experiment with OpenAI's application programming interface (API).
The design of the simulation is as follows: 1.As a test set, let us take (1) 10 asteroids trapped in a pure libration, (2) 10 asteroids trapped in a transient libration, and (3) 10 nonresonant asteroids, but with a semimajor axis close to the resonant value.2. For each asteroid, a resonant angle is calculated, filtered, smoothed, plotted as a time series, and saved as a PNG file.Filtering and smoothing are required to exclude  The results are in Table 1.It contains the classifications results performed by the resonances package (Res.), the author (Hum.), and three consecutive iterations by GPT-4.In total, there are 90 results from the LLM and 30 classifications performed by the resonances package and the author.
From Table 1, it follows that there are three edge cases.The resonant angle of the asteroid 1011 librates from 40,000 to 58,000.Thus, formally, it should be classified as nonresonant because the libration time is low: the package resonances demonstrates exactly this result.However, from visual analysis of the Figure 2(a), it is clear that the asteroid is trapped in the resonance.Therefore, its transient status is plausible.
For the asteroids 13944, it can be classified either as transient or nonresonant.Figure 2(b) shows that the resonant angle librates, but it has only one libration cycle.Therefore, the classification is ambiguous.The same applies for the asteroid 22947 that has only one break of the libration, which can be an artifact of filtering or smoothing.Overall, it is valid to assume that the classification of these asteroids by the LLM is plausible.
In two cases, the LLM classification differs from the human classification: the first iteration of the asteroid 2640 (Hällström) and the third iteration of the asteroid 17925 (Dougweinberg).Notably, the human classification for the latter differs from the one produced by the resonances package: the package identifies this asteroid as transient, whereas the actual libration time is lower than 20,000 yr.However, this does not impact the overall results from the LLM, as the outcomes for the same objects in the other two iterations are correct and hence, the average values are correct too.The resonant angles of these asteroids are shown in Figure 1(e), (h).
The analysis of Figure 1(a) indicates that the resonant angle librates all the time.Thus, the classification of the asteroid 2640 as transient by the LLM is an artifact.Note that the second and the third iterations by GPT-4 return correct results.On the contrary, for the asteroid 17925 (Figure 1(b)), the resonant angle librates from 0 to ≈18,000 yr, then circulates.The behavior is the same as for the asteroid 13944, which also has only one libration period.Therefore, the classification of the asteroid 17925 as transient by the LLM is expectable and plausible in general.
To sum up, if one assumes that edge cases could be classified either way, the LLM is able to differentiate pure libration, circulation, and the mixed (transient) case with accuracy, precision, and recall equal to 100%.
Note that there are two misclassifications, attributable to the probabilistic nature of LLMs and the problem of "hallucinations."For simplicity, one can assume that an LLM can behave like a human being who can make mistakes.There are some techniques targeted at this problem: self-evaluation of the results (Miao et al. 2023), context optimization (Shi et al. 2023), refining and reassessing the results (Shridhar et al. 2023), and rereading the prompt (Rawte et al. 2023;Xu et al. 2023).In this paper, multiple iterations (3-5) were performed for each classification task, and the results were averaged.This approach helped eliminate random artifacts, such as the cases of asteroids 2640, 13944, and 17925.The actual number of the iterations should be studied in each research separately.If the accuracy is still unacceptable, one can explore other techniques to improve accuracy, for example, an ensemble of models, the use of multiple models with different architectures for initial classification or further validation of results, and the use of various prompts.However, for the problem discussed in this  paper, the results are already satisfactory, and no extra steps are deemed necessary.The full code of the simulation is available on GitHub as a Jupyter notebook.3

Discussion and Conclusions
The results of the simulation indicate that the LLM is able to differentiate between pure libration, circulation, and the mixed (transient) case, achieving accuracy, precision, and recall of 100%.This process requires no training, fine-tuning, or coding beyond writing the appropriate prompt in natural language.Furthermore, it is possible to apply this method to other problems requiring human decision-making, completing tasks in just a few minutes without writing any code.
Clearly, the performance metrics achieved in the example above are exceptional.It is reasonable to expect that the LLM may not achieve such high levels on a larger sample, in more complex tasks or edge cases.
The most common patterns of resonant angles are pure librations with a small amplitude for two-body MMRs (Figure 1(c)) and a larger amplitude for three-body MMRs (Figures 1(d)-(f)), pure circulation (Figures 1(a)-(b)), and the mixed cases when the periods of librations are replaced by the periods of circulation (Figures 1(g)-(h)).For the librations, the usual value of the amplitude is less than π, even for three-body MMRs (Figure 1(f); Smirnov & Shevchenko 2013;Smirnov & Markov 2017;Smirnov & Dovgalev 2018).In these cases, the LLM achieves high accuracy and differentiates different behaviors.
Certainly, there might be some edge cases that can affect the overall metrics.The resonance center can drift.However, the tests demonstrate that the provided prompt covers these cases by default because it does not require the notion of the center of libration.There might be a larger amplitude that exceeds the limits of −π and π.In this case, there can be several solutions: (1) increase the limits and reapply the classification, (2) apply filters like a moving average, or (3) adjust the prompt to cover this behavior.If a human being can differentiate the case and formulate the principle as a rule, this rule can be embedded into the prompt.
Another edge case can be related to the undersampling of the libration cycles.If the output image has less than one cycle, the result can be ambiguous.However, for the MMRs in the solar system, the interval of 100,000 covers multiple cycles.For other types of resonances, this might not be the case.However, it can be either covered by a longer integration period or by additional investigation of the real data and accurate specification of the patterns in natural language.
The accuracy depends on the quality of the images uploaded to the LLM.One can imagine that if an image is blurred, the metrics will be lower.This case was not covered in the paper because the design of the experiments and the software used for the classification allowed the production of high-quality images.The size of each image is small (approximately 800 × 200 pixels; 20 Kb).Thus, it does not require significant storage or network resources.
The last type of edge case is the real borderline, when even a human can make a mistake.In this case, the LLM can make a mistake too.However, this is a problem with the methodology or the design of the experiment, not the tool used for the classification.
The proposed method has several disadvantages.The first one is that it is not robust.LLMs can "hallucinate," providing incorrect outputs (Rawte et al. 2023).However, this can be mitigated by (1) performing multiple iterations and averaging the results, and (2) employing various models.
The second disadvantage is the cost associated with LLMs.In this paper, GPT-4 is used, which is proprietary.This can be a problem for some studies when researchers have to classify millions of items.However, there are open-source alternatives -the models, such as llama or mistral, that have acceptable quality and could be used for a similar task.
Another factor to consider is the amount of computational time required to perform the classification.In fact, for the experiment with ChatGPT, the time required for classification is negligible-a few seconds.Considering that it is possible to classify multiple images in parallel (this simply requires making multiple API requests), the overall time for classifying a large data set is the same or almost the same.However, this depends on the availability of this LLM, which cannot be guaranteed.
As mentioned earlier, it is possible to use open-source models.They require significant computational resources for the initial training (especially in terms of GPU units).However, a scientist can utilize already pretrained models, such as llama, llava, kosmos-2, or mistral. 4The classification (inference) using a pretrained model is fast.For example, the speed of the LLM llama (7B) is 36.6 tokens per second on a single A800 GPU unit (Zhao et al. 2023). 5The average speed of other models is quite similar, varying depending on the model, its size and dimensions, the training set, the hardware, the prompt used, and other parameters.
However, what is truly important is that there is no difference in the type of problem to be resolved.In other words, it does not matter which pattern should be recognized and how complex it is-an LLM takes an image as input and processes the text.The size of the image and the length of the text affect the speed, whereas the content does not.Note that the classical ML approach may require multiple preprocessing steps, such as smoothing or filtering, before the actual analysis, which is unnecessary for an LLM, potentially boosting the overall speed of classification.
of the OY axis are -pi and pi.The resonant angle cannot exceed these limits.
It is known that if the resonant angle librates, then the asteroid is trapped in the resonance.Librations mean oscillations, like sine.It means that the curve is within some limits (i.e., +2, or +1) and does not come close to the borders (-pi and pi).
The opposite situation is when the resonant angle circulates.It means that the curve is not limited and can reach the borders of the plot.In our case, if the resonant angle is greater than pi or less than -pi, then we add or subtract 2pi to the resonant angle to make it within the limits.Therefore, in the case of circulation, the pattern will be like linear curves parallel to each other.
I want you to assess visually whether the resonant angle librates if you were a human looking at this image.There are three possible cases: 1.The resonant angle librates all the time (from 0 to 100,000).Then you should reply "pure." 2. The resonant angle could librate some significant time, but at other times it circulates.Let's assume that by significant I mean 20,000 yr.In this case, you should write 'transient'.
3. Otherwise, when the resonant angle circulates most of the time, please write "nonresonant." As an output, I want you only to print one word: pure, transient, or nonresonant.If you are not sure, write 'I do not know.'You will get tips if you perform the identification correctly.

Note.
The columns are as follows: Ast.-asteroid number; Res.-the result of the classification performed by the resonances package; Hum.-result of the classification by the author; LLM, LLM2, LLM3-three consequent results of the classification by the LLM.The results are as follows: p-pure libration; t -transient libration; n-nonresonant.Edge cases are marked with a slash (i.e., t/n: either transient or nonresonant).The bold font indicates the cases when the LLM classification differs from the human classification or previous results.The italic font indicates the cases when the LLM classification differs from the human classification.

Table 1
The Results of the Classification