Machine learning to detect, stage and classify diseases and their symptoms based on inertial sensor data: a mapping review

This article presents a systematic review aimed at mapping the literature published in the last decade on the use of machine learning (ML) for clinical decision-making through wearable inertial sensors. The review aims to analyze the trends, perspectives, strengths, and limitations of current literature in integrating ML and inertial measurements for clinical applications. The review process involved defining four research questions and applying four relevance assessment indicators to filter the search results, providing insights into the pathologies studied, technologies and setups used, data processing schemes, ML techniques applied, and their clinical impact. When combined with ML techniques, inertial measurement units (IMUs) have primarily been utilized to detect and classify diseases and their associated motor symptoms. They have also been used to monitor changes in movement patterns associated with the presence, severity, and progression of pathology across a diverse range of clinical conditions. ML models trained with IMU data have shown potential in improving patient care by objectively classifying and predicting motor symptoms, often with a minimally encumbering setup. The findings contribute to understanding the current state of ML integration with wearable inertial sensors in clinical practice and identify future research directions. Despite the widespread adoption of these technologies and techniques in clinical applications, there is still a need to translate them into routine clinical practice. This underscores the importance of fostering a closer collaboration between technological experts and professionals in the medical field.


Introduction
A wide range of diseases across various specialisation areas can cause motor symptoms. According to the International Classification of Diseases (ICD-10), many disorders affecting the nervous system are grouped under the term 'Movement disorders' (World Health Organization 1992). This suggests that the information needed to detect pathologies in these areas can be inferred directly from motor manifestations. Changes in movement-related functions include, but are not limited to, altered joint mobility or muscle strength. These may arise as a byproduct of diseases affecting other systems (e.g. the nervous or the cardiovascular system) or of syndromes associated with cognitive or behavioral disturbances. Consequently, motor symptoms serve as established indicators in diagnostic procedures across many specialties.

Use of technology for motor symptoms detection and classification
For over sixty years, technology has been an integral part of determining motor symptoms (Dierssen et al 1961). The new century has seen a significant increase in the availability and use of devices able to collect such information, driven by miniaturization, reduced energy consumption for data recording and processing, and advances in wireless communications. Together, these developments have removed most of the barriers to capturing data in natural environments (Vijayan et al 2021).
In this context, the use of inertial measurement units (IMUs) for clinical applications is a relatively recent development (Morris and Paradiso 2003). Thanks to miniaturisation and technological advances, IMUs have become small and light enough to be integrated into wearable devices for clinical settings. IMUs have been employed to detect and classify tremor in patients with neurological disorders that affect movement. They can also monitor changes in gait patterns over time, which is valuable for assessing the effectiveness of treatments for mobility impairments resulting from conditions such as stroke, Parkinson's disease or spinal cord injury. In addition, IMUs can be worn with minimal encumbrance for extended periods. This enables long-term monitoring of motor symptoms and gait patterns, providing valuable insight into disease progression and response to treatment.

Plugging in machine learning
Advances in computational power have made it possible to integrate processing functions at the system level, including machine learning (Haick and Tang 2021). In particular, machine learning algorithms trained with IMU data have shown promising capabilities in detecting and classifying specific motor symptoms associated with different diseases (Patel et al 2009). This approach has the potential to enhance patient care by providing more accurate and objective assessments of motor symptoms, thereby enabling the tracking of disease progression and supporting treatment decisions (Lim et al 2022). In addition, collecting data with wearable IMUs enables continuous monitoring of patients, offering a more comprehensive picture of their symptoms over time.
While machine learning has the potential to greatly improve the detection of diseases and the classification or prediction of symptoms, there are also several limitations and challenges to consider (Kubota et al 2016). One key challenge is the need for large and diverse datasets to train and validate the algorithms. Inertial sensor data can be highly variable, and it can be difficult to collect enough representative data to cover all the variations that may be encountered in real-world scenarios. In addition, the accuracy of the algorithms can be affected by factors such as sensor placement, device performance, and the presence of noise or interference in the data. Ethical and privacy concerns must also be considered when collecting and analyzing sensitive health data, and the technology must be used in a responsible and transparent manner.

Aim of the review
This work aims to systematically review the literature published since 2012 on the use of machine learning to support clinical decision-making through the analysis of data from wearable inertial sensors, either alone or in combination with other sources. By quantitatively analyzing a set of selected papers through the lens of four research questions, the review aims to uncover the trends, perspectives, strengths and limitations within the extensive body of literature on the use of machine learning in clinical practice. Given the observed diversity of application fields and decision-making objectives of the included studies, we chose not to analyze the numerical performance of the different solutions adopted.

Methods
The existing literature on the topic was analyzed by formulating four key research questions that define the main instrumental and clinical aspects of previous research. Additionally, four pertinence indicators were defined and used to guide the inclusion and exclusion criteria.

Research questions
The following research questions aim to summarize the current status of the research advancements in integrating machine learning and inertial measurements for clinical applications.

RQ1-which pathologies, which movements?
This research question focuses on characterizing the pathologies typically studied with IMUs. Alongside the clinical conditions, it explores the movements under analysis and the experimental protocols adopted.

RQ2-which technologies and setups, which protocols?
The second research question investigates the technologies most commonly used in the literature. It examines the type and number of IMUs used, as well as any additional measurements from other sensors. The protocol for sensor placement is also analysed.

RQ3-which data processing schemes, and which features?
This third question delves into the data processing workflow applied to the raw inertial data before the relevant features are fed into the ML model. This includes all steps from data conditioning (e.g. denoising filters) to dimensionality reduction (e.g. feature extraction and feature selection).

RQ4-which machine learning techniques, which outcomes?
The last research question investigates the ML techniques employed for data analysis. It describes the techniques used and examines the prediction objectives, including whether different techniques or training strategies were compared.

Searching strategy

Chosen databases
For the systematic review, three databases were queried, namely Elsevier Scopus, Clarivate Web of Science and PubMed.

Boolean expression
The query string, in its PubMed form, is reported in the following. This query was converted into the corresponding form for each of the other databases and run on September 14, 2022.
The main principle of the querying strategy was to include all research papers whose title or abstract combined the concepts of machine learning, pathology, classification and/or staging, and IMUs at large. The three result lists were then merged using the DOI as key, so that an article was retained for the next phase if it appeared in at least one of the three lists.

Inclusion and exclusion criteria
Additional filters were then applied to the merged list, as follows. Articles whose document type was review, book chapter or conference paper were removed from subsequent analyses, so as to retain only complete reports of original research. Results to be excluded were identified through the metadata fields of the query results (e.g. the document type field in the Scopus results); the remaining results were then checked manually to remove any article that met the exclusion criteria, as well as any duplicates.
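The merging and screening steps described above can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: it assumes that each database export has already been parsed into dictionaries with hypothetical `doi` and `doc_type` fields, and the document-type labels are assumptions rather than the actual vocabularies of Scopus, Web of Science or PubMed.

```python
from typing import Iterable

# Excluded document types (labels are illustrative assumptions)
EXCLUDED_TYPES = {"review", "book chapter", "conference paper"}

def merge_by_doi(*result_lists: Iterable[dict]) -> dict[str, dict]:
    """Union of several result lists keyed by normalized DOI.

    A record is kept if it appears in at least one list; later records with
    the same DOI are treated as duplicates. Records without a DOI are skipped
    here and would need a separate, manual check.
    """
    merged: dict[str, dict] = {}
    for results in result_lists:
        for record in results:
            doi = record.get("doi", "").strip().lower()
            if doi and doi not in merged:
                merged[doi] = record
    return merged

def apply_exclusions(merged: dict[str, dict]) -> list[dict]:
    """Drop records whose document type is in the exclusion set."""
    return [
        rec for rec in merged.values()
        if rec.get("doc_type", "").strip().lower() not in EXCLUDED_TYPES
    ]

# Tiny, made-up result lists (hypothetical field names and DOIs)
scopus = [{"doi": "10.1000/A", "doc_type": "Article"},
          {"doi": "10.1000/B", "doc_type": "Review"}]
wos = [{"doi": "10.1000/a", "doc_type": "Article"}]   # same article as in scopus
pubmed = [{"doi": "10.1000/C", "doc_type": "Article"}]

included = apply_exclusions(merge_by_doi(scopus, wos, pubmed))
print([rec["doi"] for rec in included])
```

Normalizing the DOI (trimming and lower-casing) before using it as a key avoids counting the same article twice when databases format it differently.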

Pertinence assessment indicators
To further refine the search results, a set of four pertinence assessment indicators (PAI) was defined. While these indicators do not directly reflect the quality of a study, they were introduced to quantify the relevance of each article within the scope of the review. Each article was assigned an integer value for each PAI, resulting in an ordered set of pertinence classes aligned with the four research questions described in the previous section. All included papers were evaluated on these indicators by at least two authors of this review, according to the rules given below. Where an indicator did not yield a unanimous result, the evaluation was discussed among all the authors.

PAI 1 - dataset characteristics, based on the sample size, in terms of the number of patients involved in the study, and on the presence of an adequate control group. As this review focuses on clinical applications, the absence of a patient sample led directly to exclusion from subsequent analysis. Levels were thus defined as follows:
0-absence of any patient group.
1-small number of patients (N < 10) and no control group.
2-small number of patients (N < 10) with a control group.
3-substantial group of patients (10 ≤ N < 30) and no control group.
4-any of the following:
- substantial group of patients (10 ≤ N < 30) with a control group;
- high number of patients (N ≥ 30).
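The PAI 1 levels above amount to a small decision table on the number of patients N and on the presence of a control group. A minimal sketch, assuming N counts patients only (controls excluded) and that the control group is recorded as a simple yes/no flag; the function name is hypothetical:

```python
def pai1_level(n_patients: int, has_control_group: bool) -> int:
    """PAI 1 (dataset characteristics) following the level definitions above."""
    if n_patients <= 0:
        return 0                                # no patient group: study excluded
    if n_patients < 10:
        return 2 if has_control_group else 1    # small patient group
    if n_patients < 30:
        return 4 if has_control_group else 3    # substantial patient group
    return 4                                    # N >= 30, with or without controls

print(pai1_level(15, has_control_group=False))  # 3
```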
PAI 2 - clinical impact, based on the clinical relevance and impact of the study. If the classification technique was not instrumental to the diagnosis, staging or classification of the pathology, the study was excluded from further analysis. Levels were defined as follows:
0-no classification or detection of clinical interest.
1-indirect use of the data/features to capture elements of low clinical relevance.
2-indirect use of the data/features to detect or classify pathologies.
3-direct use of the data for the classification or detection of pathologies, without connection to clinical scales.
4-direct use of the data for the classification or detection of pathologies, with connection to clinical scales.

PAI 3 - use and readiness level of IMUs, based on the role that the technology plays in the study and on the readiness level of the technology used. This element was considered an indirect indicator of the study's replicability. If no inertial technology was used in the study, it was excluded from subsequent analysis. Levels were defined as follows:
0-absence of any IMU sensor/technology.
1-presence of at least one IMU sensor, without a thorough description of the protocol.
2-presence of at least one IMU sensor with a low technology readiness level (TRL).
3-presence of at least one IMU sensor with a high TRL (e.g. a commercial sensor), used in conjunction with other sensors.
4-presence of at least one IMU sensor with a high TRL, without the use of additional sensors.

PAI 4 - impact and role of ML, based on the importance of the machine learning techniques for the final results. If no ML was used in the analysis pipeline, the study was excluded from further evaluation. Levels were defined as follows:
0-absence of any ML technique.
1-single ML technique that is only partly related to the results (e.g. used only for feature selection/extraction).
2-single ML technique with a direct effect on the clinically relevant results.
3-comparison or composition of different ML techniques for predicting clinically relevant information.
The pertinence indicators for each paper were determined from the independent evaluations, or from consensus values when needed. A study was then included in the further review process if both of the following conditions were met:
(i) every PAI_i was greater than zero;
(ii) the total sum of the four PAI_i was above a predefined threshold.
Manuscripts close to the decision boundary for the second condition (i.e. a total sum of the PAI_i equal to 10 or 11) were discussed by all the authors to confirm their inclusion in or exclusion from the analysis.
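The two-stage screening on the PAIs can be sketched as below: any zero indicator excludes an article, borderline totals of 10 or 11 are referred to all the authors, and other totals are compared with a threshold. The numerical threshold is left as a parameter, and the `PaiScores` container and `screen` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PaiScores:
    """Consensus PAI values for one article (hypothetical container)."""
    pai1: int  # dataset characteristics (0-4)
    pai2: int  # clinical impact (0-4)
    pai3: int  # use and readiness level of IMUs (0-4)
    pai4: int  # impact and role of ML (levels 0-3 as listed)

    def values(self) -> tuple[int, int, int, int]:
        return (self.pai1, self.pai2, self.pai3, self.pai4)

def screen(scores: PaiScores, threshold: int,
           borderline: frozenset = frozenset({10, 11})) -> str:
    """Return 'excluded', 'discuss' or 'included' for one article."""
    values = scores.values()
    if min(values) == 0:
        return "excluded"   # first condition: any zero PAI excludes the article
    total = sum(values)
    if total in borderline:
        return "discuss"    # close to the decision boundary: joint review
    return "included" if total > threshold else "excluded"

print(screen(PaiScores(4, 4, 4, 3), threshold=10))  # included (total 15)
```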

Manuscript analysis
The articles selected in Phase I were then analysed with respect to the different research questions. Each manuscript was reviewed by at least one author, and the main takeaways were sorted into four classes, corresponding to the four research questions. Given the inclusion criteria, and since every included manuscript fulfilled the first condition on the PAI_i, each manuscript provided at least one key finding for the RQs. Following these analyses, the answers to the RQs derived from the articles were discussed and summarized by all the authors.

Results
This section is organized as follows. The first subsection presents the analysis of the studies resulting from the search query (Phase I), including the total number of studies identified, the general characteristics of those meeting the inclusion criteria on the individual PAIs, and those of the studies that exceeded the overall PAI threshold. Section 3.2 focuses on the individual PAIs of the included studies, highlighting the distribution of the values assigned to each indicator. Section 3.3 analyses the studies that exceeded the threshold, with specific elements for each RQ.

Main characteristics of the phase I studies
A total of 825 studies resulted from the query (see figure 1 for details on the included studies). Of these, 86 were excluded because they did not fall into the research article category or did not appear in a journal. Of the remaining 739, a further 522 were excluded because at least one of their PAI values was zero. The total PAI score was thus calculated for the remaining 217 research articles. Figure 2 shows the overall PAI score for this group: 146 articles scored above the threshold and underwent Phase II of the analysis.

On PAI 1
The vast majority (more than 70%) of the above-threshold studies included a sample size that was relevant for clinical classification purposes. A few studies were admitted to Phase II even though they obtained data from a small sample (e.g. Mostafa et al 2021).

On PAI 2
Most studies admitted to Phase II made direct use of the data for the classification, detection or staging of a disease or of a symptom associated with it (aggregated percentage of 88% for PAI_2 ≥ 3), with more than one third of them making direct reference to clinically agreed scales (e.g.

Analysis of Phase II studies
Around three quarters of the papers were published in journals associated (also) with the Medicine subject area in Scopus SJR. Approximately half of the publications appeared in journals associated with Engineering and Computer Science, highlighting the inherently interdisciplinary nature of the topic. Other represented areas included Health Professions (approximately 22%) and Neuroscience (19%). In terms of specific categories within each subject area, and disregarding miscellaneous ones (i.e., for each subject area, the category that groups topics that cannot be assigned to a specific one), Electrical and Electronic Engineering and Health Informatics were the most common, accounting for 30% and 25% of the papers, respectively. An interesting observation regarding the Medicine area is the absence of any specific clinical specialty among the most prevalent categories: only Clinical Neurology and Rehabilitation appeared, in around 8% and 7% of the papers, respectively. Notably, most papers published in journals with Medicine as one of their subject areas were categorized under the miscellaneous group.

RQ1-pathologies and movements
Regarding diseases, all the studies pertained to one of a set of classes defined according to the main affected system or organ and grouped according to ICD-10. Among the identified categories, more than two thirds of the studies targeted neurological conditions, and more than half of them were related to Parkinson's disease. Figure 3 shows the distribution of the studies across the categories, together with a detailed view of the specific clinical conditions studied.
Regarding the analyzed task, all the studies were conducted under experimental conditions that pertained to one of the following categories: Gait, Upper Limb (UL), Lower Limb (LL), Posture, Rest, Sit-To-Stand (STS), and Activities of Daily Living (ADL). More than one third of the studies involved gait, and approximately the same share involved activities of daily living. Non-functional exercises were less represented.

RQ3-data processing and features
The selected studies mostly relied on various sets of features to describe the data gathered from the IMU sensors compactly; a minor part of the studies skipped this step (Mannini et al 2016, Camps et al 2018), as it was implicitly performed by the machine learning stage (as in the case of deep ML models). When feature extraction was present, the extracted features were fed to the ML models described in section 3.3.4. Before feature extraction, the vast majority (more than 65%) of the studies that provided details on the processing pipeline applied a standard denoising scheme to preserve relevant information, mostly based on low-pass filtering with task-dependent cut-off frequencies tuned to the frequency content of the recorded signals. In around three quarters of the studies that provided details on this step, feature sets were then extracted from fixed-length signal segments through a dedicated windowing procedure. A minor, yet not negligible, share of the pipelines (around 20%) provided details on the dimensionality reduction criteria used for feature selection (see e.g. Jeon et al 2017, Shawen et al 2020). Regardless of the standard pre-processing steps, the extracted feature sets were assigned to one of the following categories:
• Aspecific: general features that statistically describe the signal, either in the time domain or in transformed domains, such as amplitude or spectral distribution parameters, without a specific link to the characteristics of the studied pathology, task, or type of signal.
• Handcrafted: features defined to capture fundamental or relevant aspects of the targeted pathology, of the motor task, or of the recorded signal (e.g. power in the tremor frequency bands for PD, or spatiotemporal gait parameters for walking tasks).
• Raw: the denoised raw version of the signal is used as input to the deep ML model.
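A typical pre-processing and feature extraction chain of the kind described above (low-pass denoising, fixed-length windowing, then aspecific and handcrafted features) can be sketched with NumPy and SciPy. The cut-off frequency, window length, overlap and tremor band below are illustrative assumptions, not values reported by the reviewed studies, and the recording is synthetic.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass(signal, fs, cutoff_hz, order=4):
    """Zero-phase Butterworth low-pass filter applied along the time axis."""
    b, a = butter(order, cutoff_hz, btype="low", fs=fs)
    return filtfilt(b, a, signal, axis=0)

def sliding_windows(signal, fs, win_s, overlap=0.5):
    """Split a (samples x channels) array into fixed-length, overlapping windows."""
    win = int(round(win_s * fs))
    step = max(1, int(round(win * (1 - overlap))))
    starts = range(0, signal.shape[0] - win + 1, step)
    return np.stack([signal[s:s + win] for s in starts])

def _power_spectrum(ch, fs):
    freqs = np.fft.rfftfreq(ch.size, d=1 / fs)
    return freqs, np.abs(np.fft.rfft(ch - ch.mean())) ** 2

def aspecific_features(window, fs):
    """Generic amplitude and spectral features for each channel of one window."""
    feats = []
    for ch in window.T:
        freqs, spec = _power_spectrum(ch, fs)
        feats += [
            ch.mean(), ch.std(), np.ptp(ch), np.max(np.abs(ch)),
            freqs[np.argmax(spec)],                          # dominant frequency
            np.sum(freqs * spec) / (np.sum(spec) + 1e-12),   # mean frequency
        ]
    return np.array(feats)

def band_power_ratio(ch, fs, band=(3.0, 8.0)):
    """Handcrafted example: share of power in an assumed tremor band."""
    freqs, spec = _power_spectrum(ch, fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return spec[mask].sum() / (spec.sum() + 1e-12)

# Synthetic 10 s, 3-axis recording at 100 Hz with a 5 Hz tremor-like component
fs = 100.0
t = np.arange(1000) / fs
rng = np.random.default_rng(0)
raw = np.column_stack([np.sin(2 * np.pi * 5 * t),
                       0.5 * np.sin(2 * np.pi * 5 * t),
                       np.zeros_like(t)])
raw += rng.normal(0, 0.1, raw.shape)

windows = sliding_windows(lowpass(raw, fs, cutoff_hz=20.0), fs, win_s=2.0)
X = np.array([aspecific_features(w, fs) for w in windows])   # one row per window
tremor = np.array([band_power_ratio(w[:, 0], fs) for w in windows])
```

Each row of `X` can then be passed to a classifier; the "Raw" category would instead feed `windows` directly to a deep model.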
The selected studies used aspecific (43%) and handcrafted (53%) feature sets in comparable proportions, with a few (6%) using a mix of the two. Concerning the specific nature of the extracted features, focusing on the most common scenario in the literature (i.e. the classification of Parkinson's disease or its symptoms) reveals no clear trend in the type of information captured from IMU data; rather, studies tend to extract a large set of heterogeneous features. In particular, more than 60% of the studies extracted amplitude-based features (such as ranges, averages, variability parameters, peak values) in the time domain or in the frequency domain (mean frequency, energy in specific spectral ranges, ...), and more than 20% also included indicators of complex dynamics (such as sample entropy or approximate entropy). Only a small share (less than 25% of the studies) restricted their features to a single domain.
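Among the indicators of complex dynamics mentioned above, sample entropy can be computed directly from a signal window. The sketch below follows the common definition SampEn(m, r) = -ln(A/B), where B and A count pairs of templates of length m and m + 1 whose Chebyshev distance is within the tolerance r, excluding self-matches. The defaults m = 2 and r = 0.2 times the standard deviation are conventional choices assumed here, not parameters taken from the reviewed studies.

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Sample entropy of a 1-D signal (naive O(N^2) implementation)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if r is None:
        r = 0.2 * np.std(x)

    def count_matching_pairs(length):
        # Use the same N - m starting points for lengths m and m + 1
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            # Chebyshev distance to all later templates (no self-matches)
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= r))
        return count

    b = count_matching_pairs(m)
    a = count_matching_pairs(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined: no matching templates found
    return float(-np.log(a / b))

rng = np.random.default_rng(0)
regular = np.sin(2 * np.pi * np.arange(500) / 50)   # periodic, predictable
irregular = rng.normal(size=500)                     # white noise
print(sample_entropy(regular), sample_entropy(irregular))
```

Lower values indicate a more regular, predictable signal, so the periodic sine wave is expected to score well below the white-noise sequence.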

RQ4-machine learning techniques and classification outcomes
Starting from the feature sets described in section 3.3.3, the ML techniques adopted in the selected studies addressed either a prediction or a classification problem (see figure 5). The final aim of the ML models with reference to the disease described in section 3. In one interesting case, performance was evaluated across different environmental conditions entailing the same tasks, i.e. in the real world as compared to the lab (Rehman et al 2022). As for the ML techniques used, all the studies adopted at least one method belonging to one of the following categories:
• ANN: all kinds of non-deep artificial neural networks (Oung et al 2017, Hssayeni et al 2019), including adaptive neuro-fuzzy inference systems (ANFIS).
The right panel of figure 6 shows the relative proportion of the adopted ML techniques. Vector machines and ensemble learning models emerged as the most used ML techniques.
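Since vector machines and ensemble learning models were the most frequently adopted techniques, and many studies compared several models, a comparison of this kind can be sketched with scikit-learn. The data below are synthetic, and the subject-wise cross-validation (`GroupKFold`, so that windows from the same subject never appear in both the training and the test fold) is an assumed design choice for windowed IMU data, not a procedure attributed to the reviewed studies.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def compare_models(X, y, groups, n_splits=5):
    """Subject-wise cross-validated accuracy of an SVM and a random forest."""
    models = {
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    }
    cv = GroupKFold(n_splits=n_splits)
    return {name: cross_val_score(model, X, y, groups=groups, cv=cv,
                                  scoring="accuracy")
            for name, model in models.items()}

# Synthetic feature matrix: 20 subjects (10 controls, 10 patients), 30 windows each
rng = np.random.default_rng(0)
n_subjects, windows_per_subject, n_features = 20, 30, 6
subject_label = np.repeat([0, 1], n_subjects // 2)
groups = np.repeat(np.arange(n_subjects), windows_per_subject)
y = np.repeat(subject_label, windows_per_subject)
X = rng.normal(size=(y.size, n_features)) + 1.5 * y[:, None]  # patients shifted

scores = compare_models(X, y, groups)
for name, s in scores.items():
    print(f"{name}: mean accuracy {s.mean():.2f} (+/- {s.std():.2f})")
```

Splitting by subject rather than by window gives a more realistic estimate of how a model would perform on a new patient.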
Regarding the specific objective of the ML models, around 42% of the studies aimed at the classification of a disease (e.

General discussion
The majority of studies focus on monitoring Parkinson's disease, highlighting the potential of combining non-intrusive wearable technologies with machine learning for staging and symptom detection. This focus can be attributed to the well-defined motor symptoms associated with Parkinson's disease, which make it a suitable target for analysis with wearable devices. However, relatively few studies investigate other high-incidence diseases affecting the nervous or neuromuscular system, such as stroke, Alzheimer's disease or muscular dystrophies, despite the presence of motor symptoms in these conditions. This scarcity may be mainly attributed to the inherent variability of their motor symptoms, clinical course and progression, which can make the practical use of wearable technologies combined with ML more challenging for these conditions. Interestingly, some studies have identified psychiatric disorders as an area of interest, suggesting that these technologies may also capture higher-level behavioral information beyond motor tasks. The inclusion of psychiatric disorders in the research focus underscores the broad applicability of wearable devices beyond traditional neurological conditions, expanding their potential impact on understanding and managing a wider range of disorders.
Regarding the use of sensors, more than half of the studies included in the review used only one IMU. This suggests that a single-sensor setup, combined with the selected machine learning algorithms, is often sufficient to obtain satisfactory results. This preference for a simple and practical setup is particularly relevant in clinical contexts and with patients, where the usability of the system plays a fundamental role. In terms of the technologies adopted and their readiness level, a significant proportion of the studies used off-the-shelf products incorporating IMUs. This indicates that commercially available sensors, which can easily be integrated into wearable devices such as smartwatches, wristbands or smartphones, are generally considered reliable and preferable thanks to their convenience and simple setup. Although the placement of IMUs varied among the studies, there was a tendency towards simple configurations: approximately one third of the studies placed the IMU on the upper limb, and one fifth on the trunk only. In contrast, only a small share of the studies placed IMUs at multiple locations across the body, suggesting that more complex configurations, including multiple sensors or custom solutions, were not perceived as adding significant information for the implemented ML techniques. In summary, the technological findings indicate a preference for a simple setup, involving a small number of commercially available sensors placed on the upper part of the body, an approach considered practical and effective for plugging in ML techniques.
For the processing pipeline, the feature extraction methods showed common choices, both for aspecific features, which have no direct physiological meaning, and for handcrafted features, which in most cases represent standard parameters typically investigated in clinical research, such as the spatiotemporal parameters of gait. Other processing steps in the feature extraction, when present, involve denoising filters and unsupervised feature selection and reduction. When advanced deep learning techniques were employed, feature extraction was embedded within the machine learning model itself. Given the large number of data points that IMU sensors can record over an extended period of time, the engineering of ad-hoc features was found to be less critical. Only half of the analysed studies used engineered features, often because the ML model was integrated into a processing pipeline derived from clinical research, which includes e.g. temporal segmentation and parameter calculation. Rather, researchers have leveraged the wealth of data captured by IMUs and incorporated established feature extraction methods, both generic and tailored to specific clinical parameters. This approach allows efficient analysis and interpretation of the sensor data, with the ML models playing a central role in exploiting the information contained in the features. Moreover, current technologies make it easy to process a large number of features; accordingly, most of the choices found in this review follow a conservative strategy, in which the ML model works on many different features describing most characteristics of the incoming signal, without requiring additional a priori knowledge or computational complexity. This trend is expected to become even more pronounced in the future, given the increasing popularity of deep learning pipelines, in which feature extraction can be integrated into the ML model.
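Unsupervised feature selection, as mentioned above, can be as simple as discarding near-constant features and then greedily dropping features that are highly correlated with one already retained. The sketch below shows one such filter; the variance and correlation thresholds are illustrative assumptions, and other unsupervised reductions (e.g. principal component analysis) are equally common.

```python
import numpy as np

def unsupervised_filter(X, var_eps=1e-8, max_abs_corr=0.95):
    """Return the indices of the columns of X kept by a variance/correlation filter.

    No class labels are used: columns are scanned left to right, near-constant
    ones are dropped, and a column is kept only if its absolute Pearson
    correlation with every previously kept column is below max_abs_corr.
    """
    variances = X.var(axis=0)
    kept = []
    for j in range(X.shape[1]):
        if variances[j] <= var_eps:
            continue  # near-constant feature carries no information
        corrs = [abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) for k in kept]
        if all(c < max_abs_corr for c in corrs):
            kept.append(j)
    return kept

rng = np.random.default_rng(1)
a, b = rng.normal(size=200), rng.normal(size=200)
X = np.column_stack([a,                                  # informative feature
                     2 * a + 0.01 * rng.normal(size=200),  # near-duplicate of a
                     np.ones(200),                       # constant feature
                     b])                                 # independent feature
print(unsupervised_filter(X))
```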
In terms of the problem ML seeks to solve, there is no clear tendency towards a specific use: classification and regression techniques are applied equally to specific symptoms and to general aspects of the pathology. This indicates that the current availability and variety of ML techniques allow them to be applied flexibly across different clinical scenarios. ML methods can be adapted to specific clinical needs, highlighting the versatility of the data recorded from IMUs, which mainly consist of features derived from acceleration and angular rate time series. Regarding the adopted ML technique, deep learning models were, at the time of the query, still infrequently used. This suggests that, for data with this structure (i.e. time series from a few channels), the computational burden of DL was not considered worthwhile, even for large datasets. Most studies employed multiple ML approaches and compared their results. Among the individual ML techniques, there is a clear preference for ensemble learning models, although vector machines remain popular. This indicates that, in the specific field analyzed in this review, there is no consensus or prior knowledge on which technique may outperform the others. Although beyond the aims of the present systematic review, it is worth noting that the variety of approaches in this field opens up opportunities for disease-specific meta-analyses focusing on the quantitative performance of individual ML techniques.
To uncover potential relationships among the results obtained for the different research questions, an association rule analysis was conducted to identify the most frequent co-occurrences. As shown in figure 7, the use of machine learning for the diagnosis and classification of nervous system diseases is often linked to the absence of additional sensors. This suggests that the information conveyed by inertial sensors alone is deemed sufficient for such classification or regression tasks. Secondly, when gait is chosen as the task for classification, researchers tend to use inertial sensors placed on the trunk without additional sensors. This implies that trunk-mounted inertial sensors offer valuable information for gait analysis and classification without requiring other sensor modalities. Lastly, in terms of processing techniques, feature extraction typically follows temporal segmentation and/or filtering/denoising. This indicates that researchers often apply these as preprocessing steps to refine the data before extracting relevant features for machine learning.
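The association rule analysis can be illustrated with the standard support, confidence and lift measures, computed here for single-item rules over per-study annotation sets. The annotation labels and the five toy records below are invented for illustration and do not reproduce the review's data; the support and confidence thresholds are likewise assumptions. Dedicated libraries implement algorithms such as Apriori for larger itemsets.

```python
from itertools import combinations

def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = frozenset(antecedent), frozenset(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_c = sum(1 for t in transactions if c <= t)
    n_ac = sum(1 for t in transactions if (a | c) <= t)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift

def mine_pair_rules(transactions, min_support=0.3, min_confidence=0.7):
    """All single-item rules x -> y passing both thresholds, strongest first."""
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in combinations(items, 2):
        for ante, cons in ((x, y), (y, x)):
            s, conf, lift = rule_metrics(transactions, {ante}, {cons})
            if s >= min_support and conf >= min_confidence:
                rules.append((ante, cons, s, conf, lift))
    return sorted(rules, key=lambda r: (-r[3], -r[2]))

# Toy annotation sets, one per hypothetical study
studies = [
    {"nervous_system", "imu_only", "gait", "trunk"},
    {"nervous_system", "imu_only", "adl", "wrist"},
    {"nervous_system", "imu_only", "gait", "trunk"},
    {"musculoskeletal", "extra_sensors", "gait", "shank"},
    {"nervous_system", "imu_only", "rest", "wrist"},
]
for ante, cons, s, conf, lift in mine_pair_rules(studies):
    print(f"{ante} -> {cons}: support={s:.2f}, confidence={conf:.2f}, lift={lift:.2f}")
```

Confidence measures how often the consequent appears when the antecedent does, while a lift above 1 indicates that the two co-occur more often than expected if they were independent.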
The studies included in the analysis were categorized based on the subject areas of the journals in which they were published.Using the Scimago Journal and Country Rank (SJR), a total of 15 subject areas were identified, as shown in figure 8. Surprisingly, despite the focus of the analyzed studies on the application of machine learning techniques for detecting or staging pathologies or symptoms, the majority of the articles were published in journals primarily associated with the scientific or technical field.In contrast, only a small percentage of the articles were published in journals related to the clinical field.This finding suggests that while machine learning techniques have garnered significant attention in clinical applications across various specialty areas, there are still barriers that need to be addressed to facilitate their widespread adoption in clinical practice.

Conclusions
The integration of machine learning algorithms with lightweight inertial sensors has emerged as an important topic in the current scientific literature. The outcomes of this review show that, for a variety of pathologies, these technologies can have a wide impact on the detection and staging of patients' conditions. It is, however, important to acknowledge that a solution universally accepted in clinical practice has yet to be fully developed. Nevertheless, the vast majority of the studies analysed highlight that combining a wearable sensor, often embedded directly in consumer devices (e.g. smartphones, wristbands), with techniques that learn from experience can be a powerful tool for extracting information from the patient's behaviour during activities of daily living, with little or no influence on freedom of movement and quality of life. Moving forward, further interdisciplinary collaboration between researchers, clinicians, and technology developers is essential to address the remaining challenges and bridge the gap between scientific advancements and practical clinical applications.

Data availability statement
No new data were created or analysed in this study.

Figure 1. Article type of the query results.

Figure 2. Left diagrams: histogram of PAI_i values for the above-threshold papers; right diagram: overall PAI score for papers with nonzero values, with papers whose overall score is above the threshold highlighted in blue.
g. Aich et al 2018, Cannière et al 2020, Trabassi et al 2022), curved walking, turning (Rehman et al 2020, Pardoel et al 2021) and treadmill walking (Romijnders et al 2022);
• Upper Limb (UL): any form of non-functional exercise that specifically targeted movements of the upper limb or hands (see e.g. Huo et al 2020, Kim et al 2020, Park et al 2021);
• Lower Limb (LL): any form of non-functional exercise that specifically targeted movements of the feet (Rovini et al 2018) or of the lower limbs in general (Rovini et al 2020);
• Posture: any task that requires maintaining a specific posture of the whole body (e.g. standing balance (Memar et al 2017, Sotirakis et al 2022)) or of a body part (e.g. keeping the arms extended in front of the body (Xing et al 2022));
• Rest: any condition in which the experiment is carried out with the subject at rest, e.g. rest tremor evaluation (de Araújo et al 2020), sitting or lying supine, or sleep analysis;
• Sit-To-Stand (STS): any experimental condition in which the task, or part of it, involves standing up from or sitting down on a chair (see e.g. Hubble et al 2016, Tabatabaei et al 2020, Borzi et al 2022);
• Activities of Daily Living (ADL): any experimental condition that targets functional or daily living activities (Lonini et al 2018, Li et al 2019), conducted in either controlled (Talitckii et al 2021) or uncontrolled conditions (Cook et al 2015, San-Segundo et al 2020, Wasselius et al 2021).
Figure 3. Breakdown of the analysed studies by disease category and specific pathology.

More than half of the studies used only one IMU, and only around 25% used more than two. Regarding the TRL, among the studies that disclosed specific characteristics of the sensors used, approximately three quarters made use of off-the-shelf products (Thomas et al 2018, Moon et al 2020, Teufl et al 2021), including IMUs embedded in smartwatches, wristbands or smartphones (Sajal et al 2020). A few studies relied on existing datasets (e.g. Palmerini et al 2017, Demrozi et al 2020, Ghosh and Banerjee 2021, Noor et al 2021, Park et al 2022). The chosen setups were rather diverse, with around one third of the studies placing IMUs only on the upper limb portion of the body (such as in Madrid-Navarro et al 2018, Rehman et al 2020, Donisi et al 2021). Only 10% of the studies relied on the raw gathered signal, mainly as input to deep learning models (see e.g. Park et al 2017, Papadopoulos et al 2020, Wang et al 2022), in which feature extraction and selection are delegated to dedicated layers of the deep neural network (Kim et al 2018).
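The distinction between raw-signal and feature-based pipelines can be illustrated with a minimal Python sketch (NumPy only) of the hand-crafted, window-level features that a classical ML model may receive in place of the raw signal. The function name extract_window_features, the particular features, and the use of the acceleration magnitude are illustrative assumptions, not the processing scheme of any specific reviewed study.

```python
import numpy as np


def extract_window_features(acc: np.ndarray, fs: float) -> dict:
    """Compute a few hand-crafted features from one accelerometer window.

    acc: array of shape (n_samples, 3) with the x, y, z axes (in g).
    fs:  sampling frequency in Hz.
    """
    # Magnitude of the acceleration vector, independent of sensor orientation
    mag = np.linalg.norm(acc, axis=1)
    # Remove the mean (mostly gravity) before spectral analysis
    centred = mag - mag.mean()
    power = np.abs(np.fft.rfft(centred)) ** 2
    freqs = np.fft.rfftfreq(len(centred), d=1.0 / fs)
    return {
        "mean": float(mag.mean()),
        "std": float(mag.std()),
        "rms": float(np.sqrt(np.mean(mag ** 2))),
        "range": float(mag.max() - mag.min()),
        # Frequency bin with the highest power, skipping the DC bin
        "dominant_freq_hz": float(freqs[1:][np.argmax(power[1:])]),
    }


# Usage on a synthetic 4 s window sampled at 100 Hz (not real patient data)
fs = 100.0
t = np.arange(400) / fs
acc = np.column_stack([
    np.zeros_like(t),
    np.zeros_like(t),
    1.0 + 0.2 * np.sin(2 * np.pi * 5.0 * t),  # 1 g plus a 5 Hz oscillation on z
])
features = extract_window_features(acc, fs)
print(features)
```

Here the synthetic signal is a 5 Hz oscillation superimposed on 1 g along the z axis, so the recovered dominant frequency is 5 Hz. In a deep learning pipeline, the raw (n_samples, 3) window would instead be passed directly to the network, which learns its own representation.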

Figure 4. Placement locations for the inertial sensors.
3.1 was either to predict or classify the samples based on specific symptoms associated with a disease (for instance, freezing of gait in Parkinson's Disease in Xia et al (2018) and Rodríguez-Martín et al (2017)) or on the general presence of a pathological condition. All studies adopting a regression model aimed to predict symptoms according to self-defined scores or accepted clinical scales. The studies using ML to solve a classification problem aimed to classify either the symptoms (Darnall et al 2012, Ahlrichs et al 2015) or the presence of a pathology (Vos et al 2020, Yin et al 2021, Kovalenko et al 2022).
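The two problem formulations can be contrasted with a minimal scikit-learn sketch in which the same feature matrix feeds a classifier for the presence of a symptom and a regressor for its severity. The synthetic data, the hypothetical severity score and the choice of random forests are illustrative assumptions; no reviewed study is reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score, mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for a per-window IMU feature matrix (n_windows x n_features)
X = rng.normal(size=(300, 4))
# Hypothetical continuous severity score driven by two of the features
severity = 2.0 * X[:, 0] + X[:, 1] + 0.3 * rng.normal(size=300)
# Binary label derived from the same score: symptom present (1) or absent (0)
present = (severity > 0).astype(int)

X_tr, X_te, s_tr, s_te, p_tr, p_te = train_test_split(
    X, severity, present, test_size=0.25, random_state=0
)

# Classification: is the symptom or pathology present?
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, p_tr)
accuracy = accuracy_score(p_te, clf.predict(X_te))

# Regression: how severe is it on the (hypothetical) continuous scale?
reg = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, s_tr)
mae = mean_absolute_error(s_te, reg.predict(X_te))

print(f"accuracy={accuracy:.2f}  MAE={mae:.2f}")
```

Because many clinical scales are ordinal rather than continuous, treating a score as a regression target, as done here, is itself a modelling choice; an alternative is to treat each scale level as a class.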

• DNN: all kinds of deep learning (e.g. Bidabadi et al 2019, Hssayeni et al 2021), including long short-term memory networks (Sigcha et al 2020);
• DT: all kinds of decision tree learning (e.g. Khodakarami et al 2019, Talitckii et al 2022), including classification and regression trees (CART);
• k-NN: k-nearest neighbour models (Borzi et al 2021, Dai et al 2021, Mesin et al 2022);
• LR: logistic and linear regression and all kinds of discriminant analysis (LR, LDA, DA), such as in Kheirkhahan et al (2016);
• RF: all kinds of ensemble learning, including bagging, boosting and random forests (Kuhner et al 2020, Donisi et al 2021, Mirelman et al 2021);
• NB: naive Bayes and other Bayesian classification schemes (e.g. Cuzzolin et al 2017, Mileti et al 2018), including those based on temporal prediction such as hidden Markov models (HMM);
• SVM: vector machines, such as support vector machines and relevance vector machines (Bernad-Elazari et al 2016, Oliveira et al 2018, Dvorani et al 2021);
• ZZ-Oth: models not categorised elsewhere (Watts et al 2021), including hard clustering and supervised fuzzy clustering (Lonini et al 2018, Dvorani et al 2021).
Around 56% of the studies applied more than one ML technique (Aich et al 2020) and usually compared the results (Samà et al 2018, Donohue et al 2020), while the remaining studies relied on a single ML technique.
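Since many of the reviewed studies compared several ML families, the following scikit-learn sketch shows one way to set up such a comparison, with one representative estimator per category of the taxonomy above (DNN and the miscellaneous ZZ-Oth group are omitted). The synthetic dataset from make_classification, the hyperparameters and the balanced-accuracy metric are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder for an IMU feature table; no real study data is used
X, y = make_classification(n_samples=200, n_features=12, n_informative=5,
                           random_state=0)

# One representative estimator per family; scale-sensitive models get a scaler
models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "LR (logistic)": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "LR (LDA)": LinearDiscriminantAnalysis(),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "NB": GaussianNB(),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Same stratified folds for every model so the scores are comparable
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="balanced_accuracy")
    results[name] = (scores.mean(), scores.std())
    print(f"{name:<14} balanced accuracy = {scores.mean():.2f} ± {scores.std():.2f}")
```

With real IMU data, windows from the same participant would normally be kept out of both training and test folds at once; scikit-learn's GroupKFold, with participant IDs as groups, is one way to enforce such a subject-wise split.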

Figure 5. Aim of the ML procedure.
g. Costa et al 2016, Butt et al 2017, Williamson et al 2021), 29% were designed to detect the presence of a condition of clinical interest, such as a symptom (for instance, Channa et al 2021, Sigcha et al 2021), and approximately 35% used the ML models to predict scores or values on clinically relevant or agreed scales (e.g. Wan et al 2018, Borzi et al 2020, Ramesh and Bilal 2022).

Figure 6. Feature extraction strategies and adopted ML algorithms.

Figure 7. Main association rules obtained. Connection line width is proportional to rule confidence.
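Rule confidence has a standard definition in association rule mining: for a rule X ⇒ Y, conf(X ⇒ Y) = supp(X ∪ Y) / supp(X), where supp is the fraction of records containing an itemset. The Python sketch below computes both quantities over a handful of invented study records; the attribute labels are hypothetical, not taken from the reviewed corpus, and the sketch does not reproduce the procedure behind figure 7.

```python
from typing import FrozenSet, List

# Hypothetical records: each set lists attributes coded for one study
studies: List[FrozenSet[str]] = [
    frozenset({"Parkinson", "wrist", "RF"}),
    frozenset({"Parkinson", "wrist", "SVM"}),
    frozenset({"Parkinson", "trunk", "RF"}),
    frozenset({"stroke", "wrist", "RF"}),
]


def support(itemset: FrozenSet[str], records: List[FrozenSet[str]]) -> float:
    """Fraction of records that contain every item of itemset."""
    return sum(itemset <= r for r in records) / len(records)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               records: List[FrozenSet[str]]) -> float:
    """conf(A => C) = supp(A | C) / supp(A)."""
    return support(antecedent | consequent, records) / support(antecedent, records)


conf = confidence(frozenset({"Parkinson"}), frozenset({"RF"}), studies)
print(f"conf(Parkinson => RF) = {conf:.2f}")  # 2 of the 3 Parkinson records use RF
```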

Figure 8. Shares of categories according to the SJR classification of the journals in which the selected papers were published.
Samà et al 2017). More than half of the above-threshold studies used commercially available IMUs alone, and approximately one third used them in conjunction with other sensing technologies (Cole et al 2014, Johnson et al 2019, Chakraborty and Kishor 2022). More than half of the studies analysed in Phase II compared performance across different ML techniques. A few studies were above the threshold despite using ML techniques only indirectly for classification or prediction (see e.g. Melin et al 2016, Hamy et al 2020).