Compartmental modelling in epidemic diseases: a comparison between SIR model with constant and time-dependent parameters

The compartmental modelling is one of the most widely used techniques in investigating the dynamics of infectious diseases. This modelling technique usually treats model parameters as constant. However, the parameters associated with infectious diseases randomly change following the changes in the conditions of disease transmission. As a result, the estimated parameters are often found over or under-determined by direct problems when some conditions change and the forecasting using direct problems often goes wrong. In this study, we estimate the model parameters over different time intervals by means of the inverse problem method and then solve the forward problem using these estimated parameters to compare them with the real epidemic data. We apply the method to estimate the parameters corresponding to Nipah virus, Measles and COVID-19 in the context of Bangladesh. The results suggest that the method helps to gain improved insights into epidemic scenarios corresponding to smaller time intervals. The results of the direct problem are found to fall apart fairly quickly from the real epidemic data as the length of the interval used in the inverse problem method increased.

The compartmental modelling is one of the most widely used techniques in investigating the dynamics of infectious diseases. This modelling technique usually treats model parameters as constant. However, the parameters associated with infectious diseases randomly change following the changes in the conditions of disease transmission. As a result, the estimated parameters are often found over or under-determined by direct problems when some conditions change and the forecasting using direct problems often goes wrong. In this study, we estimate the model parameters over different time intervals by means of the inverse problem method and then solve the forward problem using these estimated parameters to compare them with the real epidemic data. We apply the method to estimate the parameters corresponding to Nipah virus, Measles and COVID-19 in the context of Bangladesh. The results suggest that the method helps to gain improved insights into epidemic scenarios corresponding to smaller time intervals. The results of the direct problem are found to fall apart fairly quickly from the real epidemic data as the length of the interval used in the inverse problem method increased. * Author to whom any correspondence should be addressed.
Original Content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

Supplementary material for this article is available online
Keywords: COVID-19, Measles, Nipah, epidemic model, inverse problem, time dependent parameters, parameter estimation (Some figures may appear in colour only in the online journal)

Introduction
Knowing the transmission dynamics in details is quite useful when it comes to regulating the spread of infectious diseases, disease-related death, and socio-economic losses. Suitable mathematical models proven to be very effective in projecting realistic and quantitative predictions of the spread of infection, death tolls, and other relevant information useful for public health officials and policymakers [1][2][3]. The most common approach includes compartmental models from various aspects [4][5][6][7]. A compartmental system describes individuals in a community into compartments based on their infection status. The population is divided into sections with designations such as S, I, and R (susceptible, infectious, or recovered). People are free to move between compartments. The flow patterns between the compartments are commonly shown by the sequence of the labels; for example, SEIS stands for susceptible, exposed, infected, and susceptible again. Such models date back to the early twentieth century, with notable works by Ross [8] in 1916, Ross and Hudson in 1917 [9], Kermack and McKendrick in 1927 [10], and Kendall in 1956.
These models are often very useful in estimating several important future factors like how a disease spreads, the total number of people sick, and the length of an epidemic, as well as estimate epidemiological parameters like the reproductive number. However, because of the lack of a realistic epidemic scenario, compartmental modelling with constant parameter values does have certain shortcomings. The parameter values associated with the disease dynamics randomly change following the change in individuals and social preventive actions, seasonal variations, treatments, and new variants of the virus to name a few. These factors have an influence on the parameters since the coefficient fluctuates over time in real-world settings like Nipah, Measles, and the COVID-19 outbreak. Therefore, it is wise to use time-varying parameter values to predict epidemic disease dynamics in the future. Ignoring this important issue while modelling, the estimated parameters are often found over or under-determined by direct problem when some conditions change and the forecasting using direct problems often goes wrong.
To illustrate with an example, we use the following classical SIR model [10].
where, β is the transmission rate from infected to susceptible and α is accountable for the death and recovered population. Note that first two equations in the above SIR model is independent of the last one and hence, can be decoupled. The form implies that S(t) + I(t) + R(t) = constant = N, which indicates that the total population N remains invariant. We use the maximum likelihood fitting method [11] to estimate the transmission rate using the early days COVID-19 data from Bangladesh, as shown in figure 1. The figure 1(a) shows the likelihood profile, indicated as PL β , with estimated value (red dot) of the parameter corresponding to the first 50 days COVID-19 data of Bangladesh, whereas the plot 1(b) shows its corresponding fitting as well the projection using the estimated value of β for the next 30 days. It can be seen that the data followed the projection for about a week and then quickly fell apart. Further, figure 1(c) the likelihood profile with estimated value (red dot) of the parameter β corresponding to the data from day 51 to day 81. The transmission rate during this time went down after the implementation of lock-down for about 40 days, as a result the estimated β from the data of first 50 days over-determined the scenario and predicted a large number of infected by the end of day 81, which actually did not happen in real. Finally, figure 1(d) shows the piece-wise fitting from day 1 to day 50, and then from day 51 to day 81, considering that β 1 is the transmission rate corresponding to the first time interval and β 2 is the transmission rate corresponding to the last time interval. This figure gives a clear indication about the limitation of forward compartmental model when it is used for long-term prediction.
In this study, we use a somewhat different approach, as proposed by Marinov et al [12,13], to estimate the key parameters associated with three different epidemic diseases, namely Nipah Virus, Measles and COVID-19. The estimated parameters are then used to explore the true epidemic scenarios, as well as in estimating the key parameters associated with the disease dynamics. Besides, we analyse the efficacy of the SIR compartmental epidemic model in representing real epidemic data [14], which is expected to be very useful information for epidemiologists.

Inverse problem technique
In inverse problem, we do the opposite of the direct problem. The inverse problem is constructed based on the first two equations of the above system (1.1) and the following boundary conditions formed by using the available real epidemic data where the suffices i and f denote the initial and final data of the corresponding interval. Four conditions are required to solve the direct problem: the initial conditions (2.1) and the value of the parameters β and α. The obtained solution of the direct problem may not necessarily satisfy the terminal conditions (2.2). In that case, the system is said to be over or under-determined for known parameters value β and α. However, if the system is supplemented by the initial and terminal conditions (2.1) and (2.2), and the value of the parameters are not known, still the problem will be well-posed as the general solutions of the system will depend only on four conditions. The problem of this type is known as so-called inverse problem. Using this method, we can obtain (S, I) and (a, b) from the first two equations of the system (1.1) under the conditions (2.1) and (2.2). It is noteworthy to mention that, the problem still may be ill-posed even the conditions (2.1) and (2.2) are given. For random choice of S i , I i , S f and I f , there may be solution satisfying the system and all of the conditions (2.1) and (2.2). For this reason, we assume that the boundary data (2.1) and (2.2) have physical meaning and the problem is posed correctly after Marinov et al [12] and the solutions of the problem exist. In this paper, it would be more realistic to consider the parameters β and α as functions of time, that is, β = β(t) and α = α(t). We assume a data set D with entries S(t k ) and I(t k ) at some time moments {t 1 , t 2 , t 3 , . . .}. Further, we assume that β and α are piece-wise constant functions of time, as follows By solving the aforesaid inverse problem, we can estimate the values of β k and α k by using We adopt the inverse problem method developed by Marinov et al [12,13,15] to estimate the parameters β = β(t) and α = α(t). In order to use this method, we begin by introducing a sub-problem considering data values at two time moments t i and t f , where t i is the initial time moment, and t f is the final time moment. We then proceed by dividing the considered time interval t 0 = t i and t n = t f into n + 1 equidistant nodes {t 0 , t 1 , t 2 , . . . , t n } and then approximate the state variables S(t) and I(t) at these discrete time points. The step size is defined as η = tf−ti n and the equidistant nodes are t k = t i + kη for k = 0, 1, 2, . . . , n. The equations for S and I can then be discretized as follows where S k = S(t k ) and I k = I(t k ) are the approximations of the functions S(t) and As the governing equations are non-linear, hats are used in order to secure the second order of approximation O(η 2 ). We can calculate the values ofS k andĪ k using the basic Euler forward difference of the governing system as follows Here S 0 and I 0 are known from data and they are used asS 0 = S 0 andĪ 0 = I 0 . We will use an iterative method to estimate S and I under the assumption thatS k ,Ī k are known from the previous iteration, as mentioned above. Residuals of the above discrete equations could be minimized by minimizing the following function where ζ k and ψ k , respectively, represent the residuals of equations (2.4) and (2.5). Notice that Ω is a homogeneous quadratic function of ζ k and ψ k , suggesting that its absolute minimum is zero which is achieved only when ζ k and ψ k are both zero for all k = 1, 2, . . . , n − 1. Therefore, the solutions of the system (2.4) and (2.5) under the conditions (2.1) and (2.2), and the problem of minimization have one-to-one correspondence between them. With respect to S k and I k , the function Ω can be minimized as follows which gives the following difference equations for k = 1, 2, . . . , n − 1. We see that system is well-posed with 2n + 2 linear equations comprising four boundary conditions mentioned in equations (2.1), (2.2) and 2n − 2 difference equations stated above against 2n + 2 unknowns {S 0 , S 1 , S 2 , . . . , S n } and {I 0 , I 1 , I 2 , . . . , I n }.
In order to derive the relations for determining β = β(t) and α = α(t) in the considered time interval [t i , t f ], we redefine the function Ω as described in [12]. Finally, we solve the inverse problem comprising equations (2.1), (2.2), (2.7) and (2.8), corresponding to every disconnected time interval [t i , t f ], which is, however, a slightly different approach compared to the one proposed by Marinov et al [12]. The technique proposed by the authors is as follows. For example, there are 10 daily data points which is divided into subsets of uniform length of 3 days. The authors choose day 1-day 3 as the first interval and estimate the parameters. The sub-interval then increments by one and the second sub-interval comprises day 2-day 4, and so on. However, in our approach, the first interval comprises day 1-day 3, the second interval comprises day 4-day 6, and so on. This different approach allows us to estimate parameters over disconnected time intervals [t i , t f ], and solve the forward problem corresponding to these estimated parameter values. By doing this, we can compare the results of the forward problem with the real epidemic data. Note that the accuracy and loop exiting criteria in numerical coding are implemented using the concept outlined in [12]. We used MATLAB 2018a to implement the numerical algorithm. A copy of the code can be obtained by email on request from the corresponding author.

Application to Nipah virus
Nipah virus is a novel non-segmented RNA virus that belongs to the paramyxovirus family [16][17][18][19]. It is a zoonotic virus transmitted from animals to humans. This virus caused a viral encephalitis outbreak in Malaysia from October 1998 to mid-July 1999 [20], and cases have been reported in Bangladesh almost every year since 2001 [21][22][23][24][25][26]. In this section, we implement the inverse problem method as outlined in section 2 using genuine Nipah patients data of Bangladesh from 2001 to 2021 [27]. In this example, a two-years time-frame is used to estimate the parameters. Note that this time interval is chosen randomly. The effect of time intervals of different lengths are discussed in details later in this section and in section 4.
The graphs in figure 2 demonstrate that the transmission rate was higher in the first couple of years. And following the higher transmission, the death rate was also high around that time. The transmission then went down significantly for next couple of years and then went up again. We also calculate the basic reproduction numbers using the estimated values of the transmission rate and removal rate, which is shown in figure 3. The estimated parameter values for the last year (2021) under consideration are shown in table 1. Table 2 summarizes a comparison of our estimated basic reproduction number with the available one in the existing studies. This table shows that our approximations are in quite agreement with the existing results.

Application to Measles
The World Health Organization (WHO) and UNICEF release annual estimates based on reported data from member nations on the expected number of measles cases and measles-related deaths around the world. Total measles cases and fatalities were documented 7585 900 and 124 000, respectively. In 2018, there were approximately 9769 400 cases of measles reported, with 142 300 deaths [28]. Madagascar, Ukraine, Somalia, and Liberia have all recorded the most cases. Measles claimed the lives of 207 500 people globally in 2019, with 869 770 cases reported [29]. These annual increases point to a global issue [30][31][32]. In this section, using WHO data on real measles cases in Bangladesh from 2000 to 2020 [28,33,34], we estimate the transmission rates and removal rates by solving the inverse problem over a two years interval.
The figure 4 exhibits the measles transmission and removal scenarios in Bangladesh. The basic reproduction numbers using the estimated values of the transmission rate and removal rate are shown in figure 5. The estimated parameter values for the last year (2020) under    [29]. In this study, the authors estimated the basic reproduction number as 1.44, whereas in our study, estimated basic reproduction number is about 0.4136, which is way below than the one estimated by Kuddus et al [29]. This difference is reasonable considering the fact that, in our study the basic reproduction corresponding the year 2020 is calculated based on the data of year 2019

Application to COVID-19
A cluster of pneumonia cases with uncertain etiology in Wuhan, Hubei province, China was first reported to the WHO by the Wuhan Municipal Health Commission in China, On 31 December 2019 [35][36][37][38][39]. The cause of the disease was soon confirmed as a new class of Corona virus (SARS-CoV-2) [40,41]. And the infection has since spread to many countries around the World and on 11 March 2020, the WHO announced COVID-19 as a highly contagious global pandemic [35]. The fast-spreading coronavirus has so far claimed over 4.6 million lives and infected more than 222.9 million people throughout the world, according to Worldometer [35,42,43].
Here we use the COVID-19 data of Bangladesh from 12 March 2020 to 17 July 2021 [14] to estimate the transmission rates and removal rates by solving the inverse problem over a 3, 7, 10 and 14 days interval. The reason behind choosing more time intervals of different length is discussed in the last paragraph of this subsection. Figure 6 demonstrates the transmission rates and removal rates corresponding to 3 days interval. The fluctuations as can be seen in transmission rates and removal rates during early days are mainly due to the irregularities of data. Sudden outbreak of the virus in Bangladesh posed a huge threat on the country's health system. There was huge shortage of testing kits and facilities at the beginning. The daily data collection procedure was unstable at the beginning. Besides, around 1000 people were declared recovered just in one day around mid-May 2020. Due to the continuous mass isolation during the first couple of months, the transmission scenarios went down to comfortable zone by the end of September 2021, as demonstrated in figures 6(a) and (b). The overall scenario was good except two small peaks in November 2020 and April 2021. However, the new Delta variant hit hard in early June, 2021 and the number   figure 7(a) that corresponds to 3 days interval appears to be demonstrating the epidemic scenarios in more detail. As the length of sub-interval increases, the epidemic details seem to be fading away. In order to explore this fact, we make further investigation from a different perspective. We apply this concept for COVID-19 first, and then we do the same for Nipah and Measles as well. The technique is discussed in details in the following section.

Comparison of the results with different time intervals
As mentioned in the last section, we investigate the accuracy of the results corresponding to different time intervals. To do so, we solve the forward problem over the corresponding time interval by using the estimated parameters obtained in that interval. Following steps were followed to implement the idea for three days interval data.
(a) From the estimated time series data of β and α, the first entries of β and α were taken, and the initial values of S and I were taken from the first day COVID-19 data of Bangladesh. The forward problem was then solved exactly for three days. (b) Next, the second entries of β and α were then taken, and the initial values of S and I were taken from the fourth day COVID-19 data of Bangladesh. The forward problem was then solved exactly for three days.
The forward problem was solved for every sub-interval using these steps. Same approach was used for other intervals considered in this study. Results of the forward problem were then compared with the data using the following relation.
Relative error = | I data − I model | I data where I data represents daily active infectious COVID-19 data and I model represents the corresponding active infectious data obtained by solving the forward problem. Figure 8 demonstrates the comparison of the results produced by the forward SIR model with the respective epidemic data in all three cases corresponding to different time interval. This figure shows the results of the model fit very well with the epidemic data in all three cases when the time interval is smaller. However, as the length of the interval increases, the predicted results produced by SIR model fall apart fairly quickly from the epidemic data. The relative errors with 95% confidence interval for different time intervals in all three cases summarized in figure 9. This figure demonstrates that, in case of COVID-19, the mean error corresponding to 3 days interval is almost zero and the range for 95% confidence interval around the mean is also very small. The mean errors and ranges for confidence intervals corresponding to larger intervals can clearly be seen increasing. Corresponding to 14 days interval is the highest, which is about 0.17, and the range for 95% is also quite extended. The similar behaviours are also observed corresponding to increasing time intervals for Nipah and Measles. The eye ball test, therefore, tells us that the SIR model fits the epidemic data well for shorter time interval. In other words, we can say that prediction accuracy of the SIR model falls apart as the period of prediction increases.

Conclusion
In this study, we evaluated the time-dependent parameters of SIR models corresponding to Nipah, Measles and COVID-19 outbreak in Bangladesh. This research aimed to precisely predict the transmission and removal rates using Bangladeshi COVID-19 data from 12 March 2020 to 17 July 2021, as well as Measles and Nipah data from 2000 to 2020 and 2001 to 2021, respectively.
The results of this study demonstrate that the inverse problem method could be used to obtain the detailed insights into the epidemic scenario. Further, we compared the results produced by the forward SIR model using the estimated parameters with the epidemic data. The analysis shows that the real data and the results produced by the SIR model have really good agreement for smaller time interval. The model produces less accurate results as the time interval increases. The study, therefore, suggests that SIR compartmental model could be a good choice for analysing the transmission dynamics and forecasting with a greater accuracy only for a short period of time. The proposed technique could be very effective in understanding the true epidemic scenario and hence, deciding the short-term innervation policy, such as mass isolation or lock-down.
The simple SIR model considered in this study poorly captures the transmission dynamics of the diseases such as COVID-19 , Measles and Nipah. There is a latency period of about 4-5 days in case COVID-19, 10-14 days in case measles and symptoms of Nipah virus usually show up within 4-14 days of exposure. Exposed people who develops the symptoms later and shows symptoms often get tested and hence held isolated. Besides, some people do not show symptoms at all, yet they are infected and transmitting the diseases silently throughout their entire infectious lifetime. Future research in this direction could incorporate these issue into modelling. Our future research intends to extend the SIR model by incorporating latency, vaccination, and reinfection.

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.