SIMULATIONS AND PREDICTIONS OF COVID-19 PANDEMIC WITH THE USE OF SIR MODEL

Background. The COVID-19 pandemic is of great interest to researchers due to its high mortality and very negative impact on the world economy. A detailed scientific analysis of the phenomenon is yet to come, but the public is already interested in the duration of the epidemic, the expected number of patients, and where and when the pandemic started. Correct simulation of the pandemic dynamics requires complicated mathematical models and considerable effort to identify unknown parameters. In this article, preliminary estimates for many countries and the world are presented, summarized and discussed. Objective. We estimate the epidemic characteristics for the USA, Germany, the UK, the Republic of Korea and the world with the use of SIR simulations and compare them with the results obtained before for Italy, Spain, France, the Republic of Moldova, Ukraine and Kyiv. The hidden periods, epidemic durations, final numbers of cases and quarantine measures are discussed. Methods. In this study we use the known SIR (susceptible-infected-removed) model for the dynamics of the epidemic, the known exact solution of its differential equations and the statistical approach developed before. Results. The optimal values of the SIR model parameters were identified with the use of the statistical approach for the epidemic dynamics in the USA, Germany, the UK, the Republic of Korea, and the world. The actual number of cases and the number of patients spreading the infection versus time were calculated. The hidden periods, durations and final sizes of the epidemic were evaluated. In particular, the pandemic began in China no later than October 2019. If current trends continue, the end of the pandemic should be expected no earlier than March 2021, and the global number of cases will exceed 5 million. A simple method for assessing the risk of premature weakening of quarantines is proposed. Conclusions. The SIR model and the statistical approach to parameter identification are helpful for making some reliable estimations of the epidemic dynamics, e.g., the real time of the outbreak, the final size and duration of the epidemic, and the number of persons spreading the infection versus time. This information will be useful to regulate the quarantine activities and to predict the medical and economic consequences of the pandemic.


Introduction
Here we consider the global COVID-19 pandemic dynamics and the epidemic outbreaks in the USA, Germany, the UK, South Korea and other countries and regions with the use of official WHO data sets [1]. The SIR model, which connects the numbers of susceptible persons S, infected persons spreading the infection I, and removed persons R, was applied [2,3]. The unknown parameters of this model can be estimated with the use of the cumulative number of cases V = I + R and the statistics-based method of parameter identification [4].

Data
The official information regarding the accumulated numbers of confirmed COVID-19 cases $V_j$ in South Korea, Germany, the UK, the USA and the world from WHO daily situation reports (numbers 81-109) [1] is presented in Table 1. The corresponding moments of time $t_j$ (measured in days) are also shown in this table. Data sets for the period April 9-29 were used for calculations. Other values were used only for verification of the calculations.

SIR model
The SIR model for an infectious disease [2][3][4][5] relates the number of susceptible persons S (persons who are sensitive to the pathogen and not protected); the number of infected persons I (persons who are sick and spread the infection; this should not be confused with the number of persons who are still ill, the so-called active cases); and the number of removed persons R (persons who no longer spread the infection; this number is the sum of isolated, recovered, dead, and infected people who left the region). Here $\alpha$ and $\rho$ are constants:

$$\frac{dS}{dt} = -\alpha S I, \tag{1}$$

$$\frac{dI}{dt} = \alpha S I - \rho I, \tag{2}$$

$$\frac{dR}{dt} = \rho I. \tag{3}$$
The parameter $\alpha$ is called the infection rate, since according to (1) it shows how quickly the susceptible persons become infected. Large values of this parameter correspond to severe epidemics with many victims. This parameter accumulates many characteristics. First, it shows how strong (virulent) the pathogen is and how it spreads. For airborne droplet transmission (typical for coronavirus), the values of $\alpha$ are higher than for AIDS, for example. The parameter $\alpha$ also accumulates the frequency of contacts and the manner of contact. Epidemics usually start in large cities, where the average contact rate is much higher than in small villages. In order to decrease the value of $\alpha$, we have to minimize the number of our contacts and change our contact habits. For example, we have to avoid public places and use masks there, minimize or cancel traveling, and avoid handshakes and kisses. First, all these simple things are very useful to protect yourself. In addition, if most people follow these recommendations, we have a chance to diminish the value of the parameter $\alpha$ and reduce the negative effects of the pandemic. Usually the parameter $\rho$ is called the immunization rate. This name is reasonable when there is no isolation and all the sick persons recover. Since the number of removed persons R is the sum of isolated, recovered and dead people, eq. (3) gives the rate of increase of R. So we will call this parameter the removal rate. The inverse value $1/\rho$ is an estimate of the time during which a person spreads the infection. So, we are interested in increasing the value of the parameter $\rho$ and decreasing $1/\rho$. Public authorities should work on this and organize immediate isolation of suspected cases.
Since the derivative $d(S + I + R)/dt$ is equal to zero (this follows from summing eqs. (1)-(3)), the sum $N = S + I + R$ is constant in time. It must be noted that the constant N is not the total population $N_{total}$, but only the initial number of people who are sensitive to, and not protected against, some specific disease. Unfortunately, the ratio $N/N_{total}$ may be very large for coronavirus COVID-19. We can see this from the situation with the liner Diamond Princess. The total initial number of persons on board was 3711; on February 18, 2020 the cumulative number of confirmed cases was 542. Thus the percentage of susceptible persons can be at least 14.6%, and if people are not sufficiently protected and isolated, hundreds of millions in the world can be infected. This means we have to work hard on protection and isolation to avoid millions of deaths and reduce the number N.
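The behaviour of the SIR equations described above can be checked numerically. The following is a minimal sketch, not the method used in this paper: it integrates the three equations with a classical Runge-Kutta scheme, assuming one infected person at the start. The names `alpha` (infection rate) and `rho` (removal rate) and all numerical values are hypothetical and chosen only for illustration.

```python
def sir_rhs(s, i, alpha, rho):
    """Right-hand sides of the SIR equations: returns (dS/dt, dI/dt, dR/dt)."""
    new_infections = alpha * s * i   # flow from S to I
    removals = rho * i               # flow from I to R
    return -new_infections, new_infections - removals, removals


def simulate_sir(n, alpha, rho, days, steps_per_day=20):
    """Integrate the SIR equations with the classical RK4 scheme.

    Starts from one infected person: S = n - 1, I = 1, R = 0.
    Returns a list of (t, S, I, R) tuples, one per time step.
    """
    h = 1.0 / steps_per_day
    s, i, r = n - 1.0, 1.0, 0.0
    history = [(0.0, s, i, r)]
    for step in range(1, days * steps_per_day + 1):
        k1 = sir_rhs(s, i, alpha, rho)
        k2 = sir_rhs(s + 0.5 * h * k1[0], i + 0.5 * h * k1[1], alpha, rho)
        k3 = sir_rhs(s + 0.5 * h * k2[0], i + 0.5 * h * k2[1], alpha, rho)
        k4 = sir_rhs(s + h * k3[0], i + h * k3[1], alpha, rho)
        s += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        i += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        r += h * (k1[2] + 2 * k2[2] + 2 * k3[2] + k4[2]) / 6.0
        history.append((step * h, s, i, r))
    return history


# Hypothetical parameter values, chosen only for illustration.
N, ALPHA, RHO = 10_000.0, 5e-5, 0.2
history = simulate_sir(N, ALPHA, RHO, days=200)

# I(t) should peak where S is close to rho/alpha.
t_peak, s_peak, i_peak, _ = max(history, key=lambda row: row[2])
print(f"I peaks near t = {t_peak:.1f} days, where S = {s_peak:.0f} "
      f"(rho/alpha = {RHO / ALPHA:.0f})")
print(f"V = I + R after 200 days: {history[-1][2] + history[-1][3]:.0f}")
```

Because the three right-hand sides sum to zero, the sketch keeps S + I + R equal to N up to rounding, which is a convenient sanity check for any implementation.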

Analytical solution of SIR equations
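To determine the initial conditions for the set of eqs. (1)-(3), let us suppose that at the moment of the epidemic outbreak $t_0$ [4]:

$$I(t_0) = 1, \quad R(t_0) = 0, \quad S(t_0) = N - 1.$$

The analytical solution for the set of eqs. (1)-(3) was obtained by introducing the function $V(t) = I(t) + R(t)$, corresponding to the number of victims or cumulative confirmed number of cases [4]:

$$F(V, N, \nu) = \alpha (t - t_0), \tag{4}$$

$$F(V, N, \nu) = \int_1^V \frac{dU}{(N - U)\left[\nu \ln(N - U) + U - \nu \ln(N - 1)\right]}, \quad \nu = \frac{\rho}{\alpha}, \tag{5}$$

where $\nu$ is the ratio of the removal rate to the infection rate. Thus, for every set of parameters N, $\nu$, $\alpha$, $t_0$ and a fixed value of V the integral (5) can be calculated and the corresponding moment of time can be determined from (4). Then the functions I(t) and R(t) can be easily calculated with the formulas [4]:

$$S = N - V, \quad I = \nu \ln S - S + N - \nu \ln(N - 1), \quad R = V - I.$$

The function I has a maximum at $S = \nu$ and tends to zero at infinity, see [2,3]. In comparison, the number of susceptible persons at infinity $S_\infty > 0$ and can be calculated from the non-linear equation [4]:

$$S_\infty = (N - 1) \exp\left(\frac{S_\infty - N}{\nu}\right).$$

The final number of victims (final accumulated number of cases) can be calculated from:

$$V_\infty = N - S_\infty.$$

To estimate the duration of an epidemic outbreak, we can use the condition:

$$I(t_{final}) = 1,$$

which means that at $t > t_{final}$ less than one person still spreads the infection.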
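The steps of the analytical solution can be sketched numerically. The code below is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes the closed-form relation I = ν ln S − S + N − ν ln(N − 1) with ν equal to the removal rate divided by the infection rate (which follows from the SIR equations when one person is infected at the outbreak), evaluates the integral for F with Simpson's rule, and finds the final susceptible number and the end of the epidemic by bisection. All function names and parameter values are hypothetical.

```python
import math


def infected(s, n, nu):
    """Number of spreaders I as a function of S (assumes I = 1, R = 0 at t0)."""
    return nu * math.log(s) - s + n - nu * math.log(n - 1.0)


def f_integral(v, n, nu, panels=200_000):
    """Integral for F from U = 1 to U = v by the composite Simpson rule."""
    def integrand(u):
        return 1.0 / ((n - u) * infected(n - u, n, nu))

    h = (v - 1.0) / panels  # panels must be even for Simpson's rule
    total = integrand(1.0) + integrand(v)
    for k in range(1, panels):
        total += (4.0 if k % 2 else 2.0) * integrand(1.0 + k * h)
    return total * h / 3.0


def s_on_decline(target_i, n, nu, tol=1e-10):
    """Solve infected(S) = target_i for S in (0, nu) by bisection.

    On this interval I increases with S, so the root is unique
    (this requires infected(nu) > target_i, i.e. a real outbreak).
    target_i = 0 gives S at infinity; target_i = 1 gives S at t_final.
    """
    lo, hi = 1e-12, nu
    while hi - lo > tol * nu:
        mid = 0.5 * (lo + hi)
        if infected(mid, n, nu) < target_i:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical parameter values, chosen only for illustration.
N, NU, ALPHA, T0 = 10_000.0, 4_000.0, 5e-5, 0.0

s_inf = s_on_decline(0.0, N, NU)            # susceptible persons left at infinity
v_inf = N - s_inf                           # final accumulated number of cases
s_final = s_on_decline(1.0, N, NU)          # S when one spreader remains
t_final = T0 + f_integral(N - s_final, N, NU) / ALPHA

print(f"S_inf = {s_inf:.1f}, V_inf = {v_inf:.1f}, t_final = {t_final:.1f} days")
```

Simpson's rule is used here only because it needs no external libraries; the integrand varies quickly near both ends of the interval, which is why a large number of panels is chosen.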

Parameter identification procedure
In the case of a new epidemic, the values of these four independent parameters are unknown and must be identified with the use of limited data sets. A statistical approach was used in [4][5][6][7][8][9][10][11] to estimate the values of the unknown parameters. The registered points for the number of victims $V_j$ corresponding to the moments of time $t_j$ can be used in order to calculate $F_j = F(V_j, N, \nu)$ for fixed values of N and $\nu$ with the use of (5) and then to check how the registered points fit the straight line (4). For this purpose linear regression can be used, e.g., [12], and the optimal straight line, minimizing the sum of squared distances between registered and theoretical points, can be defined. Thus we can find the optimal values of $\alpha$ and $t_0$ and calculate the correlation coefficient r for the linear dependence (4).
Then the F-test may be applied to check the null hypothesis that the proposed linear relationship (4) fits the data set. The experimental value of the Fisher function can be calculated with the use of the formula:

$$F = \frac{r^2 (n - m)}{(1 - r^2)(m - 1)},$$

where n is the number of observations and m = 2 is the number of parameters in the regression equation [12]. The corresponding experimental value F has to be compared with the critical value $F_C(k_1, k_2)$ of the Fisher function at a desired significance or confidence level ($k_1 = m - 1$, $k_2 = n - m$) [13]. When the values n and m are fixed, the maximum of the Fisher function coincides with the maximum of the correlation coefficient. Therefore, to find the optimal values of the parameters N and $\nu$, we have to find the maximum of the correlation coefficient for the linear dependence (4). To compare the reliability of different predictions (with different values of n) it is useful to use the ratio $F / F_C(1, n - 2)$ at a fixed significance level. We will use the level 0.001; the corresponding values of $F_C(1, n - 2)$ can be taken from [13]. The most reliable prediction yields the highest ratio $F / F_C(1, n - 2)$.
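The regression step of this procedure can be illustrated with a short sketch. It covers only one inner step, for fixed N and ν: fitting a straight line to the points (t_j, F_j), reading off the infection rate and t_0 from the slope and intercept, and computing the experimental Fisher value from the correlation coefficient. The full procedure would repeat this over many values of N and ν and keep the pair with the highest r. The F_j values below are synthetic, the function names are hypothetical, and the critical values still have to be taken from statistical tables.

```python
import math


def linear_fit(t, f):
    """Least-squares line f = a * t + b and the correlation coefficient r."""
    n = len(t)
    mean_t, mean_f = sum(t) / n, sum(f) / n
    s_tt = sum((x - mean_t) ** 2 for x in t)
    s_ff = sum((y - mean_f) ** 2 for y in f)
    s_tf = sum((x - mean_t) * (y - mean_f) for x, y in zip(t, f))
    a = s_tf / s_tt
    b = mean_f - a * mean_t
    r = s_tf / math.sqrt(s_tt * s_ff)
    return a, b, r


def fisher_value(r, n, m=2):
    """Experimental F value for a regression with m parameters and n points."""
    return r * r * (n - m) / ((1.0 - r * r) * (m - 1))


# Synthetic F_j values (hypothetical, not data from this study): a straight
# line alpha * (t - t0) with a small alternating perturbation.
TRUE_ALPHA, TRUE_T0 = 5e-5, -10.0
t_j = list(range(47, 68))
f_j = [TRUE_ALPHA * (t - TRUE_T0) + (2e-5 if k % 2 == 0 else -2e-5)
       for k, t in enumerate(t_j)]

slope, intercept, r = linear_fit(t_j, f_j)
alpha_hat = slope                 # the line is F = alpha * t - alpha * t0
t0_hat = -intercept / slope
f_exp = fisher_value(r, len(t_j))

print(f"alpha = {alpha_hat:.3e}, t0 = {t0_hat:.2f}, r = {r:.5f}, F = {f_exp:.0f}")
```

Note that `fisher_value` is undefined for a perfect fit (r = 1), so a real implementation would need to guard against that case.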

Results
Usually the numbers of cases registered during the initial period of an epidemic outbreak are not reliable. To avoid their influence on the results, only the $V_j$ values for the period April 9-29, 2020 ($47 \le t_j \le 67$; Table 1) were used to calculate the epidemic characteristics. Since international travel is quite limited during the quarantine, we can apply the SIR model to every country, assuming its parameters to be constant (but different for every country) during the fixed period of time. The results of the calculations are shown in Tables 2 and 3. To illustrate the influence of data on the results of SIR simulations, the previous estimation for Germany [10] is also presented in Table 2.
It can be seen that the previous prediction for Germany (see [10]) was more optimistic. Fresh data sets have shown that the final number of cases in this country could reach 177,000 and new cases may stop appearing only after August 4, 2020 (see Table 2, prediction 2). The second prediction presented for South Korea is also more pessimistic in comparison with the first one [6]. In particular, the end of the epidemic is expected after June 29, 2020 (see Table 2). These estimations are valid only if the quarantine measures, isolation rate and coronavirus activity remain the same as in the period taken for calculations. Tables 2 and 3 illustrate that the real epidemic outbreaks in Germany, the USA and the Republic of Korea probably occurred in November-December 2019, and in the UK in early 2020. The real beginning of the global COVID-19 pandemic can be attributed to the beginning of October 2019. It happened in China, most likely in Wuhan, the epicenter of its visible course. Unfortunately, official data from China are very contradictory, which makes their analysis with the SIR model impossible. In any case, the estimations of the $t_0$ values presented in [4,5] are no longer relevant. A rather long duration of the pandemic is expected. The last cases could stop appearing only in March 2021, after the total exceeds 5 million. This long-term prediction is very preliminary, corresponds to the current situation and does not take into account the repeated outbreaks that are possible and are already happening in many countries.
The results for the epidemic hidden periods (time between $t_0$ and the day when the first COVID-19 case was confirmed), epidemic durations $t_{final} - t_0$ and the final numbers of cases $V_\infty$ (final sizes) for different countries are presented in Table 4. For the world data, December 8, 2019 was taken as the day of the first laboratory-confirmed case in Wuhan, China (according to [14]). The results of previous calculations for Austria, Italy, Spain, France, Moldova, Ukraine and Kyiv from [10,11] are also shown in Table 4. It can be seen that the longest epidemic durations are expected in the countries with the longest hidden periods (USA, Italy, Germany). The zero hidden period in France probably indicates the need for recalculations after obtaining more recent data on the number of cases. The predicted saturation value 129823 for this country (see Table 4) is 4.5% lower than the real number of cases, 135980, registered on May 7 after 20 days of observations (after April 18). The real numbers of cases in Spain, Moldova and Austria are also higher than the predicted saturation levels shown in Table 4. The SIR curves and markers representing the $V_j$ values taken for calculations ("circles"), comparisons ("triangles") and verification of calculations ("stars") are shown in Figs. 1 and 2 in different colors corresponding to the country or region (World: brown; USA: blue; Italy: green; Spain: yellow; UK: red; South Korea: magenta). Solid lines represent the number of cases (victims) V(t) = I(t) + R(t); dashed lines show the number of infected persons spreading the pathogen I(t). The numbers of laboratory-confirmed cases in Wuhan, China are shown in Fig. 1 by brown "squares". These values were calculated in [5] with the use of information from [14].
The dashed brown curve in Fig. 1 illustrates that more than 600 persons could have been spreading the infection on December 31, 2019, the day when China notified WHO about the situation in Wuhan (see the brown vertical line in Fig. 1). On January 23, 2020 this city was locked down, but the number of infected persons could be estimated at 3000, with hundreds of cases in the USA, Italy and Germany (the officially confirmed number of cases was 830 in mainland China, 2 in the Republic of Korea and 1 in the USA on this day). The big difference between the numbers of calculated (bold lines in Figs. 1 and 2) and actual cases ("triangles") is explained by the fact that many infected people do not have symptoms and there were and still are problems with testing. In particular, the hidden periods (time between the estimated epidemic beginning $t_0$ and the first confirmed case) can be rather long (see Fig. 1 and Table 4).
Recently, more and more evidence about the hidden periods of the epidemic has appeared in the media and literature. In particular, according to [10] the first COVID-19 case could have occurred in Italy around November 26, 2019 (see Table 4). This result correlates with the information from Giuseppe Remuzzi, director of the Mario Negri Institute for Pharmacological Research, that the "virus was circulating before we were aware of the outbreak in China" [15]. The spread of the infection was probably facilitated by the Military World Games held in Wuhan from 18 to 27 October with the participation of 9,300 athletes from more than 140 countries. Many participants became ill with COVID-19 symptoms and passed the infection on to their families [16]. A detailed statistical analysis [17] showed that the number of cases of pneumonia in the United States in January and February 2020 exceeded the previous year's figures, and this excess was higher in those states where the actual numbers of reported COVID-19 cases are larger. Fig. 1 illustrates that before March 1, 2020 the estimated numbers of cases in Italy and the USA were rather close. Nevertheless, on March 1 the number of confirmed cases in Italy was 1689 in comparison with 62 in the USA. This delay in the detection of infected persons and later quarantining [18] caused the huge recent number of cases in the USA.

Innov Biosyst Bioeng, 2020, vol. 4, no. 2

Fig. 1 shows that in the Republic of Korea the detected number of cases was already close to the estimated one on March 1 (compare the solid magenta line and the magenta "triangles"). Timely testing and isolation of patients allowed this country to avoid a large number of cases.

Reliability of predictions
It was already mentioned that during the initial stages of an epidemic the registered number of cases is much lower than the real one. This fact reduces the accuracy of any simulation using the registered values. Nevertheless, predictions have to be performed in order to estimate the final sizes and the durations of epidemics in different countries, even with limited accuracy.

Fig. 3. [...] [5] for mainland China. "Triangles" represent the official data set for China from [5]; "squares", the cumulative number of cases in Wuhan calculated with the information presented in [14]; "stars", the points corresponding to the epidemic in Italy [1], with the corresponding time shift.
Errors caused by incomplete data can be illustrated by two different predictions for the Republic of Korea. Both of them were performed with the use of the SIR model and the same method of parameter identification. The first one used the data for the period February 17 to March 12, 2020 and predicted the final accumulated number of cases $V_\infty = 8117$, $t_0 = -9$ days and March 20, 2020 as the final day of the epidemic (see [6]). The results of calculations presented in Table 2 illustrate that the difference in final sizes is about 25%, but the epidemic duration predictions differ by more than a factor of 5. Three different estimations for Italy [8][9][10] yielded final sizes ranging from 111548 to 225736; the hidden period estimates increased by 66 days with the use of more recent data sets.
Since the presented forecasts for the USA and the world are very long-term, they must be considered very preliminary and optimistic. The global pandemic dynamics is very unpredictable, since the situation is very different in different countries. In particular, there is no quarantine in Belarus and the word "coronavirus" is prohibited in Turkmenistan. In any case the global dynamics must be updated with the use of new data sets. Unfortunately, the new estimations will probably be more pessimistic. When the number of confirmed cases approaches the real one, the accuracy of SIR simulation may be rather high. An example is the latest forecast for Austria [10]. After 28 days of observation the predicted final size (see Table 4) is only 3.6% lower than the real number 15673 (May 7, 2020).

Comparison of epidemic dynamics in different countries
WHO data [1] and Fig. 1 show that the visible periods of the epidemic begin at different times in different countries. This fact allows comparing the initial period of the epidemic in one country with the SIR curve obtained for another country, where the registered number of cases is already approaching the real one. Such a comparison of the epidemic dynamics in Italy and mainland China was done in [7] (see Fig. 3). To synchronize the time, the time moment $t_s$ corresponding to the number of victims V = 76 (this cumulative number was confirmed in Italy on February 22, see [1]) was calculated with the optimal values of parameters for the epidemic in mainland China [5]. The result $t_s = 15.82545$ means that all the corresponding time moments for the data set for Italy have to be shifted by the value 52.82545 (since February 22 corresponds to the time moment $t_j = 37$, see Table 1 in [7]; the zero $t_j$ value corresponds to January 16, 2020). The "stars" in Fig. 3 represent the accumulated number of confirmed cases in Italy. It can be seen that the epidemic in Italy developed much more rapidly than in China. Thus, the application of the SIR model allowed predicting the higher final number of cases in Italy already on March 3, when the preprint [7] was submitted.

Control of testing and relaxation of quarantine
SIR simulation could be used to monitor the situation with COVID-19 testing. For example, in Ukraine PCR tests were introduced for pneumonia patients and medical staff only after April 10, 2020. To estimate how the change in the testing algorithm affected the epidemic dynamics in Ukraine and Kyiv, two series of calculations were performed in [10,11] with the use of data sets for the periods March 28 to April 10 and April 11-24.
Figs. 4 and 5 illustrate the results of these estimations. The corresponding SIR curves (solid for the second data set and dashed for the first one) are very different. Delayed proper testing could cost Ukraine at least 9,000 additional patients and an increase in the epidemic duration of 47 days (compare the blue lines in Fig. 4). That is why maximum PCR testing (especially for medical staff and patients with pneumonia) can be recommended in all countries as an effective means of reducing the scale and duration of the pandemic. The calculated SIR curves were used to monitor the epidemic dynamics after April 24, 2020 (see "stars" in Figs. 4 and 5). It can be seen that the situation in Kyiv develops according to the second prediction (see Fig. 5), but the number of cases in Ukraine increases much more rapidly than the second prediction, shown in Fig. 4 by the solid blue line. The obvious differences in the epidemic dynamics can be explained by the fact that the situation with testing in the Ukrainian provinces is much worse than in the capital. It is possible that the dynamics was also affected by the large number of quarantine violations in the provinces during the Orthodox Easter celebrations.
The problem of easing quarantine has become urgent in many countries. The SIR I(t) curves can be used to estimate possible risks. For example, the green line in Fig. 5 shows that in Kyiv the number of infected persons (they may feel completely healthy or have mild symptoms) is estimated at 100 on May 11, 2020 (the day of quarantine easing, $t_j = 68$). This means that the probability of meeting such a person is rather low: $p \approx 3.4 \cdot 10^{-5}$. But if you have M contacts, the probability of meeting at least one infected person increases according to the formula

$$P = 1 - (1 - p)^M. \tag{10}$$

Therefore, a person on duty in the subway or a trolleybus driver is quite likely to meet an infected person during, for example, 10 days of work (until the value of p decreases). Meeting an infected person does not mean getting infected, so masks and distancing in transport and other public places must remain mandatory. Workers in transport, trade, pharmacies and the police (all of whom are forced to have many contacts) must be provided with enhanced protection. People at risk should continue to refrain from traveling and from visiting indoor spaces, and should minimize visits to medical facilities. Where possible, remote work and study should be maintained.
For ordinary Kyiv citizens working in small groups, the risk of meeting an infected person depends on the transport situation. If metro cars and stations and surface transport are regularly decontaminated, the values of M and P for passengers will be small. Everyone (not only in Kyiv) can assess their own level of risk (allowable probability) using formula (10).
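The risk estimate discussed above can be tried out with a few lines of code. This is a minimal sketch, assuming independent contacts, so that the probability of meeting at least one infected person among M contacts is P = 1 − (1 − p)^M. The function names are hypothetical, and the allowable risk of 1% is an arbitrary illustrative choice.

```python
import math


def prob_meet_infected(p, contacts):
    """P = 1 - (1 - p)**M, evaluated in a numerically stable way for small p."""
    return -math.expm1(contacts * math.log1p(-p))


def max_contacts(p, allowed_risk):
    """Largest whole number of contacts M that keeps P at or below allowed_risk."""
    return math.floor(math.log1p(-allowed_risk) / math.log1p(-p))


P_SINGLE = 3.4e-5  # per-contact probability quoted in the text for Kyiv

for m in (10, 1_000, 100_000):
    print(f"M = {m:>7}: P = {prob_meet_infected(P_SINGLE, m):.4f}")

# Hypothetical allowable risk of 1%, chosen only for illustration.
print("contacts allowed at 1% risk:", max_contacts(P_SINGLE, 0.01))
```

The inverse function `max_contacts` simply solves the same formula for M, which may be the more practical form for a person deciding how many contacts they can accept.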

Could the pandemic be avoided?
The answer to this question may be rather positive. This is evidenced by the experience of Hong Kong, which introduced the relevant measures on December 31, 2019, as soon as alarming information from Wuhan appeared on a social media platform. The Hong Kong authorities immediately closed "transport ties with Wuhan" and increased "vigilance and temperature screenings at every border checkpoint, including the city's international airport and high-speed railway station in West Kowloon"; the hospitals were told "to report further cases of 'pneumonia of unknown origin'" [19]. As a result, the accumulated number of cases in Hong Kong was 1045 on May 9, 2020. For comparison, the number of infected people in Kyiv was 1.7 times higher.
Quarantine was announced all over Ukraine on March 12, 2020. This happened 9 days after the first COVID-19 case was confirmed in Chernivtsi region and 7 days before the first case in Kyiv. On May 11 the corresponding numbers of cases were 2396 and 1930, respectively. Taking into account that the population of Kyiv is more than three times that of Chernivtsi region, it can be concluded that the timely introduction of quarantine can effectively slow down the spread of the infection.

Conclusions
The SIR (susceptible-infected-removed) model and the statistical approach to parameter identification make it possible to obtain reliable estimations of the epidemic dynamics, e.g., the real time of the outbreak, the final size and duration of the epidemic, and the number of persons spreading the infection versus time. This information may be useful to regulate the quarantine activities and to predict the medical and economic consequences of the pandemic. The pandemic outbreak probably occurred in China not later than the end of September 2019; it could continue beyond mid-March 2021, and the number of infected people in the world could exceed 5 million.