Statistics based predictions of coronavirus 2019-nCoV spreading in mainland China

Background. The epidemic outbreak caused by coronavirus 2019-nCoV is of great interest to researchers because of the high rate of spread of the infection and the significant number of fatalities. A detailed scientific analysis of the phenomenon is yet to come, but the public is already interested in the duration of the epidemic and the expected numbers of patients and deaths. Long-term predictions require complicated mathematical models, whose many unknown parameters take considerable effort to identify and compute. In this article, some preliminary estimates are presented. Objective. Since reliable long-term data are available only for mainland China, we try to predict the epidemic characteristics only for this area. We estimate some of the epidemic characteristics and present the most reliable curves of the numbers of victims, infected and removed persons versus time. Methods. In this study, we use the well-known SIR model for the dynamics of an epidemic, the known exact solution of the SIR equations, and a statistical approach developed earlier for investigating a childhood disease outbreak that occurred in Chernivtsi (Ukraine) in 1988-1989. Results. The optimal values of the SIR model parameters were identified with the statistical approach. The numbers of infected, susceptible and removed persons versus time were predicted. Conclusions. A simple mathematical model was used to predict the characteristics of the epidemic caused by coronavirus 2019-nCoV in mainland China. Further research should focus on updating the predictions with fresh data and on using more complicated mathematical models.


INTRODUCTION
Here, we consider the development of the epidemic outbreak caused by coronavirus 2019-nCoV (see, e.g., [1][2][3]). Since reliable long-term data are available only for mainland China, we try to predict the number of victims V of this virus only for this area. First estimates of the exponential growth of V, typical of the initial stage of every epidemic (see, e.g., [4]), were made in [3]. Long-term predictions require more complicated mathematical models; for example, a susceptible-exposed-infectious-recovered (SEIR) model was used in [2]. However, such models require more effort to identify their unknown parameters, and this procedure may be especially difficult when reliable data are limited.
In this study, we use the well-known SIR model for the dynamics of an epidemic [4][5][6][7][8]. For parameter identification, we use the exact solution of the SIR equations and the statistical approach developed in [4]. These methods were previously applied to investigate a childhood disease outbreak that occurred in Chernivtsi (Ukraine) in 1988-1989. We estimate some of the epidemic characteristics and present the most reliable curves of the numbers of victims, infected and removed persons versus time.

Data
We analyze the daily numbers of confirmed cases in mainland China reported by the China National Health Commission [1]. Table 1 lists the corresponding time moments $t_j$, from 0 to 24, and the numbers of victims $V_j$ (confirmed cumulative cases of coronavirus 2019-nCoV infection) used in the calculations. Table 1 shows that the precise moment $t_0$ at which the epidemic began is unknown. Therefore, the optimization procedure has to determine the optimal value of this parameter together with the other parameters of the SIR model.
The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Exact solution of SIR-equations
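The SIR-model for an infectious disease can be written as follows [6,7]:

$$\frac{dS}{dt} = -\alpha S I, \qquad (1)$$

$$\frac{dI}{dt} = \alpha S I - \rho I, \qquad (2)$$

$$\frac{dR}{dt} = \rho I. \qquad (3)$$

The number of susceptible persons is $S$, infected $I$, removed $R$; the infection and immunization rates are $\alpha$ and $\rho$, respectively. Since the right-hand sides of (1)-(3) sum to zero, the total number of persons $S + I + R = N$ is constant.

To determine the initial conditions for the set of equations (1-3), let us suppose that at the moment $t_0$ there is exactly one infected person:

$$I(t_0) = 1, \quad R(t_0) = 0, \quad S(t_0) = N - 1. \qquad (4)$$

It follows from (1) and (2) that

$$\frac{dI}{dS} = -1 + \frac{\rho}{\alpha S}. \qquad (5)$$

Integration of (5) with the initial conditions (4) yields:

$$I = N - S + \frac{\rho}{\alpha}\ln\frac{S}{N - 1}. \qquad (6)$$

Function $I$ has a maximum at $S = \rho/\alpha$ and tends to zero as $t \to \infty$, see [6,7]. By contrast, the number of susceptible persons at infinity remains positive, $S_\infty > 0$, and can be calculated with the use of (6) from the non-linear equation

$$N - S_\infty + \frac{\rho}{\alpha}\ln\frac{S_\infty}{N - 1} = 0. \qquad (7)$$

In [4] the equations (1-3) were solved by introducing the function $V(t) = I(t) + R(t) = N - S(t)$, corresponding to the number of victims. Since $dV/dt = -dS/dt = \alpha S I$, substituting $S = N - V$ and $I$ from (6) gives

$$\frac{dV}{dt} = \alpha (N - V)\left[V + \frac{\rho}{\alpha}\ln\frac{N - V}{N - 1}\right], \qquad V(t_0) = 1. \qquad (8)$$

The integration of equation (8) yields:

$$t = t_0 + \frac{1}{\alpha}\, F\!\left(V, N, \frac{\rho}{\alpha}\right), \qquad (9)$$

$$F\!\left(V, N, \frac{\rho}{\alpha}\right) = \int_1^V \frac{dz}{(N - z)\left[z + \frac{\rho}{\alpha}\ln\frac{N - z}{N - 1}\right]}. \qquad (10)$$

Thus, for every set of parameters $N$, $\alpha$, $\rho$, $t_0$ and a fixed value of $V$, the integral (10) can be calculated and the corresponding moment of time can be determined from (9). Then $I$ can be calculated from (6) by putting $S = N - V$, and $R$ from $R = V - I$.

All rights reserved. No reuse allowed without permission. doi: https://doi.org/10.1101/2020.02.12.20021931 (medRxiv preprint)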
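The computation described in this section can be sketched numerically as follows. This is a minimal illustration, not the author's code: it assumes NumPy and SciPy, evaluates the integral (10) with SciPy's adaptive quadrature `quad`, and uses hypothetical helper names (`victims_integral`, `time_of`, `state_at`, `sir_rhs`) and hypothetical parameter values chosen only to exercise the formulas. As a consistency check, it compares the parametric solution with a direct numerical integration of (1)-(3) by `solve_ivp`.

```python
import numpy as np
from scipy.integrate import quad, solve_ivp


def victims_integral(V, N, ratio):
    """Integral (10): F(V, N, rho/alpha) with ratio = rho/alpha.
    Requires 1 <= V < N - S_inf, so that I stays positive on [1, V]."""
    def integrand(z):
        return 1.0 / ((N - z) * (z + ratio * np.log((N - z) / (N - 1.0))))
    value, _ = quad(integrand, 1.0, V, epsabs=1e-13, epsrel=1e-10)
    return value


def time_of(V, N, alpha, rho, t0):
    """Equation (9): moment of time at which the number of victims equals V."""
    return t0 + victims_integral(V, N, rho / alpha) / alpha


def state_at(V, N, alpha, rho):
    """S, I, R for a given V: S = N - V, I from (6), R = V - I."""
    S = N - V
    I = N - S + (rho / alpha) * np.log(S / (N - 1.0))
    return S, I, V - I


def sir_rhs(t, y, alpha, rho):
    """Right-hand sides of the SIR equations (1)-(3)."""
    S, I, R = y
    return [-alpha * S * I, alpha * S * I - rho * I, rho * I]


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to exercise the formulas.
    N, alpha, rho, t0 = 1000.0, 0.0005, 0.1, 0.0
    V = 500.0
    t_V = time_of(V, N, alpha, rho, t0)
    S, I, R = state_at(V, N, alpha, rho)
    # Direct numerical integration of (1)-(3) from the initial conditions (4).
    sol = solve_ivp(sir_rhs, (t0, t_V), [N - 1.0, 1.0, 0.0],
                    args=(alpha, rho), rtol=1e-10, atol=1e-8)
    S_num, I_num, R_num = sol.y[:, -1]
    print(f"t(V) = {t_V:.4f}")
    print(f"exact:   S = {S:.3f}, I = {I:.3f}, R = {R:.3f}")
    print(f"numeric: S = {S_num:.3f}, I = {I_num:.3f}, R = {R_num:.3f}")
```

Because (9)-(10) give $t$ as a function of $V$ rather than the reverse, a curve $V(t)$ is obtained by evaluating `time_of` over a grid of $V$ values below $N - S_\infty$; at that limit $I$ vanishes and the integral (10) diverges.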

Statistical approach for parameter identification. Linear regression
As in paper [4], we use the fact that the random function $F(V, N, \rho/\alpha)$ defined by the integral (10) depends linearly on time, $F = \alpha t - \alpha t_0$ (see (9)). Then, for every pair of values of $N$ and $\rho/\alpha$, we can apply linear regression (see [9]) to the points $(t_j, F(V_j, N, \rho/\alpha))$ and obtain the corresponding values of $t_0$ and $\alpha$. The optimal (most reliable) values of $N$ and $\rho/\alpha$ correspond to the maximum value of the correlation coefficient $r$ (see [4]).
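A sketch of this identification procedure follows, assuming NumPy and SciPy (`quad` for the integral (10), `scipy.stats.linregress` for the regression). The candidate grids for $N$ and $\rho/\alpha$, the helper names (`victims_integral`, `is_admissible`, `fit_for`, `grid_search`) and the synthetic check data are all hypothetical; the study's actual grids and the Table 1 data are not reproduced here. Candidate pairs for which $I$ would become non-positive within the observed range are skipped, because the integral (10) is then undefined.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import linregress


def victims_integral(V, N, ratio):
    """Integral (10): F(V, N, rho/alpha) with ratio = rho/alpha."""
    def integrand(z):
        return 1.0 / ((N - z) * (z + ratio * np.log((N - z) / (N - 1.0))))
    value, _ = quad(integrand, 1.0, V, epsabs=1e-13, epsrel=1e-10)
    return value


def is_admissible(N, ratio, V_max):
    """I(V) = V + ratio*ln((N - V)/(N - 1)) is concave with I(1) = 1, so it stays
    positive on [1, V_max] exactly when it is positive at V_max."""
    return V_max < N and V_max + ratio * np.log((N - V_max) / (N - 1.0)) > 0.0


def fit_for(N, ratio, t, V):
    """Regress F_j = F(V_j, N, rho/alpha) on t_j: slope = alpha, intercept = -alpha*t0."""
    F = np.array([victims_integral(v, N, ratio) for v in V])
    res = linregress(t, F)
    alpha = res.slope
    t0 = -res.intercept / res.slope
    return res.rvalue, alpha, t0


def grid_search(t, V, N_grid, ratio_grid):
    """Return (r, N, rho/alpha, alpha, t0) for the grid point with the largest r."""
    best = None
    for N in N_grid:
        for ratio in ratio_grid:
            if not is_admissible(N, ratio, max(V)):
                continue  # integral (10) undefined for this pair
            r, alpha, t0 = fit_for(N, ratio, t, V)
            if best is None or r > best[0]:
                best = (r, N, ratio, alpha, t0)
    return best


if __name__ == "__main__":
    # Synthetic check data (hypothetical), generated from (9) with known parameters.
    N_true, ratio_true, alpha_true, t0_true = 1000.0, 200.0, 0.0005, 2.0
    V_obs = np.array([10.0, 30.0, 100.0, 200.0, 400.0, 600.0, 800.0])
    t_obs = np.array([t0_true + victims_integral(v, N_true, ratio_true) / alpha_true
                      for v in V_obs])
    r, N, ratio, alpha, t0 = grid_search(t_obs, V_obs,
                                         N_grid=[900.0, 1000.0, 1100.0, 1200.0],
                                         ratio_grid=[150.0, 200.0, 250.0])
    print(f"r = {r:.6f}, N = {N:.0f}, rho/alpha = {ratio:.0f}, "
          f"alpha = {alpha:.2e}, t0 = {t0:.3f}")
```

Since the synthetic observations come from the model itself, the search should recover the generating parameters; with real data, the resolution of the grid limits the precision of the identified values. The admissibility test relies on $I(V)$ from (6) being concave in $V$ with $I(1) = 1$, so checking its sign at the largest observed $V$ is sufficient.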

RESULTS
The optimal values of the parameters were calculated: […] means that these persons can catch the infection. Such a situation needs additional analysis, in particular with the use of more complicated models (see, e.g., [10]).

CONCLUSIONS
A simple mathematical model was used to predict the characteristics of the epidemic caused by coronavirus 2019-nCoV in mainland China. Further research should focus on updating the predictions with fresh data and on using more complicated mathematical models.