Forecasting the COVID-19 transmission in Italy based on the minimum spanning tree of dynamic region network
 Academic Editor
 Ulrich Pfeffer
 Subject Areas
 Bioinformatics, Computational Biology, Epidemiology, Infectious Diseases
 Keywords
 Coronavirus Disease 2019 (COVID-19), Italy, Early warning signals, Region network, Dynamic network marker (DNM), Minimum spanning tree (MST)
 Copyright
 © 2021 Dong et al.
 Licence
 This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
 Cite this article
 Dong et al. 2021. Forecasting the COVID-19 transmission in Italy based on the minimum spanning tree of dynamic region network. PeerJ 9:e11603 https://doi.org/10.7717/peerj.11603
Abstract
Background
Italy surpassed 1.5 million confirmed Coronavirus Disease 2019 (COVID-19) infections on November 26, 2020, as its death toll rose rapidly in the second wave of the outbreak, placing a heavy burden on hospitals. It is therefore necessary to forecast potential future COVID-19 outbreaks and to issue early warnings, so that appropriate control measures can be implemented in time. However, real-time prediction of COVID-19 transmission and outbreaks is challenging because the epidemic intertwines biological and social systems.
Methods
By mining the dynamical information in region networks and short-term time series data, we developed a data-driven model, the minimum-spanning-tree-based dynamical network marker (MST-DNM), to quantitatively analyze and monitor the dynamical process of COVID-19 spreading. Specifically, we collected the daily numbers of confirmed COVID-19 cases in Italy from February 24, 2020 to November 28, 2020. When applied to the region network of Italy, the MST-DNM model monitors the whole process of COVID-19 transmission and successfully identifies early-warning signals. The interpretability and practical significance of the model are explained in detail in this study.
Results
The study of the dynamical changes of Italian region networks reveals the dynamics of COVID-19 transmission at the network level. Notably, MST-DNM relies only on small samples rather than years of time series data, so it has great potential for public surveillance of emerging infectious diseases.
Introduction
The world is currently witnessing a major and devastating pandemic with substantial mortality and morbidity: Coronavirus Disease 2019 (COVID-19) (Mohanty et al., 2020). The World Health Organization (WHO) declared it a public health emergency of international concern in January 2020 (Team, 2020; Lai et al., 2020). As of November 28, 2020, about 60 million cases and 1.45 million deaths had been confirmed globally. Italy is one of the most affected countries. For about two months, from mid-February to mid-April 2020, it was one of the main epicenters of the COVID-19 pandemic, and the epidemic reached its first peak during this period. The epidemic curve then gradually declined until early October 2020, after which the spread of infection accelerated again and was still accelerating at the time of writing (Perone, 2020). As shown in Fig. 1A, a new and more severe epidemic wave is sweeping Italy. As of November 28, Italy had recorded 54,363 deaths and 1,564,532 cases (Dong, Du & Gardner, 2020). There is therefore an urgent need for an effective, low-cost model on which to build an epidemic surveillance system that can help severely affected countries such as Italy anticipate future waves of COVID-19 outbreaks.
The COVID-19 pandemic has sparked an intense debate about the factors underlying the dynamics of the outbreak (Pacheco Coelho et al., 2020; Pequeno et al., 2020). Mathematical models of epidemiology help in understanding the dynamics of epidemics and are an important tool for evaluating the potential effects of preventive and control measures, especially when the characteristics of a disease are still unclear (Chatterjee et al., 2020; Khan et al., 2019; Cheng et al., 2020). Under these circumstances, by exploring the dynamical information in region networks and time series data, we employed a combined model, the minimum-spanning-tree-based dynamical network marker (MST-DNM) (Yang et al., 2020), to quantitatively describe the dynamics of COVID-19 transmission and thus identify early-warning signals of a new wave of the COVID-19 pandemic in Italy. This is the first time this model has been applied to an emerging infectious disease such as COVID-19. The model improves on our recently proposed concept, the so-called dynamical network biomarker (DNB), which determines the critical state of complex diseases by analyzing the dynamics of driver biomolecules (i.e., a group of genes and proteins that are the leading factors in the critical state transition) (Chen et al., 2012). DNB-based methods have been applied to a number of biological processes with remarkable results, including identifying the critical points of cell fate decision (Mojtahedi et al., 2016) and cellular differentiation (Richard et al., 2016), detecting the critical period of various biological processes (Liu et al., 2014; Liu et al., 2019; Chen et al., 2015; Chen et al., 2017), and predicting warning signals of influenza outbreaks (Yang et al., 2020; Chen et al., 2019; Chen et al., 2020).
By extracting the minimum spanning tree from the dynamic region network, we can describe how an infectious disease spreads among regions. Specifically, we collected the daily numbers of confirmed COVID-19 cases in Italy from February 24, 2020 to November 28, 2020. When applied to region networks constructed from geographical location and traffic conditions, the MST-DNM model monitors the whole process of epidemic spread in Italy and successfully identifies early-warning signals about two weeks in advance. The key role of the minimum spanning tree in this model, namely that it avoids issuing false warning signals caused by local abnormal correlations, has been described in detail in previous research (Yang et al., 2020). We therefore focus on the interpretability and practical significance of the MST-DNM model, as detailed in the Materials & Methods and Results sections. Because it handles nonlinear time series with small sample sizes on the Italian region network, the model is well suited to predicting potential COVID-19 outbreaks in Italy and may help develop new control strategies before a new wave of outbreaks.
Materials & Methods
Theoretical background
The spread of an infectious disease in a region is described as the dynamic evolution of a nonlinear system, and a COVID-19 outbreak is regarded as a qualitative state transition of that dynamic system (Yang et al., 2020). From the perspective of dynamic modelling, the stage just before a COVID-19 outbreak is regarded as the pre-outbreak stage, immediately after which the system undergoes a critical transition. The dynamical process of an epidemic system can thus be roughly modelled as three stages, similar to the dynamics of disease progression (Chen et al., 2012): (i) the normal stage, a stable stage with high resilience; (ii) the pre-outbreak stage, which is dynamically unstable but in which the epidemic is still controllable through appropriate measures; and (iii) the outbreak stage, an uncontrolled stage with highly elastic dynamic features. As shown in Fig. 1B, when the system moves from the normal stage to the pre-outbreak stage, the region network changes significantly and the indicator of our model rises sharply. Unlike traditional detection of the outbreak stage itself, our model can identify the pre-outbreak stage, which generally shows no obvious abnormalities but has a high potential to transition into a severe and irreversible stage.
Data collection
Data on the COVID-19 pandemic suitable for modeling can be obtained from the GitHub repository maintained by Johns Hopkins University (Dong, Du & Gardner, 2020), which aggregates publicly available data from multiple sources. The open Italian dataset we use is available from the COVID-19 in Italy repository at https://github.com/pcm-dpc/COVID-19. This database was created and is managed by the Italian Civil Protection Department and is updated and integrated daily (Italian Civil Protection D et al., 2020).
Algorithm applied in Italy
The MST-DNM model is illustrated in Fig. 2, and its application to Italy proceeds in the following four steps.
i. Modeling and mapping
Our model is applied to a region network to monitor COVID-19 transmission and outbreaks in Italy. The first step is therefore to construct the region network from the geographic distribution of the Italian regions and their adjacency information, which is listed in Table S1 of the Supplementary Information. In the network, each node represents a region or an autonomous province, and each edge represents the adjacency between two regions. A 21 × 279 data matrix of COVID-19 daily confirmed cases (21 regions and autonomous provinces by 279 days) is then mapped onto the network. The region network model of Italy is presented in Fig. 3.
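The mapping step can be sketched in Python, the language of the supplementary code, as follows. This is a minimal illustration rather than the authors' implementation: the full adjacency list is in Table S1, so the sketch uses only a hypothetical subset consisting of the five edges named in Table 2, and the helper names `build_region_network` and `map_series_to_nodes` are our own.

```python
from collections import defaultdict

# Hypothetical subset of the Italian region network: only the five edges named
# in Table 2 (node ids follow Fig. 3). The full adjacency list is in Table S1.
NODE_NAMES = {
    5: "Campania", 6: "Emilia-Romagna", 8: "Lazio", 10: "Lombardy",
    12: "Molise", 13: "Piedmont", 14: "Puglia", 20: "Valle d'Aosta",
    21: "Veneto",
}
EDGES = [(10, 21), (6, 21), (13, 20), (5, 8), (12, 14)]


def build_region_network(edges):
    """Return an undirected adjacency map {node: set of neighbouring nodes}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def map_series_to_nodes(matrix, node_ids):
    """Attach each row of a regions x days matrix to the matching node id."""
    if len(matrix) != len(node_ids):
        raise ValueError("need exactly one time series per node")
    return {node: list(row) for node, row in zip(node_ids, matrix)}


adj = build_region_network(EDGES)
print(sorted(adj[21]), "->", [NODE_NAMES[n] for n in sorted(adj[21])])
# [6, 10] -> ['Emilia-Romagna', 'Lombardy']
```

Keeping the network as an adjacency map makes the later per-edge weighting a simple loop over neighbour pairs.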
ii. Weighting and extracting
The region network from the first step can be represented as an undirected graph $G=(V,E)$ with a vertex set $V=\{v_i\}_{i=1}^{M}$ of M vertices and an edge set $E=\{e_j\}_{j=1}^{N}$ of N edges. In the Italian region network (Fig. 3), M is 21 and N is 34. To keep the application of our model easy to follow, Node 15 (Sardinia) is excluded from the subsequent calculations. When the model is applied to other regions, the construction of the region network should be reconsidered rather than simply excluding islands. In addition, as shown in Fig. S1, because the COVID-19 epicenter is located in northern Italy, whether Sardinia is included in the model has little influence on the final warning results.
The number of daily confirmed cases of a region is treated as a sample s, forming a time series. For each vertex $v_i$ of the region network on day t, there is thus a corresponding time series of confirmed cases $S_t^{v_i}=\{s_1,s_2,\dots,s_t\}$. To assign a weight $W_t^k$ to each edge $e_k$ of the region network on day t, we calculate the following statistics between the two vertices $v_i$, $v_j$ of the edge $e_k$:
$$W_t^k=\left|\Delta SD_t(i,j)\right|\times\left|\Delta PCC_t(i,j)\right| \tag{1}$$
where
$$\Delta SD_t(i,j)=\left|\overline{SD_t\left(S_t^{v_i}\right)+SD_t\left(S_t^{v_j}\right)}-\overline{SD_{t-1}\left(S_{t-1}^{v_i}\right)+SD_{t-1}\left(S_{t-1}^{v_j}\right)}\right| \tag{2}$$
is the differential standard deviation of the nodes $v_i$, $v_j$ between days t − 1 and t, with $SD_t(S_t^{v_i})$ and $SD_t(S_t^{v_j})$ the standard deviations of the time series of the two vertices $v_i$, $v_j$ on day t, and
$$\Delta PCC_t(i,j)=\left|PCC_t\left(S_t^{v_i},S_t^{v_j}\right)\right|-\left|PCC_{t-1}\left(S_{t-1}^{v_i},S_{t-1}^{v_j}\right)\right| \tag{3}$$
is the differential Pearson’s correlation coefficient between the two vertices $v_i$, $v_j$ of the edge $e_k$, where $PCC_t(S_t^{v_i},S_t^{v_j})$ and $PCC_{t-1}(S_{t-1}^{v_i},S_{t-1}^{v_j})$ are the Pearson’s correlation coefficients between $v_i$ and $v_j$ on days t and t − 1, respectively.
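Eqs. (1), (2) and (3) can be sketched for a single edge as below. Several details are assumptions not fixed by the text: the windows are the expanding series $s_1,\dots,s_t$ and $s_1,\dots,s_{t-1}$ exactly as written above, the overline in Eq. (2) is read as the mean of the two standard deviations, the standard deviation is the population one (`statistics.pstdev`), and a constant series is assigned a correlation of 0. The function names are hypothetical.

```python
import math
from statistics import fmean, pstdev


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant (assumption)."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def edge_weight(series_i, series_j, t):
    """Sketch of W_t^k for the edge (v_i, v_j) on day t (1-based, t >= 3)."""
    if t < 3 or t > min(len(series_i), len(series_j)):
        raise ValueError("t must satisfy 3 <= t <= series length")
    now_i, now_j = series_i[:t], series_j[:t]              # s_1..s_t
    prev_i, prev_j = series_i[:t - 1], series_j[:t - 1]    # s_1..s_{t-1}
    # Eq. (2): change of the mean SD of the two endpoints (population SD assumed)
    sd_now = (pstdev(now_i) + pstdev(now_j)) / 2
    sd_prev = (pstdev(prev_i) + pstdev(prev_j)) / 2
    delta_sd = abs(sd_now - sd_prev)
    # Eq. (3): change of the absolute Pearson correlation
    delta_pcc = abs(pearson(now_i, now_j)) - abs(pearson(prev_i, prev_j))
    # Eq. (1): product of the two absolute differentials
    return abs(delta_sd) * abs(delta_pcc)


# Perfectly correlated growth: |PCC| stays at 1, so the edge weight is 0.
print(edge_weight([1, 2, 3, 4], [2, 4, 6, 8], t=4))  # 0.0
```

By contrast, a sudden divergence such as `edge_weight([1, 2, 3, 1], [1, 2, 3, 9], t=4)` changes both the fluctuation and the correlation and therefore yields a positive weight.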
This yields an undirected, edge-weighted network that changes dynamically over time. The next step is to extract the minimum spanning tree of the dynamic region network at each time point. Specifically, on day t of the COVID-19 pandemic in Italy, we extract the minimum spanning tree $MST_t$, which spans all vertices in V using a subset of the edges in E, to describe how the Italian region network evolves as daily cases change. In this work, a classical minimum spanning tree algorithm, Prim’s algorithm, is applied to the differential weighted network $G_t$ at time t to obtain its minimum spanning tree $MST_t$.
iii. Calculating the early-warning indicator $I_t$
The MST-DNM indicator $I_t$ is then obtained as the sum of the edge weights of the minimum spanning tree, $I_t=\sum_{i=1}^{K}W_t^i$, where $W_t^i$ is the weight of edge $e_i$ of $MST_t$ at time t and K is the total number of edges in $MST_t$. Since a spanning tree on n nodes has n − 1 edges, K = 19 for the 20 nodes retained here. The procedure is given in Algorithm 1.
Algorithm 1 The indicator I_t using Prim’s algorithm
Require: the node set V of the weighted undirected graph; the function w(u, v), the weight of edge (u, v); the function adj(v), the set of nodes adjacent to v.
Ensure: I_t, the sum of the weights of the minimum spanning tree of the input graph.
I_t ← 0
choose an arbitrary node in V as the root
dis(root) ← 0
for each node v ∈ V − {root} do
    dis(v) ← ∞
end for
rest ← V
while rest ≠ ∅ do
    cur ← the node in rest with the minimum dis
    I_t ← I_t + dis(cur)
    rest ← rest − {cur}
    for each node v ∈ adj(cur) ∩ rest do
        dis(v) ← min(dis(v), w(cur, v))
    end for
end while
return I_t
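A runnable Python counterpart of Algorithm 1 is sketched below. Instead of a linear scan for the minimum dis, it uses a binary heap (`heapq`) with lazy deletion of stale entries, a standard variant of Prim’s algorithm that yields a tree of the same total weight. The function names `to_weight_map` and `mst_indicator` and the toy weights are hypothetical, and the graph is assumed to be connected.

```python
import heapq


def to_weight_map(edge_list):
    """Build a symmetric {u: {v: w}} map from (u, v, w) triples."""
    weights = {}
    for u, v, w in edge_list:
        weights.setdefault(u, {})[v] = w
        weights.setdefault(v, {})[u] = w
    return weights


def mst_indicator(weights):
    """Prim's algorithm; returns (I_t, MST edges) for a connected weighted graph."""
    nodes = list(weights)
    if not nodes:
        return 0.0, []
    root = nodes[0]                        # arbitrary root, as in Algorithm 1
    visited = {root}
    heap = [(w, root, v) for v, w in weights[root].items()]
    heapq.heapify(heap)
    total, tree = 0.0, []
    while heap and len(visited) < len(nodes):
        w, u, v = heapq.heappop(heap)      # cheapest edge leaving the tree
        if v in visited:
            continue                       # stale entry, v already in the tree
        visited.add(v)
        total += w                         # I_t <- I_t + dis(cur)
        tree.append((u, v, w))
        for nxt, w_next in weights[v].items():
            if nxt not in visited:
                heapq.heappush(heap, (w_next, v, nxt))
    if len(visited) != len(nodes):
        raise ValueError("graph is not connected")
    return total, tree


# Toy 4-node network with made-up edge weights W_t^k.
toy = to_weight_map([(1, 2, 0.5), (1, 3, 0.25), (2, 3, 0.125),
                     (3, 4, 0.75), (2, 4, 1.0)])
I_t, tree = mst_indicator(toy)
print(I_t)   # 1.125
print(tree)  # [(1, 3, 0.25), (3, 2, 0.125), (3, 4, 0.75)]
```

The heap brings each step down to O(log |E|) instead of a scan over all remaining nodes; for a 20-node network either version is fast, so the choice is mainly about idiom.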
According to DNB theory, during the critical stage there are two possible cases for the minimum spanning tree $MST_t$ at time point t:
(i) all nodes in $MST_t$ are DNB members;
(ii) $MST_t$ contains both DNB and non-DNB members.
In both cases, the statistical indicator $I_t$ changes significantly, as presented in Table 1. Hence $MST_t$, through the indicator $I_t$ and the edge weights $W_t^k$, can monitor the dynamical process of COVID-19 spread between regions and issue warning signals in time.
iv. Identifying early-warning points
In previous studies, machine learning methods, e.g., logistic regression (Yang et al., 2020), were used to identify the appearance of critical points from years of high-dimensional data. However, COVID-19 emerged at the beginning of 2020, so the available time series are very short, which makes it difficult for machine learning algorithms to learn appropriate parameters and features. We therefore use a fold-change threshold, an index of volatility, to detect early-warning signals. Specifically, a 2-fold change threshold is applied to identify significant changes of the indicator $I_t$.
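The text does not specify the baseline against which the 2-fold change is measured, so the sketch below treats it as an assumption: a day t is flagged when $I_t$ is at least `fold` times the mean of the preceding `window` values (the hypothetical default `window=1` compares each day with the day before). The function name and the indicator values are made up for illustration.

```python
from statistics import fmean


def fold_change_warnings(indicator, fold=2.0, window=1):
    """Indices t where indicator[t] >= fold * mean(indicator[t-window:t]).

    The baseline (mean of the `window` preceding values) is a hypothetical
    choice; the source only states that a 2-fold change threshold is used.
    """
    flags = []
    for t in range(window, len(indicator)):
        baseline = fmean(indicator[t - window:t])
        if baseline > 0 and indicator[t] >= fold * baseline:
            flags.append(t)
    return flags


# Made-up indicator values I_t for six consecutive days.
series = [1.0, 1.1, 1.0, 2.5, 2.6, 1.2]
print(fold_change_warnings(series))            # [3]
print(fold_change_warnings(series, window=3))  # [3]
```

A wider window smooths the baseline, so a single noisy day is less likely to trigger (or mask) a warning.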
Table 1. Changes of the variables in the MST-DNM model as the system approaches the critical point.
| Case | Nodes | $SD_t$ | $\lvert\Delta SD_t(i,j)\rvert$ | $PCC_t(i,j)$ | $\lvert\Delta PCC_t(i,j)\rvert$ | $W_t^k$ | $I_t$ |
|---|---|---|---|---|---|---|---|
| 1 | All DNM | ↗ | ↗ | ↗ | ↗ | ↗ | ↗ |
| 2 | DNM and non-DNM | D ↗ | ↗ | D ↗ | ↗ | ↗ | ↗ |
|  |  | N → |  | N ↘ | ↗ |  |  |
Notes:
When the system moves from time point t − 1 to t, it is approaching the critical point.
“↗” denotes an increase of a variable; “↘” denotes a decrease; “→” denotes no significant change.
“D” denotes DNM members; “N” denotes non-DNM members.
$SD_t$ is the standard deviation at time point t; $PCC_t(i,j)$ is the Pearson’s correlation coefficient between two nodes $v_i$, $v_j$.
The significance of indicator $I_t$
COVID-19 transmission is a complicated dynamic system involving many biomedical and social factors. Because of the massive number of influencing factors, it is difficult to describe the transmission dynamics mathematically in a high-dimensional space. The sharp or qualitative transition of the region network from the normal state to the outbreak state corresponds to a bifurcation point in dynamic system theory (Gilmore, 1993). According to this theory, as the system approaches the critical point, its dynamics become confined to a one- or two-dimensional space, where they can be represented in quite simple forms. This is the theoretical basis for developing a general indicator that describes the dynamics of COVID-19 transmission.
Accordingly, the variables in Formula (1) can be interpreted as follows: (i) $\left|\Delta SD_t(i,j)\right|$ describes the change in the fluctuation of case growth in two adjacent regions relative to the previous time point; (ii) $\left|\Delta PCC_t(i,j)\right|$ describes the change in the COVID-19 interaction between two adjacent regions relative to the previous time point. Edges with larger weights therefore deserve attention, because a large weight means that the regions joined by the edge are not only seeing their own epidemic situation worsen but are also strongly affecting the surrounding regions. The indicator $I_t$, the sum of the weights $W_t^k$ over all edges of $MST_t$, can thus track the change of a group of weighted differential networks.
Results
Early warning of COVID19 outbreaks in Italy
We collected the daily confirmed COVID-19 cases in Italy from February 24, 2020 to November 28, 2020. The outbreak points of COVID-19 are defined as the peaks of daily cases.
As shown in Fig. 4, the MST-DNM model identifies an early-warning signal for each COVID-19 outbreak. For the first wave, from mid-February to mid-April, the early-warning signal was issued on March 6, about 15 days before the outbreak point, so the model successfully served as an early warning.
On 9 March 2020, Italian Prime Minister Giuseppe Conte announced a nationwide lockdown restricting the movement of people, thereby reducing the possibility of human-to-human infection (Chintalapudi, Battineni & Amenta, 2020; Remuzzi & Remuzzi, 2020). From the last week of March, the statistics became cautiously optimistic and the number of daily cases began to stabilize. For the second wave, which began in early October and later developed into a larger outbreak, the indicator $I_t$ responded sensitively and increased significantly about 10 days before the number of confirmed cases skyrocketed. In addition, the indicator showed a continuous, fluctuating downward trend after November 5, suggesting that the number of daily cases in Italy had begun to peak. The successful early warning of each wave of COVID-19 outbreaks in Italy demonstrates the robustness and effectiveness of the MST-DNM model in detecting real-time warning signals for emerging infectious diseases.
The dynamics of COVID19 transmission in Italy
Dynamic monitoring map
To better illustrate the principle of the MST-DNM model, we present the dynamic evolution of the COVID-19 transmission network in Italy. As shown in Fig. 5, the daily number of newly confirmed cases in each region, scaled with MinMaxScaler, is mapped to the color of each node, and the correlation between the two vertices of an edge is mapped to the thickness of that edge in the tree network. MinMaxScaler subtracts the minimum value of a feature from each value and divides the result by the feature range, i.e., the difference between the original maximum and minimum. It preserves the shape of each region’s original data distribution and makes the coloring of each node independent of the other regions. On October 20, the edges had clearly become thicker before the nodes turned darker, indicating that our model identified the early-warning signal in the pre-outbreak stage, when the number of confirmed cases had not yet increased significantly. Later, on October 26, the edges continued to thicken, indicating that the epidemic in Italy might continue to worsen.
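Per-region min-max scaling, as described above, amounts to $x' = (x - \min)/(\max - \min)$ applied to each region’s series separately. A dependency-free sketch follows; for non-constant series it should agree with scikit-learn’s `MinMaxScaler` at its default `feature_range=(0, 1)` when each region is one column. The helper names are hypothetical, and mapping a constant series to zeros is our own choice.

```python
def min_max_scale(series):
    """Scale one region's series to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)  # constant series: map to 0 (our choice)
    return [(x - lo) / (hi - lo) for x in series]


def scale_each_region(cases_by_region):
    """Apply min-max scaling to every region independently."""
    return {region: min_max_scale(s) for region, s in cases_by_region.items()}


# Made-up daily counts: each region is scaled against its own range only.
print(scale_each_region({"region_a": [10, 20, 30], "region_b": [100, 400, 250]}))
# {'region_a': [0.0, 0.5, 1.0], 'region_b': [0.0, 1.0, 0.5]}
```

Because each region uses only its own minimum and maximum, a region with few cases can still reach full color intensity when its own curve peaks.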
Warning function of MSTDNM in Italy
As of November 28, the five regions with the highest cumulative numbers of confirmed cases in Italy were Lombardy, Piedmont, Campania, Veneto and Emilia-Romagna, corresponding to Nodes 10, 13, 5, 21 and 6 in Fig. 6A, respectively. Among them, Lombardy is considered the epicenter of the COVID-19 outbreak in Italy (Grasselli, Pesenti & Cecconi, 2020; Tuite et al., 2020).
As shown in Fig. 6A, the thickness of the edges divides the dynamic region network into two local networks, centered on Node 10 (Lombardy) and Node 5 (Campania), respectively. Our model thus flagged two high-risk areas, and indeed a large number of new cases were confirmed in these regions over the following month, as shown in Fig. 6B. The nodes between the two local networks, i.e., Node 17 (Tuscany) and Node 19 (Umbria), also deserve attention: from October 26 to November 28, the growth rate of new cases in Tuscany and Umbria exceeded 200%.
As described in Table 2, the thicker edges in the two local networks on October 26, such as $e(v_{10},v_{21})$, $e(v_6,v_{21})$, $e(v_{13},v_{20})$, $e(v_5,v_8)$ and $e(v_{12},v_{14})$, deserve attention. By November 28, the regions at the endpoints of these edges, such as Lombardy, Campania, Lazio and Piedmont, showed high growth rates and large numbers of new cases. This further verifies the early-warning function of MST-DNM, which reflects not only each region’s own epidemic situation but also its impact on the surrounding regions.
Table 2. Edges with larger weights in the two local networks on October 26 and the related regions.
| Local region network # | Edge | Related regions | Newly added cases | Growth rate |
|---|---|---|---|---|
| 1 | $e(v_{10},v_{21})$ | Lombardy | 244,726 | 154.96% |
|  |  | Veneto | 95,506 | 210.07% |
|  | $e(v_6,v_{21})$ | Emilia-Romagna | 71,379 | 148.93% |
|  |  | Veneto | 95,506 | 210.07% |
|  | $e(v_{13},v_{20})$ | Piedmont | 107,150 | 187.46% |
|  |  | Valle d’Aosta | 3,747 | 140.39% |
| 2 | $e(v_5,v_8)$ | Campania | 111,077 | 273.91% |
|  |  | Lazio | 80,262 | 223.35% |
|  | $e(v_{12},v_{14})$ | Molise | 3,193 | 242.08% |
|  |  | Puglia | 37,341 | 249.39% |
Notes:
“Newly added cases” refers to the total number of cases newly confirmed in the corresponding region from October 26 to November 28.
“Growth rate” refers to the number of cases newly confirmed in the corresponding region from October 26 to November 28 divided by its cumulative cases on October 26.
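The growth-rate definition in these notes can be checked with simple arithmetic. The sketch below, with hypothetical helper names, inverts the definition to recover the cumulative count on October 26 implied by a table row; because the percentages are rounded to two decimals, the recovered value is only approximate (about 157,900 for Lombardy).

```python
def growth_rate(new_cases, cumulative_at_start):
    """Growth rate as defined in the Table 2 notes (as a fraction)."""
    return new_cases / cumulative_at_start


def implied_cumulative(new_cases, growth_rate_pct):
    """Invert the definition: cumulative cases on October 26 implied by a row."""
    return new_cases / (growth_rate_pct / 100)


# Lombardy row of Table 2: 244,726 new cases, growth rate 154.96%.
c0 = implied_cumulative(244_726, 154.96)
print(round(c0, -2))  # 157900.0
```

A growth rate above 100% therefore simply means that more cases were confirmed in the five weeks after October 26 than in the whole epidemic before that date.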
Application of MSTDNM in northern Italy
As shown in Fig. 6B, several areas of northern Italy, such as Lombardy and Veneto, were the most severely affected by COVID-19. Our model can be applied not only to the entire country but also to a smaller area. To further verify its effectiveness, we also applied it to identify early-warning signals of COVID-19 outbreaks in northern Italy; the results are presented in Figs. S2–S3 of the Supplementary Information.
Performance comparison
Machine learning algorithms have also been used to forecast the COVID-19 epidemic (Parhusip, 2020; Singh et al., 2020; Parbat & Chakraborty, 2020). Treating the identification of early-warning signals of COVID-19 outbreaks as a binary classification problem, we compared the performance of our combined model with that of a support vector machine (SVM). The AUC of MST-DNM is 0.9318, whereas that of SVM is 0.9076, so when only daily confirmed cases are available, MST-DNM performs better than SVM. In addition, the SVM model issued an early-warning signal on September 28, 2020, which is too early to be of practical significance for the second wave, whereas our model issued its warning about 10 days before the number of confirmed cases skyrocketed. Compared with traditional machine learning algorithms, the MST-DNM model has several inherent strengths. First, it is a model-free approach without any training or testing process; there is no feature selection, and the method depends solely on the statistical conditions of the model. Second, our approach does not require a large sample, so it can perform well with only small samples and can therefore be applied to describe and monitor emerging infectious diseases such as COVID-19. Third, our combined model can describe the dynamic process of COVID-19 spread through the minimum spanning tree of dynamic region networks.
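The text does not state how the AUC was computed. One standard way, shown below as a sketch, uses the rank (Mann-Whitney) interpretation: the AUC equals the probability that a randomly chosen positive (warning) day receives a higher score than a randomly chosen negative day, with ties counted as one half. The function name, labels and scores are made up for illustration.

```python
def auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as 1/2; this equals the area under the ROC curve.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = day inside a warning period) and indicator scores.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic, which is harmless for a few hundred days of data; a rank-based formula would scale better for longer series.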
Discussion
A new wave of the COVID-19 epidemic is sweeping the world. On November 26, 2020, the number of people diagnosed with COVID-19 in Italy exceeded 1.5 million, and the death toll rose rapidly during the second wave, placing a heavy burden on hospitals. To prevent a new wave of the COVID-19 pandemic, or at least to reduce the magnitude of outbreaks, it is essential to build a surveillance system that relies entirely on reliable, readily available information such as the number of daily cases.
Unlike critical transition analysis based on the DNB of complex diseases with genomic datasets, here the DNB method has been improved and applied to macro-level region networks. Its successful application in Italy shows that MST-DNM is a model-free, data-driven method with great potential for real-time monitoring of emerging infectious diseases. Moreover, this is the first time that an improved DNB-based method has been applied to predict COVID-19 outbreaks. Unlike previous studies that used DNB-based methods to predict influenza outbreaks (Yang et al., 2020; Chen et al., 2019; Chen et al., 2020), our study relies on short time series rather than years of data, so it can be employed to describe and monitor emerging infectious diseases such as COVID-19. In addition, this paper explains the practical significance and early-warning function of the MST-DNM model in detail, which we believe is an important step from theory to practice. Notably, the MST-DNM model in this work relies entirely on records of daily confirmed cases and still achieves satisfactory performance. Given more information on the spread of the COVID-19 epidemic, the monitoring model is expected to forecast COVID-19 transmission and outbreaks with greater sensitivity and accuracy.
Although the proposed model achieved good results, this study has some limitations:
(i) The MST-DNM model in our work relies entirely on records of daily cases. Data on the number of people tested in a region would allow the epidemic situation there to be measured more accurately, which may be a direction for future model improvement.
(ii) For recognizing early-warning signals, future work could consider alternatives to the 2-fold change criterion and the effect a different choice would have on the prediction results.
(iii) The experiments in this paper were performed on COVID-19 outbreaks in Italy. Future work could examine the proposed model in other regions or countries.
Conclusions
In this study, we developed a combined model of the dynamic network marker and the minimum spanning tree, based solely on daily cases, to describe and forecast COVID-19 outbreaks in Italy. To put theory into practice, we also explained the significance and early-warning function of the model indicators in detail. By extracting the minimum spanning tree from the dynamic region network, the model effectively identified early-warning signals with an average lead time of about two weeks before the catastrophic transition into COVID-19 outbreaks in Italy. Through the study of network dynamics in Italy, this paper reveals the spread of COVID-19 at the network level. Notably, MST-DNM relies only on small samples rather than multi-year data, so it has great potential for timely monitoring of emerging infectious diseases.
Supplemental Information
Raw data and code
The daily COVID-19 case data for Italy from February 24, 2020 to November 28, 2020 are in ‘Italy_covid19_data.csv’. Rows represent dates and columns represent regions; each entry is the number of daily cases.
The code is in ‘myTools.py’, ‘calculate_Italy.py’ and ‘calculate_Northern_Italy.py’; ‘myTools.py’ contains the necessary calculation functions. With a Python environment and the corresponding packages installed, the code can be run to reproduce the calculations. ‘Italy_mst.json’ contains the weighted minimum spanning tree for each day, ‘Italy_daily_cases.xls’ contains the total daily confirmed cases over all regions of Italy, and ‘Italy_weight_sum.xls’ contains the daily MST-DNM indicator. The files for northern Italy are analogous. The ‘data’ and the code must be placed in the same directory.
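A minimal loader for ‘Italy_covid19_data.csv’ is sketched below. The exact header layout is not described, so the sketch assumes that the first row holds column names and, optionally, that the first column holds dates; it returns one series per region, i.e., the regions × days orientation of the 21 × 279 matrix used in the Methods. The function name `load_daily_cases` and the toy CSV are hypothetical.

```python
import csv
import io


def load_daily_cases(fileobj, has_date_column=True):
    """Read a days x regions CSV into {region: [daily cases, ...]}.

    Assumed layout: the first row holds column names and, if has_date_column
    is True, the first column holds dates (skipped).
    """
    reader = csv.reader(fileobj)
    header = next(reader)
    start = 1 if has_date_column else 0
    regions = header[start:]
    series = {name: [] for name in regions}
    for row in reader:
        if not row:
            continue                     # tolerate blank lines
        for name, value in zip(regions, row[start:]):
            series[name].append(int(float(value)))
    return series


# Toy stand-in for the real file; with the real data one would use e.g.
# with open("Italy_covid19_data.csv", newline="") as f: load_daily_cases(f)
toy_csv = io.StringIO("date,region_a,region_b\n"
                      "2020-02-24,1,4\n2020-02-25,2,5\n2020-02-26,3,6\n")
print(load_daily_cases(toy_csv))  # {'region_a': [1, 2, 3], 'region_b': [4, 5, 6]}
```

Returning per-region lists transposes the file’s days × regions layout into the per-node time series that the edge-weight step consumes.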