ADIDUNET—a segmentation model for COVID19 infection from lung CT scans
 Academic Editor
 Faizal Khan
 Subject Areas
 Human-Computer Interaction, Artificial Intelligence, Computer Aided Design, Computer Vision
 Keywords
 COVID-19 pulmonary infection, Dense network, Attention gate, Improved dilation convolution, UNET, Lung CT segmentation
 Copyright
 © 2021 Joseph Raj et al.
 Licence
 This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
 Cite this article
 2021. ADIDUNET—a segmentation model for COVID19 infection from lung CT scans. PeerJ Computer Science 7:e349 https://doi.org/10.7717/peerjcs.349
Abstract
Currently, the new coronavirus disease (COVID-19) is one of the biggest health crises threatening the world. Automatic detection from computed tomography (CT) scans is a classic method to detect lung infection, but it faces problems such as high variations in intensity, indistinct edges near the infected lung region, and noise due to the data acquisition process. Therefore, this article proposes a new deep network for COVID-19 pulmonary infection segmentation, referred to as the Attention Gate-Dense Network-Improved Dilation Convolution-UNET (ADID-UNET). The dense network replaces the convolution and max pooling functions to enhance feature propagation and address the vanishing gradient problem. An improved dilation convolution is used to increase the receptive field of the encoder output and obtain more edge features from the small infected regions. The integration of an attention gate into the model suppresses the background and improves prediction accuracy. The experimental results show that the ADID-UNET model can accurately segment COVID-19 lung infected areas, with performance measures greater than 80% for metrics such as Accuracy, Specificity and Dice Coefficient (DC). Further, when compared to other state-of-the-art architectures, the proposed model showed excellent segmentation effects with a high DC and F1 score of 0.8031 and 0.82, respectively.
Introduction
COVID-19 has caused a worldwide health crisis. The World Health Organization (WHO) announced COVID-19 as a pandemic on March 11, 2020. The clinical manifestations of COVID-19 range from influenza-like symptoms to respiratory failure (i.e., diffuse alveolar injury), and its treatment requires advanced respiratory assistance and artificial ventilation. According to the global case statistics from the Center for Systems Science and Engineering (CSSE) of Johns Hopkins University (JHU) (Wang et al., 2020a) (updated August 30, 2020), 24,824,247 confirmed COVID-19 cases, including 836,615 deaths, have been reported so far, with pronounced effect in more than 180 countries. COVID-19 can be detected and screened by Reverse Transcription Polymerase Chain Reaction (RT-PCR). However, the shortage of equipment and the strict requirements on the detection environment limit the rapid and accurate screening of suspected cases. Moreover, the sensitivity of RT-PCR is not high enough, resulting in a large number of false negatives (Ai et al., 2020), which prevents early detection and treatment of patients with presumed COVID-19 (Fang et al., 2020). As an important supplement to RT-PCR, CT scans clearly describe the characteristic lung manifestations related to COVID-19 (Chung et al., 2020); the early ground-glass opacity (GGO) and late lung consolidation are shown in Fig. 1. Nevertheless, CT scans also show imaging features that are similar to other types of pneumonia, making it difficult to differentiate them. Moreover, the manual delineation of lung infection is a tedious and time-consuming job, which is often influenced by personal bias and clinical experience.
In recent years, deep learning has been gaining popularity in the field of medical imaging due to its intelligent and efficient feature extraction ability (Kong et al., 2019; Ye, Gao & Yin, 2019), and has achieved great success. An early classic example is the application of deep learning to children's chest X-rays to detect and distinguish bacterial and viral pneumonia (Kermany et al., 2018; Rajaraman et al., 2018). Deep learning methods have also been applied to detect various imaging features in chest CT images (Depeursinge et al., 2015; Anthimopoulos et al., 2016). Recently, researchers proposed detecting COVID-19 infections in patients by combining radiological imaging with deep learning technology. Li et al. (2020) proposed a simple deep learning network, CovNet, to distinguish COVID-19 from Community-Acquired Pneumonia (CAP) in chest CT scans. Wang & Wong (2020) proposed CovidNet to detect COVID-19 cases from chest X-ray images, with an accuracy rate of 93.3%. Xu et al. (2020) calculated the infection probability of COVID-19 from CT scans by adopting a position-oriented attention model that presented an accuracy close to 87%. However, the above models rarely involved the segmentation of COVID-19 infection (Chaganti et al., 2020; Shan et al., 2020). The challenges involved in segmentation include: (a) variations in texture, size, and position of the infected areas in CT scans. For example, some infection areas are small, which easily leads to a high probability of false negatives in CT scans. (b) The boundary of GGO is usually of low contrast and fuzzy in appearance, which makes it difficult to distinguish from the healthy regions during the segmentation process. (c) The noise around the infected area is high, which greatly affects the segmentation accuracy. (d) Finally, the cost and time consumed in obtaining high-quality pixel-level annotation of lung infection in CT scans are high.
Therefore, most of the COVID-19 CT scan datasets are focused on diagnosis, and only a few of them provide segmentation labels. Over time, annotated datasets for the segmentation of COVID-19 pulmonary infection have been released, but because the amount of data is small, overfitting can cause problems during training. This calls for more segmentation datasets and better algorithms for accurate results.
Therefore, to address the challenges stated above, we propose a new deep learning network called Attention Gate-Dense Network-Improved Dilation Convolution-UNET (ADID-UNET) for the segmentation of COVID-19 infection from lung CT scans. Experimental results on a publicly available dataset illustrate that the proposed model presents reliable segmentation results that are comparable to the ground truths annotated by experts. Also, in terms of performance, the proposed model surpasses other state-of-the-art segmentation models, both qualitatively and quantitatively.
Our contributions in this paper are as follows:

To address the vanishing gradient problem in deep networks, we employ a dense network (Huang et al., 2017) instead of traditional convolution and max-pooling operations. The dense network extracts dense features and enhances feature propagation through the model. Moreover, the dense network has fewer training parameters, which reduces the model size and the computational cost.

To increase the size of the receptive field and to compensate for the problems due to blurry edges, an improved dilation convolution (IDC) module is used to connect the encoder and decoder pipelines. The IDC module increases the receptive field of the predicted region and provides more edge information, which enhances the edge recognition ability of the model.

Since the edge contrast of GGO is very low, we use the attention gate (AG) instead of simple cropping and copying. This further improves the accuracy of the model in detecting the infection areas by learning the characteristics of the infected regions.

Since the number of COVID-19 datasets with segmentation labels is limited and falls below the minimum number of samples required for training a complex model, we employ data augmentation techniques and expand the dataset on the basis of the collected public datasets.
The rest of the paper is organized as follows: “Related Work” describes the work related to the proposed model. “Methods” introduces the basic structure of ADID-UNET. Details of the dataset, experimental results and discussion are dealt with in “Experimence Results”. Finally, “Conclusion” presents the conclusion.
Related work
The ADID-UNET model proposed in this paper is based on the UNET (Ronneberger, Fischer & Brox, 2015) architecture. We therefore discuss the literature related to our work, which includes: deep learning and medical image segmentation, improvement of medical image segmentation algorithms, CT scan segmentation, and application of deep learning in segmentation of COVID-19 lesions from lung CT scans.
Deep learning and medical image segmentation
In recent years, deep learning algorithms have become more mature, leading to the development of various artificial intelligence (AI) systems based on them. Also, semantic segmentation using deep learning algorithms (Oktay et al., 2018) has developed rapidly, with applications in both natural and medical images. Long, Shelhamer & Darrell (2015) pioneered the use of a fully convolutional network (FCN) that produced coarse segmentation outputs at the input resolution through fractionally strided convolution, also referred to as upsampling or deconvolution. The model was tested on the PASCAL VOC, NYUDv2, and SIFT Flow datasets and presented a mean Intersection over Union (MIOU) of 62.7%, 34%, and 39.5%, respectively. They also reported that in-network upsampling was fast, accurate, and provided dense segmentation predictions. Later, through a series of improvements and extensions to FCN (Ronneberger, Fischer & Brox, 2015; Badrinarayanan, Kendall & Cipolla, 2017; Xu et al., 2018), a symmetrical structure composed of encoder and decoder pipelines, called UNET (Ronneberger, Fischer & Brox, 2015), was proposed for biomedical or medical image segmentation. The encoder structure predicted the segmentation area, and then the decoder recovered the resolution and achieved accurate spatial positioning. Also, the UNET used crop and copy operations for the precise segmentation of the lesions. Further, the model achieved good segmentation performance at the International Symposium on Biomedical Imaging (ISBI) challenge (Cardona et al., 2010) with an MIOU of 0.9203. Moreover, an improved network referred to as SegNet was proposed by Badrinarayanan, Kendall & Cipolla (2017). The model used the first 13 convolution layers of the VGG16 network (Karen & Andrew, 2014) to form an encoder to extract features and predict segmentation regions.
Then, by using a combination of convolution layers, unpooling and a softmax activation function in the decoder, segmentation outputs at the input resolution were obtained. When tested with the CamVid dataset (Brostow, Fauqueur & Cipolla, 2009), the MIOU index of SegNet was nearly 10% higher than that of FCN (Long, Shelhamer & Darrell, 2015). Xu et al. (2018) regarded segmentation as a classification problem in which each pixel was associated with a class label, and designed a CNN network composed of three layers of convolution and pooling, a fully connected (FC) layer and a softmax function. The model successfully segmented three-dimensional breast ultrasound (BUS) images into four parts (skin, fibroglandular tissue, mass, and fatty tissue) and achieved a recall rate of 88.9%, an accuracy of 90.1%, a precision of 80.3% and an F1 score of 0.844. According to the aforementioned literature, FCN (Long, Shelhamer & Darrell, 2015) and its improved variants presented accurate segmentation results for both natural and medical images. Therefore, the UNET and its variants (Almajalid et al., 2019; Negi et al., 2020), due to their advantages of fast training and high segmentation accuracy, are widely used in the field of medical image segmentation.
Improvement of medical image segmentation algorithms
Medical images such as ultrasound images are generally prone to speckle noise, uneven intensity distribution, and low contrast between the lesions and the background, which affect the segmentation ability of the traditional UNET (Ronneberger, Fischer & Brox, 2015) structure. Therefore, considerable efforts were invested in improving the architecture. Xia & Kulis (2017) proposed a fully unsupervised deep learning network called WNet that connects two UNETs to predict and reconstruct the segmentation results. Schlemper et al. (2019) proposed an attention UNET network, which integrated attention modules into the UNET (Ronneberger, Fischer & Brox, 2015) model to achieve spatial positioning and subsequent segmentation. The model presented a segmentation accuracy 15% higher than the traditional UNET architecture. Zhuang et al. (2019a) combined the strengths of the attention gate system and the dilation convolution module and proposed a hybrid architecture referred to as RDAUNET. By introducing a residual network (He et al., 2016) instead of traditional convolution layers, they reported a segmentation accuracy of 97.91% for the extraction of lesions in breast ultrasound images. Also, the GRAUNET (Zhuang et al., 2019b) model included a group convolution module between the encoder and decoder pipelines to improve the segmentation of the nipple region in breast ultrasound images. Therefore, from the literature, it can be inferred that introducing additional modules such as the attention gate instead of traditional cropping and copying, including dilation convolution to increase the receptive field, and using residual networks can favorably improve the accuracy of the segmentation model. However, these successful segmentation models (Schlemper et al., 2019; Zhuang et al., 2019a; Xia & Kulis, 2017) were rarely tested with CT scans; hence the next section concentrates on the segmentation of CT scans.
CT scan segmentation
CT imaging is a commonly used technology in the diagnosis of lung diseases since lesions can be segmented more intuitively from chest CT scans. The segmented lesions aid the specialist in the diagnosis and quantification of lung diseases (Gordaliza et al., 2018). In recent years, most of the classifier models and algorithms based on feature extraction have achieved good segmentation results in chest CT scans. Ye et al. (2009) proposed a shape-based Computer-Aided Detection (CAD) method in which a 3D adaptive fuzzy threshold segmentation method combined with chain code was used to estimate infected regions in lung CT scans. In feature-based techniques, due to the low contrast between nodules and backgrounds, the boundary discrimination is unclear, leading to inaccurate segmentation results. Therefore, many segmentation techniques based on deep learning algorithms have been proposed. Wang et al. (2017) developed a central focusing convolutional neural network for segmenting pulmonary nodules from heterogeneous CT scans. Jue et al. (2018) designed two deep networks (an incremental and a dense multiple resolution residually connected network) to segment lung tumors from CT scans by adding multiple residual flows with different resolutions. Guofeng et al. (2018) proposed a UNET model to segment pulmonary nodules in CT scans, which improved the overall segmentation output by avoiding overfitting. Compared with other segmentation algorithms such as graph-cut (Ye, Beddoe & Slabaugh, 2009), their model had better segmentation results with a Dice coefficient of 0.73. Recently, Peng et al. (2020) proposed an automatic CT lung boundary segmentation method, called the Pixel-based Two-Scan Connected Component Labeling-Convex Hull-Closed Principal Curve method (PSCCL-CH-CPC).
The model included the following: (a) an image preprocessing step to extract the coarse lung contour and (b) a coarse-to-fine segmentation algorithm based on the improved principal curve and a machine learning model. The model presented good segmentation results with a Dice coefficient as high as 96.9%. Agarwal et al. (2020) proposed a weakly supervised lesion segmentation method for CT scans based on an attention-based co-segmentation model (Mukherjee, Lall & Lattupally, 2018). The encoder structure was composed of a variety of CNN architectures, including VGG16 (Karen & Andrew, 2014) and ResNet101 (He et al., 2016), with an attention gate module between the encoder-decoder pipeline, while the decoder was composed of upsampling operations. The proposed method first generated the initial lesion areas from the Response Evaluation Criteria in Solid Tumors (RECIST) measurements and then used co-segmentation to learn more discriminative features and refine the initial areas. The paper reported a Dice coefficient of 89.8%. The literature above suggests that deep learning techniques are effective in segmenting lesions in lung CT scans, and many researchers have proposed different deep learning architectures to deal with COVID-19 CT scans. Therefore, in the next section we study their related works further.
Application of deep learning in segmentation of COVID19 lesions from lung CT scans
In recent months, COVID-19 has become a topic of concern all over the world, and CT imaging is considered to be a convincing method to detect COVID-19. However, due to the limited datasets and the time and labor involved in annotation, segmentation datasets related to COVID-19 CT scans are less readily available. Nevertheless, many researchers have proposed advanced methods to deal with COVID-19 diagnosis, including segmentation techniques (Fan et al., 2020; Wang et al., 2020b; Yan et al., 2020; Zhou, Canu & Ruan, 2020; Elharrouss et al., 2020; Chen, Yao & Zhang, 2020). Given the insufficient datasets with segmentation labels, the InfNet network proposed by Fan et al. (2020) combined a semi-supervised learning model and the FCN8s network (Long, Shelhamer & Darrell, 2015) with implicit reverse attention and explicit edge attention mechanisms to improve the recognition rate of infected areas. The model successfully segmented COVID-19 infected areas from CT scans and reported a sensitivity and specificity of 72.5% and 96.0%, respectively. Elharrouss et al. (2020) proposed an encoder-decoder-based CNN method for COVID-19 lung infection segmentation using multi-task deep learning, which overcame the shortage of labeled datasets and segmented lung infected regions with a sensitivity of 71.1%. Wang et al. (2020b) proposed a noise-robust COVID-19 pneumonia lesion segmentation network which included a noise-robust Dice loss function along with convolution functions, a residual network, and an Atrous Spatial Pyramid Pooling (ASPP) module. The model, referred to as CopleNet, automatically segmented COVID-19 pneumonia lesions from CT scans.
The method showed that the proposed new loss function was better than existing noise-robust loss functions such as the Mean Absolute Error (MAE) loss (Ghosh, Kumar & Sastry, 2017) and the Generalized Cross-Entropy (GCE) loss (Zhang & Sabuncu, 2018), and achieved a Dice coefficient and Relative Volume Error (RVE) of 80.72% and 15.96%, respectively. Yan et al. (2020) employed an encoder-decoder deep CNN structure composed of convolution functions, a Feature Variation (FV) module (mainly containing convolution, pooling, and sigmoid functions), a Progressive Atrous Spatial Pyramid Pooling (PASPP) module (including convolution, dilation convolution, and addition operations) and a softmax function. The convolution functions obtained features, the FV block enhanced the feature representation ability, and the PASPP, placed between the encoder and decoder pipelines, compensated for the various morphologies of the infected regions. The model achieved a good segmentation performance with a Dice coefficient of 0.726 and a sensitivity of 0.751 when tested on COVID-19 lung CT scan datasets. Zhou, Canu & Ruan (2020) proposed an encoder-decoder UNET model for the segmentation of COVID-19 lung CT scans. The encoder structure, composed of convolution functions and Resdil blocks (each combining a residual block (He et al., 2016) and a dilation convolution module), was used to extract features and predict rough lesion areas. The decoder pipeline restored the resolution of the segmented regions through upsampling, and an attention mechanism between the encoder-decoder framework captured rich contextual relationships for better feature learning. The proposed method achieved accurate and rapid segmentation on COVID-19 lung CT scans with a Dice coefficient, sensitivity, and specificity of 69.1%, 81.1%, and 97.2%, respectively.
Further, Chen, Yao & Zhang (2020) proposed a residual attention UNET for automated multi-class segmentation of COVID-19 lung CT scans, which used residual blocks to replace traditional convolutions and upsampling functions to learn robust features. In addition, a soft attention mechanism was applied to improve the feature learning capability of the model to segment infected regions of COVID-19. The proposed model demonstrated good performance with a segmentation accuracy of 0.89 for lesions in COVID-19 lung CT scans. Therefore, deep learning algorithms are helpful in segmenting the infected regions from COVID-19 lung CT scans, which aids clinicians in evaluating the severity of infection (Tang et al., 2020), in large-scale screening of COVID-19 cases (Shi et al., 2020) and in quantifying the lung infection (Ye et al., 2020). Table 1 summarizes the deep learning-based segmentation techniques available for COVID-19 lung infections.
Literature  Data Type  Dataset  Technique  Segmentation results

Fan et al. (2020)  CT Scan  100 CT images  Semi-supervised CNN, FCN8s network  73.9% (DC), 96.0% (S_{p})
Wang et al. (2020b)  CT Scan  558 CT images  Residual connection, CNN  80.7% (DC), 16.0% (RVE)
Yan et al. (2020)  CT Scan  21,658 CT images  Deep CNN  72.6% (DC), 75.1% (S_{en})
Zhou, Canu & Ruan (2020)  CT Scan  100 CT images  Attention mechanism, ResNet, dilation convolution  69.1% (DC), 81.1% (S_{en})
Elharrouss et al. (2020)  CT Scan  100 CT images  Encoder-decoder-based CNN  78.6% (DC), 71.1% (S_{en})
Chen, Yao & Zhang (2020)  CT Scan  110 CT images  Encoder-decoder-based CNN  83.0% (DC), 89.0% (ACC)
Xu et al. (2020)  CT Scan  110 CT images  CNN  86.7% (ACC), 83.9% (F1)
Shuai et al. (2020)  CT Scan  670 CT images  CNN  73.1% (ACC), 67.0% (S_{p})
Methods
In this section, we first introduce the proposed ADID-UNET network with a detailed discussion of the core network components, including the dense network, the improved dilation convolution, and the attention gate system. To present realistic comparisons, experimental results are presented in each subsection to illustrate the performance and superiority of the model after adding each core component. Further, in “Experimence Results” we present a summary of the percentage improvements achieved when compared to the traditional UNET architecture.
ADIDUNET architecture
ADID-UNET is based on the UNET (Ronneberger, Fischer & Brox, 2015) architecture with the following improvements: (a) the dense network proposed by Huang et al. (2017) is used in addition to the convolution modules of the encoder and decoder structures, (b) an improved dilation convolution (IDC) is introduced between the frameworks, and (c) the attention gate (AG) system is used instead of the simple cropping and copying operations. The structure of ADID-UNET is shown in Fig. 2. Here f_{en}, f_{upn}, f_{idc} describe the features at the nth layer of the encoder, decoder, and IDC modules, respectively.
When COVID-19 CT scans are presented to the encoder, the first four layers (each layer has convolution, rectification, and max pooling functions) extract features (f_{1} to f_{4}) that are passed to the dense networks. Here dense networks are used instead of convolution and max-pooling layers to further enhance the features (f_{5} to f_{6}); in “Dense Network”, we elaborate on the need for the dense network and present experimental results to prove its significance. Next, an improved dilation convolution module, referred to as the IDC module, is used between the encoder-decoder structure to increase the receptive field and gather detailed edge information that assists in extracting characteristic features. The module accepts the feature f_{6} from the dense networks and, after improvement, presents the result (f_{idc}) as input to the decoder structure. To ensure consistency in the architecture and to avoid losing information, the decoder mirrors the encoder with two dense networks that replace the first two upsampling operations. Further, to make better use of the context information between the encoder-decoder pipeline, the AG model is used instead of cropping and copying operations; it aggregates the corresponding layer-wise encoder features with the decoder features and presents them to the subsequent upsampling layers. Likewise, the decoder framework presents upsampled features f_{up1} to f_{up6}, and the final feature map (f_{up6}) is presented to the sigmoid activation function to predict and segment the COVID-19 lung infected regions. The following section explains the components of ADID-UNET in detail.
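As an illustration of how an attention gate can weight encoder features before they are merged into the decoder, the following is a minimal sketch in the spirit of the additive attention gate of the attention UNET literature cited in this paper (Oktay et al., 2018; Schlemper et al., 2019). It is not the AG configuration of this model. It assumes single-channel feature maps, so every 1 × 1 convolution reduces to a scalar weight, and it assumes the gating signal has already been resampled to the resolution of the skip feature. The names `attention_gate`, `w_x`, `w_g`, `psi` and `b` are hypothetical.

```python
import math


def sigmoid(v):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def attention_gate(skip, gating, w_x, w_g, psi, b=0.0):
    """Weight each skip (encoder) pixel by an attention coefficient.

    skip, gating: 2-D lists of equal shape (single channel, assumed).
    w_x, w_g, psi, b: scalar stand-ins for 1x1 convolution weights and bias
    (hypothetical parameters for this sketch).
    """
    gated = []
    for row_x, row_g in zip(skip, gating):
        out_row = []
        for x, g in zip(row_x, row_g):
            q = max(0.0, w_x * x + w_g * g + b)   # additive combination + ReLU
            alpha = sigmoid(psi * q)              # attention coefficient in (0, 1)
            out_row.append(alpha * x)             # suppress or pass the skip feature
        gated.append(out_row)
    return gated


if __name__ == "__main__":
    skip = [[1.0, 2.0], [3.0, 4.0]]
    gating = [[0.0, 0.0], [5.0, 5.0]]
    for row in attention_gate(skip, gating, w_x=0.0, w_g=1.0, psi=1.0):
        print(row)
```

In this toy setting, pixels with a weak gating signal are scaled down towards half of their value, while pixels with a strong gating signal pass almost unchanged, which is the background-suppression behaviour the text attributes to the AG.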
Dense network
It was presumed that with the increase of network layers, the learning ability of the network would gradually improve, but during the training of deep networks, the gradient information that is helpful for generalization may disappear or expand excessively. In the literature, this is referred to as the vanishing or exploding gradient problem. As the network begins to converge, the disappearance of the gradient saturates the network, resulting in a sharp decline in network performance. Therefore, Zhuang et al. (2019a) introduced the residual units proposed by He et al. (2016) into the UNET structure to avoid performance degradation during training. The residual learning correction scheme to avoid performance degradation is described in (1): (1) $$y=G\left(x,\left\{{F}_{i}\right\}\right)+x$$
Here x and y are the input and output vectors of the residual block, and F_{i} is the weight of the corresponding layer. The function $G\left(x,\left\{{F}_{i}\right\}\right)$ is the residual which, when added to x, avoids vanishing gradient problems and enables efficient learning.
From (1), the summation of $G\left(x,\left\{{F}_{i}\right\}\right)$ and x in ResNet (He et al., 2016) avoids the vanishing gradient problem, but forwarding the gradient information alone to the succeeding layers may hinder the information flow in the network, and the recent work by Huang et al. (2016) illustrated that ResNets discard features randomly during training. Moreover, ResNets include a large number of parameters, which increases the training time. To solve this problem, Huang et al. (2017) proposed a dense network (as shown in Fig. 3), which directly connects all layers, and thus obtains all features of the previous layers without additional convolution.
The dense network is mainly composed of convolution layers, pooling functions, multiple dense blocks, and transition layers. Let us consider a network with L layers, where each layer implements a nonlinear transformation H_{i}. Let x_{0} represent the input image, let i index the layers, and let x_{i−1} be the output of layer i − 1. H_{i} can be a composite operation, such as batch normalization (BN), rectified linear function (RELU), pooling, or convolution functions. Generally, the output of a traditional network in layer i is as follows: (2) $${x}_{i}={H}_{i}\times \left({x}_{i-1}\right)$$
For the residual network, only the identity function from the upper layer is added: (3) $${x}_{i}={H}_{i}\times \left({x}_{i-1}\right)+{x}_{i-1}$$
For a dense network, the feature maps x_{0}, x_{1},…, x_{i−1} of all layers before layer i are directly connected, which is represented by Eq. (4): (4) $${x}_{i}={H}_{i}\times ([{x}_{0},{x}_{1},\dots ,{x}_{i-1}])$$
where $[{x}_{0},{x}_{1},\dots ,{x}_{i-1}]$ denotes the concatenation of the feature maps and × represents the multiplication operation. Figure 4 shows the forward connection mechanism of the dense network, where each layer is connected directly to all previous layers.
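The difference between the three update rules in Eqs. (2) to (4) can be made concrete with a toy example. The sketch below is illustrative only: features are single floating-point numbers rather than feature maps, H_{i} is treated as a function applied to its input, and the transforms `halve` and `halve_sum` are hypothetical stand-ins for the composite BN, RELU and convolution operation.

```python
def run_plain(x0, transforms):
    """Eq. (2): each layer sees only the previous output."""
    xs = [x0]
    for h in transforms:
        xs.append(h(xs[-1]))
    return xs[1:]


def run_residual(x0, transforms):
    """Eq. (3): the previous output is added back through an identity path."""
    xs = [x0]
    for h in transforms:
        xs.append(h(xs[-1]) + xs[-1])
    return xs[1:]


def run_dense(x0, transforms):
    """Eq. (4): each layer sees the list [x_0, ..., x_{i-1}] of all earlier outputs."""
    xs = [x0]
    for h in transforms:
        xs.append(h(list(xs)))
    return xs[1:]


def halve(x):
    """Toy layer for the plain and residual cases (hypothetical)."""
    return 0.5 * x


def halve_sum(features):
    """Toy layer for the dense case: mixes all concatenated inputs (hypothetical)."""
    return 0.5 * sum(features)


if __name__ == "__main__":
    print("plain   :", run_plain(1.0, [halve] * 3))
    print("residual:", run_residual(1.0, [halve] * 3))
    print("dense   :", run_dense(1.0, [halve_sum] * 3))
```

In the plain chain the signal from x_{0} shrinks at every layer, whereas in the dense variant every layer still receives x_{0} directly, which is the feature reuse the text describes.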
Generally, a dense network is composed of several dense blocks and transition layers. Here we only use two dense blocks and transition layers to form simple dense networks. The dense block is expressed by Eq. (5): (5) $$\gamma =\alpha ([{x}_{0},{x}_{1},\dots ,{x}_{i-1}],{\beta}_{i})$$where $[{x}_{0},{x}_{1},\dots ,{x}_{i-1}]$ denotes the concatenation of the feature maps and β_{i} is the weight of the corresponding layer. In the ADID-UNET model proposed in this paper, the feature f_{4} (refer to Fig. 2) is fed to the transition layer, which is mainly composed of BN, RELU, and an average pooling operation. The feature is then batch normalized and rectified before being convolved with a 1 × 1 kernel. The filtered outputs go through the same operation again and are convolved with a 3 × 3 kernel, before being concatenated with the input feature f_{4}. The detailed structure of the two dense blocks and transition layers used in the encoder structure is shown in Fig. 5A. Here w and h correspond to the width and height of the input, respectively, and b represents the number of channels. Besides, s represents the step size of the pooling operation, and n represents the number of filtering operations performed by each layer. In our model, n takes the values 32, 64, 128, 256, and 512. It should be noted that the output of the first dense layer is the aggregated result of 4 convolution operations (4 × n), which is employed to emphasize feature learning by reducing the loss of features. In the decoding structure, to restore the resolution of the predicted segmentation, a traditional upsampling layer of the UNET (Ronneberger, Fischer & Brox, 2015) is used instead of the transition layer. The detailed structure is shown in Fig. 5B.
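To make the tensor sizes in this description easier to follow, the bookkeeping below traces one possible reading of the encoder-side stage. It is a sketch under several assumptions that the text does not state explicitly: convolutions preserve the spatial size, the 1 × 1 convolution outputs 4 × n channels, the 3 × 3 convolution outputs n channels, and the concatenation uses the pooled feature so that spatial sizes match. The function name `dense_stage_shapes` is hypothetical.

```python
def dense_stage_shapes(w, h, b, n, s=2):
    """Trace (width, height, channels) through one transition layer and dense block.

    w, h, b: width, height and channel count of the incoming feature.
    n: number of filters per layer; s: pooling stride.
    All layer sizes follow the assumptions stated in the lead-in.
    """
    shapes = [("input", (w, h, b))]
    w, h = w // s, h // s
    shapes.append(("transition: BN, RELU, average pooling", (w, h, b)))
    shapes.append(("BN, RELU, 1x1 convolution", (w, h, 4 * n)))
    shapes.append(("BN, RELU, 3x3 convolution", (w, h, n)))
    shapes.append(("concatenation with block input", (w, h, b + n)))
    return shapes


if __name__ == "__main__":
    for name, shape in dense_stage_shapes(w=32, h=32, b=256, n=64):
        print(f"{name:40s} {shape}")
```

With these assumed sizes, a 32 × 32 × 256 feature leaves the stage as a 16 × 16 × 320 feature; the channel count grows by n because the block input is carried forward alongside the newly computed feature.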
For the proposed network, we use only two dense networks for two reasons: (a) doing so reduces the computation cost, and (b) experiments with different numbers of dense networks suggest that two are sufficient, since the segmentation results were accurate and comparable to the ground truth. Figure 6 and Table 2 illustrate the qualitative and quantitative comparisons with different numbers of dense networks in the encoder-decoder framework.
Number of dense network  ACC  DC  S_{en}  S_{p}  P_{c}  AUC  F1  S_{m}  E_{α}  MAE 

Num_{1}  0.9696  0.7971  0.8011  0.9958  0.8290  0.9513  0.8129  0.8411  0.9315  0.0088 
Num_{2}  0.9700  0.8011  0.8096  0.9966  0.8596  0.9492  0.8184  0.8528  0.9394  0.0083 
Num_{3}  0.9686  0.7569  0.7546  0.9957  0.8200  0.9334  0.7806  0.8349  0.9379  0.0104 
Num_{4}  0.9699  0.7869  0.7579  0.9961  0.8485  0.9495  0.8241  0.8341  0.9348  0.0090 
UNET  0.9696  0.7998  0.8052  0.9957  0.8247  0.9347  0.8154  0.8400  0.9390  0.0088 
ResNet  0.9698  0.8002  0.7978  0.9962  0.8344  0.9504  0.8180  0.8415  0.9352  0.0094 
The results in Fig. 6 and Table 2 show that using two dense networks in the model has a clear benefit: the infected areas are segmented accurately, as reflected directly in the qualitative and quantitative metrics.
Moreover, with high accuracy and a good Dice coefficient, two dense networks are the best choice in the encoder-decoder pipeline. Also, using two dense networks in place of traditional convolutions or residual networks enables global feature propagation, encourages feature reuse, and addresses the vanishing gradient problem associated with deep networks, thereby improving the segmentation outcomes.
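For readers who want to reproduce the kind of numbers reported in Table 2, the sketch below computes accuracy (ACC), Dice coefficient (DC), sensitivity (S_{en}), specificity (S_{p}) and precision (P_{c}) from a predicted and a ground-truth binary mask. It uses the common confusion-matrix definitions, which is an assumption: this section does not state the exact formulas or averaging scheme behind the table. The function name `segmentation_metrics` is hypothetical.

```python
def segmentation_metrics(pred, truth):
    """Compute overlap metrics for two binary masks given as 2-D lists of 0/1.

    Returns a dict with ACC, DC, Sen, Sp and Pc using the standard
    confusion-matrix definitions (assumed, not taken from the paper).
    """
    tp = fp = tn = fn = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif not p and t:
                fn += 1
            else:
                tn += 1

    def ratio(num, den):
        # Guard against empty classes to avoid division by zero.
        return num / den if den else 0.0

    return {
        "ACC": ratio(tp + tn, tp + tn + fp + fn),
        "DC": ratio(2 * tp, 2 * tp + fp + fn),
        "Sen": ratio(tp, tp + fn),
        "Sp": ratio(tn, tn + fp),
        "Pc": ratio(tp, tp + fp),
    }


if __name__ == "__main__":
    pred = [[1, 1, 0, 0], [0, 1, 0, 0]]
    truth = [[1, 0, 0, 0], [0, 1, 1, 0]]
    for name, value in segmentation_metrics(pred, truth).items():
        print(f"{name}: {value:.4f}")
```

Specificity and accuracy are dominated by the many background pixels in a lung CT slice, which is one reason overlap measures such as the Dice coefficient are usually read alongside them.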
Improved dilation convolution
Since the encoder pipeline of the UNET structure is analogous to the traditional CNN architecture, the pooling operations involved at each layer propagate only the maximum or the average characteristics of the extracted features. Connecting the encoder outputs directly to the decoder therefore limits the segmentation accuracy of the network. The RDAUNET proposed by Zhuang et al. (2019a) utilized a dilation convolution (DC) module between the encoder-decoder pipeline to increase the receptive field and learn the boundary information more accurately. The DC module is also used in many UNET variants (Chen et al., 2019; Yu & Koltun, 2015) to enlarge the receptive field. Hence, we adopt the DC module and introduce additional novelty into it.
Equation (6) describes the DC operation between the input image $f\left(x,y\right)$ and the kernel $g\left(i,j\right)$.
(6) $$p(x,y)=\alpha \left\{\sum _{i,j}f(x+i\times r,y+j\times r)\times g\left(i,j\right)+k\right\}$$where α is the RELU function, k is a bias unit, $\left(i,j\right)$ and $\left(x,y\right)$ denote the coordinates of the kernel and of the input image, respectively, and r is the dilation rate that controls the size of the receptive field. The size of the receptive field can be expressed as follows: (7) $$N=\left(k\_f_{\mathrm{size}}+1\right)\times \left(r-1\right)+k\_f_{\mathrm{size}}$$where $k\_f_{\mathrm{size}}$ is the convolution kernel size, r is the dilation rate and N is the size of the receptive field, as shown in Fig. 7.
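The sampling pattern in Eq. (6) can be illustrated with a short NumPy sketch. It is a minimal illustration rather than the authors' implementation: the function name `dilated_conv2d`, the zero-based kernel indices, the single-channel input and the restriction to positions where the dilated kernel fits inside the image are all assumptions made here for brevity.

```python
import numpy as np

def dilated_conv2d(f, g, r=1, k=0.0):
    """Eq. (6): p(x, y) = ReLU(sum_{i,j} f(x + i*r, y + j*r) * g(i, j) + k).

    Assumptions of this sketch: kernel indices i, j start at 0, the input is
    a single 2-D channel, and only positions where every sampled pixel lies
    inside f are computed (no padding).
    """
    kh, kw = g.shape
    # Largest offset sampled along each axis is (kh - 1) * r and (kw - 1) * r.
    out_h = f.shape[0] - (kh - 1) * r
    out_w = f.shape[1] - (kw - 1) * r
    p = np.zeros((out_h, out_w))
    for i in range(kh):
        for j in range(kw):
            # Shifted view of f sampled at offset (i*r, j*r), weighted by g(i, j).
            p += f[i * r:i * r + out_h, j * r:j * r + out_w] * g[i, j]
    return np.maximum(p + k, 0.0)  # alpha is the ReLU function

f = np.arange(49, dtype=float).reshape(7, 7)  # toy 7 x 7 "image"
g = np.ones((3, 3))                           # toy 3 x 3 kernel
out_r1 = dilated_conv2d(f, g, r=1)            # ordinary convolution
out_r2 = dilated_conv2d(f, g, r=2)            # samples every second pixel
```

With r = 2 the same nine kernel weights are spread over a wider neighbourhood of the input, which is how the dilation rate enlarges the receptive field without adding parameters.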
Our experimental analysis indicates that the DC module is most effective at extracting information for larger objects or lesions. Because most early ground-glass opacity (GGO) and late lung consolidation lesions cover smaller areas, we place an improved dilation convolution (IDC) module between the encoder and decoder to segment smaller regions accurately.
Figure 8 illustrates the IDC module, which consists of several convolution functions with different dilation rates and rectified linear units (RELU). Our improvements are as follows: (a) we combine single-strided convolution operations with dilated convolutions with dilation rates of 2, 4, 8, and 16. This combination extracts features from both smaller and larger receptive fields and thus helps isolate the small infected COVID19 regions seen in lung CT scans; (b) following the idea of the dense network (Huang et al., 2017), we concatenate the input of the IDC module to its output and use the input features to further enhance feature learning. The input of the IDC module is the rough segmentation regions produced by the encoder. Combining these original region features with the more accurate features extracted by the IDC module not only avoids the loss of useful information but also provides accurate input for the decoding pipeline, which helps improve the segmentation accuracy of the model. As the inputs advance (left to right in Fig. 8), they are convolved with the 3 × 3 kernels of the convolution layers at dilation rates of 2, 4, 8, and 16, respectively. Comparative experiments with the traditional DC model (using the same dilation rates for both models) show that the IDC module has a lower computational cost (fewer parameters) and requires less computation time than the DC module, as shown in Table 3.
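The two ideas described above, a sequence of 3 × 3 dilated convolutions at rates 2, 4, 8 and 16 and a dense-style concatenation of the module input with its output, can be sketched in NumPy as follows. This is only a simplified, single-channel illustration under stated assumptions: the exact wiring of Fig. 8 is not reproduced, the kernels are random, and the names `dilated_conv_same` and `idc_block` are hypothetical.

```python
import numpy as np

def dilated_conv_same(x, kernel, rate):
    """3 x 3 dilated convolution + ReLU with zero padding (output keeps x's shape)."""
    # A centred 3 x 3 kernel with dilation `rate` reaches `rate` pixels each way.
    xp = np.pad(x, rate)
    h, w = x.shape
    out = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            out += xp[i * rate:i * rate + h, j * rate:j * rate + w] * kernel[i, j]
    return np.maximum(out, 0.0)

def idc_block(x, kernels, rates=(2, 4, 8, 16)):
    """Assumed layout: dilated convolutions applied one after another,
    then the block input is concatenated with the block output."""
    y = x
    for kernel, rate in zip(kernels, rates):
        y = dilated_conv_same(y, kernel, rate)
    # Dense-style connection: keep the original features next to the new ones.
    return np.stack([x, y], axis=0)

rng = np.random.default_rng(0)
x = rng.random((32, 32))                                   # toy encoder output
kernels = [rng.standard_normal((3, 3)) for _ in range(4)]  # untrained weights
features = idc_block(x, kernels)                           # shape (2, 32, 32)
```

Because the input is carried through unchanged as the first channel, the decoder in such a layout would see both the coarse encoder features and the features extracted over larger receptive fields.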
Method  Total parameters  Trainable parameters  Non-trainable parameters  Training time per epoch (s)  Test time (s) 

DDUNET  56,223,034  56,190,272  32,762  145  8 
DIDUNET  52,162,362  52,132,416  29,946  135  3 
Fig. 9 and Table 4 show that ending the module with convolution layers of smaller dilation rates, in combination with the other layers, ensures the cumulative extraction of features from both smaller and larger receptive fields, which helps isolate the small infected COVID19 regions seen in lung CT scans. The performance scores, in particular the Dice coefficient, are also higher for DIDUNET than for DDUNET (by about 3%). In summary, the IDC module connected between the encoder and decoder not only reduces the loss of the original features but also expands the field of the segmented areas, thereby improving the overall segmentation effect.
Method  ACC  DC  S_{en}  S_{p}  P_{c}  AUC  F1  S_{m}  E_{α}  MAE 

UNET  0.9696  0.7998  0.8052  0.9957  0.8247  0.9347  0.8154  0.8400  0.9390  0.0088 
DDUNET  0.9697  0.7757  0.7402  0.9971  0.8622  0.9214  0.7923  0.8401  0.9312  0.0094 
DIDUNET  0.9700  0.8023  0.7987  0.9964  0.8425  0.9549  0.8241  0.8447  0.9374  0.0084 
Attention gate
Although the improved dilation convolution improves the feature learning ability of the network, spatial information is lost in the feature mapping at the end of the encoder structure, so the network has difficulty reducing false predictions for (a) small COVID19 infected regions and (b) areas with blurry edges and poor contrast between the lesion and the background. To solve this problem, we introduce the attention gate (AG) mechanism shown in Fig. 10 into our model in place of simple cropping and copying. The AG model computes the attention coefficient $\sigma \in \left[0,1\right]$ based on Eq. (8): (8) $$\sigma ={\epsilon}_{2}\left\{{p}_{k}\left[{p}_{i}\left({\epsilon}_{1}\left({p}_{n}\times n+{p}_{m}\times m+{b}_{m,n}\right)\right)+{b}_{int}\right]+{b}_{k}\right\}$$ (9) $${\epsilon}_{2}\left(x\right)=\frac{1}{1+\mathrm{exp}\left(-x\right)}$$where n and m represent the feature maps input to the AG module from the decoder and encoder pipelines, respectively; p_{m}, p_{n}, p_{i} and p_{k} are convolution kernels of size 1 × 1; b_{m,n}, b_{int} and b_{k} are the bias terms; and ε_{1} and ε_{2} denote the RELU and sigmoid activation functions, respectively. Here ε_{2} limits the output to the range between 0 and 1.
Finally, the attention coefficient σ is multiplied by the input feature map f_{i} to present the output g_{o} as shown in Eq. (10): (10) $${g}_{o}=\sigma \times {f}_{i}$$
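Equations (8) to (10) can be traced step by step in the NumPy sketch below. It is a hedged illustration rather than the trained module: the 1 × 1 convolutions are written as per-pixel matrix products over the channel axis, the weights are random, n and m are assumed to already share the same spatial size, and the gated feature map f_i is assumed here to be the encoder feature map m. The function name `attention_gate` and the parameter dictionary are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gate(n, m, params):
    """Eqs. (8)-(10) for feature maps of shape (H, W, C).

    A 1 x 1 convolution is a linear map applied to each pixel's channel
    vector, so it is written as a matrix product over the last axis.
    """
    # epsilon_1(p_n * n + p_m * m + b_{m,n})
    q = relu(n @ params["p_n"] + m @ params["p_m"] + params["b_mn"])
    # p_i(...) + b_int
    q = q @ params["p_i"] + params["b_int"]
    # sigma = epsilon_2(p_k[...] + b_k): one coefficient per pixel
    sigma = sigmoid(q @ params["p_k"] + params["b_k"])
    # Eq. (10): g_o = sigma * f_i, with f_i taken to be m (an assumption).
    return sigma, sigma * m

rng = np.random.default_rng(0)
H, W, Cn, Cm, Ci = 8, 8, 4, 3, 5          # toy sizes, not from the paper
n = rng.standard_normal((H, W, Cn))       # decoder features
m = rng.standard_normal((H, W, Cm))       # encoder features
params = {
    "p_n": rng.standard_normal((Cn, Ci)),
    "p_m": rng.standard_normal((Cm, Ci)),
    "b_mn": np.zeros(Ci),
    "p_i": rng.standard_normal((Ci, Ci)),
    "b_int": np.zeros(Ci),
    "p_k": rng.standard_normal((Ci, 1)),
    "b_k": 0.0,
}
sigma, g_o = attention_gate(n, m, params)
```

Since every coefficient lies between 0 and 1, the gate can only attenuate the features it multiplies, which is how background regions are suppressed.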
Fig. 11 and Table 5 show that including the AG module improved the performance of the network (ADIDUNET), with a segmentation accuracy of about 97%. By introducing the AG model, the network makes full use of the output feature information of the encoder and decoder, which greatly reduces the probability of false predictions for small targets and effectively improves the sensitivity and accuracy of the model.
Method  ACC  DC  S_{en}  S_{p}  P_{c}  AUC  F1  S_{m}  E_{α}  MAE 

UNET  0.9696  0.7998  0.8052  0.9957  0.8247  0.9347  0.8154  0.8400  0.9390  0.0088 
AGUNET  0.9697  0.8020  0.8106  0.9962  0.8347  0.9571  0.8116  0.8511  0.9345  0.0087 
DAUNET  0.9698  0.7754  0.7400  0.9959  0.8470  0.9274  0.7930  0.8334  0.9104  0.0091 
IDAUNET  0.9698  0.7961  0.7834  0.9964  0.8469  0.9450  0.8126  0.8513  0.9437  0.0085 
ADIDUNET  0.9701  0.8031  0.7973  0.9966  0.8476  0.9551  0.8200  0.8509  0.9449  0.0082 
Experimental results
COVID19 segmentation dataset collection and processing
Organizing a COVID19 segmentation dataset is time-consuming, and hence few CT scan segmentation datasets exist. At the time of writing, there was only one standard dataset, namely the COVID19 segmentation dataset (MedSeg, 2020), which was composed of 100 axial CT scans from different COVID19 patients. All CT scans were segmented by radiologists associated with the Italian Association of medicine and interventional radiology. Because the database is updated regularly, another set of segmented CT scans with segmentation labels from Radiopaedia was added on April 13, 2020. The complete dataset, which contains both positive and negative slices (373 of the 829 slices were evaluated by a radiologist as positive and segmented), was selected for training and testing the proposed model.
The dataset, which consists of 1,838 images with annotated ground truth, was randomly divided into 1,318 training samples, 320 validation samples, and 200 test samples. Because the number of training images is small, we expand the training dataset: we first merge each COVID19 lung CT scan with its ground truth and then perform six affine transformations, as described in Krizhevsky, Sutskever & Hinton (2012). The transformed image is then separated from its transformed ground truth and added to the training dataset as an additional training image. In this way the 1,318 training images are expanded to 9,226 images (each original plus its six transformed copies, 1,318 × 7 = 9,226). Figure 12 illustrates the data expansion process.
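The merge, transform and separate procedure can be sketched as below. The six specific transformations are not listed in the text, so the three rotations, two flips and one transpose used here are purely illustrative assumptions, as is the function name `augment_pair`; the point of the sketch is that stacking the scan and its mask before transforming keeps the two aligned.

```python
import numpy as np

def augment_pair(image, mask):
    """Return the original (image, mask) pair plus six transformed pairs."""
    # Merge scan and ground truth so both receive exactly the same transform.
    stacked = np.stack([image, mask], axis=-1)        # shape (H, W, 2)
    transforms = [                                    # hypothetical choice of six
        lambda a: np.rot90(a, 1),                     # rotate 90 degrees
        lambda a: np.rot90(a, 2),                     # rotate 180 degrees
        lambda a: np.rot90(a, 3),                     # rotate 270 degrees
        lambda a: np.flip(a, axis=0),                 # vertical flip
        lambda a: np.flip(a, axis=1),                 # horizontal flip
        lambda a: np.transpose(a, (1, 0, 2)),         # reflect about the diagonal
    ]
    pairs = [(image, mask)]
    for transform in transforms:
        out = transform(stacked)
        # Separate the transformed scan from its transformed ground truth.
        pairs.append((out[..., 0], out[..., 1]))
    return pairs

rng = np.random.default_rng(0)
image = rng.random((128, 128))                # toy CT slice
mask = (image > 0.9).astype(float)            # toy lesion mask tied to the image
pairs = augment_pair(image, mask)
expanded_size = 1318 * len(pairs)             # 1,318 originals x 7 = 9,226
```

Each training image yields seven samples in this scheme, which matches the expansion from 1,318 to 9,226 images reported above.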
Segmentation evaluation index
The commonly used evaluation indicators for segmentation such as accuracy (ACC), precision (P_{c}), Dice coefficient (DC), the area under the curve (AUC), sensitivity (S_{en}), specificity (S_{p}) and F1 score (F1) were used to evaluate the performance of the model. These performance indicators are calculated as follows:
(1) For computing accuracy, precision, sensitivity, specificity, and F1 score we generate the confusion matrix where the definitions of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) are shown in Table 6.
Category  Actual lesion  Actual non-lesion 

Predicted lesion  True Positive (TP)  False Positive (FP) 
Predicted non-lesion  False Negative (FN)  True Negative (TN) 
(1) Accuracy (ACC): A ratio of the number of correctly predicted pixels to the total number of pixels in the image.
(11) $$\mathrm{Accuracy}\,(\mathrm{ACC})=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}}$$
(2) Precision (P_{c}): A ratio of the number of correctly predicted lesion pixels to the total number of predicted lesion pixels.
(12) $$\mathrm{Precision}\,({P}_{c})=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}$$
(3) Sensitivity (S_{en}): A ratio of the number of correctly predicted lesion pixels to the total number of actual lesion pixels.
(13) $$\mathrm{Sensitivity}\,({S}_{en})=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}$$
(4) F1 score (F1): A measure of balanced accuracy obtained from a combination of precision and sensitivity results.
(14) $$\mathrm{F1\ score}\,(\mathrm{F1})=2\times \frac{{P}_{c}\times {S}_{en}}{{P}_{c}+{S}_{en}}$$
(5) Specificity (S_{p}): A ratio of the number of correctly predicted non-lesion pixels to the total number of actual non-lesion pixels.
(15) $$\mathrm{Specificity}\,({S}_{p})=\frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}}$$
(6) Dice coefficient (DC): Represents the similarity between the model segment output (Y) and the ground truth (X). The higher the similarity between the lesion and the ground truth, the larger the Dice coefficient and the better the segmentation effect. Dice coefficient is calculated as follows:
(16) $$\mathrm{Dice\ Coefficient}\,(\mathrm{DC})=\frac{2\times \left(X\cap Y\right)}{X+Y}$$
Also, we use a Dice coefficient (Dice, 1945) loss (dice_loss) as the training loss of the model, calculated as follows: (17) $$\text{Train Loss}=\text{Dice Coefficient Loss}=1.0-\frac{2\times \left(X\cap Y\right)}{X+Y}$$
(7) The area under the curve (AUC): AUC is the area under the receiver operating characteristic (ROC) curve. It measures separability, that is, the capability of the model to distinguish between the classes. The higher the AUC, the better the segmentation output and hence the model.
In addition to the above widely used indicators, we also introduce the Structural metric (S_{m}) (Fan et al., 2017), Enhanced alignment metric (E_{α}) (Fan et al., 2018) and Mean Absolute Error (MAE) (Fan et al., 2020; Elharrouss et al., 2020) to measure the segmentation similarity with respect to the ground truth.
(8) Structural metric (S_{m}): Measures the structural similarity between the prediction map and the ground truth segmentation mask; it is more in line with the human visual system than the Dice coefficient.
(18) $${S}_{m}=\left(1-\beta \right)\times {S}_{os}\left({S}_{op},{S}_{gt}\right)+\beta \times {S}_{or}\left({S}_{op},{S}_{gt}\right)$$where S_{os} stands for target perception similarity, S_{or} stands for regional perceptual similarity, and β = 0.5 is a balance factor between S_{os} and S_{or}. S_{op} stands for the final prediction result and S_{gt} represents the ground truth.
(9) Enhanced alignment metric (E_{α}): Evaluates the local and global similarity between two binary maps, computed based on Eq. (19): (19) $${E}_{\alpha}=\frac{1}{w\times h}{\sum}_{i}^{w}{\sum}_{j}^{h}\alpha \times \left({S}_{op}\left(i,j\right),{S}_{gt}\left(i,j\right)\right)$$where w and h are the width and height of the ground truth S_{gt}, and (i,j) denotes the coordinates of each pixel in S_{gt}. α represents the enhanced alignment matrix: (20) $$\alpha =\frac{2\times \left({S}_{gt}-\sqrt{{S}_{gt}}\right)\times \left({S}_{op}-\sqrt{{S}_{op}}\right)}{{\left({S}_{gt}-\sqrt{{S}_{gt}}\right)}^{2}+{\left({S}_{op}-\sqrt{{S}_{op}}\right)}^{2}}$$
(10) Mean Absolute Error (MAE): Measures the pixel-wise difference between S_{op} and S_{gt}, defined as: (21) $$\mathrm{MAE}=\frac{1}{w\times h}{\sum}_{i}^{w}{\sum}_{j}^{h}\left|{S}_{op}\left(i,j\right)-{S}_{gt}\left(i,j\right)\right|$$
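The confusion-matrix metrics of Eqs. (11) to (15), the Dice coefficient and Dice loss of Eqs. (16) and (17), and the MAE of Eq. (21) can be computed for a pair of binary masks as in the sketch below. It assumes binarized predictions and a single image, reads X ∩ Y as the number of overlapping lesion pixels and X + Y as the sum of the two lesion pixel counts, and does not guard against empty masks; the function name `segmentation_metrics` is hypothetical, and AUC, S_{m} and E_{α} are omitted.

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Pixel-wise metrics for one binary prediction and its ground truth."""
    pred = np.asarray(pred).astype(bool)
    gt = np.asarray(gt).astype(bool)
    tp = np.sum(pred & gt)      # lesion predicted as lesion
    tn = np.sum(~pred & ~gt)    # non-lesion predicted as non-lesion
    fp = np.sum(pred & ~gt)     # non-lesion predicted as lesion
    fn = np.sum(~pred & gt)     # lesion predicted as non-lesion

    precision = tp / (tp + fp)                      # Eq. (12)
    sensitivity = tp / (tp + fn)                    # Eq. (13)
    dice = 2 * tp / (pred.sum() + gt.sum())         # Eq. (16)
    return {
        "acc": (tp + tn) / (tp + tn + fp + fn),     # Eq. (11)
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),  # Eq. (14)
        "specificity": tn / (tn + fp),              # Eq. (15)
        "dice": dice,
        "dice_loss": 1.0 - dice,                    # Eq. (17)
        "mae": np.mean(np.abs(pred.astype(float) - gt.astype(float))),  # Eq. (21)
    }

# Toy 2 x 4 example: TP = 2, FP = 1, FN = 1, TN = 4.
gt = np.array([[1, 1, 0, 0],
               [1, 0, 0, 0]])
pred = np.array([[1, 0, 0, 0],
                 [1, 1, 0, 0]])
scores = segmentation_metrics(pred, gt)
```

For this toy pair the accuracy is 6/8 = 0.75, the specificity is 4/5 = 0.8, and precision, sensitivity and Dice coefficient are all 2/3, so the Dice loss is 1/3.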
Experimental Details
The ADIDUNET proposed in this paper is implemented in the Keras framework and is trained and tested on a workstation with an NVIDIA P5000 GPU. During training, we set the learning rate to ${l}_{r}=1\times {10}^{-3}$ and used the Adam optimizer. The 9,226 training samples, 320 validation samples, and 200 test samples were resized to 128 × 128, and the model was trained with a batch size of 32 for 300 epochs. Figures 13 and 14 show the performance curves obtained for the proposed ADIDUNET during training, validation, and testing.
Segmentation results and discussion
Qualitative results
To show the performance of the ADIDUNET model, we used 200 pairs of COVID19 lung infection CT scans as test data; the segmentation results are shown in Fig. 15. The ADIDUNET model accurately segments the COVID19 lung infection areas from the CT scans, especially the smaller infected areas, and the segmentation results are very close to the ground truth. ADIDUNET also accurately segments complicated infection patterns, from single infected areas to more complex, unevenly distributed ones, which further demonstrates the capability of the proposed model. In short, the ADIDUNET model can effectively and accurately segment COVID19 lung infection areas of different sizes and uneven distribution, and the visual quality of the segmentation is very close to the gold standard.
We also compare the proposed model with other state-of-the-art segmentation models. From the results (Figs. A1 and A2 and Table 7), we can infer that the ADIDUNET model presents segmentation outputs closer to the ground truth. In contrast, the FCN8s network (Long, Shelhamer & Darrell, 2015) produces more under- and over-segmented regions. RADUNET (Zhuang et al., 2019a) presents comparable segmentation results, but its effect is less pronounced for smaller segments. The visual results in Figs. A1 and A2 show that the ADIDUNET model segments the COVID19 lung infection regions more accurately than the other state-of-the-art models, with results close to the ground truth, which supports the efficacy of the proposed ADIDUNET model.
Method  ACC  DC  S_{en}  S_{p}  P_{c}  AUC  F1  S_{m}  E_{α}  MAE 

FCN8s  0.9666  0.6697  0.6692  0.9923  0.6860  0.9485  0.6724  0.7539  0.9134  0.0157 
UNET  0.9696  0.7998  0.8052  0.9957  0.8247  0.9347  0.8154  0.8400  0.9390  0.0088 
Segnet  0.9684  0.7408  0.7608  0.9937  0.7549  0.9492  0.7558  0.8080  0.9374  0.0125 
Squeeze UNET  0.9689  0.7681  0.7827  0.9946  0.7776  0.9446  0.7785  0.8227  0.9326  0.0107 
Residual UNET  0.9697  0.7924  0.7905  0.9961  0.8248  0.9444  0.8055  0.8397  0.9324  0.0094 
RAD UNET  0.9699  0.7895  0.7625  0.9970  0.8601  0.9419  0.8062  0.8475  0.9328  0.0096 
Fan et al. (2020)  -  0.7390  0.7250  0.9600  -  -  -  0.8000  0.8940  0.0640 
Elharrouss et al. (2020)  -  0.7860  0.7110  0.9930  0.8560  -  0.7940  -  -  0.0760 
Yan et al. (2020)  -  0.7260  0.7510  -  0.7260  -  -  -  -  - 
Zhou, Canu & Ruan (2020)  -  0.6910  0.8110  0.9720  -  -  -  -  -  - 
Chen, Yao & Zhang (2020)  0.8900  -  -  0.9930  0.9500  -  -  -  -  - 
ADIDUNET  0.9701  0.8031  0.7973  0.9966  0.8476  0.9551  0.8200  0.8509  0.9449  0.0082 
Quantitative results
Table 7 presents the performance scores for the indicators described in “Segmentation evaluation index”. For ADIDUNET, the Dice coefficient, precision, F1 score, specificity and AUC are 80.31%, 84.76%, 82.00%, 99.66% and 95.51%, respectively. Most of the performance indexes are above 0.8, and the segmentation accuracy of 97.01% is the highest among the compared methods. These results clearly indicate that the proposed model presents segmentation outputs closer to the ground truth annotations.
Discussion
The proposed model is an improved version of the UNET model, obtained by adding the dense network, IDC and attention gate modules to the existing UNET (Ronneberger, Fischer & Brox, 2015) structure. The effectiveness of these additions was experimentally verified in “Methods”. To summarize the effect of adding each module to the UNET architecture, Table 8 tabulates the improvement at each stage. Table 8 shows that adding these components to the UNET (Ronneberger, Fischer & Brox, 2015) structure improves the overall segmentation accuracy of the network. For example, with the inclusion of the dense networks (DUNET), the Dice coefficient (DC) and AUC reached 80.11% and 94.92%, respectively, compared with 79.98% and 93.47% for the plain UNET.
Method  ACC  DC  S_{en}  S_{p}  P_{c}  AUC  F1  S_{m}  E_{α}  MAE 

UNET  0.9696  0.7998  0.8052  0.9957  0.8247  0.9347  0.8154  0.8400  0.9390  0.0088 
DUNET  0.9700  0.8011  0.8096  0.9966  0.8596  0.9492  0.8184  0.8528  0.9394  0.0083 
DIDUNET  0.9700  0.8023  0.7987  0.9964  0.8425  0.9549  0.8241  0.8447  0.9374  0.0084 
ADIDUNET  0.9701  0.8031  0.7973  0.9966  0.8476  0.9551  0.8200  0.8509  0.9449  0.0082 
Improvement of DUNET  ↑0.04%  ↑0.13%  ↑0.44%  ↑0.09%  ↑3.49%  ↑1.45%  ↑0.30%  ↑1.28%  ↑0.04%  ↓0.05% 
Improvement of DIDUNET  ↑0.04%  ↑0.25%  ↓0.65%  ↑0.07%  ↑1.78%  ↑2.02%  ↑0.87%  ↑0.47%  ↓0.16%  ↓0.04% 
Improvement of ADIDUNET  ↑0.05%  ↑0.33%  ↓0.79%  ↑0.09%  ↑2.29%  ↑2.04%  ↑0.46%  ↑1.09%  ↑0.59%  ↓0.06% 
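The improvement rows of Table 8 are differences in percentage points between each variant and the UNET baseline. The short sketch below recomputes the ADIDUNET row from the scores in the table; the helper name `improvement` and the plain metric labels are introduced here for illustration only.

```python
# Column order of Table 8.
metrics = ["ACC", "DC", "Sen", "Sp", "Pc", "AUC", "F1", "Sm", "E_alpha", "MAE"]

# Scores copied from the UNET and ADIDUNET rows of Table 8.
unet = [0.9696, 0.7998, 0.8052, 0.9957, 0.8247, 0.9347, 0.8154, 0.8400, 0.9390, 0.0088]
adidunet = [0.9701, 0.8031, 0.7973, 0.9966, 0.8476, 0.9551, 0.8200, 0.8509, 0.9449, 0.0082]

def improvement(variant, baseline):
    """Difference to the baseline in percentage points, rounded to two decimals."""
    return [round((v - b) * 100, 2) for v, b in zip(variant, baseline)]

delta = dict(zip(metrics, improvement(adidunet, unet)))
for name, value in delta.items():
    arrow = "↑" if value > 0 else "↓"
    print(f"{name}: {arrow}{abs(value):.2f}%")
```

A positive difference is an improvement for every column except MAE, where a lower value is better, so the ↓0.06% in the MAE column also indicates a gain.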
The inclusion of the IDC improved the scores further (DIDUNET). Finally, the proposed model with the dense network, IDC and AG modules (ADIDUNET) achieved the best accuracy, DC, AUC, E_{α} and MAE in Table 8 and provided improvements of 0.05%, 0.33%, 2.29%, 2.04% and 1.09% in accuracy, DC, precision, AUC and structural metric, respectively, compared with the traditional UNET architecture.
Furthermore, Figs. A1 and A2 show that ADIDUNET performs better visually than other well-known segmentation models. Specifically, ADIDUNET can segment relatively small infected regions, which is of great significance for accurate clinical diagnosis of the location of COVID19 infection. The use of (a) dense networks instead of traditional convolution and max-pooling functions, (b) an improved dilation convolution module between the encoder and decoder pipelines and (c) attention gates in the skip connections yields accurate segmentation outputs for various types of COVID19 infection (GGO and pulmonary consolidation). However, ADIDUNET still has room for improvement in terms of Dice coefficient, sensitivity and computational cost, which can be addressed in future research.
Conclusion
This paper proposes a new variant of the UNET (Ronneberger, Fischer & Brox, 2015) architecture to accurately segment COVID19 lung infections in CT scans. The model, ADIDUNET, combines dense networks, improved dilation convolution, and attention gates, which give it strong feature extraction and segmentation capabilities. The experimental results show that ADIDUNET is effective in segmenting small infection regions, with an accuracy, precision and F1 score of 97.01%, 84.76%, and 82.00%, respectively. The segmentation results of the ADIDUNET network can aid clinicians in faster screening and quantification of the lesion areas and provide an overall improvement in the diagnosis of COVID19 lung infection.
Appendix
We describe the abbreviations of this paper in detail, as shown in Table A1.
Abbreviation  Explanation 

DUNET  Inclusion of Dense networks to the UNET structure 
AGUNET  Inclusion of Attention gate module to the UNET structure 
DAUNET  Inclusion of both dense networks and attention gate module to the UNET structure 
IDAUNET  Inclusion of Improved dilation convolution and Attention Gate module to the UNET structure 
DIDUNET  Inclusion of dense networks and improved dilation convolution to the UNET structure 
ADIDUNET  Inclusion of dense networks, Improved dilation convolution and Attention Gate modules to the UNET structure 