Metal artifact reduction combined with deep learning image reconstruction algorithm for CT image quality optimization: a phantom study

Huachun Zou; Zonghuo Wang; Mengya Guo; Kun Peng; Jian Zhou; Lili Zhou; Bing Fan

doi:10.7717/peerj.19516

Huachun Zou¹,², Zonghuo Wang², Mengya Guo³, Kun Peng², Jian Zhou², Lili Zhou¹, Bing Fan²

¹School of Medical and Information Engineering, Gannan Medical University, Ganzhou, China

²Department of Radiology, Jiangxi Provincial People’s Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, China

³CT Imaging Research Center, GE Healthcare China, Beijing, China

Published: 2025-06-04
Accepted: 2025-05-02
Received: 2024-11-29

Academic Editor: Nikolaos Gkantidis

Subject Areas: Cardiology, Radiology and Medical Imaging, Computational Science, Data Mining and Machine Learning, Data Science
Keywords: Deep learning image reconstruction, Metal artifact reduction, CT, Image quality, Diagnostic performance

Copyright: © 2025 Zou et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.

Cite this article: Zou H, Wang Z, Guo M, Peng K, Zhou J, Zhou L, Fan B. 2025. Metal artifact reduction combined with deep learning image reconstruction algorithm for CT image quality optimization: a phantom study. PeerJ 13:e19516 https://doi.org/10.7717/peerj.19516

The authors have chosen to make the review history of this article public.

Abstract

Background

This study aimed to evaluate the effects of the smart metal artifact reduction (MAR) algorithm and combinations of scanning parameters, including radiation dose level, tube voltage, and reconstruction algorithm, on metal artifact reduction and overall image quality, in order to identify the optimal protocol for clinical application.

Methods

A phantom with a pacemaker was examined using standard dose (effective dose (ED): 3 mSv) and low dose (ED: 0.5 mSv), with three scan voltages (70, 100, and 120 kVp) selected for each dose. Raw data were reconstructed using 50% adaptive statistical iterative reconstruction-V (ASIR-V), ASIR-V with MAR, high-strength deep learning image reconstruction (DLIR-H), and DLIR-H with MAR. Quantitative analyses (artifact index (AI), noise, signal-to-noise ratio (SNR) of artifact-impaired pulmonary nodules (PNs), and noise power spectrum (NPS) of artifact-free regions) and qualitative evaluation were performed.

Results

Quantitatively, images reconstructed with the deep learning image reconstruction (DLIR) algorithm or acquired at high tube voltages exhibited lower noise than those reconstructed with ASIR-V or acquired at low tube voltages (p < 0.001). The AI of images with MAR or high tube voltages was significantly lower than that of images without MAR or with low tube voltages (p < 0.001). No significant difference in AI was observed between low-dose images with 120 kVp DLIR-H MAR and standard-dose images with 70 kVp ASIR-V MAR (p = 0.143). Among the four reconstruction algorithms, only the 70 kVp 3 mSv protocol showed a statistically significant difference in SNR for artifact-impaired PNs (p = 0.041). The f_peak and f_avg values were similar across the various scenarios, indicating that the MAR algorithm did not alter the image texture in artifact-free regions. The qualitative results for the extent of metal artifacts, the confidence in diagnosing artifact-impaired PNs, and the overall image quality were generally consistent with the quantitative results.

Conclusion

The MAR algorithm combined with DLIR-H can reduce metal artifacts and enhance the overall image quality, particularly at high tube voltages.

Introduction

With the accelerating aging of the global population, metal implants for fixation or prosthetic replacement, including dental prostheses (Bayerl et al., 2023), spinal screws (Enache et al., 2025), hip arthroplasty (Zhao et al., 2023), and cardiovascular implantable electronic devices (CIEDs) (Wong & Devereaux, 2019), have been extensively utilized. These metallic devices induce substantial imaging artifacts through multiple physical mechanisms, such as photon starvation, beam hardening, and scatter, which collectively degrade diagnostic image quality in CT. Specifically, metal artifacts generated by CIEDs during CT scanning significantly degrade visualization of adjacent anatomical structures, including mediastinal vasculature, lymph nodes, and parenchymal tissues (Pennig et al., 2021; Zhao et al., 2023). These artifacts critically compromise diagnostic accuracy in routine thoracic CT applications, particularly impacting lung cancer screening sensitivity, treatment planning, and therapeutic response assessment (Kikuchi et al., 2020).

In recent years, various technical strategies have been developed to mitigate metal artifacts in CT imaging, including optimization of acquisition parameters (increased tube voltage and tube current) (Selles et al., 2024), high-keV virtual monoenergetic imaging with spectral CT (Laukamp et al., 2019; Bongers et al., 2015; Khodarahmi et al., 2018; Long et al., 2019), and metal artifact reduction (MAR) algorithms (e.g., projection completion MAR, iterative MAR) (Lehti et al., 2020; Wichtmann et al., 2023). Among these, MAR techniques have emerged as pivotal solutions due to their ability to correct abnormal X-ray attenuation profiles caused by metallic implants through either projection data compensation or image domain iterations (Chae et al., 2020; Choo et al., 2021; Dunet et al., 2017; Kim et al., 2020; Kanani et al., 2022; Kovacs et al., 2018). Specifically, a projection-based MAR algorithm such as smart MAR (GE HealthCare, Chicago, IL, USA) synthesizes corrected projections using a combination of the original and substitutive projection data, potentially inducing global alterations in the projection domain (Fukugawa et al., 2022). However, most existing research focuses on artifact reduction in artifact-impaired areas, while less attention is paid to artifact-free regions, particularly in terms of image texture preservation.

Furthermore, in images with metal artifacts, it is important to consider not only the extent of artifacts but also the overall image quality, including image noise, contrast, and texture preservation. These comprehensive image quality metrics, as well as artifact reduction, are influenced by a variety of parameters. For instance, tube voltage impacts both image contrast and the degree of artifacts (Zhao et al., 2023), while the radiation dose and reconstruction algorithm generally impact the image noise (Szczykutowicz et al., 2021). However, some studies have reported that certain reconstruction algorithms have significant potential to reduce beam-hardening artifacts (Fujita et al., 2023; Yasaka et al., 2017). For instance, Li et al. (2024) demonstrated that the artificial intelligence iterative reconstruction (AIIR) algorithm can mitigate streak artifacts caused by irregular arm positioning, thus reducing the likelihood of misdiagnosis. With the advancement of artificial intelligence, deep learning image reconstruction (DLIR, TrueFidelity, GE Healthcare) algorithms have emerged. DLIR is a vendor-specific, deep convolutional neural network-based image reconstruction technique that is trained under supervision, with millions of parameters, to produce an output image similar to filtered back projection (FBP) (Zhu et al., 2024). Compared to adaptive statistical iterative reconstruction-V (ASIR-V), which applies advanced noise, physics, and object modeling, DLIR can effectively balance noise, radiation dose, and image texture (Yang et al., 2021). Previous studies have demonstrated that DLIR has an excellent ability to improve image quality and reduce radiation doses in metal-free scenarios, including thoracic (Jiang et al., 2022a; Zhao et al., 2022; Yao et al., 2022), abdominal (Jensen et al., 2022; Caruso et al., 2024), and cerebral CT (Jiang et al., 2022b; Jiang et al., 2024). In images with metal artifacts, however, to our knowledge only Sun et al. (2024) have investigated the feasibility of metal artifact reduction in low-dose spinal CT for post-surgical children based on a combination of the MAR and DLIR algorithms.

This study therefore performed a phantom experiment with an attached pacemaker (a specific type of CIED) to systematically investigate the metal artifact reduction and image quality improvement achieved by combining the MAR and DLIR algorithms under various scan conditions (different tube voltages and radiation doses). The dual objectives are (1) to optimize CT protocols for patients with metal implants and (2) to explore the clinical implementation of DLIR in artifact management. The key innovations are the integration of DLIR with MAR across different radiation dose levels and, to our knowledge, the first comprehensive assessment of image texture fidelity in artifact-free regions.

Methods and Materials

Phantom

In this study, the Lungman chest phantom (Lungman ph-1, Kyoto Kagaku Inc., Japan) was utilized. The anatomical structures, including the trachea, pulmonary vessels, and mediastinum, were simulated using tissue substitutes. Thirteen spherical nodules (CT value = −800 HU, corresponding diameters = 12, 10, 8, and 5 mm; CT value = −630 HU, corresponding diameters = 12, 10, 8, 5, and 3 mm; CT value = 100 HU, corresponding diameters = 12, 10, 8, 5 mm) were randomly placed in the phantom using cotton. To investigate the impact of metal artifacts, a pacemaker was attached to the upper left of the chest phantom (Fig. 1A). Four non-solid pulmonary nodules (PNs) were obscured by streak artifacts (Fig. 1B).

Image acquisition and reconstruction

All scans were conducted using a 256-row multidetector CT scanner (Revolution Apex CT, GE Healthcare). To investigate the impacts of various scanning conditions on metal artifact reduction, three tube voltages (70, 100, and 120 kVp), along with their corresponding tube currents, were selected to achieve standard (3 mSv) and low (0.5 mSv) effective doses. The remaining scanning parameters were fixed across all scanning scenarios, as follows: display field of view (DFOV) of 42 cm × 42 cm, a pitch of 0.992, a detector width of 80 mm, a rotation time of 0.8 s/r, and a slice thickness of 1.25 mm. Furthermore, all acquisitions were reconstructed using high strength DLIR (DLIR-H), DLIR-H with metal artifact reduction (DLIR-H MAR), 50% adaptive statistical iterative reconstruction-V (ASIR-V), and ASIR-V with metal artifact reduction (ASIR-V MAR). CT scanning was repeated three times for each scenario.

Objective image quality evaluation

For quantitative analysis, the artifact index (AI), background noise, signal-to-noise ratio (SNR) of artifact-impaired non-solid PNs, and noise power spectrum (NPS) of artifact-free regions were calculated. All image sequences were loaded into MITK software (v2024.06, The German Cancer Research Center, Heidelberg, Germany), and regions of interest (ROIs) were delineated in background air, artifact areas, and artifact-impaired PNs by an experienced radiologist (with nine years of chest radiology experience) based on the 3 mSv 120 kVp DLIR-H MAR images. To ensure that the same ROIs were analyzed consistently, these ROIs were saved and subsequently imported into the other image sets. According to a previous study (Chae et al., 2020), the AI was quantified using the following formula: $AI = \sqrt{{SD}_{artifact}^{2} - {SD}_{background}^{2}}$, where SD_artifact and SD_background represent the standard deviation (SD) of the streak artifact and background, respectively. To represent the extent of the artifacts as comprehensively as possible, ROIs (50 mm²) were placed in five consecutive artifact-pronounced slices. Background ROIs (50 mm²) were located in the air in five consecutive artifact-free slices, and the SD_background in the AI formula was the average of these five background SDs. To assess the influence of artifacts on PNs, the SNR values of artifact-impaired non-solid PNs (Fig. 1B) were calculated using the following formula: $SNR = \frac{|{mean}_{PN}|}{{SD}_{PN}}$, where mean_PN and SD_PN refer to the average and SD of the CT values of the four PNs at the maximum slice, respectively. To investigate the influence of the DLIR and MAR algorithms on the image texture of artifact-free regions, the NPS was evaluated in the homogeneous heart region using imQuest software (Clinical Imaging Physics Group, Duke University, Durham, NC, USA), and the NPS area, average spatial frequency (f_avg), and peak spatial frequency (f_peak) were calculated.
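As an illustration of the two formulas above, the following minimal Python sketch computes AI and SNR from ROI statistics. The function names, the sample numbers, the use of the population SD, and the clamping of a negative radicand to zero are assumptions made for illustration; they are not taken from the MITK workflow used in this study.

```python
import math
from statistics import mean, pstdev


def artifact_index(sd_artifact, background_sds):
    """AI = sqrt(SD_artifact^2 - SD_background^2), in HU.

    SD_background is the mean of the background-air SDs, as described above.
    """
    sd_background = mean(background_sds)
    radicand = sd_artifact ** 2 - sd_background ** 2
    # Assumption: clamp to zero if the artifact ROI is no noisier than air.
    return math.sqrt(max(radicand, 0.0))


def snr(hu_values):
    """SNR = |mean| / SD of the CT values (HU) inside a nodule ROI."""
    # Assumption: population SD; the source does not state which SD was used.
    return abs(mean(hu_values)) / pstdev(hu_values)


# Hypothetical inputs, for illustration only (not measured data).
background_sds = [17.0, 16.5, 16.8, 16.4, 16.3]  # five background-air ROIs
artifact_sds = [32.0, 35.5, 30.1, 33.8, 31.2]    # five artifact ROIs
ai_per_slice = [artifact_index(sd, background_sds) for sd in artifact_sds]
nodule_hu = [-640.0, -620.0, -630.0, -630.0]
print([round(ai, 1) for ai in ai_per_slice], round(snr(nodule_hu), 1))
```

Because the background SD is subtracted in quadrature, AI isolates the part of the local variation attributable to streaks rather than to quantum noise, which is why AI and background noise are reported separately.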

Subjective image quality evaluation

Two radiologists (with 5 and 9 years of experience in CT image diagnosis, respectively) independently performed subjective evaluations using a 5-point scale to assess the extent of metal artifacts, the confidence in diagnosing artifact-impaired PNs, and the overall image quality. The extent of metal artifacts was rated as follows: severe artifacts, unable to be diagnosed = 0, pronounced artifacts = 1, moderate artifacts = 2, mild artifacts = 3, and no artifacts = 4. The confidence in diagnosing artifact-impaired PNs was graded as follows: undetectable = 0, poorly detectable = 1, moderately detectable = 2, well detectable = 3, and manifestly detectable = 4. The overall image quality was rated as follows: very poor = 0, poor = 1, acceptable = 2, good = 3, and excellent = 4.

Statistical analysis

IBM SPSS statistical software (version 25.0, IBM Corp) was used for statistical analyses. According to the Shapiro–Wilk test, the objective parameters and subjective scores were not normally distributed. Therefore, AI, noise, and SNR were expressed as M [Q1, Q3], where M represents the median, and Q1 and Q3 denote the first and third quartiles, respectively. The differences in these parameters among the image sets were compared using the Kruskal–Wallis test with a Bonferroni post hoc test. Agreement between the two radiologists' subjective scores was evaluated using Cohen’s kappa test, with values greater than 0.75 indicating high consistency, values ranging from 0.4 to 0.75 indicating average consistency, and values less than 0.4 indicating poor consistency. A p value < 0.05 was considered statistically significant.
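The agreement measure described above can be sketched in a few lines of Python. This is a minimal illustration of unweighted Cohen's kappa with the consistency thresholds stated in this section; the study itself used SPSS, the source does not say whether a weighted variant was applied, and the function names and example scores below are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: fraction of items given identical scores.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal score frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


def consistency_label(kappa):
    """Map kappa to the categories used in this section."""
    if kappa > 0.75:
        return "high"
    if kappa >= 0.4:
        return "average"
    return "poor"


# Hypothetical 0-4 scores from two readers (not study data).
reader_1 = [2, 3, 2, 4, 3, 2, 4, 3, 2, 3]
reader_2 = [2, 3, 2, 4, 3, 3, 4, 3, 2, 2]
kappa = cohen_kappa(reader_1, reader_2)
print(round(kappa, 4), consistency_label(kappa))
```

Kappa corrects the raw agreement rate for the agreement expected by chance, so two readers who both assign mostly the same score can agree often and still obtain a modest kappa.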

Results

Objective image quality evaluation

Table 1 summarizes the results of the noise and AI evaluations for the various reconstruction algorithms and scanning scenarios (radiation doses and tube voltages). Compared to ASIR-V, the background noise of DLIR-H was reduced by 25.95% to 53.50% (paired calculation), whereas the background noise of DLIR-H MAR was reduced by 27.73% to 54.24% compared to ASIR-V MAR (paired calculation) across different dose levels and tube voltages (all p < 0.001). The noise of low-dose images with DLIR was similar to that of standard-dose images with ASIR-V, although the differences were statistically significant (p < 0.05, Table 1). Furthermore, for different tube voltages, the noise values of 70 kVp images were significantly higher than those of the other two tube voltages (p < 0.001), except for DLIR-H (p = 0.87) and DLIR-H MAR (p = 0.735) images at 0.5 mSv. The background noise values showed no statistically significant difference between MAR and non-MAR images across different tube voltages and radiation doses (Figs. 2A–2F). Under the same tube voltage and reconstruction algorithm, the noise of the 3 mSv image group was lower than that of the 0.5 mSv group (all p < 0.001, File S1).

Table 1:

Background noise and AI analysis across four different reconstruction algorithms (DLIR, DLIR-MAR, ASIR-V, and ASIR-V MAR), tube voltage (70 kVp/100 kVp/120 kVp) and dose level (3 mSv and 0.5 mSv) (Median [Q1,Q3]).

[mSv, kVp]	ASIR-V	ASIR-V MAR	DLIR-H	DLIR-H MAR	p
Noise
[3, 70]	32.9 [32.1, 35.1]*,+	35.5 [33.0, 36.8]*,+	17.8 [17.2, 18.9]+	18.9 [18.1, 20.2]*,+	<0.001
[3, 100]	28.4 [28.1, 29.1]	32.5 [30.4, 33.4]	15.7 [15.2, 16.0]X	17.0 [16.2, 17.9]	<0.001
[3, 120]	29.9 [29.6, 31.0]	30.5 [29.6, 32.2]	17.2 [16.1, 17.7]	16.6 [16.1, 17.4]	<0.001
p	<0.001	<0.001	0.001	<0.001
[0.5, 70]	67.4 [62.6, 68.9]*,+	63.8 [62.6, 66.0]*,+	37.3 [36.5, 38.3]	38.8 [37.0, 40.2]	<0.001
[0.5, 100]	59.9 [58.0, 63.4]	58.4 [56.8, 60.4]	36.4 [35.1, 40.8]	37.3 [36.3, 39.5]	<0.001
[0.5, 120]	60.9 [58.4, 63.1]	60.6 [58.6, 62.6]	38.2 [34.6, 40.8]	38.2 [36.0, 40.3]	<0.001
p	<0.001	<0.001	0.87	0.735
AI
[3, 70]	103.5 [85.1, 219.3]*,+	39.4 [35.0, 48.3]*,+	92.4 [81.4, 111.0]*,+	31.9 [30.2, 38.4]*,+	<0.001
[3, 100]	115.0 [68.6, 221.5]X	29.1 [24.6, 35.6]	72.6 [59.4, 94.3]X	27.9 [25.0, 33.2]	<0.001
[3, 120]	75.7 [58.7, 99.4]	29.2 [25.8, 34.9]	60.7 [50.3, 66.5]	27.3 [23.3, 31.4]	<0.001
p	0.002	<0.001	<0.001	0.001
[0.5, 70]	113.0 [98.8, 124.0]*,+	46.1 [33.5, 54.9]	115.6 [104.3, 124.3]*,+	43.1 [36.6, 48.3]	<0.001
[0.5, 100]	67.8 [61.7, 77.7]	38.3 [20.7, 50.9]	79.6 [68.8, 83.5]	39.7 [29.8, 46.5]	<0.001
[0.5, 120]	65.3 [56.2, 78.6]	38.2 [4.5, 44.5]	71.2 [66.5, 80.7]	37.3 [28.7, 46.0]	<0.001
p	<0.001	0.078	<0.001	0.149

DOI: 10.7717/peerj.19516/table-1

Notes:

AI: artifact index
ASIR-V: 50% adaptive statistical iterative reconstruction-V
ASIR-V MAR: ASIR-V 50% with MAR
DLIR-H: deep learning image reconstruction with high strength
DLIR-H MAR: DLIR-H with MAR

* Value was statistically different between the 70 kVp and 120 kVp groups.

+ Value was statistically different between the 70 kVp and 100 kVp groups.

X Value was statistically different between the 100 kVp and 120 kVp groups.

Figure 2: Comparison of noise and AI across four reconstruction algorithms (DLIR, DLIR-MAR, ASIR-V, and ASIR-V MAR).
The labels (A–F) denote the noise levels of the four groups under the following six scanning conditions: 120 kVp 3 mSv, 100 kVp 3 mSv, 70 kVp 3 mSv, 120 kVp 0.5 mSv, 100 kVp 0.5 mSv, and 70 kVp 0.5 mSv. The labels (G–L) denote the AI levels of the four groups under the same six scanning conditions: 120 kVp 3 mSv, 100 kVp 3 mSv, 70 kVp 3 mSv, 120 kVp 0.5 mSv, 100 kVp 0.5 mSv, and 70 kVp 0.5 mSv. ns, no statistical difference; AI, artifact index; ASIR-V, 50% adaptive statistical iterative reconstruction-V; ASIR-V MAR, ASIR-V 50% with MAR; DLIR-H, deep learning image reconstruction with high strength; DLIR-H MAR, DLIR-H with MAR.

DOI: 10.7717/peerj.19516/fig-2

AI decreased significantly with MAR (MAR: 27.3–46.1 HU; without MAR: 60.7–115.6 HU; all p < 0.001) and with high tube voltages (except for 0.5 mSv ASIR-V MAR and DLIR-H MAR, all p < 0.001), as shown in Table 1 and Figs. 2G–2L. Compared to low-kVp images, the AI values for high-kVp images decreased, except for 0.5 mSv ASIR-V MAR and DLIR-H MAR (e.g., ASIR-V at 0.5 mSv and 70/100/120 kVp: 113.0 [98.8, 124.0]/67.8 [61.7, 77.7]/65.3 [56.2, 78.6] HU), with the lowest AI values obtained using high kVp combined with the MAR algorithm (3 mSv 120 kVp in DLIR-H MAR: 27.3 [23.3, 31.4] HU; 0.5 mSv 120 kVp in DLIR-H MAR: 37.3 [28.7, 46.0] HU). Furthermore, in the four-group comparison (Kruskal–Wallis test among ASIR-V, ASIR-V MAR, DLIR-H, and DLIR-H MAR), there was no statistically significant difference in AI values between ASIR-V and DLIR-H, or between ASIR-V MAR and DLIR-H MAR (Figs. 2G–2L). In addition, the AI of low-dose images with 120 kVp DLIR-H MAR was comparable to that of standard-dose images with 70 kVp ASIR-V MAR (p = 0.143). Under different radiation doses, the AI of images in the 3 mSv group was significantly lower than that of the 0.5 mSv group across multiple protocols (70 kVp DLIR-H, DLIR-H MAR; 100 kVp ASIR-V, DLIR-H MAR; 120 kVp ASIR-V MAR, DLIR-H, DLIR-H MAR; p < 0.001, File S2).

Figure 3 and Table 2 indicate that, among the four reconstruction algorithms, only the 70 kVp 3 mSv protocol showed a statistically significant difference in SNR for artifact-impaired PNs (p = 0.041), whereas no significant SNR differences were observed in the other comparisons. In the 70 kVp 3 mSv group, ASIR-V MAR showed a median SNR increase of approximately 74.5% compared to ASIR-V, while DLIR-H MAR exhibited a 137.3% higher median SNR than DLIR-H. Under the same reconstruction algorithm and radiation dose, there was no statistical difference in the SNR of artifact-impaired PNs among the three tube voltages (for p values, see Table 2). Furthermore, except for the 70 kVp ASIR-V and 70 kVp DLIR-H groups, the SNR for artifact-impaired PNs in the 3 mSv groups was significantly higher than that in the 0.5 mSv groups across all other scan protocols (with varying voltages and reconstruction algorithms) (p < 0.05).
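The two percentage increases quoted above follow directly from the 70 kVp 3 mSv medians in Table 2. A short Python check is given below; the helper name percent_change is an illustrative assumption, while the four input values are the medians reported in Table 2.

```python
def percent_change(baseline, new):
    """Relative change of `new` with respect to `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Median SNR values at 3 mSv, 70 kVp (Table 2).
asir_v, asir_v_mar = 10.6, 18.5
dlir_h, dlir_h_mar = 7.5, 17.8

asir_gain = round(percent_change(asir_v, asir_v_mar), 1)
dlir_gain = round(percent_change(dlir_h, dlir_h_mar), 1)
print(asir_gain, dlir_gain)  # 74.5 137.3
```

Note that these are changes between group medians, not medians of per-nodule changes, so they describe the shift of the distribution rather than the gain for any single nodule.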

Figure 3: Boxplots of SNR across four reconstruction algorithms.
Boxplots of SNR across four reconstruction algorithms (DLIR, DLIR-MAR, ASIR-V, and ASIR-V MAR). The labels (A–F) denote the SNR levels of the four groups under the following six scanning conditions: 120 kVp 3 mSv, 100 kVp 3 mSv, 70 kVp 3 mSv, 120 kVp 0.5 mSv, 100 kVp 0.5 mSv, and 70 kVp 0.5 mSv. ASIR-V, 50% adaptive statistical iterative reconstruction-V; ASIR-V MAR, ASIR-V 50% with MAR; DLIR-H, deep learning image reconstruction with high strength; DLIR-H MAR, DLIR-H with MAR.

DOI: 10.7717/peerj.19516/fig-3

Table 2:

SNR of artifact impaired non-solid PNs across four different reconstruction algorithms (DLIR, DLIR-MAR, ASIR-V, and ASIR-V MAR), tube voltage (70 kVp/100 kVp/120 kVp) and dose level (3 mSv and 0.5 mSv) (Median [Q1,Q3]).

[mSv, kVp]	ASIR-V	ASIR-V MAR	DLIR-H	DLIR-H MAR	p
SNR
[3, 70]	10.6 [7.7, 18.4]	18.5 [17.5, 20.5]	7.5 [6.5, 18.6]	17.8 [14.5, 25.2]	0.041
[3, 100]	12.7 [9.8, 16.9]	15.4 [12.8, 23.0]	10.8 [9.6, 17.7]	17.1 [11.9, 25.8]	0.194
[3, 120]	14.9 [11.7, 20.6]	18.7 [16.0, 21.8]	16.5 [11.1, 20.7]	20.7 [14.2, 26.6]	0.405
p	0.190	0.641	0.199	0.879
[0.5, 70]	7.4 [6.1, 9.6]	8.9 [8.5, 10.2]	6.47 [5.69, 9.30]	9.1 [7.3, 13.6]	0.143
[0.5, 100]	7.7 [6.7, 8.7]	9.5 [8.2, 12.0]	7.0 [5.9, 9.4]	9.4 [6.9, 13.5]	0.331
[0.5, 120]	8.8 [8.1, 9.21]	10.5 [8.4, 11.8]	7.9 [6.7, 9.8]	10.3 [8.0, 13.2]	0.271
p	0.462	0.574	0.543	0.911

DOI: 10.7717/peerj.19516/table-2

Notes:

SNR: signal-to-noise ratio
ASIR-V: 50% adaptive statistical iterative reconstruction-V
ASIR-V MAR: ASIR-V 50% with MAR
DLIR-H: deep learning image reconstruction with high strength
DLIR-H MAR: DLIR-H with MAR

Regarding the NPS, the trends in the NPS area across kVp levels, radiation doses, and reconstruction algorithms were similar to those of background noise: higher kVp, higher dose, and DLIR were associated with lower NPS areas. Regarding noise texture, the differences in f_peak/f_avg values between MAR and non-MAR images were negligible, and both were close to the reference values (3 mSv, FBP). Similarly, the influences of different voltages and radiation doses on f_peak/f_avg were also negligible in our study (Table 3 and Fig. 4).

Table 3:

NPS analysis across four different reconstruction algorithms (DLIR, DLIR-MAR, ASIR-V, and ASIR-V MAR), tube voltage (70 kVp/100 kVp/120 kVp) and dose level (3 mSv and 0.5 mSv).

[mSv, kVp]	ASIR-V	ASIR-V MAR	DLIR-H	DLIR-H MAR	Ref (3 mSv, FBP)
NPS area (HU²mm)
[3, 70]	41.7	41.6	22.0	22.1	77.7
[3, 100]	35.9	36.8	20.0	20.3	66.2
[3, 120]	36.4	36.6	20.4	20.3	67.1
[0.5, 70]	93.8	93.4	41.3	41.3	77.7
[0.5, 100]	85.1	84.4	39.7	39.7	66.2
[0.5, 120]	84.8	85.0	40.4	40.2	67.1
f_peak (mm⁻¹)
[3, 70]	0.44	0.42	0.39	0.40	0.45
[3, 100]	0.42	0.45	0.40	0.41	0.45
[3, 120]	0.45	0.43	0.40	0.42	0.43
[0.5, 70]	0.42	0.43	0.43	0.42	0.45
[0.5, 100]	0.42	0.42	0.42	0.42	0.45
[0.5, 120]	0.43	0.41	0.43	0.41	0.43
f_avg (mm⁻¹)
[3, 70]	0.43	0.42	0.43	0.42	0.43
[3, 100]	0.43	0.43	0.43	0.43	0.44
[3, 120]	0.43	0.43	0.43	0.43	0.44
[0.5, 70]	0.41	0.41	0.43	0.43	0.43
[0.5, 100]	0.41	0.41	0.43	0.43	0.44
[0.5, 120]	0.41	0.41	0.43	0.43	0.44

DOI: 10.7717/peerj.19516/table-3

Notes:

NPS: noise power spectrum
ASIR-V: 50% adaptive statistical iterative reconstruction-V
ASIR-V MAR: ASIR-V 50% with MAR
DLIR-H: deep learning image reconstruction with high strength
DLIR-H MAR: DLIR-H with MAR
f_peak: the peak spatial frequency of NPS
f_avg: the average spatial frequency of NPS
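For readers who want to reproduce the three NPS summary metrics from a sampled radial NPS curve, a minimal Python sketch follows. It uses commonly applied definitions (trapezoidal area, frequency of the maximum, and NPS-weighted mean frequency); the exact computation inside imQuest is not described in this article, so the function name, the definitions, and the triangular test curve are assumptions for illustration.

```python
def nps_summary(freqs, nps):
    """Return (area, f_peak, f_avg) for a sampled 1D NPS curve.

    freqs: spatial frequencies in mm^-1, ascending.
    nps:   NPS values at those frequencies.
    """
    if len(freqs) != len(nps) or len(freqs) < 2:
        raise ValueError("need two equal-length sequences with >= 2 samples")
    # Area under the curve by the trapezoidal rule.
    area = sum(
        (freqs[i + 1] - freqs[i]) * (nps[i] + nps[i + 1]) / 2.0
        for i in range(len(freqs) - 1)
    )
    # f_peak: frequency at which the NPS reaches its maximum.
    f_peak = freqs[max(range(len(nps)), key=lambda i: nps[i])]
    # f_avg: NPS-weighted mean frequency (assumes uniform frequency spacing).
    f_avg = sum(f * n for f, n in zip(freqs, nps)) / sum(nps)
    return area, f_peak, f_avg


# Hypothetical symmetric curve, for illustration only (not study data).
freqs = [0.0, 0.2, 0.4, 0.6, 0.8]
nps = [0.0, 10.0, 20.0, 10.0, 0.0]
area, f_peak, f_avg = nps_summary(freqs, nps)
print(round(area, 3), f_peak, round(f_avg, 3))
```

Under these definitions the area tracks noise magnitude, whereas f_peak and f_avg describe how noise power is distributed over spatial frequency, which is why similar f_peak/f_avg values are read as similar noise texture.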

Figure 4: NPS curves across four reconstruction algorithms (DLIR, DLIR-MAR, ASIR-V, and ASIR-V MAR).
The labels (A–F) denote the NPS curves of the four groups under the following six scanning conditions: 120 kVp 3 mSv, 100 kVp 3 mSv, 70 kVp 3 mSv, 120 kVp 0.5 mSv, 100 kVp 0.5 mSv, and 70 kVp 0.5 mSv. ASIR-V, 50% adaptive statistical iterative reconstruction-V; ASIR-V MAR, ASIR-V 50% with MAR; DLIR-H, deep learning image reconstruction with high strength; DLIR-H MAR, DLIR-H with MAR.

DOI: 10.7717/peerj.19516/fig-4

Qualitative analysis

Interobserver agreement was statistically significant for the extent of metal artifacts, the confidence in diagnosing artifact-impaired PNs, and the overall image quality (κ = 0.79, 0.82, and 0.78 for 3 mSv, all p < 0.001; 0.74, 0.74, and 0.73 for 0.5 mSv, all p < 0.001). The median [Q1, Q3] scores for the extent of metal artifacts for 0.5 mSv ASIR-V, ASIR-V MAR, DLIR-H, and DLIR-H MAR were 2 [2, 2], 3 [3, 3], 2 [2, 3], and 3 [3, 3] (p < 0.001), and those for 3 mSv were 2 [2, 3], 4 [4, 4], 2 [2, 3], and 4 [4, 4] (p < 0.001). The influence of artifacts in images with MAR was significantly lower than that in images without MAR, in line with the findings of the objective evaluation. The median [Q1, Q3] scores for overall image quality for 0.5 mSv ASIR-V, ASIR-V MAR, DLIR-H, and DLIR-H MAR were 2 [1, 3], 3 [3, 3], 2 [2, 3], and 3 [3, 3] (p < 0.001), while for 3 mSv the values were 2 [1, 3], 4 [3, 4], 2 [1.75, 3], and 4 [4, 4] (p < 0.001). These results indicate that DLIR-H MAR/ASIR-V MAR exhibited superior image quality compared to DLIR-H/ASIR-V (p < 0.001). In terms of diagnostic confidence for PNs, there were statistically significant differences in the subjective scores among the algorithms: the median [Q1, Q3] scores for 0.5 mSv ASIR-V, ASIR-V MAR, DLIR-H, and DLIR-H MAR were 1 [1, 2], 3 [2, 3], 2 [1, 2], and 4 [2.75, 4] (p < 0.001), while for 3 mSv the values were 2 [1.75, 3], 4 [3.5, 4], 3 [3, 3], and 4 [4, 4] (p < 0.001), indicating that metal artifacts significantly affected the diagnostic confidence for artifact-impaired PNs (Table 4).

Table 4:

Subjective score across four different reconstruction algorithms (DLIR, DLIR-MAR, ASIR-V, and ASIR-V MAR), tube voltage (70 kVp/100 kVp/120 kVp) and dose level (3 mSv and 0.5 mSv).

[mSv, kVp]	ASIR-V	ASIR-V MAR	DLIR-H	DLIR-H MAR
The extent of metal artifact
[3, 70]	2	3.67	2	4
[3, 100]	2	4	2	4
[3, 120]	2.83	4	3.17	4
[0.5, 70]	1.83	3	2	3
[0.5, 100]	2	3	2	3
[0.5, 120]	2.5	3	3	3.5
The confidence of PNs diagnosis
[3, 70]	2	3	2	4
[3, 100]	2.33	4	3	4
[3, 120]	3	4	3	4.17
[0.5, 70]	2	3	2	3
[0.5, 100]	2	3	2	3
[0.5, 120]	2.5	3	3	3
The overall image quality
[3, 70]	1	3	1.33	4
[3, 100]	2	4	2	4
[3, 120]	3	4	3	4
[0.5, 70]	1	3	2	3
[0.5, 100]	2	3	2	3
[0.5, 120]	3	3	3	3.5

DOI: 10.7717/peerj.19516/table-4

Discussion

In this study, we assessed the performance of the combinations of MAR and DLIR algorithms under various scanning scenarios on artifact reduction and image quality improvement. Both objective and subjective analyses showed that the MAR algorithm combined with DLIR-H at 120 kVp could significantly reduce metal artifacts and improve image quality while preserving image texture in artifact-free regions.

As demonstrated in Fig. 2 and Table 1, noise levels were predominantly influenced by the use of the DLIR algorithm and by radiation dose, whereas the MAR algorithm, which is designed for artifact suppression, had a negligible impact on noise characteristics. Regarding the DLIR algorithm, previous literature has demonstrated its potential to improve image quality (Li et al., 2024), enhance diagnostic confidence (Zhu et al., 2024), and reduce radiation dose (Yang et al., 2021; Jiang et al., 2022a), particularly in the detection of lung nodules. Jiang et al. (2022a) demonstrated the feasibility of using DLIR for lung nodule screening with chest X-ray doses, showing that images at 0.07/0.14 mSv yielded comparable results in lung nodule detection, SNR, and malignant features to those of 3 mSv enhanced images. Zhao et al. (2023) presented superior accuracy and repeatability in the detection of pulmonary lesions and nodules based on DLIR (D’Hondt et al., 2024). In line with these studies (Szczykutowicz et al., 2021), our research also showed that DLIR maintained lower noise levels than ASIR-V, regardless of tube voltage or radiation dose level (all p < 0.001). The noise of low-dose images with DLIR-H was comparable to that of standard-dose images with ASIR-V, although the differences were statistically significant (p < 0.05, Table 1). For different tube voltages, under identical reconstruction algorithms and radiation dose conditions, the 70 kVp protocol exhibited significantly higher noise than the 100 kVp/120 kVp settings in all groups (p < 0.001) except 0.5 mSv DLIR-H (p = 0.87) and DLIR-H MAR (p = 0.735). This observation may be explained by the characteristics of scans with metal implants: higher kVp settings improve photon penetration, allowing the detector to capture more effective signal and thereby reducing noise in the reconstructed images.

For the AI results, previous studies have demonstrated that the MAR algorithm alone can reduce metal artifacts and thereby improve delineation accuracy for dental implants (Fukugawa et al., 2022) or diagnostic confidence for knee implants (Zhang et al., 2020). In line with these papers, our study demonstrated that AI values in MAR images were significantly lower than those in images without MAR across the various tube voltages, radiation dose levels, and reconstruction algorithms (all p < 0.001). Furthermore, in the four-group comparison (ASIR-V, ASIR-V MAR, DLIR-H, and DLIR-H MAR), the Bonferroni post hoc tests of AI between DLIR-H and ASIR-V (or between DLIR-H MAR and ASIR-V MAR) were not statistically significant. Kovacs et al. (2018) reported similar results. However, when the AI was compared only between ASIR-V and DLIR-H images (or between DLIR-H MAR and ASIR-V MAR images) using the Mann–Whitney test across the various scenarios, DLIR-H images exhibited lower AI than ASIR-V images, and the differences were statistically significant at 3 mSv without the MAR algorithm at all tube voltages. The p values were 0.001, 0.003, and 0.036 for 120, 100, and 70 kVp, respectively. The lack of statistical differences between the ASIR-V and DLIR-H subgroups in the multi-group comparison (Kruskal–Wallis test with Bonferroni post hoc test) may result from the pronounced differences between images with and without MAR, which obscured the AI differences between DLIR-H and ASIR-V. Regarding the role of tube voltage in metal artifact reduction, some studies have demonstrated that high keV in spectral CT combined with the MAR algorithm effectively reduces artifacts and improves diagnostic confidence (Chae et al., 2020), while other studies have indicated that moderate keV (70–80 keV) with the MAR algorithm offers the best trade-off between vascular clarity and artifact levels (Zhao et al., 2023). In our study, the 70 kVp protocol produced significantly higher AI values than the other tube voltages (p < 0.001) in all groups except 0.5 mSv ASIR-V MAR (p = 0.078) and DLIR-H MAR (p = 0.149). The lack of statistically significant differences in AI values between tube voltages within these two low-dose groups may be attributed to the combined effects of increased noise from the low radiation dose and the effective artifact reduction by the MAR algorithm, which together diminished the inherent advantage of higher kVp in metal artifact reduction. Furthermore, the AI of 120 kVp images with DLIR-H MAR at 0.5 mSv was comparable to that of 70 kVp images with ASIR-V MAR at 3 mSv (p = 0.143). Thus, DLIR-H combined with high kVp and the MAR algorithm opens the possibility of low-dose scanning in the presence of metal implants. Qualitative evaluations of the extent of metal artifacts by the observers were consistent with the quantitative metrics.

For most conditions, the SNR values of the artifact-impaired PNs did not differ significantly among the four reconstruction groups (the exception being 3 mSv at 70 kVp). This may be because the four nodules were affected by artifacts to different degrees, producing a wide range of SNR values under the same algorithm; as a result, even substantial differences in medians did not reach statistical significance. Nevertheless, considering the statistical results of the 3 mSv, 70 kVp group together with the median SNR values of the other groups, SNR was higher in the MAR groups than in the corresponding groups without MAR (ASIR-V vs. ASIR-V MAR and DLIR-H vs. DLIR-H MAR). An increase in SNR can arise from an increase in CT values or a decrease in SD values, and a reduction in SD is mainly due to decreased artifacts and noise. In non-enhanced images, the advantages of high contrast and high CT values in low keV images are diminished because no iodine contrast agent is present. Thus, decreased SD is the primary contributor to the increased SNR. Our results showed that the noise of MAR images was comparable to that of non-MAR images, whereas AI decreased significantly in MAR images, leading to a lower SD. Images with MAR therefore had higher SNR than images without MAR because of the reduction in metal artifacts.
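
The reasoning above, that a lower SD at an unchanged mean CT value raises SNR, can be made concrete with a small Python sketch. It assumes SNR is defined as the mean CT value of a region of interest divided by its SD, and it uses the sample standard deviation. The `snr` helper and both ROI value lists are hypothetical illustrations, not measurements from this study.

```python
from statistics import mean, stdev


def snr(roi_hu):
    """SNR of a region of interest: mean CT value divided by its SD.

    Assumption: the sample standard deviation is used for SD.
    """
    return mean(roi_hu) / stdev(roi_hu)


# Hypothetical nodule ROI values in HU. Both ROIs have the same mean (100 HU),
# but streak artifacts widen the spread in the ROI without MAR.
roi_without_mar = [60, 140, 75, 125, 90, 110, 55, 145]
roi_with_mar = [90, 110, 95, 105, 98, 102, 92, 108]

for label, roi in (("without MAR", roi_without_mar), ("with MAR", roi_with_mar)):
    print(f"{label}: mean = {mean(roi)} HU, "
          f"SD = {stdev(roi):.1f} HU, SNR = {snr(roi):.1f}")
```

Because the two hypothetical ROIs share the same mean, the higher SNR of the second one comes entirely from its smaller SD, which mirrors the argument that artifact reduction, rather than a change in CT value, drives the SNR gain.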

Furthermore, to the best of our knowledge, our study is the first to use the NPS to evaluate whether the MAR algorithm alters image texture in artifact-free regions. The results indicated that image texture was not affected by the choice of tube voltage, dose level or reconstruction algorithm, with or without MAR.

This study has several limitations. First, it was performed on a phantom, which inherently lacks anatomical complexity and does not reflect the clinical diversity of patients. Future clinical studies with large sample sizes are warranted to investigate the clinical efficacy of combining MAR with DLIR in enhancing diagnostic accuracy and artifact suppression in CT images of patients with metal implants. Second, three tube voltages were selected to assess metal artifact suppression, but 140 kVp was not among them. Third, the metrics related to PNs, particularly volume, were not assessed, because volume measurements by the AI software were inaccurate owing to interference from metal artifacts and cotton. Finally, the exclusive investigation of GE’s proprietary reconstruction algorithms (DLIR and ASIR-V) inherently limits the generalizability of our conclusions to other vendor platforms.

Conclusion

In conclusion, the combination of the MAR algorithm with DLIR-H significantly reduced noise and AI and improved SNR, while preserving the image texture of artifact-free regions. Interestingly, the low-dose images reconstructed with DLIR-H MAR at 120 kVp had noise and AI comparable to those of standard-dose images reconstructed with ASIR-V MAR at 70 kVp.

Supplemental Information

Statistical analysis of SNR under different radiation doses with various tube voltages and algorithms

DOI: 10.7717/peerj.19516/supp-5

Statistical analysis of AI under different radiation doses with various tube voltages and algorithms

DOI: 10.7717/peerj.19516/supp-6

Statistical analysis of noise under different radiation doses with various tube voltages and algorithms

DOI: 10.7717/peerj.19516/supp-7

References

[1] Bayerl N, May MS, Wuest W, Roth JP, Kramer M, Hofmann C, Schmidt B, Uder M, Ellmann S. 2023. Iterative metal artifact reduction in head and neck CT facilitates tumor visualization of oral and oropharyngeal cancer obscured by artifacts from dental hardware. Academic Radiology 30(12):2962-2972

[2] Bongers MN, Schabel C, Thomas C, Raupach R, Notohamiprodjo M, Nikolaou K, Bamberg F. 2015. Comparison and combination of dual-energy- and iterative-based metal artefact reduction on hip prosthesis and dental implants. PLOS ONE 10(11):e0143584

[3] Caruso D, De Santis D, Del Gaudio A, Guido G, Zerunian M, Polici M, Valanzuolo D, Pugliese D, Persechino R, Cremona A, Barbato L, Caloisi A, Iannicelli E, Laghi A. 2024. Low-dose liver CT: image quality and diagnostic accuracy of deep learning image reconstruction algorithm. European Radiology 34(4):2384-2393

[4] Chae HD, Hong SH, Shin M, Choi JY, Yoo HJ. 2020. Combined use of virtual monochromatic images and projection-based metal artifact reduction methods in evaluation of total knee arthroplasty. European Radiology 30(10):5298-5307

[5] Choo HJ, Lee SJ, Kim DW, Lee YJ, Baek JW, Han JY, Heo YJ. 2021. Comparison of the quality of various polychromatic and monochromatic dual-energy CT images with or without a metal artifact reduction algorithm to evaluate total knee arthroplasty. Korean Journal of Radiology 22(8):1341-1351

[6] D’Hondt L, Franck C, Kellens PJ, Zanca F, Buytaert D, Van Hoyweghen A, Addouli HE, Carpentier K, Niekel M, Spinhoven M, Bacher K, Snoeckx A. 2024. Impact of deep learning image reconstruction on volumetric accuracy and image quality of pulmonary nodules with different morphologies in low-dose CT. Cancer Imaging 24(1):60

[7] Dunet V, Bernasconi M, Hajdu SD, Meuli RA, Daniel RT, Zerlauth J-B. 2017. Impact of metal artifact reduction software on image quality of gemstone spectral imaging dual-energy cerebral CT angiography after intracranial aneurysm clipping. Neuroradiology 59(9):845-852

[8] Enache AV, Toader C, Onciul R, Costin HP, Glavan LA, Covache-Busuioc RA, Corlatescu AD, Ciurea AV. 2025. Surgical stabilization of the spine: a clinical review of spinal fractures, spondylolisthesis, and instrumentation methods. Journal of Clinical Medicine 14(4):1124-1142

[9] Fujita N, Yasaka K, Katayama A, Ohtake Y, Konishiike M, Abe O. 2023. Assessing the effects of deep learning reconstruction on abdominal CT without arm elevation. Canadian Association of Radiologists Journal 74(4):688-694

[10] Fukugawa Y, Toya R, Matsuyama T, Watakabe T, Shimohigashi Y, Kai Y, Matsumoto T, Oya N. 2022. Impact of metal artifact reduction algorithm on gross tumor volume delineation in tonsillar cancer: reducing the interobserver variation. BMC Medical Imaging 22(1):161

[11] Jensen CT, Gupta S, Saleh MM, Liu X, Wong VK, Salem U, Qiao W, Samei E, Wagner-Bartak NA. 2022. Reduced-dose deep learning reconstruction for abdominal CT of liver metastases. Radiology 303(1):90-98

[12] Jiang B, Li N, Shi X, Zhang S, Li J, De Bock GH, Vliegenthart R, Xie X. 2022a. Deep learning reconstruction shows better lung nodule detection for ultra-low-dose chest CT. Radiology 303(1):202-212

[13] Jiang C, Jin D, Liu Z, Zhang Y, Ni M, Yuan H. 2022b. Deep learning image reconstruction algorithm for carotid dual-energy computed tomography angiography: evaluation of image quality and diagnostic performance. Insights Imaging 13(1):182

[14] Jiang C, Zhang J, Li W, Li Y, Ni M, Jin D, Zhang Y, Jiang L, Yuan H. 2024. Deep learning imaging reconstruction algorithm for carotid dual energy CT angiography: opportunistic evaluation of cervical intervertebral discs-a preliminary study. Journal of Imaging Informatics in Medicine 37(4):1960-1968

[15] Kanani A, Yazdi M, Owrangi AM, Karbasi S, Mosleh-Shirazi MA. 2022. Metal artifact reduction in cervix brachytherapy with titanium applicators using dual-energy CT through virtual monoenergetic images and an iterative algorithm: a phantom study. Brachytherapy 21(6):933-942

[16] Khodarahmi I, Haroun RR, Lee M, Fung GSK, Fuld MK, Schon LC, Fishman EK, Fritz J. 2018. Metal artifact reduction computed tomography of arthroplasty implants: effects of combined modeled iterative reconstruction and dual-energy virtual monoenergetic extrapolation at higher photon energies. Investigative Radiology 53(12):728-735

[17] Kikuchi N, Yanagawa M, Enchi Y, Nakayama A, Yoshida Y, Miyata T, Hata A, Tsubamoto M, Honda O, Tomiyama N. 2020. The effect of the reconstruction algorithm for the pulmonary nodule detection under the metal artifact caused by a pacemaker. Medicine 99(24):e20579

[18] Kim J, Park C, Jeong HS, Song YS, Lee IS, Jung Y, Lee SM. 2020. The optimal combination of monochromatic and metal artifact reconstruction dual-energy CT to evaluate total knee replacement arthroplasty. European Journal of Radiology 132:109254

[19] Kovacs DG, Rechner LA, Appelt AL, Berthelsen AK, Costa JC, Friborg J, Persson GF, Bangsgaard JP, Specht L, Aznar MC. 2018. Metal artefact reduction for accurate tumour delineation in radiotherapy. Radiotherapy and Oncology 126(3):479-486

[20] Laukamp KR, Zopfs D, Lennartz S, Pennig L, Maintz D, Borggrefe J, Große Hokamp N. 2019. Metal artifacts in patients with large dental implants and bridges: combination of metal artifact reduction algorithms and virtual monoenergetic images provides an approach to handle even strongest artifacts. European Radiology 29(8):4228-4238

[21] Lehti L, Söderberg M, Mellander H, Wassélius J. 2020. Iterative metal artifact reduction in aortic CTA after Onyx®-embolization. European Journal of Radiology Open 7:100255

[22] Li J, Meng T, Zhang G, Yu X, Lu Z, Zhang W. 2024. Artificial intelligence iterative reconstruction in abdominal CT of patients with irregular arm positioning: a case-by-case evaluation. Acta Radiologica 65(8):907-912

[23] Long Z, De Lone DR, Kotsenas AL, Lehman VT, Nagelschneider AA, Michalak GJ, Fletcher JG, McCollough CH, Yu L. 2019. Clinical assessment of metal artifact reduction methods in dual-energy CT examinations of instrumented spines. AJR American Journal of Roentgenology 212(2):395-401

[24] Pennig L, Zopfs D, Gertz R, Bremm J, Zaeske C, Große Hokamp N, Celik E, Goertz L, Langenbach M, Persigehl T, Gupta A, Borggrefe J, Lennartz S, Laukamp KR. 2021. Reduction of CT artifacts from cardiac implantable electronic devices using a combination of virtual monoenergetic images and post-processing algorithms. European Radiology 31(9):7151-7161

[25] Selles M, Van Osch JAC, Maas M, Boomsma MF, Wellenberg RHH. 2024. Advances in metal artifact reduction in CT images: a review of traditional and novel metal artifact reduction techniques. European Journal of Radiology 170:111276

[26] Sun J, Li H, Yu T, Huo A, Hua S, Zhou Z, Peng Y. 2024. Application of metal artifact reduction algorithm in reducing metal artifacts in post-surgery pediatric low radiation dose spine computed tomography (CT) images. Quantitative Imaging in Medicine and Surgery 14(7):4648-4658

[27] Szczykutowicz TP, Nett B, Cherkezyan L, Pozniak M, Tang J, Lubner MG, Hsieh J. 2021. Protocol optimization considerations for implementing deep learning CT reconstruction. American Journal of Roentgenology 216(6):1668-1677

[28] Wichtmann HM, Laukamp KR, Manneck S, Appelt K, Stieltjes B, Boll DT, Benz MR, Obmann MM. 2023. Metal implants on abdominal CT: does split-filter dual-energy CT provide additional value over iterative metal artifact reduction? Abdominal Radiology 48(1):424-435

[29] Wong JA, Devereaux PJ. 2019. Cardiac device implantation complications: a gap in the quality of care? Annals of Internal Medicine 171(5):368-369

[30] Yang S, Bie Y, Pang G, Li X, Zhao K, Zhang C, Zhong H. 2021. Impact of novel deep learning image reconstruction algorithm on diagnosis of contrast-enhanced liver computed tomography imaging: comparing to adaptive statistical iterative reconstruction algorithm. Journal of X-ray Science and Technology 29(6):1009-1018

[31] Yao Y, Guo B, Li J, Yang Q, Li X, Deng L. 2022. The influence of a deep learning image reconstruction algorithm on the image quality and auto-analysis of pulmonary nodules at ultra-low dose chest CT: a phantom study. Quantitative Imaging in Medicine and Surgery 12(5):2777-2791

[32] Yasaka K, Furuta T, Kubo T, Maeda E, Katsura M, Sato J, Ohtomo K. 2017. Full and hybrid iterative reconstruction to reduce artifacts in abdominal CT for patients scanned without arm elevation. Acta Radiologica 58(9):1085-1093