Epidemiological surveillance of HIV infection in Japan poses two technical problems for the direct application of a classical backcalculation method: (i) not all AIDS cases are counted over time, and (ii) people diagnosed with HIV receive antiretroviral therapy, which extends the incubation period. The present study aimed to address these issues and to estimate the HIV incidence and the proportion of diagnosed HIV infections using a simple statistical model.

Yearly incidence data from 1985 to 2017 on HIV diagnoses and on patients with AIDS who had not previously been diagnosed as HIV positive, restricted to Japanese nationals, were analyzed. Using the McKendrick partial differential equation, general convolution-like equations were derived, allowing estimation of the HIV incidence and the time-dependent rate of diagnosis. A likelihood-based approach was used to obtain parameter estimates.

Assuming that the median incubation period was 10.0 years, the cumulative number of HIV infections was estimated to be 29,613 (95% confidence interval (CI) [29,059–30,167]) by the end of 2017, and the proportion of diagnosed HIV infections was estimated at 80.3% (95% CI [78.7%–82.0%]). Allowing the median incubation period to range from 7.5 to 12.3 years, the estimate of the proportion diagnosed can vary from 77% to 84%.

The proportion of diagnosed HIV infections appears not to have reached 90% among Japanese nationals. Compared with the peak incidence in 2005–2008, new HIV infections have clearly been declining; however, there are still more than 1,000 new HIV infections per year in Japan. To increase the diagnosed proportion of HIV infections, it is critical to identify people who have difficulty accessing consultation, testing, and care, and to explore heterogeneous patterns of infection.

Following an infection with human immunodeficiency virus (HIV), development of acquired immunodeficiency syndrome (AIDS) takes about 10 years (

Understanding the transmission dynamics of HIV using such statistical models is in line with the concept of treatment cascade, introduced by the Joint United Nations Programme on HIV/AIDS (UNAIDS). The so-called care cascade aims to identify and fill gaps in the continuum of services for testing, care, and effective treatment of HIV (

Despite the clear need for epidemiological estimation of the number of undiagnosed HIV infections, the surveillance data in Japan pose two technical problems. First, while the definition of AIDS has remained nearly unchanged over time, reporting AIDS cases that were previously diagnosed as HIV-infected cases has never been mandated (

The present study aimed to address the abovementioned issues by estimating the HIV incidence among Japanese nationals, and to offer statistical estimates of undiagnosed HIV infections and the proportion of diagnosed HIV infections over time.

The present study investigated the epidemiological surveillance data of HIV and AIDS in Japan, which is publicly reported by the

The proposed statistical model is derived from the following partial differential equation (PDE) model, which is referred to as the McKendrick equation (

New HIV infections occur at rate

for

The datasets are reported in a discrete time interval (i.e., year); thus, here I discretized models

and

There is no prior notion as to the shape of the epidemic curve (i.e., the frequency of transmission) over time. Thus, the incidence of HIV infection in year _{t}, is modeled as a step function: _{t}, is similarly modeled as
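As a rough illustration of the step-function parameterization, the sketch below maps each calendar year to a step index. It assumes 4-year steps starting in 1985, with the final step absorbing the leftover year so that it spans 2013–2017, consistent with the periods named in the figure caption and the Discussion. The names `step_index` and `expand_steps` are hypothetical and are not taken from the original analysis.

```python
START_YEAR = 1985   # first year of surveillance data analyzed
END_YEAR = 2017     # last year of surveillance data analyzed
STEP_WIDTH = 4      # assumed width of each step, in years


def step_index(year):
    """Return the index of the step (0-based) that contains `year`."""
    if not START_YEAR <= year <= END_YEAR:
        raise ValueError("year outside the study period")
    # 33 years give 8 steps; the last step absorbs the leftover year.
    n_steps = (END_YEAR - START_YEAR + 1) // STEP_WIDTH
    return min((year - START_YEAR) // STEP_WIDTH, n_steps - 1)


def expand_steps(step_values):
    """Expand one value per step into one value per calendar year."""
    return [step_values[step_index(y)] for y in range(START_YEAR, END_YEAR + 1)]
```

Under these assumptions, 2005–2008 share one parameter, 2009–2012 the next, and 2013–2017 the last.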

The probability mass function of the incubation period is assumed as known, and in discrete time, this is written as

and
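One way to obtain a discrete probability mass function with a given median is to difference a continuous cumulative distribution at integer years. The sketch below does this for a Weibull distribution. The Weibull form and the shape value of 2.0 are assumptions made purely for illustration; the text above states only that the distribution is assumed known and that medians of 7.5, 10.0, and 12.3 years were considered.

```python
import math


def weibull_cdf(x, median, shape):
    """Weibull CDF parameterized by its median instead of its scale."""
    scale = median / math.log(2.0) ** (1.0 / shape)
    return 1.0 - math.exp(-((x / scale) ** shape))


def incubation_pmf(median=10.0, shape=2.0, max_years=33):
    """Probability that AIDS develops during year s+1 after infection, s = 0, 1, ...

    `shape` is a hypothetical value chosen for illustration only.
    """
    return [
        weibull_cdf(s + 1, median, shape) - weibull_cdf(s, median, shape)
        for s in range(max_years)
    ]
```

By construction, the first `median` years of the mass function sum to 0.5 when the median is an integer.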

Using the abovementioned model, undiagnosed HIV infections at the end of year

The diagnosed proportion of HIV infections is calculated either as ∑(
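The bookkeeping behind such a model can be sketched as a discrete-time competing-risk calculation: each yearly cohort of new infections is depleted either by AIDS onset or by HIV diagnosis, and whatever remains is counted as undiagnosed. The ordering of events within a year (AIDS onset first, then diagnosis among the remainder) and the function name `forward_model` are assumptions of this sketch; it is not a transcription of the equations of the present study.

```python
def forward_model(incidence, diag_prob, incubation_pmf):
    """Expected yearly HIV diagnoses, AIDS cases, and undiagnosed infections.

    incidence[u]      : new infections in year index u
    diag_prob[t]      : yearly probability of diagnosis in year index t
    incubation_pmf[s] : probability of AIDS onset s years after infection
    """
    n = len(incidence)
    hiv = [0.0] * n
    aids = [0.0] * n
    undiagnosed = [0.0] * n
    for u in range(n):
        at_risk = incidence[u]   # cohort members still undiagnosed and AIDS-free
        surviving = 1.0          # fraction whose incubation period has not yet ended
        for t in range(u, n):
            s = t - u
            f = incubation_pmf[s] if s < len(incubation_pmf) else 0.0
            hazard = min(1.0, f / surviving) if surviving > 0.0 else 0.0
            new_aids = at_risk * hazard                   # AIDS without prior diagnosis
            new_hiv = (at_risk - new_aids) * diag_prob[t]  # diagnosed before AIDS
            at_risk -= new_aids + new_hiv
            surviving -= f
            aids[t] += new_aids
            hiv[t] += new_hiv
            undiagnosed[t] += at_risk                     # remaining at end of year t
    return hiv, aids, undiagnosed
```

In this sketch every infection ends up in exactly one of three places (diagnosed with HIV, diagnosed with AIDS, or still undiagnosed at the end of the last year), which gives a simple consistency check on any implementation.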

To quantify the proposed system of equations, we estimate parameters _{t} and _{t} by means of the maximum likelihood method. Considering that HIV infections are generated by a nonhomogeneous Poisson process, the resulting HIV diagnoses and AIDS cases would also follow Poisson distributions. The likelihood function of HIV diagnoses is _{t} denotes the reported (observed) number of HIV diagnoses in year _{t} denotes the reported number of new AIDS diagnoses in year

Maximum likelihood estimates of parameters are obtained by minimizing the negative logarithm of
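For Poisson-distributed counts, the negative log-likelihood has a simple closed form, sketched below. The sketch assumes that the HIV and AIDS series contribute independently, so that their negative log-likelihoods add; the function names are hypothetical, and the expected counts would come from whatever model is being fitted.

```python
import math


def poisson_nll(observed, expected):
    """Negative log-likelihood of observed counts given Poisson means."""
    nll = 0.0
    for y, mu in zip(observed, expected):
        if mu <= 0.0:
            raise ValueError("expected counts must be positive")
        # -log( exp(-mu) * mu**y / y! )
        nll += mu - y * math.log(mu) + math.lgamma(y + 1)
    return nll


def total_nll(hiv_obs, hiv_exp, aids_obs, aids_exp):
    """Sum of the two series' contributions (independence is assumed)."""
    return poisson_nll(hiv_obs, hiv_exp) + poisson_nll(aids_obs, aids_exp)
```

A numerical optimizer would then search over the model parameters for the values that minimize `total_nll`.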

The 95% confidence interval (CI) of parameters was derived from the profile likelihood. The 95% CI of model estimates (e.g., the number of undiagnosed HIV infections and the proportion diagnosed) was derived using a parametric bootstrap method. In the bootstrapping exercise, model parameters were resampled from a multivariate normal distribution with vectors of mean
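A parametric bootstrap of this kind can be sketched with the standard library alone. The sketch assumes that a mean vector and a covariance matrix for the parameters are already available (for example, the maximum likelihood estimates and an approximate covariance; the source of the covariance is an assumption here) and that the quantity of interest is computed by a user-supplied function, here called `statistic`. Percentile limits are used for the interval.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def sample_mvn(mean, chol, rng):
    """Draw one vector from a multivariate normal using the Cholesky factor."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(chol[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]


def bootstrap_interval(mean, cov, statistic, n_draws=1000, seed=1):
    """Approximate 95% percentile interval of statistic(parameters)."""
    rng = random.Random(seed)
    chol = cholesky(cov)
    values = sorted(statistic(sample_mvn(mean, chol, rng)) for _ in range(n_draws))
    return values[int(0.025 * n_draws)], values[int(0.975 * n_draws) - 1]
```

In practice `statistic` would map a resampled parameter vector to, say, the number of undiagnosed infections or the proportion diagnosed.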

In the present study, the analyzed data are publicly available (

Estimated parameters, i.e., yearly incidence and yearly probability of diagnosis, are shown in

(A) The yearly incidence of HIV infection, assuming that the median incubation period is 10.0 years. The step function for every 4 years was used to model the incidence. The 95% confidence intervals were derived from profile likelihood. (B) The yearly rate of diagnosis of HIV infection, assuming that the median incubation period is 10.0 years. (C) Maximum likelihood estimates of the yearly incidence with different median incubation periods: 7.5, 10.0, and 12.3 years. (D) Maximum likelihood estimates of the yearly rate of diagnosis with different median incubation periods: 7.5, 10.0, and 12.3 years. (E) Yearly incidence estimates by sex and different median incubation periods. Maximum likelihood estimates are shown. Note that a common logarithmic scale is used on the vertical axis, to ease comparisons. (F) Yearly rate of diagnosis estimates by sex and different median incubation periods. Maximum likelihood estimates are shown.

(A) Comparisons between observed and predicted yearly number of HIV diagnoses and AIDS cases. Different median incubation periods (i.e., 7.5, 10.0, and 12.3 years) were assumed, but the predicted values mostly overlap. (B) Comparisons between observed and predicted values by sex. Circles represent the observed number of HIV diagnoses whereas triangles represent that of AIDS cases. Solid marks represent males; empty marks represent females. A common logarithmic scale is used on the vertical axis. In A and B, bold grey lines represent lower and upper 95% confidence intervals with the median incubation period of 10.0 years based on the parametric bootstrap method.

(A) Estimates of undiagnosed HIV infections, assuming that the median incubation period is 10.0 years. The 95% confidence intervals were derived from profile likelihood. (B) Maximum likelihood estimates of undiagnosed HIV infections with different median incubation periods: 7.5, 10.0, and 12.3 years. (C) Proportion of diagnosed infections out of the cumulative number of HIV infections, inclusive of AIDS cases. (D) Proportion of diagnosed infections out of the cumulative number of HIV infections, excluding AIDS cases. (E) Maximum likelihood estimates of undiagnosed HIV infections by sex, with different median incubation periods: 7.5, 10.0, and 12.3 years. Note that a common logarithmic scale is used on the vertical axis. (F) Proportion of diagnosed infections out of the cumulative number of HIV infections, excluding AIDS cases, by sex.

Including and excluding AIDS cases, the estimated proportions of diagnosed HIV infections are shown in

(A) Estimates of undiagnosed HIV infections with different incubation periods. Whiskers extend to lower and upper 95% confidence intervals derived using a parametric bootstrapping method. (B) Proportion of diagnosed infections out of the cumulative number of HIV infections, excluding AIDS cases (solid circles) or including AIDS cases but subtracting 2,321 deaths (empty circles). Whiskers extend to lower and upper 95% confidence intervals derived using a parametric bootstrapping method.

The present study estimated the incidence and diagnosed fraction of HIV infections among Japanese nationals, devising an original model that captures the data generating process of HIV and AIDS in the epidemiological surveillance. By the end of 2017, the cumulative number of HIV infections was estimated to be about 30,000 cases, of which 4,000 to 6,000 were considered to have remained undiagnosed. Assuming that the median incubation period was 10.0 years, 80% of infections have ever been diagnosed; accounting for the uncertainty in a median incubation period ranging from 7.5 to 12.3 years, the estimate of the diagnosed proportion can range from 77% to 84%. To the author’s knowledge, the present study is the first to offer firm statistical estimates of the incidence and diagnosed proportion of HIV infections based on epidemiological surveillance data in Japan, using an explicit mathematical modeling approach.

There are two take-home messages from the results of this study. First, regardless of whether AIDS cases are included, the proportion of diagnosed HIV infections appears not to have reached 90% among Japanese nationals. Although some estimates exceed 80%, even after subtraction of known deaths owing to AIDS, the findings echo those of a published study that analyzed blood donor data (

Second, compared with the peak from 2005–2008, the incidence showed a declining trend. The upper limits of the 95% CIs for the next two time periods (2009–2012 and 2013–2017) were lower than the estimates for the peak period. In fact, a declining trend has also been seen in other datasets, including the incidence of counseling and blood testing at local health centers and the proportion of HIV-positive blood donors over time (

Although the present study was motivated by the need for quantifying the care cascade in Japan, in accordance with the goals of 90-90-90, a few technical issues must be noted to interpret the estimates and apply the present results to the evaluation. First, Japanese estimates of the latter two goals of the 90-90-90 initiative, i.e., access to ART and virus suppression, rest on questionnaire surveys conducted in the prefectures, which do not distinguish between infected individuals who are Japanese nationals and those who are not (

Five technical limitations must be noted. First, the present study did not account for uncertainties other than variations in length of the incubation period. There has been a concern that the incubation period has probably shortened over time (

Despite these limitations, the present study successfully estimated the incidence of HIV infections, the number of undiagnosed infections, and the proportion diagnosed in real time, using limited but readily available epidemiological surveillance data. Improved estimates using age and geographical data, as well as estimates based on other methods, are to follow, which will boost studies of epidemiological estimation in this area in Japan.

In the present study, a statistical modeling method was developed to estimate the HIV incidence in Japan, the number of undiagnosed HIV infections, and the proportion of diagnosed HIV infections over time. Using the McKendrick equation, a general convolution-like equation was derived, allowing for joint estimation of the HIV incidence and time-dependent rate of diagnosis. By the end of 2017, the cumulative number of HIV infections was estimated to be about 30,000, and about 80% of infections have ever been diagnosed. Accounting for the uncertainty in the median incubation period ranging from 7.5 to 12.3 years, estimates of the diagnosed proportion of HIV infections can range from 77% to 84%. The proportion of diagnosed HIV infections appears not to have reached 90% among Japanese nationals.

We thank Analisa Avila, ELS, of Edanz Group for editing a draft of this manuscript.

Hiroshi Nishiura is an Academic Editor for PeerJ and has no competing interests.

The following information was supplied regarding data availability:

The raw data are referenced in the manuscript together with the website URL, and readers can download the data from there.