To increase transparency, PeerJ operates a system of 'optional signed reviews and history'. This takes two forms: (1) peer reviewers are encouraged, but not required, to provide their names (if they do so, then their profile page records the articles they have reviewed), and (2) authors are given the option of reproducing their entire peer review history alongside their published article (in which case the complete peer review process is provided, including revisions, rebuttal letters and editor decision letters).
I believe you have addressed the comments sufficiently.
There are only minor suggested revisions. Please see them below.
The article meets the basic reporting standards.
The methodological approach is sound. Minor comments:
1. What was the total number of alerts? I assume it was around 1500 given the median of two per patient. This could be better clarified especially in the results, line 125. It reads as if there were 795 alerts, but I think there were 795 unique patients in whom the alert fired.
2. After discussion with my statistical colleagues, I believe that the statement made (Methods, line 98) is not quite accurate. Sensitivity and specificity can be estimated without manual chart review of ALL negative subjects. With a known/anticipated prevalence, a smaller subset can be reviewed to estimate the prevalence (with a certain degree of confidence) in the negative alert patients. The number needed may still be larger than the authors are able to review, but I do not believe reviewing all negatives is required.
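As a rough illustration of the point in comment 2, the number of non-alert charts needed to estimate prevalence within a chosen margin can be approximated with the standard normal-approximation formula for a proportion. The anticipated prevalence (1%) and margin (0.5 percentage points) below are hypothetical values chosen for illustration, not figures from the manuscript, and the normal approximation is only a crude guide when prevalence is very low.

```python
import math

def charts_needed(anticipated_prevalence, margin, z=1.96):
    """Approximate number of charts to review so that prevalence is
    estimated within +/- margin at roughly 95% confidence.

    Uses the normal approximation n = z^2 * p * (1 - p) / margin^2.
    """
    p = anticipated_prevalence
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Hypothetical inputs: 1% anticipated prevalence, +/- 0.5 percentage points.
n = charts_needed(anticipated_prevalence=0.01, margin=0.005)
print(n)
```

An exact (binomial) method would be preferable for a formal sample-size justification; this sketch only shows that the required number can be far smaller than the full non-alert population.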
Now that formal estimates of sensitivity and specificity are not attempted, the results more appropriately match the methodology.
Appreciate that the authors added the table reporting ED outcomes (admission, discharge, died). However, some discussion on the 10% of patients who were discharged despite a true positive sepsis alert might be warranted. I realize that the alert was not made available to the clinicians, but does this 10% represent a population who could potentially have benefited had the alert been transmitted to the clinician?
Interesting manuscript. Appreciate the opportunity to review.
The reviewers and I have reviewed the paper, and all find it interesting and novel and point out many strong points. However, there are many concerns and questions that need addressing for this to be published in PeerJ.
Please review and respond to the referees' comments and either revise the manuscript in response to each comment or explain why you choose not to.
1) In “Selection of Subjects – EMR Sepsis Detection System”:
The interpretation of the study’s results and its generalizability might be clarified by including a more detailed description of the design of the automated sepsis detection algorithm within this section. This might include the rules engine software platform and programming language used to develop the sepsis detection system, the means by which the application queried data from the Cerner FirstNet EMR system and its data repository for use in rules testing, and how the application interfaced with the “front end” of the EMR to trigger an EMR alert.
Additional methodological details could include description of the methods by which potentially dyssynchronous data were screened to identify a patient with a sepsis diagnosis at a precise moment in time. For example, vital signs and labs may not be taken from the patient at the same time and may be entered into the EMR at different times such that precise logic may need to be defined to determine when it is, or is not, appropriate to use temporally disparate vital sign and laboratory data. As an example, initial labs may show a WBC count of 16 or a lactic acid of 2.3 in a patient whose initial vital signs do not meet SIRS criteria (no alert fired). 2 hours later, the vital signs may have deteriorated and now meet SIRS criteria. Would the initial lab data, combined with the repeat vital signs, now result in an alert being fired? What if the vitals didn’t deteriorate for 6 hours? Is there a statute of limitations? These types of methodological issues can have significant impact on the accuracy of alerting systems and would be useful to disseminate.
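The "statute of limitations" question above can be made concrete with a small sketch. This is not the authors' algorithm: the function name, the 4-hour limit, and the timestamps are all hypothetical, chosen only to show how an explicit time-window rule would answer the 2-hour and 6-hour scenarios differently.

```python
from datetime import datetime, timedelta

def lab_is_usable(lab_time, vitals_time, max_gap_hours):
    """Return True if a lab result and a set of vital signs are close
    enough in time to be evaluated together.

    max_gap_hours is a hypothetical 'statute of limitations'.
    """
    gap = abs(vitals_time - lab_time)
    return gap <= timedelta(hours=max_gap_hours)

# Arbitrary illustrative timestamps (not study data).
lab_drawn = datetime(2013, 1, 1, 8, 0)
vitals_2h = lab_drawn + timedelta(hours=2)
vitals_6h = lab_drawn + timedelta(hours=6)

# With a hypothetical 4-hour limit, the initial labs pair with the
# 2-hour vitals but not with the 6-hour vitals.
pairs_at_2h = lab_is_usable(lab_drawn, vitals_2h, max_gap_hours=4)
pairs_at_6h = lab_is_usable(lab_drawn, vitals_6h, max_gap_hours=4)
print(pairs_at_2h, pairs_at_6h)
```

Whatever rule the system actually applies, stating it at this level of precision would let readers judge how temporally disparate data affect alert accuracy.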
Also in this section, there was no description of the process of validation of the queries for each of the individual data elements (e.g. – how did the authors ensure that the queries consistently and accurately identified the crossing of the data thresholds defined for each step in the query logic). The overall performance of the sepsis alert algorithm is contingent on the query accuracy of each algorithm component (meeting SIRS criteria, systolic BP < 90, and lactic acid > 2) within analysis time bins. The complexity of most commercial EMR databases can result in compromised query accuracy when the query is run against a large number of patient records if the query is not thoroughly validated. Were query results for each component of the algorithm validated against manual chart review? Similarly, was manual chart review used to ensure that values crossing the “abnormal value” threshold had in fact crossed the defined threshold? How many patient charts were used for the validation steps?
On line 62, change “WBC count” to “white blood cell (WBC) count”
On line 67, the authors state, “All patients in the ED were included in the study.”
• What percentage of patients were adults vs. children?
• It is suggested from the text in the Methods section that the same sepsis alert was applied to both adults and children of all ages. The SIRS criteria thresholds for ‘abnormal’ in children are different from those of adults (Goldstein et al. Pediatr Crit Care Med. 2005 Jan;6(1):2-8.) and are age-group specific such that application of a ‘one size fits all’ sepsis alert algorithm could result in sub-optimal algorithm performance in children. It would be helpful to know the age-related epidemiology of the ED cohort studied in this manuscript and to know what effect, if any, patient age had on algorithm performance.
Beginning on line 70, the authors detail a sampling strategy limited to the subset of 18,000+ patients in whom a sepsis alert was not fired.
• It is unclear why this sampling strategy was chosen over an alternative sampling strategy, such as randomly selecting a number of patients from the entire 18,000+ ED population for manual validation, rather than from the cohort of “non-alert” patients only.
2) In “Outcomes and Methods of Measurement”:
On line 76, the authors state, “Confirmed sepsis was defined as the presence of 1) a serious infection related to the ED presentation…”
• From the text, it is unclear 1) how ‘serious infection’ is defined and 2) whether “serious infection” is synonymous with any infection thought to cause the episode of sepsis in question. In this regard, was the diagnosis of ‘infection’ based on culture positivity, appropriate clinical signs and symptoms of infection with or without positive cultures (as commonly occurs with pneumonia for example), a requirement for a minimum number of days of antimicrobial use, etc? The definition used could significantly alter the data analysis in terms of inter-rater agreement, positive predictive value, the rate of sepsis in the cohort of patients in whom an alert never fired, as well as the overall generalizability of the results.
Beginning on line 80, the authors state “Two investigators manually reviewed the ED medical records for all sepsis alert activations as well as for the randomly selected non-alert controls.”
• Were chart reviewers blinded to cohort identity (i.e. – alert v. non-alert cohort)?
3) In “Data Analysis”:
Beginning on line 85, the authors state, “We determined the diagnostic accuracy of the automated EMR sepsis detection system by calculating positive predictive value (PPV) of the sepsis alerts. Because of the sampled nature of the non-alerts, it was not possible to calculate negative predictive value (NPV), sensitivity, specificity and area under the ROC curve.”
• Please clarify why the NPV, sensitivity, and specificity are reported in the “Results” section beginning on line 100.
4) In “Results”:
Beginning on line 100, the authors state, “While not encompassing all ED visits, based upon the sepsis alert and randomly selected non-alert patients, the NPV for the sepsis alert was high (100%; 95% CI: 98.8-100.0%), the sensitivity for sepsis was high (100.0%, 95% CI: 98.7-100.0%), and specificity was low (37.4%, 95% CI: 34.0-40.9%).”
• It sounds from the text that the alert patients and non-alert patients were combined for the statistical analysis that follows. Please clarify if this is the case and explain why this is statistically sound.
• The calculation of the negative predictive value (#True Neg/(#True Neg + #False Neg)) may not be valid with the available data. The cohort of non-alert patients used for this analysis may not be representative of the larger cohort of 18,000+ ED patients evaluated during the study period. The non-alert cohort would be expected to have a very low prevalence of sepsis since they were selected based on the absence of SIRS, hypotension, and elevated lactic acid levels (non-alert criteria) from a large cohort of patients with a presumably already low prevalence of sepsis (all patients presenting to the ED with any complaint/problem). Not surprisingly, this highly selected non-alert cohort did not contain any patients with sepsis. Using a non-representative cohort that lacks sepsis cases, it may not be reasonable to conclude that the “false negative” rate is zero. A truly randomly selected sample from the 18,000+ population would be required to calculate a true negative predictive value.
• The calculation of sensitivity (#True Pos/(#True Pos + #False Neg)) may not be statistically valid with the available data since it appears that the # of true positives and # of true negatives are derived from different samples from the total population. Also, as described above, the use of a non-alert-derived cohort may have introduced bias.
• The calculation of specificity (#True Neg/(#True Neg + #False Pos)) may not be statistically valid with the available data since it appears that the # of true negatives and the # of true positives are derived from different samples from the total population. As above, the estimation of the true negative rate may also not be valid.
The analysis of PPV, NPV, sensitivity, and specificity could all be calculated if a random sampling of the total 18,000+ patient population was taken, followed by blinded manual chart review to determine the # of true positives and # of true negatives. The results of the sepsis alert system’s analysis of this same cohort could then be compared to the “gold standard” for the calculation of PPV, NPV, sensitivity, and specificity.
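For reference, the four formulas cited above can be written out as a short sketch operating on a single 2x2 table, which is the situation a random sample of the whole ED population with blinded chart review would produce. The counts used here are hypothetical and are not taken from the manuscript.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute PPV, NPV, sensitivity, and specificity from one 2x2 table.

    All four counts must come from the same sample for the results
    to be interpretable together.
    """
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical counts from an imagined random sample of 1,000 ED visits.
m = diagnostic_metrics(tp=40, fp=60, fn=10, tn=890)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```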
5) In “Discussion”:
Beginning on line 123, the authors state, “While we could not formally calculate the sensitivity of the system, the random sample of non-alert patients resulted in no sepsis cases, assuring that the prevalence of false-negatives (undetected sepsis) is relatively low.”
• Suggest changing the word “assuring” to “suggesting”.
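To illustrate why “suggesting” is the more defensible word, the following sketch computes the exact (Clopper-Pearson) two-sided upper confidence limit for a proportion when zero events are observed. It assumes the 300 reviewed non-alert charts are a simple random sample, which is itself one of the open questions in this review.

```python
def zero_event_upper_bound(n, alpha=0.05):
    """Exact two-sided upper confidence limit for a proportion when
    0 events are observed in n independent trials.

    Solves (1 - p)^n = alpha / 2 for p.
    """
    return 1 - (alpha / 2) ** (1 / n)

# 0 sepsis cases among 300 sampled non-alert charts.
upper = zero_event_upper_bound(300)
print(f"Upper 95% limit on missed-sepsis prevalence: {upper:.2%}")
```

Under that assumption the upper limit is about 1.2%, consistent with the lower NPV limit of 98.8% quoted from the Results. Applied to a non-alert population of 17,000 or more, a rate of that size would still correspond to a considerable number of possible missed cases, so zero observed cases cannot assure that false negatives are rare.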
Beginning on line 144, the authors state, “For example, with clinician input, the system exclude patients…”
• Clarify the phrase “the system exclude” (meant to say “the system could exclude”?)
Beginning on line 149, the authors state, “Future studies much evaluate…”
• Clarify the phrase “studies much evaluate”
This study describes an innovative approach to sepsis detection in the ED through the development of what sounds like a sophisticated rules engine able to analyze vital sign and laboratory data represented in a commercial EMR. Similar electronic detection systems have been reported but there are no consensus methods for the development and validation of such systems. Example strategies include the use of pre-existing “clinical alert” system functionality within a commercial EMR, customization of a pre-existing “clinical alert” system for a novel use case, the development of free-standing applications that interface with the EMR and/or legacy systems and then feed information back into the EMR, and the development of free-standing applications that use data from the EMR and other systems but alert clinicians independent of the EMR (e.g. – through the hospital paging system). Elaboration on this study’s methods, in particular the technical solutions developed and the processes used to validate sub-system performance, could add significantly to the literature pertaining to realtime syndrome surveillance using commercial EMRs.
Meets the specified requirements.
This manuscript reports the test properties of an automated sepsis decision rule implemented for patients in the emergency department. The rule conducts surveillance of physiologic and laboratory data to trigger an alert when pre-specified conditions are present. The authors performed clinical chart review on all positive alerts and a subset of negative patients to draw conclusions about sensitivity and specificity of the rule. The manuscript is well-written and composed. The analytic approach is generally sound but warrants consideration of the following comments:
1. While working on the inpatient rule, the decision rule also incorporated serum glucose and assessment of mental status (SIRS) and serum creatinine, bilirubin, and platelet count (organ dysfunction). I assume the ED is using a simplified rule set?
2. Please describe the process by which negative (non-alert) charts were identified. Was it truly random? How was the 300 number selected as sufficient?
3. With regard to confirmation of sepsis, please describe in more detail what met the criterion of “serious infection related to the ED presentation.” Please include discussion related to negative (non-alert) patients who might have been discharged but subsequent testing might have indicated possible source of infection (e.g., positive blood culture). Please also include discussion of the time frame of the clinical information reviewed (e.g., did review of data stop at discharge from ED?)
4. The definition used to confirm the alert diagnosis of sepsis is subject to endogeneity bias as 2 of the 3 elements were also elements of the decision rule/alert. Hence, the importance of the evaluation of “serious infection related to the ED presentation” as a component of the confirmation definition.
5. The authors report that “Because of the sampled nature of the non-alerts, it was not possible to calculate NPV, sensitivity, specificity and ROC.” However, they report these characteristics in the result. Need to commit to one choice or the other.
This is why the process of negative alert chart review is important. If the selection of negative charts was truly random, then it is possible to calculate/estimate these parameters. However, the approach taken is suboptimal. For example, the authors calculate sensitivity/specificity based solely on the 300 charts reviewed rather than extrapolating the sampled results to the large population from which the sample arose (the 17K or so negative alerts). The most obvious implication is a lower than expected specificity. Knowing the estimated prevalence in the sample of all ED patients (from literature review) could allow one to estimate the number of negative chart reviews needed to draw reasonable conclusions about sensitivity/specificity.
6. Would report number of positive alerts per day rather than total number of alerts during the study period (Discussion first paragraph).
7. (Table 1) Would recommend either (1) omitting cells associated with negative alerts or (2) extrapolate the sample reviewed to the larger population. Otherwise, the specificity is substantially underestimated.
8. It would be helpful to know what proportion (assume 100%) of patients with positive alerts were admitted and to what location. Also, some idea of what proportion of alerts added meaningful information rather than confirming what the clinical team already knew or suspected. The most important aspect of the automated alert is to catch patients in whom the clinical team miss the diagnosis.
9. Related to #8 above, it is possible to use administrative data to identify patients discharged from hospital following inpatient stay who had sepsis (see Dombrovskiy 2007 use of ICD-9 discharge data).
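The extrapolation recommended in comments 5 and 7 can be sketched as follows. The sample of 300 reviewed charts and the roughly 17,000 non-alert patients come from the comments above; the false-positive count of 500 is a hypothetical placeholder, and the function name is illustrative only. The sketch assumes the reviewed charts are a simple random sample of all non-alert patients.

```python
def extrapolated_specificity(fp_alerts, nonalert_total, n_sampled, fn_in_sample):
    """Estimate specificity after scaling the sampled non-alert review
    up to the full non-alert population.
    """
    # Estimated missed sepsis cases across all non-alert patients.
    est_fn = nonalert_total * fn_in_sample / n_sampled
    # The remaining non-alert patients are estimated true negatives.
    est_tn = nonalert_total - est_fn
    return est_tn / (est_tn + fp_alerts)

fp = 500  # hypothetical number of false-positive alerts

# Naive approach: treat the 300 sampled charts as the only negatives.
naive = 300 / (300 + fp)

# Extrapolated approach: scale the sample to ~17,000 non-alert patients.
scaled = extrapolated_specificity(fp, nonalert_total=17000,
                                  n_sampled=300, fn_in_sample=0)
print(f"naive: {naive:.3f}, extrapolated: {scaled:.3f}")
```

With these placeholder numbers the naive estimate is 0.375 while the extrapolated estimate is above 0.97, which illustrates how restricting the denominator to the sampled charts substantially underestimates specificity.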
The findings have face validity. There is concern about the approach used to estimate sensitivity and specificity. The approach used to confirm the alert diagnosis could be better described, both in procedural terms and in terms of the temporal nature of the data evaluated.
Solid manuscript that is well-written. The three most important aspects in need of greater attention are (1) the process used to identify negative charts for review, (2) the process used to confirm “serious infection related to ED presentation,” and (3) the reporting of sensitivity and specificity.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.