The reliable identification of patients with abdominal pain who need surgical intervention for acute appendicitis can improve clinical outcomes and reduce resource use. The test performance and impact on outcomes of alternative diagnostic strategies are unclear.
Study eligibility criteria
We searched PubMed®, Embase®, the Cochrane Central Register of Controlled Trials, and the Cumulative Index to Nursing and Allied Health Literature® to identify primary research studies meeting our criteria. Eligible studies were cohort studies that reported information on test accuracy for the diagnosis of acute appendicitis or on harms, and comparative studies (randomized or nonrandomized) that reported information on patient-relevant outcomes and resource use. The last search was run on August 6, 2014, for PubMed and on August 12, 2014, for all other databases.
Study appraisal and synthesis methods
A single investigator extracted data from each study and a second investigator verified extracted data from comparative studies; we also extracted data in duplicate for a sample of noncomparative studies. We performed Bayesian meta-analyses to estimate summary test performance using random-effects models; data on other outcomes were synthesized qualitatively. We also assessed the strength and applicability of the evidence.
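To make the idea of random-effects pooling concrete, the following is a minimal sketch only. It is not the Bayesian model used in the review: it swaps in the simpler frequentist DerSimonian-Laird estimator, applied to a single proportion (for example, sensitivity) on the logit scale. The function name `pool_random_effects` and the study counts are hypothetical and were invented purely for illustration; they are not data from the review.

```python
import math


def inv_logit(x):
    """Map a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_random_effects(events, totals):
    """DerSimonian-Laird random-effects pooling of proportions (logit scale).

    events[i] / totals[i] is the proportion observed in study i, e.g. true
    positives / all patients with appendicitis when pooling sensitivity.
    Returns (pooled, lower_95, upper_95, tau2).
    """
    y, v = [], []
    for e, n in zip(events, totals):
        a = e + 0.5          # 0.5 continuity correction keeps the logit finite
        b = n - e + 0.5
        y.append(math.log(a / b))      # study-level logit
        v.append(1.0 / a + 1.0 / b)    # approximate within-study variance

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))

    # Method-of-moments estimate of the between-study variance tau^2
    k = len(y)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance
    w_star = [1.0 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))

    return (
        inv_logit(pooled),
        inv_logit(pooled - 1.96 * se),
        inv_logit(pooled + 1.96 * se),
        tau2,
    )


# HYPOTHETICAL counts (not from the review): true positives and number of
# patients with appendicitis in four imaginary studies.
true_positives = [45, 88, 30, 120]
diseased = [50, 100, 40, 125]

estimate, lower, upper, tau2 = pool_random_effects(true_positives, diseased)
print(f"pooled sensitivity {estimate:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

A full diagnostic accuracy synthesis would usually model sensitivity and specificity jointly rather than one at a time, so this sketch illustrates only how between-study variance widens the pooled interval.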
Information on the test performance of diagnostic tests was available from 903 studies: clinical symptoms and signs (137 studies), laboratory tests (217 studies), imaging tests (519 studies), multivariable diagnostic scores (127 studies), and diagnostic laparoscopy (55 studies). Trials directly comparing diagnostic tests were too heterogeneous to support definitive conclusions; therefore, most of our results pertain to the test performance of individual tests. Clinical symptoms and signs and laboratory tests had relatively low sensitivity and specificity when used in isolation. Their combination in multivariable scores performed somewhat better; however, the most studied scores were developed before the widespread use of imaging, which lessens the applicability of their results to current practice.
Computed tomography (CT) had high sensitivity (summary estimates ranging from 0.96 to 1) and specificity (0.91 to 1) in all populations of interest to this report. Magnetic resonance imaging (MRI) had high sensitivity (0.94 to 1) but appeared to have variable specificity (0.86 to 1), mainly because of the smaller number of studies, which focused on its use for pregnant women. In adult populations, ultrasound (US) had lower sensitivity (0.85) and specificity (0.90) than CT and MRI, and produced more nondiagnostic scans. In children, the specificity of US was similar to that of CT (0.91 vs. 0.92), but CT had greater sensitivity than US (0.96 vs. 0.89); these results were based on a large number of studies (85 for US and 34 for CT). Also in children, MRI had a specificity of 0.96 and a sensitivity of 0.97, but data were derived from only seven studies. Among pregnant women, CT, MRI, and US had similar specificity (0.91, 0.98, and 0.95, respectively), but CT and MRI had higher sensitivity than US (0.99, 0.98, and 0.72, respectively). Information on diagnostic test performance among the elderly was limited.
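As a reading aid, the summary sensitivity and specificity values quoted above can be converted into likelihood ratios using the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The sketch below is an illustration, not part of the review's analysis. The function names are hypothetical, and the pre-test probability in the last step is an arbitrary assumed value rather than a prevalence reported by the review.

```python
import math


def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    if specificity >= 1.0:
        lr_pos = math.inf  # no false positives, so LR+ is unbounded
    else:
        lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Summary (sensitivity, specificity) pairs as quoted in the text above.
SUMMARY_ESTIMATES = {
    "US, adults": (0.85, 0.90),
    "US, children": (0.89, 0.91),
    "CT, children": (0.96, 0.92),
    "MRI, children": (0.97, 0.96),
    "CT, pregnant women": (0.99, 0.91),
    "MRI, pregnant women": (0.98, 0.98),
    "US, pregnant women": (0.72, 0.95),
}

# ASSUMPTION: an arbitrary pre-test probability chosen only for illustration.
ASSUMED_PRE_TEST_PROBABILITY = 0.30

for label, (sens, spec) in SUMMARY_ESTIMATES.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    after_positive = post_test_probability(ASSUMED_PRE_TEST_PROBABILITY, lr_pos)
    after_negative = post_test_probability(ASSUMED_PRE_TEST_PROBABILITY, lr_neg)
    print(
        f"{label}: LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}, "
        f"P(disease | +) {after_positive:.2f}, P(disease | -) {after_negative:.2f}"
    )
```

Because these are point estimates taken from summary values, the output ignores the uncertainty around each estimate and should not be read as a comparison between tests.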
Studies of test performance were deemed to be at moderate risk of bias, mostly because of concerns about differential and incomplete verification.
Information on patient-relevant outcomes and resource use came from two sources: a small number of trials with moderate risk of bias, which assessed heterogeneous comparisons between tests, and nonrandomized studies that did not appropriately adjust for potential confounding factors. Only a few studies reported information on harms, leading to concerns about selective outcome reporting. Therefore, no definitive conclusions could be drawn about patient-relevant outcomes or harms.
Patient-level data were unavailable, and information about study- or population-level characteristics was too limited to allow the identification of modifiers of test performance, patient-centered outcomes, or harms. Studies reported adverse events incompletely and did not provide details of outcome ascertainment methods.
The literature on the diagnosis of acute appendicitis is large but consists almost exclusively of studies assessing the performance of individual tests. The evidence on individual tests indicates that imaging tests have adequate test performance, while clinical symptoms and signs and laboratory tests used in isolation have lower discriminatory capacity. The evidence is largely insufficient to support conclusions about comparative effectiveness for clinical outcomes because studies assessing more than two test strategies on the same population are few and have evaluated different test comparisons. More research is needed to evaluate the comparative performance and effectiveness of individual tests, test combinations, and integrated diagnostic algorithms; to identify potential modifiers; and to evaluate the impact of testing strategies on patient-relevant outcomes, resource use, and harms. Decision and simulation models using information from this review could inform the design of future studies and guide decisionmaking.