by Mary Dimmock (please re-post freely with attribution to Mary Dimmock)
Last week, Jennie Spotila and Erica Verillo posted summaries of just some of the issues with AHRQ’s Draft Systematic Evidence Review, conducted for the NIH Pathways to Prevention (P2P) Workshop.
Jennie and Erica highlighted serious and sometimes insurmountable flaws with this Review, including:
- The failure to be clear and specific about what disease was being studied.
- The acceptance of 8 disparate ME or CFS definitions as equivalent in spite of dramatic differences in inclusion and exclusion criteria.
- The bad science reflected in citing Oxford’s flaws and then using Oxford studies anyway.
- The well-known problems with the PACE trial.
- The flawed process that used non-experts on such a controversial and conflicted area.
- Flawed search methods that focused on fatigue.
- Outright errors in some of the basic information in the report and apparent inconsistencies in how inclusion criteria were applied.
- Poorly designed and imprecise review questions.
- Misinterpretation of cited literature.
In this post, I will describe several additional key problems with the AHRQ Evidence Review.
Keep in mind that comments must be submitted by October 20, 2014. Directions for doing so are at the end of this post.
We Don’t Need No Stinking Diagnostic Gold Standard
Best practices for diagnostic method reviews state that a diagnostic gold standard is required as the benchmark. But there is no agreed-upon diagnostic gold standard for this disease, and the Review acknowledges as much. So what did the Evidence Review do? It allowed any of the 8 disparate CFS or ME definitions to serve as the gold standard and then evaluated diagnostic methods against and across all 8. But when a definition does not accurately reflect the disease being studied, that definition cannot be used as the standard. And when the 8 disparate definitions do not describe the same disease, you cannot draw conclusions about diagnostic methods across them.
What makes this worse is that the reviewers recognized the importance of PEM but failed to consider the implications of Fukuda’s and Oxford’s failure to require it. The reviewers also excluded, ignored or downplayed substantial evidence that some of these definitions cannot be applied consistently, as CDC’s Dr. Reeves demonstrated for Fukuda.
Beyond this, some diagnostic studies were excluded because they did not use the “right” statistics or because the reviewers judged them to be “etiological” studies rather than diagnostic methods studies. Were the NK-cell function studies eliminated as “etiological”? Was Dr. Snell’s study on the discriminative value of CPET excluded because it used the “wrong” statistics? All studies published before 1988 were excluded as well. These inclusion/exclusion choices shaped what evidence was considered and what conclusions were drawn.
Erica pointed out that the Review misinterpreted some of the papers on the harms associated with a diagnosis. The Review failed to acknowledge the relief and value of finally receiving a diagnosis, particularly from a supportive doctor. The harm comes not from the diagnostic label itself, but from the subsequent reactions of most healthcare providers. At the same time, the Review did not consider other harms, such as those documented in Dr. Newton’s study of patients with other diseases being diagnosed with “CFS” or in another study finding that some MS patients were first misdiagnosed with CFS. The Review also failed to acknowledge the harm patients face when they are given harmful treatments out of a belief that CFS is really a psychological or behavioral problem.
The Review is rife with problems: failing to ask whether all the definitions represent the same disease, using any definition as the diagnostic gold standard against which to assess any diagnostic method, and excluding some of the most important ME studies. It is no surprise, then, that the Review concluded that no definition had proven superior and that there are no accepted diagnostic methods.
But remarkably, the reviewers felt there was sufficient evidence to state that patients who meet the CCC and ME-ICC criteria are not a separate group but rather a subgroup with more severe symptoms and functional limitations. Because the Review started from the assumption that all 8 definitions encompass the same disease, this characterization of CCC and ME-ICC patients was a foregone conclusion.
But Don’t Worry, These Treatment Trials Look Fine
You would think that at this point in the process, someone would stand up and ask about the scientific validity of comparing treatments across these definitions. After all, the Review acknowledged that Oxford can include patients with other causes of the symptom of chronic fatigue. But no, the Evidence Review went on to compare treatments across definitions regardless of the patient population selected. Would we ever evaluate treatments for cancer patients by throwing in studies of patients who are merely fatigued? The assessment of treatments was flawed from the start.
But the problems were then compounded by how the Review was conducted. The Review focused on subjective measures like general function, quality of life and fatigue, not objective measures like physical performance or activity levels. In addition, the Review explicitly decided to focus on changes in the symptom of fatigue, not PEM, pain or any other symptom. Quality issues with individual studies were either overlooked or ignored. Counseling and CBT studies were all lumped into one treatment group, without consideration of the dramatic difference in therapeutic intent between the two. Some important studies, such as the Rituxan trial, were not considered because the treatment duration was deemed too short, regardless of whether that duration was therapeutically appropriate.
And finally, the Review never questioned whether the disease theories underlying these treatments were applicable across all definitions. Is it really reasonable to expect that a disease that responds to Rituxan or Ampligen is going to also respond to therapies that reverse the patient’s “false illness beliefs” and deconditioning? Of course not.
If their own conclusions on the diagnostic methods and the acknowledged problems with the Oxford definition were not enough to make them stop, the vast differences in disease theories and therapeutic mechanisms of action should have made the reviewers step back and raise red flags.
At the Root of It All
This Review brings into sharp relief the widespread confusion on the nature of ME and the inappropriateness of having non-experts attempt to unravel a controversial and conflicting evidence base about which they know nothing.
But just as importantly, this Review speaks volumes about the paltry funding and institutional neglect of ME, reflected in the fact that the reviewers could find only 28 diagnostic studies and 9 medication studies to consider from the last 26 years. This Review speaks volumes about the institutional mishandling that fostered the proliferation of disparate and sometimes overly broad definitions, all branded with the same “CFS” label. And this Review speaks volumes about the institutional bias that made the biggest, most expensive and most numerous treatment trials the ones that studied behavioral and psychological pathology for a disease long proven to be the result of organic pathology.
This institutional neglect, mishandling and bias have brought us to where we are today. That the Evidence Review failed to recognize and acknowledge those issues is stunning.
Shout Out Your Protest!
This Evidence Review is due to be published in final form before the P2P workshop, and it will affect our lives for years to come. Make your concerns known now.
- Submit public comments on the Evidence Review to the AHRQ website by October 20.
- Contact HHS and Congressional leaders with your concerns about the Evidence Review, the P2P Workshop and HHS’ overall handling of this disease. Erica Verillo’s recent post provides ideas and links for how to do this.
The posts by Jennie and Erica cited above provide additional background to help you prepare your comments.
However you choose to protest, make your concerns known!