The great English marathon runner Paula Radcliffe is under suspicion for doping based on some leaked lab values from drug tests conducted sometime in the 2000s. Radcliffe has run the three fastest marathons by a woman and my colleagues Sandra Hunter, Andy Jones and I have argued that her world record of just over 2:15 is essentially a sub “2-hour” marathon for women.
There is a lot of controversy about this and about the release of the blood test scores and just how transparent Radcliffe has been with her data. Here is an excerpt from a Sky News report:
“Radcliffe, who this week spoke out to deny doping, insists test results included in a leaked database were cleared by the International Association of Athletics Federations (IAAF) and can all be explained by the circumstances in which they were taken. She says the test results, seen by Sky News, all fall below normal levels for samples given following altitude training, and she believes this destroys the case against her.
Radcliffe’s “off-scores”, the measures used to gauge an athlete’s blood values, in the three tests were 114.86, 109.86 and 109.3. Anything above 103 recorded by a female athlete can be a trigger for investigation and target-testing, but the ‘normal’ threshold can rise for a number of reasons, including altitude training and tests taken immediately after extreme exertion.
Radcliffe says all three samples were taken after periods of altitude training and two, including the highest, were taken immediately after she had raced. Radcliffe says these factors explain the figures.”
What is an “Off Score?”
I have always been a little confused about what an “off score” is, perhaps because in clinical medicine we think about the hemoglobin concentration (Hb; hemoglobin is the substance in red blood cells that carries oxygen, and increased hemoglobin in the blood is why blood doping and the drug EPO work to improve endurance). We also sometimes think about other parameters like the reticulocyte count, which is an index of how rapidly the body is making new red blood cells. To understand how the values are used to create the scores used in the Athlete Biological Passport, I e-mailed my colleagues Drs. Jim Stray-Gundersen and Ben Levine. Jim and Ben are two of the world’s leading experts on altitude training and legal ways to increase endurance performance in humans. Jim has also been involved in a number of the studies that led to the current approach to testing. As he followed the traffic, Ben Levine simply said:
“I have nothing to add to Jim’s erudite discussion. Mike – could you put this on your blog?”
So here are some excerpts of our e-mail exchange from late last week.
E-mail #1: Subject: ABP scores and Paula
Jim, can you explain these scores to me? What was her Hb and what was her reticulocyte count?
Response #1 Hi Mike, if you are talking about Paula’s OFF scores from the hematologic passport, I don’t know what the values were, but it is a function of a low reticulocyte (retic) count and a high Hb. So when one gets a transfusion, one has a high (or higher) Hb and the bone marrow shuts off (or decreases) the rate of producing new retics. Here is a link to a detailed explanation by Ross Tucker.
The calculation is: Hemoglobin x 10 – 60 x (square root of the reticulocyte %). “Normal” OFF scores are between 80 and 110.
Essentially, we are looking at abnormally accelerating or decelerating the bone marrow and coming up with a “score” that is associated with a probability of only being due to various blood doping practices. These scores were developed from an IOC study, that I was involved with, where we surveyed blood values in over 1,000 international caliber athletes from around the world (all ethnic and racial groups from endurance sports) and then 3 different blood doping trials (in Norway (which I did), Australia and China). Originally, we had an ON score (starting an ESA – erythropoiesis-stimulating agent) and an OFF score (transfusions or stopping an ESA), but the normal range of the ON score overlapped significantly with starting an ESA, so we just went with the OFF score where there was almost no overlap.
I had also (separately) come up with something called the SAFE Program (Safe and Fair Events) which we used in ISU (international skating federation) and trialed in FIS, IBU, IAAF and UCI in 1999-2001. I used a Z score for both hemoglobin and % reticulocytes to detect the degree of abnormality. The current ABP system is an evolution of my work and the original ON/OFF work centered in Australia.
So there is a one-off chance of catching someone with a sufficiently abnormal score, but then if you take serial measurements and create a “passport” of scores over time and you have normal scores followed by an abnormal score and back again, that is a much more sensitive test. It is conceivable, if you are very severely dehydrated (which can raise Hb but not % retics), to get an abnormal OFF score (e.g. Hb going from 14 to 17, with retics staying at 1%, causes the score to go from 80 to 110), but even that degree of dehydration produces a score still within the normal limits. In all our altitude studies, we have never seen an abnormal OFF score from being at an altitude camp and coming down to sea level.
Originally, we intended to use these scores to deter cheating and not allow a start in an important race (avoiding the cost and embarrassment of a doping finding). The intent was to “herd” cheaters into safer and less effective behaviors. We would test everyone the day before the competition, as well as random out-of-competition testing. This program would make people think twice about cheating and would catch people using massive doses (which are both dangerous and provide a huge advantage) and prevent them from competing. To me this program cleans up a lot of sport ahead of time and is a different paradigm than the “cops and robbers” anti-doping game.
It is possible to use “micro-dosing” and stay below the radar, but when doing this, it is unlikely to be harmful to health and the advantage can be negated by legal, ethical means, like proper HiLo training. The effect is safe and fair events reducing the extent of doping and letting athletes decide that they can be clean and win.
Jim, very helpful…… but this gets a bit like other derived variables; sometimes it is better to actually just focus on Hb and retic count.
Response #2 Definitely, there are some potential confounding variables. That is why we look at serial measurements in the same person with their own normal values and take into consideration the circumstances. The reason to come up with derived variables is that it allows us to take the changes in Hb and changes in retic % into one consideration.
Since your first email, I looked up what the values were that raised suspicion in Paula. There were apparently three, between 110 and 114. None of these are really that abnormal and all three have logical explanations. Apparently, they were correctly handled and found by experts to not indicate doping. The system worked correctly. This is an example of the (insufficiently informed) media and public casting hurtful accusations incorrectly and inappropriately. Paula is due a public apology.
One final comment, there are also individuals with high Hb values on a genetic basis. We developed a procedure involving looking at family members and historical records back into childhood to provide a “letter of exception” when the genetic basis is documented.
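To make the arithmetic in the exchange above concrete, here is a minimal Python sketch of the OFF score formula Jim quotes. It assumes hemoglobin is entered in g/dL (so multiplying by 10 converts it to g/L) and reticulocytes as a percentage, which is consistent with his dehydration example. The function name `off_score` is my own and is only for illustration; it is not taken from any official passport software.

```python
import math


def off_score(hb_g_dl: float, retic_pct: float) -> float:
    """OFF score as quoted in the exchange:
    hemoglobin x 10 - 60 x sqrt(reticulocyte %).

    Assumption: hemoglobin is in g/dL and reticulocytes are a percentage.
    """
    if hb_g_dl < 0 or retic_pct < 0:
        raise ValueError("hemoglobin and reticulocyte % must be non-negative")
    return hb_g_dl * 10 - 60 * math.sqrt(retic_pct)


# Dehydration example from the exchange: Hb rises from 14 to 17 g/dL
# while reticulocytes stay at 1%.
baseline = off_score(14, 1.0)
dehydrated = off_score(17, 1.0)
print(f"baseline OFF score:   {baseline:.1f}")
print(f"dehydrated OFF score: {dehydrated:.1f}")
```

With reticulocytes held at 1%, each extra 1 g/dL of hemoglobin adds 10 points, which is why the dehydration example moves the score from 80 to 110.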
Summary: I would like to thank Dr. Stray-Gundersen for his excellent tutorial on this topic and his permission to post our exchange. His ideas about focusing behavioral incentives and better systems amplified by testing to deter doping are excellent and consistent with my own ideas about how to address this complex problem.
It is finally here! Our data packed and evidence based book on major issues affecting the health of the U.S. population, including smoking, diet, physical activity, and the policy options to move us in the right direction is now available. You can download a no cost PDF version of this book (and other books from the Roadmap series) from the website of the Arizona State University’s Healthcare Delivery and Policy Program. A paperback version is also available from Amazon (no profits to us). We hope that this book will be useful to a wide range of people interested in the topics of population health, physical activity, exercise and diet. We have focused on basic data related to these topics and what policies might be used to promote healthier lifestyles for both individuals and society as a whole.
A couple of weeks ago Ethiopia’s Genzebe Dibaba broke the women’s world record for the 1500m run with a time of 3:50.07. The believability of this performance will certainly be questioned because most of the women’s world records in track and field have been stagnant for decades and date to the era of industrial strength doping in the 1980s and 90s. The 1500 record was set by a Chinese athlete in 1993 who was almost certainly doping. Many of the men’s distance running records are also “old” and occurred after the emergence of the blood boosting drug EPO in the late 1980s and before the advent of better (but far from perfect) drug testing regimens in the later 2000s.
A reasonable rule of thumb is that world records in women’s middle and long distance running “should” be on the order of about 11-12% slower than men’s. This is based on the fact that maximal aerobic power is typically that much lower in elite women than men, while other key physiological factors related to lactic acid build up and running efficiency that determine running performance are generally similar. The current fastest time by a man for 1500m in the pre EPO era was set by Said Aouita at 3:29.46 in 1985! The best time since drug testing got better is 3:26.69 by Asbel Kiprop of Kenya set earlier this year (the world record for men is 3:26 set by Hicham El Guerrouj in 1998).
Historically even better performances, but not faster times, were achieved by Jim Ryun and Kip Keino in the late 1960s. Ryun ran a 3:33.1 on a cinder track at the LA Coliseum in 1967. It was also hot that day. A modern optimally tuned track might be worth 3% and if you adjust Ryun’s performance you get an estimated time of about 3:26 and change.
An even more remarkable performance came a year later when Kip Keino ran 3:34.9 at high altitude to win the gold medal at the Mexico Olympics. Mexico City has an altitude of almost 7,400 feet (2,250m), and the best data suggests that lack of oxygen at that altitude should reduce aerobic power by about 10%. Now Keino was altitude adapted because he had spent his life in the highlands of Kenya, but adaptation only gets you so much. So if we are conservative and adjust his performance by 5% an estimated time just over 3:24 seems “possible”. Old school “point tables” from the 1960s and early 70s also suggest that the 5000m times run by Dibaba and also her world record holding sister equate to times under 3:50.
Which brings me back to Dibaba and the women’s 1500m record. Her time is a little more than 12% slower than what Keino might have run and between 11 and 12% slower than the projection for Ryun. It is just over 11% slower than the best time for men since drug testing got better. There are all sorts of reasons to be suspect of any world record in sports like track and cycling and the East Africans have done their share of doping. However, given the analysis above, Dibaba’s record seems like it is at the edge of believable to me.
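For anyone who wants to check the percentages above, here is a rough Python sketch of the arithmetic. It assumes that "adjusting" a performance by 3% or 5% means multiplying the finishing time by 0.97 or 0.95, which is my reading of the adjustments and not a standard conversion. The helper names are made up for this example.

```python
def to_seconds(race_time: str) -> float:
    """Convert a 'm:ss.xx' race time into seconds."""
    minutes, seconds = race_time.split(":")
    return int(minutes) * 60 + float(seconds)


def to_clock(total_seconds: float) -> str:
    """Format seconds back into 'm:ss.s' for easy reading."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{int(minutes)}:{seconds:04.1f}"


def percent_slower(slower_s: float, faster_s: float) -> float:
    """How much slower (in %) one time is relative to a faster one."""
    return (slower_s / faster_s - 1) * 100


dibaba = to_seconds("3:50.07")
# Assumed adjustments: 3% for a modern track, 5% for Mexico City altitude.
ryun_adjusted = to_seconds("3:33.1") * (1 - 0.03)
keino_adjusted = to_seconds("3:34.9") * (1 - 0.05)
kiprop = to_seconds("3:26.69")

for label, mens_time in [
    ("Keino (altitude adjusted)", keino_adjusted),
    ("Ryun (track adjusted)", ryun_adjusted),
    ("Kiprop", kiprop),
]:
    gap = percent_slower(dibaba, mens_time)
    print(f"{label}: {to_clock(mens_time)}, Dibaba is {gap:.1f}% slower")
```

Under those assumptions the gaps come out at roughly 12.7% for Keino and roughly 11.3% for both Ryun and Kiprop, in line with the figures quoted above.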
I have recently had the opportunity to hear tech industry leaders discuss how the combination of gene sequencing in large populations plus various forms of “big data” were going to transform medical knowledge, medical practice, and ultimately public health. To be frank these have been pretty standard recitations of the catechism that once we know your genome and link it to enough data about you we will be able to Predict and Prevent most diseases and/or Personally (or Precisely) treat them in a way that maximizes your Participation in all of the relevant decision making and outcomes. This general scheme has been called P4 Medicine.
As I heard these recitations, a couple of things hit me and I began to wonder just how insulated the major players in the tech world are from medical and biological reality. So I will list a few concepts for the techies to consider.
- It is all about MAGOTS or multiple assorted genes of tiny significance. This is a term coined by the writer David Dobbs and is a pretty good description of the fact that for most common diseases a clear picture of how genetic factors contribute to them has not emerged even when hundreds of thousands of people have been studied. It also seems like the picture is not going to get a whole lot clearer when millions of people are studied. So the signal might not be there. There are also a host of pretty straightforward statistical considerations about what makes a useful clinical test that the tech folks may not have been exposed to. Giving people useful advice based on a biomarker is more than just considering the odds associated with a gene variant. For many common diseases so-called gene scores don’t improve risk prediction much if any over conventional means.
- For some uncommon and very rare diseases seen in children, gene sequencing is providing insights into causes. Unfortunately, many of these tragic diseases are essentially one-offs and it is unlikely that knowledge of the gene defect is going to lead to breakthrough therapies. Gene therapy has been a bust so far and there are currently no licensed products in spite of 25 plus years of strong efforts in the area. There have been reports of some niche successes but it is unclear how long lasting they will be.
- In tech there is something called Moore’s Law about the computing power of semiconductors doubling every 24 or so months. In drug development there is something called Eroom’s law that describes how, in spite of all the advances in molecular biology and omics, it is getting harder and harder and more expensive to develop new drugs – the reverse of Moore’s Law. There are many potential reasons for this, but unlike most tech things the cost to develop and market new drugs is not coming down; it is skyrocketing. The chart below shows this. Maybe if the techies study up on this chart they will understand they are dealing with a different animal and that what they think about when they think about hardware, search engines, apps, big data, and gizmos of various sorts doesn’t apply to biology and medicine. Bill Gates for one seems to be coming to that realization, but it only took ten years.
- Whatever the limitations in the biology, no worries for the techies. They can just use big data approaches to mine medical records and the smart watch monitors that “everyone” will soon be wearing. The problem here is that electronic medical records are primarily billing, coding, and compliance documents. The quality of the data has far more limitations than is generally known. As for all of this remote monitoring, first people actually have to wear the monitors, second the information has to be reliable, and third people then might have to change their behaviors based on all of this monitoring. There are a lot of what-ifs in all of this and it is unclear just how willing most people are to be actively or passively monitored. More importantly, all sorts of people know they need to not smoke, exercise more, and eat less but getting them to do it is going to be a challenge. Maybe the gizmos will work, but my bet is they will end up like a lot of exercise equipment that gets bought, used for a while, and then ends up stored in the basement. Sort of like “all diets work” provided people adhere to them.
- Of course one of the promises of tech is that all of this is going to reduce costs. Well, as mentioned above the costs of developing drugs are going up and for cancer the price of new drugs is unrelated to outcomes. There is also evidence that getting a gene screen leads to more not less medical usage by anxious people who in reality have nothing to worry about, and then there are likely to be a large number of people in what might be called the genomic twilight zone with tests that are a little off and no clear course of preferred action. Also, if people do choose to take action at least some of these actions like extra scans, tests, and biopsies are not without risk. They also will increase costs. Monitors that track people and get people to change behavior might work, if people use them.
- Now we can forgive the techies for not knowing much biology and not having full knowledge of the limitations of the biological ideas underpinning P4 medicine. However, shouldn’t we expect them to know about the limitations of “big data”? Robert McNamara – at some level the inventor of big data – attempted to “manage” the Viet Nam War on the basis of metrics, analytics and hard data. He had tried to do the same when he was the president of Ford Motor Company and in both cases, but especially Viet Nam, his approach became a sort of tragic cult of data unrelated to reality. The chart below summarizes what has been termed the McNamara Fallacy and is one I use in my talks to academic audiences all over the world on these topics. To me it summarizes many of the perils of big data.
Ultimately, the techies have a lot of money and a lot of toys and a lot of influence. However, it is unclear if they have any insight into what they don’t know or the inherent limitations of their “model”. The blind faith they have in their world view and their self-image as modern day frontiersmen creating a better world is also a disturbing echo of Robert McNamara.
Over the last few months I have run across a couple of ideas — really catchy phrases — that are influencing the way I think about trends and hopefully progress (or lack of it) in medicine. The phrases are idea bubbles, biological plausibility, and bio-babble.
Idea Bubbles & Alzheimer’s Disease
I ran across the phrase idea bubbles when I was doing a web search on the amyloid hypothesis for Alzheimer’s Disease. The idea that first emerged in the 1990s is that a buildup of amyloid proteins in the brain is central to the development of Alzheimer’s. This has led to the development of animal models that generate excess amyloid in their brains and also drugs that either slow the buildup or help clear it. It has also led to a number of promising early stage human drug trials in patients with Alzheimer’s that have ultimately failed in larger trials. At this time there are no effective approved anti-amyloid therapies on the market in spite of this vast effort.
All of this was reviewed in a great blog post on Forbes by David Grainger, in which he discusses why the hypothesis lives on to fight another day and why drug companies, investors and the scientific community are continuing to make “large bets” on the amyloid hypothesis:
— There are a few important lessons from this sorry tale, that extend well beyond Alzheimer’s Disease. It highlights the danger of what I previously called “idea bubbles” – that a hypothesis gains so much credibility over a long period of time that even when the data tells you otherwise, adherents (acolytes may be a better word) question everything but the hypothesis.–
As I drilled down I found the definition of an idea bubble and how it relates to better known bubbles like stock market bubbles.
— An “ideas bubble” occurs when, over a long period of time, positive social feedback disconnects the perceived validity (of the idea) from the real underlying validity – in the same way price and value dissociate in a stock market bubble.” –
Bubbles are sustained by gold rush mentalities, optimism, the fact that careers have been invested in one thing or another, and the general problem of sunk costs.
Bubbles similar to Alzheimer’s and amyloid have occurred for all sorts of cancer therapies (surgery, radiation, chemo) over the last hundred years, and I wonder if the next big bubble is going to be the precision medicine bubble. As the noted public health expert David Hunter recently pointed out:
— “In searching for a cure for cancer, we have repeatedly climbed on various bandwagons. They include the radical mastectomy for breast cancer, high-dose chemotherapy, immunotherapy, and — more recently — molecularly “targeted” therapies. In each case, it took someone with courage to point out the limitations or futility of the approaches.
Hope is critical to cancer patients and those treating them, but hope that is not rooted in the facts risks becoming an illusion. As Mikkael Sekeres of the Cleveland Clinic has commented, we should not delude ourselves into believing targeted therapies will be a panacea for cancer treatment.”
Bubbles & Biological Plausibility
One of the things required for a bubble (or perhaps a bio-bubble) to take off is the need for a narrative that makes biological sense and can then underpin a big idea. In a great editorial in the British Medical Journal, David Healy traces how the serotonin hypothesis emerged as an explanation for depression and led to the generation of whole new classes of drugs with marginal efficacy that have been vastly over-used. Another good example is the idea that “free-radicals” cause cancer, aging, and heart disease and that taking anti-oxidants can make people healthier. In fact big clinical trials show the opposite: anti-oxidants can be associated with worse instead of better outcomes. However, the theory lives on as do the pitches. Each of the cancer therapies mentioned above had at the time of their adoption a tight and biologically plausible back story.
The biology of most hard to treat or cure diseases is complicated and usually defies a simple linear story. However, we persist in seeking such stories. One example that hits home for me as an anesthesiologist and physiologist is what has been described as the “Cult of the Swan-Ganz Catheter”. In the early 1970s it became possible to routinely put big catheters into the hearts of most patients in intensive care units. The idea was that by carefully and precisely measuring the pressure, oxygen levels, and blood flow at various places in the heart, “goal directed” therapy could be used to give just the right amount of fluid and drugs to patients. This would then improve outcomes for hard to treat diseases like heart failure or severe infections.
Sounds like a good idea, but it has not worked. What is also interesting is that the general narrative appeared in the early 70s, was questioned in the middle 80s, was not really evaluated objectively on a large scale until the 90s, and a firm consensus about the limitations of these catheters only really emerged in the 2000s with an “obituary” written in 2013.
As an aside, when I was a resident in the late 1980s, the placement of these catheters in the ICU was almost like a religious ceremony or sacrament. The more senior Drs. served as high priests while the younger interns and medical student acolytes watched on and waited for ordination and their chance at the altar.
Ideas that are perhaps too good to be true die hard. This is true in medicine and health in so many ways. That some of the “smartest people around” continue to fall into the same cognitive traps over and over again should make us all think twice before jumping on any bandwagons that are “sure” to cure anything.
There has been yet another wave of “too much” exercise stories in the media based on a recent study of 1 million women from the UK. The idea is that moderate levels of physical activity most days with occasional bouts of strenuous activity can cause big drops in both cardiovascular and all-cause mortality. However, doing a lot of hard training is not as beneficial.
This topic has been recycling for the last couple of years. Alex Hutchinson (who has a Ph.D. in physics) has done an excellent numerical/statistical breakdown on one of the key studies that “supports” the too much exercise hypothesis. Put simply there are many limitations to the whole argument. I have done a couple of posts on what both the epidemiology and physiology tell us on the topic. The first was in 2012 and another one with Brad Stulberg in 2014. I too am a skeptic.
I am in the camp that 30-60 minutes of physical activity most days is the sweet spot for general health and that more is not better, but it is not worse either. Those who really push it most days are also likely motivated by things other than return on investment thinking about their health. Perhaps they want to compete in races or are into pushing themselves to meet more hard core physical goals.
The Swedish Skiers
Whenever this topic comes up I also bring up a paper that followed about 50,000 male finishers of the 90km (~55 miles) Vasaloppet cross country ski race in Sweden. This study used the Swedish medical records system to look at mortality in the race finishers. In preparing for an upcoming talk on exercise and health, I asked my colleague Andy Miller to generate some figures from the skiing study. The one below shows that mortality among race finishers is about 50% or less of the expected rate gleaned from Swedish population records. It also shows that finishing more races was not associated with an uptick in mortality; if anything it was associated with a downtick. Who knows exactly what these folks were doing, but those who finished a number of races certainly had to be doing a lot of strenuous training over many years.
I have repeatedly asked those in the “too much” camp to rebut this paper and point out any major flaws in it. The bottom line is that it is at least as strong as the studies “supporting” the too much exercise hypothesis. Until data comes along that clearly refutes the data in the chart above I will remain a skeptic.