I agree with the overall message and conclusion of the letter. I mostly agree with the bullet points of action at the end of the letter. I agree the CRUK adverts are extremely unlikely to do anything good, and may be harmful, so they should probably stop.
But I don’t agree with some of the arguments the academics made in the letter, and I want to talk about this, even though I’m pretty sure I could get slammed for doing so.
I want to state upfront that I completely agree that stigmatising people with obesity is incredibly harmful. I state this because I think some people may profoundly disagree with me on two of my arguments, namely that obesity is a mix of personal choice, the environment and genetics, and that smoking is also a mix of personal choice, the environment and genetics.
I am not arguing that smoking and obesity are the same, or that personal choice is more important than the environment for either smoking or obesity, although both are probably more important than genetics (for most people). I am also not saying I know what to do, how obesity should be tackled, or even if it should be tackled.
Rather, I’m arguing that treating smoking like it’s solely a personal choice is wrong, that personal choice exists for both smoking and obesity, and to deny that is to take away people’s autonomy.
First off, the things I agree with.
I completely agree. Most (possibly all) of the studies associating BMI with health outcomes are observational, not causal. From these studies, you can’t say anything about whether a high BMI causes cancer, or whether something else entirely is going on.
BMI can’t be randomised in a study in the same way a drug or other treatment can. You could look at randomised studies of treatments for obesity and see if they affect cancer in the long run, but that’s expensive, difficult, and probably not ethical – you’d have to not treat one group of people, who initially wanted treatment, to see if they were diagnosed with cancer more frequently than the group who were treated. Oh, and this only tells you if the treatment affects cancer risk, not BMI itself.
You could also look at genetic studies, which can be thought of as a natural experiment, since bits of genes are distributed randomly at conception. You would look at the bits of genetic code causing changes in BMI, and see if they also affect cancer risk, and any other outcomes.
Genetic studies probably give more of a causal estimate than non-genetic studies, but they aren’t free from bias. With complex, multifactorial outcomes like BMI, it’s difficult to be completely sure whether the effects you’re seeing are from BMI, or from some other process. It’s still a lot better than non-genetic observational studies though, hence the “more causal” estimates.
The cited research isn’t actually what I was hoping for, which was a study showing that telling people obesity is bad doesn’t affect people’s BMI in the long term. The research was more about people’s perceptions of obesity and “self-efficacy for health behaviour change”. However, I agree that telling people that obesity causes cancer is extremely unlikely to change people’s BMI. I don’t believe that many people who are obese believe obesity has no health consequences, or that telling them there are consequences will change either behaviour or outcomes.
However, my opinion on this isn’t relevant or important. CRUK should not launch a nationwide intervention without evidence that it would work – advertising is an intervention, same as intervening with drugs or surgery. There are potentially both positive (reduction in BMI?) and negative effects (increased stigma towards those perceived as obese), and the money spent on advertising could have gone to studying the causes of cancer. Therefore, the campaign should have had evidence to support its use, beyond “we tested it in focus groups”. If they don’t have the evidence, it shouldn’t have happened.
It is indefensible if healthcare providers are creating a barrier to accessing healthcare by stigmatising those they perceive as obese. As, of course, is shaming anyone who is exercising. It’s utterly absurd that anyone should try to make other people feel bad about obesity when they are actively trying to do something about it. The same is true of a lot of shaming, but it feels particularly unjust when people are shamed out of exercising because they are considered too obese to exercise.
Stigmatising obese people does not, and never will, reduce the amount of obesity in the world.
I’ll start with something easy.
The NHS has its own weight-loss advice, but it’s general advice, along with the information that GPs can recommend both exercise on prescription and local weight loss groups. Given CRUK has stated obesity is bad, it needs to provide some mechanism for reducing obesity, or the campaign would be completely pointless. So I can see why they would partner with a group that would, presumably (hopefully?), be recommended by GPs. Whether that was necessary is debatable, but I don’t see it as being an immediate problem in and of itself.
More concerningly, the letter states that the programmes are:
not effective ways of achieving and maintaining weight loss or preventing cancer
The cited research doesn’t seem to me to support any part of that assertion.
The research is a systematic review and meta-analysis of weight loss among overweight but otherwise healthy adults who used commercial weight-loss programs. The outcome was whether people in the included studies lost more than 5% of their initial body weight. I’ve included a forest plot below, but the conclusion was that, on average, 57% of people on a commercial program lost less than 5% of their initial body weight.
Therefore, 43% of people lost more than 5% of their initial body weight, which would seem to me to be evidence that these programs can help people “achieve weight loss”. They don’t help everyone, but that doesn’t mean they aren’t effective – they worked for 43 out of every 100 people! So I disagree that the “evidence demonstrates these programs are not effective”, although I grant that others may disagree.
Some of the biggest studies lasted only 12 weeks, and all but one study lasted less than 1 year. Therefore, this isn’t any kind of evidence for long term effects, or of “maintaining weight loss”. In general, it seems like the longer the study, the more people lost 5% of their body weight. I’d have liked to see a meta-regression on this, to see whether length of study was important, but there wasn’t one. In any case, I also disagree the evidence demonstrates the programs aren’t effective at maintaining weight loss, since that wasn’t tested here.
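For what it’s worth, a meta-regression along those lines is straightforward to sketch. The example below is purely illustrative – the numbers are invented, not extracted from the review – but it shows the idea: regress each study’s result on its length, weighting by precision.

```python
# Illustrative meta-regression: does study length predict the proportion of
# participants losing >=5% of body weight? All numbers below are invented.
import numpy as np
import statsmodels.api as sm

prop_lost_5pct = np.array([0.35, 0.40, 0.42, 0.48, 0.52])  # per-study proportions
std_err        = np.array([0.05, 0.04, 0.06, 0.03, 0.05])  # their standard errors
weeks          = np.array([12, 12, 26, 52, 52])             # study length

X = sm.add_constant(weeks)          # intercept + study length
weights = 1 / std_err**2            # inverse-variance weights
fit = sm.WLS(prop_lost_5pct, X, weights=weights).fit()
print(fit.params, fit.bse)          # slope: change in proportion per extra week
```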
Finally, the research doesn’t say anything at all about preventing cancer. I would doubt there is much evidence either way for that outcome – if there were, however, it should be cited.
Ok, so this is where things get trickier.
I fundamentally disagree with the approach to this argument. Specifically, this part:
Through making a direct comparison between smoking and weight, your campaign contributes to these assumptions, suggesting that it is a lifestyle choice.
I’m fairly certain this means the authors of this letter, who are criticising CRUK for stigmatising obesity by “implying that individuals are largely in control of and responsible for their body size”, are themselves stating unequivocally that smoking is a lifestyle choice.
I cannot overstate how much I object to this.
I can see why people would think that obesity is different from smoking. Obesity is a consequence of many things; it’s an outcome, not a behaviour or single action that can be stopped at will. Smoking is a deliberate action, where you need to buy cigarettes, light them, and inhale.
But that completely misrepresents smoking. Smoking is also a consequence of many things; it’s an outcome as much as a behaviour, just like obesity.
Suffice to say, I strongly reject the idea that smoking is a “lifestyle choice” while obesity is not.
Rather, smoking, like obesity, is complex and multifaceted. If people object to the CRUK adverts solely because smoking is a lifestyle choice, and obesity is not, then I think they are wrong.
There are, of course, plenty of other reasons to object to the adverts.
I expect this will be an unpopular opinion.
Are people largely in control of their body size?
I don’t know, nor do I know how we could ever test this.
But I know that individual choice affects obesity. To imply otherwise is to say that people have absolutely no autonomy over their own body, over what and how much they eat, or how much they exercise.
Choice is clearly not the only factor in play (I mean, see above), and for some people there is very little choice, but for many people there is plenty.
I include myself in this. I have been both overweight and obese. I still am overweight. I have experienced weight-related stigma from both family members and strangers, although not at all to the same degree as others will have done. But it’s my choice to eat and exercise the way I do. My environment affects those choices, and I recognise that I am very privileged in that I live close enough to work that I can walk in, I’m healthy enough to go for a run, and I have the time and resources to choose to eat “healthily” or “unhealthily” (quotes to show how little I care for those definitions), where others don’t have these options.
The choices we make about food and exercise become harder or easier based on the environment and genetics. It’s easier to cook food or exercise if you have the time and resources to do so. It’s easier to eat salads if you like them. It’s easier to eat less if you aren’t depressed and comfort eating is one of the ways you can cope. It’s easier to eat less if you aren’t taking a steroid.
Sometimes, it’s impossible to eat “healthily”, or to exercise, or to make any “good” choices. But this doesn’t mean that for other people, choice had no part in either gaining or losing weight. It also doesn’t mean that anyone should be judged or stigmatised for making those choices.
I don’t know whether structural change through policy or weight-loss programs that target individuals are better for losing weight, either individually or at a population level.
I don’t know whether obesity is even the core issue – what if the main issue is exercise, and both obesity and poor health are a result of not exercising? Even genetic studies couldn’t tell you that, since obesity may cause poor health by making it more difficult to exercise, both because exercising becomes physically and mentally harder, and because weight-related stigma makes exercising a worse experience.
I don’t know whether the CRUK advert could be beneficial or detrimental. I’m almost certain it’s not going to do any good, but I’m not psychic. I’m equally certain CRUK doesn’t know what effect the adverts will have, since they are apparently not based on firm evidence.
I don’t know whether it’s weight-related stigma to compare obesity with smoking – I see smoking and obesity as both consequences of personal choice, the environment and genetics. Saying that, the advert may increase stigma all the same, so it could definitely be destructive.
I don’t know by how much obesity causes cancer. I haven’t assessed the evidence CRUK has for their claims, though I’m fairly certain the evidence they have is not causal, so I don’t think they know by how much obesity causes cancer either.
I know weight-related stigma is horrible. It seems to me that much of it comes from people with more privilege – people whose choices have been easier – stigmatising those whose choices have been much harder.
I know the current Government is extremely unlikely to change policy to promote an environment that reduces obesity. Therefore, I believe the only option people who want to lose weight have is changing their own choices. Some people may benefit from weight loss programs. Others may benefit from lifestyle changes. Others may find it impossible to lose weight. And that’s ok.
While I agree with the letter’s overall message and conclusion, I think the argument could have been limited to the lack of causal evidence that obesity affects cancer risk, and the lack of evidence that the campaign would have any effect on obesity in this country.
I completely disagree that smoking is a lifestyle choice while obesity is not. Both are a mix of personal choice, the environment and genetics.
To recap, a couple of weeks ago a paper by Xinzhu (April) Wei & Rasmus Nielsen of the University of California was published, claiming that a deletion in the CCR5 gene increased mortality (in white people of British ancestry in UK Biobank). I had some issues with the paper, which I posted on Twitter. My tweets got more attention than anything I’d posted before. I’m pretty sure they got more attention than my published papers and conference presentations combined. ¯\_(ツ)_/¯
The CCR5 gene is topical because, as the paper states in the introduction:
In late 2018, a scientist from the Southern University of Science and Technology in Shenzhen, Jiankui He, announced the birth of two babies whose genomes were edited using CRISPR
To be clear, gene-editing human babies is awful. Selecting zygotes that don’t have a known, life-limiting genetic abnormality may be reasonable in some cases, but directly manipulating the genetic code is something else entirely. My arguments against the paper did not stem from any desire to protect the actions of Jiankui He, but to a) highlight a peer review process that was actually pretty awful, b) encourage better use of UK Biobank genetic data, and c) refute an analysis that seemed likely biased.
This paper has received an incredible amount of attention. If it is flawed, then poor science is being heavily promoted. Apart from the obvious problems with promoting something that is potentially biased, others may try to do their own studies using this as a guideline, which I think would be a mistake.
I’ll quickly recap the initial problems I had with the paper (excluding the things that were easily solved by reading the online supplement), then go into what I did to try to replicate the paper’s results. I ran some additional analyses that I didn’t post on Twitter, so I’ll include those results too.
Full disclosure: in addition to replying to me, Rasmus and I exchanged several emails, and they ran some additional analyses. I’ll try not to talk about any of these analyses as it wasn’t my work, but, if necessary, I may mention pertinent bits of information.
I should also mention that I’m not a geneticist. I’m an epidemiologist/statistician/evidence synthesis researcher who for the past year has been working with UK Biobank genetic data in a unit that is very, very keen on genetic epidemiology. So while I’m confident I can critique the methods for the main analyses with some level of expertise, and have spent an inordinate amount of time looking at this paper in particular, there are some things where I’ll say I just don’t know what the answer is.
I don’t think I’ll write a formal response to the authors in a journal – if anyone is going to, I’ll happily share whatever information you want from my analyses, but it’s not something I’m keen to do myself.
All my code for this is available online.
Not accounting for relatedness (i.e. related people in a sample) is a known problem. It can bias genetic analyses through population stratification or familial structure, and can be easily dealt with by removing related individuals from the sample (or by fancy analysis techniques, e.g. BOLT-LMM). The paper ignored this and used everyone.
Quality control (QC) is also an issue. When the IEU at the University of Bristol prepared its quality-controlled version of the UK Biobank genetic data, they looked for sex mismatches, sex chromosome aneuploidy (having sex chromosomes other than XX or XY), and participants who were outliers in heterozygosity and missingness rates (yeah, ok, I don’t have a good grasp on what this means, but I see it as poor data quality for particular individuals). The paper ignored these too.
The paper states it looks at people of “British ancestry”. Judging by the number of participants in the paper and the reference they used, the authors meant “white British ancestry”. I feel this should have been picked up on in peer review, since the terms are different. The referenced study uses “white British ancestry”, so it would have certainly been clearer sticking to that.
The main analysis should also have been adjusted for all principal components (PCs) and centre (where participants went to register with UK Biobank). This helps to control for population stratification, which we know is present in UK Biobank. I thought choosing variables to include as covariables based on statistical significance was discouraged, but perhaps that’s debatable. Still, I see no plausible reason to do so in this case – principal components represent population stratification, population stratification is a confounder of the association between SNPs and any outcome, so adjust for them. There are enough people in this analysis to take the hit.
I don’t know why the main analysis was a ratio of the crude mortality rates at 76 years of age (rather than a Cox regression), and I don’t know why there are no confidence intervals (CIs) on the estimate. The CI exists – it’s in the online supplement. Peer review should have had problems with this. It is unconscionable that any journal, let alone a top-tier journal, would publish a paper when the main result doesn’t have any measure of the variability of the estimate. A P value isn’t good enough when the error distribution isn’t symmetrical, since you can’t back-calculate the standard error from it.
So why is the CI buried in an additional file when it would have been so easy to put it into the main text? The CI is from bootstrapping, whereas the P value is from a log-rank test, and the CI of the main result crosses the null. The main result is non-significant and significant at the same time. This could be a reason why the CI wasn’t in the main text.
It’s also noteworthy that although the deletion appears strongly to be recessive (it only has an effect if both chromosomes have the deletion), the main analysis compares delta-32/delta-32 against +/+, which surely has less power than comparing delta-32/delta-32 against +/+ or delta-32/+ combined. The CI might have been significant otherwise.
I think it’s wrong to present one-sided P values (in general, but definitely here). The hypothesis should not have been that the CCR5 deletion would increase mortality; it should have been ambivalent, like almost all hypotheses in this field. The whole point of the CRISPR was that the babies would be more protected from HIV, so unless the authors had an unimaginably strong prior that CCR5 was deleterious, why would they use one-sided P values? Cynically, but without a strong reason to think otherwise, I can only imagine because one-sided P values are half as large as two-sided P values.
The best analysis, I think, would have been a Cox regression. Happily, the authors did this after the main analysis. But the full analysis that included all PCs (but not centre) was relegated to the supplement, for reasons that are baffling since it gives the same result as using just 5 PCs.
Also, the survival curve should have CIs. We know nothing about whether those curves are separate without CIs. I reproduced survival curves with a different SNP (see below) – the CIs are large.
I’m not going to talk about the Hardy-Weinberg Equilibrium (HWE, inbreeding) analysis – it’s still not an area I’m familiar with, and I don’t really think it adds much to the analysis. There are loads of reasons why a SNP might be out of HWE – dying early is certainly one of them, but it feels like this would just be a confirmation of something you’d know from a Cox regression.
I have access to UK Biobank data for my own work, so I didn’t think it would be too complex to replicate the analyses to see if I came up with the same answer. I don’t have access to rs62625034, the SNP the paper says is a great proxy of the delta-32 deletion, for reasons that I’ll go into later. However, I did have access to rs113010081, which the paper said gave the same results. I also used rs113341849, which is another SNP in the same region that has extremely high correlation with the deletion (both SNPs have R2 values above 0.93 with rs333, which is the rs ID for the delta-32 deletion). Ideally, all three SNPs would give the same answer.
First, I created the analysis dataset.
I conducted 12 analyses in total (6 for each SNP), but they were all pretty similar:
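Each one boiled down to a Cox model of survival on genotype, varying the SNP, the sample (with or without related people), the covariables, and the time variable. I ran the models in Stata; the Python/lifelines sketch below is purely illustrative, and the column and file names are hypothetical.

```python
# Illustrative version of one of the 12 analyses: delta-32/delta-32 vs other
# genotypes, unrelated participants only, adjusted for sex, year of birth and
# the first 10 principal components. Column and file names are hypothetical.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.read_csv("ukb_analysis_dataset.csv")

cols = (["follow_up_years", "died", "delta32_homozygous", "sex", "birth_year"]
        + [f"pc{i}" for i in range(1, 11)])
unrelated = df.loc[df["unrelated"] == 1, cols]

cph = CoxPHFitter()
cph.fit(unrelated, duration_col="follow_up_years", event_col="died")
cph.print_summary()  # hazard ratio and CI for the homozygous deletion
```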
With this suite of analyses, I was hoping to find out whether the SNPs were associated with mortality at all, and whether including related people, adding more covariables, or changing the time variable made any difference to the results.
I found… Nothing. There was very little evidence the SNPs were associated with mortality (the hazard ratios, HRs, were barely different from 1, and the confidence intervals were very wide). There was little evidence including relateds or more covariables, or changing the time variable, changed the results.
Here’s just one example of the many survival curves I made, looking at delta-32/delta-32 (1) versus both other genotypes in unrelated people only (not adjusted, as Stata doesn’t want to give me a survival curve with CIs that is also adjusted) – this corresponds to the analysis in row 6.
You’ll notice that the CIs overlap. A lot. You can also see that both events and participants are rare in the late 70s (the long horizontal and vertical stretches) – I think that’s because there are relatively few people who were that old at the end of their follow-up. Average follow-up time was 7 years, so to estimate mortality up to 76 years, I imagine you’d want quite a few people to be 69 years or older, so they’d be 76 at the end of follow-up (if they didn’t die). Only 3.8% of UK Biobank participants were 69 years or older.
In my original tweet thread, I only did the analysis in row 2, but I think all the results are fairly conclusive for not showing much.
In a reply to me, Rasmus pointed to the paper’s statement that the other SNPs gave the same results (data not shown). This is the claim that turned out to be incorrect.
Never trust data that isn’t shown – apart from anything else, when repeating analyses and changing things each time, it’s easy to forget to redo an extra analysis if the manuscript doesn’t contain the results anywhere.
This also means I couldn’t directly replicate the paper’s analysis, as I don’t have access to rs62625034. Why not? I’m not sure, but the likely explanation is that it didn’t pass the quality control process (either ours or UK Biobank’s, I’m not sure).
I’ve concluded that the only plausible reason for a difference between my analysis and the paper’s analysis is that the SNPs are different – much more different than would be expected, given the high correlation between my two SNPs and the deletion, which the paper claims rs62625034 is measuring directly.
One possible reason for this is the imputation of SNP data. As far as I can tell, neither of my SNPs were measured directly, they were imputed. This isn’t uncommon for any particular SNP, as imputation of SNP data is generally very good. As I understand it, genetic code is transmitted in blocks, and the blocks are fairly steady between people of the same population, so if you measure one or two SNPs in a block, you can deduce the remaining SNPs in the same block.
In any case, there is a lot of genetic data to start with – each genotyping chip measures hundreds of thousands of SNPs. Also, we can measure the likely success rate of the imputation, and SNPs that are poorly imputed (for a given value of “poorly”) are removed before anyone sees them.
The two SNPs I used had good “info scores” (around 0.95, I think – for reference, we dropped all SNPs with an info score of less than 0.3 among SNPs with similar minor allele frequencies), so we can be pretty confident in their imputation. On the other hand, rs62625034 was not imputed in the paper; it was measured directly. That doesn’t mean everyone had a measurement – I understand the missing rate of the SNP was around 3.4% in UK Biobank (this is from direct communication with the authors, not from the paper).
But. And this is a weird but that I don’t have the expertise to explain, the imputation of the SNPs I used looks… well… weird. When you impute SNP data, you impute values between 0 and 2. They don’t have to be integer values, so dosages of 0.07 or 1.5 are valid. Ideally, the imputation would only give integer values, so you’d be confident this person had 2 mutant alleles, and this person 1, and that person none. In many cases, that’s mostly what happens.
Non-integer dosages don’t seem like a big problem to me. If I’m using polygenic risk scores, I don’t even bother making them integers, I just leave them as decimals. Across a population, it shouldn’t matter, the variance of my final estimate will just be a bit smaller than it should be. But for this work, I had to make the non-integer dosages integers, so anything less than 0.5 I made 0, anything 0.5 to 1.5 was 1, and anything above 1.5 was 2. I’m pretty sure this is fine.
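In code, that recoding is just a couple of thresholds – a minimal sketch (the variable name is made up):

```python
# Round imputed dosages (0-2, possibly non-integer) to hard genotype calls:
# <0.5 -> 0 copies, 0.5-1.5 -> 1 copy, >1.5 -> 2 copies of the deletion.
import numpy as np
import pandas as pd

def dosage_to_genotype(dosage: pd.Series) -> pd.Series:
    calls = np.select([dosage < 0.5, dosage <= 1.5], [0, 1], default=2)
    return pd.Series(calls, index=dosage.index)

# dosage_to_genotype(pd.Series([0.07, 1.5, 1.62]))  ->  0, 1, 2
```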
Unless there’s more non-integer doses in one allele than the other.
rs113010081 has non-integer dosages for almost 14% of white British participants in UK Biobank (excluding relateds). But the non-integer dosages are not distributed evenly across dosages. No. The twos had way more non-integer dosages than the ones, which had way more than the zeros.
In the below tables, the non-integers are represented by being missing (a full stop) in the rs113010081_x_tri variable, whereas the rs113010081_tri variable is the one I used in the analysis. You can see that of the 4,736 participants I thought had twos, 3,490 (73.69%) of those actually had non-integer dosages somewhere between 1.5 and 2.
What does this mean?
I’ve no idea.
I think it might mean the imputation for this region of the genome might be a bit weird. rs113341849 has the same pattern, so it isn’t just this one SNP.
But I don’t know why it’s happened, or even whether it’s particularly relevant. I admit ignorance – this is something I’ve never looked for, let alone seen, and I don’t know enough to say what’s typical.
I looked at a few hundred other SNPs to see if this is just a function of the minor allele frequency, and so the imputation was naturally just less certain because there was less information. But while there is an association between the minor allele frequency and non-integer dosages across dosages, it doesn’t explain all the variance in the estimate. There were very few SNPs with patterns as pronounced as in rs113010081 and rs113341849, even for SNPs with far smaller minor allele frequencies.
Does this undermine my analysis, and make the paper’s more believable?
I don’t know.
I tried to look at this with a couple more analyses. In the “x” analyses, I only included participants with integer values of dose, and in the “y” analyses, I only included participants with dosages within 0.05 of an integer. You can see in the results table that only using integers removed any effect of either SNP. This could be evidence of the imputation having an effect, or it could be chance. Who knows.
rs62625034 was directly measured, but not imputed, in the paper. Why wasn’t it imputed?
It’s possibly because the SNP isn’t measuring what the probe was meant to measure. It clearly has a very different minor allele frequency in UK Biobank (0.1159) than in reference populations (~0.03). The paper states this likely means it’s measuring the delta-32 deletion, since the frequencies are similar and rs62625034 sits in the deletion region. This mismatch may have made it fail quality control.
But this raises a couple of issues. First is whether the missingness in rs62625034 is a problem – is the data missing completely at random, or not missing at random? If the former, great. If the latter, not great.
The second issue is that rs62625034 should be measuring a SNP, not a deletion. In people without the deletion, the probe could well be picking up people with the SNP. The rs62625034 measurement in UK Biobank should be a mixture between the deletion and a SNP. The R2 between rs62625034 and the deletion is not 1 (although it is higher than for my SNPs – again, this was mentioned in an email to me from the authors, not in the paper), which could happen if the SNP is picking up more than the deletion.
The third issue, one I’ve realised only just now, is that rs62625034 is not associated with lifespan in UK Biobank (and other datasets). This means that maybe it doesn’t matter that rs62625034 is likely picking up more than just the deletion.
Peter Joshi, author of the article, helpfully posted these plots:
If I read this right, Peter used UK Biobank (and other data) to produce the above plot showing lots of SNPs and their association with mortality (the higher the SNP, the more it affects mortality).
Not only does rs62625034 not show any association with mortality, but how did Peter find a minor allele frequency of 0.035 for rs62625034 when the paper found 0.1159? This is crazy. A minor allele frequency of 0.035 is about the same as in the GO-ESP population, so it seems perfectly fine, whereas 0.1159 does not.
I didn’t clock this when I first saw it (sorry Peter), but using the same datasets and getting different minor allele frequencies is weird. Properly weird. Like counting the number of men and women in a dataset and getting wildly different answers. Maybe I’m misunderstanding, it wouldn’t be the first time – maybe the minor allele frequencies are different because of something else. But they both used UK Biobank, so I have no idea how.
I have no answer for this. I also feel like I’ve buried the lead in this post now. But let’s pretend it was all building up to this.
This paper has been enormously successful, at least in terms of publicity. I also like to think that my “post-publication peer review” and Rasmus’s reply represents a nice collaborative exchange that wouldn’t have been possible without Twitter. I suppose I could have sent an email, but that doesn’t feel as useful somehow.
However, there are many flaws with the paper that should have been addressed in peer review. I’d love to ask the reviewers why they didn’t insist on removing related participants, applying standard quality control, adjusting for all principal components and assessment centre, using two-sided P values, and presenting confidence intervals for the main result and the survival curves.
So, do I believe “CCR5-∆32 is deleterious in the homozygous state in humans”?
No, I don’t believe there is enough evidence to say that the delta-32 deletion in CCR5 affects mortality in people of white British ancestry, let alone people of other ancestries.
I know that this post has likely come out far too late to dam the flood of news articles that have already come out. But I kind of hope that what I’ve done will be useful to someone.
Also, Snowdon said I should get a blog.
The article cherry-picks data, conflates observational epidemiology with causal inference, and misunderstands basic statistics.
I don’t care whether people drink or not. I’d prefer it if people drank in moderation, but I’m certainly not an advocate for teetotalism.
I do, however, think people should be informed of the risks of anything they do, if they want to be.
I think the article is poor, but think people should feel happy to drink if they want to. Based on the available evidence though, I wouldn’t say it helps your heart, and there may be some risk of drinking moderately.
But that’s the same for cake.
Let’s delve into the article.
The piece starts out by saying that there is a drive to treat drinkers like smokers. That seems to conflate saying that alcohol can be harmful with saying people shouldn’t drink alcohol.
They aren’t the same.
I also don’t know which organisation runs this campaign, but calling people who say alcohol is harmful “anti-alcohol activists” is a trick to make those same people seem like “others” or “them”. It also makes them sound like fanatics, trying to stop “you” drinking “your” alcohol.
But that’s not why I’m writing this.
It’s the “health benefits of moderate drinking”, stated as if it were indisputable fact. As if it’s known that alcohol causes health benefits.
Causal statements like this need rigorous proof. They need hard evidence. If moderate alcohol intake is associated with health benefits, that’s one thing. But saying it causes those health benefits is quite another.
Even if alcohol did cause some benefits, something can have both positive and negative effects. It’s not absurd to tell people about the dangers of something even if it could have benefits – that’s why medications come with lists of side-effects.
And calling something “statistical chicanery” is another tactic to make it seem like people saying alcohol is harmful are doing so by cheating, or through deception.
The link to “decades of evidence” is to a 2004 meta-analysis, showing
Strong trends in risk were observed for cancers of the oral cavity, esophagus and larynx, hypertension, liver cirrhosis, chronic pancreatitis, and injuries and violence.
Which sounds pretty bad to me.
I’m guessing that if this is the right link, then it was meant for you to observe that there is a J-shaped relationship between alcohol intake and coronary heart disease.
That is, low and high levels of drinking are bad for your heart, but some is good. This sounds good – alcohol protects your heart – and it is common advice to hear from loads of people, doctors included.
The problem is that the evidence for this assertion comes from observational studies – the association is NOT causal.
This is all about causality.
We cannot say that drinking alcohol protects your heart, only that if you drink moderately, you are less likely to have heart problems. They sound the same, but they aren’t. The first is causation, the second is correlation, and if there’s one thing statisticians love to say, it’s “correlation is not causation”.
Studies measuring alcohol intake and heart problems are mostly either cross-sectional or longitudinal – they either look at people at one point in time, or follow them up for some time.
These are observational studies, they (probably) don’t change people’s drinking behaviour. Of course, people might change their behaviour a little if they know they have to fill in a questionnaire about their drinking habits, but we kind of have to ignore that for now.
Anyway, observational studies do not allow you to make causal statements like “drinking is good for your heart”.
Why not?
It comes down to bias/confounding – the same things I’ve pointed out on Twitter before when researchers made similar causal claims from observational data.
There are ways to account for this when comparing drinkers with non-drinkers, but they rely on knowing every possible way people are different.
Imagine the reasons why someone doesn’t drink very much – off the top of my head, there are plenty. Now imagine the reasons why someone doesn’t drink at all: all of the above still hold, and there are more to add on top.
A confounder is something that affects both the exposure (alcohol intake) and the outcome (health). If you want to compare drinkers and non-drinkers, you need to account for everything that might affect both someone’s drinking behaviour and their health, which includes many of those reasons.
But this is nigh-on impossible, as behaviours are governed by so many things. You can adjust out *some* of the confounding, but you can’t prove you’ve gotten ALL the confounding. You can measure people’s health, but you won’t capture everything that contributes to how healthy a person is. You can ask people about their behaviour, but there’s no way you’ll capture everything from a person’s life.
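To make the problem concrete, here’s a toy simulation – the numbers are invented and have nothing to do with any real dataset – in which drinking has no causal effect on the heart at all, yet moderate drinkers still look healthier, purely because something else drives both:

```python
# Toy confounding example: "health consciousness" makes people both more likely
# to be moderate drinkers and healthier. Drinking itself does nothing, but an
# observational comparison still favours the drinkers.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

health_conscious = rng.normal(size=n)                        # unmeasured confounder
moderate_drinker = rng.random(n) < 1 / (1 + np.exp(-health_conscious))
heart_score = health_conscious + rng.normal(size=n)          # no drinking term at all

print(heart_score[moderate_drinker].mean())    # higher, "healthier"
print(heart_score[~moderate_drinker].mean())   # lower, despite no causal effect
```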
If you see, observationally, that moderate drinking is associated with fewer heart problems, what does that imply?
My last post was about how you really should have mechanisms to posit causality, i.e. if you say X causes Y, you need to have an idea of how X causes Y (or evidence from trials). This holds true here too.
Suppose alcohol protects your heart. How?
Fortunately, people have postulated mechanisms, and we can assess them: one possible mechanism is that alcohol increases HDL cholesterol (the good one), which improves heart health.
We can’t assign a direction to that mechanism using observational studies, since people who live healthily might have good HDL levels anyway, meaning they drink moderate amounts because they can.
To work this out (and to assign causality more generally), you can use trials. Ideally randomised controlled trials, since they’re so good. The ideal trial, the one where we wouldn’t need mechanisms at all, is one where we randomise people to drink certain amounts (none, a little, some, a lot) over the course of their life, make sure they stick to that, then see what happens to them.
Since that would never work, the next best thing is to test the proposed mechanisms, because if alcohol increases HDL cholesterol in the short-term (i.e. after a few weeks), then we’re probably on safer territory. We’d then have to prove that higher HDL cholesterol causes better heart health, but one thing at a time.
Well, a meta-analysis of trials was done to look at exactly that, fairly recently too (2011):
Effect of alcohol consumption on biological markers associated with risk of coronary heart disease: systematic review and meta-analysis of interventional studies
In total, there were 63 trials included, looking at a few markers of heart health, including HDL cholesterol. They found that alcohol increased HDL a little bit.
But there were problems.
The trials were a mix of things, but having looked at a few, it looks like many studies randomised small numbers of people to drink either an alcoholic drink or a non-alcoholic drink (the good ones compared alcohol-free wine with normal wine), and measured their HDL before and after the trial.
The problem with small trials is that they can have quite variable results, because there is a lot of imprecision when you don’t have enough people. You do a trial with 60 people and get a result. You repeat it with new people, and get an entirely different result.
That’s one reason why we do meta-analyses in the first place – one study rarely can tell you the whole story, but when you combine loads of studies, you get closer to the truth.
But academic journals exist, and they tend to publish studies that are interesting, i.e. ones that show a “statistically significant” effect of something, in this case alcohol on HDL. This has three effects.
Repeat a study enough, you’ll eventually get the result you want. Since lots of people want alcohol to be beneficial to the heart, and because these trials are pretty inexpensive, there is a good chance that there are missing studies that were never published.
I’m aware this sounds like I’m reaching, and I could never prove that these things happened. But I can show, with relative certainty, that there are missing studies, ones that showed either that alcohol didn’t affect HDL or reduced it.
In meta-analyses, we tend to produce funnel plots, which show whether studies fall symmetrically around the average effect, i.e. the average effect of alcohol on HDL. Since studies should give results that fall basically randomly around the true effect of alcohol on HDL, they should be symmetrical on a funnel plot.
If some studies have NOT been published, i.e. ones falling in the “no effect” area, or those without statistical significance, then you see asymmetry.
We don’t know WHY these studies are missing, just that something isn’t right, and we should treat the average effect with caution. The link I gave above shows a nice symmetrical funnel plot, and an asymmetrical one.
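For anyone who hasn’t met them, a funnel plot is simple enough to draw: each study’s effect estimate against its standard error, with the most precise studies at the top. A minimal sketch with placeholder numbers (not the data extracted from the review):

```python
# Sketch of a funnel plot: per-study mean differences in HDL against their
# standard errors, y-axis inverted so precise studies sit at the top.
# The numbers are placeholders for illustration only.
import numpy as np
import matplotlib.pyplot as plt

mean_diff = np.array([0.10, 0.08, 0.12, 0.05, 0.15, 0.11])   # per-study effects
se        = np.array([0.01, 0.02, 0.03, 0.04, 0.05, 0.06])   # their standard errors
pooled    = np.average(mean_diff, weights=1 / se**2)          # fixed-effect average

plt.scatter(mean_diff, se)
plt.axvline(pooled, linestyle="--")
plt.gca().invert_yaxis()
plt.xlabel("Mean difference in HDL cholesterol")
plt.ylabel("Standard error of mean difference")
plt.show()
```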
And here is the funnel plot I made from the meta-analysis data.
Note: I had to make this plot myself, the authors did not publish it – they stated in the paper:
No asymmetry was found on visual inspection of the funnel plot for each biomarker, suggesting that significant publication bias was unlikely.
See how the effect gets smaller (more left) as the “s.e. of md” goes down? That’s the standard error of the mean difference – the smaller it is, the more precise the result is, the more confident we are in the result. More people = smaller standard error.
With smaller numbers of people, the standard error goes up, and the more variable the results become. One study may find a huge effect, the next a tiny effect. The fact that ALL the small studies found a comparatively large effect is extremely suspicious.
So yeah, there was asymmetry in the funnel plot for the effect of alcohol on HDL cholesterol. The asymmetry says to me that there are missing studies that showed no effect of alcohol on HDL cholesterol, and so the true effect of alcohol on HDL cholesterol is likely smaller than reported.
To be honest, there’s probably no effect, or if there is, it’s tiny.
To be fair though, I should say most of the studies had a small follow-up time. It’s entirely possible longer studies would have found a larger effect. The point is, we don’t know.
There are likely other proposed mechanisms, but I think the HDL mechanism is the one commonly thought of as the big one:
The best-known effect of alcohol is a small increase in HDL cholesterol
So, I don’t really see the evidence as being particularly in support of alcohol protecting the heart. The observational evidence is confounded and possibly has reverse causation. The trial evidence looks to be biased. What about the genetic evidence?
We use genetics to look at things that are difficult to test observationally or through trials. We do this because it can (and should) be unconfounded and is not affected by reverse causation. This is true when we can show how and why the genetics works.
For proteins, we’re on pretty solid ground. A change in gene X causes a change in protein Y. But for behaviours in general, we’re on much shakier ground.
There is one gene, however, that if slightly faulty, produces a protein that doesn’t break down alcohol properly. This is a good genetic marker, since people without that protein get hangovers very quickly after drinking alcohol, so tend not to drink.
One study found:
Individuals with a genetic variant associated with non-drinking and lower alcohol consumption had a more favourable cardiovascular profile and a reduced risk of coronary heart disease than those without the genetic variant.
Another, in an Asian population, found:
robust evidence that alcohol consumption adversely affects several cardiovascular disease risk factors, including blood pressure, waist to hip ratio, fasting blood glucose and triglyceride levels. Alcohol also increases HDL cholesterol and lowers LDL cholesterol.
So alcohol may well cause higher HDL cholesterol levels.
Note that in genetic studies, you’re looking at lifetime exposure to something, in this case alcohol. So as above, a trial looking at the long-term intake of alcohol may find it raises HDL cholesterol.
It’s just, currently, the trial data doesn’t support this.
Halfway now, and I hope I have shown that the evidence alcohol protects the heart is shaky at best. This is kind of important for later. I don’t claim to have done a systematic or thorough search though, so let me know if there is anything big I’ve missed!
Let’s return to the article.
I got side-tracked by the article’s reference to the paper that said alcohol increases risk to loads of bad stuff, and has a J-shaped association with heart disease.
The news coverage is an example of why I mostly dislike research articles being converted into media articles. It is *exceedingly* difficult to convey the nuances of epidemiological research in 850 words to a lay audience. It just isn’t possible to relay all the necessary information that was used to inform the overall conclusion of the Global Burden of Disease study.
David Spiegelhalter’s flippant remarks at the end probably don’t help:
Yet Prof David Spiegelhalter, Winton Professor for the Public Understanding of Risk at the University of Cambridge, sounded a note of caution about the findings.
“Given the pleasure presumably associated with moderate drinking, claiming there is no ‘safe’ level does not seem an argument for abstention,” he said.
“There is no safe level of driving, but the government does not recommend that people avoid driving.
“Come to think of it, there is no safe level of living, but nobody would recommend abstention.”
The study itself states in the discussion that their
results point to a need to revisit alcohol control policies and health programmes, and to consider recommendations for abstention
Spiegelhalter seizes on the use of the word abstention to make the study authors sound more unreasonable than they actually are. I don’t think this is particularly helpful when talking about, well, anything. If you can make people who disagree with you look unreasonable, then it’ll make for an easier argument, but it doesn’t make you right and them wrong.
The study in question attempted to compare the additional risk of cancers from drinking with that from smoking, because the public in general understand that smoking is bad. I don’t have an opinion one way or the other on this method of communicating risk.
I’m quite happy to state I don’t know enough about communication of risk.
What I do know is that communicating risk is difficult, as few people are trained in statistics. Even those who are aren’t necessarily able to convert an abstract risk into their daily reality. So maybe the paper is useful, maybe not. I do not think their research question is brilliant, but my opinion is pretty uninformed:
In essence, we aim to answer the question: ‘Purely in terms of cancer risk – how many cigarettes are there in a bottle of wine’?
I don’t think it’s “shameless” (why should the authors feel shame?), and it isn’t a “deliberate conflation” of smoking and drinking. It’s expressing the risk of one behaviour as the similar risk you get from doing a different behaviour.
The article’s theory is that the authors wrote the paper for headlines. (It’s worth stating here that saying “yeah, right” in an article makes you sound like a child.)
Maybe they were targeting the media with their paper. In general, researchers pretty much all want their work to be noticed, to have people possibly even act on their work. That’s whole point of research. It’s not a bad thing to want your work to be useful.
I dislike overstated claims, making work seem more important than it is, and gunning for the media at the expense of good work. But equally, researchers need their work to be seen. We’re rated on it now. If our work is shown to have “impact”, then it’s classified better, so we’re classified better, so our universities are classified better. I dislike this (not least because it means methods work can be ignored, since it may take years for it to be appreciated and used widely), but there we go.
Questioning the paper’s academic merit is fine though, so what are the criticisms of the paper? There’s just one: that the authors chose a level of smoking that has not been extensively measured as the comparator.
The article says they used 35 cigarettes per week and “extrapolated” to 10 cigarettes per week, and called this “having a guess”.
It’s not extrapolation, and it’s not a guess.
The authors looked at previous studies, usually meta-analyses, to see what the extra risk of smoking 35 cigarettes a week was on several cancers, adjusted for alcohol intake. They made some assumptions with how they calculated the risk of 10 cigarettes a week: they assumed each cigarette was as bad as the next one, so assumed that each of the 35 cigarettes contributed to the extra risk of cancer equally.
This assumes a linear association between your exposure (smoking) and outcome (cancer), an incredibly common thing done by almost all researchers. It is actually interpolation though, not extrapolation (since the data point they wanted was between two they had). And it isn’t a guess, it’s based on evidence (with appropriate assumptions).
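The arithmetic involved is nothing fancier than this (the risk figure below is invented purely to show the interpolation):

```python
# Linear interpolation of risk: if 35 cigarettes a week adds (hypothetically)
# 2.1 percentage points of absolute cancer risk, and each cigarette contributes
# equally, then 10 a week adds proportionally less.
extra_risk_35_per_week = 2.1                       # percentage points, invented
extra_risk_per_cigarette = extra_risk_35_per_week / 35
extra_risk_10_per_week = extra_risk_per_cigarette * 10
print(extra_risk_10_per_week)                      # 0.6 percentage points
```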
The article says there is a single study estimating risks at low levels of cigarette smoking that should have been used. However, that study didn’t adjust for drinking, so it was useless for this study. For the study to be meaningful, they had to work out the extra risk from smoking on cancer independent from any effect of alcohol, since alcohol and smoking are correlated.
Finally, the study didn’t just report 10 cigarettes a week. They reported 35 cigarettes a week, so made no guesses or assumptions (beyond those made in the meta-analyses). So I think the criticism of the study was unfounded. The article felt otherwise:
OK, but all it was doing was communicating risk. If people can’t picture what smoking 10 cigarettes a week means, then perhaps it didn’t do that well – but how would anyone know? Has a study been done asking people?
This isn’t a war on alcohol, or a conspiracy to link alcohol and smoking so people stop drinking. It’s not a crusade by people that hate alcohol. It was trying to communicate the risk of alcohol to people who might not know how to deal with the statistics presented in dense academic papers.
The “decades of epidemiological studies” referenced is actually a paper from 2018, concluding:
The study supports a J-shaped association between alcohol and mortality in older adults, which remains after adjustment for cancer risk. The results indicate that intakes below 1 drink per day were associated with the lowest risk of death.
The J-shaped association could easily be confounding – teetotalers are different to drinkers in many ways (see above). But that’s not really “decades of studies” anyway, and the conclusion was that drinking very little or nothing was best.
The second reference is to a systematic review of observational studies. This is relevant to the point about decades of research, but not conclusive given they are observational studies.
The claim that the positive association between alcohol intake and heart health has “been tested and retested dozens, if not hundreds, of times by researchers all over the world and has always come up smiling” is facetious.
It betrays a lack of understanding of causality, or publication bias, of confounding and reverse causation.
Basically, a lack of understanding about the very studies the article is leveraging to support its argument. It shows ignorance of how to make causal claims, because the entire premise of the argument has been built on observational data.
This next part is inflammatory and wrong:
It certainly wouldn’t put you in the “Flat Earth” territory to believe that alcohol might not be good for you, unless you took as gospel that observational evidence was causal.
This reference is to observational studies, not “biological experiments”. I don’t know which biological experiments are meant here – maybe the ones I talked about earlier and dismissed? Also, the best observational evidence we have is probably genetic for many things, because the chance of confounding is slightly less. And the genetic studies say any level of alcohol has a risk.
There are certainly people who have agendas. People who want everyone to stop drinking. I do not doubt this. But who in the “‘public health’ lobby” is the article referencing? What claims have they made? Without references, it’s a pointless argument.
Also, public health people would like it if everyone stopped smoking and drinking, because public health would improve. That is, on average, people would be healthier – even if alcohol helps the heart, more people die of alcohol-related causes than would be saved by any protective effect of alcohol.
But this doesn’t mean public health people call for teetotalism.
To my knowledge, they generally advocate reducing drinking and in general, moderation. Portraying them as fanatics who “deny, doubt and dismiss” is ludicrous.
Prospective studies are good, because they can rule out reverse causation, i.e. heart problems can’t cause you to reduce alcohol intake if everyone starts with a good heart. But they do not address confounding. They are just as flawed to confounding as cross-sectional studies.
So prospective studies might be the best “observational” evidence (not “epidemiological” evidence, given we deal with trials too), but only if you want to discount genetics. And “best” doesn’t mean “correct”.
Statistical significance in individual studies is not something I have ever talked about in meta-analysis, because it isn’t relevant. At all. In fact, if your small studies are all significant and your big studies aren’t, it’s probably because you have publication bias, i.e. small studies are published because they had “interesting” results, big ones because they were good.
The article is now comparing meta-analyses with 31 and 25 studies with one with 2 studies. Given the large variation in the effects seen in the studies from the previous meta-analyses, I wouldn’t trust the result of just 2 studies. I actually tried to find those 2 studies to see if they were big/good, but the original meta-analysis paper doesn’t make it easy to work out which studies those two actually are. So I gave up.
This part is a fundamental misunderstanding of statistics. Saying something is “not statistically significantly” associated with an outcome is not the same as saying something is “not associated” with an outcome.
There are plenty of reasons why even large associations may not be statistically significant. In general, it will be because you didn’t study enough people, or for long enough. But how the analysis was conducted matters, as does chance. And it takes as much or more evidence to show two things aren’t associated as it does to show they are.
If you start from the assumption that alcohol is good, then yeah, you would need evidence that there are risks from very light drinking. But why start from that premise?
We know that drinking lots is bad, so why assume drinking a bit is good? I can see why, when presented with evidence that moderate alcohol drinking and good heart health are correlated, people might think drinking is good for your heart. But what about every other disease?
In the absence of complete evidence, it would make sense to assume that if lots of alcohol is bad, some alcohol may also be bad. I think it is a bit much to start from the premise that because moderate drinking is correlated with good heart health, small quantities of alcohol are fine or good.
The burden of proof should be on whether alcohol is fine in any quantity. And then finding out how much is “reasonably ok”, and at which point it becomes “too much”.
And no, again, we don’t know that “very light drinking confers significant health benefits to the heart”, because this is a causal statement and you only have observational evidence. If you drink very lightly, your heart may well be in better shape than people who drink a lot or don’t drink, but that doesn’t mean the drinking caused your heart to be healthy.
I certainly dismiss this article as quackery with mathematics…
Actually, this is a good point, but is against the article’s argument. If you have low-quality, biased studies in a meta-analysis, that meta-analysis will be more low-quality and biased. Meta-analysis is not a cure for poor underlying research.
Stated somewhat more prosaically:
shit in, shit out
“Ultra-low relative risks” is a relative term. Most people won’t be concerned about small risks. But small risks make a big difference across a population.
Research is often not targeted at individuals, it’s targeted at people who make policies that affect vast numbers of people. A small decrease in risk probably won’t affect any single person in any noticeable way. But it might save hundreds or thousands of people.
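A quick back-of-the-envelope example (all numbers invented) of why that matters:

```python
# A "tiny" relative risk applied to a whole population is not tiny at all.
population = 50_000_000        # roughly the UK adult population
baseline_risk = 0.05           # hypothetical lifetime risk of some cancer
relative_risk = 1.02           # a 2% relative increase nobody would notice

extra_cases = population * baseline_risk * (relative_risk - 1)
print(extra_cases)             # about 50,000 extra cases
```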
The article is guilty of the same thing. It “clings” to research that shows a beneficial effect of alcohol because it suits the argument. The observational evidence is confounded. It’s biased. The trial evidence is likely biased and wrong.
If all your evidence suffers from the same flaw (confounding, affecting each study roughly the same), then the size of your pile of evidence is completely irrelevant. A lot of wrong answers won’t help you find the right one.
A good example in a different field is survivorship bias when looking at the damage done to planes returning from missions in WW2. Researchers looked at the damage on returning planes, and recommended that damaged areas get reinforced.
Except this would be pointless.
Abraham Wald noted that planes that returned survived – they never saw the damage done to the planes that were shot down. If a plane returned with holes, those holes didn’t matter. Whereas the areas that were NOT hit did matter. It wouldn’t matter how many planes you looked at. You could gather all the evidence that existed, and it would still be wrong, because of bias.
The same is true of observational studies.
You can do a million studies, but if they are all biased the same way, your answer will be wrong.
The article makes the same ignorant point once again, conflating observational research with causal inference, while also cherry-picking studies. The facile point Snowdon makes about spending time on PubMed to reinforce his own views betrays his own flawed approach to the medical literature.
And that’s it for the article!
In summary, the article uses observational data to make causal claims, cherry picks evidence (while accusing others of doing the same), and seems to misunderstand basic statistical points about statistical significance.
But it also got me thinking about the previous conferences and training courses I’ve been on, and how tricky I find it to do something that seems to be pretty essential to an academic’s career: networking.
In this post, I’ll talk about my past conferences, and how I muddled through without any idea what I was doing. I still have no idea what I’m doing, so don’t expect any helpful tips or hints (I mean, “talk to people” seems to be the sole advice necessary, perhaps with the additional hint to look up who’s going to a conference and maybe email them beforehand saying you’d love to meet for a chat). But if you feel like you don’t make the most of conferences, or have trouble starting conversations with people you don’t know, then at least you’ll know you’re not alone. It’s probably worth mentioning that I spoke reasonably frequently in my own department and took many courses there, but I don’t class that as at all similar to speaking at a conference or going away on a week(s)-long course.
The first conference I went to was in London (UK), a one-day conference organised by The Lancet called in 2013 (1st year of my PhD). I wasn’t presenting, and, to be honest, I can’t remember much about this conference, other than I sat in a lecture theatre for a day, didn’t really move and didn’t speak to anyone. There are two impressions I took away: the first was that Ben Goldacre’s hair is magnificent; the second was that there was an early-year PhD student who spoke to this massive room full of professionals, and I thought “I can’t imagine doing that”.
I also wasn’t presenting at the second conference I went to, the in Cambridge (UK) in 2014 (now the European causal inference meeting, how times change). I was massively out of place here – the conference was held in the mathematics department, which should have been my first clue. It turned out that my tiny epidemiology/medical statistics brain was unprepared for very technical lectures about things I didn’t understand. I guess the lesson here is to check the conference thoroughly before you go to avoid wasting hours sat staring at a series of intimidating Greek symbols you can’t even guess the meaning of.
I went to a lot of local training courses (Bristol does loads), but this was my first training course that lasted more than a week, and also happened to be on a different continent. The at the University of Michigan (USA) was a 3-week course, where I did basic epidemiology for 3 weeks in the morning, and three 1-week courses in the afternoon. The basic epi course was helpful, as were two of the other courses. The third course, however, was taught entirely in R and SAS, statistical software packages I had no understanding of. I thus did not attend after the first day; I couldn’t understand what was going on, and I figured my time could be better spent.
The University of Michigan is in Ann Arbor, which is great, and I spent a lot of time walking/running around the nice woods. I went with a colleague from my university, which can certainly be difficult – it’s one thing to see someone in the office day-to-day, but a three-week trip to a different continent is very different. I think it went ok, but I can only speak for myself…
Overall, we managed to get to know a couple of other course delegates, but as many of the attendees were there as parts of groups, it was quite difficult to socialise. Still, we learnt things, which was the major focus of our time there. I also realised that home-cooking is great – I was pretty sick of takeaway and eating out by the end of the trip. On the way home, we stopped off at Washington and New York – because the air fare wasn’t any different, we didn’t have to pay for the flights (universities are great), but we obviously paid for our hotel rooms.
The third conference I went to was in Alaska, the (short names are for boring conferences). It was a 5-day event, and I was presenting a poster on the first day, which is especially fun when it’s a 20-hour trip with a 9-hour time difference.
I’d never been to a conference of this size before, so had no real idea of what to expect. Still, I arrived, registered, and slept in preparation for all the questions I would undoubtedly be asked in the poster session. I also went through the conference schedule to find all the sessions I wanted to attend. I’d been given the advice not to bother with clearly irrelevant sessions that I wouldn’t find interesting: it would be a better use of time to have an hour off, read something, or do some work instead. There were probably 5 or more parallel sessions in every time slot, mixed in with plenaries and social events. As it turned out, because the conference had a very broad remit, I don’t think any two sessions I wanted to go to were ever scheduled at the same time, which was nice.
I arrived promptly for my poster session. During the session, I spoke to all of two people. They were both lovely (I even went with one, and an Alaskan native, up a mountain on the final day, which was awesome). But it seemed something of an anticlimax to travel several thousand miles and only speak to two people about my work. It wasn’t even that my poster was incredibly dull (it was only slightly dull, I’m sure), it was that very few people came to a session where there were hundreds of posters on display. I was a little relieved I didn’t have to talk to too many people, but mostly felt deflated.
For the rest of the conference, I went to a Wellcome Trust organised event (as they sponsored my PhD), walked a little around Anchorage, and, as mentioned, went up a mountain with some conference delegates. The necessity of bear mace was a little daunting, but there were people literally running up the mountain (presumably for pleasure), so I figured it was fine. Although since there were quite a few people on the mountain, maybe the runners just felt they didn’t need to outrun the bears…
To round off 2014, I went to the 7th in Glasgow (UK). I was presenting a poster, and after presenting in Alaska I wasn’t really looking forward to it. I was vaguely aware that there would be a poster walk though, which I guessed meant someone official would lead a group around, and whoever was in the group would read the poster and might ask some questions.
I was therefore quite surprised when I found that I would, in fact, be part of the poster walk: everyone scheduled to stand by their posters during the session would in fact be required to give a talk about their poster to all the other people who were scheduled to stand by their posters. Looking at the layout of the posters, I figured out I would have to speak about halfway through the session. Nowadays, this would give me plenty of time to think of something to say (it was probably only a 5-minute speech at best), but as I had never given a public speech like this before, I felt some pressure.
Still, red-faced and stuttering, I gave a short talk about the work I had done the previous year and answered a few questions. I have literally no idea what anyone else’s posters were about – I was too busy racking my brain thinking of what I needed to say, or too busy feeling relieved to pay much attention.
Moral: even if you are just presenting a poster, be prepared to give a talk about it to 20 other poster people.
When I was looking around for conferences at which I could speak, preferably to an audience sympathetic to a new PhD student who hadn’t spoken at a conference before, I was advised that the YSM (I couldn’t find a good link) in Cardiff (UK) would be a good fit. In fairness, the YSM conference was indeed a good place to give my first presentation – there were only two parallel sessions, the crowd were nice statisticians (many of whom had come over from the Office for National Statistics (ONS) in Newport), and the other presenters were a mixture of ONS staff, early postdocs and PhD students like myself.
I had practiced my talk a fair few times, both to myself and with others, so felt prepared. I don’t remember the specifics, but I gave a talk about albatross plots in a lecture theatre (one of the old ones with wooden benches rising up in tiers I think), answered some questions, and only really felt nervous before speaking (I’ve rarely felt nervous once starting, too busy concentrating on speaking I guess). Afterwards, a nice man came up to me and chatted about my talk, although as far as I remember, this has to date been the only time people have chatted with me after a talk.
I’d love to say that after this talk, the ice had been broken and I never felt nervous before speaking to a crowd about my work again, but it doesn’t really work like that. Over time, I’ve become pretty inured to giving talks (and will usually quite happily talk in front of anyone about anything now), but I still get a little nervous at conferences.
In any case, any UK-based statistical-type people who are looking for a first-time conference, YSM is a good place to start. They’re friendly!
In the third year of my PhD (of four years), in 2016, I decided I should probably go to more conferences and give talks. As such, I sent abstracts to the ISCB conference, the RSS conference, and the Cochrane Colloquium. All the abstracts were about the albatross plots I developed, and I figured I would go to the ones that accepted my abstract. As it turned out, they all did.
The ISCB conference was in Birmingham, and although some of the talks were relevant to my work, there wasn’t much there that was really what I was most interested in at the time (evidence synthesis). Still, I enjoyed the conference, and was looking forward to giving my talk. I was immediately daunted by the size of the lecture theatre though – it was a full-on 300/500/some large number seat auditorium, with a projection screen the size of a cinema screen. It was much bigger than I expected – given the number of parallel sessions, the scarcity of evidence synthesis talks at the conference, and the handful of people in the auditorium for my session, I had assumed I’d be in a smallish room.
I was presenting last, so I read and re-read my presentation and paid no attention to the people on stage (yeah, it was a stage) before me. When it was my turn, I headed up, probably quite a bit more nervous than I’ve been since. The size of the screen behind me was a distraction – I was more used to being able to gesture at the plots and for people to know what I was talking about, but that wouldn’t work here. I was also distracted by the size of the stage – whenever I’d talked before, I had to stay pretty much in place to avoid getting blinded by the projector or blocking people’s view.
I managed to say what I needed to though, and it probably went fine. I was asked some questions by people in the audience, and I answered as well as I could. I was given a recommendation to add something to the plot, which I instantly forgot because I was too busy trying to remember how to reply. It’s like exchanging names – I’m too busy trying to remember how I say my part (“hello, my name is…”) to remember the name of the person to whom I’m speaking. It would probably be easier if I went first… Still, like trading names, I didn’t feel I could whip out my phone and note down the recommendation before I forgot it.
In any case, I finished up and sat down (the chair said it was nice to hear about something “refreshingly vague”, which I still think is a compliment, but can’t quite be sure). There was a bit of time at the end before a plenary, so I spoke to one or two people who wanted to know a bit more – I even made an albatross plot for someone who was asking about, I think, the in a meta-analysis with previous studies looking at magnesium sulphate to treat early heart attacks (hint: it looks weird in a meta-analysis).
After the little break was over, the plenary speaker got up to deliver his lecture. I realised immediately that it was the same person who gave me the recommendation I had now forgotten. And it was , probably one of the most famous living statisticians. Damn. I really wish I’d whipped out my phone and noted down what he said.
The RSS conference up in Manchester was much the same: lots of parallel sessions, not speaking to anyone, limited applicability. The difference was that this time I gave a speech in a smaller room, although to be honest I don’t really remember much about it. I guess it was unexceptional, apart from the amusing coincidence that the person speaking immediately before me had also spoken immediately before me at the ISCB conference. That’s niche academia for you, I guess. I also went to an early career meet-up on the first night, but for whatever reason, I wasn’t really in the right frame of mind to be exceptionally sociable, and I don’t think I saw anyone from that night again during the conference.
The Cochrane Colloquium in Seoul was quite different. Mostly because it was in Seoul. There were still many parallel sessions, although now my problem wasn’t finding something I wanted to go to, it was whittling down the things I wanted to go to, since they all happened at the same time. Overall, I’m not really a fan of more than a couple of parallel sessions – a few people from my university went to the conference (colloquium…), and we were all speaking at the same time. This was a bit disappointing, as I was looking forward to listening to a friend’s talk.
It also meant that each lecture room was pretty small. I guess that’s good for intimacy, but at the same time, I’d travelled across the globe to give a speech to a room that at best had 30-40 people in it. Probably more than were in the auditorium at ISCB, but that was 2 hours away by train at rush hour. I gave my speech, went to interesting talks, failed to win “best talk by a newbie” (understandably, I was just hopeful), went to some training-type sessions, the usual stuff.
Seoul had, however, quite a few differences to previous conferences. There were social things happening, and I got to know a few people. This really made a difference – like in Alaska, I could now go do things with people, including karaoke before and after soju (soju is great), going on a tour of Seoul, cooking our own bulgogi (Korean BBQ), making kimchi (dear lord, do not keep kimchi un-refrigerated in your hotel room), and going on a tour of the demilitarized zone (DMZ) between North and South Korea. For those that haven’t been, the DMZ is the no-man’s land between warring North and South Korea. The North side is all barren and military-esque. The South side has an amusement park, with rides. I… I’m not sure the South are taking this seriously.
In short, Korea was much better than previous conferences, and it was due both to the more relevant talks (good ol’ evidence synthesis), and meeting people and being able to do things with them. So although I have no tips on HOW to achieve this (karaoke and soju work great, but probably limited opportunity to get them both together), it was certainly a good thing at this particular conference. I imagine it also helped that I knew some people at the conference already – with the exception of the YSM, where I met a few people that I knew at the conference, I had never been to a conference where I knew anyone. So maybe go with a friend, if possible.
If you can, go to this course. You can ski. It’s brilliant.
People from my university also teach on it, and as they’ve taught me on short courses before, I can say that they’re pretty good. The courses I went on were also pretty good (James Carpenter was particularly good I thought). Wengen probably isn’t the most exciting place to go if you don’t like to hike up snowy mountains or ski/snowboard/toboggan down them, but it’s pretty good if you do.
There’s always a conclusion, right? Well, what I’ve learnt from my experiences is that conferences can be pretty hit and miss in terms of content – sometimes everything is interesting and sometimes very little is – and it can be difficult to get to know people, especially if they are already there with people they know. However, sometimes (and this is probably much more true of the longer conferences), you can make some good friends and have a great time. So far, I’ve only met people randomly – the social events with dedicated “get to know each other” or networking sessions have never really worked for me.
I’m still nervous before giving a speech. Much less than I used to be, but still a bit. Practice has helped – the more conferences I do, the better I will be – but I also teach on some of my university’s short courses, and this practice has helped a lot too. As has the knowledge that in several years of giving talks, no one has ever slammed my ideas or been rude, literally everyone who has spoken to me has been nothing but nice and friendly.
So yes. If you’ve never been to a conference before, then start with something relatively small, preferably go with at least one person you know, possibly go to some without a poster/talk, then go with a poster, then a talk. That’s the progression I followed, and it felt fine. Of course, you could always dive in with an international conference talk on the first go; it’ll probably be fine. I hope that people in other fields are as nice as they appear to be in mine. And try to make friends, but if you don’t, that’s fine too.
I did the 3-minute thesis competition in 2017, and I sucked. Properly sucked. I didn’t speak for 10 seconds, since I forgot what I was supposed to say. You get one slide in the 3-minute thesis, and have 3 minutes to describe your thesis, and this format did not work well either with my content (my thesis is pretty long and complicated, and the underlying statistical problem that makes up the interesting part of my thesis usually takes a little longer than 3 minutes to properly explain to a lay person), or with me. So all my speaking practice meant squat when it came to talking in a really limited time-frame in a competition. I’m fine with a casual chat where I talk about my work, but something about that competition made me into a babbling wreck of a speaker.
My supervisor who was there said it was fine. I don’t think I believe them…
If you’re wondering how I managed to go to so many conferences as a PhD student, the answer is that I did a PhD with the Wellcome Trust, who gave me a reasonable sum that I could have used on anything – recruiting patients, travel, training, lab reagents/equipment, data etc. If you ever think about doing a PhD in epidemiology, I would strongly recommend the Wellcome Trust’s . There are other links, but it’s late, I’m tired, and I like Bristol.
As I said last week, my PhD was in at the University of Bristol, and lasted for 4 years. My thesis was the written report that described what I did for the latter 3 years, and was assessed in my Viva.
Three years ago, when I was just starting my PhD proper, we discussed what I would do and in what order. My aim was to make screening for prostate cancer better, since PSA is a bit awful at detecting prostate cancer (although there is nothing better at the moment). To do this, I intended to find an individual characteristic (things like age, ethnicity, weight etc.) that was associated with both prostate cancer and PSA.
Quick note: PSA is a protein made in the prostate that can be found in the blood – if the prostate is damaged, then more PSA is in the blood (be it from cancer, infection or inflammation). A high PSA level indicates damage to the prostate, but it is very hard to tell from what – only a biopsy is definitive.
If something is associated with prostate cancer, then the overall risk of prostate cancer changes with that something, which is then called a “risk factor” for prostate cancer. For example, Black men have a higher risk of prostate cancer than White or Asian men. If something is associated with PSA, then the overall level of PSA in the blood changes with the something. For example, taking the drug finasteride roughly halves PSA levels. Finasteride can be used to reduce the size of the prostate (for conditions like benign prostatic enlargement), and also for hair loss.
When PSA is used as a test for prostate cancer, a PSA level of less than 4.0 ng/ml is usually considered “normal” (although other thresholds of “normal” are used, such as 2.5 ng/ml or 3.0 ng/ml, which can depend on the age of the man). Since finasteride reduces PSA levels by half, this is sometimes taken into account by doctors – if a man has a PSA test and is taking finasteride, his PSA might be doubled to get a more accurate reading. This is a simple example of adjusting test results to better fit each individual; if the PSA were not doubled, it would be much lower than it should be, and might mask prostate cancer. So if something affects PSA, it should be taken into account when measuring PSA.
Things become a bit more complicated when something affects both prostate cancer risk as well as PSA. If something increases the risk of prostate cancer (such as being Black), then on average, it also increases PSA levels, since prostate cancer also increases PSA levels. This effect can be removed if you just look at men without prostate cancer, but this is tricky, since prostate cancer is common and lots of men can have prostate cancer without realising or being diagnosed (the statistic at medical school was 80% of men at 80 years old have prostate cancer).
As an additional problem, because men have PSA tests before being offered a prostate biopsy to see whether they have prostate cancer, anything that affects PSA levels may look like it affects prostate cancer risk too. If something lowers PSA levels (like finasteride), then some men will go from having a PSA above the threshold for a biopsy, to having a PSA below the threshold. So although the risk of prostate cancer may be the same, because not all men with prostate cancer are DIAGNOSED with the disease, it can look like things that reduce PSA are protective for prostate cancer, and things that increase PSA are a risk for prostate cancer.
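A tiny simulation makes this detection-bias mechanism clearer (every number here is invented; it’s an illustration of the logic, not a model of real PSA data): the drug has no effect at all on whether a man truly has prostate cancer, but because it lowers PSA, fewer men in the drug group cross the biopsy threshold, so fewer of their cancers are ever diagnosed.

```python
# Illustration of detection bias: a drug that halves PSA but has NO effect
# on true prostate cancer status still looks "protective" if you only count
# DIAGNOSED cancers, because a biopsy is only offered when PSA crosses a
# threshold.  All numbers are made up for illustration.

import random

random.seed(1)
THRESHOLD = 4.0  # ng/ml, the usual "normal" cut-off mentioned above

def simulate_group(n, on_drug):
    true_cancer = diagnosed = 0
    for _ in range(n):
        has_cancer = random.random() < 0.10        # same true risk in both groups
        psa = random.lognormvariate(0.5, 0.7)      # arbitrary baseline PSA distribution
        if has_cancer:
            psa *= 3                               # cancer pushes PSA up (illustrative)
        if on_drug:
            psa *= 0.5                             # the drug halves PSA, like finasteride
        true_cancer += has_cancer
        diagnosed += has_cancer and psa >= THRESHOLD   # only found if PSA triggers a biopsy
    return true_cancer / n, diagnosed / n

for label, on_drug in (("no drug", False), ("drug", True)):
    true_rate, dx_rate = simulate_group(100_000, on_drug)
    print(f"{label}: true cancer {true_rate:.1%}, diagnosed cancer {dx_rate:.1%}")
```

Both groups have the same true cancer rate; the drug group just has fewer diagnoses, because fewer of its cancers ever trigger a biopsy.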
Below is a diagram showing the effects of increasing age on prostate cancer status (i.e. whether a man actually has prostate cancer) and PSA (PSA levels increase with age), and how this could affect prostate cancer diagnosis. We are reasonably sure that age increases both prostate cancer risk (same as many cancers) and PSA levels (as the prostate becomes more leaky over time, letting more PSA into the blood to increase the PSA levels in the blood), but it is not so clear for other things.
My PhD was to look for a variable (individual characteristic) that was associated with both prostate cancer and PSA, and try to work out how much it affected each, and therefore how much PSA would need to be adjusted to account for the effect on PSA, without touching the effect on prostate cancer. So for age above, I would work out exactly how much an increase of 1 year in age increased PSA – the top right line in the diagram. Once PSA was adjusted for age, it would hopefully be better at finding prostate cancer, since it would no longer be affected by changes in age.
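To make the adjustment idea concrete, here’s a toy sketch (the 2%-per-year effect, the reference age and the multiplicative form are all assumptions for illustration, not estimates from the thesis – the whole point of the PhD was to estimate the real effect properly): if you knew how much each extra year of age inflates PSA independently of cancer, you could divide that effect back out before comparing a man’s PSA to the usual threshold.

```python
# Toy illustration of adjusting PSA for an individual characteristic (here: age).
# The 2%-per-year multiplier and the reference age of 50 are invented purely
# for illustration; they are not estimates from the thesis.

ASSUMED_PSA_INCREASE_PER_YEAR = 0.02   # assumed: each year of age inflates PSA by ~2%
REFERENCE_AGE = 50                     # assumed reference age for the threshold

def age_adjusted_psa(measured_psa, age):
    """Strip out the assumed age effect, so the adjusted value is comparable to
    what the same man's PSA would be expected to be at the reference age."""
    inflation = (1 + ASSUMED_PSA_INCREASE_PER_YEAR) ** (age - REFERENCE_AGE)
    return measured_psa / inflation

# A 75-year-old with a PSA of 4.5 ng/ml looks "abnormal" against a 4.0 ng/ml
# threshold, but after removing the assumed age effect his adjusted PSA is ~2.7.
print(round(age_adjusted_psa(4.5, 75), 2))
```

The real analysis has to estimate that age (or other characteristic) effect without accidentally absorbing the effect that runs through prostate cancer itself, which is exactly why the causal structure in the diagram matters.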
Before I even started my PhD proper, I created a Gantt chart that gave the deadlines for all the work I knew I would need to do. First, I would need to find a variable, then perform a couple of systematic reviews to find all the studies that looked at the associations between the variable, prostate cancer and PSA. Then, I would need to conduct a couple of meta-analyses, which would combine the associations to get my final results. I also wanted to use individual participant data (published papers in epidemiology usually give summary results, telling you the association between two things, rather than listing any individual participant results, which would generally contravene patient confidentiality). The individual data would be used to enhance the meta-analysis results, but individual data takes a long time to source (about a year for all the data I requested). The Gantt chart I created is shown below.
This Gantt chart shows the 9 chapters I intended to write, the bare basics of what I would need to do for each, as well as when I would write up each chapter (“thesis production”). As far as plans go, this one was pretty limited, but it gave us a timeframe to work from.
In actuality, I didn’t stick to this very well at all. Chapters got removed or changed, new chapters were added, but the point I wanted to make remains: as I did each stage of research, I wrote up a chapter detailing what I did and how, as if each chapter was its own research paper. This meant when it came to the final 3 months, I was still adding new content, but the bulk of the work had already been done and I didn’t really need to remember what I did 2 years ago, it was right in front of me. Also, since I was writing up as I went along, I was forced to fully comprehend everything I was doing and why – doing it is one thing, but doing it and writing it down so other people could follow the rationale needs much greater understanding.
PhDs should all start with a plan (Gantt chart optional but useful), but writing up as you go along is definitely a huge time-saver in the end. I admit that what I wrote in the earlier years was… well… poor quality, but that’s to be expected. The only way to get better is to practice, and writing up as I went along gave me plenty of useful practice.
My thesis was split into 8 chapters (eventually), each with its own objectives (which I listed in the thesis itself in the first chapter, and below too). I had 2 introductory/background chapters, 5 analysis chapters, and a discussion chapter.
Chapter 1: Provide background information on prostate cancer and PSA, as well as using PSA as a screening test for prostate cancer.
Chapter 2: Describe evidence synthesis methodologies relevant to this thesis.
Chapter 3: Identify individual characteristics that have a plausible association with both prostate cancer and PSA, and have a large body of evidence examining this association, then select a characteristic to examine further.
Chapter 4: Perform a systematic review and aggregate data meta-analysis of studies examining the associations between the chosen characteristic, prostate cancer, advanced prostate cancer and PSA.
Chapter 5: Identify and collect data from large, well conducted prostate cancer studies, then perform an individual participant data meta-analysis of the associations between the characteristic, prostate cancer, advanced prostate cancer and PSA.
Chapter 6: Combine the results of the aggregate and individual participant data to estimate the associations between the characteristic, prostate cancer, advanced prostate cancer and PSA as precisely as possible (a small sketch of this kind of combination follows this list).
Chapter 7: Perform a Mendelian randomisation analysis to assess evidence for causality between the characteristic, prostate cancer, advanced prostate cancer and PSA.
Chapter 8: Summarise the results, strengths and limitations from the thesis, and indicate what direction future work may take.
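To give a flavour of what “combining the associations” means in practice (Chapters 4 and 6), here is a minimal fixed-effect, inverse-variance-weighted meta-analysis sketch. The three study estimates are invented, and this is just the textbook calculation, not necessarily the exact method used in the thesis – but the underlying idea of precision-weighting study results is the same.

```python
# Minimal fixed-effect, inverse-variance-weighted meta-analysis sketch.
# Each study contributes a log odds ratio and its standard error; studies are
# weighted by 1/SE^2, so more precise studies count for more.  The three
# (estimate, SE) pairs below are invented for illustration.

import math

studies = [
    (0.25, 0.10),  # study 1: log odds ratio, standard error
    (0.10, 0.05),  # study 2
    (0.40, 0.20),  # study 3
]

weights = [1 / se ** 2 for _, se in studies]
pooled = sum(w * est for (est, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"Pooled log OR: {pooled:.3f} (SE {pooled_se:.3f})")
print(f"Pooled OR: {math.exp(pooled):.2f} (95% CI {math.exp(low):.2f} to {math.exp(high):.2f})")
```

Individual participant data lets you go further than this – for example, adjusting every study for the same covariates before pooling – which is part of why it is worth the long wait to source.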
In addition to the 8 chapters, I had a title page (with a word count), an abstract, a dedication and acknowledgements page, a declaration (that I didn’t cheat), then a section with the contents, list of figures, and list of tables. My appendix contained information I thought was too specialist for the main thesis, or just surplus to requirements (but still interesting). Given this was a thesis, the specialist stuff was really niche… I also put 2 papers I published during the PhD as images at the end of the appendix (turning PDFs into images for inclusion in Word is incredibly irritating, but I thought it best to do it this way rather than combine the PDFs later) – these papers were relevant to the thesis; I published other papers too, but left them out. My appendix also had a list of acronyms; I included over 100 previously conducted studies in my thesis, most of which had acronyms, so a list of them (and of all the medical and statistical terms that get acronymised) was likely pretty useful.
Side note: writing papers for outside projects was also very beneficial during my PhD, and I would recommend PhD students do it if there’s time. Firstly, outside work can pay. Secondly, working with other people on other work increases your research skills and contacts, and counts as networking, something I still struggle with. Thirdly, it increases the number of papers you’re on, something I am told is very important in academia. Finally, concentrating on one piece of work for 3 years can be crushing – taking a break to do other work can paradoxically be relaxing. Teaching is also good to do, not least because I found teaching (for me, 30-40 people for only 1-2 hours on a short course) a great way to practice public speaking, which comes in handy at conferences. So yeah – PhD students, do extra work if you have time, it’s great.
My introductory chapters gave the background on prostate cancer and PSA testing I needed people to know before reading the rest of the thesis, and described the fundamental evidence synthesis methods I would use extensively throughout. However, because my analyses were pretty disparate, I kept most of the chapter-specific methodology in the analysis chapters themselves. I imagine this makes it clearer when reading through – if you go one chapter at a time, all the information you need is there in the chapter, and you don’t have to flick back to the introductory chapters.
My analysis chapters were written at the time of the analysis, with substantial editing later as I became a better writer (note: not a good writer, just better than I was). After I finished each chapter’s analysis, I wrote up what I did and sent it to my 3 supervisors (I hear this is unusual – most departments have 1 or 2 supervisors, though I know of one person at a different university with over 10). My supervisors read through and made comments – most chapters were read and changed 3-4 times before I started compiling my thesis.
My analyses were iterative. Every piece of research is likely at least a little iterative – you start out with an idea, and gradually it becomes refined over time. Writing up each chapter after the analysis helped with this, since I could spot any errors. It did make my code a mess though, so much so that for the main analyses I rewrote the entire thing so it would be clearer. Although, since it’s code, I would happily rewrite it today to make it even more clear. Code seems to never be finished. In any case, I was still editing my analyses up to a month before submission, fixing little errors that crept in.
It wasn’t until 3 months before the deadline that I assembled a complete thesis out of the individual chapters I made. I read up on all the rules for the thesis from the university, and I created the list of figures/tables/contents that autogenerates in Word. I captioned all my figures and tables properly in Word so they would appear in those autogenerated lists. But at this stage, my supervisors still wanted to see individual chapters, so I was maintaining both an up-to-date thesis, as well as individual chapters, which got confusing. I’m not sure what the best way is here; creating the thesis as a whole was important and took a good day or two to do correctly, and was a good psychological boost, but maintaining different copies of files is asking for trouble.
Side note: I used Mendeley for referencing, rather than EndNote (the two options available to me). Mendeley has three advantages: 1) it’s free, 2) the library of references synchronises between my work and home computers, and 3) it makes Word crash less often than EndNote. In total, I had 306 references, which was a strain on Word, so preventing crashes was very important. There are few things as irritating as losing 5 minutes of work on your thesis, when that work was fixing typos on 20 different pages that you now can’t remember.
I eventually produced a preliminary-final-draft of my thesis, and had to FTP (file transfer protocol, used instead of email for large files or for secure sending) it to my supervisors because it was too large for an email (we also used a shared drive, but it isn’t accessible from home). My supervisors read through it and made more comments – in total, they probably read through my entire thesis 5 times at different stages of its development. This is likely an enormous positive of starting writing early on: it gives supervisors a much longer time window to make helpful suggestions.
My final-final-draft was submitted a day or two before the deadline. At this stage, I was beyond caring whether I had made any mistakes. Everyone had read through it multiple times, and I just wanted it to be done. I think I made a half-hearted attempt to read through one last time but gave up, and just sent it. I could always make corrections later.
Writing a thesis takes up a substantial amount of life. This can be drawn out over years, or it can be condensed into the smallest possible time. I favour drawing out the process – starting early has so many advantages, whereas starting with 3 months to go is likely to cause an overwhelming amount of stress. My PhD was 4 years, and the limit for PhDs at my university is also 4 years, so there was no chance of taking more time at the end to write up – the deadline was final and immovable (in reality, I’m sure they would give more time if necessary, but they say you need a very good reason to do so).
So yes, start early. This advice could go for literally all work (including the homework I always left to the last minute), but with a thesis, it’s completely worth it.