I talked about my PhD Viva last week. This week, I’ll talk about my thesis. I’ll describe briefly what I did, how we made decisions about what to write and when, and anything else interesting about writing a 65,000-word thesis.
As I said last week, my PhD was in molecular, lifecourse and genetic epidemiology at the University of Bristol, and lasted for 4 years. My thesis was the written report that described what I did for the latter 3 years, and was assessed in my Viva.
My research – a bit of background
Three years ago, when I was just starting my PhD proper, we discussed what I would do and in what order. My aim was to make prostate-specific antigen (PSA) testing for prostate cancer better, since PSA is a bit awful at detecting prostate cancer (although there is nothing better at the moment). To do this, I intended to find an individual characteristic (something like age, ethnicity or weight) that was associated with both prostate cancer and PSA.
Quick note: PSA is a protein made in the prostate that can be found in the blood – if the prostate is damaged, then more PSA is in the blood (be it from cancer, infection or inflammation). A high PSA level indicates damage to the prostate, but it is very hard to tell from what – only a biopsy is definitive.
If something is associated with prostate cancer, then the overall risk of prostate cancer changes with that something, which is then called a “risk factor” for prostate cancer. For example, Black men have a higher risk of prostate cancer than White or Asian men. If something is associated with PSA, then the overall level of PSA in the blood changes with that something. For example, taking the drug finasteride decreases the PSA level by about half. Finasteride can be used to reduce the size of the prostate (for conditions like benign prostatic enlargement), and also for hair loss.
When PSA is used as a test for prostate cancer, a PSA level of less than 4.0 ng/ml is usually considered “normal” (although other thresholds of “normal” are used, such as 2.5 ng/ml or 3.0 ng/ml, and can depend on the age of the man). Since finasteride reduces PSA levels by half, this is sometimes taken into account by doctors – if a man has a PSA test and is taking his finasteride, his PSA might be doubled to get a more accurate reading. This is a simple example of adjusting test results to better fit each individual; if the PSA were not doubled, then it would be much lower than it should be, and might mask prostate cancer. So if something affects PSA, then it should be taken into account when measuring PSA.
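To make that adjustment concrete, here is a minimal Python sketch. The function names are hypothetical, and the factor of 2 and the 4.0 ng/ml threshold are simply the figures quoted above; this is an illustration of the arithmetic, not clinical guidance.

```python
# Illustrative only: the names are made up, the numbers come from the text above.
FINASTERIDE_FACTOR = 2.0  # finasteride roughly halves PSA, so the reading is doubled
THRESHOLD_NG_ML = 4.0     # one commonly used "normal" cut-off


def adjusted_psa(measured_psa, on_finasteride):
    """Return the PSA level (ng/ml) after accounting for finasteride use."""
    if on_finasteride:
        return measured_psa * FINASTERIDE_FACTOR
    return measured_psa


def above_threshold(measured_psa, on_finasteride, threshold=THRESHOLD_NG_ML):
    """True if the adjusted PSA is at or above the chosen threshold."""
    return adjusted_psa(measured_psa, on_finasteride) >= threshold


print(above_threshold(2.5, on_finasteride=False))  # False
print(above_threshold(2.5, on_finasteride=True))   # True: 2.5 doubles to 5.0
```

The same measured value of 2.5 ng/ml looks "normal" for a man not taking finasteride, but crosses the threshold once the reading is doubled.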
Things become a bit more complicated when something affects both prostate cancer risk and PSA. If something increases the risk of prostate cancer (such as being Black), then on average, it also increases PSA levels, since prostate cancer also increases PSA levels. This effect can be removed if you just look at men without prostate cancer, but this is tricky, since prostate cancer is common and lots of men can have prostate cancer without realising or being diagnosed (the statistic at medical school was 80% of men at 80 years old have prostate cancer).
As an additional problem, because men have PSA tests before being offered a prostate biopsy to see whether they have prostate cancer, anything that affects PSA levels may look like it affects prostate cancer risk too. If something lowers PSA levels (like finasteride), then some men will go from having a PSA above the threshold for a biopsy, to having a PSA below the threshold. So although the risk of prostate cancer may be the same, because not all men with prostate cancer are DIAGNOSED with the disease, it can look like things that reduce PSA are protective for prostate cancer, and things that increase PSA are a risk for prostate cancer.
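A toy example can show how this happens. The sketch below uses six invented men (the PSA values and cancer statuses are made up purely for illustration) and assumes that only men at or above a 4.0 ng/ml threshold are biopsied, and that a biopsy always finds a cancer that is there.

```python
THRESHOLD_NG_ML = 4.0

# Hypothetical men: (PSA in ng/ml, truly has prostate cancer). Invented values.
men = [(6.0, True), (5.0, True), (4.5, False), (3.0, True), (2.0, False), (9.0, True)]


def diagnosed_cancers(cohort, psa_multiplier=1.0):
    """Count cancers found when only men at or above the threshold are biopsied."""
    return sum(
        1
        for psa, has_cancer in cohort
        if has_cancer and psa * psa_multiplier >= THRESHOLD_NG_ML
    )


true_cancers = sum(1 for _, has_cancer in men if has_cancer)

print(true_cancers)                  # 4 men actually have cancer
print(diagnosed_cancers(men))        # 3 are diagnosed with no PSA-lowering drug
print(diagnosed_cancers(men, 0.5))   # 1 is diagnosed if everyone's PSA is halved
```

The number of men who truly have cancer never changes, but halving PSA drops the number of diagnoses from 3 to 1, which is how something that only lowers PSA can end up looking protective.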
Below is a diagram showing the effects of increasing age on prostate cancer status (i.e. whether a man actually has prostate cancer) and PSA (PSA levels increase with age), and how this could affect prostate cancer diagnosis. We are reasonably sure that age increases both prostate cancer risk (same as many cancers) and PSA levels (as the prostate becomes more leaky over time, letting more PSA into the blood to increase the PSA levels in the blood), but it is not so clear for other things.
My PhD was to look for a variable (individual characteristic) that was associated with both prostate cancer and PSA, and try to work out how much it affected each, and therefore how much PSA would need to be adjusted to account for the effect on PSA, without touching the effect on prostate cancer. So for age above, I would work out exactly how much an increase of 1 year in age increased PSA – the top right line in the diagram. Once PSA was adjusted for age, it would hopefully be better at finding prostate cancer, since it would no longer be affected by changes in age.
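As a rough sketch of what such an adjustment could look like, the Python below removes a per-year effect of age from a PSA reading. Everything here is an assumption for illustration: the simple linear form, the reference age of 60, and the slope of 0.05 ng/ml per year are all made up, and are not results from the thesis.

```python
# Hypothetical numbers, chosen only to show the arithmetic.
PSA_INCREASE_PER_YEAR = 0.05  # assumed ng/ml increase in PSA per year of age
REFERENCE_AGE = 60            # assumed age that all readings are standardised to


def age_adjusted_psa(measured_psa, age,
                     slope=PSA_INCREASE_PER_YEAR, reference_age=REFERENCE_AGE):
    """Remove the assumed linear effect of age from a PSA reading (ng/ml)."""
    return measured_psa - slope * (age - reference_age)


# The same reading of 4.2 ng/ml means different things at different ages.
for age in (50, 60, 70):
    print(age, f"{age_adjusted_psa(4.2, age):.2f}")
```

Under these assumptions, a reading of 4.2 ng/ml is adjusted upwards for a 50 year old and downwards for a 70 year old, so that whatever is left over is no longer explained by age alone.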
My thesis plan
Before I even started my PhD proper, I created a Gantt chart that gave the deadlines for all the work I knew I would need to do. First, I would need to find a variable, then perform a couple of systematic reviews to find all the studies that looked at the associations between the variable, prostate cancer and PSA. Then, I would need to conduct a couple of meta-analyses, which would combine the associations to get my final results. I also wanted to use individual participant data (published papers in epidemiology usually give summary results, telling you the association between two things, rather than listing any individual participant results, which would generally contravene patient confidentiality). The individual data would be used to enhance the meta-analysis results, but individual data takes a long time to source (about a year for all the data I requested). The Gantt chart I created is shown below.
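For anyone unfamiliar with how a meta-analysis combines associations, below is a minimal Python sketch of one common approach, fixed-effect inverse-variance pooling. It is shown only as an illustration of the general idea: the function name and the three study results are invented, and it is not necessarily the model used in the thesis.

```python
import math


def fixed_effect_pool(estimates, standard_errors):
    """Inverse-variance weighted pooled estimate and its standard error."""
    # More precise studies (smaller standard error) get larger weights.
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Made-up associations from three hypothetical studies.
estimates = [0.10, 0.20, 0.15]
standard_errors = [0.05, 0.10, 0.05]

pooled, pooled_se = fixed_effect_pool(estimates, standard_errors)
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled estimate {pooled:.3f} (95% CI {low:.3f} to {high:.3f})")
```

The pooled standard error is smaller than that of any single study, which is the main attraction of combining results in this way.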
This Gantt chart shows the 9 chapters I intended to write, the bare basics of what I would need to do for each, as well as when I would write up each chapter (“thesis production”). As far as plans go, this one was pretty limited, but it gave us a timeframe to work from.
In actuality, I didn’t stick to this very well at all. Chapters got removed or changed, new chapters were added, but the point I wanted to make remains: as I did each stage of research, I wrote up a chapter detailing what I did and how, as if each chapter was its own research paper. This meant when it came to the final 3 months, I was still adding new content, but the bulk of the work had already been done and I didn’t really need to remember what I did 2 years ago, it was right in front of me. Also, since I was writing up as I went along, I was forced to fully comprehend everything I was doing and why – doing it is one thing, but doing it and writing it down so other people could follow the rationale needs much greater understanding.
PhDs should all start with a plan (Gantt chart optional but useful), but writing up as you go along is definitely a huge time-saver in the end. I admit that what I wrote in the earlier years was… well… poor quality, but that’s to be expected. The only way to get better is to practise, and writing up as I went along gave me plenty of useful practice.
My thesis was split into 8 chapters (eventually), each with its own objectives (which I listed in the thesis itself in the first chapter, and below too). I had 2 introductory/background chapters, 5 analysis chapters, and a discussion chapter.
1. Provide background information on prostate cancer and PSA, as well as using PSA as a screening test for prostate cancer.
2. Describe evidence synthesis methodologies relevant to this thesis.
3. Identify individual characteristics that have a plausible association with both prostate cancer and PSA, and have a large body of evidence examining this association, then select a characteristic to examine further.
4. Perform a systematic review and aggregate data meta-analysis of studies examining the associations between the chosen characteristic, prostate cancer, advanced prostate cancer and PSA.
5. Identify and collect data from large, well conducted prostate cancer studies, then perform an individual participant data meta-analysis of the associations between the characteristic, prostate cancer, advanced prostate cancer and PSA.
6. Combine the results of the aggregate and individual participant data to estimate the associations between the characteristic, prostate cancer, advanced prostate cancer and PSA as precisely as possible.
7. Perform a Mendelian randomisation analysis to assess evidence for causality between the characteristic, prostate cancer, advanced prostate cancer and PSA.
8. Summarise the results, strengths and limitations from the thesis, and indicate what direction future work may take.
In addition to the 8 chapters, I had a title page (with a word count), an abstract, a dedication and acknowledgements page, a declaration (that I didn’t cheat), then a section with contents, list of figures, and list of tables. My appendix contained information I thought was too specialist for the thesis, or just surplus to requirements (but still interesting). Given this was a thesis, the specialist stuff was really niche… I also put 2 papers I published during the PhD as images at the end of the appendix (turning PDFs to images for inclusion in Word is incredibly irritating, but I thought it best to do it this way rather than combine the PDFs later) – these papers were relevant to the thesis; I published other papers but left them out. My appendix also had a list of acronyms; I included over 100 previously conducted studies in my thesis, most of which had acronyms, so putting a list of them (and all the medical and statistical terms that are acronymised) was likely pretty useful.
Side note: writing papers for outside projects was also very beneficial during my PhD, and I would recommend PhD students do it if there’s time. Firstly, outside work can pay. Secondly, working with other people on other work increases your research skills and contacts, and counts as networking, something I still struggle with. Thirdly, it increases the number of papers you’re on, something I am told is very important in academia. Finally, concentrating on one piece of work for 3 years can be crushing – taking a break to do other work can paradoxically be relaxing. Teaching is also good to do, not least because I found teaching (for me, 30-40 people for only 1-2 hours on a short course) a great way to practise public speaking, which comes in handy at conferences. So yeah – PhD students, do extra work if you have time, it’s great.
My introductory chapters gave the background for prostate cancer and PSA testing I needed people to know before reading the rest of the thesis, and described fundamental evidence synthesis methods, which I would use extensively in the thesis. However, because my analyses were pretty disparate, I kept most of the chapter-specific methodology in the analysis chapters themselves. I imagine this makes it clearer when reading through – if you go 1 chapter at a time, all the information you would need is there in the chapter, you don’t have to flick back through to the introductory chapters.
My analysis chapters were written at the time of the analysis, with substantial editing later as I became a better writer (note: not a good writer, just better than I was). After I finished each chapter’s analysis, I wrote up what I did and sent it to my 3 supervisors (I hear this is unusual, most departments have 1 or 2 supervisors, but I know of one person in a different university with over 10 supervisors). My supervisors read through and made comments – most chapters were read through and changed 3-4 times before I started compiling my thesis.
My analyses were iterative. Every piece of research is likely at least a little iterative – you start out with an idea, and gradually it becomes refined over time. Writing up each chapter after the analysis helped with this, since I could spot any errors. It did make my code a mess though, so much so that for the main analyses I rewrote the entire thing so it would be clearer. Although, since it’s code, I would happily rewrite it today to make it even more clear. Code seems to never be finished. In any case, I was still editing my analyses up to a month before submission, fixing little errors that crept in.
It wasn’t until 3 months before the deadline that I assembled a complete thesis out of the individual chapters I had written. I read up on all the rules for the thesis from the university, and I created the list of figures/tables/contents that autogenerates in Word. I captioned all my figures and tables properly in Word so they would appear in those autogenerated lists. But at this stage, my supervisors still wanted to see individual chapters, so I was maintaining both an up-to-date thesis and the individual chapters, which got confusing. I’m not sure what the best way is here; creating the thesis as a whole was important and took a good day or two to do correctly, and was a good psychological boost, but maintaining different copies of files is asking for trouble.
Side note: I use Mendeley for referencing, rather than EndNote (the two options available to me). Mendeley has three advantages: 1) it’s free, 2) the library of references synchronises between my work and home computers, and 3) it makes Word crash less often than EndNote. In total, I had 306 references, which was a strain on Word, so preventing crashes was very important. There are few things as irritating as losing 5 minutes of work on your thesis, when that work was fixing typos on 20 different pages that you now can’t remember.
I eventually produced a preliminary-final-draft of my thesis, and had to FTP (file transfer protocol, used instead of email for large files or for secure sending) it to my supervisors because it was too large for an email (we also used a shared drive, but this wasn’t accessible at home). My supervisors read through it and made further comments – in total, my supervisors probably read through my entire thesis 5 times at different stages of its development. This is likely an enormous positive of starting writing early on: it gives supervisors a much longer time window to make helpful suggestions.
My final-final-draft was submitted a day or two before the deadline. At this stage, I was beyond caring whether I had made any mistakes. Everyone had read through it multiple times, and I just wanted it to be done. I think I made a half-hearted attempt to read through one last time but gave up, and just sent it. I could always make corrections later.
Writing a thesis takes up a substantial amount of life. This can be drawn out over years, or it can be condensed into the smallest possible time. I favour drawing out the process – starting early has so many advantages, whereas starting with 3 months to go is likely to cause an overwhelming amount of stress. My PhD was also 4 years, and the limit for PhDs at my university is also 4 years, so there was no chance of taking more time at the end of the PhD to write up – the deadline was final and immovable (in reality, I’m sure they would give more time if necessary, but they say you need a very good reason to do so).
So yes, start early. This advice could go for literally all work (including the homework I always left to the last minute), but with a thesis, it’s completely worth it.