Albatross plots: Part 2

Years ago, I wrote a post called Albatross plots: Part 1, which goes alongside our developing albatross plots. My plan, I think, was to go through how the contours are generated.

The supplementary material for the albatross plot paper covers the statistic pretty well. But, for whatever reason, it wasn’t published with the paper.

So, in this post I’ll go through the supplementary material that was supposed to go with the paper, detailing how the contours are generated. In Part 3, in possibly another 3 years (or in 3 days, who knows?), I may go through the adjust function, and how that works.

Normal Distributions

First though, a bit of probability.

This is a normal distribution, also called a “Bell curve” since it looks like a bell:

How is a normal distribution generated? Random chance.

In concrete terms, there’s lots of showing how if you drop a bunch of balls through a large triangle of pins with bins at the bottom, the balls fall into the bins in a normal distribution-type shape. This is because a normal distribution is the sum of an infinite number of random chances. Each ball, when it falls through the pins, will hit a pin and randomly fall to the left or to the right. It’s 50-50 which way the ball will fall, so we would expect to see the balls falls to the left half the time, and to the right half the time. With an infinite number of balls and an infinite number of pins, we will see a normal distribution.

In abstract terms, imagine 1,000,000 observations, all starting at 0. Randomly add 1 or subtract 1 from each number (to simulate a ball going left or right), and repeat this 9,999 more times. Now look at the distribution of values: we would expect half the observations to be above 0, and half to be below 0 (with some observations at 0). Some observations could have values of -10,000 or 10,000, because by chance they could have received -1 or +1 10,000 times. This would be vanishingly rare (0.5^10,000, in fact, or 1/2^10000). But if we had infinite observations, then it’s certain some would have those values. Normal distributions approach, but never touch, a probability of 0 for any value.

Below are a series of plots showing histograms of the 1,000,000 observations after 1, 10, 100, 1000 and 10,000 iterations of adding or subtracting 1 (I overlaid a normal distribution to the last one to show how well it fits). You can see how as the number of iterations increases, the more it starts to look like a normal distribution.

I don’t know why this plot decided it wanted to extend to -2 and +2, I guess Stata doesn’t like making histograms of just two numbers.

Side note: the variance of the distribution after 10, 100, 1000 and 10,000 iterations were as follows: 9.99, 99.8, 999.8 and 10,000. The variance of a distribution generated by adding or subtracting 1 randomly N times is N. Fun, right?

Statistical side note (don’t read unless you particularly like statistics, or thought that I confused binomial and normal distributions): When we use discrete quantities (balls, observations in a simulation), we’re actually creating a binomial distribution, not a normal distribution. A normal distribution is strictly continuous, and a binomial distribution is strictly discrete. But if you have an infinite number of observations and infinite number of repeats, then a binomial distribution becomes a normal distribution, according to the , specifically the .

Statistical side note 2: The reason the variance is equal to the number of times you add or subtract 1 is because of how you calculate the variance of a binomial distribution. In a regular binomial distribution, you have a test, and if it passes, you add 1 – you don’t subtract 1 if it fails – and you repeat N times. So the only variables are the number of tests you do (N), and the probability of passing the test (p). The variance of a binomial distribution is N x p x (1-p). In the simulation above, this would be N x 0.5 x 0.5. But adding or subtracting 1 is the same as adding 2 or doing nothing, so what we’ve actually done is shifted the binomial distribution to the left (so the mean is 0), and stretched it by doubling all the values. This has the effect of multiplying the variance by 4, so we now have the variance equal to N x 0.5 x 0.5 x 4, which is just N. Double fun, right? Right?

Here’s the Stata code I used to generate the plots:

set obs 1000000
set seed 123
gen x = 0
gen add = 0
forvalues i = 1/10000 {
                qui replace add = -1
                qui replace add = 1 if runiform() >= 0.5
                qui replace x = x + add
       ��        foreach j in 1 10 100 1000 10000 {
                                if `i’ == `j’ {
                                                hist x, name(“_`i'”, replace) title(“Iterations = `i'”) discrete

P Values

But: why talk about normal distributions so much in a post about albatross plots?

Because this is how P values are estimated.

P values are (usually) based on Wald tests. A Wald test estimates a Z score by dividing the effect estimate by its standard error:

Where Zp is the Z score, b is the effect estimate, and SE is its standard error.

The Z score is then converted to a P value using the normal distribution. The Z score is the number of standard errors the effect estimate is away from 0. Given a normal distribution is defined by random chance, we can use it to say what the chance is we got the effect estimate we did if the true effect was 0, and only random chance was at play.

A normal distribution is defined mathematically: we know the probability that any particular value will fall above or below a certain point by using the .

The P value is the chance that, if the true effect size were zero, we see an effect estimate at least as large as the one we found (a two-sided P value doesn’t care about whether the effect estimate is positive or negative, just whether it’s large).

So why is any of this important?

Well, a Wald test is the same in any statistical method, and is widely used to estimate P values. By “widely used”, I mean the P value was estimated using the Wald test in literally every frequentist study I’ve read in the last 8 years (Bayesian statistics is different, but almost all epidemiology studies are frequentist, not Bayesian).

Effect Contours

So, for all statistical types, we only need to know one thing: how the standard error is estimated. Each contour is assigned an effect estimate, and the x-axis is the P value (from which we know the Z score), so only the standard error is left.

Fortunately, all standard errors we’re going to look at are proportional to 1 over the square root of the number of participants in the analysis. The aim for each statistical method is therefore to find the quantity phi (the circle with the vertical line through it) such that:

When we merge equations (1) and (2) and rearrange, we can draw contours in the form:

where Zp is the Z score, and b is the effect size defining the particular contour. In the albatross plots, the x-axis is the P value and the y-axis is the number of participants, so equation (3) is literally the equation for each contour, i.e. y = mx.

For most effect sizes, phi depends on one or more quantities, such as the ratio of group sizes, the baseline risk (for binary outcomes), and the standard deviation (for continuous outcomes). Values for these quantities must be specified to define the contours, and would usually be chosen to represent a typical value across the included studies.

For studies involving two independent groups, we have to define the numbers of participants as n1 and n2, with total sample size N and ratio of group sizes r = n1/n2, such that n1 = rN/(r+1) and n2 = N/(r+1). This is because as the ratio of group sizes gets further away from 1, the larger the standard error of any particular analysis will be for the same total number of participants. A study with 50 participants in both the intervention and control arms of a study has more information than a study with 99 participants in the intervention arm and 1 in the control arm, and this is reflected in the standard error.

I’ll go through the derivation of the effect contour equations for common statistical methods. The initial equations – those defining the standard error – are widely available, so I haven’t referenced them.

Remember, in each equation, we’re trying to find , the quantity we can plug into formula (3) to create the effect contours.

Note: wordpress doesn’t really do equations, so I’ve copied the text out as images to make my life easier.

And that’s it – that’s all the derivations of all the contours used in albatross plots!

I hope that someone finds this useful – apologies for having to use images instead of equations!

5 thoughts on “Albatross plots: Part 2

  1. Hey thanks for all the great work you do.Albatross Plots are nice new method and incredibly useful tool for literature researches! I am not a crack in statistics and wondered if it would be sound to use correlation coefficients r, I got from the plot, to calculate R²?


    1. Hi Roland, thanks for the message!

      It depends on the studies in question – the correlation coefficient you get from the plots is correct if the study didn’t adjust for any covariables (and if any other assumptions are met). If the studies adjusted for a bunch of variables (i.e. in linear regression), then the standard error will be smaller than as estimated in the albatross plot, so the estimated correlation will be HIGHER in the albatross plot than in the study.

      If, however, the studies present just simple correlations between two variables, then as far as I know and barring anything weird in the studies, the estimated correlation coefficients should be reasonably fine for each study. Specifically, if you just have the number of participants and the P value, the correlation coefficient is roughly: sqrt(z^2 / [n + z^2]) – this is what the albatross plot program in Stata will estimate for you.

      All the best,


  2. Hi Sean,
    Great job, love the albatross plot! I’d like to use it for my meta analysis and cite your work in my publication but I only work with R, where can I find the instructions to create the albatross plot?


    1. Hi Emilie, sorry for the slow response!

      Thanks for the message – I haven’t managed to code up an R version of albatross plots as yet, since I’m not great with ggplot2. In terms of creating one yourself, it’s totally doable, but I imagine quite arduous.

      1) You’ll need a dataframe with the number of participants (N) in each study, and the P value (P) and the effect direction (E) of whatever effect estimate you are looking at. If it’s a mean difference, standardized mean difference, odds ratio or risk ratio, you may want other information (or need to assume other information in order to generate effect contours). Exactly what you’ll need is listed in table 1 of the paper:

      2) Albatross plots are a scatter plot with some contours superimposed. The X-axis is the P value, which I coded to be on the log10 scale for the inverse P value, i.e. log10(1/P). This means the X-axis extends from P=1 in the middle to P->0 at either end. The effect direction (E) makes the X-axis value either positive or negative, i.e. negative effect directions extend to the left towards P->0, and positive extend to the right towards P->0. The Y-axis is scaled as a squared log10 scale, so [log10(N)]^2 – this just makes it easier to interpret.

      In Stata, I scale all the P and N values to fit the X and Y scales, then relabel the values. As an example, a study with a P value of 0.05 and 100 people with a positive effect direction has the co-ordinates: X=1.30, Y=4. Another example, a study with a P value of 0.001 and 1000 people with a negative effect direction has the co-ordinates: X=3, Y=9. I imagine you can relabel the X and Y axes to show the correct P and N values in R: it’s nice numbers at least – for the X axis, 0=1, 1=0.1, 2=0.01 etc., and for the Y axis, 0=1, 1=10, 4=100, 9=1000 etc.

      3) That’s all for the individual study points in the scatter plot – I have different markers for different groups, and have an adjust option that changes the N value to the effective N value, but the equations are a bit of a nightmare – let me know if you want them. I do intend to put them up as a post one day…

      4) For the superimposed line graphs showing the effect contours, you need the equations in table 1 of the paper (link above). First, decide on the statistical method, then make sure that you have the additional required variables (if necessary). The additional variables aren’t for each study, The equations are in the form N = Z^2p * [effect estimate] * [usually something else]. This is analagous to Y = X * [chosen value for the contour] * [a constant].

      For simplicity of reading, the equations use Z^2 rather than the P value, but they are interchangable with a transformation. In Stata it’s Z = invnormal(P/2) or P = normal(-abs(Z))*2. In Excel, it’s normsinv(P/2) or P = normsdist(-abs(Z))*2. I think R uses normal() and invnormal(), but can’t remember. The P/2 business is specifically for 2-sided P values: you don’t bother with it for 1-sided P values. However, if you are using 1-sided P values, you’ll need to rescale the X axis. Let’s hope you’re not using 1-sided P values, because they are a faff…

      Ok, so once you’ve decided on the statistical type, you need to make decisions about a) the contour effect size, and b) the size of any remaning variables. For a), you just need to play around with likely effect sizes, and find ~3 that you like the look of. For b), you need to know roughly what the values of each additional variable is in each of the included studies. For instance, for mean differences, the equation requires the standard deviation (SD) of the mean and the ratio of group sizes. If the ratio of group sizes is 1 (i.e. same sized groups), then you can just use the simpler equation that’s listed. But in either case, you need the SD – to be comparable, each included study should have roughly the same SD, and you can plug that number into the equation.

      For odds ratios (ORs) and risk ratios (RRs), you’ll need estimates of the ratio of group sizes and the baseline risks in the control groups. Plus in these numbers to the relevant equations, and you’re left with what I put above: Y = X * [chosen value for the contour] * [a constant].

      Now, if you have wildly different ratios of group sizes, then you may want to adjust down to the effective sample size, as I mentioned above. This means estimating the number of participants required to get the same P value and effect size IF the number of participants in each group in a study were equal. The equations necessary to do this aren’t published, so let me know if you need this (or anyone else, if you’re reading this, it’ll give me an incentive to publish a post about it).

      Once you have the equations you’ll need to produce the contours, you’ll have to recale them to fit your graph. This means: a) turning the scaled P value (the X axis value) into a Z score, i.e. invnormal(10^-abs(X)/2), then b) further scaling the right hand side of the equation so Y is scaled correctly, i.e. log10 the right side then square it.

      This will give you a Y = mX + c equation, which you should be able to plot in a specified range in R on top of the scatter plot.

      Example: for a standardised mean difference (equally sized groups), the equation for the contour is:

      N = [Z^2] * (8 + SMD^2) / (2*SMD^2)

      Now do a) above:

      N = [invnormal(10^-abs(X)/2] * (8 + SMD^2) / (2*SMD^2)

      And b):

      Y = log10([invnormal(10^-abs(X)/2] * (8 + SMD^2) / (2*SMD^2))^2

      And let’s say you want the SMD to be 1 for a contour:

      Y = log10([invnormal(10^-abs(X)/2] * [9/2])^2

      If you wanted the SMD to be 2 for another contour:

      Y = log10([invnormal(10^-abs(X)/2] * [12/8])^2

      If you wanted the group sizes to be unequal for the contours (so if every study had a 2:1 group size ratio, for example), you’d use the equation for unequal sized groups, substituting in a value of r. For example, if you wanted a 2:1 group size ratio (r=2):

      N = [Z^2] * (2*(r+1)^2 + r*SMD^2) / (2*r*SMD^2)
      N = [invnormal(10^-abs(X)/2] * (2*(2+1)^2 + 2*SMD^2) / (2*2*SMD^2)
      Y = log10([invnormal(10^-abs(X)/2] * (18 + 2*SMD^2) / (4*SMD^2))^2

      So for an SMD of 1:

      Y = log10([invnormal(10^-abs(X)/2] * [20/4])^2

      The other statistical types follow the same rules – use the equations, substitute the values, scale everything correctly, plot the data and the contours, rescale the X and Y axes.

      I appreciate this is complicated, especially when written in a comment, so please let me know if you run into trouble!

      All the best,


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. (  /  Change )

Google photo

You are commenting using your Google account. (  /  Change )

Twitter picture

You are commenting using your Twitter account. (  /  Change )

Facebook photo

You are commenting using your Facebook account. (  /  Change )

Connecting to %s