Albatross plots: Part 2

Years ago, I wrote a post called Albatross plots: Part 1, which goes alongside our developing albatross plots. My plan, I think, was to go through how the contours are generated.

The supplementary material for the albatross plot paper covers the statistics pretty well. But, for whatever reason, it wasn't published with the paper.

So, in this post I’ll go through the supplementary material that was supposed to go with the paper, detailing how the contours are generated. In Part 3, in possibly another 3 years (or in 3 days, who knows?), I may go through the adjust function, and how that works.

Normal Distributions

First though, a bit of probability.

This is a normal distribution, also called a “Bell curve” since it looks like a bell:

How is a normal distribution generated? Random chance.

In concrete terms, there are lots of videos showing how, if you drop a bunch of balls through a large triangle of pins with bins at the bottom, the balls fall into the bins in a normal distribution-type shape. This is because a normal distribution is the sum of an infinite number of random chances. Each ball, when it falls through the pins, will hit a pin and randomly fall to the left or to the right. It's 50-50 which way the ball will fall, so we would expect to see the balls fall to the left half the time, and to the right half the time. With an infinite number of balls and an infinite number of pins, we will see a normal distribution.

In abstract terms, imagine 1,000,000 observations, all starting at 0. Randomly add 1 or subtract 1 from each number (to simulate a ball going left or right), and repeat this 9,999 more times. Now look at the distribution of values: we would expect half the observations to be above 0, and half to be below 0 (with some observations at 0). Some observations could have values of -10,000 or 10,000, because by chance they could have received the same -1 or +1 all 10,000 times. This would be vanishingly rare (0.5^10,000, in fact, or 1/2^10,000). But if we had infinite observations, then it's certain some would have those values. Normal distributions approach, but never touch, a probability of 0 for any value.

Below is a series of plots showing histograms of the 1,000,000 observations after 1, 10, 100, 1,000 and 10,000 iterations of adding or subtracting 1 (I overlaid a normal distribution on the last one to show how well it fits). You can see how, as the number of iterations increases, the distribution looks more and more like a normal distribution.

I don't know why this plot decided it wanted to extend to -2 and +2; I guess Stata doesn't like making histograms of just two numbers.

Side note: the variances of the distribution after 10, 100, 1,000 and 10,000 iterations were as follows: 9.99, 99.8, 999.8 and 10,000. The variance of a distribution generated by adding or subtracting 1 randomly N times is N. Fun, right?

Statistical side note (don't read unless you particularly like statistics, or thought that I confused binomial and normal distributions): When we use discrete quantities (balls, observations in a simulation), we're actually creating a binomial distribution, not a normal distribution. A normal distribution is strictly continuous, and a binomial distribution is strictly discrete. But if you have an infinite number of observations and an infinite number of repeats, then a binomial distribution becomes a normal distribution, according to the central limit theorem, specifically the de Moivre–Laplace theorem.

Statistical side note 2: The reason the variance is equal to the number of times you add or subtract 1 is because of how you calculate the variance of a binomial distribution. In a regular binomial distribution, you have a test, and if it passes, you add 1 – you don’t subtract 1 if it fails – and you repeat N times. So the only variables are the number of tests you do (N), and the probability of passing the test (p). The variance of a binomial distribution is N x p x (1-p). In the simulation above, this would be N x 0.5 x 0.5. But adding or subtracting 1 is the same as adding 2 or doing nothing, so what we’ve actually done is shifted the binomial distribution to the left (so the mean is 0), and stretched it by doubling all the values. This has the effect of multiplying the variance by 4, so we now have the variance equal to N x 0.5 x 0.5 x 4, which is just N. Double fun, right? Right?
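
In symbols (just restating that argument): if B is the number of "passes" out of N tries with p = 0.5, then each simulated value is X = 2B - N, so:

Var(X) = Var(2B - N) = 4 x Var(B) = 4 x N x 0.5 x 0.5 = N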

Here’s the Stata code I used to generate the plots:

clear
set obs 1000000
set seed 123
gen x = 0    // each observation is one "ball", starting at 0
gen add = 0
forvalues i = 1/10000 {
    * randomly add or subtract 1 from every observation
    qui replace add = -1
    qui replace add = 1 if runiform() >= 0.5
    qui replace x = x + add
    * draw a histogram at the iterations we want to plot
    foreach j in 1 10 100 1000 10000 {
        if `i' == `j' {
            hist x, name(_`i', replace) title("Iterations = `i'") discrete
        }
    }
}
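
If you want to check the variance side note from earlier, something like this after the loop finishes should do it (a quick sketch; summarize stores the variance in r(Var)):

qui summarize x
display "Variance after 10,000 iterations: " r(Var)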

P Values

But: why talk about normal distributions so much in a post about albatross plots?

Because this is how P values are estimated.

P values are (usually) based on Wald tests. A Wald test estimates a Z score by dividing the effect estimate by its standard error:

Zp = b / SE    (1)

where Zp is the Z score, b is the effect estimate, and SE is its standard error.

The Z score is then converted to a P value using the normal distribution. The Z score is the number of standard errors the effect estimate is away from 0. Given a normal distribution is defined by random chance, we can use it to say what the chance is we got the effect estimate we did if the true effect was 0, and only random chance was at play.

A normal distribution is defined mathematically: we know the probability that any particular value will fall above or below a certain point by using the cumulative distribution function.

The P value is the chance that, if the true effect size were zero, we see an effect estimate at least as large as the one we found (a two-sided P value doesn’t care about whether the effect estimate is positive or negative, just whether it’s large).
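
Concretely, the conversion is the standard one (Φ here is the cumulative distribution function of the standard normal distribution):

P = 2 x (1 - Φ(|Zp|))

In Stata, for a Z score stored in a local called z, that's display 2*(1 - normal(abs(`z'))), since normal() is Stata's standard normal CDF.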

So why is any of this important?

Well, a Wald test is the same in any statistical method, and is widely used to estimate P values. By “widely used”, I mean the P value was estimated using the Wald test in literally every frequentist study I’ve read in the last 8 years (Bayesian statistics is different, but almost all epidemiology studies are frequentist, not Bayesian).

Effect Contours

So, for each statistical method, we only need to know one thing: how the standard error is estimated. Each contour is assigned an effect estimate, and the x-axis is the P value (from which we know the Z score), so only the standard error is left.

Fortunately, all the standard errors we're going to look at are proportional to 1 over the square root of the number of participants in the analysis. The aim for each statistical method is therefore to find the quantity φ (the circle with the vertical line through it) such that:

SE = φ / √N    (2)

When we merge equations (1) and (2) and rearrange, we can draw contours in the form:

√N = (φ / b) x Zp    (3)

where Zp is the Z score, and b is the effect size defining the particular contour. In the albatross plots, the x-axis is the P value (converted to Zp) and the y-axis is the number of participants (on a square-root scale), so equation (3) is literally the equation for each contour, i.e. y = mx with slope φ/b.

For most effect sizes, φ depends on one or more quantities, such as the ratio of group sizes, the baseline risk (for binary outcomes), and the standard deviation (for continuous outcomes). Values for these quantities must be specified to define the contours, and would usually be chosen to represent a typical value across the included studies.
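
As a sketch of what that looks like in practice (illustrative values only, and not the actual albatross ado-file; a real albatross plot also mirrors the contours for effects in the other direction), once you've chosen φ and b, you can draw a single contour in Stata directly from equation (3):

* one contour for effect size b = 0.5, with phi = 2
* (e.g. a difference in means with SD 1 and equal group sizes -- see the derivation below)
local phi = 2
local b = 0.5
twoway function y = ((`phi' * invnormal(1 - x/2)) / `b')^2, ///
    range(0.0001 0.5) xtitle("Two-sided P value") ytitle("Number of participants (N)")

Here invnormal(1 - x/2) converts a two-sided P value back into a Z score.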

For studies involving two independent groups, we define the numbers of participants in each group as n1 and n2, with total sample size N and ratio of group sizes r = n1/n2, such that n1 = rN/(r+1) and n2 = N/(r+1). This is because the further the ratio of group sizes gets from 1, the larger the standard error of any particular analysis will be for the same total number of participants. A study with 50 participants in each of the intervention and control arms has more information than a study with 99 participants in the intervention arm and 1 in the control arm, and this is reflected in the standard error.
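
Just to check that parametrisation does what it should (simple algebra, not from the supplementary material):

n1 + n2 = rN/(r+1) + N/(r+1) = (r+1)N/(r+1) = N, and n1/n2 = [rN/(r+1)] / [N/(r+1)] = r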

I’ll go through the derivation of the effect contour equations for common statistical methods. The initial equations – those defining the standard error – are widely available, so I haven’t referenced them.

Remember, in each equation, we're trying to find φ, the quantity we can plug into formula (3) to create the effect contours.
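
As a worked example (using the standard formula for the standard error of a difference between two group means, rather than anything copied from the supplementary material): for a difference in means with a common standard deviation σ, substituting n1 = rN/(r+1) and n2 = N/(r+1) gives:

SE = σ x √(1/n1 + 1/n2) = σ x √((r+1)/(rN) + (r+1)/N) = σ x √((r+1)²/(rN)) = (σ(r+1)/√r) x (1/√N)

so φ = σ(r+1)/√r. With equal group sizes (r = 1), this is just φ = 2σ.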

Note: WordPress doesn't really do equations, so I've copied the text out as images to make my life easier.

And that’s it – that’s all the derivations of all the contours used in albatross plots!

I hope that someone finds this useful – apologies for having to use images instead of equations!