4.2 How To Read a GWAS

A genome-wide association study (GWAS, pronounced “GEE-wah-s”) is an exploratory method for discovering correlations between a phenotype of interest and any one of many (often millions of) genotyped variants, usually single-nucleotide polymorphisms (SNPs, pronounced “snips”). It works by (literally) testing each variant, one at a time, for correlation with the phenotype. Because this creates a massive “multiple testing” problem, we adjust the typical threshold for statistical significance (p < 0.05) for the equivalent of 1 million tests (that is, we divide 0.05 by 1 million), so a statistically significant result in a GWAS usually requires p < 5 x 10^-8. (We do not divide by the total number of tested SNPs even when it is more than 1 million, and up to 17 million tests is now common, because the tests are not independent. Linkage disequilibrium means that nearby SNPs are often highly correlated, so testing all of them is partly like running the same test over and over: the results will be nearly identical, and those near-repeats should not count fully against the multiple testing burden. About 1 million is the conventional estimate of the number of effectively independent tests across the genome in European-ancestry samples.) Beyond evaluating whether any significant associations were observed, papers use a variety of common ways to summarize the (millions of) results.
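To make the arithmetic concrete, here is a minimal Python sketch of the Bonferroni correction described above and of why it is needed. It assumes, as the text does, about 1 million effectively independent tests; `bonferroni_threshold` and `familywise_error_rate` are illustrative helper names, not functions from any GWAS software.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the overall false-positive rate near alpha."""
    return alpha / n_tests

def familywise_error_rate(per_test_alpha, n_independent_tests):
    """Chance of at least one false positive across independent tests with no true effects.
    Computes 1 - (1 - a)^n, using log1p/expm1 to stay accurate for very small a."""
    return -math.expm1(n_independent_tests * math.log1p(-per_test_alpha))

# Assumption: ~1 million effectively independent tests (the conventional figure).
n_effective = 1_000_000
gw_threshold = bonferroni_threshold(0.05, n_effective)
print(gw_threshold)
print(familywise_error_rate(0.05, n_effective))          # unadjusted cutoff
print(familywise_error_rate(gw_threshold, n_effective))  # genome-wide cutoff
```

With the unadjusted 0.05 cutoff, at least one false positive across a million null tests is a near certainty; at 5 x 10^-8, the chance drops to just under 5%.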

Participants

Because of the large number of tests, the very stringent threshold for statistical significance, and the small expected effect of any one SNP, we need a very large sample to detect any of these effects as statistically significant. (A p-value is a function of both the effect size AND the sample size, so if we’re looking for small p-values, we need either large effects, which we don’t expect in genetics, or a large number of participants.) For normal-range phenotypes (i.e., anything not restricted to extreme, severe, or rare outcomes), which make up the vast majority of work in behavior genetics, we typically need samples of over 100,000 participants for adequate “power” to detect the small effect sizes we anticipate as statistically significant.
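The link between effect size, sample size, and power can be sketched in a few lines of Python. This is a rough approximation under stated assumptions: the SNP's effect is expressed as a correlation r, significance is judged with Fisher's z approximation, and the example effect of r = 0.02 (0.04% of variance explained) is a hypothetical value chosen to represent a "small" single-SNP effect. `required_sample_size` and `power_at` are illustrative names.

```python
import math
from statistics import NormalDist

def required_sample_size(r, alpha=5e-8, power=0.80):
    """Approximate N needed to detect a true correlation r at a two-sided alpha,
    treating atanh(r) * sqrt(N - 3) as a standard-normal test statistic (Fisher z)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 5.45 for alpha = 5e-8
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)

def power_at(r, n, alpha=5e-8):
    """Approximate chance of reaching significance (ignores the negligible opposite tail)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return 1 - NormalDist().cdf(z_alpha - math.atanh(r) * math.sqrt(n - 3))

# Hypothetical single-SNP effect: r = 0.02.
print(required_sample_size(0.02))
for n in (10_000, 50_000, 100_000, 500_000):
    print(n, round(power_at(0.02, n), 3))
```

Under these assumptions the required sample comes out just under 100,000 people, in line with the rule of thumb above; because N scales roughly with 1/r^2, halving the effect roughly quadruples the sample needed.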

Take note of any specific participant characteristics as well. Was the sample limited to a certain ancestry group? (Most work so far has relied on samples of European ancestry; more research in other ancestry groups has emerged in the past few years, but papers still typically restrict each analysis to a single, narrowly defined ancestry group.) Also note whether they looked at only males or only females, or only a certain age group. This can inform how we interpret the phenotype and results.

Common tables and figures

Manhattan plot: This will be included in >99% of all GWAS you encounter. It is a way to summarize the results of the millions of tests that have been performed. For examples, see this Twitter bot that only tweets Manhattan plots derived from publicly available GWAS results: https://twitter.com/SbotGwa. (The bot does a great job of tidying everything up and pulling together additional data sources beyond the GWAS, but you now need an account to access it.) You can also freely browse an overwhelming pile of GWAS results with graphs directly from BioBank Japan (try clicking ‘Random’ in the top-right menu) and FinnGen (try typing a random page number at the bottom of the list). The horizontal x-axis is a map of the genome, organized from left to right by chromosome, and within each chromosome by position. Each dot represents a single tested SNP. The height of a dot on the y-axis reflects the p-value for that SNP’s association with the phenotype, but in Manhattan plots the p-values are negative-log-transformed (that’s the -log10(p) label on the y-axis) so that SMALLER p-values are HIGHER on the plot: p = 0.01 sits at 2, and p = 5x10^-8 sits at about 7.3. Basically, statistically significant SNPs are jumping up, saying “Look at me!” There is usually a horizontal dotted line at the level of statistical significance (again, usually p < 5x10^-8), so you can see at a glance where in the genome (by chromosome location) the statistically significant SNPs are located. Occasionally you will see a circular Manhattan plot; these are the same, except they are harder to read. People who use circular Manhattan plots are wrong.
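The two transformations behind a Manhattan plot (laying chromosomes end to end on the x-axis, and turning p-values into -log10(p) on the y-axis) can be sketched in plain Python. This is a minimal illustration, assuming a hypothetical input format of (chromosome, position, p-value) tuples with integer chromosome numbers; the resulting coordinates could be handed to any plotting library.

```python
import math

GENOME_WIDE_LINE = -math.log10(5e-8)  # height of the usual dotted significance line (about 7.3)

def manhattan_coordinates(results, threshold=5e-8):
    """results: list of (chromosome, position, p_value) tuples, chromosomes as integers.
    Returns one (x, y, is_significant) tuple per SNP, chromosomes laid end to end on x."""
    # Approximate each chromosome's length by the largest position observed on it.
    max_pos = {}
    for chrom, pos, _ in results:
        max_pos[chrom] = max(max_pos.get(chrom, 0), pos)
    # Shift each chromosome right by the total length of the chromosomes before it.
    offsets, running = {}, 0
    for chrom in sorted(max_pos):
        offsets[chrom] = running
        running += max_pos[chrom]
    return [(offsets[chrom] + pos, -math.log10(p), p < threshold)
            for chrom, pos, p in results]

# Hypothetical three-SNP example; only the second SNP clears 5 x 10^-8.
example = [(1, 1000, 0.2), (1, 5000, 3e-9), (2, 2000, 0.01)]
for x, y, significant in manhattan_coordinates(example):
    print(x, round(y, 2), "significant" if significant else "")
```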

QQ Plot: These are falling out of fashion, but you can see examples at the bottom of results pages from BioBank Japan or FinnGen. The x-axis shows the distribution of p-values expected under the null (that is, assuming no true effects, so that any low p-values arise from multiple testing alone). The y-axis shows the observed distribution of p-values across all tests. Both axes are usually -log10(p)-transformed, as in a Manhattan plot. If the dots (again, each representing a tested SNP) fall along the diagonal, then there are no more statistically significant effects (i.e., low p-values) than would be expected by chance. If there is “lift-off” above the diagonal (and the GWAS methods specify that ancestry principal components were included as covariates, which helps rule out ancestry differences as the explanation), then those SNPs have lower p-values than would be expected by chance alone.
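A QQ plot simply pairs each sorted observed p-value with the p-value expected at that rank if there were no true effects. The Python sketch below assumes that null p-values are uniform on (0, 1] and places the i-th smallest of n at (i - 0.5)/n, which is one common plotting convention (others exist); `qq_coordinates` is an illustrative name, and the "GWAS" here is simulated noise.

```python
import math
import random

def qq_coordinates(p_values):
    """Pair each observed p-value (smallest first) with its expected null value, both as -log10(p).
    The i-th smallest of n p-values (counting from 1) is placed at (i - 0.5) / n."""
    n = len(p_values)
    observed = sorted(p_values)
    return [(-math.log10((i + 0.5) / n), -math.log10(p)) for i, p in enumerate(observed)]

# Hypothetical null GWAS: 100,000 SNPs with no true effects (1 - random() avoids p = 0).
rng = random.Random(1)
null_p = [1 - rng.random() for _ in range(100_000)]
points = qq_coordinates(null_p)
for expected, observed in points[:5]:
    print(round(expected, 2), round(observed, 2))
```

Apart from random scatter in the extreme tail, these null points should lie close to the diagonal; true associations would show up as points rising above it at the top right.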

Barplots summarizing evidence: There is a wide variety of follow-up analyses that can be done using the GWAS results, and barplots are commonly used to summarize them. Because the metrics vary widely, it’s important to read the text to figure out what the axes are indicating. Usually, higher bars mean that the labeled item is more “important,” i.e., more strongly represented by low p-values within the GWAS results; because the metric being depicted varies so much, it is often transformed (for example, to -log10(p)) to fit this “bigger bar = more important” aesthetic.

Effect Sizes

A p-value is not an effect size. It is the result of a function depending on both the effect size AND the sample size. A low p-value may occur either because there is a large effect OR because the sample size is large.
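A quick way to see this is to hold the effect size fixed and change only the sample size. The Python sketch below uses Fisher's z approximation for a correlation; the r and N values are hypothetical, and `p_value_for_correlation` is an illustrative name.

```python
import math

def p_value_for_correlation(r, n):
    """Approximate two-sided p-value for a correlation r observed in n people (Fisher z test).
    math.erfc keeps precision for the extremely small p-values common in GWAS."""
    z = math.atanh(abs(r)) * math.sqrt(n - 3)
    return math.erfc(z / math.sqrt(2))

# Hypothetical examples: the same tiny effect at two sample sizes,
# and a much larger effect in a small sample.
for r, n in [(0.02, 1_000), (0.02, 500_000), (0.30, 100)]:
    print(f"r = {r}, N = {n:,}: p = {p_value_for_correlation(r, n):.2g}")
```

The tiny r = 0.02 effect is nowhere near significant in 1,000 people but far beyond 5 x 10^-8 in 500,000, while the fifteen-times-larger r = 0.30 in 100 people lands in between.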

Largest genetic effect: Depending on the paper, this may be presented in the text, or it may be buried in the supplemental materials (GWAS supplements can be substantial; the files should be linked or downloadable from wherever the paper is posted online). Because individual effect sizes are consistently small, many papers now restrict reporting of individual SNP effects to summary depictions, such as the Manhattan and QQ plots, and provide a limited table of “top results”, or even a full table of millions of results, in the supplement. The effect size of a single SNP is usually reported as an odds ratio (OR) for a binary yes/no outcome, or as a correlation (r) or regression weight (beta, often expressed per copy of the effect allele and sometimes standardized) for continuous outcomes. Another commonly reported metric is variance explained (r^2, conveniently the square of the correlation).
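Converting between these metrics is mostly arithmetic. The Python sketch below shows two standard conversions: exponentiating a logistic-regression log-odds estimate to get an OR, and turning a per-allele beta into variance explained for an additively coded SNP, assuming Hardy-Weinberg equilibrium (genotype variance 2f(1 - f) for allele frequency f). The SNP values are hypothetical, and the function names are illustrative.

```python
import math

def odds_ratio_from_log_odds(beta):
    """GWAS of binary outcomes usually fit logistic regression, which estimates log-odds;
    exponentiating gives the odds ratio."""
    return math.exp(beta)

def snp_variance_explained(beta_per_allele, allele_freq, phenotype_variance=1.0):
    """Share of phenotypic variance explained by one additively coded SNP (0, 1, or 2 copies).
    Assumes Hardy-Weinberg equilibrium, so the genotype variance is 2 * f * (1 - f)."""
    return 2 * allele_freq * (1 - allele_freq) * beta_per_allele ** 2 / phenotype_variance

# Hypothetical SNP: each copy of the effect allele (frequency 0.3) shifts a standardized
# phenotype by 0.03 SD.
r2 = snp_variance_explained(0.03, 0.3)
print(f"r^2 = {r2:.6f}, r = {math.sqrt(r2):.4f}")
print(f"OR = {odds_ratio_from_log_odds(0.05):.3f}")  # hypothetical log-odds estimate of 0.05
```

Even this fairly typical-looking SNP explains less than 0.04% of the variance, which is why single-SNP results are rarely the headline.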

After the individual SNP results, there is a MULTITUDE of ways that the set of results may be summarized. Some common approaches are:

Polygenic scores: An aggregate of genome-wide variants, summed like items on a test, with each person’s allele counts usually weighted by the effect sizes estimated in the GWAS. Usually reported in terms of correlation (r) or variance explained (r^2) with the phenotype. Effect sizes of polygenic scores tend to be inflated by parallel cultural transmission processes, so the strongest test of a polygenic score is WITHIN FAMILIES (that is, how well does it correlate with phenotype differences within a family, e.g., between siblings, where ancestry and cultural confounds are controlled). Under NO CIRCUMSTANCES should a polygenic score be tested in the exact same sample in which the SNP effects (GWAS) were estimated: this leads to CATASTROPHIC overestimation of the effect (e.g., it’s not hard to get r^2 near 1.0, because the number of variables, i.e., SNPs, is far greater than the number of participants, so “overfitting” to the sample is a BIG problem).
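The overfitting problem is easy to reproduce. The Python sketch below is a toy simulation under loudly stated assumptions: genotypes are simulated as standardized values rather than real allele counts, the phenotype is pure noise (so every true SNP effect is zero), and the "GWAS" is one simple regression per SNP. The polygenic score is the weighted sum described above. All function names are hypothetical.

```python
import random

def polygenic_score(genotypes, weights):
    """Weighted sum of one person's genotype values, weighted by GWAS effect estimates."""
    return sum(g * w for g, w in zip(genotypes, weights))

def simulate_sample(n_people, n_snps, rng):
    """Hypothetical data: standardized genotype values and a phenotype of pure noise."""
    genos = [[rng.gauss(0, 1) for _ in range(n_snps)] for _ in range(n_people)]
    phenos = [rng.gauss(0, 1) for _ in range(n_people)]
    return genos, phenos

def marginal_gwas(genos, phenos):
    """One simple-regression slope per SNP (approximately cov(x, y) for standardized x)."""
    n = len(phenos)
    return [sum(x * y for x, y in zip(column, phenos)) / n for column in zip(*genos)]

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(42)
n_people, n_snps = 200, 2000  # more SNPs than participants, as in a real GWAS

discovery_g, discovery_y = simulate_sample(n_people, n_snps, rng)
weights = marginal_gwas(discovery_g, discovery_y)

# WRONG: score the same people whose data produced the weights.
r2_same = pearson_r([polygenic_score(g, weights) for g in discovery_g], discovery_y) ** 2

# RIGHT: score an independent sample.
new_g, new_y = simulate_sample(n_people, n_snps, rng)
r2_new = pearson_r([polygenic_score(g, weights) for g in new_g], new_y) ** 2

print(f"same-sample r^2 = {r2_same:.2f}; independent-sample r^2 = {r2_new:.3f}")
```

Even though no SNP has any real effect, the same-sample r^2 comes out very high, while the independent-sample r^2 stays near zero, which is the honest answer.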

Heritability among unrelated participants: The logic of twin studies can be extended to genotyped “unrelated” people, where we can ask the same question: to what extent are more closely related (more genetically similar) individuals more similar in their phenotypes? This heritability (often called “SNP heritability”) is usually reported as a percentage out of 100 or as a proportion out of 1.0 (e.g., 0.xx). It is sometimes labeled VA (the variance attributable to the additive effects of the measured genotypes).
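One of the oldest and simplest estimators built on this logic is Haseman-Elston regression: for every pair of people, regress the product of their standardized phenotypes on their genetic similarity, and the slope estimates heritability. Published studies more often use more efficient methods (such as GREML or LD score regression), so treat this Python sketch as a toy illustration only. The data are hypothetical: 300 unrelated people, 50 simulated standardized "SNPs", and a true heritability of 0.5.

```python
import random

def ols_slope(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def haseman_elston(genos, phenos):
    """Regress each pair's product of standardized phenotypes on the pair's genetic
    similarity; the slope estimates the heritability captured by these genotypes."""
    n, m = len(phenos), len(genos[0])
    mean = sum(phenos) / n
    sd = (sum((y - mean) ** 2 for y in phenos) / n) ** 0.5
    z = [(y - mean) / sd for y in phenos]
    similarity, product = [], []
    for i in range(n):
        for j in range(i + 1, n):
            # Genetic similarity: average product of standardized genotype values.
            similarity.append(sum(a * b for a, b in zip(genos[i], genos[j])) / m)
            product.append(z[i] * z[j])
    return ols_slope(similarity, product)

# Hypothetical toy data with a known answer.
rng = random.Random(7)
n_people, n_snps, true_h2 = 300, 50, 0.5
genos = [[rng.gauss(0, 1) for _ in range(n_snps)] for _ in range(n_people)]
# Equal-sized effects with random signs, scaled so the genetic variance is about h2.
effects = [rng.choice((-1, 1)) * (true_h2 / n_snps) ** 0.5 for _ in range(n_snps)]
phenos = [sum(g * b for g, b in zip(row, effects)) + rng.gauss(0, (1 - true_h2) ** 0.5)
          for row in genos]
h2_hat = haseman_elston(genos, phenos)
print(f"estimated h2 = {h2_hat:.2f} (true value {true_h2})")
```

The estimate should land in the neighborhood of 0.5; with real genome-wide data, the tiny genetic similarities among unrelated people are why these analyses need very large samples.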

Genetic correlation with other phenotypes: Just as we can estimate correlations between observed phenotypes, we can estimate the extent to which the pattern of results in one GWAS resembles the patterns of GWAS results observed for other phenotypes. Genetic correlation is scaled to range from -1.0 to +1.0 REGARDLESS of the observed phenotypic correlation and the heritabilities of the phenotypes (so even if the observed phenotypes are weakly correlated and/or weakly heritable, they can still show a high genetic correlation, indicating that whatever phenotypic correlation and heritability there is can be traced to similar patterns of associated SNPs). Because most GWAS results these days are shared publicly in full, it is relatively easy to take results from one GWAS and estimate its genetic correlation with any other GWAS whose results have been shared, without needing access to the underlying individual-level data (that is, the method requires only the results, or “summary statistics”). It is sometimes labeled rg (the correlation, r, between the estimated genetic “influences”, g, on each phenotype).
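Why can weakly correlated, weakly heritable traits still have a high rg? The standard decomposition of a phenotypic correlation (standardized phenotypes, assuming no gene-environment correlation) splits it into a genetic part, rg x sqrt(h2_a x h2_b), and an environmental part, re x sqrt((1 - h2_a)(1 - h2_b)). The Python sketch below evaluates it for hypothetical traits; the function names are illustrative.

```python
import math

def genetic_correlation(genetic_cov, h2_a, h2_b):
    """Genetic covariance rescaled by both heritabilities onto the -1 to +1 scale."""
    return genetic_cov / math.sqrt(h2_a * h2_b)

def implied_phenotypic_correlation(h2_a, h2_b, rg, re):
    """Genetic part plus environmental part of a phenotypic correlation
    (standardized phenotypes, no gene-environment correlation assumed)."""
    return rg * math.sqrt(h2_a * h2_b) + re * math.sqrt((1 - h2_a) * (1 - h2_b))

# Hypothetical traits: each only 10% heritable, genetic correlation 0.8,
# and no environmental correlation.
print(round(implied_phenotypic_correlation(0.10, 0.10, rg=0.8, re=0.0), 3))
print(round(genetic_correlation(0.08, 0.10, 0.10), 3))
```

Here the observed phenotypic correlation is only 0.08, yet the genetic correlation is 0.8: the little phenotypic overlap there is comes almost entirely from shared genetic signal.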

Gene-based or Pathway analyses: These evaluate whether p-values are lower than would be expected by chance within certain pre-defined sets of SNPs, such as individual genes, or “pathways” (sets of multiple genes that share some common function, such as “dopamine genes” or “skeletal genes” or “genes expressed in the brain”). This is one of the RARE EXCEPTIONS to the “p-value is not an effect size” rule, in that most gene- or pathway-tests give ONLY a p-value as the result. The paper will usually tell you the adjusted threshold for statistical significance here, which depends on the number of genes or pathways tested. Keep in mind that ONLY pre-defined genes/pathways CAN be tested, so these analyses cannot test what we don’t yet know.
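As a rough illustration, the Python sketch below shows a Bonferroni-style gene-level threshold and the simplest kind of pathway test, a one-sided hypergeometric "over-representation" test of whether a pathway contains more significant genes than chance would predict. This is a swapped-in textbook technique, not the method any particular paper used: widely used tools such as MAGMA instead use LD-aware regression models. The counts below (about 20,000 genes, 200 significant, a 100-gene pathway) are hypothetical.

```python
import math

def gene_level_threshold(n_genes, alpha=0.05):
    """Bonferroni-style cutoff when n_genes genes (or pathways) are tested."""
    return alpha / n_genes

def overrepresentation_p(total_genes, set_size, n_significant, n_overlap):
    """One-sided hypergeometric p-value: the chance that a random draw of n_significant genes
    from total_genes includes n_overlap or more of the set_size genes in a pathway."""
    denominator = math.comb(total_genes, n_significant)
    upper = min(set_size, n_significant)
    numerator = sum(math.comb(set_size, k) * math.comb(total_genes - set_size, n_significant - k)
                    for k in range(n_overlap, upper + 1))
    return numerator / denominator

# Hypothetical numbers: 200 of ~20,000 genes are significant, and a 100-gene pathway
# contains 10 of them (about 1 would be expected by chance).
print(gene_level_threshold(20_000))
print(overrepresentation_p(20_000, 100, 200, 10))
```

Note that the pathway can only be flagged if someone already defined it, which is the limitation described above.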

Tissue/expression enrichment: There is a huge, rapidly developing array of methods for examining tissue-expression and epigenetic-mechanism “enrichment”, i.e., (as in the gene- or pathway-based methods) overrepresentation of SNPs with low p-values in regions known to be expressed in certain tissues, or known to be susceptible to variation in expression from a variety of factors (including, but not limited to, methylation, which is one commonly studied form of epigenetic modification).
