2.1 Candidate Genes

In the late 20th century, the development of novel technologies allowed us to read small parts of the genome (to describe this process broadly, we tend to use the word “genotype” as a verb, as shorthand for “to read the genotype”).

In those early days, however, we weren’t able to obtain full genome reads. Genotyping required targeting specific sequences within the DNA, cutting them away from the surrounding DNA, and investigating them individually. Similar techniques are used for a variety of applications in genetic research and practice (including cutting DNA into small bits, running them across an electrophoresis gel, and comparing the results across two samples, say a suspect’s DNA and DNA found at a crime scene - a process you’ve likely seen depicted in countless crime dramas). In behavior genetics, our application of this technology took the form of candidate gene research - if we knew in advance which gene (or, even more specifically, which variant within a gene) we wanted to study, we could cut out just that part, amplify it, and determine the genotype at that targeted location.

The classic candidate genes are the variants that you’re most likely to have heard are associated with human behavior. 5-HTT-LPR, MAOA-uVNTR, COMT-Val158Met, and DRD4-7R all emerged as major candidate genes for human behavior in the 1990s and 2000s. The term candidate gene, however, is a pretty serious misnomer. Genes are huge, complicated things, with many sections that can be read in many variable ways (this is how the same DNA in every cell of your body can create so many different tissues and organs - the genotype is unchanged, but how it is read differs across tissue types). And even their “thing”-ness is questionable; debates around how to define “a gene” are still very much alive, as we develop our understanding of the variable reads (including alternative start-stop locations, meaning that genes are not only next to each other but can also overlap), the effects of regulatory elements upstream and sometimes downstream of the genes, and the role of intronic elements (the non-coding parts of the genes, in contrast with the coding exonic regions). Candidate genes would be more accurately called candidate VARIANTS. The term “candidate gene” seldom refers to an entire gene; rather, it typically refers to the targeted investigation of a single variant (from potentially hundreds) within a gene. (The label specifying exactly which variant within the whole gene is what comes after the “-” in my gene list at the start of this paragraph.)

So, how did we choose which genes and which variants to go after? A shortlist of genes was relatively easy to compile. We looked to non-human animal research, which had been investigating the consequences of removing an entire gene from model organisms, like mice, fruit flies, and worms. Luckily, much of the genome is conserved across species during evolution. That is, most of our DNA isn’t working at making us specifically human or specifically ourselves; it’s just trying to keep the basic functions of life going (breathing, eating, making sure our cells stick together). As a result, we share many genes (and biological systems) with other species. So when we find a gene that, say, substantially alters how dopamine is processed, and we know from dopamine administration trials in humans and non-humans that dopamine changes behavior (including things like risk-taking), we might be optimistic that the particular gene identified in the model organisms would be an interesting target within humans. Picking the specific variant within a given gene was trickier, especially in the late 20th century, when model organism research (and technology) was largely limited to targeting the removal of whole genes (or even multi-gene segments). In genotyping, a whole gene was too much, especially if we wanted to characterize naturally occurring variation among individuals (not just gene presence versus gene absence). You might think that the variants within genes (your -LPRs, -uVNTRs, -Val158Mets, -7Rs, etc.) were selected because they were known to have the greatest impact on gene function (or even organism behavior). The reality is much more mundane and practical. Specific variants were chosen within these candidate genes for one primary reason - they were relatively easy to genotype.

DNA has a physical structure, with turns and folds and repetitions that make certain parts harder to physically access than others. For the most part, what we refer to as candidate genes were selected because they were one of the hundreds of potential variants within large genes of overall consequence (that is, genes you don’t want to be missing entirely) that could be pretty easily genotyped in a lab by someone with relatively little training. In many cases, this was because the variant regions were large - most of the variants targeted by candidate gene studies are what’s known as variable number tandem repeats (VNTRs), which means a short sequence of As, Cs, Ts, and/or Gs repeats a variable number of times. This is the case for 5-HTT-LPR, where LPR stands for Linked Polymorphic Region (polymorphic meaning “many forms”); for MAOA-uVNTR, where the lower case u stands for “upstream”, identifying the specific VNTR we’re talking about by its location upstream of the gene’s coding sequence; and for DRD4-7R, which refers to the 7 Repeat version of a VNTR that can commonly have either more or fewer than 7 repeats, but whose 7R form is the one that’s been proposed to increase the likelihood of ADHD-like traits.
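
If it helps to see the idea of a VNTR concretely, here is a minimal Python sketch. Everything in it is made up for illustration: the repeat motif, the flanking sequences, and the function names are hypothetical, and none of them correspond to the real sequence of any candidate gene. It only shows that two alleles of a VNTR differ in how many back-to-back copies of the motif they carry, and therefore in their overall length.

```python
import re

# Hypothetical repeat unit and flanking DNA, invented for this sketch.
# Real VNTR motifs are longer and are not reproduced here.
MOTIF = "ACGT"
FLANK_LEFT = "TTGACC"
FLANK_RIGHT = "GGATCA"


def make_allele(n_repeats: int) -> str:
    """Build a toy allele: flank + n tandem copies of the motif + flank."""
    return FLANK_LEFT + MOTIF * n_repeats + FLANK_RIGHT


def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Return the length (in motif copies) of the longest uninterrupted
    run of the motif found in the sequence, or 0 if it never appears."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


short_allele = make_allele(4)  # a toy "4R" allele
long_allele = make_allele(7)   # a toy "7R" allele

for name, allele in [("short", short_allele), ("long", long_allele)]:
    repeats = count_tandem_repeats(allele, MOTIF)
    print(f"{name}: {repeats} repeats, {len(allele)} bases long")
```

The point of the sketch is the length difference: alleles with different repeat counts are different sizes, and size differences are the kind of thing that is comparatively easy to tell apart in a lab.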

So, big stuff is easy to see/genotype, and it was all we could do at the time, better than nothing, right?

This week you’ll read in the Slate Star Codex (2019) blog post and the Keller & Duncan (2011) paper about the history of 5-HTT-LPR research and how we wasted immense amounts of money, time, and public attention chasing and defending false positives. I also want to give you some specific background on MAOA-uVNTR (“the warrior gene”), whose story runs very much parallel to that of 5-HTT-LPR, but which happens to be the first gene that I ever personally published on, and so is, as a cautionary tale, quite near and dear to my heart. For that reason, and because I’ve gone on quite long enough here, dear reader, I’ll be covering the story of my own false positive MAOA finding in lecture this week.
