#4.5: What happens inside the cells during a genetic engineering experiment
By Rohan M.
Hopefully you’ve arrived here right after finishing our previous blog post on the Engineer-It Kit, and are hungry for an explanation as to why our seemingly oh-so-fragile plasmids had such a tremendous effect! But if you’re not, that’s great too. Because the lessons we’re about to learn pay their dividends whether you’re concerned with theory or experiment…or if you’re just cramming for AP Bio (believe me, I can empathize). The whole dance between plasmid and protein is encapsulated in an idea called the Central Dogma, but for simplicity’s sake, we’re just going to focus on the first part of this idea, called transcription. Since this post is just meant to be a short, informative note I’ll eschew my usual verbosity and get right into things: transcription is all about converting DNA to RNA.
The Central Dogma involves three steps: transcription, translation, and enzymatic processing. Here we’ll focus on transcription, which converts DNA to RNA. Zero to Genetic Engineering Hero book Figure 4.18
As we touched on in the DNA extraction experiment, DNA consists of a phosphate backbone that strings together a series of deoxyribose sugars, each of which attaches to one of four nitrogenous bases (which we denote A, G, C, and T). RNA is very similar, except it swaps these deoxyribose sugars for plain ribose ones, which have one extra OH group sprouting off the sugar’s carbon ring. Everything else is the same, though (except that in place of the nitrogenous base T, it has the nitrogenous base U – but this doesn’t matter much to us).
The main molecular difference between RNA and DNA is that RNA has an extra OH group. Zero to Genetic Engineering Hero book Figure 4.19
However, this slight difference – plus the fact that RNA is much, much, much less likely to form double-stranded helical structures than DNA is – gives RNA some very different properties. For instance, while DNA is very stable (it has to be, as it stores genetic information over hundreds of thousands of years!), RNA only tends to last for a few minutes to a few hours. Obviously, there are still lots of unanswered questions about why life evolved in this way, but a good way to think about things is that RNA is a more flexible and actionable form of a cell’s genetic information. A cell can get down and dirty with its RNA, because even if it gets messed up, the DNA is still securely stored under lock and key (in the nucleus for cells like ours; bacteria such as E. coli have no nucleus, but their DNA master copy stays intact all the same). But we’re getting ahead of ourselves. How do we even get RNA from DNA in the first place?
An artistic (left) and more realistic (right) depiction of the enzyme RNAP, which converts strands of DNA to complementary strands of RNA. Zero to Genetic Engineering Hero book Figure 4.21, 4.22
The answer is a handy little enzyme (which is a type of protein) called RNA polymerase (RNAP). And here’s the crazy bit – RNA polymerase is itself coded for by a gene in all organisms’ DNA. So at least some of the time, it’s creating itself (which is why life chose Lisp, everything is macros, and biology is beautiful)! Wonderment aside, RNAP works by threading a single strand of DNA through its complex machinery, reading each individual base, and synthesizing the corresponding RNA base. But given just how massive even an E. coli cell’s whole genome is, it’s intractable to transcribe everything in a single go. Which is exactly why we have genes.
What is a gene? Most people would probably say something along the lines of “the code for some trait.” But that’s actually wrong. The most succinct and correct definition of a gene is: all the information needed to read and operate a specific bit of genetic code, including that bit of genetic code itself. This is a veritable tongue twister, but the idea is actually pretty simple. A gene is not just the data, it’s also the metadata – when and where the data should be read, who has access to the data, what the data is about, etc.
So how does this interplay between data and metadata help us get from DNA to RNA? Well, first off, we usually refer to the data and metadata as coding and non-coding regions, respectively (since the data explicitly codes for the protein of interest, while the metadata doesn’t). While things can be more complicated on a case-by-case basis, the non-coding region usually comes before the coding region, and its job is to determine whether the right conditions have been met for the coding region to be transcribed. If these conditions have been met, then various “transcription machinery” attaches to the non-coding region and sets off a chain reaction which ends with an RNA transcript of the coding region (this is all done through a complex dance of proteins changing shape – see this Kurzgesagt video for more on the topic).
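One way to picture the data/metadata split is as a small record type. The sketch below is only an analogy: the class name `Gene`, its two fields, and the example sequences are all invented for illustration, and a real non-coding region is far richer than a single string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Toy model of a gene: metadata (non-coding) plus data (coding)."""

    non_coding: str  # metadata: where the transcription machinery attaches
    coding: str      # data: the part that actually codes for the protein

    def full_sequence(self) -> str:
        # The non-coding region usually comes before the coding region.
        return self.non_coding + self.coding


# Hypothetical sequences, made up for this example.
pigment_gene = Gene(non_coding="TTGACATATAAT", coding="ATGGCTTAA")
print(pigment_gene.full_sequence())
```

The point of keeping the two fields separate is that only `coding` would ever end up in the transcript, while `non_coding` exists purely to control when that happens.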
For RNAP to bind to a gene and start transcribing it, a protein known as a sigma factor must first bind to that gene’s promoter. Zero to Genetic Engineering Hero book Figure 4.23
This chain reaction starts with a protein known as a sigma factor. If the conditions for transcription are met, then its shape will fit the shape of a section of non-coding DNA called the promoter. The promoter is one of those rare cases of aptly named things in biology, because all it determines is whether or not to promote gene transcription. Notice I said “promote” rather than “allow.” That’s because nothing in biology is deterministic – it’s all a game of chance and probability. Certain conditions increase the chance of transcription happening, but even without them there’s always some non-zero chance it will occur. Messiness aside, once the sigma factor binds to the promoter, it recruits RNA polymerase molecules which themselves bind to the sigma factor. But we still face two major problems in producing our RNA transcript. For one, the sigma factor was helpful as a way of getting RNAP’s attention, but now it’s in the way. Not to mention that it’s currently located in the non-coding region, but we want a transcript of the coding region.
To solve the first problem, the RNA polymerase needs to bind directly to the DNA strand, rather than holding onto it via a sigma factor. It does this by transcribing a short segment of non-coding DNA near where it’s currently located, which helps it grip onto the DNA. Then, the sigma factor detaches and RNA polymerase speeds off towards the coding region, beginning transcription.
I don’t blame you if you think this sounds overly complicated, but then again, almost everything in biology is. The upshot is that this byzantine interplay of sigma factor, promoter, and RNAP allows the cell precise control over gene transcription. And things get even more mind-boggling when you realize that there is a wide variety of sigma factors, all of which are themselves coded for by genes which need sigma factors to attach to their own promoters to be transcribed!
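The “promote rather than allow” idea from the sigma factor discussion can be sketched as a tiny simulation. Everything here is an assumption made for illustration: the function name `count_transcripts` and both probabilities are invented, and real cells are not a simple coin flip.

```python
import random


def count_transcripts(p_start: float, chances: int, rng: random.Random) -> int:
    """Count how often transcription starts, given `chances` opportunities."""
    return sum(1 for _ in range(chances) if rng.random() < p_start)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical probabilities: a small leak when conditions are not met,
# and a much higher chance when they are.
leaky = count_transcripts(p_start=0.01, chances=10_000, rng=rng)
promoted = count_transcripts(p_start=0.60, chances=10_000, rng=rng)
print(leaky, promoted)
```

The unpromoted gene still produces a trickle of transcripts, while the promoted one produces far more. That is the sense in which a promoter tilts the odds instead of flipping a switch.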
But even if we sweep all that under the rug, there’s a pretty simple question we’ve been ignoring thus far. How does RNA polymerase know which way to go? How does it know to move in the forwards direction, toward the coding region of the DNA, rather than moving backwards, towards the non-coding region?
DNA has directionality – one end is called the 5’ end and the other is called the 3’ end. A double helix contains two strands of DNA with one going 5’-3’ and the other going 3’-5’. Zero to Genetic Engineering Hero book Figure 4.24, 4.25
Well, first off, we don’t talk about forwards and backwards when we’re describing DNA. Instead, we talk about regions that are upstream or downstream from a point of interest. At first glance, it may seem impossible to tell one way from the other due to the apparent symmetry of a DNA molecule. But it’s actually not symmetrical! Remember, the basic structure of DNA’s backbone is a chain of many deoxyribose carbon rings. The fifth carbon of each sugar, C5, carries the phosphate group that links it to the previous deoxyribose sugar in the chain. The third carbon, C3, carries an OH group, which is where the C5 phosphate group of the next deoxyribose sugar attaches. In this sense, there is some symmetry-breaking in DNA’s structure – it has so-called 5’ and 3’ ends, but only the latter can be extended. That’s why we refer to the 5’ end as being upstream and the 3’ one as being downstream. In its double-helical form, two DNA strands join together in an antiparallel fashion: the 5’ end of one strand aligns with the 3’ end of the other and vice versa. So upstream for one strand is downstream for the other!
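The antiparallel arrangement is easy to see with a few lines of code. This is a minimal sketch: the function name `partner_strand` and the sequence are made up, and both strands are written 5’ to 3’, as is the usual convention.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def partner_strand(strand_5_to_3: str) -> str:
    """Return the partner of a DNA strand, also written 5' to 3'."""
    paired = [COMPLEMENT[base] for base in strand_5_to_3]
    # paired[i] sits opposite base i, so as written it reads 3' to 5'.
    # Reverse it to get the partner in the usual 5' to 3' order.
    return "".join(reversed(paired))


top = "ATGGCC"  # made-up sequence
bottom = partner_strand(top)
print(bottom)  # GGCCAT
```

Note that the first base of `top` pairs with the last base of `bottom`: upstream on one strand lines up with downstream on the other.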
RNAP travels downstream from the promoter and transcribes the lagging strand, because a RNA strand that is complementary to the lagging strand is equivalent to the leading strand. Zero to Genetic Engineering Hero book Figure 4.26, 4.27
Returning to our original question, RNA polymerase evolved to proceed in the 3’ or downstream direction, just like a gear that can only turn one way. But it’s not that simple – the promoter may be on either of the two DNA strands, and which way is downstream differs between them. So the strand with the promoter – the (+), or leading, strand – makes sure to orient the RNA polymerase in the downstream direction. If it didn’t, the RNAP’s internal machinery would “clog up,” like items going the wrong way on a conveyor belt. But even though RNAP travels along the leading strand, it actually transcribes from the (-), or lagging, strand! (Many textbooks call these two the coding strand and the template strand, respectively.)
If you’re anything like me, your reaction to this weird piece of information is something along the lines of: What?! Why?! How?! Followed by a very clear understanding of what a love-hate relationship is, because despite its elegance, biology really can be so frustrating at times. But trust me, this one does make sense if we take the time to understand it! Earlier, I said that RNAP works by reading each base of the DNA strand it's transcribing and then choosing the corresponding RNA base (A, C, U, or G). But obviously, RNAP can’t “read” any more than any molecule can read. What really happens is that millions or billions of bases randomly slosh into RNAP’s machinery. However, only the base which complements the current base in the DNA strand will stick. Once it does, the RNA polymerase knows that it can move one more base downstream and repeat the process. But there’s a problem!
RNAP doesn’t really read a DNA base and then put the complementary RNA base next to it. Instead, it tries out all RNA bases in the area until something sticks. Zero to Genetic Engineering Hero book Figure 4.28
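This trial-and-error picture can be mimicked in code. The sketch below is a loose analogy with invented names (`add_next_base`, `PAIRS_WITH`); it picks RNA bases at random and only keeps the one that pairs with the current DNA base.

```python
import random

# DNA base on the strand being transcribed -> RNA base that pairs with it
PAIRS_WITH = {"A": "U", "T": "A", "G": "C", "C": "G"}


def add_next_base(dna_base: str, rng: random.Random) -> tuple[str, int]:
    """Try random RNA bases until one sticks; return it and the try count."""
    tries = 0
    while True:
        tries += 1
        candidate = rng.choice("ACGU")  # a random base sloshes in
        if candidate == PAIRS_WITH[dna_base]:
            return candidate, tries  # it sticks, so RNAP can move on


rng = random.Random(1)  # fixed seed so the run is repeatable
base, tries = add_next_base("T", rng)
print(base, tries)
```

However many tries it takes, the base that finally sticks opposite a T is always an A. The randomness affects the waiting time, not the result.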
The RNA base that complements the current DNA base is by definition the complementary base. You’d be excused for thinking duh!, but this means that if we transcribe a given DNA strand, we end up with its complementary RNA strand…rather than its analogous RNA strand. That’s exactly why, if we want an RNA copy of the leading strand, we have to transcribe its complementary strand – the lagging strand! Since two complements cancel out, and since transcription generates a complementary strand, then transcribing the complementary strand of the strand we actually want is the only way to end up with the RNA version of that strand.
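The “two complements cancel out” argument can be checked directly. This is a minimal sketch under simple assumptions: the sequence is made up, the function name `transcribe` is hypothetical, and the lagging strand is written 3’ to 5’ so that it lines up base for base under the leading strand.

```python
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(strand_3_to_5: str) -> str:
    """Build the RNA that pairs with a DNA strand given 3' to 5'.

    The returned RNA is written 5' to 3'.
    """
    return "".join(DNA_TO_RNA[base] for base in strand_3_to_5)


leading = "ATGGCTTAA"  # (+) strand, 5' to 3', made up for this example
# The (-) strand, written 3' to 5' so each base sits under its partner.
lagging = "".join(DNA_TO_DNA[base] for base in leading)

rna = transcribe(lagging)
print(rna)  # AUGGCUUAA
```

The transcript matches the leading strand letter for letter, with every T replaced by a U, even though RNAP only ever paired bases against the lagging strand.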
We’re really deep in the weeds now, but I promise, we’ve almost made it! The only question we’ve yet to answer is how RNAP knows when to stop transcription. The promoter only controls its own gene, a very localized section of genetic code. So RNAP needs some way of determining when the promoter that gave it the initial OK signal to start transcription no longer has jurisdiction over where it currently is. This can happen in all sorts of ways. Sometimes the RNA transcript will begin to form a hairpin structure that folds in on itself, which causes RNAP to detach from both the DNA strand and the RNA transcript, like a car skidding off the road. Sometimes, RNAP can even be chased away by another protein called Rho. Or it might just lose its grip because it’s started to transcribe a lot of A-T pairs, which are weaker than G-C pairs and don’t allow it to hold onto the DNA as tightly. Genes employ a mix of all of these techniques to dislodge RNAP.
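As a rough illustration, two of these cues can be combined into a pattern search on the transcript: a stretch that can fold back on itself into a hairpin, followed by a run of U’s (the weakly held A-rich stretch of DNA shows up as U’s in the RNA). This is a deliberately crude sketch. The function names and the fixed stem, loop, and U-run lengths are assumptions, and real terminators vary a great deal.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Sequence that would pair with `rna` when folded back on it."""
    return "".join(RNA_PAIR[base] for base in reversed(rna))


def looks_like_terminator(rna: str, stem: int = 4, loop: int = 4,
                          u_run: int = 4) -> bool:
    """Check for stem + loop + matching stem, followed by a run of U's."""
    size = 2 * stem + loop + u_run
    for i in range(len(rna) - size + 1):
        left = rna[i:i + stem]
        right = rna[i + stem + loop:i + 2 * stem + loop]
        tail = rna[i + 2 * stem + loop:i + size]
        if right == reverse_complement(left) and tail == "U" * u_run:
            return True
    return False


# Made-up transcripts: the first ends in a hairpin plus U's, the second doesn't.
print(looks_like_terminator("AUGGCCGAAAACGGCUUUU"))   # True
print(looks_like_terminator("AUGGCUGCUGCUGCUGCUGCU"))  # False
```

Rho-dependent termination is not modeled here at all, since it depends on a protein chasing RNAP down and not just on the sequence.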
And that’s basically it: transcription, or there and back again! That was a real lightspeed, seat-of-your-pants overview we just went on, so you may need to take a few minutes to step back through the finer details. But now we understand what happened in our Engineer-It Kit experiment. The plasmids we engineered inside our cells were transcribed into RNA, which was then translated into protein (this part is still magic to us, but don’t worry, we’ll get to know it much more intimately in time) which in our case was colorful purple pigment. Things really start getting interesting, though, when we think about this pigment as a product we might want to extract from our cell. I’ll see you there next – and before you ask, it’s upstream of this post!