
Is it possible to have multiple stop codons in one exon?


I would be very happy if someone can help me to find the answers for the following related questions.

  1. Can one exon have many stop codons?
  2. Can protein synthesis happen, if the stop codon is at the beginning of the exon?

An exon can contain multiple stop codons, but the first in-frame stop codon terminates the ORF. The remainder of the exon then becomes part of the 3'UTR.
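As a rough illustration of the point above, the following Python sketch walks an mRNA sequence codon by codon and lists every in-frame stop codon; only the first one ends the ORF. The function name and the toy sequence are invented for illustration, and the sketch assumes the reading frame starts at the given position (it ignores splicing and any frame carried over from an upstream exon).

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def in_frame_stops(mrna, start=0):
    """Return 0-based positions of every in-frame stop codon at or after `start`."""
    positions = []
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            positions.append(i)
    return positions

# Hypothetical toy sequence: AUG GCU UAA GGC UGA
toy = "AUGGCUUAAGGCUGA"
stops = in_frame_stops(toy)

# The ORF ends at the first in-frame stop; later stops fall in the 3'UTR.
orf_end = stops[0] if stops else None
```

Here the sequence contains two in-frame stop codons, but translation would normally end at the first; the second lies in what becomes untranslated sequence.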

However, there are some cases in which stop codon readthrough happens [1]. In these cases an in-frame stop codon does not terminate translation and the ribosome reads through it. During this process an aminoacyl-tRNA successfully competes with the release factor. This is quite evident in the case of selenocysteine and pyrrolysine tRNAs, which decode the UGA (opal) and UAG (amber) stop codons, respectively, in certain organisms [2,3] (suppression of UAG is also referred to as amber suppression). The exact mechanisms that affect stop codon readthrough have not been fully elucidated, but features of the mRNA are likely to play a role. In the case of amber suppression, the relative rarity of the UAG stop codon and lower concentrations of release factor 1 can be major factors.


I don't know what you mean by "beginning of the exon", but there are short ORFs (< 100 codons) that can code for small peptides [4]. Peptides as small as 6 amino acid residues have been reported, but most of the known small peptides are ~30-100 residues long.


References:
[1] Loughran, Gary, et al. "Evidence of efficient stop codon readthrough in four mammalian genes." Nucleic acids research 42.14 (2014): 8928-8938.
[2] Agafonov, Dmitry E., et al. "Efficient suppression of the amber codon in E. coli in vitro translation system." FEBS letters 579.10 (2005): 2156-2160.
[3] Wikipedia: Expanded genetic code
[4] Andrews, Shea J., and Joseph A. Rothnagel. "Emerging evidence for functional peptides encoded by short open reading frames." Nature Reviews Genetics 15.3 (2014): 193-204.


Start codons in DNA may be more numerous than previously thought

Image of an agar plate streaked with 16 different strains of Escherichia coli, each containing a green fluorescent protein with a different start codon (annotated along the edge of the plate). The 16 codons correspond to the 16 strongest expressing codons. Image is a composite of two super-imposed images from a laser scanner. Credit: Jeff Glasgow/Ariel Hecht/Kelly Irvine/NIST

For decades, scientists working with genetic material have labored with a few basic rules in mind. To start, DNA is transcribed into messenger RNA (mRNA), and mRNA is translated into proteins, which are essential for almost all biological functions. The central principle regarding that translation has long held that only a small number of three-letter sequences in mRNA, known as start codons, could trigger the production of proteins. But researchers might need to revisit and possibly rewrite this rule, after recent measurements from a team including scientists from the National Institute of Standards and Technology (NIST).

The findings, to be published on February 21, 2017, in the journal Nucleic Acids Research by scientists in a research collaboration between NIST and Stanford University, demonstrate that there are at least 47 possible start codons, each of which can instruct a cell to begin protein synthesis. It was previously thought that only seven of the 64 possible triplet codons trigger protein synthesis.

"It could be that many potential start codons had remained undiscovered because no one could see them," said lead author Ariel Hecht, a team member at the Joint Initiative for Metrology in Biology, a research collaboration that includes NIST and Stanford.

Scientists made many of their initial discoveries about DNA and RNA, including start codons, in the 1950s and 1960s. Those ideas have since become enshrined in textbooks around the globe as the modern understanding of the rules of molecular biology.

Genetic code is typically represented via sequences of four letters—A, C, G, and T or U—which correspond to the molecular units known as adenine, cytosine, guanine and thymine (for DNA code) or uracil (for RNA code). Fifty years ago, the best available research tools indicated that there were only a few start codons (with sequences of AUG, GUG and UUG) in most living things. Start codons are important to understand because they mark the beginning of a recipe for translating RNA into specific strings of amino acids (i.e., proteins).

The JIMB team's realization that there might be something amiss in the general understanding of how codons perform began unexpectedly over a round of bagels and coffee. Hecht and his colleagues Jeff Glasgow, Lukmaan Bawazer and Matt Munson were discussing colleague Paul Jaschke's unsuccessful attempt to refactor a virus, phiX174. Refactoring is a kind of re-coding or rearranging used to study genomes and to identify essential genes. phiX174 can be used to infect E. coli cells as a part of such studies.

Jaschke had replaced the start codons of several genes with codons that should not have started translation (AUA and ACG). However, to Jaschke's surprise, he was still detecting the expression of those genes, which should have been silenced by the removal of their start codons.

Hecht pondered what seemed like a rather naïve question: Was Jaschke's experimental result actually wrong? What if the results indicated that codons didn't fit a traditional description of start or not, but instead had varying likelihoods of initiating translation? To the best of their knowledge, no one had ever systematically explored whether translation could be initiated from all 64 codons. No one had ever proved that you cannot start translation from any codon.

"We kind of all collectively asked ourselves: had anyone ever looked?" said Hecht. A further review of available literature on the topic indicated that the answer was no.

The levels at which 64 different codons initiate the production of amino acids, the building blocks of proteins. Credit: Hecht et al., Nucleic Acids Res 2017 gkx070

Unlike geneticists working a half-century ago, the JIMB team and others who peer into the inner workings of cells now have far more powerful tools at their disposal, including green fluorescent protein (GFP), a protein adapted from jellyfish, and nanoluciferase, another protein adapted from a deep sea shrimp. Both GFP and nanoluciferase emit light when expressed inside cells and have been optimized within the past decade to produce very strong signals that can be used to probe the cells in depth.

"Ten years ago the tools to make this kind of measurement didn't exist," Hecht said.

NIST specializes in the process of precision measurement, and the start codon challenge proved irresistible to the JIMB team. The collaboration was formed in 2016 with the goal of advancing biomeasurement science and facilitating the process of discovery by bringing together experts from academia, government labs and industry for collective scientific investigations.

With the use of GFP and nanoluciferase, the team measured translation initiation in the bacteria E. coli from all 64 codons. They were able to detect initiation of protein synthesis from 47 codons.

The implications of the work could be quite profound for our understanding of biology.

"We want to know everything going on inside cells so that we can fully understand life at a molecular scale and have a better chance of partnering with biology to flourish together," said Stanford professor and JIMB colleague and advisor, Drew Endy. "We thought we knew the rules, but it turns out there's a whole other level we need to learn about. The grammar of DNA might be even more sophisticated than we imagined."

Still, the JIMB team cautions, this paper is really just the first step, and it is unclear what studies of other organisms will reveal.

"We need to be very careful about extrapolating from these findings or applying them to other organisms without further, deeper research," said Hecht. He hopes that this paper will encourage or inspire other researchers to explore the topic to find even more answers.

"It could be that all codons could be start codons," Hecht said. "I think it is just a matter of being able to measure them at the right level."




Code Busting: Genomes Where a "Stop" Sign Means "Go"

Up until now it has been believed that most organisms use the same genetic code to specify how to make proteins. A new article in Science, however, shows that this may not be true, and that considerably more variation in organisms’ genetic codes may exist than was expected. Out there in the "microbial dark matter," in organisms that won’t grow in the lab and that have not previously been examined, are species that have specified one of the canonical stop codons (TAG, TGA and TAA in DNA, and UAG, UGA and UAA in the transcribed RNA) to encode an amino acid instead. And the percentage of organisms that have done so is much higher than anyone would have thought possible.

What are stop codons? They play an essential role in protein synthesis by telling the protein-making machine, the ribosome, when to stop making a particular protein. In bacteria, genes are often grouped in units of regulation known as operons; the genes in these operons are transcribed as one long message. The stop codons act like periods, telling the ribosome to "stop here" as it works its way down the message. The ribosome releases the protein it has just made, and then moves on down the message to the next one.

What these scientists have discovered is that in these organisms with non-canonical genetic codes, one of their "stop" codons doesn’t mean stop, it means "insert an amino acid here." The result is that the ribosome reads through the "stop" codon, inserting an amino acid instead, and thus joins what would have been two proteins into one.

How did the scientists spot this phenomenon? According to the Science Daily article describing this work, they found a genome where, if the canonical three stop codons were used, many reading frames were about 200 nucleotides long. These very short reading frames would make proteins only 60 to 70 amino acids in length, which is much shorter than the norm. They also noticed that many short reading frames terminated with the same stop codon UGA. If they reassigned UGA to glycine, however, all the genes became normal in length. Now that they knew such a thing might happen, they looked for other examples in the wide wonderful world of genomic samples and they found many other cases where it appeared organisms had non-canonical genetic codes.
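The effect of reassigning UGA can be sketched in a few lines of Python. This is only an illustration, not the method used in the Science paper: the compact codon table below is the standard genetic code written with DNA letters (TGA corresponds to UGA), and the function name and toy sequence are made up for the example.

```python
from itertools import product

# Standard genetic code in compact form: codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Alternate code in which TGA (UGA in RNA) is read as glycine instead of stop.
RECODED = dict(STANDARD, TGA="G")

def translate(dna, table):
    """Translate from position 0 until the first stop codon in `table`."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Hypothetical toy gene: ATG AAA TGA CCC TAA
toy = "ATGAAATGACCCTAA"
short_protein = translate(toy, STANDARD)  # stops at TGA
long_protein = translate(toy, RECODED)    # reads through TGA, stops at TAA
```

Under the standard code the toy reading frame ends at TGA; under the recoded table the same sequence yields a longer product, which is the kind of length change the researchers reportedly noticed.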

How might this happen? A mutation would have to occur in a gene encoding a tRNA. In the case of the stop codon UGA, it is one base different from the codon that specifies glycine, GGA. If a tRNA(gly) that normally recognizes GGA mutates so as to recognize UGA, that tRNA(gly) would insert glycine at UGA codons and prevent termination. Not so bad, you say? In bacteria, many transcriptional units are one long message containing the coding sequences for multiple genes. If a stop codon is incorrectly read in such messages, proteins end up as concatenated chains. UGA represents 29 percent of all stop codons in E. coli, so that could make a fine mess.

In these newly identified genomes all the UGAs or all the UAAs or all the UAGs appear to be read as amino acids. Which it is depends on the organism. Bacteria seem to favor replacing UGAs. Another thing — the termination factor(s) that normally bind UGA and end translation have been inactivated. Otherwise there would be competition between the mutant tRNA and the termination factor(s) for UGA binding.

Thus, in bacteria with these alternate genetic codes all UGA stop codons are treated the same way. This only works, however, because none of those UGA codons are still supposed to be stop codons. In order to terminate proteins properly, any necessary stops must be either UAA or UAG, the remaining stop codons.

There are some puzzles here. If this kind of universal reassignment of tRNA(gly) to UGA were to happen suddenly in E. coli, and the termination factor(s) for UGA were to be inactivated, 29 percent of all proteins would potentially become concatenated with other proteins. This would be a catastrophe.

A smaller version of this can happen in E. coli. We know because we sometimes see suppressor mutations that restore truncated proteins to full length. Somewhere else in the same strain a mutation happens that makes a tRNA recognize the particular stop codon present in the truncated protein’s coding sequence. The mutant tRNA "suppresses" the early termination by inserting its particular amino acid.

Suppressor strains are sickly, because not just the truncated gene gets affected. Other proteins that should be terminated are not because of the mutated tRNA. These strains can easily die out or revert unless maintained by strong selection for their suppressive effect. That is, if the mutation that they are suppressing is bad enough, then being sick is better than being dead.

However, the kind of change discussed in this article is far more radical: the organism now uses just two stop codons, and the missing former stop codon now specifies an amino acid instead. That means that if a switch occurred, any gene that used the missing stop codon had to substitute one of the other stop codons in its place. Genome wide.

It’s very difficult to change codes in midstream, so to speak, especially with respect to stop codons. To get a code switch like this right, the decoder (the tRNA) and the thing being decoded (the genome) have to switch the function of the signal (the codon) simultaneously and universally.

As an example of what can happen if a code switch occurs suddenly and universally, suppose your bank keeps records of your transactions with tabs to separate different deposits.

A computer virus rewrites the code for your account so that where there were tabs you now have a 2, for example.

The meaning is lost and the bank thinks you just stole the U.S. Treasury. You get arrested and thrown in jail, and all your assets are frozen.

Drawing of a tRNA molecule, taken from Meyer and Nelson (2011).

The Science paper implies that switches of this kind have arisen multiple times in evolution. It should be a very, very rare event, though, because of the number of steps involved for it to have happened gradually. For example, here’s one scenario:

  1. One out of four tRNA(gly) genes switches to recognizing UGA. The resulting mutant tRNA(gly) then rescues a truncated protein that has an early termination codon (UGA) somewhere in its sequence.
  2. Because there are four tRNAs in E. coli that recognize GGA codons, if one mutates, there still would be three left unmutated, so GGA codons still would be read as glycine. And not all UGAs would be read as glycines either, because the termination factor for UGA would still be around, and could bind to some proportion of the messages that use UGA as a stop codon, and properly terminate their proteins. But the mutant strain would be sick.
  3. Under pressure from the detrimental effects of the mutant tRNA, some other genes could change their stop codons from UGA to UAA. Since these genes would no longer be affected, those cells would survive and the bacterial strain would get healthier.
  4. The cycle could continue. Another tRNA(gly) gene might mutate, and the strain would get sicker. Any change of UGAs to UAAs could potentially alleviate this, though how much is unknown.
  5. Perhaps the abnormal UGA-tRNAs might prevent some bacterial viruses from reproducing in them. But in order to be effective in messing up viral translation, the abnormal tRNAs would probably be just as effective at messing up the host’s translation.
  6. Finally, the strain could lose the particular termination factor(s) that recognize UGA, and then any remaining messages with UGA codons would be forced to convert or adapt to concatenation or die.

Notice, this all depends on a sickly strain competing against healthy sisters in the wild long enough for other genes to mutate their stop codons. It’s hard to maintain these strains in the lab, where competition is reduced and food is plentiful, so no one knows how likely this might be.

And to change the code across the whole genome is wildly improbable. Changing a UGA to UAA would have to happen at least 1160 times (29 percent of E. coli's more than 4000 genes use TGA, which is UGA in RNA). The steps in this scenario are testable, but they would need to be demonstrated as feasible under realistic conditions.
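The figure of 1160 follows directly from the two numbers quoted in the text. A minimal check in Python, treating 4000 genes as a lower bound (the variable names are only for illustration):

```python
genes = 4000       # "more than 4000 genes" in the text, so this is a lower bound
percent_tga = 29   # share of genes ending in TGA (UGA in RNA), per the text

# Integer arithmetic avoids floating-point rounding.
min_changes = genes * percent_tga // 100
```

Because the gene count is a lower bound, the number of required stop codon changes is "at least" this value.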

Why do I go into all of this? The paper seems to imply that this process has happened multiple times. They report that the distribution of species with these non-canonical codes is patchy across evolutionary trees, suggesting multiple origins. But if it’s very hard for such a transition to happen once, how likely is it that it has happened multiple times? The paper does not address how or why that might be the case.

Could there be another explanation? Perhaps all these species had unique codes to begin with. Now there’s a thought that should bring us to a full stop.


How can there be 64 codon combinations but only 20 possible amino acids?

Codons are three-letter genetic words, and the language of genes uses 4 letters (nitrogenous bases). Hence there are 4 × 4 × 4 = 64 words in the genetic dictionary to represent the 20 amino acids that biological organisms use.

Explanation:

And you must note that more than one codon may code for the same amino acid. This is referred to as degeneracy of the code.

For example, three amino acids are each coded by six different codons, and that alone uses up 18 of the 64 combinations.

Three of the codons are stop codons.

They do not code for any amino acid.

Instead, they act as signals to end the genetic message carried by messenger RNA.

The number of codons accounted for by each degeneracy class is

  1 codon  × 2 amino acids  =  2 codons
  2 codons × 9 amino acids  = 18 codons
  3 codons × 1 amino acid   =  3 codons
  4 codons × 5 amino acids  = 20 codons
  6 codons × 3 amino acids  = 18 codons
                stop codons =  3 codons
  -------------------------------------
                      TOTAL = 64 codons
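The tally above can be checked against the standard genetic code with a short Python sketch. The compact table uses DNA letters (T in place of U), and the variable names are chosen only for this example.

```python
from collections import Counter
from itertools import product

# Standard genetic code in compact form: codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
codon_table = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# How many codons encode each amino acid ("*" marks a stop codon).
codons_per_aa = Counter(aa for aa in codon_table.values() if aa != "*")

# How many amino acids fall into each class (1, 2, 3, 4 or 6 codons).
class_sizes = Counter(codons_per_aa.values())

n_stops = sum(1 for aa in codon_table.values() if aa == "*")
total = sum(k * n for k, n in class_sizes.items()) + n_stops
```

The counts reproduce the table: 20 amino acids spread over five degeneracy classes, plus three stop codons, account for all 64 codons.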


Effects of Mutations

The majority of mutations have neither negative nor positive effects on the organism in which they occur. These mutations are called neutral mutations. Examples include silent point mutations, which are neutral because they do not change the amino acids in the proteins they encode.

Many other DNA damages or errors have no effects on the organism because they are repaired before protein synthesis occurs. Cells have multiple repair mechanisms to fix errors in DNA.

Beneficial Mutations

Some mutations have a positive effect on the organism in which they occur. They are referred to as beneficial mutations. They generally code for new versions of proteins that help organisms adapt to their environment. If they increase an organism's chances of surviving or reproducing, the mutations are likely to become more common over time. There are several well-known examples of beneficial mutations. Here are just two:

  1. Mutations have occurred in bacteria that allow the bacteria to survive in the presence of antibiotic drugs. The mutations have led to the evolution of antibiotic-resistant strains of bacteria.
  2. A unique mutation is found in people in a small town in Italy. The mutation protects them from developing atherosclerosis, which is the dangerous buildup of fatty materials in blood vessels. The individual in which the mutation first appeared has even been identified.

Harmful Mutations

Imagine making a random change in a complicated machine such as a car engine. The chance that the random change would improve the functioning of the car is very small. The change is far more likely to result in a car that does not run well or perhaps does not run at all. By the same token, any random change in a gene's DNA is likely to result in the production of a protein that does not function normally or may not function at all. Such mutations are likely to be harmful. Harmful mutations may cause genetic disorders or cancer.

  • A genetic disorder is a disease, syndrome, or other abnormal condition caused by a mutation in one or more genes or by a chromosomal alteration. An example of a genetic disorder is cystic fibrosis. A mutation in a single gene causes the body to produce thick, sticky mucus that clogs the lungs and blocks ducts in digestive organs.
  • Cancer is a disease in which cells grow out of control and form abnormal masses of cells called tumors. It is generally caused by mutations in genes that regulate the cell cycle. Because of the mutations, cells with damaged DNA are allowed to divide without restrictions.

Inherited mutations are thought to play a role in about 5 to 10 percent of all cancers. Specific mutations that cause many of the known hereditary cancers have been identified. Most of the mutations occur in genes that control the growth of cells or the repair of damaged DNA.

Genetic testing can be done to determine whether individuals have inherited specific cancer-causing mutations. One of the most common inherited cancer syndromes for which genetic testing is available is hereditary breast and ovarian cancer, caused by mutations in genes named BRCA1 and BRCA2. Besides breast and ovarian cancers, mutations in these genes may also cause pancreatic and prostate cancers. Genetic testing is generally done on a small sample of body fluid or tissue, such as blood, saliva, or skin cells. The sample is analyzed by a lab that specializes in genetic testing, and it usually takes at least a few weeks to get the test results.

Should you get genetic testing to find out whether you have inherited a cancer-causing mutation? Such testing is not done routinely just to screen patients for risk of cancer. Instead, the tests are generally done only when the following three criteria are met:

  1. The test can determine definitively whether a specific gene mutation is present. This is the case with the BRCA1 and BRCA2 gene mutations, for example.
  2. The test results would be useful to help guide future medical care. For example, if you found out you had a mutation in the BRCA1 or BRCA2 gene, you might get more frequent breast and ovarian cancer screenings than are generally recommended.
  3. You have a personal or family history that suggests you are at risk of inherited cancer.

Criterion number 3 is based, in turn, on such factors as:

  • diagnosis of cancer at an unusually young age.
  • several different cancers occurring independently in the same individual.
  • several close genetic relatives having the same type of cancer (such as a maternal grandmother, mother, and sister all having breast cancer).
  • cancer occurring in both organs in a set of paired organs (such as both kidneys or both breasts).

If you meet the criteria for genetic testing and are advised to undergo it, genetic counseling is highly recommended. A genetic counselor can help you understand what the results mean and how to make use of them to reduce your risk of developing cancer. For example, a positive test result that shows the presence of a mutation may not necessarily mean that you will develop cancer. It may depend on whether the gene is located on an autosome or sex chromosome and whether the mutation is dominant or recessive. Lifestyle factors may also play a role in cancer risk even for hereditary cancers, and early detection can often be life-saving if cancer does develop. Genetic counseling can also help you assess the chances that any children you may have will inherit the mutation.


Paper Finds Functional Reasons For “Redundant” Codons, Fulfilling a Prediction from Intelligent Design

A new peer-reviewed paper in the journal Frontiers in Genetics, “Redundancy of the genetic code enables translational pausing,” finds that so-called “redundant” codons may actually serve important functions in the genome. Redundant (also called “degenerate”) codons are those triplets of nucleotides that encode the same amino acid. For example, in the genetic code, the codons GGU, GGC, GGA, and GGG all encode the amino acid glycine. While it has been shown (see here) that such redundancy is actually optimized to minimize the impact of mutations resulting in amino acid changes, it is generally assumed that synonymous codons are functionally equivalent. They just encode the same amino acid, and that’s it.

Well, think again. The theory of intelligent design predicts that living organisms will be rich in information, and thus it encourages us to seek out new sources of functionally important information in the genome. This new paper fulfills an ID prediction by finding that synonymous codons can lead to different rates of translation that can ultimately impact protein folding and function.

This means that DNA contains multiple languages or encoded commands occupying the same string of contiguous bases. On the one hand, a string of nucleotide bases encodes amino acids. On the other hand, that same string contains information about the rate at which the ribosome should translate the protein so that it can properly fold into the right shape. The paper calls this “translational pausing.” The ribosome is capable of reading both sets of commands — as they put it, “[t]he ribosome can be thought of as an autonomous functional processor of data that it sees at its input.” To put it another way, the genetic code is “multidimensional,” a code within a code. This multidimensional nature exceeds the complexity of computer codes generated by humans, which lack the kind of redundancy of the genetic code. As the abstract states:

The codon redundancy (“degeneracy”) found in protein-coding regions of mRNA also prescribes Translational Pausing (TP). When coupled with the appropriate interpreters, multiple meanings and functions are programmed into the same sequence of configurable switch-settings. This additional layer of Ontological Prescriptive Information (PIo) purposely slows or speeds up the translation decoding process within the ribosome. Variable translation rates help prescribe functional folding of the nascent protein. Redundancy of the codon to amino acid mapping, therefore, is anything but superfluous or degenerate. Redundancy programming allows for simultaneous dual prescriptions of TP and amino acid assignments without cross-talk. This allows both functions to be coincident and realizable. We will demonstrate that the TP schema is a bona fide rule-based code, conforming to logical code-like properties. Second, we will demonstrate that this TP code is programmed into the supposedly degenerate redundancy of the codon table. We will show that algorithmic processes play a dominant role in the realization of this multi-dimensional code.

They write that the ribosome’s ability to undergo translational pausing “reveal[s] the ribosome, among other things, to be not only a machine, but an independent computer-mediated manufacturing system.” The paper even suggests, “Cause-and-effect physical determinism…cannot account for the programming of sequence-dependent biofunction.”

Apart from ID’s expectation of finding new layers of information in the genome, the paper implicitly challenges some common evolutionary assumptions. The notion that shared synonymous codons are functionally irrelevant has been used to buttress arguments for Darwinian evolution.

For one thing, some evolutionists claim that phylogenetic signals can be carried by the distribution of synonymous codons since they’re functionally equivalent. This paper suggests otherwise.

For another, seeking to infer the activity of natural selection, evolutionary biologists statistically analyze the frequency of synonymous (thought to be functionally unimportant) and nonsynonymous (thought to be functionally important) codons in a gene. (We’ve discussed this previously here and here.) As the thinking goes, if synonymous codons are functionally unimportant, then three conclusions may follow: a bias toward synonymous codons implies purifying selection in the gene, a bias towards nonsynonymous codons implies positive selection, and an equal balance implies neutral evolution (no selection). But if synonymous codons can have important functional meaning, then the whole methodology goes out the window, and hundreds of studies that used these methods to infer “selection” during the supposed “evolution of genes” could be wrong.
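To make the synonymous/nonsynonymous distinction concrete, here is a minimal Python sketch that compares two aligned coding sequences codon by codon. It is a simplification and not the statistical method used in the studies mentioned: real analyses also normalize by the number of synonymous and nonsynonymous sites, which this sketch does not attempt. The function name and toy sequences are invented for the example.

```python
from itertools import product

# Standard genetic code in compact form: codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_differences(seq_a, seq_b, table=CODON_TABLE):
    """Count codon differences that keep vs. change the encoded amino acid."""
    counts = {"synonymous": 0, "nonsynonymous": 0}
    for i in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        kind = "synonymous" if table[codon_a] == table[codon_b] else "nonsynonymous"
        counts[kind] += 1
    return counts

# Hypothetical aligned sequences: ATG GGT AAA vs ATG GGC AGA
result = classify_differences("ATGGGTAAA", "ATGGGCAGA")
```

In the toy pair, GGT to GGC leaves glycine unchanged (synonymous), while AAA to AGA swaps lysine for arginine (nonsynonymous).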

The evidence supports the view that synonymous codons have divergent effects upon translation, as the paper finds: “Data shows that with fixed levels of tRNA’s, synonymously encoded mRNA’s translate with different speeds” and “Recent work has built on the above observations showing a strong relationship between specific arrangements of codons in mRNA to the rate of translation.” Genetic modifications in the lab can even induce translational pausing:

“Pausing function” is caused by specific mRNA codon sequences rather than by tunnel-protein interactions to amino acid sequences. This contention is supported by data involving the substitution of rare codons with synonymous codons in E. coli. If the pausing effect was solely related to the amino acid chain sequence, then replacing codons with synonymous codons should still produce the same folded amino acid chain with the same translation speed. However, substitution of rare codons with synonymous codons did produce a change in speed and conformation changes.

These changes in translational speed can have phenotypic effects:

For example, a silent mutation in the human gene ABCB1 caused a conformational change to occur in the P-glycoprotein. This protein folded differently caused by a temporal change in translation affecting the timing of the folding process. … Thus, the protein folding pathways are affected by changes in the coding regions of DNA” (internal citations removed).

In short, “redundant” codons are not necessarily redundant at all. As the paper puts it: “we show why the term “degeneracy” is completely inappropriate. The dual coding functionality of redundancy is anything but ‘degenerate.’ It represents, instead, far more sophistication, layers, and dimensions of formal prescription.” In fact, this paper “defines new universal linguistic-like rules needed to identify and characterize codon mappings of TP events.” The authors write:

The TP code exhibits distinct meaning in relation to mappings between codons and pausing units. The TP code also exhibits a syntax or grammar that obeys strict codon relationships that demonstrate language properties. Because of the redundancy of the genetic code, it could be argued that the TP language is a subset of the genetic language. The subspace of the TP language resides, and thus appears to have a dependency on, the primary genetic code. Within this subspace, however, we argue that the TP language is decoupled from and remains independent of the protein-coding language.

Their conclusion about the high-information capacity of the genetic code is striking:

Redundancy in the primary genetic code allows for additional independent codes. Coupled with the appropriate interpreters and algorithmic processors, multiple dimensions of meaning, and function can be instantiated into the same codon string. We have shown a secondary code superimposed upon the primary codonic prescription of amino acid sequence in proteins. Dual interpretations enable the assembly of the protein’s primary structure while enabling additional folding controls via pausing of the translation process. TP provides for temporal control of the translation process allowing the nascent protein to fold appropriately as per its defined function. This duality in the coding function acts to reduce the redundancy in the genetic code when viewed holistically. The functionality of codonic redundancy denies the ill-advised label of “degeneracy.” When simultaneously combined with other coding schemas such as intron/exon boundary conditions, and overlapping and oppositely oriented promoters, multiple dimensions of independent coding by the same codon string has become apparent.

In his 2001 book No Free Lunch, William Dembski explained the primary prediction of intelligent design:

[W]hat about the predictive power of intelligent design? Intelligent design offers one obvious prediction, namely, that nature should be chock-full of specified complexity and therefore should contain numerous pointers to design … This prediction is increasingly being confirmed. (p. 362)

Multidimensional codes and new levels of specified complexity are exactly what ID predicts, and they’re exactly what this paper is reporting. It’s this sort of sophisticated, information-rich control that is expected by intelligent design, in contrast to Darwinian biology which fails to anticipate it. On the contrary, Darwinian advocates publish mountains of papers banking upon the unquestioned assumption that there is no important, functional reason for the existence of “redundant” or “degenerate” features. Slowly but surely, the data are turning the tide in the evolution debate.


This table shows the 20 amino acids used in proteins, and the codons that code for each amino acid.

Ala (A): GCU, GCC, GCA, GCG
Arg (R): CGU, CGC, CGA, CGG, AGA, AGG
Asn (N): AAU, AAC
Asp (D): GAU, GAC
Cys (C): UGU, UGC
Gln (Q): CAA, CAG
Glu (E): GAA, GAG
Gly (G): GGU, GGC, GGA, GGG
His (H): CAU, CAC
Ile (I): AUU, AUC, AUA
Leu (L): UUA, UUG, CUU, CUC, CUA, CUG
Lys (K): AAA, AAG
Met (M): AUG
Phe (F): UUU, UUC
Pro (P): CCU, CCC, CCA, CCG
Ser (S): UCU, UCC, UCA, UCG, AGU, AGC
Thr (T): ACU, ACC, ACA, ACG
Trp (W): UGG
Tyr (Y): UAU, UAC
Val (V): GUU, GUC, GUA, GUG
Start: AUG, GUG
Stop: UAG, UGA, UAA


Marshall W. Nirenberg and his lab at the National Institutes of Health performed the experiments which first elucidated the correspondence between the codons and the amino acids for which they code. Har Gobind Khorana expanded on Nirenberg's work and found the codes for the amino acids that Nirenberg's methods could not. Khorana and Nirenberg won a share of the 1968 Nobel Prize in Physiology or Medicine for this work.


Genetic Code and its Special Features

In this article we will discuss the discovery and special features of the genetic code.

Once the role of DNA as the hereditary material was established, it was clear that DNA contained the information for the formation of protein molecules or polypeptides. The sequence of purine and pyrimidine bases along the DNA molecule determines the sequence of amino acids in protein molecules. But the question was how DNA specifies the sequence of amino acids: only four bases in DNA must in some way determine the 20 types of amino acids that form proteins.

It was known that the information for the sequence of amino acids of a protein was contained in the sequence of bases on mRNA, which in turn was governed by the sequence of nucleotide bases in the DNA. The genetic code is thus carried by the sequence of nitrogenous bases along a sugar-phosphate strand of a DNA molecule.

The discovery of the genetic code became possible through the significant contributions of Francis H.C. Crick, Severo Ochoa, Marshall Nirenberg, Hargobind Khorana, and J.H. Matthaei in the early 1960s. For this work, Hargobind Khorana shared the Nobel Prize in 1968 with Nirenberg and Holley.

The set of nucleotides that specify one amino acid is a codon. The simplest possible code is a singlet code in which one nucleotide codes for one amino acid (Table 5). Such a code is inadequate, for only four amino acids could be specified. A doublet code is also inadequate because it could specify only 16 amino acids, whereas a triplet code could specify 64 amino acids.
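The counting argument above amounts to 4 raised to the codon length. As a minimal sketch (the function name `codon_count` is chosen here for illustration only), the three cases can be enumerated directly:

```python
from itertools import product

BASES = "UCAG"        # the four RNA bases
AMINO_ACIDS = 20      # number of amino acids that must be specified


def codon_count(length):
    """Return how many distinct codons of the given length exist."""
    return len(list(product(BASES, repeat=length)))


for length in (1, 2, 3):
    n = codon_count(length)
    verdict = "enough" if n >= AMINO_ACIDS else "too few"
    print(f"{length}-base code: {n} codons ({verdict} for {AMINO_ACIDS} amino acids)")
```

A triplet is therefore the shortest codon length that can cover all 20 amino acids, with spare codons left over for redundancy and stop signals.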

The first experimental evidence in support of the concept of a triplet code was provided by Crick and co-workers in 1961. Crick observed that deletion or addition of one or two base pairs in the DNA of T4 bacteriophage disturbed normal DNA functioning, i.e. normal protein synthesis could not take place.

The mutation produces a ‘frame shift’ in the reading frame. However, when three base pairs were added or deleted, the disturbance caused was minimal. Crick’s experiment also suggested that the code is degenerate, i.e. many of the amino acids are specified by more than one triplet.
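The frame-shift effect can be illustrated with a small sketch. The message below is a made-up sequence, not data from Crick's experiment, and the helper names are hypothetical. Inserting one base changes every downstream triplet, while inserting three bases adds one codon and leaves the rest of the frame intact:

```python
def codons(seq):
    """Split a sequence into consecutive, non-overlapping triplets,
    dropping any incomplete triplet at the end."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def insert_bases(seq, position, bases):
    """Return seq with the given bases inserted at the given index."""
    return seq[:position] + bases + seq[position:]


original = "AUGGCUGAAUUUCCGUAA"               # hypothetical message
plus_one = insert_bases(original, 3, "C")      # one extra base: frame shift
plus_three = insert_bases(original, 3, "CCC")  # three extra bases: frame kept

print(codons(original))
print(codons(plus_one))
print(codons(plus_three))
```

In this particular example the single-base insertion also happens to create a premature UGA stop codon in the shifted frame, which is one way a frame shift can abolish normal protein synthesis.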

The genetic code has the following special features:

1. Code is Triplet:

A codon is a triplet of nucleotides and depends on the sequence of nitrogenous bases in the DNA molecule. Each codon codes for a particular amino acid (see Table 6).

2. Code is Commaless:

The genetic code is commaless: there are no ‘punctuation marks’ (gaps) between adjacent coding triplets. Reading of the code begins at a fixed point and continues three nucleotides at a time, without a pause, until the terminator codon, which marks the end of the message, is reached.

3. Initiation Codon:

The codon present at the beginning of the cistron is known as the initiation codon. It marks the beginning of the message for a polypeptide chain. The initiation codon is AUG in the majority of cases and it codes for the amino acid methionine.

4. Termination Codons:

Similarly, the last codon of a cistron signals the termination of the polypeptide chain. This is known as the termination codon. There are three termination codons: UAA, UGA and UAG. These were called nonsense codons because they do not code for any of the 20 amino acids.
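The reading rules described above (a fixed starting point at AUG, triplet steps with no gaps, and termination at the first stop codon) can be sketched in Python. This is a simplified illustration that assumes the standard codon table and ignores real-world details such as alternative start codons or stop-codon readthrough. The function name `translate` and the example sequence are made up for this sketch.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, ordered by first, second, then third base (U, C, A, G).
# '*' marks a termination codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""                      # no initiation codon found
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break                      # termination codon ends the message
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical mRNA: AUG GCU UUU UAA, followed by bases that include a later UGA.
print(translate("GGAUGGCUUUUUAAGCUUGA"))
```

Only the first in-frame stop codon matters here; anything after it, including further stop codons, is never read by this simple model.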

5. Degeneracy of Genetic Code:

More than one codon can code for a particular amino acid. This multiple system of coding is known as a degenerate system or degenerate code. Most of the degeneracy occurs at the third position (3′ end of the triplet codon). When the first two bases are specified, the same amino acid may be coded for whether the third base is U, C, A or G. The third base is known as the wobble base.

Note the genetic codes for the following amino acids:

i. Serine – UCU, UCC, UCA, UCG, AGU and AGC.

ii. Arginine – CGU, CGC, CGA, CGG, AGA and AGG.
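Degeneracy can be checked by grouping the 64 codons by the amino acid they specify. The sketch below assumes the standard codon table. The helper names are illustrative only.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code, ordered by first, second, then third base (U, C, A, G).
# '*' marks a termination codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_by_amino_acid():
    """Group codons by the one-letter code of the amino acid they specify."""
    groups = defaultdict(list)
    for codon, amino_acid in CODON_TABLE.items():
        groups[amino_acid].append(codon)
    return dict(groups)


def fourfold_prefixes():
    """Return the two-base prefixes for which the third base does not matter."""
    prefixes = []
    for first, second in product(BASES, repeat=2):
        encoded = {CODON_TABLE[first + second + third] for third in BASES}
        if len(encoded) == 1:
            prefixes.append(first + second)
    return prefixes


groups = codons_by_amino_acid()
print("Serine codons:  ", groups["S"])
print("Arginine codons:", groups["R"])
print("Prefixes where any third base gives the same amino acid:", fourfold_prefixes())
```

Serine and arginine each have six codons, while methionine and tryptophan have only one each, so the degree of degeneracy is not uniform across amino acids.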

6. Code is Non-Overlapping:

The genetic code is non-overlapping. A base is a part of only one codon. The sequence CCGCAC is read only as CCG and CAC and not as CCG, CGC, GCA, CAC.
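The CCGCAC example can be reproduced with a short sketch that contrasts the two possible reading schemes. The function names are hypothetical and used only for illustration.

```python
def read_non_overlapping(seq):
    """Read triplets so that each base belongs to exactly one codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def read_overlapping(seq):
    """Read a triplet starting at every base (not how the genetic code is read)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2)]


sequence = "CCGCAC"
print(read_non_overlapping(sequence))
print(read_overlapping(sequence))
```

The first function gives the two codons actually read from the sequence. The second gives the four triplets that an overlapping code would produce.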

7. The code is usually non-ambiguous since a particular codon always codes for the same amino acid throughout the living world. But in the presence of streptomycin, UUU which normally codes for phenylalanine may also code for isoleucine, leucine or serine.

The genetic code is colinear: the sequence of codons in mRNA and the corresponding amino acid residues of a polypeptide chain are arranged in the same linear sequence. mRNA is colinear with DNA and with the amino acids in the polypeptide chain.

