Explained: First full human genome sequenced 20 years after the first draft; Indian software plays key role
The complete sequence of a human genome marks a new era of genomics in which no region of the genome, the entire human genetic code, is beyond reach.
Researchers from the Telomere-to-Telomere (T2T) consortium, an international collaboration of around 30 institutions, have sequenced the first truly complete human reference genome. The achievement may be the largest improvement to the human reference genome since its initial release 20 years ago.
The paper titled ‘The complete sequence of a human genome’ dubs the new sequence ‘T2T-CHM13’.
The final validation of the sequence was aided by software developed by Chirag Jain, an assistant professor at the Department of Computational and Data Sciences, Indian Institute of Science (IISc).
Why is the achievement significant?
• The achievement marks a new era of genomics in which no region of the genome, the entire human genetic code, is beyond reach.
• It also unlocks new regions of human DNA and could improve the understanding of a wide variety of disorders affecting people.
• It can also lead to better genetic screening, enabling quick and specific diagnostic tests for various maladies.
What is the human genome?
It is the complete set of human DNA. DNA strands are written in a four-letter language: four chemical units, or bases, form the alphabet.
The letters pair specifically with the letters on the opposite strand to form words (base pairs, or bp) that encode information. All these words are stored in chromosomes in human cells.
So, if the human genome were a history book, it would have around 3 billion words (base pairs) across 23 chapters (chromosomes). The book records the human journey through time and holds a detailed blueprint for building every human cell, knowledge that could give health care providers new powers to treat, prevent and cure diseases.
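The pairing rule described above can be illustrated with a short Python sketch. It assumes the standard pairing of the four bases (A with T, C with G), which the article does not spell out; the function names and the example strand are made up for illustration.

```python
# Standard base pairing (an assumption added here, not stated in the article):
# A pairs with T, and C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def opposite_strand(strand):
    """Return the base that pairs with each letter of `strand`, position by position."""
    return "".join(COMPLEMENT[base] for base in strand)


def base_pairs(strand):
    """Return the list of (base, partner) 'words' formed by the two strands."""
    return list(zip(strand, opposite_strand(strand)))


example = "GATTACA"  # hypothetical seven-letter stretch of DNA
print(opposite_strand(example))  # CTAATGT
print(base_pairs(example)[:2])   # [('G', 'C'), ('A', 'T')]
```

Because each letter has exactly one partner, knowing one strand is enough to write out the other, which is why genome sizes are quoted in base pairs.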
First draft of the human genome: What was it missing?
Celera Genomics and the International Human Genome Sequencing Consortium published the first drafts of the human genome in 2001 and revolutionized genomics. However, there were gaps.
According to the scientific journal Nature, that sequencing was not truly complete: about 15% was missing owing to technological limitations. Scientists subsequently solved some of the puzzles, but the most recent human reference genome, which geneticists have used since 2013, still lacked 8% of the full sequence.
With 8% of the genome still unsequenced, some pages of the history book were missing: not all of the 3 billion-plus base pairs that each human genome contains had been sequenced at the time.
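To get a feel for the scale of these gaps, the percentages can be converted into base pairs. The sketch below assumes a round genome size of 3 billion base pairs, as used in the article; the true figure is somewhat higher, so the results are rough orders of magnitude only.

```python
# Round figure from the article; treated here as an assumption.
GENOME_BP = 3_000_000_000


def share_in_bp(total_bp, percent):
    """Convert a whole-number percentage of the genome into base pairs (integer math)."""
    return total_bp * percent // 100


missing_first_draft = share_in_bp(GENOME_BP, 15)  # gap in the 2001 drafts
missing_reference = share_in_bp(GENOME_BP, 8)     # gap in the reference used since 2013

print(f"{missing_first_draft:,}")  # 450,000,000
print(f"{missing_reference:,}")    # 240,000,000
```

On these round numbers, the 8% gap corresponds to a few hundred million base pairs.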
What has the latest research achieved?
Now, researchers at T2T have closed this 8% gap. As described in the paper ‘The complete sequence of a human genome’, they have produced the first truly complete sequence of a human genome.
The reference sequence includes gapless assemblies for all 22 autosomes (which look the same in females and males) plus chromosome X. It also corrects errors and introduces 200 million bp of novel sequence containing 2,226 gene copies, of which 115 are predicted to be protein-coding and are therefore important for understanding diseases.
Newly completed regions include all the centromeric satellite arrays and the short arms of all five acrocentric chromosomes.
Satellite arrays, which are known to vary extensively across the human population, will help medical genomics by giving a better understanding of the inherited variation that underlies human evolution, physiology and disease.
Similarly, a better understanding of acrocentric chromosomes, which are linked to disorders such as Down syndrome, is also useful.
What role does Indian software play in this breakthrough?
Chirag Jain, an assistant professor at the Department of Computational and Data Sciences, Indian Institute of Science (IISc), said that constructing the genome involved many newly designed computer algorithms and software to process sequencing data and turn it into a complete human genome.
One such tool, Winnowmap2, was developed and contributed by him and his collaborators, and it was critical in the final validation of the genome.
Pointing out that the software takes genome sequencing data as input and maps it onto the genome assembly, Jain added that the mapping method had to take into account a large number of repetitive segments.
He said that repeats in the genome make this challenging: a sequence can have many possible alignment candidates, and the correct one is rarely obvious.
Once the data was correctly aligned, differences between the genome and the sequencing data revealed a few mistakes, which T2T corrected before the final genome was released.
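The difficulty with repeats can be seen in a toy Python example. This is only an illustration of why repeated segments produce several alignment candidates; it uses plain exact string matching and is not how Winnowmap2 works. The reference string and reads are hypothetical.

```python
def candidate_positions(reference, read):
    """Return every 0-based position where `read` matches `reference` exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Continue one position later so overlapping matches are also found.
        start = reference.find(read, start + 1)
    return positions


# Hypothetical toy reference: a unique left flank, three copies of a
# repeat unit, and a unique right flank.
reference = "GGCT" + "ACGT" * 3 + "TTAG"

# A read that lies entirely inside the repeat fits in three places.
print(candidate_positions(reference, "ACGT"))    # [4, 8, 12]

# A read that also covers part of the unique flank fits in only one place.
print(candidate_positions(reference, "CTACGT"))  # [2]
```

The read taken from inside the repeat could have come from any of three locations, while the read that reaches into unique sequence can be placed with confidence. Real genomes contain far longer and more numerous repeats, and real reads contain errors, which is why specialised mapping software is needed.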
Full human genome sequenced: Is it the last word?
The new sequence, ‘T2T-CHM13’, represents one person’s genome. T2T has now teamed up with the Human Pangenome Reference Consortium to sequence over 300 genomes from people across the world.
According to the scientific journal Nature, the new sequence is not the last word on the human genome: T2T had trouble resolving a few regions on some chromosomes and estimates that about 0.3% of the genome might contain errors.
T2T researchers have noted in their paper that one limitation of CHM13 is the lack of a Y chromosome. To finish a T2T reference sequence for all human chromosomes, they are now sequencing and assembling the Y chromosome.