E's flat, ah's flat too: Sequencing the human genome... again

The CSIR is back in the news, this time for more pleasant reasons: sequencing the human genome. Several news items appeared last week on the achievement, which is the work of IGIB, Delhi.

But not all the coverage is positive: as Arvind pointed out in a comment on an earlier post, some senior scientists (including Pushpa Bhargava) question the importance of the achievement as well as the ethics of announcing it to the media before it has been published in a peer-reviewed journal.

I agree on the latter point, and am unsure of the former -- but judging the importance of a piece of work is the job of peer reviewers. It is how science works. The media is not qualified to evaluate new scientific claims. The peer-review system, as it currently exists, has its problems, but replacing qualified reviewers with journalists is hardly the solution (I assume that the work has indeed been or is being submitted to a relevant journal, though).

But there were a few other aspects of the original news report that left me disturbed. The comparison of time frames -- six weeks for this project, 13 years for the original Human Genome Project -- is quite inappropriate. It is always easier to do something for the second time. The original project had to develop new sequencing technologies and computational algorithms, and that took up the major part of those 13 years. The method that eventually succeeded (and that is widely used today, including, I expect, by the IGIB group) is the "shotgun" method, where many overlapping fragments are sequenced and then assembled like a jigsaw.

With current machines, these fragments are only about 30 nucleotides long, while the human genome has about 3 billion nucleotides in it. Moreover, the genome is highly repetitive, so many of the short fragments are identical and it is hard to "assemble" them correctly. Worse, sequencing the short fragments is itself not an error-free method: one or two errors per fragment are expected. So when Celera Genomics proposed applying the shotgun approach to the entire human genome, they encountered considerable skepticism. Nevertheless, it proved to be the most feasible approach. To alleviate the problem of repetitive regions and sequencing errors, every part of the genome is "covered" 20-30 times by multiple fragments. Even so, completing the assembly for a new organism is a tedious and error-prone process requiring sophisticated software and much human judgement. The point is that these problems are now much better understood than when the Human Genome Project undertook its task, and software is continually getting better. If the IGIB team made significant algorithmic or technical innovations, hopefully they will be described in an upcoming paper.
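To get a feel for the numbers involved, here is a rough back-of-the-envelope sketch of the coverage arithmetic. It assumes a genome of roughly 3 billion nucleotides, reads of 30 nucleotides, and reads landing uniformly at random (the usual Poisson idealisation, which real genomes with repeats do not obey). The function names are my own, purely for illustration.

```python
import math

def reads_needed(genome_length, read_length, coverage):
    """Reads required so that total sequenced bases = coverage * genome_length.

    Uses integer ceiling division; all arguments are positive integers.
    """
    total_bases = coverage * genome_length
    return (total_bases + read_length - 1) // read_length

def fraction_uncovered(coverage):
    """Poisson estimate of the fraction of positions hit by no read at all.

    Assumes reads start uniformly at random and ignores edge effects.
    """
    return math.exp(-coverage)

GENOME = 3_000_000_000  # assumed round figure for the human genome
READ = 30               # assumed read length in nucleotides

for cov in (20, 30):
    n = reads_needed(GENOME, READ, cov)
    print(f"{cov}x coverage: {n:,} reads, "
          f"uncovered fraction ~ {fraction_uncovered(cov):.1e}")
```

So even in this idealised picture, one is dealing with a couple of billion short reads. The high coverage is not there to avoid gaps (the Poisson estimate of uncovered positions is already tiny) but to outvote sequencing errors and to help with repeats.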

Even more importantly, the task of sequencing a new human is much easier than that of sequencing a previously unsequenced organism, because a reference genome already exists, and the variation between different humans would be expected to be very small. As I wrote in a comment in reply to Arvind above: one can compare it with assembling a jigsaw with a few billion pieces, many of which are identical or almost identical, without knowing the "big picture"; versus assembling it with the big picture available to you, knowing that there are only minor differences from the "reference picture". Technologically, there is nothing very hard any more about this. Equipment and software are marketed for the purpose by large biotech companies like Illumina, and are in use all over the world. If the IGIB team has made significant technical innovations, that is of interest, but it has not been mentioned in news items and it should, of course, be peer reviewed before it hits the media.
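The "big picture available" case can be illustrated with a toy sketch. This is not how production tools work (they use indexed alignment, quality scores and much else), and the function names and the tiny sequences are made up for illustration. Each read is simply slid along the reference, placed where it mismatches least, and the reads stacked at each position then vote on the base there.

```python
from collections import Counter, defaultdict

def best_position(read, reference, max_mismatches=1):
    """Brute-force placement: return the unique offset with the fewest
    mismatches (at most max_mismatches), or None if none or ambiguous."""
    best, best_mm, ties = None, max_mismatches + 1, 0
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mm = sum(1 for a, b in zip(read, window) if a != b)
        if mm < best_mm:
            best, best_mm, ties = start, mm, 1
        elif mm == best_mm and mm <= max_mismatches:
            ties += 1  # equally good placement elsewhere, e.g. a repeat
    return best if ties == 1 else None

def call_variants(reads, reference, max_mismatches=1):
    """Pile up placed reads and report positions where the majority base
    differs from the reference, as {position: (ref_base, sample_base)}."""
    pileup = defaultdict(Counter)
    for read in reads:
        start = best_position(read, reference, max_mismatches)
        if start is None:
            continue  # skip reads that cannot be placed unambiguously
        for offset, base in enumerate(read):
            pileup[start + offset][base] += 1
    variants = {}
    for pos, counts in sorted(pileup.items()):
        base = counts.most_common(1)[0][0]
        if base != reference[pos]:
            variants[pos] = (reference[pos], base)
    return variants

# Hypothetical 16-letter "reference" and a "sample" differing at one position.
reference = "ACGTTGCATGCCAGTA"
sample = reference[:7] + "G" + reference[8:]
reads = [sample[i:i + 8] for i in range(len(sample) - 8 + 1)]
print(call_variants(reads, reference))
```

The only work left is spotting the small differences from the reference. Without the reference, the same reads would first have to be stitched together from their overlaps alone, which is where repeats and errors really hurt.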

As for possible medical benefits: similar claims were made in support of the original Human Genome Project, but little benefit has been seen so far. These things should be seen as basic research, with medical benefits a possible and welcome spin-off, but not the primary goal. It is not at all easy to "link" specific genetic variations with specific diseases, and sequencing a handful of new genomes will not, I think, directly help with that problem. So what is the primary scientific accomplishment here? The news items don't say, but then, they should not be the primary medium for communicating this work. I look forward to the peer-reviewed article, when it appears.

Sunday, December 13, 2009
