Ed Erwin

1 year ago

in How Much Data is a Human Genome? Not Much. on Think Gene
Your calculations are pretty much correct. The reference human genome still contains some unknown portions, so you need to be able to represent at least one possibility in addition to ACGT. But since you were probably talking about the real human genome, not the current unfinished data, that problem wouldn't apply.

Using the ".2bit" format, human genome version "hg18" fits into a file listed here as 770 MB.

http://hgdownload.cse.ucsc.edu/goldenPath/hg18/...
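As a rough sanity check on that file size, here is a minimal sketch of the arithmetic. The base count of 3.1 billion is an assumed round figure, not a number taken from the download page, and genome_size_mb is a hypothetical helper name used only for illustration.

```python
def genome_size_mb(num_bases: int, bits_per_base: int) -> float:
    """Return the storage needed for num_bases, in decimal megabytes."""
    total_bits = num_bases * bits_per_base
    return total_bits / 8 / 1_000_000

# Assumption: roughly 3.1 billion bases in the assembly (a round figure).
ASSUMED_BASES = 3_100_000_000

size_2bit = genome_size_mb(ASSUMED_BASES, 2)    # four bases per byte
size_nibble = genome_size_mb(ASSUMED_BASES, 4)  # two bases per byte

print(f"2 bits per base: {size_2bit:.0f} MB")
print(f"4 bits per base: {size_nibble:.0f} MB")
```

Under that assumption, 2 bits per base comes out close to the listed 770 MB. The sketch ignores file headers and any bookkeeping for unknown regions.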

The format is described here:

http://genome.ucsc.edu/FAQ/FAQformat#format7

The same page also describes the earlier, and still sometimes useful, "nibble" format, which stores two bases per byte.
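To make the difference between the two packings concrete, here is a minimal sketch. The function names pack_2bit and pack_nibble are hypothetical and not part of any UCSC tool. The code tables (T=0, C=1, A=2, G=3, with N=4 in the nibble case) and the most-significant-bits-first ordering follow my reading of the format description linked above, so treat them as assumptions. This is not a reader or writer for real .2bit or .nib files, which also carry headers and other metadata.

```python
# Assumed code tables; check them against the format description before relying on them.
TWO_BIT_CODES = {"T": 0, "C": 1, "A": 2, "G": 3}
NIBBLE_CODES = {"T": 0, "C": 1, "A": 2, "G": 3, "N": 4}


def pack_2bit(seq: str) -> bytes:
    """Pack bases four per byte, first base in the most significant bits."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4].upper()
        byte = 0
        for base in chunk:
            byte = (byte << 2) | TWO_BIT_CODES[base]
        # Left-align a short final chunk by padding with zero bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def pack_nibble(seq: str) -> bytes:
    """Pack bases two per byte; four bits leave room for an unknown base, N."""
    out = bytearray()
    for i in range(0, len(seq), 2):
        chunk = seq[i:i + 2].upper()
        byte = 0
        for base in chunk:
            byte = (byte << 4) | NIBBLE_CODES[base]
        # Left-align a lone final base by padding with a zero nibble.
        byte <<= 4 * (2 - len(chunk))
        out.append(byte)
    return bytes(out)


print(len(pack_2bit("ACGTACGT")))    # 8 bases in 2 bytes
print(len(pack_nibble("ACGTACGT")))  # 8 bases in 4 bytes
```

Note that pack_2bit has no code left over for an unknown base, which is the problem mentioned at the top: with only two bits per base, unknown stretches have to be recorded some other way, while the nibble packing has spare codes for them.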

In biology, the sequence of ACGT isn't all that contains inherited information. (There are also all the proteins you inherit along with the DNA, DNA methylation, and lots more still to be discovered.) But I wouldn't know where to start to compute the information content there.