<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"><channel><title>Disqus - Latest Comments for Ed Erwin</title><link>http://disqus.com/people/c8a6eefbba9e787a63a4ce7be49408bd/</link><description></description><language>en</language><lastBuildDate>Fri, 27 Jun 2008 01:48:59 -0000</lastBuildDate><item><title>Re: How Much Data is a Human Genome? Not Much.</title><link>http://thinkgene.disqus.com/how_much_data_is_a_human_genome_not_much/#comment-2464582</link><description>Your calculations are pretty much correct.  The reference human genome still contains some unknown portions, so you need to be able to represent at least one possibility in addition to ACGT.  But since you were probably talking about the real human genome, not the current unfinished data, that problem wouldn't apply.&lt;br&gt;&lt;br&gt;Using the ".2bit" format, human genome version "hg18" fits into a file listed here as 770 MB.&lt;br&gt;&lt;br&gt;&lt;a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/" rel="nofollow"&gt;http://hgdownload.cse.ucsc.edu/goldenPath/hg18/...&lt;/a&gt; &lt;br&gt;&lt;br&gt;The format is described here:&lt;br&gt;&lt;br&gt;&lt;a href="http://genome.ucsc.edu/FAQ/FAQformat#format7" rel="nofollow"&gt;http://genome.ucsc.edu/FAQ/FAQformat#format7&lt;/a&gt;&lt;br&gt;&lt;br&gt;The same page also describes the earlier, and still sometimes useful, "nibble" format that used two bases per byte.&lt;br&gt;&lt;br&gt;In biology, the sequence of ACGT isn't all that contains inherited information.  (There are also all the proteins you inherit along with the DNA, plus DNA methylation, and lots more still to be discovered.) But I wouldn't know where to start to compute the information content there.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Ed Erwin</dc:creator><pubDate>Fri, 27 Jun 2008 01:48:59 -0000</pubDate></item></channel></rss>