<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>Disqus - Latest Comments for pauljdavis</title><link>http://disqus.com/by/pauljdavis/</link><description></description><atom:link href="http://disqus.com/pauljdavis/comments.rss" rel="self"></atom:link><language>en</language><lastBuildDate>Wed, 16 Sep 2009 23:47:04 -0000</lastBuildDate><item><title>Re: DBGraffle4&#160;: Automatically draw SQL schema in OmniGraffle</title><link>http://overooped.com/post/89728630#comment-16792865</link><description>&lt;p&gt;Wow, reaching back a ways here. Just wanted to point out it was Jeff that let his domain expire, but I forgive him 'cause I ended up killing his laptop and buying it off him.&lt;/p&gt;&lt;p&gt;Way cool that people are still finding this useful though.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">pauljdavis</dc:creator><pubDate>Wed, 16 Sep 2009 23:47:04 -0000</pubDate></item><item><title>Re: Trying out mapreduce</title><link>http://saaientist.blogspot.com/2009/09/trying-out-mapreduce.html#comment-15817476</link><description>&lt;p&gt;Hey Jan,&lt;/p&gt;&lt;p&gt;The important point that you're missing is that your reducer stage receives its input in sorted order. So think of it as this shell pipeline:&lt;/p&gt;&lt;p&gt;$ cat snps.txt | ruby snp_mapper.rb | sort | ruby snp_reducer.rb | sort&lt;/p&gt;&lt;p&gt;Your reduce algorithm then works a line at a time with an accumulator. For each new line, you check whether it's a new key, and if so, write out the current accumulator and reset it. 
Some code from one of the AWS Elastic Map/Reduce examples should help clarify:&lt;/p&gt;&lt;p&gt;def reducer1(args):&lt;br&gt;  '''&lt;br&gt;  Only needed for command line testing, use "aggregate" with Hadoop&lt;br&gt;  '''&lt;br&gt;  last_item, item_count = None, 0&lt;br&gt;  for line in sys.stdin:&lt;br&gt;    item = line.strip().split('\t')[0].split(':')[1]&lt;br&gt;    if last_item != item and last_item is not None:&lt;br&gt;      print '%s\t%s' % (last_item, item_count)    &lt;br&gt;      last_item, item_count = None, 0  &lt;br&gt;    last_item = item&lt;br&gt;    item_count += 1       &lt;br&gt;  print '%s\t%s' % (last_item, item_count)&lt;/p&gt;&lt;p&gt;I doubt that the formatting survives, but you can find the code at [1].&lt;/p&gt;&lt;p&gt;Anyway, that's how you get rid of the hash in your reducer, and in fact it's at the heart of why map/reduce can index the internet.&lt;/p&gt;&lt;p&gt;[1] &lt;a href="http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2274&amp;amp;categoryID=263" rel="nofollow noopener" target="_blank" title="http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2274&amp;amp;categoryID=263"&gt;http://developer.amazonwebs...&lt;/a&gt;&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">pauljdavis</dc:creator><pubDate>Wed, 02 Sep 2009 17:42:28 -0000</pubDate></item></channel></rss>