Re: DBGraffle4 : Automatically draw SQL schema in OmniGraffle

pauljdavis — Wed, 16 Sep 2009 23:47:04 -0000

Wow, reaching back a ways here. Just wanted to point out that it was Jeff who let his domain expire, but I forgive him, 'cause I ended up killing his laptop and buying it off him.

Way cool that people are still finding this useful though.

Re: Trying out mapreduce

pauljdavis — Wed, 02 Sep 2009 17:42:28 -0000

Hey Jan,

The important point that you're missing is that your reducer stage receives its input in sorted order. So think of it as this shell pipeline:

$ cat snps.txt | ruby snp_mapper.rb | sort | ruby snp_reducer.rb | sort
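To see what each stage of that pipeline hands to the next, it can be mimicked in-process. This is a minimal sketch, not Jan's code: it assumes a hypothetical mapper (`map_lines`) that emits each line's first tab-separated field as the key with a count of 1, since the contents of snp_mapper.rb aren't shown here. Python's `sorted()` stands in for the `sort` step (Hadoop's shuffle/sort in a real job).

```python
from typing import Iterable, Iterator, Tuple

def map_lines(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Hypothetical mapper: emit (key, 1) for the first tab-separated field."""
    for line in lines:
        key = line.rstrip("\n").split("\t")[0]
        yield key, 1

def reduce_sorted(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Streaming reducer: assumes pairs arrive grouped (sorted) by key."""
    current_key, total = None, 0
    for key, value in pairs:
        if key != current_key and current_key is not None:
            yield current_key, total  # key changed: the previous group is complete
            total = 0
        current_key = key
        total += value
    if current_key is not None:
        yield current_key, total      # flush the last group

def run_pipeline(lines: Iterable[str]) -> list:
    # map | sort | reduce
    return list(reduce_sorted(sorted(map_lines(lines))))

data = ["rs1\tA", "rs2\tG", "rs1\tT"]
print(run_pipeline(data))
```

Skipping the sort (`list(reduce_sorted(map_lines(data)))`) reports `rs1` twice with a count of 1 each, which is exactly why the reducer depends on sorted input.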

Your reduce algorithm then works a line at a time with an accumulator. For each new line, you check whether it's a new key; if so, you write out the current accumulator and reset it. Some code from one of the AWS Elastic MapReduce examples should help clarify:

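import sys

def reducer1(args):
    '''
    Only needed for command line testing, use "aggregate" with Hadoop
    '''
    last_item, item_count = None, 0
    for line in sys.stdin:
        # Key is the text after ':' in the first tab-separated field
        item = line.strip().split('\t')[0].split(':')[1]
        # Input is sorted, so a different key means the previous key is done
        if last_item != item and last_item is not None:
            print('%s\t%s' % (last_item, item_count))
            item_count = 0
        last_item = item
        item_count += 1
    # Flush the final key (nothing to print if the input was empty)
    if last_item is not None:
        print('%s\t%s' % (last_item, item_count))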

I doubt that the formatting survives, but you can find the code at [1].

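Anyway, that's how you get rid of the hash in your reducer: because only the current key's accumulator is held in memory, memory use stays constant no matter how many distinct keys there are. That property is, in fact, at the heart of why map/reduce can index the internet.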
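To make the memory point concrete, here is a small sketch of my own (not from the AWS example) comparing a hash-based count, which keeps every distinct key in a dict at once, with a streaming count built on `itertools.groupby`. Like the loop above, `groupby` only merges adjacent equal keys, so it relies on sorted input, and it only ever holds the current group.

```python
from collections import Counter
from itertools import groupby
from typing import Dict, Iterable, Iterator, Tuple

def count_with_hash(keys: Iterable[str]) -> Dict[str, int]:
    # Works on unsorted input, but the dict grows with the number of distinct keys.
    return dict(Counter(keys))

def count_streaming(sorted_keys: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # groupby merges only *adjacent* equal keys, so the input must be sorted;
    # in exchange, only the current group is consumed at any moment.
    for key, group in groupby(sorted_keys):
        yield key, sum(1 for _ in group)

keys = ["rs2", "rs1", "rs2", "rs3", "rs2"]
print(count_with_hash(keys))
print(list(count_streaming(sorted(keys))))
```

Calling `sorted()` here loads everything into memory just for the demo; in the real job that step happens outside the reducer (Hadoop's sort, or `sort` in the shell version), both of which can sort data larger than memory. Feed `count_streaming` unsorted keys and `rs2` comes out as three separate groups.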

[1] http://developer.amazonwebs...

Disqus - Latest Comments for pauljdavis
