
Guest • 9 years ago

Hi Karl, thanks for the helpful document. Is there a way to check the status of a long-running map reduce while it is in progress? I know a little about db.currentOp(), but I'm not sure how it behaves when multiple operations are running on the same db. Will it give me the result for a particular map reduce while other operations are also running? Please advise.

Apurva Anil Kunkulol • 10 years ago

This was exactly what I needed. Thanks Karl.

Absalon Luis Opazo Garcia • 11 years ago

Really clear. Thanks.

Praveen Kumar J • 12 years ago

Nice article on MapReduce!
Thanks for the post.

Brian Nesbitt • 13 years ago

Wouldn't it be easier to just "log" the data in the aggregated format and skip this map/reduce transformation step?

To get your output from above, it's just a single upsert with a $inc modifier.
This is virtually no extra overhead at runtime and atomic as well.

db.hits.update( {game_id:1, year:2011, month:1, day:20}, {$inc: {count:1}}, true);

* Assume delete data anyway

Depending on what you offer in the display UI, you're going to have to map/reduce again on the aggregated data... or sum in code.
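As a rough illustration of the "sum in code" option, here is a minimal sketch in plain JavaScript. It assumes the daily counter documents have already been fetched into an array; the sample data and the `sumByMonth` helper are hypothetical and only mirror the shape of the upsert above.

```javascript
// Hypothetical daily counters, shaped like the documents the upsert would maintain.
const dailyHits = [
  { game_id: 1, year: 2011, month: 1, day: 20, count: 3 },
  { game_id: 1, year: 2011, month: 1, day: 21, count: 5 },
  { game_id: 1, year: 2011, month: 2, day: 1, count: 2 },
  { game_id: 2, year: 2011, month: 1, day: 20, count: 7 },
];

// Roll the daily counters up to one total per game and month.
function sumByMonth(docs) {
  const totals = new Map();
  for (const { game_id, year, month, count } of docs) {
    const key = `${game_id}:${year}-${month}`;
    totals.set(key, (totals.get(key) || 0) + count);
  }
  return totals;
}

const monthly = sumByMonth(dailyHits);
console.log(monthly.get("1:2011-1")); // 8
```

Because the counters are plain sums, coarser totals can be derived from the daily ones without touching the raw hits.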

Karl Seguin • 13 years ago

Yes, for this single case it would. However, as you generate more statistics, possibly by logging more data in hits (userid, ip address), the balance shifts.

First, writing a single entry into a collection will be faster than calculating multiple statistics and writing them into multiple collections (plus the calculation can be done away from peak hours). Secondly, some statistics can only be meaningfully calculated against the entire set of data. Calculating how many unique users played yesterday can't easily be done as the hits come in (it can, but it requires a lot of temporary storage and extra queries).
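To make the unique-users point concrete, here is a small sketch in plain JavaScript. The hits and the `userid` values are made up for illustration. A plain counter only needs an increment per hit, while a unique count has to remember every user already seen, which is the temporary storage mentioned above.

```javascript
// Hypothetical hits for one game on one day; userid is an assumed extra field.
const hits = [
  { game_id: 1, userid: "a" },
  { game_id: 1, userid: "b" },
  { game_id: 1, userid: "a" },
];

let hitCount = 0;            // what an $inc-style counter tracks
const seenUsers = new Set(); // extra state needed to deduplicate users

for (const hit of hits) {
  hitCount += 1;
  seenUsers.add(hit.userid);
}

const uniqueUsers = seenUsers.size;
console.log(hitCount, uniqueUsers); // 3 2
```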

All that said, your point is particularly relevant to the work we did on mogade, because one of the statistics proved difficult to do, and so we added extra information to hits on insert. So I'd agree with you that even if you can't do it all when the hit happens, sometimes you have to do some of it.

Johan Ismael • 13 years ago

The clearest explanation of Map-Reduce I've seen so far!

Michael Woloszynowicz • 13 years ago

Thanks for the helpful post. It's interesting to see MapReduce applied directly to MongoDB. Although it would require a bit of code, it wouldn't be much work to tie this in with Hadoop to get a distributed system.

Brendan McAdams • 13 years ago

We actually provide a Hadoop integration layer for Mongo which can read from MongoDB into Hadoop and write from Hadoop out to MongoDB:

https://github.com/mongodb/...

Michael Woloszynowicz • 13 years ago

Fantastic, thanks Brendan!

Karl Seguin • 13 years ago

Agreed. Could definitely come in useful.