Adam Fields (work stuff)

This is my blog about work stuff.

See this post for discussion of what this blog is about and what I do.

I am often available for consulting work, and always happy to discuss it even if I'm currently very busy. Email me or find me on Twitter (@fields) if you need something.

Fri, Oct 8th, 2010

How to use MongoDB to collect summary stats

When you have a lot of disparate jobs running in separate processes across many machines, it’s really helpful to collect stats from them about how they’re doing. I’ve found MongoDB to be very helpful for this. For my purposes, storing daily counts is sufficient and keeps the collection from getting too big. What makes this particularly easy is the $inc update operator: it takes a set of field/amount pairs, and for each one it adds the amount to the value already stored in the document for that field, treating a missing field as 0. With the Ruby driver you can pass a plain hash of counters straight to $inc, so each key of the hash becomes an attribute of the document (other drivers may need a different representation). Specifying an upsert creates the day’s document if it doesn’t exist yet, so the first process to report on a given day needs no special handling. Here’s an example:

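require 'date'   ## for Time#to_date
require 'mongo'

## connect to the host named "mongodb" and pick the stats database
mdb = Mongo::Connection.new("mongodb").db("stats")
stat_coll = mdb.collection("daily_stats")

## initialize the hash with a default value for each key as 0
stats = Hash.new(0)
## do your stuff, and set the hash values as you go
## (some_loop is a placeholder for whatever your job iterates over)
some_loop.each{|thing|
  stats['thing1'] += 1 if thing == 1
  stats['thing2'] += 1 if thing == 2
}

## then pass the whole bunch to the db in one operation, keyed on midnight today
## (skipped when nothing was counted, since the server may reject an empty $inc)
stat_coll.update({'date' => Time.now.to_date.to_time}, {'$inc' => stats}, {:upsert => true}) unless stats.empty?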
mdb.connection.close

You can easily query the collection from the mongo shell for the results from the past few days, including only the fields you’re interested in. Sort by date so the order is predictable, then skip ahead as needed to ignore earlier entries:

db.daily_stats.find({}, {date: 1, thing1: 1, thing2: 1}).sort({date: 1}).skip(20);
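
If it helps to see why several processes can report into the same daily document without stepping on each other, here is a rough pure-Ruby model of what an upsert with $inc does. It is only a sketch of the semantics, not the driver or the server: `day_key` and `upsert_inc` are hypothetical helpers made up for illustration, and the "collection" is just an array of hashes.

```ruby
# Hypothetical helper: truncate a Time to local midnight, the same kind of
# per-day key the post builds with Time.now.to_date.to_time.
def day_key(time)
  Time.local(time.year, time.month, time.day)
end

# Hypothetical in-memory stand-in for update(selector, {'$inc' => ...},
# {:upsert => true}). `docs` is a plain array of hashes playing the collection.
def upsert_inc(docs, selector, increments)
  # find the first document whose fields match every selector field
  doc = docs.find { |d| selector.all? { |k, v| d[k] == v } }
  if doc.nil?
    # upsert: nothing matched, so start a new document from the selector
    doc = selector.dup
    docs << doc
  end
  # $inc: add each amount to the stored value; a missing field counts as 0
  increments.each { |field, amount| doc[field] = (doc[field] || 0) + amount }
  doc
end

docs  = []
today = day_key(Time.now)

# Two separate jobs reporting on the same day, with overlapping counters
upsert_inc(docs, { 'date' => today }, { 'thing1' => 3, 'thing2' => 1 })
upsert_inc(docs, { 'date' => today }, { 'thing1' => 2, 'thing3' => 5 })

puts docs.inspect
```

After both calls there is still a single document for the day, with thing1 at 5, thing2 at 1, and thing3 at 5. The real update does this on the server for each matching document, so jobs never need to read the current counts before adding to them.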
