8th
2010
How to use MongoDB to collect summary stats
When you have a lot of disparate jobs running in separate processes across many different machines, it’s really helpful to collect stats from them about how they’re doing. I’ve found MongoDB to be very helpful for this. For my purposes, storing daily counts is sufficient and keeps the collection from getting too big. What makes this particularly easy is the $inc update operator: when you pass it a hash, MongoDB treats each key as a field of the document and adds the hash’s value for that key to the value already stored in that field, creating the field if it is missing. (I’ve only used this through the Ruby driver; other drivers may expose it differently.) Specifying an upsert creates the day’s document if it doesn’t exist yet. Here’s an example:
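require 'date'  # provides Time#to_date
require 'mongo'
## "mongodb" is the hostname of the database server
mdb = Mongo::Connection.new("mongodb").db("stats")
stat_coll = mdb.collection("daily_stats")
## initialize the hash with a default value of 0 for each key
stats = Hash.new(0)
## do your stuff, and bump the hash values as you go (some_loop is a placeholder)
some_loop.each do |thing|
  stats['thing1'] += 1 if thing == 1
  stats['thing2'] += 1 if thing == 2
end
## then pass the whole bunch to the db in one operation, keyed on today's local midnight;
## skip the update when nothing was counted, since there is nothing to $inc
stat_coll.update({'date' => Time.now.to_date.to_time}, {'$inc' => stats}, {:upsert => true}) unless stats.empty?
mdb.connection.close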
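To make the merging behavior concrete, here is a rough in-memory sketch of what an upserted $inc does to a document. It uses plain Ruby hashes and does not touch MongoDB at all; inc_upsert is a hypothetical helper written only for illustration, and the date string is made up.

```ruby
# In-memory illustration only: an Array of Hashes stands in for the collection.
# inc_upsert is a hypothetical helper, not part of any MongoDB driver.
def inc_upsert(collection, selector, increments)
  # Find the first document whose fields match every key in the selector.
  doc = collection.find { |d| selector.all? { |k, v| d[k] == v } }
  if doc.nil?
    # Upsert: nothing matched, so start a new document from the selector.
    doc = selector.dup
    collection << doc
  end
  # $inc: add each amount to the stored value, treating a missing field as 0.
  increments.each { |field, amount| doc[field] = (doc[field] || 0) + amount }
  doc
end

docs = []
day = '2024-01-15' # made-up day key for the example
inc_upsert(docs, { 'date' => day }, { 'thing1' => 3, 'thing2' => 1 })
inc_upsert(docs, { 'date' => day }, { 'thing1' => 2, 'thing3' => 5 })
p docs
```

Two jobs reporting on the same day end up in a single document, with counts for shared keys added together and new keys appearing as they are first reported.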
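You can easily query the collection from the mongo shell for the results from the past few days, including only the fields you’re interested in and skipping ahead as needed to ignore earlier entries. Without an explicit sort, skip() follows the collection’s natural order, so chain .sort({date: 1}) onto the query if you need the days in guaranteed date order: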
db.daily_stats.find({},{date:1, thing1:1, thing2:1}).skip(20);
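If counting how many documents to skip gets tedious, a date-range selector is another way to get the past few days. This is a minimal sketch in Ruby, assuming the 'date' field holds local midnight as in the update example; recent_days_selector is a hypothetical helper name, and the commented-out find call assumes the same old-style Ruby driver and its :fields option.

```ruby
require 'date' # provides Time#to_date and Date#to_time

# Hypothetical helper: build a selector matching the daily documents
# for the last `days` days, counting today as the first one.
def recent_days_selector(days, now = Time.now)
  # Date arithmetic avoids daylight-saving surprises; to_time gives local midnight.
  cutoff = (now.to_date - (days - 1)).to_time
  { 'date' => { '$gte' => cutoff } }
end

selector = recent_days_selector(3)
p selector
# Assuming the collection handle from the update example, this might be used as:
#   stat_coll.find(selector, :fields => ['date', 'thing1', 'thing2']).each { |doc| p doc }
```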