
When aggregating stats in this manner (by Day) how do people deal with Time Zones?

For instance, if I have one user in, say, NZST, their "Tuesday, 22 February" is still "Monday, 21 February" in PST - and the real issue is that the buckets are off. So you can't just store in UTC and then move it by whatever timezone offset, as then you are grabbing different "buckets".

I don't think that explanation is very clear (I had to draw a diagram to figure it out myself). Hopefully someone smarter than I am can figure it out anyway.
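To make the mismatch concrete, here is a minimal Python sketch. It uses a fixed UTC+12 offset for NZST and ignores daylight saving, which is a simplification, and `utc_span_of_local_day` is a hypothetical helper name rather than anything from the article.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC+12 offset for NZST; real code would need DST-aware zones.
NZST = timezone(timedelta(hours=12), "NZST")

def utc_span_of_local_day(year, month, day, tz):
    """Return (start, end) of a local calendar day, expressed in UTC."""
    start_local = datetime(year, month, day, tzinfo=tz)
    end_local = start_local + timedelta(days=1)
    return start_local.astimezone(timezone.utc), end_local.astimezone(timezone.utc)

start, end = utc_span_of_local_day(2011, 2, 22, NZST)
print(start.isoformat())  # 2011-02-21T12:00:00+00:00
print(end.isoformat())    # 2011-02-22T12:00:00+00:00
```

The NZ user's Tuesday covers the second half of the UTC "21 Feb" bucket and the first half of the UTC "22 Feb" bucket, and a daily total cannot be split back into halves after the fact.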

We've worked around it by just storing hour aggregates, but I'm interested in case someone else has a smart solution :)



You're right, when your finest granularity is "per day" you lose any real sense of what a day means to the user (and when it starts and ends).

Right now we don't have a solution. As it turns out (I guess) our users don't mind. In the future we want to provide certain (or all) stats also in the "last 24 hours" (rolling) time frame, which will help. One benefit of our Redis-powered analytics is that they're live. Even if your idea of a "day" is different than ours, you can see the count/totals/averages/etc updating in realtime (relative to the "day" we decide on). So users can get instant feedback if something is happening, but for the most part they only care about day-over-day stats (which makes the TZ matter a lot less).


The simple solution is to store everything in UTC, but that means the coarsest bucket you can store is 30 minutes (some time zones are at a :30 offset, and that still ignores the Chatham Islands, which are at a :45 offset - http://en.wikipedia.org/wiki/Time_zones). It's really annoying and I have wished many times everyone would just use UTC.
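A rough way to see where the 30-minute figure comes from: the bucket size has to divide every UTC offset you want to support. The offsets below are a small hand-picked sample (in minutes, standard time), not a complete list, and `coarsest_bucket_minutes` is a made-up helper for illustration.

```python
from functools import reduce
from math import gcd

# Hand-picked sample of UTC offsets in minutes (standard time); not exhaustive.
SAMPLE_OFFSETS = {
    "UTC": 0,
    "India": 5 * 60 + 30,
    "Chatham Islands": 12 * 60 + 45,
}

def coarsest_bucket_minutes(offsets):
    """Largest bucket size (in minutes, at most 60) that divides every offset."""
    return reduce(gcd, offsets, 60)

half_hour_only = [m for m in SAMPLE_OFFSETS.values() if m % 30 == 0]
print(coarsest_bucket_minutes(half_hour_only))           # 30
print(coarsest_bucket_minutes(SAMPLE_OFFSETS.values()))  # 15
```

So 30-minute buckets cover the :30 zones, and supporting the :45 zones as well would push the bucket down to 15 minutes.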


Flickr decided that UTC would be the default for their stats. As long as you stick to it, it's not that big of a problem.


Not if you're trying to save RAM (and operations to fetch said data) by storing stats per day. You'd need all stats to be per-hour in order for it to work for any timezone.


If you store by day UTC, you'd need two fetches to get a day in some other time zone. But if you store by hour, you need 24 fetches.


They can't in their (Disqus) case because they're aggregating all the stats per day into a single value. I guess they could do it 'per account TZ' since the account name is in the key, but that means TZ calc on each write (not that that will make a huge difference in perf).
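As a sketch of what the "per account TZ" calculation on each write could look like: the key layout `stats:<account>:<date>` and the `daily_key` helper are assumptions for illustration, not the actual Disqus schema, and the fixed offset ignores daylight saving.

```python
from datetime import datetime, timedelta, timezone

def daily_key(account, event_time_utc, account_tz):
    """Key for the daily bucket, dated by the account's local calendar day."""
    local = event_time_utc.astimezone(account_tz)
    return f"stats:{account}:{local:%Y-%m-%d}"

nzst = timezone(timedelta(hours=12))  # fixed offset for illustration, no DST
event = datetime(2011, 2, 21, 13, 0, tzinfo=timezone.utc)
print(daily_key("acme", event, nzst))          # stats:acme:2011-02-22
print(daily_key("acme", event, timezone.utc))  # stats:acme:2011-02-21
```

The same event lands in a different daily bucket depending on the account's zone, so the zone is effectively baked in at write time and cannot be changed later without re-aggregating.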

For a generic solution with easy TZ calc, you need to aggregate your stats into hourly values instead (or half hourly if you care about those wacky non-aligned timezones). The increased fetches don't matter because you can just mget them.
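A minimal sketch of the hourly approach, assuming a hypothetical key layout of `stats:<account>:<UTC hour>` and a fixed whole-hour offset. `hourly_keys`, `day_total` and `FakeStore` are illustrative names; `FakeStore` only stands in for a Redis client so the example runs on its own.

```python
from datetime import datetime, timedelta, timezone

def hourly_keys(account, year, month, day, tz):
    """Keys of the 24 hourly UTC buckets covering one local calendar day.

    Assumes a fixed, whole-hour offset; :30 zones would need half-hour buckets.
    """
    start = datetime(year, month, day, tzinfo=tz).astimezone(timezone.utc)
    return [
        f"stats:{account}:{(start + timedelta(hours=h)).strftime('%Y-%m-%dT%H')}"
        for h in range(24)
    ]

def day_total(store, keys):
    """Sum the buckets with one MGET; missing keys come back as None."""
    return sum(int(v) for v in store.mget(keys) if v is not None)

class FakeStore:
    """In-memory stand-in for a Redis client, only for this demo."""
    def __init__(self, data):
        self.data = data

    def mget(self, keys):
        return [self.data.get(k) for k in keys]

nzst = timezone(timedelta(hours=12))  # illustrative fixed offset, no DST
keys = hourly_keys("acme", 2011, 2, 22, nzst)
store = FakeStore({keys[0]: "3", keys[23]: "4", "stats:acme:2011-02-22T12": "100"})
print(keys[0], keys[-1])       # stats:acme:2011-02-21T12 stats:acme:2011-02-22T11
print(day_total(store, keys))  # 7
```

With a real client such as redis-py, `mget(keys)` has the same shape, so the 24 buckets cost one round trip rather than 24.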



