Saturday, 20 August 2011

What's wrong with DateTime anyway?

A few times after tweeting about Noda Time, people have asked why they should use Noda Time - they believe that the .NET date and time support is already good enough. Now obviously I haven't seen their code, but I suspect that pretty much any code base doing any work with dates and times will be clearer using Noda Time - and quite possibly more correct, due to the way that Noda Time forces you into making some decisions which are murky in .NET. This post is about the shortcomings of the .NET date and time API. Obviously I'm biased, and I hope this post isn't seen as disrespectful to the BCL team - aside from anything else, they work under a different set of constraints regarding COM interop etc.

What's does a DateTime mean?

When there's a Stack Overflow question about DateTime not doing quite what the questioner expected, I often find myself wondering what a particular value is meant to represent, exactly. It sounds simple - it's a date and time, right? But it gets rather more complicated as soon as you start thinking about it more carefully. For example, assuming the clock doesn't tick between the two property invocations, what should the value of "mystery" be in the following snippet?

DateTime utc = DateTime.UtcNow;
DateTime local = DateTime.Now;
bool mystery = local == utc;

I honestly don't know what this will do. There are three options which all make a certain amount of sense:

  • It should always be true: the two values are associated with the same instant in time, it's just that one is expressed locally and one is expressed universally
  • It should always be false: the two values represent different kinds of data, so are automatically unequal
  • It should return true if your local time zone is currently in sync with UTC, i.e. when time zones are disregarded completely, the two values are equal

I don't care much what the actual behaviour is - the fact that the behaviour is unobvious is a symptom of a deeper problem. It all comes back to the DateTime.Kind property which allows a DateTime to represent one of three kinds of value:

  • DateTimeKind.Utc: A UTC date and time
  • DateTimeKind.Local: A date and time which is a local time for the system the code is executing on
  • DateTimeKind.Unspecified: Um, tricky. Depends on what you do with it.

The value of the property affects various different operations in different ways. For example, if you call ToUniversalTime() on an "unspecified" DateTime, it will assume that you really meant it as a local value before. On the other hand, if you call ToLocalTime() on an "unspecified" DateTime, it will assume that you really meant it as a UTC value before. That's one model of behaviour.

If you construct a DateTimeOffset from a DateTime and a TimeSpan, the behaviour is somewhat different:

  • A UTC value is simple - you've given it UTC, and you want to represent "UTC + the specified offset"
  • A local value is only sometimes valid: the constructor validates that the offset from UTC at the specified local time in the system default time zone is the same as the offset you've specified.
  • An unspecified value is always valid, and represents the local time in some unspecified time zone, such that the offset is valid at the time.

I don't know about you, but this sort of thing gives me the semantic heebie-jeebies. It's like having a "number" type which has a sequence of digits - but you have to ask another property whether those digits are hex or decimal, and the answer can sometimes be "Well, what do you think?"

Of course, in .NET 1.1, DateTimeKind didn't even exist. This didn't mean the problem didn't exist - it means that the confusing behaviour which tries to make sense of a type which represents different kinds of value couldn't even try to be consistent. It had to be based on the context: it was as if it were permanently Unspecified.

Doesn't DateTimeOffset fix this?

Okay, so now we know we don't like DateTime much. Does DateTimeOffset help us? Well, somewhat. A DateTimeOffset value has a very definite meaning: it's a local date and time with a specific offset from UTC. I should probably take a moment to explain what I mean by "local" date and times - and instants - at this point.

A local date and time isn't tied to any particular time zone. At this moment, is it before or after "10pm on August 20th 2011"? It depends where you are in the world. (I'm leaving aside any non-ISO calendar representations for the moment, by the way.) So a DateTimeOffset contains a time-zone-independent component (that "10pm on ..." part) but also an offset from UTC - which means it can be converted to an instant on what I think of as the time line. Ignoring relativity, everyone on the planet experiences a a particular instant simultaneously. If I click my fingers (infinitely quickly!) then any particular event in the universe happened before that instant, at that instant or after that instant. Whether you were in a particular time zone or not is irrelevant. In that respect instants are global compared to the local date and time which any particular individual may have observed at a particular instant.

(Still with me? Keep going - I'm hoping that the previous paragraph will end up being the hardest in this post. It's a hugely important conceptual point though.)

So a DateTimeOffset maps to an instant, but also deals with a local date and time. That means it's not really an ideal type if we only want to represent a local date and time - but then neither is DateTime. A DateTime with a kind of DateTimeKind.Local isn't really local in the same sense - it's tied to the default time zone of the system it's running on. A DateTime with a kind of DateTimeKind.Unspecified is closer in some cases - such as when constructing a DateTimeOffset - but the semantics are odd in other cases, as described above. So neither DateTimeOffset nor DateTime are good types to use for genuinely local date and time values.

DateTimeOffset also isn't a good type to use if you want to tie yourself to a specific time zone, because it has no idea of the time zone which gave the relevant offset in the first place. As of .NET 3.5 there's a pretty reasonable TimeZoneInfo class, but no type which talks about "a local time in a particular time zone". So with DateTimeOffset you know what that particular time is in some unspecified time zone, but you don't know what the local time will be a minute later, as the offset for that time zone could change (usually due to daylight saving time changes).

What about dates and times?

So far I've only been talking about "date and time" values. What about date values and time values - values which only have one component or the other. It's more common to want to represent a date than a time, but both are common enough to be worth considering.

Now yes, you can use a DateTime for a date - heck, there's even the DateTime.Date property which will return the date for a particular date and time... but as another DateTime which happens to be at midnight. That's not at all the same as having a separate type which is readily identifiable as "just a date" (and likewise "just a time of day" - .NET uses TimeSpan for that, which again doesn't really feel quite right to me).

What about time zones themselves? Surely TimeZoneInfo is fine there.

As I said before, TimeZoneInfo isn't bad. It suffers from two major problems and some minor ones:

First, it's all based on Windows time zone IDs. That's natural enough - but it's not what the rest of the world uses. Every non-Windows system I've ever seen is based on the Olson (aka tz aka zoneinfo) time zone database, and the IDs assigned there. You may have seen IDs such as "Europe/London" or "America/Los_Angeles" - those are Olson identifiers. Talk to a web service offering geo information, chances are it'll talk in Olson identifiers. Interact with another calendaring system, chances are it'll talk in Olson identifiers. Now there are problems there too in terms of identifier stability, which the Unicode Consortium tries to address with CLDR... but at least you've got a good chance. It would be nice if TimeZoneInfo offered some kind of mapping between the two identifier schemes, or somewhere else in .NET did. (Noda Time knows about both sets of identifiers, although the mapping isn't publicly accessible just yet. This will be fixed before release.)

Second, it's based on DateTime and DateTimeOffset, which means you've got to be careful when you use it - if you assume one kind of DateTime when you're actually giving or receiving another kind, you may have problems. It's reasonably well documented, but frankly explaining this sort of thing is intrinsically hard enough without having to put everything in terms which are inconsistent.

Then there are a few issues around ambiguous or invalid local date and time values. These occur due to daylight saving changes: if the clock goes forward (e.g. from 1am to 2am) that introduces some invalid local date and time values (e.g. 1.30am doesn't occur on that day). If the clock goes backward (e.g. from 2am to 1am) that introduces ambiguities: 1.30am occurs twice. You can explicitly ask TimeZoneInfo whether a particular value is invalid or ambiguous, but it's easy to miss that it's even a possibility. If you try to convert a local value to a UTC value via a time zone, it will throw an exception if it's invalid but silently assume standard time (as opposed to daylight saving time) if it's ambiguous. That sort of decision leads developers to not even consider the possibilities involved. Speaking of which...

This all sounds too complicated.

You may be thinking at this point, "You're making a big deal out of nothing. I don't want to think about this stuff - why are you trying to make everything so complicated? I've been using the .NET API for years, and not had problems." If so, I suspect there are three broad possibilities:

  • You're far, far smarter than I am, and understood all of these intricacies through intuition. Your code always makes use of the right kind of DateTime, uses DateTimeOffset appropriately, and will always do the right thing with invalid or ambiguous local date and time values. No doubt you also write lock-free multi-threaded code sharing state in a way which is as efficient as possible but still rock solid. What the heck are you doing reading this in the first place?
  • You have run into these issues, but have mostly forgotten them - after all, they've only sucked away 10 minutes of your life at a time, as you experimented to get something that appeared to work (or at least made the unit tests pass; the unit tests which may well be conceptually wrong too). Maybe you've wondered about it, but decided that the problem was with you rather than the API.
  • You've never seen the problems, but only because you don't bother testing your code, which has so far only ever run in a single time zone, on computers which are always turned off at night (thus missing all daylight saving transitions). In some ways you're lucky, but you've got a time zone.

Okay, that was somewhat facetious, but it really is a problem. If you've never really thought about the difference between "local" times and "global" instants before, you should have done. It's an important distinction - similar to the distinction between binary floating point and decimal floating point types. Failures can be subtle, hard to diagnose, hard to explain, hard to correct, pervasive, and easy to reintroduce at another point of the program.

Handling date and time values is intrinsically tricky. There are nasty cases to think about like days which don't start at midnight due to daylight saving changes (for example, Sunday October 17th 2010 in Brazil started at 1am). If you're particularly unlucky you'll have to work with multiple calendar systems (Gregorian, Julian, Coptic, Buddhist etc). If you deal with dates and time around the start of the 20th century you may see some very odd time zone transitions as countries went from strictly-longitudinal offsets to mostly "round" values (e.g. Paris in 1911). You may need to deal with governments changing time zone transitions with only a couple of weeks' notice. You may need to deal with time zone identifiers changing (e.g. Asia/Calcutta to Asia/Kolcata).

All of this is on top of the actual business rules you're trying to implement, of course. They may be complicated too. Given all this complexity, you should at least have an API which allows you to express what you mean relatively clearly.

So is Noda Time perfect then?

Of course not. Noda Time suffers several problems:

  • Despite all of the above, I'm a rank amateur when it comes to the theory of date and time. Leap seconds baffle me. The thought of a Julian-Gregorian calendar with a cutover point makes me want to cry, which is why I haven't quite implemented it yet. As far as I'm aware, no-one involved in Noda Time is an expert - although Stephen Colebourne, the author of Joda Time and lead of JSR-310 lurks on the mailing list. (Point of trivia: He was present at my first presentation on Noda Time. I asked if anyone happened to know the difference between the Gregorian calendar and the ISO-8601 calendar. He raised his hand and gave the correct answer, obviously. I asked how he happened to know it, and he replied, "I'm Stephen Colebourne." I nearly collapsed.)
  • We haven't finished yet. A beautifully designed API is useless if it isn't implemented.
  • There are bound to be bugs - the BCL team's code is exercised on hundreds of thousands of machines around the world all the time. Errors are likely to be picked up quickly.
  • We don't have any resources - we're a small group of active developers doing this for fun. I'm not saying that for pity (it's great fun) but for the inevitable issues around the amount of time that can be spent on features, documentation etc.
  • We're not part of the BCL. Want to use Noda Time in a LINQ to SQL (or even NHibernate) query? Good luck with that. Even if we succeed beyond my expectations, I'm not expecting other open source projects to take a dependency on us for ages.

Having said that, I am pleased with the overall design. We've tried to keep a balance between flexibility and providing one simple way of achieving any particular goal (with more to do, of course). I'll write another post some time about the design style we've been gradually evolving towards, comparing it with both Joda Time and .NET. The best outcome is the set of types to come out of it, each of which has a reasonably clear role. I won't bore you with all the details here - see other posts, documentation etc.

Ironically, the best outcome for the world would probably be for the BCL team to pick up on this post and decide to overhaul the API radically for .NET 6 (I'm assuming the ship has effectively sailed on .NET 5). While I'm enjoying doing this, I'm sure there are other projects I'd enjoy too - and frankly date and time is too important a concept to rest on my shoulders for the .NET community for long.

Conclusion

I hope I've persuaded you that the .NET API has significant flaws. I may have also persuaded you that Noda Time is worth looking at more closely, but that's a secondary goal really. If you truly understand the flaws in the built-in types - in particular the semantic ambiguity around DateTime - then you're more likely to use those types carefully and accurately in your code. That alone is enough to make me happy.

16 comments:

  1. Excellent post, Jon!

    I have encountered many of the issues you discussed. After diving into DateTime craziness for hours, I gave into a migraine and decided to acknowledge the problem existed and leave it at that. "Let Future Jim worry about it."

    Looks like I'm Future Jim now...

    ReplyDelete
    Replies
    1. I know... I always hate past-me for not taking these issues seriously. :)

      Delete
  2. "... the best outcome for the world would probably be for the BCL team to pick up on this post..."

    You're THE Jon Skeet. You don't have a red phone with them already? I would have expected you to chat regularly with Anders regarding the design of C#.Next :)

    ReplyDelete
  3. Great post Jon, glad you are doing this for us because I've had som much problems with dates in .NET. Some of them never solved because the underlying API is just wrong on so many levels.

    Thanks for this, really looking forward to using NodaTime.

    ReplyDelete
  4. I'm glad it's not just me who's struggling with the .NET time API. Indeed just recently I spent an hour or so (as I seem to do once every year or so) making sure I understand the bits of it that I'm using. Seems like each time I do this I realize that I know just enough to be dangerous ;-)

    The Noda factoring of this makes a lot of sense to me. Kudos to you guys for taking on this task!

    ReplyDelete
  5. Thanks a lot for this post, Jon.
    By the way, can you share some links that explains theory of date and time? I'm sure you're said you're an amateur in these things just because you're humble

    ReplyDelete
  6. The concept of dealing with datetime and timezones is quite interesting. But, I've not found any examples to use this Noda-Time; almost no samples or examples.

    Please provide the examples if possible.

    ReplyDelete
  7. This is just my two cents.. I haven't given this topic nearly as much thought. Would it be useful to analyze the problem not at the level of class library artifacts but at the level of programming problems and solution design? For instance, to define what classes of functionality require what levels of functionality to operate properly. I feel like time falls into the bucket of "cross cutting concern" although this is an abstract thought - it seems like maybe it's too hard to make a one-size-fits-all solution and ultimately something more like a guidance document would be most useful to programmers at large...

    ReplyDelete
  8. To answer the mystery: Depends on your current time zone. If timezone is UTC then its true. Otherwise its false.

    ReplyDelete
  9. Great thing going here Jon. Have you thought of how you'll serialize/store this data (Any specific format)?

    If you're not sure why Dates and TimeZones are touchh...
    take your Birthdate. If you were born at 1:03am January 1, in a State/Province (or country) that spans more than 2 time zones, A government system might record that you were born on the 31st of December the previous year depending on how they save their dates. Sure they save the TimeZone offset so "Realistically" you were born on Jan 1st. But then that system exports your BirthTime to another application and forgets to put the TimeZone on it because it's a Legacy system... (like a downhill rolling snowball of yellow snow).

    ReplyDelete
  10. I have never had an issue with the DateTime struct in .NET. However, I have always made it a practice to work with UTC time exclusively, and only convert it to local time when displaying information to the user. I also handle offsets and time zone differences myself as well. This seems to avoid almost all localization issues/issues you outlined in your article. However, since I have never used DateTimeOffset or TimeZoneInfo, these issues may present themselves the second I ever try to let .NET manage my times.

    I actually wrote a rather useful class called DateTimeUTC (made this during a project where different times had to be compared at any given point anywhere in the world) which uses DateTime and TimeSpan as its backbone, but it follows my general DateTime practice above, and I haven't had any of the issues outlined in the article. I'll see if I can find the source sometime and post it.

    ReplyDelete
  11. This comment has been removed by the author.

    ReplyDelete
  12. Jon,
    How to store ZonedDateTime values to SQL Server? (Sorry for off-topic)

    ReplyDelete
  13. what's the difference between the Gregorian calendar and the ISO-8601 calendar ?

    ReplyDelete