Tuesday, 26 November 2013

Noda Time v1.2.0 released

Somewhat tardily, I'm happy to announce the release of Noda Time 1.2.0, which we released last Monday.

While the changes in Noda Time 1.1 were around making a Portable Class Library version and filling in the gaps from the first release, Noda Time 1.2 is all about serialization and text formatting.

On the serialization side, Noda Time now supports XML and binary serialization natively, and comes with an optional assembly to handle JSON serialization (using Json.NET). On the text formatting side, Noda Time 1.2 now properly supports formatting and parsing of the Duration, OffsetDateTime, and ZonedDateTime types.

We also fixed a few bugs, and added a some more convenience methods — Interval.Contains() and ZonedDateTime.Calendar, among others — in response to requests we received from people using the library.

Finally, it apparently wouldn’t be a proper Noda Time major release without fixing another spelling mistake in our API: we replaced Period.Millseconds in 1.1, but managed not to spot that we’d also misspelled Era.AnnoMartyrm, the era used in the Coptic calendar. That’s fixed in 1.2, and I think (hope) that we’re done now.

There’s more information about all of the above in the comprehensive serialization section of the user guide, the pattern documentation for the Duration, OffsetDateTime, and ZonedDateTime types, and the 1.2.0 release notes.

You can pick up Noda Time 1.2.0 from the NuGet repository as usual (core, testing, JSON support packages), or from the links on the Noda Time home page, which also hosts the User Guide and API reference.

Saturday, 6 April 2013

Noda Time v1.1.0 released

I'm pleased to announce the release of version 1.1.0 of Noda Time. The primary new feature is a Portable Class Library version (in the same package) which allows you to use Noda Time when writing applications for Windows Store, Windows Phone 7 and Windows Phone 8. There are additional features around the time zone data available from TZDB, including location information and fuller Windows time zone ID mappings... and a few other bits and bobs, as you might expect.

Enjoy!

Wednesday, 7 November 2012

Noda Time v1.0 released

Go get Noda Time 1.0!

Today is the end of the longest release cycle I've been personally involved in. On November 5th 2009, I announced my intention to write a port of Joda Time for .NET. The next day, Noda Time was born - with a lofty (foolhardy) set of targets.

Near the end of a talk *about* Noda Time this evening, I released Noda Time 1.0.0.

It's taken three years, but I'm immensely proud of what we've managed to achieve. We're far from "done" but I believe we're already significantly ahead of most other date/time APIs I've seen in terms of providing a clean API which reduces *incidental* complexity while highlighting the *inherent* complexity of the domain. (This is a theme I'm becoming dogmatic about on various fronts.)

There's more to do - I can't see myself considering Noda Time to be "done" any time soon - but hopefully now we've got a stable release, we can start to build user momentum.

One point I raised at the DotNetDevNet presentation tonight was that there's a definite benefit (in my very biased view) in just *looking into* Noda Time:

  • If you can't use it in your production code, use it when prototyping
  • If you can't use it in your prototype code, play with it in personal projects
  • If you can't use it in personal projects, read the user guide to understand the concepts

I hope that simply looking at the various types that Noda Time providers will give you more insight into how you should be thinking about date and time handling in your code. While the BCL API has a lot of flaws, you can work around most of them if you make it crystal clear what your data means at every step. The type system will leave that largely ambiguous, but there's nothing to stop you from naming your variables descriptively, and adding appropriate comments.

Of course, I would far prefer it if you'd start using Noda Time and raising issues on how to make it better. Spread the word.

Oh, and if anyone from the BCL team is reading this and would like to include something like Noda Time into .NET 5 as a "next generation" date/time, I'd be *really* interested in talking to you :)

Saturday, 20 August 2011

What's wrong with DateTime anyway?

A few times after tweeting about Noda Time, people have asked why they should use Noda Time - they believe that the .NET date and time support is already good enough. Now obviously I haven't seen their code, but I suspect that pretty much any code base doing any work with dates and times will be clearer using Noda Time - and quite possibly more correct, due to the way that Noda Time forces you into making some decisions which are murky in .NET. This post is about the shortcomings of the .NET date and time API. Obviously I'm biased, and I hope this post isn't seen as disrespectful to the BCL team - aside from anything else, they work under a different set of constraints regarding COM interop etc.

What's does a DateTime mean?

When there's a Stack Overflow question about DateTime not doing quite what the questioner expected, I often find myself wondering what a particular value is meant to represent, exactly. It sounds simple - it's a date and time, right? But it gets rather more complicated as soon as you start thinking about it more carefully. For example, assuming the clock doesn't tick between the two property invocations, what should the value of "mystery" be in the following snippet?

DateTime utc = DateTime.UtcNow;
DateTime local = DateTime.Now;
bool mystery = local == utc;

I honestly don't know what this will do. There are three options which all make a certain amount of sense:

  • It should always be true: the two values are associated with the same instant in time, it's just that one is expressed locally and one is expressed universally
  • It should always be false: the two values represent different kinds of data, so are automatically unequal
  • It should return true if your local time zone is currently in sync with UTC, i.e. when time zones are disregarded completely, the two values are equal

I don't care much what the actual behaviour is - the fact that the behaviour is unobvious is a symptom of a deeper problem. It all comes back to the DateTime.Kind property which allows a DateTime to represent one of three kinds of value:

  • DateTimeKind.Utc: A UTC date and time
  • DateTimeKind.Local: A date and time which is a local time for the system the code is executing on
  • DateTimeKind.Unspecified: Um, tricky. Depends on what you do with it.

The value of the property affects various different operations in different ways. For example, if you call ToUniversalTime() on an "unspecified" DateTime, it will assume that you really meant it as a local value before. On the other hand, if you call ToLocalTime() on an "unspecified" DateTime, it will assume that you really meant it as a UTC value before. That's one model of behaviour.

If you construct a DateTimeOffset from a DateTime and a TimeSpan, the behaviour is somewhat different:

  • A UTC value is simple - you've given it UTC, and you want to represent "UTC + the specified offset"
  • A local value is only sometimes valid: the constructor validates that the offset from UTC at the specified local time in the system default time zone is the same as the offset you've specified.
  • An unspecified value is always valid, and represents the local time in some unspecified time zone, such that the offset is valid at the time.

I don't know about you, but this sort of thing gives me the semantic heebie-jeebies. It's like having a "number" type which has a sequence of digits - but you have to ask another property whether those digits are hex or decimal, and the answer can sometimes be "Well, what do you think?"

Of course, in .NET 1.1, DateTimeKind didn't even exist. This didn't mean the problem didn't exist - it means that the confusing behaviour which tries to make sense of a type which represents different kinds of value couldn't even try to be consistent. It had to be based on the context: it was as if it were permanently Unspecified.

Doesn't DateTimeOffset fix this?

Okay, so now we know we don't like DateTime much. Does DateTimeOffset help us? Well, somewhat. A DateTimeOffset value has a very definite meaning: it's a local date and time with a specific offset from UTC. I should probably take a moment to explain what I mean by "local" date and times - and instants - at this point.

A local date and time isn't tied to any particular time zone. At this moment, is it before or after "10pm on August 20th 2011"? It depends where you are in the world. (I'm leaving aside any non-ISO calendar representations for the moment, by the way.) So a DateTimeOffset contains a time-zone-independent component (that "10pm on ..." part) but also an offset from UTC - which means it can be converted to an instant on what I think of as the time line. Ignoring relativity, everyone on the planet experiences a a particular instant simultaneously. If I click my fingers (infinitely quickly!) then any particular event in the universe happened before that instant, at that instant or after that instant. Whether you were in a particular time zone or not is irrelevant. In that respect instants are global compared to the local date and time which any particular individual may have observed at a particular instant.

(Still with me? Keep going - I'm hoping that the previous paragraph will end up being the hardest in this post. It's a hugely important conceptual point though.)

So a DateTimeOffset maps to an instant, but also deals with a local date and time. That means it's not really an ideal type if we only want to represent a local date and time - but then neither is DateTime. A DateTime with a kind of DateTimeKind.Local isn't really local in the same sense - it's tied to the default time zone of the system it's running on. A DateTime with a kind of DateTimeKind.Unspecified is closer in some cases - such as when constructing a DateTimeOffset - but the semantics are odd in other cases, as described above. So neither DateTimeOffset nor DateTime are good types to use for genuinely local date and time values.

DateTimeOffset also isn't a good type to use if you want to tie yourself to a specific time zone, because it has no idea of the time zone which gave the relevant offset in the first place. As of .NET 3.5 there's a pretty reasonable TimeZoneInfo class, but no type which talks about "a local time in a particular time zone". So with DateTimeOffset you know what that particular time is in some unspecified time zone, but you don't know what the local time will be a minute later, as the offset for that time zone could change (usually due to daylight saving time changes).

What about dates and times?

So far I've only been talking about "date and time" values. What about date values and time values - values which only have one component or the other. It's more common to want to represent a date than a time, but both are common enough to be worth considering.

Now yes, you can use a DateTime for a date - heck, there's even the DateTime.Date property which will return the date for a particular date and time... but as another DateTime which happens to be at midnight. That's not at all the same as having a separate type which is readily identifiable as "just a date" (and likewise "just a time of day" - .NET uses TimeSpan for that, which again doesn't really feel quite right to me).

What about time zones themselves? Surely TimeZoneInfo is fine there.

As I said before, TimeZoneInfo isn't bad. It suffers from two major problems and some minor ones:

First, it's all based on Windows time zone IDs. That's natural enough - but it's not what the rest of the world uses. Every non-Windows system I've ever seen is based on the Olson (aka tz aka zoneinfo) time zone database, and the IDs assigned there. You may have seen IDs such as "Europe/London" or "America/Los_Angeles" - those are Olson identifiers. Talk to a web service offering geo information, chances are it'll talk in Olson identifiers. Interact with another calendaring system, chances are it'll talk in Olson identifiers. Now there are problems there too in terms of identifier stability, which the Unicode Consortium tries to address with CLDR... but at least you've got a good chance. It would be nice if TimeZoneInfo offered some kind of mapping between the two identifier schemes, or somewhere else in .NET did. (Noda Time knows about both sets of identifiers, although the mapping isn't publicly accessible just yet. This will be fixed before release.)

Second, it's based on DateTime and DateTimeOffset, which means you've got to be careful when you use it - if you assume one kind of DateTime when you're actually giving or receiving another kind, you may have problems. It's reasonably well documented, but frankly explaining this sort of thing is intrinsically hard enough without having to put everything in terms which are inconsistent.

Then there are a few issues around ambiguous or invalid local date and time values. These occur due to daylight saving changes: if the clock goes forward (e.g. from 1am to 2am) that introduces some invalid local date and time values (e.g. 1.30am doesn't occur on that day). If the clock goes backward (e.g. from 2am to 1am) that introduces ambiguities: 1.30am occurs twice. You can explicitly ask TimeZoneInfo whether a particular value is invalid or ambiguous, but it's easy to miss that it's even a possibility. If you try to convert a local value to a UTC value via a time zone, it will throw an exception if it's invalid but silently assume standard time (as opposed to daylight saving time) if it's ambiguous. That sort of decision leads developers to not even consider the possibilities involved. Speaking of which...

This all sounds too complicated.

You may be thinking at this point, "You're making a big deal out of nothing. I don't want to think about this stuff - why are you trying to make everything so complicated? I've been using the .NET API for years, and not had problems." If so, I suspect there are three broad possibilities:

  • You're far, far smarter than I am, and understood all of these intricacies through intuition. Your code always makes use of the right kind of DateTime, uses DateTimeOffset appropriately, and will always do the right thing with invalid or ambiguous local date and time values. No doubt you also write lock-free multi-threaded code sharing state in a way which is as efficient as possible but still rock solid. What the heck are you doing reading this in the first place?
  • You have run into these issues, but have mostly forgotten them - after all, they've only sucked away 10 minutes of your life at a time, as you experimented to get something that appeared to work (or at least made the unit tests pass; the unit tests which may well be conceptually wrong too). Maybe you've wondered about it, but decided that the problem was with you rather than the API.
  • You've never seen the problems, but only because you don't bother testing your code, which has so far only ever run in a single time zone, on computers which are always turned off at night (thus missing all daylight saving transitions). In some ways you're lucky, but you've got a time zone.

Okay, that was somewhat facetious, but it really is a problem. If you've never really thought about the difference between "local" times and "global" instants before, you should have done. It's an important distinction - similar to the distinction between binary floating point and decimal floating point types. Failures can be subtle, hard to diagnose, hard to explain, hard to correct, pervasive, and easy to reintroduce at another point of the program.

Handling date and time values is intrinsically tricky. There are nasty cases to think about like days which don't start at midnight due to daylight saving changes (for example, Sunday October 17th 2010 in Brazil started at 1am). If you're particularly unlucky you'll have to work with multiple calendar systems (Gregorian, Julian, Coptic, Buddhist etc). If you deal with dates and time around the start of the 20th century you may see some very odd time zone transitions as countries went from strictly-longitudinal offsets to mostly "round" values (e.g. Paris in 1911). You may need to deal with governments changing time zone transitions with only a couple of weeks' notice. You may need to deal with time zone identifiers changing (e.g. Asia/Calcutta to Asia/Kolcata).

All of this is on top of the actual business rules you're trying to implement, of course. They may be complicated too. Given all this complexity, you should at least have an API which allows you to express what you mean relatively clearly.

So is Noda Time perfect then?

Of course not. Noda Time suffers several problems:

  • Despite all of the above, I'm a rank amateur when it comes to the theory of date and time. Leap seconds baffle me. The thought of a Julian-Gregorian calendar with a cutover point makes me want to cry, which is why I haven't quite implemented it yet. As far as I'm aware, no-one involved in Noda Time is an expert - although Stephen Colebourne, the author of Joda Time and lead of JSR-310 lurks on the mailing list. (Point of trivia: He was present at my first presentation on Noda Time. I asked if anyone happened to know the difference between the Gregorian calendar and the ISO-8601 calendar. He raised his hand and gave the correct answer, obviously. I asked how he happened to know it, and he replied, "I'm Stephen Colebourne." I nearly collapsed.)
  • We haven't finished yet. A beautifully designed API is useless if it isn't implemented.
  • There are bound to be bugs - the BCL team's code is exercised on hundreds of thousands of machines around the world all the time. Errors are likely to be picked up quickly.
  • We don't have any resources - we're a small group of active developers doing this for fun. I'm not saying that for pity (it's great fun) but for the inevitable issues around the amount of time that can be spent on features, documentation etc.
  • We're not part of the BCL. Want to use Noda Time in a LINQ to SQL (or even NHibernate) query? Good luck with that. Even if we succeed beyond my expectations, I'm not expecting other open source projects to take a dependency on us for ages.

Having said that, I am pleased with the overall design. We've tried to keep a balance between flexibility and providing one simple way of achieving any particular goal (with more to do, of course). I'll write another post some time about the design style we've been gradually evolving towards, comparing it with both Joda Time and .NET. The best outcome is the set of types to come out of it, each of which has a reasonably clear role. I won't bore you with all the details here - see other posts, documentation etc.

Ironically, the best outcome for the world would probably be for the BCL team to pick up on this post and decide to overhaul the API radically for .NET 6 (I'm assuming the ship has effectively sailed on .NET 5). While I'm enjoying doing this, I'm sure there are other projects I'd enjoy too - and frankly date and time is too important a concept to rest on my shoulders for the .NET community for long.

Conclusion

I hope I've persuaded you that the .NET API has significant flaws. I may have also persuaded you that Noda Time is worth looking at more closely, but that's a secondary goal really. If you truly understand the flaws in the built-in types - in particular the semantic ambiguity around DateTime - then you're more likely to use those types carefully and accurately in your code. That alone is enough to make me happy.

Friday, 19 August 2011

First NuGet package of Noda Time available - feedback please!

Having been stuck in a bit of an API dilemma recently and having also found time to release Unconstrained Melody as a NuGet package, I figured it would be worth doing the same for Noda Time, in the hope of getting more potential "customers" to have a look. Even though fetching the source and building it is pretty painless, it's more than I'd expect for a casual onlooker, where NuGet makes the whole process pretty simple.

So, while you're reading the rest of this post, why not download the Noda Time 0.1 packages? There are three to pick up, although you only really need the first:

  • NodaTime: the core library
  • NodaTime.Experimental: our sort of sandbox; when there are multiple ways of attacking a problem, I'll probably try to have them all in here at the same time. This only works for new classes and things which can be implemented as extension methods of course - but that should cover a lot of ground. This is more for comment than production use, basically.
  • NodaTime.Testing: Extra library which is designed for testing code which uses Noda Time - for example, a fake clock to be used where you're passing in IClock for dependency injection.

What do I need to know?

It's worth being aware of the core concepts in Noda Time:

  • Instant: defines a universal point in time (we don't handle relativity) without reference to a time zone or calendar system. (Internally it's the number of ticks since the Unix epoch, but don't let that concern you too much.)
  • CalendarSystem: A way of breaking down time into years, months, days, hours, minutes etc. You'll almost certainly only want to use the ISO calendar system, which is the universal default in Noda Time.
  • DateTimeZone: Essentially, a mapping between UTC and local time. But full of subtleties like local times that don't exist or are ambiguous...
  • LocalDateTime: A date and time in a particular calendar but not in a particular time zone. Something like "15th January 2010, 10:30am (ISO)". This is the type you're actually most likely to perform arithmetic - it makes sense to do things like adding a week or a month. Closely related are LocalDate and LocalTime, for situations where you only need a date or only need a time of day respectively.
  • ZonedDateTime: A date and time in a particular calendar and time zone. Think of it as a local date and time + DateTimeZone + offset from UTC. A ZonedDateTime can be mapped unambiguously to an instant.

All of the above are immutable in Noda Time. All except CalendarSystem and DateTimeZone are structs.

Those are the important concepts. There are some more details on the wiki page, but hopefully that's enough to get you going.

What's in the package?

Currently, I'm reasonably confident in the API and implementation of:

  • CalendarSystem (ISO, Gregorian, Julian and Coptic are implemented; a few more to come)
  • DateTimeZone (Olson identifiers; we have a mapping for the Windows identifiers, but I don't think we expose them yet.)
  • ZonedDateTime: Conversion to and from a LocalDateTime (with various options for handling awkward situations)
  • LocalDateTime / LocalDate / LocalTime: Various conversions between them (and construction of course). Arithmetic adding periods (e.g. "this date/time plus two weeks") is implemented but may not be the final API or semantics; manipulation is still up in the air. There are some extension methods in the experimental package to say localDateTime.WithMonthOfYear(10) etc - other options include a builder API, more direct ways of adding periods localDateTime.AddDays(10) and more. Feedback very welcome on this.
  • Instant/Duration/Interval: These are fairly primitive types which don't have much need for extra work

LocalDateTime and ZonedDateTime have conversions to/from .NET types (DateTime and DateTimeOffset) - that may be your best bet for getting "into" the API, unless you want to construct values explicitly.

What's not there yet?

The big feature missing from all of this is formatting and parsing. We have an implementation for Instant and Offset, but those are fairly limited types - obviously ZonedDateTime and LocalDateTime are the big ones here. It's worth understanding the basic plan though...

For each appropriate type, we're going to support the framework convention of ToString on the type itself, as well as static Parse and TryParse methods. However, I personally regard those as legacy approaches. There'll also be INodaFormatter<T> and INodaParser<T> which define interfaces for formatting and parsing - with concrete implementations like LocalDateTimeParser etc. These will be thread-safe and reusable, so you'll be able to set up the parser or formatter once for each pattern you want to use. I see this as giving better maintainability and performance, as well feeling like better OO. Thoughts on this general principle would be welcome - the actual interfaces are there now, but somewhat in flux.

We have yet to decide how to expose what I'm calling pseudo-mutators - methods which get you from one value to another, e.g. the WithYearOfMonth method mentioned earlier. What we're really missing is use cases, so if you can dump a list of feature requests based on what you actually want to do that would be wonderful.

There are no doubt some other features we'll want for v1.0 - and some that might make it, but may not. If there's something you regard as a must-have, holler about it.

Where's the documentation?

The API documentation is actually pretty reasonable - at least, some exists for pretty much everything (no warnings about missing summaries) but it may very well be skimpy in places, and there's just a chance some of it is out of date. Before the full release we'll do a proper pass through, of course. This documentation is built nightly (or on demand) so by the time you read it, it may not reflect the package you've downloaded, if I've applied some fixes etc. I wouldn't expect any really big changes in the very near future though. Before release I'll set up versioned documentation.

The project wiki is in a worse state - the key concepts page is reasonably accurate, but obviously there's a lot of work to be done overall.

How do I give feedback?

The main point of releasing 0.1 as a NuGet package is to get feedback. Lots of feedback. This should be done on the project issues page. If it's a non-trivial matter which you'd like to discuss further, then it'd be great if you'd join the Google group / mailing list, but don't feel you have to get involved to leave feedback.

While high quality feedback is obviously desirable, at this point I'd really just like lots of it. So in particular:

  • If the NuGet packaging is wrong, please let me know - I'm a complete newbie on this.
  • I'm not going to care much whether you report something as a bug or a feature request.
  • If you don't have time to see if your feedback is already covered elsewhere, don't bother.
  • Even if it's only a vague idea, I still want to hear it.
  • Positive feedback is useful too - say what you like as well as what you don't.

Obviously I'll go through and prune the feedback, mark duplicates etc, but that's work that I can do afterwards. If you find any "barriers to entry" when it comes to giving feedback, let me know by email (skeet@pobox.com) and I'll do what I can do fix it.

Tuesday, 30 November 2010

The joys of date/time arithmetic

(Cross-posted to my main blog and the Noda Time blog, in the hope that the overall topic is still of interest to those who aren't terribly interested in Noda Time per se.)

I've been looking at the "period" part of Noda Time recently, trying to redesign the API to simplify it somewhat. This part of the API is what we use to answer questions such as:

  • What will the date be in 14 days?
  • How many hours are there between now and my next birthday?
  • How many years, months and days have I been alive for?

I've been taking a while to get round to this because there are some tricky choices to make. Date and time arithmetic is non-trivial - not because of complicated rules which you may be unaware of, but simply because of the way calendaring systems work. As ever, time zones make life harder too. This post won't talk very much about the Noda Time API details, but will give the results of various operations as I currently expect to implement them.

The simple case: arithmetic on the instant time line

One of the key concepts to understand when working with time is that the usual human "view" on time isn't the only possible one. We don't have to break time up into months, days, hours and so on. It's entirely reasonable (in many cases, at least) to consider time as just a number which progresses linearly. In the case of Noda Time, it's the number of ticks (there are 10 ticks in a microsecond, 10,000 ticks in a millisecond, and 10 million ticks in a second) since midnight on January 1st 1970 UTC.

Leaving relativity aside, everyone around the world can agree on an instant, even if they disagree about everything else. If you're talking over the phone (using a magic zero-latency connection) you may think you're in different years, using different calendar systems, in different time zones - but still both think of "now" as "634266985845407773 ticks".

That makes arithmetic really easy - but also very limited. You can only add or subtract numbers of ticks, effectively. Of course you can derive those ticks from some larger units which have a fixed duration - for example, you could convert "3 hours" into ticks - but some other concepts don't really apply. How would you add a month? The instant time line has no concept of months, and in most calendars different months have different durations (28-31 days in the ISO calendar, for example). Even the idea of a day is somewhat dubious - it's convenient to treat a day as 24 hours, but you need to at least be aware that when you translate an instant into a calendar that a real person would use, days don't always last for 24 hours due to daylight savings.

Anyway, the basic message is that it's easy to do arithmetic like this. In Noda Time we have the Instant structure for the position on the time line, and the Duration structure as a number of ticks which can be added to an Instant. This is the most appropriate pair of concepts to use to measure how much time has passed, without worrying about daylight savings and so on: ideal for things like timeouts, cache purging and so on.

Things start to get messy: local dates, times and date/times

The second type of arithmetic is what humans tend to actually think in. We talk about having a meeting in a month's time, or how many days it is until Christmas (certainly my boys do, anyway). We don't tend to consciously bring time zones into the equation - which is a good job, as we'll see later.

Now just to make things clear, I'm not planning on talking about recurrent events - things like "the second Tuesday and the last Wednesday of every month". I'm not planning on supporting recurrences in Noda Time, and having worked on the calendar part of Google Mobile Sync for quite a while, I can tell you that they're not fun. But even without recurrences, life is tricky.

Introducing periods and period arithmetic

The problem is that our units are inconsistent. I mentioned before that "a month" is an ambiguous length of time... but it doesn't just change by the month, but potentially by the year as well: February is either 28 or 29 days long depending on the year. (I'm only considering the ISO calendar for the moment; that gives enough challenges to start with.)

If we have inconsistent units, we need to keep track of those units during arithmetic, and even request that the arithmetic be performed using specific units. So, it doesn't really make sense to ask "how long is the period between June 10th 2010 and October 13th 2010" but it does make sense to ask "how many days are there between June 10th 2010 and October 13th 2010" or "how many years, months and days are there between June 10th 2010 and October 13th 2010".

Once you've got a period - which I'll describe as a collection of unit/value pairs, e.g. "0 years, 4 months and 3 days" (for the last example above) you can still give unexpected behaviour. If you add that period to your original start date, you should get the original end date... but if you advance the start date by one day, you may not advance the end date by one day. It depends on how you handle things like "one month after January 30th 2010" - some valid options are:

  • Round down to the end of the month: February 28th
  • Round up to the start of the next month: March 1st
  • Work out how far we've overshot, and apply that to the next month: March 2nd
  • Throw an exception

All of these are justifiable. Currently, Noda Time will always take the first approach. I believe that JSR-310 (the successor to Joda Time) will allow the behaviour to be resolved according to a strategy provided by the user... it's unclear to me at the moment whether we'll want to go that far in Noda Time.

Arithmetic in Noda Time is easily described, but the consequences can be subtle. When adding or subtracting a period from something like a LocalDate, we simply iterate over all of the field/value pairs in the period, starting with the most significant, and add each one in turn. When finding the difference between two LocalDate values with a given set of field types (e.g. "months and days") we get as close as we can without overshooting using the most significant field, then the next field etc.

The "without overshooting" part means that if you add the result to the original start value, the result will always either be the target end value (if sufficiently fine-grained fields are available) or somewhere between the original start and the target end value. So "June 2nd 2010 to October 1st 2010 in months" gives a result of "3 months" even though if we chose "4 months" we'd only overshoot by a tiny amount.

Now we know what approach we're taking, let's look at some consequences.

Asymmetry and other oddities

It's trivial to show some assymetry just using a period of a single month. For example:

  • January 28th 2010 + 1 month = February 28th 2010
  • January 29th 2010 + 1 month = February 28th 2010
  • January 30th 2010 + 1 month = February 28th 2010
  • February 28th 2010 - 1 month = January 28th 2010

It gets even more confusing when we add days into the mix:

  • January 28th 2010 + 1 month + 1 day = March 1st 2010
  • January 29th 2010 + 1 month + 1 day = March 1st 2010
  • March 1st 2010 - 1 month - 1 day = January 31st 2010

And leap years:

  • March 30th 2013 - 1 year - 1 month - 10 days = February 19th 2012 (as "February 30th 2012" is truncated to February 29th 2012)
  • March 30th 2012 - 1 year - 1 month - 10 days = February 18th 2012 (as "February 30th 2011" is truncated to February 28th 2011)

Then we need to consider how rounding works when finding the difference between days... (forgive the pseudocode):

  • Between(January 31st 2010, February 28th 2010, Months & Days) = ?
  • Between(February 28th 2010, January 31st 2010, Months & Days) = -28 days

The latter case is relatively obvious - because if you take a whole month of February 28th 2010 you end up with January 28th 2010, which is an overshoot... but what about the first case?

Should we return the determine the number of months by "the largest number such that start + period <= end"? If so, we get a result of "1 month" - which makes sense given the first set of results in this section.

What worries me most about this situation is that I honestly don't know offhand what the current implementation will do. I think it would be best to return "28 days" as there isn't genuinely a complete month between the two... <tappety tappety>

Since writing the previous paragraph, I've tested it, and it returns 1 month and 0 days. I don't know how hard it would be to change this behaviour or whether we want to. Whatever we do, however, we need to document it.

That's really at the heart of this: we must make Noda Time predictable. Where there are multiple feasible results, there should be a simple way of doing the arithmetic by hand and getting the same results as Noda Time. Of course, picking the best option out of the ones available would be good - but I'd rather be consistent and predictable than "usually right" be unpredictably so.

Think it's bad so far? It gets worse...

ZonedDateTime: send in the time zones... (well maybe next year?)

I've described the "instant time line" and its simplicity.

I've described the local date/time complexities, where there's a calendar but there's no time zone.

So far, the two worlds have been separate: you can't add a Duration to a LocalDateTime (etc), and you can't add a Period to an Instant. Unfortunately, sooner or later many applications will need ZonedDateTime.

Now, you can think of ZonedDateTime in two different ways:

  • It's an Instant which knows about a calendar and a time zone
  • It's a LocalDateTime which knows about a time zone and the offset from UTC

The "offset from UTC" part sounds redundant at first - but during daylight saving transitions the same LocalDateTime occurs at two different instants; the time zone is the same in both cases, but the offset is different.

The latter way of thinking is how we actually represent a ZonedDateTime internally, but it's important to know that a ZonedDateTime still unambiguously maps to an Instant.

So, what should we be able to do with a ZonedDateTime in terms of arithmetic? I think the answer is that we should be able to add both Periods and Durations to a ZonedDateTime - but expect them to give different results.

When we add a Duration, that should work out the Instant represented by the current DateTime, advance it by the given duration, and return a new ZonedDateTime based on that result with the same calendar and time zone. In other words, this is saying, "If I were to wait for the given duration, what date/time would I see afterwards?"

When we add a Period, that should add it to the LocalDateTime represented by the ZonedDateTime, and then return a new ZonedDateTime with the result, the original time zone and calendar, and whatever offset is suitable for the new LocalDateTime. (That's deliberately woolly - I'll come back to it.) This is the sort of arithmetic a real person would probably perform if you asked them to tell you what time it would be "three hours from now". Most people don't take time zones into account...

In most cases, where a period can be represented as a duration (for example "three hours") the two forms of addition will give the same result. Around daylight saving transitions, however, they won't. Let's consider some calculations on Sunday November 7th 2010 in the "Pacific/Los_Angeles" time zone. It had a daylight saving transition from UTC-7 to UTC-8 at 2am local time. In other words, the clock went 1:58, 1:59, 1:00. Let's start at 12:30am (local time, offset = -7) and add a few different values:

  • 12:30am + 1 hour duration = 1:30am, offset = -7
  • 12:30am + 2 hours duration = 1:30am, offset = -8
  • 12:30am + 3 hours duration = 2:30am, offset = -8
  • 12:30am + 1 hour period = 1:30am, offset = ???
  • 12:30am + 2 hour period = 2:30am, offset = -8
  • 12:30am + 3 hour period = 3:30am, offset = -8

The ??? value is the most problematic one, because 1:30 occurs twice... when thinking of the time in a calendar-centric way, what should the result be? Options here:

  • Always use the earlier offset
  • Always use the later offset
  • Use the same offset as the start date/time
  • Use the offset in the direction of travel (so adding one hour from 12:30am would give 1:30am with an offset of -7, but subtracting one hour from 2:30am would give 1:30am with an offset of -8)
  • Throw an exception
  • Allow the user to pass in an argument which represents a strategy for resolving this

This is currently unimplemented in Noda Time, so I could probably choose whatever behaviour I want, but frankly none of them has much appeal.

At the other daylight saving transition, when the clocks go forward, we have the opposite problem: adding one hour to 12:30am can't give 1:30am because that time never occurs. Options in this case include:

  • Return the first valid time after the transition (this has problems if we're subtracting time, where we'd presumably want to return the latest valid time before the transition... but the transition has an exclusive lower bound, so there's no such "latest valid time" really)
  • Add the offset difference, so we'd skip to 2:30am
  • Throw an exception
  • Allow the user to pass in a strategy

Again, nothing particularly appeals.

All of this is just involved in adding a period to a ZonedDateTime - then the same problems occur all over again when trying to find the period between them. What's the difference (as a Period rather than a simple Duration) between 1:30am with an offset of -7 and 1:30am with an offset of -8? Nothing, or an hour? Again, at the moment I really don't know the best course of action.

Conclusion

This post has ended up being longer than I'd expected, but hopefully you've got a flavour of the challenges we're facing. Even without time zones getting involved, date and time arithmetic is pretty silly - and with time zones, it becomes very hard to reason about - and to work out what the "right" result to be returned by an API should be, let alone implement it.

Above all, it's important to me that Noda Time is predictable and clearly documented. Very often, if a library doesn't behave exactly the way you want it to, but you can tell what it's going to do, you can work around that - but if you're having to experiment to guess the behaviour, you're on a hiding to nothing.

Saturday, 10 April 2010

Documentation with Sandcastle - a notebook

(Posted to both my main code blog and the Noda Time blog.)

I apologise in advance if this blog post becomes hard to get good information from. It's a record of trying to get Sandcastle to work for Noda Time; as much as anything it's meant to be an indication of how smooth or otherwise the process of getting started with Sandcastle is. My aim is to be completely honest. If I make stupid mistakes, they'll be documented here. If I have decisions to make, they'll be documented here.

I should point out that I considered using NDoc (it just didn't make sense to use a dead project) and docu (I'm not keen on the output style, and it threw an exception when I tried running it on Noda Time anyway). I didn't try any other options as I'm unaware of them. Hmm.

Starting point and aims

My eventual aim is to include "build the docs" as a task in the build procedure for Noda Time. I don't have much in the way of preconceived ideas of what the output should be: my guess is a CHM file and something to integrate into MSDN, as well as some static web pages. Ideally I'd like to be able to put the web pages on the Google Code project page, but I don't know how feasible that will be. If the best way forward turns out to be something completely different, that's fine.

(I've mailed Scott Hanselman and Matt Hawley about the idea of having an ASP.NET component of some form which could dynamically generate all this stuff on the fly - you'd just need to upload the .xml and .dll files, and let it do the rest. I'm not expecting that idea to be useful for Noda Time in the near future, but you never know.)

Noda Time has some documentation, but there are plenty of public members which don't have any XML documentation at all at the moment. Obviously there's a warning available for this so we'll be able to make sure that eventually everything's documented, but we also need to be able to build documentation before we reach that point.

Step 0: building the XML file

The build project doesn't currently even create the .xml file, so that's the first port of call - just a case of ticking a box and then changing the default filename slightly... because for some bizarre reason, Visual Studio defaults to creating a ".XML" file instead of ".xml". Why? Where else are capitals used in file extensions?

Rebuild the solution, gaze in fear at the 496 warnings generated, and we have everything we should need from Visual Studio. My belief is that I should now be able to close Visual Studio and not reopen it (with the Noda Time solution, anyway) during the course of this blog post.

Step 1: building Sandcastle

First real choice: which version of Sandcastle do I go for? There was a binary release on May 29th 2008, a source release on July 2nd 2008, and three commits to source control since then, the latest of which was in July 2009. Personally I like the idea of not having to actually install anything: tools which can just be run without installation are nicer for Open Source projects, particularly if you can check the binaries into source control with appropriate licence files. That way anyone can build after just fetching. On the other hand, I'm not sure how well the Sandcastle licence fits in with the Apache 2 licence we're using for Noda Time. I can investigate that later.

What the heck, let's try building it from source. It's probably easier to go from that to the installed version than the other way round. Change set 26202 downloaded and unpacked... now how do we build it, and what do we need to build? Okay, there's a solution file, which opens up in VS2008 (unsurprising and not a problem). Gosh, what a lot of projects (but no unit tests?) - still, everything builds with nary a warning. I've no idea what to do with it now, but it's a start. It looks like it's copied four executables and a bunch of DLLs into the ProductionTools directory, which is promising.

Shockingly, it's only just occurred to me to check for some documentation to see whether or not I'm doing the right thing. Looking at the Sandcastle web page, it seems I'm not missing much. Well, I was aware that this might be the case.

Step 2: Sandcastle Help File Builder

I've heard about SHFB from a few places, and it certainly sounds like it's the way to go - it even has a getting started guide and installation instructions! It looks like there's a heck of a lot of documentation for something sounds like it should be simple, but hey, let's dive in. (I know it sounds inconsistent to go from complaining about no documentation to complaining about too much - but I'm really going from complaining about no documentation to complaining about something being so complicated that it needs a lot of documentation. I'm very grateful to the SHFB team for documenting everything, even if I plan to read it on a Just-In-Time basis.)

A few notes from the requirements page:

  • It looks like I'll need to install the HTML Help Workshop if I want CHM files; the Help 2 compiler should already be part of the VS2008 SDK which I'm sure is already installed. I have no idea where Help 3 fits into this :(
  • It looks like I need a DXROOT environment variable pointing at my Sandcastle "installation". I wonder what that means in my home-built version? I'll assume it just means the Development directory containing ProductionTools and ProductionTransforms.
  • There's a further set of patches available in the Sandcastle Styles project. Helpfully, this says it includes all the updates in the July 2009 source code, and can be applied to the binary installation from May 2008. It's not clear, however, whether it can also be applied to a home-built version of Sandcastle. Given that I can get all the latest stuff in conjunction with an installed version, it sounds like it's worth installing the binary release after all. (Done, and patches installed.)
  • It sounds like I need to install the H2 Viewer and H2Reg. (I suspect that H2Reg will be something we direct our users at rather than shipping and running ourselves; I don't intend to have an MSI-style "installer" for Noda Time at the moment, although the recent CoApp announcement sounds interesting. It's too early to worry about that for the moment though.)
  • We're not documenting a web site project, so I'm not bothering with "Custom Web Code Providers". I've installed quite enough by this point, thank you very much. Oh, except I haven't installed SHFB itself yet. I'd better do that now...

Step 3: creating a Help File Builder project

This feels like it could be reasonably straightforward, so long as I don't try to do anything fancy. Let's follow (roughly) the instructions. (I'm doing it straight to Noda Time rather than using the example project.)

Open the GUI, create a new project, add a documentation source of NodaTime.csproj... and hit Build Project. Wow, this takes quite a while - and this is a pretty beefy laptop. However, it seems to work! I have a CHM file which looks like it includes all the right stuff. Hoorah! It's a pretty huge CHM file (just over 3MB) for a relatively small project, but never mind.

Let's build it again, this time with all the output enabled (Help 1, Help 2, MSHelpViewer and Website).

Hmm... no MS Help 2 compiler found. Maybe I didn't have the VS2008 SDK installed after all. After a bit of hunting, it's here. Time to install it - and make sure it doesn't mess up the Sandcastle installation, as the SHFB docs warned me about. Yikes - 109MB. Ah well.

Okay, so after the SDK installation, rebuild the help... which will take even longer of course, as it's now building four different output formats. 3 minutes 18 seconds in the end... not too bad, but not something I'll want to do after every build :)

Step 4: checking the results

  • Help 1 (CHM): looks fine, if old-fashioned :)
  • Help 2 (HxS): via H2Viewer, looks fine - I'm not sure whether I dare to integrate it with MSDN just yet though.
  • ASP.NET web site: works even in Chrome
  • Static HTML: causes Chrome to flicker, constantly reloading. Works fine in Firefox. Maybe I need to submit a bug report.

I'm not entirely sure which output option corresponds to which result here; in particular, is "Website" the static one or the ASP.NET one? What's MSHelpViewer? It's easy enough to find out of course - I'll just experiment at a later date.

Step 5: building from the command line

I can't decide whether this is crucial (as it should be part of a continuous build server) or irrelevant (as there are so many tools to install, I may never get the ability to run a CB server with everything installed). However, it would certainly be nice.

Having set SHFBROOT appropriately, running msbuild gives this error:

SHFB : error BE0067: Unable to obtain assembly name from project file '[...]' using Configuration 'Debug', Platform 'X64'

Using Debug is definitely correct, but X64 sounds wrong... I suspect I want AnyCPU instead. Fortunately, this can be set in the SHFB project file (it was previously just defaulting). Once that's been fixed, the build works with one warning: BHT0001: Unable to get executing project: Unable to obtain internal reference. Supposedly this may indicate a problem in SHFB itself... I shall report it later on. It doesn't seem to affect the ability to produce help though.

Conclusion

That wasn't quite as painful as I'd feared. I'm nearly ready to check in the SHFB project file now - but I need to work out a few other things first, and probably create a specific "XML" configuration for the main project itself. I'm somewhat alarmed at the number of extra bits and pieces that I had to install though - and the lack of any mention of Help 3 is also a bit worrying.

I've just remembered one other option that I haven't tried, too - MonoDoc. I may have another look at that at a later date, although the fact that it needs a GTK# help viewer isn't ideal.

I still think the Open Source community for .NET has left a hole at the moment. It may be that SHFB + Sandcastle is as good as it gets, given the limitations of how much needs to be installed to build MS help files. I'd still like to see a better way of providing docs for web sites though... ideally one which doesn't involve shipping hundreds of files around when only two are really required.