OmniFocus: What We've Learned So Far (Engineering)

Today's post is the first in an ongoing series I'm calling OmniFocus: What We've Learned So Far (or OF: WWLSF, if you prefer acronyms). As we move slowly but steadily towards a feature freeze and public beta, I thought it would be interesting to get some input from various people here at Omni on things that have gone well, as well as things that have sucked (er, "challenges we didn't anticipate"): basically, the ups and downs behind building a piece of commercial software.

We're going to start out in the technical arena, so I apologize if code-talk makes you yawn so hard you accidentally drool a little. Here is Omni's engineering perspective on an important lesson learned during OmniFocus's development process, which can be boiled down to: we ♥ CoreData, but not as a primary file format. 

With more on this subject, here is Tim Wood, hater of Aeron chairs, terror of the Unreal Tournament battlefield, and OmniFocus team lead:

There are many things that are great about CoreData, but using CoreData as a user-visible file format was really painful. Since inception, our xcdatamodel file has had 92 revisions, with most of those exposed to several thousand people via our automated builds. Most of these changes aren't things that users would notice; we often add or remove precalculated summaries, denormalize data, or generally change the underlying CoreData representation to make our app easier to implement and tune. Yet with CoreData, merely adding or removing a column would bust the SQLite mapping beyond hope.

Manually building code to migrate between model versions is really not an option. If CoreData had a Rails-like migration facility where columns could be added and removed trivially via SQL ALTER statements, it might be feasible, but it still wouldn't be good. CoreData explicitly disclaims any support for direct access to the various stores, so it isn't a public file format, and it keeps our users from getting at their own data easily. In practical terms, we all know that a liberal application of the letter 'Z' will get you most of the way to accessing your data. Still, this isn't ideal.

What CoreData is great for is building an optimized cache of your data, fetching against it and then binding it to your interface.
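
To make that concrete, here's a minimal sketch of the cache-style usage (in present-day Swift, with a hypothetical "Task" entity and attributes rather than OmniFocus's actual model): fetch against the context, then hand the results to your UI, typically via bindings or an NSArrayController on the Mac.

import CoreData

// Minimal sketch: CoreData as a query cache over a hypothetical "Task" entity.
func overdueTasks(in context: NSManagedObjectContext) throws -> [NSManagedObject] {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Task")
    request.predicate = NSPredicate(format: "dueDate < %@ AND completed == NO",
                                    Date() as NSDate)
    request.sortDescriptors = [NSSortDescriptor(key: "dueDate", ascending: true)]
    return try context.fetch(request)
}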

A couple of other key observations: we already needed a public file format for export (we chose a custom XML grammar, but that's merely a detail), and using a variant of that public format for the pasteboard is a great way to avoid writing and testing more code (as is using your pasteboard archive/unarchive code to implement your AppleScript 'duplicate' support…)
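
As a sketch of that reuse (the names here are made up for illustration; archiveItemsAsXML and the pasteboard type aren't real OmniFocus API), the same archiving path that produces the export file can hand its bytes straight to the pasteboard:

import AppKit
import CoreData

// Hypothetical: shares its implementation with the file-export code path.
func archiveItemsAsXML(_ items: [NSManagedObject]) -> Data {
    // Body omitted for the sketch.
    return Data()
}

let omniXMLType = NSPasteboard.PasteboardType("com.example.omnifocus-items+xml")

func copy(_ items: [NSManagedObject], to pasteboard: NSPasteboard) {
    let xmlData = archiveItemsAsXML(items)
    pasteboard.clearContents()
    pasteboard.setData(xmlData, forType: omniXMLType)
}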

Given that, I tweaked our XML archiving to support writing a set of CoreData inserts, updates, and deletes as a transaction. We can then write out a small fragment of our content as a new gzipped XML file inside our document wrapper. The structure of our XML transactions is very simple; the key feature is that we can trivially merge a big batch of transactions into a single XML document that contains only the final set of objects as inserts.
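
Here's a rough sketch of that coalescing step (the Change and Transaction types are illustrative stand-ins, not the real format): replay the log in order, and whatever survives is the final set of objects, ready to be written back out as a single transaction of plain inserts.

// Illustrative types for a transaction log entry.
enum Change {
    case insert(id: String, values: [String: String])
    case update(id: String, values: [String: String])
    case delete(id: String)
}

struct Transaction {
    let changes: [Change]
}

// Replay every transaction in order; the objects left standing are the final state.
func coalesce(_ log: [Transaction]) -> [String: [String: String]] {
    var objects: [String: [String: String]] = [:]
    for transaction in log {
        for change in transaction.changes {
            switch change {
            case .insert(let id, let values):
                objects[id] = values
            case .update(let id, let values):
                objects[id, default: [:]].merge(values) { _, new in new }
            case .delete(let id):
                objects[id] = nil
            }
        }
    }
    return objects
}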

On startup, OmniFocus scans the transaction log in the user's document and builds a cache validation dictionary that contains:

• The version of Mac OS X

• The version of CoreData

• The SVN revision of the application

• The last transaction identifier

We then open up the CoreData SQLite persistent store and peek at its metadata. If the metadata doesn't exactly match that dictionary, we close the persistent store and rebuild the entire cache by importing our coalesced transaction log, exactly the way we would import one of our backup files.
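
A sketch of that validation check might look like the following; the metadata key, the dictionary keys, and the use of CFBundleVersion in place of an SVN revision are all illustrative assumptions, not the real implementation.

import CoreData
import Foundation

// Hypothetical metadata key under which the cache-validation dictionary is stored.
let cacheValidationKey = "com.example.CacheValidation"

func cacheValidationDictionary(lastTransactionID: String) -> [String: String] {
    return [
        "osVersion": ProcessInfo.processInfo.operatingSystemVersionString,
        "coreDataVersion": "\(NSCoreDataVersionNumber)",
        // Standing in for the SVN revision of the application.
        "appRevision": Bundle.main.object(forInfoDictionaryKey: "CFBundleVersion") as? String ?? "unknown",
        "lastTransaction": lastTransactionID,
    ]
}

func cacheIsValid(at storeURL: URL, lastTransactionID: String) -> Bool {
    guard let metadata = try? NSPersistentStoreCoordinator.metadataForPersistentStore(
            ofType: NSSQLiteStoreType, at: storeURL, options: nil) else {
        return false   // no readable cache; rebuild from the coalesced transaction log
    }
    let expected = cacheValidationDictionary(lastTransactionID: lastTransactionID)
    return (metadata[cacheValidationKey] as? [String: String]) == expected
}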

There are many extra implementation details (locking, catching the insert/update/delete notification, undo/redo vs. AppleScript, two-phase commit between the XML and SQLite, …), but we are really happy with the central approach.
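
For instance, catching the insert/update/delete notification might look roughly like this, where appendTransaction is a hypothetical hook standing in for the code that writes a gzipped XML fragment into the document wrapper:

import CoreData
import Foundation

// Hypothetical hook: write one gzipped XML transaction into the document wrapper.
func appendTransaction(inserted: Set<NSManagedObject>,
                       updated: Set<NSManagedObject>,
                       deleted: Set<NSManagedObject>) {
    // Serialization and the two-phase commit with the SQLite cache are omitted here.
}

// Observe context saves and hand the changed objects to the transaction log.
let transactionObserver = NotificationCenter.default.addObserver(
    forName: .NSManagedObjectContextDidSave, object: nil, queue: nil) { note in
    let info = note.userInfo ?? [:]
    appendTransaction(
        inserted: info[NSInsertedObjectsKey] as? Set<NSManagedObject> ?? [],
        updated:  info[NSUpdatedObjectsKey]  as? Set<NSManagedObject> ?? [],
        deleted:  info[NSDeletedObjectsKey]  as? Set<NSManagedObject> ?? [])
}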

Some of the fun things this gives us:

• You can run the same build of the application on 10.4 and 10.5, switching regularly, without worrying that CoreData is going to ignite your SQLite store.

• You can run multiple builds of OmniFocus on the same data and not lose anything (more work may be needed if there's ever a major file format upgrade).

• If we do screw up one of our automated builds and break the cache-updating code, the user's data doesn't get touched, and everything is fine again on the next build.

• Until the transaction log is compacted, we actually have the full record of edits, and we could hypothetically implement persistent undo, allowing the user to roll back to yesterday's version…

• … or calculate the changes they've made since some point in time.

The last point is really interesting and I'm hoping to make good use of that in the future for things like computer-to-computer synchronization (no, I'm not promising anything)!