Stuff Michael Meeks is doing

This is my (in)activity log. You might like to visit Collabora Productivity, a subsidiary of Collabora focusing on LibreOffice support and services, for whom I work. Also, if you have the time to read this sort of stuff, you could enlighten yourself by going to Unraveling Wittgenstein's net or, if you are feeling objectionable, perhaps here. Failing that, there are all manner of interesting things to read on the LibreOffice Planet news feed.



OOXML SDK project opened up

It is great to see Microsoft's OOXML SDK project open sourced today; clearly, while Collabora Productivity backs ODF as the preferred document format, we're pleased to see more FOSS appear in this space.

So what is it ? - Microsoft has just open-sourced a little under nine thousand lines of C#, along with around a hundred thousand lines of code generated from the OOXML schemas. It is under an Apache 2.0 license, which seems reasonable (though I'd prefer a copy-left one).

Where is it ? - The OOXML SDK is hosted at GitHub - a great place to get distributed, non-hierarchical, peer-based community involvement. Unfortunately the free-flowing goodness of GitHub has an unnecessary roadblock that goes with it; hopefully one day people generally will learn.

Will LibreOffice use it ? (no) - Well, mostly no; we have our own internal parsing mechanisms written in C++, whereas this C# code targets the Common Language Runtime, which has a different scope. We also have an efficient and tuned internal document model that doesn't match even our ODF XML format - for example, repeated formulae are stored in groups to assist with OpenCL calculations - so even if the languages matched, switching to another representation would make little sense.

Will LibreOffice use it ? (yes) - Having said that, the OOXML SDK includes a rather nice validator. In recent years Markus, a Collabora engineer, has been developing an awesome torture test that loads fifty-thousand documents and re-exports them to umpteen formats, consuming a big machine for around five days. One of those export formats is OOXML; after export we like to validate the results to try to avoid interoperability regressions. New and improved validation can only help there, particularly the ability to go beyond a simple schema validation to check extra constraints. OOXML / Strict validation would also be lovely. Naturally we need Mono support (which I hear is coming) since all our headless automated tests run under Linux. We also currently allow configuration with --with-export-validation, which validates the output from all of the unit tests that are run during compilation - it would be useful to have a command-line tool for this too.

What about ODF ? - ODF still rocks just as much, of course. One feature I particularly like is Flat ODF, which lets you express an entire document, images and all, as a single XML file; in LibreOffice that has comparable performance to zipped ODF.
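
To make the single-file idea concrete, here is a minimal sketch in Python of writing and reading a tiny Flat ODF text document using only the standard library. The function names build_flat_odt and read_paragraphs are hypothetical helpers invented for this illustration; the namespaces and the office:document root element come from the ODF specification. A real document would normally carry styles, metadata and so on, which are omitted here.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the ODF specification.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Use the conventional prefixes when serialising.
ET.register_namespace("office", OFFICE_NS)
ET.register_namespace("text", TEXT_NS)


def build_flat_odt(paragraphs):
    """Hypothetical helper: return a minimal Flat ODF text document as bytes."""
    root = ET.Element(
        f"{{{OFFICE_NS}}}document",
        {
            f"{{{OFFICE_NS}}}version": "1.2",
            f"{{{OFFICE_NS}}}mimetype": "application/vnd.oasis.opendocument.text",
        },
    )
    body = ET.SubElement(root, f"{{{OFFICE_NS}}}body")
    text = ET.SubElement(body, f"{{{OFFICE_NS}}}text")
    for content in paragraphs:
        para = ET.SubElement(text, f"{{{TEXT_NS}}}p")
        para.text = content
    # Everything lives in one XML file: no ZIP container is involved.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def read_paragraphs(flat_odt_bytes):
    """Hypothetical helper: return the text of each text:p element."""
    root = ET.fromstring(flat_odt_bytes)
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


if __name__ == "__main__":
    data = build_flat_odt(["Hello", "World"])
    print(read_paragraphs(data))
```

Saving the returned bytes with a .fodt extension should give something a word processor can open, though I have only sketched the bare minimum here.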

Is there an ODF equivalent ? - of course ! Generating and parsing ODF or Flat ODF is really pretty simple using any number of platforms and toolkits for ZIP / XML and in-memory DOM models. Then again there is benefit to re-using and adding to semantic sugar around that. In the C# / CLR world you can use AODL, or if Java floats your boat Apache's ODF Toolkit which recently had a new release.
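
As a rough illustration of the ZIP / XML approach, the sketch below pulls paragraph text out of a zipped ODF text document with nothing but Python's standard library. The helpers make_sample_odt and odt_paragraphs are hypothetical names for this example, and the sample package is deliberately incomplete: it has only the mimetype and content.xml entries, whereas a real package also needs META-INF/manifest.xml and usually styles and metadata.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# A tiny content.xml, just enough to demonstrate parsing.
SAMPLE_CONTENT = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    f'<office:document-content xmlns:office="{OFFICE_NS}" '
    f'xmlns:text="{TEXT_NS}" office:version="1.2">'
    "<office:body><office:text>"
    "<text:p>First paragraph</text:p>"
    "<text:p>Second <text:span>paragraph</text:span></text:p>"
    "</office:text></office:body>"
    "</office:document-content>"
)


def make_sample_odt():
    """Hypothetical helper: build a stripped-down .odt package in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry comes first and is stored uncompressed.
        zf.writestr(
            "mimetype",
            "application/vnd.oasis.opendocument.text",
            compress_type=zipfile.ZIP_STORED,
        )
        zf.writestr("content.xml", SAMPLE_CONTENT,
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def odt_paragraphs(odt_bytes):
    """Hypothetical helper: return the text of each text:p in content.xml."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    # itertext() also collects text nested in child elements such as text:span.
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


if __name__ == "__main__":
    print(odt_paragraphs(make_sample_odt()))
```

This is where the toolkits earn their keep: the raw approach is fine for extracting text, but lists, tables, styles and change tracking quickly make a dedicated library worthwhile.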

What about validation ? - currently our automated testing tends to use the nice Office-o-tron from Alex Brown & Cedric Bosdonnat (of SUSE), which handles both ODF and OOXML - a nice combination to be commended to the OOXML SDK.

Who should I start stoning ? - From my perspective ODF wins the standards beauty contest here hands down, but it's always good to have more developers working in the open and working together. If we have fixes for the SDK I suppose we'll try to contribute them back to github somewhere. Obviously our primary focus is always ODF, as an enabler for the primary goal of a better LibreOffice, and Free Software in every productivity environment. Having said that, we increasingly store and preserve OOXML attributes we have little use for in LibreOffice to re-export in order to ensure high fidelity round-trips. Better validation will be appreciated for that too.


My content in this blog and associated images / data under images/ and data/ directories are (usually) created by me and (unless obviously labelled otherwise) are licensed under the public domain, and/or if that doesn't float your boat a CC0 license. I encourage linking back (of course) to help people decide for themselves, in context, in the battle for ideas, and I love fixes / improvements / corrections by private mail.

In case it's not painfully obvious: the reflections reflected here are my own; mine, all mine ! and don't reflect the views of Collabora, SUSE, Novell, The Document Foundation, Spaghetti Hurlers (International), or anyone else. It's also important to realise that I'm not in on the Swedish Conspiracy. Occasionally people ask for formal photos for conferences or fun.

Michael Meeks (michael.meeks@collabora.com)

Made with Pyblosxom