Monday, April 24, 2006

The subtle glitch in JCR 1.0

After investigating JCR 1.0 (spawn of JSR-170) implementations (Alfresco and JackRabbit, both excellent realizations) for a few days, I feel a strange combination of excitement and disappointment. I am excited by the idea of having a standardized API for accessing content repositories, a playground where proprietary interfaces have ruled for a long time. The reason for my disappointment will take a little longer to explain.

One of the key selling points of JCR 1.0 is that it enables you to swap your repository from vendor A to vendor B, provided that they both support the same level of features (so far JSR-170 defines 2 levels). This is true and it works, but only to some extent; and it is in this restriction that lies the glitch that annoyed me so much.

JCR repositories rarely come as naked JSR-170 implementations: they usually come complete with other ways of reaching their content (WebDAV, CIFS, portals...) and, hopefully, useful administration tools.

Because JCR 1.0 is an extremely granular API, where familiar concepts like folders and files are not handled as such but abstracted as general purpose nodes, it allows each implementation to decide how to represent the classical hierarchy of folders everybody is used to and expects to find in a repository. And this is the crux of the problem: if you stick to the pure JCR API, you will be able to run your code on any compatible repository but the connected tools and other data retrieval channels will not recognize your data structure and will ignore it.

For example, a block of code will create a data structure that JackRabbit's WebDAV access represents as a folder containing a file, while the same block of code will store data that Alfresco will not be able to display in any of its data or web access tools. I have even come across inconsistencies (not the proper term, because it is logical that it does not work) within the same tool: code that creates a folder and file that the JackRabbit browser interface can render will be exposed as meaningless folders and files through its WebDAV access.
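To make the mechanism concrete, here is a minimal sketch in plain Java. It is not the javax.jcr API: SimpleNode, visibleToFileTool, genericTree and typedTree are hypothetical names invented for this illustration, and the filtering rule is an assumption about how a file-oriented tool might behave, not a description of what JackRabbit or Alfresco actually do. Only the node type names (nt:unstructured, nt:folder, nt:file, nt:resource, jcr:content) come from the JCR 1.0 specification. In the real API, the node type would be the second argument of Node.addNode(relPath, primaryNodeTypeName).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a repository tree (NOT javax.jcr).
final class NodeTypeSketch {

    // A node is just a name, a primary node type, and children.
    static final class SimpleNode {
        final String name;
        final String primaryType;
        final List<SimpleNode> children = new ArrayList<>();

        SimpleNode(String name, String primaryType) {
            this.name = name;
            this.primaryType = primaryType;
        }

        SimpleNode addNode(String childName, String childType) {
            SimpleNode child = new SimpleNode(childName, childType);
            children.add(child);
            return child;
        }
    }

    // Assumption: a file-oriented tool lists only children typed exactly
    // nt:folder or nt:file (a real tool would also accept subtypes).
    static List<String> visibleToFileTool(SimpleNode parent) {
        List<String> names = new ArrayList<>();
        for (SimpleNode child : parent.children) {
            if (child.primaryType.equals("nt:folder")
                    || child.primaryType.equals("nt:file")) {
                names.add(child.name);
            }
        }
        return names;
    }

    // "Pure API" style: everything is a general purpose node.
    static SimpleNode genericTree() {
        SimpleNode root = new SimpleNode("", "nt:unstructured");
        SimpleNode docs = root.addNode("docs", "nt:unstructured");
        docs.addNode("report.txt", "nt:unstructured");
        return root;
    }

    // Same hierarchy, built with the standard file and folder node types.
    static SimpleNode typedTree() {
        SimpleNode root = new SimpleNode("", "nt:unstructured");
        SimpleNode docs = root.addNode("docs", "nt:folder");
        SimpleNode file = docs.addNode("report.txt", "nt:file");
        file.addNode("jcr:content", "nt:resource");
        return root;
    }

    public static void main(String[] args) {
        System.out.println(visibleToFileTool(genericTree())); // prints []
        System.out.println(visibleToFileTool(typedTree()));   // prints [docs]
    }
}
```

Both trees hold the same hierarchy and both are perfectly valid content, yet under the assumed filter only the second one shows up as a folder. Whether a given repository's extra tools look for these standard types or for their own is exactly what varies from one product to the next.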

The conclusion of this story is that yes, you will be able to swap one repository for another, but the trade-off is that you will potentially lose some or all of the extra features of the repositories. To me, this sounds like a major loss.

All this reminds me of JDO: so generic and versatile that it has never been good enough at any one particular persistence strategy, compared to Hibernate, whose database-only persistence has reached excellence and universality.

Let us hope that JCR 2.0, the upcoming spawn of JSR-283, will address this concern. But will it? Is it in the interest of the expert group members to be potentially swapped out when all of them look for lock-in strategies? Only time will tell.


Ach said...

Hi Dave,
Are you really telling me that we can't exchange jackrabbit with Alfresco later without a lot of pain in the ass?!
I am not familiar with alfresco at all but I was thinking about testing it or Magnolia later instead of this sluggish rabbit!
So this is a standard like ANSI SQL, ya?! There are no two RDBMSs that you can exchange easily :))
So what is the benefit of a standard on paper?
-Good luck ;)

David Dossot said...

I do not know if converting from one document model to another (say JackRabbit to Alfresco) would be so painful: at least it can be automated, so once you have written the conversion, you can run it on all your data.

You are right to question the value of standards that are not thorough or ambitious enough. This is the problem with the JCP: the common ground between the vendors and OSS projects that constitute the expert groups can sometimes be so limited that the resulting JSR ends up useless.

Anyway, let's hope JCR v2 will be bolder and that intercompatible import/export features will be added to existing repositories.

David Nuescheler said...

Hi David,

thanks for mentioning JCR.

I would like to comment on a couple of your statements.

It is true that JCR in version 1.0 tries to stay out of the realm of defining a semantic information model, with the notable exception of the Files and Folders metaphor.

In the case of Files and Folders the specification defines the respective nodetypes, which are available throughout compliant repositories.

It is not entirely correct that the repository implementation chooses how to expose content in the repository; rather, it is the content application that defines the content model. Much like in a relational database, it is not the database vendor that defines the data model but the application.

If an application exposes Files and Folders without extending from nt:file and nt:folder I would argue that it is a broken application.

Content repository applications such as WebDAV or CIFS servers can be, and have been, written in an entirely portable way on top of JCR, and those applications specifically are very portable across all JCR compliant repositories.

Switching repositories should really not be an issue. Switching from one content application with content model (a) to another content application with content model (b) may of course be hard, but that is not even something that JCR would like to address.

David Dossot said...

David, thanks for taking the time to write these much appreciated clarifications.