Monday, November 10, 2014

CMIS Document Migration with Apache Chemistry and Camel

The Headache of Data Migration 

Migrating data between different content repositories can be difficult.  The primary goal of a migration project is to move the stored files, their associated metadata, and the filing hierarchy from one system into another with as little loss as possible.

Migrations typically require that an analyst first create a detailed map of how document types and properties will be transferred between the two systems, and that a developer then implement that strategy by writing a migration script.  The actual migration process can be tedious, involving a sequence of exports and imports along with parallel intermediate files or databases that hold normalized property data.

Something Easier: The Apache Camel camel-cmis Component

Recently, while looking at how to migrate content stored in an Alfresco repository into a Nuxeo repository, I came across a blog article by Bilgin Ibryam about camel-cmis, the CMIS connector that he contributed to the Apache Camel project.  I was impressed that, in just two lines of Java code, he was able to define a program that moves all the data from an Alfresco repository into Nuxeo by recursively iterating through the folder hierarchy, starting at the repository root node and preserving the hierarchy in the move.

While an indiscriminate migration of all content from one repository into another wasn't exactly what I was looking for, I did find that the camel-cmis component was a good starting point for creating a simple migration tool that could move content easily between CMIS-compliant repositories.

Besides the repo-to-repo copy, the camel-cmis component also has the ability to identify groups of documents by using a CMIS query and can then pipe the document data from the result set into the next processing step of a Camel route.
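As a rough illustration of the query-driven mode, the sketch below assembles a camel-cmis endpoint URI that carries a CMIS query.  The helper class is hypothetical and is not part of camel-cmis.  The option names (username, password, query) reflect my reading of the camel-cmis documentation and should be checked against the Camel version in use; the server URL, credentials and folder id are placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of camel-cmis): assembles a query-driven endpoint URI. */
class CmisQueryEndpoint {

    /** CMIS QL selecting every object of the given type filed anywhere beneath a folder. */
    static String documentsUnder(String typeId, String folderId) {
        // IN_TREE is the CMIS QL predicate that matches descendants of a folder.
        return "SELECT * FROM " + typeId + " WHERE IN_TREE('" + folderId + "')";
    }

    /** Appends options to a cmis:// endpoint URI in insertion order. */
    static String endpointUri(String serverUrl, Map<String, String> options) {
        StringBuilder uri = new StringBuilder("cmis://").append(serverUrl);
        char separator = '?';
        for (Map.Entry<String, String> option : options.entrySet()) {
            uri.append(separator).append(option.getKey()).append('=').append(option.getValue());
            separator = '&';
        }
        return uri.toString();
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("username", "admin"); // placeholder credentials
        options.put("password", "admin");
        options.put("query", documentsUnder("cmis:document", "folder-id-placeholder"));
        // Placeholder server URL; a Camel route would consume from the resulting URI.
        System.out.println(endpointUri("http://localhost:8080/alfresco/cmisatom", options));
    }
}
```

The resulting string is what would be handed to from(...) in a Camel route, so that only the documents in the query result set flow into the next processing step.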

Migrating Engineering Documents from Alfresco to Nuxeo

My goal was to migrate engineering documents stored in Alfresco into Nuxeo.  These documents were defined by a content model and document type based on Alfresco aspects.

To do that, I tweaked the camel-cmis component to accept source and target folders, rather than migrate all documents from the repository starting at the repository root.
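The idea of a source folder and a target folder can be pictured as a simple path rewrite.  The class below is a minimal sketch of that idea in plain Java, not the actual change made to camel-cmis; the class name and the repository paths used with it are hypothetical.

```java
/** Hypothetical sketch: maps paths under a source folder to paths under a target folder. */
class FolderRemapper {
    private final String sourceRoot;
    private final String targetRoot;

    FolderRemapper(String sourceRoot, String targetRoot) {
        this.sourceRoot = stripTrailingSlash(sourceRoot);
        this.targetRoot = stripTrailingSlash(targetRoot);
    }

    /** True when the path is the source folder itself or lies beneath it. */
    boolean inScope(String sourcePath) {
        return sourcePath.equals(sourceRoot) || sourcePath.startsWith(sourceRoot + "/");
    }

    /** Rewrites a source path to the equivalent path under the target folder. */
    String toTargetPath(String sourcePath) {
        if (!inScope(sourcePath)) {
            throw new IllegalArgumentException("Outside source folder: " + sourcePath);
        }
        // Keep everything after the source root so the sub-folder hierarchy is preserved.
        return targetRoot + sourcePath.substring(sourceRoot.length());
    }

    private static String stripTrailingSlash(String path) {
        return path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
    }
}
```

Documents outside the source folder are rejected rather than silently copied, which is the main behavioral difference from a migration that starts at the repository root.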

I also modified the camel-cmis component to accept custom metadata properties.  By using CMIS 1.1 'secondary-types', Alfresco aspect data can be handled as well.  Both Nuxeo and Alfresco support CMIS 1.1.
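In CMIS 1.1, the secondary types attached to an object are listed in the multi-valued property cmis:secondaryObjectTypeIds.  The sketch below shows one way a property map could carry that list; it is an illustration only and does not reproduce the actual change to camel-cmis.  The class name is hypothetical, as is the aspect id P:eng:drawing in the usage note, which assumes that Alfresco exposes aspects over CMIS with a P: prefix.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: tracks CMIS 1.1 secondary types in a plain property map. */
class SecondaryTypes {
    // Standard CMIS 1.1 property id that lists an object's secondary types.
    static final String SECONDARY_IDS = "cmis:secondaryObjectTypeIds";

    /** Returns a copy of the properties with the given secondary type id present exactly once. */
    static Map<String, Object> withSecondaryType(Map<String, Object> properties, String typeId) {
        Map<String, Object> copy = new LinkedHashMap<>(properties);
        List<String> ids = new ArrayList<>();
        Object existing = copy.get(SECONDARY_IDS);
        if (existing instanceof List<?>) {
            for (Object id : (List<?>) existing) {
                ids.add(String.valueOf(id));
            }
        }
        if (!ids.contains(typeId)) {
            ids.add(typeId);
        }
        copy.put(SECONDARY_IDS, ids);
        return copy;
    }
}
```

For example, SecondaryTypes.withSecondaryType(props, "P:eng:drawing") returns a new map that declares the aspect, leaving the original map untouched.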

And finally, I created a simple Camel Message Translator (Java bean) that maps the names of the document types and properties extracted from Alfresco to the names in the content model that are used by Nuxeo.  In this case, the property name translations were defined in a simple key-value property file which, when applied, maps the extracted property names before passing them into Nuxeo.
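A translator of this kind can be quite small.  The following is a minimal sketch in plain Java of the key-value mapping idea, not the actual bean used in the migration; the class name, type names and property names are all hypothetical.  One detail worth noting is that a colon separates key from value in a Java properties file, so colons in CMIS names must be escaped with a backslash on the key side.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch: renames document types and properties using a key-value map. */
class PropertyNameTranslator {
    private final Properties nameMap;

    PropertyNameTranslator(Properties nameMap) {
        this.nameMap = nameMap;
    }

    /** Loads the mapping from properties-file text, e.g. "eng\:drawingNumber=engdoc:drawingNumber". */
    static PropertyNameTranslator fromText(String propertiesText) {
        Properties loaded = new Properties();
        try {
            loaded.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new PropertyNameTranslator(loaded);
    }

    /** Returns a new map with mapped names; names without a mapping pass through unchanged. */
    Map<String, Object> translate(Map<String, Object> source) {
        Map<String, Object> target = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            Object value = entry.getValue();
            // The document type travels as the value of cmis:objectTypeId, so map that value too.
            if ("cmis:objectTypeId".equals(entry.getKey()) && value instanceof String) {
                value = nameMap.getProperty((String) value, (String) value);
            }
            target.put(nameMap.getProperty(entry.getKey(), entry.getKey()), value);
        }
        return target;
    }
}
```

In a Camel route, a bean like this would sit between the source and target endpoints so that each message's properties are renamed before they reach Nuxeo.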

With that in place, it is then possible to write a simple Camel route that defines a migration of data under an Alfresco folder to a Nuxeo folder.

This Camel route recursively copies the contents of a specified Alfresco folder and its children to a folder in the Nuxeo repository, maintaining the folder hierarchy.  The following screenshots show how documents and folder structure were moved from an Alfresco Share folder into Nuxeo.

Documents in Alfresco Share

Documents Migrated to Nuxeo

You can see that the documents moved from Alfresco were all engineering AutoCAD DWG files.  The files, custom metadata, and folder hierarchy were copied into Nuxeo.  Then within Nuxeo we can see the migrated documents.  Also, through a configuration of Nuxeo, we are able to display the engineering metadata and render the AutoCAD file content as both thumbnails and preview images.

Using CMIS tools, and software plug-ins for engineering data management and AutoCAD document management, Formtek can assist organizations with ECM migration to the Nuxeo platform.

Footnotes on CMIS and Camel

The use of CMIS makes it easy to interact with compliant content repositories in a standard way.  It enables the easy sharing of content between repositories from different vendors.  CMIS is based on a web services interface that can be accessed over either REST or SOAP.

The Apache Chemistry project provides an open source implementation of the CMIS standard.  Both the Alfresco and Nuxeo implementations of CMIS are based on the Chemistry libraries.  Chemistry's CMIS server libraries are available only for Java.  CMIS client libraries exist for Java, Python, PHP, .NET and Objective-C, but the Java libraries are the most complete and best tested.

Apache Camel is an open source framework for implementing Enterprise Integration Patterns (EIP).  It lets you use messaging and transport models like HTTP, ActiveMQ, JMS, JBI, SCA, and CXF to grab data, transform it, and move it to different endpoints.

1 comment:

  1. Recently I have been looking for a similar solution for a migration project, and I am glad to have found your post! I may need to do more, though, as in my case I am required to migrate documents into a different folder structure.