Monday, November 19, 2012

Book Review: "Intelligent Document Capture with Ephesoft" by Pat Myers and Ike Kavas


Intelligent Document Capture with Ephesoft is a new book from Packt Publishing.  The primary authors of the book are Pat Myers, executive vice president of Zia, and Ike Kavas, founder and CTO of Ephesoft and a former Kofax employee.  Myers and Kavas together developed the Ephesoft training program.

What is Ephesoft?  Ephesoft software is used to process and capture paper, email and fax documents for use within ECM, ERP and other enterprise software systems.  ECM systems supported by Ephesoft include Alfresco, FileNet, SharePoint, and generic CMIS repositories.  Ephesoft's capabilities include document classification, separation, and data extraction.

Ephesoft is Open Source software and is similar in functionality to proprietary systems like IBM Datacap, EMC Captiva, Kofax, and Athento.  It is built from Open Source components like Spring DM, Hibernate, Lucene, and jBPM.

At only 161 pages, this book on Ephesoft uses a format that's considerably shorter than many other technical books, and because of the large number of screenshots it contains, it is a relatively quick read.

The book provides a high-level overview of Ephesoft and describes a path that users can take to get an Ephesoft document capture system up and running quickly.  After finishing this book, the reader will have enough background to get started with building their own capture projects based on Ephesoft.  But that's not to say that this book is a definitive reference for Ephesoft.  Much more detailed documentation is available on-line in the Ephesoft wiki pages.  Free on-line training is also available from Ephesoft via the YouTube-based Ephesoft University.

The book consists of the following chapters:
  1. Introduction
    Discusses document capture history and the benefits of capture, and describes some typical high-ROI document capture use cases like mortgage loan processing, claims processing, and the handling of invoices and sales orders.
    At a high level, and in a way not specific to Ephesoft, the book describes different document classification methods like the use of barcodes, image layout classification, keywords, and content analysis.
    Similarly, the book explains different types of extraction methods, like zonal OCR (optical character recognition), keywords, position information, and the lookup of supplemental information from databases and other systems.
  2. A Quick Tour of Ephesoft
    This chapter describes each of the five tabs in the Ephesoft administrative user interface [see also the on-line Ephesoft Admin Manual]:
        - Batch Class Management
        - Batch Instance Management
        - Workflow Management
        - Folder Management
        - Reports
    It also describes the four tabs of the Operator User Interface [see also the on-line Ephesoft User Manual]:
        - Home/Batch List
        - Batch Details
        - Web Scanner
        - Batch Upload
    The description for each tab is based on a screenshot followed up with details about how to use the features available on the tab.
    This chapter is made available for free by Packt as a sample of the book and can be found online here.
  3. Creating a Batch Class
    This chapter gives an example of how to create a new batch class from the Ephesoft administrative user interface.
    The standard Ephesoft mailroom automation batch template is copied and modified to create a new custom batch class.  Then a new document type for that batch is added and configured.  With training, Ephesoft is able to recognize the document type for automatic classification and separation.
    With configuration, Ephesoft can extract content from scanned images and map the extracted data as key/value pairs to fields for the document type.  Field data can also be validated with validation rules using regular expressions.
  4. Processing a Batch
    This chapter uses the batch class created in chapter 3 and shows how incoming documents for this batch class can be processed.  Batch processing is performed from the Operator's interface.
    This is the shortest chapter in the book.  It shows how a batch is started, and from the Operator's interface, how the review and verification steps are performed.
  5. Core Ephesoft Features
    I found the book more interesting from this point on, because starting in this chapter the examples are a bit more detailed.
    For example, there is information here about the different types of document classification and how to configure them: Search, Image, Barcode, Automatic, and Programmatic.
    Also discussed is how, once document and field data have been captured, that information can be exported into a repository (primarily via CMIS) or a database.
  6. Ephesoft Extended Features
    This chapter gets into more advanced features available in Ephesoft.  For example, it describes some features of classification based on image and barcode recognition that are a bit more advanced than the techniques described in chapter 5.
    The Enterprise version of Ephesoft includes an integration with OpenText's RecoStar OCR engine -- this chapter describes how to enable and configure the option.
    Discussed here are product extension points where the user can write Java 'scripts' which customize and change standard product behavior.
    The chapter also talks about how the base Ephesoft product can be extended with plugins and how to write new custom plugins.
  7. Tips
    The final chapter collects a variety of general tips and pieces of information to optimize your use of Ephesoft.  It contains troubleshooting hints like how to configure logging and how to monitor batch processes.  It also discusses how to configure Ephesoft to use authentication with LDAP and Active Directory.
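
To make the chapter 3 idea of validating extracted fields with regular expressions a little more concrete, here is a minimal sketch in Python.  It is not Ephesoft's API and it is not taken from the book: the field names, the patterns, and the validate_fields helper are all hypothetical and are only meant to illustrate checking key/value pairs against per-field rules.

```python
import re

# Hypothetical validation rules (not from Ephesoft or the book):
# each field name maps to the pattern its whole value must match.
RULES = {
    "invoice_number": r"INV-\d{6}",
    "invoice_date": r"\d{4}-\d{2}-\d{2}",
    "total": r"\d+\.\d{2}",
}


def validate_fields(fields, rules=RULES):
    """Return the names of fields that are missing or fail their pattern."""
    failed = []
    for name, pattern in rules.items():
        value = fields.get(name, "")
        # fullmatch requires the entire value to match, not just a prefix
        if re.fullmatch(pattern, value) is None:
            failed.append(name)
    return failed


# Key/value pairs as they might come out of an extraction step (made-up data)
extracted = {
    "invoice_number": "INV-004217",
    "invoice_date": "2012/11/19",  # wrong separator, so this should fail
    "total": "1250.00",
}

invalid = validate_fields(extracted)
print(invalid)
```

In a capture workflow, fields that fail a rule like this would typically be the ones flagged for an operator to correct during validation.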
Would I recommend this book?  I'd highly recommend it to someone who is not currently familiar with Ephesoft and who wants to jump-start their use of the product.  But existing users of Ephesoft probably won't find too much new information here.

Again, while almost all the information presented in the book can be found elsewhere on-line, the advantage of the book is that the information is presented in a directed and easy-to-consume format.  What's missing from the book, though, are more in-depth examples and perhaps more information about reporting and working with scanners.


Support for the Ephesoft Enterprise edition is available via an annual subscription. [Assistance with Ephesoft is also available from partners.  Formtek is an Ephesoft Platinum partner and we have a number of successful Ephesoft implementations.]


