TAWPI TODAY Magazine TAWPI - The Association For Work Process Improvement




Overcoming Data Overload

E-BizDocs Inc. Uses PDF/A Compression to Reduce Storage Requirements of Color Scans by 90 Percent for New York Agencies

By Howard Gross

When E-BizDocs Inc., one of the largest records management and document imaging companies in upstate New York, started working with two New York State agencies – the Education Department and the Office of Mental Health – the company faced an extreme challenge. Together, the agencies had more than 400 million pages of legal and medical documents that needed to be digitally archived. Complicating matters further, the agencies required the documents to be archived in color at 300 dpi in order to preserve unique identifying features, such as color gradations on diplomas, photos, and various signatures and handwritten notes on medical records.


The two agencies’ requirements quickly resulted in data overload. E-BizDocs anticipated that it would go through at least one terabyte of storage every two weeks as it progressed through the project. For example, the Education Department was scanning employee applications in four-color format for its Office of Professions. Similarly, the Office of Mental Health was scanning medical records in four-color format. Each batch of documents processed – essentially equivalent to a standard file box – would result in 1 gigabyte of data.


E-BizDocs needed to find a cost-effective way to minimize storage requirements while ensuring that it was producing archive-quality digital reproductions of the documents that could easily be e-mailed and accessed by multiple people within the agencies for years to come.


The solution, E-BizDocs found, was PDF/A with mixed raster content (MRC) compression from LuraTech Inc., a leading provider of open, ISO-compliant JPEG2000 and PDF/A technology. Using LuraTech’s LuraDocument PDF Compressor Server, E-BizDocs was able to significantly reduce storage and network traffic requirements by producing highly compressed PDF/A files from scanned color documents. In fact, by using the LuraTech PDF/A compression solution, the New York State agencies were able to reduce their storage requirements by 90 percent and improve electronic transfer capabilities by reducing file sizes from 8 megabits to 80 kilobits per page.
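As a rough back-of-the-envelope check, the Python sketch below converts the per-page sizes quoted above into bytes and scales them to the 400 million pages the agencies needed to archive. It assumes decimal units (1 megabit = 1,000,000 bits) and that every page matches the quoted sizes, which real scans will not do exactly. The function and variable names are illustrative only and are not part of any LuraTech product.

```python
# Rough storage estimate from the per-page figures quoted in the article.
# Assumptions (not from the article): decimal units, uniform page sizes.

BITS_PER_BYTE = 8
BYTES_PER_TERABYTE = 1e12  # decimal terabyte


def archive_size_tb(pages, bits_per_page):
    """Return the total archive size in terabytes for a given page count."""
    bytes_per_page = bits_per_page / BITS_PER_BYTE
    return pages * bytes_per_page / BYTES_PER_TERABYTE


PAGES = 400_000_000        # pages held by the two agencies
BEFORE_BITS = 8_000_000    # 8 megabits per page before compression
AFTER_BITS = 80_000        # 80 kilobits per page after compression

before_tb = archive_size_tb(PAGES, BEFORE_BITS)
after_tb = archive_size_tb(PAGES, AFTER_BITS)
ratio = BEFORE_BITS / AFTER_BITS

print(f"Before compression: {before_tb:.1f} TB")
print(f"After compression:  {after_tb:.1f} TB")
print(f"Per-page ratio:     {ratio:.0f}:1")
```

Taken at face value, the per-page figures work out to a larger ratio than the 90 percent reduction reported for overall storage. The article does not say how the two numbers relate, so the sketch simply reports what the per-page sizes give.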


The fact that the documents were in PDF/A format also was important to the New York State Education Department and Office of Mental Health. The agencies are required to keep many of these documents in excess of 40 years, so they needed to be able to view the records long into the future. PDF/A is an open file format, based on PDF and standardized by the International Organization for Standardization (ISO), that is designed for long-term archiving. PDF/A gives users assurance that documents will maintain their appearance and readability regardless of the applications and systems used to create them or the future availability of viewing applications or software versions.


How PDF/A with MRC Works

The LuraDocument PDF Compressor enables the generation of highly compressed PDF and PDF/A files from color or black and white scanned documents with the use of MRC compression technology. This proven multi-layer segmentation and compression process offers the best way to minimize the size of scanned documents while maintaining superior image quality and text legibility.


MRC is a multi-layer segmentation process in which text and images are separated into their own individual layers and then each compressed optimally (see Figure 1). The underlying concept of the compression process is the partitioning of the document into three distinct segments:

  1. Bi-level image containing text
  2. Foreground image containing the color information of the text segments
  3. Residual (background) image devoid of text components.


Each segment is then compressed using separate algorithms that are specifically adapted to the corresponding type of data. The text is compressed losslessly using the Fax G4 format, while the foreground and background are highly compressed using the JPEG2000 format. MRC reduces full-color documents to the size of a TIFF G4 file, while black and white scanned documents are approximately 50 percent smaller than Fax G4.
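To make the three-segment idea concrete, here is a minimal Python sketch of an MRC-style split and recombination. It is an illustration under stated assumptions, not LuraTech's algorithm: a simple brightness threshold stands in for real text segmentation, the image is a plain list of RGB tuples, and no compression is actually performed. The names `mrc_split`, `mrc_merge`, and `threshold` are hypothetical.

```python
# Illustrative MRC-style layer split. A brightness threshold is an assumed
# stand-in for real text detection; no actual compression is performed.

def mrc_split(pixels, threshold=128):
    """Split rows of (r, g, b) pixels into mask, foreground and background."""
    mask, foreground, background = [], [], []
    for row in pixels:
        mask_row, fg_row, bg_row = [], [], []
        for (r, g, b) in row:
            is_text = (r + g + b) // 3 < threshold  # dark pixels count as text
            mask_row.append(1 if is_text else 0)    # bi-level text image
            fg_row.append((r, g, b) if is_text else None)  # text color only
            bg_row.append(None if is_text else (r, g, b))  # residual image
        mask.append(mask_row)
        foreground.append(fg_row)
        background.append(bg_row)
    return mask, foreground, background


def mrc_merge(mask, foreground, background):
    """Rebuild the page: take the foreground where the mask is set."""
    return [
        [fg if m else bg for m, fg, bg in zip(mask_row, fg_row, bg_row)]
        for mask_row, fg_row, bg_row in zip(mask, foreground, background)
    ]


# A tiny 2x3 "page": light paper with a few dark, colored text pixels.
page = [
    [(255, 255, 255), (10, 10, 120), (255, 255, 255)],
    [(250, 240, 230), (0, 0, 0), (200, 30, 30)],
]

mask, foreground, background = mrc_split(page)
restored = mrc_merge(mask, foreground, background)

print(mask)
print(restored == page)
```

In a real encoder, the mask would go to a lossless bi-level coder such as Fax G4, and the gaps marked `None` in the two color layers would be filled with smooth values before lossy compression such as JPEG2000. The sketch leaves them empty to keep the layer roles visible.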


From this experience, E-BizDocs has learned that producing high-quality color reproductions of original documents in PDF/A via MRC compression does not have to be costly or require inordinate amounts of storage. With the right solution, color documents can be scanned and compressed to a size comparable to black-and-white PDFs. Moreover, PDF/A ensures long-term accessibility of the digital document copies. By using this standard, E-BizDocs was able to alleviate the agencies’ concerns about future technology changes and comply with regulations that require them to maintain records for more than 40 years.


Howard Gross is president and founder of E-BizDocs.
He can be reached at 518-694-4618.


 






TAWPI  75 Federal St., Suite 901

Boston, MA 02110-1407

Tel: (617) 426-1167
Fax: (617) 521-8675

info@tawpi.org

today®
The Journal of Work
Process Improvement

©2008 All rights reserved

ISSN: 1073-2233