Overcoming Data Overload
E-BizDocs Inc. Uses PDF/A Compression to Reduce Storage Requirements of Color Scans by 90 Percent for New York Agencies
By Howard Gross
When E-BizDocs Inc., one of the largest records management and document imaging companies in upstate New York, started working with two New York State agencies – the Education Department and Office of Mental Health – the company was faced with an extreme challenge. The agencies combined had more than 400 million pages of legal and medical documents that needed to be digitally archived. Complicating the matter even more was the fact that the agencies required the documents to be archived in color at 300 dpi in order to preserve unique identifying features, such as color gradations on diplomas, photos and various signatures and handwritten notes on medical records.
The two agencies’ requirements quickly resulted in data overload. E-BizDocs anticipated that it would go through at least one terabyte of storage every two weeks as it progressed through the project. For example, the Education Department was scanning employee applications in four-color format for its Office of Professions. Similarly, the Office of Mental Health was scanning medical records in four-color format. Each batch of documents processed, essentially equivalent to a standard file box, would result in 1 gigabyte of data.
E-BizDocs needed to find a cost-effective way to minimize storage requirements while ensuring that they were producing archive-quality digital reproductions of the documents that could easily be e-mailed and accessed by multiple people within the agencies for years to come.
The solution, E-BizDocs found, was PDF/A with mixed raster content (MRC) compression from LuraTech Inc., a leading provider of open, ISO-compliant JPEG2000 and PDF/A technology. Using LuraTech’s LuraDocument PDF Compressor Server, E-BizDocs was able to significantly reduce storage and network traffic requirements by producing highly compressed PDF/A files from scanned color documents. In fact, by using the LuraTech PDF/A compression solution, the New York State agencies were able to reduce their storage requirements by 90 percent and improve electronic transfer capabilities by reducing file sizes from 8 megabits to 80 kilobits per page.
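The figures quoted above can be worked through with a few lines of arithmetic. The sketch below is only an illustration: it assumes decimal (SI) units, treats one terabyte as 1,000 gigabytes, and uses hypothetical variable names. The per-page sizes and the overall 90 percent figure are separate statements in the article, so each is computed exactly as stated.

```python
# Arithmetic on the figures quoted in the article.
# Assumption: decimal (SI) units; all names here are illustrative.
BITS_PER_MEGABIT = 1_000_000
BITS_PER_KILOBIT = 1_000


def reduction_percent(before, after):
    """Return the percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


# Per-page sizes as stated: 8 megabits before, 80 kilobits after.
page_before_bits = 8 * BITS_PER_MEGABIT
page_after_bits = 80 * BITS_PER_KILOBIT
per_page_ratio = page_before_bits / page_after_bits
per_page_reduction = reduction_percent(page_before_bits, page_after_bits)

# Overall storage as stated: about 1 TB every two weeks, reduced by 90 percent.
biweekly_before_gb = 1000
overall_reduction_percent = 90
biweekly_after_gb = biweekly_before_gb * (100 - overall_reduction_percent) // 100

print(f"Per-page compression ratio: {per_page_ratio:.0f}:1")
print(f"Per-page size reduction: {per_page_reduction:.0f} percent")
print(f"Storage per two weeks after compression: {biweekly_after_gb} GB")
```

Under these assumptions, a 90 percent reduction turns a terabyte of new scans every two weeks into about 100 gigabytes.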
The fact that the documents were in PDF/A format also was important to the New York State Education Department and Office of Mental Health. The agencies are required to keep many of these documents in excess of 40 years, so they needed to be able to view the records long into the future. PDF/A is an open file format, based on PDF and standardized by the International Organization for Standardization (ISO), that is designed for long-term archiving. PDF/A gives users assurance that documents will maintain their appearance and readability regardless of the applications and systems used to create them, and regardless of which viewing applications or software versions are available in the future.
How PDF/A with MRC Works
The LuraDocument PDF Compressor enables the generation of highly compressed PDF and PDF/A files from color or black and white scanned documents with the use of MRC compression technology. This proven multi-layer segmentation and compression process offers the best way to minimize the size of scanned documents while maintaining superior image quality and text legibility.
MRC is a multi-layer segmentation process in which text and images are separated into their own individual layers and then each compressed optimally (see Figure 1). The underlying concept of the compression process is the partitioning of the document into three distinct segments:
- Bi-level image containing text
- Foreground image containing the color information of the text segments
- Residual image devoid of text components.
Each segment is then compressed using a separate algorithm specifically adapted to the corresponding type of data. The bi-level text image is compressed losslessly using the Fax G4 format, while the foreground and the residual (background) images are highly compressed using the JPEG2000 format. MRC reduces full-color documents to roughly the size of a TIFF G4 file, while black and white scanned documents are approximately 50 percent smaller than Fax G4.
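The three-layer split described above can be illustrated with a toy example. This is a minimal sketch of the general MRC idea, not LuraTech's algorithm: it assumes a simple luminance threshold to find text, uses hypothetical function names, and leaves out the compression step entirely (in practice the mask would go to a Fax G4 encoder and the two color layers to a JPEG2000 encoder).

```python
# Toy illustration of MRC-style layer separation (not LuraTech's algorithm).
# A page is a list of rows; each pixel is an (R, G, B) tuple of 0-255 values.


def luminance(pixel):
    """Approximate perceived brightness of an RGB pixel."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def mean_color(pixels, default):
    """Average color of a list of pixels, or `default` if the list is empty."""
    if not pixels:
        return default
    n = len(pixels)
    return tuple(round(sum(p[i] for p in pixels) / n) for i in range(3))


def split_layers(page, threshold=128):
    """Split a page into (mask, foreground, background) layers."""
    # Bi-level mask: 1 where a pixel is dark enough to be treated as text.
    mask = [[1 if luminance(p) < threshold else 0 for p in row] for row in page]
    text = [p for row, mrow in zip(page, mask) for p, m in zip(row, mrow) if m]
    rest = [p for row, mrow in zip(page, mask) for p, m in zip(row, mrow) if not m]
    fill_fg = mean_color(text, (0, 0, 0))
    fill_bg = mean_color(rest, (255, 255, 255))
    # Foreground: text colors kept; non-text pixels get a smooth filler color.
    foreground = [[p if m else fill_fg for p, m in zip(row, mrow)]
                  for row, mrow in zip(page, mask)]
    # Background (residual): text pixels painted over with the page average.
    background = [[fill_bg if m else p for p, m in zip(row, mrow)]
                  for row, mrow in zip(page, mask)]
    return mask, foreground, background


def recombine(mask, foreground, background):
    """Rebuild the page: foreground where the mask is set, else background."""
    return [[f if m else b for m, f, b in zip(mrow, frow, brow)]
            for mrow, frow, brow in zip(mask, foreground, background)]


# A 2x2 sample page: light paper on the left, dark "ink" on the right.
page = [
    [(255, 255, 255), (10, 10, 120)],
    [(250, 240, 230), (200, 0, 0)],
]
mask, foreground, background = split_layers(page)
restored = recombine(mask, foreground, background)
```

Because no lossy compression is applied in this sketch, recombining the layers returns the original page exactly. The design point it illustrates is that the filler colors make the foreground and background layers smooth, which is what lets an image codec compress them heavily while the sharp text edges live in the losslessly stored mask.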
From this experience, E-BizDocs has learned that producing high-quality color reproductions of original documents in PDF/A via MRC compression does not have to be costly or require inordinate amounts of storage. With the right solution, color documents can be scanned and compressed to a size comparable to that of black and white PDFs. Moreover, PDF/A ensures long-term accessibility of the digital document copies. By using this standard, E-BizDocs was able to alleviate the agencies’ concerns about future technology changes and help them comply with regulations that require them to maintain records for more than 40 years.
Howard Gross is president and founder of E-BizDocs.
He can be reached at 518-694-4618.