A Better Kind of OCR Promised by Brainware
December 9, 2008 Alex Woodie
The advent of optical character recognition (OCR) technology has done much to speed the handling of documents. Without a way to digitize the information in documents, the paper crush would threaten to overwhelm the businesses that depend on them. But OCR and related workflow technologies aren’t perfect, and often just shift the burden from manual paper shuffling to manual electronic document shuffling. A company called Brainware says it has a way to lessen the e-document drudgery by truly automating the input and verification of any paper-based document, from the mailroom to the ERP system.
For decades, visions of the paperless office have danced like sugarplum fairies in the heads of farsighted business executives. Instead of business being based on transactions conducted on paper documents and all the limitations they bring, electronic documents would herald a new age of speed and accuracy in the creation and completion of transactions–the age of e-business.
While it is true that electronic documents and the Web have combined to reshape how business is done, much of the world still relies heavily on paper-based processes, and that isn’t going to change any time soon. Think of the number of bills you pay online, and compare it to all the invoices you still receive in the mail. Similarly, many businesses have adopted standard EDI processing to eliminate paper processing, but they must make exceptions for partners that don’t support EDI, or where small transaction volumes make EDI or other e-business transaction processing too expensive.
When companies need to process large volumes of paper-based documents, OCR is often employed to get the information off the paper. But once the data is in hand, the related content management and workflow applications don’t always combine to drive efficiency into the system, explains Charlie Kaplan, vice president of marketing and product management at Ashburn, Virginia-based Brainware.
“The notion of using imaging and workflow technology as the means to automating the workflow is far short of what it should actually be,” Kaplan says. “Now that I don’t have paper, I route it around the organization, and it needs to be keyed and approved and validated and re-keyed and so on and so forth. So it’s still a high touch process. We think of this as a workflow-assisted human process. You’ve traded one pain for another.”
Brainware’s solution to this problem is a product called Distiller that can do much of this routing and e-document shuffling behind the scenes. The software, which runs on Windows, takes the raw TIFF output from any OCR scanning engine, and automatically categorizes the document (based on its “neural network” and self-learning technology). After it has correctly categorized the document, it then extracts data from the relevant fields and sends it directly to the order entry or ERP system, thereby bypassing the content management or workflow system entirely.
And while Brainware would seem to be in the OCR business, it does not see itself as an OCR provider. “It causes a lot of confusion. We get called an OCR technology, but we try and call ourselves intelligent data capture,” Kaplan says. “It’s an important distinction, because the OCR just generates text. And in order to put that text in any context, so you know what the word is that you’re looking at, you gotta do all this other stuff.”
According to Brainware, customers adopting Distiller can expect the software to correctly process more than 90 percent of the documents that it’s faced with. In other words, nine out of 10 documents are never touched as they proceed from the scanner to the ERP system. This can enable huge cost savings for companies large enough to have sought automation solutions to document processing in the first place.
The savings start in the mailroom. “Companies get all this mail, and somebody has to sit there and take it out of envelope, take out staples, make sure it’s clean, and press the mail before it hits the scanner,” Kaplan says. “Then there are companies that spend time applying barcodes and separator sheets, putting them into the appropriate piles, whether they’re separating remittances or invoices or memos or statements.
“We say, let’s skip all that. We’ll do that automatically with the application. The way the system works is it actually learns from examples. So if I show it examples of invoices and remittances and claims and any other document type, the system learns what makes those documents similar to one another. This is where we use the neural network technology.”
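To make the idea of learning from examples concrete, here is a minimal sketch in Python. It is not Brainware’s neural network, whose internals the company does not describe here; it swaps in a much simpler technique, comparing word counts with cosine similarity. The function names, labels, and sample texts are all hypothetical and chosen only for illustration.

```python
from collections import Counter
import math


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train(examples):
    """Build one word-count profile per document type from (label, text) pairs."""
    profiles = {}
    for label, text in examples:
        profiles.setdefault(label, Counter()).update(tokenize(text))
    return profiles


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(profiles, text):
    """Return the document type whose profile is most similar to the text."""
    words = Counter(tokenize(text))
    return max(profiles, key=lambda label: cosine(words, profiles[label]))


# Hypothetical training examples standing in for OCR'd document text.
examples = [
    ("invoice", "invoice number amount due payment terms net 30"),
    ("invoice", "invoice date bill to amount due total"),
    ("remittance", "remittance advice payment received check number"),
    ("remittance", "remittance payment applied to account check amount"),
]
profiles = train(examples)
print(classify(profiles, "please pay the amount due on this invoice"))
```

A production system would need far more examples and a far more robust model, but the shape is the same: show the software labeled samples, then let it sort new documents without barcodes or separator sheets.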
The same neural network technology developed by Brainware’s German inventors allows Distiller to automatically categorize the various fields within the document, which leads to even more savings. “We have the logic in Distiller to extract all the line items, and while doing that it will do cross validation,” such as checking amount totals, Kaplan says. “Believe it or not, sometimes you get invoiced for things incorrectly.”
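The kind of cross validation Kaplan describes, checking line items against the stated total, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the (quantity, unit price) layout, and the one-cent tolerance are hypothetical, and Distiller’s actual validation rules are not documented here.

```python
from decimal import Decimal


def line_items_match_total(line_items, stated_total, tolerance=Decimal("0.01")):
    """Check that quantity * unit price, summed over all lines, equals the stated total.

    line_items is a list of (quantity, unit_price) pairs given as strings,
    so that Decimal can parse them without floating-point rounding.
    """
    computed = sum(
        (Decimal(quantity) * Decimal(unit_price) for quantity, unit_price in line_items),
        Decimal("0"),
    )
    return abs(computed - Decimal(stated_total)) <= tolerance


# Hypothetical extracted invoice data: two units at 19.99 and one at 5.00.
invoice_lines = [("2", "19.99"), ("1", "5.00")]
print(line_items_match_total(invoice_lines, "44.98"))  # True: the lines add up
print(line_items_match_total(invoice_lines, "54.98"))  # False: flag for review
```

An invoice that fails the check would be routed to a person, while one that passes could go straight through to the ERP system.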
One large Brainware customer is energy services giant Halliburton, which uses Distiller at its global shared service centers in Oklahoma and Dubai. According to Kaplan, Distiller processes more than 2 million invoices per year for Halliburton. These invoices are sent in multiple languages from 550,000 different vendors, but Distiller learned to recognize a Halliburton invoice from only 31 different examples, he says. From there, Distiller was off and running, and today provides Halliburton with a 92 percent passthrough rate into its SAP system.
Distiller’s neural network-based approach is superior in many ways to the template and keyword-based approaches of first-generation OCR and imaging systems, Kaplan says. But it may be impossible to ever achieve a system that delivers 100 percent accuracy.
“Often the biggest problem is just a poor quality scan,” Kaplan says. “Companies like Halliburton get invoices from crazy places that are printed on almost the equivalent of tissue paper. Scanning technology is good, but there are certain types that are really hard to scan. There are plenty of OCR errors that you get. It could be as simple as the printer of the invoice needs to be cleaned, because you get a smudge. Or somebody put a stamp over the numbers.”
Perfection may not be attainable, but you can still save millions of dollars for your company. Another Brainware customer, Alltel Wireless, may have set the record for quickest return on investment.
Before Alltel implemented Distiller, the company managed to record about a million dollars in savings for all of 2006 by paying its invoices early, which is not that much for a company of its size, Kaplan says. Just three or four months after installing Distiller in late 2007, the company had already realized $17 million in discounts. “I think they paid for their software in a couple of weeks or a month,” Kaplan says.
Brainware has customers in all types of industries, including some AS/400 shops. Several customers use Distiller to input transaction data into JD Edwards ERP systems, including JohnsonDiversey and Old Dominion Freight Line. There is a good write-up of JohnsonDiversey’s use of Distiller on Brainware’s Web site at www.brainware.com/docs/JohnsonDiversey_IOMA.pdf.
The one similarity among Brainware customers is that they tend to be larger shops–mostly $1 billion in revenues and up–that do a lot of paper-based business. After all, a 50 percent efficiency boost for a company that dedicates one employee to opening mail and order entry will not do much for the bottom line. It will also slow payback on a system that starts at about $500,000, installed. But for companies with big operations, Distiller can mean big savings.