
Hitachi

Efficiency Gains from AI-OCR-based Form Recognition Service

    Author

    Atsuo Yamazaki

    • Financial Innovation Center, Business Planning Unit, Financial Information Systems Sales Management Division, Hitachi, Ltd.
    • Current work and research: Planning business related to big data and AI for the financial industry.

    Manabu Inada

    • Application Cloud Services Business Division, Application Services 1, Lumada Solution Hub Business Promotion Center, Hitachi, Ltd.
    • Current work and research: AI-OCR business planning and service development.

    Naohiro Suzuki

    • AI Business Development, Lumada CoE, Service Platform Business Division, Hitachi, Ltd.
    • Current work and research: AI consulting service and solution development.

    Naoki Imoto

    • Financial Innovation Center, Business Planning Unit, Financial Information Systems Sales Management Division, Hitachi, Ltd.
    • Current work and research: Planning business related to big data and AI for the financial industry.

    Introduction

    Innovation in digital technologies such as artificial intelligence (AI) and the Internet of Things (IoT) is bringing major changes to society. Hitachi’s new IT strategy accelerates its work toward digitalizing all corners of society. The aim is to improve convenience for the public and enhance efficiency in both the public and private sectors(1).

    Meanwhile, government agencies, private-sector companies, and other organizations that engage in administrative work still use paper forms such as invoices to record and pass on information. The information on these forms must be entered into computer systems for processing. Where this entry is done manually, adopting AI-based optical character recognition (AI-OCR) offers scope for improving efficiency.

    This article describes conventional OCR and AI-OCR, which combines OCR with AI. It then presents an example of how operational efficiency has been boosted by Hitachi’s form recognition service, which incorporates AI-OCR. The article also describes Hitachi’s plans for dark data analysis, a technique for extracting the value hidden in ordinary business documents.

    Transition from OCR to AI-OCR

    Overview of OCR and AI-OCR

    OCR is a technique for reading the text contained in image data. AI-OCR serves the same purpose and is a form of OCR in the sense that it extracts text from image data. What distinguishes it is the use of AI in its recognition processing to overcome the limitations of conventional OCR.

    In practice, this means that AI-OCR can read several kinds of material:

    • complex handwritten text, such as casual jottings, handwriting not confined to lined paper, or crossed-out text;
    • forms such as invoices, whose layouts vary from company to company;
    • loosely formatted documents such as contracts.

    Its accuracy can be further improved through AI learning. The technique also has a strong affinity with robotic process automation (RPA). Combining the two expands the scope of task automation at organizations that need it.

    Role of AI-OCR in Society-wide Digitalization

    This section considers the future role of AI-OCR, the market for which is steadily growing, and the scope for its further development amid the shift to paperless practices as part of society-wide digitalization.

    The market for AI-OCR is projected to increase from an estimated JPY700 million in FY2018 to JPY3.2 billion in FY2030(2). The two main reasons for this are as follows.

    1. Rising expectations for AI-OCR as a digital means of making data entry from paper forms more efficient, as such forms will remain in use under existing practices.
      Existing practices are hard to make paperless (digital) for two reasons. First, entire workflows, their computer systems, and other associated infrastructure would have to be fundamentally redesigned. Second, entered or archived information must keep a level of record-keeping integrity (document authenticity, accessibility, and so on) similar to that of paper records.
    2. Ongoing digitalization of archived paper documents and rising demand for the use and analysis of data.

    For these reasons, AI-OCR is expected to continue to play a part in future digitalization.
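    The market forecast above implies a compound annual growth rate, which can be checked from its two endpoints. The short Python sketch below does only that arithmetic. It assumes the span from FY2018 to FY2030 counts as twelve equal compounding periods, and it says nothing about how the forecast itself was produced.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by a start value and an end value."""
    return (end / start) ** (1 / years) - 1


# Market size in JPY billions, taken from the forecast cited above.
start_fy2018 = 0.7   # JPY700 million
end_fy2030 = 3.2     # JPY3.2 billion
rate = cagr(start_fy2018, end_fy2030, 2030 - 2018)
print(f"Implied growth: {rate:.1%} per year, {end_fy2030 / start_fy2018:.1f}x overall")
```

    Run as is, it prints "Implied growth: 13.5% per year, 4.6x overall". In other words, the forecast has the market growing to roughly four and a half times its FY2018 size by FY2030.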

    New Applications Made Possible by Transition from OCR to AI-OCR

    Fig. 1—Diagram of How Transition from OCR to AI-OCR Enables Use in Wider Range of Document Management Tasks
    By providing both technical and service enhancements, the transition from OCR to AI-OCR enables the technology to adapt to a variety of different task characteristics. Even as AI-OCR becomes established, however, it is anticipated that constraints imposed by existing practices and other such factors will see continued demand for the sort of fixed-format form scanning for which conventional OCR is used.

    The transition from OCR to AI-OCR has expanded the range of document management tasks in which the technology can be used, as shown in Figure 1. AI-OCR has delivered service improvements as well as the technical improvements that come with advances in IT.

    With conventional OCR, dedicated scanners had to be obtained even for seasonal work with peaks in activity. The same was true of work involving forms handled in small volumes or in many different formats. AI-OCR, in contrast, is delivered in a standardized form as a cloud service.

    A later section discusses unstructured data, which remains difficult to handle with AI-OCR.

    Efficiency Gains from AI-OCR-based Form Recognition Service

    Past Work by Hitachi and Overview of Form Recognition Service

    Fig. 2—Overview of AI-OCR-based Form Recognition Service
    The service is equipped with recognition engines for both fixed-format forms and free-format forms such as invoices that do not follow a predefined format. It uses the recognition technique that best suits the task and delivers very accurate results together with a confidence score indicating this recognition accuracy.

    Hitachi has been developing OCR technologies from the time OCR first entered commercial use through to present-day AI-OCR, and it continues this work with a view to the future. The work began in 1968 with the launch of the Hitachi H-8252 optical character reader, the first general-purpose OCR system manufactured in Japan(3), (4).

    Development has continued since then, culminating in the current cloud-based AI-OCR service. This service uses deep learning and other techniques to scan a wide range of business forms. By drawing on business and technical know-how built up over many years, Hitachi aims to develop new services for the future that will resolve the workplace challenges its customers face.

    Hitachi currently supplies its form recognition service primarily to financial institutions, which use it as a means of shifting to paperless practices. The service is suitable for data entry tasks in a wide range of industries. It runs on a service platform on which several different AI-OCR engines use AI to perform high-accuracy text recognition. Their capabilities include scanning:

    • fixed- and free-format forms;
    • printed and handwritten text;
    • two-dimensional barcodes.

    The service also incorporates a proprietary Hitachi algorithm that calculates a confidence score for recognition accuracy. This score provides an easy way to identify data that may have been scanned incorrectly. Together, these technical features ease integration of the service with other business applications, making it possible to automate a wide range of forms processing work. Figure 2 shows an overview of the form recognition service.
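    The multi-engine platform described above can be pictured as a router. It hands each field to the engine suited to its layout and content type, then passes back the recognized text together with a confidence score. The Python sketch below is a hypothetical illustration of that architecture only. The `EngineRouter` class, the layout and content keys, and the stub engines are assumptions, not Hitachi’s interfaces, and the stubs return fixed values in place of real recognition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Recognition:
    text: str
    confidence: float  # 0.0 to 1.0; how it is calculated is engine-specific


# An engine takes raw image bytes and returns a Recognition (hypothetical interface).
Engine = Callable[[bytes], Recognition]


class EngineRouter:
    """Hypothetical dispatcher that picks an engine per (layout, content) pair."""

    def __init__(self) -> None:
        self._engines: Dict[Tuple[str, str], Engine] = {}

    def register(self, layout: str, content: str, engine: Engine) -> None:
        self._engines[(layout, content)] = engine

    def recognize(self, image: bytes, layout: str, content: str) -> Recognition:
        engine = self._engines.get((layout, content))
        if engine is None:
            raise ValueError(f"no engine registered for {layout}/{content}")
        return engine(image)


# Stub engines stand in for real recognizers and return fixed results.
router = EngineRouter()
router.register("fixed", "printed", lambda image: Recognition("INV-0001", 0.98))
router.register("fixed", "handwritten", lambda image: Recognition("Taro Yamada", 0.81))
router.register("free", "printed", lambda image: Recognition("11,000", 0.93))
router.register("any", "barcode_2d", lambda image: Recognition("PAYEE:12345", 0.99))

result = router.recognize(b"<image bytes>", "fixed", "handwritten")
print(result.text, result.confidence)  # Taro Yamada 0.81
```

    In this sketch, a single engine interface lets third-party and in-house recognizers sit side by side and be replaced independently. Every result carries its confidence value so that downstream checks can use it.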

    Technology-based and Service-based Approaches to Efficiency Improvement

    Fig. 3—Technology-based Approach to Use of Form Recognition Service to Improve Efficiency
    To improve the efficiency of various types of form processing, the form recognition service uses AI-OCR and other such technologies to scan both fixed- and free-format forms with high recognition rates, also providing confidence scores to help reduce the amount of work required for checking the OCR output.

    The ways in which the form recognition service can be used to improve efficiency can be broadly divided into the following two approaches.

    (1) Technology-based approach

    The form recognition service addresses applications in which OCR is difficult to use for technical reasons. Figure 3 shows an overview.

    1. Reduce workloads by using service to process a wide variety of forms
      If AI-OCR is to reduce the workload of data entry staff, it must meet two requirements. It must keep improving its character recognition rates, and it must be able to read a wide range of different forms. Free-format forms such as invoices add a further requirement. Because the location of the monetary amounts differs from company to company, AI-OCR must work without these field locations being specified in advance.
      Recognition accuracy should also not depend on the idiosyncrasies of individual people’s handwriting. To achieve this, AI-OCR combines deep learning with advanced natural language processing. Deep learning was used to learn the variations in character shapes, spacing, and similar features found in handwriting, yielding highly accurate handwriting recognition AI. Advanced natural language processing was used to estimate, at the word level, the probability of each character appearing. These estimates serve two purposes: assessing the accuracy of the handwriting recognition AI’s output, which keeps recognition error rates low, and calculating the confidence scores discussed below.
      The form recognition service further improves text recognition accuracy by selecting the optimal AI-OCR engine for fixed- or free-format forms. The available engines include third-party recognition technology as well as Hitachi’s own. The service also includes a learning capability that updates the recognition model to keep improving character recognition rates. For free-format forms, the service can read a wide range of layouts because it identifies where the required fields are located without these locations having to be specified in advance. All that is needed is to specify which information to extract for the task at hand.
    2. Reduce workloads by using confidence scores to rate recognition accuracy
      As an expectation of 100% recognition accuracy is unrealistic, the scanned information needs to be checked by humans. Distinguishing between tasks that can be automated by RPA (processing that can be delegated to machines) and those for which human checking is required is an important factor in determining how much the volume of this checking work can be reduced.
      The form recognition service provides a confidence score for the accuracy of AI-OCR, calculated by a proprietary Hitachi algorithm. Task-specific check rules, such as checks on format or the number of digits, can then be applied to the OCR output and its confidence score, with remedial action taken where needed. This reduces the risk of incorrect data entry caused by AI-OCR failing to read text or misrecognizing it. It also means that instances of misrecognition by AI-OCR can be caught as part of the work procedure.
      In this way, form processing can be automated further while the risk of incorrect data entry is reduced. For example, OCR results with a high confidence score can proceed automatically without checking, while only those with a medium or low score are checked by humans. The business risk of invalid data entry can be further minimized by cross-checking OCR data against data from other systems and adjusting the confidence score accordingly.
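    The handling of confidence scores described above can be sketched as a small pipeline with four steps:

    • combine the recognizer’s candidate readings with a word-level prior to obtain a confidence score;
    • lower that score when a cross-check against another system fails;
    • apply a task-specific check rule;
    • let only high-confidence fields through automatically.

    This is a minimal Python sketch under stated assumptions, not Hitachi’s proprietary algorithm. Several pieces are hypothetical: the normalized-posterior confidence, the dictionary standing in for a language model, the 0.95 and 0.70 band thresholds, the halving of the score on a failed cross-check, and all field values.

```python
import re
from typing import Callable, Dict, Optional, Set, Tuple

FLOOR = 1e-4  # prior assumed for readings the word model has never seen


def rescore(candidates: Dict[str, float], word_prior: Dict[str, float]) -> Tuple[str, float]:
    """Combine recognizer scores with a word-level prior.

    Returns the best reading and its normalized posterior, used here as the
    confidence score (a stand-in for the proprietary algorithm).
    """
    scores = {w: p * word_prior.get(w, FLOOR) for w, p in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())


def route(value: str, conf: float, check_rule: Callable[[str], bool],
          high: float = 0.95, low: float = 0.70) -> Tuple[str, str]:
    """Map a field to a confidence band and an action (hypothetical thresholds)."""
    if not check_rule(value):
        return "low", "human_check"  # a failed check rule overrides the score
    if conf >= high:
        return "high", "auto"
    return ("medium" if conf >= low else "low"), "human_check"


def process_field(candidates: Dict[str, float], word_prior: Dict[str, float],
                  check_rule: Callable[[str], bool],
                  master: Optional[Set[str]] = None) -> Tuple[str, float, str, str]:
    value, conf = rescore(candidates, word_prior)
    if master is not None and value not in master:
        conf *= 0.5  # cross-check failed: lower the confidence (hypothetical factor)
    band, action = route(value, conf, check_rule)
    return value, round(conf, 3), band, action


def seven_digits(v: str) -> bool:
    return re.fullmatch(r"\d{7}", v) is not None


def amount_format(v: str) -> bool:
    return re.fullmatch(r"\d{1,3}(?:,\d{3})*", v) is not None


# Branch name: the recognizer is unsure, but the word prior favours a real name.
branch = process_field({"Shinjuku": 0.6, "Shinjaku": 0.4},
                       {"Shinjuku": 0.02, "Shibuya": 0.02}, str.isalpha)
# Account number: valid format, but absent from the (hypothetical) master data.
account = process_field({"1234567": 0.9, "1234561": 0.1}, {}, seven_digits,
                        master={"7654321"})
# Amount: read confidently, but the letter O fails the format check.
amount = process_field({"12,0O0": 0.99}, {}, amount_format)

for field in (branch, account, amount):
    print(field)
```

    With these inputs, the branch name is accepted automatically with high confidence. The account number and the amount are both routed to a human: the first because the cross-check lowered its score, the second because it broke the format rule despite a confidence of 1.0. This mirrors the point that check rules catch misrecognitions that the score alone would miss.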

    Fig. 4—Service-based Approach to Use of Form Recognition Service to Improve Efficiency
    The form recognition service is cloud-based to provide scalability in response to individual requirements, including seasonal work where there is a peak in workload or work involving forms that are handled in small volumes or in many different formats.

    (2) Service-based approach

    The form recognition service also overcomes the problem of applications where conventional OCR is usable but impractical for cost-benefit reasons. Figure 4 shows an overview.

    1. Use in applications that handle forms in small volumes or many different formats
      In some applications where forms suitable for scanning by OCR are used, the nature of the work is such that the volume of forms processed is low or their formats are complex. Conventional OCR can be impractical for cost-benefit reasons in cases like these, such as when dedicated scanners are needed.
      The cloud-based form recognition service, in contrast, requires no dedicated scanners and can use the devices that companies already have on site. It includes a learning function so that recognition accuracy keeps improving. It can also be deployed and operated at lower cost than conventional on-premises systems. Finally, it is designed to serve a wide variety of businesses without users needing to worry about task-specific forms or devices.
    2. Use for seasonal tasks with high short-term workloads
      There are numerous cases of forms used in business that are mainly processed at a particular time of the year. The workloads during these busy times can be many times or even an order of magnitude higher than the rest of the year.
      Because the form recognition service operates on a service platform in the cloud, it can scale to meet special requirements such as these periods of high workload.

    Application to Data Entry and Checking at Processing Center

    Fig. 5—Example Application of Form Recognition Service (Efficiency Improvement at Processing Center)
    Form data entry at the processing center is performed manually by staff and the entered data is checked visually against the original form. The confidence score provided by the form recognition service (high, medium, or low) can be used to identify when manual data entry is or is not required.

    The processing of money transfers is one of the three main forms of activity at a financial institution. It refers to the making of payments or other movements of money from one account to another without the use of cash, and may take various forms such as bank transfers, remittances, or the shifting of funds between accounts. It involves bank branches accepting various types of transfer forms from customers and passing them to a processing center where the data on the form is entered and data entry checked. Because these centers process such a large number of forms, high staff workloads and recruitment difficulties are among the issues they face. While they may deal with this by delegating money transfer processing to business process outsourcing services, this does not make the task of data entry and checking go away.

    By reducing the workload of data entry and checking at processing centers, the form recognition service offers a way to ensure business sustainability and cut overheads. Figure 5 lists the issues and the benefits of adopting the form recognition service.

    Forms processing is something that takes place not only at financial institutions, but also in numerous industries, such as the handling by government agencies of applications from the public or the freight industry’s processing of the various forms that it produces every day. The aim of the form recognition service is to support the processing of forms in different industries by scanning a wide variety of these forms and assessing the accuracy of the extracted information.

    Future Plans

    One of the applications where AI-OCR still struggles is the reading and analysis of loosely formatted business documents such as contracts or product catalogues (unstructured data).

    Reading free-format forms with AI-OCR requires that format variability be limited. The total amount field on an invoice is an example: its location is broadly consistent, with only small differences between companies. For this reason, such forms are sometimes referred to as partially free format.
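    On a partially free-format form, a field can be found from what it is rather than from where it is. The Python sketch below illustrates the idea with a deliberately simple heuristic, swapped in for Hitachi’s AI-based field identification. Given OCR segments with positions, it looks for a label such as "Total" or "Amount due" and takes the nearest amount-like segment to its right or directly below. The `Token` structure, the coordinate convention (y increasing down the page), the tolerances, and the label list are all assumptions.

```python
import math
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Token:
    text: str   # text of one OCR segment (may contain several words)
    x: float    # left edge of the segment
    y: float    # vertical centre of the segment (y grows down the page)


# Hypothetical format rule for a monetary amount such as "11,000" or "¥11,000".
AMOUNT = re.compile(r"[¥$]?\d{1,3}(?:,\d{3})*")


def find_amount(tokens: List[Token], labels: Iterable[str],
                line_tol: float = 5.0, col_tol: float = 40.0) -> Optional[str]:
    """Return the amount closest to a matching label, looking right or below."""
    wanted = {label.lower() for label in labels}
    best, best_dist = None, math.inf
    for label in (t for t in tokens if t.text.lower() in wanted):
        for tok in tokens:
            if not AMOUNT.fullmatch(tok.text):
                continue
            right = abs(tok.y - label.y) <= line_tol and tok.x > label.x
            below = tok.y > label.y + line_tol and abs(tok.x - label.x) <= col_tol
            if right or below:
                dist = math.hypot(tok.x - label.x, tok.y - label.y)
                if dist < best_dist:
                    best, best_dist = tok.text, dist
    return best


# Only the information to extract is specified, never its position.
TOTAL_LABELS = ["total", "amount due", "grand total"]

# Two hypothetical invoices that place the total differently.
invoice_a = [Token("Subtotal", 10, 100), Token("10,000", 200, 100),
             Token("Total", 10, 120), Token("11,000", 200, 120)]
invoice_b = [Token("Qty", 10, 50), Token("3", 60, 50),
             Token("Amount due", 300, 50), Token("¥11,000", 300, 70)]

print(find_amount(invoice_a, TOTAL_LABELS))  # 11,000
print(find_amount(invoice_b, TOTAL_LABELS))  # ¥11,000
```

    Only the label list changes from task to task. This reflects the earlier point that users specify which information to extract rather than where it sits on the form. A heuristic this simple would break on layouts that vary more widely, which is the limitation the following paragraphs discuss.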

    The term unstructured data refers to documents that lack the kind of structure found in a relational database. In such documents, fields are not consistently positioned relative to one another, and the same field may be expressed differently from one document to another, as in many business documents. Examples of unstructured data include text, images, video, and audio.

    Another key term is “dark data.” It refers to the value hidden in the business documents created in the course of corporate activity. The name reflects the fact that approximately 80% of generated data is never reused(5), (6). Dark data is a broad term that encompasses both structured and unstructured data.

    One example of dark data analysis is extracting sales totals from financial reports published by different companies. The task is complicated by the companies’ use of different terminology, such as “construction revenue” or “total sales.” Despite these variations, this information can be identified by generating feature values such as the sales total and financial period from the hierarchical formats (tables, columns, and so on) used in the financial reports.
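    The financial-report example can be sketched as mapping report-specific row labels onto one canonical feature, while using the table hierarchy to rule out look-alike rows. The Python sketch below substitutes a hand-written synonym table and a simple section-path check for the feature-generation technique described above, whose details are not given here. The synonym entries, row layout, and figures are hypothetical, and the financial period is passed in rather than extracted.

```python
from typing import Dict, List, Tuple

# Hypothetical synonym table: report-specific row labels -> canonical feature.
SYNONYMS: Dict[str, str] = {
    "total sales": "sales_total",
    "net sales": "sales_total",
    "construction revenue": "sales_total",
    "operating revenue": "sales_total",
}

# One table row: (section path in the report hierarchy, row label, value).
Row = Tuple[Tuple[str, ...], str, float]


def extract_features(rows: List[Row], period: str) -> Dict[str, object]:
    """Build a feature record from one company's report rows."""
    features: Dict[str, object] = {"financial_period": period}
    for path, label, value in rows:
        canonical = SYNONYMS.get(label.strip().lower())
        # Use the hierarchy to disambiguate: accept only income-statement rows,
        # so that a segment breakdown in the notes is not mistaken for the total.
        in_income_statement = any("income statement" in p.lower() for p in path)
        if canonical and in_income_statement and canonical not in features:
            features[canonical] = value
    return features


# Hypothetical rows from two companies that label the same figure differently.
company_a = [
    (("Income statement",), "Total sales", 1200.0),
    (("Notes", "Segment information"), "Total sales", 300.0),
]
company_b = [
    (("Consolidated income statement",), "Construction revenue", 850.0),
]

print(extract_features(company_a, "FY2019"))
# {'financial_period': 'FY2019', 'sales_total': 1200.0}
print(extract_features(company_b, "FY2019"))
# {'financial_period': 'FY2019', 'sales_total': 850.0}
```

    Both reports end up with the same `sales_total` feature despite their different wording. The segment figure in company A’s notes is ignored because of where it sits in the hierarchy, not because of its label. A fixed synonym table cannot anticipate every company’s terminology, which is why a learned approach is of interest for unstructured data.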

    Hitachi is looking at applying this new technology to unstructured data that is currently difficult to scan and analyze.

    Conclusions

    This article has described the efficiency improvements provided by a form recognition service based on AI-OCR. It is anticipated that applications for AI-OCR will go beyond the simple reading of text to encompass further operational efficiencies through integration with RPA. In the future, Hitachi intends to continue pursuing efficiency gains that bring innovation to the business workplace using technologies like AI-OCR.

    Anyone interested in learning more about the form recognition service is encouraged to visit Hitachi’s Japanese website(7).