Use of AI for Data Extraction in the Insurance Industry
02.08.2024
In the insurance industry, efficient and smooth document processing is crucial for business success: among other things, it enables fast claims settlement and better customer service. Numerous documents, such as claims notifications, invoices, cost estimates, and customer correspondence, are processed every day. Until now, the industry has mainly relied on Optical Character Recognition (OCR) to extract the text from these documents, followed by a downstream AI model that extracts the relevant information and converts the text into a structured form (see Figure 1).
Figure 1: Common, OCR-based, two-step approach to data extraction.
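The two-step structure can be sketched in a few lines of Python. Both stages are hypothetical stand-ins, not part of any real product or library: `ocr_stage` simulates an OCR engine that misreads a "0" as the letter "O", and a regular expression in `extract_fields` plays the role of the downstream AI model. The sketch also shows why the chaining matters: the second stage only ever sees the OCR output.

```python
import re

# Hypothetical stand-in for stage 1 (OCR). A real system would run an OCR
# engine on the scanned page; here we return text containing a typical
# recognition error: the digit "0" misread as the letter "O".
def ocr_stage(page_id: str) -> str:
    return "Invoice no. 4711\nTotal amount: 1,2O0.50 EUR"

# Hypothetical stage 2: turn raw OCR text into structured fields.
# A simple regex stands in for the downstream AI model.
AMOUNT_PATTERN = re.compile(r"Total amount:\s*([\d.,]+)\s*EUR")

def extract_fields(text: str) -> dict:
    match = AMOUNT_PATTERN.search(text)
    return {"total_amount": match.group(1) if match else None}

def two_stage_pipeline(page_id: str) -> dict:
    # Stage 2 works only on the OCR text, so any stage-1 error is inherited.
    return extract_fields(ocr_stage(page_id))

print(two_stage_pipeline("claim_123_p1"))  # {'total_amount': None}
```

With a correct transcript ("1,200.50 EUR") the same extractor returns "1,200.50"; a single misread character in the first stage costs the whole field in the second.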
While OCR-based methods certainly have their place, the advantages of end-to-end processing without OCR outweigh its disadvantages. In particular, OCR-based approaches are susceptible to the error propagation inherent in the conventional two-stage design: recognition errors made in the OCR step are passed on to the extraction model, which usually cannot correct them. In contrast, end-to-end processing (see Figure 2) significantly reduces computational overhead, eliminates reliance on external OCR tools, and can be tailored to specific enterprise documents. It also simplifies the generation of training data, since document-text pairs can be used directly.
Figure 2: End-to-end approach to data extraction.
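The training-data point can be illustrated with a sketch of what one document-text pair might look like. The text does not specify a target format; the tag-based serialization below is an assumption loosely modeled on the approach described in the Donut paper, and `TrainingPair`, `json_to_target`, and the file names are hypothetical. The key property is that no OCR transcript or word-level alignment is required, only the page images and the structured ground truth.

```python
from dataclasses import dataclass

def json_to_target(obj) -> str:
    """Serialize a nested annotation into a flat target string (assumed format).

    Dict keys become paired tags; list items are separated by "<sep/>".
    """
    if isinstance(obj, dict):
        return "".join(
            f"<s_{key}>{json_to_target(value)}</s_{key}>" for key, value in obj.items()
        )
    if isinstance(obj, list):
        return "<sep/>".join(json_to_target(item) for item in obj)
    return str(obj)

@dataclass
class TrainingPair:
    page_images: list[str]  # paths to the page images of one document (hypothetical)
    target: str             # serialized ground truth the decoder learns to emit

annotation = {
    "invoice_no": "4711",
    "items": [
        {"desc": "Bumper", "price": "450.00"},
        {"desc": "Paint", "price": "120.00"},
    ],
}
pair = TrainingPair(
    page_images=["claim_123_p1.png", "claim_123_p2.png"],
    target=json_to_target(annotation),
)
print(pair.target)
```

The printed target is `<s_invoice_no>4711</s_invoice_no><s_items><s_desc>Bumper</s_desc><s_price>450.00</s_price><sep/><s_desc>Paint</s_desc><s_price>120.00</s_price></s_items>`, which a decoder can learn to generate token by token from the page images alone.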
Our OCR-free GenAI solution uses advanced technologies from Google / Naver Clova (OCR-free Document Understanding Transformer, Donut) and DeepMind (Perceiver) to enable in-depth understanding of multi-page documents. It is specifically designed to accurately extract both individual fields and complex table structures from automotive documents. This marks a significant advance in document processing within the insurance industry.
Because the model views multiple pages simultaneously, it can better recognize connections between them, which leads to more precise information extraction. Details that are introduced on one page and continued on another can be seamlessly integrated. The result is more comprehensive and accurate data extraction than traditional page-by-page systems can achieve.
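One way to picture multi-page viewing is a single joint memory: the embeddings of all pages are concatenated, so one attention step can reach a detail on the first page and its continuation on the second. The NumPy sketch below assumes this design; the random arrays stand in for encoder output, the sizes are hypothetical, and the `page_index` bookkeeping is an illustrative assumption about how an answer could be traced back to its page.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 3 pages with 3, 5 and 4 patch embeddings of width 8.
# Real encoders produce far more embeddings per page.
patches_per_page = [3, 5, 4]
dim = 8

# Stand-in for the image encoder output of each page: shape (patches, dim).
page_embeddings = [rng.standard_normal((n, dim)) for n in patches_per_page]

# One joint memory over all pages, so attention can cross page breaks.
memory = np.concatenate(page_embeddings, axis=0)  # shape (12, 8)

# Record which page each row of the memory came from (0-based page numbers).
page_index = np.concatenate(
    [np.full(len(emb), page) for page, emb in enumerate(page_embeddings)]
)

def page_of(position: int) -> int:
    """Map a row of the joint memory back to the page it belongs to."""
    return int(page_index[position])

print(memory.shape)  # (12, 8)
print(page_of(9))    # 2: rows 8-11 belong to the third page
```

A page-by-page system would instead encode and query each page separately, so a table that starts on one page and ends on the next would reach the extractor as two unrelated fragments.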
At ControlExpert, we are continually committed to serving our customers' needs with innovative solutions. Our latest GenAI technology significantly streamlines the processing of multi-page documents and reflects our drive to keep improving processes. We are convinced that this technology will have a positive impact on the insurance industry and look forward to supporting our customers in this area.
Architecture Deep Dive:
The model processes complete documents directly as images, without relying on traditional text recognition (OCR). Its image encoder, taken from Donut, receives the images of the individual pages and converts them into high-dimensional embeddings. The Perceiver decoder receives the question about the document and attends to the image embeddings via cross attention (see Figure 3) in order to locate and extract the relevant information from this large amount of data. In a conventional decoder, as used in LLaVA, the image embeddings are added to the model's context, so the cost of self-attention grows quadratically with the context length, which includes all image embeddings. Cross attention between a long document and a short question avoids this: for a given question length, its cost grows only linearly with the document length, which keeps memory usage low. The decoder generates the answer to the question; besides the next token, its output includes the bounding box of the answer and the document page on which it is found.
Figure 3: Combination of Donut encoder and Perceiver decoder (adapted from Figure 1 of the Perceiver paper and an image from the Donut paper).
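The efficiency argument can be made concrete with a minimal NumPy sketch of single-head cross attention and a count of attention-score entries for both designs. The sketch omits the learned projections, multiple heads, and output heads (token, bounding box, page) of the real decoder, and the document length of 4,800 embeddings is a hypothetical figure, not a property of the described model.

```python
import numpy as np

def cross_attention(queries: np.ndarray, memory: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product cross attention (no learned projections).

    queries: (M, d) question/answer tokens; memory: (N, d) document embeddings.
    Returns (M, d): one weighted summary of the document per query token.
    """
    d = queries.shape[-1]
    scores = queries @ memory.T / np.sqrt(d)       # (M, N): linear in N for fixed M
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over document positions
    return weights @ memory                        # (M, d)

def attention_score_entries(n_doc: int, n_question: int) -> dict:
    """Number of entries in the attention score matrix for both designs."""
    return {
        # LLaVA-style: document embeddings are part of the text context, so
        # self-attention compares every position with every other position.
        "self_attention_on_joint_context": (n_doc + n_question) ** 2,
        # Perceiver-style decoder: only the question tokens query the document.
        "cross_attention": n_question * n_doc,
    }

rng = np.random.default_rng(0)
doc = rng.standard_normal((4800, 64))      # hypothetical document embeddings
question = rng.standard_normal((32, 64))   # hypothetical question tokens
out = cross_attention(question, doc)
print(out.shape)  # (32, 64)
print(attention_score_entries(4800, 32))
# {'self_attention_on_joint_context': 23348224, 'cross_attention': 153600}
```

For this hypothetical size, the joint-context design has about 150 times more score entries than cross attention (the decoder's self-attention over its own short question and answer tokens adds only a small M² term, not counted here). Doubling the document doubles the cross-attention cost but roughly quadruples the joint-context cost.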