Harnessing information from unstructured data using large language models
The insurance industry is no stranger to data; it thrives on it. Yet, a significant portion of this data is unstructured, trapped in documents such as attending physician statements. However, that is quickly changing. Large language models (LLMs) have emerged as an accessible technology that is able to unlock these treasure troves of data and deliver previously difficult-to-access information and insights.
In this article series, we will explore how LLMs work, consider likely use cases in underwriting and claims, and examine the factors insurers must consider as they develop their AI strategies.
Large language models: The basics
We’ve all heard of ChatGPT – the most famous application of large language models. But what does that mean, exactly? An LLM is a type of generative natural language processing model that can perform a variety of tasks, including generating and classifying text, answering questions conversationally, and translating text from one language to another.
LLMs convert vast amounts of text, such as articles and wikis, into numerical encodings from which they are ‘trained’ to understand context. The label “large” refers to the number of trainable values (parameters) in the model. A language model can also learn from human feedback in a process called reinforcement learning from human feedback, or RLHF. Some of the most successful LLMs, like GPT-3, have hundreds of billions of parameters and have been trained on the equivalent of 600,000 books’ worth of data plus feedback from several hundred people.
The ability of LLMs to generate meaningful text by predicting the next viable word (or “token,” to be precise) in a sentence places them in a class of AI algorithms known as Generative AI.
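That next-token loop can be sketched in a few lines. In a real model, a neural network scores every token in its vocabulary; here a hypothetical hand-written probability table stands in for those scores, with invented numbers for illustration only.

```python
# Toy sketch of next-token prediction, the core loop behind LLM text
# generation. The probability table below is a stand-in for the neural
# network that scores candidate tokens in a real model.

# P(next token | current token) -- illustrative numbers, not real data
NEXT_TOKEN_PROBS = {
    "the": {"policy": 0.6, "claim": 0.3, "insurer": 0.1},
    "policy": {"covers": 0.7, "lapsed": 0.3},
    "covers": {"disability": 0.6, "death": 0.4},
}

def generate(start: str, max_tokens: int = 3) -> list[str]:
    """Greedily append the highest-probability next token at each step."""
    tokens = [start]
    for _ in range(max_tokens):
        choices = NEXT_TOKEN_PROBS.get(tokens[-1])
        if not choices:
            break  # no known continuation for this token
        tokens.append(max(choices, key=choices.get))
    return tokens

print(generate("the"))  # follows the highest-probability path
```

Real LLMs usually sample from the probability distribution rather than always taking the top token, which is what makes their output varied rather than deterministic.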
Pre-training and fine-tuning
More recent LLMs such as GPT-4 were pre-trained (trained from scratch) on at least one petabyte of data. That’s the equivalent of reading one billion books. Given the amount of data and hardware required, pre-training LLMs can be expensive. Fortunately, workarounds exist to lower the costs. One example is fine-tuning, which involves adjusting and adapting a pre-trained model to meet a specific need. For instance, an insurance provider could apply its own data to a pre-trained LLM to fine-tune it toward outputting an underwriting recommendation.
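As a loose analogy, the pre-train-then-fine-tune workflow can be sketched with a tiny one-parameter model: fit it on a large “generic” dataset, then continue training briefly on a small domain-specific dataset so it adapts. All data and numbers here are invented; real fine-tuning adjusts billions of parameters using specialized frameworks.

```python
# Toy illustration of pre-training vs. fine-tuning with a one-parameter
# linear model y = weight * x, trained by gradient descent on MSE.

def train(weight: float, data: list[tuple[float, float]],
          lr: float, steps: int) -> float:
    """Gradient descent on mean squared error for y = weight * x."""
    for _ in range(steps):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight

# "Pre-training corpus": large generic dataset following y = 2x
generic_data = [(x, 2.0 * x) for x in range(1, 6)]
# "Insurer's own data": small domain-specific dataset following y = 2.5x
domain_data = [(x, 2.5 * x) for x in range(1, 4)]

pretrained = train(0.0, generic_data, lr=0.01, steps=200)   # expensive step
finetuned = train(pretrained, domain_data, lr=0.01, steps=200)  # cheap step
print(round(pretrained, 3), round(finetuned, 3))
```

The key point mirrors the text: the fine-tuning pass starts from the pre-trained weight rather than from scratch, which is why it needs far less data and compute.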
The least expensive method of customizing an LLM is to add an information retrieval system that enhances the data available to the model on a particular subject. This is called retrieval-augmented generation, or RAG for short. It is what allows us, for example, to extract and process details from an attending physician statement. RAG architectures are both efficient and cost-effective for large-scale, subject-specific information processing use cases.
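A minimal sketch of the RAG pattern: retrieve the passages most relevant to a question, then prepend them to the prompt sent to an LLM. The document snippets are invented, and a simple word-overlap score stands in for the embedding-based search a production system would use.

```python
# Minimal retrieval-augmented generation (RAG) sketch: retrieve relevant
# context, then build an augmented prompt for the LLM. Snippets and the
# scoring function are simplified stand-ins for illustration.

DOCUMENTS = [
    "Attending physician statement: patient diagnosed with hypertension in 2019.",
    "Claim form: policyholder requests disability benefits starting March.",
    "Lab report: cholesterol within normal range.",
]

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the question."""
    q_words = set(question.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(question: str, docs: list[str]) -> str:
    """Augment the question with retrieved context before calling the LLM."""
    context = "\n".join(retrieve(question, docs))
    return f"Context:\n{context}\n\nQuestion: {question}"

print(build_prompt("When was the patient diagnosed with hypertension?", DOCUMENTS))
```

The efficiency the text mentions comes from this division of labor: the retrieval step narrows a large document store down to a few relevant passages, so the (expensive) model call only processes what matters.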
Key strengths of LLMs
Powered by vast artificial neural networks, LLMs are sophisticated AI systems that offer an exciting array of opportunities for life and disability insurance companies, including:
- Synthesizing unstructured data: LLMs excel at extracting information from disparate sources, and consolidating data for efficient analysis.
- Conversational access to information: They offer a novel way to interact with unstructured data, making it more accessible and actionable.
- Competitive advantage: Companies with proprietary, domain-specific, unstructured data (e.g., medical underwriting data) can gain a significant edge by harnessing the previously untapped potential of their data reserves.
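The first strength above – turning unstructured documents into structured data – can be sketched in miniature. A real pipeline would hand the document text to an LLM with an extraction prompt; here a regular expression stands in so the example runs without a model, and the statement text and field names are invented for illustration.

```python
# Sketch of extracting structured fields from an unstructured attending
# physician statement. A regex stands in for the LLM extraction step.
import re

STATEMENT = """
Patient Name: Jane Doe
Date of Birth: 04/12/1968
Diagnosis: Type 2 diabetes, well controlled
Date of Diagnosis: 06/2015
"""

def extract_fields(text: str) -> dict[str, str]:
    """Map 'Label: value' lines into a structured record."""
    fields = {}
    for match in re.finditer(r"^(?P<label>[^:\n]+):\s*(?P<value>.+)$",
                             text, re.MULTILINE):
        fields[match.group("label").strip()] = match.group("value").strip()
    return fields

record = extract_fields(STATEMENT)
print(record["Diagnosis"])  # structured access to formerly free-form text
```

Unlike this regex, an LLM can handle statements whose wording and layout vary from document to document, which is precisely why the technology unlocks data that rule-based extraction could not.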
Making LLM integration viable
While the promise of LLMs is enticing, there are practical considerations that insurers must address to ensure successful implementation. The two most prominent are cost and technology.
The cost-effectiveness of LLM integration largely depends on the chosen training approach. Insurers should assess their specific use cases to determine the most efficient strategy. LLM training can be costly, but techniques like fine-tuning and RAG can mitigate expenses and optimize the process. Careful management of cloud costs is vital, too, as significant expenses can accrue if not monitored diligently. Since properly curated datasets are critical for model accuracy and performance, insurers must also factor in the resource costs of creating datasets for training LLMs.
LLMs require suitable hardware, particularly GPUs, for hosting and inference (i.e., generating text outputs). Carriers need to assess their hardware infrastructure and invest accordingly. The choice between utilizing cloud-based services or investing in dedicated hardware is crucial. Long-term objectives and budget constraints need to be considered when making this decision.
Immense rewards
Harnessing the power of LLMs is not without its challenges, but the rewards could be immense. By embracing LLMs, insurers can transform unstructured data into actionable insights, thereby potentially enhancing decision-making, risk assessment, and customer service.
Insurers must carefully weigh cost factors and technology choices to make LLM integration into their operational processes a viable objective. Those who embrace this transformative technology may find themselves at the forefront of innovation in the life and disability insurance sector, better equipped to meet the evolving needs of their clients and navigate the complex landscape of modern insurance.
Next article: Using large language models for AI-enabled underwriting and claims.