abstract data lines
© Dmitry / stock.adobe.com

Building a robust, web-based application backed by well-grounded software practices

Data Engineering Case Study

    alt txt

    properties.trackTitle

    properties.trackSubtitle

    Overview

    Munich Re's North America Data Engineering team (DE), within the Integrated Analytics group, brings unique models to life. Data Engineers wear many hats: they are designers, developers, and debuggers. Our team creates and maintains the infrastructure that our business partners use to integrate data science insights into pre-existing or novel workflows.

    Recently, our team delivered a web-based, client-facing application that is now used in the daily underwriting workflow – allowing the client to receive case data and provide model scores through API requests. In this case study, we explore the many design decisions and technical expertise that made it possible to build and integrate a non-trivial underwriting solution within a short timeframe. This process is the software development and deployment cycle in action, customized to align with the evolving insurance industry and our client's specific needs.

    Development
    Designing the application

    Based on client requirements and our standard software best practices, the Data Engineering team approached the project by focusing on two primary aspects:
    data science - case study - web app
    data science - case study - web app
    DE devised a secure and robust technology stack that leveraged the best-suited innovations for the task and utilized an Agile software development process to ensure the client's needs were met expediently. The stack consists of two main parts: the backend (a collection of the pieces responsible for processing and storing the data) and the frontend (a client-facing user interface). The table (Technical Components) below highlights a few technical components of the application and the rationale for selection. The design decisions were oriented around a cloud-native architecture. Security received further consideration and concern, which is discussed in detail later in this article.
    What is Agile Software Development? Agile is a project management framework that encourages frequent, incremental deliverables and high levels of collaboration. Feature development, testing, and deployment tasks are planned and spread equally amongst team members, allowing quality work to be produced quickly. For this project, as features were developed, they were deployed and made available for the client to test in real-time, resulting in iterative improvements.
    App Component Technology What is it? Why is it used?
    Frontend ReactJS JavaScript framework for creating interactive user interfaces (UI). Provides an easy way to create a dashboard to display statistics about the cases submitted.
    Parcel Web application bundler, which combines the multiple assets that are needed for the UI (HTML, CSS, and JavaScript). Makes the UI more efficient to run on modern web browsers.
    Backend Flask (with gunicorn) Python libraries and tools for creating web-based applications. Allows for lightweight applications that can handle complexity. Also used to distribute API request loads.
    CosmosDB A NoSQL Azure database that supports SQL querying. Enables data retrieval via PowerBI. Stores the parsed details of requests submitted. It is designed for fast retrieval of items and does not require a schema.
    Azure Blob Storage Storage for unstructured data (like model object files). Provides a simple service for saving and loading files in the cloud.
    Deployment Docker Designed to make it easier to create, deploy, and run applications by using containers. Allows a developer to package up the needed parts for an application (libraries, dependencies, etc.) and ship it as one package.
    Azure Pipeline and Azure App Service Continuous integration and continuous deployment (CI/CD) tool. Code is deployed to “slots” set up in App Service, with separate regions for Development, Non-Production (also called UAT), and Production. Streamlines the integration of new changes into the system by automatically building, testing, and deploying application code.

    Deployment
    Making the application available to the client

    The application heavily utilizes Azure Pipeline to build and deploy the application created using Azure App Services. When changes are made to the code, Azure Pipeline runs a predefined series of jobs encompassing the app's components, including automated testing. This process is commonly called continuous integration and continuous deployment (CI/CD), where manual intervention is limited. The testing environments are identical copies of the production environment where the final application runs.
    Deployment model
    1. Developer commits new changes via pull requests to Azure Repos.
    2. If the tests pass, another developer will peer-review and approve changes.
    3. On approval, the Azure Pipeline is triggered to create containerized development instances of the application, run automated unit tests and check code coverage.
    4. Once the application is tested in the application's development region, Azure Pipeline is triggered to deploy the new code to a User Acceptance Testing (UAT) environment, which is otherwise identical to the production environment where the final app runs.
    5. The client runs regression tests on this copy of the application to validate the changes.
    6. Upon client approval, the change to the version of the application moves to production.

    Once the application is live, the client uses API calls to submit the fields required to predict each applicant. In return, it receives a real-time response that contains the model output and any other calculated fields. Clients are generally well equipped to integrate these responses into their workflows, though data engineering also assists along the way to ensure a smooth process. Other integrated functionalities of Azure Cloud Services, such as health monitoring, are used to check the live production application's availability and responsiveness. This automatic process can restart parts of the application to make sure it is always live and working for the client.

    Testing is consistent throughout both development and deployment to ensure that the application acted as expected. During development, unit tests, or tests that cover small, well-defined portions of the code, are run on sample data. More technically challenging tests verify that reading and writing to databases were working correctly. For this, Azure's CosmosDB Emulator is used to make mock connections to local copies of the database to test the application without incurring cloud storage costs. Finally, user acceptance testing on a parallel but isolated copy of the production app is vital to ensure the process worked as desired from a business user's perspective.

    Security
    Protecting the application

    Security is paramount since data submitted to the API contains sensitive information, and the model response is also proprietary. During development, sensitive material was received from the client using Secure File Transfer Protocol (SFTP) rather than email. SFTP provides a secure means of transferring files. As part of the deployment, all traffic to and from the API is securely encrypted. To ensure no malicious user can make API requests, the API endpoint is accessible only to users with an API access key that Azure App Services have authenticated. Also, the production and non-production app instances are isolated and use different databases with different access keys. If a malicious user ever received access to the API key, that user could post requests but would not read any of the information stored in the databases. Logging can also assist in detecting intrusion attempts by detecting a suspicious web crawling pattern, and the app is made temporarily unavailable while the event is investigated.

    Finally, the entire Azure infrastructure is contained in a virtual network. A virtual network isolates the resources inside the network from the public. All virtual network resources can communicate using private IP addresses, but the network is generally hidden from the public. Azure Firewall also monitors and controls the incoming and outgoing network traffic based on predetermined security rules. This establishes a barrier between a trusted internal network and untrusted external network, such as the Internet, and provides another security layer. 

    Security is paramount since data submitted to the API contains sensitive information, and the model response is also proprietary.

    Conclusion

    This case study presents a high-level view of the decisions and processes that our Data Engineering team undertook to deploy machine learning model predictions as a web-based service for our client. Throughout, we considered various choices for cloud architecture and security, with some options ruled out due to the client's requirements or after testing. As Data Engineers, we strive to improve. Our continuous integration/continuous deployment (CI/CD) process was also enhanced over time. For example, resource allocation between the production and non-production applications was modified to optimize performance while managing costs. Currently, the application is stable in production and returning secure, real-time results to the client.

    Integrated Analytics provides white-labeled analytics solutions for Munich Re's North America Life clients. To that end, Data Science and Data Engineering will work hand in hand with clients to develop products that meet business needs and technical requirements.  

    Contact Us
    Batra Raghav
    Raghav Batra
    Staff Engineer, Integrated Analytics
    Munich Re North America Life
    Srinivasan Pragnya
    Pragnya Srinivasan
    Director, Integrated Analytics
    Munich Re North America Life

    Related Content