The main function of MLOps is to automate the repeatable steps in the ML workflow for data scientists and ML engineers, from model development and training to model deployment and operation (model serving). Automating these steps creates agility for the business and a better experience for users and end customers, increasing the speed, functionality, and reliability of ML. These automated processes also reduce risk and free developers from rote tasks, allowing them to spend more time innovating. It all contributes to profitability: a 2021 global study by McKinsey found that companies that successfully scale AI can increase their earnings before interest and taxes (EBIT) by as much as 20%.
“It’s not uncommon for companies with sophisticated ML capabilities to incubate different ML tools in various areas of the business,” said Vincent David, Senior Director of Machine Learning at Capital One. “But often you start to see similarities — ML systems do similar things, but slightly differently. Companies that are figuring out how to get the most out of their ML investments are unifying and augmenting their best ML capabilities to create standardized, foundational tools and platforms that everyone can use — and ultimately create differentiated value in the marketplace.”
In practice, MLOps requires close collaboration between data scientists, ML engineers, and site reliability engineers (SREs) to ensure consistent repeatability, monitoring, and maintenance of ML models. Over the past few years, Capital One has developed MLOps best practices that apply across industries: balancing user needs, adopting a common cloud-based technology stack and underlying platform, leveraging open source tools, and ensuring appropriate levels of accessibility and governance for data and models.
Understand the different needs of different users
There are usually two main types of users of ML applications — technical experts (data scientists and ML engineers) and non-technical experts (business analysts) — and it is important to strike a balance between their different needs. Technical experts often prefer complete freedom to use all available tools to build models for their intended use cases. Non-technical experts, on the other hand, need user-friendly tools that give them access to the data they need to create value in their own workflows.
To build consistent processes and workflows that cater to both groups, David recommends meeting with the application design team and subject matter experts across a wide range of use cases. “We look at specific cases to understand the problem, so that users get what they need to support not just their own work but the company as a whole,” he said. “The key is figuring out how to create the right capabilities while balancing the various stakeholders and business needs within the enterprise.”
Adopt a common technology stack
If teams do not use the same technology stack, collaboration between development teams — critical to successful MLOps — can be difficult and time-consuming. A unified technology stack allows developers to standardize and reuse components, functions, and tools across models, like Lego bricks. “This makes it easier to combine related functionality, so developers don’t waste time switching from one model or system to another,” David said.
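The Lego-brick idea can be sketched in a few lines — here as a hypothetical shared registry of pipeline components that different model teams compose in different orders. The registry, component names, and pipeline builder are illustrative assumptions, not any specific platform's API:

```python
# Hypothetical sketch: a shared registry of reusable pipeline components.
# All names here are illustrative, not a real platform's API.
from typing import Callable, Dict, List

COMPONENTS: Dict[str, Callable] = {}

def component(name: str):
    """Register a function as a reusable pipeline component."""
    def decorator(fn: Callable) -> Callable:
        COMPONENTS[name] = fn
        return fn
    return decorator

@component("clip_outliers")
def clip_outliers(values: List[float], limit: float = 3.0) -> List[float]:
    # Clamp extreme values into [-limit, limit].
    return [max(-limit, min(limit, v)) for v in values]

@component("normalize")
def normalize(values: List[float]) -> List[float]:
    # Rescale values into [0, 1].
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]

def build_pipeline(steps: List[str]) -> Callable:
    """Compose registered components — the 'Lego brick' reuse."""
    fns = [COMPONENTS[s] for s in steps]
    def run(values: List[float]) -> List[float]:
        for fn in fns:
            values = fn(values)
        return values
    return run

# Two different models reuse the same bricks in different combinations.
fraud_features = build_pipeline(["clip_outliers", "normalize"])
credit_features = build_pipeline(["normalize"])
print(fraud_features([-10.0, 0.0, 5.0]))  # → [0.0, 0.5, 1.0]
```

The point of the sketch is that each brick is written, tested, and documented once, then shared — rather than each model team maintaining its own slightly different copy.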
A cloud-native stack built to take advantage of the cloud model of distributed computing allows developers to self-service infrastructure on demand, continuously leveraging new capabilities and introducing new services. Capital One’s decision to go all-in on the public cloud had a significant impact on developer efficiency and speed. Now, releasing code to production is much faster, and ML platforms and models can be reused across the wider enterprise.
Save time with open source ML tools
Open source ML tools — code and programs that anyone can use and adapt for free — are a core element in creating a strong cloud foundation and unified technology stack. Using existing open source tools means companies don’t need to invest valuable technical resources in reinventing the wheel, allowing teams to build and deploy models faster.
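As a small illustration — assuming scikit-learn, one widely used open source ML library — a few lines compose feature scaling and a model from off-the-shelf parts, rather than reimplementing either from scratch (the data here is a toy example):

```python
# Sketch: composing existing open source parts (scikit-learn) instead of
# hand-rolling scaling and model-fitting code. Toy data for illustration.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([
    ("scale", StandardScaler()),      # standardize features
    ("model", LogisticRegression()),  # off-the-shelf classifier
])

X = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [3.0, 3.0]]
y = [0, 0, 1, 1]
pipe.fit(X, y)
print(pipe.predict([[2.5, 2.5]]))
```

Every step here — scaling, fitting, prediction — is maintained, tested, and documented by the open source community, which is exactly the engineering effort a team avoids duplicating.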