
What is MLflow?

MLflow is an open source platform for managing machine learning workflows. It was launched in 2018 and has grown in popularity ever since, reaching 10 million users in November 2022. AI enthusiasts and professionals have long struggled with experiment tracking, model management and code reproducibility, so when MLflow launched, it addressed pressing problems in the market. MLflow is lightweight and able to run on an average-priced machine, but it also integrates with more complex tools, making it ideal for running AI at scale.

A short history

Since MLflow was first released in June 2018, the community behind it has run a recurring survey to better understand user needs and ensure the roadmap addresses real-life challenges. About a year after the launch, MLflow 1.0 was released, introducing features such as improved metric visualisations, metric X coordinates, improved search functionality and HDFS support. Additionally, it offered Python, Java, R, and REST API stability.

MLflow 2.0 landed in November 2022, when the product also celebrated 10 million users. This version incorporates extensive community feedback to simplify data science workflows and deliver innovative, first-class tools for MLOps. Features and improvements include extensions to MLflow Recipes (formerly MLflow Pipelines) such as AutoML, hyperparameter tuning, and classification support, as well as improved integrations with the ML ecosystem, a revamped MLflow Tracking UI, a refresh of core APIs across MLflow’s platform components, and much more.

In September 2023, Canonical released Charmed MLflow, a distribution of the upstream project.

Why use MLflow?

MLflow is often considered the most popular ML platform. It enables users to perform different activities, including:

  • Reproducing results: ML projects usually start with simplistic plans and tend to grow quickly, resulting in an overwhelming number of experiments. Manual or non-automated tracking implies a high chance of missing finer details. ML pipelines are fragile, and even a single missing element can throw off the results. The inability to reproduce results and code is one of the top challenges for ML teams.
  • Easy to get started: MLflow can be easily deployed and does not require heavy hardware to run. It is suitable for beginners who are looking for a solution to better see and manage their models. For example, this video shows how Charmed MLflow can be installed in less than 5 minutes.
  • Environment agnostic: The flexibility of MLflow across libraries and languages is possible because it can be accessed through a REST API and Command Line Interface (CLI). Python, R, and Java APIs are also available for convenience.
  • Integrations: While MLflow is popular in itself, it does not work in a silo. It integrates seamlessly with leading open source tools and frameworks such as Spark, Kubeflow, PyTorch or TensorFlow.
  • Works anywhere: MLflow runs on any environment, including hybrid or multi-cloud scenarios, and on any Kubernetes.

MLflow components

MLflow is an end-to-end platform for managing the machine learning lifecycle. It has four primary components:

MLflow Tracking

MLflow Tracking enables you to track experiments, with the primary goal of comparing results and the parameters used. It is crucial when it comes to measuring performance, as well as reproducing results. Tracked parameters include metrics, hyperparameters, features and other artefacts that can be stored on local systems or remote servers. 
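
To illustrate, here is a minimal sketch of how experiment tracking typically looks with the MLflow Python API; the tracking URI, parameter names and metric values are placeholders for your own setup.

```python
import mlflow

# Point the client at a tracking server; if unset, MLflow logs to local files.
# The URI below is a placeholder for your own MLflow instance.
mlflow.set_tracking_uri("http://localhost:5000")

with mlflow.start_run(run_name="baseline"):
    # Hyperparameters used for this run
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("n_estimators", 100)
    # Metrics measured during or after training
    mlflow.log_metric("accuracy", 0.93)
    # Any file (plots, datasets, serialised models) can be logged as an artefact
    mlflow.log_artifact("confusion_matrix.png")
```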

MLflow Models

MLflow Models provide professionals with different formats for packaging their models. This gives flexibility in where models can be used, as well as the format in which they will be consumed. It encourages portability across platforms and simplifies the management of the machine learning models. 
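
As a rough sketch of what this looks like in practice, assuming scikit-learn is installed, the snippet below logs a model in its framework-specific flavour and loads it back through the generic pyfunc interface; the model and data are purely illustrative.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier().fit(X, y)

with mlflow.start_run() as run:
    # Store the model in the sklearn flavour; MLflow also records a generic
    # pyfunc flavour so it can be loaded the same way regardless of framework.
    mlflow.sklearn.log_model(model, artifact_path="model")

# Load the model back in a framework-agnostic way and score a few rows
loaded = mlflow.pyfunc.load_model(f"runs:/{run.info.run_id}/model")
print(loaded.predict(X[:5]))
```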

MLflow Projects

Machine learning projects are packaged using MLflow Projects, which ensures reusability, reproducibility and portability. A project is a directory that gives structure to the ML initiative. It contains the descriptor file used to define the project structure and all of its dependencies. The more complex a project is, the more dependencies it has, and dependencies bring risks around version compatibility and upgrades.

MLflow Projects is especially useful when running ML at scale, where there are larger teams and multiple models being built at the same time. It enables collaboration between team members who want to jointly work on a project, transfer knowledge between themselves, or move work to production environments.
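
As a hedged sketch, assuming a project directory that already contains an MLproject descriptor with a main entry point, running it programmatically could look like this; the entry point name and parameters are illustrative.

```python
import mlflow

# Run a project from the current directory. The directory is assumed to
# contain an MLproject descriptor file declaring entry points and dependencies.
submitted = mlflow.projects.run(
    uri=".",                      # local path or Git URL of the project
    entry_point="main",           # illustrative entry point name
    parameters={"alpha": 0.5},    # illustrative parameter
    env_manager="local",          # reuse the current environment instead of building one
)
print(submitted.run_id, submitted.get_status())
```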

MLflow Model Registry

Model Registry gives you a centralised place to store ML models. It simplifies model management throughout the full lifecycle and the way models transition between different stages. It includes capabilities such as versioning and annotating, and provides APIs and a UI.
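
For illustration, here is a minimal sketch of registering and annotating a model version through the Python client; the run ID, model name and stage are placeholders (newer MLflow releases favour aliases over stages).

```python
import mlflow
from mlflow import MlflowClient

# Register the model artefact produced by an earlier run under a registry name.
# "<run_id>" and the model name are placeholders.
result = mlflow.register_model("runs:/<run_id>/model", "churn-classifier")

client = MlflowClient()
# Annotate the new version and move it between lifecycle stages.
client.update_model_version(
    name="churn-classifier",
    version=result.version,
    description="Baseline random forest",
)
client.transition_model_version_stage(
    name="churn-classifier",
    version=result.version,
    stage="Staging",
)
```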

Key concepts of MLflow

MLflow is built around two key concepts: runs and experiments. 

  • In MLflow, each execution of your ML model code is referred to as a run. All runs are associated with an experiment.
  • An MLflow experiment is the primary unit of organisation for MLflow runs. It influences how runs are organised, accessed and maintained. An experiment has multiple runs, and it enables you to efficiently go through those runs and perform activities such as visualisation, search and comparison. In addition, experiments let you export run artefacts and metadata for analysis in other tools.
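
A short sketch of how runs and experiments relate in the Python API follows; the experiment name, parameters and metric values are illustrative.

```python
import mlflow

# Group runs under a named experiment; it is created if it does not exist yet.
mlflow.set_experiment("price-forecasting")

for depth in (3, 5, 8):
    with mlflow.start_run(run_name=f"depth-{depth}"):
        mlflow.log_param("max_depth", depth)
        mlflow.log_metric("rmse", 1.0 / depth)  # placeholder metric

# Retrieve every run of the experiment as a DataFrame for comparison
runs = mlflow.search_runs(experiment_names=["price-forecasting"])
print(runs[["run_id", "params.max_depth", "metrics.rmse"]])
```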

Kubeflow vs MLflow

Both Kubeflow and MLflow are open source solutions designed for the machine learning landscape. They have received massive support from industry leaders, and are driven by thriving communities whose contributions are making a difference in the development of the projects. The main purpose of both Kubeflow and MLflow is to create a collaborative environment for data scientists and machine learning engineers, and to enable teams to develop and deploy machine learning models in a scalable, portable and reproducible manner.

However, comparing Kubeflow and MLflow is like comparing apples to oranges. From the very beginning, they were designed for different purposes. The projects have evolved over time and now have overlapping features, but most importantly, they have different strengths. On the one hand, Kubeflow excels at machine learning workflow automation using pipelines, as well as model development. On the other hand, MLflow is great for experiment tracking and model registry. From a user perspective, MLflow requires fewer resources and is easier for beginners to deploy and use, whereas Kubeflow is a heavier solution, ideal for scaling up machine learning projects.

Read more about Kubeflow vs. MLflow

Go to the blog

Charmed MLflow vs the upstream project

Charmed MLflow is Canonical’s distribution of the upstream project. It is part of Canonical’s growing MLOps portfolio. It has all the features of the upstream project, to which we add enterprise-grade capabilities such as:

  • Simplified deployment: the time to deployment is less than 5 minutes, enabling users to also upgrade their tools seamlessly.
  • Simplified upgrades using our guides.
  • Automated security scanning: The bundle is scanned at a regular cadence.
  • Security patching: Charmed MLflow follows Canonical’s process and procedure for security patching. Vulnerabilities are prioritised based on severity, the presence of patches in the upstream project, and the risk of exploitation.
  • Maintained images: All Charmed MLflow images are actively maintained.
  • Comprehensive testing: Charmed MLflow is thoroughly tested on multiple platforms, including public cloud, local workstations, on-premises deployments, and various CNCF-compliant Kubernetes distributions.

Further reading

Meet our Federal team at NLIT 2024

We’re excited to announce our participation in NLIT 2024. As our collaboration with the Department of Energy (DOE) is strengthening, we’re looking forward to meeting our partners and customers on-site to discuss the critical topics for 2024: Cybersecurity, Artificial Intelligence and open-source innovation.

AI/ML Solutions for the DOE 

The public sector, including DOE, invests heavily in AI, aiming to develop predictive algorithms for some of the most cutting-edge research. Agencies kickstart initiatives with different use cases in mind, such as predicting weather patterns, power plant maintenance, and collecting and processing carbon data, looking for tooling that enables them to run AI at scale. 

Secure your AI stack 

We understand the difficulty of ensuring compliance and security while building AI applications and models at scale. We secure and maintain the widest open-source software library, along with solutions like Charmed Kubeflow, MLflow, Spark and Kafka, with reliable security patching and up to 10 years of support. We simplify the AI journey with an integrated and secure stack.

Kubeflow, for example, helps professionals focus on the development and deployment of machine learning models, offering security patching, user management and a wide range of integrations on top of any Kubernetes. 

Read more about AI in the public sector

To provide the most complete AI solutions to DOE, we’ve partnered with the leading hardware, silicon and cloud providers, such as NVIDIA, DELL, AWS, Google Cloud, HPE, Intel, Azure and more. 

Ubuntu 24.04 LTS – Powering Diverse Computing Environments with Security and Intelligence

Join us for a discussion on April 9th at 4:30 PM in Room #618.

This session explores the latest Ubuntu 24.04 LTS, showcasing its roles from powering edge devices to orchestrating core high-performance computing (HPC) infrastructure. We will examine Ubuntu’s seamless integration across the computing landscape, highlighting its capacity to support diverse and secure systems.

We will speak about Ubuntu’s newly available Confidential Computing integrations, a crucial feature for protecting guest workloads in cloud and on-premise environments against unauthorized access. We will also discuss defense-in-depth best practices for security hardening of Canonical solutions, such as FIPS, DISA STIG and CIS benchmarks.

Finally, we will present Ubuntu’s comprehensive, open-source MLOps solution, designed to accelerate the deployment of AI models from experimentation to production. Attendees will gain a holistic understanding of how Ubuntu 24.04 LTS is shaping the future of secure, scalable, and intelligent computing.

Cybersecurity with Ubuntu Pro

With our commitment to securing open source, last year we announced the general availability of the Ubuntu Pro subscription. It secures an organisation’s Linux estate from the OS to the application level. Pro is available on-prem, in the cloud and in air-gapped environments, automating security patching, auditing, access management and compliance. Ubuntu Pro delivers FIPS compliance and automation for security standards such as DISA’s Ubuntu STIG, and CIS hardening via the Ubuntu Security Guide (USG).

One of the growing concerns for 2024 is application security. Many open-source packages for applications and toolchains exist in a space with no guarantee or SLA for security patching. With Ubuntu Pro, we secure more than 23,000 open source applications.

Schedule a meeting with our Federal Directors, Devin Breen and Kelley Riggs, for an in-person discussion or a demo!

DEMOS

At NLIT, our Field Software Engineer Ethan Myers will showcase a few of Canonical’s software products, including Landscape, Kubeflow, and MicroCloud.

Landscape is a systems management tool for Ubuntu. It automates security patching, auditing, access management, and compliance tasks across your Ubuntu estate. Use it in well-connected or air-gapped environments: at sea, in space, and everywhere in between.

Kubeflow is an open-source machine learning toolkit based on Kubernetes. It gives data scientists the tools they need to develop, test, and deploy machine learning models to production on top of Kubernetes.

MicroCloud – an easy-to-deploy, highly available cloud suitable for private clouds, edge compute, and as a high-powered dev/test environment. MicroCloud offers virtualization and containerization (LXD), distributed storage (Ceph) and software-defined networking (OVN), all wrapped up in a simple deployment and an easy-to-use web interface.

Schedule a meeting or in-person demo

Schedule a meeting with our Federal Directors Kelley Riggs and Devin Breen for an in-person discussion or a demo!

A deep dive into Kubeflow pipelines 

Widely adopted by both developers and organisations, Kubeflow is an MLOps platform that runs on Kubernetes and automates machine learning (ML) workloads. It covers the entire ML lifecycle, enabling data scientists and machine learning engineers to develop and deploy ML models. Kubeflow is designed as a suite of leading open source projects that enable different capabilities such as model serving, training or hyperparameter tuning optimisations.

At Canonical, we deliver Charmed Kubeflow – an official distribution of the upstream solution with additional security maintenance, tool integrations, and enterprise support and managed services – so we know a thing or two about the project. In our experience, one of the most important concepts to understand with respect to both Kubeflow itself and the broader ML lifecycle is machine learning pipelines. Taking advantage of pipelines is the best way to effectively deploy models at scale in production, so let’s break down this critical component in the MLOps landscape.

What is an ML pipeline?

A machine learning pipeline is an important component of ML systems, ensuring simplified experimentation and the capability to take models to production. A pipeline is a series of steps that automate how ML models are created, in order to streamline the workflow, development and deployment. ML pipelines simplify the complexity of the end-to-end ML lifecycle, helping professionals to develop and deploy models. Among their benefits, ML pipelines ensure scalability thanks to their ability to handle large volumes of data, while supporting collaboration and reproducibility.

A core value of MLOps platforms such as Kubeflow is that they enable professionals to build and maintain ML pipelines.

What is Kubeflow Pipelines?

Kubeflow Pipelines, or KFP, is the heart of Kubeflow. It is the Kubeflow component that enables the creation of ML pipelines, helping you build and deploy container-based ML workflows that are portable and scalable. The main goals of Kubeflow Pipelines are to simplify the following processes:

  • Orchestration of the end-to-end ML pipelines
  • Experimentation with various ideas and techniques
  • Experiment management 
  • Reuse of components and pipelines to enable users to quickly put together end-to-end solutions without having to re-build each time

Components of Kubeflow Pipelines

Kubeflow Pipelines is part of the Kubeflow project. It can be used as part of the project or as an independent tool. It is made up of three main components:

  • User interface (UI) for managing and tracking experiments, jobs, and runs
  • Engine for scheduling multi-step ML workflows
  • SDK for defining and manipulating pipelines and components
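
As a rough sketch of the SDK, assuming the KFP v2 Python package is installed, a pipeline can be defined from lightweight components and compiled into a static YAML package; the component logic and names are purely illustrative.

```python
from kfp import dsl, compiler

@dsl.component(base_image="python:3.11")
def add(a: float, b: float) -> float:
    # Trivial illustrative component; real components would train or evaluate models
    return a + b

@dsl.pipeline(name="add-demo")
def add_pipeline(x: float = 1.0, y: float = 2.0):
    first = add(a=x, b=y)
    add(a=first.output, b=3.0)  # chain a second step on the first step's output

# Compile the Python definition into a static YAML package that the
# Kubeflow Pipelines engine can schedule.
compiler.Compiler().compile(pipeline_func=add_pipeline, package_path="add_pipeline.yaml")
```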

Kubeflow Pipelines use cases

Kubeflow Pipelines is typically most useful for advanced users of Kubeflow or professionals who already have experience with machine learning. You don’t necessarily need KFP in the experimentation phase of the ML journey, but it becomes useful when you want to take your models to production. The main use cases for KFP include:

  • Workflow automation: Data scientists and machine learning engineers often perform a lot of the initial experimentation phase manually to better understand optimisation possibilities and quickly iterate. But once they have defined their workflow, they can use KFP to automate the process and save time.
  • Model deployment to production: Models are usually compiled in a binary file. Traditionally, for the model to be loaded to a server where the requirements for inference are met, this file would be manually copied to the machine that hosts the application. KFP simplifies this process by enabling you to build automated pipelines that deploy to multiple applications or servers.
  • Model maintenance and updates: The ML lifecycle is an iterative process and models need to be updated periodically. KFP helps users run updates and rollbacks across multiple applications or servers. Once the model is updated in one place and the update transaction is complete, KFP ensures the update is quickly applied to all client applications.  
  • Multi-tenant ML environment: Organisations often have large data and ML teams that need to share their resources. KFP enables simple and effective sharing of the environment, where each collaborator gets an isolated environment. The Kubernetes cluster and tools such as Volcano are then used to schedule resources and manage containers. This helps professionals isolate workflows and keep track of pending and running jobs for each collaborator.

Benefits of KFP

Among machine learning specialists, Kubeflow Pipelines is widely adopted for a number of reasons. The most important benefits of KFP include:

  • Streamlined workflow automation: Kubeflow Pipelines allows users to define the machine learning pipelines as a sequence of steps, each with its input, output, and dependencies. This leads to streamlining the machine learning workflows, and reduces the overhead and complexity of managing and executing your pipelines.
  • Improved collaboration: Kubeflow Pipelines provides a central and shared platform for data scientists, machine learning engineers, and IT operations teams to collaborate on machine learning projects. It allows them to share pipelines and artifacts with others, and enables the tracking and monitoring of the pipelines across the entire organisation.
  • Enhanced performance and scalability: Kubeflow Pipelines runs on Kubernetes, which provides a scalable and flexible infrastructure for running machine learning pipelines and models. This allows you to easily scale up and down the pipelines, and ensure that your pipelines are performant and reliable.
  • Resource optimisation: KFP is a cloud native application, so it can leverage the resource schedulers that Kubernetes platforms provide. This leads to optimised usage of the existing resources and faster project delivery.
  • Extensive support for popular machine learning frameworks: KFP provides built-in support for popular machine learning frameworks like TensorFlow, PyTorch, and XGBoost, as well as a rich ecosystem of integrations and plugins for other tools and services. Charmed Kubeflow goes a step further and enables additional integrations with tools and frameworks such as NVIDIA NGC Containers, Triton Inference Server and MLflow.

While Kubeflow Pipelines is a feature-rich tool, it still raises some challenges for beginners. It comes with a steep learning curve and limited documentation. Since it is a fully open source tool, there is a big community that can help beginners, but the experience can be frustrating at times. You can alleviate these challenges by taking advantage of enterprise support or managed services from organisations that distribute Kubeflow.

Architecture of Kubeflow Pipelines

Kubeflow Pipelines is a complex component with capabilities that unblock users and enable them to automate their workflows and reduce their time spent on manual tasks. The following architecture depicts these capabilities:  

Kubeflow Pipelines architecture diagram (source: Kubeflow community)

As the diagram illustrates, users can interact with KFP either through the user interface or through development tools such as notebooks. Initially, users create components or specify a pipeline using the Kubeflow Pipelines domain-specific language (DSL). Once defined, the compiler transforms the Python code into a static YAML configuration. The Pipeline Service then creates a pipeline run from the static configuration, calling the Kubernetes API server to create the necessary Kubernetes resources (CRDs) to run the pipeline. If you have a resource scheduler integrated, you can use it to run the pipeline when resources are available or at a desired time. To complete the pipeline, the containers are executed within Kubernetes pods, using orchestration controllers.
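
To make the flow above concrete, here is a hedged sketch of submitting a compiled pipeline package through the KFP client; the host URL, package name and arguments are placeholders for your own deployment.

```python
from kfp.client import Client

# Connect to the KFP API endpoint exposed by your Kubeflow deployment.
client = Client(host="http://localhost:8080")  # placeholder endpoint

# Create a run from a previously compiled static YAML configuration;
# the Pipeline Service turns this into the Kubernetes resources described above.
run = client.create_run_from_pipeline_package(
    pipeline_file="add_pipeline.yaml",
    arguments={"x": 10.0, "y": 5.0},
    run_name="add-demo-run",
)
print(run.run_id)
```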

Two types of data can be stored. The first type is metadata, which includes experiments, jobs, pipeline runs, and single scalar metrics. The second type is artefacts, which include pipeline packages, views, and large-scale metrics (time series). Metadata is stored in a MySQL database, whereas artefacts are stored within MinIO. Storing them in an external component also enables portability, so artefacts can be migrated to different clusters or environments.

Kubernetes resources created by the Pipeline Service are monitored by the Persistence Agent. To enable reproducibility, the inputs and outputs of the containers are recorded. This enables professionals to reuse the configurations to replicate different tasks and check whether the results match. These inputs and outputs consist of parameters or data artefact URIs and are stored as metadata.

The Pipeline web server enables users to get a visual understanding of the steps in their Kubeflow pipelines. It presents various information, including the list of pipelines currently running, the history of pipeline executions, data artefacts and logs for debugging.

Get started with Kubeflow Pipelines

In order to access Kubeflow Pipelines, users can either deploy it independently or as part of the Kubeflow project. For simplified deployment, we recommend using Charmed Kubeflow.

  1. Deploy Charmed Kubeflow following the tutorial. You can do it in any environment, including public cloud or on-prem. Ensure that you have enough resources available so you do not bump into problems along the way.
  2. Access the Kubeflow dashboard. If you are accessing it from a VM or from a public cloud, please ensure that you change the SOCKS proxy settings. There you will have different options, including uploading an existing pipeline or creating a new one.
  3. Clone this repository from GitHub, which contains a simple example of how to use some of the components of Kubeflow.
  4. Access the examples from the notebook. There are several pipelines created which you can run, edit or play with. Of course, they are just examples. In order to build your own pipeline, check the official documentation of the Kubeflow project.

Further reading

Kubeflow vs MLflow

Launch NGC containers with Kubeflow

MLOps pipelines with Kubeflow, MLflow and Seldon

AI on-prem: what should you know?


Organisations are reshaping their digital strategies, and AI is at the heart of these changes, with many projects now ready to run in production. Enterprises often start these AI projects on the public cloud because of the ability to minimise the hardware burden. However, as initiatives scale, organisations often look to migrate the workloads on-prem for reasons including costs, digital sovereignty or compliance requirements. Running AI on your own infrastructure comes with clear benefits, but it also raises some major challenges that infrastructure and MLOps experts need to consider.

MLOps acts as the enabler in running AI workloads in a repeatable and reproducible manner. MLOps platforms such as Charmed Kubeflow are cloud-native applications that run on Kubernetes. Building such an architecture on-prem helps organisations to easily deploy, manage and scale their AI applications.

Advantages of AI on-prem

When building their AI strategies, organisations should consider factors such as cost-effectiveness, ability to manage, security and compliance, and performance. Let’s take a look at how running AI projects on-prem addresses these priorities.

AI on existing infrastructure

Building a completely new data centre for AI projects can be overwhelming and take time, but it isn’t always necessary. If you already have existing infrastructure that you aren’t fully utilising, it could be suitable for your AI initiatives. Doing AI on-prem on existing infrastructure is a great way to quickly kickstart new projects and experiments, assess the possible return on investment of different use cases, and gain additional value from your existing hardware.

Secure ML workloads on-prem

Many organisations already have well-defined internal policies that any new AI initiative also needs to follow. Adhering to these policies is easier on on-prem infrastructure, ensuring a secure and compliant foundation for the MLOps platform and enabling you to build repeatable and reproducible ML pipelines. Especially in highly regulated industries, running AI on-prem can accelerate compliance and security checks, helping you to focus on building models rather than on security concerns.

Cost-effective solution

While public clouds nowadays offer different types of instances for running machine learning workloads, for enterprises that store all their data on their own infrastructure, moving it would come at a significant cost. You can circumvent this challenge entirely by running your AI projects in the same location where you already store your data. This is one of the reasons why organisations often prefer building their AI workloads on-prem.

Disadvantages of AI on-prem

Building and scaling AI projects requires significant computing power. For organisations that need more of it, this means a big investment before even getting started. At the same time, on-prem infrastructure requires a significant upfront cost and comes with the burden of operating the infrastructure post-deployment. On-prem deployments also have only a limited number of pre-trained models and ready-made services that enterprises can take advantage of.

At the opposite end of the spectrum, public clouds are easy to get started with and do not require a big upfront investment. They have large libraries of pre-trained models, such as Amazon Bedrock, that can give organisations a head start. That being said, public clouds often prove to be less cost-effective in the long term.

When should you run AI on-prem?

Rolling out a new strategic initiative such as an artificial intelligence project comes with a new set of challenges. When deciding whether to run your AI initiatives on-prem, there are a number of key factors you should consider to determine whether it’s the right approach for you:

  • Compute performance: It’s no secret that AI projects require significant computing power, and these requirements are only increasing. You should only commit to an on-prem AI strategy if you are certain that you have the resources to satisfy these compute demands, with room to scale. 
  • Industry regulations: Complying with industry regulations is often easier when you have full control over your data on your own hardware. If you operate in highly-regulated sectors such as healthcare or financial services, then on-prem AI is likely to be the right choice. 
  • Privacy: These same principles extend to the broader realm of data privacy, which plays an important role in any AI project. On-prem infrastructure represents a compelling option for organisations looking to maximise control over their data and ML models.
  • Initial investment: The best infrastructure option will depend largely on the budget allocated for the initial investment. If you lack the resources to support upfront hardware costs, public cloud may be more suitable – unless you have existing, unutilised on-prem infrastructure that you can take advantage of.
  • Customisable solution: Do you want a ready-made solution, or a platform that enables you to customise your AI deployment to suit your specific requirements? If you’re looking for flexibility, on-prem is the clear winner.

Open source solutions for AI on-prem

Open source is at the heart of the AI revolution. There are a growing number of open source solutions that benefit from wide adoption in the machine-learning world. Organisations can build a fully open source MLOps platform on-prem using some of the leading tools available:

  • OpenStack: a fully functional cloud platform that ensures smooth integration with leading performance acceleration devices, such as GPUs.
  • Kubernetes: can be used as a container orchestration tool.
  • Kubeflow: an MLOps platform to develop and deploy machine learning models.
  • MLflow: a machine learning platform for model registry. 

Open source tools come with plenty of benefits. However, it is important to choose the right versions. To ensure the security of the tooling as well as seamless integration, organisations need official distributions that are suitable for enterprise deployments – such as those delivered by Canonical.

Want to learn more about AI on private cloud with open source? Enroll now for our live webinar.

Hybrid strategy with open source 

According to the Cisco 2022 Global Hybrid Cloud Trends Report, 82% of IT decision-makers have adopted a hybrid IT strategy. Correlating this with the focus that organisations now put on their artificial intelligence strategy, it is easy to see that many new projects will run in a hybrid cloud scenario. The open source tools mentioned above – like those that Canonical supports and integrates in an end-to-end solution – enable organisations to build and scale their AI initiatives on their cloud of choice. They can kickstart projects on a public cloud to minimise the hardware burden and then develop a hybrid cloud strategy that ensures both time effectiveness and cost efficiency.

AI webinar series

Follow our webinar series and stay up to date with the latest news from the industry.

Further reading
