Ubuntu blog

Ventana and Canonical collaborate on enabling enterprise data center, high-performance and AI computing on RISC-V

11 April 2024 at 09:00

This blog is co-authored by Gordan Markuš, Canonical and Kumar Sankaran, Ventana Micro Systems

Unlocking the future of semiconductor innovation 

RISC-V, an open standard instruction set architecture (ISA), is rapidly shaping the future of high-performance computing, edge computing, and artificial intelligence. The RISC-V customizable and scalable ISA enables a new era of processor innovation and efficiency. Furthermore, RISC-V democratizes innovation by allowing new companies to develop their own products on its open ISA, breaking down barriers to entry and fostering a diverse ecosystem of technological advancement. 

By fostering a more open and innovative approach to product design, RISC-V technology vendors are not just participants in the future of technology; they are a driving force behind the evolution of computing across multiple domains. The impact of RISC-V extends from the cloud to the edge:

  • In modern data centers, enterprises seek a range of infrastructure solutions to support the breadth of modern workloads and requirements. RISC-V provides a versatile solution, offering a comprehensive suite of IP cores under a unified ISA that scales efficiently across various applications. This scalability and flexibility make RISC-V an ideal foundation for addressing the diverse demands of today’s data center environments.
  • In HPC, its adaptability allows for the creation of specialized processors that can handle complex computations at unprecedented speeds, while also offering a quick time to market for product builders.  
  • For edge computing, RISC-V’s efficiency and the ability to tailor processors for specific tasks mean devices can process more data locally, reducing latency and the need for constant cloud connectivity. 
  • In the realm of AI, the flexibility of RISC-V paves the way for the development of highly optimized AI chips. These chips can accelerate machine learning tasks by executing AI-centric computations more efficiently, thus speeding up the training and inference of AI workloads.

One class of product that the RISC-V ISA is well suited to is the chiplet. Chiplets are smaller, modular blocks of silicon that can be integrated to form a larger, more complex chip. Instead of designing a single monolithic chip, a process that is increasingly challenging and expensive at cutting-edge process nodes, manufacturers can create chiplets that specialize in different functions and combine them as needed. Together, RISC-V and chiplet technology are empowering a new era of chip design, enabling more companies to participate in innovation and tailor their products to specific market needs with unprecedented flexibility and cost efficiency.

Ventana and Canonical partnership and technology leadership

Canonical makes open source secure, reliable and easy to use, providing support for Ubuntu and a growing portfolio of enterprise-grade open source technologies. One of the key missions of Canonical is to improve the open source experience across instruction set architectures. At the end of 2023, Canonical announced joining the RISC-V Software Ecosystem (RISE) community to support the open source community and ecosystem partners in bringing the best of Ubuntu and open source to RISC-V platforms.

As a part of our collaboration with the ecosystem, Canonical has been working closely with Ventana Micro Systems (Ventana). Ventana offers a family of high-performance RISC-V data center-class CPUs, delivered in the form of multi-core chiplets or core IP for high-performance applications in the cloud, enterprise data center, hyperscale, 5G, edge compute, AI/ML and automotive markets.

The relationship between Canonical and Ventana started with a collaboration on improving the upstream software availability of RISC-V in projects such as u-boot, EDKII and the Linux kernel. 

Over time, the teams have started enabling Ubuntu on Ventana’s Veyron product family. Through the continuous efforts of this partnership Ubuntu is available on the Ventana Veyron product family and as a part of Ventana’s Veyron Software Development Kit (SDK).

Furthermore, the collaboration extends to building full solutions for the datacenter, HPC, AI/ML and Automotive, integrating Domain Specific Accelerators (DSAs) and SDKs, promising to unlock new levels of performance and efficiency for developers and enterprises alike. Some of the targeted software stacks can be seen in the figure below.  

Today, Ventana and Canonical collaborate on a myriad of topics. Together through their joint efforts across open source communities and as a part of RISC-V Software Ecosystem (RISE), Ventana and Canonical are actively contributing to the growth of the RISC-V ecosystem. We are proud of the innovation and technology leadership our partnership brings to the ecosystem. 

Enabling the ecosystem with enterprise-grade and easy to consume open source on RISC-V platforms

Ubuntu is the reference OS for innovators and developers, but also the vehicle to enable enterprises to take products to market faster. Ubuntu enables teams to focus on their core applications without worrying about the stability of the underlying frameworks. Ventana and the RISC-V ecosystem recognise the value of Ubuntu and are using it as a base platform for their innovation. 

Furthermore, the availability of Ubuntu on RISC-V platforms not only allows developers to prototype their solutions easily but also provides a path to market with enterprise-grade, secure and supported open source solutions. Whether it’s for networking offloads in the data center, training AI models in the cloud, or running AI inference at the edge, Ubuntu is an established platform of choice.

Learn more about Canonical’s engagement in the RISC-V ecosystem 

Contact Canonical to bring Ubuntu and open source software to your RISC-V platform.

Learn more about Ventana

Canonical at America Digital Congress in Chile

4 April 2024 at 14:55

We are excited to share that Canonical is participating in America Digital Congress in Santiago, Chile, for the first time ever. It’s one of the region’s leading events on digital transformation, bringing together VPs and experts from the most relevant global tech companies.

Canonical, the publisher of Ubuntu, provides open source security, support and services. In addition to the OS, Canonical offers an integrated data and AI stack. With customers that include top tech brands, emerging startups, governments and home users, Canonical delivers trusted open source for everyone.

Join us at the booth A31 to learn how Canonical can support your digital transformation journey securely and cost-efficiently.

Canonical Expert Talk:
How to build a digital transformation strategy



Date & Time: April 11, 16:15 – 16:55.
C-Level Forum AI & Digital Transformation

Juan Pablo Noreña, Canonical Cloud Field Software Engineer, is delighted to be speaking at America Digital Congress about digital transformation and AI. In this talk, he will explore the significant benefits of introducing open source solutions in all stages of the infrastructure implementation process, from virtualization to AI platforms.

Juan Pablo will also showcase how this approach improves security, reduces infrastructure life cycle costs and makes them predictable, offering companies a competitive advantage in the market.

Key topics:

  • A general perspective of the open source role in infrastructure and its benefits.
  • A guide for decision-makers on how and where to start the development of an infrastructure strategy using open source solutions.
  • Explanation of the relevance of support for the solutions to ensure the sustained success of the strategy.

Canonical Partner Programmes

At Canonical, we provide the services our partners need to ensure their hardware and software works optimally with the Ubuntu platform. We operate a range of partner programmes, from essential product certification to strategic collaboration, help with QA and long-term strategic alliances. For technology customers, this has created a thriving market of suppliers with Ubuntu expertise. 

Interested in learning more about our partner programmes? Talk to the team at the booth or visit our partner webpage.

Come and meet us at America Digital 

Come visit us at the booth to learn how Canonical could support you in the digital transformation journey. Check out our Data and AI offerings to learn more about our solutions.

Deploying Open Language Models on Ubuntu

28 March 2024 at 22:18

This blog post explores the technical and strategic benefits of deploying open-source AI models on Ubuntu. We’ll highlight why it makes sense to use Ubuntu with open-source AI models, and outline the deployment process on Azure.

Authored by Gauthier Jolly, Software Engineer, CPC, and Jehudi Castro-Sierra, Public Cloud Alliance Director, both from Canonical.

Why Ubuntu for Open-Source AI?

  • Open Philosophy: Ubuntu’s open-source nature aligns seamlessly with the principles of open-source AI models, fostering collaboration and accessibility.
  • Seamless Integration: Deploying open-source AI is smooth on Ubuntu, thanks to its robust support for AI libraries and tools.
  • Community: Ubuntu’s large community provides valuable resources and knowledge-sharing for AI development.

The Role of Ubuntu Pro

Ubuntu Pro elevates the security and compliance aspects of deploying AI models, offering extended security maintenance, comprehensive patching, and automated compliance features that are vital for enterprise-grade applications. Its integration with Confidential VMs on Azure enhances the protection of sensitive data and model integrity, making it an indispensable tool for tasks requiring stringent security measures like ML training, inference, and confidential multi-party data analytics.

Why use the public cloud for deploying AI models?

Using a public cloud like Azure gives straightforward access to powerful GPUs and Confidential Compute capabilities, essential for intensive AI tasks. These features significantly reduce the time and complexity involved in setting up and running AI models, without compromising on security and privacy. Although some may opt for on-prem deployment due to specific requirements, Azure’s scalable and secure environment offers a compelling argument for cloud-based deployments.

Provisioning and Configuration

We are going to explore using open models on Azure by creating an instance with Ubuntu, installing NVIDIA drivers for GPU support, and setting up Ollama for running the models. The process is technical, involving CLI commands for creating the resource group, VM, and configuring NVIDIA drivers. Ollama, the chosen tool for running models like Mixtral, is best installed using Snap for a hassle-free experience, encapsulating dependencies and simplifying updates.

Provision an Azure VM

Begin by creating a resource group and then a VM with the Ubuntu image using the Azure CLI.

az group create --location westus --resource-group ml-workload
az vm create \
    --resource-group ml-workload \
    --name jammy \
    --image Ubuntu2204 \
    --generate-ssh-keys \
    --size Standard_NC4as_T4_v3 \
    --admin-username ubuntu --license-type UBUNTU_PRO

Note the publicIpAddress from the output – you’ll need it to SSH into the VM.
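If you lose the creation output, the address can be looked up again later. The following is a minimal sketch, assuming the resource group and VM names used above; `vm_public_ip` is a hypothetical helper name, not part of the Azure CLI.

```shell
# Hypothetical helper: print the public IP of a VM.
# $1 = resource group, $2 = VM name.
vm_public_ip() {
  az vm show --show-details \
    --resource-group "$1" \
    --name "$2" \
    --query publicIps \
    --output tsv
}

# Example usage (requires a logged-in Azure CLI):
#   IP_ADDR="$(vm_public_ip ml-workload jammy)"
#   ssh ubuntu@"${IP_ADDR}"
```

Storing the result in a variable such as `IP_ADDR` keeps later SSH commands free of hard-coded addresses.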

Install NVIDIA Drivers (GPU Support)

For GPU capabilities, install NVIDIA drivers using Ubuntu’s package management system. Restart the system after installation.

sudo apt update -y
sudo apt full-upgrade -y
sudo apt install -y ubuntu-drivers-common
sudo ubuntu-drivers install
sudo systemctl reboot

Important: Standard NVIDIA drivers don’t support vGPUs (fractional GPUs). See instructions on the Azure site for installing GRID drivers, which might involve building an unsigned kernel module (which may be incompatible with Secure Boot).
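After the reboot, it can be worth confirming that the driver actually loaded before moving on. This is a small sketch rather than an official procedure; `gpu_driver_ready` is a hypothetical helper name, and it relies only on the `nvidia-smi` tool that ships with the driver.

```shell
# Hypothetical helper: succeed and print "GPU name, driver version"
# if the NVIDIA driver is installed and can talk to the GPU.
gpu_driver_ready() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: driver is probably not installed" >&2
    return 1
  fi
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
}

# Example usage:
#   gpu_driver_ready || echo "GPU not ready yet"
```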

Deploying Ollama with Snap

Snap simplifies the installation of Ollama and its dependencies, ensuring compatibility and streamlined updates. The --beta flag installs from the beta channel, giving you access to the latest features and versions, which might still be under development.

sudo snap install --beta ollama

Configuration

Configure Ollama to use the ephemeral disk

sudo mkdir /mnt/models
sudo snap connect ollama:removable-media # to allow the snap to reach /mnt
sudo snap set ollama models=/mnt/models
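Because the ephemeral disk is limited in size and models can be large, a quick free-space check before downloading may save a failed pull. The sketch below is an assumption-laden helper: `has_free_gib` is a hypothetical name, and the threshold you pass should come from the download size listed for the model you intend to run.

```shell
# Hypothetical helper: succeed if the filesystem holding $1
# has at least $2 GiB available.
has_free_gib() {
  # df -P prints one POSIX-format line per filesystem; column 4 is
  # the available space, in KiB because of -k.
  avail_kib=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
  [ "$avail_kib" -ge $(( $2 * 1024 * 1024 )) ]
}

# Example usage (the 30 GiB threshold is only an illustration):
#   has_free_gib /mnt 30 || echo "Not enough space on /mnt" >&2
```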

Installing Mixtral

At this point, you can run one of the open models available out of the box, like mixtral or llama2. If you have a fine-tuned version of these models (a process that involves further training on a specific dataset), you can run those as well.

ollama run mixtral

The first run might take a while to download the model.

Now you can use the model through the console interface:

Installing a UI

This step is optional, but provides a UI via your web browser.

sudo snap install --beta open-webui

Access the web UI securely

To quickly access the UI without open ports in the Azure security group, you can create an SSH tunnel to your VM using the following command:

ssh -L 8080:localhost:8080 ubuntu@${IP_ADDR}

Go to http://localhost:8080 in the web browser on your local machine (the command above tunnels the traffic from your localhost to the instance on Azure).

In case you want to make this service public, follow this documentation.

Verify GPU usage

sudo watch -n2 nvidia-smi

Check that the ollama process is using the GPU. You should see something like this:

+---------------------------------------------------------------------------+
| Processes:                                                                |                                                                            
|  GPU   GI   CI        PID   Type   Process name                GPU Memory |
|        ID   ID                                                 Usage      |
|===========================================================================|
|    0   N/A  N/A      1063      C   /snap/ollama/13/bin/ollama     4882MiB |
+---------------------------------------------------------------------------+
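For a scripted version of this check, `nvidia-smi` can emit the process list as CSV, which is easier to parse than the table above. This is a minimal sketch; `uses_gpu` is a hypothetical helper name that reads the CSV on standard input and looks for a process whose name contains the given string.

```shell
# Hypothetical helper: read "pid, process_name, used_memory" CSV lines
# on stdin and succeed if any process name contains the string in $1.
uses_gpu() {
  awk -F', ' -v pat="$1" '
    index($2, pat) { found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Example usage on the VM:
#   nvidia-smi --query-compute-apps=pid,process_name,used_memory \
#     --format=csv,noheader | uses_gpu ollama && echo "ollama is on the GPU"
```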

Complementary and Alternative Solutions

  • Charmed Kubeflow: Explore this solution for end-to-end MLOps (Machine Learning Operations), providing a streamlined platform to manage every stage of the machine learning lifecycle. It’s particularly well-suited for complex or large-scale AI deployments.
  • Azure AI Studio: Provides ease of use for those seeking less customization.

Conclusion

Ubuntu’s open-source foundation and robust ecosystem make it a compelling choice for deploying open-source AI models. When combined with Azure’s GPU capabilities and Confidential Compute features, you gain a flexible, secure, and performant AI solution.

Canonical at Google Next – What you need to know

27 March 2024 at 11:00

Google Next is making its way to Las Vegas, and Ubuntu is joining the journey. As a proud sponsor, Canonical, the publisher of Ubuntu, invites you to join us at the event and visit booth #252 in the Mandalay Bay Expo Hall. As the publisher of one of the most popular Linux operating systems, Canonical is dedicated to providing commercial support and driving open source innovation across a diverse range of industries and applications. Stop by and learn more about how Canonical and GCP are collaborating to empower businesses with secure and scalable solutions for their cloud computing needs.

Ubuntu ‘Show you’re a Pro’ Challenge: Find and patch the vulnerabilities and earn awesome swag!

Are you an Ubuntu Pro? Test your skills at our booth! Sit down at our workstation and discover any unpatched vulnerabilities on the machine. Showcase your expertise by securing the system completely, and receive exclusive swag as a token of our gratitude.

Security maintenance for your full software stack

At Canonical, security is paramount. Ubuntu Pro offers a solution to offload security and compliance concerns for your open source stack, allowing you to concentrate on building and managing your business. Serving as an additional layer of services atop every Ubuntu LTS release, Ubuntu Pro ensures robust protection for your entire software stack, encompassing over 30,000 open source packages. Say farewell to fragmented security measures; Canonical provides a holistic approach, delivering  security and support through a unified vendor. Additionally, relish the assurance of vendor-backed SLA support for open source software, providing peace of mind for your operations.

Confidential computing across clouds

Confidential computing is a revolutionary technology that disrupts the conventional threat model of public clouds. In the past, vulnerabilities within the extensive codebase of the cloud’s privileged system software, including the operating system and hypervisor, posed a constant risk to the confidentiality and integrity of code and data in operation. Likewise, unauthorized access by a cloud administrator could compromise the security of your virtual machine (VM). 

Ubuntu Confidential VMs (CVMs) on Google Cloud offer enhanced security for your workloads by utilizing hardware-protected Trusted Execution Environments (TEEs). With the broadest range of CVMs available, Ubuntu enables users on Google Cloud to benefit from the cutting-edge security features of AMD 4th Gen EPYC processors with SEV-SNP and Intel Trust Domain Extensions (Intel TDX).

Scale your AI projects with open source tooling

Empower your organization with Canonical’s AI solutions. We specialize in the automation of machine learning workloads on any environment, whether private or public cloud, or hybrid or multi cloud. We provide an end-to-end MLOps solution to develop and deploy models in a secure, reproducible, and portable manner that seamlessly integrates with your existing technology stack. Let us help you unlock the full potential of AI.

Join Us at Google Next 2024

Mark your calendars and make plans to visit Canonical at Google Cloud Next 2024. Whether you’re seeking cutting-edge solutions for cloud computing, robust security measures for your software stack, or innovative AI tools to propel your organization forward, our team will be on hand to offer insights, demonstrations, and personalized consultations to help you harness the power of open source technology for your business. Join us at booth #252 to discover how Canonical and Ubuntu can elevate your digital journey. See you there!

Prompts:

Canonical at Google Next – What you need to know!

Canonical is excited to sponsor Google Cloud Next in Las Vegas, NV April 9-11, 2024. 

Visit the Canonical-Ubuntu booth #252 in the Mandalay Bay Expo Hall.

Our team will be available to discuss the following:

  • Protect your full software tech stack with Ubuntu Pro providing security coverage for 30,000+ software packages.
  • Single vendor for security requirements – delivery, security, support; Vendor-backed SLA support for open source  
  • Confidential computing – OS support across all clouds (multi-cloud/hybrid cloud)
  • AI
    • Canonical provides tailored solutions to enable your organisation to efficiently run machine learning workloads. Canonical offers an end-to-end MLOps solution that can be used across all layers of the technology stack.

While at our booth, earn some awesome swag by showing that you’re an Ubuntu Pro. Take a seat at our workstation to find the unpatched vulnerabilities on the machine! Upgrade the machine to be fully secure to earn awesome swag! 

See you at the event

Generative AI with Ubuntu on AWS. Part II: Text generation

27 March 2024 at 15:09

In our previous post, we discussed how to generate Images using Stable Diffusion on AWS. In this post, we will guide you through running LLMs for text generation in your own environment with a GPU-based instance in simple steps, empowering you to create your own solutions.

Text generation, a trending focus in generative AI, facilitates a broad spectrum of language tasks beyond simple question answering. These tasks include content extraction, summary generation, sentiment analysis, text enhancement (including spelling and grammar correction), code generation, and the creation of intelligent applications like chatbots and assistants.

In this tutorial, we will demonstrate how to deploy two prominent large language models (LLMs) on a GPU-based EC2 instance on AWS (G4dn) using Ollama, an open source tool for downloading, managing, and serving LLMs. Before getting started, ensure you have completed our technical guide for installing NVIDIA drivers with CUDA on a G4dn instance.

We will utilize Llama2 and Mistral, both strong contenders in the LLM space with open source licenses suitable for this demo.

While we won’t explore the technical details of these models, it is worth noting that Mistral has shown impressive results despite its relatively small size (7 billion parameters fitting into an 8GB VRAM GPU). Conversely, Llama2 provides a range of models for various tasks, all available under open source licenses, making it well-suited for this tutorial. 

To experiment with question-answer models similar to ChatGPT, we will utilize the fine-tuned versions optimized for chat or instruction (Mistral-instruct and Llama2-chat), as the base models are primarily designed for text completion.

Let’s get started!

Step 1: Installing Ollama

To begin, open an SSH session to your G4DN server and verify the presence of NVIDIA drivers and CUDA by running:

nvidia-smi

Keep in mind that you need to have the SSH port open, the key-pair created or assigned to the machine during creation, the external IP of the machine, and software like ssh for Linux or PuTTY for Windows to connect to the server.

If the drivers are not installed, refer to our technical guide on installing NVIDIA drivers with CUDA on a G4DN instance.

Once you have confirmed the GPU drivers and CUDA are set up, proceed to install Ollama. You can opt for a quick installation using their binary, or choose to clone the repository for a manual installation.

To install Ollama quickly, run the following command

curl -fsSL https://ollama.com/install.sh | sh

Step 2: Running LLMs on Ollama

Let’s start with Mistral models and view the results by running:

ollama run mistral

This instruction will download the Mistral model (4.1GB) and serve it, providing a prompt for immediate interaction with the model.

Not a bad response for a prompt written in Spanish! Now let’s experiment with a prompt to write code:

Impressive indeed. The response is not only generated rapidly, but the code also runs flawlessly, with basic error handling and explanations. (Here’s a pro tip: consider asking for code comments, docstrings, and even test functions to be incorporated into the code). 

Exit with the /bye command.

Now, let’s enter the same prompt with Llama2.

We can see that there are immediate, notable differences. This may be due to the training data it has encountered, as it defaulted to a playful and informal chat-style response. 

Let’s try Llama2 using the same code prompt from above:

The results of this prompt are quite interesting. Following four separate tests, it was clear that the generated responses had not only broken code but also inconsistencies within the responses themselves. It appears that writing code is not one of the out-of-the-box capabilities of Llama2 in this variant (7b parameters, although there are also versions specialized in code like Code-Llama2), but results may vary.

Let’s run a final test with Code-Llama, a Llama model fine-tuned to create and explain code:

We will use the same prompt from above to write the code:

This time, the response is improved, with the code functioning properly and a satisfactory explanation provided.

You now have the option to either continue exploring directly through this interface or start developing apps using the API.
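As a starting point for the API route, the sketch below sends a single prompt to Ollama's local HTTP endpoint with curl. It assumes Ollama is listening on its default port 11434 on the same machine and uses the `/api/generate` endpoint with a non-streaming request. The helper names (`json_escape`, `generate_payload`, `ollama_generate`) are hypothetical, and the escaping is deliberately simple, so it only suits single-line prompts.

```shell
# Hypothetical helper: escape backslashes and double quotes for JSON.
# Newlines and control characters are NOT handled (single-line input only).
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body. $1 = model name, $2 = prompt.
generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Send the prompt to a local Ollama server and print the JSON reply.
ollama_generate() {
  curl -fsS "http://127.0.0.1:11434/api/generate" \
    -d "$(generate_payload "$1" "$2")"
}

# Example usage (requires a running Ollama with the model pulled):
#   ollama_generate mistral "Explain what an SSH tunnel is in one sentence."
```

Setting `stream` to false returns one JSON document instead of a stream of partial responses, which is simpler to handle in a first script.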

Final test: A chat-like web interface

We now have something ready for immediate use. However,  for some added fun, let’s install a chat-like web interface to mimic the experience of ChatGPT.

For this test, we are going to use ollama-ui (https://github.com/ollama-ui/ollama-ui). 

⚠︎ Please note that this project is no longer being maintained and users should transition to Open WebUI, but for the sake of simplicity, we are going to still use the Ollama-ui front-end.

In your terminal window, clone the ollama-ui repository by entering the following command:

git clone https://github.com/ollama-ui/ollama-ui

Here’s a cool trick: when you run Ollama, it creates an API endpoint on port 11434. However, Ollama-ui will run and be accessible on port 8000, thus, we’ll need to ensure both ports are securely accessible from our machine.

Since we are currently running as a development service (without the security features and performance of a production web server), we will establish an SSH tunnel for both ports. This setup will enable us to access these ports exclusively from our local computer, with the traffic encrypted by SSH.

To create the tunnel for both the web-ui and the model’s API, close your current SSH session and open a new one with the following command:

ssh -L 8000:localhost:8000 -L 11434:127.0.0.1:11434 -i myKeyPair.pem ubuntu@<Machine_IP>
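If you forward more ports later, typing each `-L` option by hand gets tedious. The following is a small convenience sketch, not something the tutorial requires; `tunnel_opts` is a hypothetical helper that prints one forwarding option per port, mapping each local port to the same port on the remote machine.

```shell
# Hypothetical helper: print "-L port:localhost:port" for each port given.
tunnel_opts() {
  for port in "$@"; do
    printf '%s %s:localhost:%s ' -L "$port" "$port"
  done
}

# Example usage (the command substitution is intentionally unquoted so
# the options are split into separate arguments):
#   ssh $(tunnel_opts 8000 11434) -i myKeyPair.pem ubuntu@<Machine_IP>
```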

Once the tunnel is set up, navigate to the ollama-ui directory in a new terminal and run the following command:

cd ollama-ui
make

Next, open your local browser and go to 127.0.0.1:8000 to enjoy the chat web interface!

While the interface is simple, it enables dynamic model switching, supports multiple chat sessions, and facilitates interaction beyond reliance on the terminal (aside from tunneling). This offers an alternative method for testing the models and your prompts.

Final thoughts

Thanks to Ollama and how simple it is to install the NVIDIA drivers on a GPU-based instance, we got a very straightforward process for running LLMs for text generation in your own environment. Additionally, Ollama facilitates the creation of custom model versions and fine-tuning, which is invaluable for developing and testing LLM-based solutions.

When selecting the appropriate model for your specific use case, it is crucial to evaluate their capabilities based on architectures and the data they have been trained on. Be sure to explore fine-tuned variants such as Llama2 for code, as well as specialized versions tailored for generating Python code.

Lastly, for those aiming to develop production-ready applications, remember to review the model license and plan for scalability, as a single GPU server may not suffice for multiple concurrent users. You may want to explore Amazon Bedrock, which offers easy access to various versions of these models through a simple API call or Canonical MLOps, an end-to-end solution for training and running your own ML models.

Quick note regarding the model size

Model size significantly affects the quality of the results. A larger model can generally produce better content (since it has a greater capacity to “learn”). Additionally, larger models often offer a larger attention window (for “understanding” the context of the question), and allow more tokens as input (your instructions) and output (the response).

As an example, Llama2 offers three main model sizes regarding the parameter number: 7, 13, or 70 billion parameters. The first model requires a GPU with a minimum of 8GB of GPU RAM, whereas the second requires a minimum of 16GB of VRAM.
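A rough way to reason about these figures is to multiply the parameter count by the storage size of each parameter. The sketch below does only that arithmetic; `estimate_weights_gb` is a hypothetical helper, the bits-per-parameter value is an assumption you supply (for example 4 for a 4-bit quantized model or 16 for half precision), and the result covers the weights alone, not the context cache or runtime overhead, so real VRAM needs are higher.

```shell
# Hypothetical helper: estimate the size of the model weights in GB.
# $1 = parameters in billions, $2 = bits used to store each parameter.
estimate_weights_gb() {
  # billions of parameters * bits / 8 = billions of bytes, i.e. GB
  LC_ALL=C awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_weights_gb 7 4    # 7B parameters at 4 bits each
estimate_weights_gb 13 4   # 13B parameters at 4 bits each
estimate_weights_gb 7 16   # 7B parameters at 16 bits each
```

Under the 4-bit assumption, the 7B and 13B estimates come out at 3.5 GB and 6.5 GB, which leaves headroom within the 8GB and 16GB minimums mentioned above.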

Let me share a final example:

I will request the 7B parameters version of Llama2 to proofread an incorrect version of this simple Spanish phrase, “¿Hola, cómo estás?”, which translates to “Hi, how are you?” in English. 

I conducted numerous tests, all yielding incorrect results like the one displayed in the screenshot (where “óle” is not a valid word, and it erroneously suggests it means “hello”).

Now, let’s test the same example with Llama2 with 13 billion parameters:

While it failed to recognize that I intended to write “hola,” this outcome is significantly better, as it added accents and question marks and detected that “ola” wasn’t the right word to use (if you are curious, it means “wave”).

Canonical at Google Next – What you need to know

27 mars 2024 à 11:00

Google Next is making its way to Las Vegas, and Ubuntu is joining the journey. As a proud sponsor, Canonical, the publisher of Ubuntu , invites you to join us at the event and visit booth #252 in the Mandalay Bay Expo Hall. As one of the most popular Linux operating systems, Canonical is dedicated to providing commercial support and driving open source innovation across a diverse range of industries and applications. Stop by and learn more about how Canonical and GCP are collaborating to empower businesses with secure and scalable solutions for their cloud computing needs. 

Ubuntu ‘Show you’re a Pro’ Challenge: Find and patch the vulnerabilities and earn awesome swag!

Are you an Ubuntu Pro? Test your skills at our booth! Sit down at our workstation and discover any unpatched vulnerabilities on the machine. Showcase your expertise by securing the system completely, and receive exclusive swag as a token of our gratitude.

Security maintenance for your full software stack

At Canonical, security is paramount. Ubuntu Pro offers a solution to offload security and compliance concerns for your open source stack, allowing you to concentrate on building and managing your business. Serving as an additional layer of services atop every Ubuntu LTS release, Ubuntu Pro ensures robust protection for your entire software stack, encompassing over 30,000 open source packages. Say farewell to fragmented security measures; Canonical provides a holistic approach, delivering  security and support through a unified vendor. Additionally, relish the assurance of vendor-backed SLA support for open source software, providing peace of mind for your operations.

Confidential computing across clouds

Confidential computing is a revolutionary technology that disrupts the conventional threat model of public clouds. In the past, vulnerabilities within the extensive codebase of the cloud’s privileged system software, including the operating system and hypervisor, posed a constant risk to the confidentiality and integrity of code and data in operation. Likewise, unauthorized access by a cloud administrator could compromise the security of your virtual machine (VM). 

Ubuntu Confidential VMs (CVMs) on Google Cloud offer enhanced security for your workloads by utilizing hardware-protected Trusted Execution Environments (TEEs). With the broadest range of CVMs available, Ubuntu enables users on Google Cloud to benefit from the cutting-edge security features of AMD 4th Gen EPYC processors with SEV-SNP and Intel Trust Domain Extensions (Intel TDX).

Scale your AI projects with open source tooling

Empower your organization with Canonical’s AI solutions. We specialize in the automation of machine learning workloads on any environment, whether private or public cloud, or hybrid or multi cloud. We provide an end-to-end MLOps solution to develop and deploy models in a secure, reproducible, and portable manner that seamlessly integrates with your existing technology stack. Let us help you unlock the full potential of AI.

Join Us at Google Next 2024

Mark your calendars and make plans to visit Canonical at Google Cloud Next 2024. Whether you’re seeking cutting-edge solutions for cloud computing, robust security measures for your software stack, or innovative AI tools to propel your organization forward, our team will be on hand to offer insights, demonstrations, and personalized consultations to help you harness the power of open source technology for your business. Join us at booth #252 to discover how Canonical and Ubuntu can elevate your digital journey. See you there!

Prompts:

Canonical at Google Next – What you need to know!

Canonical is excited to sponsor Google Cloud Next in Las Vegas, NV April 9-11, 2024. 

Visit the Canonical-Ubuntu booth #252 in the Mandalay Bay Expo Hall.

Our team will be available to discuss the following:

  • Protect your full software tech stack with Ubuntu Pro providing security coverage for 30,000+ software packages.
  • Single vendor for security requirements – delivery, security, support; Vendor-backed SLA support for open source  
  • Confidential computing – OS support across all clouds (multi-cloud/hybrid cloud)
  • AI
    • Canonical provides tailored solutions to enable your organisation to efficiently run machine learning workloads. Canonical offers an end-to-end MLOps solution that can be used across all layers of the technology stack.

While at our booth, earn some awesome swag by showing that you’re an Ubuntu Pro. Take a seat at our workstation to find the unpatched vulnerabilities on the machine! Upgrade the machine to be fully secure to earn awesome swag! 

See you at the event

Accelerate AI development with Ubuntu and NVIDIA AI Workbench

18 March 2024 at 22:10
Fig.1. NVIDIA AI Workbench

Canonical expands its collaboration with NVIDIA through NVIDIA AI Workbench. NVIDIA AI Workbench is supported across workstations, data centres, and cloud deployments.

NVIDIA AI Workbench is an easy-to-use toolkit that allows developers to create, test, and customise AI and machine learning models on their PC or workstation and scale them to the data centre or public cloud.  It simplifies interactive development workflows while automating technical tasks that halt beginners and derail experts. Collaborative AI and ML development is now possible on any platform – and for any skill level. 

As the preferred OS for data science, artificial intelligence and machine learning, Ubuntu and Canonical play an integral role in AI Workbench capabilities. 

  • On Windows, Ubuntu powers AI Workbench via WSL2. 
  • In the cloud, Ubuntu 22.04 LTS enables AI Workbench cloud deployments as the only target OS supported for remote machines. 
  • For AI application deployments from the datacenter to cloud to edge, Ubuntu-based containers are included as a key part of AI Workbench.

This seamless end user experience is made possible thanks to the partnership between Canonical and NVIDIA.

Define your AI journey, start local and scale globally

Create, collaborate, and reproduce generative AI and data science projects with ease. Develop and execute while NVIDIA AI Workbench handles the rest:

  • Streamlined setup: easy installation and configuration of containerized development environments for GPU-accelerated hardware.
  • Laptop to cloud: start locally on an RTX PC or workstation and scale out to data centre or cloud in just a few clicks.
  • Automated workflow management: simplified management of project resources, versioning, and dependency tracking.
Fig 2. Environment Window in AI Workbench Desktop App

Ubuntu and NVIDIA AI Workbench improve the end user experience for Generative AI workloads on client machines

As the established OS for data science, Ubuntu is now commonly being used for AI/ML development and deployment purposes. This includes development, processing, and iterations of Generative AI (GenAI) workloads. GenAI on both smaller devices and GPUs is increasingly important with the growth of edge AI applications and devices. Applications such as smart cities require more edge devices such as cameras and sensors and thus require more data to be processed at the edge. To make it easier for end users to deploy workloads with more customisability, Ubuntu containers are often preferred due to their ease of use for bare metal deployments. NVIDIA AI Workbench offers Ubuntu container options that are well integrated and suited for GenAI use cases.

Fig 3. AI Workbench Development Workflow

Peace of mind with Ubuntu LTS

With Ubuntu, developers benefit from Canonical’s 20-year track record of Long Term Supported releases, delivering security updates and patching for 5 years. With Ubuntu Pro, organisations can extend that support and security maintenance commitment to 10 years, offloading security and compliance from their teams so they can focus on building great models. Together, Canonical and Ubuntu provide an optimised and secure environment for AI innovators wherever they are. 

Getting started is easy (and free).

Get started with Canonical Open Source AI Solutions

Canonical accelerates AI Application Development with NVIDIA AI Enterprise

18 March 2024 at 22:10

Charmed Kubernetes support comes to NVIDIA AI Enterprise

Canonical’s Charmed Kubernetes is now supported on NVIDIA AI Enterprise 5.0. Organisations using Kubernetes deployments on Ubuntu can look forward to a seamless licensing migration to the latest release of the NVIDIA AI Enterprise software platform providing developers the latest AI models and optimised runtimes.

NVIDIA AI Enterprise 5.0

NVIDIA AI Enterprise 5.0 is supported across workstations, data centres, and cloud deployments. New updates include:

  • NVIDIA NIM microservices: a set of cloud-native microservices that developers can use as building blocks for custom AI application development and to speed AI into production. NIM will be supported on Charmed Kubernetes.
  • NVIDIA API catalog: quick access for enterprise developers to experiment with, prototype and test NVIDIA-optimised foundation models powered by NIM. When ready to deploy, enterprise developers can export the enterprise-ready API and run it on a self-hosted system.
  • Infrastructure management enhancements include support for vGPU heterogeneous profiles, Charmed Kubernetes, and new GPU platforms.

Charmed Kubernetes and NVIDIA AI Enterprise 5.0

Data scientists and developers using NVIDIA frameworks and workflows on Ubuntu now have a single platform to rapidly develop AI applications on the latest generation of NVIDIA Tensor Core GPUs. For data scientists and AI/ML developers who want to deploy their latest AI workloads using Kubernetes, it is vital to get the most performance out of Tensor Core GPUs through NVIDIA drivers and integrations.

Fig. NVIDIA AI Enterprise 5.0

Charmed Kubernetes from Canonical provides several features that are unique to this distribution, including bundled NVIDIA operators and GPU optimisation features, as well as composability and extensibility through customised integrations with the Ubuntu operating system.

Best-In-Class Kubernetes from Canonical 

Charmed Kubernetes can automatically detect GPU-enabled hardware and install required drivers from NVIDIA repositories. With the release of Charmed Kubernetes 1.29, the NVIDIA GPU Operator charm is available for specific GPU configuration and tuning. With support for GPU operators in Charmed K8s, organisations can rapidly and repeatedly deploy the same models utilising existing on-prem or cloud infrastructure to power AI workloads. 

With the NVIDIA GPU Operator, users can automatically detect the GPU on the system and install the required drivers from NVIDIA repositories. It also enables optimised configurations through features such as NVIDIA Multi-Instance GPU (MIG) technology, to get the most efficiency out of Tensor Core GPUs. GPU-optimised instances for AI/ML applications reduce latency and allow for more data processing, freeing capacity for larger-scale applications and more complex model deployments. 

Paired with the GPU Operator, the Network Operator enables GPUDirect RDMA (GDR), a key technology that accelerates cloud-native AI workloads by orders of magnitude. GDR allows for optimised network performance, by enhancing data throughput and reducing latency. Another distinctive advantage is its seamless compatibility with NVIDIA’s ecosystem, ensuring a cohesive experience for users. Furthermore, its design, tailored for Kubernetes, ensures scalability and adaptability in various deployment scenarios. This all leads to more efficient networking operations, making it an invaluable tool for businesses aiming to harness the power of GPU-accelerated networking in their Kubernetes environments.

Speaking about these solutions, Marcin “Perk” Stożek, Kubernetes Product Manager at Canonical says: “Charmed Kubernetes validation with NVIDIA AI Enterprise is an important step towards an enterprise-grade, end-to-end solution for AI workloads. By integrating NVIDIA Operators with Charmed Kubernetes, we make sure that customers get what matters to them most: efficient infrastructure for their generative AI workloads.” 

Getting started is easy (and free). You can rest assured that Canonical experts are available to help if required.

Get started with Canonical open source solutions with NVIDIA AI Enterprise 

Try out NVIDIA AI Enterprise with Charmed Kubernetes with a free, 90-day evaluation

Large Language Models (LLMs) Retrieval Augmented Generation (RAG) using Charmed OpenSearch

11 March 2024 at 07:00

Improving the relevance of your LLM application by leveraging Charmed OpenSearch’s vector database

Large Language Models (LLMs) fall under the category of Generative AI (GenAI), an artificial intelligence type that produces content based on user-defined context. These models undergo training using an extensive dataset composed of trillions of combinations of words from natural language, enabling them to empower interactive and conversational applications across various scenarios.

Renowned LLMs like GPT, BERT, PaLM, and LLaMa can experience performance improvements by gaining access to additional structured and unstructured data. This additional data may include public or internal documents, websites, and various text forms and content. This methodology, termed retrieval-augmented generation (RAG), ensures that your conversational application generates accurate results with contextual relevance and domain-specific knowledge, even in areas where the pertinent facts were not part of the initial training dataset.

RAG can drastically improve the accuracy of an LLM’s responses. See the example below:

“What is PRO?” response without RAG

Pro is a subscription-based service that offers additional features and functionality to users. For example, Pro users can access exclusive content, receive priority customer support, and more. To become a Pro user, you can sign up for a Pro subscription on our website. Once you have signed up, you can access all of the Pro features and benefits.

“What is PRO?” response with RAG

Ubuntu Pro is an additional stream of security updates and packages that meet compliance requirements, such as FIPS or HIPAA, on top of an Ubuntu LTS. It provides an SLA for security fixes for the entire distribution (‘main and universe’ packages) for ten years, with extensions for industrial use cases. Ubuntu Pro is free for personal use, offering the full suite of Ubuntu Pro capabilities on up to 5 machines.

This article guides you on leveraging Charmed OpenSearch to maintain a relevant and up-to-date LLM application.

What is OpenSearch?

OpenSearch is an open-source search and analytics engine. Users can extend the functionality of OpenSearch with a selection of plugins that enhance search, security, performance analysis, machine learning, and more. This previous article we wrote provides additional details on the comprehensive features of OpenSearch. We discussed the capability of enabling enterprise-grade solutions through Charmed OpenSearch. This blog will emphasise a specific feature pertinent to RAG: utilising OpenSearch as a vector database.

What is a vector database?

Vector databases allow you to store and index, for example, text documents, rich media, audio, geospatial coordinates, tables, and graphs into vectors. These vectors represent points in N-dimensional spaces, effectively encapsulating the context of an asset. Search tools can look into these spaces using low-latency queries to find similar assets in neighbouring data points. These search tools typically do this by exploiting the efficiency of different methods for obtaining, for example, the k-nearest neighbours (k-NN) from an index of vectors.

In particular, OpenSearch enables this feature with the k-NN plugin and augments this functionality by providing your conversational applications with other essential features, such as fault tolerance, resource access controls, and a powerful query engine.
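
To make the idea of a nearest-neighbour lookup concrete, here is a minimal brute-force sketch in plain Python. It is not how the OpenSearch k-NN plugin is implemented (the plugin relies on index structures for approximate search at scale); the document names and three-dimensional vectors are hypothetical values chosen for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def k_nearest(query, index, k):
    """Return the ids of the k vectors most similar to the query (exact search)."""
    ranked = sorted(
        index,
        key=lambda doc_id: cosine_similarity(query, index[doc_id]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical 3-dimensional embeddings; real models use hundreds of dimensions.
toy_index = {
    "ubuntu-pro-faq": [0.9, 0.1, 0.0],
    "kubernetes-guide": [0.1, 0.9, 0.1],
    "security-updates": [0.8, 0.2, 0.1],
}
toy_query = [1.0, 0.0, 0.0]

nearest = k_nearest(toy_query, toy_index, k=2)
print(nearest)
```

A brute-force scan like this compares the query with every stored vector, which is why dedicated vector indexes matter once the collection grows.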

Using the OpenSearch k-NN plugin for RAG

In this section, we provide a practical example of using Charmed OpenSearch as the retrieval tool in the RAG process, with an experiment that uses a Jupyter notebook on top of Charmed Kubeflow to run inference with an LLM.

1. Deploy Charmed OpenSearch and enable the k-NN plugin. The Charmed OpenSearch tutorial is a good starting point. At the end, verify that the plugin is enabled (it is enabled by default):

$ juju config opensearch plugin_opensearch_knn
true

2. Get your credentials. The easiest way to create and retrieve your first administrator credentials is to add a relation between Charmed OpenSearch and the Data Integrator Charm, which is also part of the tutorial.

3. Create a vector index. Now we can create a k-NN index for your additional documents, encoded into the knn_vector data type. For simplicity, we will use the opensearch-py client.

from opensearchpy import OpenSearch

# Connection details (example values; use the endpoint and credentials
# obtained in step 2).
os_host = "10.56.118.209"
os_port = 9200
os_url = "https://10.56.118.209:9200"
os_auth = ("opensearch-client_7","sqlKjlEK7ldsBxqsOHNcFoSXayDudf30")

os_client = OpenSearch(
    hosts = [{'host': os_host, 'port': os_port}],
    http_compress = True, 
    http_auth = os_auth,
    use_ssl = True,
    verify_certs = False,  # certificate checks are disabled in this example
    ssl_assert_hostname = False,
    ssl_show_warn = False
)

os_index_name = "rag-index"

# Enable k-NN on the index and use cosine similarity as the distance metric.
settings = {
    "settings": {
        "index": {
            "knn": True,
            "knn.space_type": "cosinesimil"
        }
    }
}

os_client.indices.create(index=os_index_name, body=settings)

# One field for the embedding vector and one for the original text chunk.
properties={
    "properties": {
        "vector_field": {
            "type": "knn_vector",
            "dimension": 384
        },
        "text": {
            "type": "keyword"
        }
    }
}

os_client.indices.put_mapping(index=os_index_name, body=properties)
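
The settings and the mapping can also be assembled in one place so that the vector dimension is stated once and kept in line with the embedding model. The helper below is a sketch, and `build_knn_index_body` is a hypothetical name rather than part of opensearch-py. The value 384 matches the output size of the all-MiniLM-L12-v2 model used later in this walkthrough.

```python
def build_knn_index_body(dimension, space_type="cosinesimil"):
    """Return an index body with k-NN settings and a knn_vector mapping.

    Hypothetical helper for illustration; it only builds a dictionary.
    """
    return {
        "settings": {
            "index": {
                "knn": True,
                "knn.space_type": space_type,
            }
        },
        "mappings": {
            "properties": {
                "vector_field": {"type": "knn_vector", "dimension": dimension},
                "text": {"type": "keyword"},
            }
        },
    }

# 384 is the embedding size of sentence-transformers/all-MiniLM-L12-v2.
index_body = build_knn_index_body(384)
print(index_body["mappings"]["properties"]["vector_field"])
```

A body of this shape can be passed to the client's `indices.create(index=..., body=index_body)` call in place of the separate create and put_mapping steps.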

4. Aggregate source documents. In this example, we will select a list of web content that we want our application to use as relevant information to provide accurate answers:

content_links = [
    "https://discourse.ubuntu.com/t/ubuntu-pro-faq/34042"
]

5. Load document contents into memory and split the content into chunks. Smaller chunks let us create one embedding per passage from the selected documents and upload them to the index we created.

from langchain.document_loaders import WebBaseLoader

loader = WebBaseLoader(content_links)
htmls = loader.load()

from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    chunk_size=500, 
    chunk_overlap=0,
    separator="\n")
docs = text_splitter.split_documents(htmls)
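
For readers who want to see what separator-based chunking does, the following is a simplified sketch in plain Python. It is not LangChain's implementation; `split_into_chunks` is a hypothetical helper, and the sample text and the 60-character limit are illustrative only.

```python
def split_into_chunks(text, chunk_size=500, separator="\n"):
    """Greedily merge separator-delimited pieces into chunks.

    Each chunk stays within chunk_size characters unless a single piece
    is already longer, in which case that piece is kept whole.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue  # skip empty lines
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks

sample = (
    "Ubuntu Pro is a subscription.\n"
    "It covers main and universe.\n"
    "It is free for personal use."
)
chunks = split_into_chunks(sample, chunk_size=60)
for chunk in chunks:
    print(repr(chunk))
```

Smaller chunks give more precise matches at retrieval time, while larger chunks carry more surrounding context into the prompt.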

6. Create embeddings for text chunks and store embeddings in the vector index. This step encodes each chunk as a vector and uploads it to the index we created.

from langchain.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
            model_name="sentence-transformers/all-MiniLM-L12-v2",
            encode_kwargs={'normalize_embeddings': False})


from langchain.vectorstores import OpenSearchVectorSearch

docsearch = OpenSearchVectorSearch.from_documents(docs, embeddings,
                                    ef_construction=256,
                                    engine="faiss",
                                    space_type="innerproduct",
                                    m=48, opensearch_url=os_url,
                                    index_name=os_index_name,
                                    http_auth=os_auth,
                                    verify_certs=False)
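
Note that this walkthrough mentions both the cosinesimil and innerproduct space types. As a hedged illustration in plain Python (not OpenSearch code), the sketch below shows why the two measures can rank documents differently unless the vectors have unit length; the two-dimensional vectors are hypothetical.

```python
import math

def inner_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(inner_product(v, v))
    return [x / length for x in v]

def cosine(a, b):
    """Cosine similarity, computed as the inner product of unit vectors."""
    return inner_product(normalize(a), normalize(b))

query = [1.0, 1.0]
short_aligned = [1.0, 1.0]  # same direction as the query
long_offaxis = [5.0, 0.0]   # different direction, larger magnitude
documents = [short_aligned, long_offaxis]

raw_winner = max(documents, key=lambda d: inner_product(query, d))
cos_winner = max(documents, key=lambda d: cosine(query, d))
print(raw_winner, cos_winner)
```

With normalised embeddings the two measures agree on ranking, which is one reason to keep the normalisation setting and the space type consistent between indexing and querying.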

7. Use the similarity search to retrieve the documents that provide context to your query. The search engine performs an approximate k-NN search, for example using cosine similarity, and returns the documents most relevant to your question.

query = """
  What is Pro?
"""

similar_docs = docsearch.similarity_search(query, k=2, 
                                    raw_response=True, 
                                    search_type="approximate_search",
                                    space_type="cosinesimil")

8. Prepare your LLM. We used a simple example with a Hugging Face pipeline to load an LLM.

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain.llms import HuggingFacePipeline

model_name="TheBloke/Llama-2-7B-Chat-GPTQ"


model = AutoModelForCausalLM.from_pretrained(
            model_name,
            cache_dir="model",
            device_map='auto'
        )

tokenizer = AutoTokenizer.from_pretrained(model_name,cache_dir="llm/tokenizer")

pl = pipeline(
            "text-generation",
            model=model,
            tokenizer=tokenizer,
            max_length=2048
        )

llm = HuggingFacePipeline(pipeline=pl)

9. Create a prompt template. It will define the expectations of the response and specify that we will provide context for an accurate answer.

from langchain import PromptTemplate

question_prompt_template = """
    You are a friendly chatbot assistant that responds in a conversational manner to user's questions. 
    Respond in short but complete answers unless specifically asked by the user to elaborate on something. 
    Use History and Context to inform your answers.

Context:
---------
{context}
---------
Question: {question}
Helpful Answer:"""

QUESTION_PROMPT = PromptTemplate(
    template=question_prompt_template, input_variables=["context", "question"]
)

10. Run inference with the LLM to answer your question using the context documents retrieved from OpenSearch.

from langchain.chains.question_answering import load_qa_chain

question = "What is Pro?"

chain = load_qa_chain(llm, chain_type="stuff", prompt=QUESTION_PROMPT)
chain.run(input_documents=similar_docs, question=question)
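
Conceptually, the "stuff" chain type places all retrieved chunks into the context slot of the prompt before calling the model. The sketch below imitates that step with plain string formatting. It is an approximation for illustration, not LangChain's implementation, and `build_stuffed_prompt` and the sample chunks are hypothetical.

```python
PROMPT = (
    "Context:\n"
    "---------\n"
    "{context}\n"
    "---------\n"
    "Question: {question}\n"
    "Helpful Answer:"
)

def build_stuffed_prompt(chunks, question, template=PROMPT):
    """Join the retrieved chunks and fill them into the prompt template."""
    context = "\n\n".join(chunks)
    return template.format(context=context, question=question)

# Hypothetical retrieved chunks standing in for the similarity search results.
retrieved = [
    "Ubuntu Pro is an additional stream of security updates.",
    "Ubuntu Pro is free for personal use on up to 5 machines.",
]
final_prompt = build_stuffed_prompt(retrieved, "What is Pro?")
print(final_prompt)
```

Because every chunk is placed in a single prompt, the number and size of retrieved chunks must fit within the model's context length.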

Conclusion

Retrieval-augmented generation (RAG) is a method that enables users to converse with data repositories. It’s a tool that can revolutionise how you access and utilise data, as we showed in our tutorial. With RAG, you can improve data retrieval, enhance knowledge sharing, and enrich the results of your LLMs to give more contextually relevant, insightful responses that better reflect the most up-to-date information in your organisation.

The benefits of better LLMs that can access your knowledge base are as obvious as they are alluring: you gain better customer support, employee training and developer productivity. On top of that, you ensure that your teams get LLM answers and results that reflect accurate, up-to-date policy and information rather than generalised or even outright useless answers.

As we showed, Charmed OpenSearch is a simple and robust technology that can enable RAG capabilities. With it (and our helpful tutorial), any business can leverage RAG to transform their technical or policy manuals and logs into comprehensive knowledge bases.

Enterprise-grade and fully supported OpenSearch solution

Charmed OpenSearch is available for the open-source community. Canonical’s team of experts can help you get started with it as the vector database to leverage the power of the k-NN search for your LLM applications at any scale. Contact Canonical if you have questions. 

Watch the webinar: Future-proof AI applications with OpenSearch as a vector database

Generative AI on a GPU-Instance with Ubuntu on AWS: Part 1 – Image Generation

2 February 2024 at 21:16

We recently published a technical document showing how to install NVIDIA drivers on a G4DN instance on AWS, where we covered not only how to install the NVIDIA GPU drivers but also how to make sure to get CUDA working for any ML work. 

In this document we are going to run one of the most used generative AI models, Stable Diffusion, on Ubuntu on AWS for research and development purposes.

According to AWS, “G4dn instances, powered by NVIDIA T4 GPUs, are the lowest cost GPU-based instances in the cloud for machine learning inference and small scale training. (…) optimized for applications using NVIDIA libraries such as CUDA, CuDNN, and NVENC.”

G4DN instances come in different configurations:

Instance type    vCPUs    RAM (GB)    GPUs
g4dn.xlarge      4        16          1
g4dn.2xlarge     8        32          1
g4dn.4xlarge     16       64          1
g4dn.8xlarge     32       128         1
g4dn.12xlarge    48       192         4
g4dn.16xlarge    64       256         1
g4dn.metal       96       384         8

For this exercise, we will be using the g4dn.xlarge instance, since we need only 1 GPU, and with 4 vCPUs and 16GB of RAM, it will provide sufficient resources for our needs, as the GPU will handle most of the workload. 
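
The same selection logic can be expressed as a small lookup over the configurations listed above. This is an illustrative sketch only; `smallest_instance` is a hypothetical helper and it does not query AWS.

```python
# (vCPUs, RAM in GB, GPUs) per instance type, as listed in the table above.
G4DN = {
    "g4dn.xlarge": (4, 16, 1),
    "g4dn.2xlarge": (8, 32, 1),
    "g4dn.4xlarge": (16, 64, 1),
    "g4dn.8xlarge": (32, 128, 1),
    "g4dn.12xlarge": (48, 192, 4),
    "g4dn.16xlarge": (64, 256, 1),
    "g4dn.metal": (96, 384, 8),
}

def smallest_instance(min_vcpus, min_ram_gb, min_gpus):
    """Return the smallest listed instance meeting all minimums, or None."""
    candidates = [
        (spec, name)
        for name, spec in G4DN.items()
        if spec[0] >= min_vcpus and spec[1] >= min_ram_gb and spec[2] >= min_gpus
    ]
    # Tuples sort by vCPUs first, then RAM, then GPUs.
    return min(candidates)[1] if candidates else None

choice = smallest_instance(min_vcpus=4, min_ram_gb=16, min_gpus=1)
print(choice)
```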

Image generation with Stable Diffusion

Stable Diffusion is a deep learning model released in 2022 that has been trained to transform text into images using latent diffusion techniques. Developed by Stability.AI, this groundbreaking technology not only provides open-source access to its trained weights but also has the ability to run on any GPU with just 4GB of RAM, making it one of the most used Generative AI models for image generation.

In addition to its primary function of text-to-image generation, Stable Diffusion can also be used for tasks such as image retouching and video generation. The license for Stable Diffusion permits both commercial and non-commercial use, making it a versatile tool for various applications.

Requirements

You’ll need SSH access. If you are running Ubuntu or any other Linux distribution, opening a terminal and typing ssh will get you there. If you are running Windows, you will need either WSL (to run a Linux shell inside Windows) or PuTTY to connect to the machine.

Make sure you have NVIDIA Drivers and CUDA installed on your G4DN machine. Test with the following command:

nvidia-smi

You should be able to see the driver and CUDA versions as shown here:

Let’s get started!

Step 1: Create a Python virtual environment:

First, we need to download some libraries and dependencies as shown below:

sudo apt-get install -y python3.10-venv
sudo apt-get install ffmpeg libsm6 libxext6 -y

Now we can create the Python environment.

python3 -m venv myvirtualenv

And finally, we need to activate it. Please note that every time we log in to the machine, we will need to reactivate it with the following line:

source myvirtualenv/bin/activate
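
If you are unsure whether the environment is active in the current session, a quick check from Python can help. This is a small sketch using the standard library; `in_virtualenv` is a hypothetical helper name.

```python
import sys

def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix

active = in_virtualenv()
print("virtual environment active:", active)
```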

Step 2: Download the web GUI and get a model.

To interact with the model easily, we are going to clone the Stable Diffusion WebUI from AUTOMATIC1111.

git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git

After cloning the repository, we can move on to the interesting part: choosing and downloading a Stable Diffusion model from the web. There are many versions and variants that can make the journey more complicated but more interesting as a learning experience. As you delve deeper, you will find that sometimes you need specific versions, fine-tuned or specialized releases for your purpose.

This is where Hugging Face is great, as they host a plethora of models and checkpoint versions that you can download. Please be mindful of the license of each model you use.

Go to Hugging Face, click on models, and start searching for “Stable Diffusion”. For this exercise, we will use version 1.5 from runwayml.

Go to the “Files and versions” tab and scroll down to the actual checkpoint files.

Copy the link and go back to your SSH session. We will download the model using wget:

cd ~/stable-diffusion-webui/models/Stable-diffusion
wget https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned.safetensors
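
Checkpoint files are several gigabytes, so it can be worth confirming that the download completed intact. The sketch below computes a SHA-256 digest with Python's standard library. Comparing it against a checksum published for the file is left to you, and no expected value is given here because the article does not state one; `sha256_of_file` is a hypothetical helper name.

```python
import hashlib

def sha256_of_file(path, block_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size blocks so large checkpoints do not fill memory.
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()

# Example usage (assumes the file from the wget command above exists):
# print(sha256_of_file("v1-5-pruned.safetensors"))
```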

Now that the model is installed, we can run the script that will bootstrap everything and run the Web GUI.

Step 3: Run the WebUI securely and serve the model

Now that we have everything in place, we will run the WebUI and serve the model.

Just as a side note, since we are not installing this on a local desktop, we cannot just open the browser and enter the URL. This URL will only respond locally because of security constraints (in other words, it is not wise to open development environments to the public). Therefore, we are going to create an SSH tunnel.

Exit the SSH session.

If you are running on Linux (or Linux under WSL on Windows), you can create the tunnel using SSH by running the following command:

ssh -L 7860:localhost:7860 -i myKeyPair.pem ubuntu@<the_machine's_external_IP>

In case you are running on Windows and can’t use WSL, follow these instructions to connect via PuTTY.

If everything went well, we can now access the previous URL in our local desktop browser. The entire connection will be tunneled and encrypted via SSH.

In your new SSH session, enter the following commands to run the WebUI.

cd ~/stable-diffusion-webui
./webui.sh

The first time will take a while as it will install PyTorch and all the required dependencies. After it finishes, it will give you the following local URL:

http://127.0.0.1:7860

So open your local browser and go to the following URL: http://127.0.0.1:7860
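
If the page does not load, a quick way to tell whether the tunnel and the WebUI are both up is to test the forwarded port from your local machine. This is a minimal sketch with the standard library; `port_is_open` is a hypothetical helper, and 7860 is the port used in the commands above.

```python
import socket

def port_is_open(host="127.0.0.1", port=7860, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

# Example usage on the local desktop, with the SSH tunnel running:
# print(port_is_open())
```

A result of False usually means either the SSH session with the -L option has closed or webui.sh has not finished starting.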

We are ready to start playing. 

We tested our first prompt with all the default values, and this is what we got. Quite impressive, right?

Now you are ready to start generating!

Final thoughts

I hope this guide has been helpful in deploying the Stable Diffusion model on your own instance and has also provided you with a better understanding of how these models work and what can be achieved with generative AI. It is clear that generative AI is a powerful tool for businesses today. 

In our next post, we will explore how to deploy and self-host a Large Language Model, another groundbreaking AI tool. 

Remember, if you are looking to create a production-ready solution, there are several options available to assist you. From a security perspective, Ubuntu Pro offers support for your open source supply chain, while Charmed Kubeflow provides a comprehensive stack of services for all your machine learning needs. Additionally, AWS offers Amazon Bedrock, which simplifies the complexities involved and allows you to access these services through an API. 

Thank you for reading and stay tuned for more exciting AI content!

Ubuntu AI podcast: AI for day-to-day tasks

25 January 2024 at 07:57

Welcome to Ubuntu AI podcast, where we talk about AI with the industry leaders.

This episode was recorded in Riga, during the Ubuntu Summit 2023. We’re talking about the implementation of AI solutions for day-to-day tasks with the CEO of Nextcloud Frank Karlitschek.

AI usage in Nextcloud

We talk about AI usage at Nextcloud, where privacy plays a big role. Listen to the episode to learn more about how to ensure customers’ privacy when implementing AI solutions. We also dive deeper into use cases for Nextcloud.

Implementing AI solutions within your organization

You can build all your AI projects with secure and supported Canonical MLOps. Stable, secure, scalable tooling is a priority for enterprises. Having AI that enterprises can benefit from is critical.

If you are still defining the use-cases within your organization, our expert team is here to provide Canonical’s AI consulting services, designed to support you in every step of your journey.

Learn more about Canonical AI solutions here.

Download our guide to MLOps. Take your AI projects to production.
