Get started with Inference Snaps using Qwen VL

This tutorial walks you through installing and using an inference snap. Specifically, you will install and run Qwen 2.5 VL, a multi-modal large language model (LLM). It is also referred to as a Vision Language Model (VLM) because it can ingest and interpret images along with text prompts.

Set up your computer

To complete this tutorial you will need:

  • A computer running an operating system that supports snaps, e.g., Ubuntu 24.04. Older Ubuntu versions aren’t supported.

  • Hardware drivers. These may be pre-installed in your OS, but some devices require additional drivers.

  • A Docker installation (an example installation follows this list).
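
If Docker is not already installed, one option on Ubuntu is the Docker snap; other installation methods, such as Docker's official apt repository, work just as well:

user@host:~$
sudo snap install docker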

Install the snap

Install the Qwen VL snap from the 2.5 track, which refers to the version of the model. The snap is currently released with beta stability, as indicated by the channel:

user@host:~$
sudo snap install qwen-vl --channel 2.5/beta
qwen-vl (2.5/beta) 2.5 from Canonical IoT Labs✓ installed

During installation, the snap detects information about your system such as installed hardware and available memory. It uses this information to select an appropriate engine for your setup.

In this context, engine refers to a combination of model weights and a runtime, along with some utilities to make it work, like a server exposing a standard API.

Check the status

Once the installation is complete, the server used to run inference with the model starts automatically. This server is provided by the engine.

Run the status command to see the selected engine and the status of the server. The output should include the server’s status and the URL where the API can be accessed:

user@host:~$
qwen-vl status
engine: intel-cpu
status: online
endpoints:
    openai: http://localhost:8326/v3

In this case, the intel-cpu engine was automatically selected during installation. You may end up with a different engine.
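
Since the endpoint is labelled openai, it is expected to follow the OpenAI API convention. Assuming that is the case, you can verify that the server is reachable by listing the available models; the response should be a short JSON document describing the loaded model:

user@host:~$
curl http://localhost:8326/v3/models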

Chat with the LLM

The qwen-vl snap includes a basic chat client for interacting with the LLM. Use the chat command to start a chat session:

user@host:~$
qwen-vl chat
Connected to http://localhost:8326/v3
Type your prompt, then ENTER to submit. CTRL-C to quit.
» Hi there
Hello! How can I assist you today?

»  

This chat client only supports text input. For queries that include images, you’ll need to set up a more advanced client.
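
The chat client is a thin wrapper around the HTTP endpoint reported by the status command (note the "Connected to" line above). Assuming the server implements the OpenAI chat completions API, the same kind of text prompt can be sent directly with curl; the model identifier below is a placeholder, so substitute the one returned by the models endpoint:

user@host:~$
curl http://localhost:8326/v3/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen2.5-vl",
        "messages": [
          {"role": "user", "content": "Hi there"}
        ]
      }'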

Run and configure Open WebUI

Open WebUI is an advanced client interface for working with LLMs. Run Open WebUI as a Docker container:

user@host:~$
docker run --network=host --env PORT=9099 ghcr.io/open-webui/open-webui:0.6

  • --network=host allows the Docker container to access the URL of the inference snap’s HTTP server on the host machine.

  • --env PORT=9099 changes the port that Open WebUI listens on from the default 8080 to 9099 to prevent collisions with other services on your host machine.

Confirm that you can access the web interface at http://localhost:9099, then configure a new connection in Open WebUI. Use the OpenAI endpoint URL reported by the status command.

Prompt Qwen VL with an image

Once the new connection is set, start a New Chat on the Open WebUI interface. In the dropdown menu at the top listing available models, select Qwen 2.5 VL.

Note

If the model is not listed, check whether the connection configured in the previous step is correct.

The chat screen has a text field, but you can add images and other attachments for additional context by clicking on the + button.

Selecting an image attachment

Download and add this image of the Dom Luís I Bridge as an attachment, and enter a suitable question in the chat screen. The model will take some time to process the query before it starts printing a response.

In this example, the prompt used was “What is the significance of the triangles?”. The expectation was that the LLM would explain the architectural reason for the triangular sections of the bridge. However, the model focused on the triangular buoy in the image and answered about it instead of the bridge.

Response of the model
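
The same multi-modal query can also be scripted against the API directly, without Open WebUI. The sketch below extends the earlier curl example with an image_url content part, which is how the OpenAI chat completions format attaches images. The model name is again a placeholder, and the image URL is hypothetical: point it at a real copy of the bridge photo, or supply the image as a base64 data URL instead.

user@host:~$
curl http://localhost:8326/v3/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen2.5-vl",
        "messages": [
          {
            "role": "user",
            "content": [
              {"type": "text", "text": "What is the significance of the triangles?"},
              {"type": "image_url", "image_url": {"url": "https://example.com/dom-luis-bridge.jpg"}}
            ]
          }
        ]
      }'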

Next steps

To become more familiar with inference snaps: