azure-vm-utils Updates

This page describes the policy for updating the azure-vm-utils source package to upstream LTS stable releases. This is a special case of the standard SRU process: such updates are treated as “Upstream releases”, which may include bug fixes, new features and behaviour changes, and they may also fit under “other safe cases” (e.g. enabling new hardware or updating its description). This page outlines details of the project and the current state of its verification, to show that the LTS releases can be considered for such special cases.

In essence, this document does not deviate from the normal SRU process, but it emphasises the need for special attention in the review prior to uploading the package to be SRUed and in the verification once the package has been accepted into -proposed.

Background on azure-vm-utils

The azure-vm-utils package is a collection of utilities and udev rules to make the most of the Linux experience on Azure. For a detailed list of functionality this package provides to Ubuntu cloud images, refer to the upstream project page at GitHub.

Cloud platforms evolve at a rate that can’t be handled in six-month increments, and they will often develop features that they would like to be available to customers who don’t want to upgrade from earlier Ubuntu releases. As such, updating azure-vm-utils to more recent upstream releases is required within all Ubuntu releases, so they continue to function properly in their environment.

azure-vm-utils was synced into Ubuntu universe during the 25.04 release cycle.

Integrations and Interactions

The package could interact with other udev rules, especially those of walinuxagent/waagent. Currently, the udev rules provided with azure-vm-utils identify:

  • Microsoft NVMe devices, by model or vendor

  • MANA, mlx4, or mlx5 drivers, all related to networking on Azure virtual machines (SR-IOV interfaces), which represent different technologies making this more circumscribed to the Azure Cloud itself.
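As an illustration of how the properties set by such rules can be inspected, the sketch below extracts a single property from `udevadm info` output. It assumes the usual `E: KEY=value` line format; the helper name `udev_prop` is hypothetical and is not part of azure-vm-utils.

```shell
# Hypothetical helper (not part of azure-vm-utils): print the value of one
# udev property, reading `udevadm info` output from stdin.
udev_prop() {
    # udevadm prints properties as "E: KEY=value"; strip the prefix and key
    # and print only the value of the requested property ($1).
    sed -n "s/^E: $1=//p"
}
```

It could be used as `udevadm info /sys/class/net/<interface> | udev_prop ID_NET_DRIVER`, where the property names are the ones checked in the test plan of the template at the end of this page.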

Risks

Although azure-vm-utils is intended for Ubuntu on Azure, users can install it on any Ubuntu system, and those systems should remain unaffected by any changes made to this package.

Sometimes this extends beyond what we might consider supportable configurations; we try to stretch ourselves so that no user who apparently has an otherwise functional system is affected by this package or changes to it.

This mainly includes regressions caused by the introduction of new udev rules that conflict with the existing mapping, especially netplan configurations generated by cloud-init.

Because of hardware limitations, and with the above in mind, upstream mitigates this risk with an additional test on top of the QA tests indicated on this page: a custom netplan configuration enables DHCP on all Ethernet devices, and the test validates that the SR-IOV interface is not managed by systemd-networkd.
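A minimal sketch of the "not managed by systemd-networkd" check follows. It is not upstream's test; it assumes the five-column `networkctl list` layout (IDX, LINK, TYPE, OPERATIONAL, SETUP), and the helper names `link_setup_state` and `is_unmanaged` are hypothetical.

```shell
# Hypothetical helpers: read `networkctl list` output from stdin and report
# the SETUP state of one link. Assumes the column order
# IDX LINK TYPE OPERATIONAL SETUP.
link_setup_state() {
    # $1 = interface name; print column 5 of the row whose LINK matches.
    awk -v link="$1" '$2 == link { print $5 }'
}

is_unmanaged() {
    # Succeeds only if the link's SETUP state is exactly "unmanaged".
    [ "$(link_setup_state "$1")" = "unmanaged" ]
}
```

On a VM this could be run as `networkctl list | is_unmanaged <sr-iov interface>`, with the interface name taken from the system under test.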

Changes that affect the existing udev mapping must be explicitly called out in the SRU.

There is also the possibility that upstream will add new features or change behaviour in a way that, although done with the best of intentions, turns out to mismatch Ubuntu user expectations or results in packaging problems.

To mitigate, all new features and behaviour changes must also be explicitly called out by the SRU driver, and the upstream implementation details specifically reviewed for compatibility with Ubuntu user expectations and packaging behaviour. The review must be done by an Ubuntu core developer. A review of merely packaging changes without considering upstream feature and behaviour changes is insufficient.

Normally, SRUs are expected to be well tested upstream or in the development release to gain confidence in correctness.

Updating the azure-vm-utils package

New versions of azure-vm-utils can be SRUed into older releases as long as the following process, which is composed of a pre-upload review and a QA process, is followed. Please note that a package is ready and allowed to be uploaded only when it meets the requirements of the “Definition of Ready to Request the Upload”.

Definition of Ready to Request the Upload (Requirements)

The SRU should be done with a single process bug, targeting the affected series (if an update targets one stable release, it must also target all subsequent active releases, except that when two interim releases are active only the newest one is targeted, as well as the development release):

  • The template at the end of this document should be used

  • Changelog should contain a reference to the bug and a link to the upstream releases announcement for the package’s version.

  • Prior to the upload to the unapproved queue, the pre-SRU cases stated below should be run, and a report has to be attached per series to the bug, as first comments, along with the output of the autopkgtest.

  • The package should be reviewed before the upload by a dedicated Ubuntu Core Developer, following the guidelines in this SRU special case document and paying special attention to the “Risks” section above, so that the review is not based solely on the packaging. The reviewer may sponsor the package if necessary. Currently, this role is performed by Nick Rosbrook.

  • The package should be reviewed by a dedicated SRU Team Member as part of the normal SRU process. Currently, this role is performed by Andreas Hasenack.
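The series-targeting rule stated at the start of this list can be sketched as a small shell function. This is one reading of the rule, not an official tool: the function name `sru_targets` and the `name:kind` argument format are hypothetical, and the series in the usage example are only an illustration.

```shell
# Hypothetical sketch of the targeting rule.
# Usage: sru_targets OLDEST_TARGET SERIES...
# Each SERIES is "name:kind" with kind in lts|interim|devel, ordered from
# oldest to newest. Prints one target series per line.
sru_targets() {
    oldest=$1
    shift
    # Find the newest active interim release (the last one in the list).
    newest_interim=
    for s in "$@"; do
        if [ "${s#*:}" = interim ]; then
            newest_interim=${s%%:*}
        fi
    done
    started=0
    for s in "$@"; do
        name=${s%%:*}
        kind=${s#*:}
        if [ "$name" = "$oldest" ]; then
            started=1
        fi
        # Skip everything older than the oldest targeted release.
        if [ "$started" -eq 0 ]; then
            continue
        fi
        # With two interim releases active, only the newest is targeted.
        if [ "$kind" = interim ] && [ "$name" != "$newest_interim" ]; then
            continue
        fi
        printf '%s\n' "$name"
    done
}
```

For example, `sru_targets noble noble:lts oracular:interim plucky:interim questing:devel` prints noble, plucky and questing, skipping the older interim release.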

QA Process

The QA process consists of three stages: pre-SRU tests, SRU tests and SRU Verification.

Pre-SRU Test Cases

These are the test cases that every azure-vm-utils update is subjected to before it even reaches the SRU queue. They are intended to anticipate the risks described above and catch them early. They are described in more detail in the bug template.

1.) Launch instance on Azure with NVMe device(s)
2.) Upgrade azure-vm-utils (usually from PPA)
3.) Execute azure-nvme-id tool and check output
4.) Check for errors in the handling of the udev rules as stated in the template
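To pick the devices for step 3, the relevant NVMe nodes can be filtered out of `nvme list` output. The sketch below uses the model strings from the test plan in the template at the end of this page; the helper name `azure_nvme_nodes` is hypothetical.

```shell
# Hypothetical helper: list the device nodes of Azure NVMe disks, reading
# `nvme list` output from stdin. The model strings are the ones used in the
# SRU template's test plan; "Direct Disk" also matches "Direct Disk v2".
azure_nvme_nodes() {
    grep -e 'Microsoft NVMe Direct Disk' -e 'MSFT NVMe Accelerator v1.0' |
        awk '{ print $1 }'
}
```

Each node printed by `nvme list | azure_nvme_nodes` can then be passed to azure-nvme-id through the DEVNAME variable, as shown in the template.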

SRU Test Cases

The following will be executed for representative combinations of supported architectures, image types and machine sizes:

1.) Build new cloud image with -proposed package
2.) Boot machine from image
3.) Run all CPC image tests against machine

SRU Verification

When a new version of azure-vm-utils is uploaded to -proposed, validation will be performed by the CPC Azure squad and by the Microsoft maintainers. The following will be done:

  • By the CPC Azure squad team

    • The CPC Azure squad team will write new automated tests to cover new testable functionality (if any) in the new package

    • The automated testing that the CPC team normally runs against Azure images before they are published will be run against the -proposed package

    • The new package candidate version is built in devel-proposed and tested on the target suite. This will involve one or both of:

      • Installing the devel-proposed packages on an Azure VM, manually restoring the VM to a first boot state and rebooting it,

      • Generating a fresh image with the devel-proposed package version preinstalled and testing that directly.

    • Once the manual packaging tests pass successfully and the package requires no further changes, it will be marked as such on the tracking bug. On the development release, this is done by removing the block-proposed tag.

  • By the Microsoft team maintaining the Azure VM Utils project (upstream QA), who will check:

    • that the new package addresses the issues it is expected to address, and

    • that the new package passes their internal image validation

If appropriate due to the nature of the changes (functional embargo on publication), the steps above may be done in a private PPA prior to landing in devel-proposed.

The following additional steps also apply for the SRUs to supported releases once the packages have been accepted into the development release (if applicable):

  • Once accepted into -proposed, a test image is built from -proposed, which is subjected to the full CPC image tests; this tests for more regressions across multiple Azure instance sizes.

The CPC team will be responsible for attaching a summary of testing to the bug. CPC team members will not mark ‘verification-done’ until this has happened.

Upload Process

As stated before, the results of the pre-SRU test cases should be attached or pasted as the first comments of the SRU bug prior to the upload.

The changelog should contain:
  • a reference to the SRU process bug, as well as all pre-existing Launchpad and GitHub bugs that are fixed if applicable; however, not all changes will be represented by an individual Launchpad bug.

  • a reference to the upstream notes.

  • major changes, called out explicitly, especially where
    • the existing udev mapping is affected.

    • changed behaviour is not backwards compatible.

  • Any packaging changes need to be stated

  • Any architecture-specific fixes need to be noted.
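The first two changelog requirements lend themselves to a quick automated check. The sketch below is an illustration only: the function name `changelog_has_refs` is hypothetical, and it checks only for a Launchpad bug reference and a link to an upstream release page, not for the remaining items.

```shell
# Hypothetical check: read a changelog entry from stdin and verify that it
# references a Launchpad bug and links to an upstream release page.
changelog_has_refs() {
    entry=$(cat)
    if ! printf '%s\n' "$entry" | grep -Eq 'LP: #[0-9]+'; then
        echo "missing Launchpad bug reference" >&2
        return 1
    fi
    if ! printf '%s\n' "$entry" |
        grep -q 'https://github.com/Azure/azure-vm-utils/releases/tag/'; then
        echo "missing upstream release link" >&2
        return 1
    fi
    return 0
}
```

From the package source tree it could be run as `dpkg-parsechangelog -S Changes | changelog_has_refs`.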

Releasing the SRU

We delegate to the SRU team member the decision to release this SRU without meeting the 7-day aging period, as long as all of the above steps have been completed. This has traditionally been done for packages intended solely for the Azure cloud (which is the case for the azure-vm-utils package). Keep in mind, however, that this population is increasing every cycle and is a significant proportion of all Ubuntu users.

azure-vm-utils SRU Template

== Begin SRU Template ==

[ Special Case Acknowledgement ]

 - This SRU follows the “azure-vm-utils” special case documentation:
   https://documentation.ubuntu.com/sru/en/latest/reference/exception-Azure-VM-Utils-Updates
   with a special pre-review prior to the upload.

 - Special case review for upstream feature and behavioural changes, as well as the usual
   packaging changes (see special case docs and “Risks” section): **<TODO done by [NAME] on [DATE]>**.

 - Link to the review artefact from the above (merge proposal comment/email/matrix conversation): < TODO [URL]>

[ Impact ]

 This release contains both bug-fixes and new
 features and we would like to make sure all of our supported customers have
 access to these improvements.

 Full release notes are available at:
   https://github.com/Azure/azure-vm-utils/releases/tag/v < TODO [include version from upstream] >

 <TODO if fix any other LP bug>It fixes the following LP bugs:
 *** <TODO: list any LP: # included>

[ Test Plan ]

 A) Ensure the selftests run at autopkgtest time, and pass.
 B) Check installation/upgrade/removal on an Azure VM
 C) Manual testing on Azure VMs (hardware dependent), composed of
    1. For any MANA, mlx4, or mlx5 SR-IOV devices with the IFF_SLAVE flag
       set:
       (a) The appropriate udev properties should be configured; and
       (b) systemd-networkd should report these devices as unmanaged

    2. Any network devices not matching the above criteria are left
       unaffected.
    3. Manually check that the command
       `DEVNAME=<nvme device name> azure-nvme-id --udev` gives correct
       output for some device. Only Direct Disk v2 is fully supported for
       now.
    4. Check that expected /dev/disk/azure symlinks are created for the
       device.
    5. Use unmkinitramfs to ensure that the udev rules are copied to the
       initrd correctly.
    6. Any other testing considered appropriate by reviewers on an individual
       SRU basis which should be specified in the artifact produced in the
       pre-upload review.

 Detailed commands can be found below. A) and B) are not detailed - it is
 understood that the reviewer or tester knows how to do it.

 Results of this test plan should be attached to the bug in comments per
 target series of the SRU for the SRU Team (Andreas Hasenack) to review.

 Requirements:

 Some Azure VMs are required to test this package, as it contains
 specific configurations for Ubuntu on Azure. An Azure account and
 access to the Azure portal are needed.

 azure-cli package is needed for command line VM creation ( https://packages.microsoft.com/repos/azure-cli/ ).

 Preparing the manual testing:

## 0.0 ## AZURE VM CREATION: selection of VM's family size depending on what disk and net driver we need to check:

For Microsoft NVMe Direct Disk v2, MSFT NVMe Accelerator v1.0 and networking
on mellanox v5 ->
    az vm create --resource-group miriam-azure-vm-utils --name nmve_direct --image "Canonical:ubuntu-25_10-daily:server:latest" --ssh-key-values ~/.ssh/id_rsa.pub --size Standard_E2ads_v6 --admin-username ubuntu

For Microsoft NVMe Direct Disk ->
    az vm create --resource-group miriam-azure-vm-utils --name nmve_direct_noversion --image "Canonical:ubuntu-25_10-daily:server:latest" --ssh-key-values ~/.ssh/id_rsa.pub --size Standard_D2alds_v6 --admin-username ubuntu

For Net mana driver ->
    az vm create --resource-group miriam-azure-vm-utils --name nmvemana --image "Canonical:ubuntu-25_10-daily:server:latest" --ssh-key-values ~/.ssh/id_rsa.pub --size Standard_D2ls_v6 --admin-username ubuntu

Note: Earlier v2/v3/v4 sizes with AN enabled are most likely to result in mlx4, but there's no guarantee. Therefore, we may never have the opportunity to test mlx4 for sure.

Please log in to every machine with ssh ubuntu@<vm_ip> to perform the next steps.

      ## 0.1 ## CHECK DEVICES IN ORIGINAL STATE

      # DISKS

      $ nvme list | grep -e "Microsoft NVMe Direct Disk v2" -e "MSFT NVMe Accelerator v1.0" -e "Microsoft NVMe Direct Disk"

      # NETWORK

      To record differences before and after installing the package plus
      rebooting, save the current state with

      $ networkctl status -a -l -n 0 | tee net_before.txt

      Note that actual differences are caused by the restart
      of systemd-networkd, not by the deployment of the azure-vm-utils package.
      However, we include the step as a sanity check.

      To check driver presence (mana, mlx4, mlx5)

      $ networkctl status $(ip a | grep SLAVE | cut -d':' -f2 | xargs) | grep -i driver | grep -e mana -e mlx

      To check udev rules (at this stage, only showing the driver):

      $ udevadm info /sys/class/net/$(ip a | grep SLAVE | cut -d':' -f2 | xargs) | grep -e AZURE_UNMANAGED_SRIOV -e ID_NET_MANAGED_BY -e ID_NET_DRIVER

      ## 0.2 ## INSTALLING PACKAGE

      $ sudo apt install azure-vm-utils

      checking all went well:

      $ dpkg -l azure-vm-utils | grep ii

      ## 0.3 ## ENABLING SYSTEMD-NETWORKD IN DEBUG MODE

      $ sudo mkdir -p /etc/systemd/system/systemd-networkd.service.d/

      $ sudo tee /etc/systemd/system/systemd-networkd.service.d/10-debug.conf > /dev/null <<EOF
[Service]
Environment=SYSTEMD_LOG_LEVEL=debug
EOF

      ## 0.4 ## REBOOTING

      $ sudo shutdown -r now

 Performing manual testing C):

     ## 1.a ## UDEV CONFIGURED

     $ udevadm info /sys/class/net/$(ip a | grep SLAVE | cut -d':' -f2 | xargs) | grep -e AZURE_UNMANAGED_SRIOV -e ID_NET_MANAGED_BY -e ID_NET_DRIVER

     ## 1.b ## SYSTEMD-NETWORKD PROCESSES THE FILE FOR SR-IOV DEVICES AND REPORTS THEM AS UNMANAGED

     $ sudo journalctl -b -u systemd-networkd | grep azure

     $ sudo journalctl -b -u systemd-networkd | grep $(ip a | grep SLAVE | cut -d':' -f2 | xargs) | grep -e SLAVE -e unmanaged

     ## 2 ## NO OTHER NETWORKING ITEMS AFFECTED

     $ networkctl status -a -l -n 0 | tee net_after.txt

     $ diff net_before.txt net_after.txt

     The output should be something similar to:

       17c17
       < Link File: /usr/lib/systemd/network/99-default.link
       ---
       > Link File: /run/systemd/network/10-netplan-eth0.link
       24d23
       < Alternative Names: enx002248814532
       52a52
       > enx002248814532

     ## 3 ## azure-nvme-id UTIL WORKS FOR NVMe DISK V2

     $ sudo DEVNAME=$(nvme list | grep v2 | cut -d' ' -f1) azure-nvme-id --udev

     ## 4 ## SYMLINKS ARE CREATED FOR THE NVME DEVICES

     $ udevadm info $(nvme list | grep "MSFT NVMe Accelerator v1.0" | cut -d' ' -f1) | grep -i -e model -e azure

     $ udevadm info $(nvme list | grep "Microsoft NVMe Direct Disk" | cut -d' ' -f1) | grep -i -e model -e azure

     $ udevadm info $(nvme list | grep "Microsoft NVMe Direct Disk v2" | cut -d' ' -f1) | grep -i -e model -e azure

     ## 5 ## UDEV RULES ARE COPIED TO INITRD

     $ unmkinitramfs /boot/initrd.img-$(uname -r) initramfs/

     $ ls initramfs/lib/udev/rules.d/*0*azure* | wc -l # it should return only 2

[ Where problems could occur ]

Although azure-vm-utils is a package to be used intentionally for Ubuntu on
Azure, this package could be installed on all Ubuntu systems by users, which
should remain unaffected by any changes made to this package.

Sometimes this extends beyond what we might consider supportable
configurations; we try to stretch ourselves so that no user who apparently
has an otherwise functional system is affected by this package or changes
to it.

This mainly includes regressions caused by the introduction of new udev rules
that conflict with the existing mapping, especially netplan configurations
generated by cloud-init.

The test plan above is intended to foresee this situation and prevent any
risk. In addition, due to hardware constraints and awareness of the above,
upstream performs a test with a custom netplan configuration that enables DHCP
on all Ethernet devices and validates that the interface is unmanaged.

Additionally, upstream feature additions or behaviour changes may unintentionally diverge
from Ubuntu user expectations or introduce packaging issues. To mitigate, such changes must
be explicitly called out by the SRU driver and reviewed by an Ubuntu core developer for
compatibility with Ubuntu expectations and packaging behaviour (see “Risks” in the “azure-vm-utils”
special case documentation referenced in the Special Case Acknowledgement).


[ Other Info ]

The QA process consists of three stages: pre-SRU tests (the [Test Plan]
section), SRU tests and SRU Verification.

Output reports per target series of the [Test Plan] should be attached
to this bug before they can be reviewed by the SRU team.

SRU Test Cases
..............

The following will be executed for representative combinations of supported
architectures, image types and machine sizes:

   1.) Build new cloud image with -proposed package
   2.) Boot machine from image
   3.) Run all CPC image tests against machine

SRU Verification
................

When a new version of azure-vm-utils is uploaded to -proposed, there will be
validation actions performed by the CPC azure squad and others from Microsoft
maintainers. Therefore, the following will be done:

- By The CPC Azure squad team

  - the CPC azure squad team will write new automated tests to cover new
     testable functionality (if any) in the new package
  - the automated testing that the CPC team normally runs against Azure
     images before they are published will be run against the -proposed
     package.
  - The new package candidate version is built in devel-proposed and tested
     on the target suite. This will involve one or both of:
     - Installing the devel-proposed packages on an Azure VM, manually
        restoring the VM to a first boot state and rebooting it,
     - Generating a fresh image with the devel-proposed package version
        preinstalled and testing that directly
  - Once the manual packaging tests pass successfully and the package
     requires no further changes, it will be marked as such on the tracking
     bug. On the development release, this is done by removing the
     *block-proposed* tag.

- By the Microsoft team maintaining the Azure VM Utils project (upstream QA), who will check:
  - that the new package addresses the issues it is expected to address, and
  - that the new package passes their internal image validation

**If appropriate due to the nature of the changes (functional embargo on publication), the steps above may be done in a private PPA prior to landing in devel-proposed.**

The CPC team will be responsible for attaching a summary of testing to the bug.
CPC team members will not mark ‘verification-done’ until this has happened.

== End SRU Template ==