2023.1 Series (21.2.0 - 21.4.x) Release Notes

2023.1-eom-3

Bug Fixes

  • The fix for CVE-2024-47211 results in image checksum being required in all cases. However there is no checksum requirement for file:// based images. When checksum is missing for file:// based image_source it is now calculated on-the-fly.

2023.1-eom

Security Issues

  • An issue in Ironic has been resolved where image checksums would not be checked prior to the conversion of an image to a raw format image from another image format.

    With default settings, this normally would not take place, however the image_download_source option, which is available to be set at a node level for a single deployment, by default for that baremetal node in all cases, or via the [agent]image_download_source configuration option when set to local. By default, this setting is http.

    This was in concert with the [DEFAULT]force_raw_images when set to True, which caused Ironic to download and convert the file.

    In a fully integrated context of Ironic’s use in a larger OpenStack deployment, where images are coming from the Glance image service, the previous pattern was not problematic. The overall issue was introduced as a result of the capability to supply, cache, and convert a disk image provided as a URL by an authenticated user.

    Ironic will now validate the user supplied checksum prior to image conversion on the conductor. This can be disabled using the [conductor]disable_file_checksum configuration option.

Bug Fixes

  • Fixes a security issue where Ironic would fail to checksum disk image files it downloads when Ironic had been requested to download and convert the image to a raw image format. This required the image_download_source to be explicitly set to local, which is not the default.

    This fix can be disabled by setting [conductor]disable_file_checksum to True, however this option will be removed in new major Ironic releases.

    As a result of this, parity has been introduced to align Ironic to Ironic-Python-Agent’s support for checksums used by standalone users of Ironic. This includes support for remote checksum files to be supplied by URL, in order to prevent breaking existing users which may have inadvertently been leveraging the prior code path. This support can be disabled by setting [conductor]disable_support_for_checksum_files to True.

21.4.3

Upgrade Notes

  • When upgrading Ironic to address the qemu-img image conversion security issues, the ironic-python-agent ramdisks will also need to be upgraded.

  • When upgrading Ironic to address the qemu-img image conversion security issues, the [conductor]conductor_always_validates_images setting may be set to True as a short term remedy while ironic-python-agent ramdisks are being updated. Alternatively it may be advisable to also set the [agent]image_download_source setting to local to minimize redundant network data transfers.

  • As a result of security fixes to address qemu-img image conversion security issues, a new configuration parameter has been added to Ironic, [conductor]permitted_image_formats with a default value of “raw,qcow2,iso”. Raw and qcow2 format disk images are the image formats the Ironic community has consistently stated as what is supported and expected for use with Ironic. These formats also match the formats which the Ironic community tests. Operators who leverage other disk image formats, may need to modify this setting further.

Security Issues

  • Ironic now checks the supplied image format value against the detected format of the image file, and will prevent deployments should the values mismatch. If being used with Glance and a mismatch in metadata is identified, it will require images to be re-uploaded with a new image ID to represent corrected metadata. This is the result of CVE-2024-44082 tracked as bug 2071740.

  • Ironic always inspects the supplied user image content for safety prior to deployment of a node should the image pass through the conductor, even if the image is supplied in raw format. This is utilized to identify the format of the image and the overall safety of the image, such that source images with unknown or unsafe feature usage are explicitly rejected. This can be disabled by setting [conductor]disable_deep_image_inspection to True. This is the result of CVE-2024-44082 tracked as bug 2071740.

  • Ironic can also inspect images which would normally be provided as a URL for direct download by the ironic-python-agent ramdisk. This is not enabled by default as it will increase the overall network traffic and disk space utilization of the conductor. This level of inspection can be enabled by setting [conductor]conductor_always_validates_images to True. Once the ironic-python-agent ramdisk has been updated, it will perform similar image security checks independently, should an image conversion be required. This is the result of CVE-2024-44082 tracked as bug 2071740.

  • Ironic now explicitly enforces a list of permitted image types for deployment via the [conductor]permitted_image_formats setting, which defaults to “raw”, “qcow2”, and “iso”. While the project has classically always declared permissible images as “qcow2” and “raw”, it was previously possible to supply other image formats known to qemu-img, and the utility would attempt to convert the images. The “iso” support is required for “boot from ISO” ramdisk support.

  • Ironic now explicitly passes the source input format to executions of qemu-img to limit the permitted qemu disk image drivers which may evaluate an image to prevent any mismatched format attacks against qemu-img.

  • The ansible deploy interface example playbooks now supply an input format to execution of qemu-img. If you are using customized playbooks, please add “-f {{ ironic.image.disk_format }}” to your invocations of qemu-img. If you do not do so, qemu-img will automatically try and guess which can lead to known security issues with the incorrect source format driver.

  • Operators who have implemented any custom deployment drivers or additional functionality like machine snapshot, should review their downstream code to ensure they are properly invoking qemu-img. If there are any questions or concerns, please reach out to the Ironic project developers.

  • Operators are reminded that they should utilize cleaning in their environments. Disabling any security features such as cleaning or image inspection are at your own risk. Should you have any issues with security related features, please don’t hesitate to open a bug with the project.

  • The [conductor]disable_deep_image_inspection setting is conveyed to the ironic-python-agent ramdisks automatically, and will prevent those operating ramdisks from performing deep inspection of images before they are written.

  • The [conductor]permitted_image_formats setting is conveyed to the ironic-python-agent ramdisks automatically. Should a need arise to explicitly permit an additional format, that should take place in the Ironic service configuration.

Bug Fixes

  • Fixes multiple issues in the handling of images as it relates to the execution of the qemu-img utility, which is used for image format conversion, where a malicious user could craft a disk image to potentially extract information from an ironic-conductor process’s operating environment.

    Ironic now explicitly enforces a list of approved image formats as a [conductor]permitted_image_formats list, which mirrors the image formats the Ironic project has historically tested and expressed as known working. Testing is not based upon file extension, but upon content fingerprinting of the disk image files. This is tracked as CVE-2024-44082 via bug 2071740.

21.4.2

Bug Fixes

  • Adds an ISO publisher value to ISO images which are mastered as part of cleaning/deployment/service operations in support of a fix for bug 2032377.

21.4.1

Bug Fixes

  • Fixes an issue with units tests that show this DeprecationWarning: The metaschema specified by $schema was not found. Using the latest draft to validate, but this will raise an error in the future. cls = validator_for(schema) Removed the warning for deprecated schema by using a new template.

  • Fixes Ironic integration with Cinder because of changes which resulted as part of the recent Security related fix in bug 2004555. The work in Ironic to track this fix was logged in bug 2019892. Ironic now sends a service token to Cinder, which allows for access restrictions added as part of the original CVE-2023-2088 fix to be appropriately bypassed. Ironic was not vulnerable, but the restrictions added as a result did impact Ironic’s usage. This is because Ironic volume attachments are not on a shared “compute node”, but instead mapped to the physical machines and Ironic handles the attachment life-cycle after initial attachment.

  • Fixes Invalid cross-device link in some cases when using file:// image URLs.

  • Fixes the behavior of file:/// image URLs pointing at a symlink. Ironic no longer creates a hard link to the symlink, which could cause confusing FileNotFoundError to happen if the symlink is relative.

  • Fixes an issue when listing allocations as a project scoped user when the legacy RBAC policies have been disabled which forced an HTTP 406 error being erroneously raised. Users attempting to list allocations with a specific owner, different from their own, will now receive an HTTP 403 error.

  • Properly eject the virtual media from a DVD device in case this is the only MediaType available from the Hardware, and Ironic requested CD as the device to be used. See bug 2039042 for details.

  • Fixes bug of iRMC driver in parse_driver_info where, if FIPS is enabled, SNMP version is always required to be version 3 even though iRMC driver’s xxx_interface doesn’t use SNMP actually.

  • Fixes bug in iRMC driver, where irmc power_interface sets and updates irmc_ipmi_succeed flag which is used by rest of iRMC driver code to deal with iRMC firmware’s IPMI incompatibility but ipmitool power_interface doesn’t set nor update irmc_ipmi_succeed flag and rest of iRMC driver code fail to handle iRMC firmware’s IPMI incompatibility correctly.

  • Fixes an issue where an agent token could be inadvertently orphaned if a node is already in the target power state when we attempt to turn the node off.

  • Fixes scope classification check with the “self_owned_node” policy check where it was limited to check execution with only project scoped, so system scoped users who ticked the policy endpoint would basically get an incorrect error.

  • Fixes an issue where a System Scoped user could not trigger a node into a manageable state with cleaning enabled, as the Neutron client would attempt to utilize their user’s token to create the Neutron port for the cleaning operation, as designed. This is because with requests made in the system scope, there is no associated project and the request fails.

    Ironic now checks if the request has been made with a system scope, and if so it utilizes the internal credential configuration to communicate with Neutron.

  • Fixes secure boot with anaconda deploy.

  • Fixes the bug where provisioning a Redfish managed node fails if the BMC doesn’t support EthernetInterfaces attribute, even if MAC address information is provided manually. This is done by handling of MissingAttributeError sushy exception in get_mac_addresses() method. This fix is needed to successfully provision machines such as Cisco UCSB and UCSX.

  • Fixes issues with Lenovo hardware where the system firmware may display a blue “Boot Option Restoration” screen after the agent writes an image to the host in UEFI boot mode, requiring manual intervention before the deployed node boots. This issue is rooted in multiple changes being made to the underlying NVRAM configuration of the node. Lenovo engineers have suggested to only change the UEFI NVRAM and not perform any further changes via the BMC to configure the next boot. Ironic now does such on Lenovo hardware. More information and background on this issue can be discovered in bug 2053064.

  • No longer re-calculates checksums for images that are already raw. Previously, it would cause significant delays in deploying raw images.

  • The per-node external_http_url setting in the driver info is now used for a boot ISO. Previously this setting was only used for a config floppy.

  • Fixes an issue where the conductor service would fail to launch when the neutron network_interface setting was enabled, and no global cleaning_network or provisioning_network is set in ironic.conf. These settings have long been able to be applied on a per-node basis via the API. As such, the service can now be started and will error on node validation calls, as designed for drivers missing networking parameters.

  • Fixes Raid creation issue in iLO6 and other BMC with latest schema by removing ‘VolumeType’, ‘Encrypted’ and changing placement of ‘Drives’ to inside ‘Links’.

  • Provides a fix for service role support to enable the use case where a dedicated service project is used for cloud service operation to facilitate actions as part of the operation of the cloud infrastructure.

    OpenStack clouds can take a variety of configuration models for service accounts. It is now possible to utilize the [DEFAULT] rbac_service_role_elevated_access setting to enable users with a service role in a dedicated service project to act upon the API similar to a “System” scoped “Member” where resources regardless of owner or lessee settings are available. This is needed to enable synchronization processes, such as nova-compute or the networking-baremetal ML2 plugin to perform actions across the whole of an Ironic deployment, if desirable where a “System” scoped user is also undesirable.

    This functionality can be tuned to utilize a customized project name aside from the default convention service, for example baremetal or admin, utilizing the [DEFAULT] rbac_service_project_name setting.

    Operators can alternatively entirely override the service_role RBAC policy rule, if so desired, however Ironic feels the default is both reasonable and delineates sufficiently for the variety of Role Based Access Control usage cases which can exist with a running Ironic deployment.

  • Fixes an issue where an agent token was being orphaned if a baremetal node timed out during cleaning operations, leading to issues where the node would not be able to establish a new token with Ironic upon future in some cases. We now always wipe the token in this case.

21.4.0

Prelude

The Ironic team hereby announces the release of OpenStack 2023.1 (Ironic 23.4.0). This repesents the completion of a six month development cycle, which primarily focused on internal and scaling improvements. Those improvements included revamping the database layer to improve performance and ensure compatability with new versions of SQLAlchemy, enhancing the ironic-conductor service to export application metrics to prometheus via the ironic-prometheus-exporter, and the addition of a new API concept of node sharding to help with scaling of services that make frequent API calls to Ironic. The new Ironic release also comes with a slew of bugfixes for Ironic services and hardware drivers. We sincerely hope you enjoy it!

New Features

  • Adds support for the service role, which is intended for service to service communication, such as for those where ironic-inspector, nova-compute, or networking-baremetal needs to communicate with Ironic’s API.

  • Adds the ability for Ironic to send conductor process metrics for monitoring. This requires the use of a new [metrics]backend option value of collector. This data was previously only available through the use of statsd. This requires ironic-lib version 5.4.0 or newer. This capability can be disabled using the [sensor_data]enable_for_conductor option if set to False.

  • Adds a [sensor_data]enable_for_nodes configuration option to allow operators to disable sending node metric data via the message bus notifier.

  • Adds a new gauge metric ConductorManager.PowerSyncNodesCount which tracks the nodes considered for power state synchrnozation.

  • Adds a new gauge metric ConductorManager.PowerSyncRecoveryNodeCount which represents the number of nodes which are being evaluated for power state recovery checking.

  • Adds a new gauge metric ConductorManager.SyncLocalStateNodeCount which represents the number of nodes being tracked locally by the conductor.

  • There are now configurable random wait times for fake drivers in a new ironic.conf [fake] section. Each supported driver having one configuration option controlling the delay. These delays are applied to operations which typically block in other drivers. This allows more realistic scenarios to be arranged for performance and functional testing of ironic itself.

  • Adds support for setting a shard key on a node, and filtering node or port lists by shard. This shard key is not used for any purpose internally in Ironic, but instead is intended to allow API clients to filter for a subset of nodes or ports. Being able to fetch only a subset of nodes or ports is useful for parallelizing any operational task that needs to be performed across all nodes or ports.

  • Adds support for querying for nodes which are sharded or unsharded. This is useful for allowing operators to find nodes which have not been assigned a shard key.

  • Adds support for querying for a list of shards via /v1/shards. This endpoint will return a list of currently assigned shard keys as well as the count of nodes which has those keys assigned. Using this API endpoint, operators can see a high level listing of how their nodes are sharded.

Known Issues

  • Sensor data notifications to the message bus, such as using the [metrics]backend configuration option of collector on a dedicated API service process or instance, is not presently supported. This functionality requires a periodic task to trigger the transmission of metrics messages to the message bus notifier.

Upgrade Notes

  • Ironic now has support for the service role, which is available in the system scope as well as the project scope. This functionality is for service to service communication, if desired. Effective access rights are similar to the manager or the owner scoped admin privileges.

  • Two statsd metrics names have been modified to provide structural clarity and consistency for consumers of statistics metrics. Consumers of metrics statistics may need to update their dashboards as the post_clean_step_hook metric is now named AgentBase.post_clean_step_hook, and the post_deploy_step_hook is now named AgentBase.post_deploy_step_hook.

Deprecation Notes

  • The setting values starting with send_sensor in the [conductor] configuration group have been deprecated and moved to a [sensor_data] configuration group. The names have been updated to shorter, operator friendly names..

Bug Fixes

  • When aborting cleaning, the last_error field is no longer initially empty. It is now populated on the state transition to clean failed.

  • When cleaning or deployment fails, the last_error field is no longer temporary set to None while the power off action is running.

  • Fixes an issue that when a node has console enabled but pid file missing, the console could not be disabled as well as be restarted, which makes the console feature unusable.

  • Fixes issues that auto-allocated console port could conflict on the same host under certain circumstances related to conductor takeover.

    For more information, see story 2010489.

  • Fixes a database API internal check to update the inspection_finished_at field upon the completion of inspection.

  • Fixes an issue in the online upgrade logic where database models for Node Traits and BIOS Settings resulted in an error when performing the online data migration. This was because these tables were originally created as extensions of the Nodes database table, and the schema of the database was slightly different enough to result in an error if there was data to migrate in these tables upon upgrade, which would have occured if an early BIOS Setting adopter had data in the database prior to upgrading to the Yoga release of Ironic.

    The online upgrade parameter now subsitutes an alternate primary key name name when applicable.

  • When a conductor service is stopped it will now continue to respond to RPC requests until [DEFAULT]hash_ring_reset_interval has elapsed, allowing a hash ring reset to complete on the cluster after conductor is unregistered. This will improve the reliability of the cluster when scaling down or rolling out updates.

    This delay only occurs when there is more than one online conductor, to allow fast restarts on single-node ironic installs (bifrost, metal3).

Other Notes

  • The default logging level for the oslo_concurrencty.lockutils module logging has been changed to WARNING. By default, the debug logging was resulting in lots of noise. Operators wishing to view debug logging for this module can tuilize the [DEFAULT]default_log_levels configuration option.