Node cleaning¶

Overview¶

Ironic provides two modes for node cleaning: automated and manual.

Automated cleaning is performed automatically before the first workload is assigned to a node and whenever hardware is recycled from one workload to another.

Manual cleaning must be invoked by the operator.

Automated cleaning¶

When hardware is recycled from one workload to another, Ironic performs automated cleaning on the node to ensure it’s ready for another workload. This ensures the tenant will get a consistent bare metal node deployed every time.

With automated cleaning, nodes move to cleaning state when moving from active -> available state (when the hardware is recycled from one workload to another). Nodes also traverse cleaning when going from manageable -> available state (before the first workload is assigned to the nodes). For a full understanding of all state transitions into cleaning, please see Bare Metal State Machine.

Enabling automated cleaning¶

To enable automated cleaning, ensure that your ironic.conf is set as follows:

[conductor]
automated_clean=true

This will enable the default set of cleaning steps, based on your hardware and Ironic hardware types used for nodes. This includes, by default, erasing all of the previous tenant’s data.

You may also need to configure a Cleaning Network.

Cleaning steps¶

The way cleaning steps are determined depends on the value of conductor.automated_cleaning_step_source:

Autogenerated cleaning steps (‘autogenerated’)

This is the default mode of Ironic automated cleaning and preserves the original behavior introduced in Kilo.

Steps are collected from hardware interfaces and ordered from higher to lower priority, where a larger integer is a higher priority. In case of a conflict between priorities across interfaces, the following resolution order is used: Power, Management, Deploy, BIOS, and RAID interfaces.

You can skip a cleaning step by setting the priority for that cleaning step to zero or None. You can reorder the cleaning steps by modifying the integer priorities of the cleaning steps.
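The ordering rule above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not Ironic's implementation: the function name, the step dictionaries, and the priority values are assumptions chosen for the example.

```python
# Illustrative sketch only: order steps by priority, break ties by interface.
# The resolution order is taken from the text above.
INTERFACE_ORDER = ["power", "management", "deploy", "bios", "raid"]


def order_clean_steps(steps):
    """Return enabled steps sorted from highest to lowest priority."""
    # A priority of 0 or None means the step is skipped.
    enabled = [s for s in steps if s.get("priority")]
    return sorted(
        enabled,
        key=lambda s: (-s["priority"], INTERFACE_ORDER.index(s["interface"])),
    )


# Hypothetical input; priorities are example values.
steps = [
    {"interface": "deploy", "step": "erase_devices", "priority": 10},
    {"interface": "deploy", "step": "erase_devices_metadata", "priority": 99},
    {"interface": "management", "step": "reset_bios_to_default", "priority": 10},
    {"interface": "raid", "step": "delete_configuration", "priority": 0},
    {"interface": "bios", "step": "factory_reset", "priority": None},
]
ordered = [s["step"] for s in order_clean_steps(steps)]
print(ordered)
# ['erase_devices_metadata', 'reset_bios_to_default', 'erase_devices']
```

The two steps with priority 10 tie, so the management step runs before the deploy step; the steps with priority 0 and None are dropped.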

Runbook-based cleaning steps (‘runbook’)

When using Runbooks for Cleaning & Servicing for automated cleaning, the exact steps and their order are defined in the runbook. Priority-based ordering does not apply; steps execute in the order specified in the runbook.

If no runbook is assigned to perform cleaning on the node and automated_clean is enabled, the node will fail cleaning and enter the clean failed state.

Hybrid (‘hybrid’)

This mode uses runbook-based cleaning if a cleaning runbook is configured for the node being cleaned. If no runbook is configured for cleaning, Ironic falls back to autogenerating cleaning steps.
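As a rough illustration of how the three values of conductor.automated_cleaning_step_source differ, the following Python sketch models the decision described above. The function name and the use of an exception to represent the clean failed outcome are assumptions made for the example, not Ironic internals.

```python
# Illustrative sketch only: which source of steps each mode would use.
def choose_cleaning_source(step_source, runbook):
    """Return 'autogenerated' or 'runbook' for the given mode.

    `runbook` is the cleaning runbook resolved for the node, or None.
    """
    if step_source == "autogenerated":
        return "autogenerated"
    if step_source == "runbook":
        if runbook is None:
            # Stands in for the node entering the clean failed state.
            raise RuntimeError("no cleaning runbook configured for node")
        return "runbook"
    if step_source == "hybrid":
        return "runbook" if runbook is not None else "autogenerated"
    raise ValueError(f"unknown step source: {step_source}")


print(choose_cleaning_source("hybrid", None))
print(choose_cleaning_source("hybrid", "CUSTOM_RB_EXAMPLE"))
```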

See How do I change the priority of a cleaning step? for more information on changing the priority of an autogenerated cleaning step.

See Configuring automated cleaning with runbooks for full details on configuring cleaning runbooks.

Storage cleaning options¶

Warning

By default, Ironic’s storage cleaning options permanently remove data from the disk during automated cleaning.

Clean steps specific to storage are erase_devices, erase_devices_metadata and (added in Yoga) erase_devices_express.

erase_devices aims to ensure that the data is removed in the most secure way available. On devices that support hardware-assisted secure erasure (many NVMe and some ATA drives), this is the preferred option. If hardware-assisted secure erasure is not available and deploy.continue_if_disk_secure_erase_fails is set to True, cleaning will fall back to using shred to overwrite the contents of the device. By default, if erase_devices is enabled and Ironic is unable to erase the device, cleaning fails rather than risk leaving data behind.

Note

erase_devices may take a very long time (hours or even days) to complete, unless fast, hardware-assisted data erasure is supported by all the devices in a system.

The erase_devices_metadata clean step does not provide as strong an assurance of irreversible data destruction as erase_devices. However, it has the advantage of a reasonably quick runtime (seconds to minutes). It operates by destroying the metadata of the storage device without erasing every bit of the data itself. Attempts to restore data after running erase_devices_metadata may be successful, but would require relevant expertise and specialized tools.

Lastly, erase_devices_express combines some of the perks of both erase_devices and erase_devices_metadata. It attempts to utilize hardware-assisted data erasure features if available (currently only NVMe devices are supported). If hardware-assisted data erasure is not available, it falls back to metadata erasure for the device (which is identical to erase_devices_metadata). It can be considered a time-optimized mode of storage cleaning, aiming to perform as thorough a data erasure as is possible within a short period of time. This clean step is particularly well suited for environments with hybrid NVMe-HDD storage configurations, as it allows fast and secure erasure of data stored on NVMe devices combined with equally fast but more basic metadata-based erasure of data on commodity HDDs.
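The per-device fallback that erase_devices_express performs can be pictured with a small Python sketch. The device dictionaries and their field names are hypothetical and exist only for this example; the real step runs inside ironic-python-agent and inspects the hardware directly.

```python
# Illustrative sketch only: per-device choice made by an "express" erase.
def express_erase_plan(devices):
    """Map each device name to the erase method that would be attempted."""
    plan = {}
    for dev in devices:
        # Hardware-assisted erase is only attempted on NVMe devices.
        if dev["type"] == "nvme" and dev.get("supports_secure_erase"):
            plan[dev["name"]] = "hardware-assisted erase"
        else:
            # Same behavior as erase_devices_metadata for this device.
            plan[dev["name"]] = "metadata erase"
    return plan


# Hypothetical hybrid NVMe-HDD node.
devices = [
    {"name": "/dev/nvme0n1", "type": "nvme", "supports_secure_erase": True},
    {"name": "/dev/sda", "type": "hdd"},
]
plan = express_erase_plan(devices)
print(plan)
# {'/dev/nvme0n1': 'hardware-assisted erase', '/dev/sda': 'metadata erase'}
```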

By default, Ironic will use erase_devices_metadata early in cleaning for reliability (ensuring a node cannot reboot into its old workload) and erase_devices later in cleaning to securely erase the drive; erase_devices_express is disabled.

Operators can use deploy.erase_devices_priority and deploy.erase_devices_metadata_priority to change the priorities of the default device erase methods, or disable them entirely by setting them to 0. Other cleaning steps can have their priority modified via the conductor.clean_step_priority_override option. For example, the configuration snippet below disables erase_devices_metadata and erase_devices and instead performs an erase_devices_express erase step.

[deploy]
erase_devices_priority=0
erase_devices_metadata_priority=0

[conductor]
clean_step_priority_override=deploy.erase_devices_express:95

This ensures that erase_devices and erase_devices_metadata are disabled so that storage is not cleaned twice and then assigns a non-zero priority to erase_devices_express, hence enabling it. Any non-zero priority specified in the priority override will work; larger values will cause the disk erasure to run earlier in the cleaning process if multiple steps are enabled.
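To make the effect of that configuration concrete, the Python sketch below parses an override entry of the form shown above (interface.step:priority) and lists which storage steps remain enabled. The helper name and the dictionary of priorities are assumptions for illustration; this is not how the conductor stores its configuration.

```python
# Illustrative sketch only: apply one clean_step_priority_override entry.
def parse_override(entry):
    """Split 'interface.step:priority' into its three parts."""
    target, priority = entry.rsplit(":", 1)
    interface, step = target.split(".", 1)
    return interface, step, int(priority)


# Priorities as set by the [deploy] options in the snippet above;
# erase_devices_express starts at 0 because it is disabled by default.
priorities = {
    ("deploy", "erase_devices"): 0,
    ("deploy", "erase_devices_metadata"): 0,
    ("deploy", "erase_devices_express"): 0,
}

interface, step, priority = parse_override("deploy.erase_devices_express:95")
priorities[(interface, step)] = priority

# Steps with a priority of 0 are skipped.
enabled = [name for (_, name), prio in priorities.items() if prio]
print(enabled)  # ['erase_devices_express']
```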

Other configuration options that can modify how Ironic erases disks are listed below. This list may not be comprehensive. Please review ironic.conf.sample for more details:

deploy.enable_ata_secure_erase, default True
deploy.enable_nvme_secure_erase, default True
deploy.shred_random_overwrite_iterations, default 1
deploy.shred_final_overwrite_with_zeros, default True
deploy.disk_erasure_concurrency, default 4

Warning

Ironic automated cleaning defaults to a secure configuration. You should not modify settings related to it unless you have special hardware needs or a unique use case. Misconfigurations can lead to data exposure vulnerabilities.

Configuring automated cleaning with runbooks¶

Starting with the 2025.2/Flamingo release, operators can configure Ironic to use runbooks for automated cleaning instead of relying on autogenerated steps. This provides more control over the cleaning process and ensures consistency across nodes.

Warning

When using runbooks for automated cleaning, ensure they include appropriate security measures such as disk erasure. Ironic does not validate that a runbook performs disk cleaning operations or any other specific cleaning step.

Trait matching

As always with runbooks, you must have a trait on the node which matches the runbook name. This allows a fail-safe to prevent dangerous, hardware-specific cleaning steps from running on incompatible hardware.

You can disable this check by setting conductor.automated_cleaning_runbook_validate_traits to False. Otherwise, add the matching trait to the node:

openstack baremetal node add trait myNode CUSTOM_RB_EXAMPLE
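The trait check described above amounts to a simple membership test. The following Python sketch is a minimal model of it; the function name and arguments are hypothetical, with validate_traits standing in for conductor.automated_cleaning_runbook_validate_traits.

```python
# Illustrative sketch only: does the node carry a trait named like the runbook?
def runbook_permitted(runbook_name, node_traits, validate_traits=True):
    """Return True if the runbook may be used to clean the node."""
    if not validate_traits:
        # Trait validation disabled: any runbook is accepted.
        return True
    return runbook_name in node_traits


print(runbook_permitted("CUSTOM_RB_EXAMPLE", ["CUSTOM_RB_EXAMPLE"]))  # True
print(runbook_permitted("CUSTOM_RB_EXAMPLE", []))                     # False
```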

Configure cleaning runbooks

Runbooks can be configured at three levels (from most to least specific):

Per-node:

Operators can set a per-node cleaning runbook override using the following command:

openstack baremetal node set myNode --driver-info cleaning_runbook=CUSTOM_RB_EXAMPLE

Warning

Customizing cleaning per node requires setting conductor.automated_cleaning_runbook_from_node to True.

Enabling node-level runbooks allows node owners to override cleaning behavior, for example by using a no-op runbook. Only enable this in trusted environments.

Per-resource-class:

Operators can set a runbook per resource_class using conductor.automated_cleaning_runbook_by_resource_class to build a list of mappings of resource_class to runbook. These runbooks are used to clean any node in that resource class that does not have a node-level override.

In this example, the large resource_class uses CUSTOM_FULL_CLEAN and the small resource_class uses CUSTOM_QUICK_CLEAN. Nodes in those resource classes would still be required to have traits matching the runbook name.
```
[conductor]
automated_cleaning_runbook_by_resource_class = large:CUSTOM_FULL_CLEAN,small:CUSTOM_QUICK_CLEAN
```
Global default:

Operators can also configure a global default, which is used for nodes that do not already have a more specific runbook configured, such as a node-level override or a resource_class mapping.

In this example, any node cleaned in the environment would use CUSTOM_DEFAULT_CLEAN. Unless trait mapping is disabled, all nodes would be required to have a trait also named CUSTOM_DEFAULT_CLEAN to successfully clean.
```
[conductor]
automated_cleaning_runbook = CUSTOM_DEFAULT_CLEAN
```
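The precedence between the three levels can be summarized with a short Python sketch. It assumes the configuration is available as a plain dictionary and that the resource class mapping uses the comma-separated key:value form shown above; the function names and data layout are hypothetical and do not reflect Ironic's internal code.

```python
# Illustrative sketch only: resolve a cleaning runbook, most specific first.
def parse_mapping(raw):
    """Parse 'large:RB_A,small:RB_B' into a dict."""
    mapping = {}
    for item in raw.split(","):
        if item.strip():
            key, value = item.split(":", 1)
            mapping[key.strip()] = value.strip()
    return mapping


def resolve_cleaning_runbook(node, conf):
    """Return the runbook name for a node, or None if nothing is configured."""
    # 1. Per-node override, honored only when explicitly enabled.
    if conf.get("automated_cleaning_runbook_from_node"):
        override = node.get("driver_info", {}).get("cleaning_runbook")
        if override:
            return override
    # 2. Per-resource-class mapping.
    by_class = parse_mapping(
        conf.get("automated_cleaning_runbook_by_resource_class", ""))
    if node.get("resource_class") in by_class:
        return by_class[node["resource_class"]]
    # 3. Global default.
    return conf.get("automated_cleaning_runbook")


conf = {
    "automated_cleaning_runbook_from_node": False,
    "automated_cleaning_runbook_by_resource_class":
        "large:CUSTOM_FULL_CLEAN,small:CUSTOM_QUICK_CLEAN",
    "automated_cleaning_runbook": "CUSTOM_DEFAULT_CLEAN",
}
node = {"resource_class": "large",
        "driver_info": {"cleaning_runbook": "CUSTOM_RB_EXAMPLE"}}
print(resolve_cleaning_runbook(node, conf))  # CUSTOM_FULL_CLEAN
```

In this example the node-level override is ignored because automated_cleaning_runbook_from_node is False, so the resource class mapping wins.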

Create and assign runbooks

Create a runbook with the necessary cleaning steps:

baremetal runbook create --name CUSTOM_SECURE_ERASE \
  --steps '[{"interface": "deploy", "step": "erase_devices", "args": {}, "order": 0}]'

Ensure nodes have the matching trait:

baremetal node add trait <node> CUSTOM_SECURE_ERASE

Management Interface¶

idrac-redfish cleaning steps¶
Name	Details	Stoppable	Arguments
`clear_job_queue`	Clear iDRAC job queue.	no
`clear_secure_boot_keys`	Clear all secure boot keys.	no
`export_configuration`	(Deprecated) Export the configuration of the server. Exports the configuration of the server against which the step is run and stores it in specific format in indicated location. Uses Dell’s Server Configuration Profile (SCP) from sushy oem extension to get ALL configuration for cloning.	no	`export_configuration_location` (required) – URL of location to save the configuration to.
`import_configuration`	(Deprecated) Import and apply the configuration to the server. Gets pre-created configuration from storage by given location and imports that into given server. Uses Dell’s Server Configuration Profile (SCP).	no	`import_configuration_location` (required) – URL of location to fetch desired configuration from.
`import_export_configuration`	Import and export configuration in one go. Gets pre-created configuration from storage by given name and imports that into given server. After that exports the configuration of the server against which the step is run and stores it in specific format in indicated storage as configured by Ironic.	no	`export_configuration_location` (required) – URL of location to save the configuration to. `import_configuration_location` (required) – URL of location to fetch desired configuration from.
`known_good_state`	Reset iDRAC to known good state. An iDRAC is reset to a known good state by resetting it and clearing its job queue.	no
`reset_idrac`	Reset the iDRAC.	no
`reset_secure_boot_keys_to_default`	Reset secure boot keys to manufacturing defaults.	no
`set_bmc_clock`	Set the BMC clock using Redfish Manager resource.	no	`datetime_local_offset` – The local time offset from UTC. `target_datetime` (required) – The datetime to set in ISO8601 format.
`update_firmware`	Updates the firmware on the node.	no	`firmware_images` (required) – A list of firmware images to apply.

ilo cleaning steps¶
Name	Details	Priority	Stoppable	Arguments
`activate_license`	Activates iLO Advanced license.	0	no	`ilo_license_key` (required) – The HPE iLO Advanced license key to activate enterprise features.
`add_https_certificate`	Adds the signed HTTPS certificate to the iLO.	0	no	`cert_file` (required) – This argument represents the path to the signed HTTPS certificate which will be added to the iLO.
`clear_secure_boot_keys`	Clear all secure boot keys. Clears all the secure boot keys. This operation is supported only on HP Proliant Gen9 and above servers.	0	no
`create_csr`	Creates the CSR.	0	no	`csr_params` (required) – This argument represents the information needed to create the CSR certificate. The keys to be provided are City, CommonName, OrgName, State.
`reset_bios_to_default`	Resets the BIOS settings to default values. Resets BIOS to default settings. This operation is currently supported only on HP Proliant Gen9 and above servers.	10	no
`reset_ilo`	Resets the iLO.	0	no
`reset_ilo_credential`	Resets the iLO password.	30	no
`reset_secure_boot_keys_to_default`	Reset secure boot keys to manufacturing defaults. Resets the secure boot keys to manufacturing defaults. This operation is supported only on HP Proliant Gen9 and above servers.	20	no
`security_parameters_update`	Updates the security parameters.	0	no	`security_parameters` (required) – This argument represents the ordered list of JSON dictionaries of security parameters. Each security parameter consists of three fields, namely ‘param’, ‘ignore’ and ‘enable’ from which ‘param’ field will be mandatory. These fields represent security parameter name, ignore flag and state of the security parameter. The supported security parameter names are ‘password_complexity’, ‘require_login_for_ilo_rbsu’, ‘ipmi_over_lan’, ‘secure_boot’, ‘require_host_authentication’. The security parameters will be updated (in the order given) one by one on the baremetal server.
`update_auth_failure_logging_threshold`	Updates the Auth Failure Logging Threshold security parameter.	0	no	`ignore` – This argument represents boolean parameter. If set ‘True’ the security parameters will be ignored by iLO while computing the overall iLO security status. If not specified, default will be ‘False’. `logging_threshold` – This argument represents the authentication failure logging threshold that can be set for ilo. If not specified, default will be 1.
`update_firmware`	Updates the firmware.	0	no	`firmware_images` (required) – This argument represents the ordered list of JSON dictionaries of firmware images. Each firmware image dictionary consists of three mandatory fields, namely ‘url’, ‘checksum’ and ‘component’. These fields represent firmware image location URL, md5 checksum of image file and firmware component type respectively. The supported firmware URL schemes are ‘file’, ‘http’, ‘https’ and ‘swift’. The supported values for firmware component are ‘ilo’, ‘cpld’, ‘power_pic’, ‘bios’ and ‘chassis’. The firmware images will be applied (in the order given) one by one on the baremetal server. For more information, see https://docs.openstack.org/ironic/latest/admin/drivers/ilo.html#initiating-firmware-update-as-manual-clean-step `firmware_update_mode` (required) – This argument indicates the mode (or mechanism) of firmware update procedure. Supported value is ‘ilo’.
`update_firmware_sum`	Clean step to update the firmware using Smart Update Manager (SUM)	0	no	`checksum` (required) – The md5 checksum of the SPP image file. `components` – The list of firmware component filenames. If not specified, SUM updates all the firmware components. `url` (required) – The image location for SPP (Service Pack for Proliant) ISO.
`update_minimum_password_length`	Updates the Minimum Password Length security parameter.	0	no	`ignore` – This argument represents boolean parameter. If set ‘True’ the security parameters will be ignored by iLO while computing the overall iLO security status. If not specified, default will be ‘False’. `password_length` – This argument represents the minimum password length that can be set for ilo. If not specified, default will be 8.

ilo5 cleaning steps¶
Name	Details	Priority	Stoppable	Arguments
`activate_license`	Activates iLO Advanced license.	0	no	`ilo_license_key` (required) – The HPE iLO Advanced license key to activate enterprise features.
`add_https_certificate`	Adds the signed HTTPS certificate to the iLO.	0	no	`cert_file` (required) – This argument represents the path to the signed HTTPS certificate which will be added to the iLO.
`clear_ca_certificates`	Clears the certificates provided in the list of files to iLO.	0	no	`certificate_files` (required) – The list of files containing the certificates to be cleared. If empty list is specified, all the certificates on the ilo will be cleared, except the certificates in the file configured with configuration parameter ‘webserver_verify_ca’ are spared as they are required for booting the deploy image for some boot interfaces.
`clear_secure_boot_keys`	Clear all secure boot keys. Clears all the secure boot keys. This operation is supported only on HP Proliant Gen9 and above servers.	0	no
`create_csr`	Creates the CSR.	0	no	`csr_params` (required) – This argument represents the information needed to create the CSR certificate. The keys to be provided are City, CommonName, OrgName, State.
`erase_devices`	Erase all the drives on the node. This method performs out-of-band sanitize disk erase on all the supported physical drives in the node. This erase cannot be performed on logical drives.	0	no	`erase_pattern` – Dictionary of disk type and corresponding erase pattern to be used to perform specific out-of-band sanitize disk erase. Supported values are, for “hdd”: (“overwrite”, “crypto”, “zero”), for “ssd”: (“block”, “crypto”, “zero”). Default pattern is: {“hdd”: “overwrite”, “ssd”: “block”}.
`one_button_secure_erase`	Erase the whole system securely. The One-button secure erase process resets iLO and deletes all licenses stored there, resets BIOS settings, and deletes all Active Health System (AHS) and warranty data stored on the system. It also erases supported non-volatile storage data and deletes any deployment setting profiles.	0	no
`reset_bios_to_default`	Resets the BIOS settings to default values. Resets BIOS to default settings. This operation is currently supported only on HP Proliant Gen9 and above servers.	10	no
`reset_ilo`	Resets the iLO.	0	no
`reset_ilo_credential`	Resets the iLO password.	30	no
`reset_secure_boot_keys_to_default`	Reset secure boot keys to manufacturing defaults. Resets the secure boot keys to manufacturing defaults. This operation is supported only on HP Proliant Gen9 and above servers.	20	no
`security_parameters_update`	Updates the security parameters.	0	no	`security_parameters` (required) – This argument represents the ordered list of JSON dictionaries of security parameters. Each security parameter consists of three fields, namely ‘param’, ‘ignore’ and ‘enable’ from which ‘param’ field will be mandatory. These fields represent security parameter name, ignore flag and state of the security parameter. The supported security parameter names are ‘password_complexity’, ‘require_login_for_ilo_rbsu’, ‘ipmi_over_lan’, ‘secure_boot’, ‘require_host_authentication’. The security parameters will be updated (in the order given) one by one on the baremetal server.
`update_auth_failure_logging_threshold`	Updates the Auth Failure Logging Threshold security parameter.	0	no	`ignore` – This argument represents boolean parameter. If set ‘True’ the security parameters will be ignored by iLO while computing the overall iLO security status. If not specified, default will be ‘False’. `logging_threshold` – This argument represents the authentication failure logging threshold that can be set for ilo. If not specified, default will be 1.
`update_firmware`	Updates the firmware.	0	no	`firmware_images` (required) – This argument represents the ordered list of JSON dictionaries of firmware images. Each firmware image dictionary consists of three mandatory fields, namely ‘url’, ‘checksum’ and ‘component’. These fields represent firmware image location URL, md5 checksum of image file and firmware component type respectively. The supported firmware URL schemes are ‘file’, ‘http’, ‘https’ and ‘swift’. The supported values for firmware component are ‘ilo’, ‘cpld’, ‘power_pic’, ‘bios’ and ‘chassis’. The firmware images will be applied (in the order given) one by one on the baremetal server. For more information, see https://docs.openstack.org/ironic/latest/admin/drivers/ilo.html#initiating-firmware-update-as-manual-clean-step `firmware_update_mode` (required) – This argument indicates the mode (or mechanism) of firmware update procedure. Supported value is ‘ilo’.
`update_firmware_sum`	Clean step to update the firmware using Smart Update Manager (SUM)	0	no	`checksum` (required) – The md5 checksum of the SPP image file. `components` – The list of firmware component filenames. If not specified, SUM updates all the firmware components. `url` (required) – The image location for SPP (Service Pack for Proliant) ISO.
`update_minimum_password_length`	Updates the Minimum Password Length security parameter.	0	no	`ignore` – This argument represents boolean parameter. If set ‘True’ the security parameters will be ignored by iLO while computing the overall iLO security status. If not specified, default will be ‘False’. `password_length` – This argument represents the minimum password length that can be set for ilo. If not specified, default will be 8.

irmc cleaning steps¶
Name	Details	Stoppable	Arguments
`clear_secure_boot_keys`	Clear all secure boot keys.	no
`reset_secure_boot_keys_to_default`	Reset secure boot keys to manufacturing defaults.	no
`restore_irmc_bios_config`	Restore BIOS config for a node.	no
`set_bmc_clock`	Set the BMC clock using Redfish Manager resource.	no	`datetime_local_offset` – The local time offset from UTC. `target_datetime` (required) – The datetime to set in ISO8601 format.
`update_firmware`	Updates the firmware on the node.	no	`firmware_images` (required) – A list of firmware images to apply.

redfish cleaning steps¶
Name	Details	Stoppable	Arguments
`clear_secure_boot_keys`	Clear all secure boot keys.	no
`reset_secure_boot_keys_to_default`	Reset secure boot keys to manufacturing defaults.	no
`set_bmc_clock`	Set the BMC clock using Redfish Manager resource.	no	`datetime_local_offset` – The local time offset from UTC. `target_datetime` (required) – The datetime to set in ISO8601 format.
`update_firmware`	Updates the firmware on the node.	no	`firmware_images` (required) – A list of firmware images to apply.

Firmware Interface¶

redfish cleaning steps¶
Name	Details	Priority	Stoppable	Arguments
`update`	Update the Firmware on the node using the settings for components.	0	no	`settings` (required) – A list of dicts with firmware components to be updated

Bios Interface¶

idrac-redfish cleaning steps¶
Name	Details	Priority	Stoppable	Arguments
`apply_configuration`	Apply the BIOS settings to the node.	0	no	`settings` (required) – A list of BIOS settings to be applied
`factory_reset`	Reset the BIOS settings of the node to the factory default.	0	no

ilo cleaning steps¶
Name	Details	Priority	Stoppable	Arguments
`apply_configuration`	Applies the provided configuration on the node.	0	no	`settings` (required) – Dictionary with current BIOS configuration.
`factory_reset`	Reset the BIOS settings to factory configuration.	0	no

irmc cleaning steps¶
Name	Details	Priority	Stoppable	Arguments
`apply_configuration`	Applies BIOS configuration on the given node. This method takes the BIOS settings from the settings param and applies BIOS configuration on the given node. After the BIOS configuration is done, self.cache_bios_settings() may be called to sync the node’s BIOS-related information with the BIOS configuration applied on the node. It will also validate the given settings before applying any settings and manage failures when setting an invalid BIOS config. In the case of needing password to update the BIOS config, it will be taken from the driver_info properties.	0	no	`settings` (required) – Dictionary containing the BIOS configuration.

redfish cleaning steps¶
Name	Details	Priority	Stoppable	Arguments
`apply_configuration`	Apply the BIOS settings to the node.	0	no	`settings` (required) – A list of BIOS settings to be applied
`factory_reset`	Reset the BIOS settings of the node to the factory default.	0	no

Raid Interface¶

agent cleaning steps¶
Name	Details	Priority	Stoppable	Arguments
`create_configuration`	Create a RAID configuration on a bare metal using agent ramdisk. This method creates a RAID configuration on the given node.	0	no
`delete_configuration`	Deletes RAID configuration on the given node.	0	no

idrac-redfish cleaning steps¶
Name	Details	Priority	Stoppable	Arguments
`create_configuration`	Create RAID configuration on the node. This method creates the RAID configuration as read from node.target_raid_config. This method by default will create all logical disks.	0	no	`create_nonroot_volumes` – This specifies whether to create the non-root volumes. Defaults to True. `create_root_volume` – This specifies whether to create the root volume. Defaults to True. `delete_existing` – Setting this to True indicates to delete existing RAID configuration prior to creating the new configuration. Default value is False.
`delete_configuration`	Delete RAID configuration on the node.	0	no

ilo5 cleaning steps¶
Name	Details	Priority	Stoppable	Arguments
`create_configuration`	Create a RAID configuration on a bare metal using agent ramdisk. This method creates a RAID configuration on the given node.	0	no	`create_nonroot_volumes` – This specifies whether to create the non-root volumes. Defaults to True. `create_root_volume` – This specifies whether to create the root volume. Defaults to True.
`delete_configuration`	Delete the RAID configuration.	0	no

irmc cleaning steps¶
Name	Details	Priority	Stoppable	Arguments
`create_configuration`	Create the RAID configuration. This method creates the RAID configuration on the given node.	0	no	`create_nonroot_volumes` – This specifies whether to create the non-root volumes. Defaults to True. `create_root_volume` – This specifies whether to create the root volume. Defaults to True.
`delete_configuration`	Delete the RAID configuration.	0	no

redfish cleaning steps¶
Name	Details	Priority	Stoppable	Arguments
`create_configuration`	Create RAID configuration on the node. This method creates the RAID configuration as read from node.target_raid_config. This method by default will create all logical disks.	0	no	`create_nonroot_volumes` – This specifies whether to create the non-root volumes. Defaults to True. `create_root_volume` – This specifies whether to create the root volume. Defaults to True. `delete_existing` – Setting this to True indicates to delete existing RAID configuration prior to creating the new configuration. Default value is False.
`delete_configuration`	Delete RAID configuration on the node.	0	no

Manual cleaning¶

Manual cleaning is typically used to handle long-running, manual, or destructive tasks that an operator wishes to perform either before the first workload has been assigned to a node or between workloads. When initiating a manual clean, the operator specifies the cleaning steps to be performed. Manual cleaning can only be performed when a node is in the manageable state. Once the manual cleaning is finished, the node will be put in the manageable state again.

Ironic added support for manual cleaning in the 4.4 (Mitaka series) release.

Setup¶

In order for manual cleaning to work, you may need to configure a Cleaning Network.

Starting manual cleaning via API¶

Manual cleaning can only be performed when a node is in the manageable state. The REST API request to initiate it is available in API version 1.15 and higher:

PUT /v1/nodes/<node_ident>/states/provision

(Additional information is available here.)

This API will allow operators to put a node directly into cleaning provision state from manageable state via ‘target’: ‘clean’. The PUT will also require the argument ‘clean_steps’ to be specified. This is an ordered list of cleaning steps. A cleaning step is represented by a dictionary (JSON), in the form:

{
    "interface": "<interface>",
    "step": "<name of cleaning step>",
    "args": {"<arg1>": "<value1>", ..., "<argn>": "<valuen>"}
}

The ‘interface’ and ‘step’ keys are required for all steps. If a cleaning step method takes keyword arguments, the ‘args’ key may be specified. It is a dictionary of keyword variable arguments, with each keyword-argument entry being <name>: <value>.
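Before submitting a request, a client could check the structure of each step locally. The Python sketch below is one hypothetical way to do that; it only checks the keys described above and does not replicate the validation that Ironic itself performs, such as checking required keyword arguments of a specific step.

```python
# Illustrative sketch only: client-side structural check of clean steps.
def validate_clean_steps(clean_steps):
    """Return a list of human-readable problems; empty if none are found."""
    errors = []
    for index, step in enumerate(clean_steps):
        # 'interface' and 'step' are required for all steps.
        for key in ("interface", "step"):
            if key not in step:
                errors.append(f"step {index}: missing required key '{key}'")
        # 'args' is optional, but must be a dictionary when present.
        if "args" in step and not isinstance(step["args"], dict):
            errors.append(f"step {index}: 'args' must be a dictionary")
    return errors


good = [{"interface": "deploy", "step": "erase_devices"}]
bad = [{"step": "erase_devices", "args": ["not", "a", "dict"]}]
print(validate_clean_steps(good))  # []
print(validate_clean_steps(bad))
```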

If any step is missing a required keyword argument, manual cleaning will not be performed and the node will be put in clean failed provision state with an appropriate error message.

If, during the cleaning process, a cleaning step determines that it has incorrect keyword arguments, all earlier steps will be performed and then the node will be put in clean failed provision state with an appropriate error message.

An example of the request body for this API:

{
  "target":"clean",
  "clean_steps": [{
    "interface": "raid",
    "step": "create_configuration",
    "args": {"create_nonroot_volumes": false}
  },
  {
    "interface": "deploy",
    "step": "erase_devices"
  }]
}

In the above example, the node’s RAID interface would configure hardware RAID without non-root volumes, and then all devices would be erased (in that order).

Another example is setting the BMC clock using the Redfish management interface:

{
  "target": "clean",
  "clean_steps": [{
    "interface": "management",
    "step": "set_bmc_clock",
    "args": {"target_datetime": "2025-07-22T12:34:56+00:00"}
  }]
}

This step requires the node to use the redfish management interface and that the Redfish service exposes the DateTime and DateTimeLocalOffset fields under the Manager Resource.

Alternatively, you can specify a runbook instead of clean_steps:

{
  "target":"clean",
  "runbook": "<runbook_name_or_uuid>"
}

The specified runbook must match one of the node’s traits to be used.
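The two request body shapes shown above can be produced by a small helper. The Python sketch below is a hypothetical client-side convenience; the function name is invented for this example, and treating clean_steps and runbook as mutually exclusive is an assumption based on the word "instead" in the text above.

```python
import json


# Illustrative sketch only: build the JSON body for a manual cleaning request.
def build_clean_request(clean_steps=None, runbook=None):
    """Return the request body as a JSON string."""
    # Assumption: exactly one of clean_steps or runbook is supplied.
    if (clean_steps is None) == (runbook is None):
        raise ValueError("specify exactly one of clean_steps or runbook")
    body = {"target": "clean"}
    if runbook is not None:
        body["runbook"] = runbook
    else:
        body["clean_steps"] = clean_steps
    return json.dumps(body)


steps_body = build_clean_request(
    clean_steps=[{"interface": "deploy", "step": "erase_devices"}])
runbook_body = build_clean_request(runbook="CUSTOM_SECURE_ERASE")
print(steps_body)
print(runbook_body)
```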

Starting manual cleaning via “openstack baremetal” CLI¶

Manual cleaning is available via the baremetal node clean command, starting with Bare Metal API version 1.15.

The argument --clean-steps must be specified. Its value is one of:

a JSON string
path to a JSON file whose contents are passed to the API
‘-’ to read from stdin. This allows piping in the clean steps. Using ‘-’ to signify stdin is common in Unix utilities.

The following examples assume that the Bare Metal API version was set via the OS_BAREMETAL_API_VERSION environment variable. (The alternative is to add --os-baremetal-api-version 1.15 to the command.):

export OS_BAREMETAL_API_VERSION=1.15

Examples of doing this with a JSON string:

baremetal node clean <node> \
    --clean-steps '[{"interface": "deploy", "step": "erase_devices_metadata"}]'

baremetal node clean <node> \
    --clean-steps '[{"interface": "deploy", "step": "erase_devices"}]'

Or with a file:

baremetal node clean <node> \
    --clean-steps my-clean-steps.txt

Or with stdin:

cat my-clean-steps.txt | baremetal node clean <node> \
    --clean-steps -

Runbooks for Manual Cleaning¶

Instead of passing a list of clean steps, operators can use runbooks. Runbooks are curated lists of steps that can be associated with nodes via traits, which simplifies performing consistent cleaning operations across similar nodes.

To use a runbook for manual cleaning:

baremetal node clean <node> --runbook <runbook_name_or_uuid>

Runbooks must be created and associated with nodes beforehand. Only runbooks that match the node’s traits can be used for cleaning that node. For more information on the runbook API usage, see Runbooks for Cleaning & Servicing.

Cleaning Network¶

If you are using the Neutron DHCP provider (the default) you will also need to ensure you have configured a cleaning network. This network will be used to boot the ramdisk for in-band cleaning. You can use the same network as your tenant network. For steps to set up the cleaning network, please see Configure the Bare Metal service for cleaning.

In-band vs out-of-band¶

Ironic uses two main methods to perform actions on a node: in-band and out-of-band. Ironic supports using both methods to clean a node.

In-band¶

In-band steps are performed by Ironic making API calls to a ramdisk running on the node using a deploy interface. Currently, all the deploy interfaces support in-band cleaning. By default, ironic-python-agent ships with a minimal cleaning configuration, only erasing disks. However, you can add your own cleaning steps and/or override default cleaning steps with a custom Hardware Manager.

Out-of-band¶

Out-of-band steps are actions performed by your management controller, such as IPMI, iLO, or DRAC. Ironic performs out-of-band steps using a power or management interface. Which steps are performed depends on the hardware type and the hardware itself.

For Out-of-Band cleaning operations supported by iLO hardware types, refer to Node Cleaning Support.

FAQ¶

How are cleaning steps ordered?¶

For automated cleaning, cleaning steps are ordered by integer priority, where a larger integer is a higher priority. In case of a conflict between priorities across hardware interfaces, the following resolution order is used:

Power interface
Management interface
Deploy interface
BIOS interface
RAID interface

For manual cleaning, the cleaning steps should be specified in the desired order.

How do I skip a cleaning step?¶

For automated cleaning, cleaning steps with a priority of zero or None are skipped.
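The ordering and skipping rules for automated cleaning can be modeled in a few lines of Python. This is an illustrative sketch only, not Ironic's implementation: the function order_steps is hypothetical, and the priorities in the sample data are made up for the example.

```python
# Tie-break order from the text above: earlier interfaces win.
INTERFACE_ORDER = ["power", "management", "deploy", "bios", "raid"]


def order_steps(steps):
    """Order steps as described for automated cleaning (illustrative model)."""
    # A priority of 0 or None means the step is skipped.
    enabled = [s for s in steps if s["priority"]]
    # Larger priority first; equal priorities fall back to interface order.
    return sorted(
        enabled,
        key=lambda s: (-s["priority"], INTERFACE_ORDER.index(s["interface"])),
    )


# Priorities below are made up for the example.
steps = [
    {"interface": "deploy", "step": "erase_devices", "priority": 10},
    {"interface": "raid", "step": "delete_configuration", "priority": 10},
    {"interface": "management", "step": "reset_bios_to_default", "priority": 20},
    {"interface": "deploy", "step": "erase_devices_metadata", "priority": 0},
]
ordered = [s["step"] for s in order_steps(steps)]
print(ordered)
```

In this sample, the management step runs first because it has the largest priority, the deploy step precedes the RAID step because they tie, and the step with priority zero is dropped.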

How do I change the priority of a cleaning step?¶

For manual cleaning, or runbook-based cleaning, specify the cleaning steps in the desired order.

For automated cleaning, it depends on whether the cleaning steps are out-of-band or in-band.

Most out-of-band cleaning steps have an explicit configuration option for priority.

Changing the priority of an in-band (ironic-python-agent) cleaning step requires use of conductor.clean_step_priority_override, a configuration option that allows specifying the priority of each step using multiple configuration values:

[conductor]
clean_step_priority_override=deploy.erase_devices_metadata:123
clean_step_priority_override=management.reset_bios_to_default:234
clean_step_priority_override=management.clean_priority_reset_ilo:345

This parameter can be specified as many times as required to define priorities for several cleaning steps; the values will be combined.
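To make the value format concrete, the sketch below combines several interface.step:priority values into a single mapping. It is a hypothetical illustration of the format shown above, not Ironic's own option parser; the function name parse_priority_overrides is an assumption.

```python
def parse_priority_overrides(values):
    """Combine 'interface.step:priority' strings into one mapping.

    Illustrative only; this is not Ironic's own option parser.
    """
    overrides = {}
    for value in values:
        # Split off the priority, then separate interface from step name.
        name, _, priority = value.rpartition(":")
        interface, _, step = name.partition(".")
        overrides[(interface, step)] = int(priority)
    return overrides


overrides = parse_priority_overrides([
    "deploy.erase_devices_metadata:123",
    "management.reset_bios_to_default:234",
    "management.clean_priority_reset_ilo:345",
])
print(overrides[("deploy", "erase_devices_metadata")])
```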

What cleaning step is running?¶

To check which cleaning step the node is performing, or which step it attempted and failed, run the following command; it returns the value of the node’s driver_internal_info field:

baremetal node show $node_ident -f value -c driver_internal_info

The clean_steps field contains a list of all remaining steps with their priorities. The first one listed is the step currently in progress or, if the node has entered the clean failed state, the step that failed.
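If the driver_internal_info value has been loaded into a Python dictionary, picking out the relevant step is a matter of reading the first entry of clean_steps. The sketch below assumes that shape; the helper current_clean_step is hypothetical and the sample data is made up for illustration.

```python
def current_clean_step(driver_internal_info):
    """Return the first entry of clean_steps, or None if absent or empty."""
    steps = driver_internal_info.get("clean_steps") or []
    return steps[0] if steps else None


# Made-up sample data for illustration; real output contains more keys.
sample = {
    "clean_steps": [
        {"interface": "deploy", "step": "erase_devices_metadata", "priority": 99},
        {"interface": "deploy", "step": "erase_devices", "priority": 10},
    ]
}
step = current_clean_step(sample)
print(step["interface"], step["step"])
```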

Should I disable automated cleaning?¶

Automated cleaning is recommended for Ironic deployments; however, there are some tradeoffs to having it enabled. For instance, Ironic cannot deploy a new instance to a node that is currently cleaning, and cleaning can be a time-consuming process. To mitigate this, we suggest using NVMe drives with support for NVMe Secure Erase (based on the nvme-cli format command) or ATA drives with support for cryptographic ATA Security Erase, because the erase_devices step in the deploy interface typically takes the longest of all cleaning steps.

Why can’t I power on/off a node while it’s cleaning?¶

During cleaning, nodes may be performing actions that shouldn’t be interrupted, such as BIOS or firmware updates. As a result, operators are forbidden from changing the power state via the Ironic API while a node is cleaning.

Advanced topics¶

Parent Nodes¶

A node can be configured with a parent_node, which establishes a parent/child relationship between nodes. This allows actions on the parent to take child nodes into account in some cases, most notably when executing clean steps.

In this context, a child node is primarily intended to be an embedded device with its own management controller, for example SmartNICs or Data Processing Units (DPUs), which may have their own management controller and power control.

The relationship between a parent node and a child node is established on the child node. Example:

baremetal node set --parent-node <parent_node_uuid> <child_node_uuid>

Child Node Clean Step Execution¶

You can execute steps that perform actions on child nodes: for example, turning them on (via step power_on), turning them off (via step power_off), or signaling a BMC-controlled reboot (via step reboot).

For example, if you need to explicitly power off a child node before performing another step, you can express it with steps such as:

[{
  "interface": "power",
  "step": "power_off",
  "execute_on_child_nodes": true,
  "limit_child_node_execution": ["f96c8601-0a62-4e99-97d6-1e0d8daf6dce"]
},
{
  "interface": "deploy",
  "step": "erase_devices"
}]

The first step powers off a single child node, because limit_child_node_execution restricts execution to one known node; that child node’s power is turned off through the power interface named in the step. Afterwards, the erase_devices step is executed on the parent node.
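When the step list is built in Python, booleans are written as True, but the payload sent to the API is JSON, where the literal is true and strings use double quotes. A minimal sketch of generating the steps above with the standard json module, so that the serialization is handled automatically:

```python
import json

steps = [
    {
        "interface": "power",
        "step": "power_off",
        "execute_on_child_nodes": True,  # Python boolean, serialized as JSON true
        "limit_child_node_execution": ["f96c8601-0a62-4e99-97d6-1e0d8daf6dce"],
    },
    {"interface": "deploy", "step": "erase_devices"},
]

# json.dumps emits valid JSON: lowercase true and double-quoted strings.
payload = json.dumps(steps, indent=2)
print(payload)
```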

Note

While the deployment step framework also supports the execute_on_child_nodes and limit_child_node_execution parameters, all of the step frameworks have a fundamental limitation in that child node step execution is intended for synchronous actions which do not rely upon the ironic-python-agent running on any child nodes. This constraint may be changed in the future.

Power Management with Child Nodes¶

The mix of child nodes and parent nodes has special power considerations, and these devices are still evolving in the industry. The Ironic project has taken the approach of explicitly attempting to “power on” the parent node whenever a request comes in to “power on” a child node. This can be bypassed by setting the driver_info parameter has_dedicated_power_supply to True, in recognition that some hardware vendors are working on supplying independent power to these classes of devices to meet their customer use cases.

Similarly to the case of a “power on” request for a child node, when power is requested to be turned off for a “parent node”, Ironic will issue “power off” commands for all child nodes unless the child node has the has_dedicated_power_supply option set in the node’s driver_info field.
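The two rules above can be summarized as a toy model. The sketch below is not Ironic code: the functions nodes_powered_on and nodes_powered_off, the node names, and the dictionary layout are all hypothetical, and sets are used because the model says nothing about the order in which power commands are issued.

```python
def nodes_powered_on(child, parent):
    """Nodes that receive 'power on' for a child request (toy model)."""
    if child["driver_info"].get("has_dedicated_power_supply"):
        return {child["name"]}
    # Without dedicated power, the parent is powered on as well.
    return {parent["name"], child["name"]}


def nodes_powered_off(parent, children):
    """Nodes that receive 'power off' for a parent request (toy model)."""
    dependent = {c["name"] for c in children
                 if not c["driver_info"].get("has_dedicated_power_supply")}
    return dependent | {parent["name"]}


# Hypothetical nodes for illustration.
host = {"name": "host-1", "driver_info": {}}
dpu_a = {"name": "dpu-a", "driver_info": {}}
dpu_b = {"name": "dpu-b", "driver_info": {"has_dedicated_power_supply": True}}

print(sorted(nodes_powered_on(dpu_a, host)))
print(sorted(nodes_powered_on(dpu_b, host)))
print(sorted(nodes_powered_off(host, [dpu_a, dpu_b])))
```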

Troubleshooting¶

If cleaning fails on a node, the node will be put into clean failed state. If the failure happens while running a clean step, the node is also placed in maintenance mode to prevent Ironic from taking actions on the node. The operator should validate that no permanent damage has been done to the node and that no processes are still running on it before removing the maintenance mode.

Note

Older versions of Ironic may put the node into maintenance mode even when no clean step has been running.

Nodes in clean failed will not be powered off, as the node might be in a state such that powering it off could damage the node or remove useful information about the nature of the cleaning failure.

A clean failed node can be moved to manageable state, where it cannot be scheduled by Nova and you can safely attempt to fix the node. To move a node from clean failed to manageable:

baremetal node manage $node_ident

You can now take actions on the node, such as replacing a bad disk drive.

Strategies for determining why a cleaning step failed include checking the Ironic conductor logs, viewing logs on the still-running ironic-python-agent (if an in-band step failed), or performing general hardware troubleshooting on the node.

When the node is repaired, you can move the node back to available state, to allow it to be scheduled by Nova.

# First, move it out of maintenance mode
baremetal node maintenance unset $node_ident

# Now, make the node available for scheduling by Nova
baremetal node provide $node_ident

The node will begin automated cleaning from the start, and move to available state when complete.
