Workload Balance Migration Strategy

Synopsis

display name: Workload Balance Migration Strategy

goal: workload_balancing

Workload balance using live migration

Description

This migration strategy is based on the VM workload of physical servers. It generates a solution that moves a workload whenever a server’s CPU or RAM utilization percentage exceeds the specified threshold. The threshold both triggers a migration and determines whether there is an available host with low enough utilization to receive the instance. The VM selected for migration should bring the host’s workload close to the average workload of all compute nodes.
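The selection logic described above can be sketched roughly as follows. This is a minimal illustration, not Watcher's actual implementation: the function name `plan_single_migration`, the input layout (each instance's contribution to host utilization, in percent) and the tie-breaking rules are all assumptions made for the example.

```python
from statistics import mean


def plan_single_migration(hosts, threshold):
    """Pick one (instance, source, destination) move, or None.

    hosts: {host_name: {instance_name: utilization_percent}} (hypothetical layout)
    threshold: utilization percentage that triggers a migration
    """
    # Host utilization is approximated as the sum of its instances' usage.
    util = {host: sum(vms.values()) for host, vms in hosts.items()}
    average = mean(util.values())

    overloaded = [host for host in util if util[host] > threshold]
    if not overloaded:
        return None
    source = max(overloaded, key=util.get)

    # Choose the instance whose removal brings the source closest to the average.
    excess = util[source] - average
    instance = min(hosts[source], key=lambda vm: abs(hosts[source][vm] - excess))
    load = hosts[source][instance]

    # The same threshold decides whether a destination has enough headroom.
    candidates = [h for h in util if h != source and util[h] + load < threshold]
    if not candidates:
        return None
    return instance, source, min(candidates, key=util.get)
```

For example, with `node1` at 35% and `node2` at 5% and a threshold of 25, the sketch proposes moving the 15% instance from `node1` to `node2`, which leaves both hosts at the 20% average.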

Requirements

  • Hardware: compute nodes should use the same physical CPUs and RAM.

  • Software: the Ceilometer component ceilometer-agent-compute must be running on each compute node, and the Ceilometer API must be able to report the “instance_cpu_usage” and “instance_ram_usage” telemetry successfully.

  • You must have at least 2 physical compute nodes to run this strategy.

Limitations

  • We cannot forecast how many servers should be migrated, so the strategy plans only a single virtual machine migration at a time. It is therefore best used with CONTINUOUS audits.

  • It assumes that live migrations are possible.

Metrics

The workload_balance strategy requires the following metrics:

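================  ============  =======  ==========  ====================================================
metric            service name  plugins  unit        comment
================  ============  =======  ==========  ====================================================
cpu               ceilometer    none     percentage  CPU of the instance. Used to calculate the threshold
memory.resident   ceilometer    none     MB          RAM of the instance. Used to calculate the threshold
================  ============  =======  ==========  ====================================================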

Note

  • The parameters above reference the instance CPU or RAM usage, but the threshold calculation is based on the CPU/RAM usage of the hypervisor.

  • The RAM usage can be calculated based on the RAM consumed by the instance, and the available RAM on the hypervisor.

  • The CPU percentage calculation relies on the CPU load, but also on the number of CPUs on the hypervisor.

  • The host memory metric is calculated by summing the RAM usage of each instance on the host. This measure is close to the real usage, but is not the exact usage on the host.
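The notes above can be illustrated with a small sketch of how per-instance measurements might be turned into host-level percentages. The exact formulas used by Watcher are not given here, so both functions, their names and their input formats are assumptions for illustration only.

```python
def host_ram_percent(instance_ram_mb, host_ram_mb):
    """Approximate host RAM usage as the sum of instance RAM over host RAM.

    As noted above, this is close to, but not exactly, the real host usage.
    """
    return 100.0 * sum(instance_ram_mb) / host_ram_mb


def host_cpu_percent(instances, host_cpus):
    """Approximate host CPU usage from per-instance CPU load.

    instances: list of (cpu_percent, vcpus) pairs (hypothetical layout)
    host_cpus: number of CPUs on the hypervisor
    """
    # Convert each instance's load into "busy cores", then scale by host size.
    busy_cores = sum(percent / 100.0 * vcpus for percent, vcpus in instances)
    return 100.0 * busy_cores / host_cpus
```

Under these assumptions, two instances using 2048 MB and 1024 MB on an 8192 MB host give a RAM usage of 37.5%, which is the value compared against the threshold.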

Cluster data model

Default Watcher’s Compute cluster data model:

Nova cluster data model collector

The Nova cluster data model collector creates an in-memory representation of the resources exposed by the compute service.

Actions

Default Watcher’s actions:

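=========  ====================================================
action     description
=========  ====================================================
migration  Migrates a server to a destination nova-compute host
=========  ====================================================

This action will allow you to migrate a server to another compute destination host. Migration type ‘live’ can only be used for migrating active VMs. Migration type ‘cold’ can be used for migrating non-active VMs as well as active VMs, which will be shut down while migrating.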

The action schema is:

schema = Schema({
 'resource_id': str,  # should be a UUID
 'migration_type': str,  # choices -> "live", "cold"
 'destination_node': str,
 'source_node': str,
})

The resource_id is the UUID of the server to migrate. The source_node and destination_node parameters are respectively the source and the destination compute hostname.

Note

Nova API version must be 2.56 or above if destination_node parameter is given.
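As an illustration of the schema above, the following sketch checks a set of input parameters for a migration action using only the Python standard library. The helper `check_migration_params` and the sample values are hypothetical; Watcher performs its own validation.

```python
import uuid

MIGRATION_TYPES = ("live", "cold")


def check_migration_params(params):
    """Validate a dict shaped like the migration action schema (hypothetical helper)."""
    # resource_id should be a UUID; uuid.UUID raises ValueError otherwise.
    uuid.UUID(params["resource_id"])
    if params["migration_type"] not in MIGRATION_TYPES:
        raise ValueError("migration_type must be 'live' or 'cold'")
    # Both node parameters are compute hostnames, given as strings.
    for key in ("source_node", "destination_node"):
        if not isinstance(params[key], str):
            raise TypeError(f"{key} must be a string")
    return params


# Example values are made up for illustration.
example = check_migration_params({
    "resource_id": "9d0a7b3e-5c1f-4e2a-8b6d-3f4c2a1e0b9d",
    "migration_type": "live",
    "source_node": "compute-1",
    "destination_node": "compute-2",
})
```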

Planner

Default Watcher’s planner:

Weight planner implementation

This implementation builds actions with parents in accordance with weights. Sets of actions with a higher weight are scheduled before the others. There are two config options to configure: action_weights and parallelization.

Limitations

  • This planner requires the action_weights and parallelization config options to be well tuned.
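To show how the two options interact, here is a simplified sketch of weight-based ordering. It is not the planner's real code: it assumes that action_weights maps an action type to an integer weight, that parallelization maps an action type to the number of actions allowed to run together, and the function name `schedule_batches` is made up.

```python
def schedule_batches(actions, action_weights, parallelization):
    """Order actions into batches, heaviest action types first.

    actions: list of (action_type, resource_id) pairs (hypothetical layout)
    """
    # Group resources by action type, keeping their original order.
    by_type = {}
    for action_type, resource_id in actions:
        by_type.setdefault(action_type, []).append(resource_id)

    batches = []
    # Higher weight first; each batch would depend on the one before it.
    for action_type in sorted(by_type, key=lambda t: action_weights[t], reverse=True):
        limit = parallelization[action_type]
        resources = by_type[action_type]
        for start in range(0, len(resources), limit):
            batches.append((action_type, resources[start:start + limit]))
    return batches
```

With a weight of 3 and a parallelization of 2 for migrations, three migrations would be split into a batch of two followed by a batch of one, both scheduled before any lower-weight actions.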

Configuration

Strategy parameters are:

===========  ======  ==================  ====================================================
parameter    type    default value       description
===========  ======  ==================  ====================================================
metrics      String  instance_cpu_usage  Workload balance based on CPU or RAM utilization.
                                         Choices: [‘instance_cpu_usage’,
                                         ‘instance_ram_usage’]
threshold    Number  25.0                Workload threshold for migration. Used for both the
                                         source and the destination calculations. Threshold
                                         is always a percentage.
period       Number  300                 Aggregate time period of ceilometer
granularity  Number  300                 The time between two measures in an aggregated
                                         timeseries of a metric. This parameter is only used
                                         with the Gnocchi data source, and it must match one
                                         of the valid archive policies for the metric.
===========  ======  ==================  ====================================================
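The defaults listed above can be combined with user-supplied values as in the sketch below. The helper `resolve_parameters` is hypothetical, and the 0 to 100 range check on threshold is an assumption derived from the statement that the threshold is always a percentage.

```python
DEFAULTS = {
    "metrics": "instance_cpu_usage",
    "threshold": 25.0,
    "period": 300,
    "granularity": 300,
}
METRIC_CHOICES = ("instance_cpu_usage", "instance_ram_usage")


def resolve_parameters(overrides=None):
    """Merge overrides into the documented defaults (hypothetical helper)."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}
    if params["metrics"] not in METRIC_CHOICES:
        raise ValueError("metrics must be one of the documented choices")
    # Assumed range: the threshold is described as a percentage.
    if not 0 <= params["threshold"] <= 100:
        raise ValueError("threshold must be a percentage between 0 and 100")
    return params
```

Calling `resolve_parameters({"threshold": 26.0, "period": 310})` keeps the default metrics and granularity while overriding the other two values.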

Efficacy Indicator

None

Algorithm

For more information on the Workload Balance Migration Strategy please refer to: https://specs.openstack.org/openstack/watcher-specs/specs/mitaka/implemented/workload-balance-migration-strategy.html

How to use it?

Create an audit template using the Workload Balancing strategy.

$ openstack optimize audittemplate create \
  at1 workload_balancing --strategy workload_balance

Run an audit using the Workload Balance strategy. The result of the audit should be an action plan to move VMs from any host where CPU usage exceeds the 26% threshold to a host where CPU utilization is below the threshold. The CPU utilization measurements are taken from the configured datasource plugin with an aggregate period of 310.

$ openstack optimize audit create -a at1 -p threshold=26.0 \
        -p period=310 -p metrics=instance_cpu_usage

Run an audit using the Workload Balance strategy to obtain a plan to balance VMs over hosts with a threshold of 20%. In this case, the CPU utilization measurements are selected by a combination of period and granularity.

$ openstack optimize audit create -a at1 \
       -p granularity=30 -p threshold=20 -p period=300 \
       -p metrics=instance_cpu_usage --auto-trigger