All Statsd Metrics
account-auditor Metrics
Metric Name |
Description |
|
Count of audit runs (across all account databases) which caught an Exception. |
|
Count of individual account databases which passed audit. |
|
Count of individual account databases which failed audit. |
|
Timing data for individual account database audits. |
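For orientation, the "count" and "timing data" descriptions used throughout these tables correspond to StatsD's counter and timer types. The sketch below shows the plain-text lines a StatsD client conventionally sends for each type. The metric names in it are hypothetical placeholders, not names taken from the tables, and the helper functions are illustrative only.

```python
def format_counter(name, value=1):
    """Render a StatsD counter line: '<name>:<value>|c'."""
    return f"{name}:{value}|c"


def format_timing(name, milliseconds):
    """Render a StatsD timer line: '<name>:<milliseconds>|ms'."""
    return f"{name}:{milliseconds}|ms"


# Hypothetical metric names, used only to show the line format.
print(format_counter("example-daemon.errors"))
print(format_timing("example-daemon.timing", 12.5))
```

A StatsD server aggregates counter lines into rates and totals, and timer lines into statistics such as mean and upper percentiles, which is why the two kinds of metric are listed separately in each table.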
account-reaper Metrics
Metric Name |
Description |
|
Count of devices failing the mount check. |
|
Timing data for each reap_account() call. |
|
Count of HTTP return codes from various operations (e.g. object listing, container deletion, etc.). The value for X is the first digit of the return code (2 for 201, 4 for 404, etc.). |
|
Count of failures to delete a container. |
|
Count of containers successfully deleted. |
|
Count of containers which failed to delete with zero successes. |
|
Count of containers which failed to delete with at least one success. |
|
Count of failures to delete an object. |
|
Count of objects successfully deleted. |
|
Count of objects which failed to delete with zero successes. |
|
Count of objects which failed to delete with at least one success. |
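The return-code metric in the table above replaces X with the first digit of the HTTP status code. A minimal sketch of that mapping follows; the function name is hypothetical and is not taken from the Swift source.

```python
def status_class_digit(status_code):
    """Return the first digit of a three-digit HTTP status code.

    For example, 201 maps to 2 and 404 maps to 4.
    """
    if not 100 <= status_code <= 599:
        raise ValueError(f"not a valid HTTP status code: {status_code}")
    return status_code // 100


print(status_class_digit(201))
print(status_class_digit(404))
```

Grouping by the first digit keeps the number of distinct metric names small while still separating successes (2xx) from client errors (4xx) and server errors (5xx).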
account-server Metrics
Note
“Not Found” is not considered an error and requests which increment errors are not included in the timing data.
Metric Name |
Description |
|
Timing data for each DELETE request resulting in an error: bad request, not mounted, missing timestamp. |
|
Timing data for each DELETE request not resulting in an error. |
|
Timing data for each PUT request resulting in an error: bad request, not mounted, conflict, recently-deleted. |
|
Timing data for each PUT request not resulting in an error. |
|
Timing data for each HEAD request resulting in an error: bad request, not mounted. |
|
Timing data for each HEAD request not resulting in an error. |
|
Timing data for each GET request resulting in an error: bad request, not mounted, bad delimiter, account listing limit too high, bad accept header. |
|
Timing data for each GET request not resulting in an error. |
|
Timing data for each REPLICATE request resulting in an error: bad request, not mounted. |
|
Timing data for each REPLICATE request not resulting in an error. |
|
Timing data for each POST request resulting in an error: bad request, bad or missing timestamp, not mounted. |
|
Timing data for each POST request not resulting in an error. |
account-replicator Metrics
Metric Name |
Description |
|
Count of syncs handled by sending differing rows. |
|
Count of “diffs” operations which failed because “max_diffs” was hit. |
|
Count of accounts found to be in sync. |
|
Count of accounts found to be in sync via hash
comparison ( |
|
Count of completely missing accounts which were sent via rsync. |
|
Count of syncs handled by sending entire database via rsync. |
|
Count of database replication attempts. |
|
Count of database replication attempts which failed due to corruption (quarantined) or inability to read as well as attempts to individual nodes which failed. |
|
Count of databases on <device> deleted because the delete_timestamp was greater than the put_timestamp and the database had no rows or because it was successfully sync’ed to other locations and doesn’t belong here anymore. |
|
Count of replication attempts to an individual node which were successful. |
|
Timing data for each database replication attempt not resulting in a failure. |
container-auditor Metrics
Metric Name |
Description |
|
Incremented when an Exception is caught in an audit pass (only once per pass, max). |
|
Count of individual containers passing an audit. |
|
Count of individual containers failing an audit. |
|
Timing data for each container audit. |
container-replicator Metrics
Metric Name |
Description |
|
Count of syncs handled by sending differing rows. |
|
Count of “diffs” operations which failed because “max_diffs” was hit. |
|
Count of containers found to be in sync. |
|
Count of containers found to be in sync via hash
comparison ( |
|
Count of completely missing containers which were sent via rsync. |
|
Count of syncs handled by sending entire database via rsync. |
|
Count of database replication attempts. |
|
Count of database replication attempts which failed due to corruption (quarantined) or inability to read as well as attempts to individual nodes which failed. |
|
Count of databases deleted on <device> because the delete_timestamp was greater than the put_timestamp and the database had no rows or because it was successfully sync’ed to other locations and doesn’t belong here anymore. |
|
Count of replication attempts to an individual node which were successful. |
|
Timing data for each database replication attempt not resulting in a failure. |
container-server Metrics
Note
“Not Found” is not considered an error and requests which increment errors are not included in the timing data.
Metric Name |
Description |
|
Timing data for DELETE request errors: bad request, not mounted, missing timestamp, conflict. |
|
Timing data for each DELETE request not resulting in an error. |
|
Timing data for PUT request errors: bad request, missing timestamp, not mounted, conflict. |
|
Timing data for each PUT request not resulting in an error. |
|
Timing data for HEAD request errors: bad request, not mounted. |
|
Timing data for each HEAD request not resulting in an error. |
|
Timing data for GET request errors: bad request, not mounted, parameters not utf8, bad accept header. |
|
Timing data for each GET request not resulting in an error. |
|
Timing data for REPLICATE request errors: bad request, not mounted. |
|
Timing data for each REPLICATE request not resulting in an error. |
|
Timing data for POST request errors: bad request, bad x-container-sync-to, not mounted. |
|
Timing data for each POST request not resulting in an error. |
container-sync Metrics
Metric Name |
Description |
|
Count of containers skipped because they don’t have sync’ing enabled. |
|
Count of failures while sync’ing individual containers. |
|
Count of individual containers sync’ed successfully. |
|
Count of container database rows sync’ed by deletion. |
|
Timing data for each container database row synchronization via deletion. |
|
Count of container database rows sync’ed by Putting. |
|
Timing data for each container database row synchronization via Putting. |
container-updater Metrics
Metric Name |
Description |
|
Count of containers which successfully updated their account. |
|
Count of containers which failed to update their account. |
|
Count of containers which didn’t need to update their account. |
|
Timing data for processing a container; only includes timing for containers which needed to update their accounts (i.e. “successes” and “failures” but not “no_changes”). |
object-auditor Metrics
Metric Name |
Description |
|
Count of objects failing audit and quarantined. |
|
Count of errors encountered while auditing objects. |
|
Timing data for each object audit (does not include any rate-limiting sleep time for max_files_per_second, but does include rate-limiting sleep time for max_bytes_per_second). |
object-expirer Metrics
Metric Name |
Description |
|
Count of objects expired. |
|
Count of errors encountered while attempting to expire an object. |
|
Timing data for each object expiration attempt, including ones resulting in an error. |
object-reconstructor Metrics
Metric Name |
Description |
|
A count of partitions on <device> which were reconstructed and synced to another node because they didn’t belong on this node. This metric is tracked per-device to allow for “quiescence detection” for object reconstruction activity on each device. |
|
Timing data for partitions reconstructed and synced to another node because they didn’t belong on this node. This metric is not tracked per device. |
|
A count of partitions on <device> which were reconstructed and synced to another node, but also belong on this node. As with delete.count, this metric is tracked per-device. |
|
Timing data for partitions reconstructed which also belong on this node. This metric is not tracked per-device. |
|
Count of suffix directories whose hash (of filenames) was recalculated. |
|
Count of suffix directories reconstructed with ssync. |
object-replicator Metrics
Metric Name |
Description |
|
A count of partitions on <device> which were replicated to another node because they didn’t belong on this node. This metric is tracked per-device to allow for “quiescence detection” for object replication activity on each device. |
|
Timing data for partitions replicated to another node because they didn’t belong on this node. This metric is not tracked per device. |
|
A count of partitions on <device> which were replicated to another node, but also belong on this node. As with delete.count, this metric is tracked per-device. |
|
Timing data for partitions replicated which also belong on this node. This metric is not tracked per-device. |
|
Count of suffix directories whose hash (of filenames) was recalculated. |
|
Count of suffix directories replicated with rsync. |
object-server Metrics
Metric Name |
Description |
|
Count of objects (files) found bad and moved to quarantine. |
|
Count of container updates saved as async_pendings (may result from PUT or DELETE requests). |
|
Timing data for POST request errors: bad request, missing timestamp, delete-at in past, not mounted. |
|
Timing data for each POST request not resulting in an error. |
|
Timing data for PUT request errors: bad request, not mounted, missing timestamp, object creation constraint violation, delete-at in past. |
|
Count of object PUTs which exceeded max_upload_time. |
|
Timing data for each PUT request not resulting in an error. |
|
Timing data per kB transferred (ms/kB) for each non-zero-byte PUT request on each device. Useful for monitoring problematic devices; higher values are worse. |
|
Timing data for GET request errors: bad request, not mounted, header timestamps before the epoch, precondition failed. File errors resulting in a quarantine are not counted here. |
|
Timing data for each GET request not resulting in an error. Includes requests which couldn’t find the object (including disk errors resulting in file quarantine). |
|
Timing data for HEAD request errors: bad request, not mounted. |
|
Timing data for each HEAD request not resulting in an error. Includes requests which couldn’t find the object (including disk errors resulting in file quarantine). |
|
Timing data for DELETE request errors: bad request, missing timestamp, not mounted, precondition failed. Includes requests which couldn’t find or match the object. |
|
Timing data for each DELETE request not resulting in an error. |
|
Timing data for REPLICATE request errors: bad request, not mounted. |
|
Timing data for each REPLICATE request not resulting in an error. |
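The per-device PUT metric in the table above is expressed in milliseconds per kB transferred. The sketch below shows one way such a value could be derived from an elapsed time and a byte count. It assumes 1 kB = 1000 bytes, which the table does not state, and the function name is hypothetical.

```python
def ms_per_kb(elapsed_seconds, n_bytes):
    """Return transfer cost in milliseconds per kB (assuming 1 kB = 1000 bytes).

    Zero-byte transfers are rejected, mirroring the table's restriction
    to non-zero-byte PUT requests.
    """
    if n_bytes <= 0:
        raise ValueError("n_bytes must be positive")
    elapsed_ms = elapsed_seconds * 1000
    kilobytes = n_bytes / 1000
    return elapsed_ms / kilobytes


# A 2000-byte upload that took half a second.
print(ms_per_kb(0.5, 2000))
```

Because the value is normalized by size, a device that consistently reports higher numbers than its peers is slow regardless of how large the uploaded objects were.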
object-updater Metrics
Metric Name |
Description |
|
Count of drives not mounted or async_pending files with an unexpected name. |
|
Timing data for object sweeps to flush async_pending container updates. Does not include object sweeps which did not find an existing async_pending storage directory. |
|
Count of async_pending container updates which were corrupted and moved to quarantine. |
|
Count of successful container updates. |
|
Count of failed container updates. |
|
Count of async_pending files unlinked. An async_pending file is unlinked either when it is successfully processed or when the replicator sees that there is a newer async_pending file for the same object. |
proxy-server Metrics
In the table, <type> is the proxy-server controller responsible for the request and will be one of account, container, or object.
Metric Name |
Description |
|
Count of errors encountered while serving requests before the controller type is determined. Includes invalid Content-Length, errors finding the internal controller to handle the request, invalid utf8, and bad URLs. |
|
Count of node hand-offs; only tracked if log_handoffs is set in the proxy-server config. |
|
Count of times only hand-off locations were utilized; only tracked if log_handoffs is set in the proxy-server config. |
|
Count of client timeouts (client did not read within
|
|
Count of detected client disconnects during PUT operations (does NOT include caught Exceptions in the proxy-server which caused a client disconnect). |
Additionally, middleware often emits its own metrics.
proxy-logging Middleware
In the table, <type> is either the proxy-server controller responsible for the request (account, container, or object) or the string SOS if the request came from the Swift Origin Server middleware. The <verb> portion will be one of GET, HEAD, POST, PUT, DELETE, COPY, OPTIONS, or BAD_METHOD. The list of valid HTTP methods is configurable via the log_statsd_valid_http_methods config variable, and the default setting yields the above behavior.
Metric Name |
Description |
|
Timing data for requests, start to finish. The <status> portion is the numeric HTTP status code for the request (e.g. “200” or “404”). |
|
Timing data up to completion of sending the response headers (only for GET requests). <status> and <type> are as for the main timing metric. |
|
This counter metric is the sum of bytes transferred in (from clients) and out (to clients) for requests. The <type>, <verb>, and <status> portions of the metric are just like the main timing metric. |
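The <verb> portion described for these metrics can be illustrated with a short sketch: a request method that appears in the configured list is used as-is, and anything else is reported as BAD_METHOD. The function and constant names are hypothetical, and upper-casing the method before comparison is an assumption made here for illustration.

```python
# Default list of valid methods, as given in the text above.
DEFAULT_VALID_METHODS = ("GET", "HEAD", "POST", "PUT", "DELETE", "COPY", "OPTIONS")


def verb_for_metric(method, valid_methods=DEFAULT_VALID_METHODS):
    """Return the <verb> to use in a metric name for the given HTTP method."""
    method = method.upper()  # assumption: comparison is case-insensitive
    return method if method in valid_methods else "BAD_METHOD"


print(verb_for_metric("GET"))
print(verb_for_metric("PATCH"))
```

Collapsing unknown methods into a single BAD_METHOD value keeps arbitrary client-supplied strings from creating an unbounded number of metric names.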
The proxy-logging middleware also groups these metrics by policy. The <policy-index> portion represents a policy index:
Metric Name |
Description |
|
Timing data for requests, aggregated by policy index. |
|
Timing data up to completion of sending the response headers, aggregated by policy index. |
|
Sum of bytes transferred in and out, aggregated by policy index. |
tempauth Middleware
In the table, <reseller_prefix> represents the actual configured reseller_prefix or NONE if the reseller_prefix is the empty string:
Metric Name |
Description |
|
Count of regular requests which were denied with HTTPUnauthorized. |
|
Count of regular requests which were denied with HTTPForbidden. |
|
Count of token requests which were denied. |
|
Count of errors. |
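The <reseller_prefix> substitution described for the tempauth metrics can be sketched as follows. The function name is hypothetical, and the sketch covers only the rule stated above: the configured value is used unless it is the empty string.

```python
def reseller_prefix_for_metric(reseller_prefix):
    """Return the value substituted for <reseller_prefix> in a metric name.

    An empty reseller_prefix is reported as the literal string 'NONE'.
    """
    return reseller_prefix if reseller_prefix else "NONE"


print(reseller_prefix_for_metric("AUTH_"))
print(reseller_prefix_for_metric(""))
```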