# How to Use the atsd Package to Communicate with Axibase Time-Series Database

## 1. Package Overview

The package allows you to query time-series data and statistics from Axibase Time-Series Database (ATSD) and to save time-series data in ATSD. The package provides the following functions:

• set_connection(), save_connection(), show_connection() – manage the connection with ATSD: set up and store the url, user name, and password, configure the cryptographic protocol, and enforce SSL certificate validation for https connections.
• query() – get historical data and forecasts from ATSD.
• get_metrics() – get information about the metrics collected by ATSD.
• get_entities() – get information about the entities collected by ATSD.
• get_series_tags() – get unique series tags for the metric.
• save_series() – save time series into ATSD.
• to_zoo() – convert a time-series data frame to a ‘zoo’ object for manipulating irregular time series with the built-in functions of the zoo package.

## 2. Connecting to ATSD

Execute library(atsd) to start working with the atsd package. The connection parameters are loaded from the package configuration file, atsd/connection.config, which is located in the atsd package folder. The following command shows where the atsd package folder is:

```r
installed.packages()["atsd", "LibPath"]
```

Open the configuration file in a text editor and modify it. It should look as follows:

```
# the url of ATSD including port number
url=http://host_name:port_number

# the user name
user=atsd_user_name

# validate ATSD SSL certificate: yes, no
verify=no

# cryptographic protocol used by ATSD https server:
# default, ssl2, ssl3, tls1
encryption=ssl3
```

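Reload the modified connection parameters from the configuration file:

```r
set_connection()
```

Check that the parameters are correct:

```r
show_connection()
```

Refer to the descriptions of show_connection(), set_connection(), and save_connection() at the end of this document for more options on managing ATSD connection parameters.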
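The key=value layout of the configuration file can be inspected from R before reloading it. The following is a minimal sketch in base R only: the helper name read_connection_config() is hypothetical and not part of the atsd package, and the sample file is written to a temporary path rather than to the real package folder.

```r
# Hypothetical helper (not part of the atsd package): parse a file in the
# key=value format of connection.config into a named character vector.
read_connection_config <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  # drop blank lines and comment lines starting with '#'
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  # split each line at the first '=' only, so values containing '=' survive
  eq <- regexpr("=", lines, fixed = TRUE)
  lines <- lines[eq > 0]
  eq <- eq[eq > 0]
  values <- trimws(substring(lines, eq + 1))
  names(values) <- trimws(substring(lines, 1, eq - 1))
  values
}

# Write a sample file to a temporary location and read it back.
config_path <- tempfile(fileext = ".config")
writeLines(c(
  "# the url of ATSD including port number",
  "url=http://host_name:port_number",
  "",
  "# the user name",
  "user=atsd_user_name",
  "verify=no",
  "encryption=ssl3   "
), config_path)

config <- read_connection_config(config_path)
print(config)
```

A check like this only confirms that the file is well formed; set_connection() remains the way to load the parameters into the package.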

## 3. Querying ATSD

Function name: query()

Description: The function retrieves historical time-series data or forecasts from ATSD.

Returns object: data frame

Arguments:

• metric  (required, string)
The name of the metric you want to get data for, for example, “disk_used_percent”.
To obtain a list of metrics collected by ATSD use the get_metrics() function.

• selection_interval  (required, string)
The time interval for which the data is selected. Specify it as “n-unit”, where unit is one of Second, Minute, Hour, Day, Week, Month, Quarter, or Year, and n is the number of units, for example, “3-Week” or “12-Hour”.

• entity   (optional, string)
The name of the entity you want to get data for. If not provided, then data for all entities will be fetched for the specified metric. Obtain the list of entities with the get_entities() function.

• entity_group   (optional, string)
The name of entity group, for example, “HP Servers”. Extracts data for all entities belonging to this group.

• tags   (optional, string vector)
List of user-defined series tags to filter the fetched time-series data, for example, c(“disk_name=sda1”, “mount_point=/”) .

• end_time   (optional, string)
The end time of the selection interval, for example, end_time = "date('2014-12-27')". If not provided, the current time is used. Specify the date and time, or use one of the expressions supported by the ATSD end time syntax. For example, ‘current_day’ sets the end of the selection interval to 00:00:00 of the current day.

• aggregate_interval  (optional, string)
The length of the aggregation interval. The period of the produced time series is equal to aggregate_interval. The value for each period is computed by applying the aggregate_statistics functions to all samples of the original time series within the period. The format is the same as for the selection_interval argument, for example, “1-Minute”.

• aggregate_statistics   (optional, string vector)
The statistic functions used for aggregation. Multiple values are supported, for example, c(“Min”, “Avg”, “StDev”). The default value is “Avg”.

• interpolation   (optional, string)
If aggregation is enabled, then the values for the periods without data will be computed by one of the following interpolation functions: “None”, “Linear”, “Step”. The default value is “None”.

• export_type  (optional, string)
Supported options: “History” or “Forecast”. The default value is “History”.

• verbose   (optional, logical)
If verbose = FALSE,  then all console output will be suppressed. By default, verbose = TRUE.
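Since selection_interval and aggregate_interval share the “n-unit” format, a small check can catch malformed strings before a request is sent. This is a sketch under two assumptions not stated in the argument descriptions: n must be a positive integer, and unit names are spelled exactly as listed. The helper is_valid_interval() is hypothetical and not part of the atsd package.

```r
# Units listed in the description of selection_interval
interval_units <- c("Second", "Minute", "Hour", "Day",
                    "Week", "Month", "Quarter", "Year")

# Hypothetical helper: TRUE for strings of the form "<positive integer>-<unit>"
is_valid_interval <- function(x) {
  pattern <- paste0("^[1-9][0-9]*-(",
                    paste(interval_units, collapse = "|"), ")$")
  grepl(pattern, x)
}

good <- is_valid_interval(c("3-Week", "12-Hour", "1-Minute"))
bad  <- is_valid_interval(c("Week", "3 Weeks", "0-Day"))
print(good)
print(bad)
```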

Examples:

```r
# get historic data for the given entity, metric, and selection_interval
dfr <- query(entity = "nurswgvml007", metric = "cpu_busy", selection_interval = "1-Hour")

# end_time usage example
query(entity = "host-383", metric = "cpu_usage", selection_interval = "1-Day",
      end_time = "date('2015-02-10 10:15:03')")

# get forecasts
query(metric = "cpu_busy", selection_interval = "30-Minute",
      export_type = "Forecast", verbose = FALSE)

# use aggregation
query(metric = "disk_used_percent", entity_group = "Linux",
      tags = c("mount_point=/boot", "file_system=/dev/sda1"),
      selection_interval = "1-Week", aggregate_interval = "1-Minute",
      aggregate_statistics = c("Avg", "Min", "Max"),
      interpolation = "Linear", export_type = "Forecast")
```
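To make the effect of aggregate_interval and aggregate_statistics concrete, the sketch below reproduces the idea locally in base R: samples are grouped into 1-minute periods and several statistics are computed per period. It only illustrates the concept and is not how ATSD computes aggregates. The sample values are made up, and the column names Timestamp and Value are assumptions based on the defaults of to_zoo().

```r
# Made-up irregular series; column names are assumed, not confirmed
dfr <- data.frame(
  Timestamp = as.POSIXct("2015-02-10 10:00:00", tz = "GMT") +
    c(5, 20, 50, 65, 110, 185),
  Value = c(10, 20, 30, 40, 60, 50)
)

# Assign each sample to the start of its 1-minute period
period <- as.POSIXct(floor(as.numeric(dfr$Timestamp) / 60) * 60,
                     origin = "1970-01-01", tz = "GMT")

# Apply several statistics to the samples of each period
agg <- aggregate(dfr$Value, by = list(Period = period),
                 FUN = function(v) c(Avg = mean(v), Min = min(v), Max = max(v)))

# Expand the matrix column produced by aggregate() into plain columns
agg <- data.frame(Period = agg$Period, agg$x)
print(agg)
```

The period starting at 10:02 contains no samples and is therefore absent from the result, which is comparable to requesting interpolation = "None".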

## 4. Transforming Data Frame to a zoo Object

Function name: to_zoo()

Description: The function builds a zoo object from the given data frame. The timestamp  argument provides a column of the data frame which is used as the index for the zoo object. The value  argument indicates the series which will be saved in a zoo object. If several columns are listed in the value  argument, they will all be saved in a multivariate zoo object. Information from other columns is ignored. To use this function the ‘zoo’ package should be installed.

Returns object: zoo object

Arguments:

• dfr  (required, data frame)
The data frame.

• timestamp  (optional, character or numeric)
Name or number of the column with timestamps. By default, timestamp = "Timestamp".

• value  (optional, character vector or numeric vector)
Names or numbers of columns with series values. By default, value = "Value".

Examples:

```r
# query ATSD for data and transform it to zoo object
dfr <- query(entity = "nurswgvml007", metric = "cpu_busy", selection_interval = "1-Hour")
z <- to_zoo(dfr)
```
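For readers without access to an ATSD instance, the following sketch shows roughly what such a conversion amounts to, using the zoo package directly on a made-up data frame. The column names follow the to_zoo() defaults. This is an illustration, not the implementation of to_zoo(), and the zoo-specific part runs only if zoo is installed.

```r
# Made-up data frame with the default column names expected by to_zoo()
dfr <- data.frame(
  Timestamp = as.POSIXct(c("2015-02-10 10:00:05", "2015-02-10 10:00:47",
                           "2015-02-10 10:02:13"), tz = "GMT"),
  Value = c(12.5, 14.0, 9.75)
)

if (requireNamespace("zoo", quietly = TRUE)) {
  # Values indexed by their (irregular) timestamps
  z <- zoo::zoo(dfr$Value, order.by = dfr$Timestamp)
  print(z)
  # zoo methods then apply, for example first differences of the series
  print(diff(z))
}
```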

## 5. Getting Metrics

Function name: get_metrics()

Description: This function fetches a list of metrics and their tags from ATSD, and converts it to a data frame.

Returns object: data frame

Each row of the data frame corresponds to a metric and its tags:

• name
Metric name (unique)

• counter
Counters are metrics with continuously incrementing value

• lastInsertTime
Last time value was received by ATSD for this metric

• tags
User-defined tags (as requested by the “tags” argument)

Arguments:

• expression  (optional, string)
Select metrics matching particular name pattern and/or user-defined metric tags. For examples refer to “Expression syntax” chapter.

• active  (optional, string: “true” or “false”)
Filter metrics by the lastInsertTime attribute. If active = “true”, only metrics with a positive lastInsertTime are included in the response.

• tags  (optional, string vector)
User-defined metric tags to be included in the response. By default, all the tags will be included.

• limit  (optional, integer)
If limit > 0, the response shows the top-N metrics ordered by name.

• verbose  (optional, logical)
If verbose = FALSE,  then all console output will be suppressed.

Examples:

```r
# get all metrics and include all their tags in the data frame
metrics <- get_metrics()

# get the first 100 active metrics which have the tag "table",
# include this tag in the response and exclude other user-defined metric tags
metrics <- get_metrics(expression = "tags.table != ''", active = "true",
                       tags = "table", limit = 100)
```

## 6. Getting Entities

Function name: get_entities()

Description: This function fetches a list of entities and their tags from ATSD, and converts it to a data frame.

Returns object: data frame

Each row of the data frame corresponds to an entity and its tags:

• name
Entity name (unique)

• enabled
Enabled status, incoming data is discarded for disabled entities

• lastInsertTime
Last time value was received by ATSD for this entity

• tags
User-defined tags (as requested by the “tags” argument)

Arguments:

• expression  (optional, string)
Select entities matching particular name pattern and/or user-defined entity tags. For examples refer to “Expression syntax” chapter.

• active  (optional, string: “true” or “false”)
Filter entities by the lastInsertTime attribute. If active = “true”, only entities with a positive lastInsertTime are included in the response.

• tags  (optional, string vector)
User-defined entity tags to be included in the response. By default, all the tags will be included.

• limit  (optional, integer)
If limit > 0, the response shows the top-N entities ordered by name.

• verbose  (optional, logical)
If verbose = FALSE,  then all console output will be suppressed.

Examples:

```r
# get all entities
entities <- get_entities()

# select entities by name and user-defined tag "app"
entities <- get_entities(expression =
  "name like 'nur*' and lower(tags.app) like '*hbase*'")
```

## 7. Getting Time Series Tags

Function name: get_series_tags()

Description: The function lists the time series collected by ATSD for a given metric. For each time series it returns the tags associated with the series and the last time the series was updated. The list of fetched time series is based on data stored on disk for the last 24 hours.

Returns object: data frame

Each row of the data frame corresponds to a time series and its tags:

• entity
Name of the entity which generates the time series.

• lastInsertTime
Last time value was received by ATSD for this time series.

• tags
Tags of the series.

Arguments:

• metric  (required, string)
The name of the metric you want to get time series for, for example, “disk_used_percent”.
To obtain a list of metrics collected by ATSD use the get_metrics() function.

• entity   (optional, string)
The name of the entity you want to get time series for. If not provided, then data for all entities will be fetched for the specified metric. Obtain the list of entities with the get_entities() function.

• verbose  (optional, logical)
If verbose = FALSE,  then all console output will be suppressed.

Examples:

```r
# get all time series and their tags collected by ATSD for the "disk_used_percent" metric
tags <- get_series_tags(metric = "disk_used_percent")

# get all time series and their tags for the "disk_used_percent" metric
# and "nurswgvml007" entity
get_series_tags(metric = "disk_used_percent", entity = "nurswgvml007")
```

## 8. Saving Time-series in ATSD

Function name: save_series()

Description: Save time-series from the data frame into ATSD. The data frame should have a column with timestamps and at least one numeric column with values of a metric.

Returns object: NULL

Arguments:

• dfr  (required, data frame)
The data frame should have a column with timestamps and at least one numeric column with values of a metric.

• time_col  (optional, numeric or character)
Number or name of the column with the timestamps. The default value is 1. For example, time_col = 1 or time_col = “Timestamp”. See the “Timestamp format” section below for the supported timestamp classes and formats.

• time_format  (optional, string)
The format of the timestamps. This argument is used when the timestamp format is not clear from their class. The value can be one of the following: "ms" (for epoch milliseconds), "sec" (for epoch seconds), or a format string, for example "%Y-%m-%d %H:%M:%S". The format string is used to convert the provided timestamps to epoch milliseconds before they are stored in ATSD. See the “Timestamp format” section below for details.

• tz  (optional, string)
By default, tz = "GMT". Specify time zone, when timestamps are strings formatted as described in the time_format  argument. For example, tz = "Australia/Darwin". View the “TZ” column of the time zones table for a list of possible values.

• metric_col  (required, numeric or character vector)
Specifies numbers or names of the columns where metric values are stored. For example, metric_col = c(2, 3, 4), or metric_col = c("Value", "Avg"). If metric_name  argument is not given, then names of columns, in lower case, are used as metric names when saving them in ATSD.

• metric_name  (optional, character vector)
Specifies metric names. The series indicated by the metric_col argument are saved in ATSD under the names provided in metric_name, so the number and order of names in metric_name should match the columns in metric_col. If the metric_name argument is not provided, the names of the columns, in lower case, are used as metric names when saving them in ATSD.

• entity_col  (optional, numeric or character)
Number or name of the column with entities. Should be provided if the entity argument is not given. Several entities in the column are allowed. For example, entity_col = 4, or entity_col = "server001".

• entity  (optional, character)
Should be provided if the entity_col  argument is not given. Name of the entity.

• tags_col  (optional, numeric or character vector)
Lists numbers or names of the columns containing tag values. So the name of a column is a tag name, and values in the column are the tag values.

• tags  (optional, character vector)
Lists tags and their values in “tag=value” format. Each indicated tag will be saved with each series.

• verbose  (optional, logical)
If verbose = FALSE,  then all console output will be suppressed.

Timestamp format

The following timestamp types are allowed:

• Numeric, in epoch milliseconds or epoch seconds. In this case time_format = "ms" or time_format = "sec" should be used, and the time zone argument tz is ignored.

• An object of one of the types Date, POSIXct, POSIXlt, chron from the chron package, or timeDate from the timeDate package. In this case the arguments time_format and tz are ignored.

• String, for example, “2015-01-03 10:07:15”. In this case the time_format argument should specify the format string used for the timestamps, for example, time_format = "%Y-%m-%d %H:%M:%S". Type ?strptime to see the list of format symbols. The format string is used to convert the provided timestamps to epoch milliseconds before they are stored in ATSD. The time zone given in the tz argument and the standard origin “1970-01-01 00:00:00” are used for the conversion, which is performed with the command as.POSIXct(time_stamp, format = time_format, origin="1970-01-01", tz = tz).

Note that timestamps will be stored in epoch milliseconds. So if you put some data into ATSD and then retrieve it back, the timestamps will refer to the same time but in GMT time zone. For example, if you save timestamp "2015-02-15 10:00:00" with tz = "Australia/Darwin" in ATSD, and then retrieve it back, you will get the timestamp "2015-02-15 00:30:00" because Australia/Darwin time zone has a +09:30 shift relative to the GMT zone.
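The Australia/Darwin example can be reproduced in base R without a connection to ATSD. The sketch below performs the string-to-epoch conversion locally and renders the same instant in GMT. It illustrates the arithmetic only and makes no claim about the package internals beyond the as.POSIXct() call quoted above.

```r
time_stamp  <- "2015-02-15 10:00:00"
time_format <- "%Y-%m-%d %H:%M:%S"

# Interpret the string in the Australia/Darwin time zone (+09:30)
local_time <- as.POSIXct(time_stamp, format = time_format,
                         tz = "Australia/Darwin")

# Epoch milliseconds, the unit in which timestamps are stored
epoch_ms <- as.numeric(local_time) * 1000

# The same instant rendered in the GMT time zone
gmt_string <- format(local_time, format = time_format, tz = "GMT")

print(format(epoch_ms, scientific = FALSE))
print(gmt_string)  # "2015-02-15 00:30:00"
```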

Entity specification

You can provide the entity name in either the entity or the entity_col argument. In the first case all series will have the same entity. In the second case, the entities specified in the entity_col column will be saved along with the corresponding series.

Tags specification

The tags_col argument indicates which columns of the data frame keep the time-series tags. The name of each column specified by the tags_col argument is a tag name, and the values in the column are tag values.

Before the series are stored in ATSD, the data frame is split into several data frames, each of which has a unique entity and a unique list of tag values. This entity and these tags are stored in ATSD along with the time series from the data frame. NAs and missing values in the time series are ignored.

In the tags argument you can specify tags which are the same for all rows (records) of the data frame. Each series value saved in ATSD will have the tags provided in the tags argument.
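The splitting step described above can be pictured with base R alone. The following sketch groups a made-up data frame by entity and tag value and drops NA metric values inside each group. It mimics the described behaviour and is not the code used by save_series(). The column names entity, time, cpu, and disk are hypothetical.

```r
# Made-up data: one entity column, one timestamp column,
# one metric column (cpu) and one tag column (disk)
dfr <- data.frame(
  entity = c("server001", "server001", "server002", "server002"),
  time   = c("2015/02/15 10:00:00", "2015/02/15 10:01:00",
             "2015/02/15 10:00:00", "2015/02/15 10:01:00"),
  cpu    = c(10.5, NA, 33.0, 35.5),
  disk   = c("sda1", "sda1", "sda1", "sdb1"),
  stringsAsFactors = FALSE
)

# One group per distinct (entity, tag value) combination
groups <- split(dfr, list(dfr$entity, dfr$disk), drop = TRUE)

# Within each group, ignore rows whose metric value is NA
groups <- lapply(groups, function(g) g[!is.na(g$cpu), , drop = FALSE])

for (key in names(groups)) {
  cat(key, ":", nrow(groups[[key]]), "row(s)\n")
}
```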

Examples:

```r
# Save time-series from columns 3, 4, 5 of data frame dfr.
# Timestamps are saved as strings in 2nd column
# and their format string and time zone are provided.
# Entities and tags are in columns 1, 6, 7.
# All saved series will have tag "os_type" with value "linux".
save_series(dfr, time_col = 2, time_format = "%Y/%m/%d %H:%M:%S", tz = "Australia/Darwin",
            metric_col = c(3, 4, 5), entity_col = 1, tags_col = c(6, 7),
            tags = "os_type = linux")
```

## 9. Expression Syntax

In this section, we explain the syntax of the expression argument of the functions get_metrics() and get_entities(). The expression is used to filter the result: only metrics or entities for which the expression evaluates to TRUE are returned.

The variable name is used to select metrics/entities by names:

```r
# get metric with name 'cpu_busy'
metrics <- get_metrics(expression = "name = 'cpu_busy'", verbose = FALSE)
```

Metrics and entities have user-defined tags. Each of these tags is a pair (“tag_name” : “tag_value”). The variable tags.tag_name  in an expression refers to the tag_value for given metric/entity. If a metric/entity does not have this tag, the tag_value will be an empty string.

```r
# get metrics without 'source' tag, and include all tags of fetched metrics in output
get_metrics(expression = "tags.source = ''", tags = "*")
```

To get metrics with a user-defined tag ‘table’ equal to ‘System’:

```r
# get metrics whose tag 'table' is equal to 'System'
metrics <- get_metrics(expression = "tags.table = 'System'", tags = "*")
```

To build more complex expressions, use the brackets ( and ), and the logical operators and, or, not, or their equivalents &&, ||, !.

```r
entities <- get_entities(expression = "tags.app != '' and (tags.os != '' or tags.ip != '')")
```

To test if a string is in a collection, use the in operator:

```r
get_entities(expression = "name in ('derby-test', 'atom.axibase.com')")
```

Use the like operator to match values with expressions containing wildcards: expression = "name like 'disk*'". The wildcard * means zero or more characters. The wildcard . means any one character.

```r
metrics <- get_metrics(expression = "name like '*cpu*' and tags.table = 'System'")
# get metrics with names consisting of 3 letters
metrics <- get_metrics(expression = "name like '...'")
```

There are additional functions you can use in an expression:

• list(string, delimiter)  Splits the string by the delimiter. The default delimiter is a comma.

• upper(string)  Converts the string argument to upper case.

• lower(string)  Converts the string argument to lower case.

• collection(name)  Refers to a named collection of strings created in ATSD.

• likeAll(string, collection of patterns)  Returns true if every element in the collection of patterns matches the given string.

• likeAny(string, collection of patterns)  Returns true if at least one element in the collection of patterns matches the given string.

```r
get_metrics(expression = "likeAll(lower(name), list('cpu*,*use*'))")
get_metrics(expression = "likeAny(lower(name), list('cpu*,*use*'))")
get_metrics(expression = "name in collection('fs_ignore')")
```
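The wildcard rules and the likeAll/likeAny semantics can be tried out locally before sending an expression to ATSD. The sketch below is a hypothetical emulation in base R, not ATSD's matching code. It assumes case-sensitive matching, and it does not escape regular-expression metacharacters other than the two wildcards. The helper names like(), like_any(), and like_all() are invented for this illustration.

```r
# Translate the wildcard pattern to a regular expression:
# '*' becomes '.*' (zero or more characters); '.' already means
# "any one character" in a regular expression.
like <- function(x, pattern) {
  rx <- paste0("^", gsub("*", ".*", pattern, fixed = TRUE), "$")
  grepl(rx, x)
}

# TRUE if at least one pattern matches the string
like_any <- function(x, patterns) {
  any(vapply(patterns, function(p) like(x, p), logical(1)))
}

# TRUE if every pattern matches the string
like_all <- function(x, patterns) {
  all(vapply(patterns, function(p) like(x, p), logical(1)))
}

# Emulates list('cpu*,*use*'): split the string at commas
patterns <- strsplit("cpu*,*use*", ",", fixed = TRUE)[[1]]

print(like("disk_used", "disk*"))      # prefix match
print(like("cpu", "..."))              # exactly three characters
print(like_any("cpu_busy", patterns))  # matches 'cpu*' only
print(like_all("cpu_busy", patterns))  # does not contain 'use'
print(like_all("cpu_user", patterns))  # matches both patterns
```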

## 10. Advanced Connection Options

The atsd package uses connection parameters to connect with ATSD. These parameters are:

• url  - the url of ATSD including port number

• user  - the user name

• verify  - should ATSD SSL certificate be validated

• encryption  - cryptographic protocol used by ATSD https server

The configuration parameters are loaded from the package configuration file when you load the atsd package into R. (See Section 2.)

The functions show_connection(), set_connection(), and save_connection() show the configuration parameters, change them, and store them in the configuration file, respectively.

Function name: show_connection()

Returns object: NULL

Description: The function prints current values of the connection parameters. (They may be different from the values in the configuration file.)

Arguments: none

Examples:

```r
show_connection()
```

Function name: set_connection()

Returns object: NULL

Description: The function overrides the connection parameters for the duration of the current R session without changing the configuration file. If called without arguments, the function sets the connection parameters from the package configuration file. If the file argument is provided, the function reads the parameters from that file instead. In both cases the current values of the parameters become the same as in the file. If the file argument is not provided but some of the other arguments are specified, only the specified parameters are changed.

Arguments:

• url   (optional, string)
The url of ATSD including port number.

• user   (optional, string)
The user name.

• verify   (optional, string)
String - “yes” or “no”, verify = "yes"  ensures validation of ATSD SSL certificate and verify = "no"  suppresses the validation (applicable in the case of ‘https’ protocol).

• encryption   (optional, string)
Cryptographic protocol used by ATSD https server. Possible values are: “default”, “ssl2”, “ssl3”, and “tls1” (In most cases, use “ssl3” or “tls1”.)

• file  (optional, string)
The absolute path to the file from which the connection parameters are read. The file should be formatted in the same way as the package configuration file, see Section 2.

Examples:

```r
# Modify the user
set_connection(user = "user001")

# Modify the cryptographic protocol
set_connection(encryption = "tls1")

# Set the parameters of the https connection: url and user name,
# whether the certificate of the server should be verified,
# and which cryptographic protocol is used for communication
set_connection(url = "https://my.company.com:8443",
               user = "user001",
               verify = "no",
               encryption = "ssl3")

# Set up the connection parameters from the file:
set_connection(file = "/home/user001/atsd_https_connection.txt")
```

Function name: save_connection()

Returns object: NULL

Description: The function writes the connection parameters into the configuration file. If called without arguments, the function uses the current values of the connection parameters (including NAs). Otherwise only the provided arguments are written to the configuration file. If the configuration file is absent, it is created in the atsd package folder.

Arguments:

• url   (optional, string)
The url of ATSD including port number.

• user   (optional, string)
The user name.

• verify   (optional, string)
String - “yes” or “no”, verify = "yes"  ensures validation of ATSD SSL certificate and verify = "no"  suppresses the validation (applicable in the case of ‘https’ protocol).

• encryption   (optional, string)
Cryptographic protocol used by ATSD https server. Possible values are: “default”, “ssl2”, “ssl3”, and “tls1” (In most cases, use “ssl3” or “tls1”.)

Examples:

```r
# Write the current values of the connection parameters to the configuration file.
save_connection()

# Write the user name and password in the configuration file.
save_connection(user = "user00", password = "123456")

# Write all parameters needed for the https connection to the configuration file.
save_connection(url = "https://my.company.com:8443",
                user = "user001",
                encryption = "ssl3")
```