Monitoring CycleCloud Clusters
CycleCloud clusters can be monitored by customizing alerts and notifications. Additionally, an event log can be used to analyze all CycleCloud activity.
Event Logging
A log of all Azure CycleCloud activity can be found in the Event Log, which is located in the sidebar.
You can search the log for a specific event or keyword with the search bar located in the upper right corner. The log can also be changed to show information based on three parameters:
- Event Type
- Priority
- Time Frame
Use the drop-down menus to select the event log parameters. The page refreshes automatically to show the matching events.
Alerting
Azure CycleCloud can send notifications when various conditions are met on a monitored resource or in CycleCloud itself. These notifications may be viewed in the web interface and optionally may be emailed to one or more recipients.
Viewing Notifications
Any user may view recent notifications by clicking the envelope icon in the upper right-hand corner of the screen. Selecting a notification displays its full subject and body. Each notification has a priority level. From low to high, these are:
- Info: for informational purposes only. No action is necessary.
- Warn: indicates a possible issue. Further investigation may be warranted.
- Error: indicates a likely problem. Action may be needed to resolve this.
Customizing Alerts
Administrators can view, create or modify alerts by going to the alert configuration page. This page may be reached by selecting Alerting from the user menu in the upper right-hand corner of the screen.
On the left-hand side of the screen there is a list of named alerting rules (e.g. "Hosts Not Responding", "Jobs in Error State"). To view or edit one of these rules, simply select it in the list. At the bottom of the list are icons to create, delete, or duplicate these rules.
Alerting rules come in two forms: query-based rules and plugin-based rules. Query-based rules are generic alerting rules which may be created and edited through the web interface. Plugin-based rules use CycleCloud's plugin architecture to allow alerts which are not easily generated through a SQL-style query. Plugin-based rules may support different levels of customization depending on the plugin.
After making changes to a rule, be sure to click the Apply button in the lower right-hand corner to save your changes.
Common Rule Configuration Options
Query and Plugin alerts have several customization options. These options are displayed at the top of the rule form:
- Enable this rule: If checked, this rule will generate notifications. Otherwise no notifications are generated and no emails are sent.
- Send alert emails to: One or more email addresses to receive notifications. For multiple addresses, separate each address with a comma. Note that this requires SMTP to be configured in CycleCloud.
- Priority: The relative priority of this message. See above for descriptions of these priorities.
Query-Based Alerting Rules
Query-based rules are the most common type, and are highly customizable. Queries are written using CycleCloud's SQL-like query language.
Queries are run every 5 minutes, or when the Run Now button is clicked at the bottom of the rule editing form. If a query returns one or more results, a notification will be generated. For most queries, this can result in messages being sent every 5 minutes until no results are returned. To limit the number of messages sent, there is an option to Generate messages only when the result count changes. If this box is checked, each time the query runs the number of results will be checked against the previous result count. If the numbers match, no notification will be generated.
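The effect of the Generate messages only when the result count changes option can be sketched in a few lines of Python. This is an illustrative model of the behavior described above, not CycleCloud's implementation; the function names are hypothetical, and treating the first run as having no previous count is an assumption.

```python
from typing import List, Optional


def should_notify(result_count: int, previous_count: Optional[int],
                  only_on_change: bool) -> bool:
    """Decide whether one run of a query-based rule produces a notification."""
    if result_count == 0:
        # A query that returns no results never generates a notification.
        return False
    if only_on_change and result_count == previous_count:
        # Same number of results as the previous run: stay quiet.
        return False
    return True


def simulate(counts: List[int], only_on_change: bool) -> List[bool]:
    """Replay a sequence of per-run result counts and record which runs notify."""
    previous = None  # assumption: no previous count exists before the first run
    decisions = []
    for count in counts:
        decisions.append(should_notify(count, previous, only_on_change))
        previous = count
    return decisions


if __name__ == "__main__":
    runs = [2, 2, 3, 0, 3]
    print(simulate(runs, only_on_change=False))
    print(simulate(runs, only_on_change=True))
```

Because only the counts are compared, a run that returns the same number of results but different records would not produce a new notification when the option is checked.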
When editing a query-based rule, there are two major steps: generating the query and creating the message template.
Generating a Query
The first step in generating a query is to select the record type via the dropdown that reads Query from ____ records. This is the equivalent of the FROM clause in the query language. For example, to create an alert on CycleCloud instances, select Cloud.Instance (Cloud Instance) from the menu.
The next step is to determine which attributes on each record are needed to generate the final notification. To do this, edit the top half of the query and add a comma-separated list of attribute names after the SELECT keyword. For example, the following will allow an instance notification to contain region and instance ID: SELECT Region, InstanceId.
To complete the query, determine the condition(s) which should trigger the notification and fill in the WHERE clause with a filter expression. Below are some examples of various instance filters. See the datastore query language documentation for more information on how to write filter expressions.
Alert on instances running outside of the "eastus" region
WHERE !startswith("eastus", Region)
Alert on execute nodes in the "example" cluster that were terminated after running for less than 1 hour
WHERE ClusterName === "example" && SessionUpTime < `1h` && startswith("execute", NodeName) && MachineState === "Terminated"
Note
When writing a query for the first time, it can be helpful to use the cycle_server execute command to test out various queries. Switch to the CycleCloud installation directory and run ./cycle_server execute <query> to view instant results. For example:
./cycle_server execute 'SELECT Region, InstanceId FROM Cloud.Instance WHERE !startswith("eastus", Region)'
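When trying several variations of a query, it may be convenient to assemble the query string and invoke cycle_server execute from a small script. The following is a minimal sketch, assuming Python 3.7+ is available on the CycleCloud host; the helper names are hypothetical and the installation directory must be supplied by the caller. Only the ./cycle_server execute <query> form shown in the note above is used.

```python
import subprocess
from typing import List


def build_query(record_type: str, attributes: List[str], where: str) -> str:
    """Assemble a SELECT ... FROM ... WHERE ... query string."""
    return f"SELECT {', '.join(attributes)} FROM {record_type} WHERE {where}"


def build_command(query: str) -> List[str]:
    """Build the argument list for ./cycle_server execute <query>."""
    # Passing the query as one list element avoids shell quoting problems.
    return ["./cycle_server", "execute", query]


def run_query(install_dir: str, query: str) -> str:
    """Run the query from the CycleCloud installation directory; return its output."""
    completed = subprocess.run(
        build_command(query),
        cwd=install_dir,        # the command is run from the installation directory
        capture_output=True,
        text=True,
        check=True,             # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


if __name__ == "__main__":
    query = build_query(
        "Cloud.Instance",
        ["Region", "InstanceId"],
        '!startswith("eastus", Region)',
    )
    print(query)
```

Calling run_query with the installation directory and the printed query would execute it; the format of the returned output is not assumed here.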
Creating a Message Template
Now that the query is finished, create a subject and body for the notification message. The subject is plain text while the body is HTML. Both the subject and body use a templating language to inject query results into the content.
In the templating language, expressions are surrounded by {%= %} symbols. The results of the query are stored in a context variable called "Results", which is a list of records. For example, {%= Results %} would print out the full list of query results, and {%= size(Results) %} would print out the number of records in the list.
The most common way to format a notification is to print out the number of results in the subject and loop over the result set in the body, printing out details of each record. Below is an example of a message subject and body for reporting on instances running outside of the "eastus" region:
Subject:
{%= size(Results) %} instances found running outside of eastus
Body:
<h2>The following instances are running outside of eastus:</h2>
<ul>
{% for Instance in Results %}
<li>{%= Instance.InstanceId %} is running in {%= Instance.Region %}</li>
{% endfor %}
</ul>
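To preview what a template like the one above would produce, the same logic can be mimicked in plain Python. This is only a stand-in for illustration: it does not use CycleCloud's templating engine, the sample records are invented, and the HTML escaping is an added precaution rather than documented template behavior.

```python
from html import escape
from typing import Dict, List


def render_subject(results: List[Dict[str, str]]) -> str:
    """Mimic the subject template: result count followed by fixed text."""
    return f"{len(results)} instances found running outside of eastus"


def render_body(results: List[Dict[str, str]]) -> str:
    """Mimic the body template: one list item per record in the result set."""
    items = "\n".join(
        f"<li>{escape(r['InstanceId'])} is running in {escape(r['Region'])}</li>"
        for r in results
    )
    return (
        "<h2>The following instances are running outside of eastus:</h2>\n"
        f"<ul>\n{items}\n</ul>"
    )


if __name__ == "__main__":
    # Hypothetical records shaped like SELECT Region, InstanceId results.
    sample = [
        {"InstanceId": "instance-a", "Region": "westus2"},
        {"InstanceId": "instance-b", "Region": "northeurope"},
    ]
    print(render_subject(sample))
    print(render_body(sample))
```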
Email Configuration and Logging Levels
Logging in CycleCloud can be configured to output different levels of detail. The available levels are:
- Debug
- Info
- Warning
- Error
By default, CycleCloud includes all log messages. However, if less detail is desired, the level of logging can be adjusted. To do this, change the value of the Logging Level system setting to INFO, WARN, or ERROR.
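The way a logging level limits output can be modeled as a simple threshold. The sketch below assumes the usual convention that a configured level admits messages at that level and above; CycleCloud's internal filtering is not documented here, and the names are hypothetical.

```python
# Levels from most to least detailed, using the setting values named above
# plus DEBUG for the most detailed level.
LEVELS = ["DEBUG", "INFO", "WARN", "ERROR"]


def is_logged(message_level: str, configured_level: str) -> bool:
    """Return True if a message at message_level passes the configured threshold."""
    return LEVELS.index(message_level) >= LEVELS.index(configured_level)


def visible_levels(configured_level: str) -> list:
    """List every level that would still be written at the configured setting."""
    return [level for level in LEVELS if is_logged(level, configured_level)]


if __name__ == "__main__":
    for setting in LEVELS:
        print(setting, "->", visible_levels(setting))
```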
Additionally, CycleCloud can be configured to email a user or group of users when errors occur. This requires a mail server, the addresses to send notifications to, and the address to send them from. The following system settings control these values:
System Setting | Description |
---|---|
mail.host | The SMTP host used to send email. |
monitor.notify_to | The comma-separated email addresses the notifications are sent to. |
monitor.notify_from | The email address the notifications are sent from. |
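Before entering these values, it can be useful to confirm that the SMTP host accepts mail for the intended sender and recipients. The following is a minimal sketch using Python's standard library, not part of CycleCloud; the dictionary keys mirror the setting names in the table, while the function names and example addresses are hypothetical. It assumes the SMTP host accepts unauthenticated connections on the default port.

```python
import smtplib
from email.message import EmailMessage
from typing import Dict, List


def parse_recipients(value: str) -> List[str]:
    """Split a comma-separated address list, dropping blanks and extra spaces."""
    return [addr.strip() for addr in value.split(",") if addr.strip()]


def build_message(settings: Dict[str, str], subject: str, html_body: str) -> EmailMessage:
    """Build an HTML test message from the three settings in the table."""
    msg = EmailMessage()
    msg["From"] = settings["monitor.notify_from"]
    msg["To"] = ", ".join(parse_recipients(settings["monitor.notify_to"]))
    msg["Subject"] = subject
    msg.set_content(html_body, subtype="html")
    return msg


def send_test_message(settings: Dict[str, str], msg: EmailMessage) -> None:
    """Send the message through the configured SMTP host (assumes no authentication)."""
    with smtplib.SMTP(settings["mail.host"]) as server:
        server.send_message(msg)


if __name__ == "__main__":
    # Placeholder values; replace with the real host and addresses.
    example = {
        "mail.host": "smtp.example.com",
        "monitor.notify_to": "ops@example.com, oncall@example.com",
        "monitor.notify_from": "cyclecloud@example.com",
    }
    message = build_message(example, "CycleCloud test", "<p>Test message</p>")
    print(message["To"])
```

The script only builds the message when run directly; calling send_test_message is left as an explicit step so that nothing is sent by accident.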