Jan 22, 2025

Exploring SLA Features in Alert Manager Enterprise 3.2

Explore the Service Levels feature introduced in AME 3.2, enabling precise control over event management with customizable SLAs.

This article introduces the Service Levels feature added in AME version 3.2. Service level agreements (SLAs) let you define precise policies for managing the service levels associated with events within AME.

What Are SLAs in AME?

SLAs enable fine-grained control over service levels, allowing you to specify objectives and thresholds for AME events. These agreements can be tailored to each tenant through the tenant configuration section, offering robust functionality to account for:

  •   Customer or Team Time Zones
  •   Specific Working Hours and Days
  •   Local and ad-hoc Holidays and Absences

In AME, each SLA is governed by an objective, and multiple objectives can be configured per tenant to apply to specific event conditions.

Defining SLAs: A Step-by-Step Guide

Example

To illustrate, let’s create an SLA for tracking Time to Respond. Start by assigning a name, such as “Response Time”, and providing a description.

Define the threshold after which the SLA is considered violated. You can also configure a notification interval that keeps alerting teams until the violation is resolved. For example, an hourly interval sends a notification every hour until the SLA is no longer breached.

Additionally, a reminder threshold can be set to warn teams before an SLA breach occurs.
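The relationship between the threshold, the reminder threshold, and the notification interval can be sketched in a few lines of Python. This is only an illustration of the logic described above, not AME's implementation: the `SlaObjective` class, its field names, and the 4-hour and 3-hour values are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class SlaObjective:
    """Hypothetical model of one objective; field names are illustrative, not AME's."""
    name: str
    threshold: timedelta            # elapsed time after which the SLA is violated
    reminder: Optional[timedelta]   # elapsed time at which a pre-breach warning fires
    notify_interval: timedelta      # repeat interval while the violation persists


def sla_state(obj: SlaObjective, elapsed: timedelta) -> str:
    """Classify an open SLA as 'ok', 'reminder', or 'violated'."""
    if elapsed >= obj.threshold:
        return "violated"
    if obj.reminder is not None and elapsed >= obj.reminder:
        return "reminder"
    return "ok"


def violation_notifications_due(obj: SlaObjective, elapsed: timedelta) -> int:
    """Count violation notifications so far: one at breach, then one per interval."""
    if elapsed < obj.threshold:
        return 0
    return 1 + (elapsed - obj.threshold) // obj.notify_interval


# Assumed example values: violated after 4 hours, reminder at 3 hours, hourly repeats.
response_time = SlaObjective(
    name="Response Time",
    threshold=timedelta(hours=4),
    reminder=timedelta(hours=3),
    notify_interval=timedelta(hours=1),
)
```

With these assumed values, an event that has been waiting for three and a half hours is in the reminder state, and one that has been waiting for six and a half hours has triggered three violation notifications.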

Establish Start and Stop Conditions

Start and stop conditions define when an SLA starts and stops. Their syntax is similar to that of the AME rule engine.

Start Condition: Check whether the event title contains the keyword “SLA”, so that only events with this keyword are considered for the SLA.

Stop Condition: Ensure the SLA applies only to events in progress, excluding new events. To do this, we populate the ame.status_type field and the conditionals accordingly.
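To make the idea of start and stop conditions more concrete, here is a minimal Python sketch that treats each condition as a predicate over an event. It does not use AME's rule syntax. The `title` key, the status value `in_progress`, and the helper names are assumptions for illustration; only `ame.status_type` comes from the example above. It also reflects just one reading of the stop condition, namely that the SLA clock stops once the event is in progress.

```python
from typing import Any, Callable, Dict

Event = Dict[str, Any]
Condition = Callable[[Event], bool]


def title_contains(keyword: str) -> Condition:
    """Match events whose title contains the keyword (case-sensitive)."""
    return lambda event: keyword in str(event.get("title", ""))


def field_equals(field: str, value: str) -> Condition:
    """Match events where the given field has exactly the given value."""
    return lambda event: event.get(field) == value


# Start: only events with "SLA" in the title are considered.
start_condition = title_contains("SLA")
# Stop: the event has moved to an in-progress status (value name is assumed).
stop_condition = field_equals("ame.status_type", "in_progress")


def sla_clock_running(event: Event) -> bool:
    """The SLA clock runs while the start condition holds and the stop condition does not."""
    return start_condition(event) and not stop_condition(event)
```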

Notifications

Notifications for SLA violations or imminent breaches require a notification scheme. For this example we create a scheme named “SLA Violation” and link it to our notification target, which is email. You can also use Slack, Teams, or any other notification mechanism supported by AME.

Notification Settings

Use-Case: Update Event Metadata

You can also update event metadata to better manage SLA states, for example by escalating the urgency of events whose SLA has been breached. This lets analysts quickly identify high-priority issues when they evaluate events in AME by urgency and priority.

Update event metadata when SLA is breached
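The effect of such a metadata update can be sketched as a small Python function. This is an assumption-laden illustration rather than AME's behaviour: the four-step urgency scale, the `urgency` and `sla_breached` keys, and the one-step escalation rule are all hypothetical.

```python
from typing import Any, Dict

URGENCY_LEVELS = ["low", "medium", "high", "critical"]  # assumed scale, lowest first


def escalate_on_breach(event: Dict[str, Any], breached: bool) -> Dict[str, Any]:
    """Return a copy of the event with urgency raised one step if the SLA is breached."""
    if not breached:
        return dict(event)
    current = event.get("urgency", URGENCY_LEVELS[0])
    idx = URGENCY_LEVELS.index(current)
    # Cap at the highest level so repeated breaches cannot run past the scale.
    raised = URGENCY_LEVELS[min(idx + 1, len(URGENCY_LEVELS) - 1)]
    return {**event, "urgency": raised, "sla_breached": True}
```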

Multiple SLAs for One Event

AME supports defining multiple SLAs for the same event. For example, you can track both Time to Respond (TTA) and Time to Resolve (TTR) by creating an additional objective with its own start and stop conditions.

SLA Periods

You can configure SLA validity periods flexibly in the configuration screen. AME allows you to define which time zone is in effect, which recurring and non-recurring holidays apply, and which working hours your teams or customers keep. These settings ensure that SLA rules only apply during active working periods.

If you are an MSSP managing multiple customers (tenants), you can model the SLA times for each customer by applying their respective time zones and working hours.

Configuration of Time Periods for SLA calculations
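To show why these settings matter, the following Python sketch counts only the time that falls inside working hours when measuring how long an event has been open. It is a simplified model under stated assumptions, not AME's calculation: the function name, the 09:00 to 17:00 default window, and the Monday to Friday default are hypothetical. Both timestamps must be timezone-aware. The example uses a fixed UTC+1 offset to stay self-contained; a real setup would more likely use a named zone from Python's `zoneinfo` module.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo
from typing import FrozenSet


def business_elapsed(
    start: datetime,
    end: datetime,
    tz: tzinfo,
    work_start: time = time(9, 0),
    work_end: time = time(17, 0),
    workdays: FrozenSet[int] = frozenset({0, 1, 2, 3, 4}),  # Monday=0 .. Friday=4
    holidays: FrozenSet[date] = frozenset(),
) -> timedelta:
    """Sum the parts of [start, end] that fall inside working hours in tz."""
    start_local = start.astimezone(tz)
    end_local = end.astimezone(tz)
    total = timedelta(0)
    day = start_local.date()
    while day <= end_local.date():
        if day.weekday() in workdays and day not in holidays:
            window_open = datetime.combine(day, work_start, tzinfo=tz)
            window_close = datetime.combine(day, work_end, tzinfo=tz)
            # Overlap of the event interval with this day's working window.
            overlap = min(end_local, window_close) - max(start_local, window_open)
            if overlap > timedelta(0):
                total += overlap
        day += timedelta(days=1)
    return total


# Assumed example: customer in a fixed UTC+1 zone, event opened Friday 16:00 local
# and answered Monday 10:00 local.
customer_tz = timezone(timedelta(hours=1))
created = datetime(2025, 1, 17, 15, 0, tzinfo=timezone.utc)
responded = datetime(2025, 1, 20, 9, 0, tzinfo=timezone.utc)
elapsed = business_elapsed(created, responded, customer_tz)  # 2 working hours
```

Although almost three calendar days pass between the two timestamps, only the last hour of Friday and the first hour of Monday count. Marking that Monday as a holiday would reduce the result to one hour.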

Reporting SLA Performance

AME includes a dedicated reporting dashboard to monitor SLA metrics. Key insights include:

  • SLA performance versus violations
  • Details broken down by tenant
  • Trends and analysis

SLA Compliancy Report
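As a rough idea of the kind of figure such a report summarises, the per-tenant compliance rate can be computed from a list of SLA outcomes. The sketch below is a hypothetical illustration only. The record shape (tenant name plus a violated flag) and the tenant names are assumptions, and it does not reflect how the AME dashboard obtains or stores its data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def compliance_by_tenant(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Map tenant -> fraction of SLA evaluations that were met (not violated)."""
    met: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for tenant, violated in records:
        total[tenant] += 1
        if not violated:
            met[tenant] += 1
    return {tenant: met[tenant] / total[tenant] for tenant in total}


# Assumed sample data: (tenant, violated) pairs.
sample = [
    ("acme", False),
    ("acme", True),
    ("acme", False),
    ("acme", False),
    ("globex", False),
]
rates = compliance_by_tenant(sample)
```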

Conclusion

The SLA features in AME 3.2 give teams flexible, fine-grained control over how service level expectations are defined and met. From response-time objectives to tailored working hours, these capabilities strengthen your event management workflows.
