
Intelligent Alerting, Fewer Headaches: Insider View at ilert AIOps

Daria Yankevich
August 9, 2024

You might have noticed that we released a series of AI-supported features last year. Intelligent alert grouping, developed to reduce alert fatigue, is the icing on the cake.

With it, we combined all ilert AI features in a new powerful add-on that aims to reduce stress and give more clarity during IT incidents.

This blog post will provide a complete guide on the features included in the brand-new AIOps add-on, explain how those features are built and function, and help you evaluate if it's worth investing in. 

How ilert Already Resolves the Problem of Alert Duplication 

Alert duplication happens when multiple alerts for the same issue are generated by different monitoring systems or redundant checks within the same system. For example, if a server goes down, alerts might be sent from the server's own monitoring tool, the network monitoring system, and the application performance monitoring system. This creates a flood of notifications for a single problem. As a result, IT teams become overwhelmed and desensitized to alerts. 

Alert fatigue increases the risk of critical alerts being missed or ignored, which slows down incident resolution and can cause more significant issues if the underlying problem remains unaddressed. Managing alert duplication is essential to maintaining focus on genuine incidents and ensuring efficient incident response.

ilert itself is already one step towards reducing the impact of the alert noise problem. The platform provides centralized alert management by aggregating alerts from various monitoring tools, ensuring all alerts are visible in one place. Intelligent grouping is a new protective layer indispensable for teams managing vast volumes of alerts.

Intelligent Grouping: AI Looks Deep into Alerts

ilert's intelligent grouping feature employs a sophisticated approach to minimize duplication by deeply analyzing alerts' content. The AI looks beyond surface-level data, examining the context and underlying details of alerts to intelligently combine them into unified groups. 

This new approach is based on text embedding models, a type of machine learning model that represents complex data as dense vectors of real numbers in a lower-dimensional space. Vector embeddings represent words, sentences, or documents. They capture the semantic relationships between data points, meaning that similar items are placed closer together in the vector space. 

If an ilert user enables the intelligent alert grouping feature for their alert source, a whole new process runs under the hood. 
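To make "closer together in the vector space" concrete, here is a minimal sketch. It assumes cosine similarity as the closeness measure, which is a common choice for embeddings but is not confirmed here as the measure ilert uses. The three-dimensional vectors and their names are made up for illustration; real embedding models produce vectors with hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; values near 1.0 mean 'very similar'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings of three alert texts (illustrative values only).
disk_full_host_a = [0.9, 0.1, 0.0]
disk_full_host_b = [0.8, 0.2, 0.1]
login_failures = [0.0, 0.2, 0.9]

# Two alerts about the same kind of problem point in nearly the same direction.
similar = cosine_similarity(disk_full_host_a, disk_full_host_b)
# An unrelated alert points elsewhere and scores much lower.
different = cosine_similarity(disk_full_host_a, login_failures)

print(f"similar pair: {similar:.2f}, unrelated pair: {different:.2f}")
```

A score like this is what a grouping threshold can then be compared against.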

How Does It Work?

Alerts pass through four stages when intelligent grouping is enabled. 

1. Pre-Processing. Pre-processing involves normalizing and cleaning alerts. Being a centralized alert management platform, ilert already normalizes alerts across multiple alert sources into a common format. For intelligent alert grouping, we remove alert fields that are not relevant for grouping, e.g. timestamps or IDs. 

2. Vectorization. Each incoming alert is transformed into a vector. The model used in ilert is trained on large datasets and can capture a wide range of semantic meanings, making it suitable for encoding the information contained in alerts.

3. Adjusting to ilert deduplication logic. There are various adjustments to how exactly alerts are combined into groups. For example, ilert users can fine-tune when two alerts are considered duplicates by setting a threshold and previewing how their threshold would affect grouping based on past alerts. ilert AI will proceed with deduplication depending on how the threshold score is adjusted.

4. Feedback loop. We make it very easy to provide feedback on whether an alert was correctly grouped or not and use this feedback to further fine-tune and improve the deduplication feature.
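The first three stages can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not ilert's implementation: the field names (`id`, `timestamp`, `summary`), the helper functions, and the grouping rule are all hypothetical, and plain word counts stand in for a real text embedding model so that the example runs without any dependencies. The feedback loop is left out.

```python
import math
import re
from collections import Counter

# Fields dropped before grouping (assumed names; the stages above mention timestamps and IDs).
IGNORED_FIELDS = {"timestamp", "id"}

def preprocess(alert):
    """Stage 1: keep only grouping-relevant fields and join them into lowercase text."""
    parts = [str(v) for k, v in sorted(alert.items()) if k not in IGNORED_FIELDS]
    return " ".join(parts).lower()

def vectorize(text):
    """Stage 2 stand-in: sparse word counts instead of a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text))

def similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

def group_alerts(alerts, threshold):
    """Stage 3: add each alert to the first group whose first alert is similar enough."""
    groups = []  # list of (vector of the group's first alert, member alerts)
    for alert in alerts:
        vec = vectorize(preprocess(alert))
        for representative, members in groups:
            if similarity(representative, vec) >= threshold:
                members.append(alert)
                break
        else:
            groups.append((vec, [alert]))
    return [members for _, members in groups]

alerts = [
    {"id": 1, "timestamp": "2024-08-09T10:00:00Z", "summary": "Disk usage high on db-1"},
    {"id": 2, "timestamp": "2024-08-09T10:01:30Z", "summary": "Disk usage critical on db-1"},
    {"id": 3, "timestamp": "2024-08-09T10:02:00Z", "summary": "Login failures spike on auth service"},
]
groups = group_alerts(alerts, threshold=0.7)
print([[a["id"] for a in g] for g in groups])
```

With a threshold of 0.7, the two disk alerts land in one group and the login alert in another. Raising the threshold to 0.9 splits all three apart, which is the kind of effect a threshold preview on past alerts lets you inspect before committing to a value.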

Video: How to Enable Intelligent Alert Grouping

Our documentation contains text instructions on how to switch on the feature. We have also prepared a video tutorial for you. 

Event Filter: Get Rid of Unimportant Noise

Occasionally, marking alerts as low priority isn't sufficient, and it becomes necessary to discard events entirely. For example, Grafana's DatasourceNoData can be such an event. Therefore, you can set up one or multiple event filter groups for your alert source to ensure that only relevant events are processed into alerts.

The latest AIOps release introduces an advanced filtering option designed to streamline the alert management process. This new feature allows users to set an event count threshold on their alert source, coupled with a specific time window for triggering alerts. For example, you can define a condition such as: “Only generate an alert if there are 10 events within 5 minutes.” This threshold can be adjusted to match the criticality and frequency of events typical to your operational environment.

By implementing this event count threshold-based alerting mechanism, the system efficiently filters out inconsequential alerts, ensuring that only significant events prompt notifications. This selective alerting not only reduces the volume of alerts that need to be manually reviewed but also allows your team to focus their efforts on analyzing and responding to the most critical issues. 
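The "10 events within 5 minutes" condition can be pictured as a sliding-window counter. The class below is a minimal sketch under stated assumptions: the class name, its parameters, and the exact window semantics are hypothetical and are not taken from ilert's product or API. It also assumes events arrive in time order.

```python
from collections import deque

class EventCountThreshold:
    """Signal an alert once `min_events` events arrive within `window_seconds`."""

    def __init__(self, min_events, window_seconds):
        self.min_events = min_events
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def observe(self, timestamp):
        """Record one event (seconds since any fixed epoch); return True if the threshold is met."""
        self._timestamps.append(timestamp)
        # Drop events that fell out of the window ending at the newest event.
        while self._timestamps and timestamp - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.min_events

# Ten events spaced 20 seconds apart all fall within five minutes.
rule = EventCountThreshold(min_events=10, window_seconds=300)
burst = [rule.observe(t) for t in range(0, 200, 20)]

# One event per minute never accumulates ten events in any five-minute window.
slow_rule = EventCountThreshold(min_events=10, window_seconds=300)
slow = [slow_rule.observe(t) for t in range(0, 3000, 60)]

print(burst[-1], any(slow))
```

Only the tenth event of the burst meets the condition, while the slow trickle never does. What happens after the threshold is reached (for example, whether the counter resets) is a product decision this sketch does not model.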

When Everything is on Fire, Let ilert Speak

Incidents are an inevitable part of managing any complex system, and the ability to communicate effectively during them is crucial. That's why the AIOps add-on offers advanced features for incident communication. These include fast preparation of an incident summary and a list of affected services, so that engineers don't have to search for the right words to update the status page. ilert AI assistance in post-mortem document creation is also part of the AIOps suite, helping users cover the full life cycle of incident response. Read more about post-mortem and AI-backed incident communication features on the blog.

When AIOps is a Must-Have

To simplify your team's decision-making, we prepared a list of signals indicating that you need to use advanced AIOps features for incident management. 

  • Your team uses various monitoring tools that generate overlapping alerts.
  • Engineers are inundated with a large number of daily alerts, making it challenging to identify and prioritize critical issues. Your MTTA (Mean Time To Acknowledge) is too high.
  • Your team is relatively small and struggles to effectively manage and respond to the high volume of alerts.
  • A significant proportion of alerts are false positives, leading to unnecessary distractions.
  • Your team is struggling to distinguish between critical alerts that require immediate attention and non-critical alerts that can be addressed later.
  • Many alerts are generated by temporary, self-resolving issues that do not require intervention.
  • Engineers are experiencing alert fatigue, leading to desensitization and missed critical issues. 

We hope this list will be helpful for evaluating the AIOps add-on for your organization. If you have additional questions, feel free to contact the ilert support team.

If you are curious about how all those AI features are built and function, we presented a thorough technical feature overview in Paris this summer. 
