Reduce Noise through Intelligent Alert Grouping

Discover how AI can tackle the challenge of alert noise by filtering and merging redundant notifications. This article highlights the role of text embedding models in enhancing alert management efficiency.

Zsuzsanna Borovszki

Sep 12, 2024 • 5 min read

Understanding Alert Noise and Its Impact

In an ideal world, every alert would signal a unique and critical issue. However, in reality, alerts often come in waves. Alert noise refers to the overwhelming volume of notifications that incident response teams receive, many of which may be redundant or irrelevant. This can lead to alert fatigue, where critical issues might be overlooked due to the sheer number of notifications.

Reducing alert noise can help your team:

1. Focus: By grouping similar alerts, teams can concentrate on resolving incidents instead of sifting through noise.

2. Efficiency: Fewer, more relevant alerts lead to quicker decision-making and faster incident resolution.

3. Reduce stress: A more manageable flow of alerts minimizes the risk of alert fatigue, where important issues might be overlooked due to overwhelming notification volume.

Alert Deduplication and Processing

Excessive alert noise, caused by multiple similar notifications, can overwhelm incident response teams. Rather than bombarding teams with notifications for every problem, deduplication merges these alerts into a single, actionable item. This process relies on the semantic similarity of events, meaning that it groups alerts that convey the same meaning, even if they differ in wording. ilert employs AI-driven techniques to compare alerts, merging those that are similar.

Understanding Embedding Models

Embedding models are the backbone of AI-driven alert deduplication. These models translate human language into numerical representations, or vectors, that capture the meaning of the text. By leveraging these vectors, systems can effectively compare and group related alerts, enabling more precise and meaningful deduplication that cuts through the noise.

Vector embeddings are mathematical representations of data in a high-dimensional space, where each piece of data, whether it's a word, sentence, or document, is represented as a point in this space. The magic of embeddings lies in their ability to position similar items close to each other, making it easier to identify and group related data. For example, an embedding model can transform complex text, like an alert message, into a vector, enabling the system to group and deduplicate alerts that convey the same information.
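To make the idea of "similar items sit close together" concrete, here is a minimal Python sketch. The three-dimensional vectors are made up for illustration (a real embedding model produces hundreds or thousands of dimensions), and the nearest helper is a hypothetical name, not part of any library.

```python
import math

# Made-up 3-dimensional vectors for illustration only; a real embedding
# model would produce these values and use far more dimensions.
embeddings = {
    "Disk usage above 90% on db-01": [0.9, 0.1, 0.0],
    "db-01 is running out of disk space": [0.8, 0.2, 0.1],
    "Login service returns HTTP 500": [0.1, 0.9, 0.3],
}


def nearest(query: str) -> str:
    """Return the other alert whose vector lies closest to the query's vector."""
    others = (text for text in embeddings if text != query)
    return min(others, key=lambda text: math.dist(embeddings[query], embeddings[text]))


print(nearest("Disk usage above 90% on db-01"))
```

With these toy values, the two disk-space alerts end up as nearest neighbors even though they share few words, which is the property deduplication relies on.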

Implementing Alert Deduplication

1) Preprocessing Alerts

The first step in deduplication is preprocessing. This involves normalizing the format of incoming alerts and cleaning the data by removing irrelevant elements like timestamps and IDs. By doing this, you ensure that all alerts are comparable and ready for accurate deduplication.
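A preprocessing step along these lines could look like the following Python sketch. The regular expressions are assumptions about what timestamps and IDs look like in your alerts; they are illustrative and would need to be adapted to the formats your monitoring tools actually emit.

```python
import re

# Hypothetical patterns: adjust them to the alert formats you actually receive.
TIMESTAMP = re.compile(
    r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?"
)
UUID = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
    re.IGNORECASE,
)
NUMERIC_ID = re.compile(r"\b(?:id|request|trace)[=:#]\s*\d+\b", re.IGNORECASE)


def preprocess(alert_text: str) -> str:
    """Remove volatile fields, then normalize case and whitespace."""
    text = TIMESTAMP.sub(" ", alert_text)
    text = UUID.sub(" ", text)
    text = NUMERIC_ID.sub(" ", text)
    return " ".join(text.lower().split())


print(preprocess("2024-09-12T10:15:00Z CPU usage high on web-01 id=48213"))
```

Two alerts that differ only in their timestamp and ID now produce the same string, so the embedding step compares what the alerts say rather than when they were sent.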

2) Generating Text Embeddings

After preprocessing, each alert is transformed into a vector embedding using a pre-trained model, such as BERT or one of OpenAI's embedding models. These vectors represent the meaning of the alerts, allowing for effective comparison and grouping during deduplication.

3) Implementing Deduplication Logic

Once alerts are vectorized, the system compares them using a similarity measure such as cosine similarity. If the similarity of two alerts meets a predefined threshold, they are merged into a single alert. This threshold can be tuned to balance the two kinds of error: merging alerts that are actually different and missing alerts that are actually duplicates.
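The comparison step can be sketched in a few lines of Python. This is one simple way to do it, not a description of how ilert implements grouping: the greedy strategy, the group_alerts name, and the default threshold of 0.9 are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_alerts(vectors, threshold=0.9):
    """Greedy grouping: an alert joins the first group whose first member
    is similar enough; otherwise it starts a new group."""
    groups = []  # each group is a list of indices into `vectors`
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine_similarity(vectors[group[0]], vec) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Toy vectors standing in for real alert embeddings.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(group_alerts(vectors, threshold=0.9))
```

Raising the threshold makes merging stricter: with the same toy vectors and a threshold of 0.999, every alert stays in its own group.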

4) Continuous Feedback and Optimization

A feedback loop is necessary because it lets operators flag missed duplicates and false positives, so the system can keep improving by adjusting thresholds and fine-tuning the embedding models.

Key Considerations for Effective Deduplication

While embedding models are a powerful tool for deduplication, several key issues need to be addressed:

Which Model to Choose? The right choice of embedding model will determine how well your deduplication process works. Fine-tuned or domain-specific models are better able to capture the nuanced information of your alerts, improving the deduplication outcomes.
What Threshold is Optimal? Establishing the appropriate threshold is essential. When a threshold is set too low, different warnings may be mistakenly combined, while a threshold set too high may result in duplicates being missed. Finding the ideal balance requires ongoing testing and tweaking.
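One way to turn operator feedback into a threshold is to score candidate thresholds against pairs of alerts that operators have already labeled. The sketch below uses the F1 score as the balance between wrongly merged alerts and missed duplicates; the choice of metric, the function names, and the sample data are all assumptions for illustration.

```python
def f1_at(threshold, labeled_pairs):
    """F1 score of 'merge if similarity >= threshold' on labeled pairs.

    labeled_pairs holds (similarity, is_duplicate) tuples from operator feedback.
    """
    tp = sum(1 for sim, dup in labeled_pairs if sim >= threshold and dup)
    fp = sum(1 for sim, dup in labeled_pairs if sim >= threshold and not dup)
    fn = sum(1 for sim, dup in labeled_pairs if sim < threshold and dup)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(labeled_pairs, candidates):
    """Pick the candidate threshold with the highest F1 score."""
    return max(candidates, key=lambda t: f1_at(t, labeled_pairs))


# Made-up feedback: similarity of a pair and whether operators called it a duplicate.
feedback = [(0.97, True), (0.93, True), (0.88, False), (0.85, True), (0.70, False)]
print(best_threshold(feedback, [0.6, 0.8, 0.9, 0.95]))
```

If wrongly merged alerts are more costly for your team than missed duplicates, a metric that weights precision more heavily would be a reasonable substitute for F1.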

Reducing Noise with ilert

ilert AI offers a powerful solution for reducing alert noise through its advanced deduplication and alert management features. By integrating with your monitoring tools, ilert normalizes incoming alerts and uses AI-driven techniques to identify and merge duplicate notifications. This process significantly cuts down on the volume of alerts, allowing your team to focus on incident resolution.

With ilert, you can ensure that only the most relevant alerts reach your team, reducing the risk of missed critical issues and enhancing overall incident response efficiency.

Engineering

How to Import Existing ilert Resources into Terraform

A comprehensive guide on how to add your existing incident management configurations to your Infrastructure as Code project

Daria Yankevich

Aug 28, 2024 • 5 min read

Welcome to our detailed guide, which will help you incorporate your current ilert configurations for incident management into Terraform. Here, you will find a step-by-step tutorial to import your existing ilert resources to the Infrastructure as Code project and recommendations from our engineering team on best practices to maintain consistency across your infrastructure and incident management processes.

If you are yet to start incorporating IaC practices in your organization, we recommend beginning with this ilert Terraform provider overview.

What problem do we solve?

The most common case is when users start their journey with the ilert incident management platform through the user interface and incorporate their established setup into Infrastructure as Code practices later. The ilert UI is typically more user-friendly and intuitive, making it quicker for engineers to create and configure resources like alert policies or on-call schedules. For instance, when experimenting with different settings or making quick changes, using the UI is faster and more straightforward than writing and applying Terraform code. On the other hand, once a resource configuration is stable and well-understood, engineers might prefer to codify it in Terraform for better consistency, version control, and automation across different environments.

Even companies utilizing ilert with IaC practices for years might use a combination of ilert UI and Terraform based on factors like ease of use, the immediacy of needs, the complexity of resource management, and the team's experience level with Terraform. The hybrid approach allows flexibility during initial setup phases or when manual intervention is necessary, while Terraform is favored for long-term consistency and automation.

So, it's perfectly fine that not all your ilert resources are already a part of your Terraform project. However, importing existing resources into an IaC project can be a bit tricky. Common problems include Terraform creating duplicates (new resources) instead of importing the existing ones, and errors like this:

Error: Bad request: api respond with status code: 400, error code: ERROR, message: The email '[example@example.com]' is already used by user 1234567

Let's see how to import existing ilert resources smoothly and avoid these issues.

Step 1: Identify an ilert resource ID you want to import

Let's see how to import an alert source created in the ilert interface into Terraform. Start with identifying a unique ID. You can do it directly in the UI or by using the API.

Method 1: Via ilert UI

Log into your ilert account and navigate to Alert sources in the top menu.
Find the alert source you want to import in the list, or use the search field.
Click on the alert source's name to view its details. Then look at the URL in your browser's address bar, for example https://example.ilert.com/source/view?id=1234567. Copy the number at the end; this is the ID you need.
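If you collect several of these URLs, a small helper can pull the ID out for you. This is a minimal Python sketch; the alert_source_id function is a hypothetical name, and it assumes the ID is always passed in the id query parameter, as in the example URL.

```python
from urllib.parse import parse_qs, urlparse


def alert_source_id(url: str) -> str:
    """Extract the value of the `id` query parameter from an alert source URL."""
    query = parse_qs(urlparse(url).query)
    return query["id"][0]


print(alert_source_id("https://example.ilert.com/source/view?id=1234567"))
```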

Method 2: Via API

Ensure you have an API key. You can generate one in your ilert account settings under the API section.
Make a GET request to the ilert API:
GET https://api.ilert.com/api/v1/alert-sources
The API response will be in JSON format and include all alert sources' details, including their IDs. Look for the "id" field.
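The API method can be scripted with Python's standard library, as in the sketch below. Two things are assumptions rather than facts taken from the ilert documentation: that the API key is sent as-is in the Authorization header, and that the response body is a JSON array of objects with "id" and "name" fields. Check the ilert API reference before relying on either.

```python
import json
import urllib.request

# Endpoint as given in the steps above.
API_URL = "https://api.ilert.com/api/v1/alert-sources"


def build_request(api_key: str) -> urllib.request.Request:
    """Prepare the GET request without sending it."""
    # Assumption: the key goes directly into the Authorization header.
    headers = {"Authorization": api_key, "Accept": "application/json"}
    return urllib.request.Request(API_URL, headers=headers, method="GET")


def extract_ids(response_body: str) -> dict:
    """Map each alert source ID to its name."""
    # Assumption: the body is a JSON array of objects with "id" and "name".
    return {item["id"]: item.get("name") for item in json.loads(response_body)}


def fetch_alert_source_ids(api_key: str) -> dict:
    """Send the request and return {id: name} for all alert sources."""
    with urllib.request.urlopen(build_request(api_key)) as response:
        return extract_ids(response.read().decode("utf-8"))
```

Calling fetch_alert_source_ids with your key would return a dictionary you can scan for the alert source you want to import.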

Step 2: Set up a Terraform block

In Terraform, a "block" is a section of code that defines a specific piece of configuration. Blocks are the basic units of Terraform configuration files. Each block usually starts with a keyword that specifies what type of resource or setting you are configuring, followed by the details of that configuration enclosed in curly braces {}.

In your Terraform configuration file (e.g., main.tf), define a resource block for the alert source:

resource "ilert_alert_source" "example_alert_source" {
  name                 = "Critical Server Alerts"
  integration_type     = "API"
  escalation_policy_id = 1234 # Replace with the actual escalation policy ID
  auto_resolve_timeout = 900  # Time in seconds before automatically resolving the alert

  email_notification {
    email = "alerts@example.com"
  }

  sms_notification {
    phone_number = "+1234567890"
  }
}

Step 3: Execute the Terraform Import

In your terminal or command line, navigate to the directory containing your Terraform configuration files. Then, execute the following command:

terraform import ilert_alert_source.example_alert_source <ALERT_SOURCE_ID>

"ilert_alert_source.example_alert_source" refers to the Terraform resource you defined in your ".tf" file. Replace <ALERT_SOURCE_ID> with the actual ID of the alert source from ilert that you noted earlier.

Note that while in 99% of cases the import key (identifier) is the same as the entity's ID, the two sometimes differ. You can find the import description at the bottom of each resource's page in the ilert Terraform provider documentation.
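When you have many resources to import, you may prefer to generate the commands rather than type them. The Python sketch below only builds the command strings from a mapping of Terraform resource addresses to import IDs; it does not run Terraform, and the import_commands helper is a hypothetical name.

```python
import shlex


def import_commands(resources: dict) -> list:
    """Build one `terraform import` command per resource address."""
    return [
        shlex.join(["terraform", "import", address, str(import_id)])
        for address, import_id in resources.items()
    ]


commands = import_commands({"ilert_alert_source.example_alert_source": 1234567})
print("\n".join(commands))
```

Each address must already have a matching resource block in your configuration, as described in Step 2, before the generated command will succeed.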

Step 4: Complete the Configuration

After importing, Terraform knows about the existing alert source. However, the configuration file itself might not have all the details yet.

Run terraform plan to see how the imported resource in Terraform's state compares with your configuration. The output lists every attribute where the two differ.

Based on the output of terraform plan, update the resource block in your .tf file until the plan reports no changes for the imported resource.

By following these steps, you can import an existing ilert alert source into Terraform and manage it as part of your Infrastructure as Code (IaC) setup. This process helps maintain consistency, allows easier updates, and integrates the alert source into your version-controlled infrastructure management.

Best practices and recommendations

Should I generate all the ilert entities via Terraform? How do other teams automate setting up and configuring their incident management workflows within IaC practices? We addressed these questions to ilert's CTO, Christian.

‍"It largely depends on the company structure.
‍
If you have a centralized Ops team that handles tasks such as user and team synchronization (via Terraform, API, SSO provisioning, etc.), then you could theoretically also consider having that team manage all other resources, especially policies, alert sources and alert actions.

However, we advise against this if there are independent responder teams. In our opinion, it’s best practice for team-relevant resources to be managed by the team members themselves. In larger organizations, we also see teams with DevOps skills who manage this individually through their own Terraform configurations, but some teams exclusively use the ilert UI.

In cases with responders who do not directly interact with the resources and only deal with alerts and incidents or in organizations where the number of alert sources remains manageable, we also see customers using complete Terraform configurations for all account resources.

There are even setups where Ops teams have fully automated the ilert onboarding, starting with assigning the correct escalation policy to alert sources based on, for example, Prometheus labels. Another example is DevOps teams submitting pull requests to the Terraform repository to provision themselves independently via GitHub Actions.

Personally, I believe that centralized Ops teams also benefit when responders take responsibility for their own alert sources and services because this leads to greater engagement with the platform, which in turn results in better workflows and faster response times."

Product

Ubidots: New IIoT Integration in ilert's Catalog

The seamless integration between ilert and Ubidots aims to streamline your operations, reduce machines' downtime, and improve overall efficiency.

Daria Yankevich

Aug 13, 2024 • 5 min read

We are excited to add one more integration from the Industrial Internet of Things realm to our catalog! The seamless integration between ilert and Ubidots aims to streamline your operations, reduce machines' downtime, and improve overall efficiency.

What is Ubidots?

Ubidots is an innovative Internet of Things (IoT) platform that allows users to collect, analyze, and visualize data from their devices and sensors. It offers tools for building IoT applications, including real-time data visualization, cloud-based data storage, and advanced data analytics.

Users connect their hardware with the Ubidots platform using HTTP, MQTT, TCP, UDP, or by parsing custom/industrial protocols. The service works equally well for managing one or a thousand devices. Ubidots has various use cases. For example, with its help, companies track air and water quality, monitor the location and status of valuable assets, optimize the consumption and production of energy resources, and much more.

How Ubidots' Users Can Benefit from the Integration with ilert

Imagine a manufacturing company using Ubidots to monitor the performance of their machinery. Sensors placed on critical equipment collect data in real-time, providing insights into operating conditions and performance metrics. By connecting Ubidots with ilert, an alert is sent out immediately through multiple channels when an abnormal pattern is detected—such as a spike in temperature or vibration indicating a potential failure. An on-call technician receives the alert via SMS and phone call, ensuring they are aware of the issue even if their phone is on mute. The technician can then respond swiftly, checking the equipment and performing necessary maintenance before a failure occurs. After resolving the issue, the team can review the detailed post-incident report generated by ilert to understand the root cause and take steps to prevent future occurrences.

The integration between Ubidots and ilert enhances the manufacturers' ability to respond to incidents quickly and efficiently. Here are a few key features of the integration.

Actionable alerts. When there is an issue with a machine or sensor, users of the ilert integration for Ubidots receive real-time actionable alerts. They can accept or reroute an alert to another engineer without logging into ilert.

Live dashboards. Active monitoring becomes an intuitive and simple task thanks to Ubidots' drag-n-drop dashboards and its broad offer of widgets. Bring your SCADAs to the cloud and be in touch with your operation from anywhere.

Automated on-call management. ilert eliminates the manual effort and errors associated with managing on-call duties. The schedules are always at hand, and automatic reminders ensure users never miss an on-call duty.

Status pages. Ubidots' customers can easily update stakeholders on the status of machines via ilert private and public status pages. ilert status pages communicate incidents on auto-pilot, and there are various authentication options for access fine-tuning, like passwordless email login or whitelisted IP addresses.

AIOps. For those who run hundreds of devices and regularly deal with large volumes of alerts, ilert's intelligent alert grouping and filtering can help reduce alert noise and better allocate engineering resources.

Integrating ilert with Ubidots brings a new level of efficiency and responsiveness to IoT monitoring and incident management. If you are new to Ubidots, start a free 30-day trial here.

For more information on setting up this integration, visit our integration guide.

Product

Intelligent Alerting, Fewer Headaches: Insider View at ilert AIOps

The new add-on aims to reduce stress during incidents and helps manage them smarter.

Daria Yankevich

Aug 09, 2024 • 5 min read

You might have noticed that we released a series of AI-supported features last year. Intelligent alert grouping, developed to reduce alert fatigue, is the icing on the cake.

With it, we combined all ilert AI features in a new powerful add-on that aims to reduce stress and give more clarity during IT incidents.

This blog post will provide a complete guide on the features included in the brand-new AIOps add-on, explain how those features are built and function, and help you evaluate if it's worth investing in.

How ilert Already Resolves the Problem of Alert Duplication

Alert duplication happens when multiple alerts for the same issue are generated by different monitoring systems or redundant checks within the same system. For example, if a server goes down, alerts might be sent from the server's own monitoring tool, the network monitoring system, and the application performance monitoring system. This creates a flood of notifications for a single problem. As a result, IT teams become overwhelmed and desensitized to alerts.

Alert fatigue increases the risk of critical alerts being missed or ignored, slowing down the incident resolution and potentially causing more significant issues if the underlying problem remains unaddressed. Managing alert duplication is essential to maintaining focus on genuine incidents and ensuring efficient incident response.

ilert itself is already one step towards reducing the impact of the alert noise problem. The platform provides centralized alert management by aggregating alerts from various monitoring tools, ensuring all alerts are visible in one place. Intelligent grouping is a new protective layer indispensable for teams managing vast volumes of alerts.

Intelligent Grouping: AI Looks Deep into Alerts

ilert's intelligent grouping feature employs a sophisticated approach to minimize duplication by deeply analyzing alerts' content. The AI looks beyond surface-level data, examining the context and underlying details of alerts to intelligently combine them into unified groups.

This new approach is based on text embedding models, a type of machine learning model that represents complex data such as words, sentences, or documents as dense vectors of real numbers. These vectors have far fewer dimensions than sparse representations like one-hot word encodings, yet they capture the semantic relationships between data points, meaning that similar items are placed closer together in the vector space.

If an ilert user enables the intelligent alert grouping feature for their alert source, a whole new process runs under the hood.

How Does It Work?

Alerts pass through four stages when intelligent grouping is enabled.

1. Pre-Processing. Pre-processing involves normalizing and cleaning alerts. Being a centralized alert management platform, ilert already normalizes alerts across multiple alert sources into a common format. For intelligent alert grouping, we remove alert fields that are not relevant for grouping, e.g. timestamps or IDs.

2. Vectorization. Each incoming alert is transformed into a vector. The model used in ilert is trained on large datasets and can capture a wide range of semantic meanings, making it suitable for encoding the information contained in alerts.

3. Adjusting to ilert deduplication logic. There are various adjustments to how exactly alerts are combined into groups. For example, ilert users can fine-tune when two alerts are considered duplicates by setting a threshold and previewing how their threshold would affect grouping based on past alerts. ilert AI will proceed with deduplication depending on how the threshold score is adjusted.

4. Feedback loop. We make it very easy to provide feedback on whether an alert was correctly grouped or not and use this feedback to further fine-tune and improve the deduplication feature.

Video: How to Enable Intelligent Alert Grouping

Our documentation contains text instructions on how to switch on the feature. We have also prepared a video tutorial for you.

Event Filter: Get Rid of Unimportant Noise

Occasionally, marking alerts as low priority isn't sufficient, and it becomes necessary to discard events entirely. For example, Grafana's DatasourceNoData can be such an event. Therefore, you can set up one or multiple event filter groups for your alert source to ensure that only relevant events are processed into alerts.

The latest AIOps release introduces an advanced filtering option designed to streamline the alert management process. This new feature allows users to set an event count threshold on their alert source, coupled with a specific time window for triggering alerts. For example, you can define a condition such as: “Only generate an alert if there are 10 events within 5 minutes.” This threshold can be adjusted to match the criticality and frequency of events typical of your operational environment.

By implementing this event count threshold-based alerting mechanism, the system efficiently filters out inconsequential alerts, ensuring that only significant events prompt notifications. This selective alerting not only reduces the volume of alerts that need to be manually reviewed but also allows your team to focus their efforts on analyzing and responding to the most critical issues.
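The event count condition can be pictured as a sliding time window. The Python sketch below illustrates the general technique only; it is not ilert's implementation, and the EventCountFilter class and its parameters are hypothetical names. It assumes events arrive in chronological order and that timestamps are given in seconds.

```python
from collections import deque


class EventCountFilter:
    """Signal an alert only once enough events fall inside a time window."""

    def __init__(self, min_events: int, window_seconds: float):
        self.min_events = min_events
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def should_alert(self, event_time: float) -> bool:
        """Record an event and report whether the window now holds enough events."""
        self._timestamps.append(event_time)
        # Drop events that are older than the window ending at event_time.
        while event_time - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.min_events


# "Only generate an alert if there are 10 events within 5 minutes."
event_filter = EventCountFilter(min_events=10, window_seconds=300)
```

Each incoming event would be passed to should_alert with its timestamp, and an alert is created only when the call returns True.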

When Everything is on Fire, Let ilert Speak

Incidents are an inevitable part of managing any complex system, and the ability to communicate effectively during them is crucial. That's why the AIOps add-on offers advanced features for incident communication. These include fast preparation of an incident summary and list of affected services so that engineers don't have to find proper words to update the status page. Additionally, the ilert AI assistance in post-mortem document creation is also included in the AIOps suite to help users cover a full life-cycle of incident response. Find more about post-mortem and AI-backed incident communication features in the blog.

When AIOps is a Must-Have

To simplify your team's decision-making, we prepared a list of signals indicating that you need to use advanced AIOps features for incident management.

Your team uses various monitoring tools that generate overlapping alerts.
Engineers are inundated with a large number of daily alerts, making it challenging to identify and prioritize critical issues. Your MTTA (Mean Time To Acknowledge) is too high.
Your team is relatively small and struggles to effectively manage and respond to the high volume of alerts.
A significant proportion of alerts are false positives, leading to unnecessary distractions.
Your team is struggling to distinguish between critical alerts that require immediate attention and non-critical alerts that can be addressed later.
Many alerts are generated by temporary, self-resolving issues that do not require intervention.
Engineers are experiencing alert fatigue, leading to desensitization and missed critical issues.

We hope this list will be helpful for evaluating the AIOps add-on for your organization. If you have additional questions, feel free to contact the ilert support team.

If you are curious about how all those AI features are built and function, we presented a thorough technical feature overview in Paris this summer.

Engineering

Alerting with Twilio: Connect Your Monitoring with the Top-1 Communications Platform

Pros and cons of enabling direct notifications for critical alerts

Daria Yankevich

Aug 06, 2024 • 5 min read

You might be surprised: why does ilert, a platform dedicated to alerting and incident management, publish anything about connecting monitoring solutions directly to Twilio, bypassing an incident management tool? Are they taking the bread out of their own mouth, you might think. Having worked on DevOps incident management since 2009, we believe every solution fits specific needs. So, in this article, we will uncover in what cases direct alerting with Twilio might work well, how to connect Twilio with your monitoring, and when it's time to consider a comprehensive incident management platform.

What does Twilio do?

Twilio is a cloud communications platform that allows developers to integrate various communication methods into their applications. This includes voice, messaging (SMS, MMS, chat), video, and email. Twilio's services are designed to make it easy for developers to add communication features without having to build the infrastructure themselves.

‍

Twilio is an industry leader. It does have competitors, like Vonage (formerly Nexmo), Plivo, Sinch, and MessageBird, but as of July 2024, it is the number one solution according to Gartner. Tens of millions of developers across the globe use it for their products. So, if you have recently received a notification from Airbnb or Uber, there is a high chance Twilio processed it. Incident management platforms such as PagerDuty, VictorOps, and your humble servant ilert also run notifications on Twilio.

How does Twilio work?

Twilio is a cloud-based platform that integrates various communication methods into applications using a set of APIs. Users start by creating an account on the Twilio website and accessing their unique Account SID and Auth Token from the Twilio Console, which are used for API authentication. Developers then choose the desired communication service and, if necessary, purchase phone numbers via the Twilio Console. Using Twilio's SDKs for different programming languages, they can write code to send requests to Twilio’s API endpoints, facilitating actions like sending SMS, making voice calls, or initiating video conferences.
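As a rough illustration of what the SDKs do under the hood, the sketch below builds the HTTP request by hand without sending it. It assumes Twilio's public REST endpoint for creating a message and HTTP Basic authentication with the Account SID and Auth Token; the helper name buildSmsRequest and all values passed to it are placeholders for illustration only.

```javascript
// Build (but do not send) the HTTP request for creating an SMS via Twilio's REST API.
// buildSmsRequest is a hypothetical helper; the SID, token, and numbers are placeholders.
function buildSmsRequest(accountSid, authToken, from, to, body) {
  const credentials = Buffer.from(`${accountSid}:${authToken}`).toString("base64");
  return {
    url: `https://api.twilio.com/2010-04-01/Accounts/${accountSid}/Messages.json`,
    method: "POST",
    headers: {
      // HTTP Basic auth: Account SID as the user name, Auth Token as the password.
      Authorization: `Basic ${credentials}`,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    // The message parameters are sent form-encoded as To, From, and Body.
    body: new URLSearchParams({ To: to, From: from, Body: body }).toString(),
  };
}

const request = buildSmsRequest(
  "ACxxxx", "auth_token", "+15550000000", "+15551111111", "CPU above threshold"
);
console.log(request.method, request.url);
// To actually send it on Node.js 18+: fetch(request.url, request)
```

In practice the SDK shown later in this article wraps this request, adds retries, and parses the response, which is why it is usually the more convenient choice.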

Pros: Bypass Other Tools and Connect Your Monitoring with Twilio

Developers consider using Twilio alerting for incident management purposes for multiple reasons. Here are the most prominent of them.

‍

Cost-effective. Twilio's pay-as-you-go pricing model makes it ideal for startups that don't need many notifications.

Simplicity. The fewer dependencies you have, the better. Direct integration reduces the number of tools and platforms that need to be managed, simplifying the overall system architecture.

Complete control over data flow. You see and take care of the event flow yourself.

Easy to implement. Twilio has a straightforward API and extensive documentation.

Cons: Alerting is not Yet Incident Management

While it might be convenient to receive an alert right from the monitoring tool, there are incident management protocols that cannot be followed without proper tools. Don't get us wrong: protocols without practical application are worth nothing, but millions of IT incidents have taught the DevOps community how to approach critical situations and reduce incidents' impact. At the end of the day, engineers need alerting not for fun but to be aware of serious issues that affect the business (read: service availability, customer satisfaction, and revenue), so the stakes are high. Here are the disadvantages of using Twilio as a standalone incident management tool.

‍

No escalation possibilities. Advanced escalation policies, such as routing alerts based on on-call schedules or incident severity, are not supported out of the box.

No centralized incident management. Unlike dedicated platforms, Twilio does not offer features like incident tracking, automatic resolution workflows, status pages, or post-incident analysis. Developers will have to handle all these manually and, ironically, in many cases, this will require purchasing a few additional tools.

Custom development and maintenance. Setting up and maintaining direct integrations requires custom scripting and ongoing development work. The same goes for keeping custom integrations up-to-date with changes in monitoring tools or Twilio APIs.

Scalability issues. While Twilio can handle large volumes of messages, managing and processing a high volume of alerts directly can be challenging.

Alert fatigue. This is connected to the previous point. Without sophisticated filtering, grouping, and deduplication features, there's a risk of receiving too many alerts. Imagine being woken up several nights in a row or being constantly interrupted during the working day.

Limited collaboration features. After receiving alerts, developers have to take action. In most cases, IT incidents are not handled by one person alone. The lack of a centralized communication space where all alert details and the timeline are available to engineers may lead to communication gaps and inefficiencies in coordinating incident response.

Missing decoupled infrastructure and high availability. It's often overlooked that hosting alerting scripts or software on the same hardware or in the same datacenter as other software can be problematic. If there's downtime, the alerting system is likely to be affected too, causing missed alerts. Additionally, maintaining uptime above 99.9% becomes more challenging.

Geographical limitations. If you have a distributed team, it might be complicated to set up SMS and voice alerting in many countries. Different regional policies and restrictions exist, some of which prohibit calling or delivering messages.

Harder to adhere to an SLA commitment. Twilio can also experience downtime. In such situations, incident management platforms like ilert have a backup plan and can automatically switch to a different provider to minimize outage for clients. Relying solely on Twilio makes it challenging to guarantee a high uptime percentage to your customers, as your uptime depends heavily on the service.
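To give a sense of the custom work this implies, here is a minimal sketch of just one of the missing pieces: a time-based escalation policy. Everything in it is hypothetical (the policy shape, the phone numbers, and the dueRecipients helper); none of it is part of Twilio's API, and a real setup would also need on-call schedules, acknowledgement tracking, and persistent state.

```javascript
// Hypothetical escalation policy: each step names a responder and the delay
// (in minutes after the alert fired) at which that responder gets notified.
const examplePolicy = [
  { afterMinutes: 0, notify: "+15550000001" },  // primary on-call
  { afterMinutes: 10, notify: "+15550000002" }, // secondary on-call
  { afterMinutes: 30, notify: "+15550000003" }, // engineering manager
];

// Return the phone numbers that should have been notified by now.
// An acknowledged alert stops the escalation, so nobody is due.
function dueRecipients(policy, minutesSinceAlert, acknowledged) {
  if (acknowledged) return [];
  return policy
    .filter((step) => step.afterMinutes <= minutesSinceAlert)
    .map((step) => step.notify);
}

console.log(dueRecipients(examplePolicy, 12, false)); // primary and secondary
console.log(dueRecipients(examplePolicy, 12, true));  // empty: alert was acknowledged
```

Each number returned here would still have to be passed to an SMS or voice call, and the script itself would need to run somewhere reliable on a timer.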

‍

Are you still unsure whether to proceed with Twilio or with a more advanced alerting and incident management platform? We have simplified the decision for you with the brief checklist below. If you can't tick all the boxes, we recommend deciding in favor of an incident management platform.

‍

You are a small company with no more than 2–3 engineers.
You have a single monitoring tool and don't plan to add more in the next year.
Your monitoring solutions fire fewer than 250 alerts per month.
Your responders are all in the same region.
Your use case doesn't require a high uptime SLA guarantee.

Step-by-step Instructions on How to Send Alerts via Twilio

Go to the Twilio website and sign up for an account.
After signing up, you will get your Account SID and Auth Token. Keep these credentials safe.
Ensure you have Node.js installed on your machine. You can download it from the official Node.js website.
Initialize a new Node.js project and install the Twilio library via npm install twilio. Then, set up a Twilio client in your script using your credentials:


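const twilio = require("twilio");

// Replace the placeholders with the Account SID and Auth Token from the Twilio Console.
const client = twilio("ACCOUNT_SID", "AUTH_TOKEN", {
  autoRetry: true, // retry automatically when Twilio rate-limits a request (HTTP 429)
  maxRetries: 3,   // stop after three retries
});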

Use the client to send an SMS using Twilio:


function sendSmsAlert(message, to) {
  client.messages.create({
      body: message,
      to,  // recipient's phone number in E.164 format
      from: "YOUR_TWILIO_NUMBER"
  })
  .then((message) => console.log(`Alert sent: ${message.sid}`))
  .catch((error) => console.error(`Failed to send alert: ${error.message}`));
}

sendSmsAlert("Server CPU usage is above threshold", "+1234567890");

Or use the client to call using Twilio:


function makeVoiceCallAlert(to) {
  client.calls.create({
      url: "http://demo.twilio.com/docs/voice.xml", // URL of TwiML instructions
      to, // recipient's phone number in E.164 format
      from: "YOUR_TWILIO_NUMBER"
  })
  .then((call) => console.log(`Alert call initiated: ${call.sid}`))
  .catch((error) => console.error(`Failed to initiate alert call: ${error.message}`));
}

makeVoiceCallAlert("+1234567890");

If you are using a monitoring tool like Prometheus, Nagios, or another system, you can integrate the SMS sending or phone calling logic within the alert handler or use a webhook to trigger the sendSmsAlert or makeVoiceCallAlert function.
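The webhook route can be sketched roughly as follows. This example assumes the incoming payload follows the Prometheus Alertmanager webhook format (an alerts array whose entries carry status, labels, and annotations); the /alerts path, the port, and the helper names buildSmsMessages and createWebhookServer are hypothetical. The send callback is injected so it can wrap sendSmsAlert or makeVoiceCallAlert from the steps above.

```javascript
const http = require("http");

// Turn an Alertmanager-style webhook payload into short SMS texts,
// one per firing alert. Resolved alerts are skipped.
function buildSmsMessages(payload) {
  const alerts = Array.isArray(payload.alerts) ? payload.alerts : [];
  return alerts
    .filter((alert) => alert.status === "firing")
    .map((alert) => {
      const name = (alert.labels && alert.labels.alertname) || "unknown alert";
      const summary = (alert.annotations && alert.annotations.summary) || "no summary";
      // Keep each text within a single 160-character SMS segment.
      return `[FIRING] ${name}: ${summary}`.slice(0, 160);
    });
}

// Create an HTTP server that accepts POST /alerts and hands each text to `send`.
function createWebhookServer(send) {
  return http.createServer((req, res) => {
    if (req.method !== "POST" || req.url !== "/alerts") {
      res.statusCode = 404;
      res.end();
      return;
    }
    let body = "";
    req.on("data", (chunk) => { body += chunk; });
    req.on("end", () => {
      try {
        buildSmsMessages(JSON.parse(body)).forEach((text) => send(text));
        res.statusCode = 204;
        res.end();
      } catch (err) {
        // Malformed JSON: reject the request instead of crashing the process.
        res.statusCode = 400;
        res.end("invalid JSON payload");
      }
    });
  });
}

// Usage (hypothetical port and recipient):
// createWebhookServer((text) => sendSmsAlert(text, "+1234567890")).listen(9095);
```

Note that this sketch sends one SMS per firing alert with no deduplication or rate limiting, which is exactly where the alert fatigue problem described earlier starts.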

Summary

Twilio is a reliable solution with a strong market presence. In some cases, it can work well as a standalone entry point into IT and DevOps alerting. Small teams with limited budgets, a low volume of alerts, and only a single monitoring tool will benefit from using Twilio for alerting. In contrast, teams that handle large volumes of events from various monitoring solutions, need comprehensive communication during incidents, and face high financial and reputational risks should consider incident management platforms to mitigate downtime.

‍

This is a reminder that ilert offers a Free plan for small teams. With it, you can handle up to 100 SMS and voice messages, unlimited push and email notifications, use as many monitoring integrations as you like and take advantage of a status page. Learn more about ilert's pricing.



Latest Posts

Insights

Reduce Noise through Intelligent Alert Grouping

Zsuzsanna Borovszki

Sep 12, 2024 • 5 min read

Understanding Alert Noise and Its Impact

‍

Reducing alert noise can help your team:

1. Focus: By grouping similar alerts, teams can concentrate on resolving incidents instead of sifting through noise.

2. Efficiency: Fewer, more relevant alerts lead to quicker decision-making and faster incident resolution.

3. Reduce stress: A more manageable flow of alerts minimizes the risk of alert fatigue, where important issues might be overlooked due to overwhelming notification volume.

Alert Deduplication and Processing

Understanding Embedding Models

‍

Implementing Alert Deduplication

1) Preprocessing Alerts

‍

2) Generating Text Embeddings

‍

3) Implementing Deduplication Logic

‍

4) Continuous Feedback and Optimization

Key Considerations for Effective Deduplication

While embedding models are a powerful tool for deduplication, several key issues need to be addressed:

‍

Which Model to Choose? The right choice of embedding model will determine how well your deduplication process works. Fine-tuned or domain-specific models are better able to capture the nuanced information of your alerts, improving the deduplication outcomes.
What Threshold is Optimal? Establishing the appropriate similarity threshold is essential. When the threshold is set too low, distinct alerts may be mistakenly merged, while a threshold set too high may result in duplicates being missed. Finding the ideal balance requires ongoing testing and tweaking.
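The effect of the threshold can be illustrated with a small sketch. It assumes cosine similarity as the comparison metric, which is a common choice for embeddings but is not confirmed here as what ilert uses; the three-dimensional vectors are made-up stand-ins for real embeddings, which typically have hundreds of dimensions.

```javascript
// Cosine similarity between two equal-length vectors:
// close to 1 means similar meaning, close to 0 means unrelated.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Treat two alerts as duplicates when their similarity reaches the threshold.
function isDuplicate(embeddingA, embeddingB, threshold) {
  return cosineSimilarity(embeddingA, embeddingB) >= threshold;
}

// Toy vectors (made-up numbers) for two disk alerts and one login alert.
const diskFullA = [0.9, 0.1, 0.0];
const diskFullB = [0.8, 0.2, 0.1];
const loginFailed = [0.0, 0.2, 0.9];

console.log(isDuplicate(diskFullA, diskFullB, 0.9));    // true: similar alerts are merged
console.log(isDuplicate(diskFullA, loginFailed, 0.9));  // false: unrelated alerts stay apart
console.log(isDuplicate(diskFullA, loginFailed, 0.01)); // true: threshold too low, wrong merge
```

The last line shows the failure mode described above: with a threshold that is too permissive, two unrelated alerts collapse into one.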

‍

Reducing Noise with ilert

‍

With ilert, you can ensure that only the most relevant alerts reach your team, reducing the risk of missed critical issues and enhancing overall incident response efficiency.

Engineering

How to Import Existing ilert Resources into Terraform

A comprehensive guide on how to add your existing incident management configurations to your Infrastructure as Code project

Daria Yankevich

Aug 28, 2024 • 5 min read

‍

If you are yet to start incorporating IaC practices in your organization, we recommend beginning with this ilert Terraform provider overview.

What problem do we solve?

‍

Error: Bad request: api respond with status code: 400, error code: ERROR, message: The email '[example@example.com]' is already used by user 1234567

‍

Let's see how to import existing ilert resources smoothly and avoid these issues.

Step 1: Identify an ilert resource ID you want to import

Let's see how to import an alert source created in the ilert interface into Terraform. Start with identifying a unique ID. You can do it directly in the UI or by using the API.

Method 1: Via ilert UI

Log into your ilert account and navigate to Alert sources in the top menu.
Find the alert source you want to import in the list or use the search field.
Click on the alert source's name to view its details. Then look at the URL in your browser's address bar: https://example.ilert.com/source/view?id=1234567. Copy the number at the end; this is the ID you need.

Method 2: Via API

Ensure you have an API key, which can be generated in your ilert account settings under the API section.
Make a GET request to the ilert API:
GET https://api.ilert.com/api/v1/alert-sources
The API response will be in JSON format and include the details of all alert sources, including their IDs. Look for the "id" field.
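A minimal sketch of the API route might look like this. It assumes the response is a JSON array of objects that each carry "id" and "name" fields, and it passes the Authorization header value through unchanged because the exact header format is an assumption here; check the ilert API reference before relying on it. The helper names findAlertSourceId and listAlertSources are hypothetical, and the global fetch requires Node.js 18 or later.

```javascript
// Pick the ID of an alert source by name from the parsed JSON response.
// Assumes an array of objects with "id" and "name" fields.
function findAlertSourceId(alertSources, name) {
  const match = alertSources.find((source) => source.name === name);
  return match ? match.id : null;
}

// Fetch all alert sources from the endpoint named in the steps above.
// `authorization` is sent as-is; its format is an assumption to verify.
async function listAlertSources(authorization, fetchImpl = fetch) {
  const response = await fetchImpl("https://api.ilert.com/api/v1/alert-sources", {
    headers: { Authorization: authorization, Accept: "application/json" },
  });
  if (!response.ok) {
    throw new Error(`ilert API responded with status ${response.status}`);
  }
  return response.json();
}

// Usage (placeholder key and name):
// listAlertSources("YOUR_API_KEY")
//   .then((sources) => console.log(findAlertSourceId(sources, "Critical Server Alerts")));
```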

Step 2: Set up a Terraform block

‍

In your Terraform configuration file (e.g., main.tf), you would define a resource block for the alert source.

‍


resource "ilert_alert_source" "example_alert_source" {
  name                 = "Critical Server Alerts"
  integration_type     = "API"
  escalation_policy_id = 1234 # Replace with the actual escalation policy ID
  auto_resolve_timeout = 900  # Time in seconds before automatically resolving the alert

  email_notification {
    email = "alerts@example.com"
  }

  sms_notification {
    phone_number = "+1234567890"
  }
}

Step 3: Execute the Terraform Import

In your terminal or command line, navigate to the directory containing your Terraform configuration files. Then, execute the following command:

‍

terraform import ilert_alert_source.example_alert_source <ALERT_SOURCE_ID>

‍

Step 4: Complete the Configuration

After importing, Terraform knows about the existing alert source. However, the configuration file itself might not have all the details yet.

‍

Run terraform plan to see what Terraform recognizes about the imported resource. This command shows the differences between the resource's current state and your configuration.

Based on the output of terraform plan, update the resource block in your .tf file with the appropriate configuration.

‍

Best practices and recommendations

‍

Product

Ubidots: New IIoT Integration in ilert's Catalog

The seamless integration between ilert and Ubidots aims to streamline your operations, reduce machines' downtime, and improve overall efficiency.

Daria Yankevich

Aug 13, 2024 • 5 min read

What is Ubidots?

How Ubidots' Users Can Benefit from the Integration with ilert

‍

The integration between Ubidots and ilert enhances the manufacturers' ability to respond to incidents quickly and efficiently. Here are a few key features of the integration.

‍

Integrating ilert with Ubidots brings a new level of efficiency and responsiveness to IoT monitoring and incident management. If you are new to Ubidots, start a free 30-day trial here.

‍

For more information on setting up this integration, visit our integration guide.
