
Our team's favorites

Daria Yankevich

Alerting with Twilio: Connect Your Monitoring with the Top-1 Communications Platform

Pros and cons of enabling direct notifications for critical alerts

Roman Frey

How to Deploy Qdrant Database to Kubernetes Using Terraform: A Step-by-Step Guide with Examples

There is no Terraform deployment guide for Qdrant on the internet, only the Helm variant, so we decided to publish this article.

Christian Fröhlingsdorf

How to Keep Observability Alive in Microservice Landscapes through OpenTelemetry

Observability, beyond its traditional scope of logging, monitoring, and tracing, can be intricately defined through the lens of incident response efficiency—specifically by examining the time it takes for teams to grasp the full context and background of a technical incident.

Daniel Weiß

ITIL vs. DevOps: What is best for your organization?


Latest Posts

Engineering

Alerting with Twilio: Connect Your Monitoring with the Top-1 Communications Platform

Pros and cons of enabling direct notifications for critical alerts

Daria Yankevich
Aug 06, 2024 • 5 min read

You might be surprised. Why does ilert, the platform dedicated to alerting and incident management, publish anything about connecting monitoring solutions directly to Twilio, that is, bypassing an incident management tool? Aren't they taking the bread out of their own mouth, you might think. Having worked on DevOps incident management since 2009, we believe every solution fits specific needs. So, in this article, we will uncover in which cases direct alerting with Twilio might work well, how to connect Twilio with your monitoring, and when it's time to consider a comprehensive incident management platform.

What does Twilio do?

Twilio is a cloud communications platform that allows developers to integrate various communication methods into their applications. This includes voice, messaging (SMS, MMS, chat), video, and email. Twilio's services are designed to make it easy for developers to add communication features without having to build the infrastructure themselves. 

Twilio is an industry leader. It does have competitors, like Vonage (formerly Nexmo), Plivo, Sinch, and MessageBird, but as of July 2024, it is the number-one solution according to Gartner. Tens of millions of developers across the globe use it for their products. So, if you have recently received a notification from Airbnb or Uber, there is a high chance Twilio processed it. Incident management platforms such as PagerDuty, VictorOps, and your humble servant ilert also run notifications on Twilio.

How does Twilio work?

Twilio is a cloud-based platform that integrates various communication methods into applications using a set of APIs. Users start by creating an account on the Twilio website and accessing their unique Account SID and Auth Token from the Twilio Console, which are used for API authentication. Developers then choose the desired communication service and, if necessary, purchase phone numbers via the Twilio Console. Using Twilio's SDKs for different programming languages, they can write code to send requests to Twilio’s API endpoints, facilitating actions like sending SMS, making voice calls, or initiating video conferences.
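
To make the API flow above concrete, here is a minimal sketch of the HTTP request that sits underneath the SDKs: a form-encoded POST to the Messages resource, authenticated with the Account SID and Auth Token via HTTP Basic auth. The helper name buildSmsRequest and the placeholder credentials and phone numbers are assumptions for illustration; in practice, the official SDK used later in this article handles this for you.

```javascript
// Sketch only: builds (but does not send) the request for Twilio's
// "create a Message" REST endpoint. buildSmsRequest is a hypothetical helper.
function buildSmsRequest(accountSid, authToken, { to, from, body }) {
  const url = `https://api.twilio.com/2010-04-01/Accounts/${accountSid}/Messages.json`;
  // HTTP Basic auth: base64("<Account SID>:<Auth Token>")
  const credentials = Buffer.from(`${accountSid}:${authToken}`).toString("base64");
  return {
    url,
    options: {
      method: "POST",
      headers: {
        Authorization: `Basic ${credentials}`,
        "Content-Type": "application/x-www-form-urlencoded",
      },
      // Twilio expects form-encoded parameters named To, From, and Body.
      body: new URLSearchParams({ To: to, From: from, Body: body }).toString(),
    },
  };
}

// Placeholder values, not real credentials or numbers.
const request = buildSmsRequest("ACXXXXXXXX", "your_auth_token", {
  to: "+1234567890",
  from: "+1987654321",
  body: "Disk usage above 90%",
});
console.log(request.options.method, request.url);
// To actually send it (Node.js 18+): fetch(request.url, request.options)
```

Separating request construction from sending keeps the sketch easy to inspect without touching the network.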

Pros: Bypass Other Tools and Connect Your Monitoring with Twilio

Developers consider using Twilio alerting for incident management purposes for multiple reasons. Here are the most prominent of them.

Cost-effective. Twilio's pay-as-you-go pricing model makes it ideal for startups that don't need many notifications.

Simplicity. The fewer dependencies you have, the better. Direct integration reduces the number of tools and platforms that need to be managed, simplifying the overall system architecture.

Complete control over data flow. You see and take care of the event flow yourself.

Easy to implement. Twilio has a straightforward API and extensive documentation.

Cons: Alerting is not Yet Incident Management

While it might be convenient to receive an alert right from the monitoring tool, there are incident management protocols that cannot be followed without proper tools. Don't get us wrong: protocols without practical application are worth nothing, but millions of IT incidents have taught the DevOps community how to approach critical situations and reduce the impact of incidents. At the end of the day, engineers need alerting not for fun but to be aware of serious issues that affect the business (that is, service availability, customer satisfaction, and revenue), so the stakes are high. Here are the disadvantages of using Twilio as a standalone incident management tool.

No escalation possibilities. Advanced escalation policies, such as routing alerts based on on-call schedules or incident severity, are not supported out of the box. 

No centralized incident management. Unlike dedicated platforms, Twilio does not offer features like incident tracking, automatic resolution workflows, status pages, or post-incident analysis. Developers will have to handle all these manually and, ironically, in many cases, this will require purchasing a few additional tools.

Custom development and maintenance. Setting up and maintaining direct integrations requires custom scripting and ongoing development work. The same goes for keeping custom integrations up-to-date with changes in monitoring tools or Twilio APIs.

Scalability issues. While Twilio can handle large volumes of messages, managing and processing a high volume of alerts directly can be challenging.

Alert fatigue. This is connected to the previous point. Without sophisticated filtering, grouping, and deduplication features, there's a risk of receiving too many alerts. Imagine being woken up several nights in a row or being constantly interrupted during the working day.

Limited collaboration features. After receiving alerts, developers have to take action. In most cases, IT incidents are not handled by one person alone. The lack of a centralized communication space where all alert details and the incident timeline are available to engineers may lead to communication gaps and inefficiencies in coordinating incident response.

Missing decoupled infrastructure and high availability. It is often overlooked that hosting alerting scripts or software on the same hardware or in the same data center as your other software can be problematic. If there's downtime, the alerting system is likely to be affected as well, causing missed alerts. Additionally, maintaining uptime above 99.9% becomes more challenging.

Geographical limitations. If you have a distributed team, it might be complicated to set up SMS and voice alerting in many countries. Different regional policies and restrictions exist, some of which prohibit calling or delivering messages.

Harder to adhere to an SLA commitment. Twilio can also experience downtime. In such situations, incident management platforms like ilert have a backup plan and can automatically switch to a different provider to minimize outage for clients. Relying solely on Twilio makes it challenging to guarantee a high uptime percentage to your customers, as your uptime depends heavily on the service.
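
To illustrate the kind of custom logic the points above imply, here is a minimal sketch of time-window deduplication, one of the features you would need to build yourself when alerting directly through Twilio. The function name createDeduplicator, the alert key format, and the five-minute window are assumptions for illustration, not part of Twilio or any monitoring tool.

```javascript
// Hypothetical helper: suppresses repeat notifications for the same alert key
// within a time window. State lives in memory, so it is lost on restart.
function createDeduplicator(windowMs) {
  const lastSent = new Map(); // alert key -> timestamp of last notification

  return function shouldNotify(alertKey, now = Date.now()) {
    const previous = lastSent.get(alertKey);
    if (previous !== undefined && now - previous < windowMs) {
      return false; // duplicate inside the window: stay silent
    }
    lastSent.set(alertKey, now);
    return true;
  };
}

const shouldNotify = createDeduplicator(5 * 60 * 1000); // 5-minute window

// Timestamps (in ms) are passed explicitly to keep the example deterministic.
const results = [
  shouldNotify("web-01:HighCpu", 0), // first occurrence
  shouldNotify("web-01:HighCpu", 60000), // repeat after 1 minute
  shouldNotify("db-01:DiskFull", 60000), // different alert
  shouldNotify("web-01:HighCpu", 360000), // 6 minutes after the first
];
console.log(results);
```

Even this small piece would still need persistence, grouping rules, and tests before it could be trusted in production, which is the maintenance cost described above.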

Are you still unsure whether to proceed with Twilio or with a more advanced alerting and incident management platform? We have simplified the decision for you with the brief checklist below. If you don't tick all the boxes, we recommend deciding in favor of an incident management platform.

  1. You are a small company with no more than 2–3 engineers.
  2. You have a single monitoring tool and don't plan to add more in the next year.
  3. Your monitoring solutions fire fewer than 250 alerts per month.
  4. Your responders are all in the same region.
  5. Your use case doesn't require a high uptime SLA guarantee.

Step-by-step Instructions on How to Send Alerts via Twilio

  1. Go to the Twilio website and sign up for an account.
  2. After signing up, you will get your Account SID and Auth Token. Keep these credentials safe.
  3. Ensure you have Node.js installed on your machine. You can download it from the official Node.js website.
  4. Initialize a new Node.js project and install the Twilio library via npm install twilio. Then, set up a Twilio client in your script using your credentials:

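// Read the credentials from environment variables instead of hard-coding them.
const twilio = require("twilio");
const client = new twilio(process.env.TWILIO_ACCOUNT_SID, process.env.TWILIO_AUTH_TOKEN, {
  autoRetry: true, // retry automatically when the API responds with HTTP 429
  maxRetries: 3,
});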

  5. Use the client to send an SMS using Twilio:

function sendSmsAlert(message, to) {
  // Return the promise so callers can wait for the result if they need to.
  return client.messages
    .create({
      body: message,
      to, // recipient's phone number in E.164 format
      from: "YOUR_TWILIO_NUMBER",
    })
    .then((sms) => console.log(`Alert sent: ${sms.sid}`))
    .catch((error) => console.error(`Failed to send alert: ${error.message}`));
}

sendSmsAlert("Server CPU usage is above threshold", "+1234567890");

  6. Or use the client to place a voice call using Twilio:

function makeVoiceCallAlert(to) {
  // Return the promise so callers can wait for the result if they need to.
  return client.calls
    .create({
      url: "http://demo.twilio.com/docs/voice.xml", // URL of the TwiML instructions to play
      to, // recipient's phone number in E.164 format
      from: "YOUR_TWILIO_NUMBER",
    })
    .then((call) => console.log(`Alert call initiated: ${call.sid}`))
    .catch((error) => console.error(`Failed to initiate alert call: ${error.message}`));
}

makeVoiceCallAlert("+1234567890");

  7. If you are using a monitoring tool like Prometheus, Nagios, or another system, you can integrate the SMS or voice call logic into the alert handler, or use a webhook to trigger the sendSmsAlert or makeVoiceCallAlert function.
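
As a rough sketch of the webhook approach, the snippet below accepts a Prometheus Alertmanager webhook and forwards firing alerts to a notifier function. It assumes Alertmanager's standard webhook JSON (an alerts array whose entries carry status, labels, and annotations); the names formatAlertMessage and createAlertServer, the /alerts path, port 3000, and the phone number are hypothetical. The notifier is passed in as a parameter so that sendSmsAlert or makeVoiceCallAlert from the steps above can be plugged in.

```javascript
const http = require("http");

// Turn one Alertmanager alert object into a short, SMS-friendly string.
function formatAlertMessage(alert) {
  const labels = alert.labels || {};
  const annotations = alert.annotations || {};
  const name = labels.alertname || "UnknownAlert";
  const severity = labels.severity || "unknown";
  const summary = annotations.summary || "no summary provided";
  return `[${severity}] ${name}: ${summary}`;
}

// Build an HTTP server that calls notify(message, to) for each firing alert.
function createAlertServer(notify, onCallNumber) {
  return http.createServer((req, res) => {
    if (req.method !== "POST" || req.url !== "/alerts") {
      res.writeHead(404).end();
      return;
    }
    let raw = "";
    req.on("data", (chunk) => (raw += chunk));
    req.on("end", () => {
      try {
        const payload = JSON.parse(raw);
        (payload.alerts || [])
          .filter((alert) => alert.status === "firing")
          .forEach((alert) => notify(formatAlertMessage(alert), onCallNumber));
        res.writeHead(200).end("ok");
      } catch (error) {
        res.writeHead(400).end("invalid JSON");
      }
    });
  });
}

// Example of the message produced for a sample alert.
const sampleMessage = formatAlertMessage({
  status: "firing",
  labels: { alertname: "HighCpuUsage", severity: "critical" },
  annotations: { summary: "CPU above 90% on web-01" },
});
console.log(sampleMessage);

// To run the receiver with the SMS function defined earlier:
// createAlertServer(sendSmsAlert, "+1234567890").listen(3000);
```

On the Alertmanager side, a receiver with a webhook_configs entry pointing at this server's URL would deliver the payload. Note that this sketch sends one message per firing alert with no grouping or deduplication.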

Summary

Twilio is a reliable solution with a strong market presence. In some cases, it can work well on its own as an entry point into IT and DevOps alerting. Small teams with limited budgets, a low volume of alerts, and only a single monitoring tool will benefit from using Twilio for alerting. In contrast, teams that handle large volumes of events from various monitoring solutions, need comprehensive communication during incidents, and face high financial and reputational risks should consider incident management platforms to mitigate downtime.

This is a reminder that ilert offers a Free plan for small teams. With it, you can handle up to 100 SMS and voice messages, unlimited push and email notifications, use as many monitoring integrations as you like and take advantage of a status page. Learn more about ilert's pricing.

Product

HetrixTools and ilert: Augment your Uptime and Blacklist Monitoring with Powerful Incident Management

ilert users can now seamlessly connect ilert with HetrixTools' monitoring capabilities.

Daria Yankevich
Aug 01, 2024 • 5 min read

ilert users can now seamlessly connect ilert with HetrixTools' monitoring capabilities. This streamlined integration ensures smooth IT operations with minimal downtime and faster issue resolution.

What is HetrixTools?

HetrixTools provides monitoring solutions designed to help businesses thoroughly oversee their IT infrastructure. Their wide array of services includes uptime monitoring, server monitoring, and blacklist monitoring, enabling users to stay consistently informed about the status and well-being of their systems.

Key features of HetrixTools include:

  • Uptime Monitoring: Users can track the availability of websites and services to ensure they are accessible 24/7. HetrixTools runs checks from 12 monitoring locations around the world so that any outage is detected immediately.
  • Server Monitoring: HetrixTools' customers have control over various crucial metrics such as CPU usage, RAM usage, disk space, and more.
  • Blacklist Monitoring: The solution monitors IP addresses and domains against over 100 blacklists. This coverage ensures that engineers are promptly alerted if any of the IPs or domains are blacklisted.

How HetrixTools Users Can Benefit from Integration with ilert

Connecting HetrixTools with ilert brings a new level of efficiency and responsiveness to monitoring and incident management processes. Here are a few advantages users get from this integration:

  1. Various alerting channels. Developers will instantly receive critical alerts from HetrixTools through multiple channels, such as SMS, phone calls, and push notifications, even when devices are muted.
  2. Automate on-call management. ilert eliminates the manual effort and potential errors associated with managing on-call duties. When HetrixTools detects an issue, ilert ensures that there is always someone on call to look into the problem promptly.
  3. Integrate with multiple tools. Users can connect their HetrixTools monitoring with various IT service management (ITSM) tools, such as ServiceNow, Jira, and Datto Autotask, via ilert. This allows for a more cohesive and automated incident response workflow.
  4. Post-Incident Analysis. ilert provides detailed reports and analytics in combination with all incident-related communications from chat tools to help users understand what went wrong and how to prevent similar issues in the future. This continuous improvement cycle is crucial for maintaining a robust IT infrastructure.

By leveraging the strengths of both platforms, ilert and HetrixTools, users can ensure that their IT infrastructure is monitored comprehensively and managed proactively.

For more information on how to set up this integration, visit our integration guide.

Insights

Leveraging AI for Efficient On-call Scheduling

This article introduces the use cases of GenAI across the stages of the incident management process, beginning with the preparation stage. It explains how AI can be leveraged for efficient, effective, and accurate on-call scheduling, including examples from ilertAI.

Sirine Karray
Jul 26, 2024 • 5 min read

Introduction

Regardless of industry specifications, creating and maintaining a highly functional incident management process is crucial for organizations of all sizes. The various potential applications of Generative AI in this process can significantly enhance the efficiency, accuracy, and speed of incident detection, analysis, and resolution. GenAI can be utilized across all stages of the incident management process, including preparation, response, communication, and learning.

In this article we will start with the preparation stage.

Prepare: Using AI Assistants for On-call Scheduling

Creating an on-call schedule that balances team needs and ensures coverage is crucial for incident management. AI Assistants can streamline this process. By employing AI Assistants, complex scheduling requirements, such as follow-the-sun rotations, become manageable. An intuitive chat interface powered by an LLM can guide users through setting up their schedules, asking relevant questions to understand specific requirements and preferences. This AI-assisted approach simplifies scheduling, making it less time-consuming and more tailored to the unique dynamics of each team.

The Assistant builds the schedule in two steps: it first gathers the requirements in conversation and turns them into structured data, and then executes a function that generates the schedule.

Steps for Creating an On-call Schedule

1. Understanding User Inputs:

The Assistant initiates the process by engaging the user in a conversation to gather all the necessary details for creating the schedule. This involves asking about the team members, types of rotations, and on-call coverage. Thanks to its natural language processing abilities, the Assistant can understand and organize the user's responses into structured data for the next steps. The instructions for this conversation are provided to the Assistant.

2. Executing Functions to Generate the Schedule:

After processing and organizing the input data, the Assistant uses the function calling feature to run a custom function specifically designed for schedule creation. This function takes the prepared data and designs the on-call schedule, ensuring all requirements and constraints are satisfied. The end result is a JSON document that represents the finalized on-call schedule.

This use of OpenAI's function calling feature highlights the Assistant's capability to connect conversational input with programmatic output, allowing for complex task automation like schedule creation within a conversational interface.
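
The two steps can be sketched as follows. This is an illustrative assumption, not ilert's actual implementation: the tool name create_on_call_schedule, its parameters, and the output shape are all hypothetical. The tool definition follows the general shape OpenAI uses for function calling (a name, a description, and JSON Schema parameters), and the function shows how structured arguments extracted from the conversation could be turned into a JSON schedule.

```javascript
// Hypothetical tool definition that would be passed to the model.
const createScheduleTool = {
  type: "function",
  function: {
    name: "create_on_call_schedule",
    description: "Create a follow-the-sun on-call schedule",
    parameters: {
      type: "object",
      properties: {
        startDate: { type: "string", description: "First day, YYYY-MM-DD" },
        days: { type: "integer", description: "Number of days to cover" },
        regions: {
          type: "array",
          description: "Regions in hand-over order, together covering 24 hours",
          items: {
            type: "object",
            properties: {
              responder: { type: "string" },
              startHourUtc: { type: "integer" },
              endHourUtc: { type: "integer" },
            },
            required: ["responder", "startHourUtc", "endHourUtc"],
          },
        },
      },
      required: ["startDate", "days", "regions"],
    },
  },
};

// The function the application runs when the model requests the tool call.
function createOnCallSchedule({ startDate, days, regions }) {
  const shifts = [];
  const hourMs = 60 * 60 * 1000;
  const dayMs = 24 * hourMs;
  const base = Date.parse(`${startDate}T00:00:00Z`);
  for (let day = 0; day < days; day++) {
    for (const { responder, startHourUtc, endHourUtc } of regions) {
      const start = base + day * dayMs + startHourUtc * hourMs;
      // A shift whose end hour is not after its start hour wraps past midnight UTC.
      const length = ((endHourUtc - startHourUtc + 24) % 24 || 24) * hourMs;
      shifts.push({
        responder,
        start: new Date(start).toISOString(),
        end: new Date(start + length).toISOString(),
      });
    }
  }
  return { type: "follow-the-sun", shifts };
}

// Arguments as the model might extract them from the conversation.
const schedule = createOnCallSchedule({
  startDate: "2024-08-05",
  days: 1,
  regions: [
    { responder: "Berlin team", startHourUtc: 6, endHourUtc: 14 },
    { responder: "New York team", startHourUtc: 14, endHourUtc: 22 },
    { responder: "Sydney team", startHourUtc: 22, endHourUtc: 6 },
  ],
});
console.log(JSON.stringify(schedule, null, 2));
```

Keeping the schedule logic in ordinary code means the model only has to produce valid arguments, while constraints such as full 24-hour coverage can be validated deterministically.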

[Figure: sample conversation with ilert AI generating a follow-the-sun schedule]

Besides AI-assisted on-call scheduling, LLMs can be leveraged to respond to incidents by reducing noise through intelligent alert grouping, enhancing incident communications, and creating thorough postmortem analyses.
