
All-in-one Incident Management Platform

Manage on-call, respond to incidents and communicate them via status pages using a single application.

Trusted by leading companies

Highlights

The features you need to operate always-on-services

Every feature in ilert is built to help you to respond to incidents faster and increase uptime.

Harness the power of generative AI

Enhance incident communication and streamline post-mortem creation with ilert AI, which helps your business respond to incidents faster.

Integrations

Deploy in minutes with 100+ ready-to-use integrations

ilert seamlessly connects with your tools using our pre-built integrations or via email. ilert integrates with monitoring, ticketing, chat, and collaboration tools.

Customers

See how industry leaders achieve 99.9% uptime with ilert

Organizations worldwide trust ilert to streamline incident management, enhance reliability, and minimize downtime. Read what our customers have to say about their experience with our platform.

Stay up to date

Expert insights from our blog

Insights

Incident Response Management: A Category of Its Own

As Atlassian phases out Opsgenie, teams are rethinking incident response. Is IRM just a feature or a category of its own? This article explores that question, with insights from Opsgenie users migrating to ilert and a look at ilert’s vendor-neutral philosophy.

Birol Yildiz
Mar 28, 2025 • 5 min read

In recent weeks, I’ve spoken with several Opsgenie customers who are evaluating a migration to ilert after Atlassian’s decision to phase out Opsgenie and fold its functionality into other products. Atlassian is giving Opsgenie users “two options: move to Jira Service Management for robust end-to-end incident management, or move to Compass for alerting and on-call management.” This has raised a broader question in our industry: 

Is Incident Response Management (IRM) a standalone category or just a feature within larger platforms?

I want to reflect on that question and share why I firmly believe IRM remains a distinct, essential category—not merely a feature. I’ll highlight insights from those customer conversations and explain ilert’s vendor-neutral approach to integrations, which even led us to sunset our own uptime monitoring feature for the greater good of our ecosystem.

What Opsgenie’s transition taught us

First, let’s consider the insight from Opsgenie’s end-of-life. Along with PagerDuty, Opsgenie was a pioneer that helped build the incident response management category, so seeing it put on the shelf is bittersweet. Many of its users have expressed frustration that development stagnated as Atlassian integrated Opsgenie’s features into Jira Service Management (JSM). In fact, we have had customers switching to ilert way before Atlassian’s EOL announcement of Opsgenie, “citing Opsgenie’s stagnation as Atlassian folded its features into Jira Service Management.” 

This sentiment captures the crux of the issue: the all-in-one solution offered in JSM may include incident response features, but it can be cumbersome for teams that primarily need a nimble, real-time alerting and on-call management tool. Opsgenie’s fate illustrates the dilemma. Atlassian’s strategy treats incident management as a component of a broader suite (ITSM or a developer portal like Compass) rather than a product in itself. 

Opsgenie users I spoke with are weighing these Atlassian-provided paths, but many are also looking at dedicated IRM platforms because they feel something would be lost in translation if incident response became just another module inside a larger tool. Their intuition aligns with what we’ve long believed in the industry.

IRM: Feature or Standalone Platform?

It’s a fair question to ask: As adjacent software categories mature, could incident response simply become a feature of monitoring, observability, or ITSM platforms? After all, many monitoring tools now have alerting capabilities, and IT service platforms have incident modules. Atlassian’s move with Opsgenie is one prominent example of viewing IRM as a feature within a bigger product.

However, there’s a reason dedicated IRM platforms like ilert, PagerDuty and xMatters exist (and continue to thrive). The nature of incident response—bridging humans and complex systems under pressure—calls for a specialized focus. Treating IRM as just a checkbox feature risks oversimplifying what it does. The core value of an IRM platform is to act as the central dispatcher between people and systems during critical moments. This goes far beyond what a typical add-on feature can accomplish.

Let’s unpack that with an analogy: You wouldn’t consider “customer support” just a feature of your email service, even though you can technically manage support via email. Companies still invest in dedicated support platforms because specialization matters. Similarly, incident response has its own workflows and urgency that warrant a purpose-built solution.

Why Incident Response Management remains a distinct category

In my view, IRM stands as a distinct category for several key reasons:

  • Centralized alert dispatching: A true IRM platform serves as a hub for all critical alerts, regardless of source. It funnels signals from various monitoring, observability, and automation tools into one stream and ensures they reach the right people at the right time. This “single pane of glass” for incidents is difficult to achieve when incident management is scattered across different modules in different systems. Neither JSM nor Compass alone covers the need for a centralized alert dispatcher and incident management. By contrast, a dedicated IRM tool is built from the ground up to be that centralized dispatcher.
  • Specialized on-call and escalation workflows: IRM platforms provide rich capabilities like on-call scheduling, rotation management, multi-step escalations, automated stakeholder notifications, and postmortem tracking. These aren’t side features; they are the heart of the product. When incident response is a mere feature elsewhere, these capabilities often end up less flexible or buried behind other priorities. A distinct IRM system keeps the focus on minimizing response times and coordinating people efficiently during high-stress incidents—its entire roadmap revolves around these outcomes, not around broader IT processes or monitoring features.
  • Vendor-neutral integration hub: Perhaps one of the strongest arguments for IRM as its own category is integration breadth. Modern organizations typically use a heterogeneous set of tools: different monitoring systems (cloud provider monitors, application performance tools, etc.), logging and observability platforms, ITSM for ticketing, chat apps for collaboration, CI/CD pipelines, and more. An incident response platform needs to play nicely with all of them. If you rely on an incident feature inside one vendor’s platform, you might be limited in connecting to external tools. A standalone IRM platform is vendor-neutral by design, acting as a Switzerland that connects everything. For example, ilert deliberately does not compete with monitoring vendors; we focus on integrating with them. We even decided to discontinue our own built-in uptime monitoring feature so we could “maintain our vendor-neutral position” and avoid conflicts of interest with our monitoring partners. Being neutral ensures that the IRM system’s only goal is to reliably route alerts between all your systems and your people without bias toward where the data comes from.
  • Lightweight layer over existing tools: A dedicated IRM solution adds a thin but crucial layer on top of your existing infrastructure. It doesn’t replace your monitoring or your ticketing system. Instead, it makes them more effective by ensuring that alerts from the former get actionable response and by avoiding overload of the latter. In practice, many companies pair an IRM platform with their ITSM. For instance, you might continue managing incident records and compliance in ServiceNow but use ilert to handle the real-time paging and human coordination. The two systems complement each other: ServiceNow is excellent for structured ITIL workflows, while ilert serves as a dispatcher for critical alerts, integrating with over 100 monitoring, observability, ITSM and chat tools to trigger immediate action before a formal ticket is even filed. This kind of flexible orchestration is only possible when IRM is a separate, integrative layer rather than locked inside one of the tools.
  • Focus and innovation: Finally, keeping IRM as its own category fosters innovation. When a product’s sole mission is incident response, its team can iterate and improve on that problem faster than if incident features are just one item on a long list of priorities in a larger suite. The result is often more user-friendly on-call experiences, smarter alert routing (even leveraging AI for noise reduction or auto-remediation), and features like status pages and analytics that are deeply tuned to incident management needs. We’ve seen a wave of innovation from specialized IRM startups and platforms precisely because they are tackling this as a primary challenge, not a secondary feature.
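
To make the multi-step escalation idea above concrete, here is a minimal sketch in Python. It is not ilert's implementation or API: the `EscalationStep` class, the `notified_so_far` function, and the responder names and wait times are all hypothetical, chosen only to illustrate how an unacknowledged alert widens its audience over time.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EscalationStep:
    """One rung of a hypothetical escalation policy."""
    responder: str      # who is notified at this step
    wait_minutes: int   # minutes an alert stays unacknowledged before this step fires


def notified_so_far(steps: List[EscalationStep], minutes_unacknowledged: int) -> List[str]:
    """Return every responder who should have been notified by now."""
    return [
        step.responder
        for step in sorted(steps, key=lambda s: s.wait_minutes)
        if step.wait_minutes <= minutes_unacknowledged
    ]


# Illustrative policy; names and timings are assumptions, not defaults of any product.
policy = [
    EscalationStep("primary on-call", 0),
    EscalationStep("secondary on-call", 10),
    EscalationStep("engineering manager", 30),
]

if __name__ == "__main__":
    for minutes in (0, 12, 45):
        print(minutes, notified_so_far(policy, minutes))
```

The point of the sketch is that the escalation logic depends only on time and acknowledgement state, not on which monitoring tool raised the alert, which is what lets a dispatcher stay vendor-neutral.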

Integration over competition: ilert’s vendor-neutral stance

One concrete example of treating IRM as a category is how we at ilert approach our product strategy. We believe an incident response platform should complement the rest of your toolchain, not compete with it. This philosophy is why we made the conscious choice to sunset our uptime monitoring offering. By stepping back from providing our own monitoring, we can fully embrace integrations with best-of-breed monitoring and observability tools used by our customers.

In our announcement about this change, we explained that discontinuing the feature allows us to maintain our vendor-neutral position for monitoring and avoid any potential conflicts of interest when engaging in partnerships with vendors of uptime monitoring software.

In other words, we never want ilert to favor one data source over another. Our job is to reliably route alerts from any source to the people who need to see them.

This vendor-neutral, integration-first approach has a big payoff for users: it means you can plug ilert into whatever systems you already have and trust that we’re focused solely on improving your incident response process. It’s the opposite of a walled garden. We’ve built 100+ integrations and even tailored our features to work hand-in-hand with systems like Jira, ServiceNow, Datadog, Amazon CloudWatch, Slack, Microsoft Teams, and so on. The feedback from former Opsgenie customers moving to ilert is that this openness and focus are exactly what they were looking for. They want their incident response platform to be an unbiased orchestrator, not pushing them to replace tools that already work well for them.

The IRM Platform as the Central Dispatcher

At its heart, an Incident Response Management platform is the central dispatcher between people and systems during an outage or critical event. Companies often have monitoring tools that detect issues and ticketing systems that record and assign work. But it’s the IRM platform that bridges the gap in real time, ensuring that when something breaks at 2 AM, the right on-call engineer’s phone rings, and the team can mobilize immediately. It coordinates humans (through alerts, escalations, and collaboration) in response to machine signals. 

This role is unique. If you try to handle it purely within a monitoring tool, you might get alerts out, but you miss the human workflow aspects (like escalations or communications across teams). If you try to handle it purely within an ITSM tool, you often sacrifice speed and simplicity (turning emergencies into tickets can introduce delay or bureaucracy).

The true measure of an IRM platform’s value is in how effectively it connects and accelerates your existing investments: your monitoring becomes more actionable, your on-call staff more effective, and your incident process more transparent. All of this happens without forcing you to change the tools you use for observability or ITSM. That’s why I see IRM as its own pillar in the tech stack—a mission control that sits alongside observability and ITSM, not inside them.

Closing thoughts

The question of “category or feature?” is a healthy one to revisit as platforms evolve. In the case of incident response management, my experience and recent customer discussions reinforce that it remains a category in its own right. 

The stakes during incidents are too high, the integrations needed are too many, and the workflows are too specialized for IRM to be an afterthought or merely a line-item feature. Instead, we should view IRM platforms as complementary partners to our monitoring, DevOps, and ITSM tools, each doing what they do best.

For ilert, this means continuing a calm and focused pursuit of being the best dispatcher of trust between all the systems that detect problems and all the people who solve them. We’ll integrate, orchestrate, and stay vendor-neutral, so our users can confidently rely on a platform that puts incident response first.

In a world where everything from cloud services to ticketing systems is expanding in scope, there’s real value in something that deliberately stays specialized. Incident Response Management is this something—a standalone discipline and platform that ensures when things go wrong, they get fixed as fast as humanly (and technologically) possible.

Engineering

An ultimate step-by-step guide on Zabbix Cloud Monitoring

Learn how to set up Zabbix Cloud for AWS Auto-Discovery and receive critical alerts via SMS, phone calls, or push notifications.

Tim Nguyen Van
Mar 26, 2025 • 5 min read


During the last Zabbix Summit, the company presented a cloud version of its well-known monitoring platform. We at ilert constantly see the growing popularity of Zabbix as more and more teams across the globe utilize it for their monitoring needs. To help users quickly adopt the new cloud version, we put together this guide.

Why Zabbix 

Maintaining the functionality and health of cloud infrastructure, such as servers, virtual machines, databases, containers, and apps, is essential for companies of different sizes. Zabbix Cloud Monitoring is an effective instrument for keeping an eye on all these resources across well-known cloud providers like Google Cloud Platform (GCP), Microsoft Azure, and Amazon Web Services (AWS).

Zabbix Cloud Monitoring gives businesses proactive alerting, automatic anomaly detection, and real-time insight into their cloud infrastructures. In contrast to conventional monitoring solutions, Zabbix combines agent-based and agentless techniques to monitor important performance indicators, identify problems before they become more serious, and guarantee peak system performance.

What this guide covers 

This step-by-step guide will help you:

  • Set up and configure Zabbix Cloud Monitoring for AWS Auto-Discovery;
  • Integrate cloud services using API-based monitoring for full visibility;
  • Create dashboards to proactively manage your infrastructure;
  • Receive critical Zabbix alerts via multiple channels, like SMS, phone calls, messenger, or push notifications with the help of ilert.

Prerequisites: What you will need to follow this guide

  • A registered account on Zabbix Cloud;
  • A Zabbix Cloud instance deployed and accessible via a web browser;
  • AWS Account with API Access;
  • IAM (Identity and Access Management) Policy: you need to create an IAM policy for the Zabbix role in your AWS account with the necessary permissions;
  • CloudWatch Metrics: Ensure that CloudWatch metrics are enabled for your AWS resources, such as EC2 instances, RDS databases, and S3 buckets, to provide monitoring data.

Stage 1: Creating an IAM Policy for Zabbix

1. In AWS, open the IAM service and click “Policies.”

2. On the top right corner, click “Create policy.”

3. Select “JSON” and add the following configuration to the policy editor.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "cloudwatch:Describe*",
                "cloudwatch:Get*",
                "cloudwatch:List*",
                "ec2:Describe*",
                "rds:Describe*",
                "s3:ListAllMyBuckets",
                "s3:GetBucketLocation"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}

4. Enter a new name for the policy and click “Create policy.”

5. Now, navigate to Users and click “Create user.”

6. Enter a user name and click “Next.”

7. Choose “Attach policies directly” in the Permission options and select the Zabbix policy.

8. Navigate to the created user and create a new access key.

9. Choose “Third-party service” and proceed to the next step.

10. An access key and a secret access key are created. You will need both in your Zabbix configuration.
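
Before attaching the policy from step 3, it can be worth confirming that it grants read-only access only. The following Python sketch parses the policy document and flags any allowed action whose operation name does not start with Describe, Get, or List. The prefix check is a simple heuristic assumed for this example, not an AWS-defined rule, and `non_read_only_actions` is a hypothetical helper name.

```python
import json
from typing import List

# The policy document from step 3 of this guide.
POLICY_JSON = """
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "cloudwatch:Describe*",
                "cloudwatch:Get*",
                "cloudwatch:List*",
                "ec2:Describe*",
                "rds:Describe*",
                "s3:ListAllMyBuckets",
                "s3:GetBucketLocation"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}
"""

# Heuristic (an assumption of this sketch): read-only operations start with these verbs.
READ_ONLY_PREFIXES = ("Describe", "Get", "List")


def non_read_only_actions(policy_document: str) -> List[str]:
    """Return allowed actions that do not look read-only."""
    policy = json.loads(policy_document)
    statements = policy["Statement"]
    if isinstance(statements, dict):  # IAM permits a single statement object
        statements = [statements]

    flagged = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # IAM permits a single action string
            actions = [actions]
        for action in actions:
            operation = action.split(":", 1)[-1]  # "ec2:Describe*" -> "Describe*"
            if not operation.startswith(READ_ONLY_PREFIXES):
                flagged.append(action)
    return flagged


if __name__ == "__main__":
    print(non_read_only_actions(POLICY_JSON))
```

An empty result means every allowed action matches the read-only heuristic. Anything flagged deserves a second look before the access key is handed to a third-party service.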

Stage 2: Creating an AWS Discovery host in Zabbix Cloud

1. On the sidebar, navigate to “Data Collection” and select “Hosts.”

2. Click “Create host,” enter a name for your Host, select AWS by HTTP as a template, add a Host group, and click “Add.”

3. Now, reopen the newly created Host and navigate to “Macros.” Add the following macros: {$AWS.ACCESS.KEY.ID}, {$AWS.REGION}, and {$AWS.SECRET.ACCESS.KEY}, and set their values to the access key, the region, and the secret access key, respectively.

4. Open the “Hosts” section in the “Monitoring” tab; you can now see your hosts.

5. By clicking “Latest Data,” you can now see all the latest data received from your AWS EC2 instance.
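
If you prefer to script the host setup instead of clicking through the UI, the same steps can be expressed as a Zabbix API call. The sketch below only builds the JSON-RPC request body for `host.create` and does not send it. The group ID, template ID, host name, and credential values are placeholders assumed for illustration; look up the real IDs in your instance, and check the API documentation for your Zabbix version regarding authentication and any required fields.

```python
import json
from typing import Any, Dict

SECRET_MACRO = 1  # user macro type 1 stores the value as secret text


def build_host_create_request(
    host_name: str,
    group_id: str,
    template_id: str,
    access_key_id: str,
    secret_access_key: str,
    region: str,
    request_id: int = 1,
) -> Dict[str, Any]:
    """Build a JSON-RPC body that mirrors the manual steps of Stage 2."""
    return {
        "jsonrpc": "2.0",
        "method": "host.create",
        "params": {
            "host": host_name,
            "groups": [{"groupid": group_id}],
            "templates": [{"templateid": template_id}],  # the "AWS by HTTP" template
            "macros": [
                {"macro": "{$AWS.ACCESS.KEY.ID}", "value": access_key_id},
                {"macro": "{$AWS.REGION}", "value": region},
                {
                    "macro": "{$AWS.SECRET.ACCESS.KEY}",
                    "value": secret_access_key,
                    "type": SECRET_MACRO,
                },
            ],
        },
        "id": request_id,
    }


if __name__ == "__main__":
    # All values below are placeholders, not real IDs or credentials.
    body = build_host_create_request(
        host_name="aws-discovery",
        group_id="<group id>",
        template_id="<template id>",
        access_key_id="<access key>",
        secret_access_key="<secret access key>",
        region="eu-central-1",
    )
    print(json.dumps(body, indent=2))
```

Storing the secret access key as a secret macro keeps it from being shown in plain text in the Zabbix frontend after it is saved.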

Zabbix Dashboards

Zabbix Dashboards provide an easy-to-use interface for monitoring your infrastructure, including cloud environments. They give you an extensive overview of key metrics in one location, including database performance, storage usage, server health, and cloud resources.

Using Zabbix Dashboards for infrastructure and cloud monitoring, you can keep track of your resources more effectively. Key features of the dashboards are:

  • Customizable layout
  • Real-time monitoring
  • Various widget types (graphs, availability, status, maps, etc.)

Configuring Monitoring Dashboards

After setting up auto-discovery for AWS resources and integrating your AWS environment with Zabbix Cloud Monitoring, you may create monitoring dashboards to get complete insight into your cloud architecture.

1. Navigate to Dashboards and click “Create dashboard.”

2. Add a name and choose the owner of the new dashboard.

3. You can now add various widgets like graphs, maps, charts, availability status, and more to your dashboard.

Triggers and media types in Zabbix Cloud

Triggers and media types are essential for proactive monitoring. They enable you to automatically identify problems with your Cloud infrastructure, such as high CPU usage, low disk space, or service outages, and promptly alert you when it's crucial.

What are Triggers?

Triggers in Zabbix are expressions that evaluate the data gathered from monitored items (such as CPU usage, memory usage, disk space, etc.). When a predefined threshold is reached or exceeded, a trigger is activated.

Trigger examples:

  • CPU Utilization: A trigger could be set up to alert if an EC2 instance’s CPU usage exceeds 85% for more than 5 minutes.
  • Disk Space Usage: A trigger could be set to notify if an EC2 instance’s disk usage exceeds 90%.

Configuring Triggers

1. Navigate to “Monitoring,” then “Hosts.”

2. Select the host for which you want to create a trigger and click “Triggers” under the “Configuration” section.

3. Now click “Create trigger.”

4. In this example, I’ll configure a CPU usage trigger.

5. After entering the trigger name, we can now add the expression. In this case, it will set the severity to “High” whenever the CPU usage is above 85% and will recover when the CPU usage falls below 80%.
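
The gap between the 85% problem threshold and the 80% recovery threshold is a form of hysteresis: it keeps the trigger from flapping when CPU usage hovers around a single value. In Zabbix, this is typically expressed as a problem expression plus a separate recovery expression. The Python sketch below only simulates that behavior on a list of samples; the function name and sample values are made up for illustration, and it is not how Zabbix evaluates expressions internally.

```python
from typing import Iterable, List


def evaluate_trigger(
    cpu_samples: Iterable[float],
    problem_above: float = 85.0,
    recover_below: float = 80.0,
) -> List[str]:
    """Return the trigger state after each sample, using two thresholds."""
    in_problem = False
    states = []
    for value in cpu_samples:
        if not in_problem and value > problem_above:
            in_problem = True    # crossed the problem threshold
        elif in_problem and value < recover_below:
            in_problem = False   # dropped below the recovery threshold
        states.append("PROBLEM" if in_problem else "OK")
    return states


if __name__ == "__main__":
    samples = [70, 90, 83, 82, 79, 84]
    for value, state in zip(samples, evaluate_trigger(samples)):
        print(value, state)
```

Note that the samples at 83% and 82% keep the trigger in the problem state, because they are below the problem threshold but not yet below the recovery threshold. With a single threshold at 85%, the trigger would have recovered at 83% and could fire again on the next small spike.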

What are Media types?

In Zabbix, Media types relate to the different options for receiving notifications or alerts when a trigger is active. A Media type specifies how and through which channels Zabbix will send notifications to users.

Zabbix supports a variety of media types, allowing you to customize alerting according to your preferences or requirements. Some common Media types include:

  • Email: Send notifications via email to alert users of any issues.
  • SMS: Send text messages (SMS) for mobile alerts.
  • Webhook: Trigger a custom action or integrate with third-party systems via webhooks.
  • Third-party integrations: Use external services or platforms, such as ilert, to route alerts to specific teams or applications, ensuring a smooth integration into your existing incident management processes.

Stage 3: Connect Zabbix with ilert using the ilert Media type

To connect Zabbix with ilert, create a new User in Zabbix and add ilert as a Media type.

Add the Integration key of your Zabbix alert source into the Send to field.

For further information, please refer to ilert's Zabbix Integration Guide.

Engineering

PWA Checklist: How to Ensure High Performance for Your Progressive Web App

Check the structured checklist that we use to measure and optimize ilert's PWA performance.

Jan Arnemann
Mar 19, 2025 • 5 min read

In this article, we’ll share the structured checklist that we use to measure and optimize ilert's PWA performance.

At ilert, we build our Progressive Web App (PWA) using Capacitor, Ionic, React, and MUI to deliver a robust and responsive incident management platform. Progressive Web Apps are revolutionizing web experiences by combining the best of web and mobile applications. They offer fast native-like experiences, offline capabilities, and much more.

However, ensuring high performance is crucial for providing users with a smooth and engaging experience.

Understanding progressive web apps

Feel free to skip this chapter if you already have experience with progressive web apps.

Progressive web apps (PWAs) are a type of web application that provides a native app-like experience to users. Built using web technologies such as HTML, CSS, and JavaScript, PWAs are designed to work seamlessly across multiple platforms and devices. One of the standout features of progressive web apps is their ability to deliver a smooth and engaging user experience, even in areas with poor internet connectivity.

Unlike traditional websites, PWAs are designed to be installed on a user’s device, allowing for offline access and push notifications. This means users can continue to interact with the app even when they are not connected to the internet. 

Additionally, unlike native apps, PWAs are not platform-specific and do not require a separate codebase for each platform, making them a versatile and cost-effective solution for developers.

You might also be interested in how to migrate desktop components to a Progressive Web App.

Core Features of Progressive Web Apps

PWAs come with several core features that set them apart from traditional web applications and native apps. These features include:

  • Offline functionality: Thanks to caching and service workers, PWAs can function even when the user is offline or has a poor internet connection. This ensures that users can access essential features and content without interruption.
  • Installability: PWAs can be installed on a user’s device, providing the convenience of offline access and the ability to send push notifications.
  • Custom offline page: When users are offline, PWAs can display a custom offline page, enhancing the user experience by providing useful information or alternative actions.
  • Push notifications: PWAs can send push notifications to users, keeping them engaged and informed even when the app is not open.
  • App window: PWAs can be displayed in a custom app window, offering a native app-like experience that feels integrated with the device’s operating system.

Why progressive web apps performance matters

Performance is a critical factor for PWAs because it directly shapes user experience, engagement, and satisfaction. A fast-loading and responsive PWA encourages users to stay, interact, and utilize its features effectively.

One of the key metrics in performance evaluation is Time to Interactive (TTI). This is the time it takes for a PWA to become fully interactive. 

Ideally, your TTI should be below 3.8 seconds, which is considered fast and ensures a smooth user experience. A TTI between 3.9 and 7.3 seconds indicates moderate performance that needs improvement, while anything above 7.3 seconds is considered slow and likely to increase frustration and bounce rates.

PWA TTI Benchmarks
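
The TTI bands above are easy to encode if you want to label measurements automatically, for example in a reporting script. This is a minimal Python sketch; `classify_tti` is a hypothetical helper, and treating exactly 3.8 seconds as fast (and values between 3.8 and 3.9 as moderate) is an assumption made here to close the small gap between the stated ranges.

```python
def classify_tti(seconds: float) -> str:
    """Map a Time to Interactive measurement (in seconds) to a performance band."""
    if seconds < 0:
        raise ValueError("TTI cannot be negative")
    if seconds <= 3.8:
        return "fast"
    if seconds <= 7.3:
        return "moderate"
    return "slow"


if __name__ == "__main__":
    for measurement in (2.5, 5.0, 9.1):
        print(measurement, classify_tti(measurement))
```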

Measuring PWA performance

Before optimizing your PWA, it’s essential to establish a baseline by measuring current performance. Here are some effective ways to evaluate your PWA’s speed and responsiveness:

  • Browser developer tools: Inspect loading times and resource usage in Chrome DevTools.
  • Manual testing: Just clicking through the app can actually reveal a lot of performance issues and bad UX. Doing this after developing a new feature can bring significant insights for optimization. 

At ilert, manual testing is an important part of our performance evaluation. Whenever we develop a new feature, we actively test it by navigating through the app, identifying potential performance issues, and ensuring a smooth user experience. Manual testing also helps to identify performance issues that automated tools might miss. 

Following our performance checklist, we can proactively address issues before they impact our users.

Checklist: Improving PWA performance

1. Optimize bundle sizes

  • Reduce bundle sizes: Remove unused code and load images from a CDN to minimize bundle sizes.
  • Implement code splitting: Only load the necessary scripts and components that are required for the current page.


2. Implement Lazy Loading

  • Don't load images and content until they are needed.
  • Use skeletons wisely: Avoid excessive placeholders that may cause unnecessary lag.


3. Minimize artificial loading times

  • Eliminate unnecessary delays: Review intentional load times to ensure they are essential for UX.


4. Optimize app size

  • Trim audio files: Reduce unnecessary sound assets or cut them shorter.


5. Optimize network requests

  • Implement caching strategies: Enable offline functionality and reduce repeated network requests.
  • Ensure a fast backend: Reduce API response times through efficient backend design and optimized queries.
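
As an illustration of the caching item in the checklist, here is a sketch of a cache-first strategy: serve from the cache when possible, go to the network only on a miss, and fall back to an offline response when the network is unavailable. In a real PWA this logic would live in a service worker written in JavaScript; the Python below only models the decision flow. The `CacheFirstFetcher` class and the injected `fetch` callable are hypothetical, and the sketch ignores cache expiry and invalidation.

```python
from typing import Callable, Dict


class CacheFirstFetcher:
    """Serve cached responses first; hit the network only on a cache miss."""

    def __init__(self, fetch: Callable[[str], str], offline_fallback: str = "offline"):
        self._fetch = fetch                    # performs the real network request
        self._cache: Dict[str, str] = {}       # url -> response body
        self._offline_fallback = offline_fallback
        self.network_calls = 0                 # counts attempted network requests

    def get(self, url: str) -> str:
        if url in self._cache:
            return self._cache[url]            # cache hit: no network round trip
        try:
            self.network_calls += 1
            body = self._fetch(url)
        except ConnectionError:
            return self._offline_fallback      # offline and not cached
        self._cache[url] = body
        return body


if __name__ == "__main__":
    def demo_fetch(url: str) -> str:
        # Stand-in for a real HTTP request.
        return "body of " + url

    fetcher = CacheFirstFetcher(demo_fetch, offline_fallback="offline page")
    fetcher.get("/alerts")
    fetcher.get("/alerts")
    print(fetcher.network_calls)
```

Cache-first suits static assets that rarely change. For data that must be fresh, such as a list of open alerts, a network-first or stale-while-revalidate strategy is usually a better fit.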

We are actively making incremental improvements to our PWA. Small changes such as optimizing skeleton loaders, reducing JavaScript bundle sizes, and ensuring a fast backend have significantly reduced load times and improved our time to interactive (TTI). Feel free to copy the checklist from the article and use it when reviewing your PWA next time.
