Try ilert AIOps

All-in-one Incident Management Platform

Manage on-call, respond to incidents and communicate them via status pages using a single application.

Trusted by leading companies

Highlights

The features you need to operate always-on services

Every feature in ilert is built to help you respond to incidents faster and increase uptime.

Harness the power of generative AI

Enhance incident communication and streamline post-mortem creation with ilert AI. ilert AI helps your business respond to incidents faster.

Integrations

Deploy in minutes with 100+ ready-to-use integrations

ilert seamlessly connects with your tools using our pre-built integrations or via email. ilert integrates with monitoring, ticketing, chat, and collaboration tools.

Transform your Incident Response today - start free trial
Customers

See how industry leaders achieve 99.9% uptime with ilert

Organizations worldwide trust ilert to streamline incident management, enhance reliability, and minimize downtime. Read what our customers have to say about their experience with our platform.

Stay up to date

Expert insights from our blog

Product

New Features: Heartbeat 2.0, Holidays, Branded Status Page Login, and much more

Get the latest on ilert: improved Heartbeat monitoring, smarter holiday settings, mobile app upgrades, AIOps release, and enhanced integrations.

Daria Yankevich
Apr 24, 2025 • 5 min read

Welcome to the ilert quarterly product updates! If you missed the winter round-up, check the previous issue and learn more about ilert Deployment events, call flow AI voice agent, updated reports, and more.

ilert Heartbeat monitoring 2.0

At ilert, we do our best to provide as many sources as possible to send alerts to our platform. While our integrations catalog is constantly growing, Heartbeat is the only monitoring option available in ilert out of the box. It helps to check connectivity between users' systems and tools and ilert. With the recent update, we significantly improved this feature.

For those who have yet to try this feature in ilert, a monitor sends HTTP requests, aka heartbeats, at regular intervals to a chosen destination and checks if the signal is received on time. If the heartbeat fails to arrive, it means something might be wrong, and the monitoring tool triggers an alert.
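
The timing logic behind such a check can be sketched in a few lines of Python. This is only an illustration of the general idea, not ilert's implementation: the `HeartbeatMonitor` class, its `interval_seconds` and `grace_seconds` parameters, and the method names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatMonitor:
    """Hypothetical heartbeat check: flags a monitor whose ping is overdue."""

    interval_seconds: float               # expected time between heartbeats
    grace_seconds: float = 0.0            # extra tolerance before alerting (assumption)
    last_ping_at: Optional[float] = None  # arrival time of the last heartbeat

    def record_ping(self, now: float) -> None:
        """Store the arrival time of a heartbeat."""
        self.last_ping_at = now

    def is_overdue(self, now: float) -> bool:
        """Return True if no heartbeat arrived within interval + grace."""
        if self.last_ping_at is None:
            # No heartbeat seen yet; this sketch does not alert before the first ping.
            return False
        return now - self.last_ping_at > self.interval_seconds + self.grace_seconds


monitor = HeartbeatMonitor(interval_seconds=60, grace_seconds=10)
monitor.record_ping(now=1_000.0)
print(monitor.is_overdue(now=1_050.0))  # False: 50 s since the last ping
print(monitor.is_overdue(now=1_080.0))  # True: 80 s exceeds the 70 s allowance
```

Passing `now` in explicitly, rather than reading the clock inside the class, keeps the sketch easy to test.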

First and foremost, a Heartbeat monitor is now a separate entity in ilert. Users will notice that monitors now live in a separate section of the navigation bar. This is not just a rearrangement; with this change, heartbeat monitors have received the full range of alerting settings that third-party integrations have in ilert, including various grouping and filtering options. So, to set up a new monitor, you first need to visit the “Heartbeat monitoring” section and create a monitor that will ping the service at the chosen interval, and then visit Alert sources to specify alerting settings for the newly created monitor. Furthermore, this approach simplifies the management of monitors.

Additionally, with this change, all monitors can target one alert source, which means you can unify alerting settings for as many heartbeat monitors as you like. This significantly reduces time for adjustments, especially for teams with dozens of Heartbeats in ilert.

Heartbeat monitoring is included in all ilert plans, even the Free plan. However, users can now buy additional monitors as add-ons right from their account. Learn more about the add-on on the ilert pricing page.

Email alert source reworked

Email as an alert source is now also more advanced than before. Emails sent to ilert are treated like alerts from other monitoring tools, meaning all templating, filtering, and dynamic routing settings are applied. The interface was also improved to simplify the setup.

Managing Holidays

ilert’s holiday feature, built into the Support Hours settings, makes it easy to manage exceptions to your regular support schedule. Whether it’s a national holiday, a company-wide day off, or any other irregular non-working day, you can account for it without manually editing your on-call rotations or escalation policies. It’s a streamlined way to keep your team’s availability accurate, with no last-minute adjustments required. You can either create holidays manually in ilert or import them from a country-specific list. Read the step-by-step instructions on how to set up holidays in ilert.

Status pages refinements

Status page widget live updates

Your users don't have to refresh the page to see changes in the widget. The widget now refreshes automatically, so your clients will see updates as soon as they occur.

Branded status page login

Private and audience-specific status pages now display the page logo and favicon in the login form. This is a great way to have a smooth user experience that fully reflects your brand identity.

Status pages custom analytics

If you're using a custom domain for your status page, you can now integrate external analytics tools to better understand your audience. Whether you're curious about where your traffic is coming from or how users interact with your page, ilert supports two options: Google Analytics and PostHog. Just add your tracking key to connect your preferred platform and start gaining insights into your status page's performance and usage.

Call flow improvements

ilert hotlines can now forward calls directly to external support numbers, even if they use IVR menus—no confirmation needed from the other end. This update ensures smooth call handoffs without requiring manual workarounds or dedicated ilert users. It’s a simple way to connect your callers with third-party support while keeping your workflows clean and efficient.

With the latest release, you can easily copy a node or a whole subtree, or remove it, by clicking the three dots. To paste a branch, copy it first, then choose the place where you'd like to add it, click the plus icon, and click “Paste,” which will appear as the first available action in the menu.

Also, it's simpler now to duplicate call flows. Just go to your Call flow list and click the three dots to create a copy of the previously created tree. 

Additionally, choosing a voice that will accompany your call flow is easier now. You will find the AI voice menu at the top of the call flow editor. There, you can also test and listen to them all to pick the best option for your company. The voice of your choice will be applied to the whole call flow. 

Voicemail is now also visible in call logs. An icon in the Status column makes it easy to find the messages that callers have left.

Audit logs enhancements

With the recent updates, you can download logs as a CSV file.

Furthermore, you can navigate to audit logs from the detail view pages of Call Flows, Escalations, Status Pages, Services, Metrics, and other features within ilert. For example, if you noticed changes in the Support hours, you can find a support hour schedule that you are interested in, click the three-dot icon on the right side of the screen, and choose “Go to audit logs.” It is a fast way to track changes and usage across your organization.  

As a reminder, audit logs are available for ilert Enterprise customers. They are accessed via the Settings menu (the cog icon at the top right corner of the ilert navigation panel). If you want to enable Audit logs for your account, just message us at support@ilert.com.

ilert mobile app


While it's hard to enjoy the sound of incoming critical notifications, we did our best to improve this experience for you. You will find various sound options to ensure you never miss an incoming alert. We also introduced short and long tones, so you can choose something that is sure to catch your attention but won't give you a heart attack.

The list of incidents in the ilert mobile app is now enhanced with more filtering options. You can choose filtering by services, status pages, statuses, and time frames. 

Also, the alert list and alert details look way better on mobile. We simplified and cleaned the interface to make it more intuitive to navigate the most critical section of the platform. 

Call logs—incoming phone calls that go through your ilert call flows—are now also visible in the mobile app. If you are using the Call flow add-on, you will find logs in the navigation panel, right after the Incidents section. 

Coverage requests are shown for 24 hours before they are moved to “Past requests” to give users more time to react and take a colleague's shift. 

Remember to download the ilert mobile app for Android or iPhone.

AIOps is out of BETA

ilert AIOps is no longer in BETA and is available for purchase as an add-on. The feature set is designed to make alerting smarter and more efficient by reducing noise and ensuring that only relevant alerts reach on-call teams. It automatically groups related alerts and filters duplicates so teams can focus on what really matters. It helps detect incidents faster, cut through the chaos during outages, and ultimately reduce alert fatigue. Read our blog for a detailed look at how ilert's intelligent alerting features bring clarity and calm to incident response.
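
To make the idea of filtering duplicates concrete, here is a deliberately simple sketch in Python. It is not how ilert AIOps works internally, which this post does not describe; it swaps in plain key-based deduplication, and the `GroupedAlert` class, the `group_events` function, and the `key` and `summary` event fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GroupedAlert:
    key: str               # identifier shared by duplicate events (assumption)
    summary: str           # summary taken from the first event seen
    event_count: int = 1   # how many events were folded into this alert


def group_events(events: List[dict]) -> List[GroupedAlert]:
    """Collapse events that share a key into one alert, in first-seen order."""
    alerts: Dict[str, GroupedAlert] = {}
    for event in events:
        key = event["key"]
        if key in alerts:
            # Duplicate: count it instead of opening a new alert.
            alerts[key].event_count += 1
        else:
            alerts[key] = GroupedAlert(key=key, summary=event["summary"])
    return list(alerts.values())


# Placeholder events, invented for this example.
events = [
    {"key": "web-01/down", "summary": "web-01 unreachable"},
    {"key": "web-01/down", "summary": "web-01 unreachable"},
    {"key": "db-01/disk", "summary": "db-01 disk almost full"},
]
for alert in group_events(events):
    print(alert.key, alert.event_count)
# web-01/down 2
# db-01/disk 1
```

Three incoming events result in two alerts, so the responder is notified once per distinct problem rather than once per event.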

Minor improvements

Alert templates are supplemented by new fields: eventType and alertKey to help fine-tune alerting preferences. 

More notification time options are available for ilert maintenance windows to provide you with more flexibility in informing your users and stakeholders about the upcoming maintenance. You can also double-check who will be notified by clicking “Show details” right below the “Schedule maintenance window” button.

The Alert sources menu is easier to navigate. You can use filters to find the needed solution. Don't forget that if you cannot find your solution in the list, simply leave the name of the tool in the field at the bottom of the page, and we will reach out to you for further use case clarifications.

More integrations

Find new alert sources in the ilert catalog!

Dash0—an AI-powered, OpenTelemetry-native observability platform that helps developers and SREs troubleshoot and resolve incidents faster by providing a high-quality experience for exploring logs, metrics, and traces—all in one place.

SAP Focused Run—an advanced operations platform designed for large-scale IT landscapes, providing high-volume system monitoring, alerting, and analytics for SAP and non-SAP environments.

IT Conductor—a patented, cloud-based service orchestration and automation platform designed to monitor, manage, and orchestrate enterprise IT through intelligent automation. It provides full-stack SAP monitoring, management, and orchestration, streamlining end-to-end management of the entire SAP landscape. 

The Checkmk integration was enhanced. It now has a bi-directional option, so users can acknowledge, close, or annotate Checkmk events from ilert.

The improved Cisco Meraki alert source can now automatically resolve the corresponding alerts in ilert. Just check that the alert type IDs are listed in the ilert Documentation article.

Moreover, we added Argo CD to the list of ilert deployment integrations. Argo CD is a GitOps tool for Kubernetes that automates deployments by syncing the desired state from Git, ensuring consistent and auditable releases. With the ilert Deployment integration for Argo CD, you can display your deployment pipelines in ilert and expand the context of alerts.

Insights

Top 5 Incident Response Platforms for 2025

Looking for an Opsgenie or PagerDuty alternative? Here's the list of the best incident response solutions in 2025.

Daria Yankevich
Apr 10, 2025 • 5 min read

An incident response platform helps organizations manage, track, and resolve IT incidents quickly and efficiently. With the right platform, teams can minimize downtime, reduce the impact of incidents, and improve overall response times. 

In this article, we’ll explore the top 5 incident response platforms for 2025, helping you choose the best solution for your needs.

This list is slightly biased — after all, we do offer a full end-to-end incident management platform ourselves. That said, we’ve made every effort to keep things fair. The platforms we’ve included are trusted, robust, and capable of handling all your operational needs. We’ve also broken down their similarities and differences to help you navigate the landscape and find the right fit—even if it’s not us.

Key Takeaways

  • Selecting the right incident management tool is critical for effective incident response, especially for companies navigating EU regulations and recent industry changes like Opsgenie’s EOL.
  • Key features to look for in incident response and management include multi-channel alerting, automated workflows, customizable escalation policies, and robust integrations with existing systems.
  • Leading platforms offer advanced functionalities tailored for various organizational needs but can vary significantly in cost and suitability for different team sizes.

Key Features of Leading Incident Response Platforms

When evaluating platforms in 2025, several core features stand out as essential for engineering and operations teams. Let's start with alerting features. First and foremost, alerting must be multi-channel—supporting voice calls, SMS, push, email, and chat tools like Slack or Microsoft Teams—and fully actionable without requiring the user to log in or switch apps. Time-to-response is critical, and eliminating friction at this step can mean the difference between a minor service disruption and a major outage. Advanced capabilities such as alert deduplication, intelligent grouping, noise reduction through filtering rules, and reusable templates help reduce alert fatigue, ensuring that responders only receive relevant and high-priority signals.

Another critical component is on-call management. Platforms should offer automated on-call scheduling with support for rotations, overrides, and hand-offs, as well as fully customizable escalation policies, ensuring the right person is notified based on severity, time of day, or other dynamic conditions. It's also important that the UI is convenient and easy to use for all members of on-call teams.
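
As a rough illustration of how a time-based escalation policy might be evaluated, consider the Python sketch below. It does not reflect any particular vendor's data model: `EscalationRule`, `current_responder`, and the responder labels are hypothetical, and real platforms also take severity, schedules, and overrides into account.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EscalationRule:
    after_minutes: int  # escalate once the alert has gone unacknowledged this long
    responder: str      # who gets notified at this step


def current_responder(
    rules: List[EscalationRule], minutes_unacknowledged: int
) -> Optional[str]:
    """Return the responder of the latest rule whose delay has elapsed."""
    due = [rule for rule in rules if rule.after_minutes <= minutes_unacknowledged]
    if not due:
        return None
    return max(due, key=lambda rule: rule.after_minutes).responder


policy = [
    EscalationRule(after_minutes=0, responder="primary on-call"),
    EscalationRule(after_minutes=15, responder="secondary on-call"),
    EscalationRule(after_minutes=30, responder="engineering manager"),
]
print(current_responder(policy, 5))   # primary on-call
print(current_responder(policy, 20))  # secondary on-call
```

Keeping the rules as plain data makes it straightforward to review who is notified at each step.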

Integration capabilities are key for embedding the incident response process into your existing tooling. Leading platforms offer native integrations with monitoring and observability tools (like Prometheus, Datadog, or PRTG), log aggregators (such as Loki), ITSM tools (e.g., ServiceNow, Jira Service Management), and CI/CD systems (like GitHub or GitLab). These integrations ensure seamless data flow and enable fast context gathering during incidents.

Status pages are another valuable asset. They allow teams to communicate transparently with users and stakeholders during outages, reducing support load and building trust.

Finally, post-incident analysis is no longer a nice-to-have. Platforms should support automated postmortem creation by capturing timelines, chat logs, alerts, and resolution steps. This not only reduces administrative overhead but also enables teams to focus on root cause analysis, lessons learned, and continuous improvement.

In short, a modern incident management platform should act as a control center—tightly connected with your stack, automating where it can, and enabling humans to focus on the decisions that matter most.

ilert: A European powerhouse for end-to-end incident management

ilert is a modern, European-based incident management platform that delivers end-to-end workflows—combining powerful alerting, on-call management, automation, and status communication in a single, unified solution. With multi-channel, fully actionable alerts (SMS, voice, push, email, Slack, MS Teams), ilert ensures fast response times and a seamless on-call experience.

Its intelligent alert processing includes AI-powered deduplication, grouping, dynamic routing, flexible templating, and 100+ integrations with tools like Prometheus, Zabbix, Grafana, Datadog, and AWS CloudWatch. The intuitive on-call scheduler supports rotations, overrides, and escalation policies, all configurable via an easy-to-use UI or mobile app.

ilert’s advanced call routing acts as a smart hotline, featuring multi-language IVR menus, an AI voice agent, PIN code protection, blocked number handling, and voicemail fallback — making it ideal for operations teams and MSPs.

Integrated status pages (public, private, or audience-specific) allow real-time incident communication and reduce support load. Unlike standalone tools, ilert's status pages are natively integrated, enabling full automation and consistency.

As a Germany-based company, ilert is GDPR-compliant and offers EU data residency, making it the go-to choice for privacy-conscious organizations. It's a more agile, customer-centric alternative to PagerDuty and Opsgenie — especially after Opsgenie’s EOL — and is trusted by enterprises like IKEA, Lufthansa Systems, Adesso, and NTT Data.

ilert supports a wide range of use cases — from DevOps and SecOps to industrial operations — and excels in serving MSPs and IT service providers, with features like multi-tenant support, custom alert routing, and SLA-focused design.

PagerDuty: A veteran in incident management

PagerDuty has long been considered a pioneer in the incident management space. Founded in 2009, the platform has evolved into a comprehensive solution tailored primarily for DevOps and SRE teams in large, complex environments. It offers a mature feature set that includes multi-channel alerting, on-call management, escalation policies, and real-time incident tracking.

One of PagerDuty’s strengths lies in its extensive integration ecosystem, supporting hundreds of tools such as Datadog, New Relic, AWS CloudWatch, Splunk, and more. It also features event intelligence, using machine learning to automatically suppress noise, correlate related alerts, and prioritize incidents — helping reduce alert fatigue and focus teams on what matters most.

For larger enterprises, PagerDuty offers Runbook Automation, Service Graphs, and Business Impact Metrics, making it easier to manage dependencies, assess incident impact, and align technical operations with business priorities.

However, this depth and breadth come with trade-offs. Many teams — especially those in mid-sized companies or with simpler needs — report that PagerDuty can feel overly complex and rigid, with a steep learning curve and a pricing model that quickly scales with team size and advanced feature usage.

In short, PagerDuty remains a robust and trusted platform, especially for large enterprises with advanced automation and integration needs. But for teams seeking a more agile, cost-effective, and privacy-compliant solution — particularly in Europe — there are now modern alternatives better suited to evolving operational demands.

Looking for a PagerDuty alternative? Check the comparison between PagerDuty and ilert.

xMatters: Advanced workflow automation

xMatters is an established player in the incident management space, with a strong focus on workflow automation and event-driven orchestration. Designed to support DevOps, ITOps, and business continuity teams, xMatters enables organizations to build custom workflows that connect monitoring systems, notification channels, ticketing tools, and more — all through a low-code interface.

Its incident response capabilities include multi-channel alerting, on-call scheduling, escalations, and automated response actions. What sets xMatters apart is its ability to let users define automated workflows that trigger based on specific conditions.

However, xMatters can feel more focused on process automation than on hands-on, engineer-friendly incident resolution. Teams looking for an intuitive UI and tight integration with modern DevOps workflows may find it less direct than alternatives like ilert or PagerDuty. Additionally, its user interface and setup process can be perceived as complex, especially for smaller teams or those without dedicated tooling engineers.

While xMatters is a solid choice for organizations that prioritize event orchestration and workflow design, it may be overkill for teams simply looking for fast, effective incident alerting and response. That said, for enterprises with sophisticated ITSM needs and a strong focus on process automation, xMatters remains a powerful and highly customizable platform.

Grafana IRM: Unified incident response for Grafana ecosystem

Grafana IRM (Incident Response & Management) is the new, integrated incident management solution from Grafana Labs, combining the capabilities of Grafana OnCall and Grafana Incident into a single, cloud-based platform. Built natively into the Grafana Cloud ecosystem, Grafana IRM aims to simplify the entire incident lifecycle — from detection to resolution — for teams already using Grafana for observability.

One of the key advantages of Grafana IRM is its seamless integration with Grafana Cloud monitoring tools like Loki, Tempo, and Prometheus. Teams can create, track, and resolve incidents directly from their dashboards without needing to jump between multiple systems. The platform includes built-in on-call scheduling, automated escalations, and incident tracking, all accessible from a unified interface. It also supports customizable workflows, helping teams define how alerts are routed, how incidents are escalated, and how post-incident reviews are handled — all while keeping stakeholders in the loop via native notifications.

For teams already invested in Grafana Cloud, IRM offers convenience and speed. It reduces tool sprawl, lowers onboarding complexity, and keeps incident response tightly aligned with monitoring and logging. However, the platform may not be ideal for teams with hybrid or diverse monitoring stacks outside of Grafana Cloud, as it is tightly coupled to the Grafana ecosystem. Additionally, some advanced enterprise-grade features — such as AI-based alert deduplication, voice-based incident routing, or multi-tenant support — are better covered by dedicated platforms like ilert or PagerDuty.

Grafana IRM is the future-facing replacement for Grafana OnCall, which officially entered maintenance mode in March 2025.

Overall, Grafana IRM is a solid and integrated option for Grafana Cloud users seeking a native, streamlined incident response experience—but it may serve best as a complement or starting point rather than a fully standalone platform for complex or non-Grafana environments.

Opsgenie: A solution for Jira Service Management users

Opsgenie, once a go-to solution for incident alerting and on-call management, has long been part of the Atlassian ecosystem. Known for its clean interface, solid alert routing logic, and tight integration with Jira and Confluence, Opsgenie served many DevOps and IT teams well—especially those already invested in Atlassian products.

The platform offered core features like on-call scheduling, multi-channel alerting, escalation policies, and integrations with popular monitoring tools such as Datadog and Prometheus. Its alert customization and incident timeline features made it a practical choice for managing critical events, with support for collaboration tools like Slack.

However, Opsgenie will be phased out and merged into Atlassian’s broader ITSM suite, primarily Jira Service Management (JSM). This shift has introduced challenges for teams that relied on Opsgenie as a standalone, lightweight incident response tool. The tighter coupling with JSM increases complexity and may not suit agile DevOps teams or service providers seeking flexibility and speed.

As a result, many organizations are now actively searching for an Opsgenie alternative—one that delivers the same reliability with more responsive support, a dedicated roadmap, and deeper flexibility. Platforms like ilert have emerged as top choices, offering seamless migration paths, GDPR compliance, and advanced alerting, scheduling, and automation capabilities that go beyond what Opsgenie provided. Meanwhile, if you are using JSM and plan to continue doing so, Opsgenie is still a great solution that will soon merge into the familiar platform.

Summary

Choosing the right incident response platform is crucial for maintaining service reliability and ensuring quick resolutions to incidents. Each of the platforms reviewed in this blog post offers unique strengths and features, making them suitable for various organizational needs.

Product

Postmortem Template to Optimize Your Incident Response

Discover key elements of a postmortem template and get a free download to improve incident response—even without an incident management platform.

Marko Simon
Apr 01, 2025 • 5 min read
Download postmortem template

A postmortem template is a structured tool for documenting incidents, understanding their causes, and learning how to prevent them in the future. This article explains the essential elements of an effective postmortem and how ilert can streamline this process, making your incident response more efficient. It also offers a downloadable version of a postmortem template that you can use if you haven't yet utilized an incident management platform in your organization.

Key takeaways

  • Postmortem templates turn incidents into valuable learning opportunities, helping teams identify vulnerabilities and improve future responses.
  • Postmortems are used for further improvements within the teams and external communication with stakeholders.
  • Key elements of an effective postmortem include an incident timeline, impact and mitigation details, and a root cause analysis for continuous improvement.
  • ilert streamlines the postmortem process by automating data collection and promoting a blameless culture that focuses on learning rather than assigning fault.

The importance of an incident postmortem in incident management

Postmortems are more than just documents; they’re blueprints for turning incidents into invaluable learning opportunities. Documenting incidents in a structured manner helps pinpoint system vulnerabilities and enhance your team’s future responses. This method not only resolves current issues but also serves as a crucial reference for managing future incidents effectively.

Consider the chaos of an incident: systems failing, users affected, and the clock ticking. When the dust settles, a well-crafted postmortem template helps you make sense of the madness. It provides a clear, step-by-step account of what happened, why it happened, and how project management can help prevent it from happening again. Such a structured approach transforms a negative event into a positive learning experience.

Moreover, having a consistent incident postmortem process ensures that every incident is analyzed comprehensively. This consistency helps teams identify patterns and recurring issues, leading to more effective and proactive incident management.

Key elements of an effective postmortem template

Creating an effective postmortem template starts with a clear title and introduction that summarizes the incident. This sets the stage for anyone reading the document, providing immediate context.

Following this is the incident timeline—a chronological account of events leading up to and during the incident, complete with timestamps. This section is crucial for understanding the sequence of events and identifying contributing factors and potential triggers.

The impact and mitigation section is another critical component. Here, you detail the effects of the incident on users and describe the immediate corrective actions taken. This section helps teams understand the real-world implications of the incident and the effectiveness of their initial response.

Root cause analysis and lessons learned are the heart of any postmortem template. By identifying the root cause, teams can implement measures to prevent similar incidents in the future. Lessons learned provide valuable insights into what worked well and what didn’t, fostering a culture of continuous improvement.

Using a consistent format in postmortem documentation facilitates thorough analysis and more effective incident management. Regularly updating the template based on feedback and outcomes from previous postmortems further enhances its effectiveness. Ultimately, an effective postmortem template is not just a document; it’s a dynamic tool for continuous learning and improvement.
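
For teams that keep postmortems as text files, the elements above can be captured in a small data structure and rendered to Markdown. The Python sketch below is one possible shape under stated assumptions, not the ilert template or its export format: the `Postmortem` class, its field names, and the section headings are hypothetical, and the example values are placeholders.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple


@dataclass
class Postmortem:
    title: str
    summary: str
    timeline: List[Tuple[datetime, str]] = field(default_factory=list)
    impact: str = ""
    mitigation: str = ""
    root_cause: str = ""
    lessons_learned: List[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the postmortem with timeline entries in chronological order."""
        lines = [f"# {self.title}", "", self.summary, "", "## Incident timeline", ""]
        for timestamp, event in sorted(self.timeline, key=lambda entry: entry[0]):
            lines.append(f"- **{timestamp:%Y-%m-%d %H:%M}**: {event}")
        lines += ["", "## Impact and mitigation", "", self.impact, "", self.mitigation]
        lines += ["", "## Root cause", "", self.root_cause]
        lines += ["", "## Lessons learned", ""]
        lines += [f"- {lesson}" for lesson in self.lessons_learned]
        return "\n".join(lines)


# Placeholder content, invented for this example.
report = Postmortem(
    title="Example outage",
    summary="Placeholder summary of what happened.",
    timeline=[
        (datetime(2025, 1, 1, 10, 30), "Incident resolved"),
        (datetime(2025, 1, 1, 10, 0), "First alert received"),
    ],
    impact="Placeholder description of user impact.",
    mitigation="Placeholder description of corrective actions.",
    root_cause="Placeholder root cause.",
    lessons_learned=["Placeholder lesson"],
)
print(report.to_markdown())
```

Sorting the timeline at render time means entries can be added in any order while the document still reads chronologically.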

ilert's built-in postmortem feature

ilert takes the hassle out of creating postmortem documents. It automatically gathers data from various incident-related communications and status updates, making the documentation process seamless. This feature is a lifesaver when you’re dealing with the aftermath of an incident and need to focus on analysis rather than data collection.

Integration with chat tools like Slack and Microsoft Teams further streamlines the process. ilert can automatically compile alerts triggered during incidents and include relevant messages from linked channels. This means you don’t have to manually sift through endless chat logs to find pertinent information.

Once the document is generated, its status transitions to “created,” and users can view a simplified markdown version or access the raw text file for further adjustments. This flexibility allows teams to fine-tune the document before sharing it with stakeholders, ensuring that it meets all requirements and provides valuable insights into the development process.

Moreover, ilert allows you to link postmortems to specific incidents and publish them on all relevant status pages. This ensures everyone is aligned and has access to the postmortem report. Making the postmortem process more efficient, ilert helps teams concentrate on identifying root causes and areas for improvement.

Example incident and postmortem document creation with ilert

Let's imagine the following incident scenario to show you ilert in action and help you better understand the structure of the postmortem process.

Incident scenario

Company XY is a website hosting service that utilizes a cloud provider to host and deliver their customers’ websites. They get notified about any incidents on the cloud provider's site.

In the late afternoon, several alerts were created in ilert signaling unreachable customer websites. About half of the customers were impacted. The responder, Gregory, escalated the issue by creating an incident and setting the status to "Investigating." This was immediately reflected on the status page. After the cause of the problem was identified, the status was changed to "Identified" to keep users informed. Later, Francesca chimed in, got information from the provider, and set the status to "Monitoring." After 1.5 hours, the incident was resolved, and Francesca set the status to "Resolved."

(By the way, if you are unsure about the difference between alerts and incidents, we have a dedicated article. In short, alerts are technical signals from monitoring tools, while incidents are disruptions that impact users and must be communicated.)

The illustrations below show the whole process vividly.

Figure: The team receives alerts and communicates via the ilert incident management platform.
Figure: An incident is created in ilert.
Figure: The incident is resolved.
Figure: Automatic postmortem generation with ilert AI.
Figure: A preview of the postmortem document created with ilert AI.

Automatic postmortem creation

After the dust had settled, engineers created a postmortem report. ilert reviewed all available information, including alert details, logs, messages, and status updates, and prepared a clear, structured postmortem document.

All postmortems are saved in ilert. However, users can also download them or save them as plain text.

# [00000 Partial data center outage causing some websites to be down.](https://test.ilert.com/incidents/view?id=000)
Generated by Francesca Sala on 18.03.2025 17:40.
All timestamps are local to Europe/Berlin.

# Post-Mortem Document

## Incident Timeline

### March 18, 2025
- **14:26:24.109Z**: Received event from alert source indicating website thernos.com is down.
- **14:26:25.426Z**: Francesca Sala notified via email.
- **14:26:25.437Z**: Gregory George notified via email.
- **14:26:24.129Z**: Assigned to Gregory George.
- **14:27:06.664Z**: Accepted by Gregory George.
- **14:33:52.317Z**: Gregory George linked incident 'Partial data center outage causing some websites to be down' to this alert.
- **14:36:46.682Z**: Gregory George changed linked incident status to Identified.
- **14:59:00.145Z**: Gregory George added a comment regarding an email from Thernos asking for an estimate on website restoration.
- **15:00:28.502Z**: Francesca Sala added a comment indicating the provider is restarting affected regions.
- **15:09:21.785Z**: Francesca Sala changed linked incident status to Monitoring.
- **16:03:51.741Z**: Francesca Sala changed linked incident status to Resolved.
- **16:06:36.737Z**: Francesca Sala added a comment indicating the incident is resolved and the website is online again.
- **16:06:36.737Z**: Incident resolved by Francesca Sala.

### March 18, 2025 (Additional Alerts)
- **14:26:30.692Z**: Received event from alert source indicating website akisp.com is down.
- **14:26:31.884Z**: Francesca Sala notified via email.
- **14:26:31.887Z**: Gregory George notified via email.
- **14:26:30.705Z**: Assigned to Gregory George.
- **14:27:06.640Z**: Accepted by Gregory George.
- **14:33:48.699Z**: Gregory George linked incident 'Partial data center outage causing some websites to be down' to this alert.
- **14:36:46.699Z**: Gregory George changed linked incident status to Identified.
- **15:09:21.813Z**: Francesca Sala changed linked incident status to Monitoring.
- **16:03:51.770Z**: Francesca Sala changed linked incident status to Resolved.
- **16:06:36.524Z**: Francesca Sala added a comment indicating the incident is resolved and the website is online again.
- **16:06:36.524Z**: Incident resolved by Francesca Sala.

### March 18, 2025 (Additional Alerts)
- **14:26:36.713Z**: Received event from alert source indicating website kontore.com is down.
- **14:26:37.916Z**: Gregory George notified via email.
- **14:26:37.923Z**: Francesca Sala notified via email.
- **14:26:36.737Z**: Assigned to Gregory George.
- **14:27:06.602Z**: Accepted by Gregory George.
- **14:33:08.523Z**: Gregory George linked incident 'Partial data center outage causing some websites to be down' to this alert.
- **14:36:46.716Z**: Gregory George changed linked incident status to Identified.
- **15:09:21.837Z**: Francesca Sala changed linked incident status to Monitoring.
- **16:03:51.802Z**: Francesca Sala changed linked incident status to Resolved.
- **16:06:36.209Z**: Francesca Sala added a comment indicating the incident is resolved and the website is online again.
- **16:06:36.209Z**: Incident resolved by Francesca Sala.

## Impact

The incident caused a partial outage in one of our data centers, affecting the availability of several customer websites, including Thernos, Akisp, and Kontore. Approximately half of our hosted sites were down, leading to customer inquiries and potential business disruptions. The affected websites experienced degraded performance and were unreachable for a period of time, causing inconvenience to users and potentially impacting business operations for the affected customers.

## Root Cause Analysis

The root cause of the incident was identified as an issue with our data center provider. The provider experienced an outage in one of their data centers, which led to the unavailability of several hosted websites. The provider worked on resolving the issue by restarting the affected regions, which eventually restored the services.

## Action Items

1. **Monitoring Provider Status**: Francesca Sala will continue to monitor the cloud provider's status page for updates during incidents.
2. **Customer Communication**: Gregory George will draft and update the status page to keep customers informed during incidents.
3. **Incident Documentation**: Francesca Sala will create and share a post-mortem document after the incident is resolved.

This post-mortem document provides a detailed account of the incident, its impact, root cause, and the actions taken to prevent recurrence.
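
The timeline in the sample postmortem above can also be turned into simple response metrics. The following is a minimal Python sketch, not an ilert feature: it takes three timestamps from the first alert's timeline (event received, alert accepted, status set to Resolved) and computes how long acknowledgement and resolution took. The helper name `parse_ts` and the metric names are assumptions made for this illustration, and the timestamps are read as UTC because they carry a `Z` suffix.

```python
from datetime import datetime, timezone


def parse_ts(day: str, clock: str) -> datetime:
    """Parse a date plus an 'HH:MM:SS.mmmZ' clock string into an aware UTC datetime."""
    parsed = datetime.strptime(f"{day} {clock}", "%Y-%m-%d %H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


DAY = "2025-03-18"

# Values copied from the first alert's timeline in the sample postmortem.
received = parse_ts(DAY, "14:26:24.109Z")  # event received from alert source
accepted = parse_ts(DAY, "14:27:06.664Z")  # accepted by Gregory George
resolved = parse_ts(DAY, "16:03:51.741Z")  # incident status changed to Resolved

time_to_acknowledge = accepted - received
time_to_resolve = resolved - received

print(f"Time to acknowledge: {time_to_acknowledge.total_seconds():.1f} s")   # 42.6 s
print(f"Time to resolve: {time_to_resolve.total_seconds() / 60:.1f} min")    # 97.5 min
```

The resolution time of roughly 97 minutes matches the "about 1.5 hours" described in the scenario.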

Use ilert or download a postmortem template and fill in manually

Based on this example, we prepared a Google Docs template that you can use if you are not yet utilizing the ilert incident management platform. While assembling and writing all the information manually will be more time-consuming, it is still the first step to better arranging post-incident learnings and preparing for the next challenges.

Download a postmortem template.

A few words on blameless postmortems and blameless culture

A blameless postmortem focuses on collective learning and improvement rather than assigning fault to individuals. This approach fosters a supportive work environment and encourages team members to be honest and open during the postmortem process. Instead of pointing fingers, the focus is on understanding what happened and how to prevent it in the future.

Asking "what" and "how" questions instead of "who" during postmortem meetings helps analyze incidents without attributing blame. This promotes a growth mindset and fosters a culture of continuous improvement. A "no argument" policy during discussions ensures the focus remains on process improvement rather than assigning blame.

Utilizing data-driven insights, ilert AI provides unbiased evaluations of incidents, eliminating personal biases in reporting. This also helps create a blameless culture where the ultimate goal is to learn from incidents and improve future responses rather than playing the blame game.

Common pitfalls to avoid in postmortem document creation

To maximize the value of your postmortems, avoid these key pitfalls—ranked by their impact on long-term learning and operational resilience:

Not analyzing patterns across incidents

  • Treating each incident in isolation can hide recurring issues.
  • Regularly review multiple postmortems to detect patterns, systemic weaknesses, or process gaps.
  • Use this insight to inform broader improvements and prevent similar incidents in the future.
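
As an illustration of reviewing several postmortems together, here is a minimal Python sketch that counts how often the same contributing factor shows up across reports. It assumes each postmortem has been tagged by hand with short factor labels. The `Postmortem` class, the `recurring_factors` function, and all sample data are hypothetical and are not part of ilert.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Postmortem:
    title: str
    # Free-form tags a reviewer assigns while writing the report (hypothetical).
    contributing_factors: list[str] = field(default_factory=list)


def recurring_factors(postmortems: list[Postmortem], min_count: int = 2) -> list[tuple[str, int]]:
    """Return factors that appear in at least `min_count` postmortems."""
    # set() ensures a factor is counted only once per postmortem.
    counts = Counter(
        factor for pm in postmortems for factor in set(pm.contributing_factors)
    )
    recurring = [(f, n) for f, n in counts.items() if n >= min_count]
    # Most frequent first; ties are broken alphabetically for stable output.
    return sorted(recurring, key=lambda item: (-item[1], item[0]))


# Invented sample data, for illustration only.
history = [
    Postmortem("Partial data center outage", ["provider-outage", "single-region"]),
    Postmortem("Slow dashboard after deploy", ["missing-rollback-plan"]),
    Postmortem("Regional network failure", ["provider-outage", "single-region"]),
    Postmortem("Storage API errors", ["provider-outage"]),
]

for factor, count in recurring_factors(history):
    print(f"{factor}: seen in {count} postmortems")
```

With this sample data, "provider-outage" stands out as a factor in three of four incidents, which is the kind of signal that a review of single reports in isolation would miss.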

Failure to follow up on action items

  • Insight is meaningless without execution. If postmortem action items aren’t completed, incidents are likely to repeat.
  • Always assign owners and due dates, and track completion progress.
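
Tracking owners and due dates can be as simple as a small script or spreadsheet. The Python sketch below is one possible way to flag overdue action items. The item titles and owners come from the sample postmortem above, while the due dates, the `ActionItem` class, and the `overdue_items` function are assumptions invented for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    title: str
    owner: str
    due: date
    done: bool = False


def overdue_items(items: list[ActionItem], today: date) -> list[ActionItem]:
    """Return open action items whose due date lies before `today`."""
    return [item for item in items if not item.done and item.due < today]


# Titles and owners are from the sample postmortem; due dates are hypothetical.
items = [
    ActionItem("Monitoring Provider Status", "Francesca Sala", date(2025, 3, 25)),
    ActionItem("Customer Communication", "Gregory George", date(2025, 4, 8)),
    ActionItem("Incident Documentation", "Francesca Sala", date(2025, 3, 21), done=True),
]

for item in overdue_items(items, today=date(2025, 4, 1)):
    print(f"OVERDUE: {item.title} (owner: {item.owner}, due {item.due.isoformat()})")
```

Running such a check on a regular schedule, for example before each team review, keeps unfinished items visible instead of letting them fade after the incident.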

Using a generic template

  • A one-size-fits-all postmortem template may omit crucial incident-specific details.
  • Customize templates to include everything relevant—like timeline, impact, contributing factors, and remediation steps.

Lack of a blameless culture

  • If people feel blamed, they’re less likely to share honestly.
  • Promote a culture of psychological safety and learning, not punishment.

Vague or unconstructive feedback

  • Feedback that lacks clarity or actionability won’t lead to meaningful change.
  • Encourage specific, constructive feedback that points to clear improvements.

Poor stakeholder communication

  • Not sharing postmortems with key stakeholders reduces organizational learning.
  • Proactively circulate findings to relevant teams, leadership, and other affected parties to keep everyone aligned.

Summary

Postmortem templates are essential tools for transforming incidents into learning opportunities. By documenting incidents in a structured manner, teams can identify system vulnerabilities, improve future responses, and foster a culture of continuous improvement. ilert’s built-in features and AI enhancements make the postmortem process seamless and efficient, allowing teams to focus on what really matters.

Implementing a formal postmortem process and avoiding common pitfalls ensures that every incident becomes a stepping stone toward success. By embracing a blameless culture, teams can learn from their experiences and drive better outcomes. Remember, the ultimate goal is to turn every failure into an opportunity for growth and improvement.

Frequently Asked Questions

What is the purpose of using ilert AI in postmortem creation?

Using ilert AI for postmortem creation speeds up the final stage of incident response, letting you focus on evaluating the incident instead of spending ages on paperwork. It's all about getting to the good stuff quicker!

What happens after an incident reaches the "Resolved" state?

Once an incident hits the "Resolved" state, the team collects all the relevant details and documents everything discussed to ensure everyone is on the same page. ilert users skip the manual part of the work and jump right to discussing the incident and executing action items.

What information does ilert AI consider when generating a postmortem document?

ilert AI generates a postmortem document by considering the incident's context, including history updates, Slack or Microsoft Teams messages, subscribers, services, involved users, and any linked alert details.

How can users include relevant messages from communication channels in their postmortem document?

You can easily add relevant messages to your postmortem by linking your Slack or Microsoft Teams channels, which the ilert bot will scan for you. Alternatively, copy and paste chat transcripts manually from anywhere you need.
