ilert seamlessly connects with your tools using our pre-built integrations or via email. ilert integrates with monitoring, ticketing, chat, and collaboration tools.
We have transformed our incident management process with ilert. Our platform is intuitive, reliable, and has greatly improved our team's response time.
ilert has helped Ingka significantly reduce both MTTR & MTTA over the last 3 years, the collaboration with the team at ilert is what makes the difference. ilert has been top notch to address even the smallest needs from Ingka and have consistently delivered on the product roadmap. This has inspired the confidence of our consumers making us a 'go to' for all on call management & status pages.
Karan Honavar
Engineering Manager at IKEA
ilert is a low maintenance solution, it simply delivers [...] as a result, the mental load has gone.
Tim Dauer
VP Tech
We even recommend ilert to our own customers.
Maximilian Krieg
Leader Of Managed Network & Security
We are using ilert to fix our problems sooner than our customers are realizing them. ilert gives our engineering and operations teams the confidence that we will react in time.
Dr. Robert Zores
Chief Technology Officer
ilert has proven to be a reliable and stable solution. Support for the very minor issues that occurred within seven years has been outstanding and more than 7,000 incidents have been handled via ilert.
Stefan Hierlmeier
Service Delivery Manager
The overall experience is actually absolutely great and I'm very happy that we decided to use this product and your services.
Timo Manuel Junge
Head Of Microsoft Systems & Services
The easy integration of alert sources and the reliability of the alerts convinced us. The app offers our employees an easy way to respond to incidents.
As our customer base and data demands grew exponentially over the years, scaling our database infrastructure became imperative. Our vision was to set up an active-active database architecture that would ensure regional independence and exceptional service quality globally. Here’s an in-depth look at how our team managed to migrate our production data to AWS RDS Aurora, incorporating cutting-edge strategies to minimize impact during the transitional phase.
Understanding the Challenge
Facing limitations with our existing Community MySQL setup, we needed a scalable, high-availability solution that could handle our increasing load and improve global data access. Our aim was to implement an active-active configuration with AWS RDS Aurora to facilitate regional independence and enhance global service delivery.
Step 1: Strategic Pre-Migration Planning
Preparation was the first key step. Our team meticulously examined the existing database system, charting out all dependencies and specifications. We defined all infrastructure components in Terraform, which not only made the setup in AWS smoother but also kept our environments consistent, which is crucial for reducing potential errors during migration.
Our team undertook an exhaustive planning process, evaluating every aspect of our existing database configurations and querying demands. This analysis helped us precisely define the compute and storage specifications for our Aurora setup.
Step 2: Configuring AWS RDS Aurora and Read-Only Services
We set up the AWS RDS Aurora cluster, ensuring it met all our specifications for high performance and reliability. To ensure a smooth switch, we set up read-only services connected to an Aurora read replica. This step was vital as it allowed our services to continue operating on a read-only basis without any disruption when the main database was temporarily unavailable.
Parallel to this, we configured NGINX Ingress inside our Kubernetes clusters, which played a pivotal role in managing traffic during the migration. By defining specific rules in NGINX Ingress, we directed traffic between our normal and read-only service instances, maintaining service availability even when the main database was in a critical state.
Here is an example for the service-a ingress rule:
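A minimal sketch of such a rule, for illustration only: the host name, port, and ingress class below are assumptions, not the values used in production. The hypothetical ingress_manifest function prints a standard networking.k8s.io/v1 Ingress that routes all traffic for one host to the service-a Service.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: prints a minimal Ingress manifest for service-a.
# Host, port, and ingress class are assumptions, not values from the article.
ingress_manifest() {
  cat <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: service-a
spec:
  ingressClassName: nginx
  rules:
    - host: service-a.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: service-a
                port:
                  number: 80
EOF
}

# Print the manifest for review. To apply it, pipe it to kubectl:
#   ingress_manifest | kubectl apply -f -
ingress_manifest
```

Because the Ingress always points at the service-a Service, changing which pods that Service selects is enough to move traffic without touching the rule itself.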
And here is an example script to manage the traffic between read-only and normal instances:
# Move traffic to readonly instances
kubectl patch service service-a -p '{"spec":{"selector": {"app": "service-a-readonly"}}}'
# Move traffic back to write instances
kubectl patch service service-a -p '{"spec":{"selector": {"app": "service-a"}}}'
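The two commands above could be wrapped in a small helper so that the switch is repeatable during the migration window. The sketch below is hypothetical: the function names and the DRY_RUN variable are assumptions, and by default it only prints the kubectl command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper around the kubectl patch commands shown above.
set -euo pipefail

# Build the JSON patch that points a Service at pods with the given app label.
selector_patch() {
  # $1 = value of the "app" label the Service should select
  printf '{"spec":{"selector":{"app":"%s"}}}' "$1"
}

# Switch a Service to a different set of pods.
# With DRY_RUN=1 (the default) the command is printed, not executed.
switch_traffic() {
  # $1 = Service name, $2 = target app label
  local service="$1" target="$2"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "kubectl patch service ${service} -p '$(selector_patch "${target}")'"
  else
    kubectl patch service "${service}" -p "$(selector_patch "${target}")"
  fi
}

# Usage:
#   switch_traffic service-a service-a-readonly            # preview only
#   DRY_RUN=0 switch_traffic service-a service-a-readonly  # move to read-only
#   DRY_RUN=0 switch_traffic service-a service-a           # move back
```

Defaulting to a dry run is a deliberate choice here: it lets the exact command be reviewed before the maintenance window, when a typo in a selector would take the service offline.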
Step 3: Timing and Executing the Migration
To minimize customer impact, we scheduled the migration during our lowest traffic period. The switch to "read-only" mode for the main database lasted only 4 minutes. During this window, our applications were seamlessly interacting with the read-only services connected to the Aurora replica, ensuring continuous availability of data for reading purposes.
Simultaneously, we initiated the final synchronization of the last batch of data from the MySQL database to the Aurora database. At the end of this process, the Aurora cluster was promoted to handle both read and write operations.
Step 4: Switching Over with Minimal Disruption
Following the successful synchronization and promotion of the Aurora cluster, we switched the live traffic from the read-only instances back to the normal service instances, now pointing to the newly promoted Aurora cluster. This switch was handled delicately through updated NGINX Ingress rules, which redirected all traffic to the new Aurora setup, now capable of handling both read and write operations.
Step 5: Monitoring and Optimization Post-Migration
Post-migration, our team engaged in meticulous monitoring to ensure the system was functioning as expected. We paid close attention to performance metrics such as query efficiency, CPU usage, and storage utilization. Continuous optimizations were applied to ensure that our queries were fully leveraging Aurora’s advanced capabilities.
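As one way to spot-check CPU usage after a cutover, the sketch below builds an AWS CLI call against Amazon CloudWatch. It assumes the metrics are read from CloudWatch, which the article does not state; the cpu_metric_cmd function and the instance identifier in the usage note are hypothetical. The function only prints the command so that it can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) an AWS CLI call that fetches the
# average CPU utilization of one database instance in 5-minute buckets.
cpu_metric_cmd() {
  # $1 = DB instance identifier
  # $2 = start time, $3 = end time (ISO 8601, UTC)
  printf '%s\n' "aws cloudwatch get-metric-statistics \
--namespace AWS/RDS \
--metric-name CPUUtilization \
--dimensions Name=DBInstanceIdentifier,Value=$1 \
--start-time $2 \
--end-time $3 \
--period 300 \
--statistics Average"
}

# Usage (the instance name is an assumption):
#   cpu_metric_cmd aurora-writer-1 2024-01-01T00:00:00Z 2024-01-01T01:00:00Z
```

Running the printed command requires configured AWS credentials. Comparing the averages against the pre-migration baseline is one simple way to confirm that the new cluster is sized appropriately.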
Conclusion
Migrating to AWS RDS Aurora with just a 4-minute read-only window exemplifies our team's commitment to operational excellence and minimal customer impact. Our detailed preparation, use of sophisticated tools like Terraform, and strategic execution enabled us to not only enhance our database performance but also prepare our global infrastructure to better serve our customers through an innovative active-active setup.
Today, we are successfully operating our Aurora Database Cluster in an active-active configuration across two independent regions, spanning six availability zones. This configuration not only boosts performance and ensures higher availability but also reduces latency for our global customer base.
Looking ahead, we are planning to scale our operations even further, enhancing our infrastructure's resilience and efficiency. Our journey with AWS Aurora is a testament to our ongoing commitment to leveraging cutting-edge technology to deliver the best possible service to our customers.
ilert is a validated partner of AWS and has Amazon RDS Ready and Qualified Software achievements. ilert provides out-of-the-box integrations with various Amazon services that are aimed at monitoring your systems and alerting your team when anomalies are detected. Go to the cloud and achieve operational excellence with ilert and AWS.
We’re thrilled to announce that we’ve integrated with Netdata, a popular open-source monitoring solution, to give you more visibility and control over your systems. This powerful combination enhances your ability to monitor, detect, and respond to system alerts in real time.
What is Netdata?
Netdata is an open-source, real-time performance monitoring solution. It's designed to monitor everything from system resources like CPU, memory, disk, and network usage to application-level metrics across hundreds of data points. Netdata collects and visualizes thousands of metrics with minimal resource overhead, making it ideal for small- and large-scale environments.
Netdata offers two key components to deliver its powerful monitoring: Agent and Cloud. While the Agent is built to work on nearly any environment, from bare-metal servers and virtual machines to containers and Kubernetes clusters, Netdata Cloud is a solution for organizations managing multiple systems. It offers a unified view of all infrastructure, regardless of how many instances or locations the user has. The new ilert integration is available for both components.
What Does the Netdata and ilert Integration Offer?
This integration enriches Netdata with access to an end-to-end incident management platform. Here are the key advantages for the integration users.
Actionable multi-channel alerting: Netdata’s real-time metrics trigger instant alerts via ilert, ensuring your team is notified of critical issues as soon as they arise. Alerts can be delivered through various channels such as SMS, email, voice calls, or push notifications. Users can take the first actions right in the channel where the notification was received; no login is required.
Automated and distributed on-call duty: ilert provides advanced on-call scheduling, escalations, and rotations. Netdata users can ensure the right people are alerted based on the team’s availability, avoiding unnecessary delays in response time.
Reduced noise with the help of AI: You can configure custom alert thresholds in ilert based on the content of alerts Netdata provides. Set rules for critical system parameters and ensure you’re only notified when it really matters.
Advanced incident communication: When an alert is triggered, ilert helps you manage the entire incident lifecycle, from initial detection to resolution. You can keep stakeholders informed with the help of branded status pages and automated notifications. ilert AI also assists in messaging and provides information on affected services in case of incidents.
How to Get Started with the Netdata and ilert Integration
Install the Netdata Agent: If you haven’t already, install the Netdata Agent on your systems. You can follow the installation guide for Linux, macOS, or Windows.
Or set up Netdata Cloud: If you have a distributed infrastructure, sign up for Netdata Cloud to gain centralized visibility and monitoring across all your instances.
Connect Netdata with ilert: In your ilert account, go to Alert Sources and choose Netdata from the list. Follow the step-by-step instructions.
In this quarterly product update, you’ll discover how to customize ilert dashboards to fit your team’s needs, find advanced filters for building complex alert actions, and reduce costs as an MSP using ilert status pages.
Customizable Dashboard
The ilert dashboard is a flexible and customizable page designed to help you monitor your team's preferred metrics and gain insights into various aspects of ilert. You can create multiple dashboards, each tailored to specific teams or purposes. With a wide range of widgets available, you can easily add, remove, and organize them to build a dashboard that fits your unique needs. Learn more about this feature in the documentation.
New Alerting Features
Alert Grouping Metrics
A few months ago, we introduced Intelligent Alert Grouping, a powerful feature designed to reduce alert fatigue and help teams manage large volumes of alerts more effectively. With the latest update, ilert users can now track the effectiveness of this feature. For each alert source with alert grouping enabled, you can view the reduced alert volume, improved response times, and grouping precision based on user feedback. You can also adjust the time period and choose between daily, weekly, or monthly reports.
Additionally, grouped alerts are now clearly visible in the Alert timeline, and you can quickly access raw event data via the </> icon.
For those who haven't yet tried the Intelligent Alert Grouping feature, we've introduced a new section called "Similar Alerts," located under Links. It shows responders other open alerts that are related, across all alert sources, even with Intelligent Alert Grouping turned off. The same section is also available when resolving alerts.
For users already leveraging AIOps features, we've polished and improved how grouped alerts are displayed in the Event Explorer. Just as a reminder, Event Explorer provides a detailed view of alerts from a specific alert source. To see alert group information in JSON format, click the circle in the Alert details and navigate to Event Explorer for further investigation.
Alert Action Conditional Execution
ilert now offers even more customization for automated workflows, giving you greater precision when specifying the conditions for alerts that should trigger actions. You can choose from various filter groups such as alert summary, status, priority, and alertKey, as well as custom alert details specific to the selected alert source. Alerts that don’t match your criteria won’t trigger any action. For advanced users, we’ve also enabled the option to set filters using a code editor.
Additionally, we recommend reviewing the list of trigger events for alert actions. We've added the ability to activate an action if an alert isn't resolved within a specified time frame.
Audience-specific Status Pages
A massive update for MSPs, IT service providers, and clients with many teams and status pages! An audience-specific page is a private page accessible to authenticated ilert users only. It dynamically displays services and metrics based on the user's team assignment. With this update, you can use a single status page for many clients; only relevant services will be visible to them. This feature provides the next level of flexibility and simplifies managing status communication with multiple groups while reducing costs. Learn more about audience-specific status pages in this article, and feel free to message support@ilert.com to get a quote for your account.
Status Page Announcement Bar
Share essential news through your status page. In the settings, you can enable an announcement bar that will appear at the top of the screen. You can make your message more engaging by using simple Markdown and emojis.
Call Flow
Turn Text to Speech with AI
Make your call-routing messages and IVR menus sound more natural and human-like. We've introduced six AI voices that you can use to communicate with callers. Test the audio in your call flow builder by clicking the "Play" button under the Preview section.
Repeat the Flow with One Click
It's now easier to retry routing a call if no one responds on the first attempt. The "Route Call" node now has a new drop-down menu where you can adjust the number of retries.
Global Search
Try the new Global Search feature in the upper right corner of the ilert interface, or simply use the hotkeys Cmd (Ctrl) + K. This new search functionality considers both semantic and textual input, displaying results for alerts, alert actions, connectors, escalation policies, and all other essential resources within your ilert account.
Teaser: Deployment Events
We're excited to announce the launch of Deployment Events in ilert and invite you to join our Beta group and share your feedback. Our initial integration with GitHub enables you to view and link deployment events directly in ilert. This will give your team better insight into how recent changes are affecting your services. The Deployment Events feature automatically sends repository changes, such as releases and merged pull requests, to ilert. If an incident occurs, ilert will attempt to correlate changes to alerts and provide recommendations for remediation. Contact us via support@ilert.com, and we will add you to the group.
Read this article if you struggle to import existing ilert resources into Terraform.
ChatOps Updates
Nice update for Telegram users. ilert now identifies who accepted the alert from this messenger and adds the engineer as a responder. This update improves collaboration and the visibility of actions taken during incidents.
Also, here is a minor update for Slack! Alerts created in Slack now include the channel name from which they were reported.
New in the Mobile App
Grouped alerts are now clearly visible in the mobile app as well. Additionally, you can provide feedback on alert grouping accuracy directly from the app. Your feedback helps enhance the feature's precision, so both thumbs up and down contribute to better performance. Votes also influence the Grouping Precision metric.
We improved the Do Not Disturb mode override for Android app users. Our documentation provides a step-by-step guide to letting critical alerts bypass any silent settings on your phone.
Latest Integrations
HetrixTools—a platform for uptime, server, and blacklist monitoring. It ensures optimal performance and security of IT infrastructure.
Ubidots—an Industrial Internet of Things platform designed for developers and businesses to connect, collect, and visualize sensor data easily.
Postman Monitors—an API development platform allowing developers to design, test, document, and monitor APIs.
AWX Ansible—a toolset for automating IT tasks, such as configuration management, application deployment, and orchestration.
Samsara—a cloud-based platform that provides solutions for managing and monitoring vehicle fleets, equipment, and environmental conditions in real-time. The ilert integration was updated and improved.