Hi 👋,
I'm Matvey, and I'm joining Keep as a co-founder.
In this blog post, I share my story: how I tried to solve alerting-related problems over the last 5 years, how I built my previous startup, why I think AIOps is ready for change, and why I believe Keep is the only company with the potential to disrupt the AIOps space.
Five years ago, I was woken in the middle of the night by a false-positive alert from DataDog. It was my second year of being on call every second day, which meant I never left home without a laptop, a power bank, and two mobile phones (for better cellular coverage). That night, after checking metrics, reading logs, and clicking through the product, I sat on my bed and thought:
Why is everything related to alerting so painful?
This question became my obsession. I left my job and founded a startup dedicated to helping on-call engineers like myself live better lives. It was 2018, and my main thesis was: “Incident response tools don't care much about end-users. Let’s build something more friendly.”
Surprisingly, it worked well. Cool investors liked the thesis, users liked the product…
Two years later, Grafana Labs noticed a small startup of six people competing with PagerDuty ($1.8B valuation, public) and acquired our company. This is how I joined Grafana and started leading Grafana OnCall.
We pivoted from pure SaaS to open-source-based cloud solutions, and the adoption of Grafana OnCall exploded. Today, I couldn’t be happier to see companies like Rootly, FireHydrant, and Incident.IO launching their own tools for on-call engineers. There’s finally fun, competition, and innovation in the market.
Until recently, there were only a few vendors. Today, one can choose from old vendors, new vendors, and even open source. If you are on call, your life is a bit better today, although being on call is still a very, very hard job.
About a year ago, I met Shahar (CTO at Keep) and Tal (CEO at Keep), engineers who "tried to fix an alerting problem" (we already know it's a promising way to start a company).
We occasionally met for a beer, and one day, we ended up discussing the future of observability:
The industry has solved the problem of "who to notify and how to notify" well, and the same is true for data collection, storage, and visualization. AI is the next big thing in observability, but nobody knows how to apply it. Why don't we talk to a few customers of large AIOps vendors, just to learn what they think?
We spent the next two weeks talking to Fortune 500 companies about applying AI to their infrastructure. CTOs, Engineering Managers, and NOC managers shared insightful and, sometimes, shocking problems.
Again and again, those people shared the same complaints about the market’s habit of over-promising and under-delivering: “Their product has no AI”, “We’re struggling with manual configuration at our scale”, or the most common and simplest: “We don’t understand what AIOps means”.
Soon after, I was honored to be invited to join Keep as a co-founder. It was easy to agree.
AIOps today is a buzzword from the big enterprise world and, believe me, it took some effort to figure out what exactly it means. I hope you don’t mind if I cut corners and say openly what AIOps is today:
In other words, AIOps today is a patch for large enterprises, helping them fight noisy alerts. It saves only a little time because it requires tons of manual configuration and is pretty much isolated from the rest of the observability data, such as logs, metrics, and traces.
It has the potential to become much, much more than that.
AIOps & MLOps? MLOps is about bringing AI models to production.
AIOps & IRM? IRM is about "Who and how to notify, how to report and organize a war room."
We strongly believe that AIOps should become an intelligent layer on top of observability and help the organization focus on what's important.
AIOps should help you (developer, NOC, network engineer, manager, C-Level) figure out what's happening in your infrastructure, what will happen next, and what to do about it. It should take pressure off your shoulders, not become an additional project for your already overloaded engineering team.
There are two challenges to solve to make it a reality:
We're halfway through solving the first challenge and have already achieved great results with a few customers. The latest developments in observability, such as OTEL, the popularity of OSS monitoring, and the massive adoption of best practices, are helping us a lot.
Keep is uniquely positioned to solve the second challenge as well. It has the unfair advantage of being open-source-based and having a great community of 60+ contributors. Keep Open Source is a popular alert hub adopted by multiple Fortune 500 companies worldwide and is growing in popularity.
In general, I am happy to be on board! Let's rock!