Hey there! Matvey from Keep checking in.
We’ve talked a lot about workflows. A workflow engine, or as some people call it, “GitHub Actions for alerts,” is one of Keep’s most powerful features. I recently covered the workflow basics, and I encourage you to read that blog post if you haven’t yet: https://www.keephq.dev/blog/your-first-workflow-in-keep
Today, I want to cover how to leverage AI inside workflows. How do workflows “decide something” on the fly based on limited or unstructured data? Where does the data go? Which AI is used? Is it available in the open-source version?
I can already hear some of you asking: "Why introduce uncertainty into our infrastructure?" It's a valid concern. As engineers, we strive for predictability and reliability in our systems. Generally, when working with well-defined, stable environments, we prefer straightforward if/else logic. However, there are several scenarios where traditional approaches fall short:
When dealing with human input, navigating environments with unknown variables, handling unpredictable third-party system behavior, or operating in highly unstable conditions, we often need more sophisticated solutions. In these cases, Large Language Models (LLMs) don't necessarily add more uncertainty – instead, they can help bring order to chaos by categorizing data, normalizing inputs, setting priorities, and making information actionable for automated processes.
Before we get hands-on, I want to share that we've expanded our AI capabilities in Keep. We now support eight different AI providers, giving you flexibility in both deployment options and data privacy considerations.
Let's explore how these tools can transform your workflows.
One key feature supported across all AI providers in Keep is structured output. Imagine receiving a vague alert from some old system managed by some team from an office in a different region. The alert simply states, "Nothing works," with an environment field showing "real customer Acme Corp." Sure, we could ask their team to improve their alerting, but that request might sit in their backlog until next quarter. Instead of waiting or manually translating their environment values into our standard "dev," "staging," or "prod" categories, we can leverage AI to automate this process.
If we simply ask an LLM, "What's the environment of this alert - dev, staging, or production?" we'll likely get a verbose, analytical response like: "Hmm, I think it's more likely production because the impacted company name is mentioned, but Acme Corp sounds like an artificial name so..."
This is where structured output becomes invaluable. We can turn these ambiguous responses into structured data by constraining both the vocabulary and the response format to a specific JSON structure, offering the model only the exact options we want to receive. The result? An output that can be plugged directly into if/else conditions, for loops, and other workflow instructions, making our automation reliable and predictable.
Here is an example of the configuration for the structured output:
structured_output_format: # We limit what the model can return
  type: json_schema
  json_schema:
    name: workflow_applicability
    schema:
      type: object
      properties:
        should_run:
          type: boolean
          description: "Whether the workflow should be executed based on the alert"
      required: ["should_run"]
      additionalProperties: false
    strict: true
And here is an example response:
{"should_run": true}
Note that the description acts as an additional prompt: you can ask the model to lean toward a specific value if it's unsure what to answer.
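For example, a property description like this (a hypothetical field, not part of the workflows below) nudges the model toward a safe default:

  is_customer_facing:
    type: boolean
    description: "Whether the alert affects real customers. If there is no clear evidence either way, return false."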
The format used here is JSON Schema; you can read more about it here: https://json-schema.org/learn/getting-started-step-by-step
Let's dive into a practical example. Imagine we have an auto-remediation script - something as straightforward as cleaning up a MySQL table. While this might not be the most elegant solution, it's the kind of quick fix that keeps systems running in the real world. Sometimes you just need to get things working immediately, even if it's not the perfect long-term solution.
In this scenario, we'll use structured output to answer one crucial question: is it safe and appropriate to execute this workflow? This seemingly simple decision actually requires careful consideration of multiple factors, and that's where AI can help make a more informed choice.
Here is a workflow:
id: auto-fix-mysql-table-overflow
description: Clean heavy mysql tables after consulting with OpenAI using structured output
triggers:
  - type: incident
    events:
      - updated
      - created
steps:
  - name: ask-openai-if-this-workflow-is-applicable
    provider:
      config: "{{ providers.my_openai }}"
      type: openai
      with:
        prompt: "There is a task cleaning MySQL database. Should we run the task if we received an alert with such a name {{ alert.name }}?"
        model: "gpt-4o-mini" # This model supports structured output
        structured_output_format: # We limit what the model can return
          type: json_schema
          json_schema:
            name: workflow_applicability
            schema:
              type: object
              properties:
                should_run:
                  type: boolean
                  description: "Whether the workflow should be executed based on the alert"
              required: ["should_run"]
              additionalProperties: false
            strict: true
actions:
  - name: clean-db-step
    if: "{{ steps.ask-openai-if-this-workflow-is-applicable.results.response.should_run }}"
    provider:
      config: "{{ providers.mysql }}"
      type: mysql
      with:
        query: DELETE FROM bookstore.cache ORDER BY id DESC LIMIT 100;
https://github.com/keephq/keep/blob/main/examples/workflows/conditionally_run_if_ai_says_so.yaml
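To make the wiring explicit: assuming the model answers within the schema, the value read by the if condition would look roughly like this (an illustrative shape, not an exact engine dump):

  steps.ask-openai-if-this-workflow-is-applicable.results.response  ->  {"should_run": true}

If should_run comes back true, clean-db-step fires; otherwise the action is skipped and nothing touches the database.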
Consider a scenario where you're handling alerts from across your infrastructure - legacy systems, orphaned applications, and everything in between. While some alerts come with proper labeling and context, others leave you guessing. What if we could use AI with structured output to fill in these information gaps, inferring missing fields from the context we do have?
Let's explore how we can use AI to enrich these alerts by extracting meaningful information from whatever data is available, creating more complete and actionable notifications.
id: enrich-using-ai
description: Enrich alerts using structured output from LLMs
triggers:
  - type: alert
    filters:
      - key: source
        value: prometheus
steps:
  - name: get-enrichments
    provider:
      config: "{{ providers.my_openai }}"
      type: openai
      with:
        prompt: "You received such an alert {{alert}}, generate missing fields."
        model: "gpt-4o-mini" # This model supports structured output
        structured_output_format: # We limit what the model can return
          type: json_schema
          json_schema:
            name: missing_fields
            schema:
              type: object
              properties:
                environment:
                  type: string
                  enum:
                    - "production"
                    - "pre-prod"
                    - "debug"
                  description: "Be pessimistic, return pre-prod or production only if you see evidence in the alert body."
                impacted_customer_name:
                  type: string
                  description: "Return undefined if you are not sure about the customer."
              required: ["environment", "impacted_customer_name"]
              additionalProperties: false
            strict: true
actions:
  - name: enrich-alert
    provider:
      type: mock
      with:
        enrich_alert:
          - key: environment
            value: "{{ steps.get-enrichments.results.response.environment }}"
          - key: impacted_customer_name
            value: "{{ steps.get-enrichments.results.response.impacted_customer_name }}"
That’s all! Now you are ready to write your first workflow and leverage AI 🎉
The last step is to join our Slack, where our team helps those adopting Keep: https://slack.keephq.dev/. We will be super happy to learn more about your use case!