Hey there! Matvey from Keep checking in.
We’ve talked a lot about workflows. A workflow engine, or as some people call it, “GitHub Actions for alerts,” is one of Keep’s most powerful features. I recently covered the workflow basics, and I encourage you to read that blog post if you haven’t yet: https://www.keephq.dev/blog/your-first-workflow-in-keep
Today, I want to cover how to leverage AI inside workflows. How do workflows “decide something” on the fly based on limited or unstructured data? Where does the data go? Which AI is used? Is it available in the open-source version?
I can already hear some of you asking: "Why introduce uncertainty into our infrastructure?" It's a valid concern. As engineers, we strive for predictability and reliability in our systems. Generally, when working with well-defined, stable environments, we prefer straightforward if/else logic. However, there are several scenarios where traditional approaches fall short:
When dealing with human input, navigating environments with unknown variables, handling unpredictable third-party system behavior, or operating in highly unstable conditions, we often need more sophisticated solutions. In these cases, Large Language Models (LLMs) don't necessarily add more uncertainty – instead, they can help bring order to chaos by categorizing data, normalizing inputs, setting priorities, and making information actionable for automated processes.
Before we get hands-on, I want to share that we've expanded our AI capabilities in Keep. We now support eight different AI providers, giving you flexibility in both deployment options and data privacy considerations.
Let's explore how these tools can transform your workflows.
One key feature supported across all AI providers in Keep is structured output. Imagine receiving a vague alert from some old system managed by some team from an office in a different region. The alert simply states, "Nothing works," with an environment field showing "real customer Acme Corp." Sure, we could ask their team to improve their alerting, but that request might sit in their backlog until next quarter. Instead of waiting or manually translating their environment values into our standard "dev," "staging," or "prod" categories, we can leverage AI to automate this process.
If we simply ask an LLM, "What's the environment of this alert - dev, staging, or production?" we'll likely get a verbose, analytical response like: "Hmm, I think it's more likely production because the impacted company name is mentioned, but Acme Corp sounds like an artificial name so..."
This is where structured output becomes invaluable. We can turn these ambiguous responses into structured data by constraining both the vocabulary and the response format to a specific JSON structure, offering the model only the exact options we want to receive. The result? An output that can be plugged directly into if/else conditions, for loops, and other workflow instructions, making our automation reliable and predictable.
Here is an example of the configuration for the structured output:
structured_output_format: # We limit what the model can return
  type: json_schema
  json_schema:
    name: workflow_applicability
    schema:
      type: object
      properties:
        should_run:
          type: boolean
          description: "Whether the workflow should be executed based on the alert"
      required: ["should_run"]
      additionalProperties: false
    strict: true
And here is an example response:
{"should_run": true}
Note that the description acts as an additional prompt: you can ask the model to lean toward a specific value if it's unsure what to answer.
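For example, a property description like this (a hypothetical field, not part of the workflows below) nudges the model toward a safe default:

  is_customer_facing:
    type: boolean
    description: "Whether the alert affects real customers. If there is no clear evidence either way, return false."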
The format used here is JSON Schema; you can read more about it here: https://json-schema.org/learn/getting-started-step-by-step
Let's dive into a practical example. Imagine we have an auto-remediation script - something as straightforward as cleaning up a MySQL table. While this might not be the most elegant solution, it's the kind of quick fix that keeps systems running in the real world. Sometimes you just need to get things working immediately, even if it's not the perfect long-term solution.
In this scenario, we'll use structured output to answer one crucial question: is it safe and appropriate to execute this workflow? This seemingly simple decision actually requires careful consideration of multiple factors, and that's where AI can help make a more informed choice.
Here is a workflow:
id: auto-fix-mysql-table-overflow
description: Clean heavy mysql tables after consulting with OpenAI using structured output
triggers:
  - type: incident
    events:
      - updated
      - created
steps:
  - name: ask-openai-if-this-workflow-is-applicable
    provider:
      config: "{{ providers.my_openai }}"
      type: openai
      with:
        prompt: "There is a task cleaning MySQL database. Should we run the task if we received an alert with such a name {{ alert.name }}?"
        model: "gpt-4o-mini" # This model supports structured output
        structured_output_format: # We limit what the model can return
          type: json_schema
          json_schema:
            name: workflow_applicability
            schema:
              type: object
              properties:
                should_run:
                  type: boolean
                  description: "Whether the workflow should be executed based on the alert"
              required: ["should_run"]
              additionalProperties: false
            strict: true
actions:
  - name: clean-db-step
    if: "{{ steps.ask-openai-if-this-workflow-is-applicable.results.response.should_run }}"
    provider:
      config: "{{ providers.mysql }}"
      type: mysql
      with:
        query: DELETE FROM bookstore.cache ORDER BY id DESC LIMIT 100;
https://github.com/keephq/keep/blob/main/examples/workflows/conditionally_run_if_ai_says_so.yaml
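To make the wiring explicit: assuming the model answers within the schema, the value read by the if condition would look roughly like this (an illustrative shape, not an exact engine dump):

  steps.ask-openai-if-this-workflow-is-applicable.results.response  ->  {"should_run": true}

If should_run comes back true, clean-db-step fires; otherwise the action is skipped and nothing touches the database.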
Consider a scenario where you're handling alerts from across your infrastructure - legacy systems, orphaned applications, and everything in between. While some alerts come with proper labeling and context, others leave you guessing. What if we could use AI with structured output to fill in these information gaps, inferring missing fields from the context we do have?
Let's explore how we can use AI to enrich these alerts by extracting meaningful information from whatever data is available, creating more complete and actionable notifications.
id: enrich-using-ai
description: Enrich alerts using structured output from LLMs
triggers:
  - type: alert
    filters:
      - key: source
        value: prometheus
steps:
  - name: get-enrichments
    provider:
      config: "{{ providers.my_openai }}"
      type: openai
      with:
        prompt: "You received such an alert {{alert}}, generate missing fields."
        model: "gpt-4o-mini" # This model supports structured output
        structured_output_format: # We limit what the model can return
          type: json_schema
          json_schema:
            name: missing_fields
            schema:
              type: object
              properties:
                environment:
                  type: string
                  enum:
                    - "production"
                    - "pre-prod"
                    - "debug"
                  description: "Be pessimistic, return pre-prod or production only if you see evidence in the alert body."
                impacted_customer_name:
                  type: string
                  description: "Return undefined if you are not sure about the customer."
              required: ["environment", "impacted_customer_name"]
              additionalProperties: false
            strict: true
actions:
  - name: enrich-alert
    provider:
      type: mock
      with:
        enrich_alert:
          - key: environment
            value: "{{ steps.get-enrichments.results.response.environment }}"
          - key: impacted_customer_name
            value: "{{ steps.get-enrichments.results.response.impacted_customer_name }}"
That’s all! Now you are ready to write your first workflow and leverage AI 🎉
The last step is to join our Slack, where our team helps those adopting Keep: https://slack.keephq.dev/. We will be super happy to learn more about your use case!