
General

What is Omium?
Omium is an observability and reliability platform for AI agents. It captures execution traces, creates state checkpoints, detects failures, and enables one-click recovery — so your LangGraph, CrewAI, or custom agents stay debuggable in production.
Which frameworks does Omium support?
Omium auto-instruments LangGraph and CrewAI. For any other Python framework (or custom agents), use the @omium.trace and @omium.checkpoint decorators. The REST API works from any language.
Is there a free tier?
Yes. The free tier includes 500 agent executions per month, core tracing, checkpoints, and 7-day data retention. No credit card required.
Get started →
How long does setup take?
Under 5 minutes. Install the SDK (pip install omium), run omium init to authenticate, and add two lines to your agent code.
See the quickstart →
Do I need to replace my existing monitoring tools?
No. Omium complements your existing stack. It adds agent-specific observability — execution traces, state checkpoints, failure detection, and replay — that generic logging tools like Datadog or Sentry don’t provide for multi-step AI workflows.

Debugging agent failures

How do I find out why my agent run failed?
Omium captures every step of your agent’s execution as a trace. Open the failed run in the dashboard, expand the execution timeline, and click into the failing step. You’ll see the exact input, output, tool calls, and error message at the point of failure — no manual logging required.
Related: Execution tracing →
Do I have to re-run the whole pipeline after a failure?
Omium saves checkpoints — full state snapshots at critical points during execution. When a run fails, you can replay from the last valid checkpoint instead of re-running the entire pipeline. This saves time and avoids duplicate API calls.
# Replay from the last checkpoint
omium replay <execution_id>
Related: Checkpoints API →, CLI replay →
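Conceptually, checkpoint-and-replay works like the following pure-Python sketch. It is illustrative only — not Omium's internals — and the step functions and state shape are made up:

```python
# Illustrative sketch of checkpoint-and-replay (not Omium's actual internals).
# A snapshot of state is saved after each successful step; a retry resumes
# from the last snapshot instead of restarting from scratch.
import copy

def run_with_checkpoints(steps, state, checkpoints):
    """Run `steps` in order, snapshotting state after each successful step.

    `checkpoints` maps step index -> saved state, so a replay skips the
    steps that already completed.
    """
    start = max(checkpoints) + 1 if checkpoints else 0
    if start > 0:
        # Resume from the last valid snapshot rather than the caller's state.
        state = copy.deepcopy(checkpoints[start - 1])
    for i in range(start, len(steps)):
        state = steps[i](state)
        checkpoints[i] = copy.deepcopy(state)  # snapshot after each step
    return state
```

If step 2 of 3 raises on the first run, the second call finds checkpoints for steps 0 and 1 already saved and re-executes only step 2 — the same effect the CLI replay command describes, without duplicate work for the completed steps.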
What if my agent finishes without errors but produces bad output?
These are silent failures — the hardest kind to catch. Omium’s failure detection monitors for output drift, hallucinations, and quality degradation even when no exception is thrown. Enable omium.instrument_langgraph() and the dashboard will flag anomalies automatically.
Related: LangGraph integration →
What happens when my agent gets stuck in a loop?
Omium detects infinite loops and circular tool-call patterns in real time. When one is detected, you’ll see a failure alert in the dashboard with the exact loop pattern. You can then:
  1. View the trace to see where the loop starts
  2. Roll back to the last checkpoint before the loop
  3. Apply a fix and replay
Related: CrewAI integration →, Failures API →
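The detection idea can be sketched in pure Python: watch the tail of the tool-call history and flag when a short pattern repeats several times in a row. This is an illustrative toy, not Omium's actual detector, and the thresholds are arbitrary:

```python
# Toy loop detector (not Omium's actual algorithm): report the repeating
# pattern when the call history ends in `min_repeats` copies of the same
# short tool-call sequence.
def detect_loop(tool_calls, max_pattern=3, min_repeats=3):
    """Return the repeating pattern if the history ends in a loop, else None."""
    for size in range(1, max_pattern + 1):
        pattern = tool_calls[-size:]
        window = tool_calls[-size * min_repeats:]
        # The tail must be exactly `min_repeats` back-to-back copies of `pattern`.
        if len(window) == size * min_repeats and window == pattern * min_repeats:
            return pattern
    return None
```

A history like search → fetch → search → fetch → search → fetch trips the detector with pattern ["search", "fetch"], while a non-repeating history returns None.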
How do I debug handoffs between multiple agents?
Omium traces the full execution graph across agents — including handoffs, shared state, and tool calls between agents. The dashboard visualizes these as a connected timeline so you can follow the flow from one agent to another and pinpoint where communication breaks down.
Related: Platform capabilities →
Can I find slow or timed-out tool calls?
Yes. Omium traces every external tool call your agent makes, including its duration. You can filter executions by status (failed, timeout) and sort by latency to find the slow calls. Checkpoints taken before the timeout let you retry just the failing step.
Related: Executions API →
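The filter-then-sort step amounts to something like the sketch below. The record fields (status, duration_ms, tool) are illustrative placeholders, not Omium's actual API schema:

```python
# Sketch: surface the slowest failing or timed-out calls from a batch of
# execution records. Field names are illustrative, not Omium's schema.
def slowest_failures(executions, statuses=("failed", "timeout"), top=5):
    """Return the `top` matching executions, slowest first."""
    hits = [e for e in executions if e["status"] in statuses]
    return sorted(hits, key=lambda e: e["duration_ms"], reverse=True)[:top]
```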

Monitoring and observability

How do I monitor my agents in production?
Once instrumented, Omium automatically captures every execution. The dashboard shows real-time metrics: success rate, failure rate, latency, cost per run, and active runs. Set up Slack notifications for failures and daily digests.
Related: Automations →
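The dashboard metrics are simple aggregates over run records. A hedged sketch of the arithmetic, with made-up field names rather than Omium's real schema:

```python
# Sketch: dashboard-style aggregates computed from raw run records.
# Field names (status, duration_ms, cost_usd) are illustrative only.
def summarize(runs):
    total = len(runs)
    ok = sum(1 for r in runs if r["status"] == "success")
    return {
        "success_rate": ok / total if total else 0.0,
        "failure_rate": (total - ok) / total if total else 0.0,
        "avg_latency_ms": sum(r["duration_ms"] for r in runs) / total if total else 0.0,
        "total_cost_usd": sum(r.get("cost_usd", 0.0) for r in runs),
    }
```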
Does Omium track token usage and cost?
Yes. Omium tracks token usage and estimated cost per execution, broken down by workflow, model, and time period. The Cost page in the dashboard shows trends and lets you set budget alerts.
Related: Billing API →, API keys & billing →
How do I get alerted when something breaks?
Connect Slack in your dashboard settings. Omium sends real-time failure alerts to your configured channel. You can also set up daily/weekly digest reports that summarize wins, issues, and key metrics.
Related: Platform →
What’s the difference between a trace and a checkpoint?
A trace is a read-only record of what happened — every step, tool call, and LLM response during an execution. A checkpoint is a writable state snapshot that you can roll back to and replay from. Traces help you understand; checkpoints help you recover.
Related: Checkpoints →, Platform capabilities →

Integration and setup

Does Omium work with any LLM provider?
Yes. Omium is LLM-agnostic. It instruments at the agent framework level (LangGraph, CrewAI) or at the function level (@omium.trace), so it works regardless of which LLM provider your agents call.
Can I use Omium without LangGraph or CrewAI?
Absolutely. Use the @omium.trace and @omium.checkpoint decorators on any Python function. The REST API also works from non-Python services.
@omium.trace("my_step")
def my_custom_step(data):
    return process(data)
Related: Python SDK →, REST API →
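Under the hood, a decorator of this shape records each call's inputs, output or error, and duration. The following is an illustrative pure-Python approximation — not Omium's actual implementation, and the in-memory TRACES list stands in for what the real SDK would ship to a backend:

```python
# Illustrative sketch of a @trace("name") decorator (not Omium's code):
# wrap a function, record inputs/outputs/errors and timing, then append
# the record to an in-memory list standing in for the SDK's backend.
import functools
import time

TRACES = []  # stand-in for the SDK's trace sink

def trace(step_name):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            record = {"step": step_name, "args": args, "kwargs": kwargs}
            start = time.perf_counter()
            try:
                record["output"] = fn(*args, **kwargs)
                record["status"] = "success"
                return record["output"]
            except Exception as exc:
                record["status"] = "error"
                record["error"] = repr(exc)
                raise  # tracing must never swallow the original failure
            finally:
                record["duration_s"] = time.perf_counter() - start
                TRACES.append(record)
        return wrapper
    return decorator
```

Note the finally block: the record is captured whether the step succeeds or raises, which is what makes failed steps inspectable after the fact.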
How much code do I need to change to instrument a LangGraph or CrewAI agent?
Two lines of code. No refactoring needed.
import omium
omium.init()
omium.instrument_langgraph()  # or instrument_crewai()
Your existing agent code runs unchanged. Omium wraps framework internals to capture traces and checkpoints automatically.
Related: Installation →, Quickstart →
Is my data secure?
Yes. All data is encrypted in transit (TLS) and at rest. Omium does not store your LLM prompts or responses unless you explicitly enable full-content tracing. API keys are scoped per project and can be rotated at any time.
Related: API keys & billing →
Can I self-host Omium?
Enterprise plans include self-hosted deployment options. Contact us to discuss your requirements.

Pricing and billing

What plans does Omium offer?
Omium has four tiers: Free (500 runs/mo), Developer ($49/mo, 2,500 runs), Pro ($299/mo, 25,000 runs), and Enterprise (custom). All tiers include core tracing and checkpoints. Higher tiers unlock failure analytics, fix suggestions, and priority support.
See full pricing →
What counts as one execution?
One execution = one top-level agent run (e.g., one app.invoke() in LangGraph or one crew.kickoff() in CrewAI). Steps within that run (tool calls, LLM calls, checkpoints) are included and don’t count separately.
Can I change plans at any time?
Yes. Changes take effect immediately. When upgrading, you’re charged the prorated difference. When downgrading, the new rate applies at the next billing cycle.
Related: API keys & billing →

Still have questions? Join our Discord community or email us at founders@omium.ai.