May 21-22, 2026
Learn more and Register to Attend

The Sched app allows you to build your schedule, but is not a substitute for your event registration. You must be registered for Observability Summit North America 2026.

Please note: This schedule is automatically displayed in Central Daylight Time (UTC -5). To see the schedule in your preferred timezone, select from the drop-down menu located at the bottom of the menu to the right.

The schedule is subject to change.
Venue: Level One | Ballroom A
Friday, May 22
 

9:00am CDT

Keynote: Welcome Back + Opening Remarks
Friday May 22, 2026 9:00am - 9:05am CDT

Level One | Ballroom A

9:10am CDT

Sponsored Keynote: OpenSearch - See Everything: Open Observability for Agentic AI - Anirudha Jadhav, Amazon Web Services
Friday May 22, 2026 9:10am - 9:15am CDT
AI is accelerating software development at an exponential pace, but we have no idea what our AI systems are actually doing. Agents operate across distributed frameworks. One request spawns dozens of hops with zero visibility. The OpenSearch Observability Stack closes that gap—built for open source contributors, with a growing focus on developers and operators using these systems every day. Open source. Linux Foundation-governed. One pipeline. Every framework. Every model. Every hop visible. The agentic era deserves open infrastructure, and we’ll share how this is a step towards building it together.
Speakers
Anirudha Jadhav

Sr. Engineering Leader, Amazon Web Services
Anirudha is a Senior Manager, Software Development at Amazon Web Services (AWS), leading development of insight engines and visualization platforms for the OpenSearch Project. He specializes in distributed systems, data analytics, and search technologies, including architecting one...
Level One | Ballroom A

9:15am CDT

Sponsored Keynote: Datadog - Every Byte Counts: How Protocol Design Shapes the Cost of Observability - Amanda Sopkin, Datadog
Friday May 22, 2026 9:15am - 9:20am CDT
Today, many organizations are pushing beyond existing limits for telemetry volume. Systems are ever-more distributed and generative AI workloads produce enormous amounts of data. As telemetry volumes grow, observability pipelines must become more efficient.

At scale, telemetry egress directly impacts observability spend. Cloud providers charge per gigabyte of data transferred across regions or providers, and those bytes add up quickly. The protocol used to encode telemetry determines how much data is sent over the network. Even modest improvements in encoding efficiency (i.e. the protocol) can translate into significant cost savings. However, the OpenTelemetry Protocol (OTLP) was not initially optimized for performance. Instead, it prioritized interoperability and easy adoption.

Today the OpenTelemetry community is exploring OTAP, a new stateful protocol for transmitting OpenTelemetry data based on Apache Arrow. By using columnar encoding and maintaining state throughout a stream, OTAP avoids repeatedly sending the same metadata, reducing payload size and network transfer. However, because OTAP relies on long-lived stateful streams rather than independent requests, there is additional architectural and operational complexity in its implementation. There are further challenges to larger adoption by the community; for example, Apache Arrow support varies significantly across languages.

Protocol design today is critical to efficiently scaling your systems. In this talk we will explore how protocol design affects telemetry egress and overall observability cost. We will go over some strategies for improving encoding efficiency, compare stateless and stateful approaches, and discuss the potential benefits and drawbacks of adopting a protocol like OTAP. Join us to learn more about how your protocol decisions can influence your costs over time.
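The stateless versus stateful trade-off described in this abstract can be illustrated with a toy encoder. This is a minimal sketch, not OTAP or Apache Arrow: it uses plain JSON frames, and the function names, frame layout (`def`/`ref`), and sample resource attributes are all hypothetical choices made for illustration only.

```python
import json


def encode_stateless(records):
    """Every record carries its full resource metadata, like independent requests."""
    return [json.dumps(r, sort_keys=True).encode() for r in records]


def encode_stateful(records):
    """Send each distinct resource once per stream, then refer to it by a small id."""
    seen = {}    # canonical resource JSON -> id assigned on this stream
    frames = []
    for r in records:
        key = json.dumps(r["resource"], sort_keys=True)
        if key not in seen:
            seen[key] = len(seen)
            frames.append(json.dumps({"def": seen[key], "resource": r["resource"]}).encode())
        frames.append(json.dumps({"ref": seen[key], "value": r["value"]}).encode())
    return frames


def decode_stateful(frames):
    """The receiver must keep the id table for the life of the stream (the added complexity)."""
    table = {}
    out = []
    for frame in frames:
        msg = json.loads(frame)
        if "def" in msg:
            table[msg["def"]] = msg["resource"]
        else:
            out.append({"resource": table[msg["ref"]], "value": msg["value"]})
    return out


# Hypothetical sample data: 1000 data points that share one resource.
resource = {"service.name": "checkout", "k8s.pod.name": "checkout-7d9f", "cloud.region": "us-east-1"}
records = [{"resource": resource, "value": i} for i in range(1000)]

stateless_bytes = sum(len(f) for f in encode_stateless(records))
stateful_bytes = sum(len(f) for f in encode_stateful(records))
print(f"stateless: {stateless_bytes} bytes, stateful: {stateful_bytes} bytes")
```

The stateful stream is smaller because the shared metadata crosses the wire once, but a receiver that loses its table (for example after a reconnect) cannot decode later frames, which is the kind of operational cost the abstract mentions.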
Speakers
Amanda Sopkin

Engineering Manager, Datadog
Level One | Ballroom A

9:20am CDT

Keynote: Tracing the Agent's Mind: Extending OpenTelemetry for Deep MCP Inspection - Mustafa Dayıoğlu, TUBITAK & Zeyno Dodd, Conjectura R&D
Friday May 22, 2026 9:20am - 9:45am CDT
Production AI agents make thousands of tool-calling decisions daily, yet observability stops at the model boundary. OpenTelemetry's GenAI semantic conventions capture token counts and latencies—what the LLM processed—but not why an agent selected a specific tool. Research (McKenzie et al., 2023) demonstrates inverse scaling: more capable models exhibit unpredictable tool selection patterns. This gap leaves engineers guessing during critical production failures.

We present gen-ai-otel, an open-source OpenTelemetry extension introducing decision-level telemetry for MCP agents. A new attribute namespace (gen_ai.agent.*) captures tool selection confidence, session context, permission scope validation, and baseline deviations. The zero-sidecar architecture routes telemetry through standard Collector pipelines to existing backends—Jaeger, Prometheus, or graph databases—with low overhead and cardinality-aware attributes.

A live demo reconstructs an agent's decision chain, revealing anomalies invisible to token metrics—reducing decision-debugging time. Attendees leave with: 1) Collector configs, 2) Grafana dashboards for confidence tracking, 3) demo code and repo—all Apache 2.0 licensed.
Speakers
Mustafa Dayıoğlu

Senior Chief Researcher, TUBITAK (THE SCIENTIFIC AND TECHNOLOGICAL RESEARCH COUNCIL OF TÜRKİYE)
Mustafa Dayıoğlu (PhD, ITU) is a security architect with 25 years of experience in cybersecurity at TÜBİTAK, designing large-scale security systems serving 80 million citizens for regulated environments. Specializes in threat modeling and protocol development for AI agent systems...
Zeyno Dodd

R&D Solution Architect, Conjectura R&D
R&D Architect with 25+ years building distributed systems and leading open research collaborations. Principal collaborator on SFAMDF and GraphSentinel, open initiatives exploring proactive, federated security patterns for MCP‑based agentic AI systems. Research interests include...
Level One | Ballroom A
  Keynote Sessions

10:20am CDT

Exploring Observability with MCP Servers - Tiffany Jernigan, Grafana Labs
Friday May 22, 2026 10:20am - 10:45am CDT
You may have heard of the pillars of observability: metrics, logs, traces, and, depending on who you ask, profiles. As systems grow in complexity, the need to both individually understand and correlate these signals becomes paramount for rapid incident detection, root cause analysis, and performance optimization. Yet, even with advances like OpenTelemetry, making sense of your own data often requires learning specialized query languages and navigating complex toolchains, which is a barrier for many users.

While AI tools like ChatGPT can offer general advice, they lack access to your specific observability data. This is where Model Context Protocol (MCP) servers come in. MCP servers provide a standardized way for AI assistants and other tools to securely connect to your observability data, making it easier to investigate and diagnose issues faster using natural language.

In this talk, we’ll cover MCP and demonstrate how to explore your observability data using Grafana MCP, while also touching on how the same approach can work with other MCP-compatible tools or custom MCP servers.
Speakers
Tiffany Jernigan

Senior Developer Advocate, Grafana Labs
Tiffany is senior developer advocate at Grafana Labs and a CNCF Ambassador. She also formerly worked as a software developer and developer advocate at VMware, Amazon, Docker, and Intel. Prior to that, she graduated from Georgia Tech with a degree in electrical engineering. In her...
Level One | Ballroom A
  AI and MCP in Observability

10:50am CDT

Show Me the Receipts: A Forensic Hunt for Observability - Mostafa Radwan, Datadog
Friday May 22, 2026 10:50am - 11:15am CDT
Today, observability platforms can process massive volumes of telemetry, but practitioners struggle to determine what matters during incidents, unnecessarily increasing usage bills.

This talk resolves the question: “Which telemetry data should we keep?” Learn how one team achieved a 30% log reduction by flipping the script and asking “what did we actually use?” instead of “what should we collect?” They conducted a forensic audit of incident resolutions to find receipts proving which data sources truly mattered.

You’ll learn techniques for tracing backward from resolved incidents to identify which telemetry is deemed valuable and see how to map incidents to telemetry data that enabled resolution, revealing which sources proved critical, redundant, or unused.

Using OpenTelemetry (OTel) and Vector, an open-source tool for building fast and scalable observability pipelines, this approach provides a replicable pattern that the community can adapt across different environments.

You’ll leave with a framework for measuring telemetry value based on usage patterns, plus a repeatable audit process. The key question: “Where are the receipts?”
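The backward-tracing audit described above could be sketched as follows. This is an illustrative assumption about how such an audit might be recorded, not the speaker's method: the incident fields (`consulted`, `decisive`), the source names, and the classification rule are all hypothetical.

```python
from collections import Counter


def audit_sources(incidents, all_sources):
    """Classify each telemetry source by how resolved incidents actually used it.

    critical  : enabled resolution of at least one incident
    redundant : was looked at, but never enabled a resolution
    unused    : never looked at during any incident
    """
    decisive = Counter()
    consulted = Counter()
    for incident in incidents:
        decisive.update(set(incident["decisive"]))
        consulted.update(set(incident["consulted"]))

    report = {}
    for source in all_sources:
        if decisive[source] > 0:
            report[source] = "critical"
        elif consulted[source] > 0:
            report[source] = "redundant"
        else:
            report[source] = "unused"
    return report


# Hypothetical post-incident records ("receipts").
incidents = [
    {"id": "INC-1", "consulted": ["app_logs", "traces", "debug_logs"], "decisive": ["traces"]},
    {"id": "INC-2", "consulted": ["app_logs", "host_metrics"], "decisive": ["host_metrics"]},
]
all_sources = ["app_logs", "traces", "debug_logs", "host_metrics", "audit_logs"]

report = audit_sources(incidents, all_sources)
for source, verdict in report.items():
    print(f"{source}: {verdict}")
```

A source marked unused here is only a candidate for reduction; retention or compliance requirements may still justify keeping it.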
Speakers
Mostafa Radwan

Senior Solutions Engineer, Datadog
Mostafa is a technologist specialized in cloud native computing, observability, and security.

He started his career as a software engineer before getting in the trenches of application and production support.

He worked as a Solutions Architect at Docker where he helped enterp...
Level One | Ballroom A
  Scalability Challenges and Solutions
  • Content Experience Level Any

11:20am CDT

[CANCELLATION] AI Training in Emerging Economies: Building Africa's Largest LLM From the Ground Up - Okikiola Oliyide, Awarri
Friday May 22, 2026 11:20am - 11:45am CDT
N-ATLaS is a multilingual African-language LLM we took from research to production on Kubernetes. This talk shows the end-to-end path we used to make it reproducible, observable, and affordable: data + finetune pipelines (artifacts, seeds, checkpoints), Argo-orchestrated training on mixed GPU pools, and a serving stack with Triton + KServe tuned for real traffic. I’ll walk through SRE guardrails that mattered for N-ATLaS (SLOs, golden signals, error budgets), supply-chain hygiene (image signing, provenance, model versioning), and the levers that cut cost-per-token while improving latency and uptime under pre-emptions. We’ll cover autoscaling, caching, model rollout strategies, and incident playbooks, plus what we’d change after thousands of downloads and weeks of live usage. Expect hard-learned patterns, YAML you can run, and a plain-English checklist you can lift into your own cluster, whether you’re serving English or a low-resource language model.
Speakers
Okikiola Oliyide

Lead DevOps Engineer, Awarri
Okikiola Oliyide is Lead Cloud DevOps Engineer at Awarri Technology, where he designs and operates large-scale Kubernetes platforms powering Africa’s largest LLM initiative. With 5+ years across AWS, GCP, and on-prem, he specialises in CI/CD, observability, and cost-efficient GPU...
Level One | Ballroom A
  CNCF Observability Projects

11:50am CDT

⚡ Lightning Talk: Show Me the Money: Metrics Edition - Brian Davis, Red Canary
Friday May 22, 2026 11:50am - 12:00pm CDT
Existing cloud and Kubernetes cost management tools struggle to track expenses at a granular level, leaving engineers unable to answer critical questions like: How much is one specific customer costing us in DynamoDB usage? Or, which system component is consuming the most of our Kafka cluster?

This lightning talk demonstrates how to leverage existing observability frameworks to gain detailed, low-level cost insights. Attendees will learn basic techniques to instrument standard metrics with custom labels, such as component name, customer ID, and team, for fine-grained cost allocation.

This session includes a practical case study from Red Canary, which has used this exact methodology for over five years to transform its tactical decision-making and better manage cloud spend. By treating cost allocation as an observability problem, engineers can provide the finance team with the deep data required for effective resource management.

Attendees will leave with an actionable plan for implementing a metrics-based cost tracking system (likely with the tooling you already have), independent of high-level cloud billing tools, to drive significant operational efficiency.
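One way to picture label-based cost allocation is to split a known bill in proportion to labelled usage. The sketch below is an assumption for illustration, not Red Canary's implementation: the sample shape, the `allocate_cost` helper, the label names, and the dollar figure are all hypothetical.

```python
from collections import defaultdict


def allocate_cost(samples, total_cost, label):
    """Split total_cost across the values of one label, proportionally to measured usage."""
    usage = defaultdict(float)
    for sample in samples:
        usage[sample["labels"][label]] += sample["value"]

    total_usage = sum(usage.values())
    if total_usage == 0:
        return {}
    return {key: total_cost * value / total_usage for key, value in usage.items()}


# Hypothetical usage samples, e.g. consumed capacity units tagged at instrumentation time.
samples = [
    {"labels": {"customer_id": "a", "component": "ingest", "team": "data"}, "value": 300.0},
    {"labels": {"customer_id": "b", "component": "ingest", "team": "data"}, "value": 100.0},
    {"labels": {"customer_id": "a", "component": "search", "team": "web"}, "value": 100.0},
]

by_customer = allocate_cost(samples, total_cost=1000.0, label="customer_id")
by_component = allocate_cost(samples, total_cost=1000.0, label="component")
print(by_customer)
print(by_component)
```

The same samples answer both questions from the abstract (cost per customer, cost per component) simply by choosing a different label. High-cardinality labels such as customer ID can increase metric storage cost, so that trade-off is worth checking first.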
Speakers
Brian Davis

Principal Software Architect, Red Canary
Principal Software Architect at Red Canary, a Zscaler Company, Brian Davis has been building and monitoring complex systems for over two decades, ranging from signal-processing algorithms to complex data-processing applications, deploying these on Solaris servers, on-prem virtual...
Level One | Ballroom A
  End-User Case Studies

12:05pm CDT

⚡ Lightning Talk: Observability Debt: When Telemetry Stops Telling the Truth - Spoorthi Palakshaiah, Relevance Lab
Friday May 22, 2026 12:05pm - 12:15pm CDT
This talk introduces observability debt as an operational issue that develops over time in evolving systems. Teams often instrument services early using observability frameworks, define metrics, dashboards, alerts, and SLOs, and initially gain confidence in their ability to understand system behavior. However, production systems rarely remain static. As systems evolve through refactoring, scaling, architectural changes, asynchronous processing, and organizational shifts, observability artifacts frequently remain unchanged, creating a mismatch between what telemetry is assumed to represent and how the system actually behaves. This mismatch, referred to as observability debt, does not result from missing data but from telemetry whose meaning has drifted due to unmaintained assumptions, leading to dashboards that appear healthy, alerts that lack context, and slower incident understanding. To make this concrete, the talk uses a minimal personal system intentionally designed to model common production patterns. Starting from a low-debt state where telemetry reflects user impact, the system evolves while observability remains static, resulting in metrics that hide localized failures.
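The closing point, that an unchanged metric can hide a localized failure, can be shown with a small arithmetic sketch. The region names, request counts, and 99% threshold below are hypothetical and are not taken from the talk.

```python
def success_rate(ok, total):
    """Fraction of successful requests; an idle slice counts as healthy."""
    return ok / total if total else 1.0


def breaching_regions(requests_by_region, slo):
    """Regions whose own success rate is below the SLO threshold."""
    return [
        region
        for region, (ok, total) in requests_by_region.items()
        if success_rate(ok, total) < slo
    ]


# Hypothetical traffic: (successful requests, total requests) per region.
# The dashboard was built when only us-east existed; ap-south was added later.
requests_by_region = {
    "us-east": (60_000, 60_000),
    "eu-west": (39_500, 39_500),
    "ap-south": (0, 500),
}

slo = 0.99
total_ok = sum(ok for ok, _ in requests_by_region.values())
total_requests = sum(total for _, total in requests_by_region.values())
global_rate = success_rate(total_ok, total_requests)

print(f"global success rate: {global_rate:.3f}")
print(f"regions below SLO: {breaching_regions(requests_by_region, slo)}")
```

The global rate stays above the threshold even though every request in one region fails, so an alert defined only on the aggregate would never fire.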
Speakers
Spoorthi Palakshaiah

DevOps Engineer, Relevance Lab
Spoorthi is a DevOps engineer with experience designing, building, and optimizing cloud infrastructure. She works extensively with Kubernetes, infrastructure as code, CI/CD pipelines, and open source observability tools to improve system reliability, scalability, and operational efficiency...
Level One | Ballroom A

12:30pm CDT

Lunch
Friday May 22, 2026 12:30pm - 1:25pm CDT
Menu:

Smoked Turkey-Honey Dijon Wedge; Smoked Turkey, Honey-Dijon Cream Cheese, Lettuce, Marble Pumpernickel Focaccia (GF)
Ham & Swiss Wedge; Smoked Ham, Mustard Aioli, Lettuce, Egg Focaccia
Roasted Veggie Wrap (vg) 
Corn Chowder Soup (v, GF)

Blueberry Cheesecake (V) and Apple Spice Cake (vg, GF)
Level One | Ballroom A

1:25pm CDT

Beyond Dashboards: Architecting AI Agents for Autonomous Observability - Divya Mahajan, Amazon & Achin Gupta, Intuit
Friday May 22, 2026 1:25pm - 1:50pm CDT
The future of observability isn't better dashboards—it's AI agents that reason across metrics, logs, and traces alongside your engineering team.

Engineers spend hours correlating signals across Grafana, Kibana, and Jaeger, mentally stitching together what happened and why. What if an agent could do that correlation automatically?

This session presents a practical architecture for building observability agents that autonomously triage incidents across all three pillars. We'll demonstrate an agent that ingests an alert, queries metrics, searches logs, examines traces, identifies root causes, and recommends remediation, while keeping humans in the loop.

We'll cover:

* Why observability is ideal for agentic AI
* Agent architecture with LangGraph orchestration
* Integration patterns: MCP, REST APIs, and OpenTelemetry
* Tool design for metrics, logs, and traces
* Live demo: agent triaging a simulated incident
* Production considerations: reliability, cost, guardrails

Attendees leave with a working reference architecture built on CNCF ecosystem tools (Prometheus, Jaeger, Loki, Grafana). All code is open source.
Speakers
Divya Mahajan

Software Engineer, Amazon
Divya Mahajan is a Software Development Engineer at Amazon Alexa, where she builds production-grade Agentic AI and LLM systems at scale. Her work sits at the intersection of conversational AI, agentic automation, and reliable system design, with a focus on accuracy, observability...
Achin Gupta

Staff Software Engineer, Intuit
Achin Gupta is a Staff Software Engineer with 9 years of experience designing and building production grade distributed observability backends on Kubernetes. He also focuses on AI driven systems, developing LLM powered workflows and multi agent architectures, with an emphasis on observability...
Level One | Ballroom A
  AI and MCP in Observability

1:55pm CDT

One Size Does Not Fit All: A Polystore Architecture for Logs and Traces - Suman Karumuri, KalDB
Friday May 22, 2026 1:55pm - 2:20pm CDT
Observability data isn't homogeneous. Security logs require needle-in-haystack searches with multi-year compliance retention. Kernel logs are uncompressible text. Structured logs enable fast aggregations, while semi-structured logs explode cardinality. Traces demand different access patterns entirely.

Modern requirements compound this. Observability must join with other data sources. Agentic AI systems generate massive volumes of unstructured and semi-structured logs and traces. Big data platforms have emerged as popular storage alternatives.

Forcing everything into one system creates impossible tradeoffs: slow queries, runaway costs, frustrated users.

At Airbnb and Slack, operating thousands of tenants across hundreds of clusters, we built a polystore architecture routing workloads to specialized engines, unified behind a single query interface. This required changes across the entire stack: instrumentation, collection, storage, and query layers.

This talk shares routing criteria, backend tradeoffs, and techniques for unified querying. Attendees will learn to optimize observability for better performance and lower costs.
Speakers
Suman Karumuri

CEO, KalDB
Suman Karumuri is Founder and CEO of KalDB and author of KalDB, an open source serverless Lucene platform. He is co-author of the OpenTracing/OpenTelemetry specification and was previously tech lead of Zipkin. Over the past decade, he has built and run petabyte-scale log search, distributed...
Level One | Ballroom A

2:25pm CDT

Implementation of Unified Observability at Scale From Scratch - Ahmed J., Emaar
Friday May 22, 2026 2:25pm - 2:50pm CDT
Unified observability has lately been regarded as the holy grail by some. One platform, universal observability, for everything. Usually, this would be the default, but when you are at a 30-year-old non-technical enterprise, dealing with a mixture of legacy and modern systems, it's a whole different story.

Legacy decisions, in some cases, result in multiple observability platforms for different teams within the company, adding overhead, cost, noise, and audit complexity. This was the case at Emaar, a property developer based in Dubai, until the PE team took on the exciting project of unifying all observability into one platform. This included applications, infrastructure, network, and security. The complexity arises not just from the different data sources, but also from the number and nature of the deployment sites. This included sites across 10 countries consisting of data centers, hotels, malls, shops, etc.

This talk will outline the experience of implementing a unified observability platform consisting of thousands of network devices, machines, and application workloads using open-source technologies that resulted in six-figure cost savings.
Speakers
Ahmed J.

Platform Engineer, Emaar
Ahmed is a platform engineer with a background in artificial intelligence research and development. He excels at building scalable infrastructure to deploy and manage production-grade applications and models. He co-led the orchestration of modern infrastructure and observability at...
Level One | Ballroom A
  End-User Case Studies

2:55pm CDT

How Observability-First Development Lets You Ship Agents in Weeks, Not Months - Anirudha Jadhav & Kevin Fallis, AWS
Friday May 22, 2026 2:55pm - 3:20pm CDT
Building AI agents is easy, but knowing why they fail is hard. Traditional APM tools were designed for request-response services, not autonomous agents that reason, plan, and execute multi-step workflows. When your agent makes unexpected decisions, standard metrics and traces don't tell you why.

This session introduces Eval-Driven Development, which focuses on building reliable agents through continuous observability and evaluation. Using OpenSearch AgentHealth, a new open-source platform for agent observability, we'll walk you through the full agent lifecycle of building, observing, improving, and repeating. We'll share a case study comparing two production root-cause-analysis agents. One was built with observability from day one and shipped in 6 weeks, while the other was retrofitted later and took 12 months to reach production. You'll learn how we used agentic evaluation to score agent outputs and improve accuracy over time.

You'll walk away with patterns for instrumenting agents with OpenTelemetry, techniques for evaluating full decision sequences (not just outputs), and a framework for shortening your development timeline by building observability in from the start.
Speakers
Anirudha Jadhav

Sr. Engineering Leader, Amazon Web Services
Anirudha is a Senior Manager, Software Development at Amazon Web Services (AWS), leading development of insight engines and visualization platforms for the OpenSearch Project. He specializes in distributed systems, data analytics, and search technologies, including architecting one...
Kevin Fallis

Principal Senior Solutions Architect, Amazon Web Services
Kevin Fallis is a seasoned leader, architect, and developer with experience across many industry verticals and disciplines such as agriculture, ad tech, financial services, networking, security, telecommunications and of course search technologies. His passion helps others leverage...
Level One | Ballroom A
  AI and MCP in Observability

3:40pm CDT

Devs, Transform (Your Data) and Roll Out!: Learning and Leveraging OTTL - Reese Lee, New Relic
Friday May 22, 2026 3:40pm - 4:05pm CDT
The OpenTelemetry Collector has emerged as one of the project’s most critical pieces for ingesting and processing your app and infrastructure data, but did you know there’s even more you can do with your data before it reaches your backend?

Enter OTTL, or OpenTelemetry Transformation Language, a domain-specific language that can interact with and modify OTel data. Yes, the Collector already comes with dozens of components that can handle a wide range of data processing, BUT using OTTL in conjunction with the components enables even more powerful data manipulation.

In this session, learn about the benefits of OTTL, when to use it, and how to get started with OTTL. Get ready to explore:
* What OTTL is: A breakdown of the syntax and the underlying architecture within the OTel Collector.
* Why it’s useful: Practical strategies for cost reduction (filtering noise), compliance (redacting PII), and standardization (normalizing attributes).
* How to use it: A live walkthrough of writing complex transformation statements for the transform and filter processors.
Speakers
Reese Lee

Senior Developer Relations Engineer, New Relic
Reese Lee is a Senior Developer Relations Engineer at New Relic focusing on technical enablement via workshops, blog posts, documentation, and more. She is a Maintainer of the OpenTelemetry End User SIG, where she enjoys learning about interesting use cases and the different ways...
Level One | Ballroom A
  CNCF Observability Projects

4:10pm CDT

Closing Remarks
Friday May 22, 2026 4:10pm - 4:15pm CDT

Level One | Ballroom A
 