May 21-22, 2026
Learn more and Register to Attend

The Sched app allows you to build your schedule, but is not a substitute for your event registration. You must be registered for Observability Summit North America 2026.

Please note: This schedule is automatically displayed in Central Daylight Time (UTC -5). To see the schedule in your preferred timezone, select from the drop-down menu located at the bottom of the menu to the right.

The schedule is subject to change.
Venue: Level One | Ballroom B
Friday, May 22
 

10:20am CDT

Observing the Observers: Bringing OpenTelemetry to Autonomous AI Agents - Abdel Fane, OpenA2A
Friday May 22, 2026 10:20am - 10:45am CDT
Traditional observability assumes humans operate systems. AI agents break that model—they make autonomous decisions, execute operations without approval, and drift in capability over time. Yet most organizations have zero observability into their AI agent infrastructure.

When developers spin up MCP servers through Claude Desktop or Cursor, security and ops teams are blind. No metrics. No traces. No logs. Just autonomous agents accessing databases, calling APIs, and modifying production systems—completely outside your observability stack.

This talk explores how to instrument AI agents and MCP servers using familiar CNCF tools. We'll cover:

• Why traditional APM fails for autonomous agents (no request/response, emergent behavior, capability drift)
• Detecting anomalies in agent behavior (statistical baselines vs. ML-driven detection)
• Correlating agent actions to business outcomes

You'll see working demos of agent observability plus open-source code for instrumenting LangChain, CrewAI, and custom agents.

Walk away with patterns to extend your existing observability stack to AI agents before they become your biggest blind spot.
Speakers

Abdel Fane

CEO & Founder, OpenA2A
Abdel is a cybersecurity architect with 17+ years of experience securing enterprise environments across healthcare, finance, and government sectors. He has led security initiatives at Grail, Booz Allen Hamilton, Protiviti, and Allstate, specializing in cloud security & DevSecOps.
Level One | Ballroom B
  AI and MCP in Observability

10:50am CDT

Don't Let Users Find Your Outages: Synthetic Monitoring for Kubernetes Platforms - Kate Agnew, Marriott & David Norton, Platformers
Friday May 22, 2026 10:50am - 11:15am CDT
No platform owner wants to be told their platform is down by a user. A core responsibility of the platform operating model is ensuring a reliable platform for the organization. In practice, it isn't always easy to detect when things are broken, especially when the failure falls outside traditional metrics coverage.

In our work, we adopted synthetic monitoring using Kuberhealthy, a CNCF project, to gain better visibility into whether the Kubernetes platform is operating as a user would expect. Synthetic monitoring allows us to replicate application developer workflows to validate end-to-end functionality of the platform.

Come and learn about implementing synthetics, how to not break things, and broadly how to improve stability with Kubernetes using synthetic monitoring.
Speakers

Kate Agnew

Sr. Director of Platform Engineering, Marriott
Kate Agnew is a Sr. Director of Platform Engineering at Marriott, where she manages the enterprise Kubernetes and Service Mesh platform. Prior to Marriott, she held a similar platform leadership role at Optum, and has had multiple other leadership and technology positions at smaller...

David Norton

President and Principal Consultant, Platformers
David Norton is a founder and principal consultant at Platformers. He has been working in cloud platform engineering since 2016. Prior to that, he worked as an application developer.

David lives in St. Louis Park, MN, and usually enjoys spending time with his family, playing pickleball, reading, and fishing...
Level One | Ballroom B

11:20am CDT

Applying Observability to the Internet of Living Things (IoLT) - Sophia Solomon, Elastic
Friday May 22, 2026 11:20am - 11:45am CDT
We see IoT everywhere, from smart fridges to air quality sensors, but what about applying observability to billions of living things? Introducing Meowy, my virtual cat with a full observability stack. In this talk, I'll build a digital pet from scratch in Go, instrument it with OpenTelemetry, and visualize its "life" in real time, live-tracking its habits, moods, and (attempted) escapes.

I'll show how to create a RESTful "cat API," instrument it for tracing, and set up alerting with the ELK stack and Kibana visualizations. We'll cover observability basics (logs, metrics, and traces), how to apply them to our digital pet, how to structure telemetry data for "living" systems using AI tools, and how to query all our cat stats with an MCP-connected AI agent. By the end, we'll calculate the average MPH (meows per hour) and expand our understanding of observability applications. No prior observability experience required—just some Go basics and a love for any living thing, from feline to fungal!
Level One | Ballroom B

12:05pm CDT

⚡ Lightning Talk: A Drop-in System To Accelerate Metrics Observability by 100x Using Sketch-based Approximation - Milind Srivastava, Carnegie Mellon University
Friday May 22, 2026 12:05pm - 12:15pm CDT
Metrics observability workloads are growing in scale, resulting in (a) higher cost to operate observability infrastructure, and (b) slower query latencies.

The usual approaches to deal with these are:
- sample data
- roll up data
- reduce data cardinality
- send fewer queries

All of these approaches compromise the coverage of the observability infrastructure and can result in missing important anomalous behavior.

Through our research, we have developed a radically new approach to achieve large scale, low cost, and low latency without compromising the coverage of the observability infrastructure.

Our system reduces querying cost and latency by 100x using two key techniques:
- streaming precomputation
- sketch-based approximation

Our system is developed as a drop-in accelerator to an existing Prometheus-Grafana stack, without modifying Prometheus or Grafana.

We will release an open-source prototype of this system in Q1 2026.
Speakers

Milind Srivastava

PhD Student, Carnegie Mellon University
Milind Srivastava is a PhD student at Carnegie Mellon University working on re-imagining the design of data analytics pipelines using semantic-preserving summarization to drastically reduce costs and increase performance. He is interested in seeing his research get adopted by industry...
Level One | Ballroom B

12:15pm CDT

[Rescheduled] ⚡ Lightning Talk: GPU-Scanner: Extending CNCF Observability for Multi-GPU AI Workloads - Ritika Gupta, Oracle
Friday May 22, 2026 12:15pm - 12:25pm CDT
As large language models scale across hundreds of GPUs and multi-node AI systems, they’ve become a major operational challenge for infrastructure engineers. Traditional observability tools stop at the node level, leaving GPU health and utilization invisible until workloads fail or budgets spike. Imagine a 25-day training job failing on day 23 because one GPU silently throttled!

In this session, we’ll explore GPU-Scanner, an open-source observability extension for Kubernetes GPU clusters. Built to integrate with Prometheus & OpenTelemetry, GPU-Scanner adds both active and passive GPU health checks, capturing throughput, TFLOPs, memory diagnostics, thermal consistency, and long-run stability metrics.

We’ll demo real-world failure modes like catching a GPU “off the bus” or detecting thermal throttling and show how alerts flow into your existing observability stack. Leave with a practical playbook to proactively validate GPU clusters and maximize reliability and utilization.
Speakers

Ritika Gupta

Senior SWE - AI Incubations, Oracle
With a knack for transforming chaos into seamless solutions, Ritika Gupta creates technologies that bind the Kubernetes, containers, and cloud ecosystem using cloud native tooling. She actively contributes to Kubernetes as a sig-windows member. Her expertise spans container orchestration...
Level One | Ballroom B

1:25pm CDT

eBPF Application Instrumentation for Java: Challenges, Design, and Real-World Examples - Endre Sara, Causely, Inc & Stephen Lang, Grafana Labs
Friday May 22, 2026 1:25pm - 1:50pm CDT
Java is one of the most widely used languages for enterprise applications. Frameworks such as Spring Boot and Quarkus make observability straightforward when the OpenTelemetry Java agent can be injected.

In many production environments, however, modifying application code or JVM startup parameters is not possible. In these cases, eBPF-based instrumentation enables observability without code changes, but applying eBPF to Java is challenging. JVM abstraction layers, differences across JDK versions, and the diversity of frameworks and libraries complicate generic instrumentation. The problem becomes even harder when applications rely on TLS-encrypted communication such as HTTPS, gRPC, databases, and messaging systems, where payloads are opaque.

This talk explains how the OpenTelemetry eBPF Instrumentation (OBI) project addresses these challenges, covering key design decisions, trade-offs, and current limitations. The discussion is grounded in real-world examples, including Spring Boot services using HTTPS and gRPC, and a Quarkus application with TLS-encrypted PostgreSQL and Kafka, showing what is possible today with agentless Java observability using eBPF.
Speakers

Stephen Lang

Staff Software Engineer, Grafana Labs
Stephen is a Staff Software Engineer on Grafana's Beyla team and an approver for the OpenTelemetry eBPF Instrumentation (OBI) project.

Endre Sara

Co-Founder, Causely, Inc
Endre is a Co-Founder of Causely, where he’s building the IT industry’s first causal reasoning. Previously, Endre was VP of Advanced Engineering at Turbonomic. Prior to Turbonomic, Endre was a VP at Goldman Sachs. Endre holds an M.E. in Electrical Engineering from the Technical...
Level One | Ballroom B
  CNCF Observability Projects

1:55pm CDT

Breaking Free from Vendor Lock-In: Nubank DIY Observability Success - Diego Rocha, AWS & Otavio Valadares, Nubank
Friday May 22, 2026 1:55pm - 2:20pm CDT
Nubank is the largest digital bank outside Asia, operating in Brazil, Mexico, and Colombia, and serving over 120 million customers. As a cloud-native company, Nubank's distributed digital environment relies on more than 4,000 microservices, generating nearly 1 petabyte of monitoring logs daily. To better manage this volume and reduce operational costs by over 50%, Nubank recently transitioned from an external vendor to an in-house log platform. In this talk, we'll share the platform architecture and the challenges encountered during the migration journey.
Speakers

Diego Rocha

Sr. Solutions Architect, AWS

Otavio Valadares

Lead Software Engineer, Nubank
Level One | Ballroom B

2:25pm CDT

The Legend of Config: Breath of the Cluster - Henrik Rexed, Dynatrace
Friday May 22, 2026 2:25pm - 2:50pm CDT
Configuring Ingress, Gateway API, or service meshes in Kubernetes can feel like exploring an open world without a map: one wrong turn, and traffic vanishes. In this session, we’ll explore how to detect and prevent misconfigurations using OpenTelemetry, eBPF-based instrumentation (OBI), and enriched logs from service meshes and ingress controllers. Like a hero collecting tools to unlock new areas, we’ll show how to identify relevant data sources, parse and process their output, and apply common correlation rules to understand the impact of configuration changes. We’ll demonstrate how these techniques can be applied across observability platforms to reduce tool sprawl and improve operational efficiency. Attendees will leave with a practical, backend-agnostic approach to building a multi-source observability strategy for Kubernetes networking.
Speakers

Henrik Rexed

Cloud Native advocate & CNCF Ambassador, Dynatrace
Henrik is a Cloud Native Advocate at Dynatrace and a CNCF Ambassador. Prior to Dynatrace, Henrik worked for more than 15 years as a Performance Engineer. He is also one of the organizers of the WOPR and KCD Austria conferences and the owner of the YouTube channel Isit...
Level One | Ballroom B

2:55pm CDT

Let Them Eat Bugs: Practical Showcase of Agentic Issue Resolution - May Walter, Hud
Friday May 22, 2026 2:55pm - 3:20pm CDT
What if we could move a big chunk of bug fixing and solving production issues to agentic AI? That would be so cool. In this talk we will go through the end-to-end process of setting up a background agentic workflow that detects production errors, finds their root causes, assesses the right solution, and opens a PR - so you wake up in the morning to tasks almost fully completed for you by your loyal agent.

Together we will dive into the entire process of setting up this system that is currently running in real production environments - understanding the different tools, the infra challenges, the agentic accuracy spectrum, and more…
Speakers

May Walter

Co-Founder & CTO, Hud
May Walter is a software engineer, researcher, entrepreneur and serial CTO. She is currently Co-Founder and CTO of Hud, building a Runtime Code Sensor to bridge the gap between coding agents and production. Before Hud she was a founding CTO at Santa, and CTO at Bond (acquired by REEF...
Level One | Ballroom B
  AI and MCP in Observability

3:40pm CDT

Inside the Telemetry Data Plane: Constraints, Tradeoffs, and Scale - Eduardo Silva & José Lecaros, Chronosphere | A Palo Alto Networks Company
Friday May 22, 2026 3:40pm - 4:05pm CDT
Modern telemetry systems often struggle not because of missing features, but because of hidden constraints in how data is buffered, scheduled, and moved through the system. This session explores the practical realities of building a telemetry data plane that must operate under extreme throughput, tight latency budgets, and strict resource limits.

Using real-world experience from developing a high-performance open source telemetry agent, we’ll examine how design tradeoffs around buffering, concurrency, and I/O shape system behavior at scale. Topics include user-space serialization strategies, adaptive buffering models, memory-mapped persistence, and multithreaded I/O coordination, along with how these choices interact with core Linux primitives such as epoll, asynchronous I/O, and zero-copy techniques.

Rather than focusing on APIs or products, this talk dives into the mechanics and constraints that determine whether a telemetry system remains predictable under load. The discussion is grounded in production lessons learned from operating at billions of events per minute and highlights patterns that apply broadly to collectors, agents, and streaming systems.
Speakers

José Lecaros

Support Engineer, Chronosphere | A Palo Alto Networks Company
He works as a Support Engineer at Chronosphere, helping both customers and the Fluent community. He's been a developer and support engineer for 20+ years.

Eduardo Silva

Distinguished Engineer, Chronosphere | A Palo Alto Networks Company
Eduardo is an entrepreneur and Software Engineer. He is one of the Fluentd project maintainers and the creator of Fluent Bit, a lightweight logs, metrics, and traces processor.
Level One | Ballroom B

4:10pm CDT

[CANCELLATION] The Missing Layer in eBPF Observability: Storage - Kritik Sachdeva, IBM
Friday May 22, 2026 4:10pm - 4:35pm CDT
Modern observability has embraced eBPF for profiling CPU usage and tracing network paths in production systems. Yet one critical layer remains largely under-instrumented: storage. Despite being a frequent source of performance issues, storage I/O is still treated as a black box, especially in cloud native environments.

In this talk, we will walk through the basic storage I/O path in Linux and Kubernetes, highlight where traditional metrics fall short, and discuss the kinds of storage latency and wait signals that eBPF can surface at runtime without requiring kernel modifications or specialized debugging setups.

Using simple examples, the session will show how hidden storage latency and queuing effects surface in real workloads, and why these blind spots become more visible with data-intensive and AI workloads where applications or GPUs often wait on storage without clear indicators.

By the end of this talk, attendees will gain a practical understanding of where storage observability breaks down today, what eBPF can realistically help uncover at a foundational level, and how to reason about storage-related performance issues alongside CPU and networking metrics.
Speakers

Kritik Sachdeva

Technical Support Professional, IBM
I’m Kritik Sachdeva, currently working as a Support Professional at IBM. I’ve been working with Ceph & OpenShift for the past 5 years, and since college I have had a great interest in technologies like K8s, containers, and Ceph.

Since then, I’ve enjoyed exploring how different...
Level One | Ballroom B
 