May 21-22, 2026
Learn more and Register to Attend

The Sched app allows you to build your schedule, but is not a substitute for your event registration. You must be registered for Observability Summit North America 2026.

Please note: This schedule is automatically displayed in Central Daylight Time (UTC -5). To see the schedule in your preferred timezone, select from the drop-down menu located at the bottom of the menu to the right.

The schedule is subject to change.
Thursday, May 21
 

11:50am CDT

⚡ Lightning Talk: Beyond Billions: Operating Thanos, Prometheus & OpenTelemetry at Trillion-Scale - Narendra Sanikommu, Nvidia
Thursday May 21, 2026 11:50am - 12:00pm CDT
Operating a metrics system beyond billions of data points introduces failure modes that don't exist in smaller deployments. This lightning talk shares battle-tested lessons from running Thanos, Prometheus, and OpenTelemetry in production across distributed Kubernetes environments, focusing on three critical challenges: implementing multi-tenancy without noisy-neighbor problems, building rate limiting that prevents a single tenant from destabilizing the cluster, and isolating query workloads so expensive queries don't starve metric ingestion.

The talk walks through real incidents where these challenges caused production impact, including 5xx errors on Thanos Receivers from unbounded queries, Prometheus remote write lag and partial query results from overwhelmed Store Gateways. For each problem, the talk presents custom solutions developed—including tenant-aware rate limiting middleware and workload isolation patterns—and shares concrete configuration approaches that attendees can apply to their own deployments.
Attendees will leave with actionable techniques for scaling their observability infrastructure to trillion-scale while maintaining reliability under load.
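As an illustration of the tenant-aware rate limiting idea mentioned in the abstract, here is a minimal sketch of a per-tenant token bucket. It is not the speaker's middleware: the class name `TenantRateLimiter`, its parameters, and the tenant identifiers are hypothetical, and a real deployment would sit in front of the ingestion or query path rather than in a standalone script.

```python
import time


class TenantRateLimiter:
    """Per-tenant token bucket (hypothetical sketch, not the talk's implementation).

    Each tenant gets its own bucket that refills at `rate` tokens per second
    up to `capacity`, so one noisy tenant cannot spend another tenant's budget.
    """

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self._buckets = {}  # tenant id -> (tokens remaining, time of last update)

    def allow(self, tenant, cost=1.0, now=None):
        """Return True and deduct `cost` if the tenant has enough tokens."""
        now = time.monotonic() if now is None else now
        tokens, last = self._buckets.get(tenant, (self.capacity, now))
        # Refill according to elapsed time, never exceeding the bucket capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if cost <= tokens:
            self._buckets[tenant] = (tokens - cost, now)
            return True
        self._buckets[tenant] = (tokens, now)
        return False


limiter = TenantRateLimiter(rate=1.0, capacity=2.0)
burst = [limiter.allow("tenant-a", now=0.0) for _ in range(3)]
other = limiter.allow("tenant-b", now=0.0)
print(burst, other)
```

In this sketch the third request from `tenant-a` is rejected while `tenant-b` is unaffected, which is the isolation property the abstract describes.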
Speakers

Narendra Sanikommu

Senior Software Engineer, Nvidia
An experienced software engineer who is passionate about solving complex software engineering challenges. With around 14 years of experience in software engineering, Narendra has a strong foundation in building and optimizing high-performance systems, particularly in Observability, Big Data...
Level One | Ballroom A
  Scalability Challenges and Solutions
  • Content Experience Level Any

1:55pm CDT

Taming Tenancy, Cost and Architecture at Collibra Through OpenTelemetry and Our Telemetry Backbone - Alex Van Boxel, Collibra
Thursday May 21, 2026 1:55pm - 2:20pm CDT
Operating a SaaS platform presents the same observability problems as any other enterprise, but scale and tenancy act as a huge multiplier on observability signals, which affects both cost and effectiveness.

This session dives into the techniques Collibra used to tame these problems and how to maintain clarity when infrastructure spans virtual machines, modern Kubernetes clusters, and a complex mix of single- and multi-tenant architectures. Without the right context, telemetry data becomes a noisy, indistinguishable flood.

We will dive into the architectural decision to leverage the C4 system model, ensuring every piece of telemetry carries the vital context of what it belongs to and where it sits in the hierarchy. This context enables both signal attribution and virtual chargebacks. The presentation details the implementation of a pipeline using custom-built OpenTelemetry collectors designed to handle the data and enrich it before sending it to the appropriate backends.

This session will give you practical insights on the challenges SaaS platforms have, but the techniques that are used to tame them can be applied everywhere.
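To make the enrichment idea in this abstract concrete, the following is a minimal sketch of attaching C4-style hierarchy context to telemetry records and then rolling up volume per system for a virtual chargeback. Everything here is an assumption for illustration: the attribute keys (`c4.system`, `c4.container`), the service names, and the record shape are hypothetical and are not taken from Collibra's pipeline or from OpenTelemetry semantic conventions.

```python
from collections import defaultdict

# Hypothetical lookup from service name to its place in the C4 hierarchy.
C4_CONTEXT = {
    "catalog-api": {"c4.system": "catalog", "c4.container": "api"},
    "catalog-indexer": {"c4.system": "catalog", "c4.container": "indexer"},
    "billing-worker": {"c4.system": "billing", "c4.container": "worker"},
}


def enrich(record, context=C4_CONTEXT):
    """Return a copy of the record with C4 attributes merged in."""
    attrs = dict(record.get("attributes", {}))
    # Unknown services are tagged explicitly so they stay attributable.
    attrs.update(context.get(record["service"], {"c4.system": "unknown"}))
    return {**record, "attributes": attrs}


def bytes_by_system(records):
    """Sum telemetry volume per C4 system, the basis for a virtual chargeback."""
    totals = defaultdict(int)
    for record in records:
        totals[record["attributes"]["c4.system"]] += record["bytes"]
    return dict(totals)


raw = [
    {"service": "catalog-api", "bytes": 1200},
    {"service": "catalog-indexer", "bytes": 800},
    {"service": "billing-worker", "bytes": 500},
    {"service": "legacy-cron", "bytes": 50},
]
enriched = [enrich(r) for r in raw]
print(bytes_by_system(enriched))
```

The abstract places this enrichment in custom-built collectors; the sketch only shows the data transformation, not where it runs.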
Speakers

Alex Van Boxel

Principal System Architect, Collibra
Alex Van Boxel is a Principal System Architect at Collibra. With an engineering background in R&D at Alcatel-Lucent, Progress Software, and Veepee, he loves to focus on the fundamental building blocks of the software industry. That means reading, understanding, and contributing to...
Level One | Ballroom A
  End-User Case Studies
  • Content Experience Level Any

2:25pm CDT

The Speed of Metrics, the Fidelity of Traces: Architecting Post-Collection Aggregation - Zack Owens, New Relic
Thursday May 21, 2026 2:25pm - 2:50pm CDT
As organizations adopt observability practices, they face a scalability paradox: systems now generate petabytes of traces and logs, but querying this raw telemetry over long time horizons becomes prohibitively slow and expensive due to the data volume.

The standard solution of pre-aggregating high-cardinality telemetry into metrics at collection time through features in the OpenTelemetry collector works well for known patterns but fails when engineers need to ask new questions about historical data. This creates an uncomfortable choice for engineers and operators: fast dashboards with pre-aggregated metrics, or high-fidelity traces and logs that become unusable beyond short time windows.

This talk presents a post-collection aggregation approach that enables fast queries over long time periods of detailed telemetry without changes to collector-side configuration. This session explores techniques for incremental view materialization that work with timeseries data. Attendees will leave with concrete architectural patterns which are applicable to open source databases like ClickHouse or OpenSearch to answer novel questions without sacrificing query speed or data fidelity.
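As a rough illustration of incremental view materialization over timeseries data, the sketch below keeps per-bucket partial aggregates (count and sum) that are updated as each raw event arrives, so a long-range query reads a few rollup cells instead of rescanning raw telemetry. This is a generic in-memory sketch under assumed names (`IncrementalRollup`, hourly buckets, a `key` such as an operation name), not the architecture presented in the talk; in a database such as ClickHouse the analogous role would typically be played by materialized views.

```python
from collections import defaultdict


class IncrementalRollup:
    """Keeps count and sum per (time bucket, key), updated one event at a time."""

    def __init__(self, bucket_seconds=3600):
        self.bucket_seconds = bucket_seconds
        # (bucket start, key) -> [count, sum]; partial aggregates that can be merged.
        self.state = defaultdict(lambda: [0, 0.0])

    def ingest(self, timestamp, key, value):
        """Fold one raw event into its bucket without touching older data."""
        bucket = int(timestamp // self.bucket_seconds) * self.bucket_seconds
        cell = self.state[(bucket, key)]
        cell[0] += 1
        cell[1] += value

    def average(self, key, start, end):
        """Average over buckets whose start falls in [start, end), from rollups only."""
        count, total = 0, 0.0
        for (bucket, k), (c, s) in self.state.items():
            if k == key and start <= bucket < end:
                count += c
                total += s
        return total / count if count else None


rollup = IncrementalRollup(bucket_seconds=3600)
rollup.ingest(10, "checkout", 100.0)    # span duration in ms (hypothetical data)
rollup.ingest(20, "checkout", 300.0)
rollup.ingest(3700, "checkout", 200.0)
print(rollup.average("checkout", 0, 7200))
```

Storing count and sum rather than a finished average is what lets buckets be merged later for time ranges that were not anticipated at ingestion time.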
Speakers

Zack Owens

Principal Software Engineer, New Relic
Zack Owens is a Principal Engineer and Architect at New Relic, focusing on the data platform and NRDB, a purpose-built timeseries database for observability.
Level One | Ballroom A
  Scalability Challenges and Solutions
  • Content Experience Level Any

3:40pm CDT

From Data Dumps To Smart Context: Building MCP Servers That AI Can Actually Use - Thomas Johnson, Multiplayer
Thursday May 21, 2026 3:40pm - 4:05pm CDT
Most MCP servers fail the same way: they expose observability data without understanding what AI models need to reason effectively. The result? Tools that overwhelm models with metrics, miss critical context, and introduce unnecessary security exposure.

At Multiplayer, we built an MCP server to give AI coding assistants access not just to production telemetry but to full stack data: frontend screens and data, backend traces, logs, and request/response content and headers. What we learned challenges the "more data is better" assumption that drives most integrations.

This talk shares the hard lessons from moving an MCP server into production. You'll learn why filtered, intent-driven context outperforms comprehensive data access, how to design tools that align with developer workflows rather than API surfaces, and the security trade-offs that matter when LLMs query your observability stack.

We'll cover practical design patterns for MCP servers in the observability space: scoping data by blast radius, surfacing relationships over raw metrics, and handling authentication without compromising developer experience. This talk is about what works when AI meets production systems.
Speakers

Thomas Johnson

CTO and Co-founder, Multiplayer
Co-founder and CTO at Multiplayer, with 20+ years of experience as a backend developer building large-scale distributed software (and robots!)
Level One | Ballroom B
  AI and MCP in Observability
  • Content Experience Level Any
 
Friday, May 22
 

10:50am CDT

Show Me the Receipts: A Forensic Hunt for Observability - Mostafa Radwan, Datadog
Friday May 22, 2026 10:50am - 11:15am CDT
Today, observability platforms can process massive volumes of telemetry, but practitioners struggle to determine what matters during incidents, unnecessarily increasing usage bills.

This talk resolves the question: “Which telemetry data should we keep?” Learn how one team achieved 30% log reduction by flipping the script and asking “what did we actually use?” instead of “what should we collect?” They conducted a forensic audit of incident resolutions to find receipts proving which data sources truly mattered.

You’ll learn techniques for tracing backward from resolved incidents to identify which telemetry is deemed valuable and see how to map incidents to telemetry data that enabled resolution, revealing which sources proved critical, redundant, or unused.

Using OpenTelemetry (OTel) and Vector, an open-source tool for building fast and scalable observability pipelines, this approach provides a replicable pattern that the community can adapt across different environments.

You’ll leave with a framework for measuring telemetry value based on usage patterns, plus a repeatable audit process. The key question: “Where are the receipts?”
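One way to picture the audit described in this abstract is as a mapping from resolved incidents to the telemetry sources that were actually consulted, followed by a usage ratio per source. The sketch below is an assumption-laden illustration, not the team's process: the function name `telemetry_usage`, the source names, and the incident data are all hypothetical, and the scoring rule (fraction of incidents in which a source was used) is only one possible measure of value.

```python
def telemetry_usage(collected_sources, incidents):
    """Return, per collected source, the fraction of incidents that used it.

    `incidents` is a list of sets, each holding the sources consulted while
    resolving one incident (hypothetical input format).
    """
    total = len(incidents)
    usage = {}
    for source in collected_sources:
        hits = sum(1 for used in incidents if source in used)
        usage[source] = hits / total if total else 0.0
    return usage


collected = ["app-logs", "debug-logs", "traces"]
resolved_incidents = [
    {"app-logs", "traces"},
    {"app-logs"},
]
usage = telemetry_usage(collected, resolved_incidents)
unused = [source for source, ratio in usage.items() if ratio == 0.0]
print(usage, unused)
```

Sources that never appear in any incident resolution are candidates for reduction, subject to whatever retention or compliance needs apply outside incident response.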
Speakers

Mostafa Radwan

Senior Solutions Engineer, Datadog
Mostafa is a technologist specialized in cloud native computing, observability, and security.

He started his career as a software engineer before getting in the trenches of application and production support.

He worked as a Solutions Architect at Docker where he helped enterp...
Level One | Ballroom A
  Scalability Challenges and Solutions
  • Content Experience Level Any

11:20am CDT

Applying Observability to the Internet of Living Things (IoLT) - Sophia Solomon, Elastic
Friday May 22, 2026 11:20am - 11:45am CDT
We see IoT everywhere, from smart fridges to air quality sensors, but what about applying observability to billions of living things? Introducing Meowy, my virtual cat with a full observability stack. In this talk, I'll build a digital pet from scratch in Go, instrument it with OpenTelemetry, and visualize its "life" in real time, live-tracking its habits, moods, and (attempted) escapes.

I'll show how to create a RESTful "cat API," instrument it for tracing, and set up alerting with the ELK stack and Kibana visualizations. We'll cover observability basics (logs, metrics, and traces), how to apply them to our digital pet, how to structure telemetry data for "living" systems using AI tools, and how to query all our cat stats with an MCP-connected AI agent. By the end, we'll calculate the average MPH (meows per hour) and expand our understanding of observability applications. No prior observability experience required—just some Go basics and a love for any living thing, from feline to fungal!
Level One | Ballroom B
 