BEGIN:VCALENDAR
VERSION:2.0
X-WR-CALNAME:observabilitysummitna26
X-WR-CALDESC:Event Calendar
METHOD:PUBLISH
CALSCALE:GREGORIAN
PRODID:-//Sched.com Observability Summit North America 2026//EN
X-WR-TIMEZONE:UTC
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260520T130000Z
DTEND:20260520T220000Z
SUMMARY:Early Registration
DESCRIPTION:\n
CATEGORIES:REGISTRATION
LOCATION:Level One | Ballroom Lobby\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:ea696af13b9ae6469340c687f88e75d4
URL:http://observabilitysummitna26.sched.com/event/ea696af13b9ae6469340c687f88e75d4
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T123000Z
DTEND:20260521T220000Z
SUMMARY:Registration
DESCRIPTION:\n
CATEGORIES:REGISTRATION
LOCATION:Level One | Ballroom Lobby\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:b8854363f3bdd164dd83f1d0c15c6ad9
URL:http://observabilitysummitna26.sched.com/event/b8854363f3bdd164dd83f1d0c15c6ad9
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T130000Z
DTEND:20260521T231500Z
SUMMARY:Coat Check
DESCRIPTION:\n
CATEGORIES:COAT CHECK
LOCATION:Level One | Ballroom Lobby\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:e7746e31521d3b8dcd681a74735d01f8
URL:http://observabilitysummitna26.sched.com/event/e7746e31521d3b8dcd681a74735d01f8
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T140000Z
DTEND:20260521T141000Z
SUMMARY:Keynote: Welcome + Opening Remarks
DESCRIPTION:\n
CATEGORIES:KEYNOTE SESSIONS
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:afc0d7201c0b5f2f11ba287529572ab8
URL:http://observabilitysummitna26.sched.com/event/afc0d7201c0b5f2f11ba287529572ab8
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T141500Z
DTEND:20260521T142000Z
SUMMARY:Sponsored Keynote: Zero-Code Observability: Close the Coverage Gaps That Cause Outages - Eden Federman\, Odigos
DESCRIPTION:The outages that hurt most start across multiple vectors: compiled languages\, third-party applications\, legacy services\, hard-to-instrument areas\, and latency-sensitive workloads. In this session\, Odigos co-founder and CTO Eden Federman will talk about how eBPF-based instrumentation with OpenTelemetry output delivers full distributed tracing across every service in your cluster in minutes\, with no code changes and <1% overhead.
CATEGORIES:KEYNOTE SESSIONS
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:193cd18ff90876e23a84c6764bb1b8d8
URL:http://observabilitysummitna26.sched.com/event/193cd18ff90876e23a84c6764bb1b8d8
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T142500Z
DTEND:20260521T143000Z
SUMMARY:Sponsored Keynote: The Work Before the Magic: Autoremediation Readiness - Alok Bhide\, Chronosphere | A Palo Alto Networks Company
DESCRIPTION:The pitch for autoremediation is hard to resist: AI doesn't just surface issues faster\, it fixes them on the spot\, leaving you to kick back\, validate\, and observe. MTTR doesn't just shrink\; it becomes a relic. Problems vanish before anyone even notices they existed.\n\nBut rush into it without solid data\, proper curation\, and clear policy\, and you're pulling a tap with too much pressure: nothing but foam\, no beer.\n\nClosed-loop remediation isn't a shortcut. It's the payoff at the end of a disciplined\, AI-driven observability practice.\n\nIn this talk\, we'll walk through the three things that make autoremediation actually work:\n\n- System coverage that holds up at real scale\n- Data that's clean\, navigable\, and actionable\n- Ground rules for what AI is (and isn't) allowed to do\n\nYou'll walk away with a practical readiness checklist and a clear framework for deciding where autoremediation belongs in your stack\, and where it definitely doesn't.\n\nNo hype. Just the work that earns AI the right to act in production.
CATEGORIES:KEYNOTE SESSIONS
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:2a7175ca3d12fba217a1b0c76aa9140e
URL:http://observabilitysummitna26.sched.com/event/2a7175ca3d12fba217a1b0c76aa9140e
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T143500Z
DTEND:20260521T150000Z
SUMMARY:Keynote: 10 Million Spans Per Second: Lessons From Scaling OpenTelemetry at Reddit - Trevor Riles\, Reddit
DESCRIPTION:Reddit processes over 25 billion tracing events per hour across thousands of services. In this talk\, we share how we scaled our OpenTelemetry-based distributed tracing platform by 67% in one year—and what broke along the way. \n \n We'll cover our architecture: OpenTelemetry instrumentation across Python\, Go\, and JavaScript baseplate libraries feeding into Kafka pipelines and ClickHouse storage. You'll learn how we handled an incident that spiked ingestion to well over 10 million spans per second\, the sampling strategies we developed to balance cost with debuggability\, and why instrumenting three language runtimes simultaneously is harder than it sounds. \n \n Key takeaways: \n - Practical patterns for multi-language OTel instrumentation at scale \n - Remote sampling strategies that adapt to traffic patterns \n - ClickHouse schema design for sub-second trace queries \n - Building adoption through cross-functional partnerships\, not mandates \n \n Whether you're starting your tracing journey or scaling an existing platform\, this talk provides battle-tested lessons from running distributed tracing infrastructure serving one of the world's largest online communities.
CATEGORIES:KEYNOTE SESSIONS
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:594c45c288ca3e09bd7a5faf25afa6a8
URL:http://observabilitysummitna26.sched.com/event/594c45c288ca3e09bd7a5faf25afa6a8
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T150000Z
DTEND:20260521T152000Z
SUMMARY:Coffee + Networking Break
DESCRIPTION:Menu:\nAssorted Scones (GF)\nBlueberry Maple Overnight Oats (v\, GF)\nAssorted Fruit Yogurts\, including Dairy Free and Greek Yogurts
CATEGORIES:BREAKS
LOCATION:Level One | Ballroom A+B Foyer\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:58bf93250a9e2fd83e97f402fce6f4d4
URL:http://observabilitysummitna26.sched.com/event/58bf93250a9e2fd83e97f402fce6f4d4
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T152000Z
DTEND:20260521T154500Z
SUMMARY:The Invisible Tax: How Data Format Conversions Drive up Telemetry Pipeline Costs - Cijo Thomas & Joshua MacDonald\, Microsoft
DESCRIPTION:Telemetry signals traverse long pipelines before reaching observability backends. While enrichment\, filtering\, and redaction provide clear value\, significant compute cost often comes from repeated conversion through different data formats. \n Telemetry commonly flows through SDK formats\, wire protocols\, collector-internal formats\, and backend ingestion schemas. Each boundary introduces marshaling\, unmarshaling\, and copying. These transformations add no new information\, yet consume CPU and memory and scale linearly with volume\, creating a hidden "transform tax" that compounds dramatically at terabyte scale. \n This talk will share results from measuring instrumented OpenTelemetry SDK and Collector pipelines. We quantify compute spent on pure format conversion versus value-generating processing and show how these costs grow with scale. \n Attendees will learn about conversion costs and strategies to reduce waste: eliminating unnecessary translations\, aligning pipeline representations\, leveraging zero-copy techniques\, and minimizing transformation hops between pipeline stages. We also examine Apache Arrow-based representations as one approach to reducing this overhead.
CATEGORIES:CNCF OBSERVABILITY PROJECTS
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:f0a63251c898b6743cda4d35d7599535
URL:http://observabilitysummitna26.sched.com/event/f0a63251c898b6743cda4d35d7599535
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T152000Z
DTEND:20260521T154500Z
SUMMARY:[CANCELLATION] Scaling a Proprietary-to-OpenTelemetry Migration With AI-Assisted\, Spec-Driven Workflows - Ying Mo & Paras Kampasi\, IBM
DESCRIPTION:This talk presents a practical methodology for migrating a large proprietary observability platform to an OpenTelemetry-native architecture\, using a GenAI-assisted workflow paired with a robust spec-driven strategy. Faced with hundreds of custom Java-based sensors\, the engineering team designed a spec-driven conversion process that leverages GenAI to extract specifications\, generate unit tests\, and assist in implementing Go-based OpenTelemetry receivers. Each stage incorporates human review and test feedback loops to address the reliability limitations of GenAI and ensure functional correctness. \n \n Additionally\, a data-driven feasibility evaluation was conducted prior to large-scale conversion\, where defined task types were benchmarked with and without GenAI to quantify effort savings and highlight where GenAI provides the greatest value. \n \n Attendees will learn a reproducible workflow for large-scale migrations from proprietary to OpenTelemetry\, how to pair GenAI with automated testing to manage risk\, and insights on where GenAI accelerates real-world engineering tasks without compromising quality.
CATEGORIES:END-USER CASE STUDIES
LOCATION:Level One | Ballroom B\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:53ba266c8915809731a5eb86f97cb36d
URL:http://observabilitysummitna26.sched.com/event/53ba266c8915809731a5eb86f97cb36d
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T155000Z
DTEND:20260521T161500Z
SUMMARY:OpenTelemetry GenAI in Practice: What the Spec Says Vs. What You Actually See - Zach Groves\, Datadog
DESCRIPTION:OpenTelemetry’s GenAI semantic conventions are evolving quickly. Version 1.37 marked a major shift in how LLM behavior is expressed using standard spans and attributes. While later releases refined and clarified the spec\, real-world adoption remains uneven\, and “GenAI-compatible” can mean very different things across the ecosystem. \n \n In this talk\, I’ll share hands-on lessons from implementing and validating GenAI support in real emitters\, including close collaboration with Strands. Implementing the 1.37 spec on both sides surfaced semantic ambiguities that only became clear in practice and ultimately led to stronger implementations. \n \n I’ll also outline the current GenAI instrumentation landscape: Strands emitting 1.37+ compliant spans\; OpenLLMetry\, which mixes newer conventions with legacy and custom attributes\; and OpenInference\, which claims OpenTelemetry compatibility but does not emit GenAI semantic convention attributes. \n \n Finally\, I’ll show how these gaps surface in practice—teams believing they emit 1.37-compliant telemetry but sending pre-1.37 or non-spec data—and briefly touch on transition guidance like OTEL_SEMCONV_STABILITY_OPT_IN.
CATEGORIES:AI AND MCP IN OBSERVABILITY
LOCATION:Level One | Ballroom B\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:37fcb3b9f97a9eda096a0cd053d5cbce
URL:http://observabilitysummitna26.sched.com/event/37fcb3b9f97a9eda096a0cd053d5cbce
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T155000Z
DTEND:20260521T161500Z
SUMMARY:Taming Observability at Scale in a Multi-Cluster Kubernetes Platform at Bloomberg - Joe Nathan Abellard\, Bloomberg
DESCRIPTION:Bloomberg runs a managed\, multi-cluster Kubernetes platform built atop Karmada to support AI and streaming analytics workloads. This comes with challenges around observability at scale. To meet disaster recovery requirements\, we use a multi-region architecture where each Karmada control plane is hosted on management clusters spanning multiple regions. This helps ensure high availability\, but also adds complexity related to observability. For example\, how do we aggregate and visualize metrics across multiple Prometheus servers when each management cluster has a dedicated Prometheus setup? This talk covers our multi-region architecture to meet DR requirements and our Prometheus stack with Thanos for global metrics aggregation. We’ll explore how we choose the right signals and define meaningful alerts in a complex multi-cluster environment to curb alert fatigue\, while ensuring timely issue detection. We’ll also discuss the challenges of defining SLIs and SLOs in a multi-tenant platform.
CATEGORIES:SCALABILITY CHALLENGES AND SOLUTIONS
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:66fa98dde3a0a3f57911a7454128d074
URL:http://observabilitysummitna26.sched.com/event/66fa98dde3a0a3f57911a7454128d074
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T162000Z
DTEND:20260521T164500Z
SUMMARY:AI-Powered Root Cause Analysis at Scale: From Theory To Production Lessons From Nubank's 120M+ Cus - Letícia Mota & Yevgeny Gladun\, Nubank
DESCRIPTION:This session presents an AI-powered SRE Agent designed to autonomously orchestrate complex\, multi-source investigations by querying internal observability providers and knowledge bases. \n A primary focus is the "Data Volume Problem." Modern observability systems generate terabytes of metrics and logs daily\; at Nubank’s scale\, the Prometheus MCP alone has more than 23\,000 metrics available\, while log queries can span billions of rows. The team overcame LLM context limits through on-premises data filtering\, intelligent summarization\, and selective context assembly. This architecture utilizes "Expert Guides" to reduce 23\,000 raw metrics to approximately 14 relevant data points before LLM processing. \n The talk covers multi-source orchestration using the Model Context Protocol (MCP) for pluggable tool discovery\, allowing the AI to progressively load and correlate only the observability sources. \n The platform enables the delivery of expert instructions for any specific scenario through targeted\, versioned prompts. This transformation allows the platform to scale across the enterprise\, performing virtually any investigative task beyond its original root cause analysis mission.
CATEGORIES:AI AND MCP IN OBSERVABILITY
LOCATION:Level One | Ballroom B\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:821f5ba43b63a93d3597757395f183ec
URL:http://observabilitysummitna26.sched.com/event/821f5ba43b63a93d3597757395f183ec
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T162000Z
DTEND:20260521T164500Z
SUMMARY:Quantiles at Scale: Choosing the Right Estimation Algorithms for Observability - Mike Shi\, ClickHouse
DESCRIPTION:Quantiles like p90 and p99 sit at the heart of observability. They define dashboards\, drive SLOs\, and shape how teams reason about system performance. They are also some of the most expensive metrics to compute\, and the cost grows fast as data volumes increase. \n To keep up\, observability systems rely heavily on approximate quantile algorithms such as sketches and probabilistic data structures\, including t-digest. These approaches work well at small and medium scale\, but at tens or hundreds of petabytes\, things start to creak and limitations become apparent. \n We share hard-won lessons from operating ClickHouse at extreme scale\, where quantile estimation must remain accurate and affordable over hundreds of petabytes of data. We break down the most common quantile algorithms used in observability today\, explain their real trade-offs\, and show when each approach makes sense. We also explore a critical design decision: when quantiles should be computed on the fly at query time versus pre-aggregated during ingestion. \n The goal is to give you a practical framework for choosing quantile algorithms that scale\, rather than blindly relying on defaults that stop working as your data grows.
CATEGORIES:SCALABILITY CHALLENGES AND SOLUTIONS
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:562d7e98c1667e147a983dcfd72422a7
URL:http://observabilitysummitna26.sched.com/event/562d7e98c1667e147a983dcfd72422a7
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T165000Z
DTEND:20260521T170000Z
SUMMARY:⚡ Lightning Talk: Summarizing the Noise: LLM Observability With Open Data Hub\, vLLM\, KServe and Prometheus - Twinkll Sisodia\, Red Hat
DESCRIPTION:As large language models (LLMs) move into production\, raw metrics alone aren’t enough. This talk presents an open-source AI observability solution built on Open Data Hub (ODH) that deploys LLMs using vLLM and KServe\, scrapes inference metrics using Prometheus\, and feeds them into a summarization model to generate actionable insights. We’ll demonstrate a working UI that translates low-level metrics like latency\, GPU usage\, and token throughput into human-readable summaries—giving platform teams an intelligent way to monitor LLMs at scale. No dashboards to interpret—just straight answers from your models about your models.
CATEGORIES:AI AND MCP IN OBSERVABILITY
LOCATION:Level One | Ballroom B\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:e106b2f55bafb4ad67e9990df544a7c5
URL:http://observabilitysummitna26.sched.com/event/e106b2f55bafb4ad67e9990df544a7c5
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T165000Z
DTEND:20260521T170000Z
SUMMARY:⚡ Lightning Talk: Beyond Billions: Operating Thanos\, Prometheus & OpenTelemetry at Trillion-Scale - Narendra Sanikommu\, Nvidia
DESCRIPTION:Operating a metrics system beyond billions of data points introduces failure modes that don't exist at smaller deployments. This lightning talk shares battle-tested lessons from running Thanos\, Prometheus\, and OpenTelemetry in production across distributed Kubernetes environments\, focusing on three critical challenges: implementing multi-tenancy without noisy neighbor problems\, building rate limiting that prevents a single tenant from destabilizing the cluster\, and isolating query workloads so expensive queries don't starve metric ingestion. \n \n The talk walks through real incidents where these challenges caused production impact\, including 5xx errors on Thanos Receivers from unbounded queries\, Prometheus remote write lag\, and partial query results from overwhelmed Store Gateways. For each problem\, the talk presents the custom solutions developed (including tenant-aware rate limiting middleware and workload isolation patterns) and shares concrete configuration approaches that attendees can apply to their own deployments. \n Attendees will leave with actionable techniques for scaling their observability infrastructure to trillion-scale while maintaining reliability under load.
CATEGORIES:SCALABILITY CHALLENGES AND SOLUTIONS
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:320dbe20509c0a17cf15dfe00f9a583b
URL:http://observabilitysummitna26.sched.com/event/320dbe20509c0a17cf15dfe00f9a583b
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T170500Z
DTEND:20260521T171500Z
SUMMARY:⚡ Lightning Talk: From Collector To Terminal: A Better Way To See Your OpenTelemetry Logs - Jon Reeve\, ControlTheory
DESCRIPTION:The OpenTelemetry Collector is powerful\, but the "debug exporter" only shows raw output. What if you could see your OpenTelemetry logs - with structure\, filters\, and context - right in your terminal? \n \n This talk introduces Gonzo\, an open-source\, OTLP-native terminal UI that visualizes logs from the Collector or any OTLP-capable source in real time. Learn how to validate both source instrumentation and Collector pipelines - including components like filelog\, k8sattributes\, and transform - without a backend. \n \n Whether debugging\, testing configs\, or teaching OTel\, Gonzo offers a faster\, clearer way to understand your telemetry as it flows. \n \n Key Takeaways: \n - Validate source instrumentation and Collector pipelines end-to-end \n - See enriched OTel logs with structure and context in the terminal \n - Debug and iterate on OTel configs faster - no backend required
CATEGORIES:THE FUTURE OF OPEN SOURCE OBSERVABILITY
LOCATION:Level One | Ballroom B\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:c940b8e3c90f01b6f898aaf5d299aff5
URL:http://observabilitysummitna26.sched.com/event/c940b8e3c90f01b6f898aaf5d299aff5
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T171500Z
DTEND:20260521T181500Z
SUMMARY:Lunch
DESCRIPTION:Menu:\nMinneSalad: Romaine\, Baby Lettuce Greens\, Purple Cabbage\, Carrot Shreds\, Honey-Clover Gouda\, Sweet and Spicy Pepitas\, Cucumber\, Shredded Daikon\, Red Peppers\, Blueberry Balsamic Vinaigrette (vg\, gf)\nSautéed Beef Tips\, Wild Rice\, Carrots\, Celery\, Onions\, Mushrooms\, Topped with Cheddar Cheese and Crispy Tater Tots\nWild Rice Hot Dish Plant-Based Ground Beef\, Wild Rice\, Carrots\, Celery\, Onions\, Mushrooms (vg)\nWild Rice Cakes with Roasted Red Pepper Sauce\, Roasted Brussels Sprout Medley (ve)\nHomemade Dinner Rolls\nAssorted Miniature Bundt Cakes
CATEGORIES:BREAKS
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:975c1bbc37c937719e261369f6d830e1
URL:http://observabilitysummitna26.sched.com/event/975c1bbc37c937719e261369f6d830e1
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T181500Z
DTEND:20260521T185000Z
SUMMARY:Panel: Telemetry That Matters - Diana Todea\, VictoriaMetrics\; Antonio Jimenez Martinez\, Cisco ThousandEyes\; Laura Luttmer\, Dynatrace
DESCRIPTION:Instrumentation has never been easier\, but are we truly gaining clarity? As data volumes rise\, dashboards multiply\, and observability costs increase\, developers may feel less insight and more friction. Are we collecting telemetry with purpose or just because we can? What problem is this data meant to solve? \n This panel brings together practitioners across open standards\, developer experience\, and real-world reliability engineering. The discussion will examine how zero-code instrumentation affects workflows and system understanding\, how meaningful telemetry improves day-to-day engineering work\, and why unfiltered or unstructured data often has the opposite effect. The conversation will cover practical lessons for filtering\, dropping\, reducing\, and shaping telemetry so teams maintain visibility without unnecessary volume or cost. Finally\, we explore scaling observability across fleets of collectors with an OpAMP server\, ensuring consistent signal delivery and manageability as telemetry grows. \n At the center is a guiding question: What is the purpose of the telemetry we collect\, and how do we ensure it remains aligned with developer needs\, operational requirements\, and system reliability?
CATEGORIES:INTEGRATING OBSERVABILITY INTO DEVOPS PRACTICES
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:57b5343c408dd031266fd96cf05d98c3
URL:http://observabilitysummitna26.sched.com/event/57b5343c408dd031266fd96cf05d98c3
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T182500Z
DTEND:20260521T185000Z
SUMMARY:Unified End-to-End Observability: How Comcast Generates SpanMetrics at Enterprise Scale - Raghu Vamshi Challa\, Comcast
DESCRIPTION:Enterprises often struggle with the "black box" nature of proprietary APM tools and the high cost of distributed tracing at scale. In this session\, we will demonstrate how Comcast tackled this challenge by migrating 350 critical applications from AppDynamics to a cloud-native OpenTelemetry (OTel) stack\, achieving a truly unified end-to-end observability experience. \n \n We will pull back the curtain on the architecture that powers this migration. Specifically\, we will show how we leveraged the OpenTelemetry Collector to generate Request\, Error\, and Duration (R.E.D.) metrics from trace data using the SpanMetrics connector. A key highlight will be our unique deployment of Conduit\, which serves as a resilient transport layer to ensure data integrity and effective load balancing in a high-volume environment. \n \n Attendees will leave with a blueprint for breaking free from APM vendor lock-in. To help the community fast-track this transition\, we will also be sharing and walking through our reusable\, battle-tested Grafana dashboards that can be leveraged by any enterprise.
CATEGORIES:END-USER CASE STUDIES
LOCATION:Level One | Ballroom B\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:f960b0822fd35c18bbb59f160c1c3cc4
URL:http://observabilitysummitna26.sched.com/event/f960b0822fd35c18bbb59f160c1c3cc4
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T185500Z
DTEND:20260521T192000Z
SUMMARY:Policy as Code Meets OpenTelemetry: The Next Frontier of Observability - Christopher Voisey\, EnforceAuth
DESCRIPTION:Modern observability stacks excel at capturing signals about infrastructure health\, application performance\, and request flows. Yet one critical class of decisions remains largely invisible: authorization. \n In distributed systems\, authorization decisions increasingly determine not only whether an action succeeds\, but also whether data is accessed\, tools are invoked\, or automated agents are allowed to act. These decisions are often evaluated outside application code using Policy as Code frameworks\, yet their outcomes are rarely observable in a structured\, privacy-preserving way. \n In this session\, we explore how Policy as Code\, Open Policy Agent\, and the OpenTelemetry project can be combined to treat authorization decisions as observable events. We examine what it means to observe a decision without logging sensitive inputs\, how decision structure differs from traditional metrics and traces\, and why decision-level observability is becoming essential in cloud-native and AI-driven systems. \n Attendees will leave with a conceptual framework for thinking about authorization as telemetry\, and a clearer understanding of where observability is heading as systems become more autonomous and policy-driven.
CATEGORIES:CNCF OBSERVABILITY PROJECTS
LOCATION:Level One | Ballroom B\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:53346bd9d70ae95ccc8e4ff5228bffc8
URL:http://observabilitysummitna26.sched.com/event/53346bd9d70ae95ccc8e4ff5228bffc8
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T185500Z
DTEND:20260521T192000Z
SUMMARY:Taming Tenancy\, Cost and Architecture at Collibra Through OpenTelemetry and Our Telemetry Backbone - Alex Van Boxel\, Collibra
DESCRIPTION:Operating a SaaS platform presents the same observability problems as any other enterprise\, but scale and tenancy put a huge multiplier on the observability signals\, which affects both cost and effectiveness. This session dives into the techniques Collibra used to tame these problems and how to maintain clarity when infrastructure spans virtual machines\, modern Kubernetes clusters\, and a complex mix of single- and multi-tenant architectures. Without the right context\, telemetry data becomes a noisy\, indistinguishable flood. We will dive into the architectural decision to leverage the C4 system model\, ensuring every piece of telemetry carries the vital context of what it belongs to and where it sits in the hierarchy. This gives us insight into signal attribution and enables virtual chargebacks. The presentation details the implementation of a pipeline using custom-built OpenTelemetry collectors designed to handle the data and enrich it before sending it to the appropriate backends. This session will give you practical insights into the challenges SaaS platforms face\, and the techniques used to tame them can be applied everywhere.
CATEGORIES:END-USER CASE STUDIES
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:91e387688e137856e03aa79ad303bf63
URL:http://observabilitysummitna26.sched.com/event/91e387688e137856e03aa79ad303bf63
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T192500Z
DTEND:20260521T195000Z
SUMMARY:What's the Best Way To Reduce Storage Requirements Without Losing Insights? Push AI To the Edge! - Alex Degitz\, ElastiFlow Inc
DESCRIPTION:During this session we’ll discuss ElastiFlow’s Edge Observability strategy\, which includes an OTel-native edge processing node with local DuckDB storage for all OTel signals and a model-agnostic agentic AI system (we often run it with OpenAI’s gpt-oss-20b) that exposes its tools through an MCP server. \n \n Instead of just forwarding OTel signals from various Edge collectors\, the signals are analyzed and routing decisions are made. Alerts are sent to the Observability Platform right away\, while logs are stored locally and analyzed for patterns. Instead of forwarding all logs\, we might only care about a few conditions of interest\, often correlated with other signals\, and send these to the Observability Platform\, while less interesting logs can be aggressively aggregated. \n \n With this approach\, we were able to reduce the storage and ingest cost of Observability Platforms by half while actually decreasing the mean time to insight.
CATEGORIES:AI AND MCP IN OBSERVABILITY
LOCATION:Level One | Ballroom B\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:c620af6af5ed4749ba1b4f7b5e66c298
URL:http://observabilitysummitna26.sched.com/event/c620af6af5ed4749ba1b4f7b5e66c298
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T192500Z
DTEND:20260521T195000Z
SUMMARY:The Speed of Metrics\, the Fidelity of Traces: Architecting Post-Collection Aggregation - Zack Owens\, New Relic
DESCRIPTION:As organizations adopt observability practices\, they face a scalability paradox: systems now generate petabytes of traces and logs\, but querying this raw telemetry over long time horizons becomes prohibitively slow and expensive due to the data volume. \n \n The standard solution of pre-aggregating high-cardinality telemetry into metrics at collection time through features in the OpenTelemetry collector works well for known patterns but fails when engineers need to ask new questions about historical data. This creates an uncomfortable choice for engineers and operators: fast dashboards with pre-aggregated metrics\, or high-fidelity traces and logs that become unusable beyond short time windows. \n \n This talk presents a post-collection aggregation approach that enables fast queries over long time periods of detailed telemetry without changes to collector-side configuration. This session explores techniques for incremental view materialization that work with timeseries data. Attendees will leave with concrete architectural patterns which are applicable to open source databases like ClickHouse or OpenSearch to answer novel questions without sacrificing query speed or data fidelity.
CATEGORIES:SCALABILITY CHALLENGES AND SOLUTIONS
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:1a73095546d158b89c6f57b59431f527
URL:http://observabilitysummitna26.sched.com/event/1a73095546d158b89c6f57b59431f527
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T195500Z
DTEND:20260521T202000Z
SUMMARY:One Pane to Rule Them All: Uniting the Prometheus Community with OpenSearch Dashboards\, Logs\, and Trace - Anirudha Jadhav and Kevin Fallis\, AWS
DESCRIPTION:As infrastructure scales across regions and clusters\, Prometheus deployments fragment into isolated islands of metrics\, disconnected from logs\, traces\, and the dashboards operators actually live in.\n\nThis talk is for the Prometheus community. If you've wrestled with federation sprawl\, alert duplication\, or the gap between your metrics and the rest of your observability story\, this session is for you.\n\nWe'll demonstrate how OpenSearch's distributed data source support lets multiple Prometheus clusters coexist natively alongside logs and traces in a single unified interface\, with no data migration and no parallel stacks.\n\nYou'll learn:\n- Unified querying across Prometheus clusters\n- SLO tracking wired directly into dashboards\n- Application management that finally connects the signals your teams have been operating in isolation\n\nThis is about completing the observability loop the Prometheus community has always needed: open\, composable\, and community-driven.
CATEGORIES:AI AND MCP IN OBSERVABILITY
LOCATION:Level One | Ballroom B\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:9a2bfd9fe6bb9d3632e8e45396b013e6
URL:http://observabilitysummitna26.sched.com/event/9a2bfd9fe6bb9d3632e8e45396b013e6
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T195500Z
DTEND:20260521T202000Z
SUMMARY:When the Cloud Fails: Debugging the "Undocumented" - Dhruv Jain\, Gojek (GoTo Group) Indonesia
DESCRIPTION:What happens when a system degrades under high load while all internal metrics remain “green”? At hyperscale\, supporting on-demand services across Southeast Asia’s most populous countries\, a team observed up to a 7% drop in message delivery. The root cause was not application code\, messaging brokers\, or load balancers\, but a hidden limitation deep within a cloud provider’s firewall. This war-story session presents a forensic investigation into a managed cloud load balancer and its interaction with connection-tracking tables. The talk walks through the production cutover that triggered the issue and the targeted load testing that ultimately isolated the failure to cloud infrastructure behavior invisible to standard monitoring. Beyond root cause analysis\, the session focuses on outcomes: how sustained\, evidence-based debugging led the cloud provider to acknowledge the issue—initially labeled a “limitation”—and introduce a new observability metric\, firewall/connections_tracked. Attendees will leave with a practical framework for debugging black-box cloud failures and identifying the node-level metrics needed to detect silent network drops before they impact users.
CATEGORIES:END-USER CASE STUDIES
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:d30ece4a25799139da73fdce7adef01b
URL:http://observabilitysummitna26.sched.com/event/d30ece4a25799139da73fdce7adef01b
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T202000Z
DTEND:20260521T204000Z
SUMMARY:Coffee + Networking Break
DESCRIPTION:Menu:\nRice Crispy Bars (GF)\nPotato Chips (GF\, Vg) and French Onion Dip (v\, GF)
CATEGORIES:BREAKS
LOCATION:Level One | Ballroom A+B Foyer\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:5bbc6e505c2c24208c6803038d605fb3
URL:http://observabilitysummitna26.sched.com/event/5bbc6e505c2c24208c6803038d605fb3
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T204000Z
DTEND:20260521T210500Z
SUMMARY:From Data Dumps To Smart Context: Building MCP Servers That AI Can Actually Use - Thomas Johnson\, Multiplayer
DESCRIPTION:Most MCP servers fail the same way: they expose observability data without understanding what AI models need to reason effectively. The result? Tools that overwhelm models with metrics\, miss critical context\, and introduce unnecessary security exposure. At Multiplayer\, we built an MCP server to give AI coding assistants access not just to production telemetry but to full stack data: frontend screens and data\, backend traces\, logs\, and request/response content and headers. What we learned challenges the "more data is better" assumption that drives most integrations. This talk shares the hard lessons from moving an MCP server into production. You'll learn why filtered\, intent-driven context outperforms comprehensive data access\, how to design tools that align with developer workflows rather than API surfaces\, and the security trade-offs that matter when LLMs query your observability stack. We'll cover practical design patterns for MCP servers in the observability space: scoping data by blast radius\, surfacing relationships over raw metrics\, and handling authentication without compromising developer experience. This talk is about what works when AI meets production systems.
CATEGORIES:AI AND MCP IN OBSERVABILITY
LOCATION:Level One | Ballroom B\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:be127177b6188245aa831483c27220a6
URL:http://observabilitysummitna26.sched.com/event/be127177b6188245aa831483c27220a6
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T204000Z
DTEND:20260521T210500Z
SUMMARY:The Full Picture: Visualizing Service "Fullness" To Rethink Saturation Prevention - Tal Nordan\, Independent
DESCRIPTION:Saturation has long been the stepchild of "the Four Golden Signals of SRE". While latency\, traffic\, and errors are directly measurable through metrics like P99\, RPS\, and 5xx rates\, monitoring just how "full" a service is relies on indirect symptoms such as CPU usage or queue depth. Yet saturation should ideally be the first signal to alert on\, because once it is reached\, the other signals (latency and errors) spike fast. \n \n The inability to directly observe and mitigate saturation drives excessive safety margins\, chronically low CPU utilization\, and massive compute waste in latency-sensitive and customer-facing systems. This session introduces an open-source approach that extends Envoy proxy\, with seamless integration through eBPF and Cilium\, to provide direct observability into service saturation by comparing each instance's live number of concurrent requests to its true concurrency limit. We then explore how such direct visualization of saturation can help reduce MTTR and minimize waste.
CATEGORIES:SCALABILITY CHALLENGES AND SOLUTIONS
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:f028cbbe09bc30f80cd0ddee9886b580
URL:http://observabilitysummitna26.sched.com/event/f028cbbe09bc30f80cd0ddee9886b580
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T211000Z
DTEND:20260521T213500Z
SUMMARY:Why Are Your AI’s Decisions Hard To Explain: Trace Every Decision With Agentic AI Observability - Dhiraj Kumar Jain & Vikash Agrawal\, Amazon Web Services
DESCRIPTION:Agentic AI systems represent a fundamental shift in software architecture: autonomous agents reason\, plan\, invoke tools\, and orchestrate complex workflows without deterministic control flow. This breaks many assumptions behind traditional observability. \n \n When agents independently make decisions\, failures no longer follow a single request path. How do you debug emergent behavior across multiple agent steps? How do you analyze and control token-driven costs? How do you ensure reliability when outputs are non-deterministic? \n \n This session explores why observability is a first-class requirement in the agentic AI era and how OpenSearch can act as the analytical backbone for understanding autonomous AI systems in production. We will cover practical techniques for instrumenting agent workflows with OpenTelemetry and indexing traces\, logs\, metrics\, and AI decision artifacts into OpenSearch for deep correlation and analysis. \n \n Attendees will learn battle-tested patterns for tracing agent reasoning and tool usage\, investigating failures and hallucinations\, monitoring latency and cost signals\, and building dashboards that make agentic AI systems transparent\, debuggable\, and production-ready.
CATEGORIES:AI AND MCP IN OBSERVABILITY
LOCATION:Level One | Ballroom B\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:ebf69012fcb0bb2b0430814d4f72d9dd
URL:http://observabilitysummitna26.sched.com/event/ebf69012fcb0bb2b0430814d4f72d9dd
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T211000Z
DTEND:20260521T213500Z
SUMMARY:Secure by Design: Rethinking Test Credentials for Synthetic Monitoring - Katie Kodes\, Katie Kodes
DESCRIPTION:Synthetic monitoring and end-to-end testing often require dangerous levels of access to production systems. Last summer\, I nearly emailed my bank details to a team I was training on new testing tools. If I hadn't caught that mistake\, I probably would have dumped them into an OTel collector too. \n \n This session explores the security implications of common testing practices\, and presents practical alternatives that maintain observability without compromising security. \n \n Attendees will learn authentication and authorization patterns to improve test security across the software development lifecycle. \n \n Implementing mitigations like health check endpoints\, synthetic data\, and privilege separation spans the full stack of infrastructure\, development\, monitoring\, and governance. Attendees will leave with a shared vocabulary they can use to align business\, development\, security\, and observability teams on safer test traffic in production.
CATEGORIES:SCALABILITY CHALLENGES AND SOLUTIONS
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:c83912fae6234721fe9b6d6d4863e868
URL:http://observabilitysummitna26.sched.com/event/c83912fae6234721fe9b6d6d4863e868
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T214000Z
DTEND:20260521T214500Z
SUMMARY:Closing Remarks
DESCRIPTION:\n
CATEGORIES:KEYNOTE SESSIONS
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:e9fe9c500d29b2fc8e0d567a988e2421
URL:http://observabilitysummitna26.sched.com/event/e9fe9c500d29b2fc8e0d567a988e2421
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260521T214500Z
DTEND:20260521T224500Z
SUMMARY:Evening Reception
DESCRIPTION:Join us onsite for drinks and appetizers with fellow attendees.\n\nMenu:\nGourmet Cheese Platter (v)\nFresh Vegetable Crudités Platter - Spinach Dip + Hummus (v)\nWild Rice Cakes (vg\, gf) with Red Pepper Sauce\nFilo Tartlet - Sundried Tomato-Chicken
CATEGORIES:BREAKS
LOCATION:Level One | Ballroom A+B Foyer\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:3df7862b17168b23a5e64503faf1109c
URL:http://observabilitysummitna26.sched.com/event/3df7862b17168b23a5e64503faf1109c
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T123000Z
DTEND:20260522T220000Z
SUMMARY:Registration
DESCRIPTION:\n
CATEGORIES:REGISTRATION
LOCATION:Level One | Ballroom Lobby\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:4f1ca5eed24c339c000cc8681612fcb3
URL:http://observabilitysummitna26.sched.com/event/4f1ca5eed24c339c000cc8681612fcb3
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T133000Z
DTEND:20260522T221500Z
SUMMARY:Coat Check
DESCRIPTION:\n
CATEGORIES:COAT CHECK
LOCATION:Level One | Ballroom Lobby\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:12ca76cbf278f0135fdc53c51bf83b4d
URL:http://observabilitysummitna26.sched.com/event/12ca76cbf278f0135fdc53c51bf83b4d
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T140000Z
DTEND:20260522T140500Z
SUMMARY:Keynote: Welcome Back + Opening Remarks
DESCRIPTION:\n
CATEGORIES:KEYNOTE SESSIONS
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:db746fc900c8321a9a720b9f11ea6996
URL:http://observabilitysummitna26.sched.com/event/db746fc900c8321a9a720b9f11ea6996
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T141000Z
DTEND:20260522T141500Z
SUMMARY:Sponsored Keynote: OpenSearch - See Everything: Open Observability for Agentic AI - Anirudha Jadhav\, Amazon Web Services
DESCRIPTION:AI is accelerating software development at an exponential pace\, but we have no idea what our AI systems are actually doing. Agents operate across distributed frameworks. One request spawns dozens of hops with zero visibility. The OpenSearch Observability Stack closes that gap—built for open source contributors\, with a growing focus on developers and operators using these systems every day. Open source. Linux Foundation-governed. One pipeline. Every framework. Every model. Every hop visible. The agentic era deserves open infrastructure\, and we’ll share how this is a step towards building it together.
CATEGORIES:KEYNOTE SESSIONS
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:8ff97544d6450cea0053bd44ab2eda98
URL:http://observabilitysummitna26.sched.com/event/8ff97544d6450cea0053bd44ab2eda98
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T141500Z
DTEND:20260522T142000Z
SUMMARY:Sponsored Keynote: Datadog - Every Byte Counts: How Protocol Design Shapes the Cost of Observability - Amanda Sopkin\, Datadog
DESCRIPTION:Today\, many organizations are pushing beyond existing limits for telemetry volume. Systems are ever-more distributed and generative AI workloads produce enormous amounts of data. As telemetry volumes grow\, observability pipelines must become more efficient.\n\nAt scale\, telemetry egress directly impacts observability spend. Cloud providers charge per gigabyte of data transferred across regions or providers\, and those bytes add up quickly. The protocol used to encode telemetry determines how much data is sent over the network. Even modest improvements in encoding efficiency (i.e. the protocol) can translate into significant cost savings. However\, the OpenTelemetry Protocol (OTLP) was not initially optimized for performance. Instead\, it prioritized interoperability and easy adoption.\n\nToday the OpenTelemetry community is exploring OTAP\, a new stateful protocol for transmitting OpenTelemetry data based on Apache Arrow. By using columnar encoding and maintaining state throughout a stream\, OTAP avoids repeatedly sending the same metadata\, reducing payload size and network transfer. However\, because OTAP relies on long-lived stateful streams rather than independent requests\, there is additional architectural and operational complexity in its implementation. There are further challenges to larger adoption by the community\; for example\, Apache Arrow support varies significantly across languages.\n\nProtocol design today is critical to efficiently scaling your systems. In this talk we will explore how protocol design affects telemetry egress and overall observability cost. We will go over some strategies for improving encoding efficiency\, compare stateless and stateful approaches\, and discuss the potential benefits and drawbacks of adopting a protocol like OTAP. Join us to learn more about how your protocol decisions can influence your costs over time.
CATEGORIES:KEYNOTE SESSIONS
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:f25f7747f7471870f78d8394e36c4e6d
URL:http://observabilitysummitna26.sched.com/event/f25f7747f7471870f78d8394e36c4e6d
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T142000Z
DTEND:20260522T144500Z
SUMMARY:Keynote: Tracing the Agent's Mind: Extending OpenTelemetry for Deep MCP Inspection - Mustafa Dayıoğlu\, TUBITAK & Zeyno Dodd\, Conjectura R&D
DESCRIPTION:Production AI agents make thousands of tool-calling decisions daily\, yet observability stops at the model boundary. OpenTelemetry's GenAI semantic conventions capture token counts and latencies—what the LLM processed—but not why an agent selected a specific tool. Research (McKenzie et al.\, 2023) demonstrates inverse scaling: more capable models exhibit unpredictable tool selection patterns. This gap leaves engineers guessing during critical production failures. \n \n We present gen-ai-otel\, an open-source OpenTelemetry extension introducing decision-level telemetry for MCP agents. A new attribute namespace (gen_ai.agent.*) captures tool selection confidence\, session context\, permission scope validation\, and baseline deviations. The zero-sidecar architecture routes telemetry through standard Collector pipelines to existing backends—Jaeger\, Prometheus\, or graph databases—with low overhead and cardinality-aware attributes. \n \n A live demo reconstructs an agent's decision chain\, revealing anomalies invisible to token metrics—reducing decision-debugging time. Attendees leave with: 1) Collector configs\, 2) Grafana dashboards for confidence tracking\, 3) demo code and repo—all Apache 2.0 licensed.
CATEGORIES:KEYNOTE SESSIONS
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:9adb202809bd2b55e1a84fc5a492e046
URL:http://observabilitysummitna26.sched.com/event/9adb202809bd2b55e1a84fc5a492e046
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T144500Z
DTEND:20260522T152000Z
SUMMARY:Coffee + Networking Break
DESCRIPTION:Menu:\nSeasonal Fresh Cut Fruit\nCinnamon-Apple Breakfast Bake (v)\nGluten Free Muffin (GF)
CATEGORIES:BREAKS
LOCATION:Level One | Ballroom A+B Foyer\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:f6da4da95db7691f5d3644cd5db64ddb
URL:http://observabilitysummitna26.sched.com/event/f6da4da95db7691f5d3644cd5db64ddb
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T144500Z
DTEND:20260522T152000Z
SUMMARY:Last Call for T-Shirts
DESCRIPTION:Please visit the t-shirt swag table if you have not yet picked up your event shirt!
CATEGORIES:BREAKS
LOCATION:Level One | Ballroom A+B Foyer\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:b5c87ef2bf45c8fd13a5ee6554d65884
URL:http://observabilitysummitna26.sched.com/event/b5c87ef2bf45c8fd13a5ee6554d65884
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T152000Z
DTEND:20260522T154500Z
SUMMARY:Exploring Observability with MCP Servers - Tiffany Jernigan\, Grafana Labs
DESCRIPTION:You may have heard of the pillars of observability: metrics\, logs\, traces\, and\, depending on who you ask\, profiles. As systems grow in complexity\, the need to both individually understand and correlate these signals becomes paramount for rapid incident detection\, root cause analysis\, and performance optimization. Yet\, even with advances like OpenTelemetry\, making sense of your own data often requires learning specialized query languages and navigating complex toolchains\, which is a barrier for many users. \n \nWhile AI tools like ChatGPT can offer general advice\, they lack access to your specific observability data. This is where Model Context Protocol (MCP) servers come in. MCP servers provide a standardized way for AI assistants and other tools to securely connect to your observability data\, making it easier to investigate and diagnose issues faster using natural language. \n \nIn this talk\, we’ll cover MCP and demonstrate how to explore your observability data using Grafana MCP\, while also touching on how the same approach can work with other MCP-compatible tools or custom MCP servers.
CATEGORIES:AI AND MCP IN OBSERVABILITY
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:2107e816fefe9fbafcb7ad1b67c7e953
URL:http://observabilitysummitna26.sched.com/event/2107e816fefe9fbafcb7ad1b67c7e953
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T152000Z
DTEND:20260522T154500Z
SUMMARY:Observing the Observers: Bringing OpenTelemetry to Autonomous AI Agents - Abdel Fane\, OpenA2A
DESCRIPTION:Traditional observability assumes humans operate systems. AI agents break that model—they make autonomous decisions\, execute operations without approval\, and drift in capability over time. Yet most organizations have zero observability into their AI agent infrastructure. \n \nWhen developers spin up MCP servers through Claude Desktop or Cursor\, security and ops teams are blind. No metrics. No traces. No logs. Just autonomous agents accessing databases\, calling APIs\, and modifying production systems—completely outside your observability stack. \n \nThis talk explores how to instrument AI agents and MCP servers using familiar CNCF tools. We'll cover: \n \n• Why traditional APM fails for autonomous agents (no request/response\, emergent behavior\, capability drift) \n• Detecting anomalies in agent behavior (statistical baselines vs. ML-driven detection) \n• Correlating agent actions to business outcomes \n \nYou'll see working demos of agent observability plus open-source code for instrumenting LangChain\, CrewAI\, and custom agents. \n \nWalk away with patterns to extend your existing observability stack to AI agents before they become your biggest blind spot.
CATEGORIES:AI AND MCP IN OBSERVABILITY
LOCATION:Level One | Ballroom B\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:b22f0e2cae331d23a7145cfc00c6508b
URL:http://observabilitysummitna26.sched.com/event/b22f0e2cae331d23a7145cfc00c6508b
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T155000Z
DTEND:20260522T161500Z
SUMMARY:Don't Let Users Find Your Outages: Synthetic Monitoring for Kubernetes Platforms - Kate Agnew\, Marriott & David Norton\, Platformers
DESCRIPTION:No platform owner wants to be told their platform is down by a user. A core responsibility of the platform operating model is ensuring a reliable platform for the organization. In practice\, it isn't always easy to detect when things are broken\, especially when it falls outside of the traditional metrics coverage. In our work\, we adopted synthetic monitoring using Kuberhealthy\, a CNCF project\, to gain better visibility into whether the Kubernetes platform is operating as a user would expect. Synthetic monitoring allows us to replicate application developer workflows to validate end-to-end functionality of the platform. Come and learn about implementing synthetics\, how to not break things\, and broadly how to improve stability with Kubernetes using synthetic monitoring.
CATEGORIES:INTEGRATING OBSERVABILITY INTO DEVOPS PRACTICES
LOCATION:Level One | Ballroom B\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:82c6d9e04f39c12603b0ef798ed48df9
URL:http://observabilitysummitna26.sched.com/event/82c6d9e04f39c12603b0ef798ed48df9
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T155000Z
DTEND:20260522T161500Z
SUMMARY:Show Me the Receipts: A Forensic Hunt for Observability - Mostafa Radwan\, Datadog
DESCRIPTION:Today\, observability platforms can process massive volumes of telemetry\, but practitioners struggle to determine what matters during incidents\, unnecessarily increasing usage bills. \n \n This talk resolves the question: “Which telemetry data should we keep?” Learn how one team achieved 30% log reduction by flipping the script and asking “what did we actually use?” instead of “what should we collect?” They conducted a forensic audit of incident resolutions to find receipts proving which data sources truly mattered. \n \n You’ll learn techniques for tracing backward from resolved incidents to identify which telemetry is deemed valuable and see how to map incidents to telemetry data that enabled resolution\, revealing which sources proved critical\, redundant\, or unused. \n \n Using OpenTelemetry (OTel) and Vector\, an open-source tool for building fast and scalable observability pipelines\, this approach provides a replicable pattern that the community can adapt across different environments. \n \n You’ll leave with a framework for measuring telemetry value based on usage patterns\, plus a repeatable audit process. The key question: “Where are the receipts?”
CATEGORIES:SCALABILITY CHALLENGES AND SOLUTIONS
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:541f8f14a83bc1247d492a7b21b24c3e
URL:http://observabilitysummitna26.sched.com/event/541f8f14a83bc1247d492a7b21b24c3e
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T162000Z
DTEND:20260522T164500Z
SUMMARY:[CANCELLATION] AI Training in Emerging Economies: Building Africa's Largest LLM From the Ground Up - Okikiola Oliyide\, Awarri
DESCRIPTION:N-ATLaS is a multilingual African-language LLM we took from research to production on Kubernetes. This talk shows the end-to-end path we used to make it reproducible\, observable\, and affordable: data + finetune pipelines (artifacts\, seeds\, checkpoints)\, Argo-orchestrated training on mixed GPU pools\, and a serving stack with Triton + KServe tuned for real traffic. I’ll walk through SRE guardrails that mattered for N-ATLaS (SLOs\, golden signals\, error budgets)\, supply-chain hygiene (image signing\, provenance\, model versioning)\, and the levers that cut cost-per-token while improving latency and uptime under pre-emptions. We’ll cover autoscaling\, caching\, model rollout strategies\, and incident playbooks plus what we’d change after thousands of downloads and weeks of live usage. Expect hard-learned patterns\, YAML you can run\, and a plain-English checklist you can lift into your own cluster\; whether you’re serving English or a low-resource language model.
CATEGORIES:CNCF OBSERVABILITY PROJECTS
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:c37c4edb0113b8b60e9357094c0c6e9f
URL:http://observabilitysummitna26.sched.com/event/c37c4edb0113b8b60e9357094c0c6e9f
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T162000Z
DTEND:20260522T164500Z
SUMMARY:Applying Observability to the Internet of Living Things (IoLT) - Sophia Solomon\, Elastic
DESCRIPTION:We see IoT everywhere\, from smart fridges to air quality sensors\, but what about applying observability to billions of living things? Introducing Meowy\, my virtual cat with a full observability stack. In this talk\, I'll build a digital pet from scratch in Go\, instrument it with OpenTelemetry\, and visualize its "life" in real time\, live-tracking its habits\, moods\, and (attempted) escapes. \n \nI'll show how to create a RESTful "cat API\," instrument it for tracing\, and set up alerting with the ELK stack and Kibana visualizations. We'll cover observability basics (logs\, metrics\, and traces)\, how to apply them to our digital pet\, how to structure telemetry data for "living" systems using AI tools\, and how to query all our cat stats with an MCP-connected AI agent. By the end\, we'll calculate the average MPH (meows per hour) and expand our understanding of observability applications. No prior observability experience required—just some Go basics and a love for any living thing\, from feline to fungal!
CATEGORIES:INTEGRATING OBSERVABILITY INTO DEVOPS PRACTICES
LOCATION:Level One | Ballroom B\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:9e74044f9cd7c75958742a2cf9ca9cdf
URL:http://observabilitysummitna26.sched.com/event/9e74044f9cd7c75958742a2cf9ca9cdf
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T165000Z
DTEND:20260522T170000Z
SUMMARY:⚡ Lightning Talk: Show Me the Money: Metrics Edition - Brian Davis\, Red Canary
DESCRIPTION:Existing cloud and Kubernetes cost management tools struggle to track expenses at a granular level\, leaving engineers unable to answer critical questions like: How much is one specific customer costing us in DynamoDB usage? Or\, which system component is consuming the most of our Kafka cluster? \n \n This lightning talk demonstrates how to leverage existing observability frameworks to gain detailed\, low-level cost insights. Attendees will learn basic techniques to instrument standard metrics with custom labels\, such as component name\, customer ID\, and team\, for fine-grained cost allocation. \n \n This session includes a practical case study from Red Canary\, which has used this exact methodology for over five years to transform its tactical decision-making and better manage cloud spend. By treating cost allocation as an observability problem\, engineers can provide the finance team with the deep data required for effective resource management. \n \n Attendees will leave with an actionable plan for implementing a metrics-based cost tracking system (likely with the tooling you already have)\, independent of high-level cloud billing tools\, to drive significant operational efficiency.
CATEGORIES:END-USER CASE STUDIES
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:5fe7573dbcd9a7ca751fa3322db824f3
URL:http://observabilitysummitna26.sched.com/event/5fe7573dbcd9a7ca751fa3322db824f3
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T170500Z
DTEND:20260522T171500Z
SUMMARY:⚡ Lightning Talk: Observability Debt: When Telemetry Stops Telling the Truth - Spoorthi Palakshaiah\, Relevance Lab
DESCRIPTION:This talk introduces observability debt as an operational issue that develops over time in evolving systems. Teams often instrument services early using observability frameworks\, define metrics\, dashboards\, alerts\, and SLOs\, and initially gain confidence in their ability to understand system behavior. However\, production systems rarely remain static. As systems evolve through refactoring\, scaling\, architectural changes\, asynchronous processing\, and organizational shifts\, observability artifacts frequently remain unchanged\, creating a mismatch between what telemetry is assumed to represent and how the system actually behaves. This mismatch\, referred to as observability debt\, does not result from missing data but from telemetry whose meaning has drifted due to unmaintained assumptions\, leading to dashboards that appear healthy\, alerts that lack context\, and slower incident understanding. To make this concrete\, the talk uses a minimal personal system intentionally designed to model common production patterns. Starting from a low-debt state where telemetry reflects user impact\, the system evolves while observability remains static\, resulting in metrics that hide localized failures.
CATEGORIES:INTEGRATING OBSERVABILITY INTO DEVOPS PRACTICES
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:5cb3af996fed82a8bde99cd78a307642
URL:http://observabilitysummitna26.sched.com/event/5cb3af996fed82a8bde99cd78a307642
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T170500Z
DTEND:20260522T171500Z
SUMMARY:⚡ Lightning Talk: A Drop-in System To Accelerate Metrics Observability by 100x Using Sketch-based Approximation - Milind Srivastava\, Carnegie Mellon University
DESCRIPTION:Metrics observability workloads are growing in scale\, resulting in (a) higher cost to operate observability infrastructure\, and (b) slower query latencies. \n \n The usual approaches to deal with these are: \n - sample data \n - roll up data \n - reduce data cardinality \n - send fewer queries \n \n All of these approaches compromise the coverage of the observability infrastructure and can result in missing important anomalous behavior. \n \n Through our research\, we have developed a radically new approach to achieve large scale\, low cost\, and low latency without compromising the coverage of the observability infrastructure. \n \n Our system reduces querying cost and latency by 100x by using two key techniques: \n - streaming precomputation \n - sketch-based approximation \n \n Our system is developed as a drop-in accelerator to an existing Prometheus-Grafana stack\, without modifying Prometheus or Grafana. \n \n We will release an open-source prototype of this system in Q1 2026.
CATEGORIES:SCALABILITY CHALLENGES AND SOLUTIONS
LOCATION:Level One | Ballroom B\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:b0e7f94f0885a335060246f853333853
URL:http://observabilitysummitna26.sched.com/event/b0e7f94f0885a335060246f853333853
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T171500Z
DTEND:20260522T172500Z
SUMMARY:[Rescheduled] ⚡ Lightning Talk: GPU-Scanner: Extending CNCF Observability for Multi-GPU AI Workloads - Ritika Gupta\, Oracle
DESCRIPTION:As large language models scale across hundreds of GPUs and multi-node AI systems\, they’ve become a major operational challenge for infrastructure engineers. Traditional observability tools stop at the node level\, leaving GPU health and utilization invisible until workloads fail or budgets spike. Imagine a 25-day training job failing on day 23 because one GPU silently throttled! \n \n In this session\, we’ll explore GPU-Scanner\, an open-source observability extension for Kubernetes GPU clusters. Built to integrate with Prometheus & OpenTelemetry\, GPU-Scanner adds both active and passive GPU health checks\, capturing throughput\, TFLOPs\, memory diagnostics\, thermal consistency\, and long-run stability metrics. \n \n We’ll demo real-world failure modes\, like catching a GPU “off the bus” or detecting thermal throttling\, and show how alerts flow into your existing observability stack. Leave with a practical playbook to proactively validate GPU clusters and maximize reliability and utilization.
CATEGORIES:COMMUNITY-DRIVEN DEVELOPMENT IN OBSERVABILITY
LOCATION:Level One | Ballroom B\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:e1663fef63dcefc059b3fd5fab477a53
URL:http://observabilitysummitna26.sched.com/event/e1663fef63dcefc059b3fd5fab477a53
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T173000Z
DTEND:20260522T182500Z
SUMMARY:Lunch
DESCRIPTION:Menu:\n\nSmoked Turkey-Honey Dijon Wedge\; Smoked Turkey\, Honey-Dijon Cream Cheese\, Lettuce\, Marble Pumpernickel Focaccia (GF)\nHam & Swiss Wedge\; Smoked Ham\, Mustard Aioli\, Lettuce\, Egg Focaccia\nRoasted Veggie Wrap (vg)\nCorn Chowder Soup (v\, GF)\n\nBlueberry Cheesecake (V) and Apple Spice Cake (vg\, GF)
CATEGORIES:BREAKS
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:06e411d5cea86310eb39ae3add8bb544
URL:http://observabilitysummitna26.sched.com/event/06e411d5cea86310eb39ae3add8bb544
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T182500Z
DTEND:20260522T185000Z
SUMMARY:Beyond Dashboards: Architecting AI Agents for Autonomous Observability - Divya Mahajan\, Amazon & Achin Gupta\, Intuit
DESCRIPTION:The future of observability isn't better dashboards: it's AI agents that reason across metrics\, logs\, and traces alongside your engineering team.\n\nEngineers spend hours correlating signals across Grafana\, Kibana\, and Jaeger\, mentally stitching together what happened and why. What if an agent could do that correlation automatically?\n\nThis session presents a practical architecture for building observability agents that autonomously triage incidents across all three pillars. We'll demonstrate an agent that ingests an alert\, queries metrics\, searches logs\, examines traces\, identifies root causes\, and recommends remediation\, all while keeping humans in the loop.\n\nWe'll cover:\n\nWhy observability is ideal for agentic AI\nAgent architecture with LangGraph orchestration\nIntegration patterns: MCP\, REST APIs\, and OpenTelemetry\nTool design for metrics\, logs\, and traces\nLive demo: agent triaging a simulated incident\nProduction considerations: reliability\, cost\, guardrails\n\nAttendees leave with a working reference architecture built on CNCF ecosystem tools (Prometheus\, Jaeger\, Loki\, Grafana). All code is open source.
CATEGORIES:AI AND MCP IN OBSERVABILITY
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:4f44e5fa957d79add7cfd3554759de92
URL:http://observabilitysummitna26.sched.com/event/4f44e5fa957d79add7cfd3554759de92
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T182500Z
DTEND:20260522T185000Z
SUMMARY:eBPF Application Instrumentation for Java: Challenges\, Design\, and Real-World Examples - Endre Sara\, Causely\, Inc & Stephen Lang\, Grafana Labs
DESCRIPTION:Java is one of the most widely used languages for enterprise applications. Frameworks such as Spring Boot and Quarkus make observability straightforward when the OpenTelemetry Java agent can be injected. \n \n In many production environments\, however\, modifying application code or JVM startup parameters is not possible. In these cases\, eBPF-based instrumentation enables observability without code changes\, but applying eBPF to Java is challenging. JVM abstraction layers\, differences across JDK versions\, and the diversity of frameworks and libraries complicate generic instrumentation. The problem becomes even harder when applications rely on TLS-encrypted communication such as HTTPS\, gRPC\, databases\, and messaging systems\, where payloads are opaque. \n \n This talk explains how the OpenTelemetry eBPF Instrumentation (OBI) project addresses these challenges\, covering key design decisions\, trade-offs\, and current limitations. The discussion is grounded in real-world examples\, including Spring Boot services using HTTPS and gRPC\, and a Quarkus application with TLS-encrypted PostgreSQL and Kafka\, showing what is possible today with agentless Java observability using eBPF.
CATEGORIES:CNCF OBSERVABILITY PROJECTS
LOCATION:Level One | Ballroom B\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:8d40e2c40baa81c2b7913b7eeaf921d2
URL:http://observabilitysummitna26.sched.com/event/8d40e2c40baa81c2b7913b7eeaf921d2
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T185500Z
DTEND:20260522T192000Z
SUMMARY:Breaking Free from Vendor Lock-In: Nubank DIY Observability Success - Diego Rocha\, AWS & Otavio Valadares\, Nubank
DESCRIPTION:Nubank is the largest digital bank outside Asia\, operating in Brazil\, Mexico\, and Colombia\, and serving over 120 million customers. As a cloud-native company\, Nubank's distributed digital environment relies on more than 4\,000 microservices\, generating nearly 1 petabyte of monitoring logs daily. To better manage this volume and reduce operational costs by over 50%\, Nubank recently transitioned from an external vendor to an in-house log platform. In this talk\, we'll share the platform architecture and the challenges encountered during the migration journey.
CATEGORIES:SCALABILITY CHALLENGES AND SOLUTIONS
LOCATION:Level One | Ballroom B\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:9e32847764f6962604c6661cff54bb27
URL:http://observabilitysummitna26.sched.com/event/9e32847764f6962604c6661cff54bb27
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T185500Z
DTEND:20260522T192000Z
SUMMARY:One Size Does Not Fit All: A Polystore Architecture for Logs and Traces - Suman Karumuri\, KalDB
DESCRIPTION:Observability data isn't homogeneous. Security logs require needle-in-haystack searches with multi-year compliance retention. Kernel logs are incompressible text. Structured logs enable fast aggregations\, while semi-structured logs explode cardinality. Traces demand different access patterns entirely. Modern requirements compound this. Observability must join with other data sources. Agentic AI systems generate massive volumes of unstructured and semi-structured logs and traces. Big data platforms have emerged as popular storage alternatives. Forcing everything into one system creates impossible tradeoffs: slow queries\, runaway costs\, frustrated users. At Airbnb and Slack\, operating thousands of tenants across hundreds of clusters\, we built a polystore architecture routing workloads to specialized engines\, unified behind a single query interface. This required changes across the entire stack: instrumentation\, collection\, storage\, and query layers. This talk shares routing criteria\, backend tradeoffs\, and techniques for unified querying. Attendees will learn to optimize observability for better performance and lower costs.
CATEGORIES:SCALABILITY CHALLENGES AND SOLUTIONS
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:27ff99840f5cf43d67bbc16c268b7e19
URL:http://observabilitysummitna26.sched.com/event/27ff99840f5cf43d67bbc16c268b7e19
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T192500Z
DTEND:20260522T195000Z
SUMMARY:The Legend of Config: Breath of the Cluster - Henrik Rexed\, Dynatrace
DESCRIPTION:Configuring Ingress\, Gateway API\, or service meshes in Kubernetes can feel like exploring an open world without a map: one wrong turn\, and traffic vanishes. In this session\, we’ll explore how to detect and prevent misconfigurations using OpenTelemetry\, eBPF-based instrumentation (OBI)\, and enriched logs from service meshes and ingress controllers. Like a hero collecting tools to unlock new areas\, we’ll show how to identify relevant data sources\, parse and process their output\, and apply common correlation rules to understand the impact of configuration changes. We’ll demonstrate how these techniques can be applied across observability platforms to reduce tool sprawl and improve operational efficiency. Attendees will leave with a practical\, backend-agnostic approach to building a multi-source observability strategy for Kubernetes networking.
CATEGORIES:COMMUNITY-DRIVEN DEVELOPMENT IN OBSERVABILITY
LOCATION:Level One | Ballroom B\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:63332e16a97f4656ae2c77b2dd3acbb9
URL:http://observabilitysummitna26.sched.com/event/63332e16a97f4656ae2c77b2dd3acbb9
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T192500Z
DTEND:20260522T195000Z
SUMMARY:Implementation of Unified Observability at Scale From Scratch - Ahmed J.\, Emaar
DESCRIPTION:Unified observability has lately been regarded as the holy grail by some. One platform\, universal observability\, for everything. Usually\, this would be the default\, but when you are at a 30-year-old non-technical enterprise\, dealing with a mixture of legacy and modern systems\, it's a whole different story. Legacy decisions can leave different teams within the company on multiple observability platforms\, adding overhead\, cost\, noise\, and audit complexity. This was the case at Emaar\, a property developer based in Dubai\, until the PE team took on the exciting project of unifying all observability into one platform. This included applications\, infrastructure\, network\, and security. The complexity arises not just from the different data sources\, but also from the number and nature of the deployment sites. This included sites across 10 countries consisting of data centers\, hotels\, malls\, shops\, etc. This talk will outline the experience of implementing a unified observability platform covering thousands of network devices\, machines\, and application workloads using open-source technologies that resulted in six-figure cost savings.
CATEGORIES:END-USER CASE STUDIES
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:e1755c5a2da0ddc838e8eeb7a736bf6e
URL:http://observabilitysummitna26.sched.com/event/e1755c5a2da0ddc838e8eeb7a736bf6e
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T195500Z
DTEND:20260522T202000Z
SUMMARY:How Observability-First Development Lets You Ship Agents in Weeks\, Not Months - Anirudha Jadhav & Kevin Fallis\, AWS
DESCRIPTION:Building AI agents is easy\, but knowing why they fail is hard. Traditional APM tools were designed for request-response services\, not autonomous agents that reason\, plan\, and execute multi-step workflows. When your agent makes unexpected decisions\, standard metrics and traces don't tell you why. \n \n This session introduces Eval-Driven Development\, which focuses on building reliable agents through continuous observability and evaluation. Using OpenSearch AgentHealth\, a new open-source platform for agent observability\, we'll walk you through the full agent lifecycle of building\, observing\, improving\, and repeating. We'll share a case study comparing two production root-cause-analysis agents. One was built with observability from day one and shipped in 6 weeks\, while the other was retrofitted later and took 12 months to reach production. You'll learn how we used agentic evaluation to score agent outputs and improve accuracy over time. \n \n You'll walk away with patterns for instrumenting agents with OpenTelemetry\, techniques for evaluating full decision sequences (not just outputs)\, and a framework for shortening your development timeline by building observability in from the start.
CATEGORIES:AI AND MCP IN OBSERVABILITY
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:36cd0aaf81635f4c02370d6014e9e342
URL:http://observabilitysummitna26.sched.com/event/36cd0aaf81635f4c02370d6014e9e342
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T195500Z
DTEND:20260522T202000Z
SUMMARY:Let Them Eat Bugs: Practical Showcase of Agentic Issue Resolution - May Walter\, Hud
DESCRIPTION:What if we could move a big chunk of bug fixing and solving production issues to agentic AI? That would be so cool. In this talk we will go through the end-to-end process of setting up a background agentic workflow that detects production errors\, finds their root causes\, assesses the right solution and opens a PR - so you wake up in the morning to tasks almost fully completed for you by your loyal agent. \n \nTogether we will dive into the entire process of setting up this system that is currently running in real production environments - understanding the different tools\, the infra challenges\, the agentic accuracy spectrum\, and more…
CATEGORIES:AI AND MCP IN OBSERVABILITY
LOCATION:Level One | Ballroom B\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:6ff82e88a7061ec7196e51f3c689dd7f
URL:http://observabilitysummitna26.sched.com/event/6ff82e88a7061ec7196e51f3c689dd7f
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T202000Z
DTEND:20260522T204000Z
SUMMARY:Coffee + Networking Break
DESCRIPTION:Menu:\nAssorted Dry Snacks and Mini Candy Bars (v)
CATEGORIES:BREAKS
LOCATION:Level One | Ballroom A+B Foyer\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:5d7f08f055290806bed9ed0ddc9f913f
URL:http://observabilitysummitna26.sched.com/event/5d7f08f055290806bed9ed0ddc9f913f
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T204000Z
DTEND:20260522T210500Z
SUMMARY:Devs\, Transform (Your Data) and Roll Out!: Learning and Leveraging OTTL - Reese Lee\, New Relic
DESCRIPTION:The OpenTelemetry Collector has emerged as one of the project’s most critical pieces for ingesting and processing your app and infrastructure data\, but did you know there’s even more you can do with your data before it reaches your backend? Enter OTTL\, or OpenTelemetry Transformation Language\, a domain-specific language that can interact with and modify OTel data. Yes\, the Collector already comes with dozens of components that can handle a wide range of data processing\, BUT using OTTL in conjunction with the components enables even more powerful data manipulation. In this session\, learn about the benefits of OTTL\, when to use it\, and how to get started with OTTL. Get ready to explore:\n- What OTTL is: A breakdown of the syntax and the underlying architecture within the OTel Collector.\n- Why it’s useful: Practical strategies for cost reduction (filtering noise)\, compliance (redacting PII)\, and standardization (normalizing attributes).\n- How to use it: A live walkthrough of writing complex transformation statements for the transform and filter processors.
CATEGORIES:CNCF OBSERVABILITY PROJECTS
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:0c3e69f812e9bef40cfb4a13b3546b05
URL:http://observabilitysummitna26.sched.com/event/0c3e69f812e9bef40cfb4a13b3546b05
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T204000Z
DTEND:20260522T210500Z
SUMMARY:Inside the Telemetry Data Plane: Constraints\, Tradeoffs\, and Scale - Eduardo Silva & José Lecaros\, Chronosphere | A Palo Alto Networks Company
DESCRIPTION:Modern telemetry systems often struggle not because of missing features\, but because of hidden constraints in how data is buffered\, scheduled\, and moved through the system. This session explores the practical realities of building a telemetry data plane that must operate under extreme throughput\, tight latency budgets\, and strict resource limits. \n \n Using real-world experience from developing a high-performance open source telemetry agent\, we’ll examine how design tradeoffs around buffering\, concurrency\, and I/O shape system behavior at scale. Topics include user-space serialization strategies\, adaptive buffering models\, memory-mapped persistence\, and multithreaded I/O coordination\, along with how these choices interact with core Linux primitives such as epoll\, asynchronous I/O\, and zero-copy techniques. \n \n Rather than focusing on APIs or products\, this talk dives into the mechanics and constraints that determine whether a telemetry system remains predictable under load. The discussion is grounded in production lessons learned from operating at billions of events per minute and highlights patterns that apply broadly to collectors\, agents\, and streaming systems.
CATEGORIES:INTEGRATING OBSERVABILITY INTO DEVOPS PRACTICES
LOCATION:Level One | Ballroom B\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:f0798253f86eb2c8d4b7c1a78adf1c96
URL:http://observabilitysummitna26.sched.com/event/f0798253f86eb2c8d4b7c1a78adf1c96
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T211000Z
DTEND:20260522T211500Z
SUMMARY:Closing Remarks
DESCRIPTION:\n
CATEGORIES:KEYNOTE SESSIONS
LOCATION:Level One | Ballroom A\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:f8c2cd273121ce461fc8cf5ddd446f6f
URL:http://observabilitysummitna26.sched.com/event/f8c2cd273121ce461fc8cf5ddd446f6f
END:VEVENT
BEGIN:VEVENT
DTSTAMP:20260529T000145Z
DTSTART:20260522T211000Z
DTEND:20260522T213500Z
SUMMARY:[CANCELLATION] The Missing Layer in eBPF Observability: Storage - Kritik Sachdeva\, IBM
DESCRIPTION:Modern observability has embraced eBPF for profiling CPU usage and tracing network paths in production systems. Yet one critical layer remains largely under-instrumented: storage. Despite being a frequent source of performance issues\, storage I/O is still treated as a black box\, especially in cloud native environments. \n \n This talk will walk through the basic storage I/O path in Linux and Kubernetes\, highlight where traditional metrics fall short\, and discuss the kinds of storage latency and wait signals that eBPF can surface at runtime without requiring kernel modifications or specialized debugging setups. \n \n Using simple examples\, the session will show how hidden storage latency and queuing effects surface in real workloads\, and why these blind spots become more visible with data-intensive and AI workloads where applications or GPUs often wait on storage without clear indicators. \n \n By the end of this talk\, attendees will gain a practical understanding of where storage observability breaks down today\, what eBPF can realistically help uncover at a foundational level\, and how to reason about storage-related performance issues alongside CPU and networking metrics.
CATEGORIES:THE FUTURE OF OPEN SOURCE OBSERVABILITY
LOCATION:Level One | Ballroom B\, Minneapolis\, MN\, USA
SEQUENCE:0
UID:9abe34a749ae229292fdd4af34a45f13
URL:http://observabilitysummitna26.sched.com/event/9abe34a749ae229292fdd4af34a45f13
END:VEVENT
END:VCALENDAR