The Information Bottleneck | Listen to the podcast online for free

58 episodes

The Model Cheated — with Florian Brand (Prime Intellect)
27/07/2026 | 57 min
Florian Brand builds evals at Prime Intellect. The premise of the conversation is that writing a benchmark is the easy part now. Keeping the model from cheating it is the job, and it takes longer than the benchmark itself.
We get into why he thinks you can't evaluate a model apart from the CLI it runs in, what happens to statistics when a single run costs five figures, and whether the feeling that a model just works can ever become a number.
He also has a few stories about agents finding their way around the scoring that are worth hearing cold.
Timeline
00:13 Intro
01:00 What evals are for
04:05 Agentic benchmarks
07:10 Kimi K2 and model diversity
08:23 Long-horizon coding tasks
10:29 Building a benchmark
12:15 MirrorCode
14:27 Rubrics and LLM judges
16:30 The cost of expert labelers
17:49 Long runs and variance
19:44 Evaluating the harness
24:29 Chinese labs building CLIs
30:00 More reward hacking
37:45 Tau-bench and economic tasks
39:43 Benchmaxxing and GLM 5.2
45:15 Statistics and cost
47:56 Frontier convergence
52:04 Misuse in open and closed models
55:35 Self-improvement

Music
"Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
About
The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.
Pierre-Carl Langlais on Building Models from Data You Can Account For
23/07/2026 | 1 h 5 min
Most labs build language models by scraping the web and filtering afterward. Pierre-Carl Langlais runs it the other way around. At Pleias, the French-German lab he co-founded, the models are built from data he can actually account for, which in practice means open and public-domain sources plus a lot of synthetic data the lab generates itself. It sounds like a self-imposed handicap. It mostly isn't. One of their models is a 600-million-parameter system that runs live inside the Paris subway's monitoring pipeline.
We cover the SYNTH pretraining dataset and why he thinks "ethical data" has to mean more than copyright-free. He explains why barely 2% of their Common Corpus appears in typical web crawls, and why that gap is really a preservation problem. From there, he gets blunt about benchmark maxing and whether GLM really earns its Opus-class reputation. He also argues that the quiet move by closed labs to hide reasoning traces is mostly about claiming ownership of model outputs. He's skeptical of sovereign AI, and not shy about how Mistral drifted from frontier research toward French corporate consulting. We finish on NVIDIA's persona datasets and the odd idea of training on the conditions that produced a text rather than the text itself.
Timeline
(00:02) Welcome and introductions
(00:49) Why synthetic data matters, and the SYNTH set
(04:15) Three reasons to control your training data
(07:18) What "ethical data" actually means
(11:08) How Common Corpus got built, from Wikipedia to PDFs
(16:35) Agentic harnesses and synthetic data
(20:03) Evaluating data when you train on reasoning traces
(25:27) General versus specialized pretraining
(27:08) Benchmark maxing and the GLM question
(31:51) Getting diversity in, and the NVIDIA personas
(35:02) Hidden reasoning traces and the fight over model IP
(38:17) Mid-training and the "It's All Training" thesis
(41:47) Can small models actually compete
(45:01) Cybersecurity and Europe's strategic gap
(47:08) Do you need a big model to orchestrate the small ones
(52:08) Sovereign AI and the limits of national champions
(56:42) Scaling laws when you control the data
(01:00:41) The NVIDIA persona datasets
(01:04:52) What you actually do with synthetic personas
(01:08:22) Closing thoughts
Dhruv Batra: The Browser Is a Robotics Problem - From Embodied AI at Meta to Web Agents at Yutori
20/07/2026 | 1 h 7 min
Dhruv Batra spent years leading Embodied AI at Meta, training virtual robots to navigate photorealistic 3D scans of real buildings with pure reinforcement learning. Then he left to co-found Yutori and build agents for a very different environment: the web browser.
In this episode, Dhruv explains why he sees these as the same problem. Web agents, in his framing, are robots that act in a browser (pixels in, actions out), and the web turns out to be just as messy an environment as the physical world.
Along the way, we cover his definition of intelligence as "navigation in idea space," why robotics is lagging LLMs, the sim-to-real gap and why you can't fake friction coefficients, the teleoperation counterexample to the "it's a sensor problem" argument, and his provocative claim that under the current paradigm, we solved machine learning and didn't even realize it. He also makes the case for why the scaling hypothesis isn't falsifiable, why JEPA-style arguments deserve to be grappled with, how Yutori trains its Navigator models with RL on live websites, and what happens to the ad-supported web when agents, not eyeballs, do the browsing.
Timeline
00:01 — Intro
00:54 — What embodied AI actually means
06:47 — Intelligence as navigation in idea space
13:26 — Habitat: training robots with pure RL, no maps
20:04 — Why robotics is behind LLMs
28:24 — Sim-to-real: what you can and can't fake
33:34 — "We solved ML and nobody noticed"
37:12 — Leaving Meta, founding Yutori
43:21 — Web agents: screenshots in, actions out
48:15 — Why the web won't rebuild itself for agents
53:32 — Training Navigator: RL on live websites
1:01:04 — Who pays for the web when agents browse?
1:09:17 — What Yutori means, closing thoughts
How to Turn Research Into Billion-Dollar Companies, with Ion Stoica
16/07/2026 | 49 min
Ion Stoica has done what almost no academic ever does: repeatedly turned university research into billion-dollar companies. He co-founded Databricks (now valued at over $100 billion), Anyscale, Arena AI, and Conviva, while his Berkeley lab produced the open source projects the entire AI industry runs on: Ray, vLLM, and SGLang.
In this episode, we ask him how it's actually done. His answer is surprisingly unromantic: solve a problem people already care about, build an artifact good enough that they adopt it, and pay attention to the moment users start asking "who maintains this after the students graduate?" That's when a project becomes a company. He's also insistent that the credit belongs to his students.
From there, the conversation goes deep into what he's watching now: why the AI stack has become an order of magnitude more complex than the Hadoop/Spark era, why maximizing GPU utilization is "the name of the game" for any enterprise, and why coding agents will struggle with distributed systems long after they've mastered web apps. He shares a memorable reward-hacking story — a load balancer that maximized throughput by dropping requests — explains why the gap between open and closed models sits at about six months, and closes with his case for regulating AI by outcomes, not capabilities.
Timeline
00:00 — Introduction: welcoming Ion Stoica
01:21 — The playbook: how research projects become companies
05:22 — Will vLLM and SGLang stay open source?
07:47 — The real bottleneck in the AI stack: complexity, not just hardware
14:31 — Should algorithms follow infrastructure, or the other way around?
16:13 — Can AI coding tools write distributed systems and GPU kernels?
21:09 — Verifiers, harnesses, and the limits of outsourcing understanding
25:41 — Reward hacking: the load balancer that dropped requests
25:58 — How should enterprises consume GPUs? Utilization as the name of the game
30:23 — GPU scarcity: will the compute crunch ever end?
35:27 — Hyper-optimization and the risk of locking in today's architectures
37:17 — Open vs. closed models: why every company wants to own the stack
40:35 — The six-month gap, and the rising cost of training frontier models
43:58 — Kimi, Qwen, and who's incentivized to keep open models alive
45:39 — Regulation: outcomes, not capabilities
47:41 — Self-regulation, concentration of power, and auditing open models
48:32 — Wrap-up
Kaggle Grandmasters, Agent Skills, and Why Everyone Is Overfitting with Jean-Francois Puget (NVIDIA)
13/07/2026 | 59 min
Jean-Francois Puget is a Director and Distinguished Engineer at NVIDIA, where he leads the Kaggle Grandmasters team, and he's ranked third on Kaggle's all-time list. We caught him on the day NVIDIA announced Nemotron Ultra and its new agent skills repo. We talk about what skills actually are, why they beat MCP tools on context cost, and how NVIDIA built an evaluation pipeline to separate skills that help from skills that don't.
From there we talk about the thing JFP cares about most: evaluation. He explains why most LLM benchmarks reward overfitting, how his team discovered O3 could pick the right files to fix SWE-bench issues without reading them, and why the only benchmarks he trusts are the ones where you commit before you see the score, which is exactly how Kaggle works. He predicts a "bloodbath" for the wave of competitors letting coding agents chase leaderboard scores with no notion of validation.
We also get into what coding agents are actually good for ("a mix of a genius and a dumb person"), the multi-agent system at NVIDIA that built a working PyTorch clone that runs 10x slower than the real thing, his unfiltered take on frontier lab PR and the Mythos release, whether AI is a bubble, and the story of how his team won ARC-AGI with a 4-billion-parameter model at 20 cents a task, including jumping from third to first in the final hours of a seven-month competition.

Timeline
00:00 — Intro
01:05 — NVIDIA's announcements: Nemotron Ultra and the agent skills repo
07:21 — Skills vs MCP tools, and progressive disclosure
10:24 — Agents that write their own skills: a new form of learning
13:33 — When overfitting is fine (and when it isn't)
15:47 — Why most LLM benchmarks reward overfitting
17:06 — The SWE-bench contamination story: O3 picks files without reading them
19:45 — How LLMs changed Kaggle, and the coming "bloodbath"
25:40 — What makes a good data scientist: evaluation and one-bit experiments
28:56 — Running Codex at scale: the top token consumers at NVIDIA
29:37 — Did coding agents kill AutoML?
30:16 — Genius and dumb at once: the limits of coding agents
35:21 — Humans in the loop, sandboxing, and the teenage hacker who never wrote code
37:42 — Mythos, frontier lab PR, and open source
40:08 — Why NVIDIA builds open models, and where it's already frontier
43:48 — World models, robots, and the coffee test
49:20 — Why agents still can't play Dota
50:24 — Is AI a bubble?
53:14 — Winning ARC-AGI with a 4B model at 20 cents a task
57:39 — Kaggle is a legal drug

About The Information Bottleneck

Two AI researchers, Ravid Shwartz-Ziv and Allen Roush, discuss the latest trends, news, and research in generative AI, LLMs, GPUs, and cloud systems.

Science, Technology
