The Information Bottleneck

Ravid Shwartz-Ziv & Allen Roush
Latest episode

50 episodes

  • The Information Bottleneck

    Infrastructure for AI at Scale - With Benny Chen (Fireworks AI)

    24/06/2026 | 1 h 5 min
    We talk a lot on this show about RL, agents, and the move between pre-training and post-training, but not enough about the layer everything actually runs on. Benny Chen, co-founder of Fireworks AI, one of the largest inference platforms around, walks us through what it takes to serve models at scale: sourcing GPUs, writing the kernels, the runtime, and the routing layer that lets a customer hit one endpoint and forget the rest.
    We talk about why the real bottleneck is power, not chips, and why that favors Nvidia and Google; why MoE keeps winning even when dense models look better on paper; and why he'd rather run fungible capacity at 95% than specialized chips at 60%. We also cover quantization limits, where RL efficiency has to go next, and his case that AI is still under-hyped. Finally, we get into cross-region training, sparse autoencoders and why interpretability hasn't taken off in open source, whether open models can close the gap, and a frank read on Anthropic's go-to-market.

    Timestamps
    00:00 — Intro: the part of AI nobody talks about
    01:20 — What "infrastructure for AI" actually means: the layers, from GPUs up to routing
    02:59 — Why not just buy your own GPUs and do it yourself?
    05:17 — The scale Fireworks runs at
    06:35 — Hardware inflation, GPU costs, and the real risk hiding in commit duration
    10:14 — Nvidia vs AMD vs TPUs, and why power is the bottleneck
    11:57 — Mixing GPU types and generations; fungibility vs. specialization
    14:22 — Once you have the GPUs, what's the next layer to build?
    17:04 — Dense vs. MoE, and why the hardware picks the winner
    21:07 — Quantization: is FP4 the floor? TurboQuant and INT vs. FP
    24:28 — How tied are the algorithms to the hardware?
    25:12 — DeepSeek, DeepGEMM, and next-token prediction as reconstruction loss
    28:50 — Why RL is still wildly inefficient compared to pre-training
    30:08 — Speculative decoding, AI-generated kernels, and auto-research
    34:00 — The AGI question: why text gets automated but vision may stay expensive
    37:07 — Hype check: why Benny thinks AI is still under-hyped
    41:28 — Training vs. inference at the infrastructure level
    44:12 — Scaling across data centers: cross-region training with Cursor
    45:40 — Sparse autoencoders, interpretability, and why open source is human-constrained
    49:04 — Will open models catch up — on quality and on compute?
    51:41 — Are we plateauing? Opus 4.7 vs. 4.6 and the coming data wars
    54:41 — Physical limits, HBM, and whether chips keep getting faster
    58:17 — The belief about inference everyone gets wrong
    59:31 — Anthropic, mythos, and a frank take on go-to-market
    1:04:41 — Wrap-up

    Music:
    "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.

    About: The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.
  • The Information Bottleneck

    Broken Peer Review, AI, and Worms — with Oded Rechavi

    21/06/2026 | 1 h 18 min
    Oded Rechavi is a biologist at Tel Aviv University and the co-founder of QED, a company building AI to review scientific work. He's also spent years studying worms.
    We start with what's wrong with peer review and grant funding: why it takes years to publish, why reviewers are often your own competitors, and why the whole thing is locked to an economic model that rewards publishing more papers, not better ones. Oded explains why he doesn't call QED "peer review" at all, and what it would take to actually validate science instead of just stamping it.
    Then we get into the biology. C. elegans has exactly 959 cells, every one of them named, and a fully mapped brain. Oded's lab studies how a worm's experiences get passed to its offspring through RNA rather than DNA — meaning what happens to a worm in its lifetime can change its descendants. We also talk about using ancient DNA to reassemble the Dead Sea Scrolls, what AI can and can't do for biology, and why he wants to build an "Ironman suit" for researchers rather than replace them.
    Timestamps
    00:00 Intro
    01:35 Why scientific publishing is broken
    04:02 Years to publish, and what it costs science
    07:20 Bad reviewers, conflicts of interest, and the money
    10:47 Why preprints don't fix it
    15:37 How AI conferences handle review
    22:07 Conferences vs. journals — does slow review help?
    25:22 Building QED: review, not peer review
    30:02 Tracking a paper from idea to submission
    33:11 What writing a grant actually involves
    35:00 The ERC reviewer crisis
    37:06 Tailoring feedback to your field
    41:48 Switching to biology
    44:30 Every cell has a name: inside C. elegans
    46:28 Inheritance without DNA
    48:16 What the worm "thinks" changes its offspring
    51:58 Reassembling the Dead Sea Scrolls with ancient DNA
    56:07 Psychedelics and worms
    58:36 Can AI run the research itself?
    1:04:49 Automation vs. validation
    1:07:12 The origin of life
    1:08:49 Why people reject AI-written work
    1:16:18 Will humans still have a role?
    1:17:39 Wrap-up
  • The Information Bottleneck

    Will AI Take Our Jobs? With Alex Imas (Google/University of Chicago)

    16/06/2026 | 1 h 29 min
    Will AI take our jobs? We put the question to Alex Imas, the new Director of AGI Economics at Google DeepMind and a professor at Chicago Booth, whose entire job now is studying how frontier AI reshapes the economy. His short answer: probably some of them, but the popular story is mostly wrong about which jobs and how fast.
    Alex makes the case that a job is a bundle of tasks, not a single thing AI either does or doesn't do, and that the number people should actually care about is how much consumer demand responds to falling prices. Get that wrong and you predict mass layoffs. Get it right and you sometimes predict more hiring. We get into why the automation panic is two centuries old, why he thinks blue-collar work is in more danger than white-collar, and why the people already winning are the ones adopting AI fastest.
    We also cover the AGI versus ASI distinction and why it changes everything for the economy, what happens when there's no moat and open models stay six to eight months behind, the three-tier pricing future he sees coming after the 2026 compute crunch, and what any of this means if you're deciding whether to send your kids to college.

    The episode was recorded before Alex joined Google.
    Timestamps
    00:00 Meeting Alex Imas
    00:44 Will AI take our jobs?
    03:35 Is this an AI question or an economics question?
    06:18 The economy is already behind the AI we have
    07:43 Why AI adoption is K-shaped
    12:51 Was Andrew Yang right?
    13:45 The automation panic is 200 years old
    16:46 Dario's six-month claim, and why we don't see it yet
    17:22 A job is not a task
    22:38 The three numbers that actually predict the labor market
    22:42 The chess engine analogy and the centaur phase
    25:45 Recursive self-improvement and the hamburger problem
    30:06 Should AI labs be the ones answering alignment questions?
    31:17 The "invisible hand wave" and why nobody wants fully autonomous AI
    33:27 AGI vs ASI, and why the difference is everything
    35:28 Commodities vs relational goods
    41:14 Star Trek, replicators, and predicting with sci-fi
    45:20 Inequality and the Upper West Side VCs
    46:21 Your money manager was automated in the 1960s
    50:47 Are OpenAI and Anthropic overvalued? The moat problem
    54:29 What has to be true for the losses to make sense
    55:43 Cognitive atrophy and monopoly fears
    57:00 The 2026 compute crunch and the three-tier pricing future
    1:01:52 The Apple vs Android analogy
    1:03:54 A rich-country perspective
    1:04:16 Protecting the skills that actually matter
    1:07:02 Will not using AI become a status symbol?
    1:08:53 Does capitalism even survive?
    1:13:44 Redistribution becomes the political battleground
    1:18:16 Blue collar vs white collar: who's really at risk
    1:21:18 Advice for parents in an AI world
    1:22:43 Saving for retirement when the Valley says don't
    1:25:06 Will non-elite colleges survive?
  • The Information Bottleneck

    Why AI Benchmarks Are Lying to You - with Wenhu Chen (Meta/University of Waterloo)

    13/06/2026 | 1 h 19 min
    In this episode, we sit down with Wenhu Chen, research scientist at Meta MSL, assistant professor at the University of Waterloo, and the person behind MMLU-Pro and MMMU. If you've read a frontier model release in the last two years, you've seen his benchmarks. That makes him one of the best people to answer the question everyone dances around: when a model jumps from 40% to 90% on your benchmark, how much of that is real? We dig into why benchmarks have become the loss function of the entire field: design a bad one, and thousands of brilliant researchers will spend months hill-climbing in the wrong direction. Wenhu is surprisingly candid about the limits of his own creations: contamination is everywhere, saturation turns frontier benchmarks into unit tests, and popular alternatives, such as LM Arena, mostly measure tone and length rather than capability. His answer is to evaluate models where they've never been: private codebases, hospital data, and the messy, live internet.
    We also talk about ClawBench, his new benchmark that deploys agents to over 140 real production websites to do things people actually want done, such as ordering food, booking tickets, and applying for jobs. The best model in the world completes about a third of these tasks. We unpack why: bot detection, models that refuse to click "pay," agents that give up the moment an environment doesn't match their training, and harnesses that can swing results by 20% without changing the model at all.
    Along the way, we cover the overlooked science of evaluating pre-training, data flywheels, and synthetic environments for agent training, and whether RL teaches models to reason or just surfaces what's already there. We close with Wenhu's predictions: exploration and adaptability will improve rapidly, but security will become the field's hardest problem as agents gain real permissions in the real world.

    Timestamps
    00:00 – Intro
    00:55 – What good evaluation means, and how it's changed since the early GPT days
    03:35 – Benchmarks as the field's loss function
    05:50 – Contamination: the problem nobody fully solves
    08:08 – MMLU-Pro scores: real progress or training on the test set?
    11:05 – Can you measure creativity?
    12:34 – Why human judges and arenas are unreliable — and what to use instead
    19:22 – What a good benchmark actually looks like
    22:34 – Chain of thought: signal or scratchpad?
    26:01 – Auto-research and hill-climbing agents
    28:52 – Harnesses: 20% swings without touching the model
    32:28 – Safety, model release, and an "FDA for models"
    36:53 – The overlooked science of pre-training evaluation
    43:49 – Designing pre-training benchmarks when one run costs a billion dollars
    49:45 – ClawBench: agents on 140+ live websites, and why the best model gets 33%
    54:42 – How MMLU-Pro and MMMU-Pro were born from public complaints
    59:16 – Pixel agents vs. APIs: will MCP kill computer use?
    1:02:11 – Training agents: data flywheels and synthetic environments
    1:05:43 – SFT vs. RL, and does RL teach reasoning or reveal it?
    1:09:21 – What gets solved next year — and what doesn't
    1:14:32 – Undervalued ideas, and what's next for ClawBench

About The Information Bottleneck
Two AI researchers, Ravid Shwartz-Ziv and Allen Roush, discuss the latest trends, news, and research in generative AI, LLMs, GPUs, and cloud systems.