Dev and Doc: AI For Healthcare Podcast

Dev and Doc

Available episodes

5 of 30
  • Everything you need to know about LLM benchmarks- Turing Test, OpenAI's Healthbench, ARC prize, LM arena
    Ever since there has been AI, there have been benchmarks. From the Turing test, to society-changing benchmarks like MNIST and ImageNet, to modern problems like the ARC prize, benchmarks have served a vital purpose: measuring the performance of AI models. But something has shifted in the LLM era. Have benchmarks lost their utility, becoming mere advertisements for big tech? Even seemingly more sophisticated benchmarks like LM Arena can be gamed by tech giants. We also deep dive into healthcare benchmarks like OpenAI's Healthbench (deeply problematic) and Microsoft's AI-DXO orchestrator agent for diagnosis. Where is this all going? How do we make the perfect benchmark? Or is the real work to be done afterwards, in the real world?

    👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :)

    Timestamps
    00:00 Intro - The OG benchmarks - Turing test, MNIST, ImageNet
    06:40 Are large language model benchmarks similar to humans taking tests?
    10:05 Are we testing model capability or production readiness?
    12:00 LLM era - data contamination
    15:30 LM Arena - The leaderboard illusion paper - how big tech games benchmarks
    28:35 Goodhart's law - When a measure becomes a target, it ceases to be a good measure
    32:05 Some good benchmarks - games - Pokemon, ARC prize, Minecraft
    34:35 Medical benchmarks - OpenAI's Healthbench has some big problems
    46:50 Microsoft AI-DXO orchestrator for case reports

    Connect with Us
    Your Hosts:
    👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - LinkedIn
    🤖 Dev - Zeljko Kraljevic - Twitter

    Follow & Subscribe:
    YT: https://youtube.com/@DevAndDoc
    Spotify: Follow us on Spotify
    Apple Podcasts: Listen on Apple Podcasts
    Substack: https://aiforhealthcare.substack.com/

    For enquiries: 📧 [email protected]

    Credits
    🎞️ Editor: Dragan Kraljević - Instagram
    🎨 Brand & Art: Ana Grigorovici - Behance
    --------  
    55:19
  • #28 AI agents explained - Manus AI, computer control, Agentic workflows (healthcare)
    AI agents are here, but how did we get here in the first place? How do we build and leverage AI agents for high-stakes domains like healthcare? In this episode of Dev and Doc, we go deep into the forest that is AI agents and computer control, starting from the "caveman" era of LLMs discovering tools and moving on to cultivating intelligent models and agentic workflows. We dissect everyday agents like MANUS AI, and deep dive into how, where and when AI agents should be used. Are these agents hype or hope? Is this actually the second DeepSeek moment?

    👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :)

    Episode Timestamps:
    00:00 Highlight
    3:13 Start / intro
    5:20 LLM's caveman era - tool usage
    6:46 Agents have autonomy and interact with environment
    11:15 Workflows and agentic flows
    15:30 When should you be using an agent?
    24:27 Vibe coding is like driving a car
    29:07 Demo - MANUS gathering financial trends, computer control
    35:55 Demo - MANUS AI website creation for Autism Assessment
    49:05 Computer control factions - Freedom vs Process automation
    55:00 Autism website testing
    59:13 Summary + end

    Hosts:
    👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - https://www.linkedin.com/in/dr-joshua-auyeung/
    🤖 Dev - Zeljko Kraljevic - https://twitter.com/zeljkokr

    Find us on:
    YT - https://youtube.com/@DevAndDoc
    Spotify - https://podcasters.spotify.com/pod/show/devanddoc
    Apple - https://podcasts.apple.com/gb/podcast/dev-and-doc-ai-for-healthcare-podcast/id1751495120
    Substack - https://aiforhealthcare.substack.com/

    For enquiries: 📧 [email protected]

    🎞️ Editor - Dragan Kraljević https://www.instagram.com/dragan_kraljevic/
    🎨 Brand design and art direction - Ana Grigorovici https://www.behance.net/anagrigorovici027d
    --------  
    1:00:48
  • #27 Exploring Claude Sonnet 3.7 for healthcare
    Can Claude perform a range of complex clinical tasks? Dev and Doc are here to investigate.

    Claude Sonnet 3.7 was released less than 48 hours ago. The model is highly intelligent and one of the best we have seen in recent memory. Definitely passes the vibe check.

    We give some amazing examples of coding with Claude using few-shot prompts, cover technical and clinical evaluations, and share our first thoughts. We even tested Claude on taking a patient history!

    NB - PLEASE don't do this at home. This is obviously a demo, and we do not in any way condone or recommend using an LLM as your doctor or healthcare provider; we are just demonstrating what the future could be. If you are sick, please seek a medical professional.

    👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :)

    TIMESTAMPS
    00:00 Start + highlights
    01:54 Introduction
    08:54 Benchmarks, state of the art
    14:44 Guardrails, refusals, AI safety and catastrophic risks
    22:36 Show and tell - great for coding and making video games!
    26:54 Example hospital runner
    30:17 Medical use cases - clinical coding, biomedical entity extraction
    37:04 Only medical example in Claude model card - still hallucinating citations
    38:37 Making an anatomy app
    40:10 Forecasting clinical diagnoses
    43:36 Taking a medical history from a patient
    53:33 Wrap up

    👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - linkedin.com/in/dr-joshua-auyeung
    🤖 Dev - Zeljko Kraljevic - twitter.com/zeljkokr

    YT: youtube.com/@DevAndDoc
    Spotify: podcasters.spotify.com/pod/show/devanddoc
    Apple: podcasts.apple.com/gb/podcast/dev-and-doc-ai-for-healthcare-podcast/id1751495120
    Substack: aiforhealthcare.substack.com

    For enquiries - 📧 [email protected]

    🎞️ Editor - Dragan Kraljević instagram.com/dragan_kraljevic
    🎨 Brand design - Ana Grigorovici behance.net/anagrigorovici027d
    --------  
    58:03
  • #26 Is it still worth doing a PhD in 2025? (Computer Science / Machine Learning)
    Is it still worth doing a PhD in 2025? Is the academic system broken in this publish-or-perish landscape? When is a PhD not worth pursuing?

    About this Episode
    In this Dev and Doc episode, Zeljko (now associate professor!) and Josh (doctor, PhD dropout) talk about the good and the bad of PhD life. They provide insight into the academic world with a focus on computer science and machine learning.

    👋 Connect With Us!
    Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :)

    🎙️ Hosts
    👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - LinkedIn
    🤖 Dev - Zeljko Kraljevic - Twitter

    ⏳ Timestamps
    00:00 - Start and highlight
    01:42 - Intro
    03:11 - What made you pursue a PhD in the first place
    05:05 - Industry or PhD first
    10:00 - Positives - Moonshots
    17:03 - Positives - Access to world experts and collaboration
    20:55 - Positives - Open source and open science
    24:49 - Positives - A good environment enables a smooth PhD
    27:04 - Negatives - You are a one-man show
    31:33 - Negatives - Publish or Perish
    45:44 - Bring your research closer to the audience through blogs and other media, journals are legacy media
    51:20 - Verdict - Is a PhD still worth it in 2025?

    📢 Follow Us
    LinkedIn Newsletter
    YouTube
    Spotify
    Apple Podcasts
    Substack

    📧 Contact Us
    For enquiries - [email protected]

    🎞️ Video Production
    🎬 Editor - Dragan Kraljević - Instagram
    🎨 Brand Design & Art Direction - Ana Grigorovici - Behance
    --------  
    56:41
  • #25 Testing Deepseek R1 on Complex Medical Tasks. Here's what we found. (GRPO explainer)
    Dev and Doc put DeepSeek R1 to the test in a technical and clinical deep dive.

    👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :)

    👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - https://www.linkedin.com/in/dr-joshua-au-yeung/
    🤖 Dev - Zeljko Kraljevic - https://twitter.com/zeljkokr

    TIMESTAMPS
    00:00 Highlights
    04:36 Intro
    08:29 Response from OpenAI, Anthropic - model training costs, tightening restrictions on China, pricing wars
    13:13 What an open-source DeepSeek means for the world
    15:38 Sam Altman and Dario Amodei feeling the pressure
    23:10 TECHNICAL deep dive - RLHF, PPO, DPO
    37:08 GRPO, R1's secret sauce
    45:02 The aha moment, learning like a human?
    50:25 DeepSeek R1 training and controversy
    59:08 DeepSeek healthcare evaluation - Ethnic Bias
    1:06:17 The diagnostic acid test (fail)
    1:12:46 Coding clinical data / Medical billing (shout out SNOMED)

    LinkedIn Newsletter - https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7216474068085026817
    YT - https://youtube.com/@DevAndDoc
    Spotify - https://podcasters.spotify.com/pod/show/devanddoc
    Apple - https://podcasts.apple.com/gb/podcast/dev-and-doc-ai-for-healthcare-podcast/id1751495120
    Substack - https://aiforhealthcare.substack.com/

    For enquiries - 📧 [email protected]

    🎞️ Editor - Dragan Kraljević https://www.instagram.com/dragan_kraljevic/
    🎨 Brand design and art direction - Ana Grigorovici https://www.behance.net/anagrigorovici027d
    --------  
    1:20:45

About Dev and Doc: AI For Healthcare Podcast

Bringing doctors and developers together to unlock the potential of AI in healthcare. Together, we can build models that matter. 🤖👨🏻‍⚕️

Hello! We are Dev & Doc, Zeljko and Josh :) Josh is a Neurologist, AI Researcher and Clinical AI Lead. Zeljko is an AI engineer, CTO and associate professor (UCL).

Substack - https://aiforhealthcare.substack.com/
YT - https://youtube.com/@DevAndDoc