
AIandBlockchain

Latest episode

Available episodes

5 of 210
  • Urgent!! Claude 4.5: The Truth About 30 Hours and Code
    30 hours of nonstop work without losing focus. Leading OSWorld with 61.4%. SWE-bench Verified — up to 82% in advanced setups. And all that at the exact same price as Sonnet 4. Bold claims? In this episode, we cut through the hype and break down what’s really revolutionary about agent AI. 🧠

    You’ll learn why long-form coherence changes the game: projects that once took weeks can now shrink into days. We explain how Claude 4.5 maintains state over 30+ hours of multi-step tasks, and what that means for developers, research teams, and production pipelines.

    We’re speaking in metrics. SWE-bench Verified: 77.2% with a simple scaffold (bash + editor), up to 82.0% with parallel runs and ranking. OSWorld: a leap from ~42% to 61.4% in just 4 months — a real ability to use a computer, not just chat. This isn’t “hello world”; it’s fixing bugs in live repositories and navigating complex interfaces.

    Real-world data too. One early customer reported that switching from Sonnet 4 to 4.5 in an internal coding benchmark cut error rates from 9% all the way to 0%. Yes, the benchmark was tailored to their workflow, but the signal of a qualitative leap in reliability is hard to ignore.

    Agents are growing up. Example: Devin (Cognition’s coding agent) saw +18% in planning and +12% in end-to-end performance with Sonnet 4.5. Better planning, stronger strategy adherence, less drift — exactly what you need for autonomous pipelines, CI/CD, and RPA. 🎯

    The tooling is ready: checkpoints in Claude Code, context editing in the API, and a dedicated memory tool to move state outside the context window. Plus an Agent SDK — the very same infrastructure powering their frontier products. For web and mobile users: built-in code execution and file creation — spreadsheets, slide decks, docs — right from chat, no manual copy-paste.

    Domain expertise is leveling up too:
    • Law: handling briefing cycles, drafting judicial opinions, summary judgment analysis.
    • Finance: investment-grade insights, risk modeling, structured product evaluation, portfolio screening — all with less human review.
    • Security: −44% in vulnerability report processing time, +25% accuracy.

    Safety wasn’t skipped. Released under ASL-3, with improvements against prompt injection, reduced sycophancy, and fewer “confident hallucinations.” Sensitive classifiers (e.g., CBRN) now generate 10x fewer false positives than before, and 2x fewer than Opus 4 — safer and more usable.

    And the price? Still $3 input / $15 output per 1M tokens. Same cost, much more power. For teams in the US, Europe, and India, the ROI shift is big.

    Looking ahead: the Imagine with Claude experiment — real-time functional software generation on the fly. No pre-written logic, no predetermined functions. Just describe what you need, and the model builds it instantly. 🛠️

    If you’re building agent workflows, DevOps bots, auto-code-review, or legal/fintech pipelines, this episode gives you the map, the benchmarks, and the practical context. Want your use case covered in the next episode? Drop a comment. Don’t forget to subscribe, leave a ★ rating, and share this episode with a colleague — that’s how you help us bring you more applied deep dives.

    Next episode teaser: real case study — “Building a 30-hour Agent: Memory, Checkpoints, OSWorld Tools, and Token Budgeting.”

    Key Takeaways:
    • 30+ hours of coherence: weeks-long projects compressed into days.
    • SWE-bench Verified: 77.2% (baseline) → 82.0% (parallel + ranking).
    • OSWorld 61.4%: leadership in “computer-using ability.”
    • Developer infrastructure: checkpoints, memory tool, API context editing, Agent SDK.
    • Safety: ASL-3, fewer false positives, stronger resilience against prompt injection.

    SEO Tags:
    • Niche: #SWEbenchVerified, #OSWorld, #AgentSDK, #ImagineWithClaude
    • Popular: #artificialintelligence, #machinelearning, #programming, #AI
    • Long-tail: #autonomous_agents_for_development, #best_AI_for_coding, #30_hour_long_context, #ASL3_safety
    • Trending: #Claude45, #DevinAI

    Read more: https://www.anthropic.com/news/claude-sonnet-4-5
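    The checkpoint-and-memory pattern behind long-running agents can be sketched in a few lines. This is a hypothetical illustration, not Anthropic’s actual memory tool or Agent SDK API: it only shows the core idea of persisting agent state outside the context window so a multi-hour task can survive a restart. All names (`run_agent`, `do_step`, the JSON layout) are made up for the sketch.

```python
import json
import os

def save_checkpoint(path, state):
    # Persist agent state (plan and completed steps) outside the context window.
    with open(path, "w") as f:
        json.dump(state, f)

def load_checkpoint(path):
    # Resume from disk if a checkpoint exists; otherwise start fresh.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"plan": [], "done": []}

def run_agent(plan, path, do_step=lambda step: None):
    # Toy agent loop: `do_step` stands in for a model call per plan item.
    # Checkpointing after every step means a crash loses at most one step.
    state = load_checkpoint(path)
    state["plan"] = plan
    for step in plan:
        if step in state["done"]:
            continue  # finished before a restart; skip the work
        do_step(step)
        state["done"].append(step)
        save_checkpoint(path, state)
    return state
```

    Calling `run_agent` again with the same path skips finished steps, which is the essence of 30-hour coherence: state lives on disk, not in the prompt.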
    --------  
    15:57
  • OpenAI. AI vs Experts: The Truth Behind the GDP Benchmark
    🤖📉 We all feel it: AI is transforming office work. But the usual indicators — hiring stats, GDP growth, tech adoption — always lag behind. They tell us what already happened, not what’s happening right now. So how do we predict how deeply AI will reshape the job market before it happens?

    In this episode, we break down one of the most ambitious and under-the-radar studies of the year — the GDP Benchmark: a new way to measure how ready AI is to perform real professional work. And no, this isn’t just another model benchmark.

    🔍 The researchers created actual job tasks, not abstract multiple-choice quizzes — tasks drawn from 44 occupations across 9 core sectors that together represent most of the U.S. economy. Financial reports, C-suite presentations, CAD designs — all completed by top AI models and then blind-reviewed by real industry professionals, each with an average of 14 years of experience.

    Here’s what you’ll learn in this episode:
    • What “long-horizon tasks” are and why they matter more than simple knowledge tests.
    • How AI handles complex, multi-step jobs that demand attention to detail.
    • Why success isn’t just about accuracy, but also about polish, structure, and aesthetics.
    • Which model leads the race — GPT-5 or Claude Opus?
    • What’s still holding AI back (spoiler: 3% of failures are catastrophic).
    • Why human oversight remains absolutely non-negotiable.
    • How better instructions and prompt scaffolding can dramatically boost AI performance — no hardware upgrades needed.

    💡 Most importantly: the GDP Benchmark is the first serious attempt to build a leading economic indicator of AI’s ability to do valuable, real-world work. It offers business leaders, developers, and policymakers a way to look forward, not just in the rearview mirror.

    🎯 This episode is for:
    • Executives wondering where and when to deploy AI in workflows.
    • Knowledge workers questioning whether AI will replace or assist them.
    • Researchers and HR leaders looking to measure AI’s real impact on productivity.

    🤔 And here’s the question to leave you with: if AI can create the report, can it also handle the meeting about that report? GPT may generate slides, but can it lead a strategy session, build trust, or read a room? That’s the next frontier in measuring and developing AI — the messy, human side of work.

    🔗 Share this episode, drop your thoughts in the comments, and don’t forget to subscribe — next time, we’ll explore real-world tactics to make AI more reliable in business-critical tasks.

    Key Takeaways:
    • The GDP Benchmark measures AI’s ability to perform real, complex digital work — not just quiz answers.
    • Top models already match or exceed expert-level output in nearly 50% of cases.
    • Most failures come from missed details or incomplete execution — not lack of intelligence.
    • Better prompting and internal review workflows can significantly boost quality.
    • Human-in-the-loop remains essential for trust, safety, and performance.

    SEO Tags:
    • Niche: #AIinBusiness, #GDPBenchmark, #FutureOfWork, #AIvsHuman
    • Popular: #artificialintelligence, #technology, #automation, #business, #productivity
    • Long-tail: #evaluatingAIwork, #AIimpactoneconomy, #benchmarkingAImodels
    • Trending: #GPT5, #ClaudeOpus, #AIonTheEdge, #ExpertvsAI
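    The “nearly 50%” headline is an aggregate over blind verdicts. A toy sketch of how such a win rate could be computed — the tie-counts-as-half convention is my assumption for illustration, not the study’s exact scoring rule:

```python
def win_rate(outcomes):
    # outcomes: one verdict per task from a blind grader comparing the AI
    # deliverable against a human expert's work on the same brief.
    # Counting a tie as half a win is a common convention (an assumption
    # here, not necessarily the benchmark's rule).
    score = {"win": 1.0, "tie": 0.5, "loss": 0.0}
    return sum(score[o] for o in outcomes) / len(outcomes)
```

    A model whose deliverables win or tie about half the blind comparisons lands near the 50% mark the episode discusses.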
    --------  
    14:08
  • Google. The Future of Robots: Thinking, Learning, and Reasoning
    Imagine a robot that doesn’t just follow your commands but actually thinks, analyzes the situation, and corrects its own mistakes. Sounds like science fiction? In this episode, we break down the revolution in general-purpose robotics powered by Gemini Robotics 1.5 — GR 1.5 and GRE 1.5.

    🔹 What does this mean in practice?
    • Robots now think in human language — running an inner dialogue, writing down steps, and checking progress. This makes their actions transparent and predictable for people.
    • They can learn skills across different robot bodies — and then perform tasks on new machines without retraining. One robot learns, and all of them get smarter.
    • With the GRE 1.5 “brain,” they can plan complex, real-world processes — from cooking risotto by recipe to sorting trash according to local rules — with far fewer mistakes.

    But that’s just the beginning. We also explore how this new architecture:
    • solves the data bottleneck with motion transfer,
    • introduces multi-layered safety (risk recognition and automated stress tests),
    • and opens the door to using human and synthetic video for scalable training —
    and why trust and interpretability are becoming critical in AI robotics.

    This episode shows why GR 1.5 and GRE 1.5 aren’t just an evolution but a foundational shift. Robots are moving from being mere “tools” to becoming partners that can understand, reason, and adapt.

    ❓ Now, here’s a question for you: what boring, repetitive, or overly complex task would you be most excited to hand off to a robot like this? Think about it — and share your thoughts in the comments!

    👉 Don’t forget to subscribe so you won’t miss future episodes. We’ve got even more insights on how cutting-edge technology is reshaping our lives.

    Key Takeaways:
    • GR 1.5 thinks in human language and self-corrects.
    • Skills transfer seamlessly across different robots via motion transfer.
    • GRE 1.5 reduces planning errors by nearly threefold.

    SEO Tags:
    • Niche: #robotics, #artificialintelligence, #GeminiRobotics, #generalpurpose_robots
    • Popular: #AI, #robots, #futuretech, #neuralnetworks, #automation
    • Long-tail: #robots_for_home, #future_of_artificial_intelligence, #AI_robot_learning
    • Trending: #GenerativeAI, #EmbodiedAI, #AIrobots

    Read more: https://deepmind.google/discover/blog/gemini-robotics-15-brings-ai-agents-into-the-physical-world/
    --------  
    12:28
  • arXiv. When Data Becomes Pricier Than Compute: The New AI Era
    Imagine this paradox: compute power for training AI models is growing 4× every year, yet the pool of high-quality data barely grows by 3%. The result? For the first time, it’s not hardware but data that has become the biggest bottleneck for large language models.

    In this episode, we explore what this shift means for the future of AI. Why do standard scaling approaches — like just making models bigger or endlessly reusing limited datasets — actually backfire? And more importantly, what algorithmic tricks let us squeeze every drop of performance from scarce data?

    We dive into:
    • Why classic scaling laws (like Chinchilla) break down under fixed datasets.
    • How cranking up regularization (30× higher than standard!) prevents overfitting.
    • Why ensembles of models outperform even an “infinitely large” single model — and how just three models together can beat the theoretical maximum of one giant.
    • How knowledge distillation turns unwieldy ensembles into compact, efficient models ready for deployment.
    • The stunning numbers: from a 5× boost in data efficiency to an eye-popping 17.5× reduction in dataset size for domain adaptation.

    Who should listen? Engineers, researchers, and curious minds who want to understand how LLM training is shifting in a world where compute is becoming “free” but high-quality data is the new luxury.

    And here’s the question for you: if compute is no longer a constraint, which forgotten algorithms and older AI ideas should we bring back to life? Could they hold the key to the next big breakthrough?

    Subscribe now so you don’t miss new insights — and share your thoughts in the comments. Sometimes the discussion is just as valuable as the episode itself.

    Key Takeaways:
    • Compute is no longer the bottleneck — data is the real scarce resource.
    • Strong regularization and ensembling massively boost data efficiency.
    • Distillation makes ensemble power practical for deployment.
    • Algorithmic techniques can deliver up to 17.5× data savings in real tasks.

    SEO Tags:
    • Niche: #LLM, #DataEfficiency, #Regularization, #Ensembling
    • Popular: #ArtificialIntelligence, #MachineLearning, #DeepLearning, #AITrends, #TechPodcast
    • Long-tail: #OptimizingModelTraining, #DataEfficiencyInAI, #FutureOfLLMs
    • Trending: #AI2025, #GenerativeAI, #LLMResearch

    Read more: https://arxiv.org/abs/2509.14786
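    The ensemble-then-distill recipe can be sketched with plain NumPy. This is a schematic under simplifying assumptions, not the paper’s implementation: it shows how averaging member predictions forms a stronger “teacher,” and how a student would be trained against those soft targets via a distillation loss.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over the last axis.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_probs(member_logits, temperature=1.0):
    # Average the members' probability distributions: the ensemble teacher.
    return np.mean([softmax(l, temperature) for l in member_logits], axis=0)

def distillation_loss(student_logits, teacher_probs, temperature=2.0):
    # Cross-entropy of the student against the teacher's soft targets.
    # Minimizing this compresses the ensemble into one compact model.
    log_p = np.log(softmax(student_logits, temperature) + 1e-12)
    return float(-np.mean(np.sum(teacher_probs * log_p, axis=-1)))
```

    In practice the teacher’s soft targets are precomputed over the fixed dataset, so the student can be trained many epochs without ever seeing new data — the data-efficiency lever the episode describes.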
    --------  
    12:47
  • arXiv. Small Batches, Big Shift in LLM Training
    What if everything you thought you knew about training large language models turned out to be… not quite right? 🤯

    In this episode, we dive deep into a topic that could completely change the way we think about LLM training. We’re talking about batch size — yes, it sounds dry and technical, but new research shows that tiny batches, even as small as one, don’t just work — they can actually bring major advantages.

    🔍 In this episode you’ll learn:
    • Why the dogma of “huge batches for stability” came about in the first place.
    • How LLM training is fundamentally different from classical optimization — and why “smaller” can actually beat “bigger.”
    • The setting researchers had overlooked for years: scaling Adam’s β2 to keep a constant “token half-life.”
    • Why plain old SGD is suddenly back in the game — and how it can make large-scale training more accessible.
    • Why gradient accumulation may actually hurt memory efficiency instead of helping, and what to do instead.

    💡 Why it matters for you: if you’re working with LLMs — whether it’s research, fine-tuning, or just making the most of limited GPUs — this episode can save you weeks of trial and error, countless headaches, and lots of resources. Small batches are not a compromise; they’re a path to robustness, efficiency, and democratized access to cutting-edge AI.

    ❓ Question for you: which other “sacred cows” of machine learning deserve a second look? Share your thoughts — your insight might spark the next breakthrough.

    👉 Subscribe now so you don’t miss future episodes. Next time, we’ll explore how different optimization strategies impact scaling and inference speed.

    Key Takeaways:
    • Small batches (even size 1) can be stable and efficient.
    • The key is scaling Adam’s β2 correctly using a constant token half-life.
    • SGD and Adafactor with small batches unlock new memory and efficiency gains.
    • Gradient accumulation often backfires in this setup.
    • This shift makes LLM training more accessible beyond supercomputers.

    SEO Tags:
    • Niche: #LLMtraining, #batchsize, #AdamOptimization, #SGD
    • Popular: #ArtificialIntelligence, #MachineLearning, #NeuralNetworks, #GPT, #DeepLearning
    • Long-tail: #SmallBatchLLMTraining, #EfficientLanguageModelTraining, #OptimizerScaling
    • Trending: #AIresearch, #GenerativeAI, #OpenAI

    Read more: https://arxiv.org/abs/2507.07101
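    The β2 rescaling is easy to state concretely. Here is a minimal sketch of the “constant token half-life” idea as I read it from the description above (my formula, not code from the paper): choose β2 so the second-moment EMA decays to half its weight after a fixed number of tokens, whatever the batch size.

```python
def beta2_for_token_half_life(tokens_per_step, half_life_tokens):
    # An EMA with decay beta2 halves its weight after h steps, where
    # beta2 ** h = 0.5. Fixing the half-life in *tokens* gives
    # h = half_life_tokens / tokens_per_step, so smaller batches
    # (fewer tokens per step) yield a beta2 closer to 1.
    steps_per_half_life = half_life_tokens / tokens_per_step
    return 0.5 ** (1.0 / steps_per_half_life)
```

    With these made-up numbers, shrinking the batch by 1000× pushes β2 much closer to 1 — consistent with the episode’s point that defaults tuned for huge batches make Adam forget far too fast at batch size 1.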
    --------  
    16:52


About AIandBlockchain

Cryptocurrencies, blockchain, and artificial intelligence (AI) are powerful tools that are changing the game. Learn how they are transforming the world today and what opportunities lie hidden in the future.

v7.23.9 | © 2007-2025 radio.de GmbH