A 'frontier reasoning model' from just 1000 examples (s1). A $100B Musk bid for power. Gemini 2, RAND, and a warning from Amodei. Here are 7-8 developments you may have missed but which I would argue help us understand how the next few years will play out. From labour vs capital to automating rival companies and countries, and from non-profit shenanigans to new mini-docs, there was just too much for me not to make a vid.
GiveWell: https://www.givewell.org/charities/top-charities
AI Insiders ($9!): https://www.patreon.com/AIExplained
s1 Paper: https://arxiv.org/pdf/2501.19393
Musk Bid: https://www.wsj.com/tech/ai/musks-97-4-billion-openai-bid-piles-pressure-on-altman-f6749e6c?mod=hp_lead_pos1
Altman Reply: https://x.com/sama/status/1889059531625464090?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet
Google vs OpenAI: https://x.com/sama/status/1888703820596977684
RAND Study: https://www.rand.org/pubs/perspectives/PEA3691-4.html
Dev Meetup: https://x.com/btibor91/status/1888976302621040852
Altman $100 Trillion: https://www.nytimes.com/2023/03/31/technology/sam-altman-open-ai-chatgpt.html
Karpathy Vid: https://www.youtube.com/watch?v=7xTGNNLPyMI
Amodei Warning: https://www.anthropic.com/news/paris-ai-summit
Bengio Source: https://www.youtube.com/watch?v=6HDjVncL5Go
Chapters:
00:00 - Intro
01:37 - AGI Inches Closer
04:26 - ‘Super-Exponential’
05:58 - Musk Bid
07:34 - Luxury Goods and Land
09:05 - ‘Benefits All Humanity’
12:52 - ‘National Security’
14:21 - s1
20:33 - Final thoughts
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
--------
22:17
Deep Research by OpenAI - The Ups and Downs vs DeepSeek R1 Search + Gemini Deep Research
12 hours ago Deep Research was unveiled, and I’ve tested it thoroughly, including vs Deepseek R1 with search, Gemini Deep Research and even R1 in Perplexity. It’s a notable step forward, with one big caveat. I’ll go through all the benchmark figures, my initial impression of the o3 model within, and much more.
Deep Research: https://openai.com/index/introducing-deep-research/
https://www.youtube.com/watch?v=YkCDVn3_wiw
GAIA Bench: https://openreview.net/forum?id=fibxvahvs3
https://openreview.net/pdf?id=fibxvahvs3
CodeELO: https://arxiv.org/pdf/2501.01257
CamelCamel: https://uk.camelcamelcamel.com/
Deepseek R1 with search: https://chat.deepseek.com/
https://arxiv.org/pdf/2501.12948
HaluBench: https://arxiv.org/pdf/2407.08488
Chapters:
00:00 - Introduction
01:06 - Powered by o3, Humanity’s Last Exam, GAIA
03:55 - Simple Tests
06:00 - Good News vs Deepseek R1 and Gemini Deep Research
09:32 - Bad News on Hallucinations
14:14 - What Can’t it Browse?
14:42 - For Shopping?
16:40 - Final thoughts
--------
18:32
o3-mini and the “AI War”
o3-mini is here, and yes, I’ve read the paper in full - 2 hours after release, and even the post-launch Reddit AMA. Some epic details like a FrontierMath score that made me double-take, a likely new Cursor favorite, bio risk expertise and a cost-comparison with Deepseek R1. But does it perform on basic reasoning? Let’s find out. Plus, arguably the bigger story - the increasingly frenetic rhetoric coming out of the West - and Dario Amodei and Alexandr Wang (CEOs of Anthropic and Scale AI respectively) in particular. The last thing we need is an “AI War”.
https://wandb.me/simple-bench
(Colab): https://colab.research.google.com/drive/1AVijcPnEkl8Gy_754XbRdG5m7Q5-9slg?usp=sharing
Chapters:
00:00 - Introduction
00:45 - o3 mini
05:11 - First impressions vs Deepseek R1
07:21 - 10x Scale, o3-mini System Card, Amodei Essay, bitcoin wallets…
12:40 - Simple Competition Finale
13:03 - Clips and Final Thoughts on the “AI War”
o3-mini: https://openai.com/index/openai-o3-mini/
Paper: https://cdn.openai.com/o3-mini-system-card.pdf
Amodei Essay: https://darioamodei.com/on-deepseek-and-export-controls?s=09
FrontierMath wild stat: https://arxiv.org/pdf/2411.04872
Sam Altman Channels Napoleon: https://x.com/sama/status/1883185690508488934
Altman ‘pulls up releases’: https://x.com/sama/status/1884066337103962416
“AI War” by Wang: https://scale.com/blog/win-the-ai-war
Anthropic Original Views on Capabilities: https://www.anthropic.com/news/core-views-on-ai-safety
AI Insider Cost Comparison: https://x.com/arankomatsuzaki/status/1884676245922934788
Deepseek R1 Paper: https://arxiv.org/pdf/2501.12948
R1, o3-mini Price Comparison: https://techcrunch.com/2025/01/31/openai-launches-o3-mini-its-latest-reasoning-model/
Semianalysis on $1.3M Deepseek salaries, and them falling behind as ‘the time gap to match US capabilities increases’: https://semianalysis.com/2025/01/31/deepseek-debates/
OpenAI Valuation: https://www.bloomberg.com/news/articles/2025-01-30/openai-in-talks-to-raise-funding-at-340-billion-value-wsj-says?srnd=phx-ai
Wang Clip: https://x.com/tsarnick/status/1867700453494206883
Amodei Clip: https://x.com/ai_ctrl/status/1884951111771001188
https://simple-bench.com/
--------
15:21
Nothing Much Happens in AI, Then Everything Does All At Once
When it rains, it pours. OpenAI Operator tested and reviewed, with full paper analysis. Perplexity Assistant is useful. Then Stargate: is it all smoke and mirrors? Strong rumours of an o3+ model from Anthropic. Then a full breakdown of Deepseek R1, and what its training method says about the state of AI. It’s not open source BTW. Plus Humanity’s Last Exam, and Hassabis accelerates his AGI timeline.
00:00 - Introduction
00:54 - OpenAI Operator
04:53 - Perplexity Assistant
05:15 - StarGate
07:51 - Better than o3?
08:25 - DeepSeek R1 Analysis
12:12 - Training Secrets
15:19 - No More Process Rewarding?
19:01 - Hassabis Timeline Accelerates
21:22 - Humanity’s Last Exam
https://app.grayswan.ai/arena/chat/harmful-ai-assistant
https://app.grayswan.ai/arena
https://openai.com/index/computer-using-agent/
System Prompt: https://github.com/wunderwuzzi23/scratch/blob/master/system_prompts/operator_system_prompt-2025-01-23.txt
OpenAI Operator: https://operator.chatgpt.com/
System Card: https://cdn.openai.com/operator_system_card.pdf
There is No Plan: https://x.com/jeffclune/status/1882120726339318007
Perplexity Assistant: https://x.com/perplexity_ai/status/1882466239123255686
Stargate: https://openai.com/index/announcing-the-stargate-project/
Labour goes to 0: https://moores.samaltman.com/
Larry Ellison AI Surveillance: https://x.com/TheChiefNerd/status/1882042989184430332
Amodei 1984: https://www.bloomberg.com/news/articles/2025-01-22/anthropic-ceo-says-openai-s-stargate-venture-seems-chaotic
Microsoft Hesitate: https://www.theinformation.com/articles/why-sam-altman-joined-forces-with-larry-ellison-and-took-a-step-back-from-microsoft?rc=sy0ihq
Dylan Patel o3+ for Anthropic: https://www.youtube.com/watch?v=7EH0VjM3dTk
Deepseek R1: https://arxiv.org/pdf/2501.12948
https://arxiv.org/pdf/2412.19437
Diagram: https://pbs.twimg.com/media/GhyQsM6WQAE7W52?format=jpg&name=large
https://simple-bench.com/
Process: https://x.com/sama/status/1664018190840614912
https://x.com/karpathy/status/1835561952258723930
https://openai.com/index/trading-inference-time-compute-for-adversarial-robustness/?s=09
Demis Interview: https://www.youtube.com/watch?v=yr0GiSgUvPU
Humanity’s Last Exam: https://agi.safe.ai/
https://x.com/DanHendrycks/status/1882481730671857815
https://www.nytimes.com/2025/01/23/technology/ai-test-humanitys-last-exam.html?s=09
--------
23:09
Altman Expects a ‘Fast Take-off’, ‘Super-Agent’ Debuting Soon and DeepSeek R1 Out
OpenAI looks set to debut their Operator system, and some leaks are out. At the same time Deepseek R1 releases some numbers, and Sam Altman says he might have been wrong before, and now anticipates a 'fast take-off'. Plus two papers to give you an idea of what a super-agent might be decent at doing, some more exclusive article analysis and much more. Who said anything else is happening today...
80,000 Hours Channel: https://www.youtube.com/channel/UCafjal1QYJ3rb0Y9xZk1Ezg
Spotify: https://open.spotify.com/show/2WzJwXWBDnn4iZ7odKwDib
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
01:13 - Pro Cost and OpenAI Operator
04:00 - Agent Benchmarks Being Targeted
07:48 - Fast Take-off, Altman
08:48 - Altman flip-flops
10:02 - Deepseek R1 First Reaction
Altman ‘100x expectations out of control’: https://x.com/sama/status/1881258443669172470
OpenAI Operator Table: https://x.com/btibor91/status/1881285255266750564
WebVoyager: https://arxiv.org/pdf/2401.13919
OSWorld: https://arxiv.org/pdf/2404.07972
Axios Exclusive 1 (SuperAgent): https://www.axios.com/2025/01/19/ai-superagent-openai-meta?s=09
Axios Exclusive 2: https://www.axios.com/2025/01/18/biden-sullivan-ai-race-trump-china
Deepseek R1 Numbers: https://x.com/deepseek_ai/status/1881318130334814301
Does 1.5B outperform 3.5 Sonnet on Math?: https://x.com/reach_vb/status/1881319500089634954
Deepseek R1 (deepseek-reasoner) Pricing: https://api-docs.deepseek.com/quick_start/pricing/
Altman Fast Takeoff: https://x.com/tsarnick/status/1879100390840697191
OpenAI Economic Blueprint: https://cdn.openai.com/global-affairs/ai-in-america-oai-economic-blueprint-20250113.pdf
Target is Long-horizon Tasks: https://x.com/karinanguyen_/status/1879576037249667520
Support Regulations: https://www.techemails.com/p/elon-musk-and-openai
https://www.nytimes.com/2023/05/16/technology/openai-altman-artificial-intelligence-regulation.html
Donation: https://qz.com/sam-altman-donate-million-zuckerberg-bezos-donald-trump-1851721035
Amodei on Regulations by 2025: https://www.youtube.com/watch?v=ugvHCXCOmm4
‘Feel the AGI’: https://x.com/polynoamial?lang=en
GPT-5 and o-series merger: https://x.com/sama/status/1880358749187240274
o1 Thinks in Chinese: https://techcrunch.com/2025/01/14/openais-ai-reasoning-model-thinks-in-chinese-sometimes-and-no-one-really-knows-why/
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Covering the biggest news of the century - the arrival of smarter-than-human AI. From the author of Simple Bench, which reveals the remaining gap between LLM and human reasoning. Hype-free, and the British accent is a freebie bonus.