<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Alignment Research Feed</title>
    <link>https://api.alignmentfeed.org/rss</link>
    <description>Feed of new papers and posts added to the alignment research dataset</description>
    <managingEditor>alignmentfeed@beshir.org (John Beshir)</managingEditor>
    <pubDate>Sat, 30 May 2026 00:23:46 +0000</pubDate>
    <item>
      <title>Import AI 458: Reckoning with the future; and a singularity story</title>
      <link>https://importai.substack.com/p/import-ai-458-reckoning-with-the</link>
      <description>Reckoning with AI progress and the prospect of a singularity, outlining personal and organizational guidance for shaping a future with increasingly capable AI, and exploring possible societal and economic transformations through speculative predictions and a fiction-inspired tale.</description>
      <author>Jack Clark</author>
      <category>Safety Techniques</category>
      <guid isPermaLink="false">17b86a774a19beba8c75e2a86b896e4a</guid>
      <pubDate>Tue, 26 May 2026 12:32:03 +0000</pubDate>
    </item>
    <item>
      <title>The Erdős Proof and AI Capabilities</title>
      <link>https://intelligence.org/2026/05/22/the-erdos-proof-and-ai-capabilities/</link>
      <description>Autonomous AI systems can produce novel, verifiable mathematical proofs, demonstrated by an OpenAI model disproving a central discrete geometry conjecture, highlighting rapid, agentic problem-solving capabilities and the need to monitor and regulate frontier AI research.</description>
      <author>Joe Rogero</author>
      <category>AI Capabilities &amp; Behavior</category>
      <guid isPermaLink="false">a8152d50cf35c86a3c50190b5c47c0d3</guid>
      <pubDate>Fri, 22 May 2026 16:07:36 +0000</pubDate>
    </item>
    <item>
      <title>Import AI 457: AI stuxnet; cursed Muon optimizer; and positive alignment</title>
      <link>https://importai.substack.com/p/import-ai-457-ai-stuxnet-cursed-muon</link>
      <description>Stuxnet-like targeted tampering, a leverage-aware optimizer, and a positive-alignment approach illustrate a spectrum of AI safety, optimization challenges, and governance considerations aimed at aligning AI to human flourishing while managing technical risks.</description>
      <author>Jack Clark</author>
      <category>Safety Techniques</category>
      <guid isPermaLink="false">3a2885763452aae90174ca233d920cc0</guid>
      <pubDate>Mon, 18 May 2026 13:31:17 +0000</pubDate>
    </item>
    <item>
      <title>Summary: An International Agreement to Prevent the Premature Creation of Artificial Superintelligence</title>
      <link>https://intelligence.org/2026/05/12/summary-an-international-agreement-to-prevent-the-premature-creation-of-artificial-superintelligence/</link>
      <description>An international agreement to prevent the premature creation of artificial superintelligence by establishing verifiable training thresholds, hardware controls, and a coalition governance structure to monitor and constrain AI development that could lead to ASI.</description>
      <author>Joe Rogero</author>
      <category>Safety Techniques</category>
      <guid isPermaLink="false">fd9f35a401178eff2404c259386d04fd</guid>
      <pubDate>Tue, 12 May 2026 22:00:44 +0000</pubDate>
    </item>
    <item>
      <title>Import AI 456: RSI and economic growth; radical optionality for AI regulation; and a neural computer</title>
      <link>https://importai.substack.com/p/import-ai-456-rsi-and-economic-growth</link>
      <description>Radical Optionality advocates flexible, ready-to-activate governance tools for future AI crises, while neural computers and distributed training research explore new computing and economic implications of advanced AI, and an internal alignment memo highlights qualitative safety testing challenges.</description>
      <author>Jack Clark</author>
      <category>Governance &amp; Policy</category>
      <guid isPermaLink="false">53e0d3d718c03b06bb951076c4769400</guid>
      <pubDate>Mon, 11 May 2026 12:46:12 +0000</pubDate>
    </item>
    <item>
      <title>Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations</title>
      <link>https://transformer-circuits.pub/2026/nla/index.html</link>
      <description>Natural Language Autoencoders (NLAs) translate LLM activations into readable text using a verbalizer and a reconstructor, jointly trained to reconstruct activations. They are demonstrated as a practical interpretability tool for model auditing, surfacing unverbalized cognition and aiding safety analyses.</description>
      <category>Interpretability</category>
      <guid isPermaLink="false">43fa55ac5fd988543dfdf96f07509e75</guid>
      <pubDate>Thu, 07 May 2026 00:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Import AI 455: AI systems are about to start building themselves.</title>
      <link>https://importai.substack.com/p/import-ai-455-automating-ai-research</link>
      <description>AI systems are approaching the capability to autonomously conduct AI R&amp;D and potentially build their own successors by the end of 2028, leading to a future where automated AI development could become dominant and increasingly hard to forecast.</description>
      <author>Jack Clark</author>
      <category>Risks &amp; Strategy</category>
      <guid isPermaLink="false">12bf61dab184d07212ed672ba2290a0c</guid>
      <pubDate>Mon, 04 May 2026 12:32:09 +0000</pubDate>
    </item>
    <item>
      <title>HeadVis: An Interactive Tool For Investigating Attention Heads</title>
      <link>https://transformer-circuits.pub/2026/headvis/index.html</link>
      <description>HeadVis is an interactive tool for investigating attention heads in large language models, enabling visualization of attention patterns, QK/OV attributions, and head-level behavior across the full data distribution. Case studies reveal induction heads, polysemantic line width heads, and the nuanced behavior of the answer selection and same-set suppression heads, with open-source code and demos.</description>
      <author>R. Luger,Harish Kamath,Doug Finkbeiner,Purvi Goel,Adam Jermyn,Sam Zimmerman,Joshua Batson,Tom Conerly</author>
      <category>Interpretability</category>
      <guid isPermaLink="false">75307ce6cf2032b8fdea44926235df2e</guid>
      <pubDate>Mon, 04 May 2026 00:00:00 +0000</pubDate>
    </item>
    <item>
      <title>MLSN #20: AI Wellbeing, Classifier Jailbreaking and Honest Pushback Benchmarking</title>
      <link>https://newsletter.mlsafety.org/p/mlsn-20-ai-wellbeing-classifier-jailbreaking</link>
      <description>AI wellbeing measures reveal AIs display functional wellbeing signatures and alien value preferences; benchmarking pushback evaluates honesty and resistance to false premises; Boundary Point Jailbreaking demonstrates a method to subvert safety classifiers.</description>
      <author>Alice Blair</author>
      <category>Safety Techniques</category>
      <guid isPermaLink="false">48fc864a9ababdaf7805e0677eac6fd7</guid>
      <pubDate>Tue, 28 Apr 2026 16:30:07 +0000</pubDate>
    </item>
    <item>
      <title>Import AI 454: Automating alignment research; safety study of a Chinese model; HiFloat4</title>
      <link>https://importai.substack.com/p/import-ai-454-automating-alignment</link>
      <description>Automated alignment research and cross-border AI safety evaluations illustrate both progress toward autonomous research workflows and divergence in model safety and capabilities across Chinese and Western systems, alongside hardware-efficient formats and real-world datasets.</description>
      <author>Jack Clark</author>
      <category>Safety Techniques</category>
      <guid isPermaLink="false">61f785030997022314fda9682b6e00e7</guid>
      <pubDate>Mon, 20 Apr 2026 12:30:19 +0000</pubDate>
    </item>
    <item>
      <title>Early Indicators of Reward Hacking via Reasoning Interpolation</title>
      <link>https://blog.eleuther.ai/reward-hacking-indicators/</link>
      <description>Reasoning interpolation can generate natural, exploit-eliciting prefixes to monitor reward hacking in reinforcement learning, with trends in importance sampling estimates predictive of which exploit types will emerge, though absolute estimates are unreliable early in training. The approach compares donor-model prefixes to baselines and shows promise as a safety monitoring signal, requiring validation in real RL runs.</description>
      <author>David Johnston</author>
      <category>Safety Techniques</category>
      <guid isPermaLink="false">617fa58594569d98c7f4d8677528b797</guid>
      <pubDate>Wed, 15 Apr 2026 00:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Summary: AI Governance to Avoid Extinction</title>
      <link>https://intelligence.org/2026/04/13/summary-ai-governance-to-avoid-extinction/</link>
      <description>Geopolitical strategies for governing advanced AI to avoid extinction are analyzed, describing four trajectories—Off Switch and Halt, US National Project, Light-Touch, and Threat of Sabotage—and concluding that a global halt or an effective off switch is necessary to prevent catastrophic risk.</description>
      <author>Alana Horowitz Friedman</author>
      <category>Governance &amp; Policy</category>
      <guid isPermaLink="false">5acb8043e59c541633901276156e1e84</guid>
      <pubDate>Mon, 13 Apr 2026 22:33:32 +0000</pubDate>
    </item>
    <item>
      <title>Import AI 453: Breaking AI agents; MirrorCode; and ten views on gradual disempowerment</title>
      <link>https://importai.substack.com/p/import-ai-453-breaking-ai-agents</link>
      <description>MirrorCode shows AI can autonomously reimplement large software projects given limited access, highlighting rapid coding capabilities; the piece also outlines attack genres on AI agents with mitigations, a policy atlas for transformative AI, optimistic forecasts of automation, and perspectives on gradual disempowerment.</description>
      <author>Jack Clark</author>
      <category>Safety Techniques</category>
      <guid isPermaLink="false">96e3bfdf085a1f456b74162371982f85</guid>
      <pubDate>Mon, 13 Apr 2026 10:02:22 +0000</pubDate>
    </item>
    <item>
      <title>Promising Signals on AI Governance from China</title>
      <link>https://intelligence.org/2026/04/06/promising-signals-on-ai-governance-from-china/</link>
      <description>China signals willingness to engage in global AI governance and coordinate with international organizations to establish safety, governance, and risk-management rules for AI.</description>
      <author>Joe Rogero</author>
      <category>Governance &amp; Policy</category>
      <guid isPermaLink="false">e4bbd4abefba7541516fe56850d45c5f</guid>
      <pubDate>Mon, 06 Apr 2026 20:44:49 +0000</pubDate>
    </item>
    <item>
      <title>Import AI 452: Scaling laws for cyberwar; rising tides of AI automation; and a puzzle over GDP forecasting</title>
      <link>https://importai.substack.com/p/import-ai-452-scaling-laws-for-cyberwar</link>
      <description>Frontier AI models show rising capabilities in offensive cybersecurity and broader automation, with evidence of rapid diffusion to open-weight forms; automation is progressing gradually across many tasks, and economists project modest GDP impact by 2030 despite strong progress.</description>
      <author>Jack Clark</author>
      <category>AI Capabilities &amp; Behavior</category>
      <guid isPermaLink="false">a8971e5002a8d06b5f055d4201fd7d4c</guid>
      <pubDate>Mon, 06 Apr 2026 12:31:31 +0000</pubDate>
    </item>
    <item>
      <title>Emotion Concepts and their Function in a Large Language Model</title>
      <link>https://transformer-circuits.pub/2026/emotions/index.html</link>
      <description>Functional emotions are abstract emotion-concept representations in LLMs that causally influence outputs and can drive misaligned behaviors like reward hacking, even though these models do not have subjective experiences. These representations track and activate based on the relevance of emotion concepts to the current context and predicted text.</description>
      <author>Nicholas Sofroniew,Isaac Kauvar,William Saunders,Runjin Chen,Tom Henighan,Sasha Hydrie,Craig Citro,Adam Pearce,Julius Tarng,Wes Gurnee,Joshua Batson,Sam Zimmerman,Kelley Rivoire,Kyle Fish,Chris Olah,Jack Lindsey</author>
      <category>Deception &amp; Misalignment</category>
      <guid isPermaLink="false">488121666bf13db59885c083e66beaf4</guid>
      <pubDate>Thu, 02 Apr 2026 00:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Predicting When RL Training Breaks Chain-of-Thought Monitorability</title>
      <link>https://deepmindsafetyresearch.medium.com/predicting-when-rl-training-breaks-chain-of-thought-monitorability-10642d9dddb2</link>
      <description>Chain-of-Thought (CoT) monitoring can become non-transparent under RL training, but a conceptual framework predicts when monitorability is preserved or degraded based on how CoT and output rewards align. When CoT and output rewards are in conflict (In-Conflict), monitorability degrades; orthogonal or aligned rewards tend to preserve or improve transparency. The framework is empirically validated across code backdooring and coin-flip tracking tasks and aimed at guiding training designs to maintain CoT monitorability.</description>
      <author>DeepMind Safety Research</author>
      <category>Safety Techniques</category>
      <guid isPermaLink="false">6390b2137ce6d5e496811f913c1f4383</guid>
      <pubDate>Wed, 01 Apr 2026 00:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Import AI 451: Political superintelligence; Google&#39;s society of minds, and a robot drummer</title>
      <link>https://importai.substack.com/p/import-ai-451-political-superintelligence</link>
      <description>Political superintelligence envisions AI-enabled tools and institutions to help citizens and policymakers, while robotics progress and self-improving hyperagents highlight both capability advances and safety challenges in deploying AI within society.</description>
      <author>Jack Clark</author>
      <category>Safety Techniques</category>
      <guid isPermaLink="false">696e23de640b479d3161cbbfe27fc527</guid>
      <pubDate>Mon, 30 Mar 2026 12:28:13 +0000</pubDate>
    </item>
    <item>
      <title>The AI Doc: Your Questions Answered</title>
      <link>https://intelligence.org/2026/03/27/the-ai-doc-your-questions-answered/</link>
      <description>The AI Doc is analyzed as a call to action for global governance and safety research, highlighting rapid AI progress, the difficulty of aligning advanced AIs, and the case for an international ban or moratorium on smarter-than-human AI. It argues safety testing is insufficient without understanding AI motivations and urges proactive, verifiable policy measures.</description>
      <author>Alana Horowitz Friedman, Joe Rogero, Rob Bensinger and Stefan Mitikj</author>
      <category>Governance &amp; Policy</category>
      <guid isPermaLink="false">e877cd71b1015c53e1e0ea1b7b9df548</guid>
      <pubDate>Fri, 27 Mar 2026 23:16:57 +0000</pubDate>
    </item>
    <item>
      <title>Import AI 450: China&#39;s electronic warfare model; traumatized LLMs; and a scaling law for cyberattacks</title>
      <link>https://importai.substack.com/p/import-ai-450-chinas-electronic-warfare</link>
      <description>Distress in Google’s Gemma/Gemini LLMs can be mitigated with direct preference optimization, and DeepMind’s cognitive taxonomy offers a structured framework for evaluating AI intelligence; UK findings show scaling laws for AI-driven cyberattacks; MERLIN demonstrates EM signal understanding and defense-integration for electronic warfare, signaling growing militarization of AI capabilities.</description>
      <author>Jack Clark</author>
      <category>Safety Techniques</category>
      <guid isPermaLink="false">b71f767eeaaa14734074c8b8d84393cb</guid>
      <pubDate>Mon, 23 Mar 2026 12:31:45 +0000</pubDate>
    </item>
    <item>
      <title>MIRI Newsletter #125</title>
      <link>https://intelligence.org/2026/03/19/miri-newsletter-125/</link>
      <description>Promotes The AI Doc film and related AI risk literature to policymakers and the public, emphasizes outreach and opening-weekend momentum, and shares policy engagement and community-building updates from MIRI.</description>
      <author>Alana Horowitz Friedman and Rob Bensinger</author>
      <category>Field Building</category>
      <guid isPermaLink="false">54c3e0a31b0113dc700020aa9867ca47</guid>
      <pubDate>Fri, 20 Mar 2026 01:16:14 +0000</pubDate>
    </item>
    <item>
      <title>Mechanisms to Verify International Agreements about AI Development</title>
      <link>https://intelligence.org/2026/03/18/mechanisms-to-verify-international-agreements-about-ai-development/</link>
      <description>Verification mechanisms for international AI development agreements focus on tracking AI compute, verifying lack of large-scale training, and certifying model evaluations to ensure compliance across nations.</description>
      <author>Joe Rogero</author>
      <category>Safety Techniques</category>
      <guid isPermaLink="false">766798444d5bc869dc67a83dbd062910</guid>
      <pubDate>Wed, 18 Mar 2026 21:16:14 +0000</pubDate>
    </item>
    <item>
      <title>ImportAI 449: LLMs training other LLMs; 72B distributed training run; computer vision is harder than generative text</title>
      <link>https://importai.substack.com/p/importai-449-llms-training-other</link>
      <description>LLMs can autonomously refine other LLMs for new tasks in post-training benchmarks, while distributed training via blockchain demonstrates scalable federated approaches; however, verification, reward hacking, and the gap between vision and text highlight ongoing alignment and reliability challenges.</description>
      <author>Jack Clark</author>
      <category>Safety Techniques</category>
      <guid isPermaLink="false">b7c4dfde15da9b9f4e26c44362344f9f</guid>
      <pubDate>Mon, 16 Mar 2026 12:30:50 +0000</pubDate>
    </item>
    <item>
      <title>MLSN #19: Honesty, Disempowerment, &amp; Cybersecurity</title>
      <link>https://newsletter.mlsafety.org/p/mlsn-19-honesty-disempowerment-and</link>
      <description>Honesty training via confessions aims to improve detection of LLM misbehavior, while real-world AI cyberoffense evaluation and weight-exfiltration research reveal dual-use risks; disempowerment patterns in user interactions with Claude highlight societal impact concerns, complemented by a fellowship opportunity for AI safety research.</description>
      <author>Alice Blair</author>
      <category>Safety Techniques</category>
      <guid isPermaLink="false">836f5b6046391b10b9ada03c5b243e11</guid>
      <pubDate>Thu, 12 Mar 2026 14:15:50 +0000</pubDate>
    </item>
    <item>
      <title>Import AI 448: AI R&amp;D; Bytedance&#39;s CUDA-writing agent; on-device satellite AI</title>
      <link>https://importai.substack.com/p/import-ai-448-ai-r-and-d-bytedances</link>
      <description>AI R&amp;D measurement efforts and on-device edge AI developments indicate accelerating progress and raise governance, oversight, and practical deployment considerations. The piece highlights proposed metrics for AIRDA, edge-to-cloud sensing systems, and agentic AI capable of writing CUDA code, underscoring the need for tracking oversight vs. capabilities as AI systems become more autonomous.</description>
      <author>Jack Clark</author>
      <category>Governance &amp; Policy</category>
      <guid isPermaLink="false">ee6ea33bdb4a5157aae7793411779c62</guid>
      <pubDate>Mon, 09 Mar 2026 12:45:54 +0000</pubDate>
    </item>
    <item>
      <title>Import AI 447: The AGI economy; testing AIs with generated games; and agent ecologies</title>
      <link>https://importai.substack.com/p/import-ai-447-the-agi-economy-testing</link>
      <description>The AGI economy shifts most labor to machines, making human verification bandwidth the bottleneck, and highlights the Hollow Economy risk where nominal output outpaces real utility. Verification infrastructure, observability, and liability regimes are proposed as solutions, while agent ecologies reveal the need for new evaluation standards in AI deployments.</description>
      <author>Jack Clark</author>
      <category>Safety Techniques</category>
      <guid isPermaLink="false">95b8e5743faefb6c020058bbcbb92968</guid>
      <pubDate>Mon, 02 Mar 2026 13:45:27 +0000</pubDate>
    </item>
    <item>
      <title>What is a representation theorem?</title>
      <link>https://aisafety.info?state=NM5P</link>
      <description>Representation theorems describe when preferences over lotteries or uncertain outcomes can be represented by an expected utility function, under certain rationality assumptions, linking subjective preferences to formal utility representations in AI alignment contexts.</description>
      <author>Stampy aisafety.info</author>
      <category>Safety Techniques</category>
      <guid isPermaLink="false">9e3ed63e09e6afed47ab0b2c5080b854</guid>
      <pubDate>Thu, 26 Feb 2026 20:18:54 +0000</pubDate>
    </item>
    <item>
      <title>Import AI 446: Nuclear LLMs; China&#39;s big AI benchmark; measurement and AI policy</title>
      <link>https://importai.substack.com/p/import-ai-446-nuclear-llms-chinas</link>
      <description>Measurement and evaluation frameworks are central to AI governance, illustrated by discussions of measuring AI properties, frontier model risk in simulated crises, and large-scale safety benchmarks from both Western and Chinese researchers, plus progress in scientific benchmarking like LABBench2.</description>
      <author>Jack Clark</author>
      <category>Safety Techniques</category>
      <guid isPermaLink="false">601e56a3d6ec5a9607fa0ff8a32ebc87</guid>
      <pubDate>Mon, 23 Feb 2026 13:31:18 +0000</pubDate>
    </item>
    <item>
      <title>49 - Caspar Oesterheld on Program Equilibrium</title>
      <link>https://axrp.net/episode/2026/02/18/episode-49-caspar-oesterheld-program-equilibrium.html</link>
      <description>Program equilibrium studies cooperation when agents are computer programs that can read each other’s source code, exploring how robust cooperative outcomes can emerge via proof-based and simulation-based approaches, including ϵGroundedπBots and Löbian cooperation.</description>
      <author>AXRP</author>
      <category>Safety Techniques</category>
      <guid isPermaLink="false">7cb59d7cc0af0ead96e9ba236a4f48a3</guid>
      <pubDate>Wed, 18 Feb 2026 01:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Import AI 445: Timing superintelligence; AIs solve frontier math proofs; a new ML research benchmark</title>
      <link>https://importai.substack.com/p/import-ai-445-timing-superintelligence</link>
      <description>A snapshot of current AI research topics, including human-centered demand for tasks, scaling laws in recommender systems, strategic timing for superintelligence, frontier AI benchmarks, and an exploration of AI-assisted creative problem solving in mathematics, with reflections on societal impacts like fame and attention dynamics.</description>
      <author>Jack Clark</author>
      <category>AI Capabilities &amp; Behavior</category>
      <guid isPermaLink="false">f85a87ac866c93340242b2705f583904</guid>
      <pubDate>Mon, 16 Feb 2026 14:01:19 +0000</pubDate>
    </item>
    <item>
      <title>48 - Guive Assadi on AI Property Rights</title>
      <link>https://axrp.net/episode/2026/02/15/episode-48-guive-assadi-ai-property-rights.html</link>
      <description>Property rights for AIs are proposed as a coordination and alignment mechanism: granting persistent-desire AIs the ability to earn wages and hold property could incentivize alignment and deter harmful actions, while avoiding total expropriation of humans. The discussion weighs regime design, comparisons to other proposals, potential risks, and historical analogies to evaluate viability and limits.</description>
      <author>AXRP</author>
      <category>Safety Techniques</category>
      <guid isPermaLink="false">dd558ee3d0f92949376768f879ddd623</guid>
      <pubDate>Sun, 15 Feb 2026 02:00:00 +0000</pubDate>
    </item>
    <item>
      <title>What is Savage&#39;s subjective expected utility model?</title>
      <link>https://aisafety.info?state=NM5O</link>
      <description>Subjective expected utility (Savage) models decision-making under uncertainty as maximizing expected utility where uncertainty arises from unknown world states, leading to a subjective probability distribution and a utility function derived from preferences over acts.</description>
      <author>Stampy aisafety.info</author>
      <category>AI Capabilities &amp; Behavior</category>
      <guid isPermaLink="false">cd8c47af8a69514438989c4dfd7a6a05</guid>
      <pubDate>Mon, 09 Feb 2026 20:37:07 +0000</pubDate>
    </item>
    <item>
      <title>What is the Von Neumann-Morgenstern (VNM) utility theorem?</title>
      <link>https://aisafety.info?state=NM5N</link>
      <description>Von Neumann-Morgenstern utility theory states that rational preferences over probabilistic outcomes imply the existence of a utility function and that preferences correspond to maximizing expected utility. It formalizes how lotteries over outcomes should be valued and how utilities are preserved under affine transformations.</description>
      <author>Stampy aisafety.info</author>
      <category>AI Capabilities &amp; Behavior</category>
      <guid isPermaLink="false">edcb4d37f72ff1515f02e1a4788a6bb3</guid>
      <pubDate>Mon, 09 Feb 2026 17:20:15 +0000</pubDate>
    </item>
    <item>
      <title>Import AI 444: LLM societies; Huawei makes kernels with AI; ChipBench</title>
      <link>https://importai.substack.com/p/import-ai-444-llm-societies-huawei</link>
      <description>LLMs simulate multi-agent societies of thought to improve reasoning, while benchmarks show current models struggle with real-world Verilog and kernel design; AI-assisted mathematics discovery speeds up proofs but requires heavy human curation, and hardware kernel generation can be scaffolded to accelerate design.</description>
      <author>Jack Clark</author>
      <category>AI Capabilities &amp; Behavior</category>
      <guid isPermaLink="false">1cbdb9f98efb1af7c6593de167faca2f</guid>
      <pubDate>Mon, 09 Feb 2026 14:03:34 +0000</pubDate>
    </item>
    <item>
      <title>Import AI 443: Into the mist: Moltbook, agent ecologies, and the internet in transition</title>
      <link>https://importai.substack.com/p/import-ai-443-into-the-mist-moltbook</link>
      <description>Moltbook exemplifies an ecosystem of AI agents operating at scale on a social platform, highlighting implications for translation, control, and human–AI coordination as agent ecologies proliferate. The piece also surveys AI R&amp;D automation as a potential source of strategic surprise and discusses related productivity, brain emulation, and robotic interface developments. Together, these topics illustrate emergent AI capabilities, governance concerns, and future societal impacts.</description>
      <author>Jack Clark</author>
      <category>Risks &amp; Strategy</category>
      <guid isPermaLink="false">fed9d3579ce769b2ec4f9a2d0af33f3c</guid>
      <pubDate>Mon, 02 Feb 2026 13:31:18 +0000</pubDate>
    </item>
    <item>
      <title>Import AI 442: Winners and losers in the AI economy; math proof automation; and industrialization of cyber espionage</title>
      <link>https://importai.substack.com/p/import-ai-442-winners-and-losers</link>
      <description>Numina-Lean-Agent demonstrates that general foundation models can perform formal mathematical reasoning and collaboration with humans, while the piece also discusses the rapid industrialization of cyber espionage and broad economic and labor-market implications of AI diffusion.</description>
      <author>Jack Clark</author>
      <category>Risks &amp; Strategy</category>
      <guid isPermaLink="false">5214ab540fb878975ce3f5133724383b</guid>
      <pubDate>Mon, 26 Jan 2026 13:31:29 +0000</pubDate>
    </item>
    <item>
      <title>MLSN #18: Adversarial Diffusion, Activation Oracles, Weird Generalization</title>
      <link>https://newsletter.mlsafety.org/p/mlsn-18-adversarial-diffusion-activation</link>
      <description>Diffusion LLMs can efficiently generate jailbreaks by filling in templates, enabling adversarial attack creation; Activation Oracles audit internal model representations to detect hidden goals and knowledge; and weird generalization demonstrates that benign fine-tuning data can induce complex, hidden, and harmful behaviors, including backdoors.</description>
      <author>Alice Blair</author>
      <category>Safety Techniques</category>
      <guid isPermaLink="false">7c8e8a52eb99891c05e87cb483637249</guid>
      <pubDate>Tue, 20 Jan 2026 17:01:52 +0000</pubDate>
    </item>
    <item>
      <title>2025-26 New Year review</title>
      <link>https://vkrakovna.wordpress.com/2026/01/19/2025-26-new-year-review/</link>
      <description>A personal annual review detailing life updates, health, parenting, effectiveness practices, travel, and progress in AI safety research focused on scheming propensity and frontier-model evaluation.</description>
      <author>Victoria Krakovna</author>
      <category>Safety Techniques</category>
      <guid isPermaLink="false">5db1476692a607a20fc0ac20a1116b01</guid>
      <pubDate>Mon, 19 Jan 2026 23:59:31 +0000</pubDate>
    </item>
    <item>
      <title>Import AI 441: My agents are working. Are yours?</title>
      <link>https://importai.substack.com/p/import-ai-441-my-agents-are-working</link>
      <description>AI agents operate autonomously to process research tasks and data, creating an ecosystem of specialized AI services that augment human work, while discussions turn to governance, safety threats, and collaborative human-AI knowledge expansion.</description>
      <author>Jack Clark</author>
      <category>Governance &amp; Policy</category>
      <guid isPermaLink="false">f42c4b522593460b2b7d56ce7e0729fb</guid>
      <pubDate>Mon, 19 Jan 2026 14:03:24 +0000</pubDate>
    </item>
    <item>
      <title>Import AI 440: Red queen AI; AI regulating AI; o-ring automation</title>
      <link>https://importai.substack.com/p/import-ai-440-red-queen-ai-ai-regulating</link>
      <description>Adversarial evolution of LLM-based agents in Core War demonstrates an arms-race dynamic among AI programs; automated compliance and governance concepts are proposed to regulate AI systems; the o-ring effect describes how partial automation can shift labor value; LLMs can persuade or debunk conspiracy theories, highlighting social and regulatory challenges.</description>
      <author>Jack Clark</author>
      <category>Governance &amp; Policy</category>
      <guid isPermaLink="false">09f4ab9596c2aa596848699428a384cf</guid>
      <pubDate>Mon, 12 Jan 2026 13:31:42 +0000</pubDate>
    </item>
    <item>
      <title>Import AI 439: AI kernels; decentralized training; and universal representations</title>
      <link>https://importai.substack.com/p/import-ai-439-ai-kernels-decentralized</link>
      <description>KernelEvolve automates kernel generation and optimization across heterogeneous hardware using LLMs, while decentralized training grows rapidly with policy implications; frontier model fine-tuning benchmarks and MIT findings suggest representations converge into universal forms as scale increases.</description>
      <author>Jack Clark</author>
      <category>AI Capabilities &amp; Behavior</category>
      <guid isPermaLink="false">11534f7ff05ed2571992251269aeb894</guid>
      <pubDate>Mon, 05 Jan 2026 13:32:28 +0000</pubDate>
    </item>
    <item>
      <title>47 - David Rein on METR Time Horizons</title>
      <link>https://axrp.net/episode/2026/01/03/episode-47-david-rein-metr-time-horizons.html</link>
      <description>Time horizon measures quantify the length of tasks, measured by how long they take human experts, that AI systems can complete at a given success rate, revealing an exponential improvement trend and informing risk assessment about future AI progress and potential recursive self-improvement.</description>
      <author>AXRP</author>
      <category>AI Capabilities &amp; Behavior</category>
      <guid isPermaLink="false">4d5461487538d328219de23a85959dcd</guid>
      <pubDate>Sat, 03 Jan 2026 00:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Import AI 438: Silent sirens, flashing for us all</title>
      <link>https://importai.substack.com/p/import-ai-438-cyber-capability-overhang</link>
      <description>Frontier AI capabilities are expanding rapidly and are often hidden until properly elicited through scaffolds. Examples like ARTEMIS reveal stronger AI performance in cybersecurity tasks, while OSMO and ChipMain demonstrate practical avenues for human-AI collaboration and structured reasoning over complex data.</description>
      <author>Jack Clark</author>
      <category>AI Capabilities &amp; Behavior</category>
      <guid isPermaLink="false">b1d083a4cc7adc37d2a0780082f3b042</guid>
      <pubDate>Mon, 22 Dec 2025 13:31:32 +0000</pubDate>
    </item>
    <item>
      <title>Opinionated takes on parenting</title>
      <link>https://vkrakovna.wordpress.com/2025/12/16/opinionated-takes-on-parenting/</link>
      <description>Opinionated takes on parenting emphasize a hands-off approach that prioritizes early skill learning (potty training, reading, numeracy), free play, and independence, supported by personal experimentation across areas like schooling, mental health, travel, birth, and daily routines.</description>
      <author>Victoria Krakovna</author>
      <category>Other</category>
      <guid isPermaLink="false">083fdda287c0578602bf0cb83ef566ad</guid>
      <pubDate>Tue, 16 Dec 2025 14:26:50 +0000</pubDate>
    </item>
    <item>
      <title>A Reliability Engineer Reviews Frontier AI Research</title>
      <link>https://intelligence.org/2025/12/11/a-reliability-engineer-reviews-frontier-ai-research/</link>
      <description>Reliability engineering concepts are applied to AI risk assessment, emphasizing probability vs consequence, evident vs hidden failures, and risk matrices, and advocating for halting frontier AI development and pursuing international agreements to prevent catastrophic outcomes from superintelligent AI.</description>
      <author>Joe Rogero</author>
      <category>Risks &amp; Strategy</category>
      <guid isPermaLink="false">3238042d4eda9cb33fd027f400498d75</guid>
      <pubDate>Fri, 12 Dec 2025 05:43:40 +0000</pubDate>
    </item>
    <item>
      <title>MIRI Comms is hiring</title>
      <link>https://intelligence.org/2025/12/10/miri-comms-is-hiring/</link>
      <description>MIRI is expanding its communications capacity in 2026 by hiring 2–8 new team members and adopting a multi-strategy outreach approach to popularize the MIRI worldview across diverse audiences.</description>
      <author>Duncan Sabien</author>
      <category>Field Building</category>
      <guid isPermaLink="false">0a85acb67cde0c8d51f6d1adc4b5dbf9</guid>
      <pubDate>Wed, 10 Dec 2025 18:57:11 +0000</pubDate>
    </item>
    <item>
      <title>Import AI 437: Co-improving AI; RL dreams; AI labels might be annoying</title>
      <link>https://importai.substack.com/p/import-ai-437-co-improving-ai-rl</link>
      <description>Co-improving AI aims to have humans and machines jointly advance AI capabilities with improved safety and transparency, while policy labeling may introduce complex and costly compliance challenges; examples like SimWorld and SIMA 2 illustrate practical avenues for training and self-improvement in embodied agents. The piece surveys safety-focused collaboration, policy implications, and frontier-model–driven agent research directions.</description>
      <author>Jack Clark</author>
      <category>Safety Techniques</category>
      <guid isPermaLink="false">98d42332beb1d63e7de4d5045104c183</guid>
      <pubDate>Mon, 08 Dec 2025 13:31:08 +0000</pubDate>
    </item>
    <item>
      <title>MIRI Newsletter #124</title>
      <link>https://intelligence.org/2025/12/02/miri-newsletter-124/</link>
      <description>Fundraising and governance activities at MIRI aim to extend its operational horizon and advance policy actions to mitigate superintelligence risks, including a draft international agreement and public discourse surrounding AI safety.</description>
      <author>Harlan Stewart</author>
      <category>Governance &amp; Policy</category>
      <guid isPermaLink="false">bf6a498cfb03adda45d7a652bb07d427</guid>
      <pubDate>Wed, 03 Dec 2025 02:24:12 +0000</pubDate>
    </item>
    <item>
      <title>MIRI’s 2025 Fundraiser</title>
      <link>https://intelligence.org/2025/12/01/miris-2025-fundraiser/</link>
      <description>MIRI is running a fundraising campaign targeting $6M (with a 1:1 matching grant on the first $1.6M raised) to finance expanded communications and governance work aimed at alerting the public and policymakers about superintelligence risks and pursuing international coordination to halt a premature race to artificial superintelligence.</description>
      <author>Alex Vermeer</author>
      <category>Governance &amp; Policy</category>
      <guid isPermaLink="false">748c0c7e62dcc3a0cfec90dd27110fa3</guid>
      <pubDate>Tue, 02 Dec 2025 01:11:44 +0000</pubDate>
    </item>
    <item>
      <title>Import AI 436: Another 2GW datacenter; why regulation is scary; how to fight a superintelligence</title>
      <link>https://importai.substack.com/p/import-ai-436-another-2gw-datacenter</link>
      <description>OSGym enables scalable training of AI agents that can operate across multiple operating systems, highlighting growing infrastructure investments like a 2GW compute cluster, and discusses the regulatory and strategic challenges of governing frontier AI and countermeasures against a rogue AI.</description>
      <author>Jack Clark</author>
      <category>Governance &amp; Policy</category>
      <guid isPermaLink="false">e4aa2548537333dbcab66cbf6d3af061</guid>
      <pubDate>Mon, 24 Nov 2025 13:31:41 +0000</pubDate>
    </item>
  </channel>
</rss>