Research
Cutting-edge research, technical deep-dives, and R&D intelligence
Scaling Search Relevance: Augmenting App Store Ranking with LLM-Generated Judgments
Large-scale commercial search systems optimize for relevance to drive successful sessions that help users find what they are looking for. To maximize relevance, we leverage two complementary objectives: behavioral relevance (results users tend to click or download) and textual relevance (a result’s
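Combining a behavioral signal with a textual one usually means blending the two into a single ranking score. The snippet above does not say how the two objectives are combined, so the following is a minimal, hypothetical sketch assuming a simple linear blend; the field names, the `alpha` weight, and the scores are all illustrative, not the paper's actual formulation.

```python
# Hypothetical sketch: blending behavioral and textual relevance into one
# ranking score. The weight `alpha` and the dict fields are illustrative,
# not the paper's actual method.

def blended_score(behavioral: float, textual: float, alpha: float = 0.6) -> float:
    """Linear blend of two relevance signals; alpha weights behavior."""
    return alpha * behavioral + (1.0 - alpha) * textual

def rank(results):
    """Sort candidate results by blended relevance, best first."""
    return sorted(
        results,
        key=lambda r: blended_score(r["behavioral"], r["textual"]),
        reverse=True,
    )

candidates = [
    {"app": "A", "behavioral": 0.9, "textual": 0.2},
    {"app": "B", "behavioral": 0.5, "textual": 0.9},
    {"app": "C", "behavioral": 0.1, "textual": 0.95},
]
print([r["app"] for r in rank(candidates)])
```

With `alpha = 0.6`, app B's balanced scores beat app A's click-heavy profile, which is the kind of trade-off such a blend is meant to surface.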
The Way We Notice, That's What Really Matters: Instantiating UI Components with Distinguishing Variations
Front-end developers author UI components to be broadly reusable by parameterizing visual and behavioral properties. While flexible, this makes instantiation harder, as developers must reason about numerous property values and interactions. In practice, they must explore the component’s large design
CORPGEN advances AI agents for real work
At a glance Today’s AI agent benchmarks test one task at a time, while real workplace productivity requires managing dozens of interdependent tasks at once. To reflect this, we created a setting called Multi-Horizon Task Environments (MHTEs). Under multi-task loads, leading computer-using agents deg
Nano Banana 2: Combining Pro capabilities with lightning-fast speed
Our latest image generation model offers advanced world knowledge, production-ready specs, subject consistency and more, all at Flash speed.
Enhancing Multimodal Training and Memory Efficiency with DeepSpeed
Overview This blog walks through two crucial DeepSpeed updates: (1) a PyTorch-identical backward API that enables efficient training of multimodal, multi-component models (including non-scalar backward calls), and (2) low-precision model training that significantly reduces peak memory, especially. F
All Research (60)
Constructive Circuit Amplification: Improving Math Reasoning in LLMs via Targeted Sub-Network Updates
Prior studies investigating the internal workings of LLMs have uncovered sparse subnetworks, often referred to as circuits, that are responsible for performing specific tasks. Additionally, it has been shown that model performance improvement through fine-tuning often results from the strengthening
A.R.I.S.: Automated Recycling Identification System for E-Waste Classification Using Deep Learning
Traditional electronic recycling processes suffer from significant resource loss due to inadequate material separation and identification capabilities, limiting material recovery. We present A.R.I.S. (Automated Recycling Identification System), a low-cost, portable sorter for shredded e-waste that a
Closing the Gap Between Text and Speech Understanding in LLMs
Large Language Models (LLMs) can be adapted to extend their text capabilities to speech inputs. However, these speech-adapted LLMs consistently underperform their text-based counterparts—and even cascaded pipelines—on language understanding tasks. We term this shortfall the text-speech understanding
Accelerating Autotuning in Helion with Bayesian Optimization
Introduction As introduced in a previous blog post, Helion is a high-level DSL that empowers developers to write high-performance ML kernels using a familiar PyTorch-like syntax, delegating the complex task of optimization to its autotuning engine. This autotuner explores a vast, high-dimensional s
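The core idea behind surrogate-guided autotuning is to fit a cheap probabilistic model of the objective (e.g. kernel runtime) from the configurations measured so far, then probe the configuration the model considers most promising. Below is a toy sketch of that loop over a 1-D "block size" space; the quadratic objective, RBF kernel, and lower-confidence-bound acquisition are illustrative stand-ins, not Helion's actual autotuning engine.

```python
# Toy Bayesian-optimization loop in the spirit of a surrogate-guided
# autotuner. The objective, kernel, and hyperparameters are made up
# for illustration.
import numpy as np

def objective(x):
    # Pretend kernel runtime (lower is better), minimized near x = 64.
    return (x - 64.0) ** 2 / 1000.0 + 1.0

def rbf(a, b, length=16.0):
    # Squared-exponential kernel between two 1-D point sets.
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * length ** 2))

def gp_posterior(X, y, grid, noise=1e-6):
    # Standard zero-mean Gaussian-process posterior on a candidate grid.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, grid)
    mean = Ks.T @ np.linalg.solve(K, y)
    var = np.clip(1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0), 1e-12, None)
    return mean, np.sqrt(var)

grid = np.arange(8, 129, dtype=float)   # candidate block sizes
X = np.array([8.0, 128.0])              # initial probes at the extremes
y = np.array([objective(x) for x in X])
for _ in range(10):
    mean, std = gp_posterior(X, y, grid)
    acq = mean - 2.0 * std              # lower confidence bound (minimizing)
    x_next = grid[int(np.argmin(acq))]  # probe where the model is optimistic
    X = np.append(X, x_next)
    y = np.append(y, objective(x_next))

best = X[int(np.argmin(y))]
print(best)
```

The acquisition term trades off exploiting low predicted runtimes against exploring high-uncertainty regions, which is what lets such a loop beat random search on a vast configuration space.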
Beyond a Single Extractor: Re-thinking HTML-to-Text Extraction for LLM Pretraining
One of the first pre-processing steps for constructing web-scale LLM pretraining datasets involves extracting text from HTML. Despite the immense diversity of web content, existing open-source datasets predominantly apply a single fixed extractor to all webpages. In this work, we investigate whether
depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning Researchers
PyTorch 2.x introduces a compiler designed to accelerate deep learning programs. However, for machine learning researchers, adapting the PyTorch compiler to its full potential can be challenging. The compiler operates at the Python bytecode level, making it appear as an opaque box. To addres
AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding
Recent multimodal large language models (MLLMs) such as GPT-4o and Qwen3-Omni show strong perception but struggle in multi-speaker, dialogue-centric settings that demand agentic reasoning: tracking who speaks, maintaining roles, and grounding events across time. These scenarios are central to multimo
The Potential of CoT for Reasoning: A Closer Look at Trace Dynamics
Chain-of-thought (CoT) prompting is a de-facto standard technique to elicit reasoning-like responses from large language models (LLMs), allowing them to spell out individual steps before giving a final answer. While the resemblance to human-like reasoning is undeniable, the driving forces underpinni
Apple Workshop on Reasoning and Planning 2025
Reasoning and planning are the bedrock of intelligent AI systems, enabling them to plan, interact, adapt, and ultimately, operate independently. At Apple, understanding and advancing reasoning capabilities in AI systems has long been an area of active research, and has resulted in numerous publicat
Gemini 3.1 Pro: A smarter model for your most complex tasks
3.1 Pro is designed for tasks where a simple answer isn’t enough.
Media Authenticity Methods in Practice: Capabilities, Limitations, and Directions
Insights from Microsoft’s Media Integrity and Authentication: Status, Directions, and Futures report. It has become increasingly difficult to distinguish fact from fiction when viewing online images and videos. Resilient, trustworthy technologies can help people determine whether the content they are
After Orthogonality: Virtue-Ethical Agency and AI Alignment
Preface This essay argues that rational people don’t have goals, and that rational AIs shouldn’t have goals. Human actions are rational not because we direct them at some final ‘goals,’ but because we align actions to practices [1] : networks of actions, action-dispositio
Project Silica’s advances in glass storage technology
At a glance Microsoft Research publishes breakthrough in Nature on glass-based data storage that could preserve information for 10,000 years. New technique extends technology from expensive fused silica to ordinary borosilicate glass found in kitchen cookware. Innovations enable faster parallel writ
A new way to express yourself: Gemini can now create music
The Gemini app now features our most advanced music generation model Lyria 3, empowering anyone to make 30-second tracks using text or images.
Accelerating discovery in India through AI-powered science and education
Google DeepMind brings National Partnerships for AI initiative to India, scaling AI for science and education
Pyrefly Now Type Checks PyTorch
We’re excited to share that PyTorch now leverages Pyrefly to power type checking across our core repository, along with a number of projects in the PyTorch ecosystem: Helion, TorchTitan and Ignite. For a project the size of PyTorch, leveraging typing and type checking has long been essential for en
Gemini 3 Deep Think: Advancing science, research and engineering
Our most specialized reasoning mode is now updated to solve modern science, research and engineering challenges.
Accelerating Mathematical and Scientific Discovery with Gemini Deep Think
Research papers point to the growing impact of Deep Think across fields
Accelerating Mamba2 with Kernel Fusion
Summary In this post, we discuss how we optimized the Mamba-2 State-Space Dual (SSD) module with a fused Triton kernel that yields speedups of 1.50x-2.51x on NVIDIA A100 and H100 GPUs. To achieve this, we fused all five SSD kernels into a single Triton kernel with careful synchronization. To our kno
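Fusing kernels means performing several consecutive operations in a single pass so that intermediate results never round-trip through memory. The following is a conceptual sketch of that idea, not the Mamba-2 SSD kernels themselves: computing `y = relu(a*x + b)` in two separate passes that materialize an intermediate buffer, versus one fused pass.

```python
# Conceptual sketch of kernel fusion (illustrative, not the actual
# Mamba-2 SSD Triton kernels): two passes with an intermediate buffer
# vs. one fused pass. Both produce identical results; the fused version
# skips the extra write/read of `tmp`.

def unfused(x, a, b):
    tmp = [a * xi + b for xi in x]             # pass 1: writes an intermediate
    return [max(t, 0.0) for t in tmp]          # pass 2: reads it back

def fused(x, a, b):
    return [max(a * xi + b, 0.0) for xi in x]  # one pass, no intermediate

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
assert fused(x, 3.0, 1.0) == unfused(x, 3.0, 1.0)
print(fused(x, 3.0, 1.0))
```

On a GPU, the saved memory traffic (and kernel-launch overhead) is where speedups like the reported 1.50x-2.51x come from, though fusing five kernels requires the careful synchronization the post describes.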
Some Matrix Multiplication Engines Are Not As Accurate As We Thought
What is an accumulator in an accelerator’s GEMM engine and why does it matter? GPUs and custom accelerators include specialized compute engines for matrix multiplication (also known as matmul or GEMM), such as NVIDIA’s Tensor Cores. These engines efficiently perform matmul on small tenso
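The effect of accumulator precision is easy to demonstrate outside any GEMM engine: repeatedly adding a small term to a low-precision accumulator eventually stalls, because the term falls below half the spacing between representable values. The sketch below uses float16 vs. float32 scalar accumulation as an illustrative stand-in for hardware accumulator registers; the values are made up.

```python
# Sketch of why accumulator precision matters: summing the same small
# term 10,000 times in float16 vs. float32. Illustrative only; real GEMM
# engines accumulate partial dot products in hardware registers.
import numpy as np

n = 10_000
term = np.float16(0.01)        # ~0.01 rounded to fp16

acc16 = np.float16(0.0)
for _ in range(n):
    acc16 = np.float16(acc16 + term)          # round to fp16 after every add

acc32 = np.float32(0.0)
for _ in range(n):
    acc32 = np.float32(acc32 + np.float32(term))

print(float(acc16), float(acc32))
```

The float16 accumulator gets stuck far below the true sum of ~100 once its magnitude grows large enough that each addition rounds to zero, which is exactly the failure mode a narrow hardware accumulator can exhibit on long dot products.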
Building Highly Efficient Inference System for Recommenders Using PyTorch
Why Choose PyTorch for Recommendation Systems PyTorch has emerged as the de facto framework in the AI community, with the majority of cutting-edge research, especially in areas like recommendation systems, retrieval, and ranking, being conducted with PyTorch. Developers are eager to bring the latest
Rethinking imitation learning with Predictive Inverse Dynamics Models
At a glance Imitation learning becomes easier when an AI agent understands why an action is taken. Predictive Inverse Dynamics Models (PIDMs) predict plausible future states, clarifying the direction of behavior during imitation learning. Even imperfect predictions reduce ambiguity, making it cleare
Paza: Introducing automatic speech recognition benchmarks and models for low resource languages
At a glance Microsoft Research releases PazaBench and Paza automatic speech recognition models, advancing speech technology for low-resource languages. Human-centered pipeline for low-resource languages: Built for and tested by communities, Paza is an end-to-end, continuous pipeline that elevates h
Portable Paged Attention in Helion
Recently, the PyTorch team released Helion , a new domain-specific and PyTorch-based language to make the development of high-performing but portable kernels easier. With extensive autotuning built in, Helion has the promise to move the forefront of performance portability further than Triton. To te
Unlock Reasoning in Llama 3.1-8B via Full Fine-Tuning on NVIDIA DGX Spark
What is the unsaid joy of local LLMs? The magic of downloading weights, running some experiments overnight, maybe your room gets a bit toasty, and voila, you create a small but performant model that runs on your desktop. Often this involves a big GPU machine and lots of cables; in our case, it was a
Project Genie: Experimenting with infinite, interactive worlds
Google AI Ultra subscribers in the U.S. can try out Project Genie, an experimental research prototype that lets you create and explore worlds.
Accelerating On-Device ML Inference with ExecuTorch and Arm SME2
Interactive image segmentation has become a defining mobile experience across the world’s most popular apps. In plain terms, you tap (or draw a rough hint) on an image, and the app instantly “cuts out” the object by producing a pixel mask. This enables familiar features such as creating personalized
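The interface contract described above, a seed point in and a pixel mask out, can be illustrated with a classical stand-in. The sketch below uses a simple flood fill instead of a neural network (on-device models like the ones this post covers are learned, not rule-based); the image, seed, and tolerance are made up for illustration.

```python
# Toy stand-in for tap-to-segment: flood-fill the region of similar
# pixels around a tapped point to produce a binary mask. Real on-device
# segmentation uses neural networks; this only illustrates the
# seed-point-in, pixel-mask-out contract.
from collections import deque

def segment(image, seed, tol=0):
    """Return a 0/1 mask of pixels 4-connected to `seed` with similar value."""
    h, w = len(image), len(image[0])
    sy, sx = seed
    target = image[sy][sx]
    mask = [[0] * w for _ in range(h)]
    q = deque([(sy, sx)])
    mask[sy][sx] = 1
    while q:
        y, x = q.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and not mask[ny][nx]
                    and abs(image[ny][nx] - target) <= tol):
                mask[ny][nx] = 1
                q.append((ny, nx))
    return mask

img = [
    [0, 0, 9, 0],
    [0, 9, 9, 0],
    [0, 9, 0, 0],
]
print(segment(img, (1, 1)))  # mask of the connected 9-valued region
```

A learned model replaces the hand-written similarity rule with features robust to lighting, texture, and object boundaries, but the input/output shape of the problem is the same.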
UniRG: Scaling medical imaging report generation with multimodal reinforcement learning
At a glance AI-driven medical image report generation can help medical providers become more efficient and productive. Current models are difficult to train because reporting practices vary widely among providers. Universal Report Generation (UniRG) uses reinforcement learning to align model trainin
PyTorch 2.10 Release Blog
We are excited to announce the release of PyTorch® 2.10 ( release notes )! This release features a number of improvements for performance and numerical debugging. Performance has been a focus for PyTorch throughout the 2.x release series, building on the capabilities of the PyTorch compiler stack in
Multimodal reinforcement learning with agentic verifier for AI agents
At a glance Today’s multimodal AI systems can give answers that sound right but may not be grounded in what they actually observe over time, leading to unpredictable errors and safety risks in real-world settings. Argos is a verification framework for multimodal reinforcement learning that tra
D4RT: Teaching AI to see the world in four dimensions
D4RT: Unified, efficient 4D reconstruction and tracking up to 300x faster than prior methods.
OptiMind: A small language model with optimization expertise
At a glance Many real-world business problems can benefit from optimization, but translating decisions, constraints, and goals from natural language into optimization algorithms is slow. OptiMind is a small language model designed to convert business problems described in natural language into the m
Veo 3.1 Ingredients to Video: More consistency, creativity and control
Our latest Veo update generates lively, dynamic clips that feel natural and engaging — and supports vertical video generation.
NVIDIA Rubin Platform, Open Models, Autonomous Driving: NVIDIA Presents Blueprint for the Future at CES
NVIDIA founder and CEO Jensen Huang took the stage at the Fontainebleau Las Vegas to open CES 2026, declaring that AI is scaling into every domain and every device. “Computing has been fundamentally reshaped as a result of accelerated computing, as a result of artificial intelligence,” Huang said. “
Google's year in review: 8 areas with research breakthroughs in 2025
Google 2025 recap: Research breakthroughs of the year
Gemini 3 Flash: frontier intelligence built for speed
Gemini 3 Flash offers frontier intelligence built for speed at a fraction of the cost.
Gemma Scope 2: helping the AI safety community deepen understanding of complex language model behavior
Open interpretability tools for language models are now available across the entire Gemma 3 family with the release of Gemma Scope 2.
Improved Gemini audio models for powerful voice experiences
As AI Grows More Complex, Model Builders Rely on NVIDIA
Unveiling what it describes as the most capable model series yet for professional knowledge work, OpenAI launched GPT-5.2 in December. The model was trained and deployed on NVIDIA infrastructure, including NVIDIA Hopper and GB200 NVL72 systems. GPT-5.3 Codex — the first OpenAI agentic coding model t
Agent Lightning: Adding reinforcement learning to AI agents without code rewrites
AI agents are reshaping software development, from writing code to carrying out complex instructions. Yet LLM-based agents are prone to errors and often perform poorly on complicated, multi-step tasks. Reinforcement learning (RL) is an approach where AI systems learn to make optimal decisions by rec
Deepening our partnership with the UK AI Security Institute
Google DeepMind and UK AI Security Institute (AISI) strengthen collaboration on critical AI safety and security research
Promptions helps make AI prompting more precise with dynamic UI controls
Anyone who uses AI systems knows the frustration: a prompt is given, the response misses the mark, and the cycle repeats. This trial-and-error loop can feel unpredictable and discouraging. To address this, we are excited to introduce Promptions (prompt + options), a UI framework that helps develop
Strengthening our partnership with the UK government to support prosperity and security in the AI era
Deepening our partnership with the UK government to support prosperity and security in the AI era
FACTS Benchmark Suite: Systematically evaluating the factuality of large language models
Systematically evaluating the factuality of large language models with the FACTS Benchmark Suite.
Engineering more resilient crops for a warming climate
Scientists are using AlphaFold to strengthen a photosynthesis enzyme for resilient, heat-tolerant crops.
AlphaFold: Five years of impact
Explore how AlphaFold has accelerated science and fueled a global wave of biological discovery.
Revealing a key protein behind heart disease
AlphaFold has revealed the structure of a key protein behind heart disease
Google DeepMind supports U.S. Department of Energy on Genesis: a national mission to accelerate innovation and scientific discovery
Google DeepMind and the DOE partner on Genesis, a new effort to accelerate science with AI.
How we’re bringing AI image verification to the Gemini app
Build with Nano Banana Pro, our Gemini 3 Pro Image model
Introducing Nano Banana Pro
Start building with Gemini 3
We’re expanding our presence in Singapore to advance AI in the Asia-Pacific region
Google DeepMind opens a new Singapore research lab, accelerating AI progress in the Asia-Pacific region.