All articles

Chinchilla Scaling: A Replication Attempt
We replicate Hoffmann et al.’s estimation of a parametric scaling law and find issues with their estimates. Our estimates fit the data better and align with Hoffmann et al.’s other approaches.
Tracking Compute-Intensive AI Models
We present a dataset of 81 compute-intensive models, from AlphaGo to Gemini, developed across 18 countries, at the leading edge of scale and capabilities.
Optimally Allocating Compute Between Inference and Training
If it is feasible to trade off inference and training compute, we find that it is optimal for AI labs to spend similar amounts on training and inference.
Algorithmic Progress in Language Models
We study how algorithmic improvements and increases in computational power have improved the performance of language models from 2014 to 2024. We find that the progress from new algorithms surpasses what we'd expect from merely increasing our computing resources, occurring at a pace equivalent to doubling computational power every 5 to 14 months.
Epoch Impact Assessment 2023
Our impact report for 2023.
Biological Sequence Models in the Context of the AI Directives
The White House recently issued an Executive Order requiring enhanced oversight for AI models trained on biological data exceeding 1e23 operations. We provide an overview of our expanded data coverage of biological sequence models, revealing a significant increase in computational resources and the extensive availability of protein and DNA sequence data. Our analysis identifies critical trends in training compute, data stock, and potential regulatory gaps.
How Predictable Is Language Model Benchmark Performance?
We investigate large language model performance across five orders of magnitude of compute scaling, finding that compute-focused extrapolations are a promising way to forecast AI capabilities.
Limits to the Energy Efficiency of CMOS Microprocessors
How much further can the energy efficiency of modern microprocessors be pushed before we hit physical limits? While this question has important implications for future hardware R&D and AI training runs, there has thus far been little work that answers this question comprehensively and rigorously. In our new paper, we analyze the primary sources of heat dissipation in CMOS processors and propose a simple model for forecasting their energy efficiency limits.
AI Capabilities Can Be Significantly Improved Without Expensive Retraining
Scale has been a major driver of performance gains for Large Language Models (LLMs). This can lead to an assumption that an LLM's capabilities are largely set after pre-training. Our latest article challenges this notion and reviews various techniques that can improve the performance of models after training, with minimal additional computational expense. We quantify the benefits and computational costs of these methods and discuss the implications for AI policy.
Who Is Leading in AI? An Analysis of Industry AI Research
The private sector has emerged as a driving force in artificial intelligence, fueled by an explosion of investment in hardware and talent. But which companies are steering the field? Our new article compares leading AI companies by research publications, citations, size of training runs, and contributions to key algorithmic innovations. In this blog post, we summarize the key findings as well as some policy implications.
Challenges in Predicting AI Automation
We review the literature on predicting AI automation of tasks in the economy. There are vast differences between existing approaches, both in methodologies and predictions. We examine the significant challenges these prediction methodologies face, and how predictions relate to empirical evidence so far. This review aims to provide a comprehensive overview for researchers and policymakers on the state of AI automation predictions, their challenges, and their future potential.
Trends in Machine Learning Hardware
We analyze recent trends in machine learning hardware performance, focusing on metrics such as computational performance, memory, interconnect bandwidth, price-performance, and energy efficiency across different GPUs and accelerators. The analysis aims to provide a holistic view of ML hardware capability and bottlenecks.
Announcing Epoch’s Updated Parameter, Compute and Data Trends Database
We are releasing a newly expanded database, which tracks the parameters, datasets, training compute and other details of over 500 notable machine learning systems.
Explosive Growth from AI: A Review of the Arguments
Our new article reviews growth theory and empirical arguments regarding the potential of advanced AI substantially accelerating economic growth. We take stock of the key arguments for why we might or might not expect growth that is on the order of ten-fold the growth rates common in today’s frontier economies once advanced AI systems are widely deployed.
Trading Off Compute in Training and Inference
Some techniques make it possible to increase the performance of machine learning models at the cost of more expensive inference, or to reduce inference compute at the cost of lower performance. This possibility induces a tradeoff between spending more resources on training or on inference. We explore the characteristics of this tradeoff and outline some implications for AI governance.
The Limited Benefit of Recycling Foundation Models
I investigate the benefits of recycling old foundation models to save training costs on large training runs, finding that it seems unlikely that model recycling will result in more than a modest increase in AI capabilities.
Epoch and FRI Mentorship Program Summer 2023
We are launching the Epoch and FRI mentorship program for women, non-binary people, and trans people of all genders.
Direct Approach Interactive Model
The Direct Approach framework bounds the compute requirements for transformative AI by extrapolating neural scaling laws. We combine those estimates with simple models of future progress in algorithms, investment, and compute costs to produce a user-adjustable forecast over the date at which TAI will be achieved.
A Compute-Based Framework for Thinking About the Future of AI
I explain a framework for predicting the future of AI. The framework states that compute is ultimately the most important driver of progress in AI, and that AI will likely dramatically increase the world economic growth rate later this century. I also defend the idea that progress in AI will likely become relatively predictable, allowing us to anticipate AI capabilities before they are fully formed.
Please Report Your Compute
We’ve written an opinion piece pushing for AI researchers and engineers to be more transparent in reporting their compute usage. This can help forecast future developments and risks, inform AI governance and policy, and benefit the broader AI community.
The Direct Approach
Empirical scaling laws can help predict the cross-entropy loss associated with training inputs, such as compute and data. However, in order to predict when AI will achieve some subjective level of performance, it is necessary to devise a way of interpreting the cross-entropy loss of a model. This blog post provides a discussion of one such theoretical method, which we call the Direct Approach.
Power Laws in Speedrunning and Machine Learning
We develop a model for predicting record improvements in video game speedrunning and apply it to predicting machine learning benchmarks. This model suggests that machine learning benchmarks are not close to saturation, and that large sudden improvements are infrequent, but not ruled out.
Announcing Epoch’s Dashboard of Key Trends and Figures in Machine Learning
We are launching a dashboard that provides key data from our research on machine learning, aiming to serve as a valuable resource for understanding the present and future of the field.
Epoch Impact Assessment 2022
Our impact report for 2022.
Trends in the Dollar Training Cost of Machine Learning Systems
I combine training compute and GPU price-performance data to estimate the cost of compute in US dollars for the final training run of 124 machine learning systems published between 2009 and 2022, and find that the cost has grown by approximately 0.5 orders of magnitude per year.
Scaling Laws Literature Review
I have collected a database of scaling laws for different tasks and architectures, and reviewed dozens of papers in the scaling law literature.
An Interactive Model of AI Takeoff Speeds
We have developed an interactive website showcasing a new model of AI takeoff speeds.
Literature Review of Transformative Artificial Intelligence Timelines
We summarize and compare several models and forecasts predicting when transformative AI will be developed.
Revisiting Algorithmic Progress
We use a dataset of over a hundred computer vision models from the last decade to investigate how better algorithms and architectures have enabled researchers to use compute and data more efficiently. We find that every 9 months, the introduction of better algorithms contributes the equivalent of a doubling of compute budgets.
Predicting GPU Performance
We develop a simple model that predicts progress in the performance of field-effect transistor-based GPUs under the assumption that transistors can no longer miniaturize after scaling down to roughly the size of a single silicon atom. Our model forecasts that the current paradigm of field-effect transistor-based GPUs will plateau sometime between 2027 and 2035, offering a performance of between 1e14 and 1e15 FLOP/s in FP32.
Will We Run Out of ML Data? Evidence From Projecting Dataset Size Trends
Based on our previous analysis of trends in dataset size, we project the growth of dataset size in the language and vision domains. We explore the limits of this trend by estimating the total stock of available unlabeled data over the next decades.
Trends in Training Dataset Sizes
We collected a database of notable ML models and their training dataset sizes. We use this database to find historical growth trends in dataset size for different domains, particularly language and vision.
The Longest Training Run
Training runs of large machine learning systems are likely to last less than 14-15 months. This is because longer runs will be outcompeted by runs that start later and therefore use better hardware and better algorithms.
A Time-Invariant Version of Laplace’s Rule
We explore how to estimate the probability of an event given information about past occurrences. We explain a problem with the naive application of Laplace's rule in this context, and suggest a modification to correct it.
Machine Learning Model Sizes and the Parameter Gap
Since 2018, the model size of notable machine learning systems has grown ten times faster than it did before. After 2020, growth has not been entirely continuous: there was a jump of one order of magnitude, which persists today. This is relevant for forecasting model size and thus AI capabilities.
Trends in GPU Price-Performance
Using a dataset of 470 models of graphics processing units released between 2006 and 2021, we find that the number of floating-point operations per second per dollar doubles every ~2.5 years.
Announcing Epoch: A Research Initiative Investigating the Road to Transformative AI
We are a new research initiative forecasting developments in AI. Come join us!
Grokking “Semi-informative priors over AI timelines”
I give visual explanations for Tom Davidson’s report, Semi-informative priors over AI timelines, and summarise the key assumptions and intuitions.
Grokking “Forecasting TAI With Biological Anchors”
I give a visual explanation of Ajeya Cotra’s draft report, Forecasting TAI with biological anchors, summarising the key assumptions, intuitions, and conclusions.
Compute Trends Across Three Eras of Machine Learning
We’ve compiled a dataset of the training compute for over 120 machine learning models, highlighting novel trends and insights into the development of AI since 1952, and what to expect going forward.
Estimating Training Compute of Deep Learning Models
We describe two approaches for estimating the training compute of deep learning systems: counting operations and looking at GPU time.
What’s the Backward-Forward FLOP Ratio for Neural Networks?
Determining the backward-forward FLOP ratio for neural networks, to help calculate their total training compute.
How to Measure FLOP/s for Neural Networks Empirically?
Computing the utilization rate for multiple neural network architectures.