Introduction

Over the last ten years, we have witnessed a dramatic increase in the computational resources dedicated to training state-of-the-art AI models. This strategy has been incredibly productive, translating into large gains in generality and performance. For example, we estimate that about two-thirds of the improvements in performance in language models in the last decade have been due to increases in model scale.

Given the central role of scaling, it is important to track how the computational resources (‘compute’) used to train models have grown in recent years. In this short article, we provide an updated view of the trends so far, having collected three times more data since our last analysis.

We tentatively conclude that compute growth in recent years is currently best described as increasing by a factor of 4-5x/year. We find consistent growth rates across recent notable models, the running top 10 of models by compute, recent large language models, and the top models released by OpenAI, Google DeepMind and Meta AI.

There are some unresolved uncertainties. We cannot rule out that the overall trend of compute has accelerated. We also find evidence of a slowdown of growth in the frontier around 2018, which complicates the interpretation; frontier growth since 2018 is better described as a 4x/year trend. We also find a significantly faster trend for notable language models overall, which have grown as fast as 9x/year between June 2017 and May 2024. However, when we focus on the frontier of language models, the trend slows to a ~5x/year pace after the largest language models catch up with the overall frontier in AI around mid-2020.

All in all, we recommend summarizing the recent trend of compute growth for notable and frontier models with the 4-5x/year figure. This should also be used as a baseline for expectations of growth in the future, before taking into account additional considerations such as possible bottlenecks or speed-ups.

Figure 1: Summary of the compute growth trends we found for overall notable models (top left), frontier models (top right), top language models (bottom left) and top models within leading companies (bottom right). All point to a recent trend of 4-5x/year growth.

The overall trend of training compute growth has held

We have previously investigated the trend of growing training compute. In 2022, we found that the amount of compute used to train notable ML models had grown about 4x per year from 2010 to 2022.1 Many notable models have been released since, and we have expanded our database by tripling the number of compute estimates,2 so an update is in order.

In short, we find that the amount of compute used to train notable models has grown about 4.1x/year (90% CI: 3.7x to 4.6x) from 2010 to May 2024.3 If we look at the trend since our last update in February 2022, we find that training compute has grown about 4.4x per year (90% CI: 1.5x to 11.8x) – note the wide confidence interval. Both estimates are consistent with our previous estimate, suggesting that the trend has held.
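The core of such an estimate is a simple log-linear regression: fit a line to log10(training compute) against release date, then convert the slope into a multiplicative factor per year. The sketch below uses illustrative made-up values, not our actual dataset or estimation pipeline:

```python
import numpy as np

# Hypothetical data: release dates (decimal years) and training compute (FLOP).
years = np.array([2010.5, 2012.1, 2014.3, 2016.0, 2018.2, 2020.5, 2022.9, 2024.3])
flop = np.array([1e15, 1e16, 5e17, 2e19, 6e20, 3e23, 2e24, 5e25])

# Fit log10(compute) = intercept + slope * year;
# the slope is in orders of magnitude per year.
slope, intercept = np.polyfit(years, np.log10(flop), deg=1)

# Convert to a multiplicative growth factor per year.
growth_per_year = 10 ** slope
print(f"growth factor: {growth_per_year:.1f}x/year")
```

Confidence intervals on the growth factor can then be obtained by, for example, bootstrap resampling of the models before refitting.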

For context, a notable model is one that meets our notability criteria: it must a) be presented in a highly cited paper (>1000 citations), b) demonstrate a state-of-the-art result on a well-recognized benchmark, c) be of historical importance, or d) be deployed in a product with significant use. Our database of notable models is not comprehensive: many models are either missing or lack a training compute estimate. See more details about our data in our documentation.

Figure 2: Trend in the amount of compute used to train the ML models at the frontier of scale.

We also study possible changes in the trend of compute growth. We consider a discrete acceleration of the growth rate, a discontinuous break of the exponential trend, and hyperbolic growth. We find that none of these models is robustly preferred over simple exponential growth, with some caveats.4 We ultimately cannot rule out an acceleration from the data we have, but further investigating this possibility is outside the scope of this work.
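To illustrate how such a model comparison can work, the sketch below fits a simple exponential (linear in log space) and a hyperbolic model, log10 C = a - b * log10(t_s - t), to synthetic data, and compares them with BIC, which penalizes the hyperbolic model's extra parameter (the singularity date t_s). This is only an illustration of the method on made-up data, not our actual analysis:

```python
import numpy as np

def bic(sse, n, k):
    # Gaussian-error BIC: lower is better; penalizes extra parameters.
    return n * np.log(sse / n) + k * np.log(n)

# Synthetic data generated from a clean exponential trend plus noise.
rng = np.random.default_rng(1)
years = np.linspace(2010, 2024, 40)
log_flop = 14 + 0.65 * (years - 2010) + rng.normal(0, 0.5, years.size)
n = years.size

# Model 1: simple exponential growth (linear in log space), 2 parameters.
coef = np.polyfit(years, log_flop, 1)
sse_exp = np.sum((np.polyval(coef, years) - log_flop) ** 2)

# Model 2: hyperbolic growth, fit by grid-searching the singularity
# date t_s and fitting a line in the transformed variable; 3 parameters.
best_sse = np.inf
for t_s in np.arange(2025.0, 2100.0, 1.0):
    x = -np.log10(t_s - years)
    c = np.polyfit(x, log_flop, 1)
    best_sse = min(best_sse, np.sum((np.polyval(c, x) - log_flop) ** 2))

print(bic(sse_exp, n, 2), bic(best_sse, n, 3))
```

Since the synthetic data here is genuinely exponential, the extra flexibility of the hyperbolic model buys little fit, and the BIC penalty typically favors the simpler model.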

Of course, the models that interest us most are those at or near the frontier. We now focus on models that were in the top 10 of training compute when they were released, which we will refer to as frontier models. The trend that we observe in frontier models is similar to the recent trend in overall models, increasing at 5.3x/year (90% CI: 4.9x to 5.7x) between 2010 and May 2024.

Note that we used a different definition of frontier models in our previous paper.5 We now prefer our updated, simpler definition of frontier models, as models in the running top 10 of training compute. This new definition has the advantage of being less sensitive to outliers and to models with low compute, and we have found that it corresponds better to what we intuitively consider frontier models. Our new definition has consequences for the results we highlight, as the trend among frontier models according to our new definition is noticeably faster than with our previous definition.6 We also investigate whether the trend in frontier models is sensitive to the choice of how many models to include in the running top, and conclude that it is not.7
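Operationally, the running top 10 can be computed in a single chronological pass: a model counts as frontier if, at its release, its training compute ranks in the top k among all models released so far. The sketch below uses toy made-up data and a smaller cutoff (k=2) so the effect is visible with only a handful of models; the analysis itself uses k=10:

```python
def running_top_k(models, k):
    """Flag each model as frontier if its compute is in the top k
    among all models released up to and including it."""
    frontier, seen = [], []
    for name, flop in models:  # assumed sorted by release date
        seen.append(flop)
        cutoff = sorted(seen, reverse=True)[:k][-1]  # k-th largest so far
        if flop >= cutoff:
            frontier.append(name)
    return frontier

# Toy data (name, training FLOP), sorted by release date; values are made up.
models = [("A", 1e20), ("B", 5e19), ("C", 2e21), ("D", 1e18),
          ("E", 3e21), ("F", 8e17), ("G", 4e22)]
print(running_top_k(models, k=2))
```

Note that under this definition a model can only be frontier at release; it is never retroactively promoted, but later, larger models can push earlier ones out of the running top.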

Now, we turn to the question of whether the trend in frontier models has changed over time. Previous research has found some suggestive evidence of a deceleration of growth in frontier models. In particular, in our February 2022 paper we found a slower growth rate for large-scale models in the 2016 to 2022 period.8 More recently, Lohn (2023) analyzed Epoch data and found a slowing trend of compute for frontier models after around 2017.9

With our updated definition of frontier models, we are able to replicate a similar result. We find strong evidence of a slowdown of growth sometime in the last decade. While alternative descriptions are possible, we prefer summarizing the trend as slowing down to 4.2x/year (90% CI: 3.6x to 4.9x) after 2018 (80% CI: Oct 2016 to Aug 2019). See Figure 3.10

Figure 3: Trend representing the slowdown of compute growth in the frontier of scale, excluding AlphaGo Master and AlphaGo Zero. Note that the growth rates and date of the slowdown aren’t robust, as they are sensitive to the date of the break, whether we include compute outliers and whether we allow for a discontinuous break.
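One simple way to model such a slowdown is a continuous two-segment ("hinge") regression in log space, grid-searching the break date for the fit with the lowest squared error. The sketch below runs on synthetic data with a built-in kink around 2018; the actual analysis additionally considers discontinuous breaks and outlier exclusion:

```python
import numpy as np

def two_segment_fit(years, log_flop, break_year):
    """Continuous piecewise-linear fit in log space with a kink at break_year."""
    # Design matrix: intercept, slope before the break, extra slope after it.
    X = np.column_stack([
        np.ones_like(years),
        years - break_year,
        np.maximum(years - break_year, 0.0),  # hinge term
    ])
    coef, *_ = np.linalg.lstsq(X, log_flop, rcond=None)
    sse = np.sum((X @ coef - log_flop) ** 2)
    return coef, sse

# Synthetic data: fast growth up to 2018, slower afterwards, plus noise.
years = np.linspace(2012, 2024, 25)
log_flop = np.where(years < 2018,
                    15 + 0.9 * (years - 2012),
                    15 + 0.9 * 6 + 0.6 * (years - 2018))
log_flop = log_flop + np.random.default_rng(0).normal(0, 0.2, years.size)

# Grid-search the break date; keep the fit with the lowest squared error.
best = min(((b, *two_segment_fit(years, log_flop, b))
            for b in np.arange(2014.0, 2022.0, 0.25)),
           key=lambda t: t[-1])
break_year, coef, _ = best
pre, post = 10 ** coef[1], 10 ** (coef[1] + coef[2])
print(f"break ~{break_year:.2f}: {pre:.1f}x/yr before, {post:.1f}x/yr after")
```

Confidence intervals on the break date and segment slopes can then be obtained by bootstrapping this whole procedure.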

We don’t have enough information to discern what caused this slowdown. One plausible story is that between 2010 and 2018 we were in a ‘compute overhang’, where ML experiments were small-scale relative to available compute. Developers realized that deep learning methods could be scaled with good results, and so pushed to increase performance using small but rapidly growing GPU clusters. However, growing beyond ~1e22 FLOP required large dedicated clusters and better hardware, and so the trend slowed down. That said, we are not confident in this explanation.

Note that for deriving this trend we excluded two compute outliers: AlphaGo Master and AlphaGo Zero. These are non-central examples of modern machine learning – they are game-playing systems that were trained through an intensive reinforcement learning process, and most of their training compute was used in game simulations to generate data. When included, they single-handedly warp the trend of compute in frontier models, suggesting faster growth before their release and slower afterwards. In particular, if we include the AlphaGo models the best fit becomes a model with a discrete slowdown in 2018 (90% CI: Dec 2016 to Jan 2018) and a recent growth rate of 3.3x/year (90% CI: 2.8x to 4.1x).

Our takeaway is that growth in the frontier in recent years is best described as 4x/year, but we are not confident in this interpretation – the trend is sensitive to many choices, including whether we allow for discontinuous growth and whether we exclude compute outliers.

Language models caught up to the frontier around 2020

Currently, the most relevant models to pay attention to are language models (including multimodal models like GPT-4). We have seen major advances in generative language models in recent years, and they power some of the most successful applications such as ChatGPT.

We focus on models from the release of the Transformer in June 2017 onwards, as our data before then is scarce.11 Training compute for notable language models has grown noticeably faster than for notable models overall, at 9.5x/year (90% CI: 7.4x to 12.2x) between June 2017 and May 2024. The central estimate is almost twice as fast as the trend for notable models, albeit with significant uncertainty.

The faster growth trend might reflect how language models started from a low baseline and climbed up the frontier of the field after gaining popularity, before decelerating once they reached the frontier. We investigate the running top 10 language models by compute, and find evidence of a deceleration around mid-2020 (90% CI: April 2019 to Dec 2021) – close to the launch of GPT-3. Growth after the deceleration and until May 2024 is around 5.0x/year (90% CI: 3.1x to 7.3x) – similar to the overall trend of frontier models (see Figure 4).