Machine Learning Hardware Documentation


Overview

Epoch AI’s Machine Learning Hardware dataset is a collection of AI accelerators, such as graphics processing units (GPUs) and tensor processing units (TPUs), used to develop and deploy machine learning models in the deep learning era.

This documentation describes the processors included in the dataset, defines its records and data fields, and concludes with a changelog and acknowledgements.

The data is available on our website as a visualization or table, and can be downloaded as a CSV file, updated daily. For a quick-start example of loading the data and working with it in your research, see this Google Colab demo notebook.
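
For instance, the CSV can be loaded directly with pandas. The snippet below is a minimal sketch: the download URL shown is a placeholder assumption, so check the website for the current file location, and column names follow the field names documented later in this page.

# Minimal sketch for loading the dataset with pandas.
# The URL below is a placeholder assumption; see
# https://epochai.org/data/machine-learning-hardware for the current
# link to the daily-updated CSV export.
import pandas as pd

CSV_URL = "https://epochai.org/data/machine_learning_hardware.csv"
hardware = pd.read_csv(CSV_URL)

# Inspect the number of records and the available fields.
print(hardware.shape)
print(hardware.columns.tolist())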

If you would like to ask any questions about the data, or suggest hardware that should be added, feel free to contact us at data@epochai.org.

If this data is useful for you, please cite it as follows:

Citation

Epoch AI, ‘Data on Machine Learning Hardware’. Published online at epochai.org. Retrieved from: ‘https://epochai.org/data/machine-learning-hardware’ [online resource]

BibTeX citation

@misc{EpochMLHardware2024,
  title = {Data on Machine Learning Hardware},
  author = {{Epoch AI}},
  year = {2024},
  url = {https://epochai.org/data/machine-learning-hardware},
  note = {Accessed: }
}

Inclusion

This dataset focuses on machine learning processors. These are processors used to train and deploy ML and AI models, especially those included in our Notable AI Models dataset. Here we explain the inclusion and search process and give an overview of data sources.


Inclusion criteria

To identify ML hardware, we annotated the chips used for ML training in our database of Notable AI Models. We additionally added ML hardware that has not been documented as training those systems, but is clearly manufactured for ML, based on its description, its supported numerical formats, or its membership in the same chip family as other ML hardware.

We use hardware datasheets, documented for each chip in the dataset, to fill in key information such as computing performance and die size. Not all information is available, or even applicable, for all hardware, so columns can be left empty. We additionally use other sources, such as news coverage or hardware price archives, to fill in the release price.

Records

This dataset has fields containing various processor details, attributes, and specifications. Records in the dataset have information about three broad areas:

Specifications about the processors, such as their clock speed, memory capacity, and performance.

Provenance details, such as the manufacturer and release date.

Metadata, such as sources containing information about the hardware, and a list of models it has been used to produce.
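
As a rough illustration of this grouping, the sketch below sorts a few of the documented fields into the three areas. The field names are taken from the guide later in this page; the exact column headers in the CSV export may differ slightly.

# Illustrative grouping of documented fields into the three broad areas.
# Field names follow this documentation; exact CSV headers may differ.
FIELD_AREAS = {
    "specifications": [
        "FP32 (single precision) performance (FLOP/s)",
        "Memory size per board (byte)",
        "Base clock (MHz)",
    ],
    "provenance": [
        "Manufacturer",
        "Release date",
        "Release price (USD)",
    ],
    "metadata": [
        "Link to datasheet",
        "Source for the price",
        "ML models",
    ],
}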

We provide a comprehensive guide to the data fields below. This includes examples taken from the NVIDIA A100 SXM4 40 GB datacenter GPU, one of the most widely used processors for machine learning. If you would like to request that a field be added, contact us at data@epochai.org.

Each field is described below, followed by an example value from the NVIDIA A100 SXM4 40 GB.

Name of the hardware: The full name of the hardware, including the manufacturer, for example “Google TPU v5p”. Note that there can be different variations of hardware based on similar chips, which should be named distinctly, for example “NVIDIA H100 SXM5 80GB” versus “NVIDIA H100 PCIe”. Example: NVIDIA A100 SXM4 40 GB
Manufacturer: The manufacturer of the hardware. Example: NVIDIA
Type: Indicates whether the hardware is a central processing unit (CPU), graphics processor (GPU), or tensor processor (TPU). For a small number of other experimental accelerators, such as the Meta MTIA series, this is “Other”. Example: GPU
Release date: The date when the hardware could first be rented, used for machine learning workloads, or purchased (excluding pre-orders). Example: 2020-05-14
Release price (USD): Price of the processor when released, in nominal US dollars. Prices are collected from hardware catalogs, news sources, or other documentation. Listed prices do not reflect bulk discounts. Example: 15,000
FP64 (double precision) performance (FLOP/s): These are performance figures for non-tensor operations, at different numerical precisions. Beginning in 2017, ML hardware added tensor cores specifically to optimize tensor operations, which are commonly used in AI training. Example: 9.7e12
FP32 (single precision) performance (FLOP/s): These are performance figures for non-tensor operations, at different numerical precisions. Beginning in 2017, ML hardware added tensor cores specifically to optimize tensor operations, which are commonly used in AI training. Example: 1.9e13
FP16 (half precision) performance (FLOP/s): These are performance figures for non-tensor operations, at different numerical precisions. Beginning in 2017, ML hardware added tensor cores specifically to optimize tensor operations, which are commonly used in AI training. FP16 data excludes processors with greater performance in FP32 than in FP16, because these are not designed to support half-precision calculations. Example: 7.8e13
TF32 (TensorFloat-32) performance (FLOP/s): These are performance figures for tensor operations, specifically optimized for AI training. Example: 1.6e14
Tensor-FP16 performance (FLOP/s): These are performance figures for tensor operations, specifically optimized for AI training. Example: 3.1e14
INT16 performance (OP/s): These are performance figures for integer operations, at different numerical precisions.
INT8 performance (OP/s): These are performance figures for integer operations, at different numerical precisions. Example: 6.2e14
INT4 performance (OP/s): These are performance figures for integer operations, at different numerical precisions.
Memory size per board (byte): The hardware’s amount of memory, in bytes. Example: 4.0e10
Memory bandwidth (byte/s): Rate of data transfer between memory and processor, in bytes per second. Example: 1.6e12
Intranode bandwidth (byte/s): Data transfer rate within a single node, in bytes per second. Nodes typically consist of servers which may contain CPUs, GPUs, memory, storage, etc. Example: 6.0e11
Internode bandwidth (bit/s): Data transfer rate between separate nodes, in bits per second. Nodes typically consist of servers which may contain CPUs, GPUs, memory, storage, etc. Example: 2.0e11
Die size (mm^2): The physical size or area of the processing chip, in square millimeters. Example: 826
TDP (W): Thermal design power, the theoretical maximum power that can be dissipated as heat. In theory, this is the maximum sustainable power draw for a given chip. Example: 400
Base clock (MHz): Default operating frequency of the processor, in megahertz. Example: 1095
Boost clock (MHz): Maximum operating frequency of the processor, in megahertz. Example: 1410
Memory clock (MHz): Operating frequency of the processor’s memory, in megahertz. Example: 1215
Memory bus (bit): Amount of data that can be transferred between the memory and processor per cycle, in bits. Example: 5120
Tensor cores: Number of tensor cores, a specialized NVIDIA hardware component designed to accelerate matrix and tensor operations. Example: 432
Process size (nm): Nominal semiconductor manufacturing scale, in nanometers. Example: 7
Foundry: The semiconductor manufacturer responsible for producing the processor die or chip in a foundry or fabrication plant. Example: TSMC
Number of transistors (millions): Number of transistors in the processor, in millions. Example: 54200
Link to datasheet: Links to document(s) containing specifications or data about the processor. Example: https://www.techpowerup.com/gpu-specs/a100-sxm4-40-gb.c3506
Source for the price: Link to source(s) listing the price of the hardware. Example: https://www.nextplatform.com/2022/05/09/how-much-of-a-premium-will-nvidia-charge-for-hopper-gpus/
ML models: ML models trained with this hardware, cross-referenced from our database of models. Example: Florence, Luminous-supreme, Falcon-180B, GPT-3.5 (text-davinci-003), GPT-4, StableLM-Base-Alpha-7B, Phi-1.5, WeLM, GLM-130B, BlenderBot 3, GPT-NeoX-20B, TinyLlama-1.1B (1T token checkpoint), TinyLlama-1.1B (3T token checkpoint), StableLM-2-1.6B, DINOv2, Stable Code 3B, Falcon-7B, Qarasu-14B, Flan T5-XXL + BLIP-2, BLIP-2 (Q-Former), Swin Transformer V2, SPHINX (Llama 2 13B), EVA-01, CoRe, InstructBLIP, xTrimoPGLM-100B, MPT-7B (base), Pythia-12b, Pythia-2.8b, Pythia-6.9b, Pythia-160m, Pythia-1b, Pythia-1.4b, Pythia-70m, Pythia-410m, PLaMo-13B, Falcon 2 11B
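
As a usage sketch, once the CSV is loaded (see the Overview), the example record above can be looked up and its fields read directly. The column names here follow the documented field names and may differ slightly in the CSV export.

# Look up the example record documented above.
# Assumes `hardware` is the DataFrame from the loading sketch in the Overview,
# and that CSV column headers match the field names documented here.
a100 = hardware[hardware["Name of the hardware"] == "NVIDIA A100 SXM4 40 GB"].iloc[0]

print(a100["Release date"])                         # 2020-05-14
print(a100["Tensor-FP16 performance (FLOP/s)"])     # 3.1e14 FLOP/s
print(a100["Memory size per board (byte)"] / 1e9)   # memory in GB, i.e. 40.0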

Changelog

2024-10-23

Dataset published to epochai.org.

Acknowledgements

The data have been collected by Epoch AI’s employees and collaborators, including Marius Hobbhahn, Lennart Heim, Gökçe Aydos, Robi Rahman, Josh You, Bartosz Podkanowicz, Luke Frymire, Natalia Martemianova, and James Sanders.

This documentation was written by Robi Rahman and David Owen.