.. SPDX-License-Identifier: GPL-2.0

============
Introduction
============

The Linux compute accelerators subsystem is designed to expose compute
accelerators in a common way to user-space and provide a common set of
functionality.

These devices can be either stand-alone ASICs or IP blocks inside an SoC/GPU.
Although these devices are typically designed to accelerate
Machine-Learning (ML) and/or Deep-Learning (DL) computations, the accel layer
is not limited to handling these types of accelerators.

Typically, a compute accelerator will belong to one of the following
categories:

- Edge AI - doing inference at an edge device. It can be an embedded ASIC/FPGA,
  or an IP inside an SoC (e.g. a laptop web camera). These devices are
  typically configured using registers and can work with or without DMA.

- Inference data-center - single/multi-user devices in a large server. This
  type of device can be stand-alone or an IP inside an SoC or a GPU. It will
  have on-board DRAM (to hold the DL topology), DMA engines and
  command submission queues (either kernel or user-space queues).
  It might also have an MMU to manage multiple users and might also enable
  virtualization (SR-IOV) to support multiple VMs on the same device. In
  addition, these devices will usually come with some tools, such as a
  profiler and a debugger.

- Training data-center - similar to inference data-center cards, but typically
  with more computational power and memory bandwidth (e.g. HBM), and likely
  with a method of scaling up/out, i.e. connecting to other training cards
  inside the same server or in other servers, respectively.

All these devices typically have different runtime user-space software stacks
that are tailor-made for their h/w. In addition, they will probably also
include a compiler to generate programs for their custom-made computational
engines. Typically, the common layer in user-space will be the DL frameworks,
such as PyTorch and TensorFlow.

Sharing code with DRM
=====================

Because this type of device can be an IP inside a GPU or have characteristics
similar to those of GPUs, the accel subsystem will use the DRM subsystem's
code and functionality, i.e. the accel core code will be part of the DRM
subsystem and an accel device will be a new type of DRM device.

This will allow us to leverage the extensive DRM code-base and
collaborate with DRM developers who have experience with this type of
device. In addition, new features that will be added for the accelerator
drivers can be of use to GPU drivers as well.

Differentiation from GPUs
=========================

Because we want to prevent the extensive user-space graphic software stack
from trying to use an accelerator as a GPU, the compute accelerators will be
differentiated from GPUs by using a new major number and new device char files.

Furthermore, the drivers will be located in a separate place in the kernel
tree - drivers/accel/.

The accelerator devices will be exposed to user space with the dedicated
major number 261 and will follow this convention:

- device char files - /dev/accel/accel\*
- sysfs             - /sys/class/accel/accel\*/
- debugfs           - /sys/kernel/debug/accel/\*/
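
As a minimal user-space sketch of this convention, the following program opens
the first accelerator node and queries the driver name through libdrm's
drmGetVersion(). The accel0 minor is an assumption; substitute whatever node
your driver actually creates::

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <xf86drm.h>    /* libdrm: drmGetVersion(), drmFreeVersion() */

    int main(void)
    {
            /* Accel nodes live under /dev/accel/, not /dev/dri/. */
            int fd = open("/dev/accel/accel0", O_RDWR);

            if (fd < 0) {
                    perror("open /dev/accel/accel0");
                    return 1;
            }

            /* The core DRM version query is assumed usable on accel nodes. */
            drmVersionPtr ver = drmGetVersion(fd);

            if (ver) {
                    printf("driver: %s\n", ver->name);
                    drmFreeVersion(ver);
            }

            close(fd);
            return 0;
    }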

Getting Started
===============

First, read the DRM documentation at Documentation/gpu/index.rst.
Not only will it explain how to write a new DRM driver, it also contains
all the information on how to contribute, the Code Of Conduct and the
expected coding style and documentation. All of that applies to the
accel subsystem as well.

Second, make sure the kernel is configured with CONFIG_DRM_ACCEL.
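
For example, as a minimal .config fragment (assuming CONFIG_DRM_ACCEL still
depends on CONFIG_DRM, as defined in drivers/accel/Kconfig)::

    CONFIG_DRM=y
    CONFIG_DRM_ACCEL=y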

To expose your device as an accelerator, two changes need to be made
in your driver (as opposed to a standard DRM driver); both are shown in
the sketch after this list:

- Add the DRIVER_COMPUTE_ACCEL feature flag in your drm_driver's
  driver_features field. It is important to note that this driver feature is
  mutually exclusive with DRIVER_RENDER and DRIVER_MODESET. Devices that want
  to expose both graphics and compute device char files should be handled by
  two drivers that are connected using the auxiliary bus framework.

- Change the open callback in your driver fops structure to accel_open().
  Alternatively, your driver can use the DEFINE_DRM_ACCEL_FOPS macro to easily
  set the correct file operations structure.
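
A minimal sketch of both changes for a hypothetical driver (all names prefixed
with my_accel are made up for illustration; DEFINE_DRM_ACCEL_FOPS, accel_open()
and DRIVER_COMPUTE_ACCEL come from the DRM/accel core)::

    #include <drm/drm_accel.h>
    #include <drm/drm_drv.h>

    /* Expands to a struct file_operations whose .open is accel_open(). */
    DEFINE_DRM_ACCEL_FOPS(my_accel_fops);

    static const struct drm_driver my_accel_driver = {
            /* Mutually exclusive with DRIVER_RENDER and DRIVER_MODESET. */
            .driver_features = DRIVER_COMPUTE_ACCEL,
            .fops            = &my_accel_fops,
            .name            = "my_accel",
            .desc            = "Example compute accelerator",
    };

Device allocation and registration then follow the standard DRM flow
(e.g. devm_drm_dev_alloc() and drm_dev_register()); with DRIVER_COMPUTE_ACCEL
set, the core creates the /dev/accel/accel\* node instead of the /dev/dri/
ones.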

External References
===================

email threads
-------------

* `Initial discussion on the New subsystem for acceleration devices <https://lore.kernel.org/lkml/CAFCwf11=9qpNAepL7NL+YAV_QO=Wv6pnWPhKHKAepK3fNn+2Dg@mail.gmail.com/>`_ - Oded Gabbay (2022)
* `patch-set to add the new subsystem <https://lore.kernel.org/lkml/20221022214622.18042-1-ogabbay@kernel.org/>`_ - Oded Gabbay (2022)

Conference talks
----------------

* `LPC 2022 Accelerators BOF outcomes summary <https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html>`_ - Dave Airlie (2022)