- .. SPDX-License-Identifier: GPL-2.0
- ================
- Perf ring buffer
- ================
- .. CONTENTS
- 1. Introduction
- 2. Ring buffer implementation
- 2.1 Basic algorithm
- 2.2 Ring buffer for different tracing modes
- 2.2.1 Default mode
- 2.2.2 Per-thread mode
- 2.2.3 Per-CPU mode
- 2.2.4 System wide mode
- 2.3 Accessing buffer
- 2.3.1 Producer-consumer model
- 2.3.2 Properties of the ring buffers
- 2.3.3 Writing samples into buffer
- 2.3.4 Reading samples from buffer
- 2.3.5 Memory synchronization
- 3. The mechanism of AUX ring buffer
- 3.1 The relationship between AUX and regular ring buffers
- 3.2 AUX events
- 3.3 Snapshot mode
- 1. Introduction
- ===============
- The ring buffer is a fundamental mechanism for data transfer. perf uses
- ring buffers to transfer event data from the kernel to user space. A
- second kind of ring buffer, the auxiliary (AUX) ring buffer, carries
- hardware trace data for tracers such as Intel PT and Arm CoreSight.
- The ring buffer implementation is critical, and it is also challenging.
- On the one hand, the kernel and the perf tool in user space use the
- ring buffer to exchange data, which the tool then stores into a data
- file, so the ring buffer must sustain high throughput; on the other
- hand, managing the ring buffer must not add enough overhead to distort
- the profiling results.
- This document covers the perf ring buffer in two parts: the first
- explains the regular ring buffer implementation, and the second
- discusses the AUX ring buffer mechanism.
- 2. Ring buffer implementation
- =============================
- 2.1 Basic algorithm
- -------------------
- A typical ring buffer is managed by a head pointer and a tail pointer;
- the writer advances the head pointer and the reader advances the tail
- pointer.
- ::
-         +---------------------------+
-         |   |   |***|***|***|   |   |
-         +---------------------------+
-                 `-> Tail    `-> Head
-         * : the data is filled by the writer.
-                 Figure 1. Ring buffer
- Perf manages its ring buffer in the same way. The implementation keeps
- two key data structures in a set of consecutive pages: the control
- structure, followed by the ring buffer itself. The page holding the
- control structure is known as the "user page". Because both live in
- contiguous virtual addresses, locating the ring buffer is simple: it
- starts in the page right after the user page.
- The control structure is named ``perf_event_mmap_page``; it contains a
- head pointer ``data_head`` and a tail pointer ``data_tail``. When the
- kernel starts to fill records into the ring buffer, it updates the head
- pointer to reserve the memory so that it can later safely store events
- into the buffer. On the other side, when the user page is mapped
- writable, the perf tool is allowed to update the tail pointer after
- consuming data from the ring buffer. The remaining case, a read-only
- mapping of the user page, is addressed in the section
- :ref:`writing_samples_into_buffer`.
- ::
-       user page                          ring buffer
-     +---------+---------+   +---------------------------------------+
-     |data_head|data_tail|...|   |   |***|***|***|***|***|   |   |   |
-     +---------+---------+   +---------------------------------------+
-         `          `----------------^                   ^
-          `----------------------------------------------|
-               * : the data is filled by the writer.
-                             Figure 2. Perf ring buffer
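The layout in Figure 2 can be sketched in C as follows. This is only a minimal illustration: the helper name ``ring_data_ptr()`` and its parameters are hypothetical and are not part of the kernel or the perf tool. It assumes the data area starts exactly one page after the start of the mapping and that its size is a power of two, so a free-running position can be reduced to a buffer offset with a mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: address of the byte at free-running position
 * 'pos' in a mapping made of one user page followed by 'data_size'
 * bytes of ring buffer. 'data_size' must be a power of two.
 */
static inline unsigned char *ring_data_ptr(void *base, size_t page_size,
					   uint64_t data_size, uint64_t pos)
{
	/* A power-of-two size lets us wrap with a mask instead of '%'. */
	assert(data_size && (data_size & (data_size - 1)) == 0);

	/* Skip the user page, then wrap the position into the data area. */
	return (unsigned char *)base + page_size + (pos & (data_size - 1));
}
```

With a 4096-byte page and an 8192-byte data area, position 0 maps to ``base + 4096`` and position ``8192 + 16`` wraps around to ``base + 4096 + 16``.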
- When using the ``perf record`` tool, we can specify the ring buffer size
- with the option ``-m`` or ``--mmap-pages=``; the given size is rounded up
- to a power of two that is a multiple of the page size. Though the kernel
- allocates all memory pages at once, it defers mapping the pages into the
- VMA until the perf tool accesses the buffer from user space. In other
- words, the first time the perf tool accesses a page of the buffer from
- user space, a data abort exception for the page fault is taken and the
- kernel uses this occasion to map the page into the process VMA
- (see ``perf_mmap_fault()``), so the perf tool can continue to access
- the page after returning from the exception.
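The rounding rule for ``-m`` can be illustrated with a small sketch. The function names ``roundup_pow2()`` and ``mmap_data_bytes()`` are hypothetical and only model the arithmetic described above; they are not the perf tool's own helpers. The result covers the data area only; as Figure 2 shows, the user page comes in addition to it.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two that is >= n (n must be in 1 .. 2^63). */
static uint64_t roundup_pow2(uint64_t n)
{
	uint64_t p = 1;

	assert(n > 0 && n <= (UINT64_C(1) << 63));
	while (p < n)
		p <<= 1;
	return p;
}

/* Size in bytes of the data area for a requested number of pages. */
static uint64_t mmap_data_bytes(uint64_t requested_pages, uint64_t page_size)
{
	return roundup_pow2(requested_pages) * page_size;
}
```

For instance, a request for 5 pages with a 4096-byte page size is rounded up to 8 pages, i.e. 32768 bytes of data area.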
- 2.2 Ring buffer for different tracing modes
- -------------------------------------------
- perf profiles programs in different modes: default mode, per-thread
- mode, per-CPU mode, and system wide mode. This section describes these
- modes and how the ring buffer meets their requirements. Finally, we
- will review the race conditions caused by these modes.
- 2.2.1 Default mode
- ^^^^^^^^^^^^^^^^^^
- Usually we execute the ``perf record`` command followed by the name of
- the program to profile, as in the command below::
- perf record test_program
- This command doesn't specify any options for CPU or thread modes, so the
- perf tool applies the default mode to the perf event. It maps all the
- CPUs in the system and the profiled program's PID onto the perf event,
- and it enables inheritance mode on the event so that child tasks inherit
- the events. As a result, the perf event is attributed as::
- evsel::cpus::map[] = { 0 .. _SC_NPROCESSORS_ONLN-1 }
- evsel::threads::map[] = { pid }
- evsel::attr::inherit = 1
- These attributes are finally reflected in the deployment of the ring
- buffers. As shown below, the perf tool allocates an individual ring
- buffer for each CPU, but it only enables events for the profiled program
- rather than for all threads in the system. The *T1* thread represents
- the thread context of 'test_program', whereas *T2* and *T3* are unrelated
- threads in the system. The perf samples are collected exclusively for
- the *T1* thread and stored in the ring buffer associated with the CPU on
- which the *T1* thread is running.
- ::
- T1 T2 T1
- +----+ +-----------+ +----+
- CPU0 |xxxx| |xxxxxxxxxxx| |xxxx|
- +----+--------------+-----------+----------+----+-------->
- | |
- v v
- +-----------------------------------------------------+
- | Ring buffer 0 |
- +-----------------------------------------------------+
- T1
- +-----+
- CPU1 |xxxxx|
- -----+-----+--------------------------------------------->
- |
- v
- +-----------------------------------------------------+
- | Ring buffer 1 |
- +-----------------------------------------------------+
- T1 T3
- +----+ +-------+
- CPU2 |xxxx| |xxxxxxx|
- --------------------------+----+--------+-------+-------->
- |
- v
- +-----------------------------------------------------+
- | Ring buffer 2 |
- +-----------------------------------------------------+
- T1
- +--------------+
- CPU3 |xxxxxxxxxxxxxx|
- -----------+--------------+------------------------------>
- |
- v
- +-----------------------------------------------------+
- | Ring buffer 3 |
- +-----------------------------------------------------+
- T1: Thread 1; T2: Thread 2; T3: Thread 3
- x: Thread is in running state
- Figure 3. Ring buffer for default mode
- 2.2.2 Per-thread mode
- ^^^^^^^^^^^^^^^^^^^^^
- The per-thread mode is selected with the option ``--per-thread`` in the
- perf command, e.g.
- ::
- perf record --per-thread test_program
- In this mode the perf event doesn't map to any CPUs and is only bound to
- the profiled process; thus, the perf event's attributes are::
- evsel::cpus::map[0] = { -1 }
- evsel::threads::map[] = { pid }
- evsel::attr::inherit = 0
- In this mode, a single ring buffer is allocated for the profiled thread;
- if the thread is scheduled on a CPU, the events on that CPU will be
- enabled; and if the thread is scheduled out from the CPU, the events on
- the CPU will be disabled. When the thread is migrated from one CPU to
- another, the events are to be disabled on the previous CPU and enabled
- on the next CPU correspondingly.
- ::
- T1 T2 T1
- +----+ +-----------+ +----+
- CPU0 |xxxx| |xxxxxxxxxxx| |xxxx|
- +----+--------------+-----------+----------+----+-------->
- | |
- | T1 |
- | +-----+ |
- CPU1 | |xxxxx| |
- --|--+-----+----------------------------------|---------->
- | | |
- | | T1 T3 |
- | | +----+ +---+ |
- CPU2 | | |xxxx| |xxx| |
- --|-----|-----------------+----+--------+---+-|---------->
- | | | |
- | | T1 | |
- | | +--------------+ | |
- CPU3 | | |xxxxxxxxxxxxxx| | |
- --|-----|--+--------------+-|-----------------|---------->
- | | | | |
- v v v v v
- +-----------------------------------------------------+
- | Ring buffer |
- +-----------------------------------------------------+
- T1: Thread 1
- x: Thread is in running state
- Figure 4. Ring buffer for per-thread mode
- When perf runs in per-thread mode, a ring buffer is allocated for the
- profiled thread *T1*. The ring buffer is dedicated to thread *T1*: while
- *T1* is running, the perf events are recorded into the ring buffer;
- when the thread is sleeping, all associated events are disabled, so no
- trace data is recorded into the ring buffer.
- 2.2.3 Per-CPU mode
- ^^^^^^^^^^^^^^^^^^
- The option ``-C`` is used to collect samples on a list of CPUs; for
- example, the perf command below receives the option ``-C 0,2``::
- perf record -C 0,2 test_program
- It maps the perf event to CPUs 0 and 2, and the event is not associated
- with any PID. Thus the perf event attributes are set as::
- evsel::cpus::map[0] = { 0, 2 }
- evsel::threads::map[] = { -1 }
- evsel::attr::inherit = 0
- As a result, the ``perf record`` session samples all threads on CPU0
- and CPU2, and terminates when test_program exits. Even if there are
- tasks running on CPU1 and CPU3, the ring buffer is absent for them, so
- any activity on these two CPUs is ignored. One use case is to combine
- the options for per-thread mode and per-CPU mode: when the options
- ``-C 0,2`` and ``--per-thread`` are specified together, samples are
- recorded only when the profiled thread is scheduled on any of the
- listed CPUs.
- ::
- T1 T2 T1
- +----+ +-----------+ +----+
- CPU0 |xxxx| |xxxxxxxxxxx| |xxxx|
- +----+--------------+-----------+----------+----+-------->
- | | |
- v v v
- +-----------------------------------------------------+
- | Ring buffer 0 |
- +-----------------------------------------------------+
- T1
- +-----+
- CPU1 |xxxxx|
- -----+-----+--------------------------------------------->
- T1 T3
- +----+ +-------+
- CPU2 |xxxx| |xxxxxxx|
- --------------------------+----+--------+-------+-------->
- | |
- v v
- +-----------------------------------------------------+
- | Ring buffer 1 |
- +-----------------------------------------------------+
- T1
- +--------------+
- CPU3 |xxxxxxxxxxxxxx|
- -----------+--------------+------------------------------>
- T1: Thread 1; T2: Thread 2; T3: Thread 3
- x: Thread is in running state
- Figure 5. Ring buffer for per-CPU mode
- 2.2.4 System wide mode
- ^^^^^^^^^^^^^^^^^^^^^^
- With the option ``-a`` or ``--all-cpus``, perf collects samples on all
- CPUs for all tasks; we call this the system wide mode. The command is::
- perf record -a test_program
- Similar to the per-CPU mode, the perf event doesn't bind to any PID, and
- it maps to all CPUs in the system::
- evsel::cpus::map[] = { 0 .. _SC_NPROCESSORS_ONLN-1 }
- evsel::threads::map[] = { -1 }
- evsel::attr::inherit = 0
- In the system wide mode, every CPU has its own ring buffer; all threads
- are monitored while they are running, and the samples are recorded into
- the ring buffer belonging to the CPU on which the events occurred.
- ::
- T1 T2 T1
- +----+ +-----------+ +----+
- CPU0 |xxxx| |xxxxxxxxxxx| |xxxx|
- +----+--------------+-----------+----------+----+-------->
- | | |
- v v v
- +-----------------------------------------------------+
- | Ring buffer 0 |
- +-----------------------------------------------------+
- T1
- +-----+
- CPU1 |xxxxx|
- -----+-----+--------------------------------------------->
- |
- v
- +-----------------------------------------------------+
- | Ring buffer 1 |
- +-----------------------------------------------------+
- T1 T3
- +----+ +-------+
- CPU2 |xxxx| |xxxxxxx|
- --------------------------+----+--------+-------+-------->
- | |
- v v
- +-----------------------------------------------------+
- | Ring buffer 2 |
- +-----------------------------------------------------+
- T1
- +--------------+
- CPU3 |xxxxxxxxxxxxxx|
- -----------+--------------+------------------------------>
- |
- v
- +-----------------------------------------------------+
- | Ring buffer 3 |
- +-----------------------------------------------------+
- T1: Thread 1; T2: Thread 2; T3: Thread 3
- x: Thread is in running state
- Figure 6. Ring buffer for system wide mode
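The four modes differ mainly in how many ring buffers are deployed. The sketch below summarizes Figures 3 to 6 as a small C function; the enum and the function name ``nr_ring_buffers()`` are hypothetical and exist only for illustration. It assumes a single profiled thread in per-thread mode.

```c
#include <assert.h>

/* Hypothetical names for the four modes described in this section. */
enum trace_mode {
	MODE_DEFAULT,
	MODE_PER_THREAD,
	MODE_PER_CPU,
	MODE_SYSTEM_WIDE,
};

/* Number of regular ring buffers deployed for a given mode. */
static int nr_ring_buffers(enum trace_mode mode, int nr_online_cpus,
			   int nr_listed_cpus)
{
	assert(nr_online_cpus > 0);

	switch (mode) {
	case MODE_PER_THREAD:
		return 1;		/* one buffer follows the thread */
	case MODE_PER_CPU:
		return nr_listed_cpus;	/* one buffer per CPU given to -C */
	case MODE_DEFAULT:
	case MODE_SYSTEM_WIDE:
		return nr_online_cpus;	/* one buffer per online CPU */
	}
	return -1;			/* unknown mode */
}
```

On the four-CPU system used in the figures, the default and system wide modes give four buffers, ``-C 0,2`` gives two, and ``--per-thread`` gives one.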
- 2.3 Accessing buffer
- --------------------
- Based on the understanding of how the ring buffer is allocated in
- various modes, this section explains how the ring buffer is accessed.
- 2.3.1 Producer-consumer model
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- In the Linux kernel, the PMU events can produce samples which are stored
- into the ring buffer; the perf command in user space consumes the
- samples by reading out data from the ring buffer and finally saves the
- data into the file for post analysis. It’s a typical producer-consumer
- model for using the ring buffer.
- The perf process polls on the PMU events and sleeps when no events are
- incoming. To prevent frequent exchanges between the kernel and user
- space, the kernel event core layer introduces a watermark, which is
- stored in the ``perf_buffer::watermark``. When a sample is recorded into
- the ring buffer, and if the used buffer exceeds the watermark, the
- kernel wakes up the perf process to read samples from the ring buffer.
- ::
- Perf
- / | Read samples
- Polling / `--------------| Ring buffer
- v v ;---------------------v
- +----------------+ +---------+---------+ +-------------------+
- |Event wait queue| |data_head|data_tail| |***|***| | |***|
- +----------------+ +---------+---------+ +-------------------+
- ^ ^ `------------------------^
- | Wake up tasks | Store samples
- +-----------------------------+
- | Kernel event core layer |
- +-----------------------------+
- * : the data is filled by the writer.
- Figure 7. Writing and reading the ring buffer
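The watermark check can be sketched as below. This is a deliberately simplified model, not the kernel's code: the function name ``should_wake_reader()`` is hypothetical, and the amount of used buffer is approximated here as the distance between free-running head and tail positions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model: wake the reader only when the amount of unread
 * data exceeds the watermark, so that the reader is not woken up for
 * every single sample.
 */
static bool should_wake_reader(uint64_t head, uint64_t tail,
			       uint64_t watermark)
{
	assert(head >= tail);	/* the writer never falls behind the reader */
	return head - tail > watermark;
}
```

With a watermark of 50 bytes, 100 unread bytes trigger a wakeup while exactly 50 unread bytes do not.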
- When the kernel event core layer notifies the user space, because
- multiple events might share the same ring buffer for recording samples,
- the core layer iterates every event associated with the ring buffer and
- wakes up tasks waiting on the event. This is fulfilled by the kernel
- function ``ring_buffer_wakeup()``.
- After the perf process is woken up, it starts to check the ring buffers
- one by one; if it finds any ring buffer containing samples, it reads
- out the samples for statistics or for saving into the data file. Given
- that the perf process is able to run on any CPU, the ring buffer can
- potentially be accessed from multiple CPUs simultaneously, which
- causes race conditions. The race condition handling is described in the
- section :ref:`memory_synchronization`.
- 2.3.2 Properties of the ring buffers
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- The Linux kernel supports two write directions for the ring buffer:
- forward and backward. Forward writing saves samples from the beginning of
- the ring buffer, while backward writing stores data from the end of the
- ring buffer in the reverse direction. The perf tool determines the writing
- direction. Additionally, the tool can map buffers to user space in either
- read-write mode or read-only mode.
- The ring buffer in the read-write mode is mapped with the property
- ``PROT_READ | PROT_WRITE``. With the write permission, the perf tool
- updates ``data_tail`` to indicate the data start position. Combined
- with the head pointer ``data_head``, which works as the end position of
- the current data, the perf tool can easily know where to read the data
- from.
- Alternatively, in the read-only mode, only the kernel keeps updating
- ``data_head`` while user space cannot update ``data_tail`` because of
- the mapping property ``PROT_READ``.
- As a result, the matrix below illustrates the various combinations of
- direction and mapping characteristics. The perf tool employs two of these
- combinations to support buffer types: the non-overwrite buffer and the
- overwritable buffer.
- .. list-table::
- :widths: 1 1 1
- :header-rows: 1
- * - Mapping mode
- - Forward
- - Backward
- * - read-write
- - Non-overwrite ring buffer
- - Not used
- * - read-only
- - Not used
- - Overwritable ring buffer
- The non-overwrite ring buffer uses the read-write mapping with forward
- writing. It starts to save data from the beginning of the ring buffer
- and wraps around when it reaches the end; this is the normal ring
- buffer. When the consumer doesn't keep up with the producer, some data
- is lost; the kernel keeps count of how many records it lost and
- generates a ``PERF_RECORD_LOST`` record the next time it finds space in
- the ring buffer.
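The behaviour of the non-overwrite buffer can be modelled with a short sketch. The structure ``fwd_ring`` and the functions ``fwd_reserve()`` and ``fwd_take_lost()`` are hypothetical names for illustration and do not mirror the kernel's data structures; head and tail are assumed to be free-running byte positions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a forward, non-overwrite ring buffer. */
struct fwd_ring {
	uint64_t size;	/* data area size in bytes, power of two */
	uint64_t head;	/* free-running write position */
	uint64_t tail;	/* free-running read position */
	uint64_t lost;	/* records dropped since the last report */
};

/* Reserve 'len' bytes; on success store the buffer offset in *offset. */
static bool fwd_reserve(struct fwd_ring *rb, uint64_t len, uint64_t *offset)
{
	assert(rb->head - rb->tail <= rb->size);

	if (len > rb->size - (rb->head - rb->tail)) {
		rb->lost++;	/* no room: drop the record and count it */
		return false;
	}
	*offset = rb->head & (rb->size - 1);
	rb->head += len;
	return true;
}

/* Number of lost records to report once space is available again. */
static uint64_t fwd_take_lost(struct fwd_ring *rb)
{
	uint64_t n = rb->lost;

	rb->lost = 0;
	return n;
}
```

In this model a writer would call ``fwd_take_lost()`` after the next successful reservation and, if the result is non-zero, emit a lost-record notification carrying that count.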
- The overwritable ring buffer uses backward writing with the read-only
- mode. It saves data from the end of the ring buffer, and ``data_head``
- keeps the position of the current data; the perf tool always knows that
- it starts reading there and continues until the end of the ring buffer,
- thus it doesn't need ``data_tail``. In this mode, the kernel does not
- generate ``PERF_RECORD_LOST`` records.
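Backward writing can be sketched in the same style. The function ``bwd_reserve()`` is a hypothetical illustration, assuming a free-running head that is decremented for every record and a power-of-two buffer size; unsigned wrap-around makes the first record land at the end of the buffer.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of backward writing: move the head down by 'len'
 * bytes and return the buffer offset at which the record is stored.
 * The newest record therefore always starts at (head & (size - 1)).
 */
static uint64_t bwd_reserve(uint64_t *head, uint64_t size, uint64_t len)
{
	assert(size && (size & (size - 1)) == 0 && len <= size);

	*head -= len;			/* unsigned wrap-around is intended */
	return *head & (size - 1);
}
```

Starting from a head of 0 in a 64-byte buffer, records of 16, 16 and 32 bytes are stored at offsets 48, 32 and 0; the next 16-byte record goes to offset 48 again and overwrites the oldest data.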
- .. _writing_samples_into_buffer:
- 2.3.3 Writing samples into buffer
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- When a sample is taken and saved into the ring buffer, the kernel
- prepares sample fields based on the sample type; then it prepares the
- information for writing to the ring buffer, which is stored in the
- structure ``perf_output_handle``. In the end, the kernel outputs the
- sample into the ring buffer and updates the head pointer in the user
- page so the perf tool can see the latest value.
- The structure ``perf_output_handle`` serves as a temporary context for
- tracking the information related to the buffer. Its advantage is
- that it enables concurrent writing to the buffer by different events.
- For example, when a software event and a hardware PMU event are both enabled
- for profiling, two instances of ``perf_output_handle`` serve as separate
- contexts for the software event and the hardware event respectively.
- This allows each event to reserve its own memory space for populating
- the record data.
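The idea that each writer reserves its own space can be sketched with C11 atomics. This is an illustration only and not the kernel's implementation: ``struct out_handle`` is a hypothetical stand-in for ``perf_output_handle``, the function ``reserve()`` is an assumed name, and the space check against the tail pointer is omitted for brevity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-writer context: the range reserved for one record. */
struct out_handle {
	uint64_t offset;	/* free-running start position */
	uint64_t len;		/* number of bytes reserved */
};

/*
 * Atomically advance the shared head by 'len' bytes. Every caller gets
 * the head value seen before its own addition, so two writers can
 * never be handed overlapping ranges.
 */
static struct out_handle reserve(_Atomic uint64_t *head, uint64_t len)
{
	struct out_handle h;

	assert(len > 0);
	h.offset = atomic_fetch_add_explicit(head, len, memory_order_relaxed);
	h.len = len;
	return h;
}
```

If a software event reserves 24 bytes and a hardware event then reserves 40 bytes, the first handle covers positions 0 to 23 and the second covers positions 24 to 63.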
- 2.3.4 Reading samples from buffer
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- In user space, the perf tool uses the ``perf_event_mmap_page``
- structure to handle the head and tail of the buffer. It also uses the
- ``perf_mmap`` structure to keep track of a context for the ring buffer;
- this context includes information about the buffer's starting and ending
- addresses. Additionally, the mask value can be used to compute the
- circular buffer pointer even when the position has wrapped around.
- Similar to the kernel, the perf tool in the user space first reads out
- the recorded data from the ring buffer, and then updates the buffer's
- tail pointer ``perf_event_mmap_page::data_tail``.
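Reading a record that wraps around the end of the buffer can be sketched as follows. The helper ``ring_copy_out()`` is a hypothetical name and not a perf tool function; it assumes a power-of-two buffer size and a free-running read position that is reduced with a mask.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy 'len' bytes starting at free-running position 'pos' out of a
 * ring buffer of 'size' bytes, handling the wrap-around at the end.
 */
static void ring_copy_out(const unsigned char *data, uint64_t size,
			  uint64_t pos, void *dst, uint64_t len)
{
	uint64_t mask = size - 1;
	uint64_t start = pos & mask;
	uint64_t first = size - start;	/* bytes available before the end */

	assert(size && (size & mask) == 0 && len <= size);

	if (first > len)
		first = len;
	memcpy(dst, data + start, first);
	/* Whatever is left continues from the beginning of the buffer. */
	memcpy((unsigned char *)dst + first, data, len - first);
}
```

After the copy, a reader in the read-write mode would advance its tail position by ``len`` and publish it through ``data_tail``.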
- .. _memory_synchronization:
- 2.3.5 Memory synchronization
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Modern CPUs with a relaxed memory model do not guarantee memory
- ordering, which means the ring buffer and the ``perf_event_mmap_page``
- structure may be accessed out of order. To enforce the required
- sequence of memory accesses to the perf ring buffer, memory barriers
- are used to preserve the data dependency. The rationale for the memory
- synchronization is as below::
-         Kernel                          User space
-   if (LOAD ->data_tail) {               LOAD ->data_head
-                    (A)                  smp_rmb()        (C)
-     STORE $data                         LOAD $data
-     smp_wmb()      (B)                  smp_mb()         (D)
-     STORE ->data_head                   STORE ->data_tail
-   }
- The comments in tools/include/linux/ring_buffer.h give a good
- description of why and how to use memory barriers; here we just provide
- an alternative explanation:
- (A) is a control dependency, so the CPU preserves the order between
- checking the pointer ``perf_event_mmap_page::data_tail`` and filling a
- sample into the ring buffer;
- (D) pairs with (A). (D) separates reading the ring buffer data from
- writing the pointer ``data_tail``: the perf tool first consumes samples
- and then tells the kernel that the data chunk has been released. Since a
- read operation is followed by a write operation, (D) is a full memory
- barrier.
- (B) is a write barrier between two write operations, which makes sure
- that a sample is recorded before the head pointer is updated.
- (C) pairs with (B). (C) is a read memory barrier to ensure the head
- pointer is fetched before reading samples.
- To implement the above algorithm, the ``perf_output_put_handle()`` function
- in the kernel and two helpers ``ring_buffer_read_head()`` and
- ``ring_buffer_write_tail()`` in user space are introduced; they rely
- on memory barriers as described above to ensure the data dependency.
- Some architectures support one-way permeable barriers with load-acquire
- and store-release operations; these barriers are more relaxed and carry
- a smaller performance penalty, so (C) and (D) can be optimized to use
- ``smp_load_acquire()`` and ``smp_store_release()`` respectively.
- If an architecture doesn’t support load-acquire and store-release in its
- memory model, it falls back to the traditional memory barrier
- operations. In this case, ``smp_load_acquire()`` encapsulates
- ``READ_ONCE()`` + ``smp_mb()``; since ``smp_mb()`` is costly,
- ``ring_buffer_read_head()`` doesn't invoke ``smp_load_acquire()`` and
- uses ``READ_ONCE()`` + ``smp_rmb()`` instead.
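The consumer side of this scheme can be sketched with C11 atomics, which stand in here for the kernel-style ``smp_load_acquire()`` and ``smp_store_release()`` primitives. The structure ``user_page`` and the functions ``read_head()`` and ``write_tail()`` are hypothetical illustrations and are not the helpers from tools/include/linux/ring_buffer.h.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the two pointers kept in the user page. */
struct user_page {
	_Atomic uint64_t data_head;
	_Atomic uint64_t data_tail;
};

/* (C): acquire load, so sample reads cannot be hoisted above it. */
static uint64_t read_head(struct user_page *up)
{
	return atomic_load_explicit(&up->data_head, memory_order_acquire);
}

/* (D): release store, so sample reads complete before the tail moves. */
static void write_tail(struct user_page *up, uint64_t tail)
{
	/* The consumer must never move past what the producer published. */
	assert(tail <= atomic_load_explicit(&up->data_head,
					    memory_order_relaxed));
	atomic_store_explicit(&up->data_tail, tail, memory_order_release);
}
```

A reader loop built on this sketch would call ``read_head()``, process the bytes between its old tail and the returned head, and then call ``write_tail()`` with the new position.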
- 3. The mechanism of AUX ring buffer
- ===================================
- In this chapter, we will explain the implementation of the AUX ring
- buffer. The first part discusses the connection between the AUX ring
- buffer and the regular ring buffer; the second part examines how the
- AUX ring buffer cooperates with the regular ring buffer, as well as the
- additional features the AUX ring buffer introduces for the sampling
- mechanism.
- 3.1 The relationship between AUX and regular ring buffers
- ---------------------------------------------------------
- Generally, the AUX ring buffer is an auxiliary for the regular ring
- buffer. The regular ring buffer is primarily used to store the event
- samples and every event format complies with the definition in the
- union ``perf_event``; the AUX ring buffer is for recording the hardware
- trace data and the trace data format is hardware IP dependent.
- The general use and advantage of the AUX ring buffer is that it is
- written directly by hardware rather than by the kernel. For example,
- regular profile samples that write to the regular ring buffer cause an
- interrupt. Tracing execution requires a high number of samples and
- using interrupts would be overwhelming for the regular ring buffer
- mechanism. Having an AUX buffer allows for a region of memory more
- decoupled from the kernel and written to directly by hardware tracing.
- The AUX ring buffer reuses the same buffer management algorithm as the
- regular ring buffer. The control structure ``perf_event_mmap_page`` is
- extended with the new fields ``aux_head`` and ``aux_tail`` for the head
- and tail pointers of the AUX ring buffer.
- During the initialisation phase, besides the mmap()-ed regular ring
- buffer, the perf tool invokes a second syscall in the
- ``auxtrace_mmap__mmap()`` function to mmap the AUX buffer with a
- non-zero file offset; ``rb_alloc_aux()`` in the kernel allocates pages
- correspondingly, and mapping these pages into the VMA is deferred until
- the page fault is handled, which is the same lazy mechanism as for the
- regular ring buffer.
- AUX events and AUX trace data are two different things. Let's see an
- example::
- perf record -a -e cycles -e cs_etm/@tmc_etr0/ -- sleep 2
- The above command enables two events: one is the event *cycles* from PMU
- and another is the AUX event *cs_etm* from Arm CoreSight, both are saved
- into the regular ring buffer while the CoreSight's AUX trace data is
- stored in the AUX ring buffer.
- As a result, we can see that the regular ring buffer and the AUX ring
- buffer are allocated in pairs. In default mode, perf allocates the
- regular ring buffer and the AUX ring buffer per CPU, which is the same
- as in system wide mode; however, the default mode records samples only
- for the profiled program, whereas the latter mode profiles all programs
- in the system. In per-thread mode, the perf tool allocates only one
- regular ring buffer and one AUX ring buffer for the whole session. In
- per-CPU mode, perf allocates both kinds of ring buffers for the CPUs
- selected by the option ``-C``.
- The below figure demonstrates the buffers' layout in the system wide
- mode; if there are any activities on one CPU, the AUX event samples and
- the hardware trace data will be recorded into the dedicated buffers for
- the CPU.
- ::
- T1 T2 T1
- +----+ +-----------+ +----+
- CPU0 |xxxx| |xxxxxxxxxxx| |xxxx|
- +----+--------------+-----------+----------+----+-------->
- | | |
- v v v
- +-----------------------------------------------------+
- | Ring buffer 0 |
- +-----------------------------------------------------+
- | | |
- v v v
- +-----------------------------------------------------+
- | AUX Ring buffer 0 |
- +-----------------------------------------------------+
- T1
- +-----+
- CPU1 |xxxxx|
- -----+-----+--------------------------------------------->
- |
- v
- +-----------------------------------------------------+
- | Ring buffer 1 |
- +-----------------------------------------------------+
- |
- v
- +-----------------------------------------------------+
- | AUX Ring buffer 1 |
- +-----------------------------------------------------+
- T1 T3
- +----+ +-------+
- CPU2 |xxxx| |xxxxxxx|
- --------------------------+----+--------+-------+-------->
- | |
- v v
- +-----------------------------------------------------+
- | Ring buffer 2 |
- +-----------------------------------------------------+
- | |
- v v
- +-----------------------------------------------------+
- | AUX Ring buffer 2 |
- +-----------------------------------------------------+
- T1
- +--------------+
- CPU3 |xxxxxxxxxxxxxx|
- -----------+--------------+------------------------------>
- |
- v
- +-----------------------------------------------------+
- | Ring buffer 3 |
- +-----------------------------------------------------+
- |
- v
- +-----------------------------------------------------+
- | AUX Ring buffer 3 |
- +-----------------------------------------------------+
- T1: Thread 1; T2: Thread 2; T3: Thread 3
- x: Thread is in running state
- Figure 8. AUX ring buffer for system wide mode
- 3.2 AUX events
- --------------
- Just as ``perf_output_begin()`` and ``perf_output_end()`` work on the
- regular ring buffer, ``perf_aux_output_begin()`` and ``perf_aux_output_end()``
- serve the AUX ring buffer for processing the hardware trace data.
- Once the hardware trace data is stored into the AUX ring buffer, the PMU
- driver will stop hardware tracing by calling the ``pmu::stop()`` callback.
- Similar to the regular ring buffer, the AUX ring buffer needs to apply
- the memory synchronization mechanism as discussed in the section
- :ref:`memory_synchronization`. Since the AUX ring buffer is managed by the
- PMU driver, the barrier (B), which is a write barrier to ensure the trace
- data is externally visible prior to updating the head pointer, must be
- implemented in the PMU driver.
- Then ``pmu::stop()`` can safely call the ``perf_aux_output_end()`` function to
- finish two things:
- - It fills an event ``PERF_RECORD_AUX`` into the regular ring buffer; this
- event delivers the start address and data size of the chunk of hardware
- trace data that has been stored into the AUX ring buffer;
- - Since the hardware trace driver has stored new trace data into the AUX
- ring buffer, the argument *size* indicates how many bytes have been
- consumed by the hardware tracing, thus ``perf_aux_output_end()`` updates the
- head pointer ``perf_buffer::aux_head`` to reflect the latest buffer usage.
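The two steps above can be modelled with a small sketch. It is a simplified illustration rather than the kernel's ``perf_aux_output_end()``: the structure ``aux_record`` and the function ``aux_output_end()`` are hypothetical, and the record keeps only an offset and a size, named after the corresponding payload fields of ``PERF_RECORD_AUX``.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the payload carried by a PERF_RECORD_AUX event. */
struct aux_record {
	uint64_t aux_offset;	/* where the new chunk of trace data starts */
	uint64_t aux_size;	/* how many bytes the hardware has written */
};

/*
 * Describe the chunk that was just written and advance the AUX head
 * by 'size' bytes so that it reflects the latest buffer usage.
 */
static struct aux_record aux_output_end(uint64_t *aux_head, uint64_t size)
{
	struct aux_record rec;

	assert(aux_head != NULL);
	rec.aux_offset = *aux_head;
	rec.aux_size = size;
	*aux_head += size;
	return rec;
}
```

Two consecutive chunks of 4096 and 512 bytes would be described by records with offsets 0 and 4096, leaving the head at 4608.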
- Finally, the PMU driver restarts hardware tracing. Hardware trace data
- is lost while tracing is temporarily suspended, which introduces a
- discontinuity during the decoding phase.
- The event ``PERF_RECORD_AUX`` represents an AUX event which is handled in
- the kernel, but it lacks the information for saving the AUX trace data in
- the perf file. When the perf tool copies the trace data from the AUX ring
- buffer to the perf data file, it synthesizes a ``PERF_RECORD_AUXTRACE``
- event. This event is not a kernel ABI; it is defined by the perf tool to
- describe which portion of data in the AUX ring buffer is saved. Afterwards,
- the perf tool reads out the AUX trace data from the perf file based on the
- ``PERF_RECORD_AUXTRACE`` events, and the ``PERF_RECORD_AUX`` event is used to
- decode a chunk of data by correlating with time order.
- 3.3 Snapshot mode
- -----------------
- Perf supports snapshot mode for the AUX ring buffer. In this mode, users
- only record AUX trace data at the specific points in time they are
- interested in. The example below takes snapshots at 1 second intervals
- with Arm CoreSight::
- perf record -e cs_etm/@tmc_etr0/u -S -a program &
- PERFPID=$!
- while true; do
- kill -USR2 $PERFPID
- sleep 1
- done
- The main flow for snapshot mode is:
- - Before a snapshot is taken, the AUX ring buffer acts in free run mode.
- During free run mode perf doesn't record any of the AUX events or
- trace data;
- - Once the perf tool receives the *USR2* signal, it triggers the callback
- function ``auxtrace_record::snapshot_start()`` to deactivate hardware
- tracing. The kernel driver then populates the AUX ring buffer with the
- hardware trace data, and the event ``PERF_RECORD_AUX`` is stored in the
- regular ring buffer;
- - Then the perf tool takes a snapshot: ``record__read_auxtrace_snapshot()``
- reads out the hardware trace data from the AUX ring buffer and saves it
- into the perf data file;
- - After the snapshot is finished, ``auxtrace_record::snapshot_finish()``
- restarts the PMU event for AUX tracing.
- perf only accesses the head pointer ``perf_event_mmap_page::aux_head``
- in snapshot mode and doesn’t touch the tail pointer ``aux_tail``. This is
- because the AUX ring buffer can overflow in free run mode, so the tail
- pointer is useless in this case. Instead, the callback
- ``auxtrace_record::find_snapshot()`` is introduced to decide whether
- the AUX ring buffer has wrapped around or not; at the end it fixes up
- the AUX buffer's head, which is used to calculate the trace data size.
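Why the wrap-around decision matters for the data size can be shown with a simplified model. The structure ``snap_range`` and the function ``snapshot_range()`` are hypothetical and do not reproduce any driver's ``find_snapshot()`` callback; the sketch assumes a power-of-two buffer size and that the caller already knows whether the buffer has wrapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result: which part of the AUX buffer holds valid data. */
struct snap_range {
	uint64_t start;	/* buffer offset of the oldest valid byte */
	uint64_t len;	/* number of valid bytes */
};

static struct snap_range snapshot_range(uint64_t aux_head, uint64_t size,
					bool wrapped)
{
	struct snap_range r;

	assert(size && (size & (size - 1)) == 0);
	assert(wrapped || aux_head <= size);

	if (wrapped) {
		/* Whole buffer is valid; the oldest byte sits at the head. */
		r.start = aux_head & (size - 1);
		r.len = size;
	} else {
		/* Only the bytes written so far are valid. */
		r.start = 0;
		r.len = aux_head;
	}
	return r;
}
```

In this model, a 4096-byte buffer whose head is at 100 holds 100 valid bytes if it has not wrapped, and 4096 valid bytes starting at offset 100 if it has.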
- As we know, the buffers can be deployed in per-thread mode, per-CPU
- mode, or system wide mode, and the snapshot can be applied to any of
- these modes. Below is an example of taking a snapshot in system wide
- mode.
- ::
- Snapshot is taken
- |
- v
- +------------------------+
- | AUX Ring buffer 0 | <- aux_head
- +------------------------+
- v
- +--------------------------------+
- | AUX Ring buffer 1 | <- aux_head
- +--------------------------------+
- v
- +--------------------------------------------+
- | AUX Ring buffer 2 | <- aux_head
- +--------------------------------------------+
- v
- +---------------------------------------+
- | AUX Ring buffer 3 | <- aux_head
- +---------------------------------------+
- Figure 9. Snapshot with system wide mode