| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159 |
- ==============================
- Using the tracer for debugging
- ==============================
- Copyright 2024 Google LLC.
- :Author: Steven Rostedt <rostedt@goodmis.org>
- :License: The GNU Free Documentation License, Version 1.2
- (dual licensed under the GPL v2)
- - Written for: 6.12
- Introduction
- ------------
- The tracing infrastructure can be very useful for debugging the Linux
- kernel. This document is a place to add various methods of using the tracer
- for debugging.
- First, make sure that the tracefs file system is mounted::
- $ sudo mount -t tracefs tracefs /sys/kernel/tracing
- Using trace_printk()
- --------------------
- trace_printk() is a very lightweight utility that can be used in any context
- inside the kernel, with the exception of "noinstr" sections. It can be used
- in normal, softirq, interrupt and even NMI context. The trace data is
- written to the tracing ring buffer in a lockless way. To make it even
- lighter weight, when possible, it will only record the pointer to the format
- string, and save the raw arguments into the buffer. The format and the
- arguments will be post processed when the ring buffer is read. This way the
- trace_printk() format conversions are not done during the hot path, where
- the trace is being recorded.
- trace_printk() is meant only for debugging, and should never be added into
- a subsystem of the kernel. If you need debugging traces, add trace events
- instead. If a trace_printk() is found in the kernel, the following will
- appear in the dmesg::
- **********************************************************
- ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
- ** **
- ** trace_printk() being used. Allocating extra memory. **
- ** **
- ** This means that this is a DEBUG kernel and it is **
- ** unsafe for production use. **
- ** **
- ** If you see this message and you are not debugging **
- ** the kernel, report this immediately to your vendor! **
- ** **
- ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
- **********************************************************
- Debugging kernel crashes
- ------------------------
- There is various methods of acquiring the state of the system when a kernel
- crash occurs. This could be from the oops message in printk, or one could
- use kexec/kdump. But these just show what happened at the time of the crash.
- It can be very useful in knowing what happened up to the point of the crash.
- The tracing ring buffer, by default, is a circular buffer than will
- overwrite older events with newer ones. When a crash happens, the content of
- the ring buffer will be all the events that lead up to the crash.
- There are several kernel command line parameters that can be used to help in
- this. The first is "ftrace_dump_on_oops". This will dump the tracing ring
- buffer when a oops occurs to the console. This can be useful if the console
- is being logged somewhere. If a serial console is used, it may be prudent to
- make sure the ring buffer is relatively small, otherwise the dumping of the
- ring buffer may take several minutes to hours to finish. Here's an example
- of the kernel command line::
- ftrace_dump_on_oops trace_buf_size=50K
- Note, the tracing buffer is made up of per CPU buffers where each of these
- buffers is broken up into sub-buffers that are by default PAGE_SIZE. The
- above trace_buf_size option above sets each of the per CPU buffers to 50K,
- so, on a machine with 8 CPUs, that's actually 400K total.
- Persistent buffers across boots
- -------------------------------
- If the system memory allows it, the tracing ring buffer can be specified at
- a specific location in memory. If the location is the same across boots and
- the memory is not modified, the tracing buffer can be retrieved from the
- following boot. There's two ways to reserve memory for the use of the ring
- buffer.
- The more reliable way (on x86) is to reserve memory with the "memmap" kernel
- command line option and then use that memory for the trace_instance. This
- requires a bit of knowledge of the physical memory layout of the system. The
- advantage of using this method, is that the memory for the ring buffer will
- always be the same::
- memmap==12M$0x284500000 trace_instance=boot_map@0x284500000:12M
- The memmap above reserves 12 megabytes of memory at the physical memory
- location 0x284500000. Then the trace_instance option will create a trace
- instance "boot_map" at that same location with the same amount of memory
- reserved. As the ring buffer is broke up into per CPU buffers, the 12
- megabytes will be broken up evenly between those CPUs. If you have 8 CPUs,
- each per CPU ring buffer will be 1.5 megabytes in size. Note, that also
- includes meta data, so the amount of memory actually used by the ring buffer
- will be slightly smaller.
- Another more generic but less robust way to allocate a ring buffer mapping
- at boot is with the "reserve_mem" option::
- reserve_mem=12M:4096:trace trace_instance=boot_map@trace
- The reserve_mem option above will find 12 megabytes that are available at
- boot up, and align it by 4096 bytes. It will label this memory as "trace"
- that can be used by later command line options.
- The trace_instance option creates a "boot_map" instance and will use the
- memory reserved by reserve_mem that was labeled as "trace". This method is
- more generic but may not be as reliable. Due to KASLR, the memory reserved
- by reserve_mem may not be located at the same location. If this happens,
- then the ring buffer will not be from the previous boot and will be reset.
- Sometimes, by using a larger alignment, it can keep KASLR from moving things
- around in such a way that it will move the location of the reserve_mem. By
- using a larger alignment, you may find better that the buffer is more
- consistent to where it is placed::
- reserve_mem=12M:0x2000000:trace trace_instance=boot_map@trace
- On boot up, the memory reserved for the ring buffer is validated. It will go
- through a series of tests to make sure that the ring buffer contains valid
- data. If it is, it will then set it up to be available to read from the
- instance. If it fails any of the tests, it will clear the entire ring buffer
- and initialize it as new.
- The layout of this mapped memory may not be consistent from kernel to
- kernel, so only the same kernel is guaranteed to work if the mapping is
- preserved. Switching to a different kernel version may find a different
- layout and mark the buffer as invalid.
- Using trace_printk() in the boot instance
- -----------------------------------------
- By default, the content of trace_printk() goes into the top level tracing
- instance. But this instance is never preserved across boots. To have the
- trace_printk() content, and some other internal tracing go to the preserved
- buffer (like dump stacks), either set the instance to be the trace_printk()
- destination from the kernel command line, or set it after boot up via the
- trace_printk_dest option.
- After boot up::
- echo 1 > /sys/kernel/tracing/instances/boot_map/options/trace_printk_dest
- From the kernel command line::
- reserve_mem=12M:4096:trace trace_instance=boot_map^traceprintk^traceoff@trace
- If setting it from the kernel command line, it is recommended to also
- disable tracing with the "traceoff" flag, and enable tracing after boot up.
- Otherwise the trace from the most recent boot will be mixed with the trace
- from the previous boot, and may make it confusing to read.
|