fprobetrace.rst 14 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251
  1. .. SPDX-License-Identifier: GPL-2.0
  2. ==========================
  3. Fprobe-based Event Tracing
  4. ==========================
  5. .. Author: Masami Hiramatsu <mhiramat@kernel.org>
  6. Overview
  7. --------
  8. Fprobe event is similar to the kprobe event, but limited to probe on
  9. the function entry and exit only. It is good enough for many use cases
  10. which only traces some specific functions.
  11. This document also covers tracepoint probe events (tprobe) since this
  12. is also works only on the tracepoint entry. User can trace a part of
  13. tracepoint argument, or the tracepoint without trace-event, which is
  14. not exposed on tracefs.
  15. As same as other dynamic events, fprobe events and tracepoint probe
  16. events are defined via `dynamic_events` interface file on tracefs.
  17. Synopsis of fprobe-events
  18. -------------------------
  19. ::
  20. f[:[GRP1/][EVENT1]] SYM [FETCHARGS] : Probe on function entry
  21. f[MAXACTIVE][:[GRP1/][EVENT1]] SYM%return [FETCHARGS] : Probe on function exit
  22. t[:[GRP2/][EVENT2]] TRACEPOINT [FETCHARGS] : Probe on tracepoint
  23. GRP1 : Group name for fprobe. If omitted, use "fprobes" for it.
  24. GRP2 : Group name for tprobe. If omitted, use "tracepoints" for it.
  25. EVENT1 : Event name for fprobe. If omitted, the event name is
  26. "SYM__entry" or "SYM__exit".
  27. EVENT2 : Event name for tprobe. If omitted, the event name is
  28. the same as "TRACEPOINT", but if the "TRACEPOINT" starts
  29. with a digit character, "_TRACEPOINT" is used.
  30. MAXACTIVE : Maximum number of instances of the specified function that
  31. can be probed simultaneously, or 0 for the default value
  32. as defined in Documentation/trace/fprobe.rst
  33. FETCHARGS : Arguments. Each probe can have up to 128 args.
  34. ARG : Fetch "ARG" function argument using BTF (only for function
  35. entry or tracepoint.) (\*1)
  36. @ADDR : Fetch memory at ADDR (ADDR should be in kernel)
  37. @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)
  38. $stackN : Fetch Nth entry of stack (N >= 0)
  39. $stack : Fetch stack address.
  40. $argN : Fetch the Nth function argument. (N >= 1) (\*2)
  41. $retval : Fetch return value.(\*3)
  42. $comm : Fetch current task comm.
  43. +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*4)(\*5)
  44. \IMM : Store an immediate value to the argument.
  45. NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
  46. FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
  47. (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
  48. (x8/x16/x32/x64), "char", "string", "ustring", "symbol", "symstr"
  49. and bitfield are supported.
  50. (\*1) This is available only when BTF is enabled.
  51. (\*2) only for the probe on function entry (offs == 0). Note, this argument access
  52. is best effort, because depending on the argument type, it may be passed on
  53. the stack. But this only support the arguments via registers.
  54. (\*3) only for return probe. Note that this is also best effort. Depending on the
  55. return value type, it might be passed via a pair of registers. But this only
  56. accesses one register.
  57. (\*4) this is useful for fetching a field of data structures.
  58. (\*5) "u" means user-space dereference.
  59. For the details of TYPE, see :ref:`kprobetrace documentation <kprobetrace_types>`.
  60. Function arguments at exit
  61. --------------------------
  62. Function arguments can be accessed at exit probe using $arg<N> fetcharg. This
  63. is useful to record the function parameter and return value at once, and
  64. trace the difference of structure fields (for debugging a function whether it
  65. correctly updates the given data structure or not)
  66. See the :ref:`sample<fprobetrace_exit_args_sample>` below for how it works.
  67. BTF arguments
  68. -------------
  69. BTF (BPF Type Format) argument allows user to trace function and tracepoint
  70. parameters by its name instead of ``$argN``. This feature is available if the
  71. kernel is configured with CONFIG_BPF_SYSCALL and CONFIG_DEBUG_INFO_BTF.
  72. If user only specify the BTF argument, the event's argument name is also
  73. automatically set by the given name. ::
  74. # echo 'f:myprobe vfs_read count pos' >> dynamic_events
  75. # cat dynamic_events
  76. f:fprobes/myprobe vfs_read count=count pos=pos
  77. It also chooses the fetch type from BTF information. For example, in the above
  78. example, the ``count`` is unsigned long, and the ``pos`` is a pointer. Thus,
  79. both are converted to 64bit unsigned long, but only ``pos`` has "%Lx"
  80. print-format as below ::
  81. # cat events/fprobes/myprobe/format
  82. name: myprobe
  83. ID: 1313
  84. format:
  85. field:unsigned short common_type; offset:0; size:2; signed:0;
  86. field:unsigned char common_flags; offset:2; size:1; signed:0;
  87. field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
  88. field:int common_pid; offset:4; size:4; signed:1;
  89. field:unsigned long __probe_ip; offset:8; size:8; signed:0;
  90. field:u64 count; offset:16; size:8; signed:0;
  91. field:u64 pos; offset:24; size:8; signed:0;
  92. print fmt: "(%lx) count=%Lu pos=0x%Lx", REC->__probe_ip, REC->count, REC->pos
  93. If user unsures the name of arguments, ``$arg*`` will be helpful. The ``$arg*``
  94. is expanded to all function arguments of the function or the tracepoint. ::
  95. # echo 'f:myprobe vfs_read $arg*' >> dynamic_events
  96. # cat dynamic_events
  97. f:fprobes/myprobe vfs_read file=file buf=buf count=count pos=pos
  98. BTF also affects the ``$retval``. If user doesn't set any type, the retval
  99. type is automatically picked from the BTF. If the function returns ``void``,
  100. ``$retval`` is rejected.
  101. You can access the data fields of a data structure using allow operator ``->``
  102. (for pointer type) and dot operator ``.`` (for data structure type.)::
  103. # echo 't sched_switch preempt prev_pid=prev->pid next_pid=next->pid' >> dynamic_events
  104. The field access operators, ``->`` and ``.`` can be combined for accessing deeper
  105. members and other structure members pointed by the member. e.g. ``foo->bar.baz->qux``
  106. If there is non-name union member, you can directly access it as the C code does.
  107. For example::
  108. struct {
  109. union {
  110. int a;
  111. int b;
  112. };
  113. } *foo;
  114. To access ``a`` and ``b``, use ``foo->a`` and ``foo->b`` in this case.
  115. This data field access is available for the return value via ``$retval``,
  116. e.g. ``$retval->name``.
  117. For these BTF arguments and fields, ``:string`` and ``:ustring`` change the
  118. behavior. If these are used for BTF argument or field, it checks whether
  119. the BTF type of the argument or the data field is ``char *`` or ``char []``,
  120. or not. If not, it rejects applying the string types. Also, with the BTF
  121. support, you don't need a memory dereference operator (``+0(PTR)``) for
  122. accessing the string pointed by a ``PTR``. It automatically adds the memory
  123. dereference operator according to the BTF type. e.g. ::
  124. # echo 't sched_switch prev->comm:string' >> dynamic_events
  125. # echo 'f getname_flags%return $retval->name:string' >> dynamic_events
  126. The ``prev->comm`` is an embedded char array in the data structure, and
  127. ``$retval->name`` is a char pointer in the data structure. But in both
  128. cases, you can use ``:string`` type to get the string.
  129. Usage examples
  130. --------------
  131. Here is an example to add fprobe events on ``vfs_read()`` function entry
  132. and exit, with BTF arguments.
  133. ::
  134. # echo 'f vfs_read $arg*' >> dynamic_events
  135. # echo 'f vfs_read%return $retval' >> dynamic_events
  136. # cat dynamic_events
  137. f:fprobes/vfs_read__entry vfs_read file=file buf=buf count=count pos=pos
  138. f:fprobes/vfs_read__exit vfs_read%return arg1=$retval
  139. # echo 1 > events/fprobes/enable
  140. # head -n 20 trace | tail
  141. # TASK-PID CPU# ||||| TIMESTAMP FUNCTION
  142. # | | | ||||| | |
  143. sh-70 [000] ...1. 335.883195: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08
  144. sh-70 [000] ..... 335.883208: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1
  145. sh-70 [000] ...1. 335.883220: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08
  146. sh-70 [000] ..... 335.883224: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1
  147. sh-70 [000] ...1. 335.883232: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c687a count=1 pos=0xffffc900005aff08
  148. sh-70 [000] ..... 335.883237: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1
  149. sh-70 [000] ...1. 336.050329: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08
  150. sh-70 [000] ..... 336.050343: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1
  151. You can see all function arguments and return values are recorded as signed int.
  152. Also, here is an example of tracepoint events on ``sched_switch`` tracepoint.
  153. To compare the result, this also enables the ``sched_switch`` traceevent too.
  154. ::
  155. # echo 't sched_switch $arg*' >> dynamic_events
  156. # echo 1 > events/sched/sched_switch/enable
  157. # echo 1 > events/tracepoints/sched_switch/enable
  158. # echo > trace
  159. # head -n 20 trace | tail
  160. # TASK-PID CPU# ||||| TIMESTAMP FUNCTION
  161. # | | | ||||| | |
  162. sh-70 [000] d..2. 3912.083993: sched_switch: prev_comm=sh prev_pid=70 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
  163. sh-70 [000] d..3. 3912.083995: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffff88800664e100 next=0xffffffff828229c0 prev_state=1
  164. <idle>-0 [000] d..2. 3912.084183: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_preempt next_pid=16 next_prio=120
  165. <idle>-0 [000] d..3. 3912.084184: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffffffff828229c0 next=0xffff888004208000 prev_state=0
  166. rcu_preempt-16 [000] d..2. 3912.084196: sched_switch: prev_comm=rcu_preempt prev_pid=16 prev_prio=120 prev_state=I ==> next_comm=swapper/0 next_pid=0 next_prio=120
  167. rcu_preempt-16 [000] d..3. 3912.084196: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffff888004208000 next=0xffffffff828229c0 prev_state=1026
  168. <idle>-0 [000] d..2. 3912.085191: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_preempt next_pid=16 next_prio=120
  169. <idle>-0 [000] d..3. 3912.085191: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffffffff828229c0 next=0xffff888004208000 prev_state=0
  170. As you can see, the ``sched_switch`` trace-event shows *cooked* parameters, on
  171. the other hand, the ``sched_switch`` tracepoint probe event shows *raw*
  172. parameters. This means you can access any field values in the task
  173. structure pointed by the ``prev`` and ``next`` arguments.
  174. For example, usually ``task_struct::start_time`` is not traced, but with this
  175. traceprobe event, you can trace that field as below.
  176. ::
  177. # echo 't sched_switch comm=next->comm:string next->start_time' > dynamic_events
  178. # head -n 20 trace | tail
  179. # TASK-PID CPU# ||||| TIMESTAMP FUNCTION
  180. # | | | ||||| | |
  181. sh-70 [000] d..3. 5606.686577: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="rcu_preempt" usage=1 start_time=245000000
  182. rcu_preempt-16 [000] d..3. 5606.686602: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="sh" usage=1 start_time=1596095526
  183. sh-70 [000] d..3. 5606.686637: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0
  184. <idle>-0 [000] d..3. 5606.687190: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="rcu_preempt" usage=1 start_time=245000000
  185. rcu_preempt-16 [000] d..3. 5606.687202: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0
  186. <idle>-0 [000] d..3. 5606.690317: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000
  187. kworker/0:1-14 [000] d..3. 5606.690339: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0
  188. <idle>-0 [000] d..3. 5606.692368: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000
  189. .. _fprobetrace_exit_args_sample:
  190. The return probe allows us to access the results of some functions, which returns
  191. the error code and its results are passed via function parameter, such as an
  192. structure-initialization function.
  193. For example, vfs_open() will link the file structure to the inode and update
  194. mode. You can trace that changes with return probe.
  195. ::
  196. # echo 'f vfs_open mode=file->f_mode:x32 inode=file->f_inode:x64' >> dynamic_events
  197. # echo 'f vfs_open%%return mode=file->f_mode:x32 inode=file->f_inode:x64' >> dynamic_events
  198. # echo 1 > events/fprobes/enable
  199. # cat trace
  200. sh-131 [006] ...1. 1945.714346: vfs_open__entry: (vfs_open+0x4/0x40) mode=0x2 inode=0x0
  201. sh-131 [006] ...1. 1945.714358: vfs_open__exit: (do_open+0x274/0x3d0 <- vfs_open) mode=0x4d801e inode=0xffff888008470168
  202. cat-143 [007] ...1. 1945.717949: vfs_open__entry: (vfs_open+0x4/0x40) mode=0x1 inode=0x0
  203. cat-143 [007] ...1. 1945.717956: vfs_open__exit: (do_open+0x274/0x3d0 <- vfs_open) mode=0x4a801d inode=0xffff888005f78d28
  204. cat-143 [007] ...1. 1945.720616: vfs_open__entry: (vfs_open+0x4/0x40) mode=0x1 inode=0x0
  205. cat-143 [007] ...1. 1945.728263: vfs_open__exit: (do_open+0x274/0x3d0 <- vfs_open) mode=0xa800d inode=0xffff888004ada8d8
  206. You can see the `file::f_mode` and `file::f_inode` are updated in `vfs_open()`.