Coresight - HW Assisted Tracing on ARM
======================================

   Author:   Mathieu Poirier <mathieu.poirier@linaro.org>
   Date:     September 11th, 2014

Introduction
------------

Coresight is an umbrella of technologies allowing for the debugging of ARM
based SoCs. It includes solutions for JTAG and HW assisted tracing. This
document is concerned with the latter.

HW assisted tracing is becoming increasingly useful when dealing with systems
that have many SoCs and other components like GPU and DMA engines. ARM has
developed a HW assisted tracing solution by means of different components, each
being added to a design at synthesis time to cater to specific tracing needs.
Components are generally categorised as sources, links and sinks and are
(usually) discovered using the AMBA bus.

"Sources" generate a compressed stream representing the processor instruction
path based on tracing scenarios as configured by users. From there the stream
flows through the coresight system (via ATB bus) using links that connect
the emanating source to one or more sinks. Sinks serve as endpoints to the
coresight implementation, either storing the compressed stream in a memory
buffer or creating an interface to the outside world where data can be
transferred to a host without fear of filling up the onboard coresight memory
buffer.

A typical coresight system would look like this:

    [Block diagram, source: ARM ltd. Two CPUs, one with an ETM and one with
    a PTM, each paired with a CTI, plus an STM with its own CTI, sit between
    the AMBA AXI bus (which also reaches system memory) and the AMBA Debug
    APB bus (reached from the outside through the DAP over SWD/JTAG). The
    CTIs connect to the Cross Trigger Matrix (CTM). Trace data is carried
    by the AMBA Advanced Trace Bus (ATB) into a funnel, then through a
    replicator (REP) to the ETB and to the TPIU, which drives the trace
    port.]

    DAP  = Debug Access Port
    ETM  = Embedded Trace Macrocell
    PTM  = Program Trace Macrocell
    CTI  = Cross Trigger Interface
    ETB  = Embedded Trace Buffer
    TPIU = Trace Port Interface Unit
    SWD  = Serial Wire Debug

While on-target configuration of the components is done via the APB bus,
all trace data are carried out-of-band on the ATB bus. The CTM provides
a way to aggregate and distribute signals between CoreSight components.

The coresight framework provides a central point to represent, configure and
manage coresight devices on a platform. This first implementation centers on
the basic tracing functionality, enabling components such as ETM/PTM, funnel,
replicator, TMC, TPIU and ETB. Future work will enable more
intricate IP blocks such as STM and CTI.

Acronyms and Classification
---------------------------

Acronyms:

PTM:     Program Trace Macrocell
ETM:     Embedded Trace Macrocell
STM:     System Trace Macrocell
ETB:     Embedded Trace Buffer
ITM:     Instrumentation Trace Macrocell
TPIU:    Trace Port Interface Unit
TMC-ETR: Trace Memory Controller, configured as Embedded Trace Router
TMC-ETF: Trace Memory Controller, configured as Embedded Trace FIFO
CTI:     Cross Trigger Interface

Classification:

Source:
   ETMv3.x, ETMv4, PTMv1.0, PTMv1.1, STM, STM500, ITM
Link:
   Funnel, replicator (intelligent or not)
Sinks:
   ETBv1.0, ETBv1.1, TPIU, TMC-ETF, TMC-ETR
Misc:
   CTI

Device Tree Bindings
--------------------

See Documentation/devicetree/bindings/arm/coresight.txt for details.

As of this writing drivers for ITM, STMs and CTIs are not provided but are
expected to be added as the solution matures.

Framework and implementation
----------------------------

The coresight framework provides a central point to represent, configure and
manage coresight devices on a platform. Any coresight compliant device can
register with the framework as long as it uses the right APIs:

    struct coresight_device *coresight_register(struct coresight_desc *desc);
    void coresight_unregister(struct coresight_device *csdev);

The registering function takes a "struct coresight_desc *desc" and
registers the device with the core framework. The unregister function takes
a reference to a "struct coresight_device", obtained at registration time.

If everything goes well during the registration process the new devices will
show up under /sys/bus/coresight/devices, as shown here for a TC2 platform:

    root:~# ls /sys/bus/coresight/devices/
    replicator    20030000.tpiu    2201c000.ptm  2203c000.etm  2203e000.etm
    20010000.etb  20040000.funnel  2201d000.ptm  2203d000.etm
    root:~#
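
The listing above suggests a naming pattern of "<address>.<kind>". The
following is a minimal shell sketch, assuming that pattern holds on the
platform at hand, that groups the registered devices by kind. The name
cs_list_by_kind is a hypothetical helper, not part of the kernel or of any
CoreSight tool, and the optional argument only exists so the function can be
pointed at a different directory.

```shell
# Print "<kind> <device>" for every entry under the CoreSight bus directory.
# Assumption: devices are named "<address>.<kind>" as in the TC2 listing;
# an entry without a dot (e.g. "replicator") is reported under its own name.
cs_list_by_kind() (
    root=${1:-/sys/bus/coresight/devices}
    for path in "$root"/*; do
        [ -e "$path" ] || continue          # empty directory: nothing to list
        name=${path##*/}                    # strip the leading directories
        kind=${name##*.}                    # keep what follows the last dot
        printf '%s %s\n' "$kind" "$name"
    done | LC_ALL=C sort
)
```

Running "cs_list_by_kind | grep '^ptm '" on the TC2 platform above would be
expected to show the two PTM entries.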

The registration function takes a "struct coresight_desc", which looks like
this:

    struct coresight_desc {
            enum coresight_dev_type type;
            struct coresight_dev_subtype subtype;
            const struct coresight_ops *ops;
            struct coresight_platform_data *pdata;
            struct device *dev;
            const struct attribute_group **groups;
    };

The "coresight_dev_type" identifies what the device is, i.e. source, link or
sink, while the "coresight_dev_subtype" will characterise that type further.

The "struct coresight_ops" is mandatory and will tell the framework how to
perform base operations related to the components, each component having
a different set of requirements. For that "struct coresight_ops_sink",
"struct coresight_ops_link" and "struct coresight_ops_source" have been
provided.

The next field, "struct coresight_platform_data *pdata", is acquired by calling
"of_get_coresight_platform_data()" as part of the driver's _probe routine, and
"struct device *dev" gets the device reference embedded in the "amba_device":

    static int etm_probe(struct amba_device *adev, const struct amba_id *id)
    {
     ...
     ...
     drvdata->dev = &adev->dev;
     ...
    }

Specific classes of device (source, link, or sink) have generic operations
that can be performed on them (see "struct coresight_ops"). The
"**groups" is a list of sysfs entries pertaining to operations
specific to that component only. "Implementation defined" customisations are
expected to be accessed and controlled using those entries.

How to use the tracer modules
-----------------------------

There are two ways to use the Coresight framework: 1) interacting directly with
the Coresight devices using the sysFS interface and 2) using the perf cmd line
tools. Preference is given to the latter as using the sysFS interface
requires a deep understanding of the Coresight HW. The following sections
provide details on using both methods.

1) Using the sysFS interface:

Before trace collection can start, a coresight sink needs to be identified.
There is no limit on the number of sinks (nor sources) that can be enabled at
any given moment. As a generic operation, all devices pertaining to the sink
class will have an "enable_sink" entry in sysfs:

    root:/sys/bus/coresight/devices# ls
    replicator    20030000.tpiu    2201c000.ptm  2203c000.etm  2203e000.etm
    20010000.etb  20040000.funnel  2201d000.ptm  2203d000.etm
    root:/sys/bus/coresight/devices# ls 20010000.etb
    enable_sink  status  trigger_cntr
    root:/sys/bus/coresight/devices# echo 1 > 20010000.etb/enable_sink
    root:/sys/bus/coresight/devices# cat 20010000.etb/enable_sink
    1
    root:/sys/bus/coresight/devices#
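
Since several sinks can be enabled at once, it can help to check which ones
are currently selected before starting a capture. The sketch below walks the
bus directory and prints every device whose "enable_sink" entry reads back
as 1. cs_enabled_sinks is a hypothetical helper name, and the optional
argument (an alternate root directory) is an assumption made so the function
can be exercised away from real hardware.

```shell
# Print the name of every CoreSight device whose enable_sink entry reads "1".
# Devices without an enable_sink entry (sources, links) are skipped.
cs_enabled_sinks() (
    root=${1:-/sys/bus/coresight/devices}
    for attr in "$root"/*/enable_sink; do
        [ -r "$attr" ] || continue              # no sink-class device found
        if [ "$(cat "$attr")" = "1" ]; then
            dev=${attr%/enable_sink}            # drop the attribute name
            printf '%s\n' "${dev##*/}"          # keep only the device name
        fi
    done
)
```

After the session above, "cs_enabled_sinks" would be expected to print
20010000.etb.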

At boot time the current etm3x driver will configure the first address
comparator with "_stext" and "_etext", essentially tracing any instruction
that falls within that range. As such "enabling" a source will immediately
trigger a trace capture:

    root:/sys/bus/coresight/devices# echo 1 > 2201c000.ptm/enable_source
    root:/sys/bus/coresight/devices# cat 2201c000.ptm/enable_source
    1
    root:/sys/bus/coresight/devices# cat 20010000.etb/status
    Depth:          0x2000
    Status:         0x1
    RAM read ptr:   0x0
    RAM wrt ptr:    0x19d3   <----- The write pointer is moving
    Trigger cnt:    0x0
    Control:        0x1
    Flush status:   0x0
    Flush ctrl:     0x2001
    root:/sys/bus/coresight/devices#
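
The "status" entry is plain "label: value" text, so individual fields can be
pulled out with a few lines of shell. The sketch below assumes the labels
shown above ("Depth", "RAM wrt ptr", ...) and assumes the depth is counted in
32-bit words; both helper names are hypothetical.

```shell
# Print the value of one field of the status text read from stdin,
# e.g. "Depth" or "RAM wrt ptr". Anything after the first token of the
# value (such as an annotation) is ignored.
cs_status_field() {
    awk -F':' -v want="$1" '
        {
            label = $1
            gsub(/^[ \t]+|[ \t]+$/, "", label)   # trim blanks around the label
        }
        label == want {
            split($2, v, " ")                    # first token after the colon
            print v[1]
            exit
        }
    '
}

# Convert the reported depth to bytes, assuming 4 bytes per trace word.
cs_depth_bytes() {
    depth=$(cs_status_field "Depth")
    [ -n "$depth" ] || return 1                  # field not present
    echo $(( $depth * 4 ))
}
```

For the output above, "cat 20010000.etb/status | cs_status_field 'RAM wrt ptr'"
gives 0x19d3 and "cat 20010000.etb/status | cs_depth_bytes" gives 32768.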

Trace collection is stopped the same way:

    root:/sys/bus/coresight/devices# echo 0 > 2201c000.ptm/enable_source
    root:/sys/bus/coresight/devices#

The content of the ETB buffer can be harvested directly from /dev:

    root:/sys/bus/coresight/devices# dd if=/dev/20010000.etb \
    of=~/cstrace.bin

    64+0 records in
    64+0 records out
    32768 bytes (33 kB) copied, 0.00125258 s, 26.2 MB/s
    root:/sys/bus/coresight/devices#
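
The manual steps above (select a sink, enable a source, stop the source, read
the buffer) can be chained in a small wrapper. This is only a sketch under
stated assumptions: cs_capture and the CS_CAPTURE_SECONDS variable are
hypothetical names, the capture window is a plain sleep, and the sink is
assumed to expose a character device under /dev with the same name as its
sysfs entry, as the ETB does in the example.

```shell
# Usage: cs_capture SINK SOURCE OUTFILE [SYSFS_ROOT] [DEV_ROOT]
# Enables SINK, traces from SOURCE for CS_CAPTURE_SECONDS (default 1),
# then copies the sink buffer to OUTFILE.
cs_capture() (
    sink=$1
    src=$2
    out=$3
    sysfs=${4:-/sys/bus/coresight/devices}
    dev=${5:-/dev}

    echo 1 > "$sysfs/$sink/enable_sink"   || exit 1   # select the sink first
    echo 1 > "$sysfs/$src/enable_source"  || exit 1   # tracing starts here
    sleep "${CS_CAPTURE_SECONDS:-1}"                  # capture window
    echo 0 > "$sysfs/$src/enable_source"  || exit 1   # stop the trace
    dd if="$dev/$sink" of="$out" 2>/dev/null          # harvest the buffer
)
```

On the TC2 platform used above this would be called as
"cs_capture 20010000.etb 2201c000.ptm ~/cstrace.bin".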

The file cstrace.bin can be decompressed using "ptm2human", DS-5 or Trace32.

Following is a DS-5 output of an experimental loop that increments a variable up
to a certain value. The example is simple and yet provides a glimpse of the
wealth of possibilities that coresight provides.

    Info                    Tracing enabled
    Instruction  106378866  0x8026B53C  E52DE004  false  PUSH  {lr}
    Instruction  0          0x8026B540  E24DD00C  false  SUB   sp,sp,#0xc
    Instruction  0          0x8026B544  E3A03000  false  MOV   r3,#0
    Instruction  0          0x8026B548  E58D3004  false  STR   r3,[sp,#4]
    Instruction  0          0x8026B54C  E59D3004  false  LDR   r3,[sp,#4]
    Instruction  0          0x8026B550  E3530004  false  CMP   r3,#4
    Instruction  0          0x8026B554  E2833001  false  ADD   r3,r3,#1
    Instruction  0          0x8026B558  E58D3004  false  STR   r3,[sp,#4]
    Instruction  0          0x8026B55C  DAFFFFFA  true   BLE   {pc}-0x10 ; 0x8026b54c
    Timestamp               Timestamp: 17106715833
    Instruction  319        0x8026B54C  E59D3004  false  LDR   r3,[sp,#4]
    Instruction  0          0x8026B550  E3530004  false  CMP   r3,#4
    Instruction  0          0x8026B554  E2833001  false  ADD   r3,r3,#1
    Instruction  0          0x8026B558  E58D3004  false  STR   r3,[sp,#4]
    Instruction  0          0x8026B55C  DAFFFFFA  true   BLE   {pc}-0x10 ; 0x8026b54c
    Instruction  9          0x8026B54C  E59D3004  false  LDR   r3,[sp,#4]
    Instruction  0          0x8026B550  E3530004  false  CMP   r3,#4
    Instruction  0          0x8026B554  E2833001  false  ADD   r3,r3,#1
    Instruction  0          0x8026B558  E58D3004  false  STR   r3,[sp,#4]
    Instruction  0          0x8026B55C  DAFFFFFA  true   BLE   {pc}-0x10 ; 0x8026b54c
    Instruction  7          0x8026B54C  E59D3004  false  LDR   r3,[sp,#4]
    Instruction  0          0x8026B550  E3530004  false  CMP   r3,#4
    Instruction  0          0x8026B554  E2833001  false  ADD   r3,r3,#1
    Instruction  0          0x8026B558  E58D3004  false  STR   r3,[sp,#4]
    Instruction  0          0x8026B55C  DAFFFFFA  true   BLE   {pc}-0x10 ; 0x8026b54c
    Instruction  7          0x8026B54C  E59D3004  false  LDR   r3,[sp,#4]
    Instruction  0          0x8026B550  E3530004  false  CMP   r3,#4
    Instruction  0          0x8026B554  E2833001  false  ADD   r3,r3,#1
    Instruction  0          0x8026B558  E58D3004  false  STR   r3,[sp,#4]
    Instruction  0          0x8026B55C  DAFFFFFA  true   BLE   {pc}-0x10 ; 0x8026b54c
    Instruction  10         0x8026B54C  E59D3004  false  LDR   r3,[sp,#4]
    Instruction  0          0x8026B550  E3530004  false  CMP   r3,#4
    Instruction  0          0x8026B554  E2833001  false  ADD   r3,r3,#1
    Instruction  0          0x8026B558  E58D3004  false  STR   r3,[sp,#4]
    Instruction  0          0x8026B55C  DAFFFFFA  true   BLE   {pc}-0x10 ; 0x8026b54c
    Instruction  6          0x8026B560  EE1D3F30  false  MRC   p15,#0x0,r3,c13,c0,#1
    Instruction  0          0x8026B564  E1A0100D  false  MOV   r1,sp
    Instruction  0          0x8026B568  E3C12D7F  false  BIC   r2,r1,#0x1fc0
    Instruction  0          0x8026B56C  E3C2203F  false  BIC   r2,r2,#0x3f
    Instruction  0          0x8026B570  E59D1004  false  LDR   r1,[sp,#4]
    Instruction  0          0x8026B574  E59F0010  false  LDR   r0,[pc,#16] ; [0x8026B58C] = 0x80550368
    Instruction  0          0x8026B578  E592200C  false  LDR   r2,[r2,#0xc]
    Instruction  0          0x8026B57C  E59221D0  false  LDR   r2,[r2,#0x1d0]
    Instruction  0          0x8026B580  EB07A4CF  true   BL    {pc}+0x1e9344 ; 0x804548c4
    Info                    Tracing enabled
    Instruction  13570831   0x8026B584  E28DD00C  false  ADD   sp,sp,#0xc
    Instruction  0          0x8026B588  E8BD8000  true   LDM   sp!,{pc}
    Timestamp               Timestamp: 17107041535
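
A decoded listing like the one above lends itself to quick text processing.
As a rough sketch, the helper below counts how often a given address was
executed. It assumes the listing was saved as text with whitespace-separated
columns in the order shown (record type, count, address, opcode, flag,
disassembly); cs_addr_hits is a hypothetical name and the column layout of
other decoders may differ.

```shell
# Usage: cs_addr_hits ADDRESS < decoded-listing.txt
# Counts "Instruction" records whose address column matches ADDRESS,
# ignoring the case of the hexadecimal digits.
cs_addr_hits() {
    awk -v addr="$1" '
        $1 == "Instruction" && toupper($3) == toupper(addr) { n++ }
        END { print n + 0 }
    '
}
```

In the listing above the LDR at 0x8026B54C, the first instruction of the loop
body, appears six times, which matches a counter running from 0 through 5
before the BLE falls through.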

2) Using perf framework:

Coresight tracers are represented using the Perf framework's Performance
Monitoring Unit (PMU) abstraction. As such the perf framework takes charge of
controlling when tracing gets enabled based on when the process of interest is
scheduled. When configured in a system, Coresight PMUs will be listed when
queried by the perf command line tool:

    linaro@linaro-nano:~$ ./perf list pmu

    List of pre-defined events (to be used in -e):

      cs_etm//                                    [Kernel PMU event]

    linaro@linaro-nano:~$

Regardless of the number of tracers available in a system (usually equal to the
number of processor cores), the "cs_etm" PMU will be listed only once.

A Coresight PMU works the same way as any other PMU, i.e. the name of the PMU is
listed along with configuration options within forward slashes '/'. Since a
Coresight system will typically have more than one sink, the name of the sink to
work with needs to be specified as an event option. Names of the sinks to choose
from are listed in sysFS under ($SYSFS)/bus/coresight/devices:

    root@linaro-nano:~# ls /sys/bus/coresight/devices/
    20010000.etf   20040000.funnel      20100000.stm     22040000.etm
    22140000.etm   230c0000.funnel      23240000.etm     20030000.tpiu
    20070000.etr   20120000.replicator  220c0000.funnel
    23040000.etm   23140000.etm         23340000.etm

    root@linaro-nano:~# perf record -e cs_etm/@20070000.etr/u --per-thread program

The syntax within the forward slashes '/' is important. The '@' character
tells the parser that a sink is about to be specified and that this is the sink
to use for the trace session.
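
Because a mistyped sink name only shows up once perf tries to start the
session, the event string can be assembled by a helper that first checks the
name against sysFS. The sketch below is an illustration only: cs_perf_event
is a hypothetical name, and treating "has an enable_sink entry" as the test
for a sink is an assumption drawn from the sysFS section of this document.

```shell
# Usage: cs_perf_event SINK [SYSFS_ROOT]
# Prints "cs_etm/@SINK/u" if SINK exists and exposes enable_sink,
# otherwise prints a diagnostic on stderr and fails.
cs_perf_event() (
    sink=$1
    root=${2:-/sys/bus/coresight/devices}
    if [ ! -e "$root/$sink/enable_sink" ]; then
        echo "cs_perf_event: $sink is not a sink under $root" >&2
        exit 1
    fi
    # "@" introduces the sink name; the trailing "u" is the perf event
    # modifier restricting the event to user space.
    printf 'cs_etm/@%s/u\n' "$sink"
)
```

It would be used as:
perf record -e "$(cs_perf_event 20070000.etr)" --per-thread program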

More information on the above and other examples of how to use Coresight with
the perf tools can be found in the "HOWTO.md" file of the openCSD GitHub
repository [3].

2.1) AutoFDO analysis using the perf tools:

perf can be used to record and analyze trace of programs.

Execution can be recorded using 'perf record' with the cs_etm event,
specifying the name of the sink to record to, e.g.:

    perf record -e cs_etm/@20070000.etr/u --per-thread

The 'perf report' and 'perf script' commands can be used to analyze execution,
synthesizing instruction and branch events from the instruction trace.
'perf inject' can be used to replace the trace data with the synthesized events.
The --itrace option controls the type and frequency of synthesized events
(see perf documentation).

Note that only 64-bit programs are currently supported - further work is
required to support instruction decode of 32-bit Arm programs.


Generating coverage files for Feedback Directed Optimization: AutoFDO
---------------------------------------------------------------------

'perf inject' accepts the --itrace option, in which case tracing data is
removed and replaced with the synthesized events, e.g.

    perf inject --itrace --strip -i perf.data -o perf.data.new

Below is an example of using ARM ETM for autoFDO. It requires autofdo
(https://github.com/google/autofdo) and gcc version 5. The bubble
sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial).

    $ gcc-5 -O3 sort.c -o sort
    $ taskset -c 2 ./sort
    Bubble sorting array of 30000 elements
    5910 ms

    $ perf record -e cs_etm/@20070000.etr/u --per-thread taskset -c 2 ./sort
    Bubble sorting array of 30000 elements
    12543 ms
    [ perf record: Woken up 35 times to write data ]
    [ perf record: Captured and wrote 69.640 MB perf.data ]

    $ perf inject -i perf.data -o inj.data --itrace=il64 --strip
    $ create_gcov --binary=./sort --profile=inj.data --gcov=sort.gcov -gcov_version=1
    $ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
    $ taskset -c 2 ./sort_autofdo
    Bubble sorting array of 30000 elements
    5806 ms
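
The three timings above can be put in perspective with a little arithmetic.
The helpers below are a sketch with hypothetical names; they only restate the
numbers printed in the session above, which come from single runs and should
not be read as a benchmark.

```shell
# cs_ratio A B: print A divided by B with two decimals.
cs_ratio() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# cs_pct_gain OLD NEW: print by how many percent NEW is smaller than OLD.
cs_pct_gain() {
    LC_ALL=C awk -v o="$1" -v n="$2" 'BEGIN { printf "%.2f\n", (o - n) * 100 / o }'
}
```

"cs_ratio 12543 5910" prints 2.12, i.e. the run recorded under perf took
roughly twice as long as the untraced one, and "cs_pct_gain 5910 5806" prints
1.76, the percentage by which the AutoFDO build was faster in this run.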

How to use the STM module
-------------------------

Using the System Trace Macrocell module is the same as the tracers - the only
difference is that clients are driving the trace capture rather
than the program flow through the code.

As with any other CoreSight component, specifics about the STM tracer can be
found in sysfs with more information on each entry being found in [1]:

    root@genericarmv8:~# ls /sys/bus/coresight/devices/20100000.stm
    enable_source   hwevent_select  port_enable     subsystem       uevent
    hwevent_enable  mgmt            port_select     traceid
    root@genericarmv8:~#

Like any other source a sink needs to be identified and the STM enabled before
being used:

    root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/20010000.etf/enable_sink
    root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/20100000.stm/enable_source

From there user space applications can request and use channels using the devfs
interface provided for that purpose by the generic STM API:

    root@genericarmv8:~# ls -l /dev/20100000.stm
    crw-------    1 root     root       10,  61 Jan  3 18:11 /dev/20100000.stm
    root@genericarmv8:~#

Details on how to use the generic STM API can be found here [2].

[1]. Documentation/ABI/testing/sysfs-bus-coresight-devices-stm
[2]. Documentation/trace/stm.rst
  343. [3]. https://github.com/Linaro/perf-opencsd