- Coresight - HW Assisted Tracing on ARM
- ======================================
- Author: Mathieu Poirier <mathieu.poirier@linaro.org>
- Date: September 11th, 2014
- Introduction
- ------------
- Coresight is an umbrella of technologies allowing for the debugging of ARM
- based SoC. It includes solutions for JTAG and HW assisted tracing. This
- document is concerned with the latter.
- HW assisted tracing is becoming increasingly useful when dealing with systems
- that have many SoCs and other components like GPU and DMA engines. ARM has
- developed a HW assisted tracing solution by means of different components, each
- being added to a design at synthesis time to cater to specific tracing needs.
- Components are generally categorised as sources, links and sinks and are
- (usually) discovered using the AMBA bus.
- "Sources" generate a compressed stream representing the processor instruction
- path based on tracing scenarios as configured by users. From there the stream
- flows through the coresight system (via the ATB bus) using links that connect
- the emanating source to one or more sinks. Sinks serve as endpoints to the coresight
- implementation, either storing the compressed stream in a memory buffer or
- creating an interface to the outside world where data can be transferred to a
- host without fear of filling up the onboard coresight memory buffer.
- A typical coresight system would look like this:
-   *****************************************************************
-  **************************** AMBA AXI  ****************************===||
-   *****************************************************************    ||
-         ^                    ^                            |            ||
-         |                    |                            *            **
-      0000000    :::::     0000000    :::::    :::::    @@@@@@@    ||||||||||||
-      0 CPU 0<-->: C :     0 CPU 0<-->: C :    : C :    @ STM @    || System ||
-   |->0000000    : T :  |->0000000    : T :    : T :<--->@@@@@     || Memory ||
-   |  #######<-->: I :  |  #######<-->: I :    : I :      @@@<-|   ||||||||||||
-   |  # ETM #    :::::  |  # PTM #    :::::    :::::       @   |
-   |   #####      ^ ^   |   #####      ^ !      ^ !        .   |   |||||||||
-   | |->###       | !   | |->###       | !      | !        .   |   || DAP ||
-   | |   #        | !   | |   #        | !      | !        .   |   |||||||||
-   | |   .        | !   | |   .        | !      | !        .   |      |  |
-   | |   .        | !   | |   .        | !      | !        .   |      |  *
-   | |   .        | !   | |   .        | !      | !        .   |      | SWD/
-   | |   .        | !   | |   .        | !      | !        .   |      | JTAG
-   *****************************************************************<-|
-  *************************** AMBA Debug APB ************************
-   *****************************************************************
-    |    .          !         .          !        !        .    |
-    |    .          *         .          *        *        .    |
-   *****************************************************************
-  ******************** Cross Trigger Matrix (CTM) *******************
-   *****************************************************************
-    |    .     ^              .                            .    |
-    |    *     !              *                            *    |
-   *****************************************************************
-  ****************** AMBA Advanced Trace Bus (ATB) ******************
-   *****************************************************************
-    |          !                        ===============         |
-    |          *                         ===== F =====<---------|
-    |   :::::::::                         ==== U ====
-    |-->:: CTI ::<!!                       === N ===
-    |   :::::::::  !                        == N ==
-    |    ^         *                        == E ==
-    |    !  &&&&&&&&&       IIIIIII         == L ==
-    |------>&& ETB &&<......II     I        =======
-    |    !  &&&&&&&&&       II     I           .
-    |    !                    I     I          .
-    |    !                    I REP I<..........
-    |    !                    I     I
-    |    !!>&&&&&&&&&       II     I           *Source: ARM ltd.
-    |------>& TPIU  &<......II    I            DAP = Debug Access Port
-            &&&&&&&&&       IIIIIII            ETM = Embedded Trace Macrocell
-              ;                                PTM = Program Trace Macrocell
-              ;                                CTI = Cross Trigger Interface
-              *                                ETB = Embedded Trace Buffer
-         To trace port                         TPIU= Trace Port Interface Unit
-                                               SWD = Serial Wire Debug
- While on-target configuration of the components is done via the APB bus,
- all trace data is carried out-of-band on the ATB bus. The CTM provides
- a way to aggregate and distribute signals between CoreSight components.
- The coresight framework provides a central point to represent, configure and
- manage coresight devices on a platform. This first implementation centers on
- the basic tracing functionality, enabling components such as ETM/PTM, funnel,
- replicator, TMC, TPIU and ETB. Future work will enable more
- intricate IP blocks such as STM and CTI.
- Acronyms and Classification
- ---------------------------
- Acronyms:
- PTM: Program Trace Macrocell
- ETM: Embedded Trace Macrocell
- STM: System Trace Macrocell
- ETB: Embedded Trace Buffer
- ITM: Instrumentation Trace Macrocell
- TPIU: Trace Port Interface Unit
- TMC-ETR: Trace Memory Controller, configured as Embedded Trace Router
- TMC-ETF: Trace Memory Controller, configured as Embedded Trace FIFO
- CTI: Cross Trigger Interface
- Classification:
- Source:
- ETMv3.x, ETMv4, PTMv1.0, PTMv1.1, STM, STM500, ITM
- Link:
- Funnel, replicator (intelligent or not), TMC-ETF
- Sinks:
- ETBv1.0, ETBv1.1, TPIU, TMC-ETR, TMC-ETF
- Misc:
- CTI
- Device Tree Bindings
- --------------------
- See Documentation/devicetree/bindings/arm/coresight.txt for details.
- As of this writing drivers for ITM, STMs and CTIs are not provided but are
- expected to be added as the solution matures.
- Framework and implementation
- ----------------------------
- The coresight framework provides a central point to represent, configure and
- manage coresight devices on a platform. Any coresight compliant device can
- register with the framework as long as it uses the right APIs:
- struct coresight_device *coresight_register(struct coresight_desc *desc);
- void coresight_unregister(struct coresight_device *csdev);
- The registering function takes a "struct coresight_desc *desc" and registers
- the device with the core framework. The unregister function takes a reference
- to a "struct coresight_device *csdev" obtained at registration time.
- If everything goes well during the registration process the new devices will
- show up under /sys/bus/coresight/devices, as shown here for a TC2 platform:
- root:~# ls /sys/bus/coresight/devices/
- replicator 20030000.tpiu 2201c000.ptm 2203c000.etm 2203e000.etm
- 20010000.etb 20040000.funnel 2201d000.ptm 2203d000.etm
- root:~#
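- A quick way to check that every expected component registered is to count the
- entries by kind. The sketch below is not part of the framework: the helper name
- "cs_count_by_kind" is hypothetical, and it assumes the "<address>.<kind>" naming
- seen in the listing above.

```shell
# Hypothetical helper: count registered CoreSight devices by kind.
# Assumes entries are named "<address>.<kind>" (or just "<kind>"), as in the
# TC2 listing; other platforms may name devices differently.
# $1 = devices directory (defaults to the real sysfs location)
cs_count_by_kind() {
    root=${1:-/sys/bus/coresight/devices}
    # Strip everything up to the last dot, then tally identical suffixes.
    ls "$root" | sed 's/^.*\.//' | sort | uniq -c | awk '{ print $2, $1 }'
}
```

- On the TC2 listing above this would report one etb, three etm, one funnel, two
- ptm, one replicator and one tpiu.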
- The registering function takes a "struct coresight_desc", which looks like this:
- struct coresight_desc {
- enum coresight_dev_type type;
- struct coresight_dev_subtype subtype;
- const struct coresight_ops *ops;
- struct coresight_platform_data *pdata;
- struct device *dev;
- const struct attribute_group **groups;
- };
- The "coresight_dev_type" identifies what the device is, i.e., source, link or
- sink, while the "coresight_dev_subtype" characterises that type further.
- The "struct coresight_ops" is mandatory and tells the framework how to
- perform base operations related to the components, each component having
- a different set of requirements. For that purpose "struct coresight_ops_sink",
- "struct coresight_ops_link" and "struct coresight_ops_source" have been
- provided.
- The next field, "struct coresight_platform_data *pdata" is acquired by calling
- "of_get_coresight_platform_data()", as part of the driver's _probe routine and
- "struct device *dev" gets the device reference embedded in the "amba_device":
- static int etm_probe(struct amba_device *adev, const struct amba_id *id)
- {
- ...
- ...
- drvdata->dev = &adev->dev;
- ...
- }
- Each class of device (source, link, or sink) has generic operations
- that can be performed on it (see "struct coresight_ops"). The
- "**groups" field is a list of sysfs entries pertaining to operations
- specific to that component only. "Implementation defined" customisations are
- expected to be accessed and controlled using those entries.
- How to use the tracer modules
- -----------------------------
- There are two ways to use the Coresight framework: 1) using the perf cmd line
- tools and 2) interacting directly with the Coresight devices using the sysFS
- interface. Preference is given to the former as using the sysFS interface
- requires a deep understanding of the Coresight HW. The following sections
- provide details on using both methods.
- 1) Using the sysFS interface:
- Before trace collection can start, a coresight sink needs to be identified.
- There is no limit on the number of sinks (nor sources) that can be enabled at
- any given moment. As a generic operation, all devices pertaining to the sink
- class will have an "enable_sink" entry in sysfs:
- root:/sys/bus/coresight/devices# ls
- replicator 20030000.tpiu 2201c000.ptm 2203c000.etm 2203e000.etm
- 20010000.etb 20040000.funnel 2201d000.ptm 2203d000.etm
- root:/sys/bus/coresight/devices# ls 20010000.etb
- enable_sink status trigger_cntr
- root:/sys/bus/coresight/devices# echo 1 > 20010000.etb/enable_sink
- root:/sys/bus/coresight/devices# cat 20010000.etb/enable_sink
- 1
- root:/sys/bus/coresight/devices#
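- The same steps can be wrapped in a small function for use in scripts. This is
- only a sketch: "cs_enable_sink" is a hypothetical name, and the optional second
- argument exists so the function can be pointed at a directory other than the
- real sysfs tree.

```shell
# Hypothetical helper: enable a CoreSight sink and confirm the setting stuck.
# $1 = device name, e.g. 20010000.etb
# $2 = devices directory (defaults to the real sysfs location)
# Note: plain POSIX sh, so the variables below are global.
cs_enable_sink() {
    dev=$1
    root=${2:-/sys/bus/coresight/devices}
    node="$root/$dev/enable_sink"

    # Only devices of the sink class expose this entry.
    if [ ! -e "$node" ]; then
        echo "$dev: no enable_sink entry, probably not a sink" >&2
        return 1
    fi

    echo 1 > "$node" || return 1

    # Read the value back, as the session above does with cat.
    [ "$(cat "$node")" = "1" ]
}
```

- For example, "cs_enable_sink 20010000.etb" mirrors the echo/cat pair above and
- returns non-zero if the device has no "enable_sink" entry.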
- At boot time the current etm3x driver will configure the first address
- comparator with "_stext" and "_etext", essentially tracing any instruction
- that falls within that range. As such "enabling" a source will immediately
- trigger a trace capture:
- root:/sys/bus/coresight/devices# echo 1 > 2201c000.ptm/enable_source
- root:/sys/bus/coresight/devices# cat 2201c000.ptm/enable_source
- 1
- root:/sys/bus/coresight/devices# cat 20010000.etb/status
- Depth: 0x2000
- Status: 0x1
- RAM read ptr: 0x0
- RAM wrt ptr: 0x19d3 <----- The write pointer is moving
- Trigger cnt: 0x0
- Control: 0x1
- Flush status: 0x0
- Flush ctrl: 0x2001
- root:/sys/bus/coresight/devices#
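- Checking that the write pointer moves can also be scripted. The sketch below
- assumes the "status" layout shown above ("RAM wrt ptr:" followed by a hex value);
- both function names are hypothetical.

```shell
# Print the "RAM wrt ptr" value from an ETB status file.
# Assumes the line format shown in the session above.
# $1 = path to the status file
cs_etb_wrt_ptr() {
    sed -n 's/^RAM wrt ptr:[[:space:]]*\(0x[0-9a-fA-F]*\).*$/\1/p' "$1"
}

# Succeed if the write pointer changes between two reads.
# $1 = path to the status file, $2 = delay in seconds (default 1)
# Caveat: the buffer wraps, so two identical readings do not strictly prove
# that tracing is stopped.
cs_etb_is_moving() {
    status=$1
    delay=${2:-1}
    before=$(cs_etb_wrt_ptr "$status")
    sleep "$delay"
    after=$(cs_etb_wrt_ptr "$status")
    [ -n "$before" ] && [ "$before" != "$after" ]
}
```

- With a source enabled, "cs_etb_is_moving 20010000.etb/status" would be expected
- to succeed; after the source is disabled it should fail.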
- Trace collection is stopped the same way:
- root:/sys/bus/coresight/devices# echo 0 > 2201c000.ptm/enable_source
- root:/sys/bus/coresight/devices#
- The content of the ETB buffer can be harvested directly from /dev:
- root:/sys/bus/coresight/devices# dd if=/dev/20010000.etb \
- of=~/cstrace.bin
- 64+0 records in
- 64+0 records out
- 32768 bytes (33 kB) copied, 0.00125258 s, 26.2 MB/s
- root:/sys/bus/coresight/devices#
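- The size of the dump can be cross-checked against the "Depth" value reported in
- the status entry. The sketch below assumes that "Depth" counts 32-bit words,
- which is consistent with the numbers above (0x2000 words * 4 = 32768 bytes);
- the function names are hypothetical.

```shell
# Convert an ETB "Depth" value to bytes.
# Assumption: the depth is expressed in 32-bit (4-byte) words.
# $1 = depth, decimal or 0x-prefixed hex
cs_etb_depth_bytes() {
    echo $(( $1 * 4 ))
}

# Succeed if a dump file is exactly as large as the buffer depth implies.
# $1 = dump file, $2 = depth as reported by the status entry
cs_etb_check_dump() {
    size=$(( $(wc -c < "$1") ))
    [ "$size" -eq "$(cs_etb_depth_bytes "$2")" ]
}
```

- For the session above, "cs_etb_depth_bytes 0x2000" prints 32768, matching the
- byte count reported by dd.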
- The file cstrace.bin can be decompressed using "ptm2human", DS-5 or Trace32.
- Following is a DS-5 output of an experimental loop that increments a variable up
- to a certain value. The example is simple and yet provides a glimpse of the
- wealth of possibilities that coresight provides.
- Info Tracing enabled
- Instruction 106378866 0x8026B53C E52DE004 false PUSH {lr}
- Instruction 0 0x8026B540 E24DD00C false SUB sp,sp,#0xc
- Instruction 0 0x8026B544 E3A03000 false MOV r3,#0
- Instruction 0 0x8026B548 E58D3004 false STR r3,[sp,#4]
- Instruction 0 0x8026B54C E59D3004 false LDR r3,[sp,#4]
- Instruction 0 0x8026B550 E3530004 false CMP r3,#4
- Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1
- Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4]
- Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c
- Timestamp Timestamp: 17106715833
- Instruction 319 0x8026B54C E59D3004 false LDR r3,[sp,#4]
- Instruction 0 0x8026B550 E3530004 false CMP r3,#4
- Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1
- Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4]
- Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c
- Instruction 9 0x8026B54C E59D3004 false LDR r3,[sp,#4]
- Instruction 0 0x8026B550 E3530004 false CMP r3,#4
- Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1
- Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4]
- Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c
- Instruction 7 0x8026B54C E59D3004 false LDR r3,[sp,#4]
- Instruction 0 0x8026B550 E3530004 false CMP r3,#4
- Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1
- Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4]
- Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c
- Instruction 7 0x8026B54C E59D3004 false LDR r3,[sp,#4]
- Instruction 0 0x8026B550 E3530004 false CMP r3,#4
- Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1
- Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4]
- Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c
- Instruction 10 0x8026B54C E59D3004 false LDR r3,[sp,#4]
- Instruction 0 0x8026B550 E3530004 false CMP r3,#4
- Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1
- Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4]
- Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c
- Instruction 6 0x8026B560 EE1D3F30 false MRC p15,#0x0,r3,c13,c0,#1
- Instruction 0 0x8026B564 E1A0100D false MOV r1,sp
- Instruction 0 0x8026B568 E3C12D7F false BIC r2,r1,#0x1fc0
- Instruction 0 0x8026B56C E3C2203F false BIC r2,r2,#0x3f
- Instruction 0 0x8026B570 E59D1004 false LDR r1,[sp,#4]
- Instruction 0 0x8026B574 E59F0010 false LDR r0,[pc,#16] ; [0x8026B58C] = 0x80550368
- Instruction 0 0x8026B578 E592200C false LDR r2,[r2,#0xc]
- Instruction 0 0x8026B57C E59221D0 false LDR r2,[r2,#0x1d0]
- Instruction 0 0x8026B580 EB07A4CF true BL {pc}+0x1e9344 ; 0x804548c4
- Info Tracing enabled
- Instruction 13570831 0x8026B584 E28DD00C false ADD sp,sp,#0xc
- Instruction 0 0x8026B588 E8BD8000 true LDM sp!,{pc}
- Timestamp Timestamp: 17107041535
- 2) Using perf framework:
- Coresight tracers are represented using the Perf framework's Performance
- Monitoring Unit (PMU) abstraction. As such the perf framework takes charge of
- controlling when tracing gets enabled based on when the process of interest is
- scheduled. When configured in a system, Coresight PMUs will be listed when
- queried by the perf command line tool:
- linaro@linaro-nano:~$ ./perf list pmu
- List of pre-defined events (to be used in -e):
- cs_etm// [Kernel PMU event]
- linaro@linaro-nano:~$
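- When the perf tool is not at hand, the presence of the PMU can also be checked
- from sysfs, since registered PMUs appear under /sys/bus/event_source/devices.
- A minimal sketch, with a hypothetical function name:

```shell
# Succeed if the cs_etm PMU is registered with the perf core.
# $1 = event source directory (defaults to the real sysfs location)
cs_has_etm_pmu() {
    root=${1:-/sys/bus/event_source/devices}
    [ -d "$root/cs_etm" ]
}
```

- A script can then guard its tracing steps with
- "cs_has_etm_pmu || echo 'cs_etm PMU not available'".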
- Regardless of the number of tracers available in a system (usually equal to the
- number of processor cores), the "cs_etm" PMU will be listed only once.
- A Coresight PMU works the same way as any other PMU, i.e. the name of the PMU is
- listed along with configuration options within forward slashes '/'. Since a
- Coresight system will typically have more than one sink, the name of the sink to
- work with needs to be specified as an event option. Names of the sinks to choose
- from are listed in sysFS under ($SYSFS)/bus/coresight/devices:
- root@linaro-nano:~# ls /sys/bus/coresight/devices/
- 20010000.etf 20040000.funnel 20100000.stm 22040000.etm
- 22140000.etm 230c0000.funnel 23240000.etm 20030000.tpiu
- 20070000.etr 20120000.replicator 220c0000.funnel
- 23040000.etm 23140000.etm 23340000.etm
- root@linaro-nano:~# perf record -e cs_etm/@20070000.etr/u --per-thread program
- The syntax within the forward slashes '/' is important. The '@' character
- tells the parser that a sink is about to be specified and that this is the sink
- to use for the trace session.
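- Picking a sink and building the event string can be scripted as follows. This
- is a sketch only: the function names are hypothetical, and a device is treated
- as a sink when it exposes an "enable_sink" entry in sysfs.

```shell
# List the devices that expose an enable_sink entry, one per line.
# $1 = devices directory (defaults to the real sysfs location)
cs_list_sinks() {
    root=${1:-/sys/bus/coresight/devices}
    for node in "$root"/*/enable_sink; do
        [ -e "$node" ] || continue      # the glob matched nothing
        dev=${node%/enable_sink}
        echo "${dev##*/}"
    done
}

# Print the perf event string for user-space tracing into a given sink.
# $1 = sink name, e.g. 20070000.etr
# $2 = devices directory (defaults to the real sysfs location)
cs_etm_event() {
    sink=$1
    root=${2:-/sys/bus/coresight/devices}
    if [ ! -e "$root/$sink/enable_sink" ]; then
        echo "$sink: not a sink under $root" >&2
        return 1
    fi
    printf 'cs_etm/@%s/u\n' "$sink"
}
```

- The result can be passed straight to perf, e.g.
- perf record -e "$(cs_etm_event 20070000.etr)" --per-thread program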
- More information on the above and other examples of how to use Coresight with
- the perf tools can be found in the "HOWTO.md" file of the openCSD GitHub
- repository [3].
- 2.1) AutoFDO analysis using the perf tools:
- perf can be used to record and analyze trace of programs.
- Execution can be recorded using 'perf record' with the cs_etm event,
- specifying the name of the sink to record to, e.g.:
- perf record -e cs_etm/@20070000.etr/u --per-thread
- The 'perf report' and 'perf script' commands can be used to analyze execution,
- synthesizing instruction and branch events from the instruction trace.
- 'perf inject' can be used to replace the trace data with the synthesized events.
- The --itrace option controls the type and frequency of synthesized events
- (see perf documentation).
- Note that only 64-bit programs are currently supported - further work is
- required to support instruction decode of 32-bit Arm programs.
- Generating coverage files for Feedback Directed Optimization: AutoFDO
- ---------------------------------------------------------------------
- 'perf inject' accepts the --itrace option, in which case tracing data is
- removed and replaced with the synthesized events, e.g.:
- perf inject --itrace --strip -i perf.data -o perf.data.new
- Below is an example of using ARM ETM for autoFDO. It requires autofdo
- (https://github.com/google/autofdo) and gcc version 5. The bubble
- sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial).
- $ gcc-5 -O3 sort.c -o sort
- $ taskset -c 2 ./sort
- Bubble sorting array of 30000 elements
- 5910 ms
- $ perf record -e cs_etm/@20070000.etr/u --per-thread taskset -c 2 ./sort
- Bubble sorting array of 30000 elements
- 12543 ms
- [ perf record: Woken up 35 times to write data ]
- [ perf record: Captured and wrote 69.640 MB perf.data ]
- $ perf inject -i perf.data -o inj.data --itrace=il64 --strip
- $ create_gcov --binary=./sort --profile=inj.data --gcov=sort.gcov -gcov_version=1
- $ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
- $ taskset -c 2 ./sort_autofdo
- Bubble sorting array of 30000 elements
- 5806 ms
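- The timings above can be put in perspective with a little arithmetic. The
- helpers below are illustrative only (hypothetical names) and use the
- millisecond figures printed by the sort program.

```shell
# Percentage change from $1 to $2; a negative result means $2 is faster.
pct_change() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", (b - a) / a * 100 }'
}

# Ratio of $2 to $1, e.g. the slowdown of a traced run over an untraced one.
time_ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", b / a }'
}
```

- "pct_change 5910 5806" prints -1.76, i.e. the AutoFDO build is about 1.76%
- faster in this run, while "time_ratio 5910 12543" prints 2.12, i.e. the run
- under 'perf record' took a little over twice as long as the untraced one.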
- How to use the STM module
- -------------------------
- Using the System Trace Macrocell module works the same way as using the
- tracers; the only difference is that the trace capture is driven by clients
- rather than by the program flow through the code.
- As with any other CoreSight component, specifics about the STM tracer can be
- found in sysfs with more information on each entry being found in [1]:
- root@genericarmv8:~# ls /sys/bus/coresight/devices/20100000.stm
- enable_source hwevent_select port_enable subsystem uevent
- hwevent_enable mgmt port_select traceid
- root@genericarmv8:~#
- As with any other source, a sink needs to be identified and the STM enabled
- before use:
- root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/20010000.etf/enable_sink
- root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/20100000.stm/enable_source
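- The two writes can be combined so that the sink is always enabled first and is
- switched back off if enabling the STM fails. A sketch under those assumptions;
- "cs_stm_start" is a hypothetical name.

```shell
# Hypothetical helper: enable a sink, then the STM feeding it.
# $1 = sink name, e.g. 20010000.etf
# $2 = STM name,  e.g. 20100000.stm
# $3 = devices directory (defaults to the real sysfs location)
cs_stm_start() {
    sink=$1
    stm=$2
    root=${3:-/sys/bus/coresight/devices}

    # Check both entries before touching either device.
    if [ ! -e "$root/$sink/enable_sink" ]; then
        echo "$sink: no enable_sink entry" >&2
        return 1
    fi
    if [ ! -e "$root/$stm/enable_source" ]; then
        echo "$stm: no enable_source entry" >&2
        return 1
    fi

    echo 1 > "$root/$sink/enable_sink" || return 1

    # Undo the sink setting if the source cannot be enabled.
    if ! echo 1 > "$root/$stm/enable_source"; then
        echo 0 > "$root/$sink/enable_sink"
        return 1
    fi
}
```

- "cs_stm_start 20010000.etf 20100000.stm" is then equivalent to the two echo
- commands above.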
- From there user space applications can request and use channels using the devfs
- interface provided for that purpose by the generic STM API:
- root@genericarmv8:~# ls -l /dev/20100000.stm
- crw------- 1 root root 10, 61 Jan 3 18:11 /dev/20100000.stm
- root@genericarmv8:~#
- Details on how to use the generic STM API can be found here [2].
- [1]. Documentation/ABI/testing/sysfs-bus-coresight-devices-stm
- [2]. Documentation/trace/stm.rst
- [3]. https://github.com/Linaro/perf-opencsd