| .. _performance-counters: |
| |
| Performance Counters |
| ==================== |
| |
| Ibex implements performance counters according to the RISC-V Privileged Specification, version 1.11 (see Hardware Performance Monitor, Section 3.1.11). |
| The performance counters are placed inside the Control and Status Registers (CSRs) and can be accessed with the ``CSRRW(I)`` and ``CSRRS/C(I)`` instructions. |
| |
| Ibex implements the clock cycle counter ``mcycle(h)``, the retired instruction counter ``minstret(h)``, as well as the 29 event counters ``mhpmcounter3(h)`` - ``mhpmcounter31(h)`` and the corresponding event selector CSRs ``mhpmevent3`` - ``mhpmevent31``, and the ``mcountinhibit`` CSR to individually enable/disable the counters. |
| ``mcycle(h)`` and ``minstret(h)`` are always available and 64 bit wide. |
| The ``mhpmcounter`` performance counters are optional (unavailable by default) and parametrizable in width. |
| |
| Event Selector |
| -------------- |
| |
| The following events can be monitored using the performance counters of Ibex. |
| |
| +--------------+------------------+---------------------------------------------------------+ |
| | Event ID/Bit | Event Name | Event Description | |
| +==============+==================+=========================================================+ |
| | 0 | NumCycles | Number of cycles | |
| +--------------+------------------+---------------------------------------------------------+ |
| | 2 | NumInstrRet | Number of instructions retired | |
| +--------------+------------------+---------------------------------------------------------+ |
| | 3 | NumCyclesLSU | Number of cycles waiting for data memory | |
| +--------------+------------------+---------------------------------------------------------+ |
| | 4 | NumCyclesIF | Cycles waiting for instruction fetches, i.e., number of | |
| | | | instructions wasted due to non-ideal caching | |
| +--------------+------------------+---------------------------------------------------------+ |
| | 5 | NumLoads | Number of data memory loads. Misaligned accesses are | |
| | | | counted as two accesses | |
| +--------------+------------------+---------------------------------------------------------+ |
| | 6 | NumStores | Number of data memory stores. Misaligned accesses are | |
| | | | counted as two accesses | |
| +--------------+------------------+---------------------------------------------------------+ |
| | 7 | NumJumps | Number of unconditional jumps (j, jal, jr, jalr) | |
| +--------------+------------------+---------------------------------------------------------+ |
| | 8 | NumBranches | Number of branches (conditional) | |
| +--------------+------------------+---------------------------------------------------------+ |
| | 9 | NumBranchesTaken | Number of taken branches (conditional) | |
| +--------------+------------------+---------------------------------------------------------+ |
| | 10 | NumInstrRetC | Number of compressed instructions retired | |
| +--------------+------------------+---------------------------------------------------------+ |
| | 11 | NumCyclesMulWait | Cycles waiting for multiply to complete | |
| +--------------+------------------+---------------------------------------------------------+ |
| | 12 | NumCyclesDivWait | Cycles waiting for divide to complete | |
| +--------------+------------------+---------------------------------------------------------+ |
| |
| The event selector CSRs ``mhpmevent3`` - ``mhpmevent31`` define which of these events are counted by the event counters ``mhpmcounter3(h)`` - ``mhpmcounter31(h)``. |
| If a specific bit in an event selector CSR is set to 1, this means that events with this ID are being counted by the counter associated with that selector CSR. |
| If an event selector CSR is 0, this means that the corresponding counter is not counting any event. |
| |
| Controlling the counters from software |
| -------------------------------------- |
| |
| By default, all available counters are enabled after reset. |
| They can be individually enabled/disabled by overwriting the corresponding bit in the ``mcountinhibit`` CSR at address ``0x320`` as described in the RISC-V Privileged Specification, version 1.11 (see Machine Counter-Inhibit CSR, Section 3.1.13). |
| In particular, to enable/disable ``mcycle(h)``, bit 0 must be written. For ``minstret(h)``, it is bit 2. For event counter ``mhpmcounterX(h)``, it is bit X. |
| |
| The lower 32 bits of all counters can be accessed through the base register, whereas the upper 32 bits are accessed through the ``h``-register. |
| Reads to all these registers are non-destructive. |
| |
| Parametrization at synthesis time |
| --------------------------------- |
| |
| The ``mcycle(h)`` and ``minstret(h)`` counters are always available and 64 bit wide. |
| |
| The event counters ``mhpmcounter3(h)`` - ``mhpmcounter31(h)`` are parametrizable. |
| Their width can be parametrized between 1 and 64 bit through the ``WidthMHPMCounters`` parameter, which defaults to 40 bit wide counters. |
| |
| The number of available event counters ``mhpmcounterX(h)`` can be controlled via the ``NumMHPMCounters`` parameter. |
| By default (``NumMHPMCounters`` set to 0), no counters are available to software. |
| Set ``NumMHPMCounters`` to a value between 1 and 8 to make the counters ``mhpmcounter3(h)`` - ``mhpmcounter10(h)`` available as listed below. |
| Setting ``NumMHPMCounters`` to values larger than 8 does not result in any more performance counters. |
| |
| Unavailable counters always read 0. |
| |
| The association of events with the ``mphmcounter`` registers is hardwired as listed in the following table. |
| |
| +----------------------+----------------+--------------+------------------+ |
| | Event Counter | CSR Address | Event ID/Bit | Event Name | |
| +======================+================+==============+==================+ |
| | ``mcycle(h)`` | 0xB00 (0xB80) | 0 | NumCycles | |
| +----------------------+----------------+--------------+------------------+ |
| | ``minstret(h)`` | 0xB02 (0xB82) | 2 | NumInstrRet | |
| +----------------------+----------------+--------------+------------------+ |
| | ``mhpmcounter3(h)`` | 0xB03 (0xB83) | 3 | NumCyclesLSU | |
| +----------------------+----------------+--------------+------------------+ |
| | ``mhpmcounter4(h)`` | 0xB04 (0xB84) | 4 | NumCyclesIF | |
| | | | | | |
| +----------------------+----------------+--------------+------------------+ |
| | ``mhpmcounter5(h)`` | 0xB05 (0xB85) | 5 | NumLoads | |
| | | | | | |
| +----------------------+----------------+--------------+------------------+ |
| | ``mhpmcounter6(h)`` | 0xB06 (0xB86) | 6 | NumStores | |
| | | | | | |
| +----------------------+----------------+--------------+------------------+ |
| | ``mhpmcounter7(h)`` | 0xB07 (0xB87) | 7 | NumJumps | |
| +----------------------+----------------+--------------+------------------+ |
| | ``mhpmcounter8(h)`` | 0xB08 (0xB88) | 8 | NumBranches | |
| +----------------------+----------------+--------------+------------------+ |
| | ``mhpmcounter9(h)`` | 0xB09 (0xB89) | 9 | NumBranchesTaken | |
| +----------------------+----------------+--------------+------------------+ |
| | ``mhpmcounter10(h)`` | 0xB0A (0xB8A) | 10 | NumInstrRetC | |
| +----------------------+----------------+--------------+------------------+ |
| | ``mhpmcounter11(h)`` | 0xB0B (0xB8B) | 11 | NumCyclesMulWait | |
| +----------------------+----------------+--------------+------------------+ |
| | ``mhpmcounter12(h)`` | 0xB0C (0xB8C) | 12 | NumCyclesDivWait | |
| +----------------------+----------------+--------------+------------------+ |
| |
| Similarly, the event selector CSRs are hardwired as follows. |
| The remaining event selector CSRs are tied to 0, i.e., no events are counted by the corresponding counters. |
| |
| +----------------------+-------------+-------------+--------------+ |
| | Event Selector | CSR Address | Reset Value | Event ID/Bit | |
| +======================+=============+=============+==============+ |
| | ``mhpmevent3(h)`` | 0x323 | 0x0000_0008 | 3 | |
| +----------------------+-------------+-------------+--------------+ |
| | ``mhpmevent4(h)`` | 0x324 | 0x0000_0010 | 4 | |
| +----------------------+-------------+-------------+--------------+ |
| | ``mhpmevent5(h)`` | 0x325 | 0x0000_0020 | 5 | |
| +----------------------+-------------+-------------+--------------+ |
| | ``mhpmevent6(h)`` | 0x326 | 0x0000_0040 | 6 | |
| +----------------------+-------------+-------------+--------------+ |
| | ``mhpmevent7(h)`` | 0x327 | 0x0000_0080 | 7 | |
| +----------------------+-------------+-------------+--------------+ |
| | ``mhpmevent8(h)`` | 0x328 | 0x0000_0100 | 8 | |
| +----------------------+-------------+-------------+--------------+ |
| | ``mhpmevent9(h)`` | 0x329 | 0x0000_0200 | 9 | |
| +----------------------+-------------+-------------+--------------+ |
| | ``mhpmevent10(h)`` | 0x32A | 0x0000_0400 | 10 | |
| +----------------------+-------------+-------------+--------------+ |
| | ``mhpmevent11(h)`` | 0x32B | 0x0000_0800 | 11 | |
| +----------------------+-------------+-------------+--------------+ |
| | ``mhpmevent12(h)`` | 0x32C | 0x0000_1000 | 12 | |
| +----------------------+-------------+-------------+--------------+ |
| |
| FPGA Targets |
| ------------ |
| |
| For FPGA targets the performance counters constitute a particularily large structure. |
| Implementing the maximum 29 event counters 32, 48 and 64 bit wide results in relative logic utilizations of the core of 100%, 111% and 129% respectively. |
| The relative numbers of flip-flops are 100%, 125% and 150%. |
| It is recommended to implement event counters of 32 bit width where possible. |
| |
| For Xilinx FPGA devices featuring the `DSP48E1` DSP slice or similar, counter logic can be absorbed into the DSP slice for widths up to 48 bits. |
| The resulting relative logic utilizations with respect to the non-DSP 32 bit counter implementation are 83% and 89% respectively for 32 and 48 bit DSP counters. |
| This comes at the expense of 1 DSP slice per counter. |
| For 32 bit counters only, the corresponding flip-flops can be incorporated into the DSP's output pipeline register, resulting in a reduction of the number of flip-flops to 50%. |
| In order to infer DSP slices for performance counters, define the preprocessor variable ``FPGA_XILINX``. |