Optimization Tools

Footprint and Memory Usage

The build system offers 3 targets to view and analyse RAM, ROM and stack usage in generated images. The tools run on the final image and give information about size of symbols and code being used in both RAM and ROM. Additionally, with features available through the compiler, we can also generate worst-case stack usage analysis:

Tools that are available as build system targets:

Build Target: puncover

This target uses a 3rd party tools called puncover which can be found here. When this target is built, it will launch a local web server which will allow you to open a web client and browse the files and view their ROM, RAM and stack usage. Before you can use this target, you will have to install the puncover python module:

pip3 install git+https://github.com/HBehrens/puncover --user

Then:

Using west:

west build -b reel_board samples/hello_world
west build -t puncover

Using CMake and ninja:

# Use cmake to configure a Ninja-based buildsystem:
cmake -B build -GNinja -DBOARD=reel_board samples/hello_world

# Now run ninja on the generated build system:
ninja -C build puncover

To view worst-case stack usage analysis, build this with the CONFIG_STACK_USAGE enabled.

Using west:

west build -b reel_board samples/hello_world -- -DCONFIG_STACK_USAGE=y
west build -t puncover

Using CMake and ninja:

# Use cmake to configure a Ninja-based buildsystem:
cmake -B build -GNinja -DBOARD=reel_board -DCONFIG_STACK_USAGE=y samples/hello_world

# Now run ninja on the generated build system:
ninja -C build puncover

Build Target: ram_report

List all compiled objects and their RAM usage in a tabular form with bytes per symbol and the percentage it uses. The data is grouped based on the file system location of the object in the tree and the file containing the symbol.

Use the ram_report target with your board:

Using west:

west build -b reel_board samples/hello_world
west build -t ram_report

Using CMake and ninja:

# Use cmake to configure a Ninja-based buildsystem:
cmake -B build -GNinja -DBOARD=reel_board samples/hello_world

# Now run ninja on the generated build system:
ninja -C build ram_report

which will generate something similar to the output below:

Path                                                                                    Size       %
==============================================================================================================
...
...
SystemCoreClock                                                                          4     0.08%
_kernel                                                                                 48     0.99%
_sw_isr_table                                                                          384     7.94%
cli.10544                                                                               16     0.33%
gpio_initialized.9765                                                                    1     0.02%
on.10543                                                                                 4     0.08%
poll_out_lock.9764                                                                       4     0.08%
z_idle_threads                                                                         128     2.65%
z_interrupt_stacks                                                                    2048    42.36%
z_main_thread                                                                          128     2.65%
arch                                                                                       1     0.02%
arm                                                                                      1     0.02%
    core                                                                                   1     0.02%
    aarch32                                                                              1     0.02%
        cortex_m                                                                           1     0.02%
        mpu                                                                              1     0.02%
            arm_mpu.c                                                                      1     0.02%
            static_regions_num                                                           1     0.02%
drivers                                                                                  536    11.09%
clock_control                                                                          100     2.07%
    nrf_power_clock.c                                                                    100     2.07%
    __device_clock_nrf                                                                  16     0.33%
    data                                                                                80     1.65%
    hfclk_users                                                                          4     0.08%
...
...

Build Target: rom_report

List all compiled objects and their ROM usage in a tabular form with bytes per symbol and the percentage it uses. The data is grouped based on the file system location of the object in the tree and the file containing the symbol.

Use the rom_report to get the ROM report:

Using west:

west build -b reel_board samples/hello_world
west build -t rom_report

Using CMake and ninja:

# Use cmake to configure a Ninja-based buildsystem:
cmake -B build -GNinja -DBOARD=reel_board samples/hello_world

# Now run ninja on the generated build system:
ninja -C build rom_report

which will generate something similar to the output below:

Path                                                                                    Size       %
==============================================================================================================
...
...
CSWTCH.5                                                                                 4     0.02%
SystemCoreClock                                                                          4     0.02%
__aeabi_idiv0                                                                            2     0.01%
__udivmoddi4                                                                           702     3.37%
_sw_isr_table                                                                          384     1.85%
delay_machine_code.9114                                                                  6     0.03%
levels.8826                                                                             20     0.10%
mpu_config                                                                               8     0.04%
transitions.10558                                                                       12     0.06%
arch                                                                                    1194     5.74%
arm                                                                                   1194     5.74%
    core                                                                                1194     5.74%
    aarch32                                                                           1194     5.74%
        cortex_m                                                                         852     4.09%
        fault.c                                                                        400     1.92%
            bus_fault.isra.0                                                              60     0.29%
            mem_manage_fault.isra.0                                                       56     0.27%
            usage_fault.isra.0                                                            36     0.17%
            z_arm_fault                                                                  232     1.11%
            z_arm_fault_init                                                              16     0.08%
        irq_init.c                                                                      24     0.12%
            z_arm_interrupt_init                                                          24     0.12%
        mpu                                                                            352     1.69%
            arm_core_mpu.c                                                                56     0.27%
            z_arm_configure_static_mpu_regions                                          56     0.27%
            arm_mpu.c                                                                    296     1.42%
            __init_sys_init_arm_mpu_init0                                                8     0.04%
            arm_core_mpu_configure_static_mpu_regions                                   20     0.10%
            arm_core_mpu_disable                                                        16     0.08%
            arm_core_mpu_enable                                                         20     0.10%
            arm_mpu_init                                                                92     0.44%
            mpu_configure_regions                                                      140     0.67%
        thread_abort.c                                                                  76     0.37%
            z_impl_k_thread_abort
            76     0.37%
...
...

Data Structures

Build Target: pahole

Poke-a-hole (pahole) is an object-file analysis tool to find the size of the data structures, and the holes caused due to aligning the data elements to the word-size of the CPU by the compiler.

Poke-a-hole (pahole) must be installed prior to using this target. It can be obtained from https://git.kernel.org/pub/scm/devel/pahole/pahole.git and is available in the dwarves package in both fedora and ubuntu:

sudo apt-get install dwarves

or in fedora:

sudo dnf install dwarves

Using west:

west build -b reel_board samples/hello_world
west build -t pahole

Using CMake and ninja:

# Use cmake to configure a Ninja-based buildsystem:
cmake -B build -GNinja -DBOARD=reel_board samples/hello_world

# Now run ninja on the generated build system:
ninja -C build pahole

After running this target, pahole will output the results to the console:

/* Used at: zephyr/isr_tables.c */
/* <80> ../include/sw_isr_table.h:30 */
struct _isr_table_entry {
        void *                     arg;                  /*     0     4 */
        void                       (*isr)(void *);       /*     4     4 */

        /* size: 8, cachelines: 1, members: 2 */
        /* last cacheline: 8 bytes */
};
/* Used at: zephyr/isr_tables.c */
/* <eb> ../include/arch/arm/aarch32/cortex_m/mpu/arm_mpu_v7m.h:134 */
struct arm_mpu_region_attr {
        uint32_t                   rasr;                 /*     0     4 */

        /* size: 4, cachelines: 1, members: 1 */
        /* last cacheline: 4 bytes */
};
/* Used at: zephyr/isr_tables.c */
/* <112> ../include/arch/arm/aarch32/cortex_m/mpu/arm_mpu.h:24 */
struct arm_mpu_region {
        uint32_t                   base;                 /*     0     4 */
        const char  *              name;                 /*     4     4 */
        arm_mpu_region_attr_t      attr;                 /*     8     4 */

        /* size: 12, cachelines: 1, members: 3 */
        /* last cacheline: 12 bytes */
};
...
...