Build Overview

The Zephyr build process can be divided into two main phases: a configuration phase (driven by CMake) and a build phase (driven by Make or Ninja).

Configuration Phase

The configuration phase begins when the user invokes CMake, specifying a source application directory and a board target.

Zephyr's build configuration phase

CMake begins by processing the CMakeLists.txt file in the application directory, which refers to the CMakeLists.txt file in the Zephyr top-level directory, which in turn refers to CMakeLists.txt files throughout the build tree (directly and indirectly). Its primary output is a set of Makefiles or Ninja files to drive the build process, but the CMake scripts also do some processing of their own:

Devicetree

*.dts (devicetree source) and *.dtsi (devicetree source include) files are collected from the target’s architecture, SoC, board, and application directories.

*.dtsi files are included by *.dts files via the C preprocessor (often abbreviated cpp, which should not be confused with C++). The C preprocessor is also used to merge in any devicetree *.overlay files, and to expand macros in *.dts, *.dtsi, and *.overlay files.

The preprocessed devicetree sources (stored in *.dts.pre.tmp) are parsed by gen_defines.py to generate a devicetree_unfixed.h header with preprocessor macros.

As a debugging aid, gen_defines.py writes the final devicetree to zephyr.dts. This file is just for reference. It is not used anywhere.

The dtc devicetree compiler also gets run on the preprocessed devicetree sources to catch any extra warnings and errors generated by it. The output from dtc is unused otherwise.

The above is just a brief overview. For more information on devicetree, see Devicetree Guide.

Devicetree fixups

Files named dts_fixup.h from the target’s architecture, SoC, board, and application directories are concatenated into a single devicetree_fixups.h file. dts_fixup.h files are used to rename generated macros to names expected by the source code.

Source code accesses preprocessor macros generated from devicetree by including the devicetree.h header, which includes devicetree_unfixed.h and devicetree_fixups.h.

Kconfig

Kconfig files define available configuration options for for the target architecture, SoC, board, and application, as well as dependencies between options.

Kconfig configurations are stored in configuration files. The initial configuration is generated by merging configuration fragments from the board and application (e.g. prj.conf).

The output from Kconfig is an autoconf.h header with preprocessor assignments, and a .config file that acts both as a saved configuration and as configuration output (used by CMake).

Information from devicetree is available to Kconfig, through the functions defined in kconfigfunctions.py.

See the Kconfig section of the manual for more information.

Build Phase

The build phase begins when the user invokes make or ninja. Its ultimate output is a complete Zephyr application in a format suitable for loading/flashing on the desired target board (zephyr.elf, zephyr.hex, etc.) The build phase can be broken down, conceptually, into four stages: the pre-build, first-pass binary, final binary, and post-processing.

Pre-build occurs before any source files are compiled, because during this phase header files used by the source files are generated.

Pre-build

Offset generation

Access to high-level data structures and members is sometimes required when the definitions of those structures is not immediately accessible (e.g., assembly language). The generation of offsets.h (by gen_offset_header.py) facilitates this.

System call boilerplate

The gen_syscall.py and parse_syscalls.py scripts work together to bind potential system call functions with their implementations.

Zephyr's build stage I

First-pass binary

Compilation proper begins with the first-pass binary. Source files (C and assembly) are collected from various subsystems (which ones is decided during the configuration phase), and compiled into archives (with reference to header files in the tree, as well as those generated during the configuration phase and the pre-build stage).

If memory protection is enabled, then:

Partition grouping

The gen_app_partitions.py script scans all the generated archives and outputs linker scripts to ensure that application partitions are properly grouped and aligned for the target’s memory protection hardware.

Then cpp is used to combine linker script fragments from the target’s architecture/SoC, the kernel tree, optionally the partition output if memory protection is enabled, and any other fragments selected during the configuration process, into a linker.cmd file. The compiled archives are then linked with ld as specified in the linker.cmd.

In some configurations, this is the final binary, and the next stage is skipped.

Zephyr's build stage II

Final binary

In some configurations, the binary from the previous stage is incomplete, with empty and/or placeholder sections that must be filled in by, essentially, reflection. When User Mode is enabled:

Kernel object hashing

The gen_kobject_list.py scans the ELF DWARF debug data to find the address of the all kernel objects. This list is passed to gperf, which generates a perfect hash function and table of those addresses, then that output is optimized by process_gperf.py, using known properties of our special case.

Then, the link from the previous stage is repeated, this time with the missing pieces populated.

Zephyr's build stage III

Post processing

Finally, if necessary, the completed kernel is converted from ELF to the format expected by the loader and/or flash tool required by the target. This is accomplished in a straightforward manner with objdump.

Zephyr's build final stage

Supporting Scripts and Tools

The following is a detailed description of the scripts used during the build process.

scripts/gen_syscalls.py

Script to generate system call invocation macros

This script parses the system call metadata JSON file emitted by parse_syscalls.py to create several files:

  • A file containing weak aliases of any potentially unimplemented system calls, as well as the system call dispatch table, which maps system call type IDs to their handler functions.

  • A header file defining the system call type IDs, as well as function prototypes for all system call handler functions.

  • A directory containing header files. Each header corresponds to a header that was identified as containing system call declarations. These generated headers contain the inline invocation functions for each system call in that header.

scripts/gen_kobject_list.py

Script to generate gperf tables of kernel object metadata

User mode threads making system calls reference kernel objects by memory address, as the kernel/driver APIs in Zephyr are the same for both user and supervisor contexts. It is necessary for the kernel to be able to validate accesses to kernel objects to make the following assertions:

  • That the memory address points to a kernel object

  • The kernel object is of the expected type for the API being invoked

  • The kernel object is of the expected initialization state

  • The calling thread has sufficient permissions on the object

For more details see the Kernel Objects section in the documentation.

The zephyr build generates an intermediate ELF binary, zephyr_prebuilt.elf, which this script scans looking for kernel objects by examining the DWARF debug information to look for instances of data structures that are considered kernel objects. For device drivers, the API struct pointer populated at build time is also examined to disambiguate between various device driver instances since they are all ‘struct device’.

This script can generate five different output files:

  • A gperf script to generate the hash table mapping kernel object memory addresses to kernel object metadata, used to track permissions, object type, initialization state, and any object-specific data.

  • A header file containing generated macros for validating driver instances inside the system call handlers for the driver subsystem APIs.

  • A code fragment included by kernel.h with one enum constant for each kernel object type and each driver instance.

  • The inner cases of a switch/case C statement, included by kernel/userspace.c, mapping the kernel object types and driver instances to their human-readable representation in the otype_to_str() function.

  • The inner cases of a switch/case C statement, included by kernel/userspace.c, mapping kernel object types to their sizes. This is used for allocating instances of them at runtime (CONFIG_DYNAMIC_OBJECTS) in the obj_size_get() function.

scripts/gen_offset_header.py

This script scans a specified object file and generates a header file that defined macros for the offsets of various found structure members (particularly symbols ending with _OFFSET or _SIZEOF), primarily intended for use in assembly code.

scripts/parse_syscalls.py

Script to scan Zephyr include directories and emit system call and subsystem metadata

System calls require a great deal of boilerplate code in order to implement completely. This script is the first step in the build system’s process of auto-generating this code by doing a text scan of directories containing C or header files, and building up a database of system calls and their function call prototypes. This information is emitted to a generated JSON file for further processing.

This script also scans for struct definitions such as __subsystem and __net_socket, emitting a JSON dictionary mapping tags to all the struct declarations found that were tagged with them.

If the output JSON file already exists, its contents are checked against what information this script would have outputted; if the result is that the file would be unchanged, it is not modified to prevent unnecessary incremental builds.

arch/x86/gen_idt.py

Generate Interrupt Descriptor Table for x86 CPUs.

This script generates the interrupt descriptor table (IDT) for x86. Please consult the IA Architecture SW Developer Manual, volume 3, for more details on this data structure.

This script accepts as input the zephyr_prebuilt.elf binary, which is a link of the Zephyr kernel without various build-time generated data structures (such as the IDT) inserted into it. This kernel image has been properly padded such that inserting these data structures will not disturb the memory addresses of other symbols. From the kernel binary we read a special section “intList” which contains the desired interrupt routing configuration for the kernel, populated by instances of the IRQ_CONNECT() macro.

This script outputs three binary tables:

  1. The interrupt descriptor table itself.

  2. A bitfield indicating which vectors in the IDT are free for installation of dynamic interrupts at runtime.

  3. An array which maps configured IRQ lines to their associated vector entries in the IDT, used to program the APIC at runtime.

arch/x86/gen_gdt.py

Generate a Global Descriptor Table (GDT) for x86 CPUs.

For additional detail on GDT and x86 memory management, please consult the IA Architecture SW Developer Manual, vol. 3.

This script accepts as input the zephyr_prebuilt.elf binary, which is a link of the Zephyr kernel without various build-time generated data structures (such as the GDT) inserted into it. This kernel image has been properly padded such that inserting these data structures will not disturb the memory addresses of other symbols.

The input kernel ELF binary is used to obtain the following information:

  • Memory addresses of the Main and Double Fault TSS structures so GDT descriptors can be created for them

  • Memory addresses of where the GDT lives in memory, so that this address can be populated in the GDT pseudo descriptor

  • whether userspace or HW stack protection are enabled in Kconfig

The output is a GDT whose contents depend on the kernel configuration. With no memory protection features enabled, we generate flat 32-bit code and data segments. If hardware- based stack overflow protection or userspace is enabled, we additionally create descriptors for the main and double- fault IA tasks, needed for userspace privilege elevation and double-fault handling. If userspace is enabled, we also create flat code/data segments for ring 3 execution.

scripts/gen_relocate_app.py

This script will relocate .text, .rodata, .data and .bss sections from required files and places it in the required memory region. This memory region and file are given to this python script in the form of a string.

Example of such a string would be:

SRAM2:/home/xyz/zephyr/samples/hello_world/src/main.c,\
SRAM1:/home/xyz/zephyr/samples/hello_world/src/main2.c

To invoke this script:

python3 gen_relocate_app.py -i input_string -o generated_linker -c generated_code

Configuration that needs to be sent to the python script.

  • If the memory is like SRAM1/SRAM2/CCD/AON then place full object in the sections

  • If the memory type is appended with _DATA / _TEXT/ _RODATA/ _BSS only the selected memory is placed in the required memory region. Others are ignored.

Multiple regions can be appended together like SRAM2_DATA_BSS this will place data and bss inside SRAM2.

scripts/process_gperf.py

gperf C file post-processor

We use gperf to build up a perfect hashtable of pointer values. The way gperf does this is to create a table ‘wordlist’ indexed by a string representation of a pointer address, and then doing memcmp() on a string passed in for comparison

We are exclusively working with 4-byte pointer values. This script adjusts the generated code so that we work with pointers directly and not strings. This saves a considerable amount of space.