nRF5340 Audio overview and firmware architecture

The application can work as a gateway or a headset. The gateway receives audio data from external sources (USB or I2S) and forwards it to one or more headsets. The headset is a receiver device that plays back the audio it gets from the gateway. It is also possible to enable a bidirectional mode, in which one gateway simultaneously sends audio to and receives audio from one or two headsets.

Both device types use the same code base but different firmware, and you need both types of devices to test the application. Gateways and headsets can both run in one of the available application modes, either in the connected isochronous stream (CIS) mode or in the broadcast isochronous stream (BIS) mode. The CIS mode is the default mode of the application.

Changing configuration related to the device type and the application modes requires rebuilding the firmware and reprogramming the development kits.

Regardless of the configuration, the application handles the audio data in the following manner:

  1. The gateway receives audio data from the audio source over USB or I2S.

  2. The gateway processes the audio data in its application core, which channels the data through the application layers:

    1. Audio data is sent to the synchronization module (I2S-based firmware) or directly to the software codec (USB-based firmware).

    2. Audio data is encoded by the software codec.

    3. Encoded audio data is sent to the Bluetooth LE Host.

  3. The host sends the encoded audio data to the LE Audio Controller Subsystem for nRF53 on the network core.

  4. The subsystem forwards the audio data to the hardware radio and sends it to the headset devices, as per the LE Audio specifications.

  5. The headsets receive the encoded audio data on their hardware radio on the network core side.

  6. The LE Audio Controller Subsystem for nRF53 running on each of the headsets sends the encoded audio data to the Bluetooth LE Host on that headset’s application core.

  7. The headsets process the audio data in their application cores, which channel the data through the application layers:

    1. Audio data is sent to the stream control module and placed in a FIFO buffer.

    2. Audio data is sent from the FIFO buffer to the synchronization module (headsets only use I2S-based firmware).

    3. Audio data is decoded by the software codec.

  8. Decoded audio data is sent to the hardware audio output over I2S.

In the I2S-based firmware for the gateway and headsets, sending the audio data through the application layers includes a mandatory synchronization step using the synchronization module. This proprietary module ensures that the audio is played back at the same time on all headsets and at the correct speed. For more information, see Synchronization module overview.
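
The following is a minimal sketch of the encode-and-send step of this pipeline on the gateway (step 2 above). Both function prototypes are hypothetical stand-ins introduced for illustration; the real module APIs in the application differ:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical entry points standing in for the application's audio and
     * Bluetooth modules; the real function names and signatures differ. */
    int sw_codec_encode(const int16_t *pcm, size_t pcm_size,
                        uint8_t **coded, size_t *coded_size);
    int le_audio_frame_send(const uint8_t *coded, size_t coded_size);

    /* One iteration of the gateway TX path: take a PCM frame from USB or from
     * the synchronization module (I2S), compress it with the software codec,
     * and hand the result to the Bluetooth LE Host. */
    static void encode_and_send(const int16_t *pcm_frame, size_t pcm_size)
    {
            uint8_t *coded;
            size_t coded_size;

            if (sw_codec_encode(pcm_frame, pcm_size, &coded, &coded_size) == 0) {
                    le_audio_frame_send(coded, coded_size);
            }
    }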

Application modes

The application can work either in the connected isochronous stream (CIS) mode or in the broadcast isochronous stream (BIS) mode, depending on the chosen firmware configuration.

CIS and BIS mode overview

Connected Isochronous Stream (CIS)

CIS is a bidirectional communication protocol that allows for sending separate connected audio streams from a source device to one or more receivers. The gateway can send the audio data using both the left and the right ISO channels at the same time, allowing for stereophonic sound reproduction with synchronized playback.

This is the default configuration of the nRF5340 Audio application. In this configuration, you can use the nRF5340 Audio development kit in the role of the gateway, the left headset, or the right headset.

In the current version of the nRF5340 Audio application, the CIS mode offers both unidirectional and bidirectional communication. In bidirectional communication, the headset device also sends audio from its on-board PDM microphone. See Selecting the CIS bidirectional communication in the application description for more information.

You can also enable a walkie-talkie demonstration, in which the gateway device sends audio from the on-board PDM microphone instead of using USB or the line-in. See Enabling the walkie-talkie demo in the application description for more information.

Broadcast Isochronous Stream (BIS)

BIS is a unidirectional communication protocol that allows for broadcasting one or more audio streams from a source device to an unlimited number of receivers that are not connected to the source.

In this configuration, you can use the nRF5340 Audio development kit in the role of the gateway or as one of the headsets. To test BIS with multiple receiving headsets, use additional nRF5340 Audio development kits.

Note

In the BIS mode, you can use any number of nRF5340 Audio development kits as receivers.

The audio quality is the same in both modes, although the processing time for stereo can be longer.
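
Because the mode is fixed at build time, mode-specific behavior in the code is typically selected with Kconfig-driven preprocessor branches. The following sketch assumes the CONFIG_TRANSPORT_CIS and CONFIG_TRANSPORT_BIS symbols and two hypothetical start functions; verify the exact names against the application's Kconfig and sources:

    /* Hypothetical entry points; the real ones live in the application's
     * Bluetooth stream modules. */
    int broadcast_source_start(void);
    int unicast_client_start(void);

    static int stream_start(void)
    {
    #if defined(CONFIG_TRANSPORT_BIS)
            /* Broadcast: stream to any number of sinks without a connection. */
            return broadcast_source_start();
    #else
            /* CIS (default): connect to one or two headsets before streaming. */
            return unicast_client_start();
    #endif
    }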

Firmware architecture

The following figure illustrates the software layout for the nRF5340 Audio application:

nRF5340 Audio high-level design (overview)

The network core of the nRF5340 SoC runs the LE Audio Controller Subsystem for nRF53, which is included in the LE Audio controller for nRF5340 library’s HEX file. This subsystem is custom-made for the application. It is responsible for receiving the audio stream data from hardware layers and forwarding the data to the Bluetooth LE host on the application core. The subsystem implements the lower layers of the Bluetooth Low Energy software stack and follows the LE Audio specification requirements.

The application core runs both the Bluetooth LE Host from Zephyr and the application layer. The application layer is composed of a series of modules from different sources. These modules include the following major ones:

  • Peripheral modules from the nRF Connect SDK:

    • I2S

    • USB

    • SPI

    • TWI/I2C

    • UART (debug)

    • Timer

    • LC3 encoder/decoder

  • Application-specific Bluetooth modules for handling the Bluetooth connection:

    • Management - This module handles scanning and advertising, in addition to general initialization, controller configuration, and transfer of DFU images.

    • Stream - This module handles the setup and transfer of audio in the Bluetooth LE Audio context. It includes submodules for CIS (unicast) and BIS (broadcast).

    • Renderer - This module handles rendering, such as volume up and down.

    • Content Control - This module handles content control, such as play and pause.

  • Application-specific custom modules:

    • Stream Control - This module handles events from the Bluetooth modules and buttons, receives audio from one module, and forwards the audio data to the next module.

      • Currently, each of the four main device types uses a separate stream control file:

        • CIS gateway (unicast client) - streamctrl_unicast_client.c

        • CIS headset (unicast server) - streamctrl_unicast_server.c

        • BIS gateway (broadcast source) - streamctrl_broadcast_source.c

        • BIS headset (broadcast sink) - streamctrl_broadcast_sink.c

    • FIFO buffers

    • Synchronization module (part of I2S-based firmware for gateway and headsets) - See Synchronization module overview for more information.

Since the application architecture is uniform and the firmware code is shared, the set of audio modules in use depends on the chosen stream mode (BIS or CIS), the chosen audio inputs and outputs (USB or analog jack), and whether the gateway or the headset configuration is selected.

Note

In the current version of the application, the bootloader is disabled by default. Device Firmware Update (DFU) can only be enabled when building and programming using the script (see Building and programming using script). See Configuring FOTA upgrades for details.

Communications between modules

Communication between modules is primarily done through Zephyr’s message bus (zbus) to keep the dependencies between modules to a minimum. Each of the buses used by the application has its message structure described in nrf5340_audio_common.h.

The application uses the following buses:

  • le_audio_chan - For handling LE Audio events from the Bluetooth stream modules, specifically unicast_client.c, unicast_server.c, broadcast_source.c, and broadcast_sink.c.

  • button_chan - For handling button events from button_handler.c.

  • bt_mgmt_chan - For handling ACL events from bt_mgmt.c.

  • volume_chan - For handling volume events from bt_rend.c.

  • cont_media_chan - For handling media events from content_ctrl.c.

The consumer functions for each of these buses reside, for the most part, in the stream control files. volume_chan is an exception, with its consumer functions residing directly in hw_codec.c. The linking of producers and consumers is done in the stream control files.
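
To illustrate the pattern, the following is a minimal, self-contained zbus sketch in the spirit of the application's channels. The struct button_msg layout, queue depth, and thread parameters are assumptions made for illustration; the application's real message structs are described in nrf5340_audio_common.h:

    #include <zephyr/kernel.h>
    #include <zephyr/zbus/zbus.h>

    /* Illustrative message type; the application's real structs differ. */
    struct button_msg {
            uint32_t button_pin;
    };

    /* Channel definition, as the producer side (button_handler.c) might do it. */
    ZBUS_CHAN_DEFINE(button_chan,          /* channel name */
                     struct button_msg,    /* message type */
                     NULL,                 /* no validator */
                     NULL,                 /* no user data */
                     ZBUS_OBSERVERS_EMPTY, /* observers are linked at runtime */
                     ZBUS_MSG_INIT(0)      /* initial message value */
    );

    /* Producer: publish a button event on the channel. */
    static void on_button_press(uint32_t pin)
    {
            struct button_msg msg = {.button_pin = pin};

            zbus_chan_pub(&button_chan, &msg, K_NO_WAIT);
    }

    /* Consumer, as a stream control file might implement it: a subscriber
     * wakes this thread, which then reads the most recent message. Producers
     * and consumers are linked at runtime with zbus_chan_add_obs(). */
    ZBUS_SUBSCRIBER_DEFINE(button_sub, 4);

    static void button_msg_thread(void *arg1, void *arg2, void *arg3)
    {
            const struct zbus_channel *chan;
            struct button_msg msg;

            while (zbus_sub_wait(&button_sub, &chan, K_FOREVER) == 0) {
                    if (chan == &button_chan) {
                            zbus_chan_read(&button_chan, &msg, K_MSEC(100));
                            /* Dispatch on msg.button_pin here. */
                    }
            }
    }

    K_THREAD_DEFINE(button_msg_tid, 1024, button_msg_thread,
                    NULL, NULL, NULL, 5, 0, 0);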

USB-based firmware for gateway

The following figure shows an overview of the modules currently included in the firmware that uses USB:

nRF5340 Audio modules on the gateway using USB

In this firmware design, no synchronization module is used after decoding the incoming frames or before encoding the outgoing ones. The Bluetooth LE RX FIFO is mainly used to make decoding run in a separate thread.
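
The following sketch shows the decoupling described above: the Bluetooth RX path only enqueues encoded frames, and a dedicated thread dequeues and decodes them, so decoding never blocks the Bluetooth host. The queue depth, frame size, stack size, and the sw_codec_decode() entry point are illustrative assumptions:

    #include <stdbool.h>
    #include <string.h>
    #include <zephyr/kernel.h>
    #include <zephyr/sys/util.h>

    #define FRAME_MAX_SIZE 120 /* hypothetical maximum encoded frame size */

    K_MSGQ_DEFINE(ble_rx_fifo, FRAME_MAX_SIZE, 5, 4);

    /* Bluetooth RX context: copy the frame into the queue and return. */
    static void on_encoded_frame(const uint8_t *frame, size_t len)
    {
            uint8_t buf[FRAME_MAX_SIZE] = {0};

            memcpy(buf, frame, MIN(len, sizeof(buf)));
            k_msgq_put(&ble_rx_fifo, buf, K_NO_WAIT); /* drop frame on overflow */
    }

    /* Decode thread: block on the FIFO and decode frames as they arrive. */
    static void decode_thread(void *arg1, void *arg2, void *arg3)
    {
            uint8_t frame[FRAME_MAX_SIZE];

            while (true) {
                    k_msgq_get(&ble_rx_fifo, frame, K_FOREVER);
                    /* sw_codec_decode(frame, ...); hypothetical decoder call */
            }
    }

    K_THREAD_DEFINE(decode_tid, 2048, decode_thread, NULL, NULL, NULL, 5, 0, 0);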

I2S-based firmware for gateway and headsets

The following figure shows an overview of the modules currently included in the firmware that uses I2S:

nRF5340 Audio modules on the gateway and the headsets using I2S

The Bluetooth LE RX FIFO is mainly used to make audio_datapath.c (synchronization module) run in a separate thread. After encoding the audio data received from I2S, the frames are sent by the encoder thread using a function located in streamctrl_unicast_client.c, streamctrl_unicast_server.c, streamctrl_broadcast_source.c, or streamctrl_broadcast_sink.c.

Synchronization module overview

The synchronization module (audio_datapath.c) handles audio synchronization. To synchronize the audio, it executes the following types of adjustments:

  • Presentation compensation

  • Drift compensation

The presentation compensation makes all the headsets play audio at the same time, even if the packets containing the audio frames are not received at the same time on the different headsets. In practice, it moves the audio data blocks in the FIFO forward or backward a few blocks, adding blocks of silence when needed.

The drift compensation adjusts the frequency of the audio clock, and thereby the speed at which the audio is played. This is required in the CIS mode, where the gateway and headsets must keep the audio playback synchronized to provide True Wireless Stereo (TWS) audio playback. It applies larger adjustments at the start and then continuous small adjustments. This compensation counters the drift caused by differences in the frequencies of the quartz crystal oscillators used in the development kits: the kits use these oscillators to generate a stable clock frequency, but the frequency of each crystal always differs slightly. The drift compensation makes the inter-IC sound (I2S) interface on the headsets run at the same pace as the Bluetooth packet reception, which prevents I2S overruns and underruns in both the CIS mode and the BIS mode.
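
As a rough illustration of the drift compensation principle, the sketch below nudges the nRF5340's audio PLL (HFCLKAUDIO) in proportion to a measured timing error. The nominal register value and the gain are placeholders, and the real controller in audio_datapath.c is a tuned, multi-state implementation rather than this single proportional step:

    #include <stdint.h>
    #include <nrfx_clock.h>

    /* Placeholder values: look up the HFCLKAUDIO FREQ_VALUE matching your I2S
     * master clock rate in the nRF5340 documentation, and tune the gain. */
    #define HFCLKAUDIO_NOMINAL_FREQ_VALUE 39854
    #define DRIFT_COMP_GAIN               2

    /* err_us is the measured offset between the Bluetooth timestamps (sdu_ref)
     * and the local frame presentation times. A positive error means local
     * playback runs ahead, so the audio clock is slowed down slightly. */
    static void drift_comp_adjust(int32_t err_us)
    {
            int32_t freq_value = HFCLKAUDIO_NOMINAL_FREQ_VALUE -
                                 (err_us * DRIFT_COMP_GAIN);

            nrfx_clock_hfclkaudio_config_set((uint16_t)freq_value);
    }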

See the following figure for an overview of the synchronization module.

nRF5340 Audio synchronization module overview

Both synchronization methods use the SDU reference timestamps (sdu_ref) as the reference variable. If the device is a gateway that uses I2S as its audio source and the stream is unidirectional (gateway to headsets), sdu_ref is continuously extracted from the LE Audio Controller Subsystem for nRF53 on the gateway. The extraction happens inside the send functions in the unicast_client.c and broadcast_source.c files. The sdu_ref values are then sent to the gateway’s synchronization module and used for drift compensation.

Note

Inside the synchronization module (audio_datapath.c), all time-related variables end with _us (for microseconds). This means that sdu_ref becomes sdu_ref_us inside the module.

As the nRF5340 is a dual-core SoC and both cores need the same concept of time, each core runs a free-running timer. These two timers are reset at the same time and run from the same clock source, which means they should always show the same value for the same point in time. The network core, running the LE Audio Controller Subsystem for nRF53, uses its timer to generate the sdu_ref timestamp for every audio packet received. The application core, running the nRF5340 Audio application, uses its timer to generate cur_time and frame_start_ts.

After the decoding takes place, the audio data is divided into smaller blocks and added to a FIFO. These blocks are then continuously fed to I2S, block by block.

See the following figure for the details of the compensation methods of the synchronization module.

nRF5340 Audio’s state machine for compensation mechanisms

The following external factors can affect the presentation compensation:

  • The drift compensation must reach the locked state (DRIFT_STATE_LOCKED) before the presentation compensation can start. In the locked state, the drift compensation has adjusted the frequency of the audio clock so that the audio is being played at the correct speed. While the drift compensation is not in the locked state, the presentation compensation does not leave its init state (PRES_STATE_INIT). Likewise, if the drift compensation loses synchronization and moves out of DRIFT_STATE_LOCKED, the presentation compensation moves back to PRES_STATE_INIT.

  • When audio is being played, a new audio frame is expected in each ISO connection interval. If a frame does not arrive, the headset might have lost its connection with the gateway. When the connection is restored, the application receives an sdu_ref that is not consecutive with the previously received one. The presentation compensation is then put into PRES_STATE_WAIT to verify that the audio is still in sync, as illustrated in the sketch after the following note.

Note

When both the drift and presentation compensation are in their locked states (DRIFT_STATE_LOCKED and PRES_STATE_LOCKED), LED2 lights up.
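
The following compact sketch expresses these interactions in code, using the state names from this overview. The actual module defines additional states and logic; this only captures the transitions just described:

    #include <stdbool.h>

    /* Simplified state sets; the actual module has more states. */
    enum drift_comp_state { DRIFT_STATE_INIT, DRIFT_STATE_LOCKED };
    enum pres_comp_state { PRES_STATE_INIT, PRES_STATE_WAIT, PRES_STATE_LOCKED };

    static enum drift_comp_state drift_state = DRIFT_STATE_INIT;
    static enum pres_comp_state pres_state = PRES_STATE_INIT;

    /* Called for every received frame. sdu_ref_consecutive is false when the
     * new sdu_ref does not follow the previously received one. */
    static void pres_comp_update(bool sdu_ref_consecutive)
    {
            if (drift_state != DRIFT_STATE_LOCKED) {
                    /* Presentation compensation cannot start (or stay locked)
                     * until the drift compensation is locked. */
                    pres_state = PRES_STATE_INIT;
                    return;
            }

            if (!sdu_ref_consecutive) {
                    /* Possible connection loss: re-verify synchronization
                     * before trusting the timestamps again. */
                    pres_state = PRES_STATE_WAIT;
                    return;
            }

            /* Once the presentation delay is within tolerance, lock. */
            pres_state = PRES_STATE_LOCKED;
    }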

Synchronization module flow

On the devices running the I2S-based firmware, the received audio data follows this path:

  1. The LE Audio Controller Subsystem for nRF53 running on the network core receives the compressed audio data.

  2. The controller subsystem sends the audio data to the Zephyr Bluetooth LE host similarly to the Bluetooth: HCI RPMsg sample.

  3. The host sends the data to the stream control module.

  4. The data is sent to a FIFO buffer.

  5. The data is sent from the FIFO buffer to the audio_datapath.c synchronization module, which performs the audio synchronization based on the SDU reference timestamps. Each packet sent from the gateway gets a unique SDU reference timestamp, generated on the headset controllers (in the network core). These timestamps make it possible to build True Wireless Stereo (TWS) earbuds with synchronized audio in the CIS mode. They also keep the speed of the inter-IC sound (I2S) interface synchronized with the speed at which Bluetooth packets are sent and received.

  6. The audio_datapath.c module sends the compressed audio data to the LC3 audio decoder for decoding.

  7. The audio decoder decodes the data and sends the uncompressed audio data (PCM) back to the audio_datapath.c module.

  8. The audio_datapath.c module continuously feeds the uncompressed audio data to the hardware codec.

  9. The hardware codec receives the uncompressed audio data over the inter-IC sound (I2S) interface and performs the digital-to-analog conversion (DAC) to produce an analog audio signal. A sketch of this I2S feed follows the list.
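
To illustrate the last two steps, the sketch below configures an I2S output and feeds it PCM blocks one at a time. The application drives the I2S peripheral through its own datapath code; this version uses Zephyr's generic I2S driver API, and the device node label, block size, and clock settings are assumptions:

    #include <string.h>
    #include <zephyr/drivers/i2s.h>
    #include <zephyr/kernel.h>
    #include <zephyr/sys/util.h>

    #define BLOCK_SIZE  192 /* bytes per I2S block; hypothetical */
    #define BLOCK_COUNT 4

    K_MEM_SLAB_DEFINE(tx_slab, BLOCK_SIZE, BLOCK_COUNT, 4);

    static const struct device *i2s_dev = DEVICE_DT_GET(DT_NODELABEL(i2s0));

    static int i2s_out_configure(void)
    {
            struct i2s_config cfg = {
                    .word_size = 16, /* 16-bit PCM samples */
                    .channels = 2,   /* stereo */
                    .format = I2S_FMT_DATA_FORMAT_I2S,
                    .options = I2S_OPT_BIT_CLK_MASTER | I2S_OPT_FRAME_CLK_MASTER,
                    .frame_clk_freq = 48000, /* 48 kHz sample rate */
                    .mem_slab = &tx_slab,
                    .block_size = BLOCK_SIZE,
                    .timeout = 1000,
            };

            return i2s_configure(i2s_dev, I2S_DIR_TX, &cfg);
    }

    /* Called once per decoded block: copy the PCM data into a driver-owned
     * buffer and queue it for transmission. The driver returns each buffer to
     * the slab after sending it. Start the interface with
     * i2s_trigger(i2s_dev, I2S_DIR_TX, I2S_TRIGGER_START) once the first
     * blocks are queued. */
    static int i2s_feed_block(const void *pcm, size_t len)
    {
            void *block;
            int ret = k_mem_slab_alloc(&tx_slab, &block, K_FOREVER);

            if (ret) {
                    return ret;
            }

            memcpy(block, pcm, MIN(len, BLOCK_SIZE));

            return i2s_write(i2s_dev, block, BLOCK_SIZE);
    }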