Keyword Detection integration

Keyword Detection (KD), also known as Voice Activation or Sound Trigger, is a feature that triggers a speech recognition engine upon successful detection of a predefined keyphrase (keyword). The primary motivation for offloading the keyphrase detection algorithm to an embedded processing environment (for example, a dedicated DSP) is to reduce system power consumption while listening for an utterance.

The terms “Voice Activation” and “Keyphrase Detection” are often used interchangeably to describe end-to-end, system-level use cases that include:

  • Keyphrase detection algorithm
  • Keyphrase enrollment (parametrization of keyphrase detection algorithm)
  • Management of an audio stream that is used to transport utterances
  • Steps made to reduce system level power consumption
  • System wake up on keyphrase detection

The term “Keyphrase Detector” typically identifies a firmware processing component that implements an algorithm for detecting a keyphrase in an audio stream.

The term “speech audio stream” indicates that the stream is primarily used to deliver data to an automatic speech recognition (ASR) algorithm. The term “voice audio stream” typically indicates that the recipient of the audio data is a human.

Depending on system level requirements for the keyphrase detection algorithm and the speech recognition engine, different policies for keyphrase buffering and voice data streaming may be applied. This document covers the reference implementation available in SOF. The following sections cover functional scope.


Currently, SOF implements the Keyphrase Detector component with a reference trigger function that allows testing of the E2E flow by detecting a rapid volume change.
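
A trigger based on a rapid volume change can be sketched as follows. This is a minimal Python illustration of the idea, not the SOF reference implementation: the function names, the use of mean absolute amplitude as the level measure, and the `ratio` and `floor` values are all assumptions made for the example.

```python
def frame_level(samples):
    """Mean absolute amplitude of one frame of 16-bit PCM samples."""
    return sum(abs(s) for s in samples) / len(samples) if samples else 0.0


def detect_volume_jump(frames, ratio=8.0, floor=100.0):
    """Return the index of the first frame whose level jumps sharply.

    `ratio` and `floor` are hypothetical tuning values, not SOF parameters.
    Returns None when no frame qualifies.
    """
    previous = None
    for index, frame in enumerate(frames):
        level = frame_level(frame)
        # Trigger only when the frame is loud in absolute terms and much
        # louder than the frame before it.
        if previous is not None and level > floor and level > ratio * max(previous, 1.0):
            return index
        previous = level
    return None


quiet = [10, -12, 8, -9]
loud = [4000, -3500, 4200, -3900]
print(detect_volume_jump([quiet, quiet, loud, loud]))  # prints 2
```

A detector this simple fires on any loud sound, which is acceptable for exercising the E2E flow but not for real keyphrase recognition.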

Timing sequence


scale max 1024 width

footer: timeline not to scale 
robust "Speech Application" as App
concise "Audio Stream" as Audio

App is idle
Audio is "Preceding"

0 is idle
+225 is "Processing"

0 is Keyphrase
@180 <-> @+40 : {detection & burst transmission lag}
Audio@+25 -> App@+25 : notification
200 is Command
+200 is Following

Figure 47 Basic diagram for a timing sequence

A keyphrase is preceded by a period of silence and is followed by a user command. To balance power savings and user experience, the host system (CPU) shall be activated only if a keyphrase is detected. To reduce the number of false triggers for user commands, the keyphrase can be sent to the host for additional (2nd stage) verification. This requires the FW to buffer the keyphrase in memory. Keyphrase transmission to the host shall be as fast as possible (faster than real-time) to reduce the latency of the system response.
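
The effect of faster-than-real-time transmission can be estimated with a short calculation. While the backlog is being sent, new audio keeps arriving at real-time rate, so a backlog of H seconds sent at k times real-time is cleared after H / (k - 1) seconds. The sketch below assumes a 2 s backlog (the approximate burst length mentioned in the E2E flow figure) and a hypothetical speed-up factor of 5; neither the function name nor the factor comes from SOF.

```python
def burst_drain_time(history_s, speedup):
    """Seconds needed to clear `history_s` seconds of buffered audio.

    `speedup` is the transfer rate relative to real time (hypothetical
    parameter). New audio arrives at 1x while draining, so the backlog
    shrinks at (speedup - 1) seconds of audio per second.
    """
    if speedup <= 1.0:
        raise ValueError("the burst must be faster than real time to catch up")
    return history_s / (speedup - 1.0)


print(burst_drain_time(2.0, 5.0))  # prints 0.5
```

With these assumed numbers the host catches up with the live stream half a second after the burst starts.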

End-2-End flows


scale max 1024 width

participant "Userspace component" as usr
participant "Audio driver" as drv
participant "FW infrastructure" as fw
participant "Data transfer to Host" as dma
participant "Keyword detection algorithm" as kda
participant "Data transfer to DSP" as gpdma

box "Linux User/Kernel space" #LavenderBlush
	participant usr
	participant drv
end box

box "DSP" #LightBlue
	participant fw
	participant dma
	participant kda
	participant gpdma
end box

activate fw

drv -> fw : Setup audio topology \n (Speech Capture & Keyword Detection pipes)
usr -> drv : Prepare & Open PCM capture \n(snd_pcm_open/snd_pcm_hw_params)
drv -> fw : Stream Open & Preparation
drv -> fw : HW Params
group optional (depends on keyword detection algorithm implementation)
 usr -> drv : Send keyword detection algorithm parameters \n (snd_ctl_elem_tlv_write)
 drv -> fw : Send keyword detection algorithm parameters
 fw -> kda : Send keyword detection algorithm parameters
end

drv ->drv : DAPM power up event
drv -> fw : HW Params for Keyphrase Detection Pipeline
usr -> drv : Trigger start (alsamixer)
drv -> fw : Keyword detection algorithm & buffer manager triggered

fw -> fw : Keyphrase Buffer Manager \nin acquisition mode
fw -> gpdma 

activate gpdma

fw -> kda : keyphrase detection enabled

activate kda

usr -> drv : Trigger start (snd_pcm_read)

note over usr
Speech application indefinitely 
waits for data.
end note 

ref over usr, drv, fw , gpdma, kda, dma  
Speech Capture pipeline is not transmitting data to Host system
Host system may enter the low power state
end ref

loop keyword detection algorithm \nexecuted on DSP
 kda <- gpdma 
end

hnote over kda : keyword is detected

fw <-- kda : FW event on keyword detection
fw -> kda : keyword detection disabled

deactivate kda 

fw -> fw : Keyphrase Buffer Manager \nin drain mode
drv <-- fw : notification on keyword detection
'drv -> fw : enable data transmission to Host \n(Capture[Speech] pipeline to Host is running)
usr <-- drv : notification on keyword detection (optional)
gpdma -> dma 

activate dma

ref over dma 
Sending a burst of historic data (approx.2s) 
with detected keyword for
second stage verification on host.
end ref

gpdma <-- dma 

deactivate dma

usr <-- drv : snd_pcm_read completed 

fw -> fw : Keyphrase Buffer Manager \nin passthrough mode 

loop Realtime capture
 usr -> drv : snd_pcm_read
 gpdma -> dma 
 activate dma
 gpdma <-- dma 

 deactivate dma
 usr <-- drv : snd_pcm_read completed 
end

ref over usr 
User space optionally performs second stage keyword verification.
end ref

usr -> drv : Trigger stop (alsamixer)
drv ->drv : DAPM power down event
drv -> fw : Stop Keyphrase Detection algorithm pipeline
usr -> drv : Trigger stop (snd_pcm_drop / snd_pcm_free)
drv -> fw : Close Speech capture stream
fw -> gpdma 

deactivate gpdma

ref over usr, drv, fw , gpdma, kda, dma  
The flow can be repeated for next user command starting from snd_pcm_open()
end ref

deactivate fw

Figure 48 E2E flow for SW/FW components

The fundamental assumption for the flow is that the keyphrase detection sequence is controlled by the user space component (application) opening and closing the speech audio stream. The audio topology setup must happen before the speech stream is opened. An optional sequence customizes the behavior of the keyword detection algorithm by sending run-time parameters. The stream open and preparation phase covers sending HW parameters to the DAI and passing configuration parameters from the topology to FW components. The audio driver uses DAPM event handlers to control the Keyphrase Detector node of the FW topology graph. Once the keyphrase is detected, a notification is sent to the driver. At the same time, an internal FW event triggers draining of the buffered audio data to the host in burst mode. Once the buffer is drained, the speech capture pipeline works as a passthrough capture until the user space application closes it.
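
The acquisition, drain, and passthrough phases of this flow can be modeled from the host's point of view. The following is a toy Python model, not SOF code: the class, method, and mode names are hypothetical, and the drain is collapsed into a single step, whereas the FW drains the buffer over a period of time.

```python
from collections import deque


class SpeechCaptureModel:
    """Toy model of what the host receives on the speech capture stream."""

    def __init__(self, history_frames):
        # Bounded history: the oldest frame drops out when the buffer is full.
        self.history = deque(maxlen=history_frames)
        self.mode = "acquisition"

    def on_frame(self, frame):
        """Feed one captured frame; return the frames delivered to the host."""
        if self.mode == "acquisition":
            self.history.append(frame)  # buffered only, host receives nothing
            return []
        if self.mode == "drain":
            burst = list(self.history) + [frame]  # backlog sent as one burst
            self.history.clear()
            self.mode = "passthrough"
            return burst
        return [frame]  # passthrough: real-time capture

    def on_keyphrase_detected(self):
        """Internal detection event: switch from buffering to draining."""
        if self.mode == "acquisition":
            self.mode = "drain"


model = SpeechCaptureModel(history_frames=3)
delivered = [model.on_frame(f) for f in ("f0", "f1", "f2", "f3")]
model.on_keyphrase_detected()
delivered.append(model.on_frame("f4"))
delivered.append(model.on_frame("f5"))
print(delivered)
```

In this run the host receives nothing for the first four frames, then one burst containing the three buffered frames plus the current one, and then single frames in real time.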

FW topology


scale max 1024 width

skinparam rectangle {
   backgroundColor<<dai>> #6fccdd
   backgroundColor<<dma>> #f6ed80
   backgroundColor<<stream>> #d6d6de
   borderColor<<stream>> #d6d6de
   borderColor<<ppl>> #a1a1ca

   backgroundColor<<event>> #f05772
   stereotypeFontColor<<event>> #ffffff
   fontColor<<event>> #ffffff

   backgroundColor<<cpu>> #f0f0f0
}

together {
rectangle "MIC HW" as dmic #DDDDDD

rectangle "Speech Capture Pipeline" as ppl_1 <<FW pipeline >>{
 rectangle "MIC DAI" as dai_1 <<dai>>
 rectangle "Keyphrase Buffer Manager" as kpb
 dai_1 -> kpb : 2ch/16kHz/16bit
 rectangle "Host" as host
}
}


rectangle "Keyphrase Detector Pipeline" as ppl_2 <<FW pipeline >>{
 rectangle "Channel selector" as sel
 rectangle "Keyphrase detection algorithm" as wov
 sel -> wov : 1ch/16kHz/16bit
}

rectangle "Host System" as hsys {
 rectangle "Host Memory" as hmem #DDDDDD
}

dmic -> dai_1
kpb -> host
kpb -> sel : 2ch/16kHz/16bit
host -> hmem : 2ch/16kHz/16bit
wov ..> kpb : FW events
wov ..> hsys : FW notifications

Figure 49 Basic diagram for FW components topology

The diagram provides an overview of FW and HW components that play a role in keyphrase detection flows. The components are organized in pipelines:

  1. Speech capture pipeline
    1. The DMIC DAI configures the HW interface to capture data from microphones.
    2. The Keyphrase Buffer Manager is responsible for managing the data captured by microphones. This includes control of an internal buffer for incoming data and routing of incoming audio samples. The audio buffer with historic audio data is implemented as a cyclic buffer. While listening for a keyphrase, the component stores incoming data in the internal buffer and copies it to a sink that leads toward the keyword detector component. On successful detection of a keyphrase, the buffer is drained during a burst transmission to the host. Once the buffer is drained, the component works as a passthrough component on the capture pipeline.
    3. The host component configures transport (over DMA) to the host system. The component is responsible for transmitting data from local (FW accessible) memory to remote (host CPU accessible) memory.
  2. Keyphrase detector pipeline
    1. The channel selector is responsible for providing a single channel on input to the keyphrase detection algorithm. The decision of which channel to select is made by the platform integrator. The component can accept parameters from a topology file.
    2. The keyphrase detection algorithm accepts audio frames and reports whether a keyphrase is detected. Note that the FW infrastructure allows a FW event to be sent to the Keyphrase Buffer Manager component when a keyphrase is detected. The component also sends a notification to the audio driver and supports large parameters.
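
Two details of the topology lend themselves to a short worked example: the memory needed for the history buffer and the channel selection on interleaved samples. The sketch below uses the 2ch/16kHz/16bit format from the topology figure and the approximate 2 s of history mentioned in the E2E flow figure. The function names are hypothetical, and the real components operate on FW buffers rather than Python lists.

```python
def history_buffer_bytes(seconds, rate_hz=16000, channels=2, bytes_per_sample=2):
    """Size of a history buffer holding `seconds` of PCM audio."""
    return int(seconds * rate_hz * channels * bytes_per_sample)


def select_channel(interleaved, channel, channels=2):
    """Extract one channel from interleaved samples (L, R, L, R, ...)."""
    if not 0 <= channel < channels:
        raise ValueError("channel index out of range")
    return interleaved[channel::channels]


print(history_buffer_bytes(2.0))                         # prints 128000
print(select_channel([1, -1, 2, -2, 3, -3], channel=0))  # prints [1, 2, 3]
```

Under these assumptions, about 2 s of stereo 16-bit audio at 16 kHz occupies 128000 bytes, and the detector input carries half the data rate of the captured stream.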

KPBM state diagram

The state diagram below presents all possible keyphrase buffer manager states and valid relationships between them.

[*] --> KPB_DISABLED: Start \nor\n [IPC] free \nmessage \nfrom any state
KPB_DISABLED: Starting state of KPB - \nNo action has been taken yet
KPB_DISABLED--> KPB_CREATED: [IPC] \nnew component

KPB_CREATED : New KPB component has been created
KPB_CREATED -[#0000FF]-> KPB_CREATED : [IPC] \nreset

KPB_PREPARING: Prepare Key Phrase Buffer component.

KPB_STATE_RUN: KPB is prepared and ready.
KPB_STATE_RUN---> KPB_STATE_INIT_DRAINING: [EVENT] \nkey phrase detected

KPB_STATE_BUFFERING: Buffer incoming samples in the \ninternal history buffer

KPB_STATE_INIT_DRAINING: KPB received detection event

KPB_STATE_DRAINING: KPB is draining internal history buffer \nto the client's buffer

KPB_STATE_RESETTING: KPB is preparing itself for the reset

KPB_STATE_RESET_FINISHING: KPB is finishing reset sequence

KPB_STATE_HOST_COPY: KPB is copying real time \nstream into client's buffer

Figure 50 Keyphrase buffer manager state diagram
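
The transitions that are spelled out in the diagram source can be written down as a lookup table. The sketch below is a Python illustration only: it covers just those transitions, the event names are hypothetical labels for the `[IPC]` and `[EVENT]` annotations, and transitions involving the remaining states are intentionally left out rather than guessed.

```python
# Transitions listed explicitly in the state diagram source.
# Event names are hypothetical labels, not SOF identifiers.
TRANSITIONS = {
    ("KPB_DISABLED", "ipc_new"): "KPB_CREATED",
    ("KPB_CREATED", "ipc_reset"): "KPB_CREATED",
    ("KPB_STATE_RUN", "keyphrase_detected"): "KPB_STATE_INIT_DRAINING",
}


def next_state(state, event):
    """Return the state that follows `state` when `event` occurs."""
    if event == "ipc_free":
        # The diagram routes a free message from any state to KPB_DISABLED.
        return "KPB_DISABLED"
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition listed for {event!r} in {state}") from None


print(next_state("KPB_DISABLED", "ipc_new"))
print(next_state("KPB_STATE_RUN", "keyphrase_detected"))
```

Keeping the table explicit makes it easy to reject events that are not valid in the current state.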