Heap sharing

The memory heap can be:

  • local - used exclusively by a single DSP core,

  • shared - higher-level memory shared across all DSP cores.
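The split between per-core local heaps and a single shared heap can be modeled with a minimal sketch. Everything below (the names `heap_alloc`, `HEAP_LOCAL`/`HEAP_SHARED`, the bump-allocator layout, and the sizes) is invented for illustration and is not the firmware's real allocator API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: one local heap per DSP core plus one heap shared
 * by all cores, each modeled as a trivial bump allocator. */
enum heap_kind { HEAP_LOCAL, HEAP_SHARED };

#define NUM_CORES 2
#define HEAP_SIZE 1024

struct bump_heap {
	uint8_t mem[HEAP_SIZE];
	size_t used;
};

static struct bump_heap local_heaps[NUM_CORES]; /* exclusive per core */
static struct bump_heap shared_heap;            /* visible to all cores */

static void *heap_alloc(enum heap_kind kind, unsigned int core, size_t bytes)
{
	struct bump_heap *h =
		(kind == HEAP_LOCAL) ? &local_heaps[core] : &shared_heap;

	if (h->used + bytes > HEAP_SIZE)
		return NULL; /* out of memory */

	void *p = &h->mem[h->used];
	h->used += bytes;
	return p;
}
```

The point of the split is visible in the data layout: an allocation from `local_heaps[core]` can only ever be touched by that core, while `shared_heap` allocations are reachable from every core and therefore need the coherency and locking rules described later in this section.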

@startuml
scale max 1024 width

node "DSP Core #0 Memory Block" as core_0 {
	node "Application Heap (local)" as app_0 #lightgreen {
		component "Pipelines @Core #0" as ppl_0
		component "LL Modules & Tasks @Core #0" as ll_0
		component "DP Modules & Tasks @Core #0" as dp_0
	}

	node "Application Heap (shared)" as app_shared_0 #lightyellow {
		component "Shared buffers"
	}

	node "System Heap (shared)" as sys_0 #lightblue {
		component "Devices"
	}
}

ppl_0 -[hidden]down-> ll_0
ll_0 -[hidden]down-> dp_0

node "DSP Core #1 Memory Block" as core_1 {
	node "Application Heap (local)" as app_1 #lightgreen {
		component "Pipelines @Core #1" as ppl_1
		component "LL Modules & Tasks @Core #1" as ll_1
		component "DP Modules & Tasks @Core #1" as dp_1
	}
}

ppl_1 -[hidden]down-> ll_1
ll_1 -[hidden]down-> dp_1
@enduml

Figure 60 Memory Heaps

Note

Introduction of an MMU will require a separate local application heap per isolated domain.

L1 Cache Coherency

NOTE: This section applies to Intel systems without L1 cache coherency

A local heap is used exclusively by a single DSP core. Therefore, operations on the allocated memory buffers require neither explicit L1 cache operations nor data-cache alignment.

All operations performed on a local heap can be executed by the associated DSP core only. The move-to-another-core operation is not permitted for allocated buffers.

A shared heap can be configured in two ways:

  1. To provide uncached aliases of buffer addresses to the clients,

  2. To provide cacheable addresses to the clients.
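Option #1 amounts to exposing the same physical L2 memory through a second, uncached address window. A minimal sketch of the address translation is below; the base addresses and function names are invented for illustration, not taken from any real memory map.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: the same physical memory is visible through a
 * cached window and an uncached alias window. The bases below are
 * made up; converting between the views is plain address arithmetic. */
#define CACHED_BASE   0x90000000u
#define UNCACHED_BASE 0xB0000000u

static inline uintptr_t cache_to_uncache(uintptr_t addr)
{
	return addr - CACHED_BASE + UNCACHED_BASE;
}

static inline uintptr_t uncache_to_cache(uintptr_t addr)
{
	return addr - UNCACHED_BASE + CACHED_BASE;
}
```

A client handed an uncached alias simply reads and writes through it; since the L1 data cache never holds those lines, no writeback or invalidate operations are needed, at the cost of every access going out to L2+.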

Option #1 is preferred, since it does not require explicit L1 cache operations when the memory is accessed by a DSP core. However, all accesses go directly to L2+ memory, so this option is not suitable for low-latency, high-performance data processing.

Option #2 provides better performance but requires explicit L1 cache operations, which are difficult to maintain and validate, as well as data-cache alignment for both client buffers and their descriptors, which adds overhead. This configuration should be avoided unless a coherent API is available to share the data.
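The maintenance burden of option #2 can be sketched as follows. The cache-op functions here are stubs with invented names (on real hardware they would issue range writeback/invalidate instructions), and the buffer size and cache-line size are assumptions for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of a cacheable shared buffer (option #2). */
#define DCACHE_LINE 64

/* Align the shared object to a cache line so that flushing this buffer
 * never writes back or invalidates a neighbour sharing the same line. */
static uint8_t shared_buf[256] __attribute__((aligned(DCACHE_LINE)));

static void dcache_writeback(void *addr, size_t size)
{
	(void)addr; (void)size; /* hw-specific writeback would go here */
}

static void dcache_invalidate(void *addr, size_t size)
{
	(void)addr; (void)size; /* hw-specific invalidate would go here */
}

/* Producer core: fill the buffer, then push the dirty lines to L2+. */
static void producer_write(const uint8_t *src, size_t len)
{
	memcpy(shared_buf, src, len);
	dcache_writeback(shared_buf, len);
}

/* Consumer core: drop any stale lines before reading fresh data. */
static void consumer_read(uint8_t *dst, size_t len)
{
	dcache_invalidate(shared_buf, len);
	memcpy(dst, shared_buf, len);
}
```

Every producer/consumer pair must agree on who flushes and who invalidates, and a single missed or misordered cache operation silently corrupts data, which is why this configuration is hard to validate.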

However, one important exception to accessing shared memory through an uncached alias is a data buffer connecting processing components that run on different cores. There, the price of locking and cache operations may be worth paying for much faster access to the data in the buffer, which can account for a significant part of the DSP cycle budget of lightweight LL processing modules.

Accessing Shared Memory Pool

The data structures needed to manage the shared memories are initialized by the primary core. Their location in the memory map is known at build time, and the API is protected by a mutex.

The mutex is built on atomic operations, so all processors co-managing this memory heap must support atomics.
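Such a lock can be sketched with standard C11 atomics; the names below are invented for illustration, but the mechanism (an atomic test-and-set loop) is why every core touching the shared pool must support atomic operations.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical sketch of the shared-heap lock: a minimal spinlock
 * built on an atomic flag. */
static atomic_flag heap_lock = ATOMIC_FLAG_INIT;

static void heap_lock_take(void)
{
	/* Atomic test-and-set: spins until the previous owner releases.
	 * acquire ordering makes the protected data visible after the
	 * loop exits. */
	while (atomic_flag_test_and_set_explicit(&heap_lock,
						 memory_order_acquire))
		;
}

static void heap_lock_give(void)
{
	/* release ordering publishes our updates before unlocking */
	atomic_flag_clear_explicit(&heap_lock, memory_order_release);
}
```

Without hardware atomics a core could observe the flag as free and set it non-atomically, letting two cores enter the heap code at once, hence the stated requirement.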