Understanding the performance costs of hub execution.

evanh · 2022-09-28 13:14

HubRAM data accesses will have a bigger negative impact than hubexec code, because in hubexec the FIFO provides substantial code prefetching.

That said, a small inner loop is still costly for hubexec: each taken branch forces a reload of the FIFO.

Christof Eb. · 2022-09-28 16:03

@brianh said:

Hi. I'm trying to understand this 6x performance difference in the context of my original question, which was mainly about code execution in cog RAM vs. Hub RAM. But of course I see now that where the variables are stored also matters.

So, just for my own edification, this benchmark shows that executing code in cog RAM (not Hub RAM), with variables also stored in cog RAM, is 6x faster than if the variables are stored in Hub RAM, right? But what if I execute the code from Hub RAM while having my variables stored in cog RAM?

Do not take this factor of 6 too literally. The number applies to this example and to code compiled by FlexProp. Still, the effect of random data access is big, because the access sequence does not follow the egg-beater order. With the do ... while loop it even seems to be a factor of 10 here.
For me there are two important aspects:
1. The FlexProp compiler will be fast enough to replace much asm code. Cog exec will be roughly twice as fast as hub exec.
2. Global hub variables can be a massive brake, and you cannot catch up by using cog exec. The speed loss can vary if the code or the addresses vary.
Christof
