SPIN Speed

Dan Moline · 2008-04-28 04:00

Hi, everyone. Can someone help me understand the Propeller just a bit better?
My understanding is this:

If you were to have only a single SPIN program, that program would reside in RAM and be interpreted by COG 0.
But COG 0 only has access to the HUB 1/8 of the time.

Does that mean that if there were a Propeller with a single COG, this program would run 8X faster than on the Propeller, since it wouldn't need to access 7 other COGs with no programs associated with them?

In other words, as the Propeller spins, empty COGs tend to slow down the execution of programs loaded into the other COGs. Am I making sense?

I'm not dissing the propeller. I think it's extraordinary. I'm just trying to test my understanding.

Thanks,
Dan

Mike Green · 2008-04-28 04:13

It doesn't really work that way. Each cog has access to the hub once every 16 clock cycles. The actual instructions (assembly language) reside in a 512-long (2K byte) RAM that is unique to the cog. As long as that cog doesn't access the main (hub) RAM, it executes at full speed, with most instructions taking 4 clock cycles (50 ns with an 80 MHz clock). Once the cog tries to access the main RAM, it has to wait until its access slot comes up. This can cause a wait of up to 15 clock cycles.
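To put numbers on that description, here is a minimal Python model (a sketch, not Parallax code) of how long a cog idles when it requests hub access at an arbitrary clock. It assumes the cog's window recurs exactly every 16 clocks at a fixed phase and ignores the cycles of the hub instruction itself; `HUB_PERIOD`, `hub_wait` and `clocks_to_ns` are hypothetical names chosen for illustration.

```python
HUB_PERIOD = 16  # clocks between successive hub windows for one cog


def hub_wait(request_clock: int, window_phase: int) -> int:
    """Clocks a cog idles if it asks for the hub at request_clock and its
    window opens whenever clock % HUB_PERIOD == window_phase (simplified model)."""
    return (window_phase - request_clock) % HUB_PERIOD


def clocks_to_ns(clocks: int, clock_hz: float = 80e6) -> float:
    """Convert a clock-cycle count to nanoseconds at the given system clock."""
    return clocks * 1e9 / clock_hz


# Try every possible arrival time relative to a window at phase 0.
waits = [hub_wait(t, window_phase=0) for t in range(HUB_PERIOD)]
print(max(waits))               # 15 (request arrives just after the window)
print(sum(waits) / len(waits))  # 7.5
print(clocks_to_ns(4))          # 50.0 (one ordinary instruction at 80 MHz)
```

Averaged over all arrival times, an unsynchronized request waits about 7.5 clocks in this model, with 15 as the worst case mentioned above.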

Spin programs are interpreted and reside in the main RAM. The interpreter is optimized so that some of its main RAM accesses are synchronized with the hub: once the first one occurs, with its variable delay, the subsequent ones occur without a delay. In addition, other useful instructions are executed between the main RAM accesses, so no additional time is spent waiting for the hub slot to become available.
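The synchronization idea can be sketched with a similar toy model. The assumptions are mine, not from the thread: an aligned hub access occupies the cog for 8 clocks (which is what the 32-clock copy loop later in this thread implies), and the cog does `gap` clocks of other work between accesses. `total_hub_wait` is a hypothetical helper.

```python
HUB_PERIOD = 16  # clocks between this cog's hub windows


def total_hub_wait(n_accesses: int, gap: int, first_request: int,
                   phase: int = 0, access_clocks: int = 8) -> int:
    """Sum of idle clocks over n_accesses hub accesses (simplified model).

    Each access first waits for the cog's window, then occupies the cog for
    access_clocks; the next access is requested after `gap` clocks of work.
    """
    t = first_request
    waited = 0
    for _ in range(n_accesses):
        w = (phase - t) % HUB_PERIOD   # idle until this cog's window opens
        waited += w
        t += w + access_clocks + gap   # the access itself, then the work between
    return waited


for gap in (4, 8, 12):
    print(gap, total_hub_wait(100, gap, first_request=5))
# 4 407
# 8 11
# 12 1199
```

With 8 clocks of work (two ordinary instructions) between accesses, only the first access waits; with 4 clocks the cog still idles before every access, and with 12 it misses the window and waits most of a rotation each time.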

Dan Moline · 2008-04-28 04:34

Thanks for the response, Mike.

My understanding is that the interpreter resides in the COG and the SPIN program resides in HUB RAM. Does that mean that program interpretation only occurs 1 out of every 16 clock cycles?

I did a very simple experiment: I compared a loop in SPIN with a loop in assembly. (Not knowing what else to do, I just put a dummy statement, "temp := 1", inside the loop.) "value" in the PASM version is set to 1,000,000.

SPIN

  repeat i from 0 to 1_000_000
    temp := 1

PASM

:loop   add   temp, #1          ' dummy work
        djnz  value, #:loop     ' decrement value; loop until it reaches 0

The difference in speed (timed with a watch) was about 120:1. Is that difference caused by the assembly code being in the COG and running 100% of the time, whereas the SPIN program has to wait for access to the HUB every 16 clock cycles?
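The 120:1 figure can be turned into clock cycles. This sketch assumes the usual PASM timings (add takes 4 clocks; djnz takes 4 when it jumps back and 8 on the final fall-through) and an 80 MHz clock. The thread does not say what clock was actually used, but the per-pass Spin figure does not depend on it. The function names are hypothetical.

```python
PASM_CLOCKS_PER_PASS = 8  # add (4 clocks) + djnz (4 clocks when it jumps back)


def implied_spin_clocks_per_pass(ratio: float) -> float:
    """Clocks one Spin `repeat` pass must take to give the observed speed ratio."""
    return ratio * PASM_CLOCKS_PER_PASS


def pasm_loop_seconds(iterations: int, clock_hz: float) -> float:
    """Approximate run time of the two-instruction PASM loop
    (ignores the 4 extra clocks of the final, non-jumping djnz)."""
    return iterations * PASM_CLOCKS_PER_PASS / clock_hz


print(implied_spin_clocks_per_pass(120))   # 960.0
print(pasm_loop_seconds(1_000_000, 80e6))  # 0.1
print(960 / 16)                            # 60.0 hub rotations per Spin pass
```

So each pass of the Spin `repeat` loop costs roughly 960 clocks, about 60 of the cog's 16-clock hub rotations, far more than a single hub wait of at most 15 clocks.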

Thanks for your patience. My background is RF ... not microcontrollers.

Regards,

Dan

Mike Green · 2008-04-28 04:55

Any interpreter has a lot of necessary overhead. Interpretive instructions (bytecodes) have to be fetched from (in this case) hub memory. They have to be decoded. Their operands have to be fetched (again from hub memory), and the actual instructions have to be executed. Some instructions take more time than others. For example, the hardware has no multiply or divide instruction, so these have to be done by a subroutine. A ratio of 100:1 is not unusual for an interpreter and has little to do with the program and data being in hub memory. Again, the interpreter can be (and is) optimized to minimize the time spent waiting for hub accesses and to accomplish useful work between them. For example,
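A toy fetch/decode/execute loop makes that list of overheads concrete. This is a minimal Python sketch with three invented opcodes (`PUSH_CONST`, `STORE`, `HALT`); it is NOT the real Spin bytecode set or the real interpreter, only an illustration of why one source statement turns into several hub-memory accesses plus decoding work.

```python
# Hypothetical opcodes for illustration only; NOT the real Spin bytecode set.
PUSH_CONST, STORE, HALT = 0, 1, 2


def run_toy(hub: list[int]) -> tuple[int, int]:
    """Interpret bytecodes held in `hub` (a stand-in for hub RAM).
    Returns (hub_reads, hub_writes) so the memory traffic is visible."""
    reads = writes = 0
    stack: list[int] = []
    pc = 0

    def fetch() -> int:
        """Every bytecode and operand fetch goes to "hub memory"."""
        nonlocal pc, reads
        value = hub[pc]
        pc += 1
        reads += 1
        return value

    while True:
        op = fetch()                    # 1. fetch the bytecode
        if op == PUSH_CONST:            # 2. decode it
            stack.append(fetch())       # 3. fetch its operand
        elif op == STORE:
            addr = fetch()              # operand: target address
            hub[addr] = stack.pop()     # 4. execute: write the result to hub
            writes += 1
        elif op == HALT:
            return reads, writes
        else:
            raise ValueError(f"unknown opcode {op}")


# temp := 1, with the variable `temp` stored at hub address 5
program = [PUSH_CONST, 1, STORE, 5, HALT, 0]
print(run_toy(program))  # (5, 1)
print(program[5])        # 1
```

In PASM, `temp := 1` would be a single `mov temp, #1` in cog RAM with no hub access at all; in the toy interpreter the statement costs four hub reads and one hub write (the fifth read is the `HALT` that ends the program), before counting any decode work.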

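:loop   rdlong  temp, pointer1      ' read a long from the source array in hub RAM
        add     pointer1, #4        ' advance the source pointer to the next long
        add     temp, base          ' add the fixed value
        wrlong  temp, pointer2      ' write the result to the destination array
        add     pointer2, #4        ' advance the destination pointer
        djnz    count, #:loop       ' repeat for count elements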

This example copies data from one array of 32-bit values to another, adding a fixed value to each element. It uses every hub access slot, taking 32 clock cycles per loop. The only access that has to wait more than one clock cycle is the first read on the first pass through the loop.
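The 32-cycle figure can be reproduced with a small cycle model. This is a sketch assuming 4 clocks per ordinary instruction, 8 clocks for a hub instruction that starts exactly on the cog's window, and a window every 16 clocks; it treats `djnz` as always jumping and ignores the pipeline details of the real cog. `LOOP` and `run_loop` are hypothetical names.

```python
HUB_PERIOD = 16  # clocks between this cog's hub windows
ALU_CLOCKS = 4   # ordinary cog instruction (add, djnz when it jumps)
HUB_CLOCKS = 8   # rdlong/wrlong starting exactly on the window (assumed)

# The copy loop: "hub" marks rdlong/wrlong, "alu" everything else.
LOOP = ["hub", "alu", "alu", "hub", "alu", "alu"]


def run_loop(iterations: int, start_clock: int, window_phase: int = 0) -> list[int]:
    """Clocks taken by each pass of LOOP, where a hub instruction
    first idles until the cog's window opens (simplified model)."""
    t = start_clock
    durations = []
    for _ in range(iterations):
        begin = t
        for kind in LOOP:
            if kind == "hub":
                t += (window_phase - t) % HUB_PERIOD  # wait for the window
                t += HUB_CLOCKS
            else:
                t += ALU_CLOCKS
        durations.append(t - begin)
    return durations


print(run_loop(4, start_clock=3))  # [45, 32, 32, 32]
```

Only the first pass pays for waiting (13 clocks here); after that, the two adds between hub instructions bring each rdlong or wrlong back to the window exactly.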

hippy · 2008-04-28 10:25

Dan Moline said...
Does that mean that if there were a Propeller with a single COG, this program would run 8X faster than on the Propeller, since it wouldn't need to access 7 other COGs with no programs associated with them?

In other words, as the Propeller spins, empty COGs tend to slow down the execution of programs loaded into the other COGs. Am I making sense?

Each Cog runs at full speed but has to wait for its cyclic access to Hub memory (where Spin programs and data are stored). It gets its access at the same rate no matter what any other Cog is doing, so Spin execution is no slower when 8 Cogs are in use than when just one is.

The rotating Hub offers each Cog access to Hub memory in turn, whether it wants it or not. The rotation doesn't care whether the access is taken; it always offers it.
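The rotation is a fixed schedule, which a few lines of Python can sketch. This assumes the hub spends 2 clocks on each of the 8 cogs (so each cog's window recurs every 16 clocks) and numbers the phases from cog 0 for convenience; the real phase assignment may differ. `hub_owner` and `windows_for` are hypothetical names.

```python
HUB_SLOT_CLOCKS = 2  # clocks the hub spends on each cog (simplified model)
NUM_COGS = 8


def hub_owner(clock: int) -> int:
    """Cog offered hub access at `clock`. Which cogs are running is not
    an input: the rotation is fixed, whether the slot is taken or not."""
    return (clock // HUB_SLOT_CLOCKS) % NUM_COGS


def windows_for(cog: int, clocks: int) -> list[int]:
    """Start clocks of `cog`'s hub windows within the first `clocks` clocks."""
    return [t for t in range(0, clocks, HUB_SLOT_CLOCKS) if hub_owner(t) == cog]


print(windows_for(0, 64))  # [0, 16, 32, 48]
print(windows_for(3, 64))  # [6, 22, 38, 54]
```

Nothing about which cogs are busy enters `hub_owner`, so a cog's windows arrive at the same times whether the other seven are running or idle.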

If there were a single-Cog version that didn't use such a rotating Hub access scheme (as it wouldn't need it), access to Hub memory would always be available, so some code running in the Cog would be quicker, but only code that accesses Hub memory. The Spin Interpreter makes a fair number of accesses to Hub memory, so it might be slightly faster, but not necessarily: a Cog doesn't get slowed down waiting for Hub access if it asks for access at the right time.

A single-Cog version running the Spin Interpreter would not run at 8 times the speed. It might be a bit faster, but probably not by much.
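A rough way to check that conclusion is to run the same instruction mix against a round-robin hub and against a hypothetical hub that is always available. This sketch assumes 4 clocks per ordinary instruction, 8 clocks per hub access in both cases (a real single-Cog chip might be faster or slower), and a round-robin window every 16 clocks. `run`, `synced` and `unsynced` are made-up names; `unsynced` is just the copy loop's six instructions reordered, for timing purposes only.

```python
HUB_PERIOD = 16  # round-robin window spacing for one cog
ALU_CLOCKS = 4   # ordinary instruction
HUB_CLOCKS = 8   # assumed cost of an aligned hub access


def run(program: list[str], passes: int, start: int, round_robin: bool) -> int:
    """Total clocks for `passes` runs of `program`. With round_robin=False
    the hub is modelled as always available (a hypothetical single-Cog chip)."""
    t = start
    for _ in range(passes):
        for kind in program:
            if kind == "hub":
                if round_robin:
                    t += (-t) % HUB_PERIOD  # wait for this cog's window (phase 0)
                t += HUB_CLOCKS
            else:
                t += ALU_CLOCKS
    return t - start


synced = ["hub", "alu", "alu", "hub", "alu", "alu"]    # the copy loop as written
unsynced = ["hub", "alu", "alu", "alu", "hub", "alu"]  # same mix, reordered

print(run(synced, 100, 0, True), run(synced, 100, 0, False))      # 3200 3200
print(run(unsynced, 100, 0, True), run(unsynced, 100, 0, False))  # 4796 3200
```

In this model the synchronized loop takes 3200 clocks either way, while the reordered one takes about half again as long under round-robin; an always-available hub only helps code that does not already line up with its window.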

Dan Moline · 2008-04-28 12:29

Thanks for the replies. It helps a lot.

Regards,

Dan
