
368 clock cycles???



Philldapill
01-12-2008, 08:54 AM
I'm working on something that requires very fast code, but since I don't know much assembly at all, I decided to see how long certain bits of code take. I whittled the code down smaller and smaller, because the results didn't make sense, until I ended up with:


PUB TestTime
  count1 := cnt             ' read the system counter
  count2 := cnt             ' read it again straight away
  TV.dec(count2-count1)     ' display the elapsed clock ticks

The output is 368. I'm assuming this means it took 368 clock cycles between assigning count1 and finishing the assignment of count2. The TV object is just a simple output display. Could it really take 368 clock cycles to do all this???
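When timing code this way, keep in mind that CNT is a free-running 32-bit counter, so it eventually wraps around. Spin's 32-bit subtraction wraps the same way, so count2-count1 still gives the right tick count across a rollover, as long as the interval is shorter than one wrap period. The Python sketch below models that arithmetic by masking to 32 bits. The function names and the 80 MHz default are assumptions for illustration and are not part of any Propeller tool.

```python
MASK32 = 0xFFFFFFFF  # CNT is a 32-bit counter


def elapsed_ticks(count1: int, count2: int) -> int:
    """Ticks between two 32-bit counter reads, valid across one rollover."""
    return (count2 - count1) & MASK32


def wrap_period_s(clock_hz: int = 80_000_000) -> float:
    """Seconds for a free-running 32-bit counter to roll over."""
    return (MASK32 + 1) / clock_hz


if __name__ == "__main__":
    print(elapsed_ticks(1000, 1368))        # 368, no rollover
    print(elapsed_ticks(0xFFFFFF00, 0x70))  # 368, counter wrapped between reads
    print(round(wrap_period_s(), 1))        # 53.7
```

At 80 MHz the counter rolls over roughly every 53.7 seconds, so short measurements like this one are unaffected.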

Philldapill
01-12-2008, 08:58 AM
If this IS the case... whoa... Spin IS slow. I need to learn ASM!

stevenmess2004
01-12-2008, 09:07 AM
Yep, Spin is around 40 to 80 times slower than assembly. This is because it is interpreted bytecode and everything is stored in main (hub) memory. Having said that, it will probably do most things you want, with maybe a few assembly routines as drivers for different devices. Remember that 368 clock ticks at an 80 MHz clock is only 368/80,000,000 s = 4.6 us.
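At 80 MHz each tick is 12.5 ns, so converting a tick count to time takes a single division. The same figure also tells you how often the operation could repeat per second. Below is a minimal Python sketch, assuming the 80 MHz system clock used above. A different clock setting only changes `clock_hz`.

```python
def ticks_to_us(ticks: int, clock_hz: int = 80_000_000) -> float:
    """Convert system-clock ticks to microseconds."""
    return ticks * 1_000_000 / clock_hz


def repeats_per_second(ticks: int, clock_hz: int = 80_000_000) -> float:
    """How often per second an operation costing `ticks` could run back to back."""
    return clock_hz / ticks


if __name__ == "__main__":
    print(ticks_to_us(368))                # 4.6
    print(round(repeats_per_second(368)))  # 217391
```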

Philldapill
01-12-2008, 09:13 AM
Remember, that's 4.6 us JUST to assign the two counts! That doesn't include the time it takes to do the subtraction and output to the TV. Amazing... I sure do need to learn assembly. I can see why so many objects only use Spin for initialization.

stevenmess2004
01-12-2008, 09:17 AM
But that's still 1/4.6 us ≈ 217,000 assignments a second. :)

But yes, if you need anything to run at high speed, you will need assembler.

Steven

deSilva
01-12-2008, 09:36 AM
The main advantage of assembly code is that you are able to do some very specific things in a shortcut way. SPIN needs to consider all things in a more general way, always prepared for the worst case.
Take your assignment:


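MOV reg, CNT        ' copy the system counter into a cog register
WRLONG reg, cnt1    ' write it to the hub long whose address is held in cnt1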


would correspond to the SPIN action; this takes 26 ticks worst case, which is "only" about 15 times faster than SPIN.
But no machine code programmer would do this. He would rather reserve a register (a COG cell) for "cnt1" and just code


MOV cnt1, CNT       ' cnt1 is now a cog register, so no hub access is needed


which takes only 4 ticks and is now about 90 times faster than SPIN.
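You can check these speed-up figures by dividing the measured 368 ticks by each PASM cost. The 26-tick (MOV plus a hub WRLONG, worst case) and 4-tick (MOV into a cog register) figures are taken from the post above. The sketch below only does the division.

```python
SPIN_TICKS = 368  # measured in the first post

# Tick counts for the two PASM versions, as quoted in the post.
PASM_TICKS = {
    "MOV + WRLONG to a hub variable (worst case)": 26,
    "MOV into a cog register": 4,
}


def speedup(spin_ticks: int, pasm_ticks: int) -> float:
    """How many times faster the PASM version is than the Spin one."""
    return spin_ticks / pasm_ticks


if __name__ == "__main__":
    for label, ticks in PASM_TICKS.items():
        print(f"{label}: {speedup(SPIN_TICKS, ticks):.1f}x")
```

This gives about 14.2x and 92x, which round to the "15 times" and "90 times" quoted.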

Paul Baker
01-12-2008, 09:58 AM
deSilva has hit the nail on the head. For a Spin assignment there is the opcode fetch, the operand fetch, and then storing the result. That's at least 3 hub accesses. A cog's hub access window comes around at best once every 16 clock cycles, and it is much more likely that the window is missed on one or more of them, which means even more time. Another major factor is the stack. I don't know if it is involved in the assignment operator, but most mathematical operations use it, which adds two hub accesses for every item (one to push, another to pop). You can see how the hub accesses start to add up. With only enough time for two additional assembly instructions between hub accesses before the next window is missed, things really start to stretch the timeline.
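The effect of missing the hub window can be illustrated with a simplified timing model. Each cog gets a hub slot once every 16 clocks. An access issued between slots stalls until the next one. The sketch below assumes the access itself costs 8 clocks and that the cog's slot falls on clock 0. Both are round-number assumptions chosen for illustration, so this is not a cycle-exact model of the Propeller.

```python
HUB_PERIOD = 16  # a cog's hub slot comes round once every 16 clocks


def next_window(t: int, phase: int = 0) -> int:
    """Earliest clock >= t at which this cog's hub slot is open."""
    return t + (phase - t) % HUB_PERIOD


def run(accesses: int, work_between: int, hub_op: int = 8, phase: int = 0) -> int:
    """Clocks for `accesses` hub operations with `work_between` clocks of
    non-hub work between consecutive ones (simplified, not cycle-exact)."""
    t = 0
    for i in range(accesses):
        t = next_window(t, phase)  # stall until the slot comes round
        t += hub_op                # perform the hub access itself
        if i < accesses - 1:
            t += work_between      # ordinary instructions before the next access
    return t


if __name__ == "__main__":
    print(run(3, work_between=8))   # two 4-clock instructions between accesses: 40
    print(run(3, work_between=12))  # three instructions between accesses: 72
```

With two 4-clock instructions between accesses, every access catches the next slot. Adding a third instruction misses each slot by 4 clocks and waits almost a full rotation, so three accesses grow from 40 to 72 clocks.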

Paul Baker (mailto:pbaker@parallax.com)
Propeller Applications Engineer
Parallax, Inc. (http://www.parallax.com)

Post Edited (Paul Baker (Parallax)) : 1/12/2008 2:03:18 AM GMT

Mike Green
01-12-2008, 10:20 AM
This discussion of assembly vs. any kind of "high level" code (whether interpreted, or compiled to assembly language but full of library routine calls to do things not available as simple assembly instruction sequences) is an old one that goes back to the first Fortran compiler.

The truth is that any program is a combination of simple things that have to be done quickly and repeatedly, and much more complex things that are done occasionally and without much regard to time (within reason). With the Propeller, the fast things are some kinds of I/O (like video and high-speed serial) and some kinds of repetitive computation (like DSP-type work). Pretty much anything else can be done more slowly, and there space becomes more important than speed. Spin is a better fit for that, as is using appropriately sized data structures rather than longs for everything.

The trick is to place the dividing line properly. Sometimes you can only find out how to implement something by getting it to work slowly (in Spin here), measuring how long each part of the process takes, and then optimizing the parts one by one to see if that makes a difference. There have been many case studies where what was believed to be the bottleneck in a program was not that at all; it was something else. You want to measure what's actually going on before committing to a difficult optimization process for something that may get thrown away later.

hippy
01-12-2008, 10:35 AM
In fact it's probably worse than Paul states. There are six bytes of bytecode, so that's probably six hub accesses. Add two pushes and two pops, which are four more hub accesses, for 10 in total. And that doesn't include calling routines to determine where to get what to push onto the stack and where to pop it to, all the moving between registers and hub memory to do this, and of course decoding what the bytecode meant in the first place.
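A rough floor follows from the 16-clock hub rotation, because a cog can complete at most one hub access per slot. If the estimate of ten accesses holds and each needs its own slot, those accesses alone span at least nine full rotations plus one access. The sketch below assumes an 8-clock access, as an illustrative figure. The "remaining" number is simply what the 368-tick measurement leaves for decoding and bookkeeping under that assumption, not a measured breakdown.

```python
HUB_PERIOD = 16   # one hub slot per cog every 16 clocks
MEASURED = 368    # ticks from the first post


def hub_floor(accesses: int, hub_op: int = 8) -> int:
    """Fewest clocks for `accesses` hub operations if each needs its own slot."""
    return (accesses - 1) * HUB_PERIOD + hub_op


if __name__ == "__main__":
    accesses = 6 + 4             # bytecode fetches + two pushes and two pops
    floor = hub_floor(accesses)
    print(floor)                 # 152
    print(MEASURED - floor)      # 216 left for decoding and bookkeeping
```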

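PASM is undoubtedly fast (20 MIPS); Spin is slower, but it's all relative. Take a 'normal' single-accumulator architecture running at 1 MIPS ...

LDA CNT       ; load the counter into the accumulator
STA count1    ; store it to memory
LDA CNT       ; read the counter again
STA count2    ; store the second reading
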
Spin is a close match for that, so even with all that Spin needs to do in the background, it is still comparable to that 1 MIPS processor, and you have up to eight of those processors.
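This comparison can be expressed as simple arithmetic. PASM's 20 MIPS figure comes from an 80 MHz clock at 4 clocks per instruction. On the 1 MIPS accumulator machine, the gap between the two `LDA CNT` reads is roughly two instructions (`STA count1`, then the second `LDA CNT`). The one-microsecond-per-instruction rate and the two-instruction count are simplifying assumptions.

```python
def mips(clock_hz: float, clocks_per_instr: int) -> float:
    """Millions of instructions per second for a fixed cost per instruction."""
    return clock_hz / clocks_per_instr / 1e6


def interval_us(instructions: int, mips_rate: float) -> float:
    """Microseconds taken by `instructions` instructions at `mips_rate`."""
    return instructions / mips_rate


if __name__ == "__main__":
    print(mips(80_000_000, 4))  # PASM at 80 MHz, 4 clocks per instruction: 20.0
    print(interval_us(2, 1.0))  # STA count1 + LDA CNT on a 1 MIPS machine: 2.0
    print(368 / 80)             # the Spin measurement in microseconds: 4.6
```

That puts the 1 MIPS machine at about 2 us against Spin's 4.6 us, which is the same order of magnitude.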

So fast or slow, it all depends on what you compare it with. For example, PASM runs like a snail nailed to the ground if you compare it to some things.

deSilva
01-12-2008, 10:51 AM
I am just experimenting with some delta-sigma routines in SPIN :) This is doable thanks to the ingenious working of the timers/counters.

In fact, machine language is "slow" compared to what the hardware does. A computer is an interpreter as much as SPIN is: fast, but slow with respect to the "real world" of electronics...

With Propeller machine code the slowdown is a factor of six, reduced to four by the interleaved instruction execution...

stevenmess2004
01-12-2008, 10:55 AM
It would be interesting to see if some hand coded SPIN op-codes would be faster. But I'm not trying :) PASM and normal Spin are enough for me.

Phil Pilgrim (PhiPi)
01-12-2008, 11:30 AM
stevenmess2004 said...
It would be interesting to see if some hand coded SPIN op-codes would be faster.

In cases where the compilation is not optimized but could be, hand-coded byte codes would undoubtedly be faster. But plain vanilla stuff like read a timer, assign to variable? I rather doubt it.

-Phil