Thread: Propeller Tricks & Traps (Last update 21 June 2007)

1. Cluso99,

Actually it's the "non-zero" condition that gets saved in bit 1. Getting the condition bits in there to begin with is a little more difficult, though. Referring to an earlier post,

Code:
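```
        muxnz   save,#2              'Save the non-zero condition in bit 1.
        muxc    save,#1              'Save the carry condition in bit 0.

        shr     save,#1 wz,wc,nr     'Restore both flags from save (nr leaves save unchanged).
```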

It'd be great to figure out a single instruction for saving both bits, but I don't think it's possible.
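For anyone who wants to check the bit bookkeeping, here is a minimal Python model of the save/restore pair (a sketch, not Propeller code). It assumes save starts out as zero, that shr with wc puts bit 0 of the original value into C, and that wz sets Z when the shifted result is zero. The helper names mux, save_flags and restore_flags are made up for this illustration.

```python
MASK32 = 0xFFFFFFFF

def mux(dest, mask, cond):
    """Model muxc/muxnz: set the masked bits if cond is true, else clear them."""
    return (dest | mask) if cond else (dest & ~mask & MASK32)

def save_flags(c, z):
    """Two instructions: muxnz save,#2 then muxc save,#1 (save assumed to start at 0)."""
    save = 0
    save = mux(save, 2, not z)   # bit 1 holds the non-zero condition
    save = mux(save, 1, c)       # bit 0 holds the carry condition
    return save

def restore_flags(save):
    """Model shr save,#1 wz,wc,nr: C = bit 0, Z = (save >> 1 == 0), save untouched."""
    c = bool(save & 1)
    z = (save >> 1) == 0
    return c, z
```

Every combination of c and z should survive the round trip, as long as no other bits in save are set.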

-Phil

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
'Still some PropSTICK Kit bare PCBs left!

2. I'll be a little more specific...

My method lets you load both the c and z flags in one instruction. In the Spin interpreter, for example, the bytecodes store the flags in the lower 2 bits (bits 1 and 0), and the code tests them using 2 instructions in the getflags "call".

(Warning... the z flag is set=1 to indicate a zero condition, but it IS tested correctly as "if_z" - you just have to get your mind around this!)

Code:
```
'the following stores the bytecode...
op       long  0-0    'bit 1 (0=nocarry, 1=carry), and bit 0 (0=zero, 1=nonzero); other bits vary
' 00 = gives nc and z flags
' 01 = gives nc and nz flags
' 10 = gives c and z flags
' 11 = gives c and nz flags
cz_flags long  0-0    'bit 31 (0=nocarry, 1=carry), bit 30 (0=zero, 1=nonzero)
'---------------------------------------------------------------------

         mov     cz_flags,op             '\ save flags into cz_flags
         shl     cz_flags,#30            '/ and all other bits become "0"
'---------------------------------------------------------------------

         shl     cz_flags,#1   wc,wz,nr  ' get c&z flags <-- single instruction to get flags
'---------------------------------------------------------------------
```

The alternative example reverses the positions of the c and z flags and keeps them in bits 1 and 0:
Code:
```
'the following stores only the flags...
cz_flags long  0-0    'bit 1 (0=zero, 1=nonzero), bit 0 (0=nocarry, 1=carry); all other bits zero

' 00 = gives z and nc flags
' 01 = gives z and c flags
' 10 = gives nz and nc flags
' 11 = gives nz and c flags
'---------------------------------------------------------------------

shr     cz_flags,#1   wc,wz,nr  ' get c&z flags <-- single instruction to get flags
'---------------------------------------------------------------------
```

I hope this clarifies what I am trying to illustrate. Obviously, you can also test for just the z or just the c flag in the instruction. This means you no longer have to use 2 instructions (or a "call") to test for both flags. Don't forget the "nr", or you will overwrite your variable.
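The first variant (flags moved up to bits 31 and 30, then read back with a single shl) can be checked with a small Python model. This is only a sketch: it assumes shl with wc puts bit 31 of the original value into C and that wz sets Z when the shifted result is zero. The names shl and flags_from_op are invented for the illustration.

```python
MASK32 = 0xFFFFFFFF

def shl(value, count):
    """32-bit logical shift left."""
    return (value << count) & MASK32

def flags_from_op(op):
    """Return (c, z) as the three-instruction sequence would leave them."""
    cz_flags = shl(op, 30)          # mov cz_flags,op / shl cz_flags,#30
    c = bool(cz_flags >> 31)        # shl cz_flags,#1 wc : C = bit 31
    z = shl(cz_flags, 1) == 0       # wz : Z set when the shifted result is zero
    return c, z
```

With this model the low two bits of op map to the flags exactly as in the comment table (00 gives nc and z, 11 gives c and nz), and the upper bits of op have no influence because the shl by 30 discards them.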

3. I get that, and our single instruction for restoring both flags from cz_flags (save in my case) is the same. What I'm wondering, though, is whether there's a way to save both flags with one instruction instead of the two MUXes in my above example. I don't think there is, but maybe someone's got a trick up his sleeve that I can't see.

-Phil


4. ok, here is another trap that just got me and took me a while to figure out:

I had a small state machine where I'd modify the jump location, something like this:

movs jump_label, #some_state_label_to_jump_to

...

jump_label jmp 0-0

...

some_state_label_to_jump_to ....

First of all, one has to use movs to modify the jmp location, not movd (using the latter would seem to make sense until one studies the documentation ;) ).
Second, the code above is WRONG: instead of 0-0, one has to use something like #0 (or #$0), so that the jmp is encoded with an immediate value rather than the address of a variable. In short, the code above tries to jump to the location that is STORED in the register at some_state_label_to_jump_to (which is of course wrong in this case), instead of just jumping to some_state_label_to_jump_to.

hope this will help somebody.
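Here is a rough Python model of why the missing "#" matters (a sketch, not Propeller code). It assumes the usual 32-bit PASM layout, with the 9-bit source field in bits 8..0 and the immediate flag in bit 22, and it takes $5C3C0000 as the base encoding of an unconditional jmp with a register source. STATE, the contents of cog_ram, and the helper names are hypothetical.

```python
JMP_BASE = 0x5C3C0000   # jmp, condition "always", no result written, immediate bit clear
I_BIT = 1 << 22         # immediate flag: the source field is a literal value
SRC_MASK = 0x1FF        # bits 8..0: the 9-bit source field

def movs(instr, value):
    """Model movs: replace only the 9-bit source field of an instruction."""
    return (instr & ~SRC_MASK) | (value & SRC_MASK)

def jump_target(instr, cog_ram):
    """Where the jmp goes: the literal if immediate, else the register's low 9 bits."""
    src = instr & SRC_MASK
    if instr & I_BIT:
        return src
    return cog_ram[src] & SRC_MASK

STATE = 0x040                        # hypothetical address of the state label
cog_ram = {STATE: 0xA0BC0123}        # hypothetical instruction stored at that label

wrong = movs(JMP_BASE, STATE)              # jmp 0-0, patched by movs
right = movs(JMP_BASE | I_BIT, STATE)      # jmp #0-0, patched by movs
```

In this model the patched "wrong" jmp lands at $123 (the low 9 bits of whatever happens to be stored at the label), while the "right" one lands at $040. Note that movs never touches bit 22, which is why the "#" has to be present in the original instruction.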

5. pems,

Thanks for that. It's always good to remember that the "#" isn't part of the source field itself — despite its placement in assembly code — but corresponds to a bit elsewhere in the instruction.

-Phil


6. For jmp instructions, maybe this would be better, since it still shows that the source will be modified. Then, if a mistake is made and the source is not modified, the jmp loops on itself instead of jumping to $000 (which gives a more predictable result).
Code:
```
jmp_mod  jmpret j_ret, $-0   ' "$-0"  (indirect) : means self modifying code - defaults to loop here

jmp_mod2 jmpret j_ret, #$-0  ' "#$-0" (direct)   : means self modifying code - defaults to loop here
```

Trying to chase down this sort of bug is a nightmare. At least if the code loops in the one spot it just hangs. If you jmp to 0 all kinds of weird things happen and you end up looking for a different kind of bug that usually takes a lot longer to find. Just my opinion...

7. Another option would be to jump indirect via a long (named "NextState" perhaps) located after the program, which defaults to the address of a piece of debug code. This long would then receive the address of each next-state routine in turn. It uses extra memory, but the indirect jump doesn't add to the execution time. As a bonus, you don't necessarily have to funnel jumps to the next state through a single JMP instruction. Multiple jumps referring to NextState can occur anywhere in the program.

-Phil


8. Excellent point, Phil! That makes a lot of sense. (I have never written a state machine.)

9. Here's a trick you can use for locating the precise time of an input edge, while still being able to time out if it doesn't show up.

Problem: The Propeller provides WAITPEQ and WAITPNE instructions to provide one-clock granularity for locating input edges. But if the edge never shows up, the instruction will hang. To keep it from hanging, you can poll for the edge in software, but the granularity suddenly changes from one clock to eight. Here's an example that waits for a high-to-low edge and times out if it doesn't show up:

Code:
```
            mov     time,timeout         'Initialize timeout timer.

:loop       test    pinmask,ina wz       'Test input pin. Is it low?
      if_nz djnz    time,#:loop          '  No:  Go back and check again.

            mov     time,cnt             'Save cnt value. (At this point the z flag will indicate whether we timed out or not.)

            ...

timeout     long    80_000_000 / 10      '1/10 sec. @ 80MHz
time        res     1
```

Solution: By enlisting the help of a counter, the granularity can be reduced back to one clock. In this example, the counter is set to count up by one on every clock in which it sees a low on the pin. By subtracting this count from the time, we get one-clock timing precision (to within a constant):

Code:
```
            mov     ctra,ctra0           'Initialize ctra to count lows on pin.
            mov     frqa,#1              'Make it count up by one.
            mov     time,timeout         'Initialize timeout timer.
            mov     phsa,#0              'Clear counter.

:loop       test    pinmask,ina wz       'Test input pin. Is it low?
      if_nz djnz    time,#:loop          '  No:  Go back and check again.

            mov     time,cnt             'Save cnt value. (At this point the z flag will indicate whether we timed out or not.)
            sub     time,phsa            'Correct time for number of "lows" before reading cnt.

            ...

timeout     long    80_000_000 / 10      '1/10 sec. @ 80MHz
ctra0       long    %01100 << 26 | pinno 'Count lows on pinno.
time        res     1
```
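To see why subtracting phsa restores one-clock resolution, here is a rough Python model (not Propeller code). It assumes an idealized 8-clock polling loop, a fixed delay between the successful poll and the cnt read, and a phsa read 4 clocks after that. LOOP_CLOCKS, READ_DELAY, PHSA_DELAY and measure are names made up for this sketch, and the timeout path is ignored.

```python
LOOP_CLOCKS = 8   # test + djnz: two 4-clock instructions per polling pass
READ_DELAY = 8    # assumed fixed delay from the successful poll to the cnt read
PHSA_DELAY = 4    # assumed: phsa is read one instruction after cnt

def measure(edge):
    """Return (polled, corrected) timestamps for a pin that goes low at clock `edge`."""
    sample = -(-edge // LOOP_CLOCKS) * LOOP_CLOCKS   # first poll at or after the edge
    cnt_read = sample + READ_DELAY                   # mov time,cnt
    lows = (cnt_read + PHSA_DELAY) - edge            # phsa: one count per low clock
    return cnt_read, cnt_read - lows                 # sub time,phsa

# Error of each estimate relative to the true edge, over every loop phase.
polled_errors = {measure(e)[0] - e for e in range(100, 164)}
corrected_errors = {measure(e)[1] - e for e in range(100, 164)}
```

In this model the polled timestamp is off by anywhere from 8 to 15 clocks depending on where the edge falls within the loop, while the corrected timestamp is always off by the same constant, which matches the "to within a constant" remark above.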

-Phil

10. This may be obvious to some and has been covered indirectly with Hippy's word/long mixing example,
but it sure set me back a few hours. Say you want to save some hub memory by creating an array
of words rather than longs for storing word-wide information, then use a method to compare a value.
If you don't explicitly cast the parameter in the body of the method, any comparison may fail as
the incoming parameter value may contain bits of an adjacent word. Example:

Code:
```
VAR
  word mylist[LISTSIZE]

PRI isListed(value) | n
  repeat n from 0 to LISTSIZE-1
    if mylist[n] == word[@value]  ' <-- Method can fail if you don't cast the argument.
      return true
  return false
```

Added: The failing case is where the value being passed to isListed is the long return value of another method.
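A small Python model of the comparison shows the failure mode (a sketch only; Spin itself is not being run here). It assumes that word[@value] reads the low 16 bits of the long parameter, since the Propeller stores longs little-endian. The list contents and the stray upper bits in value are made-up test data, and the function names are hypothetical.

```python
import struct

def low_word(value):
    """Model Spin's word[@value]: the low 16 bits of a little-endian long."""
    raw = struct.pack("<I", value & 0xFFFFFFFF)
    return struct.unpack_from("<H", raw, 0)[0]

mylist = [0x1234, 0xBEEF]     # word-sized entries (made-up data)
value = 0x0001_1234           # a long whose upper word carries stray bits

def is_listed_uncast(v):
    """Compare the full long against each word entry."""
    return any(entry == v for entry in mylist)

def is_listed_cast(v):
    """Compare only the low word, as the cast in isListed does."""
    return any(entry == low_word(v) for entry in mylist)
```

Here the uncast comparison misses the entry because the upper word is non-zero, while the cast version finds it.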

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve

Post Edited (jazzed) : 1/31/2009 10:57:43 PM GMT

11. Steve,

This is true if you are passing data by reference, e.g. in_list := isListed(@valuearray + index). But if valuearray is declared as a word, then passing by value will work without the extra casting effort: in_list := isListed(valuearray[index]), assuming isListed is rewritten to work directly with a value and not an address.

-Phil

12. This is my first post. I got trapped (for 2 days) and finally got out of it, so I figured I would share. I hope this hasn't been posted somewhere else (because I could have used it so much sooner).
I was trying to pass information from one cog to another, not a new concept. I was pretty sure I didn't need semaphores (though I experimented with them), since I was only passing one long at a time (to send status flags back and forth between cogs).
Here's what I was doing (wrong):
One cog would read a variable from main memory (to see whether there were any pending tasks, in the form of a flag in the variable). I passed the variable using par and indexed appropriately.
Something like:
Code:
```
:start        mov     variable, par          'get address of variable
              rdlong  local_var, variable    'put the value at that address in local_var
              …..(program)
              jmp     #:start                'do it all over again (check flags again)
local_var     long    0
```

My other program would toggle the data on that same memory location (just as a test, a test that failed). The code would do some waitcnt stuff, then toggle the data on that memory, then loop. The command I used for writing was in the form of:
Code:
`wrlong  cogvar, mem_loc   'where mem_loc is the same as variable in the previous cog`

This actually worked fine for a while (I watched the execution in GEAR). The symptom that this caused baffled me. I watched practically every relevant thing in GEAR to try to pinpoint what was going wrong. At around clock tick 25000 or so, main memory would just clear itself, almost all of it, including the cog0 program itself. In the end, I couldn't find out exactly what was causing the issue, but I did do something to eradicate it.
Instead of declaring my shared value in main memory in the VAR section, I declared it in a DAT section. I'm sure someone else could have told me this, but the forums overwhelmed me; it felt like it would have taken longer to search for it than to just start hacking at it myself.
Anyone know why this happened exactly (or if I may still be doing something wrong and am just lucky for now)?

The moral of the story (so far): declare shared memory between cogs using DAT, not VAR.

13. There's nothing wrong with using VAR variables in the way you describe. It's done all the time. The only catch is that you have to make sure that both cogs are working with the same variable, since VARs are replicated among instances of the same object. But if both cogs are started with a pointer to the VARiable from the same object instance that declares the VARiable, that's not a problem. I suspect there's something else going on in your program that caused the problem you saw and, as you say, you just got lucky when switching to a DAT variable. I would suggest starting a new thread and posting your entire program there so others can help you debug it.

-Phil

14. XlogicX said...
The moral of the story (so far), declare shared memory between cogs using DAT, not VAR

That's how I did it in my project's program.

http://forums.parallax.com/forums/default.aspx?f=21&m=376422&p=1&ord=a

Using DAT for variable space is a topic that DOES NOT get enough attention in the manual or education kit.

15. Here is a trick that comes with plenty of its own traps.

There have been a number of posts where people use djnz to decrement a pointer, and I thought I would try taking this a little further (sorry to anyone who may have come up with this before me). One can use the line:

djnz :loop, #:loop

to jump and post-decrement the source register. One can use the following two instructions to find whether an array in cog memory contains a value (a):

:loop cmp a, 0-0 wc, wz
if_b djnz :loop, #:loop

This first executes the compare between a and the register whose address is loaded into 0-0. It then decrements that address (after the compare, due to pipelining) and executes the compare between a and the register with the next lowest index.

This does not have many uses where it would be beneficial, but I am using it in a cog that has 4 nested loops that each run up to 80 times, so a single instruction can add seconds to the completion time.

Some things that make this harder to use are:
- the array must be loaded with the first data point at the highest address
- once it has completed its operations, one must use the movs instruction and add one to find the address that contained the correct data.

Here is some working demo code for it that turns pin 5 high to demonstrate that it works:
Code:
```
CON
  _clkmode      = xtal1 + pll16x
  _xinfreq      = 5_000_000

PUB main
  cognew(@work, 0)
  repeat

DAT

              org     0
work          movs    :loop, #:d      'this gives the address after the main array
              nop
              djnz    :loop, #:loop   ' this jumps past the data, it is necessary to start with a djnz so the first cycle is not repeated twice
              long    0               ' this fulfills the condition in :loop, and prevents it from doing strange things
              long    1,2,3,4,5,6,7,8
:d            long    9
:loop         cmp     a, 0-0 wc, wz
        if_b  djnz    :loop, #:loop   ' this works as a jump and post decrement the source register operation
              movs    b, :loop
              add     b, #1           ' this corrects for the post decrement of :loop
              cmp     b, #3 wz, wc    'these two lines verify that it has not stopped due to the zero at the end
        if_e  jmp     #noend
              movs    :loop2, b
              nop
:loop2        mov     c, 0-0          ' this retrieves the number from the array
              mov     b, #1
              shl     b, c
              mov     dira, b
              mov     outa, b
noend         jmp     #noend          ' prevents execution of further data

a             long    5
b             long    0
c             long    0
```

I hope this helps someone, or is at least interesting.

16. Things you never wanted to know but were forced to find out

• ina/phsx/cnt are sampled during IdSDeR (Call for Clarity from Chip or Beau)
• the 1st hub slot starts 4 cycles into cog execution
• the minimal waitcnt adjustment is 9 (or 5 if you re-order the instructions)

Code:
```mov     cnt, cnt        ' current time
waitcnt cnt, increment  ' run-through```
Code:
```mov     cnt, #5{14}     ' minimal advance (to avoid full range delay)
add     cnt, cnt        ' current time
waitcnt cnt, increment  ' run-through```
• an instruction placed at $1FF (and then jumped to) isn't executed (probably due to rollover to $000)
- Update: execution is possible with a phase jump
• never enable a counter with frqx != 0, you might get more than you asked for (counter issue)
• jmp phsx (counter enabled, frqx != 0) (jumping counter)
- base = phsx at 2nd cycle of jmp (IdSDeR)
- jump target: base+2*frqx, next instruction at base+3*frqx+1
• jmp cnt (like jmp phsx, frqx = 1)
• jmpret dst, phsx|cnt will store $+1 as return address
• phase jumping to a normal jmpret will cause base+3*frqx+1 being stored as return address
• phase jumping to any instruction where base+3*frqx+1 == 0 will abort the instruction (nop)
• waitvid colors, pixels wr will perform colors += pixels
• waitpne target, mask wr will perform target += (mask + 1)
• waitpxx/waitcnt structure is IdSDwm.R
- w is the stage that is propagating the condition
- m is the earliest match point that the WAIT circuit can test
- afterwards (after the match), the ALU and the rest of the chip wakes up and performs, followed by R
• starting cogs N+1..N+3 miss their hub window and are therefore slower than N+4..N+7
- based on measuring cnt difference just before sync'd coginit and first instruction in cog
- core overhead 8K+4 (8+511*16+8+4), extra +0..+7: $08 $0A $0C $0E $00 $02 $04 $06
- additional cycles: ((T - L + 4) & %111) * 2; (T)arget, (L)auncher
• it may be impossible to catch DUTY pulses generated by a counter within the same cog using waitpeq (e.g. Demoboard, pin16/cog4)
• same issue applies to DUTY coupled counters (pulse width may be seen ranging from 0..2), so also this comment regarding pulse sampling
• wc effect for mov[dis] and jmp[ret] is unsigned borrow
• wc effect for rdxxxx/wrxxxx, clkset, cogid and cogstop is the same as for coginit (no cog free)
• PLL output is sync'd to the rising edge of the feeder NCO (starting with the low half of the cycle). Note that /32, /64 and /128 can lock to any rising edge (relative to NCO start), e.g. /128 can lock to 8n+0 .. 8n+7 meaning that starting a PLL in two different cogs will - despite NCOs being in sync - not necessarily sync the PLL output.
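The "additional cycles" formula in the cog-start item above can be tabulated with a few lines of Python. This is just the formula evaluated as written; reading the listed extras as the offsets T - L = 0..7 is my interpretation, and the function name is invented.

```python
def extra_startup_cycles(target, launcher):
    """Extra clocks per the formula ((T - L + 4) & %111) * 2 from the list above."""
    return ((target - launcher + 4) & 0b111) * 2

# Offsets T - L = 0..7, with the launcher taken as cog 0 for illustration.
table = [extra_startup_cycles(t, 0) for t in range(8)]
```

Evaluated this way, the formula reproduces the sequence $08 $0A $0C $0E $00 $02 $04 $06, and offsets 1..3 come out larger than offsets 4..7, in line with the "N+1..N+3 are slower" observation.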

Useful code fragments

Code:
```' phsx read-modify-write issue

mov     temp, phsx     ' temp := counter[phsx]
shr     temp, #1       ' temp >>= 1
mov     phsx, temp     ' update shadow and counter

' is equivalent to

mov     phsx, phsx     ' shadow[phsx] := counter[phsx]
shr     phsx, #1       ' r: operate on shadow[phsx]
' w: update shadow and counter```

17. Re: Propeller Tricks & Traps (Last update 21 June 2007)

Want to communicate immediately between cogs without using hub ram or waiting for the hub cycle? Use the I/O pins. Even if a cog sets a pin to an output, you can still read it as an input. All the cogs are wired to all the pins 100% of the time, like a bus kinda.

example

Code:
```
PRI pincounter
  dira[3..0]~~
  repeat
    waitcnt(clkfreq+cnt)
    outa[3..0]++
```
Now run pincounter in cog 1 and some other routine in cog 2. Cog 1 will be counting in binary on pins 3..0. When the other routine in cog 2 wants to know how far cog 1 has counted, all it has to do is
Code:
`value:=ina[3..0]`
and it should not have to wait for the hub timer.
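A tiny Python model of the shared pins may help picture this (a sketch, not Spin). It assumes the Propeller's pin logic as I understand it: each pin is driven by the OR of every cog's output bit gated by that cog's direction bit, and every cog reads the same pin states through ina. The Cog class and pin_states function are hypothetical names.

```python
MASK32 = 0xFFFFFFFF

class Cog:
    """Just the two registers that matter for this model."""
    def __init__(self):
        self.dira = 0   # 1 = this cog drives the pin
        self.outa = 0   # value driven when the matching dira bit is 1

def pin_states(cogs):
    """What any cog would see in ina: the OR of all gated outputs."""
    pins = 0
    for cog in cogs:
        pins |= cog.outa & cog.dira
    return pins & MASK32

cog1, cog2 = Cog(), Cog()
cog1.dira = 0b1111                            # dira[3..0]~~
cog1.outa = 0b0101                            # counter currently at 5
value = pin_states([cog1, cog2]) & 0b1111     # cog 2: value := ina[3..0]
```

In this model cog 2 reads 5 without touching hub RAM. It also shows the collision risk: if cog 2 ever sets one of the same dira bits, its outa bit gets ORed into what both cogs read.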

I don't know, am I full of BS? This appears to work when I build/code it, the question is is it really immediate, or even practical.

Of course now you have to be careful with latency, collisions etc but you could potentially use a single pin as a lock bit, just like the regular locks in the hub. Obviously you're limited to a maximum of 32 bits in parallel transmitted in this way, but you could have one cog serially piping data to another cog using just a few pins and neither of them have to wait for anything.

That said, this is probably only useful for byte size values or smaller. If you're pumping data serially, it will take at least as many clocks to receive all 16 bits as it would to just wait for the hub timer to come back around, unless you use 16 or all 32 i/o pins. But using 1-8 pins, it appears you can beat the hub timer and communicate these values immediately.

Why anybody would need to do something like this, I don't know, but it is an interesting feature.

Gonna have to try this out with ASM and pay close attention to the timings.

18. Re: Propeller Tricks & Traps (Last update 21 June 2007)

Phipi, here's another trick I submit for checking limits at compile time.

If you have some calculations/settings in a CON section you can use a DAT section to verify limits or several conditions in the following way:

DAT
ORG 1
FIT logical_condition '(whatever logical condition you want to be TRUE)
FIT logical_condition_2 'yet another condition to be met
FIT logical_condition_3 'as many as you need as long as it evaluates to TRUE/FALSE

That way, if your logical condition(s) evaluate to FALSE (== 0), you get a compile error stating that you exceeded the allowed size, but what it really means is that you missed the condition.

This lets the programmer state the conditions that must be met for a correct configuration of constants in the object and rely on the compiler (not the user) to catch violations.

I don't know if this has been discussed before but I found this feature handy for my needs.

Best regards
Alex

19. Re: Propeller Tricks & Traps (Last update 21 June 2007)

Neat trick, Alex! Thank you for posting it!

-Phil

20. Re: Propeller Tricks & Traps (Last update 21 June 2007)

That could be useful actually. Thanks, Alex.
For example, I have notes on how people should select pins for my FlashPoint, but having it give an error if they do it wrong is nice.

Does the Prop tool automatically scroll to the line that gives the error?