Understanding Assembler

Kaio · 2007-06-18 18:51

Jan,

1) If the methods never call each other, then I suggest using variant c). You need only one variable for all subroutines.

The difference between b) and c) is that you can use the res keyword only at the end of your code, after any long, word or byte declarations. Otherwise your code will not work properly, and the compiler does not warn you about this.

So you could use only b) to declare a local variable.

2) Your variables x1, x2 and so on are temporary variables, so you should name them tmp1 and so on; x is more commonly used for coordinates. That would make the code more readable for others.

The goal should be to minimize your code. So when you do not need separate variables for different routines, try to share the temporary variables. I would recommend writing a comment on each routine that describes its usage of variables.

janb said...

Does it make sense to declare 2 local variables of the same name ':x1' in both subroutines
to save one register?

I don't know what you mean. The Propeller doesn't have dedicated registers. All registers are used as variables in Cog memory. So if you declare 2 (local) variables, they take 2 longs of Cog memory.

@Mike

janb said...

Assume I have 3 subroutines in ASM, called...

Thomas

janb · 2007-06-24 03:49

Hi,
could you help me understand why the following code works correctly:
after certain pins reach the desired state, it reads 2 bytes from the upper pins 16-31 and stores the values in
3 consecutive local variables o21, o22, o23.
This is what I want.
However ...

'-------------------------
CaptureFrame_slow
        waitpeq   frameState, frameMask   'wait new frame
        waitpeq   lineState, lineMask                 'wait for start condition      
        mov n,#3
:newPix  waitpeq   pixState, pixMask                 'wait for start condition
        mov       x, ina                  'store pins state
        waitpeq   zero, pixMask
        shr x,#16
:save   mov o21,x
        add  :save,d_inc  'increment destination in instruction above  
        djnz      n, #:newPix                 'go for next transition 
d_inc  LONG  $0000_0200      
CaptureFrame_slow_ret ret

....

o20      long 0
o21      long 0
o22      long 0
o23      long 0

The full code is much longer (~200 lines). If I move the definition of
d_inc LONG $0000_0200
way back to the end of the ASM code (I wanted to reuse this constant in many places to save COG memory)
the code seems not to increment the address in the ':save' instruction. It overwrites the variable o21 3 times. Why is that?
Thanks
Jan

Mike Green · 2007-06-24 04:37

janb,
One obvious mistake is that you've put d_inc (a constant) between two instructions where it will be executed as an instruction. Move it at least to after the CaptureFrame_slow_ret. It shouldn't really matter though since this value would be executed as a NOP. Also, do you reinitialize the instruction at :save?

What you've posted (at least the store part) ought to work. I don't see anything else that's obvious. I'd have to see the whole thing to understand why moving the constant causes a problem. If you have some RES statements before the constant, that would throw things off.

janb · 2007-06-24 05:04

Hi Mike,
the full OBJ code is attached. Unfortunately it is not short, nor can one run it - more objects are needed.

The problematic subroutine is

CaptureFrame_slow
        waitpeq   frameState, frameMask   'wait new frame
        waitpeq   lineState, lineMask  'wait for new line 
        mov n,lineLen
:newPix waitpeq   pixState, pixMask 'wait for new pixel 
        mov x, ina   'acquire new value
        and x,dataBusMask
        shr x,dataBusPin0
:save2   mov CogArr,x
        add  :save2,d_inc  'increment destination in instruction above
        waitpeq   zero, pixMask
        djnz      n, #:newPix                 'go for next transition 
CaptureFrame_slow_ret ret
d_inc  LONG  $0000_0200

I did move d_inc below the subroutine as you suggested.
The code as is works.
But if I move this one line to the end, the line

add  :save2,d_inc

does nothing and data piles up on the first address of 'CogArr'.
If you have time, could you advise me whether the order of local/global variables is optimal?
I started with your earlier advice to declare local temporary variables in every subroutine, but this reduced the available space for my big local array - so now there are a few working variables declared at the end, e.g. ':n' and 'n'.

Also the lines

        mov x, ina   'acquire new value
        and x,dataBusMask
        shr x,dataBusPin0

are meant to acquire one value from a bus made of K pins starting from pin0 - is it optimal?
Thanks
Jan

Mike Green · 2007-06-24 05:42

1) You do have variables at the end in the pattern "long ... long ... res ... long ... long" and this is not legal. The RES statements must come last (although a FIT statement can follow it). No LONG statements or instructions may follow a RES in that DAT section. There are some exceptions to that rule, but that's a longer discussion.

2) You're using :d_inc for many of the copies of $0000_0200. That won't work if you want to use a single copy of the constant. Put the "d_inc LONG $0000_0200" at the end (before the RES) and change all the ":d_inc" to just "d_inc".
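
The ordering rule above might be sketched like this. It is a minimal illustration rather than Jan's actual file: the entry label, the placeholder instruction, the array size of 64 and the variable list are all assumptions. The reason the order matters is that RES advances the cog address without placing anything in the compiled image, so a LONG that follows a RES is loaded at a lower cog address than the one its label refers to.

```spin
DAT
              org     0
entry         jmp     #entry              ' placeholder for the real instructions (assumed)

' Initialised data: these occupy space in the compiled image, so they come first.
d_inc         long    $0000_0200          ' 1 in the destination field (bit 9)
zero          long    0

' Uninitialised workspace: RES must come last in this DAT section.
x             res     1
n             res     1
CogArr        res     64                  ' size assumed for illustration

              fit     496                 ' compile-time check that it all fits in cog RAM
```

With a single global d_inc placed like this, every routine can refer to it as plain "d_inc"; a ":d_inc" local name would only be visible between the two global labels that surround it.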

deSilva · 2007-06-24 12:00

I have these comments regarding assembly style:

- When you modify an instruction, it is good style to clearly mark the modified part as 0 (or #0 if appropriate); some even use 0-0.
That way you will always be reminded that you have to initialize it, though there will be rare cases when code is performed only once and a static preset will do....

- There is no such concept as "local variables" in a COG, just "local names". These NAMES are valid between two non-local names only! This can lead to dangerously inserting variables (LONG) within the instruction flow...
The idea behind this practice (to save space) is however absolutely wrong. The idea of protecting local static variables by hiding their names, on the other hand, is good and valid. Because of the JMPRET conventions this can only be accomplished using an additional "jump-over".

- Saving memory (i.e. registers!!) by overlaying local variables (i.e. using registers in the old-fashioned way) is a tricky practice needing much discipline, experience and a good naming practice! Nested routines will always play tricks with your best intentions.
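
The first point (mark the patched field with 0-0 and initialise it on every call) could look roughly like the sketch below. The routine name Capture3, the count of three and the variable names are assumptions chosen to echo Jan's example, not code from his attachment.

```spin
DAT
Capture3      movd    :store, #CogArr     ' (re)initialise the destination field on every call
              mov     n, #3               ' three longs to store
:loop         mov     x, ina              ' sample the pins
:store        mov     0-0, x              ' 0-0 marks a destination that is patched at run time
              add     :store, d_inc       ' advance the destination by one long
              djnz    n, #:loop           ' one instruction sits between the patch and its next use
Capture3_ret  ret

d_inc         long    1 << 9              ' destination field starts at bit 9 ($0000_0200)
x             res     1
n             res     1
CogArr        res     3
```

Without the MOVD at the top, a second call would carry on storing from wherever the previous call left off, which is the situation the question "do you reinitialize the instruction at :save?" is probing for.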

CardboardGuru · 2007-06-24 14:17

deSilva said...
- Saving memory (i.e. registers!!) by overlaying local variables (i.e. using registers in the old-fashioned way) is a tricky practice needing much discipline, experience and a good naming practice! Nested routines will always play tricks with your best intentions.

It's easy to run out of space on a cog. The two approaches I've used up to now to reuse variable memory have been:

The register approach.

r0 RES 1
r1 RES 1
r2 RES 1

Then taking pains to work out where each can be reused.

And the multiple labels on shared memory approach.

foo
bar res 1
x
trumpet res 1
y
trombone res 1

This makes for more clarity in reading the code, but far more difficulty in working out which locations can be reused.

But it occurs to me now that this is purely a matter of scope, which a high level language will deal with by means of having local variables on the stack, and safely reusing memory that way. So how about using a naming convention to indicate scope. e.g. When writing a TV driver, you might have the following structure:

scanline loop
    frame loop
        set background
    process sprites
        do sprite

So how about naming vars starting with the scope.

Scanline_count
ScanlineFrame_count
ScanlineFrame_foo
ScanlineSprites_count
ScanlineSpritesSprite_bar

So the rule for whether you can share a memory location is to compare the scope part of each variable name: you can reuse the location if the scopes are different. Different, not just shorter.

Scanline_count res 1

ScanlineFrame_count
ScanlineSprites_count res 1

ScanlineFrame_foo
ScanlineSpritesSprite_bar res 1

What do you think? It won't appeal to those who like their assembly code terse, that's for sure. And scope names would need to be short to fit in the 30 char identifier limit. I think it might be a workable solution to the problem. But I'd have to try it out for real to get a feel for how useful it is.

Post Edited (CardboardGuru) : 6/24/2007 2:24:24 PM GMT

deSilva · 2007-06-24 14:56

I think this will not work so easily.

(1) There can be variables in a routine that are used statically, i.e. some "local memory"; they are kind of global and must be left alone under all circumstances (thus the need for a special naming convention).
(2) In most cases a "static call tree" can be constructed, i.e. a simple graph showing who calls whom. As recursion is not straightforward with the Propeller, it is generally assumed that this is always possible, but it is not!
Consider a routine A calling B, depending on a flag; then, using the same global flag, B calls C.
Now with the flag unset, A could call C and C then B without violating any law except the law of good programming practice.

Thus it is necessary for any rules to have an unambiguous static call tree in the first place!

Under those circumstances, all routines can use any variables not in their calling path; this is similar to what CardboardGuru has in mind.

(3) Alas, issues start even earlier within longer routines, e.g. using "I" and "J" as loop indexes.... Note however that there is no help for this even in the highest-level languages (except of the functional kind!).

I sometimes used extensive documentation of the kind:
"Using: R1, R2, R3"
and
"Free: R3"

which could help, when painstakingly updated with each change...

CardboardGuru · 2007-06-24 15:28

deSilva said...
(1) There can be variables in a routine that are used statically, i.e. some "local memory"; they are kind of global and must be left alone under all circumstances (thus the need for a special naming convention).

Then they'd have larger scope, such as the Scanline_count variable I gave. Some variables will have file scope: they'd not have any scope prefix, and couldn't be reused for anything else.

deSilva said...
(2) In most cases a "static call tree" can be constructed, i.e. a simple graph showing who calls whom. As recursion is not straightforward with the Propeller, it is generally assumed that this is always possible, but it is not!
Consider a routine A calling B, depending on a flag; then, using the same global flag, B calls C.
Now with the flag unset, A could call C and C then B without violating any law except the law of good programming practice.
Thus it is necessary for any rules to have an unambiguous static call tree in the first place!

True. It wouldn't work automatically, you'd have to be intelligent about what scope you gave to a variable. It's just a notational way of expressing what you know about how long a variable's value remains used. Where there is doubt, you'd give the widest scope to a variable. Perhaps file scope - which means may not be reused for anything.

It wouldn't be a replacement for the dynamic scoping that comes with a stack. But then I haven't yet seen any prop code complicated enough that it would need dynamic scoping.

janb · 2007-06-25 03:08

Thanks Guys!
this one will be a simple one: how do I multiply two values in assembler?
Assume registers x & y hold two arbitrary (not too large) integers.
How do I get their product into register z?
Do I need to make a loop and add X to register Z Y times, like in kindergarten?
Jan

Mike Green · 2007-06-25 03:17

There was a long posting by Chip soon after the Propeller was released containing, among other things, multiplication, division, and square root routines. I don't remember the link to it, but I've attached a copy I had on hand. It was under the subject "Propeller Guts".

janb · 2007-06-25 03:32

I see,
so it is like in kindergarten, except instead of adding Y times you shift & sum for the powers of 2 present in Y.
Very tricky!
Thanks
Jan
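
For readers without the attachment, the shift-and-add idea can be sketched as a 16x16-bit unsigned multiply. This is modeled on the commonly circulated Propeller multiply routine and is not necessarily identical to the one in the attachment; the register names x, y and t are assumptions, and the product is left in y rather than in a separate z.

```spin
DAT
' Unsigned multiply: x[15..0] * y[15..0]  (y[31..16] must be 0 on entry)
' On exit the 32-bit product is in y; x and t are clobbered.
' Usage from elsewhere in the same cog:  call #multiply
multiply      shl     x, #16              ' move the multiplicand into x[31..16]
              mov     t, #16              ' 16 multiplier bits to process
              shr     y, #1          wc   ' first multiplier bit into C
:loop  if_c   add     y, x           wc   ' bit set: add the multiplicand into the upper half
              rcr     y, #1          wc   ' shift the product right, next multiplier bit into C
              djnz    t, #:loop           ' repeat for all 16 bits
multiply_ret  ret

x             res     1
y             res     1
t             res     1
```

Each of the 16 passes runs three instructions, so one call executes on the order of 50 instructions in total.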

Paul Baker · 2007-06-25 04:50

The next chip will have a single cycle instruction multiply.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Paul Baker
Propeller Applications Engineer

Parallax, Inc.

janb · 2007-06-25 13:19

Hi Paul,
Good to hear.
I counted: this 'mul' emulation costs ~55 instructions (~250 clock ticks), much more than the 1 instruction of an 'add'.
So I'm forced to rethink my ASM algorithm.
An obvious question: when? Will it still be a Propeller or something else/different?
Thanks for the feedback
jan

Paul Baker · 2007-06-25 18:03

No answer to when. It will be code compatible with the current Propeller, but there will be enough enhancements that there won't be absolute compatibility.


deSilva · 2007-06-25 18:55

Ha, ha!

Paul Baker · 2007-06-25 20:41

deSilva said...
Ha, ha!

? explain


deSilva · 2007-06-25 20:47

Sorry

Well, I never expected the Propeller II to be fully bit-code compatible - you squeezed out too many bit fields in the Propeller I

And - of course - marketing would not allow you to point out any timings

Graham Stabler · 2007-06-25 20:48

yeah really funny

janb · 2007-06-27 01:31

Hi Guys,
I'm stuck. The following code works properly:

'---------------------------
ClearArray
        mov x,#12
        movd :self,#CogArr 'reset address of Cog array
        mov i,bufLen ' clear only used portion of the buffer
:self   mov 0-0,x 
        add  :self,d_inc  'increment destination in instruction above
        djnz  i, #:self 'continue till end of line  
        'mov n,#22
ClearArray_ret  ret

It fills the internal COG array 'CogArr' of length 'bufLen' with the value 12.

But if I enable one more line in this routine
mov n,#22
which changes some other register, it causes this routine to write zeros instead of 12 to my array.
I do not get this entanglement.
The full code is attached.
Please advise
Jan

janb · 2007-07-04 16:46

Hi,
with time and patience I solved my previous problem: the ASM code was overwriting itself.
Perhaps you could help me with a new problem?

I'd like to measure exactly (to 1 prop clock tick) the time of some pin going from high to low. The time interval is on the order of ~20 prop ticks.
The following code works only approximately; according to the manual, waitpeq takes 5+ clock ticks, so it is not fully deterministic.

        waitpeq   zero, pixMask   
        mov T1,cnt
        waitpeq   pixMask, pixMask   
        waitpeq   zero, pixMask   
        mov T2,cnt
zero long 0

Since I want to measure deltaT only once, would it be possible to use a prop counter instead?

        waitpeq   zero, pixMask   
        '??? activate ctra, frqa,dira 
           
        'wait fixed amount of time, slightly longer then expected measurement time

       'extract content of ctra to get T2-T1

Can it be done? Would it be more accurate, provided there are enough clock ticks to initialize the counter while the pin selected by pixMask is low?

Any suggestions? Perhaps there is some code somewhere I could look at?
Thanks
Jan

Mike Green · 2007-07-04 17:03

The easiest thing is to set up the counter ahead of time to use mode %01100 which counts clock cycles whenever the selected pin is zero. Your program would do a "waitpne zero,pixMask". At that point, phsa or phsb would contain the number of clock cycles the pin was low or it would contain zero (or whatever its previous value was). If T2-T1 is zero, there was no pulse. If T2-T1 is greater than zero, there was a pulse and you have its width in 12.5ns clock cycles (with an 80MHz system clock).

janb · 2007-07-04 17:26

Hi,
thanks a lot. So my code should look like:

' the mode field sits in bits 30..26 of ctra, too wide for a 9-bit immediate,
' so it is loaded from a long:  ctrNeg1 long (%01100 << 26) + 1
mov ctra,ctrNeg1 ' count when pin#1 is LOW
mov frqa,#1 ' increment cog counter A by 1 per clock tick (at 80 MHz)
waitpeq   pixMask, pixMask ' wait until the counter stops counting the 1st time
mov T1,phsa ' save counter value
waitpeq   zero, pixMask ' give it a chance to count again
waitpeq   pixMask, pixMask ' wait until it stops counting the 2nd time
mov T2,phsa
sub T2,T1 ' this is the time between 2 consecutive HIGHs of pin#1

I'll give it a try. I appreciate the immediate feedback; it is a lot of fun to work in such an environment
Jan


janb · 2007-07-04 20:57

Hi,
Thanks again.
The code below does work! It separately measures the duration of the negative and positive states of pin#5 and stores 4 counter values in variables t0,...,t3.

Now I want to do more complex stuff:
Q1: to measure time for a coincidence state of 2 pins I should use mode %10001 or %10010, ...etc, right?
Q2: what is the difference between modes %01100 and %10101 ?

Thanks
Jan

:modeNEG long %01100 <<26         
:modePOS long %01000 <<26
:mask    long 1<<5
:pin     long 5
        mov frqb,#1 ' increment cog counter B by 1 per clock tick (at 80 MHz)

        'negative state duration
        mov x,:modeNEG
        add x,:pin       
        mov ctrb,x ' count while pin#5 is LOW
        waitpeq   :mask, :mask ' wait until the counter stops counting the 1st time
        mov t0,phsb ' save counter value
        waitpeq   zero, :mask ' give it a chance to count again   
        waitpeq   :mask, :mask ' wait until it stops counting the 2nd time
        mov t1,phsb

        'positive state duration
        mov x,:modePOS
        add x,:pin       
        mov ctrb,x ' count while pin#5 is HIGH
        waitpeq   zero, :mask ' wait until the counter stops counting the 1st time
        mov t2,phsb ' save counter value
        
        waitpeq   :mask, :mask ' give it a chance to count again
        waitpeq   zero, :mask ' wait until it stops counting the 2nd time
        mov t3,phsb

Mike Green · 2007-07-04 22:32

Jan,
Yes, to measure a logical coincidence use mode %10001 or %10010.

I believe modes %01100 and %10101 are functionally equivalent. I suspect the logic was easier this way than trying to add some additional conditions for the duplicate mode values.

Paul Baker · 2007-07-05 15:58

Hi Jan,
To clarify a point, waitpeq is deterministic with respect to the pin state. The reason it is listed as 5+ is it takes 4 clocks to process the instruction, plus however many cycles of compare necessary to achieve the wait state. If the value is true at the beginning it will take 5 cycles since only one compare cycle occurs. For situations where more than one compare cycle occurs, the next instruction begins execution on the next clock cycle after a comparison evaluates true.


janb · 2007-07-05 16:18

Hi,
thanks for the explanation. I'm more interested in the case when a cog stops at waitpeq and waits for a given pin state - in my example the pin goes high at the beginning of a new frame from a CMOS camera, and data transmission starts shortly after.
How much time will pass between the frame pin (driven by the camera) going high and execution of the next ASM instruction in the cog after waitpeq?

I need to know this, since the 1st image data in the frame will show up e.g. 1 usec after the frame pin goes high, and pixel data will change every 50 ns (i.e. one ASM instruction time). I need to know very precisely which pixel I'm reading and which I'm skipping.
Thanks
Jan

Paul Baker · 2007-07-05 17:19

The next instruction will begin executing on the next clock cycle. Since at 80MHz each clock cycle is 12.5 ns, the first stage of the next instruction will begin executing between 0 and 12.5ns after the line goes high, since that is the resolution of the comparator.

