Understanding Assembler — Parallax Forums

janbjanb Posts: 74
edited 2007-07-05 17:19 in Propeller 1
Hi,
I'm playing with assembler code. The attached OBJ runs a short assem code and records
CNT at 2 positions in the assem code and reports it back so I can look later on TV terminal.

The attached code works, but if I try to call
call #PackOutput

inside the loop it does not fill the output values
  a:=ioData[2]
  b:=ioData[3]

I see '0'
The LED stuff is only there to show that the cog is doing something. I understand that I learn nothing new by calling PackOutput over and over inside the loop, but it should work, right?
I'm probably missing something trivial.

Jan

Comments

  • Graham StablerGraham Stabler Posts: 2,507
    edited 2007-06-13 17:29
      cognew(@myAss, @ioData) 'Launch new cog, pass params
      waitcnt(cnt+_1s/2)
    
    



    This makes it work. The spin was outputting to the screen before the first waitcnt had finished.

    Reducing the delay also works as does putting the display part in a loop.

    Graham
  • KaioKaio Posts: 253
    edited 2007-06-13 18:05
    Jan,

    Your assembly code is working well. The problem is that the display routine ends before the call instruction after the waitcnt in the assembly code is executed.

    You must use a loop for your display routine like the following. Otherwise the Cog which is interpreting the Spin code will die.
      repeat
        a:=ioData[2]
        b:=ioData[3]
        d:=b-a
        tv.str(string("t1="))
        tv.dec(a)
        tv.str(string(cr, "t1-t0="))
        tv.dec(d) 
    
    



    Thomas
  • janbjanb Posts: 74
    edited 2007-06-13 21:05
    thank you guys
    Jan
  • janbjanb Posts: 74
    edited 2007-06-14 16:28
    Hi,
    could I use help of this forum again?
    The attached code is my next step toward understanding of the assembler.
    The goal is to
    - declare a local array on a COG (256 longs) and clear it
    - quickly increment individual bins, selecting cells at random (now using CNT)
    - after N 'events' write the whole array to HUB memory
    - print on the screen first ~20 cells

    It works to some extent. The attached code presets the first 100 cells with values 1100, 1099, 1098,.... and prints the content of the first 30 cells on the TV.

    Here are my questions:
    1) The function 'PresetArray'
    works, but I do not understand why I need to use the true address of the array at the start (line ***A1). If I replace it by the variable holding the same address (commented-out line ***A2), the code does not work. I'm wondering why?

    2) The function 'FillArray' (commented out in the main assembler routine) runs but gives incorrect results.
    I wanted it to accept 3 'events' and increment 3 cells.
    For every 'event' it would receive a ~random cell ID in the range [0,7] based on CNT. The content of that cell should be incremented by 1000, so it is easy to see on the TV which ones were picked.
    The result of this code changes after each upload, since the starting value of CNT is always different.

    In reality,
    -I see cell #0 is always incremented
    - cell #1,2,3 are never incremented
    - cells #4,8,12,... are sometimes incremented
    Can you help me to fix it?

    3) The function PackOutput, which copies the COG internal array to the HUB array, works fine but looks very clumsy to me. Is there a way to make it faster?

    I appreciate your help
    Jan
  • deSilvadeSilva Posts: 2,967
    edited 2007-06-14 17:16
    1) Funny that you should expect that...
    :self mov IPosAddr, x1
    moves x1 into register IPosAddr. Why should it move x1 into the address it contains?
    In the next line you (correctly) write
    movd :self, IPosAddr
    by which you move the content of register IPosAddr into the instruction.

    2) Why do you have this instruction
    shl x1,#2 ' this is cell address offset
    ??
    What do you mean by "cell"? The previous code shows that you are well aware that COG registers are addressed one by one (and not by four!)

    3) What is wrong with:
    :self2 wrlong IntPos ,IdxM
    add IPosAddr, #1
    add IdxM,#4
    movd :self2, IPosAddr
    djnz Idx, #:self2
    ??
    Slightly better style however would be

    :loop movd :self2, IPosAddr
    add IdxM,#4
    add IPosAddr, #1
    :self2 wrlong 0 ,IdxM
    djnz Idx, #:self2

    Note that you need something in between the movd and :self2!
  • KaioKaio Posts: 253
    edited 2007-06-14 17:50
    Jan,

    First, a correction to deSilva's code, because the loop would not work properly.
    :loop   movd :self2, IPosAddr
            add IdxM,#4
            add IPosAddr, #1
    :self2  wrlong 0 ,IdxM
            djnz Idx, #:loop
    
    



    Now I will help you get the function 'FillArray' working. As deSilva mentioned, you must have at least one instruction between a modifying instruction and the instruction it changes. So I have inserted a nop.
    FillArray  {increment content of few cells at 'random' by baseVal=1000 }
            mov     Idx, #3 ' set # of events, (i.e. # cells to be incremented)
    :next   mov  x1, cnt 'calc index based on CNT, tmp  
            shr  x1,#2 'drop 2 LSB to make this number ~random, tmp
            and  x1,#$7 ' this is cell #, range [0,7], tmp
            'shl   x1,#2 ' not needed: Cog registers are not byte addressed
            mov  IPosAddr, #IntPos      'set initial address
            add  IPosAddr, x1   'set final cell address
            movd    :self, IPosAddr 'replace address              
            nop
    :self   add     IntPos, baseVal 'increment content of this cell
            djnz    Idx, #:next
    FillArray_ret    ret
    
    



    Thomas
  • deSilvadeSilva Posts: 2,967
    edited 2007-06-15 16:44
    Kaio said...
    Jan,

    First, a correction to deSilva's code, because the loop would not work properly.
    :loop   movd :self2, IPosAddr
            add IdxM,#4
            add IPosAddr, #1
    :self2  wrlong 0 ,IdxM
            djnz Idx, #:loop
    
    


    I am sorry I did not test the code :-( and such things tend to stay wrong :-)

    the (hopefully) correct code could be:

    :loop   movd :self2, IPosAddr
            add IPosAddr, #1
    :self2  wrlong 0 ,IdxM
            add IdxM,#4
            djnz Idx, #:loop
    
    



    I don't "like" this very much - in fact I don't "like" the Propeller machine code at all - it should be generated by a compiler in the first place...
  • Mike GreenMike Green Posts: 23,101
    edited 2007-06-15 17:25
    deSilva,
    You don't have to like the native instruction set. There will be a C compiler available this Winter from someone else, but it simply will not produce code good enough for time critical / space critical applications and it won't be free. Only hand optimized assembly language will work for that. Fortunately, most of what people want to do is not particularly time critical. Space critical depends on what you want to do since there are only 512 words for coding programs for a COG. The C compiler will probably use a hybrid form of assembly language being called "Large Memory Model" which lets you run programs mostly from HUB memory with maybe a 20-25% (perhaps better) hit on performance ... still not bad at all.
  • janbjanb Posts: 74
    edited 2007-06-15 19:12
    Hi,
    once again, thanks for all the advice.
    I tried it last night and it did work ... after I figured out why the first bin was skipped.
    It was nice to see the same fix suggested later on this forum. I have learned something over the last few days from you guys.
    Jan
  • KaioKaio Posts: 253
    edited 2007-06-15 21:12
    deSilva said...

    I am sorry I did not test the code :-( and such things tend to stay wrong :-)
    No problem, I did not test it either, so I did not notice the other mistake.


    Jan,

    congratulations on finding this mistake by yourself. It is nice to see that you have made such rapid progress in a short time. To learn more about assembly, I would suggest you have a look at POD, if you have not done so yet.
    http://forums.parallax.com/showthread.php?p=639020

    Then you can test your assembly code easily, directly on the Propeller.

    Thomas
  • janbjanb Posts: 74
    edited 2007-06-16 17:05
    Hi,
    1)
    Since now I have a working SPIN+*** code I'm trying to optimize it.
    The COG running assembler code should accumulate some values in an internal array of 100 longs for a short period of time.
    Next it should export the array from COG --> HUB memory.
    I have done the timing measurement on the transfer time. The core of the code is:
    .............
            mov     t1, cnt
    :loop   movd    :self2, IPosAddr
            add     IdxM, #4            ' incr Hub address
    :self2  wrlong  0, IdxM
            add     IPosAddr, #1        ' incr Cog address
            djnz    Idx, #:loop
            mov     t2, cnt
    .....................
    The difference t2-t1 is the total transfer time of· Idx=nA long variables.
    I try to minimize (t2-t1)/nA

    It now costs 24 COG ticks to transfer a single long from COG to HUB.
    This code has the flexibility to transfer longs stored at arbitrary locations in COG memory.
    Is there a way to do a sort of 'block transfer', taking advantage of the fact that both the source and destination arrays are contiguous in memory?

    I have checked that a single
    wrlong a,b
    may take 7..22 ticks, as the manual says.
    But a series of hardcoded instructions seems to lock the phase between COG & HUB

    wrlong a1,b1
    wrlong a2,b2
    wrlong a3,b3

    so it takes only 7..22 + 2x16 clocks.
    So naively, if I hardcoded 100 lines as above (a1,...a99), the average transfer time would be just 16 clocks per long. This would definitely be ugly and would need 100 of the 512 lines in COG memory. Later I want to use a COG array of ~300 longs, so it is a bad approach.

    Q: is there a way to reach an average wrlong transfer time of 16 ticks per long for a 'large'
    COG array?
    The full code is attached for convenience.

    2)
    I tried the POD package and ... sort of got it to work.
    I have an external TV connected with 3 resistors, so in PODKeyboardTV.spin I changed
    from
      screen : "PC_Text"
      keybrd : "PC_Keyboard"

    to
      screen : "TV_Text"
      keybrd : "PC_Keyboard"
    I did see the reassembled code on the TV, but could not control anything with the keyboard.

    Thanks for all help so far

    Jan
  • Paul BakerPaul Baker Posts: 6,351
    edited 2007-06-16 19:03
    Hi Jan,
    What you are doing is similar in function to the digital storage scope acquisition routines I wrote: http://forums.parallax.com/showthread.php?p=606048
    The assembly routines are highly optimized primarily for space, secondarily for speed. GetFast is the most streamlined of the assembly routines.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Paul Baker
    Propeller Applications Engineer

    Parallax, Inc.
  • janbjanb Posts: 74
    edited 2007-06-16 19:46
    Wow,
    this is complex.
    Could you explain how this storage loop works?

    :storeloop  mov     fbufstart, ina          'store pins state
                add     :storeloop, :d_inc      'increment destination in instruction above
                djnz    :i, #:storeloop         'go for next transition

    Q1:
    What does it mean if you 'add' a value $200 to an address ':storeloop'? How does it increment the value of fbufstart by 1 long?
    Is it a sort of self-editing code?

    Q2:
    The djnz instruction seems to decrement a constant :i? Why is :i not declared as a variable if it is changing?

    I'll study it
    Thanks a lot
    Jan
  • deSilvadeSilva Posts: 2,967
    edited 2007-06-16 20:47
    It might be a good thing to start at the beginning:

    Each Propeller instruction has a destination and a source field; both are register numbers of general purpose registers in the COG, from 0 up to 511 (though the last 16 are special). Every instruction also has the option of taking a literal 0..511 as source rather than a register number.
    That's all!

    Well.. nearly.

    Some of those registers are occupied by the assembly code itself, which gives us the opportunity to modify that code by moving (MOVS, MOVD, MOVI) or adding into the register occupied by the instruction. This changes the number of the destination or source register, or the literal. BTW, the literal is always the source, even when you would rather call it the destination, as in the case of jumps or calls (=JMPRET).

    By this very systematic approach a lot of interesting coding is possible - and necessary, as you have often to simulate a non-existent "indexed" addressing mode.

    Don't get confused by the colons; they just mean that the name has local scope and can be re-used within another section.

    ad Q1: The destination is incremented by one, as the destination field starts at bit 9 (counting from 0), directly above the 9-bit source field.
    ad Q2: No, no, no! It decrements the register ":i", which is defined near the end.
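
    A minimal sketch of the simulated "indexed" addressing described above, assuming the usual Propeller 1 field layout (source in bits 8..0, destination in bits 17..9). The names main, entry, table, index and value are hypothetical, and the code has not been tested on hardware:

    ```spin
    PUB main
      cognew(@entry, 0)                         ' start the sketch in a new cog

    DAT
                  org     0
    entry         movs    :fetch, #table        ' patch the source field with the cog address of table
                  add     :fetch, index         ' adding to the instruction long moves the source field on by index
                  nop                           ' one instruction must separate the patch from the patched code
    :fetch        mov     value, 0-0            ' 0-0 marks a field that is filled in at run time
    :done         jmp     #:done                ' park the cog

    index         long    2                     ' hypothetical index into table
    table         long    10, 20, 30, 40        ' hypothetical data
    value         res     1                     ' receives table[index]
    ```

    The nop is only there to satisfy the rule that at least one instruction must sit between the last modification of :fetch and its execution.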
  • Paul BakerPaul Baker Posts: 6,351
    edited 2007-06-16 20:51
    No problem,
    Q1:
    Yes, this code is self-modifying. The destination field occupies bits 9-17, and the least significant bit of this field is $200. So adding $200 to the mov instruction increments the destination address by 1, and since cog memory is organized in longs, the mov instruction points to the next long after the add. Something that isn't immediately apparent: this add occurs after the mov instruction, so the djnz becomes the necessary instruction between modifying an instruction and executing it.

    Q2:
    In a cog nothing below the special purpose registers is truly a constant; :i is a "pre-initialized" value. Because this array has to be iterated over twice (first to store the data, then to write it to hub memory), the variable :i has to be reinitialized, and this happens before entering the second loop in "mov :i, #CogBufSz1"; doing the pre-initialization saves an instruction for the first loop. This brings up another point about the mov instruction in the storeloop: there is a pre-initialization that is done by setting the destination initially to fbufstart; if this loop needed to be executed again, you would have to execute a "movd :storeloop, #fbufstart" to reinitialize it.
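
    A minimal sketch of such a store loop with explicit reinitialization, so that it can be run more than once. It is modeled on the three lines quoted in the question, not copied from the dscope object; the names main, sample, count, buffer and BUF_LONGS are hypothetical, and the code has not been tested on hardware:

    ```spin
    CON
      BUF_LONGS = 32                            ' hypothetical buffer size in longs

    PUB main
      cognew(@sample, 0)                        ' start the sketch in a new cog

    DAT
                  org     0
    sample        movd    :store, #buffer       ' (re)initialize the destination field before each run
                  mov     count, #BUF_LONGS     ' number of samples; also separates the movd from :store
    :store        mov     0-0, ina              ' store the pin states; destination is patched at run time
                  add     :store, d_inc         ' bump the destination field (bits 17..9) by one register
                  djnz    count, #:store        ' the djnz sits between the add and the next mov
    :done         jmp     #:done                ' park the cog

    d_inc         long    $200                  ' 1 << 9: least significant bit of the destination field
    count         res     1
    buffer        res     BUF_LONGS             ' cog buffer filled by the loop
    ```

    Compared with the pre-initialized version, this spends two extra instructions before the loop in exchange for being restartable.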

  • KaioKaio Posts: 253
    edited 2007-06-16 21:42
    Jan,

    as you have found, the fastest rate at which you can write to main memory is once per 16 system clocks. This is also described in the Propeller manual on page 24 with a nice clock diagram.

    Now you must optimize your code so that the Cog can write to main memory within this short time in a loop. Since a hub instruction (e.g. wrlong) takes a minimum of 7 system clocks, 9 clocks are left. So you have time for only 2 instructions, and one of them would be the djnz instruction. That leaves one instruction to increment a pointer, and you could use the loop counter as the pointer into Cog memory as well. But you would have to fill the buffer in the same manner, because it will be transferred in reverse order.
                  mov       Idx,nA                  'the buffer in Cog must be located from 1...nA
    [s]:loop         wrlong    Idx,IdxM                'the buffer will be transfered in reverse order[/s]
                  add       IdxM,#4                 'incr Hub address 
                  djnz      Idx,#:loop              'decr Cog address               
            
    
    nA            res       1                       'used size of the array
    Idx           res       1
    IdxM          res       1
    
    



    When you are using POD you have two possibilities to control it.
    1. You can use the default and recommended configuration, which uses the PropTerminal on your PC. It is delivered with POD, but you can also use the newer version if you have problems uploading your program using PropTerminal.
    http://forums.parallax.com/showthread.php?p=649540

    2. You can use a TV and a keyboard connected at your Prop board.

    It is possible to mix these configurations, but it is not useful. The PropTerminal provides a much longer screen, so you can see more information about the Propeller internals.

    Thomas

    Post Edited (Kaio) : 6/17/2007 12:18:23 PM GMT
  • deSilvadeSilva Posts: 2,967
    edited 2007-06-16 22:12
    This code is absolutely cool :-)
  • rjo_rjo_ Posts: 1,825
    edited 2007-06-16 23:42
    Jan,

    I've been in and out so much I missed this thread. I had a very similar question.

    In response to my question, Ariba posted a very simple but complete example. His code is a great way to approach Paul's example, which is incrementally more useful and slightly more complex.... really tight code :)

    My problem was that I didn't have an application. It was an abstract issue for me ... so I understood the answers... and then promptly forgot them. Then when I wanted to actually implement some code... I had to go back and work through it all again.

    http://forums.parallax.com/forums/default.aspx?f=25&m=190445&g=190993#m190993
    So, for anyone similarly confused... my suggestion is go to Ariba's example... refresh yourself and then go on to Paul's code.

    Rich
  • KaioKaio Posts: 253
    edited 2007-06-16 23:50
    deSilva and Jan,

    sorry, I think this does not work as I described above. It would write the value of Idx to main memory, not the value addressed by Idx.

    I have not yet found any code that transfers a buffer from Cog memory to main memory at 16 system clocks per long.
    I don't know if it is possible.

    Thomas
  • AribaAriba Posts: 2,685
    edited 2007-06-17 01:48
    Hello Kaio

    It should be possible with this code:

                  movd      :loop,#AddrCog          'begin of buffer in Cog
                  movs      :loop,#AddrHub          'begin of buffer in HubRAM
                  mov       Count,#Size             'Size of buffers
    :loop         wrlong    0-0,0-0                 'the buffer will be transferred
                  add       :loop,#$204             'incr Hub address (by 4) and Cog address (by 1) 
                  djnz      Count,#:loop            'loop until done               
    
    



    Cheers
    Andy
  • Paul BakerPaul Baker Posts: 6,351
    edited 2007-06-17 03:11
    Hi guys,
    Before I was hired by Parallax, while I was working the kinks out of my dscope, Phil Pilgrim and I tussled with this very issue: how to write an array to hub memory in every hub time slice. The answer is, it can't be done. The heart of the issue is that immediate values can only hold 0-511, and with the wrlong instruction this is the location in hub memory. This means only the first 512 locations in hub memory are addressable via Ariba's code. Since static mapping of hub memory is not supported by the compiler, this isn't possible in the general sense and is really a hack to attempt at all. And don't forget the first 16 locations have a very special meaning. For this reason, writing an array to hub memory in a single hub slice isn't possible. My :writeloop loop is the most compact "works in every situation" code possible with the architecture.

    BTW Ariba, your code violates another rule: the immediate value in the add instruction exceeds the maximum value possible ($1FF).


    Post Edited (Paul Baker (Parallax)) : 6/17/2007 3:20:23 AM GMT
  • deSilvadeSilva Posts: 2,967
    edited 2007-06-17 07:00
    Ariba's code is fine! Paul's issue (add $204) can easily be fixed by using a preset register with that value (which takes no extra time)! This is one of my own mental blocks from former machine language experience: I can hardly imagine that it makes no difference :-)


    BTW: Is there an "Instruction timing" diagram?

    I imagine the timing would be

    instr fetch (1) - get registers (2) - operation (3) - store back (4)

    On the other hand there is most likely an instruction prefetch happening during (3) which is the only phase without COG memory access. But we know that an instruction takes four, not 3 ticks.

    And most likely "get registers" would take 2 ticks, when not in immediate mode.

    So, second try:

    instr fetch (1) - idle (decode?) (2) - get dest reg (3) - get source reg (4) - operate (5) = next instr fetch - store back (6)
  • Paul BakerPaul Baker Posts: 6,351
    edited 2007-06-17 08:25
    deSilva,
    yes my BTW is easily fixable, but the issue discussed in the paragraph still stands, the instruction is:

    wrlong cog_address, hub_address

    The cog address must be a direct value, i.e. the value contained in register cog_address is written to hub memory.

    The hub address can be either a direct value or an immediate value. If it's direct, the value in register hub_address is the address in hub memory. This works just fine; however, to increment the address in hub memory, the value contained in register hub_address must be incremented, not the instruction itself, so an extra instruction is required. You cannot add $04 to the instruction. Why? Say your hub address is stored in register $06 and you add $04 to the instruction: the next time wrlong is executed, the contents of register $0A are used as the hub address, so each wrlong changes which register supplies the hub address. This clearly won't work.

    The other option for the hub address is immediate (# prefix). At first glance this would work, but there is a major catch: the only possible values to index into hub memory are $000 - $1FF, and $000 - $00F are off limits. Since hub memory is byte addressable, that leaves 124 longs addressable in hub memory. To further complicate things, how are you going to reserve that specific range of locations in hub memory? The compiler has no facility to do this. You could make sure they are the first longs reserved in the top object, but this is poor programming practice: your object must always be the top object, and this kind of strict restriction makes your code unusable as a distributable object. Also, because this behavior falls in the undocumented category, it could change in a future revision of the compiler, and all of a sudden the code no longer works.
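
    A minimal sketch of the general-purpose form described above: the cog side is stepped by patching the destination field, while the hub side is stepped by incrementing a pointer register with a separate instruction. It assumes the hub buffer address is passed through PAR; the names main, hubBuffer, export, hubPtr, count, buffer and BUF_LONGS are hypothetical, and the code has not been tested on hardware:

    ```spin
    CON
      BUF_LONGS = 100                           ' hypothetical buffer size in longs

    VAR
      long  hubBuffer[BUF_LONGS]                ' hub-side copy of the cog buffer

    PUB main
      cognew(@export, @hubBuffer)               ' the hub address arrives in PAR

    DAT
                  org     0
    export        movd    :write, #buffer       ' point the destination field at the first cog long
                  mov     hubPtr, par           ' assumption: PAR holds the hub buffer address
                  mov     count, #BUF_LONGS
    :write        wrlong  0-0, hubPtr           ' write one cog long to the hub address held in hubPtr
                  add     :write, d_inc         ' next cog register: bump the destination field
                  add     hubPtr, #4            ' next hub long: bump the pointer register, not the instruction
                  djnz    count, #:write
    :done         jmp     #:done                ' park the cog

    d_inc         long    $200                  ' 1 << 9
    hubPtr        res     1
    count         res     1
    buffer        res     BUF_LONGS             ' cog buffer to export
    ```

    With three ordinary instructions around the wrlong, a loop of this shape would be expected to miss every other hub window, given the figures quoted in this thread (7 clocks minimum for the hub instruction, 4 per ordinary instruction, one window every 16 clocks).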


    Post Edited (Paul Baker (Parallax)) : 6/17/2007 8:54:18 AM GMT
  • Paul BakerPaul Baker Posts: 6,351
    edited 2007-06-17 08:37
    The pipeline is SDIR: Source, Destination, Instruction, Result. The Instruction stage is for the following instruction (so the current instruction was fetched before the previous instruction's result was written); this is why an intervening instruction is required for self-modifying code. So each instruction actually takes 6 cycles, but because it is interleaved with the adjacent instructions, the throughput is 1 instruction every 4 cycles.

  • deSilvadeSilva Posts: 2,967
    edited 2007-06-17 09:21
    Pipeline: So I was not far from the truth :-)

    Loop: So sorry, I definitely missed the catch! But Paul has made it absolutely clear now.
    However, for a lot of applications it is no restriction to use the first fixed 500 bytes as a communication area.

    The main obstacle, as I understand it, is that the SPIN interpreter starts interpreting at byte 16!?
    So maybe a pure assembly program, consisting of just a single cognew(@asm,0) instruction, will leave space.

    Could this work?
    PUB Ex1
        cognew(@asm, 0)
    
    DAT
               LONG 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, ...
    asm     ORG  0
    .....
    
    


    The memory dump looks good :-)

    I am well aware that it makes no sense to put data into the HUB when nobody is there to read it; but one can load the other COGS as well (making it 7 COGNEWs at the beginning).

    BTW: How do you get rid of the SPIN interpreter in COG #0 ???

    Post Edited (deSilva) : 6/17/2007 9:28:33 AM GMT
  • KaioKaio Posts: 253
    edited 2007-06-17 12:52
    deSilva,

    if you use only assembly code, except for starting the Cogs, then once all the Spin code has been executed there is no more Spin code to process. So the Cog containing the Spin interpreter would die.

    Thomas
  • AribaAriba Posts: 2,685
    edited 2007-06-17 23:06
    Hello Paul Baker

    Yes the code from my previous posting does not work, so I thought a little bit longer about it, and found this solution:
                  movd      :loop,#CogBuffer        'begin of buffer in Cog
                  movs      :loop,#HubAddrTab       'begin of address table
                  mov       Count,#Size             'Size of buffers
    :loop         wrlong    0-0,0-0                 'the buffer will be transferred
                  add       :loop,IncD_S            'incr Table Idx (by 1) and Cog address (by 1) 
                  djnz      Count,#:loop            'loop until done
                  ...               
    
    IncD_S        long  $201
    
    HubAddrTab    long  HubBuffer+0                 'Table with addresses of HubBuffer Indexes
                  long  HubBuffer+4                 'must be initialized in Spin before Cog Start
                  long  HubBuffer+8
                  long  HubBuffer+12
                  long  HubBuffer+16
                  ...                               'max 240
    
    CogBuffer     res   240                         'max. 240: the buffer in CogRAM that will be transferred
    
    


    It is not compact at all, but runs at 16 clocks per loop. Half of the CogRAM is the buffer and the other half is the address table, so at most about 240 longs are possible, but they can go to any destination address in HubRAM!
    And because the Addresses are in a table, every order in the destination buffer is possible (for example: Bit reversed for FFT).

    Andy
  • Paul BakerPaul Baker Posts: 6,351
    edited 2007-06-18 04:36
    If you don't mind consuming 2 longs for each long of information, that will work.

  • janbjanb Posts: 74
    edited 2007-06-18 16:25
    Hi,
    the latest solution from Ariba seems to cost a bit more than 16 clocks per long transferred.
    Filling in HubAddrTab adds a price tag of 4 ticks per long, so the total is 20 ticks per long.
    But it is much better than 32. And one could argue that HubAddrTab needs to be filled in only once, while the long transfer runs multiple times.
    One could also change HubAddrTab so that it writes itself; this way the largest COG array must be below (512-const)/2 longs.
    It was a long series of advice; now I'm as smart as you :) , almost.
    For now, I'll lower the sampling fraction of the source and go with the 4-line ASM code transferring from COG to HUB at 32 ticks per long.

    I hope you do not mind a small suggestion. For a student like me, it would help if any example of code with a new idea
    would be accompanied by a clear statement:
    - it has been tried and does work or
    - it may work but was not tested.
    But I have learned a lot already.

    Can I now ask a completely new question about the best strategy for allocating local variables (with ':')?
    Q1:
    Assume I have 3 subroutines in ASM, called in sequence by the master
    clearArray, fillArray, exportArray
    each needs some sort of internal running index: idx. Shall I declare:
    a) ':idx' in every subroutine, 3 times or
    b) one
    idx long 0
    or
    c) one
    idx res 1

    I think a) wastes 2 registers. I see no difference between b) and c).
    Any suggestions?

    Q2:
    Assume the 4th subroutine getValue , called by fillArray needs a working variable x1.
    Assume fillArray itself needs some other working variable.
    Does it make sense to declare 2 local variables of the same name ':x1' in both subroutines
    to save one register?
    Or does it only add confusion to the code structure?

    thanks
    Jan
  • Mike GreenMike Green Posts: 23,101
    edited 2007-06-18 17:15
    1) You will probably do better with ":idx" in each subroutine. Since "idx" is allocated on the stack, the space is available for other uses when the subroutines are not executing. Since the subroutines are executed sequentially, they will also reuse the space. Also, the first few words of the stack (and of the VAR area) are accessed with special instructions which are faster and smaller than other access. (c) will not work with Spin. It is intended only for use in assembly language where it allocates space in cog memory, but not in main (hub) memory.

    2) Most times when you need working variables in a subroutine, it is better to declare them as local variables. It is common to use the same names over and over again for local working variables. I tend to use "i", "j", "k", "m", "n" and "a", "b", "c", "d", "e" in this way.
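
    For the assembly-language side of the question, a minimal sketch of options (b) and (c): one shared cog register used in turn by routines that run one after another, with "long" for a pre-initialized value and "res" for a register that only needs reserving. The routine bodies are placeholders, the names are hypothetical, and the code has not been tested on hardware:

    ```spin
    PUB main
      cognew(@entry, 0)                         ' start the sketch in a new cog

    DAT
                  org     0
    entry         call    #clearArray
                  call    #fillArray
    :done         jmp     #:done                ' park the cog

    clearArray    mov     idx, #8               ' idx is free again once this routine returns
    :loop         nop                           ' placeholder for the real work
                  djnz    idx, #:loop
    clearArray_ret ret

    fillArray     mov     idx, #3               ' reuses the same register
    :loop         nop                           ' the local name :loop can be reused after a new global label
                  djnz    idx, #:loop
    fillArray_ret ret

    baseVal       long    1000                  ' "long" stores an initial value in the program image
    idx           res     1                     ' "res" only reserves a cog register; keep res lines last
    ```

    Since every routine sets idx before using it, the missing initial value of the res register does not matter here.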