Quick Cog-to-Hub transfer — Parallax Forums

Quick Cog-to-Hub transfer

lonesock Posts: 917
edited 2009-12-04 17:02 in Propeller 1
Hi, everybody.

I saw a thread on the use of a counter to auto update a pointer. With 16 clocks between consecutive wrlongs (for example), the PHSx register would be jumping by 16 each time. So, using a stride of 4 longs, I can upload an entire buffer using 4 passes (so the buffer must be a multiple of 16 bytes long, and long aligned).

Here is some proof-of-concept code that actually works for me. It's hard coded to move a 256-byte buffer from the Cog RAM to Hub RAM.

        ' start Counter B as my address incrementer (Logic ALWAYS)
        mov ctrb,#%11111
        shl ctrb,#26
        mov frqb,#1

        ' blah blah blah

'' adr_Hub must be set externally before each call (the routine advances it by 16)
cog_to_hub
        ' setup all 4 passes
        mov val,#4
        movd :write_long,#buffer
:one_of_four
        ' sync to the Hub RAM access
        rdlong tmp,tmp
        ' how many longs to move on this pass? (256 bytes / 4) longs / 4 passes
        mov tmp,#(256 / 4 / 4)
        ' get my starting address right (phsb is incremented 1 per clock, so 16 each Hub access)
        mov phsb,adr_Hub
        ' write the longs, stride 4...low 2 bits of phsb are ignored
:write_long
        wrlong 0-0,phsb
        add :write_long,incDest4
        djnz tmp,#:write_long
        ' go back to where I started, but advanced 1 long
        sub :write_long,decDestNminus1
        ' offset my Hub pointer by one long per pass
        add adr_Hub,#4
        ' do all 4 passes
        djnz val,#:one_of_four
cog_to_hub_ret
        ret
   

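{===== PASM initialized variables and parameters =====}
incDest                 long 1 << 9
incDest4                long 4 << 9
decDestNminus1          long (256 / 4 - 1) << 9
adr_Hub                 long 0          ' Hub destination address (declaration not shown in the original post)

{===== my buffer, 256 bytes =====}
buffer    res 256/4

{===== scratch registers (declarations not shown in the original post) =====}
tmp       res 1
val       res 1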
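
As a sanity check on the addressing, the four-pass pattern can be modelled off-chip. The Python sketch below is not PASM, only a model. It assumes the loop stays locked to one wrlong per 16-clock hub window, and that any fixed offset of a few clocks between `mov phsb,adr_Hub` and the first `wrlong` is under 4, so it vanishes with the ignored low two address bits. The function names are made up for illustration.

```python
# Plain-Python model of the four-pass, stride-4 write pattern (not PASM).
CLOCKS_PER_HUB_WINDOW = 16    # one wrlong per hub window while the loop is in sync
BUFFER_BYTES = 256
LONGS = BUFFER_BYTES // 4
PASSES = 4


def hub_addresses(adr_hub):
    """Hub byte address hit by each wrlong, in execution order."""
    hits = []
    for p in range(PASSES):
        phsb = adr_hub + 4 * p                  # mov phsb,adr_Hub / add adr_Hub,#4
        for _ in range(LONGS // PASSES):
            hits.append(phsb & ~3)              # wrlong ignores the low 2 address bits
            phsb += CLOCKS_PER_HUB_WINDOW       # frqb = 1 adds 1 per clock
    return hits


def cog_indices():
    """Cog buffer index written by each wrlong, in execution order."""
    order = []
    for p in range(PASSES):
        order.extend(range(p, LONGS, PASSES))   # add :write_long,incDest4
    return order
```

In this model every cog long `i` lands at hub address `adr_hub + 4*i`, and the four passes together touch each of the 64 longs exactly once.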




Anybody see a way to speed it up?

Jonathan

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
lonesock
Piranha are people too.

Comments

  • VIRAND Posts: 656
    edited 2009-12-02 00:22
    Clever and cool trick!
    I definitely need to be processing 256 bytes practically simultaneously with multiple cogs as fast as possible now.
    I won't know if what I am trying to do will work yet though.
  • kuroneko Posts: 3,623
    edited 2009-12-02 03:34
    Hi Jonathan,
    lonesock said...
    I saw a thread on the use of a counter to auto update a pointer. With 16 clocks between consecutive wrlongs (for example), the PHSx register would be jumping by 16 each time. So, using a stride of 4 longs, I can upload an entire buffer using 4 passes (so the buffer must be a multiple of 16 bytes long, and long aligned).

    ...

    Anybody see a way to speed it up?
    Must be the time of year: I was playing with the same idea, just for getting stuff into the cog (and before someone mentions overlay loaders, they don't do what I want). You can't do anything about the single wasted slot between passes (djnz timing) unless you are prepared to unroll the loop and transfer the buffer manually, which, depending on the size of the buffer, may not be an option. The second wasted slot can be removed if you are prepared to run each pass separately (and can afford the extra code). This also gives you the option to transfer a buffer of any size, as long as it's at least 4 longs. The code is based on your example.

    PUB null
    CON
      lcnt = 256 / 4  ' number of longs to transfer (min 4)
      
    ' lcnt | pass 0   1   2   3
    ' 4n+0        n   n   n   n
    ' 4n+1        n+1 n   n   n
    ' 4n+2        n+1 n+1 n   n
    ' 4n+3        n+1 n+1 n+1 n
    '
    ' example lcnt = 6 (4n+2)
    ' pass 0 transfers 2 longs      0       4
    ' pass 1 transfers 2 longs        1       5
    ' pass 2 transfers 1 long           2
    ' pass 3 transfers 1 long             3
    
    DAT             org     0
    
                    movi    ctrb, #%0_11111_000
                    mov     frqb, #1
    
    adr_Hub         long    0
    
    '' adr_Hub is already set externally
    
    cog_to_hub      movd    :write_long0, #buffer + 0
                    movd    :write_long1, #buffer + 1
                    movd    :write_long2, #buffer + 2
                    movd    :write_long3, #buffer + 3
    
                    ' sync to hub window
                    rdlong  tmp, tmp                '  +0 =
    
                    mov     tmp, #((lcnt + 3) / 4)  '  +8   see constant section
                    mov     phsb, adr_Hub           ' +12
    
    :write_long0    wrlong  0-0, phsb               '  +0 =
                    add     :write_long0, incDest4  '  +8
                    djnz    tmp, #:write_long0      ' +12
    
                    add     adr_Hub, #4             ' +20
                    mov     tmp, #((lcnt + 2) / 4)  ' +24   see constant section
                    mov     phsb, adr_Hub           ' +28
    
    :write_long1    wrlong  0-0, phsb               '  +0 =
                    add     :write_long1, incDest4  '  +8
                    djnz    tmp, #:write_long1      ' +12
    
                    add     adr_Hub, #4             ' +20
                    mov     tmp, #((lcnt + 1) / 4)  ' +24   see constant section
                    mov     phsb, adr_Hub           ' +28
    
    :write_long2    wrlong  0-0, phsb               '  +0 =
                    add     :write_long2, incDest4  '  +8
                    djnz    tmp, #:write_long2      ' +12
    
                    add     adr_Hub, #4             ' +20
                    mov     tmp, #((lcnt + 0) / 4)  ' +24   see constant section
                    mov     phsb, adr_Hub           ' +28
    
    :write_long3    wrlong  0-0, phsb               '  +0 =
                    add     :write_long3, incDest4  '  +8
                    djnz    tmp, #:write_long3      ' +12
    
    cog_to_hub_ret  ret
       
    {===== PASM initialized variables and parameters =====}
    incDest4        long    4 << 9
    tmp             long    0
    
    {===== my buffer, 256 bytes =====}
    buffer          res     lcnt
    
                    fit
                    
    DAT
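
The per-pass counts in the CON table above can be checked with a few lines of Python. This is only a sketch of the arithmetic, with a hypothetical helper name. It assumes pass `p` (0..3) is meant to move `(lcnt + 3 - p) / 4` longs using integer division, which is what the table describes. Division binds tighter than addition, so the parentheses around the sum matter.

```python
def pass_counts(lcnt):
    """Number of longs moved in passes 0..3 for a buffer of lcnt longs."""
    # Pass p handles cog longs p, p+4, p+8, ... below lcnt.
    return [(lcnt + 3 - p) // 4 for p in range(4)]


if __name__ == "__main__":
    for lcnt in (4, 5, 6, 7, 64):
        print(lcnt, pass_counts(lcnt))
```

For `lcnt = 6` this reproduces the 2, 2, 1, 1 split from the example in the listing, and the four counts always add up to `lcnt`.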
    
  • Cluso99 Posts: 18,069
    edited 2009-12-02 07:25
    I do not understand why you are not interested in the Overlay concept. It catches ALL hub cycles, so none are wasted. Here is an extract...
                  movd      overlay_copy2,overlay_par       'move cog END address into rdlong instruction
                  sub       overlay_par,#1                  'decrement cog End address by 1
                  movd      overlay_copy1,overlay_par       'move cog END-1 address into rdlong instruction
                  shr       overlay_par,#16                 'extract the overlay## hub END address (remove cog address)
    overlay_copy2 rdlong    0-0,overlay_par                 'copy long from hub to cog   (hptr ignores last 2 bits!)
                  sub       overlay_par,#7                  'decrement hub ptr by 1 long (prev by 1, now by 7)
                  sub       overlay_copy2,_0x400            'decrement cog (destination) address by 2
    overlay_copy1 rdlong    0-0,overlay_par                 'copy long from hub to cog
                  sub       overlay_copy1,_0x400            'decrement cog (destination) address by 2 
                  djnz      overlay_par,#overlay_copy2      'decrement hub ptr by 1 long (now by 1, next by 7)
    
    

    It uses the fact that hub long accesses ignore the lower 2 bits, so firstly the hub pointer is decremented by 7, then by 1, followed by 7, 1 etc.

    We load in reverse and of course we use the fact that the djnz is replaced.

    So it only remains to see if a combination of this could work for writing to hub. I will think about this.
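
The alternating decrement by 7 and by 1 is easy to check numerically. The Python sketch below is a model, not the loader itself. It assumes the starting hub pointer has both low bits set (address 4k+3), which is the case that makes the arithmetic work out; the function name is hypothetical.

```python
def overlay_read_longs(end_ptr, count):
    """Hub long indices read while the pointer steps by -7, -1, -7, -1, ..."""
    ptr = end_ptr
    longs = []
    for i in range(count):
        longs.append(ptr >> 2)           # rdlong ignores the low 2 address bits
        ptr -= 7 if i % 2 == 0 else 1    # sub #7 after copy2, djnz's -1 after copy1
    return longs
```

Starting from 4k+3 the reads walk down one long at a time (k, k-1, k-2, ...). Starting from an aligned address 4k, the first `-7` would skip a long in this model.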

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Links to other interesting threads:

    · Home of the MultiBladeProps: TriBlade, RamBlade, SixBlade, website
    · Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator)
    · Prop Tools under Development or Completed (Index)
    · Emulators: CPUs Z80 etc; Micros Altair etc; Terminals VT100 etc; (Index) ZiCog (Z80), MoCog (6809)
    · Search the Propeller forums (uses advanced Google search)
    My cruising website is: www.bluemagic.biz  MultiBladeProp is: www.bluemagic.biz/cluso.htm

    Post Edited (Cluso99) : 12/2/2009 7:31:03 AM GMT
  • kuroneko Posts: 3,623
    edited 2009-12-02 07:41
    Cluso99 said...
    We load in reverse and of course we use the fact that the djnz is replaced.
    That only works when the overlay(!) is appended to the loader stub (or a block of data with an instruction up front). I want to load data at an address which isn't fixed.

    Don't get me wrong, there are use cases where the append approach does what I want and I will use it :)
  • Cluso99 Posts: 18,069
    edited 2009-12-02 08:32
    kuroneko: Yes I have looked and I cannot make it (overlay method) work efficiently for writing.

    lonesock & kuroneko: I love the use of the counters. I haven't ever used them before. I am having a look again as there may be ways I can use them - thanks :)

  • BR Posts: 92
    edited 2009-12-02 22:14
    Jonathan:

    I took the liberty of rearranging your code so I could understand it better (the inline comments confuse/distract me)...hope you don't mind.

    I had to make some tweaks to the code to get it to work in this little test program. See the "FIXME" notes in the ASM comments...in particular, I had an indexing problem wherein I had to add 4 to the hub address before entering the loop in order to get the output right. Strange that it works for you without this...

    I also modded your code so that I could specify any arbitrary size block transfer (as long as it is divisible by 4). This cost 4 extra longs, but seemed like a worthwhile addition.

    Anyway, this little test routine yields the following results:
    #longs xfred        #ticks       time/ideal
    16                      438            1.71
    32                      694            1.36
    64                      1206           1.18
    128                     2230           1.09
    256                     4278           1.04
    



    For large block transfers, the overhead only amounts to about a 5-10% penalty relative to the ideal. Hard to see how you'll do much better than what you've got. I haven't looked at the spin interpreter's longmove routine to see if there's any clever tricks to be had there. Might be.
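
The table can be reproduced from the tick counts alone. The short Python sketch below takes "ideal" to mean 16 clocks per long (one long per hub window), which is an assumption but one that matches the ratios posted. The numbers are copied from the table and the helper names are made up.

```python
# Tick counts copied from the table above: longs transferred -> measured ticks.
MEASURED = {16: 438, 32: 694, 64: 1206, 128: 2230, 256: 4278}
IDEAL_CLOCKS_PER_LONG = 16   # assumption: one long per 16-clock hub window


def time_over_ideal(longs, ticks):
    """Measured time relative to one long per hub window."""
    return ticks / (IDEAL_CLOCKS_PER_LONG * longs)


def fixed_overhead(longs, ticks):
    """Ticks left over after charging 16 clocks per long."""
    return ticks - IDEAL_CLOCKS_PER_LONG * longs


if __name__ == "__main__":
    for n, t in MEASURED.items():
        print(n, t, round(time_over_ideal(n, t), 2), fixed_overhead(n, t))
```

Every row leaves the same 182 ticks of overhead, so the measurements fit `16 * longs + 182`. The per-long cost is already at one hub window, and the penalty shrinks only because the fixed part is spread over more longs.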

    I'm guessing this is for your speech recognition challenge project. There must be scads of PASM routines people have written to do this; I wish there were some kind of "ASM code snippet repository" for this kind of stuff so we could check out what others have done before.

    Post Edited (BR) : 12/2/2009 10:24:47 PM GMT
  • kuroneko Posts: 3,623
    edited 2009-12-03 00:53
    BR said...
    I had to make some tweaks to the code to get it to work in this little test program. See the "FIXME" notes in the ASM comments...in particular, I had an indexing problem wherein I had to add 4 to the hub address before entering the loop in order to get the output right. Strange that it works for you without this ...
    The code sequence

    mov     phsa, addr              '  -4
    wrlong  $, phsa                 '  +0 =
    


    works exactly as advertised provided the code is sync'ed with the hub window and addr is 4n. The actual problem here is your test code, you dump the array starting with index 1 (when it should be 0, i.e. 0..127).

    HTH
  • BR Posts: 92
    edited 2009-12-03 03:19
    @Kuroneko: Aha! Right you are! OK, loop bug fixed, removed superfluous add in asm routine.

    I noticed that there's only 1 bit different between the wrlong and rdlong instructions. So just out of curiosity, I made a version that will do block reads and writes. Takes 2 extra longs to implement. Seems to work fine, see attached demo. I thought about adding the read functionality using a mask...only 1 long needed, but it just didn't seem as intuitive to use.

    Jonathan, I guess I'm not helping much relative to your original question. But this is a handy little algorithm you've come up with here.
  • Cluso99 Posts: 18,069
    edited 2009-12-03 04:24
    BR: Your read code falls into the write code, so the write is put back.

    BTW, You can also do
    movi wr_mode, #%000010_000 'set to writelong
    and
    movi wr_mode, #%000010_001 'set to readlong

    The easiest would be to have the automatic fix back to read at the end of the routine, and set to a write 1 instruction before the read call to setup the write.

  • lonesock Posts: 917
    edited 2009-12-03 18:01
    Thanks, everybody.

    Should I submit this to the PASM tips & tricks thread? Or the prop Wiki? (I'm not sure of the protocol, or who's in charge of those resources.)

    Jonathan

  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-12-03 20:07
    I've had this thread in my peripheral vision, but haven't had time to dig into it. 'Just a thought; wouldn't this work?

                  movs      hubwrite,hubaddr
                  movd      hubwrite,bufaddr
                  mov       counter,#bufsize
    
    hubwrite      wrlong    0-0,#0-0
                  add       hubwrite,incboth
                  djnz      counter,#hubwrite
    
    incboth       long      1 << 9 | 4
    
    
    


    -Phil
  • lonesock Posts: 917
    edited 2009-12-03 20:16
    Works great as long as your Hub address is < 512 ;)

  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-12-03 20:36
    Right, there is that little detail! :) It's easy enough to arrange, though, with a minimal top-level object that starts the "real" one:

    CON
    
       _clkmode       = xtal1 + pll16x
       _xinfreq       = 5_000_000
    
    OBJ
    
      top   : "RealTopLevelObject"
    
    PUB  Start
    
      top.start(@buffer)
    
    DAT
    
    buffer  byte 0[256]
    
    
    


    In this case the buffer starts at $001C, so its size will be limited to $200 - $1C bytes.
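
The size limit follows from the 9-bit source field that holds the immediate hub address. A small Python sketch of that arithmetic is below; the helper names are hypothetical, and `$001C` is the buffer start quoted above.

```python
SRC_FIELD_BITS = 9                       # immediate hub address lives in the 9-bit S field
IMMEDIATE_LIMIT = 1 << SRC_FIELD_BITS    # addresses 0..511 are reachable


def max_buffer_bytes(buffer_start):
    """Largest buffer that still ends at or below the immediate-address limit."""
    return IMMEDIATE_LIMIT - buffer_start


def fits(buffer_start, size_bytes):
    """True if every byte of the buffer is reachable with an immediate address."""
    return buffer_start + size_bytes <= IMMEDIATE_LIMIT


if __name__ == "__main__":
    print(max_buffer_bytes(0x1C))   # $200 - $1C
```

With the buffer at `$001C` that leaves 484 bytes (121 longs), so the 256-byte buffer from the first post fits with room to spare.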

    -Phil
  • lonesock Posts: 917
    edited 2009-12-03 21:53
    Good point!

    If you had Cog RAM to spare, you could have 2 equal sized buffers: data & addresses. Pre-load the addresses once on cog load. This assumes that the hub addresses are fixed.

                  movs      hubwrite,#address_table
                  movd      hubwrite,#cog_buffer
                  mov       counter,#bufsize
    
    hubwrite      wrlong    0-0,0-0
                  add       hubwrite,incboth
                  djnz      counter,#hubwrite
    
    incboth       long      1 << 9 | 1
    



    Jonathan

  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-12-03 22:30
    Heh. Pretty clever! :)

    -Phil
  • Cluso99 Posts: 18,069
    edited 2009-12-03 23:51
    Very Clever. Of course you need as many locations in cog for the hub pointers as there are for the cog buffer.

    BTW Isn't the incboth incorrect...
    incboth long  1<<9 | 4  'add 1 to cog address (long) and add 4 to hub address (bytes)
    (I think the immediate use of the hub is still addressed as bytes - manual on other laptop)

    Phil: IIRC the compiler re-arranges the DAT code so that it will not likely be in the lower 512 longs of hub. However, at least homespun or bst (or maybe both) has an option to reserve space after the mandatory $10? longs in hub.

    I used the immediate hub address for some code quite a while ago. It's a relatively unknown (or thought of) idea.

    As for the tiny hubwrite loop, a
      movi   hubwrite,#%000010_000  'set to writelong
    or
      movi   hubwrite,#%000010_001  'set to readlong
    before calling the routine will make it a read or write loop. Below is even better though.

    So, here is an improved extended solution (1 extra instruction used per block for read, 2 for write), but I saved 2 (movs & movd)
    'NOTE: Requires the hub block to be in first 512 bytes (warning... more complex than you think)
     
    readblock     mov       hubloop,hubread            'setup to readlong
    writeblock    mov       counter,#bufsize
    
    hubloop       wrlong    cog_buffer,#address_table  'starts as writelong; readblock patches in readlong, exit restores writelong
                  add       hubloop,incboth            'add 1 to cog (longs) and 4 to hub (bytes)
                  djnz      counter,#hubloop
    
                  mov       hubloop,hubwrite           'preset to writelong (for next time)
                  ret
      
    counter       long      0
    incboth       long      1 << 9 | 4                 'add 1 to cog (longs) and 4 to hub (bytes)
    hubwrite      wrlong    cog_buffer,#address_table  '\ used as initialisation instruction format
    hubread       rdlong    cog_buffer,#address_table  '/
    
    

    Now to work out an easy way to force the buffer into the lower 512 bytes of hub. For this we will have to ask the compiler masters, Brad & Michael.


  • kuroneko Posts: 3,623
    edited 2009-12-04 00:18
    Can of worms anyone? :)

    Another option (apart from the hub-below-0x200) would be to let one cog write its data to a fixed hub address. Two displacer cogs will take it from there and distribute it to wherever you want.

         |---------------|---------------|---------------|---------------|---------------|---------------|----
    cog0 W               W               W               W               W               W               W
    cog1   R               W               R               W               R               W               R
    cog2     N               R               W               R               W               R               W
    
    


    write loop (fragment)

    wrlong  0-0, fixed
    add     $-1, dst_plus_1
    djnz    length, #$-2
    


    displacer loop (fragment)

    rdlong  temp, fixed
    wrlong  temp, buffer
    add     buffer, #8
    djnz    half_of_length, #$-3
    

    Post Edited (kuroneko) : 12/4/2009 3:04:07 AM GMT
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-12-04 00:34
    Cluso99 said...
    BTW Isn't the incboth incorrect... incboth long 1<<9 | 4 'add 1 to cog address (long) and add 4 to hub address (bytes)
    No, he's correct, since he's indexing an address that points to a list of hub addresses also in the cog. They are those addresses in the list that will be four apart from each other.
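
Both `incboth` values discussed here make sense once the instruction is viewed as a 32-bit word with the destination field in bits 17..9 and the source field in bits 8..0. The Python sketch below models only that field arithmetic; the helper names and the sample field values are invented for illustration.

```python
MASK32 = 0xFFFFFFFF
FIELD_MASK = 0x1FF                      # D and S are both 9-bit fields

INC_TABLE = (1 << 9) | 1                # S names a cog register: next table entry is +1
INC_IMMEDIATE = (1 << 9) | 4            # S is an immediate hub byte address: next long is +4


def split_fields(instr):
    """Return (destination field, source field) of a 32-bit instruction word."""
    return (instr >> 9) & FIELD_MASK, instr & FIELD_MASK


def bump(instr, incboth):
    """Model of 'add hubwrite,incboth' applied to the instruction word."""
    return (instr + incboth) & MASK32


if __name__ == "__main__":
    word = (0b000010 << 26) | (0x100 << 9) | 0x080   # sample D and S values
    print(split_fields(bump(word, INC_TABLE)))
    print(split_fields(bump(word, INC_IMMEDIATE)))
```

In the address-table version the source field walks a list of cog registers, so it steps by 1. Only the immediate-address version needs the step of 4.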
    Cluso99 said...
    Phil: IIRC the compiler re-arranges the DAT code so that it will not likely be in the lower 512 longs of hub
    I know it puts VAR variables somewhere else, but I'm pretty sure the DAT block remains contiguous with the object's Spin code. I could be wrong, though.

    -Phil
  • BR Posts: 92
    edited 2009-12-04 03:19
    I've been trying to wrap my brain around this little Chinese puzzle. If only there was a way to make the counter increment at 1/4 clock frequency, the whole thing could be simplified and you could hit every hub window perfectly. The counters do have a built-in PLL with a div4 tap, but it is on the output, not the input. I've convinced myself that a counter can be made to increment at clkfreq/4 using two counters (one in NCO mode outputting a square wave at clkfreq/4, the other in posedge mode). The catch: you have to use a spare pin to pass the square wave output from one counter to the input of the other--yuck.

    I'm down to trying to see if I can pick a frqb value such that the counter wraps around and increments by 4 once every 16 clocks, i.e., set frqb := ($FFFF_FFFF+4)/4 and hope that the remainder won't skew the count. I've tried this but so far it doesn't work. Think I've been staring at it too long, time to stop and try again later.
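
A quick model suggests why no single frqb value can do this. The Python sketch below assumes the counter is in the always-accumulate mode used earlier in the thread, so PHSB is a 32-bit register that gains FRQB on every clock. The function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def phs_step_per_window(frq, clocks=16):
    """Net change of a 32-bit PHS register after `clocks` additions of frq."""
    return (frq * clocks) & MASK32


if __name__ == "__main__":
    print(phs_step_per_window(1))                      # the stride used in the thread
    print(hex((0xFFFFFFFF + 4) // 4))                  # the value tried, without wraparound
    print(phs_step_per_window((0xFFFFFFFF + 4) // 4))  # its net step per hub window
    print(((0xFFFFFFFF + 4) & MASK32) // 4)            # the same expression in 32-bit math
```

Under that assumption the step per hub window is `16 * frqb` modulo 2^32, which is always a multiple of 16, so it can never equal 4. The tried value comes out as `$4000_0000`, whose sixteenfold sum wraps to a net step of 0. If the expression is evaluated in 32-bit arithmetic, the sum wraps to 3 before the division and the result is 0.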

    I've attached my code for anyone interested in taking a crack at it. Be prepared for a helping of vexation.

    @Cluso99: I've been staring at your overlay code snippet and I'm just not getting it. Is there a thread somewhere that discusses this further?
  • kuroneko Posts: 3,623
    edited 2009-12-04 03:23
    BR said...
    Is there a thread somewhere that discusses this further?
    Try this thread.
  • Cluso99 Posts: 18,069
    edited 2009-12-04 04:32
    Phil: Yes, I see now. When I was looking, I was referring back to lower hub, not a cog address.

    I now have another power supply for my 2nd laptop (old one died) so I can run both - am now off to see how I can force the compiler to place the buffer in lower hub.

    BR: Overlay - Use heaters link. There are examples and descriptions in the code. Heater uses it in the ZiCog. BTW acknowledgements are in the code including PhiPi, hippy and others.

    Yes, I was looking at the counters and like you, I couldn't see a way without wasting a pin... and I never seem to have a spare :(

    kuroneko, lonesock and PhiPi seem to be the counter masters. (added PhiPi as I just saw his explanations on another thread - nice Phil)

    Post Edited (Cluso99) : 12/4/2009 5:38:00 AM GMT
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-12-04 06:05
    It's easier still to put the buffer at location 1 in the cog. Then you can use the loop counter as the cog pointer. The hub buffer can be anywhere:

    DAT
    
    cog           jmp       #cogstart
    cogbuf        long      0[bufsize-1]
    cogbufend     long      0
    
    cogstart      mov       hubbufend,#bufsize-1
                  shl       hubbufend,#2
                  add       hubbufend,par
    
                  '...
    
    transfer      mov       hubptr,hubbufend        'Start at par + 4 * bufsize - 4.
                  mov       cogptr,#cogbufend       'Start at cogbufend == bufsize.
    
    xferlp        wrlong    cogptr,hubptr           'Last wrlong is from cogbuf == 1, to par
                  sub       hubptr,#4
                  djnz      cogptr,#xferlp
    
    
    


    Of course, the downside is that you can't use res to allocate the cog buffer.

    See below.

    -Phil

    Post Edited (Phil Pilgrim (PhiPi)) : 12/4/2009 6:26:48 AM GMT
  • kuroneko Posts: 3,623
    edited 2009-12-04 06:18
    Phil, how is that wrlong cogptr, hubptr actually transferring data? The instruction itself doesn't change so you always store the content of e.g. register N, all you end up with is a hub buffer with decreasing values.
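
The objection can be shown with a toy model. The Python sketch below is not PASM. It assumes `wrlong cogptr,hubptr` stores the current value of the `cogptr` register itself, and it compares that with what an indirect read through `cogptr` would have stored. Both function names are made up.

```python
def loop_as_posted(cog_buffer):
    """Model of the posted loop: each wrlong stores the loop counter's own value."""
    n = len(cog_buffer)
    hub = [None] * n
    cogptr = n                    # mov cogptr,#cogbufend
    hub_index = n - 1             # hubptr starts at the last long
    while cogptr:
        hub[hub_index] = cogptr   # wrlong cogptr,hubptr writes the register's contents
        hub_index -= 1            # sub hubptr,#4
        cogptr -= 1               # djnz cogptr,#xferlp
    return hub


def loop_as_intended(cog_buffer):
    """What an indirect read through cogptr would have stored (buffer at cog address 1)."""
    n = len(cog_buffer)
    hub = [None] * n
    for cogptr in range(n, 0, -1):
        hub[cogptr - 1] = cog_buffer[cogptr - 1]
    return hub
```

For a buffer holding 10, 20, 30, 40 the posted loop leaves the counter values 4, 3, 2, 1 in the order they were written (1, 2, 3, 4 by hub address), and none of the buffer's contents reach the hub.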
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-12-04 06:24
    Oh Smile! You're right: 'one too many levels of indirection there. Nevermind! :(

    -Phil
  • Cluso99 Posts: 18,069
    edited 2009-12-04 06:58
    Well, I just tested the compilers. bst gives a better listing and I know Brad made it compatible. Here are the results:

    A list of pointers to the routines, etc come first
    DAT's are next
    PUB's and PRI's are next
    VAR's are placed at the end

    So, the buffer needs to be in the first DAT. So use a DAT just for the buffer. Use spin to find the hub address and "poke" it into the PASM routine before loading or pass it via a par parameter.

  • BradC Posts: 2,601
    edited 2009-12-04 07:13
    Phil Pilgrim (PhiPi) said...

    I know it puts VAR variables somewhere else, but I'm pretty sure the DAT block remains contiguous with the object's Spin code. I could be wrong, though.

    Nope, that's correct.
    Each object consists of
    - Method Table
    - DAT block (contiguous and not restructured or tampered with in any way)
    - Spin Methods

    Lather, Rinse, Repeat for all objects.. then

    - VAR block. Object by object with each objects variables sorted in LONG/WORD/BYTE order.

    So the DAT block in the top object is as close to the start of the hub as possible without losing the spin method table.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    If you always do what you always did, you always get what you always got.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-12-04 07:47
    So the DAT block comes ahead of the Spin methods. Hmm. That's interesting, because it could eliminate the need for a dummy top object, so long as the number of methods is small.

    -Phil
  • BradC Posts: 2,601
    edited 2009-12-04 07:56
    Phil Pilgrim (PhiPi) said...
    So the DAT block comes ahead of the Spin methods. Hmm. That's interesting, because it could eliminate the need for a dummy top object, so long as the number of methods is small.

    -Phil

    If you want to make it minimal, have the top object contain just a DAT section and one method to point to the next sub-object. So the method table will include one spin method and one sub-object. Then you can just behave as normal under that, and the DAT section in your top object can reserve the memory required for you.

  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2009-12-04 08:10
    Yeah, that's what I did in my example, but I was assuming the methods themselves subtracted from the available space for the buffer. But since they don't, and if the buffer size can be a ponderable number of bytes less than 512, there's a little more leeway for the top object.

    -Phil
  • BradC Posts: 2,601
    edited 2009-12-04 08:28
    Phil Pilgrim (PhiPi) said...
    Yeah, that's what I did in my example, but I was assuming the methods themselves subtracted from the available space for the buffer. But since they don't, and if the buffer size can be a ponderable number of bytes less than 512, there's a little more leeway for the top object.

    Yep, just take into account that each spin method and each object instance consumes 4 bytes from that.
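
The budget can be estimated with a few lines of Python. This sketch rests on two assumptions beyond the 4 bytes per method and per object instance stated above: that the top object begins at hub address `$10`, and that its table starts with a 4-byte header. Those two figures were chosen because together they reproduce the `$001C` buffer address quoted earlier in the thread for one method plus one sub-object. The function names are hypothetical.

```python
HUB_PREAMBLE_BYTES = 0x10     # assumption: top object starts at hub address $10
OBJECT_HEADER_BYTES = 4       # assumption: 4-byte header ahead of the table entries
ENTRY_BYTES = 4               # from the thread: 4 bytes per method and per object instance
IMMEDIATE_LIMIT = 0x200       # 9-bit immediate hub address


def first_dat_address(methods, sub_objects):
    """Estimated hub address of the top object's first DAT byte."""
    return HUB_PREAMBLE_BYTES + OBJECT_HEADER_BYTES + ENTRY_BYTES * (methods + sub_objects)


def low_buffer_budget(methods, sub_objects):
    """Bytes left below $200 for a buffer placed first in the DAT block."""
    return IMMEDIATE_LIMIT - first_dat_address(methods, sub_objects)


if __name__ == "__main__":
    print(hex(first_dat_address(1, 1)), low_buffer_budget(1, 1))
```

Under these assumptions each extra method or sub-object in the top object costs 4 bytes of the low-hub budget.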
