[Propeller Assembly] Using PAR — Parallax Forums

Vega256 Posts: 197
edited 2011-07-05 23:57 in Propeller 1
Hey,

If I pass in an address of a piece of data using PAR but I wanted to know the address of the next sequential piece of data in memory as well, would I just add 1 to PAR or is this completely incorrect?

Comments

  • potatohead Posts: 10,261
    edited 2011-07-04 19:20
    That is correct.

    Adding one points to the next BYTE in the HUB, adding two to the next WORD, and adding four to the next LONG.


    mov index, PAR
    add index, #4
    rdlong cog_var1, index
    add index, #1
    rdbyte cog_var2, index
    etc...

    I see Cluso beat me to it. Transfer PAR to a working cog storage location first, then operate on it. Didn't say it explicitly, and should have.
  • Cluso99 Posts: 18,069
    edited 2011-07-04 19:26
    You may need to add 4. WARNING: You cannot actually add to PAR because it cannot be written. You have to copy it into your cog and then add to this and use it in your rdxxxx instruction.

    This is because the hub is addressed as bytes. PAR can only hold a long-aligned address: only the upper 14 bits of the 16-bit address are passed when the cog starts, so the bottom 2 bits always read as zero. However, if you are picking up bytes you add 1, and for words you add 2.

    IIRC there is a good example in passing parameters in the VGA sample code in the obex.
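
    For reference, a minimal sketch of the copy-then-add pattern described above. The names params, entry, ptr, first and second are made up for illustration and are not taken from the VGA sample code.

    ```spin
    VAR
      long  params[2]                ' params[0] is at PAR, params[1] at PAR + 4

    PUB start
      params[0] := 1234              ' arbitrary test values
      params[1] := 5678
      cognew(@entry, @params)        ' @params is long-aligned, so PAR receives it intact

    DAT
                  org     0
    entry         mov     ptr, par            ' PAR is read-only, so work on a copy
                  rdlong  first, ptr          ' first  := params[0]
                  add     ptr, #4             ' a long is 4 bytes, so step by 4
                  rdlong  second, ptr         ' second := params[1]
    :halt         jmp     #:halt              ' park the cog

    ptr           res     1
    first         res     1
    second        res     1
    ```

    For words the step would be #2 with rdword, and for bytes #1 with rdbyte.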
  • Vega256 Posts: 197
    edited 2011-07-05 07:09
    So then to write to an array in SPIN via PASM, I would need the address of the first element, then I would need to increment the address by x offset and write the next piece of data, and repeat until the end of the array. Correct?
  • potatohead Posts: 10,261
    edited 2011-07-05 08:01
    Yes. Props have no index registers. And the hub access windows are generally 2, 6 and 10 instructions, if you are not branching and/or using the special wait instructions.
    :loop           wrlong          value, index
    
                    add             index, #4
                    djnz            count, #:loop
    
    Simple and fast case. Takes advantage of one of the specialized auto decrement instructions. Maybe initialize a region of RAM with this.
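
    To cover the array question end to end, here is a minimal sketch with the Spin side included. The names buffer, fillcog, value and count, the array length of 16 and the fill pattern are all assumptions for illustration. It steps the address by a fixed 4 bytes per long.

    ```spin
    VAR
      long  buffer[16]               ' the Spin array that PASM will write to

    PUB start
      cognew(@fillcog, @buffer)      ' pass the address of buffer[0] in PAR

    DAT
                  org     0
    fillcog       mov     index, par          ' working copy of the array address
                  mov     count, #16          ' number of longs to write
    :loop         wrlong  value, index        ' buffer[i] := value
                  add     index, #4           ' advance to the next long
                  djnz    count, #:loop       ' decrement count, loop until it hits zero
    :halt         jmp     #:halt              ' park the cog

    value         long    $DEADBEEF           ' arbitrary fill pattern
    index         res     1
    count         res     1
    ```

    With only two instructions between the writes, each pass should land on the next hub window.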
    :loop           wrlong          value, index
    
                    add             index, offset
                    cmp             index, boundary   wc, wz
    
                    nop
                    nop
                    
                    nop
          if_NZ     jmp             #:loop
    
    The second loop misses the hub access window because of the compare, so it has to wait for the next one. It runs just as fast with the nop instructions in there, because they only fill time the cog would have spent waiting. When writing routines that work with the HUB RAM, I find it best to write it sequentially, in the order you find helps get it done, then break the instructions into groups to show the window and make counting easy. In the more common case there is time to do stuff between accesses, but only so much before you slip into the next, bigger window...

    Then re-sequence them to fit the best case access window, 2, 6, or 10 instructions. Very significant speed gains are to be had that way. Here's one example from the Potatotext 2 driver I just completed.
    :loop                RDLONG     colpix, screen            'Fetch two chars
                           mov      A, colpix                 'Need working copy
                           and      colpix, vff00ff00         'mask for colors only, both characters
    
                           mov      B, A                      'working copy for second character
                           and      A, #$FF                   'Prepare to calculate first character pixel data address
                           
                           shl      A, #3                     'Multiply by 8
                           add      A, fontsum                'point to pixel data in font table
    
                           RDBYTE   pixels, A                 'fetch pixels from hub for first character
                           
                           and      B, v00ff00ff              'pixels only
                           shr      B, #13                    'Prepare to fetch pixels from HUB, shift contains multiply
    
                           nop
                           nop 
    
                           shl      pixels, #16               'Put pixels in position for combining with colors
                           add      B, fontsum                'point to pixel data second character
    
    
    
                           RDBYTE   pixels1, B                'Fetch pixel data from font table
    
                           rev      pixels, #8                'Reverse pixels for proper display
                           rev      pixels1, #8               'Do it for both characters
    
                           or       colpix, pixels            'Combine colors and pixels for TV COG
                           or       colpix, pixels1
    
                           'add      index, #4
                           add      screen, #4                'Point to next pair of characters
    
    
    
                           WRLONG   colpix, index             'Put complete pixel and color data in buffer for TV COG
    
                           add      index, #4                 'Point to next buffer address
                           'add      screen, #4
                                                 
                           djnz     count, #:loop             'Do all of the characters in the scan line, until buffer full
    
    I'll have to go pull the nop instructions out of that one. Forgot and left them in there.

    I'll typically drop them in as place holders to visualize the access cycle, followed by stuffing instructions in there, out of order where possible, to get the best access times. On that loop, it was very important to get that last cycle to be 2 instructions to hit the time requirement. It's currently 6+6+6+2 = 20 instructions. One more instruction in the last HUB OP there would be 6+6+6+6, a jump to 24 instructions.

    Another instruction in the 6 instruction area would bump it to 10. 6+10+6+6 = 28, with a worst case of 10+10+10+6 = 36 instructions. Well worth considering, if you need the access speed. I was at the 28 instructions the first time, putting the operations in simple, logical, sequential order. Choosing when to do things made a considerable difference. Roughly 25 percent faster.

    Calculate that, then write your access loop, then add up your total hub accesses, finally dropping in your between access operations out of order for best speed.

    This is also a great way to calculate quickly whether or not something is going to be possible. Add up the HUB accesses, then approximate instructions, and you've a rough metric on how much time it will take to move the data. A few exercises like this, and it's clear to see it pays off to combine data operations with the HUB where possible, keeping the number of reads and writes to the minimum.
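
    To turn those instruction counts into clocks, here is a rough worked calculation. It assumes Propeller 1 timing, which the thread does not spell out: 4 clocks per ordinary instruction, 8 clocks for a hub instruction that arrives exactly on its window, and a window every 16 clocks. With n_i ordinary instructions following the i-th hub instruction, one pass of the loop takes about:

    ```latex
    \[
    T_{\text{loop}} = \sum_{i} 16 \left\lceil \frac{8 + 4\,n_i}{16} \right\rceil \ \text{clocks}
    \]
    \[
    \begin{aligned}
    6+6+6+2    &: \; 32 + 32 + 32 + 16 = 112 \\
    6+10+6+6   &: \; 32 + 48 + 32 + 32 = 144 \\
    10+10+10+6 &: \; 48 + 48 + 48 + 32 = 176
    \end{aligned}
    \]
    ```

    At 80 MHz that works out to 1.4, 1.8 and 2.2 microseconds per pass. The same formula shows where 2, 6 and 10 come from: they are the largest n that keep 8 + 4n at or under 16, 32 and 48 clocks.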
  • kuroneko Posts: 3,623
    edited 2011-07-05 18:42
    @potatohead: Would your code benefit from one (or two*) fewer hub window(s) in that loop you've shown?

    * provided you drop the rev requirement
  • potatohead Posts: 10,261
    edited 2011-07-05 23:42
    It would benefit clocks lower than 80 MHz. TVs don't realistically do more than 640 pixels, and even that requires a nice TV, or S-video, or monochrome display. Figured if I could get 80 chars at 80 MHz, that's good. Losing one or two hub cycles would bring up the character density for lower clocks though. Inverting fonts can be done in the init stage of whatever the higher level task is, I just prefer not to, just because it's easier to put to use by others.

    Honestly, a future port to VGA would benefit, because the sweeps are higher. The lowest sweep I know of that will display reasonably is 640x200, as seen in the full color tile driver. Would have to be interlaced for this display, but... many newer LCD type VGA displays will just de-interlace that, or not render with a lot of flicker like a CRT will. Did a quick test on the one I have, and the interlace display rendering was superior to a CRT. So that's the plan later on.

    It's probably not worth it to lose the rev instructions for TV, as that driver renders at the max for TV @ 80. But, a VGA one likely won't at the current speed, meaning rev would have to go, in that case to get higher character densities.

    What did you have in mind? Something good probably. Do tell... Maybe Vega256 will enjoy some HUB access time chatter from the master! :)
  • kuroneko Posts: 3,623
    edited 2011-07-05 23:47
    6+2+6+2 (fontsum_paired = fontsum * $00010001):
    DAT
    :loop           RDLONG  colpix, screen            'Fetch two chars
                    mov     A, colpix                 'Need working copy
                    and     colpix, vff00ff00         'mask for colors only, both characters
    
                    add     screen, #4                'Point to next pair of characters
                    and     A, v00ff00ff              'Prepare to calculate character pixel data addresses
                    shl     A, #3                     'Multiply by 8
                    add     A, fontsum_paired         'point to pixel data in font table
    
                    RDBYTE  pixels, A                 'fetch pixels from hub for first character
                    shl     pixels, #16               'Put pixels in position for combining with colors
                    shr     A, #16                    'point to pixel data second character
    
                    RDBYTE  pixels1, A                'Fetch pixel data from font table
                    rev     pixels, #8                'Reverse pixels for proper display
                    rev     pixels1, #8               'Do it for both characters
    
                    or      colpix, pixels            'Combine colors and pixels for TV COG
                    or      colpix, pixels1
    
                    WRLONG  colpix, index             'Put complete pixel and color data in buffer for TV COG
                    add     index, #4                 'Point to next buffer address
                    djnz    count, #:loop             'Do all of the characters in the scan line, until buffer full
    
    6+2+2+2 (fontsum_paired = fontsum * $00010001, rev requirement lifted):
    DAT
    :loop           RDLONG  colpix, screen            'Fetch two chars
                    mov     A, colpix                 'Need working copy
                    and     colpix, vff00ff00         'mask for colors only, both characters
    
                    add     screen, #4                'Point to next pair of characters
                    and     A, v00ff00ff              'Prepare to calculate character pixel data addresses
                    shl     A, #3                     'Multiply by 8
                    add     A, fontsum_paired         'point to pixel data in font table
    
                    RDBYTE  pixels, A                 'fetch pixels from hub for first character
                    or      colpix,pixels             'Combine colors and pixels for TV COG
                    shr     A, #16                    'point to pixel data second character
    
                    RDBYTE  pixels, A                 'Fetch pixel data from font table
                    shl     pixels, #16               'Relocate pixel data
                    or      colpix, pixels            'Combine colors and pixels for TV COG
    
                    WRLONG  colpix, index             'Put complete pixel and color data in buffer for TV COG
                    add     index, #4                 'Point to next buffer address
                    djnz    count, #:loop             'Do all of the characters in the scan line, until buffer full
    
  • potatohead Posts: 10,261
    edited 2011-07-05 23:57
    Very nice, combining the add operations. I like it. Will plug that in and give it a go here in the near future. Nicely done. One long will hold two 16 bit addresses, appreciated.
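
    For anyone following along, a minimal sketch of that packing trick in isolation. It assumes fontsum holds a 16-bit hub address and that adding the character offset never carries out of the low 16 bits. It also relies, as kuroneko's loop does, on the hub using only the low 16 bits of the address register. The names pack and chars and the placeholder values are made up for illustration.

    ```spin
    PUB start
      cognew(@pack, 0)               ' PAR is not used in this sketch

    DAT
                  org     0
    pack          mov     fontsum_paired, fontsum   ' low half  = font table address
                  shl     fontsum_paired, #16       ' move it up to the high half
                  or      fontsum_paired, fontsum   ' same as fontsum * $00010001

                  mov     A, chars                  ' working copy of both character codes
                  and     A, v00ff00ff              ' keep bits 7..0 and 23..16 only
                  shl     A, #3                     ' 8 bytes of font data per character
                  add     A, fontsum_paired         ' one add builds both addresses
                  rdbyte  pixels, A                 ' low 16 bits address the first character
                  shr     A, #16                    ' bring the second address down
                  rdbyte  pixels1, A                ' second character
    :halt         jmp     #:halt                    ' park the cog

    v00ff00ff     long    $00FF00FF
    fontsum       long    $4000                     ' placeholder address, not from the driver
    chars         long    $00410042                 ' placeholder character pair
    fontsum_paired res    1
    A             res     1
    pixels        res     1
    pixels1       res     1
    ```

    In a real driver fontsum_paired would be built once during init rather than on every pass.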

    Edit: And there is another thing about the Prop. It's best to operate on data in sizes that make sense for it. Not doing that where indexing is involved tends to be very costly.

    @Vega256, sorry for extending your query. Usually, the access window questions very closely follow indexing ones. :) Figured I would get the jump on that, just as somebody did for me when I crossed this point in PASM.