Waitvid pasm question — Parallax Forums

Waitvid pasm question

Agent420Agent420 Posts: 439
edited 2009-09-11 15:52 in Propeller 1
I am curious about a hardware aspect of the waitvid command. When the video generator activates the waitvid command (ready for more pixel bits), does it grab a copy of the referenced pixel register, or does it work directly from the referenced pixel register?

I may need to modify the referenced pixel register in an instruction closely following the waitvid. Would modifying the pixel register so quickly have any effect on the video generator?

I have a section of code that sends one video line of 1024 pixels in 2 bit color format (32-bit pixel data). I preset the video data in an array of 32 registers, which are sent consecutively. Due to timing constraints, I want to flush these registers as they are used so that I can use the limited free time to regenerate them, so I fill them with 0's as soon as the waitvid has sent them.

Should this pose a problem?

                        mov     frame,#32               '+4 do line = 32 frames
                        mov     vscl,vscl_cursor        '+4 do cursor pixels (32)
:frame1                 waitvid color,lpix              '34 free cycles until next waitvid
                        add     :frame1,#1              '+4 incr lpix source
:frame2                 mov     lpix,#0                 '+4 clear pixels when done
                        add     :frame2,d0              '+4 incr lpix destination
                        djnz    frame,#:frame1          '+4
                                                        '* 16 (-18)
                                                        '+4


Comments

  • Mike GreenMike Green Posts: 23,101
    edited 2009-09-09 20:23
    The WAITVID instruction copies the registers (memory locations) specified when it is executed. You can change those registers in the next instruction if you want, and the change won't affect what the video circuitry is doing.
  • ericballericball Posts: 774
    edited 2009-09-10 13:51
    Actually it's all triggered by when the internal frame counter hits zero:
    1. The internal frame and pixel counters are loaded from VSCL.
    2. The internal color and pixel registers are loaded from D & S.
    3. Execution continues if the COG is paused on WAITVID.

    The effect is the same (the WAITVID D & S values are loaded into the internal color & pixel registers) unless the frame counter expires before WAITVID pauses - then the color & pixel registers get loaded with whatever is on the buses.

    Note: Although the manual gives 5 cycles for WAITVID, my testing has shown you need to allow 7 cycles.



    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Composite NTSC sprite driver: Forum
    NTSC & PAL driver templates: ObEx Forum
    OnePinTVText driver: ObEx Forum
  • Agent420Agent420 Posts: 439
    edited 2009-09-10 14:20
    > Note: Although the manual gives 5 cycles for WAITVID, my testing has shown you need to allow 7 cycles.


    I've been meaning to ask you about that, I did note you mentioned that in an earlier post as well...

    Some of my code is rather tight on timing, so I'd like to get the best understanding of how many cycles I have available.

    In the example I posted above, the pixel rate is set at 65 MHz. From previous posts, I see you suggest that SysClock / VidClock * VSCL pixels gives the number of cycles per frame. So I estimate 80/65 * 32 = 39 (rounded down from 39.384); subtracting 5 cycles for the waitvid (from the manual) yields the 34 free cycles indicated in my comments. If I use your measured figure of 7 instead, that yields 32 free. Is that correct?

    I'll start erring on the side of caution and begin using 7 as a default from now on.

    Post Edited (Agent420) : 9/10/2009 3:13:40 PM GMT
  • potatoheadpotatohead Posts: 10,261
    edited 2009-09-10 14:23
    This explains the code Linus posted here, where the display timing synced right up with instruction timing. Once the video generator was running, it would display the image whether or not the waitvid instruction was executed. It was possible to comment the waitvids out and still get the image.

    Nice job Eric. Nobody else here got that simple realization. Thanks for posting it. IMHO, that first sentence should be explicitly stated in the data sheet.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Propeller Wiki: Share the coolness!
    Chat in real time with other Propellerheads on IRC #propeller @ freenode.net
    Safety Tip: Life is as good as YOU think it is!
  • ericballericball Posts: 774
    edited 2009-09-10 17:42
    Agent420 said...

    I've been meaning to ask you about that, I did note you mentioned that in an earlier post as well... Some of my code is rather tight on timing, so I'd like to get the best understanding of how many cycles I have available. In the example I posted above, the pixel rate is set at 65Mhz. From previous posts, I see you suggest that SysClock / VidClock * VSCL pixels should provide how many cycles are required. So I estimate 80/65 * 32 = 39 (rounded down from 39.384) - 5 cycles for the waitvid (from the manual) yields the 34 free as indicated in my comments. If I were to use your observed measurements of 7, that yields 32 free. Is that correct? I'll start erring on the side of caution and begin using 7 as a default from now on.

    Correct. 80 MHz / 65 MHz * 32 PLLA/frame = 39 CLK/frame, so you have 32 CLK between WAITVIDs. Don't forget to allow 22 CLK for any HUB access (unless synched by a previous HUB access after the WAITVID). You can't assume the WAITVID clock will be phase locked to the HUB access.

    If anyone wants to verify this, you can explicitly set FRQA in my OnePinTVText driver. When you start to see sparkles in the text, you've gone too fast. For LSB fonts there are 50+WAITVID CLKs per frame (character).
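
The budget arithmetic above can be written out as a small worked example. This is only a sketch in Python, not part of anyone's driver: the function names are made up for illustration, and the 7-clock WAITVID cost is ericball's measured figure rather than the manual's 5.

```python
# Sketch of the cycle-budget arithmetic discussed in the thread.
# Function names are hypothetical; 7 clocks per WAITVID is the measured
# figure reported above (the manual says 5).

def clocks_per_frame(sysclk_hz, pixclk_hz, pixels_per_frame):
    """System clocks spanned by one video frame, rounded down."""
    return (pixels_per_frame * sysclk_hz) // pixclk_hz


def free_clocks(sysclk_hz, pixclk_hz, pixels_per_frame, waitvid_clocks=7):
    """Clocks left for other instructions between two WAITVIDs."""
    return clocks_per_frame(sysclk_hz, pixclk_hz, pixels_per_frame) - waitvid_clocks


SYSCLK = 80_000_000   # 80 MHz system clock
PIXCLK = 65_000_000   # 65 MHz pixel clock

print(free_clocks(SYSCLK, PIXCLK, 32))      # 32-pixel frame, 7-clock WAITVID
print(free_clocks(SYSCLK, PIXCLK, 32, 5))   # same frame with the manual's 5 clocks
print(free_clocks(SYSCLK, PIXCLK, 3872))    # the 3872-pixel "three lines" frame
```

With the thread's numbers this gives 32, 34 and 4758 clocks. Any hub access still has to be budgeted on top of that, as noted above.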


  • Agent420Agent420 Posts: 439
    edited 2009-09-10 18:02
    > Don't forget to allow 22 CLK for any HUB access (unless synch'd by a previous HUB access after the WAITVID).


    Thanks. I am trying to discipline myself regarding timing, and I do take hub access into account. I allow 22 cycles for the first hub operation following an unknown state, and from there I try to keep track of the hub window and intersperse two 4-cycle operations between hub accesses where possible to minimize wasted cycles.

    edit - you can see I used Chip's driver as the framework, where I inherit his par register trick ;-)
    ' Do three lines minus horizontal back porch pixels to buy a big block of time
    
     
                            mov     vscl,vscl_three_lines_mhb  '3872 pixels @ 65Mhz
                            waitvid color,#0                '4758 free cycles until next waitvid
                                                            '(4765 - 7 [waitvid instr])
    'get missile sprite data
                            mov     par,#4                  '+ 4 read 4 missile x & y values
                            mov     idx,x_base              '+ 4 x ptr base
                            mov     idy,y_base              '+ 4 y ptr base
                            movd    :stor_y,#y1             '+ 4 local y1-y4 
                            movd    :xoff,#xoff             '+ 4 local x offset
                            movd    :mpix,#mpix             '+ 4 local pixel patterns
                                                            '+22? hub synch
    :read_x                 rdlong  regs, idx               '+ 8 read x coord
                            mov     tpix,#1                 '+ 4 reset pixel mask
                            mov     temp,regs               '+ 4 copy x to work
                            and     temp,#31                '+ 4 calc pixel bit #
                            shl     tpix,temp               '+ 4 set pixel mask
                            shr     regs,#5                 '+ 4 divide x by 32 = frame offset
    :xoff                   mov     xoff,regs               '+ 4 store offset
    :mpix                   mov     mpix,tpix               '+ 4 store pixel
                            add     idx,#4                  '+ 4 incr x base address
                            add     :mpix,d0                '+ 4 incr mpix dest
                            add     :xoff,d0                '+ 4 incr xoff dest                    
                            rdlong  regs,idy                '+ 8 read y
                            sub     regs,lines_m1           '+ 4 bias y
    :stor_y                 mov     y1,regs                 '+ 4 store y                              
                            add     idy,#4                  '+ 4 incr y base address
                            add     :stor_y,d0              '+ 4 incr y1y4 dest
                            djnz    par,#:read_x            '+ 4
                                                            '+ 4 djnz non-branch cycles
                                                            '* 354 (-4404 free)
    



    Post Edited (Agent420) : 9/10/2009 6:08:29 PM GMT
  • ericballericball Posts: 774
    edited 2009-09-10 20:14
    Close, but not quite.

    First, your :read_x rdlong will take 12 cycles since it starts on cycle 20 after the previous rdlong. That means the hub synch penalty is 10 additional cycles (for a worst case of 22 total).

  • kuronekokuroneko Posts: 3,623
    edited 2009-09-11 00:47
    ericball said...
    That means the hub synch penalty is 10 additional cycles (for a worst case of 22 total).
    Slightly OT, but unless someone can provide a code fragment proving 7..22 consumed cycles I claim that any hub access actually consumes 8..23 cycles (I checked, but admit that I could be wrong).
  • Agent420Agent420 Posts: 439
    edited 2009-09-11 10:33
    > your :read_x rdlong will take 12 cycles since it starts on cycle 20 after the previous rdlong

    Ah, you are correct. I lost count. I'll be reworking this code anyway, as I recently learned I do not need the AND #31 or the temp variable because shift operations apparently use only the lower 5 bits of the shift count.



    > Slightly OT, but unless someone can provide a code fragment proving 7..22 consumed cycles I claim that any hub access actually consumes 8..23 cycles (I checked, but admit that I could be wrong).

    Just how do you go about measuring such a thing? And I wonder how/if the interleaved execution plays a role in that?


  • kuronekokuroneko Posts: 3,623
    edited 2009-09-11 11:11
    Agent420 said...
    Just how do you go about measuring such a thing? And I wonder how/if the interleaved execution plays a role in that?
    Interleaved execution has no effect on that. Each cog gets access with 2 cycles offset (so after 8x2 cycles you'll get your next slot, provided you are in sync). As for measuring, the way I did it was like this. You start with this simple code fragment:

    mov reg0, cnt
    mov reg1, cnt
    sub reg1, reg0
    


    This gives you a base offset of 4 in reg1; if you put a nop between the two mov instructions you'll end up with 8, and so on (i.e. basic instruction timing). Similar tests have shown that waitcnt and waitpxx require a minimum of 6 cycles. As for hub instructions, I used cogid, for example, which initially gives you a number of cycles that depends on how well you hit the hub window for this particular cog. Then you place a waitcnt in front of the first mov and adjust the delay in single-cycle steps covering everything from 16n to 16n+15, which in turn leaves you with 16 instruction timings. What I got was a minimum of 8 and a maximum of 23 cycles (worst-case miss).

    Tip of the day: the first hub window match for a given cog can be achieved when the 2nd instruction is a hub instruction (first one should consume 4 cycles).
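
To make the phase sweep concrete, here is a toy model of it in Python. It is a sketch under stated assumptions, not a measurement: it takes kuroneko's reported minimum of 8 clocks as given, assumes the cog's hub slot recurs every 16 clocks, and charges one extra clock for every clock the instruction has to wait. The names `HUB_PERIOD`, `HUB_MIN` and `hub_clocks` are made up for illustration.

```python
# Toy model of the 16-step phase sweep described above.
# Assumptions (not measured here): the cog's hub slot recurs every
# 16 clocks, and a hub instruction that starts at the ideal moment
# takes 8 clocks (kuroneko's reported minimum).

HUB_PERIOD = 16   # clocks between hub slots for one cog
HUB_MIN = 8       # assumed cost when the start lines up with the slot


def hub_clocks(start_phase):
    """Clocks consumed by a hub instruction that starts start_phase
    clocks after the ideal start point (0 means perfectly in sync)."""
    wait = (-start_phase) % HUB_PERIOD   # clocks until the next slot
    return HUB_MIN + wait


# Sweep the start point in single-clock steps, like adjusting the waitcnt delay.
costs = [hub_clocks(phase) for phase in range(HUB_PERIOD)]
print(min(costs), max(costs))
```

Under these assumptions the sweep spans 8 to 23 clocks, one value per phase, which matches the range reported above. Whether the real hardware range is 7..22 or 8..23 is exactly what the PASM measurement is meant to settle; the model cannot decide that.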

    Post Edited (kuroneko) : 9/11/2009 11:32:25 AM GMT
  • Agent420Agent420 Posts: 439
    edited 2009-09-11 11:48
    Ok, makes sense.· I can't test this now, but I'm curious and will investigate later.

    On a side note, it seems like operation cycle counts and hub access counts are a rather odd technical thing for Parallax to have got wrong in all of their documentation, particularly regarding the hub, for which they go into some theory of operation. Not that things like that can't happen.

  • dMajodMajo Posts: 855
    edited 2009-09-11 15:24
    ericball said...
    Close, but not quite.

    First, your :read_x rdlong will take 12 cycles since it starts on cycle 20 after the previous rdlong. That means the hub synch penalty is 10 additional cycles (for a worst case of 22 total).
    I disagree. The penalty is 4 clocks per loop from the second pass on:

    Agent's code:
     
                                                            '+22? hub synch
    :read_x                 rdlong  regs, idx               '+ 8 read x coord
                            mov     tpix,#1                 '+ 4 reset pixel mask
                            mov     temp,regs               '+ 4 copy x to work
    --------------------------------------------------------
                            and     temp,#31                '+ 4 calc pixel bit #
                            shl     tpix,temp               '+ 4 set pixel mask
                            shr     regs,#5                 '+ 4 divide x by 32 = frame offset
    :xoff                   mov     xoff,regs               '+ 4 store offset
    --------------------------------------------------------
    :mpix                   mov     mpix,tpix               '+ 4 store pixel
                            add     idx,#4                  '+ 4 incr x base address
                            add     :mpix,d0                '+ 4 incr mpix dest
                            add     :xoff,d0                '+ 4 incr xoff dest 
    --------------------------------------------------------                   
                            rdlong  regs,idy                '+ 8 read y              <= here in sync
                            sub     regs,lines_m1           '+ 4 bias y
    :stor_y                 mov     y1,regs                 '+ 4 store y  
    --------------------------------------------------------                            
                            add     idy,#4                  '+ 4 incr y base address
                            add     :stor_y,d0              '+ 4 incr y1y4 dest
                                                            '<= room here for one additional 4-clock PASM instruction
                            djnz    par,#:read_x            '+ 4
    --------------------------------------------------------
                                                            '+ 4 djnz non-branch cycles
    
    


    Without an additional PASM instruction there, the rdlong at :read_x after the jump comes 4 clocks earlier than it would need to be in sync, so the penalty is just 4 additional cycles. That means you can add one more PASM instruction where indicated without lengthening the loop time.
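
The same bookkeeping can be checked with a short simulation. This is a simplified sketch in Python, not a cycle-accurate model of the Propeller: it assumes the cog's hub slot recurs every 16 clocks, that a hub instruction starting on the slot takes 8 clocks, that the loop is entered in sync, and that every pass ends with a taken djnz (4 clocks). `run_loop` and the two instruction lists are hypothetical names made up for this example.

```python
# Simplified hub-slot simulation of the :read_x loop (a sketch, not
# cycle-accurate). Assumptions: hub slot every 16 clocks, 8 clocks for
# an in-sync hub instruction, loop entered in sync, djnz always taken.

HUB_PERIOD = 16
HUB_MIN = 8


def run_loop(body, iterations, start=0):
    """body is a list of "hub" markers and plain clock counts.
    Returns the clocks consumed by each pass through the loop."""
    t = start
    per_pass = []
    for _ in range(iterations):
        t0 = t
        for step in body:
            if step == "hub":
                t += (-t) % HUB_PERIOD   # stall until the cog's slot
                t += HUB_MIN
            else:
                t += step
        per_pass.append(t - t0)
    return per_pass


# rdlong, ten 4-clock instructions, rdlong, four 4-clock instructions + djnz
original = ["hub"] + [4] * 10 + ["hub"] + [4] * 5
# same loop with one extra 4-clock instruction before the djnz
padded = ["hub"] + [4] * 10 + ["hub"] + [4] * 6

print(run_loop(original, 4))
print(run_loop(padded, 4))
```

Under these assumptions the original loop costs 76 clocks on the first pass and 80 on every later pass, because the rdlong at :read_x stalls for 4 clocks, while the padded loop costs 80 clocks on every pass. The extra instruction simply uses the clocks that would otherwise be spent waiting for the hub.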

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Propeller Object Exchange (last Publications / Updates)
  • Agent420Agent420 Posts: 439
    edited 2009-09-11 15:52
    Thanks for the inspection and comments. As you may note, I do attempt to intersperse 4-cycle instructions with an eye towards hub synchronization, which is why I comment two cycle values for the first rdlong at :read_x: 22 for the first pass and then 8 for loop operation (I think I forgot to account for the 4 cycles of djnz).

    I've now removed the unneeded AND and its associated temp variable and moved some lines around, but given what I need to process I think I will still have one hub access that is 4 cycles off. This particular loop has only 4 iterations so that's not a big deal, but I do attempt to code with this type of foresight when possible. Still, I shaved a few cycles from the previous code.

    ' Do three lines minus horizontal back porch pixels to buy a big block of time
                            mov     vscl,vscl_three_lines_mhb  '3872 pixels @ 65Mhz
                            waitvid color,#0                '4758 free cycles until next waitvid
                                                            '(4765 - 7 [waitvid instr])
    'get missile sprite data
                            mov     par,#4                  '+ 4 read 4 missile x & y values
                            mov     idx,x_base              '+ 4 x ptr base
                            mov     idy,y_base              '+ 4 y ptr base
                            movd    :stor_y,#y1             '+ 4 local y1-y4 
                            movd    :xoff,#xoff             '+ 4 local x offset
                            movd    :mpix,#mpix             '+ 4 local pixel patterns
    
                                                            '+22? hub synch
    :read_x                 rdlong  regs, idx               '+12 read x coord
                            mov     tpix,#1                 '+ 4 reset pixel mask
                            shl     tpix,regs               '+ 4 set pixel mask
    
                            shr     regs,#5                 '+ 4 divide x by 32 = frame offset
    :xoff                   mov     xoff,regs               '+ 4 store offset
    :mpix                   mov     mpix,tpix               '+ 4 store pixel
                            add     idx,#4                  '+ 4 incr x base address
    
                            rdlong  regs,idy                '+ 8 read y   
                            add     :mpix,d0                '+ 4 incr mpix dest
                            add     :xoff,d0                '+ 4 incr xoff dest 
                       
                            sub     regs,lines_m1           '+ 4 bias y
    :stor_y                 mov     y1,regs                 '+ 4 store y
                            add     idy,#4                  '+ 4 incr y base address
                            add     :stor_y,d0              '+ 4 incr y1y4 dest
    
                            djnz    par,#:read_x            '+ 4
    
                                                            '+ 4 non-branch cycles
                                                            '* 320 (-4438 free)
    
