Waitvid pasm question

Agent420 · 2009-09-09 20:06

I am curious about a hardware aspect of the waitvid command... When the video generator activates the waitvid command (ready for more pixel bits), does it grab a copy of the referenced pixel register or does it work directly from the referenced pixel register?

I may need to modify the referenced pixel register in a command closely following the waitvid... Would modifying the pixel register so quickly have any effect on the video generator?

I have a section of code that will send 1 video line of 1024 pixels in 2 bit color format (32 bit pixel data). I preset the video data in an array of 32 registers, which are used consecutively. Due to timing constraints, I want to flush these registers as they are used so that I can use the limited free time to regenerate them, so I fill them with 0's as soon as they are sent by the waitvid.

Should this pose a problem?

                        mov     frame,#32               '+4 do line = 32 frames
                        mov     vscl,vscl_cursor        '+4 do cursor pixels (32)
:frame1                 waitvid color,lpix              '34 free cycles until next waitvid
                        add     :frame1,#1              '+4 incr lpix source
:frame2                 mov     lpix,#0                 '+4 clear pixels when done
                        add     :frame2,d0              '+4 incr lpix destination
                        djnz    frame,#:frame1          '+4
                                                        '* 16 (-18)
                                                        '+4

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Mike Green · 2009-09-09 20:23

The WAITVID instruction copies the registers (memory locations) specified when it is executed. You can change those registers in the next instruction if you want and they won't affect what the video circuitry is doing.

ericball · 2009-09-10 13:51

Actually it's all triggered by when the internal frame counter hits zero:

The internal frame and pixel counters are loaded from VSCL.
The internal color and pixel registers are loaded from D & S.
Execution continues if the COG is paused on WAITVID.

The effect is the same (the WAITVID D & S values are loaded into the internal color & pixel registers) unless the frame counter expires before WAITVID pauses - then the color & pixel registers get loaded with whatever is on the buses.

Note: Although the manual gives 5 cycles for WAITVID, my testing has shown you need to allow 7 cycles.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Composite NTSC sprite driver: Forum
NTSC & PAL driver templates: ObEx Forum
OnePinTVText driver: ObEx Forum

Agent420 · 2009-09-10 14:20

> Note: Although the manual gives 5 cycles for WAITVID, my testing has shown you need to allow 7 cycles.

I've been meaning to ask you about that, I did note you mentioned that in an earlier post as well...

Some of my code is rather tight on timing, so I'd like to get the best understanding of how many cycles I have available.

In the example I posted above, the pixel rate is set at 65 MHz. From previous posts, I see you suggest that SysClock / VidClock * VSCL pixels should give the number of cycles available per frame. So I estimate 80/65 * 32 = 39 (rounded down from 39.384), and subtracting 5 cycles for the waitvid (from the manual) yields the 34 free indicated in my comments. If I were to use your observed measurement of 7, that yields 32 free. Is that correct?

I'll start erring on the side of caution and begin using 7 as a default from now on.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Post Edited (Agent420) : 9/10/2009 3:13:40 PM GMT

potatohead · 2009-09-10 14:23

This explains the code Linus posted here, where the display timing synced right up with instruction timing. Once the video generator was running, it would display the image whether or not the waitvid instruction was executed. It was possible to comment the waitvids out and still get the image.

Nice job Eric. Nobody else here got that simple realization. Thanks for posting it. IMHO, that first sentence should be explicitly stated in the data sheet.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Wiki: Share the coolness!
Chat in real time with other Propellerheads on IRC #propeller @ freenode.net
Safety Tip: Life is as good as YOU think it is!

ericball · 2009-09-10 17:42

Agent420 said...

I've been meaning to ask you about that, I did note you mentioned that in an earlier post as well... Some of my code is rather tight on timing, so I'd like to get the best understanding of how many cycles I have available. In the example I posted above, the pixel rate is set at 65 MHz. From previous posts, I see you suggest that SysClock / VidClock * VSCL pixels should provide how many cycles are required. So I estimate 80/65 * 32 = 39 (rounded down from 39.384) - 5 cycles for the waitvid (from the manual) yields the 34 free as indicated in my comments. If I were to use your observed measurements of 7, that yields 32 free. Is that correct? I'll start erring on the side of caution and begin using 7 as a default from now on.

Correct. 80MHz / 65MHz * 32 PLLA/frame = 39 CLK/frame, so you have 32 CLK between WAITVIDs. Don't forget to allow 22 CLK for any HUB access (unless synch'd by a previous HUB access after the WAITVID). You can't assume the WAITVID clock will be phase locked to the HUB access.
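
The budget arithmetic above can be written out as Spin compile-time constants. This is only a sketch: the constant names are made up for illustration, the 7-clock WAITVID cost is the measured figure quoted in this thread (the manual says 5), and the last line assumes exactly one unsynchronised hub access is needed inside the slot.

```spin
CON
  ' Hypothetical names, for illustration only.
  SYS_CLK_MHZ    = 80                                  ' system clock
  PIX_CLK_MHZ    = 65                                  ' PLLA pixel clock
  PIXELS         = 32                                  ' pixels per WAITVID frame (from VSCL)
  WAITVID_CLKS   = 7                                   ' measured allowance; the manual gives 5
  HUB_WORST_CLKS = 22                                  ' allowance for a hub access of unknown phase

  ' 80 * 32 / 65 = 39.38; Spin integer division truncates to 39
  FRAME_CLKS     = SYS_CLK_MHZ * PIXELS / PIX_CLK_MHZ  ' = 39
  FRAME_FREE     = FRAME_CLKS - WAITVID_CLKS           ' = 32 clocks between WAITVIDs
  FREE_AFTER_HUB = FRAME_FREE - HUB_WORST_CLKS         ' = 10 if one unsynchronised hub access is needed
```

Multiplying before dividing keeps the truncation to a single step, which matches rounding 39.38 down to 39.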

If anyone wants to verify this, you can explicitly set FRQA in my OnePinTVText driver. When you start to see sparkles in the text, you've gone too fast. For LSB fonts there are 50+WAITVID CLKs per frame (character).


Agent420 · 2009-09-10 18:02

> Don't forget to allow 22 CLK for any HUB access (unless synch'd by a previous HUB access after the WAITVID).

Thanks. I am trying to discipline myself regarding timing, and do take hub access into account. I allow 22 for the first hub operation following an unknown state, and from there attempt to keep track of the hub and intersperse 2 x 4 cycle operations as possible to minimize cycle waste.

edit - you can see I used Chip's driver as the framework, where I inherit his par register trick ;-)

' Do three lines minus horizontal back porch pixels to buy a big block of time

 
                        mov     vscl,vscl_three_lines_mhb  '3872 pixels @ 65Mhz
                        waitvid color,#0                '4758 free cycles until next waitvid
                                                        '(4765 - 7 [waitvid instr])
'get missile sprite data
                        mov     par,#4                  '+ 4 read 4 missile x & y values
                        mov     idx,x_base              '+ 4 x ptr base
                        mov     idy,y_base              '+ 4 y ptr base
                        movd    :stor_y,#y1             '+ 4 local y1-y4 
                        movd    :xoff,#xoff             '+ 4 local x offset
                        movd    :mpix,#mpix             '+ 4 local pixel patterns
                                                        '+22? hub synch
:read_x                 rdlong  regs, idx               '+ 8 read x coord
                        mov     tpix,#1                 '+ 4 reset pixel mask
                        mov     temp,regs               '+ 4 copy x to work
                        and     temp,#31                '+ 4 calc pixel bit #
                        shl     tpix,temp               '+ 4 set pixel mask
                        shr     regs,#5                 '+ 4 divide x by 32 = frame offset
:xoff                   mov     xoff,regs               '+ 4 store offset
:mpix                   mov     mpix,tpix               '+ 4 store pixel
                        add     idx,#4                  '+ 4 incr x base address
                        add     :mpix,d0                '+ 4 incr mpix dest
                        add     :xoff,d0                '+ 4 incr xoff dest                    
                        rdlong  regs,idy                '+ 8 read y
                        sub     regs,lines_m1           '+ 4 bias y
:stor_y                 mov     y1,regs                 '+ 4 store y                              
                        add     idy,#4                  '+ 4 incr y base address
                        add     :stor_y,d0              '+ 4 incr y1y4 dest
                        djnz    par,#:read_x            '+ 4
                                                        '+ 4 djnz non-branch cycles
                                                        '* 354 (-4404 free)

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

Post Edited (Agent420) : 9/10/2009 6:08:29 PM GMT

ericball · 2009-09-10 20:14

Close, but not quite.

First, your :read_x rdlong will take 12 cycles since it starts on cycle 20 after the previous rdlong. That means the hub synch penalty is 10 additional cycles (for a worst case of 22 total).


kuroneko · 2009-09-11 00:47

ericball said...
That means the hub synch penalty is 10 additional cycles (for a worst case of 22 total).

Slightly OT, but unless someone can provide a code fragment proving 7..22 consumed cycles I claim that any hub access actually consumes 8..23 cycles (I checked, but admit that I could be wrong).

Agent420 · 2009-09-11 10:33

> your :read_x rdlong will take 12 cycles since it starts on cycle 20 after the previous rdlong

Ah, you are correct. I lost count. I'll be reworking this code anyway, as I recently learned I do not need the AND #31 or the temp var because shift operations apparently use only the lower 5 bits of the shift count.

> Slightly OT, but unless someone can provide a code fragment proving 7..22 consumed cycles I claim that any hub access actually consumes 8..23 cycles (I checked, but admit that I could be wrong).

Just how do you go about measuring such a thing? And I wonder how/if the interleaved execution plays a role in that?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

kuroneko · 2009-09-11 11:11

Agent420 said...
Just how do you go about measuring such a thing? And I wonder how/if the interleaved execution plays a role in that?

Interleaved execution has no effect on that. Each cog gets access with 2 cycles offset (so after 8x2 cycles you'll get your next slot, provided you are in sync). As for measuring, the way I did it was like this. You start with this simple code fragment:

mov reg0, cnt
mov reg1, cnt
sub reg1, reg0

This gives you a base offset of 4 in reg1, if you put a nop in between the two mov instructions you'll end up with 8 etc (i.e. basic instruction timing). Similar tests have shown that waitcnt and waitpxx require a minimum of 6 cycles. As for hub instructions, I used cogid for example. Which initially gives you a number of cycles depending on how well you hit the hub window for this particular cog. Then you place a waitcnt in front of the first mov and adjust the delay in single cycle steps covering everything from 16n..16n+15, which in turn leaves you with 16 instruction timings. What I got was a minimum of 8 and a maximum of 23 cycles (worst case miss).
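
The procedure described above might look roughly like the following. It is a sketch under stated assumptions, not the code that was actually used: the clock settings assume a 5 MHz crystal, all names are hypothetical, and a cogid at the top of each pass is added here so that every pass starts from the same hub phase. Reading the results back (over a serial terminal, say) is left out.

```spin
CON
  _clkmode = xtal1 + pll16x        ' assumed: 5 MHz crystal, 80 MHz system clock
  _xinfreq = 5_000_000

VAR
  long  cycles[16]                 ' one measurement per tested offset

PUB Measure
  cognew(@entry, @cycles)          ' results are written to cycles[0..15]

DAT
              org     0
entry         mov     ptr, par                ' hub address of cycles[0]
              mov     offset, #0              ' sweep offsets 0..15
sweep         cogid   temp                    ' align this pass to the cog's hub window
              mov     target, cnt
              add     target, #256            ' 16n with a comfortable margin for waitcnt
              add     target, offset          ' plus 0..15 to slide across the hub window
              waitcnt target, #0
              mov     reg0, cnt
              cogid   temp                    ' hub instruction under test
              mov     reg1, cnt
              sub     reg1, reg0
              sub     reg1, #4                ' remove the 4-cycle base offset of the mov pair
              wrlong  reg1, ptr               ' store cycles consumed by the cogid
              add     ptr, #4
              add     offset, #1
              cmp     offset, #16     wz
        if_nz jmp     #sweep
done          jmp     #done                   ' park the cog

ptr           res     1
offset        res     1
target        res     1
reg0          res     1
reg1          res     1
temp          res     1
```

Each stored value is the time between the two cnt reads minus the 4-cycle baseline, so the 16 entries should span the best-case and worst-case cost of the hub instruction.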

Tip of the day: the first hub window match for a given cog can be achieved when the 2nd instruction is a hub instruction (first one should consume 4 cycles).

Post Edited (kuroneko) : 9/11/2009 11:32:25 AM GMT

Agent420 · 2009-09-11 11:48

Ok, makes sense. I can't test this now, but I'm curious and will investigate later.

On a side note, operation cycle counts and hub access counts seem like a rather odd technical thing for Parallax to have got wrong in all of their documentation, particularly regarding the hub, for which they go into some theory of operation. Not that things like that can't happen.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔

dMajo · 2009-09-11 15:24

ericball said...
Close, but not quite.

First, your :read_x rdlong will take 12 cycles since it starts on cycle 20 after the previous rdlong. That means the hub synch penalty is 10 additional cycles (for a worst case of 22 total).

I disagree. The penalty is 4 clocks per loop from the second pass on:

Agent's code:
 
                                                        '+22? hub synch
:read_x                 rdlong  regs, idx               '+ 8 read x coord
                        mov     tpix,#1                 '+ 4 reset pixel mask
                        mov     temp,regs               '+ 4 copy x to work
--------------------------------------------------------
                        and     temp,#31                '+ 4 calc pixel bit #
                        shl     tpix,temp               '+ 4 set pixel mask
                        shr     regs,#5                 '+ 4 divide x by 32 = frame offset
:xoff                   mov     xoff,regs               '+ 4 store offset
--------------------------------------------------------
:mpix                   mov     mpix,tpix               '+ 4 store pixel
                        add     idx,#4                  '+ 4 incr x base address
                        add     :mpix,d0                '+ 4 incr mpix dest
                        add     :xoff,d0                '+ 4 incr xoff dest 
--------------------------------------------------------                   
                        rdlong  regs,idy                '+ 8 read y              <= here in sync
                        sub     regs,lines_m1           '+ 4 bias y
:stor_y                 mov     y1,regs                 '+ 4 store y  
--------------------------------------------------------                            
                        add     idy,#4                  '+ 4 incr y base address
                        add     :stor_y,d0              '+ 4 incr y1y4 dest
                                                        '<= space here for one additional 4-clock PASM instruction
                        djnz    par,#:read_x            '+ 4
--------------------------------------------------------
                                                        '+ 4 djnz non-branch cycles

Without the additional PASM line, after the jump to read_x the rdlong comes 4 clocks earlier than needed to be in sync, so the penalty is just 4 additional cycles. That means you can add one more PASM line where indicated without lengthening the loop time.
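
The tally behind that 4-clock figure can be written as Spin constants. This is a rough sketch with hypothetical names, assuming the hub window for a cog recurs every 16 clocks and that a synchronised rdlong costs 8 clocks, as in the listing's own comments.

```spin
CON
  ' Hypothetical names, for illustration only.
  HUB_PERIOD  = 16                         ' assumed: a cog's hub window recurs every 16 clocks
  RD_SYNCED   = 8                          ' assumed: rdlong that hits its window
  BETWEEN_X_Y = 10 * 4                     ' ten 4-clock instructions between the two rdlongs
  AFTER_Y     = 5 * 4                      ' four 4-clock instructions plus djnz after the y read

  ' 8 + 40 = 48 = 3 * 16, so the y read starts in sync
  LOOP_RAW    = RD_SYNCED + BETWEEN_X_Y + RD_SYNCED + AFTER_Y            ' = 76
  LOOP_ACTUAL = ((LOOP_RAW + HUB_PERIOD - 1) / HUB_PERIOD) * HUB_PERIOD  ' = 80, next window
  PENALTY     = LOOP_ACTUAL - LOOP_RAW                                   ' = 4 clocks of waiting
  SPARE_INSTR = PENALTY / 4                                              ' = 1 free 4-clock slot
```

The 4 clocks of waiting are the same 4 clocks that turn the 8-clock rdlong at :read_x into a 12-clock one.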

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Object Exchange (last Publications / Updates)

Agent420 · 2009-09-11 15:52

Thanks for the inspection and comments. As you may note, I do attempt to intersperse 4 cycle commands with an eye towards hub synchronization, which is why I comment two cycle values for the first rdlong at :read_x - the 1st 22 and then 8 for loop operation (I think I forgot to account for the 4 cycles of djnz).

I've now removed the unneeded AND # and associated temp var and moved some lines around, but I think given what I need to process I will have one hub access that is 4 cycles off. This particular loop is only 4 iterations so that's not a big deal, but I do attempt to code with this type of foresight when possible. Still, shaved a few cycles from the previous code.

' Do three lines minus horizontal back porch pixels to buy a big block of time
                        mov     vscl,vscl_three_lines_mhb  '3872 pixels @ 65Mhz
                        waitvid color,#0                '4758 free cycles until next waitvid
                                                        '(4765 - 7 [waitvid instr])
'get missile sprite data
                        mov     par,#4                  '+ 4 read 4 missile x & y values
                        mov     idx,x_base              '+ 4 x ptr base
                        mov     idy,y_base              '+ 4 y ptr base
                        movd    :stor_y,#y1             '+ 4 local y1-y4 
                        movd    :xoff,#xoff             '+ 4 local x offset
                        movd    :mpix,#mpix             '+ 4 local pixel patterns

                                                        '+22? hub synch
:read_x                 rdlong  regs, idx               '+12 read x coord
                        mov     tpix,#1                 '+ 4 reset pixel mask
                        shl     tpix,regs               '+ 4 set pixel mask

                        shr     regs,#5                 '+ 4 divide x by 32 = frame offset
:xoff                   mov     xoff,regs               '+ 4 store offset
:mpix                   mov     mpix,tpix               '+ 4 store pixel
                        add     idx,#4                  '+ 4 incr x base address

                        rdlong  regs,idy                '+ 8 read y   
                        add     :mpix,d0                '+ 4 incr mpix dest
                        add     :xoff,d0                '+ 4 incr xoff dest 
                   
                        sub     regs,lines_m1           '+ 4 bias y
:stor_y                 mov     y1,regs                 '+ 4 store y
                        add     idy,#4                  '+ 4 incr y base address
                        add     :stor_y,d0              '+ 4 incr y1y4 dest

                        djnz    par,#:read_x            '+ 4

                                                        '+ 4 non-branch cycles
                                                        '* 320 (-4438 free)

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
