Waitvid pasm question
Agent420
Posts: 439
I am curious about a hardware aspect of the waitvid command... When the video generator activates the waitvid command (ready for more pixel bits), does it grab a copy of the referenced pixel register or does it work directly from the referenced pixel register?
I may need to modify the referenced pixel register in a command closely following the waitvid... Would modifying the pixel register so quickly have any effect on the video generator?
I have a section of code that will send 1 video line of 1024 pixels in 2 bit color format (32 bit pixel data). I preset the video data in an array of 32 register addresses, which are called consecutively. Due to timing constraints, I want to flush these registers as they are used so that I can use the limited free time to regenerate them, so I fill them with 0's as soon as they are sent by the waitvid.
Should this pose a problem?
                        mov     frame,#32               '+4     do line = 32 frames
                        mov     vscl,vscl_cursor        '+4     do cursor pixels (32)
:frame1                 waitvid color,lpix              '34 free cycles until next waitvid
                        add     :frame1,#1              '+4     incr lpix source
:frame2                 mov     lpix,#0                 '+4     clear pixels when done
                        add     :frame2,d0              '+4     incr lpix destination
                        djnz    frame,#:frame1          '+4
                                                        '* 16 (-18)
                                                        '+4
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Comments
The effect is the same (the WAITVID D & S values are loaded into the internal color & pixel registers) unless the frame counter expires before WAITVID pauses - then the color & pixel registers get loaded with whatever is on the buses.
Note: Although the manual gives 5 cycles for WAITVID, my testing has shown you need to allow 7 cycles.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Composite NTSC sprite driver: Forum
NTSC & PAL driver templates: ObEx Forum
OnePinTVText driver: ObEx Forum
I've been meaning to ask you about that, I did note you mentioned that in an earlier post as well...
Some of my code is rather tight on timing, so I'd like to get the best understanding of how many cycles I have available.
In the example I posted above, the pixel rate is set at 65 MHz. From previous posts, I see you suggest that SysClock / VidClock * VSCL pixels gives the number of cycles available per frame. So I estimate 80/65 * 32 = 39 (rounded down from 39.384); subtracting 5 cycles for the waitvid (from the manual) yields the 34 free cycles indicated in my comments. If I were to use your observed measurement of 7, that yields 32 free. Is that correct?
I'll start erring on the side of caution and begin using 7 as a default from now on.
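The budget arithmetic in this post can be checked with a short host-side sketch. Python is used here purely as a calculator; the function name and parameters are illustrative and not part of any Propeller tool. The 5-cycle and 7-cycle WAITVID costs are the manual's figure and the measured figure quoted in this thread.

```python
def free_cycles(sys_clk_mhz, pixel_clk_mhz, pixels_per_frame, waitvid_cycles):
    """System clocks left between consecutive WAITVIDs, rounded down.

    Hypothetical helper: frame length is SysClock / VidClock * VSCL pixels,
    computed with integer math so the result is floored.
    """
    frame_cycles = (sys_clk_mhz * pixels_per_frame) // pixel_clk_mhz
    return frame_cycles - waitvid_cycles


# 80 MHz system clock, 65 MHz pixel clock, 32 pixels per frame
print(free_cycles(80, 65, 32, 5))  # manual's WAITVID cost
print(free_cycles(80, 65, 32, 7))  # WAITVID cost measured in this thread
```

With these inputs the frame is 39 clocks long, so the two calls give 34 and 32 free clocks respectively.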
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (Agent420) : 9/10/2009 3:13:40 PM GMT
Nice job Eric. Nobody else here got that simple realization. Thanks for posting it. IMHO, that first sentence should be explicitly stated in the data sheet.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Wiki: Share the coolness!
Chat in real time with other Propellerheads on IRC #propeller @ freenode.net
Safety Tip: Life is as good as YOU think it is!
If anyone wants to verify this, you can explicitly set FRQA in my OnePinTVText driver. When you start to see sparkles in the text, you've gone too fast. For LSB fonts there are 50+WAITVID CLKs per frame (character).
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Thanks. I am trying to discipline myself regarding timing, and do take hub access into account. I allow 22 for the first hub operation following an unknown state, and from there attempt to keep track of the hub and intersperse 2 x 4 cycle operations where possible to minimize cycle waste.
edit - you can see I used Chip's driver as the framework, where I inherit his par register trick ;-)
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (Agent420) : 9/10/2009 6:08:29 PM GMT
First, your :read_x rdlong will take 12 cycles since it starts on cycle 20 after the previous rdlong. That means the hub synch penalty is 10 additional cycles (for a worst case of 22 total).
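The 12-cycle figure can be reproduced with a small timing model. This is a sketch under stated assumptions, not a description of the silicon: it assumes a cog's hub window recurs every 16 system clocks, that the previous hub instruction ended exactly `base` clocks after an aligned start, and that `base` is 7 (the manual's best case). The function name and parameters are hypothetical.

```python
HUB_PERIOD = 16  # assumed: each cog gets a hub window every 16 system clocks


def hub_op_cycles(gap, base=7):
    """Modelled cost of a hub instruction that starts `gap` clocks after the
    previous hub instruction finished.

    `base` is the assumed cost when the instruction is perfectly aligned
    with the hub window; any extra is time spent waiting for the window.
    """
    wait = (HUB_PERIOD - base - gap) % HUB_PERIOD
    return base + wait


print(hub_op_cycles(8))    # two 4-cycle instructions in between
print(hub_op_cycles(20))   # five 4-cycle instructions in between
print(hub_op_cycles(10))   # just missed the window: worst case
```

In this model a gap of 20 clocks costs 12 cycles, and the same result comes out with `base=8`, so the conclusion does not depend on which best-case figure is correct.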
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Ah, you are correct. I lost count. I'll be reworking this code anyway, as I recently learned I do not require the AND #31 or temp vars because shift operations apparently use only the lower 5 bits of the shift count.
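A quick host-side model shows why the explicit mask is redundant. It assumes, as stated above, that the shift instruction only looks at the lower 5 bits of its count; the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def shl(value, count):
    """Model of a 32-bit shift left that uses only the low 5 bits of count."""
    return (value << (count & 31)) & MASK32


def shl_with_explicit_and(value, count):
    """Same shift, preceded by the explicit AND #31 step on a temp var."""
    temp = count & 31
    return shl(value, temp)


# The two variants agree for every count, so the AND and the temp var can go.
same = all(shl(0x12345678, c) == shl_with_explicit_and(0x12345678, c)
           for c in range(256))
print(same)
```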
> Slightly OT, but unless someone can provide a code fragment proving 7..22 consumed cycles I claim that any hub access actually consumes 8..23 cycles (I checked, but admit that I could be wrong).
Just how do you go about measuring such a thing? And I wonder how/if the interleaved execution plays a role in that?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
This gives you a base offset of 4 in reg1; if you put a nop in between the two mov instructions you'll end up with 8, etc. (i.e. basic instruction timing). Similar tests have shown that waitcnt and waitpxx require a minimum of 6 cycles. As for hub instructions, I used cogid as an example. It initially gives you a number of cycles that depends on how well you hit the hub window for this particular cog. Then you place a waitcnt in front of the first mov and adjust the delay in single-cycle steps covering everything from 16n..16n+15, which in turn leaves you with 16 instruction timings. What I got was a minimum of 8 and a maximum of 23 cycles (worst case miss).
Tip of the day: the first hub window match for a given cog can be achieved when the 2nd instruction is a hub instruction (first one should consume 4 cycles).
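The sweep described above can be mimicked on a host machine. This is only a model of the procedure, not a measurement. It assumes the test reads the system counter twice and subtracts, so that the difference is 4 plus the cost of whatever sits between the reads. It also assumes the hub window recurs every 16 clocks, with the 8-cycle best case reported in this post. All names are hypothetical.

```python
def measured_delta(instr_cycles):
    """Difference between two counter reads with an instruction of the given
    cost in between (4 is the base offset with nothing in between)."""
    return 4 + instr_cycles


def hub_cycles(phase, best_case=8):
    """Modelled hub instruction cost when it starts `phase` clocks after the
    ideal start point for this cog's hub window."""
    return best_case + (-phase) % 16


# Sweep the start delay in single-cycle steps across one full hub rotation.
timings = [hub_cycles(phase) for phase in range(16)]
deltas = [measured_delta(t) for t in timings]

print(measured_delta(4))           # a nop between the two reads
print(min(timings), max(timings))  # best and worst case over the sweep
```

Each of the 16 phases produces a distinct timing in this model, which is why stepping the delay through 16n..16n+15 is enough to see both extremes.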
Post Edited (kuroneko) : 9/11/2009 11:32:25 AM GMT
On a side note, of all things, it seems like operation cycle counts and hub access counts are a rather odd technical thing for Parallax to have gotten wrong in all of their documentation, particularly regarding the hub, for which they go into some theory of operation. Not that things like that can't happen.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Without the additional PASM line after the jump to read_x, the rdlong comes 4 clocks earlier than expected (to be in sync), so the penalty is just 4 additional cycles. That means you can have one more PASM line where indicated without lengthening the loop time.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Object Exchange (last Publications / Updates)
I've now removed the unneeded AND #31 and its associated temp var and moved some lines around, but I think, given what I need to process, I will have one hub access that is 4 cycles off. This particular loop is only 4 iterations, so that's not a big deal, but I do attempt to code with this type of foresight when possible. Still, I shaved a few cycles from the previous code.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔