WAITVID, HUB OPs and general PASM timing questions (Parallax & others)

dMajo · 2009-09-14 19:41

WAITVID (Posted 9/12/2009 5:18 PM (GMT +1), erroneously in the Sandbox)

Propeller manual V1.1 (page 225)
... For VGA, each color value’s upper 6-bits is the 2-bit red, 2-bit green, and 2-bit blue color components describing the desired color; the lower 2-bits are "don’t care" bits. ...

Propeller datasheet V1.2 (page 12/13)
... For VGA mode, each 8-bit color value is written to the pins specified by the VGroup and VPins field. For VGA typically the 8 bits are grouped into 2 bits per primary color and Horizontal and Vertical Sync control lines, but this is up to the software and application of how these bits are used. ...

1) Which one is true? (I read them as contradicting each other.)
2) VGA mode, PLLA at 80 MHz, pixelclocks 1, frameclocks 5: can I say that 8 bits are delivered on every system clock cycle, as in:

waitvid   Data32N1,#0
waitvid   Data32N2,#0
waitvid   Data32N3,#0

Is the above code executed without wait penalties? What is output on the 5th clock? Is the 4th byte repeated?
3) One idea:

Propeller datasheet V1.2 (page 13) said...
... When FrameClocks cycles occur and the cog is not in a WAITVID instruction, whatever data is on the source and destination busses at the time will be fetched and used. So it is important to be in a WAITVID instruction before this occurs. ...

Is this correct: waitvid DestinationDataBus, SourceDataBus? Or does the sentence above refer to the address busses?
Looked at from another point of view, waitvid is not the only way to deliver data to the video hardware. Any instruction that places the right data on the right bus at the right time will fulfil the request! If PLLA runs at the same frequency as the system clock, how far apart are they in phase? How difficult is it to deliver the data with another instruction that, besides delivering the data, also does something useful at the same time?
Has anyone ever tried something like this?

- set up VGA mode, single-ended PLLA at 80 MHz with PinA on an output pin (data strobe), pixelclocks 1, frameclocks 9713717 (one clock more or less than some PASM lines), VGA on P0..7, PinA on P8
 
 
- loop to get in sync
 
- set pixelclocks 4
 
mov buff1,buff1    ' just to be sure that both destination and source busses have the right value
mov buff2,buff2    ' these instructions are synced with the video hardware window (like the hub one)
mov buff3,buff3
mov buff4,buff4
 
 
result: data output at 80MB/s

EDIT: Continuing with the questions:
4) PASM instructions (according to datasheet V1.2, chapter 4.8, fig.4)

mov outa, #1
mov outa, #0  ' here, at the 2nd clock (rising edge?) of this instruction, the output physically changes to 1 due to the previous instruction

Is the above true? And what is the behavior for ina?
5) Can someone map the execution stages (similar to Fig.4) for HubOps (RDxxxx/WRxxxx)?
6) Is the following true?

COG0                       COG2                       COG4
======================     ======================     ======================
rdlong   temp1,address     rdlong   temp2,address     rdlong   temp3,address            '<= this executes at same system clock (cnt) cycle
nop                           ' waiting window           'waiting window                '<= this executes at same system clock (cnt) cycle
nop                        nop                           'waiting window                '<= this executes at same system clock (cnt) cycle
mov      any,any           mov      any,any           mov      any,any                  '<= this executes at same system clock (cnt) cycle

So this means that by selectively starting a chosen cog, the hub can be used as a sync device.
BTW, since COGID just returns the cog's own number, how is it possible that the cog is not aware of it and has to ask the hub? In other words, why does it take 7..22 clocks? It seems to me like someone asking my name and me answering: wait a moment, I have to ask my mother. Does COGID have some other uses?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
· Propeller Object Exchange (last Publications / Updates)

dMajo · 2009-09-14 19:41

Phil Pilgrim (PhiPi) said...(Posted 9/12/2009 7:48 PM (GMT +1))
Not sure why this was posted in the Sandbox, since it's a valid Propeller topic...

1) In describing the WAITVID instruction, I believe the manual erred in being so specific about how the bits are used, since "VGA" mode applies to any general purpose output to as many as eight bits at a time. In cases where the lower two bits are masked out in VCFG, the statement is correct. But this is just one example. One might also wish to have WAITVID drive an 8-bit data bus, for example, in which case all eight bits are used. In the statement cited from the datasheet, again, this is way too specific to be included in a definition of WAITVID and should serve more as an example of how it can be used or is typically used. If taken as examples, the two statements are not mutually contradictory. They merely illustrate two ways in which the eight available bits can be assigned.

2) In the example cited, byte 0 of Data32N1 would be output for five clocks, then byte 0 of Data32N2 for five clocks, and finally byte 0 of Data32N3 for five clocks. This is because the source field is all zeroes, specifying only the first "color" in the destination field. To output all four bytes of the destination, the source would have to look like this:

    %xxxxxxxxxxxxxxxxxxxxxx_01_11_10_01_00

This displays byte 0 of the destination value, then byte 1, byte 2, and byte 3, followed by byte 1 again.

3) Based on the statement in the datasheet, something like what you propose might work:
        waitvid colors0,#%11_10_01_00        'Set colors0 and sync to vid.
        mov     colors1,#%11_10_01_00 nr     'Send additional colors
        mov     colors2,#%11_10_01_00 nr
        mov     colors3,#%11_10_01_00 nr
        mov     colors4,#%11_10_01_00 nr
There will be some pipelining issues to consider, and you will have to figure out how to terminate the sequence gracefully. This may be difficult, since the source and destination values of the terminating instruction will already be on their respective busses and output to the video, too ... I think.
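
The lookup described in point 2 can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not a description of the silicon: the function name `waitvid_4color` is made up for illustration, the video hardware is assumed to take the two lowest bits of the source ("pixels") as an index into the four bytes of the destination ("colors"), and the pixels are assumed to shift right by two bits after every `pixel_clocks` PLL clocks.

```python
def waitvid_4color(colors, pixels, frame_clocks, pixel_clocks=1):
    """Model one frame of 4-color VGA mode (hypothetical helper).

    Returns the byte presented to the pin group on each PLL clock.
    Assumption: the low 2 bits of `pixels` pick one of the 4 bytes in `colors`.
    """
    out = []
    clocks_left = pixel_clocks
    for _ in range(frame_clocks):
        index = pixels & 0b11                      # current 2-bit pixel
        out.append((colors >> (8 * index)) & 0xFF)  # selected color byte
        clocks_left -= 1
        if clocks_left == 0:                        # advance to the next pixel
            pixels >>= 2
            clocks_left = pixel_clocks
    return out


colors = 0x44332211  # byte 0 = $11, byte 1 = $22, byte 2 = $33, byte 3 = $44

# Source of #0: byte 0 is repeated for all five clocks.
print([hex(b) for b in waitvid_4color(colors, 0, 5)])

# Source of %01_11_10_01_00: bytes 0, 1, 2, 3, then byte 1 again.
print([hex(b) for b in waitvid_4color(colors, 0b01_11_10_01_00, 5)])
```

With frameclocks 5 and pixelclocks 1, a fifth pixel is always consumed, which is why the source needs five 2-bit fields to control what appears on the fifth clock.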

-Phil

Addendum: This begs a rather interesting question: If the video hardware merely loads whatever happens to be on the source and destination busses when it runs out of data, what is actually happening during a WAITVID instruction? Theoretically, due to pipelining, the source and destination values of the next instruction should already have been fetched. For it to work as stated, wouldn't the pipeline have to stall? I know I'm not understanding this mechanism correctly. Maybe someone who actually knows can explain it.

potatohead said...(Posted 9/12/2009 9:11 PM (GMT +1))
I think the other thread we had recently is part of the answer to this.

Once a waitvid has been initiated, the video generator just keeps cycling through frames, whether or not another waitvid is executed. This continues until VSCL is set to 0, at which point there is no action. What I can't remember is whether it stops right away or completes the frame in progress first. I think it completes the frame, because I think VSCL is latched when the data on the busses is latched.

When the frame counter hits 0, the data on the source and destination bus gets copied (latched) to an internal working register. This occurs whether or not a waitvid is executed.

The waitvid is all about making sure known data is present and accounted for so that each frame presents the right data, without having to worry about cycle counting.

Linus (that craft demo guy) had posted up some interesting code a while back. Before his demo, he was working on a nicely timed 800x600 @ 80 MHz VGA image display demo. In that code, the timing all worked out such that it was difficult to get the frame data timing correct, and one artifact of that was being able to comment out some of the waitvids and still have the image display! There was noise in that video code that showed as "sparkles", and I believe at that time nobody had come to the realization that the latch of the data occurs when the frame counter hits zero. Eric Ball was the first one I recall suggesting this on a recent waitvid related thread.

Given some setup to synchronize things, I think it's totally possible to cycle count and have the data hit the bus at the right times. This would yield faster pixel clocks, or faster data bursts, depending on use. However, said code would be very clock speed and video timing dependent. One advantage I see to the above is getting full color (8bpp) display capability at faster pixel clocks. Most of the loops I've done and that I've seen others do can't quite feed a Propeller at, say, 640 pixels NTSC @ 80 MHz. 512 is possible, but that's pushing it. Code done this way would yield higher pixel clocks, meaning a single COG could then drive a high resolution TV display, leaving other COGs to build graphics for it.

The dynamic in play regarding Propeller graphics color density is the waitvid loop. To get "full color", which is any pixel, any color, any scanline, it is necessary to run a 4 pixel frame on the waitvid, thus turning the "colors" data into pixels via the #%%3210 trick. At 80 MHz, this boils down to a pixel clock loop that needs to be larger than about 4 PLLA, and that happens to all shake out to about 512 "full color" pixels tops for NTSC color graphics.

If larger frame sizes are used, it's easy to drive very high resolutions. Higher than the best NTSC displays are capable of. The trade off though, is color density. Two or four colors per frame results, and that is basically where all the tile graphics drivers all have the same "color cell" properties. Four colors can be specified for a given frame, and all pixels in that frame then, share that color pool.

For TV, I don't think exceeding these limits is worth it because S-video displays top out right where the Propeller does. 512 pixels is right near the top of what an S-video display will render well.

For VGA, this might be a good trick! Higher color densities at higher resolutions, without having to worry about "tiles" might be worth doing. RAM is then a problem, or the building of dynamic displays where a frame buffer does not exist and images are drawn "on the fly" are required to make use of the output capability. At these resolutions, 32K isn't much of a frame buffer.

I've not done a lot of data output kind of code, but being able to burst at higher speeds is generally something people see use for. Sorting this issue out would play well there too, IMHO. Of course, multiple COGs synchronized is probably faster, at the expense of using COGs up.

Seems to me, a graceful ending to a loop like this could occur with just one final waitvid to render the last frame, or two frames, then set VSCL to either 0, to end things, or another frame size to transition between signal states.

E.g.: use ordinary "simple", latched waitvid techniques to build sync and blanking signals; then, for the active pixel area, use this technique to achieve high pixel clocks with full color density; on the last frame or two, use a waitvid instruction so that VSCL can be changed at the right time.

As for the pipeline, a waitvid latches whatever is on the busses at frame counter 0. I think it latches VSCL too, but not VCFG. That can be changed mid pixel stream. We had a thread on that quite a while back. Changing the output pin can happen right in the middle of output, for example. I was able to test that one with a video display:
loop  waitvid colors, pixels
      nop
      nop
      mov VCFG, test_value
      nop
      nop
      mov VCFG, control_value
      ....  'update indexes and such
      jmp #loop
This kind of thing done with nice, fat pixels clearly showed partial pixels, right where the output was changed. Adding to the nops moved that transition point nicely enough. I don't know about the other VCFG options. It might be possible to change bits per pixel, or something too. Maybe some of that register gets latched at frame counter 0 as well.

Seems to me the pipeline has to stall. Waitvid takes more than 4 cycles to execute. Wasn't the final number observed 7 cycles? (goes off to find other thread)


dMajo · 2009-09-14 19:52

@PhiPi & Potatohead

1) I hope you won't be angry with me for moving the post here in this way
2) Thanks for being the only ones who answered (I thought these topics would have had a bigger audience in the community)

3) I do not think the pipeline has to stall, nor that waitvid takes more than 5+ clocks. According to fig. 4 (datasheet V1.2, chapter 4.8), if the waitvid is our instruction N, then when it gets to stage 4 (M+4) the instruction is executed: the cog clock is held. Perhaps there is a set/reset flip-flop: execution sets it and the video counter reaching 0 resets it. At this stage only the next (N+1) instruction's opcode has been fetched, while the current (N) instruction's data is still present on the dest/source busses and the cog's clock is frozen.


dMajo · 2009-09-14 20:08

BTW: This thread is in some way related to this one. So the main interest is to understand well the Propeller timings involved in high-speed (as high as possible) data transmission using a single cog.

Bye. Tomorrow it's another day


Agent420 · 2009-09-14 20:30

Very interesting (and technical) musing... it will take a while for me to fully appreciate what is being discussed, but I am also curious how this all works at the hardware level.


Phil Pilgrim (PhiPi) · 2009-09-14 20:34

dMajo said...
I do not think the pipeline has to stall, nor that waitvid takes more than 5+ clocks.

Relooking at the pipeline sequence (which I need to do from time to time), I believe you are right. I was under the misapprehension that, barring a pipeline stall, the source and destination values for the next instruction would already have been fetched while the WAITVID was waiting, but this is not how the pipeline works: only the next instruction will have been fetched at this point.

-Phil

ericball · 2009-09-15 00:54

dMajo said...
WAITVID (Posted 9/12/2009 5:18 PM (GMT +1), erroneously in the Sandbox)

Propeller manual V1.1 (page 225)
... For VGA, each color value’s upper 6-bits is the 2-bit red, 2-bit green, and 2-bit blue color components describing the desired color; the lower 2-bits are "don’t care" bits. ...

Propeller datasheet V1.2 (page 12/13)
... For VGA mode, each 8-bit color value is written to the pins specified by the VGroup and VPins field. For VGA typically the 8 bits are grouped into 2 bits per primary color and Horizontal and Vertical Sync control lines, but this is up to the software and application of how these bits are used. ...

1) Which one is true? (I read them as contradicting each other.)

The statement in v1.1 of the manual is incorrect, while the statement in v1.2 of the datasheet is effectively correct. In VGA mode each pair of bits in the pixel register is used as an index into the color register, and the selected 8 bits are delivered to the 8 pins specified by VGroup, using VPins as a mask. (This is then OR'd with OUTA and finally masked with DIRA.) In my NTSC sprite driver I use VGA mode to generate a composite video signal using the standard composite resistor network.
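
The output path described here can be written out as a small Python model. It is a sketch only: `pin_state` is a hypothetical helper, and it assumes that VGroup n selects pins 8n..8n+7 and that only this one cog drives the pins.

```python
def pin_state(video_byte, vgroup, vpins, outa, dira):
    """Model the pins driven by one cog in VGA mode (hypothetical helper).

    Assumptions: `vgroup` (0..3) selects pins 8*vgroup .. 8*vgroup+7,
    `vpins` is the 8-bit mask applied to the selected color byte.
    """
    video = (video_byte & vpins & 0xFF) << (8 * vgroup)  # mask, then place in group
    return (video | outa) & dira & 0xFFFFFFFF             # OR with OUTA, gate with DIRA


# Lower two bits masked out in VPins: only the upper six color bits reach the pins.
print(bin(pin_state(0b10101011, vgroup=0, vpins=0b11111100, outa=0, dira=0xFF)))

# All eight bits enabled, pin group 1 (P8..P15).
print(hex(pin_state(0xA5, vgroup=1, vpins=0xFF, outa=0, dira=0xFF00)))
```

Seen this way, the "upper 6 bits" wording in the manual is just the case where VPins masks out the lower two bits.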

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Composite NTSC sprite driver: Forum
NTSC & PAL driver templates: ObEx Forum
OnePinTVText driver: ObEx Forum

ericball · 2009-09-15 02:27

Video without WAITVIDs. Hmm... Synchronization is going to be the bugaboo. You'd need a good scope and then create a test application which could be used to empirically determine when sync has been achieved. Then WAITVID color,pixel could be replaced with MOV color,pixel NR. One problem is that any HUB access would cause sync to be lost, so you'd have to do like Linus and have a second COG for the HUB access and feed the video COG using the pins as a bus.


ericball · 2009-09-15 11:26

Thinking about it this morning, I realized that creating a synchronized display kernel is going to be difficult*. There may even be some value in seeing if such a beast could be created first before going through the effort to work out the details of achieving synchronization.

* No HUB access allowed while synchronized. The MOV color, pixel NR must happen every N cycles. To avoid HUB access you're either going to need a second cog feeding the display cog, which limits your pixel frequency, or render the line to a cog buffer (like I did for my NTSC sprite driver), which then limits your horizontal resolution.


potatohead · 2009-09-15 11:38

@dMajo, no worries here.

I stand corrected on the pipeline, confusing when it occurs.

To make this work, wouldn't frames then have to be some precise multiple of the system clock? I'm having trouble sorting out what that buys us, other than some really cool timing hack.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Wiki: Share the coolness!
Chat in real time with other Propellerheads on IRC #propeller @ freenode.net
Safety Tip: Life is as good as YOU think it is!

dMajo · 2009-09-15 16:12

ericball said...
Video without WAITVIDs. Hmm... Synchronization is going to be the bugaboo. You'd need a good scope and then create a test application which could be used to empirically determine when sync has been achieved. Then WAITVID color,pixel could be replaced with MOV color,pixel NR. One problem is that any HUB access would cause sync to be lost, so you'd have to do like Linus and have a second COG for the HUB access and feed the video COG using the pins as a bus.

No, here is the idea

If frameclocks is set to 4 (PLL at 80 MHz), then a frame takes the same time as an ordinary PASM instruction. This is also 1/4 of the rotating hub window, so the 16 clocks of the hub window are a multiple of the "pasm(mov)-video" transmission window.

 ' some timing sync setup here
 
                rdlong   dataout,address              'read from hub
                add      address,#4                   'increment address for next read
                mov      dataout,#%11_10_01_00 nr     'output on P0..7 through video circuitry, datastrobe (PLL-pina/b) on P8
                rdlong   dataout,address
                add      address,#4
                mov      dataout,#%11_10_01_00 nr
 
 
OR
 
 
' some timing sync setup here
 
:loop           rdlong   cmd_ramaddress,     preset1
                movi     data,               cmd                   ' selects rd/byte/word/long
                cmp      cmd_ramaddress_bkp, cmd_ramaddress WZ     ' look for new command
:data           rdlong   dataout,            preset2
        if_e    jmp      #:loop         
                mov      cmd_ramaddress_bkp, cmd_ramaddress
                movi     outa,               cpldtrigger           ' alert cpld for transmission
                mov      cmd_ramaddress,     #%11_10_01_00  NR     ' send data
                mov      dataout,            #%11_10_01_00  NR     ' send data
                jmp      #:loop
 
result: 48 clocks => 6.6MB/s continuous random write from hubram to sram
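
The throughput figure is plain arithmetic and can be checked with a short Python sketch. The helper name `throughput_mb_per_s` is made up for illustration; the 48-clock loop length and the 80 MHz system clock are taken from the post, and one long (4 bytes) of payload per loop pass is assumed.

```python
def throughput_mb_per_s(sysclk_hz, clocks_per_loop, bytes_per_loop):
    """Bytes moved per second, in MB/s (hypothetical helper)."""
    loops_per_second = sysclk_hz / clocks_per_loop
    return loops_per_second * bytes_per_loop / 1e6


# 48-clock loop (three 16-clock hub windows), one long of data per pass.
print(round(throughput_mb_per_s(80_000_000, 48, 4), 2), "MB/s")

# For comparison: back-to-back 4-clock MOVs, each delivering one long,
# which is the case behind the earlier "80MB/s" figure.
print(throughput_mb_per_s(80_000_000, 4, 4), "MB/s")
```

The first result is about 6.67 MB/s, which matches the quoted 6.6MB/s after truncation.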

Once the video circuitry is synced with the hub window, it should stay synced forever. The important thing is to know the timings and develop the right protocol (so that the CPLD is aware of how and when data is coming). I am still a newbie in PASM (especially in self-modifying code) but I hope the above examples are correct as code. They should certainly meet the timings!


Post Edited (dMajo) : 9/15/2009 4:20:06 PM GMT

dMajo · 2009-09-15 16:22

BTW: Does anyone know how the pins are arranged in composite video mode? How are they related to "waitvid colors, pixels"? How should I set up the video hardware if I want to use it as a serializer? In this case I will use just one pin: may I choose any one of the four?


dMajo · 2009-09-15 16:42

potatohead said...
@dMajo, no worries here.

I stand corrected on the pipeline, confusing when it occurs.

To make this work, wouldn't frames then have to be some precise multiple of the system clock? I'm having trouble sorting out what that buys us, other than some really cool timing hack.

Can you rewrite this sentence? Even though I understand every word, I cannot work out the meaning (sorry for my English; I have tried a translator but the result is meaningless to me).

In fact the frames are a multiple (4) of the system clock (video PLL running at 80 MHz).

Some tricks that should help here:
1) run the system clock at 10M*8 (the final divide by 2 should give a 50% duty cycle): this is also the reason I want the CPLD to provide the 80M clock to the prop (to avoid PLL phase drift)
2) run the video hardware at 160M with frameclocks 8, pixelclocks 2: this way the video registers can be loaded in the middle of the system clock period, which should allow a greater timing tolerance (provided, of course, that everything is internally synced on rising edges)


Phil Pilgrim (PhiPi) · 2009-09-15 16:54

VGA mode is the most straightforward. I would use it in preference to composite mode for any non-video stuff. You can still use a single pin and pick any one of the eight from the chosen group.

-Phil

dMajo · 2009-09-15 17:30

Phil Pilgrim (PhiPi) said...
VGA mode is the most straightforward. I would use it in preference to composite mode for any non-video stuff. You can still use a single pin and pick any one of the eight from the chosen group.

-Phil

The problem is that I haven't properly understood the relationship between the color and pixel registers.

As you have seen, I was sure that in VGA mode the colors get multiplexed out independently of the pixels (waitvid dataout,#0). Now I have understood this (I hope: waitvid col,(%11_01_11_11_10_01_00) => c0,c1,c2,c3,c3,c1,c3)

I still cannot understand how to have 32 data bits in the color or pixel register and bit-bang them out through a single pin with a single instruction using VGA mode. I thought it could be done with composite 2-color mode, even though I haven't worked out the right software setup: perhaps cmode0, pixelclocks 1, frameclocks 32, waitvid dataout,($FFFFFFFF) or waitvid ($FFFFFFFF),dataout


Phil Pilgrim (PhiPi) · 2009-09-15 17:52

Say, for example, that you want to output your serial data to pin 2 within the eight-pin VGA group. Here's how you would encode WAITVID:

        waitvid   mapping,data

mapping long      %00000100_00000000 'Pin 2 is high for "1" pixels, low for "0" pixels.
data    long      0-0

The "pixels" (i.e. data bits) in data get shifted out one at a time by the video circuitry. But they don't get shifted to a pin. They're used as the index to a lookup table comprised of the mapping ("colors") long. Since there are only two colors, only the lower 16 bits get used: one block of eight for "1" pixels, and one block of eight for "0" pixels. It's one of these blocks of eight bits that gets written to the output pin, as selected by the "pixel" data in data.
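
The lookup-table view of two-color mode can be sketched in Python. This is a model under assumptions, not the hardware: `serialize_2color` and `pin_levels` are hypothetical helpers, the data long is assumed to shift out least significant bit first, and each bit is assumed to select byte 0 (for a "0") or byte 1 (for a "1") of the mapping long.

```python
def serialize_2color(mapping, data, frame_clocks=32, pixel_clocks=1):
    """Model one frame of 2-color mode (hypothetical helper).

    Each data bit selects byte 0 or byte 1 of `mapping`; the selected byte
    is what the video hardware presents to the 8-pin group.
    """
    out = []
    clocks_left = pixel_clocks
    for _ in range(frame_clocks):
        bit = data & 1                              # current 1-bit "pixel"
        out.append((mapping >> (8 * bit)) & 0xFF)   # look up its "color" byte
        clocks_left -= 1
        if clocks_left == 0:                        # shift to the next data bit
            data >>= 1
            clocks_left = pixel_clocks
    return out


def pin_levels(byte_stream, pin):
    """Extract the level of one pin (0..7) from the per-clock bytes."""
    return [(b >> pin) & 1 for b in byte_stream]


MAPPING = 0b00000100_00000000   # pin 2 high for "1" pixels, low for "0" pixels

# Send %1011: the levels on pin 2 follow the data bits, LSB first.
print(pin_levels(serialize_2color(MAPPING, 0b1011, frame_clocks=4), pin=2))
```

Changing which bit is set in the upper byte of the mapping moves the serial stream to a different pin of the group without touching the data.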

-Phil

ericball · 2009-09-15 19:52

dMajo said...
No, here is the idea. If frameclocks is set to 4 (PLL at 80 MHz), then a frame takes the same time as an ordinary PASM instruction. This is also 1/4 of the rotating hub window, so the 16 clocks of the hub window are a multiple of the "pasm(mov)-video" transmission window. Once the video circuitry is synced with the hub window, it should stay synced forever. The important thing is to know the timings and develop the right protocol (so that the CPLD is aware of how and when data is coming). I am still a newbie in PASM (especially in self-modifying code) but I hope the above examples are correct as code. They should certainly meet the timings!

Ahh, true. If you sync vscl after a HUB sync, then the video counter will be sync'd to the hub instructions. Okay, then you're back to the first challenge of achieving the synchronization you want.

dMajo said...
BTW: Anyone knows how the pins in composite video mode are arranged? How they are related to the "waitvid colors, pixels" ? How I should setup the video hardware if I want to use it as a serializer? In this case I will use just one pin: may I choose anyone out of the four?

If you are using the video modes as a serializer, then you don't want to use the composite video mode; stick to the VGA mode. The Propeller datasheet has a good diagram (figure 6) which should help you achieve what you want.


dMajo · 2009-09-16 07:19

@PhiPi, thank you for the explanation example: now I get it


dMajo · 2009-09-16 18:19

dMajo said...
5) can someone map the execution stages (similar to Fig.4) for HubOps (RDxxxx/WRxxxx) ?

According to fig. 4 (datasheet V1.2, chapter 4.8), is it correct if I assume this:
1) the hubop is my instruction N, and up to stage 3 everything is the same as for any PASM instruction.
2) stage 4 is extended to at least 3 cycles: the first waits for hub sync, the last two (because the hub runs at half speed) exchange data with the hub
3) stage 5 becomes M+7 and is used only for read ops, where the data, once exchanged, must be written to the cog register

So the hubop takes 8 real_clock(M) cycles, but because M+1 is devoted to the previous instruction (N-1) we say it takes 7.

Moreover, can we say that every output manipulation really (physically) takes place in the second cycle of the next PASM instruction?

mov outa, #1
nop            ' output goes high at the rising edge of the second system clock cycle?


potatohead · 2009-09-16 18:23

@dMajo: Let's try this!

I had it wrong on the pipeline. I was confusing the operation of the D & S registers with instruction pre-fetch.


kuroneko · 2009-09-17 00:20

dMajo said...
So the hubop takes 8 real_clock(M) cycles but because the M+1 is devoted to previous instruction(N-1) we say it takes 7.

I'm not sure about that, but my tests tell me that 8 real cycles (min) pass when a hub operation is executed and that's without devoted cycles :)

dMajo said...
Moreover, can we say that every output manipulation really (physically) takes place in the second cycle of the next PASM instruction?

From an earlier posting:

Paul Baker said...
So all alterations of a cog's state is performed in the Result stage, whether it's writing to outa or whatever.

So why would the h/w wait for another 2 cycles to actually output the new state? I'm pretty sure that output state is altered at the same time (i.e. result stage) otherwise most of the timing critical drivers around here wouldn't work (my hub DMA certainly wouldn't). FWIW, writing to frqx will show effect during the first cycle of the next instruction at the earliest.

dMajo · 2009-09-17 13:02

kuroneko said...
I'm not sure about that, but my tests tell me that 8 real cycles (min) pass when a hub operation is executed and that's without devoted cycles :)

How have you measured/tested this ?

Example 1                                 Example 2
CLK0 : rdlong     outa,address            CLK0 : rdlong     outa,address
CLK7 : movs       outa, value             CLK7 : suppose_8_clocks
CLK11: nop                                CLK8 : movs       outa, value
CLK15: wait_hub_window                    CLK12: nop
CLK16: rdlong     outa,address            CLK16: rdlong     outa,address
CLK23: movs       outa, value             CLK23: suppose_8_clocks
CLK27: nop                                CLK24: movs       outa, value
CLK31: wait_hub_window                    CLK28: nop
CLK32: rdlong     outa,address            CLK32: rdlong     outa,address

Both the above examples will meet the hub window in sync, but the output behavior will be different: which one is right?
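
The two schedules can be reproduced with a small Python model, which makes the difference between the 7-clock and 8-clock hypotheses explicit. This is a simplification with made-up names (`run_loop`, `HUB_PERIOD`): it assumes the cog's hub window recurs every 16 clocks, that a hub instruction issued before its window simply waits for it, and that `hub_clocks` is the time the instruction then occupies. It does not settle which hypothesis matches the hardware.

```python
HUB_PERIOD = 16  # clocks between hub windows for one cog


def run_loop(hub_clocks, iterations=3, hub_phase=0):
    """Issue times for the loop: rdlong / movs / nop (hypothetical model).

    `hub_clocks` is the assumed duration of a hub op that hits its window
    exactly; `hub_phase` is the clock (mod 16) at which the window opens.
    """
    t = 0
    trace = []
    for _ in range(iterations):
        wait = (hub_phase - t) % HUB_PERIOD   # clocks until the next window
        if wait:
            trace.append((t, "wait_hub_window"))
            t += wait
        trace.append((t, "rdlong"))
        t += hub_clocks
        trace.append((t, "movs"))
        t += 4
        trace.append((t, "nop"))
        t += 4
    return trace


for label, clocks in (("Example 1", 7), ("Example 2", 8)):
    print(label)
    for clk, instr in run_loop(clocks):
        print(f"  CLK{clk:<3}: {instr}")
```

In both cases the rdlong lands on clocks 0, 16, 32, so the loop period cannot distinguish the hypotheses; only the position of the following instruction (and therefore of any pin change it causes) differs by one clock.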

kuroneko said...
So why would the h/w wait for another 2 cycles to actually output the new state? I'm pretty sure that output state is altered at the same time (i.e. result stage) otherwise most of the timing critical drivers around here wouldn't work (my hub DMA certainly wouldn't). FWIW, writing to frqx will show effect during the first cycle of the next instruction at the earliest.

ORG 0
N    M  :  mov  outa, #1   ' instruction driving high the pin
     M+1:
     M+2:
     M+3:
N+1  M+4:  nop
     M+5:                 ' pin going high due to mov instruction
     M+6:
     M+7:
N+2  M+8:  anypasm

The hardware does not wait; the instructions simply take 6 clocks to execute. Since 2 stages are overlapped we get a throughput of one instruction per 4 clocks, but the real (physical) state is delayed by 2 clocks relative to the software lines. When you are accessing prop internals it doesn't matter, since everything runs on 4 clocks, but if you are dealing with external timings I think it is important to know this. E.g.: if my N instruction in the example above is the first instruction, you should know that the pin will change state after 6 clocks and not 4.


kuroneko · 2009-09-17 13:19

dMajo said...
How have you measured/tested this ?

See my posting here.

As for the 2 cycle offset, I know what you mean now. For ease of use I view instruction cycles as SDER (and counting S as 1) while the full sequence is IdSDER. So the result phase of instruction N coincides with the decode phase of instruction N+1 which also happens to be the 2nd cycle. Sorry, my mistake.
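
The overlap can be tabulated with a few lines of Python. This is a sketch only: the stage letters are the ones used above, and the model assumes one clock per stage and a new instruction issued every 4 clocks; `stage_clock` is a hypothetical helper.

```python
STAGES = "IdSDER"  # stage letters as used in the post, assumed one clock each


def stage_clock(n, stage):
    """Clock (relative to M) at which instruction N+n is in `stage`."""
    return 4 * n + STAGES.index(stage)


for n in range(3):
    row = ", ".join(f"{s}=M+{stage_clock(n, s)}" for s in STAGES)
    print(f"N+{n}: {row}")

# Result stage of N and the second stage of N+1 fall on the same clock.
print(stage_clock(0, "R") == stage_clock(1, "d"))
```

Under these assumptions the result of instruction N is written at M+5, one clock after instruction N+1 has been fetched at M+4, which is the two-clock offset discussed above.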

Cluso99 · 2009-09-19 03:49

I seem to recall that I thought the wait instructions caused a flush of the pipeline. This could account for the extra 2 cycles. No proof though.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:

· Home of the MultiBladeProps: TriBlade, RamBlade, RetroBlade, TwinBlade, SixBlade, website
· Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator)
· Prop Tools under Development or Completed (Index)
· Emulators: Micros eg Altair, and Terminals eg VT100 (Index) ZiCog (Z80) , MoCog (6809)
· Search the Propeller forums (uses advanced Google search)
My cruising website is: www.bluemagic.biz   MultiBladeProp is: www.bluemagic.biz/cluso.htm
