Propeller II update - BLOG - Page 103 — Parallax Forums

Propeller II update - BLOG


Comments

  • ctwardellctwardell Posts: 1,716
    edited 2013-11-30 10:40
    Heater. wrote: »
    Well, if it won't fit it won't fit. So there is no way out.
    The way you put it that bus does sound like a gross waste of space.

    Still makes me nervous though. We are going to get a rash of suggestions for things to fill the freed up space with. More tweaks and features and...

    I'd like to see the P2 spec frozen as far as adding new features.

    The only open items would be bug fixes and the SERDES / CRC items and instructions that Chip has already agreed will likely be included.

    We have the makings of a really nice chip here. I think we could best help Chip and Parallax at this point by not providing any more squirrels to chase.

    C.W.
  • ctwardellctwardell Posts: 1,716
    edited 2013-11-30 10:48
    cgracey wrote: »
    We could double or quadruple the AUX RAMs, though.

    I'm sure we can find a way to use that extra AUX RAM if you end up having the space for it.

    C.W.

    After looking at the instruction set encoding, it looks like doubling to 512 longs should be easy, since 9 bits are available for immediate manipulation of the pointers.

    Going to 1024 longs limits the use of the immediate option in some cases, but maybe that would still be acceptable.
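The sizing argument above can be checked with a little arithmetic. This is only a sketch: the 9-bit figure comes from the post, the 256-long starting size is inferred from "double to 512", and the function name is made up for illustration.

```python
def addressable_longs(immediate_bits: int) -> int:
    """Number of distinct long addresses an immediate field of this width can select."""
    return 1 << immediate_bits


IMMEDIATE_BITS = 9  # from the post: 9 bits available for immediate pointer manipulation

for aux_longs in (256, 512, 1024):
    # An AUX RAM larger than 2**9 longs cannot be covered by the immediate alone.
    fits = aux_longs <= addressable_longs(IMMEDIATE_BITS)
    coverage = "fully" if fits else "only partly"
    print(f"{aux_longs:4d} longs: {coverage} reachable with a {IMMEDIATE_BITS}-bit immediate")
```

Under these assumptions 512 longs is the largest size a 9-bit immediate covers completely, which matches the caveat about 1024 longs.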
  • ctwardellctwardell Posts: 1,716
    edited 2013-11-30 10:52
    potatohead wrote: »
    Finally, I've not kept up with all the AUX RAM changes. How quickly can code be executed from them? To me, that seems the crux of the answer to the question, "More HUB or more AUX?"

    I haven't kept up as much as I should, but if I recall the AUX is what was the CLUT early on.

    If that's the case I had messed around with an LMM like loop that worked really well.

    Back then I called it CLMM, I guess now it would be AUXMM...

    With more AUX space it might become a useful memory model.

    C.W.
  • jmgjmg Posts: 15,155
    edited 2013-11-30 13:04
    cgracey wrote: »
    No matter what, every pin will have a 9-bit DAC and a delta-sigma ADC. Those are built-in and all fit underneath the power rings.

    What is taking 20% of the chip area, or 10 square mm, is the huge DAC bus for DAC signals that come out of cogs and update on every system clock or video clock. If we eliminate that bus, we'll have double the space we have now for the core, which would be a huge deal.
    cgracey wrote: »
    This stuff is all very simple to do, actually. I'm also aware that what we currently have for the core logic is bigger than our last fab attempt, and probably won't fit into the old space. We're going to have to eliminate at least part of that huge DAC bus.

    That is an obvious place to prune, and it sounds like a prune is essential.

    cgracey wrote: »
    The limitation caused by getting rid of that huge DAC bus would be that certain pins would now be tied to certain cogs for outputting new DAC data on every clock. It wouldn't affect anything else, like static DAC updates or ADC's. It would just mean that for outputting video or analog CTR signals, certain pins would relate to certain cogs. Is that a limitation that we can live with?

    Can you expand on what that means, exactly ?

    What does "static DAC updates or ADC's" mean? Slower access?

    Every pin still has a usable 9-bit DAC and a usable delta-sigma ADC ?

    I think you are saying COGS can still talk to any DAC/ADC, but with some bandwidth caveats ?
    What are those bandwidth caveats ?

    Streaming DAC use (outputting new DAC data on every clock.) is relatively rare, and if that is partly MUXed/encoded in COG number, that is not a bad thing.
    It would only be a brick wall issue, if other stuff was also encoded into COG numbers, and those conflicted.

    What would be the total limits (old and new) on streaming DAC pins with this change (COG placements aside )

    There are power and EMC reasons not to send all high-bandwidth signals to all places anyway.

    I can see significant ATE markets for Prop 2, and those are not price sensitive.

    Can you list how many DACS / ADCs and what access speeds they could have and how those map to COGs ?
  • localrogerlocalroger Posts: 3,451
    edited 2013-11-30 13:24
    I have no problem with DAC outputs being tied to cogs. As a practical matter video output tends to come from one cog anyway, and we have software control over starting particular cogs. Frankly I thought from the beginning that having DAC on-chip was a bit of an extravagance; it's not like resistors are expensive. As cool features go, I think being able to hook the chip straight up to an HDMI cable comes second to things like more RAM.
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-11-30 14:12
    cgracey wrote: »
    No matter what, every pin will have a 9-bit DAC and a delta-sigma ADC. Those are built-in and all fit underneath the power rings.
    Excellent!
    What is taking 20% of the chip area, or 10 square mm, is the huge DAC bus for DAC signals that come out of cogs and update on every system clock or video clock. If we eliminate that bus, we'll have double the space we have now for the core, which would be a huge deal.

    The limitation caused by getting rid of that huge DAC bus would be that certain pins would now be tied to certain cogs for outputting new DAC data on every clock. It wouldn't affect anything else, like static DAC updates or ADC's. It would just mean that for outputting video or analog CTR signals, certain pins would relate to certain cogs. Is that a limitation that we can live with?
    This is easier to live with because this would form part of the user's hw & sw design. So these would be coginit'd first. The rest of the cogs can be assigned in any way, and be dynamic if required by the user.
    This is not the restriction most micros have, where they share fixed pins, and the use of one peripheral means you lose another. While we have tended to keep the standard P1 pins for keyboard, mouse, TV, VGA, it's simple enough to change - this has really only been done to help the hobbyist.
    To sum up - it seems we have to do it anyway, but to me 20% is well worth this very simple sw restriction.
    Such a change would allow all 1.8V pin logic to be synthesized within the core and drastically reduce timing complexities for I/O. Also, we could put things like parallel DAC updating and DAC dither into the core-side logic for each pin. We could even have ADC tallies computed per pin, or PWM output. This stuff is all very simple to do, actually. I'm also aware that what we currently have for the core logic is bigger than our last fab attempt, and probably won't fit into the old space. We're going to have to eliminate at least part of that huge DAC bus. I'm thinking of getting rid of it, altogether. What do you guys think?
    All this sounds like a huge gain for little restriction.

    With all the new features, an AUX RAM increase now sounds way better than a HUB RAM increase to me (for any spare space). We are finding that the AUX is extremely useful/powerful, way beyond its original CLUT usage.

    There are a couple of bottlenecks that now seem to be known and that, if solved, would provide even better performance. Feature creep? Probably, yes. Let's see if it's simple and worthwhile. Discussion is easy and Chip can just say no. But as we have found, simple ideas have piqued Chip's interest and turned out to be simply a couple of lines of Verilog, and it's done. Look at the task switching Heater suggested - yes, it was tweaked more than a few times, but ozpropdev has proved its capabilities beyond belief!

    1. Hub bandwidth:
    Might there be a way to utilise unused slots? Currently we have 1 slot per cog in round-robin...
    Cog 0,1,2,3,4,5,6,7
    Might this be possible where the cog number in brackets is the cog that would get a slot if the original cog did not require its slot...
    Cog 0 (4) , 1 (5) , 2 (6), 3 (7), 4 (0) , 5 (1) , 6 (2), 7 (3)

    This could (up to) double a cog's bandwidth. This could either be normal, or enabled on a per cog basis with an instruction - perhaps just an extra bit in the coginit/cognew instruction (eg msb in the hub start address).

    Chip said that doing good HD requires ~590MB/s of bandwidth, and this method could be made to yield that with careful software. Seems to me that this would not be so difficult to implement???

    2. AUX to/from HUB / COG transfers:
    Might it be possible to build the Aux Ram block differently. Since the block is possibly being increased anyway, this might not be that difficult.

    If the AUX Ram was built with the HUB interface (existing 128 bits wide for Quad Longs) being separated for read and write paths, the read path could be shared with the 32 bit read interface of the video circuitry. This would permit simultaneously writing to the Aux Ram from Hub and reading from the Aux Ram to the Video circuitry.


    [Attached image: P2_aux_001.jpg]


    When the Video circuitry is not being used, then the hub has full read/write access, and the cog has full read/write access, with the only caveat that simultaneous writes to the same aux location would cause a cog stall (or the cog write fails) - this is a reasonable result of a programming bug anyway.

    This method gives full speed cog access to aux ram as well as full (1:8) hub access to the aux ram.

    Secondly, could a 16 long block of Aux Ram be built with a 4 port interface, so that it could be windowed/swapped into cog space, substituting $1E0..$1EF cog addresses. There might be other ways to achieve this, but I think this idea conveys what I am attempting to describe. This method would permit...
    * Execution in place up to 16 longs in a loop
    * Provide full instruction access to a small block (AND/OR/XOR...)
    * The small block could be an update section of a video block to be processed by the cog, then written back to hub quickly.

    Now, could the Read & Write Quad be expanded to also do 4 Quad Longs in 4 successive hub slots?

    Also, could a non-blocking instruction (or mode for the quad read and/or write instruction) be implemented where the transfer HUB to/from AUX would be "n" quad longs. A special register would hold the "n" count. This could be used here for...
    * automatic video fetching of a whole line or part line (reduces cog overhead)
    * automatic fetching of a block of hub for an overlay
    * automatic updating of a block of video memory back to hub (eg the video game cog)
    * cache transfers, and perhaps many other uses where a block could be transferred without further interruption to the cogs execution
  • cgraceycgracey Posts: 14,133
    edited 2013-11-30 15:01
    jmg wrote: »
    That is an obvious place to prune, and it sounds like a prune is essential.




    Can you expand on what that means, exactly ?

    What does "static DAC updates or ADC's" mean? Slower access?

    Every pin still has a usable 9-bit DAC and a usable delta-sigma ADC ?

    I think you are saying COGS can still talk to any DAC/ADC, but with some bandwidth caveats ?
    What are those bandwidth caveats ?

    Streaming DAC use (outputting new DAC data on every clock.) is relatively rare, and if that is partly MUXed/encoded in COG number, that is not a bad thing.
    It would only be a brick wall issue, if other stuff was also encoded into COG numbers, and those conflicted.

    What would be the total limits (old and new) on streaming DAC pins with this change (COG placements aside )

    There are power and EMC reasons not to send all high-bandwidth signals to all places anyway.

    I can see significant ATE markets for Prop 2, and those are not price sensitive.

    Can you list how many DACS / ADCs and what access speeds they could have and how those map to COGs ?

    I don't have any exact details worked out yet, but the only limitation you would have, compared to before, would be that DAC signals which update on every clock would now relate cogs to pins. Writing static values to DACs in software and use of the ADCs would be unchanged. The cog-to-pin fast DAC mapping would look like this:

    cog0 = P0..P11
    cog1 = P12..P23
    cog2 = P24..P35
    cog3 = P36..P47
    cog4 = P48..P59
    cog5 = P60..P71
    cog6 = P72..P83
    cog7 = P84..P91

    So, each set of 12 pins would be driven by a certain cog for fast output, while any cog could write a static value for any of those DACs. Since each cog outputs a set of 4 channels, those sets could appear 3 times across 12 pins.
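The cog-to-pin table above reduces to an integer division. The sketch below is only an illustration of that proposed mapping: the function names are hypothetical, and the pin count of 92 (P0..P91) is read off the last row of the table.

```python
NUM_COGS = 8
NUM_PINS = 92        # P0..P91, per the mapping above
PINS_PER_COG = 12    # each cog drives one group of 12 pins for fast DAC output


def fast_dac_cog(pin: int) -> int:
    """Return the cog whose streaming DAC output can reach this pin."""
    if not 0 <= pin < NUM_PINS:
        raise ValueError(f"pin must be in 0..{NUM_PINS - 1}")
    return pin // PINS_PER_COG


def fast_dac_pins(cog: int) -> range:
    """Return the pins a cog can stream to; cog 7 ends up with only 8 pins."""
    if not 0 <= cog < NUM_COGS:
        raise ValueError(f"cog must be in 0..{NUM_COGS - 1}")
    first = cog * PINS_PER_COG
    return range(first, min(first + PINS_PER_COG, NUM_PINS))
```

For example, `fast_dac_cog(23)` gives cog 1 and `fast_dac_pins(7)` covers P84..P91. Static DAC writes are not modelled because, per the post, any cog can still reach any pin for those.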
  • pedwardpedward Posts: 1,642
    edited 2013-11-30 15:22
    Hey Chip, something jmg said in another thread got me thinking.

    If you change the COG so it uses a message bus to set pin DAC values, you could make it so a COG can update any arbitrary DAC value with a single instruction.

    Right now you have the limitation of 4 DACs per COG, but what if you added an instruction called SETDAC pin, value that allows any COG to set any pin DAC to an arbitrary value?

    This would be great for the low-bandwidth DAC usage. Same for ADC, if that were possible. 90+ pins have DAC and ADC, why not make them arbitrary to access?
  • jmgjmg Posts: 15,155
    edited 2013-11-30 15:23
    cgracey wrote: »
    Writing static values to DACs in software and use of the ADCs would be unchanged.

    Does that path mean any COG to any DAC ?
    What speed and time-slot restrictions are on this ?
    cgracey wrote: »
    So, each set of 12 pins would be driven by a certain cog for fast output, while any cog could write a static value for any of those DACs. Since each cog outputs a set of 4 channels, those sets could appear 3 times across 12 pins.

    Where you say 'each cog outputs a set of 4 channels, those sets could appear 3 times across 12 pins.' does that mean the video feed can be 4-DACS-wide, and mapped to a choice of (any?) 4 within that 12 available ?

    For the 8 DACS not mapped to the streaming Video-feeder, what COG access choices are there ?
  • cgraceycgracey Posts: 14,133
    edited 2013-11-30 15:25
    Cluso99 wrote: »
    1. Hub bandwidth:
    Might there be a way to utilise unused slots? Currently we have 1 slot per cog in round-robin...
    Cog 0,1,2,3,4,5,6,7
    Might this be possible where the cog number in brackets is the cog that would get a slot if the original cog did not require its slot...
    Cog 0 (4) , 1 (5) , 2 (6), 3 (7), 4 (0) , 5 (1) , 6 (2), 7 (3)

    This could (up to) double a cog's bandwidth. This could either be normal, or enabled on a per cog basis with an instruction - perhaps just an extra bit in the coginit/cognew instruction (eg msb in the hub start address).

    This is a sound solution to the old hub bandwidth problem that we've debated forever. It's very simple to implement and could be enabled/disabled via special instructions. This spaces out hub opportunities perfectly. If this had come up before, it would have already been implemented. I'll see about doing this. Barring any problems, this is a 2-hour task, from start to end.
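A minimal model of the paired-slot idea quoted above, to make the "up to double" claim concrete. It assumes a cog can ask for the hub on every slot and that every cog has the feature enabled; `slot_winner` and `accesses_per_rotation` are illustrative names, not anything from the actual design.

```python
from typing import Optional, Set

NUM_COGS = 8


def slot_winner(slot: int, requesting: Set[int]) -> Optional[int]:
    """Decide which cog gets hub slot `slot` under the paired scheme.

    The slot's owner has first claim. If the owner is idle, the cog four
    positions away (its pair) may take the slot instead.
    """
    owner = slot % NUM_COGS
    partner = (owner + NUM_COGS // 2) % NUM_COGS
    if owner in requesting:
        return owner
    if partner in requesting:
        return partner
    return None


def accesses_per_rotation(cog: int, requesting: Set[int]) -> int:
    """Count how many of the 8 slots in one hub rotation go to `cog`."""
    return sum(slot_winner(slot, requesting) == cog for slot in range(NUM_COGS))


# Cog 0 alone gets its own slot plus cog 4's, i.e. two accesses per rotation.
print(accesses_per_rotation(0, {0}))
# If cog 4 is also busy, cog 0 drops back to one access per rotation.
print(accesses_per_rotation(0, {0, 4}))
```

The two borrowed slots sit four positions apart, which is what spaces the hub opportunities evenly for the cog that gets the extra access.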
  • cgraceycgracey Posts: 14,133
    edited 2013-11-30 15:26
    pedward wrote: »
    Hey Chip, something jmg said in another thread got me thinking.

    If you change the COG so it uses a message bus to set pin DAC values, you could make it so a COG can update any arbitrary DAC value with a single instruction.

    Right now you have the limitation of 4 DACs per COG, but what if you added an instruction called SETDAC pin, value that allows any COG to set any pin DAC to an arbitrary value?

    This would be great for the low-bandwidth DAC usage. Same for ADC, if that were possible. 90+ pins have DAC and ADC, why not make them arbitrary to access?

    That's exactly what I plan on doing! That's how we'll still get random access to DACs.
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-11-30 15:54
    cgracey wrote: »
    This is a sound solution to the old hub bandwidth problem that we've debated forever. It's very simple to implement and could be enabled/disabled via special instructions. This spaces out hub opportunities perfectly. If this had come up before, it would have already been implemented. I'll see about doing this. Barring any problems, this is a 2-hour task, from start to end.
    Absolutely fantastic Chip!
    It's about thinking the problem thru many times, and then something pops that is quite simple. Everything else is just wasted discussion, but necessary to get there. Your mention of the hub/cog bandwidth limitation to 400MB/s when you really wanted ~590MB/s got me rethinking the problem again, but the other ideas were too complex.
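The thread does not say how the 400MB/s figure was derived. One set of assumptions that reproduces it is a 200 MHz system clock, an 8-clock hub rotation with one slot per cog, and one 128-bit (16-byte) quad-long transfer per slot. The sketch below uses those assumed values; the function name and parameters are hypothetical.

```python
def hub_bandwidth_mb_s(clock_hz: float,
                       slots_per_rotation: int,
                       bytes_per_access: int = 16,
                       rotation_clocks: int = 8) -> float:
    """Peak hub bandwidth of one cog in MB/s (1 MB = 1e6 bytes).

    All defaults are assumptions: 16 bytes per quad-long access and an
    8-clock rotation in which the cog owns `slots_per_rotation` slots.
    """
    accesses_per_second = clock_hz * slots_per_rotation / rotation_clocks
    return accesses_per_second * bytes_per_access / 1e6


ASSUMED_CLOCK_HZ = 200e6

print(hub_bandwidth_mb_s(ASSUMED_CLOCK_HZ, 1))  # own slot only
print(hub_bandwidth_mb_s(ASSUMED_CLOCK_HZ, 2))  # own slot plus the paired cog's slot
```

Under these assumptions a single slot gives 400 MB/s and a borrowed second slot gives 800 MB/s peak, which would leave headroom above the ~590MB/s target if the partner cog stays off the hub.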
  • cgraceycgracey Posts: 14,133
    edited 2013-11-30 16:11
    jmg wrote: »
    Does that path mean any COG to any DAC ?
    What speed and time-slot restrictions are on this ?



    Where you say 'each cog outputs a set of 4 channels, those sets could appear 3 times across 12 pins.' does that mean the video feed can be 4-DACS-wide, and mapped to a choice of (any?) 4 within that 12 available ?

    For the 8 DACS not mapped to the streaming Video-feeder, what COG access choices are there ?

    Any cog can write any DAC via the CFGPINS instruction. CFGPINS takes 16 clocks. There is no slot restriction, as only one cog should be trying to configure any certain pin, just like two cogs shouldn't be writing the same pin. So, any static DAC update would take 16 clocks.

    Cog 0's four DAC output channels would be available as follows:

    CH0 - P0
    CH1 - P1
    CH2 - P2
    CH3 - P3
    CH0 - P4
    CH1 - P5
    CH2 - P6
    CH3 - P7
    CH0 - P8
    CH1 - P9
    CH2 - P10
    CH3 - P11

    Cog 1's DAC output channels would be similarly available via P12..P23, and so on.
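The channel list above repeats every four pins inside each group of twelve. A small sketch of that proposed layout follows; the helper names are hypothetical and the 92-pin limit is taken from the P0..P91 range given earlier in the thread.

```python
NUM_PINS = 92          # P0..P91
PINS_PER_COG = 12
CHANNELS_PER_COG = 4


def fast_dac_channel(pin: int):
    """Return (cog, channel) for a pin under the proposed fixed mapping."""
    if not 0 <= pin < NUM_PINS:
        raise ValueError(f"pin must be in 0..{NUM_PINS - 1}")
    cog, offset = divmod(pin, PINS_PER_COG)
    return cog, offset % CHANNELS_PER_COG


def pins_for_channel(cog: int, channel: int):
    """List the pins (up to three) on which a cog's channel can appear."""
    base = cog * PINS_PER_COG + channel
    candidates = (base + CHANNELS_PER_COG * k
                  for k in range(PINS_PER_COG // CHANNELS_PER_COG))
    # Cog 7's group stops at P91, so its third repetition does not exist.
    return [pin for pin in candidates if pin < NUM_PINS]
```

For instance, `pins_for_channel(0, 2)` returns P2, P6 and P10, matching the CH2 rows in the list.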
  • ozpropdevozpropdev Posts: 2,792
    edited 2013-11-30 16:15
    cgracey wrote: »
    It's very simple to implement and could be enabled/disabled via special instructions. This spaces out hub opportunities perfectly. If this had come up before, it would have already been implemented. I'll see about doing this. Barring any problems, this is a 2-hour task, from start to end.

    As a self confessed multi-tasking ADDICT this is a BIG one! :):)
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-11-30 16:23
    Chip,
    How does the current SDRAM interface work - I presume we are driving it via a cog, and that cog transfers data between the sdram and the hub?
    Would this also work if the user provided a pair of SDRAMs (2 * words) to get 32bit transfers (using extra 16 + 3 control pins)?

    The geometry of the P2 is such that each cog is in a wedge shape, coming together in the centre of the die.
    Might it be possible to have a small ram block in the centre of the die that would act as a common ram block between all cogs? I was thinking of maybe 16 * 32 + 1 bits. This could then be used for the sdram buffer (could be used for direct access between cogs too) directly accessible by each cog without going via the hub. Sort of like the PortD but without the complexities. Perhaps it could even replace the PortD???

    The block would preferably have a read port and a write port, so that simultaneous reads and writes could occur, but this is not imperative if leaving it out keeps things simple. The block would provide first cog in first access and would be simply a read/write to a long in the block. The extra bit per long (in the block) would be set when written and cleared when read, via the WZ/WC bits of the access instruction.

    The most likely scenario of the SDRAM usage is:
    (1) As XMM for large programs
    (2) As video ram
    (3) Interpreted code (low priority)

    Would this be something you would consider useful or not?
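To illustrate the "extra bit per long" idea, here is a software model of a 16-long shared block in which each long carries a full/empty flag. It is a sketch under one reading of the post: the flag is set on write, cleared on read, and its previous state is handed back to the caller (standing in for whatever the WZ/WC result would be). The class and method names are invented for the example.

```python
class SharedBlock:
    """Software model of a small RAM shared by all cogs, one flag bit per long."""

    def __init__(self, longs: int = 16):
        self._data = [0] * longs
        self._full = [False] * longs

    def write(self, index: int, value: int) -> bool:
        """Store a 32-bit value and mark the long as full.

        Returns the previous flag, so a writer can tell whether it has
        overwritten data that nobody had read yet.
        """
        was_full = self._full[index]
        self._data[index] = value & 0xFFFFFFFF
        self._full[index] = True
        return was_full

    def read(self, index: int):
        """Return (value, was_full) and mark the long as empty."""
        was_full = self._full[index]
        self._full[index] = False
        return self._data[index], was_full


block = SharedBlock()
block.write(3, 0x12345678)
print(block.read(3))  # fresh data: flag was set
print(block.read(3))  # same value again, but the flag shows it is stale
```

A reader polling a long would treat `was_full == False` as "nothing new yet", which is how such a block could serve as a cog-to-cog mailbox without going through the hub.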
  • cgraceycgracey Posts: 14,133
    edited 2013-11-30 16:26
    Cluso99 wrote: »
    Absolutely fantastic Chip!
    It's about thinking the problem thru many times, and then something pops that is quite simple. Everything else is just wasted discussion, but necessary to get there. Your mention of the hub/cog bandwidth limitation to 400MB/s when you really wanted ~590MB/s got me rethinking the problem again, but the other ideas were too complex.

    I'm thinking that there is a general-case rule we could come up with, for which your proposal is just a subset (perhaps the first tier of the rule).

    The criteria are:

    1) Every cog gets 1st priority to its own slot
    2) If a cog doesn't use its slot, it must be equitably offered to all other cogs, in such a way that no particular cog has an unfair advantage (ie no simple 0..7 priority list, but one that is distributed, based on slot number).

    How to do this?
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-11-30 16:37
    DE0 Compilation

    Chip,
    You are having problems fitting a single cog into the DE0.

    Does making the DAC changes reduce the FPGA size enough?

    If not, what block(s) (if any) can you remove to make your life easier?

    ozpropdev (and others):
    What could be left out for your work?
    Brian, you are pushing the envelope the most here, so your input is likely the most valuable.

    I am not using video atm (other than to see what others are doing, and for that I can load a new fpga design). I suspect I am in the minority here.
  • cgraceycgracey Posts: 14,133
    edited 2013-11-30 16:41
    Cluso99 wrote: »
    DE0 Compilation

    Chip,
    You are having problems fitting a single cog into the DE0.

    Does making the DAC changes reduce the FPGA size enough?

    If not, what block(s) (if any) can you remove to make your life easier?

    ozpropdev (and others):
    What could be left out for your work?
    Brian, you are pushing the envelope the most here, so your input is likely the most valuable.

    I am not using video atm (other than to see what others are doing, and for that I can load a new fpga design). I suspect I am in the minority here.

    Before I knew it was possible to fit into the DE0-Nano, I had disabled CTRB and freed up quite a few LE's in the FPGA.
  • jmgjmg Posts: 15,155
    edited 2013-11-30 16:43
    cgracey wrote: »
    Any cog can write any DAC via the CFGPINS instruction. CFGPINS takes 16 clocks. There is no slot restriction, as only one cog should be trying to configure any certain pin, just like two cogs shouldn't be writing the same pin. So, any static DAC update would take 16 clocks.

    Cog 0's four DAC output channels would be available as follows:

    CH0 - P0
    CH1 - P1
    CH2 - P2
    CH3 - P3
    CH0 - P4
    CH1 - P5
    CH2 - P6
    CH3 - P7
    CH0 - P8
    CH1 - P9
    CH2 - P10
    CH3 - P11

    Cog 1's DAC output channels would be similarly available via P12..P23, and so on.

    So the DAC feature summary would be like this ?

    * 96 DACs, all available to any COG, on a 16 cycle CFGPINS path, one at a time basis.

    This means 10MHz any-DAC updates are possible at 160MHz fSYS ?

    Does the DAC update delay from COG write time change ?
    Is that blocking ( ie 16 cycles pause in that thread ? ) or can the COG-thread do other work for 15 of limit 16 cycles ?


    * Quad channel Fast-path (video) to DACs (fpDAC) possible, for a max of 4*8 = 32 fpDACS with all 8 COGs busy helping.
    Those fpDACs Quad can map to groups of 12 pins, so any of 96 pins can be fpDAC with right setups.

    fpDAC update limit is fSYS ? - 4 wide, to what depth ? Timer Paced ? Sync'd across COGs ?
  • cgraceycgracey Posts: 14,133
    edited 2013-11-30 16:43
    cgracey wrote: »
    I'm thinking that there is a general-case rule we could come up with, for which your proposal is just a subset (perhaps the first tier of the rule).

    The criteria are:

    1) Every cog gets 1st priority to its own slot
    2) If a cog doesn't use its slot, it must be equitably offered to all other cogs, in such a way that no particular cog has an unfair advantage (ie no simple 0..7 priority list, but one that is distributed, based on slot number).

    How to do this?

    I came up with something, but I'm not sure if it makes the most sense.

    In the table below, the first column is the cog which is offered its primary slot, followed by the order in which other cogs are offered the same slot, in case the first cog doesn't want it. In order for a cog to receive a second-hand slot, it must enable its give-me-whatever-slot-is-available mode. The idea is to change the offer order for each slot, so that no cog has a perpetual advantage in receiving second-hand slots.
    0	4	2	6	1	5	3	7
    1	5	3	7	2	6	4	0
    2	6	4	0	3	7	5	1
    3	7	5	1	4	0	6	2
    4	0	6	2	5	1	7	3
    5	1	7	3	6	2	0	4
    6	2	0	4	7	3	1	5
    7	3	1	5	0	4	2	6
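Each row of the table above is the first row rotated by the slot number, and the first row (0 4 2 6 1 5 3 7) is the 3-bit bit-reversed count. The sketch below regenerates the table from that observation and applies it as an arbiter. The offsets are read off the table; the function names and the `scavenging` set (cogs that have enabled the give-me-whatever-slot mode) are assumptions made for illustration.

```python
from typing import List, Optional, Set

NUM_COGS = 8
# Offsets from the slot owner, taken from the first row of the table.
OFFER_OFFSETS = (0, 4, 2, 6, 1, 5, 3, 7)


def offer_order(slot: int) -> List[int]:
    """Order in which cogs are offered hub slot `slot`, owner first."""
    return [(slot + offset) % NUM_COGS for offset in OFFER_OFFSETS]


def slot_winner(slot: int, requesting: Set[int], scavenging: Set[int]) -> Optional[int]:
    """Pick the cog that receives `slot`.

    The owner wins if it wants the hub. Otherwise the slot goes to the
    first requesting cog in the offer order that has opted in to
    second-hand slots.
    """
    order = offer_order(slot)
    if order[0] in requesting:
        return order[0]
    for cog in order[1:]:
        if cog in requesting and cog in scavenging:
            return cog
    return None


for slot in range(NUM_COGS):
    print(*offer_order(slot), sep="\t")
```

Running the loop prints the same eight rows as the table, which is a quick way to confirm the pattern before committing it to logic.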
    
  • ctwardellctwardell Posts: 1,716
    edited 2013-11-30 16:44
    Cluso's scheme for Hub Access creates 'Hub Access Pairs'.

    Hub Access Pairs

    COG0 and COG4
    COG1 and COG5
    COG2 and COG6
    COG3 and COG7

    Assuming the goal is that one of the COGs in the pair needs high speed deterministic hub access and the other COG in the pair does not, would it make sense to have the following options for each COG in a pair.

    Option 1: Use 'my' hub slot only
    Option 2: Use 'my' hub slot and 'my twin's' hub slot if available
    Option 3: Use 'my twin's' hub slot if available

    For 'normal' use both COGs would be set to Option 1

    For the case when one of the COGs needs high speed deterministic access the 'high priority' COG would be set to option 2 and the other 'low priority' COG in the pair to option 3.
    In this case the 'low priority' COG would only access the hub when the 'high priority' COG was not using it, say between scan lines.
    This lets the 'high priority' COG control hub access without setting up a communication process with the 'low priority' COG

    At this point I assume the COG thread waiting for hub access will just block while waiting on a hub slot.

    C.W.
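The three options can be modelled as a per-cog mode that the hub arbiter consults for each slot. This sketch takes Option 3 literally (the cog never claims its own slot) and assumes a cog may request the hub on any slot; the enum and function names are hypothetical.

```python
from enum import Enum
from typing import Dict, Optional, Set

NUM_COGS = 8


class HubMode(Enum):
    OWN_ONLY = 1      # Option 1: use my slot only
    OWN_AND_TWIN = 2  # Option 2: use my slot, plus my twin's if available
    TWIN_ONLY = 3     # Option 3: use my twin's slot if available


def twin(cog: int) -> int:
    """Return the other cog in a hub access pair (0/4, 1/5, 2/6, 3/7)."""
    return (cog + NUM_COGS // 2) % NUM_COGS


def slot_winner(slot: int, requesting: Set[int], modes: Dict[int, HubMode]) -> Optional[int]:
    """Decide which cog of a pair receives hub slot `slot`."""
    owner = slot % NUM_COGS
    other = twin(owner)
    # The owner claims its slot unless it has given it up (Option 3).
    if owner in requesting and modes[owner] is not HubMode.TWIN_ONLY:
        return owner
    # Otherwise the twin may take it, if its mode allows borrowing.
    if other in requesting and modes[other] is not HubMode.OWN_ONLY:
        return other
    return None


modes = {cog: HubMode.OWN_ONLY for cog in range(NUM_COGS)}
modes[0] = HubMode.OWN_AND_TWIN  # high-priority cog, e.g. video
modes[4] = HubMode.TWIN_ONLY     # low-priority partner

# While cog 0 is busy it wins both slots of the pair.
print([slot_winner(s, {0, 4}, modes) for s in (0, 4)])
# When cog 0 goes idle, cog 4 gets in through cog 0's slot.
print([slot_winner(s, {4}, modes) for s in (0, 4)])
```

With this reading the high-priority cog keeps deterministic access to both slots, and the low-priority cog only ever runs in the gaps, with no handshake needed between the two.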
  • jmgjmg Posts: 15,155
    edited 2013-11-30 16:47
    cgracey wrote: »
    2) If a cog doesn't use its slot, it must be equitably offered to all other cogs, in such a way that no particular cog has an unfair advantage (ie no simple 0..7 priority list, but one that is distributed, based on slot number).

    If you make it too equitable, does that not cost determinism ? With the fixed choices, if you really want a COG to know it has 2x bandwidth, you can choose to use the complementary COG for something else.
  • jazzedjazzed Posts: 11,803
    edited 2013-11-30 16:49
    Woo Hoo!!! No longer have to wear a strict round robin straight-jacket.

    It is still easy enough to pass information between COGs using serial IO or port D, right?

    Why not let one COG have the entire HUB bus if that's what the programmer wants?

    Would be nice to have full speed HUB fetch and execute at 200MHz.
  • cgraceycgracey Posts: 14,133
    edited 2013-11-30 16:54
    jmg wrote: »
    So the DAC feature summary would be like this ?

    * 96 DACs, all available to any COG, on a 16 cycle CFGPINS path, one at a time basis.

    This means 10MHz any-DAC updates are possible at 160MHz fSYS ?

    Does the DAC update delay from COG write time change ?
    Is that blocking ( ie 16 cycles pause in that thread ? ) or can the COG-thread do other work for 15 of limit 16 cycles ?


    * Quad channel Fast-path (video) to DACs (fpDAC) possible, for a max of 4*8 = 32 fpDACS with all 8 COGs busy helping.
    Those fpDACs Quad can map to groups of 12 pins, so any of 96 pins can be fpDAC with right setups.

    fpDAC update limit is fSYS ? - 4 wide, to what depth ? Timer Paced ? Sync'd across COGs ?

    You got it all right. Currently, CFGPINS is blocking. Once CFGPINS finishes with a static DAC update, the PIN reflects the new value - no jitter.

    fpDAC update limit is system clock rate or video clock rate - whichever is selected for that DAC.
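The static-update figures confirmed above follow from the 16-clock CFGPINS time. A small worked calculation, assuming the 160MHz fSYS used in the question and that a blocking CFGPINS means one cog services its DACs strictly one after another; the helper names are made up.

```python
CFGPINS_CLOCKS = 16  # from the thread: a static DAC update takes 16 clocks


def static_dac_update_rate_hz(fsys_hz: float) -> float:
    """Maximum static DAC updates per second that one cog can issue."""
    return fsys_hz / CFGPINS_CLOCKS


def per_dac_rate_hz(fsys_hz: float, dacs_serviced: int) -> float:
    """Update rate per DAC when one cog services several DACs in turn."""
    return static_dac_update_rate_hz(fsys_hz) / dacs_serviced


print(static_dac_update_rate_hz(160e6))  # one DAC
print(per_dac_rate_hz(160e6, 4))         # four DACs shared by one cog
```

This gives 10 MHz for a single DAC and 2.5 MHz each for four DACs, ignoring any loop overhead between CFGPINS instructions.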
  • cgraceycgracey Posts: 14,133
    edited 2013-11-30 16:57
    jazzed wrote: »
    Woo Hoo!!! No longer have to wear a strict round robin straight-jacket.

    It is still easy enough to pass information between COGs using serial IO or port D, right?

    Why not let one COG have the entire HUB bus if that's what the programmer wants?

    Would be nice to have full speed HUB fetch and execute at 200MHz.

    With what Cluso99 came up with, then augmented to allow any other cog a slot, a single cog WOULD get all the slots if no other cogs were needing one.
  • cgraceycgracey Posts: 14,133
    edited 2013-11-30 17:00
    ctwardell wrote: »
    Cluso's scheme for Hub Access creates 'Hub Access Pairs'.

    Hub Access Pairs

    COG0 and COG4
    COG1 and COG5
    COG2 and COG6
    COG3 and COG7

    Assuming the goal is that one of the COGs in the pair needs high speed deterministic hub access and the other COG in the pair does not, would it make sense to have the following options for each COG in a pair.

    Option 1: Use 'my' hub slot only
    Option 2: Use 'my' hub slot and 'my twin's' hub slot if available
    Option 3: Use 'my twin's' hub slot if available

    For 'normal' use both COGs would be set to Option 1

    For the case when one of the COGs needs high speed deterministic access the 'high priority' COG would be set to option 2 and the other 'low priority' COG in the pair to option 3.
    In this case the 'low priority' COG would only access the hub when the 'high priority' COG was not using it, say between scan lines.
    This lets the 'high priority' COG control hub access without setting up a communication process with the 'low priority' COG

    At this point I assume the COG thread waiting for hub access will just block while waiting on a hub slot.

    C.W.

    That's a great idea! Can we fold this into a general case, somehow?
  • cgraceycgracey Posts: 14,133
    edited 2013-11-30 17:03
    cgracey wrote: »
    That's a great idea! Can we fold this into a general case, somehow?

    Maybe we could have three modes:

    1) normal - I only use my slot (default)
    2) vacuum - I'll use any slot
    3) I only use slot n (which can be a high-priority counterpart's slot)
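One way the three modes could be folded into a single arbiter is sketched below. The post does not define the priority between modes, so the ordering here is an assumption: the slot owner first, then any cog pinned to that slot, then vacuum cogs in the rotating offer order from the earlier table. The mode encoding ("normal", "vacuum", or an integer slot number) is also invented for the example.

```python
from typing import Dict, Optional, Set, Union

NUM_COGS = 8
OFFER_OFFSETS = (0, 4, 2, 6, 1, 5, 3, 7)  # offer order from the earlier table

Mode = Union[str, int]  # "normal", "vacuum", or the slot number a cog is pinned to


def slot_winner(slot: int, requesting: Set[int], modes: Dict[int, Mode]) -> Optional[int]:
    """Pick the cog that receives hub slot `slot` under the three modes."""
    order = [(slot + offset) % NUM_COGS for offset in OFFER_OFFSETS]
    owner = order[0]
    # 1) The owner keeps first claim unless it pinned itself to another slot.
    if owner in requesting and modes[owner] in ("normal", "vacuum", owner):
        return owner
    # 2) Next, a cog that asked for exactly this slot.
    for cog in order[1:]:
        if cog in requesting and modes[cog] == slot:
            return cog
    # 3) Finally, vacuum cogs, offered in the rotating order.
    for cog in order[1:]:
        if cog in requesting and modes[cog] == "vacuum":
            return cog
    return None


modes: Dict[int, Mode] = {cog: "normal" for cog in range(NUM_COGS)}
modes[0] = "vacuum"  # high-priority cog takes whatever is free
modes[4] = 0         # low-priority cog only uses slot 0, when cog 0 leaves it

# With nobody else on the hub, the vacuum cog collects all eight slots.
print(sum(slot_winner(s, {0}, modes) == 0 for s in range(NUM_COGS)))
# When cog 0 is idle, the pinned cog 4 gets in through slot 0.
print(slot_winner(0, {4}, modes))
```

Whether pinned cogs should outrank vacuum cogs, and whether a vacuum cog keeps any timing guarantee, are exactly the open questions in the thread; the sketch just makes one choice explicit.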
  • jazzedjazzed Posts: 11,803
    edited 2013-11-30 17:10
    cgracey wrote: »
    With what Cluso99 came up with, then augmented to allow any other cog a slot, a single cog WOULD get all the slots if no other cogs were needing one.

    Cool. Looking forward to silicon.

    BTW, When you get a chance ... any luck with SERDES ?
  • ctwardellctwardell Posts: 1,716
    edited 2013-11-30 17:14
    cgracey wrote: »
    Maybe we could have three modes:

    1) normal - I only use my slot (default)
    2) vacuum - I'll use any slot
    3) I only use slot n (which can be a high-priority counterpart's slot)

    The issue I see with item 2 being ANY slot is the loss of determinism.

    C.W.
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-11-30 17:16
    cgracey wrote: »
    I came up with something, but I'm not sure if it makes the most sense.

    In the table below, the first column is the cog which is offered its primary slot, followed by the order in which other cogs are offered the same slot, in case the first cog doesn't want it. In order for a cog to receive a second-hand slot, it must enable its give-me-whatever-slot-is-available mode. The idea is to change the offer order for each slot, so that no cog has a perpetual advantage in receiving second-hand slots.
    0    4    2    6    1    5    3    7
    1    5    3    7    2    6    4    0
    2    6    4    0    3    7    5    1
    3    7    5    1    4    0    6    2
    4    0    6    2    5    1    7    3
    5    1    7    3    6    2    0    4
    6    2    0    4    7    3    1    5
    7    3    1    5    0    4    2    6
    
    In my opinion, it is more about giving priority to particular cog(s) than about being equitable. I could be wrong.

    Scenarios:

    (1) Video:
    Probably I would want 2 cogs with priority.
    I would set up cog 0 as the video out cog, and ensure cog 4 only had access during cog 0's free time. Therefore, I would control what code was running in cog 4. It would obey a regime - a PortD internal pin or a lock (does a lock use the hub slot, or just wait for the slot time?)
    A second cog may need priority too, for fast video updates. I would allocate another similar pair of reciprocal cogs (high access & low access).

    (2) XMM:
    This is likely to be only 1 cog, and would not require deterministic timing, but want fast access overall.
    Again I would set up a reciprocal pair, but this time it would just be a low access cog as the slower cog.

    If I could use a second and/or third level that might be advantageous too, like...

    My reasoning behind setting pairs 0 & 4, 1 & 5, etc was it allowed some processing in between instructions, and allowed any setup times that the hub may be requiring (is there any setup time?). Therefore I didn't think adjacent cogs would be able to work with fast enough arbitration. If this isn't the case then perhaps a different approach could be done.

    I will wait for a few comments to see if anyone agrees/disagrees with my concept that the priority is not equitable.