P16+X32B - what might you want as a minimum ? - Page 2 — Parallax Forums

P16+X32B - what might you want as a minimum ?

Comments

  • RossHRossH Posts: 5,454
    edited 2014-04-06 20:33
    potatohead wrote: »
    That depends on what replaces it.

    If we kludge in the P1 ROM, does it plop down right in the new chip memory space? Or, is it at the top of RAM? Bottom?

    Does booting into SPIN hose up code protect? Yeah, I'll walk on that most likely. It's important.

    Like I said, it should be discussed.

    Given the modest cost, having a baseline interactivity isn't an unreasonable minimum, and we need to do code protect, fuses, crypto anyway, and we know how to do it all already.

    Honestly, if it's too much of a dog, I might walk on that too.

    I agree code protect may end up in the "minimum" list. That has been requested many times, and appears to have limited the applicability of the P1. And if that means we can't boot into Spin then so be it.

    But (as far as I know) we have not had these discussions yet. I'm happy to have them, but these things should be made part of the "minimum" list at the end of such a discussion, not at the beginning.

    Ross.
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-04-06 20:36
    RossH wrote: »
    Okay, I am happy to vote for 1-9. Items 7, 8 and 9 are technical details that I'm sure Chip can sort out.

    However, I'd still like to know where 7 & 8 crept in, because I must have missed that part of the discussion. Can you provide a link?

    Ross.
    I don't have a link, but I did ask the question about Flash vs Eeprom and Chip said it WILL use SPI Flash.

    A requirement from Parallax customers is Security so I think that mandates the small boot scenario.
    Chip & Ken have said the 5 points need to be included in any design.

    Why wouldn't we use the monitor style mode with minimal ROM being reconfigured RAM?
    Spin can be part of the Flash boot and would be easily done in sw.
    However, I don't recall if this was specifically discussed.

    Perhaps I assumed from Chip's P32X32B die usage that, because there was no die area allotted to ROM, the ROM would be part of the RAM. By keeping ROM to a minimum and using the RAM, quite a bit of driver logic/space/power was saved in the P2. Since it's a no-brainer, I simply anticipated it would be so.
  • potatoheadpotatohead Posts: 10,261
    edited 2014-04-06 20:38
    Isn't it all just one discussion?
  • jmgjmg Posts: 15,171
    edited 2014-04-06 20:44
    I think that due to my understanding the P1, I've always assumed that any hub access stalls the pipeline (thus only drawing quiescent current) until the hub cycle arrives, un-stalling the pipeline.

    Based on your message, I think your understanding was that while waiting for the next hub cycle, the cog runs at full power, and only drops to quiescent current on WAITPEQ/WAITPNE/WAITCNT.

    Not quite. HUB can stall the COG, based on timing, but what if there is no HUB slot, or they are rare?
    My scheme gives power envelope control (also speed deterministic) over cases that are not hub-locked, and also works in cases with no HUB access at all.

    Rather than user code & data flows setting the envelope, it is a specific design decision in the Map table (and thus harder to break accidentally)

    In an ideal world, there would be no Power envelope, and all this would be superfluous :)
    In the 180nm world, Chip has these parts so well optimised for speed, that Power is a big issue.
    At this point, I am REALLY interested in how much current a cog uses while waiting for a hub cycle!
    Current HW has ~10% power level idle on Waiting opcodes, and I guess HUB is, (or could be), the same.
    It may not have been coded-in that way, as they were always close together.
    At N/128 mapping, that detail matters more, and it could be coded to behave that way.
  • tonyp12tonyp12 Posts: 1,951
    edited 2014-04-06 20:44
    >will not burn much power at all as they will mostly be waiting for something to do

    What if the user loads 32cogs that don't use hub access or any waits, full blast would do what?
    1: Heat sensor will shut it down and reboot once it returns to regular temp
    2: each cog has a mhz multiplier for speed control, call it carbon credits. the total for all 32cogs can not exceed 60% of full blast.
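A rough sketch of option 2 (the "carbon credits" cap), assuming each cog's speed setting is expressed as an integer percentage of full blast. The function name and the percentage representation are hypothetical; only the 32 cogs and the 60% ceiling come from the post.

```python
NUM_COGS = 32
BUDGET_PERCENT = 60  # ceiling from the post: 60% of "all cogs at full blast"

def within_budget(duty_percent):
    """duty_percent: one speed setting per cog, each 0..100 (hypothetical unit)."""
    if len(duty_percent) != NUM_COGS:
        raise ValueError("expected one entry per cog")
    # Total allowed "credits" is 60% of 32 cogs running at 100%.
    return sum(duty_percent) <= BUDGET_PERCENT * NUM_COGS
```

Under this reading, one cog at full speed with the other 31 at 6% is well inside the budget, while 32 cogs at 100% is rejected.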
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-04-06 20:44
    Cluso99 wrote: »
    Brilliant!

    Those simple little cogs will not burn much power at all as they will mostly be waiting for something to do. And it is the Prop's philosophy in a nutshell.
    Then simple hubexec rounds this out to handle the big and faster programs.

    Thank you!

    Once I had a good day, and got over losing 8 cog p2 for now, the idea popped out easily, and it gives us a large chunk of the capabilities of hubexec, eliminates tasks, threads..
    Cluso99 wrote: »
    Do you want P1 instruction compatibility or P2S (the P1 subset using P2 opcodes) ?

    Based on what I've read, I see no harm and some benefit to keeping P1 binary compatibility. It gets us PropGCC and usable (for 32KB) spin immediately

    I'd add my new JMP / CALL / LOAD instructions, with the lowest 17 bits specifying a 128K LONG address. I'd use say $1EF as LR, hard wired (only steals 1 register from old pasm code, does not use 0, no need for mapping register / instruction / multiplexer; when not used by CALL/JMP it is simply the regular $1EF cog register).
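The address arithmetic in this proposal can be sketched as follows. This is only an illustration: the helper names, the register-file model, and the choice of "pc + 1" as the saved return address are assumptions, and no opcode encoding beyond "the lowest 17 bits are a long address" is implied.

```python
ADDR_BITS = 17
ADDR_MASK = (1 << ADDR_BITS) - 1   # lowest 17 bits of the instruction
LR = 0x1EF                          # cog register proposed as the link register

def reachable_bytes():
    """128K longs of 4 bytes each, which covers a 512KB hub."""
    return (1 << ADDR_BITS) * 4

def call(cog_regs, pc, instruction):
    """Save a return address in LR and return the new program counter.

    Assumption: the return address is the long after the CALL.
    """
    cog_regs[LR] = (pc + 1) & ADDR_MASK
    return instruction & ADDR_MASK

def ret(cog_regs):
    """Return to the address held in LR."""
    return cog_regs[LR] & ADDR_MASK
```

When no CALL is in flight, `cog_regs[LR]` is just an ordinary register in this model, which matches the "only steals 1 register" remark.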

    Y'all wanted P1 simplicity. This gives that, and a good chunk of P2 hubexec, with extremely fine grained 100% deterministic hub timing, that could be set up to be exactly P1 hub timing compatible for 8 cogs.
    Cluso99 wrote: »
    Chip worked out we could have 32 P1 2clock cogs and 512KB hub.

    Yep, and I love that - we should be able to fit in JMP/CALL/LOAD... heck, we could use the old MUL opcode, or one of the few unused mappings (that I vaguely recall might exist)
    Cluso99 wrote: »
    Let us scale that back slightly to say 24+ cogs so we have a little room for the hub slot mechanism and the simple hubexec mode.

    The beauty is that we don't have to!

    - If we don't start a cog it does not need any slots

    - If we have mooch, we can start a cog without any assigned slots, let it mooch unused slots. Can you say perfect for background garbage collection????

    - or mooch can speed up a hubexec cog.

    I want all 32 cogs, 512KB, I love the luxury that they are there if I need them - and take no power if not started or waiting for SOMETHING. I see no need to scale back to 24.

    For some applications, this would actually be better than a P2.

    - Prop Manual only needs a small update

    - drop the multiplier, assign slots carefully, you can run it as 100% binary compatible to P1
  • jmgjmg Posts: 15,171
    edited 2014-04-06 20:53
    RossH wrote: »
    I agree code protect may end up in the "minimum" list. That has been requested many times, and appears to have limited the applicability of the P1. And if that means we can't boot into Spin then so be it.

    But (as far as I know) we have not had these discussions yet. I'm happy to have them, but these things should be made part of the "minimum" list at the end of such a discussion, not at the beginning.

    Err, I think the ROM handling Chip has at 180nm, makes it explicit there is no Spin ROM. (or other lookup ROMs)
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-04-06 20:53
    Mooch would be a huge win in this P1E version.

    The hub slot assignment table makes perfectly deterministic bandwidth control for cogs based on actual need extremely easy.

    I am really, really tempted to keep it 100% binary compatible with existing P1, as with multiplier and hub cycle control it could even be 100% timing compatible, so let's find space for COGRUN options in unused opcodes :) maybe we can fit everything into the unused MUL slot, or use registers or hub locations.

    Thanks, this should be very cheap in terms of power and gates. Chip will tell us. Yep, it would allow a few fast higher bandwidth deterministic cogs.

    Mooch could be used for:

    - non-deterministic boost for compiled C code (business logic, HMI etc)

    OR

    - "free" garbage collector (say for Java, JS, python or other language that needs it)
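One possible reading of mooch, as a minimal sketch: the cog that owns a slot always wins it, and a mooching cog is served only when the owner is not asking for the hub. The function name, the arguments, and the priority order among several moochers are assumptions, not anything settled in the thread.

```python
def arbitrate(slot_owner, requesting, mooch_cogs):
    """Pick the cog that gets the current hub slot, or None if nobody does.

    slot_owner: cog assigned this slot by the table, or None if unassigned
    requesting: set of cogs that want a hub access on this slot
    mooch_cogs: cogs allowed to mooch, in (assumed) priority order
    """
    # The assigned cog keeps its deterministic timing: it always wins.
    if slot_owner is not None and slot_owner in requesting:
        return slot_owner
    # Otherwise the slot would go to waste, so a moocher may take it.
    for cog in mooch_cogs:
        if cog in requesting:
            return cog
    return None
```

Because the owner is checked first, a moocher such as a garbage collector never delays a cog with assigned slots; it only picks up bandwidth that would otherwise be idle.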
    potatohead wrote: »
    @Bill: mooch lives! (let's hope) I would miss mooch. Let's hope he sticks around.

    I've been thinking about the cool COGRUN options we worked out. It would be really nice to get the ability to start a COG without loading it, and/or at a given address with or without loading it. Seems this one is cheap, and given the mapping scheme, might allow for a couple of high throughput COGS, able to jump on a task quickly, then go away when done, saving power.
  • RossHRossH Posts: 5,454
    edited 2014-04-06 20:59
    jmg wrote: »
    Err, I think the ROM handling Chip has at 180nm, makes it explicit there is no Spin ROM. (or other lookup ROMs)

    Can you provide a link?

    Ross.
  • potatoheadpotatohead Posts: 10,261
    edited 2014-04-06 21:01
    It's buried in the P2 thread, but the gist of it was the ROM required extra addressing in the P2 design. Simpler to just hard code some RAM. I do not recall the details of the size discussion, but I think the ROM cells would have been larger? Or was it that the RAM cells were larger, but there were not that many of them, and we didn't want to kludge in an address line?

    That should be enough for one of us to find the discussion. (goes off to look)

    In any case, I do not think having the P1 ROM needs to be on the minimum list as it can always be supplied right along with anything else needed, such as lookups, fonts, etc... We need booter and the crypto in order to make effective use of the fuses, which will be in the design, and ensure code protect works.

    Personally, I think the monitor makes sense. It sure does in the testing phase, and we will need a testing phase anyway.

    Before I added a SPIN ROM, I would prefer debugger, for a quintet! booter, crypto, monitor, debugger, leaving all else to be loaded as needed.
  • jmgjmg Posts: 15,171
    edited 2014-04-06 21:04
    tonyp12 wrote: »
    What if the user loads 32cogs that don't use hub access or any waits, full blast would do what?
    1: Heat sensor will shut it down and reboot once it returns to regular temp
    2: each cog has a mhz multiplier for speed control, call it carbon credits. the total for all 32cogs can not exceed 60% of full blast.

    Valid points, and we do not yet have SIM values.
    As (180nm + Possible MHz Chip reaches) combine to push the package, I think some form of thermal monitoring is going to be needed.
  • jmgjmg Posts: 15,171
    edited 2014-04-06 21:08
    potatohead wrote: »
    It's buried in the P2 thread, but the gist of it was the ROM required extra addressing in the P2 design. Simpler to just hard code some RAM.

    In any case, I do not think having the P1 ROM needs to be on the minimum list as it can always be supplied right along with anything else needed, such as lookups, fonts, etc... We need booter and the crypto in order to make effective use of the fuses, which will be in the design, and ensure code protect works.

    Correct, but it seems some thought this was more like an original P1, than it really is ?

    "We need booter and the crypto in order to make effective use of the fuses, which will be in the design, and ensure code protect works" is a good point.
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-04-06 21:13
    Based on what I've read, I see no harm and some benefit to keeping P1 binary compatibility. It gets us PropGCC and usable (for 32KB) spin immediately
    With Rom at the bottom of hub, maybe there is a minor change required to spin - I cannot recall. But, SPIN gets almost a full 64KB immediately because it can address the whole P1 hub ram and rom!

    If we use P1 binary compatible, we can use NR for the WAITPxx selector for Port B. This is simpler and nicer.
    There were 4 unused instructions MUL/MULS/ENC/ONES. As has been mentioned, DJNZ/TJNZ/TJZ could use modified opcodes.
    P2 changed the Z & C flag setting on some instructions to use the result instead of the original value. This needs discussion.
    But these are all minor things, and as long as Chip is reminded, he can make the decision.
    I'd add my new JMP / CALL / LOAD instructions, with the lowest 17 bits specifying a 128K LONG address. I'd use say $1EF as LR, hard wired (only steals 1 register from old pasm code, does not use 0, no need for mapping register / instruction / multiplexer; when not used by CALL/JMP it is simply the regular $1EF cog register).
    The new JMP/CALL/RET would be similar to the existing JMPRET instruction, with the 17 bit saved address being stored in $1EF if the NR bit is not set (r=1).
    The P1 JMPRET instruction uses WZ to set Z if the result is zero (pretty useless - for wraparound detection). Perhaps WZ & WC could be used as per P2 to save/restore the Z & C flags as well.

    If we are short of opcodes, then the LOAD could use the new JMPRET with b17=1.
    Y'all wanted P1 simplicity. This gives that, and a good chunk of P2 hubexec, with extremely fine grained 100% deterministic hub timing, that could be set up to be exactly P1 hub timing compatible for 8 cogs.
    A resounding yes!

    "We could reduce to 24+ cogs..."
    The beauty is that we don't have to!
    This was to provide silicon space for the few additional things required. Nothing to do with slots/power/etc.
  • RossHRossH Posts: 5,454
    edited 2014-04-06 21:15
    jmg wrote: »
    Correct, but it seems some thought this was more like an original P1, than it really is ?

    "We need booter and the crypto in order to make effective use of the fuses, which will be in the design, and ensure code protect works" is a good point.

    Fair point. This is not a P1, so my comments may be out of order. I am just interested because I can see the same discussion occurring for the P16X32B.

    Ross.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-04-06 21:16
    jmg wrote: »
    Not quite. HUB can stall the COG, based on timing, but what if there is no HUB slot, or they are rare?
    My scheme gives power envelope control (also speed deterministic) over cases that are not hub-locked, and also works in cases with no HUB access at all.

    Excellent question.

    I would believe that the programmer would have useful, needed work running in the cog, that in order to have deterministic timing, should not have its clock arbitrarily changed.

    I am not at all opposed to a cog_clock_divider[32] array that slows a cog's clock by dividing it. I am not sure it is needed, but I do not oppose it. Pretty sure some others will :)

    Thinking about it, a better solution would be a new instruction:

    SETMYCLOCKDIV #n

    That way, no need for a clock divider array, and presumably code running in a cog would know how much its clock should be divided.
    jmg wrote: »
    Rather than user code & data flows setting the envelope, it is a specific design decision in the Map table (and thus harder to break accidentally)

    In an ideal world, there would be no Power envelope, and all this would be superfluous :)
    In the 180nm world, Chip has these parts so well optimised for speed, that Power is a big issue.

    I see what you would like added, and I think my 'setmyclockdivide' instruction for a cog is a good solution, if Chip wants to add it.

    I think we can't add your power cycle control bit to the hub slot as the slot array is indexed by [cnt&$1F], and NOT by cog number. It contains the cog to give the hub slot to.
    jmg wrote: »
    Current HW has ~10% power level idle on Waiting opcodes, and I guess HUB is, (or could be), the same.
    It may not have been coded-in that way, as they were always close together.
    At N/128 mapping, that detail matters more, and it could be coded to behave that way.

    Thanks. 10% is good, I am surprised it is not lower.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-04-06 21:20
    If the user needs to do that, he needs to do precisely that - we should not prevent him arbitrarily.

    I do not like auto reboot or shutdown, the user presumably needs that action.

    Specifically I like the idea of the total chip temp being available, and one cog could monitor and take appropriate action for the appropriate temperature - fully under users control.

    The idea of carbon credits is far too complicated, as is 32 temperature sensors. The power envelope issue is for the chip as a whole, so 32 temperature sensors, adc's etc is way overkill, and not needed.
    tonyp12 wrote: »
    >will not burn much power at all as they will mostly be waiting for something to do

    What if the user loads 32cogs that don't use hub access or any waits, full blast would do what?
    1: Heat sensor will shut it down and reboot once it returns to regular temp
    2: each cog has a mhz multiplier for speed control, call it carbon credits. the total for all 32cogs can not exceed 60% of full blast.
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-04-06 21:22
    Mooch would be a huge win in this P1E version.

    The hub slot assignment table makes perfectly deterministic bandwidth control for cogs based on actual need extremely easy.

    I am really, really tempted to keep it 100% binary compatible with existing P1, as with multiplier and hub cycle control it could even be 100% timing compatible, so let's find space for COGRUN options in unused opcodes :) maybe we can fit everything into the unused MUL slot, or use registers or hub locations.
    Opcode space for opcode 000011 only uses bits 2:0 of S for CLKSET/COGID/COGINIT/COGSTOP/LOCKxxxx.

    Remind me, why do we need COGRUN ? (senior moment or more coffee needed)
  • potatoheadpotatohead Posts: 10,261
    edited 2014-04-06 21:33
    COGRUN was to simply request a COG from the pool of free, not running COGS, not start a specific COG, which could fail, if it's running. This was to avoid having to track COGS, and is considered P1 best practice. :)
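The "request a cog from the pool" behaviour described here can be sketched as below. The function name and the lowest-numbered-first policy are assumptions for illustration; the point is only that the caller never names a specific cog, so it cannot collide with one that is already running.

```python
def cog_run(running):
    """Claim a free cog from the pool and return its number.

    running: list of booleans, True where a cog is already running.
    Returns None when every cog is busy.
    Assumption: the lowest-numbered free cog is chosen.
    """
    for cog, busy in enumerate(running):
        if not busy:
            running[cog] = True   # mark it as started
            return cog
    return None
```

A caller that keeps the returned number can stop that cog later without maintaining its own table of which cogs are in use.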

    It's deffo the coffee. (goes to get a cup, myself)
  • jmgjmg Posts: 15,171
    edited 2014-04-06 21:58
    I am not at all opposed to a cog_clock_divider[32] array that slows a cog's clock by dividing it. I am not sure it is needed, but I do not oppose it.

    Once the new Sim Numbers are in, we will know what uses power where.
    Exactly what speed HUB ram can hit will also be shown in Sims ( 200MHz ? 100MHz?)


    Thinking about it, a better solution would be a new instruction:

    SETMYCLOCKDIV #n

    That way, no need for a clock divider array, and presumably code running in a cog would know how much its clock should be divided.

    I originally approached it this way, but was concerned about possible 'beat' effects between the HUB allocate and the Clock Enables, so I figured it would be better to allocate a bit per cog per slot into the Mapping array.
    That gives a spread sheet style view of Power and Hub Slots and COG Slots (clock enables), no surprises.

    It makes power handling a system-level decision, but one that could be redefined at runtime if needed.
  • jmgjmg Posts: 15,171
    edited 2014-04-06 22:19
    With the slot mapping, no need to have merged kb/mouse/serial drivers!
    ...
    Resources left free: 11 cogs, 7/128 hub slots

    100% deterministic timing, by user! TOTALLY configurable!

    Removes need for tasks, threads totally!

    Another detail this overlooks, is a modulus control on that N/128 scan.

    There may be Power/Hub allocation that must repeat with no jitter across a user selected Active-COG count.
    A fixed N/128 does not allow that.
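One way to see the jitter concern, as a sketch under assumptions: fill the scan round-robin with only the active cogs and measure the spacing between one cog's successive slots, including the wrap back to the start of the table. The function and its round-robin fill are illustrative, not a description of any agreed design.

```python
def slot_gaps(active_cogs, scan_length, cog=0):
    """Return the set of distances between successive slots of `cog`."""
    table = [i % active_cogs for i in range(scan_length)]
    positions = [i for i, owner in enumerate(table) if owner == cog]
    # Append the first slot of the next pass so the wrap-around gap is counted.
    following = positions[1:] + [positions[0] + scan_length]
    return {b - a for a, b in zip(positions, following)}
```

With 12 active cogs and a fixed scan of 128, cog 0 sees gaps of both 12 and 8 clocks because 128 is not a multiple of 12. If the scan could be shortened to 120, every gap would be 12.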
  • Brian FairchildBrian Fairchild Posts: 549
    edited 2014-04-07 01:41
    mindrobots wrote: »
    1) maximum acceptable power ceiling - if you don't specify that, history will repeat itself.
    2) requirements frozen on mm/dd/yy to make mm/yy shuttle so chip can be GA from Parallax by mm/yy.

    Yes and yes.
    Cluso99 wrote: »
    1. 512KB hub ram (32bits wide)
    2. 16-32 P1 2-clock little lean and mean Cogs
    3. 160-200 MHz
    4. 64 I/O with ADC etc from the P2
    5. TQFP100 0.5mm with thermal ground pad
    6. P2 Security fuses and mode
    7. P2 simple boot and monitor
    8. P2 style boot from SPI Flash (not I2C Eeprom)
    9. WAITPEQ/WAITPNE - need method for PORT A/B selection

    Yes to all except 4 which has a 'only if easy/low power' caveat.
    Cluso99 wrote: »
    10. Some form of hub slot allocation by sw (Bill has made some excellent suggestions)
    11. Simple hubexec as I suggested

    Yes and yes.
    Cluso99 wrote: »
    12. P2 instruction format for P1 instructions (becomes subset for later P2) ???
    13. Some form of cog-cog parallel communication (relieves hub bandwidth, and faster) ???
    14. Add P2 single MATHS, CORDIC, maybe a P2 COUNTER, etc shared subsystem ???
    15. P1 Video block - add read capability to perform simple serial input (can already do output) ???
    16. P2 Video block - is it possible to add a single shared video block ??? Is 1 enough ???
    17. Add P2 instructions TARG (simple variant), maybe AUGS and AUGD ???
    -- TARG (only valid for next instruction) - if P2 instruction subset we don't have NR
    -- Is it simple and doable ???
    18. A "WAIT n" instruction to conserve power would be nice

    No.
    Yes.
    Yes. No.
    Yes if easy
    Yes if easy
    Don't know enough about them.
    Is it needed? Can't you just waitcnt?


    Question - is there a concise explanation of the P2 video block somewhere?
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-04-07 02:04
    Brian,
    "WAIT n" could be used to sprinkle throughout code where short delays are required. This would save power. It would not destroy the WAITCNT so it could still be used. But WAITCNT has a minimum number of clocks - you cannot just do 1 or 2. Hence my thoughts about WAIT n.

    TARG can be used to redirect the following instruction's destination. Effectively it allows A + B = C instead of A + B = B. However, in the P2 instruction set, we no longer have the NR bit, so TARG would sidestep this by writing the result of an instruction to another destination (a scratch register).
    If we use the P1 opcodes, then TARG isn't as attractive as it is with P2 opcodes.

    AUGS and AUGD were P2 methods to increase an S or D value by adding upper bits x:9 using the AUGS and/or AUGD instruction before the instruction that gets its S or D result modified.
    Again, I think it is more a consideration with P2 opcodes. It aids Hubexec mode to load a larger immediate value. Of course it is unnecessary.
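A small sketch of the idea behind AUGS/AUGD, assuming a 9-bit S or D field and a 32-bit final value, so that the prefix instruction would supply the upper 23 bits. The exact bit range is not stated above ("x:9"), so the widths and function names here are assumptions.

```python
FIELD_BITS = 9
FIELD_MASK = (1 << FIELD_BITS) - 1          # 9-bit S or D field
UPPER_MASK = (1 << (32 - FIELD_BITS)) - 1   # assumed 23 upper bits

def split_immediate(value):
    """Split a 32-bit value into (upper bits for the prefix, 9-bit field)."""
    return (value >> FIELD_BITS) & UPPER_MASK, value & FIELD_MASK

def augmented(upper, field):
    """Value seen by the instruction that follows the prefix."""
    return ((upper & UPPER_MASK) << FIELD_BITS) | (field & FIELD_MASK)
```

This is why the prefix helps hubexec code: a full-width constant or address can be loaded without first placing it in a cog register.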

    As for the P2 Video modes, I haven't played with them. Better for someone else to explain.
  • jmgjmg Posts: 15,171
    edited 2014-04-07 03:00
    localroger wrote: »
    If we are talking minimums, we need to talk minimums.

    The minimum is P1 running at 160-200 MHz with 64 I/O and 256K Hub RAM.

    Chip says 16 cogs and 2 cycle instruction execution are easy, so I'll take those.

    EVERYTHING ELSE IS GRAVY. DAC or ADC resistors under the I/O pads? If easy do it, if not drop. That is the watchword for everything else. If it's easy do it, but if it's not easy just get us a chip with the better clock, more I/O and RAM. We've been waiting too long.

    P1 is bottlenecked because it doesn't have enough business logic RAM and the best schemes for external RAM use too many I/O pins. More RAM! More pins! Everything else is gravy! Tasty delicious gravy and we want it very much, sure, but before we get to the gravy we have to have the MEAT. I would rather have the processor that is twice as fast with twice the pins and twice (well all right eight times) the RAM that can actually get produced than the one with five times, ten times, twenty times, and magic pixie dust that remains a simulation on a $500 FPGA.

    This 'true minimum' approach, does raise a road map option.
    Many vendors have a series of devices in compatible packages, to allow users to select and upgrade existing designs.
    This also gives more rapid design ramp.
    Applying this focus, would see

    X) a minimal 180nm design, single supply device, that meets (4c) P1 running at 160-200 MHz with 64 I/O and 256K Hub RAM, and also with an on-chip regulator, and a Boot-option that can load either SPI or i2c.
    This would be designed to subset bond into existing 44/40 pin package, for system upgrades, as well as 64io packages.
    It would focus on the smaller-die, and lower cost end of the 180nm spectrum.
    Around the performance of a 200MHz P1@4c should be possible with a larger set of COGs (but still few enough to meet the Power Envelope of the cheaper, smaller package options).
    Needs an OnSemi Sim run to define Cog # inside the smaller package choices. Might even reach QFN32 ?

    and also at 180nm, above X) with some margin is Y) :

    Y) A larger, Power-envelope dictated device, Dual Supply (switching regulator), for TQFP100 exposed PAD, at 20'C/W.
    Because 8 x P2 COGs cannot meet the Power envelope, it would have fewer P2 cogs, and some P2 Subset COGS, and more memory than X). The exact splits set by Power-envelope values.
    Die size is Chip's maximum, and memory is sized large, to target LCD Displays.
    The external Core switching regulator means higher applied costs, but gives more user choices on Peak cases.

    Parallax then have 3 devices, at the end, with some space between families.
  • SRLMSRLM Posts: 5,045
    edited 2014-04-07 04:08
    What about all the "small" features of the P2 that we're missing? Personally, I've looked forward to a 64 bit CNT.
  • tonyp12tonyp12 Posts: 1,951
    edited 2014-04-07 07:09
    > I do not like auto reboot or shutdown, the user presumably needs that action.
    > Specifically I like the idea of the total chip temp being available, and one cog could monitor and take appropriate action for
    1: A user has code that has the 1st cog running at full-blast and the other 31 at 6%; his cognew loop forgets to +1, so it puts the full blast cog on to all of them.
    2: People find out that you can kill the mcu and even start a fire by this trick, will for sure release a program called: greatest demo ever.

    A: Having a heatsensor reboot will save the chip and the house from burning down.
    B: Having a digital carbon credit counter, one that actually adds up all cogs active cycles vs sleep as (call it watchdog)
    is better than forcing some cog to run a 1/8 etc as some programs maybe have a 1% duty but needs full blast when it's awake.
    But a register for the cog speed divider that physically cannot set all 32 cogs to a divider of 0 is probably simpler.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-04-07 07:43
    jmg,

    I think you may be misunderstanding my slot allocator table.

    Chip's P1+ is proposed as 200MHz clock (and hub), 2 clock cog instructions, and yep, we need simulation results, but before then, with the assumption that his idea works (just may need to be scaled down for lower clock speed), I came up with an allocation scheme, indexed not by cog number, but by clock cycle.

    After crunching numbers, I decided that 2x number of cogs was too small for the allocation array, as it wasted hub bandwidth on simple driver cogs, and did not give enough flexibility.

    4x seemed pretty good, and only needs 128*5 bits of fast memory.

    I envisioned a default fill pattern of 4x[0..31], which would be the same as Chip's non-mapped idea.
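The table described above can be sketched like this. The helper names and the example re-assignment are hypothetical; the sizes (32 cogs, 4x entries, 5 bits per entry) and the default fill come from the post.

```python
from collections import Counter

NUM_COGS = 32
TABLE_SIZE = 4 * NUM_COGS   # 128 entries
BITS_PER_ENTRY = 5          # enough to name one of 32 cogs

def default_table():
    """Default fill: 0..31 repeated four times, i.e. plain round-robin."""
    return [slot % NUM_COGS for slot in range(TABLE_SIZE)]

def owner(table, cnt):
    """The table is indexed by clock count, not by cog number."""
    return table[cnt & (TABLE_SIZE - 1)]

def bandwidth_share(table):
    """Fraction of hub slots each cog receives."""
    return {cog: n / TABLE_SIZE for cog, n in Counter(table).items()}

# Hypothetical re-assignment: hand cog 31's four slots to cog 0.
boosted = [0 if cog == 31 else cog for cog in default_table()]
```

In the default fill every cog gets 4/128 of the slots; after the re-assignment cog 0 holds 8/128 and cog 31 holds none, with every other cog's timing unchanged. Total storage is 128 x 5 = 640 bits.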

    In order to use it to gate clock on a per-cog basis, it would need one bit per cog, or 32 bits for the gating, in each table entry.

    I think it would be much simpler to have a separate "clock gating / power array", with one gating bit per cog in each element.

    Say you wanted 1/8 cycle control,

    power_gate long 8[$FFFFFFFF] ' indexed by cnt&7, each bit in a long corresponds to each of the 32 cogs.

    default behavior, every cog gets clock enabled every cycle.

    If you need finer grain clock gating control than 1/8, a 16 entry table would provide 1/16 resolution, and so on.

    As the 128 entry slot mapping table would already save power (10% I think you mentioned as the power figure for a cog waiting for something; I am surprised it is that high, I'd have expected very close to leakage current)

    Wait! I think I have something better.

    No power array needed.

    Each cog could do "SETCLOCKGATE D" where D holds the gate bits for 32 clock cycles, each bit enabling or disabling the cog's clock for the cycle selected by cnt&$1F.

    I was bothered by the array influencing the cog outside of the code writer's control, but if a driver only needs 1/4 of the cycles and 1/128 of slots, the driver writer can set the power for that cog and that cog only.
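A minimal model of the per-cog gate, assuming bit n of D enables the cog's clock on cycles where cnt&$1F equals n. The bit ordering and the function names are assumptions; SETCLOCKGATE itself is only a proposal in this thread.

```python
def clock_enabled(gate, cnt):
    """True if the cog's clock is enabled on this cycle."""
    return bool((gate >> (cnt & 0x1F)) & 1)

def duty(gate):
    """Fraction of the 32-cycle window in which the cog is clocked."""
    return bin(gate & 0xFFFFFFFF).count("1") / 32
```

For the "1/4 of the cycles" driver mentioned above, a gate of $11111111 clocks the cog on every fourth cycle, and $FFFFFFFF reproduces the ungated default.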
    jmg wrote: »
    Once the new Sim Numbers are in, we will know what uses power where.
    Exactly what speed HUB ram can hit will also be shown in Sims ( 200MHz ? 100MHz?)

    I originally approached it this way, but was concerned about possible 'beat' effects between the HUB allocate and the Clock Enables, so I figured it would be better to allocate a bit per cog per slot into the Mapping array.
    That gives a spread sheet style view of Power and Hub Slots and COG Slots (clock enables), no surprises.

    It makes power handling a system-level decision, but one that could be redefined at runtime if needed.

    I am 100% convinced hub slot allocate must be kept separate from cog clock gating. See above. The cog should control its own clock gating, as only the code in it knows how many clock cycles it needs... and dropping below the minimum it needs could cause huge problems.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-04-07 07:49
    Tony,

    1) Your "one cog running full blast" is likely impossible, as it would require a single simple cog to consume enough power to melt the packaging, PCB, and find something to ignite.

    Even the unrealistic 8 cog full-bore P2, running all 8 cogs full-out with all peripherals and math at 180MHz, would not generate enough heat to do that, never mind a single P1+ cog!

    32 temperature sensors, and ADC for them, is a lot of resources for not nearly enough justification.

    With synthesis, the 32 cogs are not likely to be in a nice simple rectangular cog box that is easy to place a temp sensor in, but an intermeshed, intertangled mess.

    2) The chip shutting down or rebooting automatically could stop a machine at the wrong moment, potentially causing serious injury

    3) Overall chip temp should be monitored, and having a cog, which can be programmed how to do a quick safe shutdown monitoring it addresses your concerns, while preventing potential disasters.

    We have ZERO information on P1+ power envelope, and given that its whole reason for being is to reduce the P2's power envelope, it will have to be lower, or this whole argument disappears as a P2 is made.

    If it is lower, if the P2 @ 8W can not cause a fire, a lower wattage P1+ definitely cannot. Your fear, while understandable, is misplaced.

    Go into a modern factory. You have emergency stop buttons, that activate a controlled quick shutdown. You can cause great damage by just pulling the breaker, which you would only risk if the emergency stop failed somehow (kentucky fried plc?)
    tonyp12 wrote: »
    > I do not like auto reboot or shutdown, the user presumably needs that action.
    > Specifically I like the idea of the total chip temp being available, and one cog could monitor and take appropriate action for
    1: A user has code that has the 1st cog running at full-blast and the other 31 at 6%; his cognew loop forgets to +1, so it puts the full blast cog on to all of them.
    2: People find out that you can kill the mcu and even start a fire by this trick, will for sure release a program called: greatest demo ever.

    A: Having a heatsensor reboot will save the chip and the house from burning down.
    B: Having a digital carbon credit counter, one that actually adds up all cogs active cycles vs sleep as (call it watchdog)
    is better than forcing some cog to run a 1/8 etc as some programs maybe have a 1% duty but needs full blast when it's awake.
    But if the register that sets up the cog speed divider can not physically set all 32cogs to 0 divider is probably simpler.
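
The activity counter in option B of the quote can be illustrated with simple arithmetic. This is a minimal sketch: the function names and the budget of 8 cog-equivalents are made up for illustration, and only the "one cog at 100%, 31 cogs at 6%" figures come from the quoted post.

```python
def total_activity(duty_by_cog):
    """Sum per-cog duty cycles (0.0 to 1.0) into 'cog-equivalents' of load."""
    return sum(duty_by_cog)


def over_budget(duty_by_cog, budget_cogs):
    """True if the summed activity exceeds a chip-wide budget (hypothetical)."""
    return total_activity(duty_by_cog) > budget_cogs


# Intended program from the quote: one cog flat out, 31 cogs at 6% duty.
intended = [1.0] + [0.06] * 31

# The buggy cognew loop: the full-blast code ends up on all 32 cogs.
runaway = [1.0] * 32

BUDGET_COGS = 8.0  # placeholder value, not a real P1+ limit

print(round(total_activity(intended), 2), over_budget(intended, BUDGET_COGS))
print(round(total_activity(runaway), 2), over_budget(runaway, BUDGET_COGS))
```

A chip-wide budget like this would leave a 1%-duty program free to run flat out while awake, which a fixed per-cog divider would not.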
  • tonyp12tonyp12 Posts: 1,951
    edited 2014-04-07 07:55
    >as it would require a single simple cog to consume enough power to melt the packaging
    One cog cannot do it, but all 32 together could if, by malicious intent or by mistake, they are all running at full blast (I don't know, but onsemi could confirm).
    Having a heat sensor that is software readable is good, but there should be an upper limit in hardware that a hacker or a mistake cannot bypass.

    Amiga Floppy drive playing El C
  • Brian FairchildBrian Fairchild Posts: 549
    edited 2014-04-07 07:59
    Surely the proper engineering solution is to design a chip that, when fitted to an appropriately sized board, will never be able to kill itself.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-04-07 08:03
    32 cogs running full blast MIGHT be able to do it if they can generate enough watts of waste heat, which they almost certainly can't. However, that cannot be quantified until we get simulated maximum power ratings. If the chip could draw, say, 100W, it would likely vaporize the power supply traces, some capacitors, etc., if the voltage regulation could not go into thermal shutdown fast enough. That would require far more watts to be drawn in very few milliseconds.

    We pretty much know 8W cannot make it catch fire, and as the P1+ is designed to draw far less, less than 8W DEFINITELY could not cause the PCB to catch fire!

    Having said that,

    I would actually be very interested in how many watts would need to be drawn to cause a PCB to catch fire.


    I've often had to check PCB trace current-carrying capacity for similar reasons, and generally the tables I use indicate temperature rise per amp for different copper thicknesses and trace widths.
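
For readers without such tables to hand, the commonly cited IPC-2221 curve fit gives the same kind of estimate. The sketch below is an assumption on my part, not necessarily the source of the tables mentioned above; the function names are made up, and the fit is only meant for modest currents and temperature rises.

```python
OZ_COPPER_THICKNESS_MIL = 1.378  # 1 oz/ft^2 copper is roughly 1.378 mil thick


def _k(external):
    """IPC-2221 fit constant: external layers cool better than internal ones."""
    return 0.048 if external else 0.024


def max_current_a(width_mil, temp_rise_c, copper_oz=1.0, external=True):
    """Current (A) giving `temp_rise_c` degrees C of rise: I = k * dT^0.44 * A^0.725."""
    area_mil2 = width_mil * copper_oz * OZ_COPPER_THICKNESS_MIL
    return _k(external) * temp_rise_c ** 0.44 * area_mil2 ** 0.725


def temp_rise_for_current(current_a, width_mil, copper_oz=1.0, external=True):
    """Invert the same fit to estimate temperature rise (degrees C) at a given current."""
    area_mil2 = width_mil * copper_oz * OZ_COPPER_THICKNESS_MIL
    return (current_a / (_k(external) * area_mil2 ** 0.725)) ** (1 / 0.44)


# A 10 mil wide, 1 oz external trace allowed a 10 degree C rise.
print(round(max_current_a(10, 10), 2))
```

By this fit, a 10 mil, 1 oz external trace carries roughly 0.9 A for a 10 degree C rise, so the supply traces for a chip drawing several watts need to be sized deliberately, which is a long way from anything catching fire.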

    Solder masks are designed not to catch fire, and PCBs (mostly epoxy these days) are pretty difficult to set on fire.

    Even if your trace is too narrow to carry the current you are trying to pull, the most likely result is just browning of the trace/PCB. You would need a LOT of current to vaporize the trace or to make the PCB catch fire, and a vaporized trace could no longer carry enough current to the P1+ to make it catch fire. It never could anyway, as it could not ask for enough current to cause the PCB / traces to catch fire.

    Your concern, while appreciated, is entirely misplaced.


    I am now really, really interested in how much current would be needed to:

    a) make a P2 / P1 literally catch fire

    b) make a PCB burn

    I am guessing it would be FAR more than 8W, and that PCB traces supplying power would vaporize before the chip could burn.

    And before the chip could burn, or the traces vaporize, the power supply / voltage regulator, designed to deliver far fewer amps, would do a thermal shutdown, thus shutting down the processor.

    Your concern is misplaced, but I really want to know how many watts it would take :)
    tonyp12 wrote: »
    >as it would require a single simple cog to consume enough power to melt the packaging
    One cog cannot do it, but all 32 together could if, by malicious intent or by mistake, they are all running at full blast (I don't know, but onsemi could confirm).
    Having a heat sensor that is software readable is good, but there should be an upper limit in hardware that a hacker or a mistake cannot bypass.