Hub Slot Mapping — Parallax Forums

Hub Slot Mapping

Bill Henning Posts: 6,445
edited 2014-05-07 16:54 in Propeller 2
I originally proposed 128 slots, but 64 slots would be enough (128 would still provide finer control).

9 bits per entry, defined as:

Rccccmmmm

R - reset table index counter (for jmg)
cccc - cog this hub cycle is assigned to
mmmm - if cog cccc does not need the slot, give it to mooching cog mmmm (great idea, Ray)
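
To make the entry layout concrete, here is a minimal Python sketch of packing and unpacking one 9-bit entry. The bit positions (R in bit 8, cccc in bits 7..4, mmmm in bits 3..0) are read straight off the "Rccccmmmm" string and are an assumption, as are the function names; nothing here is actual chip logic.

```python
# Sketch of the 9-bit slot entry "Rccccmmmm".
# Assumed bit layout: R = bit 8, cccc = bits 7..4, mmmm = bits 3..0.

def pack_entry(reset, cog, mooch):
    """Pack (R, cccc, mmmm) into one 9-bit integer."""
    if not (0 <= cog <= 15 and 0 <= mooch <= 15):
        raise ValueError("cog and mooch must each fit in 4 bits")
    return (int(bool(reset)) << 8) | (cog << 4) | mooch


def unpack_entry(entry):
    """Split a 9-bit entry back into (reset, cog, mooch)."""
    return bool((entry >> 8) & 1), (entry >> 4) & 0xF, entry & 0xF
```

For example, an entry that assigns the cycle to cog 3, lets cog 5 mooch, and resets the index packs to 0x135.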

Personally, I strongly prefer a table size that is a multiple of 16, as reset/top cog will break OBEX objects; however, it can be useful in bare-metal cases.

Table should have at least 64 entries so low and medium speed drivers can donate most of their slots.
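
How a donated slot would flow to a moocher can be sketched with a small simulation. This is only an illustration of the idea, not the proposed hardware: entries are plain (R, cccc, mmmm) tuples, `needs_hub` is a hypothetical set of cogs that want the hub on every cycle, and the R bit is assumed to restart the table after the entry carrying it has been used.

```python
def run_slots(table, needs_hub, cycles):
    """Walk a slot table of (reset, cog, mooch) tuples.

    Returns the cog granted on each hub cycle, or None when neither the
    owning cog nor its moocher wants the slot.
    """
    grants = []
    index = 0
    for _ in range(cycles):
        reset, cog, mooch = table[index]
        if cog in needs_hub:
            grants.append(cog)        # owner uses its own slot
        elif mooch in needs_hub:
            grants.append(mooch)      # slot is donated to the moocher
        else:
            grants.append(None)       # slot goes unused
        # Assumption: R restarts the table after this entry is used;
        # the table also wraps when its last entry is reached.
        if reset or index == len(table) - 1:
            index = 0
        else:
            index += 1
    return grants
```

With a table of four entries where the third has R set, only the first three entries are ever visited, and an idle driver's slots all go to the mooching cog.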

The table must be writable by all cogs; otherwise every object would need modification before use.

If objects carry this information (e.g. "needs 4/64 slots"), the loader could build the table automatically.
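
A loader along those lines might look like the sketch below. The `requests` mapping, the `spare_cog` default, and the even-spacing strategy are all assumptions made for illustration; the post does not specify how a loader would place slots. Only the cccc field is filled in here; mooch assignments would be a separate pass.

```python
def build_table(requests, size=64, spare_cog=0):
    """Build a slot table from per-cog slot requirements.

    requests maps cog number -> slots needed out of `size`.
    Returns a list of `size` cog numbers (the cccc field only).
    """
    if sum(requests.values()) > size:
        raise ValueError("requests exceed table size")
    table = [None] * size
    # Place the most demanding cogs first so they get the most even spacing.
    for cog, count in sorted(requests.items(), key=lambda kv: -kv[1]):
        for k in range(count):
            pos = (k * size) // count
            # Probe forward to the next free entry if this one is taken.
            while table[pos] is not None:
                pos = (pos + 1) % size
            table[pos] = cog
    # Entries nobody asked for go to a designated spare cog (assumption).
    return [spare_cog if c is None else c for c in table]
```

A driver that declares 4/64 slots ends up at entries 0, 16, 32 and 48 when it is the only request.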

Leaving out this huge performance boost out of fear limits the application space the chip could compete in, and is therefore dumb.

Comments

  • jmg Posts: 15,148
    edited 2014-05-07 12:46
    9 bits per entry, defined as:

    Rccccmmmm

    R - reset table index counter (for jmg)
    cccc - cog this hub cycle is assigned to
    mmmm - if cog cccc does not need the slot, give it to mooching cog mmmm (great idea, Ray)

    Works for me, but right now my test cases show that mooch has penalties, i.e. the idea is good, shame about the speed.
    I am about to try a use-more-silicon approach, but that makes larger tables more costly, and larger tables are slower...

    An appeal of 32x is that a WrQUAD can achieve a fully atomic set/change.
  • Bill Henning Posts: 6,445
    edited 2014-05-07 12:51
    64 provides better bandwidth management

    Simpler mooch is possible

    1) only two cogs can mooch, even cog can use spare even cycles, odd can use spare odd

    2) 4 cogs can mooch, but only from cogid mod four cog unused

    Both should be faster in logic.
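
The two simpler mooch rules above can be sketched as plain lookup functions. This is one possible reading of the two rules, not a confirmed design: in variant 1 the spare slot goes to a fixed even or odd moocher based on the parity of the hub cycle, and in variant 2 it goes to one of four moochers chosen by the owning cog's id mod 4. All function and parameter names are hypothetical.

```python
def moocher_by_cycle_parity(cycle, even_moocher, odd_moocher):
    """Variant 1: two moochers; spare even cycles go to one, odd to the other."""
    return even_moocher if cycle % 2 == 0 else odd_moocher


def moocher_by_owner_mod4(owner, moochers):
    """Variant 2: four moochers, selected by the slot owner's cog id mod 4.

    moochers is a sequence of four cog ids, indexed by owner % 4.
    """
    if len(moochers) != 4:
        raise ValueError("exactly four moochers are expected")
    return moochers[owner % 4]
```

Either rule removes the per-entry mmmm lookup, since the moocher follows from the cycle number or the owner's id alone.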
  • jmg Posts: 15,148
    edited 2014-05-07 16:54
    64 provides better bandwidth management

    Sure, but atomic handling needs to be implemented, and it has area and speed costs.
    If it can be made atomic, stays off the critical path, and has negligible silicon cost, then x64 is fine.
    Simpler mooch is possible

    1) only two cogs can mooch, even cog can use spare even cycles, odd can use spare odd

    2) 4 cogs can mooch, but only from cogid mod four cog unused

    Both should be faster in logic.

    The speed problem is not in the mapping of the cog ID (once you have to change 1 bit, there is no speed cost to change all 4);
    the speed issue is in the fetch-time checking and indirection.

    See my other post
    http://forums.parallax.com/showthread.php/155561-A-32-slot-Approach-(was-An-interleaved-hub-approach)?p=1265817&viewfull=1#post1265817

    More parallel logic did not really help, but another pipeline stage to resolve F_NeedsHUB and delay the AllocCOG result does manage to give (Mooch_x15 & x32) at about 207.684 MHz, just a bit slower than x32.
    These pipelines (hopefully) tap into, and run in parallel with, existing pipelines and opcode 'need info now' timings.