Should the next Propeller be code-compatible? - Page 9 — Parallax Forums

Should the next Propeller be code-compatible?


Comments

  • jazzed Posts: 11,803
    edited 2008-08-29 18:37
    Yes Chip direct to hub would be nice, but that is impossible as is clear with the round-robin nature of COGS without strict priority servicing. The main reason for being interested in hub access in the first place is that software bit-fetch/bit-bang is too slow and one needs to do segmentation and reassembly somewhere when using multiple COGs. Adding a COG DMA engine surely would speed things up ... no?

    COG DMA should have no such hub constraints though and should be able to sample pretty darn fast right? Looking at the counter modules, they can be incremented at clock speed so I assume with a verilog/vhdl hardware state-machine assist and some arbitration if necessary, words can be DMA read or written in less than 4 clock cycles.

    Of course having an NRZ/NRZI SERDES solves many serial communications issues that even DMA can't solve :)

    Good luck with your project.



    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    --Steve
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2008-08-29 18:39
    Chip,

    If I understand the JMPD correctly, the following could be used to emulate two threads, assuming they don't include JMP or WAIT instructions:

    :loop   jmpd    pc0     'nop
            add     pc0,#1
                            'inst@old pc0
            jmpd    pc1     'nop
            add     pc1,#1
                            'inst@old pc1
            jmp     #:loop
    
    
    


    Does that look right? What's missing, of course, is context-switching. As a concession to software multi-threaders, would it be possible to include two sets of condition bits that could be swapped in one instruction? I think most of us would be satisfied with two. More than two threads, emulated in this fashion, would bog down. This kind of context-swapping could come in handy elsewhere, too, when you want to call a subroutine, say, that sets condition bits, while preserving those bits in the calling program.

    Thanks,
    Phil

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    'Still some PropSTICK Kit bare PCBs left!
  • cgracey Posts: 14,256
    edited 2008-08-29 18:42
    jazzed said...

    Yes Chip direct to hub would be nice, but that is impossible as is clear with the round-robin nature of COGS without strict priority servicing. The main reason for being interested in hub access in the first place is that software bit-fetch/bit-bang is too slow and one needs to do segmentation and reassembly somewhere when using multiple COGs. Adding a COG DMA engine surely would speed things up ... no?
    It would, but it would be really useful if it performed modulation/demodulation after/before reading/writing cog RAM, as you state at the end here.

    COG DMA should have no such hub constraints though and should be able to sample pretty darn fast right? Looking at the counter modules, they can be incremented at clock speed so I assume with a verilog/vhdl hardware state-machine assist and some arbitration if necessary, words can be DMA read or written in less than 4 clock cycles.
    It could actually hit every clock, so 32 bits could be moved at a 160MHz rate!

    Of course having an NRZ/NRZI SERDES solves many serial communications issues that even DMA can't solve :)

    Good luck with your project.



    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.
  • Beau Schwabe Posts: 6,568
    edited 2008-08-29 18:43
    A couple more layout pics that Chip has asked me to post...

    Phil,

    The cell in '3R1W_BIT_CELL.png' functions as a "...four-port memory..." that performs "...three reads and one write".

    Compare that to the standard 6T memory cell, which we also use, shown in '6T_BIT_CELL.png'.


    Our 6T memory cell is comparable to TSMC's 130nm process if it were scaled up to 180nm.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Beau Schwabe

    IC Layout Engineer
    Parallax, Inc.

    Post Edited (Beau Schwabe (Parallax)) : 8/29/2008 7:38:16 PM GMT
  • Rayman Posts: 14,849
    edited 2008-08-29 18:45
    Chip Gracey (Parallax) said...


    I've noticed the jitter to be within a nanosecond, which is 1/8 of a pixel at a 125MHz pixel rate - not noticeable on a CRT, but some LCDs (which all use internal PLLs to sync) exhibit occasional pixel-boundary jitter. A monitor with a good PLL system always looks rock-solid, though. Some monitors are better than others in this regard.
    I don't mean jitter, I'm referring to a fixed timing offset between plls...

    But, maybe I did something wrong... Now that I think about it again, the cursor wouldn't look so good if the PLLs weren't pretty well sync'd.

    Post Edited (Rayman) : 8/29/2008 7:07:43 PM GMT
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2008-08-29 18:51
    Beau, Chip,

    That four-port cell represents quite a commitment, silicon-wise! How does it compare, in physical area, to the Prop I cell that has a larger feature size?

    -Phil

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    'Still some PropSTICK Kit bare PCBs left!
  • cgracey Posts: 14,256
    edited 2008-08-29 18:59
    Phil Pilgrim (PhiPi) said...
    Chip,

    If I understand the JMPD correctly, the following could be used to emulate two threads, assuming they don't include JMP or WAIT instructions:

    :loop   jmpd    pc0     'nop
            add     pc0,#1
                            'inst@old pc0
            jmpd    pc1     'nop
            add     pc1,#1
                            'inst@old pc1
            jmp     #:loop
    
    
    


    Does that look right? What's missing, of course, is context-switching. As a concession to software multi-threaders, would it be possible to include two sets of condition bits that could be swapped in one instruction? I think most of us would be satisfied with two. More than two threads, emulated in this fashion, would bog down. This kind of context-swapping could come in handy elsewhere, too, when you want to call a subroutine, say, that sets condition bits, while preserving those bits in the calling program.

    Thanks,
    Phil


    Wow! I never thought of doing it that way. It's quite a brain-bender. I think it would work. It needs to be diagrammed. Branches within the single-instruction-at-a-time threads wouldn't work, though, right? About the condition bits, I'm having a mental block on how to save/restore these.

    I'm getting the feeling that very fine granularity is paramount to your multi-threading. There is always going to be the possibility that if each thread had to do a RDxxxx/WRxxxx at the 'same' time, and let's say there were four threads, the last guy would get delayed by 24+ clocks (3*8). That's no worse than what JMPRETD could provide in the same circumstance. Plus, with JMPRETD, you can get bursts of deterministic timing in short runs of code before you must JMPRETD to the next guy.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.

    Post Edited (Chip Gracey (Parallax)) : 8/29/2008 7:10:28 PM GMT
  • cgracey Posts: 14,256
    edited 2008-08-29 19:06
    Rayman said...


    I don't mean jitter, I'm referring to a fixed timing offset between plls...
    I was thinking about multiple cogs starting up their internal CTRs and PLLs at the same time.

    For syncing CTRA and CTRB PLLs within a single cog, you'd have to preload the second-to-be-configured CTR's PHS register with a value that jump-starts it to match the first CTR at the time it is to be configured.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.
  • Rayman Posts: 14,849
    edited 2008-08-29 19:10
    Chip,

    I take back what I said... Upon further reflection, the cursor in XGA, SXGA mode wouldn't look as good as it does if the plls weren't sync'd very well. I must have done something wrong when I tried it for my 6-bit vga driver...
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2008-08-29 19:11
    Chip,

    A branch in the emulated code would just be something like MOV PC0,#target_addr, similar to the LMM. This works, since the increment has already taken place by then and won't muck it up.

    -Phil

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    'Still some PropSTICK Kit bare PCBs left!
  • Timmoore Posts: 1,031
    edited 2008-08-29 19:13
    Wow! I never thought of doing it that way. It's quite a brain-bender. I think it would work. It needs to be diagrammed. Branches within the single-instruction-at-a-time threads wouldn't work, though, right? About the condition bits, I'm having a mental block on how to save/restore these.
    What about making the condition bits accessible as a register? Then software can save/restore them if needed.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2008-08-29 19:19
    Chip,

    BTW, given that the non-hub and non-0001xx opcodes are used up in the Prop I, where did you manage to squeeze in the JMPRETD family?

    -Phil

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    'Still some PropSTICK Kit bare PCBs left!
  • Beau Schwabe Posts: 6,568
    edited 2008-08-29 19:28
    Phil,

    "That four-port cell represents quite a commitment, silicon-wise! How does it compare, in physical area, to the Prop I cell that has a larger feature size?"

    I looked at the database for the Prop I, and there isn't a four-port cell... at least not that I saw. The 6T memory is used for both the MEM_RAM and MEM_COG.

    The dimensions of the 6T for the Prop I are 6.85um X 4.8um, requiring a total area of 32.88 um^2.

    The dimensions of the 6T for the Prop II are 2.73um X 1.95um, requiring a total area of 5.3235 um^2 ... 1/6th of the silicon.

    The dimensions of the four-port cell in the Prop II are 2.885um X 7.770um, requiring a total area of 22.4165 um^2 ... Still less silicon than the 6T in Prop I.

    Attached is a side by side comparison in the same scale view of the 3R1W and 6T memory cells.
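
For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the cell areas from the dimensions quoted above. The dictionary keys and variable names are made up for illustration; only the dimensions come from the post.

```python
# Cell dimensions in micrometres, as quoted in the post above.
cells = {
    "Prop I 6T": (6.85, 4.8),
    "Prop II 6T": (2.73, 1.95),
    "Prop II 3R1W": (2.885, 7.770),
}

def area_um2(width_um, height_um):
    """Rectangular cell area in square micrometres."""
    return width_um * height_um

areas = {name: area_um2(w, h) for name, (w, h) in cells.items()}

for name, area in areas.items():
    print(f"{name}: {area:.4f} um^2")

# How much smaller the Prop II 6T cell is than the Prop I 6T cell,
# and how much larger the four-port cell is than the Prop II 6T cell.
ratio_6t = areas["Prop I 6T"] / areas["Prop II 6T"]
ratio_4port = areas["Prop II 3R1W"] / areas["Prop II 6T"]
print(f"Prop I 6T / Prop II 6T = {ratio_6t:.2f}")
print(f"Prop II 3R1W / Prop II 6T = {ratio_4port:.2f}")
```

By these numbers the four-port cell costs a little over four times the area of the Prop II 6T cell, while still being smaller than the Prop I 6T cell.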


    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Beau Schwabe

    IC Layout Engineer
    Parallax, Inc.
  • cgracey Posts: 14,256
    edited 2008-08-29 19:29
    Hey, why is there so much focus on executing LMM code at a certain address, when you could just copy the instruction into a register that you're about to execute? You've got six instructions available between RDLONGs, so could you not perform the housekeeping within that period?

    loop      rdlong  inst0,pc0  'get thread0 instruction
              getzc   zc0        'get thread0 flags
    inst0     nop                'LMM instruction gets put here
              setzc   zc0        'set thread0 flags

              '(you've got three more instructions before another RDLONG)

    You could use indirection, too, to make this thing loop.

    Maybe the indirect addressing should be augmented so that addresses within some range get converted to another range through bit substitution. This way, each LMM thread could address the same locations, but be accessing separate windows. For example, you could set it up so that if either S or D registers were ranging from $000..$00F, you would mux into S/D bits 7..4 a variable value. That way, a thread0 could address $000..$00F, a thread1 could address $010..$01F, a thread2 could address $020..$02F, etc. - all while being coded to $000..$00F. This way, you could run multiple instances of the same LMM code. This would be something VERY easy and small to add to the cog.
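
The window idea above can be modeled in a few lines of Python. This is only a sketch of the bit substitution as described in the post, not the actual hardware design; the function name `remap` and its parameters are hypothetical.

```python
def remap(addr, window, lo=0x000, hi=0x00F):
    """Model of the proposed S/D address substitution.

    If a 9-bit register address falls in lo..hi, bits 7..4 are replaced
    by the per-thread `window` value; other addresses pass through.
    """
    if lo <= addr <= hi:
        return ((addr & ~0x0F0) | ((window & 0xF) << 4)) & 0x1FF
    return addr

# Every thread is coded against $000..$00F but lands in its own window.
for thread in range(3):
    print(f"thread{thread}: $005 -> ${remap(0x005, thread):03X}")
```

Addresses outside the remapped range are left alone, so shared registers above $00F would still be visible to every thread.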

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2008-08-29 19:30
    Tim,

    The condition bits are already accessible, in a way. They can be saved in two instructions and restored in one (assuming save is initialized to zero):

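            muxnz   save,#2             'save:    bit 1 of save := !Z
            muxc    save,#1             'save:    bit 0 of save := C

            shr     save,#1 wz,wc,nr    'restore: C := bit 0, Z := (bit 1 == 0), save unchanged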
    
    
    



    I was hoping for a way just to switch between two sets in one instruction.
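
To see why the two-instruction save and one-instruction restore round-trip, here is a small Python model of the sequence. It assumes the Prop I flag behavior (MUXNZ writes !Z into the masked bits, MUXC writes C, and SHR sets C from bit 0 of the original value and Z from a zero result); the function names are made up for the sketch.

```python
def save_flags(z, c):
    """Model of: muxnz save,#2 / muxc save,#1 (save starts at zero)."""
    save = 0
    save = (save & ~2) | (0 if z else 2)   # muxnz: bit 1 := !Z
    save = (save & ~1) | (1 if c else 0)   # muxc:  bit 0 := C
    return save

def restore_flags(save):
    """Model of: shr save,#1 wz,wc,nr (result is discarded, flags kept)."""
    result = save >> 1
    c = save & 1                      # C := bit 0 of the original value
    z = 1 if result == 0 else 0       # Z := result is zero
    return z, c

for z in (0, 1):
    for c in (0, 1):
        print((z, c), "->", save_flags(z, c), "->", restore_flags(save_flags(z, c)))
```

Storing !Z rather than Z is what makes the restore work: the shifted result is zero exactly when Z was set. It also shows why the upper bits of save must be zero.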

    -Phil

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    'Still some PropSTICK Kit bare PCBs left!
  • Timmoore Posts: 1,031
    edited 2008-08-29 19:37
    Phil, I'm not sure you want to combine the save/restore into a switch. If you are writing a small multi-threading kernel, you need to save the old thread's context and restore the next thread's; keeping the two separate allows any number of threads and makes thread contexts easier to manage. If you make it a switch, the thread context keeps moving: it moves into the next running thread's old context, that thread's context moves to the one after it, and so on. This works if all you want is a pure round-robin switcher, but as soon as you add some decision logic, finding the thread will mean another level of indirection, which loses the switch's efficiency.
  • cgracey Posts: 14,256
    edited 2008-08-29 19:40
    Phil Pilgrim (PhiPi) said...
    Chip,

    BTW, given that the non-hub and non-0001xx opcodes are used up in the Prop I, where did you manage to squeeze in the JMPRETD family?

    -Phil

    It went where ONES was slated to go: %000111. We are using %000110 for everything else!

    Sorry I missed your point about the swap-flags instruction. That's a much better concept than read- and write-flags instructions:

    SWAPZC   D  'swap Z and C with bit 1 and 0 in D

    I am going to add this, for sure. It's very elegant. How about augmenting its functionality so that it rotates the bits down by two, acting as a FIFO? That way, N threads can be chained with one instruction. Otherwise, SWAPZC is just a ping-pong mechanism.

    Ah ha! It can be coded like this:

    SWAPZC   D,[n]  'swap Z and C with FIFO bits in D, n specifies number of bit-pairs.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.

    Post Edited (Chip Gracey (Parallax)) : 8/29/2008 7:53:04 PM GMT
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2008-08-29 19:41
    Chip Gracey said...
    Hey, why is there so much focus on executing LMM code at a certain address, when you could just copy the instruction into a register that you're about to execute?
    I was thinking more about emulating multi-threaded code that's already resident in the cog. But, yes, your technique would work great for LMM code!

    -Phil

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    'Still some PropSTICK Kit bare PCBs left!
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2008-08-29 19:43
    Chip Gracey said...
    I am going to add this [condition code swapping], for sure...
    Chip, that makes me very happy! And that SWAPZC D,[n] is a welcome bonus!

    -Phil

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    'Still some PropSTICK Kit bare PCBs left!
  • simonl Posts: 866
    edited 2008-08-29 19:47
    Did you guys miss Chip's 12:29 post?

    I'm no LMM programmer, but Chip's 'find' seems to be a big deal for LMM, and no-one appears to have mentioned it! (Just wanted to make sure it didn't get missed ;) )

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Cheers,
    Simon

    www.norfolkhelicopterclub.com

    You'll always have as many take-offs as landings, the trick is to be sure you can take-off again ;)
    BTW: I type as I'm thinking, so please don't take any offence at my writing style :)
  • Timmoore Posts: 1,031
    edited 2008-08-29 19:52
    Can [n] be a register or just a constant? I want a thread id (assume ids are even numbers) in a register. Then for up to 16 threads I can save/restore context based on the thread id register, and double the thread id for the jmpret offset. And I will be able to switch between arbitrary threads just by setting the thread id and then restoring the context.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2008-08-29 19:55
    Simon,

    Nope, didn't miss it! 'Just focused elsewhere for the moment and not multitasking very well. :)

    In fact, for one or two threads, you could also do RDLONG inst0,PTRA[ 1++] to auto-increment the PC in PTRA.

    -Phil

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    'Still some PropSTICK Kit bare PCBs left!
  • cgracey Posts: 14,256
    edited 2008-08-29 20:33
    Timmoore said...
    Can [n] be a register or just a constant? I want a thread id (assume ids are even numbers) in a register. Then for up to 16 threads I can save/restore context based on the thread id register, and double the thread id for the jmpret offset. And I will be able to switch between arbitrary threads just by setting the thread id and then restoring the context.
    It could use either a constant (0..15) or a register (4 lsb's). This takes some thinking about...


    get A flags
    exe A inst
    save A flags

    (save A, get B, rotate) %ddccbbaa -> %AAddccbb

    get B flags
    exe B inst
    save B flags

    (save B, get C, rotate) %AAddccbb -> %BBAAddcc

    get C flags
    exe C inst
    save C flags

    (save C, get D, rotate) %BBAAddcc -> %CCBBAAdd

    get D flags
    exe D inst
    save D flags

    (save D, get A) %CCBBAAdd -> %DDCCBBAA


    So, the SWAPZC instruction must read flags from bits[1..0], shift right by 2 bits, and save flags to bits[n*2+1..n*2].

    When a thread starts or dies, you change n at that time to open a new bit set, or clip off an old one. This is almost nothing to implement.
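
A quick way to sanity-check the FIFO behavior is to model it in Python. This sketch follows the wording above literally (read bits 1..0, shift right by 2, write the old flags to bits n*2+1..n*2) and is not a definitive description of the instruction. It assumes D holds only the waiting threads' flag pairs while the running thread's flags live in Z/C, so four threads use n = 2; the diagram above keeps four pairs in D, so the exact pair count is not settled in the thread. The function name `swapzc` and its signature are hypothetical.

```python
def swapzc(d, z, c, n):
    """Model of the proposed SWAPZC D,[n].

    Returns (new_d, new_z, new_c): the incoming flags are read from
    bits 1..0 of d, d is shifted right by 2, and the outgoing flags
    (Z in the upper bit, C in the lower) are written at bit-pair n.
    """
    new_z = (d >> 1) & 1
    new_c = d & 1
    d = (d >> 2) & 0xFFFFFFFF
    slot = 2 * n
    d = (d & ~(0b11 << slot)) | (((z << 1) | c) << slot)
    return d, new_z, new_c

# Four threads: A is running with (Z, C) = (1, 0); B, C and D wait in the FIFO.
flags_b, flags_c, flags_d = 0b01, 0b11, 0b00
start = (flags_d << 4) | (flags_c << 2) | flags_b
d, z, c = start, 1, 0
for step in range(4):
    d, z, c = swapzc(d, z, c, n=2)
    print(f"after swap {step + 1}: D=%{d:06b} Z={z} C={c}")
```

After one full round of four swaps, thread A gets its own flags back and D returns to its starting value, which is the round-robin property being described.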

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.
  • ImageCraft Posts: 348
    edited 2008-08-29 20:34
    @Chip, if it in fact doesn't take too much real estate to do and doesn't take things away from other stuff, I'd say go for it. The TackXXX should be sufficient for some people's needs. Why Tack though? Why not Thread?
  • cgracey Posts: 14,256
    edited 2008-08-29 20:39
    ImageCraft said...
    @Chip, if it in fact doesn't take too much real estate to do and doesn't take things away from other stuff, I'd say go for it. The TackXXX should be sufficient for some people's needs. Why Tack though? Why not Thread?
    Tack spells better with +new and +end, plus TACKNEW and TACKEND are under 8 characters (a tab stop).

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.
  • jazzed Posts: 11,803
    edited 2008-08-29 20:46
    "Task" is more relevant. I thought you were talking about nails or cars at first :)

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    --Steve
  • cgracey Posts: 14,256
    edited 2008-08-29 20:49
    jazzed said...
    "Task" is more relevant. I thought you were talking about nails or cars at first :)
    You're right! What was I thinking?

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.
  • Sapieha Posts: 2,964
    edited 2008-08-29 21:02
    Hi Chip Gracey.

    Suppose the COG has fast SERIN/OUT.
    Is there room in your design to run LMM directly from serial IN?

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Nothing is impossible, there are only different degrees of difficulty.

    Sapieha
  • simonl Posts: 866
    edited 2008-08-29 21:25
    I like your thinking Sapieha :)

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Cheers,
    Simon

    www.norfolkhelicopterclub.com

    You'll always have as many take-offs as landings, the trick is to be sure you can take-off again ;)
    BTW: I type as I'm thinking, so please don't take any offence at my writing style :)
  • tpw_man Posts: 276
    edited 2008-08-29 21:28
    Maybe I'm missing something, but what exactly does [<x>] mean?

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    I am 1011, so be surprised!


    Advertisement sponsored by dfletch:
    Come and join us on the Propeller IRC channel for fast and easy help!
    Channel: #propeller
    Server: irc.freenode.net or freenode.net
    If you don't want to bother installing an IRC client, use Mibbit. www.mibbit.com
    :P