The New 16-Cog, 512KB, 64 analog I/O Propeller Chip - Page 36 — Parallax Forums


Comments

  • TubularTubular Posts: 4,646
    edited 2014-04-16 15:07
    pjv wrote: »
    In following the banter here and reading comments about why the co-operative approach is bad, it just seems to me that the approach is poorly understood. From the comments, most seem to have it backwards.

    Peter, from your point of view, what's the most common misconception?
  • RaymanRayman Posts: 14,132
    edited 2014-04-16 17:01
    Is LOC the same as LINK?
  • pjvpjv Posts: 1,903
    edited 2014-04-16 17:19
    Tubular wrote: »
    Peter, from your point of view, what's the most common misconception?

    Firstly let me say that I mean no malice to anyone on this forum, and perhaps I used an inappropriate word when I said that "most seem to have it backwards", when I should have said that "some have it backwards".

    Yet I do believe that those pooh-poohing the cooperative approach have not themselves done the exercise of investigating it adequately, let alone making it work. My reason for making this statement is based on the example of the full duplex serial UART code that is often cited as a typical cooperative application. If that is the extent of their understanding, then I would agree. That particular code does in fact ping-pong back and forth between receive and transmit modes, and it is a form of multitasking, albeit a very poor one; one that does not use a scheduler. However, what those citing that example seem to be missing is the level of elegance and convenience a scheduler adds to the mix. In the former case, each half of the co-routines, when done, causes a switch to the other routine. That can work (as evidenced) but is far from general and not optimal. Instead, the most each thread segment should do (at least in my opinion) is relinquish control back to the scheduler, which will then calculate the next time that thread should run, and from what point in its program. On completing that calculation, the scheduler places it in a "wait" condition, thereby also minimizing the power consumption. It then selects the next task to be run, and dispatches that at the appointed time. In this manner all threads are serviced sequentially and in an orderly fashion, each according to its "scheduled" time, and without (much) regard for each other, except for the jitter and processor load each introduces to the fold.

    This method permits one to have as many threads as one has room for in a cog, of course consistent with the total work load to be done, and the maximum jitter tolerated.
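
The scheduler loop described above can be sketched roughly as follows. This is only an illustration in Python on a simulated clock, not pjv's actual cog code: the generator-based thread model and the names `run_scheduler` and `periodic` are assumptions made for the example, and the execution time of each segment (the source of jitter) is ignored.

```python
import heapq
from typing import Generator, List, Tuple

# Hypothetical model: a thread is a generator that yields the number of
# ticks until it wants to run again; yielding is the "relinquish" step.
Thread = Generator[int, None, None]


def run_scheduler(threads: List[Thread], end_tick: int) -> List[Tuple[int, int]]:
    """Dispatch threads in due-time order on a simulated clock.

    Returns a log of (tick, thread_index) for every dispatch.
    """
    log: List[Tuple[int, int]] = []
    # Heap entries are (due_tick, thread_index); the index breaks ties.
    ready = [(0, i) for i in range(len(threads))]
    heapq.heapify(ready)
    while ready:
        due, idx = heapq.heappop(ready)
        if due >= end_tick:
            break  # the earliest remaining thread is past the time limit
        # A real cog would sit in a wait instruction until `due`;
        # here the simulated clock simply jumps forward.
        log.append((due, idx))
        try:
            delay = next(threads[idx])  # run one segment until it yields
        except StopIteration:
            continue  # thread finished, so it is not rescheduled
        heapq.heappush(ready, (due + delay, idx))
    return log


def periodic(period: int) -> Thread:
    """A thread segment that asks to be run again every `period` ticks."""
    while True:
        yield period


if __name__ == "__main__":
    for tick, idx in run_scheduler([periodic(10), periodic(25)], end_tick=50):
        print(tick, idx)
```

Each thread only ever hands control back to the scheduler, never to another thread, which is the difference from the ping-pong UART example.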

    There are limits to what a cooperative approach is suited for: probably 8..16 low speed (1 kHz) threads, 4..6 medium speed (20 kHz) threads, and 2..4 high speed (100 kHz) threads, all of course dependent on the jitter that can be accommodated. It is quite acceptable to run a combination of the different speeds.
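
To put those figures in perspective, here is a rough instruction budget. It assumes a P1 cog at 80 MHz with 4-clock instructions (20 MIPS); the post does not state the clock, so that number is an assumption, and the function name is made up for the example. The result is an upper bound that the scheduler itself also has to fit into.

```python
def instructions_per_activation(mips: int, threads: int, rate_hz: int) -> float:
    """Average instructions available each time a thread runs.

    Assumes all `threads` run at the same `rate_hz` and share one cog
    that executes `mips` million instructions per second.
    """
    return mips * 1_000_000 / (threads * rate_hz)


if __name__ == "__main__":
    P1_MIPS = 20  # assumption: 80 MHz P1, 4 clocks per instruction
    for threads, rate_hz in [(16, 1_000), (6, 20_000), (4, 100_000)]:
        budget = instructions_per_activation(P1_MIPS, threads, rate_hz)
        print(f"{threads} threads at {rate_hz} Hz: about {budget:.0f} instructions each")
```

At the top of the high speed range that leaves only about 50 instructions per activation, which is why jitter becomes the limiting factor.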

    So my conclusions that caused me to say what I did are based on my interpretation of comments from those who appear to not fully understand.

    Do I think hardware multi tasking is bad ? Of course not..... unless it is "costly" in some manner. And if that is the case, then I say ditch it; there are other viable approaches.

    I hope I have cleared up any mysteries regarding my comments, expanded on how I see cooperative routines working, and that I have not offended anyone.

    Cheers,

    Peter (pjv)
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-04-16 17:31
    pjv,

    "There are limits to what a cooperative approach is suited for: probably 8..16 low speed (1 kHz) threads, 4..6 medium speed (20 kHz) threads, and 2..4 high speed (100 kHz) threads, all of course dependent on the jitter that can be accommodated. It is quite acceptable to run a combination of the different speeds."


    Chip's four hardware tasks allow tasks in the MHz range.

    Having said that, my earlier post clearly admitted that cooperative multitasking is useful for the kHz range (even up to your 100kHz sample range).

    The point of my post was to show that hardware tasks are useful if Chip decides to implement them, and to point out that people can fully understand co-operative threads, have used them (many times) in the past, and prefer hardware tasks for solid reasons.

    (All: this is not to push Chip to implement tasks, it is a discussion between myself and pjv regarding the merits of co-operative threads vs. hardware level tasks)
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-04-16 17:33
    Nope, it just lets you load a 17 bit constant into $1EF with one long.

    JMP appears to have been combined with LINK
    Rayman wrote: »
    Is LOC the same as LINK?
  • TubularTubular Posts: 4,646
    edited 2014-04-16 18:10
    Thanks Peter. Good to have your take on things.

    The way I see it, there are several ways to achieve the same end, whether it be a software or hardware approach. Each has a 'layer' of knowledge required, whether that be setting up a scheduler, or a task register and jump addresses for the hardware approach.

    Chip did a really beautiful implementation of the hardware approach, but the disadvantage is it limits to 4 tasks, consumes area, and perhaps some power, even when not used.

    (disclaimer: like bill said, this isn't a call for tasks to be included in any particular form, just looking at options for moving existing code)
  • pjvpjv Posts: 1,903
    edited 2014-04-16 18:14
    Bill Henning wrote: »
    pjv,

    "There are limits to what a cooperative approach is suited for: probably 8..16 low speed (1 kHz) threads, 4..6 medium speed (20 kHz) threads, and 2..4 high speed (100 kHz) threads, all of course dependent on the jitter that can be accommodated. It is quite acceptable to run a combination of the different speeds."


    Chip's four hardware tasks allow tasks in the MHz range.

    Having said that, my earlier post clearly admitted that cooperative multitasking is useful for the kHz range (even up to your 100kHz sample range).

    The point of my post was to show that hardware tasks are useful if Chip decides to implement them, and to point out that people can fully understand co-operative threads, have used them (many times) in the past, and prefer hardware tasks for solid reasons.

    (All: this is not to push Chip to implement tasks, it is a discussion between myself and pjv regarding the merits of co-operative threads vs. hardware level tasks)

    Bill, I agree with you, but it's unfair to compare what can currently be done with the P1, to what might be possible with a P2. With the higher speed and enhanced instruction set, I expect to push cooperative routines also above 1 MHz.

    We'll see how all that turns out..... can hardly wait.

    Cheers,

    Peter (pjv)
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-04-16 18:20
    Sorry Peter, I thought you were talking about P1+ / P2

    For a few simple co-operative threads I am sure you can get to somewhere above 1MHz quite easily :)

    Frankly, I can easily see running co-operative threads within a hardware task. Quite often a task (at 25 MIPS) will be huge overkill, and one such task will be able to run your scheduler and threads at roughly P1 speeds :)

    Regards,

    Bill
    pjv wrote: »
    Bill, I agree with you, but it's unfair to compare what can currently be done with the P1, to what might be possible with a P2. With the higher speed and enhanced instruction set, I expect to push cooperative routines also above 1 MHz.

    We'll see how all that turns out..... can hardly wait.

    Cheers,

    Peter (pjv)
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-04-16 18:25
    Exactly!

    If tasks make it in - great.

    If not - c'est la vie.

    Up to Chip.

    What I loved about the way Chip implemented tasks was:

    - easy to write high performance drivers sharing a cog (up to four)
    - the enhanced debugger capabilities
    - easy control of how much processing time each task got
    - user level threads

    With Ken saying Parallax would like to release a new chip every one to two years, I have hope of seeing the "P8X256HP" (aka P2 design before P1+) for high performance applications in one to two years time after P16X512, with all the goodies in it - and maybe more, as with a process shrink we can get more hub, more cogs, more MHz :):):)
    Tubular wrote: »
    Thanks Peter. Good to have your take on things.

    The way I see it, there are several ways to achieve the same end, whether it be a software or hardware approach. Each has a 'layer' of knowledge required, whether that be setting up a scheduler, or a task register and jump addresses for the hardware approach.

    Chip did a really beautiful implementation of the hardware approach, but the disadvantage is it limits to 4 tasks, consumes area, and perhaps some power, even when not used.

    (disclaimer: like bill said, this isn't a call for tasks to be included in any particular form, just looking at options for moving existing code)
  • jmgjmg Posts: 15,155
    edited 2014-04-16 18:38
    Bill Henning wrote: »
    What I loved about the way Chip implemented tasks was:

    - easy to write high performance drivers sharing a cog (up to four)
    - the enhanced debugger capabilities
    - easy control of how much processing time each task got
    - user level threads

    I would underline the enhanced debugger capabilities, and expand that to include advanced Watchdog features too.

    Emerging software standards are based on verifying correct operation, of things like oscillators, and software core tasks, and a tiny watchdog stub can do that with ease.

    Having tracked this sort of failure/recovery in other Micros, it does raise one question :

    Q: Does the reset of a P1+, force a re-read of the fuses ?

    (on some competitor parts, not quite everything is kicked by Reset, and a power cycle can be required)
  • Invent-O-DocInvent-O-Doc Posts: 768
    edited 2014-04-16 18:41
    I like the most recent specs that Chip has provided. About hardware multitasking within a COG, I say "PLEASE GET RID OF IT". Let's have a simpler design. Besides, hubexec is more important, and hubexec is great, even at 50% of the speed. I'll likely use C, and it will compile to something that runs very fast in that mode.

    The other reason not to worry about hardware multitasking is this: 16 COGS. There are plenty to go around.

    @info - You may not see the value of parallel processing, but I recommend you get a decent Propeller 1 board and try it out. Once you get used to doing multiple tasks in parallel, it changes how you program, and so many things get that much easier to do. It's freaky.
  • SeairthSeairth Posts: 2,474
    edited 2014-04-16 18:59
    Early in this thread, Chip said that the hub access will be every 16 clocks/8 instructions. With 16 cogs, I don't see how this will work, unless the instruction cycle of the odd cogs is staggered by one clock from the even cogs (i.e. COG0, 2, 4, etc. are performing the load/write, while COG1, 3, 5, etc. are performing the read/execute). If this is not the case, then the hub access window will fall in the middle of the two-cycle instruction window for half of the cogs.

    Does this make sense? Or do I have something wrong with my thinking?
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-04-16 19:04
    The hub will run at 200MHz

    Most instructions will take 2 clock cycles (due to dual ported cog memory)

    That leads to 100MIPS max.

    Therefore, each cog will get at the hub every eight instructions (which is every 16 clock cycles).

    I hope all the documentation will talk in terms of real clock cycles, as the instruction clock / system clock difference can be quite confusing, and easy to make mistakes with - heck, it has thrown me off several times too.
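
The arithmetic above can be checked in system clocks as follows. This is a small sketch that assumes one hub slot per system clock handed out round-robin to the 16 cogs, which is how the figures in this thread are being read; the constant and function names are chosen for the example only.

```python
SYSTEM_CLOCK_HZ = 200_000_000   # tentative figure from Chip's plan
CLOCKS_PER_INSTRUCTION = 2      # "most instructions", per the post above
COGS = 16


def peak_mips(clock_hz: int, clocks_per_instruction: int) -> float:
    """Peak instruction rate of one cog, in millions per second."""
    return clock_hz / clocks_per_instruction / 1_000_000


def hub_interval(cogs: int, clocks_per_instruction: int) -> tuple:
    """Return (clocks, instructions) between hub slots for one cog.

    Assumes one slot per system clock, given to each cog in turn.
    """
    clocks = cogs
    return clocks, clocks // clocks_per_instruction


if __name__ == "__main__":
    print(peak_mips(SYSTEM_CLOCK_HZ, CLOCKS_PER_INSTRUCTION), "MIPS per cog")
    clocks, instructions = hub_interval(COGS, CLOCKS_PER_INSTRUCTION)
    print(f"hub slot every {clocks} clocks = {instructions} instructions")
```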
    Seairth wrote: »
    Early in this thread, Chip said that the hub access will be every 16 clocks/8 instructions. With 16 cogs, I don't see how this will work, unless the instruction cycle of the odd cogs is staggered by one clock from the even cogs (i.e. COG0, 2, 4, etc. are performing the load/write, while COG1, 3, 5, etc. are performing the read/execute). If this is not the case, then the hub access window will fall in the middle of the two-cycle instruction window for half of the cogs.

    Does this make sense? Or do I have something wrong with my thinking?
  • jmgjmg Posts: 15,155
    edited 2014-04-16 19:26
    Seairth wrote: »
    Early in this thread, Chip said that the hub access will be every 16 clocks/8 instructions. With 16 cogs, I don't see how this will work, unless the instruction cycle of the odd cogs is staggered by one clock from the even cogs (i.e. COG0, 2, 4, etc. are performing the load/write, while COG1, 3, 5, etc. are performing the read/execute). If this is not the case, then the hub access window will fall in the middle of the two-cycle instruction window for half of the cogs.

    Does this make sense? Or do I have something wrong with my thinking?

    You may be right, there could be a half-Op-cycle (1 SysClk) phase offset on half the COGs.

    Not sure that matters much ? - any WAIT would snap back again, but I guess it could give subtle COG-location effects.
  • TubularTubular Posts: 4,646
    edited 2014-04-16 19:31
    The offset might matter for multi-cog video.

    Then again Chip may have something in the smart pins to deal with this effect

    Perhaps we need an instruction HALFNOP :)
  • SeairthSeairth Posts: 2,474
    edited 2014-04-16 19:33
    Bill Henning wrote: »
    The hub will run at 200MHz

    Most instructions will take 2 clock cycles (due to dual ported cog memory)

    That leads to 100MIPS max.

    Therefore, each cog will get at the hub every eight instructions (which is every 16 clock cycles).

    I hope all the documentation will talk in terms of real clock cycles, as the instruction clock / system clock difference can be quite confusing, and easy to make mistakes with - heck, it has thrown me off several times too.

    I think my mistake was in forgetting that the initial hubop for each cog will typically stall in order to synchronize to the hub window. After that point, the instruction cycle should be aligned (assuming none of the next 7 instructions stall). And, of course, that the hub will now be running at the full system clock rate (unlike P1, which ran at half the rate), which you pointed out.
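
A toy model of that stall-then-aligned behaviour is sketched below. It assumes cog n gets its hub slot whenever the system clock count modulo 16 equals n, and that a hub instruction takes 2 clocks once its slot arrives. Both are assumptions for illustration, not confirmed details of the design, and `hubop_stalls` is a made-up name.

```python
from typing import List

HUB_SLOTS = 16               # one slot per cog (assumed round-robin)
CLOCKS_PER_INSTRUCTION = 2


def hubop_stalls(cog_id: int, start_clock: int, hubops: int,
                 between: int = 7) -> List[int]:
    """Stall clocks seen by successive hub instructions on one cog.

    `between` is the number of ordinary 2-clock instructions executed
    between consecutive hub instructions.
    """
    stalls = []
    clock = start_clock
    for _ in range(hubops):
        stall = (cog_id - clock) % HUB_SLOTS   # wait for this cog's slot
        stalls.append(stall)
        clock += stall + CLOCKS_PER_INSTRUCTION    # the hub instruction
        clock += between * CLOCKS_PER_INSTRUCTION  # the code in between
    return stalls


if __name__ == "__main__":
    print(hubop_stalls(cog_id=3, start_clock=0, hubops=3))
    print(hubop_stalls(cog_id=3, start_clock=0, hubops=3, between=6))
```

In this model only the first hub instruction pays a stall when exactly 7 instructions separate hub accesses. For an odd-numbered cog that first stall is an odd number of clocks, which gives the one-clock phase offset mentioned earlier.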
  • jazzedjazzed Posts: 11,803
    edited 2014-04-16 19:36
    Thanks Chip.

    Looks lean and mean so far ... just like a winner.

    Any more details on the MSGIN and MSGOUT instructions? Anyone?
    I suppose they are like the P2's serial instructions. No, I'm not asking for serdes.

    Thanks.
    cgracey wrote: »
    Okay. Here is the tentative plan for the new chip:
    New Propeller Chip - 16 April 2014
    
        200MHz system clock
        16 cogs with 2-clock instructions, hub execution at 50% cog speed
        512KB hub memory with 8/16/32/128 bit cog transfers
        64 smart I/O pins
        100-pin 14x14mm TQFP with exposed thermal GND pad
    
    
    -- addressable cog registers
    --
    --    addr        read        write        name        hidden
    --    -----------------------------------------------------------------------
    --
    --    000-1EF        RAM        RAM
    --
    --    1F0        CNT        -        CNT        ICACHE0
    --    1F1        RND        -        RND        ICACHE0
    --    1F2        INA        -        INA        ICACHE0
    --    1F3        INB        -        INB        ICACHE0
    --    1F4        RAM        RAM+OUTA    OUTA
    --    1F5        RAM        RAM+OUTB    OUTB
    --    1F6        RAM        RAM+DIRA    DIRA
    --    1F7        RAM        RAM+DIRB    DIRB
    --    1F8        RAM        RAM+CTRA    CTRA
    --    1F9        RAM        RAM+CTRB    CTRB
    --    1FA        RAM        RAM+FRQA    FRQA
    --    1FB        RAM        RAM+FRQB    FRQB
    --    1FC        PHSA        PHSA        PHSA        ICACHE1
    --    1FD        PHSB        PHSB        PHSB        ICACHE1
    --    1FE        PTRA        PTRA        PTRA        ICACHE1
    --    1FF        PTRB        PTRB        PTRB        ICACHE1
    
    
    
    ZCDS (for D column: W=write, M=modify, R=read, L=read/immediate)
    ----------------------------------------------------------------------------------------------------------------------
    
    ZCWS        0000000 ZC I CCCC DDDDDDDDD SSSSSSSSS        RDBYTE    D,S/PTRA/PTRB        (waits for hub)
    ZCWS        0000001 ZC I CCCC DDDDDDDDD SSSSSSSSS        RDWORD    D,S/PTRA/PTRB        (waits for hub)
    ZCWS        0000010 ZC I CCCC DDDDDDDDD SSSSSSSSS        RDLONG    D,S/PTRA/PTRB        (waits for hub)
    ZCWS        0000011 ZC I CCCC DDDDDDDDD SSSSSSSSS        RDQUAD    D,S/PTRA/PTRB        (waits for hub)
    
    ZCMS        0000100 ZC I CCCC DDDDDDDDD SSSSSSSSS        SYSOP    D,S/#            (waits for hub, S/# determines four write-long enables)
    
    ZCWS        0000101 ZC I CCCC DDDDDDDDD SSSSSSSSS        MSGIN    D,S/#            (receives message on pin, C=timeout)
    
    ZCMS        0000110 ZC I CCCC DDDDDDDDD SSSSSSSSS        MUL    D,S/#            multiplier    (16 x 16 unsigned multiply)
    ZCMS        0000111 ZC I CCCC DDDDDDDDD SSSSSSSSS        MULS    D,S/#            multiplier    (16 x 16 signed multiply)
    
    ZCMS        0001000 ZC I CCCC DDDDDDDDD SSSSSSSSS        ISOB    D,S/#            bitop
    ZCMS        0001001 ZC I CCCC DDDDDDDDD SSSSSSSSS        NOTB    D,S/#            bitop
    ZCMS        0001010 ZC I CCCC DDDDDDDDD SSSSSSSSS        CLRB    D,S/#            bitop
    ZCMS        0001011 ZC I CCCC DDDDDDDDD SSSSSSSSS        SETB    D,S/#            bitop
    ZCMS        0001100 ZC I CCCC DDDDDDDDD SSSSSSSSS        SETBC    D,S/#            bitop
    ZCMS        0001101 ZC I CCCC DDDDDDDDD SSSSSSSSS        SETBNC    D,S/#            bitop
    ZCMS        0001110 ZC I CCCC DDDDDDDDD SSSSSSSSS        SETBZ    D,S/#            bitop
    ZCMS        0001111 ZC I CCCC DDDDDDDDD SSSSSSSSS        SETBNZ    D,S/#            bitop
    
    ZCMS        0010000 ZC I CCCC DDDDDDDDD SSSSSSSSS        ANDN    D,S/#            logic
    ZCMS        0010001 ZC I CCCC DDDDDDDDD SSSSSSSSS        AND    D,S/#            logic
    ZCMS        0010010 ZC I CCCC DDDDDDDDD SSSSSSSSS        OR    D,S/#            logic
    ZCMS        0010011 ZC I CCCC DDDDDDDDD SSSSSSSSS        XOR    D,S/#            logic
    ZCMS        0010100 ZC I CCCC DDDDDDDDD SSSSSSSSS        MUXC    D,S/#            logic
    ZCMS        0010101 ZC I CCCC DDDDDDDDD SSSSSSSSS        MUXNC    D,S/#            logic
    ZCMS        0010110 ZC I CCCC DDDDDDDDD SSSSSSSSS        MUXZ    D,S/#            logic
    ZCMS        0010111 ZC I CCCC DDDDDDDDD SSSSSSSSS        MUXNZ    D,S/#            logic
    
    ZCMS        0011000 ZC I CCCC DDDDDDDDD SSSSSSSSS        ROR    D,S/#            rotator
    ZCMS        0011001 ZC I CCCC DDDDDDDDD SSSSSSSSS        ROL    D,S/#            rotator
    ZCMS        0011010 ZC I CCCC DDDDDDDDD SSSSSSSSS        SHR    D,S/#            rotator
    ZCMS        0011011 ZC I CCCC DDDDDDDDD SSSSSSSSS        SHL    D,S/#            rotator
    ZCMS        0011100 ZC I CCCC DDDDDDDDD SSSSSSSSS        RCR    D,S/#            rotator
    ZCMS        0011101 ZC I CCCC DDDDDDDDD SSSSSSSSS        RCL    D,S/#            rotator
    ZCMS        0011110 ZC I CCCC DDDDDDDDD SSSSSSSSS        SAR    D,S/#            rotator
    ZCMS        0011111 ZC I CCCC DDDDDDDDD SSSSSSSSS        REV    D,S/#            rotator
    
    ZCWS        0100000 ZC I CCCC DDDDDDDDD SSSSSSSSS        MOV    D,S/#            adder
    ZCWS        0100001 ZC I CCCC DDDDDDDDD SSSSSSSSS        ABS    D,S/#            adder
    ZCWS        0100010 ZC I CCCC DDDDDDDDD SSSSSSSSS        ABSNEG    D,S/#            adder
    ZCWS        0100011 ZC I CCCC DDDDDDDDD SSSSSSSSS        NEG    D,S/#            adder
    ZCWS        0100100 ZC I CCCC DDDDDDDDD SSSSSSSSS        NEGC    D,S/#            adder
    ZCWS        0100101 ZC I CCCC DDDDDDDDD SSSSSSSSS        NEGNC    D,S/#            adder
    ZCWS        0100110 ZC I CCCC DDDDDDDDD SSSSSSSSS        NEGZ    D,S/#            adder
    ZCWS        0100111 ZC I CCCC DDDDDDDDD SSSSSSSSS        NEGNZ    D,S/#            adder
    
    ZCMS        0101000 ZC I CCCC DDDDDDDDD SSSSSSSSS        MIN    D,S/#            adder
    ZCMS        0101001 ZC I CCCC DDDDDDDDD SSSSSSSSS        MAX    D,S/#            adder
    ZCMS        0101010 ZC I CCCC DDDDDDDDD SSSSSSSSS        MINS    D,S/#            adder
    ZCMS        0101011 ZC I CCCC DDDDDDDDD SSSSSSSSS        MAXS    D,S/#            adder
    ZCMS        0101100 ZC I CCCC DDDDDDDDD SSSSSSSSS        SUMC    D,S/#            adder
    ZCMS        0101101 ZC I CCCC DDDDDDDDD SSSSSSSSS        SUMNC    D,S/#            adder
    ZCMS        0101110 ZC I CCCC DDDDDDDDD SSSSSSSSS        SUMZ    D,S/#            adder
    ZCMS        0101111 ZC I CCCC DDDDDDDDD SSSSSSSSS        SUMNZ    D,S/#            adder
    
    ZCMS        0110000 ZC I CCCC DDDDDDDDD SSSSSSSSS        ADD    D,S/#            adder
    ZCMS        0110001 ZC I CCCC DDDDDDDDD SSSSSSSSS        SUB    D,S/#            adder
    ZCMS        0110010 ZC I CCCC DDDDDDDDD SSSSSSSSS        ADDS    D,S/#            adder
    ZCMS        0110011 ZC I CCCC DDDDDDDDD SSSSSSSSS        SUBS    D,S/#            adder
    ZCMS        0110100 ZC I CCCC DDDDDDDDD SSSSSSSSS        ADDX    D,S/#            adder
    ZCMS        0110101 ZC I CCCC DDDDDDDDD SSSSSSSSS        SUBX    D,S/#            adder
    ZCMS        0110110 ZC I CCCC DDDDDDDDD SSSSSSSSS        ADDSX    D,S/#            adder
    ZCMS        0110111 ZC I CCCC DDDDDDDDD SSSSSSSSS        SUBSX    D,S/#            adder
    
    ZCWS        0111000 ZC I CCCC DDDDDDDDD SSSSSSSSS        NOT    D,S/#            adder
    ZCMS        0111001 ZC I CCCC DDDDDDDDD SSSSSSSSS        SUBR    D,S/#            adder
    ZCMS        0111010 ZC I CCCC DDDDDDDDD SSSSSSSSS        ADDABS    D,S/#            adder
    ZCMS        0111011 ZC I CCCC DDDDDDDDD SSSSSSSSS        SUBABS    D,S/#            adder
    ZCMS        0111100 ZC I CCCC DDDDDDDDD SSSSSSSSS        INCMOD    D,S/#            adder
    ZCMS        0111101 ZC I CCCC DDDDDDDDD SSSSSSSSS        DECMOD    D,S/#            adder
    ZCMS        0111110 ZC I CCCC DDDDDDDDD SSSSSSSSS        CMPSUB    D,S/#            adder
    ZCMS        0111111 ZC I CCCC DDDDDDDDD SSSSSSSSS        WAITCNT    D,S/#            adder
    
    ZCMS        1000000 ZC I CCCC DDDDDDDDD SSSSSSSSS        SETS    D,S/#            muxer
    ZCWS        1000001 ZC I CCCC DDDDDDDDD SSSSSSSSS        GETS    D,S/#            muxer
    ZCMS        1000010 ZC I CCCC DDDDDDDDD SSSSSSSSS        SETD    D,S/#            muxer
    ZCWS        1000011 ZC I CCCC DDDDDDDDD SSSSSSSSS        GETD    D,S/#            muxer
    ZCMS        1000100 ZC I CCCC DDDDDDDDD SSSSSSSSS        SETCOND    D,S/#            muxer
    ZCWS        1000101 ZC I CCCC DDDDDDDDD SSSSSSSSS        GETCOND    D,S/#            muxer
    ZCMS        1000110 ZC I CCCC DDDDDDDDD SSSSSSSSS        SETI    D,S/#            muxer
    ZCWS        1000111 ZC I CCCC DDDDDDDDD SSSSSSSSS        GETI    D,S/#            muxer
    
    --MS        100100n nn I CCCC DDDDDDDDD SSSSSSSSS        RORNIBn    D,S/#            muxer
    --MS        100101n nn I CCCC DDDDDDDDD SSSSSSSSS        ROLNIBn    D,S/#            muxer
    --WS        100110n nn I CCCC DDDDDDDDD SSSSSSSSS        GETNIBn    D,S/#            muxer
    --MS        100111n nn I CCCC DDDDDDDDD SSSSSSSSS        SETNIBn    D,S/#            muxer
    
    --MS        1010000 nn I CCCC DDDDDDDDD SSSSSSSSS        RORBYTn    D,S/#            muxer
    --MS        1010001 nn I CCCC DDDDDDDDD SSSSSSSSS        ROLBYTn    D,S/#            muxer
    --WS        1010010 nn I CCCC DDDDDDDDD SSSSSSSSS        GETBYTn    D,S/#            muxer
    --MS        1010011 nn I CCCC DDDDDDDDD SSSSSSSSS        SETBYTn    D,S/#            muxer
    
    --MS        1010100 0n I CCCC DDDDDDDDD SSSSSSSSS        RORWRDn    D,S/#            muxer
    --MS        1010100 1n I CCCC DDDDDDDDD SSSSSSSSS        ROLWRDn    D,S/#            muxer
    --WS        1010101 0n I CCCC DDDDDDDDD SSSSSSSSS        GETWRDn    D,S/#            muxer
    --MS        1010101 1n I CCCC DDDDDDDDD SSSSSSSSS        SETWRDn    D,S/#            muxer
    
    ZCWS        1010110 ZC I CCCC DDDDDDDDD SSSSSSSSS        ESWAP4    D,S/#            muxer
    ZCWS        1010111 ZC I CCCC DDDDDDDDD SSSSSSSSS        ESWAP8    D,S/#            muxer
    
    ZCWS        1011000 ZC I CCCC DDDDDDDDD SSSSSSSSS        SPLITW    D,S/#            muxer
    ZCWS        1011001 ZC I CCCC DDDDDDDDD SSSSSSSSS        MERGEW    D,S/#            muxer
    
    ZCMS        1011010 ZC I CCCC DDDDDDDDD SSSSSSSSS        DJZ    D,S/@            adder
    ZCMS        1011011 ZC I CCCC DDDDDDDDD SSSSSSSSS        DJNZ    D,S/@            adder
    
    ZCWS        1011100 ZC I CCCC DDDDDDDDD SSSSSSSSS        TOPBIT    D,S/#            miscellaneous
    ZCWS        1011101 ZC I CCCC DDDDDDDDD SSSSSSSSS        DECOD    D,S/#
    ZCMS        1011110 ZC I CCCC DDDDDDDDD SSSSSSSSS        ALTDS    D,S/#            (set up redirection for result/D/S)
    ZCWS        1011111 ZC I CCCC DDDDDDDDD SSSSSSSSS        JMPSW    D,S/@            (jump to S/@, store return address in D, WZ/WC to save/load flags)
    
    ZCRS        1100000 ZC I CCCC DDDDDDDDD SSSSSSSSS        TESTB    D,S/#            bitop    tests and compares
    ZCRS        1100001 ZC I CCCC DDDDDDDDD SSSSSSSSS        TESTN    D,S/#            logic
    ZCRS        1100010 ZC I CCCC DDDDDDDDD SSSSSSSSS        TEST    D,S/#            logic
    ZCRS        1100011 ZC I CCCC DDDDDDDDD SSSSSSSSS        CMP    D,S/#            adder
    ZCRS        1100100 ZC I CCCC DDDDDDDDD SSSSSSSSS        CMPX    D,S/#            adder
    ZCRS        1100101 ZC I CCCC DDDDDDDDD SSSSSSSSS        CMPS    D,S/#            adder
    ZCRS        1100110 ZC I CCCC DDDDDDDDD SSSSSSSSS        CMPSX    D,S/#            adder
    ZCRS        1100111 ZC I CCCC DDDDDDDDD SSSSSSSSS        CMPR    D,S/#            adder
    
    ZCRS        1101000 ZC I CCCC DDDDDDDDD SSSSSSSSS        TJZ    D,S/@
    ZCRS        1101001 ZC I CCCC DDDDDDDDD SSSSSSSSS        TJNZ    D,S/@
    ZCRS        1101010 ZC I CCCC DDDDDDDDD SSSSSSSSS        TJS    D,S/@
    ZCRS        1101011 ZC I CCCC DDDDDDDDD SSSSSSSSS        TJNS    D,S/@
    
    ZCRS        1101100 ZC I CCCC DDDDDDDDD SSSSSSSSS        -    D,S/#
    ZCRS        1101101 ZC I CCCC DDDDDDDDD SSSSSSSSS        -    D,S/#
    ZCRS        1101110 ZC I CCCC DDDDDDDDD SSSSSSSSS        -    D,S/#
    ZCRS        1101111 ZC I CCCC DDDDDDDDD SSSSSSSSS        -    D,S/#
    
    --LS        1110000 0L I CCCC DDDDDDDDD SSSSSSSSS        WRBYTE    D/#,S/PTRA/PTRB        (waits for hub)
    --LS        1110000 1L I CCCC DDDDDDDDD SSSSSSSSS        WRWORD    D/#,S/PTRA/PTRB        (waits for hub)
    --LS        1110001 0L I CCCC DDDDDDDDD SSSSSSSSS        WRLONG    D/#,S/PTRA/PTRB        (waits for hub)
    --LS        1110001 1L I CCCC DDDDDDDDD SSSSSSSSS        WRQUAD    D/#,S/PTRA/PTRB        (waits for hub, zero-extends #)
    
    --LS        1110010 0L I CCCC DDDDDDDDD SSSSSSSSS        MSGOUTA    D/#,S/#            (send message to pin(s) on OUTA)
    --LS        1110010 1L I CCCC DDDDDDDDD SSSSSSSSS        MSGOUTB    D/#,S/#            (send message to pin(s) on OUTB)
    --LS        1110011 0L I CCCC DDDDDDDDD SSSSSSSSS        MSGDIRA    D/#,S/#            (send message to pin(s) on DIRA)
    --LS        1110011 1L I CCCC DDDDDDDDD SSSSSSSSS        MSGDIRB    D/#,S/#            (send message to pin(s) on DIRB)
    
    --LS        1110100 0L I CCCC DDDDDDDDD SSSSSSSSS        WAITPAE    D/#,S/#            (waits for INA)
    --LS        1110100 1L I CCCC DDDDDDDDD SSSSSSSSS        WAITPAN    D/#,S/#            (waits for INA)
    --LS        1110101 0L I CCCC DDDDDDDDD SSSSSSSSS        WAITPBE    D/#,S/#            (waits for INB)
    --LS        1110101 1L I CCCC DDDDDDDDD SSSSSSSSS        WAITPBN    D/#,S/#            (waits for INB)
    
    --LS        1110110 0L I CCCC DDDDDDDDD SSSSSSSSS        WAITVID    D/#,S/#            (waits for video)
    --LS        1110110 1L I CCCC DDDDDDDDD SSSSSSSSS        PICKZC    D/#,S/#            (always writes Z/C)
    --LS        1110111 0L I CCCC DDDDDDDDD SSSSSSSSS        JP    D/#,S/@            (jump if pin IN high, pins registered at beginning of ALU cycle)
    --LS        1110111 1L I CCCC DDDDDDDDD SSSSSSSSS        JNP    D/#,S/@            (jump if pin IN low, pins registered at beginning of ALU cycle)
    
    --LS        1111000 0L I CCCC DDDDDDDDD SSSSSSSSS        REP    D/#,S/#            (begin repeat block of size D/# with S/# iterations)
    --LS        1111000 1L I CCCC DDDDDDDDD SSSSSSSSS        -    D/#,S/#
    --LS        1111001 0L I CCCC DDDDDDDDD SSSSSSSSS        -    D/#,S/#
    --LS        1111001 1L I CCCC DDDDDDDDD SSSSSSSSS        -    D/#,S/#
    
    --LS        1111010 0L I CCCC DDDDDDDDD SSSSSSSSS        -    D/#,S/#
    --LS        1111010 1L I CCCC DDDDDDDDD SSSSSSSSS        -    D/#,S/#
    --LS        1111011 0L I CCCC DDDDDDDDD SSSSSSSSS        -    D/#,S/#
    --LS        1111011 1L I CCCC DDDDDDDDD SSSSSSSSS        -    D/#,S/#
    
    --LS        1111100 0L I CCCC DDDDDDDDD SSSSSSSSS        -    D/#,S/#
    --LS        1111100 1L I CCCC DDDDDDDDD SSSSSSSSS        -    D/#,S/#
    
    ----        1111101 00 n nnnn nnnnnnnnn nnnnnnnnn        AUGS    #23bits            (appends n to upper bits of next immediate S in same task)
    ----        1111101 01 n nnnn nnnnnnnnn nnnnnnnnn        AUGD    #23bits            (appends n to upper bits of next immediate D in same task)
    
    ----        1111101 10 0 CCCC 0 nnnnnnnnnnnnnnnnn        LOC    #abs            (write 17-bit absolute address to $1EF)
    ----        1111101 10 0 CCCC 1 nnnnnnnnnnnnnnnnn        LOC    @rel            (write 17-bit relative address to $1EF)
    ---- wr        1111101 10 1 CCCC 0 nnnnnnnnnnnnnnnnn        JMP    #abs            (jump to 17-bit absolute address and write {Z,C,P[16:0]} to $1EF)
    ---- wr        1111101 10 1 CCCC 1 nnnnnnnnnnnnnnnnn        JMP    @rel            (jump to 17-bit relative address and write {Z,C,P[16:0]} to $1EF)
    ----        1111101 11 0 CCCC 0 nnnnnnnnnnnnnnnnn        CALL    #abs            (call to 17-bit absolute address using 4-level stack)
    ----        1111101 11 0 CCCC 1 nnnnnnnnnnnnnnnnn        CALL    @rel            (call to 17-bit relative address using 4-level stack)
    ----        1111101 11 1 CCCC 0 nnnnnnnnnnnnnnnnn        CALLA    #abs            (call to 17-bit absolute address using PTRA)
    ----        1111101 11 1 CCCC 1 nnnnnnnnnnnnnnnnn        CALLA    @rel            (call to 17-bit relative address using PTRA)
    
    ----        1111110 00 n CCCC n nnnnnnnnnnnnnnnnn        SETPTRA    #abs            (write 19-bit absolute address to PTRA)
    ----        1111110 01 n CCCC n nnnnnnnnnnnnnnnnn        SETPTRA    @rel            (write 19-bit relative address to PTRA)
    ----        1111110 10 n CCCC n nnnnnnnnnnnnnnnnn        SETPTRB    #abs            (write 19-bit absolute address to PTRB)
    ----        1111110 11 n CCCC n nnnnnnnnnnnnnnnnn        SETPTRB    @rel            (write 19-bit relative address to PTRB)
    
    --L-        1111111 00 L CCCC DDDDDDDDD xxxx00000        WAIT    D/#            (wait for some number of clocks, 0 same as 1)
    --L-        1111111 00 L CCCC DDDDDDDDD xxxx00001        WAITPX    D/#            (wait for any edge on pin D/#)
    --L-        1111111 00 L CCCC DDDDDDDDD xxxx00010        WAITPR    D/#            (wait for pos edge on pin D/#)
    --L-        1111111 00 L CCCC DDDDDDDDD xxxx00011        WAITPF    D/#            (wait for neg edge on pin D/#)
    --L-        1111111 00 L CCCC DDDDDDDDD xxxx00100        PUSH    D/#            (push D/# into 4-level stack)
    --L-        1111111 00 L CCCC DDDDDDDDD xxxx00101        SETVID    D/#            (set video mode)
    --L-        1111111 00 L CCCC DDDDDDDDD xxxx00110        -    D/#
    --L-        1111111 00 L CCCC DDDDDDDDD xxxx00111        -    D/#
                                                    (D[18:17] into Z/C via WZ/WC for JMP/CALL/CALLA/POP D)
    ZCR- wr        1111111 ZC x CCCC DDDDDDDDD xxxx01000        JMP    D            (jump to D[16:0] and write {Z,C,P[16:0]} to $1EF)
    ZCR-        1111111 ZC x CCCC DDDDDDDDD xxxx01001        CALL    D            (call to D[16:0] using 4-level stack)
    ZCR-        1111111 ZC x CCCC DDDDDDDDD xxxx01010        CALLA    D            (call to D[16:0] using PTRA stack)
    ZCR-        1111111 ZC x CCCC DDDDDDDDD xxxx01011        -    D
    ZCR-        1111111 ZC x CCCC DDDDDDDDD xxxx01100        -    D
    --R-        1111111 00 x CCCC DDDDDDDDD xxxx01101        -    D
    --R-        1111111 00 x CCCC DDDDDDDDD xxxx01110        -    D
    --R-        1111111 00 x CCCC DDDDDDDDD xxxx01111        -    D
    
    ZCW-        1111111 ZC x CCCC DDDDDDDDD xxxx10000        POP    D            (pop 4-level stack into D)
    --W-        1111111 00 x CCCC DDDDDDDDD xxxx10001        -    D
    --W-        1111111 00 x CCCC DDDDDDDDD xxxx10010        -    D
    --W-        1111111 00 x CCCC DDDDDDDDD xxxx10011        -    D
    --W-        1111111 00 x CCCC DDDDDDDDD xxxx10100        -    D
    --W-        1111111 00 x CCCC DDDDDDDDD xxxx10101        -    D
    --W-        1111111 00 x CCCC DDDDDDDDD xxxx10110        -    D
    --W-        1111111 00 x CCCC DDDDDDDDD xxxx10111        -    D
    
    ZC--        1111111 ZC x CCCC xxxxxxxxx xxxx11000        RET                (return using 4-level stack)
    ZC--        1111111 ZC x CCCC xxxxxxxxx xxxx11001        RETA                (return using PTRA stack)
    ZC--        1111111 ZC x CCCC xxxxxxxxx xxxx11010        POLVID                (C = ready for WAITVID)
    -C--        1111111 0C x CCCC xxxxxxxxx xxxx11011        CACHEX                (invalidate instruction cache)
    ----        1111111 00 x CCCC xxxxxxxxx xxxx11100        -
    ----        1111111 00 x CCCC xxxxxxxxx xxxx11101        -
    ----        1111111 00 x CCCC xxxxxxxxx xxxx11110        -
    ----        1111111 00 x CCCC xxxxxxxxx xxxx11111        -
    
    ----        0000000 00 0 0000 000000000 000000000        NOP
    
    Aliases for WRLONG/RDLONG:     PUSHA/PUSHB/POPA/POPB 
    
    


    Note that the JMP instructions save a return address into $1EF, so these double as the old LINK instructions.
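
    As a rough illustration of the link value described above, the sketch below packs and unpacks {Z,C,P[16:0]} in Python. The bit positions (Z in bit 18, C in bit 17) are taken from the table note "D[18:17] into Z/C"; the helper names are hypothetical and nothing here is actual Propeller tooling.

```python
# Sketch only: models the 19-bit value {Z, C, P[16:0]} that the JMP entries
# say is written to $1EF. Function names are hypothetical.

PC_BITS = 17
PC_MASK = (1 << PC_BITS) - 1  # 0x1FFFF, a 17-bit address


def pack_link(z, c, pc):
    """Build {Z, C, P[16:0]}: Z in bit 18, C in bit 17, address in bits 16..0."""
    return ((z & 1) << 18) | ((c & 1) << 17) | (pc & PC_MASK)


def unpack_link(value):
    """Split a link value back into (Z, C, address)."""
    return (value >> 18) & 1, (value >> 17) & 1, value & PC_MASK


link = pack_link(z=1, c=0, pc=0x123)
print(hex(link))          # 0x40123
print(unpack_link(link))  # (1, 0, 291)
```

    A later JMP D with WZ/WC on such a value would, per the table, restore both the flags and the address, which is what lets JMP stand in for LINK.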
  • TubularTubular Posts: 4,646
    edited 2014-04-16 19:49
    jazzed wrote: »
    Any more details on the MSGIN and MSGOUT instructions? Anyone?
    I suppose they are like the P2's serial instructions. No, I'm not asking for serdes.

    They're for configuring sending/receiving data to/from smart pins
    see post #151 of this thread http://forums.parallax.com/showthread.php/155145-Putting-smarts-into-the-I-O-pins
  • jmgjmg Posts: 15,155
    edited 2014-04-16 20:02
    jazzed wrote: »
    Any more details on the MSGIN and MSGOUT instructions? Anyone?
    I suppose they are like the P2's serial instructions. No, I'm not asking for serdes.

    These are how the more complex ports are configured, IIRC via a phantom-link(serial) over a line already routed to each pin.
    Chip also plans to interface the Config and Registers of the PinCell Counters using this same pathway, with the more direct Pin-registers available as Enable/trigger in some cases.
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-04-16 21:42
    Here is a Summary of the latest Instruction Set posted earlier today
    Propeller P16X512x Instructions as of 2014/04/16
    ---------------------------------------------------------------------------------------------------------------------
    ZCxS Opcode  ZC I Cond  Dest       Source     Instr00 01      10      11        Operand(s)                Flags
    ---------------------------------------------------------------------------------------------------------------------
    ZCWS 00000ff ZC I CCCC  DDDDDDDDD  SSSSSSSSS  RDBYTE  RDWORD  RDLONG  RDQUAD    D,S/PTRA/PTRB             ZC ZC ZC ZC 
    ZCMS 00001ff ZC I CCCC  DDDDDDDDD  SSSSSSSSS  SYSOP   MSGIN   MUL     MULS      D,S/#                     ZC ZC ZC ZC 
    ZCMS 00010ff ZC I CCCC  DDDDDDDDD  SSSSSSSSS  ISOB    NOTB    CLRB    SETB      D,S/#                     ZC ZC ZC ZC 
    ZCMS 00011ff ZC I CCCC  DDDDDDDDD  SSSSSSSSS  SETBC   SETBNC  SETBZ   SETBNZ    D,S/#                     ZC ZC ZC ZC 
    ZCMS 00100ff ZC I CCCC  DDDDDDDDD  SSSSSSSSS  ANDN    AND     OR      XOR       D,S/#                     ZC ZC ZC ZC 
    ZCMS 00101ff ZC I CCCC  DDDDDDDDD  SSSSSSSSS  MUXC    MUXNC   MUXZ    MUXNZ     D,S/#                     ZC ZC ZC ZC 
    ZCMS 00110ff ZC I CCCC  DDDDDDDDD  SSSSSSSSS  ROR     ROL     SHR     SHL       D,S/#                     ZC ZC ZC ZC 
    ZCMS 00111ff ZC I CCCC  DDDDDDDDD  SSSSSSSSS  RCR     RCL     SAR     REV       D,S/#                     ZC ZC ZC ZC 
    ZCWS 01000ff ZC I CCCC  DDDDDDDDD  SSSSSSSSS  MOV     ABS     ABSNEG  NEG       D,S/#                     ZC ZC ZC ZC 
    ZCWS 01001ff ZC I CCCC  DDDDDDDDD  SSSSSSSSS  NEGC    NEGNC   NEGZ    NEGNZ     D,S/#                     ZC ZC ZC ZC 
    ZCMS 01010ff ZC I CCCC  DDDDDDDDD  SSSSSSSSS  MIN     MAX     MINS    MAXS      D,S/#                     ZC ZC ZC ZC 
    ZCMS 01011ff ZC I CCCC  DDDDDDDDD  SSSSSSSSS  SUMC    SUMNC   SUMZ    SUMNZ     D,S/#                     ZC ZC ZC ZC 
    ZCMS 01100ff ZC I CCCC  DDDDDDDDD  SSSSSSSSS  ADD     SUB     ADDS    SUBS      D,S/#                     ZC ZC ZC ZC 
    ZCMS 01101ff ZC I CCCC  DDDDDDDDD  SSSSSSSSS  ADDX    SUBX    ADDSX   SUBSX     D,S/#                     ZC ZC ZC ZC 
    ZCWS 01110ff ZC I CCCC  DDDDDDDDD  SSSSSSSSS  NOT     SUBR    ADDABS  SUBABS    D,S/#                     ZC ZC ZC ZC 
    ZCMS 01111ff ZC I CCCC  DDDDDDDDD  SSSSSSSSS  INCMOD  DECMOD  CMPSUB  WAITCNT   D,S/#                     ZC ZC ZC ZC 
    ZCMS 10000ff ZC I CCCC  DDDDDDDDD  SSSSSSSSS  SETS    GETS    SETD    GETD      D,S/#                     ZC ZC ZC ZC 
    ZCMS 10001ff ZC I CCCC  DDDDDDDDD  SSSSSSSSS  SETCOND GETCOND SETI    GETI      D,S/#                     ZC ZC ZC ZC 
    --MS 1001ffn nn I CCCC  DDDDDDDDD  SSSSSSSSS  RORNIBn ROLNIBn GETNIBn SETNIBn   D,S/#                     -- -- -- -- 
    --MS 10100ff nn I CCCC  DDDDDDDDD  SSSSSSSSS  RORBYTn ROLBYTn GETBYTn SETBYTn   D,S/#                     -- -- -- -- 
    --MS 101010f fn I CCCC  DDDDDDDDD  SSSSSSSSS  RORWRDn ROLWRDn GETWRDn SETWRDn   D,S/#                     -- -- -- -- 
    ZCWS 101011f ZC I CCCC  DDDDDDDDD  SSSSSSSSS  ESWAP4  ESWAP8                    D,S/#                     ZC ZC       
    ZCWS 10110ff ZC I CCCC  DDDDDDDDD  SSSSSSSSS  SPLITW  MERGEW  DJZ     DJNZ      D,S/# || D,S/@            ZC ZC ZC ZC 
    ZCWS 10111ff ZC I CCCC  DDDDDDDDD  SSSSSSSSS  TOPBIT  DECOD   ALTDS   JMPSW     D,S/# ||| D,S/@           ZC ZC ZC ZC 
    ZCRS 11000ff ZC I CCCC  DDDDDDDDD  SSSSSSSSS  TESTB   TESTN   TEST    CMP       D,S/#                     ZC ZC ZC ZC 
    ZCRS 11001ff ZC I CCCC  DDDDDDDDD  SSSSSSSSS  CMPX    CMPS    CMPSX   CMPR      D,S/#                     ZC ZC ZC ZC 
    ZCRS 11010ff ZC I CCCC  DDDDDDDDD  SSSSSSSSS  TJZ     TJNZ    TJS     TJNS      D,S/@                     ZC ZC ZC ZC 
    ZCRS 11011ff ZC I CCCC  DDDDDDDDD  SSSSSSSSS  -       -       -       -         D,S/#                     ZC ZC ZC ZC 
    ---------------------------------------------------------------------------------------------------------------------
    --LS 1110000 fL I CCCC  DDDDDDDDD  SSSSSSSSS  WRBYTE          WRWORD            D/#,S/PTRA/PTRB           --    --    
    --LS 1110001 fL I CCCC  DDDDDDDDD  SSSSSSSSS  WRLONG          WRQUAD            D/#,S/PTRA/PTRB           --    --    
    --LS 1110010 fL I CCCC  DDDDDDDDD  SSSSSSSSS  MSGOUTA         MSGOUTB           D/#,S/#                   --    --    
    --LS 1110011 fL I CCCC  DDDDDDDDD  SSSSSSSSS  MSGDIRA         MSGDIRB           D/#,S/#                   --    --    
    --LS 1110100 fL I CCCC  DDDDDDDDD  SSSSSSSSS  WAITPAE         WAITPAN           D/#,S/#                   --    --    
    --LS 1110101 fL I CCCC  DDDDDDDDD  SSSSSSSSS  WAITPBE         WAITPBN           D/#,S/#                   --    --    
    --LS 1110110 fL I CCCC  DDDDDDDDD  SSSSSSSSS  WAITVID         PICKZC            D/#,S/#                   --    --    
    --LS 1110111 fL I CCCC  DDDDDDDDD  SSSSSSSSS  JP              JNP               D/#,S/@                   --    --    
    --LS 1111000 fL I CCCC  DDDDDDDDD  SSSSSSSSS  REP             -                 D/#,S/#                   --    --    
    --LS 1111001 fL I CCCC  DDDDDDDDD  SSSSSSSSS  -               -                 D/#,S/#                   --    --    
    --LS 1111010 fL I CCCC  DDDDDDDDD  SSSSSSSSS  -               -                 D/#,S/#                   --    --    
    --LS 1111011 fL I CCCC  DDDDDDDDD  SSSSSSSSS  -               -                 D/#,S/#                   --    --    
    --LS 1111100 fL I CCCC  DDDDDDDDD  SSSSSSSSS  -               -                 D/#,S/#                   --    --    
    ---------------------------------------------------------------------------------------------------------------------
    ---- 1111101 0f n nnnn  nnnnnnnnn  nnnnnnnnn  AUGS    AUGD                      #23bits                   -- --       
    ---- 1111101 10 0 CCCC  fnnnnnnnn  nnnnnnnnn  LOC     LOC                       #abs | @rel               -- --       
    ---- 1111101 10 1 CCCC  fnnnnnnnn  nnnnnnnnn  JMP     JMP                       #abs | @rel               -- --       
    ---- 1111101 11 0 CCCC  fnnnnnnnn  nnnnnnnnn  CALL    CALL                      #abs | @rel               -- --       
    ---- 1111101 11 1 CCCC  fnnnnnnnn  nnnnnnnnn  CALLA   CALLA                     #abs | @rel               -- --       
    ---- 1111110 ff n CCCC  nnnnnnnnn  nnnnnnnnn  SETPTRA SETPTRA SETPTRB SETPTRB   #abs | @rel | #abs | @rel -- -- -- -- 
    ---------------------------------------------------------------------------------------------------------------------
    --L- 1111111 00 L CCCC  DDDDDDDDD  xxxx000ff  WAIT    WAITPX  WAITPR  WAITPF    D/#                       -- -- -- -- 
    --L- 1111111 00 L CCCC  DDDDDDDDD  xxxx001ff  PUSH    SETVID  -       -         D/#                       -- -- -- -- 
    ZCR- 1111111 ZC x CCCC  DDDDDDDDD  xxxx010ff  JMP     CALL    CALLA   -         D                         ZC ZC ZC ZC 
    ZCR- 1111111 ZC x CCCC  DDDDDDDDD  xxxx011ff  -       -       -       -         D                         ZC -- -- -- 
    ZCW- 1111111 ZC x CCCC  DDDDDDDDD  xxxx100ff  POP     -       -       -         D                         ZC -- -- -- 
    --W- 1111111 00 x CCCC  DDDDDDDDD  xxxx101ff  -       -       -       -         D                         -- -- -- -- 
    ZC-- 1111111 ZC x CCCC  xxxxxxxxx  xxxx110ff  RET     RETA    POLVID  CACHEX                              ZC ZC ZC -C 
    ---- 1111111 00 x CCCC  xxxxxxxxx  xxxx111ff  -       -       -       -                                   -- -- -- -- 
    ---------------------------------------------------------------------------------------------------------------------
    
  • Heater.Heater. Posts: 21,230
    edited 2014-04-17 00:59
    pjv,

    I may well have your scheduling proposals backwards but you will have to explain why.

    I don't poo-poo cooperative scheduling. It's a fine thing and I have used it on many production projects dating back decades.

    Yes I did offer FullDuplexSerial as an example. I believe it's as good an example as any. Having written a FullDuplexSerial in C that runs entirely in a COG and reaches 115200 baud I feel I can say something about it.
    That particular code does in fact ping-pong back and forth between receive and transmit modes, and it is a form of multitasking, albeit a very poor one; one that does not use a scheduler.
    That "ping-pong" of Rx and Tx is commonly known as "coroutines". An ancient and long forgotten technique. There is nothing "very poor" about it. It does what it does very efficiently. Given that we only need two "threads" here it is a very good, low overhead, solution.
    However, what those citing that example seem to be missing is the level of elegance and convenience a "scheduler adds to the mix".
    That would be me. No, I'm not missing the "elegance and convenience" a scheduler can add. However, in the case of FDS a scheduler can only add code that we don't need and consume time that is in short supply. Not a good solution.
    ...is far from general and not optimal.
    True, coroutines may not be general or optimal in the general case. But in the case of FDS I believe they are. FDS does not need a scheduler or a general solution.
    ...the scheduler places it in a "wait" condition thereby also minimizing the power
    You are not understanding how the P1 or the proposed PII works. The only way to get into a lower power mode is to execute a WAITxxx instruction, say, a wait on a pin state. However, a WAITxxx instruction halts the entire COG. All other threads and your scheduler will not be able to run.
    This method permits one to have as many threads as one has room for in a cog, of course consistent with the total work load to be done, and the maximum jitter tolerated.
    Ignoring the low power WAIT idea that cannot work, I agree. If one wants to implement many threads, more than two coroutines, some kind of scheduler is required. That scheduler will of course take time to run, so it has an overhead.
    So my conclusions that caused me to say what I did are based on my interpretation of comments from those who appear to not fully understand.
    I think we understand very well.

    If we are missing a point, I have a challenge for you, an opportunity to prove your case. Can you write a version of FullDuplexSerial for the P1 that uses a scheduler in the form you have proposed that can operate at the same, or higher, baud rate as the maximum of FullDuplexSerial? We would all be very interested to see it. If you can demonstrate power saving that would be a bonus.
    I hope I have cleared up any mysteries regarding my comments, expanded on how I see cooperative routines working,
    You have not. We all know how cooperative schedulers work. What we don't know is how you propose to get the performance, let alone the power saving. I'm not saying it can't be done, but I, at least, don't see it.
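
    For readers following the argument, here is a minimal sketch of the kind of cooperative scheduler under discussion: each task yields back to the scheduler with the number of ticks until it wants to run again, and the scheduler dispatches whichever task is due next. It is written in Python with a simulated clock; the task names, tick counts, and the use of a heap are all assumptions for illustration, not pjv's design or any Propeller code.

```python
import heapq


def blinker(name, period, count, trace):
    """A task: record a unit of work, then yield the ticks to wait before resuming."""
    for i in range(count):
        trace.append((name, i))
        yield period


def run(tasks):
    """Dispatch each task at its due tick; return the final simulated tick."""
    now = 0
    # (due tick, insertion order, task); the order field breaks ties so the
    # generator objects themselves are never compared.
    queue = [(0, order, task) for order, task in enumerate(tasks)]
    heapq.heapify(queue)
    while queue:
        due, order, task = heapq.heappop(queue)
        now = max(now, due)      # stand-in for waiting until the appointed time
        try:
            delay = next(task)   # run the task up to its next yield
        except StopIteration:
            continue             # task finished; drop it
        heapq.heappush(queue, (now + delay, order, task))
    return now


trace = []
end = run([blinker("fast", 2, 3, trace), blinker("slow", 3, 2, trace)])
print(trace)  # [('fast', 0), ('slow', 0), ('fast', 1), ('slow', 1), ('fast', 2)]
print(end)    # 6
```

    Every dispatch passes through the queue bookkeeping in `run`, which is the per-switch overhead referred to above; two coroutines that hand over directly avoid it, at the cost of generality.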

    Do we need hardware scheduling?

    I did some advanced mathematics and arrived at a startling result.

    The previous PII concept has 8 COGs, each capable of running 4 hardware-scheduled threads.
    That's 8 * 4 = 32 threads.

    The current PII concept has 16 COGs each capable of running 2 software coroutines, using JMPRET, very efficiently.
    That's 16 * 2 = 32 threads.

    This is a stunning result, both designs support the same number of threads! The latter design, with no hardware scheduling, has some advantages:

    a) Pairs of coroutines on multiple cores are a lot simpler to reason about for the programmer.
    b) We can actually save power by having a single thread in a COG and using WAITxxx.
    c) Better performance.
    d) A simpler, more lightweight chip design that gets the PII into silicon faster. (A chip that exists is much faster than one that does not :) )

    My conclusion is that hardware-scheduled threads are perhaps nice to have, but the new design does not need them in order to keep up with the old design.

    P.S. Could someone in the know make the calcs and prove c) above?
  • RossHRossH Posts: 5,399
    edited 2014-04-17 02:10
    Heater. wrote: »
    That "ping-pong" of Rx and Tx is commonly known as "coroutines". An ancient and long forgotten technique.

    Ugh! Coroutines are just "goto" statements prettied up and made to sound "structured".

    Like "goto" they have utility in about 1% of cases, and just serve to confound and confuse in the other 99%.

    Ross.
  • Heater.Heater. Posts: 21,230
    edited 2014-04-17 02:27
    RossH,
    Coroutines are just "goto" statements prettied up and made to sound "structured".
    No they are not!

    GOTO, as in JMP, has no expectation of returning to the instruction, or statement in a high level language, following the GOTO.

    Coroutine calls do.

    The coroutines as used in FullDuplexSerial use JMPRET. You know, the same instruction we use for subroutine calls. The return address is saved and the next call to JMPRET gets you back to where you were.

    Further, a GOTO has a specified target location. A coroutine call does not.

    "Confuse and confound"? I don't think so. The alternative here is to make a call to some scheduler component which then makes a return to the next thread's code, or immediately back to the same thread. This is no simpler to comprehend and adds a bunch of run time overhead that essentially does nothing.

    So you get the same challenge. Can you write a FullDuplexSerial that does not use that "confusing" coroutine mechanism that is as efficient and simple as FDS?
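
    The difference from a plain GOTO can be sketched with Python generators standing in for JMPRET: each hand-over resumes the other routine wherever it last left off, so the target is never named in the code doing the hand-over. The stage names below are invented for illustration and are not the actual steps of FullDuplexSerial.

```python
def receiver(log):
    """Rx side: every yield is a point where control is handed to Tx."""
    while True:
        log.append("rx: wait for start bit")
        yield
        log.append("rx: sample data bits")
        yield


def transmitter(log):
    """Tx side: every yield is a point where control is handed back to Rx."""
    while True:
        log.append("tx: check for byte to send")
        yield
        log.append("tx: shift out a bit")
        yield


def ping_pong(steps):
    """Alternate between the two routines for a fixed number of hand-overs."""
    log = []
    current, other = receiver(log), transmitter(log)
    for _ in range(steps):
        next(current)                    # resume where this routine last yielded
        current, other = other, current  # hand over; resume points are kept
    return log


for line in ping_pong(4):
    print(line)
# rx: wait for start bit
# tx: check for byte to send
# rx: sample data bits
# tx: shift out a bit
```

    Note that the second resume of the receiver continues at "sample data bits" rather than restarting at the top, which is the behaviour a fixed-target jump would not give.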
  • evanhevanh Posts: 15,356
    edited 2014-04-17 03:17
    The hardware time slicing gives a much finer grain than what is being achieved in FDS. Bill has gone over this already.

    Does it provide a benefit? Sure it does. Trying to argue it doesn't is pointless.
    Is it required? Of course not, but then neither is a hardware multiplier required.
    Is Chip going to add it? He's already said it depends on ease of fit with hubexec.
  • Brian FairchildBrian Fairchild Posts: 549
    edited 2014-04-17 03:27
    evanh wrote: »
    ...but then neither is a hardware multiplier required.

    Surely multipliers come into their own when used by the compiler to calculate the address of elements within a structure?
  • Heater.Heater. Posts: 21,230
    edited 2014-04-17 03:27
    evanh,

    Yep it does. I have also detailed why and how here a couple of times. I'm not sure why it is being questioned. It could be argued that a cooperative system, with a scheduler or coroutines, has an advantage over a hardware scheduler when one of the threads absolutely needs all the processing power when it needs it and the other(s) are of a lesser priority. However that argument has not been made. And I don't believe that is the case with things like FDS anyway.

    Yep, hardware scheduling would be nice but we can live without it pretty well.
  • Heater.Heater. Posts: 21,230
    edited 2014-04-17 03:35
    Brian,

    A multiply is not required to calculate the address of an element within a structure. Only addition. You have the base address of the structure and you add an offset to get to the field you want.

    Multiply is required to calculate the address of elements within a two dimensional array (or more dimensions) as you have to do something like baseAddress + (xSize * y) + x to get element array[x, y].

    How many 2D arrays have you seen in Propeller programs?

    Of course if you have the memory you can make the array dimensions a power of 2 in size and just use shifts instead of multiply.
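
    The three address calculations mentioned in this post can be sketched as follows. This is plain arithmetic in Python, assuming one address unit per element as the formula in the post does; the function names and the example base address are hypothetical.

```python
def field_address(base, offset):
    """Structure field: base address plus a fixed offset, addition only."""
    return base + offset


def element_address(base, x_size, x, y):
    """2D array element: baseAddress + (xSize * y) + x, which needs a multiply."""
    return base + (x_size * y) + x


def element_address_pow2(base, log2_x_size, x, y):
    """Same element when the row size is a power of two: a shift replaces the multiply."""
    return base + (y << log2_x_size) + x


print(hex(field_address(0x1000, 8)))               # 0x1008
print(hex(element_address(0x1000, 16, 3, 2)))      # 0x1023
print(hex(element_address_pow2(0x1000, 4, 3, 2)))  # 0x1023
```

    With a row size of 16, the shift by 4 gives the same address as the multiply, at the cost of padding the array out to a power-of-two width.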
  • evanhevanh Posts: 15,356
    edited 2014-04-17 03:56
    Surely multipliers come into their own ...

    All features come into their own in their own ways.

    Take the FDS driver: By re-writing as two tasks, one for rx and one for tx, instead of the coroutine mechanism, one could expect a significant increase in maximum bit rate without requiring the use of extra cogs. Or maybe even go for a dual super fast FDS in one cog. :D

    There is also the question of how much specialised hardware is put in around the cogs/hub/pins ... hardware threading provides some extra options for reducing the amount of specialised hardware that gets added.
  • Heater.Heater. Posts: 21,230
    edited 2014-04-17 04:12
    Is there some non linear scaling of complexity, size, power consumption vs features going on here?

    I mean let's say adding hubexec and making it work optimally might involve adding X amount of size, complexity, power consumption (CSP).

    Also implementing hardware threading might add Y amount of CSP.

    Adding them both together to work optimally might not be (X + Y) * CSP but (X * Y) * CSP.

    I.e. the complexity grows faster than you think as all the interactions between the two have to be taken into account.

    Is that why the previous design exploded? Are we just expecting the impossible given the technology constraints?
  • Brian FairchildBrian Fairchild Posts: 549
    edited 2014-04-17 04:28
    Heater. wrote: »
    How many 2D arrays have you seen in Propeller programs?
    In my case: none. But then I usually have another processor alongside the P1 running 'main()'. It's not unusual for me to have arrays of structures containing arrays. Or 2D arrays.

    My designs often have multiple uCs in them. It's not unknown for keypad and display duties to be stuffed into a small 20-pinner which is treated as an intelligent peripheral to offload from the main uC.

    What excites me about the P1+ is that, at last, I'll be able to write in C without having to worry about constraints as basic as the amount of memory the chip has. I can't wait.