The New 16-Cog, 512KB, 64 analog I/O Propeller Chip - Page 33 — Parallax Forums



Comments

  • potatoheadpotatohead Posts: 10,260
    edited 2014-04-15 21:21
    @Phil: It's nice to be loved. :)
  • RossHRossH Posts: 5,399
    edited 2014-04-15 21:23
    That was 46 for P1+, 4 for P2

    Nothing to do with hubexec.

    Not quite true.

    48 said we would accept the P16X32B or P32X32B as Chip had originally specified it - i.e. without Hubexec (but there may have been one or two preferences for Hubexec in there somewhere).

    4 wanted to hold out for the P2, or something else (EDIT: including Hubexec, as msrobots just pointed out).
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-04-15 22:01
    msrobots wrote: »
    Bill,

    you have some short memory there.

    You voted NO because of not including hubexec and slot-sharing.

    #225


    #226



    #227


    It was NOT about P1/P2 it was about P1 on steroids.

    Don't cheat

    Mike
    Mike,
    Actually Chip added hubexec back in.

    As he then went thru it he realised there were a few gotchas with the simplified version of hubexec.
    But that didn't really mean the simplified hubexec would not work, only that a couple of nice things couldn't be done.
    Together we (yes "we" meaning some help from the forum) found a solution for parts, but some other parts are currently too complex.

    So the current hubexec discussion really has nothing to do with the original poll done by Ross.
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-04-15 22:12
    Since there seemed to be no takers on a separate thread, I am posting here.

    Please also see my post #971 below.

    This is a possible solution to the cog stack for CALLA/RETA...

    With just a new "LINKA" instruction and standard cog instructions in cog to support the LINKA & RETA instructions, we could make a cog stack ourselves.
    These support cog instructions may or may not lose a hub cycle, depending on where the code currently is.
    They would not increase user code (beyond the base cog support routine).

    We do not have to use it if we don't want to. It would be at least as fast as hub based stack(s), and likely faster, especially if there were a few instructions replaced with new ones, or someone comes up with a better method.
    Note I have not used PUSHA & POPA as these are currently not guaranteed.
    Note: I have not considered save/restore of Z & C Flags.
    dat
                  ...
                  LINKA     #<routine>              ' <routine> is the 15-bit address of the hub-long/cog routine to be called
                                                    '    This instruction writes a 32-bit long to the fixed cog address _SAVEA
                                                    '       _SAVEA[31] = Z flag
                                                    '       _SAVEA[30] = C flag
                                                    '       _SAVEA[29:15] = 15-bit <routine> address (hub-long or cog)
                                                    '       _SAVEA[14:0]  = 15-bit <return>  address (hub-long or cog)
                                                    '    Then it jumps to the fixed cog address _CALLA
                  ...
    <routine>     ...
                  RETA                              ' This instruction jumps to the fixed cog address _RETA
                                                    '    which will ultimately return to the next instruction after LINKA.
                                                    '    It could be simply coded as JMP #_RETA.
                  ...
    
    ' The following routine must be setup in the cog ram at the fixed location $1Ex
    ' This routine supports the new instructions LINKA & RETA
    ' Note: new instructions could combine the following to simplify the code.
                  org       $1Ex
    _CALLA        movd      _PUSHA, _INDA           ' set the cog stack pointer
                  add       _INDA, #1               ' INDA++
    _PUSHA        mov       *-*, _SAVEA             ' push the return address onto the cog stack
                  shr       _SAVEA, #15             ' get <routine> address into lower 15 bits
                  jmp       _SAVEA                  ' jump to hub-long/cog <routine> 15-bit address
                        
    _RETA         sub       _INDA, #1               ' --INDA
                  movs      _POPA, _INDA            ' set the cog stack pointer
                  nop                               ' (may not be required?)
    _POPA         mov       _SAVEA, *-*             ' pop the return address off the cog stack              
                  jmp       _SAVEA                  ' includes 17 bit jump address (Z&C??)
    _SAVEA        long      0                       ' Z,C,<routine>,<return>              
    _INDA         long      0                       ' INDA cog stack pointer
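
To make the data flow of the proposal above easier to follow, here is a small Python model of it. This is only a sketch: LINKA and RETA are proposed instructions, not real silicon, and the class name, the `stack_base` default, and the masking of the jump target to 15 bits are assumptions made for illustration. The field layout of `_SAVEA` follows the comments in the listing.

```python
# Illustrative model only: LINKA/RETA are proposed, not implemented, instructions.
ADDR_MASK = 0x7FFF  # 15-bit address field, as in the _SAVEA layout above


def pack_savea(z, c, routine, ret):
    """Build the 32-bit _SAVEA long: Z, C, <routine>, <return>."""
    return (
        ((z & 1) << 31)
        | ((c & 1) << 30)
        | ((routine & ADDR_MASK) << 15)
        | (ret & ADDR_MASK)
    )


class CogStackModel:
    """Hypothetical model of the _CALLA/_RETA support routines."""

    def __init__(self, stack_base=0x100):  # stack_base is an assumed value
        self.cog = {}            # sparse model of cog RAM
        self.inda = stack_base   # _INDA: next free stack slot
        self.savea = 0           # _SAVEA

    def linka(self, pc, routine, z=0, c=0):
        """LINKA #routine executed at address pc; returns the new PC."""
        self.savea = pack_savea(z, c, routine, pc + 1)
        self.cog[self.inda] = self.savea        # _PUSHA: push onto cog stack
        self.inda += 1                          # INDA++
        # shr _SAVEA,#15 then jmp; masked here so Z/C do not leak into the PC
        return (self.savea >> 15) & ADDR_MASK

    def reta(self):
        """RETA: pop the saved long and return the <return> address."""
        self.inda -= 1                          # --INDA
        self.savea = self.cog[self.inda]        # _POPA: pop from cog stack
        return self.savea & ADDR_MASK
```

Nested calls unwind in last-in, first-out order, which is the behaviour the cog stack is meant to provide. The model ignores hub timing entirely, so it says nothing about whether a hub cycle is lost.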
    
  • msrobotsmsrobots Posts: 3,706
    edited 2014-04-15 22:15
    Cluso99 wrote: »
    Mike,
    Actually Chip added hubexec back in. As he went thru it he realised there were a few gotchas with the simplified version of hubexec. But that didn't really mean the simplified hubexec would not work, only that a couple of nice things couldn't be done. Together we (yes "we" meaning some help from the forum) found a solution for parts, but some other parts are currently too complex.

    So the current hubexec discussion really has nothing to do with the original poll done by Ross.

    Well the poll was about a P1+ NOT having all the clutter of the P2 and was about a specific Model WITHOUT hubexec.

    So that poll still stands.

    The same people who ruined the P2 thru excessive adding of functionality are doing the same thing now with the new P1+ based on Chips Model of that Thread the poll was in.

    Even when Chip then said he needs to take that hubexec out AGAIN because it is complicating the NEW design over his limits - there is no stopping of the feature creep.

    WHO really needs this and WHY?

    And would ANY of them put their own money on the block? Like people offered on that thread for a P1+ WITHOUT feature creep?

    How about $10 per post for each additional feature you need? just 6000 posts for a shuttle run. We can do better than that in two weeks.

    Enjoy!

    Mike
  • infoinfo Posts: 31
    edited 2014-04-15 22:35
    I don't have full understanding of parallel processing, so I feel kinda like window shopping. I see something interesting but I can't have it. If parallel means several tasks at the same time, then I would think that one processor could read/write i2c memory, another processor could read/write serial port, all at the same time, but that requires hardware usart, etc, so I would really need to see some programs to understand the advantage of cogs, or explanation how is it better than interrupts. The Propeller is huge step up from Basic Stamp, but it isn't easy to understand. At least some parts are not clear even to programmers and taking full advantage of analog dac and adc is really a black magic. It is like hobby magazine that turned into trade magazine for science. Different audience. Many readers left behind unable to catch up. Nothing wrong with that, its just different.
  • Heater.Heater. Posts: 21,230
    edited 2014-04-15 22:49
    info.

    Don't worry about the "parallel programming" part.

    If you need a UART then that UART is written in software and runs on one of those processors. Your program can Tx and Rx by calling functions provided by that software UART. You don't even have to know it is a software UART or that it is running anything in parallel. Same goes for I2C, Spi, Video output etc etc etc. Just use the library objects provided that create these devices for you.

    One day when you are comfortable with this idea you might want to write your own drivers. Then you will start to understand why not having interrupts is wonderful, having a whole processor dedicated to do the job is so much easier.
  • koehlerkoehler Posts: 598
    edited 2014-04-15 23:04
    MSROBOTS,

    I think many would agree with you on the feature creep complaint.

    However, having your mainline code running LMM at 12-15MIPS, no matter how fast your Cog PASM cores are running, seems to be well below current expectations of new customers. Even 25 probably.
    I agree once you hit 50 + 10-12 super Cog cores at 100 MIPS, it's probably interesting enough to give Parallax a test run.
    Otherwise, get an M3/M4 with the peripherals you need and Core MIPS sufficient for the job, and move on.
    Parallax doesn't even get a shot.

    I think people who are against this are simply seeing only their personal use case, and not the commercial use that Parallax needs to remain profitable for the future.

    I doubt the profit from all the annual sales to forum participants (minus commercial lurkers) covers the annual pay of even 1 of their 70 employees.
  • msrobotsmsrobots Posts: 3,706
    edited 2014-04-15 23:25
    Hmm,

    now we are down to 12-15 MIPs LMM? Bill was still talking about 25 MIPs and Ariba has a thread with 50 MIPs LMM.

    The current P1 does 20 MIPS in PASM so C in LMM on the P1+ would be faster than PASM on the P1. Right?

    The current customers paying for Parallax use the existing P1. Having a chip with

    5 times the ram
    5 times the speed,
    2 times the pins
    ADC/DAC

    and NOT eating 5 Watt

    Might give them a chance to enhance their products while keeping the Propellers.

    remember. The featuritis already ate the P2.

    Want to kill P1+ too?

    me not.

    Mike
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-04-16 00:04
    Here is what I also meant to post with my hubexec post #965 above...

    HUBEXEC
    Here is where I see we are up to with hubexec...
    * Hub can be accessed as 128bits (QUAD)
    * We can read a block of 4 instructions into a cog cache (ICACHE)
    * We can execute up to 4 instructions out of this ICACHE (presuming no branches and that the initial address is quad-long aligned).
    * We can have CALLA & CALLB using PTRA & PTRB as hub pointers where the return address is stored.
    * RETA & RETB also uses PTRA & PTRB to fetch the return address.
    * We can have PUSHA & PUSHB and POPA & POPB too.
    * So, support for hub stacks is possible.
    * We can have LINK (a CALL where the return address is placed in a fixed cog register, currently $1EF). This supports GCC, etc.
    * We can have a 4 deep 19bit buried LIFO for CALL & RET, PUSH & POP. There is a preference to remove this.
    What we cannot have (currently anyway) is a Cog Stack using INDA & INDB because the instruction timing does not work.
    * We can overcome INDA & INDB for indirect cog access by using a new ALTDS instruction. This can be hidden from the user by the compiler, but it adds an instruction either way.
    Are there alternate solutions ???
    * Maybe increase the depth of the buried LIFO ???
    This may remove the need to use a PTRA/PTRB based cog stack.
    * Maybe use a specialised LINKA & RETA and cog support sw as I described in my earlier post.
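
As a rough illustration of the quad-fetch point in the list above, the following Python sketch counts how many 128-bit hub reads a straight-line run of instructions would need. It assumes addresses are long indices and that a quad is four consecutive longs aligned on a multiple of four; the function names are made up for this example, and hub timing is not modelled.

```python
QUAD_LONGS = 4  # one 128-bit hub read holds four 32-bit instructions


def longs_left_in_quad(addr):
    """Instructions still available in the ICACHE when entering at addr."""
    return QUAD_LONGS - (addr % QUAD_LONGS)


def quad_fetches(start, count):
    """Quad reads needed to run `count` sequential instructions from `start`."""
    if count <= 0:
        return 0
    first_quad = start // QUAD_LONGS
    last_quad = (start + count - 1) // QUAD_LONGS
    return last_quad - first_quad + 1
```

Entering at a quad-aligned address gives the full four instructions per read, while a branch into the last long of a quad gets only one instruction before the next read is needed.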
  • Brian FairchildBrian Fairchild Posts: 549
    edited 2014-04-16 00:15
    Just a random early morning thought but does it matter if all Cores don't have equal access to the RAM?

    In other words there are 16 cores which have restricted access to act as intelligent IO and 4 cores which are closely coupled to the RAM in a central block, along with the CORDIC unit, that provide the heavy lifting.

    The IO cores are ultra-simple with no support for threading. The central cores have threading capability.

    I guess this is just a fixed version of the various slot allocation schemes proposed but it might result in simpler logic.
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-04-16 00:20
    msrobots wrote: »
    Hmm,

    now we are down to 12-15 MIPs LMM? Bill was still talking about 25 MIPs and Ariba has a thread with 50 MIPs LMM.

    The current P1 does 20 MIPS in PASM so C in LMM on the P1+ would be faster than PASM on the P1. Right?

    The current customers paying for Parallax use the existing P1. Having a chip with

    5 times the ram
    5 times the speed,
    2 times the pins
    ADC/DAC

    and NOT eating 5 Watt

    Might give them a chance to enhance their products while keeping the Propellers.

    remember. The featuritis already ate the P2.

    Want to kill P1+ too?

    me not.

    Mike
    Just because we voted for the P1+ didn't mean we really did not want hubexec. But the P2 was not going anywhere soon.
    Chip took the P1+ onboard and added in a few P2 features - his choice, not ours - although obviously we embraced it.

    Chip, and many others including myself, realise that hubexec, no matter how crippled, is way better than sw LMM. LMM runs 4x slower than cog code, and slower again every time a jump/call/ret occurs (because multiple hub instructions are executed each time) and every time a hub data location is accessed. That is likely where the 12-15 MIPS that Bill is referring to comes from. BUT, the cog is still using at least 4x the power of hubexec mode, because every hub instruction takes 4 cog instructions to execute.
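
The arithmetic behind those figures can be sketched in a few lines of Python. The 4 cog instructions per LMM instruction comes from this post, and the 20 MIPS and 100 MIPS cog rates are the figures quoted earlier in the thread. The branch fraction and branch cost passed in the usage note are invented purely to show how the average drops; they are not measurements.

```python
def lmm_mips(cog_mips, cog_instr_per_lmm_instr=4):
    """Straight-line LMM rate: each hub instruction costs several cog instructions."""
    return cog_mips / cog_instr_per_lmm_instr


def lmm_mips_with_overhead(cog_mips, branch_fraction, branch_cost, base_cost=4):
    """Average LMM rate when a fraction of instructions are costlier branches.

    branch_fraction and branch_cost are hypothetical inputs, not measured values.
    """
    average_cost = (1 - branch_fraction) * base_cost + branch_fraction * branch_cost
    return cog_mips / average_cost
```

With these definitions, `lmm_mips(20)` gives 5.0 for a P1-class cog and `lmm_mips(100)` gives 25.0, matching the 25 MIPS figure mentioned earlier. Assuming, for illustration only, that 20% of instructions are jumps costing 16 cog instructions each, `lmm_mips_with_overhead(100, 0.2, 16)` lands at roughly 15.6, which is in the neighbourhood of the 12-15 MIPS range under discussion.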

    It is also considered a kludge. A very smart kludge in the P1 as there was otherwise no alternative. The silicon was done!

    Here we have the opportunity to discuss the problems and try and find solutions.

    IMHO hubexec was the single greatest benefit that the P2 ultimately uncovered. It would be such a shame to waste this without some constructive thought, and without trashing the idea.

    Otherwise, the P1+ (or whatever it is called) cannot really say with all honesty that it can run 512KB programs without real qualifiers. We are destined to the less than 496 instruction limit.

    I realise you may not be able to contribute to valid alternatives, but there are some of us here who have the experience to suggest varying alternatives. And who knows, one of them may just be the seed that opens Chip's mind to a simple alternative. I have seen this often enough to know this works - Chip has even said so!
  • cgraceycgracey Posts: 14,133
    edited 2014-04-16 00:22
    Question:

    Is anyone going to be disappointed if we get rid of hardware multitasking in the cogs?

    It's a cool feature, but does introduce jitter in tasks, depending on the instruction mix. It also takes some extra flops and logic to support properly, beyond the Z/C/PC's.

    I'm thinking about the ROM_Monitor and realizing that I could code it with a single task by doing cooperative multitasking at a few time-critical points in the program. I wouldn't need hardware multitasking, after all.

    No multitasking would keep the cogs very simple to understand, and keep them deterministic.

    Any thoughts?
  • Roy ElthamRoy Eltham Posts: 2,996
    edited 2014-04-16 00:26
    Chip,
    I am fine without the hardware multitasking. I've never been a fan of it personally. I think it's much less important now that we have 16 COGs, and like you have said you can still do software based cooperative multitasking as needed.

    I know there are some folks that really like it and wanted it... not sure how badly they will miss it...
  • Peter JakackiPeter Jakacki Posts: 10,193
    edited 2014-04-16 00:27
    cgracey wrote: »
    Question:

    Is anyone going to be disappointed if we get rid of hardware multitasking in the cogs?

    It's a cool feature, but does introduce jitter in tasks, depending on the instruction mix. It also takes some extra flops and logic to support properly, beyond the Z/C/PC's.

    I'm thinking about the ROM_Monitor and realizing that I could code it with a single task by doing cooperative multitasking at a few time-critical points in the program. I wouldn't need hardware multitasking, after all.

    No multitasking would keep the cogs very simple to understand, and keep them deterministic.

    Any thoughts?

    If I left this reply blank I think everyone would know what I said. Yes, leave it out, we have multiple cores and smart I/O but no silicon so if this makes silicon possible sooner, then please do.
  • Brian FairchildBrian Fairchild Posts: 549
    edited 2014-04-16 00:32
    cgracey wrote: »
    Is anyone going to be disappointed if we get rid of hardware multitasking in the cogs?

    Not me: in fact I suggested something similar just two posts above you.
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-04-16 00:33
    cgracey wrote: »
    Question:

    Is anyone going to be disappointed if we get rid of hardware multitasking in the cogs?

    It's a cool feature, but does introduce jitter in tasks, depending on the instruction mix. It also takes some extra flops and logic to support properly, beyond the Z/C/PC's.

    I'm thinking about the ROM_Monitor and realizing that I could code it with a single task by doing cooperative multitasking at a few time-critical points in the program. I wouldn't need hardware multitasking, after all.

    No multitasking would keep the cogs very simple to understand, and keep them deterministic.

    Any thoughts?
    I am fine without multi-tasking, especially with 16 cogs!
    Keep the silicon and power for killer features, not niceties that we don't really need.

    BTW I don't consider the ROM Monitor essential either. We can always load it up from external flash.
  • cgraceycgracey Posts: 14,133
    edited 2014-04-16 00:41
    Roy Eltham wrote: »
    ...I know there are some folks that really like it and wanted it... not sure how badly they will miss it...


    It's valuable for fine-grained timing that deals with I/O. If we have smart pins, it's needed much less.
  • SapiehaSapieha Posts: 2,964
    edited 2014-04-16 00:44
    Hi Chip.

    In my opinion --->
    The simplest possible HubExec has more value than tasks.

    As the IC will have 16 COGs/cores, a second COG/core can take over any task, using the internal I/O port.



    cgracey wrote: »
    Question:

    Is anyone going to be disappointed if we get rid of hardware multitasking in the cogs?

    It's a cool feature, but does introduce jitter in tasks, depending on the instruction mix. It also takes some extra flops and logic to support properly, beyond the Z/C/PC's.

    I'm thinking about the ROM_Monitor and realizing that I could code it with a single task by doing cooperative multitasking at a few time-critical points in the program. I wouldn't need hardware multitasking, after all.

    No multitasking would keep the cogs very simple to understand, and keep them deterministic.

    Any thoughts?
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2014-04-16 00:56
    cgracey wrote:
    Is anyone going to be disappointed if we get rid of hardware multitasking in the cogs?
    No! Absolutely not! (Same goes for hubexec.) Now we're getting back to the Propeller's roots, and I'm liking it.

    -Phil
  • RossHRossH Posts: 5,399
    edited 2014-04-16 01:20
    No! Absolutely not! (Same goes for hubexec.) Now we're getting back to the Propeller's roots, and I'm liking it.

    -Phil

    Agree.
  • Heater.Heater. Posts: 21,230
    edited 2014-04-16 01:58
    Hardware multi-tasking is not essential, especially now that there are twice as many COGs.

    The whole point of hardware multi-tasking was to make better use of those big "expensive" COGs when there were few. Now we have leaner, meaner COGs and there are many.

    Does it make HUB exec any easier or is that still the same problem?
  • cgraceycgracey Posts: 14,133
    edited 2014-04-16 02:00
    Heater. wrote: »
    Hardware multi tasking is not essential, especially now that there are twice as many COGs.

    Does it make HUB exec any easier or is that still the same problem?


    Not dealing with tasks makes hub exec a little simpler.
  • Heater.Heater. Posts: 21,230
    edited 2014-04-16 02:07
    If dropping multi-tasking makes hub exec even a little simpler then that's a good reason to drop multi-tasking.

    I would imagine that a simple hubexec that is faster than a software LMM loop but perhaps not as fast as is theoretically possible (given an infinite number of gates and zero power consumption) is still very valuable.

    Edit: "have" corrected to "drop"
  • Brian FairchildBrian Fairchild Posts: 549
    edited 2014-04-16 02:23
    Heater. wrote: »
    If dropping multi-tasking makes hub exec even a little simpler then that's a good reason to have multi-tasking.
    Do you mean that? Or should there be a 'not' in there?
  • Roy ElthamRoy Eltham Posts: 2,996
    edited 2014-04-16 02:27
    I think Heater meant 'drop' instead of 'have'.
  • evanhevanh Posts: 15,356
    edited 2014-04-16 02:29
    The hardware threads were born out of two factors,
    1: The lack of extra Cogs in the Prop2 design.
    2: The significant increase in performance and realestate of each Cog.

    So, with the now simpler 16 Cogs, that need has diminished.

    That said, I'd still be interested in something that could exist as a Cog-only feature. Being able to partition off some fixed amount of MIPS for an assistant task was a nice idea all on its own.
  • TubularTubular Posts: 4,646
    edited 2014-04-16 02:31
    I'd miss tasks. I have three asynchronous tasks running in a single (P3-) cog, and only one of those (video stream) needs to interact with the hub rather than the pins.

    If tasks go then the replacement is probably a 4~6 cog "traditional P1 style" cog solution, with key functions divided into one P1+ cog each. It would have a heap of data going via hub, which would be unnecessary, but it'd still work. 4~6 cogs depending on how smart the pins are.

    The good news is I'd have more cogs left over now, and perhaps get a P1+ slightly sooner, which has appeal too. But it worries me slightly that we're getting towards the point where two P1's (also giving 16 cogs, 64 pins, but available now) is looking like valid competition. I know that's simplistic.

    I haven't looked in depth at software tasking, so I'd be interested in your proposed general approach for the monitor, Chip. I expect there would be lots of comms objects where you have an input stream, an output stream, and a command processor / state machine, so I'd also like to know what kind of performance looks achievable with a software approach.
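
For readers unfamiliar with the software approach, here is a minimal Python sketch of cooperative multitasking with the three parts mentioned above: an input stream, a command processor, and an output stream. It is not Chip's design for the ROM monitor. The task names, the upper-casing "command", and the round-robin scheduler are all assumptions chosen to keep the example short. Each task runs until it reaches a `yield`, which plays the role of the explicit hand-over points a single-task cog program would need.

```python
from collections import deque


def rx_task(incoming, rx_queue):
    """Input stream: accept one character per turn, then give up control."""
    for ch in incoming:
        rx_queue.append(ch)
        yield


def command_task(rx_queue, tx_queue):
    """Command processor: echo each completed line upper-cased (hypothetical)."""
    line = []
    while True:
        while rx_queue:
            ch = rx_queue.popleft()
            if ch == "\n":
                tx_queue.extend("".join(line).upper() + "\n")
                line.clear()
            else:
                line.append(ch)
        yield


def tx_task(tx_queue, sent):
    """Output stream: send at most one character per turn."""
    while True:
        if tx_queue:
            sent.append(tx_queue.popleft())
        yield


def run(tasks, max_turns):
    """Round-robin scheduler: every task runs until its next yield."""
    active = list(tasks)
    for _ in range(max_turns):
        if not active:
            break
        for task in list(active):
            try:
                next(task)
            except StopIteration:
                active.remove(task)  # task finished; stop scheduling it
```

The trade-off compared with hardware tasks is that a task which fails to yield often enough delays all the others, so the yield points have to be placed with the timing-critical paths in mind.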
  • jmgjmg Posts: 15,155
    edited 2014-04-16 02:32
    cgracey wrote: »
    Question:

    Is anyone going to be disappointed if we get rid of hardware multitasking in the cogs?

    It's a cool feature, but does introduce jitter in tasks, depending on the instruction mix. It also takes some extra flops and logic to support properly, beyond the Z/C/PC's.

    I'm thinking about the ROM_Monitor and realizing that I could code it with a single task by doing cooperative multitasking at a few time-critical points in the program. I wouldn't need hardware multitasking, after all.

    No multitasking would keep the cogs very simple to understand, and keep them deterministic.

    Any thoughts?

    Any numbers ? ie what MHz is attained with/without this enabled, and what is the exact die impact.

    Given this is an entirely optional feature, that does bring some important marketing edge, plus an ability to use that expensive COG Memory more fully, as well as Debug & watchdog abilities, it seems the sort of thing to remove only when there is a pressing reason to do so. ( like Area or Power envelopes being hit)

    What is the size increase in the ROM monitor, without tasking ?
  • jmgjmg Posts: 15,155
    edited 2014-04-16 02:36
    cgracey wrote: »
    Not dealing with tasks makes hub exec a little simpler.

    - but tasks are optional ? - so it could be that hub exec needs tasks disabled ?
    Not every COG is going to need HubExec, and I'm not sure both Hub Exec and Tasks in the same COG is vital, but tasks will allow more to fit into a COG - just look what was packed into a P2 COG with tasks.