What are your 3 most wanted features from a P1.1? Now with bonus questions! - Page 2 — Parallax Forums

Comments

  • Dave Hein Posts: 6,347
    edited 2015-02-04 13:30
    [QUOTE=mark
  • pik33 Posts: 2,350
    edited 2015-02-04 13:43
    3 most wanted features?

    1. mul
    2. portb
    3. speed (clock/pll)
  • User Name Posts: 1,451
    edited 2015-02-04 13:56
    [QUOTE=mark
  • mark Posts: 252
    edited 2015-02-04 14:11
    Hmm... My understanding is that die size has a non-negligible effect on the cost of a chip because during each step of the manufacturing of ICs with large geometries, you're making fewer parts per wafer. Granted, the smaller process likely costs more, but I would assume the labor prices are roughly the same. That brings up an interesting question: at what point (after how many units produced) would a P1 on a smaller process result in cost savings?
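    As a rough sketch of the break-even question: a shrink pays off once the per-unit saving has covered the one-time cost of the new tapeout. The function below assumes a single lump-sum NRE and a fixed per-unit cost on each process; the name and the figures in the usage note are hypothetical placeholders, not Parallax numbers.

```c
/* Break-even volume for moving a design to a cheaper-per-unit process.
 * nre           : one-time cost of the new tapeout (masks, engineering)
 * old_unit_cost : cost per finished part on the existing process
 * new_unit_cost : cost per finished part on the smaller process
 * Returns the number of units needed to recover the NRE, or -1.0 if the
 * new process does not actually save money per unit. */
static double break_even_units(double nre, double old_unit_cost,
                               double new_unit_cost)
{
    double saving_per_unit = old_unit_cost - new_unit_cost;
    if (saving_per_unit <= 0.0)
        return -1.0;
    return nre / saving_per_unit;
}
```

    For example, with a hypothetical $1,000,000 NRE and a saving of $0.50 per part, break_even_units(1000000.0, 2.00, 1.50) gives 2,000,000 units.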
  • User Name Posts: 1,451
    edited 2015-02-04 14:32
    'Tis a good question! There are also heat dissipation, leakage current, and speed to consider. But I'm likely to be happy with whatever speed and current sourcing/sinking limitations might be imposed by the design rules.

    BTW, the P1 is 7.28 mm on edge (perfectly square), has around 3e6 transistors, and is implemented @ 350nm node. A 90nm version ought to fit nicely into a die 2 mm on edge.
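    The 2 mm estimate follows from ideal linear scaling of the edge by the ratio of the node sizes: 7.28 mm × 90/350 ≈ 1.87 mm. The sketch below does that arithmetic and also counts how many whole square dies fit on a wafer by checking each grid cell's farthest corner against the wafer radius. The 200 mm wafer diameter is an assumption, and scribe lanes, edge exclusion and yield are ignored, so the counts are illustrative upper bounds only; the function names are made up.

```c
/* Ideal linear shrink: every dimension scales with the process node. */
static double scaled_edge_mm(double edge_mm, double old_nm, double new_nm)
{
    return edge_mm * new_nm / old_nm;
}

/* Count square dies of side die_mm that fit entirely on a circular wafer
 * of diameter wafer_mm, on a grid with one grid corner at the wafer centre.
 * Scribe lanes, edge exclusion and defects are not modelled. */
static long dies_per_wafer(double wafer_mm, double die_mm)
{
    double r = wafer_mm / 2.0;
    long n = (long)(r / die_mm) + 1;      /* cells to scan per half-axis */
    long count = 0;

    for (long i = -n; i < n; i++) {
        for (long j = -n; j < n; j++) {
            /* Cell (i, j) spans [i*d, (i+1)*d] on each axis; the corner
             * farthest from the centre decides whether the die fits. */
            double x = (i >= 0) ? (double)(i + 1) * die_mm : (double)i * die_mm;
            double y = (j >= 0) ? (double)(j + 1) * die_mm : (double)j * die_mm;
            if (x * x + y * y <= r * r)
                count++;
        }
    }
    return count;
}
```

    Because a smaller die loses proportionally less area at the wafer edge, the ratio of the two counts tends to exceed the raw area ratio of (7.28/2)² ≈ 13.2.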
  • Cluso99 Posts: 18,069
    edited 2015-02-04 18:23
    FYI There is a P8X16A... A P1 with 16 I/O in a DIP24 0.6"w package.

    http://forums.parallax.com/showthread.php/149255-P8X16A-DIP24-16xI-O-Propeller-now-working-)?highlight=p8X16a

    A P8X12A in a DIP20 0.6"w package is possible, but it can only boot from EEPROM (ie no download capability)

    ;););)
  • mark Posts: 252
    edited 2015-02-04 21:20
    Cluso99 wrote: »
    FYI There is a P8X16A... A P1 with 16 I/O in a DIP24 0.6"w package.

    http://forums.parallax.com/showthread.php/149255-P8X16A-DIP24-16xI-O-Propeller-now-working-)?highlight=p8X16a

    A P8X12A in a DIP20 0.6"w package is possible, but it can only boot from EEPROM (ie no download capability)

    ;););)

    Hahaha. Funny thread. Still waiting for a 2 I/O version.
  • ksltd Posts: 163
    edited 2015-02-05 17:38
    While a linear scaling of the 350nm device might lead you to believe that you can get the die size down to 2mm square in 90, I believe you'll have a hard time placing 11 pads on a 2mm side. And linear scaling is rarely achievable.
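    The pad objection can be put in numbers: a 44-pin package needs about 11 pads per side, so the pitch available is the die edge divided by 11. The sketch below computes that pitch and flags a die as pad-limited when the pads, at some minimum pitch, need a longer edge than the core would. The 200 µm minimum pitch in the usage note is a hypothetical placeholder, not a figure for any real process or assembly house, corner space is ignored, and the function names are illustrative.

```c
#include <stdbool.h>

/* Pitch available per pad (micrometres) if pads are spread evenly along one edge. */
static double pad_pitch_um(double edge_mm, int pads_per_side)
{
    return edge_mm * 1000.0 / pads_per_side;
}

/* A die is pad-limited when the pad ring, at the minimum usable pitch,
 * needs a longer edge than the logic core by itself would. */
static bool is_pad_limited(double core_edge_mm, int pads_per_side,
                           double min_pitch_um)
{
    double ring_edge_mm = pads_per_side * min_pitch_um / 1000.0;
    return ring_edge_mm > core_edge_mm;
}
```

    pad_pitch_um(2.0, 11) is about 182 µm, versus about 662 µm at the P1's 7.28 mm edge. With the assumed 200 µm minimum, is_pad_limited(1.872, 11, 200.0) is true (the ring needs 2.2 mm), so the shrunk die would have to grow to fit its pads.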

    Bottom line is that if one part is going to be taped out, it's hard to imagine having fewer than 32 IOs, as IO count is a major constraint for many designs. I believe that being pin compatible has little value, but when I think about what I'd do with the device, I always come back to the same 44-pin LQFP for cost-of-assembly reasons in the market that matters.

    So while I can't imagine seriously saying that package/pin compatibility matters and software compatibility doesn't matter - that's exactly what I'd advocate here. Same package, same pinout and punt on software compatibility.
  • Dave Hein Posts: 6,347
    edited 2015-02-05 19:21
    1. 1 GHz instruction rate
    2. 16 MB Hub RAM
    3. 1080p60 HDMI Video Out
    4. $2 Price
  • Tubular Posts: 4,622
    edited 2015-02-05 19:29
    1. Code protection
    2. More pins
    3. Analog

    4. More ram
  • ozpropdev Posts: 2,791
    edited 2015-02-05 19:56
    1. Hub execution
    2. Floating point math
    3. On chip FLASH (secure)
    .
    4. 640K Hub ram ;)
  • msrobots Posts: 3,704
    edited 2015-02-05 20:09
    1. Fast prop2Prop connection. Out of COGs? Out of PINs? get another Prop.
    2. ROM as prefilled RAM. Override when you need the space, but not the ROM content.
    3. fast Input mode via Video shifter.

    4. Faster.

    Enjoy!

    Mike
  • mark Posts: 252
    edited 2015-02-05 20:40
    Dave Hein wrote: »
    1. 1 GHz instruction rate
    2. 16 MB Hub RAM
    3. 1080p60 HDMI Video Out
    4. $2 Price

    Is that it?
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2015-02-05 22:28
    The P1.x/P2 drama kinda reminds me of a South Park episode:
    1. Solicit would-be customers for suggestions.
    2. ????
    3. PROFIT!

    :)

    -Phil
  • Dave Hein Posts: 6,347
    edited 2015-02-06 07:23
    [QUOTE=mark
  • Heater. Posts: 21,230
    edited 2015-02-06 08:51
    Yes, but Dave, what you want is already available. Get yourself a Raspi 2 or similar.

    Except item 10.

    10. Tightly coupled COGs so that a single threaded program can be run on multiple COGs

    How is that even possible?
  • bte2 Posts: 154
    edited 2015-02-06 09:04
    My list is short-

    I would like to see 5v tolerant pins, maybe a pin to apply port-specific Vcc to a group of pins. Heavier diodes on the input pins would be fantastic too.

    I would also like to have the program exist in the Prop to eliminate the boot pause. A full second to clear a display or LED that powers up 'on' seems like an eternity. I have some T-1 3/4 WS2812 LEDs that power up blue (for some stupid reason) and I hate it that it takes so long to clear them.

    My biggest wish has nothing to do with the Propeller: I wish there were conditional compilation directives in Prop Tool and/or an inline INCLUDE directive so I could keep data tables and pin constants in their own files.
  • Dave Hein Posts: 6,347
    edited 2015-02-06 09:06
    Heater. wrote: »
    10. Tightly coupled COGs so that a single threaded program can be run on multiple COGs

    How is that even possible?
    Item 10 is a tough one. Maybe it could be done with an 8-port register file, where each COG can read or write a register every cycle. The compiler would need to be aware of the interaction among the COGs, and generate code for each COG that all works together, and doesn't step on each other. Single threaded code could be broken up into chunks that can be done independently, and farmed out to multiple processors.
  • Seairth Posts: 2,474
    edited 2015-02-06 09:11
    Dave Hein wrote: »
    Hey, you wanted to know what I wanted. And my requested features are as realistic as anybody else's since the P1.1 will never be made in silicon.

    Okay, how about this:
    • Add 32 pins (Port B)
    • Add 8 more cogs (no changes to the cogs themselves)
    • Double the maximum clock speed
    • Add one external pin which controls whether P1.1 hub runs in 8-cog mode (and is therefore backward compatible with P1 for code execution) or runs in 16-cog mode.

    (Note: this would also effectively double the number of counters.)

    That's it. It's certainly not a P2-level improvement, but that's what the P2 is for. This just gives you more of what already exists, as well as an easy migration path from P1 to P1.1. Also, we should be able to quickly verify this on an FPGA, as it is still purely digital I/O.

    But, with such a "simple" iteration, not only would this allow the Propeller to target larger projects (because of the doubled cogs and I/O), but it would add the ability to extend existing projects. One such example is: the additional cogs and I/O pins would make the addition of external RAM realistic. Or, it might just be that code can be spread out over more cogs. Regardless, there are certainly a number of projects that have had to carefully squeeze their code together to fit on the P1, which might have some breathing room (and therefore, ability to be improved) with the P1.1.
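    One consequence of a 16-cog mode is hub timing. On the P1 the hub grants access round-robin, one cog every two clocks, so with 8 cogs each cog gets a window every 16 clocks. The sketch below assumes the same two clocks per slot carries over unchanged to a 16-cog hub (an assumption, not a spec) and converts the window period to nanoseconds; the function names are made up.

```c
/* Clocks between successive hub windows for one cog, assuming the hub
 * grants one cog per slot in strict round-robin order. */
static unsigned hub_period_clocks(unsigned cogs, unsigned clocks_per_slot)
{
    return cogs * clocks_per_slot;
}

/* Same period expressed in nanoseconds at a given system clock (MHz). */
static double hub_period_ns(unsigned cogs, unsigned clocks_per_slot,
                            double clk_mhz)
{
    return hub_period_clocks(cogs, clocks_per_slot) * 1000.0 / clk_mhz;
}
```

    At the P1's 80 MHz the 8-cog window recurs every 200 ns; a 16-cog hub at a doubled 160 MHz clock gives 32 clocks, which is again 200 ns. Under this assumption, doubling the clock only holds per-cog hub latency steady rather than improving it.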
  • Heater. Posts: 21,230
    edited 2015-02-06 09:58
    Dave,
    Item 10 is a tough one.
    It certainly is.

    After decades of research into having compilers automatically finding possibilities of parallelizing single threaded programs and then compiling those programs to exploit multiple CPUs we have basically gotten nowhere. In fact I get the impression that people have given up on the idea.

    However, there are things like OpenMP that let you mark up your single-threaded code with hints to the compiler as to how to parallelize it.

    If that is good enough for you then you already have your dream come true. Item 10 is done on the P1 with prop-gcc.

    As a proof of the OpenMP concept on the Propeller the heater fft has been parallelizable for ages now. The same FFT code can be compiled to make use of 2, 4 or even 8 COGs.

    Have a look at the code here: https://github.com/ZiCog/fftbench/blob/master/fftbench.c
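    For readers who haven't used OpenMP, the idea looks like this: ordinary single-threaded C with a pragma telling the compiler how to split a loop across threads (on the Propeller, across COGs). This is a generic illustration, not code from fftbench, and the function name is made up. Built without OpenMP support, the pragma is ignored and the same loop runs serially with the same result.

```c
/* Sum of i*i for i in [0, n). The reduction clause gives each thread a
 * private partial sum and adds the partial sums when the loop finishes. */
static long long sum_of_squares(int n, int threads)
{
    long long total = 0;
    (void)threads;  /* only consulted when compiled with OpenMP enabled */
#pragma omp parallel for num_threads(threads) reduction(+:total)
    for (int i = 0; i < n; i++)
        total += (long long)i * i;
    return total;
}
```

    Built with GCC's -fopenmp, sum_of_squares(1000, 4) should return 332833500 for any thread count; comparing results across thread counts is a cheap way to spot the kind of wrong answers described below.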

    Only problem is that the fft was giving the wrong results last time I tried it on a Prop. A long time ago. Despite working on every other multi-core machine I tried it on.

    Perhaps I should test it again against a more recent prop-gcc.
  • Seairth Posts: 2,474
    edited 2015-02-06 10:08
    Heater. wrote: »
    After decades of research into having compilers automatically finding possibilities of parallelizing single threaded programs and then compiling those programs to exploit multiple CPUs we have basically gotten nowhere. In fact I get the impression that people have given up on the idea.

    Though not specifically compiler-related, take a look at:

    https://newsoffice.mit.edu/2015/new-priority-queues-data-structure-0130
  • Dave Hein Posts: 6,347
    edited 2015-02-06 10:42
    Heater. wrote: »
    Only problem is that the fft was giving the wrong results last time I tried it on a Prop. A long time ago. Despite working on every other multi-core machine I tried it on.
    I think there is a bug in the PropGCC mutex lock/unlock routines. It may be that OpenMP relies on this feature. In my multi-threaded chess program I use pthreads and a mutex lock. However, the mutex lock didn't work correctly on the Prop, so I had to use a hardware lock instead. Maybe if the mutex lock is fixed it will fix OpenMP as well.
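    The P1's hardware locks behave as test-and-set bits: LOCKSET sets a lock and reports whether it was already set, and a cog spins until it sees it was previously clear. A portable C11 analogue is atomic_flag, sketched below. This is not PropGCC's mutex implementation or its lock API; the function names are invented for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Try once: set the flag and report whether it was clear beforehand,
 * i.e. whether this caller now owns the lock. */
static bool lock_try_acquire(atomic_flag *lock)
{
    return !atomic_flag_test_and_set_explicit(lock, memory_order_acquire);
}

/* Spin until the lock is obtained, like repeating LOCKSET until the
 * returned previous state is clear. */
static void lock_acquire(atomic_flag *lock)
{
    while (!lock_try_acquire(lock))
        ;
}

/* Clear the flag so another thread can take the lock. */
static void lock_release(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}
```

    A lock declared as atomic_flag guard = ATOMIC_FLAG_INIT; can then protect shared data between threads with lock_acquire(&guard) and lock_release(&guard).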
  • Heater. Posts: 21,230
    edited 2015-02-06 11:15
    Dave,

    So it may not be me. At the time I spent an age trying to determine if this was a problem in OMP in prop-gcc or a problem in my code.

    For sure mutex lock/unlock is involved in there.

    Is this a reported bug in an issue tracker somewhere?
  • jmg Posts: 15,148
    edited 2015-02-06 11:32
    Seairth wrote: »
    • Add 32 pins (Port B)
    • Add 8 more cogs (no changes to the cogs themselves)
    • Double the maximum clock speed
    • Add one external pin which controls whether P1.1 hub runs in 8-cog mode (and is therefore backward compatible with P1 for code execution) or runs in 16-cog mode.

    That's certainly within a P1V realm, and I would add
    • Arrange die & bonding so it can go into existing P1 packages too

    The Pin-mode may be able to be a RAM bit, that defaults to P1 ?
    There, you could even define a 4 bit COG count, to soft adjust HUB slot spacing. Easily set to 8,16.

    Note however that Analog PLLs and ADCs that use a pin-threshold are considered custom design, so whilst the above list is easy enough to try on a big FPGA, the full tapeout is still not going to be cheap.
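    The 4-bit cog-count idea can be modelled as a round-robin hub whose rotation length comes from a register field. In this sketch the field encoding (stored value + 1 = number of cogs in rotation, so 0..15 covers 1..16) is a hypothetical choice, as are the function names.

```c
#include <stdint.h>

/* Hypothetical encoding: a 4-bit field holding (cog count - 1). */
static unsigned cogs_in_rotation(uint8_t count_field)
{
    return (unsigned)(count_field & 0x0Fu) + 1u;
}

/* Which cog owns a given hub slot. */
static unsigned hub_slot_owner(unsigned slot, uint8_t count_field)
{
    return slot % cogs_in_rotation(count_field);
}

/* Slots a cog must wait, starting at `slot`, before its own window comes
 * up (0 if the current slot is already its own). `cog` must be below the
 * rotation length. */
static unsigned slots_until_window(unsigned cog, unsigned slot,
                                   uint8_t count_field)
{
    unsigned n = cogs_in_rotation(count_field);
    return (cog + n - slot % n) % n;
}
```

    Setting the field to 7 gives an 8-cog rotation and 15 gives 16 cogs; a cog that waits 6 slots under the 8-cog setting (slots_until_window(3, 5, 7)) waits 14 under the 16-cog setting.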
  • Seairth Posts: 2,474
    edited 2015-02-06 12:07
    jmg wrote: »
    That's certainly within a P1V realm, and I would add
    • Arrange die & bonding so it can go into existing P1 packages too

    The Pin-mode may be able to be a RAM bit, that defaults to P1 ?
    There, you could even define a 4 bit COG count, to soft adjust HUB slot spacing. Easily set to 8,16.

    Note however that Analog PLLs and ADCs that use a pin-threshold are considered custom design, so whilst the above list is easy enough to try on a big FPGA, the full tapeout is still not going to be cheap.

    A soft mode would be easy enough to do as well. Power-up would be 8-cog mode, and a single HUBOP could toggle it. An approach like this would certainly be necessary if this were made available in the existing P1 package.

    Could you explain the "custom design" part, though? I know that part of the original P1 was manually laid out. But is that necessary?

    Besides, regardless of the design, a tapeout isn't going to be cheap. The question is: would the cost of a tapeout (fab, etc.) be recouped with the P1.1? The hopeful answer, of course, is "yes". Presumably, the eventual introduction of the P2 is not going to get rid of the need for the P1. Therefore, if the P1.1 replaces the P1, it will still have a market. Further, the enhancements of the P1.1 will be able to capture some additional opportunities that fall in between the P1 and P2, so it will likely have a life of its own even when the P2 arrives.
  • mark Posts: 252
    edited 2015-02-06 12:21
    Phil Pilgrim (PhiPi) wrote: »
    The P1.x/P2 drama kinda reminds me of a South Park episode:
    1. Solicit would-be customers for suggestions.
    2. ????
    3. PROFIT!

    :)

    -Phil

    Just a minor correction, but step 1 should be "Collect suggestions" :P

    bte2 wrote: »
    My list is short-

    I would like to see 5v tolerant pins, maybe a pin to apply port-specific Vcc to a group of pins. Heavier diodes on the input pins would be fantastic too.

    I would also like to have the program exist in the Prop to eliminate the boot pause. A full second to clear a display or LED that powers up 'on' seems like an eternity. I have some T-1 3/4 WS2812 LEDs that power up blue (for some stupid reason) and I hate it that it takes so long to clear them.

    My biggest wish has nothing to do with the Propeller: I wish there were conditional compilation directives in Prop Tool and/or an inline INCLUDE directive so I could keep data tables and pin constants in their own files.

    Neat suggestions. For reducing boot times, perhaps an additional pin, or perhaps the reset pin, could be configured so that when it's in a certain state, the chip skips the serial boot and loads straight from the EEPROM.
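    The fast-boot suggestion amounts to one extra input into the boot decision. The sketch below models it with a hypothetical strap input; it is not how the P1 boot ROM is written, only a way to show which check the strap would skip. The names and the counter standing in for the timed serial handshake are illustrative.

```c
#include <stdbool.h>

typedef enum { BOOT_FROM_SERIAL, BOOT_FROM_EEPROM, BOOT_NO_PROGRAM } boot_source;

/* fast_boot_strap: hypothetical pin/reset state requesting an immediate
 * EEPROM boot. *serial_checks counts how often the slow serial host check
 * ran; with the strap asserted it is skipped entirely, at the cost of
 * ignoring a connected host. */
static boot_source choose_boot_source(bool fast_boot_strap, bool host_present,
                                      bool eeprom_present,
                                      unsigned *serial_checks)
{
    if (!fast_boot_strap) {
        ++*serial_checks;          /* stands in for the timed serial handshake */
        if (host_present)
            return BOOT_FROM_SERIAL;
    }
    return eeprom_present ? BOOT_FROM_EEPROM : BOOT_NO_PROGRAM;
}
```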
    Dave Hein wrote: »
    Item 10 is a tough one. Maybe it could be done with an 8-port register file, where each COG can read or write a register every cycle. The compiler would need to be aware of the interaction among the COGs, and generate code for each COG that all works together, and doesn't step on each other. Single threaded code could be broken up into chunks that can be done independently, and farmed out to multiple processors.

    If the cogs still take 4 cycles per instruction, couldn't you get away with just a 2-port register (or a real small cache preferably) assuming that it operates at clk? You might have to stagger the instruction execution by 1/4 for each set of 4 cogs. So for example, 1 of 4 cogs is fetching, the next is decoding, the third is executing, and the 4th is writing (or whatever the steps for the P1 are).
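    The staggering argument can be checked with a small model: give 4 cogs the same 4-stage cycle, offset each by one clock, and count how many are in the write stage on any clock. The stage names follow the post above and are not a claim about the P1's actual internal pipeline; the function names are made up.

```c
enum { STAGES = 4, COGS_PER_GROUP = 4 };
enum stage { FETCH, DECODE, EXECUTE, WRITE };

/* Steady-state stage of a cog on a given clock when each cog starts one
 * clock later than the previous one. */
static enum stage stage_of(unsigned cog, unsigned clock)
{
    return (enum stage)((clock + STAGES - cog % STAGES) % STAGES);
}

/* Number of cogs in the group that want the shared write port this clock. */
static unsigned writers_on_clock(unsigned clock)
{
    unsigned writers = 0;
    for (unsigned cog = 0; cog < COGS_PER_GROUP; cog++)
        if (stage_of(cog, clock) == WRITE)
            writers++;
    return writers;
}
```

    Because each stage is held by exactly one cog per clock in this model, a single write port per group of four would never see two simultaneous writers.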

    What about parallelization of expression evaluations and comparisons? For code that can't be broken up into individual threads, being able to evaluate multiple ifs/cases simultaneously should be able to improve performance somewhat, but is likely wasteful compared to running different threads instead.
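    Evaluating several ifs/cases at once corresponds to computing every condition independently into a bit mask and then picking the first true one, which matches what an if/else-if chain would choose. The C below is written sequentially, but the comparisons in case_mask do not depend on each other, which is what would let hardware evaluate them in parallel. The particular conditions and names are an arbitrary example.

```c
#include <limits.h>

/* Evaluate every condition up front, one bit per case, with no early exit. */
static unsigned case_mask(int x)
{
    unsigned mask = 0;
    mask |= (unsigned)(x < 0)           << 0;
    mask |= (unsigned)(x == 0)          << 1;
    mask |= (unsigned)(x > 0 && x < 10) << 2;
    mask |= (unsigned)(x >= 10)         << 3;
    return mask;
}

/* Priority select: index of the lowest set bit, i.e. the first case an
 * if/else-if chain would have taken; -1 if none matched. */
static int first_true(unsigned mask)
{
    for (int i = 0; i < (int)(sizeof(unsigned) * CHAR_BIT); i++)
        if (mask & (1u << i))
            return i;
    return -1;
}
```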
    Seairth wrote: »
    Okay, how about this:
    • Add 32 pins (Port B)
    • Add 8 more cogs (no changes to the cogs themselves)
    • Double the maximum clock speed
    • Add one external pin which controls whether P1.1 hub runs in 8-cog mode (and is therefore backward compatible with P1 for code execution) or runs in 16-cog mode.

    I'd probably suggest stripping out the video gen hardware to perhaps free up a little room for more hub memory, unless the die space it takes up is negligible.

    jmg's post just reminded me that the hub latency needs to be addressed. It would be nice to at least double its current rate if you're going to have 16 cogs.
  • Dave Hein Posts: 6,347
    edited 2015-02-06 13:50
    [QUOTE=mark
  • Seairth Posts: 2,474
    edited 2015-02-06 14:19
    [QUOTE=mark
  • Ken Gracey Posts: 7,386
    edited 2015-02-06 14:35
    Tubular wrote: »
    1. Code protection
    2. More pins
    3. Analog
    4. More ram

    These are the features that would make a successful Propeller 2 - hardly anything else is required from our highest-volume customers. While there are many additional requests for counters, a design which is more suited to C, etc, these four items are the exact formula for a successful P2.

    This is something we know from asking them, supporting them and working with them for the past eight years.

    Ken Gracey
  • mark Posts: 252
    edited 2015-02-06 14:36
    @Dave Hein

    So what you're saying is that performance of each cog should "just" be raised to 160 MIPS. I know you're joking, but I think it's an interesting enough topic to discuss, though not for any practical reason pertaining to the P1, of course. If there's actually some technical merit to what you said, and it wasn't a completely off-the-wall comment, then I don't see why each cog would need access to the register every cycle. What shared data could a process split across many processors possibly be reading and overwriting? I could see something like SIMD where, say, one cog fetches and decodes the instructions for the other cogs to execute. That could free up cog memory for data, and utilize hub bandwidth better in the case of "big" code, but that doesn't necessitate a multi-port register.
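    The SIMD alternative (one cog fetching and decoding, the others only executing) can be sketched as a single instruction stream applied to several lanes of data. The tiny instruction set, the lane count of 4 and the names below are invented for illustration; the point is that decode happens once per instruction while execution happens once per lane, with no register shared between lanes.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 4

typedef enum { OP_ADD, OP_SUB, OP_SHL } opcode;
typedef struct { opcode op; uint32_t imm; } instr;

/* The "control cog": fetch and decode each instruction once, then hand it
 * to every "lane cog", each of which updates only its own register.
 * OP_SHL expects imm < 32. */
static void run_simd(const instr *prog, size_t len, uint32_t lane_reg[LANES])
{
    for (size_t pc = 0; pc < len; pc++) {
        const instr in = prog[pc];                /* decoded once */
        for (size_t lane = 0; lane < LANES; lane++) {
            switch (in.op) {
            case OP_ADD: lane_reg[lane] += in.imm; break;
            case OP_SUB: lane_reg[lane] -= in.imm; break;
            case OP_SHL: lane_reg[lane] <<= in.imm; break;
            }
        }
    }
}
```

    With lanes {1, 2, 3, 4} and the program ADD 1, SHL 2, SUB 3, the lanes end as {5, 9, 13, 17}.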