Prop 1 Rev C - Feasible ???

Cluso99 · 2011-06-30 16:39

With some of the issues with Prop II understood a little better, and understanding some of the prop 1 uses, I wondered if the following Prop 1 version C might be possible..???

This is just a discussion point for the time being, as I don't want to upset any progress on the Prop II.

Prop 1 as the basis, meaning...

Same instruction set
Same registers, counters, etc
Same pinout
8 cogs

But with the following changes...

128KB - 512KB hub ram (128KB is an absolute minimum)
Delete the ROM font (we have to remove something for the extras) and replace with...
- 8x8 ROM font
- Code to boot from SD card
  - Detects P29 (SDA) external 10K pulldown resistor
    - P29 (SDA) has external 10K pullup resistor for normal EEPROM boot
  - P29 = DI
  - P28 = SCL
  - P27 = DO
  - P26 = /CS (external 10K pullup resistor required)
- Optional Fast Spin Interpreter
  - Co-exist with existing interpreter to avoid any potential problems
- Optional FP code
- Optional FAT16/32 code
- Other possible ROM code (plenty of space)
  - Cordic tables ?
  - FullDuplex code ?
  - VT100 code ?
  - VGA code ? (which version?)
  - TV code ? (which version?)
  - PS2 Keyboard code ?
  - Perhaps some of the GUI code being done for Prop II ?
Faster clock (160MHz) ???
Reduce hub access to 1 in 8 clocks (if possible)
Maybe one change to the counters to permit shifting in using the VGA section ??
- I am unsure if this is minor or a major change, but would be nice
Is an internal 8MHz (giving 128MHz) oscillator 1% trimmed feasible ???
- All the AVRs seem to do it.
- We could even trim ourselves by code.
- I am not looking to reclaim the pins, just remove the crystal requirement.
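
As a rough illustration of the boot-source idea in the list above, the resistor check on P29 could be sketched like this. The pin roles are taken from the list; the function name and its argument are hypothetical and do not correspond to any existing Propeller ROM routine.

```python
# Hypothetical sketch of the proposed boot selection; not real Propeller ROM code.
# Pin roles exactly as listed in the post.
SD_PINS = {"DI": 29, "SCL": 28, "DO": 27, "CS": 26}

def select_boot_source(p29_level: int) -> str:
    """Pick the boot device from the level read on P29 while it is an input.

    1 means an external pull-up is fitted (normal EEPROM boot),
    0 means an external pull-down is fitted (SD card boot).
    """
    if p29_level not in (0, 1):
        raise ValueError("pin level must be 0 or 1")
    return "EEPROM" if p29_level == 1 else "SD"

print(select_boot_source(1), select_boot_source(0))
```

The point of the design is that an existing board with a pull-up on SDA would boot exactly as it does today.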

What this would give us is...

Mainly a lot more hub ram, removing need for external SRAM, and hence the main pressure on 32 I/O pins.
Option to directly boot from SD cards (microSD) without EEPROM
Maintain existing compatibility (except for ROM font)
Faster execution
Existing spin interpreter will only be able to access 32KB hub ram (am I correct due to 16 bit addressing?) but the faster interpreter will have access to the full hub ram.
Same pinout and packaging (do I hear DIP40 anyone?).
I could see this replacing the existing prop 1.

The process would be similar to the existing prop 1, maybe only jumping one or two feature size levels (whatever the correct name is for the die feature size). This would retain the low power of the prop 1, but give us the valuable space required for the extra hub RAM.

Am I now correct in presuming that many of the requirements for the Prop 1B (64 I/O) would go away with extra hub memory??

Your thoughts???

localroger · 2011-06-30 17:01

Well, hitting a few points based on what I remember of Chip's presentation at UPEW...

Some of the limitations in P2 are materializing because of a desire to "have it all" and things like built-in video out resistor networks are making the cogs pretty die hoggy, so there might be a place for an intermediate product. I would guess that there will be more room for ideas like this once *some* new product, presumably the P2 as currently envisioned, has shipped. Apparently the dev process has now been made less by-hand and more industry standard tools-based so that new versions can be rolled more easily. But Parallax has to start by shipping *something*.

* RAM is expensive. Dual-port as used in the P1 isn't as bad as the 4-port used in P2, but dual vs. quad is what gives you 4 cycle instructions / 16 cycle hub vs. 1 cycle instructions / 8 cycle hub. Asking for more of this expensive RAM quickly gets you to the die limit. Falling back to the dual port RAM used by P1 might buy you some extra kilobytes but it loses you the speed. ROM takes a lot less die space than equivalent RAM.

* "Compatible except for the ROM font" is "not compatible."

* Faster clock requires smaller, leakier process and no low-power mode. Does get you more stuff on the die though.

* Even a trimmed oscillator will be much more temperature sensitive than a crystal. You can always "software trim" the existing chip by measuring it against an accurate timebase. Send it to the Sahara or Antarctica, don't expect it to be very close though. As with the inability to support on-chip EEPROM, this probably has to do with the fab process.

An alternate ROM might be do-able, since ROM is die acreage cheap. I suspect the pull-down detect thing switching to the alternate ROM for different ROM font, SD boot and faster big SPIN might be do-able. Might even be able to swap around the memory map so that normal booting puts the old ROM where it is now with more RAM atop it but new boot mode gives continuous RAM memory map. I smell a mode bit in there though.

Roy Eltham · 2011-06-30 17:13

Cluso99,
Prop 1 HUB memory is limited to 16bit addresses. Thus 32k RAM, 32k ROM. To change that is a pretty significant architecture change.

Tubular · 2011-06-30 17:16

"What this would give us" - sounds a bit like a RamBlade

I don't think the pin requirements go away at all - they probably get worse. Because having more hub memory will make us want video with more colours, or driving TFTs etc.

Why not do something like my double dipper but with a big sram attached to one of the props? Or an adapter that adds the existing ramblade to a second board which has a prop and essential peripherals?

jeff-o · 2011-06-30 19:36

Sounds like an entirely new chip design, which they're doing already. I think you're just going to have to wait until the Prop II arrives.

Toby Seckshund · 2011-06-30 22:45

The Euro lottery is up at £120 million, at present, and the draw is tonight.

So on Monday I will ask very nicely for a batch of Prop 1Bs, in PLCC84.

Cluso99 · 2011-06-30 23:47

Tubular wrote: »

"What this would give us" - sounds a bit like a RamBlade

I don't think the pin requirements go away at all - they probably get worse. Because having more hub memory will make us want video with more colours, or driving TFTs etc.

Why not do something like my double dipper but with a big sram attached to one of the props? Or an adapter that adds the existing ramblade to a second board which has a prop and essential peripherals?

The RamBlade was designed to plug into many pcbs. It plugs into the proto board nicely.

Roy: I would not think that 16 bit addressing is a fixed thing, apart from the current interpreter.

localroger: The current prop 1 design only accesses hub every 16 clocks (means hub is only accessed every 2nd cycle - I am pondering access every cycle as that would give a big improvement). That IMHO has nothing to do with multiport memory. The quad access cog ram in the prop II is so that the instructions can go to single cycle. I am not suggesting that as I would like to keep things as simple as possible, just scale down slightly to get more room for more hub ram. PII has quad access to hub for the quad long access. I do not think it is quad port.
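
To put numbers on the hub-slot point, here is a back-of-envelope sketch. The 80 MHz figure for the current chip is an assumption (a typical Prop 1 clock); the 160 MHz and 1-in-8 figures are the ones proposed in the first post. The function name is made up for illustration.

```python
def hub_accesses_per_second(clock_hz: int, clocks_per_hub_slot: int) -> float:
    """Hub accesses per second available to a single cog."""
    return clock_hz / clocks_per_hub_slot

# Current Prop 1: one hub slot per cog every 16 clocks, assumed 80 MHz clock.
current = hub_accesses_per_second(80_000_000, 16)
# Proposal: one hub slot every 8 clocks at 160 MHz.
proposed = hub_accesses_per_second(160_000_000, 8)

print(current, proposed, proposed / current)
```

Under those assumptions the per-cog hub rate goes from 5 million to 20 million accesses per second, half of the gain coming from the clock and half from the shorter hub rotation.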

Generally, there seems to be far more requirements for hub memory than for pins, provided we remove the external memory from the equation. We effectively gain 2 pins by using the SD on the EEPROM pins.

You will note I tried to leave everything "as is" to minimise the work. Not even discussed internal pullups or ADC. If done right, IMHO it could replace the current Prop 1.

I would even be happy if it was just the existing prop on a larger die, as long as it had more hub RAM. Even the package/footprint could be larger to accommodate a larger die.

Toby: Yes I try the local lotto, but not been lucky yet. I would love to have the funds to do it.

Roy Eltham · 2011-07-01 02:02

Cluso99,
When you do a RDLONG/RDWORD/RDBYTE/WRLONG/WRWORD/WRBYTE in a COG it connects to HUB memory via an address and data bus. There are 16 address lines connecting from the COGs to the HUB, so you can only address 64k. In order to add additional address lines from all of the COGs to the HUB, I suspect it would require a fair bit of rework to the chip layout/design.

Roy
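
The arithmetic behind the 16-address-line limit can be sketched as below. The helper name is hypothetical; the 128KB and 512KB sizes are the ones asked for in the first post, counted as RAM only with no room for ROM.

```python
ADDRESS_LINES = 16
ADDRESSABLE_BYTES = 1 << ADDRESS_LINES  # 65536 bytes = 64K

def address_lines_needed(n_bytes: int) -> int:
    """Smallest number of address lines that can reach n_bytes byte addresses."""
    return (n_bytes - 1).bit_length()

print(ADDRESSABLE_BYTES)                   # 32K RAM + 32K ROM fills this exactly
print(address_lines_needed(128 * 1024))    # lines needed for 128KB of hub RAM alone
print(address_lines_needed(512 * 1024))    # lines needed for 512KB of hub RAM alone
```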

Ale · 2011-07-01 02:39

The Spin interpreter would have to be changed too because it uses many 16 bit pointers... The easiest would be to increase the RAM to say 60 KBytes and leave the last 4 KBytes as ROM, thus changing almost nothing... but it is not what Cluso asked for...

localroger · 2011-07-01 05:20

Cluso, one of the things that emerged at UPEW was that ditching ROM doesn't buy you much RAM because ROM takes like 1/4 the die space of equivalent RAM. You're right about the quad port cog RAM being for the cache but there was something about Hub RAM being hoggier for 1 cycle access too. The main problem is that both P1 and P2 are die bound -- the die cannot be made any larger if it's to fit in the standard packages, and the die is full.

One suggestion that was made at UPEW was to ditch the entire ROM except for a boot loader. That's when the difference in die acreage for RAM and ROM emerged.

It also needs to be mentioned that much of the work that Parallax has been doing has been moving the design from the hand-validated tools Chip used to more industry standard tools that can be auto-validated, making a faster release cycle possible. Since they have a working P1 design and have been working toward the smaller process P2, I'm not sure how much of a back step it would be to do rework on P1. They did make it clear that once some version of P2 is out, the new design tools would make it much quicker to bring new versions to market, including at other process sizes.

I suspect the only P1 revision we are likely to see -- and I would really like to see it, because I like the single supply low power operation of the 350 nm process -- is the version with 64 I/O pins.

Dave Hein · 2011-07-01 05:25

The current Spin interpreter could support up to 64K of RAM minus a few K for a smaller ROM that only contains the boot loader, Spin interpreter and a small amount of Spin code used by the interpreter. So in theory a version with 60K of RAM, 4K of ROM and 64 I/O pins could be done with a "small" effort. However, I don't see much of a chance for that to happen.

Toby Seckshund · 2011-07-01 06:01

Not even with a "pretty please" ?

(Try to imagine a Toby with his soulful puppy dog eyes......then struggle to remove that image)

jmg · 2011-07-01 19:46

localroger wrote: »

I suspect the only P1 revision we are likely to see -- and I would really like to see it, because I like the single supply low power operation of the 350 nm process -- is the version with 64 I/O pins.

The P1 is full custom, so it is likely to be the last of its kind- changes are just too costly, and slow, for the return.

There is scope however for a cut-down of the P2 tho, as that has a LOT of IO, and single supply is comfortably managed by many vendors now.
Sure, it takes a little more effort, but many have rightly decided to make that effort.

64I/O pins does not mesh well with common packages, but a 64 pin P2 would slot nicely between the P1 and P2.

All this, of course, depends on how many re-spins are needed before P2 flies.

Mark_T · 2011-07-02 15:31

Roy Eltham wrote: »

Cluso99,
Prop 1 HUB memory is limited to 16bit addresses. Thus 32k RAM, 32k ROM. To change that is a pretty significant architecture change.

Really? It's just widening the bus by a few bits, the read/write instructions use 32 bit registers for the address generation (or 9 bit literals, but that's already less than 16). There would need to be a decision as to how to organise a new memory map.

Incidentally the manual doesn't say what happens to addresses greater than $FFFF - presumably they wrap around so that the memory map could have been presented as:

$00000000 -- $00007FFF RAM
$FFFF8000 -- $FFFFFFFF ROM

In which case expanding RAM or ROM would have been easy!

I'm going to check if that's true though... Yes, you can use $FFFFE000 as the sine table address quite happily.
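
A minimal model of that wrap-around, assuming the hub simply ignores every address bit above bit 15 (which is what the $FFFFE000 test suggests). The function name is invented for illustration.

```python
HUB_ADDRESS_MASK = 0xFFFF  # only the low 16 bits reach the hub (assumption)

def effective_hub_address(register_value: int) -> int:
    """Address the hub actually sees for a 32-bit address register."""
    return register_value & HUB_ADDRESS_MASK

# $FFFFE000 lands on $E000, where the sine table starts in the Prop 1 ROM.
print(hex(effective_hub_address(0xFFFFE000)))
# The top of the "negative" range maps onto the top of ROM.
print(hex(effective_hub_address(0xFFFFFFFF)))
```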

ericball · 2011-07-02 20:05

Oooh, I just had an idea. What if there was a way (say via the CLKSET command) to remap ROM to RAM, so all 64K of the HUB memory map is now RAM. This was quite common in the 8 bit era. It might not even be that difficult to modify the bootloader.
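
The remap idea can be pictured as an address decoder with one extra mode bit. This is only a sketch: the `rom_remapped` flag and the function name are hypothetical, and nothing like it exists in the current chip.

```python
def decode_hub_address(address: int, rom_remapped: bool) -> tuple:
    """Return (region, offset) for a 16-bit hub address.

    With rom_remapped False the map is the existing one: 32K RAM, then 32K ROM.
    With rom_remapped True (hypothetical mode bit) all 64K decodes as RAM.
    """
    address &= 0xFFFF
    if rom_remapped or address < 0x8000:
        return ("RAM", address)
    return ("ROM", address - 0x8000)

print(decode_hub_address(0xE000, rom_remapped=False))
print(decode_hub_address(0xE000, rom_remapped=True))
```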

Of course, the real question is whether there's a sufficient market to justify engineering and manufacturing a different chip.

Dr_Acula · 2011-07-02 20:12

Roy Eltham wrote: »

Cluso99,
Prop 1 HUB memory is limited to 16bit addresses. Thus 32k RAM, 32k ROM. To change that is a pretty significant architecture change.

Mark_T wrote: »

Really? It's just widening the bus by a few bits, the read/write instructions use 32 bit registers for the address generation (or 9 bit literals, but that's already less than 16). There would need to be a decision as to how to organise a new memory map.

This might be an issue that needs addressing for the prop II, and also it comes up as a problem when discussing "big spin" using external memory and spin simulators and sphinx.

How could such memory maps work? If I write a hypothetical 'big spin' program, and I think about modifying sphinx so it can compile, how would a call to a pub that is at the other end of memory work? (ie it is 200k away).

Your thoughts???

My feeling is that there is still so much we have not done yet with the Prop I.

hinv · 2011-07-03 15:28

I too would love to see a prop1 variant. I have been wanting a Prop1b for years now. But I would like:

64 I/Os
4 cycle multiply
4 cycle divide
64KB of RAM
sdcard boot rom masked on during boot.

This would eliminate the need for the external eeprom. The extra pins could be used to add more memory, and other peripherals. The 4 cycle multiply and divides would fit nicely in those reserved slots and give the prop1 the power to do mp3 decode and other stuff. I also like the idea of a full custom design.

Oh, and one other thing...a deserializer to match each serializer for high speed communications.

I personally think the pins on the P2 have gotten a bit bloated.

Just my 10 bits worth.

Doug

Leon · 2011-07-03 16:19

Four cycle divide isn't possible.

Roy Eltham · 2011-07-03 18:45

That isn't strictly true Leon.

potatohead · 2011-07-03 18:51

Seems to me that's all about how the divide is clocked internally...

Leon · 2011-07-03 19:18

I don't think I've seen a divide function on any MCU that only takes 1 clock.

Phil Pilgrim (PhiPi) · 2011-07-03 19:40

Leon wrote:

I don't think I've seen a divide function on any MCU that only takes 1 clock.

I'll bet I could devise an 8 / 8 divide that uses a one-clock lookup table.

-Phil
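
A software model of such a lookup-table divider might look like the sketch below. The 0xFF result for division by zero is an arbitrary choice made here, and the names are invented.

```python
# One entry per (dividend, divisor) pair of 8-bit values: 256 * 256 = 65536 entries.
# Division by zero returns 0xFF as an arbitrary sentinel (assumption).
DIV_TABLE = [[(n // d if d else 0xFF) for d in range(256)] for n in range(256)]

def div8(dividend: int, divisor: int) -> int:
    """8-bit by 8-bit unsigned divide as a single table read."""
    return DIV_TABLE[dividend][divisor]

print(div8(200, 7))
print(len(DIV_TABLE) * len(DIV_TABLE[0]))
```

The table holds 65536 one-byte quotients, so 64KB of ROM for the 8 / 8 case. The same trick with 32-bit operands would need 2**64 entries, so it does not scale.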

Leon · 2011-07-03 19:58

Try doing that for a 32-bit divide.

Ravenkallen · 2011-07-03 20:35

I was kinda hoping a future Prop Rev C would be a 28-pin (Skinny DIP) package :)

K2 · 2011-07-03 23:26

Dr_Acula wrote: »

My feeling is that there is still so much we have not done yet with the Prop I.

I agree entirely. I know for certain that I'm not through with it. In fact I've hardly started.

hinv · 2011-07-04 23:45

Even an 8 cycle divide would be good...or do without. A variable cycle divide would be useful for some things, but would really mess up determinism depending on the arguments.

What is the Prop2 going to have for a divider? How much die does it take up?

Doug

zoopydogsit · 2011-07-05 05:04

I don't mean to hijack this thread, but I agree with K2 and Dr_Acula about not being done with the Prop1.

Does anyone know what happened to the long awaited Prop 1B ? The full 64 pins would open a whole bunch of new opportunities for the Prop1.

Heater. · 2011-07-05 05:11

As far as I understand the Prop 1B was stalled when some chip design software bug failed to be resolved. Haven't heard about it for a long time. It is sad that it never materialized but I suspect now it's better those efforts go into the Prop II.

Given the modern chip design techniques being adopted for the Prop II one might be optimistic and look forward to different Prop variants coming rapidly after the Prop II. Perhaps something akin to the Prop 1B also.

Dave Hein · 2011-07-05 06:57

Leon wrote: »

Four cycle divide isn't possible.

Without thinking about it too much, it seems like a four-cycle divider could be brute-forced with 256 8x32-bit multipliers and 256 subtractors. An 8-cycle divider would be more reasonable with 16 4x32 bit multipliers and 16 subtractors. The 4x32-bit multipliers would be trivial since they would be multiplying times values of 0 through 15. At most, each one would require two adders or subtractors.

So, it is clearly possible to do a four cycle divider.
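
The 8-cycle variant described above can be modelled in software as a radix-16 divider: each step produces 4 quotient bits by comparing the running remainder against the 16 multiples of the divisor. This is a behavioural sketch only, with an invented function name, and not a description of any actual Propeller hardware.

```python
def divide_radix16(dividend: int, divisor: int):
    """32-bit unsigned divide in 8 steps of 4 quotient bits each."""
    if not 0 <= dividend < 2**32 or not 0 < divisor < 2**32:
        raise ValueError("operands must be 32-bit unsigned, divisor non-zero")
    # The sixteen "4x32-bit multiplier" outputs: divisor times 0 through 15.
    multiples = [k * divisor for k in range(16)]
    quotient = 0
    remainder = 0
    for step in range(8):
        shift = 28 - 4 * step
        # Bring down the next 4 bits of the dividend.
        remainder = (remainder << 4) | ((dividend >> shift) & 0xF)
        # Hardware would do these 16 compares in parallel and keep the
        # largest multiple that still fits.
        digit = max(k for k in range(16) if multiples[k] <= remainder)
        remainder -= multiples[digit]
        quotient = (quotient << 4) | digit
    return quotient, remainder

print(divide_radix16(1000, 7))
```

The four-cycle version would follow the same loop with 8 bits per step and 256 multiples, which is where the 256 multipliers and subtractors come from.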

Leon · 2011-07-05 07:05

Perhaps I should have said that it wasn't feasible.

Dave Hein · 2011-07-05 08:06

Leon, perhaps you should have googled "combinational dividers". Single cycle dividers are possible and feasible. They do require more gates than multipliers, and there are more stages for the logic to ripple through. A four cycle divider is certainly feasible.
