Musings while we wait for the Prop II specs.....
Cluso99
I have a crystal ball.
On April 1 2012 Chip wins the big lottery...
Chip announces that the Prop III will be available 1 month after the Prop II.
Prop III specs...
35-40nm geometry
4MB hub ram
2KB boot rom
32 cogs
2.56GHz clock (10MHz xtal)
Dropin pinout to PropII
All other specs as per PropII
Comments
Jim
IMHO we already have everything we need to replicate an iPod touch or an Android pad.
Back to coding...
CODE RED
I repeat CODE RED
This is a CODE RED ALERT
Cry babies have been spotted in the near vicinity!!!!
THIS THREAD IS IN JEOPARDY!
Sorry Ray, I could not help but inject this thought into your thread. It seemed the perfect opportunity. Please do not take it personally, because I think you are a nice guy and you deserve to post this thread in this forum, but because of the musings, your thread may be in jeopardy from cry babies.
I think I have found a neat solution for switching between groups of pins using a 74HC137. With 28 pins, you can switch between 8 groups of 28.
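A minimal model of the group select, assuming the three select lines from the Prop drive the decoder's address inputs and that the outputs are active low, as on the inverting 74HC137. The enable and latch-enable inputs are not modelled, and the function names are hypothetical.

```python
def decode_3_to_8(a2: int, a1: int, a0: int) -> list[int]:
    """Model a 3-to-8 decoder with active-low outputs.

    Returns eight output levels (index 0..7); exactly one is pulled low (0),
    which selects that group. Enable and latch-enable pins are not modelled.
    """
    selected = (a2 << 2) | (a1 << 1) | a0
    return [0 if n == selected else 1 for n in range(8)]


def select_lines_for_group(group: int) -> tuple[int, int, int]:
    """Return the (a2, a1, a0) levels that select the given group (0..7)."""
    if not 0 <= group < 8:
        raise ValueError("group must be 0..7")
    return (group >> 2) & 1, (group >> 1) & 1, group & 1


if __name__ == "__main__":
    for g in range(8):
        print(g, decode_3_to_8(*select_lines_for_group(g)))
```

Only three Prop pins are spent on selection, and every group then reuses the same 28 data/address pins.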
There are some pins that *always* need to be connected. These are generally inputs that are waiting for asynchronous data, e.g. a keyboard or a serial input. I think there is no choice there but to dedicate Propeller pins. But it isn't many; in fact maybe just keyboard, mouse and serial input. And even then there are other solutions: e.g. a touchscreen does away with the need for a keyboard and mouse, and with serial ports it is possible to add in the handshake lines, then unload the serial cog driver and reuse the pins when not needed.
VGA takes 8 pins and TV takes 3. That does use up pins, but touchscreens are cheaper than TV screens, and they don't require any dedicated pins as they have onboard ram.
So, with a touchscreen and hardware RS232, you still have the full 28 pins to use.
Then I think it comes down to how you group things together. 8 groups of 28.
I find groupings are dictated by how data is transferred.
For instance, one group I am using is for very rapid ram-to-touchscreen transfers. Connect the touchscreen to the ram with 16 lines, and on the Propeller side, devote 19 lines to drive the ram address and one line to toggle the display write pin. So group 1 takes 20 Prop pins and can transfer data at a somewhat scary speed (5 PASM instructions per word = ~4MHz x 16 bits = 64 megabits per second).
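The 4MHz word rate in that estimate follows from the instruction timing. This sketch assumes the common Prop I setup of an 80MHz system clock (5MHz crystal with PLL16x) and 4 clocks per PASM instruction, which holds for ordinary (non-hub) instructions such as the pin writes in this loop; the constant names are illustrative.

```python
SYSTEM_CLOCK_HZ = 80_000_000   # assumed: 5 MHz crystal with PLL16x
CLOCKS_PER_INSTRUCTION = 4     # ordinary (non-hub) PASM instructions
INSTRUCTIONS_PER_WORD = 5      # loop length quoted in the post
BUS_WIDTH_BITS = 16            # touchscreen <-> ram data lines


def words_per_second(clock_hz: int = SYSTEM_CLOCK_HZ,
                     cpi: int = CLOCKS_PER_INSTRUCTION,
                     instr_per_word: int = INSTRUCTIONS_PER_WORD) -> int:
    """Words moved per second by a tight loop of instr_per_word instructions."""
    return clock_hz // (cpi * instr_per_word)


def bits_per_second(width_bits: int = BUS_WIDTH_BITS, **timing) -> int:
    """Raw bus throughput: words per second times bus width."""
    return words_per_second(**timing) * width_bits


if __name__ == "__main__":
    print(words_per_second())  # 4000000
    print(bits_per_second())   # 64000000
```

At 20 MIPS each extra instruction in the loop is costly: a 6-instruction loop drops to about 3.33M words per second.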
There is another useful grouping, call it group2, for when you want to transfer data from the SD card to the ram chips. I think that calls for a different layout: 28 pins minus 3 for the SD card (the group select does the SD chip select) = 25 pins left. Devote 16 to data, 8 to address and one to either read or write (read would be group2; write replicates this as group3). Latch in A8 to A18, then do a fast PASM burst write or read 16 bits at a time and repeat 256 times. With word transfers, that is 512 bytes.
If you are using an 8-bit bus instead of 16 bits, more pins are free for address, so only one latch is needed.
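The group2 address arithmetic can be sketched as below. It assumes a 19-bit word address (A0..A18), the 25 pins left after the SD card, one R/W pin, and external 8-bit (octal) latches for whatever address bits are not driven directly; the latch width and all names are assumptions, not parts named in the post.

```python
import math

PINS_AVAILABLE = 25   # 28 group pins minus 3 for the SD card
RW_PINS = 1           # one read or write strobe
ADDRESS_BITS = 19     # A0..A18
LATCH_WIDTH = 8       # assumption: octal latches


def address_split(data_bits: int) -> tuple[int, int, int]:
    """Return (direct_bits, latched_bits, latches_needed) for a data bus width."""
    direct = PINS_AVAILABLE - data_bits - RW_PINS
    latched = max(ADDRESS_BITS - direct, 0)
    return direct, latched, math.ceil(latched / LATCH_WIDTH)


def burst_addresses(word_address: int, words: int = 256):
    """Yield (latched_high, direct_low) pairs for one 16-bit burst.

    The high part (A8..A18) is latched once per burst; only A0..A7 change.
    """
    if word_address % 256:
        raise ValueError("burst must start on a 256-word boundary")
    if not 0 < words <= 256:
        raise ValueError("a burst covers at most 256 words")
    high = word_address >> 8
    for low in range(words):
        yield high, low


if __name__ == "__main__":
    print(address_split(16))  # (8, 11, 2)
    print(address_split(8))   # (16, 3, 1)
    burst = list(burst_addresses(0x300))
    print(len(burst) * 2)     # 512 bytes per burst
```

With the 16-bit bus, 11 address bits must be latched; with the 8-bit bus only 3 remain, which fits in a single latch.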
So - group1 is fast ram to display (full screen update in 30ms), group2 is SD to ram, group3 is ram to SD.
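As a rough check of the 30ms figure, this sketch assumes one 16-bit word per pixel at the group 1 rate of 4M words per second. The post does not give the display resolution, so 480x272 and 320x240 are assumed example panel sizes.

```python
WORDS_PER_SECOND = 4_000_000  # 16-bit words per second from the group 1 loop


def full_screen_ms(width: int, height: int, rate: int = WORDS_PER_SECOND) -> float:
    """Milliseconds to send one 16-bit word per pixel for a whole frame."""
    return width * height / rate * 1000


if __name__ == "__main__":
    print(round(full_screen_ms(480, 272), 2))  # 32.64 (assumed 480x272 panel)
    print(round(full_screen_ms(320, 240), 2))  # 19.2  (assumed 320x240 panel)
```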
Then I think you can look at caching. If you can cache data in and out of the Propeller 256 or 512 bytes at a time, then most of the time the ram and the SD card are not being used, and the display does not need updating either.
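A minimal sketch of why block caching leaves the bus idle most of the time. It assumes a hypothetical read-only cache that holds one 512-byte block in hub ram and only uses the external ram/SD bus on a miss; the class and its fields are illustrative.

```python
BLOCK_SIZE = 512  # bytes per cached block, as in the post


class BlockCache:
    """Hypothetical single-block read cache in front of slow external storage."""

    def __init__(self, backing: bytes, block_size: int = BLOCK_SIZE):
        self.backing = backing          # stands in for external ram or the SD card
        self.block_size = block_size
        self.block_index = None         # which block is currently held in hub ram
        self.block = b""
        self.bus_transfers = 0          # burst transfers on the external bus

    def read(self, address: int) -> int:
        index = address // self.block_size
        if index != self.block_index:
            # Miss: fetch the whole block in one burst, then the bus is idle again.
            start = index * self.block_size
            self.block = self.backing[start:start + self.block_size]
            self.block_index = index
            self.bus_transfers += 1
        return self.block[address % self.block_size]


if __name__ == "__main__":
    data = bytes(range(256)) * 8        # 2 KB of sample data
    cache = BlockCache(data)
    for address in range(len(data)):
        cache.read(address)
    print(cache.bus_transfers)          # 4 bursts for 2048 sequential reads
```

For sequential access, the group pins are only needed for one burst per 512 reads; the rest of the time they are free for other groups.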
So for most of the time, you have 28 pins completely free to do what you want with. This is the concept I am working towards with a new board.
These 28 free pins can be called group4. I'd devote 4 to handling the touchscreen finger press and some to serial ports. Anything, really.
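The groups described above can be written down as a small allocation table and checked against the 28-pin budget. Groups 1 to 3 follow the posts; the serial-port pin count in group4 is a hypothetical example.

```python
GROUP_PIN_BUDGET = 28

# Prop pins used in each group. Groups 1-3 follow the post; the serial-port
# count in group 4 is a hypothetical example.
GROUPS = {
    "group1 ram->display": {"ram address": 19, "display write": 1},
    "group2 SD->ram": {"SD card": 3, "data": 16, "address": 8, "read": 1},
    "group3 ram->SD": {"SD card": 3, "data": 16, "address": 8, "write": 1},
    "group4 free": {"touchscreen": 4, "serial ports": 4},
}


def check_budget(groups: dict = GROUPS, budget: int = GROUP_PIN_BUDGET) -> dict:
    """Return the pins used per group; raise if any group exceeds the budget."""
    usage = {name: sum(alloc.values()) for name, alloc in groups.items()}
    over = {name: used for name, used in usage.items() if used > budget}
    if over:
        raise ValueError(f"groups over the {budget}-pin budget: {over}")
    return usage


if __name__ == "__main__":
    for name, used in check_budget().items():
        print(f"{name}: {used} of {GROUP_PIN_BUDGET} pins")
```

Groups 2 and 3 use all 28 pins, so that layout has no slack.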
This is why I don't think we need the prop II or prop III.
Ahhhh shucks! I think the natives were just starting to get restless.
Bruce
Can't say as I blame them.
We now have microSD cards for <$20 with 1,000,000 times the storage of the old washing-machine-sized disks that cost $16,000.
We now have bottom-end phones with far more power than the $1M computers that filled huge air-conditioned rooms.
We now have printers for ~$100 that run rings around the printers of yesteryear that cost >$20,000 and required cranes to lift them through large windows in commercial buildings.
We now have cheap 3D plastic printers that were only an imagination years ago.
So, who can make a cheap printer that can lay down a printegrated circuit with, say, 1,000 to 10,000 transistors? Surely we could fit these on an A4 or Foolscap page???
Anyone up for the challenge???
I think we would need at least three inks: two conductive and one insulating.
Sounds fun. A search for 'printable transistor' throws up some interesting links on Google.
I know what I would like: CPLD chips in 14-, 16- and 20-pin DIP packages that can be programmed to be a range of 74xx chips AND which cost less than the equivalent real chip.
Not sure about the 'cost less', but there was an opening for this a while ago.
Since then, CPLDs have moved into narrow-Vcc land, and package choices are not great either.
Even the new Lattice MachXO2 has just one MLF package (coming soon), and the only gull-wing packages are the large 100- and 144-pin ones.
The rest are BGA, which shrinks the design-base drastically.
If you like little, these guys have a nice little solution
http://www.silego.com/index.php?page=greenpak2
Yes, that has FAR less logic than a MachXO2, and the XC2C32A still needs a 1.8V core, and it cannot tolerate 5V.
- which underlines my point, that CPLDs have rather painted themselves into a corner.
It is silly, as CPLDs can compete on price OK; it's a pity that other details often remove them from the candidate list.
-Phil
I do not follow this?
They have a well-designed $49 development system (I have one here), and they will sell devices in quantities of 100+ on their web store for 29c each.
They even list a ZIF socket for $18, if you want to prototype into a custom Board.
The only drawback I see with these parts is that they have too little logic. They ARE wide Vcc, and small, in a QFN12.
http://www.silego.com/buy/index.php?main_page=index&cPath=48
http://www.microchip.com/pagehandler/en-us/press-release/microchip-launches-8-bit-mcus-with-configurable-lo.html
I've had a play with it, it's quite easy to use.
One of the examples provided is a high-speed Manchester encoder. It would have been nice if something like that had been put into the P2.
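For reference, Manchester encoding replaces each data bit with two half-bit levels, so every bit cell contains a transition the receiver can recover the clock from. This is a minimal software sketch, not the Microchip example; it assumes the IEEE 802.3 convention (0 is high-then-low, 1 is low-then-high; the G.E. Thomas convention is the opposite) and MSB-first bit order by default.

```python
def manchester_encode(data: bytes, msb_first: bool = True) -> list[int]:
    """Encode bytes as half-bit levels using the IEEE 802.3 convention.

    Each data bit becomes two levels: 0 -> (1, 0), 1 -> (0, 1).
    """
    levels = []
    for byte in data:
        order = range(7, -1, -1) if msb_first else range(8)
        for i in order:
            bit = (byte >> i) & 1
            levels.extend((0, 1) if bit else (1, 0))
    return levels


def manchester_decode(levels: list[int], msb_first: bool = True) -> bytes:
    """Inverse of manchester_encode; raises on an invalid half-bit pair."""
    if len(levels) % 16:
        raise ValueError("need 16 half-bits per byte")
    out = bytearray()
    for start in range(0, len(levels), 16):
        byte = 0
        for j in range(8):
            pair = (levels[start + 2 * j], levels[start + 2 * j + 1])
            if pair == (0, 1):
                bit = 1
            elif pair == (1, 0):
                bit = 0
            else:
                raise ValueError(f"invalid Manchester pair {pair}")
            shift = 7 - j if msb_first else j
            byte |= bit << shift
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    encoded = manchester_encode(b"\xA5")
    print(encoded)
    print(manchester_decode(encoded))  # b'\xa5'
```

Doing this in software costs two pin updates per bit, which is why a hardware encoder is attractive at high bit rates.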
a) 16 COGs, or as many as will fit with the technology available at the time.
b) COGS have 64 bit registers and instructions
c) Src and Dest fields can then be 25 bits wide.
d) That means COG address space can be increased to a max of 32 mega longs (128 mega bytes)
e) Actual COG space is increased to what ever is possible with the tech available at the time.
f) NO HUB RAM. With the high speed COG to COG or (Chip to Chip) links (as in Prop II) there is no need for shared HUB RAM anymore.
g) Some way to allow at least one COG to be hooked to a real external memory bus for that really large code you need.
h) The whole thing programmable in the Go language, making multi-core / multi-chip programming a cinch.
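A quick check of the arithmetic in c) and d). It assumes, hypothetically, that the 64-bit instruction keeps the Prop I's other fields (6-bit opcode, 4 Z/C/R/I effect bits, 4-bit condition) and only widens D and S; two 25-bit fields then use exactly the remaining bits.

```python
PROP1_FIELDS = {"opcode": 6, "zcri": 4, "condition": 4, "dest": 9, "src": 9}
ADDRESS_BITS = 25


def widened_layout(word_bits: int = 64, addr_bits: int = ADDRESS_BITS) -> dict[str, int]:
    """Hypothetical layout: Prop I fields kept, D and S widened to addr_bits."""
    layout = dict(PROP1_FIELDS, dest=addr_bits, src=addr_bits)
    used = sum(layout.values())
    if used != word_bits:
        raise ValueError(f"fields use {used} of {word_bits} bits")
    return layout


def cog_space(addr_bits: int = ADDRESS_BITS) -> tuple[int, int]:
    """Return (longs, bytes) addressable with addr_bits of long address."""
    longs = 1 << addr_bits
    return longs, longs * 4


if __name__ == "__main__":
    print(widened_layout())
    longs, nbytes = cog_space()
    print(longs // 2**20, "M longs =", nbytes // 2**20, "MB")  # 32 M longs = 128 MB
```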
That should keep Chip busy for a little while:)
--Ed
- 12 cogs, 4 of which are super cogs
- Super cogs have instruction/memory addressing of 40 bits, so the D & S fields are now 13 bits = 8K*40bits = 40KB
- Super cogs have 4K*40bits=20KB
- Normal cogs have instruction/memory of 32 bits: 0.5K*32bits=2KB
- Chip cog-ram layout...
  - 4 @ super P2 cog ram 4K*40bits=20KB (better if 8K*40bits=40KB fits)
    - located in the corners
  - 8 @ current P2 cog ram 512*32bits=2KB
    - located 2 per side between the corners
- Cogs are located between the cog ram blocks, starting at the top left corner and going anticlockwise...
  - cog 0 super (has the top left corner as its cog ram)
  - cog 1 normal
  - cog 2 normal
  - cog 3 super
  - cog 4 normal
  - cog 5 normal
  - cog 6 super
  - cog 7 normal
  - cog 8 normal
  - cog 9 super
  - cog 10 normal
  - cog 11 normal
- Between each pair of cogs is a new cog-to-cog ram as follows...
  - 12 @ 0.5K*32bit=2KB
  - is an extension of cog ram, so it sits on the left of the cog ram
  - accessible only by the two adjacent cogs
  - dual port ram (write bits OR'ed if both cogs write in the same cycle)
  - accessed via new R/W long/word/byte instructions (mapped above hub ram by the existing R/W hub instructions)
- In the centre is the hub...
  - 2K*32bit=8KB addressable as long/word/byte (quad long access probably not required - saves silicon)
  - 0.5K*32bit=2KB ROM
  - Super cogs get 2 accesses per 16 cycles
  - Normal cogs get 1 access per 16 cycles
  - cog access per 16 cycles: 0 1 3 4 6 7 9 10 0 2 3 5 6 8 9 11 (super cogs are 0, 3, 6 and 9)
  - maybe possible to allow super cogs access to the hub on extra cycles if normal cogs don't require access. Alternatively, allow a register for a cog to indicate it never requires hub access, and permit a super cog to take this cycle.
- Counters, Video, CLUTs
  - Super cogs have counters/video/CLUTs etc. as per P2
  - Normal cogs do not have the video but do have the CLUTs, to be used as FIFOs (saves silicon)
  - Therefore 12 @ 128*32bits=0.5KB each
  - Counters improved...
    - Add 2 @ 32-bit serialiser/deserialisers with paths to input and output pins (can be used as any depth by software)
    - Modes to use internal counter clocks or feed external clocks, both via counters
    - Note existing counters should be able to be simply modified to add these abilities
    - Modes to clock on rising or falling edges, or both. (Note pins already provide inversion)
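The 16-slot hub order in the list can be checked mechanically. This sketch takes the sequence as given, with cogs 0, 3, 6 and 9 as the super cogs, and reports slots per cog and the longest wait (in slots, wrapping around the rotation) between a cog's hub turns; the "max gap" metric and the names are illustrative.

```python
SLOTS = [0, 1, 3, 4, 6, 7, 9, 10, 0, 2, 3, 5, 6, 8, 9, 11]  # hub order, as listed
SUPER_COGS = {0, 3, 6, 9}
NUM_COGS = 12


def slots_per_cog(slots: list[int] = SLOTS) -> dict[int, int]:
    """Count hub accesses each cog gets per rotation."""
    counts = {cog: 0 for cog in range(NUM_COGS)}
    for cog in slots:
        counts[cog] += 1
    return counts


def max_gap(cog: int, slots: list[int] = SLOTS) -> int:
    """Longest distance, in slots, between successive hub turns (wrapping)."""
    n = len(slots)
    positions = [i for i, c in enumerate(slots) if c == cog]
    gaps = [((positions[(k + 1) % len(positions)] - p) % n) or n
            for k, p in enumerate(positions)]
    return max(gaps)


if __name__ == "__main__":
    counts = slots_per_cog()
    for cog in range(NUM_COGS):
        kind = "super" if cog in SUPER_COGS else "normal"
        print(cog, kind, counts[cog], "slots, max gap", max_gap(cog))
```

With this order each super cog's two slots are exactly 8 apart, so its worst-case wait is half a normal cog's 16.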
While this means two instruction core sets, I think this would be quite easy to implement. A compiler switch would define the supercog. On the supercog, movi and movd would adjust accordingly to 13 bits and to their relative position in the 40-bit instruction.
The video/counter registers would remain 32 bits, but a movi or movd into them would write the lower 9 bits to the appropriate register bits. This does create a minor problem that needs a little more thought.
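A sketch of how MOVS/MOVD-style field patching might generalise. On the Prop I, MOVS writes bits 8..0 and MOVD writes bits 17..9 of the destination. This assumes, hypothetically, that the 40-bit supercog instruction keeps the same field order with a 13-bit S field in bits 12..0 and D in bits 25..13, so only the field width changes (6 + 4 + 4 + 13 + 13 = 40 bits). The helper names are illustrative.

```python
def field_mask(width: int, shift: int) -> int:
    """Mask of `width` one-bits starting at bit `shift`."""
    return ((1 << width) - 1) << shift


def movs(instr: int, value: int, field_bits: int) -> int:
    """Replace the S field (the low field_bits bits) with value."""
    mask = field_mask(field_bits, 0)
    return (instr & ~mask) | (value & mask)


def movd(instr: int, value: int, field_bits: int) -> int:
    """Replace the D field (field_bits wide, directly above S) with value."""
    mask = field_mask(field_bits, field_bits)
    return (instr & ~mask) | ((value << field_bits) & mask)


if __name__ == "__main__":
    print(hex(movd(0, 0x1FF, 9)))    # normal cog: D in bits 17..9
    print(hex(movd(0, 0x1FFF, 13)))  # supercog: D in bits 25..13
```

Passing field_bits=9 reproduces the Prop I behaviour, which would presumably also cover the 32-bit video/counter registers mentioned above.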
Asymmetric cores are an obvious next step.
I'd like to see a number for how much area the Multiply/Maths costs (usually it is quite a lot), as there are a lot of data-pumping apps that do not need a maths engine, and the Prop pin access means any COG can reach anywhere, so you lose very little.
That silicon could flip to more RAM, or to better peripherals that do not impose SW bottlenecks but co-operate with SW to deliver speeds the same as the chip clock (XMOS does this quite well).
Then it's a fair bet Chip won't do it.
Unless the 'super COGs' can run unmodified 'Normal COG' code, they break everything that Chip has worked to create.
There are plenty of ways to go asymmetric, and still keep a valid code base.
Maths speed is one; another clear asymmetric candidate is a Prop 1 cog! That is much smaller than a full Prop II cog, and it certainly can "run unmodified 'Normal COG' code": anything you like, presently in the code libraries.