P2 ROM

Cluso99 · 2017-10-10 23:48

Chip,
How fast is the ROM going to be on the P2? Same or slower than the Hub RAM?

Might it be simpler, and more usable, to map the ROM into the upper part of the HUB (above the 512KB and below the 1MB)?
You would just need 'n' blocks like the RAM.

If the same speed, then all code could access the ROM. A high volume customer might wish to get a new ROM mask done for some P2's, or you might like to add something for a later ROM revision.

If the ROM was slower than the HUB RAM, then at startup it's already using a slower clock, so booting can still be done using the ROM in situ. For anyone wanting to use the ROM, it would be a simple matter to start the oscillator at say half speed, copy what they want from the ROM to Hub RAM, and then increase the crystal speed to 100%.

BTW since you are now obviously talking to better people at OnSemi, it may be prudent just to ask about FLASH again, just in case it is simple (which it should be).

jmg · 2017-10-11 00:54

Cluso99 wrote: »

How fast is the ROM going to be on the P2? Same or slower than the Hub RAM?

Might it be simpler, and more usable, to map the ROM into the upper part of the HUB (above the 512KB and below the 1MB)?

I thought the ROM was serial, and not parallel/RAM mapped at all ?
A tiny verilog stub loads the ROM into COG, and then it runs.

Cluso99 wrote: »

BTW since you are now obviously talking to better people at OnSemi, it may be prudent just to ask about FLASH again, just in case it is simple (which it should be).

Flash is usually more process steps and charge pumps, but there is IP around for OTP/MTP, that claims no added steps.

ISTR some comment Chip made about the testing connections needed for these IP blocks, which made him shy away. But really, the benefit of OTP/MTP over ROM is well worth taking some effort to get it in there.

OTP/MTP may also allow the 'fuses conundrum' to be solved.

evanh · 2017-10-11 02:24

The mask ROM is not executed directly. It's serially read in byte chunks via the CLKSET instruction - "Set system clock configuration to D. If WC, also reads next ROM byte (used in boot-up)."

evanh · 2017-10-11 02:32

There isn't an explicit execution time mentioned for reading the ROM but given it is a hub resource it probably has the usual HubRAM latencies plus 7 extra clocks for the serial nature of the ROM. If the FIFO kicks in then that might change the rules though.

cgracey · 2017-10-11 03:03

The ROM is read via CLKSET on boot-up. It is very low-overhead this way. Making it map to hub memory would mean we'd need more ROM instances and lots of logic to fit the eggbeater paradigm.

cgracey · 2017-10-11 03:04

I will post the cold-boot code tonight so you can see how it works.

msrobots · 2017-10-11 04:04

No need to make ROM instances. Just load it at startup into RAM and leave it there.

So a loader can load a program overwriting the RAM or load after it, to keep the ROM content in RAM.

Say for the encryption stuff, or a small monitor, please?

By now the serial ROM has 16K bits or something like this, so maybe there is some space left to put the P2-Hot monitor back in?

Since it is in RAM, the program loader can overwrite it if wanted, so it is no security concern.

Enjoy!

Mike

cgracey · 2017-10-11 04:56

Mike,

I agree about the monitor. Remember, though, that you can always load one in by a simple paste into a terminal program.

Here is the ROM code that is built into the cog hardware (not the serial ROM used to hold bigger system-level boot stuff):

wire [31:0][31:0] booti	= {

	// cold boot code - only cog0 on startup		
	32'b1111_1101011_001_000000000_000010110 + (cogm << 9),	// 0 =	setq	#cogm		(write reti0 instructions at end of hub)
	32'b1111_11111_1111_1011001_110_111111111,		// 1 =	augd	#reti0		(reti0 = calld inb,inb wc,wz)
	32'b1111_1100011_011_111111111_100010000 + (16 - cogs),	// 2 =	wrlong	##reti0,ptra[-cogs]

	32'b1111_1101011_101_000000000_000000000,		// 3 =	clkset	#0 wc		(read last rom byte)
	32'b1111_1101011_101_000000000_000000000,		// 4 =	clkset	#0 wc		(read 1st rom byte, get last rom byte)
	32'b1111_0000011_001_000000000_000000110,		// 5 =	shl	0,#6		(multiply by 64)
	32'b1111_1100110_110_000000010_000000000,		// 6 =	rep	#2,0		(ready to read 64n bytes)
	32'b1111_1101011_101_000000000_000000000,		// 7 =	clkset	#0 wc		(read Nth+1 rom byte, get Nth rom byte) 
	32'b1111_1100010_001_000000000_101100001,		// 8 =	wrbyte	0,ptra++	(write byte to hub ram)

	32'b1111_1100111_011_000000000_000000000,		// 9 =	coginit	#0,#0		(restart at $00000)

	32'b1111_1100111_011_000000000_000000000,		// A =	<filler>
	32'b1111_1100111_011_000000000_000000000,		// B =	<filler>
	32'b1111_1100111_011_000000000_000000000,		// C =	<filler>
	32'b1111_1100111_011_000000000_000000000,		// D =	<filler>
	32'b1111_1100111_011_000000000_000000000,		// E =	<filler>
	32'b1111_1100111_011_000000000_000000000,		// F =	<filler>

	// warm boot code - via coginit
	32'b1111_0110000_001_111111100_000000000,		// 0 =	mov	outa,#0		(clear port shadow registers)
	32'b1111_0110000_001_111111101_000000000,		// 1 =	mov	outb,#0

	32'b1111_0110000_001_111111110_000000000,		// 2 =	mov	ina,#0		(clear ina to protect fuses during cogid)
	32'b1111_1101011_000_111111110_000000001,		// 3 =	cogid	ina		(point ina/ijmp0 to cog's initial int0 handler)
	32'b1111_0110001_000_111111110_111111110,		// 4 =	not	ina,ina
	32'b1111_0000011_001_111111110_000000010,		// 5 =	shl	ina,#2

	3'b111, !hubs, 28'b1101011_001_111110111_000010110,	// 6 =	setq	#$1F7		(if !hubs, load $1F8 longs from ptrb)
	3'b111, !hubs, 28'b1011000_001_000000000_110000000,	// 7 =	rdlong	0,ptrb
	21'b1111_1101011_000_1111110, !hubs, 10'b1_000101100,	// 8 =	jmp	dirb/ptrb	(if !hubs, jump to $000 (dirb=0), else ptrb)

	32'b1111_1100111_011_000000000_000000000,		// 9 =	<filler>
	32'b1111_1100111_011_000000000_000000000,		// A =	<filler>
	32'b1111_1100111_011_000000000_000000000,		// B =	<filler>
	32'b1111_1100111_011_000000000_000000000,		// C =	<filler>
	32'b1111_1100111_011_000000000_000000000,		// D =	<filler>
	32'b1111_1100111_011_000000000_000000000,		// E =	<filler>
	32'b1111_1100111_011_000000000_000000000		// F =	<filler>
	};
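
As a rough illustration of what the cold boot comments above describe, here is a minimal behavioural sketch in Python. It is not the actual hardware or Chip's code: the function name, the `hub_size` default and the bounds check are assumptions made for the example, and the one-byte pipelining of CLKSET (read byte N+1 while getting byte N) is abstracted away into plain indexing.

```python
def cold_boot_copy(rom, hub_size=512 * 1024):
    """Behavioural model of the cold boot copy loop (hypothetical helper).

    Assumes, per the listing's comments, that the last ROM byte holds a
    count of 64-byte blocks and that the bytes are copied to hub RAM
    starting at $00000.
    """
    hub = bytearray(hub_size)
    block_count = rom[-1]        # "read last rom byte": it is fetched first
    length = block_count << 6    # "shl 0,#6": multiply by 64
    if length > len(rom):
        # Assumption: the model rejects counts larger than the ROM image;
        # what the real hardware does in that case is not stated above.
        raise ValueError("block count exceeds ROM size")
    for n in range(length):      # "rep #2,0": two instructions per byte
        hub[n] = rom[n]          # clkset #0 wc, then wrbyte 0,ptra++
    return hub


# Example: a 192-byte ROM image whose last byte asks for 2 blocks (128 bytes).
data = bytes(i % 251 for i in range(128))
rom_image = data + bytes(63) + bytes([2])
hub_ram = cold_boot_copy(rom_image)
```

After the copy, the real boot code issues `coginit #0,#0` to restart cog 0 at $00000, which this sketch does not model.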

Cluso99 · 2017-10-11 06:47

cgracey wrote: »

The ROM is read via CLKSET on boot-up. It is very low-overhead this way. Making it map to hub memory would mean we'd need more ROM instances and lots of logic to fit the eggbeater paradigm.

Chip,
That makes no sense at all. Just parallel up one ROM per HUB RAM. The only difference will be the enable RAM vs ROM which could be as simple as using the A19 (top address pin) to select the RAM if '0' and ROM if '1'.

Isn't this easier than having to serial load hub/cog from serial ROM ?
It can hubexec straight off from boot by just forcing the COG 0 PC to the HUB ROM address !

It would probably even be acceptable to just permit RDLONG from HUB ROM. ie no byte enables per se.

Has the advantage of increasing hub space because it's not necessary to copy to hub ram.
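
To make the A19 suggestion above concrete, here is a small Python sketch of the address decode being proposed. It only illustrates the idea in this post; the 20-bit address width, the names and the return format are assumptions for the example, not anything in the P2 design.

```python
HUB_ADDR_BITS = 20    # assumed: 1MB hub address space
A19 = 1 << 19         # top address bit, used as the RAM/ROM select


def decode(addr):
    """Return (region, offset) for a hub address under the proposed scheme."""
    addr &= (1 << HUB_ADDR_BITS) - 1
    region = "ROM" if addr & A19 else "RAM"   # A19 = 0 selects RAM, 1 selects ROM
    return region, addr & (A19 - 1)           # offset within the 512KB half


lower_half = decode(0x7FFFF)   # last RAM byte
upper_half = decode(0x80000)   # first ROM byte
```

Under this scheme the lower 512KB would be RAM and the upper 512KB would be ROM.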

cgracey · 2017-10-11 08:05

Cluso99 wrote: »

cgracey wrote: »

The ROM is read via CLKSET on boot-up. It is very low-overhead this way. Making it map to hub memory would mean we'd need more ROM instances and lots of logic to fit the eggbeater paradigm.

Chip,
That makes no sense at all. Just parallel up one ROM per HUB RAM. The only difference will be the enable RAM vs ROM which could be as simple as using the A19 (top address pin) to select the RAM if '0' and ROM if '1'.

Isn't this easier than having to serial load hub/cog from serial ROM ?
It can hubexec straight off from boot by just forcing the COG 0 PC to the HUB ROM address !

It would probably even be acceptable to just permit RDLONG from HUB ROM. ie no byte enables per se.

Has the advantage of increasing hub space because it's not necessary to copy to hub ram.

Yes, but we'd need 7 more ROM instances, plus steering logic. Also, this ROM would potentially eat into main memory space on full 1MB hub implementations. It takes under 2ms to load the current ROM into hub RAM, already. That thing's job is just to provide some code to kick things off. I like that the ROM hides and is only used on boot. Did you see the 'cold boot code' two posts above? We are currently loading just over 2KB from the ROM. The last ROM location holds a 64-byte block count and it's the first thing read. So, 2100 bytes x (8 + 8 clocks) = 33,600 clocks. At 20MHz, that's 1.68ms. And very little logic.
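
The load-time estimate above can be reproduced with a few lines of Python. The function and parameter names are made up for the example; the figures (2100 bytes, 8 + 8 clocks per byte, 20MHz) are the ones quoted in the post.

```python
def rom_load_time(rom_bytes=2100, clocks_per_byte=8 + 8, clock_hz=20_000_000):
    """Return (total clocks, milliseconds) to copy the serial ROM to hub RAM."""
    clocks = rom_bytes * clocks_per_byte
    milliseconds = clocks / clock_hz * 1000.0
    return clocks, milliseconds


clocks, ms = rom_load_time()
print(f"{clocks} clocks, about {ms:.2f} ms")
```

With the defaults this gives the 33,600 clocks and roughly 1.68ms stated above.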

Cluso99 · 2017-10-11 08:38

Just thought it would be simpler and potentially more useful.
But it's fine as you have it.

Can the ring frame be easily changed in size for the smaller cog versions, which I presume would also have fewer pins and therefore fewer smart pins too? eg 4 cogs + 32 pins/SmartPins.

BTW I don't see any value in a 1 cog version. This would be pitched more against other micros.

jmg · 2017-10-11 09:23

Cluso99 wrote: »

Just thought it would be simpler and potentially more useful.
But it's fine as you have it.

To me, the way it is now is the simpler and more useful.

It is simpler, because there are two loaders - one pico-rom loads the boot code.
The pico-rom can be true ROM, as it is so small, it only has to be able to read the serial ROM/OTP/MTP.

Serial ROM is simpler, because it has fewer connections between an IP block, and the speed critical RAM.

Having a Serial IP Loader is much more useful, as it allows easier swap-in of OTP/MTP cells, - there are no parallel timing specs to have to meet.
This really can be dropped into a corner.

Hopefully any issues around OTP/MTP can be resolved, as that is far more flexible than mask rom.

jmg · 2017-10-12 02:19

On the topic of OTP for ROM, I see SST/OnSemi have a license agreement for what SST call SmartBit.
https://www.microchip.com/pressreleasepage/sst-announces-qualification-of-smartbit-otp-nvm-technology-for-on-semiconductor-s-110-nm-cmos-process

This cell has a charge pump related to the bit, so does not stress the process with some overall Vpp, and does not use charge-storage.
It's not sounding like a small cell, but they do claim "Unlike polysilicon or laser fuses, it is possible to route over IP based upon Smartbit™ IP, thus consuming no additional chip area.", however, the BOOT ROM is quite small.

Cluso99 · 2017-10-12 03:26

jmg wrote: »

On the topic of OTP for ROM, I see SST/OnSemi have a license agreement for what SST call SmartBit.
https://www.microchip.com/pressreleasepage/sst-announces-qualification-of-smartbit-otp-nvm-technology-for-on-semiconductor-s-110-nm-cmos-process

This cell has a charge pump related to the bit, so does not stress the process with some overall Vpp, and does not use charge-storage.
It's not sounding like a small cell, but they do claim "Unlike polysilicon or laser fuses, it is possible to route over IP based upon Smartbit™ IP, thus consuming no additional chip area.", however, the BOOT ROM is quite small.

What is interesting is OnSemi's 110nm line.

I had thought the ring frame had to be converted to software from a manual layout so that it could be made in various incarnations. I had also thought this meant it would scale (except the fuses). Shame it wasn't so, as IIRC it cost heaps to re-do. Otherwise, 110nm might have been a good P2 option nowadays.

jmg · 2017-10-12 03:51

Cluso99 wrote: »

What is interesting is OnSemi's 110nm line.
... 110nm might have been a good P2 option nowadays.

Of course, one outcome of the 'long gestation time', is the 'trailing edge process node' also gradually gets finer... & hopefully NRE cheaper in real terms...
110nm might allow 1MB of RAM ?

Cluso99 wrote: »

I had thought the ring frame had to be converted to software from a manual layout so that it could be made in various incarnations. I had also thought this meant it would scale (except the fuses). Shame it wasn't so, as IIRC it cost heaps to re-do. Otherwise, 110nm might have been a good P2 option nowadays.

I think it does mean it can scale (relatively easily) , but the NRE costs are what drives the 'cost heaps to re-do'.
It needs a placement run to then simulate precisely, and in the current case, to also run a Test Device.

I don't know how different the doping profiles are from 110nm to 180nm, but it seems the PAD Ring masks can apply at 110nm, it just means things are larger than they could have been. Once the test results are in for the Test Chip, maybe OnSemi can confirm they are exactly as Simulated, and then run a simulation of the PAD Ring on 110 doping.
SP110 seems to run 3.3V IO just fine, just has a lower core supply, and likely higher mask costs....

Cluso99 · 2017-10-12 04:31

jmg wrote: »

I don't know how different the doping profiles are from 110nm to 180nm, but it seems the PAD Ring masks can apply at 110nm, it just means things are larger than they could have been. Once the test results are in for the Test Chip, maybe OnSemi can confirm they are exactly as Simulated, and then run a simulation of the PAD Ring on 110 doping.
SP110 seems to run 3.3V IO just fine, just has a lower core supply, and likely higher mask costs....

Would probably fit 16 cogs and 1MB Hub RAM at 110nm

Let's calculate that...
The synthesis guy just came back and said that the logic+memories area is looking to be 72 mm2.
We have 16 instances of 8192x32 SP RAM at 1.57mm2 = ~25mm2.
So 1MB would be 72+25 = ~97mm2
97 x 110/180 = ~59.3mm2
We only have 58 mm2 of space in the middle of our huge 8.5 x 8.5 mm die.

Looks mighty close to doable
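
For anyone wanting to play with the numbers above, here is a small Python sketch of the same estimate. The names are invented for the example. The post scales area linearly with feature size (110/180); an ideal geometric shrink would scale area with the square of that ratio, and real results depend on the cell libraries, SRAM macros and analog blocks, so both exponents are shown as rough bounds rather than predictions.

```python
LOGIC_AND_MEMORIES_MM2 = 72.0   # synthesis estimate quoted above
RAM_INSTANCE_MM2 = 1.57         # one 8192x32 SP RAM instance
EXTRA_RAM_INSTANCES = 16        # the additional instances added for 1MB
AVAILABLE_MM2 = 58.0            # space in the middle of the die


def area_for_1mb():
    """Area at 180nm with the extra RAM instances added (72 + ~25 mm2)."""
    return LOGIC_AND_MEMORIES_MM2 + EXTRA_RAM_INSTANCES * RAM_INSTANCE_MM2


def shrink(area_mm2, old_nm=180.0, new_nm=110.0, exponent=1):
    """Scale an area by (new/old) ** exponent; 1 = as in the post, 2 = ideal."""
    return area_mm2 * (new_nm / old_nm) ** exponent


total = area_for_1mb()
linear = shrink(total, exponent=1)      # roughly 59 mm2, as in the post
quadratic = shrink(total, exponent=2)   # roughly 36 mm2 for an ideal shrink
print(f"{total:.1f} mm2 -> {linear:.1f} mm2 (linear), {quadratic:.1f} mm2 (squared)")
```

With linear scaling the result sits just above the 58 mm2 available; with ideal squared scaling it would fit with room to spare.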

evanh · 2017-10-12 07:18

Cluso99 wrote: »

I had thought the ring frame had to be converted to software from a manual layout ...

It's the other way round, the synthesis converts from HDL to layout.

evanh · 2017-10-12 07:22

Cluso99 wrote: »

Would probably fit 16 cogs and 1MB Hub RAM at 110nm

Since we're pushing for expensive tooling then with MRAM, Hub could be 4 MB or more.

Cluso99 · 2017-10-12 11:02

evanh wrote: »

Cluso99 wrote: »

I had thought the ring frame had to be converted to software from a manual layout ...

It's the other way round, the synthesis converts from HDL to layout.

The original ring frame was done by hand - by Chip and Beau. Then Treehouse did it.
Now OnSemi had to convert it.

cgracey · 2017-10-12 13:07

Cluso99 wrote: »

evanh wrote: »

Cluso99 wrote: »

I had thought the ring frame had to be converted to software from a manual layout ...

It's the other way round, the synthesis converts from HDL to layout.

The original ring frame was done by hand - by Chip and Beau. Then Treehouse did it.
Now OnSemi had to convert it.

They are going to extract a schematic from the layout which contains parasitic capacitances and wire resistances. Then, they are going to simulate it to discover the timing corners of its core-related digital nodes. That data will then be fed back to the digital synthesis tools to make sure that that synthesized logic blob up in the middle of the chip respects the timing of the pad circuitry.

evanh · 2017-10-12 13:11

Converting it to what? I can't say I know anything about the number of steps involved at the OnSemi end of things, but Treehouse's layout will geometrically match the physical end silicon.

EDIT: I see Chip has given an informative and somewhat diplomatic answer.

evanh · 2017-10-12 13:21

Cluso,
If you were really thinking about this you wouldn't be asking Chip to put Flash memory in. You'd be asking for MRAM instead.

Neither are an option right now but that's a different conversation.

samuell · 2017-10-12 13:31

Internal ROM is a bad idea IMHO, especially if it is FLASH. It will degrade. I really prefer an external SPI flash and load the program to RAM, in the same fashion of the P1. If the EEPROM degrades, just replace it instead of the whole MCU. Otherwise it is a waste of silicon. But I probably said that before.

Also, I read that it won't fit in the die, nevertheless. I think the presence, or absence (hopefully) of ROM should be cleared once and for all.

Cluso99 · 2017-10-12 13:32

Evanh,
MRAM is only proven on a limited line, is still new, probably under patent, and licensing is likely prohibitive.
On the other hand, FLASH is old technology done on most lines, and is in OnSemi's repertoire because they make their own FLASH chips.

Cluso99 · 2017-10-12 13:39

samuell wrote: »

Internal ROM is a bad idea IMHO, especially if it is FLASH. It will degrade. I really prefer an external SPI flash and load the program to RAM, in the same fashion of the P1. If the EEPROM degrades, just replace it instead of the whole MCU. Otherwise it is a waste of silicon. But I probably said that before.

Also, I read that it won't fit in the die, nevertheless. I think the presence, or absence (hopefully) of ROM should be cleared once and for all.

Are you aware that P1 has 32KB of ROM?

ROM, OTP, EEPROM, FLASH and MRAM are all different. Some form of this must be in the P2 to be able to boot, even if this is only to read/load from an external chip.

evanh · 2017-10-12 13:54

Cluso99 wrote: »

Evanh,
MRAM is only proven on a limited line, is still new, probably under patent, and licensing is likely prohibitive.
On the other hand, FLASH is old technology done on most lines, and is in OnSemi's repertoire because they make their own FLASH chips.

Our "line" doesn't have Flash either, if I've understood things so far. Chip would have to sort out the whole deal all over again.

MRAM may not be popular but it certainly is not new any longer. I'd be confident OnSemi do have MRAM options on certain "lines".

You are discarding MRAM on old outdated ideas, imho.

Heater. · 2017-10-12 16:47

Can we stop it with the MRAM thing?

I'm willing to be corrected but as far as I can tell there is no MCU available yet that us mere mortals can buy that uses MRAM. As of the beginning of the year it was still a question as to whether MRAM would appear in high end MCU processors using high end process nodes any time soon.

For example this article: "Will MRAM replace flash in leading edge processes?" from January 2017.
http://www.newelectronics.co.uk/electronics-technology/will-mram-replace-flash-in-leading-edge-processes/150041/

My understanding of all this is that MRAM is far away from being something the P2 can adopt.

All I want is a P2. I don't care if it uses 2708's for program store.

jmg · 2017-10-12 18:33

samuell wrote: »

Internal ROM is a bad idea IMHO, especially if it is FLASH...I really prefer an external SPI flash and load the program to RAM, in the same fashion of the P1.

ROM and FLASH are not the same thing.
Even with off chip flash you need SOME code running, to load that into RAM, thus even P1 has internal ROM (Which you claim is a bad idea?)
There are OTP systems that do not use charge storage, so have no corresponding degrade mechanism.

evanh wrote: »

MRAM may not be popular but it certainly is not new any longer. I'd be confident OnSemi do have MRAM options on certain "lines".

Any links to ANY OnSemi MRAM ?? Until OnSemi have MRAM proven on 180nm, talking about it is mere pie in sky.

Smarter to focus on choosing what OnSemi CAN offer, and OTP has many benefits over ROM.

Cluso99 · 2017-10-12 22:46

I would be astounded if OnSemi do not make chips that include FLASH on their ONC18 line.

OnSemi make lots of FLASH chips themselves. Of course I don't know which line they currently use, but they sure have the expertise, just as they also have/do with EEPROM.

Chip just hasn't got to the correct team in OnSemi. Sometimes it's difficult for small players to get to the right people in large companies. But you can be sure that OnSemi is making micros with embedded ROM, OTP, FLASH and/or EEPROM on their lines.

jmg · 2017-10-12 22:51

Cluso99 wrote: »

I would be astounded if OnSemi do not make chips that include FLASH on their ONC18 line.

Of course, I'm sure they do ... the issue with flash is the extra process steps, which increase the price of the chip.
If OnSemi have a MTP or Flash solution (serial loaded), that has no chip-price impact, & is easy to connect to & test, yes, it's a no-brainer.

Even OTP, which claims to have no chip-price impact, is better than MASK ROM.

Addit: OnSemi website lists

Non-Volatile Memory
OTP – One Time Programmable
Sidense 1.8/3.3 V gate-rupture
1k-bit array and 256 k-bit array
In field programming capable

EEPROM – No additional masks or processing steps
Vector: Up to 64 bits supported
Internal Charge Pump provided

Mentions OTP, EEPROM but not flash, and looks only very small EEPROM ?

For the OTP, 256k-bit sounds quite useful... could also solve the fuse issues ?

evanh · 2017-10-13 00:40

Flash is not possible on the line Chip is working with so pretty much any line, including 130 or 110 nm ones, is up for discussion in the context of this topic.

For the existing line, OTP is an option but it would still be using some mask ROM to manage things. OTP would then require extra silicon. A small amount of EEPROM for config and maybe a tiny program would be superior, imho.

However, what Chip already has (fuses and mask ROM only) is perfectly okay in my books.
