The New 16-Cog, 512KB, 64 analog I/O Propeller Chip - Part 2

cgracey · 2015-09-24 18:37

Seairth wrote: »

Chip,

What happens when PC increments from $0FFC to $1000? Do you automatically switch from LUT execution to HUB execution?

Yes, and it always goes to $1000. In order for there to be any unique behavior per cog, we would have to have some register to hold the overflow address. It would only come into play when the PC was advancing beyond LUT.

By having an overflow address, we could have large contiguous programs that begin in the cog, continue into the LUT, and then continue into hub RAM. The first part of your program would have your variables and fast code, the next part would have fast code, and the last part slower code.

This is very simple to implement. Would it be a worthwhile feature?

P.S. Wait! This would only have marginal value because incidental 9-bit relative branches would not make it across that LUT/hub boundary, as it would not actually be contiguous, address-wise.
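A minimal Python sketch of the sequencing Chip describes. It assumes a byte-addressed PC that advances by 4 per instruction, with cog RAM at $000-$7FF, LUT at $800-$FFF and hub from $1000 up. These ranges are inferred from the $0FFC-to-$1000 example and are not taken from any spec. The `overflow` parameter models the proposed per-cog overflow-address register, which is hypothetical and not an actual feature.

```python
from typing import Optional

COG_END = 0x0800   # assumed: cog registers occupy $000-$7FF (512 longs)
LUT_END = 0x1000   # assumed: LUT occupies $800-$FFF (512 longs); hub starts here


def region(pc: int) -> str:
    """Classify a long-aligned PC by the memory it fetches from."""
    if pc < COG_END:
        return "cog"
    if pc < LUT_END:
        return "lut"
    return "hub"


def next_pc(pc: int, overflow: Optional[int] = None) -> int:
    """Advance the PC by one instruction (4 bytes).

    With overflow=None the PC runs straight on into hub at $1000, as described
    above. Passing an address models the hypothetical overflow register instead.
    """
    nxt = pc + 4
    if nxt == LUT_END and overflow is not None:
        return overflow
    return nxt
```

For example, `next_pc(0x0FFC)` gives 0x1000, while `next_pc(0x0FFC, overflow=0x8000)` gives 0x8000. In the second case the instruction at $8000 logically follows the one at $0FFC. However, a relative branch between them would need an offset of roughly 7,170 instructions, far outside a 9-bit range. That is the non-contiguity Chip's P.S. points out.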

Seairth · 2015-09-24 18:48

cgracey wrote: »

Seairth wrote: »

Chip,

What happens when PC increments from $0FFC to $1000? Do you automatically switch from LUT execution to HUB execution?

Yes, and it always goes to $1000. In order for there to be any unique behavior per cog, we would have to have some register to hold the overflow address. It would only come into play when the PC was advancing beyond LUT.

By having an overflow address, we could have large contiguous programs that begin in the cog, continue into the LUT, and then continue into hub RAM. The first part of your program would have your variables and fast code, the next part would have fast code, and the last part slower code.

This is very simple to implement. Would it be a worthwhile feature?

P.S. Wait! This would only have marginal value because relative branches would not make it across that LUT/hub boundary, as it would not be contiguous address-wise. This would be a wrench in the works.

I wouldn't do it at this point in time. If you need such a feature, you can manually do it right now with something like...

<LUT code>
fitl $0FFB      ' or whatever the fit directive is
orgl $0FFC      ' or whatever the org directive is
jmp #hub_address

In the case of switching from LUT to Hub, it's probably better to be explicit.

cgracey · 2015-09-24 18:51

Seairth wrote: »

In the case of switching from LUT to Hub, it's probably better to be explicit.

Agreed. You've gotta keep 'em separated.

cgracey · 2015-09-24 18:52

Seairth, what kind of FPGA board do you have?

Roy Eltham · 2015-09-24 18:53

I think it's fine to have it always transition to $1000 in hub when executing past the end of LUT. As Seairth said, you can always get around that by placing a jmp instruction at the last LUT address.

You mention the relative branches not working if you did the overflow address thing, but do they work as expected when it always goes to $1000?

cgracey · 2015-09-24 18:59

Roy Eltham wrote: »

I think it's fine to have it always transition to $1000 in hub when executing past the end of LUT. As Seairth said, you can always get around that by placing a jmp instruction at the last LUT address.

You mention the relative branches not working if you did the overflow address thing, but do they work as expected when it always goes to $1000?

Yes.

Seairth · 2015-09-24 19:11

cgracey wrote: »

Seairth, what kind of FPGA board do you have?

1-2-3 (w/o PLL fix).

Seairth · 2015-09-24 19:15

Roy Eltham wrote: »

I think it's fine to have it always transition to $1000 in hub when executing past the end of LUT. As Seairth said, you can always get around that by placing a jmp instruction at the last LUT address.

In general, though, it will not be a good thing to transition from LUT to Hub without planning. Unless you go out of your way to change hub memory, $1000 is going to contain the startup code. I suspect that is not what most people will intend to execute.

Edit: Unintentionally transitioning to $1000 will likely outwardly look like the chip has reset itself, causing people to think the hardware itself reset...

Seairth · 2015-09-24 19:18

(deleted.)

Roy Eltham · 2015-09-24 19:29

Seairth,
I envision an example startup sequence like this:
1. Read the 16KB ROM into the first 16KB of hub RAM.
2. Start executing the code at $1000 (4KB) in hub.
3. That code copies the first 4KB of hub memory into a cog and jumps to it.
4. The code in the cog (and the remainder of hub, if needed) then reads external flash/SD card/whatever:
4a. First it reads in a chunk of memory and copies up to 4KB blocks of it into various cogs.
4b. Next it reads in up to 512KB, replacing hub memory (some or all).
5. All that code is then running.

This allows you to plan out what ends up at $1000. It could be continuation code for the initial cog code, or something else that works with one of the other cogs. Whatever you wish.
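Roy's steps rely on one size coincidence. One cog's register RAM plus its LUT is exactly the 4KB that sits below $1000. A quick check in Python, assuming 512 longs each for cog RAM and LUT and reading "16k ROM" as 16,384 bytes (both are assumptions for this sketch):

```python
LONG_BYTES = 4
COG_LONGS = 512          # assumed cog register RAM size, in longs
LUT_LONGS = 512          # assumed LUT size, in longs
ROM_BYTES = 16 * 1024    # "16k ROM" read as 16,384 bytes

# Step 3: the block copied into a cog (registers + LUT) is exactly $1000 bytes.
cog_image_bytes = (COG_LONGS + LUT_LONGS) * LONG_BYTES
entry_point = cog_image_bytes   # step 2: execution starts right after it, at $1000

# Step 1: the ROM image lands in hub from $0000 up to (not including) $4000.
rom_hub_span = (0x0000, ROM_BYTES)

print(f"cog+LUT image: {cog_image_bytes} bytes, entry at ${entry_point:04X}")
print(f"ROM copy fills hub ${rom_hub_span[0]:04X}-${rom_hub_span[1] - 1:04X}")
```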

jmg · 2015-09-24 20:05

cgracey wrote: »

Yes, and it always goes to $1000. In order for there to be any unique behavior per cog, we would have to have some register to hold the overflow address. It would only come into play when the PC was advancing beyond LUT.

By having an overflow address, we could have large contiguous programs that begin in the cog, continue into the LUT, and then continue into hub RAM. The first part of your program would have your variables and fast code, the next part would have fast code, and the last part slower code.

This is very simple to implement. Would it be a worthwhile feature?

P.S. Wait! This would only have marginal value because incidental 9-bit relative branches would not make it across that LUT/hub boundary, as it would not actually be contiguous, address-wise.

That would be nice, but if the tools can manage that 'overflow' for the user, it is less critical,
i.e. an extra couple of lines is OK, provided they can be hidden from users.
A RELJMP that cannot reach its target would be reported at the linking stage:
likely 'fixed' automatically in a compiler, or reported as an error in an ASM flow.
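The check jmg describes could be sketched as a small assembler/linker pass. This sketch assumes the 9-bit relative offset is a signed count of instructions (longs) measured from the instruction after the branch. That is a guess, and the actual encoding may differ. `plan_jump` is a hypothetical helper. Its fallback to an absolute jump mirrors Seairth's explicit `jmp #hub_address` workaround.

```python
REL_BITS = 9
REL_MIN = -(1 << (REL_BITS - 1))       # -256 instructions
REL_MAX = (1 << (REL_BITS - 1)) - 1    # +255 instructions


def rel_offset(src: int, dst: int) -> int:
    """Offset in instructions from the instruction after `src` to `dst` (byte addresses)."""
    delta = dst - (src + 4)
    if delta % 4:
        raise ValueError("target is not a whole number of instructions away")
    return delta // 4


def plan_jump(src: int, dst: int) -> str:
    """Use a relative branch if the offset fits in 9 signed bits, else an absolute jump."""
    off = rel_offset(src, dst)
    if REL_MIN <= off <= REL_MAX:
        return f"rel {off:+d}"
    return f"abs ${dst:X}"
```

A compiler would silently emit the absolute form. An ASM flow might instead report the out-of-range case as an error and let the programmer decide.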

jmg · 2015-09-24 20:08

Seairth wrote: »

Roy Eltham wrote: »

I think it's fine to have it always transition to $1000 in hub when executing past the end of LUT. As Seairth said, you can always get around that by placing a jmp instruction at the last LUT address.

In general, though, it will not be a good thing to transition from LUT to Hub without planning. Unless you go out of your way to change hub memory, $1000 is going to contain the startup code. I suspect that is not what most people will intend to execute.

Edit: Unintentionally transitioning to $1000 will likely outwardly look like the chip has reset itself, causing people to think the hardware itself reset...

Sounds like a very good case for booting below $1000?

I'm lost as to where it actually boots from...

In most chips, bootloaders are protected, and it is OK if they are somewhat complex to reach. The idea is that they run only when you explicitly want them to,
typically on RST and via user calls.

potatohead · 2015-09-24 20:26

That's precisely why I liked the non-aligned code below $1000. I thought it might be just odd/different enough to discourage putting code there. And in the "just load the chip and get going" case, a program start at $1000 made sense. Boot/init/decrypt happens below $1000, and that area can then be wiped and used for data storage in most use cases.

But it's totally OK if we don't have it that way, too. Looks like maybe only Chip and I thought that was a good idea!

Propeller boot code isn't going to be complicated or hard to reach. This is in line with how we've come to use the chips anyway. The P1 ROM code was encrypted, and once we got past that, having Spin and the booter open helped on a number of fronts.

Since we copy to RAM, it's going to be entirely up to the developer as to how much boot code, if any at all, continues to exist after initial startup. It's also going to be up to the developer as to the load process beyond a basic loader. Multi-stage load, etc...

For those using encryption, they may well want to get rid of all initial code and/or perform secondary loads, depending on their use cases.

Boot at $1000 isn't a big deal, and we can prevent the "looks like the chip got reset" case by loading startup code into COGS, as Roy pointed out, starting up, then clearing all that out. User program can start at something like $1008, just a coupla instructions in.

Edit: scratch that. The process Roy posted up means user program starts at $1000, and startup / init code can just go away, if the user desires it, or it can be kept.

I personally want to preserve the option of running like the P2 "hot" chip did, with monitor and ability to reset. That was very useful for a lot of testing.

jmg · 2015-09-24 20:40

potatohead wrote: »

Edit: scratch that. The process Roy posted up means user program starts at $1000, and startup / init code can just go away, if the user desires it, or it can be kept.

I personally want to preserve the option of running like the P2 "hot" chip did, with monitor and ability to reset. That was very useful for a lot of testing.

Agreed, having Boot "out of the way" below $1000 lets Chip start how he intended. Any user code will naturally flow past the hidden boot area, but can still call all the useful routines, should any user want to do that.

With 16k of ROM, there is room for quite a bit of useful Monitor / debug stuff...

potatohead · 2015-09-24 20:56

Yes.

And what I was thinking was: build the boot code at offset $0001 and just deal with it. No support beyond the PASM assembler is needed. In that area go the booter, crypt, monitor, and optional debug. Also in that area go a couple of "hooks": jump vectors for the dev system and advanced debug features we have yet to develop.

The < $1000 area is "system" and it's largely untouched and unused precisely because of that $0001 offset. This gets included onto the chip as a binary blob, and the vast majority of users won't need to do anything with it at all, though they could call routines in it, if they wanted / needed to.
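The "$0001 offset" trick depends on a decode rule discussed in this thread. A long-aligned PC below $1000 fetches from cog or LUT, so hub code can only run there from a non-aligned address. Below is a sketch of that rule, assuming it works exactly this way. The final hardware behaviour may differ.

```python
def fetch_source(pc: int) -> str:
    """Where an instruction at byte address `pc` would be fetched from (assumed rule)."""
    if pc >= 0x1000:
        return "hub"
    if pc % 4 == 0:
        return "cog/LUT"    # aligned addresses below $1000 map onto cog + LUT
    return "hub"            # non-aligned: reachable only as hub execution


# A "system" blob assembled at $0001 stays non-aligned as the PC steps by 4,
# so every one of its instructions is fetched from hub.
boot_pcs = range(0x0001, 0x1000, 4)
all_hub = all(fetch_source(pc) == "hub" for pc in boot_pcs)
```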

Users who want to boot and go, just clear that and it's data storage. Users who want to boot and develop, test, etc... have it all there, and they then can optionally use the full set of ROM tools on chip, or load additional dev tools, or provide their own and have them nicely integrated with the core system tools with the hooks planned out right now.

All this would take is a "user" command in the monitor, maybe "&", with an address vector and an argument pointer. The user could then issue a command to the monitor, say "& assemble $5000" or "& editor", and it all just works. It could get interesting, too. Say a FORTH or BASIC or some other thing is loaded and integrated... We could have a nice little system building over time for those who want it, while those who don't get the full benefit of all the RAM if they really want it.
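Below is a sketch of how such an "&" command could dispatch through a hook table. The table, function names, and string results are all hypothetical. On the chip, the table would presumably be the set of jump vectors kept in the system area below $1000.

```python
from typing import Callable, Dict, List

# Hypothetical hook table: name -> handler taking the remaining arguments.
HOOKS: Dict[str, Callable[[List[str]], str]] = {}


def register_hook(name: str, handler: Callable[[List[str]], str]) -> None:
    """Install a user tool so the monitor can reach it via '&'."""
    HOOKS[name] = handler


def monitor_command(line: str) -> str:
    """Handle one monitor line; '&' hands the rest to a registered tool."""
    if not line.startswith("&"):
        return "?"   # built-in monitor commands are out of scope for this sketch
    parts = line[1:].split()
    if not parts or parts[0] not in HOOKS:
        return "no such hook"
    return HOOKS[parts[0]](parts[1:])


register_hook("assemble", lambda args: f"assembling at {args[0]}")
```

With only "assemble" registered, `monitor_command("& assemble $5000")` reaches the tool, while `monitor_command("& editor")` reports that no such hook is loaded yet.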

Maybe ensure that the booter can also perform a second-stage load at the user's request. That would cover the supported boot methods, and the user is always free to load something that can then do more, like fetch from an SD card. And, with the hooks in place, it could even provide file/dev support for those long after the core on-chip tools are done.

The initial ROM copy to RAM is 16K, and user program start is $1000. If the user wants to, just put a bootloader, or use the existing one below $1000 to load their HUB binary, overwriting anything that's supplied in the ROM, and it's just like there is no ROM.

Best of all worlds, if you ask me.

Ah well, I'll shut up about it here pretty quick.

It's worth a pass or two before I do though.

Roy Eltham · 2015-09-24 21:37

I think you guys (or maybe just potatohead?) are forgetting that there is really a two (or multistage) boot going on here.

The 16k ROM that gets serially read into hub at startup is not user editable. It's something Chip will write, and it gets cooked into every P2 made. There is a chance it could be updated in the future with new/more code, but then every P2 from then on would have the new version. This would be kind of odd, unless it was some new spin of the chip with other changes.

This is what's going to be started at $1000 in hub. It's going to have the security stuff in it, and some form of monitor. Maybe, if Chip has time, he can code up some minimal editor/assembler that runs on chip and fits in that ROM along with the security stuff and the normal second-stage boot code.

This code will then read in stuff from an external flash/sdcard/whatever to get the user written code/data into the chip. After this code is "done" is where the user takes over and does whatever they want to finish booting/whatever.

Rayman · 2015-09-24 22:09

If I did the math right, I think an 8x16 font with 256 characters would be perfect for below $1000...
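Rayman's math checks out if the glyphs are monochrome (1 bit per pixel, an assumption). An 8-pixel row is then one byte, and 256 glyphs fill exactly the 4KB below $1000:

```python
CHARS = 256          # full 8-bit character set
ROWS_PER_GLYPH = 16  # an 8x16 glyph has 16 rows...
BYTES_PER_ROW = 1    # ...of 8 pixels, assuming 1 bit per pixel (monochrome)

font_bytes = CHARS * ROWS_PER_GLYPH * BYTES_PER_ROW
print(f"{font_bytes} bytes = ${font_bytes:04X}")   # 4096 bytes = $1000
```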

tonyp12 · 2015-09-24 22:10

It will load ROM-based boot code into hub RAM starting at $1000. To what address will it then SPI-read an external flash?
Or do all .bin files need to start with a header that lists the load location?

Is there no protection for the monitor/debugger/goodies the boot ROM code provides from getting overwritten or corrupted while it's in hub RAM? Only a hardware reset will restore it?

16K seems a little too much, but if it can boot in 1/4 s, I guess that's OK. But I think a USB bootstrap loader should be implemented if you have that much space.

potatohead · 2015-09-24 22:15

We don't have protection this time. System tools load into RAM.

I didn't want to suggest this, but it has been on my mind for a while.

Another good case for code under $1000. A bit could lock it out from writes...

Just saying...

jmg · 2015-09-24 22:23

tonyp12 wrote: »

16K seems a little to much, but if it can boot in 1/4 sec I guess OK. but I think a USB boot-strap loader should be implemented if you have that much space.

The copy of the serial ROM to RAM will be very fast, and only a tiny portion of that 16k will then run to load a very small stub from SPI.
That stub will likely flip to QuadSPI and then load the real code.
For smaller/simpler projects, that second gear change could be skipped.
Most of the boot delay is usually waiting for serial.
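Why the stub "flips to QuadSPI": a quad-mode transfer moves 4 bits per clock instead of 1, so at the same clock the bulk load takes about a quarter of the time. Below is a back-of-envelope Python estimate. It ignores command, address and dummy cycles, and its 20 MHz clock is a hypothetical figure, not a chip specification.

```python
def spi_load_seconds(n_bytes: int, clock_hz: float, lanes: int = 1) -> float:
    """Raw transfer time for n_bytes over SPI with `lanes` data lines.

    Ignores command/address/dummy overhead, so this is a lower bound.
    """
    bits = n_bytes * 8
    clocks = bits / lanes
    return clocks / clock_hz


HUB_BYTES = 512 * 1024      # filling all of hub RAM
CLOCK_HZ = 20e6             # hypothetical SPI clock

single = spi_load_seconds(HUB_BYTES, CLOCK_HZ, lanes=1)
quad = spi_load_seconds(HUB_BYTES, CLOCK_HZ, lanes=4)
print(f"single: {single * 1e3:.1f} ms, quad: {quad * 1e3:.1f} ms")
```

At that hypothetical clock, filling 512KB takes roughly 210 ms in single-bit mode and 52 ms in quad mode. For a small image, both figures shrink proportionally.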

jmg · 2015-09-24 22:25

potatohead wrote: »

We don't have protection this time. System tools load into RAM.

I didn't want to suggest this, but it has been on my mind for a while.

Another good case for code under $1000. A bit could lock it out from writes...

Just saying...

Interesting idea, but it would need care around security, as some designs may want to delete the BOOT. Coupled with the other fuses, this may work.
There has not been much discussion around fuses; maybe once the FPGA is loading code, fuses can be mapped to DIP switches to check how they interact with the flows.

potatohead · 2015-09-24 22:29

Roy, yeah I know it's not user editable. ROM copy to RAM, then boot.

What I was envisioning, when Chip first suggested code run non-aligned under $1000 was the core system software go there. 4K is enough for booter, crypt, monitor, etc...

Since the ROM copy is 16K, that area, along with some HUB RAM would actually get populated. Ideally, we get the on chip tools done, and those fit into the 16K. Or not... If not, other data can be put there, or maybe it's all just ignored, or the size cut down before making a real chip.

So the first load is internal, 16K gets copied to HUB RAM. Start boot process at $0001. Those programs run in a mix of HUB or COG, whatever makes sense, and they would initiate the second stage load.

If there is no second stage load, the monitor is active as it was in P2 "hot" and we are done.

In the second stage load, perhaps the image can contain a start address, or just a mode bit. One mode would be to load from $0 to the max HUB memory, if desired. The other mode would be to just load starting at $1000.

The load starting at $1000 is a standard user program that assumes the system software is there. Maybe for debug, decryption, whatever. When the image is loaded, program begins at $1000.

The load starting at $0 would be able to modify the system software, updating pointers to nicely integrate some things, or maybe just make complete use of the Propeller HUB RAM, erasing the system tools entirely. Whatever the user wants to do. For this one, program start at $1000 as well, but the image may well contain code below $1000 for whatever reason the user might think makes sense.

If this under-$1000 area could be write protected, that's even better! Now we've got a system area that can't be trounced on by the vast majority of user programs. Those that toggle that protect bit obviously would overwrite that area. Where would that bit reside? Who knows? Maybe there is a bit left in something like COGID...

A compelling feature of P2 "hot" was developing on chip, using the monitor to run code, modify memory, etc... Would be nice to have this for this chip, and we basically do, but it's not really ROM.

Also, at that time, we were divided on ROM / RAM use as the ROM space was in the RAM space. If we did it this way, we get the best for both camps and their use cases.

I also really like the idea of a small region of RAM that is discouraged from seeing general development. It's the perfect place to drop debuggers or other kinds of tools without having to worry too much about the impact on a lot of programs. This is sort of like how the shadow register RAM in the COGS on P1 ended up being used.

mark · 2015-09-24 22:47

Someone mentioned this long ago, but I don't recall there ever being an answer/response to it: what if the longs were 34-bit? You'd have two more bits to work with for addressing, which would alleviate this whole conundrum, no? This would cut into the hub RAM a bit for a given die space, and you'd have to decide what you would use those two extra bits for. You can't change your byte sizes unless you up it to 36 bits. Perhaps they could be used to store flags? Any thoughts on how useful being able to store flags would be in practice?

tonyp12 · 2015-09-24 22:48

Maybe 3 modes for user software loading:
1: Start at $0 (overwrites everything; only the cog RAM holding the SPI flash/UART loader survives).
2: Start at $1000, leaving the first 4K intact. This area is write protected after boot, accessed long-aligned only, and holds important system tools.
3: Start at $4000, leaving the first 16K intact, as it contains some small useful libraries (or a font), maybe even a USB bootstrap loader.
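tonyp12's three modes could be selected by a one-byte mode field at the start of the image. This would also answer his earlier question about whether each .bin needs a header listing its load location. The format is purely hypothetical, for illustration only; no such header has been defined for the chip.

```python
# Hypothetical mode byte -> hub load address, following the three modes above.
LOAD_BASE = {
    1: 0x0000,   # overwrite everything
    2: 0x1000,   # keep the first 4KB of system tools
    3: 0x4000,   # keep the whole 16KB ROM copy
}


def place_image(image: bytes, hub: bytearray) -> int:
    """Copy an image's payload into hub RAM at the address its mode byte selects.

    Returns the load address. Raises ValueError on an unknown mode or overflow.
    """
    mode, payload = image[0], image[1:]
    if mode not in LOAD_BASE:
        raise ValueError(f"unknown load mode {mode}")
    base = LOAD_BASE[mode]
    if base + len(payload) > len(hub):
        raise ValueError("image does not fit in hub RAM")
    hub[base:base + len(payload)] = payload
    return base
```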

potatohead · 2015-09-24 22:51

@jmg, don't use a fuse, just a write mask bit. That's enough to handle the basic overwrite case concerns.

Maybe the COG ID instruction, or some other one has a bit to spare to set the write mask on RAM under $1000 to simulate actually having ROM there.
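Below is a model of the write-mask idea. A single bit, while set, drops hub writes below $1000, so the system area behaves like ROM. The class and its drop-silently policy are assumptions for illustration. Where such a bit would live (COGID or elsewhere) is, as noted above, undecided.

```python
class HubRam:
    """Hub RAM with a hypothetical write-protect bit for the area below $1000."""

    PROTECT_LIMIT = 0x1000

    def __init__(self, size: int = 512 * 1024) -> None:
        self.mem = bytearray(size)
        self.protect_low = False   # the proposed "write mask" bit

    def write_byte(self, addr: int, value: int) -> bool:
        """Store one byte; return False if the write was masked off."""
        if self.protect_low and addr < self.PROTECT_LIMIT:
            return False
        self.mem[addr] = value & 0xFF
        return True

    def read_byte(self, addr: int) -> int:
        return self.mem[addr]
```

In this model any code can clear the bit again, so it guards against accidental overwrites rather than deliberate ones.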

tonyp12 · 2015-09-24 23:02

>don't use a fuse, just a write mask bit.
Hacking is a concern, maybe, if your software is designed to remove the monitor because your code is closed source.
If someone is able to edit/hack the flash so your software loads in mode 2, and then invoke the monitor via hardware, they can read your code.

jmg · 2015-09-24 23:11

mark wrote: »

Someone mentioned this long ago, but I don't recall there ever being an answer/response to it: what if the longs were 34-bit?

That's more of a viable option on FPGA, as the bits are almost free there.
The P2 silicon uses an OnSemi memory-compiler block, and I'm not sure they can generate 34b.
34b also messes with byte and int16 overlays, so there are plenty of fish-hooks for little benefit...

jmg · 2015-09-24 23:14

tonyp12 wrote: »

3: start at $4000 leave the first 16K...

Is that 16k for ROM bits or bytes?
I thought it was bits, but I'm not sure that's been clearly mentioned recently.

tonyp12 · 2015-09-24 23:22

rom is 16kbits? (no one should use bits unless talking about transmission rate)
That is only 2K.

jmg · 2015-09-24 23:46

tonyp12 wrote: »

rom is 16kbits? (no one should use bits unless talking about transmission rate)
That is only 2K.

More digging finds this in the die-area notes:

16384x8 ROM, 0.3 mm²
Looks like 16K bytes.
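Here are the two readings side by side. The die-area entry's 16384x8 is taken as 16,384 words of 8 bits, and "k" as 1024:

```latex
\begin{aligned}
16384 \times 8\ \text{bit} &= 131072\ \text{bit} = 16384\ \text{bytes} = 16\ \text{KB} \\
16\ \text{kbit} = 16384\ \text{bit} &= 16384 / 8\ \text{bytes} = 2048\ \text{bytes} = 2\ \text{KB}
\end{aligned}
```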
