Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i

ozpropdev · 2017-11-03 09:26

cgracey wrote: »

New v26 at the top of this thread.

This unifies the 3ms reset timer and PNut.exe.

All pin inputs are now registered, which seems to have eliminated the flakiness I was seeing in the booter yesterday. This should be pretty solid.

All flavours of V26 loaded and running fine on all boards.

Roy Eltham · 2017-11-03 09:45

You can't do SD 4bit without paying for it. You also have to deal with the secure protocol when doing that.
It's not just 4bit SPI, it doesn't just go faster.

The P2 boot rom shouldn't have anything to do with license required stuff. You can do 4 bit SD in your product using a P2 if you license it and write the code for it (which will have to be loaded into the P2 without using 4bit SD), and then you can do whatever you want for the pins.

Cluso99 · 2017-11-03 09:54

Roy Eltham wrote: »

You can't do SD 4bit without paying for it. You also have to deal with the secure protocol when doing that.
It's not just 4bit SPI, it doesn't just go faster.

The P2 boot rom shouldn't have anything to do with license required stuff. You can do 4 bit SD in your product using a P2 if you license it and write the code for it (which will have to be loaded into the P2 without using 4bit SD), and then you can do whatever you want for the pins.

This is precisely my aim Roy.

It appears others have now reverse-engineered the 4-bit protocol, and it is now being used on other micros. Even SD 1-bit mode is significantly faster than the SPI mode that I/we currently use.

cgracey · 2017-11-03 14:39

ozpropdev wrote: »

cgracey wrote: »

New v26 at the top of this thread.

This unifies the 3ms reset timer and PNut.exe.

All pin inputs are now registered, which seems to have eliminated the flakiness I was seeing in the booter yesterday. This should be pretty solid.

All flavours of V26 loaded and running fine on all boards.

Thanks, Ozpropdev.

Did you notice the downloading was snappier?

dMajo · 2017-11-03 17:09

cgracey wrote: »

I mean, if we allow this quad SPI setup of CS, CLK, D3, D2, D1, D0, then we'd need to drive all those pins on boot-up or the user would have to put pull-ups on what were HOLDn and WPn. It just looks like a sprawling mess to me with lots of ugly contingencies. I DO LIKE SD, though.

@cgracey
you do not need to drive these pins at boot-up. Assume 1-bit SPI; the 4-bit mode will then be used by the user's code.
If the designer uses 1-bit SPI, not driving the other pins leaves them usable for anything else. HOLD and WP will be hardware-defined by the PCB in this case.
If the designer chooses a QuadSPI connection, he will also add the 2 pull-ups for that purpose.

The supported SD boot has to be in SPI mode only, so again 4 pins.
The most future-proof boot is to use a raw partition (unformatted, without a file system). That way you read the MBR, take the partition's start from the partition table, and then read a given number of bytes from there.
The MBR has "safe" and standard places to also hold a small signature/checksum, plus the clock configuration and execution mode (cog/hub-exec).
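A minimal sketch of the MBR step, covering only the standard MBR fields (four 16-byte partition entries at offset $1BE, the $55 $AA signature at offset $1FE, little-endian start LBA and sector count). Where dMajo's signature/checksum, clock setting and exec mode would live is not specified in the post, so it is left out; the names `parse_mbr` and `image_byte_offset` are hypothetical.

```python
import struct

SECTOR_SIZE = 512
PART_TABLE_OFFSET = 0x1BE  # four 16-byte partition entries start here
SIGNATURE_OFFSET = 0x1FE   # bytes 0x55, 0xAA mark a valid MBR


def parse_mbr(sector: bytes):
    """Return (type, first_lba, num_sectors) for each non-empty partition entry."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("MBR must be exactly 512 bytes")
    if sector[SIGNATURE_OFFSET:SIGNATURE_OFFSET + 2] != b"\x55\xAA":
        raise ValueError("missing 0x55AA boot signature")
    partitions = []
    for i in range(4):
        start = PART_TABLE_OFFSET + 16 * i
        entry = sector[start:start + 16]
        ptype = entry[4]  # partition type byte; 0 means the entry is unused
        first_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        if ptype != 0:
            partitions.append((ptype, first_lba, num_sectors))
    return partitions


def image_byte_offset(sector: bytes, index: int = 0) -> int:
    """Byte offset on the card/flash where the chosen raw partition starts."""
    _, first_lba, _ = parse_mbr(sector)[index]
    return first_lba * SECTOR_SIZE
```

The booter would then stream the image from `image_byte_offset(...)` onward; a raw partition means no file-system code is needed in ROM.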

The best thing is to implement the same MBR in the flash as well. I would recommend providing for a double image, so that a new image can first be written and then pointed to in the MBR. If the pointed-to image fails, the system could try the other one. This gives reliable field updates, even over the air/internet, since even a big image can be written directly to flash/SD and verified before switching the pointer to it.
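A minimal sketch of the double-image fallback, under loudly hypothetical assumptions: two image slots, an MBR pointer (`active`, 0 or 1) naming the preferred slot, and a validity rule where the image's 32-bit little-endian longs must sum to zero modulo 2^32. The real check would be whatever signature/checksum the boot format defines; `longs_sum`, `image_ok` and `select_image` are illustrative names only.

```python
import struct


def longs_sum(image: bytes) -> int:
    """Sum of little-endian 32-bit longs, modulo 2**32 (hypothetical check)."""
    if len(image) % 4:
        return -1  # not long-aligned: can never match, so treated as invalid
    total = 0
    for (value,) in struct.iter_unpack("<I", image):
        total = (total + value) & 0xFFFFFFFF
    return total


def image_ok(image: bytes) -> bool:
    """Hypothetical rule: a non-empty image whose longs sum to zero is valid."""
    return len(image) > 0 and longs_sum(image) == 0


def select_image(slots, active: int):
    """Try the slot the MBR pointer names first, then the other one."""
    for idx in (active, 1 - active):
        if image_ok(slots[idx]):
            return idx
    return None  # both images bad: fall back to serial boot
```

Because the pointer is switched only after the new image verifies, a failed update leaves the old slot selectable.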

The ROM booter would then read the MBR, whether from flash or SD, extract the image start/length, clock setup and execution mode on the fly into a few COG registers, and finally stream the image into the hub. It would then cog-exec or hub-exec it according to the mode setting.
This also makes most of the read/streaming code reusable between the two media, reducing the ROM footprint.

Also, IMHO serial boot should be available only if the flash/SD checks fail or no flash/SD is detected. That means that, when a working/good image is present, serial-to-RAM download and/or flash/SD rewrite must be provided by the flash/SD firmware itself. Serial boot will be available only with a damaged/missing/empty flash/SD. If the user always wants serial boot, even with a programmed flash/SD, then the flash/SD must not be on the boot pins but elsewhere.
This way the serial path can run at full speed from the start, because the flash/SD boot has already completed. The needed flash/SD function (PASM/Spin) can be added to the image by the IDE on user request (a flag) and called (hub- or cog-executed) from the user program under the user's own conditions. It could be a standard/library function. This also gives a minimal level of code/image safety/protection.

The IDE will support image writes to both empty and programmed flash/SD, including double-image writes. It will do this either by downloading its software writer into P2 RAM (for empty/failed/missing flash/SD) or by communicating with the image-update function built into the firmware.

Ariba · 2017-11-03 17:34

Cluso99 wrote: »

Doesn't my diagram work better???

The data pins are in the correct order, and you can detect whether to use SPI with SI/SO combined, or the Quad SPI booting in 1bit mode using separate SI/SO.

It also maps for SD too.

Your device connections need a lot of detecting by the bootloader and several pullup/down resistors here and there. And it has no pullups on nHD and nWE with Quad connection, which are required in 1bit SPI. If you add them, I think there is a conflict with SD detection.

I tried to show that if the booter just supports 1-bit SPI with DI+DO connected and without any boot-mode detection, you can still use quad mode later in your application.

To do the same with your connection scheme, the SI/SO in 2A.) should go to P56, not P59, which results in unconnected pins in between.

Andy

jmg · 2017-11-03 20:11

cgracey wrote: »

I mean, if we allow this quad SPI setup of CS, CLK, D3, D2, D1, D0, then we'd need to drive all those pins on boot-up or the user would have to put pull-ups on what were HOLDn and WPn...

You only need to add light pullups, which is very common in MCU-land, & only to HOLDn, WPn.
If users need that pin to be low during reset, because they use 1-SPI, they add a pull-down, just like they do now.

Given most will use Quad (or Octal) SPI, ignoring HOLDn, WPn and mandating the user fits external pullups (ugh), it just looks like you forgot something...

cgracey · 2017-11-03 20:57

dMajo, I see what you are saying, but I think that serial needs to be able to get in more easily, as it's the means to update the flash/SD, plus it's the fast path for development. I made a proposal in the other thread about having a static 4-pin layout for both flash and SD. Could you please say if you think it would work to do that? It seems by far the cleanest approach to me, but maybe it's a little unorthodox.

I'll just replicate the post here:

So, could we make a static 4-pin pinout that works with both SPI flash and SD card, assuming one or the other is present (or neither is present)?

If we had this:

P61 = CS (pull-up indicates presence of either SPI flash or SD card)
P60 = CLK (pull-up indicates 'skip serial wait if boot data okay')
P59 = DI (could be used as SPI flash DQ0)
P58 = DO (could be used as SPI flash DQ1)
P57 = (could be used as SPI flash DQ2)
P56 = (could be used as SPI flash DQ3)

In the ROM booter now, I do some initial command/response sleuthing to determine if a SPI chip is present. That takes umpteen microseconds. If it seems present, it takes ~4ms to read in $100 longs and determine if they're legit. If either of those checks fails, could we then treat those 4 pins as if an SD card is connected, and see what we get? This would all be predicated on seeing a pull-up on P61, indicating some kind of memory is present. This keeps our initial "is-memory-present" pin checking down to ONE pin. Also, by keeping DI and DO separate for both SPI flash and SD card, we could hold DI high while reading SPI flash (CS is low during reading) to avoid some command being interpreted by the SD card. Also, keeping those pins separate allows for quad SPI flash usage.
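The proposed decision order can be sketched as follows. This is only a model of the logic described above, not ROM code: the boolean inputs stand in for the pin and data checks, and the assumption that a missing P60 pull-up means "open a serial window before booting from memory" is my reading of the P60 note. `Action` and `boot_decision` are hypothetical names.

```python
from enum import Enum

# Proposed static pin-out: P61 = CS, P60 = CLK, P59 = DI, P58 = DO (P57/P56 = DQ2/DQ3)


class Action(Enum):
    SERIAL_ONLY = "serial only"
    BOOT_FLASH = "boot SPI flash"
    BOOT_SD = "boot SD card"
    SERIAL_THEN_FLASH = "serial window, then SPI flash"
    SERIAL_THEN_SD = "serial window, then SD card"


def boot_decision(p61_pullup, p60_pullup, flash_responds, flash_data_ok, sd_data_ok):
    """Model of the proposed flow; each argument is the result of one check."""
    if not p61_pullup:
        return Action.SERIAL_ONLY      # no memory indicated: only ONE pin was checked
    if flash_responds and flash_data_ok:
        source = "flash"               # SPI chip answered and its $100 longs are legit
    elif sd_data_ok:
        source = "sd"                  # retry the same 4 pins as an SD card
    else:
        return Action.SERIAL_ONLY      # memory present but nothing bootable
    if p60_pullup:                     # 'skip serial wait if boot data okay'
        return Action.BOOT_FLASH if source == "flash" else Action.BOOT_SD
    return Action.SERIAL_THEN_FLASH if source == "flash" else Action.SERIAL_THEN_SD
```

Keeping the memory-present test to a single pin means a board with neither device only pays for one pin read before falling through to serial.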

What do you guys think about this? Is it possible to have a pin-out scheme for booting that supports both SPI flash and SD card, with static CS, CLK, DI, and DO pins?

Cluso99 · 2017-11-03 22:44

Hey guys,
Can we discuss the boot code/options etc over on the "P2 Boot Rom Decision trees" please, and leave this for the FPGA files and bugs. Thanks.

Rayman · 2017-11-05 16:15

USB seems to work on V26, at least low speed mouse works.

But I had to uncomment this line:

clkset  #$ff

Will the real silicon need this command too?
What frequency is it running without it?

cgracey · 2017-11-05 16:24

Rayman wrote: »
USB seems to work on V26, at least low speed mouse works.

But I had to uncomment this line:
clkset  #$ff
Will the real silicon need this command too?
What frequency is it running without it?

It runs at 20MHz, until you raise it.

evanh · 2017-11-11 06:52

cgracey wrote: »

Did you notice the downloading was snappier?

I just woke up to the fact the changes work with Dave's loadp2.

evanh · 2017-11-11 08:23

Chip,
In an effort to clarify the pipeline sequencing further, I've done a complementary timing diagram of the Cog instruction pipeline. I'd appreciate you casting an eye over it for functional accuracy, and maybe any additions/improvements you can think of.

------------------
instruction timing
------------------

clk         ___________             ___________             ___________             ___________             ___________             ___________
___________|           |___________|           |___________|           |___________|           |___________|           |___________|           |___________|


           |                       |                       |                       |                       |                       |                       |
  rdRAM Ib |-------+               |              rdRAM Ic |-------+               |              rdRAM Id |-------+               |              rdRAM Ie |
           |       |               |                       |       |               |                       |       |               |                       |
  latch Da |---+   +----> rdRAM Db |------------> latch Db |---+   +----> rdRAM Dc |------------> latch Dc |---+   +----> rdRAM Dd |------------> latch Dd |
  latch Sa |---+   +----> rdRAM Sb |------------> latch Sb |---+   +----> rdRAM Sc |------------> latch Sc |---+   +----> rdRAM Sd |------------> latch Sd |
  latch Ia |---+   +----> latch Ib |------------> latch Ib |---+   +----> latch Ic |------------> latch Ic |---+   +----> latch Id |------------> latch Id |
           |   |                   |                       |   |                   |                       |   |                   |                       |
           |   +------------------ALU-----------> wrRAM Ra |   +------------------ALU-----------> wrRAM Rb |   +------------------ALU-----------> wrRAM Rc |
           |                       |                       |                       |                       |                       |                       |
           |                       |  stall/done = 'gox'   |                       |  stall/done = 'gox'   |                       |  stall/done = 'gox'   |
           |         'get'         |        done = 'go'    |         'get'         |        done = 'go'    |         'get'         |        done = 'go'    |


clk         ___________             ___________             ___________             ___________             ___________             ___________
___________|           |___________|           |___________|           |___________|           |___________|           |___________|           |___________|

Program Counter(PC) in flux..................======c======...................................======d======...................................======e======..
Inst Fetch..=====b======....................................=====c======....................................=====d======....................................
Operand Decode..........====b=====......................................====c=====......................................====d=====..........................
S/D Fetch...........................=====b======....................................=====c======....................................=====d======............
Full Decode.....................................====b=====......................................====c=====......................................====d=====..
ALU Stage1..===========a==========..........................===========b==========..........................===========c==========..........................
ALU Stage2 (muxing).................===========a==========..........................===========b==========..........................===========c==========..
Result (write back).........................................====a====.......................................====b====.......................................

cgracey · 2017-11-11 09:32

Evanh, you've got things happening on both states of the clock. Things only change states when the clock rises.

evanh · 2017-11-11 10:12

What I'm showing there is the delay from the start of a RAM fetch to its completion, followed by the subsequent action that occurs within the same clock cycle.

For example, the instruction decoding, I believe, occurs in the same clock cycle as the instruction is fetched. I've shown these two parts as separate lines in the diagram to show their relative timing for functional purposes.

This gives insight into when changes happen vs when the clocking occurs.

evanh · 2017-11-11 10:17

You'll see there is a small timing gap before each rising clock edge. This signifies a small amount of slack time where the clock speed could be raised further.

evanh · 2017-11-11 10:21

PS: It's just a coincidence that anything is lining up with the falling clock edge.

cgracey · 2017-11-11 10:48

Okay, yes. That looks right.

evanh · 2017-11-11 11:15

Excellent. To me, there had been a slight lingering doubt about whether a named action was starting or completed at a particular clock edge. Your ALU dotted line was my best guide.

cgracey · 2017-11-11 13:28

I put a new v27 at the top of this thread.

This has the fast ROM loading and the protectable 16KB of RAM at the end of the hub ($FC000..$FFFFF).
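A quick check that the protectable range quoted above really is 16KB, plus a helper to test whether a hub byte address falls inside it. Only the two boundary addresses come from the post; `in_protected_region` is a hypothetical name.

```python
PROTECT_START = 0xFC000  # first byte of the protectable region (from the v27 notes)
PROTECT_END = 0xFFFFF    # last byte of the 1MB hub address space, inclusive


def in_protected_region(addr: int) -> bool:
    """True if a hub byte address falls inside $FC000..$FFFFF."""
    return PROTECT_START <= addr <= PROTECT_END


region_bytes = PROTECT_END - PROTECT_START + 1
print(hex(region_bytes), region_bytes // 1024, "KB")  # 0x4000 16 KB
```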

Also, the FPGA boards which contain SD card slots can now have their four SD SPI signals mapped into P[61:58]. This is to facilitate development of SD boot code to go into the final ROM.

Tomorrow I will try to get the documentation all caught up.

Rayman · 2017-11-11 17:03

Think you can squeeze in 512 kB + 16 kB of RAM?

Rayman · 2017-11-11 17:18

V27 works on my P123 with garryj's V19 USB code and low speed keyboard or mouse

jmg · 2017-11-11 18:39

Rayman wrote: »

Think you can squeeze in 512 kB + 16 kB of RAM?

Interesting idea.
Adding another address bit would slow the RAM down; OnSemi may know by exactly how much?
If it can add 16k, should that go to the top of the 1M map?

How does this affect FPGAs with smaller RAM offerings? Can they still emulate the top of the map?
(Or does the memory address simply wrap, so that the 16k appears in more than one place?)

Electrodude · 2017-11-11 18:43

Are the debug interrupt vectors still in the upper 16KB of RAM? if so, is there any way to change the debug interrupt vectors when the upper RAM protection is enabled?

If not, instead of fixing it somehow or moving them, you could just make them all jmp instructions that jump to an alternate location outside of the protected area, or something to that effect.

cgracey · 2017-11-11 19:33

I will ask OnSemi about making the hub SRAMs each 512 longs bigger.
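Some back-of-envelope arithmetic for this request. The number of hub SRAM instances is an assumption here: eight is the count at which "512 longs bigger" each adds up to the extra 16KB Rayman asked for. The second line connects to jmg's point: each SRAM then needs one more address bit.

```python
LONG_BYTES = 4
NUM_SRAMS = 8            # assumption: eight hub SRAM instances
BASE_BYTES = 512 * 1024  # current hub RAM
BASE_LONGS_EACH = BASE_BYTES // LONG_BYTES // NUM_SRAMS  # 16384 longs per SRAM
EXTRA_LONGS_EACH = 512

extra_bytes = NUM_SRAMS * EXTRA_LONGS_EACH * LONG_BYTES
total_bytes = BASE_BYTES + extra_bytes


def addr_bits(n_words: int) -> int:
    """Address bits needed to index n_words locations."""
    return (n_words - 1).bit_length()


print(extra_bytes // 1024, "KB extra,", total_bytes // 1024, "KB total")
print("per-SRAM address bits:", addr_bits(BASE_LONGS_EACH), "->",
      addr_bits(BASE_LONGS_EACH + EXTRA_LONGS_EACH))
```

Under these assumptions the result is 16 KB extra (528 KB total), with each SRAM going from 14 to 15 address bits.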

potatohead · 2017-11-11 19:34

Electrodude wrote: »

Are the debug interrupt vectors still in the upper 16KB of RAM? if so, is there any way to change the debug interrupt vectors when the upper RAM protection is enabled?

Probably not. The write inhibit should mask off that whole range. Doing anything else adds gates, latency, etc...

Choices would be:

One, you mentioned, which is just redirect them somewhere else.

Two, manage them in a specific routine, write enable, modify, write disable.
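Option two can be sketched with a toy model of hub RAM. It assumes, which the thread does not confirm, that software can switch the write protection off and on again. `HubRam` and `update_vector` are hypothetical, and writes into the protected top 16KB are simply dropped while protection is on.

```python
class HubRam:
    """Toy 1MB hub-RAM model with a write-protectable top region (hypothetical)."""
    PROTECT_START = 0xFC000

    def __init__(self, size: int = 0x100000):
        self.mem = bytearray(size)
        self.protected = False

    def write_byte(self, addr: int, value: int) -> bool:
        if self.protected and addr >= self.PROTECT_START:
            return False  # write inhibited; memory unchanged
        self.mem[addr] = value & 0xFF
        return True


def update_vector(ram: HubRam, addr: int, data: bytes) -> None:
    """Option two: write-enable, modify, write-disable (restores prior state)."""
    was_protected = ram.protected
    ram.protected = False
    try:
        for i, b in enumerate(data):
            ram.write_byte(addr + i, b)
    finally:
        ram.protected = was_protected
```

Option one avoids this entirely: if each vector is a jump to code outside the protected range, only that unprotected target ever needs changing.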

potatohead · 2017-11-11 19:35

cgracey wrote: »

I will ask OnSemi about making the hub SRAMs each 512 longs bigger.

Man, if this causes grief at all, the current arrangement is just fine.

jmg · 2017-11-11 21:24

cgracey wrote: »

I will ask OnSemi about making the hub SRAMs each 512 longs bigger.

You probably should word that 'at least 512 longs'... - in case they can fit > 512K in the final synth.
It's quite good to plan to go slightly above 512k, as then you can test the decode of that extra memory area.

Electrodude wrote: »

Are the debug interrupt vectors still in the upper 16KB of RAM? if so, is there any way to change the debug interrupt vectors when the upper RAM protection is enabled?

They could go just under the ROM? In that case, adding e.g. 1024 longs could fit both the interrupt vectors and a 'ROM' copy.
Other MCUs map interrupts to 00H, so they at least do not move on variant memory models.
If that's not practical, P2 could map them to the top of 1M? (so they are also fixed across variants)

ozpropdev · 2017-11-12 00:05

Chip
Re: V27 images

BeMicro CV-A9 image is still non responsive.

Both Nano images ID OK from PNut, but the debug interrupt doesn't seem to work. (Vector locked??)

All other images work OK (P123-A7, P123-A9, BeMicro CV-A2 and DE2-115).

cgracey · 2017-11-12 00:07

ozpropdev wrote: »

Chip
Re: V27 images

BeMicro CV-A9 image is still non responsive.

Both Nano images ID OK from PNut, but the debug interrupt doesn't seem to work. (Vector locked??)

All other images work OK (P123-A7, P123-A9, BeMicro CV-A2 and DE2-115).

Are you saying that the debug interrupt doesn't work only on the DE0-Nano images? Hmmm... I never invoked the write-protect. The DE0-Nano does have the distinction of having only 32KB of hub RAM. Hey! That means it only has 16KB at the bottom of hub RAM ($00000..$03FFF). Is that enough for your code?
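A sketch of what Chip describes for the DE0-Nano's 32KB: one 16KB half appears at $00000..$03FFF and the other at the top of the map ($FC000..$FFFFF). The mapping of the top window to the second physical half is my assumption, and so is treating everything in between as unmapped (`None`); the real FPGA decode may alias instead. `nano_phys` is a hypothetical name.

```python
HALF = 0x4000          # 16KB
BOTTOM_END = 0x03FFF   # last byte of the bottom window
TOP_START = 0xFC000    # first byte of the top (protectable) window
TOP_END = 0xFFFFF


def nano_phys(addr: int):
    """Map a hub address to an offset in a 32KB RAM, or None if unmapped here."""
    if 0 <= addr <= BOTTOM_END:
        return addr                        # first physical 16KB
    if TOP_START <= addr <= TOP_END:
        return HALF + (addr - TOP_START)   # second physical 16KB
    return None                            # behaviour in between is not specified
```

So on this board, code and data that should stay writable must fit in the bottom 16KB.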
