TAQOZ - Tachyon Forth for the P2 BOOT ROM

Cluso99 · 2018-01-27 12:30

Peter, IIRC Chip's boot code uses smart pins to read the SPI flash.

The code to bit-bang the SPI for SD is tiny compared to the rest of the code, which is just common routines that call the bit-banging routines.

Quartus is capable of loading a data file for pre-configuring the ROM (top hub RAM). I am not aware of a way that it can be loaded after the Verilog is compiled though.

Perhaps we need a compiled P2 version where the boot code does just that (either loads 16KB from Flash or SD) and then continues the proper boot sequence. I can knock out the code to load 16KB from an SD file if that suits.

Cluso99 · 2018-01-27 12:36

I have a few concerns regarding the clock delays in reading and writing directly to the pins (i.e. like the old INA/OUTA on P1). We have quite a few extra clock delays in P2. They don't occur within the instruction operation any more. BTW, I realise these delays have been documented.

Peter Jakacki · 2018-01-27 13:11

Cluso99 wrote: »

Peter, IIRC Chip's boot code uses smart pins to read the SPI flash.

The code to bit-bang the SPI for SD is tiny compared to the rest of the code, which is just common routines that call the bit-banging routines.

Quartus is capable of loading a data file for pre-configuring the ROM (top hub RAM). I am not aware of a way that it can be loaded after the Verilog is compiled though.

Perhaps we need a compiled P2 version where the boot code does just that (either loads 16KB from Flash or SD) and then continues the proper boot sequence. I can knock out the code to load 16KB from an SD file if that suits.

No, Chip's code just bit-bangs the SPI; the only place he uses smart pins is with the serial port. I will have to compile and blend in with the ROM area for a test though. What I'd like is to have my code work as if it is ROM.

Are you able to put your code into Flash? I haven't tried that yet either.

evanh · 2018-01-27 13:14

Cluso,
The state change lags (which don't slow down the data rate) don't really get in the way, because the Smartpins do the required serial buffering, so all the software has to worry about is grabbing/sending data when the relevant Smartpin becomes ready.

In a SPI master mode, managing a bit-bashed chip-select in conjunction with a Smartpin shifter may have a bearing but, given the slack that seems to occur around extra trailing SPI clocks, I doubt anyone will notice any issues.

I haven't tried to think about SPI slave mode.

David Betz · 2018-01-27 15:29

msrobots wrote: »

As far as I understood current booting is:

1) serial
2) flash
3) boot from SD with Cluso99's code
--3a) MBR boot
--3b) named File in root dir
4) if all fails jump into TAQOZ
--4a) TAQOZ start script from file from SD

It makes no sense to me that we have two different ways of booting from SD in the ROM. Seems like a waste of space. Why not do this:

1) serial
2) flash
3) jump to TAQOZ
--3a) TAQOZ attempts to boot from SD
--3b) If that fails, enter the interactive TAQOZ

evanh · 2018-01-27 15:43

#3 is really the pre-TAQOZ methods. So, combining them makes perfect sense. It'd then be:

3) jump to TAQOZ, which reads SD block #1 and selects one of the following:
--3a) MBR boot
--3b) named File in root dir
--3c) TAQOZ script
--3d) If that fails, enter the interactive TAQOZ

Dave Hein · 2018-01-27 15:46

evanh wrote: »

DaveH,
The mask ROM can't execute in place. It has to be copied into RAM first.

Also, Chip has arranged the last 16 KB of HubRAM to be separated from the rest. It sits at the end of the 20-bit address space (right under 1 MB). This is intended for the debug routines and any OS-like facilities. Peter has just indicated this space could be where TAQOZ sits, but it doesn't as yet.

So on boot up only the loader is loaded into hub RAM initially, correct? That is, TAQOZ is not read into hub RAM unless serial/flash/SD fails. Is that correct? So if I want to use the last 16KB of hub RAM for my own OS, there is no concern that TAQOZ will overwrite it on boot up. Is that all correct?

evanh · 2018-01-27 15:48

Dave Hein wrote: »

So on boot up only the loader is loaded into hub RAM initially, correct? That is, TAQOZ is not read into hub RAM unless serial/flash/SD fails. Is that correct? So if I want to use the last 16KB of hub RAM for my own OS, there is no concern that TAQOZ will overwrite it on boot up. Is that all correct?

Yep, sounds right to me.

Roy Eltham · 2018-01-27 18:19

I think the entire ROM will be read into the upper block at chip boot. The very first power on logic just runs a tiny hard coded bit of code to read the ROM into hubram and then start it.
Then that ROM code will do the sequence of steps of trying serial/flash/etc. That ROM will have Chip's code, Cluso's code, and Peter's code. It will be there for you to use if you want, or to ignore/overwrite/whatever.

If your code loads from serial/flash (which will be before any SD loading or TAQOZ), then you can do whatever you want and you will be fully in control. It's only when you don't feed it code from serial or have code in flash that it will fall back to trying SD card and/or TAQOZ.

Dave Hein · 2018-01-27 19:19

I assume a soft reset will perform like a power on boot. That means that the entire ROM will be read into hub RAM on soft reset. So an OS would not be able to store data in the upper block that would persist between soft resets. Fortunately, the P2 will have lots of hub RAM.

Cluso99 · 2018-01-27 21:12

Peter Jakacki wrote: »

Cluso99 wrote: »

Peter, IIRC Chip's boot code uses smart pins to read the SPI flash.

The code to bit-bang the SPI for SD is tiny compared to the rest of the code, which is just common routines that call the bit-banging routines.

Quartus is capable of loading a data file for pre-configuring the ROM (top hub RAM). I am not aware of a way that it can be loaded after the Verilog is compiled though.

Perhaps we need a compiled P2 version where the boot code does just that (either loads 16KB from Flash or SD) and then continues the proper boot sequence. I can knock out the code to load 16KB from an SD file if that suits.

No, Chip's code just bit-bangs the SPI; the only place he uses smart pins is with the serial port. I will have to compile and blend in with the ROM area for a test though. What I'd like is to have my code work as if it is ROM. Are you able to put your code into Flash? I haven't tried that yet either.

Sorry Peter. I just recalled seeing use of smart pins in Chip's code, but yes, it was the serial and not SPI.

I haven't tried the flash either. Agreed we need an image where we can download (by some means) the 16KB top Hub RAM and then execute as if a clean boot from there. Perhaps when Chip finishes this round with OnSemi we can discuss this with him.

What would be easiest for your testing? Would loading a 16KB file from SD into top Hub and a new clean boot work for you? That's probably the easiest for me to do.

Cluso99 · 2018-01-27 21:23

evanh wrote: »

Cluso,
The state change lags (which don't slow down the data rate) don't really get in the way, because the Smartpins do the required serial buffering, so all the software has to worry about is grabbing/sending data when the relevant Smartpin becomes ready.

In a SPI master mode, managing a bit-bashed chip-select in conjunction with a Smartpin shifter may have a bearing but, given the slack that seems to occur around extra trailing SPI clocks, I doubt anyone will notice any issues.

I haven't tried to think about SPI slave mode.

Evan,
I think of the case for bit banging the NRZI in USB FS. Here, the reply is required within a few clocks (16 comes to mind). A delay here of some extra 4 clocks (above the instruction) and a further 3 in outputting are significant!!! (these are delays from what I recall - I didn't check). There is also code being executed too, adding to delays. We use the waitpxx to get granularity of 1 clock, which is going to be 1+4 now.

Not only is this going to be a trap for coders, but there are likely timing implications. Just saying.

Smart pins are not going to be a solution for bit banging. And bit banging is done a lot, not just on the Prop, but on other micros too. That is one of the biggest complaints about ARM - you cannot do things deterministically.

There are plenty of other bit banging we did with the P1 where I am sure the delays would also matter.
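
The clock budget concern above can be put into numbers with a small Python sketch. It is only arithmetic on the figures recalled in this post (a 16-clock reply window, roughly 4 extra clocks on input and 3 on output); none of these values have been checked against the P2 documentation, and the function name and parameters are hypothetical.

```python
def reply_budget(deadline_clocks, input_lag, output_lag, code_clocks=0):
    """Clocks left for the reply after pin lags and executed code are subtracted.

    All arguments are in system clocks. The lag values are assumptions
    taken from recollection in the post, not verified hardware figures.
    """
    return deadline_clocks - (input_lag + output_lag + code_clocks)


# Figures as recalled in the post: 16-clock window, 4 clocks in, 3 clocks out.
remaining = reply_budget(16, input_lag=4, output_lag=3)

# Share of the window consumed by pin lag alone, before any code runs.
lag_share = (4 + 3) / 16

print(remaining, lag_share)
```

If those recalled numbers are right, the pin path alone would consume a little under half of the reply window before any instructions execute.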

Cluso99 · 2018-01-27 21:40

Roy Eltham wrote: »

I think the entire ROM will be read into the upper block at chip boot. The very first power on logic just runs a tiny hard coded bit of code to read the ROM into hubram and then start it.
Then that ROM code will do the sequence of steps of trying serial/flash/etc. That ROM will have Chip's code, Cluso's code, and Peter's code. It will be there for you to use if you want, or to ignore/overwrite/whatever.

If your code loads from serial/flash (which will be before any SD loading or TAQOZ), then you can do whatever you want and you will be fully in control. It's only when you don't feed it code from serial or have code in flash that it will fall back to trying SD card and/or TAQOZ.

Dave Hein wrote: »

I assume a soft reset will perform like a power on boot. That means that the entire ROM will be read into hub RAM on soft reset. So an OS would not be able to store data in the upper block that would persist between soft resets. Fortunately, the P2 will have lots of hub RAM.

Basically correct. The serial ROM will be loaded into top Hub RAM on power up. Then it will execute code from there that will...
* Check pullups
Then one of the following
* Download from serial into Hub $0 upwards
* Download from FLASH into Hub $0 upwards
* Run Cluso's SD boot code which will load into hub $0 upwards
* Run TAQOZ
In all cases, top 16KB Hub RAM will contain (and retain) the serial ROM code. This is also mapped to the upper 16KB block of the total HUB RAM xxKB beginning from $0.

The user can overwrite the top 16KB if so desired. The SD code loaded from the card would also be able to overwrite this 16 KB block.

As for clearing hub ram on soft/hard reset, that's up to Chip in hardware. Speak now if you have a preference. I believe the top 16KB will always be overwritten.
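
The fallback order described above can be modelled in a few lines of Python. This is a sketch of the decision logic only, not the ROM code: the loader functions, their return convention (image bytes or `None`) and the names are all hypothetical, and the pull-up check that precedes the sequence is left out.

```python
from typing import Callable, Optional, Sequence, Tuple

# A loader returns the boot image if its source responded, else None (assumed convention).
Loader = Callable[[], Optional[bytes]]


def select_boot_source(loaders: Sequence[Tuple[str, Loader]]) -> Tuple[str, Optional[bytes]]:
    """Try each source in order; fall through to TAQOZ when none yields an image."""
    for name, load in loaders:
        image = load()
        if image is not None:
            return name, image  # would be loaded into hub RAM from $0 upwards
    return "TAQOZ", None  # nothing loaded, so drop into the interactive Forth


def no_image() -> Optional[bytes]:
    return None  # stand-in for a source that did not respond


def sd_image() -> Optional[bytes]:
    return b"\x00" * 512  # stand-in for one sector read from an SD card


order = [("serial", no_image), ("flash", no_image), ("sd", sd_image)]
source, image = select_boot_source(order)
print(source)
```

In this model a successful serial or flash load ends the sequence early, which matches the point that user code loaded that way never reaches the SD or TAQOZ steps.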

David Betz · 2018-01-27 23:11

Cluso99 wrote: »

Basically correct. The serial ROM will be loaded into top Hub RAM on power up. Then it will execute code from there that will...
* Check pullups
Then one of the following
* Download from serial into Hub $0 upwards
* Download from FLASH into Hub $0 upwards
* Run Cluso's SD boot code which will load into hub $0 upwards
* Run TAQOZ

Can you explain why it's better to have Cluso's SD boot code make the first SD boot attempt rather than just starting TAQOZ and letting it try booting from SD before entering its interactive mode?

evanh · 2018-01-28 00:51

Cluso99 wrote: »

I think of the case for bit banging the NRZI in USB FS.

The lags only occur on Cog monitoring/controlling of a Smartpin's internal state. If you're bit-banging the real pins then it doesn't occur.

Smart pins are not going to be a solution for bit banging.

Bit-banging often will be combined with the serial shifters in particular. SPI chip-selects have to be bit-banged. Any RS-232/485 control lines will also be bit-banged.

Cluso99 · 2018-01-28 12:04

evanh wrote: »

Cluso99 wrote: »

I think of the case for bit banging the NRZI in USB FS.

The lags only occur on Cog monitoring/controlling of a Smartpin's internal state. If you're bit-banging the real pins then it doesn't occur.

Smart pins are not going to be a solution for bit banging.

Bit-banging often will be combined with the serial shifters in particular. SPI chip-selects have to be bit-banged. Any RS-232/485 control lines will also be bit-banged.

What!?!?
I am bit banging the SD SPI interface including the chip select. But I am not driving it at speed.

And, bit banging the real pins is what I am saying about the lag. There are flops in the path in and out to the real pins from/to the PASM instructions.

Cluso99 · 2018-01-28 12:06

David Betz wrote: »

Cluso99 wrote: »

Basically correct. The serial ROM will be loaded into top Hub RAM on power up. Then it will execute code from there that will...
* Check pullups
Then one of the following
* Download from serial into Hub $0 upwards
* Download from FLASH into Hub $0 upwards
* Run Cluso's SD boot code which will load into hub $0 upwards
* Run TAQOZ

Can you explain why it's better to have Cluso's SD boot code make the first SD boot attempt rather than just starting TAQOZ and letting it try booting from SD before entering its interactive mode?

All the blocks of code will be combined for the final rom.

David Betz · 2018-01-28 12:26

Cluso99 wrote: »

David Betz wrote: »

Cluso99 wrote: »

Basically correct. The serial ROM will be loaded into top Hub RAM on power up. Then it will execute code from there that will...
* Check pullups
Then one of the following
* Download from serial into Hub $0 upwards
* Download from FLASH into Hub $0 upwards
* Run Cluso's SD boot code which will load into hub $0 upwards
* Run TAQOZ

Can you explain why it's better to have Cluso's SD boot code make the first SD boot attempt rather than just starting TAQOZ and letting it try booting from SD before entering its interactive mode?

All the blocks of code will be combined for the final rom.

That is not an explanation of why two SD boot methods are needed. The ROM is quite small and having duplicate code may mean fewer TAQOZ features can be included. It seems like it would be best to maximize the ROM usage and that means eliminating duplication. Why not work with Peter to help him improve his SD code rather than provide an entirely different block of code that does the same thing as his SD functions?

Peter Jakacki · 2018-01-28 13:33

I am quite happy for Cluso99 to work on perfecting an SD boot method because even though I do have my own drivers, I like to compare that with an independent driver when the time comes and blend the two together to leverage the strengths of each. Of course we will work together closely on that at the time. My software SPI driver runs at 10MHz at present so we can expect 20MHz in the final silicon but obviously I need a way to test it at that speed or else limit it to 10MHz in the final silicon. But this is assuming we have a crystal etc so maybe I have to test this in RCFAST but surely if we have an SD card in the system, we would also expect a crystal. So many variables but this is where I will be relying upon the valuable help and experience from many in this forum. I've got as many questions as the next guy, but we need to be aware of pitfalls and also have possible solutions even more so than our focus on features. Chip did such a marvelous job in the past with not only getting P1 silicon right but also the whole Spin and boot ROM just as he is doing now with P2. We want to get our part right too.

David Betz · 2018-01-28 13:41

Peter Jakacki wrote: »

I am quite happy for Cluso99 to work on perfecting an SD boot method because even though I do have my own drivers, I like to compare that with an independent driver when the time comes and blend the two together to leverage the strengths of each. Of course we will work together closely on that at the time. My software SPI driver runs at 10MHz at present so we can expect 20MHz in the final silicon but obviously I need a way to test it at that speed or else limit it to 10MHz in the final silicon. But this is assuming we have a crystal etc so maybe I have to test this in RCFAST but surely if we have an SD card in the system, we would also expect a crystal. So many variables but this is where I will be relying upon the valuable help and experience from many in this forum. I've got as many questions as the next guy, but we need to be aware of pitfalls and also have possible solutions even more so than our focus on features. Chip did such a marvelous job in the past with not only getting P1 silicon right but also the whole Spin and boot ROM just as he is doing now with P2. We want to get our part right too.

That's good to hear. Then I guess the boot sequence that I suggested may be the final one once you have your SD code merged. The step of using Cluso's code is only temporary until the integration is done.

evanh · 2018-01-28 13:55

Cluso99 wrote: »

And, bit banging the real pins is what I am saying about the lag. There are flops in the path in and out to the real pins from/to the PASM instructions.

I didn't know that.

Ariba · 2018-01-28 15:06

I think the SD booter needs to run entirely in a cog, so that it can also fill the top 16kB of hub RAM from the SD card. Not sure if TAQOZ's FAT32 code can do that.

Cluso99 · 2018-01-28 21:03

My SD boot code is completely different from the usual SD boot code found in the various implementations out there. Previously I have used routines from a number of sources, including the latest one which Kye wrote. But here we want the code to be minimal and generic. Kye's was pedantic about getting everything validated, but that excluded some SD cards and corrupted ones.

My code uses the latest known SD info as well as allowing for some cards that do not implement SD correctly. It minimises the checks as it's only reading for specific info to find a "boot sector" to begin the load process. We don't want to be locked in against using future SD cards either. It should not be reliant on any crystal as that is unknown at boot time.

When the time comes Peter, Chip and I will work together to bring it all together. We've already discussed this offline.

evanh · 2018-01-28 22:02

Cluso,
I do remember now. You were asking about it not that long ago. I even spoke of the same effect with my testing of Smartpins.

Chip gave some details and I suspected it was a routing limitation of the FPGA and that the final silicon could probably do without the extra stages ...

Chip,
This may be worth putting in a lot of extra effort to verify. I'm not clear on the best way to test it though. Probably OnSemi would have to simulate both to find the timings.

cgracey · 2018-01-28 22:16

What do we need to look into, again?

evanh · 2018-01-28 22:41

Chip,
Here's your posting on the subject - http://forums.parallax.com/discussion/comment/1426175/#Comment_1426175

I'm very late for work! I'll check in later ...

Yanomani · 2018-01-28 23:39

Chip

As for STREAMER-related transactions, it would also be good to know the delays involved when exercising its several options for controlling pin states, or getting their values, including the data path to/from the FIFO.

Based on the posting referred to above by evanh, how do the several STREAMER WFBYTE/WORD/LONG operations and their corresponding RF???? counterparts compare with OUT, IN for D/S and IN for TESTP{N}?

Sometimes I have a hard time crafting my own explanatory timing charts while trying to understand the right choreography between SMART PINS and STREAMERS.

And, since the COG involved could also be grabbing some other state changes (PIN-related or SMARTPIN-related ones), it would be good to know all these relationships.

There are also cases where two COGs are expected to act synchronously, including times where both STREAMERs would be involved in complementary tasks.

The more we know about those timing relationships, the better we can use them, in advantage, to our purposes.

Henrique

evanh · 2018-01-29 03:19

The main focus would be the possibility of removing those two hub registrations.

It'd be cool to have the "IN for D/S" improved as well but I know that's another matter.

jmg · 2018-03-01 05:20

Peter Jakacki wrote: »

...
Now I am testing the transition mode and decided to break that up into three words, TRANS to set the mode, PW to set the pulse width (just cycles at present but nanoseconds probably in the manner 100 ns PW ), and PULSES which triggers the number of pulse transitions. The pulse/cycle mode actually counts out pulses so manually typing $48 WRPIN $8.000C WXPIN 4 WYPIN causes four 50ns pulses in a 150ns period (sweet!) and then typing 2 WYPIN outputs two of those pulses so obviously I will find a way to express that in an easier to use format.

As I explore these modes interactively the documentation starts to make more sense but I'm just wondering how much testing has been done on the smartpins. Certainly I will continue to check them out and build support into TAQOZ.

An easier-to-use format is a nice idea, but the 'raw mode' of $48 WRPIN $8.000C WXPIN 4 WYPIN is always going to be present, right?

Test coverage of smart pins has been modest - the broad functions are confirmed, but I'm not sure how much corner-case and packing testing has been done,
i.e. when you start continually pushing data, and hoping Streamer and Smart pins stay aligned.
I'm unsure how much 'gapless' continual tx/rx (or interrupts) testing has been done, but that likely needs PASM code, not byte-code engines, to test.
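
For anyone checking the raw values by hand, the $8.000C WXPIN figure quoted above can be decoded with a short Python sketch. It assumes the pulse/cycle mode takes the base period in clocks from X[15:0] and a compare value from X[31:16], with the output high while the down-counter is above the compare value; that is a reading of the smart pin documentation, not something confirmed in this thread. The 80 MHz system clock is also an assumption, chosen because it reproduces the quoted 50ns/150ns figures. The function name is hypothetical.

```python
def decode_pulse_x(x, sysclk_hz):
    """Return (period_ns, high_ns) for a pulse/cycle X value.

    Assumed layout: X[15:0] = base period in clocks, X[31:16] = compare value,
    output high for the clocks where the down-counter exceeds the compare value.
    """
    period_clocks = x & 0xFFFF
    compare = (x >> 16) & 0xFFFF
    high_clocks = max(period_clocks - compare, 0)
    ns_per_clock = 1e9 / sysclk_hz
    return period_clocks * ns_per_clock, high_clocks * ns_per_clock


# $8.000C from the quoted post, at an assumed 80 MHz system clock.
period_ns, high_ns = decode_pulse_x(0x0008000C, 80_000_000)
print(period_ns, high_ns)
```

Under those assumptions the value works out to a 12-clock period with 4 clocks high, which agrees with the 50ns pulses in a 150ns period that Peter reported.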

MJB · 2018-03-25 23:07

Peter Jakacki wrote: »

Peter, latest Tacos.spin2
this looks strange


AJMP			mov	PTRAos		' jump to address on top of the data stack
			jmp	#DROP

I would expect


AJMP			mov	PTRA,tos		' jump to address on top of the data stack
			jmp	#DROP
