P2 bootloader challenge

Peter Jakacki · 2015-11-23 01:51

While I am waiting for some initial bootloader code to be included in the P2 FPGA images I have been busy writing and testing Tachyon for the P2. The problem is that I have a kernel that gets loaded with PNut, and I then copy&paste multiple source code files into Tachyon to develop my final image. Now if I had the bootloader in ROM I could just have it load an image I have saved onto serial Flash or SD, but that got me thinking: why don't I just write a bootloader so that for the moment I can use PNut to load the bootloader, which will then load my image from Flash or SD? By doing so I will prove or disprove the methods that I advocate.

Now I throw the same challenge out to all those rather vocal forumistas to get off their "buts" and prove you can do it your way rather than saying "but" all the time. While Chip is taking a Sunday break why don't we try some methods and present the results for Chip to review this week? I know I won't have any problems at all with a bootloader and this at least solves my immediate problem to a small extent.

I use the same SPI routines for Flash and for SD, and they include block transfer functions too. To kick that part of it off, here are the basic SPI functions, although all my source code is out there in my Dropbox.

'********************** SPI READ/WRITE *********************

' SPI>BUF ( dst cnt -- )
SPI2BUF		mov	R1,tos
		wrfast	#0,tos1
.L0		call	#SPIRD
		wfbyte	tos
		djnz	r1,#.L0
		jmp	#DROP2


' SPIRD ( dummy -- dat )
SPIRD		rep	@.end,#8		' 8 bits
		xor  	outa,sck		' clock 
		xor	outa,sck
		test	ina,miso wc		' read data from card
		rcl	tos,#1			' shift in msb first
.end		ret

' BUF>SPI ( src cnt -- )
BUF2SPI		mov	R1,tos
		rdfast	#0,tos1
.L0		rfbyte	tos
		call	#SPIWR8
		djnz	r1,#.L0
		jmp	#DROP2
		

' SPIWR8 ( byte -- byte )
' Shift 8 bits from data[0..7] out and leave data on stack (restored with other bytes zeroed)
'
SPIWR8		shl	tos , #24		' left justify 8-bit data s	
'
' SPIWR ( data -- data<<8 ) 
'
SPIWR		rep	#3 , #8
		rol	tos,#1 wc		' output next msb
		muxc	outa,mosi
		xor	outa,sck		' clock 
		xor	outa,sck		' clock 
		ret

Here I dump memory from both devices which use these common SPI functions:

TF2$ 0 $40 SF DUMPL
00.0000: B412.ECFC B410.ECFC B40E.ECFC B40C.ECFC    ................
00.0010: B40A.ECFC B408.ECFC B406.ECFC B404.ECFC    ................
00.0020: 2802.ECFC B400.ECFC 207E.65FD 7000.00FF    (....... ~e.p...
00.0030: 003E.04F6 224A.00F6 024A.84F1 0048.04F6    .>.."J...J...H.. ok
TF2$ 0 $40 SD DUMPL
00.0000: 4C42.3250 0069.7A00 0000.4DB1 0000.0000    P2BL.zi..M......
00.0010: 0000.0000 0000.0000 0000.0000 0000.0000    ................
00.0020: 0000.0000 0000.0000 0000.0000 0000.0000    ................
00.0030: 0000.0000 0000.0000 0000.0000 0000.0000    ................ ok

I think I could program up a Prop hooked-up to the serial port so that it loads the bootloader automatically on reset to save me having to run PNut. But that is an extra step which would prove unnecessary if Chip starts implementing some of these bootloader functions.

Finally this is the reason I need a bootloader, look at my modules (you don't want to see the dictionary list).

TF2# Q
--------------------------------------------------------------------------------
CODE MEMORY  @ $00.82D7 for 29,399
NAME MEMORY  @ $00.ABBE for 11,709
DATA MEMORY  @ $00.E2DE for 329
FREE MEMORY  = 10,471

MODULES LOADED:
73DE: P2ASM.fth           Tachyon Forth inline assembler for the P2 151022-0000
63E3: EASYNET.fth         WIZNET NETWORK SERVERS 151111.0800
5773: W5500.fth           WIZNET W5500 driver for TF2 151110.1500
4839: EASYFILE.fth        FAT32 Virtual Memory Access File System Layer V1.1 for TF2 151119-1200
41B3: SDCARD.fth          P2 SD CARD Toolkit - 151119.1200
2A40: EXTEND.fth          TACHYON FORTH EXTENSIONS for the P2 - 151121-1400
MON 12:07:04
-------------------------------------------------------------------------------- ok
TF2#

What? you do want to see the dictionary listing?

(sorry, the forum said it was too big)

jmg · 2015-11-23 03:10

Minor point, but best nailed now, rather than overlooked later, is what is the P2 Boot clock in the final device ?
I think the SPI speed needs to be limited to no more than ~20MHz to cover the most common suspects.
You may want a NOP in the CLK pulses, to make it closer to 50% ?
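A rough way to check the duty-cycle point is to count instructions in the SPIRD loop. The sketch below is only a model, in Python rather than PASM so it runs anywhere. It assumes two system clocks per instruction, no loop overhead from REP, and SCK idling low so that the first XOR is the rising edge. The 80 MHz system clock is a placeholder value, not a figure from this thread, and `sck_timing` is a made-up helper name.

```python
CLOCKS_PER_INSTRUCTION = 2  # assumption: every instruction in the loop takes 2 clocks


def sck_timing(loop_instructions, high_instructions, sysclk_hz):
    """Return (duty_cycle, sck_hz) for a bit-banged SPI clock loop.

    loop_instructions: instructions executed per bit (the REP body)
    high_instructions: instructions from the rising XOR up to, but not
                       including, the falling XOR
    sysclk_hz:         system clock in Hz (placeholder value below)
    """
    period_clocks = loop_instructions * CLOCKS_PER_INSTRUCTION
    high_clocks = high_instructions * CLOCKS_PER_INSTRUCTION
    return high_clocks / period_clocks, sysclk_hz / period_clocks


SYSCLK_HZ = 80_000_000  # hypothetical FPGA clock, for illustration only

# Original SPIRD body: xor, xor, test, rcl  -> 4 instructions, SCK high for 1
print(sck_timing(4, 1, SYSCLK_HZ))
# With one NOP between the XORs: 5 instructions, SCK high for 2
print(sck_timing(5, 2, SYSCLK_HZ))
```

Under those assumptions the extra NOP widens the high pulse from a quarter of the bit period to four tenths of it, at the cost of a slower bit rate.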

Peter Jakacki · 2015-11-23 03:15

Don't worry about the final clock rate, we are checking methods that Chip can include in the FPGA image. Once functionality is established then adjusting for speed is a very minor technicality.

evanh · 2015-11-23 03:19

The block list is really just a tiny code extension to what you're already advocating - StartBlock plus BlockCount - Then repeat as necessary to cover possible fragmentation. There is plenty of spare space in the MBR to cover this with ease.

If the file happens to be a single fragment then there would be only a single entry in the block list. Just like yours.
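To make the idea concrete, here is a minimal sketch of walking such a block list, written in Python so it can be run anywhere. The (StartBlock, BlockCount) pairing comes from the description above; the field order, the example sector numbers, and the name `blocks_to_load` are all hypothetical, and nothing here says where in the MBR the list would live.

```python
from typing import Iterator, List, Tuple

# One entry per fragment: (start_block, block_count).
# The field order and widths are hypothetical.
Extent = Tuple[int, int]


def blocks_to_load(block_list: List[Extent]) -> Iterator[int]:
    """Yield every block number of the boot image, in load order."""
    for start_block, block_count in block_list:
        yield from range(start_block, start_block + block_count)


# A single-fragment file needs exactly one entry.
contiguous = [(8192, 58)]

# A fragmented file just needs more entries; the loader loop is unchanged.
fragmented = [(8192, 20), (9000, 30), (12000, 8)]

print(len(list(blocks_to_load(contiguous))))
print(len(list(blocks_to_load(fragmented))))
```

Both example lists describe the same number of blocks, so the loader does the same amount of reading either way; only the number of entries differs.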

Peter Jakacki · 2015-11-23 03:21

No "buts", just do it. I know it works already.

jmg · 2015-11-23 03:40

Peter Jakacki wrote: »

Don't worry about the final clock rate, we are checking methods that Chip can include in the FPGA image. Once functionality is established then adjusting for speed is a very minor technicality.

Sure it is, until it is overlooked in the rush to silicon.
Then that 'very minor technicality' becomes very costly indeed.
Smarter to get this stuff right from the beginning, not kick the can down the road and rely on some later checks to (hopefully) catch all the "Don't worry about's"

Peter Jakacki · 2015-11-23 03:51

No need to get caught up in trying to envision what the future holds at this point. Clock rate is a technicality that will be addressed at final P2 speeds anyway, and we all appreciate that, so there is no need to waste time talking about it now while there is nothing to address. So get into it.

Do, or do not, there is "no guessing" or naysay in this thread which is about actually putting your code where your mouth is.

potatohead · 2015-11-23 03:56

Well, you could always suggest some code edits to, say, link the transfer to a time interrupt or counter... jmg

Or maybe you've got some template, or reference SD code planned that we could use to start attempting various boot methods?

I agree at the moment. Unless Chip says otherwise, the overall plan is to get the smart pins done right now, and continue to refine the core design. We are likely to find a few more teething pain type things to get cleaned up.

Once it's starting to look like a chip, there will be some time to settle the ROM. Synthesis and some design checks took a while last time, and that was a great time to sort out the ROM. Bet it all goes about the same on this run.

Getting a "just boot for now" job done, means we can simulate a lot of stuff, and doing that is important.

@Peter: Holy Smile! Well, your Forth-Fu is beyond question at this point.

Peter needs a boot loader to continue on his path, and we all need to hook up some SD cards and try some of this stuff out.

See the video driver efforts?

Same deal. Those will need to get some parametric type improvement. In fact, I'm late on some sync code I want to get done for TV... Point is, we build some rough code, test, explore the features, and what will fall out of that is some nicely refined code that can handle clock speed changes, etc... In fact, sync of frequency changes for pixel clocks is one thing that fell out of the current body of, "let's put stuff on the display" code out there right now.

Way back on P1, it was the same way. Walk before you run. It won't hurt a bit to toss some temporary SD card code in there. It's all gonna get redone and sorted as the final image gets done.

jmg · 2015-11-23 03:56

Peter Jakacki wrote: »

Do, or do not, there is "no guessing" or naysay in this thread which is about actually putting your code where your mouth is.

So you claim - yet when offered a NOP line of code that fixes a real problem NOW, you reject it for the most nebulous of reasons ? - and generated far more words than simply fixing it would have.

potatohead · 2015-11-23 04:02

But it doesn't fix it. That NOP won't have much to do with production ready code intended for the ROM. That's a much higher clock, and the NOP will end up being something else. As it should be too.

jmg · 2015-11-23 04:04

Peter Jakacki wrote: »

No "buts", just do it.

Here you go, since you claim to want to 'just do it' - enjoy.

'********************** SPI READ/WRITE *********************

' SPI>BUF ( dst cnt -- )
SPI2BUF		mov	R1,tos
		wrfast	#0,tos1
.L0		call	#SPIRD
		wfbyte	tos
		djnz	r1,#.L0
		jmp	#DROP2


' SPIRD ( dummy -- dat )
SPIRD		rep	@.end,#8		' 8 bits
		xor  	outa,sck		' clock 
		nop 		 		' Fix clock Skew, improve Mhz tolerance  
		xor	outa,sck
		test	ina,miso wc		' read data from card
		rcl	tos,#1			' shift in msb first
.end		ret

' BUF>SPI ( src cnt -- )
BUF2SPI		mov	R1,tos
		rdfast	#0,tos1
.L0		rfbyte	tos
		call	#SPIWR8
		djnz	r1,#.L0
		jmp	#DROP2
		

' SPIWR8 ( byte -- byte )
' Shift 8 bits from data[0..7] out and leave data on stack (restored with other bytes zeroed)
'
SPIWR8		shl	tos , #24		' left justify 8-bit data s	
'
' SPIWR ( data -- data<<8 ) 
'
SPIWR		rep	#3 , #8
		rol	tos,#1 wc		' output next msb
		muxc	outa,mosi
		xor	outa,sck		' clock 
		nop 		 		' Fix clock Skew, improve Mhz tolerance  
		xor	outa,sck		' clock 
		ret


Peter Jakacki · 2015-11-23 04:20

Many of the forum members have gotten into the bad habit of talking so much that not only does nothing get done, they even talk down anyone else who wants to get things done. Is this what happened last time the P2-hot bootloader was discussed? Shame.

I don't have to prove that I can get things done because I've been busy doing it. I have no problems writing my own bootloader that works and I have even shown the code that is common to both serial Flash and SD.

PLEASE - WRITE CODE - MAKE IT WORK - WE WILL ALL BENEFIT

evanh · 2015-11-23 04:36

Chip only asked for a consensus. I thought it wouldn't be a problem advocating for following a well trodden protocol. Seems my diplomacy isn't up to scratch.

jmg · 2015-11-23 04:48

potatohead wrote: »

But it doesn't fix it. That NOP won't have much to do with production ready code intended for the ROM. That's a much higher clock, and the NOP will end up being something else. As it should be too.

?? Err nope.
The Boot clock will be much lower than the PLL ability (just as it is now, in P1), but you still need to avoid too-narrow clock pulses being the limiting factor.

The NOP fixes a clock skew issue, but you are welcome to provide a smaller means to fix the clock skew.

Peter is asking for code - I provided some.

You also seem to think the final ROM code will be radically different - why ?
If you get the details right now, there should be no reason for radically different ROM code.

potatohead · 2015-11-23 05:13

Do you need an FPGA board to write PASM on jmg?

Perhaps we all could be very enlightened with an example.

"there should be..."

Sure, and there is how it's actually being done. Two different things. And yes, that means the ROM code is yet to be seriously sorted out. We are running on a small stub for testing the core design, just like last time.

Now, I'm going to get back on some streamer tests and ideally get the VSYNC stuff I've had queued done.

@Peter, everyone: Is the SD card on the DE2 connected to the FPGA image?

Peter Jakacki · 2015-11-23 05:23

No, that isn't available, but I've got an SD card connected on the breadboard to some port pins at P25 etc. The final lines could be shared with the serial flash with just the chip selects separate. In fact the SPI code is also common to the WIZnet chip; I just change the mask registers.

jmg · 2015-11-23 05:23

potatohead wrote: »

Perhaps we all could be very enlightened with an example.

Example already given above, maybe you missed it ?

http://forums.parallax.com/discussion/comment/1355551/#Comment_1355551

potatohead · 2015-11-23 05:33

Ok, I'll cobble something together in the next day or two, thanks.

Cluso99 · 2015-11-23 07:22

jmg,
I suggest instead of arguing, you just go do it yourself. That way you can prove what you want works.

My bootloader is posted in the PropOS thread here
http://forums.parallax.com/discussion/138251/a-propeller-os-that-can-run-on-multiple-hardware/p1
It only requires converting to P2 code, which I will do sometime soon.
It works on the P1. I ship both commercial and hobbyist products using this bootloader.
And I am just about to release a new P1 board that includes an SD card with this bootloader.

jmg · 2015-11-23 07:43

Cluso99 wrote: »

jmg,
I suggest instead of arguing, you just go do it yourself. That way you can prove what you want works.

Arguing ? nope - I merely gave a (very) simple, real code suggestion improvement - exactly what Peter was claiming he wanted.

potatohead · 2015-11-23 08:05

Seriously jmg?

Ok, this is getting really out there in the weeds.

What Peter wants is for people to post up SD card boot code methods. A lot has been proposed. Chip is asking for specifics, and the best way to do that is to provide some code.

Now, say his code was modified to do the FAT file blocks, or some other scheme? That's something that is wanted.

Or, ignore his code and contribute a PASM routine to read boot files from an SD card somehow. That's wanted too.

There has been a lot of discussion on various methods, and maybe it's best if some code got written to explore the merits of those methods. That is what is wanted.

It's wanted to see how many instructions it might take, difficulty of implementation, etc... and it's wanted for others to test too. Does it work on their board, their SD card, format, etc.. ?

With these code bodies, we can make some better decisions. If they are small, or perhaps can be combined to use common routines, a nice SD boot scheme that makes a lot of people happy might not take all that many instructions or time.

cgracey · 2015-11-23 16:38

Guys, there are multiple clock delays between changing an OUTA bit and having a pin transition. There are also clock delays between a pin transitioning and INA reflecting the change. I will post these numbers shortly. I must look at the Verilog and figure it out.

This means that you can't write code that toggles a clock pin and immediately reads a data pin.

potatohead · 2015-11-23 17:10

Good to know.

Peter Jakacki · 2015-11-23 17:22

Whoa up there Chip, are you saying that reading a pin occurs before the output transition that preceded it? I didn't know the P2 was so fast that this was possible.

I've got this SPI code which I've been using to read and write reliably from serial Flash, SD cards, and WIZnet chips so I'm not sure what you are saying.

potatohead · 2015-11-23 17:26

Seems like your code is working, but there is some latency between it and the pins that is greater than expected...

I'm curious about it too.

Seairth · 2015-11-23 18:24

Chip,

Make sure to document it in your GDoc file instead of here. Otherwise, it will eventually get buried...

Cluso99 · 2015-11-23 22:43

cgracey wrote: »

Guys, there are multiple clock delays between changing an OUTA bit and having a pin transition. There are also clock delays between a pin transitioning and INA reflecting the change. I will post these numbers shortly. I must look at the Verilog and figure it out.

This means that you can't write code that toggles a clock pin and immediately reads a data pin.

OUCH! This is going to be a BIG source of problems for the unwary!

cgracey · 2015-11-24 01:21

I did a test:

' Prop123 / DE2-115 code

dat
		org

		mov	dira,#$FF
		mov	dirb,#$FF

		setb	outa,#0
		testb	ina,#0		wc
		setbc	outb,#0			'led off

		setb	outa,#1
		waitx	#0			'2 clocks
		testb	ina,#1		wc
		setbc	outb,#1			'led off

		setb	outa,#2
		waitx	#1			'3 clocks
		testb	ina,#2		wc
		setbc	outb,#2			'led on!

		setb	outa,#3
		waitx	#2			'4 clocks
		testb	ina,#3		wc
		setbc	outb,#3			'led on

		jmp	#$

It seems that three clocks are needed between changing OUTA/OUTB and seeing the effect on INA/INB. This means that four clocks would be safe for reading SPI parts, as three is cutting it maybe too close. That's a two-instruction delay needed between outputs and inputs.

This means for a fast synchronous serial input, clock outputs need to be staggered relative to inputs.
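The result can be restated as a tiny model, sketched here in Python purely as an illustration. It treats the OUT-to-IN path as a fixed three-clock latency taken from the FPGA test above (silicon may differ), counts a test with no WAITX as zero extra clocks (an assumption), and assumes two clocks per instruction. The function names are hypothetical.

```python
OUT_TO_IN_CLOCKS = 3        # from the FPGA test above; not a silicon figure
CLOCKS_PER_INSTRUCTION = 2  # assumption: each padding instruction costs 2 clocks


def ina_sees_change(gap_clocks, latency=OUT_TO_IN_CLOCKS):
    """True if INA already reflects an OUTA change made gap_clocks earlier."""
    return gap_clocks >= latency


def padding_instructions(safe_clocks,
                         clocks_per_instruction=CLOCKS_PER_INSTRUCTION):
    """Whole instructions needed to cover safe_clocks (ceiling division)."""
    return -(-safe_clocks // clocks_per_instruction)


# The four cases from the test: no wait, then WAITX #0, #1 and #2.
for gap in (0, 2, 3, 4):
    print(gap, "led on" if ina_sees_change(gap) else "led off")

# Four clocks is the suggested safe margin.
print(padding_instructions(4))
```

With a four-clock safe margin the model gives the same two-instruction delay quoted above.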

jmg · 2015-11-24 01:44

cgracey wrote: »

It seems that three clocks are needed between changing OUTA/OUTB and seeing the effect on INA/INB. This means that four clocks would be safe for reading SPI parts, as three is cutting it maybe too close. That's a two-instruction delay needed between outputs and inputs.

Do you have the Clk-op and tsu numbers, as that +1 SysCLK margin needs to be enough to satisfy the SPI Clk-op, and the P2 tsu values.

Best to get the details right early, and avoid thermal or process issues later.

I see a small 10MHz SPI EE device specs 40ns MAX Ck-op & Twh,Twl 40ns Max
A SPI SRAM specs 32ns in E-Temp version, which are both > 1 added SysCLK

cgracey wrote: »

This means for a fast synchronous serial input, clock outputs need to be staggered relative to inputs.

Hehe, you mean to say a NOP (or two) is needed, like this ?

' SPIRD ( dummy -- dat )
SPIRD		rep	@.end,#8		' 8 bits
		xor  	outa,sck		' clock (active edge)
		nop 		 		' Fix clock Skew, improve Mhz tolerance  & Tsu for Din
		xor	outa,sck		' cannot be active edge !
		test	ina,miso wc		' read data from card
		rcl	tos,#1			' shift in msb first
.end		ret

mindrobots · 2015-11-24 01:49

Is this an FPGA timing issue that will disappear with silicon and/or smart pins?

Peter Jakacki · 2015-11-24 01:59

Well this flies in the face of what I am achieving, I find that there is no need to insert nops even though I have tried various schemes early on for clock symmetry with all those nops etc. I've tested this over and over, are you sure your LED is not affecting the rise time, did you try it without any non-CMOS loads? Take into account that I am actively using these "raw" routines. I feel that this timing is a drive issue.

@jmg, you are stuck in a clock loop harping on about "your" nop, as if that is something only you could come up with.
