P2 bootloader challenge

Seairth · 2015-11-24 02:08

I'm not understanding the significance of Chip's test/demonstration. Why would you read INx for a pin that is set to output?

cgracey · 2015-11-24 02:10

It's got nothing to do with loading. There are clock delays in the signal chain. The silicon will be exactly the same.

Peter Jakacki · 2015-11-24 02:14

Well, when I clock SPI I've checked that the MISO data does indeed change on the falling edge. I sample it immediately, and the value I read is the same one I see on the digital-signal analyzer.

SPIRD		rep	@.end,#8		' 8 bits
		xor  	outa,sck		' clock 
		xor	outa,sck
		test	ina,miso wc		' read data from card
		rcl	tos,#1			' shift in msb first
.end

BTW, by loading I mean capacitive effects of the LED on rise/fall time.

cgracey · 2015-11-24 02:19

Seairth wrote: »

I'm not understanding the significance of Chip's test/demonstration. Why would you read INx for a pin that is set to output?

These delays matter when there is some tight output-to-input signalling.

I will write a fast SPI input routine tonight and post it here.

jmg · 2015-11-24 02:41

Peter Jakacki wrote: »

@jmg, you are stuck in a clock loop harping on about "your" nop, as if that is something only you could come up with.

Read my posts more carefully.
I said it was minor, and have never claimed it was unique, but I DID say details matter.

I am enjoying a wry smile here ...

evanh · 2015-11-24 02:42

Try this one out, Peter. I'd love to know if it gives a stable result. You could say I'm testing what Chip has said, and it does depend on the clock active edge being where your code says.

' SPIRD ( dummy -- dat )
SPIRD		xor  	outa,sck		' clock (active edge)
		nop 		 		' Fix clock skew, improve MHz tolerance & Tsu for Din
		xor	outa,sck		' cannot be active edge !

		rep	@.end,#7		' Remaining 7 bits

		xor  	outa,sck		' clock (active edge)
		test	ina,miso wc		' read data from card
		xor	outa,sck		' cannot be active edge !
		rcl	tos,#1			' shift in msb first
.end
		nop
		test	ina,miso wc		' read data from card
		rcl	tos,#1			' shift in final lsb
		ret

Peter Jakacki · 2015-11-24 02:42

Just going through my SPI read timing, and I think I remember how I got it to work without the delays. There was an issue, and it's all a matter of clock polarity and offset "tricks". I will document what I did so that I could clock without any delays. The clock high, btw, is 40ns and the clock cycle is 160ns, with data hold/valid at around 10ns.

jmg · 2015-11-24 02:49

evanh wrote: »

Try this one out, Peter. I'd love to know if it gives a stable result. You could say I'm testing what Chip has said, and it does depend on the clock active edge being where your code says.

That's nifty, slightly unrolled, & manages a 50% clock duty and a 4 opcode loop.
Relies on there being at least 2 clocks from CLK to Din.

Peter Jakacki · 2015-11-24 03:03

evanh wrote: »

Try this one out, Peter. I'd love to know if it gives a stable result. You could say I'm testing what Chip has said, and it does depend on the clock active edge being where your code says.

Thanks, I will have a look at it soon, but when I wrote this code I didn't know that you couldn't sample that fast, as it wasn't documented, so I just made it work!

cgracey · 2015-11-24 03:03

evanh wrote: »

Try this one out, Peter. I'd love to know if it gives a stable result. You could say I'm testing what Chip has said, and it does depend on the clock active edge being where your code says.

That's pretty much what I was envisioning, too.

Peter Jakacki · 2015-11-24 03:09

From a quick analysis of what I did, it seems that just before the SD card block read I inserted a single clock pulse, and then I don't need to worry about anything else for the next 512 bytes: just go for it, and 815us later the data is all buffered.

As for the WIZnet chip, well, that outputs the read data before the clock, so that just worked, and now I have to check the serial Flash. A lot of this stuff was done on the fly as I was still busy writing and testing various parts of Tachyon.

EDIT: I just inserted my sample in the middle of the clock pulse (which I have tried before), and that works fine due to the clock skew I use. The clock high/low is now 80ns/90ns.

jmg · 2015-11-24 04:13

mindrobots wrote: »

Is this an FPGA timing issue that will disappear with silicon and/or smart pins?

No, the inherent delays of N clocks remain, and the T CK-op & Tsu values may even get worse on 180nm process.

cgracey · 2015-11-24 06:09

Here is how you could clock SPI data out of a flash efficiently:

		rep	@.end,#8	'ready for 8 bits

		clrb	outb,clk	'clk low
		rcl	data,#1		'save din bit (1st pass like nop)
		setb	outb,clk	'clk high
		testb	inb,din  wc	'sample din (4 clocks since clk low)
.end
		rcl	data,#1		'save final din bit

Note there are 4 clocks between clk going low and din being sampled. This respects the Prop2 clock delays and the SPI device outputting after the falling clk edge.

Peter Jakacki · 2015-11-24 06:25

Thanks Chip, but if the carry isn't cleared on entry, the result can end up with b8 set, so you would need to mask the result or clear the carry first.

cgracey · 2015-11-24 06:32

Peter Jakacki wrote: »

Thanks Chip, but if the carry isn't cleared on entry, the result can end up with b8 set, so you would need to mask the result or clear the carry first.

Ah, yes. You would need 'and data,#$FF' after the last 'rcl' instruction.
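
To see why the mask is needed, here is a minimal Python model of the shift order in that loop. It is only a sketch: `rcl` and `read_byte_model` are hypothetical helper names, and the model assumes a 32-bit register and ignores all pin timing. The point is that the loop executes nine `rcl` instructions for eight samples, so whatever carry was present on entry lands in bit 8.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit register width


def rcl(value, carry):
    """Model of 'rcl value,#1': shift left by one and bring carry into bit 0."""
    return ((value << 1) | (carry & 1)) & MASK32


def read_byte_model(miso_bits, carry_in, data=0):
    """Follow the loop's order of operations: rcl first, then sample into carry."""
    carry = carry_in
    for bit in miso_bits:            # the 8 passes of the rep block
        data = rcl(data, carry)      # first pass shifts in the stale carry
        carry = bit                  # models the 'testb ... wc' sample
    return rcl(data, carry)          # the final rcl after the loop


raw = read_byte_model([1, 0, 1, 0, 0, 1, 0, 1], carry_in=1)
print(hex(raw), hex(raw & 0xFF))
```

With the carry set on entry, the unmasked result carries an extra bit above the byte; masking with $FF (or clearing the carry and `data` beforehand) leaves just the eight sampled bits.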

jmg · 2015-11-24 06:50

cgracey wrote: »

Note there are 4 clocks between clk going low and din being sampled. This respects the Prop2 clock delays and the SPI device outputting after the falling clk edge.

That's a little more compact, and it still has 50% CLK, but the slow-memory tolerance is not as good as the code a few posts up.

How much margin is in the pipeline, for adjacent opcode wr-rd cases ? I think 2 CLKs from your tests ?

cgracey · 2015-11-24 07:36

jmg wrote: »

cgracey wrote: »

Note there are 4 clocks between clk going low and din being sampled. This respects the Prop2 clock delays and the SPI device outputting after the falling clk edge.

That's a little more compact, and it still has 50% CLK, but the slow-memory tolerance is not as good as the code a few posts up.

How much margin is in the pipeline, for adjacent opcode wr-rd cases ? I think 2 CLKs from your tests ?

You're right. I didn't realize that about the slow-speed tolerance.

It took three clocks, at minimum, before pins echoed. 4 clocks is just practical given that instructions take 2 clocks each. That extra clock would give memories more time. I may not understand what you mean, though.

jmg · 2015-11-24 08:44

cgracey wrote: »

It took three clocks, at minimum, before pins echoed. ..

If 3 CLKs is visible and 2 is not, then there may be a remote chance that 2 becomes visible with extreme timing, but that still leaves 1 CLK of headroom, which is OK
(i.e. there is, by design, no risk of a shift in phase?)

cgracey wrote: »

That extra clock would give memories more time. I may not understand what you mean, though.

The parts I looked at would need more than 1 SysCLK of Tco tolerance, but in other respects they do meet timing (i.e. they would load with SysCLK = 50MHz, 50% duty, 8-clock loop).
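
The numbers in this exchange can be put into a rough budget. The sketch below is only arithmetic under stated assumptions: `spi_read_budget` is a hypothetical helper, the 8-clock loop comes from four 2-clock instructions, the sample is taken 4 clocks after clk goes low, and the 3-clock pin echo is treated as fixed internal delay, so whatever is left over is what the memory gets for its clock-to-output time. That last reading is an interpretation of the posts above, not a measured figure.

```python
def spi_read_budget(sysclk_hz, loop_clocks=8, sample_offset_clocks=4, pin_echo_clocks=3):
    """Rough timing budget for the 4-instruction read loop (all defaults are assumptions)."""
    period_ns = 1e9 / sysclk_hz
    return {
        "sysclk_period_ns": period_ns,
        "spi_clock_hz": sysclk_hz / loop_clocks,
        # clocks left for the external device after the internal pin delay
        "device_budget_ns": (sample_offset_clocks - pin_echo_clocks) * period_ns,
    }


budget = spi_read_budget(50_000_000)
for name, value in budget.items():
    print(name, value)
```

At SysCLK = 50MHz this leaves about one 20ns system clock for the device's Tco, which is why a part needing more than that would want a later sample point or an extra delay in the loop.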

LoopyByteloose · 2015-11-24 13:02

Well, this may be all wrong and useless, but there is an open-source, all-purpose bootloader called "Das U-Boot" that might be a useful alternative.

http://www.denx.de/wiki/U-Boot/

The Raspberry Pi, my Cubieboard1, and my MR-3020 minirouter all are capable of using 'Das U-boot'. So I've been wondering why not the Propeller 2?

Cluso99 · 2015-11-26 03:28

Here is Kye's PASM section of his FAT file system, with modifications that I have done for my PropOS. This can be simplified quite a bit.

jmg · 2015-11-26 04:03

Loopy Byteloose wrote: »

So I've been wondering why not the Propeller 2?

Maybe the simple issue of size ?
The numbers given around RAM and image sizes, suggest this is nowhere near a 16k ROM footprint.

potatohead · 2015-11-26 04:24

Yep. That level of functionality is way beyond the scope of the P2. Heck, there is a good, strong case for FAT itself being beyond the scope. We shall see.
