SPI boot code and new CALLPA/CALLPB instructions
cgracey
Here is the code from the booter that reads the SPI flash. It makes use of the new CALLPA/CALLPB instructions, which store D/# into PA/PB and call S/# (PA/PB used to be named ADRA/ADRB). They save an instruction when you need to pass a parameter to a subroutine.
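To make the saving concrete, here is a toy Python model (not P2 tooling) of what the post says CALLPA/CALLPB do: load PA or PB with the D operand and call the S target, all in one instruction. The `Cog` class, its instruction counter and `show_param` are hypothetical names for illustration only.

```python
# Toy model of CALLPA/CALLPB as described in the post: store D into PA/PB,
# then call S. Not an emulator; Cog and show_param are hypothetical.
MASK32 = 0xFFFF_FFFF


class Cog:
    def __init__(self):
        self.pa = 0
        self.pb = 0
        self.instructions = 0  # counts modeled instructions

    def mov_pa(self, value):
        self.pa = value & MASK32
        self.instructions += 1

    def call(self, sub):
        self.instructions += 1
        return sub(self)

    def callpa(self, value, sub):
        # One instruction: PA = D, then call S
        self.pa = value & MASK32
        self.instructions += 1
        return sub(self)

    def callpb(self, value, sub):
        # One instruction: PB = D, then call S
        self.pb = value & MASK32
        self.instructions += 1
        return sub(self)


def show_param(cog):
    """Hypothetical subroutine that just returns the parameter it was passed."""
    return cog.pa


old_style = Cog()
old_style.mov_pa(8)                           # mov pa,#8
old_result = old_style.call(show_param)       # call #sub

new_style = Cog()
new_result = new_style.callpa(8, show_param)  # callpa #8,#sub

print(old_style.instructions, new_style.instructions)  # 2 1
```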
Thanks to jmg and the others who discovered the quad-exit and reset commands that initialize various SPI flash chips, in case they're in a quad mode or waiting for a 300-second erase to complete when you need to boot.
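The recovery preamble in the booter listing can be summarized as a list of (bit count, value) frames, each sent MSB-first after CS is pulsed high then low. This Python sketch only restates that sequence as data; `PREAMBLE` and `frame_bits` are illustrative names, and the $66/$99 pair is the common reset-enable/reset sequence.

```python
# Recovery preamble from the booter listing, as (bit_count, value) frames.
PREAMBLE = [
    (2, 0b11),      # all 1's: exit-quad
    (8, 0xFF),      # all 1's: exit-quad
    (16, 0xFFFF),   # all 1's: exit-dual
    (8, 0x66),      # reset-enable
    (8, 0x99),      # reset; the booter then waits ~50us
    (8, 0x04),      # write-disable, clears WEL
    (8, 0x05),      # read-status, repeated while BUSY is set
]


def frame_bits(count, value):
    """Bits of one frame in transmit order (most-significant bit first)."""
    return [(value >> (count - 1 - i)) & 1 for i in range(count)]


for count, value in PREAMBLE:
    print(f"{count:2d} bits: " + "".join(map(str, frame_bits(count, value))))
```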
CON     spi_cs = 61     'SPI flash chip-select pin
        spi_ck = 60     'SPI flash clock pin
        spi_dq = 59     'SPI flash data I/O pin
DAT
'
'
' Load from SPI flash, if present
'
                callpa  #spi_cs,#check_pullup   'check for spi flash via spi_cs pull-up
        if_nc   jmp     #.nospi
                outh    #spi_cs                 'make spi_cs high
                dirh    #spi_cs                 'make spi_cs output
                dirh    #spi_ck                 'make spi_ck output
                neg     pb,#1                   'set command bits to all 1's
                callpa  #2,#spi_cmd             'send exit-quad command
                callpa  #8,#spi_cmd             'send exit-quad command
                callpa  #16,#spi_cmd            'send exit-dual command
                callpb  #$66,#spi_cmd8          'send reset command
                callpb  #$99,#spi_cmd8
                waitx   ##rc_max/20_000         'wait 50us
                callpb  #$04,#spi_cmd8          'send write-disable command to clear WEL
.wait           callpb  #$05,#spi_cmd8          'send read-status command
                call    #spi_in                 'get status
                testb   y,#1            wz      'if WEL high, no SPI flash
        if_nz   jmp     #.float
                testb   y,#0            wz      'if BUSY high, wait for erase/write to finish (testb, not test: 'test y,#0' always sets z)
        if_nz   jmp     #.wait
                mov     pa,#32                  'send read command
                callpb  #$03,#spi_cmd
                wrfast  #0,#0                   'load loader into $000..$3DF, HMAC signature into lut
                bmask   z,#9                    'ready to input $400 flash bytes
.data           call    #spi_in                 'get byte
                cmp     z,#$20          wc      'first $3E0 bytes are program, last $20 bytes are signature
        if_nc   wfbyte  y                       'store program byte into hub
        if_c    call    #enter_sig              'store signature byte
                djns    z,#.data                'loop for next byte
                call    #verify_sig             'verify loader signature
        if_z    setq2   #$F7                    'if loader verified, copy into lut
        if_z    rdlong  lut_loader,#0
        if_z    callpa  #spi_ck,#check_pullup   'if loader verified and pull-up on spi_ck, run it now
  if_z_and_c    jmp     #$200+lut_loader
        if_z    setb    mode,#spi_ok            'if loader verified, set flag and leave SPI enabled
.float  if_nz   dirl    #spi_cs                 'if no SPI flash or loader didn't verify, float SPI pins
        if_nz   dirl    #spi_ck
.nospi
'
'
' Check pin pull-up, c=1 if present
'
check_pullup    dirh    pa                      'drive low (out bit must be low)
                waitx   #20*1                   'wait ~1us
                dirl    pa                      'float
                waitx   #20*5                   'wait ~5us
                testin  pa              wc      'sample pin into c
                ret
'
'
' SPI command
'
spi_cmd8        mov     pa,#8
spi_cmd         rol     pb,#24
                outh    #spi_cs
                outl    #spi_cs
'
'
' SPI long/byte out
'
spi_out         dirh    #spi_dq                 'make data output
.out            rol     pb,#1           wc      'get bit to send
                outc    #spi_dq                 'set data to bit
                outh    #spi_ck                 'clock high
                cmp     pa,#2           wc      'last bit?
        if_c    dirl    #spi_dq                 'if last bit, make data input
                outl    #spi_ck                 'clock low
                djnz    pa,#.out                'loop to output bits
                ret
'
'
' SPI byte in
'
spi_in          rep     @.in,#8                 'ready to input a byte
                outh    #spi_ck                 'clock high
                outl    #spi_ck                 'clock low
                testin  #spi_dq         wc      'sample data bit ('testin' is from before 'outl')
                rcl     y,#1                    'save data bit
.in             ret
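For readers without P2 hardware, the shifting done by spi_cmd/spi_out and spi_in can be modeled in Python. This sketch assumes `rol pb,#1 wc` leaves the old bit 31 in C and `rcl y,#1` shifts C into bit 0; it models only bit order, not pin timing, and the function names just mirror the labels in the listing.

```python
MASK32 = 0xFFFF_FFFF


def rol32(x, n):
    """32-bit rotate left, like ROL."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def spi_cmd(pb, pa):
    """Model of spi_cmd falling into spi_out: the pa bits driven on spi_dq."""
    pb = rol32(pb, 24)           # rol pb,#24: command byte moves to bits 31..24
    bits = []
    for _ in range(pa):
        carry = (pb >> 31) & 1   # rol pb,#1 wc: old bit 31 goes to C
        pb = rol32(pb, 1)
        bits.append(carry)       # outc #spi_dq: drive C onto the data pin
    return bits


def spi_in(sampled_bits, y=0):
    """Model of spi_in: eight 'rcl y,#1' steps, first bit ends up as the MSB."""
    for bit in sampled_bits[:8]:
        y = ((y << 1) | bit) & MASK32
    return y


print(spi_cmd(0x66, 8))               # reset-enable, MSB first
print(hex(spi_in(spi_cmd(0xA5, 8))))  # loop a byte back through spi_in
```

With `pa = 32` and `pb = $03`, the same model sends $03 followed by a 24-bit address of zero, which is the read command used in the listing.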
Comments
That makes for nice code
Comments: Code looks very good.
Some minor details in the code:
* Is it worth doing a CheckBusy before the $66,$99, and skipping those if not busy?
* Preamble commands with no data should be CS-framed, ==\___/===, without a very narrow CS=H pulse.
The above seems to exit with CS=L, which is what a normal SPI command+data needs.
I think this needs two distinct routines: a Cmd_prefix (leaves CS=L) and a Cmd_frame (exits with CS=H).
Not sure of the most compact way to code that? Setting CS=H after every framed call would do, but it's not compact.
* Is spi_in tested? I'm not sure you have the right sample point: the last Address/Cmd =\_ will drive the first bit on DO, so I think you should sample before the next CLK =\_?
- Oh wait, I just saw the comment "'testin' is from before 'outl'". Still, even with that, the earlier placement gives a 50% CLK duty cycle.
What is the exact testin pipeline?
e.g., with a repeating outl/rcl/outh/testin, what is the phase of the actual pin clock =\_ and of the sample point?
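The two CS styles being compared can be sketched as timelines. This is a simplified Python model, assuming one slot per instruction and one slot per data bit: `prefix_framing` mirrors the booter's outh/outl pulse at the start of each command, while `framed` is a hypothetical variant that raises CS after each command and holds it for a chosen idle time.

```python
def prefix_framing(command_bits):
    """Booter style: CS pulses high then low at the start of every command."""
    timeline = []
    for nbits in command_bits:
        timeline.append(1)              # outh #spi_cs
        timeline.append(0)              # outl #spi_cs
        timeline.extend([0] * nbits)    # bits clocked with CS low
    return timeline                     # CS is left low after the last command


def framed(command_bits, idle_slots=4):
    """Hypothetical framed style: CS low only while a command is clocked."""
    timeline = []
    for nbits in command_bits:
        timeline.extend([0] * (1 + nbits))   # CS low, then the bits
        timeline.extend([1] * idle_slots)    # CS high between commands
    return timeline


def shortest_high_pulse(timeline):
    """Length, in slots, of the narrowest CS-high run."""
    runs, run = [], 0
    for level in timeline + [0]:   # trailing 0 closes a final high run
        if level:
            run += 1
        elif run:
            runs.append(run)
            run = 0
    return min(runs) if runs else 0


commands = [2, 8, 16, 8, 8]
print(shortest_high_pulse(prefix_framing(commands)), shortest_high_pulse(framed(commands)))
```

In the real booter, the very first CS-high lasts longer, since CS is set high during pin setup; the narrow one-instruction pulses are the ones between commands.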
About whether it's worth checking for busy before doing the $66,$99: if 50us is the only concern, I say let it be.
CSn stays low unless it gets floated because of some error. That way, the signed loader can pick right up, shifting more bits out. If CSn gets floated and there is a flash chip, its CSn pull-up will shut it off.
This is running fine on my FPGA.
You sure can deduce a lot, quickly!
Is Opc -> Pin the middle of the next opcode (+1), the end of the next opcode (+2), or something else?
Likewise, is testin sampling -1, -2, or some other number of clocks from the start of the testin opcode?
Maybe add a NOP between
outh #spi_cs
outl #spi_cs
and between
outh #spi_ck 'clock high
outl #spi_ck 'clock low
When I look at some SPI EEPROMs, they have slightly more modest timing - 100ns may be tight.
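A rough way to put numbers on this, assuming the RC clock is about 20 MHz (as the `waitx #20*1 'wait ~1us` comment implies) and that OUTH, OUTL and NOP each take 2 clocks. Both figures are assumptions for illustration, not measured values.

```python
# Rough pulse-width arithmetic; both constants are assumptions.
CLOCK_HZ = 20_000_000   # ~20 MHz RC clock, implied by 'waitx #20*1 'wait ~1us'
CLOCKS_PER_INSTR = 2    # OUTH/OUTL/NOP assumed to take 2 clocks each


def pulse_ns(instructions):
    """Approximate pulse width when `instructions` separate the two edges."""
    return instructions * CLOCKS_PER_INSTR * 1_000_000_000 / CLOCK_HZ


def waitx_us(clocks):
    """Duration of a WAITX of the given clock count, in microseconds."""
    return clocks * 1_000_000 / CLOCK_HZ


print(pulse_ns(1))                   # outh immediately followed by outl
print(pulse_ns(2))                   # with a NOP in between
print(waitx_us(CLOCK_HZ // 20_000))  # waitx ##rc_max/20_000, if rc_max == CLOCK_HZ
```

The `rc_max/20_000` wait comes out to 50us at any clock rate, as long as rc_max matches the actual frequency, whereas the CS and CLK pulse widths scale directly with the clock.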
Someone may choose a SPI EEPROM over SPI flash, because the EEPROM has longer life, and faster erase times.
OK, given the Macronix data, I guess this will somewhat auto-select.
e.g., not busy is 30-40us, and if it was doing a full erase, it is busy, and the max wait is then 100ms.
Great
FRAM are not as cheap, but if you really want to avoid any delay/busy effects, they are a good solution for small sizes.
In a 4-pin SPI connection, this has the minor effect of starting one byte early, so you can fix it by just relocating the image.
In a 3-pin connection, that's more of an issue, as you now have bus contention.
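The one-byte shift can be modeled: the booter clocks out $03 plus a 24-bit address, but a part with a 16-bit address starts driving data after $03 plus two address bytes, so its data byte 0 appears while the host is still sending the address LSB. In this hypothetical Python model, `host_read` returns what the host captures once it starts reading, assuming the part streams sequentially from address 0.

```python
def host_read(memory, nbytes, host_addr_bytes=3, device_addr_bytes=2):
    """Bytes the host captures after sending $03 plus host_addr_bytes of address.

    The device is assumed to output memory[0] in the byte slot right after its
    own (possibly shorter) address, then stream sequentially.
    """
    first_read_slot = 1 + host_addr_bytes     # slots 0..3: $03 + 24-bit address
    device_data_slot = 1 + device_addr_bytes  # slot where the device sends memory[0]
    offset = first_read_slot - device_data_slot
    return memory[offset:offset + nbytes]


image = [0x10, 0x20, 0x30, 0x40]
print(host_read([0xAA] + image, len(image)))  # image stored one byte later
```

Storing the image one byte later (behind a filler byte) lines it up again for a 16-bit-address part, while a 24-bit-address part gives an offset of zero.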
Very sneaky. I want to find this FRAM that would cause 3-pin contention.
That has the $FF quad-exit code and the $03 and $05 codes, with status xx000000b, and should ignore $04, $66, $99.
The DO pin on a SPI Flash chip is actually the output pin (MISO) and the DI on SPI Flash is the input pin (MOSI).
This is the classic P1 connection
So spi_do & spi_di are incorrectly commented.
Looks good Chip. Once I can publish the SD card code, you will see how SPI is used on SD.
Typically there is just one common send/receive routine for 8/16/32 bits. If you are reading then the output register is set to all 1's (by a neg dataout,#1). If you are writing then reading still takes place except the reply is just ignored. Makes for pretty simple routines.
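A minimal Python sketch of such a single send/receive routine, assuming full-duplex SPI where each clock shifts one bit out and one bit in. `transfer`, `read`, `write` and `FakeDevice` are hypothetical names, not taken from the SD code being described.

```python
MASK32 = 0xFFFF_FFFF


def transfer(tx, nbits, device):
    """Shift nbits of tx out MSB-first while shifting the reply in."""
    rx = 0
    for i in range(nbits):
        mosi = (tx >> (nbits - 1 - i)) & 1
        rx = (rx << 1) | device(mosi)
    return rx


def read(nbits, device):
    return transfer(MASK32, nbits, device)  # send all 1's, like 'neg dataout,#1'


def write(value, nbits, device):
    transfer(value, nbits, device)          # same routine; the reply is ignored


class FakeDevice:
    """Hypothetical device: records MOSI bits, replies from a byte list."""

    def __init__(self, reply_bytes):
        self.reply = [(b >> (7 - i)) & 1 for b in reply_bytes for i in range(8)]
        self.seen = []

    def __call__(self, mosi):
        self.seen.append(mosi)
        return self.reply.pop(0) if self.reply else 1  # idle line assumed high


dev = FakeDevice([0x5A])
print(hex(read(8, dev)), dev.seen)
```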
I will be posting any day now for lots of P1 testing with lots of SD cards.
Relocating the image by one byte should fix both cases, shouldn't it?
NXP only requires that compatible chips know the $FF and $03 commands...
This code will fail if chip doesn't recognize $05 command...
Every SPI flash chip supports these commands. $04 disables write, causing bit 1 of STATUS to go low. $05 reads STATUS. Bit 0 of STATUS is high when the device is busy erasing or programming. We need to wait through any erase/program in progress before the flash can be read.
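The wait-through-erase step can be sketched in Python. `read_status` stands in for sending $05 and reading one byte; BUSY is STATUS bit 0 as described. The booter loops with no limit, so `max_polls` is only a guard added for this sketch.

```python
BUSY = 1 << 0   # STATUS bit 0: erase/program in progress


def wait_until_ready(read_status, max_polls=100_000):
    """Poll read-status ($05) until BUSY clears; return the last STATUS byte."""
    for _ in range(max_polls):
        status = read_status()
        if not status & BUSY:
            return status
    raise TimeoutError("still busy after max_polls reads")


replies = iter([BUSY, BUSY, 0x00])   # simulated: busy twice, then ready
print(wait_until_ready(lambda: next(replies)))
```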
I'm going to verify that EEPROMs and FRAMs support these, too.
I looked at some SPI EEPROM and FRAM datasheets and they support these, too. The FRAM just returns 0 for BUSY, since it takes no time to reprogram.
You could just make the first two bytes of flash have to read "P2" or something as a check that only requires the read command...
I've thought about doing a quick read check, too. The thing is, it only takes about 14 milliseconds to load and verify the loader.
If so, I was suggesting that just checking the first two bytes with a read command for some constant value would also work...
Oh, I see what you mean. But there's the matter of even knowing whether the chip is currently unavailable for reading because an 'erase' or 'program' is in progress. That needs to be checked for first. By doing a write-disable, you are certain to get a 0-to-1 contrast between WEN and BUSY if the chip is, indeed, busy. No chip connected would return either 0,0 or 1,1 for WEN,BUSY. So the BUSY check is combined with a presence check, using WEN as contrast.
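The decision the booter makes from one STATUS byte, read right after write-disable, can be written as a small Python function. It follows the listing (WEL high means no flash; WEL low with BUSY high means a real chip mid-erase or mid-program); the return strings are hypothetical labels. A 0,0 reading cannot tell a ready chip from a data line that simply reads low, which is why the signature check still has the final say.

```python
def classify_status(status):
    """Classify the STATUS byte read right after write-disable ($04)."""
    wel = (status >> 1) & 1    # write-enable latch; a real chip reports 0 here
    busy = status & 1          # erase/program in progress
    if wel:
        return "no flash"                  # 1,x: float the SPI pins
    if busy:
        return "busy, poll again"          # 0,1: only a present, busy chip
    return "read, then verify signature"   # 0,0: ready chip, or a line reading low


for s in (0xFF, 0x00, 0x01, 0x02):
    print(f"${s:02X}: {classify_status(s)}")
```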
On the other hand, just cycling power would fix a reboot-while-writing-flash dilemma.
Most devices say something like "Updating firmware! Do not reboot until complete!"
I don't see writing to flash as something that's going on except maybe 0.01% of the time...
Exactly! All those commands are for extremely unlikely scenarios, but if they aren't there...
About cycling power, it is possible to do, but requires one pin, a logic-level P-FET, and a bleed resistor on the SPI memory's power pin. It would be better to solve this with just a few instructions, if we can cover over 99% of possible scenarios.
Certainly true. The FRAM and EEPROM look ok, as also does the Microchip SRAM.
Field testing is what is now needed, to prove the code and check for any unforeseen issues.
Regarding my CS-too-narrow concern, I think this line shuffle can work?
Present code of
changes to one of these - same code, different order to stretch CS=H
or
The WRITEDIS is done to ensure a known state in the WEN bit. Parts without this command should ignore it.
The problem there is a blank part looks rather like a not connected part.
Read-ID is the other command that could be used, as that always gives something other than $00/$FF, but ReadStatus is probably enough.
Even the SRAM parts have defined lower Status bits, that read 00, so they expect this sort of code.
Yes.
A purist could also clear the very first (redundant) byte, so that when it streams at the same time as ADR_LSB is output, both drive low. A new/blank part would have contention, but thereafter, contention current is much lower.
Only an FRAM with 16-bit address would benefit from this resistor, and as jmg pointed out, if you could get a $00 into the first data byte, there would be no contention thereafter.
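In the 3-pin case, the overlap can be counted per bit: while the host drives the address LSB (0 for address $000000), the FRAM drives its first data byte onto the same DQ line. This small Python helper is hypothetical; it ignores any series resistor and exactly when each side turns its driver on or off, but it shows why a $00 first byte removes the fight.

```python
def contention_bits(host_byte, device_byte):
    """Number of bit times where host and device drive opposite levels."""
    return bin((host_byte ^ device_byte) & 0xFF).count("1")


ADDRESS_LSB = 0x00   # the booter reads from address 0
for first_byte in (0xFF, 0xA5, 0x00):   # blank part, arbitrary data, cleared byte
    print(f"${first_byte:02X}: {contention_bits(ADDRESS_LSB, first_byte)} bits in contention")
```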
I was thinking about this matter that dMajo brought up about parts killing themselves by driving against opposing states (or power rails). I think that the foundry design rules actually prevent this from happening. For final output transistors, the gates and drains are long and doped to much higher impedance than normal silicide. This is anticipating what could, otherwise, become Mortal Combat. It also mitigates ESD damage. So, logic chips don't kill each other. If there were enough pins in contention, they could cause over-current on the power conductors, though. I once heard an Altera guy say that they had given some university the info needed to configure their FPGAs. Well, in their experimentation, they were getting internal bus signals in conflict and they WERE destroying devices. The transistors in contention were minimum-length (highest-current) and silicided to ~8 ohms/square. They COULD blow each other up! Not going to happen with I/O pins, though.
Good idea. I found that the 'spi_out' label was never even called, so we can do this:
Looks great. Ready for field testing?