With 'FLASH' on and both P59 switches off, you only have 100 ms to start serial comms after reset.
After that time, if a valid 'Prop' checksum is calculated from the 1024 bytes loaded from SPI FLASH, the code executes.
This will block you from starting TAQOZ from a terminal.
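To make the 'Prop' checksum concrete, here is a minimal host-side sketch in Python. It assumes the check is a plain 32-bit sum over the 1024 bytes read as 256 little-endian longs, and that the last long is the one adjusted to make the total come out right, which is how the loader's checksum routine later in this thread does it. The function names are made up for illustration.

```python
import struct

PROP = 0x706F7250  # the ASCII bytes "Prop" read as one little-endian long


def long_sum(block: bytes) -> int:
    """Sum a 1024-byte block as 256 little-endian longs, modulo 2**32."""
    if len(block) != 1024:
        raise ValueError("boot block must be exactly 1024 bytes")
    return sum(struct.unpack("<256L", block)) & 0xFFFFFFFF


def patch_checksum(block: bytes) -> bytes:
    """Return the block with its last long rewritten so the total is 'Prop'."""
    body = block[:1020] + bytes(4)           # treat the last long as zero
    last = (PROP - long_sum(body)) & 0xFFFFFFFF
    return block[:1020] + struct.pack("<L", last)


def is_bootable(block: bytes) -> bool:
    """True when the block would pass the assumed 'Prop' sum check."""
    return long_sum(block) == PROP
```

An erased or random first 1 KB fails this check, so under this assumption only a deliberately patched block gets executed after the 100 ms window.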
Try turning on P59 Up. I can make it program with both P59 Up and Down on together.
Thanks evanh! That worked while programming with Spin 2 GUI. I get the program loaded up and apparently running. P56 blinks, although it doesn't show anything on the terminal yet.
Next step is to turn P59 Up off again and press the reset button.
Strange, after pressing reset, with P59 Up already in the off position, P56 blinks no more. ...
At that point it has run and finished. If you didn't have the terminal monitoring already then you would have missed the report.
P56 blinking just indicates that programming of the flash is complete. To get it to run from the flash requires the reset. So, repeated resets will keep rerunning the program in flash.
So, how can I see the report? I'm programming with Spin 2 GUI and see nothing on the terminal window after I press the reset button, with the switches in the correct position.
My plan to use this to automatically load flash from SpinEdit isn't going so well...
I can't figure out why hardcoding the file size doesn't work...
In this test, I've just hard coded the file size and commented out the 3 lines that overwrite the file size. Doesn't work... Can't figure out why not...
It works for a very small program like "Larson scanner", but not this big one...
Never mind, figured it out...
Added this line to fix it.
rdlong byte_count,##size
Don't know why, but it works...
So, I take the fastspin binary of this and remove the last 32 bytes (fastspin seems to pad the end with 28 bytes of zero for some reason) to remove the file size. Then, I add the filesize. Then, add the program's binary. Add in those 28 zeros (just in case). Load this into ram and it programs the flash.
Now, I have Load Flash option in SpinEdit. Thanks ozpropdev!
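The splice described above could be sketched on the host side like this. It is only a reading of that description: it assumes the 32 bytes stripped from the loader are the 4-byte size long plus the 28 bytes of padding, and that the size is stored as a little-endian long. `build_flash_image` is a hypothetical name, not anything from SpinEdit.

```python
import struct

PAD = 28          # zero padding fastspin was seen to append
TAIL = 4 + PAD    # size long plus padding: the 32 bytes cut from the loader


def build_flash_image(loader: bytes, program: bytes) -> bytes:
    """Loader minus its tail, then size long, program binary and padding."""
    if len(loader) < TAIL:
        raise ValueError("loader binary is shorter than the tail to remove")
    size_long = struct.pack("<L", len(program))   # first long = size in bytes
    return loader[:-TAIL] + size_long + program + bytes(PAD)
```

The result is what gets loaded into RAM and run, so the loader sees the size long immediately followed by the program it has to copy.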
I am using your flash loader to load Catalina programs into Flash on the P2_EVAL - many thanks! But I have noticed that it sometimes does not program correctly on the first attempt, but generally seems to work on the second.
Have you seen anything like this in your own testing?
I see it predates the discovery about unreliable PLL mode switching. He's used the older simpler method that is known to randomly fail.
Hmmm. Still having problems. Can you point me to the correct way we are supposed to set the clock on boot? I thought I was doing it correctly, but perhaps I am not.
There was also a big discussion in another topic. Basic structure is to remember and reuse the prior mode config to cleanly switch back to RCFAST clock source before making any adjustment to PLL configuration.
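As a rough illustration of that structure, the sketch below lists the HUBSET values in the order the stage1 loader in this thread issues them. The bit positions and constants are copied from the `.clk_mode` and `con` lines of that listing and have not been checked against the silicon documentation; the function names are invented for the example.

```python
XTALFREQ = 20_000_000
XDIV, XMUL, XDIVP = 20, 160, 1
XOSC = 0b10   # 15pF crystal setting, as in the listing
XSEL = 0b11   # XI+PLL clock source


def pll_config(xdiv: int, xmul: int, xdivp: int, xosc: int) -> int:
    """Mode word with the clock-select bits left at %00 (RCFAST)."""
    pppp = ((xdivp >> 1) + 15) & 0xF          # 1->15, 2->0, 4->1, ...
    return (1 << 24) | ((xdiv - 1) << 18) | ((xmul - 1) << 8) | (pppp << 4) | (xosc << 2)


def hubset_sequence(config: int, xsel: int):
    """(value, note) pairs in the order they would be written with HUBSET."""
    return [
        (config, "start crystal and PLL, still running from RCFAST"),
        (config | xsel, "after the ramp-up wait, switch to the PLL clock"),
        (config, "same PLL config, select bits cleared: back on RCFAST"),
        (0, "only now shut the crystal and PLL down"),
    ]


def clock_freq(xtal: int, xdiv: int, xmul: int, xdivp: int) -> int:
    """Resulting sysclock in Hz."""
    return round(xtal / xdiv * xmul / xdivp)
```

The third step is the important one: the remembered configuration is reused with only the select bits cleared, so the clock source changes while the PLL settings stay untouched.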
Hi RossH
I haven't had any issues with the flash loader so far.
I've been using it for quite some time now with my micropython stuff.
My eval board does seem to behave slightly differently to others though.
Yes, it might not be the P2 itself. It could be the P2 EVAL board. Or the boot ROM. I have noticed some odd things previously - they generally seem to sort themselves out with enough power cycles, SD Card removal/re-insertions and/or reboots
Ross,
IIRC the current ROM boot code for SD can leave the DO pin from the SD card driven. This interferes with the Flash SPI such that it will not work. This has hopefully been fully corrected in the new ROM by forcing the SD card to release DO after each use/transaction.
Not sure if this is causing your problems.
That may be causing some odd problems I was having with some other code, even if it is not causing this particular problem.
For the current engineering samples, consider that if you access the SD, then Flash is a no-go, even after reset!!!
There was an old thread where this was discussed, as there is a sequence of clocks (96*8, if memory serves me correctly) to force the SD card to release the DO pin.
I've solved the loadp2 issue with needing to turn P59-UP on and off all the time. The pauses in the loading sequence were just too long, causing the 100 ms timeout to occur. So that sorts the revA Eval board.
RevB Eval board doesn't seem to program its Flash memory yet. The programming sequence completes but the reset doesn't boot. I'm about to look into this now ...
Lol, I've done a large amount of re-engineering your programmer/loader code over the last day, Brian. Along with fixing up loadp2 as well.
Only now did I decide to meter the electrical connections for continuity. Amazingly pin #1 (Chip Select) of the Flash chip was floating in air. The solder blob was only on top. Reheating it fixed it. Tested and works now.
Brian,
This is my current stage1 SPI read routine that is added to front of Flash chip program. Ignore the longer 32 bits.
Tell me if you can understand the comments describing the SPI clocking. The built-in compensation really makes a big difference with the ability to overclock it for higher bit rates. I probably should have just used the hardware resources. I might give that a whirl next.
read_byte4
                outh    #spi_clk                'OUT takes 4 sysclocks to present to the pin
                nop
                outl    #spi_clk                'tells SPI chip to clock the second bit out
                rep     @.loop, #31             'one bit per 8 sysclocks, plenty of leeway to accommodate poor slewing
                outh    #spi_clk
                testp   #spi_do         wc      'IN takes another 4 or 5 sysclock to present from the pin
                outl    #spi_clk                'SPI chip clocks out data on falling edge
                rcl     pa, #1
.loop
                nop
                testp   #spi_do         wc      'picks up data from OUTL seven instructions prior (14 sysclocks)
        _ret_   rcl     pa, #1                  ' the last OUTL is for first bit of next word, if any
Hi Evan
Had a quick look and I think I see what you're trying to do.
The original code works fine up to system clock <= 275MHz.
What speed are you getting now?
A variant of loader that uses the Hyperflash would be pretty slick.
Yet another thing to add to the forever growing TODO list
The problem there is an SPI/SD device is still needed for booting as well. If going down that path then probably wise to turn any HyperFlash parts into FAT filesystem storage. As opposed to HyperRAM being used as a large buffer.
Brian,
I've been back at this again. It dawned on me that smartpins doing rx behave differently enough from tx that it'd make a lot of sense to use smartpins. So I've added a bunch of init code and replaced that read4 routine above again ... for doing dual-SPI fast reads. Nicely suits booting the onboard SPI Flash chip.
{
Prop2 Flash loader
Version 1.2 17th January 2019 - ozpropdev
18 Oct 2019 Reengineered the programming bitbashing to resolve an issue that turned out to be a faulty board - Evan H
31 Oct 2019 Modified to use dual smartpins for block reads with DualSPI signalling
Writes user code (.obj) and loader into flash.
On P2-ES Eval board "FLASH" switch must be on.
"CODE" is stored in FLASH starting @ $1_0000
First long is code size in bytes.
See end of program for examples of how to include users .obj file.
}
con
        #58,spi_do,spi_di,spi_clk,spi_cs
        write_enable = $06
        block_unlock = $98
        block_erase_64k = $D8
        read_status = 5
        device_id = $ab
        enable_reset = $66
        device_reset = $99
        read_data = 3
        page_program = 2
        read_dual = $3b                         ' "Fast Read Dual Output" SPI command
'==============================================================================================
dat
                org
                drvh    #spi_cs
                drvl    #spi_clk
                drvl    #spi_di

'faster loading
                hubset  .clk_mode               'config crystal and PLL - still running RCFAST
                waitx   ##25_000_000/100        'wait for crystal/PLL to ramp up
                or      .clk_mode, #XSEL        'select clock mode
                hubset  .clk_mode               'engage

'compute checksum for SPI flash boot
                call    #checksum

'reset flash
                call    #chip_reset

'erase flash
                mov     addr, #0                'erase_stage1
                call    #erase_64k
                mov     addr, ##$1_0000         'erase_code
                mov     blocks, ##512 / 64
.loop
                call    #erase_64k
                add     addr, ##$1_0000
                djnz    blocks, #.loop

'copy stage1 loader
                call    #copy_stage1

'copy code to $1_0000
                mov     byte_count,##@code_end - @code
                loc     ptra,#@size
                wrlong  byte_count,ptra
                call    #copy_code

                hubset  ##%0001 << 28           'hard reset for reboot to Flash
                jmp     #$

.clk_mode       long    1<<24 + (XDIV-1)<<18 + (XMUL-1)<<8 + XPPPP<<4 + XOSC<<2
'------------------------------------------------
chip_reset
                call    #busy

'read device ID for scope to view
                mov     pa, #device_id
                outl    #spi_cs
                call    #send_byte
                call    #send_addr24            'dummy address
                call    #read_byte
                outh    #spi_cs
                mov     pb, #2                  '2 us pause in case was sleeping
                call    #pause_us

'do the reset
                callpa  #enable_reset, #send_command
                callpa  #device_reset, #send_command
                mov     pb, #50                 '50 us pause to let the internal reset occur
                call    #pause_us

'clear locks
                callpa  #write_enable, #send_command
                callpa  #block_unlock, #send_command
                jmp     #busy
'------------------------------------------------
erase_64k       callpa  #write_enable,#send_command
                mov     pa,#block_erase_64k
                outl    #spi_cs
                call    #send_byte
                call    #send_addr24
                outh    #spi_cs
                call    #busy
                ret

copy_stage1     mov     pages,#4
                mov     addr,#0
                loc     ptra,#@stage1
.loop2          callpa  #write_enable,#send_command
                mov     byte_count,#256
                outl    #spi_cs
                mov     pa,#page_program
                call    #send_byte
                call    #send_addr24
.loop           rdbyte  pa,ptra++
                call    #send_byte
                djnz    byte_count,#.loop
                outh    #spi_cs
                call    #busy
                add     addr,#256
                djnz    pages,#.loop2
                ret

copy_code       mov     pages,byte_count
                shr     pages,#8
                add     pages,#2
                mov     addr,##$1_0000
                loc     ptra,#@size
.loop2          callpa  #write_enable,#send_command
                mov     byte_count,#256
                outl    #spi_cs
                mov     pa,#page_program
                call    #send_byte
                call    #send_addr24
.loop           rdbyte  pa,ptra++
                call    #send_byte
                djnz    byte_count,#.loop
                outh    #spi_cs
                call    #busy
                add     addr,#256
                djnz    pages,#.loop2
                mov     pb, #2                  '2 us pause
                jmp     #pause_us
'------------------------------------------------
send_command
                outl    #spi_cs
                call    #send_byte
        _ret_   outh    #spi_cs
'------------------------------------------------
send_addr24
                getbyte pa, addr, #2
                call    #send_byte
                getbyte pa, addr, #1
                call    #send_byte
                getbyte pa, addr, #0
                jmp     #send_byte
'------------------------------------------------
send_byte
                shl     pa, #32-7       wc
                rep     @.loop, #8
                outc    #spi_di
                outh    #spi_clk
                shl     pa, #1          wc
                outl    #spi_clk
.loop
                ret                     wcz     'preserve C/Z flags
'------------------------------------------------
read_byte
                outh    #spi_clk
                rep     @.loop, #7
                outl    #spi_clk                'needs to be about 6 clocks early due to I/O buffering
                testp   #spi_do         wc      'read in bit prior to clock
                outh    #spi_clk
                rcl     val,#1
.loop
                outl    #spi_clk
                testp   #spi_do         wc      'read final bit
                rcl     val,#1
                ret                     wcz     'preserve C/Z flags
'------------------------------------------------
busy
                mov     pa, #read_status
                outl    #spi_cs
                call    #send_byte
                call    #read_byte
                outh    #spi_cs
                testb   val, #0         wc      'write in progress
        if_nc   ret                     wcz     'preserve C/Z flags
                jmp     #busy
'------------------------------------------------
checksum
                loc     ptra, #@stage1
                mov     pa, #0
                rep     @.loop, #256
                rdlong  pb, ptra++
                add     pa, pb
.loop
                subr    pa, ##$706F7250         '"Prop"
                wrlong  pa, ptra[-1]
                ret
'------------------------------------------------
pause_us
                rep     @.rend, pb
                waitx   #(CLOCKFREQ / 1_000_000)        'one microsecond - assumes a round number of MHz
.rend
                ret

blocks          long    0
count           long    0
addr            long    0
pages           long    0
xx              long    0
byte_count      long    0
val             long    0
'==============================================================================================
con
        XTALFREQ = 20_000_000   'PLL stage 0: crystal frequency
        XDIV = 20               'PLL stage 1: crystal divider (1..64)
        XMUL = 160              'PLL stage 2: crystal / div * mul (1..1024)
        XDIVP = 1               'PLL stage 3: crystal / div * mul / divp (1,2,4,6..30)
        XOSC = %10              ' OSC ' %00=OFF, %01=OSC, %10=15pF, %11=30pF
        XSEL = %11              ' XI+PLL ' %00=rcfast(20+MHz), %01=rcslow(~20KHz), %10=XI(5ms), %11=XI+PLL(10ms)
        XPPPP = ((XDIVP>>1) + 15) & $F  ' 1->15, 2->0, 4->1, 6->2...30->14
        CLOCKFREQ = round(float(XTALFREQ) / float(XDIV) * float(XMUL) / float(XDIVP))

        AF_PLUS1 = (%0001 << 28)
        AF_PLUS2 = (%0010 << 28)
        AF_PLUS3 = (%0011 << 28)
        BF_PLUS1 = (%0001 << 24)
        BF_PLUS2 = (%0010 << 24)
        BF_PLUS3 = (%0011 << 24)
        P_REGD = (%1 << 16)             ' turn on clocked digital I/O (registered pins)
        SP_OUT = (%1 << 6)              ' force on pin output when DIR operates smartpin
        SPM_PULSES = %00100_0 |SP_OUT   ' pulse/cycle output
        SPM_SSER_TX = %11100_0 |SP_OUT  ' sync serial transmit (A-data, B-clock)
        SPM_SSER_RX = %11101_0          ' sync serial receive (A-data, B-clock)

        DMADIV = 4              '160 MHz sysclock / 4 = 40 MHz SPI clock (with dual SPI makes 80 Mbit/s or 10 MB/s)
dat
                orgh    $400
                org
stage1
'config pin for SPI chip select
                drvh    #spi_cs
                drvl    #spi_clk
                drvl    #spi_di

'faster loading
                hubset  .clk_mode               'config crystal and PLL - still running RCFAST
                waitx   .pause                  'wait for crystal/PLL to ramp up
                or      .clk_mode, #XSEL        'select clock mode
                hubset  .clk_mode               'engage

'load code @$1_0000 to hub address 0
                mov     pb, ##$1_0000           'Flash address to load
                outl    #spi_cs
                callpa  #read_dual, #send_byte2
                getbyte pa, pb, #2              'send Flash reading address
                call    #send_byte2
                getbyte pa, pb, #1
                call    #send_byte2
                getbyte pa, pb, #0
                call    #send_byte2

'config one smartpin for SPI clock
                wrpin   #SPM_PULSES, #spi_clk
                dirl    #spi_clk                'SPI clock still driven low by the smartpin
                wxpin   ##((DMADIV/2)<<16) | DMADIV, #spi_clk   'pulse width (space->mark) and period respectively
                dirh    #spi_clk
                wypin   #8, #spi_clk            'pace out dummy clocks required by "Fast Read Dual Output"
                waitx   #50

'config two smartpins for SPI dual data
                fltl    #spi_do
                fltl    #spi_di
                wrpin   ##SPM_SSER_RX | BF_PLUS2, #spi_do
                wrpin   ##SPM_SSER_RX | BF_PLUS1, #spi_di
                wxpin   #15, #spi_do            '32 bits at a time
                wxpin   #15, #spi_di
                dirh    #spi_do
                dirh    #spi_di

'get length of binary data
                setse1  #(%001<<6)|spi_do
                wypin   #16, #spi_clk           '16 clock for first 32 bits containing binary length
                pollse1                         'clear prior event - needs a spacer instruction from SETSE1
                call    #read_byte4             'get the "size" value
                movbyts pa, #%%0123             'endian swap 24bit length in bytes
                add     pa, #3                  'round up
                shr     pa, #2                  'scale to longwords
                mov     .lcount, pa

'full-on continuous burst, right up to sysclock/2!
                wrfast  #0, #0                  'start FIFO at beginning of hubRAM
                shl     pa, #4                  'x16 clocks per longword
                wypin   pa, #spi_clk            'start clocking for the full length
.loop
                call    #read_byte4
                movbyts pa, #%%0123             'want as little-endian
                wflong  pa
                djnz    .lcount, #.loop

                outh    #spi_cs
                rdfast  #0, #0                  'flush the FIFO

'go back to RCFAST mode before handover
                andn    .clk_mode, #%11         'select RCFAST clock mode while retaining the old PLL config
                hubset  .clk_mode               'switch to RCFAST, critical reliability workaround for hardware bug
                hubset  #0                      'shutdown crystal and PLL
                waitx   .pause                  'wait for crystal shutdown, emulating hard reset conditions
                coginit #0, #0                  'kick it!

.clk_mode       long    1<<24 + (XDIV-1)<<18 + (XMUL-1)<<8 + XPPPP<<4 + XOSC<<2
.pause          long    25_000_000/100
.lcount         long    0
'------------------------------------------------
send_byte2
                shl     pa, #32-7       wc
                rep     @.loop,#8
                outc    #spi_di
                outh    #spi_clk
                shl     pa, #1          wc
                outl    #spi_clk
.loop
                ret
'------------------------------------------------
read_byte4
                waitse1                         'wait for smartpin (spi_do) buffer full event
                rdpin   pa, #spi_do             '16-bit shift-in as little-endian (odd bits)
                rdpin   pb, #spi_di             '(even bits)
                rev     pa                      'but SPI data is stored as big-endian (odd bits)
                rev     pb                      '(even bits)
                rolword pa, pb, #0              'combine to a single 32-bit word
        _ret_   mergew  pa                      'untangle the odd-even pattern
'------------------------------------------------
                fit     $100
                orgf    $100
'==============================================================================================
                orgh
size            long    0                       'located at Flash address $1_0000
code
'example code indicating programming succeeded
                drvh    #56                     'LED56 off
                drvl    #57                     'LED57 on
                rep     @.floop, #0             'loop forever toggling the LEDs
                outnot  #56
                outnot  #57
                waitx   ##(25_000_000/4)
.floop
'               file    "_P2 Invaders 2.0.52_eval.obj"
code_end
EDIT: Done a small tidy up. Added back in crystal clock setting for faster Flash programming. Had originally been removed when I had the unsoldered chip select pin on my revB board and I thought the issue must have been software.
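For anyone trying to picture what the dual-SPI read has to undo, here is a small Python model of the bit ordering only. It is not the smartpin implementation: it just assumes, as the comments in the listing say, that one data line carries the odd bits and the other the even bits, most significant pair first. The last helper mirrors the "round up to longs, 16 clocks per long" arithmetic in stage1. All names are invented for the example.

```python
def split_dual(data: bytes):
    """Split bytes into the two bit streams a dual-output read would present."""
    odd, even = [], []
    for byte in data:
        for shift in (6, 4, 2, 0):             # most significant pair first
            pair = (byte >> shift) & 0b11
            odd.append(pair >> 1)              # bits 7, 5, 3, 1
            even.append(pair & 1)              # bits 6, 4, 2, 0
    return odd, even


def merge_dual(odd, even) -> bytes:
    """Interleave the two streams back into bytes, four clocks per byte."""
    if len(odd) != len(even) or len(odd) % 4:
        raise ValueError("need equal-length streams in multiples of 4 bits")
    out = bytearray()
    for i in range(0, len(odd), 4):
        byte = 0
        for hi, lo in zip(odd[i:i + 4], even[i:i + 4]):
            byte = (byte << 2) | (hi << 1) | lo
        out.append(byte)
    return bytes(out)


def clocks_for_image(size_bytes: int) -> int:
    """SPI clocks needed: length rounded up to longs, 16 clocks per long."""
    longs = (size_bytes + 3) >> 2
    return longs << 4
```

Each clock moves two bits, so a 32-bit long takes 16 clocks and each data pin collects 16 bits of it, which is why the receive pins are set up for 16-bit words.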
Just done some experimenting with pin registering and found that sync serial smartpin mode surprisingly works better without. And then can even double the SPI clock rate by setting X[5] = 1 of the serial rx smartpins.
I can't visualise why but testing has definitely proved it. Tested a 300 kByte binary using SPI clock = sysclock/2 with sysclock from 4 MHz to 160 MHz on the revA Eval board with its long SPI tracks. So up to 80 MHz SPI clock (20 MBytes/s)! I wouldn't be surprised to see the SPI clock attenuated down to something like one volt.
EDIT: Ah, registering just the SPI clock pin does help a small amount. Err, or not, it fails the 4 MHz sysclock test. Hmm, that's not a good sign ...
EDIT2: Right, given that issue, I figure pin registering all round is a good idea. With both clock and data pins registered the revA Eval Board works up to 60 MHz SPI clock (120 MHz sysclock) and the revB Eval Board works up to 115 MHz SPI clock (230 MHz sysclock). PS: Room temperature of 21 °C.
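The figures quoted above follow from simple arithmetic, sketched here in Python so other sysclock settings can be tried. It assumes two data bits per SPI clock (dual SPI) and ignores command, address and dummy-clock overhead; `dual_spi_rate` is a made-up helper name.

```python
def dual_spi_rate(sysclock_hz: int, divider: int):
    """Return (SPI clock in Hz, bits per second, bytes per second)."""
    spi_clock = sysclock_hz // divider
    bits_per_s = spi_clock * 2        # dual SPI: two data lines per clock
    bytes_per_s = bits_per_s // 8
    return spi_clock, bits_per_s, bytes_per_s
```

For example, `dual_spi_rate(160_000_000, 4)` gives the 40 MHz / 80 Mbit/s / 10 MB/s case from the listing's DMADIV comment, and a divider of 2 at the same sysclock gives the 80 MHz / 20 MB/s case.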
Hmm, now I've successfully tested a faster config of:
- Falling edge SPI clock. Had always previously been configured as rising edge.
- Post-clock-edge data-in sampling (late sampling).
- Data pins registered, clock pin unregistered.
Works on revB Eval Board at 2 MHz sysclock (1 MHz SPI clock) at -10 °C. This was the critical test. It demonstrates that the setup is not easy to fool into an early sample.
I think the reason it works is because the unregistered SPI clock out has enough of a natural delay line. I'm not quite sure how the post-clock sampling actually works but it seems to still get in before the Flash chip has responded to the falling clock edge. The other three registered/unregistered combinations don't work in this setup.
Doh! The Flash programming routines fail at 300 MHz sysclock. Something else to fix ... hacked around ... Whoa! At 25°C, pulling 360 MHz sysclock (180 MHz SPI clock) now! A mere 35% above rating of the Flash chip. EDIT: Err, well, its rating is at 85 °C.
EDIT2: 360 MHz fell over around 30 °C. 340 MHz made it to 65 °C. 330 MHz got to 80 °C. 320 MHz got about 100 °C.
PS: Take those high measurements with some salt. I'm doing this in an open space with a cheap Smile hair dryer, so the gradients are getting large above 60 °C.
PPS: revA Eval Board reaches 200 MHz sysclock (100 MHz SPI clock) at 21 °C with this config.
EDIT3: Updated source code to de-glitch the transition from 1-bit to 2-bit SPI.
Comments
Kind regards, Samuel Lourenço
Also, don't need the P59 pull-down for anything...
Kind regards, Samuel Lourenço
Ross.
Ah! Thanks. I will amend the program.
But your flash loader is very useful - thanks!
Here are new data sheets I spotted, you could glance at, when doing SPI work ?
http://www.avalanche-technology.com/wp-content/uploads/2019/10/1Mb-16Mb-Serial-HP-MRAM.pdf
http://www.avalanche-technology.com/wp-content/uploads/2019/10/1Mb-16Mb-Serial-ULP-MRAM.pdf
P2 may even boot from these ?
Ultra low power one is spec'd at 10MHz which seems quite slow, but may be good enough to boot P2, in a low-power system.
These parts have useful other registers too
RDSR    05h    Read Status Register           1    10MHz or 54MHz
RDID    0Fh    Read Device ID                 4    10MHz or 54MHz
RUID    4Ch    Read Unique ID                 8    10MHz or 54MHz
RDSN    C3h    Read Serial Number Register    8    10MHz or 54MHz
PS: The FBGA package is clearly in need of becoming Hyperbus capable.
I've been back at this again. It dawned on me that smartpins doing rx behaves differently enough from tx that It'd make a lot of sense to use smartpins. So I've added a bunch of init code and replaced that read4 routine above again ... for doing dual-SPI fast reads. Nicely suits booting the onboard SPI Flash chip.
{ Prop2 Flash loader Version 1.2 17th January 2019 - ozpropdev 18 Oct 2019 Reengineered the programming bitbashing to resolve an issue that turned out to be a faulty board - Evan H 31 Oct 2019 Modified to use dual smartpins for block reads with DualSPI signalling Writes user code (.obj) and loader into flash. On P2-ES Eval board "FLASH" switch must be on. "CODE" is stored in FLASH starting @ $1_0000 First long is code size in bytes. See end of program for examples of how to include users .obj file. } con #58,spi_do,spi_di,spi_clk,spi_cs write_enable = $06 block_unlock = $98 block_erase_64k = $D8 read_status = 5 device_id = $ab enable_reset = $66 device_reset = $99 read_data = 3 page_program = 2 read_dual = $3b ' "Fast Read Dual Output" SPI command '============================================================================================== dat org drvh #spi_cs drvl #spi_clk drvl #spi_di 'faster loading hubset .clk_mode 'config crystal and PLL - still running RCFAST waitx ##25_000_000/100 'wait for crystal/PLL to ramp up or .clk_mode, #XSEL 'select clock mode hubset .clk_mode 'engage 'compute checksum for SPI flash boot call #checksum 'reset flash call #chip_reset 'erase flash mov addr, #0 'erase_stage1 call #erase_64k mov addr, ##$1_0000 'erase_code mov blocks, ##512 / 64 .loop call #erase_64k add addr, ##$1_0000 djnz blocks, #.loop 'copy stage1 loader call #copy_stage1 'copy code to $1_0000 mov byte_count,##@code_end - @code loc ptra,#@size wrlong byte_count,ptra call #copy_code hubset ##%0001 << 28 'hard reset for reboot to Flash jmp #$ .clk_mode long 1<<24 + (XDIV-1)<<18 + (XMUL-1)<<8 + XPPPP<<4 + XOSC<<2 '------------------------------------------------ chip_reset call #busy 'read device ID for scope to view mov pa, #device_id outl #spi_cs call #send_byte call #send_addr24 'dummy address call #read_byte outh #spi_cs mov pb, #2 '2 us pause in case was sleeping call #pause_us 'do the reset callpa #enable_reset, #send_command callpa #device_reset, #send_command 
mov pb, #50 '50 us pause to let the interal reset occur call #pause_us 'clear locks callpa #write_enable, #send_command callpa #block_unlock, #send_command jmp #busy '------------------------------------------------ erase_64k callpa #write_enable,#send_command mov pa,#block_erase_64k outl #spi_cs call #send_byte call #send_addr24 outh #spi_cs call #busy ret copy_stage1 mov pages,#4 mov addr,#0 loc ptra,#@stage1 .loop2 callpa #write_enable,#send_command mov byte_count,#256 outl #spi_cs mov pa,#page_program call #send_byte call #send_addr24 .loop rdbyte pa,ptra++ call #send_byte djnz byte_count,#.loop outh #spi_cs call #busy add addr,#256 djnz pages,#.loop2 ret copy_code mov pages,byte_count shr pages,#8 add pages,#2 mov addr,##$1_0000 loc ptra,#@size .loop2 callpa #write_enable,#send_command mov byte_count,#256 outl #spi_cs mov pa,#page_program call #send_byte call #send_addr24 .loop rdbyte pa,ptra++ call #send_byte djnz byte_count,#.loop outh #spi_cs call #busy add addr,#256 djnz pages,#.loop2 mov pb, #2 '2 us pause jmp #pause_us '------------------------------------------------ send_command outl #spi_cs call #send_byte _ret_ outh #spi_cs '------------------------------------------------ send_addr24 getbyte pa, addr, #2 call #send_byte getbyte pa, addr, #1 call #send_byte getbyte pa, addr, #0 jmp #send_byte '------------------------------------------------ send_byte shl pa, #32-7 wc rep @.loop, #8 outc #spi_di outh #spi_clk shl pa, #1 wc outl #spi_clk .loop ret wcz 'preserve C/Z flags '------------------------------------------------ read_byte outh #spi_clk rep @.loop, #7 outl #spi_clk 'needs to be about 6 clocks early due to I/O buffering testp #spi_do wc 'read in bit prior to clock outh #spi_clk rcl val,#1 .loop outl #spi_clk testp #spi_do wc 'read final bit rcl val,#1 ret wcz 'preserve C/Z flags '------------------------------------------------ busy mov pa, #read_status outl #spi_cs call #send_byte call #read_byte outh #spi_cs testb val, #0 wc 'write in progress 
if_nc ret wcz 'preserve C/Z flags jmp #busy '------------------------------------------------ checksum loc ptra, #@stage1 mov pa, #0 rep @.loop, #256 rdlong pb, ptra++ add pa, pb .loop subr pa, ##$706F7250 'Proo' wrlong pa, ptra[-1] ret '------------------------------------------------ pause_us rep @.rend, pb waitx #(CLOCKFREQ / 1_000_000) 'one microsecond - assumes a round number of MHz .rend ret blocks long 0 count long 0 addr long 0 pages long 0 xx long 0 byte_count long 0 val long 0 '============================================================================================== con XTALFREQ = 20_000_000 'PLL stage 0: crystal frequency XDIV = 20 'PLL stage 1: crystal divider (1..64) XMUL = 160 'PLL stage 2: crystal / div * mul (1..1024) XDIVP = 1 'PLL stage 3: crystal / div * mul / divp (1,2,4,6..30) XOSC = %10 ' OSC ' %00=OFF, %01=OSC, %10=15pF, %11=30pF XSEL = %11 ' XI+PLL ' %00=rcfast(20+MHz), %01=rcslow(~20KHz), %10=XI(5ms), %11=XI+PLL(10ms) XPPPP = ((XDIVP>>1) + 15) & $F ' 1->15, 2->0, 4->1, 6->2...30->14 CLOCKFREQ = round(float(XTALFREQ) / float(XDIV) * float(XMUL) / float(XDIVP)) AF_PLUS1 = (%0001 << 28) AF_PLUS2 = (%0010 << 28) AF_PLUS3 = (%0011 << 28) BF_PLUS1 = (%0001 << 24) BF_PLUS2 = (%0010 << 24) BF_PLUS3 = (%0011 << 24) P_REGD = (%1 << 16) ' turn on clocked digital I/O (registered pins) SP_OUT = (%1 << 6) ' force on pin output when DIR operates smartpin SPM_PULSES = %00100_0 |SP_OUT ' pulse/cycle output SPM_SSER_TX = %11100_0 |SP_OUT ' sync serial transmit (A-data, B-clock) SPM_SSER_RX = %11101_0 ' sync serial receive (A-data, B-clock) DMADIV = 4 '160 MHz sysclock / 4 = 40 MHz SPI clock (with dual SPI makes 80 Mbit/s or 10 MB/s) dat orgh $400 org stage1 'config pin for SPI chip select drvh #spi_cs drvl #spi_clk drvl #spi_di 'faster loading hubset .clk_mode 'config crystal and PLL - still running RCFAST waitx .pause 'wait for crystal/PLL to ramp up or .clk_mode, #XSEL 'select clock mode hubset .clk_mode 'engage 'load code @$1_0000 to hub address 0 
                mov     pb, ##$1_0000           'Flash address to load
                outl    #spi_cs
                callpa  #read_dual, #send_byte2
                getbyte pa, pb, #2              'send Flash reading address
                call    #send_byte2
                getbyte pa, pb, #1
                call    #send_byte2
                getbyte pa, pb, #0
                call    #send_byte2
'config one smartpin for SPI clock
                wrpin   #SPM_PULSES, #spi_clk
                dirl    #spi_clk                'SPI clock still driven low by the smartpin
                wxpin   ##((DMADIV/2)<<16) | DMADIV, #spi_clk   'pulse width (space->mark) and period respectively
                dirh    #spi_clk
                wypin   #8, #spi_clk            'pace out dummy clocks required by "Fast Read Dual Output"
                waitx   #50
'config two smartpins for SPI dual data
                fltl    #spi_do
                fltl    #spi_di
                wrpin   ##SPM_SSER_RX | BF_PLUS2, #spi_do
                wrpin   ##SPM_SSER_RX | BF_PLUS1, #spi_di
                wxpin   #15, #spi_do            '32 bits at a time
                wxpin   #15, #spi_di
                dirh    #spi_do
                dirh    #spi_di
'get length of binary data
                setse1  #(%001<<6)|spi_do
                wypin   #16, #spi_clk           '16 clock for first 32 bits containing binary length
                pollse1                         'clear prior event - needs a spacer instruction from SETSE1
                call    #read_byte4             'get the "size" value
                movbyts pa, #%%0123             'endian swap 24bit length in bytes
                add     pa, #3                  'round up
                shr     pa, #2                  'scale to longwords
                mov     .lcount, pa
'full-on continuous burst, right up to sysclock/2!
                wrfast  #0, #0                  'start FIFO at beginning of hubRAM
                shl     pa, #4                  'x16 clocks per longword
                wypin   pa, #spi_clk            'start clocking for the full length
.loop
                call    #read_byte4
                movbyts pa, #%%0123             'want as little-endian
                wflong  pa
                djnz    .lcount, #.loop
                outh    #spi_cs
                rdfast  #0, #0                  'flush the FIFO
'go back to RCFAST mode before handover
                andn    .clk_mode, #%11         'select RCFAST clock mode while retaining the old PLL config
                hubset  .clk_mode               'switch to RCFAST, critical reliability workaround for hardware bug
                hubset  #0                      'shutdown crystal and PLL
                waitx   .pause                  'wait for crystal shutdown, emulating hard reset conditions
                coginit #0, #0                  'kick it!
.clk_mode       long    1<<24 + (XDIV-1)<<18 + (XMUL-1)<<8 + XPPPP<<4 + XOSC<<2
.pause          long    25_000_000/100
.lcount         long    0
'------------------------------------------------
send_byte2
                shl     pa, #32-7       wc
                rep     @.loop,#8
                outc    #spi_di
                outh    #spi_clk
                shl     pa, #1          wc
                outl    #spi_clk
.loop
                ret
'------------------------------------------------
read_byte4
                waitse1                         'wait for smartpin (spi_do) buffer full event
                rdpin   pa, #spi_do             '16-bit shift-in as little-endian (odd bits)
                rdpin   pb, #spi_di             '(even bits)
                rev     pa                      'but SPI data is stored as big-endian (odd bits)
                rev     pb                      '(even bits)
                rolword pa, pb, #0              'combine to a single 32-bit word
        _ret_   mergew  pa                      'untangle the odd-even pattern
'------------------------------------------------
                fit     $100
                orgf    $100
'==============================================================================================
                orgh
size            long    0                       'located at Flash address $1_0000
code
'example code indicating programming succeeded
                drvh    #56                     'LED56 off
                drvl    #57                     'LED57 on
                rep     @.floop, #0             'loop forever toggling the LEDs
                outnot  #56
                outnot  #57
                waitx   ##(25_000_000/4)
.floop
'               file    "_P2 Invaders 2.0.52_eval.obj"
code_end
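For anyone who wants to check a stage-1 image on the PC side, here is a rough Python model of what the `checksum` routine above appears to do: sum the 256 little-endian longs of the 1024-byte stage-1 block and store "Prop" minus that sum in the last long, so the whole block sums to $706F7250. The function names are made up for illustration, and it assumes the last long is zero before patching, same as the PASM routine needs.

```python
import struct

PROP = 0x706F7250          # the bytes "Prop" read as a little-endian long
STAGE1_BYTES = 1024        # 256 longs, as summed by the checksum routine


def apply_checksum(stage1: bytes) -> bytes:
    """Patch the last long so the 256 longs sum to PROP (mod 2**32).

    Hypothetical helper. Assumes the last long is zero on entry.
    """
    if len(stage1) != STAGE1_BYTES:
        raise ValueError("stage-1 block must be exactly 1024 bytes")
    longs = list(struct.unpack("<256L", stage1))
    total = sum(longs) & 0xFFFFFFFF
    longs[-1] = (PROP - total) & 0xFFFFFFFF   # same as SUBR pa, ##PROP
    return struct.pack("<256L", *longs)


def checksum_ok(stage1: bytes) -> bool:
    """True if the block sums to PROP, which is what the boot check expects."""
    return (sum(struct.unpack("<256L", stage1)) & 0xFFFFFFFF) == PROP


if __name__ == "__main__":
    image = bytearray(bytes(range(256)) * 4)
    image[-4:] = b"\x00\x00\x00\x00"          # reserve the checksum long
    patched = apply_checksum(bytes(image))
    print(checksum_ok(patched))
```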
EDIT: Did a small tidy-up. Added the crystal clock setting back in for faster Flash programming. It had originally been removed when I had the unsoldered chip select pin on my revB board and I thought the issue must have been software.
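As a sanity check on that crystal clock setting, the constants from the `con` block can be run through the same arithmetic in Python. This is only a sketch: it assumes the shifts in the `.clk_mode` expression bind tighter than the `+`, which is how it has to be read for the fields not to overlap, so the Python version spells out the parentheses.

```python
# Values copied from the con block in the listing
XTALFREQ = 20_000_000
XDIV = 20
XMUL = 160
XDIVP = 1
XOSC = 0b10      # 15pF loading
XSEL = 0b11      # XI+PLL

XPPPP = ((XDIVP >> 1) + 15) & 0xF            # 1->15, 2->0, 4->1, ...
CLOCKFREQ = round(XTALFREQ / XDIV * XMUL / XDIVP)

# PLL configured but clock source still RCFAST (low two bits zero)
clk_mode = (1 << 24) + ((XDIV - 1) << 18) + ((XMUL - 1) << 8) + (XPPPP << 4) + (XOSC << 2)

# After "or .clk_mode, #XSEL" the PLL is selected as the clock source
clk_mode_engaged = clk_mode | XSEL

if __name__ == "__main__":
    print(CLOCKFREQ)
    print(hex(clk_mode), hex(clk_mode_engaged))
```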
I can't visualise why but testing has definitely proved it. Tested a 300 kByte binary using SPI clock = sysclock/2 with sysclock from 4 MHz to 160 MHz on the revA Eval board with its long SPI tracks. So up to 80 MHz SPI clock (20 MBytes/s)! I wouldn't be surprised to see the SPI clock attenuated down to something like one volt.
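The numbers there are easy to reproduce. A small Python sketch, assuming dual SPI moves two bits per SPI clock and the loader clocks 16 pulses per longword as in the listing; the helper names are mine and "300 kByte" is taken as 300,000 bytes:

```python
def spi_clock_hz(sysclock_hz: int, divider: int = 2) -> int:
    """SPI clock for a given sysclock divider (2 is the fastest case tested)."""
    return sysclock_hz // divider


def dual_read_bytes_per_s(spi_hz: int) -> int:
    """Dual-output read: 2 bits per SPI clock, 8 bits per byte."""
    return spi_hz * 2 // 8


def burst_clocks(size_bytes: int) -> int:
    """SPI clocks for the burst: round up to longs, 16 clocks per long."""
    longs = (size_bytes + 3) >> 2
    return longs * 16


if __name__ == "__main__":
    spi = spi_clock_hz(160_000_000)           # sysclock/2
    print(spi, dual_read_bytes_per_s(spi))
    clocks = burst_clocks(300_000)
    print(clocks, clocks / spi)               # load time in seconds
```

With the DMADIV = 4 setting from the source, the same functions give 40 MHz and 10 MB/s, matching the comment in the `con` block.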
EDIT: Ah, registering just the SPI clock pin does help a small amount. Err, or not, it fails the 4 MHz sysclock test. Hmm, that's not a good sign ...
EDIT2: Right, given that issue, I figure pin registering all round is a good idea. With both clock and data pins registered the revA Eval Board works up to 60 MHz SPI clock (120 MHz sysclock) and the revB Eval Board works up to 115 MHz SPI clock (230 MHz sysclock). PS: Room temperature of 21 °C.
- Falling edge SPI clock. Had always previously been configured as rising edge.
- Post-clock-edge data-in sampling (late sampling).
- Data pins registered, clock pin unregistered.
Works on revB Eval Board at 2 MHz sysclock (1 MHz SPI clock) at -10 °C. This was the critical test. It demonstrates that the late sampling is not easily fooled into an early sample.
I think the reason it works is because the unregistered SPI clock out has enough of a natural delay line. I'm not quite sure how the post-clock sampling actually works but it seems to still get in before the Flash chip has responded to the falling clock edge. The other three registered/unregistered combinations don't work in this setup.
Doh! The Flash programming routines fail at 300 MHz sysclock.
EDIT2: 360 MHz fell over around 30 °C. 340 MHz made it to 65 °C. 330 MHz got to 80 °C. 320 MHz got to about 100 °C.
PS: Take those high measurements with a grain of salt. I'm doing this in an open space with a cheap Smile hair dryer, so the gradients are getting large above 60 °C.
PPS: revA Eval Board reaches 200 MHz sysclock (100 MHz SPI clock) at 21 °C with this config.
EDIT3: Updated source code to de-glitch the transition from 1-bit to 2-bit SPI.
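For anyone puzzling over the 2-bit side of that, here is a host-side Python model of how the loader's `read_byte4` puts a longword back together, going by the comments in the listing: one data pin carries the odd bits, the other the even bits, MSB first, and the two 16-bit lane words get interleaved and then byte-swapped. It models the data flow only, not the smartpin hardware, and the function names are invented for the sketch.

```python
def split_lanes(data4: bytes):
    """Model the Flash side: 4 bytes sent MSB first, 2 bits per clock.

    Returns (odd, even) 16-bit lane words with the first bit received
    in bit 15 of each.
    """
    word = int.from_bytes(data4, "big")
    odd = even = 0
    for clock in range(16):
        hi = 31 - 2 * clock
        odd = (odd << 1) | ((word >> hi) & 1)          # odd-numbered bit
        even = (even << 1) | ((word >> (hi - 1)) & 1)  # even-numbered bit
    return odd, even


def merge_lanes(odd: int, even: int) -> int:
    """Interleave the lanes the way the MERGEW step is described."""
    word = 0
    for i in range(16):
        word |= ((odd >> i) & 1) << (2 * i + 1)
        word |= ((even >> i) & 1) << (2 * i)
    return word


def swap_bytes(word: int) -> int:
    """Byte reversal, standing in for MOVBYTS pa, #%%0123."""
    return int.from_bytes(word.to_bytes(4, "big"), "little")


if __name__ == "__main__":
    odd, even = split_lanes(bytes([0x12, 0x34, 0x56, 0x78]))
    merged = merge_lanes(odd, even)
    print(hex(merged), hex(swap_bytes(merged)))
```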