P2 Boot Decision Tree Suggestions

Rayman · 2016-09-28 17:52

I think SST only supports what Winbond calls QPI. Winbond's SQI is still in 1-bit command mode, so there's no need to exit anything.

SST lets you exit 4-bit command mode with either 2 or 8 clocks of $FF:

Reset Quad I/O (RSTQIO)
The Reset Quad I/O instruction, FFH, resets the device to 1-bit SPI protocol operation. To execute a
Reset Quad I/O operation, the host drives CE# low, sends the Reset Quad I/O command cycle (FFH)
then, drives CE# high. The device accepts either SPI (8 clocks) or SQI (2 clocks) command cycles. For
SPI, SIO[3:1] are don’t care for this command, but should be driven to VIH or VIL.
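
A rough sketch of the two command lengths described in the datasheet text above, written in Python purely as a waveform model (it does not talk to hardware). The names `ff_frame` and `rstqio_frames` are made up for illustration, and sending both frames, each inside its own CE# low/high window, is an assumption of this sketch rather than something the datasheet prescribes.

```python
def ff_frame(clocks, width):
    """Model one CE#-low frame as the value driven on the IO lines per clock.

    clocks: number of clock pulses while CE# is low
    width:  number of IO lines driven high (1 for SPI, 4 for SQI)
    """
    all_high = (1 << width) - 1
    return [all_high] * clocks


def rstqio_frames():
    """Return the two candidate FFh frames (hypothetical ordering)."""
    sqi = ff_frame(clocks=2, width=4)  # two nibbles of 0xF make FFh in 4-bit mode
    spi = ff_frame(clocks=8, width=1)  # eight 1 bits on SIO0 make FFh in 1-bit mode
    return [sqi, spi]
```

In the 1-bit frame only SIO0 is modelled; SIO[3:1] are "don't care" per the text above but should still sit at a valid logic level.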

Rayman · 2016-09-28 17:53

Looking at this thing:
http://www.flashmemorysummit.com/English/Collaterals/Proceedings/2011/20110809_S108_Cosaro.pdf

We also need to be able to exit XIP mode.
This is a continuous read mode where no opcode is sent.
I have to think that $FF will exit that too, but should make sure...

jmg · 2016-09-28 19:17

cgracey wrote: »

Thanks, jmg.

Now, what would you add to get better coverage (ie $66, $99)?

I'd release the 0xff & poll variants first, and let users try to find parts where that does not work.

Certainly, ($66, $99) is the next step, and that seems to be a newer reset path, but it does add delays.
What is not quite clear in the data is whether the BUSY flag is active during a ($66, $99) exit.
My guess is yes, that is included, but it could easily be tested.

A busy test before a ($66, $99) might be enough to bring all delays down to a common denominator of ~40us, but even here it would be nice to check whether BUSY applies, as a busy-based ready check is preferable to a fixed-time one.

Another variant would be to wait 40us and then poll. That covers parts that may not signal busy, and also covers those that may take > 40us and do signal busy.

I'd guess the BUSY code also has some fail-fallout (a timeout), so you cannot freeze forever in a busy poll. Note some erase max times are quite long.
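
The "wait 40us, then poll, but never freeze forever" variant could be sketched as follows. This is only an illustration in Python: `read_status` is a hypothetical callable standing in for a status-register read, the 100 ms timeout is an arbitrary placeholder, and the BUSY bit is assumed to be bit 0 as in the common SPI flash status register layout.

```python
import time


def wait_until_ready(read_status, initial_delay_s=40e-6, timeout_s=0.1,
                     sleep=time.sleep, clock=time.monotonic):
    """Wait a fixed delay, then poll BUSY until it clears or a timeout hits.

    read_status: hypothetical callable returning the status register byte
    Returns True when the part reports ready, False if the timeout expired.
    """
    BUSY = 0x01  # assumed: bit 0 is the busy / write-in-progress flag
    sleep(initial_delay_s)           # covers parts that never signal busy
    deadline = clock() + timeout_s   # bounds the poll so boot cannot hang
    while True:
        if not (read_status() & BUSY):
            return True
        if clock() >= deadline:
            return False
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real hardware or real delays.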

cgracey · 2016-09-28 19:36

Rayman wrote: »

Looking at this thing:
http://www.flashmemorysummit.com/English/Collaterals/Proceedings/2011/20110809_S108_Cosaro.pdf

We also need to be able to exit XIP mode.
This is a continuous read mode where no opcode is sent.
I have to think that $FF will exit that too, but should make sure...

Yes, getting out of XIP is another issue. From a Micron datasheet:

Power Loss Rescue Sequence

If a power loss occurs during a WRITE NONVOLATILE CONFIGURATION REGISTER
command, after the next power-on, the device might begin in an undetermined state
(XIP mode or an unnecessary protocol). If this happens, until the next power-up, a rescue
sequence must reset the device to a fixed state (extended SPI protocol without XIP).
After the rescue sequence, the issue should be resolved by running the WRITE NONVOLATILE
CONFIGURATION REGISTER command again. The rescue sequence is composed
of two parts that must be run in the correct order. During the entire sequence,
tSHSL2 must be at least 50ns. The first part of the sequence is DQ0 (PAD DATA) and
DQ3 (PAD HOLD) equal to 1 for the situations listed below:
• 7 clock cycles within S# LOW (S# becomes HIGH before 8th clock cycle)
• + 13 clock cycles within S# LOW (S# becomes HIGH before 14th clock cycle)
• + 25 clock cycles within S# LOW (S# becomes HIGH before 26th clock cycle)
The second part of the sequence is exiting from dual or quad SPI protocol by using the
following FFh sequence: DQ0 and DQ3 equal to 1 for 8 clock cycles within S# LOW; S#
becomes HIGH before 9th clock cycle.
After this two-part sequence the extended SPI protocol is active.
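
For reference, the two-part rescue sequence quoted above can be written down as data. This Python sketch only lists the frames; the function name and tuple layout are invented here, and nothing in it is taken from a Micron code example.

```python
T_SHSL2_MIN_NS = 50  # minimum S# high time between frames, per the quoted text


def micron_rescue_frames():
    """Return the rescue sequence as (clock_cycles, dq0, dq3) per S#-low frame."""
    part1 = [(7, 1, 1), (13, 1, 1), (25, 1, 1)]  # first part: 7, 13, 25 clocks
    part2 = [(8, 1, 1)]                          # second part: FFh, 8 clocks
    return part1 + part2
```

A bit-bang driver would walk this list, holding DQ0 and DQ3 high for the stated number of clocks and raising S# between frames.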

cgracey · 2016-09-28 19:38

jmg wrote: »

...Note some erase max times are quite long.

Bulk erase on some large devices is ~200 seconds. That would be like waiting a long time for nothing.

jmg · 2016-09-28 19:47

Rayman wrote: »

Reset Quad I/O (RSTQIO)
The Reset Quad I/O instruction, FFH, resets the device to 1-bit SPI protocol operation. To execute a
Reset Quad I/O operation, the host drives CE# low, sends the Reset Quad I/O command cycle (FFH)
then, drives CE# high. The device accepts either SPI (8 clocks) or SQI (2 clocks) command cycles. For
SPI, SIO[3:1] are don’t care for this command, but should be driven to VIH or VIL.

That's covered in #1.

Rayman wrote: »

I'm on the fence with the $66, $99. NXP decided just to give a $FF in 4-pin mode and that's it.

Yes, $FF is a starting point, but that is an older basis.

Rayman wrote: »

So, what if flash was in process of being written to when P2 got reset?
Maybe the $66,$99 would help.

Erase is the slowest operation. Macronix data does give detailed exit delays for the various modes, Pgm and Erase, with Erase the longest. With $66,$99 those do seem to be much shorter than the core operation, but still 'quite long' in real time.

tREADY2 - RESET  for Macronix MX25R1635F (2x3mm)
Reset Recovery time (During instruction decoding)  40 us
Reset Recovery time (for read operation)           35 us
Reset Recovery time (for program operation)       310 us
Reset Recovery time (for SE4KB operation)          12 ms
Reset Recovery time (for BE32K/64K operation)      25 ms
Reset Recovery time (for Chip Erase operation)    100 ms
Reset Recovery time (for WRSR operation)           40 ms

High Performance Mode                                             Typ    Max  Unit
tBP     Byte-Program                                               32    100  us
        Byte-Program (Applied Vhv at WP# pin)                      32    100  us
tPP     Page Program Cycle Time                                  0.85      4  ms
        Page Program Cycle Time (Applied Vhv at WP# pin)          0.6    3.6  ms
tSE     Sector Erase Cycle Time                                    40    240  ms
        Sector Erase Cycle Time (Applied Vhv at WP# pin)           36    210  ms
tBE32K  Block Erase (32KB) Cycle Time                            0.24    1.5  s
        Block Erase (32KB) Cycle Time (Applied Vhv at WP# pin)   0.22   1.05  s
tBE     Block Erase (64KB) Cycle Time                            0.48      3  s
        Block Erase (64KB) Cycle Time (Applied Vhv at WP# pin)   0.43    2.1  s
tCE     Chip Erase Cycle Time                                      13     38  s
        Chip Erase Cycle Time (Applied Vhv at WP# pin)             12     34  s

Some of this can be tested, on top of a first-pass ROM $FF combo preamble.

It seems erase is a serial train of smaller erases, and from a system control viewpoint the $66,$99 hard reset command is much faster than simply waiting until erase/pgm is done.
It also seems that even at reset they try to complete the operation in progress. Maybe a brutal short circuit (aka power removal) gives a partial erase and results in soft field reliability issues.
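
To put numbers on "much faster", the Macronix figures quoted above can be compared directly. In this Python sketch the larger figure in each timing row is taken as the worst case, and pairing the "program operation" recovery time with the page program time is my assumption; the dictionary keys are invented labels.

```python
# Reset recovery times after $66,$99, in seconds (from the quoted tREADY2 list)
RESET_RECOVERY_S = {
    "program": 310e-6,
    "sector_erase_4k": 12e-3,
    "block_erase_64k": 25e-3,
    "chip_erase": 100e-3,
}

# Worst-case time to let the operation finish on its own, in seconds
WORST_CASE_S = {
    "program": 4e-3,           # assumed to correspond to page program
    "sector_erase_4k": 240e-3,
    "block_erase_64k": 3.0,
    "chip_erase": 38.0,
}


def reset_speedup(op):
    """How many times shorter the reset recovery is than waiting it out."""
    return WORST_CASE_S[op] / RESET_RECOVERY_S[op]


if __name__ == "__main__":
    for op in RESET_RECOVERY_S:
        print(f"{op}: about {reset_speedup(op):.0f}x shorter wait")
```

Even on these figures the chip erase case still costs 100 ms of recovery, which is the 'quite long' part.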

jmg · 2016-09-28 19:50

Rayman wrote: »

Looking at this thing:
http://www.flashmemorysummit.com/English/Collaterals/Proceedings/2011/20110809_S108_Cosaro.pdf

We also need to be able to exit XIP mode.
This is a continuous read mode where no opcode is sent.
I have to think that $FF will exit that too, but should make sure...

Yes, some field tests are now needed.
There is also this brand-new
http://www.adestotech.com/products/ecoxip/
I'm waiting on more data for it, as it covers 1-pin SPI, so what they do around reset & exits will be informative.

Rayman · 2016-09-28 19:57

The Micron procedure Chip posted is interesting.
Very detailed.
Guess we need to see if it works with the other chips...

The other option is just tell people not to use XIP mode.

The 4-pin command mode is superior to these other modes,
so I really don't see why people would use XIP.
It divides by 4 the number of clocks you need to send a command.
If people just avoid XIP mode, we're back to just $FF...

jmg · 2016-09-28 20:02

cgracey wrote: »

jmg wrote: »

...Note some erase max times are quite long.

Bulk erase on some large devices is ~200 seconds. That would be like waiting a long time for nothing.

Yes, see also my post after yours, about the Macronix times when using $66,$99
There, it seems they can slash waits to 100ms worst case on Erase, but still complete an atomic operation safely.
ie I'd say Chip Erase is not complete, but what is erased is erased properly.

jmg · 2016-09-28 20:07

cgracey wrote: »

Yes, getting out of XIP is another issue. From a Micron datasheet:
...
• 7 clock cycles within S# LOW (S# becomes HIGH before 8th clock cycle)
• + 13 clock cycles within S# LOW (S# becomes HIGH before 14th clock cycle)
• + 25 clock cycles within S# LOW (S# becomes HIGH before 26th clock cycle)
...

I saw that, but it seems a quite strange clock count, as many SPI ports cannot actually generate 7,13,25 clocks.
They do not say this actually fails, if you give 8,16 clocks.
That's why most specify 8,16, even though the mode exit occurs slightly before the end of the frame.

If they really do need precisely 7 clocks, (not >= 7) I'd be fine with that part going in the not-supported basket. Plenty more work.

Cluso99 · 2016-09-28 20:38

May I suggest something to get us started...

We just boot from Serial and EEPROM

        P2            EEPROM         
        pin           I2C            
        ----------    -------------  
        61  io        SDA    io      
        60  out       CLK    in      


1.      P63=SI=1 ?              Y: Try booting from serial
2.      Pullup on P60 CLK ?     Y: Boot from I2C
3.      ---not yet implemented FLASH---
4.      ---not yet implemented SD---
5.      Serial Debug Mode???
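
The first two steps of the list above could be sketched like this in Python. The function name and the string labels are invented for illustration, and falling through to I2C when the serial attempt does not succeed is an assumption; the list itself only says "try".

```python
def boot_candidates(p63_high, p60_pullup):
    """Return boot sources to attempt, in order, per steps 1 and 2 above.

    p63_high:   True if P63 (SI) reads high
    p60_pullup: True if a pull-up is detected on P60 (CLK)
    Steps 3 to 5 (FLASH, SD, serial debug) are not implemented.
    """
    order = []
    if p63_high:
        order.append("serial")
    if p60_pullup:
        order.append("i2c_eeprom")
    return order
```

An empty list means neither condition was met, which is where the not-yet-implemented steps would eventually slot in.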

Most of us will have EEPROM we can use from a P1 project.

We can then get working code for FLASH on different pins.

In parallel, we can get working code for SD on different pins.

These solutions can be tested by the many, and result in code back to Chip to be included as a test.

Ultimately, it does not matter whether EEPROM is ultimately included or not. At least it gets us running quickly for testing actual P2 code.

Discussing is one thing. The actual doing is the important part !!!

cgracey · 2016-09-28 20:55

Cluso99 wrote: »
May I suggest something to get us started...
We just boot from Serial and EEPROM

        P2            EEPROM         
        pin           I2C            
        ----------    -------------  
        61  io        SDA    io      
        60  out       CLK    in      


1.      P63=SI=1 ?              Y: Try booting from serial
2.      Pullup on P60 CLK ?     Y: Boot from I2C
3.      ---not yet implemented FLASH---
4.      ---not yet implemented SD---
5.      Serial Debug Mode???           
Most of us will have EEPROM we can use from a P1 project.

We can then get working code for FLASH on different pins.

In parallel, we can get working code for SD on different pins.

These solutions can be tested by the many, and result in code back to Chip to be included as a test.

Ultimately, it does not matter whether EEPROM is ultimately included or not. At least it gets us running quickly for testing actual P2 code.

Discussing is one thing. The actual doing is the important part !!!

This feels like one of those problems that can't be solved, yet. The right way eventually percolates up, but you don't know when it will. This is what has taken me so long, all along. There are usually problems, or more like imperfections, that are due to matters not being optimally bounded, so the solution can't come, because the context is flawed. That seems to be where we are with this. My gut says dirt simple is probably the best approach, given the mushrooming complexity we have here.

I really like this: A 3-pin or 4-pin SPI flash solution (4 would be okay as it would permit dual mode), with an overlapped, but mutually exclusive, 4-pin SPI flash connection. That could get us down to 4 fixed, contiguous pins that could boot SPI flash or SD card. Works on all devices, limits pin ghetto, and sets a simple standard. Maybe it could even be arranged so that the next two pins down *could* be used as two more data bits, if the user wanted, but no toggling of these pins occurs on boot. This would mean that the data nibbles would be backwards, but that is trivial to compensate for in software.
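
A sketch of what compensating in software might look like, in Python for readability. It assumes "backwards" means the bit order inside each 4-bit nibble is reversed because the data pins run in descending order; if it means something else (for example swapped nibble order), the table would differ. The names are invented.

```python
# Lookup table: each 4-bit value with its bit order reversed (0b0001 -> 0b1000)
REV4 = [int(f"{n:04b}"[::-1], 2) for n in range(16)]


def reverse_nibble_bits(byte):
    """Reverse the bit order inside each nibble of one byte."""
    return (REV4[(byte >> 4) & 0xF] << 4) | REV4[byte & 0xF]


def fix_buffer(data):
    """Apply the nibble fix to a whole buffer read from the pins."""
    return bytes(reverse_nibble_bits(b) for b in data)
```

A table lookup per nibble is cheap, but it is still a per-byte cost on every transfer.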

jmg · 2016-09-28 21:08

cgracey wrote: »

... That could get us down to 4 fixed, contiguous pins that could boot SPI flash or SD card. Works on all devices, limits pin ghetto, and sets a simple standard. Maybe it could even be arranged so that the next two pins down *could* be used as two more data bits, if the user wanted, but no toggling of these pins occurs on boot. This would mean that the data nibbles would be backwards, but that is trivial to compensate for in software.

If you seek contiguous pins, and want to avoid 'pin ghetto', then the detail that needs attention is the "data nibbles would be backwards"

Can that nibble (/byte?) bit reverse be a hardware config choice, even if just on boot-area pins? It will be important to allow SD programming anywhere, and software reverses are going to be both slow and very prone to error.

Or maybe do 63,62 for UART and 0,1,2,3 for SPI? That does avoid pin gaps.

My suggestion would be to ship a build with data nibbles correct for streamer speed, and think about contiguous pins in the meantime.
Fastest streamer operation, and simplest code, to me easily trumps any cosmetic pin ordering.
HW nibble-swap may allow both.

Allowing a PCB design for 8b wide, with either 2x Quad, or 1 x ATXP032 should also be possible.
Such a design would boot in 1b SPI, but user code would stream 8b.

Cluso99 · 2016-09-28 23:04

cgracey wrote: »
This feels like one of those problems that can't be solved, yet. The right way eventually percolates up, but you don't know when it will. This is what has taken me so long, all along. There are usually problems, or more like imperfections, that are due to matters not being optimally bounded, so the solution can't come, because the context is flawed. That seems to be where we are with this. My gut says dirt simple is probably the best approach, given the mushrooming complexity we have here.

I really like this: A 3-pin or 4-pin SPI flash solution (4 would be okay as it would permit dual mode), with an overlapped, but mutually exclusive, 4-pin SPI SD flash connection. That could get us down to 4 fixed, contiguous pins that could boot SPI flash or SD card. Works on all devices, limits pin ghetto, and sets a simple standard. Maybe it could even be arranged so that the next two pins down *could* be used as two more data bits, if the user wanted, but no toggling of these pins occurs on boot. This would mean that the data nibbles would be backwards, but that is trivial to compensate for in software.

"My gut says dirt simple is probably the best approach, given the mushrooming complexity we have here."
Agreed.

cgracey · 2016-09-29 05:35

I called up Kye tonight, who's the guy that wrote the Prop1 SD card driver with FAT support. I asked him all kinds of questions about SD cards. Here is what I learned:

SD 4-bit mode is not only subject to licensing, but requires CRC calculations involving every nibble. This is a problem that needs a hardware solution. We can stream nibbles at 80MHz, but without dedicated hardware, we can't compute the CRC's anywhere near that speed in software. We may only be 4x faster than SPI if we went through all the trouble of implementing (and paying for) this.
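
To give a feel for the per-bit work involved, here is a plain bitwise CRC16 in Python using the CCITT polynomial (x^16 + x^12 + x^5 + 1, i.e. 0x1021) with a zero initial value, which is the polynomial used for SD data blocks. This is a generic textbook routine, not code from any SD driver mentioned in this thread. As I understand it, 4-bit mode needs one such CRC per data line, which is what makes a software-only approach expensive at streaming speed.

```python
def crc16_ccitt(data, crc=0):
    """Bitwise CRC16, polynomial 0x1021, MSB first, initial value 0."""
    for byte in data:
        crc ^= byte << 8              # bring the next byte into the top bits
        for _ in range(8):            # one shift/XOR step per data bit
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc
```

The inner loop runs once per bit, so a 512-byte block costs 4096 shift/XOR steps before any table-driven or hardware shortcut is applied.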

In SD SPI mode, you are limited to 25MHz bit rate, but don't have to worry about CRC's. This is much more practical for the current Prop2. This is also a 4-pin interface, instead of a 6-pin.

Kye said he scoured the web for many days, piecing together how to do SD. The problem is that while the SD association more-or-less explains the SPI interface, the O.S. vendors all do different things with their data structures, which end up requiring heuristic analysis to know what you are dealing with in each case, since you'll need to adapt your approach, accordingly. This is where things become really complicated. He felt that booting without FAT awareness was rather iffy. So, you end up dealing with Byzantine file systems if you want broad compatibility.

Perhaps Cluso and Peter can pull off a reliable SD SPI boot over the next few weeks. If we can get something really reliable, we can add it to the ROM booter. FAT-friendly would be good. It sure would be neat to boot right off an SD card that you dropped some files onto from your Windows/Linux/Mac machine.

Meanwhile, I really like the idea of the 3-wire SPI flash approach with shorted DI/DO signals. I know you guys see it as irrationally limiting, but I do not want to set the stage for any reliability problems. This has to work perfectly. We just need simple storage, no frills. Shorted DI/DO cuts off the possibility of even flirting with anything beyond 1-bit mode, keeping us clear of warm reset problems. You can get a 512K byte SPI flash for $0.17, or a 16M byte device for $1.52, qty 1, which can also serve as a local SSD. This SPI flash concept is three pins, no drama, very cheap, every vendor, and plenty of storage. This is the only thing I have a good feeling about right now, aside from serial. This is what I'm going to release in v12:

pin	usage		note
-------------------------------------------------------------------------
63	RX		(also TX for half-duplex command response)
62	TX		(only driven during full-duplex command response)
61	SPI CSn		(only driven if pull-up on pin 61, HOLDn=WPn=1)
60	SPI CLK		(only driven if pull-up on pin 61, HOLDn=WPn=1)
59	SPI DI+DO	(only driven if pull-up on pin 61, HOLDn=WPn=1)

jmg · 2016-09-29 06:24

cgracey wrote: »

...Shorted DI/DO cuts off the possibility of even flirting with anything beyond 1-bit mode, keeping us clear of warm reset problems.

Sorry, nope, that claim is simply misplaced optimism.

Any user can add a resistor to bridge DI/DO, and voila, they have the P2 thinking it is 3 Pin booting, but they can use Quad (or Oct) for their code, and the performance gain that gives.
Many users clearly will choose to do this.

Because you cannot exclude Quad Modes, you do need to include Quad Exit preambles, even in your 3-Pin code.
That is what NXP have done, and what ESP8266 does.

Even supposing you could impose the technically impractical 'Thou shalt use 3Pin only' edict onto a restless user base, think through what that means ?

They now have to connect TWO FLASH parts, one for booting, and one for Speed. Check the BOM and Pin Cost now.
Users will react to having to consume MORE pins, and manage TWO memory images. They will ask why ?

You can release a ROM with only one SPI pin set, but you cannot leave out the Quad Exit preambles, without a serious market impact.

cgracey · 2016-09-29 06:49

jmg wrote: »

cgracey wrote: »

...Shorted DI/DO cuts off the possibility of even flirting with anything beyond 1-bit mode, keeping us clear of warm reset problems.

Sorry, nope, that claim is simply misplaced optimism.

Any user can add a resistor to bridge DI/DO, and voila, they have the P2 thinking it is 3 Pin booting, but they can use Quad (or Oct) for their code, and the performance gain that gives.
Many users clearly will choose to do this.

Because you cannot exclude Quad Modes, you do need to include Quad Exit preambles, even in your 3-Pin code.
That is what NXP have done, and what ESP8266 does.

Even supposing you could impose the technically impractical 'Thou shalt use 3Pin only' edict onto a restless user base, think through what that means ?

They now have to connect TWO FLASH parts, one for booting, and one for Speed. Check the BOM and Pin Cost now.
Users will react to having to consume MORE pins, and manage TWO memory images. They will ask why ?

You can release a ROM with only one SPI pin set, but you cannot leave out the Quad Exit preambles, without a serious market impact.

Okay. I forgot about the resistor trick.

How about those quad-exit commands? You mean the $FF patterns of variable CLK counts?

If asked, I will say that I had to destroy the P2 to save the P2. Everybody's familiar with that.

dMajo · 2016-09-29 07:02

David Betz wrote: »

dMajo wrote: »

Rayman wrote: »

NXP just requires that all devices accept $FF in 4-pin mode to exit and return to 1-pin mode.
The only other thing required is to have the same read command.

They say

Any device that can accept a 03 read serial opcode after receiving an FF opcode is expected to boot successfully.
A device that switches to quad opcodes and doesn't return after an 0xff reset to serial mode might not boot after a reset.

So I think they expect to exit SQI by issuing "FF" in SPI, I mean using a 1-bit command. This expectation is wrong in the case of QPI. Nothing wrong with this, they simply do not support this kind of device.

That is probably a good compromise for P2 as well. We support SQI but not QPI.

Supporting QPI is not an issue at all.
It is enough to issue 2 clock pulses with DI (IO0) high and then deassert CS (drive it high). After that you issue the standard SPI "FF" (1 bit, 8 clocks).
The first 2 clocks with DI high, together with 3 weak pull-ups (e.g. 100K) on IO1 (DO), IO2 (WP) and IO3 (HOLD/RESET), will transmit a valid "FF" for QPI. Users who want to use the flash in QPI will add the 3 pull-ups on the board. For all the others, the 2 transmitted values (DI=HIGH) will not hurt. CS going high after these two bits will reset the internal logic for the next command, and the two bits, if the device is already in SPI, will be ignored.

The 3 external pull-ups can serve as an immediate boot test, without any flash access, to differentiate between 3/4/6-pin wiring of the flash.
- If no pull-ups, you can read the flash status to detect presence, otherwise boot from another device.
- If only a pull-up on DO, a flash is present in SPI and you can start booting immediately.
- If all 3 pull-ups, a flash is present, potentially in SQI/QPI; start booting after switching it back to SPI if needed.
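
The three-way pull-up test in the list above could be expressed as a small classifier. This Python sketch uses invented names and return labels, and the "unknown" case for other pull-up combinations is my addition, since the list does not say what to do with them.

```python
def classify_flash_wiring(pullup_do, pullup_wp, pullup_hold):
    """Map detected pull-ups on DO, WP and HOLD to a boot action.

    Each argument is True if a pull-up was sensed on that line.
    """
    if pullup_do and pullup_wp and pullup_hold:
        return "exit_quad_then_boot"   # flash present, possibly in SQI/QPI
    if pullup_do and not pullup_wp and not pullup_hold:
        return "boot_spi"              # flash present in plain SPI
    if not (pullup_do or pullup_wp or pullup_hold):
        return "probe_status"          # read status, else try another device
    return "unknown"                   # combination not covered by the list
```

Because the test only samples pins, it costs no flash access before the boot path is chosen.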

jmg · 2016-09-29 07:03

cgracey wrote: »

How about those quad-exit commands? You mean the $FF patterns of variable CLK counts?

Yes! - the ones given in #1, you asked about before.

Any SD card code will include similar exit-preambles, Cluso has mentioned a couple already.

Such preambles are essentially invisible to the user, unless they scope the pins -- but they do notice if they are not there

Cluso99 · 2016-09-29 07:16

cgracey wrote: »
I called up Kye tonight, who's the guy that wrote the Prop1 SD card driver with FAT support. I asked him all kinds of questions about SD cards. Here is what I learned:

SD 4-bit mode is not only subject to licensing, but requires CRC calculations involving every nibble. This is a problem that needs a hardware solution. We can stream nibbles at 80MHz, but without dedicated hardware, we can't compute the CRC's anywhere near that speed in software. We may only be 4x faster than SPI if we went through all the trouble of implementing (and paying for) this.

In SD SPI mode, you are limited to 25MHz bit rate, but don't have to worry about CRC's. This is much more practical for the current Prop2. This is also a 4-pin interface, instead of a 6-pin.

Kye said he scoured the web for many days, piecing together how to do SD. The problem is that while the SD association more-or-less explains the SPI interface, the O.S. vendors all do different things with their data structures, which end up requiring heuristic analysis to know what you are dealing with in each case, since you'll need to adapt your approach, accordingly. This is where things become really complicated. He felt that booting without FAT awareness was rather iffy. So, you end up dealing with Byzantine file systems if you want broad compatibility.

Perhaps Cluso and Peter can pull off a reliable SD SPI boot over the next few weeks. If we can get something really reliable, we can add it to the ROM booter. FAT-friendly would be good. It sure would be neat to boot right off an SD card that you dropped some files onto from your Windows/Linux/Mac machine.

Meanwhile, I really like the idea of the 3-wire SPI flash approach with shorted DI/DO signals. I know you guys see it as irrationally limiting, but I do not want to set the stage for any reliability problems. This has to work perfectly. We just need simple storage, no frills. Shorted DI/DO cuts off the possibility of even flirting with anything beyond 1-bit mode, keeping us clear of warm reset problems. You can get a 512K byte SPI flash for $0.17, or a 16M byte device for $1.52, qty 1, which can also serve as a local SSD. This SPI flash concept is three pins, no drama, very cheap, every vendor, and plenty of storage. This is the only thing I have a good feeling about right now, aside from serial. This is what I'm going to release in v12:
pin	usage		note
-------------------------------------------------------------------------
63	RX		(also TX for half-duplex command response)
62	TX		(only driven during full-duplex command response)
61	SPI CSn		(only driven if pull-up on pin 61, HOLDn=WPn=1)
60	SPI CLK		(only driven if pull-up on pin 61, HOLDn=WPn=1)
59	SPI DI+DO	(only driven if pull-up on pin 61, HOLDn=WPn=1)

Chip,
I have to totally disagree with Kye. His code actually places too much reliance on bytes being correct in the right places. This means that his code rejects a number of SD cards that have not been precisely formatted to "his" correctness.

I use Kye's code, as I do with a number of prior SD code sets. I have needed to modify them as I have previously explained.
I have hundreds in the field that have a minimal boot to SD (on write protected EEPROM) which then accesses the FAT16/32 files. I also have 8MB contiguous files on the cards that I use to run HDD's under CPM2.2 format. The original ZiCog emulation did not even have a FAT file system. Heater and I just used CPM to write to raw sectors.

If you are in any doubt about this, grab the P8XBlade2 that I sent to Ken earlier in the year. Email/call me (Ken has all my details) and I will email you a zip file, which you can expand and copy to the FAT16/32 root directory of almost any SD card (I prefer SanDisk, as they seem to be the most reliable according to other forums). Plug it into my P8XBlade with a PropPlug (not pin compatible) connected to a PC running a terminal (PST) at 115200 baud, and apply 5V power. You will see my Prop OS boot straight up. Type ? <cr> and you will obtain a list of the DOS-style commands supported.

Alternately, here is a link to the latest released v1.10 zip file. https://forums.parallax.com/discussion/comment/1368157/#Comment_1368157
A few posts above this describe the operation.
Here is a link to my P8XBlade2 schematic
forums.parallax.com/discussion/comment/1358420/#Comment_1358420

In no way should we demand that FAT16 or FAT32 be on the drive. SD cards are now available that already exceed the maximum size of FAT32 systems. A lot of vendors are pre-installing exFAT on those cards, but there is a Microsoft licencing issue here.

Peter's and my way ignores any past/current/future file system by using the MBR (sector 0) to contain a small identifier, a sector start address, and a length. These items point to a contiguous section on the SD card from which the SD boot code will load. Nothing could be simpler or more future-proof. Users can use FAT16, FAT32, exFAT (licencing issues), RAW sector addressing, a separate partition, or any other method that can be thought of.
All that is required is for the data/file to be located, and the MBR written/updated. This can be done on a P1, a P2, a Windows PC or a *NIX PC.
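
To make the idea concrete, here is a minimal Python sketch of reading such a record from sector 0. Everything about the layout is hypothetical: the `P2BT` identifier, the offset 0x100, the little-endian 32-bit fields, and treating the length as a sector count are placeholders chosen for illustration, not the format Peter and I actually use.

```python
import struct

SECTOR_SIZE = 512
BOOT_MAGIC = b"P2BT"   # hypothetical 4-byte identifier
BOOT_OFFSET = 0x100    # hypothetical position inside the MBR's code area
RECORD_FMT = "<4sII"   # identifier, start sector, length (assumed in sectors)


def parse_boot_record(sector0):
    """Return (start_sector, length) from sector 0, or None if no identifier."""
    if len(sector0) != SECTOR_SIZE:
        raise ValueError("sector 0 must be exactly 512 bytes")
    magic, start, length = struct.unpack_from(RECORD_FMT, sector0, BOOT_OFFSET)
    if magic != BOOT_MAGIC:
        return None
    return start, length


def write_boot_record(sector0, start, length):
    """Return a copy of sector 0 with the boot record filled in."""
    out = bytearray(sector0)
    struct.pack_into(RECORD_FMT, out, BOOT_OFFSET, BOOT_MAGIC, start, length)
    return bytes(out)
```

The chosen offset keeps clear of the partition table and the 0x55AA signature at the end of the sector, so a PC-side tool can update the record without disturbing whatever file system is on the card.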

cgracey · 2016-09-29 07:26

Cluso99 wrote: »
Chip,
I have to totally disagree with Kye. His code actually places too much reliance on bytes being correct in the right places. This means that his code rejects a number of SD cards that have not been precisely formatted to "his" correctness.

I use Kye's code, as I do with a number of prior SD code sets. I have needed to modify them as I have previously explained.
I have hundreds in the field that have a minimal boot to SD (on write protected EEPROM) which then accesses the FAT16/32 files. I also have 8MB contiguous files on the cards that I use to run HDD's under CPM2.2 format. The original ZiCog emulation did not even have a FAT file system. Heater and I just used CPM to write to raw sectors.

In no way should we demand that FAT16 or FAT32 be on the drive. SD cards are now available that already exceed the maximum size of FAT32 systems. A lot of vendors are pre-installing exFAT on those cards, but there is a Microsoft licencing issue here.

Peter's and my way ignores any past/current/future file system by using the MBR (sector 0) to hold a small identifier, a sector start address, and a length. These items point to a contiguous section on the SD card from which the SD boot code will load code. Nothing could be simpler or more future-proof. Users can use FAT16, FAT32, exFAT (licencing issues), RAW sector addressing, a separate partition, or any other method that can be thought of.
All that is required is for the data/file to be located, and the MBR written/updated. This can be done on a P1, a P2, a Windows PC or a *NIX PC.
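
A minimal sketch of the MBR pointer idea, for illustration only. The thread does not give a byte layout, so the record offset (0x170), the identifier string ("P2BT"), the field widths, and the choice of counting the length in sectors are all assumptions made up for this example, as are the helper names make_sector0 and find_boot_image.

```python
import struct

SECTOR_SIZE = 512
RECORD_OFFSET = 0x170     # hypothetical: an unused spot in the MBR boot-code area
RECORD_FORMAT = "<4sII"   # identifier, start sector, length in sectors (little-endian)
BOOT_ID = b"P2BT"         # hypothetical identifier, not from the thread


def make_sector0(start_sector, sector_count):
    """Build a blank sector 0 carrying only the boot pointer record."""
    sector = bytearray(SECTOR_SIZE)
    struct.pack_into(RECORD_FORMAT, sector, RECORD_OFFSET,
                     BOOT_ID, start_sector, sector_count)
    sector[510:512] = b"\x55\xaa"   # standard MBR signature bytes
    return bytes(sector)


def find_boot_image(sector0):
    """Return (start_sector, sector_count), or None if no valid record is present."""
    if len(sector0) != SECTOR_SIZE:
        raise ValueError("sector 0 must be exactly 512 bytes")
    ident, start, count = struct.unpack_from(RECORD_FORMAT, sector0, RECORD_OFFSET)
    if ident != BOOT_ID or count == 0:
        return None
    return start, count
```

The boot code would read sector 0, call something like find_boot_image, and then load that many contiguous sectors, without ever looking at the file system that sits around them.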

That sounds great, Cluso! Sounds deterministic, too.

I was thinking about these extra-clock-after command issues, and how some hosts drive the SD card CLK continuously. Could it be that there is a PLL inside these FAT cards that needs a steady clock to work most reliably? The smart pin can do the data shifting for you, you know, based on the CLK, which can be generated by another smart pin.

Cluso99 · 2016-09-29 07:42

cgracey wrote: »
Cluso99 wrote: »
cgracey wrote: »
I called up Kye tonight, who's the guy that wrote the Prop1 SD card driver with FAT support. I asked him all kinds of questions about SD cards. Here is what I learned:

SD 4-bit mode is not only subject to licensing, but requires CRC calculations involving every nibble. This is a problem that needs a hardware solution. We can stream nibbles at 80MHz, but without dedicated hardware, we can't compute the CRC's anywhere near that speed in software. We may only be 4x faster than SPI if we went through all the trouble of implementing (and paying for) this.

In SD SPI mode, you are limited to 25MHz bit rate, but don't have to worry about CRC's. This is much more practical for the current Prop2. This is also a 4-pin interface, instead of a 6-pin.

Kye said he scoured the web for many days, piecing together how to do SD. The problem is that while the SD association more-or-less explains the SPI interface, the O.S. vendors all do different things with their data structures, which end up requiring heuristic analysis to know what you are dealing with in each case, since you'll need to adapt your approach, accordingly. This is where things become really complicated. He felt that booting without FAT awareness was rather iffy. So, you end up dealing with Byzantine file systems if you want broad compatibility.

Perhaps Cluso and Peter can pull off a reliable SD SPI boot over the next few weeks. If we can get something really reliable, we can add it to the ROM booter. FAT-friendly would be good. It sure would be neat to boot right off an SD card that you dropped some files onto from your Windows/Linux/Mac machine.

Meanwhile, I really like the idea of the 3-wire SPI flash approach with shorted DI/DO signals. I know you guys see it as irrationally limiting, but I do not want to set the stage for any reliability problems. This has to work perfectly. We just need simple storage, no frills. Shorted DI/DO cuts off the possibility of even flirting with anything beyond 1-bit mode, keeping us clear of warm reset problems. You can get a 512K byte SPI flash for $0.17, or a 16M byte device for $1.52, qty 1, which can also serve as a local SSD. This SPI flash concept is three pins, no drama, very cheap, every vendor, and plenty of storage. This is the only thing I have a good feeling about right now, aside from serial. This is what I'm going to release in v12:
pin	usage		note
-------------------------------------------------------------------------
63	RX		(also TX for half-duplex command response)
62	TX		(only driven during full-duplex command response)
61	SPI CSn		(only driven if pull-up on pin 61, HOLDn=WPn=1)
60	SPI CLK		(only driven if pull-up on pin 61, HOLDn=WPn=1)
59	SPI DI+DO	(only driven if pull-up on pin 61, HOLDn=WPn=1)
Chip,
I have to totally disagree with Kye. His code actually places too much reliance on bytes being correct in the right places. This means that his code rejects a number of SD cards that have not been precisely formatted to "his" correctness.

I use Kye's code, as I do with a number of prior SD code sets. I have needed to modify them as I have previously explained.
I have hundreds in the field that have a minimal boot to SD (on write protected EEPROM) which then accesses the FAT16/32 files. I also have 8MB contiguous files on the cards that I use to run HDD's under CPM2.2 format. The original ZiCog emulation did not even have a FAT file system. Heater and I just used CPM to write to raw sectors.

In no way should we demand that FAT16 or FAT32 be on the drive. SD cards are now available that already exceed the maximum size of FAT32 systems. A lot of vendors are pre-installing exFAT on those cards, but there is a Microsoft licencing issue here.

Peter's and my way ignores any past/current/future file system by using the MBR (sector 0) to hold a small identifier, a sector start address, and a length. These items point to a contiguous section on the SD card from which the SD boot code will load code. Nothing could be simpler or more future-proof. Users can use FAT16, FAT32, exFAT (licencing issues), RAW sector addressing, a separate partition, or any other method that can be thought of.
All that is required is for the data/file to be located, and the MBR written/updated. This can be done on a P1, a P2, a Windows PC or a *NIX PC.
That sounds great, Cluso! Sounds deterministic, too.

I was thinking about these extra-clock-after command issues, and how some hosts drive the SD card CLK continuously. Could it be that there is a PLL inside these FAT cards that needs a steady clock to work most reliably? The smart pin can do the data shifting for you, you know, based on the CLK, which can be generated by another smart pin.

I am just doing a tidy-up of the code, since I found the bug that was blocking one of my cards from doing a cold boot from power-up. Then I will test all the SD cards here. Then I will put it up on the P1 forum for testing, while I convert it to P2.

dMajo · 2016-09-29 08:26

jmg wrote: »

...
I saw that, but it seems a quite strange clock count, as many SPI ports cannot actually generate 7,13,25 clocks.
...

Hardware SPI peripherals in most MCUs usually have a CS function (as part of the SPI hardware) only in slave mode. When operating as master, CS is driven by a GPIO. So I think that with careful programming you can drive it high before 8/16 clock pulses have completed. This is a special recovery case, and the other interrupt sources can be temporarily disabled during the operation to accomplish the task.
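
As a rough illustration of odd clock counts with a GPIO-driven CS, here is a sketch that bit-bangs the whole frame rather than cutting a hardware SPI transfer short, which is a simpler stand-in for the approach described above. FakePins and send_clocks are hypothetical names, and the pin log only simulates what a real GPIO layer would do.

```python
class FakePins:
    """Stand-in for a GPIO layer: records every pin write in order."""

    def __init__(self):
        self.log = []

    def set(self, name, level):
        self.log.append((name, level))


def send_clocks(pins, count, data_level=1):
    """Hold the data line at one level and issue exactly `count` clock pulses under CS."""
    pins.set("DO", data_level)
    pins.set("CS", 0)            # CS is just another GPIO, so any clock count is possible
    for _ in range(count):
        pins.set("CLK", 1)
        pins.set("CLK", 0)
    pins.set("CS", 1)


def rising_edges(pins, name="CLK"):
    """Count how many times the named pin was driven high."""
    return sum(1 for pin, level in pins.log if pin == name and level == 1)
```

Calling send_clocks(pins, 7), send_clocks(pins, 13) or send_clocks(pins, 25) on a fresh FakePins gives frames of exactly that many clocks, which a byte-oriented SPI port cannot produce on its own.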

dMajo · 2016-09-29 08:32

Rayman wrote: »

The Micron procedure Chip posted is interesting.
Very detailed.
Guess we need to see if it works with the other chips...

The other option is just tell people not to use XIP mode.

The 4-pin command mode is superior to these other modes,
so I really don't see why people would use it.
It divides by 4 the number of clocks you need to send a command.
If people just avoid XIP mode, we're back to just $FF...

I think it has to do with Fmax.
When fast reading the device at higher bus speeds, after the opcode and address you have dummy bytes before the first data is output.
I presume that when in XIP, the flash does not have to decode opcodes, so addressed data reads can also happen at faster bus speeds, mainly for program execution from flash, i.e. random read-only access.
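
To put numbers on the clock counts being compared here, a small worked sketch. It assumes an 8-bit opcode and a 24-bit address; the dummy-clock figures passed in are placeholders, since the real values depend on the part and the bus speed, and read_overhead_clocks is a made-up helper name.

```python
def read_overhead_clocks(opcode_lanes, addr_lanes, dummy_clocks, addr_bits=24):
    """Clocks spent before the first data bit appears.

    opcode_lanes=0 means no opcode is sent at all (continuous read / XIP).
    """
    opcode_clocks = 8 // opcode_lanes if opcode_lanes else 0
    addr_clocks = addr_bits // addr_lanes
    return opcode_clocks + addr_clocks + dummy_clocks


# Placeholder dummy counts: 8 for a 1-bit fast read, 4 for the quad cases.
single_bit = read_overhead_clocks(1, 1, 8)   # opcode and address on one line
quad_cmd = read_overhead_clocks(4, 4, 4)     # 4-bit command mode
xip = read_overhead_clocks(0, 4, 4)          # no opcode phase at all
```

With these placeholder numbers the 4-bit command mode needs 2 clocks for the opcode instead of 8, and XIP only saves those last 2 clocks per access, so the gain from XIP matters mostly for many short random reads.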

cgracey · 2016-09-29 09:00

jmg wrote: »

cgracey wrote: »

How about those quad-exit commands? You mean the $FF patterns of variable CLK counts?

Yes! - the ones given in #1, you asked about before.

Any SD card code will include similar exit-preambles, Cluso has mentioned a couple already.

Such preambles are essentially invisible to the user, unless they scope the pins -- but they do notice if they are not there

Okay, so these should do it, in this order?

Send_1b_CS(8,$ff)       ' classic M4 exit Quad command, as in NXP etc
Send_1b_CS(16,$ffff)    ' classic M4 exit Dual command
Send_1b_CS(2,$ff)       ' nibble coverage, in case any parts lack the above, assumes Pullups on DIO3..DIO0. Parts listed below do not need this.
' <<  by here, BUS should be back in Single-bit mode.
PollBusy                ' confirm device connected, and confirm is not Busy

' Optional firm reset - for some newer, larger parts.
Send_1b_CS(8,$66)       ' newer 66H,99H command pair
Send_1b_CS(8,$99)       ' newer 66H,99H command pair
Wait(>40us)             ' does more than a pin-mode change
PollBusy                ' unclear if poll busy can be used for 40us timing

' check device ID for <> 00,ff as connected device.
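
The sequence above can be sketched as data plus a bounded busy poll. This is only an illustration, not the ROM code: frame_bits, boot_preamble and poll_busy are hypothetical names, the BUSY flag is assumed to be bit 0 of the status register (as on typical SPI NOR parts), and max_polls is an arbitrary limit so the loop cannot hang forever.

```python
def frame_bits(clocks, pattern):
    """Bits shifted out MSB-first while CS is low; only the low `clocks` bits are used."""
    return [(pattern >> i) & 1 for i in range(clocks - 1, -1, -1)]


def boot_preamble(firm_reset=False):
    """Return the ordered steps of the exit sequence as (kind, value) tuples."""
    steps = [
        ("frame", frame_bits(8, 0xFF)),      # exit quad
        ("frame", frame_bits(16, 0xFFFF)),   # exit dual
        ("frame", frame_bits(2, 0xFF)),      # 2-clock nibble coverage
        ("poll_busy", None),
    ]
    if firm_reset:
        steps += [
            ("frame", frame_bits(8, 0x66)),  # reset enable
            ("frame", frame_bits(8, 0x99)),  # reset
            ("wait_us", 40),                 # minimum wait; the source says >40us
            ("poll_busy", None),
        ]
    return steps


def poll_busy(read_status, max_polls=1000):
    """Return True once BUSY (assumed bit 0) clears, False if the poll limit is hit."""
    for _ in range(max_polls):
        if (read_status() & 0x01) == 0:
            return True
    return False
```

A caller would walk the list from boot_preamble(), shifting each frame out under its own CS pulse, and treat a False from poll_busy as "no usable flash" rather than waiting indefinitely.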

I thought of a way I could impede dual/quad function.... Hold pin 58 low during SPI boot read. That would be like standing on someone's foot while their shoelaces are tied together. No booting for wise guys!

cgracey · 2016-09-29 09:08

Another more passive and effective way we could prevent booting with clandestine 2-bit or 4-bit setups would be to toggle pin 59 (supposed to be DI+DO) in a pseudo-random pattern for maybe 100 cycles and make sure no other INA/INB bits reflected those states consistently. If a violation was discovered, we could brick the chip by programming a real random pattern into the fuses. And maybe if wifi was detected over RX/TX, we could alert the IRS of un-American activities. We have ways.

Cluso99 · 2016-09-29 09:49

cgracey wrote: »

Another more passive and effective way we could prevent booting with clandestine 2-bit or 4-bit setups would be to toggle pin 59 (supposed to be DI+DO) in a pseudo-random pattern for maybe 100 cycles and make sure no other INA/INB bits reflected those states consistently. If a violation was discovered, we could brick the chip by programming a real random pattern into the fuses. And maybe if wifi was detected over RX/TX, we could alert the IRS of un-American activities. We have ways.

Or just add an internal FET and connect the I/O volts to the core logic

Ah, I love the smell of burnt chips

Rayman · 2016-09-29 10:33

Sounds like we can have it both ways with the resistor jmg suggests...

Chip can keep it with DI&DO tied together.
All we would ask is to lower CS, output a 1 on DO, toggle the clock twice then raise CS.

If the other Quad data pins have pullup resistors, that should exit QPI mode, right?

Bill Henning · 2016-09-29 16:53

Quick Drive-By notes:

1) I HATE the idea of 3 pin only; 4 pin with resistor is far more flexible, with two more pins following for four bit mode

2) 4 bit SD card has a 32 bit crc per I/O pin as I recall, painful! (but I think I read that the patent may have expired)

3) It has been discovered in Raspberry Pi land that a lot of "UHS-1" SD cards can go WAY beyond 25MHz... 100MHz seems doable in 4 bit mode, may also work in 1 bit mode.
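
To show why the per-line CRC is painful in software, here is a bit-serial sketch. As far as I know the SD data CRC is a 16-bit CCITT CRC (polynomial x^16 + x^12 + x^5 + 1, initial value 0) computed separately for each DAT line, so the 32-bit figure recalled above looks off. The nibble ordering (high nibble first, DAT3 carrying the top bit) is also stated here as an assumption, and the function names are made up.

```python
def crc16_ccitt_bits(bits, crc=0x0000):
    """Bit-serial CRC-16, polynomial 0x1021, MSB first."""
    for bit in bits:
        feedback = ((crc >> 15) & 1) ^ (bit & 1)
        crc = (crc << 1) & 0xFFFF
        if feedback:
            crc ^= 0x1021
    return crc


def bytes_to_bits(data):
    """Expand bytes into a flat MSB-first bit list."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def split_dat_lines(data):
    """Return the bit streams seen on DAT0..DAT3 (high nibble first, DAT3 = top bit)."""
    lines = [[], [], [], []]
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            for line in range(4):
                lines[line].append((nibble >> line) & 1)
    return lines


def per_line_crcs(data):
    """One CRC-16 per data line, in DAT0..DAT3 order."""
    return [crc16_ccitt_bits(bits) for bits in split_dat_lines(data)]
```

Every nibble on the bus feeds four independent CRC registers at once, which is the work that would have to keep up with the nibble stream if 4-bit mode were done without dedicated hardware.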

jmg · 2016-09-29 19:37

cgracey wrote: »

Okay, so these should do it, in this order?

Send_1b_CS(8,$ff)       ' classic M4 exit Quad command, as in NXP etc
Send_1b_CS(16,$ffff)    ' classic M4 exit Dual command
Send_1b_CS(2,$ff)       ' nibble coverage, in case any parts lack the above, assumes Pullups on DIO3..DIO0.
' <<  by here, BUS should be back in Single-bit mode.
PollBusy                ' confirm device connected, and confirm is not Busy

' Optional firm reset - for some newer, larger parts.
Send_1b_CS(8,$66)       ' newer 66H,99H command pair
Send_1b_CS(8,$99)       ' newer 66H,99H command pair
Wait(>40us)             ' does more than a pin-mode change
PollBusy                ' unclear if poll busy can be used for 40us timing

' check device ID for <> 00,ff as connected device.

Yes. Certainly all 3 $ff variants, and Poll, should be in the next test.

The more optional $66,$99 is less vital, so is up to you, as that needs timing, and/or maybe polling.
This command does seem to open a way to recover during ERASE without very long polling delays.

Macronix seems to have the best specs around $66,$99 delays, where they give:
Chip Erase: exit in 100ms, vs < 300s(!) for the erase itself, on one of their larger 512Mb parts.
64k Block Erase: exit in 25ms, vs < 2s for the erase itself.
