Do you intend to connect this board directly to a P2 ES?
Are the board dimensions already defined? Apart from connectors and BGA footprints for two HyperSomething and also two EMMC memories, are there any other ICs that you want to be placed on it?
On the left can be HyperRam or HyperFlash or MCP with both on board.
I'm not sure that eMMC can share the bus, but can test them out separately.
Don't think I need anything else, except some capacitors.
Perhaps you could try to fit some speed enhancement options, such as 1.85 V core and I/O, provided you agree to add the required voltage-level translation IC footprints.
I know there will be significant space and price concerns to take into account, but the forthcoming P2 silicon revision could push the limits of control, data, and clock signaling more than a bit outside the comfort zone of a 3.3 V design.
It looks to me like 250 MHz is the place to be (this is where VGA over HDMI can be).
It'd be interesting to see if it can actually interact at 250 MBPS.
That might be worth the trouble of 1.8 V translation.
Or, maybe P2 can work in digital mode at 1.8 V?
If that could work, don't need second chip to achieve goals... Would save 8 I/O pins too...
Am I to understand that, if you choose to use only the 1.8 V option, you are considering dropping the second HyperSomething and EMMC chips?
Finding level shifters that do not lose speed would be a challenge ?
One test point would be to see if 100MHz is possible, then try incremental moves to 125MHz ( I think it's working now, up to 75MHz ? )
I notice the Octaflash specs quote 133MHz CLK DTR, (3V/133MHz, 1.8V/200MHz) so that might run at 125MHz to give 250MBytes/sec - no need for level shifters ?
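As a quick check of the arithmetic above, here is a minimal sketch. It assumes an 8-bit data bus that moves one byte on every clock edge (DTR); the function name is only illustrative.

```python
def dtr_throughput_mbytes(clk_mhz, bus_bits=8):
    """Peak transfer rate in MB/s for a double-transfer-rate bus.

    Assumes one bus-wide transfer on each clock edge (two per clock cycle).
    """
    transfers_per_clock = 2
    return clk_mhz * transfers_per_clock * bus_bits / 8


for clk in (75, 100, 125, 133):
    print(f"{clk} MHz CLK -> {dtr_throughput_mbytes(clk):.0f} MB/s peak")
```

These are peak figures only; command, address, and latency cycles reduce the sustained rate.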
As we have seen during Rayman and RJSM HyperRam-related software developments, the position of the read-data window can be slid, like the panning controls on professional recording-studio equipment. So just check Nexperia's 74AVCH20T245 datasheet (Rev. 6, 14 January 2019) for some available values.
There are also TI's options to be considered; I'll need to review a lot of previously downloaded datasheets and bookmarks to find the right references.
Since both HyperBUS parts and Octaflash parts spec 200MHz/1.8V, it may be the 100MHz HyperBUS 3v3 number is quite conservative ?
Just make sure you provide some way to adjust both the HyperRam VCC/VCCQ and the VIOs of the directly connected P2 signaling pins, so they can be dialed down towards 2.7 V, and we may all be surprised by the resulting speeds.
Surprised in what way ?
The usual CMOS action is to get slower as Vcc drops, so there is unlikely to be any speed gain in lowering VIO/VCC.
The 1.8V faster spec has more to do with a different die/mode, and a differential clock support.
It may be the 3v3 part operates at 1.8V internally (just like P2 does) and it is slower because it has internal level translators, and has to slew more volts.
Nexperia's datasheet only shows a TSSOP56 (0.5 mm pitch) footprint option. TI's shows TSSOP56 (0.5 mm pitch); DGV (R-PDSO-G**), narrower and shorter, at 0.4 mm pitch; and, for BGA addicts, a JRBGA (Junior???; we'll see...), 6 x 10 (lacking four balls in the central region), at 0.65 mm pitch.
The slower spec for 3.0 V Vcc/VccQ Hyper parts has more to do with the external control/data buses generating and picking up excess switching noise than with any other concern.
As that info could be somewhat "detrimental" from a technological POV for some chipmakers, it was wiped out of reach of regular web crawlers.
It could still be available on the dark side, but I gave up any intention of swimming in that kind of pool a long time ago.
I had a document with that exact information in front of my curious eyes some three years ago (perhaps a little more), and I'm almost sure I saved it somewhere.
Two failing older XP systems and a hard disk replacement on my W 8.1 system have turned the task of finding it, among three rarely used 1 TB external backup units, into a real pain. I could do it anyway, but I have been using that info from my neural backup ever since. It's readily available, no dust removal necessary.
Chip said somewhere that the i/o would operate much slower and weaker at 1.8v, so keep the P2 VIO up at 3v3 and use the DAC modes to drive the digital 1.8v outputs, and PinA>D to watch the inputs
That's exactly the reason I cited the P2's ability to pan the window, as on professional studio equipment.
What ability ? There are no pin-delay abilities in the Smart pin cells, the only user choice is pin-register or not, and that adds a D-FF in each direction to the IO pin.
That does tighten up the aperture spread across the pins, but there is no capability to move or pan, the D-FF sampling point or clock.
I'm talking about the ability to control the relative position of HyperCK, as an outgoing control signal generated by a Smart pin cell, and the moment, timed in Sysclks, at which the reading Streamer grabs the incoming data bus.
Since HyperCK is the stimulus that HyperRam needs to serve each data word (first byte after the CK rising edge, second byte after the CK falling edge), the question of how near to or far from each edge to sample can be settled by periodically sampling a known pattern, in order to find the best data window, at some interval TBD.
Another approach is to insert known contents between data blocks, whose patterns can also be checked on the fly.
Sure, both options consume space and time, but if higher data rates can be confirmed as deterministically achievable through such techniques, perhaps they are worth the extra effort.
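The known-pattern idea could be sketched roughly as follows. This is only a host-side illustration in Python, not P2 code: `read_with_delay` is a hypothetical callback standing in for "read the pattern back with the Streamer started d Sysclks after HyperCK", and the simulated device and its passing delays are made up.

```python
def find_read_window(read_with_delay, expected, delays):
    """Scan candidate delays and return (best_delay, widest_passing_run).

    read_with_delay(d) is a hypothetical callback that reads the known
    pattern back using a sampling delay of d sysclks.
    """
    best_run, run = [], []
    for d in delays:
        if read_with_delay(d) == expected:
            run.append(d)
            if len(run) > len(best_run):
                best_run = list(run)
        else:
            run = []
    if not best_run:
        return None, []
    # Aim for the middle of the widest window to keep margin on both sides.
    return best_run[len(best_run) // 2], best_run


# Simulated device (made-up numbers): data is only valid for delays 10..13.
PATTERN = bytes([0x55, 0xAA, 0x0F, 0xF0])

def fake_read(delay):
    return PATTERN if 10 <= delay <= 13 else bytes(4)

best, window = find_read_window(fake_read, PATTERN, range(6, 18))
print("passing delays:", window, "-> chosen delay:", best)
```

Re-running the scan periodically, or checking marker patterns between data blocks, would let the chosen delay track drift with temperature and voltage.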
I tried using hyperflash with same VGA code as hyperram and it didn't work...
Gives me all $FF when read.
Think I see why now from datasheet:
"HyperFlash write transactions do not support burst sequence and ignore the burst type indication. Write command transactions transfer a single word per write. Only the Word Program command write data transfer may be done with a linear burst at up to 50 MHz."
So, the first part here is clear enough. Can only write 2 bytes at a time. I have no idea what they are trying to tell me with that last sentence...
Ok, I get it now... There's an unlock sequence you have to do first and then a finalizing command after... More complex than HyperRam…
Also, I can only program half a page at a time (512 bytes)…
Actually, that finalizing command is only needed when using the write buffer. Using the write buffer looks like a pain, so I'll just use the "word program" command. Then, it's not so bad. Just the unlock codes and then linear write, just like with HyperRam, but limited to 512 instead of 1024 bytes.
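To illustrate the 512-byte limit, here is a minimal sketch that splits a buffer into pieces for successive word-program bursts. It assumes byte addresses and assumes that a burst must not cross a 512-byte (half-page) boundary; the function name is hypothetical, and the unlock codes and actual bus transfer are not shown.

```python
def split_for_word_program(start_addr, data, limit=512):
    """Yield (address, chunk) pairs that never cross a `limit`-byte boundary."""
    offset = 0
    while offset < len(data):
        addr = start_addr + offset
        room = limit - (addr % limit)      # bytes left before the next boundary
        chunk = data[offset:offset + room]
        yield addr, chunk
        offset += len(chunk)


for addr, chunk in split_for_word_program(500, bytes(1000)):
    print(f"program {len(chunk):4d} bytes at address {addr}")
```

Each yielded chunk would then get its own unlock sequence followed by one linear write.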
Hi Rayman
You'll also need to slow Hyper_Clock down to 50 MHz or less, and follow the Burst Write protocols as described in the part's datasheet.
Despite the natural similarities, since both belong to the HyperBus device family, be aware that latency counts are used differently than with HyperRams.
If you intend to use the same command structure you have been using, it will be valid only for Read commands, and the number of latency Hyper_CK counts is different. Please check the "VCR and NVCR Configuration Register Bit Assignments" table and its explanation, to avoid being trapped right from the beginning.
Extracted from an ISSI part datasheet (IS26KS128S/256S/512S):
"A write operation starts with the first three clock cycles providing the CAx (Command / Address) information indicating the transaction characteristics. The Burst Type bit CA[45) is ‘don’t care’ because the HyperFlash device only supports a single write transaction of 16b or a continuous linear write burst that is only supported when loading data during a Word Program command. Immediately following the CA information the host is able to transfer the write data on the DQ bus."
During the 512-byte burst program, you are writing to a Program Buffer, which is later written to the internal flash array.
HyperRams use a similar process, but you don't need to poll any status bit to verify that the intended action has completed.
Also, HyperRams don't refuse to program any value into the main memory array.
HyperFlashes will, if the value can't be programmed because the previous contents of a location are incompatible with the value you intend to write over them.
It's like sleeping on a bed of steel nails after retiring. Each turn, a fresh new pain... and scars....
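The incompatibility mentioned above comes from the usual NOR flash rule: programming can only change bits from 1 to 0, and only an erase brings them back to 1. A minimal sketch of a pre-check, assuming 16-bit words and a hypothetical function name:

```python
def can_program(old_word, new_word):
    """True if new_word can be programmed over old_word without an erase.

    Programming only clears bits, so every 1 bit wanted in new_word must
    already be 1 in old_word.
    """
    return (old_word & new_word) == new_word


print(can_program(0xFFFF, 0x1234))  # erased word: any value is fine
print(can_program(0x1234, 0x1230))  # only clears bits: fine
print(can_program(0x1230, 0x1234))  # would need a 0 -> 1 change: needs erase
```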
P.S. The following text was supposed to be at the top... bad eyes here...
Those minuet-like protocols are a real pain.
And even when you get to understand them, and craft some software to deal with the interface signaling, you'll need to add extensive comments to each line of code. Otherwise you'll get caught again at some future time, trying to leverage the code you have without paying much attention to the subtle differences introduced by whatever new ways of using it you have found.
Seems I didn't read the datasheet closely enough...
Starting to get it now.
They don't use the HyperBus Command/Address scheme of using bit #46 to specify if read is from memory or register space.
Instead, they use overlay commands to make registers appear in memory space.
This is a bit of a nightmare... Says they did it for driver compatibility with earlier memory...
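For reference, the regular HyperBus Command/Address word mentioned above can be sketched like this. The bit positions follow my reading of the HyperBus spec (CA[47] R/W#, CA[46] address space, CA[45] burst type, CA[44:16] upper word address, CA[2:0] lower word address); treat them as an assumption and check them against the part's datasheet. The function name is hypothetical.

```python
def pack_ca(read, register_space, linear_burst, word_addr):
    """Build the 48-bit Command/Address word as 6 bytes, first byte sent first.

    Assumed layout: CA[47] = 1 for read, CA[46] = 1 for register space,
    CA[45] = 1 for linear burst, CA[44:16] = word address bits above bit 2,
    CA[15:3] reserved (zero), CA[2:0] = word address bits 2..0.
    """
    ca = (int(read) << 47) | (int(register_space) << 46) | (int(linear_burst) << 45)
    ca |= ((word_addr >> 3) & 0x1FFFFFFF) << 16
    ca |= word_addr & 0x7
    return ca.to_bytes(6, "big")


print(pack_ca(True, False, True, 0).hex())        # linear read from word 0
print(pack_ca(False, False, False, 0x555).hex())  # write to word address $555
```

With HyperFlash, bit 46 stays at memory space and the registers are reached through the overlay commands instead.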
LOL. So instead of cleaning up the mess or making it compatible with the more reasonable HyperRam interface, let's make it compatible with the nightmare interface. Wow, true genius... NOT.
Looks like just a fixed delay with the default latency will get the correct timing.
Replaced waitse1 with waitx.
But, the waitx delay seems to depend on clock frequency.
At 250 MHz, #64 or #65 work. At 80 MHz, #62 or #63 work.
The required delay does not seem to depend on whether data bus is on P0..P7 or P32..P39 (thankfully).
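One possible reading of those numbers, offered only as a guess: if the required wait is a fixed count of clocks plus a fixed analog delay (pin, board, and memory output delay), then the count grows linearly with frequency. The sketch below fits that assumed model to the midpoints of the two working ranges (64.5 at 250 MHz, 62.5 at 80 MHz); the model and the function name are my assumptions, not something measured.

```python
def fit_delay_model(f1_mhz, n1, f2_mhz, n2):
    """Solve n = n0 + t_fixed * f from two (frequency, clock count) points.

    Returns (n0, t_fixed_ns): the frequency-independent clock count and the
    fixed delay in nanoseconds.
    """
    t_fixed_us = (n1 - n2) / (f1_mhz - f2_mhz)  # clocks per MHz = microseconds
    n0 = n1 - t_fixed_us * f1_mhz
    return n0, t_fixed_us * 1000.0


n0, t_ns = fit_delay_model(250, 64.5, 80, 62.5)
print(f"fixed part: about {n0:.1f} clocks, analog part: about {t_ns:.1f} ns")
```

With only two data points this cannot be validated; a third measurement at another clock frequency would show whether the linear assumption holds.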
Are you using the clocked IOs modes here ?
That tightens up the timing skews, (and adds another clock delay at the pin), but the tighter delays should be ok 80MHz~250MHz.
Evanh's tests before indicate somewhere above 300MHz for a clocked-step effect, but much lower for non-clocked.
Comments
A useful speed and pinout comparison is here :
http://www.macronix.com/Lists/ApplicationNote/Attachments/2013/AN0473V2 - Comparing Cypress S26KL512S to Macronix MX25LM51245G.pdf
I see they also mention SO16-300 package ?
that specs a range of 0.5~3.9 ns delay spread in the fast direction and 0.5~5.1 ns in the slow direction.
https://assets.nexperia.com/documents/data-sheet/74AVCH20T245.pdf
ti.com/lit/ds/symlink/sn74avch20t245.pdf
In other words, options for every taste.
But, if we were doing a lot of small reads, we'd probably want to minimize latency...
Good point!
Not having any luck programming it yet...
I've been trying to read the Device ID and always getting $FF, no matter what I do...
Finally got some zeros when following the instructions for reading the status register. Getting $00,$80, which appears to be the correct power-up value.
Thought it was dead on with HyperRam. Have to see what's going on here...
So, this technically works as VGA buffer, since not using all of a 1024 byte row. But, not ideal...