Spansion HyperBUS for P1, P1V, & P2 ?
jmg
The Spansion HyperBus is close to full release. There is mention on the web of an S26KL512S HyperFlash demo board, but no price/order info yet.
Some HyperBUS timing info is here
http://core.spansion.com/article/spansion-hyperbus-interface-enables-breakneck-read-throughput-speeds/#.VDMX7lcjAkI
Published device codes are S26KL128S S26KL256S S26KL512S (S26KL512S due first)
There is also web mention of possible upcoming RAM devices using this, no part codes yet.
I think there is growing potential in External Memory, and considerable overlap in ALL the languages that run a byte-code-like inner engine. ( PropGCC, Tachyon, eBasic3 etc )
SPI was good, but a little slow, especially with the lack of HW SPI support.
Then came QuadSPI, which has more potential, but I think the break-through device for P1, P1V, (and P2) will be the Spansion Hyperbus.
HyperBUS is 8 bits wide, clocked and DDR, and I think it does not use PLLs, so it is tolerant of pauses in the clock.
A P1 software version will never push the limits, but P1V and P2 are very different and can bring HW support to this.
The beauty of this is that the SAME software flows work on all 3, just faster each time.
Unlike QuadSPI, the commands are byte-wide, and it needs just 6 writes to fully define a command and address.
(At the hardware level, I think a simple 1G57-type delay would let the same byte write also produce a CLK edge, using the Prop 9-bit mode. That avoids needing WriteByte/WriteClock/WriteByte/WriteClock pairs, so it can double the software speed.)
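A sketch of the 9-bit idea, assuming a hypothetical pin map with DQ[7:0] on P0..P7 and CK on P8: each OUTA value carries one byte plus the next CK level, so six writes produce the six CA edges with no separate clock writes. The external delay that skews CK behind the data is assumed, not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pin map: DQ[7:0] on P0..P7, CK on P8. */
#define CK_BIT (1u << 8)

/* Pack n bytes into 9-bit OUTA values where the CK level toggles on
 * every write, so each write both presents a byte and produces one
 * clock edge (rising for even i, falling for odd i). Assumes an
 * external delay (e.g. a 1G57-type gate) makes the CK edge arrive
 * after DQ has settled. */
static void pack_ddr_writes(const uint8_t *bytes, size_t n, uint32_t *outa)
{
    assert(bytes != NULL && outa != NULL);
    for (size_t i = 0; i < n; i++) {
        uint32_t ck = (i % 2 == 0) ? CK_BIT : 0u;   /* CK idles low */
        outa[i] = ck | bytes[i];
    }
}
```

In PASM the packed values would simply be written to OUTA in sequence; the loop here only shows how data and clock share one write.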
Read then specifies a 5-clock latency; RDS shows when data is available, and data can stream from there.
(It is unclear whether that scales with CLK speed, or is always 5.)
Write currently looks to be 16-bit only, but should be plenty fast enough.
A P1 should manage a ~4 MBytes/sec write loop, which is likely to be well above the Flash write speed, and is faster than any download link. (Spansion SPI data gives write ~1 MB/s.)
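The ~4 MBytes/sec figure is simple arithmetic. A sketch, assuming an 80MHz P1, 4 clocks per PASM instruction, and a hypothetical inner loop of 5 instructions per byte (hub instructions take longer, so they are assumed to be outside the loop):

```c
#include <assert.h>
#include <stdint.h>

/* Rough throughput of a software-driven write loop:
 *   bytes/s = f_sys / (clocks_per_instruction * instructions_per_byte)
 * Valid only if the inner loop uses fixed-time (non-hub) instructions. */
static uint32_t loop_bytes_per_sec(uint32_t f_sys_hz,
                                   uint32_t clocks_per_instr,
                                   uint32_t instr_per_byte)
{
    assert(clocks_per_instr > 0 && instr_per_byte > 0);
    return f_sys_hz / (clocks_per_instr * instr_per_byte);
}
```

loop_bytes_per_sec(80000000, 4, 5) returns 4000000; trimming the loop to 4 instructions per byte would give 5000000.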
Suggestion: Get this working in PASM first, on P1 + PropGCC, and then look at how P1V can improve the throughput.
Comments
BGA is less than ideal, but with SMD flows it is no big issue.
There is an eval board, as they understand no one can lash up a BGA.
RAM has been mentioned on the same bus, but no part codes / release dates yet.
HyperRAM™ products coming :
http://news.spansion.com/2014-10-09-Spansion-and-ISSI-to-Develop-RAM-Products-based-on-Breakthrough-Spansion-HyperBus-Interface
"Spansion and ISSI plan to release the first HyperRAM products in the first half of 2015 and will provide additional details at that time"
Some devices will mix RAM and FLASH
["By combining HyperFlash™ and HyperRAM components on a single bus, chipset providers can reduce controller pincount, accommodate smaller packages, simplify PCB designs, and potentially replace or reduce DRAM, resulting in significant cost savings while dramatically improving performance"]
Despite an IP license necessary for the use of HyperBus, Spansion is confident that the interface will be designed into a variety of products including SoCs, MCUs, memories and peripherals. Spansion's Ino said at a press conference, "More than two leading SoC vendors are already implementing the new HyperBus Interface on their chips."
In addition, Freescale is rolling out a number of HyperBus-based MCUs in the near future. Freescale has worked with Spansion to develop the interface, said Ray Cornyn, VP of product management for Freescale's automotive microcontrollers business.
It should be very easy for Parallax to check, and I'd expect any chip vendor to be VERY keen to sell more chips, so the license should be nominal (i.e. like NXP's I2C-bus terms), allowing many devices to connect with little effort.
Otherwise, the alternative could be something like the 'Two-wire interface' naming Atmel uses to avoid saying I2C.
Parallax could add a 'memory state engine for 1-2-4-8 wide SPI', including a mode that generates a HyperBus address frame and accepts a CLK-in signal.
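Where a 1-2-4-8 wide engine differs is mainly in how many clocks an address frame costs. A sketch, assuming a 48-bit frame (the HyperBus CA size) is pushed through each width; a real QuadSPI part also has its own command phase, which this ignores. frame_clocks is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Clock cycles needed to shift a frame of frame_bits over a bus
 * `width` bits wide, SDR (1 transfer per clock) or DDR (2 per clock). */
static uint32_t frame_clocks(uint32_t frame_bits, uint32_t width, bool ddr)
{
    assert(width == 1 || width == 2 || width == 4 || width == 8);
    uint32_t transfers = (frame_bits + width - 1) / width;  /* edges used */
    return ddr ? (transfers + 1) / 2 : transfers;
}
```

For 48 bits: 8-wide DDR takes 3 clocks (the 6-edge HyperBus frame), 4-wide DDR takes 6, 4-wide SDR takes 12, and 1-wide SDR takes 48.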
mark wrote:
Of course, there is a protocol of commands and addressing that can be layered on top of all that. A bit like ECP printer mode. SPI itself obviously has a lot of baggage when it comes to modes and commands.
Almost. HyperBus is DDR-clocked, which only some QuadSPI parts offer as a mode, and HyperBus is always byte-wide, so the serial command byte is not needed. (Some SPI parts have special sticky-quad modes, but all that makes simple HW support complex.)
HyperBus has a fixed, 6-edge Command/Address frame, and it looks like RDS (the read clock) is gated/delayed so its first edge coincides with the first valid data out. That looks like a nice way to manage varying latencies, as well as allowing higher clock speeds.
On that basis, if a Prop2 can only pull 200MHz internal clocking then it'll be limited to 100MHz clock on a DDR interface.
Do you mean to send, or receive ?
DDR can be designed with a MUX at the pins, so an 8-bit DDR bus can be fed from a 100MHz 16-bit bus, with CLK high/low selecting each half-clock's data.
Likewise on receive, you can use 8 rising-edge and 8 falling-edge registers, and then process x16 at CLK speed.
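A software model of the MUX and paired-register idea, assuming the rising-edge byte is the upper half of each 16-bit word (my reading of HyperBus word ordering, worth checking). Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Receive side: one 8-bit register captures DQ on the rising edge,
 * another on the falling edge, and the pair is handed on as one
 * 16-bit word per CK cycle. */
static uint16_t ddr_merge(uint8_t rise, uint8_t fall)
{
    return (uint16_t)(((uint16_t)rise << 8) | fall);
}

/* Transmit side: a MUX at the pins selects which half of the 16-bit
 * word is driven, based on the CK phase. */
static uint8_t ddr_mux(uint16_t word, int ck_high)
{
    return ck_high ? (uint8_t)(word >> 8) : (uint8_t)(word & 0xFFu);
}
```

The internal logic only ever sees 16-bit words at CK rate; the double-rate part lives entirely at the pins.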
I expect the upper P2 DDR limit will be set mainly by the pin drivers, and I doubt > 100MHz is practical, but even that gives 200MB/s.
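The numbers in this exchange follow from two DDR transfers per clock. A sketch of the peak-rate arithmetic, assuming pins change at most once per system clock (so the DDR clock is half the system clock) and ignoring CA-frame and latency overhead:

```c
#include <assert.h>
#include <stdint.h>

/* If pins can change once per system clock, a DDR CK can toggle at most
 * once per system clock, i.e. CK runs at half the system clock. */
static uint64_t ddr_max_clk_hz(uint64_t sys_hz)
{
    return sys_hz / 2u;
}

/* Peak DDR bandwidth: two transfers per clock, width_bits per transfer. */
static uint64_t ddr_bytes_per_sec(uint64_t clk_hz, uint32_t width_bits)
{
    assert(width_bits % 8u == 0u);
    return clk_hz * 2u * (width_bits / 8u);
}
```

A 200MHz system clock gives a 100MHz CK, and 100MHz on an 8-bit DDR bus gives 200MB/s peak.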
Some of the newest DRAMs use PLLs, but that is not mandatory for DDR itself, it is used mainly to meet timing.
The Hyperbus shows CLK start/stop, so it should not have any PLL, and the RDS (clock echo) manages round-trip delays.
In addition to a license fee, it goes against the Propeller philosophy of soft peripherals. All of a sudden, you have a specific HW interface built into the Prop using a hard-defined set of pins. If you live by soft peripherals, you should live by soft memory interfaces (yes, there is a price to pay).
What "philosophy" ?
The Video DACs already break all those claimed 'rules'.
There are good reasons to NOT wiggle every action in software, and any chip that wants to be taken seriously, needs to have performance figures to match.
The P2 SmartIO pins look to be addressing a lot of the speed issues.
Except that the prop has a HW video generator and counters :P
There seem to be plenty of interfaces which the Propeller could manage well enough, but there are a lot that it can't. Of course, it's not reasonable to expect gigabit Ethernet, but it *is* reasonable to expect decent performance from common interfaces like USB and external memory, at least in the form of "dedicated" helper instructions and core hardware support. I'm not going to go as far as to say HyperBus should be implemented (though it does look pretty darn nice), but things that might provide significant value to customers shouldn't be brushed off the table strictly because of some "philosophy" that was never fully adhered to in the first place. After all, there are plenty of examples of AVRs and whatnot generating video with no specialty hardware, and why have counters when you can just do it slower and crappier in software?
Here is this inner loop.
As you see, it is simply a rdbyte from HUB, and the byte is then used in a jump.
HyperBus probably excels at block moves, not single random byte fetches,
so I don't see an immediate speedup.
Maybe you see one?
In P1V there is clear scope to merge READ and INC into one opcode. Given that all serial memory has two speeds, a smarter opcode that did INC.Compare.Jump.Read would let those cases where a +1 read is valid skip generating a new address preamble. (That opcode is not limited to HyperBus: ANY serial memory usually has random and streaming modes, and I assume HyperBus flash behaves like QuadSPI flash when streaming.)
Any time the next read <> (last+1), an address preamble is needed.
With HW, that can run at up to ~100MHz, so perhaps 80~100ns random access, and linear access could be essentially instant (no hub-slot delays).
This type of read is also useful for FONT fetch, which would be the other major use of off-chip serial memory.
Addendum:
A two-die QuadSPI design that used Quad-DDR for all transfers (e.g. a 48-bit command/address preamble) would give similar speeds to HyperBus, without the licensing questions.
Google shows that some vendors use 'DTR' for DDR, and some parts can make the command 'sticky' to get very close to HyperBus (avoiding the very slow repeat of a command sent 1 bit per edge).
eg ISSI mention in IS25LP032-064-128.pdf
FRQDTR (Fast Read Quad IO DTR Mode) OPERATION (without command decode cycles)
which needs 8 edges for address/command, and Data out appears after the 13th edge.
At the 66MHz DTR spec speed, that is ~ 200ns random access.
Given that internal memory is 32-bit, and a 32-bit read over QuadSPI (DDR/DTR) also makes more sense, an opcode that 'pico-cached' 32-bit values, checking whether the bytecode is already in the 32-bit holding register before issuing an external memory read, would give speed gains in all memory-feed cases.
So I provided the inner loop.
The complete implications are currently a little above my comprehension.
Being an interpreter, you have the advantage of being able to rewire it to keep using HubRAM for inner-loop fetches, via a program cache that prefills and holds active Forth code for fast execution. However, code will be the easy part, methinks. A dataspace that lives in external memory will be trickier.
Spansion's 64Mb HyperRAM will be sampling in 2Q15. It will be available in both 3V and 1.8V versions packaged in the market compatible 5x5 array BGA.
The Spansion HyperFlash memory family will offer 3V and 1.8V power-supply versions and initially include three densities: 128Mb, 256Mb and 512Mb, with 512Mb samples available.
Spansion HyperRAM memory can operate at frequencies as fast as 166MHz in DDR mode with a fast, random initial access time of 36ns
That 36ns Tacc is ~6 cycles at 166MHz.
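Converting an access time into whole clock cycles is just cycles = ceil(Tacc x f). A sketch in integer math (picoseconds times kHz), which reproduces the ~6 cycles for 36ns at 166MHz; access_cycles is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Whole clock cycles needed to cover an access time: ceil(t_acc * f).
 * Units: t in picoseconds, f in kHz, so t*f is in units of 1e-9 cycles. */
static uint32_t access_cycles(uint32_t t_acc_ps, uint32_t f_khz)
{
    assert(f_khz > 0);
    uint64_t prod = (uint64_t)t_acc_ps * f_khz;
    return (uint32_t)((prod + 999999999u) / 1000000000u);
}
```

access_cycles(36000, 166000) returns 6; the same 36ns at 100MHz would need 4 cycles.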
- an obvious candidate to go into a P1V variant.
64Mb of small, easy-to-access RAM, with 128Mb of FLASH at XIP speeds.
https://www.micron.com/products/nor-flash/xtrmflash
https://www.micron.com/~/media/documents/products/technical-note/nor-flash/xtrmflash/tn2508_spi_flash_design_flexibility.pdf
Looks very similar, down to the Clock return (DQS?) for highest speed transfers.
Claims "random access times as fast as 83 nanoseconds (ns) and sequential byte reads as fast as 2.5ns"
128Mb to 2Gb, and initial samples are available.
A P2 version would be limited by process pin-toggle rates to a little below that, but it follows the HyperFLASH thrust.
Also says
["Micron fully intends for XTRMFlash memory and the interface to become widely adopted open standards, and as a testament to that effort and the XTRMFlash memory value proposition, flash memory provider Winbond Electronics has signed a license agreement to develop XTRMFlash-compatible solutions."]
Ideal target for Smart Pins.
HYPERFLASH-EVBSK-01
- Parallax could buy all of them, given the price showing is $0.0702
Also showing for HyperBUS is this $99 M4 module
http://www.cypress.com/documentation/development-kitsboards/fm4-176l-s6e2dh-arm-cortex-m4-fm4-graphics-mcu-starter-kit?source=search&keywords=hyperflash&cat=software_tools
Includes this: 512Mb Spansion HyperFlash™ memory for image storage
http://www.issi.com/US/product-flash.shtml#jump1