Spansion HyperBUS for P1, P1V, & P2 ?

jmg · 2014-10-06 16:34

The Spansion HyperBus is close to full release. There is mention on the web of an S26KL512S HyperFlash Demo Board, but no price/order info yet.

Some HyperBUS timing info is here
http://core.spansion.com/article/spansion-hyperbus-interface-enables-breakneck-read-throughput-speeds/#.VDMX7lcjAkI

Published device codes are S26KL128S S26KL256S S26KL512S (S26KL512S due first)
There is also web mention of possible upcoming RAM devices using this, no part codes yet.

I think there is growing potential in External Memory, and considerable overlap in ALL the languages that run a byte-code-like inner engine. ( PropGCC, Tachyon, eBasic3 etc )
SPI was good, but a little slow especially with the lack of HW SPI support.
Then came QuadSPI, which has more potential, but I think the break-through device for P1, P1V, (and P2) will be the Spansion Hyperbus.

HyperBUS is 8 bits wide, clocked, DDR, and I think does not use PLLs so is tolerant of pauses in the clocks.

A P1 SW version will never push the limits, but a P1V and P2 are very different, and can bring HW support to this.
The beauty of this, is the SAME software flows work on all 3, just faster each time.

Unlike QuadSPI, the commands are byte wide, and it needs just 6 writes to fully define a command and address.
(At the HW level, I think a simple 1G57-type delay can derive a CLK edge from the same byte write, using the Prop 9-bit mode. That avoids WriteByte/WriteClock/WriteByte/WriteClock pairs, so it could double SW speed.)
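
As a rough illustration of the 6-write command/address frame, here is a minimal Python sketch. It assumes a 48-bit frame sent most-significant byte first. The field positions (read flag in the top bit, word address split into an upper and a lower part) are an assumption made for illustration, not something confirmed in this thread, and `pack_ca` / `unpack_ca` are hypothetical names.

```python
# Sketch only: the 48-bit layout below is ASSUMED for illustration.
# Check the real data sheet before relying on any bit position.

def pack_ca(read, word_addr):
    """Build a 6-byte command/address frame, most-significant byte first."""
    if not 0 <= word_addr < (1 << 32):
        raise ValueError("word_addr out of range for this sketch")
    upper = word_addr >> 3        # assumed: upper address bits in CA[44:16]
    lower = word_addr & 0x7       # assumed: low 3 address bits in CA[2:0]
    ca = (int(bool(read)) << 47) | (upper << 16) | lower
    return ca.to_bytes(6, "big")  # 6 byte-wide writes, one per clock edge


def unpack_ca(frame):
    """Inverse of pack_ca, handy for checking a software bit-banger."""
    ca = int.from_bytes(frame, "big")
    read = bool((ca >> 47) & 1)
    upper = (ca >> 16) & ((1 << 29) - 1)
    lower = ca & 0x7
    return read, (upper << 3) | lower
```

On a P1 the six bytes returned by `pack_ca` would map onto the six port writes mentioned above, with the clock toggled once per byte.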

Read then specifies a 5-clock latency; RDS shows when data is available, and the part can stream from there.
(Unclear whether that scales with CLK speed, or is always 5.)

Write currently looks to be 16b only, but should be plenty fast enough.
A P1 should manage ~ 4MBytes/sec write loop, which is likely to be well above the Flash Write speed, and is faster than any download link. (Spansion SPI data gives write ~ 1MB/s)

Suggestion: Get this working in PASM first, on P1 + PropGCC, and then look at how P1V can improve the throughput.

Ale · 2014-10-07 09:50

It sounds quite good. I know P1V can do differential clocks... but will P2 bring something like that ?... the memories have also a BGA package... Do they have some RAM with a similar interface ?

jmg · 2014-10-07 11:52

Ale wrote: »

It sounds quite good. I know P1V can do differential clocks... but will P2 bring something like that ?... the memories have also a BGA package... Do they have some RAM with a similar interface ?

I think the Differential clocks are optional, so this should work on P1
BGA is less than ideal, but with SMD flows, is no big issue.
There is an eval board, as they understand no one can lash-up a BGA.

RAM has been mentioned on the same bus, but no part codes / release dates yet.

jmg · 2014-10-20 15:06

an update has this news
HyperRAM™ products coming :

http://news.spansion.com/2014-10-09-Spansion-and-ISSI-to-Develop-RAM-Products-based-on-Breakthrough-Spansion-HyperBus-Interface

"Spansion and ISSI plan to release the first HyperRAM products in the first half of 2015 and will provide additional details at that time"

Some devices will mix RAM and FLASH

["By combining HyperFlash™ and HyperRAM components on a single bus, chipset providers can reduce controller pincount, accommodate smaller packages, simplify PCB designs, and potentially replace or reduce DRAM, resulting in significant cost savings while dramatically improving performance"]

mark · 2014-10-20 20:11

This sounds like a nice balance between something like SPI sram and parallel s/dram. However, I have to wonder how much traction this interface will gain, as it would be a major decision to incorporate native hardware support for it. Also, will it be an open standard, or does the technology have to be licensed (as I presume this would be a deal breaker for Parallax)? Lastly, I can't see how it would "dramatically" improve performance, relative to parallel ram that is, or result in significant cost savings since it's unlikely to be cheaper than other common forms of ram and flash. That said, I really would like this to be a viable option.

jmg · 2014-10-20 20:32

Licensing is a good point, I found this

Despite an IP license necessary for the use of HyperBus, Spansion is confident that the interface will be designed into a variety of products including SoCs, MCUs, memories and peripherals. Spansion's Ino said at a press conference, "More than two leading SoC vendors are already implementing the new HyperBus Interface on their chips."

In addition, Freescale is rolling out a number of HyperBus-based MCUs in the near future. Freescale has worked with Spansion to develop the interface, said Ray Cornyn, VP of product management for Freescale's automotive microcontrollers business

Should be very easy for Parallax to check, and I'd expect any chip vendor to be VERY keen to sell more chips, so the license should be nominal (i.e. like NXP's I2C-bus terms), to allow many devices to connect with little effort.

Otherwise, the alternative could be like the 'Two wire interface' semantics used by Atmel to avoid saying I2C.
Parallax can add a 'Memory state engine for 1-2-4-8 wide SPI', and include a mode that can generate a HyperBus address frame and accept a CLK-in signal.


mark · 2014-10-20 23:28

I have to wonder when or if we'll see 8-bit SPI. Of course there's 8-bit MMC for flash widely used in phones/tablets, and apparently atmel had something similar called Rapid8, but I guess there has never been any interest in using these interfaces for anything other than flash.

evanh · 2014-10-21 00:10

Hehe, given QSPI doesn't serialise its data - it's actually a nibble-mode parallel port - that would make an 8-bit version the same as Hyperbus, which is pretty close to identical to a Centronics port (aka PC parallel port).

Of course, there is a protocol of commands and addressing that can be layered on top of all that. A bit like ECP printer mode. SPI itself obviously has a lot of baggage when it comes to modes and commands.

evanh · 2014-10-21 01:10

A 256MB MRAM part would be cool.

jmg · 2014-10-21 02:22

evanh wrote: »

Hehe, given QSPI doesn't serialise its data - it's actually a nibble-mode parallel port - that would make an 8-bit version the same as Hyperbus ....

Almost. Hyperbus is DDR clocked, which only some QuadSPI parts have as a mode, and Hyperbus is always in byte-wide mode, so the serial command byte is not needed. (Some SPI parts have special sticky-quad modes, but all that makes a simple HW support complex.)
Hyperbus has a fixed, 6-edge Command/Address frame, and it looks like the RDS (RdClk) is gated/delayed so first edge is on first valid data out. That looks a nice way to manage varying latencies, as well as allowing higher clock speeds.

evanh · 2014-10-21 03:13

Speaking of which, has anyone come across a DDR functional diagram? I'm guessing it requires an internal clock at least double that of the external bus.

On that basis, if a Prop2 can only pull 200MHz internal clocking then it'll be limited to 100MHz clock on a DDR interface.

jmg · 2014-10-21 12:25

evanh wrote: »

Speaking of which, has anyone come across a DDR functional diagram? I'm guessing it requires an internal clock at least double that of the external bus.

On that basis, if a Prop2 can only pull 200MHz internal clocking then it'll be limited to 100MHz clock on a DDR interface.

Do you mean to send, or receive ?
DDR can be designed with a MUX at the pins, so 8 bit DDR can feed from 100MHz 16 bit bus, and the CLK H/L selects each half-clock data.
Likewise on receive, you can use 8 x Rising and 8 x falling edge registers, and then process x16 at CLK speed.

I expect the Upper P2 DDR limit will be set mainly by the pin-drivers, and I would doubt > 100MHz is practical, but even that gives 200MB/s
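
The mux idea and the 200MB/s figure can be sketched in a few lines of Python. This is only a model of the data path, not of any real pin hardware: the choice of which byte goes out on which clock phase is an assumption, and `ddr_split`, `ddr_merge` and `ddr_bandwidth_mb_s` are hypothetical names.

```python
def ddr_split(words16):
    """Turn 16-bit words (one per CLK) into the byte stream on an 8-bit DDR bus."""
    out = []
    for w in words16:
        out.append((w >> 8) & 0xFF)  # assumed: high byte while CLK is high
        out.append(w & 0xFF)         # assumed: low byte while CLK is low
    return out


def ddr_merge(bytes8):
    """Recombine rising-edge and falling-edge captures into 16-bit words."""
    if len(bytes8) % 2:
        raise ValueError("DDR capture needs an even number of bytes")
    return [(bytes8[i] << 8) | bytes8[i + 1] for i in range(0, len(bytes8), 2)]


def ddr_bandwidth_mb_s(clk_mhz, bus_bytes=1):
    """Two transfers per clock: MB/s = MHz * 2 * bus width in bytes."""
    return clk_mhz * 2 * bus_bytes
```

With an 8-bit bus at 100MHz, `ddr_bandwidth_mb_s(100)` gives the 200MB/s quoted above, while the internal logic only ever handles 16-bit values at CLK speed.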

Some of the newest DRAMs use PLLs, but that is not mandatory for DDR itself, it is used mainly to meet timing.
The Hyperbus shows CLK start/stop, so it should not have any PLL, and the RDS (clock echo) manages round-trip delays.

evanh · 2014-10-21 14:26

True, 100MHz is enough. Parallax would be licensing Hyperbus, methinks. I'd be happy with SDRAM.

mark · 2014-10-21 16:40

Just stack the P2 on a ram die and wire bond the two together and call it a day.

For non-volatile storage we can go with eMMC.

Cluso99 · 2014-10-21 17:02

It's a licence fee that Parallax doesn't need because it adds to the end price of every P2.

mindrobots · 2014-10-21 17:33

I'll say it and poke this hornet's nest....

in addition to a license fee, it goes against the Propeller philosophy of soft peripherals. All of a sudden, you have a specific HW interface built into the Prop using a hard defined set of pins. If you live by soft peripherals, you should live by soft memory interfaces (yes, there is a price to pay).

There's a Diet Pepsi next to me so I get product placement points from my sponsor!

jmg · 2014-10-21 17:59

mindrobots wrote: »

in addition to a license fee, it goes against the Propeller philosophy of soft peripherals. All of a sudden, you have a specific HW interface built into the Prop using a hard defined set of pins. If you live by soft peripherals, you should live by soft memory interfaces (yes, there is a price to pay).

What "philosophy" ?
The Video DACs already break all those claimed 'rules'.

There are good reasons to NOT wiggle every action in software, and any chip that wants to be taken seriously, needs to have performance figures to match.
The P2 SmartIO pins look to be addressing a lot of the speed issues.

mark · 2014-10-21 18:08

mindrobots wrote: »

I'll say it and poke this hornet's nest....

in addition to a license fee, it goes against the Propeller philosophy of soft peripherals. All of a sudden, you have a specific HW interface built into the Prop using a hard defined set of pins. If you live by soft peripherals, you should live by soft memory interfaces (yes, there is a price to pay).

There's a Diet Pepsi next to me so I get product placement points from my sponsor!

Except that the prop has a HW video generator and counters :P

There seem to be plenty of interfaces which the Propeller could manage well enough, but there are a lot that it can't. Of course, it's not reasonable to expect it to do gigabit Ethernet, but it *is* reasonable to expect decent performance from common interfaces like USB and external memory - at least in the form of "dedicated" helper instructions and core hardware support. I'm not going to go as far as to say Hyperbus should be implemented (though it does look pretty darn nice), but certain things that might provide significant value to customers shouldn't be brushed off the table strictly because of some "philosophy" that was never fully adhered to in the first place. After all, there are plenty of examples of AVRs and whatnot that are generating video with no specialty hardware, and why have counters when you can just do it slower and crappier in software?

MJB · 2014-10-23 06:20

jmg wrote: »

The Spansion HyperBus is close to full release. There is mention on the web of an S26KL512S HyperFlash Demo Board, but no price/order info yet.

Some HyperBUS timing info is here
http://core.spansion.com/article/spansion-hyperbus-interface-enables-breakneck-read-throughput-speeds/#.VDMX7lcjAkI

Published device codes are S26KL128S S26KL256S S26KL512S (S26KL512S due first)
There is also web mention of possible upcoming RAM devices using this, no part codes yet.

I think there is growing potential in External Memory, and considerable overlap in ALL the languages that run a byte-code-like inner engine. ( PropGCC, Tachyon, eBasic3 etc )

hi jmg, you had asked about the options for using the hyperbus to load code in the inner loop of TACHYON.
Here is this inner loop.
As you see, it is simply a rdbyte from HUB, and the byte is then used in a jump.
The hyperbus probably excels at block moves, not single random byte fetches.
so I don't see an immediate speedup.
Maybe you see it ??

{ *** TACHYON RUNTIME BYTECODE INTERPRETER *** }

'    *    *    *    *
' Fetch the next byte code instruction in hub RAM pointed to by the instruction pointer IP
' This is the very heart of the runtime interpreter
'
doNEXT        rdbyte    instr,IP        'read byte code instruction
              add       IP,#1 wc        'advance IP to next byte token (clears the carry too!)
              jmp       instr           'execute the code by directly indexing the first 256 long in cog

jmg · 2014-10-23 14:23

MJB wrote: »

As you see, it is simply a rdbyte from HUB, and the byte is then used in a jump.
The hyperbus probably excels at block moves, not single random byte fetches.
so I don't see an immediate speedup.
Maybe you see it ??

I'm not quite sure what you are asking, but any external memory will be slower than on-chip RAM reads, so the issue is less 'speedup' than avoiding large drops in speed.

In P1V there is clear scope to merge READ and INC into one opcode, and given all serial memory has two speeds, a smarter opcode that did INC.Compare.Jump.Read would allow those cases where a +1 read is valid, to skip generating a new address preamble. (that opcode is not limited to Hyperbus, ANY serial memory usually has Random and streaming modes, and I assume the Hyperbus flash is like QuadSPI Flash in streaming)

Any time Next read <> (Last+1), an address preamble is needed.

With HW, that can run at up to ~ 100MHz, so perhaps 80~100ns Random access, and linear access could be essentially instant (no hub-slot delays)
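
The 80ns end of that estimate can be reproduced from figures given earlier in the thread: a 6-edge command/address frame on a DDR clock, plus the quoted 5-clock read latency. A minimal Python sketch, assuming the latency really is 5 clocks at 100MHz (the thread leaves open whether it scales with clock speed); `random_access_ns` is a hypothetical helper.

```python
def random_access_ns(clk_mhz, ca_edges=6, latency_clocks=5):
    """Rough time from start of the address frame to first data, in ns."""
    ca_clocks = ca_edges / 2           # DDR: two edges per clock
    period_ns = 1000.0 / clk_mhz       # one clock period in ns
    return (ca_clocks + latency_clocks) * period_ns


# 3 address clocks + 5 latency clocks at 100MHz
print(random_access_ns(100))
```

Sequential reads skip the whole preamble, which is why keeping Next = Last+1 wherever possible matters so much more than the raw clock rate.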

This type of read is also useful for FONT fetch, which would be the other major use of off-chip serial memory.

Addit :
A two-die QuadSPI design that used Quad-DDR for all transfers (eg in a 48 bit Command/address preamble) would give similar speeds to Hyperbus, without the licensing questions.
Google finds that some vendors use DTR for DDR, and some parts can 'sticky' the command, to get very close to Hyperbus. (they avoid the very slow 1b 1edge command repeat issue )

eg ISSI mention in IS25LP032-064-128.pdf
FRQDTR (Fast Read Quad IO DTR Mode) OPERATION (without command decode cycles)
which needs 8 edges for address/command, and Data out appears after the 13th edge.
At the 66MHz DTR spec speed, that is ~ 200ns random access.

Given internal memory is 32b, and a 32b read over QuadSPI (DDR/DTR) also makes more sense, then an opcode that 'pico-cached' 32b values, and checked if the ByteCode was in the 32b holding register, before doing an external memory issue, would give speed gains in all memory feed cases.
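
The 'pico-cache' idea can be modelled in software to see how many external reads it saves. The Python sketch below is an illustration under assumptions, not a proposed opcode definition: `PicoCache` is a hypothetical name, the holding register is a single aligned 32-bit word, and `fetch_word` stands in for whatever external-memory read is used.

```python
class PicoCache:
    """Hold one aligned 32-bit word; go off-chip only when a byte falls outside it."""

    def __init__(self, fetch_word):
        self.fetch_word = fetch_word   # callable: aligned byte address -> 4 bytes
        self.base = None               # address of the word currently held
        self.word = b""
        self.external_reads = 0        # counts slow external fetches

    def read_byte(self, addr):
        base = addr & ~3               # start of the aligned 32-bit word
        if base != self.base:          # miss: the byte is not in the holding register
            self.word = self.fetch_word(base)
            self.base = base
            self.external_reads += 1
        return self.word[addr & 3]     # hit path: no external access at all
```

Feeding it eight sequential bytecode fetches costs two external reads instead of eight, which is the gain described above for any memory feed.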

MJB · 2014-10-24 02:32

jmg wrote: »

I'm not quite sure what you are asking, but any external memory will be slower than on-chip RAM reads, so the issue is less 'speedup' than avoiding large drops in speed.

I saw your message to Peter asking for the inner loop of Tachyon and if hyperbus would be a good fit.
So I provided the inner loop.
The complete implications are currently a little above my comprehension.

evanh · 2014-10-24 02:34

MJB wrote: »

hi jmg, you had asked about the options for using the hyperbus to load code in the inner loop of TACHYON.
Here is this inner loop ...

Being an interpreter, you have the advantage of being able to rewire the interpreter to continue using HubRAM for inner-loop fetches via a program cache that prefills and holds active Forth code for fast execution. However, code will be the easy part, methinks. A dataspace that exists in external memory will be more tricky.

jmg · 2015-03-03 21:59

and a few months later, and there is this
Spansion's 64Mb HyperRAM will be sampling in 2Q15. It will be available in both 3V and 1.8V versions packaged in the market compatible 5x5 array BGA.

The Spansion HyperFlash memory family will offer 3V and 1.8V power-supply versions and initially include three densities: 128Mb, 256Mb and 512Mb, with 512Mb samples available.
Spansion HyperRAM memory can operate at frequencies as fast as 166MHz in DDR mode with a fast, random initial access time of 36ns

That 36ns is ~ 6 x 166MHz cycles of Tacc
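
The cycle count is a one-line conversion; a small Python sketch with a hypothetical helper name, using only the 36ns and 166MHz figures quoted above:

```python
def ns_to_cycles(time_ns, clk_mhz):
    """Convert a time in ns to a number of clock cycles at clk_mhz."""
    return time_ns * clk_mhz / 1000.0


# 36ns initial access at 166MHz: just under 6 clocks
print(ns_to_cycles(36, 166))
```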

- an obvious candidate to go into P1V variant.

64Mb of small, easy-to-access RAM, with 128Mb of FLASH at XIP speeds.

jmg · 2015-10-27 07:01

one year from the first noises on Spansion HyperFLASH, we now see XTRMFlash from Micron

https://www.micron.com/products/nor-flash/xtrmflash

https://www.micron.com/~/media/documents/products/technical-note/nor-flash/xtrmflash/tn2508_spi_flash_design_flexibility.pdf

Looks very similar, down to the Clock return (DQS?) for highest speed transfers.

Claims "random access times as fast as 83 nanoseconds (ns) and sequential byte reads as fast as 2.5ns"

128Mb to 2Gb, and initial samples are available.

A P2 version would be limited by process pin-toggle rates to a little below that, but it follows the HyperFLASH thrust.

Also says
["Micron fully intends for XTRMFlash memory and the interface to become widely adopted open standards, and as a testament to that effort and the XTRMFlash memory value proposition, flash memory provider Winbond Electronics has signed a license agreement to develop XTRMFlash-compatible solutions."]

Ideal target for Smart Pins.

jmg · 2015-11-09 05:05

.. and Findchips is showing stock for
HYPERFLASH-EVBSK-01
- it could pay for Parallax to buy all of them, given the price showing is $0.0702

Also showing for HyperBUS is this $99 M4 module

http://www.cypress.com/documentation/development-kitsboards/fm4-176l-s6e2dh-arm-cortex-m4-fm4-graphics-mcu-starter-kit?source=search&keywords=hyperflash&cat=software_tools

Includes this : 512MB Spansion HyperFlash™ memory for image storage

jmg · 2015-11-22 02:35

ISSI is showing Data Sheets for 128/256/512Mb Hyperflash

http://www.issi.com/US/product-flash.shtml#jump1
