Memory Interface and IO Peripherals in 2015

jmg · 2015-01-12 20:15

As time marches forward, the level of integration on small controllers continues to climb.
The new NUC505 has some topical examples.

It has:

Core
- ARM® Cortex®-M4F with DSP and FPU
- Max frequency of 100 MHz
- Temperature range: -40℃ ~ 105℃

Memory
- 128 KB of embedded SRAM
- 2 MB of SPI Flash

Connectivity
- USB 2.0 HS Device
- Up to two USB 2.0 FS Host
- Up to three UARTs
- Up to three SPIs
- Up to two I²Cs (Up to 1 MHz)
- SD Host

Security for code protect
- 128-bit key for code protection against illegal pirating
- Supports multi-entry for 15 times

RTC
- Supports external power pin VBAT
- 32 bytes spare registers

I²S
- Supports Master or Slave mode operation
- Internal PLL for frequency adjustment
- Supports PCM mode A, PCM mode B, I²S and MSB justified data format
- Supports DMA mode

12-bit ADC
- Up to eight channels
- Up to 1MSPS conversion with ADC_CH1, and up to 200 kSPS with others (except ADC_CH0)

Audio CODEC
- Embedded Stereo 24-bit Sigma-Delta CODEC output
- ADC-THDN: -80 dB, Dynamic Range SNR: 90 dB (A-Weighted)
- Headphone Output-THDN:-60dB, Dynamic Range SNR: 93 dB (A-Weighted)
- Sample Rate: 8 kHz to 96 kHz

Packages
- LQFP48, LQFP64, QFN88

http://www.nuvoton.com/hq/products/microcontrollers/arm-cortex-m4-mcus/nuc505-series/Technical-Reference-Manual/?__locale=en&resourcePage=Y

The most interesting and P2-relevant 'peripherals' are the 2MB SPI Flash with DMA support (200Mbps is shown on the SD Host link), as well as the 24b ADC and the multiple USB FS/HS links.

My guess is Nuvoton use a Winbond SPI as stacked die, & they mention HW in-line cypher encryption for R/W of external SPI.
Packages as small as TQFP48

With the DMA and speeds mentioned, that delivers an impressive level of serial memory support, and having 2MB included inside the package is a bonus.

I think P2 needs to be USB-capable, and have SPI HW that can do QuadSPI (DDR?)

Seairth · 2015-01-13 11:42

Agreed! Considering the current P2 space limitations, I say we make the following changes to P2:

* Remove all but 1 of the cogs (maybe keep a second one that is customized to do nothing but floating-point operations).
* Significantly reduce the number of registers in the cog and force the cog to run in HUBEXEC mode only (which will further allow a reduction in cog memory size).
* Remove the per-pin smart i/o, and instead dedicate specific pins to specific functions.

This should free up enough room to make a lot more of those other features available.

jmg · 2015-01-13 13:11

Seairth wrote: »

Agreed! Considering the current P2 space limitations, I say we make the following changes to P2:.

The good news is nothing nearly as radical as your suggestions is needed.

It is a simple matter of learning from others, and in the P2 context here, the focus is on the SPI memory support.
Nuvoton have HW 200Mbps Quad data path support, and if I understood the presentation they also have
* hw DMA
* Cypher support, in line and real time
* Memory mapping, so eXecute In Place is possible.

All up, it looks a nicely balanced design, for a chip that is RAM based and boots from SPI.

The expectation is the P2 will have SPI HW support (hopefully QuadSPI), and Memory mapping would allow HUBEXEC to also work as 'SPIEXEC', into megabytes of code space.
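
As a rough sketch of what memory mapping means on the SPI side: a read from a mapped address turns into a flash read command plus an offset into the part. The window base `XIP_BASE` and the function name `read_frame` are hypothetical, chosen only for illustration; the 0x03 opcode with a 24-bit address is the basic Read Data command on common SPI NOR parts. Nothing here is taken from the P2 or NUC505 designs.

```python
# Hypothetical sketch: translate a memory-mapped address into an SPI NOR read frame.
# XIP_BASE is an assumed value for illustration, not a P2 or NUC505 fact.
XIP_BASE = 0x1000_0000          # assumed start of the mapped flash window
FLASH_SIZE = 2 * 1024 * 1024    # 2 MB part, matching the size discussed above

READ_DATA = 0x03  # basic Read Data opcode on common SPI NOR flash


def read_frame(mapped_addr: int) -> bytes:
    """Return the 4-byte frame (opcode + 24-bit big-endian address) for a mapped address."""
    offset = mapped_addr - XIP_BASE
    if not 0 <= offset < FLASH_SIZE:
        raise ValueError("address outside the mapped flash window")
    return bytes([READ_DATA]) + offset.to_bytes(3, "big")


if __name__ == "__main__":
    print(read_frame(XIP_BASE + 0x012345).hex())
```

A hardware mapper would do this translation on every miss, which is where the address setup cost comes from.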

evanh · 2015-01-13 13:54

Totally worthless hardware on SMP architecture like the Prop. Prop2 Hub bandwidth is 16 x 200 x 4 = 12800MB/s. One SPI interface clocked at 200MHz is 200 / 8 = 25MB/s and that's without any address setup. Even two Cogs accessing a large application on SPI ROM will probably thrash its way down to under 1MB/s.

Load it to HubRAM and run from there thank you very much. If that means something like overlays then so be it.
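
The arithmetic in this post can be reproduced in a few lines. The cog count, clock and bytes per access are the figures quoted above; the `data_lines` parameter is an added assumption so the same formula can also be tried for a quad-wide link.

```python
# Figures from the post: 16 cogs, 200 MHz, 4 bytes per hub access.
COGS = 16
CLOCK_MHZ = 200
BYTES_PER_ACCESS = 4


def hub_bandwidth_mb_s() -> int:
    """Aggregate hub bandwidth in MB/s if every cog moves 4 bytes per clock."""
    return COGS * CLOCK_MHZ * BYTES_PER_ACCESS


def spi_bandwidth_mb_s(clock_mhz: float, data_lines: int = 1) -> float:
    """Peak SPI data rate in MB/s, ignoring command and address overhead."""
    return clock_mhz * data_lines / 8


if __name__ == "__main__":
    hub = hub_bandwidth_mb_s()
    spi = spi_bandwidth_mb_s(CLOCK_MHZ)
    print(hub, spi, hub / spi)
```

With a single data line the ratio is 512 to 1; four data lines at the same clock would narrow it to 128 to 1, still before any address setup.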

evanh · 2015-01-13 14:17

To put that comment in context, what were the results of using XMM for XIP on the Prop1? I imagine there was a trend to cache the memory fetches into HubRAM ... Well, if there is a fixed XIP DMA engine doing the fetching then there is far less opportunity for such caching.

Streaming data transfers should be about as far as any hardware needs to go. This would apply to SDRAM also. Leave the native execution to Cog/Hub accesses only. Be glad we're getting native HubExec.

jmg · 2015-01-13 16:14

evanh wrote: »

Load it to HubRAM and run from there thank you very much. If that means something like overlays then so be it.

I think that is the intended mode for the NUC505 for critical Code flows, but there are cases where data may reside in SPI Flash, where efficient (ie HW supported) addressing and burst streaming will increase throughput, and make the code to access that data simpler.

Yes, SPI will always be slower than on-chip parallel access, but the point is to improve the bottlenecks, rather than ignore them because something else is faster.
SPI is much larger, and cheaper, than on chip RAM, so it has an important place in the design toolbox on a part like the P2.

evanh · 2015-01-13 20:56

People are already using I2C ROMs with the Prop1 happily this way just by bit bashing. A complete hardware solution would do very little for performance, takes up silicon, and can't be bug fixed. A bit-shifter, simplified SPI, is about all that's needed in hardware. I imagine Chip's earlier UART design will have options for simple bit-shifting without the framing bits.

jmg · 2015-01-13 21:48

evanh wrote: »

People are already using I2C ROMs with the Prop1 happily this way just by bit bashing. .

Sure they are, but I2C is WAY slower than 200Mbps.
Using a COG to just wiggle pins, is a terrible waste of silicon, and will always be slower, and less power efficient, than HW support.
QuadSPI is not complex Verilog, and is very easy to test.

The reality is, devices like the NUC505 set users' expectations.

evanh · 2015-01-13 23:10

I believe I mentioned having a basic shifter is fine.

Ramon · 2015-01-14 04:20

evanh wrote: »

Totally worthless hardware ...

And totally worthless discussion about hardware that is not possible to buy.

Remember Spansion hyperbus? Announced in March 2014, no distributor has them yet. (even ISSI talked about making hyperRAM)
Remember FTDI FT900 series? Announced in Feb 2014, no distributor has them yet. (Also I wonder how much will we wait for the announced FTDI FT600 that allows USB3.0)
Remember Lattice MachXO3 announced with gigabit ethernet and many other things? After more than 6 months they delivered something really behind their own marketing expectations.

Please, talk about something that can actually be bought (on digikey, mouser, anywhere). This is not the first time that someone has announced a revolutionary IC that never sees the light.

jmg · 2015-01-14 11:05

Ramon wrote: »

Please, talk about something that can actually be bought (on digikey, mouser, anywhere). This is not the first time that someone has announced a revolutionary IC that never sees the light.

You cannot buy a P2 either

QuadSPI is not a revolutionary IC that never sees the light - it has been available for some time.
MCU with QuadSPI interface support are also available now.

(There are also show dates, BSP downloads, a TRM, and a NuTiny-SDK-NUC505 on show with what looks like a kosher 2014 week 40 date code, all of which makes the NUC505 much closer than the P2.)

Thus NUC505 is more incremental than revolutionary, in that they include a 2MB SPI in the package, and add well-thought-out hardware support, to make what is a cheap, but modest bandwidth, pathway easier to use, and with little waste.

evanh · 2015-01-14 15:00

Given Quad-SPI is a parallel interface, the Prop2 will be able to bit-bash at a fair pace, maybe as high as the rated 50 MHz. There is very little need of even a shifter for such things.

The one argument that does make some sense is on the basis of power efficiency. A specific hardware solution is usually more power efficient. But it still needs to be a correct solution. Where's the full spec for Quad-SPI?

jmg · 2015-01-14 16:26

evanh wrote: »

..
The one argument that does make some sense is on the basis of power efficiency. A specific hardware solution is usually more power efficient. But it still needs to be a correct solution. Where's the full spec for Quad-SPI?

The data sheets spec the SPI details and show the wide range of price/size options. I would target at least 2 points.

Examples could be the following (these are real devices, on the shelf, and easy to test on an FPGA board):

sub 50c/1k : 24b Address, Low cost parts 2MByte

http://www.winbond.com/NR/rdonlyres/0143840F-8550-480A-9C1C-9261DE1DFCD7/0/W25Q16DV.pdf

~$2/1k : Double Transfer Rate added, 32b Address, 32MByte

http://www.micron.com/~/media/Documents/Products/Data%20Sheet/NOR%20Flash/Serial%20NOR/N25Q/n25q_256mb_3v_65nm.pdf

The DTR feature option allows even more system power savings, and lower RFI..
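
The split between 24b and 32b addressing follows directly from the part sizes. A small sketch, with hypothetical helper names, shows why a 2MByte part fits in three address bytes while a 32MByte part needs a fourth:

```python
def address_bits(size_bytes: int) -> int:
    """Minimum number of address bits needed to reach every byte of the part."""
    return (size_bytes - 1).bit_length()


def address_bytes(size_bytes: int) -> int:
    """Address bits rounded up to whole bytes, as sent in an SPI command frame."""
    return (address_bits(size_bytes) + 7) // 8


if __name__ == "__main__":
    MB = 1024 * 1024
    for size in (2 * MB, 16 * MB, 32 * MB):
        print(size // MB, "MB:", address_bits(size), "bits,", address_bytes(size), "bytes")
```

Anything up to 16MByte fits in a 24b address; the 32MByte part needs 25 bits, so hardware that targets both points has to handle 3-byte and 4-byte address phases.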

Heater. · 2015-01-15 00:53

Seairth,

Agreed! Considering the current P2 space limitations, I say we make the following changes to P2:

* Remove all but 1 of the cogs (maybe keep a second one that is customized to do nothing but floating-point operations).
* Significantly reduce the number of registers in the cog and force the cog to run in HUBEXEC mode only (which will further allow a reduction in cog memory size).
* Remove the per-pin smart i/o, and instead dedicate specific pins to specific functions.

Do what? I do hope that was intended with a big glob of humor. It's hard to tell when I have only just woken up.

Ramon · 2015-01-15 07:09

jmg wrote: »

You cannot buy a P2 either

You don't need to buy it. It is FREE. You only need to buy the DE0-NANO (EP4CE22) and throw the bitstream at it. The P2 is real. Trust me.

Dave Hein · 2015-01-15 07:21

Ramon wrote: »

The P2 is real. Trust me.

I must have slept through the past year. Where do I get the FPGA image?

Seairth · 2015-01-15 09:40

Heater. wrote: »

Seairth,

Do what? I do hope that was intended with a big glob of humor. It's hard to tell when I have only just woken up.

I was indeed joking.

The "changed" version I suggested was basically the NUC505 itself. The point is: let the P2 be what it's going to be and stop trying to make it be like something else. I know the addition of QSPI would be useful in some circumstances, but so would a heap of other technologies. Frankly, I am still amazed at what people were able to pull off with the P1, most of which was probably not even thought possible when the P1 was first released. I suspect the same will be true for P2.

jmg · 2015-01-15 11:09

Seairth wrote: »

I know the addition of QSPI would be useful in some circumstances....

Then we actually agree, and given the P2 will connect to some SPI memory in every application, and QSPI is mature technology, and cheap, users are simply going to expect that it is supported.

It's not really 'technology', more a matter of balanced system design.

A lobotomised P2 that fails to hit critical mass, helps no one, and history has many examples of MCUs that failed with one factor being they failed balanced system design.

Electrodude · 2015-01-15 11:31

Can't you do QSPI by just doing SPI on 4 pins? I got the impression that SPI would be done by telling one pin to toggle at some frequency and another to send asynchronous data (except it's at the same speed as the clock pin and has the right phase so it's effectively synchronous). To do QSPI, wouldn't you just set up 4 pins to send data at once? The bit-interleaver instruction (I forget what it's called), that was in the old P2 and will hopefully be in this one if it's cheap enough, can interleave the bits for you.
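
A minimal sketch of the interleaving this would need, assuming the usual quad ordering where each clock carries one nibble, high nibble first, with IO0 taking the lowest bit of the nibble. The function names are hypothetical and this models only the bit shuffling, not any P2 instruction or pin mode.

```python
def split_lanes(data: bytes) -> list:
    """Split bytes into four per-pin bit streams (index 0 = IO0 ... index 3 = IO3)."""
    lanes = [[], [], [], []]
    for byte in data:
        for shift in (4, 0):  # high nibble on the first clock, low nibble on the second
            nibble = (byte >> shift) & 0xF
            for lane in range(4):
                lanes[lane].append((nibble >> lane) & 1)
    return lanes


def merge_lanes(lanes: list) -> bytes:
    """Rebuild the byte stream from four per-pin bit streams (two clocks per byte)."""
    out = bytearray()
    for clk in range(0, len(lanes[0]), 2):
        byte = 0
        for half, shift in enumerate((4, 0)):
            nibble = sum(lanes[lane][clk + half] << lane for lane in range(4))
            byte |= nibble << shift
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    print(split_lanes(b"\xA5"))
    print(merge_lanes(split_lanes(b"\xA5\x3C")).hex())
```

Each pin ends up with its own serial stream, two bits per byte, which is the work a bit-interleave instruction would have to do before four independent shifters could send it.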

jmg · 2015-01-15 11:38

Electrodude wrote: »

Can't you do QSPI by just doing SPI on 4 pins? I got the impression that SPI would be done by telling one pin to toggle at some frequency and another to send asynchronous data (except it's at the same speed as the clock pin and has the right phase so it's effectively synchronous). To do QSPI, wouldn't you just set up 4 pins to send data at once? The bit-interleaver instruction (I forget what it's called), that was in the old P2 and will hopefully be in this one if it's cheap enough, can interleave the bits for you.

Correct, qSPI is not a complex addition from SPI at all, it just needs a nibble pathway supported to the pins & a config boolean.
Chip has indicated before this is likely.
The qSPI.DTR option sends data on both edges of the SPI_CLK, so that adds a MUX and Byte pathway, & another config boolean, still elementary levels of Logic.
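
To put numbers on the three modes, here is a small sketch that counts data-phase clocks only. It deliberately ignores command, address and dummy cycles, and the 50 MHz clock in the usage lines is just an example value.

```python
# Data bits moved per SPI_CLK cycle in each mode.
BITS_PER_CLOCK = {
    "spi": 1,        # one data line, one edge
    "qspi": 4,       # four data lines, one edge (a nibble per clock)
    "qspi_dtr": 8,   # four data lines, both edges (a byte per clock)
}


def clocks_for(n_bytes: int, mode: str) -> int:
    """Clock cycles needed for the data phase of an n_bytes transfer."""
    bits = BITS_PER_CLOCK[mode]
    return (n_bytes * 8 + bits - 1) // bits


def throughput_mb_s(clock_mhz: float, mode: str) -> float:
    """Peak data-phase throughput in MB/s at the given clock."""
    return clock_mhz * BITS_PER_CLOCK[mode] / 8


if __name__ == "__main__":
    for mode in BITS_PER_CLOCK:
        print(mode, clocks_for(256, mode), throughput_mb_s(50, mode))
```

The DTR mode moves the same data in half the clocks of plain qSPI, which is why it can run at a lower clock for the same throughput.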

Ramon · 2015-01-16 04:38

Dave, I think this was the latest one (Feb 6 2014) -> http://forums.parallax.com/showthread.php/153422-HUB-EXEC-Update-Here?p=1237960&viewfull=1#post1237960

Dave Hein · 2015-01-17 09:03

OK, I guess I didn't sleep through all of last year. That FPGA image was never fully documented, and as you can see in the thread there was a lot of discussion about enhancements to the P2 after that. And then the s#!+ hit the fan and the P2 development was restarted. The last fully documented P2 FPGA image was back in October 2013 -- at least that's the last image that I played around with. The current P2 design is quite different from the design from last year and before. I'm hoping we see a P2 FPGA image soon. It's been over 3 weeks since the hoped for XMAS 2014 FPGA image, and there's been no indication from Parallax when the P2 image is expected. I'm guessing Chip has been spending most of his time on the Parallax FPGA board and working out details with the P2 chip layout.

Ramon · 2015-01-17 21:08

As soon as Chip releases a new image I guess we will read zillions of emails saying: "Chip, there is no HW Quad SPI! Please, make a new image ...", "Chip, can you include DDR for hubexec? When will there be a new image?", "Chip, there are no instructions for HW USB, will it be in the next image?", "Chip! There is no hyperbus, I want a new image!" ... until the end of days.

I wish there will be no new image, just the P2 IC.

jmg · 2015-01-17 21:29

An optimist might be guided by what Chip has said before, and expect to see QuadSPI on the agenda

Hyperbus would be nice, but unless there is real silicon to test, that is hard to envision.
However, there is a lot of QuadSPI (DTR too) out there that is easy to test with.

A direct jump to P2 is very unlikely, as that would make the Parallax FPGA board a waste of time, and the idea behind that, is to allow extensive field testing of code revisions.
Ken wants to avoid (or make the risk as low as possible) of a 2nd tapeout, for obvious reasons.

If the idea of code revisions really does scare you, then perhaps take a break from the P2 development until the revisions stop ?

evanh · 2015-01-18 23:18

jmg wrote: »

An optimist might be guided by what Chip has said before, and expect to see QuadSPI on the agenda

It does beg the question as to why you might want so much community discussion when you already think it's included?

jmg · 2015-02-03 22:10

I see a price point for the 88 pin version of new NUC505 is up :

NUC505YO13Y Nuvoton Technology Corp M4 WITH FLASH BASED , 88PIN Stock: 1344 $3.20 (@ 1344)

jmg · 2015-03-03 21:56

This just in on Hyperbus RAM

Spansion's 64Mb HyperRAM will be sampling in 2Q15. It will be available in both 3V and 1.8V versions packaged in the market compatible 5x5 array BGA.

The Spansion HyperFlash memory family will offer 3V and 1.8V power-supply versions and initially include three densities: 128Mb, 256Mb and 512Mb, with 512Mb samples available.

jmg · 2015-04-19 19:42

and this one is an IO peripheral, but more directly equivalent to a P2 use itself.
FTDI have the EVE80x parts, but they were too small and that left an opening for P2.

Now this news...
http://www.ftdichip.com/Products/ICs/FT81X.html

- Maximum screen resolution increased from 512x512 pixels to 800x600 pixels
- Bitmap size increased from 18-bit to 24-bit (FT812 and FT813)
- Extra large ROM fonts added (There are a total of 19 ROM fonts)
- Orientation switching support added with REG_ROTATE and CMD_SETROTATE
- Main memory increased from 256kBytes to 1Mbyte
- Master clock frequency improvement - REG_PCLK can now be 1
- Smooth back-motion JPG encoded AVI video playback
- Multiple 16/32 bit colour palettes supported with transparency
- L2 format supported for efficient DXT1-style bitmaps
- CMD_MEDIAFIFO specifies an area of main memory to use as a FIFO for JPG, AVI and PNG loading
- Multiple numeric formats supported including binary, octal, decimal and hex
- Simplified font loading with CMD_SETFONT2
- MCU transfer with QSPI interface option

Performance Improvements
- 500-1000 times faster JPG loading
- Pixel clock drawing increased from 4 to 16 pixels per clock
- Fetch cull primitives per clock boosted from 1 to 2
- Quicker Firmware memory operations (e.g. copy, fill, CRC) due to tuned inner loops
- CMD_SNAPSHOT hundreds of times faster

Still, the P2 can always control one of these devices, but that QSPI looks more important by the day...

mark · 2015-04-20 05:59

I know this is irrelevant to the point you're trying to make, but that's a pretty neat chip. However, I can't help but be somewhat surprised there's a market for it. If the ft801 is anything to go by, then at least $4 a pop in quantities of 2,500. I have to imagine there's MCUs out there at that price point with similar capabilities, so why use this thing?

Brian Fairchild · 2015-04-20 07:02

[QUOTE=mark

jmg · 2015-04-20 14:28

[QUOTE=mark
