Probably a stupid question ... 2xQuadSPI support.

jazzed · 2011-06-04 13:16

Can any MCU besides Propeller read/write 2 QuadSPI chips simultaneously?

User Name · 2011-06-04 13:27

Well, some of the Blackfin chips from Analog Devices have several of what they call SPORT ports. It seems that each SPORT is capable of more than one SPI transaction at a time. But don't trust ADI's dodgy drivers to make it all happen for you. You might just be spending months making everything work in synchrony. This is in stark contrast to the Propeller, which might take half an hour to equip this way.

BTW, the Blackfin is also quite a bit more expensive. Its IDE is infinitely more expensive.

Leon · 2011-06-04 13:39

XMOS, of course.

User Name · 2011-06-04 13:42

Leon wrote: »

XMOS, of course.

Curiously, Leon, the same thing applies here: "You might just be spending months making everything work in synchrony."

Leon · 2011-06-04 13:57

Why? XMOS devices are fully deterministic, and equally demanding interfaces have been developed very quickly. If there are timing problems, the XMOS Timing Analyzer tool will solve them very quickly.

jmg · 2011-06-04 14:08

jazzed wrote: »

Can any MCU besides Propeller read/write 2 QuadSPI chips simultaneously?

The Atmel XMega's timers claim to do this, and some other vendors have state-engine support around the timers, so the support is not always called Quadrature Counting.

Then you have the asymmetric dual cores, like NXP's M0/M4 pairing and Freescale's XGATE.

Or, you can use a CPLD, if you need speed and precision...

User Name · 2011-06-04 14:14

Oh, no. It has nothing to do with determinism nor the inherent quality and power of XMOS. It has to do with the "very quickly" part. Perhaps it would be quick for David May to implement. For lesser mortals and particularly the uninitiated, it's almost insurmountable.

jazzed · 2011-06-04 14:19

Like I said, "probably a stupid question" ...

LEON and jmg, can you point to specific demo code or app notes where 2 QuadSPI chips are being used in parallel to form an 8 bit bus with a chip select and clock that can be read/written by the processors? This description is a little different from the first post, but this is more precisely what I meant.

@User Name, the code to do this for Propeller was indeed a very fast development effort.
I can post my examples if that helps anyone.
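
To make the arrangement described above concrete, here is a minimal sketch of two quad-wide chips sharing one clock and one chip select, so that every clock presents 8 bits to the processor. It is not the Propeller driver from this thread. Which chip supplies which nibble is an assumption made for illustration, and the function names are hypothetical.

```python
def merge_nibbles(low_chip, high_chip):
    """Combine one 4-bit sample from each chip into a single bus byte.

    Assumption: the first chip drives bus bits 0..3 and the second
    chip drives bus bits 4..7.
    """
    return ((high_chip & 0xF) << 4) | (low_chip & 0xF)


def split_byte(value):
    """Split a byte into the (low_chip, high_chip) nibbles to be written."""
    return value & 0xF, (value >> 4) & 0xF


# Round trip: what is written to the two chips is what is read back.
low, high = split_byte(0xA5)
restored = merge_nibbles(low, high)
```

Because both chips see the same clock and chip select, the processor only ever handles whole bytes; the split into nibbles is invisible above the driver.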

jmg · 2011-06-04 14:25

jazzed wrote: »

LEON and jmg, can you point to specific demo code or app notes where 2 QuadSPI chips are being used in parallel to form an 8 bit bus with a chip select and clock that can be read/written by the processors? This description is a little different from the first post, but this is more precisely what I meant.

Oops, I misread the QuadXXX as quadrature counting. Brain fade.

For QuadSPI, that is new on uC, but ISTR an NXP upper-end variant claiming Dual QuadSPI capability?
More suppliers are starting to think about the Execute-In-Place that QuadSPI can allow, but I have not seen QuadSPI SRAMs yet.
Cypress had something in a press release, but no devices yet?

Leon · 2011-06-04 14:35

Why should development be any slower with XMOS than with the Propeller? One reason why XMOS chips are so popular is that development is typically very fast, with applications being completed far more quickly than is possible with other solutions.

NXP has the LPC1800 Cortex-M3 device with a QuadSPI interface, and the dual-core LPC4000 devices also have one. They run at 40 MHz.

Jazzed,

I didn't say that QuadSPI devices had been interfaced to an XMOS device, just that it should be feasible.

What are the QuadSPI devices that you interfaced to the Propeller? What speed do you get?

jazzed · 2011-06-04 14:38

Leon wrote: »

Why should it be any slower with XMOS than with the Propeller? One reason why XMOS chips are so popular is that development is typically very fast, with applications being completed far more quickly than is possible with other solutions.

So you don't intend to favor us with an example or app note even in a http link?

jazzed · 2011-06-04 14:46

Leon wrote: »

Why should development be any slower with XMOS than with the Propeller? One reason why XMOS chips are so popular is that development is typically very fast, with applications being completed far more quickly than is possible with other solutions.

NXP has the LPC1800 Cortex-M3 device with a QuadSPI interface, and the dual-core LPC4000 devices also have one.

Jazzed,

I didn't say that QuadSPI devices had been interfaced to an XMOS device, just that it should be feasible.

What are the QuadSPI devices that you interfaced to the Propeller?

That helps

Except that the LPC1800 has only one QuadSPI interface.
LPC4000 has two QuadSPI interfaces, but it is unclear if they can use 2 chips for simultaneous access.

I'm using 2 W25Qxx devices. Rayman has drivers for the ST series SQI devices.

Leon · 2011-06-04 14:47

http://ics.nxp.com/products/lpc4000/all/~LPC4322/

http://ics.nxp.com/literature/leaflets/microcontrollers/pdf/lpc18xx.pdf

What speed are you getting?

What are the actual devices you used? I'll order a couple and see what I can do with one of my XMOS boards. Digi-Key seems to be the only source for the Winbond parts.

jazzed · 2011-06-04 15:02

Leon wrote: »

http://ics.nxp.com/products/lpc4000/all/~LPC4322/

http://ics.nxp.com/literature/leaflets/microcontrollers/pdf/lpc18xx.pdf

Cross-posted. See previous reply.

Leon wrote: »

What speed are you getting?

The maximum data rate from pins to hub, or vice versa, for the Propeller in one COG is clkfreq/16. An ARM with a special interface would trump that pretty easily.

Anyway, as far as I can tell no other MCU can do what I described yet.

You have the part numbers Leon. Let me know when you're finished.

Leon · 2011-06-04 15:07

Is that 5 MHz with an 80 MHz clock? The ARMs get 40 MHz.

I see that Digi-Key has plenty of the W25Q16BVSSIG, I'll get those. XMOS has an interesting new chip that Digi-Key should have in stock in a few days, I'll make the order up with some of those to avoid shipping costs.

jazzed · 2011-06-04 15:17

Leon wrote: »

What is that in MHz. The ARMs get 40 MHz.

I see that Digi-Key has plenty of the W25Q16BVSSIG, I'll get those.

That's what I'm using on SpinSocket-Flash.
I'm getting about a 48 Mbps read rate at 96 MHz.
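
The figures quoted in this exchange follow from the clkfreq/16 limit if it is read as one byte moved per 16 system clocks. The short calculation below is only a sketch of that arithmetic; the function name and the 16-clock default are illustrative, not part of any driver.

```python
def hub_read_rate(clkfreq_hz, clocks_per_byte=16):
    """Return (bytes per second, bits per second) for a byte-wide bus
    that delivers one byte every `clocks_per_byte` system clocks."""
    bytes_per_s = clkfreq_hz // clocks_per_byte
    return bytes_per_s, bytes_per_s * 8


rate_96 = hub_read_rate(96_000_000)   # the 96 MHz SpinSocket-Flash case
rate_80 = hub_read_rate(80_000_000)   # the 80 MHz case Leon asks about
```

At 96 MHz this gives 6,000,000 bytes/s, which is 48 Mbit/s. At 80 MHz it gives 5,000,000 bytes/s, which matches the "5 MHz" figure only if that is understood as bytes per second on an 8-bit bus.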

User Name · 2011-06-04 20:00

Near as I can tell, the Grand Vision of XMOS is of a sea of processors seamlessly connected via serial links of some sort, and capable of adapting to most of the world's computational needs. Inasmuch as this becomes somewhat difficult to achieve in practice, most XMOS implementations so far seem to involve a single processor internally time sliced into four or eight pieces, creating multiple virtual processors, each with its own execution thread. This is an interesting alternative to the Propeller, which actually has eight physical CPUs.

Whether it is better to impose time-sharing on a single core or leave the allocation of processor resources up to the programmer (as is done in conventional single-core microcontrollers) is an interesting question. No doubt the quest for determinism is part of the logic behind the XMOS approach. Yet the more external events an XMOS chip has to handle, the more this determinism suffers, time-slicing notwithstanding. XMOS users point out that the jitter is small, and much less than on conventional interrupt-driven processors. Meanwhile, XMOS, Inc. prefer to dispense with the word 'deterministic' entirely and replace it with 'predictable'.

So, the good news is that we have lots of choices! So much depends on the application. IMHO, I don't think there is a processor in the world that makes perfect clocking of complex I/O as simple to achieve as the Propeller. If what you really need are multiple CPUs, nothing beats multiple CPUs. The miracle that Chip seems to have achieved is finding the right degree of coupling between them - at least for embedded control applications.

In reference to the quest of the OP, I'm sure a single-core XMOS chip can deliver adequate timing, ultimately. In the process, the XMOS chip will gain useful external storage and this storage might benefit heater's XZPU. So it's all good. It would be even better if the entry fee to give it a whirl weren't so high.

Leon · 2011-06-04 21:20

Heater has shown that the XMOS way of doing things with hardware threads is identical to the Propeller and its cogs interacting with hub memory.

Heater. · 2011-06-04 22:11

User Name,

A processor might have four phases of execution of a single instruction, like:
1) Instruction Decode
2) Register Read
3) Execute
4) Result write

So it takes 4 clocks to dispatch one instruction. How can we speed this up?
Use a pipeline: have 4 instructions in flight through the CPU at a time. This is a common approach, but it suffers from pipeline stalls when you hit a jump instruction. All of a sudden you have to flush the pipe and start again. This impacts performance and execution determinism.

Or you could have 4 threads being executed simultaneously, so that at any given moment an instruction from each thread is in some phase of execution.

This gives you deterministic timing, and keeps your CPU pipe full at all times for maximum performance.

This is what XMOS does and also IBM in their 1.6 Billion transistor 16 core, 256 thread PowerEN chip.
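
The interleaving described above can be sketched as a tiny round-robin model. This is a simplified illustration of the idea, not a model of any real XMOS or IBM core: the stage names are shortened from the four phases listed in the post, and strict round-robin issue is an assumption.

```python
STAGES = ["decode", "read", "execute", "write"]


def barrel_schedule(num_threads, cycles):
    """Return one dict per clock cycle mapping stage name -> thread id.

    One instruction is issued per cycle, taken from the threads in strict
    round-robin order, and advances one stage per cycle. A stage that has
    not been reached yet during start-up maps to None.
    """
    timeline = []
    for cycle in range(cycles):
        occupancy = {}
        for depth, stage in enumerate(STAGES):
            issue_cycle = cycle - depth  # when this stage's instruction entered
            if issue_cycle >= 0:
                occupancy[stage] = issue_cycle % num_threads
            else:
                occupancy[stage] = None
        timeline.append(occupancy)
    return timeline


timeline = barrel_schedule(num_threads=4, cycles=8)
```

With four threads and four stages, every stage holds an instruction from a different thread once the pipe has filled, so a jump in one thread never requires flushing another thread's work, and each thread issues exactly once every four clocks.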

jmg · 2011-06-04 22:39

Leon wrote: »

XMOS has an interesting new chip that Digi-Key should have in stock in a few days

Do you mean the TQFP48 packaged XS1, or something that really is an interesting new chip?

Leon · 2011-06-05 00:57

That's the one. It's interesting in that they seem to be adopting Microchip's very successful strategy of making a wide range of similar devices that enable substantial cost savings to be made when manufacturing products on a large scale. Why pay for features that aren't needed?

I placed my Digi-Key order yesterday, I should get the Winbond Quad SPI chips on Wednesday.

I've designed a breakout board for the chip, the schematic and PCB layout are attached. It's intended for use with a 0.64 mm square header and socket.

I could get a batch of boards made, if anyone is interested.

hinv · 2011-06-05 08:08

Jazzed,
Have you posted the code that can do it on the propeller? How many cogs does it take?
What performance can you get from it?
Is the nibble mode on SDcards the same?

Thanks,
Doug

Leon · 2011-06-05 08:13

Jazzed hasn't posted any code, AFAIK.

He mentioned the performance. It's rather slow at 5 MHz for an 80 MHz clock, with NXP achieving 40 MHz with their ARM Cortex devices. They do it in hardware, though.

Nibble mode on an SD card is completely different, see the data sheet.

jazzed · 2011-06-05 10:10

I've posted code for this in a couple of threads already. MIT code is attached here.

The interface needs 10 pins with P0..7 for data and 2 pins for CS/CLK.
A single COG programming and buffered read driver is available.
Flash to HUB read performance is the best achievable with Propeller.
SpinSocket-Flash modules run at 96MHz clkfreq so buffer reads are 6 MBytes/s.
The idea was originally introduced in the SpinSocket thread.
Details of how the driver works and rough code are posted here.

The SpinSocket-Flash modules are available with 4MB of Winbond W25Q flash.
SpinSocket is a 32 pin DIP socket stackable module series.
A 256KB SRAM board and several SpinSocket-Flash versions are available.
I've been working on a driver for a SpinSocket-Flash/SRAM module.
The Flash/SRAM module will have 1 QuadSPI Flash and 1 single bit SPI SRAM.

XBASIC is available for running programs that are stored in Flash.
XBASIC can also run on any simple Propeller configuration.
XBASIC is an ongoing development by David Betz that targets many MCUs.

Leon · 2011-06-05 10:32

Thanks for the code, Steve. I'll use it to check my board with a Propeller.

jazzed · 2011-06-05 10:42

Leon wrote: »

Thanks for the code, Steve. I'll use it to check my board with a Propeller.

Here is a very simple test program. It does not try to program more than one sector, but the ability has been proven with other code. The result of the program should look like below. This is compiled with BSTC.

c:\Propeller\SpinSocket>bstc -d com6 -p0 -Ograux -L c:\bstc\spin TestW25q.spin
Brads Spin Tool Compiler v0.15.4-pre3 - Copyright 2008,2009 All rights reserved
Compiled for i386 Win32 at 19:57:58 on 2010/01/15
Loading Object TestW25q
Loading Object gw25qx2
Loading Object FullDuplexSingleton
Program size is 7320 longs
2 Constants folded
Compiled 771 Lines of Code in 0.015 Seconds
We found a Propeller Version 1
Propeller Load took 2.142 Seconds

c:\Propeller\SpinSocket>uterm com6 115200

MFG ID   EF
DEV TYPE 40
erase sector
read byte 000000FF
read word 0000FFFF
read long FFFFFFFF
fill buffer
prog buffer
clear buffer
read byte 55030407
read word 00000155
read long 03020155
read buffer
show buffer

080800 55 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
080810 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
080820 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
080830 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
080840 40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
080850 50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
080860 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F
080870 70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F
080880 80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
080890 90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
0808A0 A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
0808B0 B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF
0808C0 C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
0808D0 D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 DA DB DC DD DE DF
0808E0 E0 E1 E2 E3 E4 E5 E6 E7 E8 E9 EA EB EC ED EE EF
0808F0 F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE FF
080900 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
080910 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
080920 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
080930 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
080940 40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
080950 50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
080960 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F
080970 70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F
080980 80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
080990 90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
0809A0 A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
0809B0 B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF
0809C0 C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
0809D0 D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 DA DB DC DD DE DF
0809E0 E0 E1 E2 E3 E4 E5 E6 E7 E8 E9 EA EB EC ED EE EF
0809F0 F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE FF
test complete.
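
For anyone checking their own board against the dump above, the expected listing can be regenerated on a PC. The sketch below rebuilds the pattern visible in the output (a counting sequence across two 256-byte pages, with 0x55 as the first byte) and formats it the same way. The function name is hypothetical, and this is not part of TestW25q.spin.

```python
def format_dump(base_addr, data, width=16):
    """Format bytes as lines of '<6-digit hex address> <hex bytes>'."""
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        hex_bytes = " ".join(f"{b:02X}" for b in row)
        lines.append(f"{base_addr + offset:06X} {hex_bytes}")
    return lines


# Pattern as it appears in the dump: 00..FF twice, first byte set to 0x55.
pattern = bytearray(i & 0xFF for i in range(512))
pattern[0] = 0x55

dump = format_dump(0x080800, pattern)
```

Comparing `dump` line by line with the terminal capture is a quick way to spot a stuck data pin or swapped nibbles.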

User Name · 2011-06-05 14:00

Leon wrote: »

Heater has shown that the XMOS way of doing things with hardware threads is identical to the Propeller and its cogs interacting with hub memory.

ZPU is an effort to create a general-purpose C engine out of an embedded controller. No flavor of heater's ZPU requires or even benefits from exact timing. 10 ns here or there wouldn't be noticed. So, for his purposes the two approaches are 'identical' even though they aren't really identical. XMOS themselves admit their processors are predictable rather than deterministic, and that I/O activity makes prediction more difficult.

The Propeller is first and foremost an embedded controller. In that application it excels. That it can be pressed into other service with adequate results is interesting/amusing/curious. You pick.

Leon · 2011-06-05 15:49

It's nothing to do with ZPU! He showed that the two architectures were equivalent.

Where has XMOS stated that their processors are non-deterministic, and that I/O activity causes problems?

User Name · 2011-06-07 05:26

I think you are being thick-headed intentionally. Nevertheless I'll state it one more time. ZPU and many other applications don't require complete synchrony and precise timing. For those applications XMOS time slicing is equivalent to individual CPUs. When edges and pulse durations need to be superimposable - which is often the case in embedded control apps (for which the Propeller was designed) - then the two are NO LONGER equivalent, at least as implemented by XMOS and Parallax, respectively.

I did not use the words "causes problems." Once again, you are choosing to prevaricate. I said that I/O activity makes the job of timing prediction more difficult. It is axiomatic that in those applications which do not require prediction analysis, like ZPU, no difficulty of this particular sort is encountered.

Leon · 2011-06-07 06:23

You still haven't said where XMOS has stated that their processors are non-deterministic. Where has I/O timing caused problems for XMOS users?

The XMOS architecture maintains determinism (including I/O) across threads, cores and chips, connected by channels and XLinks. That can't be done with any other architecture, AFAIK.

K2 · 2011-06-07 07:03

I don't pretend to know anything about XMOS, but I was interested in this discussion. It took about 20 seconds to find on an XMOS forum this interesting post:

Strangely enough I agree with you "in most cases" and that we should stop flogging a dead horse with this debate. We just have a slightly different perspective on this determinism issue.

I have to attempt one last lash because in the limit determinism vaporizes, imagine:

1) One of my threads is required to suck bits in or blow bits out as fast as a single core will go. This might end up as a software-timed thing, as setting up timers and such takes extra instructions. As a trivial case, imagine implementing an 8-bit comparator, two times 8 bits in, one bit out, working at the performance limit of a core.

2) All is well until I decide I need to make use of another thread for some other task, perhaps adopting some nice open source object whose internals I know nothing about.

Boom, if that extra functionality requires I now use more than 4 threads my original task starts to fail.

Of course the reverse is true, I might want to adopt some existing code but it won't fly because of what my application is doing.

The point is that given an "object A" and an "object B" that work perfectly well by themselves I can't be sure that they will work together without analysing the combination. The determinism of one is intimately linked to the other. I cannot look at my nice comparator code in isolation and say that it will work in all applications all the time.

Now, as you say, in most cases this should not be an issue when working well within performance limits and using timers to straighten things out.

The xcores do indeed do a far better job at this than a traditional MCU with interrupts etc. However, it's not quite the determinism you would expect from having 8 separate cores instead of 8 threads in one core.

It is only this little quibble that causes me to question when it is claimed that the xcores have 100% execution determinism.

Curiously, the author of this post was some guy named Heater.
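
The failure mode in the quoted post can be put into numbers with a very rough model: a core that issues one instruction per clock, shared round-robin among the active threads, with no single thread able to issue more often than once every four clocks. The four-clock limit comes from the four-phase description earlier in the thread. The 400 MHz clock below is purely a hypothetical value for illustration.

```python
def per_thread_mips(core_mhz, active_threads, pipeline_depth=4):
    """Approximate instruction rate (MIPS) available to each thread.

    Assumes strict round-robin issue and that one thread can issue at
    most once every `pipeline_depth` clocks.
    """
    if active_threads < 1:
        raise ValueError("need at least one active thread")
    return core_mhz / max(pipeline_depth, active_threads)


CORE_MHZ = 400  # hypothetical clock, not taken from the thread
rates = {n: per_thread_mips(CORE_MHZ, n) for n in range(1, 9)}
```

Under these assumptions a thread tuned to run flat out with four or fewer threads active loses a fifth of its speed when a fifth thread is added, which is the "object A plus object B" problem the post describes.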
