Probably a stupid question ... 2xQuadSPI support. — Parallax Forums

jazzed Posts: 11,803
edited 2011-06-23 11:47 in Propeller 1
Can any MCU besides Propeller read/write 2 QuadSPI chips simultaneously?

Comments

  • User Name Posts: 1,451
    edited 2011-06-04 13:27
    Well some of the Blackfin chips from Analog Devices have several of what they call SPORT ports. Seems like each SPORT is capable of more than one SPI transaction at a time. But don't trust ADI's dodgy drivers to make it all happen for you. You might just be spending months making everything work in synchrony. This is in stark contrast to the Propeller, which might take 1/2 hour to so equip.

    BTW, the Blackfin is also quite a bit more expensive. Its IDE is infinitely more expensive. :)
  • Leon Posts: 7,620
    edited 2011-06-04 13:39
    XMOS, of course.
  • User Name Posts: 1,451
    edited 2011-06-04 13:42
    Leon wrote: »
    XMOS, of course.

    Curiously, Leon, the same thing applies here: "You might just be spending months making everything work in synchrony."
  • Leon Posts: 7,620
    edited 2011-06-04 13:57
    Why? XMOS devices are fully deterministic, and equally demanding interfaces have been developed very quickly. If there are timing problems, the XMOS Timing Analyzer tool will solve them very quickly.
  • jmg Posts: 15,185
    edited 2011-06-04 14:08
    jazzed wrote: »
    Can any MCU besides Propeller read/write 2 QuadSPI chips simultaneously?

    The Atmel XMega's timers claim to do this, and some other vendors have state-engine support around the timers, so the support is not always called Quadrature Counting.

    Then you have the asymmetric dual cores, like NXP's M0/M4 pairing and Freescale's XGATE.

    Or, you can use a CPLD, if you need speed and precision...
  • User Name Posts: 1,451
    edited 2011-06-04 14:14
    Oh, no. It has nothing to do with determinism nor the inherent quality and power of XMOS. It has to do with the "very quickly" part. Perhaps it would be quick for David May to implement. For lesser mortals and particularly the uninitiated, it's almost insurmountable.
  • jazzed Posts: 11,803
    edited 2011-06-04 14:19
    Like I said, "probably a stupid question" ...

    Leon and jmg, can you point to specific demo code or app notes where 2 QuadSPI chips are used in parallel to form an 8-bit bus, with a shared chip select and clock, that the processor can read and write? This description is a little different from the first post, but it is more precisely what I meant.

    @User Name, the code to do this for Propeller was indeed a very fast development effort.
    I can post my examples if that helps anyone.
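
A note on the 8-bit bus arrangement described above: with two QuadSPI chips sharing one chip select and one clock, each chip drives four of the eight data pins, so every clock edge moves one full byte. Here is a minimal Python sketch of how data would be split across the two chips. Which chip holds the low nibble is an assumption made for illustration, and the names `split_for_two_chips` and `merge_from_two_chips` are hypothetical; the thread does not spell out the actual driver's layout.

```python
def split_for_two_chips(data: bytes) -> tuple[list[int], list[int]]:
    """Split each byte into two 4-bit values, one per flash chip.

    Assumption: chip A sits on data pins P0..P3 (low nibble) and
    chip B on P4..P7 (high nibble).
    """
    chip_a = [b & 0x0F for b in data]
    chip_b = [b >> 4 for b in data]
    return chip_a, chip_b


def merge_from_two_chips(chip_a: list[int], chip_b: list[int]) -> bytes:
    """Rebuild bytes from the nibbles both chips present on the same clock."""
    return bytes((hi << 4) | lo for lo, hi in zip(chip_a, chip_b))


if __name__ == "__main__":
    a, b = split_for_two_chips(bytes([0x55, 0xA3]))
    print(a, b)                              # [5, 3] [5, 10]
    print(merge_from_two_chips(a, b).hex())  # 55a3
```

Since both chips see the same clock and chip select, the processor only ever handles whole bytes on the port; the nibble split matters when preparing the image that gets programmed into each chip.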
  • jmg Posts: 15,185
    edited 2011-06-04 14:25
    jazzed wrote: »
    Leon and jmg, can you point to specific demo code or app notes where 2 QuadSPI chips are being used in parallel to form an 8 bit bus with a chip select and clock that can be read/written by the processors? This description is a little different from the first post, but this is more precisely what I meant.

    Oops, I misread the QuadXXX as Quad counting. Brain fade.

    QuadSPI is new on microcontrollers, but ISTR an upper-end NXP variant claiming dual QuadSPI capability?
    More suppliers are starting to think about the Execute-In-Place that QuadSPI can allow, but I have not seen QuadSPI SRAMs yet.
    Cypress had something in a press release, but no devices yet?
  • Leon Posts: 7,620
    edited 2011-06-04 14:35
    Why should development be any slower with XMOS than with the Propeller? One reason why XMOS chips are so popular is that development is typically very fast, with applications being completed far more quickly than is possible with other solutions.

    NXP has the LPC1800 Cortex-M3 device with a QuadSPI interface, and the dual-core LPC4000 devices also have one. They run at 40 MHz.

    Jazzed,

    I didn't say that QuadSPI devices had been interfaced to an XMOS device, just that it should be feasible.

    What are the QuadSPI devices that you interfaced to the Propeller? What speed do you get?
  • jazzed Posts: 11,803
    edited 2011-06-04 14:38
    Leon wrote: »
    Why should it be any slower with XMOS than with the Propeller? One reason why XMOS chips are so popular is that development is typically very fast, with applications being completed far more quickly than is possible with other solutions.

    So you don't intend to favor us with an example or app note even in a http link?
  • jazzed Posts: 11,803
    edited 2011-06-04 14:46
    Leon wrote: »
    Why should development be any slower with XMOS than with the Propeller? One reason why XMOS chips are so popular is that development is typically very fast, with applications being completed far more quickly than is possible with other solutions.

    NXP has the LPC1800 Cortex-M3 device with a QuadSPI interface, and the dual-core LPC4000 devices also have one.

    Jazzed,

    I didn't say that QuadSPI devices had been interfaced to an XMOS device, just that it should be feasible.

    What are the QuadSPI devices that you interfaced to the Propeller?

    That helps :) Except that the LPC1800 has only one QuadSPI interface.
    LPC4000 has two QuadSPI interfaces, but it is unclear if they can use 2 chips for simultaneous access.

    I'm using 2 W25Qxx devices. Rayman has drivers for the ST series SQI devices.
  • Leon Posts: 7,620
    edited 2011-06-04 14:47
    http://ics.nxp.com/products/lpc4000/all/~LPC4322/

    http://ics.nxp.com/literature/leaflets/microcontrollers/pdf/lpc18xx.pdf

    What speed are you getting?

    What are the actual devices you used? I'll order a couple and see what I can do with one of my XMOS boards. Digi-Key seems to be the only source for the Winbond parts.
  • jazzed Posts: 11,803
    edited 2011-06-04 15:02
    Leon wrote: »
    Cross-posted. See previous reply.
    Leon wrote: »
    What speed are you getting?

    The maximum data rate from pins to hub, or vice versa, for the Propeller in one COG is clkfreq/16. An ARM with a special interface would trump that pretty easily.

    Anyway, as far as I can tell no other MCU can do what I described yet :)

    You have the part numbers Leon. Let me know when you're finished.
  • Leon Posts: 7,620
    edited 2011-06-04 15:07
    Is that 5 MHz with an 80 MHz clock? The ARMs get 40 MHz.

    I see that Digi-Key has plenty of the W25Q16BVSSIG, I'll get those. XMOS has an interesting new chip that Digi-Key should have in stock in a few days, so I'll make the order up with some of those to save on shipping costs.
  • jazzed Posts: 11,803
    edited 2011-06-04 15:17
    Leon wrote: »
    What is that in MHz. The ARMs get 40 MHz.

    I see that Digi-Key has plenty of the W25Q16BVSSIG, I'll get those.

    That's what I'm using on SpinSocket-Flash
    I'm getting about 48Mbps read rate at 96MHz.
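
The figures quoted in this exchange are consistent with each other: at clkfreq/16 bytes per second over the 8-bit bus, a 96 MHz Propeller moves 6 MBytes/s, which is 48 Mbps, and an 80 MHz one moves 5 MBytes/s. A small Python sketch of that arithmetic follows. The function names are hypothetical, and the single-chip figure assumes 4 bits per clock with no command or address overhead, so treat it as a peak rate only.

```python
def two_chip_read_rate(clkfreq_hz: int, clocks_per_byte: int = 16) -> tuple[float, float]:
    """Return (bytes/s, bits/s) for the 8-bit bus made of two QuadSPI chips.

    clocks_per_byte=16 is the pins-to-hub limit quoted in the thread
    for a single COG.
    """
    bytes_per_s = clkfreq_hz / clocks_per_byte
    return bytes_per_s, bytes_per_s * 8


def single_quad_spi_rate(sclk_hz: int) -> float:
    """Peak bytes/s for one QuadSPI chip moving 4 bits per clock.

    Assumption: command and address overhead are ignored.
    """
    return sclk_hz * 4 / 8


if __name__ == "__main__":
    print(two_chip_read_rate(96_000_000))    # (6000000.0, 48000000.0)
    print(two_chip_read_rate(80_000_000))    # (5000000.0, 40000000.0)
    print(single_quad_spi_rate(40_000_000))  # 20000000.0
```

Note that the "5 MHz" and "40 MHz" numbers being compared are not the same kind of quantity: one is a byte rate on an 8-bit bus, the other a serial clock on a 4-bit interface.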
  • User Name Posts: 1,451
    edited 2011-06-04 20:00
    Near as I can tell, the Grand Vision of XMOS is of a sea of processors seamlessly connected via serial links of some sort, and capable of adapting to most of the world's computational needs. Inasmuch as this becomes somewhat difficult to achieve in practice, most XMOS implementations so far seem to involve a single processor internally time sliced into four or eight pieces, creating multiple virtual processors, each with its own execution thread. This is an interesting alternative to the Propeller, which actually has eight physical CPUs.

    Whether it is better to impose time-sharing on a single core or leave the allocation of processor resources up to the programmer (as is done in conventional single-core microcontrollers) is an interesting question. No doubt the quest for determinism is part of the logic behind the XMOS approach. Yet the more external events an XMOS chip has to handle, the more this determinism suffers, time-slicing notwithstanding. XMOS users point out that the jitter is small, and much less than on conventional interrupt-driven processors. Meanwhile, XMOS, Inc. prefer to dispense with the word 'deterministic' entirely and replace it with 'predictable'.

    So, the good news is that we have lots of choices! So much depends on the application. IMHO, there isn't a processor in the world that makes perfect clocking of complex I/O as simple to achieve as the Propeller. If what you really need are multiple CPUs, nothing beats multiple CPUs. The miracle that Chip seems to have achieved is finding the right degree of coupling between them - at least for embedded control applications.

    In reference to the quest of the OP, I'm sure a single-core XMOS chip can deliver adequate timing, ultimately. In the process, the XMOS chip will gain useful external storage and this storage might benefit heater's XZPU. So it's all good. It would be even better if the entry fee to give it a whirl weren't so high.
  • Leon Posts: 7,620
    edited 2011-06-04 21:20
    Heater has shown that the XMOS way of doing things with hardware threads is identical to the Propeller and its cogs interacting with hub memory.
  • Heater. Posts: 21,230
    edited 2011-06-04 22:11
    User Name,

    A processor might have four phases of execution of a single instruction, like:
    1) Instruction Decode
    2) Register Read
    3) Execute
    4) Result write

    So it takes 4 clocks to dispatch one instruction. How can we speed this up?
    Use a pipeline: have 4 instructions in flight through the CPU at a time. This is a common approach but suffers from pipeline stalls when you hit a jump instruction. All of a sudden you have to flush the pipe and start again. This impacts performance and execution determinism.

    Or you could have 4 threads being executed simultaneously, at any given moment an instruction from each thread is in some phase of execution.

    This gives you deterministic timing, and keeps your CPU pipe full at all times for maximum performance.

    This is what XMOS does and also IBM in their 1.6 Billion transistor 16 core, 256 thread PowerEN chip.
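
To put rough numbers on the two approaches in this post, here is a deliberately simplified Python model. It assumes the four-phase instruction described above, a flush penalty of three clocks per jump on the pipelined core, and strict round-robin issue on the interleaved core. The function names and the penalty figure are illustrative assumptions, not measurements of any real chip.

```python
STAGES = 4  # decode, register read, execute, result write


def pipelined_clocks(program: list[str]) -> int:
    """Clocks for one thread on a classic 4-stage pipeline.

    Simplified model: one instruction issues per clock, but a jump
    ("J") flushes the pipe and costs STAGES - 1 extra clocks.
    """
    clocks = STAGES - 1  # initial pipeline fill
    for op in program:
        clocks += 1
        if op == "J":
            clocks += STAGES - 1
    return clocks


def interleaved_clocks(program: list[str]) -> int:
    """Clocks for one thread when 4 threads share the stages round-robin.

    Each thread issues once every STAGES clocks, so its previous
    instruction has already finished and a jump never causes a flush.
    """
    return STAGES * len(program)


if __name__ == "__main__":
    straight = ["A"] * 8      # no jumps
    branchy = ["A", "J"] * 4  # same length, half of it jumps
    print(pipelined_clocks(straight), pipelined_clocks(branchy))      # 11 23
    print(interleaved_clocks(straight), interleaved_clocks(branchy))  # 32 32
```

In this model the interleaved thread is slower on its own, but its timing does not depend on what the code does, and four such threads together still retire one instruction per clock.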
  • jmg Posts: 15,185
    edited 2011-06-04 22:39
    Leon wrote: »
    XMOS has an interesting new chip that Digi-Key should have in stock in a few days

    Do you mean the TQFP48 packaged XS1, or something that really is an interesting new chip?
  • Leon Posts: 7,620
    edited 2011-06-05 00:57
    That's the one. It's interesting in that they seem to be adopting Microchip's very successful strategy of making a wide range of similar devices that enable substantial cost savings to be made when manufacturing products on a large scale. Why pay for features that aren't needed?

    I placed my Digi-Key order yesterday, I should get the Winbond Quad SPI chips on Wednesday.

    I've designed a breakout board for the chip, the schematic and PCB layout are attached. It's intended for use with a 0.64 mm square header and socket.

    I could get a batch of boards made, if anyone is interested.
    [Attached images: schematic and PCB layout, 1024 x 493 each]
  • hinv Posts: 1,255
    edited 2011-06-05 08:08
    Jazzed,
    Have you posted the code that can do it on the propeller? How many cogs does it take?
    What performance can you get from it?
    Is the nibble mode on SD cards the same?

    Thanks,
    Doug
  • Leon Posts: 7,620
    edited 2011-06-05 08:13
    Jazzed hasn't posted any code, AFAIK.

    He mentioned the performance. It's rather slow at 5 MHz for an 80 MHz clock, with NXP achieving 40 MHz with their ARM Cortex devices. They do it in hardware, though.

    Nibble mode on an SD card is completely different, see the data sheet.
  • jazzed Posts: 11,803
    edited 2011-06-05 10:10
    I've posted code for this in a couple of threads already. MIT code is attached here.

    The interface needs 10 pins with P0..7 for data and 2 pins for CS/CLK.
    A single COG programming and buffered read driver is available.
    Flash to HUB read performance is the best achievable with Propeller.
    SpinSocket-Flash modules run at 96MHz clkfreq so buffer reads are 6 MBytes/s.
    The idea was originally introduced in the SpinSocket thread.
    Details of how the driver works and rough code is posted here.

    The SpinSocket-Flash modules are available with 4MB of Winbond W25Q flash.
    SpinSocket is a 32 pin DIP socket stackable module series.
    A 256KB SRAM board and several SpinSocket-Flash versions are available.
    I've been working on a driver for a SpinSocket-Flash/SRAM module.
    The Flash/SRAM module will have 1 QuadSPI Flash and 1 single bit SPI SRAM.

    XBASIC is available for running programs that are stored in Flash.
    XBASIC can also run on any simple Propeller configuration.
    XBASIC is an ongoing development by David Betz that targets many MCUs.
  • Leon Posts: 7,620
    edited 2011-06-05 10:32
    Thanks for the code, Steve. I'll use it to check my board with a Propeller.
  • jazzed Posts: 11,803
    edited 2011-06-05 10:42
    Leon wrote: »
    Thanks for the code, Steve. I'll use it to check my board with a Propeller.
    Here is a very simple test program. It does not try to program more than one sector, but that ability has been proven with other code. The output should look like the listing below. This was compiled with BSTC.
    c:\Propeller\SpinSocket>bstc -d com6 -p0 -Ograux -L c:\bstc\spin TestW25q.spin
    Brads Spin Tool Compiler v0.15.4-pre3 - Copyright 2008,2009 All rights reserved
    Compiled for i386 Win32 at 19:57:58 on 2010/01/15
    Loading Object TestW25q
    Loading Object gw25qx2
    Loading Object FullDuplexSingleton
    Program size is 7320 longs
    2 Constants folded
    Compiled 771 Lines of Code in 0.015 Seconds
    We found a Propeller Version 1
    Propeller Load took 2.142 Seconds
    
    c:\Propeller\SpinSocket>uterm com6 115200
    
    MFG ID   EF
    DEV TYPE 40
    erase sector
    read byte 000000FF
    read word 0000FFFF
    read long FFFFFFFF
    fill buffer
    prog buffer
    clear buffer
    read byte 55030407
    read word 00000155
    read long 03020155
    read buffer
    show buffer
    
    080800 55 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
    080810 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
    080820 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
    080830 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
    080840 40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
    080850 50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
    080860 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F
    080870 70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F
    080880 80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
    080890 90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
    0808A0 A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
    0808B0 B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF
    0808C0 C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
    0808D0 D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 DA DB DC DD DE DF
    0808E0 E0 E1 E2 E3 E4 E5 E6 E7 E8 E9 EA EB EC ED EE EF
    0808F0 F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE FF
    080900 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
    080910 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
    080920 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
    080930 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
    080940 40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
    080950 50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
    080960 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F
    080970 70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F
    080980 80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
    080990 90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
    0809A0 A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
    0809B0 B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF
    0809C0 C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF
    0809D0 D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 DA DB DC DD DE DF
    0809E0 E0 E1 E2 E3 E4 E5 E6 E7 E8 E9 EA EB EC ED EE EF
    0809F0 F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE FF
    test complete.
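
For anyone checking the listing by hand: the `read word` and `read long` values printed after programming are the first bytes of the buffer dump (55 01 02 03) assembled in little-endian order, which is how the Propeller stores words and longs in hub memory. A quick Python check, with the helper name `little_endian` being a hypothetical one chosen for this sketch:

```python
def little_endian(data: bytes) -> int:
    """Assemble bytes into an integer, least significant byte first."""
    return int.from_bytes(data, "little")


if __name__ == "__main__":
    first_bytes = bytes([0x55, 0x01, 0x02, 0x03])  # start of the dump at 080800
    print(f"read word {little_endian(first_bytes[:2]):08X}")  # read word 00000155
    print(f"read long {little_endian(first_bytes[:4]):08X}")  # read long 03020155
```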
    
  • User Name Posts: 1,451
    edited 2011-06-05 14:00
    Leon wrote: »
    Heater has shown that the XMOS way of doing things with hardware threads is identical to the Propeller and its cogs interacting with hub memory.

    ZPU is an effort to create a general-purpose C engine out of an embedded controller. No flavor of heater's ZPU requires or even benefits from exact timing. 10 ns here or there wouldn't be noticed. So, for his purposes the two approaches are 'identical' even though they aren't really identical. XMOS themselves admit their processors are predictable rather than deterministic, and that I/O activity makes prediction more difficult.

    The Propeller is first and foremost an embedded controller. In that application it excels. That it can be pressed into other service with adequate results is interesting/amusing/curious. You pick.
  • Leon Posts: 7,620
    edited 2011-06-05 15:49
    It's nothing to do with ZPU! He showed that the two architectures were equivalent.

    Where has XMOS stated that their processors are non-deterministic, and that I/O activity causes problems?
  • User Name Posts: 1,451
    edited 2011-06-07 05:26
    I think you are being thick-headed intentionally. Nevertheless I'll state it one more time. ZPU and many other applications don't require complete synchrony and precise timing. For those applications XMOS time slicing is equivalent to individual CPUs. When edges and pulse durations need to be superimposable - which is often the case in embedded control apps (for which the Propeller was designed) - then the two are NO LONGER equivalent, at least as implemented by XMOS and Parallax, respectively.

    I did not use the words "causes problems." Once again, you are choosing to prevaricate. I said that I/O activity makes the job of timing prediction more difficult. It is axiomatic that in those applications which do not require prediction analysis, like ZPU, no difficulty of this particular sort is encountered.
  • Leon Posts: 7,620
    edited 2011-06-07 06:23
    You still haven't said where XMOS has stated that their processors are non-deterministic. Where has I/O timing caused problems for XMOS users?

    The XMOS architecture maintains determinism (including I/O) across threads, cores and chips, connected by channels and XLinks. That can't be done with any other architecture, AFAIK.
  • K2 Posts: 693
    edited 2011-06-07 07:03
    I don't pretend to know anything about XMOS, but I was interested in this discussion. It took about 20 seconds to find on an XMOS forum this interesting post:
    Strangely enough I agree with you "in most cases" and that we should stop flogging a dead horse with this debate. We just have a slightly different perspective on this determinism issue.

    I have to attempt one last lash because in the limit determinism vaporizes, imagine:

    1) One of my threads is required to suck bits in or blow bits out as fast as a single core will go. This might end up as a software timed thing as setting up timers and such is extra instructions. As a trivial case imagine implementing an 8 bit comparator, two times 8 bits in, one bit out, working on the performance limit of a core.

    2) All is well until I decide I need to make use of another thread for some other task, perhaps adopting some nice open source object whose internals I know nothing about.

    Boom, if that extra functionality requires I now use more than 4 threads my original task starts to fail.

    Of course the reverse is true, I might want to adopt some existing code but it won't fly because of what my application is doing.

    The point is that given an "object A" and an "object B" that work perfectly well by themselves I can't be sure that they will work together without analysing the combination. The determinism of one is intimately linked to the other. I cannot look at my nice comparator code in isolation and say that it will work in all applications all the time.

    Now, as you say, in most cases this should not be an issue. When working well within performance limits and using timers to straighten things out.

    The xcores do indeed do a far better job at this than a traditional MCU with interrupts etc. However it's not quite the determinism you would expect from having 8 separate cores instead of 8 threads in one core.

    It is only this little quibble that causes me to question when it is claimed that the xcores have 100% execution determinism.

    Curiously, the author of this post was some guy named Heater.
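
The failure mode in the quoted post can be put in numbers under a simple model: a four-stage core gives each thread at most one issue slot in four, and once more than four threads are active the slots are shared equally. A Python sketch of that model follows. The 400 MHz core clock and the function name are assumptions for illustration; neither is stated in the post itself.

```python
def per_thread_rate(core_hz: float, active_threads: int, stages: int = 4) -> float:
    """Instructions per second available to each thread.

    Assumed model: a thread can issue at most once every `stages`
    clocks, and with more than `stages` active threads the issue
    slots are divided equally among them.
    """
    if active_threads < 1:
        raise ValueError("need at least one active thread")
    return core_hz / max(stages, active_threads)


if __name__ == "__main__":
    for n in (1, 4, 5, 8):
        print(n, per_thread_rate(400e6, n))
    # 1 100000000.0
    # 4 100000000.0
    # 5 80000000.0
    # 8 50000000.0
```

Under these assumptions a thread tuned to run flat out with four or fewer threads active loses a fifth of its instruction rate the moment a fifth thread starts, which is the "object A plus object B" problem the post describes.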