Using the DE0 as an embedded 140MHz P1V module with 32MB SDRAM and ADC + SHIELD
Peter Jakacki
It seems to me that the DE0 makes a great little Propeller P1V demonstration board for embedding in projects. Certainly the faster speed helps, and port B could be made to work with the internal ADC plus the buttons/switches/accelerometer, as well as bringing a few extra pins out.
Now, looking at the SDRAM, I'm wondering whether, besides implementing the controller, it might be possible to map it into the hub memory space, at least as byte/word memory. Of course I could be totally off here, and I still have much research to do on the matter.
I am looking at an adapter PCB that, with the clear cover removed, will plug in over the top to add microSD, Ethernet, and VGA. I think it's also possible to solder another header on the top side for the A/D+I/O connector so that it will mate up with that "shield". The clear cover would be replaced by this shield, of course.
Any thoughts?
Comments
I know the -7 device is rated for a 7ns clock cycle (143MHz at CL=3), just enough for 140MHz, but I will look into the overall timing.
Because of the symmetry of the two 40-pin connectors, it may be worth designing boards that can be 'halved' and potentially mixed and matched. E.g. one connector has almost all the signals required for 4 DACs, so you could plug in VGA, Component, or amplified DAC (audio etc.) signals on that side depending on what was required, perhaps even an HDMI transmitter. You could put uSD and Ethernet on the other side. There are a couple of pesky signals that cross over (reset, and a few Blue channel LSBs, from memory), but that's it, and it may be possible to re-route Reset across to the input-only pins adjacent to P0 and P1.
Regarding the analog header on the back, in some cases just connecting an IDC ribbon may be useful enough, but some signal conditioning would be great, if you do a board.
I have looked into the SDRAM on this board (as a theoretical exercise only). By reading through the data sheet for this memory and doing some analysis I have concluded that if you run it at 2x the prop clock you will get up to 3 independent access cycles per hub cycle at its maximum rated speed of 143MHz for the -7 part on this board. Each access could read/write up to 2 longs at a time with CAS latency set to 3. Note that slower clocked props could allow more COGs to share the memory.
You could carve this bandwidth up in different ways if you design your memory controller accordingly, for example splitting the three access slots per hub cycle at 143MHz between different COGs or a dedicated video fetch.
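Just to show where those numbers come from, here's a rough back-of-the-envelope calc in Python. The ~10-clock cost per access is an assumed round figure (activate + tRCD + read/write command + CL=3 + a 4-word burst), not a datasheet number, so treat it as a sketch only:

# Rough SDRAM bandwidth arithmetic for a P1V on the DE0, assuming:
#  - SDRAM clocked at 2x the prop clock (143MHz SDRAM, ~71.5MHz prop)
#  - 16-bit data bus, CAS latency 3
#  - each access moves 4 words (2 longs) in a short burst
SDRAM_MHZ       = 143.0
PROP_MHZ        = SDRAM_MHZ / 2           # prop at half the SDRAM clock
HUB_PROP_CLKS   = 16                      # a COG's hub slot comes around every 16 prop clocks
HUB_SDRAM_CLKS  = HUB_PROP_CLKS * 2       # = 32 SDRAM clocks per hub cycle

CLKS_PER_ACCESS  = 10                     # assumed: ACT + tRCD + RD/WR + CL3 + 4-word burst
WORDS_PER_ACCESS = 4                      # 4 x 16 bits = 2 longs

accesses_per_hub = HUB_SDRAM_CLKS // CLKS_PER_ACCESS
bytes_per_sec    = accesses_per_hub * WORDS_PER_ACCESS * 2 * (PROP_MHZ * 1e6 / HUB_PROP_CLKS)

print("SDRAM clocks per hub cycle:", HUB_SDRAM_CLKS)
print("Independent accesses/hub  :", accesses_per_hub)              # -> 3
print("Approx. burst bandwidth   : %.1f MB/s" % (bytes_per_sec / 1e6))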
It's a pity Terasic didn't fit the -6 (6ns, 166MHz-rated) SDRAM to this board, which would have allowed a nice regular 80MHz prop clock to be used. Though perhaps the DE0's SDRAM can be overclocked a bit to 160MHz if you get lucky.
Personally I like the idea of a model where one COG can have a large hubexec space in SDRAM, shared with a hi-res video buffer read by a dedicated FPGA video controller, while the other COGs use hub SRAM for driver-type functions. That's what I am hoping can be achieved on the DE0-Nano or other similar boards.
A problem with SDRAM is that it is not designed for very fast random access; it does better in bursts.
It takes a number of clock cycles to issue the commands, present the multiplexed address, and meet the latency requirements.
Looking at one SDRAM example, a read seems to need a minimum of 6+ clocks (row activate, RAS-to-CAS delay, read command, then CAS latency), so that sets the possible random read rate.
You could likely keep up with one COG's needs, but if all 8 wanted the memory at the same time, someone has to wait. (Then there is refresh to think about.)
It may be enough to design the SDRAM interface for one-COG use?
Then the challenge would be to design an SDRAM state engine that can manage read/write/refresh within 16 clocks.
To make use of the bandwidth, a 4-clock R/W burst might be best for a 64-bit transfer, supported by a tiny FIFO.
I think that needs about 10 clocks for 64 bits, which leaves 6 clocks to get to [16]; but there can be some overlap of commands, so maybe [Rd+refresh] or [Wr+refresh] can fit into 16 clocks and so 'appear' as random SRAM?
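To put rough numbers on that, here's a quick clock-budget sketch in Python, assuming prop clock = SDRAM clock and round-figure timings (CL=3, tRCD=3, tRP=3, tRFC=9 clocks) rather than values from a specific datasheet:

# Illustrative budget for "read 64 bits + refresh inside 16 clocks",
# assuming prop clock = SDRAM clock. Timings below are assumed round figures.
CL, tRCD, tRP, tRFC = 3, 3, 3, 9
BURST = 4                                              # 4 x 16-bit words = 64 bits

read_clks    = 1 + (tRCD - 1) + 1 + (CL - 1) + BURST   # ACT, wait, READ, wait, data
precharge    = tRP                                     # close the row afterwards
refresh_clks = tRFC                                    # AUTO REFRESH + recovery time

print("64-bit read burst :", read_clks, "clks")        # ~10, matching the estimate above
print("... plus precharge:", read_clks + precharge)
print("... plus refresh  :", read_clks + precharge + refresh_clks, "clks if fully serialised")
# Fully serialised it overruns 16, so the command overlap (or refreshing an
# idle bank) really is what would make it 'appear' as random SRAM.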
Yes, if you make the prop clock = SDRAM clock you will only get 16 clocks per hub cycle instead of the 32 in my examples above, which will basically restrict you to one random access per hub cycle. You may possibly be able to refresh another bank in parallel, but if sustained accesses are made by the COG it could become a problem to refresh the currently accessed bank. It can get rather complex. It's better when you have more SDRAM clocks to play with per hub cycle, but then you need to clock the prop slower than the memory.
Another issue to consider is that once you are locked to the hub cycle, for compatibility with critically timed code you want to be able to deliver the byte/word/long within 7 prop clock cycles to behave the same as the real P1 hardware. That is going to be difficult to achieve with 16-bit-wide SDRAM if you clock it at the same rate as the prop and use CL=3, because the activate/read/CAS-latency sequence alone takes about 7 clocks, without any extra overhead cycles you might encounter somewhere in the pipeline. Once you clock the prop at 1/2 the SDRAM rate, the timing frees up enormously.
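A quick sketch of that hub-window argument, with tRCD assumed to be 3 clocks alongside CL=3 (round figures for illustration only):

# How many prop clocks it takes for SDRAM to return the first data word,
# versus the ~7 prop clock window a real P1 hub access gives you.
HUB_WINDOW_PROP_CLKS = 7
CL, tRCD = 3, 3                       # assumed SDRAM timings, in SDRAM clocks
ACCESS_SDRAM_CLKS = 1 + tRCD + CL     # ACTIVE -> READ -> first data word

for ratio in (1, 2):                  # SDRAM clock = ratio x prop clock
    prop_clks = ACCESS_SDRAM_CLKS / ratio
    verdict = "some headroom" if prop_clks < HUB_WINDOW_PROP_CLKS else "no headroom at all"
    print("SDRAM at %dx prop clock: ~%.1f prop clocks to first data -> %s"
          % (ratio, prop_clks, verdict))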
Yes, that may be a better initial target - and maybe the burst length can be bumped a little to compensate for the lost speed?
(becomes like a nano-cache)
Yes, there is flexibility in that too. If you run the prop at 1/2 the SDRAM speed, I figured you can do 1 dedicated refresh + a 256-bit burst access, using a burst-terminated page-mode transfer, on each hub cycle for a single-COG-access implementation. This could then rapidly start to fill an SRAM-based cache line in the background, for example (8 P1 instructions per hub cycle). Though the hub access would still be the bottleneck, so caching may not buy too much for a single-COG implementation unless you can return from the hub cycle early with cached data. Multi-COG SDRAM access may be a different story...
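For what it's worth, here's the rough clock budget behind that figure, using assumed round-number timings (CL=3, tRCD=3, tRFC=9 SDRAM clocks) and assuming the precharge can be overlapped with the tail of the burst; the real margins would need checking against the datasheet:

# Refresh + 256-bit burst-terminated page-mode read per hub cycle,
# with the SDRAM at 2x the prop clock (32 SDRAM clocks per 16-clock hub slot).
HUB_SDRAM_CLKS = 32
CL, tRCD, tRFC = 3, 3, 9                                # assumed timings in SDRAM clocks
BURST_WORDS    = 16                                     # 16 x 16 bits = 256 bits = 8 longs

refresh = tRFC                                          # dedicated auto-refresh slot
access  = 1 + (tRCD - 1) + 1 + (CL - 1) + BURST_WORDS   # ACT, wait, READ, wait, data
total   = refresh + access                              # precharge assumed hidden under the burst tail

print("clocks used :", total, "of", HUB_SDRAM_CLKS)     # ~31 of 32, tight but plausible
print("longs/hub   :", BURST_WORDS // 2)                # 8 P1 instructions per hub cycle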