Embedded FPGA options for P1?

David Betz · 2016-05-23 11:59

rogloh wrote: »

By the way, I just ran an experiment with the 10M25. I have 8 COGs and all my other extras - SDRAM/SRAM/Video/UFM FLASH/MUL/AutoIncOpcodes/LinkReg/PortB now fitting in 19760 of the 24960 LE’s (79%). Each COG takes about 1730 LEs in its own right including its two counters and ALU but not its video generators which I removed in my build.

Wow! Do you have a full description of your additions somewhere? Because of the large memory capacity, this might even outperform the P2 for some applications. What board do you recommend for experimenting with the 10M25?
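As a rough sanity check on the LE figures rogloh quotes, the budget can be worked through in a few lines. This is only a sketch: the constants come straight from the post, while the helper name `le_budget` and the "extra COGs" estimate are illustrative assumptions that ignore routing, timing, and any extra hub logic.

```python
# Figures quoted in the post above.
TOTAL_LES = 24960    # logic elements available on the 10M25
USED_LES = 19760     # LEs used by the 8-COG build with all extras
LES_PER_COG = 1730   # approximate LEs per COG (counters + ALU, no video)
NUM_COGS = 8


def le_budget(total=TOTAL_LES, used=USED_LES, per_cog=LES_PER_COG, cogs=NUM_COGS):
    """Break the reported LE usage into COG and non-COG parts (hypothetical helper)."""
    cog_les = per_cog * cogs
    free = total - used
    return {
        "utilisation_pct": 100.0 * used / total,
        "cog_les": cog_les,
        "non_cog_les": used - cog_les,          # hub plus extras, approximate
        "free_les": free,
        "extra_cogs_that_fit": free // per_cog,  # ignores routing and timing
    }


budget = le_budget()
for name, value in budget.items():
    print(name, value)
```

On these numbers the eight COGs account for roughly 70% of the used logic, with the hub and the listed extras taking the rest.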

jmg · 2016-05-23 23:49

David Betz wrote: »

... What board do you recommend for experimenting with the 10M25?

That's a moving target and 10M25 is somewhat new into the Board mix.

https://www.altera.com/products/fpga/max-series/max-10/design-tools.html#

There is a 10M50 showing for $125 (has HDMI), and no price showing yet on the nice looking 10M25 HyperMAX ?

rogloh · 2016-05-23 23:51

T Chap wrote: »

Thanks for the info on your tests. Does the FMax of 60 imply that code adapted for 80MHz on the P1 would have to be reworked to run at 60?

I think the MUL needs to be looked at. I recall that when I added it into my own codebase the Fmax suddenly dropped a lot. That is because it is a 32 bit x 32 bit multiplier made up of multiple parallel 18x18 DSP blocks whose results are summed together a couple of times sequentially. Inside the ALU path it all needs to happen in 1 P1V clock for the Quartus TimeQuest timing analyzer to be happy to say it meets timing. This seems to be pushing it a bit far at 80MHz, so if you break it up into smaller registered portions over two or three clocks it will no doubt speed up again.

I have also found that these Fmax numbers are reasonably conservative, as they are computed at the extreme temperature/process variation points, and you can typically run quite a bit faster (but no guarantees there).
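To illustrate why the single-cycle MUL is a long path, here is a minimal Python model of a 32x32 multiply assembled from narrower partial products. The 16-bit split and the three-stage grouping are assumptions chosen for illustration (each unsigned 16x16 product fits in an 18x18 block); the mapping Quartus actually infers may differ. Each commented stage marks a point where a pipeline register could be placed.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mul32_from_16x16(a, b):
    """Unsigned 32x32 -> 64-bit product built from four 16x16 partial products."""
    a &= MASK32
    b &= MASK32
    a_lo, a_hi = a & MASK16, a >> 16
    b_lo, b_hi = b & MASK16, b >> 16

    # Stage 1: four independent multiplies (these can run in parallel blocks).
    p_ll = a_lo * b_lo
    p_lh = a_lo * b_hi
    p_hl = a_hi * b_lo
    p_hh = a_hi * b_hi

    # Stage 2: first addition, combining the two cross terms.
    cross = p_lh + p_hl

    # Stage 3: shift each term to its weight and do the final additions.
    return p_ll + (cross << 16) + (p_hh << 32)


print(hex(mul32_from_16x16(0x12345678, 0x9ABCDEF0)))
```

Done combinationally, the multiplies and both additions sit in series within one clock; registering between the stages shortens the critical path at the cost of extra cycles of latency.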

rogloh · 2016-05-23 23:55

David Betz wrote: »

Wow! Do you have a full description of your additions somewhere? Because of the large memory capacity, this might even outperform the P2 for some applications. What board do you recommend for experimenting with the 10M25?

I have various posts describing different parts as I have done quite a few separate things, not in one place. But they are all related to my ongoing efforts... I don't have a 10M25 based board unfortunately, only a 10M08 to play with. But the board that jmg linked to looks interesting when it becomes available.

I do think a P1V with its hub memory bus directly mapped to external mem via RDLONG pathway is still a good option for me. P2V may get there one day...

David Betz · 2016-05-23 23:59

rogloh wrote: »

David Betz wrote: »

Wow! Do you have a full description of your additions somewhere? Because of the large memory capacity, this might even outperform the P2 for some applications. What board do you recommend for experimenting with the 10M25?

I have various posts describing different parts as I have done quite a few separate things, not in one place. But they are all related to my ongoing efforts... I don't have a 10M25 based board unfortunately, only a 10M08 to play with. But the board that jmg linked to looks interesting when it becomes available.

I do think a P1V with its hub memory bus directly mapped to external mem via RDLONG pathway is still a good option for me. P2V may get there one day...

Can you execute code from external memory? What about hub memory?

T Chap · 2016-05-24 00:32

I started an Eagle library for the MAX10M25SC (single voltage 3v3 / compact). I have not been able to find a reference schematic; does the BE Micro MAX10 come with a schematic in its documents? There are 144 pins on here, and I'm not sure where to start with the IO. I use an older Eagle on a G4 from way back. I didn't like the changes they made on OSX so I kept this as a dedicated PCB machine, plus it has my older libraries and I could never figure out how to translate them to the newer OSX version. Not sure if anyone could open it, but I will post the library later if someone wants it. I have been looking but cannot find a library online for MAX10. Here is the Pinout. If all goes well I will make some boards for this in the next month with HyperRam, EVE2 optional. I am trying to make a basic tester for some amount of direct RGB pins, OR EVE2 so you can jumper with 0603's to select what goes to the FPC 40 LCD connector. EVE2 has 18 bits, 6/6/6. I am not sure what makes sense on the P1V for LCD. I don't need 24bits. Suggestions on LCD pins? I see a lot of RGB methods. I also considered putting a P1 routed to the P1V with some wires either routed or jumperable for a P1+P1V combo, for testing that concept for future use. First thing is to see how to route the chip.

Tubular · 2016-05-24 00:54

Ah yeah, it's been a while since I was deeply involved in the docs

The pinouts, which is probably the key thing you're missing, are here
https://www.altera.com/support/literature/lit-dp.html#MAX-10

There is also a long guide about what each pin does, that you'll need

Finally, check out the altera design store. You can drill down on a lot of different designs. Often there is a single key link that opens up all the stuff you're looking for, but you have to find it!...

jmg · 2016-05-24 00:54

T Chap wrote: »

... I am trying to make a basic tester for some amount of direct RGB pins, OR EVE2 so you can jumper with 0603's to select what goes to the FPC 40 LCD connector. EVE2 has 18 bits, 6/6/6. I am not sure what makes sense on the P1V for LCD. I don't need 24bits. Suggestions on LCD pins?

If you have a FPC40, that supports 24 bits ?
For LCD-only use, choices are to install 2 HyperRAM and feed 16b direct to LCD and clock from P1V, or use a single HyperRAM to save cost, and either
1) simply de-mux to any of 16/18/24 in some simple Verilog, or
2) use more comprehensive Verilog to make a CLUT, which cuts the pixel bandwidth needed in P1V. This is also more important if you want VGA, as I think smoothing is needed there, and the CLUT can have a FIFO that does that.
(starts to sound like a portion of P2 ?)

I was thinking about a small cell that does 1 x HyperRAM->CLUT-RGB, and effectively runs the display with no CPU involvement.

Tubular · 2016-05-24 00:58

Here's the one
https://www.altera.com/en_US/pdfs/literature/dp/max-10/PCG-01018.pdf

T Chap · 2016-05-24 01:03

FPC uses 24 pins for RGB to Newhaven which is what I like, not sure about others. So you are saying that a direct HyperRam to LCD with 16 bits is the way to go versus reading it then driving the RGB pins. I see what you mean, but that means it kills the HyperRam for any other use right? However, I am not sure what other use I'd have for it anyway. I can't visualize how you drive the LCD yet with this method, except for maybe you would populate the ram with a complete screen's worth of pixel data? Then cycle each Word starting at an address for 800x480 iterations then start over at Word 1? That stuff is a bit over my head still. Interesting to drive 16 bits of RGB with 16bits of ram, the P1V doesn't need pins for RGB then.

T Chap · 2016-05-24 01:17

Thanks Tubular, that is really deep stuff. Way over my head, I need to dissect an eval board schematic, but I have yet to find this specific chip on an eval board.

jmg · 2016-05-24 01:34

T Chap wrote: »

So you are saying that a direct HyperRam to LCD with 16 bits is the way to go versus reading it then driving the RGB pins.

No, just that that is simplest. (you could do that on a P1 )

T Chap wrote: »

... but that means it kills the HyperRam for any other use right? However, I am not sure what other use I'd have for it anyway.

Pretty much, tho you could time-share it at the cost of a reduced pixel update rate.

T Chap wrote: »

I can't visualize how you drive the LCD yet with this method, except for maybe you would populate the ram with a complete screen's worth of pixel data? Then cycle each Word starting at an address for 800x480 iterations then start over at Word 1? That stuff is a bit over my head still.

You could try that, gating RWDS to the LCD via an Edge-doubler (XOR gate+RC), and work on the basis that reading does refresh, and ignore any unused RAM as 'don't care'.
This would probably be OK for VGA too, and would match well with P1.

The Data hints at this sort of mode :
"The host system may also effectively increase the tCMS value by explicitly taking responsibility for performing all refresh and doing burst refresh reading of multiple sequential rows in order to catch up on distributed refreshes missed by longer transactions."

The limit here would be 64ms, for full read and any write windows.
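The numbers behind this "reading does the refresh" idea can be sketched quickly. The 800x480 size, the 16-bit words, and the 64ms limit come from the thread; the 60Hz frame rate used in the example call is an assumption, and blanking intervals are ignored.

```python
WIDTH, HEIGHT = 800, 480    # panel size mentioned in the thread
BYTES_PER_PIXEL = 2         # one 16-bit word per pixel
REFRESH_LIMIT_S = 0.064     # 64 ms limit quoted above


def frame_bytes(width=WIDTH, height=HEIGHT, bpp=BYTES_PER_PIXEL):
    """Size of one full frame held in the RAM."""
    return width * height * bpp


def min_frame_rate_hz(limit_s=REFRESH_LIMIT_S):
    """Slowest full-frame read rate that still touches every used row in time."""
    return 1.0 / limit_s


def pixel_rate_hz(frame_rate_hz, width=WIDTH, height=HEIGHT):
    """Average word rate needed from the RAM (blanking ignored)."""
    return width * height * frame_rate_hz


print(frame_bytes())          # bytes per frame
print(min_frame_rate_hz())    # Hz
print(pixel_rate_hz(60))      # words per second at an assumed 60 Hz
```

Any ordinary display frame rate is well above the minimum, so the 64ms limit mostly constrains how long the read-out may be paused for write windows.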

However, if you want to keep all RAM, there is one HyperRAM detail which is something of a royal pain, and that is the tCMS, which places an upper limit of 4us on read lengths, and refresh is 'snuck into' the Address-> Data latency.
You can set the part (default) so that refresh window is always present.
Each time you release CS#, at that 4us rate, you have to re-address and add latency window, before new data arrives.

That dictates short bursts of faster reading, and then slower, evenly paced playback, and you could then add windows at around that 4us rate, of other RAM access, so you share the BUS with Video and Data.
That's more verilog...

- but certainly is P1V territory.

Byte-code fetch anyone ?
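To put rough numbers on the 4us tCMS window described above, here is a small sketch. Only the 4us figure is from the thread. The 80MHz clock in the example, the 15-clock overhead (3 command/address clocks plus 12 latency clocks), and the function names are assumptions for illustration; check the datasheet for the part and configuration actually used.

```python
import math


def words_per_burst(clock_hz, tcms_ns=4000, overhead_clocks=15,
                    bytes_per_clock=2, bytes_per_word=2):
    """16-bit words that fit in one CS#-low window (hypothetical helper).

    bytes_per_clock=2 models an 8-bit DDR bus; overhead_clocks is an assumed
    command/address plus latency cost paid on every new burst.
    """
    total_clocks = clock_hz * tcms_ns // 1_000_000_000
    data_clocks = max(total_clocks - overhead_clocks, 0)
    return data_clocks * bytes_per_clock // bytes_per_word


def bursts_per_line(pixels_per_line, clock_hz, **kwargs):
    """How many separate bursts one display line needs."""
    return math.ceil(pixels_per_line / words_per_burst(clock_hz, **kwargs))


print(words_per_burst(80_000_000))        # words per 4 us window
print(bursts_per_line(800, 80_000_000))   # bursts for an 800-pixel line
```

Under these assumptions an 800-pixel line cannot be fetched in one burst, which is why a FIFO between the RAM and the display is needed to smooth the bursts into evenly paced pixels.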

Tubular · 2016-05-24 01:46

T Chap wrote: »

Thanks Tubular, that is really deep stuff. Way over my head, I need to dissect an eval board schematic, but I have yet to find this specific chip on an eval board.

In Altera's cross compatible tables, the EQFP package appears compatible ("vertical migration") from 10M04 up to 10M25, then there's a new set for 10M40/10M50. This means you (should) be able to use exactly the same schematic for 10M25 as 10M08, as long as you have the same power supply version (ie 10M08S->10M25S, 10M08D->10M25D)

The simplest board I've seen, in terms of schematic is the axelsys $50 Altera Max10 Evaluation board, link here
http://wl.altera.com/products/devkits/altera/kit-max-10-evaluation.html

To get the schematic, it claims you need to 'install' but don't worry if you don't have Altera installed, it will still write to a directory, in my (windows) case C:\altera\14.0sp2\kits\max10_10M08E144_eval

If you're not running windows I can PM you the schematic pdf

You'll need a jtag programmer, I think Terasic do a cheap one, and there are many others

rogloh · 2016-05-24 01:52

David Betz wrote: »

Can you execute code from external memory? What about hub memory?

Yes I can execute code from external memory with GCC support - it is mapped into the hub memory space for one COG so RDLONG/WRLONG etc just access the external SDRAM or external SRAM or internal hub RAM data depending on the 32 bit address. See my other thread.
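The address-based selection described here can be modelled as a simple decoder. This is only a sketch: the window base addresses and sizes below are invented for illustration and are not taken from rogloh's design; only the idea that one 32-bit address selects hub RAM, SDRAM, or SRAM comes from the post.

```python
HUB_TOP = 0x0001_0000   # 64 KB: the standard P1 hub address space

# Hypothetical external-memory windows (NOT from the actual design).
SDRAM_BASE, SDRAM_SIZE = 0x1000_0000, 0x0080_0000
SRAM_BASE, SRAM_SIZE = 0x2000_0000, 0x0008_0000


def decode(addr):
    """Return which memory a RDLONG/WRLONG address would select in this model."""
    addr &= 0xFFFF_FFFF
    if addr < HUB_TOP:
        return "hub"
    if SDRAM_BASE <= addr < SDRAM_BASE + SDRAM_SIZE:
        return "sdram"
    if SRAM_BASE <= addr < SRAM_BASE + SRAM_SIZE:
        return "sram"
    return "unmapped"


for a in (0x0000_7FFC, 0x1000_0000, 0x2000_0004, 0x3000_0000):
    print(hex(a), decode(a))
```

Because the choice is made purely from the address, existing hub instructions need no new opcodes to reach the external memories.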

T Chap · 2016-05-24 02:27

Tubular, thanks for that link. Hopefully there is a sch in there somewhere. I got the library done based on the documents I found for 10M25SC. I'd be curious if anyone was able to open the eagle lib. MAX10M25SCpad part. Also I included a blank schematic and board with only the MAX10 sitting on it, curious if anyone could open this older version eagle file.

rogloh · 2016-05-25 01:04

I'll try to answer your post to my other SDRAM thread here, seems like a better place.

You can map named prop IO pins to specific FPGA pins in the Pin Assignment editor in Quartus and can in general place them flexibly around your FPGA. You may want to consider IO bank voltage limitations and try to group your related pins in the same bank(s) for example.

You need to use the USB blaster to download the image to the FPGA (either RAM based .sof or permanent flash based .pof)

You need to plug in a prop plug to the pins 30 and 31 of port A for the UART access to load your code by PropTool.

T Chap · 2016-05-25 01:08

I have gotten all the IO transcribed off the MAX10M25SC EQFP144 eval board.

http://wl.altera.com/products/devkits/altera/kit-max-10-evaluation.html

This board has 14io connected to the Arduino pads, and the rest of the IO connected to headers. There is a USB plug for power only, nothing else connected, so it requires a USB Blaster module. I have not yet found a USB Blaster schematic to copy. It is also not yet clear how you guys are programming the BE Micro for both the image and the Prop code. Do I include a USB chip for IO just like the FTDI/SiLabs works on the regular Prop boards, or must the Prop code get flashed via USB Blaster the same as the Verilog image?

I was going to use the same 14io they used for Arduino, but I need to select the balance of the 64io for the P1V. Not sure how to do this yet. There are a number of other things to sort out, such as are the 2.5v Ref voltages needed on the non Analog FPGA, I think not. VREFB1N0-VREFB8 are not clear whether they need me to provide those voltages to each bank, on this board they go to the headers that are not being used. Each bank does have VCCIO/VCCA/VCC_Core voltages, so it is not clear what the VREFB voltage does, as in this eval board it must be provided from the headers.

This board requires programmer. 300 for Altera Blaster! I see cheap versions online ebay etc.

I would like to buy this eval board for 50.00 and see how it works for P1V and sort out what IO to use to get Port a/b. Any thoughts on whether it will work just the same as the BE Micro Max10?

Tubular · 2016-05-25 01:18

This is the Terasic blaster, that I'd go with
http://www.terasic.com.tw/cgi-bin/page/archive.pl?No=46

I'm sure there are cheaper but this looks hassle free

T Chap · 2016-05-25 04:38

Good info. Thanks! Just got the libraries for EVE2, HyperRam, MAX10M25 done. Dragging the basic parts on and will start to sort out what this tester board will have on it. So far, P1+ P1V-Max10M25, 2 HyperRam, EVE2. I need to find a camera to test a processor to get cam to digital then to the EVE2. Ideally need a way to send multiple cameras over distances of a few hundred feet( RS485?) and select what cam to watch on the LCD. I may experiment with the HyperRam driving the LCD with a set of jumpers to change configs, but realistically I am pretty sold on just using the screen editor and slick functions builtin to EVE2. I was thinking the 2 HR's will allow for buffering video.

jmg · 2016-05-25 05:02

Just curious, how does a P2 footprint look fitted inside that 10M25 ?

Leon · 2016-05-25 07:25

I've got a Terasic Blaster and have never had any problems with it. Some years ago I bought a Chinese Sunshine Blaster and that also works OK. The casing looks rather crude which is why I got the Terasic one.

Martin Hodge · 2016-05-25 20:15

Don't forget the JTAG connection.

T Chap · 2016-05-25 20:44

I am copying the eval board which includes the USB Blaster connection, it has JTAGEN on it connected to the USB Blaster port. I assume that's what you mean.

Martin Hodge · 2016-05-25 22:13

TMS,TDO,TDI, and TCK are the four signals you need to break out for JTAG programming. I didn't see a J10 on your image above, but maybe you haven't added it yet.

The "USB Blaster" nonsense is just Altera's hypese for "JTAG controller".

Here's an image of a board I did for an Atari Falcon. You can see the 10 pin JTAG connector next to the Altera CPLD. Something like that is what you'll need to program the MAX 10 (assuming the MAX-10 programs the same as the MAX-7000S I've used)

T Chap · 2016-05-25 22:45

Correct, first just going through the schematic choosing what to include, J10 will be there. I was concerned about what pins would be usable for the 64io for p1V, and was told that any pin that is marked IO will behave just like a GPIO on a processor. Although there are some marked high speed for better performance, others low speed, others for VREF that have higher capacitance. So there are lots of pins to choose from.

One thought I am having to protect the pins from damage is for all 64 IO to have something like a 220R in series to help in case of a shorted pin on this tester board. A 0 ohm jumper R can be added to replace a higher value R. I don't have the eval board yet to start learning. Does anyone with experience on these boards know whether programming via the regular IDE's ie BST/ Prop tool etc is done via USB to the USB blaster? Or can a FTDI or other serial uart be connected to two pins ie 30/31 as in the P1.
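The series-resistor idea can be checked with Ohm's law. The 220R value is from the post; the 3.3V IO voltage matches the single-supply 3v3 part discussed earlier in the thread, while the 0.1W rating used as a typical 0603 figure is an assumption to verify against the resistor actually chosen.

```python
V_IO = 3.3               # volts, IO bank voltage (3v3 part from the thread)
R_SERIES = 220.0         # ohms, series resistor suggested in the post
R_0603_RATING_W = 0.1    # assumed typical 0603 power rating


def short_circuit_current(v=V_IO, r=R_SERIES):
    """Worst-case current if a driven-high pin is shorted to ground (amps)."""
    return v / r


def resistor_power(v=V_IO, r=R_SERIES):
    """Power dissipated in the series resistor during that short (watts)."""
    return v * v / r


print(round(short_circuit_current() * 1000, 2), "mA")
print(round(resistor_power() * 1000, 2), "mW")
print("within assumed 0603 rating:", resistor_power() < R_0603_RATING_W)
```

Whether the resulting current is acceptable depends on the per-pin limits in the device datasheet, which this sketch does not model.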

jmg · 2016-05-26 00:27

T Chap wrote: »

Does anyone with experience on these boards know whether programming via the regular IDE's ie BST/ Prop tool etc is done via USB to the USB blaster? Or can a FTDI or other serial uart be connected to two pins ie 30/31 as in the P1.

I think there are many answers to that question...
The FPGA of course, needs some loader path. (In P123, that is a Prop1)
Then, you have a choice to load P1 from the FPGA Flash, - in that case, the FPGA.JTAG would be needed to preload that Flash.

For general or early P1V code development, you probably do not want to use that FPGA flash, and once the P1V is booted you can load over the usual Serial loader.

You could do what Lattice do, which is use a FT2232H, and that gives both a JTAG load path, and a good user UART path.
I'm guessing there is Altera support for the FTDI JTAG ?

Or, if the P1 loader on P123 really is faster than Altera's flows, you could look at using that, and a simpler dual UART like CP2105 ?
The resident P1 can then manage Debug too.

T Chap · 2016-05-26 01:21

The FPGA of course, needs some loader path. (In P123, that is a Prop1)

I thought there was only the USB Blaster or a variant JTAG loader?

Then, you have a choice to load P1 from the FPGA Flash, - in that case, the FPGA.JTAG would be needed to preload that Flash.

Sorry for the confusion, I will have an eval board next week and can start to make sense of the process. Right now it is just very unclear what goes on. Assumptions:

1. You must load the verilog P1V binary to the MAX10 ( Flash storage?) for permanent storage of the P1v Image, then the MAX becomes a P1V minus any SPIN/PASM

NEXT

2. You must load the SPIN binary for your program, just like loading any p1. And this is loaded to the MAX10 flash area as well, since there is no EEPROM like on the p1. How is it loaded to Flash?

I heard numbers like 25 seconds to download. Download what? p1V image, Spin Binary? Then there are faster ways to download (which would really be needed for development of code, else 25 seconds is a killer)

Is there anywhere that explains this stuff concisely or do you really have to buy a board and figure it out from scratch?

jmg · 2016-05-26 01:49

T Chap wrote: »

The FPGA of course, needs some loader path. (In P123, that is a Prop1)

I thought there was only the USB Blaster or a variant JTAG loader?

I believe P123 uses a Prop1 and a FT231X, so you would need to drill into the Schematic and code to see which of many FPGA loading choices they are using.

T Chap wrote: »

1. You must load the verilog P1V binary to the MAX10 ( Flash storage?) for permanent storage of the P1v Image, then the MAX becomes a P1V minus any SPIN/PASM

Yes.

T Chap wrote: »

2. You must load the SPIN binary for your program, just like loading any p1. And this is loaded to the MAX10 flash area as well, since there is no EEPROM like on the p1. How is it loaded to Flash?

I think generic/standard P1V does use EEPROM. (aka external loader memory)

What has further confused the P1V mix somewhat, is this quite recent advance by rogloh

http://forums.parallax.com/discussion/comment/1377145/#Comment_1377145

May 23:
"So after some mucking about this weekend I was able to get the P1V booting from internal MAX10 user flash memory blocks (UFM0/UFM1), and I've modified the booter to enable writing applications to it as well, instead of needing the separate i2c eeprom as you normally would. Ran into plenty of issues with converting file formats, MIF/HEX/binary, 8 bit, 32 bit, endianness, Quartus strangeness etc, but finally sorted it out in the end"

The appeal of this path, is security, but for initial developments, I would include an external loader memory.
For more confusion, that external loader memory can be EEPROM, but advances have given other choices....

eg In that First attempt at building a P1V image for the BeMicro Max10 thread, there is

Ozpropdev: "I'm currently modifying the booter to support the 2Mb x 8 SPI flash (external) to hold spin/pasm code." ..."Just got the modified booter running so spin/pasm code loads/runs from BeMicro's M25P16 SPI flash"

Cheapest current SPI seems to be FT25H16T, which is 16Mb, $0.13860/1k, so unless you actually need the EEPROM as EEPROM, it could pay to find the SPI ROM loader code mentioned above ... ( & maybe that (UFM0/UFM1) loader too... )

T Chap · 2016-05-26 02:03

OK, so the modified booter is the key here. I don't care about that yet, just need to boot off eeprom, so for standard boot just put the same eeprom on the same scl sda pins as P1. At that point, I can load the p1v the same over my CP2110 in p30/31. Which should be no slower than typical P1 load times. Thanks for straightening this out.

T Chap · 2016-05-26 21:19

Made a ton of progress figuring this out. Many thanks jmg/Tub/rog and others for advice on these few threads. I got Quartus 15.0 and at least was able to find the pin assignments so that part is a lot less mystical. I am glad you can assign any pin to the port a/b io. That makes it easier to have different assignments for different test configs. Still a ton to weed through, ie set number of cogs, add port b, change from eeprom boot to SPI boot (seems this is posted on one of the MAX 10 threads), turn on/off video, counters, etc.

I can add some HRam for testing only and maybe even just make a small breakout board with a pin header layout. But for this test board the focus is on EVE3 for LCD use. Suggestions welcome for HR connections. If you want one of the P1+P1VMax10M25 boards let me know. It would be nice for a few others to have this board to play with and help me with tests if interested. Options for P1<>P1V com with minimal overhead/pins? For simple stuff just to send opcodes and simple parameters back and forth maybe hardwire TXRX and use a basic FDS. I don't yet see massive data or speeds required for my needs. No reply back yet from Altera on the FTDI JTAG possibility. So I may just stick with my fav CP2110 on a cable mounted module, and use a 4 pin header for both P1 and P1V for programming.
