Xilinx port started...

overclocked · 2014-08-12 08:06

David Betz wrote: »

Thanks for the analysis. I guess I can forget about using my XuLA-200 for P1 work, but fortunately that isn't what I bought it for. Now if I can only remember what I *did* buy it for... :-)

It would probably make a very nice board for implementing the SUMP logic analyzer or (parts of) its derivative, BitHound.
The original SUMP uses SRAM and BitHound uses SDRAM, so maybe a mix of the two...

overclocked · 2014-08-12 22:47

David Betz wrote: »

Thanks for the analysis. I guess I can forget about using my XuLA-200 for P1 work, but fortunately that isn't what I bought it for. Now if I can only remember what I *did* buy it for... :-)

Just to wrap this up (S3-200K test): I tested it now that I have a machine with Xilinx ISE around.
It actually managed to cram in the following P1 spec. So it is very tight, but maybe this one could actually be usable.

TARGET: Spartan-3 200K (only configured CLK pin)

`define NUMBER_OF_COGS 1 //8
`define HUB_MEM_SIZE 2048 //8192
`define COG_RAM_SIZE 512 //512

So: 1 COG with full COG RAM, 1/4 of the HUB RAM, CharRom disabled and InterpreterRom enabled.

Resource usage:

Number of occupied Slices: 1,825 / 1,920 95%
Number of RAMB16s: 12/12 100%

The timing report meets my 160 MHz PLL_clk constraint, so the COG runs at 80 MHz, and it could probably go a little higher.

Is having a 1/4 HUB RAM usable or does that present some problems?

Cluso99 · 2014-08-13 00:10

Unfortunately, any hub smaller than 32 KB is not going to be popular at all! Not worth bothering with.
Possibly we could get away with a smaller number of cogs; we survived with 1 cog on P2.

jmg · 2014-08-13 00:16

overclocked wrote: »

Is having a 1/4 HUB RAM usable or does that present some problems?

There is only one COG, so this has twice the average hub RAM per COG.

Such fractional/minimal reference points are useful for someone looking at a uC+Logic combination.
I think the XC3S series has some parts in 100-pin TQFP, whilst many others are in TQFP144 or only BGA.

The price edge is only modest, and certainly the price-per-logic is worse, but some may already have the parts.
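To put numbers on the "more RAM per COG" point: a quick Python check, assuming HUB_MEM_SIZE and COG_RAM_SIZE in the defines above count 32-bit longs (which is what makes 8192 the stock 32 KB hub). The helper names are just for illustration.

```python
# Rough hub RAM arithmetic for the cut-down S3-200K build vs. a stock P1.
# Assumption: HUB_MEM_SIZE and COG_RAM_SIZE are counted in 32-bit longs.

BYTES_PER_LONG = 4


def kbytes(longs: int) -> float:
    """Convert a size in longs to KB."""
    return longs * BYTES_PER_LONG / 1024


def hub_kb_per_cog(hub_longs: int, cogs: int) -> float:
    """Average hub RAM available per cog, in KB."""
    return kbytes(hub_longs) / cogs


stock = hub_kb_per_cog(8192, 8)  # stock P1: 32 KB hub shared by 8 cogs
tiny = hub_kb_per_cog(2048, 1)   # S3-200K build: 8 KB hub, 1 cog

print(f"hub: {kbytes(2048):.0f} KB of {kbytes(8192):.0f} KB ({2048 / 8192:.0%})")
print(f"hub per cog: {tiny:.0f} KB vs {stock:.0f} KB ({tiny / stock:.0f}x)")
print(f"cog RAM: {kbytes(512):.0f} KB (unchanged)")
```

So the build keeps a quarter of the hub (8 KB), but with a single COG that is 8 KB per COG against 4 KB on a full P1.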

jmg · 2014-08-13 00:20

Cluso99 wrote: »

Unfortunately, any hub smaller than 32 KB is not going to be popular at all! Not worth bothering with.
Possibly we could get away with a smaller number of cogs; we survived with 1 cog on P2.

I doubt such a system will be the ONLY processing engine in a system, so less HUB RAM is not a deal breaker
(it is actually more RAM/COG).
These fractional P1Vs make sense next to another controller that also needs logic fabric.
So they might go next to a real P1, or a small ARM, or... it comes down to package and price.

The MAX 10 from Altera is due soon; it could be interesting, as it is a 55 nm flash-based part.

overclocked · 2014-08-14 13:25

Next update here:

I now realize that to be able to share anything, I must first get my own Propeller running on one of my FPGA cards.
I just bought a simple 3.3V USB-UART cable that will hopefully work for downloading stuff to the Propeller.
It does not support the reset signal, so what do I need to do to make it work? A simple solution is good enough for now.
Can I just pulse that pin on the FPGA Propeller with a button, or does it need more control than that?

I actually haven't decided which card to start off with:
- Microblaze Starter Kit (Xilinx)
- Spartan-3 Starter kit (Xilinx)
- BeMicroSDK (Altera Cyclone IV, actually the same as on the Nano, I think)

But to keep changes to the original as small as possible, I chose the BeMicroSDK to start with.
That will probably mean soldering a connector, at least for the programmer connection.
Is there a very simple boot program to start off with? Like: do nothing, just start COG1 so an LED is lit!

Probably this will have to wait until this weekend.

The end result will be me sharing by forking the original code on GitHub and setting up a total rewrite.
Please stay tuned and try these things with me for quality control!

jmg · 2014-08-14 14:00

overclocked wrote: »

It does not support the reset signal, so what do I need to do to make it work? A simple solution is good enough for now.
Can I just pulse that pin on the FPGA Propeller with a button, or does it need more control than that?

See section
3.1. Boot-Up Procedure
in the data sheet. It may be best to try to add the reset signal?
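The gist of section 3.1, as I read it, is a fixed decision order in the ROM boot loader that runs in cog 0 right after reset. A simplified Python model (timings and the actual host handshake are left out; the enum names are made up for illustration). It shows why the PC tool wants to pulse reset: the host check happens once, straight after reset, and a host that starts talking later just looks like "no host".

```python
from enum import Enum


class BootResult(Enum):
    HOST_DOWNLOAD = "host found on P31/P30: identify, maybe download to RAM/EEPROM"
    EEPROM_BOOT = "no host, EEPROM on P28/P29: load 32 KB image into hub RAM"
    SHUTDOWN = "no host, no EEPROM: cog 0 stops, all I/O pins become inputs"


def boot_loader(host_detected: bool, eeprom_present: bool) -> BootResult:
    """Simplified decision order of the P1 boot loader, which runs in cog 0
    after a reset (power-up, RESn going high, or a software reset) and the
    ~50 ms reset delay. Timing and the host handshake are not modelled."""
    if host_detected:
        return BootResult.HOST_DOWNLOAD
    if eeprom_present:
        return BootResult.EEPROM_BOOT
    return BootResult.SHUTDOWN


for host, eeprom in [(True, True), (False, True), (False, False)]:
    print(host, eeprom, boot_loader(host, eeprom).name)
```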

Ramon · 2014-08-17 00:11

overclocked · 2014-08-17 02:41

Ramon wrote: »

Overclocked, how did you solve this?

Line 90: wire [1:0] hub_bus_s = bus_s[7] | bus_s[6] | bus_s[5] | bus_s[4] | bus_s[3] | bus_s[2] | bus_s[1] | bus_s[0];

ERROR:Xst:868 - "dig.v" line 90: Index out of range for bus_s.

As I haven't found the time to actually TEST the P1V in hardware, this is just how I got it to compile.

I replaced the commented-out line below with the new line. This is just one of many changes needed for multi-dimensional wires.

//wire [7:0] bus_s [1:0]    ;
wire [1:0] bus_s [7:0]    ;

The reasoning here is that the original syntax is SystemVerilog-only, and the new syntax hopefully does the same thing using standard Verilog. What do you guys think? When I find time I will share my whole project instead, so someone can test whether it works. Maybe you, Ramon? Which board/FPGA do you use?
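Why the old declaration trips Xst:868: in Verilog-2001, `wire [7:0] bus_s [1:0]` is 2 entries of 8 bits, so `bus_s[7]` does not exist, while `wire [1:0] bus_s [7:0]` is 8 entries of 2 bits, which is what the `hub_bus_s` OR on line 90 expects. A small Python stand-in (the `UnpackedArray` class and `or_all` helper are hypothetical, just to show the indexing; the stored values are arbitrary examples):

```python
from functools import reduce


class UnpackedArray:
    """Stand-in for a Verilog-2001 array `wire [W-1:0] name [D-1:0]`:
    D entries (the range after the name), each W bits wide."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.entries = [0] * depth

    def _check(self, i: int) -> None:
        # Out-of-range selects are what XST rejected with Xst:868.
        if not 0 <= i < len(self.entries):
            raise IndexError(f"index {i} out of range (depth {len(self.entries)})")

    def __getitem__(self, i: int) -> int:
        self._check(i)
        return self.entries[i]

    def __setitem__(self, i: int, value: int) -> None:
        self._check(i)
        self.entries[i] = value & ((1 << self.width) - 1)  # keep only W bits


def or_all(arr: UnpackedArray, count: int) -> int:
    """bus_s[0] | bus_s[1] | ... | bus_s[count-1], as in the hub_bus_s line."""
    return reduce(lambda acc, i: acc | arr[i], range(count), 0)


old_decl = UnpackedArray(width=8, depth=2)  # wire [7:0] bus_s [1:0];
new_decl = UnpackedArray(width=2, depth=8)  # wire [1:0] bus_s [7:0];
new_decl[3] = 0b01                          # example values only
new_decl[6] = 0b10

print(bin(or_all(new_decl, 8)))             # 0b11: a 2-bit result, as hub_bus_s expects
try:
    or_all(old_decl, 8)
except IndexError as err:
    print("old declaration:", err)
```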

Ramon · 2014-08-17 07:59

I am using a Digilent S3 board with the 1000K-gate part (instead of 200K). I think this can fit up to 7 cores, maybe the full 8 if I trim video or some other feature.

I was able to solve the multi-dimensional arrays (MDA) using that change and also by doing individual assignments (array[0] = xxx;), but this error looks slightly different because bus_s is a copy of another MDA.

overclocked · 2014-08-17 09:17

Ramon wrote: »

I am using a Digilent S3 board with the 1000K-gate part (instead of 200K). I think this can fit up to 7 cores, maybe the full 8 if I trim video or some other feature.

I was able to solve the multi-dimensional arrays (MDA) using that change and also by doing individual assignments (array[0] = xxx;), but this error looks slightly different because bus_s is a copy of another MDA.

Yeah, I know that card very well. Great card, and its little brother (200K) was my introduction to the FPGA world!
I actually did a dummy build for the 200K (mentioned elsewhere on the forum), but it is so tight that I'm not sure it's a good one for me to start with.

How are you planning to program your P1V? I'm looking for advice and ideas. The built-in RS-232 would be nice, but does it support flow control? It does not seem so from the schematics. I can't remember exactly, but without flow control you can't initiate the P1 download, right? Or is there a simple workaround in the code to make it work without that initial reset pulse?

overclocked · 2014-08-17 13:25

I tried building it for the S3-1000K now, but are you sure you really got everything compiled correctly if you can fit 7 full cogs?
At 4 cogs I'm at 99% full, and quite a lot of RAM has been implemented using logic.

jmg · 2014-08-17 13:49

overclocked wrote: »

I can't remember exactly, but without flow control you can't initiate the P1 download, right? Or is there a simple workaround in the code to make it work without that initial reset pulse?

I've started a new thread for possible non-reset downloads. (Someone has likely already done this?)

http://forums.parallax.com/showthread.php/156977-Propellor-Loader-links

Notes on the send-flow encoding/timing, including the timeout windows, are here:
http://propeller.wikispaces.com/Download+Protocol

Timing looks tight, but maybe a PC host can do a regular minimal ping, looking for a checksum result echo?

A hardware-extended variant could use an S-R FF on the FPGA end: the user Sets it, and the leading edge of the first char Clears it (releasing reset then starts the P1V reset preamble). So the very first char is ignored/lost, as it starts the reset sequence. This variant needs the simple HW, but would be more predictable.

With the SR-FF the order changes slightly: you manually prime the P1V SR-FF, then start the PC end.
Most FPGA boards have buttons and LEDs, so a tiny Verilog stub could map a PB to the SR-FF, and an LED can show the P1V reset (combination of PB + serial RX edge).

Unfortunately, a completely standard (old P1) loader is unlikely to work in either case.
If a 'standard loader' is modified to also send a dummy CHAR when it toggles DTR, that should work in both cases. A standard P1 will simply ignore the leading char, and the SR-FF version will release RST on it (thus it 'clones' a DTR line).
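The SR-FF behaviour described above, modelled in Python so the sequence can be checked without a simulator. This is a behavioural sketch only, not the Verilog stub itself; the `ResetSRFF` class and the choice that Set wins while the button is held are my assumptions.

```python
class ResetSRFF:
    """Sketch of the proposed reset SR flip-flop on the FPGA side.
    Set: user holds the push-button (PB) -> P1V held in reset (LED on).
    Clear: falling edge of the first RX start bit -> reset released."""

    def __init__(self) -> None:
        self.q = False     # True = P1V held in reset
        self._prev_rx = 1  # UART line idles high

    def step(self, pb: bool, rx: int) -> bool:
        falling = self._prev_rx == 1 and rx == 0
        self._prev_rx = rx
        if pb:
            self.q = True   # Set wins while the button is held (assumption)
        elif falling:
            self.q = False  # first char edge clears; that char is lost
        return self.q


ff = ResetSRFF()
trace = [ff.step(pb, rx) for pb, rx in [
    (False, 1),  # idle: P1V running
    (True, 1),   # user primes the FF: reset asserted
    (False, 1),  # button released: still held in reset
    (False, 0),  # start bit of the PC's dummy char: reset released
    (False, 1),
    (False, 0),  # later data edges don't re-assert reset
]]
print(trace)  # [False, True, True, False, False, False]
```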

jmg · 2014-08-17 20:37

overclocked wrote: »

Or is there a simple workaround in the code to make it work without that initial reset pulse?

Expanding on my above post, I think there may also be a TinyLogic solution for a standard P1 chip, which uses features of the Three Bit Protocol (3BP) so that a Break string can trigger a RESET from RXD alone (DTR is optional).

This keeps the FPGA RST the same as on a P1.

However, this does need a trivial code change to the downloader SW. Instead of this:
Reset the Propeller (on Parallax development boards: set DTR low, wait at least 10 ms, set DTR high).
it becomes:
Reset the Propeller (on Parallax development boards: set DTR low, Send Break String*, wait at least 10 ms, set DTR high).

The ideal logic device is a 74AUP1T58, wired as in
Fig 6. 2-input NAND gate with input B inverted or 2-input OR gate with inverted C input (NXP data)

Y -> Reset Pin (gate is: Y = C & !B)
!B <- Std DTR RC circuit, optional connection
C <- RC LPF on the RXD line, this detects a Break String, ignores normal download data

The standard Three Bit Protocol (3BP) never goes below a 40% data density, which is above the threshold of the 74AUP1T58.

If the download SW is modified to send a burst of roughly 1 to 10 ms* of 00H or 01H chars (Break String), the filtered RXD level will fall below the threshold of the 74AUP1T58 and a reset pulse will be generated.
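The 40% figure can be checked by counting high bit-times per 8N1 frame. The 3BP byte layout below (base values 0x92 and 0xF2, payload bits at byte positions 0, 3 and 6) is my reading of open-source P1 loaders, so treat it as an assumption and check it against the Download Protocol page; with it, the worst-case 3BP byte comes out at exactly 40% high, while 00H and 01H break chars sit at 10% and 20%.

```python
def uart_frame(byte: int) -> list[int]:
    """RXD levels for one 8N1 char: start (0), 8 data bits LSB first, stop (1)."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]


def density(byte: int) -> float:
    """Fraction of the 10 bit times that RXD spends high."""
    frame = uart_frame(byte)
    return sum(frame) / len(frame)


def three_bit_byte(bits: int) -> int:
    """Assumed 3BP byte carrying 3 payload bits (at byte bits 0, 3 and 6)."""
    return 0x92 | (bits & 1) | ((bits & 2) << 2) | ((bits & 4) << 4)


def two_bit_byte(bits: int) -> int:
    """Assumed 3BP byte carrying the last 2 bits of a long."""
    return 0xF2 | (bits & 1) | ((bits & 2) << 2)


def encode_long_3bp(value: int) -> bytes:
    """One 32-bit long as 11 bytes: ten 3-bit bytes, then one 2-bit byte."""
    out = bytearray()
    for _ in range(10):
        out.append(three_bit_byte(value & 7))
        value >>= 3
    out.append(two_bit_byte(value & 3))
    return bytes(out)


all_3bp = {three_bit_byte(b) for b in range(8)} | {two_bit_byte(b) for b in range(4)}
worst = min(density(b) for b in all_3bp)

print(f"3BP worst case: {worst:.0%}")                                     # 40%
print(f"break chars: 00H {density(0x00):.0%}, 01H {density(0x01):.0%}")  # 10%, 20%
```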

Quick Spice checks say a single-char Break is marginal, but there is plenty of time in the 10 to 16 ms DTR pulse to send a short string of chars, e.g. 10 to 50, or even one roughly as wide as DTR.
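A crude stand-in for those Spice checks: a first-order RC low-pass on RXD, stepped once per bit time, feeding the C input with B (DTR) left inactive. Every component value here is an assumption (115200 baud, tau = 1 ms, switching level 30% of VCC), not a recommendation; with these values a single 00H char doesn't reach the threshold, a few ms of them does, and worst-case 3BP data never gets close. The real count depends entirely on the RC and threshold you pick.

```python
import math

BAUD = 115_200     # assumed download baud rate
TAU_S = 1e-3       # assumed RC low-pass on RXD, e.g. 10 kOhm x 100 nF
THRESHOLD = 0.3    # assumed input switching level, as a fraction of VCC


def uart_frame(byte: int) -> list[int]:
    """RXD levels for one 8N1 char: start (0), 8 data bits LSB first, stop (1)."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]


def rc_trace(data: bytes, v0: float = 1.0) -> list[float]:
    """Filtered RXD (fraction of VCC) at the end of every bit time.
    Exact first-order step response per bit: v -> x + (v - x) * exp(-T / tau)."""
    k = math.exp(-1.0 / (BAUD * TAU_S))  # T = 1 / BAUD
    v, trace = v0, []
    for byte in data:
        for level in uart_frame(byte):
            v = level + (v - level) * k
            trace.append(v)
    return trace


def reset_fires(data: bytes) -> bool:
    """Y = C & !B with B (DTR) inactive: Y (to RESn) goes low once the
    filtered RXD on input C drops below the threshold."""
    return min(rc_trace(data)) < THRESHOLD


print(reset_fires(bytes([0x92] * 200)))  # worst-case 3BP data: False
print(reset_fires(bytes(1)))             # one 00H char: False
print(reset_fires(bytes(50)))            # 50 x 00H (~4.3 ms): True
```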

If both DTR and RXD are connected, two reset pulses may be generated; the last one wins.

On designs with no DTR line, the RXD signal generates the controlled Reset, with timing very similar to DTR.

* Addit: reading historic DTR posts, I think the Break String should nominally be the same width as the DTR pulse. FWIR from USB tests, changes to handshake lines wait for the serial data to complete first, so PC code that does
[set DTR low, Send Break String ~10ms, set DTR high]
will have a string ~1-2 ms shorter than DTR.
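Sizing the Break String to a given width is just baud-rate arithmetic. A sketch assuming 115200 baud and 8N1 framing (10 bit times per char); `break_string` and `duration_ms` are illustrative helper names, not part of any existing loader.

```python
def break_string(width_ms: int, baud: int = 115_200, bits_per_char: int = 10) -> bytes:
    """Enough 00H chars (8N1: 10 bit times each) to fill roughly width_ms."""
    n_chars = width_ms * baud // (bits_per_char * 1000)
    return bytes(n_chars)  # bytes(n) is n zero bytes


def duration_ms(n_chars: int, baud: int = 115_200, bits_per_char: int = 10) -> float:
    """How long n_chars take on the wire."""
    return n_chars * bits_per_char * 1000 / baud


for width in (1, 5, 10, 16):
    s = break_string(width)
    print(f"{width:>2} ms -> {len(s):3d} chars ({duration_ms(len(s)):.2f} ms)")
```

At 115200 baud a 10 ms DTR-width string is about 115 chars, so the "10 to 50 chars" suggested above covers roughly 1 to 4.3 ms.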

Ramon · 2014-08-18 08:32

overclocked wrote: »

How are you planning to program your P1V? I'm looking for advice and ideas. The built-in RS-232 would be nice, but does it support flow control? It does not seem so from the schematics. I can't remember exactly, but without flow control you can't initiate the P1 download, right? Or is there a simple workaround in the code to make it work without that initial reset pulse?

I remember that the reset pulse was needed by the software (PNUT programmer), and it is not easy to change that, so I will use a USB-TTL adapter with reset, or the Propeller RS-232 programmer (with a modified schematic to emulate reset using some control signal). I remember that the schematic was in the datasheet.

overclocked · 2014-08-18 13:10

Ramon wrote: »

I remember that the reset pulse was needed by the software (PNUT programmer), and it is not easy to change that, so I will use a USB-TTL adapter with reset, or the Propeller RS-232 programmer (with a modified schematic to emulate reset using some control signal). I remember that the schematic was in the datasheet.

Yeah, OK. Not having a proper programmer seems to be one of the hurdles for first-time Propeller heads. I have hopefully solved my problem now by returning the USB-UART TTL cable, which was missing control signals, and buying a LilyPad programmer. That should work better.

http://www.electrokit.com/lilypad-usb-link-3-3v-ftdi-ft232rl.46884

Bill Henning · 2014-08-18 13:16

I think it would be useful to have a loader mode where the P1V would wait for, say, 2-5 seconds after reset for a download.

This would allow using any USB adapter, even those without any handshaking lines.
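A minimal sketch of the decision such a mode would make, assuming the P1V simply listens on RX for a fixed window after reset; `loader_decision` and its parameters are hypothetical, since no such mode exists in the P1V sources discussed here.

```python
from typing import Optional


def loader_decision(first_rx_s: Optional[float], window_s: float = 2.0) -> str:
    """Hypothetical P1V 'wait for download' mode.

    first_rx_s: seconds after reset at which the first host byte arrived
    (None if nothing arrived). window_s: how long the loader listens (2-5 s).
    """
    if first_rx_s is not None and 0.0 <= first_rx_s <= window_s:
        return "host download"
    return "EEPROM boot"


# Press the board's reset button, then start the PC download by hand:
print(loader_decision(1.2))                # host download
print(loader_decision(3.5))                # EEPROM boot (too late)
print(loader_decision(3.5, window_s=5.0))  # host download
print(loader_decision(None))               # EEPROM boot
```

The trade-off is that every normal power-up then waits `window_s` before running from EEPROM.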

overclocked · 2014-08-20 00:25

OK, an update on the Xilinx-based P1V!

Thanks to Magnus' work here:
http://forums.parallax.com/showthread.php/157004-Propeller-1-running-on-Pipistrello-(Xilinx-Spartan6-LX45)
(and some hours in my own chamber) the Propeller now actually boots and executes code on my Microblaze Starter Kit, based on the Spartan-3E 1600E.
GREAT! So the first blinking-LED examples have been executed. SPIN looks really intriguing for an old Windows thread programmer like myself! Very nice, simple examples are included which only need a couple of LEDs to show multi-threaded goodness!

The main things missing were:

1) Using a newer version of the ISE verilog parser engine by giving command-line parameter: "-use_new_parser yes"
2) The above-mentioned rewriting of SystemVerilog => Verilog
3) Magnus' solution to the reset wait/delay
4) An ugly way, found by trial and error, to emulate the DTR reset using a button on the FPGA board. I would really advise anyone to buy a programmer with support for flow control! :-) I already bought one, but it is waiting at home for me as we speak, since I'm away on business travel at the moment.

Worth noting: with the newer parser it is OK to use wires/regs before they are declared in the Verilog files. As a side note, though, the optimization for older devices seems worse, both in performance (FMAX) and in resource usage (slices). Still, in this case it seems like a good deal to at least start off with this switch: it lets you change as little as possible in the original P1V sources and still run on legacy devices.

The code looks really ugly for now, so some work needs to be done before releasing anything. If you are really eager to get it, give a shout and I will dump a ZIP including some instructions on how to get it going.

References:
http://forums.xilinx.com/t5/Synthesis/How-to-enable-the-new-parser-for-XST-in-ISE-12-1/m-p/133272

jmg · 2014-08-20 00:55

overclocked wrote: »

....
Worth noting: with the newer parser it is OK to use wires/regs before they are declared in the Verilog files. As a side note, though, the optimization for older devices seems worse, both in performance (FMAX) and in resource usage (slices).

Do you mean here that the older parser (edited sources) gives better MHz/slices figures than the new parser?

overclocked · 2014-08-20 01:55

jmg wrote: »

Do you mean here that the older parser (edited sources) gives better MHz/slices figures than the new parser?

Correct! The new parser actually WARNS with a message that it is NOT optimized for old devices. So if you really need the last MHz/slices, you need to rewrite the code to pass the old parser.

Ramon · 2014-08-20 06:44

These are the results with -use_new_parser yes, after changing the PLL to a DCM:

Xilinx 3s1000fg320-5 and full 8 cores:

 Number of Slices:                    25792  out of   7680   335% (*) 
 Number of Slice Flip Flops:           5657  out of  15360    36%  
 Number of 4 input LUTs:              43223  out of  15360   281% (*) 
    Number used as logic:             26839
    Number used as RAMs:              16384
 Number of IOs:                          46
 Number of bonded IOBs:                  46  out of    221    20%  
 Number of BRAMs:                        24  out of     24   100%  
 Number of GCLKs:                         8  out of      8   100%  
 Number of DCMs:                          1  out of      4    25%

Xilinx 3s1000fg320-5 with only 1 cog in the generate loop:

 Number of Slices:                     9034  out of   7680   117% (*) 
 Number of Slice Flip Flops:            928  out of  15360     6%  
 Number of 4 input LUTs:              14855  out of  15360    96%  
    Number used as logic:              5639
    Number used as RAMs:               9216
 Number of IOs:                          46
 Number of bonded IOBs:                  46  out of    221    20%  
 Number of BRAMs:                        24  out of     24   100%  
 Number of GCLKs:                         4  out of      8    50%  
 Number of DCMs:                          1  out of      4    25%

I have seen in the reports that this device only has enough RAM blocks for 16 KB of HUB (all 24 RAM blocks fill up with the COG RAM and the first 16 KB of HUB).

I have also seen the warning about optimization and will try again tomorrow with the old parser instead (moving all reg and wire declarations to the top), to compare whether anything gets better (but I guess it will be a similar result).
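A rough way to read the two slice counts above: fit a straight line through the 1-cog and 8-cog builds. That cost grows linearly with cog count is an assumption, but the fit suggests the part that doesn't scale with cogs (largely hub memory spilled into LUT RAM, judging by the 9216 LUTs used as RAM at 1 cog) already eats most of the 3s1000.

```python
SLICES_AVAILABLE = 7680              # XC3S1000, from the report above
reported = {1: 9034, 8: 25792}       # cogs -> occupied slices (new parser)

# Straight line through the two reported builds (assumption: linear in cogs).
per_cog = (reported[8] - reported[1]) / (8 - 1)
fixed = reported[1] - per_cog        # everything that doesn't scale with cogs


def estimated_slices(cogs: int) -> float:
    return fixed + per_cog * cogs


max_cogs = (SLICES_AVAILABLE - fixed) / per_cog

print(f"~{per_cog:.0f} slices per cog, ~{fixed:.0f} fixed")
print(f"fixed part alone: {fixed / SLICES_AVAILABLE:.0%} of the device")
print(f"cogs that would fit: {max_cogs:.2f}")
```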

overclocked · 2014-08-20 07:01

Ramon wrote: »

These are the results with -use_new_parser yes, after changing the PLL to a DCM:

Xilinx 3s1000fg320-5 and full 8 cores:
Number of Slices: 25792 out of 7680 335% (*)

I have also seen the warning about optimization and will try again tomorrow with the old parser instead (moving all reg and wire declarations to the top), to compare whether anything gets better (but I guess it will be a similar result).

But isn't the big hub RAM part of the problem? Like I said above (earlier here), with my already fixed local code:
-snip-
I tried building it for the S3-1000K now, but are you sure you really got everything compiled correctly if you can fit 7 full cogs?
At 4 cogs I'm at 99% full, and quite a lot of RAM has been implemented using logic.
-snip-

If I remember correctly, this is with "full" default RAMs on all included components, but with 4 COGs as stated.
So yes, you could probably fit something useful into that board as well.

Ramon · 2014-08-20 07:40

I was not sure how many cogs could fit. I said 7 earlier, based on some numbers for the 1600K-gate part (or elsewhere, I cannot remember). But it looks like this part is really far behind the Cyclone IV on the DE0-Nano.

jmg · 2015-10-07 21:27

I see a new $99 Xilinx board:
http://www.em.avnet.com/en-us/design/drc/Pages/Artix-7-35T-FPGA-Evaluation-Kit.aspx?cmp=na-aes3-pr-201509

Uses
Xilinx XC7A35T-L1CSG324I
256 MB DDR3 SDRAM
16 MB of QSPI Flash
10/100 Ethernet Interface

Number of LABs/CLBs 2600
Number of Logic Elements/Cells 33,208
Total RAM Bits 1,843,200
Number of I/O 210

So that's in a similar ballpark to the top MAX 10s, but with about 10% more RAM.

Strangely, it says
"and a Xilinx Vivado: Design Edition Voucher card (device-locked to XC7A35T)."

even though this is not a very large FPGA, they seem to need a special license to support it?

Not sure where Artix-7 stacks up in the MHz space, relative to the Altera MAX 10 or the larger A7/A9?

Addit: pasted from Digikey, for the FPGAs on the Altera boards:

5CEFA2F23C8N [BeMicro CV $49]
Number of LABs/CLBs 9434
Number of Logic Elements/Cells 25,000
Total RAM Bits 2,002,944
Number of I/O 224

5CEFA9F23C8N [BeMicro CV A9 $149]
Number of LABs/CLBs 113560
Number of Logic Elements/Cells 301,000
Total RAM Bits 14,251,008
Number of I/O 224
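One quick sanity check on those "Total RAM Bits" figures: how many times a full P1V's memory (32 KB hub RAM + 32 KB ROM + 8 x 2 KB cog RAM) fits in each. This is coarse: the totals can include distributed/MLAB RAM, and real block granularity and port widths will eat into it.

```python
P1_MEMORY_BITS = (32 + 32 + 8 * 2) * 1024 * 8  # hub RAM + ROM + 8 cog RAMs

boards = {
    "XC7A35T (Avnet $99)": 1_843_200,
    "5CEFA2F23C8N (BeMicro CV)": 2_002_944,
    "5CEFA9F23C8N (BeMicro CV A9)": 14_251_008,
}

for name, bits in boards.items():
    print(f"{name}: {bits / P1_MEMORY_BITS:.1f}x a full P1V's memory")
```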

Leon · 2015-10-08 03:08

I've just bought one! I can't get the voucher code for Vivado to work, so I'll download ISE and try that.

Ale · 2015-10-08 17:56

All these boards are very nice, but I'd really want just a bare-bones FPGA breakout board that I can integrate in my projects.

LoopyByteloose · 2015-10-08 19:15

Ale wrote: »

All these boards are very nice, but I'd really want just a bare-bones FPGA breakout board that I can integrate in my projects.

@Ale
Interesting comment...
I ignored several barebones Cyclone boards that are sitting on a dusty display rack locally, and purchased the BeMicroCV and BeMicroCVA9 to get started. The USB port was seductive.

The bare-bones boards may require a programming interface that is a bit more of a challenge, but they just might never go out of style for interesting builds.

+++++++++++++
I have been attempting to migrate a Forth project from Lattice to Altera -- so these challenges are very interesting.

I strongly suspect that none of the major FPGA players really want to have universal Verilog or VHDL. They all have turf to protect.

+++++
So far, the Altera Quartus II V15.0.2 software seems to be a great freebie. That makes buying a BeMicroCV and a BeMicroCVA9 a wise way to start.

Lattice software is really limited to Windows; no simulation is available on Linux. Not sure what Xilinx is offering. But without simulation software, the hardware is not going to yield much learning.

LoopyByteloose · 2015-10-08 19:22

https://joelw.id.au/FPGA/CheapFPGADevelopmentBoards

This Aussie may help the comparison shoppers out there. But you really should be sure which IDE supports what you are getting.

jmg · 2015-10-08 19:33

Ale wrote: »

All these boards are very nice, but I'd really want just a bare-bones FPGA breakout board that I can integrate in my projects.

I like my 'breakouts' to still include a USB loader (the Lattice board and this Xilinx PCB use an FT2232H), but yes, small is good too...

If you want small, P1 form factor, and Xilinx, then look at this nifty design by Antti :

https://hackaday.io/project/6786-soft-propeller
http://shop.trenz-electronic.de/en/TE0722-01-DIPFORTy1-Soft-Propeller

jmg · 2015-10-08 19:38

Loopy Byteloose wrote: »

https://joelw.id.au/FPGA/CheapFPGADevelopmentBoards

This Aussie may help the comparison shoppers out there.....

Nice link; it says this about the Arty:
"Arty Evaluation Kit ($99) .... and a one year licence for Vivado Design Edition."
Hmm... What happens after that one year?
