question regarding P1v clocks

rjo__ · 2015-04-15 09:31

I have been sequentially building and then destroying my memory control unit for the DE2-115. So far I have dual-ported RAM (M9K), DPR from here on, working like a top, with a projected rate of 800 MB/sec for reads and writes. I am stuck at about 35 MSPS for SRAM, and I haven't scratched SDRAM yet:)

I have found that I can drive the DPR with a clock frequency of 800 MHz but not 1000 MHz. There is a very regular jitter at 1000 MHz... not terrible... but enough to corrupt the reads slightly... so it is a regular irregularity.

In the compilation report, when I choose "clock" I get a report of my clocks and those being generated by the P1V.

prop:inst1|dig:core|cog:coggen[0].cog_|cog_ctr:cog_ctra|ctr[23] for each cog shows 1000MHz...

I think that is a problem...?

I don't have the exact skills required to fix this in less than a fortnight and I am heavy into other P1V stuff at the moment... but we have folks here for whom this is probably a no-brainer... why not drop these clocks back to 800MHz and then get rid of the 1:2 clock reduction in the hub?

rjo__ · 2015-04-16 06:04

Here is the .sof file for the current iteration of the project. I need to clean up the sources before posting them. This .sof produces a DE2-115, 4 Cog P1v running at 100MHz, with output to 800x600 40Hz VGA. This implementation requires the Parallax add-on board for programming the P1v. SRAM and DPR throughput are limited by the fact that each operation requires at least 4 PASM commands... So, running the DPR with an 800MHz clock has no immediate advantage:)... but it does prove that it is possible. It would be better if PASM or Spin were issuing macro commands to a Verilog interpreter.

To test the SRAM and m9k DPRam run dprsramtest.spin.

This 4Cog implementation also improves Spin performance by about 9 percent.

Proof of this can be seen by running P1V_4cogtest.spin

Thanks

Rich

rjo__ · 2015-04-16 06:31

oops... I should have added that the P1v handles communications with the MCU using Port B... as originally described by Pik33. In this kind of implementation, dirb, inb, and outb are all on individual buses. So adding Port B in this manner adds 64 outputs and 32 inputs. dirb is used for addresses and various states, and outb is used for data out of the P1V and into the MCU. inb is used to receive data into the P1v from the MCU... 16-bit data from the SRAM in inb[15..0] and 16 bits from the DPR at inb[31..16].
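
As a rough illustration of the inb layout described above (a sketch in Python, not the project's Spin or Verilog; the function name `split_inb` is made up for this example), the two 16-bit fields could be separated like this:

```python
def split_inb(inb):
    """Split a 32-bit inb value into (sram_word, dpr_word).

    Assumes the layout from the post: SRAM data in inb[15..0],
    DPR data in inb[31..16]. The function name is hypothetical.
    """
    inb &= 0xFFFFFFFF                 # keep only the 32 port bits
    sram_word = inb & 0xFFFF          # inb[15..0]
    dpr_word = (inb >> 16) & 0xFFFF   # inb[31..16]
    return sram_word, dpr_word


sram, dpr = split_inb(0xABCD1234)
print(hex(sram), hex(dpr))
```

On the real P1v the same masking and shifting would be done in PASM or Spin on the value read from inb.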

nutson · 2015-04-16 06:57

According to the Cyclone IV datasheet, the maximum frequency of an M9K memory is around 275 MHz, and the clock tree can handle up to 430 MHz. So I wonder what you are really doing...

rjo__ · 2015-04-16 07:58

I only read datasheets as a last resort and I never claimed to know what I was doing. I have attached the code and a screenshot of the block diagram. In the code the DPR has its own clock, and in the PLL at the upper left of the diagram... clock zero is set to 800MHz... feeding the DPR_clock. The VGA requests a pixel at the DPR address specified by pixloc in the VGA module... the DPR retrieves this data and sends it to the VGA via pixval. It works:)

BUT... if I make even the slightest change ... it breaks... so it might not be as useful as I originally thought.

The point of this thread is that Chip has stated that if we could get rid of the clock reduction (which is used because of jitter), we could maybe get a P1v operating around 200MHz. If the jitter is coming from the 1000 MHz clock, then it would make sense to replace the 1000 MHz clock. I don't know how to do this, but I would suggest 800MHz... seems to be a fine clock:)

nutson · 2015-04-16 10:48

Sorry, my designs always have a Verilog top-level module, so I have problems interpreting your schematic. A CyIV PLL output on the DE2-115 can go up to 450 MHz, see the datasheet, so what is it really doing in your case? The reason why the design seems to work: even if the clock were 800 MHz, the write address and write data are generated by Propeller software and change slowly, say at 1 MHz, so you would write the same data to the same address for 800 successive clocks. Yeah, that will work eventually. Same on the read side: the read address changes only 40M times per second, so you would read the same address 20 successive times. You can only claim 800 MHz operation if you perform 800M read and write operations per second at 800M different addresses.
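
The arithmetic behind that argument can be sketched in a few lines of Python. The helper names are made up for this example, and the 1 MHz write rate is the "say 1 MHz" guess from the post, not a measurement:

```python
def clocks_per_address(clock_hz, address_rate_hz):
    """How many RAM clock edges see the same address before it changes."""
    return clock_hz / address_rate_hz


def distinct_ops_per_second(clock_hz, address_rate_hz):
    """Distinct operations are bounded by the slower of the two rates."""
    return min(clock_hz, address_rate_hz)


RAM_CLOCK_HZ = 800_000_000   # nominal DPR clock from the thread
WRITE_RATE_HZ = 1_000_000    # assumed Propeller-side address rate
READ_RATE_HZ = 40_000_000    # VGA pixel request rate from the thread

print(clocks_per_address(RAM_CLOCK_HZ, WRITE_RATE_HZ))      # write side
print(clocks_per_address(RAM_CLOCK_HZ, READ_RATE_HZ))       # read side
print(distinct_ops_per_second(RAM_CLOCK_HZ, READ_RATE_HZ))  # what is actually exercised
```

Under these assumptions the read port is only ever exercised at 40M distinct accesses per second, whatever the nominal RAM clock is.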

rjo__ · 2015-04-16 13:02

I understand and agree with your analysis, and I am planning to dial it back. From the Prop side I can read about 10 frames (300KB/frame) per second and also write without errors at about the same rate... which is about 3 MB/s... one byte at a time. The VGA is asking for pixels at a rate of about 40MB/sec... and getting them. I would guess that there is a limit to reliable throughput exactly where Altera says it is. Even at 450MHz... that would translate to about 450MB/s by increasing the data width to 24 bits... mine is currently set at 8 bits. When I tried to increase the data width with an 800MHz clock, it blew up... not literally of course:)
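
For reference, a quick way to check peak-throughput figures like these. This is only a sketch: it assumes one transfer per clock, 1 MB = 10^6 bytes, and a helper name (`peak_mb_per_s`) invented for this example:

```python
def peak_mb_per_s(clock_hz, width_bits):
    """Theoretical peak throughput in MB/s, one transfer per clock.

    Assumes 1 MB = 1,000,000 bytes; real throughput will be lower.
    """
    return clock_hz * width_bits / 8 / 1e6


print(peak_mb_per_s(800_000_000, 8))    # nominal 800 MHz clock, 8-bit port
print(peak_mb_per_s(450_000_000, 8))    # 450 MHz, 8-bit port
print(peak_mb_per_s(450_000_000, 24))   # 450 MHz, 24-bit port
print(peak_mb_per_s(275_000_000, 8))    # M9K limit quoted from the datasheet
```

Under those assumptions an 8-bit port at 450 MHz already works out to 450 MB/s, and widening it to 24 bits would triple that figure rather than reach it.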

It is funny that Quartus will let you create a PLL at 1000MHz if in reality it isn't physically possible... and then back to the issue of the clock division in the P1V... Chip is definitely using a 1000MHz clock according to Quartus... So...?

jmg · 2015-04-16 14:29

rjo__ wrote: »

It is funny that Quartus will let you create a PLL at 1000MHz if in reality it isn't physically possible... and then back to the issue of the clock division in the P1V... Chip is definitely using a 1000MHz clock according to Quartus... So...?

The PLL value is less important than the clock outputs.
Often FPGAs and MCUs have quite high PLL frequencies (even up to 2GHz), as the higher figures allow printed inductors for a better analog PLL and smaller die area. Flipflops to divide from there are cheap.
I've seen parts with multiple VCOs to cover a 2:1 span (inductor + varicap VCOs).

That GHz region clock never travels far, and does not drive any clock trees directly.
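
A toy model of that idea in Python: the high VCO frequency stays inside the PLL, and only the divided-down output reaches the clock tree. The function name and the divider values are hypothetical and chosen only for illustration; they are not taken from the P1V sources or the Cyclone IV datasheet:

```python
def pll_output_hz(vco_hz, post_divider):
    """Output clock after an integer post-divider on the VCO.

    Hypothetical helper: real PLLs also have input and feedback
    dividers, which are left out of this sketch.
    """
    if not isinstance(post_divider, int) or post_divider < 1:
        raise ValueError("post_divider must be a positive integer")
    return vco_hz / post_divider


# Illustrative numbers only: a 1000 MHz VCO divided by 10 or by 5.
print(pll_output_hz(1_000_000_000, 10))
print(pll_output_hz(1_000_000_000, 5))
```

So a 1000 MHz figure in a report does not by itself mean that any logic in the fabric is being clocked at 1000 MHz.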

rjo__ · 2015-04-16 14:42

Very nice. Thank you.
