VGA from P2 silicon

24 Comments

  • Admire your focus...

    By the way, I hope to put a proto PCB panel in on Monday. There'll be room if you want to run a proto (1 or 2 copies).
  • There's something about this VGA demo. I can't quite put my finger on it....


    The slideshow blanks while it loads and flips the bmp files and then holds each for 4 frames. The VGA driver reloads the palette and intensity every frame and advances a frame counter so that we can sync to it.

    Tachyon Forth - compact, fast, forthwright and interactive
    useforthlogo-s.png
    --->CLICK THE LOGO for more links<---
    Latest binary V5.4 includes EASYFILE +++++ Tachyon Forth News Blog
    P2 SHORTFORM DATASHEET +++++ TAQOZ documentation
    Brisbane, Australia
  • Peter,
    It's the Sonic Dreams image right? ;)
  • thej Posts: 154
    edited October 25
    "That's not a microcontroller... THAT'S a MicroController" !!!

    :-)

    J
  • Nice, Peter. Are you clocking the SD card at the speed limit? That's pretty much your entire bottleneck, right?
  • I've optimised the VGA driver so that I can load a bmp file directly into the video memory with just a slight offset adjustment. So yes, the main bottleneck is loading a 300kB file. It currently uses the single-sector read command, which normally wastes a lot of time waiting for the SD controller to access each sector as if it were non-sequential, yet it is still quite fast. However, even with the P2 at 180MHz the SPI clock is only around 22.5MHz, and the signal is already looking quite rounded. I will have to see what I can do to improve the signal itself, but of course I could speed it up a lot simply by getting the sequential block read mode to work properly.
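
    To put rough numbers on that bottleneck, here is a minimal Python sketch of the arithmetic. The divide-by-8 ratio is only inferred from the quoted 180MHz and 22.5MHz figures, and "300kB" is read as 300 KiB; both are assumptions, and the result ignores all SD command and sector-access overhead.

```python
# Figures quoted in the post above. The divide-by-8 ratio is inferred from
# 180 MHz / 22.5 MHz and is an assumption, not a documented driver setting.
SYSCLK_HZ = 180_000_000
CLOCKS_PER_SPI_BIT = 8            # assumed: 180 MHz / 22.5 MHz
FILE_BYTES = 300 * 1024           # "300kB" read as 300 KiB (assumption)

def spi_clock_hz(sysclk_hz, clocks_per_bit):
    """SPI bit clock for a bit-banged loop taking clocks_per_bit system clocks."""
    return sysclk_hz / clocks_per_bit

def ideal_transfer_ms(n_bytes, spi_hz):
    """Time to shift n_bytes over 1-bit SPI with zero command overhead."""
    return n_bytes * 8 / spi_hz * 1000

spi_hz = spi_clock_hz(SYSCLK_HZ, CLOCKS_PER_SPI_BIT)
print(f"SPI clock: {spi_hz / 1e6:.1f} MHz")                                # 22.5 MHz
print(f"Ideal load time: {ideal_transfer_ms(FILE_BYTES, spi_hz):.1f} ms")  # 109.2 ms
```

    Anything above that ideal figure in a real measurement would be time spent waiting on the card rather than shifting bits.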


  • And then there's always that members-only 4-bit mode to investigate. I wonder if the pins slew faster in that mode.
  • Peter Jakacki Posts: 7,834
    edited October 25
    This is outputting a clock from the P2 but I think it is a limitation of my 100MHz scope. Here is a 50MHz waveform using x10 probes and a 33R series termination resistor just to get the reflections out a bit. The probe ground is direct to a pin on the P2D2 ground, and this is from P42 near the center.

    Can anyone confirm what they get?

    [Attached image: 50MHZ PIN.png, 800 x 480]

  • ..However even at 180MHz the SPI clock is around 22.5MHz and the signal is looking quite rounded already.
    ... This is outputting a clock from the P2 but I think it is a limitation of my 100MHz scope.
    Here is a 50MHz waveform using x10 probes and a 33R series termination resistor just to get the reflections out a bit. The probe ground is direct to a pin on the P2D2 ground, and this is from P42 near the center.

    Is that 50MHz from the P2? Did you generate it with a 50% duty cycle?

    P2 physical pin speeds were always going to impose some limit, the only question was where.
    There is a lot of stuff hanging off a P2 Pin...

  • rogloh Posts: 853
    edited October 25
    cgracey wrote: »
    And then there's always that members-only 4-bit mode to investigate. I wonder if the pins slew faster in that mode.

    Yeah the P2 hopefully opens up the possibility to do SDIO in 4 bit mode with proper CRC calculations at a reasonable speed giving some benefit over SPI mode. In fact I'm actually laying out a board right now assuming that it can be done at some point by including the extra two SD DAT1,DAT2 wires to P2 pins. I've had SD mode going before with my own custom P1V implementation so I learned what is needed in terms of the command and data transfer state machines. You have to hunt around for various snippets and samples of information and code online to work that out but it can be done with lots of patience.

    I think streaming nibbles to/from the 4 pins can be done automagically by the P2 streamer, but the COG is also going to need to either precompute any sent CRC16/CRC7s in advance or, more ideally, compute them as it goes. Maybe the COG can track the information being sent from the streamer to the pins on each new clock and inject the CRC at the end, using a table lookup in LUT memory on each byte. This would be a very interesting timing project...and a good way to learn P2 capabilities.

    Update: I've mapped my SD pins as follows to the P2. I hope the streamer supports this allocation for the optimized nibble transfers...though if it doesn't work at least SPI is still going to be possible.
    CMD P26
    CLK P27
    DAT0 P28
    DAT1 P29
    DAT2 P30
    DAT3 P31
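
    The per-byte table lookup idea can be sketched in Python. This is only an illustration of the technique, not rogloh's implementation: the polynomials are the ones the SD specification uses for commands (CRC7) and data blocks (CRC16), while the function names and the notion of holding the 256-entry table in LUT RAM are assumptions. Note also that in 4-bit SD mode each DAT line carries its own CRC16, so a byte-wide table like this maps directly onto SPI or 1-bit transfers only.

```python
CRC16_POLY = 0x1021  # x^16 + x^12 + x^5 + 1, used for SD data blocks
CRC7_POLY = 0x09     # x^7 + x^3 + 1, used for SD commands

def make_crc16_table(poly=CRC16_POLY):
    """Precompute the CRC16 contribution of every possible byte value."""
    table = []
    for byte in range(256):
        crc = byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ poly) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
        table.append(crc)
    return table

CRC16_TABLE = make_crc16_table()  # on the P2 this could sit in LUT RAM (assumption)

def crc16_update(crc, byte):
    """Fold one byte into a running CRC16 with a single table lookup."""
    return ((crc << 8) & 0xFFFF) ^ CRC16_TABLE[((crc >> 8) ^ byte) & 0xFF]

def crc16(data, crc=0):
    """CRC16 over a whole buffer, MSB first, initial value 0."""
    for byte in data:
        crc = crc16_update(crc, byte)
    return crc

def crc7(data):
    """Bitwise CRC7, MSB first, initial value 0."""
    crc = 0
    for byte in data:
        for bit in range(7, -1, -1):
            feedback = ((crc >> 6) ^ (byte >> bit)) & 1
            crc = (crc << 1) & 0x7F
            if feedback:
                crc ^= CRC7_POLY
    return crc

cmd0 = bytes([0x40, 0x00, 0x00, 0x00, 0x00])   # CMD0 (GO_IDLE_STATE), argument 0
print(hex((crc7(cmd0) << 1) | 1))              # final command byte: 0x95
print(hex(crc16(b"123456789")))                # standard check value: 0x31c3
```

    Computing "as it goes" then amounts to calling crc16_update once per byte as it is sent, and appending the two CRC bytes after the last data byte.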
  • rogloh wrote: »
    cgracey wrote: »
    And then there's always that members-only 4-bit mode to investigate. I wonder if the pins slew faster in that mode.

    Yeah the P2 hopefully opens up the possibility to do SDIO in 4 bit mode with proper CRC calculations at a reasonable speed giving some benefit over SPI mode. In fact I'm actually laying out a board right now assuming that it can be done at some point by including the extra two SD DAT1,DAT2 wires to P2 pins. I've had SD mode going before with my own custom P1V implementation so I learned what is needed in terms of the command and data transfer state machines. You have to hunt around for various snippets and samples of information and code online to work that out but it can be done with lots of patience.

    I think streaming nibbles to/from the 4 pins can be done automagically by the P2 streamer, but the COG is also going to need to either precompute any sent CRC16/CRC7s in advance or, more ideally, compute them as it goes. Maybe the COG can track the information being sent from the streamer to the pins on each new clock and inject the CRC at the end, using a table lookup in LUT memory on each byte. This would be a very interesting timing project...and a good way to learn P2 capabilities.

    Hey Roger, you reckon you could post that 4-bit code or whatever you have?

    I can also see by the reflections that I will need series termination, so I will allow for a resnet or two.


  • Hi Peter. That was actually using my own Verilog engine, which is not public right now, unfortunately.

    Good catch on the series termination. I might add some too, just in case. Cheers.
  • cgracey Posts: 10,128
    edited October 25
    Peter, here are some pictures from an 8GS/s 1.5GHz Infiniium scope with active FET probes. I pulled it out and set it up after I saw your post above. It runs on an old version of Windows and needs a PS/2 mouse and the BIG round keyboard connector. It's a pain to set up, but works great. Have you ever seen 1ns/div? This was new to me. We bought this for $10k back in 2005 for the Prop1. These were $40k new. Today you can buy them for about $3k.

    Anyway, here is a shot of a smart pin in NCO mode with F=$8000_0000, so that it toggles on every clock. The waveform is half the clock frequency of 300MHz:

    pin_clocked.jpg

    That was cheating, though, because I had clocking enabled on the pin (C=1 in %CIOHHHLLL). It squared up the timing. Look at the duty cycle measurement.

    Here it is without clocking enabled (C=0), so we are seeing what the core logic is outputting. See the difference in Tph vs. Tpl:

    pin_unclocked.jpg


    I found out that clocking (C=1) fails around 300MHz, unless you raise VIO a bit.
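
    For readers unfamiliar with NCO mode, the "toggles on every clock" figure can be checked with a small Python model. It assumes the smart pin adds F to a 32-bit accumulator on every system clock and drives the pin from the accumulator's top bit; the function names are made up for this sketch.

```python
def nco_output_hz(sysclk_hz, f_word):
    """Average output frequency of a 32-bit NCO clocked at sysclk_hz."""
    return sysclk_hz * f_word / 2**32

def simulate_msb(f_word, n_clocks):
    """Model the accumulator and return its top bit after each clock."""
    acc = 0
    bits = []
    for _ in range(n_clocks):
        acc = (acc + f_word) & 0xFFFFFFFF
        bits.append(acc >> 31)
    return bits

print(simulate_msb(0x8000_0000, 6))                       # [1, 0, 1, 0, 1, 0]
print(nco_output_hz(300_000_000, 0x8000_0000) / 1e6)      # 150.0 (MHz)
```

    With F=$8000_0000 the top bit flips on every clock, so the pin frequency is half the system clock, which matches the 150MHz waveform described above for a 300MHz clock.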

  • cgracey wrote: »
    I found out that clocking (C=1) fails around 300MHz, unless you raise VIO a bit.

    That'll be setup time of OUT rise getting too late.

    Clearly OUT is falling faster than rising to produce such a skewed duty.

    "Are we alone in the universe?"
    "Yes," said the Oracle.
    "So there's no other life out there?"
    "There is. They're alone too."
  • evanh Posts: 5,641
    edited October 25
    Nice photos btw. Makes me want a prototype part now so I can duplicate that to see what it looks like with a boring 200 MHz passive probe.
  • Looking good!

    500ps rise and fall time
    Overshoot doesn't look too bad either :)

    Do you know why the high is shorter than the low?
    My Prop boards: P8XBlade2, RamBlade, CpuBlade, TriBlade
    Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
    Website: www.clusos.com
    Prop Tools (Index) , Emulators (Index) , ZiCog (Z80)
  • evanh Posts: 5,641
    edited October 25
    Cluso99 wrote: »
    Do you know why the high is shorter than the low?

    That one really will be the long run OUT route. The final buffer/driver is squaring it up but the internal rise and fall slewing must be uneven.

    EDIT: It won't affect the DIR to OUT relationship because they will both experience the same long paths. Or at least, the rules will ensure they are affected equally.
  • evanh wrote: »
    cgracey wrote: »
    I found out that clocking (C=1) fails around 300MHz, unless you raise VIO a bit.

    That'll be setup time of OUT rise getting too late.

    Clearly OUT is falling faster than rising to produce such a skewed duty.

    That partially explains the low glitch we get when DIR and OUT fall on the same clock.
  • That would also explain the notch in reliable clock frequency for ADC operation. Beyond about 325ish MHz it appeared reliable again. The IN bitstream is possibly skewing so badly that the transitions are being sampled by the smartpin on the following clock. It's also possible this is producing a distortion between rise and fall at the seemingly reliable 340 MHz sysclock.
  • cgracey Posts: 10,128
    edited October 25
    evanh wrote: »
    That would also explain the notch in reliable clock frequency for ADC operation. Beyond about 325ish MHz it appeared reliable again. The IN bitstream is possibly skewing so badly that the transitions are being sampled by the smartpin on the following clock. It's also possible this is producing a distortion between rise and fall at the seemingly reliable 340 MHz sysclock.

    Yep. I never imagined the I/O pads wouldn't be fast *enough*. Now, it's a good thing that clocking must be turned ON, and not the default, so that we have an asynchronous path to the I/O pins that allows the core to still function over I/O, at all, at such high speeds. Providence!!
  • I don't see it as the pads being slow. I see it as just those very long routes that will be unique to IN, OUT and DIR. And the rules will allow it.
  • That's why the Cogs aren't crashing at 340 MHz. The tighter constrained areas aren't at their limit yet.
  • evanh Posts: 5,641
    edited October 25
    It could be another variant of the problem we saw with the I/O between 40 MHz and 80 MHz on the FPGA.
  • cgracey wrote: »
    Yep. I never imagined the I/O pads wouldn't be fast *enough*.
    Oh, I didn't read that carefully enough. You're agreeing the pads are fast.

    Yeah, I guess, we're way beyond rated speed. Anything here is just bonus.
  • Peter Jakacki Posts: 7,834
    edited October 25
    So I've fixed that bug I had in the multiple block read mode so that is working now and loaded a 300kB bmp file in......
    LAP TIGER LAP .LAP 31,316,360 cycles = 97,863,625ns @320MHz ok
    
    Less than 100ms! That means it is reading at a rate of over 3MB/sec !!!
    The SPI clock routines have an extra nop in the clock to stretch them out now too.

    Who knows how fast I can push it if I used the smartpins and include some series termination on the next board.
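
    The LAP figures above can be cross-checked with a few lines of Python. The cycle count and clock rate are taken from the console output; reading "300kB" as 300 KiB is an assumption, so the throughput is approximate.

```python
LAP_CYCLES = 31_316_360        # from the LAP output above
SYSCLK_HZ = 320_000_000        # "@320MHz"
FILE_BYTES = 300 * 1024        # "300kB" read as 300 KiB (assumption)

# Elapsed time in nanoseconds, using integer arithmetic to match the console.
elapsed_ns = LAP_CYCLES * 1_000_000_000 // SYSCLK_HZ
rate_bytes_per_s = FILE_BYTES / (elapsed_ns / 1e9)

print(f"{elapsed_ns:,} ns")                     # 97,863,625 ns
print(f"{rate_bytes_per_s / 1e6:.2f} MB/s")     # 3.14 MB/s
```

    That agrees with the "over 3MB/sec" claim, with the exact figure depending on the true file size.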

    BTW - this is all there is to the VIEW routine.
    : VIEW ( sector -- )
    ---	read file header and align file palette to memory palette then read all
    	DUP FOPEN PALETTE 10 SDW@ $400 - - BMPSZ HIDE SDRDS SHOW
    	;
    
    Just one line of code.

  • evanh Posts: 5,641
    edited October 25
    Peter,
    Use this for improved clocking template:
    SPIRD
    		outnot 	sck			' clock (low)
    		rep	@.end,#7		' first 7 bits
    		outnot 	sck			' clock (high)
    		testp	miso wc			' read data from card
    		outnot 	sck			' clock (low)
    		rcl	tos,#1			' shift in msb first
    .end
    		outnot 	sck			' clock (high)
    		testp	miso wc			' read data from card
    _ret_		rcl	tos,#1			' shift in final bit (lsb)
    
    
    PS: In case you're wondering, I've derived that from your posting here - https://forums.parallax.com/discussion/comment/1426178/#Comment_1426178
  • I've probably got the polarity labelling opposite to normal.
  • Peter Jakacki Posts: 7,834
    edited October 25
    evanh wrote: »
    Peter,
    Use this for improved clocking template:

    That doesn't offer any improvement, since it has 5 instructions in the main loop, the same as the old routine, plus it adds 2 extra instructions. Nonetheless I tried it out and, while it worked, it didn't run any faster. The SD clock is running at 30MHz for both. Maybe I will have to look at smartpin modes next.

    EDIT: Sorry, yours has 4 instructions in the main loop but I would have to do the same to the SPIRX version as well which handles blocks. One of the problems we have is that we only have 16kB of ROM total and we really had to do some squeezing.

  • evanh Posts: 5,641
    edited October 25
    One of the problems we have is that we only have 16kB of ROM total and we really had to do some squeezing.

    I hope you've carefully replaced every RET with _RET_ :P
  • evanh wrote: »
    [just saw your edit]

    I'm running my loop now with P2 at 340MHz and I'm reading a 33.8MHz clock, so it is very close to 1/10 of the CPU clock.
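
    The "1/10 of the CPU clock" ratio follows from the loop length if each instruction takes 2 system clocks. That timing is an assumption that holds only for simple instructions with no stalls, and the helper name is made up for this sketch.

```python
CLOCKS_PER_INSTRUCTION = 2  # assumption: simple P2 instructions, no stalls

def bit_clock_mhz(sysclk_mhz, instructions_per_bit):
    """SPI clock produced by a bit-banged loop of the given length."""
    return sysclk_mhz / (instructions_per_bit * CLOCKS_PER_INSTRUCTION)

print(bit_clock_mhz(340, 5))   # 34.0, close to the 33.8MHz reading
print(bit_clock_mhz(340, 4))   # 42.5, what a 4-instruction loop would give
```

    Under the same assumption, trimming the block-read loop to 4 instructions per bit would raise the SPI clock by a quarter before any smartpin modes are involved.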
