All PASM2 gurus - help optimizing a text driver over DVI?


Comments

  • evanh Posts: 8,617
    edited 2019-12-07 - 10:08:12
    cgracey wrote: »
    Yes, it takes three dual-port RAMs to achieve single-clock execution because there's no 3R/1W memory instance available. That was the main driver for dropping to two-clock execution, since it only requires a single dual-port RAM.
    Didn't know that. I thought it was primarily the design heat spec. That would explain why you were saying there was so much spare space afterwards, though.
    We made the cogs big, anyway, after that.
    There were a few complaints of kitchen sinkism. :) Plenty was added outside the cogs though.
    On P3, it would be good to go back to single-clock execution. If we use a much smaller process, three dual-port RAMs for cog memory won't be a problem.
    Bah, surely a refined quad-port 3R/1W could be sorted out. They must be in use already, and becoming more common too.
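    For anyone wondering why three dual-port RAMs are needed: a common way to get extra read ports is replication, where every write goes to all copies and each copy serves one read port. Below is a minimal Python behavioural sketch, assuming each dual-port RAM is used as one read port plus one shared write port. The class and method names are made up for illustration and say nothing about the actual P2 netlist.

```python
class ReplicatedRam3R1W:
    """Behavioural model of a 3-read/1-write RAM built from three banks.

    Each bank stands in for one dual-port RAM used as one read port plus one
    write port. Every write is fanned out to all three banks, so they always
    hold identical contents and each can serve its own read address per clock.
    """

    def __init__(self, depth=512, width=32):
        self.mask = (1 << width) - 1
        self.banks = [[0] * depth for _ in range(3)]

    def clock(self, read_addrs, write_addr=None, write_data=0):
        """Model one clock: three reads plus an optional write.

        Reads see the contents from before this clock's write (read-before-write).
        """
        if len(read_addrs) != 3:
            raise ValueError("exactly three read addresses per clock")
        results = [bank[addr] for bank, addr in zip(self.banks, read_addrs)]
        if write_addr is not None:
            for bank in self.banks:  # the single write port drives every copy
                bank[write_addr] = write_data & self.mask
        return results


# Usage: three reads (e.g. instruction, D and S) in one clock, plus a write.
ram = ReplicatedRam3R1W()
ram.clock([0, 1, 2], write_addr=5, write_data=0x1234)
print(ram.clock([5, 5, 1]))  # [4660, 4660, 0]
```

    The price is three times the storage for the same 512 longs, which is why the trade-off only looks attractive on a much smaller process.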

  • We had actually designed our own 3R/1W 512x32 RAM for the ONC18 process, but it was going to be hard to qualify. I am blown away by how fast ON's memories go. I don't think we could be certain of going that fast. It's much safer to use what they've thoroughly qualified and characterized.
  • So Chip, just what speed will the ONC18 RAM run at?

    I am curious about cogs having quite a bit more RAM and having the instruction set simplified and the streamers removed.
    Then have another set of cogs tuned for streamer and I/O, perhaps taking in the smart pins too.

    By simplifying each cog set, I think both would perform nicely and complement each other. And with the simplification they would be much smaller and run much faster.

    As for the 3R1W RAM, I wonder if ON would be interested to work with you to get them working and qualified, so that they could also use them too???
  • evanh Posts: 8,617
    edited 2019-12-07 - 11:56:50
    I'm not in favour of going asymmetrical at all. If you want to separate the streamers, a better direction is to group them separately from the cogs, like the smartpins are now.

    And that goes for any external buses too. No privileged cogs.
  • Cluso99 Posts: 15,671
    edited 2019-12-07 - 12:08:49
    The problem with symmetrical cogs is a lot of wasted silicon and a loss of speed. That's why we are seeing multi-core ARMs with different base cores, e.g. A73 with M0. The same goes for video cores vs the main CPU cores.

    Not every program running in the P1 or P2 is equal. It’s just a fact.

    It’s quite likely you could get 8 tight processing cogs plus 8 streamer io cogs in the same silicon space as 8 full-blown cogs.
  • Cluso99 wrote: »
    So Chip, just what speed will the ONC18 RAM run at?

    I am curious about cogs having quite a bit more RAM and having the instruction set simplified and the streamers removed.
    Then have another set of cogs tuned for streamer and I/O, perhaps taking in the smart pins too.

    By simplifying each cog set, I think both would perform nicely and complement each other. And with the simplification they would be much smaller and run much faster.

    As for the 3R1W RAM, I wonder if ON would be interested to work with you to get them working and qualified, so that they could also use them too???

    The ONC18 RAMs are guaranteed to run at a little over 200MHz with worst-case process conditions, lowest voltage, and 150C junction temp.

    When I got the RAMs working for the P1, I needed to use our e-beam prober to monitor the bitlines and control signals. I think it would be over $150k to get ON to qualify a custom RAM for inclusion into the tool chain. We are better off using their RAMs.
  • evanh Posts: 8,617
    edited 2019-12-07 - 12:28:52
    Cluso99 wrote: »
    The problem with symmetrical cogs is a lot of wasted silicon and a loss of speed. That’s why we are seeing multi core ARMs with different base cores eg A73 with M0. Same goes for the video cores vs the main cpu cores.

    Ick, mixed instruction sets and mixed feature sets must be a messy dev environment for those poor guys.

    I'm glad I don't have to deal with it.

  • What's the basepin on the binary?
  • potatohead Posts: 9,957
    edited 2019-12-07 - 18:36:55
    This is the best PAL signal we've seen on a Propeller.

    Nice work. You wondered about the color bursts. There is a delay mode that puts the blanking periods into frame. Sorry for the reflections; my garage workspace has crap lighting right now. But the upside is I have a garage workspace now! (It's been years, so I'm happy.)

    [Image: 20191207_101455.jpg, 3024 x 4032]

    Btw, there appears to be a forum bug. When one inserts an uploaded image, the software drops about 5 copies, and I'm deleting 4 of them.
  • potatohead Posts: 9,957
    edited 2019-12-07 - 18:10:39
    Holy moly, those were big! Oh well. You can see the sync areas, dead on, frame after frame. Sweet! And those tints are coming across in my cell phone camera with about the right saturation, but a bit bright. Just FYI. Hue seems on though. For your reference.

    Lots of people are going to really appreciate this work. I've thought of one hack already. Now that we've got a respectable PAL signal, is anyone up for more than 8 bits of color via PAL artifacting? In NTSC, this is done horizontally; in PAL, it's done vertically. Both are done on a non-interlaced 50/60Hz frame (one field).

    With NTSC, it's all about overloading the color signal. Just pack pixels in horizontally. With PAL, one can put two pixels aligned vertically to take advantage of the vertical blending of colors. I've never tried it with a respectable signal. Will one day in the future.
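    A rough way to reason about the vertical PAL trick is sketched below. It assumes an idealised delay-line (PAL-D) decoder that averages U and V across adjacent lines while keeping luma per line, and a pattern that alternates line by line so every line blends with a neighbour of the other colour. It models the decoder only, not any P2 code, and the function names are hypothetical.

```python
from itertools import combinations_with_replacement


def pal_delay_line_blend(upper, lower):
    """Approximate two vertically adjacent pixels through an idealised PAL
    delay-line decoder. Pixels are (Y, U, V) tuples.

    Luma is kept per line; chroma (U, V) is averaged across the two lines.
    """
    (y_a, u_a, v_a), (y_b, u_b, v_b) = upper, lower
    u, v = (u_a + u_b) / 2, (v_a + v_b) / 2
    return (y_a, u, v), (y_b, u, v)


def blended_chroma_count(palette):
    """Count the distinct averaged (U, V) values reachable from pairs of palette entries."""
    chroma = set()
    for a, b in combinations_with_replacement(palette, 2):
        (_, u, v), _ = pal_delay_line_blend(a, b)
        chroma.add((u, v))
    return len(chroma)


# Three base colours (integer Y, U, V units) already give six distinct chroma values.
palette = [(50, 20, 0), (50, -20, 0), (50, 0, 30)]
print(blended_chroma_count(palette))  # 6
```

    Sets without a delay line won't average chroma this way, so actual results will vary by decoder.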

  • @potatohead If you want to get rid of those black bars in a CRT photo, simply set your camera's shutter speed to 1/50 s (or 1/60 s for NTSC, obviously), or close to a multiple of that.
  • potatohead Posts: 9,957
    edited 2019-12-07 - 18:15:34
    Does an android phone do that? [goes off to look]

    It does. Man, these advanced phones have everything. I am going to edit now.

  • Wuerfel_21 Posts: 504
    edited 2019-12-07 - 18:14:41
    Depends on the model. My LG G5 (RIP :cry: ) had the option right in the stock camera app, you just had to enable some kind of "expert mode" or whatever it was called
  • Mine was pro mode. Has ISO, shutter, and a bunch of other settings. Had no idea. There are a number of photos I would love to retake with a much higher ISO and faster shutter...

    Well, now I know. Spiffy!

    I've got a Note 8.
  • jmg Posts: 14,175
    cgracey wrote: »
    The ONC18 RAMs are guaranteed to run at a little over 200MHz with worst-case process conditions, lowest voltage, and 150C junction temp.

    When I got the RAMs working for the P1, I needed to use our e-beam prober to monitor the bitlines and control signals. I think it would be over $150k to get ON to qualify a custom RAM for inclusion into the tool chain. We are better off using their RAMs.

    Can this over-the-horizon stuff be moved to another thread, to keep this one cleaner?
  • jmg wrote: »
    Can this over-the-horizon stuff be moved to another thread, to keep this one cleaner?

    Yes, but I don't know how to move threads. Publison knows how. Maybe he will notice this.
  • What I get out of the document is:

    "The extraordinary increase of the fab cost: more than 6 billion $ for a 14-nm process,
    and associated chip design cost: 176 M$, more than 10 times the cost of a 130nm IC design."

    We need to buy a LOT of P2s from Parallax to allow for shrinking down to this.

    Enjoy!

    Mike
  • cgracey wrote: »
    Yes, but I don't know how to move threads. Publison knows how. Maybe he will notice this.

    VonSzarvas is the expert, but I'll give it a try
  • Go publison! (And delete this)
  • There are too many interspersed replies to do a split. It seems posts have to be contiguous to move correctly without losing pertinent posts.
  • potatohead wrote: »
    ... You can see the sync areas, dead on, frame after frame. Sweet! And those tints are coming across in my cell phone camera with about the right saturation, but a bit bright. Just FYI. Hue seems on though. For your reference.

    Ok, thanks for taking a look @potatohead. I recall that way back in my first job, the technicians at Telstra had Sony PVMs and used them for video quality tests etc. I used to watch them use this delayed mode, as you show above, and the blue-only mode. I never really understood what they were doing at the time, but now, after doing this driver with PAL etc., I know exactly how useful it would be for checking sync and colour. It would have been great to have one while developing this code, to test what I was doing. The problem is that gamers are now all buying PVMs and prices have shot up on eBay etc.
  • Cluso99 Posts: 15,671
    edited 2019-12-08 - 00:02:23
    cgracey wrote: »
    The ONC18 RAMs are guaranteed to run at a little over 200MHz with worst-case process conditions, lowest voltage, and 150C junction temp.

    When I got the RAMs working for the P1, I needed to use our e-beam prober to monitor the bitlines and control signals. I think it would be over $150k to get ON to qualify a custom RAM for inclusion into the tool chain. We are better off using their RAMs.

    Chip,
    I meant getting ON to do it for free, so they could get the IP for free :)
    I think the benefit to ON would be way more than to Parallax!
  • Oh I know. The CRT is being rediscovered!

    Honestly, nothing beats glowing phosphors in a tube yet.

    I scored a couple of CRTs just before the craze. And I have a big, fast 120Hz plasma. Those are my go-to screens for gaming and retro fun. My plasma is 3D capable. Running Unigraphics on it is awesome. (Gonna have to try 3D with a Prop.)

    Anyway, nice work. At some point, one of us will do the older TTL signals and we can say a P2 drives basically anything. My PVM does CGA. My MDA capable screen died. On the hunt for one of those.

  • rogloh wrote: »
    Problem is all the gamers are now into buying PVMs and the prices shot up on eBay etc.

    Inexplicably here in Europe, too, where it's fairly trivial to get a decent RGB-SCART equipped consumer set in any desired size.
  • potatohead wrote: »
    Anyway, nice work. At some point, one of us will do the older TTL signals and we can say a P2 drives basically anything. My PVM does CGA.

    Yes, I think CGA and EGA can (and should?/will) be added to this driver code, as well as TTL parallel LCD. CGA/EGA can work with LUT modes, and I was also starting to think more about LCD DE/CLK stuff with Smartpins this morning.

    The thing with parallel LCDs is that they sometimes need power to come up before the signals, backlight etc. The time delay itself is not necessarily tightly controlled (I think it's usually in the region of tens to hundreds of msec), but the order can be important. I wonder whether this needs to be handled in the driver or outside of it. I think once the COG is spawned and set up, the actual video data signal should be able to start quickly, probably within 100-200 instructions after the COG is loaded from hub.

    I'd quite like the video COG to be able to control its own backlight PWM (remaining adjustable after startup) to avoid needing another COG to do it, and obviously it would output CLK, DE (and/or V&HSYNCs), ideally with configurable polarity and pin order/positions etc. The optional ability to drive another GPIO with a time delay at setup could be useful for enabling LCD power etc., although shutdown sequencing could be an issue if the COG is simply stopped and the pins tri-state. Maybe that's getting a bit complicated for now... and sometimes LCDs have multiple power inputs to control as well, e.g. LCD power, backlight power and PWM, and each might need to be sequenced. Still thinking about it...
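    One way to think about the ordering problem is as a list of steps walked forwards at power-up and backwards at shutdown. The Python sketch below assumes four hypothetical control lines and placeholder delays (the post only says tens to hundreds of msec); it produces a timeline rather than touching any pins.

```python
from dataclasses import dataclass


@dataclass
class Step:
    signal: str    # hypothetical control-line name
    delay_ms: int  # placeholder settling time before the next step


# Hypothetical order: panel power first, then video signals, then backlight.
POWER_UP = [
    Step("lcd_power", 50),
    Step("video_outputs", 100),  # pixel data, CLK, DE/syncs
    Step("backlight_power", 10),
    Step("backlight_pwm", 0),
]


def power_up_schedule(steps):
    """Return (time_ms, signal, enabled) events, enabling steps in list order."""
    t, events = 0, []
    for step in steps:
        events.append((t, step.signal, True))
        t += step.delay_ms
    return events


def power_down_schedule(steps):
    """Disable in reverse order, waiting each step's delay after turning it off."""
    t, events = 0, []
    for step in reversed(steps):
        events.append((t, step.signal, False))
        t += step.delay_ms
    return events


for event in power_up_schedule(POWER_UP) + power_down_schedule(POWER_UP):
    print(event)
```

    Reusing one delay per step for both directions is a simplification; a real panel datasheet may give different power-on and power-off timings.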


  • The backlight application could be a neat example of the smartpin switcher mode.

    I'll see if my original NEC MultiSync fires up. We've got some old PCs around the place that would probably still drive it just fine. There's even an old Hercules system with a Thomson mono monitor.

  • Tubular wrote: »
    I'll see if my original NEC MultiSync fires up. We've got some old PCs around the place that would probably still drive it just fine. There's even an old Hercules system with a Thomson mono monitor.

    Yeah, both Hercules/MDA and VGA monochrome would be interesting to put in too. My driver can already support mono VGA if you mix everything onto one channel with suitable colour space converter parameters, but right now it doesn't eliminate the other 2 colour channels, so this method just wastes IO pins. I think a true 3-pin mono VGA output mode (e.g. Grey+Vsync+Hsync) would be useful too. It's probably just a minor setup change, so it should be easy to add at some point.
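    To illustrate the mono-VGA idea: a colour space converter's 3x3 matrix can fold R, G and B into one output channel and zero the other two. The sketch assumes BT.601 luma weights and fixed-point coefficients in 1/128 steps; that scaling is an assumption for illustration, not the P2's documented register format.

```python
# Weights are fixed point in 1/128 steps (an assumed scaling for illustration).
# Rows are output channels, columns are the R, G, B inputs.
LUMA_ROW = (38, 75, 15)  # ~0.299, 0.587, 0.114 (BT.601 luma); sums to 128
MONO_MATRIX = (LUMA_ROW, (0, 0, 0), (0, 0, 0))


def apply_csc(matrix, rgb):
    """Multiply an 8-bit (R, G, B) pixel by a 3x3 matrix and clamp each output to 0..255."""
    r, g, b = rgb
    out = []
    for c_r, c_g, c_b in matrix:
        value = (c_r * r + c_g * g + c_b * b) // 128
        out.append(max(0, min(255, value)))
    return tuple(out)


print(apply_csc(MONO_MATRIX, (255, 255, 255)))  # (255, 0, 0)
print(apply_csc(MONO_MATRIX, (0, 255, 0)))      # (149, 0, 0)
```

    The two all-zero rows are the channels that still occupy pins in the mixed approach; a dedicated 3-pin mode would simply not drive them.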

    The two-video-pin (Hercules/MDA) TTL monochrome format is another output option, but as currently coded the text driver portion could not replicate its 9-pixel-wide font (all my fonts are 8 bits wide) unless I make the text code portion an overlay that handles it specifically in that output mode. Not having colour might free up a bit of space for that. Otherwise you could adjust the porch/line timing and send out 640 pixels with 80-column text instead of 720 pixels on those monitors.
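    For reference, MDA/Hercules text uses a 9-pixel character cell, which is where 80 x 9 = 720 pixels comes from (80 x 8 = 640 otherwise). On the original MDA the 9th column is blank, except for the line-drawing codes 0xC0-0xDF, where it repeats the 8th column. A minimal Python sketch of that widening step (the function name is hypothetical):

```python
CHARS_PER_LINE = 80


def expand_mda_row(row_bits, char_code):
    """Widen one 8-pixel font row (bit 7 = leftmost pixel) to a 9-pixel cell.

    For the line-drawing codes 0xC0-0xDF the 9th column repeats the 8th so
    horizontal lines join up between cells; for all other codes it is blank.
    """
    ninth = row_bits & 1 if 0xC0 <= char_code <= 0xDF else 0
    return ((row_bits & 0xFF) << 1) | ninth


print(CHARS_PER_LINE * 9, CHARS_PER_LINE * 8)  # 720 640
print(bin(expand_mda_row(0b11111111, 0xC4)))   # 0b111111111
print(bin(expand_mda_row(0b11111111, 0x41)))   # 0b111111110
```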

    Actually, I am already considering patching the text mode code for another reason: to allow operation at 2x the pixel clock with monochrome text. Today the driver just keeps running the colour path and the sync output slows down, which makes it unusable for text at low P2 clock speeds. You'd lose colour text if you run the P2 below 5x the pixel clock, but you could still have mono text down to 2x. I think that's the best way to go. 1x most likely couldn't support any text, unless it was double width perhaps...? Another option is to launch a secondary COG to help the main COG share the burden if you really need coloured text at low P2 clock speeds. Not there yet.
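    The clock thresholds above can be written as a small decision function. This is only a sketch of the stated rule of thumb (colour text at 5x the pixel clock or more, mono down to 2x, nothing at 1x); the real limits depend on the driver code.

```python
def text_mode_for(sysclk_hz, pixel_clk_hz):
    """Return which text rendering the sysclk/pixel-clock ratio could sustain,
    using the thresholds from the post: colour at >= 5x, mono at >= 2x."""
    ratio = sysclk_hz / pixel_clk_hz
    if ratio >= 5:
        return "colour"
    if ratio >= 2:
        return "mono"
    return "none"


# Example with a 25 MHz pixel clock (close to 640x480 VGA's 25.175 MHz).
for sysclk in (250_000_000, 125_000_000, 100_000_000, 25_000_000):
    print(sysclk // 1_000_000, "MHz ->", text_mode_for(sysclk, 25_000_000))
```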
  • rogloh Posts: 1,784
    edited 2019-12-08 - 13:26:05
    Thinking more about the LCD output mode. This is how I am thinking of setting it up.

    Port A or B output selection.

    Parallel TTL RGB video data output on lower 8 / 16 / 24 bits of the 32 bit port with maskable bits in case you only have a 12 or 18 bit panel. This mask can effectively free up pins for use from other COGs, potentially for things such as I2C or SPI / resistive touchscreen control etc.

    If only 8/16 bit output is used, the LCD port can be optionally shifted up by 8 or 16 bits from its base pin 0, while for full 24 bit RGB operation it will be limited to output on pins 23-0 or 55-32.
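    The data-pin rules above can be sketched as a function that returns the port mask and absolute pin numbers. The parameter names and the 0xFCFCFC example mask for an 18-bit panel (assuming it is wired to the top 6 bits of each component) are assumptions for illustration, not the driver's actual configuration format.

```python
def lcd_data_pins(port, width, shift=0, mask=None):
    """Return (port_mask, absolute_pins) for the LCD pixel-data bits.

    port:  "A" (pins 0-31) or "B" (pins 32-63)
    width: 8, 16 or 24 data bits starting at bit 0 of the port
    shift: 0, 8 or 16; only allowed for 8/16-bit output
    mask:  optional bit mask inside the data width, e.g. 0xFCFCFC keeps the
           top 6 bits of each 8-bit component for an 18-bit panel
    """
    if port not in ("A", "B"):
        raise ValueError("port must be 'A' or 'B'")
    if width not in (8, 16, 24):
        raise ValueError("width must be 8, 16 or 24")
    if shift not in (0, 8, 16) or (width == 24 and shift != 0):
        raise ValueError("24-bit output cannot be shifted; use 0, 8 or 16 otherwise")
    used = (1 << width) - 1
    if mask is not None:
        used &= mask
    port_mask = used << shift
    base = 0 if port == "A" else 32
    pins = [base + i for i in range(32) if (port_mask >> i) & 1]
    return port_mask, pins


mask, pins = lcd_data_pins("A", 24, mask=0xFCFCFC)
print(hex(mask), len(pins))  # 0xfcfcfc 18
```

    Clearing bits in the mask is what would leave those pins free for other COGs (I2C, SPI, touch and so on).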

    Anywhere from 2-8 LCD control pins will be situated immediately above the parallel RGB data pins, in the following order:

    DE - this would be a DAC0 sync channel output, also controlled via the streamer and synchronized with pixel data, with a configurable polarity.
    CLK - can be a Smartpin in a PWM mode with configurable polarity; the CPU clock needs to be at least 2x the pixel clock to generate it. Odd clock ratios would still be supported, but with non-50% duty cycle outputs.
    optional GPIO 0 (or HSYNC output which would be a Smartpin in a PWM mode, adjustable polarity)
    optional GPIO 1 (or VSYNC output which would be a Smartpin in a PWM mode, adjustable polarity)
    optional GPIO 2 output - e.g. LCD backlight power
    optional GPIO 3 output - e.g. LCD backlight on/off / PWM
    optional GPIO 4 output - e.g. LCD main power
    optional GPIO 5 output - e.g. LCD reset
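    Placing the control pins and working out the CLK duty cycle is simple arithmetic. The sketch below follows the pin order listed above and assumes the CLK high time is the rounded-down half of the pixel period; the names are hypothetical.

```python
# Order from the post; which of the six optional pins exist depends on the panel.
CONTROL_ORDER = ["DE", "CLK", "GPIO0/HSYNC", "GPIO1/VSYNC",
                 "GPIO2", "GPIO3", "GPIO4", "GPIO5"]


def control_pins(top_data_pin, count):
    """Assign `count` (2-8) control pins immediately above the highest data pin."""
    if not 2 <= count <= len(CONTROL_ORDER):
        raise ValueError("between 2 and 8 control pins")
    return {name: top_data_pin + 1 + i for i, name in enumerate(CONTROL_ORDER[:count])}


def clk_high_low(sysclks_per_pixel):
    """Split one pixel period (in sysclks) into CLK high and low times.

    At least 2 sysclks are needed; odd ratios can't split evenly, so the
    duty cycle is not 50%.
    """
    if sysclks_per_pixel < 2:
        raise ValueError("CLK needs at least 2 sysclks per pixel")
    high = sysclks_per_pixel // 2
    return high, sysclks_per_pixel - high


print(control_pins(23, 4))  # 24-bit data on pins 0-23
print(clk_high_low(4), clk_high_low(5))  # (2, 2) (2, 3)
```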

    Any of these GPIO control pins can be controlled dynamically through the driver, and their states get updated once per frame at vertical sync time, when the driver reads the frame parameter from hub RAM, e.g. every ~16ms at 60Hz (or possibly even during hsync if more precision is required). These pins can be tri-stated or set logic high/low, and can be used to control LCD power switches, backlights etc. The LCD power sequencing can then be controlled by applications running in other COGs, while the pins remain under the control of the video driver COG. This means all other COGs outside the driver can be dynamically shut down/re-spawned without losing I/O state. I will also try to have the driver's actual video output pins enabled through the same mechanism. This lets outside applications fully control the LCD panel startup/shutdown sequencing. I think I will add a frequency-configurable PWM mode for at least one GPIO that could control an LCD backlight.
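    A per-frame control word like the one described could look like the sketch below. The 2-bit-per-pin encoding (float/low/high) and the function names are assumptions for illustration; the driver's real parameter layout isn't specified here. The split into drive-enable and level bits mirrors how a P2 pin is either driven (DIR set) or left floating.

```python
FLOAT, LOW, HIGH = 0, 1, 2  # hypothetical 2-bit encoding per control pin


def pack_gpio_states(states):
    """Pack up to 16 pin states (2 bits each) into one 32-bit frame-parameter long."""
    if len(states) > 16:
        raise ValueError("at most 16 pins fit in one long")
    word = 0
    for i, state in enumerate(states):
        if state not in (FLOAT, LOW, HIGH):
            raise ValueError(f"bad state {state!r} for GPIO{i}")
        word |= state << (2 * i)
    return word


def apply_at_vsync(word, count):
    """What the driver would derive at vsync: per-pin drive enables and levels."""
    dir_bits = out_bits = 0
    for i in range(count):
        state = (word >> (2 * i)) & 3
        if state != FLOAT:
            dir_bits |= 1 << i      # drive the pin
            if state == HIGH:
                out_bits |= 1 << i  # at logic high
    return dir_bits, out_bits


# Example: GPIO0 high, GPIO1 low, GPIO2 floating, GPIO3 high.
word = pack_gpio_states([HIGH, LOW, FLOAT, HIGH])
print(word, apply_at_vsync(word, 4))  # 134 (11, 9)
```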

    With the dynamic sync code method used in my driver, I have up to 18+7+11 = 36 longs available for Vsync+Hsync code specific to LCD control which I expect should be enough. The LCD sync logic should be a lot simpler than say interlaced PAL, especially if V+H are each handled by a Smartpin in a continuous PWM mode. There really won't be a great deal to do apart from passing through the GPIO control pin states.

  • Some (most?) of those 18-bit LCDs seem to also support a 6-bit mode where the RGB components are sent one after the other (at 3x the usual dot clock). That seems useful for saving pins, but driving it from anything that isn't already formatted as 24-bit RGB seems tricky.
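    Packing 24-bit pixels into that serial mode is mostly bit shuffling. A Python sketch, assuming the panel wants R, G, B on successive dot clocks and takes the top 6 bits of each component (component order and bit alignment vary between panels, so check the datasheet):

```python
def serialize_rgb666(pixels_rgb888):
    """Convert 24-bit 0xRRGGBB pixels to a 6-bit-per-dot-clock stream:
    R, G, then B of each pixel on successive clocks (3 clocks per pixel),
    keeping the top 6 bits of each 8-bit component."""
    stream = []
    for p in pixels_rgb888:
        r, g, b = (p >> 16) & 0xFF, (p >> 8) & 0xFF, p & 0xFF
        stream.extend((r >> 2, g >> 2, b >> 2))
    return stream


print(serialize_rgb666([0xFF8000, 0x0000FF]))  # [63, 32, 0, 0, 0, 63]
```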
  • That's generally referred to as LVDS. It's related to digital system called Camera Link. I had thought DVI was based on it but I guessed wrong on that one.
