SDRAM Driver

jazzed · 2013-04-15 16:17

cgracey wrote: »

I'm actually not sure about getting rid of the UDQM/LDQM pins.

What about the CKE pin? It's used to skip clocks and put the SDRAM into power-down/self-refresh mode. It allows the Prop2 Module to go into a ~4mA mode while preserving SDRAM contents. That's important, I think. What do you guys think? Could we do without it? The SDRAM data sheet says that it can be tied high if you don't want the functionality.

Pull it up. I don't use CKE in my driver, but it can be good for power saving, especially if the SDRAM is not used at all on a proto-board. There is a special sequence required to enter and leave self-refresh, so CKE would need to be under program control for that.

I suggested individual resistors on DQM to make it easy to solder, but an unmasked via is just as easy to solder with (32 gauge?) wire. A small SMD resistor array is fine. Hard-wiring things to VDD/GND is not my favorite practice.

I presume that we can pull up the RAS/CAS/WE/CS lines with the P2 pin settings, otherwise they need to be pulled up too.

BTW, IIRC, PRE-CHARGE is not necessary for small bursts.

Roy Eltham · 2013-04-15 16:48

Bill,
You may not be aware, but I have been doing graphics engine programming for a couple decades now in the games industry (with a decade more before that doing largely graphics related coding as my hobby on the old Commodore VIC-20/64 and Amiga as well as Apple II computers). I've written a few software rasterizers, and done line drawing to funky non-linear graphics modes. I believe I am fully aware of the issues involved. I'm saying this just so you understand where I am coming from, not intending to boast, so I'm sorry if it comes off that way.

I understand the implications of dealing with the non-linear space, and acknowledge that it will take more code. I think that it's worth the effort. I will provide any code I write towards this end freely. I will acknowledge that for a small number of lines/circles it'll likely be faster to do it via byte-level writes, but for an overall graphics display a tile- or chunk-based approach will quickly outdo it. There are probably also some other situations where it's not ideal, but that's the beauty of the Propeller architecture. You can pick your driver to suit your needs.

Also, my intention is that the backbuffer would still be linear, as needed by the waitvid stuff. You just operate on it in chunks.

potatohead · 2013-04-15 17:05

On the crazy Apple 2, what people did was abstract the screen and store a few more shifted shapes. At first these were sloooow, but some clever lookup tables later, the Apple actually pushed way more pixels than people expected. Getting there took some work, but that computer demonstrated the advantages of software in more than a few cases.

Sapieha · 2013-04-15 17:20

Hi Chip.

I won't say whether it should be done or not.

But I ask: will this board be for the few people who won't want that, or for all the people who will test all of the Prop 2's possibilities?

cgracey wrote: »

We've got that board really tightly laid out. I don't know if there's room for a few 0603's. Then, we'd need to take those pins down to the connector, unless someone wanted to solder wires, as you said. That ~6-clock penalty for software masking vs UDQM/LDQM controlling only represents about a 10% penalty for a byte write, by the time you get the code wrapped in the other instructions that make it all play. My gut feeling is that it's not worth all the extra consideration.

Bill Henning · 2013-04-15 17:42

Roy,

No, I was not aware that you did graphics programming in the game industry.

Your comments read as if you wanted a non-linear bitmap; based on your 32x32 pixel posting, a 640x480 bitmap would consist of 20x15 tiles, and the addressing for line zero would be something like

- bytes 0..31 from tile 0
- bytes 0..31 from tile 1
- ...
- bytes 0..31 from tile 19

line 1 would be

- bytes 32..63 from tile 0
- ...
- bytes 32..63 from tile 19

line 32 would be

- bytes 0..31 from tile 20
- bytes 0..31 from tile 21
- ...
- bytes 0..31 from tile 39

Which would require 20 waitvids per scan line, and twenty 32-byte reads from SDRAM, instead of reading 640 bytes from the SDRAM in one burst.
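The tiled addressing described above can be sketched as follows. This is only an illustration, assuming 8 bits per pixel, 32x32 tiles stored one after another in row-major order, and a 640x480 screen; the function names `tiled_offset` and `scanline_reads` are hypothetical and are not taken from any driver.

```python
TILE_W = 32                           # tile width in pixels
TILE_H = 32                           # tile height in pixels
SCREEN_W = 640                        # screen width in pixels
TILES_PER_ROW = SCREEN_W // TILE_W    # 20 tiles across one scan line


def tiled_offset(x, y):
    """Return (tile_index, byte_in_tile) for pixel (x, y) at 8bpp."""
    tile = (y // TILE_H) * TILES_PER_ROW + (x // TILE_W)
    byte_in_tile = (y % TILE_H) * TILE_W + (x % TILE_W)
    return tile, byte_in_tile


def scanline_reads(y):
    """List the (tile, first_byte, last_byte) reads needed for scan line y."""
    first_tile = (y // TILE_H) * TILES_PER_ROW
    start = (y % TILE_H) * TILE_W
    return [(first_tile + t, start, start + TILE_W - 1)
            for t in range(TILES_PER_ROW)]
```

Under these assumptions `scanline_reads(y)` always returns twenty separate 32-byte reads, which is the cost being compared against a single 640-byte burst from a linear bitmap.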

Unless you used twenty 32x32 buffer tiles in the hub (20 * 1k tile, 20k), which is a big chunk of the available hub.

Even if you do that, and maintain a dirty bit per tile, if only a few pixels are plotted in a tile, you lose big time.
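A rough way to see the dirty-tile cost is to count how many bytes get flushed when whole tiles are written back. The sketch below assumes 1 KB tiles (32x32 at 8bpp), 20 tiles per row, and a policy of flushing every dirty tile in full; `dirty_tiles` and `writeback_bytes` are hypothetical names used only for this illustration.

```python
TILE_W = 32
TILE_H = 32
TILES_PER_ROW = 20
TILE_BYTES = TILE_W * TILE_H   # 1024 bytes per tile at 8bpp


def dirty_tiles(points):
    """Return the set of tile indices touched by the plotted (x, y) points."""
    return {(y // TILE_H) * TILES_PER_ROW + (x // TILE_W) for x, y in points}


def writeback_bytes(points):
    """Bytes written to SDRAM if every dirty tile is flushed whole."""
    return len(dirty_tiles(points)) * TILE_BYTES
```

With this policy a single plotted pixel still costs a full 1024-byte tile write, and three pixels that land in three different tiles cost 3072 bytes.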

Also, every single point plot will be computationally significantly more expensive.

Chip's suggestion of read/modify/write word would be faster, and byte write capability faster yet.
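For comparison, a software read/modify/write of one byte inside a 16-bit SDRAM word looks roughly like this. It is a sketch only: the SDRAM is modelled as a Python list of 16-bit words, even byte addresses are assumed to map to the low byte of a word, and `rmw_byte` and `write_byte` are hypothetical helper names. With UDQM/LDQM under program control the SDRAM does this masking itself and the read step disappears.

```python
def rmw_byte(word, byte_value, high):
    """Merge byte_value into a 16-bit word; 'high' selects the upper byte."""
    if high:
        return (word & 0x00FF) | ((byte_value & 0xFF) << 8)
    return (word & 0xFF00) | (byte_value & 0xFF)


def write_byte(mem, addr, value):
    """Write one byte at byte address 'addr' using read/modify/write.

    mem is a list of 16-bit words standing in for the SDRAM
    (assumption: even addresses are the low byte of each word).
    """
    index = addr >> 1                 # word that contains the byte
    old = mem[index]                  # read the whole word
    mem[index] = rmw_byte(old, value, addr & 1)   # modify, then write back
```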

The bottom line is that it is Chip's choice how he implements it on the Parallax board; I was just pushing for the maximum flexibility / performance, as I want the P2 to succeed like crazy.

Chip could put a pullup on CKE, and pulldowns on the DQM's, and have a jumper block to attach them to P2 pins - this way people could disable CKE/DQM's by removing the jumpers.

cgracey · 2013-04-15 17:49

Thanks for all of your input, Everyone. I've decided to leave CKE, UDQM, and LDQM connected to I/O pins. This way, there is maximum flexibility.

Have any of you used the SDRAM driver yet? Have any of you made your own? The DE0-Nano could show big graphics with one.

Bill Henning · 2013-04-15 18:09

Thanks Chip, that is great news.

I got sidetracked with some other work, but I intend to try your SDRAM driver cog tomorrow.

I also intend to start working with the SPI Flash this week.

Sapieha · 2013-04-15 18:18

Hi Chip.

Thanks for that decision.
I am planning to use SDRAM in my BASIC.

But only if I can address it in 8-bit units.

potatohead · 2013-04-15 18:20

Similar story here. I've not written any code against it, other than tinkering with the driver you did Chip. Got a short business trip to take, and I'm a little hesitant to take the DE2... It's the weekend for me.

Roy Eltham · 2013-04-15 19:02

I believe there is a fundamental difference in thinking going on here. I'm coming from a place where we draw the entire back buffer every frame. I don't read the screen buffer, modify it, then write that back out. I draw everything every frame. So the rendering cog(s) are filling in memory to be shipped out to the SDRAM. The display cog is reading chunks of SDRAM to feed waitvid. I think there will always be some reasonably sized buffer in HUB memory for the display cog.

My understanding on the waitvid setup would be that it's a waitvid per four 8 bit pixels (long) because I'd have the CLUT filled with colors for the 8 bit pixels to look up, so I'd be feeding waitvid in a loop that would read from HUB for the pixels. If you use one of the streamed modes (of which the 8 bit ones are monochrome with 256 shades, the 3:3:2 one, or the other odd RGBI8 one), then you would read from hub into the stack memory similar to Chip's 24 bit color demo. So, in any case, the hub memory layout of the pixels is less important. So, I may have to go back on my earlier statement about the back buffer being linear, because I'm thinking that might not be optimal.
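The "four 8-bit pixels per long" arrangement can be illustrated with a small packing helper. This is a sketch under one stated assumption: the first pixel goes in the least significant byte of the long. The real waitvid pixel order is not given in this thread, and `pack_pixels` is a hypothetical name.

```python
def pack_pixels(pixels):
    """Pack 8-bit pixels into 32-bit longs, four pixels per long.

    Assumption: the first pixel of each group is the low byte of the long.
    """
    if len(pixels) % 4 != 0:
        raise ValueError("pixel count must be a multiple of 4")
    longs = []
    for i in range(0, len(pixels), 4):
        p0, p1, p2, p3 = (p & 0xFF for p in pixels[i:i + 4])
        longs.append(p0 | (p1 << 8) | (p2 << 16) | (p3 << 24))
    return longs
```

A 640-pixel scan line packs into 160 longs this way, so a loop that issues one waitvid per long runs 160 times per line.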

Anyway, I see that while I was thinking about this and typing that, Chip has decided to keep the pins there, so byte access is easy.

I'm still going to write a driver that does something like what I described, because I think it will perform the best for the kinds of graphics I want to do. When I get that done, we'll see if it's fast or not and if I am wrong or not.

Bill Henning · 2013-04-15 19:26

Roy,

You are right - we were looking at it from different directions, and now our discussion makes sense to me.

From the point of view of regenerating the whole frame, every frame (like in FPS games), byte writes are redundant.

From the point of view of CAD/CAM, paint, wireframe, and editor applications (with proportional fonts, etc.), it does not make much sense to regenerate every frame on the fly - so byte writes are useful for 8bpp displays.

Basically, both our points of view are valid, and the appropriate view depends on the application.

I am looking forward to seeing your demos, and hopefully some games...

I feed WAITVID with as large CLUT buffers as I can for high resolution bitmap graphics, and plan to use as few transactions (with as large SDRAM bursts as possible) as I can for displaying the screen. Unfortunately a lot (most?) of my use cases are heavily dependent on the speed of plotting pixels.

Sapieha · 2013-04-15 19:35

Hi Roy.

Sorry if I interrupt the discussion, but as I said to Chip:

That PCB needs to be made so that all people can play with it.

Not only for those who say "I can do this, so there is no need for other possibilities"!

It needs to present all of the Prop2's possibilities so that it reaches a big customer CIRCLE, not only specialized ones.

jazzed · 2013-04-15 20:07

Sapieha wrote: »

It needs to present all of the Prop2's possibilities so that it reaches a big customer CIRCLE, not only specialized ones.

I totally agree.

A method of reading or writing just one byte in SDRAM will be horribly inefficient in many cases, but not all. As long as that method doesn't interfere with block transfers, it shouldn't matter. Since P2 has programmable I/O pull-ups and pull-downs, either method should work by just connecting all pins and programming accordingly.

Cluso99 · 2013-04-15 20:36

I also agree!

Maximum flexibility so everyone can experiment and try out things with the P2. Use the board to achieve some real uses. The wider the base, the better for Parallax, and that translates as better for us.

A number of us will do our own specialised boards. But the Parallax board(s) have to demonstrate maximum flexibility to show off what the P2 can really do.

jazzed · 2013-04-16 09:32

Cluso99 wrote: »

... But the Parallax board(s) have to demonstrate maximum flexibility to show off what the P2 can really do.

There are limits of course.

Parallax should never violate standard hardware design practice for the sake of flexibility (again).

cgracey · 2013-04-16 09:39

jazzed wrote: »

There are limits of course.

Parallax should never violate standard hardware design practice for the sake of flexibility (again).

What do you mean?

pedward · 2013-04-16 09:45

Is he talking about push-pull on the I2C EEPROM in the P1?

jazzed · 2013-04-16 10:59

Quickstart Propeller has several connections to external device inputs. They are not driven at startup and cause all kinds of ill effects. The LEDs can flash just by picking up the board for example if the pins are not set to output. The main problem however is on P30 to the buffer (there is nothing driving the node and it becomes an oscillator essentially). This has caused no end of grief, and most recently has caused UTF8 decoding failures because of all the garbage being created when the pin is not driven.

cgracey · 2013-04-16 12:52

jazzed wrote: »

Quickstart Propeller has several connections to external device inputs. They are not driven at startup and cause all kinds of ill effects. The LEDs can flash just by picking up the board for example if the pins are not set to output. The main problem however is on P30 to the buffer (there is nothing driving the node and it becomes an oscillator essentially). This has caused no end of grief, and most recently has caused UTF8 decoding failures because of all the garbage being created when the pin is not driven.

So, it sounds like a pull-up resistor on P30 would have been appropriate.

pedward · 2013-04-16 13:27

The irony is that there is an extra buffer chip in there to isolate the P30/P31 from the FTDI to avoid parasitic charging of the FTDI when not bus powered.

evanh · 2013-04-17 13:57

potatohead wrote: »

Indeed we are talking past one another.

I went to some trouble to precisely address what you had said.

potatohead · 2013-04-17 17:33

Well then I don't think we entirely agree. I'm just not a fan of motion on interlaced displays that exceeds the "display all lines" rate. I think they are great in many other contexts though, particularly when render time is at a premium. Slower sweeps maximize that time, and really the appeal of the 24p is to understand what the various displays really will accept. So yeah, I'm gonna hack on it. That's where the fun is.

Oh, and please don't take "talking past one another" as a bad thing, or some fault, because it's not. Sometimes text is difficult, that's all. No worries here, and I enjoyed the conversation. It's good to understand where others are coming from on these things.

evanh · 2013-04-18 04:37

It doesn't matter what the display technology is; interlacing only comes into effect if the effective framerate is the full rate of the interlaced fields, otherwise it's a progressive transmission. So, you are saying that interlacing is only any good if it is avoided altogether.

That is clearly a silly statement given how well interlacing works in TV transmissions. Most modern TV programmes are now running at the half framerate that comes with progressive encoding, but not all. There are a few that use the superior full rate of interlace encoding.

As for why any TV programmes would use progressive, that's a matter of speculation. Possibly it's a "want to imitate movies" type thing. The recent craziness over the Hobbit's 48Hz making the film critics say things like "it made me feel sick" almost seems to lend weight to the argument that there is an intent to lower the framerate.

I do know the transition from majority interlaced to majority progressive roughly coincided with the rapid sales of DVDs as a personal purchase as opposed to overnight hiring.

potatohead · 2013-04-18 09:10

Yes. You got it exactly regarding computer graphics, and motion is why. For other kinds of things, say pictures, interlace works great!

Here is a great A/B test data point for you: CAD / technical computing users. Put an interlaced display out there next to a progressive one, have them run both, and see which one they buy. Progressive will see the majority of purchases, with viewer fatigue and motion artifacts being why. Assume 70Hz for the test. (I did this in the 90's)

Dave Hein · 2013-04-18 11:28

Field interlacing works fine if you're displaying at the native resolution of the monitor, but it's a mess if the monitor has to scale the video before displaying. The monitor has to scale the image using lines in the same field, or it has to de-interlace the video to produce a progressive image, and then scale it. Either solution has its drawbacks.

potatohead · 2013-04-18 11:34

Well, I was in a hurry. I am not saying do not use interlace. I like interlace, but I don't like motions that exceed half the rate. Where crisp objects are desired, and there is motion that would exceed half the interlace rate, I find it undesirable.

TV broadcasts work for a lot of reasons. They aren't the same thing. For all but serious effects being in a production, I like an interlaced display.

Heater. · 2013-04-18 12:15

potatohead,

I don't like motions that exceed half the rate

I have been reading your explanations with interest but I don't really understand what you mean by a phrase like that.
The strobing of wagon wheels as seen on old western movies is quite understandable.
But for a linear motion of a car, say, how far is it allowed to move across the screen from frame to frame before it causes a problem?

potatohead · 2013-04-18 12:57

I should have done this earlier:

http://www.cardinalpeak.com/blog/?p=834

There are more considerations than he's got in the blog post, but the core idea of motion artifacts is there and represented nicely.

evanh · 2013-04-18 16:41

potatohead wrote: »

http://www.cardinalpeak.com/blog/?p=834

That is a single image taken from an attempt to deinterlace. It's the worst form of deinterlacing. That does not show interlacing for real.

It is easy to describe interlaced encoding but it's hard to show interlacing for real because it relies on motion and it relies on phosphor length and persistence of vision. In other words it can only be seen for real when actually used.

Deinterlacing can also be done better than a progressive can be interpolated.

Regarding the comparison of a 70 Hz interlaced CRT monitor vs a 70 Hz "non-interlaced" sequential CRT monitor, you are again failing to compare apples with apples, on two counts:

- You are confusing display refresh rate with animation framerate.
- A 70 Hz sequential display is operating at twice the bandwidth of a 70 Hz interlaced display.
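The bandwidth point can be checked with simple arithmetic. The sketch below assumes a 480-line picture purely for illustration (no line count is given in the thread) and treats "70 Hz" as the vertical sweep rate in both cases; `lines_per_second` is a hypothetical helper.

```python
def lines_per_second(total_lines, sweep_hz, interlaced):
    """Scan lines transmitted per second for a given vertical sweep rate.

    An interlaced sweep carries one field (half the lines);
    a sequential sweep carries every line.
    """
    lines_per_sweep = total_lines // 2 if interlaced else total_lines
    return lines_per_sweep * sweep_hz


sequential = lines_per_second(480, 70, interlaced=False)   # 33600
interlace = lines_per_second(480, 70, interlaced=True)     # 16800
```

At the same 70 Hz sweep rate the sequential display carries twice as many lines per second, so the two monitors are not running at comparable bandwidth.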

potatohead · 2013-04-18 17:46

Let's take that ugly interlace discussion here: http://forums.parallax.com/showthread.php/147466-Interlace...-so-you-were-saying
