MATH: Discrete Cosine Transform for JPEG in pasm

tonyp12 · 2010-08-15 14:51

Using just integer math,
the hardest part in JPEG encoding/decoding seems to be the DCT part.

PDF file explaining how the DCT is used in JPEG:
http://www.naic.edu/~phil/hardware/nvidia/doc/src/dct8x8/doc/dct8x8.pdf

This 16-bit integer CUDA source code should help in creating a PASM version:
http://code.google.com/p/gpuocelot/source/browse/trunk/tests/cuda2.2/tests/dct8x8/dct8x8_kernel_short.cu?r=412

Is any math person up to the challenge of creating a 2D DCT for the Prop?

Taking pictures with a CMOS cam and saving them to an SD card as JPGs using a Prop!
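As a possible starting point, here is a minimal sketch of an integer-only 8x8 forward DCT, written in Python for readability rather than PASM. It is a plain separable matrix multiply (rows, then columns) with a cosine table scaled by 2^13, not the butterfly scheme used in the CUDA file. The names SCALE_BITS, COS_TABLE, dct_1d and dct_2d are made up for this example, and the table is built once with floating point here, standing in for constants that would be precomputed and stored on the Prop.

```python
import math

SCALE_BITS = 13  # assumed fixed-point precision, not taken from the CUDA source

# COS_TABLE[u][x] = c(u) * cos((2x+1) * u * pi / 16), scaled by 2^SCALE_BITS,
# with c(0) = sqrt(1/8) and c(u) = 1/2 for u > 0 (the JPEG normalization).
COS_TABLE = [
    [round((math.sqrt(0.125) if u == 0 else 0.5)
           * math.cos((2 * x + 1) * u * math.pi / 16) * (1 << SCALE_BITS))
     for x in range(8)]
    for u in range(8)
]

def dct_1d(samples):
    """8-point DCT using only integer multiply, add and shift."""
    out = []
    for u in range(8):
        acc = 0
        for x in range(8):
            acc += COS_TABLE[u][x] * samples[x]
        # Round to nearest, then drop the fixed-point fraction.
        out.append((acc + (1 << (SCALE_BITS - 1))) >> SCALE_BITS)
    return out

def dct_2d(block):
    """block[y][x] -> coeffs[v][u]; rows first, then columns."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d([rows[y][u] for y in range(8)]) for u in range(8)]
    return [[cols[u][v] for u in range(8)] for v in range(8)]
```

A flat block of 100s gives a DC coefficient of 800 (8 times the average) and zero for every AC coefficient. The accumulators need more than 16 bits, so a PASM port would keep them in 32-bit registers.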

stevenmess2004 · 2010-08-16 01:29

That file has about 730 lines in it. It would take a good coder to fit that all in one cog. It should be doable in either C or LMM, though. Also, don't forget that there is a lot of other code needed to get that library to work as well.

Would be happy to see someone do it though.

Baggers · 2010-08-16 03:38

The prop can already do jpeg decoding, I've done it before, a few years back, when I initially did the full fat PropGFX, I did a JPG viewer, that loaded any JPG, any size, and rescaled it to fit on the display 128x96 or 160x128 depending on what mode I set the PropGFX to.

Yes, it was slow, and written in Spin, but I don't see why it couldn't be converted to PASM / lmm without too much effort.

I've not done an encoder yet, though, as I never thought it would be of use on a prop, when we have PC's that can convert BMPs TGAs PNGs etc to JPG's a lot faster.

Rayman · 2010-08-16 06:02

Baggers wrote: »

I did a JPG viewer, that loaded any JPG, any size, and rescaled it to fit on the display 128x96 or 160x128

That's a neat trick Baggers. It would be cool to use that with my serial camera. I've been using uncompressed mode, but JPG mode would be a lot faster. There is a 160x128 preview mode...

Did you ever post that code?

Baggers · 2010-08-16 06:07

Hi Rayman, I never got to posting the code, as it was for a dedicated board.
But, I could do, I'll dig it out, and post.

davehein1 · 2010-08-16 06:42

I looked into writing a JPEG decoder for the Prop a few months ago, but I didn't get very far. The biggest problem with implementing this on the Prop is the memory required. 32Kbytes of RAM can only hold the equivalent of a 200x160 monochrome image, and that's without any room for program memory. This could be expanded with external memory, but how would you display it at a reasonable resolution? It seems like the Prop II with external DDR memory will allow this to happen.

A simple low-res JPEG decoder could be implemented for the Prop I just by decoding and displaying the DC component of the image. The JPEG algorithm works on small 8x8 blocks of data at a time. The DC component is just the average value of an 8x8 block (scaled by a constant). A 1280x1024 image consists of an array of 160x128 blocks. A 160x128 version of the image could be displayed just by decoding the DC coefficients. This would not require implementing the DCT.
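To illustrate the DC-only idea, here is a rough Python sketch. In a real decoder the DC values would come out of the Huffman-decoded, dequantized stream; block_mean_thumbnail just computes the same block averages from raw pixels to show what the thumbnail contains, and dc_to_pixel assumes the baseline JPEG conventions (samples level-shifted by 128, DC equal to 8 times the block mean). Both function names are made up for this example.

```python
def block_mean_thumbnail(pixels, width, height):
    """pixels: row-major list of 8-bit samples; width and height multiples of 8.
    Returns one averaged sample per 8x8 block, as a list of rows."""
    thumb = []
    for by in range(0, height, 8):
        row = []
        for bx in range(0, width, 8):
            total = 0
            for y in range(by, by + 8):
                for x in range(bx, bx + 8):
                    total += pixels[y * width + x]
            row.append(total // 64)  # average of the 64 samples
        thumb.append(row)
    return thumb

def dc_to_pixel(dc):
    """Dequantized DC coefficient -> 8-bit thumbnail sample.
    Assumes DC = 8 * (block mean - 128), as in baseline JPEG."""
    return max(0, min(255, dc // 8 + 128))
```

With this, a 1280x1024 image needs only 160x128 = 20480 output samples per component, which fits in Hub RAM.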

Rayman · 2010-08-16 06:46

I'm trying to remember... Doesn't jpg work in 16x16 blocks? If so, couldn't you use an SD card to decompress any size image to it?

Just noticed davehein1 said 8x8 blocks... even better...

Bill Henning · 2010-08-16 07:35

Morpheus has a published 256x192 8-bit color per pixel (3R 3G 2B) driver, XGA timing

There is also an unpublished 320x240 8-bit color per pixel driver with sprites, VGA timing

I also have an unfinished 400x300 8-bit color driver, SVGA timing

Using Floyd-Steinberg dithering to get more "effective" colors, JPEGs should look decent
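For reference, a minimal integer-only sketch of Floyd-Steinberg dithering in Python, applied to one 8-bit channel at a time (3 bits for red and green, 2 bits for blue). The function names are made up, and the RRRGGGBB byte layout in pack_332 is an assumption, not taken from the Morpheus drivers.

```python
def quantize(value, bits):
    """Snap an 8-bit value to the nearest of 2^bits evenly spaced levels."""
    levels = (1 << bits) - 1
    v = max(0, min(255, value))
    index = (v * levels + 127) // 255
    return index * 255 // levels

def floyd_steinberg(channel, width, height, bits):
    """Dither a row-major 8-bit channel; results stay on the 0..255 scale."""
    buf = list(channel)
    for y in range(height):
        for x in range(width):
            i = y * width + x
            old = buf[i]
            new = quantize(old, bits)
            buf[i] = new
            err = old - new
            # Spread the error to unprocessed neighbours: 7/16, 3/16, 5/16, 1/16.
            if x + 1 < width:
                buf[i + 1] += err * 7 // 16
            if y + 1 < height:
                if x > 0:
                    buf[i + width - 1] += err * 3 // 16
                buf[i + width] += err * 5 // 16
                if x + 1 < width:
                    buf[i + width + 1] += err // 16
    return buf

def pack_332(r, g, b):
    """Pack dithered 8-bit r, g, b into one byte (assumed RRRGGGBB layout)."""
    return (r & 0xE0) | ((g & 0xE0) >> 3) | (b >> 6)
```

Only one extra row of error terms has to be kept around at a time, so the working memory is small compared with the frame buffer.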

davehein1 wrote: »

I looked into writing a JPEG decoder for the Prop a few months ago, but I didn't get very far. The biggest problem with implementing this on the Prop is the memory required. 32Kbytes of RAM can only hold the equivalent of a 200x160 monochrome image, and that's without any room for program memory. This could be expanded with external memory, but how would you display it at a reasonable resolution? It seems like the Prop II with external DDR memory will allow this to happen.

A simple low-res JPEG decoder could be implemented for the Prop I just by decoding and displaying the DC component of the image. The JPEG algorithm works on small 8x8 blocks of data at a time. The DC component is just the average value of an 8x8 block. A 1280x1024 image consists of an array of 160x128 blocks. A 160x128 version of the image could be displayed just by decoding the DC coefficients. This would not require implementing the DCT.

Baggers · 2010-08-16 08:06

It could handle a JPG of ANY size, as it read the file directly from SD, removing the need for a huge Hub RAM buffer.

It can be done using a single prop, including having the display driver, albeit low res.

tonyp12 · 2010-08-16 08:11

Even with the limited RAM in a Prop, it should be workable: JPEG uses 8 by 8 blocks, so you just need to buffer 2 or 3 of those blocks to calculate DC deltas etc.
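The DC delta step can be sketched in a few lines of Python. JPEG codes each block's DC coefficient as the difference from the previous block's DC in the same component, with the predictor starting at 0, so only one running value per component has to be remembered. The function names are made up for this example.

```python
def dc_deltas(dc_values):
    """Encoder side: turn per-block DC coefficients into differences."""
    prev = 0  # predictor starts at 0 at the beginning of a scan
    deltas = []
    for dc in dc_values:
        deltas.append(dc - prev)
        prev = dc
    return deltas

def dc_from_deltas(deltas):
    """Decoder side: rebuild the DC coefficients from the differences."""
    prev = 0
    values = []
    for d in deltas:
        prev += d
        values.append(prev)
    return values
```

Neighbouring blocks usually have similar averages, so the differences are small numbers that the entropy coder can store in few bits.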

Jpeg goes through a couple of stages to get such good compression.

http://www.impulseadventure.com/photo/jpeg-compression.html

davehein1 · 2010-08-16 09:08

@Bill, sprites cannot be used to display large images. Error diffusion is effective for displaying on 8-bit color displays, but the results aren't always very good. In my view, the image display should be at least 640x480 and 24 bits.

@Baggers, I'm looking forward to seeing your code. What algorithm did you use for the DCT? Was it a matrix multiply or a fast butterfly algorithm?

@Tony, the JPEG encoder could get by with as little as a single 8x8 block of memory. This would require random access of the image file to extract each 8x8 block. It would be better to have 3 8-line buffers, one for each color component. For an image 1024 pixels across, that would be 24 Kbytes of line buffers. This assumes you use 4:4:4 or 4:2:2 sampling.

4:4:4 means that the chroma components are at the same resolution as luma. 4:2:2 would use half the resolution horizontally for chroma. Another common mode is 4:2:0, which uses half the resolution for chroma in both the vertical and horizontal directions. This would require a 16-line buffer for luma.
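The buffer arithmetic can be checked with a small Python sketch. It assumes one byte per sample, that chroma is stored already subsampled, and that chroma always needs 8 lines; line_buffer_bytes and its mode table are made up for this example.

```python
def line_buffer_bytes(width, sampling):
    """Bytes of line buffer needed to feed complete 8x8 blocks to the encoder."""
    # sampling mode -> (luma lines to buffer, horizontal chroma divisor)
    modes = {"4:4:4": (8, 1), "4:2:2": (8, 2), "4:2:0": (16, 2)}
    luma_lines, chroma_div = modes[sampling]
    luma = width * luma_lines
    chroma = 2 * (width // chroma_div) * 8  # two chroma components, 8 lines each
    return luma + chroma
```

For a 1024-pixel-wide image this gives 24576 bytes (24 Kbytes) for 4:4:4. Under the same assumptions, 4:2:2 comes to 16384 bytes if the chroma is subsampled before buffering, and 4:2:0 is back to 24576 bytes because of the 16 luma lines.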

mpark · 2010-08-16 09:10

davehein1 wrote: »

32Kbytes of RAM can only hold the equivalent of a 200x160 monochrome image...

Why monochrome? The attached image is 192x160 standard Propeller colors.

davehein1 · 2010-08-16 09:15

mpark wrote: »

Why monochrome? The attached image is 192x160 standard Propeller colors.

The monochrome example was just for a point of reference. And I think the picture answers your own question. The dithering error is quite obvious in that picture.

Bill Henning · 2010-08-16 09:25

Dave,

I only mentioned sprites as that driver has them - I had no intention of using them for large images

Unfortunately the Prop is not up to 640x480x24bpp without a lot of extra hardware

davehein1 wrote: »

@Bill, sprites cannot be used to display large images. Error diffusion is effective for displaying on 8-bit color displays, but the results aren't always very good. In my view, the image display should be at least 640x480 and 24 bits.

davehein1 · 2010-08-16 09:50

Prop I would be a great graphics display device if it could only support more display memory. I have high hopes for Prop II.

Bill Henning · 2010-08-16 10:07

Agreed about Prop II :-)

About Prop 1 - the real problem is not enough bandwidth & pins to refresh a 640x480x24bpp display; adding enough memory is not a problem.

Prop 2 will have more than enough bandwidth & pins, and it will be possible to use nice cheap SDRAM's with it.

Can't wait

davehein1 wrote: »

Prop I would be a great graphics display device if it could only support more display memory. I have high hopes for Prop II.

mpark · 2010-08-16 10:12

davehein1 wrote: »

The monochrome example was just for a point of reference. And I think the picture answers your own question. The dithering error is quite obvious in that picture.

Ouch. I thought it looked OK, considering.

davehein1 · 2010-08-16 10:39

Sorry Michael. I didn't mean to be so blunt. The picture does look good considering that it is a dithered image. The last time I worked with dithering was about 14 years ago when I developed a video decoder that ran on a PC. PC displays still used 8 and 16 bit modes back then. 24 bit displays were also common, but my program had to work with the lower color resolutions as well. I was never very happy with the quality of 8-bit dithered images.

mpark · 2010-08-16 10:56

No worries, Dave. I can see the flaws in the picture, but I'm just an 8-bit guy at heart.

Graham Stabler · 2010-08-16 15:54

This looks interesting and would be a nice addition to go with reading analogue cameras as per Hanno. Doesn't look too terrible, save raw data to SD (in a convenient structure) then process block by block back to the SD. Lots of cos and some multiplication. Cos is easy and fast with cordic so that is one option.
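As a rough illustration of the CORDIC option, here is a fixed-point sketch in Python that uses only shifts and adds inside the loop. FRAC_BITS, ITERATIONS and the function name are assumptions for this example, and the tables are built with floating point here in place of stored constants. Note that an 8-point DCT only ever needs a handful of fixed cosine values, so a small lookup table is another option.

```python
import math

FRAC_BITS = 16   # assumed fixed-point format: value * 2^16
ITERATIONS = 16  # assumed iteration count

# atan(2^-i) for each iteration, in fixed-point radians.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * (1 << FRAC_BITS))
              for i in range(ITERATIONS)]

# Pre-divide by the CORDIC gain so the result comes out with unit length.
_gain = 1.0
for _i in range(ITERATIONS):
    _gain /= math.sqrt(1.0 + 2.0 ** (-2 * _i))
K = round(_gain * (1 << FRAC_BITS))

def cordic_cos_sin(angle):
    """angle: fixed-point radians in about [-pi/2, pi/2].
    Returns (cos, sin) in the same fixed-point format."""
    x, y, z = K, 0, angle
    for i in range(ITERATIONS):
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return x, y
```

Each iteration adds roughly one bit of accuracy, so the iteration count trades speed against precision.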

I may at some point be able to generate an excuse to do this at work, but at the moment there is no chance.

Graham
