Would you use FIFO input/output or faster external memory in Prop2?



Teva McMillan
03-20-2009, 05:46 AM
My concept for having FIFOs included on Prop2 is fairly simple:

2 FIFO units: one that works on port A and one on port B.
A 32-bit config/control register for each FIFO in main memory (it could also be in the cog registers, but this would take up more die space for 32 lines run to each cog).
Each fifo can be configured as input fifo or output fifo.
Each fifo can do a bus matching operation with chunks being either 32 bit or 96 bit. (8,16->32bit or 12,24->96bit)
each fifo can do 32bit, 128bit data transfers between FIFO and main memory.
each fifo holds 512 bits. (16 longs)
Any cog can trigger the transfer, which is then done in the time slot in which that cog could otherwise have done a rdlong or wrlong.
Data does not pass through the cog; the cog just supplies the base address and width (32 or 128 bit) for data to be either pulled out of or pushed into FIFO A or FIFO B.
Each FIFO has a dedicated clock pin, which feeds a dedicated PLL that generates a clock 4 or 6 times faster (not sure about the limits of the silicon here).
The clock from the FIFO PLL is used to time either the capture or the transition of the FIFO data, with the timing point set to one of the 4 or 6 positions.
The clock source should be stable. A more complex design might allow the clock to pause, but I don't think many applications would need this.
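
To make the bus matching in the list above concrete, here is a rough software model of one FIFO configured as an input with an 8- or 16-bit port packed into 32-bit longs. It is only a sketch: the class and method names are made up for illustration, the byte order (first sample in the low bits) is an assumption the post does not specify, and the 12/24 -> 96 bit mode and the clock/PLL side are left out.

```python
from collections import deque

FIFO_BITS = 512   # capacity from the post: 512 bits = 16 longs
LONG_BITS = 32


class BusMatchFifo:
    """Toy model of an input FIFO: narrow port samples in, 32-bit longs out."""

    def __init__(self, port_width):
        if port_width not in (8, 16):
            raise ValueError("this sketch only models the 8/16 -> 32 bit case")
        self.port_width = port_width
        self.longs = deque()   # completed 32-bit longs waiting for a transfer
        self.partial = 0       # long currently being assembled
        self.filled = 0        # number of bits assembled so far

    def clock_in(self, sample):
        """Capture one port sample. Returns False if the FIFO is already full."""
        if len(self.longs) * LONG_BITS >= FIFO_BITS:
            return False
        mask = (1 << self.port_width) - 1
        # Assumption: the first sample lands in the low bits of the long.
        self.partial |= (sample & mask) << self.filled
        self.filled += self.port_width
        if self.filled == LONG_BITS:
            self.longs.append(self.partial)
            self.partial, self.filled = 0, 0
        return True

    def transfer(self, width_bits):
        """Pop a 32- or 128-bit chunk, as a cog-triggered transfer would.

        Returns a list of longs, or None if not enough data is buffered.
        """
        if width_bits not in (32, 128):
            raise ValueError("transfers are 32 or 128 bits wide")
        count = width_bits // LONG_BITS
        if len(self.longs) < count:
            return None
        return [self.longs.popleft() for _ in range(count)]


fifo = BusMatchFifo(port_width=8)
for byte in (0x11, 0x22, 0x33, 0x44):
    fifo.clock_in(byte)
packed = fifo.transfer(32)   # one long built from the four bytes
```

In the real proposal the cog would only supply the base address and width, and the data would go straight to main memory; the list returned by transfer() stands in for that write.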

I came up with these ideas mostly to have a practical frame buffer that would not use up lots of cogs on the Propeller. Frame buffering, say, a 1280x1024 video signal involves three data transfers: data into external memory, data out of external memory, and data out to the VGA port somehow. In this case one FIFO might be used for getting data back out of external memory and into main memory, with the other FIFO driven off an external clock source at the VGA clock speed to move data out during scan lines. The FIFOs would be able to do this using a very small number of cog cycles. Without the FIFOs, 2 cogs would be needed to put data into ext RAM, 2 cogs to get data out of ext RAM, and another 2 to send data out. And it would be hard to get the timing right when sending out the data, except at low res. With the FIFOs, the two cogs putting data into ext RAM could also control the FIFOs and generate the H+V syncs, leaving 6 cogs for other things instead of 2.
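
For a feel of the numbers in the frame buffer example, here is a quick back-of-the-envelope sketch. The refresh rate and color depth are assumptions (the post gives neither), as is the total of 8 cogs, which is only inferred from the "6 cogs instead of 2" figure. The rate is averaged over the whole frame, so the burst rate during active scan lines would be higher.

```python
H_ACTIVE, V_ACTIVE = 1280, 1024   # resolution from the post
REFRESH_HZ = 60                   # assumption: not stated in the post
BITS_PER_PIXEL = 8                # assumption: not stated in the post
TOTAL_COGS = 8                    # assumption, consistent with "6 cogs instead of 2"

# Cog budgets as described in the post.
COGS_WITHOUT_FIFOS = 2 + 2 + 2    # into ext RAM, out of ext RAM, out to VGA
COGS_WITH_FIFOS = 2               # the two cogs that also drive the FIFOs and syncs


def stream_bytes_per_second(bits_per_pixel=BITS_PER_PIXEL, refresh_hz=REFRESH_HZ):
    """Average bytes per second for one pass over the visible frame."""
    return H_ACTIVE * V_ACTIVE * bits_per_pixel * refresh_hz // 8


def free_cogs(cogs_used):
    """Cogs left over for the rest of the application."""
    return TOTAL_COGS - cogs_used


per_stream = stream_bytes_per_second()
all_three_streams = 3 * per_stream   # in to ext RAM, out of ext RAM, out to VGA
```

Under these assumptions each of the three transfers averages about 78.6 MB/s, which is why moving the data without passing it through a cog matters.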

A 128-bit data path that can be used every other clock at most is overkill, but it is needed so that at max transfer speed (32 bits in/out on each port every clock) 4 cogs don't have to worry about running some FIFO triggers and giving up their main memory access slot. Also, it's possible the FIFOs could be running at a higher speed, perhaps even as high as 233 MHz, which could allow external memory devices with a low pin count bus (like 8 bits only) to get rather decent performance. Still, with 2 cogs using most of their slots for FIFO triggers, this would be enough for a 24-bit bus at 210 MHz, which is plenty, as Prop2 could barely do any processing at all on data rates this high.
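
The bandwidth figures above can be checked with a little arithmetic. This sketch assumes each cog gets one main memory slot every 8 clocks and a 160 MHz system clock; neither number is given in the post, so treat both as placeholders to swap for the real Prop2 values.

```python
TRANSFER_BITS = 128            # widest FIFO <-> main memory transfer (from the post)
PORT_BITS_PER_CLOCK = 32       # max rate per port (from the post)
PORTS = 2                      # port A and port B
HUB_SLOT_CLOCKS = 8            # assumption: one memory slot per cog every 8 clocks
SYS_CLOCK_HZ = 160_000_000     # assumption: placeholder system clock


def trigger_bits_per_clock(cogs):
    """Bits per system clock moved when `cogs` cogs spend every slot on triggers."""
    return cogs * TRANSFER_BITS // HUB_SLOT_CLOCKS


def slot_fraction_needed(bus_bits, bus_hz, cogs):
    """Fraction of the triggering cogs' slots needed to keep up with a bus."""
    needed = bus_bits * bus_hz                              # bits per second on the bus
    available = trigger_bits_per_clock(cogs) * SYS_CLOCK_HZ # bits per second via triggers
    return needed / available


full_rate = PORTS * PORT_BITS_PER_CLOCK            # both ports flat out
four_cogs = trigger_bits_per_clock(4)              # what 4 triggering cogs can move
two_cog_load = slot_fraction_needed(24, 210_000_000, cogs=2)
```

With these assumptions, 4 cogs triggering 128-bit transfers in every slot move 64 bits per clock, exactly matching both ports at 32 bits per clock, and a 24-bit bus at 210 MHz needs a bit over 98% of the slots of 2 cogs, which fits the "using most of their slots" estimate.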

I have talked to Chip about this idea a few times, and he mentioned it in a couple of posts here. He was going to integrate it into the video stuff he is doing, but then didn't like the idea after some of the video bits changed.

I think this is the fastest and simplest method of having very fast external memory access without building something for a specific memory device. With this feature in Prop2, I would expect to see SDRAM chips on some of the premade boards, and some Propeller objects that allow the use of 32MB SDRAMs released when the chip launches.

Is this something your application for prop2 would benefit from?
Is it something you would never use?
How many months would you not mind waiting extra to have this feature be in prop2?

I would love to hear what uses other people might find for the FIFOs, or maybe some other ways to implement them. I have tried to come up with the simplest, lowest die area, most deterministic way of implementing a FIFO feature that I could.

-Teva