Timers/video generators and external memory
stevenmess2004
Posts: 1,102
How about another approach to external memory?
If the timers are used, then you could use just two pins to output the address into a shift register. Then just use 8 pins for reading the data and a couple of pins for control. That should be a total of 12 pins needed for any size of address space. For a 24-bit address it will take 12 clock cycles to output a clock and the address (I think you can get the clock from the timer PLL?). Then we need at least one instruction to enable the outputs, another to read and another to clear the outputs (3 instructions, 12 cycles), which comes to a total of 24 cycles (probably actually 28, since the waitvid takes a few instructions), or about 2.85 MB/s at 80 MHz.
' assumes timers have been properly set up and we've done any other setup stuff required
loop    xor     OUTA, readMask          ' set the pins so that we can read the data. For writing we do the same but with a writeMask.
                                        ' This also has to set the pins from the timer high. See quote from prop manual below.
        mov     temp, INA               ' get the data into a buffer; we can shift it around and copy it to the hub later
                                        ' while we're waiting for the address to be shifted out.
                                        ' Could change this to a wrbyte but it will slow things down a bit.
        xor     OUTA, readMask          ' unmask pins. Problem here because we'll get some data we don't want sent to the shift register,
                                        ' but that should be easy enough to fix because it will always be in the same wrong place.
        waitvid address, %1_0000_0000   ' sets up our address
        nop
        nop                             ' however many instructions you need to use up the cycles it takes to output the data.
                                        ' Do things like moving data to hub, etc. Just time for one hub op or 1 and 3/4 instructions.
                                        ' A hub op could mess up timings, but then you could use it for getting the address you want.
                                        ' Then have another cog that is using the data. Could also have a flag in it to set read/write,
                                        ' which would cancel the jmp below.
        jmp     #loop
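The cycle budget estimated above can be checked with a few lines of arithmetic. This is only a sketch in Python, not Propeller code. All constant names are made up for illustration. The 160 MHz shift clock (two address bits per 80 MHz system clock) and the 4-cycle waitvid overhead are the post's own guesses, not measured values.

```python
SYS_CLOCK_HZ = 80_000_000        # system clock from the post
SHIFT_CLOCK_HZ = 160_000_000     # assumption from the post: PLL shifts 2 bits per system clock
ADDRESS_BITS = 24
CYCLES_PER_INSTRUCTION = 4       # a normal (non-hub) instruction takes 4 system clocks

# System clocks spent shifting the address out serially.
shift_cycles = ADDRESS_BITS * SYS_CLOCK_HZ // SHIFT_CLOCK_HZ

# Enable outputs, read INA, clear outputs: three instructions.
control_cycles = 3 * CYCLES_PER_INSTRUCTION

# Assumed extra cost of the waitvid itself (the post's 24 -> 28 guess).
WAITVID_OVERHEAD_CYCLES = 4

ideal_cycles = shift_cycles + control_cycles
realistic_cycles = ideal_cycles + WAITVID_OVERHEAD_CYCLES


def bytes_per_second(cycles_per_byte):
    """One byte is read per loop iteration, so throughput is clock / cycles."""
    return SYS_CLOCK_HZ / cycles_per_byte


if __name__ == "__main__":
    print(f"ideal:     {ideal_cycles} cycles, {bytes_per_second(ideal_cycles) / 1e6:.2f} MB/s")
    print(f"realistic: {realistic_cycles} cycles, {bytes_per_second(realistic_cycles) / 1e6:.2f} MB/s")
```

With these assumptions, the 28-cycle loop is the one that gives the roughly 2.85 MB/s figure; the 24-cycle loop would be somewhat faster.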
If you didn't mind an extra IC, you could put an extra latch between the shift register and the RAM and probably make it 4 clocks quicker by reading the data while outputting the address, but it would probably also need another pin.
This is kind of a pain in the neck, but I think it should be fairly easy to work around by not using the end pins of the shift register, as long as the timing is precisely maintained. If the timing of the loop is variable, then it will be basically useless, or you need to move the first xor instruction to the right place.
Data Sheet said...
When FrameClocks cycles occur and the cog is not in a WAITVID instruction, whatever data is on the source and destination busses at the time will be fetched and used. So it is important to be in a WAITVID instruction before this occurs.
Also, I'm not sure if standard CMOS parts will take those kinds of frequencies (160 MHz).
Probably useless, and I don't have time to do anything with it, but I thought I would put the idea out there for greater minds than mine to think about :). Please go easy on it; there are probably mistakes in the timing etc. that will need correcting.
Comments
The problem is that you cannot synchronize with the video generator, so it's hard to stop the clock signal at the right time. So I put two waitvids one after another, one shifting out the address, the other setting the video output to 0. I set the first bit of the first waitvid to 1 and let the external shift register stop when this 1 reached the end of the shift register. It worked! But doing things in parallel is still faster. You can shift out at most 6 bits per instruction cycle. With one instruction using the 8-bit data bus, you can set 8 bits of the address.
The only reason for using a shift register is if you run out of I/Os and do address out, data out and data in with the shift method.
I am currently trying to develop a CPLD for fast RAM access.
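The marker-bit trick described in this reply can be modelled in a few lines. This is a behavioral sketch in Python, not the actual hardware or Propeller code. The function name, the register layout and the `stop_on_marker` switch are all hypothetical. It assumes the register has one stage more than the address width and that its clock is inhibited once the leading 1 reaches the last stage.

```python
def shift_with_marker(address, address_bits, extra_clocks=0, stop_on_marker=True):
    """Shift a marker bit followed by the address (MSB first) into a register.

    extra_clocks models the free-running shift clock that cannot be stopped
    at exactly the right moment; the data line is 0 during those clocks.
    Returns the address seen on the first `address_bits` stages afterwards.
    """
    stages = [0] * (address_bits + 1)            # last stage catches the marker
    stream = [1]                                 # leading marker bit
    stream += [(address >> i) & 1 for i in range(address_bits - 1, -1, -1)]
    stream += [0] * extra_clocks                 # second waitvid outputs zeros

    for bit in stream:
        if stop_on_marker and stages[-1] == 1:
            continue                             # clock inhibited: register frozen
        stages = [bit] + stages[:-1]             # normal shift by one stage

    # stages[0] holds the last address bit shifted in (the LSB).
    return sum(stages[i] << i for i in range(address_bits))


if __name__ == "__main__":
    addr = 0xABCDEF
    print(hex(shift_with_marker(addr, 24, extra_clocks=7)))
    print(hex(shift_with_marker(addr, 24, extra_clocks=7, stop_on_marker=False)))
```

With the marker stop enabled, the stray clocks after the last address bit no longer matter; without it, the same stray clocks push the address out of position.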