SRAM speed - is 10ns possible ???
Cluso99
Posts: 18,069
Started on the RamBlade thread: http://forums.parallax.com/showthread.php?p=849265
Update: Read the end of the thread, as Chip has confirmed the delay between OUT and IN to be ~33ns.
I made the statement "10ns access is not possible!"
I will qualify this:
- up to 100MHz prop clock (80MHz is the norm)
- No matter how wide the data bus bits (8/16/32)
- Address output to read input (no overhead for any coding)
- Single COG
- Multiple cogs will have additional overhead that must be included
- This is a hardware timing issue, not software
Now let me explain. The cog operates in an overlapped mode of IdSDeR (I=instruction fetch, d=internal decode & R of previous instruction, S=source fetch, D=destination fetch, e=internal execution & I of next instruction, R=result writeback).
So, when an instruction writes a RAM address to the prop port, it occurs in the R phase. The next instruction, which we will say is reading the RAM data, has already been fetched. The very next clock cycle after the "R" will be the "S" phase of the read instruction. At 100MHz, this means 10ns between the "R" and "S" cycles. BUT, you MUST allow for delays in each of these cycles.
The "R" will almost certainly not occur at the beginning of its clock cycle because of the internal gate delays, so since we do not have exact timings, it can only be assumed to reach the output pin at the end of its clock cycle.
Now the timing of the next cycle, "S", is also not specified, so one has to assume that the data must be present at the beginning of this cycle for internal gate delays to route it to the internal S register.
So how much time is there between the end of the "R" cycle and the start of the "S" cycle? 0ns!!!
So based on the above, even if you use multiple cogs, you will require 8 cogs to do 10ns fetches. And your timing will still be unknown. Even if you use an expensive CRO to look at the timing, this is not an exact science and you may find the delays differ between chips.
Therefore you need at least one clock of delay between the address and the read cycles (even at 100MHz). Write cycles are worse.
So, 10ns is NOT possible at clocks up to 100MHz.
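To put rough numbers on the argument above, here is a minimal Python sketch. It assumes the worst case described in the post: the address only appears on the pin at the very end of the "R" clock, and the data must already be valid at the very start of the "S" clock, so only whole idle clocks inserted between the two count as guaranteed time for the SRAM. The function names are made up for illustration and do not come from any Propeller tool.

```python
def clock_period_ns(clock_mhz: float) -> float:
    """Length of one system clock in nanoseconds."""
    return 1000.0 / clock_mhz


def guaranteed_window_ns(clock_mhz: float, idle_clocks: int) -> float:
    """Time the SRAM is guaranteed to have between address and read.

    Worst-case assumption (from the post, not from a datasheet):
    the "R" clock and the "S" clock themselves contribute nothing,
    so only the idle clocks in between are counted.
    """
    return idle_clocks * clock_period_ns(clock_mhz)


if __name__ == "__main__":
    for mhz in (80.0, 100.0):
        for idle in (0, 1):
            print(f"{mhz:.0f}MHz, {idle} idle clock(s): "
                  f"{guaranteed_window_ns(mhz, idle):.1f}ns guaranteed")
```

With no idle clock the guaranteed window is 0ns at any clock speed; one idle clock gives 10ns at 100MHz and 12.5ns at 80MHz.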
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:
· Home of the MultiBladeProps: TriBlade, RamBlade, SixBlade, website
· Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator)
· Prop Tools under Development or Completed (Index)
· Emulators: CPUs Z80 etc; Micros Altair etc; Terminals VT100 etc; (Index) ZiCog (Z80), MoCog (6809)
· Prop OS: SphinxOS, PropDos, PropCmd
· Search the Propeller forums (uses advanced Google search)
My cruising website is: www.bluemagic.biz · MultiBlade Props: www.cluso.bluemagic.biz
Post Edited (Cluso99) : 4/19/2010 9:55:41 AM GMT
Comments
It might need chips rated at 8.4ns.
I will gladly concede that it would be very difficult, and may in fact not be possible - that would depend on whether the "pipeline" delays could be compensated for in 10ns (assuming 100MHz) adjustable steps for addressing.
Once I have some free time (amusing concept) I'll simply try it. I believe five cogs will be necessary; however, I do not believe eight would be required. I might have to use the video circuitry in unnatural ways though :)
I am leery of the word "impossible".
Some people also thought 20MB/sec burst reads would not be possible with a Propeller - and the Morpheus sitting next to me displaying high resolution bitmap graphics proves otherwise :)
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com Please use mikronauts _at_ gmail _dot_ com to contact me off-forum, my PM is almost totally full
Morpheus dual Prop SBC w/ 512KB kit $119.95, Mem+ 2MB memory IO board kit $89.95, both kits $189.95
Propteus and Proteus for Propeller prototyping 6.250MHz custom Crystals run Propellers at 100MHz
Las - Large model assembler for the Propeller Largos - a feature full nano operating system for the Propeller
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
24 bit LCD Breakout Board coming soon. $21.99 has backlight driver and touch sensitive decoder.
I'm convinced that a one Propeller solution has no value for XMM on Propeller 1. This became painfully clear when I finally got the ICC XMM kernel working. Reminds me of Spin JVM spitting "Hello World" (running 0.6MIPS). Port pin interference slowed things terribly.
With faster XMM (2.5MIPS direct and possibly 3MIPS+ cached), things are more attractive. With Propeller II these numbers increase by 10x (order of magnitude between friends). The algorithm I posted in the other thread is based in part on some of Kuroneko's work. Kuroneko's original demo would be a good study for many here. I'll be nice and post my demo here when I have some presentable code (with proper credit) instead of the RamBlade thread. I *may* even remove some posts there :)
What interests me, in light of the recent and disturbing trend set by Hanno and Ken, is *what module will you eat* if you are proven wrong?
Entree sized ramblade or banquet style triblade?
Would Sir prefer that with Chianti or Claret?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
If you always do what you always did, you always get what you always got.
I think maybe the entree size with a scotch and coke :) Can I use a future micro-module??? I want to build just 1 micro sized RamBlade to make sure I get the "world's smallest".
Now back to the discussion....
Presuming that the address is output on 1 cycle, a delay of 1 full cycle is required (due to unpublished data) before the read cycle. This means that a 10ns fetch still cannot be guaranteed with a 100MHz clock, and that the fastest guaranteed read access is 20ns with 100MHz overclocking, which is what we agreed would be used for our purposes.
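The 20ns figure can be reproduced with a short Python sketch under the same worst-case assumption (only whole idle clocks between the address write and the data read count towards the SRAM access time). The helper names and the rounding up to whole clocks are my own illustration, not published Propeller timing.

```python
import math


def idle_clocks_needed(sram_access_ns: float, clock_mhz: float) -> int:
    """Whole idle clocks needed so the SRAM access time is fully covered."""
    period_ns = 1000.0 / clock_mhz
    return math.ceil(sram_access_ns / period_ns)


def read_spacing_ns(sram_access_ns: float, clock_mhz: float) -> float:
    """Time from the start of the address ("R") clock to the start of
    the read ("S") clock: the address clock itself plus the idle clocks."""
    period_ns = 1000.0 / clock_mhz
    return (1 + idle_clocks_needed(sram_access_ns, clock_mhz)) * period_ns


if __name__ == "__main__":
    for mhz in (80.0, 100.0):
        print(f"10ns SRAM at {mhz:.0f}MHz: "
              f"{read_spacing_ns(10.0, mhz):.1f}ns per guaranteed read")
```

For a 10ns SRAM this gives one idle clock and 20ns per read at 100MHz, and 25ns per read at the normal 80MHz.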
In all my assumptions, I have presumed no external hardware. Not that I think this will matter, other than saving cogs by incrementing the addresses in an external counter. However, if this method is used, I believe that the overhead in setting this up will far outweigh the benefits. This statement is based on the fact that we are not talking about big block transfers, which of course do benefit from this method, but about data access that is largely random. Take, for instance, the spin interpreter (yes, I know it is hub access, but follow on). It is rare that the accesses are sequential, because variables and stacks are regularly accessed. So, even though you may think that the bytecodes are sequential, in fact other data and stacks are being accessed in between these accesses, which effectively defeats the sequential accesses.
PSRAM chips require you to set the start address, then start driving the clock pin. With cogs 1, 3, 5, and 7 working together, reading into cog RAM only, you will get 10ns until you hit the end of the row (128 words).
The real question is: can the prop handle an 8.3125MHz crystal to make 7.6ns possible?
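For reference, the clock period behind these figures can be worked out as below. This sketch assumes the Propeller 1 is run with its 16x PLL setting, which is how a 6.25MHz crystal gives the 100MHz mentioned elsewhere in this thread. Whether the chip would actually run stably from an 8.3125MHz crystal is exactly the open question, and the code cannot answer that.

```python
PLL_MULTIPLIER = 16  # assumption: Propeller 1 using its 16x PLL setting


def system_clock_mhz(crystal_mhz: float, pll: int = PLL_MULTIPLIER) -> float:
    """System clock produced from the crystal by the PLL."""
    return crystal_mhz * pll


def clock_period_ns(crystal_mhz: float, pll: int = PLL_MULTIPLIER) -> float:
    """Length of one system clock in nanoseconds."""
    return 1000.0 / system_clock_mhz(crystal_mhz, pll)


if __name__ == "__main__":
    for xtal in (5.0, 6.25, 8.3125):
        print(f"{xtal}MHz crystal -> {system_clock_mhz(xtal):.1f}MHz, "
              f"{clock_period_ns(xtal):.2f}ns per clock")
```

An 8.3125MHz crystal works out to 133MHz, or a clock period of roughly 7.5ns.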
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Using PSRAM and burst mode, you will not be able to guarantee that you can clock the data out every 10ns because of the lack of prop timing data. The best you can hope for is 2 clocks, which at 100MHz is 20ns.
Postedit:
I must say these are exactly the threads that I love. Sometimes, great ideas evolve out of the friendly discussion that takes place. Not necessarily an on-topic idea, but nevertheless, ideas that have real benefit. Keep it up.
Post Edited (Cluso99) : 12/1/2009 7:11:33 AM GMT
When it comes down to it, ARM is better for running big programs. But I want to see how hard Propeller 1 can be pushed for running its pseudo-native LMM with 4M to 8M of external memory. With a GNU class tool set emitting PASM LMM (which is a problem), that makes Linux or BSD possible on your favorite chip. Less than 1 MIPS LMM is unacceptable. Propeller II could do it easily enough with very few cogs, but it ain't here ... and it won't be for a long time (nuts! yes, nuts!).
BTW tubular, I am intrigued by your Avatar.
A second PROP would do all the rest. And from time to time receives a request from the GUI PROP to deliver data that needs to be displayed. Or it can be the master and sends commands to the display PROP.
Ehmmm ... just one idea that came into my mind when writing this:
What about using 2 PROPS for random access? Both PROPS run with the same clock and totally in sync. Both run the same program except for the RAM driver and the I/O access. PROP 1 would generate address bits 0-11 and drive all outputs, and PROP 2 would generate address bits 12-23 without other outputs (they can simply be switched off with the dira register). As both run the same software, they will do the same things, read the same data, and make the same decisions ... the only difference is the portion of the RAM address they generate.
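The address split in this idea can be sketched in a few lines of Python. This only models which bits each prop would put on its pins (bits 0-11 on one, bits 12-23 on the other); the function names are hypothetical, and nothing here models the clock synchronisation the scheme depends on.

```python
ADDRESS_BITS = 24
HALF_BITS = 12
HALF_MASK = (1 << HALF_BITS) - 1  # 0xFFF


def split_address(addr: int) -> tuple[int, int]:
    """Return (low, high): bits 0-11 for prop 1, bits 12-23 for prop 2."""
    if not 0 <= addr < (1 << ADDRESS_BITS):
        raise ValueError("address does not fit in 24 bits")
    low = addr & HALF_MASK
    high = (addr >> HALF_BITS) & HALF_MASK
    return low, high


def join_address(low: int, high: int) -> int:
    """What the RAM sees when both props drive their half of the bus."""
    return (high << HALF_BITS) | low


if __name__ == "__main__":
    low, high = split_address(0xABC123)
    print(f"prop 1 drives {low:#05x}, prop 2 drives {high:#05x}")
```

Both props compute the same full address internally and each simply masks out its own half before writing to its pins.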
Ugggh! I thought you meant we can't use 10ns access speed memory! WTF ... what a day. It's of course exceedingly unlikely to get a byte every 10ns.
MagIO: Both props would also have to read the data to act on it. I don't see the benefit of this method.
If you can compartmentalise the external SRAM to one prop and move other operations to another prop, then that is a good solution. Alternatively, if you can accept slower access, then latches (or FPGA/CPLD) can reduce pins, freeing up some pins for other things. Otherwise, I guess this is the wrong chip for your solution.
Plus ... the FPGA/CPLD solution has its big advantage in doing bulk access. I am talking about totally random access.
You have 2 Props running the same program. One driving address lines, one driving data lines, both running in step off the same clock and running the same program. Am I right?
But what happens when the data read from the RAM should cause a branch in the code, or not, depending on its value?
Only the Prop with the data lines knows which way to go.
To fix this both props have to have the data lines.
But then there is no point in the second prop with no address lines.
Or am I missing a point here?
Everyone is worried about using a prop in a bad way.
Instead look at the end result.
My black box uses 4 props and no other chips (+ EEPROMs, of course). I COULD have accomplished what I did using only 1 prop and some I2C I/O expanders, but then all my code would be crammed into one prop, and be quite large. I doubt that all my code would even fit into a single prop. Perhaps barely. If I did use a single prop, I wouldn't need to deal with inter-prop communications, but my requirements don't need speed, so 115200 baud is what I use.
But a prop's price makes it a possible choice to replace many chips directly. Once prop2 hits, perhaps the prop price will drop and make it even cheaper to use a few props in a design.
Or dare I mention the idea of using a Prop2 as a master, and many little prop1's as slaves. (if the prop1's price can be hacked) (it has to get cheaper for this to be practical tho)
Post Edited (Clock Loop) : 12/1/2009 10:35:30 AM GMT
Prop 2 prop comms?
prop 2 prop communication means non identical programs.
I don't see how the same EXACT program can do this either.
And I thought the suggestion was that a second Prop could speed up RAM access somehow.
Introducing Prop to Prop comms is, well, slow.
yea, what he said.
But how fast can one prop bring a line high, which the other prop looks for. Pretty damn quick. Not for 10ns tho.
Like a trigger line rather than data being transmitted.
Perhaps he was talking about having both props connected to the data AND address lines.
Then look for state changes on the data/address lines..
I know, just use a clock chip that has 2 outputs.
Then run that clock at 160MHz and only output every other clock on each of the 2 outputs.
Or have two 160MHz clock chips running in sync, and skip every other clock on one and the opposite clocks on the other.
Then both props run at 80MHz, but are out of phase with each other;
while one prop's clock is changing state, the other is in the middle of its cycle.
I think they call this a ring wave generator.
yea... my brain was wak.
Post Edited (Clock Loop) : 12/1/2009 2:43:11 PM GMT
I think MagIO means that the 2 props are in sync, both have the same data bus, and each generates part of the address lines. The program is ALMOST identical in each prop. What this does is free up some pins for other purposes, and the spare cogs can drive these.
Clockloop: I agree, it is better to use multiple props and keep the same development environment and reduce other chips.
And yes, I mean at least the parts that need RAM access are nearly exactly the same. But only one propeller really drives the other I/O lines. BOTH of course read all external signals to have the same basis for decisions. The other minor difference is that one prop generates the high bits of the address on the address-bus I/O pins, while the other one creates the low bits of the address on the address-bus I/O pins. So, the difference is that both props work internally with, say, 24- up to 32-bit addresses, but prop 1 puts bits 0-11 (0-15) on its address bus lines, and the other one puts bits 12-23 (16-31) on its address bus lines.
In contrast to FPGA/CPLD or latch solutions, you can access each RAM address with 2 instructions. Of course there are applications where bulk access is better, but for others random access is better.
Maybe a combined design is cool as well. The props write the address to external latched counters. If you want bulk access, you write the address and then give a clock signal to the counters.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
No electrons were harmed while rendering this message.
An LMM loop rolled and uncached produces 2.5MIPS at 80MHz ... cached could be higher, but I see a hit taking at least 6 instructions intra-cog without tagging.
You can easily get one byte per LMM loop with 50ns SRAM, but there is no way to get more than 1 byte from 50ns SRAM in the same time. You have to use unlatched fast SRAM, which comes in two flavors: PSRAM, 20ns in default asynchronous mode, and 10ns standard SRAM. Synchronous PSRAM is interesting for burst reads at 7ns per byte after a 70ns setup, but requires 3 more pins.
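As a rough comparison of the two PSRAM modes, the sketch below uses only the figures quoted above (20ns per byte asynchronous, 7ns per byte burst after a 70ns setup). It deliberately ignores the Propeller's own pin timing limits discussed earlier in the thread, so treat the result as a paper exercise. The function names are illustrative.

```python
ASYNC_NS_PER_BYTE = 20   # asynchronous PSRAM figure quoted in the post
BURST_SETUP_NS = 70      # synchronous burst setup quoted in the post
BURST_NS_PER_BYTE = 7    # synchronous burst per-byte figure from the post


def async_time_ns(n_bytes: int) -> int:
    """Total time to read n_bytes one access at a time."""
    return ASYNC_NS_PER_BYTE * n_bytes


def burst_time_ns(n_bytes: int) -> int:
    """Total time to read n_bytes in one burst, including the setup."""
    return BURST_SETUP_NS + BURST_NS_PER_BYTE * n_bytes


def break_even_bytes() -> int:
    """Smallest transfer size at which the burst is strictly faster."""
    n = 1
    while burst_time_ns(n) >= async_time_ns(n):
        n += 1
    return n


if __name__ == "__main__":
    n = break_even_bytes()
    print(f"burst wins from {n} bytes: "
          f"{burst_time_ns(n)}ns vs {async_time_ns(n)}ns")
```

On these numbers the burst only pays off from 6 consecutive bytes upward (112ns against 120ns), which fits the point made earlier that bulk transfers benefit while random accesses do not.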
If 10ns access is not possible, what IS the fastest possible (and stable) access using merely an off-the-shelf prop and standard crystal?