SRAM speed - is 10nS possible ???

Cluso99 · 2009-12-01 04:15

Started on the RamBlade thread: http://forums.parallax.com/showthread.php?p=849265

Update: read the end of the thread; Chip has confirmed that the delay between OUT and IN is ~33nS.

I made the statement "10nS access is not possible!"

I will qualify this:

* up to 100MHz prop clock (80MHz is the norm)
* no matter how wide the data bus (8/16/32 bits)
* address output to read input (no overhead for any coding)
* single COG
* multiple cogs will have additional overhead that must be included
* this is a hardware timing issue, not software

Now let me explain. The cog operates in an overlapped mode, IdSDeR: I = instruction fetch, d = internal decode and R of the previous instruction, S = source fetch, D = destination fetch, e = internal execution and I of the next instruction, R = result writeback.

So, when an instruction writes a RAM address to the prop port, it occurs in the R phase. The next instruction, which we will say is reading the RAM data, has already been fetched. The very next clock cycle after the "R" will be the "S" phase of the read instruction. At 100MHz, this means 10nS between the "R" and "S" cycles. BUT, you MUST allow for delays in each of these cycles.
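A minimal Python sketch of the overlap described above may help. It assumes, as this post does, a fixed six-phase order per instruction with a new instruction starting every 4 clocks, so one instruction's e overlaps the next one's I and its R overlaps the next one's d. The phase letters come from the post; the function names are hypothetical, and this models the argument, not published Propeller timing.

```python
# Phase order per instruction, as described in the post: I d S D e R.
PHASES = ["I", "d", "S", "D", "e", "R"]
CLOCKS_PER_INSTRUCTION = 4  # assumption: a new instruction starts every 4 clocks


def phase_clock(instr: int, phase: str) -> int:
    """Clock number in which `phase` of instruction number `instr` happens."""
    return instr * CLOCKS_PER_INSTRUCTION + PHASES.index(phase)


def gap_clocks(sample_phase: str) -> int:
    """Clocks from the R (pin write) phase of one instruction to the
    given phase of the instruction that follows it."""
    return phase_clock(1, sample_phase) - phase_clock(0, "R")


def timeline(n_instr: int) -> list[str]:
    """One text row per instruction, one column per clock."""
    width = phase_clock(n_instr - 1, "R") + 1
    rows = []
    for k in range(n_instr):
        cells = ["."] * width
        for p in PHASES:
            cells[phase_clock(k, p)] = p
        rows.append(f"instr {k}: " + " ".join(cells))
    return rows


if __name__ == "__main__":
    for row in timeline(2):
        print(row)
    # instr 0: I d S D e R . . . .
    # instr 1: . . . . I d S D e R
    period_ns = 1000 / 100  # 100MHz clock
    for phase in ("S", "e"):
        print(f"R -> next {phase}: {gap_clocks(phase)} clock(s) = "
              f"{gap_clocks(phase) * period_ns:.0f} ns at 100MHz")
```

With sampling in S the gap is 1 clock (10nS at 100MHz), which is the case argued in this post; kuroneko's reply in this thread says INA is actually sampled in e, which this model places 3 clocks after R.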

The "R" will almost certainly not occur at the beginning of its clock cycle because of the internal gate delays, so, since we do not have exact timings, it can only be assumed to reach the output pin at the end of its clock cycle.

The timing within the next cycle, "S", is also not specified, so one has to assume that the data must be present at the beginning of this cycle, allowing the internal gate delays to route it to the internal S register.

So how much time is there between the end of the "R" cycle and the start of the "S" cycle? 0nS!!!

So, based on the above, even if you use multiple cogs, you will require 8 cogs to do 10nS fetches, and your timing will still be unknown. Even if you use an expensive CRO to look at the timing, this is not an exact science, and the delays may differ from chip to chip.

Therefore you need at least a clock delay between the address and the read cycles (even at 100MHz). Write cycles are worse.
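The worst-case budget in the last few paragraphs can be written as a tiny calculation. This is a sketch under the post's own assumptions (the address is only guaranteed at the pin at the end of the R clock, and the data is needed at the pin at the start of the sampling clock); the real output and setup delays are unpublished, so no datasheet figures are used. `worst_case_window_ns` is a hypothetical helper name.

```python
def worst_case_window_ns(clock_mhz: float, gap_clocks: int) -> float:
    """Time left for the SRAM to respond, in ns.

    gap_clocks: clocks from the start of the R (address write) clock to the
    start of the clock in which the data is sampled.
    Worst-case assumptions taken from the post:
      * the address is only guaranteed at the pin at the END of the R clock;
      * the data must already be at the pin at the START of the sampling clock.
    """
    period_ns = 1000.0 / clock_mhz
    address_valid_at = period_ns             # end of the R clock
    data_needed_at = gap_clocks * period_ns  # start of the sampling clock
    return max(0.0, data_needed_at - address_valid_at)


if __name__ == "__main__":
    for gap in (1, 2):
        print(f"100MHz, sampling {gap} clock(s) after R: "
              f"{worst_case_window_ns(100, gap):.1f} ns for the SRAM")
```

Sampling in the very next clock leaves 0 ns, the "0nS" result above; adding one clock leaves 10 ns, enough for a 10nS part, but each access then takes 2 clocks (20nS) at 100MHz.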

So, 10nS is NOT possible at clocks up to 100MHz.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:

· Home of the MultiBladeProps: TriBlade, RamBlade, SixBlade, website
· Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator)
· Prop Tools under Development or Completed (Index)
· Emulators: CPUs Z80 etc; Micros Altair etc; Terminals VT100 etc; (Index) ZiCog (Z80), MoCog (6809)
· Prop OS: SphinxOS, PropDos, PropCmd
· Search the Propeller forums (uses advanced Google search)
My cruising website is: www.bluemagic.biz. MultiBlade Props: www.cluso.bluemagic.biz

Post Edited (Cluso99) : 4/19/2010 9:55:41 AM GMT

Bill Henning · 2009-12-01 04:26

You may be correct; however, depending on the timing relationship between the address-generating cog and the reading cogs, bursts may still be possible. It would essentially be a pipelined application: as long as you kept the SRAM's /RD asserted, only manipulated the address lines, and had the appropriate delay before sampling the data, it may be possible.

It might need chips rated at 8.4ns.

I will gladly concede that it would be very difficult, and may in fact not be possible; that would depend on whether the "pipeline" delays could be compensated for with 10ns (assuming 100MHz) granularity in adjusting the addressing.

Once I have some free time (amusing concept) I'll simply try it. I believe five cogs will be necessary; however, I do not believe eight would be required. I might have to use the video circuitry in unnatural ways though :)

I am leery of the word "impossible".

Some people also thought 20MB/sec burst reads would not be possible with a Propeller, and the Morpheus sitting next to me displaying high-resolution bitmap graphics proves otherwise :)

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com Please use mikronauts _at_ gmail _dot_ com to contact me off-forum, my PM is almost totally full
Morpheus dual Prop SBC w/ 512KB kit $119.95, Mem+ 2MB memory IO board kit $89.95, both kits $189.95
Propteus and Proteus for Propeller prototyping
6.250MHz custom crystals run Propellers at 100MHz
Las - Large model assembler for the Propeller
Largos - a feature full nano operating system for the Propeller

kuroneko · 2009-12-01 04:26

INA isn't sampled during S but during e (IdSDeR). And before you deny that, go and check :)

mctrivia · 2009-12-01 04:50

With my RAM modules a 10ns burst is possible with 4 cogs. You would use a counter to run the clock line at 100MHz; then each of the 4 cogs, one after the next, reads in one word of data at a time and stores it in cog RAM. You would need an unrolled read loop with 4 cycles/read/cog, but you could get a 128-word burst in. Of course you would have to slow down after that and transfer the data to hub RAM for useful processing, since each cog only has every 4th word. However, while that processing happens, a second Prop could be using the address lines to burst another 128 words out.
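One way to picture the 4-cog burst: an external counter puts a new word on the bus every clock, and each cog samples once per 4-clock instruction, with cog c started c clocks after cog 0, so together they catch every word and each cog ends up holding every 4th one. The Python below is a hypothetical simulation of that schedule and of the later reshuffle into sequential order; it ignores pin delays and the exact sampling phase entirely.

```python
CLOCKS_PER_INSTRUCTION = 4  # one unrolled read per cog instruction


def burst(bus_words: list[int], n_cogs: int = 4) -> list[list[int]]:
    """bus_words[t] is the word on the bus during clock t.
    Cog c samples at clocks c, c + 4, c + 8, ... into its own (cog RAM) buffer."""
    if n_cogs != CLOCKS_PER_INSTRUCTION:
        raise ValueError("one word per clock needs one cog per instruction clock")
    reads_per_cog = len(bus_words) // n_cogs
    return [
        [bus_words[cog + j * CLOCKS_PER_INSTRUCTION] for j in range(reads_per_cog)]
        for cog in range(n_cogs)
    ]


def to_sequential(buffers: list[list[int]]) -> list[int]:
    """Undo the interleave: cog c's j-th word belongs at position c + n_cogs * j."""
    n_cogs = len(buffers)
    out = [0] * (n_cogs * len(buffers[0]))
    for cog, buf in enumerate(buffers):
        for j, word in enumerate(buf):
            out[cog + n_cogs * j] = word
    return out


if __name__ == "__main__":
    words = list(range(128))  # a 128-word burst
    bufs = burst(words)
    print(bufs[1][:4])        # cog 1 holds words 1, 5, 9, 13, ...
    assert to_sequential(bufs) == words
    print(f"{len(words)} words at 10 ns each = {len(words) * 10} ns")
```

The second step is the "slow down and transfer to hub RAM" part: the burst itself only works because each cog does nothing but sample during it.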

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
24 bit LCD Breakout Board coming soon. $21.99 has backlight driver and touch sensitive decoder.

jazzed · 2009-12-01 05:14

Single COG read burst < 50ns per byte is clearly not possible.

I'm convinced that a one Propeller solution has no value for XMM on Propeller 1. This became painfully clear when I finally got the ICC XMM kernel working. Reminds me of Spin JVM spitting "Hello World" (running 0.6MIPS). Port pin interference slowed things terribly.

With faster XMM (2.5MIPS direct and 3MIPS+ cached, possibly) things are more attractive. With Propeller II these numbers increase by 10x (order of magnitude between friends). The algorithm I posted in the other thread is based in part on some of Kuroneko's work. Kuroneko's original demo would be a good study for many here. I'll be nice and post my demo here, instead of in the RamBlade thread, when I have some presentable code (with proper credit). I *may* even remove some posts there :)

Tubular · 2009-12-01 05:15

Cluso, I'm not sure whether 10ns is possible or not.

What interests me, in light of the recent and disturbing trend set by Hanno and Ken, is *what module will you eat* if you are proven wrong?

Entree sized ramblade or banquet style triblade?

BradC · 2009-12-01 05:19

Tubular said...

Entree sized ramblade or banquet style triblade?

Would Sir prefer that with Chianti or Claret ?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
If you always do what you always did, you always get what you always got.

Cluso99 · 2009-12-01 06:22

Kuroneko: I fear you may be correct. That makes sense with the waitpxx instruction, which loops at the "e" cycle. I do expect you are correct.

I think maybe the entree size with a scotch and coke :) Can I use a future micro-module??? I want to build just 1 micro-sized RamBlade to make sure I get the "world's smallest".

Now back to the discussion....

Presuming that the address is output on 1 cycle, a delay of 1 full cycle is required before the read cycle (because the timing data is unpublished). This means that a 10nS fetch still cannot be guaranteed with a 100MHz clock, so the fastest read access is 20nS with 100MHz overclocking, which is what we agreed would be used for our purposes.

In all my assumptions, I have presumed no external hardware. Not that I think this will matter, other than saving cogs by incrementing the addresses in an external counter. However, if this method is used, I believe that the overhead of setting it up will far outweigh the benefits. This is because we are not talking about big block transfers, which of course do benefit from this method; a lot of data access is random. Take, for instance, the Spin interpreter (yes, I know it uses hub access, but follow along). Its accesses are rarely sequential, because variables and stacks are regularly accessed. So even though you may think that the bytecodes are sequential, other data and stack accesses happen in between them, which effectively defeats the sequential accesses.
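The random-versus-sequential argument can be put in numbers. The sketch below assumes every non-sequential access has to pay a full burst setup again; the example figures (70ns setup and 7ns per word for synchronous PSRAM, 20ns per plain random access) are the ones quoted elsewhere in this thread and are used only for illustration. The function names are hypothetical.

```python
def avg_burst_access_ns(seq_fraction: float, setup_ns: float, word_ns: float) -> float:
    """Average ns per word when only a fraction of accesses continue the burst.
    A sequential word costs word_ns; any other word pays setup_ns + word_ns."""
    return word_ns + (1.0 - seq_fraction) * setup_ns


def break_even_fraction(setup_ns: float, word_ns: float, random_ns: float) -> float:
    """Sequential fraction above which bursting beats plain random access."""
    return 1.0 - (random_ns - word_ns) / setup_ns


if __name__ == "__main__":
    for f in (0.5, 0.9, 1.0):
        print(f"{f:.0%} sequential: {avg_burst_access_ns(f, 70, 7):.1f} ns/word")
    print(f"break-even vs 20 ns random: {break_even_fraction(70, 7, 20):.0%} sequential")
```

With those figures the burst only wins when roughly 81% or more of the accesses are sequential, which is why interleaved stack and variable accesses erode the benefit so quickly.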

mctrivia · 2009-12-01 06:36

10nS random access is totally impossible. A PSRAM chip has a nice, convenient burst mode, though, that could make 10nS possible if using a 6.25MHz crystal.

PSRAM chips require you to set the start address, then start driving the clock pin. With cogs 1, 3, 5 and 7 working together, reading into cog RAM only, you will get 10nS until you hit the end of the row (128 words).

The real question is: can the Prop handle an 8.3125MHz crystal to make 7.6nS possible?
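For reference, the clock arithmetic behind these crystal figures, assuming the Propeller's 16x PLL mode (PLL16X): the system clock is 16 times the crystal frequency, and one clock period is 1/f. This is plain arithmetic, not a statement about which frequencies a given board will actually run at.

```python
PLL_MULTIPLIER = 16  # assumption: PLL16X clock mode


def sysclk_mhz(crystal_mhz: float) -> float:
    """System clock produced from the crystal by the PLL."""
    return crystal_mhz * PLL_MULTIPLIER


def period_ns(crystal_mhz: float) -> float:
    """Length of one system clock in ns."""
    return 1000.0 / sysclk_mhz(crystal_mhz)


if __name__ == "__main__":
    for xtal in (5.0, 6.25, 8.3125):
        print(f"{xtal} MHz crystal -> {sysclk_mhz(xtal):.0f} MHz, "
              f"{period_ns(xtal):.2f} ns per clock")
```

A 5MHz crystal gives the usual 80MHz (12.5ns), 6.25MHz gives 100MHz (10ns), and 8.3125MHz gives 133MHz, a clock of about 7.5ns.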

heater · 2009-12-01 06:51

Call me crazy, which you are free to do in light of ZiCog, but isn't using multiple 32 bit processors just to access a RAM completely nuts? Interesting puzzle yes, but is it of practical use? (or even non-practical use).

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

Cluso99 · 2009-12-01 07:04

mctrivia: I expect most PCBs, if not all, will not be able to be overclocked to 133MHz (an 8.3125MHz crystal). I guess I will have more answers soon, based on trial and error only, which is not a valid engineering test. I also lack the equipment to check that the PLL is locked.

Using PSRAM and burst mode, you will not be able to guarantee that you can clock the data out every 10nS because of the lack of prop timing data. The best you can hope for is 2 clocks, which at 100MHz is 20nS.

Postedit:

I must say these are exactly the threads that I love. Sometimes great ideas evolve out of the friendly discussion that takes place; not necessarily on-topic ideas, but nevertheless ideas that have real benefit. Keep it up!

Post Edited (Cluso99) : 12/1/2009 7:11:33 AM GMT

jazzed · 2009-12-01 07:19

mctrivia said...
The real question is can the prop handle a 8.3125MHz Crystal to make 7.6nS possible?

Wrong direction guys! Over clocking makes the problem worse. Assuming a direct address/data bus connection to a stock 10ns asynchronous SRAM, at 80MHz, you get 12.5ns address time (2.5ns margin) assuming the collector COGs sample in the right place and the address stepper can run "at speed" .... I haven't given the PSRAM too much thought yet.

When it comes down to it, ARM is better for running big programs. But I want to see how hard Propeller1 can be pushed for running its pseudo-native LMM with 4M to 8M external memory. With a GNU-class tool set emitting PASM LMM (which is a problem), that makes Linux or BSD possible on your favorite chip. Less than 1 MIPS LMM is unacceptable. Propeller II could do it easily enough with very few COGs, but it ain't here ... and it won't be for a long time (nuts! yes, nuts!).

BTW tubular, I am intrigued by your Avatar.

MagIO2 · 2009-12-01 07:21

It depends on what you want to do with the RAM. If you want to do many other things, the 4 leftover COGs might not be enough for everything. But using the RAM for a standalone graphical subsystem would be useful: attach a touchscreen and RAM and let it deal with all the user interaction (menus, input fields and so on).

A second PROP would do all the rest, and from time to time receive a request from the GUI PROP to deliver data that needs to be displayed. Or it could be the master and send commands to the display PROP.

Ehmmm ... just one idea that came to mind while writing this:
What about using 2 PROPS for random access? Both PROPS run from the same clock, totally in sync, and both run the same program except for the RAM driver and the I/O access. PROP 1 would generate address bits 0-11 and drive all outputs, and PROP 2 would generate address bits 12-23 with no other outputs (these can simply be switched off with the dira register). As both run the same software, they will do the same things, read the same data and make the same decisions; the only difference is the portion of the RAM address they generate.

jazzed · 2009-12-01 07:29

@MagIO2, I've considered the Parallel Propeller idea also, but I never came to grips with how to handle jumps based on changing data ... both Propellers have to know about that, no? Maybe it could be solved another way?

Cluso99 · 2009-12-01 07:50

Jazzed: Yes, the faster the prop goes the shorter the access. But at 80MHz you are never going to get 10nS access, so you have to run at 100MHz.

jazzed · 2009-12-01 08:01

Cluso99 said...
Jazzed: Yes, the faster the prop goes the shorter the access. But at 80MHz you are never going to get 10nS access, so you have to run at 100MHz.

Ugggh! I thought you meant we can't use 10ns access speed memory! WTF ... what a day. It's of course exceedingly unlikely to get a byte every 10ns.

Cluso99 · 2009-12-01 08:24

jazzed said...
Ugggh! I thought you meant we can't use 10ns access speed memory! WTF ... what a day. It's of course exceedingly unlikely to get a byte every 10ns.

Of course you can use 10nS parts. Just the prop cannot read that quick. Hence my original statement "10nS access is not possible!" which I am still saying.

MagIO: Both props would also have to read the data to act on it. I don't see the benefit of this method.

If you can compartmentalise the external SRAM to one prop and move other operations to another prop, then that is a good solution. Alternatively, if you can accept slower access, then latches (or an FPGA/CPLD) can reduce pin count, freeing up some pins for other things. Otherwise, I guess this is the wrong chip for your solution.

MagIO2 · 2009-12-01 10:01

So, what's the difference between having an FPGA/CPLD and having an additional Propeller? Easy answer: doing it with a second Propeller could be much easier. You simply run the same programs, with all COGs totally identical, except that in one prop you don't drive any outputs other than the address lines. You read the inputs so that you have the same data as the primary prop and make the same decisions. Of course this is a nearly total waste of the capabilities of one Propeller. Maybe a clever solution would allow differences in the program, so that COGs not involved in memory access can do different things.

Plus, the FPGA/CPLD solution has its big advantage in bulk access. I'm talking about totally random access.

heater · 2009-12-01 10:19

MagIO2, I don't get the idea.

You have 2 Props running the same program. One driving address lines one driving data lines, both running in step of the same clock and running the same program. Am I right ?

But what happens when the data read from the RAM should cause a branch in the code, or not, depending on its value?
Only the Prop with the data lines knows which way to go.
To fix this both props have to have the data lines.
But then there is no point in the second prop with no address lines.

Or am I missing a point here?

Clock Loop · 2009-12-01 10:28

FPGAs require very stable power supplies, many decoupling caps and various other hardware; it all adds up, and a prop is cheap. Most FPGAs also aren't available in DIP.

Everyone is worried about using a prop in a bad way.

Instead look at the end result.

My black box uses 4 props and no other chips (+ EEPROMs, of course). I COULD have accomplished what I did using only 1 prop and some I2C I/O expanders, but then all my code would be crammed into one prop and be quite large. I doubt that all my code would even fit into a single prop; perhaps barely. If I did use a single prop, I wouldn't need to deal with inter-prop communications, but my requirements don't need speed, so 115200 baud is what I use.

But a prop's price makes it a possible choice to replace many chips directly. Once Prop2 hits, perhaps the prop price will drop and make it even cheaper to use a few props in a design.

Or dare I mention the idea of using a Prop2 as a master and many little Prop1s as slaves (if the Prop1's price can be hacked; it has to get cheaper for this to be practical, tho).

Post Edited (Clock Loop) : 12/1/2009 10:35:30 AM GMT

Clock Loop · 2009-12-01 10:40

heater said...
Bur what happens when the data read from the RAM should cause a branch in the code, or not, depending on it's value ?
Only the Prop with the data lines knows which way to go.

Prop 2 prop comms?

prop 2 prop communication means non identical programs.
I don't see how the same EXACT program can do this either.

heater · 2009-12-01 10:47

Still not with you. I thought this thread was about fast RAM access from a Prop.
And I thought the suggestion was that a second Prop could speed up RAM access somehow.
Introducing Prop to Prop comms is, well, slow.

Clock Loop · 2009-12-01 10:57

heater said...
Still not with you. I thought this thread was about fast RAM access from a Prop. Introducing Prop to Prop comms is, well, slow.

yea, what he said.

But how fast can one prop bring a line high that the other prop looks for? Pretty damn quick. Not for 10ns, tho.
Like a trigger line rather than data being transmitted.

Perhaps he was talking about having both props connected to the data AND address lines.
Then look for state changes on the data/address lines..

I know: just use a clock chip that has 2 outputs.
Then run that clock at 160MHz and only output every other clock on each of the 2 outputs.

Or have two 160MHz clock chips running in sync, and skip every other clock on one and the opposite clocks on the other.

Then both props run at 80MHz but are out of phase with each other:
while one prop's clock is changing state, the other is in the middle of its cycle.

I think they call this a ring wave generator.

yea... my brain was wak.

Post Edited (Clock Loop) : 12/1/2009 2:43:11 PM GMT

Cluso99 · 2009-12-01 11:31

None of this makes 10nS RAM access possible. That is, output an address and read back in the next clock cycle, meaning 10nS access.

I think MagIO means that the 2 props are in sync, both have the same data bus, and each partially generate part of the address lines. The program is ALMOST identical in each prop. What this does is get some free pins for other purposes and the spare cogs can drive these.

Clock Loop: I agree, it is better to use multiple props, keep the same development environment and reduce other chips.

MagIO2 · 2009-12-01 13:52

Yes ... I kind of hijacked this thread, as the idea would not allow 10nS RAM access. But I had this idea while writing an answer about what this effort in fast RAM access is good for. Sorry for that.

And yes, I mean that at least the parts that need RAM access are nearly exactly the same, but only one propeller really drives the other I/O lines. BOTH, of course, read all external signals so that they have the same basis for decisions. The other minor difference is that one prop generates the high bits of the address on its address-bus I/O pins and the other creates the low bits of the address on its address-bus I/O pins. So both props work internally with, say, 24- up to 32-bit addresses, but prop 1 puts bits 0-11 (0-15) on its address bus lines and the other one puts bits 12-23 (16-31) on its address bus lines.

Unlike FPGA/CPLD or latch solutions, you can access each RAM address with 2 instructions. Of course there are applications where bulk access is better, but for others random access is better.
Maybe a combined design is cool as well: the props write the address to external latched counters. If you want bulk access, you write the address and then give a clock signal to the counters.

Bill Henning · 2009-12-01 14:53

Practical use: high-resolution bitmapped graphics, or lower-resolution but 256-color graphics, on Prop1.

heater said...
Call me crazy, which you are free to do in light of ZiCog, but isn't using multiple 32 bit processors just to access a RAM completely nuts? Interesting puzzle yes, but is it of practical use? (or even non-practical use).

Bill Henning · 2009-12-01 14:55

I believe I can do somewhere between 2 and 2.5 XLMM MIPS on Prop1.

jazzed said...

When it comes down to it, ARM is better for running big programs. But I want to see how hard Propeller1 can be pushed for running it's pseudo-native LMM with 4M to 8M external memory. With a GNU class tool set emitting PASM LMM (which is a problem), that makes Linux or BSD possible on your favorite chip. Less than 1 MIPS LMM is unacceptable. Propeller II could do it easy enough with very few COG, but it ain't here ... and it won't be for a long time (nuts! yes, nuts!).

Parsec · 2009-12-01 15:30

All I can say is you all are seriously hardcore. It's not every day you can hear a debate surrounding RAM timing. I knew I should have gone for a stinkin' EE degree instead of taking the shortcut 4-year IS degree...!

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
No electrons were harmed while rendering this message.

jazzed · 2009-12-01 17:05

While Parallel Propellers allow more memory, the problem is the same regarding performance (unless you use them in a distributed computing way). You still need to have a way to fetch a 32 bit instruction in a reasonable amount of time and the only way to do that is to get more bytes per LMM loop using fast memory.

An LMM loop rolled and uncached produces 2.5MIPS at 80MHz ... cached could be higher, but I see a hit taking at least 6 instructions intra-cog without tagging.

You can easily get one byte per LMM loop with 50ns SRAM, but there is no way to get more than 1 byte from 50ns SRAM in the same time. You have to use unlatched fast SRAM, which comes in two flavors: PSRAM (20ns in default asynchronous mode) and 10ns standard SRAM. Synchronous PSRAM is interesting for burst reads at 7ns per byte after a 70ns setup, but it requires 3 more pins.
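The figures in this post follow from simple clock arithmetic. The sketch assumes 4 clocks per ordinary PASM instruction (50ns at 80MHz, the slot in which one byte can be read from 50ns SRAM) and a rolled LMM loop of 32 clocks per emulated instruction, which is what the quoted 2.5MIPS at 80MHz implies (80 / 2.5 = 32). The function names are hypothetical.

```python
CLOCKS_PER_PASM_INSTRUCTION = 4  # assumption: ordinary (non-hub) instruction


def pasm_instruction_ns(clock_mhz: float) -> float:
    """Duration of one 4-clock PASM instruction."""
    return CLOCKS_PER_PASM_INSTRUCTION * 1000.0 / clock_mhz


def lmm_mips(clock_mhz: float, clocks_per_lmm_instruction: int) -> float:
    """Emulated instructions per microsecond for a given LMM loop length."""
    return clock_mhz / clocks_per_lmm_instruction


def sram_accesses_per_slot(clock_mhz: float, sram_ns: float) -> int:
    """How many back-to-back sram_ns accesses fit in one PASM instruction slot.
    This is the SRAM's limit only, ignoring pin delays; a single cog still
    needs an instruction of its own to capture each byte."""
    return int(pasm_instruction_ns(clock_mhz) // sram_ns)


if __name__ == "__main__":
    print(pasm_instruction_ns(80))    # ns per PASM instruction at 80MHz
    print(lmm_mips(80, 32))           # rolled LMM loop
    print(sram_accesses_per_slot(80, 50), sram_accesses_per_slot(80, 10))
```

So 50ns parts can supply at most one byte per instruction slot, and only much faster parts (together with more cogs or a burst scheme) can supply more.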

Bill Henning · 2009-12-01 17:51

I believe I can build a multi-cog XLMM engine that can effectively do about 2.5 XLMM MIPS; however, I think it would take 6 cogs, require 8.4ns RAM, and take a performance hit on any non-sequential access, limiting its practicality.

CounterRotatingProps · 2009-12-01 19:13

May I ask a related (noob) memory question here? (A short answer would be perfect.)

If 10ns access is not possible, what IS the fastest possible (and stable) merely using an off the shelf prop and standard crystal ?
