Catalina - request for input regarding XMM support

RossH · 2009-06-23 13:54

@all,

I'm looking for input on future directions for XMM support in the Catalina C compiler. I'm working on this at the moment, and I appear to have reached a point from which I could take Catalina in any of several directions - most of which appear to be incompatible, and some of which (while attractive) may ultimately prove to be fairly pointless.

I'd appreciate any thoughts anyone may have about which way to go. There are a lot of very bright people on this forum, and somebody may be able to give me the pointer I need. But first I should explain my dilemma in a bit of detail, so bear with me for a couple of paragraphs:

First a little bit about compilers and program segments: Most compilers generate their output into multiple code and data segments. The definitions of the segments vary slightly from compiler to compiler, but Catalina is fairly typical - here is a brief definition of the Catalina segments and their usage:

CODE: a read-only segment containing executable code (most compilers do not generate self modifying code).
CNST: a read-only segment containing constant data (e.g. string literals, constant values, jump tables etc).
INIT: a read/write segment containing statically allocated data, such as global variables (for most procedural languages this segment is trivially small).
DATA: a read/write segment containing dynamically allocated data, such as stack and heap variables (this is where all local variables are allocated).
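
To make the four segments concrete, here is a small sketch of where the pieces of an ordinary C function would land under the definitions above. The function and variable names are made up for illustration, and the placement comments simply follow the descriptions in this post; they are not verified Catalina output.

```c
#include <stdlib.h>
#include <string.h>

/* CNST: read-only constant data such as string literals and tables. */
static const char greeting[] = "hello";

/* INIT: statically allocated read/write data (global variables). */
static int call_count = 0;

/* CODE: the executable instructions of the functions themselves. */
char *make_greeting(void)
{
    /* DATA: locals live on the stack, malloc'd memory on the heap. */
    size_t len = strlen(greeting);
    char *copy = malloc(len + 1);

    if (copy != NULL) {
        memcpy(copy, greeting, len + 1);   /* copy includes the '\0' */
        call_count++;
    }
    return copy;   /* caller must free() the result */
}

int greeting_calls(void)
{
    return call_count;
}
```

With only CODE in XMM RAM, everything above except the instructions of the two functions still has to fit in Hub RAM.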

The current release of Catalina only supports putting the CODE segment into XMM RAM. This allows programs to have code sizes up to 16Mb - but all the other segments must fit entirely within the 32k of Hub RAM.

The NEXT release of Catalina is likely to support having both the read-only segments (i.e. the CODE and CNST segments) in XMM RAM - but the read-write segments will still have to fit within the 32k of Hub RAM. While this incremental change may not sound like it will improve things much, in reality it does because it allows much larger real-world programs to be supported - for example, many games programs need large amounts of read-only constant data - i.e. most of the images, sounds, game logic etc - but only enough read/write data to store the current state of the user's interaction. And if you don't think games programs are "real-world" enough then substitute "internet applications" instead - the situation is quite similar.

However ... even this incremental change is turning out to be harder than I expected, and is requiring quite a significant rethink and redesign of the Catalina code generator. While I'm confident this particular change will prove to be both technically feasible and ultimately worthwhile, I'm now not so sure of where to go after that. Many people (including me, initially) would probably have said that the obvious next step is to also put the read/write segments in XMM RAM. But this may ultimately prove to be a bit pointless. Here's why:

For much the same reasons LMM programs execute slower than PASM programs, XMM programs will execute slower than LMM programs - i.e. the cost of accessing the RAM gets higher in each case. Precisely how much slower will depend on the XMM implementation, but XMM RAM access is only likely to approach the speed of LMM RAM access when something like an "autoincrement" addressing mode can be used - and the program is accessing memory sequentially. Whenever the program has to set a new random address before performing the next access, a "worst case" random access may be required, which might easily be 2 or 4 times slower than a "best case" sequential access. Fortunately, sequential access is more common than random access in both of the read-only segments (CNST and CODE). But even so, and even assuming the best possible XMM RAM implementation, I would expect such XMM programs to execute maybe 2 times slower than LMM programs - and even then only at the cost of dedicating at least 2 (and possibly more) cogs to managing the XMM RAM.

But it gets even worse for programs that have their read-write segments in XMM RAM as well as their read-only segments - they may be many times slower yet again. The problem here is not only that read/write segments are typically accessed randomly rather than sequentially - it is also that these accesses will be interleaved with the normally sequential code accesses, destroying the usefulness of any "autoincrement" addressing mode, and forcing nearly every XMM access to be a "worst case" access.
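
The interleaving effect can be sketched with a toy cost model. This is only an illustration: the costs of 1 unit for a sequential access and 4 units for a random access are assumed numbers picked from the "2 or 4 times slower" range above, and `trace_cost` is a hypothetical helper, not part of Catalina or any XMM driver.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed relative costs, chosen only for illustration. */
enum { SEQ_COST = 1, RAND_COST = 4 };

/* Total cost of a trace of long-aligned addresses on an XMM RAM that
   has ONE auto-incrementing address register. An access is "sequential"
   only if it hits the address the register already points at. */
unsigned long trace_cost(const unsigned long *addr, size_t n)
{
    unsigned long cost = 0;
    unsigned long next = 0;   /* address the register points at */
    int valid = 0;            /* register holds nothing useful at start */
    size_t i;

    assert(addr != NULL || n == 0);

    for (i = 0; i < n; i++) {
        cost += (valid && addr[i] == next) ? SEQ_COST : RAND_COST;
        next = addr[i] + 4;   /* register auto-increments past the long */
        valid = 1;
    }
    return cost;
}
```

Under these assumptions, four sequential code fetches (0x1000, 0x1004, 0x1008, 0x100C) cost 4 + 1 + 1 + 1 = 7 units. Interleave three of those fetches with three data accesses elsewhere in RAM and every one of the six accesses becomes a random access, for 24 units.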

Hence my dilemma - is there any point in doing all the work required to support read/write data segments in XMM if it ends up needing so many cogs (and the resulting code executes so slowly anyway) that it really makes no sense to use a Propeller for the job in the first place? Perhaps I should instead concentrate on faster access to read-only segments in XMM, and also on various consolidation and code optimization improvements? Concentrate on exploiting the Prop's unique features, and just live with the fact that the data size is limited to 32k? One thing to keep in mind here is that the Prop II will help solve this problem without any further work being required - just by supporting more Hub RAM than the Prop I.

Logically, the answer would have to be to defer further work on XMM - but on the other hand I'm not immune to the lure of being able to use Catalina to compile itself on the Prop - and that would definitely require the ability to put read/write segments in XMM (compilers typically being quite memory hungry).

There are some possible hardware solutions to this dilemma - one possibility would be to have XMM RAM that supports two address registers - one for sequential access and another for random access. Perhaps even "multi port" XMM RAM that can be independently accessed from several cogs. But as far as I know, no-one is working on such a thing - and in any case, such solutions may end up being so expensive that they also make it preferable to simply use another chip in place of the Propeller.

Anyone have any brilliant ideas? Or suggestions? Or comments? Am I missing something vital, or making any incorrect assumptions?

Ross.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina

Mike Huselton · 2009-06-23 14:19

Unless Parallax makes an announcement, Prop I will be bound by the 32k memory bandwidth PRACTICAL limit until next Spring, when I calculate the Prop II will be announced.
Thus, I would concentrate on optimizations to the LMM model. Physics just cannot be ignored and we cannot have everything we want.

Concentrate on the 95% of the applications the forum NEEDS (take a poll) and leave the 5% of Wild Blue Yonder apps alone.

A good working Macro utility, data structure utility and efficient packed binary string handling is a good area upon which to concentrate, for example.
A good B-Tree implementation for linked lists will be useful. Intellisense is great for compilers.

Maybe this is Largos territory.

March 2010 is just 9 months away. Do you want to still be working on a Catalina LMM and XMM then?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
JMH

Post Edited (James Michael Huselton) : 6/23/2009 2:39:47 PM GMT

Sapieha · 2009-06-23 14:43

Hi RossH.

What I understood: for XMM to hold both the read-only and the read/write blocks in external RAM, you need a new external RAM addressing scheme with 2 separate address registers - one for code and the other for the read/write data blocks.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Nothing is impossible, there are only different degrees of difficulty.
For every stupid question there is at least one intelligent answer.
Don't guess - ask instead.
If you don't ask you won't know.
If your gonna construct something, make it as simple as possible yet as versatile as possible.

Sapieha

WNed · 2009-06-23 14:52

I've got to agree with JMH. Polishing may not be the sexiest task on the list, but taking some time to tighten down and optimize what you already have will ultimately give the tool a broader appeal, and may just occupy your mind long enough for it to sort out the next step on its own...

Are you allowing for compiler directives so the developer can select which segments go into RAM? That way you're not making the choice at all, the person working a particular problem is.

Ned

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
"They may have computers, and other weapons of mass destruction." - Janet Reno

Bill Henning · 2009-06-23 15:35

Hi Ross,

Interesting post, understandable problem.

Ideally, I'd like Catalina to support it all... as it would make seamless compilation of standard C code easy. As you said, it would allow Catalina to run on the Prop, which is something I really want to see.

It would also allow me to compile Las for the prop (after I trim back a bit on how much memory I waste to make it blazingly fast)

I see the performance hit as a user choice - and I notice a close parallel to the evolution of memory models on the old 8086/80286 for C compilers under DOS... remember:

small data / small code
small data / large code
large data / small code
large data / large code

It even had performance penalties due to frequent segment register reloading!

As far as hardware solutions go, I've been playing with a paper design for a CPLD based memory controller with auto increment/decrement pointers, however even if I decide to proceed it would be a couple of months before it was available.

I personally like choice, so what I'd prefer would be a compiler switch for supporting the following modes:

small data / small code - LMM code, hub memory only
small data / large code - XMM code (+ consts later), hub data only
large data / small code - LMM code, leave XMM for data - good for graphics
large data / large code - XMM code, XMM data, take the performance hit - use "register" keyword to keep data in hub
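
As a rough sketch of what such a switch would have to select, the four modes above reduce to two independent placement decisions: where code lives and where data lives. The type and function names below are hypothetical and do not correspond to any existing Catalina option.

```c
#include <stdbool.h>

/* The four proposed memory models (hypothetical names). */
typedef enum {
    MODEL_SMALL_DATA_SMALL_CODE,   /* LMM code, hub data   */
    MODEL_SMALL_DATA_LARGE_CODE,   /* XMM code, hub data   */
    MODEL_LARGE_DATA_SMALL_CODE,   /* LMM code, XMM data   */
    MODEL_LARGE_DATA_LARGE_CODE    /* XMM code, XMM data   */
} mem_model;

/* Where each kind of segment ends up for a given model. */
typedef struct {
    bool code_in_xmm;   /* false: code runs from hub RAM as plain LMM */
    bool data_in_xmm;   /* false: read/write data stays in hub RAM    */
} placement;

placement place_segments(mem_model m)
{
    placement p;

    p.code_in_xmm = (m == MODEL_SMALL_DATA_LARGE_CODE ||
                     m == MODEL_LARGE_DATA_LARGE_CODE);
    p.data_in_xmm = (m == MODEL_LARGE_DATA_SMALL_CODE ||
                     m == MODEL_LARGE_DATA_LARGE_CODE);
    return p;
}
```

The table itself is trivial; the hard part is that each combination may want its own kernel and code generator behind it.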

I know it is a lot of extra work for you, however the benefits are great - especially with PropII, as 256KB of hub will not solve the "small data" problem for code like Las...

The parallels with the development of the PC are almost scary....

Best Regards,

Bill

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
my 6.250MHz custom Crystals now available - run your Propeller at 100MHz!
Las - Large model assembler for the Propeller (alpha version released)
Largos - a feature full nano operating system for the Propeller
www.mikronauts.com - a new blog about microcontrollers

jazzed · 2009-06-23 15:51

@Ross,

It seems to me that the 32K data model is reasonable for generic programs. If a system is designed that allows the coexistence of fast program data access and specialized buffer access, then you can take that path.

Having a second type of memory just for graphics sprites, etc. is interesting. As a "precedent", older Cisco core routers have fast-access packet memories, and the operating system is clearly designed with that in mind. So it is not unreasonable to expect a certain facility for graphics storage in code if the hardware is available (not sure how the details of that would work).

In the case of common memory, the time it takes to make decisions about whether or not to use an auto address increment access nullifies the performance enhancement derived from the approach except maybe in the case of buffer operations. The same is true for any caching schemes (having multi-bit hardware range comparators would help this greatly though ... a PropII wish list item :) ).

It always comes down to needing more pins or something else. At some point you just have to say: "this is what works reasonably well with what we have" and just do that.

Unless we're in a dream, we have to live within our means ... or find more appropriate means.

Thankfully, the overwhelming majority of XMM access for running LMM programs is reading long words. This is why having a read-only item like Flash would make sense for LMM.

Other CPUs have similar code/data ratios to XMM 512K/32K or 1MB/32K so at least staying with the 32K data model puts you in good company. If you really want to go beyond 1MB memory, things get complicated and slower one way or another.

In the long run, supporting multiple code/data models would be best as Bill mentions, but I'm sure even you have limits about what you're willing to do "just for fun."

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve

Propalyzer: Propeller PC Logic Analyzer
http://forums.parallax.com/showthread.php?p=788230

Ron Sutcliffe · 2009-06-23 16:37

Hi Ross,
I would not worry too much about it. Just draw a line in the sand and decide on the API for XMM. I am biased of course, but for what it's worth I would like to see it support a 2 X16 bus with an option to support four bit post inc/dec addressing. It would be hard to beat the Hydra Extreme for streaming Video and gaming. Tri-Blade is based around random access and makes great use of overlays. Come to think of it, there is jazzed, localroger, kurenoko (praxis possible) and myself who have some form of XMM boards up and running. I am sure there are more.

Some are already thinking in terms of +512K XMM. If that’s the case XMM will also be required for Prop-II so a new approach is required. It will likely use a CPLD to accommodate many of the features which have already been identified.

I think the Prop community can find their own way once an API is defined. I’m with James on this one, but you won’t please everyone.

Ron

Ron Sutcliffe · 2009-06-23 16:44

Wow,
Bill and Steve have already posted, my post is old news already.

Ron

Mike Huselton · 2009-06-23 16:57

PropII comm & memory access:

http://www.parallax.com/Portals/0/Downloads/mm/video/Webinar/2009-03-17-9a-Webinar-[18].wmv
http://www.parallax.com/Portals/0/Downloads/mm/video/Webinar/2009-03-17-4p-Webinar-[04].wmv

Bill Henning · 2009-06-23 17:18

You can add me to the "have some form of XMM boards up and running" list, details to be revealed at UPEW 2009

Ron Sutcliffe · 2009-06-23 21:08

@Bill
We are sadly disadvantaged living on the other side of the world, when it comes to all things Parallax. I hope that we will see and hear something of what you have to say at UPEW posted to this site.

Ron

BTW Bill, whilst I remember.
LAS does not like underscore byte separators (ie long %00000000_11111111_000000000 _11111111)

Post Edited (Ron Sutcliffe) : 6/23/2009 10:03:25 PM GMT

Bill Henning · 2009-06-23 21:57

Hi Ron,

I think videos will be posted... and I will be updating my site.

Thanks, I'll add support for underscores in numbers after UPEW.

RossH · 2009-06-23 23:37

@all,

Thanks for the feedback. I will consider the options while I'm finishing off the current XMM support (i.e. allowing read-only segments in XMM). I'll certainly also be spending quite a lot of time tidying things up (especially the XMM support) so that others can then extend Catalina more easily.

However ...

@Bill, @Wned, @jazzed, @Ron : I probably didn't make it clear enough that while I could probably support any ONE of the options you collectively suggest, what I am finding is that each one will probably require not only a different kernel, but also a different code generator. To be efficient enough to be useful, each one tends to require a slightly different approach to the code generation, and a slightly different set of kernel primitives - and there just doesn't seem to be enough space in the kernel to support them all at once. This is also what makes the idea of having a standardized "API" for XMM support less feasible than it might at first appear - having an API is all well and good, but it is quite possible that no single kernel could ever implement all the different types of functionality that might be required. I've even considered going back to a simpler LMM kernel - i.e. closer to Bill's original - but I find I'd have to give up too many things that I need for my own purposes.

@JMH : I think a poll is a good idea. I'll try and come up with a sensible set of options - or perhaps just Bill's list would do. I don't quite understand your point about adding a Macro utility, or B-trees etc. These things are readily available in C already. And (as far as I know) "intellisense" is mostly to do with the development environment, not the compiler. Am I missing something?

@Sapieha : Yes, that's what I think is required. Unless someone can think of another hardware-based solution?

In summary, if there was consensus on an obvious "best option" I'd be willing to go with that - at least until the Prop II arrives - but I probably can't offer all possible options. My own key requirements for Catalina are in fact very close to being met by the existing version - and (as @jazzed points out) I still have to work for a living, so there is a limit to what I can do "just for fun".

Ross

Cluso99 · 2009-06-24 04:50

Ross & All,

I have taken the approach that a CPLD is too complex a task to bother with, both from a hardware standpoint and software view. This is why I wanted fast random access to 512KB of contiguous RAM and no multiplexing. Extremely simple hardware, and I will offload all I/O (except microSD access = disk) to another prop. It also reduces the complexities of the interface to a simple driver to the second prop which will be intelligent with its own code. The prop is cheap enough to do this.

So the RAM, Prop and uSD section costs about $15 in parts, plus the PCB and assembly cost. Maybe some commercial profit as well. I unfortunately pre-announced this and something came up to delay it, although I have progressed some other ideas, but the SRAM interface remains the same. Unfortunately I am stuck with some other issues at this time, so I haven't progressed the PCB for production. The actual hardware design is complete - hey it's only 3 ICs. No EEPROM required.

This approach is more aligned to mainly a single cog execution model for XMM, except that the uSD code runs in a second cog, and the interface to the other intelligent peripheral (propeller) runs in a third cog.

I would envisage that this model would use a compiler which has code, const, init and data in XMM. Byte access can be done in 4 instructions and my recollection is 17 instructions for long access. This excludes the calling overhead. But I would assume that the compiler would just think that it used a memory driver and all memory was located somewhere which just replaces the hub access. This means the compiler would "think" all 512KB of memory was hub, but in reality the driver would say "no, it is really external SRAM".
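
A minimal sketch of that "memory driver" idea, as seen from the compiler's side, might look like the following. Everything here is an assumption made for illustration: the external SRAM is simulated with an array, it is assumed to be byte-wide (so a long takes four byte transfers), and the function names are invented. The real driver would be PASM toggling pins, not C.

```c
#include <stdint.h>

#define XMM_SIZE (512u * 1024u)   /* 512KB of external SRAM */

/* Stand-in for the external SRAM chip (simulation only). */
static uint8_t sram[XMM_SIZE];

uint8_t xmm_read_byte(uint32_t addr)
{
    return sram[addr % XMM_SIZE];
}

void xmm_write_byte(uint32_t addr, uint8_t value)
{
    sram[addr % XMM_SIZE] = value;
}

/* A long is assembled from four byte accesses, least significant
   byte first, matching the Propeller's little-endian layout. */
uint32_t xmm_read_long(uint32_t addr)
{
    uint32_t value = 0;
    int i;

    for (i = 3; i >= 0; i--)
        value = (value << 8) | xmm_read_byte(addr + (uint32_t)i);
    return value;
}

void xmm_write_long(uint32_t addr, uint32_t value)
{
    uint32_t i;

    for (i = 0; i < 4; i++) {
        xmm_write_byte(addr + i, (uint8_t)(value & 0xFFu));
        value >>= 8;
    }
}
```

If the code generator emitted calls like these wherever it would otherwise emit a hub read or write, the program would see one flat 512KB address space, at the cost of the extra instructions per access.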

I have no doubts Chip will resolve the external memory issue on PropII, but that is tooooo far away.

We need something simple now, and that anyone can build. I just want the jump for my idea before I release the whole schematic, although I have discussed with some others offline.

And of course, my goal has always been to have a Prop operating system and compilation on the prop.

Just my 2 cents

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:

· Home of the MultiBladeProps: TriBladeProp, RamBlade, TwinBlade, SixBlade, website
· Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator)
· Prop Tools under Development or Completed (Index)
· Emulators: Micros eg Altair, and Terminals eg VT100 (Index) ZiCog (Z80), MoCog (6809)
· Search the Propeller forums (via Google)
My cruising website is: www.bluemagic.biz
MultiBladeProp is: www.bluemagic.biz/cluso.htm

Mike Huselton · 2009-06-24 06:08

RossH - I got my IDE and compilers mixed up

Cluso99 - I totally agree. RAM is worth dedicating one P8 chip to the control & access functions.
Could you PM me a schematic or at least a parts list?

RossH · 2009-06-24 06:40

Hi Cluso,

Interesting - I'd have to see a circuit diagram, but if I understand correctly, you're saying you can read or write any long in XMM RAM using 17 "cog" instructions? That's between 3 and 10 times slower than accessing hub RAM (depending on the synchronization between the cog and the hub) but it's better than I expected. That makes your solution a very attractive option for a native PASM program that needs access to lots of RAM.

But if my quick calculations are correct it would make typical LMM instruction cycles (e.g. read an instruction from RAM which then writes a long to some random RAM location) around 5 times slower when executed from XMM RAM than when executed from hub RAM.

Knowing how experienced you are with hardware, I'd say your solution is probably close to optimal. Having random access to XMM from native PASM only 3 times slower than hub RAM (best case) or 10 times slower (worst case) certainly makes your solution a winner when programming in PASM - but is it fast enough to justify the effort required to allow Catalina to use it? LMM programs are already at least 4 times slower than native PASM, which means we're looking at C programs executing something like 20 times slower than the equivalent program written in native PASM. Yes, it can be done - but is it worthwhile? As yet, I'm just not sure - hence the dilemma (BTW - we're now getting down to the levels where C may actually execute slower than SPIN!)
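
The arithmetic behind those ratios can be sketched as below. The inputs are assumptions rather than measurements: 4 clocks per cog instruction and 7 to 22 clocks per hub access are figures I am taking from the Propeller documentation as I remember it, 17 instructions per XMM long access is Cluso's recollection, and the 4x and 5x factors are the rough estimates quoted above.

```c
/* Assumed: a cog instruction takes 4 system clocks. */
#define CLOCKS_PER_INSTR 4

/* How many times slower an XMM access of `xmm_instrs` cog instructions
   is than a hub access taking `hub_clocks` system clocks. */
double xmm_vs_hub(int xmm_instrs, int hub_clocks)
{
    return (double)(xmm_instrs * CLOCKS_PER_INSTR) / (double)hub_clocks;
}

/* Slowdown factors multiply: C running as LMM from XMM, relative to
   native PASM. */
double c_vs_pasm(double lmm_vs_pasm, double xmm_vs_hub_lmm)
{
    return lmm_vs_pasm * xmm_vs_hub_lmm;
}
```

With those inputs, 17 instructions is 68 clocks, which is about 3.1 times a 22-clock hub access and about 9.7 times a 7-clock one, and 4 x 5 gives the factor of 20 relative to native PASM.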

However, you have made one good point - I hadn't properly considered the idea of just ignoring the 32k of hub RAM altogether - it is partly trying to make use of that RAM in addition to the XMM RAM that is really making the whole thing so difficult. I guess I have some inbuilt resistance to the idea of not using the hub RAM for anything - it's so much faster than XMM RAM that I feel any compiler that doesn't make SOME use of it must be sub-optimal somehow!

Ross.

jazzed · 2009-06-24 06:41

I'm afraid even a direct address/data scheme is still too slow to write and read program variables. Many such variables will be more than a byte by nature. Of course if you're just trying to replicate something that was done 30 years ago, that is inconsequential.

Using just one cog for word/long read and especially write is just foolish. Write time can be literally cut in half with two cogs because of the write strobe and address changes for long access. Reads are less complex than writes, so they do not benefit as much as writes with two cogs. There is a solution that can optimize reads further with fast enough SRAM, but it is cog hungry. Of course if you have a propeller doing nothing but code execution and memory access, using up cogs is not a problem.

I find some efforts have not been encouraged or have been outright suppressed because they do not fit in a particular world view. I find that very troubling.

RossH · 2009-06-24 06:45

@jazzed,

To quote a line that W9GFO may be familiar with - "we will not be suppressing anyone's opinion here today"

Ross.

potatohead · 2009-06-24 06:46

It's worth noting that Prop II does not eliminate Prop I.

Meaning, if the result can be usable and reasonably practical, it's probably worth doing on Prop I, which will remain favorably indicated on a lot of projects, Prop II or not.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Wiki: Share the coolness!
Chat in real time with other Propellerheads on IRC #propeller @ freenode.net
Safety Tip: Life is as good as YOU think it is!

RossH · 2009-06-24 06:56

@potatohead,

Agreed - the Prop I will continue to be an extremely useful chip in its own right, and should continue to be supported. But the advent of Prop II may make the idea of using XMM RAM on the Prop I (for anything other than pure data storage) seem a bit silly - it wouldn't make much sense to use it as program storage when the Prop II would be so much faster - and "Prop I + XMM" may be more expensive than "Prop II" anyway.

jazzed · 2009-06-24 07:03

But Ross, Prop-II will only have 256KB of hub ram as far as we know. It is a good chunk for data "segment", but program space will still be very limited. I fully expect that some XMM facility will still be needed for Prop-II although it won't be as pressing a problem because of the number of pins and performance factors.

RossH · 2009-06-24 07:12

@jazzed,

Yes, I agree. But maybe the whole picture is as follows:

- If you need only 32K total RAM you use a Prop I.

- If you need more than 32K but less than 256K total RAM you use a Prop II.

- If you need more than 256K in total but still under 256K data space then use a Prop II + XMM.

But if you need more than 256K data space, then maybe the right answer is not to use a Prop at all.

This would seem to imply that spending time supporting data segments in XMM may be a waste of effort.
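
Written out as a decision function, the picture above looks like this. It is only a restatement of the list under assumed boundaries (sizes in KB, with exactly 256K treated as still fitting a Prop II); the names are invented for illustration.

```c
typedef enum {
    USE_PROP1,       /* everything fits in 32K of hub RAM          */
    USE_PROP2,       /* everything fits in 256K of hub RAM         */
    USE_PROP2_XMM,   /* code spills into XMM, data stays in hub    */
    USE_OTHER_CHIP   /* data alone exceeds what a Prop II can hold */
} platform;

/* total_kb: code plus data; data_kb: read/write data only. */
platform choose_platform(unsigned long total_kb, unsigned long data_kb)
{
    if (data_kb > 256)
        return USE_OTHER_CHIP;
    if (total_kb <= 32)
        return USE_PROP1;
    if (total_kb <= 256)
        return USE_PROP2;
    return USE_PROP2_XMM;
}
```

Note that no branch ever needs read/write data in XMM, which is the point being made.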

Ross.

RossH · 2009-06-24 07:21

Sorry, didn't mean to be too heretical in my last post.

Here's an idea - what about using one Prop II to supply XMM RAM to another Prop II? (Actually, it's not my idea - I was inspired by Cluso's TriBladeProp).

What I mean is having two or more Prop II's tied together - one may be doing mostly I/O and not need much RAM, but it can use one or more spare cogs as a "memory controller" to serve up its spare RAM to another Prop II - to which that RAM appears much as any other XMM. You could probably daisy chain lots of these together.

Ross.

Cluso99 · 2009-06-24 07:30

Steve: I am not ignoring the other methods. They have their own place. I have an opinion and I would be delighted to be proved wrong. I am just not going down that path, so it is better I ignore it than criticise. However, if by pushing my method, someone comes up with a better solution, we all benefit.

Ross: Are all your accesses going to be longs? If so, then probably you are right with the speed calculation. Otherwise, there may be hidden benefits. Somewhere I got the impression that there were word accesses.

I had hoped that to use my method would be very simple to totally substitute hub ram with external sram only by modifying the driver and allowing for 512KB. Isn't this the case???

BTW: At 100MHz I think I will have to insert a nop delay in reading and writing to sram. I haven't redone the timing yet. Of course, this does not translate to increasing long access as some instructions can be moved to take advantage of this.

What we need is a Prop I with 64 I/O and a new PLL section to get to 300MHz :D

Mike Huselton · 2009-06-24 07:31

You can't make a single chip do a motherboard's work. Are we losing sight of the economy and objective of using a single chip?
Sure, I can keep adding subsystems until I have poorly replicated a mini ATX PC motherboard. What is the point? Am I missing the point?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
JMH

jazzed · 2009-06-24 08:01

@Ross, Depending upon Prop-II cost, using two or more together makes lots of sense in boosting memory resources. Chip mentioned serial links could go as fast as 12 M bytes/second on Prop-II (not sure if that is only intra-Chip or not). So, for IPC etc... that's a big win, but a parallel bus would still be needed for fetching program code or accessing data.

@James, you nailed it pretty good. PC motherboards are practically free. There is a point I think where it would not make sense to use even a Prop-II for an application, but that should be obvious to the most casual observer ... unless the observer is blind. Still, there are many things possible for either Propeller.

@Ray, I will be one of the first customers for your 2-blade board. There is only so much that one can do with 8 cogs. Using the second propeller for a graphics engine with quick access to sprite memory, etc... would be pretty sweet. Have you considered bringing out pins to a header for an optional SRAM module? That would be very attractive where fast access to data like sprites is needed. Heck one could even create really nice Windows-like GUIs given that :) It is amazing how nice the UI is for today's phones .... I've made old school display stuff on Propeller with an LCD, but the really nice stuff takes lots more power.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve

Propalyzer: Propeller PC Logic Analyzer
http://forums.parallax.com/showthread.php?p=788230

Cluso99 · 2009-06-24 08:17

JMH: I don't want it to be too complex, which is why I said 1 x Prop + 1 x 512KB SRAM + 1 other chip, plus a normal prop setup. But yes, we are approaching other systems which are better suited to the task. But if you want the same programming base as the prop for the peripherals, then it is a good choice.

And.... we are nuts :(


Mike Huselton · 2009-06-24 08:22

Cluso99,

Okay - what is the third mystery chip?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
JMH

Cluso99 · 2009-06-24 08:33

nothing special - just a quirk for speed - my surprise :)

Ok I will post an idea - nothing done yet (and not the chip) - see the RamBlade thread shortly because it is really OT here.


Post Edited (Cluso99) : 6/24/2009 8:41:48 AM GMT

Sapieha · 2009-06-24 08:41

Hi James Michael Huselton.

Overclocking and adding of Memory.
It is not meant to replace a PC.

BUT it is VERY FUN

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Nothing is impossible, there are only different degrees of difficulty.
For every stupid question there is at least one intelligent answer.
Don't guess - ask instead.
If you don't ask you won't know.
If you're gonna construct something, make it as simple as possible yet as versatile as possible.

Sapieha

RossH · 2009-06-24 08:52

@Cluso,

No, not all data accesses will be long, but all code accesses are, and the majority of data accesses are as well - if I had to guess, I'd say a typical Catalina program might have something like an 85%-5%-10% distribution between long, word and byte accesses. In Catalina, "short" is the only word-sized data type.

The preponderance of longs for instructions, for all registers, for most data types, and for most temporary variables - and also for nearly all stack and frame access - means that even programs that do mostly byte or character manipulations would probably still have a distribution something like 75%-5%-25%
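
A rough sketch of what a long-heavy mix means for the speed calculation. The per-access costs here are an assumption, not something measured in this thread: it supposes a byte-wide external RAM bus where every byte moved costs one bus transfer, so a long costs 4, a word 2 and a byte 1. The function and constant names are made up for illustration.

```c
#include <stdio.h>

/* ASSUMPTION (not from the thread): a byte-wide external RAM bus where each
   byte moved costs one bus transfer, so long = 4, word = 2, byte = 1. */
enum { LONG_XFERS = 4, WORD_XFERS = 2, BYTE_XFERS = 1 };

/* Average bus transfers per access, scaled by 100, for a mix given in
   whole percentages that must sum to 100. Returns -1 for an invalid mix. */
int avg_transfers_x100(int pct_long, int pct_word, int pct_byte)
{
    if (pct_long < 0 || pct_word < 0 || pct_byte < 0 ||
        pct_long + pct_word + pct_byte != 100)
        return -1;
    /* percent * transfers is already "hundredths of a transfer" */
    return pct_long * LONG_XFERS + pct_word * WORD_XFERS + pct_byte * BYTE_XFERS;
}

/* Print the average as a decimal number of transfers per access. */
void print_mix(const char *label, int pct_long, int pct_word, int pct_byte)
{
    int x100 = avg_transfers_x100(pct_long, pct_word, pct_byte);
    if (x100 < 0)
        printf("%s: percentages must sum to 100\n", label);
    else
        printf("%s: %d.%02d transfers per access\n", label, x100 / 100, x100 % 100);
}
```

With the 85%-5%-10% guess, `avg_transfers_x100(85, 5, 10)` returns 360, i.e. 3.60 transfers per access against 4.00 for an all-long mix. Under these assumed costs, the cheaper word and byte accesses save only about 10%.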

Ross.

