A technique for communicating with cogs from any language

RossH · 2012-01-08 05:34

All,

In this recent thread, we discussed the possibility of developing a standard for communicating with cogs from any language. Some people were for it, and some against it. Others simply couldn't see the point of it.

For those who were interested, I had intended to start another thread distilling all the good suggestions and proposals made in the earlier thread in order to continue the discussion - but since I found myself out of internet contact for the past few weeks, I decided instead to just implement as many as I could to see how they might work.

I have created a draft document (attached) which describes an implementation of as many of the suggestions as I could accommodate. I will incorporate additional stuff as I find the time, so any comments or additional suggestions are welcome. If you feel I have missed (or misunderstood) something significant, please let me know.

The Spin examples in the appendices of the document should compile with any Spin compiler (I used the Parallax one) and should run unmodified on a C3. The usual modifications (e.g. clock speed, mode and pins) will be required on other Propeller platforms. I have also included a bit of work on C examples, but since you will need the (as yet unreleased) Catalina 3.5 to compile them, they are currently not complete and are included mainly for illustration purposes (i.e. to show how the technique applies to other languages). This is probably the document's biggest drawback at the moment - the techniques are much "wordier" in Spin than they are in most other languages, and the benefits of them are not so evident since there are generally other ways to achieve the same thing in Spin (provided you only ever want to use Spin). However, eventually the document will contain fully worked and interoperable PASM, Spin and C examples - and perhaps other languages if I get time - so hopefully the point of the exercise should become clearer.

Some of you may notice I have stuck to some of the existing Catalina terminology - this just made the document easier for me to write. The examples do not require you to have Catalina installed to compile and run them, and the document goes right back to start with the very basics of Propeller programs, and does not assume any specific Catalina knowledge (as far as I can tell!). For those of you who worry about the terminology, a simple global replace of "registry" with "catalog" and "plugin" with "coglet" would suffice to divorce the document completely from Catalina.

For those of you that are familiar with Catalina, some of the early stuff will look familiar - however there are some subtle but significant improvements over Catalina's implementation, based mostly on the suggestions of Heater, Mike and Eric. These improvements will make their way into Catalina 3.5, which will run faster than previous versions of Catalina as a result.

Ross.

EDIT: Document and source files updated to match Catalina 3.5.

obrienm · 2012-01-08 10:24

Ross,
I am very interested in this specification work. Language and even device independence is always good; this is why XML is a good enterprise markup language. Reviewing your analysis document now.
thank you
/michael

Heater. · 2012-01-08 11:12

I'm a bit worried, this spec is 35 pages long. A bit more than the half page of recommendations that have been made in the past. It will take some time to digest.

obrienm

XML is a good enterprise markup language.

In what way exactly? (No don't answer it's way off topic here:))

RossH · 2012-01-08 14:46

Heater. wrote: »

I'm a bit worried, this spec is 35 pages long.

Yes, I may have gotten carried away while I was on holiday and had nothing else to do.

But it is so large mainly because I decided to include all the source code in the document itself - the last 15 pages are just appendices listing the source code. Maybe I should just have included a zip file - I may do that in the final version.

Ross.

Dr_Acula · 2012-01-08 14:47

This is very good, Ross. I've just finished reading through the document. I like the examples near the end.

On that other thread there was a long discussion about whether this would be useful or not. I wonder if the next step might be to think about a package of objects that are available in Spin? Examples might be mouse, keyboard, various displays, some maths, serial port and an SD card driver. If you had a critical mass of such drivers, any user would be able to use this package and then if they wanted to add one more, there would be a compelling reason to stick to this standard.

Just my 2c worth! I'm in the middle of coding an object at the moment, based originally on Cluso's work, and thinking about how one could translate Cluso's standard to your standard. I can talk about the problems one might find along the way.

RossH · 2012-01-08 14:48

Heater. wrote: »

obrienm

In what way exactly? (No don't answer it's way off topic here:))

I think michael was just using XML as an example of something that is successful because it is language and vendor independent. You can of course argue whether XML is a good standard or not (but not in this thread, please!).

Ross.

Dr_Acula · 2012-01-08 15:00

Ok, technical questions:

Is the ability to separate out the spin and pasm part and load and reload pasm into cogs an integral part of the standard?

If it is, then that implies that the compiled pasm part has to be loaded from somewhere, and if so, that implies some code and methods to do this. The first place one might store compiled pasm code is on an SD card, so that implies an SD driver as part of the package. Which SD driver would you recommend, and how much code space does a minimum driver need?

But there are other places to store this information. One other place is the high 32k of an eeprom. Do you have some code that can read and write blocks of pasm code to high eeprom? (I see there is an active thread at the moment on this very topic. Presumably the data has to come from somewhere and ultimately it probably is from an SD card)

So an SD driver would be an integral part of the package?

OR - is the separate pasm/spin thing not essential?

RossH · 2012-01-08 15:02

Dr_Acula wrote: »

On that other thread there was a long discussion about whether this would be useful or not. I wonder if the next step might be to think about a package of objects that are available in Spin? Examples might be mouse, keyboard, various displays, some maths, serial port and an SD card driver. If you had a critical mass of such drivers, any user would be able to use this package and then if they wanted to add one more, there would be a compelling reason to stick to this standard.

Hi Dr_A,

Yes, there are those who won't see the use of it, but there are enough people who do to make it worthwhile. I'm quite excited by how neatly it all fits together as a framework, and the fact that you can add this kind of framework so simply and naturally to the Propeller made me realize all over again just how well engineered the chip actually is. For me, it has also crystallized a lot of the things I was trying to do with Catalina, and puts them on a much sounder foundation. A good example is Catalina's "proxy" drivers - these started out as a bit of a curiosity, but in this new framework they now seem like a perfectly natural method of making software work in a multi-propeller environment.

And yes, I do intend to add a larger set of plugins (e.g. keyboard, mouse and TV driver). I just haven't had time yet. Also, I didn't want to get too far ahead in case someone came up with any more good suggestions that meant I would have to rewrite the whole thing.

Ross.

Cluso99 · 2012-01-08 15:09

Don't have time for reading yet, but I will later in the week. This should be a good start Ross. While I like code examples embedded in a doc, 35 pages will frighten a lot of people.

I agree with Drac that a basic set of complying drivers will start the job nicely.

RossH · 2012-01-08 15:53

Dr_Acula wrote: »

Ok, technical questions:

Is the ability to separate out the spin and pasm part and load and reload pasm into cogs an integral part of the standard?

Yes

Dr_Acula wrote: »

If it is, then that implies that the compiled pasm part has to be loaded from somewhere, and if so, that implies some code and methods to do this. The first place one might store compiled pasm code is on an SD card, so that implies an SD driver as part of the package. Which SD driver would you recommend, and how much code space does a minimum driver need?

Yes. The examples in the document only load plugins from Hub RAM (via Spin) but the document discusses the option of loading from other media. I just didn't want to get too complicated too soon.

Dr_Acula wrote: »

But there are other places to store this information. One other place is the high 32k of an eeprom. Do you have some code that can read and write blocks of pasm code to high eeprom? (I see there is an active thread at the moment on this very topic. Presumably the data has to come from somewhere and ultimately it probably is from an SD card)

That's a good idea, since most people have a 64k EEPROM. Perhaps I'll include that as an example.

Dr_Acula wrote: »

So an SD driver would be an integral part of the package?

Eventually, yes.

RossH · 2012-01-08 15:54

Cluso99 wrote: »

... 35 pages will frighten a lot of people.

Only those that weren't really interested in the first place.

Ross.

Dr_Acula · 2012-01-08 16:07

Only those that weren't really interested in the first place

I like that!

Ok, for those of us who think this is great... One thing I find myself doing often when starting up a new project (either in Spin or C) is creating a skeleton project. For me, this generally involves a keyboard driver, mouse, serial port (for debugging), some sort of display (TV or VGA) and an SD driver.

Then once that is all put together, it is just one line of code print 'Hello World'.

I wonder what such a skeleton program would look like using your method?

In my experience, such a program can consume half to three quarters of hub memory space. But your system is already going to use a lot less because if you load the pasm part off the sd card, you will save 2k for the mouse, 2k for the keyboard, 2k for the display driver, maybe 2k for the sd card, 2k for the serial port, so that could be 8k saved right there. And you could save even more if the sd driver were a simpler driver than, say, the full bells and whistles one used in Kyedos.

An example of the 'old method' vs the 'new method' in terms of memory savings could be a great demonstration of why this package makes sense.

RossH · 2012-01-08 17:23

Dr_Acula wrote: »

Ok, for those of us who think this is great... One thing I find myself doing often when starting up a new project (either in Spin or C) is creating a skeleton project. For me, this generally involves a keyboard driver, mouse, serial port (for debugging), some sort of display (TV or VGA) and an SD driver.

Then once that is all put together, it is just one line of code print 'Hello World'.

I wonder what such a skeleton program would look like using your method?

Hi Dr_A,

The good thing about this technique is that I can tell you what such a program might look like even in the absence of any of the necessary plugins (which would of course have to differ from platform to platform). If I were programming this in Spin, I would put all the registry initialization and plugin loading in a Hardware Abstraction Layer object called something like "HAL.spin" (much the same way Catalina does) and then a portable "Hello World" application becomes trivial. Here is one possibility:

CON
''============================================================================== 
'' Example Spin "Hello, World" program.
''============================================================================== 

_CLKFREQ = Platform#CLOCKFREQ 
_CLKMODE = Platform#CLOCKMODE 
_STACK   = Platform#STACKSPACE 

OBJ 
  Platform: "Platform.spin"   ' this object contains all the platform-specific stuff
  HAL     : "HAL.spin"        ' this sets up the registry and loads plugins appropriate to this platform
  Display : "Display.spin"    ' this is a wrapper object for whatever display plugin is loaded

PUB Start

  ' Set up our Hardware Abstraction Layer (which will load the plugins)
  HAL.Start

  ' use the services of the Display plugin - on some Propellers this may be a VGA 
  ' display, while on others it may be a TV display or a serial terminal.
  Display.Print(STRING("Hello World"))

Of course, this does not really appear to be saying much - until you realize that you now have a Propeller program that is completely portable to any platform, and also that you can now use exactly the same HAL (i.e. the same set of plugins, offering exactly the same set of services) just as easily from any language, not just from Spin.

Ross.

pjv · 2012-01-08 17:30

Hello Ross;

I have been following this thread with some interest... not so much as a potential user, (because I do not know, nor am I interested in learning the C language) but more as it might pertain to other assembler code that I am developing. My experience has been mostly as a high speed embedded assembler programmer, but have now learned enough Spin to get by.

So my particular interest is in squeezing a lot out of each Cog with low latency "lean and mean" drivers, and in that pursuit I have developed a real-time co-operative scheduler that runs in a Cog, and permits several (like 8 or so) driver threads to operate simultaneously in that one Cog. The scheduler has a timing resolution of 1 uSec, and each driver/thread operates independently of the timing of any other thread. And hence each thread is written as an independent piece of code, albeit with some co-operation rules. The scheduler is fully self contained, and consumes approximately 50 longs of code plus 10 registers of the Cog's 512 long memory. The functionality of the scheduler permits the drivers to be very short, permitting a significant repertoire to be loaded at one time.

There are two versions of the scheduler. The first and simplest one requires that the assembler driver codes are loaded into the Cog at compile time. The second version (still under development) can dynamically load assembly codes at run time into a Cog running that scheduler, all while the Cog continues un-impacted operation of drivers already running. In this manner a Cog could easily dismiss drivers no longer required, and load further drivers without much limitation. Tests on driver loading speeds through the scheduler were typically less than 100 uSec.

Since there may be 8 Cogs operating in this manner, (and please remember my interest lies in extreme performance) I have chosen to "communicate" with these Cogs in a very simple manner... a fixed and dedicated piece of hub Ram comprising 4 contiguous longs for each Cog. The base location of each of those interface blocks is a one byte flag register where Spin can signal any of 8 threads in that Cog to start/resume or stop code execution, and the scheduler writes back to that same register the running status of each thread in the Cog. Spin is pretty slow... typically 20-ish uSec per simple instruction, but the scheduler replies to the Spin flag command in about 3 to 5 uSec, depending on what is already happening in the Cog. Also, on a thread concluding some operation such as finishing a serial transmission, the corresponding bit in the flag register is cleared in 2 or 3 uSecs, signalling Spin that the operation is finished, and Spin can continue with its activities. The remaining 3 bytes and longs of each block are driver specific, typically indicating a sub-function for the thread to perform, as well as Hub address specifications for buffers and the like.
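A minimal C sketch of the per-cog interface block described above may help readers picture it. It assumes the four longs are laid out as one flag byte, three driver-specific bytes and three driver-specific longs. The names `CogMailbox`, `thread_start`, `thread_busy` and `thread_done` are hypothetical and are not taken from pjv's scheduler; the exact meaning of a set or cleared bit is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of one 16-byte (4-long) interface block per cog. */
typedef struct {
    volatile uint8_t  flags;     /* bit n set = thread n requested / still running */
    volatile uint8_t  param[3];  /* driver specific, e.g. a sub-function code      */
    volatile uint32_t data[3];   /* driver specific, e.g. hub buffer addresses     */
} CogMailbox;

/* Caller side: ask thread n (0..7) of the cog to start or resume. */
static void thread_start(CogMailbox *mb, unsigned n)
{
    assert(n < 8);
    mb->flags |= (uint8_t)(1u << n);
}

/* Caller side: returns 1 while thread n has not yet reported completion. */
static int thread_busy(const CogMailbox *mb, unsigned n)
{
    assert(n < 8);
    return (int)((mb->flags >> n) & 1u);
}

/* Stand-in for the scheduler side (PASM in reality): thread n has finished. */
static void thread_done(CogMailbox *mb, unsigned n)
{
    assert(n < 8);
    mb->flags &= (uint8_t)~(1u << n);
}
```

On real hardware the flag byte would be updated from two cogs, so the read-modify-write shown here would need care (or a lock); the sketch ignores that.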

So my reason for posting here is to see if there is any value or interest in this multi-threading scheduler approach to put some convenience and cog-saving performance into the growing trend of modularity, and how it might be impacted by what you are up to..... Frankly I have not much comprehension of what that really is, as I am personally not a believer in the applicability of large C programs for small embedded processors, and have chosen not to align my thinking in those directions. Instead I take great pleasure in writing very compact high performance assembler code, and I believe my current scheduler and drivers might be of benefit to others, especially if they become Cog constrained, as larger applications are apt to do.

Your comments please ?

Cheers,

Peter (pjv).

RossH · 2012-01-08 17:54

pjv wrote: »

Hello Ross;

...

So my reason for posting here is to see if there is any value or interest in this multi-threading scheduler approach to put some convenience and cog-saving performance into the growing trend of modularity, and how it might be impacted by what you are up to..... Frankly I have not much comprehension of what that really is, as I am personally not a believer in the applicability of large C programs for small embedded processors, and have chosen not to align my thinking in those directions. Instead I take great pleasure in writing very compact high performance assembler code, and I believe my current scheduler and drivers might be of benefit to others, especially if they become Cog constrained, as larger applications are apt to do.

Your comments please ?

Cheers,

Peter (pjv).

Hi Peter,

I'm aware of your scheduler, and I think it would be absolutely ideal for an application that needs to be written entirely in PASM (e.g. where you need absolute top speed). It would also be ideal for use within a PASM plugin - but the main topic of this thread is how to make these types of programs accessible from other languages, which is how most Propeller applications are written in practice (in Spin for most people). While I quite like programming directly in PASM, it is hard for most of us to write large, complex or portable applications that way.

But you do raise a good point - I'll have a think about whether I may be limiting myself too much by assuming only one thread in each cog would need to communicate with other cogs (after all, I specifically allow for high-level language programs to be multi-threaded - why not plugins!).

Then our techniques would be quite compatible.

Ross.

RossH · 2012-01-08 18:01

All,

kuroneko has pointed out an error on page 4 of my document. I say on that page that the JMP instruction in the following code is optional if the cog program is started from Spin:

DAT
         ORG  0
entry    JMP  #cogstart  ' jump over initialization data
coginit1 LONG 0          ' <- overwrite before starting cog (offset=4)
coginit2 LONG 0          ' <- overwrite before starting cog (offset=8)
                         ' <- etc
cogstart                 ' <- cog program code starts here

This is of course complete twaddle - the JMP instruction is always required. I'll remove that comment in the next version.
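As a hedged illustration of the "overwrite before starting cog" comments in the listing, here is a small C sketch that patches the two initialization longs in a hub copy of the cog image. The function name `patch_cog_image` and its parameters are hypothetical; actually launching the cog is platform specific and is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offsets of the initialization longs, as given in the listing. */
#define COGINIT1_OFFSET 4u
#define COGINIT2_OFFSET 8u

/* Patch a hub copy of the cog image before it is loaded into a cog.
 * Long 0 holds the JMP #cogstart and is deliberately left untouched,
 * which is why the JMP is always required: execution starts at long 0
 * and must skip over the data longs that follow it. */
static void patch_cog_image(uint32_t *image, size_t longs,
                            uint32_t init1, uint32_t init2)
{
    assert(longs > COGINIT2_OFFSET / 4u);
    image[COGINIT1_OFFSET / 4u] = init1;
    image[COGINIT2_OFFSET / 4u] = init2;
}
```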

Ross.

Cluso99 · 2012-01-08 23:24

Peter (pjv): I would see your multitask pasm code as a solution to cog limitations. Cogs will be in short supply in a number of cases, particularly where pasm speed is required, but it will not be the fastest pasm code (to allow for the multitasking). I could see I2C, SPI, UARTs, Kbd, Mouse, etc all as objects that would benefit from being able to run in a multitasking cog. And those benefits will be realised not just with Spin, but other languages including Catalina. So your development is important and strategic.

General:
In reality here, we are using the prop in areas where it was not intended. (intended as a small micro with programmable peripherals and multiple small cores) But, we have all found that this little chip is capable of soooo much mooore! Otherwise, why on earth would we be doing this with a prop instead of an ARM or similar.

So, we have chosen to go this route. Hence, Ross is trying his best to make everyone see that it is beneficial to have at least one standard cross-platform method. And for the record, I agree. We have sat on our hands for far too long! Thanks Ross - I will look asap and post my comments.

BTW you should check your (snail) mailbox today/tomorrow.

Cluso99 · 2012-01-09 00:40

Ross: On page 11
PUB SetResponse(cog, response)
This method sets the first long of the communications block allocated for the designated cog to the specified response (which should be non-zero).

I think this should be "... sets the second long ..."

Ross: I have read the doc. Looks good but I need more time to read and digest again.

RossH · 2012-01-09 01:36

Cluso99 wrote: »

Ross: On page 11
PUB SetResponse(cog, response)
This method sets the first long of the communications block allocated for the designated cog to the specified response (which should be non-zero).

I think this should be "... sets the second long ..."

Quite right. I'll add some errata to the first post in this thread until I get around to updating the document.

Ross.

RossH · 2012-01-10 01:51

All,

I've updated the document (first post in this thread) to include the couple of errors people have noted. Also, I've added all the Spin examples as a zip file to that post, to make it easier for people to compile and run.

Ross.

ersmith · 2012-01-10 06:48

You've done a nice job on the documentation Ross, and the concept is sound. There's some great ideas in there! However, I think the actual implementation is a bit more complicated than it needs to be. The whole of Layer 2 is superfluous, really. The cog id is not needed for anything -- the only method of inter-cog communication is via memory, so there is no need to associate plugins with cogs. Also, why not integrate the communication block with the registry itself?

I haven't had a chance to write up a detailed proposal, but here is a sketch of what I had in mind:

(0) Memory Allocation

As in Ross' proposal, memory is allocated from the end of hub RAM down. The top word of hub RAM contains a pointer to the first allocated word. To allocate N more bytes, decrement this word by N (which must be a multiple of 4). By convention the first 4 bytes of every memory block is a header, containing 2 bytes of flags describing the plugin using the block (see below) and 2 bytes for the length of the block. A header with a "0" length terminates the linked list of blocks. At startup $7FFC will contain $7FF8 (the first allocated block) and $7FF8 will contain 0 (indicating the end of the block chain). To allocate 8 bytes for a plugin, decrement $7FFC so it contains $7FF0, and write a header (described below) into $7FF0 with a length of 8 bytes. Note that I've chosen to illustrate with concrete addresses, but we could easily have the "base of memory" variable somewhere other than $7FFC.
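The allocation scheme above can be sketched in C by simulating hub RAM as a byte array. This is only an illustration under stated assumptions: the helper names (`rd_long`, `wr_long`, `heap_init`, `heap_alloc`) are hypothetical, and the header packing (flags in the low 16 bits, length in the high 16 bits) follows the bit layout given further down in this post.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HUB_SIZE 0x8000u  /* 32 KB of hub RAM, simulated as a byte array    */
#define HEAP_PTR 0x7FFCu  /* long holding the address of the lowest block   */

static uint8_t hub[HUB_SIZE];

static uint32_t rd_long(uint32_t addr)
{
    uint32_t v;
    memcpy(&v, &hub[addr], 4);
    return v;
}

static void wr_long(uint32_t addr, uint32_t v)
{
    memcpy(&hub[addr], &v, 4);
}

/* Startup state: $7FFC points at $7FF8, which holds a zero-length header. */
static void heap_init(void)
{
    wr_long(0x7FF8u, 0u);
    wr_long(HEAP_PTR, 0x7FF8u);
}

/* Allocate n bytes (a multiple of 4, header included) below the current
 * lowest block, write the header, then publish the new base address. */
static uint32_t heap_alloc(uint16_t flags, uint16_t n)
{
    uint32_t top = rd_long(HEAP_PTR);
    assert(n >= 4 && n % 4 == 0 && n <= top);
    uint32_t base = top - n;
    memset(&hub[base], 0, n);                    /* clear the whole block     */
    wr_long(base, ((uint32_t)n << 16) | flags);  /* header: flags + length    */
    wr_long(HEAP_PTR, base);                     /* publish after header is valid */
    return base;
}
```

With the addresses used in the post, `heap_init()` followed by `heap_alloc(flags, 8)` returns $7FF0 and leaves $7FFC holding $7FF0. Writing the header before updating $7FFC is a design choice of the sketch, so that another cog walking the chain never sees a block without a valid length.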

(1) Plugin Creation

The Spin wrapper for loading/initializing the plugin allocates a block of memory for the plugin. This is called the "service directory entry" (sdentry) and is usually at least 8 bytes long: 4 bytes for the header, and 4 bytes (or more) for the communication block. Any additional working space a plugin requires also comes in this block.

The sdentry blocks form a linked list of services that plugins provide. The first long word (4 bytes) of the service directory entry is a header. The first byte gives the service type. The second byte gives additional details about the service, including any locks it requires. The third and fourth bytes give the length of the entry, i.e. the total amount of memory allocated for this plugin (including the header).

Service directory header:
     | type |  lock  |  subtype  |    length         |
bit   0     7 8    10 11       15 16               31

or in C:

typedef struct sdheader {
  unsigned int type:    8;
  unsigned int lock:    3;
  unsigned int subtype: 5;
  unsigned int length: 16;
} Sdheader;

The service directory entries form a linked list; to get to the next one in the list you add the length of the current one. A length of "0" indicates the end of the chain.

The "type" field of the header indicates the type of plugin: display, keyboard, mouse, clock, floating point, and so on. A type of "0" indicates an allocated block of memory not specifically offering a service, and can be used if plugins need to allocate memory after initialization. We can also provide that type "0" with lock 7 and subtype 31 indicates a free block of memory, so plugins can release their memory.
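To make the header layout and the chain walk concrete, here is a hedged C sketch. It uses explicit shifts and masks rather than the bit-field struct, because bit-field layout is implementation-defined in C. The function names (`sd_pack`, `sd_type`, `sd_length`, `sd_find`) and any type numbers are hypothetical and only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack a service directory header: type in bits 0..7, lock in bits 8..10,
 * subtype in bits 11..15, length in bits 16..31. */
static uint32_t sd_pack(uint8_t type, uint8_t lock, uint8_t subtype,
                        uint16_t length)
{
    assert(lock < 8 && subtype < 32);
    return (uint32_t)type
         | ((uint32_t)lock    << 8)
         | ((uint32_t)subtype << 11)
         | ((uint32_t)length  << 16);
}

static uint8_t  sd_type(uint32_t h)   { return (uint8_t)(h & 0xFFu); }
static uint16_t sd_length(uint32_t h) { return (uint16_t)(h >> 16); }

/* Walk the chain that starts at byte offset `first` inside `mem`.
 * Returns the offset of the first entry of the given type, or -1 if the
 * zero-length terminator is reached first. */
static long sd_find(const uint8_t *mem, uint32_t first, uint8_t type)
{
    uint32_t at = first;
    for (;;) {
        uint32_t h;
        memcpy(&h, mem + at, 4);
        if (sd_length(h) == 0)
            return -1;              /* end of chain */
        if (sd_type(h) == type)
            return (long)at;
        at += sd_length(h);         /* next entry = current + its length */
    }
}
```

Note that the search time grows with the number of entries ahead of the one wanted, so a caller that needs predictable timing would look a service up once and keep the resulting address.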

(1b) The COG running the plugin is passed a pointer to its service directory entry. Any parameters it needs are either in that memory or have been placed into the COG code (so they are now in COG memory). If the COG needs access to other services, it can walk the chain of sdentries to find those services; or the Spin wrapper in the initialization code can do the same. Note that for the common cases (no other services needed, or only services that have already been started are needed) the COG can rely entirely on its PAR. If mutually dependent services are required the COGs will have to go out and read the memory allocation at $7FFC to find any services started after it. This should be rare.

(1c) If a plugin provides more than one service, its wrapper should allocate multiple service directory entries (chained together as usual) and pass the first one to the COG. The COG can then use the chain to find where the service entries are. Alternatively, the initialization code can allocate one large block (big enough for all the services provided) and pass that to the COG, allowing the COG to subdivide the block into service directory entries.

(2) Service usage

The communication block comes immediately after the service directory header. The size of the communication block is variable (depends on the plugin type). The first word of this is the command request word. For simple services responses can be placed in this word as well. The communication block basically functions as in Ross' document.
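The request/response handshake itself is defined in Ross' document, which is not reproduced in this thread, so the following C sketch uses a convention that is purely an assumption for illustration: the caller writes a command with the top bit set to mark it pending, and the plugin overwrites the same word with a result whose top bit is clear. `REQ_PENDING`, `service_request`, `service_done` and `plugin_poll` are hypothetical names, and the "double the argument" service is a placeholder.

```c
#include <assert.h>
#include <stdint.h>

#define REQ_PENDING 0x80000000u  /* assumed marker: request not yet handled */

/* Caller side: post a command into the first long of the communication block. */
static void service_request(volatile uint32_t *comm, uint32_t command)
{
    assert((command & REQ_PENDING) == 0);
    *comm = command | REQ_PENDING;
}

/* Caller side: has the plugin replaced the request with its response? */
static int service_done(const volatile uint32_t *comm)
{
    return (*comm & REQ_PENDING) == 0;
}

/* Stand-in for one pass of the plugin's polling loop (PASM in a real cog).
 * The placeholder service doubles its argument and writes the result back
 * into the same word, clearing the pending bit. */
static void plugin_poll(volatile uint32_t *comm)
{
    uint32_t w = *comm;
    if (w & REQ_PENDING) {
        uint32_t arg = w & ~REQ_PENDING;
        *comm = (arg * 2u) & ~REQ_PENDING;
    }
}
```

A caller would post a request and then spin on `service_done()` before reading the word as the response.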

In this model we only have the one "registry" (I've called it the "service directory" to distinguish it from the Catalina registry) for services/plugins. Finding a service is straightforward, as is having a single COG provide multiple services. The service users do not need to know which COG is providing the service, they just need to find the sdentry (and hence communication block) for the service.

Eric

RossH · 2012-01-10 15:33

Hi Eric,

Thanks for the thoughtful comments. There's some stuff in there that I will have to re-read and think about, but I have some initial responses ...

ersmith wrote: »

I think the actual implementation is a bit more complicated than it needs to be. The whole of Layer 2 is superfluous, really. The cog id is not needed for anything -- the only method of inter-cog communication is via memory, so there is no need to associate plugins with cogs.

I kind of agree with you on this - Layer 3 is actually more efficient to use than Layer 2, and would be the recommended approach in most cases. However, Layer 3 is slightly more complex and also consumes more Hub RAM (especially as the set of standard service ids for a particular language grows large) whereas Layer 2 is essentially static in size no matter how many services are required. Also, the idea is that Layer 2 is pretty close to being the "simplest thing that could work" and hence could be used to form the basis for many alternative Layer 3 implementations. In my view it is worth keeping intact until the layers themselves mature a bit more. If you use a Spin compiler that eliminates unused methods, there is no overhead incurred by doing so - so that's what I decided to do.

ersmith wrote: »

Also, why not integrate the communication block with the registry itself?

I wavered back and forth on this one - but the problem of making the registry entries larger and more complex is that it simply introduces overhead in those cases where the cogs do not require any comms blocks - which (from experience) is true about half the time - i.e. any Hub RAM you add to the registry is very often just wasted. I understand that in your proposal you may not have a registry entry per cog - but doing so significantly simplifies cog management (which - along with memory management - is the main function of the lower layers)

ersmith wrote: »

I haven't had a chance to write up a detailed proposal, but here is a sketch of what I had in mind:

(0) Memory Allocation

As in Ross' proposal, memory is allocated from the end of hub RAM down. The top word of hub RAM contains a pointer to the first allocated word. To allocate N more bytes, decrement this word by N (which must be a multiple of 4). By convention the first 4 bytes of every memory block is a header, containing 2 bytes of flags describing the plugin using the block (see below) and 2 bytes for the length of the block. A header with a "0" length terminates the linked list of blocks. At startup $7FFC will contain $7FF8 (the first allocated block) and $7FF8 will contain 0 (indicating the end of the block chain). To allocate 8 bytes for a plugin, decrement $7FFC so it contains $7FF0, and write a header (described below) into $7FF0 with a length of 8 bytes. Note that I've chosen to illustrate with concrete addresses, but we could easily have the "base of memory" variable somewhere other than $7FFC.

Yes, I thought about adding more complex memory management - my original intent was to do something along the lines you describe. However, I decided to remove it for three reasons:

I wanted "minimal" techniques in each layer, so I decided not to specify what was in the memory blocks allocated for each plugin, or how they were managed after being allocated - especially as you can add this for yourself by extending the appropriate layer if you wanted to. But instead of just using a constant to indicate where to allocate memory from, I could reserve an upper memory location (i.e. $7FFC) to accommodate different types of memory management (such as your linked lists).
Most memory allocation on the Propeller is static (since it is mostly used in embedded applications) - i.e. once allocated, memory blocks are hardly ever de-allocated - so the overhead of more complex memory management is not really necessary - at least not until you get up above layer 4. If I could have thought of a way of sharing the memory management at these lower layers with the memory management at the application layer it would be worthwhile - but I couldn't think of an easy way of doing so.
It would have made the document even longer!

ersmith wrote: »

(1) Plugin Creation

The Spin wrapper for loading/initializing the plugin allocates a block of memory for the plugin. This is called the "service directory entry" (sdentry) and is usually at least 8 bytes long: 4 bytes for the header, and 4 bytes (or more) for the communication block. Any additional working space a plugin requires also comes in this block.

The sdentry blocks form a linked list of services that plugins provide.

...

The service directory entries form a linked list; to get to the next one in the list you add the length of the current one. A length of "0" indicates the end of the chain.

...

If the COG needs access to other services, it can walk the chain of sdentries to find those services; or the Spin wrapper in the initialization code can do the same.

I considered this kind of approach (I knew you favored it from your comments in the previous thread) but in the end I decided that any solution that involved searching arrays, or stepping through linked lists to find the information you need was going to be (a) too slow (as I found out in my original registry implementation for Catalina) and (b) non-deterministic (which is not in keeping with the deterministic nature of the Propeller).

To eliminate the need to search every time, you find you need to store pointers or other information either locally or globally - which is messy and adds more complexity.

My solution was to adopt as much service orientation as I could while keeping both the cog registry and the service registry static - this way, if you access plugins via layer 3 then you never need to search anything, and you never need to store anything yourself.

There is a fine line to walk between making the techniques simple enough yet functional enough that people will use them, and making them more useful but so complex that potential users - especially Spin users - will simply shrug their shoulders and go do their own thing (which is what people tend to do now). Even my proposal (which is substantially simpler than yours) may already be on the wrong side of this fine line.

I'm happy for others to put a view here as to whether the additional benefits are worth the additional complexity.

Ross.

RossH · 2012-01-11 13:53

All,

Kuroneko found a couple of bugs in my example program (I never claimed to be a Spin programmer). Source code (and document) now updated. There is no change to the techniques themselves, so unless you intend compiling the example you probably don't need to download it again.

Ross.

Cluso99 · 2012-02-22 18:28

Time to bump this thread for those of you playing with OSes on the prop...

I have just been looking at this after a long time away from Sphinx where some nice methods started.

Do we agree with Ross's 1 long per cog plus another 2 longs per cog??

I am wondering (with a small waste) if rather than the 2 stage, we could allocate 4 longs per cog to simplify the registry???

Another thought was that the address be limited to 23 bits (8MB) because we could then simplify the code to set the type byte using "movi re,#type<<1".

If the memory was 4 longs per cog, then

+0:
- <type 9 bits> 'only uses 8 bits but is set by movi xxx,#type<<1
- <23 bits> 'app specific
+1:
- <address pointer 32 bits> 'we now have full 32 bits
+2:
+3:

longs 2 & 3 are used as proposed by Ross. But a plugin could use this together with 1 for expanded services.
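To make the 4-long layout concrete, here is a rough C model of it. The struct and function names are made up for illustration, and longs 2 and 3 are left as opaque fields because their use is defined in Ross's document rather than here. `set_type` imitates what `movi xxx,#type<<1` would do: it replaces bits 31..23 and leaves the low 23 bits alone.

```c
#include <stdint.h>

#define NUM_COGS 8   /* the Propeller has 8 cogs */

/* One registry entry per cog; field names are illustrative only. */
typedef struct {
    uint32_t type_and_app;  /* +0: bits 31..23 type field, bits 22..0 app specific */
    uint32_t address;       /* +1: full 32-bit address pointer */
    uint32_t long2;         /* +2: as in Ross's proposal (not modelled here) */
    uint32_t long3;         /* +3: as in Ross's proposal (not modelled here) */
} cog_entry;

/* Static table indexed directly by cog id, so no searching is needed. */
cog_entry registry[NUM_COGS];

/* Imitates "movi xxx,#type<<1": overwrite bits 31..23, keep bits 22..0. */
uint32_t set_type(uint32_t long0, uint8_t type)
{
    uint32_t field = ((uint32_t)type << 1) & 0x1FFu;  /* 9-bit movi operand */
    return (long0 & 0x007FFFFFu) | (field << 23);
}

/* The 8-bit type ends up in bits 31..24. */
uint8_t get_type(uint32_t long0)
{
    return (uint8_t)(long0 >> 24);
}

/* The app-specific part is the low 23 bits. */
uint32_t get_app(uint32_t long0)
{
    return long0 & 0x007FFFFFu;
}
```

For example, `registry[3].type_and_app = set_type(registry[3].type_and_app, 0x12);` would tag cog 3 without disturbing its app-specific bits.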

I am thinking specifically for HAL input and output, we could have a separate TV or VGA driver and Keyboard driver, or we could have a combined serial (PC/VT100/another prop) driver. As I have worked this up from Sphinx, it would work here nicely with the 4 longs.

For this to work nicely, I think we need to be able to define the type using bits 0 & 1 for input and output ability. So Kbd would be $x1, and TV/VGA would be $x2, and a PC/serial would be $x3 (because it handles both input and output). Does this make sense?
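As a small sketch of that idea, assuming bit 0 means "can input" and bit 1 means "can output" (the constant and function names, and the upper-nibble values in the usage note, are invented for the example):

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed capability bits in the type byte: bit 0 = input, bit 1 = output. */
enum { DEV_INPUT = 0x01, DEV_OUTPUT = 0x02 };

bool can_input(uint8_t type)  { return (type & DEV_INPUT)  != 0; }
bool can_output(uint8_t type) { return (type & DEV_OUTPUT) != 0; }

/* Return the first cog (0..n-1) whose type has every bit in `need` set,
   or -1 if there is none. With 8 cogs this is at most 8 comparisons. */
int find_cog(const uint8_t types[], int n, uint8_t need)
{
    for (int i = 0; i < n; i++) {
        if ((types[i] & need) == need)
            return i;
    }
    return -1;
}
```

With types such as $11 for a keyboard, $12 for TV/VGA and $13 for a serial terminal, a program looking for "something I can print to" would accept either the $12 or the $13 driver.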

BTW Here is a link to the SphinxOS thread where we had some previous discussions about this
http://forums.parallax.com/showthread.php?123055-What-should-a-Propeller-OS-have-(Sphinx-SphinxOS-PropDos-PropCmd-FATEngine&highlight=sphinx

Cluso99 · 2012-02-22 19:56

I just rescanned the SphinxOS thread and others.

I think it fair to say, this OS should be both a FAT16/32 and SD card based OS.

The big difference between SphinxOS and Ross's Registry is that Sphinx was based on functions like StdIn and StdOut, etc, whereas the Registry is based on cogs.

There had been discussions about pin maps too. From where I sit now, I often transfer my SD card (microSD) between various hardware (TriBlade#2, RamBlade, RamBlade). I don't want to recompile an OS to support various hardware implementations, at least for drivers. Currently we will have to do this for XMM programs under Catalina, although I see this as something that could also change further down the track. At least all OS support programs (such as DIR, LS, etc) should be both driver and pin agnostic. Sphinx already works this way, and has done so for a couple of years.

There is also the possibility that the hardware may change from boot to boot. e.g. I have plugin hardware modules that have TV and VGA on another module. Both would use the same pins, but they don't have to if plugged into a different socket on the pcb. How do we handle this???

Well, a simple (but specific) program could inspect the pins to determine what options are plugged in and set an appropriate file (or hub locations) with the current hardware and pinout. This program would run after the eeprom boot program, and would then load the OS program. If the hardware has not changed, then the SD file should not be rewritten, to avoid wearing out the SD card.

One of the other things I raised in previous threads is that I believe the eeprom should only contain a simple boot from SD card program. It should not know about any other information. The reason for this is that the eeprom would be shipped pre-programmed and therefore a propplug is not required to be purchased by the user. All files can be transferred to the SD card on the pc. I saw where Mike? has in his eeprom boot code that if the SD card is not found then an alternative OS is loaded that presumes a fixed hardware for that particular board. This would be an alternative. My later designs after the TriBlade all have a write enable link to permit writing to the eeprom. Default is protected - I don't want a user inadvertently overwriting the eeprom.

Cluso99 · 2012-02-22 20:36

Here is a possible layout and description.
Should a 512 byte hub buffer also be allocated for the SD sector buffer???
Should a 2KB hub buffer be allocated for loading cogs???
Could the 512B sector buffer share the 2KB cog buffer???

Heater. · 2012-02-22 23:26

If this gets much more complicated we may as well implement CORBA on the Cogs:)

http://en.wikipedia.org/wiki/Common_Object_Request_Broker_Architecture

pjv · 2012-02-23 09:44

Hi All;

I'm not sure if, or how, the following applies to the discussion in this thread..... I don't even really know what a registry is, although I understand PC's use them.

But I do understand the need for fast and efficient (in both speed and code size) two-way signalling between Spin code running from Hub and ASM code running in Cogs. In my applications speed is always a huge issue, as is running multiple simultaneous threads in any Cog. So what has evolved in my development is a very fast standard interface which consists of an array of 2 longs for each Cog, parked in static positions at the end of HubRam. All signalling between Spin and any Cog is through the associated 2 longs using some standard approach that is quite compact and very fast.

As some may be aware, I run a co-operative OS in Cogs that allows multiple threads per Cog.... typically up to 8. These threads would be engines to effect some low-level activity such as PWM, Serial, I2C, One-Wire, Keyboard, Mouse, etc. Each of these threads needs to get commands from Spin such as sending serial or reading an I2C port. So the first long associated with the Cog contains an 8 bit Status register indicating which of the 8 possible threads are running/busy. At this point, the balance of the long is unused. The next long is the command register that tells the OS what to do. The OS checks this register every one microsecond for service requests (kind of like an interrupt) while it continues to execute code. On detecting non-zero content, the OS decodes the lowest byte as two nibbles. The low nibble does a 0..15 entry table lookup which contains the Cog address of a thread table. The second nibble also does a 0..15 entry lookup in the selected thread table and retrieves the Cog address of a routine in that thread. Note, the tables are not necessarily fully populated; each can have up to 16 entries. If shorter tables are adequate, then Cog memory is saved.

The next byte in the long contains a variable that is salient to the particular piece of thread code that was selected by the previous two nibbles. For example, it might contain the number of serial bytes to read, or the number of I2C transfers..... it all depends. The next two bytes are typically used as a word, being an address in Hub memory as to where data should be stored, or where data might come from. This address is used in the Cog thread code with its rdbyte/word/long or wrbyte/word/long instructions.

So typically, lists in Hub ram are the source or destination buffers for the Cog's engines. There are some further protocol efficiency situations that add flexibility. Typically (and it depends on the piece of thread code being interfaced) if the second byte in the long has zero value, that is a signal that the first entry in the list accessed by the word referencing the Hub address contains a quantity ( byte, word or long as appropriate ) indicating the length of the addressed list. So one can have many static lists (as well as strings) or buffers in Hub ram, each prepended with their length.
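For readers who think better in code, the command long described above might be packed and unpacked roughly like this in C. The struct, the function names and the choice of a byte-sized length prefix are assumptions for illustration; the real thing lives in ASM in the Cog.

```c
#include <stdint.h>

/* Fields of the 32-bit command long, as described in the post. */
typedef struct {
    unsigned thread;    /* byte 0, low nibble: index into the thread table list */
    unsigned routine;   /* byte 0, high nibble: index into that thread's table */
    unsigned param;     /* byte 1: e.g. number of bytes or transfers */
    unsigned hub_addr;  /* bytes 2..3: Hub address of the source/destination list */
} command;

uint32_t cmd_encode(command c)
{
    return  ((uint32_t)c.thread   & 0xFu)
          | (((uint32_t)c.routine  & 0xFu)    << 4)
          | (((uint32_t)c.param    & 0xFFu)   << 8)
          | (((uint32_t)c.hub_addr & 0xFFFFu) << 16);
}

command cmd_decode(uint32_t w)
{
    command c;
    c.thread   = w & 0xFu;
    c.routine  = (w >> 4) & 0xFu;
    c.param    = (w >> 8) & 0xFFu;
    c.hub_addr = (w >> 16) & 0xFFFFu;
    return c;
}

/* If param is zero, the list at hub_addr is taken to be length-prefixed.
   A byte-sized prefix is assumed here; the post allows byte, word or long. */
unsigned cmd_count(command c, const uint8_t *hub)
{
    return c.param != 0 ? c.param : hub[c.hub_addr];
}
```

Because everything fits in one long, a single wrlong from Spin posts the whole command at once. An all-zero long cannot be used as a command, since zero means "no request".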

What is nice about this method is that it is very fast, and a Cog thread routine is typically triggered in one microsecond after the OS detects the command, which is itself typically also in one microsecond. Also, the single long encoded commands are atomic in nature so slow (Spin) locks are not required to ensure their integrity. Furthermore, the OS automatically updates (clears) the command register so Spin is signalled that the command has been accepted. Then the final touch is the automatic signal into the attendant Status register bit to indicate that execution of the selected thread is busy or has completed. That function readily allows Spin methods to optionally hold until the activity is complete before proceeding to the next instruction/method.

There are many additional features and functions that permit Spin control of each Cog's OS itself. For example spawning or terminating or temporarily pausing threads. And (hopefully) soon to be the dynamic loading of Cog threads.

So my point here in replying to this thread is that for high performance operation, it is imperative that the interface being considered must be very efficient in its execution, both in clock cycles and Cog code length. And I offer the description of my ASM/Spin interface as a further checkpoint in your considerations..... really without well understanding what you are up to. Perhaps there is no relevance at all, and this response is all just my two cents of clutter.

Cheers,

Peter (pjv)

Cluso99 · 2012-02-23 18:09

Peter: Thanks for your input. Your comments are definitely relevant.

Like you, I require a clean+simple+fast interface. My main concern for the time being is to have the standard input and output contained within either 1 or 2 cogs (like a keyboard cog and a screen cog or a single cog doing serial/pc comms to achieve both input and output in the one cog). If an output cog requires more cogs (eg VGA driver) then it will enlist the extra cogs.

I need to be able to load/unload/reload cogs so that I can change the stdio on the fly. And the code for loading into cogs should not permanently reside in hub ram, therefore freeing up valuable hub space.

I want to build an OS that can be accessible from all languages (pasm, spin, c, basic, forth, etc) and CPM2.2/ZiCog. The last may seem a little unusual, being an OS in its own right. But, if it is subservient to the stdio, then we can just switch to CPM2.2 to use the tools that it already contains, including spreadsheets! Then, just as quickly, we can switch back to the prop OS.

From my understanding, Catalina compiled programs can be loaded in their own right. It seems just a matter of understanding to be able to load/unload c programs within this new OS, and take advantage of the stdio routines previously loaded, rather than the catalina c program reload them.

If we can achieve my vision, then all the routines that CPM and PCDOS had could be written simply. Remember DIR is a little program in those OSes, not actually part of the compiled OS. This permitted programs such as LS to be written, which provided simple but better DIR displays. Currently PropDos, PropCmd, KyeDos all implement these as embedded commands. However, Sphinx doesn't, favouring the little modules. And Sphinx brings us to compiling these simple programs on the prop itself.

And for those who touted the much more advanced features of *nix style OSes... IMHO this is way too complex from where I see the Prop being. Maybe this could be a P2 version but again IMHO there are too many other ARM based cheap solutions that would not make this very viable for the effort involved.

So, in summary, I am looking for a more CPM/PCDOS like OS that is more complex than the current PropDos/PropCmd, and more along the lines Michael Park envisioned with Sphinx. Drac has made some great inroads with KyeDos. I would like to base the OS on Kye's SD drivers because they seem to be very fast and complete. Catalina also uses modified Kye SD drivers, so there are reasons to combine this effort.

Comments anyone???

Dr_Acula · 2012-02-23 23:13

@cluso, I'm doing similar things and I'm still pondering (having followed this thread since it started) exactly how to pull things together with the cogs. Simplicity vs complexity.

Re

Currently PropDos, PropCmd, KyeDos all implement these as embedded commands.

Some commands in Kyedos are implemented as loadable programs - just to show it can be done. Xmodem is one of those.

I'm porting Kyedos over to the ILI9325 display. I think this can become a portable battery powered propeller platform. My daughter wants it to be the size of her iphone and I think that is entirely possible with surface mount chips.

I want to get xmodem as part of the operating system, and then I want to pull in some of the very clever comments being made on the 'faster download' thread. The purpose for this is that I want C programs to be part of the ILI9325 operating system. Ultimately, you write some C code, define an icon, where on the screen you want that icon, press 'compile/download/run' and a new icon pops up on the screen. Press it to run your large C program.

The ILI9325 operating system needs external memory, and so do C programs, so to throw another spanner in the works, one needs to think not just about cogs interacting with each other, but also how multiple programs are going to share external memory.

One nice thing about the ILI9325 display is that it doesn't use too much in terms of cog resources. The external memory driver used by C and the operating system, plus the display driver, plus (probably) a keyboard driver, should all fit into one cog. So plenty of room to add more cogs. And then of course you will want to be able to load and reload cogs on the fly from 2k files sitting on an SD card. And so those cogs will need to talk to each other.

I guess because I am managing to squeeze so much into one cog I'm not coming up against the problem of cogs talking, but Ross is ahead of me on that one!

Cluso99 · 2012-02-24 00:57

Drac: Nice.

So how are you...
1. Compiling cogjects?
2. Loading cogjects?

I also want to be able to load and run c objects (or programs) without reloading the whole prop. Have you done this?

BTW I have been looking at the various hwdef files (Sphinx, Catalina, etc) because I want to read this info from an SD file at boot time.

I am looking for a format that not only has the pin definitions, but also the cog objects to start. Something like...

VGA_80X24.COG PIN=16
FDX_SERL.COG PIN=30,31,0,115200
KEYBOARD.COG PIN=26
SD_CARD_.COG PIN=0,1,2,3
etc
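A line in that kind of format could be parsed with something like the following C sketch. The format is only a suggestion at this point, so the struct, the limit of 8 parameters and the function name are all assumptions; it just shows that one file name plus a comma-separated list after `PIN=` is cheap to read at boot time.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PARAMS 8   /* arbitrary limit chosen for this sketch */

typedef struct {
    char name[32];            /* cog image file name, e.g. "FDX_SERL.COG" */
    long params[MAX_PARAMS];  /* numbers after PIN= (pins, mode, baud, ...) */
    int  count;               /* how many params were read */
} cog_config;

/* Parse one line such as "FDX_SERL.COG PIN=30,31,0,115200".
   Returns 0 on success, -1 if the line is malformed.
   Parameters beyond MAX_PARAMS are silently ignored. */
int parse_cog_line(const char *line, cog_config *out)
{
    char args[64];
    if (sscanf(line, "%31s PIN=%63s", out->name, args) != 2)
        return -1;

    out->count = 0;
    const char *p = args;
    while (*p != '\0' && out->count < MAX_PARAMS) {
        char *end;
        long v = strtol(p, &end, 10);
        if (end == p)
            return -1;                 /* not a number */
        out->params[out->count++] = v;
        if (*end == ',')
            end++;                     /* skip the separator */
        else if (*end != '\0')
            return -1;                 /* unexpected character */
        p = end;
    }
    return out->count > 0 ? 0 : -1;
}
```

A boot program would call this once per line of the file and then start each named cog image with the parsed parameters.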