A proposal to develop a standard for communicating with cogs from any language!

Dr_Acula · 2011-12-02 23:39

Ok, I think we are on a roll here. If all pasm code accepts PAR as the registry location, that makes all pasm code portable, does it not? You can compile it independently if you want to, and if you don't then the overhead is only a few extra lines of pasm in any program.

We ought to be able to take any existing pasm obex code and add in a few lines of code. At the moment, it is fairly standard to pass 'par' as pointing to the list of parameters.

Now instead, we pass par as a pointer to the registry and the cog number and then the pasm code can find out where its parameter list is. I don't know about searching... seems to be more pasm overhead?

I'm still not sure about how the pasm code knows what cog it is though. If the registry contains 8 longs and the pasm code is running in cog 4, how does it know what registry long to read? Or do you pass it the registry value plus 4?

The alternative is to pass the beginning of the registry, but then that adds more pasm code to decode which cog it is. And pasm code space is precious.

What exactly would the extra pasm code look like? In pseudo code, get registry address, add cog number, read value at this address, mask off some bits? What would it look like in code though? And can we write this extra code in 5 longs? Or less?

RossH · 2011-12-02 23:46

Dr_Acula wrote: »

Ok, I think we are on a roll here. If all pasm code accepts PAR as the registry location, that makes all pasm code portable, does it not? You can compile it independently if you want to, and if you don't then the overhead is only a few extra lines of pasm in any program.

Yes, that's correct. Registering, unregistering and processing requests takes more code - but many cog programs don't need to do any of that (or it can be done from the high level language).

Dr_Acula wrote: »

We ought to be able to take any existing pasm obex code and add in a few lines of code. At the moment, it is fairly standard to pass 'par' as pointing to the list of parameters.

Now instead, we pass par as a pointer to the registry and the cog number and then the pasm code can find out where its parameter list is. I don't know about searching... seems to be more pasm overhead?

The cog doesn't have to search - only the programs using it.

Dr_Acula wrote: »

I'm still not sure about how the pasm code knows what cog it is though. If the registry contains 8 longs and the pasm code is running in cog 4, how does it know what registry long to read? Or do you pass it the registry value plus 4?

The cogid instruction tells you your own cog id.

Dr_Acula wrote: »

The alternative is to pass the beginning of the registry, but then that adds more pasm code to decode which cog it is. And pasm code space is precious.

Actually, that's what I do. Calculating your own entry takes only a few instructions, which can be re-used as data space anyway - this is a common technique used in many drivers, and means the overhead of the code is zero.

Dr_Acula wrote: »

What exactly would the extra pasm code look like? In pseudo code, get registry address, add cog number, read value at this address, mask off some bits? What would it look like in code though? And can we write this extra code in 5 longs? Or less?

See the code of the "generic" Catalina plugin described in the document attached to the first post.

Ross.

RossH · 2011-12-02 23:50

jazzed wrote: »

I would not be against having a single 16 bit word point to the address of a registry. Thing is it is not really necessary to do this at all unless you want to reload programs. While I can see advantages in some situations like an O/S, not everyone wants to do that.

True - but there is a lot to be said for having a single mechanism that suits everyone's needs - especially when it does not add much complexity at this low level. Also, I think more people would use this capability if it were readily available.

Ross.

Dr_Acula · 2011-12-03 03:28

Hi Ross,

My concern here is we might drown in a sea of complexity.

I have read through the introduction document and some of it makes sense but some of it is far too complex for my simple brain. Especially the diagram on page 7 and the description on page 6 of the four different types of service requests. My quest for understanding is also hindered by a lack of information - this discussion thread does not explain what the standard should be - instead it references a pdf document, which then references some code (which you have to download) and then the document references a file "Examine the file Catalina_Plugin.spin in the simple Directory" which in my case is a directory with 12 files, none of which are Catalina_Plugin.spin.

I fear we may be putting some people off using this standard!

Let me describe a super simple example. I have some pasm code which accepts one variable. I write some code in a higher level language which changes that variable. I pass the location of that single variable via PAR to a pasm program at startup. My pasm program reads the par value and polls this for any value not equal to zero. The higher level language changes the value to something non zero. The pasm program notes this, flashes a led, and then sets the value back to zero to indicate it has been processed.
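As a rough illustration of the one-long handshake described above, here is a host-side C model. It is only a sketch: `mailbox_t`, `host_post` and `cog_poll_once` are hypothetical names, the hub long is an ordinary variable, and the LED flash is replaced by a counter.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical stand-in for the single hub long whose address is passed in PAR. */
typedef volatile uint32_t mailbox_t;

/* Caller side: post a non-zero request, but only if the previous one was consumed. */
static bool host_post(mailbox_t *mbox, uint32_t value)
{
    if (value == 0 || *mbox != 0)
        return false;          /* zero means "empty", so it cannot be a request */
    *mbox = value;
    return true;
}

/* Cog side: one pass of the polling loop. Returns true if a request was handled. */
static bool cog_poll_once(mailbox_t *mbox, unsigned *led_flashes)
{
    uint32_t request = *mbox;
    if (request == 0)
        return false;          /* nothing to do yet */
    (*led_flashes)++;          /* stands in for flashing the LED */
    *mbox = 0;                 /* zero tells the caller the request was processed */
    return true;
}
```

The caller can tell a request has been processed by seeing the long return to zero, which is why zero cannot itself be used as a request value in this scheme.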

Now, let's go about porting that over to a 'plugin'. First, we need to define a registry using the higher level language. Then, on the pasm side, we need to pass the location of the registry via PAR. We need some extra pasm code to find the value of the cog and to add this to the registry value. We will need some more working variables. We also need to decode one of those four communication protocols - maybe with some bitmasking. We might need to set some bits high or low to indicate a variable has been read - so we will need another long devoted to being a bitmask for that variable.

My concern is how much extra code this takes on the pasm side. The higher level language does not matter - you have lots of room for code and you can drop in big functions if you like. But pasm is different. There are only 496 longs and many of those are devoted to storing data, so the actual code might be a lot less than 496 instructions. Even an overhead of 5 extra instructions may be too much.

Do you need to 'register' a cog for instance? The calling program knows it loaded a cog and started it and presumably the code was written and debugged properly, so once it has been loaded, could one not always assume it is registered and running?

Do some cogs need to ever know about the registry? Sure, it might be clever for some cogs to talk directly to other cogs via the registry but how often does this happen? Most of the time, your keyboard cog code will talk to your display cog via the higher level language, not directly. But there are costs with talking via the registry because you need to add extra pasm code.

So - my idea is that you have the idea of plugins talking via the registry, but to maintain backwards compatibility, you also allow much simpler protocols, such as the one above which have just one variable and nothing goes via the registry.

Can such code be compiled separately? Yes, I believe it can. Compile that cog code into a binary file and store it on an SD card. The calling program knows this program has one hub long for comms, and the Spin/C/Basic code can create a variable and then look up its location in hub, on the fly and from within code, pass that value to the cog via par on startup and the cog neither knows nor cares what the calling language is.

As I see it, there is a cost associated with the complexity of the catalina plugin protocol. It takes precious cog code space, and the protocol is hard to describe. What are the benefits and do they outweigh the costs?

Can protocols that avoid the registry be included in your standard?

Dave Hein · 2011-12-03 05:10

PropGCC currently has a "registry" for a single item -- the external memory driver. I have suggested a driver list in the PropGCC forum that contains a little more information than Catalina's registry. However, it seems like it will be difficult to pin down the details at this time.

A simple array of 8 longs looks like a good way to start. The first word could contain an optional lock number and the type of program that is running in the cog. The second word could be used for a command, or it could be a pointer to a command/response mailbox. I think it would be better to dedicate two longs per cog so that the second word could be dedicated to the command, and the second long would be a single parameter or a pointer to a parameter list. It would be replaced by the response when the command is completed.

As far as Spin support, my understanding is that Parallax will support this for the P2. They are in the process of converting the Spin compiler from x86 assembly to C. I suspect it will support some of the new features of the P2. If Parallax doesn't do this, I'm sure the Spin community will make sure that Spin runs on the P2.

EDIT: After another cup of coffee I've changed my mind on using 2 longs per cog. I would prefer the single long per cog registry. The second word would be used for a simple command/response mailbox or it would point to a more flexible mailbox and more memory if needed.

RossH · 2011-12-03 05:13

Dr_Acula wrote: »

Hi Ross,

My concern here is we might drown in a sea of complexity. I have read through the introduction document and some of it makes sense but some of it is far too complex for my simple brain. Especially the diagram on page 7 and the description on page 6 of the four different types of service requests. My quest for understanding is also hindered by a lack of information - this discussion thread does not explain what the standard should be - instead it references a pdf document, which then references some code (which you have to download) and then the document references a file "Examine the file Catalina_Plugin.spin in the simple Directory" which in my case is a directory with 12 files, none of which are Catalina_Plugin.spin.

Sorry you find it so complex, Dr_A, because the basic concepts aren't - you can blame my documentation skills if I have been unable to explain it well enough (and by the way, if you are missing the file Catalina_Plugin.spin then I'd advise you to reinstall the current version of Catalina, since it is certainly present in that directory in release 3.4).

In reality I don't think a suitable mechanism could be much simpler and still be of use. But I could be wrong there, and in any case this is kind of missing the point - I'm perfectly happy for alternative proposals to be put forward. That was in fact why I started the thread.

Dr_Acula wrote: »

I fear we may be putting some people off using this standard!

What standard? There isn't one - that's the point.

I'm not proposing my own standard in this thread (except as a working example). Reread the title - I'm proposing we work together to develop one!

I put Catalina's model forward as an example because it is documented, has plenty of example code, and has been demonstrated to work quite well, requiring only minor modifications to existing PASM objects. But I mainly intended it to trigger debate. I'm perfectly happy for others to propose alternative models, or suggest improvements. If we can jointly come up with something that everyone agrees is better, then I will adopt it.

Dr_Acula wrote: »

...
As I see it, there is a cost associated with the complexity of the catalina plugin protocol. It takes precious cog code space, and the protocol is hard to describe. What are the benefits and do they outweigh the costs?

Can protocols that avoid the registry be included in your standard?

Yes, there is cost associated with a registry-based solution - it amounts to about 4 longs. For example, here is some minimalist code you could use to get your pointer to your dedicated comms block after being passed the address of a "minimalist" registry in your par parameter:

my_reg  cogid   my_reg                           ' read ...
my_comm shl     my_reg,#2                        ' ... my comms block address  ...
        add     my_reg,par                       ' ... from ...
        rdlong  my_comm,my_reg                   ' ... the registry

The code gets more complicated (by a couple of instructions) if your registry also encodes additional information over and above just a pointer to a comms block (such as the plugin type) - but an alternative scheme might instead encode that kind of information in the comms block itself, so if you don't need it then there is no further overhead.

Is 4 longs (**) too much overhead for all the potential benefits to be derived from such a scheme? Actually, if you pass the address of the cog's own registry entry (rather than the address of the registry itself) the overhead can be reduced further - but then you have to calculate the registry address anyway when you want to interact with another cog, so in my view it is not really much of a saving, and in fact makes the scheme more complex, not less.

Ross.

** In most cases, the overhead is only 2 longs, not 4 - since after the above code has been executed, the 1st and 2nd instructions are used as variables (they point to your registry entry and your comm block) - but the 3rd and 4th instructions can also be used as variables by the cog program, saving 2 longs elsewhere.
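For readers more comfortable in C, the four instructions above can be modelled on a host machine roughly as follows. This is a sketch under stated assumptions: `hub`, `rdlong`, `wrlong` and `my_comms_block` are hypothetical stand-ins for hub RAM and the PASM instructions of the same names, and the "minimalist" registry is taken to be 8 consecutive longs, one per cog, each holding the address of that cog's comms block.

```c
#include <stdint.h>

#define HUB_BYTES 1024u           /* small stand-in for hub RAM */
static uint8_t hub[HUB_BYTES];

/* Model of rdlong: the two low address bits are ignored, data is little-endian. */
static uint32_t rdlong(uint32_t addr)
{
    const uint8_t *p = &hub[(addr % HUB_BYTES) & ~3u];
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Model of wrlong, used here only to set up a registry for the example. */
static void wrlong(uint32_t value, uint32_t addr)
{
    uint8_t *p = &hub[(addr % HUB_BYTES) & ~3u];
    p[0] = (uint8_t)value;
    p[1] = (uint8_t)(value >> 8);
    p[2] = (uint8_t)(value >> 16);
    p[3] = (uint8_t)(value >> 24);
}

/* The four PASM instructions, one C statement each. */
static uint32_t my_comms_block(uint32_t par, unsigned cogid)
{
    uint32_t my_reg = cogid & 7u;   /* cogid   my_reg          */
    my_reg <<= 2;                   /* shl     my_reg,#2       */
    my_reg += par;                  /* add     my_reg,par      */
    return rdlong(my_reg);          /* rdlong  my_comm,my_reg  */
}
```

The shift by 2 converts a cog number into a byte offset, since each registry entry is one long (4 bytes).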

RossH · 2011-12-03 05:19

Dave Hein wrote: »

PropGCC currently has a "registry" for a single item -- the external memory driver. I have suggested a driver list in the PropGCC forum that contains a little more information than Catalina's registry. However, it seems like it will be difficult to pin down the details at this time.

A simple array of 8 longs looks like a good way to start. The first word could contain an optional lock number and the type of program that is running in the cog. The second word could be used for a command, or it could be a pointer to a command/response mailbox. I think it would be better to dedicate two longs per cog so that the second word could be dedicated to the command, and the second long would be a single parameter or a pointer to a parameter list. It would be replaced by the response when the command is completed.

As far as Spin support, my understanding is that Parallax will support this for the P2. They are in the process of converting the Spin compiler from x86 assembly to C. I suspect it will support some of the new features of the P2. If Parallax doesn't do this, I'm sure the Spin community will make sure that Spin runs on the P2.

Hi Dave,

Thanks for the info. I agree that a comms block of two longs per cog is enough for nearly all purposes, but I think you need some additional space to store the information such as the program type (and the lock, which I wish I had put in there at the beginning!).

I think I've got the balance about right by effectively using three longs per cog in Catalina (the single long in the registry points to a two long comms block as well as encoding some information itself).
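A minimal C sketch of a registry long that both points to a comms block and encodes some information itself. The 8-bit type / 24-bit pointer split is an assumption, inferred from the `low_24` mask and the shift by 24 in the demo code quoted later in this thread; it is not confirmed here as Catalina's exact layout, and all function names are hypothetical.

```c
#include <stdint.h>

#define PTR_MASK   0x00FFFFFFu   /* assumed: low 24 bits hold the comms block address */
#define TYPE_SHIFT 24            /* assumed: top 8 bits hold the plugin type */

/* Combine a plugin type and a comms block address into one registry long. */
static uint32_t entry_pack(uint8_t type, uint32_t comms_addr)
{
    return ((uint32_t)type << TYPE_SHIFT) | (comms_addr & PTR_MASK);
}

static uint8_t entry_type(uint32_t entry)
{
    return (uint8_t)(entry >> TYPE_SHIFT);
}

static uint32_t entry_comms(uint32_t entry)
{
    return entry & PTR_MASK;
}

/* Register a plugin: keep the comms pointer already in the entry, set the type. */
static void plugin_register(uint32_t registry[8], unsigned cog, uint8_t type)
{
    uint32_t comms = entry_comms(registry[cog & 7u]);
    registry[cog & 7u] = entry_pack(type, comms);
}
```

Under these assumptions, registering type 8 in an entry that already points at $7F00 leaves $08007F00 in the registry. The sketch leaves out clearing the request long, which the demo code also does.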

What other information would you like to see encoded in the registry?

Ross.

Dave Hein · 2011-12-03 06:44

I thought some more about using 1, 2 or 3 longs per cog in the registry, and I agree that 2 longs per cog would handle most cases. The first word could contain the lock number, type and a mode bit. The mode bit would indicate whether the cog uses only the 2 longs in the registry for the mailbox (simple mode), or whether the mailbox and other information is located in another area of memory (extended mode).

In the simple mode the second word could be used for the command. The second long would be used for a parameter or a pointer to a parameter list. When the command is completed the driver cog will return a response in the second long and clear the command word.
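The simple mode described above could be modelled in C along these lines. This is a sketch only: the field positions (first word in the low 16 bits, mode flag in bit 15, command in the high 16 bits) are assumptions made for illustration, and `reg_entry_t`, `client_post`, `driver_service` and `example_handler` are hypothetical names.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    volatile uint32_t head;   /* low word: lock/type/mode, high word: command */
    volatile uint32_t data;   /* parameter on request, response on completion */
} reg_entry_t;

#define MODE_EXTENDED 0x8000u  /* assumed position of the mode bit */

static uint16_t pending_command(const reg_entry_t *e)
{
    return (uint16_t)(e->head >> 16);
}

/* Client side: post a command if the entry is in simple mode and idle. */
static bool client_post(reg_entry_t *e, uint16_t cmd, uint32_t param)
{
    if (cmd == 0 || (e->head & MODE_EXTENDED) || pending_command(e) != 0)
        return false;
    e->data = param;                                        /* parameter first ... */
    e->head = (e->head & 0xFFFFu) | ((uint32_t)cmd << 16);  /* ... then the command */
    return true;
}

/* Driver side: handle one pending command, return the response, clear the command. */
static bool driver_service(reg_entry_t *e, uint32_t (*handler)(uint16_t, uint32_t))
{
    uint16_t cmd = pending_command(e);
    if (cmd == 0)
        return false;
    e->data = handler(cmd, e->data);   /* response replaces the parameter */
    e->head &= 0xFFFFu;                /* cleared command word signals completion */
    return true;
}

/* Hypothetical handler used for illustration: doubles the parameter. */
static uint32_t example_handler(uint16_t cmd, uint32_t param)
{
    (void)cmd;
    return param * 2u;
}
```

Writing the parameter before the command word matters in this sketch, because the driver treats a non-zero command word as the signal that the parameter is ready.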

I'm not sure how to specify the extended mode. In addition to an extended mailbox it could include descriptors about the memory space used by the driver, such as the stack and program space for a Spin program.

Dr_Acula · 2011-12-03 15:00

Hi Ross,

Yes I agree brainstorming this will be very useful.

This is one of your demonstrations.

' Plugin demo for Catalina - pass the location of the registry and it places a value in the registry
' that restarts Catalina

CON

  _clkfreq = 80_000_000                                              ' 5 MHz crystal
  _clkmode = xtal1 + pll16x                                          ' x 16

PUB Main
    coginit(1,@entry,0)                                           ' cog 1, cogstart, dummy value


DAT
              org       0
entry
              
              mov       t1,par                  ' start of the array = array[0]
              add       t1,#4                   ' array[1]
              rdlong    registry,t1             ' value,address read this value from hub ram, equals the registry
              
              wrlong    zero,registry           'value,address store a zero to the registry first 4 bytes

              cogid     t1                      ' get ...
              shl       t1,#2                   ' ... our ...
              add       t1,registry             ' ... registry block
              rdlong    rqstptr,t1              ' register ...
              and       rqstptr,low_24          ' ... this ...
              wrlong    zero,rqstptr            ' ... plugin ...
              mov       t2,#8               ' Addit - CHANGED PTYPE TO 8 (not able to find a ref to ptype)
              shl       t2,#24                  ' ... the ...
              or        t2,rqstptr              ' ... appropriate ...
              wrlong    t2,t1                   ' ... type

loop2         jmp       #loop2                   ' loop forever

'---------------------------------- Storage ------------------------------------
'
zero          long      0                       ' handy value (zero)
low_24        long      $00FFFFFF               ' handy value (lower 24 bits)
rqstptr       long      0                       ' request address
'rqst          long      0                       ' service request
'rslt          long      0                       ' service result
t1            long      0                       ' temporary variable
t2            long      0                       ' temporary variable
t3            long      0                       ' temporary variable

arraylocation long      0
registry      long      0
sevenf        long      $7F00                   ' location for debugging

param         long      0                       ' saved initialization data
'
              fit       $1f0

I'm sure that parts of this can be removed but my concern is there may be some situations where this is too much code.

What I would propose is that if you have a complex system like this, as a subset of this standard you also include a standard that does not need the registry, such as a very simple pasm code that only communicates with one long. One could even think of a plugin that does not need any communication at all - eg code to flash a led.

I don't mind adopting Catalina's model in its entirety. What I am concerned about is that if you do this, you are not "allowed" to use simpler protocols that bypass the registry.

Can you explain why the registry is always needed? For instance, I write some C code, I create an unsigned long variable in the correct place so it is in hub ram, I determine the location of that variable. I determine in C that there is a cog free. I load some precompiled code off an SD card and into a cog. I start it up and pass my variable using PAR. I then run a C function to tell the registry that there is now some code running in that cog.

The registry is happy because it knows that cog is being used.

Data I/O is going via my variable, not via the registry, but why does this matter? The only situation I can think of is where other cogs want to interact with my cog, but my cog does not have any code that needs to interact with other cogs directly, and even if it did, I'd prefer to interact via the higher level language because I can then add in debug statements and see what is going on.

Are there any reasons the standard cannot include a communications protocol that bypasses the registry?

RossH · 2011-12-03 16:24

Dr_Acula wrote: »

Hi Ross,

Yes I agree brainstorming this will be very useful.

This is one of your demonstrations.
...

I'm sure that parts of this can be removed but my concern is there may be some situations where this is too much code.

That code is not mine, although I can see parts of it are derived from code of mine. I can't quite figure out what it is intended to do. I think this code was from your experiments in stopping and restarting the Catalina kernel.

Anyway, it doesn't matter. As I pointed out in a previous post, if all you want to do is have a common approach to managing your various cog comms blocks, then the code overhead is a couple of instructions. If you want to do more complex things then naturally more code is required - but this is not "overhead" - the same (or very similar) code would be required anyway.

The main benefit from having a common approach is that it makes it easier for other programs to interact with your cog program. It doesn't dictate how each cog program must work internally - generally they will work exactly the same as they would otherwise have done. And it doesn't dictate how programs must be written if they do not need a comms block, or if they do not need to interact with other programs.

Dr_Acula wrote: »

What I would propose is that if you have a complex system like this, as a subset of this standard you also include a standard that does not need the registry, such as a very simple pasm code that only communicates with one long. One could even think of a plugin that does not need any communication at all - eg code to flash a led.

I can't understand why you keep saying it is too complex - it gets complex, yes - but if you don't need all the fancy features, it boils down to an overhead of 8 longs of Hub RAM, and 2 instructions per cog. I can't think of a much better or simpler method (although that's not to say there isn't one - that's really the point of this thread).

Anyway, as I said above the intent is not to dictate how a cog program which doesn't interact with other programs must be written - it is about how to make it easier to write cog programs that do need to interact.

If all you need to do is flash a LED, then your cog program takes about half a dozen instructions, and of course any overhead looks like too much and seems a bit pointless - but if you wanted (for example) to be able to modify the flash frequency at run time, and also run several instances of this program concurrently (all running at different frequencies) and perhaps also start and stop the programs dynamically in response to external events - and then manage all this from a high level language such as C which doesn't want to know or care which cog your programs are running in or how they work internally - then the small overhead required becomes very worthwhile.

Dr_Acula wrote: »

I don't mind adopting Catalina's model in its entirety. What I am concerned about is that if you do this, you are not "allowed" to use simpler protocols that bypass the registry.

Of course you are - you are "allowed" to do anything you want (except subvert the use of the registry by other cogs of course). Even Catalina does not use the registry for all cogs (although more and more I found reasons why it should) - it only uses it for those cogs that need to interact with other cogs dynamically.

Dr_Acula wrote: »

Can you explain why the registry is always needed? For instance, I write some C code, I create an unsigned long variable in the correct place so it is in hub ram, I determine the location of that variable. I determine in C that there is a cog free. I load some precompiled code off an SD card and into a cog. I start it up and pass my variable using PAR. I then run a C function to tell the registry that there is now some code running in that cog.

The registry is needed if you write cog programs that need to interact dynamically. As the name implies, it provides a place where all cogs know where to look if they need to find services offered by other cogs, and means you don't need to allocate specific programs to specific cogs, or use a specific memory location for cog to cog interactions - these things become more difficult as your programs increase in complexity, and become much more difficult when using a high level language.
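To make the "place where all cogs know where to look" idea concrete, here is a hedged C sketch of a client searching an 8-long registry for a service. It assumes, purely for illustration, that the service type sits in the top 8 bits of each entry; `find_service` is a hypothetical name, not a Catalina function.

```c
#include <stdint.h>

#define COGS 8   /* one registry long per cog on the Propeller 1 */

/* Return the number of the first cog registered with the given type, or -1. */
static int find_service(const uint32_t registry[COGS], uint8_t type)
{
    for (int cog = 0; cog < COGS; ++cog) {
        if ((uint8_t)(registry[cog] >> 24) == type)
            return cog;
    }
    return -1;   /* no cog currently offers this service */
}
```

Only the program that wants the service runs this loop; the cog providing the service never searches, it just looks up its own entry by cog number.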

Dr_Acula wrote: »

The registry is happy because it knows that cog is being used.

Data I/O is going via my variable, not via the registry, but why does this matter? The only situation I can think of is where other cogs want to interact with my cog, but my cog does not have any code that needs to interact with other cogs directly, and even if it did, I'd prefer to interact via the higher level language because I can then add in debug statements and see what is going on.

Are there any reasons the standard cannot include a communications protocol that bypasses the registry?

Post-initialization, no communication "goes via the registry" - in the simplest case, you use the registry only to provide a common cog initialization process - i.e. to provide a uniform method of allocating a comms block to each program rather than using a fixed memory location (which in many cases is not possible). But once your cog program is running, it does not need to interact with the registry at all - it can do anything it likes.

Ross.

RossH · 2011-12-03 16:47

Dave Hein wrote: »

I thought some more about using 1, 2 or 3 longs per cog in the registry, and I agree that 2 longs per cog would handle most cases. The first word could contain the lock number, type and a mode bit. The mode bit would indicate whether the cog uses only the 2 longs in the registry for the mailbox (simple mode), or whether the mailbox and other information is located in another area of memory (extended mode).

In the simple mode the second word could be used for the command. The second long would be used for a parameter or a pointer to a parameter list. When the command is completed the driver cog will return a response in the second long and clear the command word.

I'm not sure how to specify the extended mode. In addition to an extended mailbox it could include descriptors about the memory space used by the driver, such as the stack and program space for a Spin program.

Hi Dave,

I think it is better to have the "simple" mode use 2 full longs for communications. Instruction wise in the cog, the overhead is exactly the same for longs, words and bytes, but it means the simple mode can support full two-way communication of 32 bit values. I would tend to reserve the "extended" mode for more complex cases such as when structures need to be passed or returned - and if you can already pass up to 32 bits, I think the need for this would be quite rare (one case I can think of off the top of my head is the floating point library cogs, where generally two 32 bit floating point values and an operation need to be passed in each service request).

That's why I eventually settled on 3 longs, with one of them containing the cog type, plus a pointer to the comms block of 2 longs (which could be replaced by a comms block of any other length if required). This way there is no difference in the use of the registry itself between "simple" and "extended" mode - but the use of the comms block is implied by the type stored in the registry.

I agree it might be nice to store the required stack size in the registry, but why the program size? Dynamically loading and unloading PASM programs doesn't need this (they are always a maximum of 2k!) - are you expecting to dynamically load and unload Spin programs as well? - that's difficult since they could be any size! I suppose in my "simpler" registry the assumption is that you have to derive this kind of information from the type if you need it.

Also, when you say "the memory space used by the driver", what do you mean?

Ross.

ersmith · 2011-12-03 18:17

I think there are a lot of good ideas in the registry (thanks for writing and posting the document, Ross!). I do share Dr. Acula's concern that it may be more overhead than required for many drivers. The common model, it seems to me, is of one central control cog (the one running the main Spin or C program) communicating with various drivers running on other cogs. It'll be less common for the other cogs to need to communicate with each other. For most drivers, it'll be simpler to pass a pointer to their parameters in PAR, rather than passing a pointer to the registry and then having them search through the registry for the actual parameters. For those drivers which do need the registry, we could adopt a convention that a pointer to the registry is always the first long in the parameter block (or perhaps the third long... the Catalina convention which has the first long being the command/status word and the second long being a result word is a good default).

So I certainly agree that a standard format for a registry of drivers/services would be useful, but it is most useful for fairly complicated programs, typically interpreted code or LMM code, and that the overhead (such as it is) of using the registry should go in those programs rather than in the PASM drivers, which could receive just the request blocks from their callers.

The other major concern I have with the registry as it is currently set up is that it is cog centric rather than service centric. It seems to me that what programs really need to find are the services, not the cogs, and there is not always a 1-1 mapping between services and cogs. For example, consider a real time clock driver and a serial driver. Normally these would run on two separate cogs... but a simple real time clock could certainly be updated by the serial cog, which has to maintain timing for the serial port anyway. Or perhaps a floating point service and some other math service might be combined in one cog in some drivers, and provided by different cogs in other drivers. Similarly, some drivers might require multiple cogs (e.g. a graphics driver).

The mapping of services to cogs may not remain constant, either. It's reasonable for a threads library to do load balancing to move work between cogs, so services provided by those threads will move around at run time (the GCC pthreads library does this for LMM C code). Graphics drivers might allocate additional cogs during periods of heavy work and then release them. Finally, some services might be provided by dedicated hardware on some boards, and by cogs on other boards. A floating point co-processor comes to mind here.

What seems most useful to me is to define standard classes of drivers (text output, graphics output, text input, pointer input, external memory interface, math co-processor, etc.) and standard requests and request blocks for those drivers. That's where the real portability issue between languages and drivers is. Finding the cog that's responsible for the driver, or even the address of the request block, is relatively easy compared to knowing how to use the driver.

Eric

Dave Hein · 2011-12-03 18:34

@Ross, I use two longs for the system call mailbox in SpinSim. It uses 1 word for the lock number, 1 word for the command, and 1 long for the parameter/response. This allows me to send and receive 32-bit values. The word used for the command is more than sufficient to convey the small number of commands it implements. The long used for the parameter and result is used as a data block pointer when passing more than one parameter, such as with fread or fseek. I don't see a need for more than 2 longs in the simple mode.

The memory space and size parameters are only needed if the system supports some form of memory management. This allows for reclaiming memory when a driver is terminated. I used this in spinix to start and kill processes, such as Spin programs. spinix uses a kernel and memory manager to allocate memory when loading a process and to reclaim it when the process is killed. This is beyond the intent of your proposal, but it could be something supported by the extended mode.

@Eric, I suggested using a driver list in the PropGCC forum. In that proposal each entry would contain the cog number, driver type and a pointer to the mailbox. Each bit in the driver type specifies a functional type, such as serial port, display, SD, etc. A driver could indicate multiple features by setting each bit in the type that applied. The list would be terminated by a null entry. The main reason for a driver list is so that a program can start another program, and the new program can load drivers that it needs if they're not already loaded. This is probably only needed for an OS type application.
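As a rough illustration of that driver list, here is a C sketch. The bit assignments, field widths, and the `find_driver` helper are hypothetical, and the list is assumed to end at the first entry whose type is zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Driver-type bits. These particular bit assignments are made up. */
enum {
    DRV_SERIAL  = 1u << 0,
    DRV_DISPLAY = 1u << 1,
    DRV_SD      = 1u << 2,
    DRV_CLOCK   = 1u << 3
};

/* One list entry: cog number, driver type bitmask, mailbox pointer. */
typedef struct {
    uint8_t  cog;      /* cog running the driver */
    uint32_t type;     /* one bit per feature; 0 marks the null entry */
    void    *mailbox;  /* where requests for this driver are posted */
} driver_entry_t;

/* Return the first entry offering every requested feature bit,
   or NULL if the null entry is reached without a match. */
static const driver_entry_t *find_driver(const driver_entry_t *list,
                                         uint32_t features)
{
    assert(list != NULL);
    for (; list->type != 0; ++list) {
        if ((list->type & features) == features)
            return list;
    }
    return NULL;
}
```

A newly started program could call something like `find_driver(list, DRV_SD)` and load its own SD driver only when the result is NULL. A serial cog that also keeps the clock would simply set both `DRV_SERIAL` and `DRV_CLOCK` in one entry.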

RossH · 2011-12-03 19:36

Hi Eric,

Some good thoughts there - thanks. I have to say that I still can't quite get my head around how an overhead of 8 hub longs and 2 cog longs can be "too much overhead" for all the potential benefits it brings - but I'm certainly willing to be convinced if anyone can come up with a simpler alternative.

One thing I'm trying to get away from is the model of "one main program and a bunch of drivers". This is of course an important model that must be supported (it is inherent in all C programs for example) - but if that's all the Propeller offers then it will always be doomed to suffer by comparison with other microcontrollers (which shall remain nameless) which already do that kind of thing - and do it better, faster and cheaper than either the Propeller 1 or the Propeller 2 can.

I'm trying to move at least to a model of "a bunch of main programs sharing a bunch of drivers" - something those other microcontrollers can't easily offer. Ultimately, I'd like to move to the model of "a bunch of co-operating programs" - but this is probably not practical until we get to the Propeller 3!

But this is important - supporting models other than "one main program and a bunch of drivers" allows us to take full advantage of the two "killer" advantages of the Propeller - absurdly simple multi-processing, and amazingly flexible soft peripherals. No other chip on the market seems to (currently) be able to offer this combination.

I agree that this thread is mainly intended to benefit high-level languages like C or Spin - but 90% (just a rough estimate based on looking around the forums) of all Propeller programming is actually done in these languages. In such cases, I would maintain that having a registry definitely reduces the complexity. There are very few "pure PASM" programs where adding in a mechanism such as a registry might add overhead or complexity.

I understand your comments about the registry being "cog-centric" rather than "service-centric", but this is not quite true. The registry itself is definitely "service-centric", but it does assume you will not be able to offer more services than you have cogs. The opposite is easy - Catalina already offers services (I call them plugins) that span multiple cogs. For instance, the HMI plugin can span between 2 and 5 cogs depending on the options you choose and the hardware you have. My experience is that even when a service spans multiple cogs, it will still generally offer an interface only via one of those cogs. If that is not the case, then I agree Catalina's current registry is not particularly suitable.

I also recently had a case where I needed to combine the services offered by two cogs into one cog without changing the interface (I combined the clock and the SD card driver based on various command line options) - it turned out to be possible (there are in fact several ways of doing it), but I agree it was not particularly easy with Catalina's current registry.

Dynamically allocating cogs adds complexity - but this is also something that can be added over the top of quite a simple registry (Catalina can already demonstrate this). This is not something I would like to "embed" any deeper. The basic capability is already there, and for most programs it would just add needless complexity - something both you and Dr_A think we already have too much of!

I will be interested in hearing if you have an alternative mechanism that copes better with such cases.

I do agree that once we have a common underlying mechanism in place, we should standardize on some common driver "types". That would help everyone - whether they write programs in C, Spin or PASM. However, this is not really the main point of this thread - in fact, it would seem to take us right back to the thread that this one spun off, without actually solving the problem this thread was intended to address!

Ross.

Cluso99 · 2011-12-03 19:41

I am late to the party. A lot of this was discussed (about 2 years ago) when we were talking about OSes and there were a lot of good features in mparks' Sphinx that could be expanded. There was a lot of good discussion and then it all died with no resolution. Perhaps it is worth digging up this thread again.

I am sure that whatever is decided will not suit every application. That is fine, as long as it serves the majority.

My biggest concern for the moment, is why C cannot/seems-not to co-exist with spin and we need to rewrite the objects. Am I missing something here???

RossH · 2011-12-03 19:45

Dave Hein wrote: »

@Ross, I use two longs for the system call mailbox in SpinSim. It uses 1 word for the lock number, 1 word for the command, and 1 long for the parameter/response. This allows me to send and receive 32-bit values. The word used for the command is more than sufficient to convey the small number of commands it implements. The long used for the parameter and result is used as a data block pointer when passing more than one parameter, such as with fread or fseek. I don't see a need for more than 2 longs in the simple mode.

Ah, ok - I didn't get that you use the same long for passing data both ways. Might be a good optimization. But it does increase the registry size to a minimum of 16 longs, whereas Catalina's is a minimum of 8 (but by default 24).

Dave Hein wrote: »

The memory space and size parameters are only needed if the system supports some form of memory management. This allows for reclaiming memory when a driver is terminated. I used this in spinix to start and kill processes, such as Spin programs. spinix uses a kernel and memory manager to allocate memory when loading a process and to reclaim it when the process is killed. This is beyond the intent of your proposal, but it could be something supported by the extended mode.

I think this is best handled by adding it on to a more basic proposal. As Jazzed pointed out, this kind of thing is really only used by a few of us.

Dave Hein · 2011-12-03 19:53

The big problem with C and Spin co-existing is that the calling interface and stack handling are not compatible. One way to call Spin methods from C is to use a mailbox, where the top object of the Spin program is a message dispatcher. This will work in a lot of cases, but it will be more efficient and cleaner to just convert the Spin code to C. It may be feasible to write a pre-processor that converts Spin code to C, which would help to automate the process.

RossH · 2011-12-03 19:54

Cluso99 wrote: »

My biggest concern for the moment, is why C cannot/seems-not to co-exist with spin and we need to rewrite the objects. Am I missing something here???

This is mainly because of the limitations of the current Spin compilers. If Parallax had decided to develop a Spin front end for GCC, all our problems would be solved. They didn't seem (at the time) to understand there would be a need (perhaps they may change their mind one day) so we have to use all kinds of "yucky" workarounds (to quote heater!) to make it work. In fact, it is only possible at all because of the efforts of Michael Park and Brad Campbell (who developed Homespun and BST) - if we all had to stick to using the Parallax tools, it would not be possible at all.

Ross.

P.S. by all means post a link to that Sphinx thread if you think it contains relevant ideas - although the intent of this thread is not to make it easier to build a Propeller operating system, it is mainly intended to make it easier to write Propeller programs. There is some similarity between the two - but they are not the same thing.

RossH · 2011-12-03 19:59

Dave Hein wrote: »

The big problem with C and Spin co-existing is that the calling interface and stack handling are not compatible. One way to call Spin methods from C is to use a mailbox, where the top object of the Spin program is a message dispatcher. This will work in a lot of cases, but it will be more efficient and cleaner to just convert the Spin code to C. It may be feasible to write a pre-processor that converts Spin code to C, which would help to automate the process.

Converting Spin to C would not solve the problem - it would just make the Spin programs too large to run!

A Spin front-end to GCC that produced Spin objects (the same as produced by a Spin compiler) but executed them using the Spin interpreter, in the same way that C objects are executed by an LMM kernel, would be a better answer.

Ross.

Dave Hein · 2011-12-03 20:17

I don't think it makes sense to compile Spin in GCC. The Spin VM is a stack-based machine, and not register based, so it doesn't fit well in the GCC structure. Also, Spin statements are translated almost directly into Spin bytecodes. There isn't much optimization that can be done. Anyhow, this would not solve the problem with interfacing C to Spin.

Overall, it is better to convert the Spin code to C if you want to interface it to other C code. The larger code size is going to be a fact of life when working with C. However, the good news is that we can execute directly from EEPROM, so all you need is a 64K EEPROM. With a 128K or 256K EEPROM you can run large programs in C that would not have fit in 32K with Spin. And the good news is that they will run faster than Spin, even if executed from EEPROM.

RossH · 2011-12-03 21:05

Dave Hein wrote: »

I don't think it makes sense to compile Spin in GCC. The Spin VM is a stack-based machine, and not register based, so it doesn't fit well in the GCC structure. Also, Spin statements are translated almost directly into Spin bytecodes. There isn't much optimization that can be done. Anyhow, this would not solve the problem with interfacing C to Spin.

Overall, it is better to convert the Spin code to C if you want to interface it to other C code. The larger code size is going to be a fact of life when working with C. However, the good news is that we can execute directly from EEPROM, so all you need is a 64K EEPROM. With a 128K or 256K EEPROM you can run large programs in C that would not have fit in 32K with Spin. And the good news is that they will run faster than Spin, even if executed from EEPROM.

I don't want to drag this thread too off-topic, but there is no impediment to using GCC to compile to a stack-based architecture. The Zylin ZPU is a stack-based architecture, and is happily supported by GCC (and don't ever let Zog hear you say otherwise!).

However, I'll concede on this one - a Spin to C translator would be quite easy to write, so no doubt someone will do this eventually (has any work been done on one already?). It might even be a useful alternative once the Prop 2 becomes available and it is possible to execute the resulting program from Hub RAM (I don't think executing them from external EEPROM on a Prop 1 is going to be an attractive option).

Ross.

jazzed · 2011-12-03 21:55

RossH wrote: »

I don't think executing them from external EEPROM on a Prop 1 is going to be an attractive option

It is attractive because ...

The advantage is in having about 58KB of code memory available practically for free on a large install base (about 122KB for Hydra).
It is a little slow relatively speaking, but it's hardly different from using SPIN/PASM where the business code is slower than the devices.
Of course GCC being able to allocate any functions to HUB instead of EEPROM makes it even sweeter.

RossH · 2011-12-03 22:29

jazzed wrote: »

It is attractive because ...

The advantage is in having about 58KB of code memory available practically for free on a large install base (about 122KB for Hydra).
It is a little slow relatively speaking, but it's hardly different from using SPIN/PASM where the business code is slower than the devices.
Of course GCC being able to allocate any functions to HUB instead of EEPROM makes it even sweeter.

I'm obviously missing something here. I doubt there are enough Hydras out there to worry about, but I agree there may be quite a few Propellers with a 64k EEPROM installed. So let's say you have 58k (why is this not 64k?) available for code space instead of the normal 32k. But your Spin program has to be converted to C and then compiled, and this means your code size is going to be at least twice as large (from experience, probably closer to three times). So you actually end up with significantly less effective code space for your program. And the end result is that it executes slower?

I'm having trouble seeing any actual benefit in this. Anyway, I'm happy to see GCC take the lead on this one - if there does turn out to be a demand I'll add an EEPROM XMM driver to Catalina.

Ross.

jazzed · 2011-12-03 22:43

RossH wrote: »

So let's say you have 58k (why is this not 64k?)

Because the EEPROM must contain the program, the VM, and the cache driver.

It's hard to compete with SPIN for space. If someone wants to use C, then it is an option for a bigger program essentially for free. Of course SD card makes more sense - we have that lead too.

pedward · 2011-12-03 23:11

Off-topic aside: As I was reading the above comments, I couldn't help but think: Why not take the Spin interpreter (now that it's been released) and extend it with some instructions to do LMM to EEPROM in a sane way (maybe with explicit swapping in/out of code) and drop the SPIN interpreter into a PASM section of an object, then do whatever the heck you want.

The sparseness of SPIN is extremely attractive, and perhaps some purposeful or clever optimization could result in faster SPIN code that runs closer to PASM in speed?

I don't mean to hijack this thread, just posting the meandering thoughts.

RossH · 2011-12-03 23:14

jazzed wrote: »

Of course SD card makes more sense - we have that lead too

Now you're just trying to shift the ground of the argument.

We were talking about Spin programs converted to C, not C programs.

Anyway, as for executing code from SD Card, much the same argument applies. I'm happy for GCC to lead the way here - the installed user base of Propellers with an SD card attached (outside hobbyists like us) has got to be quite small. But if it turns out to be a useful addition, I'll add an XMM SD Card driver to Catalina.

The thing that we tend to lose sight of in these discussions (and I'll put my hand up to this one as much as anyone, because it took me a long time to realize it) is that there is no demand for these complex solutions that require additional external hardware! Why would there be, when the end result is a hybrid solution that executes slower, is more complex to build and program, and costs more than the alternatives? The situation will be no different when the Prop 2 arrives - only solutions supported by the bare chip itself will get any significant traction in the commercial market, since even now you can buy an alternative chip that will be cheaper and faster than the Prop 2 is currently proposed to be (at least when it is used like this).

That's why I'm more interested in the subject of this thread, and am not racing off to add support for all these new XMM devices to Catalina - we have to take advantage of the things the Propeller is best at - (a) multiprocessing, and (b) soft peripherals that make use of its unique I/O capabilities.

Ross.

RossH · 2011-12-03 23:29

pedward wrote: »

Off-topic aside: As I was reading the above comments, I couldn't help but think: Why not take the Spin interpreter (now that it's been released) and extend it with some instructions to do LMM to EEPROM in a sane way (maybe with explicit swapping in/out of code) and drop the SPIN interpreter into a PASM section of an object, then do whatever the heck you want.

I think something along these lines has been tried, in the experiments designed to achieve "Big Spin" - i.e. Spin without the 32k limit. This is not quite the same as what Dave and Jazzed are proposing. I'm not exactly sure where "Big Spin" ended up. You might try this thread.

Ross.

RossH · 2011-12-04 00:11

Ariba wrote: »

The answer to all this C and Spin combination / translation is: Dave Hein

He has done a lot of clever work in this area, which is mostly disregarded:
C to Spin converter:
Spin + LMM
and also:
C library in Spin

Andy

Hi Ariba,

I know Dave has done good work on "C to Spin" - but I don't think anyone has yet done "Spin to C".

Ross.

Ariba · 2011-12-04 00:13

Sorry, I noticed my error and deleted the post, but you were faster.

The other Link about LMM + Spin may still be useful.

Andy

Dave Hein · 2011-12-04 06:34

RossH wrote: »

I don't think anyone has yet done "Spin to C".

I've tried doing a Spin to C converter -- three times! I just haven't posted anything on it. My latest attempt looks promising. Instead of converting it to C it converts it to C++. This makes it much easier to handle Spin objects. Basically, each Spin object is wrapped in a C++ class statement, which provides the object encapsulation that Spin uses. This could also be done with C structs, but it requires using function pointers, and initializing them at startup.
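For comparison, a minimal sketch of the C struct alternative is shown below. The `counter` object and its methods are invented for illustration and are not output from any real converter. It shows the extra startup step of filling in the function pointers, which a C++ class would not need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical Spin object "counter" with one VAR long and two PUB
   methods, expressed as a C struct with function pointers. */
typedef struct counter counter_t;
struct counter {
    int32_t value;                                   /* VAR long value */
    void    (*start)(counter_t *self, int32_t init); /* PUB start(init) */
    int32_t (*next)(counter_t *self);                /* PUB next */
};

static void counter_start(counter_t *self, int32_t init)
{
    self->value = init;
}

static int32_t counter_next(counter_t *self)
{
    self->value += 1;
    return self->value;
}

/* Startup initialization: every instance needs its pointers set up
   before any method can be called. */
static void counter_init(counter_t *self)
{
    assert(self != NULL);
    self->value = 0;
    self->start = counter_start;
    self->next  = counter_next;
}
```

Each call then has to pass the instance explicitly, for example `c.next(&c)`, where C++ would supply `this` automatically.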

In general, it's much easier to convert Spin to C than C to Spin. Spin does have slightly different operator precedence, so statements will need to be parsed and reconstructed with parentheses where necessary. Some unary operators always store the result back to the variable, so that needs to be handled also. When the @ operator is used in a DAT section it produces an object offset, but when used in code it generates an absolute address. There are just a number of special cases that need to be handled properly.
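A small sketch of the precedence issue, assuming the operator table in the Propeller manual, where `&` and the shift operators bind tighter than `+` (the reverse of C). The function names are hypothetical, and unsigned types are used only to keep the C shifts well defined.

```c
#include <stdint.h>

/* Spin source:  result := a + b & c
   Assumed Spin grouping: a + (b & c).
   Unparenthesized C would group it as (a + b) & c. */
static uint32_t spin_add_and(uint32_t a, uint32_t b, uint32_t c)
{
    return a + (b & c);
}

/* Spin source:  result := a << 1 + b
   Assumed Spin grouping: (a << 1) + b.
   Unparenthesized C would group it as a << (1 + b). */
static uint32_t spin_shift_add(uint32_t a, uint32_t b)
{
    return (a << 1) + b;
}
```

Copying either expression into C unchanged would compile without complaint and silently compute a different value, which is why a converter has to parse the expression and reinsert the grouping.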
