Propeller basics

mickelsen · 2006-11-05 05:06

I'm trying to get my mind around the basics and theory of operation of the Propeller chip. From what I've read, in most cases each cog runs by loading a copy of the Spin interpreter and then running a program that resides in main memory. Is this correct? Are there any cases where the running program resides entirely in the cog's memory? Or have I missed the point entirely? Please pardon a newbie's lack of understanding.

Thanks,
Mark

Paul Baker · 2006-11-05 05:36

Spin programs execute code which resides in main memory; assembly programs execute from cog memory. Spin is run by loading up a cog with a special assembly language program which then fetches, decodes and executes special bytecodes (that sit in main memory). These bytecodes are the user's compiled Spin program.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Paul Baker
Propeller Applications Engineer

Parallax, Inc.

Mike Green · 2006-11-05 05:49

Things that typically run in assembly language in a cog include: Floating point routines (for speed), High speed I/O (like video generation or full duplex 1MBit serial or high speed I2C or an up to 32 servo driver). Medium to low speed I/O and other functions commonly work fine in SPIN running out of main memory.

mickelsen · 2006-11-05 09:35

Thanks for the replies. How much memory is left over in the cog memory for assembly language programs? Or, in other words, how big can they be? I assume you always have to have a Spin interpreter since Spin code and assembly can be mixed in any way you like. Right? Are all Spin tokens (bytecodes) exactly one byte long or are they variable length, like a bytecode with an argument or two to go along with it? I'm wondering if the interpreter has some sort of look-ahead feature so that it isn't spending a lot of time waiting for its turn to read the next token out of main memory. Having a couple of applications that need all the speed I can get out of a cog, I'm interested in squeezing as much juice out of each one as I can. I'm used to using interrupts, DMAs and the like to get extra speed. I'm not used to just using another processor! It's a real paradigm shift.

Thanks again for helping me to "change my mind", literally.

Mark

Paul Baker · 2006-11-05 10:59

Each cog runs either Spin or assembly; they aren't mixed on an individual cog. You use Spin for rapid development or non-timing-specific "glue" operations or coordination. Assembly is used for functions requiring fast, deterministic timing.

Each cog accesses the main memory in round robin fashion, where each cog has access to the main memory every 16 clock cycles. Since each assembly instruction takes 4 cycles except for a few which take 5 or more (like a main memory access), there's only time to perform 2 assembly instructions between main memory access slots if every slot is to be used.
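
The "two instructions per slot" figure can be sanity-checked with a little arithmetic. The sketch below is an illustrative Python model, not Propeller code. The 16-clock slot period and 4-clock instruction time come from the post; the 8-clock cost for a hub access that lands exactly on its slot is an assumption (the post only says "5 or more"), and the function name is made up for the example.

```python
HUB_PERIOD = 16     # clocks between one cog's hub access slots (from the post)
INSTR_CLOCKS = 4    # clocks for an ordinary instruction (from the post)

def instructions_between_hub_accesses(hub_clocks=8):
    """Ordinary instructions that fit between back-to-back hub accesses.

    hub_clocks is an ASSUMED cost for a hub access that arrives exactly
    on its slot; the thread does not give the exact figure.
    """
    spare = HUB_PERIOD - hub_clocks
    return spare // INSTR_CLOCKS

print(instructions_between_hub_accesses())
```

With any assumed hub cost of 5 to 8 clocks the result stays at two, which matches the figure quoted above.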

Both have their place in applications. For those which require all eight processors to be doing time-intensive functions, a "coginit(0,asm_ptr,parameter_ptr)" replaces the original Spin cog with an assembly program of your own. Chip's released bootstrap program and all third-party "non-Spin" development environments do exactly this: they insert the Spin bytecode that overrides Spin (i.e. a cog is loaded with Spin, which in turn tells itself to overwrite itself with another assembly program).

Word of warning though, once you kill all cogs which run Spin, it is an arduous task to reinitialize a cog running Spin.

Post Edited (Paul Baker (Parallax)) : 11/5/2006 11:10:45 AM GMT

ciw1973 · 2006-11-05 13:09

What doesn't seem to be made clear in any of the above replies (I'm sure we all know this, but a casual reader just starting out with the Propeller may not) is that Cogs *only* run assembly programs. What in fact happens if you want to run Spin code is that the Spin interpreter, which is itself an assembly program, is loaded into a Cog, and this starts running bytecodes which it gets from main RAM. A couple of friends who I've pointed in the direction of the Parallax site, and this part of the forum specifically, certainly seemed confused by this to begin with.

It is a fairly unusual setup in the microcontroller field, especially "out of the box", and something which has arisen from Parallax's BASIC-Stamp products. It's a clever and powerful approach, which opens up development to both hardcore low-level types and more casual programmers, who due to the power of the chip can get good performance without getting too close to the metal. I suspect we'll start to see alternatives to the Spin language popping up before long, which are either compiled to Spin bytecode or use their own custom interpreter.

I'm actually working on a runtime to tie in with my GUI project. It's for a 4GL based on something called Retrieve, a kind of COBOL/Pascal hybrid which I've developed a lot of business applications in over the years. Initially it'll be used primarily as a way of defining screen layouts, simple user interaction and the like, but I hope to expand it and ideally allow development to be done directly on the Propeller by writing editors and a compiler in the language itself. Even if this never sees the light of day (and I'm very hopeful it will) it's a lot of fun and is keeping me out of the pub, which apparently is a good thing.

Mike Green · 2006-11-05 17:07

To answer your question about memory, each cog has 512 long words (2K bytes) of memory of which 16 are special purpose (timers, I/O, other control registers). As Paul mentioned, there's an instruction (COGINIT) which copies 512 long words from main memory to a cog, zeroes the last 16 special registers, then starts execution at the first location. It takes about 512 x 16 clock cycles to execute (a little over 100us at 80MHz) so it's not something you want to do frequently.
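
The load-time estimate is easy to reproduce. This is a small Python sketch of the arithmetic only, using the figures from the post (512 longs, 16 clocks per long, 80MHz clock); the function and parameter names are made up for the example.

```python
def coginit_load_time_us(longs=512, clocks_per_long=16, clock_hz=80_000_000):
    """Approximate time to load a cog, in microseconds.

    Each long is assumed to cost one full hub slot period, as in the post.
    """
    clocks = longs * clocks_per_long
    return clocks * 1_000_000 / clock_hz

print(coginit_load_time_us())
```

At 80MHz this comes out a little over 100us, as stated; at a slower clock the load takes proportionally longer.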

Most assembly routines are effectively interpreters in that they pick up parameters from some fixed locations in memory, act on them, then look for another set of parameters. High speed I/O usually works through main memory buffers that way. The high speed floating point package works that way. That way, the routines don't have to be reloaded into a cog over and over again.

SPIN bytecodes are variable length although a lot of them are intentionally single byte. As Paul implied, successive accesses to main memory are optimum when separated by two instructions and it's not hard to find two unrelated instructions to move to between two main memory accesses, so the SPIN interpreter doesn't waste time waiting for its turn to come around.

SPIN is quite fast. Half duplex serial I/O routines in SPIN can handle up to at least 19200 Baud for example. On the other hand, a cog executes an instruction every 50ns. Two synchronized cogs can produce VGA text at a resolution of 1024 x 768 with only a couple of resistors for digital to analog conversion as external components (each cog taking turns building scan lines).

mickelsen · 2006-11-05 18:21

I understand that the Spin interpreter is an assembly language program but I didn't realize that you couldn't mix assembly and Spin in the same application.

It seems apparent to me that, with only a maximum of 496 assembly instruction slots available in cog memory, assembly programs can become code space limited very quickly unless you implement some sort of paging scheme which moves segments of code into cog memory from main memory. In fact, I can see that the only permanently resident part of some assembly programs would be the paging code, leaving the largest possible area to move code into. Has anybody run into this limited code space problem? Has anyone written a code swapping routine for assembly programs that are too big to fit into cog memory?

Phil Pilgrim (PhiPi) · 2006-11-05 18:50

Mickelsen,

The apparently limited size of each cog's codespace is offset by the fact that the instructions are two-address instructions that operate on 32-bit operands. Combine this with some really clever math and bit-manipulation instructions, selective setting of flags, and conditional execution for each instruction, and you have a highly-efficient and compact programming model. This is exemplified by the Spin interpreter, all of which fits in a single cog without overlays.

-Phil

Mike Green · 2006-11-05 19:07

To date, most uses of assembly language have fit comfortably in the 496 instructions in cog memory. The only exception so far has been video generation where there's a sizable buffer (for one or more scan lines) also in the cog memory. In those cases, there isn't the time to load in an overlay.

It's a fairly trivial task to write a code overlay scheme using main memory overlays. It would take maybe 10-12 instructions including setup. The real trick is to use some type of external memory to save on the space in main memory needed for the overlays. A general purpose I2C driver takes maybe 250 instructions leaving 1/2 of the cog for overlays. An SPI driver would probably take less, depending on the details of the memory used, but would take extra I/O pins to use.

mickelsen · 2006-11-05 19:43

Phil and Mike,

Your points are well taken. I need to go back and study the assembly language instructions so that when I start writing code I can get all the functions out of one instruction possible. I've spent many years programming 8-bit micros whose instructions have only one function and maybe, in some rare cases, two. Once again I need a paradigm shift. This certainly is an interesting processor(s) to work with. Thanks for all your help. I'm sure you'll hear from me again.

Mark

potatohead · 2006-11-05 20:42

This is the biggie for me too.

I was really just lost for a while. No registers (well the COG ram is really a lotta registers --if you want to think of it that way), no indexing, self-modifying code, conditional flags and results and execution. It's really different!

So far, each time I code a loop it gets smaller!

Coupla things I'm still missing:

Byte level operators. Having everything 32 bits in the COG RAM seems clumsy to me. It's getting less so, so maybe that's just me. I'm sure the net gain from the powerful instructions and the benefits of the very regular timing aspects of this outweigh byte operators.

Would be really nice to have single cycle delays possible. A NOP takes 4 cycles. Would be great to have it take 1 - 4 depending on something... Somehow this seems wasteful.

Things I'm liking!

Lots of smart branches. At first the large number of them seemed redundant compared to your typical 8bitter. However, this does grant you an instruction for every case you might need.

Taken branches are fastest! Good call.

No need to learn and tweak lots of addressing modes. This in the 8bit world is the key to everything and is hard. Love the 6809 because of its many addressing modes and ability to make relocatable and re-entrant code. At first, this language seemed really goofy, but now it's making all that other stuff seem like cruft one does not need. ---damn cool.

(Still love the 6809 though --it was just too fun to program)

No interrupts. At first, this seemed like a serious shortcoming to me. But after finally getting my mind wrapped around the COGs, it makes a heck of a lot of sense. There is going to be a lot of new ways to do things! This is essentially why I bought a prop. That spark of discovery and learning is alive and well in this regard.

The no interrupts also still sits in the things I don't like column too. It's because I'm still really digesting multiprocessing and what it means where states are concerned. Parallel execution is just really different. It's difficult for me to grok what the state of things will be at times. IMHO, this will pass!

Conditional results! It's possible to have long strings of instructions without having to worry about flag states and such. Very cool.

Paul Baker · 2006-11-05 20:50

I still don't get why people prefer the old interrupt system; you essentially have 8 full time deterministic interrupts with the current system. And with the JMPRET trick you can time slice any individual cog into even more processes (16 at 10MHz, 32 at 5MHz, ...)
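
To make the time-slicing numbers concrete, here is a rough Python analogy, not Propeller code. It reads the "MHz" figures as approximate instruction rates, takes the 50ns instruction time quoted earlier in the thread (about 20 million instructions per second per cog), and ignores the cost of the switch itself. The generators stand in for tasks that hand control to each other; all names are made up for the example.

```python
COGS = 8
COG_MIPS = 20   # one instruction per 50 ns at 80 MHz, per the thread

def sliced(tasks_per_cog):
    """Return (total tasks, approximate MIPS per task), ignoring switch overhead."""
    return COGS * tasks_per_cog, COG_MIPS / tasks_per_cog

def task(name):
    """A task that runs one step, then hands control on."""
    n = 0
    while True:
        yield f"{name}{n}"   # the yield stands in for a JMPRET to the next task
        n += 1

def run_round_robin(tasks, steps):
    """Resume each task in turn, the way a chain of JMPRETs passes control along."""
    trace = []
    for i in range(steps):
        trace.append(next(tasks[i % len(tasks)]))
    return trace

print(sliced(2))
print(sliced(4))
print(run_round_robin([task("a"), task("b")], 6))
```

The trace alternates strictly between the two tasks, which is the property that keeps this kind of slicing deterministic.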

Post Edited (Paul Baker (Parallax)) : 11/5/2006 10:38:43 PM GMT

potatohead · 2006-11-05 20:59

Do get, or don't get?

And this JMPRET trick? (I'll do some searching..)

Paul Baker · 2006-11-05 22:39

Sorry, a typo that I've now changed; I meant "don't get". Look at the FullDuplexSerial.spin assembly object for a 2 thread example of JMPRET. This technique can be expanded to any number of threads.

Mike Green · 2006-11-05 22:57

Regarding byte (and other partial word operations). It is pretty easy and fast to do shifts and logical operations. Unlike in most 8 bit processors, you can shift any number of bits in one instruction. Particularly if you're assembling bytes into words or vice versa, it's very easy and fast. Using a rotating byte mask and a shift right count, it's easy to do:

               movs  :where,#byteTbl     'Point the source field at the table
               mov   mask,#$FF           'Initialize for LSB first
               mov   shftcnt,#0
:where         mov   temp,0-0            'Get the long value (source patched above)
               and   temp,mask           'Mask off the byte
               rol   mask,#8             'Rotate mask to the next byte
               shr   temp,shftcnt        'Right justify the byte
               add   shftcnt,#8          'Increment to next byte
               and   shftcnt,#$1F
               tjnz  shftcnt,#:skipIt    'Check for wraparound to
               add   :where,#1           '  next long value
:skipIt        tjz   temp,#:quit         'Zero-terminated string
' Now we have the string character in temp
               jmp   #:where
:quit
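
For anyone who wants to check the intended result off-chip, the loop above can be modeled in a few lines of Python. This is only a sketch of the same idea (walk the longs LSB first, stop at a zero byte); it shifts and then masks rather than masking and then shifting, which gives the same byte. The function name and the sample table are made up for the example.

```python
def unpack_string(longs):
    """Walk 32-bit longs LSB first and collect bytes until a zero byte."""
    out = bytearray()
    for value in longs:
        for shift in (0, 8, 16, 24):
            byte = (value >> shift) & 0xFF
            if byte == 0:
                return bytes(out)
            out.append(byte)
    return bytes(out)

# "Hello" packed LSB first into two longs, zero terminated
table = [0x6C6C6548, 0x0000006F]
print(unpack_string(table))
```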

Paul Baker · 2006-11-06 00:14

Also, if fitting three 9-bit entities in a long is sufficient for what you want to do, you can use movs, movd and movi and their appropriate masks. rev is also a possibility for word-length quantities.
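
As a rough illustration of the packing idea, the Python sketch below models a long with three 9-bit fields. The bit positions (0-8, 9-17 and 23-31) follow my understanding of which bits movs, movd and movi write; treat them as an assumption and check the instruction reference. The helper names are made up for the example.

```python
FIELD_MASK = 0x1FF                      # nine bits
S_SHIFT, D_SHIFT, I_SHIFT = 0, 9, 23    # assumed positions for movs, movd, movi

def set_field(word, shift, value):
    """Replace one 9-bit field and leave the rest of the long untouched."""
    mask = FIELD_MASK << shift
    return (word & ~mask & 0xFFFFFFFF) | ((value & FIELD_MASK) << shift)

def get_field(word, shift):
    """Read one 9-bit field back out."""
    return (word >> shift) & FIELD_MASK

word = 0
word = set_field(word, S_SHIFT, 0x123)
word = set_field(word, D_SHIFT, 0x0AB)
word = set_field(word, I_SHIFT, 0x1FF)
print(hex(word), [hex(get_field(word, s)) for s in (S_SHIFT, D_SHIFT, I_SHIFT)])
```

Bits 18-22 are not touched by any of the three fields, so five bits of the long are left over in this layout.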
