View Full Version : PropellerForth



Cliff L. Biffle
11-02-2006, 12:38 AM
Over the past week or so, I've been working on PropellerForth, a mostly ANS-compliant Forth for the P8X32 chip. This is a trial balloon to see if there are any Forthies in the audience. (Or, for that matter, anyone who would like to have an interactive development environment on the Propeller itself, and is willing to learn Forth to get there.)

RATIONALE
I really dig the Propeller, but I don't really like writing large applications in assembler. For a while, I considered developing a typesafe, optimized version of Spin (which has very nice attributes, like the component [née object] model), but reverse-engineering the bytecodes went slowly -- and, as I remember all too well from writing VB tools many years ago, trying to process a language without a formal spec is the death of a thousand papercuts.

I was pondering this on a phone call with my father, who was not sympathetic. "Why haven't you just written a Forth?" Now, granted, this is his response to nearly everything, but he had a point -- Forth is well-known for its ability to bootstrap interactive development/debugging environments in tiny (~8k) amounts of RAM. I hadn't implemented Forth in years, but visions of Propeller-hosted development swam in my head, so I set to work.

I believe Forth is exceptionally well-suited for the Propeller because it is, simultaneously, as high-level and as low-level as the programmer desires. You can drop to the metal for speed and write in assembly, or define a domain-specific vocabulary and work at the 10,000-foot level.

OVERVIEW
- Indirect threaded.
- Forth code lives in shared RAM and can be shared by multiple Cogs.
- Runtime core runs on one or more Cogs and contains CODE definitions for the primitive words.
- Like a traditional Forth, it's mostly written in Forth (with primitives written in propasm).

STATUS
It's not quite ready for release yet; right now it supports
- Interactive Forth environment over a serial console (19200/8n1)
- Most of the compiler -- I can define and test words interactively, but some oddities like DOES> are missing.
- Most of the primitive word set.

In terms of ANS, the main missing features right now are
- Double-cell arithmetic and single-cell multiply/divide/mod
- Number formatting (everything is hex until I finish the division code)
- Error recovery (ABORT currently halts)

For the Propeller specifically, I haven't finished the interactive assembler, nor the set of words for controlling the counters, video generator, and locks.

SIZE
The core of the runtime occupies 266 longs in a Cog including registers and temporaries, so the remainder is available for user-defined assembly words (though the compiler can't help you with this yet). The base image currently occupies 5,468 bytes, which includes the runtime core, most of the ANS Core words, the compiler, the terminal input buffer, and both stacks -- the entire runtime footprint. Thus, currently, about 27k of RAM is available for user definitions.

SPEED
At 80MHz, the current unoptimized implementation executes around 1.4m low-level (colon and primitive) words per second, as measured by some shoddy microbenchmarking. Modeling suggests this is mostly held back by RAM, and the pipeline stalls induced by accessing it at the wrong time. I'm working on a whole-program instruction scheduler so that I don't have to fix this by hand.

DETAILS FOR FORTH-HEADS
Some of the oddities of this implementation were brought on by the P8X32 architecture, such as
- CODE words cannot efficiently be stored in shared RAM, because we cannot execute directly from shared RAM. (I'm treating the chip as a Harvard architecture.)
- For this reason (and the lack of any hardware return stack), an indirect threaded model -- traditionally the slowest way to implement Forth -- proved fastest. (Though this implementation feels more like a token-threaded system; see below.)
- Both stacks are slow shared RAM, rather than fast local RAM, because indirect addressing of local RAM is not easy. (In terms of cycles burned, it's faster to keep the stack in shared RAM than write self-modifying stack code.) TOS is cached in a register as a concession.
- The memory address is half as wide as a cell -- 16 vs 32 bits. Most Forths on such platforms (such as the 68000) use the memory address size as the cell size, but the P8X32 provides very little support for directly working with 16-bit values. Thus, a cell is twice as wide as a memory address, forcing the use of two new operators, H@ and H!, for dealing with halfcells in certain circumstances.
- Since the Cog and Shared RAM address spaces overlap, all execution tokens (which are halfcells, like in a token-threaded system) are pointers into shared RAM, but every CODE field points into Cog RAM. This is a tad ugly, but unavoidable -- unlike on the 8051 (for example) we can't remap the address spaces to be disjoint.
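
For readers who haven't met indirect threading before, here is a toy model of the idea in Python. It is only a sketch, not PropellerForth source: every name in it (VM, enter, exit_, lit and so on) is made up for illustration, and a Python list stands in for shared RAM. Each word starts with a code field that names a native routine; a colon word's code field is ENTER, followed by a list of execution tokens that the NEXT loop walks.

```python
# Toy model of an indirect-threaded inner interpreter.
# Hypothetical names throughout; this is not PropellerForth's actual code.

class VM:
    def __init__(self):
        self.mem = []      # stands in for shared RAM: code fields + threaded bodies
        self.stack = []    # data stack
        self.rstack = []   # return stack
        self.ip = None     # interpreter pointer (index into mem)
        self.w = None      # execution token of the word being run

    def primitive(self, routine):
        """Define a CODE-style word: its code field is a native routine."""
        xt = len(self.mem)
        self.mem.append(routine)
        return xt

    def colon(self, body):
        """Define a colon word: code field is ENTER, then a list of tokens."""
        xt = len(self.mem)
        self.mem.append(enter)
        self.mem.extend(body)
        return xt

    def execute(self, xt):
        """Run one word to completion; the loop body plays the role of NEXT."""
        self.ip = None
        self.w = xt
        self.mem[xt](self)
        while self.ip is not None:
            self.w = self.mem[self.ip]   # fetch the next execution token
            self.ip += 1
            self.mem[self.w](self)       # jump through that word's code field


def enter(vm):
    # Start a colon word: save the caller's IP, point IP at the body.
    vm.rstack.append(vm.ip)
    vm.ip = vm.w + 1

def exit_(vm):
    # Finish a colon word: resume the caller.
    vm.ip = vm.rstack.pop()

def lit(vm):
    # Push the inline literal that follows LIT in the thread.
    vm.stack.append(vm.mem[vm.ip])
    vm.ip += 1

def dup(vm):
    vm.stack.append(vm.stack[-1])

def plus(vm):
    b = vm.stack.pop()
    a = vm.stack.pop()
    vm.stack.append(a + b)


vm = VM()
EXIT = vm.primitive(exit_)
LIT = vm.primitive(lit)
DUP = vm.primitive(dup)
PLUS = vm.primitive(plus)
DOUBLE = vm.colon([DUP, PLUS, EXIT])       # : DOUBLE  DUP + ;
MAIN = vm.colon([LIT, 21, DOUBLE, EXIT])   # : MAIN  21 DOUBLE ;

vm.execute(MAIN)
print(vm.stack)
```

The extra hop through the code field on every word is what makes this scheme "indirect"; the tokens in a colon body never point at machine code directly.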

SHODDY MICROBENCHMARKS
For the curious, here are the two words I used to benchmark the runtime. These are microbenchmarks written to test very specific interpreter features, so they don't look much like normal high-level Forth code. (In particular, they bang pretty hard on the stack.)




\ Test basic stack arithmetic speed
\ Approx. 7n+2 words via NEXT, or 458747 total
: BENCH
1F1h L@ \ Push the current CNT
65535 \ We're avoiding DO/LOOP for now
BEGIN
DUP WHILE
1- \ This is a user-created colon word, not a primitive.
REPEAT
DROP
1F1h L@
SWAP - . ;

\ Like BENCH, but with even more hot stack action.
\ Approx. 13n+2 words via NEXT, or 851957 total
: BENCH2
1F1h L@
65535
BEGIN
DUP WHILE
\ This totally useless line measures our top-of-stack
\ manipulation speed (which is very important).
0 0 SWAP SWAP DROP DROP
1-
REPEAT
DROP
1F1h L@
SWAP - . ;
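
The word counts in the comments above, and the conversion from a printed CNT delta to words per second, can be checked with a few lines of Python. This is a rough sketch: the function names are invented, the 80 MHz clock comes from the SPEED section, and the sample delta is a made-up value rather than a measured result. Remember that the delta is printed in hex, since number formatting isn't finished.

```python
N = 65535  # loop count used by both benchmarks

def words_executed(words_per_iteration, n=N, overhead=2):
    """Total words dispatched via NEXT: k*n + 2, per the comments above."""
    return words_per_iteration * n + overhead

def words_per_second(total_words, elapsed_ticks, clock_hz=80_000_000):
    """Convert a CNT delta (in clock ticks) into a words-per-second rate."""
    return total_words * clock_hz / elapsed_ticks

bench_words = words_executed(7)     # BENCH:  7n+2
bench2_words = words_executed(13)   # BENCH2: 13n+2
print(bench_words, bench2_words)

# Hypothetical reading, NOT a measured value: parse the hex delta the word printed.
example_ticks = int("1900000", 16)
print(round(words_per_second(bench_words, example_ticks)))
```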


Post Edited (Cliff L. Biffle) : 11/1/2006 5:45:42 PM GMT

Eviatar Tsadok
11-02-2006, 01:36 AM
Hey Cliff

Interactive Forth sounds cool.
I worked with RSC-Forth a long time ago; I will be happy to work with this kind of solution.

Eviatar

Paul Baker
11-02-2006, 02:06 AM
Looking forward to your release. Forth is a language I've played around with and written a couple of bastardized versions of for microcontrollers, but never had an opportunity to use much for application development. Too bad PForth is already taken for a name.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Paul Baker (mailto:pbaker@parallax.com)
Propeller Applications Engineer
Parallax, Inc. (http://www.parallax.com)

cgracey
11-02-2006, 02:33 AM
Yeah, Cliff!

I think if you could make this stand-alone, it would be especially interesting. Host computers are always orphaning projects. One that would still be accessible 30 years from now might be useful.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


Chip Gracey
Parallax, Inc.

Phil Pilgrim (PhiPi)
11-02-2006, 02:46 AM
Cliff,

Who's your father? Chuck Moore? ;)

Seriously, this looks really cool! Forth can be an excellent bridge language between the user and the metal when speed is of the essence. It's often gotten a bad rap for being unreadable and, in truth, it's very easy to write unreadable code with it. But that's true of almost any language.

One thing I'd like to see, unless you're an ANSI Forth purist, is a wrap-around expression stack. Stack maintenance is the bane of Forth programmers. It would be nice just to be able to abandon a non-empty stack when you're done with it, rather than having to remember how much stuff to pop off. Implementing the stack like a queue might solve this problem, in that stuff that's not needed eventually just gets pushed off the end. Also stack overflow and underflow go away -- although this would be a bad thing for debugging. An alternative would be what Postscript does with its mark and cleartomark operators. Postscript is strongly typed, though, so a mark can be unique. In Forth, you'd have to keep separate track of any that were on the stack. (It's been awhile since I've done anything with Forth, so maybe this has been addressed in a later standard.)

Anyway, good luck with the project! I'm anxious to see more! And I've already got an app (four-axis mill) I wrote years ago in Forth for Zilog's Super-8 that I'd love to port to the Propeller.

-Phil

Cliff L. Biffle
11-02-2006, 05:58 AM
Glad folks are interested.


Chip Gracey said...
I think if you could make this stand-alone, it would be especially interesting.


That's the goal. It's already standalone if you have a serial terminal. :)

I'm working with your TV and VGA drivers, trying to embed them. (As I mentioned in a previous post, the main stumbling block is propasm's lack of constant expression support.) I'm not confident that Forth will be fast enough to drive the display itself -- not at 1.4m words/sec. It'll have to be native code.

My ultimate goal involves TV, PS/2, and SD mass storage -- but I won't overcommit until I get the thing shipped.


Phil Pilgrim said...
Who's your father? Chuck Moore?


No, though they do have stories about each other.


Phil Pilgrim said...
Seriously, this looks really cool! Forth can be an excellent bridge language between the user and the metal when speed is of the essence. It's often gotten a bad rap for being unreadable and, in truth, it's very easy to write unreadable code with it. But that's true of almost any language.


Forth is the essence of what we now call "domain-specific languages." It can be evolved to look like just about anything. If a programmer doesn't put thought into it, it can become far worse than Perl.

But in skilled hands, it produces some of the most elegant code I've seen. (Yes, I would rank it with or above Smalltalk.)


Phil Pilgrim said...
One thing I'd like to see, unless you're an ANSI Forth purist, is a wrap-around expression stack.


I'm not an ANS purist, but that's pushing it. :)

It's pretty straightforward to define words that reset the stack to a previously stored location, like in Postscript. They're used in some of the try/finally type constructs for writing guard blocks.

(Incidentally, what the hell were you doing writing Postscript? You've passed even my geek threshold here.)

There are other approaches -- ANS Forth supports local variables, for example -- but in general, if stack management is becoming a chore, your word definitions are too long. This is something I keep relearning over and over, and never really applying. :)


Paul Baker said...
Too bad PForth is already taken for a name.


Hey, now. I don't think PropellerForth is so bad. :)

The directory on my disk is actually called P8x32Forth, so it can certainly be worse....

Peter Jakacki
11-02-2006, 06:26 AM
Goodonya (well done) Cliff! Yes, I had started to write a token-threaded Forth, but alas I have been moving and setting up this past month or so. Not only that, but where I am is not broadband enabled (maybe wireless), and connecting to the net at 28.8K is painful. I have written Forths for many 8/16/32-bit processors and most of my high-level embedded apps are written in it. Can we work together on developing this Forth? I am looking at making the spinning 8-bladed propeller doohickey go completely PC-less and run development totally from the PS/2 keyboard and VGA with SD card storage.


*Peter*

Cliff L. Biffle
11-02-2006, 06:40 AM
Peter Jakacki said...
Can we work together on developing this Forth?


When PropellerForth is released (after I do some more testing and iron out some kinks), it will be under an open source license, and hosted on one of the collaborative development sites (probably Google Code, but possibly SourceForge). At that point, I'm happy to accept enhancements, bugfixes, additions, and the like from anyone on the 'net.

I will be passing out prerelease archives on the forums shortly, and I encourage you (and everyone else) to bang on them and tell me what I screwed up. :)

mike101video
11-02-2006, 06:54 AM
Cliff,
This will be wonderful. I have always liked the interactive development in Forth.

Really looking forward to getting my hands on it.

Mike

vandys
11-02-2006, 09:25 AM
I'm a hard core Forth'er (see www.forthos.org)... I think this would be awesome to shake out!

Phil Pilgrim (PhiPi)
11-02-2006, 10:44 AM
Cliff L. Biffle said...
Incidentally, what the hell were you doing writing Postscript?

For a time, many years ago, I was blessed with a Postscript laser printer and cursed with a DOS computer, while all the Cool People had those new-fangled Macs. So programming Postscript was the only way I could get decent graphical output. I still use it occasionally to produce code wheels for encoders and the like.

-Phil

CCraig
11-02-2006, 08:28 PM
I'm normally on the BS2 side. I think this could be a great thing to draw people over to the Propeller. Forth would be a great addition to the tools. Sign me up.

Chris

Cliff L. Biffle
11-06-2006, 12:41 PM
Brief status update:

I've got most of the useful words from the ANS core set working, and have been doing a lot of reworking and simplification.

What Works:
- Interpreted interactive exploration (including defining words)
- Most necessary primitives (including hub operations and doubleword multiplication)
- Host "compilation" of words into the image (really just a Ruby preprocessor to assembler, I'm not using a Forth cross-compiler).
- Most interesting Propeller-specific functions (I/O, control of the counters, Cog control)
- A small subset of the ANS String word set.

What Doesn't:

The main stumbling block at this point is storage. I don't have any extra EEPROMs strung off the Propeller and the SD module I wanted (at SparkFun) is out of stock. Without mass storage, there's no realistic way to store, compile, and recompile the sources on the Propeller itself -- so this is taking priority over other self-hosting issues (such as keyboard and TV).

I'm considering writing elementary serial file transfer support (XMODEM or ZMODEM) and faking it that way, but we'll see.

Other elementary things that aren't working include:
- Multitasking, either within a Cog or across Cogs. (Not all the USER vars are being accessed through USER currently, so I can't separate tasks. This should be fixed this evening; I'm most of the way there.)
- Handy programming tools like DUMP, FORGET, and SEE.
- Direct EEPROM access.


I'm doing this primarily for fun and to learn the architecture, so I'm hesitant to involve others too early; however, I encourage interested parties to contact me. I'll have code for testing soon.

Phil Pilgrim (PhiPi)
11-06-2006, 01:05 PM
Cliff,

Just out of curiosity, how are you storing dictionary symbols? The first three letters and a length, or the whole enchilada?

Also, my own preference as a potential user would be to forget the Propeller-resident editing and file ops for now (which I'd likely never use), in favor of a complete and rich set of native words, including strings and conversions.

Thanks,
-Phil

Cliff L. Biffle
11-06-2006, 01:46 PM
Phil Pilgrim said...
Just out of curiosity, how are you storing dictionary symbols?


Length+7, case preserved (though user input is case-insensitive). My older Forths tended to use length+3, but I kept running into conflicts; a lot of Forth-83s used Length+7 and I think I've got the RAM to spare here.
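
A quick way to see what "length+N" storage buys you is to model the name field in Python. This is only an illustrative sketch: name_field and same_word are invented helpers, and LOCKSET is used purely as an example of a plausible neighbor for LOCKRET, not as a claim about what is in the image.

```python
def name_field(name, kept=7):
    """Model a dictionary name field: true length plus the first `kept` chars.

    Lookup is case-insensitive, so the comparison key is upper-cased here.
    """
    return (len(name), name[:kept].upper())

def same_word(a, b, kept=7):
    """Two names are indistinguishable if their stored fields match."""
    return name_field(a, kept) == name_field(b, kept)

print(name_field("milliseconds"))
print(same_word("LOCKRET", "LOCKSET", kept=3))  # length+3 cannot tell these apart
print(same_word("LOCKRET", "LOCKSET", kept=7))  # length+7 can
```

Two names only collide when they have the same length and share the same leading characters, which is why going from 3 kept characters to 7 removes most conflicts.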

I am currently storing all symbol information in the dictionary, but I plan to split it out so an image can optionally be stripped. (My embedded applications rarely use EVAL or INTERPRET.)


Phil Pilgrim said...
Also, my own preference as a potential user would be to forget the Propeller-resident editing and file ops for now (which I'd likely never use), in favor of a complete and rich set of native words, including strings and conversions.


Well, that's certainly where I'm headed now, but it's mostly for lack of spare parts. :)

We currently have decent String support (.", S", COMPARE, and some other goodies). I'm eschewing counted-string support in favor of the c-addr +u strings that the ANS folks like -- of course, any user could add counted string support trivially.

I don't have any of the pictured numeric output words working (<# #>); I only recently got division working, so base conversion and formatting were pretty far down on the list. :)

So, keeping in mind that the only way to compile Forth properly is Forth, how would you want to do your development? I can certainly document the dictionary layout and primitives, which would allow you (or others) to write a hosted cross-compiler in some other Forth.

Phil Pilgrim (PhiPi)
11-06-2006, 02:58 PM
Cliff,

The way I'd prefer to develop for a system like this is to use an editor like UltraEdit. If you had a shell command which then uploaded the raw source to the Propeller (e.g. fterm -f mysource.forth), your resident firmware could compile it as it came in. fterm would dump the stuff that comes back to STDOUT (which UltraEdit captures). Once uploading is complete, it could pop up an interactive terminal window (maybe only if a -t modifier were present on the command line) for testing words, etc. Anything entered or received would also get dumped to STDOUT.

Forth itself would have to have a built-in word to transfer the compiled image to EEPROM. That way it could be done interactively in the terminal window. Or the "store" command could be placed at the end of the source code.

-Phil

Post Edited (Phil Pilgrim (PhiPi)) : 11/6/2006 8:02:26 AM GMT

Cliff L. Biffle
11-06-2006, 11:00 PM
Phil Pilgrim said...
The way I'd prefer to develop for a system like this is to use an editor like UltraEdit.


UltraEdit?! Holy crap, is it still around? Neat! I actually remember it from back when I was on Windows!

I'm a vi guy myself for everything but Java (mmm, Eclipse), so I can't blame you.


Phil Pilgrim said...
If you had a shell command which then uploaded the raw source to the Propeller (e.g. fterm -f mysource.forth), your resident firmware could compile it as it came in. fterm would dump the stuff that comes back to STDOUT (which UltraEdit captures).


This is pretty much the hack I was proposing when I referenced XMODEM, though I was going to use an actual transfer protocol (with error correction and such) rather than just piping it.

I could hide that behind the script, of course. I'll see what I can do.


For me, of course, I'd rather write all my code in a Forth, even if it means I have to give up vi. Filing in Forth sources can have a lot of unintended side effects, since the source includes executable commands (or, really, is executable commands). If there's a syntax error halfway through, you can't roll the previous changes back.

(Sure, there's FORGET, but that's not complete. This pickiness probably comes from my day job working on transactional systems.)

Creating a word at a time in Forth helps prevent this.

rokicki
11-07-2006, 12:48 AM
I live very close to you (I am in Palo Alto) and I would be delighted to lend you my SD card setup.
I would even be happy to contribute in some small way to the programming, if you felt that would
be worthwhile.

The SD card setup is just the SparkFun socket with a .1" header soldered to it so it can be plugged
in to a breadboard, along with all the resistors needed to make it all fly on the demo board.

You can email me at rokicki at gmail.com if you are interested.

(Actually at some point it might be interesting to do a Bay Area Propeller Group or some such,
maybe in conjunction with the robot club or maybe separately.)

Phil Pilgrim (PhiPi)
11-07-2006, 12:53 AM
Cliff,

I think XMODEM would be overkill here and waste too many precious Propeller resources. But I do get your point about unintended side effects. One bad word, and you get a whole cascade of errors. Perhaps the PC-resident uploader could be made just a little bit smart and stop uploading when it saw the first error. Or define a Forth word that sets a flag that terminates compiling and enters flush mode on the first error.

In this environment, I'm not terribly concerned about rolling changes back, or even using FORGET, since I can just make my corrections in UltraEdit and upload the whole thing again. If I wanted to be interactive, I'd just run fterm -t without the -f and enter stuff by hand.

The main point, I guess, is to keep from having to store any raw source in the Propeller. You'll run out of room way too soon. Hence the upload-with-compile-on-the-fly suggestion. This is the way I developed with Forth on the Zilog Super8 (may it rest in peace), and it worked quite well.

-Phil

Cliff L. Biffle
11-07-2006, 03:22 AM
rokicki said...
I live very close to you (I am in Palo Alto) and I would be delighted to lend you my SD card setup.
I would even be happy to contribute in some small way to the programming, if you felt that would
be worthwhile.


Rock on. Which SD board do you have -- the raw breakout board, or the DOSonCHIP FAT controller? (I'm looking at the latter; all the automation of the GHI controllers, with a less stupid protocol. I'm trying not to have a FAT implementation on the Propeller if I can help it.)


rokicki said...
(Actually at some point it might be interesting to do a Bay Area Propeller Group or some such,
maybe in conjunction with the robot club or maybe separately.)


I'd be interested.

Cliff L. Biffle
11-07-2006, 03:31 AM
Phil Pilgrim said...
I think XMODEM would be overkill here and waste too many precious Propeller resources.


XMODEM is remarkably simple, too much so for most purposes. It's either that, or reinvent it (write my own protocol with some basic handshaking and CRCs).
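
To give a sense of how little there is to it, here is a sketch of classic (checksum-mode) XMODEM framing in Python. It covers only building and checking a single 128-byte packet; the ACK/NAK handshake and retries are left out, and the function names are made up for this example.

```python
SOH = 0x01   # start-of-header byte for a 128-byte packet
SUB = 0x1A   # conventional padding byte for a short final packet

def xmodem_packet(block_number, payload):
    """Frame up to 128 bytes as one checksum-mode XMODEM packet."""
    if len(payload) > 128:
        raise ValueError("payload must fit in one 128-byte block")
    data = payload.ljust(128, bytes([SUB]))
    seq = block_number & 0xFF              # block numbers wrap at 256
    checksum = sum(data) & 0xFF            # simple 8-bit sum of the data bytes
    return bytes([SOH, seq, 0xFF - seq]) + data + bytes([checksum])

def packet_ok(packet):
    """Receiver-side sanity check of a packet built as above."""
    return (
        len(packet) == 132
        and packet[0] == SOH
        and packet[1] + packet[2] == 0xFF
        and sum(packet[3:131]) & 0xFF == packet[131]
    )

pkt = xmodem_packet(1, b": SQUARE DUP * ;\n")
print(len(pkt), packet_ok(pkt))
```

The whole frame is three header bytes, 128 data bytes, and one checksum byte, so the per-packet cost on the receiving side is a running sum and a couple of compares.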

Using XMODEM would also let Windows users use HyperTerminal or whatever, and allow Unix folks to use standard tools. (Why anyone uses HyperTerminal for anything is beyond me, I admit.) I'm not super interested in writing a cross-platform terminal emulator -- or even a platform-specific one -- when there are so many good open source ones available. I'd rather just use Minicom.

The scheme I'm working on would have the PC upload a chunk of source, and the Propeller compile it; repeat as necessary. I'm using the word "chunk" to distinguish it from Forth's traditional "block" -- this would not be a 64x16 grid of characters, but rather a text segment with normal line endings.

At the most granular level, this would be a single colon def, variable definition, etc. More realistically, individual source files of a kilobyte or three could be sent. A simple script could automate your transfer program to upload a directory of source files in a defined order.

Phil Pilgrim (PhiPi)
11-07-2006, 04:00 AM
Cliff,

I assume, even with the XMODEM option, that a straight-ASCII, line-oriented, compile-on-the-fly mode will still be there by default, since this is Forth's normal mode of operation — right? Will the compiler be fast enough not to require serial handshaking at, say, 9600 baud, if it were being force-fed?

-Phil

Cliff L. Biffle
11-07-2006, 05:18 AM
Phil Pilgrim said...
I assume, even with the XMODEM option, that a straight-ASCII, line-oriented, compile-on-the-fly mode will still be there by default, since this is Forth's normal mode of operation — right?


Yes -- in fact, this works right now, though it has a couple deviations from ANS that I'm working on.

Currently the terminal is 19200 8n1, and works just fine. I've tried pasting sources into HyperTerminal (to simulate what most users will experience) and it mangles them really, really badly. I blame HyperTerminal; I'll try it in minicom when I get home to the hardware.

Feeding sources this way (from a competent terminal emulator) should be possible, but like most traditional Forth compilers, mine takes O(size of dictionary) time per lookup -- each lookup must traverse a linked list. So, you might want to introduce a brief delay between lines (a few bytes worth of bit periods should be fine).
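
The cost of that lookup is easy to picture with a small Python model of a linked-list dictionary. This is a sketch under assumptions: Entry, define, and find are invented names, and the real dictionary layout is not documented in this thread. The search starts at the newest definition and follows link fields back toward the oldest.

```python
class Entry:
    """One dictionary header: a name plus a link to the previous definition."""
    def __init__(self, name, link):
        self.name = name
        self.link = link

def define(latest, name):
    """Add a word; the new entry links back to the previous latest entry."""
    return Entry(name.upper(), latest)

def find(latest, name):
    """Walk from the newest word toward the oldest; return (entry, steps)."""
    wanted = name.upper()
    steps = 0
    entry = latest
    while entry is not None:
        steps += 1
        if entry.name == wanted:
            return entry, steps
        entry = entry.link
    return None, steps

latest = None
for word in ["NEXT", "DUP", "SWAP", "HZ"]:
    latest = define(latest, word)

print(find(latest, "hz")[1])      # newest word: found on the first step
print(find(latest, "next")[1])    # oldest word: the whole list is walked
print(find(latest, "nosuch"))     # a miss also walks the whole list
```

A miss is the worst case, and every number in the source is a miss before it is tried as a number, which is part of why a long dictionary slows compilation down.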

I'm going to play with this more tonight when I'm home, and I'll post results. (And also possibly pass out binaries.)

Cliff L. Biffle
11-07-2006, 07:27 AM
So, an update to what I said before:

Currently, files can be "typed" into PropellerForth by using your terminal emulator's "send as ASCII" feature. The compiler will process your file a line at a time, storing no more than 80 characters in RAM at any point. This lets you build up the dictionary in-place without any mass storage.

It's less than ideal, but it works.

There's one main caveat: the terminal routines are in Forth, have not been optimized, and are not terribly fast. At 19200bps (the default terminal speed), you'll want a character-to-character delay of about 1ms, and a line-to-line delay of 50ms. Most terminal emulators support this.
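
For a feel of what those delays cost, here is a back-of-the-envelope estimate in Python. It assumes 10 bits on the wire per character (8n1: start bit, 8 data bits, stop bit), one terminator character per line, and that the delays are simply added on top of the transmission time; the function name and defaults are made up for this sketch.

```python
def upload_seconds(lines, baud=19200, bits_per_char=10,
                   char_delay=0.001, line_delay=0.050):
    """Estimate time to 'type' source lines into the terminal with pacing delays."""
    total = 0.0
    for line in lines:
        chars = len(line) + 1                       # text plus one line terminator
        total += chars * (bits_per_char / baud + char_delay)
        total += line_delay                         # pause while the line compiles
    return total

# Example: 100 lines of 40 characters each (an invented workload)
source = ["x" * 40] * 100
print(round(upload_seconds(source), 2), "seconds")
```

With these settings the per-character delay costs more than the transmission itself, so speeding up the terminal routines matters more than raising the baud rate.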

Yes, this is annoying. Yes, it will be fixed.

Cliff L. Biffle
11-09-2006, 02:31 PM
As of tonight, I've squashed a number of lurking bugs and finished separating all task-specific state into Forth's USER area. This means PropellerForth now has multitasking.

Currently, it has the ability to execute no-argument words on other Cogs, but I'm working on cooperative multitasking within the Cogs as well. (They're based on the same mechanism, after all.)

In terms of ANS Forth, I've implemented most of the Core words, many of the Core Ext words, all of the Exception words, and a number of the Programming Tools words.

Here's a binary suitable for loading with Propeller Tool. It will bring up a serial console (19200, 8n1) on the standard pins (31 and 30, as used on the demo board).

Demo board users should see some LEDs come on and a prompt. Users of other boards beware: PropellerForth currently takes great liberties with pins 16-23, which drive debug LEDs on the demo board. These would be bad pins to use for, say, your atomic death ray.

I'll post some documentation soon, and work on cleaning up the sources for release. In the meantime, to help folks see results, here's some demo code to try. A note on the non-standard Propeller words:
- All the Propeller registers (CNT, CTRA, etc.) are defined as Forth words that return the in-Cog address.
- In-Cog addresses like this can be read with L@ and written with L!, by analogy to @ and ! for the shared RAM. (L is for Local, because C for Cog was taken.)

Paste this into your terminal emulator, or into a text file and send it as raw ASCII. Remember that my terminal code sucks, so set a character-to-character delay of at least 2ms, and an end-of-line delay of at least 50ms.




\ Sets up CTRA to generate audible frequencies.
\ Run before any of the following words.
: sound-init
1 10 lshift dira L@ or dira L! \ make pin 10, the left sound channel, an output
4 26 lshift 10 or ctra L! \ set counter A to pin 10, NCO mode
0 frqa L! ;

\ Beeps for 1/10sec. Takes a counter increment, not Hertz.
\ Convert Hertz to counter increment using the HZ library word, like this:
\ 440 HZ BEEP
: beep ( counter-increment -- )
frqa L!
100 milliseconds wait
0 frqa L! ;

\ Plays a short note.
: note ( frequency -- )
HZ beep
400 milliseconds wait ;

: rest 500 milliseconds wait ;

: sunshine
784 note
1046 note
1174 note
1319 note
rest
1319 note
rest rest
1319 note
1174 note
1319 note
1046 note
rest
1046 note ;




Once that's all entered, attach speakers and try:




sound-init sunshine




Note: it will be loud. I'm not doing PWM yet, as you can see.
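
For anyone curious about the numbers behind sound-init and HZ: in NCO mode the counter adds FRQA to PHSA every clock and drives the pin from PHSA bit 31, so the output frequency is FRQA * clkfreq / 2^32. The Python sketch below shows the arithmetic. It assumes an 80 MHz clock, and hz_to_frqa is only a guess at what the HZ word computes, since its source hasn't been posted.

```python
def hz_to_frqa(hz, clkfreq=80_000_000):
    """Counter increment for an NCO output of `hz` (assumed behavior of HZ)."""
    return (hz << 32) // clkfreq

def frqa_to_hz(frqa, clkfreq=80_000_000):
    """Inverse: the frequency an NCO-mode counter produces for a given FRQA."""
    return frqa * clkfreq / 2**32

def ctra_nco(pin):
    """CTRA value matching '4 26 lshift <pin> or': mode %00100 (NCO), APIN = pin."""
    return (0b00100 << 26) | pin

print(hz_to_frqa(440))
print(hex(ctra_nco(10)))
```

Because the increment is truncated to an integer, the actual tone is very slightly flat of the requested frequency, far less than anyone would hear.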

Gavin
11-09-2006, 07:34 PM
Going to need to relearn Forth.
Last time I was interested was when the Harris Forth chip came out.
Looking forward to Forth on the Prop.
A lot of the early space stuff was written in Forth; it seemed to work without crashing into things.

Hmm, Prop robot running Forth?
And the price has dropped on the chip too.
Hmm, ballbot with Prop controller?

Peter Jakacki
11-09-2006, 07:54 PM
Well Cliff, after I saw that you had released something I could actually play with I just had to try it even though I'm still in the middle of unpacking and setting up my office. The fact that I can actually write some code and examine the prop guts is great. I couldn't find WORDS plus some other stuff so I wrote a couple of Q&D words and here is the word list. I'll play with it some more when I have some time and BTW I can contribute some code, especially the SD card file system etc if you like.

*Peter*




1BAA: SECONDS 1B84: HZ 1B6C: MILLISE 1B58: REBOOT
1B42: LOCKRET 1B2C: COGSTOP 1B14: COGID 1AFE: CLKSET
1AD4: LEDEMIT 1AC2: SECOND 1AB4: VSCL 1AA6: VCFG
1A98: PHSB 1A8A: PHSA 1A7C: FRQB 1A6E: FRQA
1A60: CTRB 1A52: CTRA 1A44: DIRA 1A36: OUTA
1A28: INA 1A1A: CNT 1A0C: PAR 19FE: 'CLKFRE
19A6: COLD 194E: FIND 192C: .S 1900: THROW
18D2: CATCH 18BC: \ 18A6: ( 1882: S"
1856: (S") 183A: ." 1812: (.") 17F8: +LOOP
17DE: LOOP 17C8: DO 17AE: REPEAT 1798: WHILE
1782: AGAIN 1772: BEGIN 1758: ELSE 1748: THEN
1732: IF 171C: .( 16FA: ' 16B8: POSTPON
169A: !COLON 1678: !CONSTA 1660: ['] 1648: ;
1634: CONSTAN 161E: : 15A2: CREATE 1580: IMMEDIA
155E: REVEAL 153C: HIDE 14E6: NAME-MA 14BE: 3DUP
1468: COMPARE 140E: CASE-CO 13EA: >LOWERC 138C: QUIT
136C: ?REFILL 1306: INTERPR 12E0: DIGIT>N 1256: >NUMBER
11F6: NUMBER? 11AC: DIGIT? 1178: MAX-DIG 114A: BETWEEN
111A: PARSE-W 1102: [CHAR] 10EC: CHAR 1086: PARSE
1010: ] 0FFA: [ 0FE6: ABORT 0FD0: ?DUP
0FAA: ACCEPT2 0EFE: ACCEPT 0E94: . 0E7C: RUBOUT
0E64: PAD 0E36: STRING, 0E12: XALIGNE 0DFE: HALIGNE
0DEA: ALIGNED 0DD4: HALIGN 0DBE: ALIGN 0D94: DIGIT
0D68: TYPE 0D54: <RESOLV 0D44: MARK> 0D2C: <MARK
0D16: RESOLVE 0D02: NFA>LFA 0CEE: NFA>CFA 0CDA: CFA>NFA
0CCA: COMPILE 0CB0: C, 0C96: H, 0C7C: ,
0C64: ALLOT 0C52: HERE 0C40: SPACE 0C20: PARSE-A
0C0C: SOURCE 0BF6: DECIMAL 0BE0: HEX 0BD2: BL
0BC0: S>D 0BB0: D>S 0B92: /MOD 0B80: /
0B6E: <> 0B5C: TUCK 0B42: +! 0B2E: CR
0B1C: * 0B0A: 2DROP 0AF8: 2DUP 0AE8: HALFCEL
0AD4: CELLS 0AC6: CHARS 0AB6: TIB 0AA6: HANDLER
0A96: FIRST-W 0A86: RSP0 0A76: SP0 0A66: SOURCEC
0A56: >IN 0A46: STATE 0A34: BASE 0A26: FALSE
0A18: TRUE 0A00: PAGE 09E6: DP 09D8: TIBSIZE
09CA: 'KERNEL 09BA: LATEST 09AA: BYE 099E: HUBOP
0992: WAIT 0986: RSP@ 097A: SP@ 096E: U0
0962: RSP! 0956: SP! 094A: I 093E: (+LOOP)
0932: (LOOP) 0926: BRANCH 091A: 0BRANCH 090E: ?BRANCH
0902: 1- 08F6: 1+ 08EA: (/MOD) 08DE: M*
08D2: U< 08C6: 0= 08BA: 0< 08AE: >
08A2: = 0896: < 088A: INVERTA 087E: INVERT
0872: OR 0866: AND 085A: MIN 084E: MAX
0842: RSHIFT 0836: LSHIFT 082A: NEGATE 081E: 2/
0812: 2* 0806: - 07FA: + 07EE: KEY
07E2: EMIT 07D6: L@ 07CA: L! 07BE: @
07B2: H@ 07A6: C@ 079A: ! 078E: H!
0782: C! 0776: 2>R 076A: R@ 075E: R>
0752: >R 0746: NIP 073A: ROT 072E: OVER
0722: DROP 0716: DUP 070A: SWAP 06FE: DOVAR32
06F2: DOVAR 06E6: DOUSER 06DA: DOCON 06CE: EXECUTE
06C2: LIT16 06B6: LITERAL 06AA: EXIT 069E: ENTER
0692: NEXT

Ym2413a
11-09-2006, 10:53 PM
Come on now, I always connect my atomic death ray to pins 16 through 23. *laughs*
I ran your code and it blew up my house by misfiring evil rays of death from my demo board.

Forth looks cool!
The sad thing is I never programmed in it before, nor do I know its syntax.
But I'd be excited to see some Propeller-IDEs.

Cliff L. Biffle
11-09-2006, 11:14 PM
Peter Jakacki said...
I couldn't find WORDS plus some other stuff so I wrote a couple of Q&D words and here is the word list.


Yeah, WORDS and a few others are part of a source file I'm manually loading each time I flash the board, as a test of the compiler. So they're not in the standard image. (FORGET is missing too, you'll note.)

I'm absolutely interested in whatever help I can get. I've been hoping to do most of the kernel fundamentals myself, since it's been a while since I've written one and it's about as far as I can get from my day job -- but that's mostly done now, so I'll be releasing what I've got, and I'd love your help.

I'm actually pretty impressed that you put together WORDS without any docs. Can you post your source?

Cliff L. Biffle
11-09-2006, 11:15 PM
Ym2413a said...
Come on now, I always connect my atomic death ray to pins 16 through 23. *laughs*
I ran your code and it blew up my house by misfiring evil rays of death from my demoboard.


Dude! What'd I tell you?!

Nobody listens. :)

Edit: Fixed some markup problems. mCode is silly. Why every CMS on the internet needs its own markup language is beyond me.

Cliff L. Biffle
11-09-2006, 11:46 PM
So, since a couple folks seem to have actually downloaded that image, here are some very brief docs on the Propeller bits.

Remember that PropellerForth is case-insensitive for user input, so these words don't have to be entered in SCREAMING UPPERCASE.

( Edit: My snarky comment about every forum needing their own markup language? Well, one of the advantages of a standard one is that I know which HTML tags stop my lines from wrapping. The code tag here apparently does too. Fixed. )



Local register words: PAR CNT INA OUTA DIRA CTRA CTRB FRQA FRQB PHSA PHSB VCFG VSCL
Local register/memory access words: L@ L!
Shared RAM halfcell (16-bit) access words: H@ H!
(PropellerForth uses halfcells for execution tokens and link fields in the dictionary; it's the size of a memory address.)

WAIT ( u -- )
Waits for some number of cycles (plus some execution overhead). Uses waitcnt under the hood,
but is not directly analogous (the argument is a cycle count, not an absolute clock value).

'CLKFREQ ( -- a-addr )
Pushes the address of the clock frequency in shared RAM (access with @ and !)

SECOND ( -- +u )
Pushes the number of clock ticks in a second. This will be renamed to /SECOND in a later release.
Uses 'CLKFREQ so you don't have to.

LEDEMIT ( c -- )
Pops a value and displays its low-order eight bits on the debug LEDs. Many words in the dictionary
use this internally at the moment.

INVERTAND ( n1 n2 -- n3 )
A Forth equivalent of the Propeller's much-loved andn instruction.

HUBOP ( value command -- result )
Executes a hub operation (such as the LOCK* and COG* ops). Used as a primitive behind many
other words. Does not expose the value of C after the op (which some ops, like lockset and cognew,
require -- grrrrr).

CLKSET ( clkval -- )
Sets the system clock register; specifics on p.28 of the Propeller Manual. Uses Hubop.

COGID ( -- cogid )
Pushes the ID of the current Cog, which is a small integer. Uses Hubop.

COGSTOP ( cogid -- )
Kills a cog. The sequence "COGID COGSTOP" will shut down the current core.

REBOOT ( -- )
Reboots the entire chip, reloading code from EEPROM. This is your new favorite word. When you
hose the dictionary or nuke the interpreter code by poking around in RAM, this word will save you --
if you can still run it. Otherwise, there's always the reset button.

SECONDS ( n1 -- n2 )
Converts from seconds to clockticks, for use with WAIT or the counters. Usage: " 5 seconds "

MILLISECONDS ( n1 -- n2 )
Converts from milliseconds to clockticks, for use with WAIT or the counters. Usage: "500 milliseconds"

HZ ( n1 -- n2 )
Converts a value from Hertz to a counter increment. Usage: " 440 HZ "

'KERNEL ( -- a-addr )
Pushes the shared-RAM address of the Forth native code kernel. This is a useful building block for
spawning processes on other cores. (Though to do it properly, you must also know the layout of the
USER area; see below).
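For readers who want to sanity-check the unit-conversion words (SECONDS, MILLISECONDS, HZ) off the chip, here is a rough Python model of the arithmetic. It is a sketch under assumptions, not the kernel's code: the 80 MHz clock is assumed (it is consistent with the cycle/us figures quoted later in the thread; on hardware the real value lives at 'CLKFREQ), and the HZ formula assumes an NCO-style counter whose 32-bit phase accumulator gains the increment once per clock.

```python
# Assumption: 80 MHz system clock. On the chip, read the real value via 'CLKFREQ.
CLKFREQ = 80_000_000


def seconds(n):
    """Seconds -> clock ticks, like `5 seconds`."""
    return n * CLKFREQ


def milliseconds(n):
    """Milliseconds -> clock ticks, like `500 milliseconds`."""
    return n * CLKFREQ // 1000


def hz(freq):
    """Hertz -> counter increment, assuming f = increment * CLKFREQ / 2**32."""
    return (freq << 32) // CLKFREQ


if __name__ == "__main__":
    print(seconds(5))         # ticks to pass to WAIT for a 5 s delay
    print(milliseconds(500))  # ticks for half a second
    print(hz(440))            # increment for a 440 Hz tone
```

Note that 5 seconds is already 400,000,000 ticks, so delays much beyond 50 seconds would not fit in a 32-bit cell at this clock rate.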





The USER area is Forth's per-task storage area (C/C++/Java programmers would call it Thread Local). PropellerForth's current USER area layout is:

Offset
$00 BASE - task's initial number base for I/O
$04 STATE - task's initial interpretation (0) or compilation (-1) state.
$08 >IN - task's initial parse position in its input - should be 0
$0C SOURCECOUNT - number of bytes initially in task's input - should be 0
$10 SP0 - pointer to base of task's data stack
$14 RSP0 - pointer to base of task's return stack
$18 FIRST-WORD - execution token (right-justified in a cell) of the first word this task runs
$1C TIB - task's terminal input buffer

All these are cell-sized variables, accessible with @ and !, except for TIB -- which is an 84-byte character buffer.
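The layout above is easy to get wrong when poking addresses by hand, so here is a small Python sketch that turns the table into field addresses. The offsets are copied from the table; the base address 0x6000 is only an example value, and the helper name is made up for illustration.

```python
# Offsets copied from the USER area table; everything else is illustrative.
USER_LAYOUT = {
    "BASE": 0x00,
    "STATE": 0x04,
    ">IN": 0x08,
    "SOURCECOUNT": 0x0C,
    "SP0": 0x10,
    "RSP0": 0x14,
    "FIRST-WORD": 0x18,
    "TIB": 0x1C,
}


def field_addr(user_base, name):
    """Return the shared-RAM address of a USER field for a given task base."""
    return user_base + USER_LAYOUT[name]


if __name__ == "__main__":
    base = 0x6000  # example USER base, an assumption
    for name in USER_LAYOUT:
        print(f"${field_addr(base, name):04X}  {name}")
```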

For the purposes of starting a new task, one must
1. Pick a block of RAM for the task's USER area. PropellerForth leaves most of high RAM, up to around $7000, free; I tend to use $6000 for the second task's USER base.
2. Pick sections for your data and return stacks. The space required will depend on your task, and no, from that other thread, we can't statically determine it. :) I tend to use $606E for the data stack and $616E for the return stack; this leaves a sizeable space for both.
3. Set at least SP0, RSP0, and FIRST-WORD, like this:



HEX
606E 6010 !
616E 6014 !
' word-you-want-to-run 6018 !




4. The USER area is passed to a new Cog in its PAR field, which is a 14-bit address in the high-order bits of coginit's argument. Using this knowledge and 'KERNEL, we can construct the argument word:



( still in hex )
6000 DECIMAL 16 LSHIFT 'KERNEL 2 LSHIFT OR 8 OR ( set the new-cog flag )



5. And start the cog. I don't yet have coginit packaged into a word, because (as you've noticed) this process is laborious and stupid. PropellerForth's COGINIT word, once it exists, will do most of this work for you.



( 2 is the hubop for coginit, as per the Propeller Manual p.366 )
2 hubop .



It'll display the ID of the cog you just started; you can stop it again by entering "the-cog-id-goes-here cogstop".
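The bit-packing in step 4 can be checked off-chip with a few lines of Python. This is a sketch of the same arithmetic as the Forth one-liner, nothing more: the kernel address used in the example (0x0010) is a placeholder, not the real value of 'KERNEL, and the function name is invented. It assumes both addresses are long-aligned, since only address bits 15..2 fit in each 14-bit field.

```python
def coginit_arg(user_addr, kernel_addr, new_cog=True):
    """Build the 32-bit coginit argument the way the Forth snippet does.

    user_addr   -> PAR field, bits 31..18 (address bits 15..2)
    kernel_addr -> code address field, bits 17..4 (address bits 15..2)
    new_cog     -> bit 3, "start the next free cog"
    """
    if user_addr % 4 or kernel_addr % 4:
        raise ValueError("both addresses must be long-aligned")
    arg = (user_addr << 16) | (kernel_addr << 2)
    if new_cog:
        arg |= 8
    return arg & 0xFFFFFFFF


if __name__ == "__main__":
    # 0x0010 is a made-up kernel address for illustration only.
    print(f"{coginit_arg(0x6000, 0x0010):08X}")
```

An unaligned address would spill its low two bits into the neighbouring field, which is why the sketch rejects it.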

Happy hacking!

Cliff L. Biffle
11-10-2006, 12:20 AM
Okay, cooked up a more interesting example of multi-Cog Forth.

It's not protein folding, but it's a little more flexible than what I posted before. This example will spawn Forth tasks on all eight Cogs (counting the console task on Cog 0) and show it by blinking LEDs from each task. It allocates 7 USER task areas starting from $6000, for a total memory footprint of slightly over a kilobyte.

First, some toolkit words for setting up and starting tasks:



\ Given the base address of a task's USER area, does initial setup.
\ This doesn't set the BASE or other fields you need to do console
\ I/O, and sets the stack sizes very small.
: init-user ( user-addr -- )
[ hex ]
dup 90 + over 10 + ! \ SP0 := user-addr + $90
dup 70 + swap 14 + ! ; \ RSP0 := user-addr + $70

\ Handy holder for the size of a task's USER area, for this example.
: task-user-size 0D0 ;

\ Given a USER address and an XT, sets the first-word field.
: set-first-word ( user-addr xt -- )
swap 18 + ! ;

\ Given a prepared USER address, spawns a new Cog running Forth.
: start-forth-task ( user-addr -- )
[ decimal ]
16 lshift 'kernel 2 lshift or 8 or
2 hubop drop ; \ HUBOP leaves the new cog's ID; discard it

\ Kills every Cog but the console Cog (0).
: killall
8 1 do i cogstop loop ;





And some demo program code:



\ Blinks an LED, 0-7, depending on which Cog we're on.
: cog-blink
1 cogid lshift ledemit \ Put a 1 in our cog ID's position
500 milliseconds wait
0 ledemit
500 milliseconds wait ;

\ Blinks forever. This is separate from the above so that
\ cog-blink can be tested.
: cog-blink-forever
begin
cog-blink
again ;

\ Given the addr of a task's USER area, sets the task up
\ to run cog-blink-forever when it starts.
: make-blink-task ( user-addr -- )
dup init-user
['] cog-blink-forever set-first-word ;

\ Spawns tasks on Cogs 1-7, offset by a quarter second.
\ Net effect: chasing LEDs.
: many-tasks
[ hex ]
7 0 do
\ Compute task address
task-user-size i * 6000 +
\ Set up the task
dup init-user
dup make-blink-task
\ Spawn
start-forth-task
100 milliseconds wait
loop ;




To use:



many-tasks




To stop:



killall
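To see where each task's memory ends up, the address arithmetic from init-user and many-tasks can be replayed in Python. The constants ($D0 per task, return stack at +$70, data stack at +$90, first task at $6000) are taken from the code above; the function and dictionary keys are illustrative names only.

```python
TASK_USER_SIZE = 0xD0   # from task-user-size
FIRST_USER = 0x6000     # base used by many-tasks
RSP0_OFFSET = 0x70      # from init-user
SP0_OFFSET = 0x90       # from init-user
TASK_COUNT = 7          # Cogs 1-7


def task_layout(index):
    """Addresses used by the task with the given loop index (0-6)."""
    base = FIRST_USER + index * TASK_USER_SIZE
    return {
        "user": base,
        "rsp0": base + RSP0_OFFSET,
        "sp0": base + SP0_OFFSET,
        "end": base + TASK_USER_SIZE,
    }


TOTAL_BYTES = TASK_COUNT * TASK_USER_SIZE

if __name__ == "__main__":
    for i in range(TASK_COUNT):
        t = task_layout(i)
        print(f"task {i}: user=${t['user']:04X} rsp0=${t['rsp0']:04X} "
              f"sp0=${t['sp0']:04X} end=${t['end']:04X}")
    print(f"total: {TOTAL_BYTES} bytes")
```

The total works out to 1,456 bytes, and the last area ends at $65B0, comfortably below the roughly $7000 limit mentioned earlier.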

Peter Jakacki
11-10-2006, 05:32 AM
Well my initial test code isn't pretty but here it is. I have to rush off and start the day but I will get back later. This is fun.

*Peter*



hex

: .hex ( char -- )
30 + dup
39 > if 7 + then
emit ;

: .byte ( byte -- )
FF and 10 /mod .hex .hex ;

: .word
FFFF and 100 /mod .byte .byte ;

: .chars ( adrh adrl -- )
do i c@
dup 7F > if 7F and then
dup 20 < if drop 2E then
emit
loop ;

\ dump hex bytes + ascii from main memory
: dump ( adr cnt -- )
over + swap
do i 0F and 0=
if cr i .word ." : " then
i c@ .byte space
i 1+ 0F and 0=
if space space i 1+ i 0F - .chars then
loop ;

\ dump words from cog memory
: ldump ( adr cnt -- )
over + swap
do i 1F and 0= if cr i .word ." : " then
i l@ dup 10 rshift .word .word space
4 +loop ;

: ftype ( addr cnt -- )
over + swap do i c@ dup 0= if 20 + then emit loop ;

\ list dictionary words
: WORDS
0 latest @
begin
8 -
\ 4 words/line
over 3 > if swap drop 0 swap cr then
dup .word ." : "
dup 1+ 7 ftype
space space swap 1+ swap
nfa>lfa h@ dup
while
repeat
2drop ;

Cliff L. Biffle
11-10-2006, 06:53 AM
Peter,

Dead-on and reasonably well-factored for "quick-and-dirty" code. My only tip would be to use cfa>nfa instead of 8 -; the dictionary layout is going to change soon.

So, with a cfa>nfa nfa>lfa pair, you can get from the XT to the link field.

The next release will include .R, which will simplify your life some (by eliminating the need for .byte). I'll also go ahead and include more of my programming words (DUMP, FORGET, and possibly SEE if I can get it working right).

Peter Jakacki
11-10-2006, 09:33 AM
Thanks Cliff. OK, now that I'm back in front of the Propeller I can clean up that bit of code and correct some mistakes too. WORDS now formats each entry so that it displays the CFA, code field, attributes, count, and of course the name.



19AE=[001D] ...04 COLD 1956=[001D] ...04 FIND
1934=[001D] ...02 .S 1908=[001D] ...05 THROW
18DA=[001D] ...05 CATCH 18C4=[001D] I..01 \
18AE=[001D] I..01 ( 188A=[001D] I..02 S"
185E=[001D] ...04 (S") 1842=[001D] I..02 ."




Next bit of code I will try to do something useful....

*Peter*



\ PBJ'S q&d CLB propeller forth extensions
hex
\ convert start and cnt to form suitable for "DO"
: bounds ( start cnt -- end start )
over + swap ;

: spaces ( cnt -- )
0 do space loop ;

: .hex ( nibble -- )
30 + dup
39 > if 7 + then
emit ;

\ Print as 2 hex digits regardless of current base
: .byte ( byte -- )
FF and 10 /mod .hex .hex ;

: .word ( 16bits -- )
FFFF and 100 /mod .byte .byte ;

: .long
dup 10 rshift .word .word ;

: .chars ( adrh adrl -- )
do i c@ 7F and
dup 20 < if drop 2E then emit
loop ;

\ dump hex bytes + ascii from main memory
: dump ( adr cnt -- )
bounds
do i 10 bounds do
i 0F and 0= if cr i .word ." : " then
i c@ .byte space
loop
2 spaces i 10 + i .chars
10 +loop ;

\ dump words from cog memory
: ldump ( wadr wcnt -- )
bounds
do i 7 and 0= if cr i .word ." : " then
i l@ .long space
loop ;

: ftype ( addr cnt -- )
bounds do i c@ dup 0= if 20 + then emit loop ;

: .head ( cfa -- nfa )
dup .word ." =[" \ print cfa
dup h@ .word ." ] "
cfa>nfa
dup c@ 80 and if ." I" else ." ." then
dup c@ 40 and if ." C" else ." ." then
dup c@ 20 and if ." S" else ." ." then
dup c@ 1F and .byte \ print count
space dup 1+ 7 ftype \ print name
4 spaces ;


\ list dictionary words
: WORDS
0 latest @
begin
\ 2 words/line
over 1 and 0= if cr then
.head
swap 1+ swap
nfa>lfa h@ dup
while
repeat
2drop ;
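For anyone following along without a Forth background, the digit-printing words (.hex, .byte, .word) and the attribute decoding in .head translate to Python roughly as below. This is a reading aid, not a port: the function names are invented, and the meaning of the I/C/S flag bits is taken purely from the masks in the code above.

```python
def hex_digit(nibble):
    """Like .hex: add '0', then skip the 7 characters between '9' and 'A'."""
    code = 0x30 + nibble
    if code > 0x39:
        code += 7
    return chr(code)


def hex_byte(value):
    """Like .byte: /MOD by 16, high nibble printed first."""
    high, low = divmod(value & 0xFF, 0x10)
    return hex_digit(high) + hex_digit(low)


def hex_word(value):
    """Like .word: /MOD by 256, high byte printed first."""
    high, low = divmod(value & 0xFFFF, 0x100)
    return hex_byte(high) + hex_byte(low)


def decode_header(byte):
    """Like the middle of .head: three flag letters, then the 5-bit count."""
    flags = "".join(
        letter if byte & mask else "."
        for letter, mask in (("I", 0x80), ("C", 0x40), ("S", 0x20))
    )
    return flags + hex_byte(byte & 0x1F)


if __name__ == "__main__":
    print(hex_word(0x19AE), decode_header(0x04))
    print(hex_word(0x18C4), decode_header(0x81))
```

The two example header bytes reproduce the "...04" and "I..01" columns seen in the sample listing.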

cgracey
11-10-2006, 10:17 AM
Cliff,

It looks like your Forth is really working well. I don't know much about Forth, but it looks like the kind of thing that I'd really like -- terse and RPN (cuts to the chase, doesn't it?). Are compiler optimizations even possible in such a direct language? It looks very intriguing. I think Chuck Moore had the right idea here.

I was reading earlier today about a new chip called SEAForth24. It has 24 processors which execute 1 billion native Forth instructions each, per second. I think I was reading that it comes in a 240-pin BGA and costs under $20 at 1K units. It seems aimed at very high-volume apps. If I remember from a while back, Chuck Moore designed this chip using his own IC layout tools written in... Color Forth! It was interesting how he used his hierarchical language to make a hierarchical chip. He made GDSII (Gerber for chips) tiles which were like macros that he could arrange hierarchically. Outputting the database was no thought for his tools. It seems Chuck Moore is all about identifying redundancies and eliminating them.

Anyway, I'm anxious to see how your Forth goes.



▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


Chip Gracey
Parallax, Inc.

Lawson
11-10-2006, 12:03 PM
Huh, that SEAForth24 is an interesting chip. One aspect I find especially interesting is that it uses asynchronous digital logic (or, more accurately, local/self-clocking logic). The posted power consumption is pretty dang low for a chip with the equivalent of a ~1GHz clock! I wonder how well this type of clocking would adapt to a Propeller-style shared-memory interconnect? I wonder how they get the chip to work in applications that need precise pacing? (I didn't see anything like the Prop's CNT register.)

Anywho...

Marty

Cliff L. Biffle
11-10-2006, 02:07 PM
People are rediscovering Forth chips all over the place. I read three papers last night alone on the subject, released in the past year.

10 years of silence and they come back...what changed? FPGAs. One of the papers included the complete Verilog for a Forth core inline in the paper, in true Literate Programming style. It was very impressive and makes me want to dust off my Verilog. (FPGAs are so much more capable these days; cores will be the next source to open.)

Chuck's design tools are neat, but the next wave of chips will come from free tools, I promise you that.


Chip, as to Forth:

Forth tends to be terse, but in the hands of a capable programmer it's also one of the most readable languages out there -- because you can conform the entire language to your problem domain. Want infix, object-orientation, auto-vectorization? No sweat.

Forth optimizations are a well-explored field, and are quite straightforward. They tend to fall into a few discrete areas:
- Finding commonly repeated word sequences (swap + over, for example) and synthesizing them into a native-code primitive automatically (so-called superinstructions).
- Inlining short words to save on procedure-call overhead.
- Compiling "hot" words to native code, often on the fly.
- Stack analysis to convert the code to a register-to-register form, on architectures where that makes sense.

If you combine these techniques, you can compile the entire Forth system to native code, with performance rivaling a good C compiler. These have been available in production systems since the 1980s, and I've used them in other (non-Forth) VMs in the past. (Smalltalk, specifically, which is also a language that allows the programmer to change any aspect on-the-fly, making traditional optimizations difficult.)
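As a rough illustration of the first item in the list (superinstructions), here is a Python sketch that counts adjacent word pairs across a set of definitions and reports the ones that repeat. It is a toy: real systems work on compiled threaded code and weigh execution frequency, not just static counts, and the sample definitions below are invented for the example.

```python
from collections import Counter


def superinstruction_candidates(definitions, min_count=2):
    """Return (pair, count) for adjacent word pairs seen at least min_count times.

    definitions maps a word name to the list of words in its body.
    """
    pairs = Counter()
    for body in definitions.values():
        pairs.update(zip(body, body[1:]))
    return [(pair, n) for pair, n in pairs.most_common() if n >= min_count]


if __name__ == "__main__":
    # Hypothetical word bodies, for illustration only.
    sample = {
        "bounds": ["over", "+", "swap"],
        "ftype": ["over", "+", "swap", "do", "emit", "loop"],
        "third": ["dup", "over", "+"],
    }
    for pair, count in superinstruction_candidates(sample):
        print(count, " ".join(pair))
```

A pair such as "over +" that shows up in several definitions would be a candidate for a single native-code primitive.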

On the Propeller, superinstructions are doable -- I've got nearly 200 longs of Cog RAM free in the current kernel. Native compilation is harder, since we can't execute directly from shared or external RAM. I am already working on inlining.

(I've worked with code to page short native-code sections in and out of Cog RAM, but Forth-style threaded interpreters are already very fast. The copy operation is invariably slower than executing an interpreted form -- and threaded code takes up vastly less storage space. I've discussed some potential optimizations for PropellerForth on my tech blog, and there'll be more to come.)

Cliff L. Biffle
11-10-2006, 02:22 PM
So, I didn't read my own docs -- the definition of UNTIL I provided is wrong, and will make your system do annoying things.

Here's one that works.




\ Skips back to a matching BEGIN unless 'flag' is true
: UNTIL ( flag -- )
['] 0branch compile,
<resolve ; immediate




This will be included in the next PropellerForth release, but for now, that should do 'ya.
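To see what such a definition does at compile time and at run time, here is a toy model in Python. It is not PropellerForth's compiler: the tuple encoding and the three primitives are invented for illustration. BEGIN remembers the current code position, UNTIL compiles a 0branch back to it, and 0branch jumps only when the flag it pops is zero.

```python
def compile_words(tokens):
    """Compile a token list into (op, arg) pairs, resolving BEGIN ... UNTIL."""
    code, begins = [], []
    for tok in tokens:
        if tok == "begin":
            begins.append(len(code))                # remember the loop start
        elif tok == "until":
            code.append(("0branch", begins.pop()))  # branch back if flag is 0
        else:
            code.append((tok, None))
    return code


def run(code, stack):
    """Execute compiled code against a data stack (a Python list)."""
    ip = 0
    while ip < len(code):
        op, arg = code[ip]
        ip += 1
        if op == "0branch":
            if stack.pop() == 0:
                ip = arg
        elif op == "1-":
            stack.append(stack.pop() - 1)
        elif op == "dup":
            stack.append(stack[-1])
        elif op == "0=":
            stack.append(-1 if stack.pop() == 0 else 0)
        else:
            raise ValueError(f"unknown word: {op}")
    return stack


if __name__ == "__main__":
    program = compile_words("begin 1- dup 0= until".split())
    print(program)
    print(run(program, [3]))  # counts down until the top of stack reaches 0
```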

Bill Henning
11-10-2006, 03:28 PM
Cliff, you wrote:

"On the Propeller, superinstructions are doable -- I've got nearly 200 longs of Cog RAM free in the current kernel. Native compilation is harder, since we can't execute directly from shared or external RAM. I am already working on inlining."

Actually... time for me to fess up.

Check the new thread I'm posting in a minute or two "Large memory model for Propeller Assembly language programs"

Cliff L. Biffle
11-12-2006, 09:01 AM
Folks, PropellerForth now supports multitasking "both ways" -- running Forth words on multiple Cogs, and cooperative multitasking on a single Cog.

Each task requires 40 bytes of shared RAM to hold some state, plus stacks (plus another 80 for tasks that talk to the terminal). Only 4 of these bytes are specific to the task switcher.

I demonstrated running Forth code on multiple Cogs earlier, and will post the cooperative task code shortly. In summary form, the system is quite similar to most RTOSes:
- Tasks are linked together into task cycles (mostly transparently to the user), which are scheduled round-robin.
- Control is passed from task to task using the traditional Forth word PAUSE. (If the standard library is made task-aware it'll also be passed at any blocking operation, like some I/O primitives.)
- The code is entirely in Forth, so task switches take several microseconds (see numbers below). This could be optimized or moved to native code, but it's well within the parameters for my application (a full-duplex serial driver).

Using a minimal test environment (total overhead of 60 bytes per task), I've successfully run 50 tasks on a single Cog. All tasks were sharing code and some global data, but had thread-local copies of necessary variables.
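The round-robin scheme described above can be modelled with Python generators, where yield plays the role of PAUSE. This is only an analogy for how control moves around the ring; the names are invented, and unlike the Forth tasker this model drops a task when it finishes instead of halting.

```python
from collections import deque


def task(name, log, rounds):
    """A task that records its name, then gives up control, 'rounds' times."""
    for step in range(rounds):
        log.append((name, step))
        yield  # the analogue of PAUSE


def run_round_robin(tasks):
    """Resume each task in turn until every task has finished."""
    ring = deque(tasks)
    while ring:
        current = ring.popleft()
        try:
            next(current)
        except StopIteration:
            continue          # finished tasks leave the ring
        ring.append(current)  # otherwise go to the back of the queue


if __name__ == "__main__":
    log = []
    run_round_robin([task("a", log, 2), task("b", log, 2), task("c", log, 1)])
    print(log)
```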

The test routine:



: blink-forever
begin
u0 ledemit \ writes the address of this task to the debug LEDs
pause
again ;




For a ring of 42 tasks (chosen because it's an amusing number and also fits in a buffer I'd allocated), 1,000 context switches from this routine took 87,424,528 cycles. That's 1.09s, or 1.09ms per iteration -- so 26.01us per task activation. With one task (but with task switching), it took 26ms (the same 26us per activation).

With task switching disabled, it took 16.4ms for the same 1,000 iterations, suggesting a task-switch overhead of 10us.
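The arithmetic behind these figures can be replayed in Python. The only assumption is the 80 MHz clock, which is not stated in this post but matches the cycle counts and times quoted; the variable names are illustrative.

```python
CLKFREQ = 80_000_000  # assumed system clock in Hz


def cycles_to_us(cycles):
    """Convert a cycle count to microseconds at the assumed clock."""
    return cycles / CLKFREQ * 1_000_000


TOTAL_CYCLES = 87_424_528  # measured for the 42-task ring
ITERATIONS = 1_000
RING_SIZE = 42

per_iteration_us = cycles_to_us(TOTAL_CYCLES) / ITERATIONS
per_activation_us = per_iteration_us / RING_SIZE

# Single task: 26 ms with switching vs 16.4 ms without, over 1,000 iterations.
switch_overhead_us = 26.0 - 16.4

if __name__ == "__main__":
    print(f"{per_iteration_us:.1f} us per trip around the ring")
    print(f"{per_activation_us:.2f} us per task activation")
    print(f"{switch_overhead_us:.1f} us estimated switch overhead")
```

The per-activation figure comes out at about 26 us and the overhead at about 9.6 us, in line with the rounded numbers in the post.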

This code will be included in the next release of PropellerForth (which, it should be noted, will include sources!).

Cliff L. Biffle
11-12-2006, 02:11 PM
I've set up a web page for PropellerForth.

www.cliff.biffle.org/software/propeller/forth/ (http://www.cliff.biffle.org/software/propeller/forth/)

There's a newer release there than what I've posted in this thread, though it's a build from earlier today and lacks the in-Cog tasker.

I've also set up code hosting and issue tracking on Google Code Hosting, so folks can feel free to file bugs against me there. (The project page is linked from the homepage above.)

Future updates will be posted on the website; I find forum threads to be really bad repositories for information, so I'm going to quit updating this one unless anyone has specific questions or issues they'd like to discuss.

Bill Henning
11-12-2006, 02:18 PM
Way to go!!!

Very kewl.



Cliff L. Biffle
11-12-2006, 03:28 PM
Okay, I lied. Here's an update.

I've managed to run 256 threads on a single core. All were running a really simple cycle-burner so I could measure their performance.

For the folks who'd rather not read through a page of Forth code, here are the results. Sure wish mCode supported tables.

Benchmark Times (all in cycles)

Single Task:
Tasker disabled, one task: 1,536,528
Tasker enabled, one task: 2,304,528

Having the tasker enabled imposes roughly a 768 cycle penalty for each call to PAUSE, even with only one task. That's what I get for writing the engine in an afternoon; I'll optimize it later. (For those playing along at home, that's 9.6us.)

Multiple Tasks:
2: 4,560,528
4: 9,072,528
8: 18,096,528
16: 36,144,528
32: 72,240,528
64: 144,432,528
128: 288,816,528
256: 577,584,528

Performance scales almost perfectly linearly with the number of tasks -- that is, doubling the number of tasks halves your speed. The scheduler is deterministic O(n) for the number of tasks, by design -- though this may change when I get the thread prioritization code working.
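A quick check of the table above in Python shows just how linear the scaling is. The sketch fits a straight line through the smallest and largest runs and compares every other row against it. It assumes each run made 1,000 calls to PAUSE (implied by the 768-cycle figure) and an 80 MHz clock (implied by 768 cycles = 9.6us).

```python
# Cycle counts copied from the table; key = number of tasks (tasker enabled).
RESULTS = {
    1: 2_304_528,
    2: 4_560_528,
    4: 9_072_528,
    8: 18_096_528,
    16: 36_144_528,
    32: 72_240_528,
    64: 144_432_528,
    128: 288_816_528,
    256: 577_584_528,
}
ITERATIONS = 1_000     # assumed PAUSE calls per run
CLKFREQ = 80_000_000   # assumed clock in Hz


def fit_line(results):
    """Fit cycles = intercept + slope * tasks through the two extreme points."""
    counts = sorted(results)
    low, high = counts[0], counts[-1]
    slope = (results[high] - results[low]) // (high - low)
    intercept = results[low] - slope * low
    return slope, intercept


slope, intercept = fit_line(RESULTS)
residuals = {n: t - (intercept + slope * n) for n, t in RESULTS.items()}
cycles_per_activation = slope // ITERATIONS

if __name__ == "__main__":
    print("cycles per extra task:", slope)
    print("fixed cost:", intercept)
    print("cycles per activation:", cycles_per_activation)
    print("us per activation:", cycles_per_activation / CLKFREQ * 1e6)
    print("worst residual:", max(abs(r) for r in residuals.values()))
```

Every row lands exactly on the line: 2,256 cycles (28.2us) per task activation plus a fixed 48,528 cycles per run, and that fixed cost is why every total ends in 528.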

The system has no hard limit to the number of tasks; it's bounded only by available RAM, and at 256 tasks, I had -4 bytes free. (I allowed one task to reuse some of my interpreter's thread-locals. It's cheating, but it worked.) I am deliberately overallocating, giving each task 96 (decimal) bytes.

Astute readers have noticed that every time ends in 528. I think it's weird too. You can try to replicate it with the sources below.

The Code

This will be built into the next PropellerForth release, but interested parties can try it now. Paste the code into your terminal emulator, or put it in a text file and send it as ASCII.

Note that this is not exactly the code I ran for my benchmarks above. It's slightly better factored, and runs a bit slower. Readers can match (or beat) my numbers by inlining the task-id display code.

First, the multitasking engine:




hex \ the number base of champions

\ Given a pointer to the base of a task's USER area,
\ sets up a pretty generic, spacious task.
\ Note: you cannot PAUSE directly to this task, you
\ must first activate it with start-task
: create-task ( user-addr -- user-addr )
10 over ! \ Set the number base for I/O...
dup 7C + over 10 + ! \ ...the stack pointer
dup 17C + over 14 + ! \ ...the return pointer
next-task @ over 24 + ! \ ...the next task in the cycle
dup next-task ! ; \ ...and set it to our next task

\ Given a pointer to the base of a task's USER area,
\ sets up a dinky task suitable only for trivial things.
\ Note: you cannot PAUSE directly to this task, you
\ must first activate it with start-task
: create-small-task ( user-addr -- user-addr )
10 over !
dup 28 + over 10 + !
dup 48 + over 14 + !
next-task @ over 24 + !
dup next-task ! ;

\ Suspends the current task, passing control to the
\ next (which may be the same task)
: pause
rsp@ \ stash our return-stack pointer on the param stack.
sp@ saved-sp ! \ store our stack pointer in a USER thread-local
next-task @ usp! \ activate the next task
saved-sp @ sp! \ restore our stack pointer (we're in the next task)
rsp! ; \ restore our return stack pointer and proceed

\ Given a pointer to a USER area (prepared with
\ create-task or create-small-task) and the XT of
\ a word to run, starts a new task, immediately
\ giving it control.
: start-task ( user-addr xt -- )
rsp@
rot rot 2>r
sp@ saved-sp !
r> r>
dup next-task !
usp!
sp0 @ !
sp0 @ sp!
rsp0 @ rsp!
execute \ Start the task's initial word
bye ; \ If we ever return, halt -- something is wrong

\ Creates a small task using memory from the
\ dictionary.
: allot-small-task ( -- user-addr )
align \ make sure we're at a word boundary
here 60 allot \ make room
create-small-task ;




Now, some code to multitask:



\ Shows a semi-unique value for the
\ current task, to show that we've switched
: show-task-id
u0 \ address of user area
5 rshift \ these bits are not interesting, drop them
ledemit ;

\ Simply keeps showing our task ID.
\ This is used for the non-interactive tasks.
: show-task-id-forever
begin
show-task-id
pause
again ;

\ Show the task ID and pause a specified
\ number of times. Used for the interactive
\ task (so we get control back).
: show-task-id-repeatedly ( count -- )
0 do
show-task-id
pause
loop ;

: spawn-task
allot-small-task
['] show-task-id-forever start-task ;

: tasks ( count -- )
0 do spawn-task loop ;





To use, enter something like the following:



decimal \ the engine code above left the number base in hex
32 tasks
1000 show-task-id-repeatedly




If you'd like to see how long it takes, enter this word:



: benchmark
'
cnt l@ >r
execute
cnt l@ r> -
. ." cycles" cr ;




It's used like this:



1000 benchmark show-task-id-repeatedly




Happy hacking!

Post Edited (Cliff L. Biffle) : 11/12/2006 8:36:46 AM GMT

Phil Pilgrim (PhiPi)
11-12-2006, 04:15 PM
Cliff,

mCode does support tables, apparently. Chip posts them occasionally (example here (http://forums.parallax.com/showthread.php?p=613987)). I can't figure out how he does it, though, and I've been meaning to ask him.

-Phil

cgracey
11-12-2006, 05:48 PM
I didn't know one way or another, I just copied from Word and pasted it in.

Phil Pilgrim (PhiPi) said...
Cliff,

mCode does support tables, apparently. Chip posts them occasionally (example here (http://forums.parallax.com/showthread.php?p=613987)). I can't figure out how he does it, though, and I've been meaning to ask him.

-Phil

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


Chip Gracey
Parallax, Inc.

Cliff L. Biffle
11-13-2006, 12:01 AM
I reckon that Chip, as a forum moderator, can use HTML.

simonl
04-02-2007, 07:27 PM
Hey Cliff, how's PropellerForth comin' on?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Cheers,

Simon

BTW: I type as I'm thinking, so please don't take any offense at my writing style :)

-------------------------------
www.norfolkhelicopterclub.co.uk (http://www.norfolkhelicopterclub.co.uk)
You'll always have as many take-offs as landings, the trick is to be sure you can take-off again ;-)