So how is your coding doing 2M baud?
Why is it so much faster than the standard Full Duplex 4 port object?
Is the serial driver full duplex?
When do you think the whole of the Tachyon code will be posted?
Thanks in advance for your patience with all these questions.
cheers,
rich
There's nothing really difficult in getting the Prop to handle a couple of mega-baud, but to get it to do it full-duplex or, worse still, four full-duplex channels from the one cog is a huge ask. I dedicate a cog just to handle the receive and buffering side while the Tachyon VM cog will bit-bash the transmit side directly as it's faster to do it that way. So effectively you have full-duplex without the timing problems that would normally arise from getting a cog to jumpret here and there and introducing jitter both in receive and transmit, as is the case with most implementations. I've looked at the timing for a lot of these implementations and while they are fine at low speed they are way out, asymmetrical, and jittery at high speeds.
The code I have on Google Docs is "live" all the time and will always be the most current version as it is the one I work off; if I make a change and someone has the document open, their document will reflect that change immediately. However, to keep a few grumblers happy I have zipped up and attached the current version.
I am still a few more days away or so from having a fully functional system running. At present I need to finish off the whole text interpreting thing with its number processing and compilation modes, and then I will look at integrating SD and VGA etc.
Do you wonder what the memory map looks like for Tachyon? I made a quick one up in Tachyon's on-line document but here is what it looks like:
Out of this almost 4K bytes there is another 800 bytes or so that I could reclaim again so that really brings the memory footprint closer to 3kB. Since Tachyon bytecode is so efficient this might only go up another few k when I add in the rest of the high-level functions.
Here's an update: I've had a busy week but still managed to get a lot more going. The current version runs at a default of 3M baud (I use Minicom on Linux) and includes break detection to reboot the Prop. The word and number parser are working well, as are the formatted print functions. Numbers encountered in the terminal text input in Forth are normally converted in the current number base (decimal, hex, binary etc) but I also allow a very flexible format for these numbers as I like to use them this way.
Number prefixes include: $ for hex, % for binary, # for decimal, while suffixes can be h for hex, d for decimal, and b for binary. As long as it begins or ends in a digit or a valid symbol you can use any other symbol mixed between the digits, for instance:
#12:45 is decimal for 1245
%1100_0101 is binary for $C5
5_000_000d or #5,000,000 is what it looks like etc
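The number format above can be sketched in a few lines of Python. This is only an illustration of the rules as described in the post, not Tachyon's actual parser: the function name `parse_number` is made up, and it assumes that a prefix wins over a suffix and that a trailing h/d/b is read as a suffix even when the current base is hex.

```python
PREFIXES = {"$": 16, "%": 2, "#": 10}
SUFFIXES = {"h": 16, "d": 10, "b": 2}

def parse_number(token, default_base=16):
    """Convert one terminal token to an int, or return None if it is not a number."""
    base = default_base
    body = token
    if body and body[0] in PREFIXES:
        # ASSUMPTION: a prefix takes priority, so no suffix check is done
        base = PREFIXES[body[0]]
        body = body[1:]
    elif len(body) > 1 and body[-1].lower() in SUFFIXES:
        # ASSUMPTION: a trailing h/d/b counts as a suffix even in hex mode
        base = SUFFIXES[body[-1].lower()]
        body = body[:-1]
    # Any non-alphanumeric symbol between the digits is just a visual separator
    digits = "".join(ch for ch in body if ch.isalnum())
    if not digits:
        return None
    try:
        return int(digits, base)
    except ValueError:
        return None   # not a number in this base, so treat it as a word

for text in ("#12:45", "%1100_0101", "5_000_000d", "#5,000,000"):
    print(text, "->", parse_number(text))
```

Under these assumptions a token such as 22d comes out as decimal 22 even while the base is hex, which matches the terminal sessions later in the thread.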
BTW, a hex dump listing of the full 64K of memory takes 4 seconds including interpreting the text "CNT@ 0 $10000 DUMP CNT@ SWAP - #80,000 / DECIMAL ." which means at 56 characters/line and 4,096 lines that equates to 17.43us average time per character which also includes the 4us or so to send the character! I know that Spin would never ever keep up with that!
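The per-character figure can be reproduced with some quick arithmetic. The sketch below assumes 16 bytes per dump line (implied by 64K over 4,096 lines), a flat 4.0 s elapsed time, and 10 bits per character on the wire (8N1); none of these are stated outright in the post.

```python
DUMP_BYTES = 0x10000      # full 64K, from "0 $10000 DUMP"
BYTES_PER_LINE = 16       # ASSUMPTION: 64K / 4,096 lines
CHARS_PER_LINE = 56       # from the post
ELAPSED_S = 4.0           # ASSUMPTION: the quoted "4 seconds" taken as exact
BAUD = 3_000_000
BITS_PER_CHAR = 10        # ASSUMPTION: start bit + 8 data bits + stop bit

lines = DUMP_BYTES // BYTES_PER_LINE
total_chars = lines * CHARS_PER_LINE
per_char_us = ELAPSED_S / total_chars * 1e6   # average time per character
wire_us = BITS_PER_CHAR / BAUD * 1e6          # time just to shift one character out

print(f"{lines} lines, {total_chars} characters")
print(f"average {per_char_us:.2f} us per character, {wire_us:.2f} us of that on the wire")
```

With a flat 4.0 s this lands at roughly 17.44 us per character, in line with the 17.43 us quoted; presumably the measured time was a shade under 4 seconds.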
Next step now is to implement compilation mode with its associated dictionary handlers. Hopefully I will have that done this weekend.
3M baud is crazy fast. Didn't somebody say 9600 should be fast enough for anybody?
I didn't check, are you doing any error checking? Sal said something about a limit at which point it starts to get unstable, have you noticed anything like that?
At 3M baud the terminal screen updates blindingly fast and unless it's a long listing all it does is blink a new screen into existence! Since a cog is devoted to receive only, it is not handicapped with trying to maintain buffering and timing for the transmitter as well. To do this in Spin would require three cogs: one for Spin, one for Receive, one for Transmit. However, in Tachyon I can transmit straight from Tachyon's cog and leave the other cog to handle the receive. I can probably go faster still but 3M works reliably.
BTW, I have updated the top post with a zipped attachment of the latest files which I will keep up-to-date with the latest "release" version whereas the on-line document will always be the live and bleeding edge.
The VGA_512x384_Bitmap object has now been incorporated into the source as there is sufficient memory available despite the fact that this object requires 24KB just for the bitmap. Being able to include objects such as these was one of the requirements of Tachyon which is also why it had to have a very small footprint despite the need for speed. Access to the object at present is via the color and pixel pointers but at a later stage I will write a Graphics demo object in Forth to showcase Tachyon.
I mentioned in the thread that Tachyon does not follow the traditional method of handling text input, which normally would accumulate into a text input buffer (TIB) and after an <enter> would be processed word by word in an interpretive fashion. One of the drawbacks of this is that branch and loop structures which are meant to be compiled cannot be entered in an interpretive fashion and you just end up with the "compile only" warning. What I've been able to do is to skip the TIB and just parse words as they are encountered in the input stream and compile these immediately into the current code compilation location. When an <enter> is encountered the bytecodes are then executed, which means that it is now possible to type in interactively such statements as:
20 1 DO CR I 0 DO 2A EMIT LOOP LOOP<enter>
without having to resort to creating a special word, invoking it, and then forgetting it (deleting). The interactive statements also behave exactly like they would inside a definition and at the same speed. Of course words such as IF and THEN etc are tagged as immediate words and are not compiled but instead execute immediately and control the compilation.
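To make the compile-as-you-type idea concrete, here is a toy model in Python. It is a sketch only and has nothing to do with Tachyon's real bytecode: the tuple "bytecodes", the `run_line` and `execute` names, and the tiny word set are all invented for illustration. Each word is compiled the moment it is parsed, DO and LOOP act as immediate words that fix up the branch target at compile time, and nothing runs until the end of the line.

```python
def run_line(line):
    """Compile each word as it is parsed, then execute the buffer on <enter>."""
    code = []       # compiled (op, arg) pairs standing in for bytecodes
    starts = []     # compile-time stack of loop body addresses
    for word in line.split():
        if word == "DO":                      # immediate: note where the body starts
            code.append(("do", None))
            starts.append(len(code))
        elif word == "LOOP":                  # immediate: compile a backward branch
            code.append(("loop", starts.pop()))
        elif word in ("I", "EMIT", "CR"):
            code.append((word, None))
        else:
            code.append(("lit", int(word, 16)))   # hex is the default base
    return execute(code)

def execute(code):
    data, loops, out = [], [], []   # data stack, separate loop stack, output
    pc = 0
    while pc < len(code):
        op, arg = code[pc]
        pc += 1
        if op == "lit":
            data.append(arg)
        elif op == "do":
            index, limit = data.pop(), data.pop()
            loops.append([index, limit])
        elif op == "loop":
            loops[-1][0] += 1
            if loops[-1][0] < loops[-1][1]:
                pc = arg                      # branch back to the loop body
            else:
                loops.pop()
        elif op == "I":
            data.append(loops[-1][0])
        elif op == "EMIT":
            out.append(chr(data.pop()))
        elif op == "CR":
            out.append("\n")
    return "".join(out)

print(run_line("20 1 DO CR I 0 DO 2A EMIT LOOP LOOP"))
```

Because the loop is already compiled by the time the line ends, the interactive line runs through exactly the same path a definition would.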
My next step which I am presently on is allowing new definitions to be created to which end I am testing the dictionary and vector table handlers. This stage of the testing will probably be completed in the next 24 hours.
Has anyone tried loading Tachyon onto a board yet? Please let me know if you had any problems. The same goes for viewing the on-line Google document source. BTW, I am using one of my Pixie boards at the moment so I can play with the VGA graphics especially when the Tachyon kernel is developed enough (soon) that I can just communicate to it via Bluetooth or ZigBee and interactively develop my applications.
Here's a simple benchmark for Spin vs Tachyon in terms of raw HUB memory access. I increment each byte of the 24K of bitmap memory which of course involves a hub read and a hub write as well as looping. So both fragments of code are limited by the hub access time and shouldn't really perform any differently as hub access is like wading through molasses. However the Spin loop takes 368ms to execute vs the 137ms it takes Tachyon to do the same. I will look at some other routines later and compare the differences.
' Spin: Increment 24kB of video memory byte by byte - takes 368ms with 50 bytes of code
repeat
  i := Pixels
  x := CNT
  repeat $6000
    byte[i++] += 1
  y := CNT
  x := (y-x)/80000
  coms.tx($0D)
  coms.tx($0A)
  coms.dec(x)
' Tachyon: Increment 24kB of video memory byte by byte - takes 137ms with 32 bytes of code
100 0 DO CNT@ $8000 $2000 DO 1 I C+! LOOP CNT@ SWAP - 80,000 / CR U. LOOP
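For a feel of what those two timings mean per byte, here is a rough calculation. It assumes an 80 MHz system clock, which is what dividing the CNT difference by 80000 to get milliseconds implies; the helper names are made up.

```python
CLOCK_HZ = 80_000_000     # ASSUMPTION: 80 MHz, implied by the "/ 80000" scaling to ms
BYTES = 0x6000            # 24 KB of bitmap memory, one read-modify-write per byte

def ticks_to_ms(ticks):
    """Same scaling as (y-x)/80000 in the Spin fragment."""
    return ticks / (CLOCK_HZ / 1000)

def per_byte(total_ms):
    """Return (microseconds, clock cycles) spent per byte for a full pass."""
    us = total_ms * 1000 / BYTES
    cycles = total_ms / 1000 * CLOCK_HZ / BYTES
    return us, cycles

spin_us, spin_cycles = per_byte(368)
tachyon_us, tachyon_cycles = per_byte(137)

print(f"Spin:    {spin_us:.2f} us/byte, about {spin_cycles:.0f} cycles")
print(f"Tachyon: {tachyon_us:.2f} us/byte, about {tachyon_cycles:.0f} cycles")
print(f"ratio:   {368 / 137:.2f}x")
```

Either way the loop overhead dwarfs the two hub accesses themselves, so the difference comes from the interpreter rather than from the hub.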
If I replace "byte[i++] += 1" with "byte[i++]++" in Spin I get 294 msec. I get 132 msec in TrimSpin.
As I understand it, TrimSpin is a subset of Spin. What kind of applications do you use it for?
For example, Forth is often used for cases where we want drivers maybe eventually in assembler, but don't want to write the whole program in assembler. And/or we want to develop and test interactively.
Can you do everything Spin can do using your subset? This is a cool idea.
If Tachyon is capable of handling the full repertoire of Spin commands -- either natively or with a combination of native words -- I wonder if it might be a better target virtual machine for the Spin compiler than the current p-code interpreter.
The current approach on TrimSpin is to write a Spin interpreter that implements all of the instructions, but runs about twice as fast. However, because this doesn't fit in a cog, the interpreter is trimmed down to only the bytecodes that are used by a particular program. This is done by profiling the bytecodes in an existing binary file, and generating an interpreter that matches it.
I'm not sure which applications would use it other than those that need to run just a little bit faster.
Peter, the 88 msec time is very impressive.
I think you are on the right track with having your own Spin interpreter and trimming the way you do. I assume that what can't fit in the cog but is needed is then implemented at a higher-level?
As I develop and test Tachyon further I keep optimizing the kernel. I just improved the erase and fill operations simply by creating a helper word with two PASM instructions so that I can now fill or erase the 24kB of memory in 34ms.
: CLS pixels W@ $6000 ERASE ;
In the kernel I also had code for some fast single bytecode constants using mov X,#n and jmp #PUSHX instructions but now I have reduced this to one instruction per "entry" so that constants 0...8 take but 10 instructions total.
As of this moment the runtime compiler seems to be working fine although there are a lot of little things that need to be tidied up. Here is a little test code that I use to list the words in the dictionary. BTW, I have a whitespace after each line except the last to prevent execution of the line of code which normally is executed on a <CR>. I will tidy up little things like this in the process of testing and developing.
I have updated the top post to include a demo board binary and Forth source test samples in the zip attachment. So now it's easy just to load in a standard binary (set for 57.6K baud) and hook up a VGA monitor.
shown below, and coded that in TACHYON, what would the result look like ?
It seems the braces in the Instruction List carry an implied stack ?
; Instruction list coded
AND(
OR I0.1
OR I0.2
)
AND(
OR I0.5
OR I0.6
)
AND I2.0
AND I2.5
= Q2.3
; Algebraic coded this is
Q2.3 = I2.0 AND I2.5 AND (I0.1 OR I0.2) AND (I0.5 OR I0.6)
This link http://users.isr.ist.utl.pt/~pjcro/courses/api0809/docs/API_I_C3_IL.pdf
has a slightly different style, and I think this is also equivalent to the above
AND(
LD I0.1
OR I0.2
)
AND(
LD I0.5
OR I0.6
)
AND I2.0
AND I2.5
ST Q2.3
The thing with Forth is that you can tailor the way it compiles to suit your style or application. In this case I have added or modified a couple of words which would eventually be part of the kernel anyway. Following this I set the number base to octal to allow the port numbering system to be used where the last digit I assume is from 0..7, so it's octal. So if I interpreted the instructions correctly they could look like this example where I have spread the statement over a few lines and named it as well.
STL doesn't look like it's using a stack, from the programmer's perspective at least. Internally it may, but with a lot of these compilers the syntax is very rigid.
\ Let's define a couple of general-purpose extensions
: IN ( bit -- flag ) MASK P@ AND 0<> ;
: OUT ( state pin -- ) px ;
\ change number base to octal to enhance readability of port numbering
8 BASE C!
\ Now code the example
: PLC_DEMO
0.1 IN 0.2 IN OR
0.5 IN 0.6 IN OR
AND
2.0 IN AND 2.5 IN AND
2.3 OUT
;
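As a sanity check that the postfix ordering in PLC_DEMO really computes the same thing as the algebraic form, the sketch below runs both over every input combination. It is plain Python rather than Forth, the pin names are simply the labels from the Instruction List example, and `run_postfix` is a hypothetical helper.

```python
from itertools import product

INPUTS = ["I0.1", "I0.2", "I0.5", "I0.6", "I2.0", "I2.5"]

# Same order of operations as the PLC_DEMO word, minus the final 2.3 OUT
PROGRAM = ["I0.1", "I0.2", "OR",
           "I0.5", "I0.6", "OR",
           "AND",
           "I2.0", "AND", "I2.5", "AND"]

def run_postfix(program, pins):
    """Evaluate the program on an explicit data stack, Forth style."""
    stack = []
    for tok in program:
        if tok == "OR":
            b, a = stack.pop(), stack.pop()
            stack.append(a or b)
        elif tok == "AND":
            b, a = stack.pop(), stack.pop()
            stack.append(a and b)
        else:
            stack.append(pins[tok])       # read an input pin
    assert len(stack) == 1                # one result left for the output
    return stack[0]

def algebraic(p):
    """Q2.3 = I2.0 AND I2.5 AND (I0.1 OR I0.2) AND (I0.5 OR I0.6)"""
    return (p["I2.0"] and p["I2.5"]
            and (p["I0.1"] or p["I0.2"])
            and (p["I0.5"] or p["I0.6"]))

mismatches = []
for values in product([False, True], repeat=len(INPUTS)):
    pins = dict(zip(INPUTS, values))
    if run_postfix(PROGRAM, pins) != algebraic(pins):
        mismatches.append(pins)

print("mismatches:", len(mismatches))
```

The brackets of the Instruction List version become nothing more than the order in which values arrive on the stack.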
I did find some more info that mentions stack here,
The current version appears to be working well although I still have to build in a SAVE feature for backing up downloaded user code for full auto-restore and run. But that will come soon. For now I have uploaded to YouTube a short video where I load some Forth code into a Prop board running Tachyon and show how well this bytecode handles bitmapped VGA without resorting to a GPU cog.
For reference this is the code I downloaded:
: START CNT@ $10 @REG ! ;
: LAP CNT@ $10 @REG @ - #80,000 / ;
\ **************************** VGA FUNCTIONS ********************
: CLRSCN pixels W@ $6000 ERASE ;
: COLORS colors W@ W! colors W@ DUP 2+ #382 CMOVE ;
$FF04 COLORS
{
: PLOT ( x y -- )
6 SHL OVER 3 SHR +
SWAP 7 AND MASK SWAP pixels W@ + SET
;
}
: HLINE ( x y length -- )
ROT SWAP ADO I OVER PLOT LOOP DROP
;
: VLINE ( x y length -- )
ADO DUP I PLOT LOOP DROP
;
: ITEM 2* 2* @REG ;
: ITEMS ( items -- ) 0 DO I ITEM ! LOOP ;
: RECT ( x1 y1 xlen ylen -- )
4 ITEMS
3 ITEM @ 2 ITEM @ 1 ITEM @ HLINE
3 ITEM @ 1 ITEM @ + 2 ITEM @ 0 ITEM @ VLINE
3 ITEM @ 2 ITEM @ 0 ITEM @ VLINE
3 ITEM @ 2 ITEM @ 0 ITEM @ + 1 ITEM @ HLINE
;
: BOXES
CLRSCN
#200 0 DO I I 50 50 RECT 4 +LOOP
$C0 $C0 $30 FOR 2DUP $80 $40 RECT SWAP 4 + SWAP 4 - NEXT 2DROP
;
: SLANTS 180 0 DO 100 0 DO I J + I PLOT LOOP 4 +LOOP ;
\ : RSLANTS 180 0 DO 100 0 DO I 180 J - + 100 I - PLOT LOOP 4 +LOOP ;
: X 8 @REG ;
: Y #10 @REG ;
: VCR 0 X W! ;
: VLF #34 Y W+! ;
: HOME VCR 0 Y W! ;
HOME
: CHAR ( ch -- )
DUP 2/ 7 SHL $8000 +
( ch addr )
20 0 DO DUP @ 3RD 1 AND IF 2/ THEN
10 0 DO DUP 1 AND IF X W@ I + Y W@ J + PLOT THEN 2/ 2/ LOOP
DROP 4 +
LOOP 2DROP #18 X W+! X W@ #500 > IF VCR VLF THEN ;
: CTRL
DUP $0D = IF VCR DROP EXIT THEN
DUP $0A = IF VLF DROP EXIT THEN
DUP $0C = IF HOME CLRSCN DROP EXIT THEN
DUP $01 = IF HOME DROP EXIT THEN
DUP $1B = IF DROP R> DROP EXIT THEN
CHAR
;
: VEMIT DUP 20 < IF CTRL ELSE CHAR THEN ;
: DEMO
CLRSCN HOME
BEGIN KEY VEMIT AGAIN
;
Hi Peter,
I am very impressed by your direction implementing Forth. I like the byte code which is directly the address. Simple and fast, wow!
I have played just a little bit with Tachyon. I can switch on an LED at port P7: 7 MASK OUTSET. I have not been able to do this with the LED on port P23 of the demoboard: "23 MASK OUTSET".
I cannot see a number with two digits (22) on the stack using ^d. But if I use "." I can print it.
What is your idea of the system, when will it be complete for you? Will it be a self-hosted system with SD-card and keyboard support?
I understand that the stack in cog-ram is a key to a very fast implementation but I am a little bit frightened about what can be done with such a short stack?
Hi Christof,
Thanks, I'm trying to keep it simple, both in terms of the kernel source and one step compilation plus in terms of just using it. Yes, I will be implementing full SD and keyboard drivers etc, either in Forth itself perhaps running in another cog or a stripped down version from the OBEX. The default number base is hex so that may be the problem then? I will update the binary to the latest version too. You can always force the number to decimal either as #23 or 23d and the same goes for forcing hex or binary as well as ASCII and control characters as in "#" and ^D.
Yes, the stack "seems" a little small but stack abuse is rife in a lot of Forth code and should really be reined in. In fact the return stack gets misused by many programs because they have no way of manipulating data easily on the data stack and when the return stack is not restored properly, then whamo, crash ..... not good. So you see I have the loop stack just for loops or even for temporary values and they can even be addressed easily too using the I,IX, and J type words etc. The other thing that helps is the use of the fixed register area which is used internally and also to handle four or more parameters. Eventually I will have local variables created from the stack comment ( -- ) and saved in these registers. Manipulating and juggling values on the data-stack is fine when there are only 2 or 3 values but over that it becomes annoying. Anyway my original plan was to try and keep it small and tidy on the stack but allow for a mechanism which could automatically come into action when the stack overflowed and start saving these values to hub memory but only when necessary which is obviously required to make the system robust (but slower during these exceptions).
I like the loop compiling from the input stream - very useful for testing/playing!
Thanks Rick, that's what always used to annoy me with every Forth I've ever played with (including my own) in that you had to create a definition to run a quick command just because it contained looping and/or branching etc. So compiling word by word has advantages and when it is executed on an <enter> it runs at the same speed and the exact same way it would when it's built into a definition. The VGA stuff I'm doing is really just to provide a test environment for Tachyon at present but it looks like it would be good to continue with this stuff and turn it into a development system on a chip rather than having to rely on PCs or other platforms. When Prop II comes along I will be in a good position to port this stuff over fairly easily I think.
BTW, when I did some testing and tweaking of my serial receive driver the other day I found out why it wouldn't work above 3M baud. It turns out that Minicom is quite happy to issue the 3.5 and 4M baud commands to the Linux drivers but the FTDI chip just reverts to a slow speed instead. From the timing it looks like I could go faster again if only the PC could keep up!
Hi Peter,
great if this will be a full self hosting system!
_DECIMAL 22 MASK OUTSET ok
does not switch on the LED.
I downloaded the new version, but I still cannot see a 22d on the stack. Only after a DUP are there two entries of 16 hex. See the terminal protocol below.
Perhaps you want to have a look at it, although I don't want to break your actual work.
Christof
Propeller .:.:--TACHYON--:.:. Forth V1.0 rev120727.1800
_22d ok
Ah, I have pins 16..23 setup for VGA, that would explain why you can't set the output. As for the 22 not appearing on the stack the problem could be with the earlier version's whitespace "feature" in that it would compile but not execute a line if it had a trailing whitespace. Try it without a space after it and just hit <enter> or else download the latest binary at the top post as I have just updated it before. I tried it just now:
Propeller .:.:--TACHYON--:.:. Forth V1.0 rev120727.1800
_ ok
_.S ok
STACK: 00000000 00000000 00000000 A55AA55A _ ok
_22d ok
.S ok
STACK: 00000000 00000000 A55AA55A 00000016 _
Here's a new video that shows a bit more detail that you should find interesting
Hi, Peter,
as the cog outputs are "ored", this is not a problem of the hardware.
I had already tried rev120727.1800.
_22d MASK OUTSET ok works
_DECIMAL 22 MASK OUTSET ok does not work.
Yes, the DECIMAL is compiled but not executed until you hit enter, however the numbers are converted before that happens. The best way is to either set the base beforehand as you have done or else explicitly force the number to decimal either as #22 or 22d so that there is no confusion. I just checked and the OUTA and DIRA registers are being set correctly, although I have VGA hooked up to these pins so I haven't connected an LED.
_22d MASK OUTSET ok
_1F4 COG@ . ok
40400000 _ ok
_1F6 COG@ . ok
40400000 _ ok
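The ordering issue with DECIMAL on the same line can be modelled in a few lines. This is a hypothetical Python sketch of the behaviour described above, not Tachyon's code: the `enter_line` name and the `state` dictionary are invented, and only DECIMAL and plain numbers are handled.

```python
def enter_line(line, state):
    """Compile a line token by token, then execute it when <enter> arrives."""
    compiled = []
    for tok in line.split():
        if tok == "DECIMAL":
            # Compiled now, but it only takes effect when the line executes
            compiled.append(("set_base", 10))
        else:
            # Numbers are converted immediately, in whatever base is current
            compiled.append(("lit", int(tok, state["base"])))
    stack = []
    for op, arg in compiled:              # execution happens only at this point
        if op == "set_base":
            state["base"] = arg
        else:
            stack.append(arg)
    return stack

state = {"base": 16}                      # hex is the default base
print(enter_line("DECIMAL 22", state))    # 22 was already read as hex
print(enter_line("22", state))            # the base change applies from the next line
```

So DECIMAL on its own line, or an explicit #22 or 22d, avoids the surprise.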
Thanks for the explanation, Peter and keep on with this!
I will next try your forth for a little experiment. The interactive way and the speed should be just optimal for such things.
I want to compare a transistor amplifier with a valve amplifier. I want to give them short burst signals like beep-pause-beep-pause... as input. This shall be done with the demoboard. Should be quite simple with your Forth, but I am no Forth guru at all....
Due to the difference of the output impedance I assume, that a difference of the loudspeaker input signal at the end of the beeps could be visible with an oscilloscope.
Thanks for sharing Christof
I was struggling with the previous version's "white space" damage, I think, using the Prop BOE and BST, but since rev120727.1800 all is well.
Also trying to use "download" with google docs inserts a bunch of garbage into the source (MAC OSX, FireFox 13) so I for one appreciate the zips. Probably just pebkac.
Appreciate all of your work, look forward to trying your bluetooth module.
I like the graphics. Are you going to include a source editor in your package?
I tested the HC06 Bluetooth module yesterday; I was able to use the terminal from two houses away.
Certainly that is my intention to add a source code editor since I can have VGA. So there will be support for keyboard, mouse, and SD card as well as audio etc. I want to make this as standalone as possible although most of my projects won't need all of these things. Since the 512x384 bitmap takes up a bit of memory, well most of it, I might have to allow for an enhanced text and graphics mode. For the moment even with the 512x384 bitmap I can simply let code overflow into where the lower video memory is and move the "start of video" pointer up accordingly. This means that junk from the ROM would be displayed at the bottom of the screen but I can blank this by setting the tiles for these to the same foreground and background colors.
Although it is possible to add extra chips to do more wonderful things with the video I'd rather concentrate on getting the most out of the Prop itself and also so that when Prop II pops up I'm ready to port and play!
At present I am working on making the user code persistent and autostarting with the backup to EEPROM.
Comments
TACHYON.zip
EDIT: Now tested and working at 3M baud!
-Phil
Here's that attachment anyway:TACHYON.zip
http://www.automation-course.com/branching-in-il/
which links from
http://en.wikipedia.org/wiki/Instruction_list
shown below, and coded that in TACHYON, what would the result look like ?
It seems the braces in the Instruction List carry an implied stack ?
I did find some more info that mentions stack here,
http://claymore.engineer.gvsu.edu/~jackh/books/plcs/chapters/plc_il.pdf
An important concept in this programming language is the stack. (Note: if you use a calculator
with RPN you are already familiar with this.)
and I think they mean the brackets are an inferred/hidden stack, not quite the explicit stack of forth.
Christof
I like the loop compiling from the input stream - very useful for testing/playing!
Thanks, I'm trying to keep it simple, both in terms of the kernel source and one step compilation plus in terms of just using it. Yes, I will be implementing full SD and keyboard drivers etc, either in Forth itself perhaps running in another cog or a stripped down version from the OBEX. The default number base is hex so that may be the problem then? I will update the binary to the latest version too. You can always force the number to decimal either as #23 or 23d and the same goes for forcing hex or binary as well as ASCII and control characters as in "#" and ^D.
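As a rough illustration of why the number base matters here, the following Python sketch models the forced-base notation mentioned above. It is not Tachyon code: parse_number and is_valid_pin are hypothetical names, and the rules (a leading # or a trailing d forces decimal, anything else uses the current base) are assumed only from the examples in this post.

```python
def parse_number(token, base=16):
    """Model of number entry: '#23' or '23d' forces decimal, else use the current base."""
    if token.startswith("#"):
        return int(token[1:], 10)
    if token.endswith("d") and token[:-1].isdigit():
        return int(token[:-1], 10)
    return int(token, base)


def is_valid_pin(n):
    """The Propeller has 32 I/O pins, P0..P31."""
    return 0 <= n <= 31


plain = parse_number("23")      # read in the default hex base
forced = parse_number("#23")    # forced decimal with the # prefix
suffixed = parse_number("23d")  # forced decimal with the d suffix
```

With a hex default, a bare 23 is read as $23, which is 35 decimal and not a pin number on a 32-pin device, while 7 is the same in both bases. That would fit P7 working and P23 not, if the base really is the cause.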
Yes, the stack "seems" a little small but stack abuse is rife in a lot of Forth code and should really be reined in. In fact the return stack gets misused by many programs because they have no way of manipulating data easily on the data stack and when the return stack is not restored properly, then whamo, crash ..... not good. So you see I have the loop stack just for loops or even for temporary values and they can even be addressed easily too using the I,IX, and J type words etc. The other thing that helps is the use of the fixed register area which is used internally and also to handle four or more parameters. Eventually I will have local variables created from the stack comment ( -- ) and saved in these registers. Manipulating and juggling values on the data-stack is fine when there are only 2 or 3 values but over that it becomes annoying. Anyway my original plan was to try and keep it small and tidy on the stack but allow for a mechanism which could automatically come into action when the stack overflowed and start saving these values to hub memory but only when necessary which is obviously required to make the system robust (but slower during these exceptions).
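The overflow idea can be sketched as a small Python model. This is only an illustration of the general mechanism, not how Tachyon does or will do it: the class name SpillStack, the depth of 4 and the choice to spill the oldest entry first are all assumptions of mine.

```python
class SpillStack:
    """Small fast stack that spills its oldest entries to a slower overflow area."""

    def __init__(self, cog_depth=4):
        self.cog_depth = cog_depth
        self.cog = []     # fast, fixed-size area (top of stack is the end of the list)
        self.hub = []     # slow overflow area, only touched when the fast area is full
        self.spills = 0   # how many times the slow path was taken on push

    def push(self, value):
        if len(self.cog) == self.cog_depth:
            # Overflow: move the oldest fast entry out to slow memory.
            self.hub.append(self.cog.pop(0))
            self.spills += 1
        self.cog.append(value)

    def pop(self):
        if not self.cog:
            raise IndexError("stack underflow")
        value = self.cog.pop()
        if self.hub:
            # Refill from slow memory so the oldest entries come back in order.
            self.cog.insert(0, self.hub.pop())
        return value


stack = SpillStack(cog_depth=4)
for n in range(1, 7):
    stack.push(n)
spills_seen = stack.spills
popped = [stack.pop() for _ in range(6)]
```

Code that stays within the fast depth never touches the slow area, so only the deep, abusive cases pay the extra cost.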
BTW, when I did some testing and tweaking of my serial receive driver the other day I found out why it wouldn't work above 3M baud. It turns out that Minicom is quite happy to issue the 3.5 and 4M baud commands to the Linux drivers but the FTDI chip just reverts to a slow speed instead. From the timing it looks like I could go faster again if only the PC could keep up!
Great if this will be a fully self-hosting system!
_DECIMAL 22 MASK OUTSET ok
does not switch on the LED.
I downloaded the new version, but I still cannot see a 22d on the stack. Only after a DUP are there two entries of 16 hex. See the terminal log below.
Perhaps you want to have a look at it, although I don't want to interrupt your current work.
Christof
Propeller .:.:--TACHYON--:.:. Forth V1.0 rev120727.1800
_22d ok
DATA STACK
01CE: 000001CE A55AA55A
01D0: 00000000 00000000 00000000 00000000
01D4: 00000000 00000000 00000000 00000000
01D8: 00000000 00000000
RETURN STACK
01DA: 00001007 00000CBB
01DC: 00000C60 00000ABC 00000AA2 00000A44
01E0: 000009DF 000009DF 000009D7 00000000
01E4: 00000000 00000000 00000000 00000000
LOOP STACK
01E8: 00000000 00000000 00000000 00000000
01EC: 00000000 00000000 00000000 00000000
REGISTERS
093C: 00 00 F0 01 00 00 00 00 00 00 00 00 00 00 00 00 ................
094C: 00 00 00 00 00 00 00 00 00 00 00 00 24 00 00 00 ............$...
095C: 00 20 00 00 16 00 00 00 00 00 00 00 02 00 00 00 . ..............
096C: BC 09 00 00 00 00 E0 14 00 00 A4 01 A6 01 20 0D .............. .
097C: 01 10 10 32 64 00 00 00 00 00 00 00 00 00 00 00 ...2d...........
098C: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
099C: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
09AC: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
COMPILATION AREA
01A4: 81 16 7C 5C 76 07 FF 5C 06 96 7F E8 0E 01 7C 5C ..|\v..\......|\
01B4: 76 07 FF 5C 0E 97 7F E8 AD 5F FF 5C 0E 01 7C 5C v..\....._.\..|\
01C4: 76 07 FF 5C 76 07 FF 5C 76 07 FF 5C 0E 01 7C 5C v..\v..\v..\..|\
01D4: 0E 9D 7F EC CE 97 BF A0 33 01 7C 5C CF 97 BF A0 ........3.|\....
DUP ok
_
DATA STACK
01CE: 000001CE 00000016
01D0: 00000016 A55AA55A 00000000 00000000
01D4: 00000000 00000000 00000000 00000000
01D8: 00000000 00000000
Here's a new video that shows a bit more detail that you should find interesting
As the cog outputs are "ORed", this is not a problem of the hardware.
I had already tried rev120727.1800.
_22d MASK OUTSET ok works
_DECIMAL 22 MASK OUTSET ok does not work.
_DECIMAL ok
_22 MASK OUTSET ok does work
Christof
New youtube video showing some more graphics and scrolling as well as a little bit with the serial terminal speed
I will next try your forth for a little experiment. The interactive way and the speed should be just optimal for such things.
I want to compare a transistor amplifier with a valve amplifier. I want to give them short burst signals like beep-pause-beep-pause... as input. This shall be done with the demoboard. Should be quite simple with your Forth, but I am no Forth guru at all....
Due to the difference of the output impedance I assume, that a difference of the loudspeaker input signal at the end of the beeps could be visible with an oscilloscope.
Thanks for sharing Christof
Also trying to use "download" with google docs inserts a bunch of garbage into the source (MAC OSX, FireFox 13) so I for one appreciate the zips. Probably just pebkac.
Appreciate all of your work, look forward to trying your Bluetooth module.
I tested the HC06 Bluetooth module yesterday; I was able to use the terminal from two houses away.
Although it is possible to add extra chips to do more wonderful things with the video I'd rather concentrate on getting the most out of the Prop itself and also so that when Prop II pops up I'm ready to port and play!
At present I am working on making the user code persistent and autostarting with the backup to EEPROM.