Here are the reins and the saddle. I've come this far, but surely you are not asking me to do it by myself all over again too? Putting aside "language", we should always consider Forth both a language and an environment, or even an O/S, and just as I can make Forth look like an assembler, I can also make it look like a Spin/Basic/C compiler if I wished. But Forth code essentially maps one for one, so I have tight control over the code, and for me it's a fun, interactive and fast language to develop in. Spin/Basic/C are just languages, with their very strict syntax and typing, without the flexibility I need to stay creative. Yes, I can write in any language, and like many of us I have had to code in many, but I prefer the non-swearing kind that sweet-talks back to me interactively and that I can teach new tricks. I would like to think we have gone far beyond submitting our code in one great big "punch-card" blob and crossing our fingers.
Fair enough. However, I'm still not really sure what you're proposing. Are you asking that we all come over to the Forth side and work to improve Tachyon or are you asking us to write code generators for the Tachyon VM for C, C++, Spin, and whatever language we want to use? While I like the idea of using an interactive language, I'd rather use JavaScript or Python or something like that. If it isn't possible to do that on the Propeller, I'd prefer to use a processor where it is possible. I wouldn't mind using a hybrid approach of running the interactive language front end on something like an ARM but running the target code on a COG. I had hoped to do that with OBC's MicroMite Companion board but that plan was stalled when I learned that the MMC doesn't have an on-board way to program the PIC32.
I've been trying to follow this thread and have to admit that I don't get the point! The Propeller is a micro-controller, not a microprocessor. While it's very fast and capable at what it's intended to do, I think it's a mistake to try to use it as a microprocessor! An i7 CPU is fast, but I don't think it would make a very good micro-controller.
Reading input pins and bringing output pins high and low are what a micro-controller is for! What else should it be doing? While I'm sure Forth is a great language and Tachyon is a great implementation, I have to admit I don't know Forth, and at this point in my life I'm not all that interested in learning the language. Maybe 30 years ago I would have been interested, but not anymore. The programs I write for the Prop aren't all that big, and writing them on a PC and transferring the program to the Prop board via USB/serial is completely acceptable to me.
Aren't we always learning? Isn't the Prop itself quite a learning curve? I wasn't really interested in writing a Forth for the Prop myself, but I did it out of necessity. There's a certain point in life where we aren't interested in anything at all, but by that time it's terminal. While we live, we breathe, we learn.
BTW, I've always kept saying that the Prop is a microcontroller, not a microprocessor, so I'm not interested in stand-alone systems either, it's just the fact that we have the capability to implement whatever parts of it that we may so desire in an embedded product. I'm not interested in emulating some ancient computer I may have played with in my younger days, but some systems need networking, some displays and keyboards, some filesystems, or combinations of these etc.
Sure, some of this stuff is beyond the scope of a microcontroller, but if we can do it with the Prop simply (which we can) and without kludge upon kludge, then that's a major bonus, because it's the Prop we love.
@David et al.
For those of you who are curious but too busy to follow the links:
Here comes the current Tachyon Kernel Code (hope it's ok Peter ...)
If you have some PASM experience you should be able to read it -
if not - well - here you can learn it as I did.
and a teaser - sorry, some whitespace has gone ...
[code]
{ TACHYON VM - COG KERNEL (PASM) }
DAT
{{ Byte tokens directly address code in the first 256 longs of the cog.
A two byte-code instruction XOP allows access to the second 256 longs.
Rather than a jump table most functions are short or cascaded to optimize COG memory.
Larger fragments of code jump to the second half of the cog's memory.
As a result of not using a jump table (there's not enough memory) there are gaps in the bytecode values and
not all values are usable.
The formatted source has bytecode instruction labels as bold white on red background. }}
                org     0
RESET           mov     IP,PAR          ' Load the IP with the address of the first instruction as if it were an XOP

' position XOP here so that any search for an address of an XOP word returns with the correct cog address of $01xx
' Use next byte as an opcode that directly addresses top 256 words of cog
XOP             rdbyte  instr,IP        ' get next bytecode
                or      instr,#$100     ' shift range
                jmp     #doNext+1       ' IP++, execute
{*** RUNTIME BYTECODE INTERPRETER *** }
'****
' Fetch the next byte code instruction in hub RAM pointed to by the instruction pointer IP
' This is the very heart of the runtime interpreter
'
doNEXT          rdbyte  instr,IP        ' read byte code instruction
                add     IP,#1 wc        ' advance IP to next byte token (clears the carry too!)
                jmp     instr           ' execute the code by directly indexing the first 256 long in cog

' Find the end of the string which could end in a null or any character >$7F
' this is also used to find the end of a larger text buffer
' STREND ( ptr -- ptr2 )
STREND
fchlp           rdbyte  R0,tos          ' read a byte
                sub     R0,#1           ' end is either a null or anything >$7F
                cmp     R0,#$7E wc
        if_c    add     tos,#1
        if_c    jmp     #fchlp
                jmp     unext

' 0EXIT ( flg -- ) Exit if flg is false (or zero) Used in place of IF......THEN EXIT as false would just end up exiting
ZEXIT           call    #POPX
                tjnz    X,unext
'
' EXIT a bytecode definition by popping the top of the return stack into the IP
EXIT            call    #RPOPX          ' Pop from return stack into X
JUMPX           mov     IP,X            ' update IP
_NOP            jmp     unext           ' continue
{*** STACK OPERATORS *** }
' DROP3 ( n1 n2 n3 -- ) Pop the top 3 items off the datastack and discard them (used mostly by cog kernel)
DROP3           call    #POPX
' DROP2 ( n1 n2 -- ) Pop the top 2 items off the datastack and discard them
DROP2           call    #POPX
' 1us execution time including bytecode read and execute
' DROP ( n1 -- ) Pop the top item off the datastack and discard it
DROP            call    #POPX
                jmp     unext

' ?DUP ( n1 -- n1 n1 | 0 ) DUP n1 if non-zero
QDUP            tjz     tos,unext
' DUP ( n1 -- n1 n1 ) Duplicate the top item on the stack
DUP             mov     X,tos           ' Read directly from the top of the data stack
PUSHX           call    #_PUSHX         ' Push the internal X register onto the datastack
                jmp     unext

' OVER ( n1 n2 -- n1 n2 n1 )
OVER            mov     X,tos+1         ' read second data item and push
                jmp     #PUSHX
' 3RD ( n1 n2 n3 -- n1 n2 n3 n1 ) Copy the 3rd item onto the stack
THIRD           mov     X,tos+2         ' read third data item
                jmp     #PUSHX
' 4TH ( n1 n2 n3 n4 -- n1 n2 n3 n4 n1 ) Copy the 4th item onto the stack
FOURTH          mov     X,tos+3
                jmp     #PUSHX

' BOUNDS ( n1 n2 -- n2+n1 n1 ) == OVER + SWAP
BOUNDS          add     tos,tos+1
' SWAP ( n1 n2 -- n2 n1 ) Swap the top two items
SWAP            mov     X,tos+1
SWAPX           mov     tos+1,tos
PUTX            mov     tos,X
                jmp     unext
' ROT ( a b c -- b c a )
ROT             mov     X,tos+2
                mov     tos+2,tos+1
                jmp     #SWAPX
{*** ARITHMETIC *** }
' - ( n1 n2 -- n3 ) Subtract n2 from n1
MINUS           neg     tos,tos         ' (note: save one long by negating and adding)
' + ( n1 n2 -- n3 ) Add top two stack items together and replace with result
PLUS            add     tos+1,tos
                jmp     #DROP

' 1- ( n1 -- n1-1 )
DEC             test    $,#1 wc
' 1+ ( n1 -- n1+1 )
INC             sumc    tos,#1          ' inc or dec depending upon carry (default cleared by doNEXT)
                jmp     unext

' -NEGATE ( n1 sn -- n1 | -n1 ) negate n1 if the sign of sn is negative (used in signed divide op)
MNEGATE         shr     tos,#31
' ?NEGATE ( n1 flg -- n2 ) negate n1 if flg is true
QNEGATE         tjz     tos,#DROP
                call    #POPX
' NEGATE ( n1 -- n2 ) equivalent to n2 = 0-n1
NEGATE          neg     tos,tos
                jmp     unext

' u/mod ( u1 u2 -- remainder quotient) both remainder and quotient are 32 bit unsigned numbers
UDIVMOD         call    #_UDIVMOD
                jmp     unext

' 400ns execution time including bytecode read and execute
' INVERT ( n1 -- n2 ) bitwise invert n1 and replace with result n2
INVERT          add     tos,#1
                jmp     #NEGATE
{
_BITS           test    $,#1 wc         ' set carry
                rcl     ACC,tos
                and     tos+1,ACC
                jmp     #DROP
}
_AND            and     tos+1,tos
                jmp     #DROP
_ANDN           andn    tos+1,tos
                jmp     #DROP
_OR             or      tos+1,tos
                jmp     #DROP
_XOR            xor     tos+1,tos
                jmp     #DROP
' 1.2us execution time including bytecode read and execute
' SHR ( n1 cnt -- n2 ) Shift n1 right by count (5 lsbs )
_SHR            shr     tos+1,tos
                jmp     #DROP
_SHL            shl     tos+1,tos
                jmp     #DROP
_ROL            rol     tos+1,tos
                jmp     #DROP
_ROR            ror     tos+1,tos
                jmp     #DROP

' 400ns execution time including bytecode read and execute
' 2/ ( n1 -- n2 ) shift n1 right one bit (equiv to divide by 2)
_SHR1           shr     tos,#1
                jmp     unext
'_SHL16         shl     tos,#15
' 2* ( n1 -- n2 ) shift n1 left one bit (equiv to multiply by 2)
_SHL1           shl     tos,#1
                jmp     unext

' REV ( n1 bits -- n2 ) Reverse LSBs of n1 and zero-extend
_REV            rev     tos+1,tos
                jmp     #DROP

' 400ns execution time including bytecode read and execute
' MASK ( bitpos -- bitmask \ only the lower 5 bits of bitpos are taken, regardless of the higher bits )
MASK            mov     X,tos
                mov     tos,#1
                shl     tos,X
                jmp     unext

' >N ( n -- nibble ) mask n to a nibble
toNIB           and     tos,#$0F
' >B ( n -- byte ) mask n to a byte
toBYTE          and     tos,#$FF
                jmp     unext
{*** COMPARISON *** }
' Basic instructions from which other comparison instructions are built from
' = ( n1 n2 -- flg ) true if n1 is equal to n2
EQ              sub     tos+1,tos       ' n1 == 0 if equal
                call    #POPX           ' drop n2
'
' 0= ( n1 -- flg ) true if n1 equals 0 - same as a boolean NOT where TRUE becomes FALSE
_NOT
ZEQ             cmp     tos,#1 wc       ' kuroneko method, nice and neat
SETZ            subx    tos,tos         ' a carry becomes -1, else 0
                jmp     unext

' C@++ ( caddr -- caddr+1 byte ) fetch byte character and increment address
CFETCHINC       mov     X,tos           ' dup the address
                call    #_PUSHX
                add     tos+1,#1        ' inc the backup address
' C@ ( caddr -- byte ) Fetch a byte from hub memory
CFETCH          rdbyte  tos,tos
                jmp     unext

' W@ ( waddr -- word ) Fetch a word from hub memory
WFETCH          rdword  tos,tos
                jmp     unext

' @ ( addr -- long ) Fetch a long from hub memory
FETCH           rdlong  tos,tos
                jmp     unext

' C+! ( n caddr -- ) add n to byte at hub addr
CPLUSST         rdbyte  X,tos           ' read in byte from address
                add     tos+1,X         ' add to contents of address - cascade
' C! ( n caddr -- ) store n to byte at addr
CSTORE          wrbyte  tos+1,tos       ' write the byte using address on the tos
                jmp     #DROP2

' W+! ( n waddr -- ) add n to word at hub addr
WPLUSST         rdword  X,tos           ' read in word from address
                add     tos+1,X
' W! ( n waddr -- ) store n to word at addr
WSTORE          wrword  tos+1,tos
                jmp     #DROP2

' +! ( n addr -- ) add n to long at hub addr
PLUSST          rdlong  X,tos           ' read in long from address
                add     tos+1,X
' ! ( n addr -- ) store n to long at addr
STORE           wrlong  tos+1,tos
                jmp     #DROP2

' BIT! ( mask caddr state -- ) Set or clear bit(s) in hub byte
'BIT            call    #POPX
'               tjz     X,#CLR          ' carry clear, finalize
' SET ( mask caddr -- ) Set bit(s) in hub byte
SET             test    $,#1 wc         ' set the carry flag
' Finalize the bit operation by read/writing the result
' ( mask caddr -- )
CLR             rdbyte  X,tos           ' Read the contents of the memory location
                muxc    X,tos+1         ' set or clear the bit(s) specd by mask
                wrbyte  X,tos           ' update
                jmp     #DROP2
{*** LITERALS *** }
' LITERALS are stored unaligned in big endian format which facilitates cascading byte reads to accumulate the full number

' 3.6us execution time including bytecode read and execute
' ( -- 32bits ) Push a 32-bit literal onto the datastack by reading in the next 4 bytes (non-aligned)
_LONG
PUSH4           call    #ACCBYTE        ' read the next byte @IP++ and shift accumulate
' 3us execution time including bytecode read and execute
' ( -- 24bits ) Push a 24-bit literal onto the datastack by reading in the next 3 bytes (non-aligned)
PUSH3           call    #ACCBYTE
_WORD
' 2.4us execution time including bytecode read and execute
' ( -- 16bits ) Push a 16-bit literal onto the datastack by reading in the next 2 bytes (non-aligned)
PUSH2           call    #ACCBYTE
' 1.8us execution time including bytecode read and execute
' ( -- 8bits ) Push an 8-bit literal onto the datastack by reading in the next byte
_BYTE
PUSH1           call    #ACCBYTE
PUSHACC         call    #_PUSHACC       ' Push the accumulator onto the stack then zero it
                jmp     unext
{*** FAST CONSTANTS *** }
' Push a preset literal onto the stack using just one bytecode
' Use the "accumulator" to push the value which is built up by incrementing and/or decrementing
' There is a minor penalty for the larger constants but it's still faster and more compact
' overall than using the PUSH1 method or the mov X,# method

' 140606 just reordered to 1 4 2 3 according to BCA results
' 140603 new method to allow any value in any order, relies on carry being cleared in doNEXT and min will always set carry here
BL      if_nc   min     ACC,#32+1 wc    ' 1.52us
_16     if_nc   min     ACC,#16+1 wc
_8      if_nc   min     ACC,#8+1 wc
_4      if_nc   min     ACC,#4+1 wc
_2      if_nc   min     ACC,#2+1 wc
_1      if_nc   min     ACC,#1+1 wc
_3      if_nc   min     ACC,#3+1 wc     ' bytecode analysis reveals 3 is used quite heavily
_TRUE
MINUS1          sub     ACC,#1
_FALSE
_0              jmp     #PUSHACC        ' 1.12us
{*** CONSTANTS & VARIABLES *** }
' Constants and variables etc are standalone fragments preceded by an opcode then the parameters, either a long or the address of the parameter field

' Long aligned constant - created with CONSTANT and already aligned
CONL            rdlong  X,IP            ' get constant
                jmp     #PUSHX_EXIT

' Byte aligned variables start with this single byte code which returns with the address of the byte variable following
' long variables just externally align this opcode a byte before the boundary
' INLINE:
VARB            mov     X,IP
PUSHX_EXIT      call    #_PUSHX         ' push address of variable
                jmp     #EXIT

' OPCODE assumes that a long aligned long follows which contains the 32-bit opcode.
OPCODE          rdlong  opc,IP          ' read the long that follows (just like a constant)
                nop
opc             nop
                jmp     #EXIT           ' return back to caller
{*** I/O ACCESS *** }
{ not used - removed to extensions using COG@ COG!
' P@ ( -- n1 ) Read the input port A (assume it is always A for Prop 1)
PFETCH          mov     X,INA
                jmp     #PUSHX
' P! ( n1 -- ) Store n1 to the output port A
PSTORE          mov     OUTA,tos
                jmp     #DROP

' STROBE ( iomask -- ) Generate a 100ns low pulse - pins must be preset as outputs (first up anyway)
STROBE          andn    OUTA,tos        ' strobe low
                jmp     #OUTSET         ' release high (use jmp to add one extra cycle)

}
' CLOCK ( COGREG4=iomask ) Toggle multiple bits on the output
CLOCK           xor     OUTA,clockpins
                jmp     unext

' OUTCLR ( iomask -- ) Clear multiple bits on the output
OUTCLR          andn    OUTA,tos
                jmp     #OUTPUTS
' OUTMASK ( data iomask -- )
'               call    #POPX
'               andn    OUTA,X          ' clear all iomask outputs
' OUTSET ( iomask -- ) Set multiple bits on the output
OUTSET          or      OUTA,tos
' OUTPUTS ( iomask -- ) Set selected port pins to outputs
OUTPUTS         or      DIRA,tos
                jmp     #DROP

' INPUTS ( iomask -- ) Set selected port pins to inputs
INPUTS          andn    DIRA,tos
                jmp     #DROP

WAITHILO
'               waitpeq reg3,reg3       ' wait for a hi to lo - look for falling edge
' WAITPNE Wait until input is low - REG3 = mask, REG0 = CNT
_WAITPNE        waitpne reg3,reg3       ' use COGREG3 as the mask
                mov     reg0,cnt        ' capture count in COGREG0
                jmp     unext

' WAITPEQ Wait until input is high - REG3 = mask, REG1 = CNT
_WAITPEQ        waitpeq reg3,reg3
                mov     reg1,cnt        ' capture count in COGREG1
                jmp     unext
{*** SERIAL I/O OPERATORS *** }
{ To maximize the speed of I/O operations especially serial I/O such as ASYNCH, I2C and SPI etc there are special operators that avoid pushing and popping the stack and instead perform the I/O bit by bit and leave the latest shifted version of the data on the stack. }
' SHIFT from INPUT - Assembles with last bit received as msb - needs SHR to right justify if asynch data
' SHRINP ( iomask dat -- iomask dat/2 )
SHRINP          test    tos+1,INA wc
                rcr     tos,#1
                jmp     unext

{ SHIFT to OUT - This is optimized for when you are sending out multiple bits as in asynchronous serial data or I2C
Shift data one bit right into output via iomask - leave mask & shifted data on stack (looping)
400ns execution time including bytecode read and execute or 200ns/bit with REPS
}
' SHROUT ( iomask dat -- iomask dat/2 )
SHROUT          shr     tos,#1 wc       ' Shift right and get lsb
                muxc    OUTA,tos+1      ' reflect state to output
                jmp     unext
[/code]
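For anyone who finds the PASM hard to follow, here is a rough Python model of the idea described in the kernel comments: each byte token selects a handler directly, literals are accumulated from big-endian bytes, and EXIT pops the return stack into the IP. This is only a sketch. The opcode numbers, the `ToyVM` class, and the `CALL` and `HALT` tokens are made up for illustration and are not taken from Tachyon.

```python
# Toy model of a byte-token inner interpreter (NOT the Tachyon kernel).
# All opcode numbers below are hypothetical, chosen only for this example.

MASK32 = 0xFFFFFFFF

HALT, PUSH1, PUSH2, PUSH4 = 0x00, 0x01, 0x02, 0x04
DUP, PLUS, CALL, EXIT = 0x10, 0x11, 0x20, 0x21


class ToyVM:
    def __init__(self, code):
        self.code = code      # stands in for bytecode held in hub RAM
        self.ip = 0           # instruction pointer
        self.stack = []       # data stack
        self.rstack = []      # return stack
        self.running = True
        # The token picks its handler directly, much as the kernel jumps
        # straight to the cog address given by the byte token.
        self.table = {
            HALT: self.halt,
            PUSH1: lambda: self.push_literal(1),
            PUSH2: lambda: self.push_literal(2),
            PUSH4: lambda: self.push_literal(4),
            DUP: self.dup,
            PLUS: self.plus,
            CALL: self.call,
            EXIT: self.exit,
        }

    def fetch(self):
        """Read the byte at IP and advance IP (compare doNEXT)."""
        b = self.code[self.ip]
        self.ip += 1
        return b

    def push_literal(self, nbytes):
        """Accumulate a big-endian literal one byte at a time, then push it."""
        acc = 0
        for _ in range(nbytes):
            acc = (acc << 8) | self.fetch()
        self.stack.append(acc)

    def dup(self):
        self.stack.append(self.stack[-1])

    def plus(self):
        n2 = self.stack.pop()
        self.stack[-1] = (self.stack[-1] + n2) & MASK32

    def call(self):
        """Hypothetical call token: 16-bit big-endian target address follows."""
        hi = self.fetch()
        lo = self.fetch()
        self.rstack.append(self.ip)
        self.ip = (hi << 8) | lo

    def exit(self):
        """Pop the return stack into IP, as the EXIT comment describes."""
        self.ip = self.rstack.pop()

    def halt(self):
        self.running = False

    def run(self):
        while self.running:
            self.table[self.fetch()]()
        return self.stack


# Main: push $1234, call the word at address 7, halt.
# Word at 7: DUP + EXIT (doubles the top of stack).
PROGRAM = bytes([PUSH2, 0x12, 0x34, CALL, 0x00, 0x07, HALT, DUP, PLUS, EXIT])

if __name__ == "__main__":
    print([hex(n) for n in ToyVM(PROGRAM).run()])
```

The real kernel avoids even the table lookup, because the token value is itself the cog address of the code to run. The dictionary here just stands in for that.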
@MJB: Pasting formatted Google docs doesn't play well with BBcode, but it's just as easy to link to the "published" webpage version, which doesn't require any sign-in, and although the formatting there is not 100% it's still 500% up on the pasted code.
I suppose you're talking about XMM and external memory. Most of the time I spent working on PropGCC was spent trying to get that to work well. I guess I wasted my time and should have adopted Forth instead.
Well David, I wasn't really thinking of anything in particular and XMM didn't even cross my mind, nor did I realize that you have spent considerable time on it. But adding external parallel data bus memory to the Prop's limited, I/O-only pins is a kludge as far as I am concerned. It's a bad design decision IMO, as I would rather design with an ARM type chip with plenty of memory in the first place, or team an ARM up with a Prop.
I don't think your work is ever wasted though; sometimes we have to go down a path to see what it is like. I suppose it's a question of whether we just continue down the same path because we have come so far, or step back as needed, reappraise our position, and always be prepared to back-track if we have to, as is so often the case. As with cooking, all that matters is the end result; how we almost didn't get there you don't share.
Thanks for your feedback in the kitchen too! It's good to get some peer appraisals and observations to help in making decisions and adjustments, you always seem to try to be balanced and reasonable. What other stuff have you been doing yourself?
I don't agree with wasting tons of pins on a parallel memory bus either. The best solution seems to be a SPI flash chip that only takes three pins if you don't need to share the SPI bus with any other peripherals or maybe four if you do. And three of those four pins can be used to talk to other devices so they're not really wasted. Running cached code out of SPI flash gives pretty good performance, certainly good enough for outer loop logic. You can always load a COG with PASM for stuff that needs higher performance or deterministic execution. It seems like a good compromise and it allows much larger programs to be run on the Propeller. Unfortunately, Parallax never really liked the idea either so it did end up being wasted time.
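To make the cached-execution idea a bit more concrete, here is a minimal Python sketch of a direct-mapped read cache sitting in front of a slow backing store. It is not how the PropGCC cache driver is actually built, since the thread doesn't say. The `CachedFlash` name, the 64-byte line size and the 8 slots are assumptions picked for the example.

```python
# Minimal direct-mapped read cache in front of a slow store (illustrative only).
# The bytes object stands in for SPI flash; sizes are arbitrary assumptions.

class CachedFlash:
    def __init__(self, backing, line_size=64, num_lines=8):
        self.backing = backing
        self.line_size = line_size
        self.num_lines = num_lines
        self.tags = [None] * num_lines    # which flash line each slot holds
        self.lines = [b""] * num_lines    # cached copies of those lines
        self.hits = 0
        self.misses = 0

    def read_byte(self, addr):
        line_no = addr // self.line_size
        slot = line_no % self.num_lines   # direct-mapped: one possible slot
        if self.tags[slot] != line_no:
            # Miss: fetch the whole line from the slow store in one burst.
            self.misses += 1
            base = line_no * self.line_size
            self.lines[slot] = self.backing[base:base + self.line_size]
            self.tags[slot] = line_no
        else:
            self.hits += 1
        return self.lines[slot][addr % self.line_size]


if __name__ == "__main__":
    flash = bytes(range(256)) * 4         # 1 KB of pretend flash contents
    cache = CachedFlash(flash)
    for _ in range(10):                   # a small loop that fits in one line
        for addr in range(64):
            cache.read_byte(addr)
    print("hits:", cache.hits, "misses:", cache.misses)
```

A tight loop that fits in the cache pays the slow fetch only once, which is the intuition behind "good enough for outer loop logic". Code that jumps between addresses mapping to the same slot would keep missing.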
I don't see how adding external memory can be viewed as a kludge. Is the addition of an SD card a kludge also? Maybe it's only a kludge if you use more than 3 pins? I guess one person's kludge is another person's solution to a problem.
Peter can you restate your original post in a way that a dummy like me can understand. It sounds like you want us all to join together to develop a system that's something like an ARM system, but without the ARM chip. Also we're not allowed to use external memory either. So you want a group of Propellerheads to band together to develop an OS. The system would consist of a Prop, SD card, and VGA, keyboard and mouse interfaces. Is that correct? You say that the system would not be standalone, but it sounds like you're describing a standalone system.
You also said the system would not need to be based on Forth, is that correct? Since most people are not Forth programmers it would make sense that the user interface should not look or feel like Forth. It would be OK if the OS was Forth underneath the hood. However, that would limit the number of people that would contribute to the OS. Spin is probably the best choice for developing the OS since almost all Prop programmers know how to program in Spin.
It would be interesting to see what the performance would be of a Spin compiler that generates code for the Tachyon VM both in execution speed and code size.
I'm still not clear about what it is that you're proposing be done. I also don't understand the point. On the one hand you complain that the Propeller is slow with an insufficient amount of memory, yet you want to load it down with an operating environment (OS). You also complain that the Propeller doesn't have enough I/O pins, yet you want to tie up pins and force it to drive a keyboard and display.
I still haven't figured out what you're trying to do but I do know that I'm not interested in participating until I know a lot more about it.
Look, I'm primarily a hardware guy more than a software guy. I design systems all the time, and believe me, having a 32 I/O microcontroller devote most of its I/O pins to external parallel memory is a kludge, especially when the pins aren't even multiplexed as an address/data bus. Sure you can do it, but it's like trying to bolt a jet engine onto a Spitfire: although it's a great Prop plane, it is not a solution because it is still a problem; it needs to be done right.
Now you try to compare SPI memory as if it were the same thing but we are not expecting SPI memory to achieve the same speed as a full address/data bus memory if that were ever possible with P1, we simply want it for mass storage and if it can go faster then that's a good thing. But SPI or SD memory normally only takes around 4 I/O, that is never a problem considering the huge bonus mass storage can bring to a system that has need of it.
Obviously these threads can get long so once again I will make it quite clear, I am not advocating developing a stand-alone OS but I am simply stating that the Prop already has that capability so that we can use elements of this OS as we want in embedded applications. Most of my products are embedded controllers of various shapes and sizes, I don't have a need for a stand-alone system itself but many of my systems have a lot of development capability in them at no extra cost simply because the hardware is there supported by the software. That's a bonus.
I was quite happy with using Spin and PASM when I first used the Prop, I quite like it and I am very familiar with it. Despite being happy and familiar with it I couldn't make the Prop always do what I wanted without some uber-hardware perhaps. So this is reality and you have to face choices, if the Prop can't do it in Spin then I could easily use any variety of ARM with C (yes) and ASM etc. But now I thought to myself I like the Prop from a hardware point of view as well and I know I can make the Prop more capable in software, which I did. Is this a bad thing because some just can't adapt, or won't? I know I did the right thing, because it works.
So how then could you make a statement that Spin is probably the best choice? Perhaps for smaller and slower applications it is the best choice for those who know it already and can't change but I won't let myself be restrained by such arbitrary and artificial boundaries. The "to ARM or not to ARM" is the question I ask myself as I have so many choices in this department but I am loath to do so because I love the Prop and I am reluctant to not design with it when I know I can make it do what I want without kludges. Although I work quietly behind the scenes I have beaten off companies who are so eager to get me to design systems but they try to dictate how, not wise to go down that path.
To each his own but to those who want to empower themselves I say again, this is a call to arms, if you are serious then let's get serious, otherwise everyone will end up being scattered before what I see as the inevitable demise of P1 as a realistic choice for new designs (and new designers) if it continues with such fragmented and flawed "support". Rather than fluffing up the tools and languages and trying to mosey up to a perceived clientele, just make it work in the areas of interest and this will be proof enough that the chip is a good design choice. I've seen too many examples of this capability and that in many micros and the examples are just that, an example only, incomplete and full of holes, an unbelievable joke.
Now I can see I have made a fatal mistake by bringing up my misapprehensions and suggestions, this is a forum and one thing is certain on this forum, everyone loves talking but not much gets done with all this talking. I've proven that I can do this stuff I'm talking about, I've walked the walk and now I talk the talk, and I'm simply asking if anyone else is interested in being part of it in some way or fashion otherwise I don't know if I could be bothered to do much more in regard to the forum as it just complicates the workload, I'm not retired and it's not a hobby, although it is fun and I choose the work I choose.
I know Parallax are only interested in C in the hope that this can deliver the Prop but that's a lot of eggs in just one basket. P2 will have some design wins I am very sure but acceptance builds slowly which would have been fine if P2 were delivered in some form a few years ago, but now that window has been stretched very thin in a crowded, gloomy and competitive market with a product whose form and figure has not yet coalesced.
@4x5n: No, that's not what I said, you haven't read my post but fold your arms all you like, I'm not trying to make you interested, you just have to be already otherwise I'm not interested, I've got real work to do.
P.S. I am probably not going to devote any more time to this thread, it's far too much talking, repeating, and clarifying. I will just continue to finish the Prop projects I have already committed to.
In spinix/pfth I can create an executable Forth program that can be loaded just like any other program. However, the binary is self-compiled by pfth and the resulting binary file contains the entire pfth dictionary. It would be nice to produce a binary that only contains the words that are actually used, to reduce the size of the executable image. I believe Tachyon uses a lookup table as part of its dictionary, so it may be possible to remove words from the table.
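The "only the words that are actually used" idea amounts to a reachability walk over the dictionary. Here is a small Python sketch, assuming each word's definition can be listed as the names of the words it calls. The word names and the `reachable_words` helper are invented for the example and don't reflect how pfth or Tachyon lay out their dictionaries.

```python
# Sketch of dictionary pruning as a reachability walk (illustrative only).
from typing import Dict, List, Set


def reachable_words(dictionary: Dict[str, List[str]], roots: List[str]) -> Set[str]:
    """Return every word reachable from the root words via their definitions."""
    keep: Set[str] = set()
    pending = list(roots)
    while pending:
        word = pending.pop()
        if word in keep:
            continue
        keep.add(word)
        # Words missing from the dictionary are treated as leaf primitives.
        pending.extend(dictionary.get(word, []))
    return keep


# Hypothetical dictionary: word -> words it calls.
WORDS = {
    "MAIN": ["BLINK", "DELAY"],
    "BLINK": ["OUTSET", "OUTCLR"],
    "DELAY": [],
    "OUTSET": [],
    "OUTCLR": [],
    "DUMP": ["C@", "EMIT"],   # never called from MAIN, so it can be dropped
    "C@": [],
    "EMIT": [],
}

if __name__ == "__main__":
    print(sorted(reachable_words(WORDS, ["MAIN"])))
```

The catch is that a walk like this only sees static references. Anything looked up by name at run time, which is what an interactive interpreter does all the time, would have to be listed as an extra root.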
I dabbled with another Forth interpreter called Fast, where I used a cross-compiler to generate the Forth image. The cross-compiler was written in C and it emulated the kernel words that were used in the Prop implementation. The advantage of the cross-compiler was that I didn't have to generate a boot program written using the BYTE, WORD and LONG keywords in a Spin DAT section. The cross-compiler was also able to optimize the generated Forth code by folding small words into a compiled word rather than calling the small words. For Fast this optimization produced smaller and faster code.
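The folding optimization can be sketched in a few lines of Python. This is not the Fast cross-compiler (that was written in C and its internals aren't shown here). The `fold_small_words` name, the `max_len` threshold and the sample definitions are assumptions, and the sketch assumes no word is recursive.

```python
# Sketch of folding (inlining) small words into their callers. Illustrative only.
from typing import Dict, List


def fold_small_words(definitions: Dict[str, List[str]], max_len: int = 2) -> Dict[str, List[str]]:
    """Replace calls to short words with their bodies; keep calls to long ones.

    Assumes definitions are not recursive (a recursive word would loop forever).
    Tokens that are not keys of `definitions` are treated as primitives.
    """
    folded: Dict[str, List[str]] = {}

    def expand(name: str) -> List[str]:
        if name in folded:
            return folded[name]
        body: List[str] = []
        for token in definitions[name]:
            if token in definitions:
                inner = expand(token)
                if len(inner) <= max_len:
                    body.extend(inner)    # small word: fold its body in
                else:
                    body.append(token)    # large word: keep the call
            else:
                body.append(token)        # primitive
        folded[name] = body
        return body

    for name in definitions:
        expand(name)
    return folded


# Hypothetical source definitions.
DEFS = {
    "2DUP": ["OVER", "OVER"],
    "SQUARE": ["DUP", "*"],
    "SUMSQ": ["SQUARE", "SWAP", "SQUARE", "+"],
    "MAIN": ["2DUP", "SUMSQ", "."],
}

if __name__ == "__main__":
    for name, body in fold_small_words(DEFS).items():
        print(name, "->", " ".join(body))
```

Whether folding actually shrinks the image depends on how many bytes a call costs compared with the folded body, so the threshold would need tuning for a given VM.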
It's certainly possible to write a cross-compiler in Spin as well. However, if you're already running on the Prop you might as well compile directly using the Forth interpreter.
Look, I'm primarily a hardware guy more than a software guy, I design systems all the time and believe me, having a 32 I/O microcontroller devote most of its I/O pins to external parallel memory is a kludge, especially when the pins aren't even multiplexed as address/data bus. Sure you can do it but it's like trying to bolt a jet engine onto a Spitfire, although it's a great Prop plane, it is not a solution because it is still a problem, it needs to be done right.
Now you try to compare SPI memory as if it were the same thing but we are not expecting SPI memory to achieve the same speed as a full address/data bus memory if that were ever possible with P1, we simply want it for mass storage and if it can go faster then that's a good thing. But SPI or SD memory normally only takes around 4 I/O, that is never a problem considering the huge bonus mass storage can bring to a system that has need of it.
Obviously these threads can get long so once again I will make it quite clear, I am not advocating developing a stand-alone OS but I am simply stating that the Prop already has that capability so that we can use elements of this OS as we want in embedded applications. Most of my products are embedded controllers of various shapes and sizes, I don't have a need for a stand-alone system itself but many of my systems have a lot of development capability in them at no extra cost simply because the hardware is there supported by the software. That's a bonus.
I was quite happy with using Spin and PASM when I first used the Prop, I quite like it and I am very familiar with it. Despite being happy and familiar with it I couldn't make the Prop always do what I wanted without some uber-hardware perhaps. So this is reality and you have to face choices, if the Prop can't do it in Spin then I could easily use any variety of ARM with C (yes) and ASM etc. But now I thought to myself I like the Prop from a hardware point of view as well and I know I can make the Prop more capable in software, which I did. Is this a bad thing because some just can't adapt, or won't? I know I did the right thing, because it works.
So how then could you make a statement that Spin is probably the best choice? Perhaps for smaller and slower applications it is the best choice for those who know it already and can't change but I won't let myself be restrained by such arbitrary and artificial boundaries. The "to ARM or not to ARM" is the question I ask myself as I have so many choices in this department but I am loath to do so because I love the Prop and I am reluctant to not design with it when I know I can make it do what I want without kludges. Although I work quietly behind the scenes I have beaten off companies who are so eager to get me to design systems but they try to dictate how, not wise to go down that path.
To each his own but to those who want to empower themselves I say again, this is a call to arms, if you are serious then let's get serious, otherwise everyone will end up being scattered before what I see as the inevitable demise of P1 as a realistic choice for new designs (and new designers) if it continues with such fragmented and flawed "support". Rather than fluffing up the tools and languages and trying to mosey up to a perceived clientele, just make it work in the areas of interest and this will be proof enough that the chip is a good design choice. I've seen too many examples of this capability and that in many micros and the examples are just that, an example only, incomplete and full of holes, an unbelievable joke.
Now I can see I have made a fatal mistake by bringing up my misapprehensions and suggestions, this is a forum and one thing is certain on this forum, everyone loves talking but not much gets done with all this talking. I've proven that I can do this stuff I'm talking about, I've walked the walk and now I talk the talk, and I'm simply asking if anyone else is interested in being part of it in some way or fashion otherwise I don't know if I could be bothered to do much more in regard to the forum as it just complicates the workload, I'm not retired and it's not a hobby, although it is fun and I choose the work I choose.
Peter, the problem is that you still haven't clearly stated what your goal is. It sounds like you want to develop a tool-kit that can be used to build embedded applications. If that's the case that already exists in the OBEX. And when I said that Spin is the best language for this I really meant Spin/PASM. The reason that this is the obvious choice for a Prop tool-kit is that everyone speaks Spin/PASM. There's also a significant number of Prop programmers that use C, but that would leave out the other programmers that aren't proficient in C. There's only a very small percentage of the Prop developers that use or understand Forth, so I don't think Forth would be a good choice for a Prop tool-kit.
Beyond the tool-kit are you also suggesting that there should be a Prop standard for device drivers? Ross Higson proposed making the Catalina device driver structure a standard, but he couldn't get agreement on this. In my view it would be hard to get agreement on a device driver spec, or even a standard tool-kit. It seems that most of us suffer from NIH syndrome, and we like the way we structure our own code. However, the OBEX is a very useful resource, and I have used many objects from it. Now all we need is a Gold-Standard OBEX. Oh wait, that's been tried and it failed miserably also.
Can I summarize your request by saying you want people to help you flesh out Tachyon Forth and also that you suggest that others should adopt it as their preferred development language because it makes the best use of the limited resources on the Propeller? That isn't unreasonable and if I had more time I might be interested. There seem to be lots of Forth enthusiasts here who undoubtedly will be interested. As far as other languages targeting the Tachyon VM goes, that will involve a fair amount of work for PropGCC so it would be worth doing some hand compilations to see if there is anything to be gained over CMM.
David, I misinterpreted your earlier comment about a Spin compiler for the Tachyon VM. I thought you were talking about a cross-compiler written in Spin that would compile Forth code. I now understand you were referring to a compiler that would compile Spin code to the Tachyon VM. This actually might result in good performance for Spin programs. I think Tachyon has a limited call stack, but maybe it could be increased. That's an interesting idea for anybody who has a lot of time on their hands.
Since the CMM VM is hand-tailored for the Gnu compiler I'm skeptical that the Tachyon VM would do any better. It also seems like it would take a lot of work to get the compiler to work with the Tachyon VM.
Peter said in an earlier message that he had implemented a hybrid stack where some of it is in COG memory and it overflows into hub memory. This might work well with Spin. Of course, writing a new compiler is a big project and there are issues with optimizing Spin code as seen by some of the problems people have had with code translated by spin2cpp and then run through the optimizing PropGCC compiler.
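For anyone unfamiliar with the hybrid stack idea, a rough Python model looks like this. It only illustrates the general technique of a small fast region that spills into a larger slow one; the depth of 4 and the class name are assumptions, and nothing here is taken from the Tachyon source.

```python
class HybridStack:
    """Small fast region (think COG registers) that spills to a slow one (think hub RAM)."""

    def __init__(self, fast_depth=4):
        self.fast_depth = fast_depth
        self.fast = []    # newest item last; models the COG-resident part
        self.spill = []   # overflow area; models hub memory

    def push(self, value):
        if len(self.fast) == self.fast_depth:
            # Fast region is full: move its oldest item out to the slow region.
            self.spill.append(self.fast.pop(0))
        self.fast.append(value)

    def pop(self):
        if not self.fast:
            raise IndexError("stack underflow")
        value = self.fast.pop()
        if self.spill:
            # Refill from the slow region so the top items stay fast.
            self.fast.insert(0, self.spill.pop())
        return value


stack = HybridStack(fast_depth=4)
for n in range(1, 7):
    stack.push(n)
snapshot = (list(stack.fast), list(stack.spill))
popped = [stack.pop() for _ in range(6)]
print(snapshot, popped)
```

The caller sees an ordinary LIFO stack; only the cost of an access changes once the depth exceeds the fast region.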
I think the hybrid stack only applies to the data stack, and the return stack is in registers, but I may be wrong. I'm pretty sure that Tachyon doesn't allow for recursive calls.
The "natural" language(s) for the Propeller are Spin/PASM.
Why? Because the Spin byte code interpreter is built into the device itself and PASM is there for when you feel the need for speed. That is the way God intended. Well Chip anyway:)
I put the "(s)" there because in the normal world an HLL and assembler are different things. But in the Spin world they are beautifully and tightly integrated.
The LMM thing is a siren song. Yes it allows writing big PASM programs. Yes that can be used by compilers. But performance sucks and code size is huge. Really, if you need that speed for big programs you are probably already better off looking elsewhere.
XMM is just an amplification of that siren song.
I will just continue to finish the Prop projects I have already committed to.
That is probably the best course of action. As is often said "build it and they will come". Or perhaps not. Either way you will have done what you wanted to do so all is good.
Probably true for XMM although people are using LMM and CMM so they may have been worth the effort.
It seems to me that Peter already did "build it" and many came judging from the activity level on his Tachyon threads. I think he's now trying to win more of us over to his solution. However, if I have to use a non-standard language on the Propeller, I'd prefer to use Spin/PASM rather than Tachyon Forth as long as it performs well enough for my application. Of course, my applications mostly include the deprecated "LED blinking" experiments! :-)
I work with makers. They are doctors, farmers, musicians, machinists, fabric makers, woodworkers... all just trying to add small electro mechanical programmable devices to their projects. This group doesn't have the time/bandwidth for compile-link-download strict syntax procedural language methodologies that most of you cling to avidly as "the" approach. I need to sit them down and get them started with simple examples and simple output. The Tachyon / Forth console on a quickstart board is ideal. I'm not here to give them a lecture on computer science, just get the reward cycle started and get the smiles going so they can self motivate to do more! After they get some basic concepts they can do more i.e. procedural languages if they choose. Remember the audience.
Yes, an interactive language is very useful for prodding the hardware and getting some immediate feedback. I maintain that it doesn't have to be Forth. Any interactive language will do. However, it may be that Forth is the only powerful enough interactive language that will fit on an unexpanded Propeller.
...it may be that Forth is the only powerful enough interactive language that will fit on an unexpanded Propeller.
I do believe you have hit the nail on the head there. As far as I know there is no other technique that will do the Read Execute Print Loop (REPL) in such a small space.
D.P.
I do agree that instant results, cause to effect, are very desirable for introducing even just the idea of being able to program something. Never mind "computer science" which is not under discussion here.
It was certainly true that compilers were tedious and slow, not to mention complicated in the past.
That is why BASIC was invented. That is why JavaScript is so easy as an introductory programming language.
However I don't see that changing a few characters or lines of a Spin program and hitting the "go" button is significantly slower at arriving at the result than making the same change to a Forth program via its REPL.
Does this REPL thing really make such a dramatic difference when the edit, compile, run cycle of a system like Spin is so quick anyway?
I think the advantage of REPL is that you can evaluate individual expressions. You don't have to surround them with a main() function like you do in Spin or C/C++. You don't have to always invoke printf or something like that to "see" the result. Of course, my interactive language of choice is Lisp. :-)
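As a small illustration of "evaluate individual expressions", here is a toy Forth-style evaluator in Python. It is only a sketch of the read-execute-print idea with a handful of made-up words, not a model of Tachyon or of any real Forth.

```python
def make_interpreter():
    stack = []  # persists between lines, like a console session

    def binary(op):
        def word(out):
            b = stack.pop()
            a = stack.pop()
            stack.append(op(a, b))
        return word

    def dup(out):
        stack.append(stack[-1])

    def dot(out):
        out.append(str(stack.pop()))  # "." prints and drops the top item

    words = {
        "+": binary(lambda a, b: a + b),
        "-": binary(lambda a, b: a - b),
        "*": binary(lambda a, b: a * b),
        "DUP": dup,
        ".": dot,
    }

    def evaluate(line):
        out = []
        for token in line.split():
            if token in words:
                words[token](out)
            else:
                stack.append(int(token))  # anything else is treated as a number
        return " ".join(out)

    return evaluate


evaluate = make_interpreter()
first = evaluate("2 3 + .")
evaluate("7 DUP *")       # leaves 49 on the stack, prints nothing
second = evaluate(".")    # a later line can still see it
print(first, second)
```

Each line is executed as soon as it is entered and the stack carries over to the next line, so there is no main() wrapper and no separate build step.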
If you look at the controllers that the "makers" use you'll find that it's nearly exclusively the realm of Arduino. My understanding is that Arduino is programmed via C++. Doesn't that follow the "compile-link-download strict syntax" methodology? (I removed procedural since C++ isn't procedural but can be.) For the record I'm not arguing for or against any particular language for the propeller (I do think we can all agree that trying to program a prop with a language like Perl is silly though!).
Comments
I suppose it could always be shortened to ISKYWYS for a 32K system, and since it is pronounceable it could even be added to the language as a word.
Reading input pins and bringing output pins high and low are what a micro-controller is for! What else should it be doing? While I'm sure Forth is a great language and Tachyon is a great implementation I have to admit I don't know Forth and at this point in my life I'm not all that interested in learning the language. Maybe 30 years ago I may have been interested but not now. The programs I write for the prop aren't all that big and writing them on a PC and transferring the program to the prop board via USB/serial is completely acceptable to me.
Aren't we always learning? Isn't the Prop itself quite a bit of a learning curve in itself? I wasn't really interested in writing a Forth for the Prop myself, but I did it out of necessity. There's a certain point in life where we aren't interested in anything at all but by that time it's terminal. While we live, we breathe, we learn.
BTW, I've always kept saying that the Prop is a microcontroller, not a microprocessor, so I'm not interested in stand-alone systems either, it's just the fact that we have the capability to implement whatever parts of it that we may so desire in an embedded product. I'm not interested in emulating some ancient computer I may have played with in my younger days, but some systems need networking, some displays and keyboards, some filesystems, or combinations of these etc.
Sure some of this stuff is beyond the scope of a microcontroller but if we can do it with the Prop simply which we can and without kludge upon kludge then that's a major bonus because it's the Prop we love.
For those of you curious, but too busy to follow the links.
Here comes the current Tachyon Kernel Code (hope it's ok Peter ...)
If you have some PASM experience you should be able to read it -
if not - well - here you can learn it as I did.
too large for Forum post size ... :-( so here a direct link
https://docs.google.com/document/d/1Hje4ZTnt2xUW-_uLINVm9g72t_uhxobezoUb9k7JTWU/pub
and a teaser - sorry, some whitespace has gone ...
[code]
DAT
{{
Byte tokens directly address code in the first 256 longs of the cog.
A two byte-code instruction XOP allows access to the second 256 longs
Rather than a jump table most functions are short or cascaded to optimize COG memory
Larger fragments of code jump to the second half of the cog's memory.
As a result of not using a jump table (there's not enough memory) there are gaps
in the bytecode values and not all values are usable.
The formatted source has bytecode instruction labels as bold white on red background.
}}
            org     0
RESET       mov     IP,PAR          ' Load the IP with the address of the first instruction as if it were an XOP
' position XOP here so that any search for an address of an XOP word returns with the correct cog address of $01xx
' Use next byte as an opcode that directly addresses top 256 words of cog
XOP         rdbyte  instr,IP        ' get next bytecode
            or      instr,#$100     ' shift range
            jmp     #doNext+1       ' IP++, execute
{*** RUNTIME BYTECODE INTERPRETER *** }
'****
' Fetch the next byte code instruction in hub RAM pointed to by the instruction pointer IP
' This is the very heart of the runtime interpreter
'
doNEXT      rdbyte  instr,IP        ' read byte code instruction
            add     IP,#1 wc        ' advance IP to next byte token (clears the carry too!)
            jmp     instr           ' execute the code by directly indexing the first 256 long in cog
' Find the end of the string which could end in a null or any character >$7F
' this is also used to find the end of a larger text buffer
' STREND ( ptr -- ptr2 )
STREND
fchlp       rdbyte  R0,tos          ' read a byte
            sub     R0,#1           ' end is either a null or anything >$7F
            cmp     R0,#$7E wc
    if_c    add     tos,#1
    if_c    jmp     #fchlp
            jmp     unext
' 0EXIT ( flg -- ) Exit if flg is false (or zero) Used in place of IF......THEN EXIT as false would just end up exiting
ZEXIT       call    #POPX
            tjnz    X,unext
'
' EXIT a bytecode definition by popping the top of the return stack into the IP
EXIT        call    #RPOPX          ' Pop from return stack into X
JUMPX       mov     IP,X            ' update IP
_NOP        jmp     unext           ' continue
{*** STACK OPERATORS *** }
' DROP3 ( n1 n2 n3 -- ) Pop the top 3 items off the datastack and discard them (used mostly by cog kernel)
DROP3       call    #POPX
' DROP2 ( n1 n2 -- ) Pop the top 2 items off the datastack and discard them
DROP2       call    #POPX
' 1us execution time including bytecode read and execute
' DROP ( n1 -- ) Pop the top item off the datastack and discard it
DROP        call    #POPX
            jmp     unext
' ?DUP ( n1 -- n1 n1 | 0 ) DUP n1 if non-zero
QDUP        tjz     tos,unext
' DUP ( n1 -- n1 n1 ) Duplicate the top item on the stack
DUP         mov     X,tos           ' Read directly from the top of the data stack
PUSHX       call    #_PUSHX         ' Push the internal X register onto the datastack
            jmp     unext
' OVER ( n1 n2 -- n1 n2 n1 )
OVER        mov     X,tos+1         ' read second data item and push
            jmp     #PUSHX
' 3RD ( n1 n2 n3 -- n1 n2 n3 n1 ) Copy the 3rd item onto the stack
THIRD       mov     X,tos+2         ' read third data item
            jmp     #PUSHX
' 4TH ( n1 n2 n3 n4 -- n1 n2 n3 n4 n1 ) Copy the 4th item onto the stack
FOURTH      mov     X,tos+3
            jmp     #PUSHX
' BOUNDS ( n1 n2 -- n2+n1 n1 ) == OVER + SWAP
BOUNDS      add     tos,tos+1
' SWAP ( n1 n2 -- n2 n1 ) Swap the top two items
SWAP        mov     X,tos+1
SWAPX       mov     tos+1,tos
PUTX        mov     tos,X
            jmp     unext
' ROT ( a b c -- b c a )
ROT         mov     X,tos+2
            mov     tos+2,tos+1
            jmp     #SWAPX
{*** ARITHMETIC *** }
' - ( n1 n2 -- n3 ) Subtract n2 from n1
MINUS       neg     tos,tos         ' (note: save one long by negating and adding)
' + ( n1 n2 -- n3 ) Add top two stack items together and replace with result
PLUS        add     tos+1,tos
            jmp     #DROP
' 1- ( n1 -- n1-1 )
DEC         test    $,#1 wc
' 1+ ( n1 -- n1+1 )
INC         sumc    tos,#1          ' inc or dec depending upon carry (default cleared by doNEXT)
            jmp     unext
' -NEGATE ( n1 sn -- n1 | -n1 ) negate n1 if the sign of sn is negative (used in signed divide op)
MNEGATE     shr     tos,#31
' ?NEGATE ( n1 flg -- n2 ) negate n1 if flg is true
QNEGATE     tjz     tos,#DROP
            call    #POPX
' NEGATE ( n1 -- n2 ) equivalent to n2 = 0-n1
NEGATE      neg     tos,tos
            jmp     unext
' u/mod ( u1 u2 -- remainder quotient) both remainder and quotient are 32 bit unsigned numbers
UDIVMOD     call    #_UDIVMOD
            jmp     unext
' U/ ( n1 n2 -- n3 ) unsigned divide
UDIVIDE     call    #_UDIVMOD
NIP         mov     tos+1,tos
            jmp     #DROP
{*** BOOLEAN *** }
' 400ns execution time including bytecode read and execute
' INVERT ( n1 -- n2 ) bitwise invert n1 and replace with result n2
INVERT      add     tos,#1
            jmp     #NEGATE
{
_BITS       test    $,#1 wc         ' set carry
            rcl     ACC,tos
            and     tos+1,ACC
            jmp     #DROP
}
_AND        and     tos+1,tos
            jmp     #DROP
_ANDN       andn    tos+1,tos
            jmp     #DROP
_OR         or      tos+1,tos
            jmp     #DROP
_XOR        xor     tos+1,tos
            jmp     #DROP
' 1.2us execution time including bytecode read and execute
' SHR ( n1 cnt -- n2 ) Shift n1 right by count (5 lsbs )
_SHR        shr     tos+1,tos
            jmp     #DROP
_SHL        shl     tos+1,tos
            jmp     #DROP
_ROL        rol     tos+1,tos
            jmp     #DROP
_ROR        ror     tos+1,tos
            jmp     #DROP
' 400ns execution time including bytecode read and execute
' 2/ ( n1 -- n1 ) shift n1 right one bit (equiv to divide by 2)
_SHR1       shr     tos,#1
            jmp     unext
'_SHL16     shl     tos,#15
' 2* ( n1 -- n2 ) shift n1 left one bit (equiv to multiply by 2)
_SHL1       shl     tos,#1
            jmp     unext
' REV ( n1 bits -- n2 ) Reverse LSBs of n1 and zero-extend
_REV        rev     tos+1,tos
            jmp     #DROP
' 400ns execution time including bytecode read and execute
' MASK ( bitpos -- bitmask \ only the lower 5 bits of bitpos are taken, regardless of the higher bits )
MASK        mov     X,tos
            mov     tos,#1
            shl     tos,X
            jmp     unext
' >N ( n -- nibble ) mask n to a nibble
toNIB       and     tos,#$0F
' >B ( n -- byte ) mask n to a byte
toBYTE      and     tos,#$FF
            jmp     unext
{*** COMPARISON *** }
' Basic instructions from which other comparison instructions are built from
' = ( n1 n2 -- flg ) true if n1 is equal to n2
EQ          sub     tos+1,tos       ' n1 == 0 if equal
            call    #POPX           ' drop n2
'
' 0= ( n1 -- flg ) true if n1 equals 0 - same as a boolean NOT where TRUE becomes FALSE
_NOT
ZEQ         cmp     tos,#1 wc       ' kuroneko method, nice and neat
SETZ        subx    tos,tos         ' a carry becomes -1, else 0
            jmp     unext
' > ( n1 n2 -- flg ) true if n1 > n2
GT          cmps    tos,tos+1 wc    ' n1 > n2: carry set
            subx    tos+1,tos+1
            jmp     #DROP
{*** MEMORY *** }
' C@++ ( caddr -- caddr+1 byte ) fetch byte character and increment address
CFETCHINC   mov     X,tos           ' dup the address
            call    #_PUSHX
            add     tos+1,#1        ' inc the backup address
' C@ ( caddr -- byte ) Fetch a byte from hub memory
CFETCH      rdbyte  tos,tos
            jmp     unext
' W@ ( waddr -- word ) Fetch a word from hub memory
WFETCH      rdword  tos,tos
            jmp     unext
' @ ( addr -- long ) Fetch a long from hub memory
FETCH       rdlong  tos,tos
            jmp     unext
' C+! ( n caddr -- ) add n to byte at hub addr
CPLUSST     rdbyte  X,tos           ' read in byte from address
            add     tos+1,X         ' add to contents of address - cascade
' C! ( n caddr -- ) store n to byte at addr
CSTORE      wrbyte  tos+1,tos       ' write the byte using address on the tos
            jmp     #DROP2
' W+! ( n waddr -- ) add n to word at hub addr
WPLUSST     rdword  X,tos           ' read in word from address
            add     tos+1,X
' W! ( n waddr -- ) store n to word at addr
WSTORE      wrword  tos+1,tos
            jmp     #DROP2
' +! ( n addr -- ) add n to long at hub addr
PLUSST      rdlong  X,tos           ' read in long from address
            add     tos+1,X
' ! ( n addr -- ) store n to long at addr
STORE       wrlong  tos+1,tos
            jmp     #DROP2
' BIT! ( mask caddr state -- ) Set or clear bit(s) in hub byte
'BIT        call    #POPX
'           tjz     X,#CLR          ' carry clear, finalize
' SET ( mask caddr -- ) Set bit(s) in hub byte
SET         test    $,#1 wc         ' set the carry flag
' Finalize the bit operation by read/writing the result
' ( mask caddr -- )
CLR         rdbyte  X,tos           ' Read the contents of the memory location
            muxc    X,tos+1         ' set or clear the bit(s) specd by mask
            wrbyte  X,tos           ' update
            jmp     #DROP2
{*** LITERALS *** }
' LITERALS are stored unaligned in big endian format which facilitates cascading byte reads to accumulate the full number
' 3.6us execution time including bytecode read and execute
' ( -- 32bits ) Push a 32-bit literal onto the datastack by reading in the next 4 bytes (non-aligned)
_LONG
PUSH4       call    #ACCBYTE        ' read the next byte @IP++ and shift accumulate
' 3us execution time including bytecode read and execute
' ( -- 24bits ) Push a 24-bit literal onto the datastack by reading in the next 3 bytes (non-aligned)
PUSH3       call    #ACCBYTE
_WORD
' 2.4us execution time including bytecode read and execute
' ( -- 16bits) Push a 16-bit literal onto the datastack by reading in the next 2 bytes (non-aligned)
PUSH2       call    #ACCBYTE
' 1.8us execution time including bytecode read and execute
' ( -- 8bits ) Push an 8-bit literal onto the datastack by reading in the next byte
_BYTE
PUSH1       call    #ACCBYTE
PUSHACC     call    #_PUSHACC       ' Push the accumulator onto the stack then zero it
            jmp     unext
{*** FAST CONSTANTS *** }
' Push a preset literal onto the stack using just one bytecode
' Use the "accumulator" to push the value which is built up by incrementing and/or decrementing
' There is a minor penalty for the larger constants but it's still faster and more compact
' overall than using the PUSH1 method or the mov X,# method
' 140606 just reordered to 1 4 2 3 according to BCA results
' 140603 new method to allow any value in any order, relies on carry being cleared in doNEXT and min will always set carry here
BL  if_nc   min     ACC,#32+1 wc    ' 1.52us
_16 if_nc   min     ACC,#16+1 wc
_8  if_nc   min     ACC,#8+1 wc
_4  if_nc   min     ACC,#4+1 wc
_2  if_nc   min     ACC,#2+1 wc
_1  if_nc   min     ACC,#1+1 wc
_3  if_nc   min     ACC,#3+1 wc     ' bytecode analysis reveals 3 is used quite heavily
_TRUE
MINUS1      sub     ACC,#1
_FALSE
_0          jmp     #PUSHACC        ' 1.12us
{*** CONSTANTS & VARIABLES *** }
' Constants and variables etc are standalone fragments preceded by an opcode then the parameters, either a long or the address of the parameter field
' Long aligned constant - created with CONSTANT and already aligned
CONL
            rdlong  X,IP            ' get constant
            jmp     #PUSHX_EXIT
' Byte aligned variables start with this single byte code which returns with the address of the byte variable following
' long variables just externally align this opcode a byte before the boundary
' INLINE:
VARB        mov     X,IP
PUSHX_EXIT  call    #_PUSHX         ' push address of variable
            jmp     #EXIT
' OPCODE assumes that a long aligned long follows which contains the 32-bit opcode.
OPCODE      rdlong  opc,IP          ' read the long that follows (just like a constant)
            nop
opc         nop
            jmp     #EXIT           ' return back to caller
{*** I/O ACCESS *** }
{ not used - removed to extensions using COG@ COG!
' P@ ( -- n1 ) Read the input port A (assume it is always A for Prop 1)
PFETCH      mov     X,INA
            jmp     #PUSHX
' P! ( n1 -- ) Store n1 to the output port A
PSTORE      mov     OUTA,tos
            jmp     #DROP
' STROBE ( iomask -- ) Generate a 100ns low pulse - pins must be preset as outputs (first up anyway)
STROBE      andn    OUTA,tos        ' strobe low
            jmp     #OUTSET         ' release high (use jmp to add one extra cycle)
}
' CLOCK ( COGREG4=iomask ) Toggle multiple bits on the output)
CLOCK       xor     OUTA,clockpins
            jmp     unext
' OUTCLR ( iomask -- ) Clear multiple bits on the output
OUTCLR      andn    OUTA,tos
            jmp     #OUTPUTS
' OUTMASK ( data iomask -- )
'           call    #POPX
            andn    OUTA,X          ' clear all iomask outputs
' OUTSET ( iomask -- ) Set multiple bits on the output
OUTSET      or      OUTA,tos
' OUTPUTS ( iomask -- ) Set selected port pins to outputs
OUTPUTS     or      DIRA,tos
            jmp     #DROP
' INPUTS ( iomask -- ) Set selected port pins to inputs
INPUTS      andn    DIRA,tos
            jmp     #DROP
WAITHILO    'waitpeq reg3,reg3      ' wait for a hi to lo - look for falling edge
' WAITPNE Wait until input is low - REG3 = mask, REG0 = CNT
_WAITPNE    waitpne reg3,reg3       ' use COGREG3 as the mask
            mov     reg0,cnt        ' capture count in COGREG0
            jmp     unext
' WAITPEQ Wait until input is high - REG3 = mask, REG1 = CNT
_WAITPEQ    waitpeq reg3,reg3
            mov     reg1,cnt        ' capture count in COGREG1
            jmp     unext
{*** SERIAL I/O OPERATORS *** }
{
To maximize the speed of I/O operations especially serial I/O such as ASYNCH, I2C and SPI etc there are special
operators that avoid pushing and popping the stack and instead perform the I/O bit by bit and leave the
latest shifted version of the data on the stack.
}
' SHIFT from INPUT - Assembles with last bit received as msb - needs SHR to right justify if asynch data
' SHRINP ( iomask dat -- iomask dat/2 )
SHRINP      test    tos+1,INA wc
            rcr     tos,#1
            jmp     unext
{ SHIFT to OUT -
This is optimized for when you are sending out multiple bits as in asynchronous serial data or I2C
Shift data one bit right into output via iomask - leave mask & shifted data on stack (looping)
400ns execution time including bytecode read and execute or 200ns/bit with REPS }
' SHROUT ( iomask dat -- iomask dat/2 )
SHROUT      shr     tos,#1 wc       ' Shift right and get lsb
            muxc    OUTA,tos+1      ' reflect state to output
            jmp     unext
' SPI INSTRUCT
[/code]
Format is from SPIN
the BOLD names are the Forth words corresponding to the KERNEL bytecodes
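For readers who don't speak PASM, the core of the teaser (the doNEXT fetch loop, the big-endian literal accumulation, and MINUS cascading into PLUS) can be modelled in a few lines of Python. This is only a sketch under assumptions: the token values below are invented, whereas in the kernel a token is the cog address of the code itself, and `HALT` is a hypothetical token added so the model can stop.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical token values for illustration; the real tokens are cog addresses.
HALT, PUSH1, PUSH2, PUSH4, PLUS, MINUS = 0x00, 0x01, 0x02, 0x04, 0x10, 0x11


class TokenVM:
    def __init__(self, code):
        self.code = bytes(code)
        self.ip = 0          # instruction pointer into the bytecode
        self.acc = 0         # literal accumulator
        self.stack = []      # data stack, top of stack last
        self.running = True

    def acc_byte(self):
        # Shift-accumulate the next inline byte, most significant byte first.
        self.acc = ((self.acc << 8) | self.code[self.ip]) & MASK32
        self.ip += 1

    def push_literal(self, nbytes):
        for _ in range(nbytes):
            self.acc_byte()
        self.stack.append(self.acc)
        self.acc = 0         # accumulator is zeroed after the push

    def plus(self):
        top = self.stack.pop()
        self.stack[-1] = (self.stack[-1] + top) & MASK32

    def minus(self):
        # Negate the top item, then reuse the add path.
        self.stack[-1] = -self.stack[-1] & MASK32
        self.plus()

    def halt(self):
        self.running = False

    def run(self):
        table = {
            HALT: self.halt,
            PUSH1: lambda: self.push_literal(1),
            PUSH2: lambda: self.push_literal(2),
            PUSH4: lambda: self.push_literal(4),
            PLUS: self.plus,
            MINUS: self.minus,
        }
        while self.running:
            token = self.code[self.ip]   # fetch the next byte token
            self.ip += 1                 # advance IP
            table[token]()               # dispatch on the token
        return self.stack


program = [PUSH2, 0x12, 0x34, PUSH1, 0x04, MINUS,
           PUSH4, 0x00, 0x01, 0x00, 0x00, PLUS, HALT]
result = TokenVM(program).run()
print([hex(v) for v in result])
```

The model uses a lookup table for dispatch because Python has no equivalent of jumping straight to a cog address; the kernel comments explain that the real interpreter avoids a jump table for lack of memory.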
Or how about the pdf of it?
TACHYONForthV2.4.pdf
Well David I wasn't really thinking of anything in particular and XMM didn't even cross my mind nor did I realize that you have spent considerable time on it. But adding external parallel data bus memory to the Prop's limited I/O only pins is a kludge as far as I am concerned. It's a bad design decision IMO as I would rather design with an ARM type chip with plenty of memory in the first place or team an ARM up with a Prop.
I don't think your work is ever wasted though, sometimes we have to go down a path to see what it is like. I suppose it's whether we just continue down the same path because we have come so far, or step back as we need to, reappraise our position, and always be prepared to back-track if we have to, as is so often the case. As with cooking, all that matters is the end result; how we almost didn't get there you don't share.
Thanks for your feedback in the kitchen too! It's good to get some peer appraisals and observations to help in making decisions and adjustments, you always seem to try to be balanced and reasonable. What other stuff have you been doing yourself?
Peter, can you restate your original post in a way that a dummy like me can understand? It sounds like you want us all to join together to develop a system that's something like an ARM system, but without the ARM chip. Also we're not allowed to use external memory either. So you want a group of Propellerheads to band together to develop an OS. The system would consist of a Prop, SD card, and VGA, keyboard and mouse interfaces. Is that correct? You say that the system would not be standalone, but it sounds like you're describing a standalone system.
You also said the system would not need to be based on Forth, is that correct? Since most people are not Forth programmers it would make sense that the user interface should not look or feel like Forth. It would be OK if the OS was Forth underneath the hood. However, that would limit the number of people that would contribute to the OS. Spin is probably the best choice for developing the OS since almost all Prop programmers know how to program in Spin.
I'm still not clear about what it is that you're proposing be done. I also don't understand the point. On the one hand you complain that the propeller is slow with an insufficient amount of memory, yet you want to load it down with an operating environment (OS). You also complain that the propeller doesn't have enough IO pins but want to tie up pins and force it to drive a keyboard and display.
I still haven't figured out what you're trying to do but I do know that I'm not interested in participating until I know a lot more about it.
Look, I'm primarily a hardware guy more than a software guy, I design systems all the time and believe me, having a 32 I/O microcontroller devote most of its I/O pins to external parallel memory is a kludge, especially when the pins aren't even multiplexed as address/data bus. Sure you can do it but it's like trying to bolt a jet engine on a Spitfire: although it's a great Prop plane, it is not a solution because it is still a problem, it needs to be done right.
Now you try to compare SPI memory as if it were the same thing but we are not expecting SPI memory to achieve the same speed as a full address/data bus memory if that were ever possible with P1, we simply want it for mass storage and if it can go faster then that's a good thing. But SPI or SD memory normally only takes around 4 I/O, that is never a problem considering the huge bonus mass storage can bring to a system that has need of it.
Obviously these threads can get long so once again I will make it quite clear, I am not advocating developing a stand-alone OS but I am simply stating that the Prop already has that capability so that we can use elements of this OS as we want in embedded applications. Most of my products are embedded controllers of various shapes and sizes, I don't have a need for a stand-alone system itself but many of my systems have a lot of development capability in them at no extra cost simply because the hardware is there supported by the software. That's a bonus.
I was quite happy with using Spin and PASM when I first used the Prop, I quite like it and I am very familiar with it. Despite being happy and familiar with it I couldn't make the Prop always do what I wanted without some uber-hardware perhaps. So this is reality and you have to face choices, if the Prop can't do it in Spin then I could easily use any variety of ARM with C (yes) and ASM etc. But now I thought to myself I like the Prop from a hardware point of view as well and I know I can make the Prop more capable in software, which I did. Is this a bad thing because some just can't adapt, or won't? I know I did the right thing, because it works.
So how then could you make a statement that Spin is probably the best choice? Perhaps for smaller and slower applications it is the best choice for those who know it already and can't change but I won't let myself be restrained by such arbitrary and artificial boundaries. The "to ARM or not to ARM" is the question I ask myself as I have so many choices in this department but I am loath to do so because I love the Prop and I am reluctant to not design with it when I know I can make it do what I want without kludges. Although I work quietly behind the scenes I have beaten off companies who are so eager to get me to design systems but they try to dictate how, not wise to go down that path.
To each his own but to those who want to empower themselves I say again, this is a call to arms, if you are serious then let's get serious, otherwise everyone will end up being scattered before what I see as the inevitable demise of P1 as a realistic choice for new designs (and new designers) if it continues with such fragmented and flawed "support". Rather than fluffing up the tools and languages and trying to mosey up to a perceived clientele, just make it work in the areas of interest and this will be proof enough that the chip is a good design choice. I've seen too many examples of this capability and that in many micros and the examples are just that, an example only, incomplete and full of holes, an unbelievable joke.
Now I can see I have made a fatal mistake by bringing up my misapprehensions and suggestions, this is a forum and one thing is certain on this forum, everyone loves talking but not much gets done with all this talking. I've proven that I can do this stuff I'm talking about, I've walked the walk and now I talk the talk, and I'm simply asking if anyone else is interested in being part of it in some way or fashion otherwise I don't know if I could be bothered to do much more in regard to the forum as it just complicates the workload, I'm not retired and it's not a hobby, although it is fun and I choose the work I choose.
I know Parallax are only interested in C in the hope that this can deliver the Prop but that's a lot of eggs in just one basket. P2 will have some design wins I am very sure but acceptance builds slowly which would have been fine if P2 were delivered in some form a few years ago, but now that window has been stretched very thin in a crowded, gloomy and competitive market with a product whose form and figure has not yet coalesced.
@4x5n: No, that's not what I said, you haven't read my post but fold your arms all you like, I'm not trying to make you interested, you just have to be already otherwise I'm not interested, I've got real work to do.
P.S. I am probably not going to devote any more time to this thread, it's far too much talking, repeating, and clarifying. I will just continue to finish the Prop projects I have already committed to.
I dabbled with another Forth interpreter called Fast, where I used a cross-compiler to generate the Forth image. The cross-compiler was written in C and it emulated the kernel words that were used in the Prop implementation. The advantage of the cross-compiler was that I didn't have to generate a boot program written using the BYTE, WORD and LONG keywords in a Spin DAT section. The cross-compiler was also able to optimize the generated Forth code by folding small words into a compiled word rather than calling the small words. For Fast this optimization produced smaller and faster code.
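To make the folding idea concrete, here is a toy sketch in Python of a compiler pass that copies short word bodies inline instead of emitting a call. It is not the Fast cross-compiler (that was written in C); the dictionary layout, the `("CALL", name)` token and the `max_inline` threshold are all assumptions made up for this illustration, and it assumes no word is recursive.

```python
def compile_word(body, dictionary, primitives, max_inline=2):
    """Flatten `body` into a token list, folding short words inline.

    body        -- list of word names making up the definition
    dictionary  -- maps each non-primitive word name to its body (hypothetical layout)
    primitives  -- set of kernel word names, emitted as-is
    max_inline  -- fold a word inline if its compiled body has at most this many tokens
    """
    out = []
    for name in body:
        if name in primitives:
            out.append(name)            # kernel word: emit directly
            continue
        expanded = compile_word(dictionary[name], dictionary, primitives, max_inline)
        if len(expanded) <= max_inline:
            out.extend(expanded)        # small word: fold its body in place
        else:
            out.append(("CALL", name))  # larger word: keep the call
    return out


primitives = {"DUP", "+", "*"}
dictionary = {
    "2*": ["DUP", "+"],
    "SQUARE": ["DUP", "*"],
    "SIXTH": ["DUP", "*", "DUP", "*", "DUP", "*"],
    "DEMO": ["2*", "SQUARE", "SIXTH"],
}

demo = compile_word(dictionary["DEMO"], dictionary, primitives)
print(demo)
```

With the threshold at 2, `2*` and `SQUARE` are folded into `DEMO` while `SIXTH` stays a call, so `demo` comes out as `['DUP', '+', 'DUP', '*', ('CALL', 'SIXTH')]`. Whether folding actually saves space depends on how the call overhead compares with the length of the folded body on the real target.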
It's certainly possible to write a cross-compiler in Spin as well. However, if you're already running on the Prop you might as well compile directly using the Forth interpreter.
Beyond the tool-kit, are you also suggesting that there should be a Prop standard for device drivers? Ross Higson proposed making the Catalina device driver structure a standard, but he couldn't get agreement on this. In my view it would be hard to get agreement on a device driver spec, or even a standard tool-kit. It seems that most of us suffer from the NIH syndrome, and we like the way we structure our own code. However, the OBEX is a very useful resource, and I have used many objects from it. Now all we need is a Gold-Standard OBEX. Oh wait, that's been tried and it failed miserably also.
Since the CMM VM is hand-tailored for the Gnu compiler I'm skeptical that the Tachyon VM would do any better. It also seems like it would take a lot of work to get the compiler to work with the Tachyon VM.
I kind of agree.
The "natural" language(s) for the Propeller are Spin/PASM.
Why? Because the Spin byte code interpreter is built into the device itself and PASM is there for when you feel the need for speed. That is the way God intended. Well Chip anyway:)
I put the "(s)" there because in the normal world an HLL and assembler are different things. But in the Spin world they are beautifully and tightly integrated.
The LMM thing is a siren song. Yes, it allows writing big PASM programs. Yes, that can be used by compilers. But performance sucks and code size is huge. Really, if you need that speed for big programs you are probably already better off looking elsewhere.
XMM is just an amplification of that siren song. That is probably the best course of action. As is often said "build it and they will come". Or perhaps not. Either way you will have done what you wanted to do so all is good.
D.P.
I do agree that instant results, cause to effect, are very desirable for introducing even just the idea of being able to program something. Never mind "computer science" which is not under discussion here.
It was certainly true that compilers were tedious and slow, not to mention complicated in the past.
That is why BASIC was invented. That is why JavaScript is so easy as an introductory programming language.
However, I don't see that changing a few characters or lines of a Spin program and hitting the "go" button is significantly slower at arriving at the result than making the same change to a Forth program via its REPL.
Does this REPL thing really make such a dramatic difference when the edit, compile, run cycle of a system like Spin is so quick anyway?
If you look at the controllers that the "makers" use you'll find that it's nearly exclusively the realm of Arduino. My understanding is that Arduino is programmed via C++. Doesn't that follow the "compile-link-download strict syntax" methodology? (I removed procedural since C++ isn't procedural but can be.) For the record, I'm not arguing for or against any particular language for the propeller (I do think we can all agree that trying to program a prop with a language like Perl is silly though!).