Some thoughts about how I might make a QBASIC emulator fit in the Propeller RAM
Dennis Ferron
Posts: 480
Anyone thinking about QBASIC on the Propeller?
I'll take Nibbles as the canonical example of the kind of QBASIC program I want to be able to run.
I think in terms of variables alone, Nibbles the program doesn't need more than about 5 to 10K of RAM.
But Nibbles the source code is 24K. Even if I keep my entire emulator program down to 8K, we have already filled all 32K of available memory.
Not only does this not give any room for program variables, you also need to have space for the tokenized version of the program.
I could force you to tokenize the programs on the PC and download only the tokenized version to the Propeller, but in my opinion the main novelty of a Propeller QBASIC would be the ability to actually edit the program on the Propeller too.
So how do you get room for the emulator and the source code and the tokenized code and the program? Well, what if we didn't really store the source code at all, but just let the user think they can see the source code. What you would really be seeing when you look at the code is a decompressed portion of the tokenized version of the program because the IDE would use a "dynamic detokenizer" to generate the editor display. The only line of real text would be the one you're currently typing, which gets tokenized as soon as you press enter.
You're going to lose your descriptive variable names when you go through the tokenization process, so the dynamic detokenizer would need to keep tables mapping variable names to token numbers.
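To make the "dynamic detokenizer" idea concrete, here is a minimal sketch in Python (chosen only for readability; a real version would be Spin or assembly). The token values, the `Program` class and its method names are all made up for illustration and are not QBASIC's actual token format. String literals and original spacing are not preserved in this sketch.

```python
import re

# Hypothetical one-byte keyword tokens (not the real QBASIC encoding).
KEYWORDS = {"PRINT": 0x80, "IF": 0x81, "THEN": 0x82, "GOTO": 0x83, "LET": 0x84}
KEYWORD_NAMES = {code: word for word, code in KEYWORDS.items()}
VAR_TOKEN = 0xF0   # followed by one byte: index into the symbol table
TEXT_TOKEN = 0xF1  # followed by a length byte and raw ASCII (numbers, operators)


class Program:
    def __init__(self):
        self.symbols = []  # index -> variable name (assumes at most 256 names)
        self.lines = []    # one bytes object per tokenized line; no source text kept

    def _symbol_index(self, name):
        if name not in self.symbols:
            self.symbols.append(name)
        return self.symbols.index(name)

    def tokenize(self, text):
        out = bytearray()
        for word in re.findall(r"[A-Za-z_][A-Za-z0-9_$%]*|\S+", text):
            upper = word.upper()
            if upper in KEYWORDS:
                out.append(KEYWORDS[upper])
            elif word[0].isalpha() or word[0] == "_":
                out += bytes([VAR_TOKEN, self._symbol_index(word)])
            else:
                raw = word.encode("ascii")
                out += bytes([TEXT_TOKEN, len(raw)]) + raw
        return bytes(out)

    def enter_line(self, text):
        # Called when the user presses Enter: only the tokens are stored.
        self.lines.append(self.tokenize(text))

    def detokenize(self, line_no):
        # Rebuild displayable text for the editor from the stored tokens.
        data, i, words = self.lines[line_no], 0, []
        while i < len(data):
            tok = data[i]
            if tok == VAR_TOKEN:
                words.append(self.symbols[data[i + 1]])
                i += 2
            elif tok == TEXT_TOKEN:
                n = data[i + 1]
                words.append(data[i + 2:i + 2 + n].decode("ascii"))
                i += 2 + n
            else:
                words.append(KEYWORD_NAMES[tok])
                i += 1
        return " ".join(words)
```

With this sketch, entering "if score > 10 then goto done" stores 14 bytes instead of 28 characters, and the editor would show it back as "IF score > 10 THEN GOTO done".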
Comments are an interesting case. I suspect most of the size of the Nibbles program is comments. We could strip them out, but it would really suck to have to work in an IDE that didn't let you have comments. We could make a few compromises though. Trim excess whitespace in comments. Oh, and here's a big optimization we could do: What kind of data is a comment? (English) words. Words get repeated. Often. We could tokenize the comments! The best part is that we wouldn't need much extra code to do this, because we could reuse many of the utility functions written for tokenizing code. But we would store the tokenized comments separately from the tokenized code, so that the execution engine doesn't have to worry about them. We would need a table correlating comments to line numbers so the editor can recombine the program token stream with the comment token stream.
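A rough sketch of the comment idea, again in Python just for illustration. `CommentStore` and its fields are hypothetical names. It keeps one shared word dictionary, stores each comment as a list of word indices, and keys comments by program line number so they live apart from the code tokens.

```python
class CommentStore:
    def __init__(self):
        self.words = []       # dictionary: index -> word
        self.word_index = {}  # word -> index, for fast lookup while tokenizing
        self.by_line = {}     # program line number -> list of word indices

    def add(self, line_no, comment):
        ids = []
        for word in comment.split():  # split() also trims excess whitespace
            if word not in self.word_index:
                self.word_index[word] = len(self.words)
                self.words.append(word)
            ids.append(self.word_index[word])
        self.by_line[line_no] = ids

    def text(self, line_no):
        # Used by the editor only; the execution engine never touches this.
        ids = self.by_line.get(line_no)
        if ids is None:
            return ""
        return "' " + " ".join(self.words[i] for i in ids)
```

Whether this actually saves space depends on how often words repeat versus the size of the dictionary and the line table, which is exactly the overhead question.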
All that might save significant space, or the overhead might take up more room than it saves. Even if I end up having to use an external EEPROM to make this work (scratch that, I am DEFINITELY going to need the space of an external EEPROM) I would probably still want to implement the dynamic detokenization. When coding, you type maybe 100 words a minute max, with long pauses for thought too, so typing speed is not the bottleneck when programming. The real bottleneck is waiting for the program to compile/tokenize/download, etc. The faster and easier it is to run your program, the more often you can get feedback and the faster your development goes. So if we optimize for that (program stored in tokenized form all the time, instantly ready to run) at the expense of maybe needing more clock cycles to reconstitute the text for editing, it's worth it on a machine where tokenizing a whole program might take a noticeable amount of time.
It also has the advantage that if you rename a variable, the IDE will have to rename it in the symbol table, and then it renames it for the whole tokenized program - no search and replace necessary.
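The rename case can be sketched in a few lines. This is an illustration under the assumption that tokenized lines refer to variables by table index; the tuple format and function names are made up.

```python
def render(tokens, symbols):
    # Turn one tokenized line back into text, looking variable names up by index.
    return " ".join(symbols[v] if kind == "VAR" else v for kind, v in tokens)


def rename(symbols, old, new):
    # Only the symbol table changes; tokenized lines are untouched.
    if new in symbols:
        raise ValueError("name already in use: " + new)
    symbols[symbols.index(old)] = new


symbols = ["score", "lives"]          # index -> name
line = [("KW", "PRINT"), ("VAR", 0)]  # refers to "score" by index 0
```

After `rename(symbols, "score", "points")`, the same token list renders as "PRINT points" everywhere it appears, with no pass over the program.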
Comments
As for tokenizing as the user enters the source, that has been done before.
Just take a look at Sinclair BASIC on the Sinclair ZX80, the ZX81 and the ZX Spectrum.
(Timex Sinclair 1000 and upwards in the USA)
On those computers, all keywords were printed on the keyboard, and pressing a key would yield either a character or a keyword, depending on what had come before it in the line, so it tokenized the source as you typed it.
(It also did a fair bit of error-checking at the same time)
Liked the idea of using a symbol-table for variable names...
What kind of variables are you thinking of allowing?
In QBASIC, variables are "strongly" typed as integers (short or long), strings, single or double precision floats, or type structures. You do have an "any" keyword but it's mostly used for glue.
I used to be a static/strong typing advocate, but some exposure to Smalltalk and derivative languages has convinced me of the power of the dark side (dynamic/weak typing), so I'll actually be providing an "Any" (variant) type no matter what type the program asks for. I.e., if it asks for an integer, it will get a variant; when it later assigns 0 to it, it becomes an integer, and the program never finds out that it could also have assigned something else.
I will have to check the type at runtime in my emulator, though, because if, in QBASIC, you type "A + B", it requires quite different behavior for strings than for, say, integers. It wouldn't do to add the two string pointers to each other! All you have to do is include a byte with a magic number saying what type is stored in the data pointed to.
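Here is roughly what the runtime check could look like, sketched in Python. The tag values and the `make`/`add` helpers are assumptions for illustration only; the real thing would store the tag byte next to the data in Propeller RAM.

```python
TYPE_INT, TYPE_STR = 1, 2  # hypothetical "magic number" tag bytes


def make(value):
    # Wrap a raw value with a tag saying what type is stored.
    if isinstance(value, str):
        return (TYPE_STR, value)
    return (TYPE_INT, int(value))


def add(a, b):
    # Implements "A + B": check the tags first, then pick the behavior.
    tag_a, val_a = a
    tag_b, val_b = b
    if tag_a != tag_b:
        raise TypeError("Type mismatch")
    if tag_a == TYPE_STR:
        return (TYPE_STR, val_a + val_b)  # string concatenation
    return (TYPE_INT, val_a + val_b)      # integer addition
```

Mixing a string with an integer raises an error here, which mirrors the "Type mismatch" error QBASIC reports for that case.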
I'm beginning to have doubts about how well I'll be able to get it to fit though! I rewrote half of Nibbles last night in Spin, and even a port of Nibbles ran into memory issues. Getting a QBASIC interpreter and an IDE in here will be like putting a ship in a bottle. I really hope Parallax quadruples the RAM for the next version of the Propeller. That's my official feature request: more RAM! more RAM! Maybe if we all shout it long enough the (silicon) crystal fairy will drop more of it on the next Propeller chip.
I could just run the emulator on the 68K side of my prototype computer, it has 256K of RAM, but then I wouldn't be able to share the emulator with everyone else! (Because you'd need the hardware.) On the other hand, it would then be in C, and I could release a PC version of it too...
tokens being interpreted, and store the entire BASIC program itself offchip using RAM or flash.
Bring in chunks of say 64 bytes, use a direct-mapped organization, and I'll bet the slower speed
of the flash won't even matter that much.
I may prototype this approach using the SD routines and Basic interpreter I've already written.
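For anyone who wants to play with the idea, a direct-mapped cache over an off-chip program store might look something like this Python sketch. The 64-byte chunk size comes from the suggestion above; the number of cache lines, the class name and the use of a `bytes` object as the backing store are assumptions standing in for real RAM, flash or SD reads.

```python
LINE_SIZE = 64  # chunk size suggested above
NUM_LINES = 8   # assumption: 8 lines = 512 bytes of on-chip cache


class DirectMappedCache:
    def __init__(self, backing):
        self.backing = backing            # stands in for external RAM/flash/SD
        self.tags = [None] * NUM_LINES    # which block each slot currently holds
        self.data = [b""] * NUM_LINES
        self.misses = 0

    def read_byte(self, addr):
        block = addr // LINE_SIZE
        slot = block % NUM_LINES          # direct-mapped: each block has one slot
        if self.tags[slot] != block:
            # Miss: fetch the whole 64-byte chunk from the slow store.
            start = block * LINE_SIZE
            self.data[slot] = self.backing[start:start + LINE_SIZE]
            self.tags[slot] = block
            self.misses += 1
        return self.data[slot][addr % LINE_SIZE]
```

Since an interpreter mostly reads tokens sequentially, most reads should hit the chunk already loaded; tight loops that span two blocks mapping to the same slot are the bad case for this organization.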