junkbasic

David Betz · 2020-04-18 11:54

I know "junkbasic" isn't a very encouraging name, but I've written so many BASIC interpreters that I ended up abandoning for various reasons that I thought putting the word "junk" in the name might make this the first one that doesn't actually get junked.

Anyway, my intent with this project is to first create an interactive BASIC for the P2 that compiles to byte codes that are then interpreted by a fast XBYTE VM. As a second step I'd like to make the compiler generate PASM code to improve performance.

So I'm sort of in both the language design and language implementation phases. I can start implementation early because I can "borrow" code from all of the other BASIC interpreters I've written in the past, which in turn borrowed code from XLISP, Bob, AdvSys, and countless other languages I've created over the years.

However, I still have some language design problems to solve before I can actually run any programs under junkbasic. The one I'm working on now has to do with function/sub calls vs. array references. The ebasic3 interpreter that I wrote used parens to indicate function or subroutine calls and square brackets for array references. This is like C but not really the way BASIC usually works; BASIC typically uses parens for both operations. So my problem is that when I compile something like "foo(1, 2)", I might not know whether "foo" is a function, a subroutine, or an array. I could solve this by not allowing any forward references to arrays and then just assuming that the expression is a function call if the symbol "foo" is as yet undefined, but I'm not sure if that is the way other BASICs work.

So, my question is: Do implementations of BASIC allow forward references to arrays? I know I have to allow forward references to functions because someone might want to define a pair of mutually recursive functions and the only other way to handle that would be with a second compiler pass. I'd like to avoid that because it requires that I have the entire program available and an interactive BASIC might just get one piece of the program at a time as typed by the user.

Any language design people out there who might have suggestions about this?
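
One common way to allow forward references to functions in a single pass is backpatching: emit a placeholder for the call target and fill it in when the definition arrives. The following is a minimal Python sketch of that idea, not junkbasic's actual code; the `Compiler` class and the `CALL` and `ENTER` instruction names are hypothetical.

```python
class Compiler:
    """Single-pass emitter that backpatches calls to not-yet-defined functions."""

    def __init__(self):
        self.code = []      # flat instruction list, e.g. ("CALL", address)
        self.defined = {}   # function name -> code address
        self.fixups = {}    # function name -> indexes of CALLs awaiting an address

    def emit_call(self, name):
        if name in self.defined:
            self.code.append(("CALL", self.defined[name]))
        else:
            # Forward reference: emit a placeholder and remember where it is.
            self.fixups.setdefault(name, []).append(len(self.code))
            self.code.append(("CALL", None))

    def define_function(self, name):
        addr = len(self.code)
        self.defined[name] = addr
        # Patch every earlier call that was waiting for this function.
        for index in self.fixups.pop(name, []):
            self.code[index] = ("CALL", addr)
        self.code.append(("ENTER", name))  # hypothetical function-entry marker

    def unresolved(self):
        """Names that were called but never defined."""
        return sorted(self.fixups)


c = Compiler()
c.define_function("foo")  # foo starts at address 0
c.emit_call("bar")        # bar is unknown so far: placeholder at index 1
c.define_function("bar")  # bar starts at address 2; the placeholder is patched
c.emit_call("foo")        # backward reference resolves immediately
c.emit_call("baz")        # never defined in this example
print(c.code)
print(c.unresolved())
```

Because patching happens as each definition is typed in, this works for an interactive session that only sees one piece of the program at a time; anything left in `unresolved()` can be reported when the user tries to run the program.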

ersmith · 2020-04-18 11:58

I think it's fair to force people to use DIM A(n) before using A as an array, and otherwise assume A(x) is a function reference. Personally I kind of like the alternative of using [] for array subscripts so they're easily differentiated from function calls, but I guess that's a question of how compatible with other BASICs you want to be.
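
The DIM-before-use rule can be sketched as a symbol-table lookup. This is a minimal Python illustration under that assumption, not code from junkbasic; `declare_array`, `classify_reference`, and the string tags are hypothetical names.

```python
def declare_array(symbols, name):
    """Record a DIM for `name`; reject names already assumed to be functions."""
    if symbols.get(name) == "function":
        raise ValueError(name + " was already used as a function")
    symbols[name] = "array"


def classify_reference(symbols, name):
    """Decide how to compile `name(args)` from what has been seen so far."""
    kind = symbols.get(name)
    if kind == "array":
        return "array-index"
    if kind is None:
        # Arrays must be DIMed first, so an unknown name can only be a
        # forward reference to a function or sub.
        symbols[name] = "function"
        return "forward-call"
    return "call"


symbols = {}
declare_array(symbols, "a")
first = classify_reference(symbols, "a")     # DIMed earlier
second = classify_reference(symbols, "foo")  # not seen before
third = classify_reference(symbols, "foo")   # already assumed to be a function
print(first, second, third)
```

One consequence of this choice is that a name used as `foo(1, 2)` before any DIM can never later become an array, which is why `declare_array` raises in that case.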

David Betz · 2020-04-18 12:01

ersmith wrote: »

I think it's fair to force people to use DIM A(n) before using A as an array, and otherwise assume A(x) is a function reference. Personally I kind of like the alternative of using [] for array subscripts so they're easily differentiated from function calls, but I guess that's a question of how compatible with other BASICs you want to be.

I like the brackets too but maybe that's because I've been programming in C too long!

Actually, I think brackets are the standard mathematical notation for array references, no? I wonder why BASIC used parens? Were square brackets a late addition to the character set?

David Betz · 2020-04-18 12:18

ersmith wrote: »

I think it's fair to force people to use DIM A(n) before using A as an array, and otherwise assume A(x) is a function reference. Personally I kind of like the alternative of using [] for array subscripts so they're easily differentiated from function calls, but I guess that's a question of how compatible with other BASICs you want to be.

Yes, requiring arrays to be declared in a DIM before being referenced is probably okay. However, I did have a language once that was fairly unremarkable in most respects but had as its major claim to fame that it was easy to create complicated interlinked data structures. To allow that, I assumed any identifier that was not yet defined when first referenced was a forward reference to an object. This language was for writing text adventure games, where one of the main features is an interconnected set of locations containing objects and actors. Here is an example of a few location objects including forward references. Note, for example, that in the "hallway" object there are lines like "east storage-room" and "north kitchen". These are forward references to "storage-room" and "kitchen".

(location hallway
  (property
    description "You are in a long narrow hallway. There is a door to the
                 east into a small dark room.  There are also exits on both
                 the north and south ends of the hall."
    short-description "You are in the hallway."
    east storage-room
    north kitchen
    south livingroom))

(location kitchen
  (property
    description "This is a rather dusty kitchen.  There is a hallway to the
                 south and a pantry to the west."
    short-description "You are in the kitchen."
    south hallway
    west pantry))

(location pantry
  (property
    description "This is the kitchen pantry.  The kitchen is through a 
                 doorway to the east."
    short-description "You are in the pantry."
    east kitchen)
  (method (enter obj)
    (send-super enter obj)))

(location livingroom
  (property
    description "This appears to be the livingroom.  There is a hallway to
                 the north and a closet to the west."
    short-description "You are in the livingroom."
    north hallway
    west closet
    south front-door-1))

(location outside
  (property
    description "You are outside a small house.  The front door is to the
                 north."
    short-description "You are outside."
    north front-door-2))
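
The "undefined identifier means forward reference to an object" rule can be sketched by creating an empty placeholder on first reference and filling it in when the definition shows up. This is a minimal Python illustration of that idea, not the original adventure language's implementation; the `World` class and its method names are hypothetical.

```python
class World:
    def __init__(self):
        self.objects = {}     # name -> property dict (possibly still a placeholder)
        self.defined = set()  # names that have had a real definition

    def ref(self, name):
        """Return the object for `name`, creating an empty placeholder if needed."""
        return self.objects.setdefault(name, {"name": name})

    def define(self, name, **props):
        obj = self.ref(name)  # reuse a placeholder made by an earlier reference
        obj.update(props)
        self.defined.add(name)
        return obj

    def undefined(self):
        """Names that were referenced but never defined."""
        return sorted(set(self.objects) - self.defined)


w = World()
w.define("hallway", north=w.ref("kitchen"), south=w.ref("livingroom"))
w.define("kitchen", south=w.ref("hallway"), west=w.ref("pantry"))

# The hallway's forward reference and the later kitchen definition are the
# same object, so the two rooms end up linked in both directions.
print(w.objects["hallway"]["north"]["south"] is w.objects["hallway"])
print(w.undefined())
```

Keeping a `defined` set separate from `objects` lets the compiler report names that were only ever referenced once the whole source has been read.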

Tubular · 2020-04-18 21:21

It sounds like one of those 'perfect is the enemy of good enough' dilemmas. I think it's fine to make choices that make your life easier and let you get deeper into the design of other aspects of the language. You can always come back to it and refine.

More broadly, it's great to see more development happening on P2.

David Betz · 2020-04-18 22:15

Tubular wrote: »

It sounds like one of those 'perfect is the enemy of good enough' dilemmas. I think it's fine to make choices that make your life easier and let you get deeper into the design of other aspects of the language. You can always come back to it and refine.

More broadly, it's great to see more development happening on P2.

Yeah, I'm anxious to get to the point of running my first program, so I may ignore some of the design issues, at least initially. Right now I'm trying to clean up the memory management. Also, I took a side track and added LOAD, SAVE, and CAT so you can store and retrieve programs through the filesystem. Eric recently added both host and SD filesystem support. So far I've only tried the host support but it seems to work well.

Circuitsoft · 2020-04-19 01:38

Parens vs Brackets: https://www.quora.com/Why-did-BASIC-designers-choose-parentheses-to-index-array-elements

localroger · 2020-04-19 02:26

Every version of BASIC I have ever used has required DIM to assign an array before it can be used, even if other variables auto-DIM. And this is sensible, since otherwise how would the interpreter know how large to make the array? Even VB6, which allows dynamically dimensioned arrays, requires you to dim the array as an empty array so the compiler will know what to do with the reference later.

AJL · 2020-04-19 02:36

So, if the designers of BASIC at Dartmouth were using a Model 33 or 35 teletype to talk to the computer they only had parentheses available. The Model 33 teletype used a Baudot code to communicate with the remote device, not ASCII, so didn’t have square brackets as an option.

David Betz · 2020-05-02 20:44

I just ran my first junkbasic program on the P2. It doesn't do much of anything but it does show that the basic compiler and byte code interpreter are working. The updated code is checked into GitHub. Unfortunately, the executable is 141k on the P2. That is compiled with no optimization. If I use -O2 the program fails to run.

loadp2 -b 230400 -9 . junkbasic.p2 -t
( Entering terminal mode.  Press Ctrl-] to exit. )
 !!! corrupted heap??? !!!  !!! corrupted heap??? !!!  !!! corrupted heap??? !!! %

Here is the test program:

100 a=1
110 function foo(x)
120  return bar(x) + bar(a)
130 end function
140 function bar(y)
150  return y+1
160 end function
170 b=2
180 foo(12)

The line numbers are only for the benefit of the editor. The BASIC compiler doesn't use them or even see them.

Phil Pilgrim (PhiPi) · 2020-05-02 21:04

AJL wrote:

The Model 33 teletype used a Baudot code to communicate with the remote device, not ASCII, so didn’t have square brackets as an option.

The Model 33 used ASCII; the less-used Model 32, Baudot. However, you're right that the Model 33 did not have brackets (or braces) on its keyboard or type cylinder. Nor did it have lower-case letters.

-Phil

ersmith · 2020-05-02 21:11

That's looking pretty cool, David!

The "corrupted heap" message is probably due to memory not being initialized to 0 (the -O2 compiler leaves off static data). I think if you give the -ZERO option to loadp2 it should fix that.

David Betz · 2020-05-02 21:13

ersmith wrote: »

That's looking pretty cool, David!

The "corrupted heap" message is probably due to memory not being initialized to 0 (the -O2 compiler leaves off static data). I think if you give the -ZERO option to loadp2 it should fix that.

My code does not depend on the memory being initialized to zero. Is there something in the C runtime code that does?

ersmith · 2020-05-02 21:15

David Betz wrote: »

My code does not depend on the memory being initialized to zero. Is there something in the C runtime code that does?

Yes, there's some code in the garbage collector that does. It's a bug, I haven't figured out how to fix it yet though.

David Betz · 2020-05-02 21:22

ersmith wrote: »

Yes, there's some code in the garbage collector that does. It's a bug, I haven't figured out how to fix it yet though.

Well -O2 didn't reduce the code size anyway. Is there a -Os?

ersmith · 2020-05-02 21:27

David Betz wrote: »

Well -O2 didn't reduce the code size anyway. Is there a -Os?

No, unfortunately not.

riscvp2 would probably produce a smaller binary (using compressed RISC-V instructions). But personally I'd probably wait and deal with the size issue after everything is working.

David Betz · 2020-05-02 22:17

ersmith wrote: »

No, unfortunately not.

riscvp2 would probably produce a smaller binary (using compressed RISC-V instructions). But personally I'd probably wait and deal with the size issue after everything is working.

I know it's stupid but I want to use a native code compiler for the P2. It would be interesting to see what the executable size is for the RISC-V compiler though.

David Betz · 2020-05-03 01:09

Well, maybe the 141K isn't as bad as I thought. I forgot that I have a 64K buffer compiled into my code that I use as a compiler heap and edit buffer. If you subtract that, the code size is only 76K, and that includes the host filesystem code. I need to find a way to allocate the heap at runtime rather than at compile time.

rogloh · 2020-05-03 02:13

Being concerned about space reminds me of a time when I wrote a tokenized BASIC interpreter for the AVR. It also included bit-banged VGA, PS/2 keyboard, filesystem, sound, a font, graphics, etc., and it all had to fit in a 16kB flash part. When I started I agonized over whether to use C or assembly code for the BASIC interpreter portion; in the end, for speed of coding, I risked C, but I was always worried it would not fit. After a lot of optimization to shave bytes here and there throughout the whole thing, it was all done and it just fit perfectly, so I was sort of proud of it. I showed it to a mate and explained what I did, who then turned up his nose and said "Real programmers write in assembly". LOL! Pride before a fall.

David Betz · 2020-05-03 11:34

rogloh wrote: »

Being concerned about space reminds me of a time when I wrote a tokenized BASIC interpreter for the AVR. It also included bit-banged VGA, PS/2 keyboard, filesystem, sound, a font, graphics, etc., and it all had to fit in a 16kB flash part. When I started I agonized over whether to use C or assembly code for the BASIC interpreter portion; in the end, for speed of coding, I risked C, but I was always worried it would not fit. After a lot of optimization to shave bytes here and there throughout the whole thing, it was all done and it just fit perfectly, so I was sort of proud of it. I showed it to a mate and explained what I did, who then turned up his nose and said "Real programmers write in assembly". LOL! Pride before a fall.

My junkbasic interpreter will never be really small because it parses the BASIC source into a parse tree and then runs a code generator to produce byte codes (and later maybe native code). It also has a disassembler for the byte codes and parse tree dump code as well as a simple line editor. This is why I had so much trouble getting it to run on the P1. There just isn't enough memory without going to XMM.
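
The parse tree, code generator, and byte code interpreter stages described above can be sketched in a few lines. This Python version is only an illustration of the general technique, not junkbasic's design: the tuple-based tree and the `LIT`, `LOAD`, `ADD`, and `MUL` instruction names are hypothetical, and the tree is built by hand in place of a parser. The expression is the body of `baz` from the test program, `q * 10 + r`.

```python
def gen(node, code):
    """Post-order walk of the parse tree, appending stack-machine instructions."""
    kind = node[0]
    if kind == "num":
        code.append(("LIT", node[1]))
    elif kind == "var":
        code.append(("LOAD", node[1]))
    elif kind in ("+", "*"):
        gen(node[1], code)  # left operand first
        gen(node[2], code)  # then right operand
        code.append(("ADD",) if kind == "+" else ("MUL",))
    else:
        raise ValueError("unknown node: %r" % (kind,))
    return code


def run(code, variables):
    """Interpret the instruction list on a simple operand stack."""
    stack = []
    for op, *args in code:
        if op == "LIT":
            stack.append(args[0])
        elif op == "LOAD":
            stack.append(variables[args[0]])
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return stack.pop()


# Parse tree for: q * 10 + r
tree = ("+", ("*", ("var", "q"), ("num", 10)), ("var", "r"))
code = gen(tree, [])
result = run(code, {"q": 4, "r": 6})
print(code)
print(result)
```

Keeping the tree as a separate stage costs memory compared with emitting code straight from the parser, which is the size trade-off the post describes, but it leaves room to swap in a different back end later.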

David Betz · 2020-05-03 15:59

FYI, removing the compiled memory space from junkbasic reduces the executable size to around 90k, and that still includes the filesystem code as well as FlexC's heap, so maybe it's not all that bad. Also, the program works fine with the new memory allocation scheme.

David Betz · 2020-05-05 00:51

Things are starting to work. This program runs correctly. The funky functions at the start are the definitions of some internal functions that are used by the PRINT statement. I'll have to build those in before this is usable.

a = 1

function printStr(str, chn)
 asm
  lref 1
  trap 2
 end asm
end function

function printInt(n, chn)
 asm
  lref 1
  trap 3
 end asm
end function

function printNL(chn)
 asm
  trap 5
 end asm
end function

function foo(x)
 return bar(x) + bar(a)
end function

function bar(y)
 return y + 1
end function

function baz(q, r)
  return q * 10 + r
end function

b = 2

print "Hello, world!"
print foo(12)
print baz(4,6)

This produces the results:

Hello, world!
15
46
OK

David Betz · 2020-05-07 11:22

I got tired of adding the print helper functions to every BASIC program, so I added an include capability. This now works:

include "io.bas"

a = 1

function foo(x)
 return bar(x) + bar(a)
end function

function bar(y)
 return y + 1
end function

function baz(q, r)
  return q * 10 + r
end function

b = 2

print "Hello, world!"
print foo(12)
print baz(4, 6)

Where "io.bas" contains:

function printStr(str, chn)
 asm
  lref 1
  trap 2
 end asm
end function

function printInt(n, chn)
 asm
  lref 1
  trap 3
 end asm
end function

function printTab(chn)
 asm
  trap 4
 end asm
end function

function printNL(chn)
 asm
  trap 5
 end asm
end function

I also changed the line editor to not store the line numbers in the "*.bas" files on "save" and "load". The line numbers can't be used as targets in a GOTO statement anyway so it doesn't matter if the numbering is retained from one session to the next. This also makes the "*.bas" files easier to edit in an external editor that doesn't need or want the line numbers.
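
Dropping the numbers on save and regenerating them on load can be sketched as below. This is a Python illustration only, not the junkbasic line editor; the function names are hypothetical, and the starting number of 100 and step of 10 are assumptions taken from the numbering in the earlier listing.

```python
def save_program(lines):
    """`lines` maps line number -> text; return file contents without numbers."""
    return "".join(lines[n] + "\n" for n in sorted(lines))


def load_program(text, start=100, step=10):
    """Assign fresh line numbers on load; `start` and `step` are assumed defaults."""
    return {start + i * step: line for i, line in enumerate(text.splitlines())}


program = {100: "a = 1", 105: "b = 2", 110: "print a + b"}
text = save_program(program)
reloaded = load_program(text)
print(text)
print(reloaded)
```

The line inserted at 105 comes back as 110 after a save and load, which is harmless here because the numbers only order lines in the editor and are never GOTO targets.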

ersmith · 2020-05-07 11:53

It's looking pretty good David! The latest junkbasic source did turn up a bug in fastspin (in evaluating pointers to pointers to functions). I've checked in a fix for it.

David Betz · 2020-05-07 12:04

ersmith wrote: »

It's looking pretty good David! The latest junkbasic source did turn up a bug in fastspin (in evaluating pointers to pointers to functions). I've checked in a fix for it.

That's interesting. I hadn't noticed it yet. I usually test both on the Mac and on the P2, but lately I've just been working on editor bugs and hadn't had a chance to do a P2 build. Thanks for fixing the bug!

David Betz · 2020-05-08 00:52

Got array initialization and indexing working:

include "io.bas"
dim array[4] = { 1, 2, 3, 4 }
dim i
for i=0 to 3
  print array[i]
next i

Output:
