Two CS-101 problems and their FORTH solutions. Comments on style would be helpful.

Martin_H · 2013-06-26 07:02

My flute/whistle instructor said the way to learn an instrument is to learn 100 songs. The approach worked well, so I learn new computer languages the same way by writing solutions to classic programming problems. When I posted my last FORTH program I learned a fair amount from people's comments. Specifically FORTH style like using . in the name of output functions, but other things as well. So to further mooch off the accumulated FORTH experience of the forum, I'm posting two programs for comments on style and approach.

The first program prints all the factors of a number. I found this one pretty simple to do.

\ predicate used to determine if a number is a divisor
: factor? ( n1 n2 - f )
   mod 0= ;

\ prints n2 if it is a factor of n1
: .factor ( n1 n2 - )
    tuck    \ n2 n1 n2
    factor? \ n2 f
    if
       .
    else
       drop
    then ;

\ prints all the numbers less than the number that are factors.
: factors ( n - )
  dup 2     \ from 2 to n
  do
    dup i   \ n n i
    .factor \ n
  loop
  drop ;

The second calculates the integer square root with rounding. In case Heater is curious, I found the stack manipulations on this one hideously difficult to write. In the end I had to solve the problem by writing the program inside out. This is exactly the opposite way I would write the C function.

\ compute the number with the highest non sign bit set.
1 cell 8 * 2 - lshift
create max_bit 1 cells allot
max_bit !

\ shift "bit" to the highest power of four <= n.
: starting_bit ( n - n start_bit )
   max_bit @
   begin 2dup <= while \ one > n
      2 rshift
   repeat ;

\ predicate who's name says it all 
: n>=r+b? ( n r b - n r b f )
      \ copy n to return stack
      rot dup >r -rot

      \ compute result + bit
      2dup +

      \ n >= result + bit
      r> swap >= ;

\ round up n2 by one if n1 is greater
: round_up ( n1 n2 - n1 n2 )
   2dup > if
       1+
   then ;

\ Fast integer square root algorithm, with rounding the result to
\ the next greater integer if the fractional part is 0.5 or greater.
\ For example 2->1, 3->2, 4->2, 6->2, 7->3, 9->3
: isqrtr \ ( n - n^1/2 )
   \ Throughout the function we'll juggle three numbers on the stack:
   \ n (input), bit (computed), and result (output, starts at 0).
   starting_bit 0

   begin over while       \ bit is not zero

      n>=r+b? if
         2dup + >r        \ push result + bit to return stack
         rot r> - -rot    \ n = n - (result + one)
         over 2 * +       \ result += 2 * bit;
      then

      1 rshift swap       \ divide result by 2
      2 rshift swap       \ divide bit by 4.
   repeat

   \ bit has outlived its usefulness
   swap drop

   \ Do arithmetic rounding to nearest integer
   round_up

   \ clean off n to return only result.
   swap drop ;

In terms of time the first program took me no longer than writing it in C, but the second program took much longer than writing in C.

prof_braino · 2013-06-26 10:03

Martin_H wrote: »

the way to learn an instrument is to learn 100 songs.

I've heard this. How many is this so far? I'd be interested in how many you write before you feel you learned forth.

In terms of time the first program took me no longer than writing it in C, but the second program took much longer than writing in C.

: n>=r+b? ( n r b - n r b f )

I would suggest your problem is that you made a word too complicated.

If a word takes more than two or three item off the stack, and leave more than one or two, it might be better to break it down further. Of course you can have more or less, this is just the rule of thumb. But its much easier to debug and test a simple word that does just one thing. You can always reduce a serial of called words to a single word after it works.

Also, if the stack manipulations are getting out of hand, just store some values in memory as variables. The rule of thumb here is if its too hard, do it easier. Using variable is much slower, but easier to keep track. Again, you can always factor out the variable after you get it working.

Really cool work!

Dave Hein · 2013-06-26 10:21

Or you could use pick and roll to access deeper items on the stack. This would reduce n>=r+b? to ": n>=r+b? ( n r b - n r b f ) 2dup + 3 pick < ;". I know that Sal and prof_braino hate pick and roll, but that doesn't mean the rest of us aren't allowed to use it.

prof_braino · 2013-06-26 11:04

Dave Hein wrote: »

I know that Sal and prof_braino hate pick and roll, but that doesn't mean the rest of us aren't allowed to use it.

There are many case where pick and roll are well suited. If I need pick and roll, I'm usually just being sloppy, so I just avoid them.

In this particular case, it looks really nice.

Martin_H · 2013-06-26 12:24

prof_braino wrote: »

How many is this so far? I'd be interested in how many you write before you feel you learned forth.

Also, if the stack manipulations are getting out of hand, just store some values in memory as variables. The rule of thumb here is if its too hard, do it easier. Using variable is much slower, but easier to keep track. Again, you can always factor out the variable after you get it working.

Really cool work!

Thanks for the feedback. I've written eight programs so far. In general I know a programming language when I can write code without looking at a reference, and I've developed a sense of the languages aesthetics. At that point I can review my own code and be able to recognize good code versus bad code. I know about variables because I used one for max_bit, but that was essentially a precomputed constant. I've had reentrancy drummed into my head so much by multi-threaded programming that I'm allergic to global variables. I only use them as the jumping off point to other context, not for state within a function.

Dave Hein wrote: »

Or you could use pick and roll to access deeper items on the stack. This would reduce n>=r+b? to ": n>=r+b? ( n r b - n r b f ) 2dup + 3 pick < ;". I know that Sal and prof_braino hate pick and roll, but that doesn't mean the rest of us aren't allowed to use it.

Thanks for the tweak to the function. Swaping the sense of the comparison operator eliminated the swap, so I'll have to remember that. The words pick and roll are a bit out of my league at the moment, but I'll keep them in mind for the future. I'm just glad I was finally able to find a use for tuck!

Dave Hein · 2013-06-26 12:43

The parameter for PICK is used to select the stack element that is duplicated. "0 PICK" does the same thing as DUP, and "1 PICK" is the same as OVER.

ROLL uses the parameter to select the element it will move to the top of the stack. "1 ROLL" is the same as SWAP, and "2 ROLL" is the same as ROT.

PICK and ROLL can be implemented on any Forth system that makes the stack pointer accessible. They could probably be implemented in PropForth. Tachyon implements the stacks in cog registers, which would make it hard to implement PICK and ROLL.

Martin_H · 2013-06-28 10:56

So here's the sieve of Eratosthenes for the first 1024 numbers using a bit vector for efficiency and the bit set and clear operations. One thing I don't like is that when it produces its output there's always a space between the number and the comma (e.g. 2 , 3 , 5 , ...). I would prefer something like 2, 3, 5, 7, ... but I can't figure out how to do that.

\ Begin borrowed code from http://www.forth.org/svfig/Len/bits.htm
\ Note: that code had a bug I fixed as indicated below.

\ Bit_array is a defining word that will create a bit array.
\ l is the number of bytes
create masks 128 c, 64 c, 32 c, 16 c, 8 c, 4 c, 2 c, 1 c,
: bit_array \ ( len -- ) ( i -- m a)
   create allot does>
      swap   \ a i
      8      \ a i 8
      /mod   \ a remainder whole
      swap   \ a whole remainder !MCH bug fix!
      masks  \ a whole remainder a
      +      \ a whole mask_a
      c@     \ a whole m
      -rot   \ m a whole
      + ;    \ m a

\ We also need words to store and fetch bits. The words .@ and .!
\ Will fetch and store, respectively, a Boolean flag on the stack to
\ a bit array. (Do not confuse .! with cset or .@ with creset.) 

: .! ( f m a -- ) rot if cset else creset then ;
: .@ ( m a -- f ) c@ and 0<> ;

\ Examples
\ 3 bit-array thflag  \ Create a bit array for 24 bit-flags
\ 11 thflag ctoggle    \ Toggle the 11th bit
\ 10 thFLAG .@ ( -- f) \ Extract the 10th bit and leave as well-formed Boolean flag

\ End of borrowed code.

\ Create a bit vector to hold the sieve.
\ 128 bytes allows for all primes less than 1024.
128 bit_array sieve

\ locates the index of the first non-zero bit in the sieve
\ with an index greater than the input.
: find_one ( n -- n )
   1+ dup 1024 swap do
      i sieve .@ \ get the bit corresponding to the integer.
      if
         leave
      else
         1+
      then
   loop ;

\ sets bits starting at n.
: set_bits ( n -- )
   1024 swap
   do
      i sieve cset
   loop ;

\ clears bits starting at n in steps of n. So n=2 starts at 2
\ and clears every other bit. While n=3 starts at three and
\ clears every third bit. However, it then resets the first
\ bit as that number is prime.
: clear_bits ( n -- )
   dup 1024 swap
   do
      i sieve creset
      dup
   +loop
   sieve cset ;

\ We'll waste two bits for integers 0 and 1 which can't be prime
: do_sieve
  2 set_bits   \ Assume all numbers 2 or greater are prime.
  2            \ Initially, let p equal 2, the first prime number.

  begin dup 1024 < while
     dup          \ n n
     clear_bits   \ n
     find_one     \ index of first one > n
  repeat ;

\ iterates through the bit vector printing the index of a prime number
: .sieve
   ." List of primes "
   1024 0 do
      i sieve .@ \ get the bit corresponding to the integer.
      if
         i . ." , "
      then
   loop ;

Dave Hein · 2013-06-28 14:42

After I defined cset and creset it worked fine under pfth.

Martin_H · 2013-06-28 16:32

Dave Hein wrote: »

After I defined cset and creset it worked fine under pfth.

That's awesome. I love code portability. I'm also really liking the create does> construct. It's key to raising the level of semantic abstraction in FORTH from low level to higher level. All that bit vector grunt work was nicely encapsulated.

Martin_H · 2013-07-08 13:46

String parsing is another important part of a language to understand. So I started looking into a FORTH program to parse a Roman numeral into an Arabic one. I found some pretty freaky FORTH to do this, so I began reverse engineering it. Below is the code, but I got stuck on the /string word which I haven't found an explanation of. The command "see /string" dumped some assembler, so it's a FORTH built in.

\ Create an associative array by left shifting Roman values with the assoicated character
create (arabic)
   1000 128 * char M + ,  \ 128077
    500 128 * char D + ,  \  64068
    100 128 * char C + ,  \  12867
     50 128 * char L + ,  \   6476
     10 128 * char X + ,  \   1368
      5 128 * char V + ,  \    726
      1 128 * char I + ,  \    201
does> ( char addr -- value value )
   7 cells      \ char addr 7*cell_size
   bounds       \ char addr+7*cell_size addr
   do
      i @       \ char array[i]
      over over \ char array[i] char array[i]
      127 and = \ char array[i] f
      if        \ chars match, then return value
         nip 7 rshift leave
      else
         drop   \ char
      then
      1 cells  \ increment index one cell.
   +loop
   dup
;
 
\ parses a counted string in roman numeral form.
: >arabic ( addr count - )
   0 dup        \ addr count 0 0
   >r >r        \ addr count
   begin
      over over \ addr count addr count
      while     \ count is non-zero
      c@        \ load the character at addr
      dup       \ addr count char char
      (arabic)  \ addr count char value value
      rot <>    \ addr count value f
      while
         r> over r> over over >
         if
            2* negate + else drop
         then
         + swap >r >r 1 /string
   repeat
   then
   drop
   2drop
   r> r>
   drop
;

s" MCMLXXXIV" >arabic .

Dave Hein · 2013-07-08 15:00

According to the ANS standard /string adds an amount to the string address pointer, and reduces the count by the same amount. I believe it can be implemented as

: /string ( addr count num -- addr count) dup >r - swap r> + swap ;

Martin_H · 2013-07-08 16:32

Thanks Dave, that helps. The other weird bit is the then right after repeat which is not matched with an if, but I suspect that is caused by having two while clauses. The main word really looks like it could use some factoring to get rid of it. Basically this looks like ugly FORTH to me.

Martin_H · 2013-07-10 13:12

Here's the fully reverse engineered and refactored Roman numeral parser. My improvement is getting rid of the two whiles and unmatched then. I also broke it down into smaller words to make the stack management more straight forward at each step. It's probably commented too heavily, but I needed to do that while reverse engineering.

\ Helpers used to create cell values, and access portion of the cell.
: makePair 7 lshift + ;  ( char value -- array[i] )
: >char  127 and ;       ( array[i] -- char )
: >value 7 rshift ;      ( array[i] -- value )

\ Create an associative array by left shifting Roman values with the assoicated character
create (arabic)
   char M 1000 makePair ,  \ 128077
   char D  500 makePair ,  \  64068
   char C  100 makePair ,  \  12867
   char L   50 makePair ,  \   6476
   char X   10 makePair ,  \   1368
   char V    5 makePair ,  \    726
   char I    1 makePair ,  \    201
does> ( char addr -- value value )
   7 cells      \ char addr 7*cell_size
   bounds       \ char addr+7*cell_size addr
   do
      i @       \ char array[i]
      over over \ char array[i] char array[i]
      >char =   \ char array[i] f
      if        \ chars match, then return value
         nip >value leave
      else
         drop   \ char
      then
      1 cells  \ increment index one cell.
   +loop
   dup
;

\ adds the value of char to sum and keeps a copy of its value in digit
: do_digit (  sum digit char -- new_sum new_digit )
   (arabic) >r   \ sum digit new_digit
   over over     \ sum digit new_digit digit new_digit
   < if
      \ old digit is less than new, so subtract it from sum
      -rot 2* negate +
      swap       \ adjusted_sum new_digit
   else
      nip        \ sum new_digit
   then
   + r>          \ new_sum new_digit
;

\ parses a counted string in roman numeral form.
: >arabic ( addr count - )
   0 -rot 0 -rot \ sum digit addr count
   begin
      over over  \ sum digit addr count addr count
      while      \ count is non-zero
      c@         \ load the character at addr
      -rot >r >r \ sum digit char
      do_digit   \ sum digit
      r> r>      \ sum digit addr count
      1 /string  \ advance to next character
   repeat
   \ ( sum digit addr count addr )
   2drop         \ sum digit addr
   2drop         \ sum
;

s" MCMLXXXIV" >arabic .

prof_braino · 2013-07-10 21:17

I think the comments are just fine. I can read it and pretty much see what its doing. Rule of thumb is if you needed it one time, you will need it next time.

I'll run this by the college kids and see what they say.

Nice exercise!

Dave Hein · 2013-07-11 06:36

Martin, I confused by the definition of (arabic). Have you actually tried running this? Also, you are using "does> ... ;" without defining a colon word. Shouldn't the definition of (arabic) be ": (arabic) create"?

Martin_H · 2013-07-11 06:59

Dave Hein wrote: »

Martin, I confused by the definition of (arabic). Have you actually tried running this? Also, you are using "does> ... ;" without defining a colon word. Shouldn't the definition of (arabic) be ": (arabic) create"?

Yes, I've run this under gforth, and that construct was in the original code I found here: http://rosettacode.org/wiki/Roman_numerals/Decode#Forth

What is going on is that (arabic) is not a word, it is the name of an array allocated by create. But you have the option of adding code that executes when the array is referenced via the does> construct. The does> construct requires the ';' to terminate its definition, but create doesn't require the preceding ':'. It's a bit confusing, but that's why this reverse engineering exercise was worth doing.

Dave Hein · 2013-07-11 07:27

OK. I've never used does> that way. I'm pretty sure pfth wouldn't handle it correctly. I'll have to check the ANS Forth standard to see how does> is defined in the interpreter mode. Of course, pfth would also have a problem with s" since it's not supported in the interpreter mode by the ANS standard.

It seems like the use of does> in this context is unusual since it is only applied to the single word (arabic). For me, the code would look cleaner if it defined the array and the "does>" portion of (arabic) as separate words, but I understand that the original code was written using does>.

Dave Hein · 2013-07-11 09:18

I checked the ANS Forth standard, and does> is undefined in the interpretation mode. GForth must have non-standard extensions that allow using does> and s" during interpretation. I modified (arabic) to be two words, and the code runs fine under pfth.

Martin_H · 2013-07-11 13:15

Dave Hein wrote: »

I checked the ANS Forth standard, and does> is undefined in the interpretation mode. GForth must have non-standard extensions that allow using does> and s" during interpretation. I modified (arabic) to be two words, and the code runs fine under pfth.

The original code was definitely on the obscure side. I feel I improved it, but you're right there is further room for improvement.

Martin_H · 2013-07-31 09:44

I accidentally went down the FORTH rabbit hole. It bugged me that FORTH has the words constant, variable, and value.

* Obviously I understand variables, they are a storage location referenced by an address that contains a value. The symbolic name assigned to that variable is stored in some sort of look up table which translates the name into the address and deposit it on the stack.

* On the surface I understand a constant. It is a symbolic name for an unchanging value. In compiled languages a constant is available only at compile time, often the value is stored in the code, and referenced using the immediate mode of the underlying instruction set. But FORTH is a threaded interpreter, so how does that impact this concept?

* A value seems like the most nebulous concept. It seems to behave identically to a constant. For example "10 constant foo" results in 10 on the stack when you type foo. Likewise "10 value foo" results in the same thing. So this puzzles me.

So I decided to find out why and what each does, and armed with the see word I followed the trail below:

see variable 
: Variable  
  Create 0 , ; ok
  ok

see constant
see constant 
: Constant  
  (Constant) , ; ok
  ok

see (Constant)
see (Constant) 
: (Constant)  
  header reveal docon: cfa, ; ok
  ok

see docon:
see docon: 
: docon:  
  2126060304 @ ; ok
  ok

see value
see value 
: Value  
  (Value) , ; ok
  ok

see (Value)
see (Value) 
: (Value)  
  header reveal dovalue: cfa, ; ok
  ok

see dovalue:
see dovalue: 
: dovalue:  
  2126060752 @ ; ok
  ok

I think I understand the variable definition, but the rest are alien to me. However, the definitions of constant and value seem similar to me.

Dave Hein · 2013-07-31 11:23

CFA is the code field address. So constant writes the contents of 2126060304 into the code field, and value writes the contents of 216060752 into the code field. It would be interesting to do "dovalue: ." and "docon: ." to see what those values are. I assume that TO will check the contents of CFA and not write to a constant, but will write to a value. Do a "see to" and see what it does.

Martin_H · 2013-07-31 13:03

Dave Hein wrote: »

It would be interesting to do "dovalue: ." and "docon: ." to see what those values are. I assume that TO will check the contents of CFA and not write to a constant, but will write to a value. Do a "see to" and see what it does.

dovalue: .
dovalue: . 4199741  ok
  ok

docon: .
docon: . 4199626  ok
  ok

see to
see to 
noname : 
  ' dup >definer 2126112392 >definer = 
  IF     8 + ! 
  ELSE   -32 throw 
  THEN ;
latestxt

noname : 
  COMP' drop dup >definer 2126112392 >definer over = 
  IF     drop 8 +  POSTPONE ALiteral 4268668 compile, 
  ELSE   2126129400 >definer over = 
         IF     drop 4269264 compile, 8 + @ lp-offset, 4268680 compile, 
         ELSE   2126129496 >definer over = 
                IF     drop 4269264 compile, 8 + @ lp-offset, 4268668 compile, 
                ELSE   2126129432 >definer over = 
                       IF     drop 4269264 compile, 8 + @ lp-offset, 4268684 
                              compile, 
                       ELSE   2126129464 >definer over = 
                              IF     drop 4269264 compile, 8 + @ lp-offset, 
                                     4268996 compile, 
                              ELSE   -32 throw drop 
                              THEN 
                       THEN 
                THEN 
         THEN 
  THEN ;
latestxt
interpret/compile: TO  ok
  ok

Dave Hein · 2013-07-31 13:29

Well it looks like TO is comparing the "definer" in the following word with the definer in the word at 2126112392, which must be a known VALUE word. If it matches it adds 8 to the execution token and writes the value. The body of the word must be 8 bytes after the code pointer, which is where the content of a variable or value is stored. If the definers don't match then it throws an error code of -32. The ANS standard says the meaning of error code -32 is "invalid name argument (e.g., TO xxx)".

Martin_H · 2013-07-31 17:00

Thanks for the explanation Dave. Were further down the rabbit hole, and as near as I can tell the only difference between constant and value is which magic number the definer is compared against. Does that match your understanding?

Dave Hein · 2013-07-31 17:39

Yes, that's the way I understand it.

Martin_H · 2013-07-31 19:58

Dave Hein wrote: »

Yes, that's the way I understand it.

Good, now I feel I have a right to be confused about the difference between constant and value. I've made a good faith effort, so I'm going to chalk this up as likely the result of some sort of legacy in the language. It might get clearer over time, or remain a perpetual mystery like some of the weirder bits of C++.

Dave Hein · 2013-07-31 20:11

Maybe I didn't make it clear, but the Forth interpreter you are using is only allowing TO to be used with VALUE words. You are correct that VALUE and CONSTANT words are essentially the same thing. Some Forth interpreters don't make any distinction between the two, such as pfth. Other interpreters will not allow you to change a constant word after it has been defined.

In pfth the definitions for CONSTANT, VALUE and TO are as follows:

: constant create here ! cellsize allot does> @ ;
: value create here ! cellsize allot does> @ ;
: to ' >body state @ if postpone literal [compile] ! else ! then ; immediate

pfth doesn't do any error checking on the use of TO, so it could be applied to a value, constant or even a variable.

Martin_H · 2013-08-01 06:26

OK, thanks again.