Two CS-101 problems and their FORTH solutions. Comments on style would be helpful.
My flute/whistle instructor said the way to learn an instrument is to learn 100 songs. The approach worked well, so I learn new computer languages the same way by writing solutions to classic programming problems. When I posted my last FORTH program I learned a fair amount from people's comments. Specifically FORTH style like using . in the name of output functions, but other things as well. So to further mooch off the accumulated FORTH experience of the forum, I'm posting two programs for comments on style and approach.
The first program prints all the factors of a number. I found this one pretty simple to do.
The second calculates the integer square root with rounding. In case Heater is curious, I found the stack manipulations on this one hideously difficult to write. In the end I had to solve the problem by writing the program inside out. This is exactly the opposite way I would write the C function.
In terms of time the first program took me no longer than writing it in C, but the second program took much longer than writing in C.
The first program prints all the factors of a number. I found this one pretty simple to do.
\ predicate used to determine if a number is a divisor
: factor? ( n1 n2 - f )
mod 0= ;
\ prints n2 if it is a factor of n1
: .factor ( n1 n2 - )
tuck \ n2 n1 n2
factor? \ n2 f
if
.
else
drop
then ;
\ prints all the numbers less than the number that are factors.
: factors ( n - )
dup 2 \ from 2 to n
do
dup i \ n n i
.factor \ n
loop
drop ;
The second calculates the integer square root with rounding. In case Heater is curious, I found the stack manipulations on this one hideously difficult to write. In the end I had to solve the problem by writing the program inside out. This is exactly the opposite way I would write the C function.
\ compute the number with the highest non sign bit set.
1 cell 8 * 2 - lshift
create max_bit 1 cells allot
max_bit !
\ shift "bit" to the highest power of four <= n.
: starting_bit ( n - n start_bit )
max_bit @
begin 2dup <= while \ one > n
2 rshift
repeat ;
\ predicate who's name says it all
: n>=r+b? ( n r b - n r b f )
\ copy n to return stack
rot dup >r -rot
\ compute result + bit
2dup +
\ n >= result + bit
r> swap >= ;
\ round up n2 by one if n1 is greater
: round_up ( n1 n2 - n1 n2 )
2dup > if
1+
then ;
\ Fast integer square root algorithm, with rounding the result to
\ the next greater integer if the fractional part is 0.5 or greater.
\ For example 2->1, 3->2, 4->2, 6->2, 7->3, 9->3
: isqrtr \ ( n - n^1/2 )
\ Throughout the function we'll juggle three numbers on the stack:
\ n (input), bit (computed), and result (output, starts at 0).
starting_bit 0
begin over while \ bit is not zero
n>=r+b? if
2dup + >r \ push result + bit to return stack
rot r> - -rot \ n = n - (result + one)
over 2 * + \ result += 2 * bit;
then
1 rshift swap \ divide result by 2
2 rshift swap \ divide bit by 4.
repeat
\ bit has outlived its usefulness
swap drop
\ Do arithmetic rounding to nearest integer
round_up
\ clean off n to return only result.
swap drop ;
In terms of time the first program took me no longer than writing it in C, but the second program took much longer than writing in C.
Comments
I've heard this. How many is this so far? I'd be interested in how many you write before you feel you learned forth.
I would suggest your problem is that you made a word too complicated.
If a word takes more than two or three item off the stack, and leave more than one or two, it might be better to break it down further. Of course you can have more or less, this is just the rule of thumb. But its much easier to debug and test a simple word that does just one thing. You can always reduce a serial of called words to a single word after it works.
Also, if the stack manipulations are getting out of hand, just store some values in memory as variables. The rule of thumb here is if its too hard, do it easier. Using variable is much slower, but easier to keep track. Again, you can always factor out the variable after you get it working.
Really cool work!
There are many case where pick and roll are well suited. If I need pick and roll, I'm usually just being sloppy, so I just avoid them.
In this particular case, it looks really nice.
Thanks for the feedback. I've written eight programs so far. In general I know a programming language when I can write code without looking at a reference, and I've developed a sense of the languages aesthetics. At that point I can review my own code and be able to recognize good code versus bad code. I know about variables because I used one for max_bit, but that was essentially a precomputed constant. I've had reentrancy drummed into my head so much by multi-threaded programming that I'm allergic to global variables. I only use them as the jumping off point to other context, not for state within a function.
Thanks for the tweak to the function. Swaping the sense of the comparison operator eliminated the swap, so I'll have to remember that. The words pick and roll are a bit out of my league at the moment, but I'll keep them in mind for the future. I'm just glad I was finally able to find a use for tuck!
ROLL uses the parameter to select the element it will move to the top of the stack. "1 ROLL" is the same as SWAP, and "2 ROLL" is the same as ROT.
PICK and ROLL can be implemented on any Forth system that makes the stack pointer accessible. They could probably be implemented in PropForth. Tachyon implements the stacks in cog registers, which would make it hard to implement PICK and ROLL.
\ Begin borrowed code from http://www.forth.org/svfig/Len/bits.htm \ Note: that code had a bug I fixed as indicated below. \ Bit_array is a defining word that will create a bit array. \ l is the number of bytes create masks 128 c, 64 c, 32 c, 16 c, 8 c, 4 c, 2 c, 1 c, : bit_array \ ( len -- ) ( i -- m a) create allot does> swap \ a i 8 \ a i 8 /mod \ a remainder whole swap \ a whole remainder !MCH bug fix! masks \ a whole remainder a + \ a whole mask_a c@ \ a whole m -rot \ m a whole + ; \ m a \ We also need words to store and fetch bits. The words .@ and .! \ Will fetch and store, respectively, a Boolean flag on the stack to \ a bit array. (Do not confuse .! with cset or .@ with creset.) : .! ( f m a -- ) rot if cset else creset then ; : .@ ( m a -- f ) c@ and 0<> ; \ Examples \ 3 bit-array thflag \ Create a bit array for 24 bit-flags \ 11 thflag ctoggle \ Toggle the 11th bit \ 10 thFLAG .@ ( -- f) \ Extract the 10th bit and leave as well-formed Boolean flag \ End of borrowed code. \ Create a bit vector to hold the sieve. \ 128 bytes allows for all primes less than 1024. 128 bit_array sieve \ locates the index of the first non-zero bit in the sieve \ with an index greater than the input. : find_one ( n -- n ) 1+ dup 1024 swap do i sieve .@ \ get the bit corresponding to the integer. if leave else 1+ then loop ; \ sets bits starting at n. : set_bits ( n -- ) 1024 swap do i sieve cset loop ; \ clears bits starting at n in steps of n. So n=2 starts at 2 \ and clears every other bit. While n=3 starts at three and \ clears every third bit. However, it then resets the first \ bit as that number is prime. : clear_bits ( n -- ) dup 1024 swap do i sieve creset dup +loop sieve cset ; \ We'll waste two bits for integers 0 and 1 which can't be prime : do_sieve 2 set_bits \ Assume all numbers 2 or greater are prime. 2 \ Initially, let p equal 2, the first prime number. begin dup 1024 < while dup \ n n clear_bits \ n find_one \ index of first one > n repeat ; \ iterates through the bit vector printing the index of a prime number : .sieve ." List of primes " 1024 0 do i sieve .@ \ get the bit corresponding to the integer. if i . ." , " then loop ;
That's awesome. I love code portability. I'm also really liking the create does> construct. It's key to raising the level of semantic abstraction in FORTH from low level to higher level. All that bit vector grunt work was nicely encapsulated.
\ Create an associative array by left shifting Roman values with the assoicated character create (arabic) 1000 128 * char M + , \ 128077 500 128 * char D + , \ 64068 100 128 * char C + , \ 12867 50 128 * char L + , \ 6476 10 128 * char X + , \ 1368 5 128 * char V + , \ 726 1 128 * char I + , \ 201 does> ( char addr -- value value ) 7 cells \ char addr 7*cell_size bounds \ char addr+7*cell_size addr do i @ \ char array[i] over over \ char array[i] char array[i] 127 and = \ char array[i] f if \ chars match, then return value nip 7 rshift leave else drop \ char then 1 cells \ increment index one cell. +loop dup ; \ parses a counted string in roman numeral form. : >arabic ( addr count - ) 0 dup \ addr count 0 0 >r >r \ addr count begin over over \ addr count addr count while \ count is non-zero c@ \ load the character at addr dup \ addr count char char (arabic) \ addr count char value value rot <> \ addr count value f while r> over r> over over > if 2* negate + else drop then + swap >r >r 1 /string repeat then drop 2drop r> r> drop ; s" MCMLXXXIV" >arabic .
: /string ( addr count num -- addr count) dup >r - swap r> + swap ;
\ Helpers used to create cell values, and access portion of the cell. : makePair 7 lshift + ; ( char value -- array[i] ) : >char 127 and ; ( array[i] -- char ) : >value 7 rshift ; ( array[i] -- value ) \ Create an associative array by left shifting Roman values with the assoicated character create (arabic) char M 1000 makePair , \ 128077 char D 500 makePair , \ 64068 char C 100 makePair , \ 12867 char L 50 makePair , \ 6476 char X 10 makePair , \ 1368 char V 5 makePair , \ 726 char I 1 makePair , \ 201 does> ( char addr -- value value ) 7 cells \ char addr 7*cell_size bounds \ char addr+7*cell_size addr do i @ \ char array[i] over over \ char array[i] char array[i] >char = \ char array[i] f if \ chars match, then return value nip >value leave else drop \ char then 1 cells \ increment index one cell. +loop dup ; \ adds the value of char to sum and keeps a copy of its value in digit : do_digit ( sum digit char -- new_sum new_digit ) (arabic) >r \ sum digit new_digit over over \ sum digit new_digit digit new_digit < if \ old digit is less than new, so subtract it from sum -rot 2* negate + swap \ adjusted_sum new_digit else nip \ sum new_digit then + r> \ new_sum new_digit ; \ parses a counted string in roman numeral form. : >arabic ( addr count - ) 0 -rot 0 -rot \ sum digit addr count begin over over \ sum digit addr count addr count while \ count is non-zero c@ \ load the character at addr -rot >r >r \ sum digit char do_digit \ sum digit r> r> \ sum digit addr count 1 /string \ advance to next character repeat \ ( sum digit addr count addr ) 2drop \ sum digit addr 2drop \ sum ; s" MCMLXXXIV" >arabic .
I'll run this by the college kids and see what they say.
Nice exercise!
Yes, I've run this under gforth, and that construct was in the original code I found here: http://rosettacode.org/wiki/Roman_numerals/Decode#Forth
What is going on is that (arabic) is not a word, it is the name of an array allocated by create. But you have the option of adding code that executes when the array is referenced via the does> construct. The does> construct requires the ';' to terminate its definition, but create doesn't require the preceding ':'. It's a bit confusing, but that's why this reverse engineering exercise was worth doing.
It seems like the use of does> in this context is unusual since it is only applied to the single word (arabic). For me, the code would look cleaner if it defined the array and the "does>" portion of (arabic) as separate words, but I understand that the original code was written using does>.
The original code was definitely on the obscure side. I feel I improved it, but you're right there is further room for improvement.
* Obviously I understand variables, they are a storage location referenced by an address that contains a value. The symbolic name assigned to that variable is stored in some sort of look up table which translates the name into the address and deposit it on the stack.
* On the surface I understand a constant. It is a symbolic name for an unchanging value. In compiled languages a constant is available only at compile time, often the value is stored in the code, and referenced using the immediate mode of the underlying instruction set. But FORTH is a threaded interpreter, so how does that impact this concept?
* A value seems like the most nebulous concept. It seems to behave identically to a constant. For example "10 constant foo" results in 10 on the stack when you type foo. Likewise "10 value foo" results in the same thing. So this puzzles me.
So I decided to find out why and what each does, and armed with the see word I followed the trail below:
see variable : Variable Create 0 , ; ok ok see constant see constant : Constant (Constant) , ; ok ok see (Constant) see (Constant) : (Constant) header reveal docon: cfa, ; ok ok see docon: see docon: : docon: 2126060304 @ ; ok ok see value see value : Value (Value) , ; ok ok see (Value) see (Value) : (Value) header reveal dovalue: cfa, ; ok ok see dovalue: see dovalue: : dovalue: 2126060752 @ ; ok ok
I think I understand the variable definition, but the rest are alien to me. However, the definitions of constant and value seem similar to me.
dovalue: . dovalue: . 4199741 ok ok docon: . docon: . 4199626 ok ok see to see to noname : ' dup >definer 2126112392 >definer = IF 8 + ! ELSE -32 throw THEN ; latestxt noname : COMP' drop dup >definer 2126112392 >definer over = IF drop 8 + POSTPONE ALiteral 4268668 compile, ELSE 2126129400 >definer over = IF drop 4269264 compile, 8 + @ lp-offset, 4268680 compile, ELSE 2126129496 >definer over = IF drop 4269264 compile, 8 + @ lp-offset, 4268668 compile, ELSE 2126129432 >definer over = IF drop 4269264 compile, 8 + @ lp-offset, 4268684 compile, ELSE 2126129464 >definer over = IF drop 4269264 compile, 8 + @ lp-offset, 4268996 compile, ELSE -32 throw drop THEN THEN THEN THEN THEN ; latestxt interpret/compile: TO ok ok
Good, now I feel I have a right to be confused about the difference between constant and value. I've made a good faith effort, so I'm going to chalk this up as likely the result of some sort of legacy in the language. It might get clearer over time, or remain a perpetual mystery like some of the weirder bits of C++.
In pfth the definitions for CONSTANT, VALUE and TO are as follows:
: constant create here ! cellsize allot does> @ ; : value create here ! cellsize allot does> @ ; : to ' >body state @ if postpone literal [compile] ! else ! then ; immediate
pfth doesn't do any error checking on the use of TO, so it could be applied to a value, constant or even a variable.