Spin2

cgracey · 2020-01-31 22:21

Chip,
I've just tested Pnut v33p for the LOC bug and found it is still present. Further reading - https://forums.parallax.com/discussion/comment/1457051/#Comment_1457051

Here's my latest testing source repacked to suit Pnut. And also the correct output of Fastspin along with the buggy output of Pnut.

Fastspin

 LOC  PA, #label testing
=========================

 CogExec    PC    Op-code  PA-data    Cog-dref Hub-dref LUT-dref
cog008   00000015 fe9ffff2 00000008   11111111 00000000 fdb00078
hub110   00000019 fe9000f6 00000110   52525252 22222222 00000000
hub4a4   0000001d fe8004a4 000004a4   00000000 33333333 fd640200
lut3a0   00000021 fe8003a0 000003a0   facbeff6 00000000 44444444

 HubExec    PC    Op-code  PA-data    Cog-dref Hub-dref LUT-dref
cog008   00000468 fe800008 00000008   11111111 00000000 fdb00078
hub110   00000478 fe9ffc94 00000110   52525252 22222222 00000000
hub4a4   00000488 fe900018 000004a4   00000000 33333333 fd640200
lut3a0   00000498 fe8003a0 000003a0   facbeff6 00000000 44444444

 LutExec    PC    Op-code  PA-data    Cog-dref Hub-dref LUT-dref
cog008   000003a7 fe800008 00000008   11111111 00000000 fdb00078
hub110   000003ab fe800110 00000110   52525252 22222222 00000000
hub4a4   000003af fe8004a4 000004a4   00000000 33333333 fd640200
lut3a0   000003b3 fe9fffec 000003a0   facbeff6 00000000 44444444

Pnut

 LOC  PA, #label testing
=========================

 CogExec    PC    Op-code  PA-data    Cog-dref Hub-dref LUT-dref
cog008   00000015 fe9fffc8 000fffde   8607ee30 00000000 fa97fe3e
hub110   00000019 fe9003d8 000003f2   fd63ec18 00000000 fd6bec18
hub4a4   0000001d fe8004a4 000004a4   00000000 33333333 fd640200
lut3a0   00000021 fe900df8 00000e1a   fda004a8 00000000 fd7c002d

 HubExec    PC    Op-code  PA-data    Cog-dref Hub-dref LUT-dref
cog008   00000468 fe800008 00000008   11111111 00000000 fdb00078
hub110   00000478 fe800110 00000110   52525252 22222222 00000000
hub4a4   00000488 fe900018 000004a4   00000000 33333333 fd640200
lut3a0   00000498 fe8003a0 000003a0   facbeff6 00000000 44444444

 LutExec    PC    Op-code  PA-data    Cog-dref Hub-dref LUT-dref
cog008   000003a7 fe9ff180 000ff528   fd8003a1 9e3e3e39 00000000
hub110   000003ab fe9ff590 000ff93c   2aa3ec03 21030111 00000000
hub4a4   000003af fe8004a4 000004a4   00000000 33333333 fd640200
lut3a0   000003b3 fe9fffb0 00000364   6f432020 00000000 00000000

Evanh, thanks for bringing this up. I'll get to this very soon.

Cluso99 · 2020-02-01 07:11

Here is the latest program (v026) to calculate the skip fields.
Included with the exe are the spin2 interpreter with adjustments to create the skip fields.
The source-skip.spin2 is the output from the skip program. It is designed to be an include file in a CON section.
The python exe takes the source filename as a parameter from the commandline or user.

Here is an example...
This code in the source file...

'~22------------------------------------' + + + + + + + + + + + + +
mod_iso         mov     w,x             ' a b         g   i   k l         a: ++var, var++ (isolated)                        iso   *RR 
mod_psh         pusha   x               ' | | c d e f | h | j | | m       b: --var, var-- (isolated)                        push  *RR 
                alti    rd              ' a b c d e f g h i j k l m       c: ++var        (push)                            rd    *RR 
rd_field        call    #\rdf           ' a b c d e f g h i j k l m       d: --var        (push)          (pipeline spacer) rd    *RR 
                xoro32  x               ' | | | | | | | | | | | l m       e: var++        (push)                            ??    *RR 
                mov     v,x             ' | | | | e f | h | j k l m       f: var--        (push)                            post  *RR 
                add     x,#1            ' a | c | e | | | | | | | |       g: var!!        (isolated)                        ++    *RR 
                sub     x,#1            ' | b | d | f | | | | | | |       h: var!!        (push)                            --    *RR 
                zerox   x,sz            ' | | c d | | | | | | | | |       i: var!         (isolated)                        ptr   *RR 
                muxz    x,_FFFFFFFF     ' | | | | | | g h | | | | |       j: var!         (push)                            !!    *RR 
                not     x               ' | | | | | | | | i j | | |       k: var\new      (swap)                            !     *RR 
                mov     x,w             ' | | | | | | | | | | k | |       l: ??var        (isolated)                        swap  *RR 
                alti    wr              ' a b c d e f g h i j k l m       m: ??var        (push)                            wr    *RR 
wr_field        call    #\wrf           ' a b c d e f g h i j k l m                                       (pipeline spacer) wr    *RR 
        _ret_   mov     x,w             ' a b | | | | g | i | | l |                                                         iso   *RR 
        _ret_   mov     x,v             '     | | e f   h   j k   m                                                         iso   *RR 
                ret                     '     c d                                                                           main  *RR 
'~--------------------------------------' - - - - - - - - - - - - -

will generate the following by the skip program...

mod_iso_skip_a          = mod_iso         |        %000111110110010 << 10  '
mod_iso_skip_b          = mod_iso         |        %000111101110010 << 10  '
mod_psh_skip_c          = mod_psh         |      %0110011101011000_ << 10  '
mod_psh_skip_d          = mod_psh         |      %0110011100111000_ << 10  '
mod_psh_skip_e          = mod_psh         |       %010011111001000_ << 10  '
mod_psh_skip_f          = mod_psh         |       %010011110101000_ << 10  '
mod_iso_skip_g          = mod_iso         |        %000110111110010 << 10  '
mod_psh_skip_h          = mod_psh         |       %010011011101000_ << 10  '
mod_iso_skip_i          = mod_iso         |        %000101111110010 << 10  '
mod_psh_skip_j          = mod_psh         |       %010010111101000_ << 10  '
mod_iso_skip_k          = mod_iso         |       %0100011111010010 << 10  '
mod_iso_skip_l          = mod_iso         |        %000111111000010 << 10  '
mod_psh_skip_m          = mod_psh         |       %010011111100000_ << 10  '

So, the existing source long definitions...

bc_var_inc              long  mod_iso   |       %000111110110010 << 10  '0A     ++var, var++    (isolated)
bc_var_dec              long  mod_iso   |       %000111101110010 << 10  '0B     --var, var--    (isolated)
bc_var_preinc_push      long  mod_psh   |     %0110011101011000_ << 10  '0C     ++var           (push)
bc_var_predec_push      long  mod_psh   |     %0110011100111000_ << 10  '0D     --var           (push)
bc_var_postinc_push     long  mod_psh   |      %010011111001000_ << 10  '0E     var++           (push)
bc_var_postdec_push     long  mod_psh   |      %010011110101000_ << 10  '0F     var--           (push)
bc_var_lognot           long  mod_iso   |       %000110111110010 << 10  '10     var!!           (isolated)
bc_var_lognot_push      long  mod_psh   |      %010011011101000_ << 10  '11     var!!           (push)
bc_var_bitnot           long  mod_iso   |       %000101111110010 << 10  '12     var!            (isolated)
bc_var_bitnot_push      long  mod_psh   |      %010010111101000_ << 10  '13     var!            (push)
bc_var_swap             long  mod_iso   |      %0100011111010010 << 10  '14     var\new         (swap)
bc_var_rnd              long  mod_iso   |       %000111111000010 << 10  '15     ??var           (isolated)
bc_var_rnd_push         long  mod_psh   |      %010011111100000_ << 10  '16     ??var           (push)

will be changed (once by the user) to use the CONstant labels generated...

bc_var_inc              long  mod_iso_skip_a    '0A     ++var, var++    (isolated)
bc_var_dec              long  mod_iso_skip_b    '0B     --var, var--    (isolated)
bc_var_preinc_push      long  mod_psh_skip_c    '0C     ++var           (push)
bc_var_predec_push      long  mod_psh_skip_d    '0D     --var           (push)
bc_var_postinc_push     long  mod_psh_skip_e    '0E     var++           (push)
bc_var_postdec_push     long  mod_psh_skip_f    '0F     var--           (push)
bc_var_lognot           long  mod_iso_skip_g    '10     var!!           (isolated)
bc_var_lognot_push      long  mod_psh_skip_h    '11     var!!           (push)
bc_var_bitnot           long  mod_iso_skip_i    '12     var!            (isolated)
bc_var_bitnot_push      long  mod_psh_skip_j    '13     var!            (push)
bc_var_swap             long  mod_iso_skip_k    '14     var\new         (swap)
bc_var_rnd              long  mod_iso_skip_l    '15     ??var           (isolated)
bc_var_rnd_push         long  mod_psh_skip_m    '16     ??var           (push)

and the included CON section (lines generated by the skip program) are...

CON
mod_iso_skip_a          = mod_iso         |        %000111110110010 << 10  '
mod_iso_skip_b          = mod_iso         |        %000111101110010 << 10  '
mod_psh_skip_c          = mod_psh         |      %0110011101011000_ << 10  '
mod_psh_skip_d          = mod_psh         |      %0110011100111000_ << 10  '
mod_psh_skip_e          = mod_psh         |       %010011111001000_ << 10  '
mod_psh_skip_f          = mod_psh         |       %010011110101000_ << 10  '
mod_iso_skip_g          = mod_iso         |        %000110111110010 << 10  '
mod_psh_skip_h          = mod_psh         |       %010011011101000_ << 10  '
mod_iso_skip_i          = mod_iso         |        %000101111110010 << 10  '
mod_psh_skip_j          = mod_psh         |       %010010111101000_ << 10  '
mod_iso_skip_k          = mod_iso         |       %0100011111010010 << 10  '
mod_iso_skip_l          = mod_iso         |        %000111111000010 << 10  '
mod_psh_skip_m          = mod_psh         |       %010011111100000_ << 10  '

This way, the skip program only needs to be run before compilation and its' output included in a CON section of the source file.

cgracey · 2020-02-01 09:02

That looks really good, Cluso99. I will give it a try.

Cluso99 · 2020-02-01 13:07

Here is the python source.
I didn't want to use any special python libraries to keep it simple.

cgracey · 2020-02-01 17:10

I've got SEND working. It's really simple and efficient

Do you all think it would be a good idea to add some basic number-printing and string-printing functionality to it, in order to get around needing another object for simple programs?

I've got things set up now so that I can call very compact Spin2 bytecode methods that are hidden in the interpreter. The compiler sets it all up and keeps the syntax very simple.

I think these would be useful:

STR(stringaddress)
DEC(value, format)
HEX(value, digits)
BIN(value, digits)

I wouldn't want to steal those names from general usage potential, though. An object would not do that.

If we put those in, there's a temptation to put some serial I/O in, also.

Maybe I should just let objects handle these things?

ersmith · 2020-02-01 17:30

I think letting objects handle things is the right way to do it.

JRetSapDoog · 2020-02-01 20:38

Since doing so doesn't preclude users from using their own objects with similar functionality, then it seems worthy of consideration to me (if it'll all fit, I mean). Not having to include specific objects to do such basic things could make coding appear simpler to novice users, which could also be beneficial in education and make the job of teaching a bit easier. So, I wouldn't write it off so fast. However, admittedly, it could result in Spin2 getting somewhat "fractured" if it is not supported in (all/both) the main versions out there.

Then again, I voiced support for having things like IsLetter and so on built-in to the hardware itself for speed and convenience on P2Hot, when it can be done pretty well in software. So don't go by me, as maybe that shows how "lazy" I am. Heck, I've even wondered if routines for such common character parsing functions could be part of Spin2, even if not hardware accelerated (as perhaps they still might be a bit faster with the interpreter's help and also would be more convenient).

Roy Eltham · 2020-02-01 20:40

Chip,
Instead of having this stuff "hidden in the interpreter", why not just make a "standard library" of these things that you provide with the compiler as source.

If you want to make it feel more like it's built in, then perhaps you could make it so that when you include an object in the OBJ section you could optionally have it treat that object as if it was part of the object including it (extending it).
Objects included this way would be restricted to not have any name conflicts for methods and variables, but that's fine. Maybe it's a new section instead of OBJ?

It's kind of but not an inheritance mechanism, more of an extension since you won't be overriding anything, but the parent object would have full access to everything in the object it included this way, as if it was just more code in itself.

(It's really just the same as an #include in C.)

I really don't like having things that should be in a library or included object being hidden inside the interpreter.

cgracey · 2020-02-06 01:54

evanh wrote: »

Chip,
I've just tested Pnut v33p for the LOC bug and found it is still present. Further reading - https://forums.parallax.com/discussion/comment/1457051/#Comment_1457051

Here's my latest testing source repacked to suit Pnut. And also the correct output of Fastspin along with the buggy output of Pnut.

Fastspin

 LOC  PA, #label testing
=========================

 CogExec    PC    Op-code  PA-data    Cog-dref Hub-dref LUT-dref
cog008   00000015 fe9ffff2 00000008   11111111 00000000 fdb00078
hub110   00000019 fe9000f6 00000110   52525252 22222222 00000000
hub4a4   0000001d fe8004a4 000004a4   00000000 33333333 fd640200
lut3a0   00000021 fe8003a0 000003a0   facbeff6 00000000 44444444

 HubExec    PC    Op-code  PA-data    Cog-dref Hub-dref LUT-dref
cog008   00000468 fe800008 00000008   11111111 00000000 fdb00078
hub110   00000478 fe9ffc94 00000110   52525252 22222222 00000000
hub4a4   00000488 fe900018 000004a4   00000000 33333333 fd640200
lut3a0   00000498 fe8003a0 000003a0   facbeff6 00000000 44444444

 LutExec    PC    Op-code  PA-data    Cog-dref Hub-dref LUT-dref
cog008   000003a7 fe800008 00000008   11111111 00000000 fdb00078
hub110   000003ab fe800110 00000110   52525252 22222222 00000000
hub4a4   000003af fe8004a4 000004a4   00000000 33333333 fd640200
lut3a0   000003b3 fe9fffec 000003a0   facbeff6 00000000 44444444

Pnut - NEW!!!

 LOC  PA, #label testing
=========================

 CogExec    PC    Op-code  PA-data    Cog-dref Hub-dref LUT-dref
cog008   00000015 fe9ffff2 00000008   11111111 00000000 fdb00078
hub110   00000019 fe800110 00000110   52525252 22222222 00000000  absolute!
hub4a4   0000001d fe8004a4 000004a4   00000000 33333333 fd640200
lut3a0   00000021 fe90037e 000003a0   facbeff6 00000000 44444444  relative!

 HubExec    PC    Op-code  PA-data    Cog-dref Hub-dref LUT-dref
cog008   00000468 fe800008 00000008   11111111 00000000 fdb00078
hub110   00000478 fe9ffc94 00000110   52525252 22222222 00000000
hub4a4   00000488 fe900018 000004a4   00000000 33333333 fd640200
lut3a0   00000498 fe8003a0 000003a0   facbeff6 00000000 44444444

 LutExec    PC    Op-code  PA-data    Cog-dref Hub-dref LUT-dref
cog008   000003a7 fe9ffc60 00000008   11111111 00000000 fdb00078  relative!
hub110   000003ab fe800110 00000110   52525252 22222222 00000000
hub4a4   000003af fe8004a4 000004a4   00000000 33333333 fd640200
lut3a0   000003b3 fe9fffec 000003a0   facbeff6 00000000 44444444

Evanh, it took me a few hours today to figure out what was needed to get the right answers, like Fastspin does. I wound up using slightly different logic, though. I now detect if a label declared under ORGH was used as the address. That tells me if it is a hub address, even though it might be under $400. I consider the cog registers and LUT registers relatively addressable together. Here is my logic:

orgh mode	orgh symbol used | target address >= $400
-------------------------------------------------------------
0		0		rel	address - (cog_org+1)
0		1		abs
1		0		abs
1		1		rel	address - (hub_org+4)

How to do the logic is debatable. I went with forcing absolute when a cog/LUT address is interacting with a hub address. Relative addressing is used for cog/LUT-to-cog/LUT or hub-to-hub.

Thanks a lot for putting this test program together. It was really helpful.

evanh · 2020-02-06 02:54

Excellent!

I don't know if you noticed, it's not commented at all, but I started playing around with proving relative branching works between hubexec and cogexec. That was shortly after Eric and Dave had adjusted their assemblers. I was kind of amused it worked without issue.

cgracey · 2020-02-06 02:58

evanh wrote: »

Excellent!

I don't know if you noticed, it's not commented at all, but I started playing around with proving relative branching works between hubexec and cogexec. That was shortly after Eric and Dave had adjusted their assemblers. I was kind of amused it worked without issue.

I was starting to think about that as I worked. I'm too tired of working on this today to think clearly about the ramifications of relative being the default in all cases. I had to look at your program output for a while before I understood what the goal was. It is a great set of test cases.

Cluso99 · 2020-02-08 20:00

Chip,
Any progress yet?

cgracey · 2020-02-09 04:33

Cluso99 wrote: »

Chip,
Any progress yet?

Yes, it's all working. Just making some last-minute enhancements to the symbol table to make it go 100x faster. I will post something here soon.

Roy Eltham · 2020-02-09 05:29

Chip,
OpenSpin uses a simple hashtable for the symbols verses the linear search done by your original x86 code. Lookup speed is basically the same no matter how many symbols there are, so the perf gain grows pretty dramatically as you add more symbols. For large numbers of symbols, lookups can get to over 1000x times faster.
There were a few OBEX programs that would take minutes to compile with propellent, but a fraction of a second with OpenSpin because of the symbol lookup difference.

OpenSpin also has no hard limit on number of symbols, and symbols can be up to 254 chars instead of only 31.

cgracey · 2020-02-09 05:36

Roy Eltham wrote: »

Chip,
OpenSpin uses a simple hashtable for the symbols verses the linear search done by your original x86 code. Lookup speed is basically the same no matter how many symbols there are, so the perf gain grows pretty dramatically as you add more symbols. For large numbers of symbols lookups can get to over 1000x times faster.
There were a few OBEX programs that would take minutes to compile with propellent, but a fraction of a second with OpenSpin because of the symbol lookup difference.

Yes, I should have mentioned, this idea came from you and OpenSpin.

I am thinking of just adding the symbol characters up to get the hash. I started to do this a long time back (8031 assembler) and I did some test and summing the characters seemed the best way to do it. Is that too simplistic of an approach, though?

I had a dream last night in which I realized that for 32-chr-max symbols, a summed-byte hash would only require a table of size 8+5 bits, or 8,192 entries. Actually, it could be 4,096 entries, since symbols only contain characters under 128, which take only 7 bits, not 8.

Roy Eltham · 2020-02-09 05:53

Chip,
Summing would work, but you'd likely have more collisions. I highly recommend using these instead:

    // calculate a hash value of a zero terminated string (uppercased)
    // uses Jenkins One-at-a-time hash function
    int GetStringHashUppercase(const char* s)
    {
        int hash = 0;
        while (*s != 0)
        {
            int c = *s;
            c = c - (32 * (c >= 'a' && c <= 'z'));
            hash += c;
            hash += (hash << 10);
            hash ^= (hash >> 6);
            s++;
        }
        hash += (hash << 3);
        hash ^= (hash >> 11);
        hash += (hash << 15);
        return hash;
    }

    // calculate a hash value of a zero terminated string
    // uses Jenkins One-at-a-time hash function
    int GetStringHash(const char* s)
    {
        int hash = 0;
        while (*s != 0)
        {
            hash += *s;
            hash += (hash << 10);
            hash ^= (hash >> 6);
            s++;
        }
        hash += (hash << 3);
        hash ^= (hash >> 11);
        hash += (hash << 15);
        return hash;
    }

OpenSpin only uses the uppercase one because symbols are all upper cased in spin 1 internally. Let me know if you need the C translated to something else.

cgracey · 2020-02-09 06:07

Super! Thanks, Roy. I'll use that.

I just found this:

The LoseLose algorithm (where hash = hash+character) is truly awful. Everything collides into the same 1,375 buckets

That's where I was headed.

https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed

Roy Eltham · 2020-02-09 06:25

OpenSpin has 3 hashtables for symbols, one for the built in symbols, one for the user symbols, and then one for the temp user symbols (used for local symbols during compiling so it's wiped and refilled a lot).
The built in table is a 256 entry one, and is filled once and never cleared.
The user one is a 8192 entry one, I chose that size to reduce collisions for programs with lots of symbols. It could be 4096 without impacting perf very much, but 8192 is already not that much memory for my hashtable setup.
The temp one is a 1024 entry one, because it's just for local symbols during function compiles. It could be smaller still. It get's filled and wiped constantly during compiles.
My FindSymbol() function does try each table in order to find symbols, and I considered doing something more elaborate. However, collisions are so rare that it just doesn't matter.

I don't have fixed sized strings like you did, so symbols only use up as much memory as they need. I would recommend using a heap buffer for your symbol strings (packed), and just have your hashtable entries hold offsets into that. Then you can limit the total symbol memory usage without limiting the individual symbol length.

It could be more memory efficient, but it wasn't really needed.

cgracey · 2020-02-09 06:27

Roy, I implemented the second example you gave for a z-string:

		push	esi			;compute symbol hash, save symbol pointer
		xor	ecx,ecx			;reset symbol length
		xor	ebx,ebx			;reset hash
@@hash:		xor	eax,eax			;get symbol chr into eax
		lodsb
		cmp	al,0			;if chr = 0, finish hash
		je	@@finish
		add	ebx,eax			;hash += chr
		mov	eax,ebx			;hash += (hash << 10)
		shl	eax,10
		add	ebx,eax
		mov	eax,ebx			;hash ^= (hash >> 6)
		shr	eax,6
		xor	ebx,eax
		inc	ecx			;inc length
		jmp	@@hash			;next chr

@@finish:	mov	eax,ebx			;hash += (hash << 3)
		shl	eax,3
		add	ebx,eax
		mov	eax,ebx			;hash ^= (hash >> 11)
		shr	eax,11
		xor	ebx,eax
		mov	eax,ebx			;hash += (hash << 15)
		shl	eax,15
		add	ebx,eax
		pop	esi			;esi points to symbol, ebx=hash, ecx=length

Seems like a lot of time will be spent computing the hash.

Do you think I can fold the 32-bit hash down to something smaller by XOR'ing the bits together, like (hash[31:16] ^ hash[15:0])?

Roy Eltham · 2020-02-09 06:37

I never found the hash generation time to be significant in profiles.

I'd worry that if you modify things you are going to get more collisions and impact lookup time way more than any savings you might get generating hashes.

cgracey · 2020-02-09 06:38

Roy Eltham wrote: »

OpenSpin has 3 hashtables for symbols, one for the built in symbols, one for the user symbols, and then one for the temp user symbols (used for local symbols during compiling so it's wiped and refilled a lot).
The built in table is a 256 entry one, and is filled once and never cleared.
The user one is a 8192 entry one, I chose that size to reduce collisions for programs with lots of symbols. It could be 4096 without impacting perf very much, but 8192 is already not that much memory for my hashtable setup.
The temp one is a 1024 entry one, because it's just for local symbols during function compiles. It could be smaller still. It get's filled and wiped constantly during compiles.
My FindSymbol() function does try each table in order to find symbols, and I considered doing something more elaborate. However, collisions are so rare that it just doesn't matter.

I don't have fixed sized strings like you did, so symbols only use up as much memory as they need. I would recommend using a heap buffer for your symbol strings (packed), and just have your hashtable entries hold offsets into that. Then you can limit the total symbol memory usage without limiting the individual symbol length.

It could be more memory efficient, but it wasn't really needed.

I remember all those things you told me. I've been thinking today about how you had three tables and I understand why.

My symbol table has been packed, all along. Even the value is packed, not just the string:

;
;
; Enter symbol2 into symbol table
; symbol2 must hold name, terminated with 0
; al must hold type
; ebx must hold value
;
enter_symbol2:	call	print_symbol2
enter_symbol2i:
		push	eax
		push	ebx
		push	ecx
		push	esi
		push	edi

		push	eax

		lea	edi,[symbol2]		;get symbol length in bits 4..0
		call	measure_symbol
		mov	al,cl

		xor	esi,esi			;get value size in bits 7..5
		dec	esi
		jmp	@@getsize
@@size:		shr	esi,8
		ror	ebx,8
		add	al,20h
@@getsize:	test	ebx,esi
		jnz	@@size

		mov	edi,[symbol_table_ptr]	;symbol table full?
		cmp	edi,offset symbol_table+symbol_table_limit-symbol_limit-6-2
		ja	error_stif

		stosb				;enter symbol length and value size

		lea	esi,[symbol2]		;enter symbol2
	rep	movsb

		shr	al,5			;enter value
		mov	cl,al
		mov	eax,ebx
		jecxz	@@zerovalue
@@value:	rol	eax,8
		stosb
		loop	@@value
@@zerovalue:
		pop	eax			;enter type
		stosb

		mov	[symbol_table_ptr],edi	;update symbol_table_ptr

		pop	edi
		pop	esi
		pop	ecx
		pop	ebx
		pop	eax
		ret

Roy Eltham · 2020-02-09 06:44

Maybe it was some other part that had fixed sized strings for the symbol name? There are other areas of the compiler that just use big fixed size arrays of strings of fixed sizes (like filenames and such, with rather low fixed limits too).
Also, if I recall right you had a fixed limit to the number of symbols total. It's been a long time, so my memory is fuzzy on the details of the x86 code.

Anyway, I am glad to hear you are improving symbol stuff.

cgracey · 2020-02-09 06:48

Roy Eltham wrote: »

Maybe it was some other part that had fixed sized strings for the symbol name? There are other areas of the compiler that just use big fixed size arrays of strings of fixed sizes (like filenames and such, with rather low fixed limits too).
Also, if I recall right you had a fixed limit to the number of symbols total. It's been a long time, so my memory is fuzzy on the details of the x86 code.

Anyway, I am glad to hear you are improving symbol stuff.

Yes, it was the set of filenames that I had to communicate back to the host app. They were in fixed-size arrays within a structure. It seemed way simpler to do it that way.

cgracey · 2020-02-09 06:51

Roy, do you think I could just use as many LSBs of the hash as I want to address my table? Maybe there's no need to XOR any bits together, but any set of bits could be used from within the hash. What do you think? How did you do it?

Roy Eltham · 2020-02-09 07:07

My hashtable uses the lower x bits of the hash key to index the buckets.
You still need the full hash calculation, because it uses the full hash key to compare with to make sure you got the right result from the bucket which could have more than one entry. Also, many symbols will have the same lower bits without being a match.

Also, once I get a result from the hashtable lookup, I still string compere the symbol names to handle hash collisions.

cgracey · 2020-02-09 07:11

Roy Eltham wrote: »

My hashtable uses the lower x bits of the hash key to index the buckets.
You still need the full hash calculation, because it uses the full hash key to compare with to make sure you got the right result.

Right, the hash can't be simplified just because you don't want to use all of its output bits. Are you saying, though, that you have some secondary use for the full hash? Once you get the index to the bucket, the hash is no longer important, right?

Roy Eltham · 2020-02-09 07:26

You need the full hash key stored in each entry in the hashtable.
When you get to the bucket, the entry their will have the same lower bits for sure as your lookup hash key, but may not have the same upper bits and thus not be a match.
Also, your bucket might have more than one entry in it, you need to walk those until you get one that matches the full hash key.

Additionally, if you have a hash key collision (pretty darn rare) you will have two entries in the same bucket with the same full hash key, so you still need to compare the actual symbol names to guarantee you have the right result.
If you don't use the full hash key, then you will have to do string compares of every entry in the bucket.

cgracey · 2020-02-09 07:34

Roy Eltham wrote: »

...If you don't use the full hash key, then you will have to do string compares of every entry in the bucket.

Before I do a string compare, I do a string-length compare, since there is a length byte in the record. I think that is maybe more efficient than looking for the exact hash and then still having to do a string compare. What do you think?

Having the string-length byte as part of the record makes it easy to walk through the records within a bucket.

Roy Eltham · 2020-02-09 07:37

If you have multiple symbols of the same length (common) then your method will potentially do multiple string compares, where using the hash key won't.

My method only does more than one string compare when there is a hash key collision which is super rare.

cgracey · 2020-02-09 07:51

Roy Eltham wrote: »

If you have multiple symbols of the same length (common) then your method will potentially do multiple string compares, where using the hash key won't.

Yes, but it will be exceedingly rare to have a hash-bucket collision, at all. Maybe how that gets handled is a wash. The string compare is just a 'repe cmpsb' combo, and that would only occur on a length-byte match. It seems no less efficient than comparing the full hash and then doing a string compare to be sure.

Spin2

Comments