Here is the latest program (v026) to calculate the skip fields.
Included with the exe is the Spin2 interpreter, with adjustments to create the skip fields.
The source-skip.spin2 file is the output from the skip program. It is designed to be an include file in a CON section.
The Python exe takes the source filename as a parameter from the command line, or prompts the user for it.
Here is an example...
This code in the source file...
'~22------------------------------------' + + + + + + + + + + + + +
mod_iso mov w,x ' a b g i k l a: ++var, var++ (isolated) iso *RR
mod_psh pusha x ' | | c d e f | h | j | | m b: --var, var-- (isolated) push *RR
alti rd ' a b c d e f g h i j k l m c: ++var (push) rd *RR
rd_field call #\rdf ' a b c d e f g h i j k l m d: --var (push) (pipeline spacer) rd *RR
xoro32 x ' | | | | | | | | | | | l m e: var++ (push) ?? *RR
mov v,x ' | | | | e f | h | j k l m f: var-- (push) post *RR
add x,#1 ' a | c | e | | | | | | | | g: var!! (isolated) ++ *RR
sub x,#1 ' | b | d | f | | | | | | | h: var!! (push) -- *RR
zerox x,sz ' | | c d | | | | | | | | | i: var! (isolated) ptr *RR
muxz x,_FFFFFFFF ' | | | | | | g h | | | | | j: var! (push) !! *RR
not x ' | | | | | | | | i j | | | k: var\new (swap) ! *RR
mov x,w ' | | | | | | | | | | k | | l: ??var (isolated) swap *RR
alti wr ' a b c d e f g h i j k l m m: ??var (push) wr *RR
wr_field call #\wrf ' a b c d e f g h i j k l m (pipeline spacer) wr *RR
_ret_ mov x,w ' a b | | | | g | i | | l | iso *RR
_ret_ mov x,v ' | | e f h j k m iso *RR
ret ' c d main *RR
'~--------------------------------------' - - - - - - - - - - - - -
will generate the following by the skip program...
I've got SEND working. It's really simple and efficient.
Do you all think it would be a good idea to add some basic number-printing and string-printing functionality to it, in order to get around needing another object for simple programs?
I've got things set up now so that I can call very compact Spin2 bytecode methods that are hidden in the interpreter. The compiler sets it all up and keeps the syntax very simple.
Since doing so doesn't preclude users from using their own objects with similar functionality, it seems worthy of consideration to me (if it'll all fit, I mean). Not having to include specific objects to do such basic things could make coding appear simpler to novice users, which could also be beneficial in education and make the job of teaching a bit easier. So, I wouldn't write it off so fast. However, admittedly, it could result in Spin2 getting somewhat "fractured" if it is not supported in (all/both) the main versions out there.
Then again, I voiced support for having things like IsLetter and so on built-in to the hardware itself for speed and convenience on P2Hot, when it can be done pretty well in software. So don't go by me, as maybe that shows how "lazy" I am. Heck, I've even wondered if routines for such common character parsing functions could be part of Spin2, even if not hardware accelerated (as perhaps they still might be a bit faster with the interpreter's help and also would be more convenient).
Chip,
Instead of having this stuff "hidden in the interpreter", why not just make a "standard library" of these things that you provide with the compiler as source?
If you want to make it feel more like it's built in, then perhaps you could make it so that when you include an object in the OBJ section you could optionally have it treat that object as if it was part of the object including it (extending it).
Objects included this way would be restricted to not have any name conflicts for methods and variables, but that's fine. Maybe it's a new section instead of OBJ?
It's kind of, but not quite, an inheritance mechanism; it's more of an extension, since you won't be overriding anything, but the parent object would have full access to everything in the object it included this way, as if it were just more code in itself.
(It's really just the same as an #include in C.)
I really don't like having things that should be in a library or included object being hidden inside the interpreter.
Evanh, it took me a few hours today to figure out what was needed to get the right answers, like Fastspin does. I wound up using slightly different logic, though. I now detect if a label declared under ORGH was used as the address. That tells me if it is a hub address, even though it might be under $400. I consider the cog registers and LUT registers relatively addressable together. Here is my logic:
How to do the logic is debatable. I went with forcing absolute when a cog/LUT address is interacting with a hub address. Relative addressing is used for cog/LUT-to-cog/LUT or hub-to-hub.
Thanks a lot for putting this test program together. It was really helpful.
I don't know if you noticed, it's not commented at all, but I started playing around with proving relative branching works between hubexec and cogexec. That was shortly after Eric and Dave had adjusted their assemblers. I was kind of amused it worked without issue.
I was starting to think about that as I worked. I'm too tired of working on this today to think clearly about the ramifications of relative being the default in all cases. I had to look at your program output for a while before I understood what the goal was. It is a great set of test cases.
Chip,
OpenSpin uses a simple hashtable for the symbols versus the linear search done by your original x86 code. Lookup speed is basically the same no matter how many symbols there are, so the perf gain grows pretty dramatically as you add more symbols. For large numbers of symbols, lookups can be over 1000x faster.
There were a few OBEX programs that would take minutes to compile with propellent, but a fraction of a second with OpenSpin because of the symbol lookup difference.
OpenSpin also has no hard limit on the number of symbols, and symbols can be up to 254 chars instead of only 31.
Yes, I should have mentioned, this idea came from you and OpenSpin.
I am thinking of just adding the symbol characters up to get the hash. I started to do this a long time back (8031 assembler), and in some tests summing the characters seemed the best way to do it. Is that too simplistic an approach, though?
I had a dream last night in which I realized that for 32-chr-max symbols, a summed-byte hash would only require a table addressed by 8+5 = 13 bits, or 8,192 entries. Actually, it could be 4,096 entries, since symbols only contain characters under 128, which take only 7 bits, not 8.
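The sizing argument checks out arithmetically: 32 characters times a maximum code of 127 gives a worst-case sum of 32 * 127 = 4,064, which fits a 4,096-entry (2^12) table. A quick sketch of such a summed-byte hash (the function name is illustrative, not from the actual compiler):

```c
#include <assert.h>
#include <string.h>

/* Summed-byte hash: just add up the character codes of the symbol.
   With at most 32 characters, each under 128, the sum can never
   exceed 32 * 127 = 4064, so a 4,096-entry table covers every value. */
unsigned sum_hash(const char *s)
{
    unsigned hash = 0;
    while (*s)
        hash += (unsigned char)*s++;
    return hash;
}
```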
Chip,
Summing would work, but you'd likely have more collisions. I highly recommend using these instead:
// calculate a hash value of a zero-terminated string (uppercased)
// uses the Jenkins one-at-a-time hash function
// (hash is unsigned so the shifts and adds are well-defined on overflow)
unsigned int GetStringHashUppercase(const char* s)
{
    unsigned int hash = 0;
    while (*s != 0)
    {
        int c = *s;
        c = c - (32 * (c >= 'a' && c <= 'z'));  // force to uppercase
        hash += c;
        hash += (hash << 10);
        hash ^= (hash >> 6);
        s++;
    }
    hash += (hash << 3);
    hash ^= (hash >> 11);
    hash += (hash << 15);
    return hash;
}
// calculate a hash value of a zero-terminated string
// uses the Jenkins one-at-a-time hash function
unsigned int GetStringHash(const char* s)
{
    unsigned int hash = 0;
    while (*s != 0)
    {
        hash += (unsigned char)*s;
        hash += (hash << 10);
        hash ^= (hash >> 6);
        s++;
    }
    hash += (hash << 3);
    hash ^= (hash >> 11);
    hash += (hash << 15);
    return hash;
}
OpenSpin only uses the uppercase one because symbols are all uppercased internally in Spin1. Let me know if you need the C translated to something else.
OpenSpin has three hashtables for symbols: one for the built-in symbols, one for the user symbols, and one for the temp user symbols (used for local symbols during compiling, so it's wiped and refilled a lot).
The built-in table has 256 entries and is filled once and never cleared.
The user one has 8,192 entries; I chose that size to reduce collisions for programs with lots of symbols. It could be 4,096 without impacting perf very much, but 8,192 is already not that much memory for my hashtable setup.
The temp one has 1,024 entries, because it's just for local symbols during function compiles. It could be smaller still. It gets filled and wiped constantly during compiles.
My FindSymbol() function tries each table in order to find symbols, and I considered doing something more elaborate. However, collisions are so rare that it just doesn't matter.
I don't have fixed-size strings like you did, so symbols only use up as much memory as they need. I would recommend using a heap buffer for your symbol strings (packed), and just having your hashtable entries hold offsets into it. Then you can limit the total symbol memory usage without limiting the individual symbol length.
It could be more memory efficient, but it wasn't really needed.
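The packed-heap suggestion might look something like this sketch (names and the size limit here are hypothetical, not OpenSpin's actual layout): all symbol text lives in one packed buffer, and hashtable entries hold only offsets into it, so total symbol memory is capped without capping individual symbol length.

```c
#include <assert.h>
#include <string.h>

#define HEAP_MAX 65536  /* total budget for all symbol text (hypothetical size) */

static char heap[HEAP_MAX];     /* packed, zero-terminated symbol strings */
static unsigned heap_used = 0;

/* Copy a symbol into the packed heap and return its offset,
   or -1 if the total symbol-memory budget is exhausted. */
int intern_symbol(const char *name)
{
    unsigned len = (unsigned)strlen(name) + 1;  /* include the terminator */
    if (heap_used + len > HEAP_MAX)
        return -1;
    memcpy(heap + heap_used, name, len);
    int offset = (int)heap_used;
    heap_used += len;
    return offset;              /* a hashtable entry would store this offset */
}

const char *symbol_at(int offset) { return heap + offset; }
```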
I never found the hash generation time to be significant in profiles.
I'd worry that if you modify things you are going to get more collisions and impact lookup time way more than any savings you might get generating hashes.
I remember all those things you told me. I've been thinking today about how you had three tables and I understand why.
My symbol table has been packed all along. Even the value is packed, not just the string:
;
;
; Enter symbol2 into symbol table
; symbol2 must hold name, terminated with 0
; al must hold type
; ebx must hold value
;
enter_symbol2: call print_symbol2
enter_symbol2i:
push eax
push ebx
push ecx
push esi
push edi
push eax
lea edi,[symbol2] ;get symbol length in bits 4..0
call measure_symbol
mov al,cl
xor esi,esi ;get value size in bits 7..5
dec esi
jmp @@getsize
@@size: shr esi,8
ror ebx,8
add al,20h
@@getsize: test ebx,esi
jnz @@size
mov edi,[symbol_table_ptr] ;symbol table full?
cmp edi,offset symbol_table+symbol_table_limit-symbol_limit-6-2
ja error_stif
stosb ;enter symbol length and value size
lea esi,[symbol2] ;enter symbol2
rep movsb
shr al,5 ;enter value
mov cl,al
mov eax,ebx
jecxz @@zerovalue
@@value: rol eax,8
stosb
loop @@value
@@zerovalue:
pop eax ;enter type
stosb
mov [symbol_table_ptr],edi ;update symbol_table_ptr
pop edi
pop esi
pop ecx
pop ebx
pop eax
ret
Maybe it was some other part that had fixed-size strings for the symbol name? There are other areas of the compiler that just use big fixed-size arrays of fixed-length strings (like filenames and such, with rather low fixed limits, too).
Also, if I recall right, you had a fixed limit on the total number of symbols. It's been a long time, so my memory is fuzzy on the details of the x86 code.
Anyway, I am glad to hear you are improving symbol stuff.
Yes, it was the set of filenames that I had to communicate back to the host app. They were in fixed-size arrays within a structure. It seemed way simpler to do it that way.
Roy, do you think I could just use as many LSBs of the hash as I want to address my table? Maybe there's no need to XOR any bits together, but any set of bits could be used from within the hash. What do you think? How did you do it?
My hashtable uses the lower x bits of the hash key to index the buckets.
You still need the full hash calculation, because the full hash key is compared to make sure you got the right result from the bucket, which could hold more than one entry. Also, many symbols will have the same lower bits without being a match.
Also, once I get a result from the hashtable lookup, I still string-compare the symbol names to handle hash collisions.
Right, the hash can't be simplified just because you don't want to use all of its output bits. Are you saying, though, that you have some secondary use for the full hash? Once you get the index to the bucket, the hash is no longer important, right?
You need the full hash key stored in each entry in the hashtable.
When you get to the bucket, the entry there will have the same lower bits as your lookup hash key for sure, but may not have the same upper bits, and thus not be a match.
Also, your bucket might have more than one entry in it; you need to walk those until you get one that matches the full hash key.
Additionally, if you have a hash key collision (pretty darn rare) you will have two entries in the same bucket with the same full hash key, so you still need to compare the actual symbol names to guarantee you have the right result.
If you don't use the full hash key, then you will have to do string compares of every entry in the bucket.
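Putting these rules together: a lookup indexes by the low bits of the hash, walks the bucket comparing the stored full hash key, and only falls back to a string compare on a full-key match. A minimal chained-bucket sketch (structure and function names are illustrative, not OpenSpin's):

```c
#include <assert.h>
#include <string.h>

#define BUCKET_BITS 12
#define NUM_BUCKETS (1 << BUCKET_BITS)

typedef struct Entry {
    unsigned      full_hash;   /* complete hash key, kept for comparison */
    const char   *name;
    int           value;
    struct Entry *next;        /* next entry sharing the same bucket */
} Entry;

static Entry *buckets[NUM_BUCKETS];

/* Jenkins one-at-a-time, as in the functions shown earlier */
unsigned oat_hash(const char *s)
{
    unsigned h = 0;
    while (*s) {
        h += (unsigned char)*s++;
        h += h << 10;
        h ^= h >> 6;
    }
    h += h << 3;
    h ^= h >> 11;
    h += h << 15;
    return h;
}

Entry *find_symbol(const char *name)
{
    unsigned h = oat_hash(name);
    for (Entry *e = buckets[h & (NUM_BUCKETS - 1)]; e; e = e->next) {
        /* cheap full-key compare first; strcmp only on the rare match */
        if (e->full_hash == h && strcmp(e->name, name) == 0)
            return e;
    }
    return NULL;
}

void add_symbol(Entry *e, const char *name, int value)
{
    e->full_hash = oat_hash(name);
    e->name = name;
    e->value = value;
    unsigned i = e->full_hash & (NUM_BUCKETS - 1);
    e->next = buckets[i];      /* push onto the front of the bucket chain */
    buckets[i] = e;
}
```

With a well-mixed hash, the full-key compare rejects nearly every non-match in one integer comparison, which is why so few string compares are ever needed.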
Before I do a string compare, I do a string-length compare, since there is a length byte in the record. I think that is maybe more efficient than looking for the exact hash and then still having to do a string compare. What do you think?
Having the string-length byte as part of the record makes it easy to walk through the records within a bucket.
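This walk can be sketched in C as a scan over length-prefixed records, comparing the length byte before ever touching the text (the sketch drops the packed value and type fields shown in the assembly above):

```c
#include <assert.h>
#include <string.h>
#include <stddef.h>

/* Records are packed back to back: [len][len bytes of name]...
   Returns the offset of the matching record, or -1 if not found.
   (Simplified: the real records also carry a packed value and a type byte.) */
long find_in_records(const unsigned char *table, size_t table_len,
                     const char *name)
{
    size_t want = strlen(name);
    size_t pos = 0;
    while (pos < table_len) {
        size_t len = table[pos];
        /* length-byte compare first; memcmp only on a length match */
        if (len == want && memcmp(table + pos + 1, name, len) == 0)
            return (long)pos;
        pos += 1 + len;   /* the length byte doubles as the skip stride */
    }
    return -1;
}
```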
If you have multiple symbols of the same length (common), then your method will potentially do multiple string compares, where using the hash key won't.
My method only does more than one string compare when there is a hash key collision, which is super rare.
Yes, but it will be exceedingly rare to have a hash-bucket collision at all. Maybe how that gets handled is a wash. The string compare is just a 'repe cmpsb' combo, and it would only occur on a length-byte match. It seems no less efficient than comparing the full hash and then doing a string compare to be sure.
Comments
Evanh, thanks for bringing this up. I'll get to this very soon.
So, the existing source long definitions... will be changed (once by the user) to use the CONstant labels generated... and the included CON section (lines generated by the skip program) are...
This way, the skip program only needs to be run before compilation and its output included in a CON section of the source file.
I didn't want to use any special Python libraries, to keep it simple.
I think these would be useful:
STR(stringaddress)
DEC(value, format)
HEX(value, digits)
BIN(value, digits)
I wouldn't want to steal those names from general usage potential, though. An object would not do that.
If we put those in, there's a temptation to put some serial I/O in, also.
Maybe I should just let objects handle these things?
Any progress yet?
Yes, it's all working. Just making some last-minute enhancements to the symbol table to make it go 100x faster. I will post something here soon.
I just found this: https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed
That's where I was headed.
Seems like a lot of time will be spent computing the hash.
Do you think I can fold the 32-bit hash down to something smaller by XOR'ing the bits together, like (hash[31:16] ^ hash[15:0])?
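For comparison, the two reductions being discussed, XOR-folding the halves versus simply masking off the low bits, hypothetically for a 12-bit table:

```c
#include <assert.h>

#define INDEX_BITS 12

/* Option 1: fold the upper half into the lower half, then mask. */
unsigned fold_index(unsigned hash)
{
    return ((hash >> 16) ^ (hash & 0xFFFF)) & ((1u << INDEX_BITS) - 1);
}

/* Option 2: just take the low bits directly (what OpenSpin does). */
unsigned mask_index(unsigned hash)
{
    return hash & ((1u << INDEX_BITS) - 1);
}
```

For a hash with good avalanche behavior like the Jenkins one above, the low bits are already well mixed, so the fold mostly just adds work per lookup.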