It might be useful to have multi-threading within an LMM cog, but it seems like more of a novelty. It would be more useful to dedicate one cog per thread. This would allow the programmer to get the full benefit of the Prop's multi-processor hardware. Of course, there will be cases where 8 cogs is just not enough, and multi-threading within a cog would be required. I suspect that programs with this requirement wouldn't fit in 32K of memory, and they would have to run in XMM mode.
In the ideal world it would be nice to allow for a mix of COG, LMM and XMM threads running at the same time. Speed-sensitive threads would run in COG or LMM mode, and slower background threads could be XMM or multi-threaded XMM.
Well my RamPage has 128kB of RAM, so I'm feeling tempted to try this out!
My only problem with it is that it is so, so very obfuscated. It'd be nice to have some clue as to what's going on under the hood...
On the other hand, I guess it'd be a good lesson in C syntax to try to figure it out...
Does that work? What does he mean by "Now rename the provided c_bios.bin to C and run the emulator."?
I wonder how fast it runs?
DrAc,
exactly as it says, the emulator looks for 3 files at startup, called "A", "B", and "C" (single uppercase letter).
C is the renamed c_bios.bin; A and B are the CMP64.COM file (extracted from the kaypro zip archive), again renamed to a single letter.
While playing with it I've done some helper scripts (for linux), I can send them to you if you want.
There are two problems to solve:
1. telling LCC to relax type checking for pointers (dunno if it's possible at all; I was quickly skimming through the LCC documentation and it said DON'T treat pointers as integers). Trying to manually cast anything but function arguments is the road to the sanitarium
...even the syntax highlighting appears a little confused in pairing some of the braces
2. telling LCC to avoid optimizing away what appears to be useless code (tried #pragma optimize(0), but it's not the right thing).
Ok, that makes sense. I suppose one needs to "un-obfuscate" the code - change "C" to something like "CPM", maybe replace ";" with "; then carriage return / line feed", pair up all the braces.
code::blocks had a pretty good go at syntax highlighting and there is an option to pair braces which helps (control/shift then click on a brace and the other one should highlight).
How far have you got with those two problems?
Ray was referring to compiling it with Catalina, as for running it on linux I have it working, compiled and with some disks imported.
P.S. the trouble with trying to de-obfuscate it is that it is fundamentally a 3 row program: system(stty...) to set the console at the start, another system() at the end to reset it, and ALL the code in the middle row: a for instruction (to iterate the fetch, I guess) with a dozen or more nested levels of the "?" operator. Very hard to linearize, if possible at all.
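To make the "nested ? operator" problem concrete, here is a tiny sketch of the pattern and its linearized form. It is not taken from the emulator source; the function names and the three made-up "opcodes" are purely illustrative. The idea is that each `cond ? (side effect, value) : ...` arm becomes one `if` branch.

```c
/* Obfuscated style: a single expression where nested ?: acts as control
 * flow and the comma operator smuggles in side effects.
 * (Hypothetical example, not code from the emulator.) */
static int step_nested(int op, int *acc)
{
    return op == 0 ? (*acc += 1, 1)
         : op == 1 ? (*acc -= 1, 1)
         : op == 2 ? (*acc = 0, 1)
         : 0;
}

/* The same logic linearized: one if-branch per ?: arm. */
static int step_linear(int op, int *acc)
{
    if (op == 0) { *acc += 1; return 1; }
    if (op == 1) { *acc -= 1; return 1; }
    if (op == 2) { *acc = 0;  return 1; }
    return 0;   /* unknown op: nothing done */
}
```

With a dozen nesting levels the mechanical rule is the same, but each arm has to be peeled off from the outside in, which is where the pain comes from.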
I see down the bottom of the webpage with the source code is a "spoiler" where he explains which array is the 64k memory, and which one has the registers, and a bit about how it works.
Thanks to all of Ross' fantastic help over the last few pages on this thread, I do have a "Hello World" running on code::blocks using external memory (512k) and ready to drop in some real code. At work at the moment so can't test it on a real board. That means everyone else gets an 8 hour head start on me
First one to get this working on Catalina wins a prize...
It might be useful to have multi-threading within an LMM cog, but it seems like more of a novelty. It would be more useful to dedicate one cog per thread. This would allow the programmer to get the full benefit of the Prop's multi-processor hardware. Of course, there will be cases where 8 cogs is just not enough, and multi-threading within a cog would be required. I suspect that programs with this requirement wouldn't fit in 32K of memory, and they would have to run in XMM mode.
In the ideal world it would be nice to allow for a mix of COG, LMM and XMM threads running at the same time. Speed-sensitive threads would run in COG or LMM mode, and slower background threads could be XMM or multi-threaded XMM.
That's the goal. The XMM part is just waiting for some more time for me to complete it.
I'm glad someone else sees the potential benefits!
First one to get this working on Catalina wins a prize...
Don't knock yourself out, guys - this program is quite likely to suffer the same problem as the toledo chess program, so it won't work until I release Catalina 3.4 (which contains the necessary code generator fixes).
I have a few techniques for "un-obfuscating" C code - I'll try them tonight if I can.
Yes I suspect it will need "unobfuscating" if it doesn't compile first go.
I tried tcc and got this
C:\tcc>tcc toledo2.c
toledo2.c:37: warning: assignment makes integer from pointer without a cast
toledo2.c:37: warning: assignment makes integer from pointer without a cast
toledo2.c:37: warning: assignment makes pointer from integer without a cast
toledo2.c:37: warning: assignment makes pointer from integer without a cast
toledo2.c:37: warning: assignment makes integer from pointer without a cast
toledo2.c:37: warning: assignment makes integer from pointer without a cast
toledo2.c:42: warning: assignment makes pointer from integer without a cast
toledo2.c:45: warning: assignment makes pointer from integer without a cast
toledo2.c:50: memory full
C:\tcc>
Are all those #define's just there to replace normal code with obfuscated gibberish?
main.c:36: operands of = have illegal types `int' and `pointer to struct __iobuf'
main.c:36: warning: expression with no effect elided
main.c:36: operands of = have illegal types `int' and `pointer to struct __iobuf'
main.c:36: type error in argument 4 to `fread'; found `int' expected `pointer to struct __iobuf'
main.c:36: type error in argument 1 to `fclose'; found `int' expected `pointer to struct __iobuf'
main.c:36: operands of = have illegal types `int' and `pointer to struct __iobuf'
main.c:36: operands of = have illegal types `int' and `pointer to struct __iobuf'
main.c:41: operands of ?: have illegal types `pointer to unsigned int function(pointer to const void,unsigned int,unsigned int,pointer to struct __iobuf)' and `pointer to unsigned int function(pointer to void,unsigned int,unsigned int,pointer to struct __iobuf)'
main.c:42: type error in argument 1 to `fseek'; found `int' expected `pointer to struct __iobuf'
main.c:42: operands of ?: have illegal types `int' and `pointer to unsigned char'
main.c:45: warning: expression with no effect elided
main.c:45: type error in argument 1 to `fgetc'; found `int' expected `pointer to struct __iobuf'
main.c:45: warning: expression with no effect elided
main.c:46: operands of ?: have illegal types `int' and `pointer to unsigned char'
main.c:46: warning: expression with no effect elided
main.c:46: operands of ?: have illegal types `pointer to unsigned char' and `unsigned char'
main.c:52: warning: expression with no effect elided
main.c:53: warning: expression with no effect elided
main.c:54: warning: expression with no effect elided
main.c:55: operands of ?: have illegal types `int' and `pointer to unsigned char'
main.c:55: warning: expression with no effect elided
main.c:55: warning: expression with no effect elided
main.c:55: operands of ?: have illegal types `pointer to unsigned char' and `unsigned char'
main.c:56: warning: expression with no effect elided
main.c:57: type error in argument 1 to `fclose'; found `int' expected `pointer to struct __iobuf'
main.c:57: warning: expression with no effect elided
main.c:57: type error in argument 1 to `fclose'; found `int' expected `pointer to struct __iobuf'
main.c:57: type error in argument 1 to `fclose'; found `int' expected `pointer to struct __iobuf'
main.c:57: warning: missing return value
Catalina Compiler 3.3
Process terminated with status 1 (0 minutes, 0 seconds)
17 errors, 12 warnings
I spent a little time seeing if anyone on the internet has already de-obfuscated it but nothing obvious.
So - where do you start? Do a "find and replace" with all the #defines?
separate out into more lines?
The author says the original code was eaten by his dog!
I half-deobfuscated it last year, and fixed the issues that could cause warnings or trouble with some compilers. I was in fact looking for that piece of code yesterday but couldn't find it. Maybe I didn't like the result or something, but I rarely throw away anything. I didn't look very hard though, it's probably somewhere around.
Tor, that would be fantastic if you have decoded this program!
(is "de-obfuscated" a word?)
Ross said
What exactly are you hoping to achieve?
Well, first of all, if Tor can make this work, it would be really cool.
More specifically, I have been looking at the Dracblade running CP/M. Firstly, CP/M mostly ran on real Z80 chips but most programs were backwards compatible with the 8080 so really you only need to write using 8080 code for an emulation.
The Dracblade takes over 90 pasm instructions to execute one 8080 instruction.
... S... L...O...W...
I am *absolutely* sure that number can be reduced. And there are two things that make me say that:
The first is caching. I am absolutely intrigued by caching, especially when people here talk about going straight past a "direct mapped" cache to a "2 way associative fill" cache http://en.wikipedia.org/wiki/CPU_cache.
I have this idea that maybe, with efficient C code and a good caching algorithm, maybe one can write a CP/M emulator faster in C than in pasm.
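As a rough illustration of the "direct mapped" idea, here is a minimal read-only sketch in C. Everything in it is an assumption made for the example: the line size, the number of lines, and the `backing` array standing in for the external RAM chip. Each address maps to exactly one cache line; a miss refills that whole line from the slow memory.

```c
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 16u   /* bytes per cache line (assumed) */
#define NUM_LINES 64u   /* 64 lines * 16 bytes = 1K of cache (assumed) */

static uint8_t backing[65536];    /* stand-in for the external RAM chip */
static unsigned long miss_count;  /* counts slow line refills */

typedef struct {
    int      valid;               /* line holds real data */
    unsigned tag;                 /* which 1K "page" the line came from */
    uint8_t  data[LINE_SIZE];
} CacheLine;

static CacheLine cache[NUM_LINES];

/* Hypothetical slow path: fetch one whole line from the external RAM. */
static void backing_read_line(unsigned base, uint8_t *dst)
{
    memcpy(dst, &backing[base], LINE_SIZE);
    miss_count++;
}

/* Read one byte of the emulated 64K space through the cache. */
static uint8_t cached_read(uint16_t addr)
{
    unsigned offset  = addr % LINE_SIZE;    /* byte within the line */
    unsigned line_no = addr / LINE_SIZE;    /* line number in the 64K space */
    unsigned index   = line_no % NUM_LINES; /* the only slot it may occupy */
    unsigned tag     = line_no / NUM_LINES; /* tells colliding lines apart */
    CacheLine *line  = &cache[index];

    if (!line->valid || line->tag != tag) { /* miss: refill the whole line */
        backing_read_line(addr - offset, line->data);
        line->tag = tag;
        line->valid = 1;
    }
    return line->data[offset];
}
```

Writes are left out to keep it short; a real emulator would also need a write-through or write-back policy. A 2-way associative version would keep two candidate lines per index and pick one to evict on a miss.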
The second is by studying clever C emulation code. For instance, you read in an 8080 opcode, one of 256 byte values. In a simplistic way, you write a "switch" in C that selects one of 256 values and jumps to that location. But what if your opcode is #250 or #251 etc.? Lots of searches for nothing. Smarter, you do something like in the CP/M source code where you note that each instruction is a jump, and each jump instruction is a 3 byte instruction, so you build a table using those values and then use a version of self-modifying code to take the opcode, multiply by 3, then add to the start of the jump table, poke it back into a jump, and hey presto, it jumps to the correct location.
I do not know if such things are possible in C, but I have seen some intriguing clues that maybe it is using arrays.
I do not know if this is possible in PASM. I need to study the jump instruction and the opcodes it generates.
But there are some clues in this obfuscated code, and on other sites on the internet, that maybe you can be even smarter again. 256 jumps is 2^8 so in theory, the least number of binary tests and jumps is 8 rather than 256. Probably less than 256 since some codes do not exist in 8080, only Z80.
And I believe that one can sort 8080 opcodes using some of the bit values, and then based on 2 of the 8 bits, jump to the "MOV" group or the "ADD/SUB" group. And so maybe one can work out the jumps faster and hence run the emulation faster.
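A small sketch of that bit-group idea in C. In the 8080 encoding the top two bits of the opcode do split the instruction set this way: 01dddsss is the MOV block (with 0x76 being HLT) and 10ooosss is the register ALU block (ADD, ADC, SUB, SBB, ANA, XRA, ORA, CMP). The type and function names below are invented for the example.

```c
#include <stdint.h>

/* Top two opcode bits select one of four broad 8080 groups. */
typedef enum {
    GROUP_MISC_LOW  = 0, /* 00xxxxxx: loads, INR/DCR, rotates, 16-bit ops */
    GROUP_MOV       = 1, /* 01dddsss: MOV dst,src (0x76 is HLT) */
    GROUP_ALU       = 2, /* 10ooosss: ADD ADC SUB SBB ANA XRA ORA CMP */
    GROUP_MISC_HIGH = 3  /* 11xxxxxx: jumps, calls, returns, stack, I/O */
} OpGroup;

static OpGroup op_group(uint8_t opcode)
{
    return (OpGroup)(opcode >> 6);
}

/* Bits 5..3: destination register for MOV, operation number for ALU. */
static unsigned op_mid(uint8_t opcode)
{
    return (opcode >> 3) & 7u;
}

/* Bits 2..0: source register (B=0 C=1 D=2 E=3 H=4 L=5 M=6 A=7). */
static unsigned op_src(uint8_t opcode)
{
    return opcode & 7u;
}
```

For example, 0x78 (MOV A,B) decodes as group 1 with destination 7 and source 0, so the whole 64-entry MOV block could be served by one handler. Whether this beats a plain 256-entry table on the Propeller is something that would have to be measured.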
The answers to these questions lie in the minds of those much smarter than me!
But I think the clues lie in this obfuscated C code, if only a mere mortal like me could decode it.
Hence my motivation to study this code. Even if nothing comes out of it, maybe I could discover an algorithm in C to speed up a "switch" algorithm by not having to search through every term but rather by using a jump table based on array.
Input from resident C experts would be most appreciated.
I'd love to see a CP/M emulation running in C on the propeller!
You don't need clever C code - this is how a switch statement works anyway. It does not do a linear search for the case to execute - it uses a jump table.
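For comparison, an explicit jump table can also be written by hand in C as an array of function pointers indexed by the opcode. The sketch below is only an illustration: the `Cpu` struct and handler names are made up, only four of the 256 opcodes are filled in, and it uses C99 designated initializers, which an LCC-based compiler may not accept (the table could be filled in at run time instead).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, minimal CPU state: just enough for the example. */
typedef struct {
    uint8_t  a;       /* accumulator */
    uint8_t  b;       /* B register */
    uint16_t pc;      /* program counter */
    int      halted;
} Cpu;

typedef void (*OpHandler)(Cpu *cpu);

static void op_nop(Cpu *cpu)     { (void)cpu; }          /* 0x00 NOP */
static void op_inr_a(Cpu *cpu)   { cpu->a++; }           /* 0x3C INR A (flags ignored) */
static void op_hlt(Cpu *cpu)     { cpu->halted = 1; }    /* 0x76 HLT */
static void op_mov_a_b(Cpu *cpu) { cpu->a = cpu->b; }    /* 0x78 MOV A,B */

/* One slot per opcode; slots not listed here are NULL. */
static const OpHandler dispatch[256] = {
    [0x00] = op_nop,
    [0x3C] = op_inr_a,
    [0x76] = op_hlt,
    [0x78] = op_mov_a_b,
};

/* Fetch one opcode and run its handler.
 * Returns 0 on success, -1 for an opcode with no handler. */
static int step(Cpu *cpu, const uint8_t *mem)
{
    assert(cpu != NULL && mem != NULL);
    uint8_t opcode = mem[cpu->pc++];
    OpHandler handler = dispatch[opcode];   /* one indexed load, no searching */
    if (handler == NULL)
        return -1;
    handler(cpu);
    return 0;
}
```

The cost per instruction is one table lookup plus an indirect call, regardless of whether the opcode is 0 or 251. A dense `switch` over the same values will typically compile to something very similar, which is the point made above.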
Dr_A,
I would be most impressed if anyone could write a PASM Z80 emulator much faster than the ones we have now.
I have tried a few approaches to tackling this problem with advice from all over the place and so did PullMoll.
Doing it in C is clearly impossible. There I said it so it will now happen:)
Yes I would have to agree that the Z80 is about as fast as it can get.
Thanks for that link - very interesting. Though longer than the Toledo code.
However, I do wonder about the 8080. The reason I say that is that so much time is spent getting data in and out of the ram chip and caching ought to make that faster. And if you want a cache then you need some space for the code, and (I believe) there might be space in the 8080 code but not the Z80 one (the Z80 was LMM only I think).
Talking about pasm for the moment, with hardware counters I think it should be possible to get a byte out of an sram with 4 pasm instructions (cf about 20 for the Dracblade). But the catch is that those bytes come out as a data stream rather than random access, so that naturally leads to a cache, and for speed, probably a cog cache rather than a hub cache. I don't know if it will all fit in a cog though. I would have said no, but then along comes that tiny Toledo code in C, and then I found this little discussion thread as well http://www.vintage-computer.com/vcforum/archive/index.php/t-7391.html
Way off topic here but there is a new Catalina thread so I hope Ross does not mind.
The original 8080 emulator "PropAltair" squeezed the entire emulation into COG except for the registers and the opcode dispatch table.
The ZiCog Z80 emulator has been through a few mutations. Originally all 8080 ops were handled in COG, with the extra Z80 ops being handled by overlays.
There was/is a #define that could be used to select 8080 only or full Z80.
There was/is a #define to select using HUB RAM as the memory space or external RAM.
The last version or so of ZiCog changed to use LMM for the extra Z80 ops. This saved space for more goodies in COG and had no effect on speed as those ops rarely get used.
So it would still be possible to take ZiCog and build it for 8080 only mode using HUB RAM. As HUB RAM remaining is only 16K or so then you will need caching to get the full 64K swapped in and out as required.
N.B. ZiCog's 8080 mode is not flag perfect, as at least one version of PropAltair was.
N.B. You cannot rebuild CP/M on an 8080 system as the SIMH build setup we use requires some Z80 ops, probably just the string ops.
Here are some things to think about. The first was made public after my initial post.
In GCC we can tell any function in an XMM program to run in HUB memory with a minor C code statement. Can Catalina do this without major rework?
How many DMIPS can you get out of an 80MHz Propeller? We're getting about 3.95 now.
Can Catalina generate PASM that runs in a COG from C source?
I don't have much else to compare because I don't give your work any attention.
Maybe you can make a big list of things that Propeller GCC could never do as a challenge
After all your insinuations, this is the best you can come up with? I guess I should feel flattered that you obviously feel so threatened by Catalina.
Ross.
Surely you jest. I'm not threatened.
I look forward to your dhrystone2.2 performance reply.
What was it? 3000 dhrystones/second with the optimizer?
Your C plane needs some hangar time.
Comments
Now you have me intrigued! Send me a PM, then.
Ross.
That's the goal. The XMM part is just waiting for some more time for me to complete it.
I'm glad someone else sees the potential benefits!
Ross.
Don't knock yourself out, guys - this program is quite likely to suffer the same problem as the toledo chess program, so it won't work until I release Catalina 3.4 (which contains the necessary code generator fixes).
I have a few techniques for "un-obfuscating" C code - I'll try them tonight if I can.
Ross.
I tried tcc and got this
Are all those #define's just there to replace normal code with obfuscated gibberish?
Almost all of them, yes.
Ross.
I spent a little time seeing if anyone on the internet has already de-obfuscated it but nothing obvious.
So - where do you start? Do a "find and replace" with all the #defines?
separate out into more lines?
The author says the original code was eaten by his dog!
Ross.
-Tor
Catalina 3.4 has now been released - see here
Ross.
I would be most impressed if anyone could write a PASM Z80 emulator much faster than the ones we have now.
I have tried a few approaches to tackling this problem with advice from all over the place and so did PullMoll.
Doing it in C is clearly impossible. There I said it so it will now happen:)
Here is a link to the C source of the gnusim8085 execution engine: http://bazaar.launchpad.net/~gnusim8085-admins/gnusim8085/trunk/view/head:/src/8085-instructions.c
Strangely enough it uses a jump table instead of a switch, all in all does not look like it is built for speed.
Thanks for that link - very interesting. Though longer than the Toledo code.
However, I do wonder about the 8080. The reason I say that is that so much time is spent getting data in and out of the ram chip and caching ought to make that faster. And if you want a cache then you need some space for the code, and (I believe) there might be space in the 8080 code but not the Z80 one (the Z80 was LMM only I think).
Talking about pasm for the moment, with hardware counters I think it should be possible to get a byte out of an sram with 4 pasm instructions (cf about 20 for the Dracblade). But the catch is that those bytes come out as a data stream rather than random access, so that naturally leads to a cache, and for speed, probably a cog cache rather than a hub cache. I don't know if it will all fit in a cog though. I would have said no, but then along comes that tiny Toledo code in C, and then I found this little discussion thread as well http://www.vintage-computer.com/vcforum/archive/index.php/t-7391.html
I suspect you have already done all this though.
Jazzed? Still waiting for that PM! Or send me an email instead. I'd be interested in finding out more.
Ross.
I wish I knew what you were talking about. A clue, perhaps?
Ross.
Can I assume from your continued lack of response that this "claim" (whatever it was) has come to nothing?
Ross.