Yeah but it will be hard to give up that hub execution speed and go back to XMM. I'm hoping that the great Bill Henning will pull another rabbit out of a hat and come up with a clever way to do something like XMM without giving up so much performance. My lame idea of overlays can't be the only way of handling this! :-)
Edit: I'm serious. LMM was a great idea and I wouldn't be surprised if you came up with another for this problem.
I have been away for a couple of days, so only brief forum views. But I got to thinking...
P2....
Clock: 2.5x P1
Instructions: 4x P1 (so 10x P1 with clock increase)
Hub: 8x P1 RAM (basically no ROM)
Hub Bandwidth: 2x P1 slot access (i.e. 1 slot in 8 clocks, was 1 in 16)
Hub Bandwidth: 8x (wide) P1 access per slot (i.e. 16x P1 overall)
Cog: New 256-long Aux RAM for video RAM, cog stacks, cog aux registers
And this is before we look at all the fantastic new features such as cog multithreading, hubexec, caching, SDRAM, ADC, special instructions, etc etc etc.
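As a quick check of the arithmetic in the list above, the multipliers can be combined as shown here. This is only a sketch: the variable names are made up for illustration, and the factors are the ones quoted in the post, not measured figures.

```python
# Multipliers quoted in the post (P2 relative to P1); names are illustrative.
clock_factor = 2.5          # clock: 2.5x P1
instr_per_clock_factor = 4  # instructions: 4x P1 at the same clock
slot_rate_factor = 2        # hub slot every 8 clocks instead of every 16
slot_width_factor = 8       # a wide access moves 8x the data per slot

instruction_throughput = clock_factor * instr_per_clock_factor
hub_bandwidth_factor = slot_rate_factor * slot_width_factor

print(instruction_throughput)  # 10.0
print(hub_bandwidth_factor)    # 16
```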
Trying not to think about executing from SDRAM, again, right? Please don't ....
Seriously, what are you up to now? Will you make your goal?
I've given each task its own PTRA/PTRB set. I want to make register remapping work for code, too, not just D and S. Then, I want to add CALLR D/#/@. I think that will pretty well round out hub execution without going to a TLB (I think it's called) for HUGE memory addressing. After all that, I must address SERDES.
One of the things I was planning on doing was adding remapping to instructions, so that they could be remapped just like D and S registers. As I got into it, I realized that would create THE new critical path, which was why I had gotten rid of instruction remapping a while back. So, it wasn't going to work. But, then, I realized that we don't even need it anymore because we now have relative branches, so all code is relocatable, already. This is actually better than remapping instructions, because it allows for differently-sized programs to be loaded into a cog, together, with the bottom area remapped just for the variables. We could write lots of short programs that form soft peripherals and mix and match them at runtime into a cog and have them all work together. They just need to fit, along with their variables. If one program needs 16 longs, while others need less, just set the remapping to 4 blocks of 16, load the programs in, one after another, and start them up in tasks.
Meanwhile, I've made room for the new JMPR instructions, which jump somewhere, while depositing the return address into register $000. This is the 'link register' function that GCC needs. It will be useful for all kinds of other things, as well. To make room, I had to limit JMPLIST to immediate S, only, which is how it would have been used 99% of the time, anyway.
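The mix-and-match idea can be sketched in Python. This is a toy model, not the actual P2 logic: it assumes that a register address below the block size is redirected into a per-task block at the bottom of cog RAM, and that every other address is left alone. The function name `remap` and its parameters are hypothetical.

```python
def remap(addr, task, block_size=16, blocks=4):
    """Toy model: redirect low register addresses into a per-task block.

    Assumption (not from the post): addresses below block_size are treated
    as task-local and land in block number `task`; all other addresses
    pass through unchanged.
    """
    if not 0 <= task < blocks:
        raise ValueError("task out of range")
    if addr < block_size:
        return task * block_size + addr
    return addr


# With 4 blocks of 16, each task sees its own registers $000..$00F.
for task in range(4):
    print(task, hex(remap(0x000, task)), hex(remap(0x00F, task)))

# Addresses above the remapped area are not moved in this model.
print(hex(remap(0x040, 2)))
```

Under this assumption, the same short program can be started in several tasks and each copy gets its own variables, including its own register $000.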
Why not make JMPLIST also store the return address in reg 000, like JMPR? Then you could use JMPLIST for both jumps and calls. To return from such a "CALLIST" you just do a JMP 0.
Andy
That sounds like a good idea. You'd get the fastest 'call' that way, too.
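To illustrate the proposal, here is a small Python model of a jump-list call that deposits its return address in register 0 and returns with an indirect jump through that register. It is a sketch under assumptions: the tuple-based instruction format, the `run` function, and the exact semantics given to `JMPLIST` and `JMP 0` are invented for illustration and are not taken from the P2 design.

```python
def run(program, regs):
    """Execute a toy program; returns the register file when HALT is reached."""
    pc = 0
    while True:
        op = program[pc]
        if op[0] == "JMPLIST":
            # op = ("JMPLIST", table, index_reg): pick a target from the table
            # and, as proposed, leave the return address in register 0.
            _, table, index_reg = op
            regs[0] = pc + 1
            pc = table[regs[index_reg]]
        elif op[0] == "JMP0":
            # Indirect jump through register 0, i.e. the "return".
            pc = regs[0]
        elif op[0] == "ADD":
            _, reg, value = op
            regs[reg] += value
            pc += 1
        elif op[0] == "HALT":
            return regs


program = [
    ("JMPLIST", [2, 4], 1),  # 0: call the routine selected by register 1
    ("HALT",),               # 1: execution resumes here after the return
    ("ADD", 2, 10),          # 2: routine A
    ("JMP0",),               # 3
    ("ADD", 2, 20),          # 4: routine B
    ("JMP0",),               # 5
]

print(run(program, {0: 0, 1: 0, 2: 0})[2])  # 10
print(run(program, {0: 0, 1: 1, 2: 0})[2])  # 20
```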
May I suggest that JMPLIST & JMPR (or CALLIST & CALLR) be named LINKLST & LINK?
This way, they are differentiated from the other CALLs. They use the "LINK REGISTER $000" to deposit the return address.
I presume the link register $000 will obey the task variable re-mapping?
I presume that the other CALL variants will not deposit their return address into $000 (as had previously been discussed)?
Yes, $000 will obey remapping.
Yes, the other CALL variants won't write to $000 - only these new instructions.
Can't this be worked out by using a simple flag bit per task (four per COG)?
Then a RET could have dual behavior, depending on the type of JUMP/CALL that was used to enter the routine it ends.
One could then enter the same piece of code by means of any type of JUMP/CALL, and the RET instruction would behave accordingly: if the flag bit had been set by a previously executed JMPLIST or JMPR (or CALLIST & CALLR, or LINKLST & LINK, as suggested by Cluso99), it would do a Jump 0 and clear the flag bit instead.
Thinking further, perhaps this approach could introduce a new problem, as follows:
A CALLIST to a piece of code that contains an inner (and otherwise normal) CALL to another piece of code. In this case, the first RET encountered must behave normally, executing as a "normal" RET; then there will come another RET, i.e. the true ending for the CALLIST. THIS ONE must behave as a JUMP to 0, and finally clear the flag.
Perhaps a single-bit, two-position stack, exercised by the FIRST normal CALL, could handle this situation, but there would be no cure for multilevel CALLISTs, so they would have to be CLEARLY FORBIDDEN by the instruction's documentation.
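The nesting problem described above can be reproduced with a small Python model. This is only a sketch: the `Task` class, its method names, and the way the flag and the stack are modelled are assumptions made for illustration, not the behaviour of any real or planned P2 instruction.

```python
class Task:
    """Toy model of one task with a single 'entered via link' flag bit."""

    def __init__(self):
        self.link = None        # stands in for register $000
        self.stack = []         # stands in for the normal call stack
        self.link_flag = False  # the proposed per-task flag bit

    def callist(self, return_addr):
        self.link = return_addr
        self.link_flag = True

    def call(self, return_addr):
        self.stack.append(return_addr)

    def ret(self):
        # Dual-behaviour RET: use the link register if the flag is set.
        if self.link_flag:
            self.link_flag = False
            return self.link
        return self.stack.pop()


task = Task()
task.callist(return_addr=100)  # enter routine A via CALLIST
task.call(return_addr=200)     # A makes a normal CALL to routine B
first_ret = task.ret()         # B's RET should go back to 200 ...
print(first_ret)               # ... but the flag sends it to 100
```

With only one flag bit, the inner RET consumes the flag, which is exactly the case the post says would need at least a small flag stack.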
Only one problem: routines that end with RET don't know that they first need to jump to 0.
A simple LINKRET instruction would solve the return and be consistent. Of course, we could just use a LINK $0, but a specific instruction would be better (we have plenty of instruction space for instructions without operands).
Comments
(thanks)
This is seriously one hell of a chip!!!
PTRA/PTRB for each task... Excellent!
Outstanding!!! Great stuff Chip.
That is why I asked some time ago whether we could also have relative CALLs/JUMPs.
BUT still, JMPLIST does not have much use, in my opinion.
CALL-LIST has many uses; one of them is a cross-compiler from 68000 code to P2.
I am thinking about your name proposals.
Yanomani,
If you have programmed the 68000, you know how the TRAP instruction works.
It is a CALL using a fast interrupt address.
In code you write:
TRAP xx
where xx stands for the function number in a JUMP table of simple routines that end with RETurn.
So the only usable instruction to simulate it in a cross-assembler is:
CALL JUMP-table, xx
It is one thing to write a program in PNut that works this way.
But to construct a cross-assembler, it needs to work in a simpler way.
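The TRAP-to-call translation described above amounts to an indexed call through a table of routines. A minimal Python sketch, in which the routines and the names `trap_table` and `trap` are hypothetical:

```python
def put_char():
    return "put_char"


def get_char():
    return "get_char"


# Function number -> routine, like a jump table of routines ending in RET.
trap_table = [put_char, get_char]


def trap(number):
    """Emulate 'TRAP xx' as a call through the jump table."""
    return trap_table[number]()


print(trap(0))
print(trap(1))
```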
I'm not sure that "cross-assembler" is what you mean. After all, PNut, for example, is a cross-assembler: it runs on x86 and assembles code for the Propeller 2.
It sounds like what you are talking about is a program that translates 68000 assembler to P2 PASM.
So I have to ask: why would we want to give the P2 a 68000-compatible instruction set?
You are mostly correct.
BUT my intention is to assemble 68000 ASM code directly into loadable, runnable P2 binary code.
Sorry my friend, I never had the chance to even try programming that kind of processor/instruction.
My only contact with the 68000 family was viewing its internals through a crystal-clear window built into a factory sample, distributed by Motorola's French branch as a commemorative gift, back in the early '80s.
I was thinking more generally of a way to let the RET instruction know how to behave, depending on the type of instruction that called the routine it ends.
Yanomani
Now why would Chip want to bend his P2 into a 68000? Why is translating 68000 to P2 instructions important for Prop users?
Is this even remotely possible? I thought the 68000 had lots of addressing modes that the P2 surely does not.
Speaking of TRAP. That is an interrupt. And you know what we think of interrupts around here!
If you need a subroutine which can be called from JMPLIST and from a normal Call then you can write it like that: It's not the same entry point but I think this can easily be handled in the jmp-list (address-1).
Andy
Chip,
Once before when we were running out of instruction space, I suggested that perhaps the WZ & WC are not required for DECOD2/3/4/5 instructions.
If WZ & WC are not required for these instructions, then they could share
1000011 ff I CCCC DDDDDDDDD SSSSSSSSS DECOD2/3/4/5 D,S/#
thereby freeing up 3 prime instructions...
1000000 ZC I CCCC DDDDDDDDD SSSSSSSSS
1000001 ZC I CCCC DDDDDDDDD SSSSSSSSS
1000010 ZC I CCCC DDDDDDDDD SSSSSSSSS
I also wonder if any of ISOB, NOTB, CLRB, SETB, SETBC, SETBNC, SETBZ, SETBNZ could similarly give up their WZ and/or WC?
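The field layout in the encoding lines above (7 opcode bits, 2 bits for ZC or `ff`, 1 bit for I, 4 condition bits, then 9 bits each for D and S) can be packed and unpacked as shown below. This is a sketch: the function names are made up, and the mapping of `ff` values to particular DECODx variants is an assumption, since the post does not spell it out.

```python
def encode(opcode, ff, imm, cond, d, s):
    """Pack the fields into a 32-bit word: ooooooo ff I CCCC D(9) S(9)."""
    assert 0 <= opcode < 1 << 7 and 0 <= ff < 4 and imm in (0, 1)
    assert 0 <= cond < 16 and 0 <= d < 512 and 0 <= s < 512
    return (opcode << 25) | (ff << 23) | (imm << 22) | (cond << 18) | (d << 9) | s


def decode(word):
    """Unpack a 32-bit word into (opcode, ff, imm, cond, d, s)."""
    return (
        (word >> 25) & 0x7F,
        (word >> 23) & 0x3,
        (word >> 22) & 0x1,
        (word >> 18) & 0xF,
        (word >> 9) & 0x1FF,
        word & 0x1FF,
    )


DECOD_OPCODE = 0b1000011  # the shared opcode from the proposal

# Hypothetical: ff = 1 selects one DECODx variant; condition 1111, D=$010, S=#5.
word = encode(DECOD_OPCODE, ff=1, imm=1, cond=0b1111, d=0x010, s=5)
print(f"{word:032b}")
print(decode(word))
```

Reusing the two ZC bits as a selector is what lets four instructions share one 7-bit opcode, at the cost of losing the flag writes.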
WC is used in DECODx as an over-range flag. If WC & WZ were removed, range and zero testing would have to be done separately.
An example of it being used is in Chip's SDRAM driver.
Not a big deal if it helps gain opcodes.
The DECOD2/3/4/5 instructions can copy the two bits above those being decoded into Z/C via WZ/WC. I've been using these for the Spin interpreter, as they provide mask functions with flag setting.
The ISOB/NOTB... instructions copy the original bit into C, with Z being affected by the long result. Maybe those flag results aren't so important, but they have some uses.
After I get this LINK stuff done, I want to add your USB pin instructions. The only matter after that (and maybe some related CRC instruction(s)) is the SERDES.
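The flag behaviour described above for the bit instructions can be modelled as follows. This sketch covers only SETB, CLRB and NOTB; the function name and return convention are invented, and treating Z as "the long result is zero" is an assumption based on the usual Propeller convention rather than on the post.

```python
MASK32 = 0xFFFFFFFF


def bit_op(op, value, bit):
    """Return (result, c, z) for a toy SETB/CLRB/NOTB on a 32-bit long.

    c is the original state of the selected bit; z is 1 when the result is 0.
    """
    mask = 1 << (bit & 31)
    c = 1 if value & mask else 0
    if op == "SETB":
        result = value | mask
    elif op == "CLRB":
        result = value & ~mask
    elif op == "NOTB":
        result = value ^ mask
    else:
        raise ValueError(op)
    result &= MASK32
    z = 1 if result == 0 else 0
    return result, c, z


print(bit_op("SETB", 0b0000, 2))
print(bit_op("CLRB", 0b0100, 2))
print(bit_op("NOTB", 0b0101, 0))
```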