Great! I did not realize those instructions were so flexible .... so the stacks would be in Hub in that case ....
Cheers, B.
X/Y stack space was in a special area of COG memory referred to as AUXRAM. It could be used for stack/data or a Color Look Up Table (CLUT) used for video.
oops!
A quick look back at the old notes shows that the indirect stuff used the CCCC bits in the opcode, not the PTRx stuff.
The indirect feature was nice too, but I don't think that made the cut in the new P2.
You're kidding, right! So all the stuff that makes a great Forth computer has been culled ....... bugger!
Grab an FPGA and a copy of the February 2014 P2Hot emulation and you can play with PFTH and the P2 like I did.
You are not getting the point! I am looking at what the current thinking is for the P2 instruction set, not the P2-HOT .... lots of things have been culled.
I want to push back to Chip before it is too late, if we can't make a decent Forth machine out of the P2 because he left out some essential PUSH/POP-style instructions.
The P2 should be a more than decent Forth machine running a version of Tachyon tailored for its final architecture. Tachyon screams on the P1 and should really scream on the P2.
Yes, but this thread is about PFTH! Not Tachyon ....
I had a look at Tachyon, I hate the syntax, the code is too hard to follow, it's unreadable.
I prefer PFTH, it's easy to understand AND quick (and with a bit of tweaking much faster) .... a good example of Forth and PASM to teach the young ones ....
I am looking forward to Forth on the Propeller 2, especially if it is both educational and functional. So I still prefer PFTH, while I appreciate Tachyon's pushing the boundaries. You just can't have one Forth do everything on the Propeller. If you stretch in one direction, you ignore another. Don't forget PropForth. I am sure it will try to keep up.
@bmentink
I believe these were the most recent opcodes presented on the new P2. See here
@ozpropdev Still don't get where you got the following from.
PUSHA D/# (alias for WRLONG D/#,PTRA++)
PUSHB D/# (alias for WRLONG D/#,PTRB++)
POPA (alias for RDLONG D,--PTRA)
POPB (alias for RDLONG D,--PTRB)
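A minimal Python model of the aliases above, assuming each access moves PTRA by one long (4 bytes) and the stack grows upward. It shows why auto-increment/decrement is what makes these true PUSH/POP operations: PUSHA writes and then advances, POPA steps back and then reads, so no separate ADD/SUB is needed per stack access. The class and method names are hypothetical; this is a sketch, not P2 code.

```python
class HubStack:
    """Hypothetical model of a PTRA-based stack in hub RAM."""

    def __init__(self, base: int, size_longs: int) -> None:
        self.mem = {}                       # sparse hub RAM: byte address -> long
        self.base = base                    # lowest stack address
        self.limit = base + 4 * size_longs  # one past the highest stack slot
        self.ptra = base                    # PTRA: next free slot

    def pusha(self, value: int) -> None:
        """PUSHA D ~ WRLONG D,PTRA++ : write at PTRA, then advance it."""
        if self.ptra >= self.limit:
            raise OverflowError("stack overflow")
        self.mem[self.ptra] = value & 0xFFFFFFFF  # store as a 32-bit long
        self.ptra += 4

    def popa(self) -> int:
        """POPA D ~ RDLONG D,--PTRA : step PTRA back, then read."""
        if self.ptra <= self.base:
            raise IndexError("stack underflow")
        self.ptra -= 4
        return self.mem[self.ptra]


s = HubStack(base=0x1000, size_longs=16)
s.pusha(1)
s.pusha(2)
print(s.popa(), s.popa(), hex(s.ptra))  # 2 1 0x1000
```

Without the side effect on PTRA, every push or pop would cost an extra instruction to adjust the pointer.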
The link you gave for the latest instruction set was the link I already quoted; it does not have the above.
It does have a comment "Aliases for WRLONG/RDLONG: PUSHA/PUSHB/POPA/POPB" at the bottom of the quote.
Confused! Did you write the above quote, or did Chip? If it was Chip, you must have got it from another source, not the above link ...
@Bernie
Hmm... I see the problem
The PUSH/POP stuff I posted is from an instruction set dated 9 April 2014. I also have a set dated 14 April 2014.
I can't seem to find either set on the forum now.
Anyhow, the 16 April 2014 set refers to PUSHA/POPA/PUSHB/POPB as aliases for WRLONG etc.
In order to be a true PUSH/POP it MUST be auto increment/decrement.
Based on the further evidence of CALLA/RETA one can assume that the infrastructure exists for auto inc/dec.
Sorry for the confusion.
BTW. The link in your post is broken.
Cheers
Brian
pushx/y ptra
popx/y ptra
and something like jmp *ptra, where I can jump to the address stored at the location pointed to by ptra (I don't think that exists, or I don't know how to code it). I'm still looking, because that is where the big speed gain is.
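The "jmp *ptra" wish is essentially the NEXT step of a threaded-code Forth inner interpreter: the instruction pointer addresses a cell, that cell holds the address of a primitive, and control jumps there. A minimal Python sketch, assuming a list stands in for the compiled thread, a dict maps hypothetical code addresses to primitives, and address 0 means EXIT:

```python
def one(stack):            # push the constant 1
    stack.append(1)

def dup(stack):            # ( n -- n n )
    stack.append(stack[-1])

def add(stack):            # ( a b -- a+b )
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)

# Hypothetical code addresses for each primitive.
PRIMITIVES = {0x100: one, 0x104: dup, 0x108: add}
EXIT = 0

def run(thread, primitives, stack):
    """Execute a compiled word: a list of primitive addresses."""
    ip = 0                          # plays the role of PTRA / Forth IP
    while True:
        target = thread[ip]         # fetch the address stored at [IP]
        ip += 1                     # post-increment, like PTRA++
        if target == EXIT:
            return stack
        primitives[target](stack)   # the indirect jump: "jmp *ptra"

# 1 DUP + DUP +  ->  4
print(run([0x100, 0x104, 0x108, 0x104, 0x108, EXIT], PRIMITIVES, []))  # [4]
```

Every primitive pays for this fetch-and-jump, which is why a cheap indirect jump matters so much for Forth speed.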
I wonder if the following instruction can be used as an indirect jump:
---- 1111101 11 1 CCCC 0 nnnnnnnnnnnnnnnnn CALLA #abs (call to 17-bit absolute address using PTRA)
---- 1111101 11 1 CCCC 1 nnnnnnnnnnnnnnnnn CALLA @rel (call to 17-bit relative address using PTRA)
Maybe we could make the CALL a JMP by modifying the SP to remove the stacked return address .... I seem to remember using this trick before in the past ....
The only problem is there is no CALLA @D, so we would have to use the @rel somehow ...
There is also this command:
ZCWS 1011111 ZC I CCCC DDDDDDDDD SSSSSSSSS JMPSW D,S/@ (jump to S/@, store return address in D, WZ/WC to save/load flags)
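A hypothetical Python model of the two tricks above. It is not cycle-accurate. It assumes that CALLA pushes the address of the next instruction at PTRA and post-increments by one long, and that JMPSW with a register S jumps to the address held in that register (as P1's JMPRET does). Under those assumptions, pointing S at a register and ignoring D gives an indirect jump. The class, the register names, and the "pc + 1" return-address convention are illustrative only.

```python
class CogModel:
    def __init__(self, ptra=0x2000):
        self.pc = 0          # program counter (one step per instruction here)
        self.ptra = ptra     # hub stack pointer used by CALLA/RETA
        self.hub = {}        # sparse hub RAM: byte address -> long
        self.regs = {}       # named cog registers

    def calla(self, target):
        """CALLA: push return address at PTRA (post-increment), then jump."""
        self.hub[self.ptra] = self.pc + 1
        self.ptra += 4
        self.pc = target

    def drop_return(self):
        """Callee discards the stacked return address, so the CALLA
        behaves like a plain JMP (the 'modify the SP' trick)."""
        self.ptra -= 4

    def jmpsw(self, d, s):
        """JMPSW D,S: save return address in register D, jump to the
        address held in register S (an indirect jump)."""
        target = self.regs[s]        # read S first in case D == S
        self.regs[d] = self.pc + 1
        self.pc = target


cog = CogModel()
cog.pc = 10
cog.calla(0x40)
cog.drop_return()                    # effectively a JMP to $40
print(hex(cog.pc), hex(cog.ptra))    # 0x40 0x2000

cog.regs["target"] = 0x80            # address computed at run time
cog.jmpsw("scratch", "target")
print(hex(cog.pc))                   # 0x80
```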
@Bernie, Thanks! That is the thread I was looking for with the final consolidated version of PFTH-P2. Shortly after this, the P2 went to jelly, and apart from a fun spare-time exercise playing on a P2 that will never be, I haven't done anything further or thought any more about it. Once we get a more tangible P2, I'm sure I'll start up the hunt again.
Note, this spec was published AFTER the dates of the posted opcode lists ... so maybe they have been updated again. I wish Chip would publish his latest offerings ....
Chip's busy. We don't want to hear anything from Chip until there is a post that says, "Try this on your FPGAs!"
Execute code from HUBRAM - last I recall, this is still on the feature list for the P2. I think everyone liked that feature! It gives you a MUCH larger code space if you can actually execute code that lives outside the 2KB COG RAM. It opens up a big, flat address space to play with!
I did not know that each COG can execute code in hub as well as in its own RAM ... interesting ..
Cool eh, we've been calling it HubExec for short. Its existence has a history, and its many details caused Chip a lot of stress during the Prop2-HOT development cycle, but it was also deemed important to achieve.
I don't think anyone even wanted to ask if he was including it in the Prop2-COLD design but he's been adamant it's going to happen. I'd hazard a guess that Chip is re-implementing HubExec in the new design right now.
@Dave Hein
Hi Dave, in the following definition of pop1, what is the purpose of setting the zero flag? I don't see anywhere that pop1 is used where the zero flag is then tested.
Can it simply be removed? It would then allow more "pop" stack operations to be in-lined without penalty ...
pop1 popx parm1
cmp parm1, #0 wz
pop_ret
pop2_ret ret
Also, I notice in the following fragment an instance where two operations could be one ..
semicolonfunc mov temp1, #0
wrlong temp1, a_state
I think you can do a simple wrlong #0, a_state, since the instruction supports it (was this not possible on the P1?)
_jzfunc uses the zero flag after calling pop1. "wrlong #0, a_state" doesn't work on the P1. I didn't know about that on the P2. That's great if it works.
If _jzfunc is the only place, then we can just set the flag there, rather than burden every pop1 .. as mindrobots already did in the attached file
He just did not delete the compare from pop1, which can now be done ... then pop1 can be in-lined in more places ..
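Here is the proposed refactor, sketched in Python rather than PASM. pop1 only pops, and the one caller that needs a zero test (_jzfunc) does its own comparison. In PASM this means moving `cmp parm1, #0 wz` out of pop1 and into _jzfunc. The function names mirror the PASM labels, but the signatures and the instruction-pointer return value are hypothetical.

```python
def pop1(stack):
    """Pop the top of the data stack. No flag side effects, so it is
    cheap enough to inline anywhere."""
    return stack.pop()

def jzfunc(stack, ip, target):
    """Conditional branch: pop a value, jump to target if it is zero.
    The zero test lives here, the only place that needs it.
    Returns the next instruction pointer."""
    value = pop1(stack)
    return target if value == 0 else ip + 1

print(jzfunc([5, 0], ip=3, target=10))   # 10 (top of stack is 0: branch)
print(jzfunc([0, 7], ip=3, target=10))   # 4  (top of stack is 7: fall through)
```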
If the following is correct, using #constant should work ..
--LS 1110001 0L I CCCC DDDDDDDDD SSSSSSSSS WRLONG D/#,S/PTRA/PTRB (waits for hub)
Yes, that works. It will be interesting to see how fast pfth runs on the P2. One thing that concerns me is the use of hub reads and jumps in the pfth kernel. Jumps cause a pipeline flush, which wastes cycles. Hub reads will cause hub stalls. The fastest Forth implementation for P2 may be one that uses the hubexec mode instead of doing all the primitive operations in a kernel. This will require more code memory, but it may be a lot faster.
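This is a back-of-envelope cost model for the kernel-versus-hubexec trade-off described above. Every cycle count passed in is a hypothetical parameter, not a measured P2 timing. Only the structure of the comparison matters: a kernel pays for a token fetch from hub plus a jump (pipeline flush) on every primitive, while inlined hubexec code pays mostly for the primitive bodies. The model ignores the larger code size of the hubexec approach.

```python
def kernel_cycles(n_prims, body, hub_read, jump_penalty):
    """Threaded kernel: each primitive costs a hub read to fetch the
    next token, a jump with its pipeline flush, and the primitive body."""
    return n_prims * (hub_read + jump_penalty + body)

def hubexec_cycles(n_prims, body, fetch_overhead):
    """Inlined native code under hubexec: the primitive body plus any
    per-primitive instruction-fetch overhead."""
    return n_prims * (body + fetch_overhead)

# Illustrative inputs only; substitute real timings when they are known.
k = kernel_cycles(100, body=4, hub_read=8, jump_penalty=4)
h = hubexec_cycles(100, body=4, fetch_overhead=2)
print(k, h, round(k / h, 2))   # 1600 600 2.67
```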
Hmmm .. if we could execute most of the code AND have the stacks and some variables in COG, then hub reads will be minimal, I would like to see that approach benchmarked against running entirely in hubexec.
It "should" still be faster, as the hub is clocked at 1/2 the speed of the cogs .. the overflow of variables and/or code could be hubexec ..... maybe we could get the best of both worlds.
We could have a split Forth system: run in the COG where possible, with the remainder in hubexec. I believe the COGs have 4KB of RAM now?
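One way to picture the split system is to fill cog RAM with the most frequently used primitives until a budget runs out, and leave the rest in hub RAM to run under hubexec. Below is a greedy first-fit sketch in Python. It assumes the 512-long cog RAM size given in this thread. The reserved space, primitive names, and sizes are made up for illustration.

```python
COG_LONGS = 512  # cog RAM size in longs, as stated in this thread

def split_primitives(prims, reserved):
    """prims: list of (name, size_in_longs), hottest first.
    reserved: longs kept for stacks, variables, and the kernel itself.
    Returns (in_cog, in_hub) lists of primitive names."""
    budget = COG_LONGS - reserved
    in_cog, in_hub = [], []
    for name, size in prims:
        if size <= budget:      # fits in what is left of cog RAM
            in_cog.append(name)
            budget -= size
        else:                   # overflow goes to hub, run via hubexec
            in_hub.append(name)
    return in_cog, in_hub

prims = [("dup", 3), ("drop", 2), ("big", 500), ("swap", 4)]
print(split_primitives(prims, reserved=100))   # (['dup', 'drop', 'swap'], ['big'])
```

Ordering the list by how often each primitive runs means the greedy pass keeps the hottest code in the cog.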
It "should" still be faster, as the hub is clocked at 1/2 the speed of the cogs ..
Prop2 Hub is the same speed as its Cogs - 16 clocks and 16 operations per rotation. Albeit an "operation" here has many facets, from the CORDIC to parallel accesses from all Cogs.
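If "16 clocks and 16 operations per rotation" is read as one hub operation per cog per rotation (an assumption), the arithmetic separates what one cog sees from what the hub delivers in total. The clock frequency below is only an example value.

```python
COGS = 16
CLOCKS_PER_ROTATION = 16

def hub_throughput(clock_mhz):
    """Return (per-cog, aggregate) hub operations in millions per second,
    assuming each cog gets one hub operation per rotation."""
    rotations_per_us = clock_mhz / CLOCKS_PER_ROTATION  # millions of rotations/s
    per_cog = rotations_per_us * 1
    aggregate = rotations_per_us * COGS
    return per_cog, aggregate

print(hub_throughput(200))   # (12.5, 200.0)
```

Under this reading the hub as a whole keeps pace with the clock, even though any single cog gets only one slot per rotation.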
The cogs still only have 2K (512 Longs) of cog ram.
The old "HOT" P2 had in addition to cog ram a "CLUT/AUX" ram block of 1K (256 Longs) which has been removed from the new P2.
I thought I read somewhere on the forums that the COGs run at 100 MIPS and the HUB at 50 MIPS ... is that not correct?
Which brings me to a question that has to be asked: (.. and please do not see this as a criticism, just need knowledge, I am more than happy with 100MIPS)
Why is it that, in the day of modern ARM processors, we can get up to 2-3 MIPS/MHz, but with the Prop only 0.5 MIPS/MHz? I thought the addition of a pipeline helped with that, but I know very little about this ... someone with more knowledge please help .. is it something to do with the silicon technology used?
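The figures in this question are related by MIPS = clock (MHz) x MIPS/MHz, where MIPS/MHz is the number of instructions completed per clock. A short Python check using the numbers quoted above: 0.5 MIPS/MHz means one instruction every two clocks, and 100 MIPS at that rate implies a 200 MHz clock.

```python
def mips(clock_mhz, mips_per_mhz):
    """Millions of instructions per second."""
    return clock_mhz * mips_per_mhz

def clocks_per_instruction(mips_per_mhz):
    """Average clocks spent per instruction."""
    return 1 / mips_per_mhz

def clock_needed(target_mips, mips_per_mhz):
    """Clock in MHz required to reach target_mips at a given MIPS/MHz."""
    return target_mips / mips_per_mhz

print(clocks_per_instruction(0.5))      # 2.0
print(clock_needed(100, 0.5))           # 200.0
print(mips(200, 2.0), mips(200, 3.0))   # 400.0 600.0
```

A single-issue pipeline tops out at one instruction per clock, so 2-3 MIPS/MHz requires completing several instructions per clock. Also, vendor MIPS/MHz figures for ARM parts are usually Dhrystone MIPS, which are not a raw instruction count.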
Comments
X/Y stack space was in a special area of COG memory referred to as AUXRAM. It could be used for stack/data or a Color Look Up Table (CLUT) used for video.
Use PUSHX/POPX and PUSHY/POPY instructions. Look at my PFTH examples, they use the stack space in AUXRAM.
You're kidding, right! So all the stuff that makes a great Forth computer has been culled ....... bugger!
Read the rest of this thread ..... that has been culled .....
Looks like I will have to delve into Verilog and make my own ....:-(
These instructions are in the last proposed P2 instruction set.
If these are included then the auto inc/dec feature is still a goer!
Maybe that was what you were looking for when you were searching for your code that had all the speed improvements in it (Return Stack, Data Stack, Serial)?
In that thread you state:
I wonder if the following instruction can be used as an indirect jump:
Maybe we could make the CALL a JMP by modifying the SP to remove the stacked return address .... I seem to remember using this trick before in the past ....
The only problem is there is no CALLA @D, so we would have to use the @rel somehow ...
There is also this command:
Here's the 9 April 2014 link - http://forums.parallax.com/showthread.php/155132-The-New-16-Cog-512KB-64-analog-I-O-Propeller-Chip?p=1258426&viewfull=1#post1258426