Chip,
I don't want a full JMPRET, just a stripped-down version. There is no room for a full JMPRET instruction anyway.
Thinking further, we can use the existing P2 instruction JMP #ABS for the JMP/RET, so only the JMPRET CALL instruction is necessary...
JMPRET D,#S
(ie a CALL #XXX where XXX_RET is where the return address is placed)
#S is always immediate and a 9-bit address to jump to (PC=#S[8:0])
D is a 9-bit cog address to store the 9-bit cog return address (R[31:0]=R[31:9] + PC+1[8:0])
I can live without WC and WZ.
Whatever pipeline delays can also be lived with.
The instruction is actually a combination of JMP and SETS..
SETS D,pc++ (ie MOVS in P1)
JMP #cog
Now if the return address has the P2 instruction JMP #abs thus...
            jmpret  label_ret,#label
            .....
label       .....
            .....
label_ret   jmp     #0      'absolute jump to cog (upper 11 bits are already coded as '0' so no need to modify)
Note: I forget the modifier that forces an absolute cog address.
Or looking at it another way, it is a
EEEE 1011001 CZI DDDDDDDDD SSSSSSSSS CALLD D,#S {wc/wz/wcz}
instruction where only the PC[8:0] is written to D (PC being PC++)
With this instruction, wc/wz/wcz is used to enable writing the flags. Is there any point in not writing them?
So could the instruction be changed such that if neither WC nor WZ was set, then only the bottom 9-bit (or 20-bit) return address would be written?
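To make the request concrete, here is a rough Python model of the stripped-down JMPRET described above: only the low 9 bits of D are replaced with PC+1 and the rest of the long is left alone. This is just a sketch, not anything from the Verilog. The function names are made up for illustration, and the starting value 0xFD800000 assumes condition bits EEEE = 1111 on top of the JMP #abs opcode.

```python
# Hypothetical model of the proposed stripped-down JMPRET.
# Function names and the example opcode pattern are illustrative assumptions.

S_FIELD_MASK = 0x1FF  # low 9 bits: the S field of a cog instruction


def jmpret_store(d_long: int, pc: int) -> int:
    """Return the new contents of D: bits 31:9 kept, bits 8:0 = PC+1."""
    return (d_long & ~S_FIELD_MASK & 0xFFFFFFFF) | ((pc + 1) & S_FIELD_MASK)


def jmpret_target(s_imm: int) -> int:
    """The new PC is simply the 9-bit immediate."""
    return s_imm & S_FIELD_MASK


# label_ret initially holds "jmp #0" (assumed encoding, see note above).
label_ret = 0xFD800000
label_ret = jmpret_store(label_ret, pc=0x042)
print(hex(label_ret))  # 0xfd800043
```

The point is that the opcode bits of the JMP sitting at label_ret survive the call, so no extra long is needed for the return.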
What might be of general interest is a package without a thermal pad. Given the v2 die power is now half that of the v1 die, I suspect this is thermally viable with a 4-cog package, say a 64-pin package with:
32x I/O
8x VIO
8x GIO
6x VDD
6x VSS
TES/RES
XI/XO
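A quick tally of that list, as a sanity check that it really fits 64 pins. This assumes TES/RES and XI/XO are two pins each, which the list does not state explicitly.

```python
# Pin budget for the suggested 64-pin package without a thermal pad.
# The two-pin counts for TES/RES and XI/XO are assumptions.
pins = {
    "I/O": 32,
    "VIO": 8,
    "GIO": 8,
    "VDD": 6,
    "VSS": 6,
    "TES/RES": 2,
    "XI/XO": 2,
}

total = sum(pins.values())
print(total)  # 64
```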
It is unfortunate that
EEEE 1011001 CZI DDDDDDDDD SSSSSSSSS CALLD D,#S {wc/wz/wcz}
writes {C, Z, 10'b0, PC[19:0]} to D
I presume that if wc/wz/wcz is not specified that D[31:30] are unchanged?
But the problem is that the 10'b0 clears out the instruction residing in those bits, which could be a
EEEE 1101100 RAA AAAAAAAAA AAAAAAAAA JMP #abs/#rel (using #abs)
Currently the only way around the missing JMPRET is to use
            CALLD   SUB_RET,#SUB    {WC/WZ/WCZ}
            .....
SUB         .....
            JMP     SUB_RET         {WC/WZ/WCZ}
SUB_RET     LONG    0
Unfortunately, there are a lot of these calls and returns in any program, so the use of the extra long for each return is often a killer
Perhaps a better question may be, do these 10'b0 need to be written in any of the instructions, aside from pushing on the stack where keeping those bits would not be relevant?
And, if wc/wz/wcz is not specified, does D[31:30] get 2'b0 written, or do they remain unchanged??? They would need to remain unchanged if the next 10 bits remained unchanged.
So, CALLD D,{#}S {WC/WZ/WCZ} would become a much better instruction by only copying the return address bits to D[19:0] and leaving D[31:20] unchanged if wc/wz/wcz were not specified.
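Here is a small Python sketch of the difference being asked for. It models the write as listed, {C, Z, 10'b0, PC[19:0]}, against the suggested variant that only touches D[19:0]. The function names are invented for illustration, and JMP_ABS assumes condition bits EEEE = 1111 with R = 0 on the JMP #abs encoding quoted above.

```python
# Sketch comparing the listed CALLD write with the suggested variant.
# Names are hypothetical; JMP_ABS is an assumed encoding (EEEE=1111, R=0, A=0).

PC_MASK = 0xFFFFF       # PC[19:0]
JMP_ABS = 0xFD800000    # EEEE 1101100 RAA ... with EEEE=1111, R=0, A=0


def calld_current(c: int, z: int, ret_pc: int) -> int:
    """Listed behaviour: {C, Z, 10'b0, PC[19:0]} replaces all of D."""
    return ((c & 1) << 31) | ((z & 1) << 30) | (ret_pc & PC_MASK)


def calld_proposed(d_long: int, ret_pc: int) -> int:
    """Suggested behaviour without WC/WZ/WCZ: only D[19:0] changes."""
    return (d_long & ~PC_MASK & 0xFFFFFFFF) | (ret_pc & PC_MASK)


sub_ret = JMP_ABS
after_current = calld_current(c=0, z=0, ret_pc=0x00123)
after_proposed = calld_proposed(sub_ret, ret_pc=0x00123)

print(hex(after_current))   # 0x123: the JMP opcode bits are gone
print(hex(after_proposed))  # 0xfd800123: still a JMP #abs, now to $123
```

With the listed behaviour SUB_RET can only hold data, hence the extra long per subroutine. With the suggested behaviour SUB_RET could itself be the return jump.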
Doesn't require a reflow oven to mount and remove from circuit boards. Many people will like to assemble their own boards but won't have the option with only the thermal pad version. And repairs won't mean throwing the whole board out.
PS: I remember a teacher saying how he prefers the DIP40 version of the prop1 because it's so much cheaper to replace just the microcontroller than to replace the whole project board each time a student blows an output.
It is unfortunate that
EEEE 1011001 CZI DDDDDDDDD SSSSSSSSS CALLD D,#S {wc/wz/wcz}
writes {C, Z, 10'b0, PC[19:0]} to D
I presume that if wc/wz/wcz is not specified that D[31:30] are unchanged?
But the problem is that the 10'b0 clears out the instruction residing in those bits, which could be a
EEEE 1101100 RAA AAAAAAAAA AAAAAAAAA JMP #abs/#rel (using #abs)
Currently the only way around the missing JMPRET is to use
[code]
            CALLD   SUB_RET,#SUB    {WC/WZ/WCZ}
            .....
SUB         .....
            JMP     SUB_RET         {WC/WZ/WCZ}
SUB_RET     LONG    0
[/code]
Unfortunately, there are a lot of these calls and returns in any program, so the use of the extra long for each return is often a killer
Perhaps a better question may be, do these 10'b0 need to be written in any of the instructions, aside from pushing on the stack where keeping those bits would not be relevant?
And, if wc/wz/wcz is not specified, does D[31:30] get 2'b0 written, or do they remain unchanged??? They would need to remain unchanged if the next 10 bits remained unchanged.
So, CALLD D,{#}S {WC/WZ/WCZ} would become a much better instruction by only copying the return address bits to D[19:0] and leaving D[31:20] unchanged if wc/wz/wcz were not specified.
I understand what you are saying, but it's way too late to make such changes.
What might be of general interest is a package without a thermal pad.
The ADC performance would likely take a hit, and I’m not sure you gain much.
A 64-pin (or 128-pin) package is still SMD, so lacks the claimed benefits of DIP.
Best to focus on a ‘best package’ for P2, and if a customer rocks up with enough volume, I’m sure Parallax can make P2 in any practical package the client demands.
That would be an exceptional flag operation too. At the moment all CALLxx instructions use WCZ only for the optional restoring of pre-stored flags.
No, the wcz bits are used to determine if the current C & Z flag bits are written along with the return address.
The POP/RET instructions use the wcz to determine if the C & Z flags get set according to bits 31 & 30 in the return address.
The docs indicate that if wcz is not used then the C and Z bits are not written, but I am guessing (without trying) that those bits are just written as zeros (i.e. they are overwritten if it's a register).
Sure, it's not as quick as a socket but anyone can use a simple cheap soldering iron for both soldering and desoldering a plain QFP.
The ADC will be fine. There are a lot of individual GIO/VIO supplies. And if that is really to be a concern then it should be proven to be a reason before using it as such.
I’d expect a 128- or 144-pin package of the present P2 is not too hard, but such an animal really needs a large enough customer to drive it. ‘Nice for hobbyists’ with a mid-level skill set is quite a small total market.
A question also is how many GND bonds are needed?
I have seen Asian MCU vendors offer both fine- and coarse-pitch parts (e.g. 64-pin), and I believe the driving market demand for the 0.8 mm parts there is wave soldering.
It’s conceivable that large demand could come from some wave-soldering production line use, so Parallax could look into that. P2 is quite well suited to power electronics.
I wasn't really thinking of any commercial uses at all. A board/module swap is more appropriate there, so a thermal pad will always be the right answer there. If nothing else, it has the pin count advantage, but it can also be overclocked much more easily; the 250 MHz mark won't be possible without the thermal pad.
Education is the primary market for Propeller chips. It is prudent for Parallax to serve education. I'm actually imagining the educational transition from prop1 to prop2 to be quite slow, partly because of the price of whole board replacements. And I guess I have to admit a plain QFP isn't likely to help much, since a teacher isn't going to have the time to be hand soldering a box full of faulty Propeller chips.
I was originally thinking of only a 2-cog version without a thermal pad for a small cheap edition but Chip's pinout for 4-cog 64 pin package made me think that that might work too now.
Doesn't require a reflow oven to mount and remove from circuit boards. Many people will like to assemble their own boards but won't have the option with only the thermal pad version. And repairs won't mean throwing the whole board out.
PS: I remember a teacher saying how he prefers the DIP40 version of the prop1 because it's so much cheaper to replace just the microcontroller than to replace the whole project board each time a student blows an output.
I don't think that a thermal pad would be an impediment in that case. If you don't need a thermal pad, don't use it. Simple!
A package having a thermal pad would serve you the same, and other people as well.
The 40 pins (P0-31 and P56-63) work nicely too.
4 cogs will improve hub access latency nicely.
I’m guessing that power will be close to half.
How do the DACs work out in this arrangement?
P2X2C1M??P P2X2C1M24-QFP44 ? P2X2C1M20-DIP40 ?
Maybe a 44 QFP with 24 I/O (P0-16 and P56-63)
Or a DIP40 with 20 I/O (P0-14 and P58-63) - reduced VDD and VIO pins to make way for GND pins
If/when you get to a P2X2C11M??P I wonder if you could add back in 2 tiny cores?
These tiny cores would only have access to their adjacent cogs via shared LUT, with no HUB access, no streamer and no DACs. Instructions would only be the P1 equivalents plus the extra I/O instructions. Think of them as tiny I/O processors. I can’t help but think a pair of good cores needs some I/O help: the P2 lacks dedicated peripherals, so a 2-core P2 will feel the lack of smart peripherals.
Here is what these tiny cores could do...
* SD FAT driver including perhaps SPI and an RTC (clock)
* Multiple Serial port driver including perhaps I2C and keyboard
* USB driver
* LCD driver (not a screen as that requires hub access)
I imagine that overclocking might run to at least 400+MHz.
Chip,
I know you don’t like the tiny cog idea, but a 2 core P2 will find those missing dedicated peripherals a problem compared to other micros. But you have done the hard Verilog design work. This would just be copying the cog logic and removing a large part of it. If you like, I’ll do a list of what instructions need to remain.
If you don't need a thermal pad, don't use it. Simple!
The prop2 would not work if the thermal pad wasn't securely soldered in place. There are 32 pin pads on the die connected to the thermal pad internally; it is the low-noise common reference and power return.
Eg: In the pin list above I had to assign pins for VSS and GIO. Those labels don't occur in Chip's pinout image because they all connect to the thermal pad.
That would be an exceptional flag operation too. At the moment all CALLxx instructions use WCZ only for the optional restoring of pre-stored flags.
No, the wcz bits are used to determine if the current C & Z flag bits are written along with the return address.
That certainly ain't true for the plain CALL instruction, because I've heavily relied on CALL to preserve C and Z into the hardware stack without ever using the WCZ encoded instruction bits.
@evanh,
Seems apologies are in order.
I completely misunderstood the function of the WC/WZ/WCZ bits on the CALL instructions (when available).
Thanks for bringing this up! I was sure they determined whether the C & Z flags were saved, not that they were replaced.
Now I'll have to see how I can use it.
EEEE 1011001 CZI DDDDDDDDD SSSSSSSSS CALLD D,{#}S {WC/WZ/WCZ} Call to S** by writing {C, Z, 10'b0, PC[19:0]} to D. C = S[31], Z = S[30].
EEEE 1011010 0LI DDDDDDDDD SSSSSSSSS CALLPA {#}D,{#}S Call to S** by pushing {C, Z, 10'b0, PC[19:0]} onto stack, copy D to PA.
EEEE 1101011 CZ0 DDDDDDDDD 000101101 CALL D {WC/WZ/WCZ} Call to D by pushing {C, Z, 10'b0, PC[19:0]} onto stack. C = D[31], Z = D[30], PC = D[19:0].
EEEE 1101101 RAA AAAAAAAAA AAAAAAAAA CALL #A Call to A by pushing {C, Z, 10'b0, PC[19:0]} onto stack. If R = 1, PC += A, else PC = A.
EEEE 1101110 RAA AAAAAAAAA AAAAAAAAA CALLA #A Call to A by writing {C, Z, 10'b0, PC[19:0]} to hub long at PTRA++. If R = 1, PC += A, else PC = A.
EEEE 11100WW RAA AAAAAAAAA AAAAAAAAA CALLD PA/PB/PTRA/PTRB,#A Call to A by writing {C, Z, 10'b0, PC[19:0]} to PA/PB/PTRA/PTRB (per W). If R = 1, PC += A, else PC = A.
Occasionally I have preset flags or passed flags down to a subroutine so I guess with this we could save an instruction depending on the CALL used.
It is more frequent to pass a condition back to the calling program, which is where you do not want the flags to be popped. There are other occasions where you want to pop the flags back to where they were when returning to the calling routine. So here, the RET WCZ helps. But don't forget that _RET_ cannot pop the flags back.
Looks to me like you can use D[31] and D[30] to set C and Z when you CALL. Not sure about the use case for that... I guess it's useful to clear the flags before the call...
RET with wcz would restore the flags stored on the call...
At least, that's how I see it, looking at the instruction spreadsheet...
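For anyone following along, here is a tiny Python model of that reading of the spreadsheet: the call always stores {C, Z, 10'b0, PC[19:0]}, and the return only copies bits 31 and 30 back into the flags when WC/WZ are given. It is only a sketch of the behaviour described in this thread; the function names are made up and nothing here has been checked on silicon.

```python
# Hypothetical model of flag save on CALL and optional restore on RET.
# Function names are illustrative; behaviour follows the thread's description.

PC_MASK = 0xFFFFF  # PC[19:0]


def make_return_long(c: int, z: int, ret_pc: int) -> int:
    """What the call stores: {C, Z, 10'b0, PC[19:0]}."""
    return ((c & 1) << 31) | ((z & 1) << 30) | (ret_pc & PC_MASK)


def ret(ret_long: int, cur_c: int, cur_z: int, wc: bool = False, wz: bool = False):
    """Return (pc, c, z) after a RET; flags change only if WC/WZ are given."""
    pc = ret_long & PC_MASK
    c = (ret_long >> 31) & 1 if wc else cur_c
    z = (ret_long >> 30) & 1 if wz else cur_z
    return pc, c, z


# Caller has C=1, Z=0 and the return address is $400.
saved = make_return_long(c=1, z=0, ret_pc=0x400)

# Suppose the subroutine leaves C=0, Z=1 as its result.
plain = ret(saved, cur_c=0, cur_z=1)                       # subroutine's flags survive
restored = ret(saved, cur_c=0, cur_z=1, wc=True, wz=True)  # caller's flags come back

print(plain)     # (1024, 0, 1)
print(restored)  # (1024, 1, 0)
```

So a plain RET passes a condition back to the caller, and RET WCZ puts the flags back to how they were at the call.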
The P2X4C2M40P looks good. 40 I/O in a QFP64.
Wonder how much price difference it will make.
A package having a thermal pad would serve you the same, and other people as well.
Kind regards, Samuel Lourenço
It might cost 30% less than the current chip.
So a CALL does save the flags and a RET restores them
A CALL wcz does what?
A RET wcz would overwrite the flags upon return?
Am I near?
Mike
RET with wcz would restore the flags stored on the call...
At least, that's how I see it. looking at instruction spreadsheet...