Just finished verifying the internal stack. It is a depth of 8.
As you push more onto the stack, the bottom one will drop off when full.
As you pop off the stack, the bottom value moves up one and the new bottom value is the same as it was previously, i.e. once you empty the stack, the last legitimate value popped will continue to be delivered on underflow.
I needed to verify this behaviour as I am building some hubexec routines where I want to use the internal stack but I do not know what depth is available for use. So I can now save the stack (to hub???), and restore it when I am finished.
That behavior is correct. As for saving the stack, you don't need to know the depth. Just pop all 8, then push all 8 when restoring it.
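For anyone who wants to play with this without hardware, here is a minimal Python sketch of the behaviour described above, plus the "pop all 8, push all 8" save/restore. It is only a toy model of what was reported in this thread, not a description of the silicon. The names `HwStack`, `save_stack` and `restore_stack` are made up for illustration, and the width of each entry is not modelled.

```python
class HwStack:
    """Toy model of the 8-level internal stack as described in this thread."""
    DEPTH = 8

    def __init__(self):
        # Index 0 is the top of the stack, index DEPTH-1 is the bottom.
        self.slots = [0] * self.DEPTH

    def push(self, value):
        # Everything moves down one level; the old bottom entry drops off.
        self.slots = [value] + self.slots[:-1]

    def pop(self):
        # Everything moves up one level; the bottom entry keeps its value,
        # so an emptied stack keeps delivering the last value on underflow.
        top = self.slots[0]
        self.slots = self.slots[1:] + [self.slots[-1]]
        return top


def save_stack(stack):
    """Pop all 8 levels. saved[0] is the old top of stack."""
    return [stack.pop() for _ in range(HwStack.DEPTH)]


def restore_stack(stack, saved):
    """Push the saved levels back, bottom first, so the old top ends on top."""
    for value in reversed(saved):
        stack.push(value)
```

Note that the restore has to push in the reverse order of the pops, otherwise the stack comes back upside down.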
Also be aware that the hardware stack is 22 bits wide not 32.
You know, storing the stack to hub memory will be kinda slow. Too bad there isn't a pair of instructions that would pop/push all of them at once to/from the hub. They would take ~9-24 clock cycles. This would make context switches much faster than the current approach (I think).
Transferring the stack to cog RAM would take just 8 cycles, and then storing it to hub RAM ~9-24 cycles. It doesn't seem worth the effort to create special instructions to do the same thing. I suppose the special instructions could be faster by overlapping the stack accesses with the hub accesses. If we're voting, I'd vote not to do it so the P2 would be available sooner. I don't think I can wait another decade.
Wouldn't that be 16 cycles, not 8? And you would also require 8 unused cog registers. And you would need to set up the fast write. But, you are right. It certainly shouldn't be added if it affects delivery of the chip. This is purely a minor optimization, and therefore not critical to the design.
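To put rough numbers on the software approach, here is the arithmetic as a small Python sketch. It uses only the figures quoted in this thread (2 cycles per instruction, 8 levels, roughly 9 to 24 cycles for the hub store) and leaves out the fast write setup, so treat the result as a ballpark rather than a measurement. The constant and function names are just for illustration.

```python
CYCLES_PER_INSTRUCTION = 2   # figure quoted in this thread
STACK_DEPTH = 8              # verified depth reported in this thread
HUB_STORE_CYCLES = (9, 24)   # "~9-24" best/worst case quoted in this thread


def save_cost_cycles():
    """Return (best, worst) cycles to pop the stack to cog RAM, then store to hub."""
    pop_cycles = STACK_DEPTH * CYCLES_PER_INSTRUCTION   # 8 pops at 2 cycles each
    return tuple(pop_cycles + hub for hub in HUB_STORE_CYCLES)


print(save_cost_cycles())  # (25, 40)
```

So the pops themselves are a fixed 16 cycles, and the hub store is where the variation comes from.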
As ozpropdev just said, stack is only 22 bits. I don't think it's very useful in the general case. I'm just going to leave it alone and let it do its call/ret thing.
Sorry, I forgot that instructions take 2 cycles instead of 1. It's been a while since I paid much attention to Parallax. As time marches on there are more new and exciting chips and boards out there, and Parallax is kind of getting lost in the crowd. They really need to get the P2 out ASAP to remain relevant in the market.
Have you tried writing code and keeping track of what you are doing? Which is it that we write data to again? Was that X or was that Y?
Having written some smartpin code in assembler and in high level, I must say that reading and writing data with RDPIN and WRPIN makes the code a lot easier to read and remember. SETPIN also makes sense for setting the mode, but so as not to confuse it with what we do with the extra parameter register, I would avoid the word "set" and simply say WXPIN. All these mnemonics also end cleanly in "PIN" and are clearly identified as the PIN family, along with their cousin AKPIN. WXPIN could be called something else, but it is just as easy to leave it as it is.
Looks to me like M, X, and Y will usually only get set once, so they're not so important.
The thing you'll do a lot of is reading and acking the pin.
I'd be fine with READPIN instead of GETPIN.
Not a fan of contracting syntax when not necessary. Think that makes it less clear, and harder to remember.
Contracting is normal for mnemonics, that's why they're called mnemonics, you know, like rdbyte and wrfast etc. I don't know where you get the idea that you would only be doing a lot of reading from a pin, as I have found that is not the case. What do you do when you are transmitting serial data?
You're right Y, you do need that a lot too...
Maybe SetPinY should just be SetPin, or WritePin (if that's not too long) or SendPin or something else without the Y.
There is rdbyte and wrfast already, like you say, so maybe that's OK. I think those things are harder to remember though.
When you are coding or looking at your code later, it's nice to be able to clearly see where you are reading and writing the data register. I think we are agreed on WRPIN and RDPIN (the Y and Z registers).
That just leaves us with the M and X registers.
Since you are mostly only ever going to set the M once, and that is the smart pin MODE, then why not just call it MODEPIN?
In some modes the X can be set once, and in other modes it is set often. How about we call it either SETPIN or SETXPIN? I very much dislike WXPIN as it is easily confused with WRPIN, which is the data.
I don't see a need for special saving instructions. My use is not very time specific as it will usually result in waiting for user input. If my code only requires a depth of 4 then I only need to save/restore 4 levels.
My code is for doing an extended monitor/debugger like I did before P2Hot. The routines will also be useful for standard input/output plus serial. Things like data, hex, decimal, binary, dump, string, etc.
And yes, the width is only 22 bits. (Not verified - will check tomorrow)
While coding, I have been thinking about the differences between the P1 instruction set and the P2 instruction set. They are really quite different aside from the basic instructions such as ADD, AND, ROR, etc.
This made me ponder the question...
What if the first 8 cogs were P1B compatible?
The P1 section would be faster than the P1, with a much smaller die area than P2, 64 I/O, more hub, security, and maybe in the wash-up not that much more power than the P1, and it would give an immediate upgrade path.
Remember, P1V verilog is already done.
Or dare I even suggest, configurable P1 or P2 ALU ???
At least, it could be considered for a pin compatible variant.
After getting to know this P2, I don't really want to go back to P1...
Except maybe for cases where small size is important...
I am really missing the "glue" objects in the P1. I have to get them running before I can get to the real P2 stuff.
So I was just musing that a mix of P1 & P2 cogs would be quite nifty. We could have the old combined with the new. Of course we would want a version with just the new bits. Others would likely be happy with just a P1 with more hub and faster. Later maybe we could get a mix of P2 and P2Hot cogs.
But if the first cab out of the ranks were a P1 & P2 combo, then if the P2 section had problems, at least the P1 would most likely work.
<ducks for cover> What if the first 8 cogs were P1B compatible?
It is easy to write, just a mere few words, but much harder to actually do.
In reality, this would add more complexity to testing, increasing the chance of falling between two stools and having neither work properly.
Documentation gets much more difficult, as do toolchains, as now you have to manage two compilers at the same time.... well, probably 3 as the P1B is bound to end up not-quite-P1.
There seems little risk of P2 16 COGS not fitting, and that is the only scenario where any consideration of a half-baked hybrid would be needed.
Compared to the issues you create, the issue you sought to solve is quite minor.
Most new users will not be coding in PASM anyway.
SetPinM
SetPinX
SetPin
GetPin
AckPin
I vote for and (postedit)
Or just faster.
Tiny and with four fast P1 cogs - that's where I live.
Remember, Chip has the Verilog configured so four fast P2 COGS is quite practical.
I am trying to reverse the bytes in a long. A while ago we had ESWAP8 but this is gone now.
I think you want:
MOVBYTS d, #%%0123
(Edit: dangit! I thought MOV while I typed ROL!)
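If it helps to see why `%%0123` reverses the bytes, here is a small Python model of MOVBYTS as I understand it: each 2-bit field of S[7:0] selects which source byte of D lands in the corresponding result byte. This is a sketch based on that reading of the instruction, not taken from the silicon. The Python function name `movbyts` and the constant `REVERSE` are just for illustration.

```python
def movbyts(d, s):
    """Model of MOVBYTS: 2-bit field i of s picks the source byte for result byte i."""
    src = [(d >> (8 * i)) & 0xFF for i in range(4)]   # src[0] is the low byte of d
    result = 0
    for i in range(4):
        sel = (s >> (2 * i)) & 0b11                   # selector for result byte i
        result |= src[sel] << (8 * i)
    return result


REVERSE = 0b00_01_10_11   # same value as the PASM literal %%0123

print(hex(movbyts(0x11223344, REVERSE)))  # 0x44332211
```

In this model `%%3210` (0b11_10_01_00) is the identity pattern, which is a handy sanity check.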
Good point. A reduced P2 has an excellent chance of being made. Seems likely the package & pin count would be reduced too.
This instruction shifts the D register left by 8 bits and inserts the selected byte of the S register into D[7:0].
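Here is that description as a minimal Python sketch. The post does not name the instruction, so `rolbyte` is an assumed name (going by the ROLWORD/ROLNIB mention in this thread), and the byte index `n` is a hypothetical parameter for "the selected byte". It is only a model of the sentence above, not the poster's original example.

```python
MASK32 = 0xFFFFFFFF


def rolbyte(d, s, n):
    """Shift d left by 8 bits and insert byte n of s into d[7:0]."""
    byte = (s >> (8 * n)) & 0xFF          # the selected byte of S
    return ((d << 8) & MASK32) | byte     # old d[31:24] is discarded


def reverse_bytes(s):
    """Reverse the bytes of a long with four shift-and-insert steps."""
    d = 0
    for n in range(4):                    # take byte 0 first, byte 3 last
        d = rolbyte(d, s, n)
    return d


print(hex(reverse_bytes(0x11223344)))  # 0x44332211
```

Note that in this model the old top byte of D is simply lost, so the operation behaves as a shift rather than a true rotate.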
Edit: Treat the following example as a sequence of instructions.
Here are the Prop1 equivalents
Do this instruction and ROLWORD, ROLNIB really need to be SHLBYTE, SHLWORD & SHLNIB?
Based on your description, should that be: