Now, I've just got to come up with the compact common-usage WRPIN variants to enable efficient DAC and mode setting. Then, everything involving multiple smart pins is going to be very efficient.
... Now you've got me thinking about making all this parallel.
I thought you had selected ripple mode to save logic?
Parallel (same-edge) pin updates already exist using the usual OUT opcodes, right?
This allows a pin pointer, should someone want relocatable/relative pin coding?
I'm trying to imagine what the logic difference would be.
That is why I thought having a mask like SETPAT or a new SETM/SETM2 might be better. I don't think anything needs to be saved for Interrupts. Then it could be done in parallel with no clock penalty.
I have no idea of the silicon penalty though. It's just more like a super DIRx/OUTx/INx instruction. Could even be restricted to 32 pin A or B group.
We output now a 6-bit pin pointer and a 34-bit command code (2type+32data).
Instead of the 6-bit pointer, we could output a fully decoded 64-bit pattern with extra bits like the improved bit/pin instructions use. This wouldn't take much more hardware, if any. Then, we could write the same data to any number of smart pins in a 2-clock instruction.
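To make the "fully decoded 64-bit pattern" idea concrete, here is a minimal Python sketch of turning a base pin plus a count of extra pins into a 64-bit select pattern. The function name `span_to_mask` and the (base, extra) encoding are assumptions made for illustration, not the actual hardware encoding. The wrap from pin 63 back to pin 0 follows the behaviour described for the rippled version.

```python
PIN_COUNT = 64  # the chip under discussion has 64 smart pins


def span_to_mask(base: int, extra: int) -> int:
    """Return a 64-bit mask with one bit set per selected pin.

    base  -- first pin number (0..63)
    extra -- number of additional pins after the base pin (hypothetical encoding)
    Pins past 63 wrap around to pin 0.
    """
    mask = 0
    for i in range(extra + 1):
        mask |= 1 << ((base + i) % PIN_COUNT)
    return mask


if __name__ == "__main__":
    print(hex(span_to_mask(4, 3)))   # pins 4..7
    print(hex(span_to_mask(62, 3)))  # pins 62, 63, 0, 1
```

With every selected pin decoded at once, the same 34-bit command could in principle be presented to all of them on the same clock, which is the point being made above.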
Again, I will stress I am concerned about feature creep.
To me, a 32 or 64 bit pin mask would be nicer. Set it once, or more often if it changes. Then just use it with all the extra instructions. The only penalty is setting the mask once. No penalty for the other instructions. The SETMASK (or use the SETPAT instruction) could set the whole 64 bits in one go by using both D and S operands.
As I said above, happy to limit it to 32 bits and an A & B port.
I just think a mask is cleaner. As a by-product, the pins don't have to be successive.
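A rough Python model of the mask idea, just to show the non-contiguous case. The names `set_mask` and `masked_write` are hypothetical, and the choice of S for pins 0..31 and D for pins 32..63 is an assumption; the post only says both operands would be used.

```python
def set_mask(d: int, s: int) -> int:
    """Combine two 32-bit operands into one 64-bit pin mask.

    Assumed layout: S covers pins 0..31 (port A), D covers pins 32..63 (port B).
    """
    return ((d & 0xFFFFFFFF) << 32) | (s & 0xFFFFFFFF)


def masked_write(pins: dict, mask: int, value: int) -> None:
    """Write the same value to every pin whose mask bit is set."""
    for pin in range(64):
        if (mask >> pin) & 1:
            pins[pin] = value


if __name__ == "__main__":
    pins = {}
    mask = set_mask(0x1, 0x5)       # pins 0 and 2 on port A, pin 32 on port B
    masked_write(pins, mask, 0xAB)  # set once, then reuse the mask
    print(sorted(pins))             # pins that received the value
```

The mask is set once and reused by every following instruction, so the selected pins need not be successive.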
Just realised that only the WxPIN instructions take the extra clocks.
I had thought this applied to the DRVx etc instructions.
For these, up to 8 successive pins can be done with an immediate pin count, else use a register, and a limitation of 32 pins in an A or B bank.
Sounds great to me.
The pins don't have to be successive with a mask, but for objects that will be passed pin numbers, they are likely to work in ranges, anyway. It's likely that things will be lumped together, in practice. Of course, a 64-bit OR gate would give us both possibilities. We'll see.
Ah, there is one problem when dealing with pin registers. To operate on both A and B ports at once leaves some problems with data-forwarding in the pipeline. Some accommodation would have to be made to span both ports.
Chip,
OT. Could the CNT be extended to 64 bits?
Then just need a new instruction to copy the 64-bit CNT to the GETX and GETY internal result registers.
Everything would work as-is for 32-bit CNT, but to get a 64-bit CNT, you do a COPYCNT instruction followed by GETX and/or GETY (or whatever these instructions are called).
WAITCTx etc do not work on the 64-bit CNT.
This just allows a bigger timer to be implemented in software, should someone want longer timers.
What do you think? Presume little risk, little silicon?
Let me think about it. It's not hard to do, just HOW to present it.
What if a new instruction read the upper 32 bits, and WC set C according to CNT[31]?
Software can then check whether the upper half has been incremented. More work, but at least it can be used.
Could GETCT set Z if CT has passed through zero since the last GETCT? This rollover bit would be cleared after GETCT. It could also set C if CT[31]=1, for completeness. The 64-bit count code could be:
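        MOV     CTHI,#0         ' clear the software-kept upper 32 bits
        ...
        GETCT   CTLO WZ         ' read low 32 bits; Z = rollover since last GETCT (proposed behaviour)
IF_Z    ADD     CTHI,#1         ' on rollover, bump the upper 32 bits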
If Z is difficult, then use C.
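Here is a small Python model of that proposal, as a sketch only. `RolloverCounter` stands in for the hardware (a 32-bit count plus a rollover flag that is cleared on read) and `Counter64` mirrors the four-line PASM sequence; both class names are made up for illustration, and the flag behaviour is the proposed one, not something the chip is confirmed to do.

```python
MASK32 = 0xFFFFFFFF


class RolloverCounter:
    """Free-running 32-bit counter with a 'passed through zero' flag (proposed)."""

    def __init__(self):
        self.ct = 0
        self.rolled = False

    def tick(self, clocks: int) -> None:
        total = self.ct + clocks
        if total > MASK32:
            self.rolled = True  # a single flag: only one wrap can be remembered
        self.ct = total & MASK32

    def getct(self):
        """Return (ct, z). Reading clears the rollover flag."""
        z = self.rolled
        self.rolled = False
        return self.ct, z


class Counter64:
    """Software extension to 64 bits, mirroring the PASM sequence."""

    def __init__(self, hw: RolloverCounter):
        self.hw = hw
        self.cthi = 0                    # MOV CTHI,#0

    def read(self) -> int:
        ctlo, z = self.hw.getct()        # GETCT CTLO WZ
        if z:
            self.cthi += 1               # IF_Z ADD CTHI,#1
        return (self.cthi << 32) | ctlo


if __name__ == "__main__":
    hw = RolloverCounter()
    c64 = Counter64(hw)
    hw.tick(0xFFFFFFF0)
    print(hex(c64.read()))
    hw.tick(0x20)                        # crosses zero once
    print(hex(c64.read()))
```

The catch is the same as in the assembly version: the code must read the counter at least once per 2^32 clocks, or a second wrap goes unnoticed.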
Yes, that's a low-cost way to get more capture headroom, if 64b is too complex.
It can have 2 D-FF and write to C & Z if asked.
C can toggle on overflow and Z is sticky, set if 2nd overflow (C set and next overflow)
That gives
ZC
00 = within 2^32 range
01 = 32b range extension, safe (34s at 250MHz)
10 = has overflowed, user care needed to decide if this is 2..3 *2^32, or is 4..5*2^32 or >
11 = has overflowed, user care needed to decide if this is 3..4 *2^32
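A quick Python check of the two flip-flop scheme, assuming C toggles on every overflow and Z becomes set (and stays set) when an overflow arrives while C is already 1. The function name `zc_after` is hypothetical.

```python
def zc_after(overflows: int) -> str:
    """Return the 'ZC' flag pair after a given number of 32-bit overflows."""
    z = 0
    c = 0
    for _ in range(overflows):
        if c:
            z = 1   # sticky: second overflow (C already set) latches Z
        c ^= 1      # C toggles on every overflow
    return f"{z}{c}"


if __name__ == "__main__":
    for n in range(6):
        print(n, zc_after(n))
    # Safe range for ZC = 01 is up to 2^33 clocks; at 250 MHz that is:
    print(round(2**33 / 250e6, 2), "seconds")
```

From two overflows onward the pair alternates between 10 and 11, which is why user care is needed there: the flags alone cannot tell two overflows from four, or three from five.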
And, do we need it ???
This type of queued capture gets complicated by interrupts, i.e. if an INT happens after the capture but before the read, and the INT also reads CNT, the first queued value is invisibly replaced.
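The hazard can be shown with a toy Python model. `LatchedCounter` is a made-up stand-in for a 64-bit CNT with one shared capture latch; the method names follow the COPYCNT/GETX/GETY naming used in the proposal, which was itself tentative.

```python
class LatchedCounter:
    """64-bit counter with a single shared capture latch (hypothetical model)."""

    def __init__(self):
        self.cnt = 0
        self.latch = 0

    def copycnt(self) -> None:
        self.latch = self.cnt            # queue the full 64-bit value

    def getx(self) -> int:
        return self.latch & 0xFFFFFFFF   # low half of the queued value

    def gety(self) -> int:
        return self.latch >> 32          # high half of the queued value


def isr(ctr: LatchedCounter) -> int:
    """Interrupt routine that also captures CNT, overwriting the latch."""
    ctr.copycnt()
    return ctr.getx()


if __name__ == "__main__":
    ctr = LatchedCounter()
    ctr.cnt = 0x1_0000_0100
    ctr.copycnt()                        # main code captures here
    ctr.cnt += 0x50                      # time passes
    isr(ctr)                             # interrupt fires before main reads
    print(hex(ctr.getx()))               # main now sees the ISR's capture
```

Main code captured at 0x100 in the low half but reads back 0x150, with no indication that anything changed. Avoiding that would need either a latch per interrupt level or the interrupt saving and restoring the queued value.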
I've got the spanned WRPIN/WXPIN/WYPIN instructions working on any number of contiguous pins in 2 clocks. So, now you'll be able to update lots of pins at once with the same data.
... Hmm, I worry about how much extra logic that flash-span-encoder cost.
But data rippling through each pin a clock at a time is not good.
For a 2-cog compile, the LE count went up to 27,926 from a prior 27,906. That's 20 more LEs!
If you are going to leave CNT as is, then you should NOT change this.
Seriously, it is absurd to argue for this change when CNT is 32-bit, requires more PASM code to deal with than this does, impacts more people, and is, in my opinion, MORE annoying.
The WHOLE argument for this change was convenience to use a feature and save a long or two when using it. A feature that a LOT of projects will never use.
CNT will likely be used in like 99% of projects, and if you don't deal with wrap (no matter how long your timing need is) you will have issues, and it will happen a lot more often.
I'll get to it. I can only do one thing at a time.
Sorry, I am a bit frustrated with what is deemed important vs what is not.
I feel like this change is not needed at all, and the LUT PTRx change is only being done for software HDMI that won't be used in the actual chip since you are doing hardware assisted HDMI. Both seem like things that should be very low priority compared to anything else.
Obviously, fixes for the bugs found are top priority; after that, everything should be considered "only do it if it's really important". Even the CNT change that I am advocating should be carefully considered first. I feel like that change is probably the most important of the "nice to have" additions being talked about.
It's amazing to me how people are supporting one change that could be done with just a little software, but against another change that requires MORE software to handle it.
Comments
If pin 63 is exceeded, the writing wraps to pin 0 and continues upward.
These instructions take 2 clocks + 1 clock per extra pin.
Does that +1 clock mean each pin gets the effect 1 clock later?
Yes, it becomes a ripple instruction. (and can be quite slow if applied to many pins)
That's right.
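For a feel of the cost, here is the timing rule as a tiny Python sketch. `ripple_clocks` just encodes "2 clocks + 1 clock per extra pin" as stated above, and `PARALLEL_CLOCKS` is the flat 2-clock figure quoted for the spanned version; both names are made up for illustration.

```python
PARALLEL_CLOCKS = 2  # spanned write that hits all pins at once


def ripple_clocks(pins: int) -> int:
    """Clocks for a rippled write: 2 for the first pin, +1 per extra pin."""
    if pins < 1:
        raise ValueError("at least one pin is required")
    return 2 + (pins - 1)


if __name__ == "__main__":
    for n in (1, 8, 32, 64):
        print(n, "pins:", ripple_clocks(n), "clocks rippled vs", PARALLEL_CLOCKS, "parallel")
```

The last pin in a rippled span also sees its update (pins - 1) clocks after the first one, which matters if the pins are meant to start in step.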
Definitely don't want to risk upsetting the data forwarding circuit!
These spanning pin and bit instructions are going to go a long ways in cleaning up people's code.
FYI, this is info on the RaspPi counter
https://jsandler18.github.io/extra/sys-time.html
It runs a 64b Systick counter with 32-bit lower compares. It mentions a 1us increment, which I guess means a /250 prescaler from the 250MHz peripheral clock?
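The arithmetic behind these numbers, as a short Python sketch. The 250 MHz figure is the clock rate used earlier in the thread, and the /250 prescaler is the guess made above, not a confirmed detail of the Pi hardware; the helper names are hypothetical.

```python
CLOCK_HZ = 250_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def wrap_seconds(bits: int, hz: float = CLOCK_HZ) -> float:
    """Seconds until a free-running counter of the given width wraps."""
    return (1 << bits) / hz


def prescaled_tick_us(clock_hz: float, divider: int) -> float:
    """Tick period in microseconds after dividing the clock."""
    return 1e6 * divider / clock_hz


if __name__ == "__main__":
    print("32-bit wrap at 250 MHz:", round(wrap_seconds(32), 2), "s")
    print("64-bit wrap at 250 MHz:", round(wrap_seconds(64) / SECONDS_PER_YEAR), "years")
    print("250 MHz / 250 tick:", prescaled_tick_us(250e6, 250), "us")
```

So a 32-bit CNT at 250 MHz wraps roughly every 17 seconds, while a 64-bit count at the same rate would not wrap in any practical lifetime.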