I can't remember what REP + Interrupt ended up doing.
I think that's a mutually exclusive case, as the REP loads parallel cycle counting hardware, with address compare.
That also means you cannot nest REPs.
I think REP inside an interrupt is OK; what isn't allowed is an interrupt firing to break out of an active REP.
@jmg, yes. But I think we had a fix so that an interrupt could happen, but would be delayed until after the REP circuit completes its work.
You had asked about other delay causes, and I thought that might be one, and if so, worth some discussion as an exception to your idea of a max delay option.
In addition to the "use COG to marginalize jitter" case, we've got robust polling too. It's likely people will set up an I/O response COG with specialized code, tuned, cycle counted, whatever, packaged up to do the job, while other COGs are running more general-purpose code and/or drivers / objects.
@jmg
Would the interrupt jitter problem not be overcome by using another cog without stalling instructions? We do have the added advantage of 16 of those now.
Yes, but that comes with caveats.
Sometimes you need close coupling between main code and the low jitter interrupt, and using a whole COG is still wasteful.
It's already such that interrupts cannot occur on cycles where ALTI/ALTR/ALTD/ALTS are executing. In these cases, the interrupt is delayed by one instruction.
Thanks Chip, that's a relief for sure. I guess it goes without saying that I can see the same effect being desirable for this proposed SCL instruction.
Sometimes you need close coupling between main code and the low jitter interrupt, and using a whole COG is still wasteful.
All I can say is such practices were never the design intent of interrupts per se. Not even close. Well, not in CPUs at any rate.
I can see why coders of microcontrollers would want such a feature though. I guess the more complex a processor is then the less chance there is of achieving jitter free interrupts.
It's already such that interrupts cannot occur on cycles where ALTI/ALTR/ALTD/ALTS are executing. In these cases, the interrupt is delayed by one instruction.
Are there other sources of interrupt jitter?
The SETQ/SETQ2 instructions along with the following instruction will also delay interrupts. This could be a very long delay when used with RDLONG/WRLONG.
All trade-offs. Raising complexity for scenarios where there are good options needs careful consideration.
Using a whole COG is only wasteful in a full chip scenario.
That won't be the dominant norm. As for the close coupling, we've learned how to do that on P1. Part of this whole affair is learning to think in parallel and author things accordingly.
I can see why coders of microcontrollers would want such a feature though. I guess the more complex a processor is then the less chance there is of achieving jitter free interrupts.
I think the SX core achieved jitter-free interrupts, and you are correct that more complex processors have no chance of zero software jitter, though they can use FIFOs, DMA, and advanced peripherals to somewhat side-step the problem.
The P2 is closer to the microcontroller end.
....
Andy, you're right. That would be way better. I was thinking, too, that MAC was not ideal, given the overhead.
Do you think a fixed >> 15 would be appropriate?
We could have an instruction to set the SCL destination register.
Yes, a fixed arithmetic shift right by 15 makes most sense with a 16x16 multiplier. We can always use MULS with a following SAR if we need some special scaling.
I think a fixed result register for SCL is all we need. Mostly we just need the result as a source value for the next instruction, so SCL can even work like an ALT-type instruction, which changes just the src input of the following ALU instruction.
Andy
Andy,
I got this implemented. It turned out to be very simple to do.
I got rid of the C flag for MUL/MULS and used the two new opcode spaces for:
SCLU D,S/# 'unsigned scale, returns top 16 bits of unsigned multiply into next instruction's S value.
SCL D,S/# 'signed scale, returns top 18 bits of signed multiply into next instruction's S value.
By shifting arithmetically right by 14, instead of 15 for SCL, inputs and outputs $4000/$C000 become 1.0/-1.0, making scaling not-necessarily reductive, but inside of -2...+2.
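The arithmetic described above can be modeled in a few lines. This is a rough Python sketch, not the hardware: it assumes both operands are taken from the low 16 bits, that SCLU keeps the top 16 bits of the 32-bit unsigned product (a shift right by 16), and that SCL is the signed product shifted arithmetically right by 14. The function names are made up for illustration.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def sclu(d, s):
    """Unsigned scale (assumed model): top 16 bits of the 16x16 unsigned product."""
    return ((d & 0xFFFF) * (s & 0xFFFF)) >> 16


def scl(d, s):
    """Signed scale (assumed model): 16x16 signed product, arithmetic shift right by 14.

    Python's >> on negative integers is an arithmetic (flooring) shift,
    so no extra sign handling is needed here.
    """
    return (to_signed16(d) * to_signed16(s)) >> 14


if __name__ == "__main__":
    # $4000 acts as 1.0 and $C000 as -1.0, as described in the post.
    print(hex(scl(0x4000, 0x4000)))   # 1.0 * 1.0
    print(scl(0xC000, 0x4000))        # -1.0 * 1.0
    print(hex(sclu(0x8000, 0x8000)))  # 0.5 * 0.5 in unsigned 0.16 terms
```

With this model, the largest magnitude result comes from $8000 * $8000 (that is, -2.0 * -2.0), which needs the full 18 signed bits mentioned in the post.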
I can't remember what REP + Interrupt ended up doing.
Interrupts can be registered at any time, outside of the ISR, but they won't be able to execute until certain conditions are met:
wire int0_qual = exec &&        // cog must be executing user code
                 !altx &&       // alti/altr/altd/alts must not be executing
                 !sclx &&       // sclu/scl must not be executing
                 !augsp &&      // augs must not be executing or waiting
                 !augdp &&      // augd must not be executing or waiting
                 !setq &&       // setq must not be executing
                 !setq2 &&      // setq2 must not be executing
                 !rep &&        // rep must not be executing
                 !repa;         // rep must not be active

wire intn_qual = int0_qual &&   // all int0_qual conditions must be met
                 !stalli &&     // stalli must not be executing
                 !int_stall;    // interrupts must not be stalled
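For anyone who wants to play with these conditions outside of the Verilog, here is a small Python sketch that mirrors the two expressions above. The field names are copied from the listing (with `exec` renamed to `exec_` to keep the sketch readable), and the `CogState` class is purely hypothetical; it only models the gating logic, not when each signal is actually asserted in the cog.

```python
from dataclasses import dataclass


@dataclass
class CogState:
    """Hypothetical snapshot of the signals named in the listing."""
    exec_: bool = True       # cog is executing user code
    altx: bool = False       # ALTI/ALTR/ALTD/ALTS executing
    sclx: bool = False       # SCLU/SCL executing
    augsp: bool = False      # AUGS executing or waiting
    augdp: bool = False      # AUGD executing or waiting
    setq: bool = False       # SETQ executing
    setq2: bool = False      # SETQ2 executing
    rep: bool = False        # REP executing
    repa: bool = False       # REP block active
    stalli: bool = False     # STALLI executing
    int_stall: bool = False  # interrupts stalled


def int0_qual(s: CogState) -> bool:
    """Mirror of the int0_qual wire: every blocking condition must be clear."""
    blockers = (s.altx, s.sclx, s.augsp, s.augdp, s.setq, s.setq2, s.rep, s.repa)
    return s.exec_ and not any(blockers)


def intn_qual(s: CogState) -> bool:
    """Mirror of the intn_qual wire: int0_qual plus the two stall conditions."""
    return int0_qual(s) and not s.stalli and not s.int_stall


if __name__ == "__main__":
    print(intn_qual(CogState()))                # nothing blocking
    print(intn_qual(CogState(repa=True)))       # held off while a REP block is active
    print(int0_qual(CogState(int_stall=True)))  # int0_qual ignores the stall flags
```

Read this way, an active REP block (`repa`) simply holds off the interrupt until the block finishes, which matches the "delayed until after the REP completes" recollection earlier in the thread.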
Sounds very good
Can't wait to try it out.
The -2 ... +2 range for SCL is sometimes useful, but it also lowers the resolution of the coefficients to 15 bits signed in the -1 ... +1 range. Coefficients can get very small if you have high sampling frequencies and low filter frequencies.
But with a 16x16 bit multiplier we are limited to "not so high quality" DSP applications anyway, so just keep it as it is now. For audio synthesis, for example, it should be good enough.
If we need higher resolution we can always use the CORDIC multiplier. It will be much slower, but with pipelined MULT and interlaced ADDs it may be a good alternative.
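To put a rough number on the resolution concern, here is a small Python sketch. It assumes a simple one-pole low-pass of the form y += a * (x - y) with a = 1 - exp(-2*pi*fc/fs), which is only one possible filter and not something specified in this thread, and it assumes coefficients are stored with 14 fractional bits so that $4000 is 1.0. All names are illustrative.

```python
import math

FRAC_BITS = 14  # assumption: $4000 represents 1.0, so 14 fractional bits


def one_pole_coeff(fc, fs):
    """Coefficient of a one-pole low-pass (assumed example filter)."""
    return 1.0 - math.exp(-2.0 * math.pi * fc / fs)


def quantize(coeff):
    """Round a real coefficient to the nearest integer step of 2**-FRAC_BITS."""
    return round(coeff * (1 << FRAC_BITS))


def relative_error(coeff):
    """Relative error introduced by quantizing the coefficient."""
    stored = quantize(coeff) / (1 << FRAC_BITS)
    return abs(stored - coeff) / coeff


if __name__ == "__main__":
    fs = 48000.0
    for fc in (1000.0, 20.0, 1.0):
        a = one_pole_coeff(fc, fs)
        print(fc, quantize(a), relative_error(a))
```

At a 48 kHz sample rate, a 20 Hz corner in this model quantizes to only a few dozen steps, and a 1 Hz corner to a couple of steps, which is the kind of case where the coarser coefficient resolution starts to matter.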
Was just thinking that the 64-byte alignment requirement for streamer is a little cumbersome...
Not really a headache, but more of an annoyance...
I think I'd prefer it to loop after a given number of bytes instead of some number of 64-byte blocks.
Maybe there's some fundamental reason it has to be this way though...
The embedded bitmap is an example of the issue I have... You have to know the offset to the image data a priori to get it aligned...
Edit: hope nobody saw first version of this post... Got confused between streamer and rdfast again...
Maybe the streamer starting point alignment requirement could be waived for rdfast D=#0 case of no looping?
I still get confused between rdfast and streamer... I guess it's because the streamer uses rdfast...
Actually, maybe I see now that the reading point only has to be long aligned.
I think I knew that at one time and then got mixed up...
So, never mind...
I think bitmap data is long aligned, so it shouldn't be a problem...
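For what it's worth, checking where the pixel data of an embedded bitmap lands is easy to do on the PC side before building the image into a program. The Python sketch below assumes a standard BMP file, where the 32-bit little-endian pixel-data offset sits at byte 10 of the file header. The helper names and the `base_addr` parameter (the hub address where the file would be placed) are made up for illustration.

```python
import struct


def bmp_pixel_offset(data: bytes) -> int:
    """Return the pixel-data offset stored in a BMP file header."""
    if data[:2] != b"BM":
        raise ValueError("not a BMP file")
    (offset,) = struct.unpack_from("<I", data, 10)
    return offset


def padding_for_alignment(base_addr: int, pixel_offset: int, align: int) -> int:
    """Bytes of padding to put in front of the file so the pixel data
    starts on a multiple of `align` (4 for long alignment, 64 for blocks)."""
    return (-(base_addr + pixel_offset)) % align


if __name__ == "__main__":
    # Fake header: 'BM', 8 unused bytes, pixel offset 54, then a 40-byte info header.
    header = b"BM" + bytes(8) + struct.pack("<I", 54) + bytes(40)
    off = bmp_pixel_offset(header)
    print(off, padding_for_alignment(0, off, 4), padding_for_alignment(0, off, 64))
```

If only long alignment is needed, as concluded above, the padding is at most 3 bytes; the 64-byte case is included just for comparison.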
There it is. Somehow, I didn't see that in the docs.