FlexSpin and Interrupts

msrobots · 2022-07-06 03:07

I can use Interrupts in PASM Code with PropTool and FlexSpin.

But I cannot use interrupts in FlexSpin Spin2, Basic or C.

So, a question for @ersmith (or @Wuerfel_21, who understands his code): why is it that FlexSpin does not support interrupts, and can that be added?

I really prefer FlexSpin for my projects, but I would like to run some Spin (or Basic) routines in one of the available interrupts. That would help a lot in my current projects.

curious,

Mike

evanh · 2022-07-06 06:53

Filesystem support will need to be worked over for sure. The fast smartpin code I wrote assumes no interrupts and, as is, will break badly if a block transfer is interrupted. It gets a lot of its speed from unbroken SPI clocking.

That said, if I ever sort out using the streamer then that should be able to operate alongside interrupts.

Rayman · 2022-07-06 12:59

I think interrupts break the fifo and so don't work with hub exec type code.

But, any pasm driver cogs can use interrupts…

Yanomani · 2022-07-06 14:56

@Rayman said:
I think interrupts break the fifo and so don't work with hub exec type code.

But, any pasm driver cogs can use interrupts…

Fortunately, that will never occur unless the interrupt code itself relies on Hub execution.

Once it has started and is in sync with the Hub "pace", Hub exec can "serve" the Alu at the Sysclk rate, but even the fastest instructions can only "consume" it at the Sysclk/2 rate.

So during Hub execution, the Fifo interface can be seen as always running "ahead" of the rate at which the Alu is able to "consume" from it.

So, from the standpoint of Hub execution, keeping interrupt routine code within the limits of Cog register and/or Lut memory space is a win-win situation.

Hope it helps a bit.

Henrique

evanh · 2022-07-06 15:25

An IRQ will be a branch. Any branch into hubRAM automatically invokes the FIFO. Interrupts shouldn't be an issue living anywhere. The limitation on the use of the FIFO is explicitly attempting to use it for two jobs at once. eg: If hubexec is using it then the streamer can't without crashing the program.

Yanomani · 2022-07-06 19:51

@evanh said:
An IRQ will be a branch. Any branch into hubRAM automatically invokes the FIFO. Interrupts shouldn't be an issue living anywhere. The limitation on the use of the FIFO is explicitly attempting to use it for two jobs at once. eg: If hubexec is using it then the streamer can't without crashing the program.

You're right, as usual.

It was my fault; native-language-related, but still a fault...

I understood the "break the fifo" part of @Rayman's post in terms of breaking the rhythm (that is, breaking the pace), not in terms of any conflicting/crashing situation.

I was thinking of how the beginning of servicing the interrupt routine would force the Fifo to be redirected to another Hub address in order to "grab" the interrupt routine's first instruction; this normally consumes cycles until realignment with the Hub rotation is achieved.

And it will occur once more at the end (return) of the interrupt code, when the Fifo needs to be redirected again in order to "grab" the next instruction of the code that was interrupted.

Both situations (entering the interrupt service routine and returning from it) cause Fifo realignment with the Hub rotation, and hence a timing penalty.

Sorry for the misinterpretation.

Rayman · 2022-07-06 19:57

Ok, I remembered wrong...

I did find this post from @ersmith :

Flexprop's high level languages absolutely do not support ISRs, the generated code is not interrupt safe (e.g. it uses the CORDIC). If you need to use ISRs, I recommend putting them in a separate cog from the one running the HLL.

Wuerfel_21 · 2022-07-06 20:06

Simple (no CORDIC, etc) handwritten ISRs should actually work. You'll need to PUSHA/POPA any registers you clobber. Also, if FCACHE is enabled y

There's a tracking issue for real IRQ support (including high-level handlers); the main issue is that the mutable state of a cog is really quite large.

pik33 · 2022-07-06 21:52

PNut/PropTool compiles to bytecode, which is then interpreted by a short interpreter running in cog #0. In the case of an IRQ, it is the interpreter that is interrupted, not the program, and the interpreter's state can be much better defined than that of an unknown big piece of compiled code.

ersmith · 2022-07-06 23:41

@msrobots said:
So, a question for @ersmith (or @Wuerfel_21, who understands his code): why is it that FlexSpin does not support interrupts, and can that be added?

I really prefer FlexSpin for my projects, but I would like to run some Spin (or Basic) routines in one of the available interrupts. That would help a lot in my current projects.

Making FlexSpin generated code be re-entrant (and hence safe to use both inside and outside of interrupts) would be an enormous amount of work -- the code generator was never designed with that in mind (it was originally written for P1, where interrupts were not ever an issue). Off the top of my head there would be the following problems:

(1) We would have to give up CORDIC instructions, or else protect them from interrupts (delaying the interrupts)
(2) Lots of COG registers would have to be saved/restored: all of the arg* registers, result* registers, and var* registers
(3) Various other internal state would have to be saved/restored (the _muldiva registers, RESULT registers, and so on)
(4) FCACHE would have to be handled correctly
(5) We'd have to give up using the REP instruction in the code generator

All of this would cost quite a bit in performance, and it's by no means trivial to implement.

evanh · 2022-07-06 23:59

Bring on the 1 MB 16-Cog Prop2 I say.

JonnyMac · 2022-07-07 00:46

Let's get the current P2 documented and develop a full suite of useful drivers before we send Chip off on another 14-year adventure, shall we?

evanh · 2022-07-07 01:06

Oh, that's not a new design I'm referring to. It only takes money for OnSemi to re-spin the Prop2 onto 130 nm silicon.

Electrodude · 2022-07-07 01:12

@evanh said:
Oh, that's not a new design I'm referring to. It only takes money for OnSemi to re-spin the Prop2 onto 130 nm silicon.

Sure, the digital stuff is all parametric, but what about redesigning all the analog parts for a different process? Is that just a matter of manual layout, or would Chip's schematics have to change as well?

evanh · 2022-07-07 01:15

Schematic, no. Although Chip will want to fix the crystal oscillator, sysclock source select, and ADC flaws.

Treehouse will need to fit the primitives to the new process, yes. More money of course. Chip's involvement is minimal though.

pik33 · 2022-07-07 07:12

@evanh said:
Bring on the 1 MB 16-Cog Prop2 I say.

And/or a "P2 Plus" with (PS)RAM interface added to the HUB and (a lot of) (PS)RAMs glued on top

But then

Let's get the current P2 documented and develop a full suite of useful drivers before we send Chip off on another 14-year adventure, shall we?

This is the highest-priority task for the Propeller family. This chip can do a lot of things, but you have to know how to write the code for it. I have been playing with P2s for more than a year, and I have P1 experience, which helped a lot, so "cog", "hub", SPIN, the overall program model with 8 parallel processors with limited internal RAM, etc., were not new to me. Still, there are a lot of things I don't know, and the descriptions are hard to find and scattered across the current documentation files. My current question is: how often can I call the CORDIC from one cog? Do I have to wait 55 clocks, get the result and set a new operation, or maybe I can set a loop, calling the CORDIC every 8 clocks and receiving the result from 7th call before? The experiment will tell but not everyone wants to experiment and then the set of examples and the list of things that can and cannot be done could save time for these experiments.

evanh · 2022-07-07 09:19

@pik33 said:
And/or a "P2 Plus" with (PS)RAM interface added to the HUB and (a lot of) (PS)RAMs glued on top

Not for me, thanks. That would need a fully implemented hardware cache (expect hubRAM to halve in size) so as to handle burst reads/writes and reduce the resulting read-modify-write thrashing. Latency will still be high when compared to native DRAM.

ManAtWork · 2022-07-07 09:49

@ersmith said:
(1) We would have to give up CORDIC instructions, or else protect them from interrupts (delaying the interrupts)

Not exactly. We would only have to give up pipelined use of the CORDIC, i.e. starting multiple Q.... instructions before GETQX/Y. Using only a single CORDIC operation at a time is not disturbed by interrupts, as long as they don't use the CORDIC themselves.

But yes, you have to be careful about what's inside the ISR. A simple, hand-written ASM ISR that interrupts the compiled main program isn't a problem. Using compiled ISRs would be difficult for reasons (2) to (4), I agree. (5) (REP) does no harm as long as you're aware that it delays/stalls the ISR execution.

ManAtWork · 2022-07-07 10:01

@pik33 said:
My current question is: how often can I call the CORDIC from one cog? Do I have to wait 55 clocks, get the result and set a new operation, or maybe I can set a loop, calling the CORDIC every 8 clocks and receiving the result from 7th call before? The experiment will tell but not everyone wants to experiment and then the set of examples and the list of things that can and cannot be done could save time for these experiments.

This works fully automagically. You can start up to 7 CORDIC operations before the first GETQX/Y (as long as you don't use interrupts which can mess up the timing). See example here

Using the CORDIC solver non-pipelined, e.g. single operation at a time only, is fine even when using interrupts. I've tested that and it works. See discussion here

Wuerfel_21 · 2022-07-07 10:04

I'm pretty sure the state of non-piped CORDIC can be saved/restored using a sequence such as:

' IRQ entry here
    pollqmt wc ' clear event flag
    getqx rx
    getqy ry
    pollqmt wc
    wrc rq

' Do interrupt stuff

    tjnz rq,#.no_cordic
    setq ry
    qrotate rx,#0
.no_cordic
' Return from IRQ

pik33 · 2022-07-07 10:24

My current question is: how often can I call the CORDIC from one cog? Do I have to wait 55 clocks, get the result and set a new operation, or maybe I can set a loop, calling the CORDIC every 8 clocks and receiving the result from 7th call before? The experiment will tell

I found I can call the cordic fast. Time to try this

pik33 · 2022-07-07 12:03

@pik33 said:

My current question is: how often can I call the CORDIC from one cog? Do I have to wait 55 clocks, get the result and set a new operation, or maybe I can set a loop, calling the CORDIC every 8 clocks and receiving the result from 7th call before? The experiment will tell

I found I can call the cordic fast. Time to try this

This works fully automagically. You can start up to 7 CORDIC operations before the first GETQX/Y (as long as you don't use interrupts which can mess up the timing). See example here

I always had in mind that the CORDIC is in most cases too slow (55 clocks, while reading from HUB is up to 17 clocks and from LUT 3 clocks). Now I have tested the pipelined operation: it computes 12 sines in 166 clocks with a partially unrolled loop. That is 14 clocks per sample, and it should drop further when there is more to compute in one loop. The asymptotic value is 8 clocks per operation. Maybe this means a 6-op FM synth with 16-voice polyphony (something like a DX7, but with perfect sine waves and DACs) can fit in one cog this way. The CORDIC can also calculate all the exponential/logarithmic stuff needed for envelopes.

A test loop:

asm shared

cordic  

p101   getct   time1
       qrotate a1000,angle1
       add angle1,delta1
       qrotate a1000,angle2
       add angle2,delta2
       qrotate a1000,angle3
       add angle3,delta3
       qrotate a1000,angle4
       add angle4,delta4
       qrotate a1000,angle5
       add angle5,delta5
       qrotate a1000,angle6
       add angle6,delta6
       getqx result1
       qrotate a1000,angle1
       add angle1,delta1       
       getqx result2
       qrotate a1000,angle2
       add angle2,delta2      
       getqx result3
       qrotate a1000,angle3
       add angle3,delta3       
       getqx result4
       qrotate a1000,angle4
       add angle4,delta4       
       getqx result5
       qrotate a1000,angle5
       add angle5,delta5       
       getqx result6
       qrotate a1000,angle6
       add angle6,delta6         
       getqx result11
       getqx result12
       getqx result13
       getqx result14
       getqx result15
       getqx result16
       getct time2
       sub time2,time1

       wrlong result1,#$30
       wrlong result2,#$34
       wrlong result3,#$38
       wrlong result4,#$3c
       wrlong result5,#$40
       wrlong result6,#$44
       wrlong result11,#$48
       wrlong result12,#$4c
       wrlong result13,#$50
       wrlong result14,#$54
       wrlong result15,#$58
       wrlong result16,#$5c
       wrlong time2,#$60
       waitx ##2_999_999       

       jmp #p101

a1000 long $10000
angle1 long 0
angle2 long 0
angle3 long 0
angle4 long 0
angle5 long 0
angle6 long 0
delta1 long $00800000
delta2 long $00900000
delta3 long $00a00000
delta4 long $00b00000
delta5 long $00c00000
delta6 long $00d00000
result1 long 0
result2 long 0
result3 long 0
result4 long 0
result5 long 0
result6 long 0
result11 long 0
result12 long 0
result13 long 0
result14 long 0
result15 long 0
result16 long 0

time1 long 0
time2 long 0

end asm
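
As a rough cross-check of the figures quoted with this test loop, here is a minimal Python sketch (not P2 code). It assumes the 55-clock latency and the 8-clock issue interval mentioned in this thread, plus an idealized pipeline that issues one operation per slot. The function names are hypothetical, and the model ignores loop overhead, so it is not expected to reproduce the measured 166 clocks exactly.

```python
CORDIC_LATENCY = 55   # clocks from issuing an operation to its result (figure quoted in the thread)
ISSUE_INTERVAL = 8    # clocks between operations issued by one cog (figure quoted in the thread)


def amortized_clocks(total_clocks, results):
    """Average cost per result for a measured run."""
    return total_clocks / results


def ideal_pipelined_clocks(results, latency=CORDIC_LATENCY, interval=ISSUE_INTERVAL):
    """Idealized total: the first result costs the full latency,
    every further result arrives one issue interval later."""
    return latency + (results - 1) * interval


if __name__ == "__main__":
    # Measured run from the post: 12 sines in 166 clocks.
    print("measured clocks per result:", round(amortized_clocks(166, 12), 2))

    # The idealized per-result cost approaches the issue interval as the run gets longer.
    for n in (1, 12, 96, 1000):
        per_result = ideal_pipelined_clocks(n) / n
        print(n, "results:", round(per_result, 2), "clocks each")
```

Under these assumptions, 12 results would ideally take 55 + 11 * 8 = 143 clocks, and the per-result cost tends toward 8 clocks as the number of results grows, which is the asymptotic value mentioned above.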

evanh · 2022-07-07 12:30

In practice, it's actually hard to fit the required numerical shuffling into just four instructions, especially for long buffers. Using a 16-clock interval (every second pipeline slot) produces better pacing.
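
To make the instruction budget concrete, here is a small Python sketch (not P2 code). It assumes two clocks per instruction, the Sysclk/2 rate mentioned earlier in the thread, and no stalls; the names are hypothetical.

```python
CLOCKS_PER_INSTRUCTION = 2  # assumption: best-case cog instruction timing, no stalls


def instructions_per_slot(interval_clocks, clocks_per_instruction=CLOCKS_PER_INSTRUCTION):
    """Total instructions that fit between two CORDIC issue points."""
    return interval_clocks // clocks_per_instruction


if __name__ == "__main__":
    for interval in (8, 16):
        print(interval, "clocks:", instructions_per_slot(interval), "instructions")
```

With an 8-clock interval the budget is four instructions in total; doubling the interval to 16 clocks doubles it to eight.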

ersmith · 2022-07-07 15:05

@ManAtWork said:

@ersmith said:
(1) We would have to give up CORDIC instructions, or else protect them from interrupts (delaying the interrupts)

Not exactly. We would only have to give up pipelined use of the CORDIC, i.e. starting multiple Q.... instructions before GETQX/Y. Using only a single CORDIC operation at a time is not disturbed by interrupts, as long as they don't use the CORDIC themselves.

Sorry, I should have been more clear. The compiler would have to give up generating CORDIC instructions, at least if we wanted to use high level languages for ISRs (which the OP was asking about). Although perhaps something like Ada's qrotate hack would allow us to save and restore the CORDIC state; we'd have to do some testing.

rogloh · 2022-07-07 15:26

What happens if you want to use the Q register in your ISR, and you need to save the current SETQ state? E.g., if it is currently being used for MUXQ operations when the interrupt hits.

Is there a good way to read the current Q so you can restore it afterwards with another SETQ?
Maybe QROTATE #0,#0 then a GETQY D and later SETQ D? But this adds a lot of ISR entry overhead and may have to compete with other CORDIC operations in progress at the time of the ISR.

EDIT: actually it doesn't seem like that would work, as it uses 0 for Y if the SETQ prefix is not used prior to the QROTATE operation, not some existing Q from an earlier SETQ.

Wuerfel_21 · 2022-07-07 15:36

you can save Q by doing

    mov tmp,#0
    muxq tmp,##-1

but the compiler never generates code that relies on a stale Q register, so whatever (in fact, does it ever use MUXQ? Certainly not for the compiled code)
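
Why the two-instruction sequence recovers Q: MUXQ copies bits from S into D wherever the corresponding Q bit is 1, so muxing all-ones into a zeroed register leaves a copy of Q. Below is a small Python model of that bit logic (not P2 code). The function names are hypothetical, and the model assumes MUXQ behaves as D = (D AND NOT Q) OR (S AND Q) on 32-bit values.

```python
MASK32 = 0xFFFF_FFFF


def muxq(d, s, q):
    """Model of MUXQ D,S: for each 1 bit in Q, take the bit from S; otherwise keep D's bit."""
    return ((d & ~q) | (s & q)) & MASK32


def save_q(q):
    """Model of: mov tmp,#0 / muxq tmp,##-1"""
    tmp = 0                        # mov  tmp,#0
    tmp = muxq(tmp, MASK32, q)     # muxq tmp,##-1  (##-1 is all ones)
    return tmp


if __name__ == "__main__":
    for q in (0x0000_0000, 0x0000_00FF, 0xDEAD_BEEF, 0xFFFF_FFFF):
        print(hex(q), "->", hex(save_q(q)))
```

The saved value could later be put back with SETQ, as suggested in the question above.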

Electrodude · 2022-07-07 15:38

(Never mind; Ada beat me to it.)

rogloh · 2022-07-07 15:39

@Wuerfel_21 said:
you can save Q by doing
    mov tmp,#0
    muxq tmp,##-1
but the compiler never generates code that relies on a stale Q register, so whatever (in fact, does it ever use MUXQ? Certainly not for the compiled code)

Very nice.
