You should use "mov ijmp1, ##_my_fancy_isr". The value of _my_fancy_isr is greater than 0x3ff since the hub code starts at 0x400.
EDIT: Each C file gets compiled into an object file starting at 0x400. The linker combines all the object files together, and relocates the object files so they fit in hub memory one after another.
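To make the relocation step concrete, here is a rough C sketch of the bookkeeping the linker is described as doing. This is not the real p2gcc linker: the function names, the object sizes and the 4-byte alignment are all assumptions made for illustration. Only the 0x400 start address comes from the post above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HUB_CODE_START 0x400u  /* from the post: hub code starts at 0x400 */

/* Round a size up to a multiple of 4 bytes (assumed alignment, illustration only). */
uint32_t align4(uint32_t size)
{
    return (size + 3u) & ~3u;
}

/* Place objects one after another in hub memory.
 * bases[i] receives the final start address of object i.
 * Returns the first free address after the last object. */
uint32_t place_objects(const uint32_t *sizes, size_t count, uint32_t *bases)
{
    uint32_t next = HUB_CODE_START;
    assert(count == 0 || (sizes != NULL && bases != NULL));
    for (size_t i = 0; i < count; i++) {
        bases[i] = next;
        next += align4(sizes[i]);
    }
    return next;
}

/* Every object is compiled as if it started at 0x400, so a symbol's
 * final address is its compiled address shifted to the object's new base. */
uint32_t relocate_symbol(uint32_t compiled_addr, uint32_t final_base)
{
    assert(compiled_addr >= HUB_CODE_START);
    return compiled_addr - HUB_CODE_START + final_base;
}
```

With three hypothetical objects of 0x100, 0x42 and 0x20 bytes, the second object lands at 0x500, and a symbol compiled at 0x410 inside it ends up at 0x510. Either way the final address is above 0x3ff, which is why the 9-bit "#" form cannot hold it.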
WOOHOO! Just had to change my set_isr macro and it's working
So #asdf refers to a symbol "asdf" in cog memory and ##asdf refers to a symbol in hub memory? This would be part of the Spin2/PASM language standard then, is that the right vocabulary?
No, #asdf does not refer to cog memory. The "mov ijmp1, #_my_fancy_isr" instruction generates a single instruction that loads the 9 LSBs of _my_fancy_isr into the ijmp1 register. "mov ijmp1, ##_my_fancy_isr" actually generates 2 instructions that load the 32-bit value of _my_fancy_isr into ijmp1. The first instruction is an AUGS that gets the 23 MSBs of _my_fancy_isr, and the second instruction is essentially "mov ijmp1, #_my_fancy_isr & 511". These two instructions work together to load the full 32 bits of _my_fancy_isr into ijmp1.
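The split is easy to check with a few lines of C. This is only a model of the arithmetic, not assembler output. The helper names are made up for illustration, and the example addresses are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define IMM9_MASK 0x1FFu  /* the 9-bit immediate field, i.e. "& 511" */

/* The 23 MSBs that the AUGS prefix carries. */
uint32_t augs_bits(uint32_t value)
{
    return value >> 9;
}

/* The 9 LSBs that fit in the mov instruction itself. */
uint32_t imm9_bits(uint32_t value)
{
    return value & IMM9_MASK;
}

/* What "mov reg, #value" leaves in the register: only the low 9 bits. */
uint32_t mov_short(uint32_t value)
{
    return imm9_bits(value);
}

/* What "mov reg, ##value" leaves in the register: AUGS bits plus low 9 bits. */
uint32_t mov_long(uint32_t value)
{
    uint32_t hi = augs_bits(value);
    assert(hi <= 0x7FFFFFu);  /* 23 bits at most */
    return (hi << 9) | imm9_bits(value);
}
```

For a hypothetical ISR at hub address 0x400, mov_short(0x400) gives 0 while mov_long(0x400) gives 0x400, so the single "#" form would point ijmp1 at address 0 instead of the ISR.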
My whole understanding of C/C++ compilers is getting thrown for a loop. I thought, upon entering a function, all the local registers were pushed to the top of the stack and before returning they were popped off again, allowing the calling function to have its state saved and restored. But I'm not seeing any of that in the code generated from PropGCC. How does PropGCC avoid clobbering local variables when it calls a function???
So my post is two questions:
1) How the heck does this work
and
2) Assuming it works by "magic" and said "magic" will break if an ISR is called in the middle of this function executing, can p2gcc wrap my ISR function logic with push/pop instructions? Maybe it could search for symbols starting with "_isr" and insert push instructions at the top. That seems "easy" enough. Then maybe p2gcc could find the "jmp lr" and replace it with pop instructions and a reti<X>? How to determine X though... maybe further restrictions on the function name, such as "function must start with isr_X" where "X" is a number 1-3?
One temporary option is that I write two macros which push & pop all the registers. This methodology would be a bit wasteful though, because in the C code I wouldn't know which registers actually need to be saved. p2gcc could, potentially, look at the function implementation and only save those registers which are going to be clobbered.
Dave Hein and David Zemon, thank you for addressing the subject of interrupts. This is currently appreciated and will be much more appreciated by me in the future.
The first 6 arguments are placed in registers r0 to r5. If there are more than 6 arguments the rest of the arguments are put on the stack. This makes function calling more efficient.
For functions with a variable number of arguments, the extra arguments plus the one previous to them are all put on the stack. As an example, for printf, all of the arguments are on the stack. For sprintf, the first argument is in r0, and the rest are on the stack.
r0 is used to return a single 32-bit value. r0 and r1 are used to return a 64-bit value, such as double.
If a function uses a pointer to an argument it must copy the argument to memory, and use the memory copy instead of the value in the register.
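As a sanity check on those rules, here is a small C model of where arguments end up. It only encodes what is stated above and is not taken from the PropGCC sources. The function names are hypothetical, and capping the named arguments of a variadic function at six registers is my assumption.

```c
#include <assert.h>

#define NUM_ARG_REGS 6  /* r0..r5 */

/* total_args: number of arguments actually passed at the call site.
 * named_args: number of declared (non-"...") parameters.
 * Returns how many arguments travel in registers. */
int args_in_registers(int total_args, int named_args, int is_variadic)
{
    assert(named_args >= 0 && total_args >= named_args);
    /* Variadic: the last named argument joins the extras on the stack. */
    int candidates = is_variadic ? named_args - 1 : total_args;
    if (candidates < 0)
        candidates = 0;
    return candidates < NUM_ARG_REGS ? candidates : NUM_ARG_REGS;
}

/* Whatever does not fit in registers goes on the stack. */
int args_on_stack(int total_args, int named_args, int is_variadic)
{
    return total_args - args_in_registers(total_args, named_args, is_variadic);
}
```

For example, a plain function with 8 arguments gets 6 in registers and 2 on the stack, printf(fmt, a, b) gets all 3 on the stack, and sprintf(buf, fmt, a) gets buf in r0 with the other 2 on the stack.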
Also, PropGCC uses 15 general purpose registers called r0 to r14. There are 2 special purpose registers called sp and lr, which are the stack pointer and link register. The link register holds the return address, and a function saves it if it needs to call another function. Of the registers r0 through r14, some can be modified and some must be preserved. I'm pretty sure r0 through r5 don't need to be preserved. I don't know if there are any others that can be changed without restoring their original values.
So we have the scenario of func1(...) calling func2(...). Am I correct in understanding that, prior to invoking func2(), PropGCC will ensure that none of r0-r14 are being used for anything other than arguments into func2? PropGCC is going to ensure any local variables that it created and stored in r0-r14 which were needed before func2 and will be needed again after func2 have been saved onto the stack for safe keeping?
And that does seem to confirm that we both need and should be able to push all the registers onto the stack at the beginning of an ISR and pop them off when we're done. Alternatively, we could "switch register banks." That might be an interesting thought exercise: is it feasible for p2gcc to implement switchable register banks on top of PropGCC and, for the future, how hard is it to implement switchable register banks in PropGCC itself? Seems much more complicated, but it would save a lot of CPU time not having to push/pop for each ISR.
My whole understanding of C/C++ compilers is getting thrown for a loop. I thought, upon entering a function, all the local registers were pushed to the top of the stack and before returning they were popped off again
Right, a quick run down on calling conventions in high level languages and compiling for them:
Typically the code generator for a particular target architecture will assign the machine registers into several
disjoint sets, caller-save, callee-save and scratch/temporary. When function a() calls function b(), a is the
caller, b is the callee.
The protocol is that a procedure/function's code has to save and restore the callee-save registers it touches,
they are strictly property of the calling function.
Caller-save registers are assumed to always be trashed by procedure/function calls (unless global dataflow
analysis indicates otherwise), so must be saved/restored by the caller across calls (if live at that point).
A live register is one whose value is needed later in at least one execution path.
Temporary/scratch registers are for use within a code fragment, they never need to be saved and are assumed
trashed by any callee. Special purpose machine registers and flags are typical examples, and sometimes a
couple of general purpose registers are set aside for complex fragments.
Interrupt handlers have a different calling convention in which all registers (and flags) are callee-save.
You could make all registers caller-save, or you could make all callee-save, but neither performs as well
as using some of each - what you are striving for is that the leaf functions and the next level up (which is
where code spends most of its time) are typically avoiding unnecessary stack-traffic - each caller-save
register is free for the leaf function to use, each callee-save is free for the next level up functions to use...
Higher up in the call levels you get less benefit, but leaf calls are usually where a program spends most time.
When one language can call into another (like C to assembler), the calling conventions may differ, and such
"foreign" calls require extra code fragments to convert between conventions.
There is of course more to it than this, as usually a set of the caller-save/temp registers are also the
argument/return registers, and many architectures have complicated rules about which registers arguments go into
when there is a mix of types and sizes, and when the number of arguments means the stack is also involved
in passing arguments.
Heavily optimizing compilers can do more advanced things, like assign different calling conventions to
different functions, and inlining leaf functions automatically, based on analysis of the inner loops and
the call tree - however anything visible to the linker has to use the standard calling conventions.
Only r0-r5 are used for passing arguments, and as I said, I believe these registers do not need to be preserved. I know that the higher registers do need to be preserved, but I don't know which register this starts at. One way to determine this is to look at generated assembly code and see what it does. That's how I figured out the calling argument rules.
A C ISR needs to save all the registers on entry, and restore all the registers on exit. With the eggbeater architecture this can be done quite efficiently with the streaming instructions. It takes about the same number of cycles to store all 15 registers as it would to store 2 or 3 registers separately.
A C ISR needs to save all the registers on entry, and restore all the registers on exit.
Makes sense that normal functions wouldn't need to save off r0-r5 - for my purposes, I only care about ISRs though. I trust PropGCC to do the right thing everywhere else and don't want to mess with it. But I do want to be able to use a high-level language for my ISRs.
With the eggbeater architecture this can be done quite efficiently with the streaming instructions. It takes about the same number of cycles to store all 15 registers as it would to store 2 or 3 registers separately.
I haven't gotten far enough in my explorations to know how to do this yet. Right now, I'm trying to make the simple-but-ugly solution work, by sticking this at the end of prefix.spin2:
__PUSH_FRAME
        mov     isr_bak_r0, r0
        mov     isr_bak_r1, r1
        mov     isr_bak_r2, r2
        mov     isr_bak_r3, r3
        mov     isr_bak_r4, r4
        mov     isr_bak_r5, r5
        mov     isr_bak_r6, r6
        mov     isr_bak_r7, r7
        mov     isr_bak_r8, r8
        mov     isr_bak_r9, r9
        mov     isr_bak_r10, r10
        mov     isr_bak_r11, r11
        mov     isr_bak_r12, r12
        mov     isr_bak_r13, r13
        mov     isr_bak_r14, r14
        mov     isr_bak_sp, sp
        mov     isr_bak_temp, temp
        mov     isr_bak_temp1, temp1
        mov     isr_bak_temp2, temp2
        ret
__POP_FRAME
        mov     r0, isr_bak_r0
        mov     r1, isr_bak_r1
        mov     r2, isr_bak_r2
        mov     r3, isr_bak_r3
        mov     r4, isr_bak_r4
        mov     r5, isr_bak_r5
        mov     r6, isr_bak_r6
        mov     r7, isr_bak_r7
        mov     r8, isr_bak_r8
        mov     r9, isr_bak_r9
        mov     r10, isr_bak_r10
        mov     r11, isr_bak_r11
        mov     r12, isr_bak_r12
        mov     r13, isr_bak_r13
        mov     r14, isr_bak_r14
        mov     sp, isr_bak_sp
        mov     temp, isr_bak_temp
        mov     temp1, isr_bak_temp1
        mov     temp2, isr_bak_temp2
        ret
isr_bak_r0      long    0
isr_bak_r1      long    0
isr_bak_r2      long    0
isr_bak_r3      long    0
isr_bak_r4      long    0
isr_bak_r5      long    0
isr_bak_r6      long    0
isr_bak_r7      long    0
isr_bak_r8      long    0
isr_bak_r9      long    0
isr_bak_r10     long    0
isr_bak_r11     long    0
isr_bak_r12     long    0
isr_bak_r13     long    0
isr_bak_r14     long    0
isr_bak_sp      long    0
isr_bak_temp    long    0
isr_bak_temp1   long    0
isr_bak_temp2   long    0
Of course... that's not gonna be good when the second interrupt fires in the middle of the first .... but that's a problem for another day (hopefully by the time I get that far, either I'll know how to use the eggbeater or you'll have come to my rescue and provided the necessary code).
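To show why a second interrupt is a problem for a single set of isr_bak_* longs, here is a small simulation in C. It is only a model of the idea and does not run on the P2. The RegFile type, the frame array and the values 111/222/333 are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 15   /* r0..r14 */
#define MAX_NEST 3    /* hypothetical limit: one frame per interrupt level */

typedef struct { uint32_t r[NUM_REGS]; } RegFile;

static RegFile single_slot;       /* plays the role of isr_bak_r0..isr_bak_r14 */
static RegFile frames[MAX_NEST];  /* one backup frame per nesting level */
static int depth = 0;

void push_single(const RegFile *regs) { single_slot = *regs; }
void pop_single(RegFile *regs)        { *regs = single_slot; }

void push_stacked(const RegFile *regs)
{
    assert(depth < MAX_NEST);
    frames[depth++] = *regs;
}

void pop_stacked(RegFile *regs)
{
    assert(depth > 0);
    *regs = frames[--depth];
}

/* Main code holds 111 in r0; ISR 1 is interrupted by ISR 2.
 * Returns the r0 that main code sees afterwards. */
uint32_t nested_single_slot(void)
{
    RegFile regs = {{0}};
    regs.r[0] = 111;        /* value the main code needs back */
    push_single(&regs);     /* ISR 1 entry */
    regs.r[0] = 222;        /* ISR 1 working value */
    push_single(&regs);     /* ISR 2 entry overwrites the only backup */
    regs.r[0] = 333;        /* ISR 2 working value */
    pop_single(&regs);      /* ISR 2 exit: r0 is 222 again */
    pop_single(&regs);      /* ISR 1 exit: still 222, the 111 is lost */
    return regs.r[0];
}

/* Same sequence, but each entry gets its own frame. */
uint32_t nested_stacked(void)
{
    RegFile regs = {{0}};
    regs.r[0] = 111;
    push_stacked(&regs);    /* ISR 1 entry */
    regs.r[0] = 222;
    push_stacked(&regs);    /* ISR 2 entry */
    regs.r[0] = 333;
    pop_stacked(&regs);     /* ISR 2 exit: r0 is 222 */
    pop_stacked(&regs);     /* ISR 1 exit: r0 is 111 */
    return regs.r[0];
}
```

With the single backup area the main code gets 222 back instead of 111. With one frame per nesting level (or a real stack) it gets its 111 back.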
So I paired the above with a couple macros in my headers:
Annnddd if I did everything right, that should work, right? Not that it's necessary for this ISR, but I'm starting here since I know it already works and then will move it to my serial object. But it's not working. I'll bang on this a bit more before I try to post a simple and complete example which demonstrates the problem.
In PropGCC, r0-r7 are scratch registers and are not saved across function calls. If a function only uses those registers then it won't have to do any pushes or pops. We also use r0-r5 for passing arguments and for returning values.
The definitions for all of these things are in the gcc sources, gcc/gcc/config/propeller.h.
Made some progress this evening. I figured out that I can't just put my __PUSH_FRAME and __POP_FRAME functions at the end of prefix.spin2 (right before the "orgh $400" line). I'm thinking that's because my application code is overwriting those functions at link time (or something???). So I stuck a "org 400" right above the "__PUSH_FRAME" label, and at least now the behavior is consistent whether I put my ISR definition at the top or bottom of blinky.c. Unfortunately, that "consistent" behavior is not the correct behavior. All of my application is running perfectly except the ISR... it's as if the ISR is being skipped entirely. It's almost as if the "ret" instruction in __PUSH_FRAME is returning from the ISR rather than returning back to the ISR. So that led me to try inserting the push/pop functions inline in the ISR in blinky.c. Of course... that quickly failed because the mov instructions were being rewritten as wrlong/rdlong. Oh well.... to p2asm I go!
So I combined prefix.spin with some hand-written code and it's working just fine. One key difference is that I removed the "org $400" from between prefix.spin2 and my code. Without that, I was getting a bunch of "illegal literals," which I suppose makes sense.
But this was at least enough to confirm that call/ret is allowed in ISR blocks. I was starting to worry that I missed something, and that the "ret" instruction in __PUSH_FRAME was going all the way back to the main code rather than back to the ISR.
Anyway, my sadly broken code is pushed to git. Full commit here, blinky.c with the ISR here, working test_int.p2asm here, and I've attached my modified prefix.spin2 to the post.
So taking this and all the other comments above into consideration, is the following correct?
It is up to the calling function to ensure that r0-r7 are stashed away somewhere if it makes a call to another function but still needs those values. And it is up to the callee function to ensure r8-r14 are saved off and restored prior to modifying them, if it needs to modify them?
And, of course, an ISR needs to save off and restore any SPR that will be modified.
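Here is that summary written out as bitmask logic in C, in case it helps to see the three cases side by side. The masks follow the r0-r7 / r8-r14 split described above. The helper names are hypothetical, and sp, lr, the flags and the other SPRs are deliberately left out of the model.

```c
#include <assert.h>
#include <stdint.h>

#define REG_BIT(n)    (1u << (n))
#define CALLER_SAVED  0x00FFu  /* r0..r7: scratch, trashed by any call */
#define CALLEE_SAVED  0x7F00u  /* r8..r14: must be preserved by the callee */

/* A normal function only has to preserve the callee-save registers it modifies. */
uint32_t callee_must_save(uint32_t modified)
{
    return modified & CALLEE_SAVED;
}

/* A caller has to stash the scratch registers that are still live across a call. */
uint32_t caller_must_save(uint32_t live_across_call)
{
    return live_across_call & CALLER_SAVED;
}

/* An ISR interrupts arbitrary code, so every register it modifies must be preserved. */
uint32_t isr_must_save(uint32_t modified)
{
    assert((CALLER_SAVED & CALLEE_SAVED) == 0);  /* the two sets are disjoint */
    return modified & (CALLER_SAVED | CALLEE_SAVED);
}

/* Count the registers named in a mask. */
int count_regs(uint32_t mask)
{
    int n = 0;
    while (mask) {
        n += (int)(mask & 1u);
        mask >>= 1;
    }
    return n;
}
```

A function that modifies r0 and r9 only has to save r9 when it is called normally, but both when it runs as an ISR. If the ISR itself calls other functions, it has to assume all of r0-r7 may be trashed by them and save those as well.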
Dave,
loadp2 is still not right for me. I've got it working reliably by outright replacing the call to findp2() with serial_init().
Testing has me a little confused for the moment. There was always an intermittent "Could not find a P2" error, even with the ES silicon, but when changing back to testing on the FPGA it has some sort of resetting after load problem again. But not consistently, sometimes it works fine.
That is with findp2(). With just serial_init() it seems to be 100% reliable on both chips.
Here's the actual change I've made:
/*
    // Find a P2 on one of the serial ports, or on the specified port
    if (!findp2(PORT_PREFIX, LOADER_BAUD, port))
    {
        printf("Could not find a P2\n");
        exit(1);
    }
*/
    if (1 != serial_init(port, LOADER_BAUD))
    {
        printf("Could not open port %s\n", argv[1]);
        exit(1);
    }
The change is just a copy'n'paste from an older version of loadp2.c. I know it's not suitable as a general solution, but it should let you know where the problem is.
I have the DE2-115 and the P2 Eval board. My development machines are a Windows 10 box and a Linux box running Xubuntu.
I suppose I could add another flag that avoids findp2, and just does a serial_init without testing for a P2 chip. I'm away from home right now, so I won't be able to get to this until Saturday.
I'm flat out of ideas. I've spent another evening trying to find a way to push/pop all the SPRs while in an interrupt without using any of the SPRs and everything I've tried has failed. Every time I try to modify prefix.spin2, the app behaves "strangely." I'm guessing a hardcoded address somewhere and code is getting clobbered? And nothing I've tried in C has produced the desired .spin2 file. The closest I've come is this mess:
which produces an extra rdlong instruction for each mov and, even if I didn't care about the wasted instructions, it's using one of the registers that I'm trying to prevent from being clobbered.
Umm, my advice is don't try to. ISRs in the propeller, like "objects" in the Obex, should be integral to the program running in that cog. Each use is custom to suit the job. Mostly that means everything is global within that cog.
Conventions for utilisation of hubRAM might be a good idea though.
I have a question: why doesn't the P2 use DFU mode to load code? This seems to be a standard used by a number of chip makers to load code into memory.
This is why the P2 doesn't use DFU mode to load code.
From Wikipedia:
DFU or Device Firmware Upgrade mode allows all devices to be restored from any state. It is essentially a mode where the BootROM can accept iBSS. DFU is part of the SecureROM which is burned into the hardware, so it cannot be removed. On A7+ devices, it generates an ApNonce and recognizes APTickets as well, so even in DFU, it can accept an APTicket.
Comments
This works great and blinks p58 at 10 Hz while 56 and 57 blink at 1 Hz.
This is the code I would like to write in C
But it's not working at all. For reference, here's the exact commit for both blinky.c as well as common.h, not included in this post.
At the bottom of the generated .spin2 file I see that my main code is getting launched into HUB RAM. Is that really the case?
If that is truly the case, would that not prevent the ISR from working correctly? I do get this warning from p2asm:
I can only find V7
Wow. Impressive. Thanks for the explanation.
For instance, here's a C function:
and here's the .s file from PropGCC
Nothing to do with pushing/popping the stack.
Upon return those registers are reloaded with new values.
No stack is required.
Mike
and an ISR
Yes, that is correct.