P2 JMP/CALL addresses to COG/LUT/HUB
Cluso99
Posts: 18,069
in Propeller 2
I haven't tested this so I am unsure of the precise way the jmp/call addresses work with respect to cog, lut and hub addresses.
Let me explain. The silicon checks the address: if the address is <$200 it is COG, otherwise if it is <$400 it is LUT.
So, this code is fine, no problems here...
        org     0               ' COG
cog     ...
        org     $200            ' LUT
lut     ...
        orgh    $1000           ' HUB
hub     ...
        jmp     #cog
        jmp     #lut
        jmp     #hub
Above, COG/LUT $400 is equivalent to HUB $1000 because cog and lut are addressed in longs and hub is addressed in bytes.
$400  = %0000_0100_0000_0000
$1000 = %0001_0000_0000_0000
So, I am pondering if this will work
        orgh    $400
hub2    ...
        jmp     #hub2
Any ideas before I test it???
Comments
But if you code as we normally do, from org $0 upwards, and this is placed in hub as we normally do, then the first 2KB for cog (cog addresses $000-$1FF) occupies hub bytes $000-$7FF, and lut (addresses $200-$3FF) follows on at hub bytes $800-$FFF. The normal hub code for hubexec would then start at $1000, which equals cog $400 because $400 << 2 = $1000.
Anyway, I tested my question, and a JMP/CALL to hub $400 does indeed work.
So, in fact, we only lose 1KB of space where hubexec will not run, not the 4KB that I had previously expected.
It uses the ROM monitor/debugger for serial output.
COG/LUT $400 is invalid though.
Your example above sets the hub address to $1000, which is exactly that: hub address $1000.
BTW Nearly all the code I have posted here on this forum has used ORGH $400.
If you use orgh $400 then either you only have a cog/lut combined program of 1KB total, or you are loading the cog manually from a higher hub location.
I have just relocated my cog code and lut code above the hubexec code, which now starts at orgh $400, and I have a stub cog program at org $0 / orgh $0 that copies up my cog & lut programs and then jumps to them. It's a bit fiddly, as you have to go back to hubexec to copy them up or otherwise you overwrite the running code.
Hub address $1000 follows on from a 2KB cog and 2KB lut code space in hub.
No, no, @Cluso99, your thinking is wrong. There is no need to shift. It is $3FF where lut ends. We have 2K reg and 2K lut. You wish you had $200 longs as registers, but it's just $200 bytes …
The safe place starts at $400.
Enjoy!
Mike
I had always incorrectly assumed that hubexec could only run from 4KB and above, because of the fact that there is 2KB for cog and 2KB for lut.
This all stems from the fact that I have an app that requires hubexec addresses to be a maximum of 12-bits. This meant addresses <$1000. So I can have hubexec code from $400-$FFF, which is 3KB of hubexec code.
That's not quite right.
We have $200 longs for cog ram and SFRs (addressed as $0-$1FF) and $200 longs for lut ram (addressed as $200 - $3FF), but hub ram is byte addressed.
If you wanted a single load code image, starting at address 0 that contained a full cog image, followed by a full lut image, and then some hubexec code, that hubexec code would need to sit at (or above) $1000 in hub ram, but that doesn't set the lowest limit for hubexec; that is set by the PC interpretation which allows hubexec from $400.
All of us use $400 because we have small programs, so it works. Even FastSpin does. BUT IT IS WRONG.
SAFE HUB starts at $1000, not $400.
Whow
I wouldn't call either wrong. There is just a caveat to using orgh $400, is all.
$200 longs for cog is $800 bytes in hub.
$200 longs for lut is another $800 bytes in hub.
If both these are placed in hub, then the first free hub byte for hubexec will be $1000.
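The arithmetic above can be checked with a small model. This is only a sketch in Python (not PASM), and the names `BYTES_PER_LONG`, `COG_LONGS`, `LUT_LONGS` and `longs_to_bytes` are made up for illustration; the sizes are the ones quoted in this thread.

```python
# Sizes quoted in the thread: $200 longs of cog, $200 longs of lut.
BYTES_PER_LONG = 4
COG_LONGS = 0x200
LUT_LONGS = 0x200


def longs_to_bytes(longs: int) -> int:
    """Convert a size in longs to the number of hub bytes it occupies."""
    return longs * BYTES_PER_LONG


cog_image_bytes = longs_to_bytes(COG_LONGS)   # hub bytes taken by a full cog image
lut_image_bytes = longs_to_bytes(LUT_LONGS)   # hub bytes taken by a full lut image

# With both images stored back to back from hub $0, this is the first free byte.
first_free_hub_byte = cog_image_bytes + lut_image_bytes

print(hex(cog_image_bytes), hex(lut_image_bytes), hex(first_free_hub_byte))
```

This only describes where a combined cog+lut load image ends in hub. It says nothing about where hubexec is allowed to start, which is a separate question.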
I started this thread because of the above paragraph. I had always (incorrectly) assumed that hubexec could only ever start at $1000, as otherwise the silicon would not know if a jmp/call was to cog, lut or hub. My wrong presumption was that cog and lut jmp and call addresses were stored as byte addresses, i.e. shifted left by 2 bits.
But the cog and lut addresses are always 10 bits, with internal silicon taking care of the fact they are long addresses, not byte addresses.
The great quirk is that the silicon treats every address below $400 as a cog or lut address. So any address of $400 or above is hub, and hub $400 is a byte address that sits immediately above the first 1KB.
So, having jmp/call use the first 4KB of cog+lut addresses in longs (10 bits), and thereafter all addresses as bytes (11+ bits), means we have only lost the ability to run hubexec in the first 1KB of hub. That is, in a 512KB hub, only the first 1KB cannot be used for hubexec, and the remaining 511KB can be, starting at byte $400 in hub.
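The decode rule described above can be sketched as a tiny model. This is Python rather than PASM, and it reflects only what this thread reports about the silicon; `classify_target`, `COG_LIMIT` and `LUT_LIMIT` are hypothetical names used for illustration.

```python
# Thresholds as described in the thread: <$200 is cog, <$400 is lut, else hub.
COG_LIMIT = 0x200
LUT_LIMIT = 0x400


def classify_target(addr: int) -> str:
    """Return which memory a jmp/call target address would execute from."""
    if addr < 0:
        raise ValueError("address must be non-negative")
    if addr < COG_LIMIT:
        return "cog"   # long address $000-$1FF
    if addr < LUT_LIMIT:
        return "lut"   # long address $200-$3FF
    return "hub"       # byte address $400 and up


for target in (0x000, 0x1FF, 0x200, 0x3FF, 0x400, 0x1000):
    print(hex(target), classify_target(target))
```

Under this model, the only hub bytes that can never be a hubexec target are $000-$3FF, because those values are always taken to be cog or lut long addresses.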
By placing the cog and lut main code in an alternate higher location in hub, I have 3KB of hubexec space between $400-$FFF hub that I can run while keeping those addresses in a table of 12-bits (uses the fact that $FFF is 12-bits with the higher bits all zeros).
Why does this matter to me? Well, as a number of you may know, in P1 I often use a long to store three 9-bit cog addresses for tables of routines. In my spin interpreter, I use a 256-long table, where each long represents a bytecode, so each bytecode can have up to 3 subroutines performed for it. I decode the bytecode and use a series of MOVS and SHR #9 to get each subroutine address (I call them vectors).
In my Z80 code I sometimes need the first routine to be hubexec, as I cannot fit all the code in cog or lut. So my long tables have the first routine as 12 bits (cog/lut/hub) followed by two 10-bit cog/lut addresses. My code now fits in 2KB cog plus 2KB lut plus 3KB hub. The tables are higher in hub.
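A minimal sketch of such a table entry, in Python rather than PASM. The post gives only the field widths (12 + 10 + 10 bits), so the bit positions here are an assumption (first vector in bits 0-11, second in bits 12-21, third in bits 22-31), and `pack_vectors` / `unpack_vectors` are hypothetical helper names.

```python
FIRST_MASK = 0xFFF    # 12 bits: cog, lut, or hub $400-$FFF
OTHER_MASK = 0x3FF    # 10 bits: cog or lut only


def pack_vectors(first: int, second: int, third: int) -> int:
    """Pack one 12-bit and two 10-bit routine addresses into a 32-bit long."""
    if not 0 <= first <= FIRST_MASK:
        raise ValueError("first vector must fit in 12 bits")
    if not 0 <= second <= OTHER_MASK or not 0 <= third <= OTHER_MASK:
        raise ValueError("second and third vectors must fit in 10 bits")
    return first | (second << 12) | (third << 22)


def unpack_vectors(entry: int) -> tuple:
    """Split a packed long back into its three routine addresses."""
    return (entry & FIRST_MASK,
            (entry >> 12) & OTHER_MASK,
            (entry >> 22) & OTHER_MASK)


entry = pack_vectors(0x400, 0x1FF, 0x3FF)   # hubexec routine, cog routine, lut routine
print(hex(entry), [hex(v) for v in unpack_vectors(entry)])
```

The three fields use exactly 32 bits, which is why the hubexec routines have to live below $1000 for this scheme to work.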
Not really.
This long vs byte addressing issue only relates to hubexec; reading and writing to hub ram is always byte addressed.
Code for cog and lut execution can sit anywhere in hub ram because it isn't going to be executed there.
One could just as easily create a single load image with the hubexec code first, followed by any cog and lut image(s), but it would need to be loaded into hub ram no lower than $400.
Not quite. When the P2 starts initially by downloading, and for that matter from FLASH and SD, Cog 0 starts by having the first 2KB (less special registers) copied from hub $0.
So in order to have cog and lut fully loaded with code (4KB) while reserving hub $400+ for hubexec, there needs to be a stub in hub $0 (1KB max) that can copy the additional cog and lut code from elsewhere, or at least from above the hubexec code at $400.
So, anyone using hub orgh $400 can only have a 1KB cog program start, and load the rest by code.
This is why I have always used hub orgh $1000 in the past, which reserves 2KB for cog followed by 2KB lut. Lut could have been loaded from elsewhere, so hub could start from $800, just reserving 2KB for COG #0.
I feel dumb again, but I am happy he found it out. Lemmings we are...
Mike
If my "cog0" code happens to grow larger than 1k bytes ($400) the compiler lets me know and I can simply replace the ORGH $400 with a single ORGH and continue onwards and upwards.
Context is everything. My comments were in response to Mike claiming that we've all been doing it WRONG.
On initial startup this may be the case, but sacrificing 3K of hub ram in the general case in order to avoid this corner case seems to be the wrong approach.
Brian follows the approach I would advocate: aim for $400 until you run out of room, then reassess.
The very fact that this discussion is happening now, rather than six months ago, suggests to me that it is not a significant problem; certainly not worth sacrificing 3K of 511K ram to avoid having to think about the limitation.
This should be easy to add to fastspin, but I'm not sure if this is the best syntax. We could also create a new "orghmin" directive instead, but that seems a bit ugly.
@cgracey, do you think adding something like this to PNut is both desirable and relatively easy to accomplish?
I don’t think any changes are necessary. I’d just presumed that I needed to leave 4KB in hub before hubexec. I therefore always started hub at $1000.
We have FIT to test for cog/lut size.
It wasn’t until I was thinking about trying to use hub addresses with 12bits that I thought about the actual silicon functioning.
It’s probably a cursory note in the tricks and traps.
To summarize:
(1) Hubexec needs to start at $400 or later.
(2) The space from $0 to $3ff is often used for COG code (among other things)
(3) If that COG code is less than 1K in size then we need to use "ORGH $400" to force the hubexec code to start at $400.
(4) If the COG code is more than 1K in size, then "ORGH $400" will give an error. In that case we can use just plain "ORGH" and let the hubexec start at whatever place it ends up (it's above $400).
(5) If we use just plain "ORGH" but the COG code shrinks back below 1K, the hubexec code will start too low.
So it'd be handy to have a way to say "switch to HUB mode but make sure it's at HUB address $400 or later".
Can you do bounds checking on a plain ORGH within the compiler/assembler and add filler if necessary?
That would be one less thing for the programmer to have to consider.
Thinking about it further, perhaps an orgh operator option of
ORG >=$400
could be useful.
But it’s up to you.
I’m just delighted to be able to put hubexec code in hub $400-$FFF.
The problem is that ORGH gets used for data, as well as hub-exec code. Any label should reflect the contextual address. I think the programmer just has to know that he can't jump to hub-exec code below $400. He might want to have some snippets of code below $400, though, that get moved at runtime to somewhere else for execution. So, I don't think it would be good to change the rules about ORGH.
Okay. If you do an 'ORGH $400', it will zero-fill to $400, already. I guess what is wanted is an ORGH variant that will zero-fill to $400, in case the current address is below $400.
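The behaviour being asked for can be sketched as follows. This is a Python model of the idea only, not the syntax or implementation of PNut or fastspin; `orgh_at_least` and `HUBEXEC_MIN` are hypothetical names.

```python
HUBEXEC_MIN = 0x400   # lowest hub byte address usable for hubexec, per this thread


def orgh_at_least(current_addr: int, minimum: int = HUBEXEC_MIN) -> tuple:
    """Return (new_hub_address, zero_fill_bytes) for a 'pad up to minimum' ORGH.

    If the assembler is already at or past `minimum`, nothing is filled and
    the address is unchanged, unlike a fixed 'ORGH $400', which would be an
    error once the preceding code has grown past $400.
    """
    fill = max(0, minimum - current_addr)
    return current_addr + fill, fill


print(orgh_at_least(0x120))   # small cog image: pad up to $400
print(orgh_at_least(0x980))   # large cog image: carry on from where we are
```

A directive with this behaviour would cover both case (3) and case (4) of the summary without the programmer having to swap between "ORGH $400" and plain "ORGH".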
But I often switch between ORG and ORGH sections without addresses, so that they stack up.