I am trying to get Tachyon to compile and work properly, but so far no joy. The compiler is touchy about the position of the intvecs: it often says "Hub origin already exceeds target", or I get "constant must be from 0 to 511" on the coginit source. I guess I don't fully understand all the orgs and orghs and how they affect the compiler itself. I chopped out most of the code, but unless the program is small I get an error.
While I work on getting things rearranged, you guys can keep working by only having a small initial cog program at $00000 that relaunches itself to the real program of interest that sits somewhere after $43F.
Scratch that idea of moving cog/lut exec to the hub ceiling. It would cause a problem between register addresses and branch addresses within the cog. Not going to work. The solution lies in keeping cog/lut-exec $00000-based in hub, which means that the solution will be here a lot sooner.
Imagine #cogroutine is $FFF00, or $100 within the cog. This MOV already has big problems.
And reg holding a cog execution address must contain $FFExx for the JMP.
I don't know how to overcome these kinds of problems. Do you? If we could stuff cog/lut exec against the hub ceiling, it would be fantastic, but how to do it and maintain sanity?
MOV reg,#cogroutine
Hmm... To still fit in a 9-bit (or 10-bit?) immediate, that would need some form of sign-extend equivalent?
Either that, or JMP becomes COG-relative, or has a COG-relative variant.
e.g. ADR() tells the tools what you are trying to do:
MOV reg,ADR(cogroutine) ' 9/10 active bits, upper ones are forced COG.LUT
JMP reg ' or shorter jump variant available.
Addendum: I think this relative form works when ADR(cogroutine) > ADR(JmpBase).
More:
This gained smaller code in HUB page 0, and there was already an issue with LUT, so a form like
MOVCADR RegN,ADR(AddressInCog)
MOVLADR RegN,ADR(AddressInLut)
MOVHADR RegN,ADR(AddressInHub)
would allow three short-opcode loads of RegN, one for an address in each of the three memory areas.
This saves code size and is portable**, but does need opcode space.
** In actual ASM use, the common MOVADR would be used, and the tools would select which opcode to use based on the memory area. An out-of-reach address would expand to the two-opcode version.
This short-load triple form has merit no matter what final memory map is used: all possible code areas get equally efficient blocks. Q: Is there opcode room?
Chip,
I have a distinct dislike for System Management Mode on the PC. I'm not sure I want an equivalent being bred here.
That doesn't sound fun.
What do you guys think about these debug hooks? Is it the wrong approach? Is your dislike centered on the awkward presence of those 16 longs in lower hub space, or the concept, itself?
Moving initial cog load to $800 sounds fine.
You could also launch it with HUBEXEC if you wanted to, I think...
What are these "hub $00000..$0003F = 16 special longs that create r/w events"
Is that a new idea?
Doesn't it make sense to keep lowest RAM clear for table usage?
Can you put them after $43F?
When cogs read or write those first 16 longs, events are triggered among all cogs. It's like a mailbox function with full handshake. I think they need to stay at the bottom of hub memory, since they have hardwired special functions.
I like the idea; I just wish it were more transparent...
Can ROM not set these longs for us, so we don't need to include it in code?
Of course! We just don't have that kind of context running, at this stage. When you download, you are overwriting the entire hub memory, so you must include those 16 instructions.
I think it's just hard to get a handle on the implications of such a potent feature from where we are now. However, that in itself is a good reason to include it in the FPGA image now, so we can live with it a while and see further ahead.
It's also going to speed up testing and fault-finding.
Another question is whether a cog can detect that it's being debugged, e.g. by checking the system CNT increment over some normally fixed interval to see whether cycles are 'leaking' to a debugger. Of course, the debugger could get sophisticated too and return offset-corrected CNT values. These kinds of alternating layers of countermeasures are probably healthy, I believe.
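A rough sketch of that CNT check, modelled in Python since PASM can't be run here. It assumes a free-running 32-bit CNT (so the elapsed count is taken modulo 2^32 to survive wrap-around) and a caller-supplied expected cycle count for the timed code; `cycles_leaked` and its parameters are hypothetical names for illustration. A debugger that returns offset-corrected CNT values would, of course, defeat it.

```python
def cycles_leaked(cnt_start, cnt_end, expected, tolerance=0, width=32):
    """Cycles beyond `expected` that elapsed between two CNT samples.

    CNT is assumed (hypothetically) to be a free-running `width`-bit
    counter, so the difference is taken modulo 2**width to survive
    wrap-around. Returns 0 when the excess is within `tolerance`.
    """
    elapsed = (cnt_end - cnt_start) % (1 << width)
    extra = elapsed - expected
    return extra if extra > tolerance else 0


if __name__ == "__main__":
    print(cycles_leaked(0xFFFFFFF0, 0x00000010, expected=32))  # wrapped, no leak
    print(cycles_leaked(0xFFFFFFF0, 0x00000020, expected=32))  # 16 cycles leaked
```

The `tolerance` parameter is there so that normal variation in the timed code is not mistaken for a debugger.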
I suspect we'll all be as enthusiastic about it as you are Chip, once we've experienced the benefits first hand.
One other thing I don't like about this new scheme is that every coginit will now take an additional 8-10 clock cycles (up to 16-18 cycles), since it must temporarily switch to hubexec mode and fill the instruction cache. I was looking forward to using coginit to restart a cog at various entry points without reloading code. That debug interrupt can add significant overhead (not to mention variability).
Because of this, I still think it's a good idea to enable the debug interrupt via a bit in D.
Chip
I've got my code up and running again. I had to change to hubexec, though, to get it going.
I don't seem to be able to launch code in cogexec mode.
Here's test code to show what i'm seeing.
Wouldn't it be better to access debug functionality through special opcodes, with flags that have to be set and address registers?
After years of programming various processors in assembly, I tend to favour orthogonal, flat models. If the bootloader thing has to live at 0x800, then so be it. I'm glad uneven addresses are gone. org/orgh seem a bit odd; sections sound a bit better...
Uneven addressing is still possible for hub instructions.
Instead of putting the debug interrupt instructions in hub RAM, which requires a switch to hubexec and a streamer load on every coginit and requires certain hub RAM locations to contain certain values, how hard would it be to run a 32-bit global bus to every cog carrying the debug instruction, wired up the same way as CNT? Every cog would then use the same debug instruction, as I mentioned before. The instruction would be stored in a register in the hub that would default to RETI0 on boot and would be settable by cogs via an instruction.
Comments?
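To compare the two approaches, here is a toy Python model. Instructions are kept as mnemonic strings rather than encodings, the $400 + 4 x cog vector address is taken from the memory map posted in this thread, and the class and method names are hypothetical. The point it illustrates: with hub vectors, the result depends on whatever the downloaded image left in hub RAM and costs a hub access per coginit, while a CNT-style bus hands every cog the same register value with no hub access.

```python
class HubVectorScheme:
    """Current idea: each coginit fetches its cog's vector from hub RAM."""

    def __init__(self, hub):
        self.hub = hub                # dict: hub byte address -> instruction
        self.hub_reads = 0

    def debug_instruction(self, cog):
        self.hub_reads += 1           # models the hubexec switch + streamer load
        return self.hub.get(0x400 + 4 * cog)   # None if the image left it unset


class BusScheme:
    """Proposed alternative: one hub register broadcast to every cog, like CNT."""

    def __init__(self):
        self.register = "RETI0"       # default on boot
        self.hub_reads = 0            # stays 0: no hub RAM access at coginit

    def set_debug_instruction(self, instr):
        self.register = instr         # settable by any cog via an instruction

    def debug_instruction(self, cog):
        return self.register          # every cog sees the same value


if __name__ == "__main__":
    hub_scheme, bus = HubVectorScheme({}), BusScheme()
    print(hub_scheme.debug_instruction(3), bus.debug_instruction(3))
```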
You are suffering from the same trouble as ozpropdev. We need to rearrange some things.
That looks good to me, yes. ORG/SEG will hide the hard-to-remember numbers.
Can the tools not manage that?
Here's a problem:
MOV reg,#cogroutine
JMP reg
Imagine #cogroutine is $FFF00, or $100 within the cog. This MOV already has big problems.
And reg holding a cog execution address must contain $FFExx for the JMP.
I don't know how to overcome these kinds of problems. Do you? If we could stuff cog/lut exec against the hub ceiling, it would be fantastic, but how to do it and maintain sanity?
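A minimal Python model of the problem, assuming (hypothetically) that the 512 cog-exec addresses sat at the very top of a 20-bit address space, $FFE00..$FFFFF, as in the "ceiling" idea. The 9-bit immediate width matches the "constant must be from 0 to 511" error quoted earlier. Names are illustrative only.

```python
# Hypothetical model of the "cog exec at the hub ceiling" layout.
ADDR_BITS = 20                                    # PC values $00000..$FFFFF
IMM_BITS = 9                                      # #immediate range 0..511
COG_LONGS = 512                                   # cog exec space
CEILING_COG_BASE = (1 << ADDR_BITS) - COG_LONGS   # $FFE00 (assumed)


def fits_immediate(value, bits=IMM_BITS):
    """True if value can be encoded directly as an unsigned #immediate."""
    return 0 <= value < (1 << bits)


def ceiling_pc(cog_addr):
    """PC a cog address would have if cog exec sat at the hub ceiling."""
    if not 0 <= cog_addr < COG_LONGS:
        raise ValueError("cog address out of range")
    return CEILING_COG_BASE + cog_addr


def low_pc(cog_addr):
    """PC a cog address has with cog exec kept $00000-based."""
    if not 0 <= cog_addr < COG_LONGS:
        raise ValueError("cog address out of range")
    return cog_addr


if __name__ == "__main__":
    for name, pc in (("low", low_pc(0x100)), ("ceiling", ceiling_pc(0x100))):
        print(f"{name:8s} PC=${pc:05X}  fits #imm: {fits_immediate(pc)}")
```

With the $00000-based map, cog $100 is simply $00100 and fits the immediate; at the ceiling it becomes $FFF00 and does not, and every cog address below $100 turns into $FFExx, which is exactly the value a JMP register would have to hold.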
Hmm... To still fit in a 9-bit (or 10-bit?) immediate, that would need some form of sign-extend equivalent?
Either that, or JMP becomes COG-relative, or has a COG-relative variant.
e.g. ADR() tells the tools what you are trying to do:
MOV reg,ADR(cogroutine) ' 9/10 active bits, upper ones are forced COG.LUT
JMP reg ' or shorter jump variant available.
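Modelled in Python, the two options raised above (sign-extending the field, or an ADR() load whose upper bits are forced to the cog region) behave differently. This sketch assumes the hypothetical ceiling layout with cog exec at $FFE00..$FFFFF and a 9-bit field; the function names are made up for illustration.

```python
# Two ways to get a ceiling-based cog address out of a 9-bit field (sketch).
ADDR_BITS = 20
ADDR_MASK = (1 << ADDR_BITS) - 1
FIELD_BITS = 9
CEILING_COG_BASE = 0xFFE00        # assumed: cog exec = top 512 PC values


def sign_extend(field, bits=FIELD_BITS, width=ADDR_BITS):
    """Treat the field as two's complement and widen it to `width` bits."""
    field &= (1 << bits) - 1
    if field & (1 << (bits - 1)):     # top bit set -> negative value
        field -= 1 << bits
    return field & ((1 << width) - 1)


def forced_upper(field, bits=FIELD_BITS, base=CEILING_COG_BASE):
    """ADR()-style load: keep the low `bits`, force the upper bits from `base`."""
    low_mask = (1 << bits) - 1
    return (base & ADDR_MASK & ~low_mask) | (field & low_mask)


if __name__ == "__main__":
    wanted = [CEILING_COG_BASE + a for a in range(512)]
    sx_hits = sum(sign_extend(a) == w for a, w in zip(range(512), wanted))
    fu_hits = sum(forced_upper(a) == w for a, w in zip(range(512), wanted))
    print(f"sign-extend reaches {sx_hits}/512, forced upper bits reach {fu_hits}/512")
```

Plain sign extension only reaches the upper half of the cog (fields $100..$1FF map to $FFF00..$FFFFF, while $000..$0FF stay at $00000..$000FF), whereas forcing the upper bits reaches all 512 addresses.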
Addendum: I think this relative form works when ADR(cogroutine) > ADR(JmpBase):
MOV reg,ADR(cogroutine)-ADR(JmpBase)
...
JmpBase
JMPREL reg
...
cogroutine
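The relative form can be checked with a small Python model. It assumes the proposed JMPREL adds the register to the address of the JMPREL instruction itself (JmpBase) and that the offset must fit an unsigned 9-bit immediate, which is where the ADR(cogroutine) > ADR(JmpBase) condition comes from. Both are illustrative assumptions; JMPREL is a proposal, and its encoding is not defined in the thread.

```python
# Sketch of the proposed relative jump (JMPREL is a proposal, not an existing opcode).
IMM_BITS = 9
ADDR_MASK = 0xFFFFF                  # 20-bit PC


def jmprel_offset(target, base, bits=IMM_BITS):
    """Value the tools would load for MOV reg,ADR(target)-ADR(base)."""
    offset = target - base
    if not 0 < offset < (1 << bits):
        raise ValueError(f"offset {offset} needs target > base and < {1 << bits}")
    return offset


def jmprel_target(base, offset):
    """Where JMPREL reg at address `base` would land."""
    return (base + offset) & ADDR_MASK


if __name__ == "__main__":
    # Same code at a low cog address and at a ceiling address.
    for base, target in ((0x010, 0x100), (0xFFE10, 0xFFF00)):
        off = jmprel_offset(target, base)
        print(f"JmpBase ${base:05X} -> offset {off} -> ${jmprel_target(base, off):05X}")
```

The offset comes out the same (240) whether the code sits at $00010 or $FFE10, so a relative form would sidestep the absolute-address problem wherever cog exec ends up.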
More:
This gained smaller code in HUB page 0, and there was already an issue with LUT, so a form like
MOVCADR RegN,ADR(AddressInCog)
MOVLADR RegN,ADR(AddressInLut)
MOVHADR RegN,ADR(AddressInHub)
would allow three short-opcode loads of RegN, one for an address in each of the three memory areas.
This saves code size and is portable**, but does need opcode space.
** In actual ASM use, the common MOVADR would be used, and the tools would select which opcode to use based on the memory area. An out-of-reach address would expand to the two-opcode version.
This short-load triple form has merit no matter what final memory map is used: all possible code areas get equally efficient blocks. Q: Is there opcode room?
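To make the ** footnote concrete, here is a Python sketch of how an assembler might pick among the proposed MOVCADR/MOVLADR/MOVHADR forms. It borrows the cog/LUT/hub exec boundaries ($00000/$00200/$00400) from the memory map in this thread and assumes, hypothetically, that each short form carries a 9-bit offset from its region base. None of these opcodes exist yet; the fallback tuple just marks "use the two-opcode version".

```python
# Sketch of tool-side selection between the three proposed short opcodes.
SHORT_BITS = 9                      # assumed width of the short address field

# (mnemonic, region start, region end) using the posted memory map.
REGIONS = (
    ("MOVCADR", 0x00000, 0x00200),  # cog exec  $00000..$001FF
    ("MOVLADR", 0x00200, 0x00400),  # lut exec  $00200..$003FF
    ("MOVHADR", 0x00400, 0x100000), # hub exec  $00400..$FFFFF
)


def select_movadr(addr):
    """Return (mnemonic, field, opcode_count) for a generic MOVADR reg,addr."""
    for mnemonic, start, end in REGIONS:
        if start <= addr < end and addr - start < (1 << SHORT_BITS):
            return mnemonic, addr - start, 1
    # Out of reach of every short form: fall back to the two-opcode load.
    return "MOVADR (2 opcodes)", addr, 2


if __name__ == "__main__":
    for addr in (0x001F0, 0x00210, 0x00450, 0x00800):
        print(f"${addr:05X}: {select_movadr(addr)}")
```

With a 9-bit field, every cog and LUT address gets the short form, but the hub short form only reaches $00400..$005FF; everything higher expands to two opcodes.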
hub $00000..$001FF = cog exec range
hub $00200..$003FF = lut exec range
hub $00400..$FFFFF = hub exec range
These are flexible:
hub $00000..$0003F = 16 special longs that create r/w events
hub $00400..$0043F = 16 cog-start debug interrupt instructions (RETI0/JMPs)
Initial cog image needs to move from $00000 to maybe $00800.
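A small Python table of the map above can help when checking where code or data may go. The ranges are copied from the post; treating all the numbers as one flat address space is a simplification for illustration, and the function names are hypothetical. It shows, for example, that $00800 lands in the hub exec range and clear of both fixed-function areas.

```python
# The proposed map as lookup tables (addresses as listed in the post).
EXEC_RANGES = (
    (0x00000, 0x001FF, "cog exec"),
    (0x00200, 0x003FF, "lut exec"),
    (0x00400, 0xFFFFF, "hub exec"),
)
SPECIAL_RANGES = (
    (0x00000, 0x0003F, "r/w event longs"),
    (0x00400, 0x0043F, "cog-start debug vectors"),
)
INITIAL_IMAGE = 0x00800            # proposed new home of the initial cog image


def exec_mode(pc):
    """Which execution mode a branch to `pc` would select."""
    for lo, hi, mode in EXEC_RANGES:
        if lo <= pc <= hi:
            return mode
    raise ValueError(f"${pc:X} is outside the 20-bit address space")


def special_uses(addr):
    """Names of the fixed-function areas that overlap `addr`, if any."""
    return [name for lo, hi, name in SPECIAL_RANGES if lo <= addr <= hi]


if __name__ == "__main__":
    for addr in (0x00010, 0x00410, INITIAL_IMAGE):
        print(f"${addr:05X}: {exec_mode(addr)}, special: {special_uses(addr) or 'none'}")
```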
I have a distinct dislike for System Management Mode on the PC. I'm not sure I want an equivalent being bred here.
Moving initial cog load to $800 sounds fine.
You could also launch it with HUBEXEC if you wanted to, I think...
What are these "hub $00000..$0003F = 16 special longs that create r/w events"
Is that a new idea?
Doesn't it make sense to keep lowest RAM clear for table usage?
Can you put them after $43F?
That doesn't sound fun.
What do you guys think about these debug hooks? Is it the wrong approach? Is your dislike centered on the awkward presence of those 16 longs in lower hub space, or the concept, itself?
Can ROM not set these longs for us, so we don't need to include it in code?
When cogs read or write those first 16 longs, events are triggered among all cogs. It's like a mailbox function with full handshake. I think they need to stay at the bottom of hub memory, since they have hardwired special functions.
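A toy Python model of that mailbox idea, for thinking about the handshake. Exactly what each access signals, and to which cogs, is not specified here; as an assumption, any read or write of one of the 16 event longs queues an event for every other cog. The class and method names are hypothetical.

```python
EVENT_BASE = 0x00000   # bottom of hub, per the posted map
EVENT_LONGS = 16       # $00000..$0003F


def event_index(byte_addr):
    """Index 0..15 of the event long containing byte_addr, or None."""
    offset = byte_addr - EVENT_BASE
    if 0 <= offset < 4 * EVENT_LONGS:
        return offset // 4
    return None


class EventHub:
    """Toy model: touching an event long queues an event for every other cog."""

    def __init__(self, cogs=16):
        self.longs = [0] * EVENT_LONGS
        self.pending = [[] for _ in range(cogs)]

    def _post(self, src, kind, idx):
        for cog, queue in enumerate(self.pending):
            if cog != src:
                queue.append((kind, idx, src))

    def write(self, cog, byte_addr, value):
        idx = event_index(byte_addr)
        if idx is not None:            # ordinary hub addresses are not modelled
            self.longs[idx] = value & 0xFFFFFFFF
            self._post(cog, "write", idx)

    def read(self, cog, byte_addr):
        idx = event_index(byte_addr)
        if idx is None:
            return None
        self._post(cog, "read", idx)
        return self.longs[idx]


if __name__ == "__main__":
    hub = EventHub()
    hub.write(0, 0x08, 123)           # cog 0 posts to event long 2
    print(hub.pending[1], hub.read(1, 0x08), hub.pending[0])
```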
Of course! We just don't have that kind of context running, at this stage. When you download, you are overwriting the entire hub memory, so you must include those 16 instructions.
Can you do this in PNut?
reti0 [16]
It's also going to speed up testing and fault-finding.
Another question is whether a cog can detect that it's being debugged, e.g. by checking the system CNT increment over some normally fixed interval to see whether cycles are 'leaking' to a debugger. Of course, the debugger could get sophisticated too and return offset-corrected CNT values. These kinds of alternating layers of countermeasures are probably healthy, I believe.
I suspect we'll all be as enthusiastic about it as you are Chip, once we've experienced the benefits first hand.
' Cog 0..15 initial debug interrupt vectors
'
orgh $400
long $FABBFFFF [16]
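The `orgh $400` / `long $FABBFFFF [16]` lines amount to one long per cog at $400 + 4 x cog. For anyone building download images with their own tools, here is a Python sketch of the same fill. It assumes little-endian longs, as on the Propeller, and simply reuses the value from the listing (presumably the RETI0 encoding, not confirmed here); the function names are hypothetical.

```python
import struct

DEBUG_VECTOR_BASE = 0x400
COGS = 16
VECTOR_LONG = 0xFABBFFFF     # value from the listing above


def vector_addr(cog):
    """Hub byte address of cog n's initial debug interrupt vector."""
    if not 0 <= cog < COGS:
        raise ValueError("cog must be 0..15")
    return DEBUG_VECTOR_BASE + 4 * cog


def fill_vectors(image, value=VECTOR_LONG):
    """Write the 16 vector longs into a hub image, little-endian."""
    for cog in range(COGS):
        struct.pack_into("<I", image, vector_addr(cog), value)


if __name__ == "__main__":
    image = bytearray(0x800)           # first 2 KB of a download image
    fill_vectors(image)
    print(image[0x400:0x408].hex())    # ffffbbfaffffbbfa
```

Since a download overwrites all of hub memory, a tool that builds images this way has to emit these 16 longs every time.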
Still don't see why the special longs to create r/w events need to be at $0000.
But, I can see them being very useful.
Because of this, I still think it's a good idea to enable the debug interrupt via a bit in D.
I've got my code up and running again. I had to change to hubexec, though, to get it going.
I don't seem to be able to launch code in cogexec mode.
Here's test code to show what i'm seeing.
At least, I get LED10 to light...
Changing this line seems to fix the problem: #@ for cogexec, # for hubexec???
It's getting late...time for sleep.
When you do that LOC, the address operand label needs to be declared under an ORGH, before the ORG. Or, just use #@label.
I'll document this after I sleep. I'll make some new FPGA files, too.
After years of programming various processors in assembly, I tend to favour orthogonal, flat models. If the bootloader thing has to live at 0x800, then so be it. I'm glad uneven addresses are gone. org/orgh seem a bit odd; sections sound a bit better...
Uneven addressing is still possible for hub instructions.
Can the r/w event longs and the debug instruction longs all be moved to the end of hub space?
I think that would be the least painful/awkward setup. Then the cog image can go at 0 in hub space again.
It seems really odd to have something at $400 hub space...
I think he considered that (I'm losing track). The issue is that those addresses are not initialized from the code image when the chip boots.