On the subject of relative addressing, were you considering maintaining the REPS #n,@label syntax or a new variant?
Neither. The problem is that if multitasking is enabled, that instruction between the REPS and the repeating code goes away. So, it's better for someone to just put the #constant or do the calculation if the block is big.
Relative addressing can be computed by the compiler, which makes the code relocatable. These relative addresses would be held within the JMP/CALL instructions (16 bits). I cannot see a need to have relative addresses held in registers. Can you?
But the original call to the module would need to be done by an Absolute JMP/CALL.
Most likely, the Absolute address would need to be held in a register because you don't want to plug absolute addresses into instructions within hub.
However, we will also require the JMP/CALL from hub to cog to contain the absolute cog address within the instruction, and also cog to hub.
So it makes sense for JMP/CALLs to Hub or Cog to have absolute addresses contained within the instruction, as well as in a register.
Does this make sense???
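To make the relocatability point concrete, here is a minimal Python sketch of a 16-bit relative field being computed at assembly time and resolved at run time. The function names are hypothetical, and measuring the offset from the branch's own address is an assumption for illustration (real hardware may measure from the following instruction).

```python
OFFSET_BITS = 16                      # field width mentioned in the post above
FIELD_MASK = (1 << OFFSET_BITS) - 1
SIGN_BIT = 1 << (OFFSET_BITS - 1)


def encode_offset(branch_addr: int, target_addr: int) -> int:
    """Assembler side: store target - branch as a 16-bit two's complement field."""
    offset = target_addr - branch_addr   # assumption: relative to the branch itself
    if not -SIGN_BIT <= offset < SIGN_BIT:
        raise ValueError("target out of range for a 16-bit relative branch")
    return offset & FIELD_MASK


def decode_target(branch_addr: int, field: int) -> int:
    """Execution side: sign-extend the field and add it to the branch address."""
    offset = field - (1 << OFFSET_BITS) if field & SIGN_BIT else field
    return branch_addr + offset


# A backward branch assembled at $1010 that targets $1004.
field = encode_offset(0x1010, 0x1004)
# Move the whole block up by $3000: the same field still reaches the same spot
# inside the block, so nothing needs patching.
relocated_target = decode_target(0x1010 + 0x3000, field)
```

The same encoded field works at either load address, which is the sense in which a relative branch is relocatable. An absolute field would have to be rewritten when the block moves.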
I don't think so. If the compiler computes the relative addresses, then that means that in the binary the address will still be an absolute.
The advantage of the toggling/non toggling JUMP versions is that you can execute the same binary code in HUB mode and in Cog mode.
The Cog code is anyway generated as an image in Hub-Memory before it is loaded into the cog. It may be useful to be able to execute the code in the image also in hub mode.
With JMP and HJMP you will need to change all the jumps in a code snippet if you decide to execute it in the other mode.
If the compiler generates two different instructions from the same mnemonic depending on the hub or cog mode, it's a bit simpler, but you cannot execute the same binary code in both modes. That is only possible by toggling the mode at the CPU level.
Andy
Exactly ... the concept is right.
I am not skilled enough to suggest how the opcodes should be made, but the concept is this, e.g. for a float math library:
- Now, on P1, it starts a cog. If I use its functions from many places/cogs, I need to be sure (by using locks or some other handshake method) not to call a function while it is executing for another task. If I need concurrency, I have to start 2 cogs.
- I WANT (because the cog PASM is loaded from hub and is thus still available in the hub after the load) to be able to use the cog function for intensive math (e.g. tight control loops) while also calling the same function (resident in hub) from another place where execution speed is not so crucial but float math is needed anyway.
That means that the binary code must execute in hubexec, but the same code if loaded into the cog must execute the same way in cog mode without any jmp/call/branch patching of any sort.
I would like to see (I would like to be able to write code like this)
jmp address      address is a +/- immediate offset to add to the PC, making this an immediate relative jmp
jmp #address     address is an immediate absolute (sets the PC)
jmp @register    the referenced register contains the relative offset (the location of the register is always expressed relative to the jmp's PC)
jmp @#register   the referenced register contains the absolute address (the location of the register is always expressed relative to the jmp's PC)
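As a rough illustration of the four forms proposed above, this Python sketch computes the next PC for each one. The mode names, the `next_pc` helper, and the choice to measure every offset from the jmp's own PC are assumptions made for the sketch, not an actual encoding.

```python
def next_pc(mode: str, pc: int, operand: int, mem: dict = None) -> int:
    """Return the PC after a jmp of the given (hypothetical) mode.

    mode     one of 'rel_imm', 'abs_imm', 'rel_reg', 'abs_reg'
    pc       address of the jmp itself
    operand  immediate value, or PC-relative location of the register
    mem      address -> value map, needed only for the register forms
    """
    if mode == "rel_imm":            # jmp address
        return pc + operand
    if mode == "abs_imm":            # jmp #address
        return operand
    if mode == "rel_reg":            # jmp @register
        return pc + mem[pc + operand]
    if mode == "abs_reg":            # jmp @#register
        return mem[pc + operand]
    raise ValueError(f"unknown mode: {mode}")


# Two registers placed just after a jmp located at $100.
mem = {0x108: -4, 0x10C: 0x200}
targets = [
    next_pc("rel_imm", 0x100, 8),
    next_pc("abs_imm", 0x100, 0x40),
    next_pc("rel_reg", 0x100, 0x8, mem),
    next_pc("abs_reg", 0x100, 0xC, mem),
]
```

In this model only the two absolute forms depend on where the code was loaded, which is why they are the ones that matter for crossing between cog and hub.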
I started out thinking the same thing, but as I got to considering the mnemonic names I realized that it was going to be a pain to write one kind of code with JMP and another kind with HJMP. I wanted to make the differences go away to keep code simple. Still more to think about....
The assembler will know whether a label is in hub space or COG space and can automatically generate the correct instruction.
I just realized that since all hub-to-cog and cog-to-hub constant jumps must be absolute (since relative addresses make no sense), there is already an obvious way of discriminating where you are going:
Absolute addresses $000..$1FF are valid cog addresses, but are ROM in hub memory, so not practically called as hub addresses. These would obviously be cog addresses.
Absolute addresses $200..$FFFF are obviously hub addresses, as they are outside the range of cog memory.
So, that takes care of all constant absolute JMPs and CALLs within and between cog and hub memory.
What's left is the register-based JMPs and CALLs, which are always absolute. Their ranges could be discriminated at execution time to determine where they were headed.
This just leaves constant relative branches, which don't ever change modes, and therefore require no special consideration.
All this means that we don't need the JMP_/CALL_ instructions, at all, but we can let the hardware figure out at execution time what the deal is.
Does anyone see any problems with this?
(Perhaps it's providence that the ROM starts at $00000 and is >= $200 longs!)
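The range check described above can be sketched in a few lines of Python. The constants come from the post; the `target_space` name is hypothetical, and this models only the decision, not how the hardware would implement it.

```python
COG_LAST = 0x1FF     # $000..$1FF are treated as cog addresses
HUB_LAST = 0xFFFF    # $200..$FFFF are treated as hub addresses


def target_space(addr: int) -> str:
    """Classify an absolute JMP/CALL target as 'cog' or 'hub' by its range."""
    if not 0 <= addr <= HUB_LAST:
        raise ValueError("address outside the 16-bit range discussed here")
    return "cog" if addr <= COG_LAST else "hub"
```

The same test would apply to a register-based target at execution time, since those are always absolute as well.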
I like that idea. In fact, I think I suggested it a while back. :-)
The only problem I can see is that you won't be able to put hub code in locations $000-$1FF.
Well, thanks for positing this idea. It's getting used now.
The booter, hmac/sha256, and monitor are in the ROM at the bottom of memory, so it's no problem if we can't call any of that code.
I got all these changes made and now nothing works. This happens sometimes. I hope to have this resolved soon so we can get on to the instruction cache.
Thanks, Everyone, for all your help on this hub/cog addressing dilemma. I think we've got the right solution now.
It's true that we need both relative and absolute branches in the cog and in the hub. I thought this was superfluous, at first, but Bill rightly pointed out their necessity.
Anyway, we've got it all now. I just need to find out why nothing's working. I hard-coded the TRACE into the Verilog so I'll be able to see on my logic analyzer what's happening internally. Hopefully, it's something identifiable. Whenever nothing works after a big change, I'm always preoccupied until I get it straightened out. I hate to have something so huge up in the air, especially at 4:40am.
I got in the house to go to bed and it occurred to me that I hadn't updated the hardcoded instructions in the boot state machine of the cog. I came back in the office and, sure enough, they were old encodings, one of which now equates to "WAITPF #$1F8". That would hang things up, for sure. I've got it recompiling now. It takes about 40 minutes, so I'm going to sleep for a while.
Chip,
Maybe you just need to get some sleep. It will be clearer in the morning.
It's been a busy day, but the end result is simpler and well worth the time.
Chip,
For HUB code, I would even be happy to just compile it as if it were cog code (like I do with LMM now) and hand code a long for the appropriate JMP/CALLs.
May just need something like
[code]
CON
  HJMP        EQU %1111101_100_0000_xx_0000000000000000
  _IF_Z_OR_C  EQU %xxxx << 18
DAT
        ORG   0
        ORGH  $4000
        ...code here...
        LONG  _IF_Z_OR_C | HJMP | ~label    ' if_z_or_c hjmp #label
[/code]
where ~label would be some character prefix and the label would be the hub or cog address. Hopefully you could add this feature to pnut as a temporary thing?
Postedit: Maybe
LONG _IF_Z_OR_C | HJMP | @label
or @@label
would already work???
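For anyone wanting to check such a hand-built long outside the assembler, here is a small Python sketch of the same OR-the-fields idea. The 16-bit address field and the condition field at bit 18 follow the snippet above. The `encode_branch` helper, the 4-bit condition width, and every value in the example are hypothetical, because the real opcode and condition bits are left as `x` in the post.

```python
ADDR_BITS = 16     # low 16 bits hold the address, per the snippet above
COND_SHIFT = 18    # condition field position, per "%xxxx << 18"
COND_BITS = 4      # assumption: four condition bits, matching "%xxxx"


def encode_branch(template: int, cond: int, addr: int) -> int:
    """OR an opcode template, a condition field and an address into one long."""
    if not 0 <= cond < (1 << COND_BITS):
        raise ValueError("condition does not fit its field")
    if not 0 <= addr < (1 << ADDR_BITS):
        raise ValueError("address does not fit in 16 bits")
    return (template | (cond << COND_SHIFT) | addr) & 0xFFFFFFFF


# Illustration only: the visible opcode bits with the unknown 'xx' bits left
# as 0, a made-up condition value, and the ORGH address from the snippet.
word = encode_branch(0xFB000000, 0b1111, 0x4000)
```

The result is still a single long, so nothing extra is spent per jump.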
Nice, but now I am needing to stay up for the result. This P2 is addictive!
That would waste a long (4 bytes) for every jump/call in hub code.
It is not difficult for PNUT and compilers to embed the 16 bit address.
Bill, I think you misunderstood. It's a temporary solution suggested to overcome pnut limitations. No extra longs are wasted. The long is a handcrafted jump/call until pnut can do hub mode, so we can start testing earlier.
However, I think with the latest instructions this is probably unnecessary.
I need to sleep for about 5 hours, I think. Otherwise, tomorrow will be a waste. So, if it's midnight for you, you can go to bed, too.
Edit: I had to go through the ROM programs and update the JMP/CALL instructions. With that built-in 4-level LIFO stack for plain CALL and RET instructions, you don't need all those labels attached to the RET anymore. It really cleaned things up a lot. Now, if we could have a straight label after JMP/CALL, instead of @relative, it would really make the code look nice.
Sweet dreams, Chip. I find it very useful sometimes to step away from the keyboard and let your subconscious do all your debugging; you'll quite often wake up with the answer. Fingers crossed for you!
With Cluso's encouragement, I stayed up the extra 40 minutes to see how the compile went.
It works! Everything seems functional. The ROM Monitor came right up and now the Balls.spin demo is running. What a relief! It was just those two boot-state-machine instructions that were causing things to appear dead. Now that it's working, it also means there's agreement between the chip and the development tool. Those new LIFO stacks are running, too. Next thing: fetching instructions from the hub. Tomorrow.
That's great! Congratulations! Now get some sleep! :-)
Yay!
Hmm... ever since Beau mentioned the memory and computing requirements for DRC, things like this:
make my heart bleed... and that's just one of quite a few.
I really wish I was in the position to lend our idle resources to a "good cause" :depressed:
Come to think of it, using an SSD for swap could actually be a bad thing. SSDs have TRIM issues. Essentially the flash has to be blanked in 256K blocks and written in 4K blocks. The controller tries to keep a free list to write directly into, but eventually without TRIM, you need to blank old blocks, then copy a bunch of blocks to the blank, then write the changed data.
This overhead causes many SSDs to drop to half their rated speed and performance is non-deterministic.
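A quick back-of-the-envelope in Python for the erase/write mismatch described above. The 256K and 4K sizes are the ones quoted in the post. The worst-case model (rewriting a whole erase block to change a few pages) is a deliberate simplification, and the function names are made up for the sketch.

```python
ERASE_BLOCK_BYTES = 256 * 1024   # flash is blanked in 256K blocks
WRITE_PAGE_BYTES = 4 * 1024      # and written in 4K pages


def pages_per_erase_block() -> int:
    """How many 4K pages share one 256K erase block."""
    return ERASE_BLOCK_BYTES // WRITE_PAGE_BYTES


def worst_case_write_amplification(changed_pages: int) -> float:
    """Pages physically rewritten per page the host actually changed,
    assuming the controller must erase and rewrite one full block."""
    if not 1 <= changed_pages <= pages_per_erase_block():
        raise ValueError("changed_pages must fit within one erase block")
    return pages_per_erase_block() / changed_pages
```

With no free blocks left, changing a single 4K page in this model costs a full 64-page rewrite, which is consistent with the slowdown and non-determinism mentioned above.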
64GB of DDR3 ram will be around 8-18GB/s throughput, in actual use. My testing shows a 2-4GB/s performance difference between DDR1333 and DDR1600. The DDR1600 ram I've tested wasn't very reliable and only works stably at 1333.
How does a HUB program get the PC into a COG or HUB address for knowing where it currently is in memory? I know we have relative jumps, etc... just wondering if we've got a path to put the PC out there for a program to use.
Comments
The same jmp forms would apply to calls and other similar instructions.
I like the idea of relative jumps not needing any @ character. It makes code a lot easier to write and read. Maybe we can have such syntax.
Hope it's nothing too serious!
BTW it's midnight here in Oz, but it's Saturday.
Coffee helps! Looks like you were encoding it into one long; works for me.
(mind you, I just woke up and have not had coffee yet)
Heart attack downgraded to MILD!
My apologies. I mis-read it, as I am allergic to using two longs for a JMP/CALL in hub mode, and I see them in the shadows...
Teaches me not to post before drinking coffee!
Now go get some quality ZZZzzz's...
POP reg
Should do it
or just CALL, and it will end up in cog location $0
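Here is a toy Python model of the CALL-then-POP idea for reading the PC. The 4-level depth is taken from the thread. The `CallStack` class, the `current_pc` helper, the return address being `pc + 1`, and raising on overflow are all assumptions for the sketch rather than documented chip behavior.

```python
class CallStack:
    """Toy model of a small hardware return stack."""

    DEPTH = 4  # 4-level stack, as mentioned earlier in the thread

    def __init__(self) -> None:
        self._slots = []

    def call(self, pc: int, target: int) -> int:
        """Push the return address and return the new PC."""
        if len(self._slots) == self.DEPTH:
            raise OverflowError("stack full (overflow handling is an assumption)")
        self._slots.append(pc + 1)   # assumption: next instruction is pc + 1
        return target

    def pop(self) -> int:
        """POP reg: take the most recently pushed address off the stack."""
        return self._slots.pop()


def current_pc(stack: CallStack, pc: int) -> int:
    """CALL the very next instruction, then POP the pushed address into a 'register'."""
    stack.call(pc, pc + 1)
    return stack.pop()


stack = CallStack()
here = current_pc(stack, 0x4000)
```

Because the POP consumes what the CALL pushed, the stack is left as it was found, so the trick should be safe to use inside nested calls in this model.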
With all the changes, I'm gonna have to just go start coding a lot of broken PASM again! So nice to have a basic stack!