Using SETQ and RDLONG together to load cog and/or LUT should not require extra bits, since the cog/LUT address would be held in a cog register. It can be 11 address bits (no need to restrict it to 9 bits), so it would address cog $000:1FF and LUT $200:3FF (or $5FF, or even more later).
I was thinking that by expanding SETQ to have one more D bit, it could express bigger constants. Also, by making RDLONG have one more D bit, it, too, could express larger start addresses. The D-field in RDLONG specifies the start register, not the contents of D.
I was of the mind yesterday that I should expand RDLONG-repeat to automatically flow from cog to LUT. This would involve one more D bit in the RDLONG instruction and one more D bit in the SETQ instruction. Both could be done, but then I started thinking how it would booger up the instruction set for this single-purpose accommodation and I decided against it. It's still pulling at me, though. It would be nice to have a single means to load both cog and LUT. It could be as simple as this:
Sounds good; that helps users (and tools) think about LUT as more COG, so it has appeal for simplicity.
E.g., there may be initialised tables that need to cross the COG-LUT boundary, and it saves code space.
What exactly is the "booger up the instruction set" impact?
If we didn't have bytes, each of us would hit a wall as soon as we needed a memory-efficient mechanism to handle them. We'd be doing read-modify-writes on hub longs and pulling our hair out,[..]
Yep. The first generation of the DEC Alpha had only 32-bit and 64-bit loads and stores, with no byte (or 16-bit word) support, and the result was just as you describe. So a later iteration of the Alpha (the BWX extension) added support for direct manipulation of bytes and 16-bit words.
I wonder if we could reckon ALL memory by long address and treat the two orphaned LSBs as fractions: 0.00, 0.25, 0.50, 0.75. Actually, those could be expressed as .0, .1, .2, .3.
It would be a little weird to understand that some hub-exec code starts at xxxx.3, for example. But, that's life. I think that would really look strange to people. Perhaps just having the tools unify hub-addressing notions with cog/LUT realities would be best.
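To see what the "long.fraction" notation would look like in a listing, here is a small Python sketch. The `$LONG.n` text format and both function names are hypothetical, made up for illustration; the only real rule used is that the long address is the byte address shifted right by 2 and the fraction digit is the two orphaned LSBs.

```python
def to_long_frac(byte_addr: int) -> str:
    """Format a byte address as $LONG.n, where n (0-3) is the two orphaned LSBs."""
    return f"${byte_addr >> 2:X}.{byte_addr & 3}"


def from_long_frac(text: str) -> int:
    """Parse a hypothetical '$LONG.n' string back into a byte address."""
    long_part, frac = text.lstrip("$").split(".")
    n = int(frac)
    if not 0 <= n <= 3:
        raise ValueError("fraction digit must be 0..3")
    return (int(long_part, 16) << 2) | n


for addr in (0x1000, 0x1003, 0x12346):
    print(f"${addr:05X} -> {to_long_frac(addr)}")
```

Hub-exec code starting at byte $1003 would show up as `$400.3`, which is exactly the "looks strange" case mentioned above.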
Interesting.
SEG.ORG statements would hide that from users, but this idea could have merit for use in the LIST and MAP files, where you need to report both long and byte address values in a way that the user can relate to.
If they list LONG addresses by default, you need some way to intersperse BYTE addresses where needed.
REG LONG $zz_rrrrrrrrr ' cog/LUT address: 2+9 = 11 bits, +00 assumed ???
HUBPTR LONG $xxxx_xxxxxxxx_xxxxxx00 ' 18 bits + 00 = 20-bit byte address (1MB range; 512KB uses 17 of the 18 bits)
In the P1, the lower 2 bits of HUBPTR are ignored. In fact, the reverse-load mechanism we use relies on them being ignored. But this quirk won't be needed on P2.
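The field widths in the two declarations above can be checked with a little arithmetic. This Python sketch uses illustrative names, not Propeller tool syntax. It shows that an 11-bit field reaches cog/LUT long address $7FF, that 18 long-address bits plus the two implied zero bits span 1MB of bytes (512KB corresponds to 17 bits), and how the P1 effectively masks off the two LSBs of a hub pointer for long accesses.

```python
REG_FIELD_BITS = 11   # $zz_rrrrrrrrr: 2 + 9 bits of cog/LUT long address
HUB_LONG_BITS = 18    # $xxxx_xxxxxxxx_xxxxxx before the implied 00


def max_reg(bits: int) -> int:
    """Highest cog/LUT long address a field of this width can name."""
    return (1 << bits) - 1


def hub_span_bytes(long_bits: int) -> int:
    """Bytes covered by long_bits of long address plus two implied zero LSBs."""
    return 1 << (long_bits + 2)


def p1_long_addr(hubptr: int) -> int:
    """P1 RDLONG/WRLONG ignore the two LSBs of the hub address."""
    return hubptr & ~3


print(f"REG max     ${max_reg(REG_FIELD_BITS):03X}")
print(f"18+00 span  {hub_span_bytes(HUB_LONG_BITS) // 1024}KB")
print(f"17+00 span  {hub_span_bytes(17) // 1024}KB")
```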
Supporting words and bytes is a pain, but I realize that for many reasons they are vital. If we didn't have bytes, each of us would hit a wall as soon as we needed a memory-efficient mechanism to handle them. We'd be doing read-modify-writes on hub longs and pulling our hair out, knowing we were mired in the reinvention of an old wheel.
Yup, vital on Embedded controllers.
IIRC, I've seen MCUs that take that granularity down to the BIT level, which allows arrays of bits to be easily indexed and manipulated.
Oh, SETQ uses an immediate. If there aren't 2 more bits available, then don't worry, as it's not a big deal if it can only load a page of 512 longs at a time.
I don't follow the RDLONG D problem... See my previous post. Doesn't D refer to a cog register ($008:1FF) that contains an address (currently 9 bits, but could be 11 bits) pointing to the starting cog/LUT address?
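The difference between the two readings of D (a 9-bit field naming the start register directly, versus a register whose contents hold an 11-bit start address) can be modelled in Python. This is a toy model of the proposed cog-into-LUT flow, not of how the hardware currently behaves; `block_load`, `start_bits` and the table data are hypothetical.

```python
UNIFIED_LONGS = 0x400  # cog $000-$1FF followed by LUT $200-$3FF


def block_load(mem: list[int], hub_longs: list[int], start: int, start_bits: int) -> None:
    """Copy hub_longs into a unified cog+LUT space, flowing from cog into LUT.

    start_bits is the width of whatever names the start address:
    9 for a direct D field, 11 for an address held in a cog register.
    """
    if start > (1 << start_bits) - 1:
        raise ValueError(f"start ${start:X} does not fit in {start_bits} bits")
    if start + len(hub_longs) > len(mem):
        raise ValueError("block would run past the end of LUT")
    mem[start:start + len(hub_longs)] = hub_longs


mem = [0] * UNIFIED_LONGS
table = list(range(100, 132))                      # 32 longs of hypothetical table data
block_load(mem, table, start=0x1F0, start_bits=9)  # straddles the cog/LUT boundary
```

With a 9-bit start, a table can still begin in cog and spill into LUT, but a block that starts inside LUT (say $280) needs the wider 11-bit start address.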
Chip,
How fast can the OnSemi RAM be accessed? Could it be 2x the P2 clock speed by chance???
It's rated for ~350MHz, but getting a 2x clock spread around is too much trouble. We also wouldn't have enough setup time to do much, given the clock uncertainty.
If the P2 ends up at 160MHz max, then 2x is 320MHz which is under ~350MHz.
What I wondered was this: since COG/LUT RAM is closely coupled to the cog's ALU, could this RAM (not the hub RAM) run at 2x the clock speed, so that the I+R clock followed by the D+S clock could be split into separate I, R, S and D accesses? This would mean that dual-port RAM would not be required for the cog RAM/registers. Cog RAM would then be the same as LUT RAM.
So maybe the 3 clock LUT instructions could become 2 clock instructions simplifying things???
To make those instructions (SETQ and RDLONG) work well for loading cog and LUT, I would need to add one D bit to the SETQ instruction for immediate values up to #$3FF (otherwise we need to use AUGS - another instruction) and I would need to add a D bit to RDLONG so that it could specify D registers up to $3FF. This gums things up a bit by creating exceptional field uses. Maybe I'll revisit this later, but I feel like the way it works now is okay, all things considered.
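The "one more bit" arithmetic can be checked quickly. The Python sketch below assumes that SETQ holds count-1 and RDLONG then transfers `count` longs, which matches the #$3FF figure for loading all 1024 cog+LUT longs; the function names are illustrative only.

```python
def setq_count_value(count: int) -> int:
    """Assumption: SETQ holds count-1 and RDLONG then moves `count` longs."""
    return count - 1


def fits_immediate(value: int, imm_bits: int) -> bool:
    """True if value can be encoded as #value without an AUGS prefix."""
    return 0 <= value <= (1 << imm_bits) - 1


for count in (16, 512, 1024):
    v = setq_count_value(count)
    for bits in (9, 10):
        status = "fits" if fits_immediate(v, bits) else "needs AUGS"
        print(f"{count:5} longs  #${v:03X}  {bits}-bit field: {status}")
```

A full cog+LUT load (#$3FF) only fits once the immediate field grows from 9 to 10 bits. Otherwise it needs AUGS in front of SETQ.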
Couldn't this work, Chip?
I'm not understanding what you are saying. I think I might be too tired right now.
My plan tomorrow is to get some FPGA images together and release them. The first two I'll do will be the Prop123-A7 and the DE2-115. I don't have any significant documentation done, but that will follow. I've got the whole 512KB memory downloading from PNut.exe, anyway, so you could potentially make big programs.
I realized that while the periodic timer interrupt was quaint, it didn't allow time targets, only repeating periods.
So, what was the WAITCNT instruction is now called ADDCNT. It just adds S into D, but also copies the result into a CNT-target register that is compared to CNT on every clock. When there's a match, an event occurs that gets trapped in a flop for later testing. This event can also be an interrupt source. The interrupt must do an ADDCNT to set the next CNT target.
By having the interrupt set the next CNT target each time, you can do things like wait 1.5 bit periods after a start bit, and then 1 bit period for data bits, etc. It's like WAITCNT was, but now made into an event and interrupt. In fact, there is still an instruction called WAITCNT, but all it does is wait for the cnt-target - it has no operands.
The old SETPER instruction is gone, effectively replaced by ADDCNT. This caused some minor opcode shuffling that left things a little cleaner-looking.
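The chaining of CNT targets for a serial receiver can be sketched in Python. Here `addcnt` is a toy stand-in for the instruction (a 32-bit add whose result becomes the CNT target); `uart_rx_targets` and the bit period are hypothetical. The first target lands 1.5 bit periods after the start edge (the middle of the first data bit), and each interrupt re-arms the target one bit period later.

```python
MASK32 = 0xFFFFFFFF


def addcnt(d: int, s: int) -> int:
    """Toy ADDCNT: D = D + S (32-bit); the sum also becomes the CNT target."""
    return (d + s) & MASK32


def uart_rx_targets(start_edge: int, bit_period: int, nbits: int = 8) -> list[int]:
    """CNT values at which each data bit would be sampled."""
    targets = []
    t = addcnt(start_edge, bit_period + bit_period // 2)  # 1.5 bits: middle of bit 0
    for _ in range(nbits):
        targets.append(t)
        t = addcnt(t, bit_period)  # the interrupt re-arms the next target 1 bit later
    return targets
```

Because the sums wrap at 32 bits, the same chain keeps working across a CNT rollover. With an odd bit period, `bit_period // 2` truncates the half-bit by one clock.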
The way the Prop1 addresses memory means that longs must be long-aligned and words must be word-aligned, while bytes can be anywhere. One consequence is that you cannot have packed structures made up of mixed data sizes.
On the Prop2, there are no such limitations. There is only one issue where any type of hub alignment matters, and that is on fast r/w blocks that wrap - they must be long-aligned to wrap properly. In no other case does it matter, so it makes understanding hub memory dead simple. The ONLY place I see it being a pain is in reconciling cog and LUT longs, which each have a single address, with longs in hub, which take four addresses. That's why <<2 and >>2 come into play. Those could be cleaned up by the approach taken in the development tools, though.
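The <<2 and >>2 bookkeeping between cog/LUT long addresses and hub byte addresses looks like this in a Python sketch. It assumes a hypothetical image whose register 0 sits at `image_base` in hub. Since P2 hub longs need not be aligned, the base may be any byte address. Only the offset within the image has to be a multiple of 4.

```python
def reg_to_hub(image_base: int, reg: int) -> int:
    """Hub byte address of the long that lands in cog/LUT register `reg`."""
    return image_base + (reg << 2)


def hub_to_reg(image_base: int, hub_addr: int) -> int:
    """Cog/LUT register that the long at hub_addr would occupy."""
    offset = hub_addr - image_base
    if offset < 0 or offset & 3:
        raise ValueError("address is not the start of a long within the image")
    return offset >> 2


# Hypothetical image loaded from an unaligned hub address:
# register $200 (first LUT long) lives at $1001 + $800 = $1801.
lut_start = reg_to_hub(0x1001, 0x200)
```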
I feel like we are having two slightly different conversations. I don't think anyone has suggested we get rid of byte addressing for manipulating data in the hub, which is what I feel like you are focusing on. We have only suggested treating PC as I described in an earlier comment.
About not wasting two bits of the PC by supporting non-long-alignment: Remember that we still need to have ANOTHER two bits beyond the PC's bits to reach down to words and bytes. Those two bits must be encoded into the instructions for reckoning absolute and relative addresses. We are at 20 bits for those purposes and there are no more bits for bigger addresses in the opcode set. So, these two sub-bits of the 18-bit PC, if you want to see them that way, total about 20 flops per cog, with 16 of them being in the 8-level PUSH/POP/CALL/RET hardware stack. They are not resource hogs, and if we got rid of them, we would be forced into long-alignment for all instructions. That would be the only effect of getting rid of them. We wouldn't get a 4x-size hub memory map, because we are constrained to 20 bits for byte-level addresses. However, if we totally got rid of words and bytes (which I've really thought about), we could have a 4x-size hub memory map. Supporting words and bytes is a pain, but I realize that for many reasons they are vital. If we didn't have bytes, each of us would hit a wall as soon as we needed a memory-efficient mechanism to handle them. We'd be doing read-modify-writes on hub longs and pulling our hair out, knowing we were mired in the reinvention of an old wheel.
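The numbers in that argument can be reproduced in a few lines of Python. The constants come from the post (20-bit address fields, 2 sub-long bits, an 8-level stack); `reach_bytes` is just an illustrative helper. Only the 16 stack flops are computed, since the post gives the remaining few flops only as "about 20" in total.

```python
ADDR_FIELD_BITS = 20  # byte-level address width available in the opcodes
SUB_LONG_BITS = 2     # the PC bits below long granularity
STACK_LEVELS = 8      # PUSH/POP/CALL/RET hardware stack depth


def reach_bytes(addr_bits: int, byte_addressed: bool) -> int:
    """Bytes reachable by an address field, at byte or long granularity."""
    return (1 << addr_bits) * (1 if byte_addressed else 4)


stack_flops = SUB_LONG_BITS * STACK_LEVELS
print("sub-long flops in the stack:", stack_flops)
print("20-bit byte addresses:", reach_bytes(ADDR_FIELD_BITS, True) // 2**20, "MB")
print("20-bit long addresses:", reach_bytes(ADDR_FIELD_BITS, False) // 2**20, "MB")
```

Dropping the sub-long PC bits alone saves only those flops. The 4x memory map appears only if the 20-bit fields count longs, which means giving up bytes and words entirely.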
Or maybe we are having the same conversation?
Supposing you were to treat the PC as an instruction counter (see comment above), it is true that this would make the instruction space 4x larger than can actually be accessed with byte-level addressing. But that does not affect the P2 here and now. The PC would be 17 bits (128K instructions), meaning that the top 3 bits of instruction addresses (e.g. in registers or immediates in CALLx/JMP) would be ignored. But, if you are concerned about future expansion, you would already be able to double the hub memory and increase the PC to 18 bits without changing anything else. To allow more than 256K instructions, you'd have to change the architecture no matter which way the instructions were addressed.
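The instruction-counter view of the PC can be sketched in Python. This models the suggestion in the post, not the current design; `pc_from_field` and `instr_to_byte_addr` are hypothetical helpers. A 20-bit CALLx/JMP field is truncated to a 17-bit PC (the top 3 bits ignored), and each instruction index maps to hub byte address index*4.

```python
PC_BITS = 17     # 128K instructions = 512KB of hub
FIELD_BITS = 20  # width of CALLx/JMP address fields


def pc_from_field(field: int, pc_bits: int = PC_BITS) -> int:
    """Keep the low pc_bits of a 20-bit field; the upper bits are ignored."""
    return field & ((1 << pc_bits) - 1)


def instr_to_byte_addr(pc: int) -> int:
    """Hub byte address of instruction number `pc` (4 bytes per instruction)."""
    return pc << 2


last_17 = instr_to_byte_addr((1 << 17) - 1)  # last long of 512KB
last_18 = instr_to_byte_addr((1 << 18) - 1)  # last long of 1MB, if the PC grew to 18 bits
```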
And, besides, this also potentially gives you a way to be able to execute from the entire hub memory, not just above $FFF(bytes)/$3FF(longs). But that is a separate conversation...
For Byte/Word values add a pair of instructions that will grab/set the part you want to use.
I.e.:
GetByte: S = byte (0-3) to get out of the long stored in D; the result is returned to D.
SetByte: S = byte to set into the position in D encoded by the WC/WZ fields (0-3).
KIS (so we don't end up back at the P2Hot stage...)
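The proposed pair can be modelled as simple shift-and-mask operations in Python. `get_byte` and `set_byte` are hypothetical names for the proposal's GetByte/SetByte. In `set_byte`, the byte index `n` stands in for the position that the proposal would encode in the WC/WZ fields.

```python
def get_byte(d: int, n: int) -> int:
    """Proposed GetByte: byte n (0-3) of D, returned zero-extended."""
    if not 0 <= n <= 3:
        raise ValueError("byte index must be 0..3")
    return (d >> (8 * n)) & 0xFF


def set_byte(d: int, s: int, n: int) -> int:
    """Proposed SetByte: low byte of S written into byte n of D.

    In the proposal, n would come from the WC/WZ fields.
    """
    if not 0 <= n <= 3:
        raise ValueError("byte index must be 0..3")
    shift = 8 * n
    return (d & ~(0xFF << shift)) | ((s & 0xFF) << shift)
```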
That's exactly how it is now (for cog memory). There are similar instructions for nibs and words.
For hub memory, you have the same RDxxx/WRxxx instructions as before, except they no longer have to be aligned. You take an extra one-cycle penalty if you access a word or long that crosses an aligned-long boundary, but in return you have maximum flexibility in data alignment in the hub.
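Whether an access pays the one-cycle penalty depends only on the low two address bits and the access size. Here is a Python sketch of that rule as stated in the post; `crosses_long_boundary` and `extra_cycles` are illustrative names, and no other timing effects are modelled.

```python
def crosses_long_boundary(addr: int, size: int) -> bool:
    """True if a size-byte access starting at addr spans two aligned longs."""
    return (addr & 3) + size > 4


def extra_cycles(addr: int, size: int) -> int:
    """One extra cycle when a word or long straddles an aligned-long boundary."""
    return 1 if crosses_long_boundary(addr, size) else 0


# long at $1000: aligned; long at $1001: crosses
# word at $1002: fits in one long; word at $1003: crosses
# bytes never cross
```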
Here is the latest Prop2 instruction set:
<I tried to update the file, but the attachment menu has disappeared. I'll do it in a new post, instead.>
I think I'll wait till the movie comes out.
Great news!
That's right. Same thing with WAITPAE/WAITPBE/etc. You set up the target and then you can do three things:
1) Poll to see if it has happened yet.
2) Wait for it to happen.
3) Use it as an interrupt source.
To do the old WAITCNT's new equivalent, you would:
Instead of WAITCNT, you could do TESTCNT WC to find out if the target was met, yet (C=1).
Are you still disappointed?
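The three ways of using the CNT-target event (poll, wait, interrupt) all read the same trapped flop. This Python sketch is a toy model, not cycle-accurate. The method names mirror ADDCNT/TESTCNT/WAITCNT for readability, and it assumes that arming a new target clears the old event. The post doesn't say which instructions clear the flop.

```python
class CntEvent:
    """Toy model of the CNT-target comparator and its event flop."""

    def __init__(self) -> None:
        self.target = None
        self.flag = False

    def addcnt(self, d: int, s: int) -> int:
        # Assumption: arming a new target clears any old event.
        self.target = (d + s) & 0xFFFFFFFF
        self.flag = False
        return self.target

    def tick(self, cnt: int) -> None:
        """Called every clock: trap a match in the flop."""
        if cnt == self.target:
            self.flag = True

    def testcnt(self) -> bool:
        """Poll: C = 1 once the target has been met."""
        return self.flag

    def waitcnt(self, cnt: int) -> int:
        """Wait: advance CNT until the event fires; return CNT at that point."""
        if self.target is None:
            raise RuntimeError("no CNT target armed")
        while not self.flag:
            cnt = (cnt + 1) & 0xFFFFFFFF
            self.tick(cnt)
        return cnt


ev = CntEvent()
ev.addcnt(0, 50)   # target = 50
```

An interrupt source would simply be a third consumer of the same `flag`.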
It looks like this has become a two-instruction process:
addcnt target, #0
waitcnt
That's wrong. It would just be "waitcnt" (no parameters).
The movie *IS* out; it's called Groundhog Day. Funny stuff in its original context... not so funny here and now.
Very nice change, by the way.
What happens when PC increments from $0FFC to $1000? Do you automatically switch from LUT execution to HUB execution?