@AJL,
I understand what you are saying, but that is technically an incorrect understanding of the physical logic involved in comparing for equality. In the physical logic, an inverting adder is used to determine equality. The reason is that an adder is already present in the hardware, and in order to do subtraction, one operand is inverted and then added. If the result is zero then equality is signalled and a flip-flop is set, usually called the zero flag internally on the schematic diagram. It's just the way computers were built originally.
But this is way off the point that I have been making. When you copy a logic bit into a flag, confusion reigns if it's the zero flag. The carry flag, or any other flag except the zero flag, will be set when the bit is 1 and reset when the bit is 0, so there is no confusion there.
When a 1 gets copied to the zero flag it is called set. The confusion results because you copied the bit instead of using it as a test. YES, IT SHOULD BE A TEST AND SET. Then the Z flag reports precisely what it is meant to report in ALL situations, a ZERO RESULT. This is the fundamental meaning of the zero flag, a zero result. This is why the CMP instruction states that if the subtraction gives a zero result, then ZERO will be set. Then you test for this using IF_Z or, if you wish, BZ, JZ, TJZ or whatever.
If it's done this way, not only does the Z flag work the same way with all instructions, but there is no confusion whether you are testing multiple bits or a single bit, comparing, adding, moving, or using any other of the many instructions that set Z if the result is zero.
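To make the two readings of Z concrete, here is a minimal Python sketch, not P2 code. It assumes a 32-bit two's-complement datapath where subtraction is done by inverting one operand and adding it with a carry-in of 1. All function names are invented for illustration.

```python
MASK32 = 0xFFFFFFFF

def sub_via_adder(a, b):
    """Subtract b from a using only an adder: a + ~b + 1, truncated to 32 bits."""
    return (a + (~b & MASK32) + 1) & MASK32

def z_after_cmp(a, b):
    """Z as a zero-result flag: 1 when a - b is zero, i.e. when a == b."""
    return 1 if sub_via_adder(a, b) == 0 else 0

def z_test_and_set(bit):
    """'Test and set' reading: Z reports that the tested bit is zero."""
    return 1 if bit == 0 else 0

def z_copy(bit):
    """'Copy' reading: the bit value is written into Z unchanged."""
    return bit

print(z_after_cmp(5, 5), z_after_cmp(5, 4))
print(z_test_and_set(1), z_copy(1))
```

Under the copy reading a bit of 1 leaves Z set, which is the opposite of what a zero-result test on that same bit would report. That mismatch is the confusion being argued about.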
I find it interesting the way you are prepared to continue arguing from a point of ignorance. This mainframe used a set of comparator circuits to determine Equality or not, it did not perform a subtraction (inverted addition). Maybe that also explains why the MIPS architecture doesn't even have a zero flag.
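For comparison, a rough Python sketch of the comparator route. It assumes a plain bitwise comparator (XOR each bit pair, then check that nothing differs) on 32-bit values and says nothing about how that particular mainframe was actually wired. It only shows that both routes give the same equal/not-equal answer.

```python
MASK32 = 0xFFFFFFFF

def equal_by_comparator(a, b):
    # XOR marks the bit positions that differ; equal means no position differs.
    return ((a ^ b) & MASK32) == 0

def equal_by_subtract(a, b):
    # Adder-based route: a + ~b + 1, then check for a zero result.
    return ((a + (~b & MASK32) + 1) & MASK32) == 0

samples = [(0, 0), (1, 2), (0xFFFFFFFF, 0xFFFFFFFF), (0x80000000, 0)]
agree = all(equal_by_comparator(a, b) == equal_by_subtract(a, b) for a, b in samples)
print(agree)
```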
As every processor that I have ever worked with stores the Z bit in its raw form when the Status Register (whatever it is named) is pushed to, or retrieved from, the stack, I find your logic at odds with processor design.
I find that I've reached the point where I have to accept that you are unreachable in your tower of self-confirming logic. I've built my bridge, and now I'm over it. You'll hear nothing more on this from me.
@AJL
It's a shame I threw out the schematics for the 1969 mini design I worked on or I could show you what I mean.
It doesn't resolve the different interpretations in different instructions tho' within the same P2.
Anyway, it is what it is, so I'll give up trying to explain my reasoning.
blank           call    #hsync                  'blank lines
                xcont   m_vi,#0
        _ret_   djnz    x,#blank

hsync           xcont   m_bs,#0                 'horizontal sync
                xcont   m_sn,#1
        _ret_   xcont   m_bv,#0

m_bs            long    $CF000000+16            'before sync
m_sn            long    $CF000000+96            'sync
m_bv            long    $CF000000+48            'before visible
m_vi            long    $CF000000+640           'visible
m_rf            long    $7F000000+640           'visible rlong 8bpp lut
My first reading of streamer docs tells me $C command means 32-bit immediate mode, which says S is the output data.
Immediate mode
S/# provides 32 bits of data which are directly output for the duration of the command.
The rest of the config says output to all four DAC channels. So a byte value of #1 is fed to DAC0 (Sync pin I presume) for the duration of 96 pixels. I don't know why only #1 and not #200 or something.
Solved. The reason is that the SETCMOD instruction configures special video encoding hardware on the data going to the four DACx channels. For VGA output, bit 0 of DAC0 is duplicated to all 8 bits of its physical DAC, with optional polarity inversion.
Oddly, I don't think cmod is doing anything else, other than adding some lag, when it comes to RGB. It could be disabled and use #255 for the sync instead.
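A small Python sketch of how those mode constants read under the interpretation above. The field positions (command in the top nibble, channel configuration in the next nibble, count in the low 16 bits) are an assumption inferred from the constants in this thread, not taken from the docs. `sync_dac_byte` is an invented name that models only the bit-0 duplication described for VGA cmod.

```python
def split_streamer_word(word):
    """Split a mode word into (command nibble, config nibble, count).
    Field positions are assumed from the thread's constants."""
    return (word >> 28) & 0xF, (word >> 24) & 0xF, word & 0xFFFF

def sync_dac_byte(data, invert=False):
    """Model of the described cmod behaviour: bit 0 of the DAC0 byte is
    copied to all 8 DAC bits, with optional polarity inversion."""
    bit = (data & 1) ^ (1 if invert else 0)
    return 0xFF if bit else 0x00

line = {
    "m_bs": 0xCF000000 + 16,    # before sync
    "m_sn": 0xCF000000 + 96,    # sync
    "m_bv": 0xCF000000 + 48,    # before visible
    "m_vi": 0xCF000000 + 640,   # visible
}
for name, word in line.items():
    cmd, cfg, count = split_streamer_word(word)
    print(f"{name}: cmd=${cmd:X} cfg=${cfg:X} count={count}")

total = sum(split_streamer_word(w)[2] for w in line.values())
print("pixel clocks per line:", total)
print(hex(sync_dac_byte(1)), hex(sync_dac_byte(0)))
```

The four counts add up to 800 pixel clocks per line, the usual horizontal total for 640x480 VGA. In this model a sync value of #1 already drives the full DAC range, which would explain why #255 is only needed if cmod is disabled.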
I tried using COGATN inside the VGA driver field loop and it didn't seem to work...
Not until we added this "rdfast ##1<<31,..." anyway...
The streamer instructions block cog execution. They only return once that streamer command is running. There is no double buffering with them.
But that just means that any subsequent instructions, like COGATN, are occurring in time with the same video actions. For example, the following issues a COGATN at the leading edge of the hsync pulse:
Right. You can also use SETQ to override the bits above the pin/bit number:
SETQ #7
DIRH #8
That would make pins 8..15 outputs (their DIR bits set high).
.. all of which is quite cryptic to a new user...
Could the tools support (for example) something like
DIRH [15..8]
as a single line, with much clearer user intent?
Good suggestion but that can come later.
DIRx #[n+w..n]
would resolve to a single instruction for widths up to 8.
DIRx ##[n+w..n]
would resolve to two instructions using SETQ. This way we explicitly define the use of SETQ with DIRx.
DIRx/DRVx/FLTx etc are in the same category.
Should we use .. or : ???
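A rough Python sketch of how a tool might resolve the proposed range operand. Everything here is hypothetical: the `[hi..lo]` syntax is only a suggestion in this thread, `resolve_pin_range` is an invented helper, and the single-instruction encoding (pin number in bits 5..0, count of additional pins in the bits above) is an assumption chosen to match the "widths up to 8" remark. Port boundaries, wrap-around and limits on the SETQ form are not modelled.

```python
def resolve_pin_range(op, hi, lo, long_form=False):
    """Return the instruction lines for a hypothetical `op [hi..lo]` operand.

    Assumed encoding: pin number in bits 5..0, additional-pin count above it,
    so a 9-bit immediate has room for at most 7 additional pins.
    """
    if not (0 <= lo <= hi <= 63):
        raise ValueError("expected 0 <= lo <= hi <= 63")
    extra = hi - lo                      # pins beyond the base pin
    if not long_form:
        if extra > 7:
            raise ValueError("width over 8 needs the ## (SETQ) form")
        return [f"{op} #{lo | (extra << 6)}"]
    return [f"SETQ #{extra}", f"{op} #{lo}"]

def pin_mask(hi, lo):
    """Mask of the pins touched by the range hi..lo."""
    return ((1 << (hi - lo + 1)) - 1) << lo

print(resolve_pin_range("DIRH", 15, 8, long_form=True))
print(resolve_pin_range("DIRH", 15, 8))
print(hex(pin_mask(15, 8)))
```

The long form reproduces the SETQ #7 / DIRH #8 pair quoted earlier, so the range notation would only be a more readable spelling of the same two instructions.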
Also able to affect multiple pins are the WRPIN/WXPIN/WYPIN instructions.
For bit operations, BITL/BITH/BITC/BITNC/BITZ/BITNZ/BITRND/BITNOT use bits 9..5 to specify additional bits. A 9-bit immediate only reaches bit 8, so up to 15 additional bits can be specified within a 9-bit immediate value.
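The "up to 15" figure can be checked with a short Python model. It follows the field layout stated above (bits 9..5 hold the additional-bit count) and assumes bits 4..0 hold the lowest bit number. Bits that would run past bit 31 are simply dropped here, which may not match the hardware. The function names are invented for illustration.

```python
MASK32 = 0xFFFFFFFF

def bit_span(s):
    """Decode a BITx S value into (lowest bit, additional bits)."""
    return s & 0x1F, (s >> 5) & 0x1F

def span_mask(s):
    """32-bit mask covering the span selected by S."""
    base, extra = bit_span(s)
    return (((1 << (extra + 1)) - 1) << base) & MASK32

def bith(d, s):
    """Model of BITH: set every bit in the span."""
    return (d | span_mask(s)) & MASK32

def bitl(d, s):
    """Model of BITL: clear every bit in the span."""
    return d & ~span_mask(s) & MASK32

MAX_IMM9 = 0x1FF                      # largest 9-bit immediate
print(bit_span(MAX_IMM9))             # additional-bit field tops out at 15
print(hex(bith(0, 4 | (3 << 5))))     # bits 7..4 set
```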
I knew there were others. The BITx ops are a great addition too.
What do C and Z do with multiple bits in these instructions?
This is really going to speed up and minimise code, including the spin interpreter!
Chip,
I've had another look at my attempt to match up my understanding of the instruction pipeline. Last time I presented it, it didn't work for you, so I've rotated my view by one clock and added some more detail. I think this will fit reality this time. The top diagram is yours; the bottom diagram is mine, lined up to show what I feel clarifies the same stages. I'm keen to get some feedback again.
Oh, I just realised I'm maybe not treating the latches correctly. I have been assuming they are synchronous flip-flops rather than possibly being actual asynchronous latches.
Similar question for CogRAM then too. I've mostly treated it as asynchronous SRAM I think, and maybe that needs flipped as well.
Every bit storage element is a register, not a latch. So, visualize flip-flops.
Another attempt. I've paralleled the prior stage latching with the associated fetches. I've never seen a pipeline schematic so it's not something I've considered before. Presumably that's called forwarding. So it occurs at all three registered stages.
I've included both get/go phase variants to choose from:
Evanh, I think what's throwing me off is the variably-placed width-varying ===x=== markers. These paths take different times, of course, but they all must resolve before the clock edge. So, the edge is the only event that matters. It looks like you would like to know when the PC updates. I could add that to my drawing.
Those propagation widths, while not necessarily accurate sizes, are quite important to my attempt. For example, I've made inline register clocking have a three character, =x=, propagation time. Also, note the gap at the end of each clock interval, before the new clock edge. That indicates spare overclocking overhead leeway.
I've made the propagation for cogRAM fetching a lot longer than the latches as a guess to account for the level of mux'ing.
As for my "forwarding", I did that to handle the fact that to use synchronous registers for cogRAM requires setup before the clock rise. It can't wait for the propagation of an addressing latch.
Evanh, here's the spoiler: the ASIC tools optimize the logic cell placement and routing only to meet the timing goal, so that signals with plenty of slack get routed around the hot spots, losing their slack, while the signals needing speed get the prime placement and shortest routes. In the end, hundreds of thousands of paths are stacked against the timing wall, forming a cliff, where the chip fails systemically if the clock period becomes too short. So, while in theory some things take less time than others, the implementation is a blob of nearly identically-timed paths that affords no possibility of speed-up via selective clock cycle shortening. When you hit the speed limit, everything fails at once.
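A toy Python illustration of that timing wall. All delay numbers are made up. The only real rule in it is that every path must settle within one clock period, so the slowest path sets the minimum period.

```python
def min_clock_period(path_delays_ns):
    """The slowest path sets the shortest workable clock period."""
    return max(path_delays_ns)

def failing_paths(path_delays_ns, period_ns):
    """Count the paths that no longer settle within the given period."""
    return sum(1 for d in path_delays_ns if d > period_ns)

# Hypothetical path delays in ns, before and after timing-driven place and route.
before = [2.0, 3.5, 5.0, 7.5, 9.8]
after = [9.5, 9.6, 9.7, 9.7, 9.8]   # slack traded away on the non-critical paths

print(min_clock_period(before), min_clock_period(after))
for period in (10.0, 9.65, 9.4):
    print(period, failing_paths(before, period), failing_paths(after, period))
```

Both sets have the same minimum period, but in the "after" set shaving the clock slightly breaks most paths at once instead of just the one critical path.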
It looks like #1 makes it do sync, but I can't find anything in the docs about this...
Also, is there any way to read the actual hsync pin's state?
I'm thinking you can't, because it's in a smart pin mode...
EDIT (pipeline diagram post): Added PClatc, improved the description of PCflux, and merged the ALU descriptions.