With a little bit of smarts the compiler could figure out if the target address was intended to be a pointer or an instruction. It is unlikely that the programmer would intentionally do a jump without a "#" to a location that contained an instruction, and it's also not likely that one would jump to a location using a "#" that did not contain an instruction. Consider this code:
jmp #label1
....
jmp label2
...
jmp label3
...
jmp #label4
....
label1 mov x, y
label2 add x,#3
...
label3 long label1
label4 long label2
A smart compiler would issue a warning for "jmp label2" and "jmp #label4". Of course, it's always possible, though unlikely, that somebody did want to do something tricky and doesn't want the compiler to issue warnings. So we would need a way to disable warnings, such as a line that reads "'COMPILER WARNINGS OFF", and to turn warnings back on you would have a line reading "'COMPILER WARNINGS ON".
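The check described above can be modeled in a few lines. This is only a rough sketch in Python, not how PNut or any real assembler works: `Jump`, `kinds` and `lint_jumps` are names made up for illustration, and the labels are classified by hand rather than parsed from source. The `warnings_on` field stands in for the proposed 'COMPILER WARNINGS OFF / ON lines.

```python
from dataclasses import dataclass


@dataclass
class Jump:
    target: str               # label named in the jmp
    immediate: bool           # True when written with "#"
    warnings_on: bool = True  # False inside a 'COMPILER WARNINGS OFF region


def lint_jumps(jumps, kinds):
    """Return warning strings for jumps whose "#" usage looks suspicious.

    kinds maps each label to "code" (holds an instruction) or
    "data" (holds a long, i.e. a pointer).
    """
    warnings = []
    for j in jumps:
        if not j.warnings_on:
            continue  # programmer asked for silence here
        kind = kinds[j.target]
        if j.immediate and kind == "data":
            warnings.append(f"jmp #{j.target}: target holds data, not an instruction")
        elif not j.immediate and kind == "code":
            warnings.append(f"jmp {j.target}: missing '#'? target holds an instruction")
    return warnings


# The four jumps and four labels from the example above.
kinds = {"label1": "code", "label2": "code", "label3": "data", "label4": "data"}
jumps = [Jump("label1", True), Jump("label2", False),
         Jump("label3", False), Jump("label4", True)]
for w in lint_jumps(jumps, kinds):
    print(w)
```

Only "jmp label2" and "jmp #label4" are flagged, which matches the two warnings expected in the post.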
Yup, tho some more care would be needed for LUT and HUB cases, so something like 'reg' or 'regseg' or ??, could indicate a LONG in COG, aka a Valid Register
eg
jmp #label1
....
jmp label2
...
jmp label3
...
jmp #label4
....
label1 mov x, y
label2 add x,#3
...
'~~~COG space~~~
label3 reg label1 ' fine to use as indirect
label4 reg label2
'~~~LUT space~~~
label5 long label1 ' valid code, but do not try to use as indirect
label6 long DataValue ' more common code
'~~~HUB space~~~
label7 long label1 ' valid code, but do not try to use as indirect
label8 long DataValue
Placing labeled initialized tables in LUT and HUB will be fine, but using those labels as indirect pointers will fail.
Reg above will give an error if copied/pasted into the wrong code location, and will assist the tools in knowing that reg is a valid indirect usage.
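A rough model of that rule, in Python. The `reg` directive is only the proposal from this post, not an existing PNut keyword, and `check_decl` and `can_use_indirect` are hypothetical helpers invented for the sketch.

```python
def check_decl(directive, space):
    """Return an error string for a misplaced declaration, else None.

    directive: "reg" or "long"; space: "COG", "LUT" or "HUB".
    """
    if directive == "reg" and space != "COG":
        return f"'reg' is only valid in COG space, not {space}"
    return None


def can_use_indirect(directive, space):
    """Only a 'reg' declared in COG space counts as a valid indirect target."""
    return directive == "reg" and space == "COG"


# A few of the labels from the example above.
decls = {
    "label3": ("reg", "COG"),
    "label5": ("long", "LUT"),
    "label7": ("long", "HUB"),
}
for name, (directive, space) in decls.items():
    print(name, can_use_indirect(directive, space))
```

A plain `long` stays legal everywhere; it just never qualifies as an indirect target, so the tools would have something to warn about.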
PNut already knows that valid registers are between $00 and $1ff. It will issue an error if the value is outside of that range.
I've encountered the issue of trying to treat hub memory as registers in my little Prop2 GCC project. I generate assembly from C using the P1 GCC compiler. This uses the cog memory model, which assumes everything is in cog memory. However, I'm trying to execute it on the P2 from hub memory using hub exec. So any instructions that reference hub memory as a register are converted to a "RDLONG temp, label" followed by the instruction using the temp register instead of the label.
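The conversion step could look something like the following. This is a sketch under assumptions, not the actual Prop2 GCC tooling: `rewrite_for_hubexec` is a hypothetical name, it only handles the simple `op dest, src` form with the hub label as the source, and the set of hub labels is supplied by hand. The `rdlong` operand form is copied as written in the post.

```python
def rewrite_for_hubexec(line, hub_labels, temp="temp"):
    """Rewrite one 'op dest, src' line whose source is a hub label.

    Returns a list of output lines. Lines that do not use a hub
    label as a source register are returned unchanged.
    """
    stripped = line.strip()
    op, _, operands = stripped.partition(" ")
    parts = [p.strip() for p in operands.split(",")]
    if len(parts) != 2:
        return [stripped]          # not a two-operand instruction
    dest, src = parts
    if src.startswith("#") or src not in hub_labels:
        return [stripped]          # immediate or ordinary cog register
    # Load the hub long into a temp register, then use the temp instead.
    return [f"rdlong {temp}, {src}", f"{op} {dest}, {temp}"]


hub_labels = {"counter"}           # made-up label living in hub memory
for src_line in ["add x, counter", "add x, #3", "mov x, y"]:
    for out in rewrite_for_hubexec(src_line, hub_labels):
        print(out)
```

A destination that lives in hub memory would also need a write back after the instruction; that case is left out here.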
I just wish you'd get a warning if the "#" is missing...
Seems that half my bugs turn out to be a missing #.
I think jmg is right and that direct calls are much more common.
If I were doing this from scratch, definitely drop the #...
Anyway, best if the normal usage is the simplest...
bst has an option to give a warning. Saved my bacon many times!
When I turned this option on for a working ZiCog, I found a few bugs that had not yet shown their ugly heads.
Of course there were a few warnings on real code, so I put in a special comment "indirect jump" and then I could ignore those flagged warnings.
PNut already knows that valid registers are between $00 and $1ff. It will issue an error if the value is outside of that range.
True, but I'm also thinking ahead a little to flows that use a Linker, as has been mentioned before for P2.
There, it can be useful to separate at source level, those VARs that must be placed in COG, from those that could be placed in COG or LUT or HUB.
HUB-read comes with a speed cost, but tables in HUB, will likely be used to free up COG/LUT code space for a net gain in speed.
It also makes edit/merge harvesting of code examples easier.
Does anybody know what the letter designations are for the various boards in the Prop_Ver response to Prop_Chk? From what I can tell "A" is the Prop123-A9, "B" is the DE2-115 and "C" is the DE0-Nano. What are the letters for the BeMicro-A9, Prop123-A7 and BeMicro-A2?
From the ROM_Booter.spin2 file:
ver = "A" 'Prop123-A9 / BeMicro-A9
' ver = "B" 'DE2-115
' ver = "C" 'DE0-Nano
' ver = "D" 'BeMicro-A2
' ver = "E" 'Prop123-A7
Chip,
I've noticed an easy spare double-operand instruction slot, if one is needed. The SFUNC instruction currently fills one and can be moved to a group of single-operand slots instead.
Bonus: The individual instructions SFUNC encompasses will no longer be listed as aliases but become regular citizens in the documentation. SFUNC would vanish as a name.
Man, we are thinking the same thing at the same moment. I just responded to TonyB's thread. I want the blessing of those working on tools, though, since they've been through the wringer a few times already.
In looking at timing reports, I realized many of the slow paths were starting from the instruction bits and bogging down in the instruction decoding logic. I made some alternate macros to decode instructions in the prior clock and then capture them into registers, so they are ready at the start of the critical cycle. I just change macro names to cause a register to be made, in lieu of normal decoding logic. The following instructions use the register macros:
This took 9 flops per cog, but jacked up the 2-cog Fmax from ~83MHz to ~96MHz. Now, the critical paths are things that I can't really speed up, anymore, as they are already quite optimized. This is good for the FPGA emulation and the silicon synthesis.
I'm running a 16-cog compile now for the Prop123-A9 board. It's going to take a few hours.
I hope to have a new set of files (v19) out on Monday.
Here's the latest instruction file. Note that flag suffixes are now single symbols, with WCZ replacing 'WC,WZ'. Someone here suggested that and it really simplifies things.
This took 9 flops per cog, but jacked up the 2-cog Fmax from ~83MHz to ~96MHz..
Will be interesting to see a table of how fMax has changed across all the builds. 15.66% gain is significant.
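For anyone checking the figure, the 15.66% comes straight from the two approximate Fmax numbers quoted above. A quick sketch (`percent_gain` is just a throwaway helper name):

```python
def percent_gain(old_mhz, new_mhz):
    """Relative improvement of new_mhz over old_mhz, in percent."""
    return (new_mhz - old_mhz) / old_mhz * 100.0


# ~83MHz -> ~96MHz, the 2-cog numbers from the post
print(f"{percent_gain(83, 96):.2f}%")
```

Since both inputs are "~" values, the second decimal place should not be taken too seriously.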
The FMax is all over the place, depending on the luck of the fitter seed value, but things have definitely gotten faster. It would be neat if v19 could be 100MHz.
To a human, perhaps.
To USB circuits, 84MHz or 96MHz might be more useful thresholds.
I just looked at the compile for Prop123-A9. I had to set the fitter to area-optimization mode to get everything to fit, and we are not going to get above 80MHz, after all. To get to 100MHz, I'd have to drop maybe two smart pins.
One thing I would like you to spend some time thinking over: an additional error flag for math operations.
1) you need one instruction to clear the flag
2) all the math instructions (add, sub, ...) and the CORDIC result-reading instructions will OR (accumulate) their error output (overflow, underflow, division by 0, ...) into the internal flag.
3a) you need one instruction to test the flag and/or one to jump on its state
3b) you need one instruction to copy and clear the flag state to C or Z, thus allowing the use of the existing mechanisms (test, jump, conditionals, ...) based on the error state transferred to C or Z
In this way, at critical points during math operations, or at the end of several math operations, one can tell whether the result is valid or not. It can be used to force some default values in place of erroneous computed ones in chains of math operations.
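To make the idea concrete, here is a toy model of the proposal in Python. It is not P2 hardware behaviour: `MathUnit`, `clear_err` and `err_to_c` are invented names, only unsigned 32-bit add overflow and division by zero are modeled, and the result returned on a divide by zero is an arbitrary placeholder.

```python
MASK32 = 0xFFFFFFFF


class MathUnit:
    """Toy model of the proposed sticky math-error flag."""

    def __init__(self):
        self.err = False   # proposed sticky error flag
        self.c = False     # stands in for the C flag

    def clear_err(self):                 # step 1: clear the flag
        self.err = False

    def add(self, a, b):                 # step 2: errors accumulate by OR
        total = a + b
        self.err |= total > MASK32       # unsigned 32-bit overflow
        return total & MASK32

    def div(self, a, b):
        if b == 0:
            self.err = True              # division by zero sets the flag
            return 0                     # arbitrary placeholder result
        return a // b

    def err_to_c(self):                  # step 3b: copy to C, then clear
        self.c = self.err
        self.err = False
        return self.c


m = MathUnit()
m.clear_err()
x = m.add(0xFFFFFFFF, 1)   # overflows, flag is set
y = m.div(x, 4)            # no error here, but the flag stays set
if m.err_to_c():
    y = 0                  # force a default value at the end of the chain
print(x, y, m.c, m.err)
```

The point of the sticky behaviour is that the check happens once, after the whole chain, instead of after every instruction.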
Another thing: is it possible to use the internal CORDIC engine to compute single-precision floating-point math (at least the basics: +, -, /, *, squarer, and integer2float and float2integer conversions)?
I've read on the web that with CORDIC you can do math with +/- 1/16777216 precision, and that it is often used in FPGAs and MCUs where an FPU is not available or its cost is not reasonable.
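As a side note on that number: 1/16777216 is 2^-24, which lines up with the 24-bit significand of an IEEE 754 single-precision float. The Python sketch below only illustrates that relationship; it says nothing about what the P2 CORDIC can or cannot do, and `to_float32` is a helper made up for the example.

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


step = 1 / 16777216

# 1/16777216 is exactly 2**-24
print(step == 2.0 ** -24)

# Just above 1.0, single precision can resolve steps of 2**-23 ...
print(to_float32(1.0 + 2 * step) > 1.0)

# ... but an offset of 2**-24 rounds back to 1.0
print(to_float32(1.0 + step) == 1.0)
```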
Quartus 17 has been released. Apparently, the big new thing is:
"The Intel® Quartus® Prime design software v17.0 Pro edition offers new team-based design flows that allow your geographically diverse development team to collaborate on a design."
Does this mean we'll all get to start working on the P2 verilog at the same time? Just think how much faster we'll get the final design knocked out!
Warnings for typos, or common misuse, are not really warnings for valid code.
They are the tools helping the programmer.
Here is a real example, showing how another assembler manages this:
The instructions that use the register macros (from the timing-report post above) are:
SCL/SCLU
REP
SETQ
SETQ2
SKIP
SKIPF
EXECF
AUGS
AUGD
Of course, you're human ;-)
Chip; Are you at the point where you can revise the above statement to NO MORE CHANGES, barring bug fixes?
Sandy
Yes.
I do think we are done adding.