Thanks, Chip. I was just looking those over, confused... Nice!
Is the text document linked above inclusive? In other words, are you just adding to your original source document and posting it here, Chip?
I keep adding to the same document, but I also keep making changes to older parts of it. In the end, it will be good to read the whole thing from beginning to end to get an updated overview of things.
The SETBC /NC /Z /NZ do already copy the state of the C or Z flag to the bit.
If you want to copy a bit from a register into another bit in another register you will need 2 instructions:
isob reg1,#bit1 wc,nr
setbc reg2,#bit2
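For readers without the FPGA at hand, the two-instruction sequence above can be modeled in Python. This is only a behavioral sketch: it assumes that ISOB with WC,NR places the selected bit in the C flag without writing the register, and that SETBC writes C into the selected bit. The function names are made up for the illustration and are not part of any Parallax tool.

```python
MASK32 = 0xFFFFFFFF  # cog registers are 32 bits wide


def isob_wc_nr(reg, bit):
    """Assumed model of 'isob reg,#bit wc,nr': return the selected bit as C, reg untouched."""
    return (reg >> bit) & 1


def setbc(reg, bit, c):
    """Assumed model of 'setbc reg,#bit': write the C flag into the selected bit."""
    mask = 1 << bit
    return ((reg & ~mask) | (c << bit)) & MASK32


def copy_bit(reg1, bit1, reg2, bit2):
    """Copy bit1 of reg1 into bit2 of reg2 by way of the C flag; returns the new reg2."""
    c = isob_wc_nr(reg1, bit1)
    return setbc(reg2, bit2, c)


if __name__ == "__main__":
    # Bit 3 of the first value is set, so bit 0 of the second becomes 1.
    print(hex(copy_bit(0b1000, 3, 0x00000000, 0)))
```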
It may be a good idea to make a pseudo-instruction for ISOB with the NR effect (just as TEST is AND with NR):
Thanks! Too many threads. I'm just getting back from a conference in NOLA, got sick as a dog (seriously, use that hand cleaner well if you are there), and am just now catching back up.
xxxx CIOHHHLLL | C | OUT / IN | 0 | Live | 1 | Clocked
I understand now. You are talking about pin modes which are selectable on the actual chip, but not possible to fully implement on the FPGA. I could implement part of the pin circuitry on the FPGA, but not much, so I didn't bother doing it, at all. We'll have to wait for the chip.
How do I use MUL and DIV?
I see these instructions in the instruction set, but I can't find how to use them.
Any help would be appreciated, with examples if possible.
GETMULL D WC attempt to get lower multiplier result
GETMULH D WC attempt to get upper multiplier result
GETDIVQ D WC attempt to get divider quotient result
GETDIVR D WC attempt to get divider remainder result
You have a simple multiply instruction, but why not a simple divide?
| MUL | D,S#n |
| DIV | D,S#n | Why not that one ?
The reason is that there is no such thing as a simple (as in 'fast') divide technique. Division must be performed on a step-by-step test-case basis, unlike multiplication which is deterministic. There are two asynchronous 16-over-8-bit dividers in the texture mapper to handle the Z-perspective correction and they take 3 clocks just to settle. Their latency is the main reason GETPIX takes three clocks.
The 64-over-32-bit divider in each cog takes 17 clocks, being a radix-4 divider. It tests 4 cases of subtraction on each clock to generate 2 bits of quotient. The 17th clock is needed to possibly negate the quotient and remainder results, in case the operation was signed and the results were due to be negative.
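The radix-4 scheme described above can be sketched as a small Python model. It is a behavioral illustration only, not the actual hardware design. Assumptions: the quotient must fit in 32 bits (the upper 32 bits of the dividend magnitude are smaller than the divisor magnitude), signed results truncate toward zero, and the final sign-fix step is always counted as the 17th step. The function name is hypothetical.

```python
def radix4_divide(dividend, divisor, signed=False):
    """Model a 64-over-32-bit radix-4 divide; returns (quotient, remainder, steps)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor is zero")
    negate_q = negate_r = False
    n, d = dividend, divisor
    if signed:
        # Work on magnitudes and remember which results must be negated at the end.
        negate_q = (n < 0) != (d < 0)
        negate_r = n < 0
        n, d = abs(n), abs(d)
    if (n >> 32) >= d:
        raise OverflowError("quotient does not fit in 32 bits")

    rem = n >> 32            # partial remainder starts as the upper 32 bits
    low = n & 0xFFFFFFFF     # lower 32 bits are shifted in two at a time
    quotient = 0
    steps = 0
    for i in range(16):
        rem = (rem << 2) | ((low >> (30 - 2 * i)) & 3)
        # Four subtraction cases per step: 3*d, 2*d, 1*d, or nothing.
        for digit in (3, 2, 1, 0):
            if rem >= digit * d:
                rem -= digit * d
                break
        quotient = (quotient << 2) | digit   # two quotient bits per step
        steps += 1

    # 17th step: negate the results if the signed operation calls for it.
    if negate_q:
        quotient = -quotient
    if negate_r:
        rem = -rem
    steps += 1
    return quotient, rem, steps


if __name__ == "__main__":
    print(radix4_divide(100, 7))
    print(radix4_divide(-100, 7, signed=True))
```

Sixteen steps of two bits each give the 32-bit quotient, which is where the count of 17 comes from once the sign-fix step is added.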
I will document this after I get the SDRAM working.
In the attachment you can see the DIVIDE.VHD that I use; it does not need settling time.
It still needs settling time - not in clocks, but in nanoseconds. You can find out what the settling time is by putting flops on the inputs and outputs and then compiling it and checking the Fmax. You cannot clock that thing very quickly, compared to a multiplier.
For big dividers, you must have them work over multiple clocks with flip-flops, because they'd never be able to keep up with the main clock as asynchronous circuits. Even in a 16-over-8 asynchronous divider, the critical path is through a few hundred standard cells in an ASIC. It's fewer cells in an FPGA, because those cells are more complex, but even FPGAs will often have dedicated math circuits to overcome the logic fabric's speed limitations for math operations.
This is a really impressive divide. Wow 17 clocks!!! Now, if we could just clock this thing at 1GHz
Chip: Is there any restriction in pnut.exe that prevents it from compiling programs greater than 2KB? Postedit: Just discovered I found this restriction in December -oops. It has been confirmed. Use p2load by David Betz instead to load a pnut binary image. http://forums.parallax.com/showthread.php/144384-p2load-A-Loader-for-the-Propeller-II
pnut appears to cut the load at $165F and then fills 32 bytes (8 longs) with $00000001. This would mean $E80..$167F = $0800 = 2KB. I have tried making a new DAT section and also an ORG 0. Neither fixes this.
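The address arithmetic above can be checked with a few lines of Python. The sketch assumes that $165F is the last byte actually loaded and that $E80 is the first byte of the image, which is how the post reads. The constant names are only for the illustration.

```python
LOAD_START = 0x0E80    # first byte of the loaded image (assumed from the post)
LAST_LOADED = 0x165F   # last byte loaded before the cut
FILL_BYTES = 32        # 8 longs of $00000001 follow the cut

fill_end = LAST_LOADED + FILL_BYTES       # last filled byte
image_size = fill_end - LOAD_START + 1    # inclusive byte count

print(f"fill ends at ${fill_end:04X}, image size ${image_size:04X} = {image_size} bytes")
```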
Attached is a code sample. I have patched my code to go straight to the ROM Monitor after a 5 sec delay. If you then examine $1600-$16FF you will see the "==== HUB END ===" followed by some $33 ("3") bytes. In my code (at the end) is a byte $33[256] which should output 256 x $33 but they are truncated at $165F.
Comments
Is the text document linked above inclusive? In other words, are you just adding to your original source document and posting it here, Chip?
I keep adding to the same document, but I also keep making changes to older parts of it. In the end, it will be good to read the whole thing from beginning to end to get an updated overview of things.
Thanks.
I've been trying to do that. Of course, with my memory, it's like I'm getting a new document each time!
It may be too late, but I have looked at these instructions,
and I am not sure they are all that usable.
In my opinion, instructions for handling CPU emulators would be more useful.
If you want to copy a bit from a register into another bit in another register you will need 2 instructions:
It may be a good idea to make a pseudo-instruction for ISOB with the NR effect (just as TEST is AND with NR):
Andy
I think the answer to your question is in this thread. http://forums.parallax.com/showthread.php/145201-Shuttle-today?p=1156240#post1156240
Fred
Do you have any info or a demo on strobed I/O?
What do you mean by strobed I/O?
This mode
Previous post edited -- to be more clear
I understand now. You are talking about pin modes which are selectable on the actual chip, but not possible to fully implement on the FPGA. I could implement part of the pin circuitry on the FPGA, but not much, so I didn't bother doing it, at all. We'll have to wait for the chip.
If it is not on the emulator, it is not of interest in this phase of experiments.
Thanks
How do I use MUL and DIV?
I see these instructions in the instruction set, but I can't find how to use them.
Any help would be appreciated, with examples if possible.
GETMULL D WC attempt to get lower multiplier result
GETMULH D WC attempt to get upper multiplier result
GETDIVQ D WC attempt to get divider quotient result
GETDIVR D WC attempt to get divider remainder result
You have a simple multiply instruction, but why not a simple divide?
| MUL | D,S#n |
| DIV | D,S#n | Why not that one ?
The reason is that there is no such thing as a simple (as in 'fast') divide technique. Division must be performed on a step-by-step test-case basis, unlike multiplication which is deterministic. There are two asynchronous 16-over-8-bit dividers in the texture mapper to handle the Z-perspective correction and they take 3 clocks just to settle. Their latency is the main reason GETPIX takes three clocks.
The 64-over-32-bit divider in each cog takes 17 clocks, being a radix-4 divider. It tests 4 cases of subtraction on each clock to generate 2 bits of quotient. The 17th clock is needed to possibly negate the quotient and remainder results, in case the operation was signed and the results were due to be negative.
Thanks for the explanation.
I will document this after I get the SDRAM working.
Thanks
In the attachment you can see the DIVIDE.VHD that I use; it does not need settling time.
It still needs settling time - not in clocks, but in nanoseconds. You can find out what the settling time is by putting flops on the inputs and outputs and then compiling it and checking the Fmax. You cannot clock that thing very quickly, compared to a multiplier.
For big dividers, you must have them work over multiple clocks with flip-flops, because they'd never be able to keep up with the main clock as asynchronous circuits. Even in a 16-over-8 asynchronous divider, the critical path is through a few hundred standard cells in an ASIC. It's fewer cells in an FPGA, because those cells are more complex, but even FPGAs will often have dedicated math circuits to overcome the logic fabric's speed limitations for math operations.
Thanks.
Now I understand what you mean by settling time.
This is a really impressive divide. Wow 17 clocks!!! Now, if we could just clock this thing at 1GHz
$3M would do it!
Of course I would also be expecting more SRAM for this.
I have my lotto ticket in
Postedit: Just discovered I found this restriction in December -oops. It has been confirmed. Use p2load by David Betz instead to load a pnut binary image.
http://forums.parallax.com/showthread.php/144384-p2load-A-Loader-for-the-Propeller-II
pnut appears to cut the load at $165F and then fills 32 bytes (8 longs) with $00000001. This would mean $E80..$167F = $0800 = 2KB. I have tried making a new DAT section and also an ORG 0. Neither fixes this.
Attached is a code sample. I have patched my code to go straight to the ROM Monitor after a 5 sec delay. If you then examine $1600-$16FF you will see the "==== HUB END ===" followed by some $33 ("3") bytes. In my code (at the end) is a byte $33[256] which should output 256 x $33 but they are truncated at $165F.
LMM_SerialDebugger_025_bug.spin