Faster number printing without QDIV
Hello!
I was working on printing numbers. Using the Propeller 2 Assembly Language (PASM2) Manual, I was able to print the largest unsigned 32-bit number. I stumbled onto QDIV, and the program works and appears very accurate:
```
' set the receiving pin for input to the P2 microcontroller
RX_PIN = 63
' set the transmission pin for output from the P2 microcontroller
TX_PIN = 62
' set the baud mode to support 2000000 baud
BAUD_MODE = 655367

dat
        ' begin the program at address 0
        org 0
        ' Set the clock mode
        asmclk
        ' configure TX smart pin
        fltl #TX_PIN
        wrpin ##(P_ASYNC_TX | P_OE), #TX_PIN
        wxpin ##BAUD_MODE, #TX_PIN
        drvl #TX_PIN

        mov div_idx, #0
        mov number, ##4_294_967_295
        mov tmp, number
        mov divisor, ##1_000_000_000
.digit_loop
        mov digit, #0
.extract_digit
        cmpsub tmp, divisor wc      ' Subtract divisor if num >= divisor
  if_c  add digit, #1               ' Increment digit if subtraction occurred
  if_c  jmp #.extract_digit         ' Repeat if num >= divisor
        add digit, #"0"             ' Convert to ASCII
        wypin digit, #TX_PIN
.flush
        rdpin pr2, #TX_PIN WC       ' check busy flag
  if_c  jmp #.flush                 ' hold until done
        add div_idx, #1             ' Move to next divisor in table
        qdiv divisor, #10
        getqx divisor
        cmp div_idx, #10 wz         ' Stop when all digits are printed
  if_nz jmp #.digit_loop            ' Continue until last divisor is reached
        jmp #.done
.done
        ret

' Constants and Variables
digit   res 1
divisor res 1
div_idx res 1
tmp     res 1
number  res 1
buffer  byte "0", 0, 0
```
I thought I was done, but then I read the fine print:
ALU circuit and CORDIC Solver math instructions. The ALU (Arithmetic Logic Unit) instructions perform common math operations in just 2 clock cycles each. The CORDIC (COordinate Rotation DIgital Computer) instructions perform more complicated math operations in 54 clock cycles each.
Right now for this simple example, I wouldn't notice much delay but I would imagine it would be rather slow for a more complex project, like a video game. Is there a faster way to do the division?
Comments
Unless you are a modern JRPG, you are not printing enough numbers per frame in a video game for it to really matter.
You're actually doing two divisions per digit here: an iterative one in `extract_digit` and the obvious QDIV one. The QDIV here actually just moves `divisor` through a fixed sequence of powers of ten, so you could replace it with a table lookup. But then you're still doing a division loop that can take longer than a QDIV would (if the digit is 8 or 9).
Also, never put initialized data after RES, the krampus will come and eat your socks.
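A rough sketch of the table-lookup idea, reusing the register names from the original post (`divisor`, `div_idx`). The `pow10` label is made up for illustration, and the fragment assumes `divisor` is loaded at the top of the loop instead of being divided at the bottom. `ALTS` patches the source field of the following instruction so it reads `pow10 + div_idx`:

```
.digit_loop
        alts    div_idx, #pow10         ' next instruction's S field becomes pow10 + div_idx
        mov     divisor, 0-0            ' divisor := pow10[div_idx]
        mov     digit, #0
        ' ... .extract_digit and the transmit code stay as they are ...
        add     div_idx, #1
        cmp     div_idx, #10 wz         ' all ten digits printed?
  if_nz jmp     #.digit_loop

' hypothetical table name; one long per decimal place, largest first
' (place it ahead of any RES lines)
pow10   long    1_000_000_000, 100_000_000, 10_000_000, 1_000_000, 100_000
        long    10_000, 1_000, 100, 10, 1
```

This removes the QDIV and GETQX pair but leaves the CMPSUB loop, so the per-digit cost still depends on the digit value.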
You can also convert a number to decimal without doing division at all, using the "double dabble" algorithm (https://en.wikipedia.org/wiki/Double_dabble). Technically this actually converts the number to binary coded decimal, but this is easily printed (just print it as you would a hex number).
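As a hedged illustration of double dabble, here is a minimal sketch for a 16-bit value, which needs five BCD digits (20 bits) and so fits in one register. All register names (`bin`, `bcd`, `mask`, `five`, `three`, `nib`, `bits`, `t`) are assumptions made for this fragment, and it is untested on hardware. A full 32-bit value needs ten digits (40 bits), so `bcd` would have to span two registers.

```
' in:  bin = 16-bit value (upper 16 bits zero)
' out: bcd = five BCD digits in bits 19..0
dabble  shl     bin, #16                ' left-justify so SHL WC exposes the MSB
        mov     bcd, #0
        mov     bits, #16               ' one pass per input bit
.bit    mov     mask, #$F               ' start at the lowest BCD digit
        mov     five, #5
        mov     three, #3
        mov     nib, #5                 ' five digits to check
.adj    mov     t, bcd
        and     t, mask                 ' isolate one digit, left in place
        cmp     t, five wc              ' C = 1 when the digit is below 5
  if_nc add     bcd, three              ' digit >= 5: add 3 before doubling
        shl     mask, #4                ' step all three constants to the next digit
        shl     five, #4
        shl     three, #4
        djnz    nib, #.adj
        shl     bin, #1 wc              ' C = next input bit, MSB first
        rcl     bcd, #1                 ' double bcd and shift that bit in
        djnz    bits, #.bit

' assumed scratch registers (RES lines go last in the DAT block)
bin     res     1
bcd     res     1
mask    res     1
five    res     1
three   res     1
nib     res     1
bits    res     1
t       res     1
```

Each 4-bit digit of `bcd` can then be printed the way a hex nibble would be: shift it down, mask with `$F`, and add `"0"`.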
For 16-bit values when the divisor is a constant and fairly small, you could pre-compute `65536/divisor` and use that as `S` in the `MUL D,S` instruction.
It's an extra instruction, but this makes use of the smartpin's transmit buffer. It allows the CORDIC and the comport to be operated in parallel.
I took some time to look into this and was pretty surprised to find that division via hardware, even in later game consoles (as recent as PS4!), has been avoided in many cases. I mostly stick to 2D stuff so that works.
I see what you mean about the division loop. I did think about doing a lookup table but didn't really start to understand arrays of longs until I had the above code completed. I think I'll revisit it now though.
So, I may need to reference the manual again but is there a reason not to do that? Alignment issues? Thank you in advance!
I'm going to give this a shot. Thank you!
I'll give this a shot as well. It'll be helpful for my journey learning Propeller Assembly. Thank you!
I'm going to look into this. I did see rqpin in the manual once but had no idea why I would use it. I couldn't figure out a reason I wouldn't want "no acknowledge". Thank you!
You all are awesome! I guess I've got more homework to do.
You shouldn't put data after RES because RES desynchronizes the cog address counter with the actual data being assembled. It is only to be used to reserve space at the end of cog RAM without emitting corresponding padding into hub RAM. There's a longer explanation somewhere on here but I'm writing from my phone in a waiting room. Someone please find and link it.
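To make the ordering concrete, here is a sketch of how the tail of the original program could be laid out, using the names from the first post. The `alignl` line is an assumption added for safety so the reserved registers start on a long boundary after the byte data:

```
dat
        org     0
        ' ... code ...

' initialized data first: these bytes are emitted into the image
buffer  byte    "0", 0, 0
        alignl                          ' pad to the next long boundary

' RES last: reserves cog registers only, nothing is emitted
digit   res     1
divisor res     1
div_idx res     1
tmp     res     1
number  res     1
```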
I've actually been working on a 3D rendering thing: https://forums.parallax.com/discussion/176083/3d-teapot-demo/p1
I'm not sure if that detail actually matters. I'd just cut'n'pasted from old code. The important part is the reverse order: checking the smartpin status before writing the buffer instead of the other way around. That, and also checking for buffer full as well.
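A hedged sketch of that check-before-write order, not the code originally posted in this thread. It reuses `digit`, `pr2` and `TX_PIN` from the first post; the `.send` label is made up. It assumes the async transmit mode behaves as documented: the busy flag comes back in C, and IN is raised when the buffer is moved to the shifter. RQPIN is used because it does not acknowledge the pin, so IN is still valid for the TESTP that follows:

```
.send   rqpin   pr2, #TX_PIN wc         ' C = busy flag; IN is left untouched
        testp   #TX_PIN wz              ' Z = IN (buffer was emptied into the shifter)
  if_c_and_nz jmp #.send                ' busy and buffer still full: keep waiting
        wypin   digit, #TX_PIN          ' buffer is free: queue the character
        ' execution continues here while the smart pin shifts the character out
```

Because the wait happens before the write instead of after it, the code that follows (a QDIV, for example) runs while the previous character is still being transmitted.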