Shop OBEX P1 Docs P2 Docs Learn Events
CORDIC is useful for Bezier curve generation — Parallax Forums

CORDIC is useful for Bezier curve generation

roglohrogloh Posts: 5,837
edited 2021-04-20 05:16 in Propeller 2

I've been mucking about with Bezier curves in my graphics memory stuff lately and have found that the CORDIC engine pipeline is pretty good for this.

Here's a sample function I created which will generate and plot a cubic Bezier curve from 4 co-ordinate pairs (2 points on the curve, and 2 control points). I think it could be sped up a bit more if the inline PASM wasn't being loaded each iteration, because I'm calling SPIN2 code for each step in the loop. I'm guessing it would be getting reloaded each time unless it is cached by FlexSpin. Maybe it would be faster to save all the computed points in some scratch array area from the inline PASM, then draw the line segments.

Is there more CORDIC pipelining that could be done here to optimize it more?

Hoping to include this type of capability in my memory driver as a graphics acceleration feature for speeding up curve generation so you don't need to make a memory request for every pixel, but rather one for the entire curve in a request list. It can be used for generating large vector fonts for example. It's how TrueType and OpenType fonts work. I read that TrueType uses quadratic Bezier curves which would be faster to calculate while OpenType uses cubics as I have used below.

' plot a cubic Bezier curve

PUB bezier3(x1,y1,x2,y2,x3,y3,x4,y4,colour) | t,x,y,a,b,c,d,e,x0,y0

' show control lines to debug
 '  line(x1,y1,x2,y2,colour)
 '  line(x2,y2,x3,y3,colour)
 '  line(x3,y3,x4,y4,colour)

    x0:=x1
    y0:=y1
    'scale up co-ordinates to reduce truncation errors
    x1<<=8
    x2<<=8
    x3<<=8
    x4<<=8
    y1<<=8
    y2<<=8
    y3<<=8
    y4<<=8

    repeat t from 0 to 1024 step 16  ' the step size controls smoothness 
        org
            mov a, ##1024
            sub a, t 'a=1024-t
            mov b, a
            mul b, a 'b=a*a
            mov c, t
            mul c, t 'c=t*t
            mov d, c

            mov e, #3
            mul e, a 'e=3*a

            qmul d, t

            qmul c, e

            mov e, #3
            mul e, t 'e=3*t

            qmul b, e

            qmul a, b

            getqx d  'd=t*t*t
            getqx c  'c=3*a*t*t
            getqx b  'b=3*a*a*t
            getqx a  'a=a*a*a

            qmul x1, a
            qmul y1, a
            qmul x2, b
            qmul y2, b
            qmul x3, c
            qmul y3, c
            qmul x4, d
            qmul y4, d

            getqy x

            getqy y

            getqy e
            add x,e

            getqy e
            add y,e

            getqy e
            add x,e

            getqy e
            add y,e

            getqy e
            add x,e
            getqy e
            add y,e

            shr x,#6
            shr y,#6
        end
        line(x0,y0,x,y,colour)
        x0:=x
        y0:=y

Comments

  • Here's what it can plot..

  • roglohrogloh Posts: 5,837
    edited 2021-04-20 05:49

    Just tried a new version that is computing all the points and saving these into an array before drawing the line segments. I think FlexSpin must be caching the inline PASM2 because the version which saves off the x,y results first into a scratch buffer is about 6% slower than the version which draws each line segment from inside the loop itself when the step size was 16, and 2% slower when the step size was 4. Weird. I wasn't expecting that at all.

  • cgraceycgracey Posts: 14,206

    You could interleave those CORDIC operations a bit more, but why not just use MUL/MULS or SCA/SCAS instructions?

    That looks really nice!

  • roglohrogloh Posts: 5,837
    edited 2021-04-20 08:31

    I think MUL will overflow the range I am using eg. 1024 * 1024 * 1024 * (1920), but I've not considered SCA/SCAS. I'll need to take a look into that. It would be good to speed it up even more.

    Actual formula is this, where t=0..1, but I scaled it from 0-1024 with integers instead of floating point values to give me up to 1024 segments per curve (or less segments for less smoothness):

    x(t) = (1-t)(1-t)(1-t).x1 + 3(1-t)(1-t).t.x2 + 3(1-t).t.t.x3 + t.t.t.x4

    similar for y(t)

  • I found if I just used a range from 0-32, I could make use of MUL because 32 is less than the cube root of 2^16. You lose a bit of smoothness on longer curves and tight bends but it is still reasonably curvy and is probably a decent approximation.

    It speeds up the computation by 33% relative to the CORDIC approach for the same segment count of 32.

    PUB bezierfast(x1,y1,x2,y2,x3,y3,x4,y4,colour) | t,x,y,a,b,c,d,e,x0,y0
    
    ' show control lines to debug
     '  line(x1,y1,x2,y2,colour)
     '  line(x2,y2,x3,y3,colour)
     '  line(x3,y3,x4,y4,colour)
    
        x0:=x1
        y0:=y1
        'scale up co-ordindates to reduce truncation errors
        x1<<=1
        x2<<=1
        x3<<=1
        x4<<=1
        y1<<=1
        y2<<=1
        y3<<=1
        y4<<=1
    
        repeat t from 0 to 32 'step size controls smoothness 
            org
                mov a, ##32
                sub a, t 'a=32-t
                mov b, a
                mul b, a 'b=a*a
                mov c, t
                mul c, t 'c=t*t
                mov d, c
                mul d, t ' d=t*t*t
    
                mul c, #3
                mul c, a  'c=3*a*t*t
    
                mul a, b ' a=a*a*a
    
                mul b, #3
                mul b, t ' b=3*a*a*t
    
                mov x,x1
                mul x, a
    
                mov y,y1
                mul y, a
    
                mov e, x2
                mul e, b
                add x, e
    
                mov e, y2
                mul e, b
                add y, e
    
                mov e, x3
                mul e, c
                add x, e
    
                mov e, y3
                mul e, c
                add y, e
    
                mov e, x4
                mul e, d
                add x, e
    
                mov e, y4
                mul e, d
                add y, e
    
                shr x,#16
                shr y,#16
    
            end
            line(x0,y0,x,y,colour)
            x0:=x
            y0:=y
    
  • RaymanRayman Posts: 14,744
    edited 2021-04-20 14:49

    sca is a neat instruction... Hope @ersmith is using that in his fixed point math implementation....

    Would be neat to use this for smooth fonts.

Sign In or Register to comment.