Some P2 benchmark results — Parallax Forums

Some P2 benchmark results

All of these times were obtained with the fastspin compiler on both the P1 and the P2, so we're comparing apples to apples. The P2 results come from my DE2-115 FPGA board, so the P2 ran at an 80 MHz clock (the same as the P1, which was on a C3 board).
                     P1                       P2
hub fft_bench: 21622192 cycles (270 ms)  4324532 cycles (54 ms)
hub fibo(20):  10507664 cycles (131 ms)  4053316 cycles (50 ms)
cog fibo(20):   2977168 cycles (37 ms)   2276664 cycles (28 ms)
hub xxtea:       116384 cycles             23450 cycles

I think for both xxtea and fft_bench the hardware multiply is making a big difference. Also, of course, hubexec is a lot faster than LMM; we're not seeing as much of a speedup on the COG test. OTOH since Fibonacci is recursive both the P1 and P2 are hitting the stack (in hub) heavily, so they're quite memory bound.
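As a rough check on the observations above, the speedup for each benchmark can be recomputed from the posted cycle counts. This is a minimal sketch: the cycle counts and the 80 MHz clock come from the post, while the names `results`, `cycles_to_ms`, and `speedup` are made up for illustration.

```python
CLOCK_HZ = 80_000_000  # both chips ran at 80 MHz, per the post

# (P1 cycles, P2 cycles), copied from the table in the post
results = {
    "hub fft_bench": (21622192, 4324532),
    "hub fibo(20)": (10507664, 4053316),
    "cog fibo(20)": (2977168, 2276664),
    "hub xxtea": (116384, 23450),
}


def cycles_to_ms(cycles, clock_hz=CLOCK_HZ):
    """Convert a cycle count to milliseconds at the given clock rate."""
    return cycles * 1000.0 / clock_hz


def speedup(p1_cycles, p2_cycles):
    """How many times fewer cycles the P2 needed than the P1."""
    return p1_cycles / p2_cycles


for name, (p1, p2) in results.items():
    print(
        f"{name:14s} P1 {cycles_to_ms(p1):8.2f} ms  "
        f"P2 {cycles_to_ms(p2):8.2f} ms  speedup {speedup(p1, p2):.2f}x"
    )
```

By this arithmetic, fft_bench and xxtea come out close to 5x, hub fibo at about 2.6x, and cog fibo at about 1.3x, which matches the pattern described in the post.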

Eric

Comments

  • Rayman Posts: 14,768
    Strange that cog fibo is not at least 2x faster on P2.
    Even if the assembly code was exactly the same, P2 only has 2 clocks per instruction where P1 is 4...

    Plus, I'd think the new instructions could make a lot of code faster too...
  • Seairth Posts: 2,474
    edited 2016-05-09 17:47
    Rayman wrote: »
    Strange that cog fibo is not at least 2x faster on P2.
    Even if the assembly code was exactly the same, P2 only has 2 clocks per instruction where P1 is 4...

    Plus, I'd think the new instructions could make a lot of code faster too...

    I suspect that's due to the branching, which will take 5 clocks each.

    Edit: see later comment.
  • Also, hub accesses occur every 16 cycles on both the P1 and P2, so stack accesses are the same.
  • Rayman Posts: 14,768
    Think branching is 8 cycles on P1. Does the cog fibo need hub access? Thought the name implied was only in cog...
  • Seairth Posts: 2,474
    edited 2016-05-09 17:47
    Rayman wrote: »
    Think branching is 8 cycles on P1. Does the cog fibo need hub access? Thought the name implied was only in cog...

    P1: 4 cycles when branching, 8 cycles when not branching
    P2: 4 cycles when branching, 2 cycles when not branching

    Edit: corrected.
  • cgracey Posts: 14,209
    Seairth wrote: »
    Rayman wrote: »
    Think branching is 8 cycles on P1. Does the cog fibo need hub access? Thought the name implied was only in cog...

    P1: 4 cycles when branching, 8 cycles when not branching
    P2: 5 cycles when branching, 2 cycles when not branching

    Branches take 4 when taken, 2 when not taken.
  • Rayman wrote: »
    Think branching is 8 cycles on P1. Does the cog fibo need hub access? Thought the name implied was only in cog...

    The stack is in hub in both cases, so yes, the cog fibo needs hub access, and I think that's probably the bottleneck.
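One way to put a number on the bottleneck idea is an Amdahl-style estimate. The sketch below is a toy model, not a measurement: it assumes that ordinary cog instructions go from 4 clocks on the P1 to 2 clocks on the P2 (the 2x figure from earlier in the thread), and that everything else, such as hub stack accesses and taken branches, costs the same on both chips. The function names are hypothetical.

```python
def overall_speedup(fixed_fraction, scaled_speedup=2.0):
    """Amdahl-style estimate.

    fixed_fraction: share of P1 run time that does NOT get faster on the P2
                    (assumed here: hub accesses, taken branches).
    scaled_speedup: speedup of the remaining time (assumed 2x: 4 clocks -> 2).
    """
    return 1.0 / (fixed_fraction + (1.0 - fixed_fraction) / scaled_speedup)


def fixed_fraction_for(observed_speedup, scaled_speedup=2.0):
    """Invert the model: which fixed fraction would explain a given speedup?"""
    inverse = 1.0 / observed_speedup
    return (inverse - 1.0 / scaled_speedup) / (1.0 - 1.0 / scaled_speedup)


# Cog fibo(20) cycle counts from the first post: P1 2977168, P2 2276664.
observed = 2977168 / 2276664
print(f"observed speedup: {observed:.2f}x")
print(f"implied non-scaling share of P1 time: {fixed_fraction_for(observed):.0%}")
```

Under these assumptions, roughly half of the P1 run time would have to be spent in work that does not speed up on the P2. That is consistent with the stack being the bottleneck, but the model cannot tell hub waits apart from branch cost.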
  • So, does this mean being able to support a stack in LUT would remove significant bottlenecks for a lot of use cases?

    (Runs so as not to suffer the slings and arrows of outrageous comments)
  • Implementing a 16 deep stack in LUT ram
    		wrlut	value,lutsp	'push to lut
    		incmod	lutsp,#15
    
    		decmod	lutsp,#15	'pop from lut
    		rdlut	value,lutsp
    .
    .
    lutsp		long	0
    
    For debug purposes you can take advantage of the WC bit to check for
    stack overflows and underflows.
    		wrlut	value,lutsp	'push to lut
    		incmod	lutsp,#15 wc
    	if_c	add	stack_overflow,#1
    
    		decmod	lutsp,#15 wc	'pop from lut
    		rdlut	value,lutsp
    	if_c	add	stack_underflow,#1
    .
    .
    lutsp		long	0
    stack_overflow	long	0
    stack_underflow	long	0
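To see how the wrap-around and the WC checks behave, the snippet above can be modelled in a few lines of Python. This is only a behavioural sketch, and it assumes that INCMOD wraps to 0 and sets C when the pointer equals the modulus, and that DECMOD wraps to the modulus and sets C when the pointer is 0. The class `LutStack` is hypothetical; its attribute names mirror the labels in the PASM.

```python
LUT_STACK_TOP = 15  # the #15 modulus in the snippet: slots 0..15, 16 entries


class LutStack:
    """Toy model of the 16-deep LUT stack (assumed INCMOD/DECMOD semantics)."""

    def __init__(self):
        self.lut = [0] * (LUT_STACK_TOP + 1)
        self.lutsp = 0
        self.stack_overflow = 0
        self.stack_underflow = 0

    def push(self, value):
        self.lut[self.lutsp] = value      # wrlut value,lutsp
        if self.lutsp == LUT_STACK_TOP:   # incmod lutsp,#15 wc  (wraps, C=1)
            self.lutsp = 0
            self.stack_overflow += 1      # if_c add stack_overflow,#1
        else:
            self.lutsp += 1

    def pop(self):
        if self.lutsp == 0:               # decmod lutsp,#15 wc  (wraps, C=1)
            self.lutsp = LUT_STACK_TOP
            carry = True
        else:
            self.lutsp -= 1
            carry = False
        value = self.lut[self.lutsp]      # rdlut value,lutsp
        if carry:
            self.stack_underflow += 1     # if_c add stack_underflow,#1
        return value


stack = LutStack()
for n in (1, 2, 3):
    stack.push(n)
print([stack.pop() for _ in range(3)])    # values come back in reverse order
print(stack.stack_overflow, stack.stack_underflow)
```

In this model the overflow counter increments when the pointer wraps, which happens on the 16th push, before any entry has been overwritten. The underflow counter increments on a pop while the pointer is 0, which includes a pop from an empty stack.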
    
    

  • Show off!! :D
  • rjo__ Posts: 2,114
    mindrobots wrote: »
    Show off!! :D

    He is Australian... what do you expect?
