fft_bench - An MCU benchmark using a simple FFT algorithm in Spin, C, and ... - Page 3 — Parallax Forums

fft_bench - An MCU benchmark using a simple FFT algorithm in Spin, C, and ...


Comments

  • RossHRossH Posts: 5,462
    edited 2011-03-06 16:11
    koehler wrote: »
    Who would of thunk it that Basic to LMM would be faster than Catalina?

    Yes, I have been showing Catalina some pictures of the aircraft cemetery as a warning that it needs to lift its game (http://www.modern-ruins.com/boneyard/index.html) :smile:.

    Ross.
  • RossHRossH Posts: 5,462
    edited 2011-03-06 16:13
    jazzed wrote: »
    Where exactly do you see those price comparisons? According to Heater the whole board is 150 euro.

    A Propeller board with the same connectors and 1/10th the performance (if that much) would cost about the same. Of course being a Propeller fan makes up for the disparities :-)

    I just did a quick look up of the processor prices on Digikey.

    Ross.
  • koehlerkoehler Posts: 598
    edited 2011-03-06 16:57
    Ross, no disrespect intended to all your and others' good work. It's just that the concept of benchmarking discussed recently is of value and can only help everyone in making better choices and spurring improvements that heretofore might not have been recognized as being really needed.
    RossH wrote: »
    Yes, I have been showing Catalina some pictures of the aircraft cemetery as a warning that it needs to lift its game (http://www.modern-ruins.com/boneyard/index.html) :smile:.

    Ross.
  • RossHRossH Posts: 5,462
    edited 2011-03-06 17:36
    koehler wrote: »
    Ross, no disrespect intended to all your and others' good work. It's just that the concept of benchmarking discussed recently is of value and can only help everyone in making better choices and spurring improvements that heretofore might not have been recognized as being really needed.

    Hi koehler,

    No problem - I agree with benchmarking, and am always happy to have Catalina's results compared to other languages. I don't expect it to come top in all cases. I doubt any stack-based language will beat Ariba's result for fft - but it does show we probably also need other benchmarks as well.

    Maybe someone would be interested in converting Whetstone (http://www.netlib.org/benchmark/whetstone.c) or Dhrystone (http://www.netlib.org/benchmark/dhry-c) to Basic? We already have results for these benchmarks for Catalina, Zog and SPIN in this thread.

    Ross.
  • jazzedjazzed Posts: 11,803
    edited 2011-03-06 17:48
    RossH wrote: »
    Yes, I have been showing Catalina some pictures of the aircraft cemetery as a warning that it needs to lift its game (http://www.modern-ruins.com/boneyard/index.html) :smile:.

    Ross.
    Oh no not that! I'm still looking forward to Catalina release 3.0. Got an estimated release date yet?
  • David BetzDavid Betz Posts: 14,516
    edited 2011-03-06 17:51
    jazzed wrote: »
    Oh no not that! I'm still looking forward to Catalina release 3.0. Got an estimated release date yet?
    Me too!!!
  • RossHRossH Posts: 5,462
    edited 2011-03-06 18:56
    Hi David, Jazzed ...

    I thought I was getting close, but yesterday I found a problem with one of the Catalyst demo programs - it crashes on the C3, and I'm yet to figure out why - every other program I try seems to work fine.

    One thing that never occurred to me was that it is impossible to debug C programs executing from FLASH. I'm thinking of making a special "debugging" version of the caching XMM driver that will accept writes to FLASH but just keep them in RAM - that way, I can use the normal Catalina debugger (which needs to replace instructions with breakpoints on the fly).

    I might have something (maybe a pre-release) available this weekend.

    Ross.
  • jazzedjazzed Posts: 11,803
    edited 2011-03-06 20:08
    RossH wrote: »
    I might have something (maybe a pre-release) available this weekend.
    Send me a preview and I'll see if I can get Catalina running with the PropellerPlatform SDRAM board. You have my email.
  • RossHRossH Posts: 5,462
    edited 2011-03-06 21:01
    jazzed wrote: »
    Send me a preview and I'll see if I can get Catalina running with the PropellerPlatform SDRAM board. You have my email.

    I'll send you a pre-release as soon as I have it in reasonable shape.

    Ross.
  • Heater.Heater. Posts: 21,230
    edited 2011-03-07 01:06
    When you guys are on a roll I can't keep up. Especially as I have no internet at home just now.

    Jazzed,

    I'm not suggesting a serious comparison between the Prop and the IGEP ARM board. Even if a C3 and IGEP are approaching each other in price. The IGEP will give you a thousand times more bang for the buck if you want: Linux, decent graphics with 3D acceleration, networking, WIFI, BlueTooth, USB (slave and host) etc. It gives you nothing if you want to drive 8 servos simultaneously whilst monitoring a bunch of other sensors and real time events. Horses for courses.

    I'll have a look at running SpinSim on the IGEP/ARM just for fun. Just now I am busy checking that all my favorite dev tools will run on it. So far so good but BST is an issue.

    Hein Hein,
    The Prop is not a DSP
    Well, yes and no. What is a DSP anyway? Seems to me that a lot of apps for the Prop are DSP. If you are running a Kalman filter for your QuadCopter, is that not DSP? Or just some simple ADC input filtering? What about the vocal tract object? And so on. The term "DSP" is as nebulous as the term "super computer". It seems to be defined by the speed of generally available hardware at the time rather than any absolute measure.

    Anyway, my purpose in putting up the FFT as a benchmark has nothing to do with "DSP" as generally accepted. I just noticed that it is small but contains a good selection of operators (plus, minus, shift, mul), a good selection of loop constructs, array indexing and so on, which are typical of control programs on an MCU. It is a better benchmark than, say, FIBO, which mostly measures subroutine call time, or Dhrystone, which is biased toward string handling, or Whetstone, which contains floating point.
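
As a rough illustration of the operator mix described above, here is a minimal Python sketch of a single radix-2 butterfly in fixed point. It is not taken from fft_bench: the function name `butterfly`, the Q12 scaling and the argument layout are all assumptions chosen for illustration. It only shows how multiply, add, subtract and shift end up together in the inner step of an integer FFT.

```python
# Hypothetical fixed-point radix-2 butterfly; names and Q12 scaling are assumptions.
FRAC_BITS = 12              # twiddle factors are scaled by 2**12
ONE = 1 << FRAC_BITS        # fixed-point representation of 1.0

def butterfly(ar, ai, br, bi, wr, wi):
    """Combine complex points a and b using twiddle factor w (Q12)."""
    # Complex multiply b * w, then scale back with an arithmetic shift.
    tr = (br * wr - bi * wi) >> FRAC_BITS
    ti = (br * wi + bi * wr) >> FRAC_BITS
    # Sum and difference outputs of the butterfly.
    return ar + tr, ai + ti, ar - tr, ai - ti

if __name__ == "__main__":
    # With w = 1.0 the butterfly reduces to a + b and a - b.
    print(butterfly(10, 0, 4, 0, ONE, 0))
```

A full transform would wrap this step in nested loops over stages and array indices, which is where the loop constructs and array indexing mentioned above come in.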

    RossH,

    No idea about comparing prices of Prop and ARM, or anything else for that matter, but as working systems a C3 and an IGEP are getting close in price. See above.
    ...converting Whetstone (http://www.netlib.org/benchmark/whetstone.c) or Dhrystone (http://www.netlib.org/benchmark/dhry-c) to Basic?

    More benchmarks is always better. I'm just not sure that those two are representative of what a Prop does. See above.

    Koehler,
    Who would of thunk it that Basic to LMM would be faster than Catalina?

    Yep that's amazing. But, that BASIC is tuned to programming within the constraints of the Prop whilst C is aimed at much larger software constructions.
    ...and a heads up to Parallax that they really may need to prioritize things such as C...

    My statement above makes me think that C is NOT a priority for the Prop. Perhaps the Prop II.
  • RossHRossH Posts: 5,462
    edited 2011-03-07 03:14
    Heater. wrote: »

    RossH,

    No idea about comparing prices of Prop and ARM, or anything else for that matter, but as working systems a C3 and an IGEP are getting close in price. See above.
    Not really - I checked those prices as well, and the IGEPv2 is about three times the price of the C3. For that price, I can buy a complete computer (a netbook).
    Heater. wrote: »
    More benchmarks is always better. I'm just not sure that those two are representative of what a Prop does. See above.
    I agree they're not ideal - but it's becoming obvious we need more than one benchmark - including at least one that is big enough to require more variables than can be stored in a cog - and those programs are already implemented in a couple of the languages we have.

    Ross.
  • Heater.Heater. Posts: 21,230
    edited 2011-03-07 03:57
    RossH,

    You are right. I checked again. My IGEP has cost three times more than a C3.
    Given that the IGEP has a few orders of magnitude more functionality that either makes the C3 very expensive or the IGEP very cheap:)
    Yes for that price you might be able to get a netbook. However I am not about to rely on a netbook on the factory floor under some machine. In my current use case I need the -40C temperature spec. I also need the credit card size.

    Which makes me wonder what the C3 is for. If you want graphics, audio, mouse, keyboard then there are better ways to get them. One of the primary features of the Prop, the thirty two independent general purpose I/O pins is being frittered away on those features.
  • Heater.Heater. Posts: 21,230
    edited 2011-03-07 04:01
    I have to revise my fft_bench result for the XMOS chip. Good for the Prop bad for the XMOS.

    When I ran fft_bench it was the only thread running on the XMOS. What happens if we start up some other threads, perhaps doing nothing but looping? This is what we get:
     XMOS FFT_BENCH RESULTS
    --------------------------
    Threads |    Milliseconds
    --------------------------
       1    |      4.7
       2    |      4.7
       3    |      4.7
       4    |      4.7
       5    |      5.9
       6    |      7.1
       7    |      8.3
       8    |      9.5
    --------------------------
    

    What does this mean?

    Well. If you have the smallest cheapest XMOS, which is price comparable to a Prop with similar size, I/O capability etc, you only get a single core. However that core can do hardware scheduling of up to 8 threads. Instructions are interleaved in such a way that the threads appear to be independent processors with timing determinacy, ALMOST.

    As we see running from one to four threads gives the same result. From this we can guess that an XMOS instruction has four phases, instruction fetch, decode, execute, result write. Whatever. And we can guess that the four phases of the instructions of four threads are being interleaved such that all the threads run at the same speed as just one. So far so good.

    However when we start a fifth thread there are no "spare" phases that it can hide in. It has to add extra time to the overall round-robin cycle. In this case adding on a millisecond to the execution time. Add a 6th thread, there goes another millisecond and so on.

    Conclusion: To use this chip as a Prop replacement we must assume the worst case, that there will be 8 threads in use, and the fft_bench result is 9.5ms not 4.7.
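
The reasoning above can be turned into a small model and checked against the table. This is only a sketch under the post's own guess of a four-slot pipeline shared round-robin; `PIPELINE_SLOTS`, `BASE_MS` and `predicted_ms` are hypothetical names, and the measured values are the eight rows of the table read as thread counts 1 through 8.

```python
# Rough model of the measurements above; the 4-slot pipeline is the post's guess,
# not a figure from XMOS documentation.
PIPELINE_SLOTS = 4
BASE_MS = 4.7               # measured run time with 1 to 4 threads

MEASURED_MS = {1: 4.7, 2: 4.7, 3: 4.7, 4: 4.7, 5: 5.9, 6: 7.1, 7: 8.3, 8: 9.5}

def predicted_ms(threads):
    """Run time of the benchmark thread when `threads` threads share the core."""
    # Up to four threads fit in the pipeline for free; beyond that each
    # extra thread lengthens the round-robin cycle by one slot.
    return BASE_MS * max(threads, PIPELINE_SLOTS) / PIPELINE_SLOTS

if __name__ == "__main__":
    for n, measured in MEASURED_MS.items():
        print(n, measured, round(predicted_ms(n), 2))
```

The model stays within about 0.1 ms of every row, which is consistent with the round-robin explanation but does not prove it.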

    It gets worse for the XMOS:

    Suppose we have a program running four threads on an XMOS and at least one of its threads needs the maximum speed possible to meet some timing constraint, a video driver say.
    Now let's suppose we want to drop in another thread to handle a UART or whatever. BOOM our application now fails as our time critical thread is robbed of time.

    Conclusion: If you want to mix and match code on an XMOS like we are used to on the Prop ALL of that code must assume the worst case execution speed.

    The XMOS chip programmed in C is only 5 times faster than the Prop programmed in PASM for this FFT. Not many would entertain the idea of programming the XMOS in assembler but PASM is easy so this is perhaps a valid comparison.

    And worse:

    Suppose we have 5 threads running on an XMOS. What happens when one of those threads enters a wait on a timer or other input? The XMOS has mechanisms that can do similar waits to waitcnt and waitpxx on the Prop. It turns out that the remaining 4 threads get a boost in speed of 20% because they can now use the cycles that were used by the fifth thread.

    Conclusion: There is no timing determinacy in the XMOS. The starting and stopping of a thread can "modulate" the speed of all the other threads. If you want that kind of determinacy you must limit your program to only 4 threads.
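
To make the "modulation" effect concrete, here is a small sketch using the same assumed four-slot round-robin model. The function names are hypothetical. Note that the 20% figure corresponds to run time (5.9 ms dropping to 4.7 ms); expressed as instruction throughput the same change is a factor of 1.25.

```python
# Relative per-thread speed under an assumed 4-slot round-robin model.
PIPELINE_SLOTS = 4

def relative_speed(active_threads):
    """Speed of one thread relative to the 1-to-4-thread case."""
    return PIPELINE_SLOTS / max(active_threads, PIPELINE_SLOTS)

def speedup_when_one_waits(active_threads):
    """Factor by which the other threads speed up when one thread blocks."""
    return relative_speed(active_threads - 1) / relative_speed(active_threads)

if __name__ == "__main__":
    print(speedup_when_one_waits(5))   # five threads, one goes into a wait
    print(speedup_when_one_waits(4))   # four threads, one goes into a wait
```

With four or fewer threads the factor is 1.0, which matches the conclusion that staying at four threads keeps execution speed independent of what the other threads do.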

    In fairness the XMOS does have timed I/O and other features to ensure timing criteria are met by means other than execution timing.
  • RossHRossH Posts: 5,462
    edited 2011-03-07 04:30
    Heater. wrote: »
    RossH,

    You are right. I checked again. My IGEP has cost three times more than a C3.
    Given that the IGEP has a few orders of magnitude more functionality that either makes the C3 very expensive or the IGEP very cheap:)
    I think the C3 is quite expensive for what it is - but to be fair to Parallax the volumes are probably much lower.
    Heater. wrote: »

    Which makes me wonder what the C3 is for. If you want graphics, audio, mouse, keyboard then there are better ways to get them. One of the primary features of the Prop, the thirty two independent general purpose I/O pins is being frittered away on those features.
    Again, I agree. I think the C3 is intended to be a bit of a "show off" - i.e. to show how much you can do with basically just a single Prop chip - the IGEP board is much more functional, but it is also chock full of peripheral chips, whereas the C3 is quite bare.

    Ross.
  • RossHRossH Posts: 5,462
    edited 2011-03-07 15:10
    Heater. wrote: »

    Conclusion: There is no timing determinacy in the XMOS. The starting and stopping of a thread can "modulate" the speed of all the other threads. If you want that kind of determinacy you must limit your program to only 4 threads.


    Thanks for the analysis. I wondered why the timings you posted previously seemed to show the X___ benchmark results slowing down as more threads became active. For those of us used to the Propeller this seemed strange, but your explanation makes sense. Thankfully, we can now put the determinism issue to bed once and for all. There are applications where speed is critical but determinism and cost are not, and for those the X___ is a great solution. For other cases there is the Propeller. But when the Propeller II arrives the choice will not be so clearcut, since its speed boost will put it in direct competition with the X___.

    Now we can get back to more serious issues - like how to make C the language of choice for the Propeller II :smile:

    Ross.
  • Heater.Heater. Posts: 21,230
    edited 2011-03-08 00:37
    RossH,
    Thankfully, we can now put the determinism issue to bed once and for all.

    No we cannot. I have to clear up some misunderstandings I may have created about determinism and the X chip. These statements will also apply to the Prop so please read on. Besides it's me you are talking to, I can never drop the stick:)
    There are applications where speed is critical but determinism and cost are not, and for those the X___ is a great solution. For other cases there is the Propeller.

    No.

    1) The X chip is 100% deterministic down to the resolution of its 100MHz clock. This is something the Prop can only dream of.

    2) Comparing the cost of a small X chip vs a Prop there is not much in it. In fact X was cheaper last time I bought some Props.

    3) In my view, the Prop and X are both tackling the same problem space in similar ways and are directly comparable. This will be even more so with the arrival of the Prop II.

    About determinism:

    We have seen that it is not reasonable to expect execution determinism from the threads on an X chip because:

    1) Starting up new threads can slow down the execution speed of the other threads.

    2) Having threads go in and out of a waiting state can "modulate" the execution speed of other running threads.

    3) Not mentioned yet is the fact that all X instructions are the same length in clocks except divide. The implication being that if a thread hits a divide instruction it will stall other threads by a clock or so. I have yet to test this.

    4) It is NOT expected that programmers resort to writing in assembler for X and counting instructions to get the timing right as we do on the Prop. Rather one is expected to use C. Well there goes your determinism anyway. Change the optimization level and timing will change. Get a new compiler version that produces tighter code and your timing will change.

    So how does X get 100% timing determinism?

    That was a question I put to David May, founder of XMOS, on their forum a year ago. XMOS had made statements about execution determinism that basically fall down as described above. After some discussion I finally got him to state that execution was not deterministic as such BUT:

    Determinism is achieved through hardware support. I have not looked into this much yet. Firstly there are hardware timers that code can wait on. Similar to waitcnt. This can get you pretty accurate timing of your code actions. Secondly there are timers associated with the IO pins. Data can be clocked in AND out to within 10ns or so using those timers. For fast data I/O there are hardware FIFOs and so on.

    These hardware features might be a pain to use if it were not for the XC language developed by X that is basically C with some extensions to directly support timing and communications facilities in the language.

    In this way it is possible for X to support things like USB and SPDIF with ease.

    What does this mean for the Prop and C?:

    You said:
    Now we can get back to more serious issues - like how to make C the language of choice for the Propeller II

    An admirable goal and one that I fully support.

    BUT as you see, when it comes to timing determinism using C on the Prop does not stand a chance. For critical cases one will always be reliant on PASM and instruction counting. This is an issue X has tackled with hardware support and the option to use XC for time critical I/O handling code.

    P.S. For those getting hot under the collar about all the talk of X here please be aware that I am not attempting to push X onto Prop users. Far from it. As I said both chips operate in approximately the same problem space and can stand comparison. I suspect they could both learn a lot from each other.
  • RossHRossH Posts: 5,462
    edited 2011-03-08 02:01
    Heater. wrote: »
    RossH,

    No we cannot. I have to clear up some misunderstandings I may have created about determinism and the X chip. These statements will also apply to the Prop so please read on. Besides it's me you are talking to, I can never drop the stick:)
    I think this is just a conspiracy to delay the release of Catalina 3.0 - which will grind Zog into the dust once and for all! :smile:
    Heater. wrote: »

    No.

    1) The X chip is 100% deterministic down to the resolution of its 100MHz clock. This is something the Prop can only dream of.

    I have no idea what you mean here. The Prop is as deterministic running up to 8 cogs at 80MHz as the X is running up to 4 threads at 100MHz (provided you limit yourself to only running 4 threads, and don't use interrupts or divide instructions on the X). Am I missing something? Are we perhaps confusing speed with determinism here?
    Heater. wrote: »
    2) Comparing the cost of a small X chip vs a Prop there is not much in it. In fact X was cheaper last time I bought some Props.

    3) In my view, the Prop and X are both tackling the same problem space in similar ways and are directly comparable. This will be even more so with the arrival of the Prop II.
    Ok - just checked Digikey. For a single core X there's not much in it cost-wise, but you're right - the X is a few cents cheaper. Functionality-wise I don't know enough about the X to really say - but I don't think you could implement 8 independent TV outputs on an X and retain any pretence to deterministic timing - which you can do on the Prop! (I'm not saying this is a useful thing to do - but it is possible on a Prop. Is it possible on a single core X?)
    Heater. wrote: »
    About determinism:

    We have seen that it is not reasonable to expect execution determinism from the threads on an X chip because:

    1) Starting up new threads can slow down the execution speed of the other threads.

    2) Having threads go in and out of a waiting state can "modulate" the execution speed of other running threads.

    3) Not mentioned yet is the fact that all X instructions are the same length in clocks except divide. The implication being that if a thread hits a divide instruction it will stall other threads by a clock or so. I have yet to test this.
    Why is this not reasonable? All these are true on the Prop!
    Heater. wrote: »
    4) It is NOT expected that programmers resort to writing in assembler for X and counting instructions to get the timing right as we do on the Prop. Rather one is expected to use C. Well there goes your determinism anyway.
    Howso? C is as deterministic on the Prop as PASM. Again, I think there may be some confusion between speed and determinism going on here. I can write a C program on the Prop which will toggle a pin with perfectly predictable accuracy (to the level of accuracy of the clock) no matter what is going on in any other cog/thread. The same can't be said of X.
    Heater. wrote: »
    Change the optimization level and timing will change. Get a new compiler version that produces tighter code and your timing will change.
    Of course - this is true of any processor, any compiler and any language - unless you make use of extra timing mechanisms (either external, or provided by the hardware). So what?
    Heater. wrote: »
    So how does X get 100% timing determinism?

    That was a question I put to David May, founder of XMOS, on their forum a year ago. XMOS had made statements about execution determinism that basically fall down as described above. After some discussion I finally got him to state that execution was not deterministic as such BUT:

    Determinism is achieved through hardware support. I have not looked into this much yet. Firstly there are hardware timers that code can wait on. Similar to waitcnt. This can get you pretty accurate timing of your code actions. Secondly there are timers associated with the IO pins. Data can be clocked in AND out to within 10ns or so using those timers. For fast data I/O there are hardware FIFOs and so on.

    These hardware features might be a pain to use if it were not for the XC language developed by X that is basically C with some extensions to directly support timing and communications facilities in the language.
    Yes, I said much the same thing on one of the very first threads on this issue - i.e. there are two ways of achieving determinism. You can have it as an inherent attribute of the processor (as the Prop does) or you can "add on" extra timing mechanisms (like the X apparently does) and rely on programmers using them effectively (e.g. on the X you not only have to use XC so that you don't have to mess with all the complications directly, you also have to avoid some XC facilities - e.g. you have to write your program to use events and avoid interrupts).
    Heater. wrote: »
    In this way it is possible for X to support things like USB and SPDIF with ease.
    Again, I think supporting USB etc has more to do with speed than determinism. And there's no question that a single thread on the X is faster than a single cog on the Prop (one clock cycle per instruction on the X vs four on the Prop I, from memory? - but this will change with the Prop II).
    Heater. wrote: »

    What does this mean for the Prop and C?:

    ... when it comes to timing determinism using C on the Prop does not stand a chance. For critical cases one will always be reliant on PASM and instruction counting. This is an issue X has tackled with hardware support and the option to use XC for time critical I/O handling code.
    Yet again, I think this is more about speed than determinism. It is no different to any other computer system or any other language - it doesn't matter how fast your language is, time critical code sections are generally better written in assembler. And I'd rather do that on a Prop than on an X (or an ARM for that matter)!
    Heater. wrote: »

    P.S. For those getting hot under the collar about all the talk of X here please be aware that I am not attempting to push X onto Prop users. Far from it. As I said both chips operate in approximately the same problem space and can stand comparison. I suspect they could both learn a lot from each other.

    There's a world of difference between discussing the technical merits of various architectural features for various applications, and just blindly spruiking a competitor's products. Although given that this is a Parallax forum I have an inherent advantage - since it is of course perfectly ok for me to blindly spruik Parallax products! :lol:.
  • BeanBean Posts: 8,129
    edited 2011-03-08 05:41
    I don't want to hijack this thread, but here is a simple bubble sort in PropBasic.

    This takes 1152mSecs to bubble sort 1024 LONGs that are in reverse order.

    I think this would be a lot easier to translate into other languages.

    Bean
  • RossHRossH Posts: 5,462
    edited 2011-03-08 13:25
    Bean wrote: »
    I don't want to hijack this thread, but ...

    Hijack it! Please! Before Leon comes along! :smile:

    I must say PropBasic looks like a very neat alternative to programming in the usual combination of SPIN/PASM.

    However, as a benchmark bubble sort will suffer the same problem as fibo - i.e. it only exercises one particular aspect of the language. The fibo program only exercises procedure calls, and bubble sort only exercises copying longs to and from hub RAM. While it could be included as part of a larger "suite" of tests designed to exercise particular aspects of the various languages, it is not a suitable benchmark on its own.

    Anyway, Catalina is still sore at losing to Ariba - it's sulking in its hangar and currently refusing to fly! :frown:

    Ross.
  • jazzedjazzed Posts: 11,803
    edited 2011-03-08 15:52
    I didn't know you store "C planes" in a hangar. Either way take off and landing can be pretty rough, but those pontoons can yield lots of extra miles.
  • Dr_AculaDr_Acula Posts: 5,484
    edited 2011-03-08 16:58
    In response to Bean's bubble sort, I coded this in BCX Basic. 17 seconds, so about 16x slower.
    #include <stdio.h>
    #include <catalina_hmi.h>
    #include <ctype.h>
    
    $COMMAND
    catalina -lc -lm -D DEMO -D HIRES_VGA
    ' library c, library maths, demo board, hires vga, mouse, keyboard
    $COMMAND
    
    ' ************ main **********
    Dim Values[1024] As Short
    Dim temp As Integer
    Dim value1 As Short
    Dim value2 As Short
    Dim strtCnt As Long
    Dim exchange As Char
    
      t_string(1,"Bubble sort test for the Propeller using BCX Basic and Catalina")
      t_char(1,13)					' carriage return
      t_char(1,10)					' line feed
    
    ' initialize array upside down
    For temp = 0 To 1023
      value1 = 1023 - temp
      Values[temp] = value1
    Next temp
    
    ' start timing from here
    strtCnt = _cnt()
    'Print strtCnt
    t_string(1, "Start Bubble")
    
    ' Bubble sort
    Do
      exchange = 0
      For temp = 0 To 1022
        value1 = Values[temp]
        value2 = Values[temp+1]
        If value1 > value2 Then
          Values[temp] = value2
          Values[temp+1] = value1
          exchange = 1
        End If
      Next temp
    Loop Until exchange = 0
    
    ' stop timing here
    strtCnt = _cnt() - strtCnt
    
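    ' Convert elapsed clock ticks to milliseconds (80000 ticks per ms, i.e. an 80 MHz clock)
    ' Note: the elapsed time is computed but not printed in this listing
    strtCnt = strtCnt / 80000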
    t_string(1,"Done")
    
    Do
    Loop Until 0=1                          ' infinite loop
    ' *********** end main *******
    

    Of course, in the tough harsh highly competitive world of benchmarks, one always includes a benchmark that shows off one's program in the best light! For BCX Basic on the dracblade, that would be a program that wrote to the serial port, displayed on a TV/VGA in color, sent text to a 20x4 LCD display, had mouse and keyboard input, read and wrote 100k files to and from the SD card and was 200k long. PropBasic can't do all that. But for pure speed, PropBasic wins hands down <applause>.
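
To get a feel for how much work the timed loop in the BCX Basic listing above actually does, here is a Python rendering of the same loop structure (full passes over the array until a pass makes no exchange), instrumented to count passes, comparisons and swaps. The function name `bubble_sort_counts` is hypothetical and the sketch says nothing about how PropBasic or Catalina compile the loop.

```python
# Python rendering of the Do ... Loop Until structure, instrumented to count work.
def bubble_sort_counts(values):
    """Sort in place with full passes until no exchange.

    Returns a tuple (passes, compares, swaps).
    """
    passes = compares = swaps = 0
    while True:
        exchange = False
        passes += 1
        for i in range(len(values) - 1):
            compares += 1
            if values[i] > values[i + 1]:
                values[i], values[i + 1] = values[i + 1], values[i]
                swaps += 1
                exchange = True
        if not exchange:
            return passes, compares, swaps

if __name__ == "__main__":
    data = list(range(1023, -1, -1))    # 1024 values in reverse order
    print(bubble_sort_counts(data))
```

For 1024 reversed values this loop makes 1024 passes and a little over a million comparisons. If the 17-second figure covers only this loop, that works out to roughly 16 microseconds per inner iteration.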
  • RossHRossH Posts: 5,462
    edited 2011-03-08 17:38
    jazzed wrote: »
    I didn't know you store "C planes" in a hangar. Either way take off and landing can be pretty rough, but those pontoons can yield lots of extra miles.
    Of course "C planes" have hangars!

    This is what Catalina will get if it lifts its game.

    This is what it will get if it doesn't.

    Ross.

  • BeanBean Posts: 8,129
    edited 2011-03-08 17:59
    I "tried" to convert the fft_bench program to PropBasic, but it doesn't work right (shows a bunch of data).
    I can't figure out where I went wrong in the conversion. I have attached it if anyone wants to see if they can fix it.

    Bean
  • RossHRossH Posts: 5,462
    edited 2011-03-08 18:17
    Bean wrote: »
    I "tried" to convert the fft_bench program to PropBasic, but it doesn't work right (shows a bunch of data).
    I can't figure out where I went wrong in the conversion. I have attached it if anyone wants to see if they can fix it.

    Bean

    Perhaps if Ariba could post his version? It would probably be easier to convert that.

    Ross.
  • AribaAriba Posts: 2,690
    edited 2011-03-09 00:16
    Attached is my Basic source code.
    But PropBasic is very different. From a first look I have found these bugs in the PropBasic source:
    In the butterflies routine:
    k1 = k1 >> 12 is not the same as k = k ~> 12, so I have changed it to:
    \ sar k1,#12
    and the same for k2 and k3, but it still gives wrong results.
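
The difference between the two shifts only shows up for negative values, which the FFT intermediate results often are. Here is a minimal Python sketch that emulates 32-bit registers; the helper names `shr`, `sar` and `to_signed32` are made up for illustration, with `sar` standing in for Spin's `~>` and `shr` for a logical shift that fills with zeros.

```python
# 32-bit logical vs arithmetic right shift, emulated with Python integers.
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def shr(x, n):
    """Logical shift right: zeros enter from the top."""
    return to_signed32((x & MASK32) >> n)

def sar(x, n):
    """Arithmetic shift right: the sign bit is replicated."""
    return to_signed32(to_signed32(x) >> n)

if __name__ == "__main__":
    print(sar(-4096, 12), shr(-4096, 12))   # differ for a negative value
    print(sar(8192, 12), shr(8192, 12))     # agree for a positive value
```

A logical shift turns a small negative product into a large positive number, which would be enough on its own to corrupt the butterfly results.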

    In the printSpectrum routine, the SQRTI is not called for the magnitude. Are you sure that the sqr algorithm works correctly?

    The bit-reverse is very complicated, you can use the assembly REV instruction.
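
For reference, this is what the bit-reverse step has to compute, written as a plain loop in Python. It is a sketch of the operation only, not of the REV instruction's exact operand encoding; the function name `bit_reverse` and the 10-bit width (which assumes a 1024-point transform) are assumptions.

```python
def bit_reverse(index, bits):
    """Return index with its lowest `bits` bits in reversed order."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)   # move the lowest bit of index to result
        index >>= 1
    return result

if __name__ == "__main__":
    # Assumed 1024-point transform, so indices are 10 bits wide.
    print([bit_reverse(i, 10) for i in range(4)])
```

A hardware bit-reverse instruction does the same job without the loop, which is why it is the simpler choice on the Prop.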

    I have also attached my modified PBAS, but it does not work yet.

    Andy
  • Heater.Heater. Posts: 21,230
    edited 2011-03-31 03:33
    RossH,

    Some belated answers to some queries a few posts back:
    I have no idea what you mean here. The Prop is as deterministic...the X is...Am I missing something? Are we perhaps confusing speed with determinism here?

    No we are not confusing determinism with speed.

    As we have seen the X can be a bit wonky in its execution determinism, like firing up another thread can slow all the others or perhaps divides in one thread put a spanner in the works of another.

    BUT the X has another approach to determinism.

    For example when an input changes it can be timestamped to 100MHz resolution in hardware independently of your running code. When an output is required to be changed at a given time it can be clocked out by a hardware timer, again to a 100MHz resolution. In this way it does not matter if your code is a bit jittery in its execution, the actual timing of I/O is nailed down by hardware on the IO ports. Provided you have enough time for the code to run, worst case, then your IO is totally deterministic.

    Point is that as a "black box" it is not necessary to know the timing of what goes on inside as long as it all looks good from the outside.

    Basically for tip-top determinism one is expected to make use of the hardware features available rather than count instruction cycles in code.
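
The idea of letting a timer, and not the instruction count, fix the moment of output can be shown with a small simulation. This is a sketch only: a plain integer stands in for the hardware clock, no real XMOS or Propeller API is used, and the name `run_timed_outputs` is made up. The pattern is the same one used with waitcnt on the Prop: wait for a deadline, then advance the deadline by a fixed period.

```python
# Simulation only: an integer counter stands in for the hardware clock.
def run_timed_outputs(work_ticks, period):
    """Return the tick at which each output occurs when gated by a timer."""
    now = 0
    target = period
    output_times = []
    for work in work_ticks:
        now += work                 # code runs for a variable number of ticks
        now = max(now, target)      # wait on the timer (no wait if already late)
        output_times.append(now)    # the output changes at this tick
        target += period            # next deadline follows the last deadline, not 'now'
    return output_times

if __name__ == "__main__":
    print(run_timed_outputs([37, 82, 55, 99], period=100))   # jittery but within budget
    print(run_timed_outputs([37, 120, 55], period=100))      # one iteration overruns
```

As long as every iteration finishes inside the period, the outputs land on exact multiples of the period regardless of jitter in the code. An overrun shows up as a late output, which is the "worst case" that timing analysis is meant to rule out.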

    As for that "worst case" scenario the X development tools provide timing analysis to check that your code does not blow its time budget.
    ...but I don't think you could implement 8 independent TV outputs on an X and retain any pretence to deterministic timing

    For sure the X can do video output, provided it is done as described above there is no reason not to run two or more until you run out of RAM.
    Howso? C is as deterministic on the Prop as PASM. Again, I think there may be some confusion between speed and determinism going on here. I can write a C program on the Prop which will toggle a pin with perfectly predictable accuracy (to the level of accuracy of the clock) no matter what is going on in any other cog/thread. The same can't be said of X.

    Yes, C is as deterministic as PASM: after you have written the code it always runs the same. The point of PASM determinism though is that it is relatively easy to write carefully timed loops and other sequences and know as you write them that they will work. That's because all the instructions you are likely to use for that take the same number of clocks. You write the sequence to get the timing you want. You are not going to be doing that in C.

    On X you are not expected to write in assembler and count instructions to get the timing you want. Just write it in C (or XC) and use the IO hardware timers (or just normal timers) to get the timing fixed. Use the timing analysis tools to check it will fly.

    No, there is no confusion between speed and determinism.
    ...it doesn't matter how fast your language is, time critical code sections are generally better written in assembler. And I'd rather do that on a Prop than on an X

    Exactly, for timing critical stuff C and other high level languages are out. And yes I'd rather write PASM than any other assembly language I have come across.

    The X is bravely trying to change that. Hence the XC variant of C they have developed that incorporates time and parallelism into its syntax and semantics. They are wanting to attract applications that would otherwise have been done in VHDL or Verilog on an FPGA. They want to offer the simplicity and familiarity of a C like language. They know that saying "do it in ASM" will not wash.

    We have yet to see how that plan works out...
  • davidsaundersdavidsaunders Posts: 1,559
    edited 2011-03-31 17:15
    While the // comments are not C89, they are C99, and most C compilers since about 1986 have supported them.
  • Heater.Heater. Posts: 21,230
    edited 2011-04-08 01:54
    From time to time people on this forum have compared the Prop to low end ARM processors. So here goes:

    The C version of fft_bench on an ATMEL AT91RM9200 at 180MHz runs in 7ms.

    Make of it what you will.
  • RossHRossH Posts: 5,462
    edited 2011-04-08 02:45
    Heater. wrote: »
    From time to time people on this forum have compared the Prop to low end ARM processors. So here goes:

    The C version of fft_bench on an ATMEL AT91RM9200 at 180MHz runs in 7ms.

    Make of it what you will.

    So a microprocessor with more than double the clock speed, and which costs between four and five times the price (according to Digikey) outperforms the Propeller?

    How surprisement! :lol:

    Ross.
  • Heater.Heater. Posts: 21,230
    edited 2011-04-08 03:44
    RossH,

    Not much surprise I must admit, I just happen to have a box running one of them plopped on to my desk so, well, you know, I just had to do it.
    It's just another data point.

    However, let's look at that clock frequency comparison and the performance of C.

    Let's be generous to the Prop and divide the ATMEL's clock by 3 down to 60MHz, comparable to the Prop wouldn't you say. So we would then have an fft_bench execution time on the ATMEL three times bigger at 21ms.

    Compare to the fft_bench in Catalina C on the Prop at approx 400ms.

    As we see C on the Prop sucks by a factor of about 20 !!!
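
The back-of-envelope arithmetic above can be written out as follows. It is a sketch that assumes run time scales inversely with clock frequency, which ignores memory speed and everything else; the function name `scale_time_ms` is hypothetical and the 400 ms Catalina figure is the approximate one quoted above.

```python
def scale_time_ms(time_ms, from_mhz, to_mhz):
    """Scale a run time, assuming it is inversely proportional to clock frequency."""
    return time_ms * from_mhz / to_mhz

if __name__ == "__main__":
    arm_at_60 = scale_time_ms(7, 180, 60)    # 7 ms at 180 MHz, rescaled to 60 MHz
    catalina_ms = 400                        # approximate Catalina figure quoted above
    print(arm_at_60, round(catalina_ms / arm_at_60, 1))
```

The ratio comes out at about 19, which is where the "factor of about 20" comes from.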

    Ah you say "Zog and C sucks by an even bigger factor", perhaps true but at this glacial pace no one notices:)

    It's this kind of quick calculation that causes MCU users to skip over the Prop page in the catalog. Support for C on the Prop is poor.