The hanging issue is due to a bug in the way the propeller gcc+library handles sin and cos builtin functions -- it tries to optimize a call to sin() and cos() with the same argument into a call to cexp(), but cexp() is implemented internally with sin() and cos() and has the same optimization, so it gets stuck in an infinite recursion. The solution for now is to build the program with the -fno-builtin flag. I'm working on a better fix.
Heater: the atan problem is fixed in the current mercurial repository.
Does the -fno-builtin flag go into the compiler options?
I've checked in a fix for the sin()/cos() optimization problem, which also has the benefit that calculations of sin() and cos() will actually be faster (there's a combined __sincos() function now for calculating both at once). The fix is in the library source code; no compiler changes were necessary, so if you know your way around the command line you can get the current library source code from Mercurial and rebuild it. The next official release will have this change as well, of course.
OK. Newbie perspective again... Jazzed, were you able to do this on the Prop inside SimpleIDE or through the command line? What parameters are you using? I get a little further each time as I see how people are resolving issues, but putting it all together is my challenge. So close... Thanks
Used SimpleIDE.
Changed double u = atan(0.99664719 * tan(lat_rad)); to double u = atanf(0.99664719 * tan(lat_rad));
Don't forget to check Math Lib in the library tab. Rest looks like this.
That's interesting. We are both using the -fno-builtin flag. You are using the Windows version on a C3F and have the compiler set to C. I am using the PropBOE with a card (BOE-SDXMMC) on Linux and need to use C++ to avoid compile errors (see attached errors.doc). I also have to have the #include <spa.c> in the spa_test program to avoid errors (see attached errors2.doc). I also noticed you added the #define dprintf and #include <unistd.h>, which I have added. With my configuration it compiles (see attached compiler output) and starts the program, but still locks up. Could there be a difference between the SimpleIDE or board versions that accounts for the difference in results?
I'm using an updated SimpleIDE, but the problems you have are very fundamental issues.
I don't see any -lm in your first error report. I get exactly the same thing if I leave it out.
Why do you have -L .... in your error output? You shouldn't need that at all.
As for running, I suggest putting some kind of a startup message at the beginning with a 1 second delay before it.
Make sure your SD card is freshly formatted too. Corrupted SD cards cause infinite grief. There are some SD cards that refuse to work. Reformat or try another one once these build errors are resolved.
I'm not sure how the -L gets in there. (from the PBOE selection?). Oddly enough when I add the printf it only prints the first 8 characters. I will try the reformat on the card. It seems a little flaky requiring a restart of the prop occasionally to work.
I just tried this with my PROPBOE-SDXMMC board type. It's extraordinarily slow because of the way SD cards work, but it did finish. SPI Flash would be much faster.
I've asked one of our hardware specialist friends to come up with a micro-SD SPI-Flash plug-in card - don't have an ETA on that yet. If he doesn't do it I will start it ... in August.
Thanks!!! On my PBOE-XMMC it compiles and prints "Starting SPA" then hangs. I'll pick up a different brand of card and compile it on a Windows computer to see if that changes it. The -L compiler flag disappeared though.
It "hangs" for me too for several minutes then finishes.
I swapped around cards and it started running. It takes about 4 minutes before the final result comes out but it works. That's amazing that a $7 microcontroller could calculate the sun's location within 0.0003 degrees. At 80 MHz it must be doing a lot of integer churning to get the 64-bit math done. Is there a chance the Propeller's multiple cogs could be tapped in the future to speed up the math? To run in real time I would like to get the run time down to a second or so.
That's amazing that a $7 microcontroller could calculate the sun's location within 0.0003 degrees.
Indeed it is.
But some might say it's amazing that it takes a $7 MCU so long to do it.
Just for fun I ran this on the amazingly cheap and small ARM processor of the Raspberry PI board.
time ./spa_tester
Starting SPA.
Julian Day: 2452930.312847
L: 2.401826e+01 degrees
B: -1.011219e-04 degrees
R: 0.996542 AU
H: 11.105902 degrees
Delta Psi: -3.998404e-03 degrees
Delta Epsilon: 1.666568e-03 degrees
Epsilon: 23.440465 degrees
Zenith: 50.111622 degrees
Azimuth: 194.340241 degrees
Incidence: 25.187000 degrees
Sunrise: 06:12:43 Local Time
Sunset: 17:20:19 Local Time
real 0m0.055s
user 0m0.020s
sys 0m0.020s
It does the job in about 50 ms, or shall we say nearly 5000 times faster!!
Not knocking the Prop at all, it's just not designed for this sort of thing. I cannot see that 4 mins run time on the Prop coming down to a second any time soon.
Thanks for the perspective, Heater. 55 ms execution time is right on the mark for timing assuming I could squeeze in the mechanical control concurrently. Any idea why it would run faster on an ARM processor? Does the architecture include a floating point co-processor? I wonder if part of the time lag has to do with the underlying implementation of the trigonometry functions/libraries, which is accurate but not optimised for larger analysis or for the 64-bit math. It may be that the next generation of updated libraries will have more efficient algorithms and/or use more cogs to speed it up. This is only the first generation of GCC, which will work for most applications but is being stressed by this one. I guess I'm a little more optimistic that architecture-specific speed will come along.
It was in XMMC on a PropBOE-XMMCSD with an SD Card compiled with the same options as yours.
Ok, sorry I forgot and have been out of the loop a while.
A DIP SPI Flash on the bread-board will cut the time to a minute and cost $0.63 each and 4 propeller pins.
Buy 2 of those and cut the time to 30 seconds using 10 propeller pins.
RPi doing this at 55ms is very compelling of course.
Yeah. 55 ms would be nice. Tracking hardware and alignment aside, the formula should be accurate enough to resolve the azimuth/altitude of the sun over time to within 72 ms for real-time tracking, such as for a solar concentrator. The 30-second time period would allow an iterative 1/8 degree increment of angle, which works for standard solar panels and may still work for periodic solar snapshots if the location is predicted in advance and the camera waits for the precise moment and direction when the sun passes through.
I'm confused. Doesn't the program resolve to a very specific time for any time of day whether you run the calculation at 6AM or any other time?
I mean you could do several calculations per hour, and each result would give you precise azimuth for, etc.... a given time of day. right?
I think the point is to match the resolution of the algorithm against the speed that you can run it at. Given 64-bit doubles you can get down to a certain fraction of a degree of resolution in the result. But for the sun to move that far takes a certain amount of time. There would be no point in running the algorithm at a faster repetition rate than that time, as you would always get the same result.
Moving to 32-bit floats yields a much worse angular resolution, and hence a longer time before the result changes, so you might as well have a lower repetition rate.
Bit like not bothering to read an ADC a million times per second when you have a one Hz input signal and only an 8 bit ADC. Most of the time you would be reading the same result.
You're right that it does resolve for a very specific time and location. Although we can generally think of the formula as a continuous function, in practice when you account for accuracy it is more of a discrete function with steps.
The problem is that the formula only resolves to 0.0003 degrees of accuracy which is the distance that the sun moves across the sky in about 72 ms. Hence, we can only predict its location +/- 72 ms or 0.0003 degrees which are interchangeable. The sun moves at 360deg/24hr/60min/60sec = 0.0042 degrees per second. So, if you were exactly pointing towards the sun at time zero, you could at best predict the next point which is 0.0003 degrees away and occurs 72 ms later.
So, if you want to point at the sun for a solar observation and stay on target, you would want to use a motor that moves continuously at 0.0042 degrees per second, which works out to about 14 steps of 0.0003 degrees every second, each about 72 ms apart. Hence, one calculation per 55 ms is great.
If you just wanted to take a single accurate snapshot (+/- 0.0003 degrees ) then you could predict that time and direction anywhere along the line then just point and wait for the appointed time which would only last for a matter of milliseconds.
For a regular solar panel though, it works well if you can aim it within a degree or so, since the collected power falls off only as the cosine of the pointing error and you don't lose much power for small errors. The sun moves about 1 degree in 4 minutes: 360deg/24hr/60min = 0.25 degrees per minute. Hence, one calculation per 4 minutes is fine.
P.S. The rate of motion of the motors (or slew rate) on a 2-axis system changes over the day like a sine and cosine wave when the altitude and azimuth are translated from polar coordinates to x,y. That is where a micro-controller can come in handy, because it can calculate the optimum speed and update frequency for both axes to save power. It also allows more complex designs.
A carefully aligned 1-axis is less complex and only needs a single motor running at a fixed speed in sync with the sun rotation speed with only occasional adjustments of the axis for seasonal variation. (But where's the fun in that?).
55 ms execution time is right on the mark for timing assuming I could squeeze in the mechanical control concurrently.
You have pretty stringent requirements there.
An ARM as on the Raspberry Pi looks like it could take care of the heavy floating point number crunching in the time available.
Then you have the problems of motor position control and monitoring, which my gut tells me could be doable on the raspi but would be much easier on the Propeller.
I would look at using both. ARM for number crunching, Prop for motor/servo control and other hardware handling.
Connect the two via serial link.
As a bonus the raspi does not drive VGA, you need a TV or HDMI monitor, but the Prop can drive VGA as a text terminal or with limited graphics.
Best of both worlds "Propeller Pi".
Any idea why it would run faster on an ARM processor?
Oh yeah.
Firstly the Pi processor is running at 700MHz and pretty much executes one instruction per clock (float ops may be slower), meanwhile the Prop has only an 80MHz clock and executes a PASM instruction every 4 clocks, 20MHz. There is a factor of 35 in speed already and the Prop execution time could be expected to be 35 * 0.055 = 1.9 seconds.
Now, any large piece of code, like your algorithm, will not fit into the Prop's COG space where it can be run at full speed but must be run by fetching its instructions from the larger HUB RAM into COG and then executing them. This must be done with some simple software loop, the so-called Large Memory Model (LMM). That fetch-execute loop effectively slows execution of your actual code by a factor of 4 or so. So now we have a speed reduction factor of 35 * 4 = 140.
(One could also write larger programs in Spin or other language that compiles to byte codes but then your byte code execution is a factor of 50 to 100 slower.)
If that were all there were to it, your algorithm that takes 55 ms on the raspi would take about 8 seconds on the Prop.
Finally, the Prop has no floating point hardware support; all floating point ops must be done in software using the available integer ops. No idea how many instructions it takes to do a double precision floating point addition on the Prop, could be 50 to 100, say. And I have no idea how many clocks the ARM takes to do a float op. I suspect more than one. So here I cannot compare very well any more.
I would guess a slow down of, say, 100, that brings the Prop version up to 800 seconds or 13 minutes!!!
Hmm..so looking at it that way I'm surprised the Prop is doing as well as it does. Especially as we haven't even started to think about the speed of fetching code from external RAM which seems to be required in this case.
How am I going wrong in my comparison here? Is caching code in COG helping things a lot here?
I could not help but play with this a bit. As we want to know how fast you could crank out spa calculations, it makes sense to exercise the actual spa code by itself and not include all the overheads of the printf output. To that end I wrapped the call to spa_calculate in a loop that calls it 10000 times. Then only the result of that last call is printed, just to check for sanity.
Running this on my 2GHz PC takes 0.77 seconds, or about 0.077 ms per iteration.
On the Raspberry Pi that's 14.2 seconds or 1.4ms per iteration.
Seems that my previous raspi time of 50ms is very misleading as it includes all the printf output.
Attached is what I am running here. How does it go on a Prop? You might want to reduce the iteration count if it is taking forever.
Comments
Has that been fixed? I have an old compiler here at the moment.
OK. I changed that atan to atanf, switched to using gcc instead of c++, and also added -Wall.
BINGO it compiles without error or warning!
Does it run? No idea, no props here at the moment.
Heater: the atan problem is fixed in the current mercurial repository.
That works. It takes a while though.
With 64bit doubles .... same result Heater had.
With 32bit doubles ....
Does the -fno-builtin flag go into the compiler options?
Yes.
Eric
I have another PropBOE XMMC specific idea too
Thanks for all the help!!!!
So, break up the math into parts that can be done independently.
Four minutes? What mode?
I clocked 1 minute on a C3 and 30 seconds on other modules today using XMMC.