Why can't we work out stack space at compile time?
PointDog
Posts: 22
Hi all, I hope I'm not starting a new thread for nothing, I've searched the old posts and can't turn anything up.
I am wondering why the stack space required for an object or method called by the cognew command can't be calculated at compile time.
Most compiled languages use a simple call-tree generator to map out the possible call paths; the required stack space is then the cost of the largest branch.
This is mildly inefficient, since some calls that are mutually exclusive in the external hardware logic are not mutually exclusive in the code. That, however, is outweighed by the amount of trouble you get when you are searching for the cognew(@needle_in_a_haystack, @stack_too_small) call whose procedure is overwriting another procedure's stack, or worse still, a procedure on another cog.
It seems very backward to compile an application onto the microcontroller, guess the stack size, and then use another object to check that your guess was correct. Worse still, if you make a mistake and don't exercise the procedure to its call limits during testing (which can be very hard to keep track of on paper), you may get an incorrect result.
I've written code that compiles to over 30k on a PIC, using about 1.5k of RAM, and it looks as though calculating stack sizes by hand if I port the code would take weeks.
Perhaps someone knows the reason why it has to be this way?
I am considering writing a program to calculate the stack size from the Spin source, but because no one has attempted it, I fear that I have missed something. So if anyone knows why the stack size can't be calculated at compile time, let me know.
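The call-tree approach described above can be sketched as a depth-first walk: given a static call graph and a per-method frame size, the worst-case stack need is the heaviest path from the entry method to any leaf. This is only an illustration in Python; the method names and frame sizes are made up, and it assumes no recursion.

```python
def worst_case_stack(call_graph, frame_size, entry):
    """Return the deepest-path stack cost starting at `entry`.

    Assumes the call graph is acyclic (no recursion), which Spin
    permits but a real tool would need to detect and reject.
    """
    callees = call_graph.get(entry, [])
    deepest = max(
        (worst_case_stack(call_graph, frame_size, c) for c in callees),
        default=0,
    )
    return frame_size[entry] + deepest

# Hypothetical call graph and per-method frame sizes (in longs):
call_graph = {
    "start": ["read_adc", "log_sample"],
    "log_sample": ["write_sector"],
    "read_adc": [],
    "write_sector": [],
}
frame_size = {"start": 8, "read_adc": 4, "log_sample": 6, "write_sector": 12}

# start -> log_sample -> write_sector is the heaviest path: 8 + 6 + 12
print(worst_case_stack(call_graph, frame_size, "start"))  # 26
```

The mutual-exclusion caveat above shows up here directly: if `read_adc` and `log_sample` can never actually both be deep on the stack, the analysis still charges for the heavier of the two paths, which is safe but pessimistic.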
Comments
Edit: you beat me to it Mike!
-Phil
Mike
Sorry if I seem a bit thick on the subject, I'm trying to migrate from the PIC because my applications are really pushing the chip to its limits, so some of the concepts of the propeller are new to me.
You're quite right about the amount of stack required for a call being the # temps at the point of call + return + # parameters + maximum stack used by the procedure being called. The problem is that you have to look for the worst case call in the procedure being analyzed rather than just using a static "who called whom" tree which the compiler already needs to determine which procedures to leave out because they're not ever used. This is not an insurmountable problem, but might require a rewrite of large parts of the compiler depending on how it's organized.
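The per-call-site accounting Mike describes can be sketched like this. All numbers are hypothetical, the fixed overhead of two longs (return address plus result slot) is an assumption for illustration, and the recursion again presumes an acyclic call graph:

```python
OVERHEAD = 2  # assumed: return address + result slot per call

def method_stack(method, methods):
    """Worst-case stack for `method`: its own locals plus the most
    expensive of its call sites, where each call site costs the
    temporaries live at that point, the call overhead, the parameters,
    and the callee's own worst-case need."""
    locals_, call_sites = methods[method]
    worst_call = max(
        (temps + OVERHEAD + params + method_stack(callee, methods)
         for temps, params, callee in call_sites),
        default=0,
    )
    return locals_ + worst_call

# methods[name] = (local longs, [(live temps, #params, callee), ...])
methods = {
    "main": (2, [(1, 2, "fmt"), (3, 1, "log")]),  # two call sites
    "log":  (4, [(0, 3, "fmt")]),
    "fmt":  (1, []),
}

# main's second call site dominates: 2 + (3 + 2 + 1 + (4 + (0+2+3+1)))
print(method_stack("main", methods))  # 18
```

This is exactly why a static "who called whom" tree isn't enough: the two calls to `fmt` from `main` and `log` cost different amounts, because the temporaries live at each call site differ.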
Mike
The Propeller is a 32 bit computer, but can handle byte and 16-bit word data just as easily in SPIN (the actual COG instruction set provides 32 bit arithmetic only). This all impacts how you do the conversion.
SPIN and C are not very different. They have similar operators and function at a similar level (conceptually). It's a no-brainer to go through a C program and convert it line by line into SPIN. The real time-consuming part is the I/O functionality. Given the way the Propeller works, maybe you could do things much more efficiently by making use of the Propeller's power. On the other hand, if the PIC you're using has a fancy I/O function block (like a USB interface, for example), you might spend an inordinate amount of time reimplementing what MicroChip has already done using a bare Propeller COG. Timers, frequency synthesizers, video output generators, 1-wire control, and UARTs are straightforward given that someone else has already done the work for the Propeller. You get the idea?
About the stack size: I wouldn't worry about it too much. You should have a rough idea of the depth of your static call tree just based on how you've written your C program. Figure out the average number of parameters on a call, add 3 to account for the return address, result variable (whether used or not), and the back chain, and you have a starting value for the stack size. Double or triple it for testing purposes. You can always put in a routine that checks free stack space, call it at the "leaves" of the call tree, and save the minimum free space for analysis later.
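One common way to implement that runtime check is a watermark: fill the stack area with a sentinel value before launching the cog, then scan back from the top for untouched sentinels to find the high-water mark. The following is a Python simulation of the idea (not the actual Spin object); the sentinel value and sizes are illustrative.

```python
SENTINEL = 0xDEADBEEF  # assumed fill value; any unlikely pattern works

def prepare(stack_longs):
    """Fill a fresh stack area with the sentinel before cog launch."""
    return [SENTINEL] * stack_longs

def high_water_mark(stack):
    """Longs actually used = total minus trailing untouched sentinels.

    If a real value happens to equal the sentinel, the mark reads low,
    which is why an 'unlikely' fill pattern matters.
    """
    untouched = 0
    for value in reversed(stack):
        if value != SENTINEL:
            break
        untouched += 1
    return len(stack) - untouched

stack = prepare(32)
for i in range(11):        # pretend the launched cog wrote 11 longs
    stack[i] = i
print(high_water_mark(stack))  # 11
```

Run the cog through its worst-case paths, read the mark, add a safety margin, and you have an empirical stack size to put in the cognew call.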
If you'd be willing to share some more detailed information about your project, I'd be happy to offer a more focused opinion. Feel free to do it via Private Message here.
Assembler on a single cog in the Propeller will probably not outperform your PIC, unless you're doing a lot of multibyte arithmetic. The Propeller requires four clocks for most instructions; IIRC, PICs run most instructions in a single cycle. So you'd be moving from 40 MIPS to 20 MIPS.
Of course, you've got eight cores.
You may be disappointed. SPIN is an interpreted language, and is hundreds of times slower than native-compiled C code.
No C compiler currently targets the Propeller. (In fact, no compilers target the Propeller's native instruction set at all, with the exception of some research projects I haven't released yet.)
That said, converting C to SPIN should be straightforward if that's what you want to do. SPIN is spookily similar to BCPL, which the original C dialect was translated to before compilation. You could even compile it directly to SPIN bytecodes, though the only reference materials are mine and they're not officially supported.
Parallax has been quite friendly to open-source and third-party tools targeting the Propeller; I wouldn't expect resistance from them.
That's exactly my problem: it's a tremendous amount of work to shift processors. But I'm using the fastest PIC available, unless I jump to the dsPIC, which requires a new compiler and learning a new architecture anyway. I looked into the AVRs, but they don't seem to be much faster (16 MIPS). A lot of my code is fairly high level, the main portion being a FAT16 file system, so I figured it might be a similar amount of work to write a converter as to port the code by hand; however, I don't yet know the architecture, or Spin, well enough to make a decision.
Mike, the project is a datalogger used by electricity distribution companies. It records to a CompactFlash card. At the moment it has 2x 200 ksps A/D converters on SPI, a Hitachi LCD, the CF card, and an FT245 USB FIFO, because the Microchip on-board USB looked like a lot of work to set up, test, and then write drivers for. The code has to continuously monitor a power line and sample at a high frequency to record harmonic voltages as well as AFLC load control signals (used by electricity companies to switch things like water heaters on and off). While it's monitoring and doing RMS calculations of the main waveform and at least one harmonic frequency, it also has to record the data to a flash card using the FAT16 file system. The majority of my code is the FAT16 file system (32k), containing procedures to format, chdir, mkdir, delete, open_file, and close_file. The hardware part is easy enough, but I dread porting the FAT16 code by hand and then testing it. It is also the part that uses the most RAM; it has to buffer a 512-byte sector.
But it's also the part that would benefit most from the multi-core architecture. I would build a software FIFO into the buffer; then one cog could take care of the filing system, one could take care of the continual RMS conversion, and one could take over the SPI bus, but with two MOSI (SDI) pins so that the converters can talk at the same time (you can't do that on a PIC; software SDI uses too much of the processor).
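A sketch of that software FIFO idea: one cog (the SPI sampler) produces into a ring buffer in shared RAM and another cog (the FAT16 writer) consumes from it. With a single producer and a single consumer, the head and tail indices each have exactly one writer, so no lock is needed. This is an illustrative Python model, not Propeller code; the class name and buffer size are assumptions.

```python
class RingFifo:
    """Single-producer/single-consumer ring buffer.

    One slot is sacrificed to distinguish full from empty, so a
    buffer of size N holds at most N-1 values.
    """

    def __init__(self, size=512):      # e.g. one sector's worth of samples
        self.buf = [0] * size
        self.head = 0                  # written only by the producer cog
        self.tail = 0                  # written only by the consumer cog

    def put(self, value):
        nxt = (self.head + 1) % len(self.buf)
        if nxt == self.tail:
            return False               # full; producer must retry later
        self.buf[self.head] = value
        self.head = nxt                # publish after the data is in place
        return True

    def get(self):
        if self.tail == self.head:
            return None                # empty; consumer must retry later
        value = self.buf[self.tail]
        self.tail = (self.tail + 1) % len(self.buf)
        return value

f = RingFifo(4)
for v in (10, 20, 30):
    f.put(v)
print(f.get(), f.get())  # 10 20
```

On the Propeller the same pattern works with the buffer and both indices in hub RAM, since hub reads and writes of a long are atomic; the order of operations in `put` (data first, then head) is what keeps the consumer from seeing a slot before it is filled.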
Oh, and the logger is an 8-channel unit, so the PIC at the moment has to do all that eight times.
The PIC can do it, but I've had to drop the harmonic and AFLC measurement function. I will be releasing the instrument without it, and then looking for a way to incorporate it in a revision.
I wish the propeller was out when I started this design, I would have written it in spin to start with.
You could use the USB-serial chip instead of the FIFO one to save some pins. The FullDuplexSerial routines sit in a single cog and can handle over 1MBps. You could even do floating point if you needed it at 40-60us per operation using 1-2 cogs.
Post Edited (Mike Green) : 10/27/2006 10:28:46 PM GMT
And is price a large factor in this project?
(I can't imagine that you'll be selling hundreds of thousands of them. Too specialized.)
What about a FAT16/32-on-a-chip solution?
They also sell complete breakout boards for hobbyists and prototyping...
You can read more about the product on DOSonCHIP
I mean, if the FAT16 part of the code is the largest bit, and there's a product available that can just... let you bypass that...
As for your Stack-size problems, there exists an object that can read how much of an object's stack is actually used.
Stack Monitor
I can't see why Phil didn't mention it himself, as it's quite an Excellent Piece Of Code...