Compiling HLL to PASM possible? — Parallax Forums

Compiling HLL to PASM possible?

SRLM Posts: 5,045
edited 2012-03-07 10:40 in Propeller 1
I would like to write code in a high level language (C/C++, BASIC, etc.) and have it compiled down to assembly. This would then be loaded into a cog and run as native assembly code.

The question: I would like to know which compiler(s) can compile directly to Propeller assembly.

I understand the limitations inherent in doing this (small code size, less efficient than pure assembly, etc.). I'm not looking for any interpreter-type setup or LMM, or advanced language features such as objects, functions, etc. I want to write my code (math routines to be performed in a fast loop) in the high level language, compile it down, and not have to worry too much about the assembly code. I'd probably still do some tweaks and modifications, but it would be nice to start with the HLL.

Comments

  • Mike Green Posts: 23,101
    edited 2012-03-02 17:35
    PropBasic can do that. It can produce both PASM and LMM and, because it uses Spin/PASM as the intermediate source code, you can include Spin code that gets passed through to the Spin compiler. Some cogs can run straight compiled PASM, some LMM, and some Spin.
  • Kye Posts: 2,200
    edited 2012-03-02 21:44
    GCC? Google prop GCC.
  • Heater. Posts: 21,230
    edited 2012-03-03 01:43
    propgcc can compile C/C++ down to PASM that runs in a COG.

    If you look in the propgcc demos you will find a FullDuplexSerial driver written in C that runs in COG and works at 115200 baud.

    In some cases you don't even have to do anything special to get code to run in a cog. In the propgcc demos there is a Fast Fourier Transform, the core loop of which gets loaded into COG to run at full PASM speed. Have a look for "fcache" when you find the propgcc threads.

    This is actually quite amazing. I had always argued that compiling C/C++ to COG was going to be useless, as the code would be too big and there is no stack or indexed addressing etc. to help the C compiler. All in all, more work than it's worth. But the guys did it anyway and it works very well.
  • SRLM Posts: 5,045
    edited 2012-03-04 19:03
    So, I've been able to review the two options posted, and do a side by side comparison of their features and the code they produce. GCC seemed to be better, mostly because it can do certain optimizations.

    PropBasic

    I used PropBasic version 00.01.14 (2011-07-26) to test with. The source code that I used is a simple program to multiply some numbers together, add them, and divide with them. This goes well with the math intensive but no I/O application that I need. It's probably not a good benchmark to use if you're going to be doing complicated serial communication or anything with delays, I/O, etc.

    A side note about PropBasic: the syntax is a bit quirky. It requires that your code have only one operator/statement per line. So "num = a+b+c" is out. It's odd, but easy enough to work with.
    DEVICE P8X32A, XTAL1, PLL16X
    FREQ 80_000_000
    
    num1	VAR LONG
    num2	VAR LONG
    num3	VAR LONG
    result0	VAR LONG
    
    PROGRAM Start
    
    Start:
      DO
        num3 = num3 * num3
      	num2 = num2 * num2
      	num1 = num1 * num1
      	result0 = num1 + num2
      	result0 = result0 + num3
      	result0 = result0 / num3
      LOOP
    END
    

    I used the following command to test with:
    ./PropBasic-bst.linux test.pbas
    

    There don't seem to be any command-line options to use. Anyway, that generated the following Spin file:
    '{$BST PATH {REMOVED FROM POSTING}}
    ''  *** COMPILED WITH PropBasic VERSION 00.01.14  July 26, 2011 ***
                                                                 '' This program tests the compiler for PropBasic.
    
                                                                 '' Is result a command???
    
    
    
    CON                                                          'DEVICE P8X32A, XTAL1, PLL16X
      _ClkMode = XTAL1 + PLL16X                                 
    
      _XInFreq =   5000000                                       'FREQ 80_000_000
    
    
    ' num1	VAR LONG                                              'num1	VAR LONG
    
    ' num2	VAR LONG                                              'num2	VAR LONG
    
    ' num3	VAR LONG                                              'num3	VAR LONG
    
    ' result0	VAR LONG                                           'result0	VAR LONG
    
    
    PUB __Program                                                'PROGRAM Start
      CogInit(0, @__Init, @__DATASTART)                         
                                                                
    DAT                                                         
                      org           0                           
    __Init                                                      
    __RAM                                                       
                      mov           dira,__InitDirA             
                      mov           outa,__InitOutA             
                      jmp           #Start                      
    
    
    Start                                                        'Start:
    
    __DO_1                                                       '  DO
    
                      mov           __temp1,num3                 '    num3 = num3 * num3
                      mov           __temp2,num3                
                      abs           __temp1,__temp1 WC          
                      muxc          __temp3,#1                  
                      abs           __temp2,__temp2 WC, WZ      
        IF_C          xor           __temp3,#1                  
                      mov           __temp4,#0                  
                      mov           __temp5,#32                 
                      shr           __temp1,#1 WC               
    __L0001                                                     
        IF_C          add           __temp4,__temp2 WC          
                      rcr           __temp4,#1 WC               
                      rcr           __temp1,#1 WC               
                      djnz          __temp5,#__L0001            
                      test          __temp3,#1 WZ               
        IF_NZ         neg           __temp4,__temp4             
        IF_NZ         neg           __temp1,__temp1 WZ          
        IF_NZ         sub           __temp4,#1                  
                      mov           num3,__temp1                
    
                      mov           __temp1,num2                 '  	num2 = num2 * num2
                      mov           __temp2,num2                
                      abs           __temp1,__temp1 WC          
                      muxc          __temp3,#1                  
                      abs           __temp2,__temp2 WC, WZ      
        IF_C          xor           __temp3,#1                  
                      mov           __temp4,#0                  
                      mov           __temp5,#32                 
                      shr           __temp1,#1 WC               
    __L0002                                                     
        IF_C          add           __temp4,__temp2 WC          
                      rcr           __temp4,#1 WC               
                      rcr           __temp1,#1 WC               
                      djnz          __temp5,#__L0002            
                      test          __temp3,#1 WZ               
        IF_NZ         neg           __temp4,__temp4             
        IF_NZ         neg           __temp1,__temp1 WZ          
        IF_NZ         sub           __temp4,#1                  
                      mov           num2,__temp1                
    
                      mov           __temp1,num1                 '  	num1 = num1 * num1
                      mov           __temp2,num1                
                      abs           __temp1,__temp1 WC          
                      muxc          __temp3,#1                  
                      abs           __temp2,__temp2 WC, WZ      
        IF_C          xor           __temp3,#1                  
                      mov           __temp4,#0                  
                      mov           __temp5,#32                 
                      shr           __temp1,#1 WC               
    __L0003                                                     
        IF_C          add           __temp4,__temp2 WC          
                      rcr           __temp4,#1 WC               
                      rcr           __temp1,#1 WC               
                      djnz          __temp5,#__L0003            
                      test          __temp3,#1 WZ               
        IF_NZ         neg           __temp4,__temp4             
        IF_NZ         neg           __temp1,__temp1 WZ          
        IF_NZ         sub           __temp4,#1                  
                      mov           num1,__temp1                
    
                      mov           result0,num1                 '  	result0 = num1 + num2
                      adds          result0,num2                
    
                                                                 '  	result0 = result0 + num3
                      adds          result0,num3                
    
                      mov           __temp1,result0              '  	result0 = result0 / num3
                      mov           __temp2,num3                
                      mov           __temp4,#0                  
                      abs           __temp1,__temp1 WC          
                      muxc          __temp5,#1                  
                      abs           __temp2,__temp2 WC, WZ      
        IF_Z          mov           __temp1,#0                  
        IF_Z          jmp           #__L0004                    
        IF_C          xor           __temp5,#1                  
                      mov           __temp3,#0                  
                      min           __temp2,#1                  
    __L0005                                                     
                      add           __temp3,#1                  
                      shl           __temp2,#1 WC               
        IF_NC         jmp           #__L0005                    
                      rcr           __temp2,#1                  
    __L0006                                                     
                      cmpsub        __temp1,__temp2 WC          
                      rcl           __temp4,#1                  
                      shr           __temp2,#1                  
                      djnz          __temp3,#__L0006            
                      test          __temp5,#1 WZ               
        IF_NZ         neg           __temp4,__temp4             
        IF_NZ         neg           __temp1,__temp1             
    __L0004                                                     
                      mov           result0,__temp4             
    
                      jmp           #__DO_1                      '  LOOP
    __LOOP_1                                                    
    
                      mov           __temp1,#0                   'END
                      waitpne       __temp1,__temp1             
    
    
    '**********************************************************************
    __InitDirA       LONG %00000000_00000000_00000000_00000000
    __InitOutA       LONG %00000000_00000000_00000000_00000000
    _FREQ            LONG 80000000
    
    
    __remainder
    __temp1          RES 1
    __temp2          RES 1
    __temp3          RES 1
    __temp4          RES 1
    __temp5          RES 1
    __param1         RES 1
    __param2         RES 1
    __param3         RES 1
    __param4         RES 1
    __paramcnt       RES 1
    num1             RES 1
    num2             RES 1
    num3             RES 1
    result0          RES 1
    
    FIT 492
    
    CON
      LSBFIRST                         = 0
      MSBFIRST                         = 1
      MSBPRE                           = 0
      LSBPRE                           = 1
      MSBPOST                          = 2
      LSBPOST                          = 3
    
    DAT
    __DATASTART
    

    I think the compiler did a good job of being faithful to the original code, but I noticed some things:
    1. Every source code line is in the .spin file as a comment, which is very helpful.
    2. Multiplication and division are done inline, so each additional multiplication consumes 18 longs. The inline copies do share temporary variables, however.
    3. All variables are stored in cog RAM, and user-defined variables keep their user-defined names.
    4. The compiler added the remnants of some serial communication code: three longs at "__RAM" and a constants block.
    5. The code is nicely formatted straight from the compiler (although it uses spaces instead of tabs).
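
    To make point 2 concrete: each inline multiply above appears to strip the signs (abs), remember in __temp3 whether the result should be negative, run a fixed 32-step shift-and-add, and negate at the end. Below is a rough C sketch of that idea, assuming the PASM is read correctly. The function name mul_sign_magnitude is made up for illustration, and only the low 32 bits of the product are kept, as in "mov num3,__temp1".

```c
#include <stdint.h>

/* Hypothetical helper, not part of PropBasic: a sign-magnitude multiply
   in the style of the inline PASM (abs, shift-and-add, conditional neg). */
int32_t mul_sign_magnitude(int32_t a, int32_t b)
{
    /* The result is negative when exactly one operand is negative. */
    int negate = (a < 0) ^ (b < 0);

    /* Absolute values, computed in unsigned arithmetic to avoid overflow. */
    uint32_t ua = (a < 0) ? 0u - (uint32_t)a : (uint32_t)a;
    uint32_t ub = (b < 0) ? 0u - (uint32_t)b : (uint32_t)b;

    /* Fixed 32 iterations: add the shifted multiplicand for each set bit. */
    uint32_t product = 0;
    for (int bit = 0; bit < 32; bit++) {
        if (ua & 1u)
            product += ub << bit;
        ua >>= 1;
    }

    if (negate)
        product = 0u - product;
    return (int32_t)product;  /* low 32 bits only */
}
```

    Because the loop count is fixed at 32, the cost of this style of multiply does not depend on the operand values, which matches the "mov __temp5,#32" / "djnz" pair in the generated code.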

    Propeller GCC

    I used the most recent (and only) version posted on the GCC downloads page (v0_2_3 from 2012-02-08). The source program I used was the same as for PropBasic, modified a bit for C.
    #if defined(__propeller__)
    #include <propeller.h>
    #define int32_t int
    #define int16_t short int
    #else
    #endif
    
    
    int main()
    {
    
    	for(;;){
    		volatile int num1, num2, num3, result0;
    	
    		num3 = num3 * num3;
    	  	num2 = num2 * num2;
    	  	num1 = num1 * num1;
    	  	result0 = num1 + num2;
    	  	result0 = result0 + num3;
    	  	result0 = result0 / num3;
    	}
    }
    

    I based it on the fft_bench.c demo, which is why it has the various preprocessor statements at the beginning. Note the use of the keyword "volatile" for the int declaration: without it the compiler simply optimized away everything into a simple jump loop.
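
    One way to keep the optimizer from discarding the math without marking every variable volatile is to read the inputs from, and write the result to, memory that something else can observe. The sketch below is only an illustration of that pattern, not something from the propgcc demos: the mailbox_t struct and the compute_once name are hypothetical, and returning 0 for a zero divisor is an assumption chosen to avoid a divide-by-zero.

```c
#include <stdint.h>

/* Hypothetical shared block: only the externally visible fields are volatile. */
typedef struct {
    volatile int32_t num1, num2, num3;  /* inputs written by another party */
    volatile int32_t result0;           /* output read by another party    */
} mailbox_t;

/* One pass of the test arithmetic. The inputs are copied into plain locals,
   so the compiler is free to keep the intermediate values in registers. */
void compute_once(mailbox_t *box)
{
    int32_t n1 = box->num1;
    int32_t n2 = box->num2;
    int32_t n3 = box->num3;

    n3 = n3 * n3;
    n2 = n2 * n2;
    n1 = n1 * n1;

    int32_t sum = n1 + n2 + n3;

    /* Assumption: report 0 when the divisor is zero. */
    box->result0 = (n3 != 0) ? sum / n3 : 0;
}
```

    The volatile reads and the single volatile write cannot be removed by the compiler, so the arithmetic between them has to be kept as well.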

    Anyway, I used the following command to generate the code:
    propeller-elf-gcc -Os -S -mcog -mspin test.c
    

    The options do the following:
    -Os: optimize code for minimum size
    -S: stop after compilation and write the assembly source to a file
    -mcog: use the cog memory model (put everything in a single cog)
    -mspin: generate the resulting spin file

    There is also the -mfcache option, but in this case it did not generate code any differently.

    And, when run it generated the following spin code:
    '' spin code automatically generated by gcc
    CON
      _clkmode = xtal1+pll16x
      _clkfreq = 80_000_000
      __clkfreq = 0  '' pointer to clock frequency
     '' adjust STACKSIZE to how much stack your program needs
      STACKSIZE = 256
    VAR
      long cog  '' cog that was started up by start method
      long stack[STACKSIZE]
      '' add parameters here
      long param
    
    '' add any appropriate methods below
    PUB start
      stop
      cog := cognew(@entry, @param) + 1
    PUB stop
      if cog
        cogstop(cog~ - 1)
    
    DAT
    	org
    
    entry
    r0	mov	sp,PAR
    r1	mov	r0,sp
    r2	jmp	#_main
    r3	long 0
    r4	long 0
    r5	long 0
    r6	long 0
    r7	long 0
    r8	long 0
    r9	long 0
    r10	long 0
    r11	long 0
    r12	long 0
    r13	long 0
    r14	long 0
    lr	long 0
    sp	long 0
    	'.text
    	long
    	'global variable	_main
    _main
    	sub	sp, #16
    L_L2
    	mov	r5, #4
    	add	r5, sp
    	mov	r6, #8
    	add	r6, sp
    	mov	r7, #12
    	add	r7, sp
    	rdlong	r0, r5
    	rdlong	r1, r5
    	call	#__MULSI
    	wrlong	r0, r5
    	rdlong	r0, r6
    	rdlong	r1, r6
    	call	#__MULSI
    	wrlong	r0, r6
    	mov	r6, #8
    	add	r6, sp
    	rdlong	r0, r7
    	rdlong	r1, r7
    	call	#__MULSI
    	wrlong	r0, r7
    	mov	r7, #12
    	add	r7, sp
    	rdlong	r7, r7
    	rdlong	r6, r6
    	add	r7, r6
    	wrlong	r7, sp
    	rdlong	r7, sp
    	rdlong	r6, r5
    	add	r7, r6
    	wrlong	r7, sp
    	rdlong	r0, sp
    	rdlong	r1, r5
    	call	#__DIVSI
    	wrlong	r0, sp
    	jmp	#L_L2
    __MASK_0000FFFF	long	$0000FFFF
    __TMP0	long	0
    __MULSI
    	mov	__TMP0,r0
    	min	__TMP0,r1
    	max	r1,r0
    	mov	r0,#0
    __MULSI_loop
    	shr	r1,#1 wz,wc
      IF_C	add	r0,__TMP0
    	add	__TMP0,__TMP0
      IF_NZ	jmp	#__MULSI_loop
    __MULSI_ret	ret
    __MASK_00FF00FF	long	$00FF00FF
    __MASK_0F0F0F0F	long	$0F0F0F0F
    __MASK_33333333	long	$33333333
    __MASK_55555555	long	$55555555
    __CLZSI	rev	r0,#0
    __CTZSI	neg	__TMP0,r0
    	and	__TMP0,r0 wz
    	mov	r0,#0
    	IF_Z	mov	r0,#1
    	test	__TMP0, __MASK_0000FFFF wz
    	IF_Z	add	r0,#16
    	test	__TMP0, __MASK_00FF00FF wz
    	IF_Z	add	r0,#8
    	test	__TMP0, __MASK_0F0F0F0F wz
    	IF_Z	add	r0,#4
    	test	__TMP0, __MASK_33333333 wz
    	IF_Z	add	r0,#2
    	test	__TMP0, __MASK_55555555 wz
    	IF_Z	add	r0,#1
    __CLZSI_ret ret
    __DIVR	long	0
    __DIVCNT	long	0
    __UDIVSI
    	mov	__DIVR,r0
    	call	#__CLZSI
    	neg	__DIVCNT,r0
    	mov	r0,r1
    	call	#__CLZSI
    	add	__DIVCNT,r0
    	mov	r0,#0
    	cmps	__DIVCNT,#0 wz,wc
      IF_C	jmp	#__UDIVSI_done
    	shl	r1,__DIVCNT
    	add	__DIVCNT,#1
    __UDIVSI_loop
    	cmpsub	__DIVR,r1 wz,wc
    	addx	r0,r0
    	shr	r1,#1
    	djnz	__DIVCNT,#__UDIVSI_loop
    __UDIVSI_done
    	mov	r1,__DIVR
    __UDIVSI_ret	ret
    __DIVSGN	long	0
    __DIVSI	mov	__DIVSGN,r0
    	xor	__DIVSGN,r1
    	abs	r0,r0 wc
    	muxc	__DIVSGN,#1 wc
    	abs	r1,r1
    	call	#__UDIVSI
    	cmps	__DIVSGN,#0 wz,wc
    	IF_B	neg	r0,r0
    	test	__DIVSGN,#1 wz
    	IF_NZ	neg	r1,r1
    __DIVSI_ret	ret
    
    

    Some things that I have noticed about this code:
    1. The output lacks suitable comments, and the resultant code is rather difficult to understand. It doesn't use original variable names.
    2. It creates a multiplication subroutine. This is slightly less efficient in execution time than putting it inline, but it is vastly more efficient on space.
    3. The code stores variables in the hub, not the cog as expected.
    4. The -Os option appears to be needed: with no optimization the output code is 192 lines. Interestingly, -O2 gives the same output as -Os.
    5. The multiply routine ("__MULSI") is very compact (9 longs). Its run time is bounded rather than truly O(1): the loop body is only 4 instructions and runs at most 32 times, so it takes at most 32*4 instructions to complete. I'm not sure how it works yet though (especially with a sign).
    6. The divide routine is a bit more expensive: 51 longs. In its favor, though, the loop ("__UDIVSI") is as efficient as the multiply loop.
    7. GCC isn't very efficient in memory management by default: it creates a 256 long hub stack and a 16 long cog stack frame. This could probably be cleaned up manually.
    8. It's missing a "FIT" statement at the end.
    9. The generated code isn't very well formatted.
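
    On point 5, the "__MULSI" routine looks like a plain shift-and-add multiply that walks the bits of the smaller operand and stops as soon as they run out. Here is a C sketch of that reading; the name mulsi_sketch is made up, and the mapping of the PASM "min"/"max" instructions to "keep the larger"/"keep the smaller" (unsigned) is an interpretation of the listing rather than something stated in the thread.

```c
#include <stdint.h>

/* Hypothetical C rendering of the __MULSI loop; returns the low 32 bits. */
uint32_t mulsi_sketch(uint32_t a, uint32_t b)
{
    uint32_t addend = (a > b) ? a : b;  /* "min __TMP0,r1": larger operand  */
    uint32_t bits   = (a > b) ? b : a;  /* "max r1,r0": smaller operand     */
    uint32_t result = 0;                /* "mov r0,#0"                      */

    do {
        uint32_t carry = bits & 1u;     /* "shr r1,#1 wz,wc"                */
        bits >>= 1;
        if (carry)
            result += addend;           /* "IF_C add r0,__TMP0"             */
        addend += addend;               /* "add __TMP0,__TMP0"              */
    } while (bits != 0);                /* "IF_NZ jmp #__MULSI_loop"        */

    return result;
}
```

    If this reading is right, no separate sign handling is needed: the low 32 bits of a two's-complement product are the same whether the operands are treated as signed or unsigned. The loop then ends early for small positive operands and runs the full 32 iterations only when both operands have the top bit set, for example when both are negative.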

    Next, I tried a slightly modified source:
    #if defined(__propeller__)
    #include <propeller.h>
    #define int32_t int
    #define int16_t short int
    #else
    #endif
    
    
    int main()
    {
    
    	for(;;){
    		int num1, num2, num3;
    		volatile int result0;
    	
    		num3 = num3 * num3;
    	  	num2 = num2 * num2;
    	  	num1 = num1 * num1;
    	  	result0 = num1 + num2;
    	  	result0 = result0 + num3;
    	  	result0 = result0 / num3;
    	}
    }
    

    Note here that the only variable marked volatile is result0. I compiled with
    propeller-elf-gcc -Os -S -mcog -mspin test.c
    

    And got the following output:
    '' spin code automatically generated by gcc
    CON
      _clkmode = xtal1+pll16x
      _clkfreq = 80_000_000
      __clkfreq = 0  '' pointer to clock frequency
     '' adjust STACKSIZE to how much stack your program needs
      STACKSIZE = 256
    VAR
      long cog  '' cog that was started up by start method
      long stack[STACKSIZE]
      '' add parameters here
      long param
    
    '' add any appropriate methods below
    PUB start
      stop
      cog := cognew(@entry, @param) + 1
    PUB stop
      if cog
        cogstop(cog~ - 1)
    
    DAT
    	org
    
    entry
    r0	mov	sp,PAR
    r1	mov	r0,sp
    r2	jmp	#_main
    r3	long 0
    r4	long 0
    r5	long 0
    r6	long 0
    r7	long 0
    r8	long 0
    r9	long 0
    r10	long 0
    r11	long 0
    r12	long 0
    r13	long 0
    r14	long 0
    lr	long 0
    sp	long 0
    	'.text
    	long
    	'global variable	_main
    _main
    	sub	sp, #4
    L_L2
    	mov	r1, r7
    	mov	r0, r7
    	call	#__MULSI
    	mov	r7, r0
    	mov	r1, r4
    	mov	r0, r4
    	call	#__MULSI
    	mov	r1, r5
    	mov	r4, r0
    	mov	r0, r5
    	call	#__MULSI
    	mov	r6, r0
    	add	r6, r4
    	mov	r5, r0
    	mov	r1, r7
    	wrlong	r6, sp
    	rdlong	r6, sp
    	add	r6, r7
    	wrlong	r6, sp
    	rdlong	r0, sp
    	call	#__DIVSI
    	wrlong	r0, sp
    	jmp	#L_L2
    __MASK_0000FFFF	long	$0000FFFF
    __TMP0	long	0
    __MULSI
    	mov	__TMP0,r0
    	min	__TMP0,r1
    	max	r1,r0
    	mov	r0,#0
    __MULSI_loop
    	shr	r1,#1 wz,wc
      IF_C	add	r0,__TMP0
    	add	__TMP0,__TMP0
      IF_NZ	jmp	#__MULSI_loop
    __MULSI_ret	ret
    __MASK_00FF00FF	long	$00FF00FF
    __MASK_0F0F0F0F	long	$0F0F0F0F
    __MASK_33333333	long	$33333333
    __MASK_55555555	long	$55555555
    __CLZSI	rev	r0,#0
    __CTZSI	neg	__TMP0,r0
    	and	__TMP0,r0 wz
    	mov	r0,#0
    	IF_Z	mov	r0,#1
    	test	__TMP0, __MASK_0000FFFF wz
    	IF_Z	add	r0,#16
    	test	__TMP0, __MASK_00FF00FF wz
    	IF_Z	add	r0,#8
    	test	__TMP0, __MASK_0F0F0F0F wz
    	IF_Z	add	r0,#4
    	test	__TMP0, __MASK_33333333 wz
    	IF_Z	add	r0,#2
    	test	__TMP0, __MASK_55555555 wz
    	IF_Z	add	r0,#1
    __CLZSI_ret ret
    __DIVR	long	0
    __DIVCNT	long	0
    __UDIVSI
    	mov	__DIVR,r0
    	call	#__CLZSI
    	neg	__DIVCNT,r0
    	mov	r0,r1
    	call	#__CLZSI
    	add	__DIVCNT,r0
    	mov	r0,#0
    	cmps	__DIVCNT,#0 wz,wc
      IF_C	jmp	#__UDIVSI_done
    	shl	r1,__DIVCNT
    	add	__DIVCNT,#1
    __UDIVSI_loop
    	cmpsub	__DIVR,r1 wz,wc
    	addx	r0,r0
    	shr	r1,#1
    	djnz	__DIVCNT,#__UDIVSI_loop
    __UDIVSI_done
    	mov	r1,__DIVR
    __UDIVSI_ret	ret
    __DIVSGN	long	0
    __DIVSI	mov	__DIVSGN,r0
    	xor	__DIVSGN,r1
    	abs	r0,r0 wc
    	muxc	__DIVSGN,#1 wc
    	abs	r1,r1
    	call	#__UDIVSI
    	cmps	__DIVSGN,#0 wz,wc
    	IF_B	neg	r0,r0
    	test	__DIVSGN,#1 wz
    	IF_NZ	neg	r1,r1
    __DIVSI_ret	ret
    

    This is much better: the output no longer has a bunch of RDLONGs and WRLONGs, and is hence much more efficient. Previously, the main loop was 34 lines (many of which are hub accesses), and now it is 22 lines. The other comments still apply though. Also as before, -mfcache did not change the output code.

    Conclusion

    I think I will look into Propeller GCC more. It seems to do a good job of compiling down to efficient Propeller assembly, and it isn't too hard to read the output. I hope that it will be improved over time as well. The PropBasic compiler has more understandable output, but the inefficient use of cog RAM and the lack of updates (no changes in 8 months) have me worried. Propeller GCC seems to fit my requirements.
  • Circuitsoft Posts: 1,166
    edited 2012-03-06 10:32
    I don't think the PropGCC output is designed to be easily human-readable. It is intended to be quite efficient, and it can reorder operations if doing so makes the code more efficient. By that token, you can have one line of code split across several places in the source. Try compiling:
    #include <propeller.h>
    #include <stdint.h>
    
    void main()
    {
        for(;;)
        {
            int num1, num2, num3;
            volatile int result1, result2, result3;
    
            result1 = (num1*num1 + num2*num2 + num3*num3)/num1;
            result2 = (num1*num1 + num2*num2 + num3*num3)/num2;
            result3 = (num1*num1 + num2*num2 + num3*num3)/num3;
        }
    }
    

    As written, it's quite inefficient, but I'd bet that GCC will make a temporary variable for num1*num1+num2*num2+num3*num3.
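
    Written out by hand, the transformation I'd expect from the optimizer looks roughly like the sketch below. The results_t struct, the compute_results name, and passing the inputs as parameters are all assumptions added for illustration (the snippet above reads uninitialized locals instead).

```c
#include <stdint.h>

/* Hypothetical container for the three outputs. */
typedef struct {
    int32_t result1, result2, result3;
} results_t;

/* Compute the shared sum of squares once, then divide it three times.
   The caller must pass non-zero inputs; division by zero is not handled. */
results_t compute_results(int32_t num1, int32_t num2, int32_t num3)
{
    /* The common subexpression, hoisted into a temporary. */
    int32_t sum_sq = num1 * num1 + num2 * num2 + num3 * num3;

    results_t out;
    out.result1 = sum_sq / num1;
    out.result2 = sum_sq / num2;
    out.result3 = sum_sq / num3;
    return out;
}
```

    That is three multiplies and three divides per pass instead of nine multiplies and three divides.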
  • Dave Hein Posts: 6,347
    edited 2012-03-06 14:39
    Here's the generated assembly code with -mcog -O2
    	.text
    	.balign	4
    	.global	_main
    _main
    	sub	sp, #12
    	mov	r5, #0
    	mov	r1, r5
    	mov	r0, r5
    	mov	r6, #0
    	mov	r4, #0
    	call	#__MULSI
    	mov	r7, r0
    	add	r7, r0
    	add	r7, r0
    	mov	r1, r6
    	mov	r0, r7
    	call	#__DIVSI
    	mov	r1, r6
    	mov	r5, r0
    	mov	r0, r7
    	call	#__DIVSI
    	mov	r6, r0
    	mov	r1, r4
    	mov	r0, r7
    	call	#__DIVSI
    	mov	r7, r0
    .L2
    	mov	r4, #8
    	add	r4, sp
    	wrlong	r5, r4
    	mov	r4, #4
    	add	r4, sp
    	wrlong	r6, r4
    	wrlong	r7, sp
    	jmp	#.L2
    
    EDIT: Of course, if you also specify -Wall you will get warnings that num1, num2 and num3 are uninitialized and result1, result2 and result3 are unused.
  • SRLM Posts: 5,045
    edited 2012-03-07 09:43
    @Circuitsoft

    Even if it reorders operations and breaks up lines of code the compiler still has to make a syntax tree, which has enough information to output useful comments. Some comments are better than none, especially when it changes the logic and order of the operations.

    @Dave Hein

    If you include the flags "-mspin" and "-S" it should make the complete assembly code (i.e., code with the support assembly in there instead of just the business logic directly from the code).
  • Dave Hein Posts: 6,347
    edited 2012-03-07 10:40
    SRLM wrote: »
    If you include the flags "-mspin" and "-S" it should make the complete assembly code
    Thanks for the tip. I wasn't aware of the -mspin flag.

    EDIT: Oh, I see you mentioned it in your previous post on March 4, and you also posted the output from PropGCC. Sorry I missed that. I usually skim through any post that is more than a dozen lines or so. I should have read it in more detail.