Require specific Guide/Manual for Propeller 2 GPIO coding in c/c++

chintan_joshi · 2022-07-06 10:38

Hello Everyone,

I am embedded c/c++ developer and new to this site and just bought Propeller 2. I want to start with high frequency GPIO code using c++. But not able to find proper guide or user manual to start with Propeller 2. So it will be helpful if someone can provide link/document for Propeller2 GPIO coding using c++?

evanh · 2022-07-06 11:19

Most of your work will be getting familiar with the environment.
Lots of good links here - https://forums.parallax.com/discussion/173000/propeller-2-users-get-started-here/p1

Definitive hardware reference document is still Chip's silicon google doc - https://docs.google.com/document/d/1gn6oaT5Ib7CytvlZHacmrSbVBJsD9t_-kmvjd7nUR6o/edit?usp=sharing

And also at the end of his Spin2 doc, it defines common used compile time symbols - https://docs.google.com/document/d/16qVkmA6Co5fUNKJHF6pBfGfDupuRwDtf-wyieh_fbqw/edit

There is two third-party C compilers:

FlexC (flexspin) with FlexProp IDE - https://forums.parallax.com/discussion/164187/flexspin-compiler-for-p2-assembly-spin-basic-and-c-in-one-compiler/p1
Catalina C - https://forums.parallax.com/discussion/174158/catalina-ansi-c-for-the-propeller-1-2/p1

EDIT: Changed Catalina link

evanh · 2022-07-06 11:25

Here's a list from the FlexC manual for bit-bashing functions:

void _pinf(int pin);
Forces pin pin to float low.

void _pinl(int pin);
Makes pin pin an output and forces it low.

void _pinh(int pin);
Makes pin pin an output and forces it high.

void _pinnot(int pin);
Makes pin pin an output and inverts it.

void _pinrnd(int pin);
Makes pin pin an output and sets it to a random bit value.

int _pinr(int pin);
Makes pin pin an input and returns its current value (0 or 1). Only works for single pins. For multiple pins, use _pinread.

int _pinread(int pins);
Read one or more pins. pins can be a single pin from 0-63, or it can be a group of num pins starting at base specified as base + ((num-1)<<6). For reading a single pin, the _pinr function is more efficient.

void _pinw(int pin, int val);
Makes pin pin an output and writes val to it. val should be only 0 or 1; results for other values are undefined (that is, only a single pin is supported). For writing multiple pins, see _pinwrite.

void _pinwrite(int pins, int val);
pins can be a single pin from 0-63, or it can be a group of num pins starting at base specified as base + ((num-1)<<6). For writing a single pin, _pinw is more efficient.

chintan_joshi · 2022-07-06 11:33

Thank you very much for the support. Let me check all this.

n_ermosh · 2022-07-09 15:07

If you are looking for C++ specifically, I will plug my clang port as well: https://forums.parallax.com/discussion/171962/llvm-backend-for-propeller-2/p1

I have an (incomplete) smart pin library I am working on that I can point you to do some basic stuff, but it’s not documented yet.

chintan_joshi · 2022-07-11 06:21

Hy @Nikita, Thank you for pointing out p2llvm library. Currently i am working on to just build normal c code and access GPIO to get understanding of Propeller 2. Once i finish this i will be in better position to decide library to port my existing c++ code to Propeller 2 supported code.

chintan_joshi · 2022-08-01 06:25

Hello , Is there any specific library/ Sample code available for UART in c/c++ for propeller 2 Flexprop ide? i want to transmit image raw data from Raspberry pi to Propeller 2 on high speed. I have checked simpletools.h but there is no functions for UART imterface.

Also another thing is i want to make multiple GPIO pins high/LOW at same time , same like write to Port. I have used **set_outputs(End Pin, Start pin, 0b00111100); ** But this taking more time. I want to do it on faster way(In nano seconds). So is there any alternative available to write multiple pins at same time in high speed?

evanh · 2022-08-01 06:40

You'll inevitably be ending up with some inline assembly ... here's my super simple diag routines for sending text to terminal. It's written in Spin2 because FlexC can compile and integrate Spin natively. Therefore it is more portable for use with the wider Propeller community.

CON
    DIAGTXPIN = 62
    DIAGRXPIN = 63


PUB  baudinit( baud ) | x
    x := muldiv65( 64, clkfreq, baud ) << 10 | 7
    pinstart( DIAGTXPIN, P_ASYNC_TX | P_OE, x, 0 )
    pinstart( DIAGRXPIN, P_ASYNC_RX, x, 0 )


PUB  putch( char )
    org
.loop
        rqpin   inb, #DIAGTXPIN   wc    'transmiting? (C high == yes)  *Needed to initiate tx
        testp   #DIAGTXPIN   wz         'buffer free? (IN high == yes)
if_c_and_nz jmp #.loop                  'wait while Smartpin is both full (nz) and transmitting (c)
        wypin   char, #DIAGTXPIN        'write new byte to Y buffer
    end

PS: It makes use of the one-word (a byte by default) hardware buffer.

evanh · 2022-08-01 06:55

Oh, and here's the muldiv65():

PUB  muldiv65( mult1, mult2, divisor ) : result
' 32bit result = 32bit x 32bit / 32bit
' Rounding to nearest variant of muldiv64()
    org
        qmul    mult1, mult2
        mov     result, divisor
        shr     result, #1
        getqx   mult1               ' lower 32 bits of 64-bit intermediate
        getqy   mult2               ' upper 32 bits of 64-bit intermediate
        add     mult1, result   wc      ' round-to-nearest
        addx    mult2, #0           ' round-to-nearest
        setq    mult2
        qdiv    mult1, divisor
        getqx   result
    end

evanh · 2022-08-01 07:16

@chintan_joshi said:
Also another thing is i want to make multiple GPIO pins high/LOW at same time ...

I'm pretty certain the earlier doc I quoted isn't quite giving all the info. Eg: It should read:

void _pinh(int pin);
Makes pins pin outputs and forces them high.

The value of pin can include the addpins extension to specify a contiguous pin group. EDIT: Err, ADDPINS is a Spin2 term. It's just an extra bit-field in the pin number, the field of pin[10:6] specifies the number of pins in the group, with zero value meaning a single pin. So, in C, do this: pin = 12 | 3<<6 to address the four pins P12..P15.

PS; This is a hardware feature. It costs no extra instructions. Every dedicated pin instruction has this ADDPINS ability. Eg: _pinh(int pin); compiles to a single instruction.

chintan_joshi · 2022-08-01 10:42

Hello @evanh , Need little Help on pin = 12 | 3<<6.

I am using set_outputs like this,

set_outputs(19, 12, hex_t[index][i]);

hex_t[index][i] will have 8 bit hex value, and based on HEX value i am making pins 12 to 19 HIGH/LOW. so some pins will be High and some pins will be LOW.

So how can i assign runtime HEX value to pin variable? Please suggest.

evanh · 2022-08-01 11:27

Sounds like you want to place an 8-bit number onto a group of eight parallel pins. To do that bit-bash style the quickest way would be SETBYTE instruction on the OUTA/B registers. It only works on byte aligned boundaries, eg: Pins P8..P15, but most byte wide ops are easiest that way.

The universal way would be using SETQ+MUXQ combo on same registers instead. That's likely what the listed void _pinwrite(int pins, int val); is doing.

chintan_joshi · 2022-08-01 11:49

For _pinwrite() how can i add pins 12 to 19 in pins variable?

I am fine if i have to use P8 to P15 instead of 12 to 19 if that works.

So kindly share how can i use SETBYTE instruction as well for pins P8 to P15.

evanh · 2022-08-01 12:10

Oh, sorry, here: _pinwrite( 12|7<<6, val ); // write to pins P12..P19

evanh · 2022-08-01 12:21

Here's a quick hack up of function for using SETBYTE in C:

static void  pinout8( unsigned group, unsigned val )
{
    __asm {
        altsb   group, #0x1fc  // OUTA cog register
        setbyte val
        altsb   group, #0x1fa  // DIRA cog register
        setbyte #0xff
    }
}

PS: This works across OUTB and DIRB too, since they are adjoining registers, 0x1fd and 0x1fb respectively. The prefixing ALTSB instruction indexes across multiple registers.

chintan_joshi · 2022-08-02 10:41

Thank you very much for your reply @evanh .

Need help for SPI communication as well. For SPI Communication which pins can i use for Propeller2 for MOSI, MISO, CLK and CS? I am not able to find any specific Pins. I have checked schematic and found SD card pins have SPI Labeled.

Found (https://learn.parallax.com/tutorials/language/propeller-c/propeller-c-simple-protocols/spi-example), this link for SPI example but not sure for pins.

So can i use those pins like:

CS: 60
MOSI: 59
MISO : 58
CLK: 61

chintan_joshi · 2022-08-02 10:57

Hello , If i use _pinwrite() function to set/clear group of pins then its taking approax 500 ns, can i make it more fast? or its limitation of propeller2 library?

 _pinwrite( 12|7<<6,  hex_t[index][i] ); // write to pins P12..P19

As per attached screenshot, PDATAC, PDATAM and PDATAY are setting up with _pinwrite() and its taking approax 500 ns.

JonnyMac · 2022-08-02 13:32

Need help for SPI communication as well. For SPI Communication which pins can i use for Propeller2 for MOSI, MISO, CLK and CS?

With P2 smart pins you can use any IO you want for SPI. Sorry, I'm not a C programmer, but I can show you the process in Spin2 -- it will probably be very similar under FlexProp C.

  pinh(CSX)                                                     ' deselect

  m := P_SYNC_TX | P_OE                                         ' spi tx mode
  m |= ((SCLK-MOSI) & %111) << 24                               ' add clk offset (B pin)
  x := %1_00111                                                 ' start/stop mode, 8 bits
  pinstart(MOSI, m, x, 0)                                       ' activate smart pin
  pinf(MOSI)                                                    ' reset/disable until used

  m := P_PULSE | P_OE                                           ' spi clock mode (CPOL = 0)
  x.word[0] := 2 #> (clkfreq / 10_000_000) <# $FFFF             ' ticks in period (10 MHz)
  x.word[1] := x.word[0] >> 1                                   ' ticks in low cycle (50%)
  pinstart(SCLK, m, x, 0)                                       ' initialize smart pin

  pinl(MISO)                                                    ' not used

This example doesn't use MISO, but you can set it up and relate it to the SCLK pin as with MOSI (the mode is P_SYNC_RX). The only restriction is that the MOSI and MISO pins need to be within +/- three pins of SCLK.

pub spi_write(value)

  value ror= 8                                                  ' flip MSB/LSB for smart pin
  value rev= 31

  pinl(CSX)                                                     ' select device

  wypin(MOSI, value)                                            ' write value to SPI out
  pinl(MOSI)                                                    ' enable SPI smart pin
  wypin(SCLK, 8)                                                ' clock 8 bits
  repeat until pinr(SCLK)                                       ' let it finish
  pinf(MOSI)                                                    ' reset MOSI smart pin

  pinh(CSX)                                                     ' deselect device


pub pasm_spi_write(value)

  org
                        ror       value, #8                     ' flip MSB/LSB for smart pin
                        rev       value

                        drvl      #CSX                          ' select device

                        wypin     value, #MOSI                  ' write value to SPI out
                        drvl      #MOSI                         ' enable SPI smart pin
                        wypin     #8, #SCLK                     ' clock 8 bits
                        nop                                     ' let it start
                        testp     #SCLK                 wc      ' wait for it to finish
        if_nc           jmp       #$-1

                        fltl      #MOSI                         ' reset SPI smart pin

                        drvh      #CSX                          ' deselect device
  end

To write a byte, load the value into the MOSI pin and enable it, then load SCLK with the number of bits to clock. Reading the IN bit from the SCLK pin will tell you when its finished. Note, too, that the value is always output LSB first from the smart pin, so you may need to flip the bits before transmitting. If CPOL is 1, you can add P_INVERT_OUTPUT to the SCLK configuration bits.

evanh · 2022-08-03 05:41

@chintan_joshi said:
Hello , If i use _pinwrite() function to set/clear group of pins then its taking approax 500 ns, can i make it more fast? or its limitation of propeller2 library?

Here's some examples to get a feel for what's possible:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <sys/time.h>


enum {
//  _xinfreq = 20_000_000,
    _xtlfreq = 20_000_000,
    _clkfreq = 4_000_000,
    DOWNLOAD_BAUD = 230_400,
    DEBUG_BAUD = DOWNLOAD_BAUD,

    PINS = 20|7<<6,
    GROUP = 1,
};



static uint32_t  pinout8_dma( unsigned pgroup, uint8_t *haddr, unsigned len )
{
    uint32_t  xfrq = 0x8000_0000;    // sysclock/1 data rate, top speed!
    uint32_t  m_tx = X_RFBYTE_8P_1DAC8 | pgroup<<20 | X_PINS_ON | len;  // byte wide transfers from hubRAM, via the FIFO
    uint32_t  ticks = _cnt();

__asm volatile {    // "volatile" enforces Fcache use, which is needed to free up the FIFO
        rdfast  #0, haddr    // setup the FIFO to read from hubRAM
        setxfrq xfrq    // set the transfer rate
        xinit   m_tx, #0    // do it!
        waitxfi   // wait for transfer to complete before returning back to hubexec, hubexec uses the FIFO
}

    return  _cnt() - ticks;
}


static uint32_t  pinout8_grp( unsigned pgroup, uint8_t *haddr, unsigned len )
{
    uint8_t  val;
    uint32_t  i, ticks = _cnt();

    for( i=0; i<len; i++ )  {
        val = *(haddr++);
__asm {
//__asm const {
        altsb   pgroup, #0x1fc  // OUTA cog register
        setbyte val
}
    }

    return  _cnt() - ticks;
}


static uint32_t  pinout8_bb( unsigned pins, uint8_t *haddr, unsigned len )
{
    uint8_t  val;
    uint32_t  i, ticks = _cnt();

    for( i=0; i<len; i++ )  {
        val = *(haddr++);
        _pinwrite( pins, val );
    }

    return  _cnt() - ticks;
}


static uint32_t  filldat( uint8_t *haddr, unsigned len )
{
    uint32_t  i, ticks = _cnt();

    for( i=0; i<len; i++ )  {
        *(haddr++) = (uint8_t)(len - i);
    }

    return  _cnt() - ticks;
}



void  main( void )
{
    uint8_t  buff[1000];

    _waitms( 200 );
    printf( "\n   clkfreq = %d   clkmode = 0x%x\n", _clockfreq(), _clockmode() );

    printf( "\narray size is %d bytes.  filldat() in %d ticks.\n", sizeof(buff), filldat( buff, sizeof(buff) ) );
    for(;;) // ever
    {
        _pinl( PINS );
        printf( "pinout8_bb() in %d ticks. ", pinout8_bb( PINS, buff, sizeof(buff) ) );
        _pinf( PINS );

        _pinl( GROUP<<3|7<<6 );
        printf( "pinout8_grp() in %d ticks. ", pinout8_grp( GROUP, buff, sizeof(buff) ) );
        printf( "pinout8_dma() in %d ticks.\n", pinout8_dma( GROUP, buff, sizeof(buff) ) );
        _pinf( GROUP<<3|7<<6 );

        _waitms( 800 );
    };
}

evanh · 2022-08-03 07:04

Expanded a couple more for demoing the optimiser. FlexC's default is -O1 optimising.

With default -O1 we get:

array size is 1000 bytes.  filldat() in 24324 ticks.
pinout8_bb() in 117040 ticks.
pinout8_grp() in 50002 ticks.
pinout8_grp1() in 43988 ticks.
pinout8_grp2() in 16320 ticks.
pinout8_dma() in 1080 ticks.

With -O0 --fcache=128 we get:

array size is 1000 bytes.  filldat() in 50096 ticks.
pinout8_bb() in 184080 ticks.
pinout8_grp() in 58064 ticks.
pinout8_grp1() in 46024 ticks.
pinout8_grp2() in 24368 ticks.
pinout8_dma() in 1144 ticks.

With -O2 we get:

array size is 1000 bytes.  filldat() in 22320 ticks.
pinout8_bb() in 32326 ticks.
pinout8_grp() in 43987 ticks.
pinout8_grp1() in 43958 ticks.
pinout8_grp2() in 16320 ticks.
pinout8_dma() in 1080 ticks.

chintan_joshi · 2022-08-03 11:04

Hello @evanh thank you very much for the help. pinout8_dma working very fast. But i am not able to understand the logic behind it. Sorry but It will be very good if you can explain this code to me so i can change this code according to my requirement?

Especially below line.

uint32_t m_tx = X_RFBYTE_8P_1DAC8 | pgroup<<20 | X_PINS_ON | len; // byte wide transfers from hubRAM, via the FIFO
And Assembly code as well.
Also pinout8_dma( GROUP, buff, sizeof(buff) );, why we are sending GROUP = 1 here? i can see code is changing pins (Pin Number 8 to 15) values.

static uint32_t pinout8_dma( unsigned pgroup, uint8_t *haddr, unsigned len )
{
uint32_t xfrq = 0x4000_0000; // sysclock/1 data rate, top speed!
uint32_t m_tx = X_RFBYTE_8P_1DAC8 | pgroup<<20 | X_PINS_ON | len; // byte wide transfers from hubRAM, via the FIFO
uint32_t ticks = _cnt();

__asm volatile { // "volatile" enforces Fcache use, which is needed to free up the FIFO
rdfast #0, haddr // setup the FIFO to read from hubRAM
setxfrq xfrq // set the transfer rate
xinit m_tx, #0 // do it!
waitxfi // wait for transfer to complete before returning back to hubexec, hubexec uses the FIFO
}

return  _cnt() - ticks;

}

chintan_joshi · 2022-08-03 11:19

@JonnyMac said:

Need help for SPI communication as well. For SPI Communication which pins can i use for Propeller2 for MOSI, MISO, CLK and CS?

With P2 smart pins you can use any IO you want for SPI. Sorry, I'm not a C programmer, but I can show you the process in Spin2 -- it will probably be very similar under FlexProp C.

  pinh(CSX)                                                     ' deselect

  m := P_SYNC_TX | P_OE                                         ' spi tx mode
  m |= ((SCLK-MOSI) & %111) << 24                               ' add clk offset (B pin)
  x := %1_00111                                                 ' start/stop mode, 8 bits
  pinstart(MOSI, m, x, 0)                                       ' activate smart pin
  pinf(MOSI)                                                    ' reset/disable until used

  m := P_PULSE | P_OE                                           ' spi clock mode (CPOL = 0)
  x.word[0] := 2 #> (clkfreq / 10_000_000) <# $FFFF             ' ticks in period (10 MHz)
  x.word[1] := x.word[0] >> 1                                   ' ticks in low cycle (50%)
  pinstart(SCLK, m, x, 0)                                       ' initialize smart pin

  pinl(MISO)                                                    ' not used

This example doesn't use MISO, but you can set it up and relate it to the SCLK pin as with MOSI (the mode is P_SYNC_RX). The only restriction is that the MOSI and MISO pins need to be within +/- three pins of SCLK.

pub spi_write(value)

  value ror= 8                                                  ' flip MSB/LSB for smart pin
  value rev= 31

  pinl(CSX)                                                     ' select device

  wypin(MOSI, value)                                            ' write value to SPI out
  pinl(MOSI)                                                    ' enable SPI smart pin
  wypin(SCLK, 8)                                                ' clock 8 bits
  repeat until pinr(SCLK)                                       ' let it finish
  pinf(MOSI)                                                    ' reset MOSI smart pin

  pinh(CSX)                                                     ' deselect device


pub pasm_spi_write(value)

  org
                        ror       value, #8                     ' flip MSB/LSB for smart pin
                        rev       value

                        drvl      #CSX                          ' select device

                        wypin     value, #MOSI                  ' write value to SPI out
                        drvl      #MOSI                         ' enable SPI smart pin
                        wypin     #8, #SCLK                     ' clock 8 bits
                        nop                                     ' let it start
                        testp     #SCLK                 wc      ' wait for it to finish
        if_nc           jmp       #$-1

                        fltl      #MOSI                         ' reset SPI smart pin

                        drvh      #CSX                          ' deselect device
  end

To write a byte, load the value into the MOSI pin and enable it, then load SCLK with the number of bits to clock. Reading the IN bit from the SCLK pin will tell you when its finished. Note, too, that the value is always output LSB first from the smart pin, so you may need to flip the bits before transmitting. If CPOL is 1, you can add P_INVERT_OUTPUT to the SCLK configuration bits.

Thank you very much for the SPI details. Will surely check this.

evanh · 2022-08-03 12:21

@chintan_joshi said:
1. uint32_t m_tx = X_RFBYTE_8P_1DAC8 | pgroup<<20 | X_PINS_ON | len; // byte wide transfers from hubRAM, via the FIFO

The symbols used there are listed in the Spin2 Document. They each pertain to streamer mode setting. The "streamer" is documented in the Silicon Document.

And Assembly code as well.

The basic structure of the streamer hardware is a DMA engine that shares the cog's databus to access hubRAM. Each cog has a partnering streamer.

And in front of the databus sits a "FIFO" that performs prefetching on hubRAM reads and buffered write-throughs on hubRAM writes. Certain ops need this FIFO, others don't. The streamer always needs it if accessing hubRAM. Half the DMA engine is this FIFO, since it performs the memory addressing for the streamer.

Also pinout8_dma( GROUP, buff, sizeof(buff) );, why we are sending GROUP = 1 here? i can see code is changing pins (Pin Number 8 to 15) values.

GROUP can be any value from 0 to 7. It specifies which byte sized pin group you want to place the bytes out on. It is intentionally restricted to the groups supported by the streamer modes. eg: GROUP=2 is pins P16..P23.

evanh · 2022-08-03 12:55

Okay, probably should mention why the FIFO needs to exist at all. It's because of the 8-way shared access between the eight cogs. In particular the streamers can't tolerate erratic memory timing.

Underneath all the cog databuses there is a cross-point switching matrix to allow total hubRAM access to all cogs at once. But each cog can only access the same hubRAM bank/slice in its turn/slot. This means there is a 1/8th initial fetch opportunity for each cog. However, once fetching sequential addresses, it can burst read or write at full sysclock to any and all cogs. The mechanism has been dubbed the "eggbeater". And the FIFO makes good use of that burst ability.

Of note is this layer of the eggbeater stacks onto the hubRAM read latency, over and above the 1/8th opportunity. RDFAST instruction can block for 20 sysclock ticks on FIFO filling. The instruction blocking can be skipped if you know there is at least that many ticks between the RDFAST issuing and first use of the FIFO.

evanh · 2022-08-03 13:23

I've added one more case to the test program - pinout8_grp3(). It uses the FIFO but not the streamer.

array size is 1000 bytes.  filldat() in 24324 ticks.
pinout8_bb() in 117040 ticks.
pinout8_grp() in 50034 ticks.
pinout8_grp1() in 44019 ticks.
pinout8_grp2() in 16320 ticks.
pinout8_grp3() in 6088 ticks.
pinout8_dma() in 1080 ticks.

evanh · 2022-08-05 04:34

Chintan,
Here's a generic calculation to convert whole clock dividers into NCO fractions for the streamer (SETXFRQ instruction). It performs the needed rounding to initiate first cycle at the correct clock ratio. Otherwise the non-powers-of-two will extend an extra sysclock tick for the first NCO cycle.

CON
    CLK_DIV = 1    ' sysclock/1 data rate, top speed!
    M_NCO = $8000_0000 +/ CLK_DIV + ($8000_0000 +// CLK_DIV > 0 ? 1 : 0)    ' round up

Again it's written in Spin2. The operators are explicitly unsigned (eg: the +// for unsigned remainder), since the only type in Spin is an undefined 32-bit integer.

EDIT: Here's the C version:

enum {
    CLK_DIV = 1UL,    // sysclock/1 data rate, top speed!
    M_NCO = 0x8000_0000UL / CLK_DIV + (0x8000_0000UL % CLK_DIV > 0UL ? 1UL : 0UL),    // round up
}

evanh · 2022-08-05 15:48

Oops, I'd forgotten what "volatile" does. It prevents optimisation, not anything to do with Fcache. pinout8_grp1() and pinout8_grp2() weren't doing what I thought at all. They were both Fcache'd, just that grp1 got optimised while grp2 didn't.

Or not. The testing tells me it's not that simple either. pinout8_grp2() has a clear speed up. And pinout8_grp3() is cycling on just three instructions! That can only happen if its loop was optimised into a REP block.

I'm thinking volatile's purpose there has changed as FlexC has grown. Docs say:

__asm volatile suppresses optimization of the assembly code (see below) and forces the code to be placed into FCACHE memory.

My guess is the first half of that statement has been revoked.

ersmith · 2022-08-05 19:18

@evanh said:
Oops, I'd forgotten what "volatile" does. It prevents optimisation, not anything to do with Fcache. pinout8_grp1() and pinout8_grp2() weren't doing what I thought at all. They were both Fcache'd, just that grp1 got optimised while grp2 didn't.

Or not. The testing tells me it's not that simple either. pinout8_grp2() has a clear speed up. And pinout8_grp3() is cycling on just three instructions! That can only happen if its loop was optimised into a REP block.

I'm thinking volatile's purpose there has changed as FlexC has grown. Docs say:

__asm volatile suppresses optimization of the assembly code (see below) and forces the code to be placed into FCACHE memory.

My guess is the first half of that statement has been revoked.

No, that's just a bug -- there were a few P2 specific optimizations that weren't checking the flag that says the instruction came from user's volatile asm code, and one of them was the DJNZ -> REP optimizatoin. That's fixed in github now.

evanh · 2022-08-06 04:46

Okay, cool. Updated my code to match that now. Another case added for a hand optimised REP loop. I've commented the source better now too.

array size is 1000 bytes.  filldat() in 24324 ticks.
pinout8_bb() in 119024 ticks.
pinout8_grp() in 50066 ticks.
pinout8_grp1() in 44018 ticks.
pinout8_grp2() in 24312 ticks.
pinout8_grp3() in 16320 ticks.
pinout8_grp4() in 6080 ticks.
pinout8_dma() in 1080 ticks.

chintan_joshi · 2022-08-09 05:43

Hello @evanh i have tried to modify code as per my requirement as shown below.

unsigned char d = hex_t[index][i];
__asm { // allowed to be optimised and hubexec inlined
altsb 2, #0x1fc // SETBYTE indirection prefixer, OUTA base cog register
setbyte d // "val" is indexed into 64-bit OUTA/OUTB register pair
}
_pinw(CLK, 0);
_waitx(1);
_pinw(CLK, 1);
Here i am taking 1 byte from hex_t[index][i]. and setting up pins 16 to 23 using group 2 and loading with falling edge of the clock pin(Shift Register coding). But i am not able to see any change on pins 16 to 23 with these instructions. What am i doing wrong here? can you please suggest?

Also what is meaning of altsb pingroup, #0x1fc instruction?

evanh · 2022-08-09 06:55

Okay, you're attempted to hard code the pin-group. That was the sole job of the ALTSB prefixing instruction - to make pin-group a variable - with address of OUTA, 0x1fc, as its base address. So ALTSB is of not much value when the pin-group is hard coded.

The bug is that you've used 2 instead of #2. In Pasm, a plain number is treated as a register address for Register Direct addressing mode. #2 is the literal for Immediate addressing mode.

EDIT: And, of course, as is often the case, D operand of ALTSB can't be an Immediate literal ... so, either reintroduce the pgroup variable or remove the ALTSB prefix ...

__asm { // allowed to be optimised and hubexec inlined
    setbyte outa, d, #2    // pins P16..P23
}

PS: Another of the reference docs is the instruction table - https://docs.google.com/spreadsheets/d/1_vJk-Ad569UMwgXTKTdfJkHYHpc1rZwxB-DcIiAZNdk/edit?usp=sharing

Require specific Guide/Manual for Propeller 2 GPIO coding in c/c++

Comments