STREAMER - I beg you example !

Ramon · 2021-03-27 03:45

Hi!

I beg you all to please help me to figure out how to code this :

I have these two LONGs (*):

data      byte    $01, $08, $80, $07
          byte    $00, $FF, $F0, $1F

And I want to serially send a 1-bit stream into pin number 32 in a loop:

  repeat
     0000 0001 0000 1000 1000 0000 0000 0111 ' Send to pin 32
     0000 0000 1111 1111 1111 0000 0001 1111 ' Send to pin 32

Sorry, I am not able to write this simple test code just by reading the docs, without any example.

I was looking at the streamer, but actually I don't know if there is any other way much more efficient (like LUT / LOOKUP, or whatever ...)

Extra1 : What would be the fastest possible bitrate?
Extra2 : What would be the fastest possible bitrate if we need to transmit 1520 bytes?

evanh · 2021-03-27 04:16

Sending ... and into pin 32. Are you wanting the streamer outputting or inputting? And is this clocked SPI?

evanh · 2021-03-27 04:29

Chip's flash_loader.spin2 is a good demo of using the streamer for both burning the EEPROM and then booting from it.

Ramon · 2021-03-27 04:51

Hi Evanh

Thanks. I want to output to pin 32. It's not SPI, but will need a fixed frequency (which frequency is not important, the highest the better).

Be warned that you are dealing with an extremely stupid guy, so reading the 276 lines of flash_loader.spin2 (even if it has comments) is as useful to me as reading the current DOCs (a wonderful technical document, but without a single simple example).

evanh · 2021-03-27 06:42

It's not a coherent source of info but there was a long conversation with Mike (msrobots) a couple years back that appears to be similar in idea - https://forums.parallax.com/discussion/170216/ringbuffer-was-streamer-questions-how-to-sync/p1

My own experimenting was very messy but I do still have the sources that I could dig up.

EDIT: I used 32-bit data width there and it later morphed into hyperRAM testing at 8-bit data width. But there's nothing stopping it being 1-bit wide instead.

EDIT2: It was all Pasm2 back then. Do you prefer Spin (with Pasm where needed) examples?

EDIT3: I think what Mike was doing is negotiate a starting time between the two props then blindly bursting. It worked well for me in a single Prop2 between two cogs, via the physical pins, but between two Propellers might be more difficult. It would depend on all propellers seeing the same external clock on the XI pin.

EDIT4: Ha, I tell a lie about pasm only. It looks like Mike was using Eric's extended fastspin.

Ramon · 2021-03-27 08:03

Thank you for your help Evanh!

Wow! Your streamer example is 689 lines of code, and msrobots' example is even more complex, doing simultaneous RX/TX on different cogs and using three file objects.

Please, I beg again, could it be possible to have a **SIMPLE** code example? (something really simple like the pseudo code in the first post).

Ariba · 2021-03-27 08:14

Here is a very simple example in Spin. Be aware that the streamer can output the bits at up to the full sysclock frequency; I don't know if you have a scope fast enough to verify this. So I have set the frequency to 1/32 sysclock (~5.6 MHz @ 180MHz).

{{ streamer bit out test }}

  _clkfreq = 180_000_000

  OUTPIN   = 32

PUB main() | md
  md := %0100_0000_1<<23 + OUTPIN<<17 + 32  'streamer mode: imm 32x1, 32 bits
  org
        setxfrq ##$80000000 / 32            'streamer freq: 1/32 sysclock
        drvl    #OUTPIN                     'pin = output, low
  end
  repeat
    org
          xinit   md, ##%0000_0001_0000_1000_1000_0000_0000_0111    'output 2x32 bits
          xcont   md, ##%0000_0000_1111_1111_1111_0000_0001_1111
    end
    waitms(100)                             'repeat every 100ms

Andy

Ramon · 2021-03-27 08:39

Ariba, thank you so much !!

I will start with that code, and try to make some debugs.

NOTE to myself: This thread seems to match, and it has also interesting details about how the streamer compares with smartpin serial synchronous input / output modes -> https://forums.parallax.com/discussion/171086/faster-spi-bus-transfers/p1

(It's hard to get around all these concepts. I think that the docs should have included some simple code examples and graphs!)

evanh · 2021-03-27 09:05

_clkfreq = 10_000_000 should help with seeing the data with a scope. Can up the clock rate after you've got the sequencing sorted.
EDIT: _clkfreq = 4_000_000 is about as low as allowed in Spin auto-config. I have handcrafted 2 MHz for this sort of testing before.

evanh · 2021-03-27 09:25

Ariba's example is using what Chip calls "Immediate mode". It spits out whatever data is in the S operand of XINIT/XCONT/XZERO. When you look at the hardware manual for the streamer, you'll see five groups of outputting modes. The two immediate modes use S operand but the remaining three modes all use hubRAM as the data source, with the fifth mode being specifically for video output.

Ramon · 2021-03-27 10:38

Hi Evanh,

Yes, thank you. I am doing exactly that (lowering the clock as much as possible to be able to do a simple quick LED test first, at around 1Hz, using my eyes as a scope). 4MHz is around the lowest I can get. Unfortunately it is still too fast for a simple led-eye-scope test.

Also, I am not sure about the relationship between the NCO and the system frequency. According to the instruction set, the allowed values are between #0 and #511. I guess that the higher the value, the longer the delay will be (as we will need to wait until it reaches zero). Is that correct?

   SETXFRQ {#}D    EEEE 1101011 00L DDDDDDDDD 000011101   ' Set streamer NCO frequency to D.

$ cat streamer_bit_out_test.lst | grep setxfrq
00620     1D 00 64 FD |         setxfrq #0
00620     1D 20 64 FD |         setxfrq #16
00620     1D 40 64 FD |         setxfrq #32
00624     1D FE 67 FD |         setxfrq ##4095
00620     1D FE 67 FD |         setxfrq #511
00620     1D E8 67 FD |         setxfrq #500
00620     1D FC 67 FD |         setxfrq #510
00620     1D FE 67 FD |         setxfrq #511
00624     1D 00 64 FD |         setxfrq ##512

   FD64001D - 1111 1101011 001 000000000 000011101   ' setxfrq #0
   FD67FE1D - 1111 1101011 001 111111111 000011101   ' setxfrq #511

 NOTE: using ## does not raise the flexprop warning "immediate operand <#> out of range"
       using a single # only allows values from 0 .. 511

I am using the following test code with flexprop 5.2 and Retroblade2 (that's why I also limit debug to 115_200)

{{ streamer bit out test }}
CON

  _clkfreq = 4_000_000
  DOWNLOAD_BAUD = 115_200
  DEBUG_DELAY = 100
  DEBUG_BAUD = 115_200

  rx_pin   = 63
  tx_pin   = 62
  baud     = 115_200

  OUTPIN   = 36

OBJ
  ser: "spin/SmartSerial"

PUB main() | md


  ser.start(rx_pin, tx_pin, 0, baud)
  md := %0100_0000_1<<23 + OUTPIN<<17 + 32  'streamer mode: imm 32x1, 32 bits
  org
        'setxfrq ##$80000000 / 32            'streamer freq: 1/32 sysclock
        setxfrq #511
        drvl    #OUTPIN                     'pin = output, low
  end
  repeat
    ser.printf("\n[START]")
    org
          'xinit   md, ##%0000_0001_0000_1000_1000_0000_0000_0111    'output 2x32 bits
          'xcont   md, ##%0000_0000_1111_1111_1111_0000_0001_1111
          xinit   md, ##%1111_1111_1111_1111_1111_1111_1111_1111    'output 2x32 bits
          xcont   md, ##%0000_0000_0000_0000_0000_0000_0000_0000
    end
    ser.printf("\n[END]")
    waitms(1000)                             'repeat loop every second

evanh · 2021-03-27 11:53

Okay, NCO (Numerically-controlled oscillator) is not a delay timer in the simple sense. They're a little fancier because they can produce a fractional average rate. Albeit as a dither.

There are two parameters: One is the update period, for SETXFRQ this is simply the sysclock itself. The second parameter is an amount to accumulate on each update. So, an amount of, say, 200 will, on each sysclock, add 200 to the accumulator inside the NCO.

The output of the NCO is bit31 of its accumulator. Adding 200 each sysclock will take over 10.7 million sysclocks to trigger one streamer cycle. In your case, a single bit is outputted on pin 36.

So, to make the NCO generate a streamer trigger on every sysclock needs the maximum value of $8000_0000. Other examples are provided in the hardware doc.

BTW: You're not far off one bit per second with setxfrq #511 and clkfreq of 4 MHz. setxfrq #268 will give you a bit every two seconds.
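
To check these numbers without hardware, here is a minimal Python sketch of the NCO arithmetic described above. It is only a model of the accumulator maths, not P2 code, and the function names are made up for illustration. It assumes one streamer trigger each time the accumulator crosses bit 31, as described in this post.

```python
# Model of the streamer NCO arithmetic (illustrative only, not P2 code).
NCO_ROLLOVER = 1 << 31  # the NCO output is bit 31 of its accumulator


def bit_rate_hz(sysclk_hz, xfrq):
    """Average streamer triggers (bits) per second for a SETXFRQ value."""
    return sysclk_hz * xfrq / NCO_ROLLOVER


def xfrq_for(sysclk_hz, target_hz):
    """SETXFRQ value for a target bit rate, truncated like an integer divide."""
    return target_hz * NCO_ROLLOVER // sysclk_hz


# Adding 200 per sysclock: sysclocks needed for one trigger (over 10.7 million)
print(NCO_ROLLOVER / 200)

# 4 MHz sysclock: seconds per bit for #511 (about 1.05) and #268 (about 2.0)
print(1 / bit_rate_hz(4_000_000, 511))
print(1 / bit_rate_hz(4_000_000, 268))

# Value for roughly 1 bit per second at 4 MHz, and for sysclock/32 at 180 MHz
print(xfrq_for(4_000_000, 1))
print(hex(xfrq_for(180_000_000, 180_000_000 // 32)))
```

The last line gives $0400_0000, which is $8000_0000 divided by 32, so the result depends only on the ratio of bit rate to sysclock.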

evanh · 2021-03-27 12:03

Right, the next thing is large immediates (##number) in pasm2. They do work. You've just chopped out too much of the listing using grep like that.

The assemblers generate a prefixing instruction, called AUGD or AUGS, that sets up a hidden register(s) with the extra bits to complete a full 32-bit immediate number in the code.

evanh · 2021-03-27 12:27

You also might prefer seeing the most significant bit first:

md := %0100_0000_1<<23 + OUTPIN<<17 + 1<<16 + 32  'streamer mode: imm 32x1, msb, 32 bits

Ramon · 2021-03-27 12:28

(It was not a single grep, but the many tests I did. I removed the commands and merged the output to avoid wasting internet bandwidth.)

The P2 instruction set document doesn't say that this instruction needs AUGD.
But yes, the complete listing shows AUGD preceding SETXFRQ.

org
        setxfrq ##$80000000 / 32            'streamer freq: 1/32 sysclock
        drvl    #OUTPIN                     'pin = output, low

00608     00 00 FE FF
0060c     1D 00 64 FD |         setxfrq ##-67108864
00610     58 40 64 FD |         drvl    #32

  FFFE0000 - AUGD    $7E0000     -  %1111 11111_11 111 100000000 000000000
  FD64001D - SETXFRQ %000000000  -  %1111 1101011  001 000000000 000011101
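
The split shown in that decode can be reproduced with a short Python sketch. It only models the encodings visible in the listings in this thread (AUGD carrying the upper 23 bits, SETXFRQ carrying the lower 9 bits in its D field). The function names are hypothetical and this is not an assembler.

```python
# Model of how "##value" is split across AUGD + SETXFRQ (illustrative only).
AUGD_BASE = 0xFF80_0000     # cond %1111, %11111, then a 23-bit payload
SETXFRQ_BASE = 0xFD64_001D  # encoding of "setxfrq #0" from the listing


def setxfrq_short(value):
    """Encode 'setxfrq #value'; only 9 bits fit in the D field."""
    if not 0 <= value <= 0x1FF:
        raise ValueError("immediate operand out of range for a single #")
    return SETXFRQ_BASE | (value << 9)


def setxfrq_long(value):
    """Encode 'setxfrq ##value' as an (AUGD, SETXFRQ) pair of longs."""
    value &= 0xFFFF_FFFF                       # wrap negatives to 32 bits
    augd = AUGD_BASE | (value >> 9)            # upper 23 bits go in AUGD
    instr = SETXFRQ_BASE | ((value & 0x1FF) << 9)  # lower 9 bits in D
    return augd, instr


print(hex(setxfrq_short(511)))
print([hex(x) for x in setxfrq_long(-67108864)])
print([hex(x) for x in setxfrq_long(4095)])
```

This also explains the merged grep output: with the AUGD line filtered out, `setxfrq ##4095` looks identical to `setxfrq #511`, and `setxfrq ##512` looks identical to `setxfrq #0`, because only the low 9 bits are in the SETXFRQ long.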

I have just done a quick test with the scope.
I tried zillions of combinations and was not able to get a simple LED toggling.

I was thinking that I didn't set up my scope properly, but a simple test with pintoggle (at 4MHz sysclock) showed a 1MHz frequency.
And I was able to detect any low (or high) frequency with a simple pintoggle, but was not able to emulate the same with the streamer:

I modified the immediate values to make it square wave:

    org
          xinit   md, ##%1111_1111_0000_0000_1111_1111_0000_0000
          xcont   md, ##%1111_1111_0000_0000_1111_1111_0000_0000
    end

Maybe the key is in the md value. I will look at it later.

Thanks for your detailed comments !

TonyB_ · 2021-03-27 12:30

@evanh said:
BTW: You're not far off one bit per second with setxfrq #511 and clkfreq of 4 MHz. setxfrq #268 will give you a bit every two seconds.

Or use a lower setxfrq and higher clkfreq. Has anyone ever used setxfrq #1?

evanh · 2021-03-27 12:57

@Ramon said:
The P2 instruction set document doesn't say that this instruction needs AUGD.

That's because those prefixes can be applied to almost all instructions. Not ALTx.

Maybe the key is in the md value. I will look at it later.

Oh, it's the waitms(). One second is far too short if each bit time is greater than a second. It'll just keep resetting the XINIT as it loops.

As an aside, you would have seen something if you had three consecutive streamer commands. The XINIT is started immediately, and the XCONT is buffered immediately. If a second XCONT was after that it would have blocked, stalling the Cog, waiting for the buffer to free up.
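
To put numbers on the waitms() problem, here is a small Python sketch using the settings from the test program (4 MHz sysclock, setxfrq #511, two 32-bit commands). It is only arithmetic, with a made-up helper name, and it ignores the time taken by the serial prints.

```python
# How long one XINIT + XCONT burst takes versus the loop period (model only).
NCO_ROLLOVER = 1 << 31


def burst_seconds(sysclk_hz, xfrq, bits):
    """Time for the streamer to shift out `bits` bits at a SETXFRQ value."""
    return bits * NCO_ROLLOVER / (sysclk_hz * xfrq)


total = burst_seconds(4_000_000, 511, 64)  # XINIT + XCONT, 32 bits each
loop_period = 1.0                          # waitms(1000)

print(round(total, 1))                   # seconds needed for all 64 bits
print(int(loop_period // (total / 64)))  # whole bits that fit in one loop pass
```

With these values the burst needs over a minute, while the loop issues a new XINIT roughly every second, so not even one full bit time passes before the streamer is restarted.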

evanh · 2021-03-27 13:33

Ha, I just noticed that the assembled machine code for Ariba's setxfrq ##$80000000 / 32 is wrong. Or at least not what was intended. It has been treated as a signed number when the intent was unsigned.
setxfrq ##$8000_0000 >> 5 should work correctly. If not then this: setxfrq ##$0400_0000
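
A quick way to see the difference is to model both operations in Python. This is a sketch of 32-bit signed division next to a plain shift, with hypothetical helper names. It is not what any particular assembler does internally, but the signed result matches the `##-67108864` in the listing above.

```python
# Signed vs unsigned handling of $8000_0000 / 32 (illustrative model).
def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= 0xFFFF_FFFF
    return x - (1 << 32) if x & 0x8000_0000 else x


def sdiv32(a, b):
    """32-bit signed division, truncating toward zero, result as 32 bits."""
    a, b = to_signed32(a), to_signed32(b)
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return q & 0xFFFF_FFFF


signed_result = sdiv32(0x8000_0000, 32)  # $8000_0000 read as negative
unsigned_result = 0x8000_0000 >> 5       # the intended sysclock/32 value

print(hex(signed_result), to_signed32(signed_result))
print(hex(unsigned_result))
```

The signed path yields $FC00_0000 instead of the intended $0400_0000, which is why the shift (or an unsigned divide) is the safer way to write it.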

Ramon · 2021-03-28 01:32

Thank you all !!

Evanh, yes, you are right. The instruction setxfrq ##$80000000 / 32 is being treated as signed, and the correct one should be setxfrq ##$8000_0000 >> 5.

I also needed to slightly modify Ariba's code to make it work. I moved the xinit instruction outside the loop, and kept only xcont inside the loop. I was able to toggle pin 36 (1Hz) with this code:

{{ streamer bit out test -- toggle pin 36 at 1 Hz }}
CON

  _clkfreq = 4_000_000
  DOWNLOAD_BAUD = 115_200
  DEBUG_DELAY = 100
  DEBUG_BAUD = 115_200

  OUTPIN   = 36

PUB main() | md

  md := %0100_0000_1<<23 + OUTPIN<<17 + 32  'streamer mode: imm 32x1, 32 bits
  org
        setxfrq ##536            '1Hz for 4MHz clock
        drvl    #OUTPIN                     'pin = output, low
        xinit   md, ##%1010_1010_1010_1010_1010_1010_1010_1010
  end
  repeat
    org
          xcont   md, ##%1010_1010_1010_1010_1010_1010_1010_1010
    end

evanh · 2021-03-28 02:03

For what you're doing now, it actually doesn't need the XINIT line at all.

XINIT just guarantees an immediate restart of the streamer - Including the NCO I suspect. But when it's not doing anything it will happily startup from an XCONT.

Ramon · 2021-03-28 02:10

I remember that I tried many combinations and it only worked this way.

Well, now the big question: How to make it work to read from an array of bytes (instead of using immediate values)?

evanh · 2021-03-28 02:19

I've just loaded up your example and removed the XINIT line. It's working fine.

evanh · 2021-03-28 02:24

@Ramon said:
Well, now the big question: How to make it work to read from an array of bytes (instead of using immediate values)?

Three options: You can feed more S operands from cogRAM, or reference lutRAM with a map in S, or setup a RDFAST from hubRAM using the FIFO.

The FIFO option allows larger data blocks and is more autonomous, but the data must be at consecutive addresses within the hubRAM blocks. The hubRAM data can also reference lutRAM.

evanh · 2021-03-28 03:43

PS: More than one block requires the cog to manage the FIFO with FBLOCK instructions.

Ramon · 2021-03-28 04:23

@evanh said:
For what you're doing now, it actually doesn't need the XINIT line at all.

XINIT just guarantees an immediate restart of the streamer - Including the NCO I suspect. But when it's not doing anything it will happily startup from an XCONT.

Agree, for the first sentence only.

I have done more tests. There seems to be something wrong with XINIT.

- XINIT **inside the repeat block** will not make the pin toggle.
- XINIT before the repeat block will not make the pin toggle immediately; it **MUST** be followed by XZERO (or XCONT) in a repeat loop (without a repeat loop it will not toggle).
- XINIT is optional (as you have said).

evanh · 2021-03-28 04:27

XINIT inside the REPEAT prevents buffering of streamer commands because part of the restart feature is to empty the command buffer. It just endlessly restarts in a tight loop.

Ariba · 2021-03-28 04:28

Lutram and Rdfast are not really usable within Spin2, so the easiest way is to read the array values into a local variable and send them via streamer immediate mode:

{{ streamer bit out test -- toggle pin 36 at 1 Hz }}
CON

  _clkfreq = 180_000_000
  DOWNLOAD_BAUD = 115_200
  DEBUG_DELAY = 100
  DEBUG_BAUD = 115_200

  OUTPIN   = 36

DAT
myarray  long  $FFFF00FF,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15

PUB main() | md,i,v

  md := %0100_0000_1<<23 + OUTPIN<<17 + 32  'streamer mode: imm 32x1, 32 bits
  org
        setxfrq #$8000_0000 +/ _clkfreq     '1Hz
        drvl    #OUTPIN                     'pin = output, low
        xinit   md, ##%1010_1010_1010_1010_1010_1010_1010_1010
  end
  repeat
    repeat i from 0 to 15
      v := myarray[i]
      org
          xcont   md, v
      end

(This code also shows that you can use unsigned divide with $8000_0000 for calculating the streamer freq.)
For sure Spin2 limits the max. possible streamer frequency to a few MHz (repeat loop freq. * 32) if you need a contiguous bitstream.
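
If the source data is a byte table like the one in the first post rather than longs, the bytes need to be grouped into 32-bit values first. The Python sketch below shows how they pack, given that P2 hub memory is little-endian. The helper name is hypothetical, and which bit leaves the pin first depends on the streamer mode, which is not modelled here.

```python
import struct


def bytes_to_longs(data, byteorder="<"):
    """Group bytes into 32-bit longs; '<' is little-endian, '>' big-endian."""
    data = bytes(data)
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    return list(struct.unpack("%s%dI" % (byteorder, len(data) // 4), data))


table = [0x01, 0x08, 0x80, 0x07, 0x00, 0xFF, 0xF0, 0x1F]  # from the first post

as_stored = bytes_to_longs(table)        # what a long read from hub RAM sees
as_written = bytes_to_longs(table, ">")  # the bit patterns as typed in post 1

print([hex(x) for x in as_stored])
print([hex(x) for x in as_written])
print(len(bytes_to_longs(bytes(1520))))  # longs needed for 1520 bytes
```

So reading the byte table as longs gives $0780_0801 and $1FF0_FF00, not the left-to-right patterns from the first post. If the order on the wire matters, the bytes may need swapping before they go into a long table like myarray.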

Andy

evanh · 2021-03-28 04:49

I suspect the problem with the single XINIT test would've been that the whole program terminated on you, thereby cancelling the streamer actions. This works:

{{ streamer bit out test -- toggle pin 56 at 1 Hz }}
CON

  _clkfreq = 4_000_000
  DOWNLOAD_BAUD = 115_200
  DEBUG_DELAY = 100
  DEBUG_BAUD = 115_200

  OUTPIN   = 56

PUB main() | md

  md := %0100_0000_1<<23 + OUTPIN<<17 + 32  'streamer mode: imm 32x1, 32 bits
  org
        setxfrq ##536            '1Hz for 4MHz clock
        drvl    #OUTPIN                     'pin = output, low
        xinit   md, ##%1010_1010_1010_1010_1010_1010_1010_1010
  end
  repeat

Ramon · 2021-03-28 08:23

I have just asked Eric on a separate thread (FlexProp thread).

XINIT seems fine. The problem is how FlexProp handles an empty in-line PASM (org .. end) inside a repeat loop.

It seems to be a FlexProp bug; PNut v35k does not have this issue.

evanh · 2021-03-28 09:41

Yep, I intentionally removed the empty ORG/END to avoid that.

evanh · 2021-03-28 10:03

@evanh said:
You also might prefer seeing the most significant bit first:
md := %0100_0000_1<<23 + OUTPIN<<17 + 1<<16 + 32  'streamer mode: imm 32x1, msb, 32 bits

I just tested this mode variation and I'm not seeing any difference in effect. The bit order stays the same. I've also tried at 2-bit width and no effect there either. Both bit order and 2-bit word order seem unchanged by the "a" bit being set. Not sure what the story is right now.

I think I used it once before a long time back with hubRAM transfers. But that was 32-bit word width. EDIT: Hmm, or maybe not. It only applies to 4-bit or less word size.
