Buffering example

kwagner · 2021-08-11 13:05

Hi all,

I've wanted to dive into the Propeller for years, and the P2 seems like the perfect solution/excuse for a project at work. I've got the developer starter bundle and eval board accessory set to play around with, and have run some examples. I'm having some trouble adapting the examples to what I need, so I thought I'd ask here.

The basic execution of the program is to put signals (in half microsecond resolution) at various spots along a buffer. Ideally this would be an analog signal, but I can make it work with digital as well. I know how to bit bang a pin, but I'd like to take advantage of the smart buffering capabilities of the system, and I'm just not able to parse the documentation around it right.

I thought the VGA examples might be a good place to look, but I haven't been able to modify them properly to remove the front/back porch signals and create a continuous buffer.

Is there a simple "hello world" version of buffering I can reference? I'm flexible on language, spin/basic/c/assembly is fine.

Thanks!

Cluso99 · 2021-08-11 13:27

It’s easy to remove the front/back porch code from the vga assembly code once you realise how simple it is. Unfortunately I’m not at my pc so cannot post the bits atm.
Perhaps someone else can chime in.

I’m presuming you just want to output a buffer continuously, and will change that buffer’s data from another cog when required. That’s the easiest way, although there would be time available in the same cog.

kwagner · 2021-08-11 13:34

Yes, plan would be to have one cog just outputting the buffer, and another modifying it.

I found the front/back porch sections, but when I either removed the jumps to them or set their config values to 0 for the signal timings, I got no output. I assume that there's some logic in those sections that is required to reset certain variables used in the main section, but it eluded me at the time. I'll give it another go while waiting for further responses.

Rayman · 2021-08-11 15:44

In case you're looking to do direct digital synthesis (DDS), here's an example that I copied from Chip Gracey.
It creates waveforms in a buffer and then repeatedly outputs them in a loop on four pins.
Chip's original file to HDMI is here in this forum (somewhere).

Found it: https://forums.parallax.com/discussion/170820/hdmi-4-channel-oscilloscope-demo/p1

kwagner · 2021-08-11 16:55

Thanks, Rayman!

I don't think DDS will be a fit because the waveforms being buffered are not repeating, but I'll take a look to see if I can learn more. That was why I was attracted to the arbitrary nature of the VGA examples, but there might be too much extra in the drivers to trim down. I gave this pong version a try https://forums.parallax.com/discussion/171867/a-try-version-of-pong and then commented out the blanking commands, but I feel like I'm going at this backwards.

Edit: Maybe the "full duplex serial" example would be a better start, if I can up the baud rate to get as high of a resolution as I need?

Cluso99 · 2021-08-13 06:07

I have just looked at the VGA code and I'm a little unsure how to extract the part you need.
PM Chip with your question and a reference to this thread and I'm sure he'll post what you need to do.

Basically, you just need to run a continuous loop with xcont after setting up the streamer pins and setxfrq, as is done in the driver section of the code. So something like this...

.field
        rdfast  #0,screen_base      'set fifo to read from current screen row
        xcont   m_px,pa             'output nnnn single-bit pixels to streamer
        jmp     #.field

m_px        long    $0F0C0000+nnnn  'nnnn single-bit pixels
screen_base res 1       'set to the address of the screen buffer in hub

I am not sure if m_px is correct or not.
Likewise, I am uncertain whether the buffer that screen_base points to has to be in cog/lut/hub.

You might have to copy the hub pixel code into cog/lut between the rdfast and xcont instruction using a SETQ or SETQ2.
It's definitely not as simple as I thought without understanding the precise streamer function.

kwagner · 2021-08-13 11:55

Thanks for looking. Yeah, I gave the VGA example a go yesterday and it's very optimized for the extra stuff it's trying to do, so making it do less got complicated for me quickly. I'll send a PM, thanks!

Edit: being newer to the forum, I'm unsure of Chip's handle (and searching for "chip" on a microcontroller forum yields the wrong results).

evanh · 2021-08-13 12:50

Chip is otherwise occupied and hasn't been posting on the forums for some weeks now. Last logged in over a week ago.

evanh · 2021-08-13 12:54

The streamer can be operated in an immediate fashion for state change type functions. A smartpin can do similar. What activity are you after?

kwagner · 2021-08-13 13:47

Evan:
I have a program that generates an arbitrary waveform, ideal size is 3ms worth at a 0.05us resolution.
The way I do this now in C on linux is I have a frontbuffer and backbuffer (like writing to a display), the one thread writes to the backbuffer while the other is sending the frontbuffer over the pin.
After the buffer is sent, the threads essentially flip buffers so the one is writing to the frontbuffer while the other sends the backbuffer.
What I want to do on the propeller is the same thing, have one cog writing to one buffer while another cog handles spitting out the second buffer on the pin.
After the second cog sends the entire buffer, flip buffers (or pointers to them).
I could accomplish the same thing with a single buffer eating its tail if it's easier to make work.
I know how to do it with a higher level language on the propeller, but for speed I need to mix in assembly, and I'm not familiar enough with it to make it do what I want yet.
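The front/back flip described above can be modelled in a few lines of plain C. This is a hypothetical host-side sketch, not Propeller code: on the P2 the two buffers and the `front` index would live in hub RAM shared by both cogs, and the handshake that tells the writer when a flip has happened (a shared flag, for instance) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLES 60000u   /* hypothetical buffer length: 3 ms at 20 MSPS */

/* Hypothetical double-buffer descriptor shared by both cogs. */
typedef struct {
    uint8_t bufs[2][SAMPLES];   /* front and back buffers */
    volatile unsigned front;    /* index (0 or 1) of the buffer being output */
} dbuf_t;

/* Buffer the writer cog may modify: always the one not being output. */
static uint8_t *back_buffer(dbuf_t *d) { return d->bufs[d->front ^ 1u]; }

/* Buffer the output cog is currently sending to the pin. */
static const uint8_t *front_buffer(const dbuf_t *d) { return d->bufs[d->front]; }

/* Called by the output side once the last sample of the front buffer has
   been sent: the freshly written back buffer becomes the new front. */
static void flip(dbuf_t *d) { d->front ^= 1u; }
```

Flipping an index rather than copying data keeps the swap to a single write, which is the same idea as swapping pointers in the Linux version.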

evanh · 2021-08-13 22:06

8-bit DAC I presume? And by 3 ms I guess you only mean the buffer length, right? 20 mega-samples-per-second (MSPS) for 3 ms is 60,000 samples (bytes) per buffer.
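The arithmetic behind the 60,000-sample figure, as a small C sketch (function names are illustrative only): a 0.05 us resolution means a 50 ns sample period, i.e. 20 MSPS, and 3 ms of that is 60,000 one-byte samples.

```c
#include <assert.h>
#include <stdint.h>

/* Sample rate implied by a sample period in nanoseconds (0.05 us = 50 ns). */
static uint32_t rate_from_period_ns(uint32_t period_ns)
{
    assert(period_ns > 0u);
    return 1000000000u / period_ns;
}

/* Samples (bytes, for one 8-bit DAC) needed to cover duration_us
   microseconds at rate_sps samples per second, rounded down. */
static uint32_t samples_per_buffer(uint32_t rate_sps, uint32_t duration_us)
{
    return (uint32_t)((uint64_t)rate_sps * duration_us / 1000000u);
}
```

For example, `samples_per_buffer(rate_from_period_ns(50), 3000)` gives 60000.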

ersmith · 2021-08-13 23:43

@kwagner said:
The way I do this now in C on linux is I have a frontbuffer and backbuffer (like writing to a display), the one thread writes to the backbuffer while the other is sending the frontbuffer over the pin.
After the buffer is sent, the threads essentially flip buffers so the one is writing to the frontbuffer while the other sends the backbuffer.
What I want to do on the propeller is the same thing, have one cog writing to one buffer while another cog handles spitting out the second buffer on the pin.
After the second cog sends the entire buffer, flip buffers (or pointers to them).
I could accomplish the same thing with a single buffer eating its tail if it's easier to make work.
I know how to do it with a higher level language on the propeller, but for speed I need to mix in assembly, and I'm not familiar enough with it to make it do what I want yet.

Are you sure you need assembly to get enough speed? Have you tried compiling with flexspin? With a little bit of hinting to the compiler (placing the main transmit function in LUT) I can get > 3 megabytes/second (24 megabits per second) in C on just one pin, and that's without using smartpins, just bit banging. Here's the code:

// simple buffer streaming test using bit banging
#include <stdio.h>
#include <propeller2.h>

enum {
    _clkfreq = 250000000
};

#define PIN 0

// send a buffer of n bytes out pin PIN as serial, with a start
// bit and stop bit
// put this routine in LUT for speed. It is important that it not call
// out to other functions (except built-ins like _drvh)
void sendbuf(unsigned char *buf, int n) __attribute__((lut)) {
    unsigned x;
    unsigned i;
    do {
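        x = *buf++;
        // frame the byte: bit 31 = 1 (start bit), bits 30..23 = data MSB first,
        // bit 22 = 0 (stop bit); the loop below shifts these 10 bits out MSB first
        x = (x | 0x100) << 23;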
        for (i = 0; i < 10; i++) {
            if (x & 0x80000000) {
                _drvh(PIN);
            } else {
                _drvl(PIN);
            }
            x = x<<1;
        }
    } while (--n != 0);
}

// test harness
unsigned char mybuf[65536];

void main() {
    unsigned cycles;
    unsigned len = sizeof(mybuf);
    unsigned freq = _clockfreq();
    printf("testit...\n");
    cycles = _cnt();
    sendbuf(mybuf, len);
    cycles = _cnt() - cycles;
    printf("sent %u bytes in %u cycles\n", len, cycles);
    printf("  -> %u bytes/sec at %u MHz\n", _muldiv64(len, freq, cycles), freq / 1000000);
}

Compiled with flexspin -2 -O2 foo.c, the output is:

( Entering terminal mode.  Press Ctrl-] or Ctrl-Z to exit. )
testit...
sent 65536 bytes in 4210717 cycles
  -> 3891023 bytes/sec at 250 MHz
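To see exactly what that loop puts on the pin, and where the bytes/sec figure comes from, here is a host-side emulation in ordinary C. It is a sketch for checking the logic on a PC, not part of the flexspin program; `frame_bits` and `bytes_per_sec` are hypothetical names, the latter standing in for the `_muldiv64(len, freq, cycles)` call.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates the inner loop of sendbuf(): stores the 10 pin levels
   (1 = _drvh, 0 = _drvl) emitted for one byte, in output order. */
static void frame_bits(uint8_t byte, uint8_t out[10])
{
    uint32_t x = ((uint32_t)byte | 0x100u) << 23;
    for (int i = 0; i < 10; i++) {
        out[i] = (x & 0x80000000u) ? 1 : 0;
        x <<= 1;
    }
}

/* len * freq / cycles with a 64-bit intermediate, so the product of a
   64 KB length and a 250 MHz clock cannot overflow 32 bits. */
static uint32_t bytes_per_sec(uint32_t len, uint32_t freq, uint32_t cycles)
{
    return (uint32_t)((uint64_t)len * freq / cycles);
}
```

For 0xA5 the pin sequence is 1, then 1,0,1,0,0,1,0,1, then 0, and `bytes_per_sec(65536, 250000000, 4210717)` reproduces the 3891023 shown above.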

kwagner · 2021-08-16 14:06

@evanh said:
8-bit DAC I presume? And by 3 ms I guess you only mean the buffer length, right? 20 mega-samples-per-second (MSPS) for 3 ms is 60,000 samples (bytes) per buffer.

Yes, 8-bit DAC would be ideal, though I can make it work with 1 bit. Correct, 60,000 samples per buffer. I mention the 3ms for timing of the buffer output.

@ersmith said:
Are you sure you need assembly to get enough speed? Have you tried compiling with flexspin?

No and no. Again, I'm new to propeller, so I don't know the optimal way to do anything yet. I'm trying to follow tutorials and examples and poke my way around to what works. So far most examples I've seen are "set a value/buffer once, then spit it out forever", which leaves me with a gap in understanding. I appreciate the example, that's the kind of thing I was looking for. I'll see if I can modify it to have two cogs, one changing the buffer and the other writing it.

evanh · 2021-08-16 18:58

The "FIFO" will help a lot here. Each Cog has one FIFO. It provides a few pieces of a DMA channel. It has internal buffering for handling the hubRAM timings and it can be set to handle multiple blocks within hubRAM. Block size is in increments of 16 longwords (64 consecutive byte addresses).
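One practical consequence of the 64-byte block size: if the FIFO is set to wrap over the buffer, the buffer needs to be a whole number of blocks, and 60,000 bytes is not (60,000 / 64 = 937.5). A small sketch of the rounding, assuming the wrap length is counted in those 64-byte blocks:

```c
#include <assert.h>
#include <stdint.h>

#define FIFO_BLOCK_BYTES 64u   /* 16 longwords per block */

/* Number of whole 64-byte FIFO blocks needed to hold len bytes. */
static uint32_t fifo_blocks(uint32_t len)
{
    return (len + FIFO_BLOCK_BYTES - 1u) / FIFO_BLOCK_BYTES;
}

/* Buffer length padded up to a whole number of FIFO blocks; the padding
   bytes would also be streamed out if the FIFO wraps over the full span. */
static uint32_t fifo_padded_len(uint32_t len)
{
    return fifo_blocks(len) * FIFO_BLOCK_BYTES;
}
```

A 60,000-byte buffer rounds up to 938 blocks (60,032 bytes). Alternatively, choose a sample count that is already a multiple of 64.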

Each Cog also has one "Streamer". The streamer is the other half of a DMA channel. It paces the I/O and has various I/O modes.
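The streamer's pacing is set with SETXFRQ, which loads its NCO. The sketch below computes that value for a target sample rate under a loudly stated assumption: that $8000_0000 means one sample per system clock, so the value is 2^31 * rate / clkfreq. Check the scaling against the silicon documentation before relying on it; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: an NCO value of 0x80000000 = one sample per system clock,
   so value = 2^31 * rate / clkfreq, rounded to the nearest integer. */
static uint32_t xfrq_for(uint32_t rate, uint32_t clkfreq)
{
    assert(clkfreq > 0u && rate <= clkfreq);
    return (uint32_t)((((uint64_t)rate << 31) + clkfreq / 2u) / clkfreq);
}

/* Sample rate produced by a given NCO value under the same assumption. */
static uint32_t rate_for(uint32_t xfrq, uint32_t clkfreq)
{
    return (uint32_t)(((uint64_t)xfrq * clkfreq) >> 31);
}
```

At 250 MHz, 20 MSPS is 12.5 clocks per sample, so an NCO would alternate between 12- and 13-clock sample spacings. A 200 MHz or 300 MHz system clock gives an exact 10 or 15 clocks per sample.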

I'm off to work right now. Read up on those two. The FIFO is the simpler of the two, and it can be used by the Cog without invoking the Streamer so you can have a play with it by itself.

msrobots · 2021-08-16 20:15

LUT sharing might be another solution.

You can start two COGs as a pair and have the second COG allow the first one write access to its LUT.

Each write the first COG makes to its own LUT then also appears in the LUT of the second COG, giving you a 2K buffer between those two COGs without any HUB buffering or HUB timing constraints.

As always, there are many ways to skin a cat.
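A 2K LUT is 512 longs, so a shared LUT naturally behaves like a 512-entry ring. The sketch below models that with a plain C array and free-running read/write counters. It is hypothetical: on the P2 the reader cog would also need to learn the writer's position somehow (for example via another shared location), which is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define LUT_LONGS 512u   /* 512 longs = 2 KB of lutRAM */

/* Hypothetical model of the shared LUT as a single-producer,
   single-consumer ring; only the low 9 bits of each counter index it. */
typedef struct {
    uint32_t lut[LUT_LONGS];
    uint32_t wr;   /* total longs written (free-running) */
    uint32_t rd;   /* total longs read (free-running) */
} lut_ring_t;

/* Writer side: returns 0 if the ring is full. */
static int ring_put(lut_ring_t *r, uint32_t v)
{
    if (r->wr - r->rd == LUT_LONGS) return 0;
    r->lut[r->wr & (LUT_LONGS - 1u)] = v;
    r->wr++;
    return 1;
}

/* Reader side: returns 0 if the ring is empty. */
static int ring_get(lut_ring_t *r, uint32_t *v)
{
    if (r->wr == r->rd) return 0;
    *v = r->lut[r->rd & (LUT_LONGS - 1u)];
    r->rd++;
    return 1;
}
```

Unsigned subtraction of the two counters stays correct across wraparound, so no separate "full" flag is needed.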

Mike

kwagner · 2021-08-17 00:58

@evanh: thanks for the reading material. I'm digesting that stuff now.

@msrobots said:
As always there are many ways to skin a cat,

Agreed, I just wish I understood what they all are.
LUTs are an interesting idea. I wanted to try using this example as a base:
https://github.com/parallaxinc/propeller/blob/master/resources/FPGA Examples/VGA_640_x_480_8bpp.spin2
But when I run it (in FlexProp), I don't seem to get any output on the pins.
Not sure if I'm doing something wrong, or it's because of differences with the production P2 from the FPGA version.
But it's a very concise VGA example without all the tile driver stuff, which I should be able to adapt pretty easily once I get it running.

rogloh · 2021-08-17 01:37

@kwagner said:
@evanh: thanks for the reading material. I'm digesting that stuff now.

@msrobots said:
As always there are many ways to skin a cat,

Agreed, I just wish I understood what they all are
LUT's are an interesting idea. I wanted to try using this example as a base:
https://github.com/parallaxinc/propeller/blob/master/resources/FPGA Examples/VGA_640_x_480_8bpp.spin2
But when I run it (in FlexProp), I don't seem to get any output on the pins.
Not sure if I'm doing something wrong, or it's because of differences with the production P2 from the FPGA version.
But it's a very concise VGA example without all the tile driver stuff, which I should be able to adapt pretty easily once I get it running.

If this is an FPGA example, there is probably a good chance the streamer commands are for the older rev A P2, not rev B/C.

Rayman · 2021-08-17 02:14

That is definitely an old FPGA example as it does not set the P2 clock to anything real...

These here may be more recent:
https://forums.parallax.com/discussion/172624/simple-vga-wvga-spin2-examples

Rayman · 2021-08-17 02:16

With the new Spin2 and P2 all you need to do is define _clkfreq and it magically does it for you...

evanh · 2021-08-17 04:36

If you want to incorporate example Spin objects into C, it's painless, for the simple stuff at least. I only just tried out my first one last night using one of Jonny Mac's I2C examples. https://github.com/parallaxinc/propeller/tree/master/libraries/community/p2/All/jm_i2c_devices

I ported his device listing example to C and used his jm_i2c.spin2 object for the low level read/write.

evanh · 2021-08-17 07:38

And using Rayman's HDMI example from his above link:

enum {
    _clkfreq = 300_000_000,
};

struct __using( "WVGA_HDMI_Simple1a.spin2" ) hdmi;

void  main( void )
{
    hdmi.Start();
}

ersmith · 2021-08-17 12:00

@kwagner said:

@ersmith said:
Are you sure you need assembly to get enough speed? Have you tried compiling with flexspin?

No and no. Again, I'm new to propeller, so I don't know the optimal way to do anything yet. I'm trying to follow tutorials and examples and poke my way around to what works. So far most examples I've seen are "set a value/buffer once, then spit it out forever", which leaves me with a gap in understanding. I appreciate the example, that's the kind of thing I was looking for. I'll see if I can modify it to have two cogs, one changing the buffer and the other writing it.

Don't worry too much about the "optimal" way to do things... as others have said, there are many ways to skin the P2 cat, and frankly as long as you can get something that works fast enough for your needs, that's all that really matters.

kwagner · 2021-08-17 19:44

Thanks all, especially @Rayman for that USB mouse changing bitmap example in the link. That's exactly what I needed as a base to fiddle with. Made lots of progress today and I have some arbitrary waveforms working.

kwagner · 2021-08-18 13:22

Attached is the code I have currently for a 3ms buffer showing on two pins: a 3.3V to mark the start of the buffer for reference on an oscope, and a ~1.7V signal bouncing back and forth across the buffer.

The code is based off of one of @Rayman's VGA examples. I actually want to use all three RGB pins, as I will be doing multiple synchronous waveforms. What I don't understand is how the values correlate to pins and intensities. It originated from outputting an 8bpp bitmap, which it did just fine color-wise. I did some experimenting with numeric values and found some that worked for now, but I could use some help understanding that portion to take the code further.

evanh · 2021-08-18 13:46

I suspect you'll be wanting a different streamer mode that just pipes the data verbatim from hubRAM to the DACs, instead of via the lutRAM tables. Namely X_RFLONG_32P_4DAC8 | X_DACS_3_2_1_0

If you do want lutRAM use then X_RFLONG_4X8_LUT | X_DACS_3_2_1_0 is the 256 entry lookup. And X_RFLONG_8X4_LUT | X_DACS_3_2_1_0 is the 16 entry lookup.

kwagner · 2021-08-18 17:55

Ah, that makes sense. Yes, I want to go from hubRAM to the DACs.

I'm trying to reconcile the streamer modes you listed, which are in the silicon documentation on pages 30-32, with the wrpin values discussed in the smart pin documentation on pages 6-9. The streamer modes are listed as 16-bit values, while doing wrpin shows a value like %AAAA_BBBB_FFF_PPPPPPPPPPPPP_TT_MMMMM_0 with 13 bits for pin control.

How would I properly change my code to use the builtin X_RFLONG_32P_4DAC8 | X_DACS_3_2_1_0 from the 0000_0000_000_1011100000000_01_00000_0 it has listed now?

Edit: Ok, I see my mistake now. The streamer mode is obfuscated up in the assembly here:
m_rf:=$7F080000 + basepin<<17 + 800 which corresponds to 0111_1111_0000_1000, which is the RFLONG->4x8-bit LUT.
So I need to update this portion accordingly and I should be good.
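The m_rf expression can be rebuilt on a PC to see how its parts combine. This sketch only mirrors what the post shows: the mode in the upper 16 bits ($7F08 for the LUT mode), basepin added as basepin<<17, and the transfer count (800) in the lower 16 bits. The helper names are hypothetical, and the exact meaning of the bits that basepin<<17 lands in is an assumption taken from the original code, not something spelled out here.

```c
#include <assert.h>
#include <stdint.h>

/* Rebuilds $7F080000 + basepin<<17 + 800 with the pieces as parameters. */
static uint32_t streamer_cmd(uint32_t mode_hi, uint32_t basepin, uint32_t count)
{
    return (mode_hi << 16) + (basepin << 17) + (count & 0xFFFFu);
}

/* Upper 16 bits: the mode pattern (shifted by any non-zero basepin). */
static uint32_t cmd_upper16(uint32_t cmd) { return cmd >> 16; }

/* Lower 16 bits: the transfer count. */
static uint32_t cmd_count(uint32_t cmd) { return cmd & 0xFFFFu; }
```

With basepin 0 the upper half is exactly 0111_1111_0000_1000 ($7F08), which is why that is the part to change when switching modes. A non-zero basepin also alters the upper half (basepin 8 gives $7F18).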

kwagner · 2021-08-18 19:37

Attached is my latest example. For my use case, I wanted one byte to be used as four analog values across four DACs, so the X_RFBYTE_8P_4DAC2 was used. I made all four pins step through values 0-255 as a quick example to show different values in sync. My only issue is I have a bug where it seems the pins go in order of 1230 instead of 0123. Am I mixing things up somewhere in the initialization?

evanh · 2021-08-18 23:32

Heh, that mode is still four DAC channels packed into each byte, but each channel is just two bits wide. So you've effectively turned the 8-bit DACs into 2-bit DACs. Try X_RFBYTE_8P_1DAC8 | X_DACS_0_0_0_0

kwagner · 2021-08-19 01:47

Correct, I realize I was ambiguous with my sentence. What I wanted was:

4 DACs to each have their own value
Using a single byte for space

Which gives me 4 potential levels (0,33%,66%,100%) of analog signal (2 bits), which is fine for my needs.
In the future, I'm going to have a couple more cogs running different waveforms at the same resolution, so I want to be sure I have enough shared RAM to make it work (right now I'm using 120KB).
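A rough hub-RAM budget for that plan, as a C sketch: the P2 has 512 KB of hub RAM, and each output cog with a front/back pair of 60,000-byte buffers needs 120,000 bytes. The `reserved` figure for code and other data is a hypothetical input, not a measured number.

```c
#include <assert.h>
#include <stdint.h>

#define HUB_BYTES 524288u   /* 512 KB of hub RAM on the P2 */

/* How many output cogs could each own a front/back pair of buffers of
   `samples` bytes, after setting aside `reserved` bytes for everything else. */
static uint32_t max_waveform_cogs(uint32_t samples, uint32_t reserved)
{
    uint32_t per_cog = 2u * samples;
    if (reserved >= HUB_BYTES || per_cog == 0u) return 0u;
    return (HUB_BYTES - reserved) / per_cog;
}
```

With nothing reserved, four such cogs fit; setting aside 64 KB for code and other data drops that to three.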

What I would expect from the documentation when byte %hgfedcba goes to the smart pins:

DAC 0 would get %babababa
DAC 1 would get %dcdcdcdc
DAC 2 would get %fefefefe
DAC 3 would get %hghghghg
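The expected mapping can be written out in C: four 2-bit levels pack into one byte, and each DAC sees its 2-bit field repeated four times, so the levels 0..3 become 0, 85, 170 and 255 out of 255. Function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Packs four 2-bit levels (0..3) into one byte %hgfedcba:
   DAC 0 in bits ba, DAC 1 in dc, DAC 2 in fe, DAC 3 in hg. */
static uint8_t pack4(unsigned l0, unsigned l1, unsigned l2, unsigned l3)
{
    return (uint8_t)((l0 & 3u) | ((l1 & 3u) << 2) | ((l2 & 3u) << 4) | ((l3 & 3u) << 6));
}

/* 8-bit value one DAC receives when its 2-bit field is replicated,
   e.g. %ba -> %babababa; 0x55 is %01010101, so field * 0x55 repeats it. */
static uint8_t dac_value(uint8_t byte, unsigned dac)
{
    unsigned field = (byte >> (2u * dac)) & 3u;
    return (uint8_t)(field * 0x55u);
}
```

For example, `pack4(0, 1, 2, 3)` is 0xE4, and DACs 0..3 then receive 0, 85, 170 and 255.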

What seems to be happening in my case is:

DAC 0 gets %dcdcdcdc
DAC 1 gets %fefefefe
DAC 2 gets %hghghghg
DAC 3 gets %babababa

On an oscope, I see the fourth pin's 'staircase' values are the shortest duration, meaning it's the lowest bits since they toggle the most often. Does that happen when anyone else runs my example code?

evanh · 2021-08-19 03:44

Oh, I see the problem, your observation is correct.

It's because the DAC channels are hard-ordered in the hardware. Each channel always sits at a fixed position within a four-pin group: DAC0 on the pin that is a multiple of four, DAC1 on four+1, DAC2 on four+2 and DAC3 on four+3. With the leading ADD smart1, #1 in the setup routine, you've offset (rotated) the order of assignment. I can see why you did that, you wanted the base pin to be the reference pulse. Probably best to make it +4 instead.
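The rotation can be checked with a few lines of C, assuming (as described above) that DAC channel n always lands on a pin congruent to n mod 4. The helpers are hypothetical; `channel_order` writes the channels of four consecutive pins as decimal digits, so a leading 0 disappears (0123 prints as 123).

```c
#include <assert.h>

/* DAC channel a pin receives in a four-DAC streamer mode. */
static unsigned dac_channel(unsigned pin) { return pin & 3u; }

/* Channels seen on the four consecutive pins starting at first_pin,
   packed as decimal digits for easy comparison. */
static unsigned channel_order(unsigned first_pin)
{
    unsigned order = 0u;
    for (unsigned i = 0u; i < 4u; i++)
        order = order * 10u + dac_channel(first_pin + i);
    return order;
}
```

Starting the DAC pins at base+1 gives 1230, which matches what the oscilloscope showed; starting at base+4 gives 0123.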

kwagner · 2021-08-19 12:38

Aha! That makes sense, thanks.
Example modified and attached.
Next step is to attempt code reuse and run a separate set of waveforms on a second cog.
