Ring Buffers in spin2 (Newbie stuff)

Scroungre · 2021-08-16 14:30

So I was poking around the fora looking for a Ring Buffer module I could just grab and use, and I didn’t find one. Perhaps I should have poked around a bit harder. Since I didn’t find one, I wrote one.

Obviously, it could use some help from people who actually know what they are doing, and there are a few things wrong with it that I don’t know how to fix.

I want to make this into an ‘instantiatable’ object – I want to have several cogs to have their own ring buffers without having to copy and paste the code (with new variable names) into each cog. Each cog may have a different ring buffer size.

The code as supplied below is a top-level object and can (and does!) run as is, in Cog 0. It’s ‘ring_buf_testing()’.

I included some error control – you can’t read from an empty buffer, and you can’t write to a full one. Writes fail to a full buffer – and reads from an empty buffer return $00, and I don’t know if you can stop that. There are debug() messages in case you care.

Actually, about that ‘debug()’ - I couldn’t get any of the ‘array’ debug statements to work. They always gave me garbage. As you might have noticed down in ‘ring_buf_dump()’ I worked around it for a ten-item ring buffer, but the solution is, shall we say, ‘awkward’? Testing it with many different buffer sizes would be a nightmare.

The DAT block ‘stuff’ is also somewhat inelegant – it’s just a block of crud to test with. The ‘i’ variable tests are hard-coded to the length of ‘stuff’. You will need to change them if you change the length of ‘stuff’.

I’d be delighted to hear from those who know better ‘how I should have done that’, and I will cheerfully incorporate good suggestions into revised code. But since I couldn’t easily find a ring buffer module while searching around the fora, I thought it might be a helpful /topic to post. Perhaps now those who search can find this one – and despair. ;-)

Nothing beside remains. S.

__deets__ · 2021-08-16 16:52

Instead of comparing and then setting the write pointer (which I would call index in this case) to 0 (same for read) one usually uses the module operator (should be +//) to do this.

The error handling is a bit moot. While it informs you via debug, if nobody looks, the code just runs. It should use e.g. the multiple return value mechanism in spin2 and return a flag if the operation succeeded.

And the whole status thing is unfortunately not concurrency capable. Between your check & setting the status, the other cog could do the same, and then you overwrite in the wrong order. I’m not knowledgeable in PASM2 yet, but if my ringbuffer would ever grow such ability ( https://github.com/deets/unifhy-rocket-engine-test-stand/blob/master/modul2/P2/spin/ringbuffer.spin2 ) I would look for an atomic increment/decrement of a memory address. That should solve these issues.

Rayman · 2021-08-16 17:05

Fullduplex serial might be a good example of a ring buffer...

Looks like there is "jm_fullduplexserial.spin2" included with the Prop Tool in the library...

JonnyMac · 2021-08-16 17:57

I'm not one to tell people how they should or shouldn't do anything (though I certainly have opinions), so I'll share with you something that I knocked together over coffee and phone calls this morning that was inspired by your project. It's simple, and more easily re-usable.

We cannot dynamically allocate memory in Spin, so I tend to create buffer-handling objects by passing a pointer to a buffer, along with its size. After that, things are pretty easy.

If I may offer, you can make your code a bit friendlier by being more generous with whitespace, and using named constants when appropriate. You can see that in my write() and read() methods from the attached object.

pub write(value) : result

'' Write one long to buffer
'' -- will not work if buffer is full
''    * use get_status() or get_count() to check for space

  if (status == AVAILABLE)                                      ' space available?
    long[bufpntr][wridx] := value                               ' add value to buffer
    if (++wridx == bufsize)                                     ' update write index and check
      wridx := 0                                                ' reset if past end of buffer
    bufcount += 1                                               ' update count                
    status := (bufcount < bufsize) ? AVAILABLE : FULL           ' update status

  return status


pub read() : value

'' Read one long from ring buffer
'' -- use get_count() first

  if (bufcount > 0)                                             ' values available?
    value := long[bufpntr][rdidx]                               ' get value from buffer
    if (++rdidx == bufsize)                                     ' update read index and check
      rdidx := 0                                                ' reset if past end of buffer
    bufcount -= 1                                               ' update count
    status := AVAILABLE                                         ' update status       
  else
    return 0

In the end one could probably update the object so that could handle longs, word, or bytes, but that might be messier code than is worth the convenience of having a single object.

Cluso99 · 2021-08-16 20:06

There are examples of multiple ring buffers in my 16 serial port drivers. The driver is in pasm and addresses and sizes are passed. Look for the quick bytes examples on the main parallax website.
You will need to add error cases as the ring buffer will overwrite if full as that’s what we decided for serial - either way is an error.

Cluso99 · 2021-08-17 03:11

FYI While looking for something else, I saw and recalled that the COG LUT shared between cogs cannot be used when the streamer is being used to stream to/from LUT. This is an either/or case.

Scroungre · 2021-08-17 07:14

Awesome, guys! This is the good stuff. Thanks much - as there is much to ponder!

The +// operator looks helpful, although I wasn't really worried about re-entrancy. My idea was that each cog gets its own buffer - mostly to smooth out the 'eggbeater' timing.

Dunno if I want the extra variable 'bufcount' in JonnyMac's code. Seems unnecessary. Might be interesting, though, at some point, but you can find out how much data is in your buffer by doing math on the read and write indexes. I definitely see your point in using words instead of cryptic 'error codes'. It's also probably better to use the serial port for testing instead of horsing around with 'debug'. I wasn't too worried about pre-filling the array - on initialization it's considered 'empty' whatever's in the actual memory, and whatever's there will get overwritten anyhow.

I'll have a good sniff through the various serial port routines too. Good place to start sniffing!

I decided that a write to a full buffer should fail instead of overwriting, but as Cluso99 said, that's a personal choice. A read failure still returns 0 - but I'm not too worried about that.

This is all cool! Thanks again, folks!

Scroungre · 2021-08-18 05:25

A bit of tinkering and using the +// instruction got me this, which seems to work okay (aside from all the rest of the code's dependence upon bufcount. I intend to get rid of that entirely, and rewrite get_count while I'm at it...). Saved a few bytes (4,368 vs. 4,372). What I don't know (yet) is if it's faster - I don't know how expensive the +// operator is when compiled. Experiments proceed! S.

```pub write(value) : result

'' Write one long to buffer
'' -- will not work if buffer is full - Will silently fail. S.
'' * use get_status() or get_count() to check for space - They've been fried and need fixing
'' * to get rid of bufcount as redundant. Is this premature optimization? S.

if (status == AVAILABLE) ' space available?
long[bufpntr][wridx] := value ' add value to buffer
wridx := (++wridx +// bufsize) ' Modulo, loops back to 0
status := (wridx == rdidx) ? FULL : AVAILABLE ' update status

return status```

And unfortunately the code window still appears to not work on my web browser*. Yes, I tried three back-ticks. It didn't work either. S.

Pale Moon (A Firefox fork) rev. 29.3.0 64-bit on Win7 64bit Lenovo Thinkpad i7 power.

Scroungre · 2021-08-18 05:28

In case it helps, here's the code in a .spin2 attachment:

JonnyMac · 2021-08-18 07:23

I tend to write code that is obvious and, hopefully, useful. I did a quick test and the version that uses +// in write() and it is a tiny bit faster. On the other side of that, however, is now the get_count() method needs rewriting.

pub get_count() : result

  if (status == FULL)
    result := bufsize
  else
    result := wridx - rdidx
    if (result < 0)
      result += bufsize

The read() and preset() methods need updates, too. The lesson is that "optimized" is a very loaded word, and doesn't mean the same to everyone. Most of the code I write is for public consumption, hence I work very had to keep it simple and obvious so that others can learn from it. And, as I just pointed out, "optimizing" one section of code may break another.

From your code:

Is this premature optimization?

Maybe -- but you have to decide. I added versions of methods that don't need bufcount just to play with.

Cluso99 · 2021-08-18 07:55

The 3 backticks need to be on their own line - silly but required

__deets__ · 2021-08-18 08:30

@Scroungre Nice to see you progress. WRT to +// being faster - I don’t know. It’s just a tool in the box, and it pops into my mind if the task is “make things wrap around”. And thus seeing it communicates that clearly to me. Versus an if, which can test for anything and do anything, so I need to understand it. Granted this is subjective. But this is simpler and easier to understand for me.

I have to admit I’m a bit confused to your statement that this doesn’t need to be concurrent. There might be value in buffering within one cogs business, no doubt about it. However I use ringbuffers extensively to share data w/o need of synchronization between independent cogs (as long as we don’t overflow, but that’s a system design issue). So I guess my question is: what else is there that I’m missing? Especially with an eye on non-lockstepped communication. I am aware of ATN and locks and polling of hub memory, but those all are explicit synchronization points.

Scroungre · 2021-08-19 00:07

It was very useful and very helpful, JonnyMac. I intend to replace most of my code with yours!

I went ahead and fixed read() and getcount() myself, although once again, your getcount() is better than mine. I'm going to steal yours now... ;-)

pub get_count() : result        ' NB - In this case, it reports How Many Longs Can You Read
                                ' not 'How many empty slots are there?'

  if (status == AVAILABLE)
    if wridx > rdidx
      return (wridx - rdidx)
    else
      return ((wridx + bufsize) - rdidx)
  elseif (status == FULL)
    return bufsize
  else
    return 0                     ' aka Empty

I also shuftied the logic around a bit (which broke a lot more things! but again, that's personal choice. I also put back 'EMPTY' as a valid condition (elsewhere) even though it's logically not really required.

There were a couple of things I wasn't entirely clear on. The formulation

 #0, AVAILABLE, EMPTY, FULL

was a bit unclear, although it seems to work, I'm really hoping they resolve to bits or values, not separate variables, (See edit #2 below) and

long[x][y] := value

sort of became clear in context, but I'm not sure where to find it in my spin2 documentation (v35h) (see edit below). Incidentally, because cogs do everything in longs, I'm doing everything in longs.

In other ring buffer news, yes, it's true that each cog (each worker cog, anyhow) in the application I have in mind will get its own buffer, but there will also be quite a lot of asynchronous user interfacing (with its own dedicated cog) that will need buffering too - and another in the hub for cog distribution purposes... Each cog will set a semaphore asking the hub for another 'work unit' when there's space in its ring buffer, hopefully while still grinding away on previous work units. Yet another cog might be fetching 'work units' from an external RAM and sticking them in the Hub ring buffer (in this case more of a FIFO). &c. They're wonderfully useful little bricks, these things!

Thanks again! I tinker because you've inspired me to tinker! S.

PS - Testing the backticks!

Edit Below: I found it, but it says it's a HUB variable. I don't want any of this in the hub except for the ring buffer in the hub. Might this become a problem?

Edit #2 below: Okay, I found it in the CON section, and I think it does do what I thought it did - each 'condition' is a separate number which can then be dealt with. And now I can have four billion conditions! Wheeee.... ;-)

JonnyMac · 2021-08-19 06:38

because cogs do everything in longs

You can treat cog memory as an array of nibbles, bytes, words, or longs in the P2. I'm assuming when you refer to a cog you're hinting at assembly programming. In the attached program the sbusraw array -- though declared as longs (to your point) -- is accessed as bytes. If you examine the program you'll see that it takes an array of bytes in the cog and outputs to an array of words in the hub.

Don't forget that most aspects of Spin1 apply to Spin2, so the P1 manual is useful. On page 130 you'll find an explanation of reading longs from the hub in this form:

  x := long[baseAddress][offset]

The upside of this syntax is that the compiler knows what to do with offset for the variable type being accessed.

Scroungre · 2021-08-19 08:35

I guess I am kinda hinting at assembly programming. I am coming 'up' from AVR programming, which I wrote extensively in pure assembler, and one of things I learned to care about a lot is exactly where in the chip each item gets put.

AVRs have lots of cheap program space and a little expensive memory space (relatively speaking) - but I guess with the P2, everything gets written into memory to start with, so writing a more complex program to save a variable or two might be a bit counter-productive.

I have not poked at the P1 documentation. If nothing else, it'll have the advantage of experience.

Many years ago, I had this same idea, but the P1, when I ran the math on it, wasn't fast enough. Then when the P2 came out, I went, "Hmmm!" and it may be fast(er) enough. So far it seems it is (with compiled spin2. Byte-interpreted is still too slow). The idea is to have lots of cogs (four would do, five would be nice, six would be better) cranking away in parallel while the rest of the cogs deal with things like user interaction and external memory.

Cogs in parallel means I really don't want to have to share variables (or anything else) with the hub. I don't want to have to keep going back to the hub at some point in its rotation. The only thing the cog should have to do with the hub (as far as I'm concerned right now) is just passing back and forth a few longs now and then (see edit below).

Also while it can be dealt with in bytes and words, to me that sorta defeats the purpose of using a 32-bit processor. If I wanted to monkey with eight bits at a time, I'd use an AVR. I fully intend to cause the user interface to deal strictly in longs (perhaps not so generic - but it's what I want).

I will go and look at the P1 documentation. I didn't think it was important, but perhaps it does retain quite a bit of relevance.

Thanks again for advice and excellent code. I may have to learn PASM2 while I'm here... S.

Edit below: What I mean here is that the cog has a lot of work to do - on its own. It should be able to talk to the hub when it is done with what it's got - or when it needs something else to do - but not in the regular course of operations. I don't think that's going to be fast enough - and it probably won't be regularly timed well enough, because I will have no idea where the hub 'rotor' is when the cog is done or needs more to do (thus also the ring buffers...).

JonnyMac · 2021-08-19 18:35

Also while it can be dealt with in bytes and words, to me that sorta defeats the purpose of using a 32-bit processor. If I wanted to monkey with eight bits at a time, I'd use an AVR.

That's a silly statement, and my SBUS decoder proves it. SBUS devices spit out a chunk of bytes that have to be decompressed to words for use by the application. In this case, I needed a byte array in cog and the P2 has instructions that simplify that, within the native 32-bit architecture. Be careful not to get wrapped around the "bigger is better" pole. I've been coding for a while and tend to be very stingy with variable space, unless being stingy gets in the way of code readability/usability.

You'll get lots of good help in this forum with PASM2. I don't anticipate the need a generic ring buffer object so I'm going to bow out.

Scroungre · 2021-08-21 06:40

It may be true that nobody else will ever want one, but I wanted one one day, so I made one, and posted it. Many people here helped me make it better. If anyone else wants one - now they know where to look! If nothing else, I learned by doing.

Have fun, y'all! S.

Ring Buffers in spin2 (Newbie stuff)

Comments