rePlay - An audiostreamer + sample player

Ahle2 · 2020-09-22 12:30

Hi all,

I have made a quite useful object for playback of multiple simultaneous audio streams. At the moment it can be used with SD card and COG RAM, but I plan to include HyperRAM and possibly other memory types in the future. I would have released the first version now if it wasn't for the fact that the SD drivers I have tried are buggy, limited and lack the most important feature for this application: a non-blocking DMA read mode. I did manage to get it going from a single SD card with two simultaneous audio streams at 44100 Hz in stereo mode. But all tested drivers locked up and/or bugged out "randomly". Then I had to start a separate thread and waste a cog to make it non-blocking. I think that a good, bug-free SD driver with DMA support is of the utmost importance for many tasks. Parallax should get behind an effort at making an official gold-standard SD driver. Besides serial, video, I2C, SPI, USB etc., I really think that this is one of the most important objects to get done (right) before the official P2 release. No shadow falls on anyone involved in making SD drivers for the P2; without them we would have nothing at all. I'm not bashing anyone.

/Johannes

Ahle2 · 2020-09-22 12:30

Reserved

rogloh · 2020-09-22 13:49

PM sent.

Ahle2 · 2020-09-23 08:06

Thanks Roger, that driver is a work of art.

That is the definition of gold standard in my eyes!

Ahle2 · 2020-09-23 08:16

@JonnyMac

I know that you have been an advocate for an object like this. Are you interested in beta testing and reviewing the API?

rogloh · 2020-09-23 12:35

Thanks Ahle2! I would tend to agree it would be nice to have something really good for SD on the P2, that's easy to use, is high performance and leverages whatever it can from the P2 and is rock solid reliable. There's probably a couple of access models that might be desirable:

1) in-line requests from a single COG. Calling code blocks the caller until the reads/writes are complete and runs in the calling COG.

2) A dedicated SD COG. Maybe some type of mailbox setup similar to what I did for the HyperRAM driver could allow multiple COGs to make simple requests for file data on the SD card being managed by the driver. This SD COG could potentially be written to also understand filesystems and control the transfer of data using the streamer (acting like a DMA engine for the clients). This model may relieve the clients of needing to know the specifics about filesystem formats, and they could just request to read/write data from named/opened files at given offsets in the files etc. It could make things easy to use. The benefit of this approach is that multiple languages could take advantage of it, instead of mainly the language it is written in (though FlexC does let multiple languages share SPIN which is nice).
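
A minimal sketch of the second model, written in Python rather than Spin2/PASM2 purely to illustrate the control flow. The names `Request` and `SdDriver`, the one-slot-per-client layout and the in-memory "card" are all hypothetical; none of this reflects an existing P2 driver.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Request:
    """One mailbox slot: a client asks for `length` bytes of a named file."""
    filename: str
    offset: int
    length: int
    result: Optional[bytes] = None  # filled in by the driver when done


class SdDriver:
    """Stands in for the dedicated SD COG; `files` stands in for the card."""

    def __init__(self, files: Dict[str, bytes], num_clients: int) -> None:
        self.files = files
        # One slot per client COG; None means "no request pending".
        self.mailbox: List[Optional[Request]] = [None] * num_clients

    def post(self, client_id: int, req: Request) -> None:
        """Called by a client: drop a request in its own slot and return."""
        self.mailbox[client_id] = req

    def poll_once(self) -> int:
        """One pass of the driver loop: service every pending slot in order."""
        serviced = 0
        for slot, req in enumerate(self.mailbox):
            if req is None:
                continue
            data = self.files[req.filename]
            req.result = data[req.offset:req.offset + req.length]
            self.mailbox[slot] = None  # slot is free again
            serviced += 1
        return serviced
```

The clients only ever deal with file names, offsets and lengths; everything about the filesystem stays inside the driver, which is what would make the same driver usable from several languages.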

JonnyMac · 2020-09-23 15:28

Ahle2 wrote: »

@JonnyMac
I know that you have been an advocate for an object like this. Are you interested in beta testing and reviewing the API?

Yes!

cgracey · 2020-09-24 04:15

Man, if we had solid drivers for SD, HyperRAM, and the SPI flash, we'd have it all covered.

rogloh · 2020-09-24 04:53

cgracey wrote: »

Man, if we had solid drivers for SD, HyperRAM, and the SPI flash, we'd have it all covered.

For sure, that would be great.

I was sort of hoping sometime down the track that I might be able to extend my memory driver API layer to also support mapping from SPI flash, such that callers could read from some external address range just like HyperRAM and HyperFlash, and the memory driver would go copy data in/out of it to/from HUB RAM using the same set of APIs. With work it could probably also copy SPI flash directly into HyperRAM etc using this address space. It's not there yet but I've sort of started to design the memory API to try to head down that path anyway.

It may be possible to do this without a dedicated SPI COG via bit banging in the calling COG's context so they don't have to use the streamer resource they may already be using for other purposes, though it would require a lock to prevent other COGs from accessing the flash when it is in use. That's probably the way to do it.

For SD cards, another mailbox based SD driver COG might be nice, to allow things like audio COGs to stream from files in parallel to other independent application COG(s) doing general read/write from SD. This is probably especially important for self hosted setups, where you might have multiple COGs desiring access to the SD filesystem. Eg, some network connection might want to write data into it, while a compiler tool might want to read/write files.

rogloh · 2020-09-24 05:06

If anyone already has good Spin2 based bit bang SPI Flash read code available, I could try to experiment with this idea to add it into the driver (at least for reading). In Spin2 its read performance may not be optimal, at least in interpreted SPIN2 vs Fastspin, but the memory functionality will be increased. We could then try to do hub-exec/inline assembly extensions perhaps down the track for better speed.

cgracey · 2020-09-24 05:18

It would be important to be able to access the SPI flash both address-wise and FAT-wise. Addresses $080000-$FFFFFF (15.5MB) could be a solid-state drive. That thing is a lot easier to deal with than an SD card.
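
For reference, the 15.5MB figure follows directly from the address range. A quick check in Python (the constant names are only illustrative):

```python
FLASH_FIRST = 0x080000  # first address of the proposed drive region
FLASH_LAST = 0xFFFFFF   # last address of the region

region_bytes = FLASH_LAST - FLASH_FIRST + 1
below_region_bytes = FLASH_FIRST  # space left below the region

print(region_bytes / (1024 * 1024))  # 15.5
print(below_region_bytes // 1024)    # 512
```

So the range leaves the lowest 512 KB of the flash outside the drive.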

Cluso99 · 2020-09-24 05:48

rogloh wrote: »

cgracey wrote: »

Man, if we had solid drivers for SD, HyperRAM, and the SPI flash, we'd have it all covered.

For sure, that would be great.

I was sort of hoping sometime down the track that I might be able to extend my memory driver API layer to also support mapping from SPI flash, such that callers could read from some external address range just like HyperRAM and HyperFlash, and the memory driver would go copy data in/out of it to/from HUB RAM using the same set of APIs. With work it could probably also copy SPI flash directly into HyperRAM etc using this address space. It's not there yet but I've sort of started to design the memory API to try to head down that path anyway.

It may be possible to do this without a dedicated SPI COG via bit banging in the calling COG's context so they don't have to use the streamer resource they may already be using for other purposes, though it would require a lock to prevent other COGs from accessing the flash when it is in use. That's probably the way to do it.

For SD cards, another mailbox based SD driver COG might be nice, to allow things like audio COGs to stream from files in parallel to other independent application COG(s) doing general read/write from SD. This is probably especially important for self hosted setups, where you might have multiple COGs desiring access to the SD filesystem. Eg, some network connection might want to write data into it, while a compiler tool might want to read/write files.

No-one was interested when I posted my SD driver so I haven't bothered with going any further until I need it myself.
Here are the links
forums.parallax.com/discussion/171642/p2-sd-drivers-cog-pasm-version-v-223/p1
https://forums.parallax.com/discussion/comment/1499300

Ahle2 · 2020-09-24 08:47

@Cluso99
Your driver might be better for what I am after, but I couldn't get a grip on it. So that driver wasn't part of my judgement of the state of SD drivers on the P2. I would like to try it though!

@Chip
I REALLY think that a fast solid SD driver for the P2 is of great importance. Parallax should get a taskforce going to make an "official" driver that lives up to the high standards of embedded systems these days. I know that we have got the knowledge and brain power among the forumistas. (mostly in Australia!)

@rogloh
Yes, that sounds good! A simple interface to start the driver in either "cog mode" or "caller cog mode". For this application I would use it in cog mode for non-blocking DMA transfers. Each time one of the multiple concurrent audio buffers needs an update, a cog attention event is triggered by the reSound driver. Then a service routine is called that looks up which buffer/buffers need new audio data. I would like to just give the SD driver a HUB pointer to one of the buffers, a data length and a file handle (needed when multiple files are open at the same time) and everything happens in the background. Fire and forget. My attempts at making this work with current drivers have been futile and things lock up after a while because the code wasn't made with multiple instances (open files) in mind. Strangely I had better or the same read performance on the P1 at 80 MHz using fsrw. I'm no expert on SD spi bus performance, but this seems backwards. Surely a P2 with a higher clock frequency and smart pins should do better?! We might have hit the limit of what a typical SD spi bus can deliver and that's the bottleneck?
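
To make the "fire and forget" idea concrete, here is a rough Python sketch of the flow described above. Everything in it is hypothetical (`FileHandle`, `ReadRequest`, `Stream`, `service`, `driver_step`); a `bytearray` stands in for a HUB buffer and a `bytes` object for an open file. The point is only that each handle carries its own read position, so several files can be open at once.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class FileHandle:
    data: bytes      # stands in for an open file on the card
    pos: int = 0     # each handle keeps its own read position


@dataclass
class ReadRequest:
    handle: FileHandle
    dest: bytearray  # stands in for a HUB pointer
    length: int


@dataclass
class Stream:
    handle: FileHandle
    buffer: bytearray
    needs_refill: bool = True


def service(streams: List[Stream], queue: Deque[ReadRequest]) -> None:
    """Attention handler: queue a read for each stream that wants data."""
    for s in streams:
        if s.needs_refill:
            queue.append(ReadRequest(s.handle, s.buffer, len(s.buffer)))
            s.needs_refill = False  # request is in flight; caller moves on


def driver_step(queue: Deque[ReadRequest]) -> None:
    """Driver side: complete one queued read in the background."""
    if not queue:
        return
    req = queue.popleft()
    h = req.handle
    chunk = h.data[h.pos:h.pos + req.length]
    req.dest[:len(chunk)] = chunk  # copy into the caller's buffer
    h.pos += len(chunk)            # advance only this handle's position
```

The service routine never waits: it only queues requests, and the driver completes them in its own time.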

Cluso99 · 2020-09-24 09:17

@Ahle2,
The way I use the driver in CPM is to locate the disk files (there are 8 x 8MB contiguous FAT32 files which store the contents of each CPM disk) at startup and store their starting addresses in a table.
Then to access any sector with the driver is to pass the sector address, the hub buffer address, and the command to read the sector. It's as simple as this. Of course, there is also the FAT32 driver (I converted Kye's to spin2) but apart from using it to locate the files, I don't use it in CPM.

My P2 SD (P2ASM) Driver resides in its own cog and also can walk the FAT32 tree, locate a file and return the starting sector address and file size in bytes and/or load/execute the file. The hub mailbox interface is 4 longs.
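
The lookup described above works because the disk files are contiguous: once the starting sector is known, any offset maps to a sector with plain arithmetic. A small Python sketch of that mapping (the function name and the table contents are made up for illustration; the real driver is PASM and its mailbox layout is not shown here):

```python
SECTOR_BYTES = 512
DISK_BYTES = 8 * 1024 * 1024                   # each CP/M disk file is 8 MB
SECTORS_PER_DISK = DISK_BYTES // SECTOR_BYTES  # 16384


def sector_for(start_table, disk, byte_offset):
    """Map (disk number, byte offset) to an absolute SD sector number."""
    if not 0 <= byte_offset < DISK_BYTES:
        raise ValueError("offset outside the disk file")
    return start_table[disk] + byte_offset // SECTOR_BYTES
```

The resulting sector number, together with a hub buffer address and a read command, is all the driver needs.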

Wuerfel_21 · 2020-09-24 11:38

Ahle2 wrote: »

Strangely I had better or the same read performance on the P1 at 80 MHz using fsrw. I'm no expert on SD spi bus performance, but this seems backwards. Surely a P2 with a higher clock frequency and smart pins should do better?! We might have hit the limit of what a typical SD spi bus can deliver and that's the bottleneck?

P1 fsrw uses an almost-20 MHz clock (or almost-10 MHz for reading when using sdspi_safe.spin; I never had even a single bit error using the 20 MHz read routine despite having a really janky SD card setup going through a 5V-tolerant breakout board, so IDK why that even exists). Spec-wise, 25 MHz is the max frequency for any ol' card, although most cards you'd actually want to use go up to 50 MHz (so checking if a card actually supports high speed is kinda optional). However, (clock speed * bus width) is only the bottleneck for sequential access; for single block reads, the wait time (between issuing the command and getting the block start token) often outweighs the actual transfer time. The only simple way to speed that up is to get a faster card (A1/A2 rating). A2 cards also support special features called "Command Queue" (lets you send new commands while the card is already busy, reducing the amount of "dead air" on the bus) and "Cache" (basically card-side write-behind), but as far as I can tell you can't use these in SPI mode.

Speaking of which, 4 bit SD bus mode would increase sequential rates by 4, obviously. Most cards also support UHS-1 modes (up to 208 MHz clock), but that requires 1.8V signalling - P2 can output that with BIT_DAC, but IDK if it can read the data coming back. It also requires something like Peter's SD power switching thing, as you can not get it back into any 3.3V mode after enabling UHS.
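
The two effects above can be put into numbers with a rough model. This is only a sketch in Python: the function names are made up, and the per-block wait time is a parameter you would have to measure for a given card, not a known figure.

```python
def raw_rate(clock_hz: float, bus_bits: int) -> float:
    """Peak transfer rate in bytes per second, ignoring all overhead."""
    return clock_hz * bus_bits / 8


def single_block_rate(clock_hz, bus_bits, wait_s, block_bytes=512):
    """Effective rate when every block pays a fixed wait before its data."""
    transfer_s = block_bytes / raw_rate(clock_hz, bus_bits)
    return block_bytes / (wait_s + transfer_s)


print(raw_rate(20e6, 1))  # 2500000.0  (1-bit SPI at 20 MHz)
print(raw_rate(50e6, 4))  # 25000000.0 (4-bit bus at 50 MHz)
```

With a hypothetical 1 ms wait per block, `single_block_rate(20e6, 1, 1e-3)` comes out at well under a fifth of the raw 20 MHz rate, which is why a faster clock alone does not help much for single block reads.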

Ahle2 · 2020-09-25 07:37

It's quite obvious after reading through your comment that no available SD driver takes all these technical details into account. It's just bit-banging synced to the system clock without any knowledge of what kind of card is present. With the smart pins it should be easy to do clock-frequency-agnostic SPI bus communication at different rates for different cards, maximising performance in each case.
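
As a sketch of what "clock frequency agnostic" could mean in practice: given the system clock and the highest SPI clock a card accepts, pick the smallest whole-number divider that stays within the limit. The Python below is only an illustration; `spi_divider` is a made-up name, and the minimum divider of 2 is my assumption (one system clock high, one low), not a documented smart pin limit.

```python
def spi_divider(sysclk_hz: int, max_spi_hz: int) -> int:
    """Smallest whole-number divider that keeps the SPI clock within limits."""
    if sysclk_hz <= 0 or max_spi_hz <= 0:
        raise ValueError("clock rates must be positive")
    # Integer ceiling division, so the result never exceeds max_spi_hz.
    divider = (sysclk_hz + max_spi_hz - 1) // max_spi_hz
    return max(2, divider)  # assumed minimum: one clock high, one low


print(spi_divider(200_000_000, 25_000_000))  # 8 -> 25 MHz
print(spi_divider(180_000_000, 25_000_000))  # 8 -> 22.5 MHz
```

The same call with a 50 MHz limit would serve cards that report high-speed support.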

Ahle2 · 2020-09-25 08:25

@JonnyMac
I will come back with something for you to test out and review in the near future. I may be dropping SD support (for now) and doing a pure HyperRam version, since that will work with many simultaneous audio streams from the same device without hiccups. That way we can concentrate on getting the API and processing right, and when the state of SD drivers has improved, just toss that in as well. Do you have a HyperRam module? Also, I have to ask rogloh for permission to share his beta driver. I have promised to keep it to myself.

Ahle2 · 2020-09-25 08:39

@JonnyMac
One more thing, it may be necessary to desolder the chips on the HyperRam module and replace them with something bigger since even a standard 44.1kHz stereo music track is 40++ MB in size.
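
The size estimate is easy to check. A small Python calculation, assuming 16-bit samples and a four-minute track (both are my assumptions, the post does not state them):

```python
def pcm_bytes(sample_rate, channels, bytes_per_sample, seconds):
    """Size of uncompressed PCM audio in bytes."""
    return sample_rate * channels * bytes_per_sample * seconds


four_minutes = pcm_bytes(44_100, 2, 2, 4 * 60)
print(four_minutes)              # 42336000
print(four_minutes / 1_000_000)  # 42.336
```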

rogloh · 2020-09-25 08:48

@Ahle2, please keep that pre-beta software I sent to yourself as discussed. I plan to release to the forum a final updated driver this weekend so it won't be long. Only a final sanity test remains. No known issues are remaining now and I can't add any more code to the PASM2 driver anyway.

JonnyMac · 2020-09-25 14:38

@Ahle2 I don't have a HyperRam module and am looking for something that I can use with SD. I will wait until development matures.

Ahle2 · 2020-09-26 09:09

@Cluso99
Your driver seems interesting. I will give it a shot. Can it do DMA as well? What's performance like?

Ahle2 · 2020-09-26 09:17

@JonnyMac
I will not drop SD support, that was a little bit of frustration that came out. Because every stream can have its own arbitrary sample rate, number of channels, sample format, buffer size etc., the "data service" routine will request reads from the SD card in a very async and bursty way. No driver can handle this (of the ones I've tried) atm without some problems.

Cluso99 · 2020-09-26 12:51

Ahle2 wrote: »

@Cluso99
Your driver seems interesting. I will give it a shot. Can it do DMA as well? What's performance like?

Not sure what you mean by DMA. It runs in its own cog and reads/writes the SD 512 byte sector straight to/from hub. There is no cog buffer, and it does not do read forward or write behind. This could be done via another cog or else extend the driver. I never used the buffering in the P1 drivers.

evanh · 2020-09-26 13:13

Streamer + FIFO = One DMA channel.

Ahle2 · 2020-09-26 13:42

Sounds like it may be what I'm after!

And I like the fact that the low level stuff is completely decoupled from the file system and file handling. Your coding style would have been appreciated at my old job (at that big Swedish telecom company) where we were 1700 SW engineers on the same multi-million-line code base. We were shot on sight if any details were exposed beyond the team's current scope. Every interface was clearly defined by the implementing team and by all the user teams way in advance. That process took weeks sometimes. Not a single person knew even 1/100 of the details of all classes and subsystems. Just enough to be able to work in the current sandbox that was assigned for that period. That would never have been possible without decoupling and abstraction. (Oh the horrors, but I'm in a better place now)

Cluso99 · 2020-09-27 00:34

Ahle2 wrote: »

Sounds like it may be what I'm after! And I like the fact that the low level stuff is completely decoupled from the file system and file handling. Your coding style would have been appreciated at my old job (at that big Swedish telecom company) where we were 1700 SW engineers on the same multi-million-line code base. We were shot on sight if any details were exposed beyond the team's current scope. Every interface was clearly defined by the implementing team and by all the user teams way in advance. That process took weeks sometimes. Not a single person knew even 1/100 of the details of all classes and subsystems. Just enough to be able to work in the current sandbox that was assigned for that period. That would never have been possible without decoupling and abstraction. (Oh the horrors, but I'm in a better place now)

I have been fortunate in my coding. When I have worked on projects with teams, the code I was contracted to write was clearly defined. Even when it was a tiny block and verbally communicated, it was clearly defined.
However, most of my software projects were just me. I have written complete online (live, not batch) order entry, packing, invoicing, inventory, and accounting systems - well over 500 programs. I have only needed to worry about me. But I have always considered that someone will need to maintain that software, and that may be me. So I always have considered documenting the code as a primary part of writing code in the first place.
And I have taught numerous programming courses on the Singer/ICL mini I worked on. This keeps you on your toes as there is always someone on the course who will ask the curly questions.

FWIW the Singer/ICL mini was very much like the Propeller. It could run up to 20 Partitions (COGS) and each had their own memory (COG), plus they had access to a shared common memory (HUB). They could run code from both partition (cog) and common (hubexec). It was programmed in assembly (high level with macros) where every instruction was memory to memory like the prop. For example, one instruction could move 1 to 100 data (bytes), another could multiply 1-10 digits by 1-10 digits where the result was the total length of the source operands (i.e. 2-20 digits), all in decimal so no overflow was possible. The memory was addressed in decimal too. Every instruction was 10 bytes long, on a 10 byte boundary. Originally there were only 14 assembler instructions. And there were 3 index registers (memory locations) for each partition.

Ahle2 · 2020-10-01 12:18

@Cluso99
You have been around a long time doing this stuff, it's very impressive to see that you haven't lost interest yet. I could see myself doing this 3 decades into the future. I started doing electronics/programming as a hobby in my preteen years. It's almost 3 decades now. I have been doing this for a living since a decade ago, but I still do it as a hobby in my spare time. I guess we (many forumistas) are helpless!

Maciek · 2020-10-01 12:51

Ahle2 wrote: »

... I guess we (many forumistas) are helpless!

You got that right, unfortunately. It was a secret until now.
