Is 8 bits enough for digital voice?
lardom
in Propeller 1
I might have to learn PASM, because sampling audio would require it.
Digital audio is a brand new area for me. I don't mind quality lower than telephone voice. I just want to know if one byte per sample carries enough data.
I'll be satisfied with my original goal of wireless motor control but being able to speak through a wireless robot would just be 'over the top' fun.
Comments
Next thing to worry about is the sample rate. 8 kHz should do.
J
Be sure to do at least 8 kHz, if not 16 kHz, sampling.
Voice works in as little as 3 to 3.5 kHz of bandwidth, which means 8 kHz sampling is near the minimum. Optimal voice reproduction happens with about 8 kHz, ideally 10 kHz, of audio bandwidth.
Many of the harmonics that clarify a voice, and keep it from sounding nasal or muffled, lie above 5 kHz; that makes 16 kHz sampling a very good compromise.
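The sample rate sets the ceiling: a digital stream can only represent frequencies up to half its sample rate (the Nyquist limit). The same numbers also decide how much data has to cross the radio link. A minimal sketch of that arithmetic in C, assuming uncompressed mono PCM; the function names are just illustrative:

```c
/* Highest audio frequency a given sample rate can carry (the Nyquist limit). */
double nyquist_hz(double sample_rate_hz) {
    return sample_rate_hz / 2.0;
}

/* Raw bit rate of an uncompressed mono PCM stream. */
long pcm_bit_rate(long sample_rate_hz, int bits_per_sample) {
    return sample_rate_hz * bits_per_sample;
}
```

With 8-bit samples, 8 kHz sampling carries audio up to 4 kHz at 64,000 bit/s, and 16 kHz sampling carries up to 8 kHz at 128,000 bit/s, before any packet overhead.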
If you are sampling voices, be sure to compress, equalize and normalize your samples. You want them full and loud to ensure all the good stuff is above that -40 dB or so noise floor. That is the single most important thing you can do to really maximize 8-bit samples. People hear the signal, and when it is well differentiated from noise, the minor artifacts inherent in 8-bit samples are largely ignored by most listeners.
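Each bit of sample depth is worth roughly 6 dB of dynamic range, so 8-bit samples give about 48 dB between full scale and the quantization floor; that is why getting the voice up near full scale matters so much. A minimal C sketch of the normalize step plus conversion to unsigned 8-bit PCM (the format 8-bit WAV files use, with 128 as silence). The function names are hypothetical, and real compression and EQ are better left to a tool such as Audacity:

```c
#include <stddef.h>
#include <stdint.h>

/* Scale float samples (nominally -1..1) so the loudest peak lands at target_peak. */
void normalize(float *x, size_t n, float target_peak) {
    float peak = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > peak) peak = a;
    }
    if (peak == 0.0f) return;            /* all silence: nothing to scale */
    float gain = target_peak / peak;
    for (size_t i = 0; i < n; i++) x[i] *= gain;
}

/* Clip to -1..1 and round to unsigned 8-bit PCM: -1 -> 1, 0 -> 128, +1 -> 255. */
uint8_t to_u8(float x) {
    if (x > 1.0f) x = 1.0f;
    if (x < -1.0f) x = -1.0f;
    float scaled = x * 127.0f;
    long r = (long)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f));  /* round half away from zero */
    return (uint8_t)(r + 128);
}
```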
For the EQ, use a hard roll-off at half your sample rate and maybe punch up the highs and lows 5 dB or so, depending on the voice. For men, a little boost at 100 Hz and 3 kHz is good. For most women, 300 Hz and 3.5 to 4 kHz is good. If you aren't sure, leave it flat and use aggressive compression.
A nice mic really helps.
Audacity does all of this nicely and easily. You can optionally (and I recommend it) record samples at a higher bit rate, do the processing, then output at the target rate. This gives you a little room to get a good sample and process it without excessive ringing and artifacts, both of which will make the voice sound muddy and/or nasal.
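Recording high and converting down is mostly a matter of low-pass filtering before throwing samples away; otherwise content above the new half-sample-rate folds back as aliasing. Audacity's resampler does this properly. As a rough illustration only, here is a minimal C sketch that downsamples by an integer factor using a plain moving average as the (weak) low-pass filter; the function name and the choice of filter are assumptions, not what Audacity uses:

```c
#include <stddef.h>

/* Downsample by an integer factor (e.g. 48 kHz -> 16 kHz is factor 3; factor must be >= 1).
   Each output sample is the mean of `factor` input samples: a crude low-pass
   that reduces, but does not eliminate, aliasing. Returns the output length. */
size_t decimate(const float *in, size_t n, size_t factor, float *out) {
    size_t m = n / factor;                 /* a trailing partial block is dropped */
    for (size_t j = 0; j < m; j++) {
        float sum = 0.0f;
        for (size_t k = 0; k < factor; k++) sum += in[j * factor + k];
        out[j] = sum / (float)factor;
    }
    return m;
}
```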
Google "AM radio production" for some tips on this. For modest sample rates and bit depths, they have it right.
Also, if possible, make sure your playback device has good response below 3 kHz. This makes a world of difference. For many enclosures, just adding a little mass (a more substantial speaker and enclosure) and a firm mounting will do this.
You don't have to do all of that, but it's there for those who might want to maximize 8-bit samples.
A byte carries plenty of data. Just make whatever you speak, loud.
I think it will give you a good idea of what sort of sound quality you can get with 8-bits.
Here are some comments from the program.
The program defaults to 11-bit audio but, as you can see, it can go as low as 5 bits.
It's been a few years since I've played with the program but I thought it was pretty cool.
Of course the PASM code will give you a good head start with your project.
You'll need to figure out how to modify the sample rate if you want to reduce the amount of data required to send to a robot.
I've never used it myself. The last time I looked at it, I didn't understand how to use it. Mark_T didn't include a demo top object.
I'll study that object line by line alongside the Propeller manual.
BTW, I've drawn a 'lot' from your work. I wish you would post it to the OBEX. It's been very helpful. Thanks.
I've learned even more from Erlend's work. I'm pretty comfortable with what I can do so far.
I just desoldered the joysticks from a usb joypad to plug into my breadboard so that's what I'm working on at the moment.
===Jac
*(The u in uLaw is supposed to be a Greek letter mu but I can't be bothered to find the keyboard code for that right now)
So why not use a 12-bit DAC and an 8-bit to 12-bit lookup table at the receiving end?
There are other ways to encode speech more compressed, but they rapidly get more complex.
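The 8-bit codes for such a table come from μ-law companding at the sending end. A minimal C sketch of the classic 16-bit-to-μ-law encoder, assuming signed 16-bit input samples; it follows the widely used bias-and-complement formulation but has not been checked against the official G.711 test vectors:

```c
#include <stdint.h>

#define ULAW_BIAS 0x84   /* 132: added so every magnitude has a leading 1 at bit 7 or above */
#define ULAW_CLIP 32635  /* largest magnitude that still fits after the bias */

/* Compress one signed 16-bit sample to an 8-bit mu-law code. */
uint8_t ulaw_encode(int16_t sample) {
    int sign = (sample < 0) ? 0x80 : 0;
    int mag = (sample < 0) ? -(int)sample : sample;
    if (mag > ULAW_CLIP) mag = ULAW_CLIP;
    mag += ULAW_BIAS;

    /* Exponent e (0..7) means the leading 1 sits at bit e + 7. */
    int exponent = 7;
    for (int mask = 0x4000; (mag & mask) == 0 && exponent > 0; mask >>= 1)
        exponent--;

    /* Mantissa = the 4 bits just below that leading 1. */
    int mantissa = (mag >> (exponent + 3)) & 0x0F;

    /* mu-law codes are transmitted bit-inverted. */
    return (uint8_t)~(sign | (exponent << 4) | mantissa);
}
```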
I know almost nothing about processing audio so I have a lot to learn. I'm assuming 'logarithmic' is how digital samples are interpolated. This is my guess. I'll research the subject.
A "12-bit lookup table" sounds like it 'needs' pasm to work. I'm guessing it will improve the quality of the transmitted audio. That is exciting.
Do you know of any examples of this that I can study?
https://en.wikipedia.org/wiki/G.711#.CE.BC-Law
A lookup table is just an array; you don't need PASM. You can put 12-bit values in 16- or 32-bit integers; there's plenty of room(!)
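Following that suggestion, the receiving end can expand each 8-bit μ-law code through a 256-entry table built once at start-up. A minimal C sketch, assuming the usual G.711 μ-law layout (sign, 3-bit exponent, 4-bit mantissa, sent bit-inverted) and an unsigned 12-bit DAC where 2048 means silence; the names are hypothetical. Stored as 16-bit words, the table takes 512 bytes:

```c
#include <stdint.h>

/* Expand one mu-law code to a signed linear sample (range -32124..32124). */
static int ulaw_decode(uint8_t code) {
    code = (uint8_t)~code;                                    /* undo the inversion */
    int sign = code & 0x80;
    int exponent = (code >> 4) & 0x07;
    int mantissa = code & 0x0F;
    int mag = (((mantissa << 3) + 0x84) << exponent) - 0x84;  /* remove the encoder's bias */
    return sign ? -mag : mag;
}

/* 256-entry table: 8-bit mu-law code -> unsigned 12-bit DAC value (2048 = silence). */
static uint16_t ulaw_to_dac12[256];

void build_ulaw_table(void) {
    for (int c = 0; c < 256; c++)
        ulaw_to_dac12[c] = (uint16_t)(ulaw_decode((uint8_t)c) / 16 + 2048);  /* 16-bit -> 12-bit */
}
```

At playback time each received byte is then a single array lookup: `dac_value = ulaw_to_dac12[byte];`.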
Sorry, I just found some time to read your post. I agree with the observations that have been made, but the question is: why do you need 8 bits, and do you mean wirelessly "speak" live or recorded?
If recorded, then that's easy: after leveling and compression you could do 8 bits, but since you need memory anyway it is just as easy to go to 16 bits and get some great sound quality. A microSD card is as cheap as it gets, and you can treat it as raw SPI memory if you like, although for me it is just as easy to access it as FAT32 and play 16-bit 44 kHz WAV files. For playback you essentially double-buffer: one cog fills the next buffer while another cog plays the samples at "exactly" the playback rate. You'd be surprised how unintelligible it becomes if that last aspect is not observed! I have also demonstrated reading the SD card live without double-buffering, but at a lower sample rate to handle the latency of reading a new sector.
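The double-buffering described above can be modeled with two buffers and an "empty" flag per buffer. A minimal single-threaded sketch in C, assuming 512-byte blocks (one SD sector) of 8-bit samples; the struct and function names are hypothetical. On the Propeller the filler and player would be separate cogs sharing this state in hub RAM, which needs more care about update ordering than this model shows:

```c
#include <stdbool.h>
#include <stdint.h>

#define BUF_LEN 512   /* hypothetical: one 512-byte SD sector of 8-bit samples */

typedef struct {
    uint8_t buf[2][BUF_LEN];
    bool empty[2];    /* true while a buffer waits for the filler */
    int play;         /* index of the buffer the player is reading */
    int pos;          /* read position inside the playing buffer */
} pingpong_t;

void pp_init(pingpong_t *pp) {
    pp->empty[0] = pp->empty[1] = true;
    pp->play = 0;
    pp->pos = 0;
}

/* Filler side: index of a buffer that needs data (playing buffer first), or -1. */
int pp_want(const pingpong_t *pp) {
    if (pp->empty[pp->play]) return pp->play;
    if (pp->empty[pp->play ^ 1]) return pp->play ^ 1;
    return -1;
}

/* Filler side: call after writing BUF_LEN samples into buf[b]. */
void pp_filled(pingpong_t *pp, int b) { pp->empty[b] = false; }

/* Player side: call exactly once per sample period.
   Returns false on underrun, i.e. the filler fell behind. */
bool pp_next(pingpong_t *pp, uint8_t *out) {
    int b = pp->play;
    if (pp->empty[b]) return false;
    *out = pp->buf[b][pp->pos++];
    if (pp->pos == BUF_LEN) {        /* buffer used up: hand it back and switch */
        pp->pos = 0;
        pp->empty[b] = true;
        pp->play = b ^ 1;
    }
    return true;
}
```

The player never waits on the filler; it only reports an underrun, which keeps the output timing fixed as the post recommends.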
If it's live, then I would think there's the problem of buffering packets, since some packets will get lost or be retransmitted, etc., although I should look at what Mark_T has done there.
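One common way to cope with lost, late or retransmitted packets in live audio is a small jitter buffer keyed by sequence number, playing silence for anything that doesn't arrive in time so the output rate never stalls. A minimal C sketch, assuming each packet carries a hypothetical 8-bit sequence number and a fixed block of unsigned 8-bit samples; the sizes and names are arbitrary assumptions, not taken from Mark_T's object:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PKT_SAMPLES 32   /* hypothetical: 8-bit samples per radio packet */
#define SLOTS 4          /* hypothetical buffer depth in packets; must divide 256 */

typedef struct {
    uint8_t data[SLOTS][PKT_SAMPLES];
    bool present[SLOTS];
    uint8_t seq[SLOTS];
    uint8_t next_seq;    /* sequence number the player wants next */
} jitter_t;

void jb_init(jitter_t *jb, uint8_t first_seq) {
    memset(jb, 0, sizeof *jb);
    jb->next_seq = first_seq;
}

/* Radio side: store a packet in the slot for its sequence number.
   Late packets (already played) and packets too far ahead are dropped. */
bool jb_put(jitter_t *jb, uint8_t seq, const uint8_t *samples) {
    uint8_t ahead = (uint8_t)(seq - jb->next_seq);   /* distance ahead, modulo 256 */
    if (ahead >= SLOTS) return false;
    int slot = seq % SLOTS;
    memcpy(jb->data[slot], samples, PKT_SAMPLES);
    jb->seq[slot] = seq;
    jb->present[slot] = true;                        /* a retransmit just overwrites */
    return true;
}

/* Player side: fetch the next packet's samples. A missing packet becomes
   silence (128 for unsigned 8-bit PCM; use 0xFF for mu-law payloads). */
bool jb_get(jitter_t *jb, uint8_t *out) {
    int slot = jb->next_seq % SLOTS;
    bool ok = jb->present[slot] && jb->seq[slot] == jb->next_seq;
    if (ok) memcpy(out, jb->data[slot], PKT_SAMPLES);
    else memset(out, 128, PKT_SAMPLES);
    jb->present[slot] = false;
    jb->next_seq++;
    return ok;
}
```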
I 'need' the challenge. Controlling motors with the transceiver is 'old hat'.
It's a hard requirement to limit it to 8-bit audio, given the available bandwidth.
I thought you already wrote an object that transmits audio. I'll study your work.