Text To Speech

Agent420 · 2009-08-12 16:29

Just getting back to some more Propeller experimentation, and Text To Speech came to mind. I know from my initial introduction to the Prop and demo board that Chip's VocalTract was an included demo, and it surprised me that speech on the Prop had not advanced much in the last year or two. Synthetic speech seems like it would be nearly as popular as any of the Prop's video abilities, yet information on the subject is sparse and there are no links in any of the stickies or objects in the exchange.

I note that Phil Pilgrim has done some work on Phonemic Speech Synthesis, creating a Prop object named Talk. I won't be able to check that out until later tonight, but I see that a couple of posts suggest it may be difficult to understand. It is also phonetics-based, and I would like to investigate free English text conversion.

I recall from my early C64 days a program titled SAM (Software Automatic Mouth), which was apparently the first commercial software-based speech synthesizer. The speech produced was very comprehensible. The free-text-to-phonetic RECITER program occupied only 6K, so I would think that something similar should be doable within the memory limits of the Propeller.

So with that in mind, does anybody have any relevant information or links on the subject?

I am using these references so far:

Phil's thread on his Talk object:
http://forums.parallax.com/showthread.php?p=613308

eSpeak, an open source speech app to review for the English text to phonetic conversion:
http://espeak.sourceforge.net/

Free TTS, another open source project:
http://freetts.sourceforge.net/docs/index.php

edit -

SAM original documentation for the Atari version, which includes some interesting theory of operation (if not simply nostalgic value):
http://www.retrobits.net/atari/sam.shtml

SAM said...
The program uses about 450 rules to convert English into S.A.M.'s phonetic language. Included among these rules are some stress markers for situations where the stress choice is unambiguous. In addition, S.A.M.'s usual punctuation rules still operate with some additional symbols ("!", ";", and ":") being considered as periods. The net result is that even directly-translated English text has a fair amount of inflection.
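The rule-driven conversion described in that quote can be sketched in a few lines. This is only a toy illustration: the handful of rules and the phoneme names below are invented for the example and are not SAM's actual 450-rule set. The one detail taken from the documentation is that "!", ";" and ":" are treated like periods.

```python
# Toy letter-to-phoneme converter. The rules and phoneme names are made up
# for illustration; they are NOT the rules used by SAM/RECITER.
RULES = [
    ("TION", ["SH", "AH", "N"]),
    ("CH", ["CH"]),
    ("EE", ["IY"]),
    ("S", ["S"]),
    ("P", ["P"]),
    ("T", ["T"]),
    ("A", ["AE"]),
]
# Try longer patterns first so "CH" wins over any single-letter rule.
RULES.sort(key=lambda rule: -len(rule[0]))

# Per the SAM documentation, these marks are all treated as a period.
PAUSE_MARKS = ".!;:"


def to_phonemes(text):
    """Scan left to right, applying the first (longest) rule that matches."""
    text = text.upper()
    out = []
    i = 0
    while i < len(text):
        for pattern, phonemes in RULES:
            if text.startswith(pattern, i):
                out.extend(phonemes)
                i += len(pattern)
                break
        else:
            # No rule matched: emit a pause for punctuation, skip anything else.
            if text[i] in PAUSE_MARKS:
                out.append("PAUSE")
            i += 1
    return out


print(to_phonemes("speech!"))
```

A real rule set would also look at the letters before and after each match, which is where most of the rule count comes from.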

Post Edited (Agent420) : 8/12/2009 4:54:29 PM GMT

Rayman · 2009-08-13 00:44

I think I'd cheat and use an SD card to store a database of sounds to draw from...

But, for everything I've done so far, it has been far easier to just pre-record messages as .wav files on the SD card and play them out...

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
My Prop Info&Apps: http://www.rayslogic.com/propeller/propeller.htm

$WMc% · 2009-08-13 00:57

Agent420,

I had a text to speech card for my TRS80 back in the mid to early 1980s. My TRS80 had 16K of RAM and ran @ a Ballistic 3.5MHz.
This should be a breeze for the Prop.

Have you tried the EMIC speech module? I have used it with the BS2, but I haven't tried it with the Prop.

_________________________$WMc%______________

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
The Truth is out there    BoogerWoods, FL. USA

Agent420 · 2009-08-13 12:54

I have a Devantech SP03 on hand (http://www.robot-electronics.co.uk/htm/Sp03doc.shtml), it appears very similar to the EMIC modules and is based on the same WTS701 chip, but uses the male version and has an additional I2C interface. I considered the EMIC but wanted a male voice for my project. Seems like these are no longer available however.

The WTS701 is a great chip; the speech is good quality and the built-in text to phoneme converter allows you to create random phrases on the fly; you don't have to pre-code your phrases with phonetic spelling. One example of this would be speaking text that was input from a terminal or keyboard connection.

Unfortunately, these modules are a bit expensive, about $70.

Rayman said...
I think I'd cheat and use an SD card to store a database of sounds to draw from...

But, for everything I've done so far, it has been far easier to just pre-record messages as .wav files on the SD card and play them out...

I believe the WTS701 speech chip actually does store the various phonemes in analog form and then pieces them together to create speech. From my understanding this may provide better quality and may be less complex than synthesizing the formants in real time, at the expense of memory space (though the WTS701 uses a very clever method of actually storing analog values inside the chip; they are not digitized samples). That method may be an alternative, but obviously adds the complexity of requiring a memory card (and the associated preloading of data onto it).

I checked out Phil's Talk program last night and it's pretty close to what I remember from the SAM software. Synthesizing formants in real time seems a bit complex, but I think I may experiment with it some more. It sure would be nice to have a pure software solution for creating speech.

hinv · 2009-08-13 13:50

Wow that SAM brings back memories. I wonder what would have come about if so many of us didn't pirate it. ;&|
That was 27 years ago! Has there been anything close from a software offering? Those speech chips aren't very active.

The Talk demo doesn't sound as understandable as 27-year-old SAM on a ~250 KIPS processor. I do appreciate the work Phil did on it, but I think it needs more work.
I agree that a software only solution would be great.

Doug

BradC · 2009-08-13 14:16

hinv said...
Wow that SAM brings back memories. I wonder what would have come about if so many of us didn't pirate it. ;&|

Wow! I still have my original SAM Apple ][ disk and 8 bit DAC card that came with it!
I just can't remember who I lent my Apple ][ to, to run it.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
It's not particularly silly, is it?

tronsnavy · 2009-08-13 14:17

WOW... this thread brings back memories. I purchased a SP0256-17 voice synthesizer (Radio Shack) back in the early 1980's. I could not get it to work after the first night so I gave up. I think it's still in my junk box. If I still have it, I might try and hook it up to the Prop this weekend. I know that it required a special crystal. As I recall, it used allophone synthesis (don't hold me to that, it's been awhile). I too remember SAM, although I never used it. I also agree that a software solution is best. Anyway, I will reply if the SP0256-17 still works (or doesn't). Have a great weekend.
Bob

hinv · 2009-08-13 14:29

BTW, I am confused about storing analog values. What technologies are they using for that?

Ale · 2009-08-13 14:35

There was a SAM for C64 too (I think). I know nobody that bought it though, I didn't have a C64 back in its heyday either.
For the PC there was some software before the SB times. It was ugly, especially for non-English languages.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Visit some of my articles at Propeller Wiki:
MATH on the propeller propeller.wikispaces.com/MATH
pPropQL: propeller.wikispaces.com/pPropQL
pPropQL020: propeller.wikispaces.com/pPropQL020
OMU for the pPropQL/020 propeller.wikispaces.com/OMU

Microcontrolled · 2009-08-13 15:03

I can remember......

Long, Long, ago...............

There was Windows 98.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Computers are microcontrolled.

Robots are microcontrolled.
I am microcontrolled.

But you can call me micro.

If it's not Parallax then don't even bother.

I have changed my avatar so that I will no longer be confused with others who use generic avatars (and I'm more of a Prop head than a BS2 nut, anyway)

Agent420 · 2009-08-13 15:06

hinv said...
BTW, I am confused about storing analog values. What technologies are they using for that?

It's basically a storage matrix comprised of non-volatile capacitor cells that can each store one of 256 analog levels. This seems a popular technology for storing short audio signals and has been used in many 'chip recorders', answering machines, PDAs, etc.

Datasheet said...
Recording is stored into the embedded Flash memory cells, providing zero-power message storage. This unique single-chip solution ultilizes Nuvoton’s patented MLS technology. Therefore, voice or audio data are stored directly into the memory in their natural form without any compressions alike digital approach, providing high-quality, solid-state audio reproduction.

The WTS701 is based on Winbond’s Multi-Level Storage (MLS) technique in which one of 256 distinct voltage levels are precisely stored per memory cell. This provides eight times more storage space for any given memory size than the ordinary digital signal storage technology which can store only 0 or 1.

The end result is mapped into samples that are piped out of the chip's analog storage array. The signal is then smoothed over by routing it through a low pass filter and is available as an analog signal, or it can be passed through a encoder/decoder for digital audio output.
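The "eight times" figure in the datasheet quote follows directly from the level count: a cell that holds one of 256 levels carries as much information as 8 binary cells. A small sketch of that arithmetic follows. The 8000 Hz sample rate and the one-sample-per-cell layout are assumptions made only for the example; they are not taken from the WTS701 datasheet.

```python
import math

LEVELS_PER_CELL = 256          # figure quoted from the datasheet
ASSUMED_SAMPLE_RATE_HZ = 8000  # hypothetical rate, for illustration only

# Information held by one multi-level cell, in bits.
bits_per_cell = math.log2(LEVELS_PER_CELL)


def cells_for(seconds, sample_rate_hz=ASSUMED_SAMPLE_RATE_HZ):
    """Analog cells needed, assuming one stored sample per cell."""
    return int(seconds * sample_rate_hz)


def binary_cells_for(seconds, sample_rate_hz=ASSUMED_SAMPLE_RATE_HZ):
    """Ordinary 0/1 cells needed to hold the same samples at 8 bits each."""
    return int(cells_for(seconds, sample_rate_hz) * bits_per_cell)


print(bits_per_cell)
print(cells_for(1.0), binary_cells_for(1.0))
```

Under those assumptions, one second of audio takes 8000 analog cells against 64000 binary ones, which is the 8:1 ratio the datasheet claims.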

A datasheet for one of the ChipCorder ICs better explains the process:
http://www.winbond-usa.com/products/isd_products/chipcorder/datasheets/4004/ISD4004_Rev1.2.pdf

One of the patents also has some information:
http://www.patentgenius.com/patent/7554844.html

dMajo · 2009-08-13 15:13

Ale said...
There was a SAM for C64 too (I think). I know nobody that bought it though, I didn't have a C64 back in its heyday either.

I can confirm it because I had it and its name was SAM Reciter. It was quite good with the English language but with Italian ...

EDIT: ... I forgot: the sound was very robotic (metallic)

Humanoido · 2009-08-13 15:23

I have the SP0256 working on a Basic Stamp and have used it in a multiple Stamp driven Toddler humanoid robot, and also had SAM working on the Apple IIe. As I remember, Software Automatic Mouth was one of my favorite programs and had numerous excellent features considering it was all in software.

BradC · 2009-08-13 15:32

Ale said...
There was a SAM for C64 too (I think). I know nobody that bought it though, I didn't have a C64 back in its heyday either.

There was indeed. Given the processor core was pretty much identical it was only a matter of porting the interfaces (user and HW) to make the emulation core work.
I must admit I'd given more than a passing thought to disassembling SAM and porting it to the Propeller. I got it in about '85. What's copyright these days? Still about 20 years? ;)

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
It's not particularly silly, is it?

natpie · 2009-08-13 15:32

dMajo said...
EDIT: ... I forgot: the sound was very robotic (metallic)

SAM was very flexible. You had quite a few settings you could change to create male and female voices. You could make it sound more robotic or more natural with just a few settings. It was also able to sing. I still have my Commodore and pull out SAM from time to time.

Agent420 · 2009-08-13 15:38

BradC said...
Ale said...
There was a SAM for C64 too (I think). I know nobody that bought it though, I didn't have a C64 back in its heyday either.

There was indeed. Given the processor core was pretty much identical it was only a matter of porting the interfaces (user and HW) to make the emulation core work.
I must admit I'd given more than a passing thought to disassembling SAM and porting it to the Propeller. I got it in about '85. What's copyright these days? Still about 20 years? ;)

I've given even more of a passing thought... I've downloaded SAM plus a C64 emulator and disassembler:

http://www.members.tripod.com/the-cbm-files/speak/
http://www.ccs64.com/

EDIT - I had some trouble getting that disc image to work, but found a good one here:
http://www.emuasylum.com/forums/showthread.php?t=31785

It's not really easy to always infer the intent of raw ML assembly however.

I don't think that hardware technology is the issue here so much as the algorithms and logic. The original developers are still around and quite prevalent, now known as SoftVoice (http://www.text2speech.com/).

Post Edited (Agent420) : 8/13/2009 4:01:53 PM GMT

BradC · 2009-08-13 15:48

Agent420 said...

It's not really easy to always infer the intent of raw ML assembly however.

I don't think that hardware technology is the issue here so much as the algorithms and logic. The original developers are still around and quite prevalent, now known as SoftVoice (http://www.text2speech.com/).

I've got a bit of experience reversing all sorts of assembler algorithms (Italian fuel injection for example) and I cut my teeth on the 6502 and trying to figure out how Woz did what he did prior to obtaining a listing of the Apple ROMS (thanks Beagle Brothers!).. so I figured SAM would not be too hard a target.

To be honest I was aiming more for brute force and ignorance. Translate the code rather than understand and re-write the algorithms.

SAM (the core anyway) was written in assembler, and like the Parallax Propeller Compiler, being written in assembler, it's a lot easier to read and understand the resultant disassembly than it is to read and understand something created from a high level compiler.

In any case, I put it on the back burner a year ago.. if I ever locate my Apple which has my SAM D-A card in it, I'll certainly give it some more thought.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
It's not particularly silly, is it?

hover1 · 2009-08-13 16:08

I still have a running Digital Equipment decTalk module from 1986. We used it in a CAM program that would call you when a job (photoplotting, PC drilling, etc) was finished. You could also call it and query on the status of the job.
It was also used for the robot voices on "Short Circuit" (except for Johnny 5 of course), and it was the device that Stephen Hawking used for many years to vocalize his thoughts.

Agent420 · 2009-08-13 16:08

To be honest I was aiming more for brute force and ignorance. Translate the code rather than understand and re-write the algorithms.

SAM (the core anyway) was written in assembler, and like the Parallax Propeller Compiler, being written in assembler, it's a lot easier to read and understand the resultant disassembly than it is to read and understand something created from a high level compiler.

The problem here is that the C64 version made use of the SID sound chips to create the audio, so you would have to decipher what timings, frequencies and filters it was applying to come up with something similar.

I was a C64 brat, and never spent a lot of time hacking Apples or Ataris, so I'm not sure how much different their versions may have been.

While we're on the subject, there is also some good info on computerized phonetics at this link, including some spectral analysis software...
http://www.fon.hum.uva.nl/praat/

This topic can become very complex quite quickly...

Agent420 · 2009-08-13 16:10

Btw, no discussion of synthetic speech would be complete without a reference to Sh!t Talker

http://unaesthetic.net/st/index.shtml

The Stephen Hawking reference reminded me of this free gadget.

blittled · 2009-08-13 16:32

Agent420 said...
I have a Devantech SP03 on hand (http://www.robot-electronics.co.uk/htm/Sp03doc.shtml), it appears very similar to the EMIC modules and is based on the same WTS701 chip, but uses the male version and has an additional I2C interface. I considered the EMIC but wanted a male voice for my project. Seems like these are no longer available however.

I found some Devantech SP03's for sale at http://www.junun.org/MarkIII/Info.jsp?item=31 but they are $105!

tronsnavy said...
I know that it required a special crystal

The SP0256 requires a 3.12MHz crystal but a colorburst 3.57MHz crystal works ok. I built a circuit using a SP0256-AL and a PIC16F84 and it worked well.
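For anyone wondering what the crystal swap does to the voice, here is a rough calculation. It assumes that all of the chip's timing scales linearly with its clock, so that both speaking rate and pitch rise by the same ratio; that is an assumption for the sketch, not something taken from the datasheet. The two frequencies are the rounded values given above.

```python
import math

SPEC_MHZ = 3.12        # crystal frequency the chip calls for (from the post)
COLORBURST_MHZ = 3.57  # colorburst crystal, rounded as in the post

# Assumption: speech rate and pitch both scale linearly with the clock.
speedup = COLORBURST_MHZ / SPEC_MHZ

# Express the pitch change in musical semitones (12 per octave).
semitones = 12 * math.log2(speedup)

print(f"speed-up factor: {speedup:.3f}")
print(f"pitch shift: {semitones:.1f} semitones")
```

That works out to speech roughly 14% fast and a little over two semitones sharp, which fits the observation that the colorburst crystal "works ok".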

tronsnavy: If you do hook up your SP0256-17 could you post the code? I was thinking of adding mine to my Propeller-based Boe-Bot.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
What electronics need - MORE POWER!!!!!!!

Beau Schwabe · 2009-08-13 17:32

S.A.M. was also available for the Atari, and it too was one of my favorite programs.

SAM used a set of phonemes to synthesize speech. This is quite different from the VocalTract engine that Chip created, which uses speech formants to slide from one transition to another rather than the cut-and-paste approach used with phonemes.
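To make the "slide" idea concrete, here is a minimal sketch that interpolates a pair of formant frequencies between two vowel targets instead of switching abruptly. The function name, the linear interpolation, and the formant values are all assumptions for illustration; this is not how VocalTract itself is driven, and the frequencies are only rough textbook-style figures.

```python
# Rough (F1, F2) formant targets in Hz. Illustrative values only.
VOWEL_AH = (700.0, 1100.0)
VOWEL_EE = (270.0, 2290.0)


def glide(start, end, steps):
    """Return `steps` frames moving linearly from `start` to `end`.

    Each frame is a tuple of formant frequencies. A cut-and-paste
    approach would instead jump straight from `start` to `end`.
    """
    if steps < 2:
        raise ValueError("need at least two frames for a glide")
    frames = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the first frame, 1.0 at the last
        frames.append(tuple(s + (e - s) * t for s, e in zip(start, end)))
    return frames


for f1, f2 in glide(VOWEL_AH, VOWEL_EE, 5):
    print(f"F1={f1:7.1f} Hz  F2={f2:7.1f} Hz")
```

Each frame would then be handed to whatever resonator or synthesizer produces the sound; the point is only that the targets are reached gradually.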

Chip has written an entire chapter to a book soon to be released explaining how to use speech formants with the VocalTract engine. (I will see if I can get his permission to post it)

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Beau Schwabe

IC Layout Engineer
Parallax, Inc.

Agent420 · 2009-08-13 17:41

^ That would be great. There is no doubt that a synthetic speech app is possible, but the 'recipe' to create good sounding phonemes and realistic inflection seems quite complex to determine.

Until then I will keep playing around, as well as investigating the rules-based text/phoneme conversion.

Agent420 · 2009-08-13 17:54

natpie said...
dMajo said...
EDIT: ... I forgot: the sound was very robotic (metallic)

SAM was very flexible. You had quite a few settings you could change to create male and female voices. You could make it sound more robotic or more natural with just a few settings. It was also able to sing. I still have my Commodore and pull out SAM from time to time.

Btw, you should check out the open source eSpeak app I linked in the first post... Using different phoneme settings you can create not only male and female voices, but also one with a British accent.

I imagine that aspect is also one of the powers of software synthesis, and if we can better apply the VocalTract elements you could theoretically create any voice with a Prop.

Beau Schwabe · 2009-08-14 03:11

Here is Chip's work on Speech synthesis using formants. This also by the way is part of what he was presenting at the UPNE Propeller EXPO ..

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Beau Schwabe

IC Layout Engineer
Parallax, Inc.

Agent420 · 2009-08-14 10:23

Thanks for posting that. It's an interesting read from a technical pov, but I guess I was hoping there would be a big lookup table for all the formant combinations, similar to what Phil has done with his Talk object. There's no free lunch :)

I am going to continue pursuing this using the PRAAT software I linked earlier. I have not had time to fully experiment with it, but it significantly expands on the simple Spectrum Display application that is included in Chip's documents, apparently performing formant analysis and identifying their frequencies.

I guess if this was all too simple there wouldn't be as much satisfaction getting it to work ;-)

tronsnavy · 2009-08-14 11:14

blittled,

Sorry I did not get back with you sooner... got real busy at work and had friends in from out of town. Anyway, I did get a chance last night to see if I still had the SP0256-017... sure enough, it was in my junk box. Right where I put it, 27 years ago. I also did a quick search on the internet for info... found the original booklet. I will build the ckt this weekend. I will try and write some code this weekend too, but Spin is ambiguous at times and I only have about a month of training. I'm thinking about generating an array for the 70 or so allophones (in binary, for sending to the i/o pins). This will make it easy to generate loops and call only the elements needed for specific words and sentences. I will post the code (and schematic) when complete.
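The lookup-table idea described above might look something like this, written in Python rather than Spin just to show the structure. The allophone names and numeric codes are placeholders for illustration and must be checked against the booklet for the actual chip; the 6-bit address width is likewise an assumption.

```python
# Allophone name -> address code. PLACEHOLDER values for illustration;
# verify every code against the booklet for the chip actually in use.
ALLOPHONES = {
    "PA1": 0x00,  # short pause
    "EH": 0x07,
    "AX": 0x0F,
    "HH1": 0x1B,
    "LL": 0x2D,
    "OW": 0x35,
}

ADDRESS_BITS = 6  # assumed width of the address bus


def pin_levels(code, width=ADDRESS_BITS):
    """Split a code into individual pin levels, least significant bit first."""
    return [(code >> bit) & 1 for bit in range(width)]


def word_to_pins(names):
    """Look up each allophone and return the pin pattern for each one."""
    return [pin_levels(ALLOPHONES[name]) for name in names]


# A word is then just a list of allophone names to loop over.
HELLO = ["HH1", "EH", "LL", "AX", "OW", "PA1"]

for name, pins in zip(HELLO, word_to_pins(HELLO)):
    print(f"{name:4s} -> {pins}")
```

On real hardware each pattern would be written to the I/O pins and latched with the chip's load strobe, waiting for the chip to signal it is ready before sending the next one.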

I see that you have your Prop hooked up to Boe... how's it going? I am going to do the same thing, once I gain more knowledge on Spin. I have Ping... going to buy "basic boe" next. Have a good weekend.

Bob (transnavy)

Agent420 · 2009-08-14 14:58

It wasn't immediately obvious to me, but I now see that Chip's VocalTract is based on the Klatt Synthesizer (should have read Phil's Talk object comments more thoroughly and picked up on the Klatt reference).

This looks as though it is a complex field
