Meet the CHICAS and Let Them Talk
2016-06-27
By
David "DeMO" Martínez Oliveira

The first MIA component is out. It is called CHICAS and it is MIA's voice. An artificial intelligence has to have a voice... hasn't it? A voice is nice, but what about four voices? That won't just be nice... that will be awesome! Seriously now, the idea behind this project is to explore the use of TTS systems in the area of HMI (Human Machine Interfaces).
The use of Text To Speech (TTS hereinafter) as a means of interfacing humans and machines is well known and already in use nowadays. Despite how simple the technology is (you just provide a text to some software that speaks it out), a proper implementation is tricky. Speaking machines can become annoying very easily.

Some effort needs to be put into the interface, so the machine talks only when it has to. In other words, using TTS technology to speak out the system log would be a bad idea.

Within the context of the MIA project, a TTS interface is mandatory. I really want my artificial intelligence to talk to me. And I want it to talk to me at any time and any place. Something like the talking spacecraft in the movies: they magically know where you are and they talk to you through the proper channel, be it your communication device, a nearby console, or a voice in the air.

On top of that, I want to explore an idea that popped up some years ago: the concept of using multiple personalities to interact with the system. At first glance it looks like an interesting psychological concept. We tend to listen differently to different people... at least that is what I think.

Based on these vague requirements, let me introduce the very first MIA module, which I have named CHICAS. By the way, CHICAS is Spanish for girls, and you will soon find out why it is named that way.

CHICAS Specification

Based on what I said in the previous section, I will specify CHICAS as a small distributed application that can be deployed on different devices/platforms. This will allow the "voice" to be distributed across large premises, using multiple nodes physically spread over a given space.

A CHICAS node will just speak out sentences as commanded by somebody else: a higher-level control application that will determine which node has to be used to carry out a given interaction.

Finally, in addition to speaking out messages, a CHICAS node shall be able to switch voices at run-time, so I can try out the idea of having multiple personalities/entities within the same system.
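To make the specification more concrete, here is a minimal sketch of such a node in Python. This is not the actual CHICAS code (which is built on Nyx in C), and the line-based SAY/VOICE protocol is entirely made up for illustration; it just shows the two required behaviours: speaking commanded sentences and switching voices at run-time.

```python
# Hypothetical sketch of a CHICAS-style node. The SAY/VOICE protocol,
# port number and function names are invented for illustration; the
# real implementation uses Nyx and its own wire format.
import socket
import subprocess
import tempfile

DEFAULT_VOICE = "en-GB"  # a pico2wave language/voice identifier

def parse_command(line):
    """Split an incoming line into (COMMAND, argument)."""
    cmd, _, arg = line.strip().partition(" ")
    return cmd.upper(), arg

def speak(text, voice):
    """Render `text` with pico2wave and play the resulting WAV."""
    with tempfile.NamedTemporaryFile(suffix=".wav") as wav:
        subprocess.run(["pico2wave", "-l", voice, "-w", wav.name, text],
                       check=True)
        subprocess.run(["aplay", wav.name], check=True)

def serve(host="0.0.0.0", port=5000):
    """Accept connections and process SAY/VOICE commands line by line."""
    voice = DEFAULT_VOICE
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    while True:
        conn, _ = srv.accept()
        with conn, conn.makefile() as f:
            for line in f:
                cmd, arg = parse_command(line)
                if cmd == "VOICE":   # switch personality at run-time
                    voice = arg
                elif cmd == "SAY":   # speak a commanded sentence
                    speak(arg, voice)

if __name__ == "__main__":
    serve()
```

The higher-level control application would then open a TCP connection to the chosen node and send it these one-line commands.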

For the time being, I'm not going any further. As the MIA spec evolves, additional functions will have to be added: for instance, discovery capabilities, or the ability to associate a location with each node so the one closest to the user can be selected.

CHICAS Implementation

For the implementation (you can find it on my GitHub if you want to try it out) I have chosen to use Nyx (this is what I'm building it for :). The current Nyx implementation is enough for this simple project; it just takes a couple of lines to get a TCP server up and running.

For the speech synthesiser there are many options. Festival has quite good quality but, as far as I remember, it was a bit heavy. eSpeak is OK and a lot lighter; it also provides quite a few different voices to try out for my little experiment. However, I finally decided on picoTTS.

picoTTS provides, at least to my taste, more natural voices than eSpeak. It was developed for Android devices, so it is pretty light and performs well. Right now, the interface between CHICAS and picoTTS is an external process that runs the pico2wave tool to produce the audio file to play. This approach, though a bit primitive, makes switching to another TTS (for instance eSpeak) a matter of changing a string in the application.
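The external-process approach can be sketched like this. The command templates below use the real pico2wave and espeak flags, but the table-driven wrapper and its function names are my own illustration, not the actual CHICAS code:

```python
# Sketch: when the TTS engine is just an external command line,
# swapping engines is a matter of changing one template entry.
# ENGINES, build_tts_command and synthesise are illustrative names.
import subprocess

# Each engine maps to a command template; {voice}, {wav} and {text}
# are filled in before execution.
ENGINES = {
    "pico":   ["pico2wave", "-l", "{voice}", "-w", "{wav}", "{text}"],
    "espeak": ["espeak",    "-v", "{voice}", "-w", "{wav}", "{text}"],
}

def build_tts_command(engine, voice, wav, text):
    """Instantiate the template for one engine into a runnable argv."""
    return [arg.format(voice=voice, wav=wav, text=text)
            for arg in ENGINES[engine]]

def synthesise(engine, voice, wav, text):
    """Run the external TTS tool to produce a WAV file."""
    subprocess.run(build_tts_command(engine, voice, wav, text), check=True)
```

Going from picoTTS to eSpeak is then just a different key (or template) in the table, which matches the "change a string" claim above.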

CHICAS Hardware

In order to cover large premises, I have to be able to deploy CHICAS on many network-connected devices. For a standard home, for instance, you may want one device per room... it will depend on your house layout, but that looks reasonable.

So I tried the system on a couple of platforms. It works fine on all the devices I've tried, but in the end there was only one device that fit the bill, at least for the moment. Let's go through the different options I tried:

  • PC. That works fine without any issue; however, PCs are bulky and expensive compared to the other options.
  • Raspberry Pi. I have a standard Pi B and a Pi 2. At first glance they looked OK, as the RPi has an analogue audio output to directly connect some external speakers. Unfortunately, that output has a lot of noise that becomes annoying within a few minutes. I tested both models and the result was the same in both cases. Lowering the volume helps a bit, but I finally discarded them.
  • Banana Pi. After the failure of the RPi, I tried a BPi I had bought some time ago to test, and the analogue audio output on this one worked as expected. It is not fantastic, but there is no annoying background noise in the speakers. The BPi also has a couple of extra goodies on it.

It has a built-in microphone to capture audio for speech recognition, or as a means to detect presence/activity in the room. I just captured some audio and it seems to work OK; however, it looks like you have to be pretty close to the mic. I still have to play with the mixer a bit.

It also has an IR sensor already connected. I do not know whether it will be useful, but the thing is there.

The alternative to any of those is an external USB audio card; in that case there are plenty of possibilities. At some point I have to explore this option further, as the Arietta G25 looks like a good candidate, with a WiFi interface and a very small size and power footprint.

Check it out

So, this is it for now. You can download the code from GitHub and try it out yourself, or you can check the YouTube video below. Unfortunately, I had to push the volume to the maximum, and those cheap speakers distorted the sound quite a lot.

Since I recorded the video, I have upgraded the setup a bit with new speakers and a box for the BPi. Below is a picture of the current system... which I'm currently using to be warned when long tasks finish (downloads, calculations, ...).

The system right now looks like this:

Second CHICAS Node

Want to try CHICAS?... Just grab the code from my github: https://github.com/picoflamingo/mia

It requires Nyx and SVOX (actually picoTTS).

So, now you have met the CHICAS... Get ready for the next component!

CU

Big thanks to Vik for giving us permission to use the great header picture. This picture is copyright of Vik, and it is not under the general CC-BY license of this site.

Header Image Credits: MilliganVick