Amazon Polly Text To Speech With aws-cli and Twilio

Today, Amazon announced a new web service named Amazon Polly, which converts text to speech in a number of languages and voices.

Polly is trivial to use for basic text to speech, even from the command line. Polly also has features that allow for more advanced control of the resulting speech including the use of SSML (Speech Synthesis Markup Language). SSML is familiar to folks already developing Alexa Skills for the Amazon Echo family.

This article describes some simple fooling around I did with this new service.

Deliver Amazon Polly Speech By Phone Call With Twilio

I’ve been meaning to develop some voice applications with Twilio, so I took this opportunity to test Twilio phone calls with speech generated by Amazon Polly. The result sounds a lot better than the default Twilio-generated speech.

The basic approach is:

  1. Generate the speech audio using Amazon Polly.

  2. Upload the resulting audio file to S3.

  3. Trigger a phone call with Twilio, pointing it at the audio file to play once the call is connected.

Here are some sample commands to accomplish this:

1- Generate Speech Audio With Amazon Polly

Here’s a simple example of how to turn text into speech, using the latest aws-cli:

text="Hello. This speech is generated using Amazon Polly. Enjoy!"
audio_file=speech.mp3

aws polly synthesize-speech \
  --output-format "mp3" \
  --voice-id "Salli" \
  --text "$text" \
  $audio_file

You can listen to the resulting output file using your favorite audio player:

mpg123 -q $audio_file

2- Upload Audio to S3

Create or re-use an S3 bucket to store the audio files temporarily.

s3bucket=YOURBUCKETNAME
aws s3 mb s3://$s3bucket

Upload the generated speech audio file to the S3 bucket. I use a long, random key for a touch of security:

s3key=audio-for-twilio/$(uuid -v4 -FSIV).mp3
aws s3 cp --acl public-read $audio_file s3://$s3bucket/$s3key

For easy cleanup, you can use a bucket with a lifecycle that automatically deletes objects after a day or thirty. See instructions below for how to set this up.

3- Initiate Call With Twilio

Once you have set up an account with Twilio (see pointers below if you don’t have one yet), here are sample commands to initiate a phone call and play the Amazon Polly speech audio:

from_phone="+1..." # Your Twilio allocated phone number
to_phone="+1..."   # Your phone number to call

TWILIO_ACCOUNT_SID="..." # Your Twilio account SID
TWILIO_AUTH_TOKEN="..."  # Your Twilio auth token

speech_url="http://s3.amazonaws.com/$s3bucket/$s3key"
twimlet_url="http://twimlets.com/message?Message%5B0%5D=$speech_url"

curl -XPOST https://api.twilio.com/2010-04-01/Accounts/$TWILIO_ACCOUNT_SID/Calls.json \
  -u "$TWILIO_ACCOUNT_SID:$TWILIO_AUTH_TOKEN" \
  --data-urlencode "From=$from_phone" \
  --data-urlencode "To=to_phone" \
  --data-urlencode "Url=$twimlet_url"

The Twilio web service will return immediately after queuing the phone call. It could take a few seconds for the call to be initiated.

Make sure you listen to the phone call as soon as you answer, as Twilio starts playing the audio immediately.

The ringspeak Command

For your convenience (actually for mine), I’ve put together a command line program that turns all the above into a single command. For example, I can now type things like:

... || ringspeak --to +1NUMBER "Please review the cron job failure messages"

or:

ringspeak --at 6:30am \
  "Good morning!" \
  "Breakfast is being served now in Venetian Hall G.." \
  "Werners keynote is at 8:30."

Twilio credentials, default phone numbers, S3 bucket configuration, and Amazon Polly voice defaults can be stored in a $HOME/.ringspeak file.

Here is the source for the ringspeak command:

https://github.com/alestic/ringspeak

Tip: S3 Setup

Here is a sample commands to configure an S3 bucket with automatic deletion of all keys after 1 day:

aws s3api put-bucket-lifecycle \
  --bucket "$s3bucket" \
  --lifecycle-configuration '{
    "Rules": [{
        "Status": "Enabled",
        "ID": "Delete all objects after 1 day",
        "Prefix": "",
        "Expiration": {
          "Days": 1
        }
  }]}'

This is convenient because you don’t have to worry about knowing when Twilio completes the phone call to clean up the temporary speech audio files.

Tip: Twilio Setup

This isn’t the place for an entire Twilio howto, but I will say that it is about this simple to set up:

  1. Create a Twilio account

  2. Reserve a phone number through Twilio.

  3. Find the ACCOUNT SID and AUTH TOKEN for use in Twilio API calls.

When you are using the Twilio free trial, it requires you to verify phone numbers before calling them. To call arbitrary numbers, enter your credit card and fund the minimum of $20.

Twilio will only charge you for what you use (about a dollar a month per phone number, about a penny per minute for phone calls, etc.).

Closing

A lot is possible when you start integrating Twilio with AWS. For example, my daughter developed an Alexa skill that lets her speak a message for a family member and have it delivered by phone. Alexa triggers her AWS Lambda function, which invokes the Twilio API to deliver the message by voice call.

With Amazon Polly, these types of voice applications can sound better than ever.