Integrating ElevenLabs AI Text to Speech with a Headless CMS
In this tutorial, we’ll explore how to generate audio versions of your content entries using Storyblok and ElevenLabs. ElevenLabs is an innovative technology company specializing in realistic and natural-sounding AI-generated voices. Using ElevenLabs’ text-to-speech API in combination with Storyblok’s webhooks and APIs, we’ll create a serverless function that automatically produces an MP3 file hosted in Storyblok’s built-in DAM. This approach is technology-agnostic, allowing you to serve high-quality narrated content across different channels, improving the user experience and accessibility of your digital offerings.
Are you curious to jump right into the code? Check out the GitHub repository containing the complete logic required in the serverless function.
Requirements
In order to follow this tutorial, please make sure you meet these requirements:
- A basic understanding of JavaScript and TypeScript, serverless functions, and webhooks
- A Storyblok account
- An ElevenLabs account
- A Netlify account
- Netlify CLI installed
- Node.js LTS installed
Creating the content model in Storyblok
First of all, let’s set up a suitable component schema in Storyblok. If you’re unfamiliar with this step, please refer to our documentation. For any content type of your choice, create an Asset field with the technical name `audio`, configured to accept only audio files. And that’s it! As we will crawl your production website to retrieve the content as it is delivered to the user, this designated field is all we need to proceed.
Creating the serverless function
In a blank local project initialized with npm, run `netlify init` followed by `netlify functions:create`. Pick Serverless function (Node/Go) and TypeScript from the options. We can use the basic [hello-world] example as a template. Let’s call the function `text-to-speech`. After the installation has been completed, you’ll find the generated boilerplate code in `netlify/functions/text-to-speech/text-to-speech.ts`.
Let’s create a few additional folders so that the project has the following folder structure:
```
netlify
└── functions
    └── text-to-speech
src
└── lib
    ├── elevenlabs
    └── storyblok
```
As you can see, we’ll manage the relevant logic for ElevenLabs and Storyblok in two respective folders. Before aggregating the final serverless function, let’s therefore create and discuss these separately.
Before moving on, let’s install all required additional dependencies:
```shell
npm i elevenlabs storyblok-js-client cheerio dotenv formdata-node
```
Also, let’s make sure to set `"type": "module"` in the `package.json`.
Retrieving an audio stream from ElevenLabs
Now, let’s create `src/lib/elevenlabs/text_to_speech_stream.ts` with the following content:
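A minimal sketch of this file, assuming the `ElevenLabsClient` from the `elevenlabs` package and placeholder voice and model choices (`Rachel` and `eleven_multilingual_v2` here, both of which you can swap out):

```typescript
// src/lib/elevenlabs/text_to_speech_stream.ts
import { ElevenLabsClient } from "elevenlabs";
import * as dotenv from "dotenv";

dotenv.config();

const { ELEVENLABS_API_KEY } = process.env;

if (!ELEVENLABS_API_KEY) {
  throw new Error("Missing ELEVENLABS_API_KEY in environment variables");
}

const client = new ElevenLabsClient({ apiKey: ELEVENLABS_API_KEY });

// Generates an audio stream for the provided text and collects it into a buffer
export const createAudioStreamFromText = async (
  text: string,
): Promise<Buffer> => {
  const audioStream = await client.generate({
    voice: "Rachel", // placeholder voice
    model_id: "eleven_multilingual_v2", // placeholder model
    text,
  });

  const chunks: Buffer[] = [];
  for await (const chunk of audioStream) {
    chunks.push(chunk);
  }

  return Buffer.concat(chunks);
};
```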
ElevenLabs conveniently provides a JavaScript client. After checking whether the ElevenLabs API key has been provided as an environment variable, the client is initialized and the `createAudioStreamFromText` function is exported. This function accepts a string as a parameter, allowing us to pass text content from Storyblok at a later stage. Using the `generate` method, an audio stream of the text content is created and returned. It is possible to customize the output by selecting a voice and a model.
You could create `voice` and `model` fields in your content model to allow content creators to influence the AI-generated audio output per story.
Let’s export the function by creating `src/lib/elevenlabs/index.ts` with the following content:
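A one-line barrel file is enough here:

```typescript
// src/lib/elevenlabs/index.ts
export { createAudioStreamFromText } from "./text_to_speech_stream";
```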
Handling the content entry and uploading the audio file to Storyblok
In the `src/lib/storyblok` folder, we’ll need to provide the code to:
- retrieve and process the content of a published or modified Storyblok story to hand it over to the previously created `createAudioStreamFromText` function
- upload the generated audio file as an asset in the Storyblok space and assign it to the `audio` asset field of the relevant story
For both operations, we’ll require the Storyblok JavaScript client. We can initialize the client in a dedicated `src/lib/storyblok/client.ts` file:
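A sketch of this file, assuming the token is exposed as a `STORYBLOK_PERSONAL_ACCESS_TOKEN` environment variable (the variable name is an assumption):

```typescript
// src/lib/storyblok/client.ts
import StoryblokClient from "storyblok-js-client";
import * as dotenv from "dotenv";

dotenv.config();

const { STORYBLOK_PERSONAL_ACCESS_TOKEN } = process.env;

if (!STORYBLOK_PERSONAL_ACCESS_TOKEN) {
  throw new Error("Missing STORYBLOK_PERSONAL_ACCESS_TOKEN in environment variables");
}

// A personal access token (oauthToken) is required for Management API calls
export const client = new StoryblokClient({
  oauthToken: STORYBLOK_PERSONAL_ACCESS_TOKEN,
});
```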
Similarly to the ElevenLabs counterpart, we’ll make sure that the access token is provided before initializing the client. For our purposes, the client needs to be initialized with a personal access token in order to be able to use Storyblok’s Management API.
Hereafter, we can proceed to retrieve and process the content. In this example, we’ve chosen to use Cheerio in order to crawl and parse the content as it is delivered to the audience, directly from the production environment. The advantage of this approach is that the logic that aggregates the story content in the frontend does not have to be reproduced in the serverless function (which is particularly a concern when dealing with complex stories with many nested components).
Should your project not have any crawlable production environment, it is also possible to fetch and aggregate the story content directly in the serverless function.
Let’s create `src/lib/storyblok/get_story_content.ts` with the following content:
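A possible implementation, assuming a `PRODUCTION_DOMAIN` environment variable and placeholder CSS selectors (`h1` and `article`) that you would adapt to your frontend:

```typescript
// src/lib/storyblok/get_story_content.ts
import * as cheerio from "cheerio";
import { client } from "./client";

const { PRODUCTION_DOMAIN } = process.env;

// Fetches the full story object via the Management API and returns its full_slug
export const getStoryUrl = async (
  spaceId: number,
  storyId: number,
): Promise<string> => {
  const response = await client.get(`spaces/${spaceId}/stories/${storyId}`);
  return response.data.story.full_slug;
};

// Crawls the production page and returns the text to be narrated
export const getStoryContent = async (
  spaceId: number,
  storyId: number,
): Promise<string> => {
  const fullSlug = await getStoryUrl(spaceId, storyId);
  const page = await fetch(`${PRODUCTION_DOMAIN}/${fullSlug}`);
  const $ = cheerio.load(await page.text());

  // Adapt these selectors to the DOM of your production environment
  const titleSelector = "h1";
  const bodySelector = "article";

  const title = $(titleSelector).first().text().trim();
  const body = $(bodySelector).text().trim();

  // Instruct ElevenLabs to pause briefly after reading the headline
  return `${title} <break time="1.0s" /> ${body}`;
};
```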
The first function, `getStoryUrl`, requires the parameters `spaceId` and `storyId` (which are included in the webhook payload), fetches the full story object, and returns the story’s `full_slug`. It uses the stories endpoint of the Management API.
In combination with the production domain (defined via an environment variable), we can dynamically construct the URL to be crawled using Cheerio, which occurs in the `getStoryContent` function. You can customize the `titleSelector` and `bodySelector` to match the DOM of your production environment.
Once the content has been loaded using the JavaScript Fetch API and parsed using Cheerio’s `load` method, the final output is improved by separating the main headline from the main section of the text and instructing ElevenLabs to pause after reading the headline using the `<break time="1.0s" />` tag.
Lastly, we need to take care of uploading the audio stream returned by the `createAudioStreamFromText` function as an asset to Storyblok. In order to accomplish this, let’s create `src/lib/storyblok/upload_asset.ts` with the following content:
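A sketch following Storyblok’s signed-upload flow (create an asset entry to obtain a signed response, post the file to the returned URL, then finalize the upload); the exact payload fields such as `post_url` and `pretty_url` should be checked against the Management API documentation:

```typescript
// src/lib/storyblok/upload_asset.ts
import { FormData, File } from "formdata-node";
import { client } from "./client";

export const uploadAsset = async (
  fileContent: Buffer,
  storyId: number,
  spaceId: number,
): Promise<void> => {
  // 1. Create a new asset entry to obtain a signed upload response
  const signedResponse = (
    await client.post(`spaces/${spaceId}/assets/`, {
      filename: "audio.mp3",
    } as any)
  ).data;

  // 2. Upload the file to the signed URL
  const form = new FormData();
  for (const [key, value] of Object.entries(signedResponse.fields)) {
    form.set(key, value as string);
  }
  form.set("file", new File([fileContent], "audio.mp3"));
  await fetch(signedResponse.post_url, { method: "POST", body: form as any });

  // 3. Finalize the upload
  await client.get(`spaces/${spaceId}/assets/${signedResponse.id}/finish_upload`);

  // 4. Fetch the story, swap the asset referenced in the audio field, and save it
  const { story } = (await client.get(`spaces/${spaceId}/stories/${storyId}`)).data;
  const previousAssetId = story.content.audio?.id;
  story.content.audio = {
    id: signedResponse.id,
    filename: signedResponse.pretty_url,
    fieldtype: "asset",
  };
  await client.put(`spaces/${spaceId}/stories/${storyId}`, { story } as any);

  // 5. Remove the previously referenced asset, if any
  if (previousAssetId) {
    await client.delete(`spaces/${spaceId}/assets/${previousAssetId}`, {} as any);
  }
};
```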
The `uploadAsset` function accepts three parameters: the file content retrieved from ElevenLabs, and the story and space IDs included in the webhook payload. First of all, a new asset entry is created and a signed response from Storyblok’s API is returned. Once the asset upload has been completed, the whole story object is fetched in order to replace the asset referenced in the `audio` field and update the story. Lastly, the previous asset is removed.
Let’s export both functions by creating `src/lib/storyblok/index.ts` with the following content:
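As before, a small barrel file re-exports what the serverless function needs:

```typescript
// src/lib/storyblok/index.ts
export { getStoryUrl, getStoryContent } from "./get_story_content";
export { uploadAsset } from "./upload_asset";
```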
Finalizing and deploying
The last step is to replace the code of the dummy serverless function located under `netlify/functions/text-to-speech/text-to-speech.ts` with the following:
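Pulling the pieces together, the handler could look like this (assuming the webhook payload carries `space_id` and `story_id`, and adjusting the relative import paths to your project layout):

```typescript
// netlify/functions/text-to-speech/text-to-speech.ts
import type { Handler } from "@netlify/functions";
import { createAudioStreamFromText } from "../../../src/lib/elevenlabs/index.js";
import { getStoryContent, uploadAsset } from "../../../src/lib/storyblok/index.js";

export const handler: Handler = async (event) => {
  // Only accept POST requests, as sent by the Storyblok webhook
  if (event.httpMethod !== "POST") {
    return { statusCode: 405, body: "Method Not Allowed" };
  }

  try {
    // Storyblok's webhook payload includes the space and story IDs
    const { space_id, story_id } = JSON.parse(event.body ?? "{}");

    const content = await getStoryContent(space_id, story_id);
    const audio = await createAudioStreamFromText(content);
    await uploadAsset(audio, story_id, space_id);

    return { statusCode: 200, body: "Audio generated and uploaded." };
  } catch (error) {
    return { statusCode: 500, body: `Something went wrong: ${error}` };
  }
};
```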
Here, we’ll use all of the functions created previously. After verifying that a `POST` request to this endpoint is made, the webhook payload is parsed into a JavaScript object, allowing us to provide the space and story IDs as parameters for `getStoryContent`. Subsequently, the crawled content is provided as a string parameter for `createAudioStreamFromText` in order to retrieve the audio stream from ElevenLabs. Finally, the generated audio is uploaded to Storyblok and linked to the story using `uploadAsset`.
For security reasons, you would want to use a webhook secret and verify the signature in a production scenario, preventing anyone else from invoking your serverless function.
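As a sketch, assuming the signature is an HMAC-SHA1 hex digest of the raw request body sent in a `webhook-signature` header (verify the exact scheme against the webhook documentation), such a check could look like this:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Returns true if the received signature matches the HMAC-SHA1 digest
// of the raw body computed with the shared webhook secret
export const isValidSignature = (
  body: string,
  signature: string,
  secret: string,
): boolean => {
  const expected = createHmac("sha1", secret).update(body).digest("hex");
  if (expected.length !== signature.length) {
    return false;
  }
  // Constant-time comparison to avoid leaking information about the digest
  return timingSafeEqual(Buffer.from(expected), Buffer.from(signature));
};
```

In the handler, you would read the header from `event.headers` and return a `401` response on mismatch before doing any work.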
Now, we can conveniently deploy the serverless function by running `netlify build` followed by `netlify deploy --prod`. Don’t forget to add your environment variables to the Netlify project. After a successful deployment, a link to the function logs of the Netlify project is provided; from there, we can copy the URL of the endpoint.
In case the articles are very long and the serverless function times out, you could consider alternative backend solutions such as Netlify’s Background Functions.
Configuring the webhook in Storyblok
As we’d like to automatically generate and attach a new audio file whenever new or changed content is published, we’ll have to configure a webhook that gets fired whenever this event occurs. Therefore, in our Storyblok space, let’s head to Settings > Webhooks and create a New Webhook. Let’s name this webhook “Generate Audio via ElevenLabs” and paste the endpoint copied from Netlify beforehand. In the Triggers section, we’ll have to check Story published.
Please refer to our webhook documentation to learn more.
And that’s it! Try publishing any story, and you should see that a generated audio file has been added to the story shortly after. You can also check the Netlify function logs to confirm that the function was invoked and ran correctly.
Now, you can fetch the audio file directly from Storyblok’s Asset CDN and implement it in your digital offerings to provide state-of-the-art, AI-driven audio versions of your content.
Take this barebones example and feel free to customize it as required for your project. For example, you could provide additional customization and control options for editors, add more robust error handling, check whether the request is for a story from a particular folder or of a specific content type, and much more.