May 10, 2019

Building a Voice to Text Search

To build a voice search experience, your application needs three things:

  • an input
  • an output
  • and, in the middle, fulfillment

Algolia sits in the middle, in the fulfillment layer. Before looking at that, let’s look at what you need for the outer layers of the round trip.

Input

The speech to text (STT) layer

This is where your user speaks to your application and their speech is converted into text. Algolia only handles text queries, not audio, so you must have a speech to text (STT) layer in front of it.

If you’re building on top of a voice-first platform, like Alexa or Google Assistant, then you get built-in speech to text. This is also true today inside the Chrome browser (through the Web Speech API) and in iOS and Android native apps. For all other web-based applications, you’ll need an STT service; some options are Google Cloud Speech to Text, Azure Cognitive Services, and AssemblyAI. You send the user’s speech to the STT service, receive the transcribed text back, and then send that text to Algolia as a search query.

Steps

  1. Add a speech to text (STT) layer

    • With browser (Chrome only), native app, or voice platform tooling
    • With third-party service, such as:
      • Google Cloud Speech to Text
      • Azure Cognitive Services
      • AssemblyAI
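
The input half of the round trip described above can be sketched in TypeScript. This is a minimal sketch, not an Algolia or vendor API: `SpeechToText`, `SearchFn`, and `voiceSearch` are hypothetical names, and the fake implementations stand in for a real STT layer (built-in or third-party) and a real Algolia search call.

```typescript
// Hypothetical interface over whichever STT layer you use: the browser,
// a native app, a voice platform, or a third-party service.
interface SpeechToText {
  transcribe(audio: Uint8Array): Promise<string>;
}

// Hypothetical search call; in a real app this would query Algolia.
type SearchFn<Hit> = (query: string) => Promise<Hit[]>;

// Sends the user's speech to the STT layer, then forwards the
// transcribed text to search as an ordinary text query.
async function voiceSearch<Hit>(
  stt: SpeechToText,
  search: SearchFn<Hit>,
  audio: Uint8Array,
): Promise<{ transcript: string; hits: Hit[] }> {
  const transcript = (await stt.transcribe(audio)).trim();
  // If nothing was recognized, skip the search instead of sending an empty query.
  if (transcript === "") {
    return { transcript, hits: [] };
  }
  const hits = await search(transcript);
  return { transcript, hits };
}

// Usage with fake implementations standing in for real services.
const fakeStt: SpeechToText = { transcribe: async () => "  red running shoes " };
const fakeSearch: SearchFn<string> = async (q) => [`result for "${q}"`];

voiceSearch(fakeStt, fakeSearch, new Uint8Array(0)).then((r) => {
  console.log(r.transcript, r.hits);
});
```

Keeping the STT layer behind an interface means you can swap the browser recognizer for a third-party service without touching the search code.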

Output

Speech Synthesis

Not every voice application needs speech synthesis, also called text to speech (TTS). A mobile website, for example, might only need to display the search results. If you do need it, your choices are again either built-in or third party. Voice-first platforms have their own speech synthesis, of course, and all major modern browsers support speech synthesis through the SpeechSynthesis API. If you want a wider choice of voices, you can use Azure Cognitive Services or AWS Polly.

Steps

  1. Determine whether speech synthesis is necessary (it might not be if the experience isn’t a conversation)

  2. Implement speech synthesis

    • With browser, native app, or voice platform tooling
    • With third-party services, such as:
      • Azure Cognitive Services
      • AWS Polly
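
The two steps above can be sketched as follows, assuming a hypothetical `TextToSpeech` interface that wraps whichever engine you choose and a hypothetical `Hit` record with a `name` field. `summarizeForSpeech` and `respond` are illustrative names, not part of any library, and the wording of the spoken summary is an arbitrary example.

```typescript
// Hypothetical interface over whichever TTS engine you choose. In a browser,
// speak() could call window.speechSynthesis.speak(new SpeechSynthesisUtterance(text)).
interface TextToSpeech {
  speak(text: string): void;
}

// Hypothetical record shape returned by the search.
interface Hit {
  name: string;
}

// Turns search results into a short sentence suitable for speaking aloud.
function summarizeForSpeech(hits: Hit[], maxSpoken = 3): string {
  if (hits.length === 0) {
    return "Sorry, I couldn't find anything.";
  }
  if (hits.length === 1) {
    return `I found one result: ${hits[0].name}.`;
  }
  const names = hits.slice(0, maxSpoken).map((h) => h.name);
  return `I found ${hits.length} results. Here are the top ${names.length}: ${names.join(", ")}.`;
}

// Step 1: only speak when the experience is conversational; otherwise the
// results are simply displayed. Step 2: hand the summary to the TTS layer.
function respond(hits: Hit[], conversational: boolean, tts: TextToSpeech): void {
  if (!conversational) {
    return;
  }
  tts.speak(summarizeForSpeech(hits));
}

// Usage: a console-backed stand-in for a real TTS engine.
const consoleTts: TextToSpeech = { speak: (text) => console.log(`[TTS] ${text}`) };
respond([{ name: "Red running shoes" }, { name: "Trail shoes" }], true, consoleTts);
```

Speaking only the first few results keeps the spoken response short; the full list can still be shown on screen where one exists.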

Fulfillment

The fulfillment is the business logic code that powers your application or website. Algolia is the part of the fulfillment that retrieves the relevant content to display to the user, in the same way that Algolia is one part of a text-based website or application today.

There are two parts to the Algolia fulfillment:

  • query time settings
  • index settings

Query Time Settings

  • Set removeStopWords to the two-letter code of the language used (e.g. en)

    • This will pull out words like “a,” “an,” or “the” that don’t add value to the query
  • Send the entire query string along as optionalWords (no need to split the words)

    • When searching conversationally, searchers might add words that won’t be in any of the records. Marking all of the words as optional means that records don’t need to match all of the words, but records matching more words will rank higher than those matching fewer.
  • Set ignorePlurals to true, or to the two-letter code of the language used (e.g. en)

    • This makes words like “car” and “cars” equivalent
  • Send analyticsTags including a voice tag

    • This lets you separate voice queries from typed queries in your analytics
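
The query time settings above can be collected into one parameters object per voice query. This is a sketch under assumptions: `voiceQueryParams` and `VoiceQueryParams` are hypothetical names, and the list form of `removeStopWords` and `ignorePlurals` (an array of language codes) is assumed to match what your API client expects, so check your client’s types before relying on it.

```typescript
// Hypothetical shape for the query time settings listed above.
interface VoiceQueryParams {
  removeStopWords: string[];
  ignorePlurals: string[];
  optionalWords: string;
  analyticsTags: string[];
}

// Builds query time settings for a transcribed voice query.
function voiceQueryParams(
  query: string,
  language = "en",
  extraTags: string[] = [],
): VoiceQueryParams {
  return {
    // Remove stop words such as "a", "an", and "the" for this language.
    removeStopWords: [language],
    // Treat singular and plural forms ("car", "cars") as equivalent.
    ignorePlurals: [language],
    // Pass the entire query string, unsplit, so every word is optional.
    optionalWords: query,
    // Tag the search so voice queries can be told apart in analytics.
    analyticsTags: ["voice", ...extraTags],
  };
}

console.log(voiceQueryParams("show me the red shoes"));
```

With the v4 JavaScript API client, for example, the object could be passed as the second argument of `index.search(query, voiceQueryParams(query))`.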

Index Settings
