May 10, 2019

Building a Voice to Text Search

Input

The speech to text (STT) layer

This is where your user speaks to your application and their speech is converted to text. Algolia only handles text queries, so a voice search needs a speech to text (STT) layer in front of it.

If you’re building on top of a voice-first platform, like Alexa or Google Assistant, then you get built-in speech to text. This is also true today inside the Chrome browser, and in iOS and Android native apps. For all other web-based applications, you’ll need an STT service. Some options are Google Cloud Speech to Text, Azure Cognitive Services, or AssemblyAI. You send the user’s speech to the STT service, receive the transcribed text back, and then send that text to Algolia as a search query.
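
The flow described above (audio in, transcript out, transcript sent on as the query) can be sketched as follows. This is a minimal sketch, not a definitive implementation: `voice_search`, `Transcriber`, and `Searcher` are hypothetical names, and the two callables stand in for whichever STT service and search client you actually use.

```python
from typing import Callable, Dict, List

# Hypothetical type aliases: these are placeholders, not part of any real SDK.
Transcriber = Callable[[bytes], str]      # audio bytes -> recognized text
Searcher = Callable[[str], List[Dict]]    # query text -> list of hits


def voice_search(audio: bytes, transcribe: Transcriber, search: Searcher) -> List[Dict]:
    """Convert recorded speech to text, then run that text as a search query."""
    text = transcribe(audio).strip()
    if not text:
        # Nothing was recognized, so there is nothing to search for.
        return []
    return search(text)
```

Passing the STT and search steps in as functions keeps the pipeline independent of any one vendor, so swapping the STT provider does not touch the search code.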

Steps

Add a speech to text (STT) layer
- With browser (Chrome only), native app, or voice platform tooling
- With third-party service, such as:
  - Google Cloud Speech to Text
  - Azure Cognitive Services
  - AssemblyAI

Output

Speech Synthesis

Not all voice platforms need speech synthesis, or text to speech (TTS). On a mobile website, for example, showing the search results on screen might suffice. If you need it, your choices again are either built-in or third-party. Voice-first platforms have their own speech synthesis, of course, and all major, modern browsers support speech synthesis through the SpeechSynthesis API. If you want a wider choice of voices, you have Azure Cognitive Services or Amazon Polly.

Steps

Determine if speech synthesis is necessary (it might not be if the experience isn’t conversational)
Implement speech synthesis
- With browser, native app, or voice platform tooling
- With third-party services
  - Azure Cognitive Services
  - Amazon Polly

Fulfillment

The fulfillment is the business logic code that powers your application or website. Algolia will be the part of the fulfillment that brings up the relevant content to display to the user, much like Algolia is one part of your website or application today.

There are two parts to the Algolia fulfillment:

- query time settings
- index configurations

Query Time Settings

Set removeStopWords to the two-letter code of the language used (e.g. en)
- This will pull out words like “a,” “an,” or “the” that don’t add value to the query
Send the entire query string along as optionalWords (no need to split the words)
- When searching conversationally, searchers might add words that won’t be in any of the records. Marking all of the words as optional means that records don’t need to match all of the words, but records matching more words will rank higher than those matching fewer.
Set ignorePlurals to true or to the two-letter code of the language used (e.g. en)
- This makes words like “car” and “cars” equivalent
Send analyticsTags that include voice
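
The four settings above can be gathered into one set of search parameters. The sketch below only builds that parameter dictionary and sends nothing. `build_voice_search_params` is a hypothetical helper name, and passing the language as a one-element list and the whole transcript as a single optionalWords entry are assumptions about the accepted value shapes that you should check against the search parameter reference.

```python
from typing import Any, Dict


def build_voice_search_params(transcript: str, language: str = "en") -> Dict[str, Any]:
    """Build query-time parameters for a query that came from speech (hypothetical helper)."""
    query = transcript.strip()
    return {
        "query": query,
        # Drop words such as "a", "an", "the" for the given language.
        "removeStopWords": [language],
        # Send the whole query as optional words, unsplit, so records
        # do not have to match every spoken word.
        "optionalWords": [query],
        # Treat singular and plural forms ("car" / "cars") as equivalent.
        "ignorePlurals": [language],
        # Tag the search so voice queries can be told apart in analytics.
        "analyticsTags": ["voice"],
    }
```

You would pass the resulting dictionary to your search client together with the index name. The transcript is reused as both the query and the optional words, which is what makes conversational filler words harmless.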

Index Settings

Add query rules for dynamic filters
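
As an illustration of a dynamic filter, the sketch below builds a rule that turns a facet value spoken in the query into a filter on that facet. The field names (`conditions`, `pattern`, `anchoring`, `consequence`, `automaticFacetFilters`) and the `{facet:...}` placeholder reflect the rule format as understood at the time of writing, so treat them as assumptions and verify them against the Rules reference. The helper name and the objectID scheme are hypothetical.

```python
from typing import Any, Dict


def build_facet_filter_rule(facet: str) -> Dict[str, Any]:
    """Build a rule that filters on `facet` when one of its values appears in the query."""
    return {
        # Hypothetical naming scheme for the rule identifier.
        "objectID": "voice-dynamic-filter-" + facet,
        # Trigger when the query contains any value of the facet.
        "conditions": [
            {"pattern": "{facet:" + facet + "}", "anchoring": "contains"}
        ],
        # Apply the matched value as a filter on that facet.
        "consequence": {"params": {"automaticFacetFilters": [facet]}},
    }
```

For example, `build_facet_filter_rule("color")` produces a rule intended to make a spoken query like "red shoes" filter on the color facet, provided color is configured as a facet on the index.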
