Data Sanitization
Algolia Does Not Sanitize Your Data
Algolia accepts any data and stores it without alteration. The same goes for responses: Algolia returns the data in your index as is. It therefore saves and returns HTML and XML tags along with their attributes.
That said, Algolia’s search algorithm ignores HTML and XML markup, so users can’t search for the tags themselves, such as tag names.
For example, Algolia has no problem saving a record that contains the HTML tag <strong>. However, because Algolia strips tags during search, a search for the word “strong” won’t find the following record:
{
"description": "She is amazingly <strong>powerful</strong>, deeply visionary."
}
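To make the behavior concrete, the following TypeScript sketch approximates what “ignoring tags” means for the sample record: it replaces anything shaped like a tag with a space and lists the remaining words. The `searchableWords` helper is hypothetical and for illustration only. It is not how Algolia’s engine tokenizes text, and a regular expression like this is not a safe way to sanitize HTML.

```typescript
// Hypothetical illustration only: approximates "ignoring tags" for the sample record.
// This is NOT Algolia's tokenizer and NOT a safe HTML sanitizer.
function searchableWords(text: string): string[] {
  return text
    .replace(/<[^>]*>/g, " ") // replace anything shaped like a tag with a space
    .toLowerCase()
    .split(/[^a-z0-9]+/) // split on runs of non-alphanumeric characters
    .filter((word) => word.length > 0); // drop empty strings left by punctuation
}

const record = {
  description: "She is amazingly <strong>powerful</strong>, deeply visionary.",
};

const words = searchableWords(record.description);
console.log(words.includes("strong")); // false: the tag name is gone
console.log(words.includes("powerful")); // true: the text between the tags remains in this sketch
```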
Sanitizing the query response
Some characters are systematically removed (not escaped) from the API’s response:
- Control characters (U+0000 to U+001F)
- Delete (U+007F)
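If your application compares the values it sends to the index with the values it reads back, you may want to normalize them the same way first. The sketch below is a hypothetical helper, `stripControlCharacters`, that removes the character ranges listed above. It mirrors the documented behavior but is not Algolia code.

```typescript
// Hypothetical helper: removes the characters the API strips from responses,
// i.e. control characters U+0000 to U+001F and Delete (U+007F).
function stripControlCharacters(value: string): string {
  return value.replace(/[\u0000-\u001F\u007F]/g, "");
}

const stored = "line one\nline two\u0007"; // contains a line feed and a bell character
console.log(JSON.stringify(stripControlCharacters(stored))); // "line oneline two"
```

Note that, per the range above, this includes tab (U+0009) and line feed (U+000A), so multi-line text comes back without its line breaks.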
Security
Clean your indices
Because Algolia doesn’t sanitize your data and returns it as is, you need to handle sanitization yourself. Otherwise, you risk exposing your users to cross-site scripting (XSS) attacks.
To prevent this, you have two options:
- Escape or strip potentially dangerous characters before indexing
- Escape or strip them before displaying results
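Both options come down to escaping HTML-significant characters at some point before the data reaches a page. The TypeScript sketch below assumes a hypothetical `escapeHtml` helper and a record with a `description` attribute, like the earlier example. It shows the helper applied before indexing (option 1) and at display time (option 2). None of these names come from an Algolia client.

```typescript
// Hypothetical helper: escapes the five characters that are significant in HTML
// so that a value renders as text instead of markup.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;") // runs first so entities added below aren't escaped twice
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Option 1: escape before indexing (hypothetical record shape).
const rawRecord = { description: 'Nice <img src=x onerror="alert(1)">' };
const recordToIndex = { ...rawRecord, description: escapeHtml(rawRecord.description) };

// Option 2: keep the raw value in the index and escape when rendering a hit.
function renderDescription(hit: { description: string }): string {
  return `<p>${escapeHtml(hit.description)}</p>`;
}

console.log(recordToIndex.description);
console.log(renderDescription(rawRecord));
```

Escaping at display time leaves the index untouched and lets each front end encode for its own output; escaping before indexing means every consumer receives already-encoded text. This kind of escaping covers HTML element content and quoted attribute values; other contexts, such as URLs or inline scripts, need their own encoding. If you prefer to strip tags instead, a dedicated HTML sanitizer is more reliable than regular expressions.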
Clean your user search input
You also need to handle your users’ search input. Algolia sends the query back in its response, so any HTML or code a user types into the search bar exposes you to an XSS attack if you display it as is. Escape or strip tags and code before displaying the query.
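A common place the echoed query shows up is a results summary such as “No results for …”. The TypeScript sketch below assumes a minimal subset of the search response with `query` and `nbHits` fields (check your API client’s types for the exact shape) and a hypothetical `renderSummary` function. It escapes the query before interpolating it into HTML, using the same kind of `escapeHtml` helper as for records.

```typescript
// Assumed minimal subset of a search response; check your API client's types.
interface SearchResponse {
  query: string;
  nbHits: number;
}

// Hypothetical helper: escapes HTML-significant characters ("&" first).
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical rendering function: the query comes back from the API as typed,
// so it is escaped before being placed in the HTML string.
function renderSummary(response: SearchResponse): string {
  const query = escapeHtml(response.query);
  if (response.nbHits === 0) {
    return `<p>No results for "${query}"</p>`;
  }
  const noun = response.nbHits === 1 ? "result" : "results";
  return `<p>${response.nbHits} ${noun} for "${query}"</p>`;
}

console.log(renderSummary({ query: "<script>alert(1)</script>", nbHits: 0 }));
// <p>No results for "&lt;script&gt;alert(1)&lt;/script&gt;"</p>
```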