Whether you’re using the API or the dashboard, it’s best to send several records at a time instead of pushing them one by one.
This has two main benefits: it reduces network calls and it speeds up indexing.
Customers with the largest number of records, such as those on Enterprise plans, will see the biggest performance impact, but we recommend that everyone send indexing operations in batches whenever possible.
For example, let’s say you’re fetching all data from your database and end up with a million records to index.
That would be too much to send in one go, but sending one record at a time would take too long.
You get much faster indexing by splitting the whole collection into smaller chunks of records and sending those chunks one after the other.
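To make the idea concrete, here's a minimal sketch of that chunk-and-send pattern, written with the Python API client; the record data, chunk size, and credentials are placeholders rather than recommendations.

from algoliasearch.search_client import SearchClient

client = SearchClient.create('YourApplicationID', 'YourAdminAPIKey')
index = client.init_index('actors')

# Placeholder data standing in for the records fetched from your database.
records = [{'objectID': str(i), 'name': f'Actor {i}'} for i in range(1_000_000)]

chunk_size = 10000
for start in range(0, len(records), chunk_size):
    # One API call per chunk instead of one call per record.
    index.save_objects(records[start:start + chunk_size])

As the examples further down show, several API clients can also handle this batching for you automatically.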
Example
Continuing with our example: you have a million records to index.
Pushing them in a single call likely isn’t an option, because Algolia limits you to 1 GB per request, and a payload that large would probably fail before ever reaching the API anyway.
Your first instinct might be to loop over the records and send each one with the addObjects method.
The problem is that you would make a million individual network calls, which hurts performance both on your end and on Algolia’s side.
A much leaner approach is to split your collection of records into smaller collections, then send each chunk one by one.
For optimal indexing performance, we recommend a batch size of ~10 MB, which represents between 1,000 and 10,000 records depending on the average record size.
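As a rough way to apply that guideline, the sketch below estimates the average serialized size of a record from a sample and derives a chunk size that keeps each batch near 10 MB. The sample data and the clamping bounds are illustrative assumptions, not an official formula.

import json

TARGET_CHUNK_BYTES = 10 * 1024 * 1024  # aim for roughly 10 MB per batch

# Hypothetical records standing in for your own data.
records = [{'objectID': str(i), 'name': f'Actor {i}', 'rating': i % 100} for i in range(100_000)]

# Estimate the average serialized record size from a small sample.
sample = records[:1000]
avg_record_bytes = len(json.dumps(sample).encode('utf-8')) / len(sample)

# Pick a chunk size close to the target, clamped to the 1,000-10,000 range mentioned above.
chunk_size = max(1_000, min(int(TARGET_CHUNK_BYTES / avg_record_bytes), 10_000))
print(f'~{avg_record_bytes:.0f} bytes per record -> chunks of {chunk_size} records')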
Batching records won’t reduce your operations count.
Algolia counts indexing operations per record, not per method call, so batched records aren’t counted any differently from records indexed one by one: a million records amounts to a million indexing operations, whether you send them in 100 batches or in a million separate calls.
Using the API
To push records in batches, you need to chunk your records, then loop over each chunk and send it to Algolia with the addObjects method.
If you need to send data from large files and handle concurrency in JavaScript, you can also use algolia-cli with the algolia import command.
$client = new \AlgoliaSearch\Client('YourApplicationID', 'YourAdminAPIKey');
$index = $client->initIndex('actors');

$records = json_decode(file_get_contents('actors.json'), true);

// Batching is done automatically by the API client
$index->saveObjects($records, ['autoGenerateObjectIDIfNotExist' => true]);
require 'json'
require 'algoliasearch'

Algolia.init(application_id: 'YourApplicationID', api_key: 'YourAdminAPIKey')
index = Algolia::Index.new('actors')

file = File.read('actors.json')
records = JSON.parse(file)

records.each_slice(10000) { |batch| index.add_objects(batch) }
const algoliasearch = require('algoliasearch');
const fs = require('fs');
const StreamArray = require('stream-json/streamers/StreamArray');

const client = algoliasearch('YourApplicationID', 'YourAdminAPIKey');
const index = client.initIndex('actors');

const stream = fs.createReadStream('actors.json').pipe(StreamArray.withParser());

let chunks = [];

stream
  .on('data', ({ value }) => {
    chunks.push(value);
    if (chunks.length === 10000) {
      stream.pause();
      index
        .addObjects(chunks)
        .then(res => {
          chunks = [];
          stream.resume();
        })
        .catch(err => console.error(err));
    }
  })
  .on('end', () => {
    if (chunks.length) {
      index.addObjects(chunks).catch(err => console.error(err));
    }
  })
  .on('error', err => console.error(err));
import json
from algoliasearch.search_client import SearchClient

client = SearchClient.create('YourApplicationID', 'YourAdminAPIKey')
index = client.init_index('actors')

with open('actors.json') as f:
    records = json.load(f)

# Batching is done automatically by the API client
index.save_objects(records, {'autoGenerateObjectIDIfNotExist': True})
let filePath = Bundle.main.path(forResource: "actors", ofType: "json")!
let contentData = FileManager.default.contents(atPath: filePath)!
let records = try! JSONSerialization.jsonObject(with: contentData, options: []) as! [[String: Any]]

let chunkSize = 10000

for beginIndex in stride(from: 0, to: records.count, by: chunkSize) {
    let endIndex = min(beginIndex + chunkSize, records.count)
    index.addObjects(Array(records[beginIndex..<endIndex]))
}
// Asynchronous version
List<Actor> actors = fetchActorsFromDatabase(); // a million actors

for (int i = 0; i < actors.size(); i += 10000) {
    // Make sure the last chunk doesn't run past the end of the list
    JSONArray chunk = new JSONArray(actors.subList(i, Math.min(i + 10000, actors.size())));
    index.addObjectsAsync(chunk, new CompletionHandler() {
        @Override
        public void requestCompleted(JSONObject jsonObject, AlgoliaException e) {
            if (e != null) {
                // Handle potential error here
            }
        }
    });
}

// Synchronous version, must run in a background thread to avoid blocking the UI
List<Actor> actors = fetchActorsFromDatabase(); // a million actors

for (int i = 0; i < actors.size(); i += 10000) {
    JSONArray chunk = new JSONArray(actors.subList(i, Math.min(i + 10000, actors.size())));
    try {
        index.addObjects(chunk, null);
    } catch (AlgoliaException e) {
        // Handle potential error here
    }
}
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public class Actor
{
    public string Name { get; set; }
    public string ObjectId { get; set; }
    public int Rating { get; set; }
    public string ImagePath { get; set; }
    public string AlternativePath { get; set; }
}

AlgoliaClient client = new AlgoliaClient("YourApplicationID", "YourAdminAPIKey");
Index index = client.InitIndex("actors");

// Don't forget to set the naming strategy of the serializer to handle Pascal/Camel casing
IEnumerable<Actor> actors = JsonConvert.DeserializeObject<IEnumerable<Actor>>(File.ReadAllText("actors.json"));

// Batching/chunking is done automatically by the API client
bool autoGenerateObjectIDIfNotExist = true;
index.SaveObjects(actors, autoGenerateObjectIDIfNotExist);
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Arrays;

import com.fasterxml.jackson.databind.ObjectMapper;

public class Actor {
    // Getters/setters omitted
    private String name;
    private String objectId;
    private int rating;
    private String imagePath;
    private String alternativePath;
}

// Synchronous version
SearchClient client =
    DefaultSearchClient.create("YourApplicationID", "YourAdminAPIKey");
SearchIndex<Actor> index = client.initIndex("actors", Actor.class);

ObjectMapper objectMapper = Defaults.getObjectMapper();

InputStream input = new FileInputStream("actors.json");
Actor[] actors = objectMapper.readValue(input, Actor[].class);

// Batching/chunking is done automatically by the API client
boolean autoGenerateObjectIDIfNotExist = true;
index.saveObjects(Arrays.asList(actors), autoGenerateObjectIDIfNotExist);
package main

import (
    "encoding/json"
    "io/ioutil"

    "github.com/algolia/algoliasearch-client-go/algolia/search"
)

type Actor struct {
    Name            string `json:"name"`
    Rating          int    `json:"rating"`
    ImagePath       string `json:"image_path"`
    AlternativeName string `json:"alternative_name"`
    ObjectID        string `json:"objectID"`
}

func main() {
    client := search.NewClient("YourApplicationID", "YourAdminAPIKey")
    index := client.InitIndex("actors")

    var actors []Actor
    data, _ := ioutil.ReadFile("actors.json")
    _ = json.Unmarshal(data, &actors)

    // Batching is done automatically by the API client
    _, _ = index.SaveObjects(actors)
}
package algolia

import java.io.FileInputStream

import algolia.AlgoliaDsl._
import org.json4s._
import org.json4s.native.JsonMethods._

import scala.concurrent.ExecutionContext.Implicits.global

case class Actor(name: String,
                 rating: Int,
                 image_path: String,
                 alternative_path: Option[String],
                 objectID: String)

object Main {
  def main(args: Array[String]): Unit = {
    val client = new AlgoliaClient("YourApplicationID", "YourAdminAPIKey")

    // json4s needs an implicit Formats instance in scope for extract to work
    implicit val formats: Formats = DefaultFormats

    val records = parse(new FileInputStream("actors.json")).extract[Seq[Actor]]

    records
      .grouped(10000)
      .map(g => {
        client.execute {
          index into "actors" objects g
        }
      })
  }
}
val client = ClientSearch(ApplicationID("YourApplicationID"), APIKey("YourAdminAPIKey"))
val index = client.initIndex(IndexName("actors"))

val string = File("actors.json").readText()
val actors = Json.plain.parse(JsonObjectSerializer.list, string)

index.apply {
    actors
        .chunked(1000)
        .map { saveObjects(it) }
        .wait() // Wait for all indexing operations to complete.
}
With this approach, you would only make 100 API calls.
Depending on the size of your records and your network speed, you could create bigger or smaller chunks.
For more information, see our Importing Data via the API tutorial.
Using the Dashboard
You can also send your records from the Algolia dashboard.
Add records manually
- Go to your dashboard and select your index.
- Click Manage current index then Add manually.
- Copy/paste your chunk in the JSON editor, then click Push record.
- Repeat for all your chunks.
Upload a file
- Go to your dashboard and select your index.
- Click Manage current index then Upload file.
- Either click the file upload area and select the file that contains your chunk, or drag and drop the file onto the area.
- The upload starts automatically.
- Repeat for each of your chunk files. One way to split a large JSON file into chunk files is sketched below.
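If your records live in one large JSON file, a small script along these lines can split them into chunk files you can upload one at a time. This is a minimal sketch: the actors.json input, the actors-chunk-N.json output names, and the 10,000-record chunk size are illustrative assumptions.

import json

CHUNK_SIZE = 10000  # records per output file; adjust to your record size

# Assumes actors.json contains a single JSON array of records.
with open('actors.json') as f:
    records = json.load(f)

# Write each chunk of records to its own file, ready for a dashboard upload.
for i, start in enumerate(range(0, len(records), CHUNK_SIZE)):
    with open(f'actors-chunk-{i}.json', 'w') as out:
        json.dump(records[start:start + CHUNK_SIZE], out)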
For more information, see our Importing Data via the Dashboard tutorial.