Whether you’re using the API or the dashboard, it’s best to send several records at a time instead of pushing them one by one.
This has two main benefits: it reduces network calls and it speeds up indexing.
Customers with the largest number of records, such as those on Enterprise plans, will see the biggest performance impact, but we recommend that everyone send indexing operations in batches whenever possible.
For example, let’s say you’re fetching all data from your database and end up with a million records to index.
That would be too much to send in one go, but sending one record at a time would take too long.
You get much faster indexing by splitting the whole collection into smaller chunks of records and sending those chunks one after the other.
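To make the idea concrete, here's a minimal sketch of that chunk-and-send pattern, written with the Python API client; the record data, chunk size, and credentials are placeholders rather than recommendations.

from algoliasearch.search_client import SearchClient

client = SearchClient.create('YourApplicationID', 'YourAdminAPIKey')
index = client.init_index('actors')

# Placeholder data standing in for the records fetched from your database.
records = [{'objectID': str(i), 'name': f'Actor {i}'} for i in range(1_000_000)]

chunk_size = 10000
for start in range(0, len(records), chunk_size):
    # One API call per chunk instead of one call per record.
    index.save_objects(records[start:start + chunk_size])

As the examples further down show, several API clients can also handle this batching for you automatically.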
Example
Continuing with our example: you have a million records to index.
Pushing them in a single call likely isn’t an option, because Algolia limits you to 1 GB per request, and a payload that large would probably fail before ever reaching the API anyway.
Your first instinct might be to loop over the records and send each one with the addObjects method.
The problem is that you would make a million individual network calls, which hurts performance both on your end and on Algolia’s side.
A much leaner approach is to split your collection of records into smaller collections, then send each chunk one by one.
For optimal indexing performance, we recommend a batch size of ~10 MB, which represents between 1,000 and 10,000 records depending on the average record size.
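As a rough way to apply that guideline, the sketch below estimates the average serialized size of a record from a sample and derives a chunk size that keeps each batch near 10 MB. The sample data and the clamping bounds are illustrative assumptions, not an official formula.

import json

TARGET_CHUNK_BYTES = 10 * 1024 * 1024  # aim for roughly 10 MB per batch

# Hypothetical records standing in for your own data.
records = [{'objectID': str(i), 'name': f'Actor {i}', 'rating': i % 100} for i in range(100_000)]

# Estimate the average serialized record size from a small sample.
sample = records[:1000]
avg_record_bytes = len(json.dumps(sample).encode('utf-8')) / len(sample)

# Pick a chunk size close to the target, clamped to the 1,000-10,000 range mentioned above.
chunk_size = max(1_000, min(int(TARGET_CHUNK_BYTES / avg_record_bytes), 10_000))
print(f'~{avg_record_bytes:.0f} bytes per record -> chunks of {chunk_size} records')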
Batching records won’t reduce your operations count.
Algolia counts indexing operations per record, not per method call, so batched records aren’t counted any differently from records indexed one by one: a million records amounts to a million indexing operations, whether you send them in 100 batches or in a million separate calls.
Using the API
To push records in batches, you need to chunk your records, then loop over each chunk and send it to Algolia with the addObjects method.
If you need to send data from large files and handle concurrency in JavaScript, you can also use algolia-cli with the algolia import command.
$client = new \AlgoliaSearch\Client('YourApplicationID', 'YourAdminAPIKey');
$index = $client->initIndex('actors');

$records = json_decode(file_get_contents('actors.json'), true);

// Batching is done automatically by the API client
$index->saveObjects($records, ['autoGenerateObjectIDIfNotExist' => true]);
require 'json'
require 'algoliasearch'

Algolia.init(application_id: 'YourApplicationID', api_key: 'YourAdminAPIKey')
index = Algolia::Index.new('actors')

file = File.read('actors.json')
records = JSON.parse(file)

records.each_slice(10000) { |batch| index.add_objects(batch) }
const algoliasearch = require('algoliasearch');
const fs = require('fs');
const StreamArray = require('stream-json/streamers/StreamArray');

const client = algoliasearch('YourApplicationID', 'YourAdminAPIKey');
const index = client.initIndex('actors');

const stream = fs.createReadStream('actors.json').pipe(StreamArray.withParser());

let chunks = [];

stream
  .on('data', ({ value }) => {
    chunks.push(value);
    if (chunks.length === 10000) {
      stream.pause();
      index
        .addObjects(chunks)
        .then(res => {
          chunks = [];
          stream.resume();
        })
        .catch(err => console.error(err));
    }
  })
  .on('end', () => {
    if (chunks.length) {
      index.addObjects(chunks).catch(err => console.error(err));
    }
  })
  .on('error', err => console.error(err));
import json
from algoliasearch.search_client import SearchClient

client = SearchClient.create('YourApplicationID', 'YourAdminAPIKey')
index = client.init_index('actors')

with open('actors.json') as f:
    records = json.load(f)

# Batching is done automatically by the API client
index.save_objects(records, {'autoGenerateObjectIDIfNotExist': True})
let filePath = Bundle.main.path(forResource: "actors", ofType: "json")!
let contentData = FileManager.default.contents(atPath: filePath)!
let records = try! JSONSerialization.jsonObject(with: contentData, options: []) as! [[String: Any]]

let chunkSize = 10000

for beginIndex in stride(from: 0, to: records.count, by: chunkSize) {
    let endIndex = min(beginIndex + chunkSize, records.count)
    index.addObjects(Array(records[beginIndex..<endIndex]))
}
// Asynchronous version
List<Actor> actors = fetchActorsFromDatabase(); // a million actors

for (int i = 0; i < actors.size(); i += 10000) {
    // Make sure the last chunk doesn't run past the end of the list
    JSONArray chunk = new JSONArray(actors.subList(i, Math.min(i + 10000, actors.size())));
    index.addObjectsAsync(chunk, new CompletionHandler() {
        @Override
        public void requestCompleted(JSONObject jsonObject, AlgoliaException e) {
            if (e != null) {
                // Handle potential error here
            }
        }
    });
}

// Synchronous version, must run in a background thread to avoid blocking the UI
List<Actor> actors = fetchActorsFromDatabase(); // a million actors

for (int i = 0; i < actors.size(); i += 10000) {
    JSONArray chunk = new JSONArray(actors.subList(i, Math.min(i + 10000, actors.size())));
    try {
        index.addObjects(chunk, null);
    } catch (AlgoliaException e) {
        // Handle potential error here
    }
}
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public class Actor
{
    public string Name { get; set; }
    public string ObjectId { get; set; }
    public int Rating { get; set; }
    public string ImagePath { get; set; }
    public string AlternativePath { get; set; }
}

AlgoliaClient client = new AlgoliaClient("YourApplicationID", "YourAdminAPIKey");
Index index = client.InitIndex("actors");

// Don't forget to set the naming strategy of the serializer to handle Pascal/Camel casing
IEnumerable<Actor> actors = JsonConvert.DeserializeObject<IEnumerable<Actor>>(File.ReadAllText("actors.json"));

// Batching/chunking is done automatically by the API client
bool autoGenerateObjectIDIfNotExist = true;
index.SaveObjects(actors, autoGenerateObjectIDIfNotExist);
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Arrays;

import com.fasterxml.jackson.databind.ObjectMapper;

public class Actor {
    // Getters/setters omitted
    private String name;
    private String objectId;
    private int rating;
    private String imagePath;
    private String alternativePath;
}

// Synchronous version
SearchClient client =
    DefaultSearchClient.create("YourApplicationID", "YourAdminAPIKey");
SearchIndex<Actor> index = client.initIndex("actors", Actor.class);

ObjectMapper objectMapper = Defaults.getObjectMapper();

InputStream input = new FileInputStream("actors.json");
Actor[] actors = objectMapper.readValue(input, Actor[].class);

// Batching/chunking is done automatically by the API client
boolean autoGenerateObjectIDIfNotExist = true;
index.saveObjects(Arrays.asList(actors), autoGenerateObjectIDIfNotExist);
package main

import (
    "encoding/json"
    "io/ioutil"

    "github.com/algolia/algoliasearch-client-go/algolia/search"
)

type Actor struct {
    Name            string `json:"name"`
    Rating          int    `json:"rating"`
    ImagePath       string `json:"image_path"`
    AlternativeName string `json:"alternative_name"`
    ObjectID        string `json:"objectID"`
}

func main() {
    client := search.NewClient("YourApplicationID", "YourAdminAPIKey")
    index := client.InitIndex("actors")

    var actors []Actor
    data, _ := ioutil.ReadFile("actors.json")
    _ = json.Unmarshal(data, &actors)

    // Batching is done automatically by the API client
    _, _ = index.SaveObjects(actors)
}
package algolia

import java.io.FileInputStream

import algolia.AlgoliaDsl._
import org.json4s._
import org.json4s.native.JsonMethods._

import scala.concurrent.ExecutionContext.Implicits.global

case class Actor(name: String,
                 rating: Int,
                 image_path: String,
                 alternative_path: Option[String],
                 objectID: String)

object Main {
  def main(args: Array[String]): Unit = {
    val client = new AlgoliaClient("YourApplicationID", "YourAdminAPIKey")

    // json4s needs an implicit Formats instance in scope for extract to work
    implicit val formats: Formats = DefaultFormats

    val records = parse(new FileInputStream("actors.json")).extract[Seq[Actor]]

    records
      .grouped(10000)
      .map(g => {
        client.execute {
          index into "actors" objects g
        }
      })
  }
}
val client = ClientSearch(ApplicationID("YourApplicationID"), APIKey("YourAdminAPIKey"))
val index = client.initIndex(IndexName("actors"))

val string = File("actors.json").readText()
val actors = Json.plain.parse(JsonObjectSerializer.list, string)

index.apply {
    actors
        .chunked(1000)
        .map { saveObjects(it) }
        .wait() // Wait for all indexing operations to complete.
}
With this approach, you would only make 100 API calls.
Depending on the size of your records and your network speed, you could create bigger or smaller chunks.
For more information, see our Importing Data via the API tutorial.
Using the Dashboard
You can also send your records from the Algolia dashboard.
Add records manually
- Go to your dashboard and select your index.
- Click Manage current index then Add manually.
- Copy/paste your chunk in the JSON editor, then click Push record.
- Repeat for all your chunks.
Upload a file
- Go to your dashboard and select your index.
- Click Manage current index then Upload file.
- Either click the file upload area and select the file that contains your chunk, or drag and drop the file onto the area.
- The upload starts automatically.
- Repeat for each of your chunk files. One way to split a large JSON file into chunk files is sketched below.
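If your records live in one large JSON file, a small script along these lines can split them into chunk files you can upload one at a time. This is a minimal sketch: the actors.json input, the actors-chunk-N.json output names, and the 10,000-record chunk size are illustrative assumptions.

import json

CHUNK_SIZE = 10000  # records per output file; adjust to your record size

# Assumes actors.json contains a single JSON array of records.
with open('actors.json') as f:
    records = json.load(f)

# Write each chunk of records to its own file, ready for a dashboard upload.
for i, start in enumerate(range(0, len(records), CHUNK_SIZE)):
    with open(f'actors-chunk-{i}.json', 'w') as out:
        json.dump(records[start:start + CHUNK_SIZE], out)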
For more information, see our Importing Data via the Dashboard tutorial.