How to Group Results

Algolia works differently from relational databases. When you fetch data from a database, you can select what you need, aggregate data from different tables, and get the result in a format that’s already close to how you want to display it on your front end. With Algolia, whenever a query matches one or more of your records, the engine returns those full records ranked by relevance.

Sometimes your data contains records that are subparts of a larger record. This can happen with a blog article that is broken up into one paragraph per record. It can also happen when several records share a common source, as in a hierarchy or one-to-many relationship. A good example is job listings, where each company has multiple job openings.

As you’ll see, the solution is to flatten records and repeat some data. In the job opening example, you only want to show the one to three most relevant openings per company, leaving room for other companies. Let’s see how to do this.

This guide covers only the job opening example. Breaking up large texts is covered in separate guides.

Dataset Example

Before

If we took a traditional approach for structuring our records, the dataset could look like this:

[
  {
    "company": "Twilio",
    "job_openings": [
      "Staff Software Engineer - Cloud Platform",
      "Lead Front End Engineer",
      "Senior Data Engineer",
      "Senior Software Engineer, Developer Experience"
    ]
  },
  {
    "company": "Algolia",
    "job_openings": [
      "Full-Stack Software Engineer",
      "Frontend Engineer",
      "Open Source Software Engineer (JavaScript)",
      "Senior Software Engineer - Core API",
      "Senior Systems Engineer - SRE"
    ]
  }
]

The problem with this structure is that whenever you have a match for any opening, the engine returns the full record for the company. If you want to show the best match per company, this data structure doesn’t work.

If you want to show a limited number of job openings per company, the right approach would be to split content into smaller records, by job opening, and repeat company data.

After

With the strategy of splitting records per job opening, you have a single record per job opening and repeat the company in each. Here’s what it might look like:

[
  {
    "company": "Twilio",
    "job_opening": "Staff Software Engineer - Cloud Platform"
  },
  {
    "company": "Twilio",
    "job_opening": "Lead Front End Engineer"
  },
  {
    "company": "Twilio",
    "job_opening": "Senior Data Engineer"
  },
  {
    "company": "Twilio",
    "job_opening": "Senior Software Engineer, Developer Experience"
  },
  {
    "company": "Algolia",
    "job_opening": "Full-Stack Software Engineer"
  },
  {
    "company": "Algolia",
    "job_opening": "Frontend Engineer"
  },
  {
    "company": "Algolia",
    "job_opening": "Open Source Software Engineer (JavaScript)"
  },
  {
    "company": "Algolia",
    "job_opening": "Senior Software Engineer - Core API"
  },
  {
    "company": "Algolia",
    "job_opening": "Senior Systems Engineer - SRE"
  }
]
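The transformation from the first structure to this one can be sketched in a few lines of PHP. This is a minimal illustration, not part of the Algolia API: the function name flattenJobOpenings is hypothetical, and the sample input is a shortened version of the dataset above.

```php
<?php
// Hypothetical helper: turns one record per company into one record per job opening.
function flattenJobOpenings(array $companies): array
{
    $records = [];
    foreach ($companies as $company) {
        foreach ($company['job_openings'] as $opening) {
            $records[] = [
                'company' => $company['company'], // repeated in every record
                'job_opening' => $opening,
            ];
        }
    }
    return $records;
}

// Shortened sample of the "Before" dataset
$companies = [
    [
        'company' => 'Twilio',
        'job_openings' => [
            'Staff Software Engineer - Cloud Platform',
            'Lead Front End Engineer',
        ],
    ],
    [
        'company' => 'Algolia',
        'job_openings' => ['Frontend Engineer'],
    ],
];

$records = flattenJobOpenings($companies);
echo json_encode($records, JSON_PRETTY_PRINT), PHP_EOL;
```

The resulting array has one entry per job opening and can then be sent to your index in place of the nested records.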

This approach has several benefits. First, job openings are no longer intertwined, which allows for more granular search. When someone searches for a position, for example “engineer”, they no longer retrieve records that each represent a company with its full list of job openings. Instead, they get the individual best-matching job openings, each of which can be ranked on its own with custom ranking attributes.

In addition, you can handle the repeated company data with Algolia’s distinct feature. Enabling it lets you, for example, retrieve only the best matching position per company.
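To make the idea of de-duplication concrete, here is a conceptual sketch of keeping only the first few hits per company from an already ranked list. It is not Algolia’s implementation and does not reproduce how the engine orders hits; the function keepBestPerGroup and the sample ranking are assumptions made for illustration.

```php
<?php
// Conceptual sketch only: keeps the first $limit hits for each value of $attribute,
// assuming $rankedHits is already sorted from most to least relevant.
function keepBestPerGroup(array $rankedHits, string $attribute, int $limit = 1): array
{
    $seen = []; // number of hits encountered so far per group
    $kept = [];
    foreach ($rankedHits as $hit) {
        $key = $hit[$attribute];
        $seen[$key] = ($seen[$key] ?? 0) + 1;
        if ($seen[$key] <= $limit) {
            $kept[] = $hit;
        }
    }
    return $kept;
}

// Hypothetical ranking for the query "engineer"
$rankedHits = [
    ['company' => 'Algolia', 'job_opening' => 'Frontend Engineer'],
    ['company' => 'Twilio', 'job_opening' => 'Lead Front End Engineer'],
    ['company' => 'Algolia', 'job_opening' => 'Full-Stack Software Engineer'],
    ['company' => 'Twilio', 'job_opening' => 'Senior Data Engineer'],
];

// One hit per company: the highest-ranked opening of each
$best = keepBestPerGroup($rankedHits, 'company');
```

With the default limit of 1, each company appears once in the output; passing a larger limit keeps more openings per company.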

Configuring attributeForDistinct and Enabling distinct

Using the API

To use distinct, you first need to set company as attributeForDistinct at indexing time. Only then can you set distinct to true to de-duplicate your results. Note that setting distinct at indexing time is optional: you can set it at query time instead.

$index->setSettings([
  'attributeForDistinct' => 'company',
  'distinct' => true
]);

Once attributeForDistinct is set, you can enable distinct by setting it to true. You can use true and 1 interchangeably. To show the three best positions per company, set distinct to 3.

$results = $index->search('query', [
  'distinct' => true
]);
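The split between indexing time and query time can be sketched as two small wrappers. Both function names are hypothetical, and $index is assumed to be the same index object used in the snippets above; only the setSettings and search calls and the attributeForDistinct and distinct parameters come from this guide.

```php
<?php
// Hypothetical wrappers; $index is assumed to expose setSettings() and search()
// as in the snippets above.

// Indexing time: only declare which attribute defines a group.
function configureGrouping($index, string $attribute): void
{
    $index->setSettings([
        'attributeForDistinct' => $attribute,
    ]);
}

// Query time: choose how many records to keep per group for this search.
function searchBestPerGroup($index, string $query, int $perGroup)
{
    return $index->search($query, [
        'distinct' => $perGroup,
    ]);
}
```

For example, calling configureGrouping($index, 'company') once and then searchBestPerGroup($index, 'engineer', 3) would request up to three positions per company for that query, without storing distinct in the index settings.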

Using the Dashboard

You can also set your attribute for distinct and enable distinct in your Algolia dashboard.

  • Go to your dashboard and select your index.
  • Click the Configuration tab, then click on Deduplication and Grouping.
  • Set Distinct to true.
  • Select attribute “company” in the Attribute for Distinct dropdown.
  • Don’t forget to save your changes.
