What is Solr?

Solr is an open-source search platform from the Apache Software Foundation. It is written in Java, built on the Apache Lucene library, and makes it possible to add a vertical search engine to a website or application.

A vertical search engine, unlike a general-purpose search engine, targets a specific segment of online content.

To give an example, if we had a blog dedicated to cinema, Solr would let us find a particular movie within the blog by entering its title or the name of one of its actors. The search would run over the documents that make up the website, whether text files or database records.

Thus, with Solr we can easily create a search engine to carry out searches on websites and databases.

What does the word Solr mean?

The name Solr is commonly explained as an acronym for "Searching On Lucene w/Replication". Letter by letter:

  • S stands for Searching.
  • O stands for On.
  • L stands for Lucene, the open-source library that gives Solr its indexing and search capabilities.
  • R stands for Replication, that is, copying the index across servers.

In short, the name tells us that Solr is a search system built on Lucene, a library widely used by search engines.

What are the characteristics of searches with Solr?

The main features of Solr searches are:

  • Filtering of searches, through which we can restrict the results
  • Faceted search, through which Solr suggests filters to narrow the results
  • Sorting and ranking of search results
  • Search by synonyms
  • Integration with databases
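
Several of the features above map onto request parameters of Solr's standard query handler: `q` for the main query, `fq` for filters, `facet` and `facet.field` for facets, and `sort` for ordering. The following is a minimal sketch that only builds a query URL and does not contact a server. The core name `movies`, the field names `title`, `year`, and `genre`, and the helper `build_select_url` are assumptions made for illustration.

```python
from urllib.parse import urlencode


def build_select_url(base_url, text, filters=(), facet_fields=(), sort=None, rows=10):
    """Build a URL for Solr's /select handler (hypothetical helper)."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    # Each filter query (fq) restricts the result set without changing the main query.
    for filter_query in filters:
        params.append(("fq", filter_query))
    # Faceting asks Solr to count matching documents per value of a field.
    if facet_fields:
        params.append(("facet", "true"))
        for field in facet_fields:
            params.append(("facet.field", field))
    # Sort is "<field> <asc|desc>".
    if sort:
        params.append(("sort", sort))
    return base_url + "/select?" + urlencode(params)


# Assumed local Solr instance with a core named "movies".
url = build_select_url(
    "http://localhost:8983/solr/movies",
    "title:alien",
    filters=["year:[1979 TO 1990]"],
    facet_fields=["genre"],
    sort="year desc",
)
print(url)
```

Sending this URL to a running Solr instance with a matching schema would return the matching documents together with a count of results per `genre`, which is the information a site would use to display filtering suggestions.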

We dedicate a separate page to the Solr search process, where these characteristics are analyzed in more detail.

How Solr Works

Solr works by going through the selected documents and incorporating them into an index. This process is called indexing.

Thus, indexing with Solr consists of adding the keywords of the documents we have selected to the Solr index. A Solr index accepts data from many sources, such as XML, CSV, Word, or PDF files.
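
Besides the file formats listed above, Solr's update handler also accepts documents as a JSON array. The sketch below only prepares such a payload and does not send it. The field names, the helper `to_update_payload`, and the assumption that the schema's unique key is called `id` are illustrative and would need to match the actual schema.

```python
import json


def to_update_payload(docs):
    """Serialize documents as a JSON array for Solr's update handler (hypothetical helper)."""
    for doc in docs:
        # Assumption: the schema uses "id" as its unique key field.
        if "id" not in doc:
            raise ValueError("each document needs an 'id' field")
    return json.dumps(docs)


payload = to_update_payload([
    {"id": "post-1", "title": "Alien", "actors": ["Sigourney Weaver"]},
    {"id": "post-2", "title": "Blade Runner", "actors": ["Harrison Ford"]},
])
print(payload)
```

In a typical setup this string would be sent in an HTTP POST with the content type `application/json` to the core's `/update` endpoint, followed by a commit so that the new documents become searchable.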

Instead of scanning the text itself, Solr looks up the search keyword in the index, which then tells us which documents contain that keyword.

This type of index is called an inverted index (sometimes rendered as reverse index) because the data is organized by keyword rather than by page.
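
To make the idea concrete, here is a toy inverted index in plain Python. It is a deliberately simplified sketch of the concept, not Solr's or Lucene's actual implementation, which also handles text analysis, relevance scoring, and on-disk storage. The sample documents and function names are invented for illustration.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(doc_id)
    return index


def search(index, term):
    """Look the term up in the index instead of scanning every document."""
    return sorted(index.get(term.lower(), set()))


docs = {
    "post-1": "Alien is a science fiction film directed by Ridley Scott",
    "post-2": "Blade Runner is another film by Ridley Scott",
    "post-3": "A review of a romantic comedy",
}
index = build_inverted_index(docs)
print(search(index, "scott"))   # ['post-1', 'post-2']
print(search(index, "comedy"))  # ['post-3']
```

The lookup cost depends on the index rather than on the total length of the documents, which is why searching the index is faster than reading through every text.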

On the page dedicated to Solr indexing we detail the indexing process and the concept of the inverted (or reverse) index.

Which large websites use Solr?

Several large websites use Solr. We could highlight:

  • Netflix, which, among other things, makes use of its faceted search functionality
  • SourceForge, which makes use of faceted search and filtered search
  • NASA

Others, such as Instagram, use it too.

How much does Solr cost?

Downloading Solr is free. To download it, you can go to the official page or follow our tutorial to download and start Solr. Apache Solr is open-source software, so you can use it, modify it, and share it at no cost.

That said, Solr has hardware requirements for running searches, which depend on:

  • The number and size of the documents on which the search will be carried out.
  • The number and complexity of the searches. Regarding complexity, the more search conditions there are and the more fields are returned, the greater the requirements.

Some experts estimate that for an index size of 10 GB and a single search field, two quad-core processors with 16 GB of RAM could handle the load.

As for the installation itself, a software engineer, or someone familiar with XML files, file management, and similar tasks, should be able to set up Solr with a basic search system.

If we decide to learn and implement Solr ourselves, a 2015 article on this website by Solr expert Alexandre Rafalovitch offers a series of tips for mastering Solr using resources available on the website itself. The first tip is to follow the tutorial on Solr's official page.