What is Apache Solr?
Apache Solr is an open source search platform built on a Java library called Lucene. At its core, Solr is a full-text search server that uses the Lucene Java search library for searching and indexing. Solr offers REST-like (representational state transfer) HTTP/XML and JSON APIs, which makes it easy to integrate with most programming languages. Many of the Internet's most significant sites, such as Apple and Cisco, use Apache Solr for their search and navigation features.
In essence, Apache Solr is a sub-project of Apache Lucene, developed in Java. As part of the Lucene project, Solr uses the Lucene Java search library at its core for searching and indexing. This article covers Apache Solr security in detail. The operations performed by Apache Solr to search a document are as follows (a brief curl example follows the list):
- Indexing - The documents that need to be searched are converted into a machine-readable format; this process is termed indexing.
- Querying - The engine then interprets the terms of the query asked by the user, such as essential terms, keywords, etc.
- Mapping - The query made by the user is then mapped to the documents stored in the database to find the relevant results.
- Ranking the Outcome - As the engine searches the indexed text, it lists the output based on its relevance.
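As a minimal sketch of the indexing and querying steps above, assuming a locally running Solr with a core named mycore (both hypothetical, for illustration only), the HTTP API can be exercised with curl:
# Index a document (the core name "mycore" is an assumption)
curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/mycore/update?commit=true' \
  -d '[{"id": "1", "title": "Apache Solr Security"}]'
# Query the index for a keyword
curl 'http://localhost:8983/solr/mycore/select?q=title:security'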
What is the architecture of Apache Solr Security?
Indexing and searching are the two primary functions that need to be supported by Apache Solr. Request handlers are present to handle data within a specific category, and an update processor chain runs whenever data is uploaded: the data goes through a cleanup process in which duplicate values are eliminated to avoid unnecessary repetition. Examining the field text and generating tokens is done by the analyzer, while the field data is broken into lexical units, or tokens, with the help of a tokenizer; there can be only one tokenizer per analyzer. Common words such as "is", "am", and "are" are removed as stop words for more effective results.
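To make the analyzer and tokenizer pipeline concrete, here is a sketch using Solr's Schema API; the field type name text_stopwords and the core name mycore are assumptions for illustration:
# Define a field type whose analyzer tokenizes text, lowercases it, and strips stop words
curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/mycore/schema' \
  -d '{
    "add-field-type": {
      "name": "text_stopwords",
      "class": "solr.TextField",
      "analyzer": {
        "tokenizer": {"class": "solr.StandardTokenizerFactory"},
        "filters": [
          {"class": "solr.LowerCaseFilterFactory"},
          {"class": "solr.StopFilterFactory", "ignoreCase": true, "words": "stopwords.txt"}
        ]
      }
    }
  }'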
There is also a query parser responsible for parsing the query; DisMax, Lucene, and eDisMax are some of the query parsers. Each parser plays a different role and comes into play based on the requirements. After parsing, the query is handed over to the index searcher, whose job is to run the query against the index store and pass the results to the response writer. The response writer is responsible for responding to the client; it formats the query response based on the search results from the Lucene engine.
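A specific query parser can be selected per request with the defType parameter; a sketch using eDisMax follows, where the core name and field names are assumptions:
# Ask eDisMax to search two fields, weighting title matches twice as heavily
curl 'http://localhost:8983/solr/mycore/select?q=solr+security&defType=edismax&qf=title^2+description'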
Implementing Apache Solr Security with Kerberos
From a security point of view, Solr clients need to authenticate to Solr. With Kerberos authentication, Solr requires a Kerberos service principal and a keytab file, which are needed to authenticate with ZooKeeper and between the nodes of the Solr cluster. Besides this, all clients and users must hold a valid ticket before sending a request to Solr.
To secure Apache Solr, we will walk through the following steps. Before configuring Solr, make sure there is a Kerberos service principal for each Solr host, and that a principal for ZooKeeper is also available in the KDC. We will then generate a keytab file, assuming the hostname is 192.168.10.120 and the home directory is /home/foo/. For this particular environment, the session looks like this:
root@kdc:/# kadmin.local
Authenticating as principal foo/admin@EXAMPLE.COM with password
kadmin.local: addprinc HTTP/192.168.10.120
WARNING: no policy specified for HTTP/192.168.10.120@EXAMPLE.COM; defaulting to no policy
Enter the password for principal "HTTP/192.168.10.120@EXAMPLE.COM":
Re-enter password for principal "HTTP/192.168.10.120@EXAMPLE.COM":
Principal "HTTP/192.168.10.120@EXAMPLE.COM" created.
kadmin.local: ktadd -k /tmp/120.keytab HTTP/192.168.10.120
Entry for principal HTTP/192.168.10.120 with kvno 2, encryption type aes256-cts-hmac-sha1-96 added to keytab WRFILE:/tmp/120.keytab.
Entry for principal HTTP/192.168.10.120 with kvno 2, encryption type arcfour-hmac added to keytab WRFILE:/tmp/120.keytab.
Entry for principal HTTP/192.168.10.120 with kvno 2, encryption type des3-cbc-sha1 added to keytab WRFILE:/tmp/120.keytab.
Entry for principal HTTP/192.168.10.120 with kvno 2, encryption type des-cbc-crc added to keytab WRFILE:/tmp/120.keytab.
kadmin.local: quit
Copy the keytab file from the KDC server's /tmp/120.keytab location to the Solr host at /keytabs/120.keytab, and repeat this step for each Solr node. If ZooKeeper has not already been set up with Kerberos, similar steps must be followed for the ZooKeeper service principal and keytab. Next, we create the security.json file and put it in our $SOLR_HOME directory; in SolrCloud mode, we instead upload it to ZooKeeper, enabling the Kerberos plugin, as follows:
server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd put /security.json
'{"authentication":{"class": "org.apache.solr.security.KerberosPlugin"}}'
Now we have to define a JAAS configuration file. This lets us specify the properties needed for authentication, along with other properties such as ticket caching. Below is a JAAS configuration file with the name and path /home/foo/jaas-client.conf:
Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="/keytabs/120.keytab"
storeKey=true
useTicketCache=true
debug=true
principal="HTTP/192.168.0.120@EXAMPLE.COM";
};
This file's name and path are used in the Solr start parameters, enabling authentication of internode requests and requests to ZooKeeper. Before starting Solr, some parameters need to be passed; they can be defined in solr.in.sh (solr.in.cmd on Windows) or passed on the command line with the bin/solr start command.
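As a sketch, assuming the keytab path, JAAS file, and principal created earlier in this walkthrough, the relevant settings in solr.in.sh might look like this:
# solr.in.sh -- Kerberos start parameters (values follow this walkthrough's example environment)
SOLR_AUTH_TYPE="kerberos"
SOLR_AUTHENTICATION_OPTS="-Djava.security.auth.login.config=/home/foo/jaas-client.conf \
  -Dsolr.kerberos.cookie.domain=192.168.10.120 \
  -Dsolr.kerberos.principal=HTTP/192.168.10.120@EXAMPLE.COM \
  -Dsolr.kerberos.keytab=/keytabs/120.keytab"
Once the configuration is completed, we can start Solr with the following command: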
bin/solr start -c -z server1:2181,server2:2181,server3:2181/solr
To test the configuration, we can try connecting to Solr with the following command:
curl --negotiate -u :"http://192.168.0.120:8983/solr/"
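Note that --negotiate requires the client to already hold a valid Kerberos ticket, as mentioned earlier. Assuming a user principal foo@EXAMPLE.COM (a hypothetical name for this walkthrough), a ticket can be obtained before the curl call with:
# Obtain a Kerberos ticket for the client user
kinit foo@EXAMPLE.COM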
Apache Solr Security Best Practices
Following are the best practices to secure Solr from development to production.
Encryption with a TLS Certificate (SSL)
Encrypting traffic to/from Solr and between Solr nodes prevents sensitive data from leaking across the network. TLS is also often required to prevent credential sniffing when using authentication.
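As a sketch of what this looks like in practice, the following solr.in.sh settings enable TLS; the keystore path and passwords are placeholders, not values from this article:
# solr.in.sh -- enable TLS between clients and nodes (paths/passwords are placeholders)
SOLR_SSL_ENABLED=true
SOLR_SSL_KEY_STORE=/etc/solr/solr-ssl.keystore.p12
SOLR_SSL_KEY_STORE_PASSWORD=secret
SOLR_SSL_TRUST_STORE=/etc/solr/solr-ssl.keystore.p12
SOLR_SSL_TRUST_STORE_PASSWORD=secret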
Authentication, Authorization, and Audit Logging
Authentication verifies the identity of each user before granting access to your cluster. Authorization ensures that only users with the necessary roles/permissions can access a given resource. Audit logging records an audit trail of requests to your cluster, such as users being denied access to administrative APIs.
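As an illustration, a minimal security.json combining authentication and authorization might look like the sketch below; it uses the Basic authentication plugin with the well-known example credentials from the Solr reference guide (user solr, password SolrRocks), whereas a Kerberos deployment would keep the KerberosPlugin shown earlier:
{
  "authentication": {
    "blockUnknown": true,
    "class": "solr.BasicAuthPlugin",
    "credentials": {"solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="}
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "permissions": [{"name": "security-edit", "role": "admin"}],
    "user-role": {"solr": "admin"}
  }
}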
Enable IP Access Control
Restrict network access to specific hosts by setting SOLR_IP_WHITELIST/SOLR_IP_BLACKLIST through environment variables or in solr.in.sh/solr.in.cmd.
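For example (a sketch; the addresses are placeholders):
# solr.in.sh -- allow the local network, then explicitly block one host
SOLR_IP_WHITELIST="127.0.0.1, 192.168.10.0/24"
SOLR_IP_BLACKLIST="192.168.10.50"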
ZooKeeper Traffic Protection
ZooKeeper is a core part of the SolrCloud cluster and the ZooKeeper Access Control page shows how to protect its content.
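One common approach from that page is to protect Solr's znodes with digest-based ACLs; a sketch, with placeholder usernames and passwords:
# solr.in.sh -- digest-based ZooKeeper ACLs (credentials are placeholders)
SOLR_ZK_CREDS_AND_ACLS="-DzkACLProvider=org.apache.solr.common.cloud.VMParamsAllAndReadonlyDigestZkACLProvider \
  -DzkCredentialsProvider=org.apache.solr.common.cloud.VMParamsSingleSetCredentialsDigestZkCredentialsProvider \
  -DzkDigestUsername=admin-user -DzkDigestPassword=ADMIN-PASSWORD \
  -DzkDigestReadonlyUsername=readonly-user -DzkDigestReadonlyPassword=READONLY-PASSWORD"
SOLR_OPTS="$SOLR_OPTS $SOLR_ZK_CREDS_AND_ACLS"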
Enable Security Manager
Solr can be run in the Java Security Manager sandbox by setting SOLR_SECURITY_MANAGER_ENABLED=true via an environment variable or in solr.in.sh/solr.in.cmd. This feature is not compatible with Hadoop.
A Comprehensive Approach
Real-time indexing and advanced text search features can help enterprises enable highly scalable and fault-tolerant indexing capabilities. To know more about near real-time indexing, we advise the following:
- Read more about Auto Indexing with Machine Learning
- Learn more about Anomaly Detection with AI