Search: Installing SphinxSearch RedHat/CentOS

We'll cover how to install SphinxSearch on RedHat and CentOS servers. Sphinx is an optional component for HelpSpot 4 and is not needed for most installations. By default, database full-text indexes are used.

HelpSpot 4 was developed and tested against SphinxSearch 2.1.9. A 2.2 series is also available. We recommend using 2.1.9; however, some newer server distributions may require the 2.2.* series, depending on library and dependency availability. Either series will work with HelpSpot, but the 2.2 series may display some WARNING messages during indexing because the generated configuration uses some deprecated settings. These warnings are normal and should not affect the search engine's capabilities.

Installation

You can install SphinxSearch from the appropriate .rpm files provided by SphinxSearch, which offers separate downloads for the latest version (2.2.*) and for previous versions (2.1.* and older). On RedHat/CentOS 7, we used SphinxSearch version 2.2.6. While testing, we downloaded the .rpm file directly to the server using the following command:

# Download SphinxSearch installer rpm for 64bit architecture
curl -L -o sphinx-2.2.6-1.rhel7.x86_64.rpm \
    http://sphinxsearch.com/files/sphinx-2.2.6-1.rhel7.x86_64.rpm

Once you have the .rpm file appropriate for your server release and architecture (32bit vs 64bit), you can continue installing SphinxSearch.
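
If you are unsure which .rpm matches your server, the following minimal sketch prints the architecture and release. It assumes the /etc/redhat-release file is present, which is typical on RedHat/CentOS but is not guaranteed elsewhere.

```shell
# Print the CPU architecture (x86_64 indicates a 64bit system)
uname -m

# Print the distribution release, if the release file exists
if [ -r /etc/redhat-release ]; then
    cat /etc/redhat-release
fi
```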

The process for installation in RedHat and CentOS is the same.

# Install SphinxSearch and dependencies from rpm file
sudo yum install ./sphinx-2.2.6-1.rhel7.x86_64.rpm

# Set SphinxSearch to start on system boot
sudo chkconfig searchd on

Tmpfs bug

Similar to the current Debian installer for SphinxSearch, there is a bug which prevents SphinxSearch from starting again after a system reboot.

RedHat/CentOS uses the tmpfs file system for files in the /var/run directory. This file system is stored only in memory, so its files and directories must be re-created each time the system boots. SphinxSearch does not currently re-create the ones it needs.

To rectify this, we need to make one adjustment to the system so it creates the /var/run/sphinx directory when the system starts.

Create and edit the /usr/lib/tmpfiles.d/searchd.conf file. Add the following content, just as seen here:

d /var/run/sphinx 0755 sphinx sphinx -

Once that's saved, you can ensure it has proper ownership and SELinux context, if SELinux is in use:

sudo chown root:root /usr/lib/tmpfiles.d/searchd.conf
sudo chcon --user=system_u --role=object_r --type=lib_t /usr/lib/tmpfiles.d/searchd.conf
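
As a rough sketch, the same entry can be written from the command line instead of an editor. The write_tmpfiles_entry function name is our own, and the demonstration below writes to a scratch file so it can be tried safely; on a real server the target would be /usr/lib/tmpfiles.d/searchd.conf and the call would need root.

```shell
# Write the tmpfiles.d entry to the path given as the first argument.
# (write_tmpfiles_entry is a hypothetical helper name used for illustration.)
write_tmpfiles_entry() {
    printf 'd /var/run/sphinx 0755 sphinx sphinx -\n' > "$1"
}

# Demonstration against a scratch file rather than the real location
scratch="$(mktemp)"
write_tmpfiles_entry "$scratch"
cat "$scratch"
```

On systemd-based releases such as RedHat/CentOS 7, running "sudo systemd-tmpfiles --create /usr/lib/tmpfiles.d/searchd.conf" once the real file is in place should create the /var/run/sphinx directory immediately, without waiting for a reboot.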

Configuration

Assuming HelpSpot version 4+ is already installed and running, and the new "data" directory can be written to by HelpSpot, we can use the new HelpSpot commands to create our Sphinx configuration file.

# Change directory into the HelpSpot site location
cd /path/to/helpspot

# Create Sphinx configuration file using available "hs" command-line tool:
php hs search:config

This will create a file named "sphinx.conf" in the HelpSpot "data" directory. It contains the configuration needed to index the HelpSpot database, including the host, user and password connection information for your HelpSpot database. You can either symlink this file to the SphinxSearch location at /etc/sphinx/sphinx.conf or move it to that location. We recommend moving it, as systems usually cannot follow symlinks at system boot time, and so SphinxSearch may not successfully start back up when the system restarts.

# Back up the Sphinx example conf file
sudo mv /etc/sphinx/sphinx.conf /etc/sphinx/sphinx.conf.bak

# Move HelpSpot sphinx.conf file in place
sudo mv data/sphinx.conf /etc/sphinx/sphinx.conf

Next, we need to set up some directories and adjust the sphinx.conf file for use with RedHat/CentOS. The generated sphinx.conf file assumes some directory locations which are slightly different on RedHat/CentOS systems. The following will adjust those:

# Create the directory for sphinx index data
sudo mkdir /var/lib/sphinx/data
# Give that directory proper permissions
# and SELinux context, if SELinux is enabled
sudo chown sphinx:sphinx /var/lib/sphinx/data
sudo chcon --user=system_u --role=object_r --type=var_lib_t /var/lib/sphinx/data

# Find and replace some file path assumptions 
# in the generated configuration file
sudo sed -i "s@/var/lib/sphinxsearch@/var/lib/sphinx@" /etc/sphinx/sphinx.conf
sudo sed -i "s@/var/log/sphinxsearch@/var/log/sphinx@" /etc/sphinx/sphinx.conf
sudo sed -i "s@/var/run/searchd.pid@/var/run/sphinx/searchd.pid@" /etc/sphinx/sphinx.conf
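
To see what these substitutions do before touching the real file, here is a small sketch that applies them to a scratch copy. The three sample lines are illustrative assumptions and are not the actual contents of the generated sphinx.conf.

```shell
# Build a scratch file with hypothetical lines that use the assumed paths
sample="$(mktemp)"
cat > "$sample" <<'EOF'
path = /var/lib/sphinxsearch/data/requests_ndx
log = /var/log/sphinxsearch/searchd.log
pid_file = /var/run/searchd.pid
EOF

# Apply the same three substitutions used on /etc/sphinx/sphinx.conf
sed -i "s@/var/lib/sphinxsearch@/var/lib/sphinx@" "$sample"
sed -i "s@/var/log/sphinxsearch@/var/log/sphinx@" "$sample"
sed -i "s@/var/run/searchd.pid@/var/run/sphinx/searchd.pid@" "$sample"

# Show the rewritten lines
cat "$sample"
```

After running the real commands, "grep sphinxsearch /etc/sphinx/sphinx.conf" is a quick way to check whether any of the old paths remain.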

Once that's done, we can index our database and then start the search engine.

# Index the database into the search engine
sudo indexer --all --rotate

# Start Sphinx
sudo service searchd start
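
One way to confirm that the search engine started is to check the pid file location configured above. This is only a sketch: searchd_running is a helper name of our own, and the check needs to run as root or as user sphinx, since other users may not be allowed to signal the process.

```shell
# Return success if the PID recorded in the pid file belongs to a live process.
# The optional first argument exists only so the check can target another file.
searchd_running() {
    pid_file="${1:-/var/run/sphinx/searchd.pid}"
    [ -r "$pid_file" ] || return 1
    # kill -0 sends no signal; it only tests whether the process can be signalled
    kill -0 "$(cat "$pid_file")" 2>/dev/null
}

if searchd_running; then
    echo "searchd appears to be running"
else
    echo "searchd does not appear to be running"
fi
```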

If SELinux is enabled, we need Apache to be able to make network calls to SphinxSearch. In the installer documentation, we enabled the SELinux "httpd_can_network_connect_db" boolean, which allows network calls to the standard MySQL port 3306. Now we will enable the "httpd_can_network_connect" boolean, which allows Apache to make any network calls.

sudo setsebool -P httpd_can_network_connect on
sudo service httpd restart
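
To confirm the boolean took effect, a guarded check along these lines may help. It assumes the getsebool utility is available only when the SELinux tools are installed, and check_httpd_boolean is a helper name chosen for this sketch.

```shell
# Report the current state of the boolean, or explain why it cannot be read
check_httpd_boolean() {
    if command -v getsebool >/dev/null 2>&1; then
        getsebool httpd_can_network_connect
    else
        echo "getsebool not found; SELinux tools are probably not installed"
    fi
}

check_httpd_boolean || true
```

On a server with SELinux enabled, the output should end in "--> on" once the boolean has been set.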

CRON Tasks

SphinxSearch must be told to periodically re-index the HelpSpot database. This is done in two ways:

  • Less often (perhaps once per day), re-index the entire HelpSpot database. This fixes any indexing issues and resolves potential edge cases, such as merged requests causing inaccurate search results.
  • Fairly often, index the defined "Delta Indexes", which cover only data that has accumulated since the last indexing.

There are four indexes used with HelpSpot:

  • Requests (Customer information, custom fields, other related data)
  • Request History (Any public, private or external note within requests)
  • Knowledge Books
  • Forums

The following is our recommended schedule for indexing the HelpSpot database. Adjust how often indexing occurs based on your needs for HelpSpot (e.g., if you have a higher or lower volume of requests, or if you make little use of forums or Knowledge Books).

Note that since Sphinx is started/run by user "sphinx" in RedHat/CentOS, the CRON tasks should also be run by user sphinx.

Entire Database

Index the entire database once per day, preferably at a less busy time of day.

A CRON task for that would look like this:

0 0 * * * indexer --all --rotate

Forums and Knowledge Books

Index the forums and knowledge books 2 to 4 times a day. The following will index them every 6 hours, which is 4 times per day:

0 */6 * * * indexer forums_ndx knowledgebooks_ndx --rotate

Requests: Delta Indexes

Request delta indexes must be indexed and then combined into the main index. This involves multiple commands and can best be run in a shell script, such as the following:

#! /usr/bin/env bash

# Assumed to be run as user sphinx on RedHat/CentOS
indexer requests_history_ndx_delta --rotate
indexer --merge requests_history_ndx requests_history_ndx_delta --rotate
indexer requests_ndx_delta --rotate
indexer --merge requests_ndx requests_ndx_delta --rotate
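
The script above runs each command regardless of whether the previous one succeeded. As a hedged variation, the sketch below stops at the first failure so that a delta index which failed to build is not merged. The run_delta_indexing function and the INDEXER override are our own additions, not part of HelpSpot or SphinxSearch; INDEXER exists only so the sequence can be dry-run.

```shell
#!/usr/bin/env bash

# Index each delta and merge it into its main index, stopping on any failure.
# INDEXER defaults to the real "indexer" binary; set INDEXER=echo for a dry run.
run_delta_indexing() {
    local indexer="${INDEXER:-indexer}"
    local index
    for index in requests_history_ndx requests_ndx; do
        "$indexer" "${index}_delta" --rotate || return 1
        "$indexer" --merge "$index" "${index}_delta" --rotate || return 1
    done
}

# Dry run: print the four commands' arguments instead of executing indexer
INDEXER=echo run_delta_indexing
```

In a real script, the last line would simply be "run_delta_indexing", run as user sphinx.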

The following CRON task will run the above shell script every 10 minutes:

*/10 * * * * /path/to/delta_index_shell_script.sh
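
Putting the three schedules together, the following sketch writes them to a scratch crontab file. The script path is the same placeholder used in this article and must be replaced with the real location.

```shell
# Collect the three schedules from this article into one crontab file
cron_file="$(mktemp)"
cat > "$cron_file" <<'EOF'
# Full re-index once per day at midnight
0 0 * * * indexer --all --rotate
# Forums and Knowledge Books every 6 hours
0 */6 * * * indexer forums_ndx knowledgebooks_ndx --rotate
# Request delta indexes every 10 minutes
*/10 * * * * /path/to/delta_index_shell_script.sh
EOF

# Review the file before installing it
cat "$cron_file"
```

Once reviewed, "sudo crontab -u sphinx" followed by the file path installs it for user sphinx. Note that this replaces any crontab that user already has.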

 

