This project has retired. For details please refer to its Attic page.

Solr in Metron

Introduction

Metron ships with Solr 6.6.2 support. Solr Cloud can be used as the real-time portion of the datastore resulting from metron-indexing.

Configuration

The Indexing Topology

Solr is a viable option for the random access topology and, similar to the Elasticsearch Writer, can be configured via the global config. The following settings are possible as part of the global config:

  • solr.zookeeper
    • The zookeeper quorum associated with the SolrCloud instance. This is a required field with no default.
  • solr.commitPerBatch
    • This is a boolean which defines whether the writer commits every batch. The default is true.
    • WARNING: If you set this to false, then commits will happen based on the SolrClient’s internal mechanism, and worker failure may result in data being acknowledged in Storm but not written to Solr.
  • solr.commit.soft
    • This is a boolean which defines whether the writer makes a soft commit or a durable commit. See here. The default is false.
    • WARNING: If you set this to true, then commits will happen based on the SolrClient’s internal mechanism, and worker failure may result in data being acknowledged in Storm but not written to Solr.
  • solr.commit.waitSearcher
    • This is a boolean which defines whether the writer blocks the commit until the data is available to search. See here. The default is true.
    • WARNING: If you set this to false, then commits will happen based on the SolrClient’s internal mechanism, and worker failure may result in data being acknowledged in Storm but not written to Solr.
  • solr.commit.waitFlush
    • This is a boolean which defines whether the writer blocks the commit until the data is flushed. See here. The default is true.
    • WARNING: If you set this to false, then commits will happen based on the SolrClient’s internal mechanism, and worker failure may result in data being acknowledged in Storm but not written to Solr.
  • solr.collection
    • The default solr collection (if unspecified, the name is metron). By default, sensors will write to a collection associated with the index name in the indexing config for that sensor. If that index name is the empty string, then the default collection will be used.
  • solr.http.config
    • This is a map which allows users to configure the Solr client’s HTTP client.
    • Possible fields here are:
      • socketTimeout : Socket timeout in ms; the socket is closed if a read takes longer than this to complete, throwing java.net.SocketTimeoutException: Read timed out
      • connTimeout : Connection timeout in ms; the socket is closed if a connection cannot be established within this time, throwing java.net.SocketTimeoutException: Connection timed out
      • maxConnectionsPerHost : Maximum connections allowed per host
      • maxConnections : Maximum total connections allowed
      • retry : Retry http requests on error
      • allowCompression : Allow compression (deflate,gzip) if server supports it
      • followRedirects : Follow redirects
      • httpBasicAuthUser : Basic auth username
      • httpBasicAuthPassword : Basic auth password
      • solr.ssl.checkPeerName : Check peer name
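
Taken together, a global config fragment enabling these settings might look like the following sketch (only solr.zookeeper is required; all other values here are illustrative, not recommended defaults):

```json
{
  "solr.zookeeper": "node1:2181,node2:2181",
  "solr.commitPerBatch": true,
  "solr.commit.soft": false,
  "solr.commit.waitSearcher": true,
  "solr.commit.waitFlush": true,
  "solr.collection": "metron",
  "solr.http.config": {
    "socketTimeout": 10000,
    "connTimeout": 5000,
    "maxConnections": 128,
    "retry": true
  }
}
```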

Installing

Solr is installed in the full dev environment for CentOS by default but is not started initially. Navigate to $METRON_HOME/bin and start Solr Cloud by running start_solr.sh.

Metron’s Ambari MPack installs several scripts in $METRON_HOME/bin that can be used to manage Solr. A script is also provided for installing Solr Cloud outside of full dev. The script performs the following tasks:

  • Stops Elasticsearch and Kibana
  • Downloads Solr
  • Installs Solr
  • Starts Solr Cloud

Note: for details on setting up Solr Cloud in production mode, see https://lucene.apache.org/solr/guide/6_6/taking-solr-to-production.html

Navigate to $METRON_HOME/bin and spin up Solr Cloud by running install_solr.sh. After running this script, Elasticsearch and Kibana will have been stopped, and you should have an instance of Solr Cloud up and running at http://localhost:8983/solr/#/~cloud. This manner of starting Solr will also spin up an embedded Zookeeper instance on port 9983. More information can be found here.

Solr can also be installed using HDP Search 3. HDP Search 3 sets the Zookeeper root to /solr so this will need to be added to each url in the comma-separated list in Ambari UI -> Services -> Metron -> Configs -> Index Settings -> Solr Zookeeper Urls. For example, in full dev this would be node1:2181/solr.
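
With a multi-node Zookeeper quorum, the /solr root must be appended to every entry in the list; for example (hostnames illustrative):

```
node1:2181/solr,node2:2181/solr,node3:2181/solr
```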

Enabling Solr

Elasticsearch is the real-time store used by default in Metron. Solr can be enabled by following these steps:

  1. Stop the Metron Indexing component in Ambari.
  2. Update Ambari UI -> Services -> Metron -> Configs -> Index Settings -> Solr Zookeeper Urls to match the Solr installation described in the previous section.
  3. Change Ambari UI -> Services -> Metron -> Configs -> Indexing -> Index Writer - Random Access -> Random Access Search Engine to Solr.
  4. Change Ambari UI -> Services -> Metron -> Configs -> REST -> Source Type Field Name to source.type.
  5. Change Ambari UI -> Services -> Metron -> Configs -> REST -> Threat Triage Score Field Name to threat.triage.score.
  6. Start the Metron Indexing component in Ambari.
  7. Restart Metron REST and the Alerts UI in Ambari.

This will automatically create collections for the schemas shipped with Metron:

  • bro
  • snort
  • yaf
  • error (used internally by Metron)
  • metaalert (used internally by Metron)

Any other collections must be created manually before starting the Indexing component. Alerts should be present in the Alerts UI after enabling Solr.

Schemas

As of now, we have mapped out the schemas in src/main/config/schema. Ambari will eventually install these, but at the moment installation is manual; refer to the Solr documentation at https://lucene.apache.org/solr/guide/6_6 in general, and here if you’d like to know more about schemas in Solr.

In Metron’s Solr DAO implementation, document updates involve reading a document, applying the update and replacing the original by reindexing the whole document.
The LatLonType and PointType field types store indexed data in internal fields that should not be returned in search results. For these fields, a dynamic field matching the internal-field suffix needs to be added to store the data points. Solr 6+ comes with a new LatLonPointSpatialField field type that should be used instead of LatLonType if possible. Otherwise, a LatLonType field should be defined as:

<dynamicField name="*.location_point" type="location" multiValued="false" docValues="false"/>
<dynamicField name="*_coordinate" type="pdouble" indexed="true" stored="false" docValues="false"/>
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>

A PointType field should be defined as:

<dynamicField name="*.point" type="point" multiValued="false" docValues="false"/>
<dynamicField name="*_point" type="pdouble" indexed="true" stored="false" docValues="false"/>
<fieldType name="point" class="solr.PointType" subFieldSuffix="_point"/>

If any copy fields are defined, stored and docValues should be set to false.
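
As a hypothetical illustration (the field names below are not part of the shipped schemas), a copy field destination with both attributes disabled could be declared as:

```xml
<!-- Illustrative only: the destination must not be stored and must not use docValues -->
<dynamicField name="*_copy" type="string" indexed="true" stored="false" docValues="false"/>
<copyField source="ip_src_addr" dest="ip_src_addr_copy"/>
```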

Collections

Convenience scripts are provided with Metron to create and delete collections. Ambari uses these scripts to automatically create collections. To use them outside of Ambari, a few environment variables must be set first:

# Path to the zookeeper node used by Solr
export ZOOKEEPER=node1:2181/solr
# Set to true if Kerberos is enabled
export SECURITY_ENABLED=true 

The scripts can then be called directly with the collection name as the first argument. For example, to create the bro collection:

$METRON_HOME/bin/create_collection.sh bro

To delete the bro collection:

$METRON_HOME/bin/delete_collection.sh bro

The create_collection.sh script depends on schemas installed in $METRON_HOME/config/schema. There are several schemas that come with Metron:

  • bro
  • snort
  • yaf
  • metaalert
  • error

Additional schemas should be installed in that location if using the create_collection.sh script. Any collection can be deleted with the delete_collection.sh script. These scripts use the Solr Collection API.
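
As a sketch of what such a script does under the hood, a collection can be created through the Collections API over HTTP. The host, port, and parameters below are assumptions for illustration, not the exact invocation used by create_collection.sh:

```shell
# Build the Collections API CREATE request (values are illustrative assumptions).
SOLR_URL="http://localhost:8983/solr"
COLLECTION="bro"
CREATE_URL="${SOLR_URL}/admin/collections?action=CREATE&name=${COLLECTION}&collection.configName=${COLLECTION}"
# Against a live cluster this would be issued with, e.g.:
#   curl "$CREATE_URL"
echo "$CREATE_URL"
```

The same endpoint with action=DELETE&name=... is the API counterpart of delete_collection.sh.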