Several components (threat intelligence triage and field transformations) need to perform simple computations and transformations using the data from messages as variables. For those purposes there is Stellar, a simple, scaled-down DSL.
The Stellar language supports the following:
Variables may be used in boolean expressions and variables which are not explicitly boolean may be interpreted as booleans subject to the following rules:
Otherwise, boolean variables will be interpreted as their values reflect.
The following keywords need to be single quote escaped in order to be used in Stellar expressions:
not | else | exists | if | then |
and | or | in | NaN | match |
default | == | != | <= | > |
>= | + | - | < | ? |
* | / | , | { | } |
=> |
Using one of these keywords inside a value, such as the < and > in “foo” : “<ok>”, requires escaping: “foo”: “'<ok>'”
Below is how the == operator is expected to work:
The != operator is the negation of the above.
Stellar provides the capability to pass lambda expressions to functions which wish to support that layer of indirection. The syntax is:
where
In the core language functions, we support basic functional programming primitives such as
Stellar provides the capability to write match expressions, which are similar to the switch statements commonly found in C-like languages.
The syntax is:
Where:
default is required
Lambda expressions are supported, but they must be no-argument lambdas such as () -> STATEMENT
The following is an example query (i.e. a function which returns a boolean) that might be seen in threat triage:
IN_SUBNET( ip, '192.168.0.0/24') or ip in [ '10.0.0.1', '10.0.0.2' ] or exists(is_local)
This evaluates to true precisely when one of the following is true:
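As a rough illustration, the three clauses of the query above can be modelled in Python. This is a sketch rather than Stellar's implementation: the function name `triage_query` and the dict-based message are assumptions, and `exists(is_local)` is approximated as "the field is present in the message".

```python
import ipaddress


def triage_query(message):
    """Python model of the example query; `message` is a dict of field values."""
    ip = message.get("ip")

    # IN_SUBNET(ip, '192.168.0.0/24')
    in_subnet = ip is not None and (
        ipaddress.ip_address(ip) in ipaddress.ip_network("192.168.0.0/24")
    )

    # ip in ['10.0.0.1', '10.0.0.2']
    in_list = ip in ("10.0.0.1", "10.0.0.2")

    # exists(is_local), approximated here as "the field is present" (assumption)
    has_is_local = "is_local" in message

    return in_subnet or in_list or has_is_local
```

The clauses are joined with `or`, so a single matching clause is enough for the query to return true.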
The following is an example transformation which might be seen in a field transformation:
TO_EPOCH_TIMESTAMP(timestamp, 'yyyy-MM-dd HH:mm:ss', MAP_GET(dc, dc2tz, 'UTC'))
For a message with a timestamp and dc field, we want to transform the timestamp to an epoch timestamp given a timezone which we will look up in a separate map, called dc2tz.
This will convert the timestamp field to an epoch timestamp based on the
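The same lookup-then-convert logic can be sketched in Python. The helper names `to_epoch_millis` and `transform` are hypothetical, the Java pattern `yyyy-MM-dd HH:mm:ss` is translated to the equivalent `strptime` format, and returning milliseconds is an assumption about the function's result, not something stated above.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def to_epoch_millis(timestamp, fmt, tz_name):
    """Parse `timestamp` as local time in `tz_name` and return epoch milliseconds."""
    parsed = datetime.strptime(timestamp, fmt).replace(tzinfo=ZoneInfo(tz_name))
    return int(parsed.timestamp()) * 1000


def transform(message, dc2tz):
    """Model of TO_EPOCH_TIMESTAMP(timestamp, ..., MAP_GET(dc, dc2tz, 'UTC'))."""
    # MAP_GET(dc, dc2tz, 'UTC'): look up the data center, defaulting to UTC.
    tz_name = dc2tz.get(message.get("dc"), "UTC")
    return to_epoch_millis(message["timestamp"], "%Y-%m-%d %H:%M:%S", tz_name)
```

With a hypothetical map such as `{"nyc": "America/New_York"}`, a message from `nyc` is interpreted in New York local time, while a message from an unknown data center falls back to UTC.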
A microbenchmarking utility is included to assist in executing microbenchmarks for Stellar functions. The utility can be executed via Maven using the exec plugin, like so, from the metron-common directory:
mvn -DskipTests clean package && \
mvn exec:java -Dexec.mainClass="org.apache.metron.stellar.common.benchmark.StellarMicrobenchmark" -Dexec.args="..."
where exec.args can be one of the following:
-e,--expressions <FILE>   Stellar expressions
-h,--help                 Generate Help screen
-n,--num_times <NUM>      Number of times to run per expression (after warmup). Default: 1000
-o,--output <FILE>        File to write output.
-p,--percentiles <NUM>    Percentiles to calculate per run. Default: 50.0,75.0,95.0,99.0
-v,--variables <FILE>     File containing a JSON Map of variables to use
-w,--warmup <NUM>         Number of times for warmup per expression. Default: 100
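To make the roles of the warmup, num_times and percentiles options concrete, here is a minimal Python sketch of a warmup-then-measure loop. It is not the utility's code: the function names are hypothetical, and the nearest-rank percentile method is an assumption, since the text above does not say how percentiles are computed.

```python
import math
import time


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100.0))
    return sorted_values[rank - 1]


def benchmark(fn, warmup=100, num_times=1000, percentiles=(50.0, 75.0, 95.0, 99.0)):
    """Run `fn` untimed `warmup` times, then time `num_times` runs in nanoseconds."""
    for _ in range(warmup):
        fn()  # warmup runs are executed but not measured

    timings = []
    for _ in range(num_times):
        start = time.perf_counter_ns()
        fn()
        timings.append(time.perf_counter_ns() - start)

    timings.sort()
    return {pct: percentile(timings, pct) for pct in percentiles}
```

The defaults mirror the documented ones (100 warmup runs, 1000 measured runs, percentiles 50.0, 75.0, 95.0 and 99.0).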
For instance, to run with a set of Stellar expressions in file /tmp/expressions.txt:
# simple functions
TO_UPPER('casey')
TO_LOWER(name)
# math functions
1 + 2*(3 + int_num) / 10.0
1.5 + 2*(3 + double_num) / 10.0
# conditionals
if ('foo' in ['foo']) OR one == very_nearly_one then 'one' else 'two'
1 + 2*(3 + int_num) / 10.0
#Network funcs
DOMAIN_TO_TLD(domain)
DOMAIN_REMOVE_SUBDOMAINS(domain)
And variables in file /tmp/variables.json:
{
  "name" : "casey",
  "int_num" : 1,
  "double_num" : 17.5,
  "one" : 1,
  "very_nearly_one" : 1.000001,
  "domain" : "www.google.com"
}
The following command runs the benchmark with those files and writes the results to the file passed to -o:
mvn -DskipTests clean package && \
mvn exec:java -Dexec.mainClass="org.apache.metron.stellar.common.benchmark.StellarMicrobenchmark" \
-Dexec.args="-e /tmp/expressions.txt -v /tmp/variables.json -o ./output.json"
The Stellar Shell is a REPL (Read Eval Print Loop) for the Stellar language that helps in debugging, troubleshooting, and learning Stellar. It can also be used as a language-checking resource while interacting with a live Metron cluster.
The Stellar DSL (domain specific language) is used to act upon streaming data within Apache Storm. It is difficult to troubleshoot Stellar when it can only be executed within a Storm topology. This REPL is intended to help mitigate that problem by allowing a user to replicate behavior encountered in production, isolate initialization errors, or understand function resolution problems. Because it can be run from the command line on any node with Metron installed, it can help the user understand environmental problems that may be interfering with Stellar running in Storm servers.
The shell supports customization via ~/.inputrc as it is backed by a proper readline implementation.
Shell-like operations are supported such as
Note: Stellar classpath configuration from the global config is honored here if the REPL knows about Zookeeper.
When starting the REPL via $METRON_HOME/bin/stellar you can specify certain environment variables to customize the experience:
To run the Stellar Shell from within a deployed Metron cluster, run the following command on the host where Metron is installed.
$ $METRON_HOME/bin/stellar
Stellar, Go!
{es.clustername=metron, es.ip=node1, es.port=9300, es.date.format=yyyy.MM.dd.HH}
[Stellar]>>> %functions
BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT, BLOOM_MERGE, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, ...
[Stellar]>>> ?PROTOCOL_TO_NAME
PROTOCOL_TO_NAME
desc: Convert the IANA protocol number to the protocol name
args: IANA Number
ret: The protocol name associated with the IANA number.
[Stellar]>>> ip.protocol := 6
6
[Stellar]>>> PROTOCOL_TO_NAME(ip.protocol)
TCP
$ $METRON_HOME/bin/stellar -h
usage: stellar
 -h,--help              Print help
 -irc,--inputrc <arg>   File containing the inputrc if not the default ~/.inputrc
 -v,--variables <arg>   File containing a JSON Map of variables
 -z,--zookeeper <arg>   Zookeeper URL fragment in the form [HOSTNAME|IPADDRESS]:PORT
 -na,--no_ansi          Make the input prompt not use ANSI colors.
Optional
Optionally load a JSON map which contains variable assignments. This is intended to give you the ability to save off a message from Metron and work on it via the REPL.
Optional
Attempts to connect to Zookeeper and read the Metron global configuration. Stellar functions may require the global configuration to work properly. If found, the global configuration values are printed to the console. If specified, then the classpath may be augmented by the paths specified in the stellar config in the global config.
$ $METRON_HOME/bin/stellar -z node1:2181
Stellar, Go!
{es.clustername=metron, es.ip=node1, es.port=9300, es.date.format=yyyy.MM.dd.HH}
[Stellar]>>>
Stellar has no concept of variable assignment. For testing and debugging purposes, it is important to be able to create variables that simulate data contained within incoming messages. The REPL therefore provides a means of performing variable assignment outside of the core Stellar language, via the := operator. For example, foo := 1 + 1 assigns the result of the Stellar expression 1 + 1 to the variable foo.
[Stellar]>>> foo := 2 + 2
4.0
[Stellar]>>> 2 + 2
4.0
The REPL has a set of magic commands that provide the REPL user with information about the Stellar execution environment. The following magic commands are supported.
This command lists all functions resolvable in the Stellar environment.
[Stellar]>>> %functions
BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT, BLOOM_MERGE, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, DECODE, DOMAIN_REMOVE_SUBDOMAINS, DOMAIN_REMOVE_TLD, DOMAIN_TO_TLD, ENDS_WITH, GET, GET_FIRST, GET_LAST, GET_ENCODINGS_LIST, IN_SUBNET, IS_DATE, IS_DOMAIN, IS_EMAIL, IS_EMPTY, IS_ENCODING, IS_INTEGER, IS_IP, IS_URL, JOIN, LENGTH, MAAS_GET_ENDPOINT, MAAS_MODEL_APPLY, MAP_EXISTS, MAP_GET, MONTH, PROTOCOL_TO_NAME, REGEXP_MATCH, SPLIT, STARTS_WITH, STATS_ADD, STATS_COUNT, STATS_GEOMETRIC_MEAN, STATS_INIT, STATS_KURTOSIS, STATS_MAX, STATS_MEAN, STATS_MERGE, STATS_MIN, STATS_PERCENTILE, STATS_POPULATION_VARIANCE, STATS_QUADRATIC_MEAN, STATS_SD, STATS_SKEWNESS, STATS_SUM, STATS_SUM_LOGS, STATS_SUM_SQUARES, STATS_VARIANCE, TO_DOUBLE, TO_EPOCH_TIMESTAMP, TO_FLOAT, TO_INTEGER, TO_LOWER, TO_STRING, TO_UPPER, TRIM, URL_TO_HOST, URL_TO_PATH, URL_TO_PORT, URL_TO_PROTOCOL, WEEK_OF_MONTH, WEEK_OF_YEAR, YEAR
The list of functions returned can also be filtered by passing an argument. Only the functions containing the argument as a substring will be returned.
[Stellar]>>> %functions NET
IN_SUBNET
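The substring filter can be pictured with a few lines of Python. This is only a model: `filter_functions` is a hypothetical name, and treating the match as case-sensitive is an assumption.

```python
def filter_functions(names, needle=None):
    """Return the function names that contain `needle` as a substring."""
    if needle is None:
        return list(names)  # no argument: list everything
    return [name for name in names if needle in name]
```

Applied to a list containing IN_SUBNET, IS_IP and MAP_GET with the argument NET, only IN_SUBNET remains, because it is the only name containing "NET".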
Lists all variables in the Stellar environment.
[Stellar]>>> %vars
[Stellar]>>> foo := 2 + 2
4.0
[Stellar]>>> %vars
foo = 4.0
Lists all values that are defined in the global configuration.
Most of Metron’s functional components have access to what is called the global configuration. This is a key/value configuration store that can be used to customize Metron. Many Stellar functions accept configuration values from the global configuration. The Stellar Shell also leverages the global configuration for customizing the behavior of many Stellar functions.
[Stellar]>>> %globals
{es.clustername=metron, es.ip=node1:9300, es.date.format=yyyy.MM.dd.HH, parser.error.topic=indexing, update.hbase.table=metron_update, update.hbase.cf=t}
Defines a global configuration value in the current shell session. This value will be forgotten once the session is ended.
[Stellar]>>> %define bootstrap.servers := "node1:6667"
node1:6667
[Stellar]>>> %globals
{bootstrap.servers=node1:6667}
Undefine a global configuration value in the current shell session. This will not modify the persisted global configuration.
[Stellar]>>> %undefine bootstrap.servers
[Stellar]>>> %globals
{}
Returns formatted documentation of the Stellar function. Provides the description of the function along with the expected arguments.
[Stellar]>>> ?BLOOM_ADD
BLOOM_ADD
desc: Adds an element to the bloom filter passed in
args: bloom - The bloom filter, value* - The values to add
ret: Bloom Filter
[Stellar]>>> ?IS_EMAIL
IS_EMAIL
desc: Tests if a string is a valid email address
args: address - The String to test
ret: True if the string is a valid email address and false otherwise.
[Stellar]>>>
To run the Stellar Shell directly from the Metron source code, run a command like the following. Ensure that Metron has already been built and installed with mvn clean install -DskipTests.
$ mvn exec:java \
-Dexec.mainClass="org.apache.metron.stellar.common.shell.cli.StellarShell" \
-pl metron-platform/metron-enrichment
...
Stellar, Go!
Please note that functions are loading lazily in the background and will be unavailable until loaded fully.
[Stellar]>>> Functions loaded, you may refer to functions now...
[Stellar]>>> %functions
ABS, APPEND_IF_MISSING, BIN, BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT, BLOOM_MERGE, CHOMP, CHOP, COUNT_MATCHES, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, DOMAIN_REMOVE_SUBDOMAINS, DOMAIN_REMOVE_TLD, DOMAIN_TO_TLD, ENDS_WITH, ENRICHMENT_EXISTS, ENRICHMENT_GET, FILL_LEFT, FILL_RIGHT, FILTER, FORMAT, GEO_GET, GET, GET_FIRST, GET_LAST, HLLP_ADD, HLLP_CARDINALITY, HLLP_INIT, HLLP_MERGE, IN_SUBNET, IS_DATE, IS_DOMAIN, IS_EMAIL, IS_EMPTY, IS_INTEGER, IS_IP, IS_URL, JOIN, LENGTH, LIST_ADD, MAAS_GET_ENDPOINT, MAAS_MODEL_APPLY, MAP, MAP_EXISTS, MAP_GET, MONTH, OUTLIER_MAD_ADD, OUTLIER_MAD_SCORE, OUTLIER_MAD_STATE_MERGE, PREPEND_IF_MISSING, PROFILE_FIXED, PROFILE_GET, PROFILE_WINDOW, PROTOCOL_TO_NAME, REDUCE, REGEXP_MATCH, SPLIT, STARTS_WITH, STATS_ADD, STATS_BIN, STATS_COUNT, STATS_GEOMETRIC_MEAN, STATS_INIT, STATS_KURTOSIS, STATS_MAX, STATS_MEAN, STATS_MERGE, STATS_MIN, STATS_PERCENTILE, STATS_POPULATION_VARIANCE, STATS_QUADRATIC_MEAN, STATS_SD, STATS_SKEWNESS, STATS_SUM, STATS_SUM_LOGS, STATS_SUM_SQUARES, STATS_VARIANCE, STRING_ENTROPY, SYSTEM_ENV_GET, SYSTEM_PROPERTY_GET, TO_DOUBLE, TO_EPOCH_TIMESTAMP, TO_FLOAT, TO_INTEGER, TO_LONG, TO_LOWER, TO_STRING, TO_UPPER, TRIM, URL_TO_HOST, URL_TO_PATH, URL_TO_PORT, URL_TO_PROTOCOL, WEEK_OF_MONTH, WEEK_OF_YEAR, YEAR
Changing the project passed to the -pl argument will define which dependencies are included and ultimately which Stellar functions are available within the shell environment.
This can be useful for troubleshooting function resolution problems. The previous example defines which functions are available during Enrichment. For example, to determine which functions are available within the Profiler run the following.
$ mvn exec:java \
-Dexec.mainClass="org.apache.metron.stellar.common.shell.cli.StellarShell" \
-pl metron-analytics/metron-profiler
...
Stellar, Go!
Please note that functions are loading lazily in the background and will be unavailable until loaded fully.
[Stellar]>>> Functions loaded, you may refer to functions now...
%functions
ABS, APPEND_IF_MISSING, BIN, BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT, BLOOM_MERGE, CHOMP, CHOP, COUNT_MATCHES, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, DOMAIN_REMOVE_SUBDOMAINS, DOMAIN_REMOVE_TLD, DOMAIN_TO_TLD, ENDS_WITH, FILL_LEFT, FILL_RIGHT, FILTER, FORMAT, GET, GET_FIRST, GET_LAST, HLLP_ADD, HLLP_CARDINALITY, HLLP_INIT, HLLP_MERGE, IN_SUBNET, IS_DATE, IS_DOMAIN, IS_EMAIL, IS_EMPTY, IS_INTEGER, IS_IP, IS_URL, JOIN, LENGTH, LIST_ADD, MAAS_GET_ENDPOINT, MAAS_MODEL_APPLY, MAP, MAP_EXISTS, MAP_GET, MONTH, OUTLIER_MAD_ADD, OUTLIER_MAD_SCORE, OUTLIER_MAD_STATE_MERGE, PREPEND_IF_MISSING, PROFILE_FIXED, PROFILE_GET, PROFILE_WINDOW, PROTOCOL_TO_NAME, REDUCE, REGEXP_MATCH, SPLIT, STARTS_WITH, STATS_ADD, STATS_BIN, STATS_COUNT, STATS_GEOMETRIC_MEAN, STATS_INIT, STATS_KURTOSIS, STATS_MAX, STATS_MEAN, STATS_MERGE, STATS_MIN, STATS_PERCENTILE, STATS_POPULATION_VARIANCE, STATS_QUADRATIC_MEAN, STATS_SD, STATS_SKEWNESS, STATS_SUM, STATS_SUM_LOGS, STATS_SUM_SQUARES, STATS_VARIANCE, STRING_ENTROPY, SYSTEM_ENV_GET, SYSTEM_PROPERTY_GET, TO_DOUBLE, TO_EPOCH_TIMESTAMP, TO_FLOAT, TO_INTEGER, TO_LONG, TO_LOWER, TO_STRING, TO_UPPER, TRIM, URL_TO_HOST, URL_TO_PATH, URL_TO_PORT, URL_TO_PROTOCOL, WEEK_OF_MONTH, WEEK_OF_YEAR, YEAR
The Stellar Shell is also packaged as a standalone application. It can be unpacked on any supported operating system.
Only the base Stellar functions are available as packaged. Other functions, such as those in metron-profiler and metron-management, are not available.
metron-stellar/stellar-common/target/stellar-common-0.7.1-stand-alone.tar.gz
When unpacked, the following structure will be created:
.
├── bin
│   └── stellar
└── lib
    └── stellar-common-0.7.1-uber.jar
To run the Stellar Shell, run the following from the directory you unpacked to:
bin/stellar
-> % bin/stellar
Stellar, Go!
Please note that functions are loading lazily in the background and will be unavailable until loaded fully.
[Stellar]>>> Functions loaded, you may refer to functions now...
[Stellar]>>> %functions
ABS, APPEND_IF_MISSING, BLOOM_ADD, BLOOM_EXISTS, BLOOM_INIT, BLOOM_MERGE, CEILING, CHOMP, CHOP, COS, COUNT_MATCHES, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, DECODE, DOMAIN_REMOVE_SUBDOMAINS, DOMAIN_REMOVE_TLD, DOMAIN_TO_TLD, ENCODE, ENDS_WITH, EXP, FILL_LEFT, FILL_RIGHT, FILTER, FLOOR, FORMAT, GET, GET_FIRST, GET_LAST, GET_SUPPORTED_ENCODINGS, IN_SUBNET, IS_EMPTY, IS_ENCODING, JOIN, LENGTH, LIST_ADD, LN, LOG10, LOG2, MAP, MAP_EXISTS, MAP_GET, MONTH, PREPEND_IF_MISSING, REDUCE, REGEXP_GROUP_VAL, REGEXP_MATCH, ROUND, SIN, SPLIT, SQRT, STARTS_WITH, STRING_ENTROPY, SYSTEM_ENV_GET, SYSTEM_PROPERTY_GET, TAN, TO_DOUBLE, TO_EPOCH_TIMESTAMP, TO_FLOAT, TO_INTEGER, TO_LONG, TO_LOWER, TO_STRING, TO_UPPER, TRIM, URL_TO_HOST, URL_TO_PATH, URL_TO_PORT, URL_TO_PROTOCOL, WEEK_OF_MONTH, WEEK_OF_YEAR, YEAR, ZIP, ZIP_LONGEST
[Stellar]>>>
By default, the shell will have the base Stellar language commands available. Any jars in the lib directory that contain Stellar functions will also be loaded, and their commands will be available to the shell, as long as their dependencies are satisfied.
The Stellar Shell can be executed both from the command line and from within a Stellar Notebook. The behavior and underlying implementation are exactly the same across these two environments.
This package contains classes that are reused across both the CLI and Zeppelin shell environments.
StellarShellExecutor: Executes Stellar in a shell-like environment. Provides the Stellar language extensions like variable assignment, comments, magics, and doc strings that are only accessible in the shell.
StellarAutoCompleter: Handles auto-completion for Stellar.
StellarExecutorListeners: An event listener that can be notified when variables, functions, and specials are defined. This is how a StellarAutoCompleter is notified throughout the life of a shell session.
All Stellar language extensions are contained within this package.
This package contains classes that are specific to the CLI-driven REPL.
Stellar can be configured in a variety of ways from the Global Configuration. In particular, there are three main configuration parameters around configuring Stellar:
If specified, Stellar will use a custom classloader which will wrap the context classloader and allow for the resolution of classes stored in jars not shipped with Metron and stored in a variety of mediums:
This path is a comma separated list of
{
  ...
  "stellar.function.paths" : "hdfs://node1:8020/apps/metron/stellar/metron-management-0.4.2.jar, hdfs://node1:8020/apps/metron/3rdparty/.*.jar"
}
Please be aware that this classloader does not reload functions dynamically: the classpath specified here in the global config is read on topology start. For a change in the classpath to be picked up, the topology currently has to be restarted.
If specified, this defines one or more regular expressions applied to the classes implementing the Stellar function that specify what should be included when searching for Stellar functions.
{
  ...
  "stellar.function.resolver.includes" : "org.apache.metron.*,com.myorg.stellar.*"
}
Stellar provides a REST client with the REST_GET function. This function depends on the Apache HttpComponents library for executing HTTP requests. The syntax is:
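How a comma-separated list of include patterns narrows the search can be sketched in Python. Everything here is illustrative: `is_included` is a hypothetical helper, the class names in the usage note are made up, and requiring each regular expression to match the whole class name is an assumption about the resolver's behaviour.

```python
import re


def is_included(class_name, includes):
    """Return True if `class_name` fully matches any of the comma-separated regexes."""
    patterns = [p.strip() for p in includes.split(",") if p.strip()]
    return any(re.fullmatch(pattern, class_name) for pattern in patterns)
```

With the setting shown above, a class such as `com.myorg.stellar.MyFunctions` would be searched for Stellar functions, while `com.other.Unrelated` would be skipped.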
REST_GET( uri , optional config )
The second argument is an optional Map of settings. The following settings are available:
This Map of settings can also be stored in the global config stellar.rest.settings property. For example, to configure basic authentication settings you would add this property to the global config:
{
  "stellar.rest.settings": {
    "basic.auth.user": "user",
    "basic.auth.password.path": "/password/path"
  }
}
Any settings passed into the expression will take precedence over the global config settings. The global config settings will take precedence over the defaults.
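The precedence order can be expressed as a layered dictionary merge. This Python sketch is a model only: `resolve_settings` is a hypothetical name and the default values used in the usage note are invented for illustration.

```python
def resolve_settings(defaults, global_config, expression_settings=None):
    """Merge settings so that expression > global config > defaults."""
    merged = dict(defaults)
    # Global config settings override the defaults.
    merged.update(global_config.get("stellar.rest.settings", {}))
    # Settings passed into the expression override everything else.
    merged.update(expression_settings or {})
    return merged
```

For example, if the defaults contain a hypothetical `timeout` of 1000, the global config sets `basic.auth.user` to `user`, and the expression passes `basic.auth.user` as `admin`, the resolved settings keep the default timeout and use `admin` as the user.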
For security purposes, passwords are read from a file in HDFS. Passwords are read as is, including any newlines or spaces. Be careful not to include these in the file unless they are specifically part of the password.
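The trailing-newline pitfall is easy to reproduce. The sketch below uses a local temporary file as a stand-in for the HDFS file, and both helper names are hypothetical; it only shows that reading a file "as is" keeps a newline that an editor or `echo` may have appended.

```python
import os
import tempfile


def write_password_file(content):
    """Write `content` unchanged to a new temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w", encoding="utf-8", newline="") as handle:
        handle.write(content)
    return path


def read_password(path):
    """Read the file unchanged; newline='' disables newline translation."""
    with open(path, "r", encoding="utf-8", newline="") as handle:
        return handle.read()
```

A file written with the content `passwd` followed by a newline yields a password that is one character longer than `passwd`, so authentication with it would fail.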
Perform a simple GET request with no authentication:
[Stellar]>>> REST_GET('http://httpbin.org/get')
{args={}, headers={Accept=application/json, Accept-Encoding=gzip,deflate, Cache-Control=max-age=259200, Connection=close, Host=httpbin.org, User-Agent=Apache-HttpClient/4.3.2 (java 1.5)}, origin=127.0.0.1, 136.62.241.236, url=http://httpbin.org/get}
Perform a GET request using basic authentication:
[Stellar]>>> config := {'basic.auth.user': 'user', 'basic.auth.password.path': '/password/path'}
{basic.auth.user=user, basic.auth.password.path=/password/path}
[Stellar]>>> REST_GET('http://httpbin.org/basic-auth/user/passwd', config)
{authenticated=true, user=user}
Perform a GET request using a proxy:
[Stellar]>>> config := {'proxy.host': 'node1', 'proxy.port': 3128, 'proxy.basic.auth.user': 'user', 'proxy.basic.auth.password.path': '/proxy/password/path'}
{proxy.basic.auth.password.path=/proxy/password/path, proxy.port=3128, proxy.host=node1, proxy.basic.auth.user=user}
[Stellar]>>> REST_GET('http://httpbin.org/get', config)
{args={}, headers={Accept=application/json, Accept-Encoding=gzip,deflate, Cache-Control=max-age=259200, Connection=close, Host=httpbin.org, User-Agent=Apache-HttpClient/4.3.2 (java 1.5)}, origin=127.0.0.1, 136.62.241.236, url=http://httpbin.org/get}
Performing a REST request will introduce latency in a streaming pipeline. Therefore this function should only be used for low volume telemetries that are unlikely to be affected by higher latency operations. The timeout setting can be used to guarantee that requests complete within the configured time.
In cases of HTTP errors, timeouts, etc., this function will log the error and return null. Only a status code of 200 is considered successful by default, but this can be changed with the response.codes.allowed setting. Values returned on errors or empty content can be changed from the default value of null using the error.value.override and empty.content.override settings, respectively.
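The decision logic described above can be summarised in a small Python model. It is a sketch under assumptions: `rest_get_result` is a hypothetical name, `response.codes.allowed` is assumed to be a list of status codes, and the parsing of a successful response body is not modelled.

```python
def rest_get_result(status_code, body, settings=None):
    """Model which value is returned for a given status code and body."""
    settings = settings or {}
    allowed = settings.get("response.codes.allowed", [200])

    if status_code not in allowed:
        # Error case: null (None) unless error.value.override is set.
        return settings.get("error.value.override")

    if body is None or body == "":
        # Empty content: null (None) unless empty.content.override is set.
        return settings.get("empty.content.override")

    return body
```

With no settings, a 404 yields `None`. Adding 404 to `response.codes.allowed` makes the same response count as successful, and setting `error.value.override` replaces `None` with a value of your choice for error cases.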