MalwareSoup

Thoughts on cyber threat intelligence, malware analysis, and other things

Sysmon and Neo4j

In my day job I spend a lot of time thinking about how to get the most information possible out of log data. A very common challenge, at least for many of the use cases I see, boils down to identifying relationships between different data points in those logs. Sometimes approaching the data in new ways, whether in how it is aggregated or stored, or just using new techniques for analysis, can be helpful in identifying relationships that might not otherwise be obvious.

Enter Neo4j.

Neo4j is a graph database built on nodes and the relationships between those nodes. After spending a good bit of time playing with different hunting concepts using the ELK stack, I thought a graph database might be a good way to not only visualize the data I've been aggregating using the ELK stack, but also store it in a way that makes relationships readily apparent and easy to query. Ideally I'd want to continue using the ELK stack as my main hunting platform, but also stream the data to a Neo4j instance to allow me to visualize potentially interesting data points identified using my normal workflow.

As an aside, if you haven't had a chance to check out the excellent content by Roberto (@cyb3rWard0g) related to threat-hunting using the ELK stack then I highly recommend you check out his blog here.

Sysmon has been getting a lot of attention lately as a good source of information for threat hunting, so I thought that would be a great use case for this idea. After doing a little bit of research, I came across several other blogs detailing very similar ideas - visualizing Sysmon data using Neo4j (like this one). However, the examples I've seen mainly deal with network connections. I wanted to take things a step further and visualize as much as I could - network connections, process creation, image loads, named pipe interactions, etc.
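For reference, each of those data types corresponds to a Sysmon event ID. A minimal Python sketch of that mapping follows. The "event_id" key is an assumption about how the shipped event is shaped (Winlogbeat 5.x uses that name), and describe_event is a hypothetical helper, not something taken from sysmon2neo4j.

```python
# Sysmon event IDs for the event types mentioned above.
SYSMON_EVENT_TYPES = {
    1: "process creation",
    3: "network connection",
    7: "image load",
    17: "named pipe created",
    18: "named pipe connected",
}


def describe_event(event):
    """Return a label for a Sysmon event dict, or "other" if unmapped.

    Assumes the event carries its ID under the "event_id" key.
    """
    try:
        event_id = int(event.get("event_id"))
    except (TypeError, ValueError):
        # Missing or non-numeric ID: treat as an unmapped event.
        return "other"
    return SYSMON_EVENT_TYPES.get(event_id, "other")
```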

The basic process I'm going to detail here involves using Winlogbeat to ship Sysmon logs from a Windows host to Logstash. From there we'll use the "pipe" output plugin with Logstash to pipe Sysmon events to a Python script that will populate the Neo4j instance. The steps below were designed for an Ubuntu 16.04 installation.

Let's get started.

Installing Logstash

Logstash is an open source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favorite “stash.”

Ideally you'd ship data to both Elasticsearch and Neo4j, although for testing purposes we'll just be configuring Logstash to push data to Neo4j for now.

To install Logstash we'll first need to make sure a supported version of Java is installed. You can check which version you have installed using the command:
java -version

If you don't have Java installed at all, you can do so using apt:
sudo apt install openjdk-8-jre

Once you've got a working version of Java installed, you'll need to download and install Elastic's public signing key:
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

Install the apt-transport-https package:
sudo apt install apt-transport-https

And update your repository sources:
echo "deb https://artifacts.elastic.co/packages/5.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-5.x.list

Finally, you can update and install the Logstash package:
sudo apt update && sudo apt install logstash

Installing Neo4j

Logstash should now be installed and ready to ship logs to our "stash" of choice. Now we'll need to get Neo4j installed and ready to receive those logs. The process is pretty much the same.

Download and add Neo4j's public signing key:
wget -O - https://debian.neo4j.org/neotechnology.gpg.key | sudo apt-key add -

Update your repository sources:
echo 'deb https://debian.neo4j.org/repo stable/' | sudo tee /etc/apt/sources.list.d/neo4j.list

Update and install Neo4j:
sudo apt update && sudo apt install neo4j

Clone the sysmon2neo4j GitHub repo

Okay, now that we've got Logstash and Neo4j installed, we just need a way for them to talk to each other. Logstash has a pretty long list of plugins available to manipulate data, and allows developers to write their own plugins (in Ruby) if one doesn't suit their needs. I experimented with writing a custom output plugin, but quickly decided that I hate the entire Ruby ecosystem and should take a different route. One of the pre-installed plugins for Logstash is the "pipe" plugin, which pipes data to another application using stdin. I decided to use that option and a Python script to push events from Logstash to Neo4j. I've made the code available on GitHub, although I should probably add that I am by no means a professional coder. Pull requests are very much appreciated if you see mistakes or areas that could be improved.
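To make the stdin approach concrete, here is a minimal sketch of the receiving end. It assumes Logstash writes one JSON-encoded event per line, and the function name read_events is hypothetical; the actual sysmon2neo4j script may be structured quite differently.

```python
import json


def read_events(stream):
    """Yield one dict per JSON line read from stream, skipping bad lines."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except ValueError:
            # Ignore lines that are not valid JSON rather than crash the pipe.
            continue
        # Only JSON objects are treated as events in this sketch.
        if isinstance(event, dict):
            yield event
```

A script built on this would loop over read_events(sys.stdin) and write each event to Neo4j inside the loop.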

In any case, to get rolling you'll just need to clone the repo:
git clone https://github.com/MalwareSoup/sysmon2neo4j

And install the requirements:

cd sysmon2neo4j
sudo pip install -r requirements.txt

You'll just need to modify the username and password fields in sysmon2neo4j.py to authenticate to your Neo4j instance (we'll set a Neo4j username and password when we first start the Neo4j instance shortly).

If you configured Neo4j to listen on an IP:port combo other than the default of localhost:7687 you'll need to modify the code for that location as well.

Create a Logstash pipeline

Logstash works based on a "pipeline" that includes inputs, filters, and outputs. We'll need to configure it to listen for Winlogbeat connections, and output Sysmon events to our script via the pipe plugin.

Create a .conf file (I called mine test-pipeline.conf) with the contents:

input {
    beats {
        port => 5043
    }
}
output {
    if [source_name] == "Microsoft-Windows-Sysmon" {
        pipe {
            ttl => 300
            codec => 'json'
            command => 'python /path/to/sysmon2neo4j.py'
        }
    }
}

Note, this is just meant as an example for experimentation. In practice, you'd likely want to make sure to configure your beats input to use SSL and output data to Elasticsearch as well as Neo4j. For a more detailed explanation of setting up a more secure pipeline, see part 5 of Roberto's blog.

Save the config to /etc/logstash/conf.d/
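One way to exercise a script before Logstash is involved is to emulate what the pipe output does: start the command and write an event to its stdin. The sketch below is a rough test harness under that assumption. The send_event function, ECHO_COMMAND, and the sample event fields are all hypothetical, and the stand-in command only echoes back what it receives.

```python
import json
import subprocess
import sys


def send_event(command, event, timeout=30):
    """Run command, write one JSON-encoded event to its stdin, return stdout."""
    result = subprocess.run(
        command,
        input=json.dumps(event) + "\n",
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # exchange text rather than bytes
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Stand-in for the real script: print the event_id of the event it receives.
ECHO_COMMAND = [
    sys.executable,
    "-c",
    "import json, sys; print(json.loads(sys.stdin.readline())['event_id'])",
]
```

Swapping ECHO_COMMAND for the command from the pipeline config would run the real script instead, which presumably needs a reachable Neo4j instance.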

Installing Sysmon

We'll be using Sysmon to log various events on our Windows host(s) and Winlogbeat to ship those events to our Logstash instance. Go ahead and download Sysmon here. You can extract the contents of the zip to a location of your choice (I chose C:\Tools\Sysmon). You can install the service using the command:

sysmon.exe -i -accepteula

However, that will start Sysmon using a default configuration. For more robust logging, you'll want to include a configuration file. These config files follow a format based on XML which you can read more about here. For starters though, I'd recommend using Roberto's baseline configuration file that he's made available on his GitHub. Just save the file with the .xml extension and run the following command to update your Sysmon configuration.

sysmon.exe -c <filename>.xml

If you take a look in the Windows event viewer you should start to see Sysmon logs rolling in.

Installing Winlogbeat

Now that Sysmon is logging lots of wonderful data for us, we need to ship those logs to our Logstash instance so they can be forwarded on to Neo4j. Here's where Winlogbeat comes in. Go ahead and download Winlogbeat here and unzip the file into your C:\Program Files folder. Open PowerShell, navigate to the Winlogbeat folder, and run the install-winlogbeat-service.ps1 script.

.\install-winlogbeat-service.ps1

We'll then need to edit a couple of areas of the winlogbeat.yml file to ship logs to Logstash appropriately.

In the first section titled "Winlogbeat specific options" we'll add the line:

- name: Microsoft-Windows-Sysmon/Operational

Then go ahead and comment out the output.elasticsearch section and all of its subfields under the "Outputs" section. Modify the output.logstash section to look something like this where <Logstash IP> is the IP address of the system you installed Logstash on:

output.logstash:
   # The Logstash hosts
   hosts: ["<Logstash IP>:5043"]

   # Optional SSL. By default is off.
   # List of root certificates for HTTPS server verifications
   #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

   # Certificate for SSL client authentication
   #ssl.certificate: "/etc/pki/client/cert.pem"

   # Client Certificate Key
   #ssl.key: "/etc/pki/client/cert.key"

Note: If you chose to secure your Logstash interaction with SSL, you'll need to modify those settings here.

Start all the things!

Alright, we've installed all of the necessary components, let's start some services! On the Ubuntu system where you've installed Logstash and Neo4j you can just use the systemctl start command to launch the two services.

sudo systemctl start logstash

sudo systemctl start neo4j

Now if you fire up your web browser and navigate to http://localhost:7474 you should be prompted to log in to your Neo4j browser. The default username and password are neo4j:neo4j. After logging in you should be prompted to change your password. The new password is what you'll use to modify your sysmon2neo4j.py script.

Back on your Windows host, you should be able to start your Winlogbeat service via PowerShell using the command:

Start-Service winlogbeat

If everything goes well, data should now be flowing to Neo4j as planned.
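If data does not show up, a quick first check is whether the listening ports are reachable from the machine doing the sending. The sketch below tries a TCP connection to each port used in this post (5043 for the Beats input, 7474 for the Neo4j browser, 7687 for Bolt). The helper names are hypothetical, and an open port only shows that something is listening, not that events are being processed.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out.
        return False


# Ports used in this post: Beats input, Neo4j browser, Neo4j Bolt.
PORTS = {"logstash beats": 5043, "neo4j http": 7474, "neo4j bolt": 7687}


def check_ports(host, ports=PORTS):
    """Map each service name to whether its port accepted a connection."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

Calling check_ports with the IP address of the Logstash and Neo4j host returns a dictionary of service names and True/False values.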

Example Query

While I plan on covering some specific use cases in my next post, I wanted to provide at least a quick example of how you can query Neo4j and visualize your Sysmon data. In the Neo4j browser, I entered the query:

match (p:Process)-[]-(d:Destination) match (p)-[]-(i:Image) return p, d, i

Neo4j uses the Cypher query language. This query matches any pattern where a process has a connection to a network destination and returns the process, image used by that process, and the network destination. In the browser, the response is displayed as a graph of the matching nodes and their relationships.
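Outside the browser, the same query could also be run from Python. This is only a sketch: it assumes the official neo4j driver package is installed (which may not be the library sysmon2neo4j itself uses), that Neo4j is listening on the default localhost:7687, and that the Process, Destination, and Image labels exist as in the query above. The run_query function is a hypothetical name.

```python
QUERY = (
    "MATCH (p:Process)-[]-(d:Destination) "
    "MATCH (p)-[]-(i:Image) "
    "RETURN p, d, i"
)


def run_query(password, uri="bolt://localhost:7687", user="neo4j"):
    """Run QUERY and return each result row as a dict.

    Assumes the official neo4j driver is installed (pip install neo4j).
    """
    from neo4j import GraphDatabase  # third-party package, imported lazily

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            # Consume the result while the session is still open.
            return [record.data() for record in session.run(QUERY)]
    finally:
        driver.close()
```

Each returned dict is keyed by the names in the RETURN clause (p, d, and i).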

Stay tuned for future posts detailing a few use cases for identifying attacker behavior.