
Getting Started with AWS VPC flow logs in FlowTraq

By Alex Barsamian | June 10, 2016



We make pretty extensive use of Amazon Web Services at FlowTraq. We host a significant percentage of both our public cloud and private cloud offerings in EC2, and prospective customers are delighted to hear that we will support them with optimized configurations, best practices, and more if they want to run their own FlowTraq instance in Amazon’s data centers.

Until recently, however, network visibility in AWS was a bit of a blind spot. Last year, though, AWS introduced the ability to generate CloudWatch logs of network flows to and from your EC2 hosts.

Unfortunately, the logs are not in IPFIX, the de facto standard for modern network flow reporting, or in any other common format, but in a proprietary one. So feeding AWS “netflow” directly into an off-the-shelf flow analyzer is a non-starter.

Fortunately, AWS also offers a clever feature called Lambda, which lets you run a snippet of code in response to an event (such as the generation of a CloudWatch log). The code runs on the AWS infrastructure, but not on a particular EC2 host, and you have your choice of a JavaScript, Java, or Python runtime environment. You pay by the request and by the millisecond.

So we can architect a fairly simple solution: write a Lambda script that responds to the generation of a VPC Flow log and puts the log information on the wire, addressed to a destination IP and TCP port of our choosing. On that destination we run a listener that grabs the information, repackages/converts it to IPFIX, and forwards it to FlowTraq for analysis.

[Diagram: AWS VPC flow logs into FlowTraq]
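
In other words, the data takes the following path:

VPC flow log -> CloudWatch log event -> Lambda script -> TCP -> listener -> ftsq (IPFIX) -> FlowTraq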

As far as we know, what follows is the best and easiest way to get AWS EC2 CloudWatch logs converted to IPFIX for NetFlow analysis, in FlowTraq or otherwise!

Prerequisites

If you haven’t already, set up an AWS CloudWatch Flow log IAM role and a log stream for the virtual interface you want to monitor, per the AWS VPC Flow Logs User Guide.
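
For reference, creating a flow log from the command line looks something like the following; the ENI, log group name, and role ARN here are placeholders for your own values:

aws ec2 create-flow-logs \
    --resource-type NetworkInterface \
    --resource-ids eni-aa11bb22 \
    --traffic-type ALL \
    --log-group-name my-vpc-flow-logs \
    --deliver-logs-permission-arn arn:aws:iam::123456789012:role/flowlogsRole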

The Lambda script

The Lambda script is pretty simple:

var zlib = require('zlib');
exports.handler = function(event, context) {
    // CloudWatch Logs delivers the log events base64-encoded and gzipped
    var payload = new Buffer(event.awslogs.data, 'base64');
    zlib.gunzip(payload, function(e, result) {
        if (e) {
            context.fail(e);
        } else {
            // The host and port our listener is waiting on
            var host = '10.0.1.8';
            var port = 20555;

            var message = new Buffer(result.toString('utf8'));

            var net = require('net');

            // Open a TCP connection, write the decompressed JSON, and hang up
            var client = new net.Socket();
            client.ref();
            client.connect(port, host, function() {
                client.write(message, function() {
                    client.end();
                    context.succeed("Success");
                });
            });
        }
    });
};
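
For what it’s worth, the handler expects an event shaped the way CloudWatch Logs delivers it, namely a gzipped, base64-encoded payload under awslogs.data:

{
    "awslogs": {
        "data": "<base64-encoded, gzipped log batch>"
    }
}

An event of this shape (with real gzipped data) also works for testing the function from the Lambda console.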

There are two configuration variables to consider, namely the IP and destination port to send the logs to. Substitute the IP of the host you plan to run the listener on (which may be, but need not be, the same as your FlowTraq server, and which in any event must have a public IP) and a port that your firewall will let external IPs connect to.

Create a Lambda function and paste that bit of JavaScript into it.
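
Once the function is created, subscribe it to the flow log group so CloudWatch invokes it for each batch of log events. One way is a subscription filter from the CLI; the log group name, filter name, and function ARN below are placeholders, and CloudWatch Logs must already have permission to invoke the function:

aws logs put-subscription-filter \
    --log-group-name my-vpc-flow-logs \
    --filter-name forward-to-lambda \
    --filter-pattern "" \
    --destination-arn arn:aws:lambda:us-east-1:123456789012:function:vpcFlowLogForwarder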

The listener

As emitted by our Lambda script, the log lines arrive with some JSON gift wrap around them; the lines themselves look like this:

version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status
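
A concrete record looks something like this (the values are illustrative):

2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK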

On the other hand, FlowTraq’s IPFIX exporter expects inputs as follows:

client ip, port, server ip, port, protocol, client packets, client bytes, server packets, server bytes, start time, end time, exporter ip, application name
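
So the sample record above, rearranged accordingly (with the server-to-client counters zeroed, the application set to "unknown", and assuming the default exporter IP of 127.0.0.1), would come out as:

172.31.16.139,20641,172.31.16.21,22,6,20,4249,0,0,1418530010,1418530070,127.0.0.1,"unknown"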

Python’s good for many things, but it’s great for 1) creating a lightweight TCP server, 2) parsing JSON, and 3) manipulating strings.

These three tasks turn out to be the bulk of what’s required of a listener, and we can accomplish our goal in just a few dozen lines of code.

import json
import socket
import sys

# CONFIGURATION FIELDS
TCP_LISTEN_IP = "10.0.1.8"
TCP_LISTEN_PORT = 20555

EXPORT_AS_IP = "127.0.0.1"
# STOP EDITING

def printLogEvents(jsonFormat):
    data = json.loads(jsonFormat)

    logEvents = data['logEvents']

    for logEvent in logEvents:
        message = logEvent['message']
        messageSplit = message.split()

        # Rearrange the fields from the VPC flow log format to ftsq's input
        # format, setting the <- bytes/packets to 0, the application to
        # "unknown", and the exporter IP to the IP configured above
        ftsqFields = [
            messageSplit[3],   # srcaddr  -> client ip
            messageSplit[5],   # srcport  -> client port
            messageSplit[4],   # dstaddr  -> server ip
            messageSplit[6],   # dstport  -> server port
            messageSplit[7],   # protocol
            messageSplit[8],   # packets  -> client packets
            messageSplit[9],   # bytes    -> client bytes
            "0",               # server packets (not reported by VPC logs)
            "0",               # server bytes (not reported by VPC logs)
            messageSplit[10],  # start time
            messageSplit[11],  # end time
            EXPORT_AS_IP,      # exporter ip
            "\"unknown\"",     # application name
        ]

        print ",".join(ftsqFields)

    # stdout is piped into ftsq; flush so flows are forwarded promptly
    sys.stdout.flush()

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Allow quick restarts without "Address already in use" errors
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

sock.bind((TCP_LISTEN_IP, TCP_LISTEN_PORT))

sock.listen(1)

while True:
    connection, client_address = sock.accept()

    jsonFormat = ""

    try:
        # Read until the Lambda script closes the connection
        while True:
            chunk = connection.recv(1024)

            if chunk:
                jsonFormat = jsonFormat + chunk
            else:
                break

    finally:
        connection.close()
        printLogEvents(jsonFormat)

This time there are three configuration variables to consider. As before, there are an IP and a TCP port to listen on, which ought to line up with the values in the Lambda script. In addition, you must specify the IP to use as the exporter IP, which is how the flows will appear in FlowTraq. If you want to monitor multiple EC2 hosts or VPCs, use a different exporter IP for each so you can tell them apart.
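
To sanity-check the listener before wiring up AWS, you can hand-feed it a fake payload from another terminal. Here's a minimal sketch, assuming the listener is up on 10.0.1.8:20555 and reusing the sample record from above; the listener should print one comma-separated line matching the converted example shown earlier:

import json
import socket

# One fake CloudWatch payload wrapping a single VPC flow log record
payload = json.dumps({"logEvents": [{"message":
    "2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"}]})

s = socket.create_connection(("10.0.1.8", 20555))
s.sendall(payload)
s.close()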

Putting it all together

Set up the VPC flow log stream and the Lambda script, and subscribe the Lambda script to the flow log event source, as above. Then, on your listener host, invoke the listener script and pipe its output to the "ftsq" command. The parameters that put ftsq in exporter mode are:

./ftsq -read -ipfix collector-ip port

e.g.

./ftsq -read -ipfix 127.0.0.1 4739

Therefore, use the long-lived command

python AWSVPCFlowlogListener.py | ./ftsq -read -ipfix 127.0.0.1 4739

to get the VPC flow logs into FlowTraq.

The next time CloudWatch generates a batch of flow log records, you should see a new exporter appear in FlowTraq with those flows. Et voilà!

[Screenshot: a new exporter with VPC flows appearing in FlowTraq]