
Author: Alex Barsamian

Deep network visibility and security in AWS VPC

By Alex Barsamian | May 9, 2018



AWS VPC is (nearly) a wonder of the world

Most networking in the AWS VPC cloud is familiar and intuitive to network engineering veterans. And if you think about it, that’s kind of a miracle. All you have to do is pick a region, declare a VPC, and suddenly you have an internal IP space, Elastic IPs for communicating with the world, and virtual routers, firewalls, NAT devices, and more.

All these abstractions behave uncannily like the brick-and-mortar hardware they’re modeling, making them easy to reason about and design with. And interaction with them is neatly packaged up as APIs, command-line tools, and a mostly-decent user interface, enabling network engineers to forget many of the arcane details they used to have to worry about (“how do I configure a VLAN on a Nexus 5000?”).

The black boxes are so nice and shiny that it’s easy to forget all the engineering under the hood. The trouble with black boxes, though, is that sometimes what’s inside surprises you. And sometimes you need a little more control than the knobs, switches, and ports on the outside of the box.

This is especially true in network security and visibility, where 1) your threat detection platforms and analysis tools absolutely need real-time, forensically-accurate network data, and 2) your analysts need access to first-class tools across the whole network, especially the stuff in the cloud.

The state of flow monitoring in AWS

Recognizing those two needs, about a year ago I released a free AWS Lambda tool to convert and forward CloudWatch flow logs to NetFlow v5. My honest hope was that people would use it to forward their VPC traffic from AWS to their favorite flow analysis platform and use that to do some good in the world (and perhaps that some of those might discover FlowTraq and find a new favorite).

Based on customer feedback, the biggest issue with this approach is the reliance on CloudWatch logging. CloudWatch flow logging is an AWS abstraction with some serious limitations, the biggest being that AWS only promises flow logs every ten minutes or so, and even then the promise isn’t a strong one.

Ten minutes’ lag time before detecting a DDoS attack or other security incident isn’t that far from not detecting it at all. It became clear we needed to take another approach to the AWS VPC security problem.

(Incidentally, while Google was late to the party, having just rolled out their version of VPC flow logs a few weeks ago, they really showed Amazon up here, with flow updates every five seconds. I plan to discuss their offering in a future post.)

When the built-in options aren’t enough

If you’re committed to the AWS ecosystem, where do you go from here? Well, in a traditional network environment, if your existing hardware doesn’t support flow generation, you have a couple options:

  • If you can get a network tap in there, generate flow off of that.
  • If you can’t, put a device in-line and generate flow off of that. The simpler the device, the better; the best case would be a simple layer 2 bridge doing Proxy ARP.

Unfortunately, there’s no abstraction for a tap in the AWS VPC. So the first option is out.

As for the second option: it turns out you can’t build a working bridge in AWS, either. That marvelous simplicity-in-complexity of AWS I was referring to earlier? It means that some networking concepts don’t translate. In this case, the concept that isn’t a perfect match for on-premises networking is ARP.

In an AWS VPC, when host “A” ARPs for host “B”, the response doesn’t come from “B”. In fact, the initial request never even arrives at “B”; it’s captured and handled by something called the “AWS mapping service”. Simply put, there isn’t a traditional broadcast domain in AWS at all. No, not even between machines on the same VPC subnet!

(Want to learn more about the mapping service and other AWS engineering marvels? Check out Eric Brandwine’s talk “A Day in the Life of a Billion Packets” at AWS re:Invent 2013. It’s fascinating stuff!)

What this means is, to get real-time, forensically accurate, reliable flow out of AWS VPCs we actually need to roll our own router instance. Yes, really. It’s not as bad as it sounds, and it will start paying dividends in networking security right away.

(Diagram credit: https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_NAT_Instance.html)


Fill out the form below and we’ll give you access to the detailed How-To Document:

Getting Started with AWS VPC flow logs in FlowTraq

By Alex Barsamian | June 10, 2016



We make pretty extensive use of Amazon Web Services at FlowTraq. We host a significant percentage of both our public cloud and private cloud offerings in EC2, and prospective customers are delighted to hear that we will support them with optimized configurations, best practices, and more if they want to run their own FlowTraq instance in Amazon’s data centers.

Until recently, however, network visibility in AWS was a bit of a blind spot. But last year, AWS offered the ability to generate CloudWatch logs of network flows to and from your EC2 hosts.

Unfortunately, the logs are not in IPFIX format, the de facto standard of modern network flow reporting, or any other common format, but a proprietary one. So feeding AWS “netflow” directly into an off-the-shelf flow analyzer is a non-starter.

Fortunately, AWS also offers a clever feature called Lambda, which lets you run a snippet of code in response to an event (such as the generation of a CloudWatch log). The code runs on the AWS infrastructure, but not on a particular EC2 host, and you have your choice of a JavaScript, Java, or Python runtime environment. You pay by the request and by the millisecond.

So we can architect a fairly simple solution: write a Lambda script that responds to the generation of a VPC Flow log and puts the log information on the wire, addressed to a destination IP and TCP port of our choosing. On that destination we run a listener that grabs the information, repackages/converts it to IPFIX, and forwards it to FlowTraq for analysis.

AWS VPC flow logs into FlowTraq

As far as we know, what follows is the best and easiest way to get AWS EC2 CloudWatch logs converted to IPFIX for NetFlow analysis, in FlowTraq or otherwise!

Prerequisites

If you haven’t already, set up an AWS CloudWatch Flow log IAM role and a log stream for the virtual interface you want to monitor, per the AWS VPC Flow Logs User Guide.

The Lambda script

The Lambda script is pretty simple:

var zlib = require('zlib');
var net = require('net');

exports.handler = function(event, context) {
    // CloudWatch Logs delivers the payload base64-encoded and gzip-compressed
    var payload = Buffer.from(event.awslogs.data, 'base64');
    zlib.gunzip(payload, function(e, result) {
        if (e) {
            context.fail(e);
        } else {
            var host = '10.0.1.8';  // listener IP
            var port = 20555;       // listener TCP port

            // Forward the decompressed log data to the listener over TCP
            var client = new net.Socket();
            client.connect(port, host, function() {
                client.write(result, function() {
                    client.end();
                    context.succeed("Success");
                });
            });
        }
    });
};

There are two configuration variables to consider, namely the IP and destination port to send the logs to. Substitute the IP of the host you plan to run the listener on (which may be, but need not be, the same as your FlowTraq server, but in any event must have a public IP) and a port that your firewall will let external IPs connect to.

Create a Lambda function and paste that bit of JavaScript into it.
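Under the hood, the event CloudWatch Logs hands to the function arrives base64-encoded and gzip-compressed, which is why the script decodes and gunzips before forwarding. Here's a minimal Python sketch of that decode step; the sample message below is illustrative, not real AWS output:

```python
import base64
import gzip
import json

# Simulate the payload shape CloudWatch Logs delivers to a subscribed Lambda:
# {"awslogs": {"data": base64(gzip(json))}}
raw = json.dumps({"logEvents": [
    {"message": "2 123456789010 eni-abc123de 10.0.0.5 10.0.0.8 443 53211 6 10 840 1418530010 1418530070 ACCEPT OK"}
]})
event = {"awslogs": {"data": base64.b64encode(gzip.compress(raw.encode("utf-8"))).decode("ascii")}}

# The decode step, mirroring what the Lambda's gunzip callback recovers:
decoded = gzip.decompress(base64.b64decode(event["awslogs"]["data"])).decode("utf-8")
messages = [e["message"] for e in json.loads(decoded)["logEvents"]]
print(messages[0].split()[3])  # the record's source address: 10.0.0.5
```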

The listener

As they’re emitted by our Lambda script, the log lines come wrapped in a bit of JSON; the lines themselves look like this:

version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status

On the other hand, FlowTraq’s IPFIX exporter expects inputs as follows:

client ip, port, server ip, port, protocol, client packets, client bytes, server packets, server bytes, start time, end time, exporter ip, application name
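To make the mapping between the two formats concrete, here's a quick worked example in Python, using a sample record of the kind shown in Amazon's flow log documentation; the exporter IP and application name at the end are the configured placeholder values:

```python
# One VPC flow log message (fields are space-separated, per the format above):
msg = "2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"
f = msg.split()

# Rearranged into FlowTraq's ftsq input order; the VPC log doesn't report
# server-side packet/byte counters separately, so those are zeroed:
ftsq = ",".join([f[3], f[5], f[4], f[6], f[7], f[8], f[9],
                 "0", "0", f[10], f[11], "127.0.0.1", '"unknown"'])
print(ftsq)
# 172.31.16.139,20641,172.31.16.21,22,6,20,4249,0,0,1418530010,1418530070,127.0.0.1,"unknown"
```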

Python’s good for many things, but it’s great for 1) creating a lightweight TCP server, 2) parsing JSON, and 3) manipulating strings.

These three tasks turn out to be the bulk of what’s required of a listener, and we can accomplish our goal in just a few dozen lines of code.

import json
import socket

# CONFIGURATION FIELDS
TCP_LISTEN_IP = "10.0.1.8"
TCP_LISTEN_PORT = 20555

EXPORT_AS_IP = "127.0.0.1"
# STOP EDITING

def printLogEvents(jsonFormat):
    data = json.loads(jsonFormat)

    for logEvent in data['logEvents']:
        fields = logEvent['message'].split()

        # Rearrange the fields from the VPC's log format to ftsq's input
        # format, setting the server-side bytes/packets to 0, the application
        # to "unknown", and the exporter IP to the IP configured above.
        ftsqFormat = ",".join([
            fields[3],   # srcaddr  -> client ip
            fields[5],   # srcport  -> client port
            fields[4],   # dstaddr  -> server ip
            fields[6],   # dstport  -> server port
            fields[7],   # protocol
            fields[8],   # packets  -> client packets
            fields[9],   # bytes    -> client bytes
            "0",         # server packets (not reported)
            "0",         # server bytes (not reported)
            fields[10],  # start time
            fields[11],  # end time
            EXPORT_AS_IP,
            "\"unknown\"",
        ])

        print(ftsqFormat)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind((TCP_LISTEN_IP, TCP_LISTEN_PORT))
sock.listen(1)

while True:
    connection, client_address = sock.accept()

    chunks = []
    try:
        # Read until the Lambda closes the connection
        while True:
            chunk = connection.recv(1024)
            if not chunk:
                break
            chunks.append(chunk)
    finally:
        connection.close()
        printLogEvents(b"".join(chunks).decode('utf-8'))

This time there are three configuration variables to consider. As before, an IP and TCP port to listen on, which ought to line up with the Lambda script. In addition, you must specify the IP to use as the exporter IP, which is how the flows will appear in FlowTraq. If you want to monitor multiple EC2 hosts or VPCs, you should use different exporter IPs to distinguish them.

Putting it all together

Set up the VPC flow log stream and the Lambda script, and subscribe the Lambda function to the flow log event source. Then, on your listener host, invoke the listener script and pipe the output to the "ftsq" command. The parameters to put ftsq in exporter mode are:

./ftsq -read -ipfix collector-ip port

e.g.

./ftsq -read -ipfix 127.0.0.1 4739

Therefore, use the long-lived command

python AWSVPCFlowlogListener.py | ./ftsq -read -ipfix 127.0.0.1 4739

to get the VPC Flow logs into FlowTraq.

The next time CloudWatch generates a batch of flow log files, you should see a new exporter appear in FlowTraq with those flows. Et voilà!


FlowTraq Q4/13 Released

By Alex Barsamian | November 13, 2013



FlowTraq Q4/13 Released & Ready for Download

The FlowTraq team is proud to announce the availability of the Q4/13 release of FlowTraq. This release provides new features aimed at improving your ability to perform network analysis to secure and manage complex networks: (more…)

Flow-Based Behavioral Fingerprinting FloCon Presentation

By Alex Barsamian | July 15, 2013



FloCon 2013 Presentation by FlowTraq

As a way to demonstrate the powerful algorithms developed for FlowTraq, Alex Barsamian was selected to share this presentation at FloCon 2013. The presentation focused on identifying network users using flow-based behavioral fingerprinting. During the presentation, Alex discussed some of the features we extract from flow data to create unique user fingerprints, and how well these algorithms fare at identifying users on a 500-user network. (more…)

You are invited to join the FlowTraq Cloud!

By Alex Barsamian | June 6, 2013



Announcing an industry first: a new way to harness the power of FlowTraq’s flow analytics with the FlowTraq Cloud.

We are excited to announce the availability of the SaaS (software-as-a-service) edition of FlowTraq at cloud.flowtraq.com. With hosted FlowTraq, you no longer have to worry about hardware provisioning, software setup, licensing, or system administration; just sign up, start sending your flows to the cloud, and begin analyzing, sorting, searching, and viewing your network traffic instantly. Access your flow information from anywhere. (more…)

FlowTraq Q3/12 Released

By Alex Barsamian | July 11, 2012



FlowTraq Q3/12 Released

FlowTraq Q3/12 is about to receive the QA department’s stamp of approval and will be ready for download shortly. This new version of FlowTraq packs a number of new features you’ll want to explore:

  • The Dashboard now supports several layout options, allowing for up to four columns of widgets.
  • FlowTraq now supports a new flow input protocol: IPFIX over TCP. (more…)

FlowTraq Q2/12 Ready to Download

By Alex Barsamian | April 13, 2012



FlowTraq Q2/12 Released

The development team is happy to announce that the Q2/12 release of FlowTraq is ready to download. This new version of FlowTraq packs in a number of new features you’ll want to explore: (more…)

FlowTraq Q2/12 Preview

By Alex Barsamian | April 6, 2012



The development team is happy to pass along word that the Q2/12 quarterly release of FlowTraq is nearly complete.

We’ve been busy squashing bugs, adding features, and stress-testing FlowTraq, so it’s hard to say what we’re most excited about in this new release, but we think the feature that will get the most notice is SIEM integration. FlowTraq Q2/12 can now send updates on any alertable condition straight to Splunk™, ArcSight™, or your favorite SIEM via syslog/udp.

If you’re a current FlowTraq user or partner, we warmly invite you to download FlowTraq Q2/12 Preview; just let us know if you want to try it.

FlowTraq Q4/11 Released

By Alex Barsamian | December 23, 2011



We are pleased to announce that FlowTraq Q4/11 has been released.

To recap from the beta announcement: we’ve enjoyed a fast-growing customer base this quarter, and you’ve provided us with a lot of excellent feedback which we’ve used to make FlowTraq even better than it was just a few months ago.

Among Q4’s new features:

  • Email notification (SMTP) for Alerts, Scheduled Reports, and System Messages.
  • Significantly more intuitive filtering (more info here).
  • Experimental feature: Interactive connection graphs, which let you explore the relationship between entities in pairwise views.
  • Bug fixes, UI streamlining, and other refinement.

As always, we are grateful to our users for their feedback, and wish everyone the happiest of holidays.

~The FlowTraq Team

FlowTraq Q4/11 Beta

By Alex Barsamian | December 16, 2011



We are pleased to announce that FlowTraq Q4/11 has entered beta testing. We've enjoyed a fast-growing customer base this quarter, and you've provided us with a lot of excellent feedback which we've used to make FlowTraq even better than it was just a few months ago.

Among Q4's new features:

  • Email notification (SMTP) for Alerts, Scheduled Reports, and System Messages.
  • Interactive connection graphs, which let you explore the relationship between entities in pairwise views.
  • Significantly more intuitive filtering (more info here). (more…)
