
Working With VPC Flow Logs

By Chris Brenton | August 19, 2017



In this article, I’ll perform a deep dive into Amazon’s Virtual Private Cloud (VPC) flow logs. I’ll discuss how to interpret the data they produce, as well as how to find security-related incidents. Finally, I’ll provide some advice on automating the review process.

Flow Log Setup

There is already some great advice out there on how to configure your AWS environment to generate flow log data. The Amazon User Guide is a good place to start, and Alex Barsamian has an excellent article covering configuration as well as importing flow logs into FlowTraq. Rather than repeating those steps, I’ll jump right into analyzing the data.

Reading VPC Flow Logs

Flow logs can be created for specific system interfaces, or for entire VPCs or subnets. They provide a level of detail similar to NetFlow or IPFIX-compatible systems, although Amazon does not precisely follow either of these standards.

Figure 1: Sample Flow Log data

Figure 1 shows an example of some flow log data. Each row represents a unique communication flow that was recorded. From left to right, here is a description of each field; a short parsing sketch follows the list. Note that the fields are space-separated.

  • Timestamp in UTC of when this flow log entry was opened
  • VPC flow log version number
  • AWS account ID associated with the logs (blacked out in this example except for the last three digits)
  • The network interface ID of the instance associated with the traffic
  • The source IP address
  • The destination IP address
  • Source port when applicable (UDP and TCP traffic)
  • Destination port when applicable (UDP and TCP traffic)
  • Protocol being used (1=ICMP, 6=TCP, 17=UDP, etc.)
  • The number of packets transmitted in this flow
  • The number of bytes transmitted in this flow (not including frame information)
  • The time the first packet was seen, expressed in Unix time
  • The time the last packet finished transmitting expressed in Unix time
  • The action taken on the packet, based on security group or ACL settings
  • Status of attempting to record this flow (used for troubleshooting)
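
To make the list concrete, here is a minimal parsing sketch in Python. It assumes the raw 14-field version 2 record (the leading timestamp shown in the console is display metadata rather than part of the record itself), and the sample record is reconstructed from the walkthrough below with a made-up account ID prefix:

```python
from collections import namedtuple

# Field names follow the version 2 flow log record layout described above.
FIELDS = ("version account_id interface_id srcaddr dstaddr srcport dstport "
          "protocol packets bytes start end action log_status")
FlowRecord = namedtuple("FlowRecord", FIELDS)

def parse_flow_record(line):
    """Split one space-separated flow log record into named fields."""
    return FlowRecord(*line.split())

# Sample record reconstructed from the walkthrough; the account ID prefix is made up.
record = parse_flow_record(
    "2 123456789315 eni-768d1092 39.88.118.15 172.31.8.226 "
    "42608 23 6 2 80 1494594603 1494594657 REJECT OK"
)
print(record.dstaddr, record.dstport, record.action)  # 172.31.8.226 23 REJECT
```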

Interpreting VPC Flow Logs

Let’s walk through an example by looking at the first row of data in Figure 1. “13:10.03” tells us the log entry was created at approximately 1:10 PM and 3 seconds UTC (the digits after the period are the seconds). The current flow log version being used is “2,” and “315” are the last three digits of the AWS account used in this example (the prior digits are blacked out). The next field, “eni-768d1092,” is the network interface ID of the system where these packets were sent or received. To see which system uses this interface, log in to your AWS account and go to the EC2 dashboard. In the left-hand menu, select “Network Interfaces” and search for the network interface identified in the log entry.

The next field, “39.88.118.15,” is the source IP address that is sending the traffic. In this case an IPv4 address is displayed; IPv6 addresses are also supported. Note that flow logs do not clearly identify inbound and outbound traffic direction. I need to recognize that this is not one of my own IP addresses in order to determine that this traffic is headed inbound from the Internet. The next field, “172.31.8.226,” is the IP address of the receiving system. Since this is a private IP address, this must be our cloud instance.

The next value, “42608,” identifies the source port being used by the transmitting system. The next value, “23,” identifies the port expected to receive this traffic. TCP/23 is Telnet traffic. If I do not recognize the source IP address, this is most likely someone probing to see if they can find a Telnet server to exploit. The next value, “6,” confirms that this is a TCP packet.

Our next two fields are where things get interesting. We see that this flow included “2” packets, for a total of “80” bytes. Let’s assume the source IP transmitted a packet, and when no response was received, the packet was retransmitted. This would mean both packets were nearly identical, so it is safe to assume that each packet was 40 bytes in size (80/2=40). Both the IP and TCP headers are 20 bytes in size when no options are set. While it is normal for a system to not set any IP options when transmitting a packet, it is unheard of for a modern operating system to not use one or more TCP options. When TCP options are set, the TCP header grows in size. Since these packets are exactly 40 bytes, no TCP options can be set, which means the traffic most likely originated from some kind of security tool like a port scanner. So if we don’t recognize the source IP, and that system is running a port scanner, it is most likely hostile.
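
Here is that arithmetic as a quick sanity check you can run against any flow. The function name and the exact-match heuristic are mine, purely for illustration:

```python
# With no options set, the IP header (20 bytes) plus the TCP header (20 bytes)
# total exactly 40 bytes, so an average packet size of 40 suggests a crafted
# packet such as a scanner's bare SYN.
def looks_like_bare_probe(packets, byte_count):
    return packets > 0 and byte_count / packets == 40

print(looks_like_bare_probe(2, 80))  # True: matches the flow discussed above
```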

The next two fields are the start and stop time of the network traffic, expressed in Unix time: the number of seconds that have elapsed since January 1st, 1970. If you need to convert these values to normal date/time stamps, there are online converters, or you can write your own. For example, the start time translates to May 12th, 2017 at 1:10 PM and 3 seconds UTC, and the traffic stopped at May 12th, 2017 at 1:10 PM and 57 seconds UTC.
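
In Python, the conversion is a one-liner. The start and end values below are chosen to reproduce the stamps in this walkthrough:

```python
from datetime import datetime, timezone

start, end = 1494594603, 1494594657  # Unix time, as recorded in the flow entry
print(datetime.fromtimestamp(start, tz=timezone.utc))  # 2017-05-12 13:10:03+00:00
print(datetime.fromtimestamp(end, tz=timezone.utc))    # 2017-05-12 13:10:57+00:00
```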

The next field, “REJECT,” tells us how the packet was processed. You can use Amazon security groups and network ACLs to define which traffic patterns you wish to permit through. Which packets are permitted will be based on an intersection of these two security features. In other words, the traffic pattern must be permitted by both in order to reach your instance.

It is worth noting that Amazon does not use “REJECT” in the classic firewall sense. With firewalls, you can typically apply one of three actions on a packet stream:

  • Permit = Let the traffic through
  • Drop = Quietly remove the packet from the network
  • Reject = Remove the packet from the network and return an administratively prohibited error

When Amazon describes a flow as being rejected, what they actually mean is that the packets have been dropped. This is important when you are troubleshooting inbound connection failures, as blocked packets will not return an error code. It is also important if you are controlling outbound sessions, as blocked traffic patterns will take longer to time out, and thus use more CPU and memory.

Flow Log Caveats

Flow logs do not record every communication flow. Amazon specifically ignores certain traffic patterns. These include:

  • Traffic going to and from Amazon DNS servers
  • Traffic associated with Windows license activation
  • DHCP traffic
  • Traffic exchanged with the default VPC router

The IP address in a flow log entry may not reflect the IP address in the actual packet. For example, in the first row in Figure 1, the destination IP address is recorded as “172.31.8.226.” Since this is a private address, it is not possible for the Internet-based source IP in this entry to send traffic to this private address and have it arrive at our instance. Our instance must therefore have an associated public IP address, and that public address is the real destination recorded in the packet. Flow logs, however, convert all entries to the primary private IP address.

This is true for traffic in both directions. Note that in row five the source IP address is listed as “172.31.8.226.” It would be impossible for a host on the Internet to respond to this private address, so the actual packet must have carried a publicly routable IP address. Again, we are simply being shown the primary private IP address associated with this interface.
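
A quick way to tell which side of a flow is your instance is to test whether an address is private. A minimal sketch using Python’s standard ipaddress module:

```python
import ipaddress

# An RFC 1918 private address in a flow entry is almost certainly our
# instance's side of the conversation; a public address is the remote side.
for addr in ("39.88.118.15", "172.31.8.226"):
    print(addr, ipaddress.ip_address(addr).is_private)
# 39.88.118.15 False  (Internet host)
# 172.31.8.226 True   (our instance's primary private address)
```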

Flows Versus Sessions

A session typically refers to one complete information exchange between two systems. For example, a “TCP Session” would include all packets in both directions from the initial SYN (connection establishment) to the final FIN/ACK (connection teardown).

A flow is a subset of a session and describes some number of packets moving in one direction.

For example, refer back to Figure 1 and look at row seven. Here we see 6 packets moving from source port TCP/24901 on 73.47.160.75 to destination port TCP/80 on 172.31.8.226. This would be described as a single flow. If you jump to row ten, you’ll see 5 packets moving from source port TCP/80 on 172.31.8.226 to destination port TCP/24901 on 73.47.160.75. Combine these two flows together, and they describe a single session between the two systems.

If you keep looking through the data, you may notice that in row 13 there is another flow between these two systems. While you may initially think that this flow is also part of the above session, note the source and destination ports being used. In row 13, we see 5 packets moving from source port TCP/80 to destination port TCP/24925. Since at least one of the TCP ports has changed, this flow is part of a different session. Its complementary flow, elsewhere in the data, shows 6 packets moving from source port TCP/24925 on 73.47.160.75 to destination port TCP/80 on 172.31.8.226; that flow is also part of this second session.

When you are looking at a single flow entry, you are only looking at the packets that move in one direction. You need to find the complementary flow to account for all of the packets used in the session.

With a full session, you need to look at both the IP addresses as well as the ports being used to communicate.
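
In code, pairing complementary flows comes down to building a direction-independent session key. A minimal sketch using the flows discussed above (a real key would also include the protocol; it is omitted here for brevity):

```python
from collections import defaultdict

# Flows from the rows above: (src_ip, src_port, dst_ip, dst_port, packets).
flows = [
    ("73.47.160.75", 24901, "172.31.8.226", 80, 6),
    ("172.31.8.226", 80, "73.47.160.75", 24901, 5),
    ("172.31.8.226", 80, "73.47.160.75", 24925, 5),
    ("73.47.160.75", 24925, "172.31.8.226", 80, 6),
]

sessions = defaultdict(list)
for src, sport, dst, dport, pkts in flows:
    # Sorting the endpoints makes A->B and B->A map to the same session key.
    key = tuple(sorted([(src, sport), (dst, dport)]))
    sessions[key].append(pkts)

for key, pkt_counts in sessions.items():
    print(key, "total packets:", sum(pkt_counts))
# Two sessions result: ports 24901<->80 and 24925<->80, 11 packets each.
```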

Analyzing Flow Logs

While flow logs are not very granular, they can be used to test for a number of different security conditions. Here are some possibilities:

  • Monitor for outbound SSH and HTTPS traffic. Either could be an indication of a compromised system that is attempting to call home or install additional software.
  • Monitor for inbound ICMP error packets. A host receiving an excessive number of ICMP errors could be an indication that it is compromised and scanning the Internet.
  • Monitor for an excessive number of unique sessions compared to other source IPs. An attacker who is trying to guess your SSH passwords, or scan your Web site for vulnerabilities, will typically generate more unique sessions than what is normal for that service.
  • Monitor for excessive amounts of session data compared to other sessions. If your system has become compromised, and an attacker is transferring your databases and files, they will typically transfer larger byte counts than is normal for a single IP address.
  • Monitor for large increases in traffic to a specific IP or service. This could be an indication that a denial of service attack has just been launched.

In short, you should become familiar with how your instances normally operate, and define this as your operational baseline. You can then monitor for changes in this baseline in order to prompt deeper investigations.
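
As one example of turning a baseline into a check, here is a sketch that flags source IPs opening an unusual number of unique sessions to a service. It assumes flows parsed with the FlowRecord sketch earlier; the function name and threshold are illustrative:

```python
from collections import Counter

def flag_noisy_sources(flows, dst_port, threshold=100):
    """Return source IPs with more unique sessions to dst_port than threshold."""
    seen = set()
    sessions_per_ip = Counter()
    for r in flows:
        key = (r.srcaddr, r.srcport, r.dstaddr, r.dstport)
        if r.dstport == dst_port and key not in seen:
            seen.add(key)
            sessions_per_ip[r.srcaddr] += 1
    return [ip for ip, count in sessions_per_ip.items() if count > threshold]

# Example: anything opening more than 100 unique sessions to TCP/22 (SSH)
# stands out against a typical baseline and is worth a closer look.
# suspects = flag_noisy_sources(parsed_flows, dst_port="22")
```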

Manipulating Flow Log Data

In order to extract value from your flow log data, you need to be able to manipulate it. You should be able to search it for specific values, create filters, combine and summarize multiple flows, and create threshold alerts. You have three options: work with the Amazon-supplied tools, build your own system, or buy a turnkey solution.

Amazon Flow Log Tools

Flow logs can be manipulated directly within the Amazon graphical interface via CloudWatch. Once you log in, simply go to the CloudWatch console. You can select the log group you wish to review and view the log entries via the console shown in Figure 2.

Figure 2: The CloudWatch Console can be used to view log entries and perform simple searches

While the search capability is pretty rudimentary, it is sufficient for finding simple patterns in the data, though the process can be labor intensive. CloudWatch also lets you set filters and alerts that trigger on specific events. These can help reduce the amount of data you must sort through, as well as notify you when a predefined event takes place. However, this capability is also pretty rudimentary. For example, I can set an alert to warn me if five connection attempts occur within a defined period of time to my SSH server, but I cannot say I only want an alert if all five come from the same IP address. This can make defining a proper threshold problematic. My goal is to trigger an alert if someone is trying to brute-force my SSH server, but since I don’t have the granularity to group events by IP, I may see false-positive alerts if multiple administrators are working on the system. So using Amazon’s tools is probably the least preferred option.
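
That said, simple thresholds can be automated. Here is a hedged sketch using boto3 to create a metric filter that counts dropped SSH attempts; the log group name and metric names are assumptions for illustration, and it inherits the limitation just described, counting matching events without grouping them by source IP:

```python
import boto3

logs = boto3.client("logs")

# Count flows to TCP/22 that the security group or ACL dropped ("REJECT").
# CloudWatch can then alarm when the resulting metric crosses a threshold.
logs.put_metric_filter(
    logGroupName="my-vpc-flow-logs",        # assumed log group name
    filterName="rejected-ssh-attempts",
    filterPattern=(
        '[version, account, eni, source, destination, srcport, '
        'dstport=22, protocol=6, packets, bytes, start, end, '
        'action="REJECT", status]'
    ),
    metricTransformations=[{
        "metricName": "RejectedSSHAttempts",
        "metricNamespace": "VPCFlowLogs",
        "metricValue": "1",
    }],
)
```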

Build Your Own System

Another option is to build your own logging system and export the flow logs into it. While many environments already have a centralized logging solution, you should consider giving flow logs and/or firewall logs a system of their own. Networks see a lot of packets, which can generate a huge number of flows or log entries. Keeping the system separate makes it easier to manage.

Building your own system can seem attractive, as you can get started immediately and for very little cost. For example, a relatively robust system can be built on Linux using Elasticsearch, Logstash, and Kibana, typically referred to as the ELK Stack. However, what you are getting is a starting framework. You still need to decide how you will configure the system to monitor for suspect activity, and that portion can take many hours. Further, these setups tend to be quite tribal: there are usually only one or two people who understand the full system, so PTO or an employee departure can leave a company hard-pressed to manage it.

If you are a very small company, or simply need to address personal use, an ELK stack or similar may be the way to go. However if you are an organization that is looking to mature and grow, you should consider a turnkey solution that includes support.

Turnkey Solution

FlowTraq provides a turnkey solution that is capable of processing flow data once it has been properly converted. As mentioned earlier in this article, Alex Barsamian has a great write-up on importing flow logs into FlowTraq.