In this article, I’ll perform a deep dive into Amazon’s Virtual Private Cloud (VPC) flow logs. I’ll discuss how to interpret the data they produce, as well as how to find security-related incidents. Finally, I’ll provide some advice on automating the review process.
There is already plenty of good advice out there on how to configure your AWS environment to generate flow log data. The Amazon User Guide is a good place to start. Alex Barsamian also has a great article on configuration, as well as on importing flow logs into FlowTraq. Rather than repeating the same steps, I’ll jump right into analyzing the data.
Flow logs can be created for specific network interfaces, for subnets, or for entire VPCs. Flow logs provide a level of detail similar to NetFlow or IPFIX-compatible systems, although Amazon does not precisely follow either of these standards.
Figure 1 shows an example of some flow log data. Each row represents a unique communication flow that was recorded. From left to right, here is a description of each of the fields. Note that each field is space separated.
Let’s walk through an example by looking at the first row of data in Figure 1. “13:10.03” tells us the log entry was created at approximately 1:10 PM and 2 seconds UTC (the seconds are expressed as hundredths of a minute). The flow log version in use is “2,” and “315” are the last three digits of the AWS account used in this example (prior digits are blacked out). The next field, “eni-768d1092,” is the network interface ID of the system where these packets were sent or received. To see which system uses this interface, log in to your AWS account and go to the EC2 dashboard. In the left-hand menu, select “Network Interfaces” and search for the network interface identified in the log entry.
The next field, “220.127.116.11,” is the source IP address that is sending the traffic. In this case an IPv4 address is being displayed. IPv6 addresses are also supported. Note that flow logs do not clearly identify inbound and outbound traffic direction. I can only tell that this traffic is headed inbound from the Internet because I know the source is not an IP address that I am using. The next field, “172.31.8.226,” is the IP address of the receiving system. Since this is a private IP address, this must be our cloud instance.
The next value, “42608,” identifies the source port being used by the transmitting system. The next value, “23,” identifies the port expected to receive this traffic. TCP/23 is Telnet traffic. If I do not recognize the source IP address, this is most likely someone probing to see if they can find a Telnet server to exploit. The next value, “6,” is the IP protocol number, which confirms that this is TCP traffic.
Our next two fields are where things get interesting. We see that this flow included “2” packets, for a total of “80” bytes of information. Let’s assume the source IP transmitted a packet, and when no response was received, the packet was retransmitted. This would mean both packets would be nearly identical, so it would be safe to assume that each packet was 40 bytes in size (80/2=40). Both the IP and TCP headers are 20 bytes in size when no options are set, so 40 bytes leaves no room for anything else. While it is normal for a system to not set any IP options when transmitting a packet, it is highly unusual for a modern operating system to not use one or more TCP options when opening a connection. When TCP options are set, the TCP header grows in size. Since no TCP options are set in this packet, it most likely originated from some kind of security tool like a port scanner. So if we don’t recognize the source IP, and that system is running a port scanner, they are most likely hostile.
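The arithmetic above is easy to automate. Here is a minimal sketch in Python, assuming you already have the protocol, packet, and byte counts for a flow. The function names are my own for illustration, and the 40-byte test is only a heuristic: a flow record reports totals, so the average can hide packets of different sizes.

```python
IP_HEADER_MIN = 20   # bytes in an IPv4 header with no options
TCP_HEADER_MIN = 20  # bytes in a TCP header with no options
TCP_PROTOCOL = 6     # IP protocol number for TCP


def average_packet_size(total_bytes: int, packets: int) -> float:
    """Average bytes per packet for a single flow record."""
    if packets <= 0:
        raise ValueError("packets must be positive")
    return total_bytes / packets


def looks_like_bare_tcp(total_bytes: int, packets: int, protocol: int) -> bool:
    """True when a TCP flow averages exactly 40 bytes per packet,
    which leaves no room for TCP options or payload."""
    bare_size = IP_HEADER_MIN + TCP_HEADER_MIN
    return (protocol == TCP_PROTOCOL
            and average_packet_size(total_bytes, packets) == bare_size)


# The flow from the walkthrough: 2 packets, 80 bytes, protocol 6.
print(average_packet_size(80, 2))     # 40.0
print(looks_like_bare_tcp(80, 2, 6))  # True
```

I would treat a match as a reason to look closer at the source IP, not as proof of a scan.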
The next two fields are the start and stop time of the network traffic expressed in Unix time. This is an expression of the number of seconds that have elapsed since January 1st, 1970 (UTC). If you need to convert these values to normal date/time stamps, there are online converters or you can write your own. For example, the start time translates to May 12th, 2017 at 1:10 PM and 3 seconds UTC. The traffic stopped at May 12th, 2017 at 1:10 PM and 57 seconds UTC.
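If you would rather write your own converter, here is a short sketch using only the Python standard library. Note that the two epoch values below are ones I derived from the dates quoted above, not values copied from the figure, and the function name is just for illustration.

```python
from datetime import datetime, timezone


def unix_to_utc(seconds: int) -> str:
    """Convert Unix time (seconds since 1970-01-01 UTC) to a readable UTC string."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


# Assumed values, derived from the start and stop times described in the text.
start, end = 1494594603, 1494594657

print(unix_to_utc(start))  # 2017-05-12 13:10:03 UTC
print(unix_to_utc(end))    # 2017-05-12 13:10:57 UTC
print(end - start)         # 54 (seconds covered by this flow record)
```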
The next field, “REJECT,” tells us how the packet was processed. You can use Amazon security groups and network ACLs to define which traffic patterns you wish to permit through. Which packets are permitted will be based on an intersection of these two security features. In other words, the traffic pattern must be permitted by both in order to reach your instance.
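It is worth noting that Amazon does not use “REJECT” in the classic firewall sense. With firewalls, you can typically apply one of three actions on a packet stream: allow the traffic through, drop it silently, or reject it, which discards the traffic but returns an error to the sender.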
When Amazon describes a flow as being rejected, what they actually mean is that the packets have been dropped. This is important when you are troubleshooting inbound connection failures, as blocked packets will not return an error code. This is also important if you are controlling outbound sessions, as blocked traffic patterns will take longer to time out, and thus use more CPU and memory.
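To tie the field walkthrough together, here is a minimal parsing sketch in Python. It assumes the default version 2 record layout, which ends with a log-status field after the action, and it expects only the record itself, without the timestamp that the console shows in front of it. The sample line is made up for illustration: the account ID is a placeholder, the source address comes from a documentation range, and the epoch values are derived from the dates mentioned earlier.

```python
from typing import NamedTuple


class FlowRecord(NamedTuple):
    version: int
    account_id: str
    interface_id: str
    srcaddr: str
    dstaddr: str
    srcport: int
    dstport: int
    protocol: int
    packets: int
    byte_count: int
    start: int
    end: int
    action: str
    log_status: str


def parse_flow_record(line: str) -> FlowRecord:
    """Split one space-separated version 2 record into named fields.

    Records that carry no traffic data use "-" placeholders; this sketch
    does not handle them and lets int() raise ValueError instead.
    """
    parts = line.split()
    if len(parts) != 14:
        raise ValueError(f"expected 14 fields, got {len(parts)}")
    (version, account_id, interface_id, srcaddr, dstaddr, srcport, dstport,
     protocol, packets, byte_count, start, end, action, log_status) = parts
    return FlowRecord(int(version), account_id, interface_id, srcaddr, dstaddr,
                      int(srcport), int(dstport), int(protocol), int(packets),
                      int(byte_count), int(start), int(end), action, log_status)


# Hypothetical sample record modeled on the first row discussed above.
SAMPLE = ("2 123456789012 eni-768d1092 203.0.113.10 172.31.8.226 "
          "42608 23 6 2 80 1494594603 1494594657 REJECT OK")

record = parse_flow_record(SAMPLE)
print(record.dstport, record.protocol, record.action)  # 23 6 REJECT
```

A named tuple keeps the example short while still letting you refer to fields by name rather than by position.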
The IP address in the flow log entry may not reflect the IP address in the actual packet. For example, in the first row in Figure 1, the destination IP address is recorded as “172.31.8.226.” Since this is a private address, it is not possible for the Internet-based source IP address in this entry to be sending traffic to this private IP address and have it arrive at our instance. So our instance must have a public IP address associated. This is the real destination IP address that was recorded in the packet. However, flow logs convert all entries to the primary private IP address.
This is true for traffic in both directions. Note that in row five the source IP address is listed as “172.31.8.226.” It would be impossible for a host on the Internet to respond to this private address, so this packet must actually contain a publicly routable IP address. Again, we are simply being shown the primary private IP address associated with this interface.
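Since the logs always show the instance’s private address, one way to work out direction is to check which side of the flow sits inside your VPC. The sketch below assumes a VPC range of 172.31.0.0/16, which is my assumption based on the instance address in this example. Substitute your own range. The function name and the sample source address are also just for illustration.

```python
import ipaddress

# Assumed VPC range; replace with the CIDR block of your own VPC.
VPC_CIDR = ipaddress.ip_network("172.31.0.0/16")


def flow_direction(srcaddr: str, dstaddr: str, vpc=VPC_CIDR) -> str:
    """Classify a flow by which endpoint falls inside the VPC range."""
    src_inside = ipaddress.ip_address(srcaddr) in vpc
    dst_inside = ipaddress.ip_address(dstaddr) in vpc
    if src_inside and dst_inside:
        return "internal"
    if dst_inside:
        return "inbound"
    if src_inside:
        return "outbound"
    return "unknown"


print(flow_direction("203.0.113.10", "172.31.8.226"))  # inbound
print(flow_direction("172.31.8.226", "203.0.113.10"))  # outbound
```

I compare against an explicit range rather than testing for “private” addresses in general, because the only thing that matters here is whether the address belongs to my environment.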
A session typically refers to one complete information exchange between two systems. For example, a “TCP Session” would include all packets in both directions from the initial SYN (connection establishment) to the final FIN/ACK (connection teardown).
For example, refer back to Figure 1 and look at row seven. Here we see 6 packets moving from source port TCP/24901 on 18.104.22.168 to destination port TCP/80 on 172.31.8.226. This would be described as a single flow. If you jump to row ten, you’ll see 5 packets moving from source port TCP/80 on 172.31.8.226 to destination port TCP/24901 on 22.214.171.124. Combine these two flows together, and they describe a single session between the two systems.
If you keep looking through the data, you may notice that in row 13 there is another flow between these two systems. While you may initially think that this flow is also part of the above session, note the source and destination ports being used. In row 13, we see 5 packets moving from source port TCP/80 to destination port TCP/24925. Since at least one of the TCP ports being used has changed, this flow is part of a different session. Its complementary flow, recorded in a separate row, shows 6 packets moving from source port TCP/24925 on 126.96.36.199 to destination port TCP/80 on 172.31.8.226. This flow is also part of this second session.
When you are looking at a single flow entry, you are only looking at the packets that move in one direction. You need to find the complementary flow to account for all of the packets used in the session.
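Finding the complementary flow by eye gets tedious quickly, so here is a minimal sketch of how the pairing could be done in Python. The idea is to build a key from the protocol and the two address/port endpoints, sorted so that both directions produce the same key. The tuple layout, function names, and client address are my own for illustration. The ports and packet counts mirror the rows discussed above.

```python
from collections import defaultdict


def session_key(srcaddr, srcport, dstaddr, dstport, protocol):
    """Direction-independent key: both flows of a session map to the same value."""
    endpoints = sorted([(srcaddr, srcport), (dstaddr, dstport)])
    return (protocol, endpoints[0], endpoints[1])


def group_sessions(flows):
    """Group (srcaddr, srcport, dstaddr, dstport, protocol, packets) tuples."""
    sessions = defaultdict(list)
    for flow in flows:
        srcaddr, srcport, dstaddr, dstport, protocol, _packets = flow
        sessions[session_key(srcaddr, srcport, dstaddr, dstport, protocol)].append(flow)
    return dict(sessions)


# Hypothetical flows modeled on the two web sessions described above.
flows = [
    ("203.0.113.10", 24901, "172.31.8.226", 80, 6, 6),
    ("172.31.8.226", 80, "203.0.113.10", 24901, 6, 5),
    ("172.31.8.226", 80, "203.0.113.10", 24925, 6, 5),
    ("203.0.113.10", 24925, "172.31.8.226", 80, 6, 6),
]

for key, members in group_sessions(flows).items():
    total_packets = sum(f[5] for f in members)
    print(key, len(members), "flows,", total_packets, "packets")
```

This is only an approximation of a session. Ports get reused over time, and a long-running connection can be split across several records, so in practice you would probably also want to take the start and end times into account.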
While flow logs are not very granular, they can be used to test for a number of different security conditions. Here are some possibilities:
In short, you should become familiar with how your instances normally operate, and define this as your operational baseline. You can then monitor for changes in this baseline in order to prompt deeper investigations.
In order to extract value from your flow log data, you are going to need to be able to manipulate it. You should be able to search it for specific values, create filters, combine and summarize multiple flows, and create threshold alerts. You have three possible options: work with the Amazon-supplied tools, build your own system, or buy a turnkey solution.
Flow logs can be manipulated directly within the Amazon graphical interface via CloudWatch. Once you log in, simply go to the CloudWatch console. You can select the log group you wish to review, and view the log entries via the console shown in Figure 2.
While the search capability is pretty rudimentary, it is sufficient for finding simple patterns in the data. Note the process can be pretty labor intensive. CloudWatch also lets you set filters and alerts to trigger on specific events. These can help reduce the amount of data you must sort through, as well as notify you when a predefined event takes place. However, this capability is also pretty rudimentary. For example, I can set an alert to warn me if five connection attempts to my SSH server occur within a defined period of time. However, I cannot say I only want an alert if all five come from the same IP address. This can make defining a proper threshold problematic. My goal is to trigger an alert if someone is trying to brute-force my SSH server. However, since I don’t have the granularity to group events by IP, I may see false positive alerts if multiple administrators are working on the system. So using Amazon’s tools is probably the least preferred option.
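To make the missing piece concrete, here is a sketch of the per-source grouping I would want, written in Python against records that have already been exported and parsed. The dictionary keys, function name, threshold, and window length are all assumptions for illustration. Keep in mind that one flow record can summarize several connection attempts, so counting records only approximates counting attempts.

```python
from collections import Counter

SSH_PORT = 22
TCP_PROTOCOL = 6


def noisy_ssh_sources(records, window_start, window_seconds=300, threshold=5):
    """Return {source IP: flow count} for sources that reach the threshold
    of TCP/22 flows starting inside the given time window."""
    window_end = window_start + window_seconds
    counts = Counter(
        r["srcaddr"] for r in records
        if r["protocol"] == TCP_PROTOCOL
        and r["dstport"] == SSH_PORT
        and window_start <= r["start"] < window_end
    )
    return {src: n for src, n in counts.items() if n >= threshold}


# Hypothetical data: one source tries five times, five admins connect once each.
t0 = 1494594600
records = [{"srcaddr": "203.0.113.10", "dstport": 22, "protocol": 6, "start": t0 + i}
           for i in range(5)]
records += [{"srcaddr": f"198.51.100.{i}", "dstport": 22, "protocol": 6, "start": t0 + i}
            for i in range(1, 6)]

print(noisy_ssh_sources(records, window_start=t0))  # {'203.0.113.10': 5}
```

Ten flows reach the SSH port in this window, but only the single source that accounts for five of them is reported, which is the distinction a simple count-based alert cannot make.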
Another solution is to build your own logging system and export the flow logs into this system. While many environments already have a centralized logging solution, you should consider separating out flow logs and firewall logs to their own system. Networks see a lot of packets, which can create a huge number of flows or log entries on a regular basis. Keeping the system separate makes it easier to manage.
Building your own system can seem attractive, as you can get started immediately and for very little cost. For example, a relatively robust system can be built on Linux using Elasticsearch, Logstash and Kibana. This is typically referred to as the ELK Stack. However, what you are getting is a starting framework. You still need to decide how you will configure the system to monitor for suspect activity. This portion can take many hours. Further, these setups tend to be quite tribal. There is usually only one, possibly two, people that understand the full system. So PTO or a separation can leave a company hard pressed to manage the system.
If you are a very small company, or simply need to address personal use, an ELK stack or similar may be the way to go. However if you are an organization that is looking to mature and grow, you should consider a turnkey solution that includes support.
FlowTraq provides a turnkey solution that is capable of processing flow data, once it has been properly converted. As mentioned earlier in this article, Alex Barsamian has a great write-up on importing flow logs into FlowTraq.