One of the most embarrassing moments as a security analyst or operator is finding out from someone outside of your company that one of your internal hosts has been compromised. While 100% of all attacks can not be prevented, as the security caretaker of your infrastructure, you should at least be able to identify when the worst has occurred. You can do this with a network forensic tool.
Many attackers take steps to hide their tracks once they break in. Network forensics can provides valuable insight that may be otherwise difficult, if not impossible to recover from the compromised host itself without a cyber hunting tool in place.
Let’s walk you through how FlowTraq can be used along side your current security tools to paint a more complete picture of a security incident.
For example, you may pull together the appropriate tools, but then fail to take the time to configure them properly to support a later forensic investigation. It’s important to stop waiting until an incident occurs before sifting through traffic and identifying what is normal traffic and what is not.
Leverage FlowTraq’s ability to identify traffic patterns that are considered “interesting”. For example, consider your corporate Web server: What type of outbound connectivity does it need to support? In other words, we expect to see inbound HTTP and HTTPS connections, but, are there any legitimate outbound traffic patterns being used?
First step is to identify legitimate patterns in advance in order to establish an appropriate baseline.
If your Web server is typical, it generates very few outbound sessions. It may leverage NTP to sync time, HTTP or HTTPS to install patches, and that’s about it. All other connection establishment is generated inbound to the system. Once normal traffic patterns from the Web server are understood this exercise can be repeated with each exposed server.
Next step is to leverage FlowTraq’s Policies to identify patterns that fall outside of this baseline.
FlowTraq is configured to alert you if any part of the system is part of the “External Servers” traffic group transfers more than 10MB out to the Internet. With so much flexibility, you could also choose to set thresholds for total connections, activity on specific ports, or some combination of these options.
A Policy that generates an alert when any of my exposed servers transfer more than 10MB outbound in a single session. Note that defining triggers is not an exact science. One of the reasons to complete this prior to an incident is to see if any false positives get triggered. If they do, simply tweak our thresholds to eliminate the noise. Note that networks are an evolving entity so from time it’s wise to revisit the threshold settings. Once Policies are created, configure FlowTraq to submit alerts to a centralized logging solution. This permits consolidation of all alerts in one place for consistent handling based on severity level.
If your alerts are set up correctly, an alert will be triggered when suspicious network activity hits your network.
A possible example is shown in Figure 1. Note that an internal system has sent approximately 400MB of data to some host out on the Internet. You should not expect to see outbound SSH sessions from the internal servers, let alone ones that transmit large amounts of information. This example is is a pretty good indication that there is an incident worth investigating.
Next verify the information in the alert. FlowTraq will show all outbound session establishment from this server, and the amount of data being transferred in each session. This is shown in Figure 2. Note that our server communicated with two systems. The second entry looks like simple patching or upgrading activity. However, the first session clearly shows that an outbound SSH session performed what looks like two large file transfers to a host in Amazon EC2.
While an incident may have occurred because it is not understood why the servers would be transferring out large files, we don’t know for sure if a compromise has taken place. More data is needed in order to see if any additional suspect activity has taken place.A further analysis is shown in Figure 4. This time you can see that the numbers have been plotted showing outbound connections generated by the server. Note that in less than five hours, the server has sent close to 18.5 million packets to just under a half million systems. Given that each of these sessions contained a small amount of data, we can be pretty certain that server is scanning the Internet for open ports on other systems.
If you look at the timeline in Figures 3 and 4, you’ll see that all of this suspicious activity began around 17:45, or 5:45 PM. This begs the question, did anything suspicious happened just before this timestamp? Luckily FlowTraq can help you out there as well. In Figure 5 , you can see the configuration FlowTraq to show us open port responses from the server just prior to this timestamp. Why are we looking at outbound SYN/ACK packets instead of inbound SYN packets? An inbound SYN packet tells me someone was looking for an open port. An outbound SYN/ACK tells me that an open port was detected. This keeps me from having to weed out any unsuccessful connection attempts.
In Figure 4, we see that another internal system 10.0.0.236, made three connection attempts to our compromised server just prior to that server exhibiting suspect activity. Clearly we should scrutinize this new system as well. This is shown in Figure 6. Note that this system initiated communications with just over 2,000 unique systems. While a lower volume, this activity looks pretty similar to the port scanning activity we identified earlier.
The timestamp in Figure 5 indicates that the port scanning activity started at about 5:15PM. Just like we did with the first server, let’s analyze this second server to see if there was any inbound connections just prior to this suspect activity. This is shown in Figure 7. Note that we see just over 9,000 SSH connections from a new unidentified external IP address (126.96.36.199). This pattern is indicative of a brute force attempt. It is extremely likely that this new IP address attempted to guess a valid login name and password combination. Based on the data, it is also extremely likely that after just over 9,000 tries, they got lucky and gained access to the server. This would explain the outbound port scanning activity. This would also explain why so few connections were made to the first suspicious system via SSH. Once the attacker found a valid logon name password combination to use, the same credentials got them into that first server.
When performing network forensics, we need to follow proper incident handling procedures. This will include an “identification” stage, where we attempt to identify if a security event has taken place and if so, what is the scope of the compromise. With this in mind, it is worth sanity checking what we know so far.
We know that:
– 192.168.18.34 transferred about 400MB to an external system
– 192.168.18.34 scanned close to 500,000 hosts on the Internet
– Just prior to this suspect activity, the internal server 10.0.0.236, connected to 192.168.18.34 via SSH
– 10.0.0.236 scanned just over 2,000 hosts on the Internet
– Just prior to this suspect activity, 188.8.131.52 made over 9,000 connection attempts to the SSH port on 10.0.0.236,
It is worth noting that we collected all of the above evidence though a single tool (FlowTraq) by simply analyzing network traffic. It is also worth noting that if our attacker had taken steps to cover their tracks, it would have no impact on our ability to collect network evidence. Further, we have enough information to identify that two of our servers need to shift to the containment phase of our incident handling process in order to minimize further damage.
Based on what we know, we have some unanswered questions.
The first two questions are important because they address whether we have more compromised systems in need of containment. Luckily, we can leverage FlowTraq to search the rest of our servers to search for additional SSH traffic and similar port scanning activity. The third question speaks to eradication and recovery. We need to identify which account was compromised. Hopefully this information was retained in our system logs. We then want to identify if that account was used to access any other internal systems. We also need to identify the best way to ensure the account can no longer be used to compromise our servers (perhaps change from using passwords to public/private keys with SSH).
By generating a baseline which identifies normal traffic patterns, we were able to set threshold triggers to warn us when suspect activity took place.
FlowTraq provides the anchor to identify the scope and initial point of intrusion in a multi-host compromise. This allows quick movement through the incident handling identification and containment phases. Time and effort was saved as over having to analyze each server on a host by host basis.