Every minute counts when it comes to resolving a security breach or data leak—that’s why mean time to resolution (MTTR) is a key performance indicator. Because time is money, having a plan for resolving these types of network attacks quickly is paramount to preserving a company’s reputation, customer relationships, and bottom line.
When it comes to identifying and resolving security issues, there are four key phases: awareness, root cause analysis, remediation, and testing. Expediting each phase can help significantly reduce MTTR. Here, we outline these key phases and provide you with insight into the types of tools and visibility required to accelerate them:
1. Awareness
You can’t solve a problem unless you know you have one. So recognizing that you have an issue—awareness—is the first step. It can sometimes take weeks, even months, to detect a data breach or data leak, because these sorts of cybercrimes often fly under the radar until they are finally discovered. Imagine if Sony, Anthem, Home Depot, and Target had identified their data leaks sooner. How much could that have reduced the fallout?
While many intrusion detection tools can help flag DDoS attacks, brute-force attempts, botnets and other external threats, data breaches are much more difficult to identify. Because of this, network administrators need to look beyond their intrusion detection tools in order to pinpoint a larger, more severe problem like a data breach before too much damage has been done.
Network behavior intelligence tools provide visibility into your network that detection tools can’t. They enable you to understand normal traffic patterns so when anomalies do occur you can spot them. By analyzing network traffic flow records and recognizing both hosts initiating a connection and hosts receiving data outside of normal thresholds (or exhibiting unexpected network behavior patterns) you can flag potential data leaks in real time.
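As an illustration of the thresholding idea, here is a minimal sketch (all field names and numbers are hypothetical, not taken from any particular product) that learns a per-host baseline of bytes sent and flags hosts that exceed it:

```python
from statistics import mean, stdev

def build_baseline(history):
    """history: {host: [daily_bytes_sent, ...]} -> per-host (mean, stdev)."""
    return {h: (mean(v), stdev(v)) for h, v in history.items() if len(v) > 1}

def flag_anomalies(baseline, today, n_sigmas=3):
    """Flag hosts sending more than mean + n_sigmas * stdev bytes.
    Hosts with no baseline are flagged conservatively on any traffic."""
    flagged = []
    for host, sent in today.items():
        mu, sigma = baseline.get(host, (0.0, 0.0))
        if sent > mu + n_sigmas * sigma:
            flagged.append(host)
    return flagged

# Hypothetical example: one workstation suddenly uploads far more than usual.
history = {
    "10.0.0.5": [30e6, 28e6, 31e6, 29e6],   # ~30 MB/day, normal
    "10.0.0.9": [25e6, 27e6, 26e6, 24e6],
}
baseline = build_baseline(history)
today = {"10.0.0.5": 2_000e6, "10.0.0.9": 26e6}  # 2 GB leaving one host
print(flag_anomalies(baseline, today))  # ['10.0.0.5']
```

Real flow records carry many more attributes (ports, peers, timing), and real baselines account for time of day and day of week, but the principle is the same: learn normal, then alert on deviation.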
For example, maybe your organization has been receiving suspicious emails at unusual times, or your network is running noticeably slowly, or confidential documents have appeared outside of your company firewalls. No matter how good your detection tools are, data exfiltration is more often revealed through these types of secondary indicators, which may point to lateral movement in your network, making your organization an unknowing participant in an attack. But network behavior intelligence tools provide a macro view of all aspects of your network infrastructure, enabling administrators to detect potential problems on the spot.
2. Root Cause Analysis
Once you recognize that your company is potentially under attack, you have to determine the source so you can immediately mitigate it. In the root cause analysis phase you quickly piece together what you know and draw a conclusion. To use an analogy: you may smell smoke in your house and your fire alarm goes off, but if you open the doors and windows to let the smoke out and take the battery out of the smoke alarm, you’re just addressing the symptoms, not solving the problem. You’re actually making it worse. In this case you know that a fire is the root cause of your problem and that you need to put it out, quickly (though you don’t yet know what started it; that investigation will come later).
When it comes to performing a root cause analysis on a network breach you need to first determine which computers are leaking. Are they still leaking? Has the hardware failed? Is there a faulty router? And you need a quick fix to “put out the fire.” Afterwards you will need to go back and perform forensics to have a deeper understanding of what happened and how you can prevent it from happening again.
Identifying the root cause is the most challenging step to resolution because there is so much variability. In the case of a data breach, nothing is “broken” so you won’t be able to diagnose the problem with detection tools. But network visibility tools provide a window into your network, which enables you to correlate events, isolate specific suspicious IP addresses, and pinpoint the root cause of a leak much more quickly to speed MTTR. The tools and processes you have in place will determine how long this step will take and how much it will cost you.
3. Remediation
Once you’ve identified the root cause of the problem, you can fix it. This remediation phase is a quick fix to the problem to stop the leaking, but not a long-term solution. It’s more like addressing the symptom, not the problem.
Every network is different and remediation is going to be determined by your primary network function. It could involve shutting down computers, creating new firewalls, resetting passwords, and any number of other things to stop the leaking and get your network operational again. But remediation is not meant to be a long-term solution. You will need to take the time afterwards to go back and find out what caused the leak in the first place through forensics. It’s important to keep in mind that remediation is never a one-size-fits-all solution, it’s more of a many-to-many solution, but network visibility tools provide you with much more knowledge about how to find the problem and how to remediate it.
4. Testing
Once the problem has been remediated you need to test your solution. Look at historical data, current traffic patterns, and whether the data leakage is still occurring. If it is, you likely haven’t found the root cause—or maybe you have, but you haven’t effectively remediated it. You will need to repeat steps 1-3 until testing confirms that your solution is working.
The unfortunate reality is, once people scramble to put the fire out, they often don’t spend a lot of time on the aftermath of the breach—that’s why it’s so important to go deeper through forensics and ensure that you’ve developed a more permanent solution to keep your network safe in the future.
THE RIGHT SOLUTION FOR SPEEDING TIME TO RESOLUTION
At almost every organization, network downtime is inevitable. But while detection tools are important for flagging certain types of network security threats, when it comes to identifying data leaks and security breaches, you need network visibility tools. Instead of wasting time guessing what could have gone wrong, you can use network data to pinpoint the precise time that the problem first occurred. But not all network visibility tools are created equal:
Full packet capture solutions
Full packet capture solutions let you look deep inside the packets, and provide a granular view of data, but they do not provide a long enough history to be useful in an investigation, nor are they fast enough for real-time analysis. Full packet capture solutions cannot scale to the level or speed that’s required for root cause analysis or forensics and they cannot provide you with the network data you need beyond a few weeks. Without sufficient traffic history you won’t be able to answer questions such as: Where did the leak originate? Where was the data sent? And are you still leaking? You need full-fidelity tools for that.
After a data breach or security leak, you will need full network data recall that goes back far enough to stage a meaningful incident response and forensic investigation to understand what happened, how long it was happening and what other systems may have been affected. Full-fidelity network visibility tools are much more thorough and can help speed MTTR because of their fast detection, scalability, and access to complete historical data. This allows network administrators to analyze when the problem first occurred and trace it back to its source(s). If you want to protect your organization from the damage of a potential data leak, full-fidelity network visibility tools will be your best defense.
As perpetrators develop more sophisticated ways to steal sensitive data and breach our security systems, organizations must develop more sophisticated defenses. Understanding normal patterns of behavior on your network through full-fidelity network visibility tools is your best strategy so that when anomalies occur, you can spot them immediately, act fast and stop the leaking as quickly as possible.
On any given day, a typical networked host will send about 30MB and receive about 200MB. About 300,000 packets are switched. During peak times, the average workstation initiates two to four network UDP or TCP sessions per second, and each session averages 34KB in size, roughly 100 packets. What’s more, these sessions are negative-exponentially distributed with regard to packet count. What does that mean? It means there are a lot more very short sessions of only a couple of packets, and then there are lengthy sessions with lots of packets.
When routers use sampling for NetFlow generation, an interesting thing happens. The sampling is done on a packet-count level, so a 1:512 sampling rate will grab roughly every 512th packet to update the flow state tables.
This is great for reducing CPU load. But it is not so great at reducing flow update rate. Here’s why: With an average session size of roughly 100 packets, each sampled packet is very likely to be part of a flow that is not yet in the state table. This means an entry is created, which will lead to a flow update being sent. Compare this to 1:1 unsampled flow generation, where most of the packets will go toward updating existing entries in the flow state table. Flow state tables are typically exported when a flow is 60 seconds old, or the table is full, and the old ones need to be purged.
Leaving the exact math out for clarity: if unsampled flow generation results in a flow rate of X, then 1:512 sampling results in roughly 1/5th of the NetFlow being generated. Not 1/512th.
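For the curious, the claim can be sanity-checked with a short simulation, assuming exponentially distributed session sizes averaging ~100 packets (per the figures above) and 1:512 packet sampling. A session stays out of the flow table only if none of its packets happens to be sampled:

```python
import random

random.seed(42)
SESSIONS = 100_000
MEAN_PKTS = 100          # average session size in packets
RATE = 1 / 512           # 1:512 packet sampling

# Probability that an n-packet session contributes at least one sampled
# packet (and therefore a flow record): 1 - (1 - 1/512)^n
seen = 0.0
for _ in range(SESSIONS):
    n = random.expovariate(1 / MEAN_PKTS)
    seen += 1 - (1 - RATE) ** n

fraction = seen / SESSIONS
print(f"fraction of sessions still producing a flow record: {fraction:.2f}")
# On the order of 1/5 to 1/6 of the unsampled flow volume, nowhere near 1/512.
```

The exact fraction depends on the real traffic mix (the distribution here is an assumption), but the short-sessions-dominate effect is what keeps the flow export rate so much higher than the packet sampling rate would suggest.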
This is only the intuitive answer; the true results of sampling depend greatly on the precise mix of traffic present on the network. Also, some routers will use adaptive flow sampling rates to keep their flow export rates constant. This means that at busier times, the granularity of the data becomes less and less. Although this is nice for CPU time considerations on the router’s end, it means the coarsest data is collected during the heaviest attack!
“How much security is enough, given a level of acceptable risk?”
A friend in the security industry asked me that question, the other day.
Haven’t the generals of this world struggled with that problem forever and a day? Not necessarily “how much security is enough” but “how do I strategically apply the security I have?”
The deployment of battle tanks has evolved significantly since they were first introduced in 1916 during WWI to break through trench lines. Later, lightweight anti-tank weapons led to improvement in tank design, such as heavier armor. Today, modern tanks seldom operate alone. They are organized into combined arms units which involve the support of reconnaissance, ground-attack aircraft, and infantry. 
Tank commanders evolved to deal with a changing threat landscape, and network security executives should be doing the same.
In my daily interactions I often find network defenders deploying their limited defensive dollars in the rookie manner. Kind of the way I apply drywall mud: an even layer everywhere, essentially leaving all the big bumps still visible, but now covered in an equally applied defensive layer. Meaning it is spread thin across the board. I’ve seen some network security centers spread their mud across the wall, rationalizing it as: “everybody has a firewall, and a WAF, and an IDS, and a SIEM, etc. etc.” They try to buy cheap in all those areas, buying only “weaponry,” if you will. Firepower.
The smart defenders (of which there are truly few) orchestrate a “combined arms” approach. They look for the parts of their organization (mind you, I did not say “infrastructure”) where they are truly weak or at risk. Then they make security deployment decisions. Organizations differ, and because we employ people, their security aptitudes can vary widely from one organization to the next. Individuals vary greatly in their susceptibility to delusion or deception, and gullibility even when it comes to handling security threats. Typically the landscape differs across industries. For instance, a web services firm (ecommerce) will have a very different threat profile than an insurance company (call center). And the threat landscape changes.
Although this may sound obvious, because most of you have been analyzing these environments for decades, the truth is that in every organization there are people with constrained budgets and little experience, but lots of technical smarts, who are being asked big planning questions. In fact, I dare even say that security folks tend to be picked for their proven tactical skills, meaning they are good with scripts, and code, and exploits, while few ever consider their ability to think strategically about security. Trust me, a C-level exec will often entrust entire security policy to a kid who understands how a stack smash works, while the same kid cannot figure out how to talk to anybody at the soda machine. Let alone truly analyze the terrain the org should be trying to defend.
So the odd duck out is the security strategist who truly knows how to take a thin budget and apply it where it matters, to the specific organization he/she is trying to defend. And when I find individuals who can think at that level I see a couple of stark product choice differences:
1. They invest in visibility more than in defenses (crosshairs, more than firepower).
2. They invest in process, procedure, and triage plans (yes, there’s products in this space, but a lot comes down to design and training) and finally:
3. They hold their vendors accountable for being part of the overall defense, instead of purchasing based on a checklist of features from an earlier version of their security product.
The unfortunate difference, though, is that vendors have started pricing their firepower much higher than their aiming equipment. That alone tells us the rookie type of CISO is far more prevalent than the strategic type.
Source: Wikipedia, http://en.wikipedia.org/wiki/Tank
NetFlow is an abundant data source in any organization, because most routing and switching devices will export some form of NetFlow/sFlow/jFlow/IPFIX. Using flow formats to detect an incoming Distributed Denial of Service (DDoS) attack is therefore both a logical choice and a practical convenience. However, there are also drawbacks to using NetFlow which may impact detection speed if not carefully considered. Below are some factors that impact the effectiveness of fast DDoS detection using NetFlow.
A big factor in fast DDoS detection is the speed at which your routers or switches export their flows. Most devices are configured to export flow records when the flow is 60 seconds old. This means there may be up to a full minute of delay before the router starts sending evidence of the ongoing attack. Since for most devices this delay is configurable, we highly recommend you dial this number down to 15 or even 10 seconds. Every second counts when fighting an incoming DDoS.
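As a sketch only (Cisco Flexible NetFlow style syntax; exact commands, defaults, and minimum values vary by platform and software version, so verify against your device’s documentation before use):

```
! Sketch only: lower the flow export timeouts on a flow monitor
flow monitor DDOS-MON
 cache timeout active 15     ! export long-lived flows every 15 seconds
 cache timeout inactive 10   ! export idle flows after 10 seconds
```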
When a “view” or a “graph” of NetFlow data is displayed, you are actually looking at an aggregation of the underlying NetFlow, meaning the actual flows are dropped into buckets before analysis can happen. Most typically, DDoS attacks are detected based on a significant deviation in volume. This means a threshold is set, or a threshold is learned (both options are available in FlowTraq), and the “buckets” must fill to this threshold before a DDoS is detected. The bigger the buckets are, the longer it will take to fill to the threshold point. Smaller buckets overflow faster, leading to faster detections in most cases. In general, this means that the bigger/louder/more voluminous a DDoS attack is, the sooner any detector will pick it up.
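A toy model (all rates and thresholds hypothetical) makes the bucket effect concrete: an attack that starts mid-trace is detected at the end of the first bucket whose average rate crosses the threshold, so smaller buckets fire sooner:

```python
BASE_RATE = 100        # normal traffic, units/sec
ATTACK_RATE = 1_000    # extra traffic once the DDoS starts
ATTACK_START = 30      # attack begins 30 s into the trace
THRESHOLD = 500        # alert when a bucket averages > 500 units/sec

def detection_time(bucket_len, horizon=600):
    """Return the time (a bucket boundary) at which the threshold is crossed."""
    t = 0
    while t < horizon:
        end = t + bucket_len
        # Seconds of this bucket during which the attack was active.
        attack_secs = max(0, end - max(t, ATTACK_START))
        total = BASE_RATE * bucket_len + ATTACK_RATE * attack_secs
        if total / bucket_len > THRESHOLD:
            return end
        t = end
    return None

print(detection_time(60))  # 60-second buckets: detected at t=60
print(detection_time(10))  # 10-second buckets: detected at t=40
```

With one-minute buckets the alert waits for the bucket boundary at t=60; with ten-second buckets the same attack trips the threshold 20 seconds earlier.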
Conventional flow tools create their minute-by-minute (or even 5-minute-by-5-minute) buckets as the flow comes in, losing granularity instantly. FlowTraq does this differently: flow is stored, and at the time of graph/view creation the buckets are filled dynamically at the required granularity, giving FlowTraq a detection speed advantage.
Sampling is a technique used to reduce the load on the analysis engine, because analysis engines typically don’t scale well. When sampling 1:50,000 flows, the error in the DDoS analysis becomes much bigger. Meaning: you cannot simply trigger an alert when you cross a threshold, because a high sampling rate introduces a potentially huge statistical error. The higher the sampling rate, the bigger the deviation needs to be to confidently say there is a DDoS happening. FlowTraq was designed to scale in clusters to handle large volumes of traffic while reducing the need to sample data. This means the answers are more accurate, and FlowTraq alerts faster with higher confidence.
FlowTraq reduces or eliminates the need for sampling. More accuracy in your data means more speed in your DDoS detection.
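The size of that sampling error is easy to demonstrate, assuming for illustration a 1,000,000-packet flood observed through a 1:50,000 sampler (a Poisson approximation stands in for the actual packet-by-packet sampling, which is accurate here because the expected sample count is small relative to the total):

```python
import math
import random
from statistics import mean, pstdev

random.seed(7)
TRUE_PACKETS = 1_000_000
RATE = 1 / 50_000            # 1:50,000 sampling -> ~20 sampled packets expected

def sampled_count(n, p):
    """Number of sampled packets, drawn via Knuth's Poisson algorithm."""
    limit = math.exp(-n * p)
    k, prod = 0, random.random()
    while prod > limit:
        k += 1
        prod *= random.random()
    return k

# Each trial: sample the flood, then scale the sampled count back up.
estimates = [sampled_count(TRUE_PACKETS, RATE) / RATE for _ in range(2_000)]
rel_error = pstdev(estimates) / mean(estimates)
print(f"relative error of the estimate: {rel_error:.0%}")
# Around 20-25%: a spike must exceed the baseline by several times this
# margin before it can be distinguished from sampling noise.
```

With only ~20 sampled packets carrying the whole estimate, the relative error is roughly 1/sqrt(20), which is why a sampled detector needs a much larger deviation before it can alert with confidence.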
Finally, not all DDoS attacks are equally easy to pick up. SYN flood and reflection-style attacks are more straightforward, since both are volumetric and loud by nature. SlowLoris and RUDY-style DDoS attacks are much harder to detect quickly, as the traffic volumes may never exceed normal levels at the packet level. Instead, FlowTraq detects these attacks by tracking the number of concurrent sessions, which does go up substantially.
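Counting concurrent sessions from flow records is simple in principle; here is a minimal sketch (a hypothetical input format of (start, end) timestamps, not FlowTraq’s actual internals):

```python
def max_concurrent(flows):
    """flows: list of (start, end) timestamps; returns peak concurrency."""
    events = []
    for start, end in flows:
        events.append((start, +1))   # session opens
        events.append((end, -1))     # session closes
    # Sort by time; process closes before opens at the same instant.
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak

# A SlowLoris-style pattern: many long-lived, overlapping, low-volume sessions.
normal = [(t, t + 2) for t in range(0, 100, 5)]        # short, non-overlapping
attack = [(10, 90) for _ in range(500)]                # 500 held-open sessions
print(max_concurrent(normal))          # 1: sessions never overlap
print(max_concurrent(attack + normal)) # spikes into the hundreds
```

The byte and packet counts of the attack sessions are unremarkable; it is the session count itself that gives the attack away.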
When I ask people about their network visibility, they often tell me they have “network monitoring software” installed. Often “network visibility” and “network monitoring” are used to mean the same thing. But they are not. Let me explain the difference:
Monitoring is simply watching for conditions such as downtime or link saturation and fixing them when they happen. Network Monitoring is generally bred from a desire to keep the packets moving.
Visibility goes much further, in that it is bred from a desire to be safe and to understand what is happening on your network. If people seek to increase visibility into their networks, they are looking to gain multiple, or many, vantage points from which they can observe what is happening and learn what has happened in the past. This is driven by a desire to investigate badness, catch bad actors, and understand data movements, so we can decide what may be a data leak.
Although to most people “visibility” and “monitoring” are the same thing, if we take a good look at the kinds of products that are offered in each space, we see they are tailored to completely different use cases: Most monitoring products are rough, collect only what is needed to solve the “uptime” problem, and don’t offer much in the way of handling the unexpected. Time-frames are rough, often 5-minute-by-5-minute, and historical data is notional at best. Most of all, the different views of the network are limited and simplistic, lacking insight.
Monitoring tools help you deal with KNOWN future situations, while Visibility tools prepare you for dealing with UNKNOWN future situations.
Good visibility products collect data from many vantage points, offer a myriad of views into this data, and store histories, so that you have the tools you need to create the understanding of what may have happened. Visibility tools prepare you to deal with unknown future situations, where you simply cannot know today, what you will need to be looking at tomorrow. These tools offer a level of depth and complexity that allows a skilled operator to gather insight into almost any aspect of keeping the network safe and operational.
This difference is reflected in the price point of the different classes of products. Those who do not take the time to understand the difference between Network Monitoring and Network Visibility will dismiss it. However, those who want to understand what is happening on the network, to keep the network safe and operating, will gladly make the investment in good visibility products. After all: how do you know what medicine to take if you don’t know what you’re suffering from?
A large distributed reflection attack can potentially sustain flood rates of a terabit or more, for extended periods of time when properly orchestrated. As networks and server capabilities grow in the future, the potential of these attacks will increase. What is important to understand is that stopping an attack of such magnitude requires hard work, and cooperation of many participating parties around the globe. Defending against an attack of this size takes time because it cannot easily be “turned off”. Here’s why:
After a week spent testing servers, updating OpenSSL, and changing lots of passwords and even SSL key pairs, it is understandable for IT pros to be feeling tired of hearing about Heartbleed. However, there are good reasons to ask ourselves some hard questions right now, and to take to heart some of the lessons — uncomfortable or not — to be gained from answering them. The bug of the week (year?) may be fixed, but it is not the only bug out there and there is a difference between being safer and being safe.
It seems these days that the marketplace is saturated with flow export formats. Cisco has NetFlow, InMon has sFlow®, Juniper uses JFlow, and there are several others. Few of these manufacturers seem to release details on the inner workings of their protocols and their subsequent benefits. What follows is an overview of flow technologies.
For the NetFlow suite of protocols we most often see version 5 (supported by the majority of devices), some combined v5/v7 (the Catalysts), and some version 9 on the newer devices. Don’t be fooled by the ASA series of firewalls; they do not actually support version 9 flow exporting. Instead, these Cisco devices use NetFlow v9 to export firewall events, similar to log lines: no real traffic records in there! NetFlow v5 uses a static packet format (and is in this way very similar to v7), defining IPv4 IPs, protocols, ports, and millisecond precision on flow start and flow end times. Version 9 uses a dynamic format which is parsed based on a template that is sent around first. These templates are flexible and allow for expansion of the protocol in the future. (Incidentally, IPFIX is based on it as well and is versioned as NetFlow 10.) JFlow and CFlow are the same as Cisco NetFlow v5. Only NetFlow v9 and IPFIX support IPv6.
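Because v5 is a static format, parsing it takes little more than a struct definition. A sketch of the standard 24-byte v5 header and 48-byte record layout (only a handful of fields are pulled out here, for illustration):

```python
import struct
from socket import inet_ntoa

# NetFlow v5: 24-byte header followed by `count` 48-byte records.
HEADER = struct.Struct("!HHIIIIBBH")            # version .. sampling_interval
RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")

def parse_v5(datagram):
    (version, count, uptime, secs, nsecs,
     seq, engine_type, engine_id, sampling) = HEADER.unpack_from(datagram, 0)
    assert version == 5, "not a NetFlow v5 datagram"
    flows = []
    for i in range(count):
        f = RECORD.unpack_from(datagram, HEADER.size + i * RECORD.size)
        flows.append({
            "src": inet_ntoa(f[0]), "dst": inet_ntoa(f[1]),
            "packets": f[5], "octets": f[6],
            "srcport": f[9], "dstport": f[10],
            "proto": f[13],
        })
    return flows

# Synthetic single-record datagram for illustration:
hdr = HEADER.pack(5, 1, 0, 0, 0, 0, 0, 0, 0)
rec = RECORD.pack(b"\x0a\x00\x00\x05", b"\xc0\xa8\x01\x01", b"\x00" * 4,
                  0, 0, 100, 34_000, 0, 0, 443, 51234,
                  0, 0x18, 6, 0, 0, 0, 0, 0, 0)
print(parse_v5(hdr + rec))
```

Contrast this with v9/IPFIX, where no such fixed struct is possible: the collector must first receive and cache the template before any data record can be decoded.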
Any organization that manages a large, geographically dispersed, physical infrastructure (like power grids, water, oil & gas, nuclear facilities, sewage or chemical plants) naturally employs computer systems that measure temperatures, open and close valves, and turn devices on or off. These computer systems are generally referred to as “SCADA/PCS” systems, which stands for Supervisory Control And Data Acquisition / Process Control System.
These systems are designed to allow operators to have central control of production processes, and to monitor and measure system performance. Traditionally these systems were designed with tons of wires running to unique devices. Each sensor needed its own wire. So did every electronic valve, motor, heating element, etc. This got confusing and expensive fast, so several common-bus protocols were designed to put many devices on the same wire and keep their communications from mixing. A popular one is named ‘Modbus’.
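Modbus is refreshingly simple on the wire. As an illustration, a Modbus/TCP “read holding registers” request (function code 0x03) can be built with nothing but a struct definition (the register addresses here are hypothetical):

```python
import struct

def read_holding_registers_request(transaction_id, unit_id, start_addr, count):
    """Build a Modbus/TCP ADU: MBAP header + read-holding-registers PDU."""
    # PDU: function code 0x03, starting register, number of registers
    pdu = struct.pack("!BHH", 0x03, start_addr, count)
    # MBAP: transaction id, protocol id (0 = Modbus), remaining length, unit id
    mbap = struct.pack("!HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu

frame = read_holding_registers_request(1, unit_id=1, start_addr=0, count=2)
print(frame.hex())  # 000100000006010300000002
```

Note what is missing: no authentication, no encryption, no session state. That simplicity is exactly why putting SCADA buses on routable networks without compensating controls is so dangerous.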
No evidence of compromise is not the same as evidence of no compromise.
Simply put, if you don’t see any evidence that somebody might be stealing your data, it does not mean that nobody is stealing your data. If I’m not watching the backdoor, I cannot know for sure that nobody is walking in or out. Not until I start paying attention. The same goes for your computer network. And it is time to start paying attention.