How Fastly Protects its customers from Massive DDoS threats including the Rapid Reset attack

By admin

Customer traffic on Fastly is not vulnerable to the massive Rapid Reset DDoS attacks that have been recently disclosed.

At the initial onset of the Rapid Reset DDoS activity, Fastly saw high volumes of requests that risked high CPU utilization if left unaddressed, but our autonomous systems helped detect the method the attackers were using, and we quickly deployed an appropriate mitigation.

Our protections for massive-scale attacks are handled automatically at the edge, with detection and defense capabilities built into our kernel and network application layer processing stack. These systems defend all Fastly customers from massive attacks because we prioritize keeping the Fastly network up and running for everyone, regardless of the package or security offerings they subscribe to. We offer various other security products that allow customers to define more specific security rules for their unique needs, like our Next-Gen WAF, Edge Rate Limiting, and Managed Security Services (MSS), but these massive-scale attacks are mitigated for all Fastly customers and traffic at no extra cost. This and other secure-by-design capabilities are built directly into the Fastly network.

Details of the Rapid Reset attack have been discussed extensively in a number of blog posts by our industry peers, but in this write-up we want to explain how we are able to detect novel attacks and mitigate them quickly and effectively for Fastly customers. We approach DDoS attacks on our network differently in order to deliver incredible performance and reliability for our customers and their end users all over the internet, ALL of the time. Our goal is to make it impossible for them to tell the difference between times when we’re under attack from a record-breaking botnet that causes significant deterioration at other vendors, and times when we’re not.

Timeline of events


Late August 2023 – A novel Rapid Reset issue was first detected automatically at a volume of ~250 million RPS and a duration of ~3 minutes.

Our DDoS forensics systems immediately flagged a never-before-seen class of attack and helped identify the exact method the attacker used to reach this level of amplification.

Fastly engineers easily added new capabilities to our DDoS mitigation engine at the edge to effectively mitigate future occurrences of this attack.


October 10, 2023 – CVE-2023-44487 (the Rapid Reset attack that Google and others have written about) was disclosed. This did not present a risk to our customers because it was not novel to Fastly and effective mitigations were already in place. Fastly was not impacted.

Let’s look at Fastly’s principles for dealing with massive DDoS attacks, and some of the specific capabilities that played a role in our Rapid Reset defense.

Fastly’s DDoS defense principles

We have three core principles for how we structure our network defenses against DDoS attacks:

Everything starts with rapid, accurate detection of the malicious traffic.

Mitigations must be safe to run. Even one false positive is way too many.

Our defense tactics should be deceptive, minimizing the signals that go back to the attackers.

Anatomy and economy of DDoS attacks

There are three basic flavors of DDoS attack: PPS (Packets Per Second), Volumetric, and RPS (Requests Per Second).


Packets Per Second attacks attempt to overwhelm the packet processing engines along the path (L3 and L4 network layer attacks), testing the performance boundaries of those engines. They’re cheaper to run because they cost the attacker less network bandwidth. By contrast, a “Volumetric” attack tries to overwhelm the transfer capacity of a network by clogging it with data, sending lots of large packets. These have become less common recently, but are still seen sometimes.

RPS attacks probe your site or application, trying to identify computationally expensive objects, and then send an overwhelming number of requests for them. Finding a way to trigger a complex regex evaluation, for example, will cause CPU spikes and tax your infrastructure many times more than requesting a static image that’s served from a CDN.

All DDoS attackers are looking for amplification factors: anything that lets them use the least amount of botnet resources to still cause a massive amount of damage. Attackers have to pay to run their botnets, in the same way that legitimate operators pay for their computing resources, so it costs them real money to keep a bunch of botnet nodes active. They will always be interested in ways to achieve a larger impact with fewer nodes. For example, if you can send 1 request per second per node, then you need a million nodes to send 1 million requests per second. However, if you can find a way to have fewer nodes send many times more requests, then the cost is lower for the same volume. Rapid Reset found a way to have a relatively small number of connections and botnet nodes send hundreds of millions of requests per second in their attacks.
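The arithmetic behind amplification is a simple ceiling division. The sketch below uses the paragraph’s own figures (1 request per second per node, a 1 million RPS target); the 10,000 RPS-per-node figure is a hypothetical stand-in for an amplified technique, not a measured Rapid Reset number.

```python
def nodes_needed(target_rps: int, rps_per_node: int) -> int:
    """Botnet nodes required to sustain target_rps, rounding up (integer math)."""
    if rps_per_node <= 0:
        raise ValueError("rps_per_node must be positive")
    return (target_rps + rps_per_node - 1) // rps_per_node


TARGET_RPS = 1_000_000

# Unamplified: each node sends one request per second.
baseline = nodes_needed(TARGET_RPS, 1)

# Hypothetical amplified technique: each node sustains 10,000 RPS.
amplified = nodes_needed(TARGET_RPS, 10_000)

print(f"unamplified: {baseline:,} nodes")                   # unamplified: 1,000,000 nodes
print(f"amplified:   {amplified:,} nodes")                  # amplified:   100 nodes
print(f"amplification factor: {baseline // amplified:,}x")  # amplification factor: 10,000x
```

The attacker’s cost scales with the node count, so every multiplier in requests per node divides the bill by the same factor.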

Novel or not?

Most attacks can be easily grouped into being either well understood or novel. A well understood attack is usually something that has been seen before, often many times, and slowly mutates over time as attackers try to find better ways to circumvent the defenses that are in place, or look for a new weak spot in the defenses where the attack can still be deployed. The OWASP Top Ten are a great example of this – everyone knows basically what they are, but attackers are constantly tweaking and evolving them, and poking your defenses with new approaches, so you have to stay on top of it.

When an attack is novel it means there’s a fundamentally new approach that hasn’t been seen before. These can pack a big punch and do a lot of damage because the fix often has to be novel as well, and it might take longer to create and deploy.

The Rapid Reset attack disclosed on October 10th was considered novel because it relied on a characteristic of the HTTP/2 protocol that had not been previously exploited.
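The characteristic in question is in how HTTP/2 streams are counted. A server advertises a cap on concurrently open streams per connection (SETTINGS_MAX_CONCURRENT_STREAMS), but a stream the client cancels with RST_STREAM immediately stops counting against that cap, even though the server may already have started work on the request. The toy model below is not a real HTTP/2 implementation, and its class and function names are hypothetical. It only illustrates why opening and resetting streams back to back lets one connection generate request volume that the concurrency cap never limits.

```python
from dataclasses import dataclass


@dataclass
class ToyH2Connection:
    """Toy model of server-side HTTP/2 stream accounting (not a real stack)."""
    max_concurrent_streams: int = 100  # value the server advertised via SETTINGS
    open_streams: int = 0
    requests_started: int = 0  # requests the server began processing
    refused: int = 0           # streams rejected for exceeding the cap

    def on_headers(self) -> None:
        # A HEADERS frame opens a new stream if the cap allows it.
        if self.open_streams >= self.max_concurrent_streams:
            self.refused += 1  # over the cap: a real server would refuse the stream
            return
        self.open_streams += 1
        self.requests_started += 1  # the server starts work on this request

    def on_rst_stream(self) -> None:
        # RST_STREAM closes the stream: it no longer counts toward the cap,
        # but the work the server already started is not recovered.
        if self.open_streams > 0:
            self.open_streams -= 1


def well_behaved_burst(conn: ToyH2Connection, n: int) -> None:
    """Client sends n requests in one burst; no responses complete yet."""
    for _ in range(n):
        conn.on_headers()


def rapid_reset(conn: ToyH2Connection, n: int) -> None:
    """Client opens a stream and cancels it immediately, n times."""
    for _ in range(n):
        conn.on_headers()
        conn.on_rst_stream()


normal = ToyH2Connection()
well_behaved_burst(normal, 10_000)
print(normal.requests_started, normal.refused)  # 100 9900

attack = ToyH2Connection()
rapid_reset(attack, 10_000)
print(attack.requests_started, attack.refused)  # 10000 0
```

In the well-behaved burst the cap bounds the in-flight work at 100 requests; with immediate resets, the same connection starts 10,000 requests without ever tripping the limit.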

Rapid detection with accurate fingerprinting

Rapid detection is at the center of an effective response strategy. DDoS attacks accelerate quickly and are often over quickly, so effective defenses must be able to accurately detect an attack and distinguish between the good and the bad traffic in real-time.

Attacks often scale from zero requests per second (RPS) to millions or even hundreds of millions of RPS in just a few seconds, and then may be over less than a minute later. As you can see in the chart above, when we look at the attacks* we saw at Fastly from July 1, 2023 through October 12, 2023:


90% of the attacks have a total duration of 150 seconds or less.

50% of the attacks are under 52 seconds!
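Summary figures like these are empirical quantiles of per-attack durations. Here is a minimal sketch of how such figures can be computed. The list of durations is invented purely to exercise the code and is not Fastly’s dataset.

```python
from statistics import median


def share_at_or_below(durations: list[float], limit: float) -> float:
    """Fraction of attacks whose duration is <= limit seconds."""
    if not durations:
        raise ValueError("no durations")
    return sum(1 for d in durations if d <= limit) / len(durations)


# Illustrative durations in seconds (made up for this example).
sample = [12, 30, 41, 48, 51, 60, 95, 120, 150, 900]

print(share_at_or_below(sample, 150))  # 0.9
print(median(sample))                  # 55.5
```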

By the time a human can be made aware of an attack and equipped to respond, the attack is often over. Applying the rules at that point would be like getting a vaccine the week after you’ve already recovered from being sick. It will probably help you the next time you’re exposed, but it doesn’t fix anything related to the impact of your first illness. Fastly’s automated detection and response systems detect and mitigate attacks without human intervention.

Sophisticated attacks like Rapid Reset require the discovery of distinguishing characteristics to identify the bad traffic. Anything less results in a lot of false positives among the organic traffic the attack is blended in with, and this leads to the negative consequences we’re all familiar with – customer websites and applications experiencing problems and parts of the internet breaking.

Attribute Unmasking for accurate, automated signature extraction

Fingerprinting is a way to identify specific attacks and distinguish them from the organic traffic on a network. In its simplest form, imagine that the entirety of an attack is initiated from a single IP address. Normally, you don’t see any traffic at all from that IP address on your network, but during the attack you’re getting a TON. Your fingerprinting can start by simply identifying that datacenter IP address and blocking it, and congratulations! You’ve stopped the attack. The problem is that over time, attackers get better at more advanced techniques, like blending their traffic in with legitimate traffic so that it’s harder to identify and separate. This also means it costs your defense team more to mitigate (time, resources, computation, business impact, etc.).
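The simplest case described above, a source that normally sends nothing and suddenly sends a lot, can be caught by comparing per-IP request counts in the current window against a historical baseline. The sketch below is a hypothetical illustration of that naive approach, not Fastly’s detector. The thresholds `ratio` and `min_requests` are assumptions. It also has exactly the weakness described next: it cannot separate an attack from legitimate traffic sharing the same busy source.

```python
from collections import Counter
from typing import Iterable


def surging_ips(
    current_window: Iterable[str],
    baseline_counts: dict[str, float],
    ratio: float = 10.0,       # hypothetical: flag at 10x the usual volume
    min_requests: int = 1000,  # hypothetical: ignore small sources
) -> set[str]:
    """Return source IPs whose volume in this window far exceeds their baseline."""
    counts = Counter(current_window)
    flagged = set()
    for ip, n in counts.items():
        usual = baseline_counts.get(ip, 0.0)  # never-seen IPs have a baseline of 0
        if n >= min_requests and n > ratio * max(usual, 1.0):
            flagged.add(ip)
    return flagged


# Usage with documentation-range addresses (RFC 5737).
window = ["203.0.113.7"] * 50_000 + ["198.51.100.2"] * 1_200
baseline = {"198.51.100.2": 1_000.0}  # a normally busy, legitimate source
print(surging_ips(window, baseline))  # {'203.0.113.7'}
```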

A couple of years ago, the Meris botnet did this by taking over infrastructure in a lot of campus networks, like those of hospitals, universities, and other institutions, and routing malicious traffic through them to make the attack look like it was coming from these organizations.

This kind of blending with legitimate traffic complicates things, because even if you know the IP address it’s all coming from, it would be catastrophic to block that traffic in such an indiscriminate way: you would also be blocking a ton of legitimate traffic. That would be very bad anytime, but especially in the case of hospitals, ISPs, and other organizations where a blockage would lead to serious real-world impacts!

To address this problem, Fastly uses a technique we call “Attribute Unmasking” to rapidly extract accurate fingerprints from network traffic when we are hit with complicated attacks. For any request coming through a network, there are a huge number of characteristics that can be used to describe the traffic: things like Layer 3 and Layer 4 headers, TLS info, Layer 7 details, and more. Borrowing concepts from AI, our Attribute Unmasking system ingests the metadata from inbound requests on our network and extracts the elements whose traffic shape and volume over time match the shape and volume of the attack.

The system starts by testing individual attributes until it finds one that shows some similarity to the curve of the attack on the network.

Now the system has a candidate to work with, and it starts combining that first attribute with others, testing out sets of attributes and building a curve that gets closer and closer to representing the entirety of the surplus traffic the attack produces on the network. With each incrementally better attribute set it identifies, the system shrinks the degrees of freedom needed to further improve the model, until it can no longer produce a better fit and has arrived at an optimized fingerprint for the attack.
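Fastly has not published the internals of Attribute Unmasking, so the following is only a rough sketch of the kind of search the two paragraphs above describe, using a deliberately simple swapped-in technique. It performs greedy forward selection of `attribute=value` conditions, scored by the squared error between the per-second count of matching requests and the surplus (observed minus baseline) curve. Every name here (`unmask`, `series`, the request fields, the scoring) is hypothetical, and the surplus curve is supplied directly rather than estimated.

```python
from collections import Counter

# Hypothetical request record: "t" is the one-second bucket, other keys are attributes.
Request = dict[str, object]


def series(requests: list[Request], cond: dict[str, object], n: int) -> list[int]:
    """Per-second count of requests matching every attribute=value in cond."""
    counts = Counter(
        r["t"] for r in requests if all(r.get(k) == v for k, v in cond.items())
    )
    return [counts.get(t, 0) for t in range(n)]


def sse(a: list[int], b: list[float]) -> float:
    """Sum of squared differences between two equal-length curves."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def unmask(requests: list[Request], surplus: list[float]) -> dict[str, object]:
    """Greedily grow a conjunction of attribute=value pairs until the fit stops improving."""
    n = len(surplus)
    candidates = sorted({(k, v) for r in requests for k, v in r.items() if k != "t"})
    signature: dict[str, object] = {}
    best = float("inf")
    while True:
        step_err, step_pair = best, None
        for k, v in candidates:
            if k in signature:
                continue  # at most one value per attribute
            err = sse(series(requests, {**signature, k: v}, n), surplus)
            if err < step_err:
                step_err, step_pair = err, (k, v)
        if step_pair is None:  # no addition improves the fit: fingerprint found
            return signature
        signature[step_pair[0]] = step_pair[1]
        best = step_err


def build_demo() -> tuple[list[Request], list[float]]:
    """Synthetic traffic: two organic flows plus an attack in seconds 3-6."""
    reqs: list[Request] = []
    for t in range(10):
        reqs += [{"t": t, "path": "/search", "ua": "Safari"}] * 5
        reqs += [{"t": t, "path": "/", "ua": "Mozilla"}] * 5
        if 3 <= t <= 6:
            reqs += [{"t": t, "path": "/search", "ua": "Mozilla"}] * 20
    surplus = [20.0 if 3 <= t <= 6 else 0.0 for t in range(10)]  # observed minus baseline
    return reqs, surplus


reqs, surplus = build_demo()
print(unmask(reqs, surplus))  # {'path': '/search', 'ua': 'Mozilla'}
```

In this synthetic data, `path=/search` and `ua=Mozilla` each fit the attack curve only partially because each is shared with organic traffic; the search picks one, then finds that adding the other isolates the attack exactly. A real-time system would need streaming aggregation rather than rescanning every request for each candidate.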

This might sound like a computationally intensive process, but it’s all occurring in real-time – identification, fingerprinting, and mitigation.

Our Attribute Unmasking system is an area of continued investment for us. It’s already a stunning achievement. We’re extremely proud of what we’ve accomplished and the way in which it is already protecting us from attacks that impact other networks, but we will continue to improve it and expand its capabilities.

Rapid fingerprinting as a differentiator

Fastly works under the principle that we should do as much processing and decision-making as possible at the edge, rather than running things through a centralized function that will inevitably become a bottleneck. We can do that because our network is completely software-defined: by removing dependencies on specialized hardware such as routers, all of these functions can be run in a more distributed fashion, in parallel across our servers. Fastly prioritizes speed, and to do this kind of processing without impacting the experience of our customers and their end users, we need to perform it at the edge. Our distributed processing and decision-making give us the power and flexibility to process, analyze, diagnose, and respond with effective solutions at the edge.

Rapid fingerprinting wouldn’t be valuable if we couldn’t quickly adapt to new attacks and immediately implement our mitigations. Our system is modular – this means that we can rapidly enhance our detection and mitigation capabilities as new classes of attacks are discovered without needing to develop an entirely new mechanism to respond. When something like the Rapid Reset attack comes along, we simply add a few new functions to our detection and response modules, which keeps our response times incredibly short, even for novel attacks.
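A modular design means a new attack class can be handled by registering one more detection function rather than building a new pipeline. The sketch below is a hypothetical illustration of that idea, not Fastly’s code. Detectors are plain functions over per-connection statistics, and covering a Rapid Reset-style pattern takes a single added function that flags connections where nearly all opened streams are cancelled by the client. The statistic names and the 100-stream / 90% thresholds are assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ConnStats:
    """Hypothetical per-connection counters collected at the edge."""
    streams_opened: int
    streams_reset_by_client: int
    requests_per_second: float


Detector = Callable[[ConnStats], bool]
DETECTORS: dict[str, Detector] = {}


def detector(name: str) -> Callable[[Detector], Detector]:
    """Decorator that registers a detection function under a name."""
    def register(fn: Detector) -> Detector:
        DETECTORS[name] = fn
        return fn
    return register


@detector("high_rps")
def high_rps(s: ConnStats) -> bool:
    return s.requests_per_second > 10_000  # hypothetical threshold


# Covering a new attack class is one more registered function.
@detector("rapid_reset")
def rapid_reset(s: ConnStats) -> bool:
    # Many streams opened, and nearly all of them cancelled by the client.
    return (s.streams_opened >= 100
            and s.streams_reset_by_client / s.streams_opened > 0.9)


def evaluate(s: ConnStats) -> list[str]:
    """Names of all registered detectors that fire for this connection."""
    return [name for name, fn in DETECTORS.items() if fn(s)]


normal = ConnStats(streams_opened=120, streams_reset_by_client=3, requests_per_second=40)
attack = ConnStats(streams_opened=50_000, streams_reset_by_client=49_990, requests_per_second=9_000)
print(evaluate(normal))  # []
print(evaluate(attack))  # ['rapid_reset']
```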

Fastly customers experience a direct benefit from this innovation, and the whole point is that they never know when it’s happening because it works! This type of automation is extremely difficult to create. It requires a truly distributed architecture like only Fastly runs, and a pool of talent that isn’t easy to assemble, but it’s worth it when it makes a noticeable difference in the quality of service we are able to provide.

Safe mitigations that ensure low false positives

Every automated system has a risk of generating false positives and blocking legitimate traffic. The industry has a long history of outages where an automated system, alone or in combination with human error, created a bad situation. But if you’re too relaxed, you start letting actual attacks through. To walk that fine line, we apply two categories of security rules on the network. Some are part of a basic rule set that is always on, always active. We consider these rules safe to run all the time without consequence: they’ve been through extensive validation and our normal code review processes. Our Attribute Unmasking rules are extremely effective, but their constantly changing nature introduces a higher risk of false positives. We tie these rules to a “distress signal” so that they are only applied while the network is actually under attack, because that is when they help with mitigation. This limits the impact of those rules and keeps them from catching false positives when attacks are not occurring.
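The two rule categories can be pictured as a rule set in which the adaptive rules are gated on the distress signal. In the hypothetical sketch below, the distress signal is modeled as traffic at least 30% above the expected baseline. That threshold is borrowed from the footnote on attack-duration data; whether Fastly’s distress signal uses the same criterion is an assumption. Rules are simple predicates over a request, and both example rules are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

Rule = Callable[[dict[str, str]], bool]  # returns True to block the request


@dataclass
class RuleSet:
    always_on: list[Rule] = field(default_factory=list)  # vetted, safe at all times
    adaptive: list[Rule] = field(default_factory=list)   # e.g. unmasked fingerprints

    def blocks(self, request: dict[str, str], under_attack: bool) -> bool:
        # Adaptive rules participate only while the distress signal is raised.
        active = self.always_on + (self.adaptive if under_attack else [])
        return any(rule(request) for rule in active)


def distress(observed_rps: float, expected_rps: float, ratio: float = 1.3) -> bool:
    """Hypothetical distress signal: traffic at least 30% above expectation."""
    return observed_rps >= ratio * expected_rps


rules = RuleSet(
    always_on=[lambda r: r.get("method") == "TRACE"],      # hypothetical static rule
    adaptive=[lambda r: r.get("ua", "").startswith("🤡")],  # hypothetical fingerprint
)
req = {"method": "GET", "ua": "🤡🤡🤡"}
print(rules.blocks(req, under_attack=distress(1_000, 1_000)))  # False
print(rules.blocks(req, under_attack=distress(5_000, 1_000)))  # True
```

The same request passes in calm conditions and is blocked during an attack, which is the trade-off described above: the riskier rules can only cause false positives while they are needed.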

Deception as an attack defense strategy

Information is power when it comes to DDoS attacks. When attackers learn something about a network or about their previous attempts, they use it to plan their next attack. It’s a cat-and-mouse game of constant evolution, and by withholding information from attackers you make them work harder to figure out whether they need to change tactics, and how they should adapt. When most platforms detect an attack, they act swiftly to close the connection on the attacker or deny access to their platform in some other way. This signals to the attacker that they’ve been discovered, and also that if they try the same approach again, it is likely to be identified and blocked even more easily.

At Fastly, we intentionally minimize the amount of information (of any form) that is sent back to the attackers. One example is that we may leave the connections open, or use other tactics that make the attacker think they have not been detected and that the attack is going as planned. When Alan Turing and the team at Bletchley Park worked to break the Enigma cipher in World War II, they knew it was important not to let on that the code had been broken, because the enemy would adapt more quickly and eat away at their advantage.

A recent example of Attribute Unmasking

Here is an interesting example of an attack that was automatically detected by Attribute Unmasking. The system detected an increase in the volume of traffic on our network, and within seconds it compiled a signature that effectively matched the curve of the attack. When we reviewed the details of the attack the next day, we looked at the headers inside the attack traffic, and the User-Agent was quite peculiar. It looked like this!

User – Agent :

🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡🤡
DATE

*Attack duration data was collected by looking at the ingress requests to the Fastly network from 2023-07-01 to 2023-10-12. The onset of an attack is registered when a 30% increase over the anticipated baseline is detected, and the attack ends when traffic is back to expected levels. We have excluded known organic traffic spikes and load testing from this dataset.
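The onset and end rule in this footnote can be expressed as a small state machine over a per-second request series. This is a sketch under stated assumptions. The expected baseline is supplied as a parallel series, since how it is forecast is not described here. “Back to expected levels” is interpreted as traffic at or below the baseline times `end_ratio`, a hypothetical parameter that defaults to 1.0.

```python
def attack_intervals(
    observed: list[float],
    expected: list[float],
    onset_ratio: float = 1.30,  # onset: 30% above the anticipated baseline
    end_ratio: float = 1.0,     # assumption: "back to expected" means <= baseline
) -> list[tuple[int, int]]:
    """Return (start, end) second indices of attacks; end is exclusive."""
    intervals: list[tuple[int, int]] = []
    start = None
    for t, (obs, exp) in enumerate(zip(observed, expected)):
        if start is None and obs >= onset_ratio * exp:
            start = t                     # onset registered
        elif start is not None and obs <= end_ratio * exp:
            intervals.append((start, t))  # traffic back to expected levels
            start = None
    if start is not None:                 # still ongoing at the end of the data
        intervals.append((start, len(observed)))
    return intervals


rps = [100, 105, 400, 900, 800, 120, 98, 101, 140, 99]
base = [100] * len(rps)
spans = attack_intervals(rps, base)
print(spans)                          # [(2, 6), (8, 9)]
print([end - s for s, end in spans])  # durations in seconds: [4, 1]
```

Using a higher onset threshold than end threshold gives hysteresis: a brief dip during an attack does not split it into many short attacks unless traffic actually returns to the baseline.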