Google recently faced a major outage in many parts of the world thanks to a BGP leak. This incident that was caused by a Nigerian ISP – Mainone – occurred on 12 November 2018 between 21.10 and 22.35 UTC, and was identified in tweets from the BGP monitoring service BGPMon, as well as the network monitoring provider Thousand Eyes.
Google also announced the problem through their status page:
We’ve received a report of an issue with Google Cloud Networking as of Monday, 2018-11-12 14:16 US/Pacific. We have reports of Google Cloud IP addresses being erroneously advertised by internet service providers other than Google. We will provide more information by Monday, 2018-11-12 15:00 US/Pacific.
In order to understand this issue, MainOne Inc (AS37282) is peering at IXPN (Internet Exchange Point of Nigeria) in Lagos where Google (AS151169) and China Telecom (AS4809) are also members.
Google (AS15169) advertise their prefixes (more than 500) through the IXPN Route Server, where PCH (Packet Clearing House) collects a daily snapshot of BGP announcements of IXPN. Unfortunately, 212 prefixes (aggregates of those 500+ announcements) from Google were leaked, which was recorded by BGPMon and RIPEstat.
Looking at the RIPE stats it is evident that the first announcement via MainOne Inc (AS37282) was recorded at 21:12 UTC and the issue lasted for more than an hour:
As per the tweet from BGPMon, the issues lasted for 74 minutes:
Looking at the circumstances around this incident, it’s likely this was an inadvertent leak from MainOne caused by a configuration mistake. A Google representative is quoted in ArsTechnical as saying “officials suspect the leak was accidental and not a malicious hijack”, and also added that the affected traffic was encrypted which limited the harm that could result from malicious hijackings.
Later in the same day, the MainOne Twitter account posted on the BGPMon analysis thread, accepting the mistake and assuring the world that corrective measures are now in place:
So this was a configuration mistake that was quickly rectified and didn’t cause any reported financial damage (even though service outages do cause financial and reputational damage to the service provider and its users), but it does demonstrate the problems that can be caused by accidental mistakes, and especially how an actor with bad intent could do a great deal of damage as with the Amazon Route 53 hijack. It therefore illustrates why greater efforts need to be made towards improving the security and resilience of the Internet.
This BGP leak could have been easily avoided if proper prefix filtering had been undertaken by MainOne (AS37282) or China Telecom (AS4809). It is very difficult for the networks in the middle to block such leaks, because the prefixes are still legitimately originating from the correct AS number (in this scenario AS15169 – Google).
As mentioned in many previous blogs, Mutually Agreed Norms for Routing Security (MANRS) can be part of the solution here. It calls for four simple but concrete actions that ALL network operators should implement to reduce the most common routing threats, including filtering which prevents the propagation of incorrect routing information (the other three are anti-spoofing, address validation, and global coordination).
Network operators have a responsibility to ensure a globally robust and secure routing infrastructure, and your network’s safety depends on a routing infrastructure that weeds out accidental misconfigurations and bad actors. The more network operators who work together, the fewer incidents there will be, and the less damage they can do. It’s time to implement the MANRS actions now!