
It’s not DNS
There is no way it’s DNS
It was DNS

@320x200
From: theguardian.com/technology/202

"Facebook (accidentally, we assume) sent an update to a deep-level routing protocol on the internet that said, basically, "hey we don't have any servers any more xoxo"

— alex hern (@alexhern) October 4, 2021"

@wendy the best part is how the network crash had a physical side effect: employees couldn't access their offices because the security checks were out of order 😆

@mara
:yikes:
As the Guardian’s UK technology editor, Alex Hern, put it on Twitter,
“Facebook runs EVERYTHING through Facebook”
:unacceptable:

@wendy @mara At least, they’re eating their own dogshit. That’s probably the only thing I respect from that company, even if, from a resilience perspective, that wasn’t the smartest move in this case.

@xuv @wendy @mara But it seems like some pretty poor systems engineering. Now honestly, I have no idea how their network is organized, but when I worked with this kind of tech, way back in the 90s, orgs of this size typically had an out-of-band (OOB) network with terminal servers, so that you could (or in some cases were required to) connect to a console over a serial cable even if the device had a completely borked config.

If routes or network interfaces were "flapping" up and down, you could temporarily shut them down while you fixed the config. And you always had a copy of the old config saved before you did any changes, so worst case you could just paste in the whole old config while you figured out where the problem was.
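
A minimal sketch of that save-then-paste-back habit, in Python. The `Device` class, its `healthy` check, and the config strings are hypothetical stand-ins for illustration, not any vendor's API and certainly not how FB does it:

```python
class Device:
    """Hypothetical stand-in for a router reached over a console line."""

    def __init__(self, config):
        self.running_config = config

    def healthy(self):
        # Assumption for illustration only: a config counts as healthy
        # if it still announces at least one route.
        return "announce" in self.running_config


def apply_with_rollback(device, new_config):
    """Save the old config, apply the new one, restore the old one if the check fails."""
    saved = device.running_config       # copy of the old config, taken before any change
    device.running_config = new_config
    if not device.healthy():
        device.running_config = saved   # worst case: paste the whole old config back
        return False
    return True
```

The point is only the ordering: the copy is taken before the change, so the way back never depends on the new config working.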

Badge authentication systems should not have the same failure modes as your customer network!

I don't know if this mess is a result of laziness, naiveté, or just poor network architecture, but this kind of thing should be close to impossible. I am just speculating, but it smells like hubris that is the product of some web-scale cowboy culture.

@praxeology @xuv @wendy We can indeed only speculate, so here's mine: All FB systems are cloud-based, and the outage lasted as long as it took a devops team to fly over to CA and reset the servers with the old config.

As for the cowboy idiom you used, I didn't get it. Do you mean the story of the network routing failure is just gossip?

post.lurk.org
