Monitoring and aviation

• 2015-01-06 • 4 min

I’m starting to think there’s a lot that the monitoring world can learn from the aviation world, and vice versa.

Those who know me are probably starting to roll their eyes; I really like talking about airplanes. I love flying, especially when I’m the one doing it. I got my private pilot’s license many years ago, and am in the middle of a glider rating. I wish I had more time to go up.

Maybe it’s a long-shot connection, and yes, I’m biased because monitoring and aviation are two big interests of mine. But as raintank gets underway, I find myself thinking about this connection.

Today’s airplanes are obviously insanely complicated machines. We fly faster, higher and safer than ever before (save for the Concorde no longer flying). But the real magic happens in the software. As @pmarca famously said, “software is eating the world”. That’s certainly true when it comes to aviation.

Boeing’s new 787 generates terabytes of diagnostic data every single day. It’s orders of magnitude more than their previous jets. The airlines are just barely beginning to cope with this new world order.

Even the engines on everything from the A380 to little biz jets have long been reduced to an almost arcade-like experience for the sometimes solo pilot. FADEC (Full Authority Digital Engine Control) is really slick software that does what you used to need a copilot or flight engineer for, only a lot better and faster, while never making a mistake or a pass at the stewardess.

All by interpreting thousands of metrics in real time.

We’ve had “fly by wire” systems, which translate the movement of a pilot’s hand into electric signals to move the appropriate control surface, for quite a while now. But we’re moving way past that, into “inherently unstable aircraft” like the stealth fighter. The computer is now translating the pilot’s intent into a complex set of movements.

If the flight computers on a stealth fighter all had problems, no pilot would be able to fly the plane manually. That code better be pretty damn solid. It’d give a whole new meaning to “software crash”.

Planes are becoming flying apps, and pilots are just admin users.

All this automation has downsides, of course. A good example is Air France 447, which crashed in the Atlantic in 2009. The failure mode was anything but simple, and there was no single clear lesson. “Poor interface design” and “alert overload” were two major contributing factors.

Are we still talking about airplanes?

The pilots got confused. They’d lost overall situational awareness, and were overwhelmed by all the seemingly meaningless alarms. Their respective control inputs simply cancelled each other out, processed in real time as if by a tight piece of code running in a loop somewhere. Before fly by wire, that wouldn’t have even been physically possible.

These days, it isn’t a simple component or part that brings an airplane down. It is rarely even a single bad decision or event. In most cases, what we have to watch out for, and have had to deal with for a while, are the more insidious and complicated cascading failures, made far worse by more complicated, interdependent, buggy, and sometimes confusing software.

That sounds an awful lot like modern internet infrastructure, doesn’t it?

Watching departures and arrivals from the JFK observation deck while listening to ATC is a favorite pastime of mine. I keep reminding my friends just how safe commercial flying is, statistically. That’s in no small part because the software is generally very, very good, the processes are established and there is redundancy built into every part of the system. Personally, I consider it to be a ballet of sorts.

They say that flying is hour upon hour of sheer boredom, punctuated by the occasional moment of sheer terror.

To me that sounds like quite an aspirational state of affairs for infrastructure monitoring!
