How Wayfinder is Using Neural Networks for Vision-Based Autonomous Landing
It's a critical skill every student pilot learns before being sent off on their first solo flight: how to land. No navigation instruments are needed—students are taught to look out the window and learn the sight picture. Over the course of many landings, just by looking at the apparent size and shape of the runway, the pilot gains a sense of how far out they are and whether they're on the approach path. The human eye is the primary navigation sensor for students all the way up to airline pilots flying visual approaches into busy international airports.
On the other end of the spectrum, some commercial aircraft can already fly an approach, land, and roll out to taxi speeds with no pilot input, but only on runways equipped with a CAT III ILS (Instrument Landing System). ILS uses multiple radio antenna arrays installed near the runway to provide position guidance: the localizer signal allows the aircraft to determine its lateral deviation from the runway centerline, while the glideslope signal allows it to determine its vertical deviation from the ideal descent path.
However, few airports can justify the installation and maintenance cost of a CAT III ILS, which is typically installed only where low visibility demands it. Airbus commercial aircraft operate out of nearly 4,000 airports around the world. Fewer than 1,000 of those airports have any kind of ILS at all, and fewer than 100 have the CAT III variant. Clearly, if we want our aircraft to land autonomously almost anywhere, they must do so without needing navigation equipment at every airport.
Airbus’ Autonomous Taxi, Takeoff, and Landing (ATTOL) demonstrator project is working toward just that, equipping an airliner test aircraft with a suite of sensors, actuators, and computers to explore the potential of autonomy via computer vision and machine learning. As part of the collaboration between Wayfinder and ATTOL, we set out to teach the aircraft to see and navigate to the runway using deep learning, much like a human pilot would.
Traditional computer vision techniques require handcrafted feature extraction algorithms and heuristics, which are too brittle in unconstrained environments. A convolutional neural network (CNN) gradually learns a deeply layered hierarchy of features from training data, allowing it to generalize more effectively in the real world—but only if the training data is sufficiently diverse. If it has only seen sunny weather during training, it might struggle with clouds, and if it has only seen one runway, it will overfit to that runway and fail to generalize.
What we needed were landings with a variety of runways, weather, lighting, and ILS deviations. What we actually had were a few landings at Toulouse-Blagnac Airport where only a small fraction of the images had been manually annotated with runway bounding boxes, with no way to annotate distance, localizer, or glideslope values. Manual annotation clearly was not going to suffice.
We developed a geometric projection method to automate annotation using aircraft telemetry data. Commercial aircraft continuously determine their own position and attitude, from which we compute the camera's position and orientation in world coordinates. We can then geometrically transform the 3D world coordinates of runways and markings, which we extract from satellite imagery, into 2D pixel coordinates where they should appear in the image. Finally, the runway distance, localizer, and glideslope values can be directly computed from the position of the aircraft.
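To make the projection concrete, here is a minimal sketch of the idea in Python and NumPy, assuming a flat-Earth local frame centered on the runway threshold, a simple yaw/pitch/roll camera model, and a nominal 3-degree glide path. The function names, conventions, and intrinsics are illustrative assumptions for this post, not our flight code, and real ILS geometry and sign conventions differ in the details.

```python
# Illustrative sketch of telemetry-driven auto-annotation (not flight code).
import numpy as np

def world_to_camera_rotation(yaw, pitch, roll):
    """World-to-camera rotation from Z-Y-X Euler angles in radians (simplified convention)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    return (Rz @ Ry @ Rx).T  # transpose of camera-to-world

def project_points(points_world, cam_pos, R_world_to_cam, K):
    """Project Nx3 world points to Nx2 pixel coordinates (pinhole model, +z along the optical axis)."""
    pts_cam = (R_world_to_cam @ (points_world - cam_pos).T).T
    uv = (K @ pts_cam.T).T
    return uv[:, :2] / uv[:, 2:3]

def approach_labels(aircraft_pos, threshold_pos, centerline_dir, glide_path_deg=3.0):
    """Distance plus angular localizer/glideslope deviations from geometry alone.

    centerline_dir is a horizontal unit vector along the approach course; the sign
    conventions and the fixed 3-degree reference path are simplifications.
    """
    rel = aircraft_pos - threshold_pos
    dist = np.linalg.norm(rel)
    horiz = rel.copy()
    horiz[2] = 0.0
    along = np.dot(horiz, centerline_dir)
    cross = np.cross(centerline_dir, horiz)[2]             # lateral offset in meters
    localizer_dev = np.degrees(np.arctan2(cross, along))   # angular deviation from centerline
    elevation = np.degrees(np.arctan2(rel[2], np.linalg.norm(horiz)))
    glideslope_dev = elevation - glide_path_deg            # above (+) or below (-) the glide path
    return dist, localizer_dev, glideslope_dev

# Example: project the four runway corners extracted from satellite imagery and
# take their bounding box as the detection label for this frame.
#   uv = project_points(corners_world, cam_pos, R_world_to_cam, K)
#   bbox = (uv[:, 0].min(), uv[:, 1].min(), uv[:, 0].max(), uv[:, 1].max())
```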
While we waited for more datasets to arrive, we turned to the X-Plane Pro flight simulator to generate photorealistic synthetic data. To mimic the real camera, the simulator was configured to match the sensor resolution and lens characteristics, while visual features like dynamic range saturation and exposure control were tuned via post-processing shaders in ReShade. With built-in developer interfaces and an add-on ecosystem with many meticulously modeled airports, X-Plane provided a simple yet effective way to fly simulated ILS approaches into various airports and runways with customizable weather and time of day. We could even (fully intentionally, of course) fly meandering approaches that in the real world would call into question the sobriety of the test pilot, the better to cover a broader range of localizer and glideslope deviations.
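As a rough illustration of the scenario randomization this enables, the sketch below samples hypothetical approach configurations that could then be flown and recorded in the simulator. The airport list, parameter names, and ranges are placeholders for this post; they are not the actual X-Plane interface or our operational limits.

```python
# Hypothetical scenario sampler for synthetic approach generation (illustrative only).
import random

AIRPORTS = [("LFBO", "14R"), ("KSFO", "28L"), ("EDDF", "25C")]  # example airport/runway pairs
WEATHER = ["clear", "scattered", "broken", "overcast", "fog"]

def sample_approach_scenario(rng=random):
    """Draw one randomized approach configuration to fly in the simulator."""
    airport, runway = rng.choice(AIRPORTS)
    return {
        "airport": airport,
        "runway": runway,
        "weather": rng.choice(WEATHER),
        "time_of_day_hours": rng.uniform(5.0, 21.0),     # dawn through dusk
        "start_distance_nm": rng.uniform(3.0, 10.0),     # where the recorded approach begins
        # Deliberately off-nominal approaches to cover the deviation envelope:
        "localizer_offset_deg": rng.uniform(-2.0, 2.0),
        "glideslope_offset_deg": rng.uniform(-0.7, 0.7),
    }

# Each scenario would be flown via the simulator's developer interfaces while camera
# frames and the geometrically derived labels described above are recorded for training.
scenarios = [sample_approach_scenario() for _ in range(1000)]
```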
Whereas convolutional neural networks are often trained for a single task—classifying chihuahuas and muffins, for example, or detecting stop signs—we need both object detection, for drawing bounding boxes around runways and markings, and regression, for estimating distance, localizer, and glideslope values. A technique called Multi-Task Learning offers an elegant way to implement both in a single network.
The key insight is that when different tasks rely on a similar underlying understanding of the data, sharing feature extraction layers can help the network generalize better, since features extracted in shared layers have to support multiple output tasks. In our highly modified variant of Single Shot MultiBox Detector (SSD), a number of shared layers extract features about the runway environment, followed by multiple task-specific heads that output bounding boxes, bounding box classes, distance to runway, localizer deviation, and glideslope deviation.
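A heavily simplified sketch of that shared-backbone, multi-head structure might look like the following in PyTorch. The layer sizes, anchor counts, and loss weighting are placeholders for illustration; they are not our actual SSD variant.

```python
# Simplified multi-task network sketch (illustrative; not the actual modified SSD).
import torch
import torch.nn as nn

class RunwayMultiTaskNet(nn.Module):
    def __init__(self, num_anchors=6, num_classes=3):
        super().__init__()
        # Shared feature extractor: every task sees the same runway-environment features.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Detection heads: per-anchor box offsets and class scores, SSD-style.
        self.box_head = nn.Conv2d(128, num_anchors * 4, 3, padding=1)
        self.cls_head = nn.Conv2d(128, num_anchors * num_classes, 3, padding=1)
        # Regression head: pooled features -> distance, localizer, glideslope.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.reg_head = nn.Linear(128, 3)

    def forward(self, images):
        features = self.backbone(images)
        boxes = self.box_head(features)            # (B, num_anchors*4, H, W)
        classes = self.cls_head(features)          # (B, num_anchors*num_classes, H, W)
        pooled = self.pool(features).flatten(1)    # (B, 128)
        dist_loc_gs = self.reg_head(pooled)        # (B, 3): distance, localizer, glideslope
        return boxes, classes, dist_loc_gs

# Training sums a detection loss and a regression loss, e.g.
#   loss = detection_loss(boxes, classes, box_targets) + lambda_reg * mse(dist_loc_gs, labels)
# so gradients from every head flow back through, and regularize, the shared backbone.
```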
Using synthetic data, our network can reliably detect the runway from several miles away, and runway distance estimates are typically within a few percent of ground truth. Localizer and glideslope deviation estimates are more challenging and still need work, since even a 0.1 degree error is significant when the entire glideslope scale spans only ±0.7 degrees. Most promisingly, even though the network had only been trained on synthetic images, it performed surprisingly well on real images.
This is just the beginning. As we collect more real flight data, we can begin to explore domain adaptation across different combinations of synthetic and real training data, while recent developments in generative adversarial networks may allow us to blur the distinction between synthetic and real data altogether. While we currently process one frame at a time, techniques from recurrent neural networks may allow us to leverage temporal information across sequences of frames. Injecting known side-channel information, like runway dimensions, airport layouts, or nonstandard glideslopes, will improve performance across airports.
From commercial airliners to urban air taxis to unmanned aircraft, it's an exciting time for autonomy in aviation. At Wayfinder, we're researching deep learning models, scaling out data and compute infrastructure, creating realistic simulations, developing safety-critical software and hardware, and flight testing experimental systems. Together, we're building the future of autonomous flight. Come join us.
- Harvest Zhang, Head of Software, Wayfinder