6-DOF Atmospheric Rocket Landing Guidance using Meta-Reinforcement Learning
Abstract
Landing a rocket on Earth is a key factor in enabling quicker and more cost-effective access to space. However, it poses significant challenges due to the highly uncertain environment. A robust, reliable, and real-time capable Guidance, Navigation, and Control (GNC) system is essential to guide the vehicle to the landing site while meeting terminal constraints and minimizing fuel consumption. A 6-Degrees-Of-Freedom (DOF) flight simulator is developed, including accurate vehicle and environmental models such as variable Mass, Center of Mass and Inertia (MCI), winds, and detailed aerodynamics. Furthermore, the controls include not only the thrust magnitude and engine deflections but also two sets of aerodynamic fins, as found on real rockets. Finally, initial condition dispersions and uncertainties in dynamics, navigation, and controls are included to assess the robustness of the GNC strategy.
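As an illustration of how such a dispersed campaign can be set up, the sketch below samples initial conditions and model uncertainties for one Monte Carlo episode; all variable names and numerical ranges are hypothetical placeholders, not the values used in the thesis.

import numpy as np

rng = np.random.default_rng(seed=0)

def sample_dispersed_episode():
    """Draw one dispersed episode setup (illustrative values only)."""
    return {
        "r0": rng.normal([0.0, 0.0, 2000.0], [100.0, 100.0, 50.0]),  # initial position [m]
        "v0": rng.normal([0.0, 0.0, -80.0], [5.0, 5.0, 5.0]),        # initial velocity [m/s]
        "aero_scale": rng.uniform(0.9, 1.1),                          # aerodynamic coefficient uncertainty
        "wind": rng.normal(0.0, 5.0, size=3),                         # constant wind component [m/s]
        "nav_bias": rng.normal(0.0, 1.0, size=3),                     # navigation position bias [m]
    }

# e.g. a 1000-run Monte Carlo campaign over dispersed initial conditions
episodes = [sample_dispersed_episode() for _ in range(1000)]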
This study applies Meta-Reinforcement Learning (meta-RL) to the terminal rocket landing problem. Through repeated interactions between an agent and its environment over multiple episodes, a Neural Network (NN) develops a policy that maps observed states to control actions. Unlike traditional methods, meta-RL leverages both current and past observations to output the optimal action, enhancing the robustness of the policy. Recurrent Neural Networks (RNNs) such as Long Short-Term Memory (LSTM) networks are commonly used in meta-RL for their ability to handle sequential data. However, this thesis also investigates attention-based Gated Transformer-XL (GTrXL) networks, which promise to improve solution accuracy.
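To make the memory-based policy concrete, the following minimal sketch (in PyTorch, with placeholder observation and action dimensions) shows an LSTM policy that conditions its actions on the whole observation history rather than on the current state alone; the GTrXL variant would replace the recurrent module with gated Transformer-XL attention layers.

import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Minimal sketch of a memory-based (meta-RL style) policy.

    Observation and action sizes are assumptions, not the thesis values.
    """
    def __init__(self, obs_dim=13, act_dim=5, hidden=128):
        super().__init__()
        self.memory = nn.LSTM(obs_dim, hidden, batch_first=True)  # carries past observations in its hidden state
        self.head = nn.Linear(hidden, act_dim)                    # maps memory output to thrust/TVC/fin commands

    def forward(self, obs_seq, state=None):
        # obs_seq: (batch, time, obs_dim) -- the policy uses the full history, not just the latest observation
        feats, state = self.memory(obs_seq, state)
        return torch.tanh(self.head(feats)), state                # actions bounded to [-1, 1] before scaling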
The results confirm this hypothesis, demonstrating that the GTrXL-based policy significantly outperforms the LSTM-based one. The GTrXL policy is also less sensitive to internal hyperparameters, provided that the NN has a sufficient number of weights and biases to capture all the relevant features. The results indicate that incorporating complex vehicle and environmental models, along with dispersed initial conditions, aerodynamic and wind uncertainties, navigation and control errors, and actuator deflection rate constraints into the training process yields a robust GTrXL-based policy. This guidance policy meets the terminal constraints on position, horizontal velocity, vertical angle, and angular rates in 1000/1000 simulations. The only exception is the vertical velocity constraint, which is exceeded on average by only 2-3 m/s. Including a thrust rate constraint mitigates this issue, reducing the average violation to 1 m/s. Finally, a terminal patch that increases the thrust magnitude in the last few meters before touchdown completely eliminates the vertical velocity issue, producing a 100% success rate, with all Monte Carlo runs meeting all terminal constraints. Moreover, the NN produces a solution in about 6 ms, showing great potential for real-time use. The GTrXL-based policy is consistently superior to a conventional Guidance and Control (G&C) strategy in which a Linear Quadratic Regulator (LQR) controller tracks an optimal trajectory, while consuming only 6% more fuel.
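A hedged sketch of the terminal patch idea is given below: once the vehicle is within the last few meters of descent, the commanded thrust is raised to cancel the residual vertical velocity. The altitude threshold and boost factor are illustrative assumptions, not the thesis values.

def apply_terminal_patch(thrust_cmd, altitude, v_down,
                         patch_altitude=5.0, thrust_boost=1.1, thrust_max=1.0):
    """Raise the commanded thrust near touchdown to remove residual descent rate.

    thrust_cmd is a normalized command in [0, 1]; patch_altitude [m], thrust_boost,
    and thrust_max are placeholder parameters.
    """
    if altitude < patch_altitude and v_down > 0.0:          # still descending in the last few meters
        thrust_cmd = min(thrust_cmd * thrust_boost, thrust_max)
    return thrust_cmd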
Recommendations for future work include improving the reward function to better handle the vertical velocity constraint, and penalizing control effort to obtain smoother Thrust Vector Control (TVC) and fin control profiles. It is also suggested to extend the simulation scenario to include the unpowered aerodynamic descent phase.