Sven Briels

Master thesis (2)

2 records found

Automated Semantic Segmentation of Aerial Imagery using Synthetic Data

Master thesis (2022) - C.A. Caceres Tocora (author), S. Du (mentor), J.E. Stoter (graduation committee member), Sven Briels (coach), W. Gao (coach)

Semantic segmentation of aerial images is the ability to assign labels to all pixels of an image. It proves to be essential for various applications such as urban planning, agriculture and real-estate analysis. Deep Learning techniques have shown satisfactory results in performing semantic segmentation tasks. Training a deep learning model is an expensive operation, while most of the time manually labelled images are required. Additionally, a bottleneck in semantic segmentation projects concerns the annotation of images. Consequently, synthetic data, which consists of images from a virtual world that simulates the real world, can be used as training data for segmentation tasks to improve the classification results. Therefore, this thesis aims to create a pipeline that generates synthetic images with semantic segmentation labels to be used in an existing deep learning model and discuss how the generated synthetic data improves the semantic segmentation of aerial images. In this research work, an existing model (FuseNet), which in previous works achieved satisfactory results, is trained with solely synthetic data and a mix of real data in different training and testing scenarios to classify true ortho imagery from Haaksbergen, Netherlands and Potsdam, Germany. In addition, a benchmark of domain adaptation techniques is performed to close the domain gap between the synthetic and real imagery. The semantic maps include building, road and other classes. Experiments are performed to test the performance of the synthetic data using 1) Different 3D models of the virtual world, 2) Different quantities of synthetic and real training data, 3) Different cross-geographical scenarios, and 4) Different domain adaptation techniques. The assessment is based on the (mean) intersection over union (IoU), F1 score, precision and recall and an extensive visual assessment. The virtual world is created through a pipeline in CityEngine using procedural modelling techniques and then rendered in Blender to create the training dataset. The results show that the synthetic data has a mIoU of 0.48, which is lower compared to cases when solely real data (0.75) are used, when the segmentation is performed in the same training and testing area. In addition, the 3D models partly affect the segmentation results. When using a mix of real and synthetic data, the results are maintained to a mIoU of 0.75. On the contrary, when training and testing in different areas, the use of synthetic data seems to improve the results on average by 21.5, 12.5, 1.5 and 2 percentage points on the mIoU, IoU for classes building, road and other respectively. Additionally, domain adaptation techniques such as Cycle GAN and Cycada improve the performance of synthetic datasets by 4 percentage points. Overall, this thesis shows that when the domain difference between the training and testing datasets is big, the addition of the synthetic data helps to improve the performance of the semantic segmentation of aerial images. Synthetic datasets improve the segmentation results by using a mix of existing labelled imagery from different geographical regions when a project lacks labelled imagery. In contrast, when labelled imagery is present in the same testing area, the real training data obtains robust results, thus the addition of synthetic data does not improve the segmentation results.

Semantic Segmentation of RGB-Z Aerial Imagery Using Convolutional Neural Networks

Master thesis (2020) - A.E. Mulder (author), B. Dukai (mentor), R.Y. Peters (graduation committee member), J.E. Stoter (graduation committee member), Sven Briels (coach), Jean Michel Renders (coach)

Semantic segmentation (or pixel-level classification) of remotely sensed imagery has shown to be useful for applications in fields as mapping of land cover, object detection, change detection and land-use analysis. Deep learning algorithms called convolutional neural networks (CNNs) have shown to outperform traditional computer vision and machine learning approaches in tackling semantic segmentation tasks. Furthermore, addition of height information (Z) to aerial imagery (RGB) is believed to improve segmentation results. However, discussion remains on the following: to what extent height information adds value; the best way to combine RGB information with height information; and what type of height information can best be used. This study aims to answer these questions. In this research work, the CNN architectures FCN-8s, SegNet, U-Net and FuseNet-SF5 are trained to semantically segment 10 cm resolution true ortho imagery of Haarlem, potentially augmented with height information. The outputted topographic maps contain the classes building, road, water and other. Experiments are conducted that allow for the comparison of 1) models trained on RGB and on RGB-Z, 2) models combining RGB and height information through data fusion and through data stacking, and 3) models trained using different types of absolute and relative height approaches. Performances are compared based on scores on the performance measure (mean) intersection over union (IoU) and through visual assessment of outputted prediction maps. The results indicated that on average segmentation performance improves by approximately 1 percent when absolute height information is added. The class building showed to benefit the most from the addition of height information. Furthermore, extracting features from height information in a separate encoder and fusing these into RGB feature maps, led to a higher overall segmentation quality than when height information is provided as a stacked extra band and processed in the same encoder as the RGB information. Finally, models using relative height delivered a higher quality segmentation than when absolute height approaches were used, especially for large objects. The best performing model; FuseNet-SF5 trained on RGB imagery and pixel-level, relative height, retrieved a mean IoU of 0.8427 and IoUs of 0.8744, 0.7865, 0.9131 and 0.7966 for the classes building, road, water and other respectively. This model was able to correctly classify over 90% of the pixels of 67% of all the objects present in the ground truth. Overall, this study showed that, when considering semantic segmentation of aerial RGB imagery, 1) height information can improve segmentation results, 2) adding height information through data fusion can result in a higher segmentation quality than when data stacking is used, and 3) providing relative height to a network, rather than absolute height, can improve semantic segmentation quality.