This work explores the possibility of incorporating depth information into a deep neural network to improve the accuracy of RGB instance segmentation. The baseline of this work is semantic instance segmentation with a discriminative loss function. The baseline work proposes a novel discriminative loss function with which the semantic network can learn an n-D embedding for all pixels belonging to instances. Embeddings of the same instance are attracted to their own center, while centers of different instances repulse each other. Two margins bound the attraction and repulsion, namely the in-margin and the out-margin. A post-processing clustering step is required to infer instance labels from the embeddings; its key parameter, the bandwidth, acts as the clustering threshold. The contributions of this thesis are several new methods for incorporating depth information into the baseline work. One simple method, referred to as scaling, adds scaled depth directly to the RGB embeddings. Through theoretical analysis and experiments, this work also proposes that depth pixels can be encoded into 1-D embeddings with the same discriminative loss function and then combined with the RGB embeddings; the explored combination methods are fusion and concatenation. Additionally, two depth pre-processing methods are proposed: replication and coloring. The experimental results show that both scaling and fusion lead to significant improvements over the baseline, while concatenation contributes more to classes that share many similarities.
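For reference, the baseline's discriminative loss can be sketched as follows, where $x_i$ denotes a pixel embedding, $\mu_c$ the mean embedding of instance $c$, $C$ the number of instances, $N_c$ the number of pixels in instance $c$, $\delta_v$ the in-margin, $\delta_d$ the out-margin, and $[x]_+ = \max(0, x)$; the weights $\alpha$, $\beta$, $\gamma$ and the exact notation follow the commonly cited formulation of the baseline and may differ slightly from the notation used later in this thesis:

\begin{align}
L_{\mathrm{var}}  &= \frac{1}{C}\sum_{c=1}^{C}\frac{1}{N_c}\sum_{i=1}^{N_c}\big[\lVert \mu_c - x_i \rVert - \delta_v\big]_+^{2} \\
L_{\mathrm{dist}} &= \frac{1}{C(C-1)}\sum_{c_A=1}^{C}\sum_{\substack{c_B=1 \\ c_B \neq c_A}}^{C}\big[2\delta_d - \lVert \mu_{c_A} - \mu_{c_B} \rVert\big]_+^{2} \\
L_{\mathrm{reg}}  &= \frac{1}{C}\sum_{c=1}^{C}\lVert \mu_c \rVert \\
L &= \alpha\, L_{\mathrm{var}} + \beta\, L_{\mathrm{dist}} + \gamma\, L_{\mathrm{reg}}
\end{align}

The first term pulls pixel embeddings toward their instance center once they lie farther than the in-margin, the second pushes different instance centers apart until they are separated by at least twice the out-margin, and the third regularizes the centers toward the origin.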