Genetic programming for multiple-feature construction on high-dimensional classification
Journal contribution, posted on 2020-12-10, 02:32, authored by Binh Tran, Bing Xue, Mengjie Zhang
© 2019 Elsevier Ltd
Data representation is an important factor in determining the performance of machine learning algorithms, including classification algorithms. Feature construction (FC) can combine original features to form high-level features that help classification algorithms achieve better performance. Genetic programming (GP) has shown promise in FC due to its flexible representation. Most GP methods construct a single feature, which may not scale well to high-dimensional data. This paper aims to investigate different approaches to constructing multiple features and to analyse their effectiveness, efficiency, and underlying behaviours, in order to gain insight into multiple-feature construction using GP on high-dimensional data. The results show that multiple-feature construction achieves significantly better performance than single-feature construction. In multiple-feature construction, the multi-tree GP representation is shown to be more effective than the single-tree representation because it can account for the interactions among the newly constructed features during the construction process. Class-dependent constructed features achieve better performance than class-independent ones. A visualisation of the constructed features also demonstrates the interpretability of the GP-based FC approach, which is important in many real-world applications.
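The abstract contrasts single-tree GP, where one individual builds one feature, with multi-tree GP, where one individual holds several trees and so builds several features at once. The following is only a minimal sketch, under stated assumptions, of how a multi-tree individual could map original features to constructed ones. The tuple-based tree encoding, the operator set, the protected-division rule, and the names `evaluate` and `construct_features` are hypothetical illustrations, not the paper's implementation. Evolution (selection, crossover, mutation) and fitness evaluation are deliberately omitted.

```python
from typing import List, Sequence, Union

# Hypothetical tree encoding (an assumption for illustration only):
#   int   -> index of an original feature (terminal)
#   float -> constant (terminal)
#   tuple -> (operator_name, left_subtree, right_subtree)
Tree = Union[int, float, tuple]


def protected_div(a: float, b: float) -> float:
    # A common GP convention: return 1.0 instead of dividing by (near) zero.
    return a / b if abs(b) > 1e-9 else 1.0


# Assumed function set; real GP-based FC methods may use different operators.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": protected_div,
}


def evaluate(tree: Tree, x: Sequence[float]) -> float:
    """Evaluate one GP tree on a single instance x of original features."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    if isinstance(tree, int):
        return float(x[tree])  # terminal: original feature x[tree]
    return float(tree)  # terminal: constant


def construct_features(individual: List[Tree],
                       X: List[Sequence[float]]) -> List[List[float]]:
    """Multi-tree individual: each tree yields one constructed feature,
    so the new representation has len(individual) columns."""
    return [[evaluate(tree, x) for tree in individual] for x in X]


# Usage: a hypothetical two-tree individual applied to two 4-feature instances.
individual = [("mul", 0, 2), ("sub", 1, ("div", 3, 2.0))]
X = [[1.0, 2.0, 3.0, 4.0], [0.5, 1.0, 2.0, 6.0]]
print(construct_features(individual, X))  # [[3.0, 0.0], [1.0, -2.0]]
```

A single-tree individual corresponds to a list holding just one tree, which produces one constructed feature per instance. With several trees in the same individual, the constructed features are created and evaluated together, which is what allows their interactions to be considered during construction.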