AutoFeat: Transitive Feature Discovery over Join Paths
Andra Ionescu, Kiril Vasilev, Florena Buse, Rihan Hai, Asterios Katsifodimos
Published in ICDE, 2024
This paper proposes a novel ranking-based feature discovery method called AutoFeat. Given a base table with a target label, AutoFeat explores multi-hop, transitive join paths to find relevant features in order to augment the base table with additional features, ultimately leading to increased accuracy of an ML model. AutoFeat is general: it evaluates the predictive power of features without the need to train an ML model, ranking join paths using the concepts of relevance and redundancy. Our experiments on real-world open data show that AutoFeat is efficient: it can find features of high predictive power on data lakes with an increased number of dataset joinability relationships 5x-44x faster than baseline approaches. In addition, AutoFeat is effective, improving accuracy by 16% on average compared to the baseline approaches, even in noisy, uncurated data lakes.
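As a rough illustration of the idea in the abstract, the sketch below enumerates multi-hop join paths from a base table and ranks them with a relevance and redundancy score, without training a model. It is not the authors' implementation. The helper names (`pearson`, `join_paths`, `join_along`, `rank_paths`), the toy tables, the thresholds `min_rel` and `max_red`, and the use of absolute Pearson correlation as a stand-in for both relevance and redundancy are all assumptions made for this example.

```python
from collections import deque

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5

def join_paths(graph, base, max_hops):
    """Enumerate acyclic join paths of up to max_hops joins, breadth-first."""
    paths, queue = [], deque([[base]])
    while queue:
        path = queue.popleft()
        if len(path) > 1:
            paths.append(path)
        if len(path) - 1 < max_hops:
            for nxt in graph.get(path[-1], []):
                if nxt not in path:
                    queue.append(path + [nxt])
    return paths

def join_along(tables, edges, path):
    """Left-join the tables on a path, keeping one output row per base row."""
    rows = [dict(r) for r in tables[path[0]]]
    for left, right in zip(path, path[1:]):
        lcol, rcol = edges[(left, right)]
        index = {r[rcol]: r for r in tables[right]}
        # Existing columns win on name clashes, so base columns are never overwritten.
        rows = [{**index.get(row.get(lcol), {}), **row} for row in rows]
    return rows

def rank_paths(tables, edges, base, label, max_hops=2, min_rel=0.3, max_red=0.9):
    """Score each join path by the relevance and redundancy of the features it adds."""
    graph = {}
    for left, right in edges:
        graph.setdefault(left, []).append(right)
    y = [r[label] for r in tables[base]]
    base_cols = set(tables[base][0])
    ranked = []
    for path in join_paths(graph, base, max_hops):
        rows = join_along(tables, edges, path)
        selected, score = [], 0.0
        for col in sorted(set().union(*rows) - base_cols):
            vals = [r.get(col) for r in rows]
            if not all(isinstance(v, (int, float)) for v in vals):
                continue  # this sketch skips non-numeric or incomplete columns
            rel = abs(pearson(vals, y))  # relevance to the label
            red = max((abs(pearson(vals, s)) for _, s in selected), default=0.0)
            if rel >= min_rel and red <= max_red:
                selected.append((col, vals))
                score += rel - red  # assumed scoring rule, chosen for illustration
        ranked.append((score, path, [c for c, _ in selected]))
    return sorted(ranked, key=lambda t: -t[0])

# Toy data lake (hypothetical): base -> cities -> regions.
tables = {
    "base": [{"id": i, "city": c, "label": y}
             for i, (c, y) in enumerate([("A", 0), ("B", 1), ("C", 0), ("D", 1)])],
    "cities": [{"city": "A", "region": "R1", "noise": 3},
               {"city": "B", "region": "R2", "noise": 1},
               {"city": "C", "region": "R3", "noise": 1},
               {"city": "D", "region": "R4", "noise": 3}],
    "regions": [{"region": "R1", "gdp": 1.0}, {"region": "R2", "gdp": 9.0},
                {"region": "R3", "gdp": 2.0}, {"region": "R4", "gdp": 8.0}],
}
edges = {("base", "cities"): ("city", "city"),
         ("cities", "regions"): ("region", "region")}

ranked = rank_paths(tables, edges, base="base", label="label")
for score, path, features in ranked:
    print(round(score, 3), " -> ".join(path), features)
```

In this toy setup the only predictive column, `gdp`, is reachable solely through the transitive path `base -> cities -> regions`, so that path ranks first while the one-hop path contributes no features. Filtering on correlation rather than on model accuracy is what keeps the sketch model-agnostic.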
- BibTeX:
@inproceedings{ionescu2024autofeat,
  title={AutoFeat: Transitive Feature Discovery over Join Paths},
  author={Ionescu, Andra and Vasilev, Kiril and Buse, Florena and Hai, Rihan and Katsifodimos, Asterios},
  booktitle={2024 IEEE 40th International Conference on Data Engineering (ICDE)},
  pages={1861--1873},
  year={2024},
  organization={IEEE}
}