AutoFeat: Transitive Feature Discovery over Join Paths

Andra Ionescu, Kiril Vasilev, Florena Buse, Rihan Hai, Asterios Katsifodimos

Published in ICDE, 2024

This paper proposes a novel ranking-based feature discovery method called AutoFeat. Given a base table with a target label, AutoFeat explores multi-hop, transitive join paths to find relevant features that augment the base table, ultimately increasing the accuracy of an ML model. AutoFeat is general: it evaluates the predictive power of features without training an ML model, ranking join paths using the concepts of relevance and redundancy. Our experiments on real-world open data show that AutoFeat is efficient: on data lakes with an increased number of dataset joinability relationships, it finds features of high predictive power 5x-44x faster than baseline approaches. In addition, AutoFeat is effective, improving accuracy by 16% on average compared to the baseline approaches, even in noisy, uncurated data lakes.
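To make the idea of ranking transitive join paths by relevance and redundancy more concrete, here is a minimal illustrative sketch. It is not the authors' implementation: the function names (`rank_join_paths`, `qualify`, `pearson`), the toy tables, and the directed `edges` list are all hypothetical, and absolute Pearson correlation is used as a simple stand-in for whatever relevance and redundancy measures the paper actually employs. A feature is kept when its correlation with the label exceeds its strongest correlation with features already selected on the path, and no ML model is trained.

```python
from collections import deque
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def qualify(name, rows):
    """Prefix column names with the table name to avoid clashes after joins."""
    return [{f"{name}.{col}": val for col, val in row.items()} for row in rows]


def is_numeric(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def rank_join_paths(tables, edges, base, label, max_hops=2):
    """Breadth-first search over join paths, scored without training a model.

    Assumption: score of a feature = relevance - redundancy, where both are
    absolute Pearson correlations (a stand-in, not the paper's measures).
    """
    label_col = f"{base}.{label}"
    rows = qualify(base, tables[base])
    selected = [c for c, v in rows[0].items() if c != label_col and is_numeric(v)]
    queue = deque([([base], rows, selected, [], 0.0)])
    ranked = []
    while queue:
        path, rows, selected, added, score = queue.popleft()
        if len(path) - 1 >= max_hops:
            continue
        for left, left_col, right, right_col in edges:
            if left != path[-1] or right in path:
                continue
            right_rows = qualify(right, tables[right])
            left_key, right_key = f"{left}.{left_col}", f"{right}.{right_col}"
            index = {r[right_key]: r for r in right_rows}
            # Inner join for simplicity: unmatched rows are dropped.
            joined = [{**r, **index[r[left_key]]} for r in rows if r[left_key] in index]
            if len(joined) < 2:
                continue
            y = [r[label_col] for r in joined]
            kept, gain = [], 0.0
            for col, val in right_rows[0].items():
                if col == right_key or not is_numeric(val):
                    continue
                x = [r[col] for r in joined]
                relevance = abs(pearson(x, y))
                redundancy = max(
                    (abs(pearson(x, [r[s] for r in joined])) for s in selected + kept),
                    default=0.0,
                )
                if relevance - redundancy > 0:
                    kept.append(col)
                    gain += relevance - redundancy
            new_path = path + [right]
            ranked.append((new_path, score + gain, added + kept))
            # Keep exploring even when no feature was kept: a table with no
            # useful columns may still lead transitively to a useful one.
            queue.append((new_path, joined, selected + kept, added + kept, score + gain))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Toy data lake (hypothetical): "B" is reachable from "base" only through "A".
ids = ["a", "b", "c", "d", "e", "f"]
tables = {
    "base": [{"id": i, "age": a, "label": y}
             for i, a, y in zip(ids, [30, 40, 35, 33, 41, 36], [0, 0, 0, 1, 1, 1])],
    "A": [{"id": i, "zip": f"z{n}"} for n, i in enumerate(ids, start=1)],
    "B": [{"zip": f"z{n}", "income": v}
          for n, v in enumerate([20, 25, 22, 60, 65, 58], start=1)],
    "C": [{"id": i, "noise": v} for i, v in zip(ids, [5, 1, 4, 2, 5, 3])],
}
edges = [("base", "id", "A", "id"), ("A", "zip", "B", "zip"), ("base", "id", "C", "id")]

ranked = rank_join_paths(tables, edges, base="base", label="label")
for path, score, features in ranked:
    print(" -> ".join(path), round(score, 3), features)
```

In this toy setup the two-hop path through `A` to `B` ranks first, because `B.income` tracks the label closely while being only weakly correlated with the existing `age` feature. The direct join to `C` contributes nothing, which illustrates why following transitive paths can matter even when the intermediate table carries no useful features of its own.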

  • BibTeX:
    @inproceedings{ionescu2024autofeat,
    title={AutoFeat: Transitive Feature Discovery over Join Paths},
    author={Ionescu, Andra and Vasilev, Kiril and Buse, Florena and Hai, Rihan and Katsifodimos, Asterios},
    booktitle={2024 IEEE 40th International Conference on Data Engineering (ICDE)},
    pages={1861--1873},
    year={2024},
    organization={IEEE}
    }