Fast Diversified Top-k Rule Discovery via User-Guided Embeddings

2026-04-15

 

Ziyan Han, Wanjia Chen, Yunpeng Han, Rui Mao, and Jianbin Qin*

Published at TKDE 2026


Abstract:

Rule discovery is a fundamental task in data analysis, with broad applications in data cleaning, knowledge extraction, and decision making. However, existing methods often generate a large number of functionally redundant rules at a high time cost. To address this, a recent line of work was the first to introduce diversified top-k rule discovery, which aims to identify a set of top-ranked rules that are both relevant and diverse. Despite this advance, that approach still suffers from high user interaction overhead and computational inefficiency, and it cannot handle the common scenario of selecting a diverse subset from an existing rule set. In this paper, we propose a user-friendly and efficient framework for diversified top-k rule discovery. As a testbed, we consider Entity Enhancing Rules (REEs), which subsume common association rules and data quality rules as special cases. Our method lets users specify lightweight preference templates, which are used to train a correlation model that captures user preferences and generates subjective embeddings for predicates and rules. Based on these embeddings, we define an objective function that jointly measures the relevance and diversity of rules in a unified vector space. We then formulate and study two key problems: (i) selecting diversified top-k rules from an existing redundant rule set, and (ii) discovering diversified top-k rules directly from raw data. We prove that both problems are intractable and propose effective algorithms for each. Because the second problem is more challenging, we further optimize its solution with carefully designed pruning strategies and parallel optimization. Extensive evaluation on real-world datasets demonstrates that our algorithms consistently identify top-ranked relevant and diverse rules, achieving an average 14.4× speedup (up to 35.57×) over the state-of-the-art method.
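
To make problem (i) concrete, the sketch below selects k rules from an existing rule set by trading off relevance against redundancy in one embedding space. It is not the paper's algorithm or objective function, which the abstract does not specify. It is a generic greedy, maximal-marginal-relevance style heuristic, and the names `select_diversified_top_k`, `rule_embeddings`, `preference`, and `trade_off` are all hypothetical, as is the choice of cosine similarity.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_diversified_top_k(rule_embeddings, preference, k, trade_off=0.5):
    """Greedily pick up to k rule names, balancing relevance and diversity.

    Assumptions (not taken from the paper):
      - rule_embeddings maps a rule name to its embedding vector,
      - preference is a vector in the same space standing in for the user,
      - relevance is cosine similarity to the preference vector,
      - redundancy is the highest cosine similarity to any rule already chosen.
    """
    candidates = set(rule_embeddings)
    selected = []
    while candidates and len(selected) < k:

        def gain(name):
            relevance = cosine(rule_embeddings[name], preference)
            redundancy = max(
                (cosine(rule_embeddings[name], rule_embeddings[s]) for s in selected),
                default=0.0,
            )
            return trade_off * relevance - (1.0 - trade_off) * redundancy

        # Iterate in sorted order so that ties are broken deterministically.
        best = max(sorted(candidates), key=gain)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `trade_off=1.0` the function reduces to plain top-k by relevance. Lower values penalize rules whose embeddings lie close to ones already chosen, which is the intuition behind suppressing functionally redundant rules.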

