skip to main content

Title: On the Unreasonable Effectiveness of the Greedy Algorithm: Greedy Adapts to Sharpness
It is well known that the standard greedy algorithm guarantees a worst-case approximation factor of 1 − 1/e when maximizing a monotone submodular function under a cardinality constraint. However, empirical studies show that its performance is substantially better in practice. This raises a natural question of explaining this improved performance of the greedy algorithm. In this work, we define sharpness for submodular functions as a candidate explanation for this phenomenon. We show that the greedy algorithm provably performs better as the sharpness of the submodular function increases. This improvement ties in closely with the faster convergence rates of first order methods for sharp functions in convex optimization.
Authors:
; ;
Award ID(s):
1717947
Publication Date:
NSF-PAR ID:
10178884
Journal Name:
Thirty-seventh International Conference on Machine Learning
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    We consider the problem of covering multiple submodular constraints. Given a finite ground setN, a weight function$$w: N \rightarrow \mathbb {R}_+$$w:NR+,rmonotone submodular functions$$f_1,f_2,\ldots ,f_r$$f1,f2,,froverNand requirements$$k_1,k_2,\ldots ,k_r$$k1,k2,,krthe goal is to find a minimum weight subset$$S \subseteq N$$SNsuch that$$f_i(S) \ge k_i$$fi(S)kifor$$1 \le i \le r$$1ir. We refer to this problem asMulti-Submod-Coverand it was recently considered by Har-Peled and Jones (Few cuts meet many point sets. CoRR.arxiv:abs1808.03260Har-Peled and Jones 2018) who were motivated by an application in geometry. Even with$$r=1$$r=1Multi-Submod-Covergeneralizes the well-known Submodular Set Cover problem (Submod-SC), and it can also be easily reduced toSubmod-SC. A simple greedy algorithm gives an$$O(\log (kr))$$O(log(kr))approximation where$$k = \sum _i k_i$$k=ikiand this ratio cannot be improved in the general case. In this paper, motivated by several concrete applications, we consider two ways to improve upon the approximation given by the greedy algorithm. First, we give a bicriteria approximation algorithm forMulti-Submod-Coverthat covers each constraint to within a factor of$$(1-1/e-\varepsilon )$$(1-1/e-ε)while incurring an approximation of$$O(\frac{1}{\epsilon }\log r)$$O(1ϵlogr)in the cost. Second, we consider the special case when each$$f_i$$fiis a obtained from a truncated coverage function and obtain an algorithm that generalizes previous work on partial set cover (Partial-SC), covering integer programs (CIPs) and multiple vertex cover constraintsmore »Bera et al. (Theoret Comput Sci 555:2–8 Bera et al. 2014). Both these algorithms are based on mathematical programming relaxations that avoid the limitations of the greedy algorithm. We demonstrate the implications of our algorithms and related ideas to several applications ranging from geometric covering problems to clustering with outliers. Our work highlights the utility of the high-level model and the lens of submodularity in addressing this class of covering problems.

    « less
  2. Mobile edge computing pushes computationally-intensive services closer to the user to provide reduced delay due to physical proximity. This has led many to consider deploying deep learning models on the edge – commonly known as edge intelligence (EI). EI services can have many model implementations that provide different QoS. For instance, one model can perform inference faster than another (thus reducing latency) while achieving less accuracy when evaluated. In this paper, we study joint service placement and model scheduling of EI services with the goal to maximize Quality-of-Servcice (QoS) for end users where EI services have multiple implementations to serve user requests, each with varying costs and QoS benefits. We cast the problem as an integer linear program and prove that it is NP-hard. We then prove the objective is equivalent to maximizing a monotone increasing, submodular set function and thus can be solved greedily while maintaining a (1 – 1/e)-approximation guarantee. We then propose two greedy algorithms: one that theoretically guarantees this approximation and another that empirically matches its performance with greater efficiency. Finally, we thoroughly evaluate the proposed algorithm for making placement and scheduling decisions in both synthetic and real-world scenarios against the optimal solution and some baselines.more »In the real-world case, we consider real machine learning models using the ImageNet 2012 data-set for requests. Our numerical experiments empirically show that our more efficient greedy algorithm is able to approximate the optimal solution with a 0.904 approximation on average, while the next closest baseline achieves a 0.607 approximation on average.« less
  3. We study the problem of approximating maximum Nash social welfare (NSW) when allocating m indivisible items among n asymmetric agents with submodular valuations. The NSW is a well-established notion of fairness and efficiency, defined as the weighted geometric mean of agents' valuations. For special cases of the problem with symmetric agents and additive(-like) valuation functions, approximation algorithms have been designed using approaches customized for these specific settings, and they fail to extend to more general settings. Hence, no approximation algorithm with factor independent of m is known either for asymmetric agents with additive valuations or for symmetric agents beyond additive(-like) valuations. In this paper, we extend our understanding of the NSW problem to far more general settings. Our main contribution is two approximation algorithms for asymmetric agents with additive and submodular valuations respectively. Both algorithms are simple to understand and involve non-trivial modifications of a greedy repeated matchings approach. Allocations of high valued items are done separately by un-matching certain items and re-matching them, by processes that are different in both algorithms. We show that these approaches achieve approximation factors of O(n) and O(n log n) for additive and submodular case respectively, which is independent of the number of items.more »For additive valuations, our algorithm outputs an allocation that also achieves the fairness property of envy-free up to one item (EF1). Furthermore, we show that the NSW problem under submodular valuations is strictly harder than all currently known settings with an e/(e-1) factor of the hardness of approximation, even for constantly many agents. For this case, we provide a different approximation algorithm that achieves a factor of e/(e-1), hence resolving it completely.« less
  4. Diffusion of information in social network has been the focus of intense research in the recent past decades due to its significant impact in shaping public discourse through group/individual influence. Existing research primarily models influence as a binary property of entities: influenced or not influenced. While this is a useful abstraction, it discards the notion of degree of influence, i.e., certain individuals may be influenced ``more'' than others. We introduce the notion of \emph{attitude}, which, as described in social psychology, is the degree by which an entity is influenced by the information. Intuitively, attitude captures the number of distinct neighbors of an entity influencing the latter. We present an information diffusion model (AIC model) that quantifies the degree of influence, i.e., attitude of individuals, in a social network. With this model, we formulate and study attitude maximization problem. We prove that the function for computing attitude is monotonic and sub-modular, and the attitude maximization problem is NP-Hard. We present a greedy algorithm for maximization with an approximation guarantee of $(1-1/e)$. In the context of AIC model, we study two problems, with the aim to investigate the scenarios where attaining individuals with high attitude is objectively more important than maximizing themore »attitude of the entire network. In the first problem, we introduce the notion of \emph{actionable attitude}; intuitively, individuals with actionable attitude are likely to ``act'' on their attained attitude. We show that the function for computing actionable attitude, unlike that for computing attitude, is non-submodular and however is \emph{approximately submodular}. We present approximation algorithm for maximizing actionable attitude in a network. In the second problem, we consider identifying the number of individuals in the network with attitude above a certain value, a threshold. In this context, the function for computing the number of individuals with attitude above a given threshold induced by a seed set is \emph{neither submodular nor supermodular}. We present heuristics for realizing the solution to the problem. We experimentally evaluated our algorithms and studied empirical properties of the attitude of nodes in network such as spatial and value distribution of high attitude nodes.« less
  5. Modern machine learning has achieved impressive prediction performance, but often sacrifices interpretability, a critical consideration in many problems. Here, we propose Fast Interpretable Greedy-Tree Sums (FIGS), an algorithm for fitting concise rule-based models. Specifically, FIGS generalizes the CART algorithm to simultaneously grow a flexible number of trees in a summation. The total number of splits across all the trees can be restricted by a pre-specified threshold, thereby keeping both the size and number of its trees under control. When both are small, the fitted tree-sum can be easily visualized and written out by hand, making it highly interpretable. A partially oracle theoretical result hints at the potential for FIGS to overcome a key weakness of single-tree models by disentangling additive components of generative additive models, thereby reducing redundancy from repeated splits on the same feature. Furthermore, given oracle access to optimal tree structures, we obtain l2 generalization bounds for such generative models in the case of C component functions, matching known minimax rates in some cases. Extensive experiments across a wide array of real-world datasets show that FIGS achieves state-of-the-art prediction performance (among all popular rule-based methods) when restricted to just a few splits (e.g. less than 20). We findmore »empirically that FIGS is able to avoid repeated splits, and often provides more concise decision rules than fitted decision trees, without sacrificing predictive performance. All code and models are released in a full-fledged package on Github.« less