We consider the problem of covering multiple submodular constraints. Given a finite ground set
On the Unreasonable Effectiveness of the Greedy Algorithm: Greedy Adapts to Sharpness
It is well known that the standard greedy algorithm guarantees a worstcase approximation factor of 1 − 1/e when maximizing a monotone submodular function under a cardinality constraint. However, empirical studies show that its performance is substantially better in practice. This raises a natural question of explaining this improved performance of the greedy algorithm. In this work, we define sharpness for submodular functions as a candidate explanation for this phenomenon. We show that the greedy algorithm provably performs better as the sharpness of the submodular function increases. This improvement ties in closely with the faster convergence rates of first order methods for sharp functions in convex optimization.
 Award ID(s):
 1717947
 Publication Date:
 NSFPAR ID:
 10178884
 Journal Name:
 Thirtyseventh International Conference on Machine Learning
 Sponsoring Org:
 National Science Foundation
More Like this

Abstract N , a weight function ,$$w: N \rightarrow \mathbb {R}_+$$ $w:N\to {R}_{+}$r monotone submodular functions over$$f_1,f_2,\ldots ,f_r$$ ${f}_{1},{f}_{2},\dots ,{f}_{r}$N and requirements the goal is to find a minimum weight subset$$k_1,k_2,\ldots ,k_r$$ ${k}_{1},{k}_{2},\dots ,{k}_{r}$ such that$$S \subseteq N$$ $S\subseteq N$ for$$f_i(S) \ge k_i$$ ${f}_{i}\left(S\right)\ge {k}_{i}$ . We refer to this problem as$$1 \le i \le r$$ $1\le i\le r$MultiSubmodCover and it was recently considered by HarPeled and Jones (Few cuts meet many point sets. CoRR.arxiv:abs1808.03260 HarPeled and Jones 2018) who were motivated by an application in geometry. Even with$$r=1$$ $r=1$MultiSubmodCover generalizes the wellknown Submodular Set Cover problem (SubmodSC ), and it can also be easily reduced toSubmodSC . A simple greedy algorithm gives an approximation where$$O(\log (kr))$$ $O(log(kr\left)\right)$ and this ratio cannot be improved in the general case. In this paper, motivated by several concrete applications, we consider two ways to improve upon the approximation given by the greedy algorithm. First, we give a bicriteria approximation algorithm for$$k = \sum _i k_i$$ $k={\sum}_{i}{k}_{i}$MultiSubmodCover that covers each constraint to within a factor of while incurring an approximation of$$(11/e\varepsilon )$$ $(11/e\epsilon )$ in the cost. Second, we consider the special case when each$$O(\frac{1}{\epsilon }\log r)$$ $O(\frac{1}{\u03f5}logr)$ is a obtained from a truncated coverage function and obtain an algorithm that generalizes previous work on partial set cover ($$f_i$$ ${f}_{i}$PartialSC ), covering integer programs (CIPs ) and multiple vertex cover constraintsmore » 
Mobile edge computing pushes computationallyintensive services closer to the user to provide reduced delay due to physical proximity. This has led many to consider deploying deep learning models on the edge – commonly known as edge intelligence (EI). EI services can have many model implementations that provide different QoS. For instance, one model can perform inference faster than another (thus reducing latency) while achieving less accuracy when evaluated. In this paper, we study joint service placement and model scheduling of EI services with the goal to maximize QualityofServcice (QoS) for end users where EI services have multiple implementations to serve user requests, each with varying costs and QoS benefits. We cast the problem as an integer linear program and prove that it is NPhard. We then prove the objective is equivalent to maximizing a monotone increasing, submodular set function and thus can be solved greedily while maintaining a (1 – 1/e)approximation guarantee. We then propose two greedy algorithms: one that theoretically guarantees this approximation and another that empirically matches its performance with greater efficiency. Finally, we thoroughly evaluate the proposed algorithm for making placement and scheduling decisions in both synthetic and realworld scenarios against the optimal solution and some baselines.more »

We study the problem of approximating maximum Nash social welfare (NSW) when allocating m indivisible items among n asymmetric agents with submodular valuations. The NSW is a wellestablished notion of fairness and efficiency, defined as the weighted geometric mean of agents' valuations. For special cases of the problem with symmetric agents and additive(like) valuation functions, approximation algorithms have been designed using approaches customized for these specific settings, and they fail to extend to more general settings. Hence, no approximation algorithm with factor independent of m is known either for asymmetric agents with additive valuations or for symmetric agents beyond additive(like) valuations. In this paper, we extend our understanding of the NSW problem to far more general settings. Our main contribution is two approximation algorithms for asymmetric agents with additive and submodular valuations respectively. Both algorithms are simple to understand and involve nontrivial modifications of a greedy repeated matchings approach. Allocations of high valued items are done separately by unmatching certain items and rematching them, by processes that are different in both algorithms. We show that these approaches achieve approximation factors of O(n) and O(n log n) for additive and submodular case respectively, which is independent of the number of items.more »

Diffusion of information in social network has been the focus of intense research in the recent past decades due to its significant impact in shaping public discourse through group/individual influence. Existing research primarily models influence as a binary property of entities: influenced or not influenced. While this is a useful abstraction, it discards the notion of degree of influence, i.e., certain individuals may be influenced ``more'' than others. We introduce the notion of \emph{attitude}, which, as described in social psychology, is the degree by which an entity is influenced by the information. Intuitively, attitude captures the number of distinct neighbors of an entity influencing the latter. We present an information diffusion model (AIC model) that quantifies the degree of influence, i.e., attitude of individuals, in a social network. With this model, we formulate and study attitude maximization problem. We prove that the function for computing attitude is monotonic and submodular, and the attitude maximization problem is NPHard. We present a greedy algorithm for maximization with an approximation guarantee of $(11/e)$. In the context of AIC model, we study two problems, with the aim to investigate the scenarios where attaining individuals with high attitude is objectively more important than maximizing themore »

Modern machine learning has achieved impressive prediction performance, but often sacrifices interpretability, a critical consideration in many problems. Here, we propose Fast Interpretable GreedyTree Sums (FIGS), an algorithm for fitting concise rulebased models. Specifically, FIGS generalizes the CART algorithm to simultaneously grow a flexible number of trees in a summation. The total number of splits across all the trees can be restricted by a prespecified threshold, thereby keeping both the size and number of its trees under control. When both are small, the fitted treesum can be easily visualized and written out by hand, making it highly interpretable. A partially oracle theoretical result hints at the potential for FIGS to overcome a key weakness of singletree models by disentangling additive components of generative additive models, thereby reducing redundancy from repeated splits on the same feature. Furthermore, given oracle access to optimal tree structures, we obtain l2 generalization bounds for such generative models in the case of C component functions, matching known minimax rates in some cases. Extensive experiments across a wide array of realworld datasets show that FIGS achieves stateoftheart prediction performance (among all popular rulebased methods) when restricted to just a few splits (e.g. less than 20). We findmore »