<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Ticketed learning--unlearning schemes</title></titleStmt>
			<publicationStmt>
				<publisher>Conference on Learning Theory</publisher>
				<date>07/01/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10526848</idno>
					<idno type="doi"></idno>
					
					<author>Badih Ghazi</author><author>Pritish Kamath</author><author>Ravi Kumar</author><author>Pasin Manurangsi</author><author>Ayush Sekhari</author><author>Chiyuan Zhang</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[We consider the learning-unlearning paradigm defined as follows. First, given a dataset, the goal is to learn a good predictor, such as one minimizing a certain loss. Subsequently, given any subset of examples that wish to be unlearnt, the goal is to learn, without the knowledge of the original training dataset, a good predictor that is identical to the predictor that would have been produced when learning from scratch on the surviving examples. We propose a new ticketed model for learning-unlearning wherein the learning algorithm can send back additional information in the form of a small-sized (encrypted) "ticket" to each participating training example, in addition to retaining a small amount of "central" information for later. Subsequently, the examples that wish to be unlearnt present their tickets to the unlearning algorithm, which additionally uses the central information to return a new predictor. We provide space-efficient ticketed learning-unlearning schemes for a broad family of concept classes, including thresholds, parities, and intersection-closed classes, among others. En route, we introduce the count-to-zero problem, where during unlearning, the goal is to simply know if there are any examples that survived. We give a ticketed learning-unlearning scheme for this problem that relies on the construction of Sperner families with certain properties, which might be of independent interest.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Machine learning models trained on user data have become widespread in applications. While these models have proved to be highly valuable, there is an increasing demand for ensuring that they respect the consent of the users and the privacy of their data. One of the simplest and most common challenges is how to update a model when a user wishes to drop out of the training data. Re-learning the model from scratch without this user's data is a natural approach, but this can be computationally prohibitive. Designing alternative approaches which aim to "mimic" this natural approach, without the computational overhead, has come to be known as the problem of machine "unlearning", a topic of growing interest.</p><p>Informally, the ideal learning-unlearning (LU) paradigm can be modeled as follows. An agent gets a sequence of learning and unlearning requests. Each such request is accompanied by a dataset of examples. At any point, the agent must be able to produce a predictor, the distribution of which must be identical to, or at least "close to", that of the same agent on a single learning request containing only the surviving examples, namely, all examples in learning requests that have not been present in any subsequent unlearning request. A naive approach to keep in mind is an agent that explicitly keeps track of all surviving examples at any point, and returns the predictor obtained by simulating a single learning request on the surviving examples. However, the space requirement of such an agent is linear in the number of examples (as it needs to store all the examples to simulate re-training). The goal in this paper is to understand:</p><p>When are space-efficient learning-unlearning schemes possible?</p><p>Unlearning has been an active area of research. <ref type="bibr">Cao and Yang (2015)</ref> initiated the study through exact unlearning. 
Their definition requires an algorithm to have identical outputs on a dataset after deleting a point, as if that point was never inserted; their algorithms are restricted to very structured problems. Several later works studied unlearning algorithms for empirical risk minimization <ref type="bibr">(Guo et al., 2020;</ref><ref type="bibr">Izzo et al., 2021;</ref><ref type="bibr">Neel et al., 2021;</ref><ref type="bibr">Ullah et al., 2021;</ref><ref type="bibr">Thudi et al., 2022;</ref><ref type="bibr">Graves et al., 2021;</ref><ref type="bibr">Bourtoule et al., 2021)</ref>. These works, however, focus exclusively on the time complexity of unlearning. Moreover, many of these provably effective unlearning methods have large space requirements to enable deletion, e.g., they store space-intensive check-pointing data structures <ref type="bibr">(Ullah et al., 2021)</ref>, large ensembles of models trained on subsets of data <ref type="bibr">(Bourtoule et al., 2021)</ref>, extensive statistics about the learning process <ref type="bibr">(Thudi et al., 2022;</ref><ref type="bibr">Neel et al., 2021)</ref>, or at the very least the entire (surviving) training dataset. This additional space overhead can be impractical for various applications. In contrast, the primary focus of our paper is to understand the space complexity of unlearning and to develop space-efficient learning-unlearning schemes for general function classes.</p><p>The model of learning-unlearning that has been considered in the above-mentioned prior works can be deemed the "central" model, where the agent responsible for unlearning has to remember all additional information beyond the learnt predictor that might be required for unlearning later.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1.">Our Contributions</head><p>In Section 2, we introduce the notion of a ticketed learning-unlearning (TiLU) scheme, which consists of a learning and an unlearning algorithm. The learning algorithm, given a training dataset, produces a predictor, as well as a "ticket" corresponding to each example and some additional "central" information. The unlearning algorithm, given a subset of the training examples accompanied by their corresponding tickets, and the previously generated central information, produces an updated predictor (that realizes the unlearning guarantee).</p><p>In Section 3, we show that certain limitations of the central model of learning-unlearning can be overcome by the ticketed model. In particular, we provide TiLU schemes for a broad family of concept classes that includes several commonly studied classes such as one-dimensional thresholds, products of thresholds, and parities. In Section 4, we provide improved TiLU schemes with even better space complexity bounds for products of thresholds, as well as for point functions, which are notably not covered by the techniques in Section 3.</p><p>Underlying our improvements in Section 4 is a basic primitive of count-to-zero. The goal in this problem is simply to determine whether the unlearning request contains precisely all the examples in the original learning request or not. We give a TiLU scheme for this problem with space complexity that scales as the log of the inverse-Ackermann function (see Definition 9) of the number of examples. This relies on a novel construction of size-indexing Sperner families, which we believe to be of independent interest. We also prove a lower bound on such families in Section 5, and use it to show that the space complexity of TiLU schemes for any non-trivial concept class must necessarily increase with the number of examples.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2.">Other Related Work</head><p>For specific learning models like SVMs, various algorithms for exact unlearning have been considered under the framework of decremental learning <ref type="bibr">(Cauwenberghs and Poggio, 2000;</ref><ref type="bibr">Tveit et al., 2003;</ref><ref type="bibr">Karasuyama and Takeuchi, 2010;</ref><ref type="bibr">Romero et al., 2007)</ref>. However, these works do not enjoy any guarantees on the space requirements for unlearning. The primary motivation in these works is to use the decremental learning framework to empirically estimate the leave-one-out error in order to provide generalization guarantees for the learned model.</p><p>We also note that beyond exact unlearning, various probabilistic/approximate notions of unlearning, which are in turn inspired by the notions in differential privacy <ref type="bibr">(Dwork et al., 2006)</ref>, have been considered <ref type="bibr">(Ginart et al., 2019;</ref><ref type="bibr">Sekhari et al., 2021;</ref><ref type="bibr">Gupta et al., 2021;</ref><ref type="bibr">Chourasia et al., 2023)</ref>, and there has been an extensive effort to develop (approximate) unlearning algorithms for various problem settings. These include unlearning in deep neural networks <ref type="bibr">(Du et al., 2019;</ref><ref type="bibr">Golatkar et al., 2020, 2021;</ref><ref type="bibr">Nguyen et al., 2020;</ref><ref type="bibr">Graves et al., 2021)</ref>, random forests <ref type="bibr">(Brophy and Lowd, 2021)</ref>, large-scale language models <ref type="bibr">(Zanella-Béguelin et al., 2020)</ref>, convex loss functions <ref type="bibr">(Guo et al., 2020;</ref><ref type="bibr">Sekhari et al., 2021;</ref><ref type="bibr">Neel et al., 2021;</ref><ref type="bibr">Suriyakumar and Wilson, 2022)</ref>, etc. 
However, we reiterate that all these prior works have huge space requirements, especially in high-dimensional settings.</p><p>In a related setting, prior works have looked at the problem of constructing history-independent data structures <ref type="bibr">(Hartline et al., 2005;</ref><ref type="bibr">Naor and Teague, 2001)</ref>. The key motivation here is to prevent an adversary from inferring information about the dataset from the memory representation of the data structure that is not available through its "legitimate" interface. However, no direct application of history-independent data structures to unlearning is currently known.</p><p>The main motivation for unlearning is that a trained model could potentially leak user information in various forms, such as membership inference attacks <ref type="bibr">(Shokri et al., 2017;</ref><ref type="bibr">Carlini et al., 2022a)</ref> or even data extraction attacks for text <ref type="bibr">(Carlini et al., 2021, 2023b)</ref> or image <ref type="bibr">(Carlini et al., 2023a)</ref> generative models. An unlearning protocol allows users to withdraw their data from the model training set. The implications of such mechanisms have also been studied in the literature. For example, <ref type="bibr">Ippolito et al. (2022)</ref> showed that an inference-time filtering mechanism to prevent the generation of verbatim training text sequences could be easily bypassed by "style-transfer" like prompting in large language models. <ref type="bibr">Carlini et al. (2022b)</ref> further showed a "privacy onion" effect where unlearning one set of examples can have a large impact on the privacy score of a non-overlapping set of examples in the training set. Furthermore, there have also been works exploring machine unlearning as a tool to attack an ML system, e.g., <ref type="bibr">Di et al. (2022)</ref> recently showed that unlearning can be used as a trigger for executing data poisoning attacks (on demand).</p><p>Finally, there has been a recent line of work on formulating alternative definitions of approximate (and probabilistic) unlearning to capture the spirit of the "right to be forgotten" and "data deletion" under different circumstances, e.g., <ref type="bibr">(Dukler et al., 2023;</ref><ref type="bibr">Krishna et al., 2023;</ref><ref type="bibr">Cohen et al., 2022;</ref><ref type="bibr">Eisenhofer et al., 2022;</ref><ref type="bibr">Garg et al., 2020)</ref>, etc. However, developing space-efficient algorithms for these definitions is beyond the scope of our paper, and we only consider exact unlearning of ERMs.</p><p>Figure <ref type="figure">1</ref>: Illustration of the guarantees of LU schemes in the central and ticketed models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Learning-Unlearning Schemes</head><p>We focus on supervised learning with the binary loss function ℓ(ŷ, y) = 1{ŷ ≠ y}. Each example (x, y) belongs to Z = X × Y, with the label set Y = {0, 1}. We denote the empirical loss of a predictor h : X → Y on a dataset S = (z 1 = (x 1 , y 1 ), . . . , z n = (x n , y n )) by L(h; S) = (1/n) · Σ i∈[n] ℓ(h(x i ), y i ). We say that a dataset S is H-realizable if there exists h ∈ H such that L(h; S) = 0. Let ERM H (S) = argmin h∈H L(h; S) be the set of all minimizers in H of the empirical loss over S. For any subset I ⊆ [n] of indices, let S I denote the dataset ((x i , y i )) i∈I .</p><p>Central Model of Learning-Unlearning. We define the notion of a learning-unlearning (LU) scheme below, formalizing the standard setting considered in the literature, which we will refer to as the "central" model.</p><p>Definition 1 (LU Scheme) A learning-unlearning (LU) scheme for a concept class H consists of a pair (Learn, Unlearn) of algorithms as follows.</p><p>• On input S ∈ Z n , Learn(S) returns a predictor h ∈ ERM H (S) and auxiliary information aux ∈ {0, 1} C .</p><p>• On input a subset S I ⊆ S and aux, Unlearn(S I , aux) returns a predictor h′.</p><p>For all S and S I ⊆ S, the predictor returned by Unlearn is required to be identical to the predictor returned by Learn(S ∖ S I ). The space complexity of the scheme is C, the bit-complexity of the auxiliary information aux.</p><p>Unless otherwise stated, we will only require the above to hold for H-realizable S.</p><p>Informally speaking, in a LU scheme, the central agent, on a learning request containing a dataset, returns a predictor and also retains some auxiliary information for itself. On an unlearning request (containing a subset of the previous dataset), the agent, using its auxiliary information, returns a new predictor. This new predictor is required to be identical to the predictor that the agent would have returned on a learning request with only the surviving examples. 
See Figure <ref type="figure">1(a)</ref>.</p><p>In this work, our focus is on space complexity, namely, we consider a LU scheme as space-efficient if its space complexity is poly(log n, log |Z|, log |H|). Note that the naive scheme, where aux contains the entire dataset S and Unlearn simply returns the predictor returned by Learn(S ∖ S I ), requires C = n · log |Z| bits and hence is not space-efficient. While we do not explicitly discuss time complexity, all the algorithms that we present are also time-efficient, namely, they run in poly(n, log |Z|, log |H|) time. Our lower bounds, on the other hand, hold even against computationally inefficient algorithms. Finally, in this work, we only assume a one-shot setting, i.e., there is a single learning request with a dataset S, followed by a single unlearning request S I ⊆ S.</p><p>We show in Appendix B that there exists a space-efficient LU scheme for the class of threshold functions. Unfortunately, the central model easily runs into barriers and is quite limited: we also show that any LU scheme for the class of point functions must store Ω(|X |) bits of auxiliary information. To circumvent such barriers, we introduce a new ticketed model of learning-unlearning.</p><p>Ticketed Model of Learning-Unlearning. The basic idea of the ticketed model is that the central agent issues a "ticket" for each example, which is stored by the corresponding user contributing the example. These tickets then have to be provided as part of the unlearning request (see Figure <ref type="figure">1(b)</ref>).</p><p>Definition 2 (Ticketed LU Scheme) A ticketed learning-unlearning (TiLU) scheme for a concept class H consists of a pair (Learn, Unlearn) of algorithms such that</p><p>Learn(S) returns (h, aux, (t i ) i∈[n] ), with a predictor h ∈ ERM H (S), auxiliary information aux ∈ {0, 1} C s and tickets t 1 , . . . , t n ∈ {0, 1} C t associated with examples z 1 , . . . 
, z n respectively.<ref type="foot">foot_2</ref> On input a subset S I ⊆ S, the corresponding tickets (t i ) i∈I , and aux, Unlearn returns a predictor h′.</p><p>For all S and S I ⊆ S, the predictor returned by Unlearn above is required to be identical to the predictor returned by Learn(S ∖ S I ). The space complexity of the scheme is (C s , C t ), where C s is the bit-complexity of aux and C t is the bit-complexity of each ticket t i .</p><p>Unless otherwise stated, we will only require the above to hold for H-realizable S.</p><p>A TiLU scheme is similar to the central LU scheme, except that during learning, the central agent issues a "ticket" for each example in the dataset that is given to the user contributing said example; this ticket is required along with the example as part of the unlearning request. As before, we consider a TiLU scheme to be space-efficient if C s , C t = poly(log n, log |Z|, log |H|). Note that while the space requirement for storing all tickets does grow linearly in n, the main point is that no single party has to store more than a poly-logarithmic amount of information. The challenge in constructing a TiLU scheme is that only the tickets corresponding to the examples in the unlearning request are available at the time of unlearning, so the unlearning step has to be performed with access to a limited amount of information.</p><p>Remark 3 For both schemes, our definition is restrictive in the following sense:</p><p>• Exact ERM: the learning algorithm is required to output a predictor in ERM H (S).</p><p>• Exact unlearning: the predictor after unlearning is required to be identical to the predictor learned on just the surviving examples.<ref type="foot">foot_3</ref> • One-shot unlearning: there is a single learning request, followed by a single unlearning request.</p><p>While the above restrictions are seemingly limiting, as our results show, they already capture and highlight many of the key technical challenges. 
Developing schemes when these restrictions are suitably relaxed (e.g., when unlearning can be approximate) is an important research direction.</p></div>
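To make the interfaces of Definitions 1 and 2 concrete, the following is a minimal Python sketch of the naive (non-space-efficient) baseline discussed above, in which aux stores the entire dataset and tickets are simply indices. All function names, and the toy threshold ERM used to instantiate it, are ours and not from the paper.

```python
from typing import Callable, List, Tuple

Example = Tuple[int, int]   # (x, y) with label y in {0, 1}

def naive_learn(S: List[Example], erm: Callable[[List[Example]], object]):
    """Naive LU baseline: aux stores the whole dataset and the ticket of
    example i is just its index, so C_s = n * log|Z| bits (not space-efficient)."""
    h = erm(S)
    aux = list(S)                    # central information: the full dataset
    tickets = list(range(len(S)))    # ticket of example i is its index
    return h, aux, tickets

def naive_unlearn(requests, aux, erm):
    """requests: (example, ticket) pairs to be unlearnt; re-learn on survivors."""
    dead = {t for (_, t) in requests}
    survivors = [z for i, z in enumerate(aux) if i not in dead]
    return erm(survivors)

def threshold_erm(S):
    # A toy ERM for 1D thresholds h_a(x) = 1{x > a} on realizable data:
    # take a = largest 0-labeled point (or -1 if there is none).
    zeros = [x for (x, y) in S if y == 0]
    return max(zeros) if zeros else -1
```

The schemes in Sections 3 and 4 exist precisely to avoid the linear-size aux this baseline needs.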
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Mergeable Concept Classes</head><p>We define a mergeability property of concept classes and provide examples of several commonly studied classes with this property. We then provide space-efficient TiLU schemes for such classes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 4 (Mergeable Concept Class)</head><p>Informally, a concept class H is C-bit mergeable if there are algorithms Encode, Decode, and Merge such that Encode maps any H-realizable dataset S to a C-bit summary, Decode(Encode(S)) ∈ ERM H (S), and Merge(Encode(S), Encode(S′)) = Encode(S ∪ S′) for disjoint datasets S, S′ whose union is H-realizable.</p><p>Before we describe TiLU schemes for such classes, we list a few well-studied concept classes that are efficiently mergeable (details deferred to Appendix A).</p><p>• Thresholds. Class H th consists of all threshold functions over X = {1, . . . , |X |}, namely for any a ∈ {0, 1, . . . , |X |}, we have h a (x) = 1{x &gt; a}.</p><p>• Parities. Class H par consists of all parity functions, namely for any a ⊆ [d] over X = {0, 1} d , we have h a (x) = ⊕ i∈a x i .</p><p>• Intersection-Closed Classes. Any intersection-closed class H is efficiently mergeable, with a bound depending on d, the VC-dimension of H. In particular, this includes H d th (more examples in Appendix A). We also consider an example of a simple class that is not efficiently mergeable.</p><p>• Point Functions. Class H pt consists of all point functions over X = {1, . . . , |X |}, namely for any a ∈ {1, . . . , |X |}, we have h a (x) = 1{x = a}; H pt is not o(|X |)-bit mergeable.</p></div>
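As an illustration of the kind of Encode/Decode/Merge triple that mergeability asks for, here is one plausible constant-size instantiation for 1D thresholds h_a(x) = 1{x > a}. This is our own assumed example; the paper's actual constructions are in Appendix A.

```python
NEG_INF, POS_INF = float("-inf"), float("inf")

def encode(S):
    # O(log |X|)-bit summary: largest 0-labeled and smallest 1-labeled point.
    max0 = max((x for (x, y) in S if y == 0), default=NEG_INF)
    min1 = min((x for (x, y) in S if y == 1), default=POS_INF)
    return (max0, min1)

def merge(e1, e2):
    # Encode(S1 + S2) from Encode(S1) and Encode(S2) for disjoint S1, S2.
    return (max(e1[0], e2[0]), min(e1[1], e2[1]))

def decode(e):
    # An ERM threshold: on a realizable dataset, any a from max0 (inclusive)
    # up to min1 (exclusive) has zero loss; we canonically pick a = max0.
    max0, min1 = e
    return max0 if max0 > NEG_INF else min1 - 1
```

The key point is that Merge operates purely on the two summaries, never on the raw examples.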
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Ticketed LU Schemes for Mergeable Concept Classes</head><p>In this section, we provide a TiLU scheme for mergeable concept classes. The basic idea is to use a Merkle tree-like data structure to construct tickets for each example.</p><p>Theorem 5 For any C-bit mergeable concept class H, there exists a TiLU scheme with space complexity C s = O(log |H|) and C t = O(C · log n).</p><p>Proof For simplicity, we assume that n is a power of 2; the argument generalizes to all n easily. Consider a full binary tree of depth d = log 2 n with leaf i corresponding to example z i = (x i , y i ). For any node v, let S v denote the dataset of examples corresponding to the leaf nodes in the subtree under v.</p><p>Figure 2: Illustration of the TiLU scheme underlying the proof of Theorem 5. The ticket t 5 for the example (x 5 , y 5 ) consists of the index 5 and the outputs of Encode applied to S {1,2,3,4} , S {7,8} , and S {6} .</p><p>For any leaf i, let v 0 , v 1 , . . . , v d be the nodes on the path from the root to leaf i, and for each j ∈ {1, . . . , d} let ṽ j be the child of v j−1 that is the sibling of v j . Figure <ref type="figure">2</ref> shows an example. Let the ticket corresponding to example i be given as t i = (i, Encode(S ṽ 1 ), . . . , Encode(S ṽ d )). It is immediate that the number of bits in t i is at most log n + C · log n.</p><p>Define Learn(S) to return h = Decode(Encode(S)), aux = h, and tickets t i as specified above. We define Unlearn as follows. If I is empty, then simply return h. Given a non-empty I ⊆ [n], let R be the set of all nodes v such that no leaf in the subtree under v belongs to I, but the same is not true of the parent of v. It is easy to see that S ∖ S I is precisely given as ∪ v∈R S v , and moreover, S v and S v′ are disjoint for distinct v, v′ ∈ R. For all v ∈ R, we can recover Encode(S v ) from ticket t i for any leaf i ∈ I in the subtree under the sibling of v. Thus, by repeated applications of Merge, we can recover Encode(∪ v∈R S v ) = Encode(S ∖ S I ). 
Finally, we can return h &#8242; = Decode(Encode(S &#8726; S I )) as the predictor after unlearning, which is identical to the predictor returned by Learn(S &#8726; S I ).</p></div>
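A runnable sketch of this tree-based ticket construction follows, instantiated with a simple stand-in mergeable encoding for 1D thresholds h_a(x) = 1{x > a} (our choice of instantiation; all names are ours). Subtrees are represented by index ranges, and Unlearn merges the encodings of the maximal surviving subtrees recovered from the presented tickets.

```python
NEG_INF, POS_INF = float("-inf"), float("inf")

def encode(S):
    # Stand-in mergeable encoding for 1D thresholds h_a(x) = 1{x > a}.
    max0 = max((x for (x, y) in S if y == 0), default=NEG_INF)
    min1 = min((x for (x, y) in S if y == 1), default=POS_INF)
    return (max0, min1)

def merge(e1, e2):
    return (max(e1[0], e2[0]), min(e1[1], e2[1]))

def sibling_path(i, n):
    """Index ranges [a, b) of the sibling subtrees along the root-to-leaf-i path."""
    sibs, lo, hi = [], 0, n
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid > i:
            sibs.append((mid, hi)); hi = mid
        else:
            sibs.append((lo, mid)); lo = mid
    return sibs

def learn(S):
    n = len(S)   # assumed a power of two, as in the proof
    tickets = [(i, [encode(S[a:b]) for (a, b) in sibling_path(i, n)])
               for i in range(n)]
    return encode(S), tickets

def unlearn(n, presented):
    """presented: the (index, sibling-encodings) tickets of the deleted examples."""
    deleted = {i for (i, _) in presented}
    acc, seen = None, set()
    for (i, sibs) in presented:
        for (a, b), enc in zip(sibling_path(i, n), sibs):
            untouched = not any(k >= a and b > k for k in deleted)
            if untouched and (a, b) not in seen:   # maximal surviving subtree
                seen.add((a, b))
                acc = enc if acc is None else merge(acc, enc)
    return acc   # encoding of the surviving examples (None if none survive)
```

Each surviving maximal subtree appears as a sibling on the path of some deleted leaf, which is exactly why the presented tickets suffice.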
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Improvements with Compressibility</head><p>We augment the notion of mergeable concept classes with an additional notion of the existence of compressions (similar to the notion introduced by Littlestone and Warmuth (1986)), and provide improved TiLU schemes for such classes. The advantage of this scheme over that in Theorem 5 is that the space complexity of the ticket depends on n only through an additive log n factor.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 6 (Mergeable Concept Class with Compressions)</head><p>A concept class H is said to be C-bit mergeable with K-compressions if in addition to Encode, Decode, Merge as in Definition 4, there exists a relation Compress ⊆ Z * × Z ≤K , where we say that T ∈ Z ≤K is a compression of S if (S, T ) ∈ Compress. For any H-realizable S ∈ Z * , the following properties must hold for any valid compression T : (i) T ⊆ S, (ii) Encode(S) = Encode(T ), and (iii) for all z ∈ S ∖ T , T is a compression of S ∖ {z}.</p><p>The above notion is related to, but incomparable with, classes with stable compression schemes <ref type="bibr">(Bousquet et al., 2020)</ref>. A stable compression scheme requires that there exist a unique or canonical compression set for any realizable dataset, whereas in the above we allow the existence of many compressions and do not require that there be a canonical one. On the other hand, the above definition requires the mergeable property, which is not required for stable compression schemes.</p><p>It is easy to see that all the examples of mergeable concept classes we considered earlier are in fact mergeable with compressions (the details are deferred to Appendix A).</p><p>Theorem 7 For any class H that is mergeable with K-compressions, there exists a TiLU scheme with space complexity C s = O(log |H|) and C t = O(K · log |Z| + log n).</p><p>Proof Algorithm Learn works as follows. Given any dataset S, partition S into T 1 , T 2 , . . . such that T i is a compression of ∪ j≥i T j (this can be iteratively done by choosing T 1 as any compression of S, T 2 as any compression of S ∖ T 1 , and so on). The ticket t i for the example (x i , y i ) that lies in T j is given as (j, T j , T j+1 ). The size of the ticket is 2K log |Z| + log n, since |T ℓ | ≤ K for all ℓ, and the number of parts is at most n. 
The predictor returned is h = Decode(Encode(S)) and aux = h.</p><p>Algorithm Unlearn is defined as follows. If I is empty, then simply return h. Otherwise, let ℓ be the smallest index such that S I ∩ T ℓ = ∅. We can reconstruct T j for all j ≤ ℓ from the tickets, since for each j &lt; ℓ there exists some example (x i , y i ) ∈ T j that has been presented for unlearning, and its ticket contains both T j and T j+1 .</p><p>Learn on input S ∖ S I would have returned Decode(Encode(S ∖ S I )). Thus, to be a valid TiLU scheme, it suffices to show that Merge(Encode(T 1 ∖ S I ), . . . , Encode(T ℓ−1 ∖ S I ), Encode(T ℓ )) = Encode(S ∖ S I ). Indeed, Merge(Encode(T 1 ∖ S I ), . . . , Encode(T ℓ−1 ∖ S I ), Encode(T ℓ )) = (∗∗) Merge(Encode(T 1 ∖ S I ), . . . , Encode(T ℓ−1 ∖ S I ), Encode(∪ j≥ℓ T j ∖ S I )) = (∗) Encode(S ∖ S I ), where (∗) follows from the property of Merge and (∗∗) uses that Encode(T ℓ ) = Encode(∪ j≥ℓ T j ∖ S I ), since T ℓ is a compression of ∪ j≥ℓ T j and hence a compression of ∪ j≥ℓ T j ∖ S I .</p></div>
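The partition-and-ticket layout in the proof of Theorem 7 can be sketched as follows, instantiated with K = 2 compressions for 1D thresholds h_a(x) = 1{x > a} (our stand-in class; the compression keeps the two extreme examples, and all names are ours).

```python
from collections import Counter

NEG_INF, POS_INF = float("-inf"), float("inf")

def encode(S):
    # Stand-in mergeable class: 1D thresholds h_a(x) = 1{x > a}.
    max0 = max((x for (x, y) in S if y == 0), default=NEG_INF)
    min1 = min((x for (x, y) in S if y == 1), default=POS_INF)
    return (max0, min1)

def merge(e1, e2):
    return (max(e1[0], e2[0]), min(e1[1], e2[1]))

def compress_positions(S, rest):
    # A K=2 compression of the sub-dataset at positions `rest`: keep the
    # largest 0-labeled and smallest 1-labeled example (our instantiation).
    zeros = [(S[i][0], i) for i in rest if S[i][1] == 0]
    ones = [(S[i][0], i) for i in rest if S[i][1] == 1]
    T = []
    if zeros: T.append(max(zeros)[1])
    if ones: T.append(min(ones)[1])
    return T

def learn(S):
    parts, rest = [], set(range(len(S)))
    while rest:   # T_1 compresses S, T_2 compresses S minus T_1, ...
        T = compress_positions(S, sorted(rest))
        parts.append(T)
        rest -= set(T)
    part_of = {i: j for j, T in enumerate(parts) for i in T}
    def ticket(i):
        j = part_of[i]
        Tj = [S[k] for k in parts[j]]
        Tnext = [S[k] for k in parts[j + 1]] if len(parts) > j + 1 else []
        return (j, Tj, Tnext)   # (part index, T_j, T_{j+1})
    return encode(S), [ticket(i) for i in range(len(S))]

def unlearn(aux_enc, presented):
    """presented: (example, ticket) pairs of the deleted examples."""
    parts, removed_in = {}, {}
    for (z, (j, Tj, Tnext)) in presented:
        parts[j] = Tj
        if Tnext:
            parts[j + 1] = Tnext
        removed_in.setdefault(j, Counter())[z] += 1
    if 0 not in removed_in:   # first part untouched: encoding is unchanged
        return aux_enc
    l = 0
    while l in removed_in:    # smallest l with T_l untouched
        l += 1
    # Encode(T_l) equals Encode of everything from T_l onward minus deletions.
    enc = encode(parts.get(l, []))
    for j in range(l):        # merge in the survivors of the earlier parts
        surv = Counter(parts[j]) - removed_in[j]
        enc = merge(enc, encode(list(surv.elements())))
    return enc
```

Note that when the first part is untouched, property (iii) already implies the encoding (and hence the predictor) is unchanged, so aux suffices.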
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Sharper Bounds for Specific Classes</head><p>While the previous general reductions are applicable to a large number of concept classes, the resulting space complexity bounds can be improved for certain classes by designing more specific algorithms. In this section, we present TiLU schemes for point functions and (products of d) thresholds, as stated formally below. In the case of products of d thresholds, this significantly improves the dependency on n from log n to log α −1 (n) (inverse-Ackermann function; see Definition 9), and in the case of 1D thresholds, we also improve the dependency on the domain size.</p><p>Theorem 8 There exist TiLU schemes for (a) H pt , (b) H d th , and (c) H th , with improved space complexity bounds; the precise statements are deferred to Appendix D.</p><p>Table 1: The size C t of tickets for various concept classes, derived as corollaries of Theorems 5 and 7. The size C s of aux is O(log |H|) in each case.</p><p>Note that Item (c) is an improvement over Item (b) for d = 1, as the ticket size in the former does not depend on |X | at all. Moreover, Item (a) provides a TiLU scheme for the class of point functions for which the techniques from Theorems 5 and 7 were not applicable (see Proposition 19).</p><p>In Table <ref type="table">1</ref>, we summarize the implied bounds of Theorems 5, 7 and 8 for some concrete classes.</p><p>At the heart of our improvements is a problem called count-to-zero, which we introduce and give a TiLU scheme for in Section 4.1. In the subsequent sections, we then use it to give TiLU schemes for each of the aforementioned classes. We describe only high-level overviews; the full proof of Theorem 8 is deferred to Appendix D.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Count-to-Zero</head><p>We first describe the Count-to-Zero (CtZ) problem. Here there are m examples, where m is unknown a priori. After receiving an unlearning request, we would like to tell whether there is still any example left (that is not unlearned). For CtZ, we follow the terminology of Definition 2 except that Unlearn returns either ⊥ (when S I = S, and thus no example is left) or ⊤ (when S I ⊊ S, and some example is remaining), instead of the hypotheses h, h′. <ref type="foot">5</ref> We give a TiLU scheme for CtZ with ticket size O(log α −1 (m)) bits (and one-bit auxiliary information), where α −1 is the inverse-Ackermann function defined below. Note that a naive algorithm (writing down m in each ticket) requires ticket size O(log m) bits.</p><p>Definition 9 Consider a sequence of functions A 1 , A 2 , . . . defined as A r (1) = 2 for all r ≥ 1, and for all t ≥ 2,</p><p>• A 1 (t) = 2t (interpretable as t-fold repeated addition of 2), and</p><p>• A r (t) = A r−1 (A r (t − 1)) for r ≥ 2 (interpretable as (t − 1)-fold repeated application of A r−1 ).</p><p>The Ackermann function α(t) is defined as A t (t). The inverse-Ackermann function α −1 (n) is defined as the smallest t such that n ≤ α(t).</p><p>For example, A 2 (t) = 2 t , and A 3 (t) = 2 ↑↑ t, where a ↑↑ k denotes the kth tetration of a (i.e., a power tower of k copies of a). The Ackermann function is a classical example of a function that is computable, but not primitive recursive.</p><p>Theorem 10 There is a TiLU scheme for the Count-to-Zero (CtZ) problem with space complexity C s = O(1) and C t = O(log α −1 (m)).</p></div>
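Definition 9 can be explored with a small saturated implementation. The cap at 10^100 is our device to keep the doubly-recursive values finite; it also illustrates why α −1 (n) is at most 4 for every n of practical size.

```python
CAP = 10**100   # saturate: any value above CAP is treated as "huge"

def A(r, t):
    """The function hierarchy of Definition 9, saturated at CAP so that it
    stays computable (our device; the true values grow without bound)."""
    if t == 1:
        return 2
    if r == 1:
        return 2 * t                        # t-fold addition of 2
    if r == 2:
        return 2 ** t if 400 > t else CAP   # A_2(t) = 2^t
    inner = A(r, t - 1)
    if inner > 400:      # then A_{r-1}(inner) certainly exceeds CAP
        return CAP
    return min(A(r - 1, inner), CAP)

def inv_ackermann(n):
    """Smallest t such that alpha(t) >= n, where alpha(t) = A_t(t)."""
    t = 1
    while n > A(t, t):
        t += 1
    return t
```

Since α(4) = A 4 (4) = A 3 (65536) is a tower of astronomical height, inv_ackermann returns 4 for any realistic dataset size.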
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.1.">SMALL-ALPHABET SIZE-INDEXING SPERNER FAMILIES</head><p>Recall that a Sperner family is a family Q of multi-sets (Q i ) i∈N such that none of them is a sub-multiset of another (see the book by <ref type="bibr">Engel (1997)</ref> for background on Sperner families). We say that a Sperner family Q is size-indexing if for all m ∈ N, the multiset Q m has m elements (denoted |Q m | = m). Furthermore, we say that Q is of alphabet size n(m) if for all m ∈ N, each element of the multiset Q m is from the ordered alphabet [n(m)] = {1, . . . , n(m)}. For ℓ, r ∈ N, ℓ &lt; r, we define an [ℓ, r]-size-indexing Sperner family similarly to the above, except with Q = {Q ℓ , . . . , Q r }.</p><p>Lemma 11 For all r, t ≥ 1, there is an [A r (t), A r (A r (t))]-size-indexing Sperner family Q r,t with alphabet size 2r.</p><p>Proof We prove this by induction over r. Consider the base case of r = 1, i.e., we want to construct a [2t, 4t]-size-indexing Sperner family Q 1,t with alphabet {1, 2}. For each m ∈ {2t, . . . , 4t}, let Q 1,t m consist of 4t − m copies of the symbol 1 and 2m − 4t copies of the symbol 2. Then |Q 1,t m | = m, and since the number of 1s is strictly decreasing in m while the number of 2s is strictly increasing in m, no Q 1,t m is a sub-multiset of another.</p><p>Next, consider the case of r &gt; 1. Let the alphabet be {1, . . . , 2r}. If t = 1, then note that A r (t) = 2 and A r (A r (t)) = 4, and thus from the above, there exists Q r,t with alphabet size 2. Next, consider the case of t ≥ 2. For ease of notation, let T = A r (t). From the inductive hypothesis, we have size-indexing Sperner families Q r−1,j with alphabet size 2(r − 1). Consider any m ∈ [A r (t), A r (A r (t))], and let j m be the unique value such that A r (j m ) ≤ m − T + 2 &lt; A r (j m + 1). Since m ≤ A r (T ), we have that 1 ≤ j m &lt; T . We construct Q r,t m as the union of T − j m − 1 copies of the symbol 2r − 1, j m − 1 copies of the symbol 2r, and Q r−1,j m m−T +2 , where the latter uses the symbols {1, . . . , 2r − 2}. Clearly, |Q r,t m | = m. To see that this is a Sperner family, consider any m &lt; m′. 
If j m = j m′ = j, then Q r−1,j m−T +2 and Q r−1,j m′−T +2 are two (different) members of the same Sperner family Q r−1,j , and hence neither is a sub-multiset of the other; the same then holds for Q r,t m and Q r,t m′ . Otherwise, j m &lt; j m′ , in which case Q r,t m contains strictly more copies of the symbol 2r − 1 and strictly fewer copies of the symbol 2r than Q r,t m′ , so again neither is a sub-multiset of the other.</p><p>Theorem 12 There is a size-indexing Sperner family with alphabet size O(α −1 (m) 3 ).</p><p>Proof We first show that there exists a [2, A t (t)]-size-indexing Sperner family with alphabet size O(t 2 ). From Lemma 11, we have [A t (i), A t (i + 1)]-size-indexing Sperner families Q t,i using alphabets of size 2(t − 1). Renaming the symbols to be distinct and setting Q = (Q t,1 , Q t,2 , . . . , Q t,t ), we get a [2, A t (t)]-size-indexing family. In particular, this implies an [A t−1 (t − 1), A t (t) − 1]-size-indexing Sperner family Q t . Finally, this implies a size-indexing Sperner family (for all sizes) with alphabet size n(m) ≤ O(α −1 (m) 3 ) as Q = (Q t ) t≥0 , by renaming symbols so that each Q t uses distinct symbols. (We use Q 0 to denote the trivial [1, 1]-size-indexing family with alphabet size 1.)</p></div>
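The base case of Lemma 11 is easy to experiment with. The family below (a decreasing number of 1s and an increasing number of 2s) is one natural choice satisfying the stated size-indexing and Sperner properties; it is not necessarily the paper's exact construction.

```python
from collections import Counter
from itertools import combinations

def base_family(t):
    """A candidate [2t, 4t]-size-indexing Sperner family over the alphabet
    {1, 2}: Q_m has 4t - m copies of 1 and 2m - 4t copies of 2."""
    return {m: Counter({1: 4 * t - m, 2: 2 * m - 4 * t})
            for m in range(2 * t, 4 * t + 1)}

def is_submultiset(a, b):
    return all(b[k] >= a[k] for k in a)

def check(t):
    Q = base_family(t)
    # size-indexing: |Q_m| = m
    assert all(sum(Q[m].values()) == m for m in Q)
    # Sperner: no member is a sub-multiset of another
    assert all(not is_submultiset(Q[m1], Q[m2]) and
               not is_submultiset(Q[m2], Q[m1])
               for m1, m2 in combinations(Q, 2))
    return True
```

The counts of the two symbols move in opposite directions as m grows, which is exactly what rules out containment in either direction.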
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.2.">FROM SIZE-INDEXING SPERNER FAMILIES TO A TILU SCHEME</head><p>Next, we show that a size-indexing Sperner family can be used to construct a TiLU scheme for CtZ. The basic idea is to use elements of Q |S| as the tickets. This is formalized below.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Proof of Theorem 10</head><p>For each m ∈ N, let Q m be the multiset from the size-indexing Sperner family from Theorem 12; recall that |Q m | = m. Finally, for each m ∈ N and i ∈ [m], we let q m i denote the i-th element in Q m (where the elements of Q m are ordered arbitrarily). Recall also that S is our original training set. Our algorithm is as follows: Learn(S) issues the ticket t i = q m i to the ith example, where m = |S|, and the auxiliary information (⊤ or ⊥, the answer when no example is unlearned) can be written in one bit of aux. For unlearning, we define Unlearn to return ⊥ if the presented tickets satisfy {t i } i∈I = Q |I| , and ⊤ otherwise, where the expression {t i } i∈I is interpreted as a multiset.</p><p>The correctness is obvious in the cases S I = ∅ and S I = S. We next show correctness when ∅ ⊊ S I ⊊ S. Since Q is a Sperner family, we have {t i } i∈I ≠ Q |I| : indeed, {t i } i∈I is a strict sub-multiset of Q |S| , so {t i } i∈I = Q |I| would make Q |I| a sub-multiset of Q |S| . Thus, Unlearn will return ⊤ (which is the correct answer) in this case.</p><p>Finally, the space complexity claim follows from the fact that q m i ∈ [O(α −1 (m) 3 )], since the Sperner family has alphabet size O(α −1 (m) 3 ). Each ticket only stores a single character from an alphabet of size O(α −1 (m) 3 ), and hence the ticket size is O(log α −1 (m)).</p></div>
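The reduction from a size-indexing Sperner family to a CtZ scheme translates directly into code. The demo below uses a small [2t, 4t]-size-indexing family over {1, 2} (our choice) in place of the full construction of Theorem 12, so it only handles dataset sizes in that range.

```python
from collections import Counter

def base_family(t):
    # A [2t, 4t]-size-indexing Sperner family over {1, 2} (one natural choice).
    return {m: [1] * (4 * t - m) + [2] * (2 * m - 4 * t)
            for m in range(2 * t, 4 * t + 1)}

T = 8
Q = base_family(T)   # supports dataset sizes m from 16 to 32 in this demo

def ctz_learn(m):
    """Tickets for a dataset of size m: the m elements of Q_m, in any order."""
    return list(Q[m])

def ctz_unlearn(presented):
    """True iff every example was unlearned. By the Sperner property, a strict
    sub-multiset of Q_m of size k can never equal Q_k, so the test is sound."""
    k = len(presented)
    return k in Q and Counter(presented) == Counter(Q[k])
```

Unlearn never needs to know the original m: equality with Q of the presented size is decisive.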
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Point Functions</head><p>We can use the CtZ scheme to get a TiLU scheme for H_pt, the class of point functions. First, we need to define the predictor h output by Learn(S). If (a, 1) appears in S for some a ∈ X, then we must output h_a. Otherwise, we can output h_b for any b ∈ X such that (b, 0) is not in S; for tie-breaking, we choose the smallest such b. The first case is easy to check: we can just use CtZ to determine whether any (a, 1) is left after unlearning. This only requires O(1)-size auxiliary information and O(log γ⁻¹(n))-size tickets, as desired. Now, suppose we are in the second case, and that we start off with b* being the smallest value such that (b*, 0) is not in S. The idea here is to use the CtZ scheme to check, for each b &lt; b*, whether any (b, 0) remains in the set after unlearning. Note that each example (x, y) is part of only one such subscheme (the one for b = x), and therefore the ticket size remains O(log γ⁻¹(n)). However, there seems to be an issue: since we run this subscheme for all b &lt; b*, we may require O(b*) ≤ O(|X|) bits to keep the auxiliary information. Fortunately, this turns out to be unnecessary: observe that our CtZ scheme never uses the auxiliary information except in the case where the surviving set is empty. On the other hand, S contains (b, 0) for every b &lt; b* (due to the minimality of b*), so whenever the unlearning request does not remove every copy of some (b, 0), the corresponding subscheme can answer without its auxiliary information. Thus, we do not need to store any additional auxiliary information. This gives the final scheme.</p></div>
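The second case (no 1-labeled examples) can be sketched in Python as follows. Every example joins one toy CtZ subscheme, namely the one for its own x-value; the subscheme here is a hypothetical stand-in (ticket = group size) for the paper's small-ticket CtZ construction, and all names are ours.

```python
from collections import Counter

def ctz_tickets(m):
    # toy CtZ subscheme: every member of a group of size m gets ticket m
    return [m] * m

def ctz_all_gone(presented):
    # True iff the presented tickets form the group's full ticket multiset
    return bool(presented) and \
        Counter(presented) == Counter({len(presented): len(presented)})

def pf_learn(S):
    """Point functions, second case: S contains only 0-labeled examples
    (x, 0).  The hypothesis is h_{b*} for the smallest absent value b*."""
    xs = [x for x, _ in S]
    b_star = next(b for b in range(len(xs) + 1) if b not in xs)
    counts = Counter(xs)
    issued = {b: iter(ctz_tickets(counts[b])) for b in counts}
    tickets = [(x, next(issued[x])) for x in xs]  # one subscheme per value
    return b_star, tickets

def pf_unlearn(b_star, requests):
    """requests: (x, ticket) pairs of the unlearned examples.  Returns the
    smallest b such that (b, 0) is absent from the surviving dataset."""
    by_value = {}
    for x, t in requests:
        by_value.setdefault(x, []).append(t)
    # a value b < b* becomes the answer only if *all* its copies are gone
    freed = [b for b, ts in by_value.items()
             if b < b_star and ctz_all_gone(ts)]
    return min(freed, default=b_star)
```

As in the text, no central auxiliary information beyond b* is needed: values b &lt; b* whose copies are not all unlearned stay blocked, and full removal is detected from the tickets alone.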
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Product of d Thresholds</head><p>In the case of the product of d thresholds H^d_th, we start by deriving a primitive for computing the minimum value of the dataset. In the MINimum VALue (MINVAL) problem, we are given a set S ⊆ X, where X ⊆ ℝ is the domain, and the goal is to output min(S) if S ≠ ∅ and ⊥ if S = ∅.</p><p>We first give a TiLU scheme for MINVAL with space complexity</p><p>To understand the high-level idea, let us consider the case where the input contains only distinct values from X. In this case, the algorithm is simple: let aux = min(S) and let the ticket of each x be its successor (i.e., the smallest element in S greater than x). When the minimum is unlearned, we update the minimum to its successor. The actual scheme for MINVAL is more subtle, since we need to handle repeated elements. Nonetheless, the general idea is that we additionally use the CtZ scheme to help determine whether all copies of the minimum have been unlearned.</p><p>To derive a TiLU scheme for H^d_th, we use the following learning algorithm: output h_{>a} where, for each j ∈ [d], a_j is the minimum value among the j-th coordinates of the 1-labeled samples, minus one. To compute this minimum value, we simply use the TiLU scheme for MINVAL.</p></div>
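The distinct-values special case of MINVAL admits a direct sketch of the successor-ticket idea just described (function names are ours; the repeated-elements case, which needs the CtZ subscheme, is omitted, and Python's `None` plays the role of ⊥):

```python
def minval_learn(S):
    """MINVAL sketch for *distinct* values: aux = min(S), and the ticket
    of each x is its successor in S (None for the maximum element)."""
    xs = sorted(S)
    aux = xs[0] if xs else None
    tickets = {x: (xs[i + 1] if i + 1 < len(xs) else None)
               for i, x in enumerate(xs)}
    return aux, tickets

def minval_unlearn(aux, removed, tickets):
    """removed: the set of unlearned values; tickets: their tickets.
    Returns the minimum of the surviving set (None if it is empty)."""
    cur = aux
    while cur is not None and cur in removed:
        cur = tickets[cur]   # jump to the unlearned element's successor
    return cur
```

The successor chain visits values in sorted order, so the loop stops at exactly the smallest surviving value; note that only the tickets of the removed elements are ever dereferenced.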
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Further Improvements for 1D Thresholds</head><p>Our improvement for H_th is based on an insight from the following (central) LU scheme. First, use binary search to find the threshold. Namely, we start with a_0 = ⌊|X|/2⌋. If h_{>a_0} agrees with all the examples, then output h_{>a_0}. Otherwise, we determine whether a_0 is too large or too small, and then adjust the search range and recurse. Suppose that on the original dataset the search path is a_0, . . . , a_i. Observe that, after unlearning, the output predictor must be one of h_{>a_0}, . . . , h_{>a_i}. Thus, we record (in aux) the empirical loss of each h_{>a_j}, and we can update these losses after seeing the unlearning set. This allows us to determine where the binary search stops, and thus which predictor to output. Since the binary search examines O(log |X|) hypotheses and recording each empirical loss uses O(log n) bits, the space complexity of this LU scheme, without tickets, is O((log |X|) · (log n)).</p><p>Now, to reduce the dependency on n, we can apply our CtZ scheme. The most natural attempt would be to apply the CtZ scheme to the set of examples that each h_{>a_j} errs on. However, since a single example can lie in the error sets of all of h_{>a_0}, . . . , h_{>a_{i−1}}, an example can receive as many as i − 1 = O(log |X|) different tickets from the CtZ scheme. This results in tickets of size O((log |X|) · log γ⁻¹(n)).</p><p>To reduce the ticket size further, observe that whether the empirical error of h_{>a_j} is zero is just a matter of whether any example remains between a_i and a_j. With this in mind, we partition the domain X at the points a_0, . . . , a_i and then use the CtZ scheme on each part. Since we can determine whether each part is empty, we can determine whether each h_{>a_j} has zero empirical error after unlearning. Furthermore, since each training example belongs to a single part, it receives a single ticket of size O(log γ⁻¹(n)). This concludes our high-level overview.</p></div>
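The central LU scheme described first (search path plus per-hypothesis loss counters, no tickets) can be sketched as follows; names are ours, the data is assumed realizable, and the CtZ-based compression of the loss counters is not shown.

```python
def th_loss(a, S):
    # number of mistakes of h_{>a} (i.e., x -> 1{x > a}) on S
    return sum((x > a) != y for x, y in S)

def th_search_path(S, D):
    """Binary-search path a_0, a_1, ... over {1..D} (S realizable)."""
    lo, hi, path = 0, D, []
    while True:
        a = (lo + hi) // 2
        path.append(a)
        if th_loss(a, S) == 0:
            return path
        if any(x > a and y == 0 for x, y in S):
            lo = a + 1           # a 0-labeled example above a: a too small
        else:
            hi = a               # all errors are 1-labeled: a too large

def th_learn(S, D):
    # aux: the empirical loss of every hypothesis visited by the search
    return [(a, th_loss(a, S)) for a in th_search_path(S, D)]

def th_unlearn(aux, S_I):
    """Errors of each h_{>a_j} are one-sided on realizable data, so the
    search on the surviving set follows the same path and stops at the
    first hypothesis whose updated loss is zero."""
    for a, err in aux:
        if err - th_loss(a, S_I) == 0:
            return a
```

The last recorded hypothesis has zero loss on S, hence on any subset, so the loop always returns.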
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Lower Bounds</head><p>Given the mild dependence on n in the above schemes' space complexity (O(log γ⁻¹(n)) bits), it is natural to ask whether this dependence can be removed altogether. The main result of this section is that it cannot. (In other words, at least one of C_s or C_t must grow with n.) Recall that a concept class is non-trivial if it contains at least two concepts.</p><p>Theorem 13 For any non-trivial H, there does not exist any TiLU scheme for H with space complexity (C_s = O(1), C_t = O(1)).</p><p>Once again, our key ingredient is a lower bound on size-indexing Sperner families, which we prove in Section 5.1. We then turn this into a space complexity lower bound for CtZ (Section 5.2), before finally reducing from CtZ to the space complexity of learning any non-trivial concept class (Section 5.3).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Lower Bounds for Sperner Family</head><p>We first derive lower bounds for Sperner families that are more general than size-indexing ones, as defined below.</p><p>Let a = (a_i)_{i∈N} be any infinite (not necessarily strictly) increasing sequence of integers. We say that a Sperner family</p><p>(For the sequence a_i = i, this coincides with the size-indexing family from Section 4.1.1.) The main result of this subsection is that any a-size-indexing family must have an alphabet size that grows to infinity:</p><p>Theorem 14 For any constant σ ∈ N and any increasing sequence a, there is no a-size-indexing Sperner family with alphabet size σ.</p><p>To prove Theorem 14, we need the following well-known result, which is a simple consequence of the so-called "infinite Ramsey theorem". Recall that a subset of elements of a partially ordered set (poset) is a chain if each element of the subset is comparable to every other element of the subset, and an antichain if no two elements of the subset are comparable.</p><p>Lemma 15 (e.g., <ref type="bibr">Tachtsis (2016)</ref>) Any infinite poset contains an infinite chain or an infinite antichain.</p><p>Proof of Theorem 14. For convenience, we view each multiset Q_i as a vector in N^σ, and let Q_{i,j} denote the number of times j appears in Q_i. We prove the statement by induction on σ.</p><p>Base Case. The case σ = 1 is trivial, as for any i &lt; j we have Q_i ⊆ Q_j. We next focus on σ = 2. Suppose for the sake of contradiction that there is an a-size-indexing family Q with alphabet size 2.</p><p>Thus, by the pigeonhole principle, there must exist indices i_1 &lt; i_2 (with i_1, i_2 ≤ 2a_1) such that either</p><p>Inductive Step. Suppose that the statement holds for all σ &lt; m for some positive integer m ≥ 3. 
Furthermore, suppose for the sake of contradiction that there exists an a-size-indexing family Q with alphabet size m.</p><p>Applying Lemma 15, this poset must contain an infinite chain or an infinite antichain. We consider the two cases separately:</p><p>• Case I: It contains an infinite antichain (b_{i_j})_{j∈N}. Consider the family S := (S_j)_{j∈N} where S_j has elements 1, 2 defined by letting</p><p>Notice that S is an (a_{i_j})_{j∈N}-size-indexing family. Furthermore, since (b_{i_j})_{j∈N} is an antichain, S is a Sperner family. This contradicts our inductive hypothesis for σ = 2.</p><p>• Case II: It contains an infinite chain (b_{i_j})_{j∈N}, i.e., b_{i_1} &lt; b_{i_2} &lt; · · · . Consider S := (S_j)_{j∈N} where S_j results from removing all occurrences of the symbol 1 from Q_{i_j}. Then S is a ((b_{i_j})_2)_{j∈N}-size-indexing family, and ((b_{i_j})_2)_{j∈N} is an increasing sequence (because (b_{i_j})_{j∈N} forms a chain).</p><p>Finally, since</p><p>form a Sperner family (see footnote 6). Since this collection only uses the symbols 2, . . . , m, this contradicts our inductive hypothesis for σ = m − 1.</p><p>In both cases, we arrive at a contradiction, concluding the proof. ■</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">From Sperner Family to CtZ</head><p>Next, we observe that the lower bound on the alphabet size for Sperner Family translates to a lower bound for CtZ. This is the "reverse reduction" of Theorem 10, but this direction requires more care as we have to handle the auxiliary information aux as well.</p><p>Lemma 16 There is no TiLU scheme for CtZ with space complexity (C s = O(1), C t = O(1)).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">From Learning Any Class to CtZ</head><p>Finally, we observe that any TiLU scheme for the learning-unlearning problem can be converted into a TiLU scheme for CtZ with the same complexity, formalized below in Lemma 17. This, together with Lemma 16, yields Theorem 13.</p><p>Lemma 17 For any non-trivial H, if there exists a TiLU scheme for H with space complexity (C_s, C_t), then there also exists a TiLU scheme for CtZ with the same space complexity (C_s, C_t).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Future Directions</head><p>In this paper, we introduce and study the ticketed model of unlearning and obtain several results. Perhaps the most basic unresolved question is to characterize the classes for which space-efficient TiLU schemes exist. For example, can we get TiLU schemes for all concept classes of VC dimension d whose space complexity scales as poly(d, log(n), log(|X|))? Note that we do not know the answer to this question even for some basic classes such as linear separators.</p><p>Another tantalizing open question is to make the lower bound for CtZ (Lemma 16) quantitative. Related to this, it would also be interesting to understand whether a sub-polylogarithmic dependency on n is always achievable for space-efficient TiLU schemes.</p><p>We already pointed out several limitations of our model in Remark 3. Extending the TiLU model to overcome these limitations, e.g., allowing approximate ERM, seems like an interesting research direction. Another intriguing direction is to study unlearning in the agnostic setting, where the dataset is not assumed to be H-realizable. It is unclear how to obtain generic space-efficient LU schemes, such as the ones in Theorem 5 and Theorem 7, in the agnostic setting; as a preliminary result, we give a TiLU scheme for agnostic 1D thresholds in Appendix C.2. Finally, it is also interesting to study the ticketed model for simpler tasks such as realizability testing; Appendix F contains initial results for realizability testing of 1D thresholds.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>ACKNOWLEDGMENTS</head><p>Part of the work was done when AS was a student researcher at Google Research, Mountain View during summer 2022. AS thanks Surbhi Goel, Sasha Rakhlin, and Karthik Sridharan for useful discussions, and acknowledges the support of NSF through awards DMS-2031883 and DMS-1953181, as well as DOE through award DE-SC0022199.</p><p>6. Otherwise, if S_j ⊆ S_ℓ for some j &lt; ℓ, we must have Q_{i_j} ⊆ Q_{i_ℓ} since Q_{i_j,1} ≤ Q_{i_ℓ,1}.</p><p>Figure <ref type="figure">3</ref>: Illustration of the lower bound reduction in Theorem 24, showing the dataset S constructed by Alice and the surviving dataset S ∖ S_{I_2}; red circles denote label 1 and green squares denote label 0.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.2. Upper Bounds for TiLU Schemes</head><p>We now describe a TiLU scheme for H_th with space complexity</p><p>that is valid for all distributions. We first describe the key ideas before delving into the finer details.</p><p>We construct a Learn algorithm that returns the predictor h_{>a} ∈ ERM_{H_th}(S) with the smallest possible a (henceforth referred to as the "minimal ERM").</p><p>For simplicity, we assume that |X| = D = 2^d is a power of 2 (the proof easily generalizes to other values). We use the following notation for all p, q ∈ X = {1, . . . , D} with p ≤ q:</p><p>• S_{p,q} denotes the sub-dataset of S with x-values in {p, . . . , q}.</p><p>• Let a_{p,q} be the smallest value such that h_{>a_{p,q}} ∈ ERM_{H_th}(S_{p,q}) and p − 1 ≤ a_{p,q} ≤ q; such a value exists because all examples in S_{p,q} have x-values between p and q, due to which L(h_{>a}; S_{p,q}) is the same for all a ≤ p − 1, and likewise the same for all a ≥ q.</p><p>• err_{p,q} := L(h_{>a_{p,q}}; S). Intuitively, err_{p,q} is the smallest loss incurred over S by any function h_{>a} subject to p − 1 ≤ a ≤ q.</p><p>Consider an interval [p, q] such that no example in S_I has an x-value in [p, q]. If the minimal ERM of S ∖ S_I happens to be of the form h_{>a} for some p − 1 ≤ a ≤ q, then the minimal ERM has to be h_{>a_{p,q}}. Moreover, we have L(h_{>a_{p,q}}; S ∖ S_I) = err_{p,q} − L(h_{>a_{p,q}}; S_I).</p><p>Learn. Consider a full binary tree of depth d with leaf x corresponding to each possible input value x ∈ X. For each internal node v, let p_v be the smallest leaf index under v, and similarly let q_v be the largest leaf index under v. For each example (x_i, y_i), let v_1, . . . , v_{d+1} be the nodes on the path from the root to the leaf x_i, and for each j ∈ {2, . . . , d + 1} let ṽ_j be the child of v_{j−1} that is the sibling of v_j. Figure <ref type="figure">4</ref> shows an example. 
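The per-interval quantities a_{p,q} and err_{p,q}, and the sibling intervals along a root-to-leaf path, can be sketched as follows (a hypothetical Python rendering with our own names; the Unlearn step, which stitches these candidates together over the partition induced by the unlearned x-values, is omitted):

```python
def L(a, S):
    # sum of mistakes of h_{>a} on S (the paper's unnormalized loss)
    return sum((x > a) != y for x, y in S)

def minimal_erm(S, p, q):
    """a_{p,q}: the smallest a in {p-1, ..., q} minimizing L(h_{>a}; S_{p,q})."""
    Spq = [(x, y) for x, y in S if p <= x <= q]
    return min(range(p - 1, q + 1), key=lambda a: (L(a, Spq), a))

def sibling_intervals(x, D):
    """The intervals [p_v, q_v] of the d sibling subtrees along the
    root-to-leaf path of x in a full binary tree over {1..D}."""
    out, p, q = [], 1, D
    while p < q:
        mid = (p + q) // 2
        if x <= mid:
            out.append((mid + 1, q)); q = mid   # sibling on the right
        else:
            out.append((p, mid)); p = mid + 1   # sibling on the left
    return out

def tree_learn(S, D):
    """aux = minimal ERM over the full domain; each example's ticket lists
    (p, q, a_{p,q}, err_{p,q}) for its d sibling subtrees.  (Duplicate
    examples share a ticket in this sketch.)"""
    tickets = {}
    for x, y in S:
        tickets[(x, y)] = [(p, q, minimal_erm(S, p, q),
                            L(minimal_erm(S, p, q), S))
                           for p, q in sibling_intervals(x, D)]
    return minimal_erm(S, 1, D), tickets
```

For D = 8 and x = 5 this reproduces the sibling subtrees {1..4}, {7, 8}, {6} from Figure 4, one (a, err) pair per tree level, matching the d(d + log n)-bit ticket count.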
Let the ticket corresponding to example (x_i, y_i) be given as t_i = ((a_{p_{ṽ_2}, q_{ṽ_2}}, err_{p_{ṽ_2}, q_{ṽ_2}}), . . . , (a_{p_{ṽ_{d+1}}, q_{ṽ_{d+1}}}, err_{p_{ṽ_{d+1}}, q_{ṽ_{d+1}}})).</p><p>Figure 4: Illustration of the TiLU scheme underlying the proof of Theorem 24(b) for |X| = 8. The ticket t_i for all examples of the form (x_i = 5, y_i) consists of the predictors h_{>a_6}, h_{>a_{7,8}}, h_{>a_{1,4}}, as well as err_6, err_{7,8} and err_{1,4}.</p><p>It is immediate to see that the number of bits in t_i is O(d(d + log n)). Define Learn(S) to return h = h_{>a_{1,D}}, aux = h, and the tickets t_i as specified above.</p><p>Unlearn. If S_I is empty, then simply return h (present in aux). Given a non-empty dataset S_I indexed by I ⊆ [n], we can use the tickets to construct a partition of [1, D] into disjoint intervals such that each interval either is a singleton {p} such that some example in S_I has x-value equal to p, or is of the form [p, q] such that no example in S_I has an x-value in [p, q]. As argued above, the minimal ERM for S ∖ S_I must be of the form h_{>a_{p,q}} for one of these parts. Moreover, it is possible to compute the loss of each such predictor over S ∖ S_I as L(h_{>a_{p,q}}; S ∖ S_I) = err_{p,q} − L(h_{>a_{p,q}}; S_I). Thus, we can choose the minimal ERM for S ∖ S_I by choosing, among the candidates above, the predictor h_{>a} with the smallest empirical loss over S ∖ S_I and the smallest possible value of a.</p><p>Appendix D. Missing Proofs from Section 4</p><p>D.1. Point Functions</p><p>Proof of Theorem 8(a) In the following, we provide the methods Learn and Unlearn. Our algorithm will use the TiLU scheme (Learn, Unlearn) for CtZ from Theorem 10. One specific property of this scheme we will use is that Unlearn does not need aux in the case where the unlearning set is non-empty.</p><p>Learn. 
We will separately define the hypothesis h, the auxiliary information aux, and the tickets.</p><p>• The hypothesis h is defined as follows:</p><p>• The auxiliary information aux consists of three parts aux_1, aux_2, aux_3. The first part aux_1 is simply h. The second part aux_2 is one bit and records which of the two cases we are in for h, i.e., aux_2 := 1{∃a ∈ X, (a, 1) ∈ S}. The last part aux_3 records min{b | (b, 0) ∉ S}.</p><p>• As for the tickets, for every a ∈ X, let S^{(a)} denote {(x_i, y_i) | i ∈ [n], x_i = a}. For every a ∈ X, we run Learn on S^{(a)} and let the tickets for the examples in S^{(a)} be as output by Learn.</p><p>Unlearn. Unlearn proceeds as follows, based on whether aux_2 = 1. Here, for every a ∈ X, let S_I^{(a)} denote the examples in S_I with x-value equal to a. To verify that this is a valid learning scheme, note that for the values a and j used in the algorithm, the sets S^{(a)} and S^{(j)} are non-empty. Therefore, by considering the cases S_I^{(a)} = S^{(a)} and S_I^{(a)} ⊊ S^{(a)}, the CtZ subschemes correctly report whether any example with the corresponding x-value survives. Due to this, our procedure correctly updates (i) whether any 1-labeled example remains and (ii) the minimum b such that (b, 0) does not appear in the dataset. From this, it is not hard to see that (Learn, Unlearn) is a valid ticketed LU scheme. The claimed space complexity then follows immediately from that of Theorem 10.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D.2. Product of d Thresholds</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D.2.1. MINIMUM VALUE PRIMITIVE</head><p>Theorem 25 There is a TiLU scheme for MINVAL with space complexity</p><p>Proof For each a ∈ X, we write succ_S(a) to denote the minimum value in S that is larger than a; if no such value exists, we let succ_S(a) = ⊥. Similarly, we use the convention that min(∅) = ⊥. As before, we use S^{(a)} (resp. S_I^{(a)}) to denote the multiset of elements of S (resp. S_I) equal to a, and let (Learn, Unlearn) be the ticketed LU scheme for CtZ from Theorem 10. We now describe our algorithms.</p><p>Learn. First, we let h = aux = min(S). Each ticket t_i consists of two components t_i^{(1)}, t_i^{(2)}. The second component t_i^{(2)} is simply set to succ_S(x_i). The first component is computed as follows: for every a ∈ X, we run Learn(S^{(a)}) and assign the resulting tickets back to the samples in S^{(a)} as their first components.</p><p>We now turn to the scheme for H^d_th. As for the tickets: if y_i = 0, then the ticket t_i is empty. For the 1-labeled samples, the ticket t_i is the concatenation of the corresponding MINVAL tickets for S^{(+)}_1, . . . , S^{(+)}_d, where S^{(+)}_j denotes the multiset of j-th coordinates of the 1-labeled samples. Similarly, the auxiliary information is the concatenation of the auxiliary information from these d MINVAL instances.</p><p>Unlearn. We use the MINVAL unlearning procedure to find b_j = min(S^{(+)}_j ∖ (S_I)^{(+)}_j), and then let a_j = b_j − 1 for each j ∈ [d]. Finally, output h_{>a}.</p><p>It is simple to see that the learning algorithm is correct: by definition, all 1-labeled examples are correctly labeled by h_{>a}. Furthermore, if the input training set is realizable by h_{>a*}, then we must have a*_j ≤ a_j for all j ∈ [d]; this implies that h_{>a} also labels all 0-labeled examples correctly. The correctness of the unlearning algorithm then immediately follows from the correctness of the MINVAL scheme.</p><p>Since the ticket is a concatenation of the d tickets for MINVAL, we can use the bound in Theorem 25. 
This implies that the total ticket size is O(d · (log m + log γ⁻¹(n))) = O(log |X| + d · log γ⁻¹(n)), as desired. A similar calculation shows that the auxiliary information has size O(log |X|).</p><p>unlearning request S_{I_i} to remove all 0-labeled samples except (2i + 1, 0), and all 1-labeled samples with x-values smaller than 2i + 1. There are two cases:</p><p>• X_i = 0: In this case, S ∖ S_{I_i} contains (2i + 1, 1) and is not H_th-realizable, meaning that we set Y_i = 0.</p><p>• X_i = 1: In this case, S ∖ S_{I_i} does not contain (2i + 1, 1) and thus is H_th-realizable, leading to Y_i = 1.</p><p>Therefore, Y_i = X_i in both cases, as desired. The lower bound follows since n = 2m.</p><p>(b) To obtain a TiLU scheme, the key idea is that in order to check whether S is realizable by a threshold, we only need to verify that the largest 0-labeled sample is strictly smaller than the smallest 1-labeled sample. The smallest 1-labeled sample can be computed using the MINVAL primitive from Theorem 25, and the largest 0-labeled sample using the MAXVAL primitive from Theorem 27, both of which have a TiLU scheme with space complexity (O(log |X|), O(log |X| + log γ⁻¹(n))).</p><p>With that in mind, we define Learn and Unlearn as follows.</p><p>Learn. On input S,</p><p>• Compute S^0 as the set of all 0-labeled samples in S, and S^1 = S ∖ S^0.</p><p>• Implement MAXVAL on S^0, let x_0 = max(S^0), and compute the corresponding tickets for the samples in S^0.</p><p>• Implement MINVAL on S^1, let x_1 = min(S^1), and compute the corresponding tickets for the samples in S^1.</p><p>• If either x_0 or x_1 is ⊥, or if x_0 &lt; x_1, set aux = (⊥, x_0, x_1), and return.</p><p>• Else, set aux = (⊤, x_0, x_1) and return.</p><p>Unlearn. On input aux = (a, x_0, x_1), unlearning requests S_I, and the corresponding tickets {t_z}_{z∈S_I}, return a if S_I is empty. 
Otherwise,</p><p>• If a = ⊥, return ⊥.</p><p>• Else, compute S_I^0 and S_I^1 as the sets of 0-labeled and 1-labeled delete requests, respectively. Similarly, define T_I^0 and T_I^1 as the tickets for the 0-labeled and 1-labeled samples in I, respectively.</p><p>• Compute the maximum 0-labeled sample in the remaining dataset via x_0 = MAXVAL.Unlearn(S_I^0, x_0, T_I^0).</p><p>• Compute the minimum 1-labeled sample in the remaining dataset via x_1 = MINVAL.Unlearn(S_I^1, x_1, T_I^1).</p><p>• If x_0 = ⊥ or x_1 = ⊥, return ⊥.</p><p>• Else if x_0 &lt; x_1, return ⊥. Else, return ⊤.</p><p>The MINVAL and MAXVAL primitives compute the minimum 1-labeled sample x_1 and the maximum 0-labeled sample x_0 in the surviving dataset. Since realizability testing w.r.t. H_th is equivalent to checking whether x_0 &lt; x_1, one can verify that the above algorithm is correct. Since the MINVAL and MAXVAL primitives can be executed using a TiLU scheme with space complexity (O(log |X|), O(log |X| + log γ⁻¹(n))), the claimed space complexity follows. </p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>© 2023 B. Ghazi, P. Kamath, R. Kumar, P. Manurangsi, A. Sekhari &amp; C. Zhang.</p></note>
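The realizability tester above can be sketched in Python for the distinct-values case (all names are ours; the paper's ⊥/⊤ verdicts are mapped to booleans with True meaning "realizable", and MINVAL/MAXVAL are rendered with the successor-style tickets, pointing one step toward the non-extreme end):

```python
def extreme_learn(values, largest=False):
    """MINVAL/MAXVAL sketch for distinct values: aux is the extreme value;
    each value's ticket is the next value toward the other end (or None)."""
    order = sorted(values, reverse=largest)
    tickets = {order[i]: (order[i + 1] if i + 1 < len(order) else None)
               for i in range(len(order))}
    return (order[0] if order else None), tickets

def extreme_unlearn(aux, removed, tickets):
    cur = aux
    while cur is not None and cur in removed:
        cur = tickets[cur]   # follow the unlearned element's ticket
    return cur

def rt_learn(S):
    """aux = (verdict, x0, x1): x0 = largest 0-labeled x-value,
    x1 = smallest 1-labeled x-value (None standing in for bottom)."""
    x0, t0 = extreme_learn([x for x, y in S if y == 0], largest=True)
    x1, t1 = extreme_learn([x for x, y in S if y == 1])
    ok = x0 is None or x1 is None or x0 < x1
    return (ok, x0, x1), (t0, t1)

def rt_unlearn(aux, removed0, removed1, tickets):
    ok, x0, x1 = aux
    if ok:
        return True          # removing examples preserves realizability
    t0, t1 = tickets
    x0 = extreme_unlearn(x0, removed0, t0)
    x1 = extreme_unlearn(x1, removed1, t1)
    return x0 is None or x1 is None or x0 < x1
```

As in the text, the only information needed centrally is the verdict and the two extremes; everything else travels in the per-example tickets.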
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_1"><p>For ease of notation, we let L denote the sum of losses over the dataset, as opposed to the more conventional average.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_2"><p>The tickets are sent back to the users contributing the corresponding examples, and not stored centrally.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_3"><p>We could consider a seemingly more relaxed variant where the distribution of the predictor returned by Unlearn equals the distribution of the predictor returned by Learn on the surviving examples, but given any such scheme, we could convert it to have a deterministic output by simply choosing a canonical predictor for each distribution.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_4"><p>We use S1 &#8746; S2 to denote the concatenation of the two datasets.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_5"><p>We note that the outputs ⊥ or ⊤ can be stored using only one bit.</p></note>
		</body>
		</text>
</TEI>
