<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Referee-Meta-Learning for Fast Adaptation of Locational Fairness</title></titleStmt>
			<publicationStmt>
				<publisher>AAAI</publisher>
				<date when="2024-03-25">2024-03-25</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10503376</idno>
					<idno type="doi">10.1609/aaai.v38i20.30197</idno>
					<title level='j'>Proceedings of the AAAI Conference on Artificial Intelligence</title>
<idno type="issn">2159-5399</idno>
<biblScope unit="volume">38</biblScope>
<biblScope unit="issue">20</biblScope>					

					<author>Weiye Chen</author><author>Yiqun Xie</author><author>Xiaowei Jia</author><author>Erhu He</author><author>Han Bao</author><author>Bang An</author><author>Xun Zhou</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[<p>When dealing with data from distinct locations, machine learning algorithms tend to demonstrate an implicit preference for some locations over others, which constitutes biases that sabotage the spatial fairness of the algorithm. This unfairness can easily introduce biases in subsequent decision-making given broad adoptions of learning-based solutions in practice. However, locational biases in AI are largely understudied. To mitigate biases over locations, we propose a locational meta-referee (Meta-Ref) to oversee the few-shot meta-training and meta-testing of a deep neural network. Meta-Ref dynamically adjusts the learning rates for training samples of given locations to advocate a fair performance across locations, through an explicit consideration of locational biases and the characteristics of input data. We present a three-phase training framework to learn both a meta-learning-based predictor and an integrated Meta-Ref that governs the fairness of the model. Once trained with a distribution of spatial tasks, Meta-Ref is applied to samples from new spatial tasks (i.e., regions outside the training area) to promote fairness during the fine-tuning step. We carried out experiments with two case studies on crop monitoring and transportation safety, which show Meta-Ref can improve locational fairness while keeping the overall prediction quality at a similar level.</p>]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Introduction</head><p>Locational bias has been widely studied in many social sectors and linked with social disparities of various types <ref type="bibr">(Fan et al. 2020;</ref><ref type="bibr">Kontokosta and Hong 2021)</ref>, such as the impact of climate change and related disasters (e.g., floods, food shortage), resource distribution (e.g., subsidies in agriculture), and infrastructure quality and safety. With the increasing adoption of machine learning methods across broad domains (e.g., climate resilience, food security, public resource management), fairness issues associated with these prediction models have become a major subject, directly impacting public trust and the sustained long-term use of these systems. While fairness issues related to races or genders have been extensively examined in machine learning <ref type="bibr">(Hardt, Price, and Srebro 2016;</ref><ref type="bibr">Dwork et al. 2012;</ref><ref type="bibr">Zafar et al. 2017;</ref><ref type="bibr">Agarwal et al. 2018;</ref><ref type="bibr">Creager et al. 2019;</ref><ref type="bibr">Kusner et al. 2017)</ref>, very few studies have attempted to consider locational fairness. The lack of consideration for locational fairness in machine learning applications may result in unintended consequences (e.g., biased resource distribution). In this study, we exemplify this issue by examining two important social problems: (1) Agricultural monitoring: Climate change and the growing population have raised alarms and attention on global food security (e.g., G20's GEOGLAM initiative). With the broad deployment of machine learning in satellite-based crop monitoring (e.g., NASA Harvest), it is critical to explicitly consider locational fairness in mapping results, informing key decisions such as subsidy distribution or farm insurance (Bailey and Boryan 2010; NASEM 2018). 
(2) Transportation safety: Given the complexity of traffic accident risk estimation, machine learning algorithms have been increasingly used to account for heterogeneous information from diverse sources. However, without awareness of locational fairness, these methods can produce biased risk maps, further adding bias to investments for infrastructure improvements <ref type="bibr">(Kontokosta and Hong 2021;</ref><ref type="bibr">Bednarek, Boyce, and Sileo 2022)</ref>.</p><p>We aim to create a new meta-learning framework that explicitly models locational fairness and enables rapid adaptation of fairness to new locations in different regions (e.g., different cities). Unlike traditional fairness formulations using pre-defined groups such as races and genders, locational fairness faces more challenges when transferred between training and test data. First, in traditional fairness definitions, the groups considered in fairness evaluation are the same in the training and test datasets. In contrast, locational fairness often deals with entirely different sets of locations in the training region (e.g., city A) and test region (e.g., city B). Second, the change of spatial regions between training and test also introduces distribution shifts in the data. Third, fields such as agricultural monitoring require labor-intensive field surveys, resulting in scarce availability of labeled data. Addressing these challenges requires the meta-learning model to learn initial weights that can quickly adapt not only to the new distribution, but also to an unknown fairness criterion (i.e., fairness defined on a new set of locations), from a limited amount of labeled data. Several directions have been explored in related work: Fair learning: Fairness-aware learning formulations have been extensively studied, and most existing works focus on pre-defined groups <ref type="bibr">(Mehrabi et al. 2021)</ref>. 
A mainstream direction is to minimize the correlation between learned features and sensitive attributes, such as gender or race. The approaches include sensitive information encryption or removal <ref type="bibr">(Kilbertus et al. 2018;</ref><ref type="bibr">Johndrow and Lum 2019)</ref>, feature decorrelation <ref type="bibr">(Zhao et al. 2022b)</ref>, agnostic representation learning <ref type="bibr">(Creager et al. 2019;</ref><ref type="bibr">Morales et al. 2020)</ref>, representation neutralization <ref type="bibr">(Du et al. 2021)</ref>, regularization <ref type="bibr">(Yan and Howe 2019)</ref>, and so on. However, these methods do not consider the scenario faced in this problem, where the groups represented by locations involved in fairness evaluation differ from training to test. Locational fairness: Recent studies <ref type="bibr">(Xie et al. 2022;</ref><ref type="bibr">He et al. 2022, 2023)</ref> examined fairness formulations with respect to locations, and they focus on the case where space partitions are used for fairness evaluation. Similarly, they only consider problems where the spatial region remains the same from training to testing, and cannot address the issue of non-stationary groups. Domain shifts: Domain adaptation methods mitigate covariate shift and learn domain-invariant representations to reduce the effects of distribution shifts on model bias <ref type="bibr">(Singh et al. 2021)</ref>. Sample-reweighting and self-training approaches <ref type="bibr">(Bickel, Brückner, and Scheffer 2007;</ref><ref type="bibr">An et al. 2022a;</ref><ref type="bibr">He et al. 2023</ref>) also aim to reduce the distribution gap between training and testing sets, either by assigning higher weights to training samples that are more similar to test samples feature-wise, or by including high-confidence pseudo-labels on test samples during training. 
In addition, heterogeneity-aware learning tackles variability by data partitioning and network branching <ref type="bibr">(Xie et al. 2021, 2023)</ref>. While these methods address distribution shifts, they also do not consider the changes of groups (locations) between training and test for fairness applications. Meta-learning: Model-agnostic meta-learning (MAML)'s gradient-by-gradient training allows it to learn an initial model that can be quickly fine-tuned to the test data with only a small number of observations <ref type="bibr">(Finn, Abbeel, and Levine 2017;</ref><ref type="bibr">Ren et al. 2018;</ref><ref type="bibr">Xie et al. 2023;</ref><ref type="bibr">Chen et al. 2023)</ref>. Recent developments have also started exploring the use of MAML in fairness-aware learning <ref type="bibr">(Zhao et al. 2020, 2022a)</ref>. These methods enable a fair model's prediction to remain independent of the sensitive attributes, and allow it to adapt to changing distributions. However, they similarly have not considered the case where training and test sets have completely different groups (locations). Moreover, for our targeted applications we focus on a different class of fairness definitions: prediction quality parity instead of protected-attribute decorrelation (though both are commonly used standard definitions) <ref type="bibr">(Zhang, Lemoine, and Mitchell 2018;</ref><ref type="bibr">Du et al. 2020</ref>).</p><p>We propose a referee-meta-learning framework to address these challenges. Our contributions are:</p><p>• We propose a locational meta-referee (Meta-Ref) which learns to dynamically adjust learning rates of data samples in a task to make the prediction model fairer for samples at different locations after the gradient updates. 
&#8226; We propose a three-phase training framework to update parameters of Meta-Ref and its corresponding prediction model using a distribution of spatial tasks.</p><p>&#8226; We experiment with real-world data for satellite-based crop classification and traffic accident risk estimation. Our results on crop monitoring and transportation safety show that Meta-Ref can effectively improve fairness over locations in new test regions while keeping aggregated global performances similar to the baselines.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Concepts and Problem Formulation</head><p>The goal of this work is to mitigate locational biases in the prediction results, i.e., to reduce the variation of model performance, or prediction quality disparity <ref type="bibr">(Du et al. 2020)</ref>, over locations in a spatial region. Definition 1 A location i is defined as a specific point or region in the geographical space, with s i representing all data points associated with the location i. A data point can be a one-dimensional vector, a time-series, an image, etc.</p><p>Definition 2 Locational fairness, L fair , measures the locational biases of a deep neural network F through the variation of the performance scores M = {m si } for data points from a set of distinct locations S = {s i } around M, the average performance over all data points. Definition 3 A spatial task T S refers to the set of geo-located data points S in a study area of interest, where a deep neural network F with parameters Θ is learned to make predictions. T S also defines the set of locations where locational fairness is evaluated. Fig. <ref type="figure">1</ref>(a) shows an example of sampling spatial tasks for training and testing, with counties as distinct locations. Fig. <ref type="figure">1(b)</ref> illustrates that a location may be associated with non-time-series and/or time-series data. A standard machine learning algorithm does not consider locational fairness, potentially resulting in an unfair distribution of prediction quality scores across the spatial task. Conversely, we expect a fairness-driven algorithm to enhance parity among locations in terms of prediction quality scores. Problem Formulation. Given a set of spatial tasks {T S1 , T S2 , ...} with associated features X and labels y from training locations, we aim to train a deep neural network F Θ (•) with awareness of locational fairness (Eq. ( <ref type="formula">2</ref>)). 
The goal is that F Θ (•) can be quickly adapted to a new spatial task T S ′ from test locations, where T S ′ ∩ {T S1 , T S2 , ...} = ∅, using only a small number of test samples X ′ and y ′ (Fig. <ref type="figure">1(c)</ref>). The adaptation should consider both the prediction and fairness objectives.</p></div>
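Definition 2's displayed equation was lost in extraction, but both the definition and the Evaluation Metrics section describe locational fairness as the spread of per-location performance scores around their mean. A minimal sketch of that quantity, assuming the standard-deviation form used later for the LF metric (the paper's exact Eq. (1) may differ in detail):

```python
import numpy as np

def locational_fairness(scores):
    """Locational unfairness as the spread of per-location performance
    scores m_si around their mean M (lower = fairer). A sketch of
    Definition 2; the paper's exact functional form may differ."""
    m = np.asarray(scores, dtype=float)
    return float(np.sqrt(np.mean((m - m.mean()) ** 2)))
```

A perfectly fair model (identical per-location scores) attains the minimum value of zero.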
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Method</head><p>In this section, we introduce our locational meta-referee (Meta-Ref). Meta-Ref works in conjunction with any neural network-based prediction model and dynamically assigns learning rates to mini-batches to enforce fairness. Trained using a meta-learning framework, Meta-Ref can adapt to various spatial tasks and can be fine-tuned for unseen ones. We provide details on its structure as well as its training and transfer strategies in the following sections.</p></div>
<figure xmlns="http://www.tei-c.org/ns/1.0"><figDesc>Figure 1: (a) Sampling spatial tasks for training and testing; (b) a location s i ∈ S and its data points, where a small set of available labeled data (non-time series) or the labels available at the beginning timestamps (time series) is used for fine-tuning/adaptation; (c) adaptation of fairness for an unseen spatial task.</figDesc></figure>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Model-Agnostic Meta-Learning</head><p>Model-Agnostic Meta-Learning (MAML) (Finn, Abbeel, and Levine 2017) is a scheme that trains a generalizable model which can be quickly adapted to new tasks through gradient-by-gradient update strategies. It learns through gradient updates over a set of tasks and finds a gradient leading to better generalization. The goal of MAML is to learn an initial model from a distribution of tasks such that the initial model can be quickly fine-tuned to the optimal parameters of individual tasks using only a few samples. Given a distribution of tasks {T i |T i ∼ p(T )}, each gradient update in MAML is given by the inner update Θ ′ i = Θ − α ∇ Θ L(X (i) , F Θ , y (i) ), followed by the meta-update Θ ← Θ − β ∇ Θ Σ i L( X̃ (i) , F Θ ′ i , ỹ (i) ), where Θ ′ i represents temporary parameters of the deep neural network F for a task T i ; α and β are the step-size hyperparameters of the gradient updates; X (i) and y (i) are the training mini-batches, and X̃ (i) and ỹ (i) are the validation mini-batches for task T i , respectively.</p><p>This combination of task-specific and global gradient updates simulates the scenario encountered in testing, where we are given one initial model and aim to reach good performance on a new task after updating with a mini-batch. Thus, using the sequence of gradient updates (gradients of gradients), MAML learns a set of parameters that are not necessarily optimal for any given task, but can be quickly adapted to one specific task with a small number of data points. In this work, we define the tasks using spatial tasks {T S }, which contain geo-located data points from different spatial regions.</p></div>
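The MAML update can be illustrated on a toy 1-D problem where each task's loss is quadratic, so the gradient-through-gradient term can be written analytically. This is an illustrative sketch, not the paper's model; `maml_step` and the quadratic task losses are our own constructions:

```python
def maml_step(theta, tasks, alpha=0.1, beta=0.05):
    """One MAML meta-update on toy 1-D tasks with loss L_c(t) = (t - c)^2.
    Inner update: t'_c = t - alpha * dL_c/dt. The outer gradient flows
    through t'_c (gradient-by-gradient); here it is computed analytically."""
    meta_grad = 0.0
    for c in tasks:
        inner_grad = 2.0 * (theta - c)        # dL_c/dt at the initial theta
        theta_c = theta - alpha * inner_grad  # task-adapted (temporary) params
        # d/dtheta (theta_c - c)^2 = 2*(theta_c - c)*(1 - 2*alpha):
        # the (1 - 2*alpha) factor is the gradient-through-gradient term.
        meta_grad += 2.0 * (theta_c - c) * (1.0 - 2.0 * alpha)
    return theta - beta * meta_grad
```

With two symmetric tasks (optima at +1 and -1), repeated meta-updates drive the initialization toward the point equidistant from both task optima, illustrating why MAML's solution is optimal for no single task but quick to adapt to each.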
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A Locational Meta-Referee</head><p>We propose a locational meta-referee (Meta-Ref), which aims to adapt fairness learned from a distribution of spatial tasks {T S1 , T S2 , ...} in the training area to a new spatial task T S ′ in the test region. Meta-Ref estimates location-specific learning rates from three types of inputs: (1) The per-location prediction losses and performance metrics of the data samples; (2) The global performance metrics, used to benchmark the level of locational fairness and convert absolute performance metrics to relative scores; and (3) The encoding generated by F over data samples. The encoding reflects the characteristics of samples that can better guide the learning rate estimation. For example, a large loss may not always entail a high learning rate, as a sample may be a very difficult case whose loss can hardly be further reduced without causing significant negative impacts on other samples. Denoting the encoding process of the prediction model as F enc , Meta-Ref takes as inputs the encodings F enc (X) together with the performance metrics yielded by the prediction model on data samples. The performance metrics are further standardized by subtracting the global performance M to obtain relative performances, so that Meta-Ref becomes invariant of the state of the overall performance, making it more transferable. Formally, we represent Meta-Ref as a neural network F M R with parameters W, which outputs a fairness factor η i for each location s i in a spatial task T S . The fairness factors will be translated into learning rates during meta-training, detailed in the next section. </p></div>
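As one concrete (hypothetical) reading of Meta-Ref's interface, the sketch below maps a location's relative performance, gradient magnitude, and sample encoding to a scalar fairness factor with a tiny MLP. The architecture, input layout, and all names here are illustrative assumptions; the paper specifies only the three input types, not the network design:

```python
import numpy as np

def meta_ref_forward(W1, W2, rel_perf, grad_norm, enc):
    """Minimal sketch of Meta-Ref F_MR: a 2-layer MLP producing a fairness
    factor eta_i for one location. Inputs: the location's relative
    performance (m_i minus global M), its gradient magnitude, and a sample
    encoding from F_enc. Feeding only the *relative* performance keeps the
    output invariant to the overall performance level, as in the paper."""
    x = np.concatenate(([rel_perf, grad_norm], enc))  # assemble input vector
    h = np.tanh(W1 @ x)                               # hidden layer
    return float(W2 @ h)                              # scalar fairness factor
```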
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Three-Phase Training of Meta-Ref</head><p>We apply MAML's gradient-by-gradient update strategies to train both Meta-Ref and the prediction model. Below, we present a three-phase framework for the training process.</p><p>Generation of spatial task distribution T (S). Before the start of the three-phase training, it is important to first generate a diverse distribution of spatial tasks so that we can learn a more transferable initial set of parameters for the prediction model F in conjunction with Meta-Ref. We combine two strategies to generate spatial tasks for the distribution T (S):</p><p>(1) Locations are grouped according to administrative boundaries (e.g., cities, states), zones, or their attributes (e.g., precipitation, altitude) to create spatial tasks. To enlarge the variety of spatial tasks, the elements of two spatial tasks do not need to be disjoint, i.e., it is possible that ∃S i , S j ∈ {S} : S i ∩ S j ≠ ∅.</p><p>(2) To create spatial tasks that are more distinctive from regular ones from the training area, we further include spatial tasks whose locations are randomly sampled from the entire training area.</p><p>Phase 1: Prediction performance estimation. Given the data points (X (j) i , y (j) i ) at location s (j) i in the spatial task T Sj , we assess the prediction loss l (j) i and the performance metric m (j) i . We apply the same procedure to all locations in a spatial task T Sj , obtaining the per-location losses and metrics L (j) and M (j) . With loss values calculated for all locations, we compute the gradients of each location as g (j) i . Additionally, we evaluate the overall performance M on all training data points (X, y), regardless of their location of origin.</p><p>Phase 2: Fairness-aware learning rate estimation. 
Using the outputs L (j) , M (j) , g (j) i , and M from Phase 1, Meta-Ref dynamically adjusts the step sizes of the gradients associated with data points at different locations in a spatial task, where β (j) i represents the learning rate of data points in location s (j) i , assigned by Meta-Ref, and Θ ′(j) denotes the temporary parameters of F obtained through updates on data points of all locations. Different from MAML (Eq. ( <ref type="formula">2</ref>)), step sizes are no longer identical over all data points in a mini-batch; instead, they become dependent on the locations and spatial tasks.</p><p>Step sizes are assigned by translating the Meta-Ref-generated fairness factors N (j) = {η (j) i | i ∈ [1, |S j |]} with Eq. ( <ref type="formula">4</ref>). Specifically, we perform the translation by standardizing the fairness factors from all locations within each spatial task into a stationary range to improve the stability of training, where each η (j) i is computed from the relative performance m (j) i − M , and β + and β − represent the upper and lower limits of the step size of the gradient update, respectively.</p><p>To stabilize training at the early stage, we constrain the variance of the learning rates across data samples from different locations, var(β (j) i ), by adjusting β + and β − . Given a baseline learning rate β 0 and a scaling factor ρ, we set the upper and lower bounds of β (j) i at iteration t as β + = β 0 / (1 + e^(−t/ρ)) and β − = β 0 · e^(−t/ρ) / (1 + e^(−t/ρ)). The gap between these bounds expands gradually through a sigmoid-shaped curve as training progresses; after the early training stage, the learning rates approach a constant range, allowing more flexibility to improve fairness.</p><p>Finally, the location-dependent learning rates are applied via Eq. 
( <ref type="formula">7</ref>) for gradient updates on the temporary prediction model parameters within the inner meta-training loop.</p><p>Phase 3: Dual meta-updates. In this dual meta-update phase, we consider both the prediction performance and the locational fairness to make the final gradient updates. Specifically, we use two different losses on the validation data for a spatial task T Sj , the prediction loss L (j) and the locational fairness loss L (j) fair , to meta-update the parameters of the prediction model and Meta-Ref. The prediction loss measures the collective performance of the temporarily updated prediction model parameters Θ ′(j) , and the locational fairness loss reflects the effectiveness of the coordination between Meta-Ref and the prediction model. We compute the two losses on ( X̃ (j) , ỹ (j) ), the validation data from the whole S j , and on ( X̃ (j) i , ỹ (j) i ), the validation mini-batches sampled from each location s i .</p><p>For the prediction loss L (j) , we use its gradients to update only the prediction model. For L (j) fair , we use its gradients to update both the prediction model and Meta-Ref. In this way, Meta-Ref focuses on the fairness side, and it coordinates with the prediction model to address both prediction performance and fairness, where α 1 , α 2 , and α 3 are the learning rates set for the three meta-update operations, respectively. Through Eqs. (10-14), we can see that each of the three meta-updates involves gradients over gradients (since Θ ′(j) remains in expanded form in Eqs. (12-14)). This makes the gradient updates consider the contribution of each location in a spatial task, and mimics the actual fine-tuning process during testing. Algorithm 1 summarizes the procedure of the three-phase training of Meta-Ref.</p></div>
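The two task-generation strategies above can be sketched as follows, with a random square window over (x, y) coordinates standing in for administrative or attribute-based groupings; all function names and parameters here are illustrative assumptions:

```python
import random

def generate_spatial_tasks(locations, n_window_tasks, n_random_tasks,
                           task_size, window, rng=None):
    """Sketch of the two strategies: (1) group nearby locations via a
    random square window (a stand-in for administrative/attribute
    groupings), (2) sample locations uniformly from the whole training
    area. Tasks may overlap, as the paper allows."""
    rng = rng or random.Random(0)
    xs = [x for x, _ in locations]
    ys = [y for _, y in locations]
    tasks = []
    for _ in range(n_window_tasks):
        x0 = rng.uniform(min(xs), max(xs) - window)
        y0 = rng.uniform(min(ys), max(ys) - window)
        inside = [l for l in locations
                  if x0 <= l[0] <= x0 + window and y0 <= l[1] <= y0 + window]
        if len(inside) >= task_size:
            tasks.append(rng.sample(inside, task_size))
    for _ in range(n_random_tasks):  # strategy (2): area-wide random tasks
        tasks.append(rng.sample(locations, task_size))
    return tasks
```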
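The sigmoid-scheduled step-size bounds of Phase 2 are stated explicitly and can be implemented directly. The translation of fairness factors into that range is shown as a simple min-max standardization, which is our assumption; the paper's exact standardization of η may differ:

```python
import math

def lr_bounds(t, beta0, rho):
    """Upper/lower step-size limits at iteration t (paper's schedule):
    beta+ = beta0 / (1 + e^(-t/rho)), beta- = beta0 * e^(-t/rho) / (1 + e^(-t/rho)).
    At t = 0 the two coincide (zero variance); the gap grows toward
    [0, beta0] along a sigmoid-shaped curve."""
    s = 1.0 / (1.0 + math.exp(-t / rho))
    return s * beta0, (1.0 - s) * beta0  # (beta+, beta-)

def fairness_factors_to_lrs(etas, t, beta0=1e-3, rho=100.0):
    """Standardize Meta-Ref fairness factors into [beta-, beta+] via
    min-max rescaling (our assumption for the translation step)."""
    hi, lo = lr_bounds(t, beta0, rho)
    mn, mx = min(etas), max(etas)
    if mx == mn:
        return [0.5 * (hi + lo)] * len(etas)
    return [lo + (e - mn) / (mx - mn) * (hi - lo) for e in etas]
```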
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Fine-Tuning on Test Region</head><p>Meta-trained parameters of the prediction model are expected to demonstrate good generalizability, but are not necessarily optimal for any single task, and are therefore fine-tuned on the test region. We do not, however, fine-tune Meta-Ref itself, considering that it is not directly related to the prediction performance; keeping it fixed also avoids overfitting.</p><p>Given a spatial task in the test area T S ′ , where T S ′ ∩ {T S1 , T S2 , ...} = ∅, we fine-tune the prediction model on the test data X ′ and y ′ in a slightly different fashion than the three-phase meta-updates. The fine-tuning has two phases: 1) Prediction performance estimation, and 2) Meta-Ref-guided optimization.</p><p>Algorithm 1: Three-Phase Training of Meta-Ref
Require: p(T S ): distribution of spatial tasks
Parameters: α 1 , α 2 , α 3 , β 0 , ρ
1: Sample a batch of spatial tasks T = {T Sj ∼ p(T )}
2: for all T Sj ∈ T do
3: [Phase 1]
4: for all s (j) i ∈ S j do
5: Evaluate the local prediction loss l (j) i
6: end for
7: Evaluate the global prediction loss M = M (X, F Θ , y)
8: [Phase 2]
9: Assign step sizes β (j) i with Eqs. (8-9) for all s (j) i
10: Update the temporary parameters Θ ′(j) with Eq. (7)
11: [Phase 3]
12: Evaluate L (j) with Eq. ( <ref type="formula">10</ref>)
13: Evaluate L (j) fair with Eq. ( <ref type="formula">11</ref>)
14: Update Θ with the gradients of L (j) (Eq. (12))
15: Update Θ with the gradients of L (j) fair (Eq. (13))
16: Update W with the gradients of L (j) fair (Eq. (14))</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>17: end for</p><p>Phase 1 of fine-tuning is similar to Phase 1 of training Meta-Ref on a spatial task, with the differences being that the input data come from a different region, and that the overall performance metric M is still calculated from the training data. Formally, we evaluate the local prediction loss at every location in T S ′ .</p><p>Phase 2 follows with the assignment of learning rates to all locations within this spatial task, following Eqs. (8-9) using Meta-Ref, producing β ′ i for each location in the test region. Then we optimize Θ with gradient updates weighted by β ′ i . This two-phase fine-tuning effectively simulates the behavior of Eq. ( <ref type="formula">7</ref>), where we update the prediction model with Meta-Ref-assigned learning rates.</p></div>
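The two-phase fine-tuning loop can be sketched abstractly: evaluate per-location losses, ask Meta-Ref for per-location learning rates, and take one location-weighted gradient step. `loss_grad` and `meta_ref_lr` are hypothetical callables standing in for the prediction model and the (frozen) Meta-Ref:

```python
def finetune(theta, locations, loss_grad, meta_ref_lr):
    """Two-phase fine-tuning sketch: (1) evaluate per-location losses,
    (2) apply Meta-Ref-assigned per-location learning rates beta'_i in a
    single aggregated gradient step. `loss_grad(theta, s)` returns
    (loss, grad) for location s; `meta_ref_lr(losses)` maps the loss
    profile to learning rates. All names are illustrative."""
    losses, grads = zip(*(loss_grad(theta, s) for s in locations))
    betas = meta_ref_lr(losses)                      # beta'_i per location
    step = sum(b * g for b, g in zip(betas, grads))  # location-weighted step
    return theta - step
```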
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Experiments Case Study Datasets</head><p>Satellite-based crop classification: Crop mapping is important for various downstream tasks including acreage estimation, subsidy distribution, and farm insurance. Our study area is a ∼6,700 km² region in Central Valley, California, which is a major region in the US for walnut plantations. The satellite imagery we use is from Sentinel-2 multispectral data. As the 10 spectral bands we use from Sentinel-2 have different spatial resolutions (e.g., 10 m, 20 m), we resample all bands to 20 m (with an image tile size of 4096 × 4096), a common choice in applications. The image tile was captured in August 2018. The labels are from the USDA Crop Data Layer (CDL) (CDL 2017). For walnut plantation mapping, we preprocess the labels into binary walnut and non-walnut classes. Since the fairness paradigm is based on prediction quality parity, the fairness calculation needs the performances from the locations as inputs. For classification, this requires a certain level of aggregation (e.g., F1 or accuracy is not meaningful for an individual point). In our experiment, locations are thus represented by 128×128 non-overlapping local patches from the image tile instead of individual pixels. We use a 50%/50% train-test split for the locations. In a test location, 5% of randomly sampled data points are used for fine-tuning. Each spatial task is randomly sampled from training or test locations covered by a random 1280 × 1280 window (a ∼25 km × 25 km region), with the number of locations ranging from 10 to 15 per spatial task. Traffic accident risk estimation: Location-based biases in transportation safety estimation can further lead to biases in investment distribution for infrastructure improvements. We use the Iowa traffic accident record dataset shared by <ref type="bibr">(An et al. 2022b)</ref>, which contains 3 years of traffic accident records and 47 related factors. 
The dataset has a daily temporal resolution and was spatially aggregated into grid cells, with a total grid size of 64 × 128 <ref type="bibr">(An et al. 2022b)</ref>. We use each cell as a location for this regression problem. We partition the dataset into 8-week moving windows, where we use factors in the first 7 weeks to predict the average daily count of accidents in the eighth week. Similarly, we use a 50%/50% train-test split for the locations. In a test location, only the first 5% of moving windows are used for fine-tuning. Each spatial task is randomly sampled from training or test locations in a randomly selected 32 × 32 window, with 10 to 15 locations per spatial task.</p></div>
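The patch-based notion of "location" and the 50%/50% split can be sketched with two small helpers; the patch size and split ratio are the paper's, but the helpers themselves are our illustrative constructions:

```python
import numpy as np

def tile_to_locations(tile, patch=128):
    """Split an image tile (H, W, bands) into non-overlapping patch
    'locations', as in the crop-mapping setup (128x128 patches over a
    4096x4096 tile in the paper)."""
    H, W = tile.shape[:2]
    return [tile[r:r + patch, c:c + patch]
            for r in range(0, H - patch + 1, patch)
            for c in range(0, W - patch + 1, patch)]

def split_locations(n_locations, train_frac=0.5, seed=0):
    """Random train/test split over location indices (50%/50% by default)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_locations)
    k = int(train_frac * n_locations)
    return idx[:k], idx[k:]
```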
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Methods for Comparison</head><p>We evaluate the following methods in terms of prediction performance and, particularly, locational fairness: (1) </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Evaluation Metrics</head><p>To evaluate the prediction quality of a test spatial task T S &#8242; from crop classification, we use F1-score p i to account for the prediction quality for data points at location s i &#8712; S &#8242; . For a regression task in traffic accident risk estimation, we adopt the root mean squared error (RMSE) for p i . We assess the locational fairness (LF) on the test spatial task from the evaluation metrics at each location by taking their standard deviation. Some methods producing poor prediction quality might achieve better locational fairness on some spatial tasks. Therefore, we also include an adjusted locational fairness (ALF) metric to account for differences in prediction quality when evaluating fairness. Instead of using the mean prediction performance from the current method to calculate the standard deviation, we use the best prediction performance for this task among all methods as the reference mean, denoted as p * . Then the adjusted fairness score for a spatial task is defined as the average deviation of the prediction quality {p i } from this reference p * :</p><p>By setting the mean to the best prediction performance among all methods, methods that trade performance for fairness (i.e., low variance among {p i } but higher distance to p * ) will be penalized and produce worse scores in ALF.</p></div>
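The LF and ALF metrics can be written down directly. Since ALF's displayed equation was lost in extraction, the `alf` form below (a deviation computed around the reference p* instead of the method's own mean) is a reconstruction from the surrounding text and may differ from the paper's exact expression:

```python
import numpy as np

def lf(perf):
    """Locational fairness: standard deviation of per-location metrics
    {p_i} (lower = fairer)."""
    p = np.asarray(perf, dtype=float)
    return float(np.sqrt(np.mean((p - p.mean()) ** 2)))

def alf(perf, p_star):
    """Adjusted locational fairness: deviation of {p_i} measured around the
    best method's performance p* rather than this method's own mean, so
    that trading overall quality for low variance is penalized."""
    p = np.asarray(perf, dtype=float)
    return float(np.sqrt(np.mean((p - p_star) ** 2)))
```

A method with uniformly poor scores attains LF = 0 but a large ALF, which is exactly the failure mode ALF is designed to expose.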
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results</head><p>Training: We sample 1000 spatial tasks from the training locations of each dataset. Then we sample another 90 spatial tasks from the test locations of each dataset and split them into three folds for testing. All methods are fine-tuned on each of the 90 spatial tasks in the test regions for both datasets.</p><p>Comparison to baselines: Tables <ref type="table">1</ref> and <ref type="table">2</ref> report the comparison results. In Fig. <ref type="figure">3</ref> and also in the technical appendix, we demonstrate pairwise comparison matrices, where each element indicates the number of spatial tasks on which the row method has lower fairness metrics (LF or ALF) than the column method. It shows that Meta-Ref maintains better fairness on most spatial tasks compared to the baselines. In addition, as shown in Fig. <ref type="figure">4</ref>, Meta-Ref demonstrates fairer predictions than MAML on most tasks, confirming its effectiveness.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Sensitivity analysis:</head><p>The training of Meta-Ref relies on three outer gradient updates (Eqs. (12)-(14)) to coordinate the performance loss and the fairness loss with their impacts on the prediction model and the meta-referee. To demonstrate the effectiveness of the three outer gradient updates in Meta-Ref, we further conduct an ablation study with the following models:</p><p>(1) MR-P2P: Meta-Ref without applying the performance gradient to the prediction model (P2P, Eq. ( <ref type="formula">12</ref>)); (2) MR-F2M: Meta-Ref without applying the fairness gradient to the meta-referee (F2M, Eq. ( <ref type="formula">13</ref>)); and (3) MR-F2P: Meta-Ref without applying the fairness gradient to the prediction model (F2P, Eq. ( <ref type="formula">14</ref>)). </p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>The Thirty-Eighth AAAI Conference on Artificial Intelligence </p></note>
		</body>
		</text>
</TEI>
