<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>RLIFE: Remaining Lifespan Prediction for E-scooters</title></titleStmt>
			<publicationStmt>
				<publisher>ACM</publisher>
				<date>10/21/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10560392</idno>
					<idno type="doi"></idno>
					
					<author>S Zhong</author><author>W Yubeaton</author><author>W Lyu</author><author>G Wang</author><author>D Zhang</author><author>Y Yang</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Shared electric scooters (e-scooters) have become increasingly popular because of their convenience and eco-friendliness. Due to their shared nature and widespread usage, e-scooters usually have a short lifespan (e.g., two to five months [2]), which makes it important to predict the remaining lifespan accurately so that replacements are timely. While several studies have focused on lifespan prediction for various systems, such as batteries and bridges, they have two drawbacks. First, they require significant manual labor or additional sensor resources to ascertain the explicit status of the object, which makes them cost-ineffective. Second, they assume that future usage is similar to historical usage. To address these limitations, we aim to accurately predict the remaining lifespan of e-scooters without extra cost; the essence is to accurately represent an e-scooter's current status and anticipate its future usage. This is challenging because: i) there are no explicit rules for representing e-scooters' status; and ii) e-scooters' future usage may differ significantly from their historical usage. In this paper, we design a framework called RLIFE, whose key insight is that modeling user behaviors from trip transactions is of great importance in predicting the Remaining Lifespan of shared E-scooters. Specifically, we introduce an unsupervised contrastive learning component to learn the e-scooters' status representation over time considering degradation, where user preferences serve as a status reflector. We further design an LSTM-based recursive component to dynamically predict uncertain future usage, upon which we fuse the current status and predicted usage of the e-scooter for its remaining lifespan prediction. Extensive experiments are conducted on large-scale, real-world datasets collected from an e-scooter company. 
The results show that RLIFE improves on the baselines by 35.67% and benefits from the learned user preferences and predicted future usage.
CCS CONCEPTS • Information systems → Data mining.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>Shared electric micromobility has become increasingly popular in recent years. Take e-scooters as a concrete example. Lime <ref type="bibr">[4]</ref> served more than 55 million customers in 2021 and is projected to serve 124.8 million users in 2026 <ref type="bibr">[1]</ref>. Compared with traditional human-powered bikes, e-scooters provide a faster and easier way to solve the first- and last-mile problem during commuting, using battery-powered motors with speeds of up to 50 km per hour <ref type="bibr">[3]</ref>. Due to their shared nature and widespread usage, e-scooters typically suffer from a short lifespan, ranging from two to five months <ref type="bibr">[2]</ref>. Such a short lifespan makes it necessary to maintain or replace e-scooters in a timely manner, before they become unserviceable, in order to ensure a positive customer experience and prevent potential safety hazards. To this end, it is important to predict the remaining lifespan of e-scooters accurately.</p><p>To date, the remaining lifespan prediction problem has been studied in many systems, e.g., rail infrastructures <ref type="bibr">[22]</ref>, batteries <ref type="bibr">[9]</ref>, and bridges <ref type="bibr">[23]</ref>. Existing works rely heavily on deploying dedicated sensors to collect explicit status indicators, e.g., state of health (SOH) in batteries <ref type="bibr">[9]</ref>. Based on the sequentially collected data, <ref type="bibr">[9,</ref><ref type="bibr">38,</ref><ref type="bibr">39]</ref> leverage neural networks (e.g., RNN, LSTM) to learn the non-linear degradation curve for the measured target, e.g., battery life curves <ref type="bibr">[9]</ref>. 
However, those frameworks cannot be applied directly in our scenario because: 1) the learned life curve typically works in ideal environments without considering uncertain noise; and 2) e-scooters are sophisticated machines with multiple components (e.g., wheels and batteries), so different kinds of sensors are needed for status monitoring. Sensor deployment requires significant labor and expense, rendering it cost-ineffective. The limitations of the existing works motivate us to answer a research question: can we predict the remaining lifespan of shared e-scooters without additional dedicated sensor deployment?</p><p>In this work, we collaborate with an e-scooter company to learn the degradation process of e-scooters in a data-driven manner. This collaboration offers us the opportunity to predict the remaining lifespan based on large-scale operational data without extra labor or sensor deployment. Through detailed data analysis, we found that it is important to consider both e-scooters' current status and predicted future usage in lifespan prediction (supported by <ref type="bibr">Figures 2,</ref><ref type="bibr">3,</ref><ref type="bibr">4)</ref>. Though it sounds straightforward, there are two challenges:</p><p>&#8226; Lack of explicit rules for e-scooters' status representation and lack of explicit correlations between status and remaining lifespan. One straightforward approach is to use the served distance to estimate the status of e-scooters. Intuitively, a longer served distance leads to more significant wear and tear and consequently to a shortened lifespan. However, we found that the correlation coefficient between the served distance and the corresponding remaining lifespan is only 0.6302 (as depicted in Section 2). 
This relatively modest correlation arises because the longevity of e-scooters is affected not only by the distance traveled but also by other non-observable factors, e.g., weather conditions, riding habits, and accidents <ref type="bibr">[2]</ref>.</p><p>&#8226; The future usage of e-scooters deviates considerably from their historical patterns. Specifically, the daily trip distance decreases as the "age" of e-scooters increases (as shown in Section 2). For example, the average daily trip distance during the first 10% of the lifespan is 14.3% more than that in the last 10%.</p><p>To solve these challenges, we design a framework called RLIFE to predict the Remaining LIFespan of shared E-scooters. The key insight is that modeling user behaviors from trip transactions is of great importance in remaining lifespan prediction (detailed in Sec. 2). The rationale behind this insight is two-fold: i) user behavior patterns indirectly reflect e-scooter status; and ii) user behavior trends can also provide valuable insights into future e-scooter usage. For instance, when faced with multiple nearby e-scooters, users typically opt for those in better condition, such as those without broken parts or with a pristine appearance. Furthermore, frequent usage is associated with accelerated wear and tear, ultimately resulting in a shortened lifespan. Drawing from this insight, we devise two main components for our approach: (i) self-supervised e-scooter status representation learning, and (ii) user preference evolution prediction. In component (i), we design an unsupervised contrastive learning method, which learns the e-scooter's degradation status representation over time, where the trip records and user preferences serve as the direct and indirect reflectors, respectively. For component (ii), we train a recursive layer to project the user preferences after Δ days. 
Finally, we fuse the learned current status and the Δ-day user preference to estimate the future status. By varying the value of Δ, we can estimate the future status of the e-scooter. Once the estimated status indicates that the end of life is approaching, we consider Δ as the remaining lifespan starting from the present moment. The key contributions of this paper are summarized as follows:</p><p>&#8226; We study, for the first time, the remaining lifespan prediction problem for e-scooters without extra dedicated sensors. It incorporates the current status representation learning and future usage estimation from operational data.</p><p>&#8226; We highlight the importance of modeling user preferences from transactions in the status learning and usage estimation. Specifically, we design an unsupervised contrastive learning framework to discriminate the lifespan status representation without annotations and a recursive layer to predict the dynamic future usage over a given Δ days.</p><p>&#8226; We evaluate RLIFE with 9-month data collected from an e-scooter company in two cities. The results show that RLIFE improves prediction accuracy by 35.67% and 29.81% compared with the state-of-the-art (SoA) methods in the two cities. The code and the data are available<ref type="foot">foot_0</ref>.</p><p>The rest of the paper is organized as follows. In Section 2, we introduce the datasets, analyze the challenges and the key insight, and provide the formal definition of the problem. We present the technical design in Section 3, including the overview of RLIFE and the detailed design. In Section 4, we evaluate the performance of RLIFE to show its effectiveness compared with baselines. We review related works in Section 5. Finally, we discuss the lessons learned, the limitations, future works, and privacy issues in Section 6 and conclude the paper in Section 7.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">BACKGROUND AND MOTIVATION</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Data</head><p>In this work, we mainly use two datasets: an e-scooter trip record dataset and a weather dataset.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.1">E-scooter Dataset.</head><p>By collaborating with an e-scooter company, one of the major shared e-scooter service providers, we have access to real-world datasets in two cities of New Jersey, USA:</p><p>&#8226; In New Brunswick, the data is collected with 1,179 e-scooters and 118,609 trips in 9 months from April to December in 2021;</p><p>&#8226; In Newark, the data is collected with 639 e-scooters and 50,631 trips in 4 months from August to December in 2021.</p><p>Each trip record captures data from the point a user picks up an e-scooter until the point the user drops it off, including vehicle ID, trip start and end time, and trip routes (i.e., GPS traces). All the data are obtained legally with the users' consent [6]. The detailed data format is listed in Table <ref type="table">1</ref>.</p>
<table><head>Table 1: Trip Record Format and Example</head>
<row role="label"><cell>Field</cell><cell>Value</cell></row>
<row><cell>Vehicle ID</cell><cell>50109575</cell></row>
<row><cell>Trip ID</cell><cell>d0980blf-59af-5944-980c-4ebb5336fdbe</cell></row>
<row><cell>Trip duration</cell><cell>211</cell></row>
<row><cell>Trip distance</cell><cell>232</cell></row>
<row><cell>Start time</cell><cell>August 7, 2021 8:57:29 PM</cell></row>
<row><cell>End time</cell><cell>August 7, 2021 8:59:49 PM</cell></row>
<row><cell>Routes</cell><cell>August 7, 2021 8:57:30 PM, [-74.448150, 40.499419], August 7, 2021 8:57:32 PM, [-74.448144, 40.499345], ...</cell></row>
</table></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.2">Weather Dataset.</head><p>The weather condition data were collected from 2,400 stations of the National Oceanic and Atmospheric Administration (NOAA) <ref type="bibr">[5]</ref>. We utilize weather data from April 2021 to December 2021, including temperature, relative humidity, precipitation, wind speed and direction, visibility, atmospheric pressure, and duration of different weather types (e.g., rain and snow).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Problem Formulation</head><p>Suppose an e-scooter has been in service for <formula notation="TeX">t</formula> days, generating time-ordered trip records denoted as <formula notation="TeX">R_t = [r_1, r_2, \ldots, r_t]</formula>, where <formula notation="TeX">r_i \in \mathbb{R}^{N_r}</formula> and <formula notation="TeX">N_r</formula> is the dimension of a record.</p><p>Given those records, we aim to predict the remaining lifespan of this e-scooter. Formally, it is defined as:</p><p><formula notation="TeX">\text{remaining lifespan} = \max \Delta \mid F(R_t, \Delta) \geq F_{th} \quad (1)</formula></p><p>where <formula notation="TeX">\Delta</formula> is the number of days, <formula notation="TeX">F</formula> is the function that returns the probability that the e-scooter is still in service in <formula notation="TeX">\Delta</formula> days, and <formula notation="TeX">F_{th}</formula> is a given probability threshold.</p><p>Because the remaining lifespan is affected by the current degradation status and future usage, Equation 1 is further extended to:</p><p><formula notation="TeX">\text{remaining lifespan} = \max \Delta \mid F(f_d(R_t), f_s(R_t, \Delta)) \geq F_{th} \quad (2)</formula></p><p>where <formula notation="TeX">f_d(R_t)</formula> returns the status representation at <formula notation="TeX">t</formula>, and <formula notation="TeX">f_s(R_t, \Delta)</formula> returns the future usage during <formula notation="TeX">t</formula> to <formula notation="TeX">t + \Delta</formula>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Key Insight</head><p>Our system RLIFE is based on one key insight: modeling user behaviors from trip transactions is of great importance in reflecting current status and future usage for remaining lifespan prediction. The rationale behind it is that: i) the current user behavior patterns indirectly reflect e-scooter status; and ii) user behavior trends can also offer valuable insights into the e-scooter's future usage. To visualize it, we quantify the user preference as selection probability, which is calculated as the total number of times an e-scooter is selected over the total number of times it is available (as in Sec. 3.2). For example, if an e-scooter is available for 10 trips in a time period (e.g., within 10 meters of the start locations of these 10 trips) and it is selected twice, then the selection probability of this e-scooter is 0.2. Figure <ref type="figure">3</ref> shows that the e-scooters with more remaining days have a higher selection probability, which indicates that user behavior reflects the e-scooter's status. Figure <ref type="figure">2</ref> and Figure <ref type="figure">4</ref> show that the selection probability and daily usage decrease as the e-scooters' "age" increases, which confirms that future usage is different from historical usage.</p><p>Figure <ref type="figure">3</ref>: Selection Probability vs. Remaining Lifespan (days).</p><p>Figure <ref type="figure">4</ref>: Selection Probability in Different Percentages of Lifespan.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">Two Challenges</head><p>Even though the idea sounds straightforward, there are still two challenges: a lack of explicit status representation and uncertain future usage. We perform data analysis to illustrate these two challenges as follows.</p><p>&#8226; Lack of explicit rules for e-scooters' status representation. Different from previous works <ref type="bibr">[9,</ref><ref type="bibr">29,</ref><ref type="bibr">30,</ref><ref type="bibr">39]</ref> that deployed sensors to monitor the operation status of machines, we lack explicit factors to directly evaluate the e-scooters' status and explicit relationships between status and remaining lifespan. The simplest approach is to use the total served distance to reflect the remaining lifespan. Intuitively, a longer served distance may indicate a shorter remaining lifespan. However, when we investigate the correlation between the served distance and remaining lifespan (as shown in Figure <ref type="figure">1</ref>, where each point is an e-scooter), we find that the coefficient between the total served distance and remaining lifespan is only 0.6302, which means that e-scooters with the same served distance may have significantly different remaining lifespans. In reality, the longevity of e-scooters is simultaneously affected by multiple factors, e.g., weather, riding habits, and accidents, which are sometimes non-observable <ref type="bibr">[2]</ref>. Thus, it is inaccurate to directly use one single explicit quantity, e.g., the total served distance or duration, to represent the status of e-scooters.</p><p>&#8226; E-scooters' future usage is significantly different from their historical usage. As shown in Figure <ref type="figure">2</ref>, the average trip distance continuously decreases as their "age" increases, which indicates a shift in usage patterns over time. Specifically, we analyze the usage of e-scooters (i.e., average daily trip distance) during their different lifespan stages (i.e., from the first 10% to the last 10%). Future usage is one of the factors that affect the remaining lifespan. Therefore, remaining lifespan prediction works that do not explicitly consider future usage patterns <ref type="bibr">[22,</ref><ref type="bibr">35]</ref> cannot achieve satisfactory performance.</p></div>
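<div xmlns="http://www.tei-c.org/ns/1.0"><p>The selection probability used in the Key Insight discussion can be illustrated with a minimal sketch. The function name and the handling of an e-scooter that was never available (returning 0.0) are assumptions made for illustration, not part of the original design.</p>
<ab><![CDATA[
```python
def selection_probability(times_selected: int, times_available: int) -> float:
    """Ratio of trips that chose this e-scooter to trips for which it was available."""
    if times_selected < 0 or times_available < 0:
        raise ValueError("counts must be non-negative")
    if times_selected > times_available:
        raise ValueError("an e-scooter cannot be selected more often than it was available")
    if times_available == 0:
        # Assumption: an e-scooter that was never available gets probability 0.0.
        return 0.0
    return times_selected / times_available


# The example from the text: available for 10 trips, selected twice.
example = selection_probability(times_selected=2, times_available=10)
print(example)  # 0.2
```
]]></ab>
</div>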
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.5">Motivation</head><p>Why do we choose contrastive learning? To better illustrate our motivation, we first introduce contrastive learning. It is an unsupervised framework that learns general feature representations from input data without explicit labels or categories. By comparing similar and dissimilar data pairs, the method can differentiate the two data types from the representations it learns. Typically, data augmentation techniques are designed and applied to generate similar data pairs, while other data points are treated as dissimilar pairs.</p><p>In our research, as there are no clear indicators to represent the status of e-scooters, we utilize contrastive learning to investigate possible representations of the e-scooters' status. With it, we aim to learn representations that can distinguish between different statuses. Our assumption is that e-scooters that have similar trip records, such as similar locations and weather conditions, should have similar statuses. To take advantage of this, we use contrastive learning to optimize the alignment of the status representations of e-scooters with similar trip records, without the need for human annotations.</p><p>The key technical improvement. As described in the literature <ref type="bibr">[7, 11-13, 17, 18, 36]</ref>, data augmentation is a critical element in contrastive learning. It plays an important role in creating semantically similar pairs of e-scooters' records, which in turn affects the quality of the learned status representations. However, traditional augmentation methods such as rotation or cropping, which are suitable for time-invariant data such as images or graphs, do not take temporal correlations into account and are not appropriate for sequential trip records. This highlights the need for augmentation techniques tailored to our sequential data. To address these limitations, we have developed three specialized data augmentation techniques that simultaneously take into account the time, geographical, and usage aspects of e-scooters' trip records. These methods are called record masking, record shifting, and trip drifting, and they are described in more detail in Section 3.3.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">DESIGN OF RLIFE</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Overall Architecture</head><p>Fig. <ref type="figure">5</ref> shows the overall architecture of RLIFE, including Pre-processing, Status Representation Learning, Future Usage Prediction, and Remaining Lifespan Prediction.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.1">Pre-processing.</head><p>We extract features based on the aggregation of trip records, weather, and road networks. Specifically, we extract trip features (e.g., distance and duration), user preference features (e.g., selection probability), and trip intervals for status representation learning and future usage prediction.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.2">Status Representation Learning.</head><p>We process sequential trip features as explicit observations and user preference as an implicit reflection to learn e-scooters' current status representation. Specifically, we leverage a self-supervised contrastive learning component to discriminate the lifespan status representation. Compared with traditional contrastive learning, we make two improvements: i) the augmentation method perturbs the input in three aspects simultaneously, i.e., the time, geographical, and usage domains; and ii) the similarity is guided by both degradation status and user preferences.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.3">Future Usage Prediction.</head><p>We model the dynamic evolution of user preference, which predicts the future embedding trend of user preference. This is done by leveraging an attention-based layer to project the embedding of user preference after a time lapse Δ. The projected embedding is used for downstream tasks, i.e., predicting the future usage at a given query time Δ.</p></div>
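<div xmlns="http://www.tei-c.org/ns/1.0"><p>One way to picture predicting usage Δ steps ahead is a recursive rollout, in which a single-step predictor is applied repeatedly and each output becomes the next input (the recursion itself is described in Section 3.4.2). The sketch below is only illustrative: <code>rollout</code> and <code>toy_step</code> are hypothetical names, and the 10% decay per step is an assumed stand-in for the learned predictor.</p>
<ab><![CDATA[
```python
from typing import Callable, List, Sequence


def rollout(step: Callable[[Sequence[float]], List[float]],
            x_t: Sequence[float],
            delta: int) -> List[List[float]]:
    """Apply a single-step predictor recursively for `delta` time slots."""
    if delta < 0:
        raise ValueError("delta must be non-negative")
    predictions: List[List[float]] = []
    current = list(x_t)
    for _ in range(delta):
        current = step(current)      # prediction for the next time slot
        predictions.append(current)  # it is also the input of the next step
    return predictions


def toy_step(features: Sequence[float]) -> List[float]:
    # Hypothetical stand-in for the learned single-step model:
    # usage shrinks by 10% per time slot.
    return [0.9 * value for value in features]


future = rollout(toy_step, [10.0], delta=3)
print(len(future))  # 3
```
]]></ab>
<p>Keeping the single-step predictor as a plain callable makes the recursion independent of how that predictor is implemented.</p></div>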
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.4">Remaining Lifespan Prediction.</head><p>We formulate the remaining lifespan prediction as a query task whose inputs are the e-scooter's current status and predicted future usage during time [t, t+Δ]. The output is the probability that an e-scooter is still in service after time Δ. By computing the queries for multiple values of Δ, we can derive the distribution of the probability.</p></div>
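<div xmlns="http://www.tei-c.org/ns/1.0"><p>The query formulation above can be sketched as a search over candidate values of Δ: the remaining lifespan is taken as the largest Δ whose predicted in-service probability is still at least the threshold. This is a minimal sketch under stated assumptions; <code>remaining_lifespan</code> and <code>toy_survival_prob</code> are hypothetical names, and the linear decay is an invented stand-in for the learned probability function F.</p>
<ab><![CDATA[
```python
from typing import Callable, Iterable, Optional


def remaining_lifespan(survival_prob: Callable[[int], float],
                       candidate_deltas: Iterable[int],
                       threshold: float) -> Optional[int]:
    """Largest delta whose in-service probability is at least `threshold`.

    Returns None when no candidate reaches the threshold.
    """
    best: Optional[int] = None
    for delta in candidate_deltas:
        if survival_prob(delta) >= threshold and (best is None or delta > best):
            best = delta
    return best


def toy_survival_prob(delta: int) -> float:
    # Hypothetical stand-in for F: probability decays linearly to 0 at 100 days.
    return max(0.0, 1.0 - delta / 100.0)


estimate = remaining_lifespan(toy_survival_prob, range(1, 201), threshold=0.5)
print(estimate)  # 50
```
]]></ab>
<p>Scanning every candidate rather than stopping at the first failure avoids assuming that the predicted probability is monotone in Δ.</p></div>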
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Pre-processing</head><p>We first clean the raw data (e.g., by removing outliers) and then extract features from it.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1">Data Cleaning.</head><p>Since data collected from real-world sources may contain noise, we eliminate trip records with improbable speed and distance values. For example, e-scooters have a maximum speed of 30 mph <ref type="bibr">[3]</ref>, so we remove trips with an average speed of over 30 mph. Further, we identify and remove trips with a distance of over 25 miles, which are abnormal in our dataset considering the service areas in the city.</p></div>
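<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two cleaning rules above can be sketched as a simple filter. The <code>Trip</code> record, its field names, and its units (miles and hours) are assumptions made for illustration and do not reflect the actual schema of the dataset; dropping trips with a non-positive duration is likewise an added assumption needed to compute an average speed.</p>
<ab><![CDATA[
```python
from dataclasses import dataclass
from typing import Iterable, List

MAX_SPEED_MPH = 30.0       # maximum e-scooter speed cited in the text
MAX_DISTANCE_MILES = 25.0  # distance cut-off cited in the text


@dataclass
class Trip:
    trip_id: str
    distance_miles: float  # assumed unit
    duration_hours: float  # assumed unit


def is_plausible(trip: Trip) -> bool:
    """Keep a trip only if its distance and average speed are within the limits."""
    if trip.duration_hours <= 0 or trip.distance_miles < 0:
        # Assumption: such records are treated as noise as well.
        return False
    if trip.distance_miles > MAX_DISTANCE_MILES:
        return False
    average_speed = trip.distance_miles / trip.duration_hours
    return average_speed <= MAX_SPEED_MPH


def clean(trips: Iterable[Trip]) -> List[Trip]:
    return [trip for trip in trips if is_plausible(trip)]


trips = [
    Trip("a", distance_miles=1.0, duration_hours=0.1),   # 10 mph, kept
    Trip("b", distance_miles=10.0, duration_hours=0.2),  # 50 mph, removed
    Trip("c", distance_miles=26.0, duration_hours=2.0),  # over 25 miles, removed
]
kept = clean(trips)
print([trip.trip_id for trip in kept])  # ['a']
```
]]></ab>
</div>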
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.2">Features Extraction.</head><p>We first map the GPS points onto the road network and obtain the sequence of passed regions. We then aggregate the records with weather information and derive features from two aspects, i.e., trip features and user preference features.</p><p>&#8226; Trip Features represent the features within one trip, including its start time and end time, origin and destination, trip duration and distance, passed regions, and current weather situation (e.g., temperature, relative humidity, and wind).</p><p>&#8226; User Preference Features represent the features between consecutive trips, i.e., selection probability and idle intervals. Specifically, the selection probability p for an e-scooter is calculated by p = n/N, where N is the total number of potential trips for this e-scooter (i.e., trips for which this e-scooter is within a certain distance of the trip origin and can potentially be selected by users), and n is the number of times it is actually selected. The idle interval of an e-scooter is the interval between two consecutive trips, which also reflects its popularity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">E-scooter Status Representation Learning</head><p>Record Masking. Intuitively, the status of an e-scooter should be more similar to itself than to others, even if its historical usage is slightly adjusted. To reflect this, we perturb the input data by selectively masking (deleting) the trip records for certain days.</p><p>Record Shifting. Similar trip records, e.g., with similar used distance, frequencies, and regions, probably have a similar impact on status representation. Thus, we provide a record-shifting method that augments the data by shifting the trip records in the time domain. Specifically, this method selects trip records at random and shifts them to neighboring days.</p><p>Trip Drifting. We derive the geographical information from GPS sensors, which naturally have noise and drift, and such drift should not impact our results too much. Therefore, we introduce a trip drifting augmentation, i.e., randomly perturbing the trip's passed regions to some neighboring regions.</p><p>In our work, for each e-scooter, we treat the augmented trip records from the same e-scooter as the positive samples and the trip records from other e-scooters in the batch as negative samples.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.2">Contrastive Learning.</head><p>Fig. <ref type="figure">6</ref> shows the architecture of status representation learning. For each e-scooter <formula notation="TeX">x_i</formula> and its augmented sample <formula notation="TeX">x_j</formula>, we first embed them using an LSTM to generate trip embeddings. Then the embeddings are fed into an encoder <formula notation="TeX">f_\theta(\cdot)</formula> (or <formula notation="TeX">f_\xi(\cdot)</formula>) to get the corresponding degradation status <formula notation="TeX">d_i</formula> (or <formula notation="TeX">d_j</formula>). We adopt an MLP as the encoder. Following the design in <ref type="bibr">[18]</ref>, we call <formula notation="TeX">f_\theta(\cdot)</formula> the query encoder and <formula notation="TeX">f_\xi(\cdot)</formula> the momentum encoder. The degradation status representations <formula notation="TeX">d_i</formula> and <formula notation="TeX">d_j</formula> are extracted as:</p><p><formula notation="TeX">d_i = f_\theta(\mathrm{LSTM}(x_i)), \qquad d_j = f_\xi(\mathrm{LSTM}(x_j)) \quad (3)</formula></p><p>where <formula notation="TeX">x_i</formula> and <formula notation="TeX">x_j</formula> are the input trip features.</p><p>Instead of directly calculating the similarity of the learned status representations, we introduce two projectors to project the status representations to different spaces. One is for the status comparison of similar/dissimilar e-scooters (i.e., an e-scooter with its augmented positive samples and negative samples) and the other is for the user preference estimation. The motivation is that different views of the learned status representations in different spaces make them robust for different tasks.</p><p>Status comparison. We leverage an MLP with a ReLU non-linearity <formula notation="TeX">\sigma</formula> as the projector for status comparison (Equation 4), which maps the status representation <formula notation="TeX">d_i</formula> to <formula notation="TeX">l_i</formula>. After obtaining the output, we apply a loss function following the form of InfoNCE <ref type="bibr">[27]</ref>, where one e-scooter is encouraged to be close to those with similar experiences:</p><p><formula notation="TeX">\mathcal{L}_c = -\log \frac{\exp(l_i \cdot l_j / \tau)}{\exp(l_i \cdot l_j / \tau) + \sum_{l_j^{-}} \exp(l_i \cdot l_j^{-} / \tau)} \quad (5)</formula></p><p>where <formula notation="TeX">l_j</formula> is known as <formula notation="TeX">l_i</formula>'s positive sample and <formula notation="TeX">l_j^{-}</formula> is regarded as <formula notation="TeX">l_i</formula>'s negative sample. <formula notation="TeX">\tau</formula> is a temperature hyper-parameter for <formula notation="TeX">l_i</formula> and <formula notation="TeX">l_j</formula> with <formula notation="TeX">\ell_2</formula> normalization <ref type="bibr">[34]</ref>.</p><p>User preference estimation. Meanwhile, we use user preference estimation to guide the status representation learning. We apply another projector to map the status representation <formula notation="TeX">d_i</formula> to the user preference space as <formula notation="TeX">p_i</formula>. We utilize an MSE loss function to calculate the loss between the projected user preference <formula notation="TeX">p_i</formula> and the ground-truth user preference (Equation 6).</p><p>Finally, we combine the contrastive loss function and the user preference estimation loss function as the total loss of our contrastive status representation learning (Equation 7), where <formula notation="TeX">w_1</formula> is a learnable weight.</p><p>Momentum Update. After computing the total loss, we conduct back-propagation and update the parameters of the momentum encoder following the momentum update <ref type="bibr">[18]</ref>. Specifically, given the momentum <formula notation="TeX">m</formula>, we update <formula notation="TeX">f_\xi</formula> by the following equation:</p><p><formula notation="TeX">\xi \leftarrow m \xi + (1 - m) \theta \quad (8)</formula></p><p>where <formula notation="TeX">\theta</formula> and <formula notation="TeX">\xi</formula> denote the parameters of the query encoder and the momentum encoder, respectively.</p><p>Figure <ref type="figure">6</ref>: Contrastive Status Representation Learning. Based on trip records, each e-scooter is fed into an LSTM to generate the trip embedding. Then we encode the trip embeddings to get the corresponding degradation status and historical usage status. We then use a predictor to estimate the user preference based on historical usage status. Combining the user preference and degradation status, we project the e-scooter status, which is then used to compare similar/dissimilar pairs.</p></div>
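<div xmlns="http://www.tei-c.org/ns/1.0"><p>The InfoNCE-style contrastive loss and the momentum update used in this section can be illustrated with a small, dependency-free sketch. It operates on plain Python lists rather than a deep learning framework, and the function names, the temperature of 0.1, and the momentum of 0.999 are illustrative assumptions rather than the settings used in the paper.</p>
<ab><![CDATA[
```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchor: Sequence[float],
             positive: Sequence[float],
             negatives: Sequence[Sequence[float]],
             temperature: float = 0.1) -> float:
    """-log( exp(a.p/t) / (exp(a.p/t) + sum_k exp(a.n_k/t)) ) on l2-normalized vectors."""
    a = l2_normalize(anchor)
    pos_logit = dot(a, l2_normalize(positive)) / temperature
    neg_logits = [dot(a, l2_normalize(n)) / temperature for n in negatives]
    logits = [pos_logit] + neg_logits
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_denominator - pos_logit


def momentum_update(momentum_params: Sequence[float],
                    query_params: Sequence[float],
                    m: float = 0.999) -> List[float]:
    """xi <- m * xi + (1 - m) * theta, applied element-wise."""
    return [m * xi + (1.0 - m) * theta
            for xi, theta in zip(momentum_params, query_params)]


aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
print(aligned < misaligned)  # True
```
]]></ab>
<p>The loss is small when the anchor is closer to its positive sample than to the negatives, which is the behavior the status comparison relies on.</p></div>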
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Future Usage Prediction</head><p>After learning the status representation, the next step is to predict the future usage of the e-scooters. Traditional approaches generally rely purely on historical usage <ref type="bibr">[22]</ref> and ignore the degradation of the e-scooters. This makes the prediction sub-optimal because usage decreases with the gradual degradation of the e-scooters. In our work, we introduce the dynamically changing user preference (i.e., predicted future user preferences) into the prediction to represent the impact of e-scooter degradation on future usage.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.1">Single-step Prediction.</head><p>We consider the current e-scooter status and the user preference evolving process in the future usage prediction. As the user preference is influenced by the degradation status, we incorporate learned status representation to predict the future user preference and its influence on future usage.</p><p>Wefirst apply an LSTM-based feature extractor to extract the trip features x:rip, which will be put into the encoder for the current status dt generation. We then put the dt into the user-preference projector to estimate the user preference pf. The estimated user preference will be combined with the trip feature hidden states to predict future usage at time t + 1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p><formula notation="TeX">h_t = \mathrm{LSTM}(x_t^{trip})</formula></p><p><formula notation="TeX">p_t = g'(f_\theta(h_t))</formula></p><p><formula notation="TeX">\hat{x}_{t+1}^{trip} = \mathrm{LSTM}(h_t \oplus p_t)</formula> (9)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.2">Multi-step Prediction.</head><p>Given the output from the single-step prediction, we further design a recursive way for multi-step prediction. Similar to the single-step prediction, we first generate <formula notation="TeX">\hat{x}_{t+1}^{trip}</formula> by Equation (9). Then, we treat <formula notation="TeX">\hat{x}_{t+1}^{trip}</formula> as the input to generate <formula notation="TeX">\hat{x}_{t+2}^{trip}</formula> in the same way. Given a future time slot parameter Δ, we can predict the future usage in the following Δ time slots, <formula notation="TeX">\hat{x}_{t+1}^{trip}, \hat{x}_{t+2}^{trip}, \ldots, \hat{x}_{t+\Delta}^{trip}</formula>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.3">Multi-step Fusion.</head><p>After predicting the future usage of the e-scooter in the following Δ time slots, we fuse the multi-step future usage to represent the future usage for this e-scooter. We apply an MLP network where the input is the predicted future usages <formula notation="TeX">\hat{x}_{t+1}^{trip}, \hat{x}_{t+2}^{trip}, \ldots, \hat{x}_{t+\Delta}^{trip}</formula> and the output is the fused future usage feature at the following Δ days as follows:</p><p><formula notation="TeX">u_\Delta = \mathrm{MLP}(x_\Delta^{trip}; w_u)</formula> (10)</p><p>where <formula notation="TeX">x_\Delta^{trip} = \{\hat{x}_{t+1}^{trip}, \hat{x}_{t+2}^{trip}, \ldots, \hat{x}_{t+\Delta}^{trip}\}</formula>, <formula notation="TeX">w_u</formula> are learnable parameters, and <formula notation="TeX">u_\Delta</formula> is the predicted future usage.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5">Remaining Life Prediction</head><p>After learning the status representation and predicting the future usage, we predict the remaining life. Different from general machine learning tasks that directly output lifespan, we design a query scheme to output the probability of the predicted lifespan given Δ. In this way, we can introduce negative samples such as a very large lifespan but with a probability of 0. Formally, the remaining life prediction is defined as:</p><p><formula notation="TeX">\text{remaining lifespan} = \max \{\Delta \mid F(d_t, u_\Delta, \Delta) \geq F_{th}\}</formula> (11)</p><p>Considering the relatively small search space of Δ, we simply iterate over all the possible Δ in a certain range (e.g., the historically maximum lifespan of all the e-scooters) to obtain the optimal remaining lifespan.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">EXPERIMENTS</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Evaluation Settings</head><p>We start this subsection by describing the baselines for comparison, followed by evaluation metrics. Then we summarize the implementation details.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.1">Baselines.</head><p>We include the following eight benchmark methods for evaluation, each of which serves as a representative framework for predicting the remaining lifespan of the e-scooter.</p><p>&#8226; Historical Average (HA): We calculate the average length of lifespan for all the e-scooters and obtain the remaining lifespan of each e-scooter by subtracting the duration of service.</p><p>&#8226; XGBoost <ref type="bibr">[10]</ref>: It is a boosting tree-based method that achieved outstanding performance in many prediction tasks. In our implementation, the input is the trip features, and the output is the remaining days.</p><p>&#8226; LSTM <ref type="bibr">[38]</ref>: The Long Short-Term Memory network is a suitable model for sequential data learning, e.g., sensor data in manufacturing machines. The input of our baseline is the trip features in sequences, and the output is the same as that in XGBoost.</p><p>&#8226; TCN [19]: It is for rolling bearing remaining lifespan prediction. The input and output of the temporal convolutional network (TCN) are the same as those of LSTM. The difference between LSTM and TCN is that LSTM emphasizes long-term and short-term influences while TCN focuses on the neighboring influences determined by the kernel size k. We set k to 5.</p><p>&#8226; Linear Regression <ref type="bibr">[26]</ref>: It is a straightforward approach that uses a linear function to model the correlation between the input and the output. The input is the aggregated trip records, which is the same input used in the XGBoost method.</p><p>&#8226; Auto-encoder <ref type="bibr">[25]</ref>: Auto-encoder uses the encoder-decoder framework with multiple-layer neural networks for bearing lifespan prediction. It takes in the same input data as XGBoost and captures the complex, non-linear relationships for more accurate predictions.</p><p>&#8226; Belief Network [21]: It is a model for the machine's remaining lifespan prediction. It consists of multiple stacked restricted Boltzmann machines for greedy layer-by-layer training. Its input is the same as that of the XGBoost model.</p><p>&#8226; AdaCare <ref type="bibr">[20]</ref>: The model is a general health-status representation learning model. It first adopts dilated convolutional layers as short, medium, and long-term convolutional layers for various time scales, where the kernel size k is set to 1, 2, and 3, respectively. Then, it adopts two fully-connected layers to learn the nonlinear dependencies between features explicitly.</p></div>
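<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Historical Average (HA) baseline listed among the baselines reduces to a single subtraction. The following Python sketch illustrates it under stated assumptions: the function name, the list-of-days input, and the clipping of negative results to zero are choices made for this illustration and are not specified in the text.</p>
<p><![CDATA[
```python
from statistics import mean


def historical_average_remaining(lifespans_days, days_in_service):
    """HA baseline: fleet-average lifespan minus the days already served."""
    avg_lifespan = mean(lifespans_days)
    # Assumption of this sketch: an e-scooter that has outlived the fleet
    # average is reported as having 0 remaining days instead of a negative value.
    return max(avg_lifespan - days_in_service, 0.0)


# Hypothetical lifespans (in days) of already retired e-scooters.
retired = [60, 90, 120]
print(historical_average_remaining(retired, 40))
print(historical_average_remaining(retired, 100))
```
]]></p>
<p>Because the estimate depends only on the fleet average and the time in service, HA cannot distinguish two e-scooters of the same age.</p></div>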
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.2">Metrics.</head><p>We introduce three metrics to evaluate the prediction performance, i.e., Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE).</p><p>In particular, we use a day as the unit of the lifespan, which is consistent with the minimum operational intervals, such as daily rebalancing or charging.</p><p>Future usage prediction. In order to dynamically explore future usage, we formulate it as a query task with a time variable Δ. Δ ranges from 1 to the maximum threshold, which we set to 50 in the experiments.</p><p>Remaining lifespan prediction. For each Δ, the output is the probability that this e-scooter is still in service in Δ days. By comparing the query results on multiple Δ to the given probability threshold, we take the largest Δ as our predicted remaining lifespan. Intuitively, the probability threshold is set to 0.5.</p><p>We implement RLIFE with Keras 2.4 and test it on a server with an NVIDIA A4000 GPU, an Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz, and 256GB of memory. For the hyper-parameters, the batch size is 256, and the weight decay is 10⁻⁶ for all datasets. The momentum coefficient is set to 0.5 and 0.9 for the Newark data and the New Brunswick data, respectively (detailed in Sec. 4.4). For contrastive learning, the learning rate is set to 1.5 × 10⁻⁵, as the momentum mechanism requires a relatively smooth parameter update <ref type="bibr">[14]</ref>. For the remaining life prediction, we set the learning rate to 0.01. The dimension of the status representation is optimized as 1,024. We optimize the model with the Adam optimizer for 100 epochs and do not apply any optimization techniques beyond those mentioned. All the experiments are repeated 5 times, and the performances are presented in the "mean ± standard deviation" format.</p></div>
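<div xmlns="http://www.tei-c.org/ns/1.0"><p>The query-based prediction and the three metrics described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the released implementation: <code>alive_prob</code> is a hypothetical stand-in for the trained model's query output, the function names are invented here, and returning 0 when no Δ reaches the threshold is a choice made for this sketch. The defaults (maximum Δ of 50, threshold of 0.5) follow the settings reported in the text.</p>
<p><![CDATA[
```python
import math
from typing import Callable, Sequence


def predict_remaining_lifespan(
    alive_prob: Callable[[int], float],
    max_delta: int = 50,
    threshold: float = 0.5,
) -> int:
    """Query every delta in 1..max_delta and keep the largest one whose
    probability reaches the threshold (0 if none does; sketch assumption)."""
    best = 0
    for delta in range(1, max_delta + 1):
        if alive_prob(delta) >= threshold:
            best = delta
    return best


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error, in days."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Squared Error, in days."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Percentage Error; assumes no true value is zero."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def toy_prob(delta: int) -> float:
    # Hypothetical model output: survival probability decaying linearly with delta.
    return max(0.0, 1.0 - delta / 40.0)


print(predict_remaining_lifespan(toy_prob))
print(mae([10, 20], [12, 16]), rmse([10, 20], [12, 16]), mape([10, 20], [12, 16]))
```
]]></p>
<p>Scanning the full range instead of stopping at the first failed query keeps the sketch valid even if the queried probabilities are not monotone in Δ.</p></div>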
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Overall Performance</head><p>From Table <ref type="table">2</ref>, we observe that:</p><p>&#8226; In general, the models <ref type="bibr">[21,</ref><ref type="bibr">25]</ref> that focus on capturing the integrated features of e-scooters' status (i.e., total served distance and total served duration) achieve better performance than those [19, 38] that explore the accumulated influences of time-series trip records. This is because the integrated results have a stronger representative power of e-scooters' status, whereas models learned from individual records may lose part of the information.</p><p>&#8226; AdaCare <ref type="bibr">[20]</ref> outperforms the others [19, 38] because it integrates the status representation while considering the temporal correlation.</p><p>&#8226; RLIFE gains 35.67% and 29.81% improvement compared with AdaCare [20] by leveraging user behavior as an implicit input for the degradation status representation learning. Moreover, different from previous work [19, <ref type="bibr">25]</ref>, we explore the influences of dynamic future usage on the remaining days of service.</p><p>The results show that the remaining lifespan of the e-scooter is determined by both its current degradation status and dynamic future usage.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Ablation Studies</head><p>We conduct a comprehensive ablation study to further evaluate the status representation learning component, the future usage prediction component, and the impact of user preferences. We build the following variants of RLIFE.</p><p>&#8226; RLIFE-lstm removes the LSTM module and replaces it with the integrated trip records, i.e., total served distance and duration, to evaluate the strength of the degradation learning process.</p><p>&#8226; RLIFE-FU removes the future usage prediction module (i.e., Sec. <ref type="bibr">3.4</ref>) and predicts the remaining lifespan according to historical usage.</p><p>&#8226; RLIFE-UP removes the contributions of user preferences by (i) removing the user preference estimation loss in the status learning part and (ii) using LSTM only in the future usage prediction part without the user preferences.</p><p>We present the results in Fig. <ref type="figure">7</ref> and find that:</p><p>&#8226; RLIFE outperforms RLIFE-lstm, which demonstrates the importance of the degradation process (i.e., daily trip records).</p><p>&#8226; RLIFE outperforms RLIFE-FU, which shows that the future usage is inconsistent with the historical usage and that predicting the future usage strengthens the prediction performance.</p><p>&#8226; RLIFE outperforms RLIFE-UP, which verifies our intuition that user preferences can serve as an implicit input that reflects the overall status and thereby improves the performance.</p><p>Overall, the results show that the learning of the degradation process (i.e., the LSTM module), the future usage prediction, and the users' preferences should all be considered to improve the prediction performance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Sensitivity Analysis</head><p>[Figure: (a) RMSE and (b) MAE versus the momentum coefficient.]</p><p>The Momentum Coefficient. One key parameter in RLIFE is the momentum coefficient m, which influences the degradation condition representation learning. In general, the momentum coefficient controls the update rate of the momentum encoder and thus the encoders' consistency. If it is set to 0, the parameters of the momentum encoder are always updated to those of the query encoder. Such drastic updates undermine the consistency of the encoded positive and negative samples, which eventually affects the representation learning. A relatively larger value indicates that the samples are encoded by a slowly progressing encoder, which ensures consistency for better learning. However, if it is set close to 1 (e.g., 0.99), the encoders tend to keep the original parameters, which may also affect the representation learning.</p><p>Thus, the optimal momentum coefficient needs to be neither too small nor too large. Fig. <ref type="figure">8</ref>(a) and 8(b) show the effects of different momentum coefficients in the New Brunswick and Newark datasets, respectively. We observe that RLIFE achieves the best performance when the coefficient is set to 0.9 and 0.5 in the New Brunswick and Newark datasets, respectively. This is mainly because the New Brunswick dataset has a much larger data capacity than the Newark dataset, which needs a larger momentum coefficient. Compared to not using the momentum encoder (i.e., setting the coefficient to 0), the momentum encoder improves the performance by 13.6%.</p><p>Dimension of Learned Representation Vector. Another critical parameter in RLIFE is the dimension of the learned status representation vector, which indicates the information diversity. Fig. <ref type="figure">9</ref>(a) and Fig. <ref type="figure">9</ref>(b) show the effects of the dimension of the representation vector on the Newark and New Brunswick datasets. We observe that, on the one hand, a larger dimension helps to contain more information and learn a more accurate status representation; on the other hand, an overly large dimension significantly increases the number of parameters, leading to overfitting and low performance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">RELATED WORK</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Remaining Lifespan Prediction.</head><p>Many works explore the information in operational records, such as trips, billing, and medical records, for remaining lifespan prediction. They can be further categorized into model-based and data-driven methods. Model-based methods use mathematical models to fit a degradation curve of the target, e.g., battery life curves [9]. However, they typically work in an ideal environment without noise and uncertainty. For data-driven methods, neural networks <ref type="bibr">[38,</ref><ref type="bibr">39]</ref> are applied to historical data to learn the non-linear degradation trend of sequential data. For example, MLP is useful for learning non-linear degradation patterns <ref type="bibr">[39]</ref>, but it lacks the ability to incorporate temporal information. Then, RNN-based frameworks, e.g., RNN <ref type="bibr">[39]</ref> and LSTM <ref type="bibr">[38]</ref>, have been applied to learn the degradation trend of sequential data. Zhang et al. <ref type="bibr">[39]</ref> utilized the long short-term memory (LSTM) recurrent neural network (RNN) to learn the long-term dependencies among the degraded capacities of lithium-ion batteries. However, those methods heavily rely on sensors to collect explicit status indicators, e.g., state of health (SOH) in batteries <ref type="bibr">[9]</ref>, which incurs two limitations in our problem. First, existing sensors are designed to monitor only certain components of e-scooters, such as batteries <ref type="bibr">[39]</ref>, while other components, such as wheels and brakes, cannot be well monitored (or need more sophisticated and expensive sensors). Second, the cost of sensors is proportional to the number of e-scooters, so it is expensive to deploy sensors at a large scale.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Representation Learning</head><p>Representation learning aims to learn a low-dimensional vector for data representation, such as graphs <ref type="bibr">[33]</ref> and hidden status <ref type="bibr">[15,</ref><ref type="bibr">20,</ref><ref type="bibr">37]</ref>. For instance, AdaCare <ref type="bibr">[20]</ref> depicted the health status by capturing the long and short-term variations of biomarkers and modeled the correlation between clinical features to enhance the ones which indicate the health status. GRASP <ref type="bibr">[37]</ref> proposed a generic framework for healthcare models which aims to solve data sparsity or low-quality data. Med2Vec <ref type="bibr">[15]</ref> learned the representations for both medical codes and visits from large EHR datasets with over a million visits. PNRL <ref type="bibr">[33]</ref> proposed a predictive network representation for the structural link prediction. PTARL <ref type="bibr">[31]</ref> explored the peer and temporal dependencies of driving behavior with GPS trajectories data.</p><p>However, those frameworks focus on individual status learning rather than learning similar or dissimilar representations from data organized into similar or dissimilar pairs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Contrastive Learning.</head><p>Contrastive representation learning has achieved great success in practice in the unsupervised classification of groups of images <ref type="bibr">[7, 11-13, 18, 36]</ref>. It is helpful to identify two key properties related to the contrastive loss: (1) alignment (i.e., closeness) of features from positive pairs, and (2) uniformity of the induced distribution of the normalized features <ref type="bibr">[28,</ref><ref type="bibr">32]</ref>. For example, SimCLR <ref type="bibr">[11]</ref> proposed two major components to enable the contrastive prediction tasks to learn useful representations, including data augmentation and a learnable nonlinear transformation. MoCo V2 <ref type="bibr">[12]</ref> used an MLP projection head and more data augmentation with Momentum Contrast (MoCo), which outperformed SimCLR and did not require large training batches. BYOL <ref type="bibr">[16]</ref> introduced a new framework for self-supervised representation learning, which relies on two neural networks, an online network and a target network, that interact and learn from each other. However, contrastive methods typically have real-time requirements and need many explicit pairwise feature comparisons, which incur a high computational cost. For efficiency, SwAV <ref type="bibr">[7]</ref> is an online algorithm that does not need to compute pairwise comparisons. SimSiam <ref type="bibr">[13]</ref> simplified the BYOL framework by removing (i) negative sample pairs, (ii) large batches, and (iii) momentum encoders, and achieved surprising empirical results. Barlow Twins <ref type="bibr">[36]</ref> required neither large batches nor asymmetry between the network twins, such as a predictor network, gradient stopping, or a moving average on the weight updates.</p><p>In summary, contrastive learning is a self-supervised approach that is well suited to learning similar or dissimilar representations from data.
It is therefore suitable for learning the similar or dissimilar degradation status of e-scooters without explicit status measures. In this work, we enhance generic contrastive learning with a new data augmentation method for sequential data and introduce user preferences as implicit feedback to improve representation learning.</p></div>
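<div xmlns="http://www.tei-c.org/ns/1.0"><p>The momentum-encoder update on which MoCo-style contrastive learning relies, and whose coefficient m is analysed in Sec. 4.4, can be sketched as follows. The rule shown is the standard exponential moving average from Momentum Contrast; that RLIFE applies exactly this form is an assumption of the sketch, and the flat parameter lists and the function name are purely illustrative.</p>
<p><![CDATA[
```python
def momentum_update(momentum_params, query_params, m):
    """Standard MoCo-style rule: theta_k = m * theta_k + (1 - m) * theta_q.

    Parameters are plain lists of floats here (illustrative assumption);
    a real model would apply the same rule to every weight tensor.
    """
    return [m * k + (1.0 - m) * q for k, q in zip(momentum_params, query_params)]


theta_k = [1.0, 2.0]  # momentum (key) encoder parameters, hypothetical values
theta_q = [3.0, 4.0]  # query encoder parameters, hypothetical values

print(momentum_update(theta_k, theta_q, 0.0))  # copies the query encoder
print(momentum_update(theta_k, theta_q, 0.5))
print(momentum_update(theta_k, theta_q, 1.0))  # momentum encoder never changes
```
]]></p>
<p>With m = 0 the momentum encoder becomes a copy of the query encoder at every step, and with m close to 1 it barely moves, which matches the two extremes discussed in the sensitivity analysis.</p></div>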
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">DISCUSSION</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1">Lessons Learned</head><p>Based on the design, implementation, and evaluation of RLIFE, we learned the following lessons:</p><p>&#8226; User behavior performs well as an implicit input to measure e-scooters' status. The key insight of RLIFE is that user behavior, i.e., user preferences, can be utilized as the implicit input to learn the e-scooters' degradation status. That is, an e-scooter that is selected less often (i.e., low selection probability) or that stays idle longer (i.e., long idle intervals between consecutive trips) is generally in a worse condition. Supported by Fig. <ref type="figure">7</ref>, we found that introducing user preferences helps our model gain 24.96% and 7.95% improvement in the New Brunswick and Newark datasets, respectively.</p><p>&#8226; Future usage dynamics should be considered in the remaining lifespan prediction. Unlike existing lifespan prediction methods, which assume that the future usage is generally consistent with the historical usage, e-scooters' usage changes as the degradation status changes. Our ablation study validates the necessity of considering changed future usage for the remaining lifespan prediction. Supported by Fig. <ref type="figure">7</ref>, we observed that the future usage prediction component leads to a performance improvement of 47.16% and 26.31%.</p><p>&#8226; User preferences can be used to improve future usage prediction. Predicting future usage can be challenging if a dynamic degradation process is involved. In our work, we use the learned status representation as an opportunity to estimate future user preference, which in turn supports future usage prediction. Supported by Fig. <ref type="figure">7</ref>, we observed that the introduction of user preference estimation in the future usage prediction improves the performance by 27.25% and 26.39%.</p></div>
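<div xmlns="http://www.tei-c.org/ns/1.0"><p>One of the user-behavior signals named in the first lesson, the idle interval between consecutive trips, can be derived directly from trip records. The Python sketch below is a minimal illustration; the (start, end) tuple layout, the use of minutes as the time unit, and the name <code>idle_intervals</code> are assumptions made here and do not describe the paper's data schema.</p>
<p><![CDATA[
```python
def idle_intervals(trips):
    """Return the gaps between the end of one trip and the start of the next.

    trips: list of (start, end) timestamps for a single e-scooter,
    in any consistent unit (minutes in the example below).
    """
    ordered = sorted(trips)  # order trips by start time
    return [
        next_start - prev_end
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


# Hypothetical trips of one e-scooter, in minutes since the start of the day.
trips = [(25, 30), (0, 10), (100, 120)]
print(idle_intervals(trips))
```
]]></p>
<p>Long or growing gaps in this list would correspond to the "longer idle time" signal described above.</p></div>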
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2">Practical Implications of the Results</head><p>In this work, we focus on modeling e-scooters' current status and future usage to provide a more accurate prediction of the remaining lifespan. One potential implication is that the results (i.e., the estimated remaining lifespan) can be utilized to further study the e-scooters' re-balancing problem <ref type="bibr">[8,</ref><ref type="bibr">24]</ref>. For instance, we can re-balance the e-scooters with a longer lifespan (i.e., in good condition) to the areas with higher demand to increase the users' satisfaction.</p><p>We can also re-balance the e-scooters with a shorter lifespan to low-demand areas to increase the overall lifespan.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.3">Ethics and Privacy</head><p>During the data analysis and data mining of the trip records, we took careful steps to address ethical and privacy concerns. First, all the e-scooter users have read the Terms of Service and consented to the platform collecting their trip trajectories for research and service improvement. Second, all the raw data has been pre-processed into aggregated anonymous statistics based on the privacy protection requirements during the data collection process. All the user identifiers are removed, and all the auxiliary information is strictly limited to GPS traces.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">CONCLUSION</head><p>In this work, we design a framework called RLIFE for remaining lifespan prediction of e-scooters that takes user preferences into consideration. RLIFE validates that user preferences are beneficial as an implicit input for the e-scooters' degradation status representation learning. Moreover, future usage prediction contributes to prediction performance. Based on the experiment results, RLIFE can improve the performance by up to 35.67% compared with the baseline methods. We also demonstrate the effectiveness of RLIFE with different ablation studies and parameter analyses.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>ACKNOWLEDGMENT</head><p>This work is partially supported by <ref type="bibr">NSF 1932223, 1951890, 1952096, 2003874, 2047822, 2246080</ref>. We thank all the reviewers for their insightful feedback to improve this paper.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>https://www.dropbox.com/s/2muo5q6ggeOwd5!/rlife-src.tar.gz</p></note>
		</body>
		</text>
</TEI>
