<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>On streaming disaster damage assessment in social sensing: A crowd-driven dynamic neural architecture searching approach</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>03/01/2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10385135</idno>
					<idno type="doi">10.1016/j.knosys.2021.107984</idno>
					<title level='j'>Knowledge-Based Systems</title>
<idno type="issn">0950-7051</idno>
<biblScope unit="volume">239</biblScope>
<biblScope unit="issue">C</biblScope>					

					<author>Yang Zhang</author><author>Ruohan Zong</author><author>Ziyi Kou</author><author>Lanyu Shang</author><author>Dong Wang</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Motivated by the recent advances in Internet and communication techniques and the proliferation of online social media, social sensing has emerged as a new sensing paradigm to obtain timely observations of the physical world from "human sensors". In this study, we focus on an emerging application in social sensing: streaming disaster damage assessment (DDA), which aims to automatically assess the damage severity of affected areas in a disaster event on the fly by leveraging the streaming imagery data about the disaster on social media. In particular, we study a dynamic optimal neural architecture searching (NAS) problem in streaming DDA applications. Our goal is to dynamically determine the optimal neural network architecture that accurately estimates the damage severity for each newly arrived image in the stream by leveraging human intelligence from crowdsourcing systems. The present study is motivated by the observation that the neural network architectures in current DDA solutions are mainly designed by artificial intelligence (AI) experts, which often leads to non-negligible costs and errors given the dynamic nature of streaming DDA applications and the lack of real-time annotations of the massive social media data.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Social sensing has emerged as a powerful sensing paradigm for collecting observations of the physical world through social media <ref type="bibr">[1,</ref><ref type="bibr">2]</ref>. Examples of social sensing applications include city-wide traffic surveillance using Twitter feeds <ref type="bibr">[3]</ref>, urban anomaly detection using Foursquare check-ins <ref type="bibr">[4]</ref>, and community disease outbreak monitoring using Facebook posts <ref type="bibr">[5]</ref>. Unlike other infrastructure-based sensing paradigms (e.g., CCTV cameras, remote sensing, wireless sensor networks), social sensing provides a pervasive and scalable solution for obtaining real-time damage information during disaster events <ref type="bibr">[6]</ref>. In this paper, we focus on an emerging application in social sensing: streaming disaster damage assessment (streaming DDA) <ref type="bibr">[7]</ref>. The goal of streaming DDA applications is to automatically assess the damage severity of affected areas in a disaster event on the fly by leveraging the streaming imagery data posted on social media.</p><p>The outputs of streaming DDA applications can be shared with emergency response agencies (e.g., Federal Emergency Management Agency (FEMA), fire departments) for timely rescue and recovery operations.</p><p>Recent advancements in artificial intelligence (AI) have helped in improving the performance of DDA applications <ref type="bibr">[8,</ref><ref type="bibr">7,</ref><ref type="bibr">9,</ref><ref type="bibr">10]</ref>. 
In particular, compared with the traditional DDA solutions that largely rely on intensive manual labeling efforts from disaster specialists <ref type="bibr">[11]</ref>, the AI-driven DDA solutions significantly reduce the labeling costs while providing a reasonable assessment accuracy <ref type="bibr">[12]</ref>.</p><p>However, current AI-driven DDA solutions often require inputs from experts who are specialists in both AI models and DDA applications to design an appropriate neural network architecture for a particular DDA application. This manual neural network architecture design process is known to be both time-consuming and suboptimal <ref type="bibr">[13]</ref>. Figure <ref type="figure">1</ref> shows an example where the optimal neural network architecture in a streaming DDA application changes over time. In particular, we observe that the optimal neural network architectures for disaster-related images collected in consecutive timesteps in the same disaster event are different. In such scenarios, it is difficult for AI experts to predict and design an individual optimal neural network architecture for each newly arrived image on the fly. Motivated by the above observations, we study a dynamic optimal neural architecture searching (NAS) problem in streaming DDA applications where the goal is to dynamically determine the optimal neural network architecture that accurately estimates the damage severity for each newly arrived image without the inputs from AI experts.</p><p>In this study, we develop a crowd-driven dynamic neural architecture search (CD-NAS) system to address the above problem by exploring the collective intelligence of both AI and humans. The objective of our CD-NAS design is to leverage human intelligence from crowdsourcing systems to guide the discovery of the optimal neural network architecture for every image in a streaming DDA application. 
In particular, we observe that human perception is often more reliable and consistent than AI algorithms in terms of identifying the severity of disaster damage from an image (e.g., we can clearly determine the damage severity of the images reported in Figure <ref type="figure">1</ref>). Such human intelligence could possibly help us dynamically identify the optimal neural network architecture in streaming DDA applications by soliciting labels from a crowdsourcing platform (e.g., Amazon Mechanical Turk) <ref type="bibr">[14]</ref>. We refer to the human intelligence collected from the crowdsourcing platform as crowd intelligence in the remainder of this paper.</p><p>Two important technical challenges exist in designing such a crowd-driven NAS system, which are elaborated below.</p><p>Dynamic optimal neural architecture searching. The first challenge lies in the dynamic identification of the optimal instance of neural network architecture for each image in the streaming DDA application without knowing its ground truth label a priori. In particular, current NAS solutions in AI are mainly designed to identify a single best-performing neural network architecture for a given set of training data <ref type="bibr">[15,</ref><ref type="bibr">16]</ref> and leverage the identified neural network architecture to estimate the damage severity for all testing data. However, such a one-size-fits-all neural network architecture could inaccurately estimate the damage severity for a non-negligible portion of images because the optimal neural network architecture often changes over time (as shown in Figure <ref type="figure">1</ref>).</p><p>Recent advancements in dynamic neural networks could potentially be applied to address this issue <ref type="bibr">[17,</ref><ref type="bibr">18]</ref>. However, a major limitation of these solutions is that they still require a large amount of high-quality training labels from the studied disaster event to periodically retrain their models to capture the dynamics of streaming data. 
However, such a high-quality training dataset is often not available for an unfolding disaster in streaming DDA because of the "cold start" problem <ref type="bibr">[19]</ref> and the lack of real-time annotations due to cost and resource constraints <ref type="bibr">[20]</ref>. Additionally, recent efforts in online deep learning have been made to dynamically update the learned network instances <ref type="bibr">[21,</ref><ref type="bibr">22]</ref>.</p><p>However, these solutions primarily focus on periodically optimizing the performance of a particular neural network architecture pre-defined by the AI experts on relatively simple tasks (e.g., tweet text classification and handwritten digit identification). As a result, their performance may be suboptimal because of the potential bias and constraints of the manual network design process given the excessive damage characteristics and fine-grained details of disaster-related social media images <ref type="bibr">[23]</ref>. Therefore, the dynamic identification of the optimal neural network architecture for each incoming image in streaming DDA applications remains a nontrivial problem.</p><p>Imperfect crowd intelligence-driven NAS. The second challenge lies in leveraging the imperfect crowd intelligence from potentially unreliable crowd workers to facilitate the identification of the optimal neural network architecture in streaming DDA applications. Unlike AI experts who are capable of designing effective neural network architectures, crowd workers are often limited to simplified annotation tasks (e.g., labeling damage severity levels for assigned images). More importantly, unlike the damage severity annotated by disaster specialists, the labels from crowd workers are often imperfect (biased, noisy, and even conflicting responses from different crowd workers) <ref type="bibr">[24]</ref>. 
Additionally, the noise embedded in crowd intelligence can be amplified during the neural network architecture search process, leading to the selection of a poorly performing neural network architecture <ref type="bibr">[15,</ref><ref type="bibr">25]</ref>. Therefore, the key question in our design is how to effectively translate potentially imperfect crowd knowledge (e.g., noisy crowd labels) into an accurate neural network architecture selection for streaming imagery data.</p><p>To address the above challenges, we develop CD-NAS, a crowd-driven dynamic neural architecture searching approach that carefully explores crowd intelligence to solve the optimal neural architecture search problem and optimize the performance of streaming DDA applications. To address the first challenge, we develop a streaming neural network architecture search framework that recursively updates the optimal neural network architecture for each incoming image through a novel recursive maximum likelihood estimation model. To address the second challenge, we design a novel crowd-AI fusion model that translates imperfect crowd intelligence to effective neural network architecture selection through a robust crowd-AI collaborative network searching process.</p><p>To the best of our knowledge, CD-NAS is the first dynamic crowd-driven NAS approach for solving the streaming DDA problem. We evaluated CD-NAS using a real-world streaming DDA application from a recent disaster event, Typhoon Hagupit. The evaluation results show that our CD-NAS consistently outperforms both state-of-the-art AI and NAS baselines by achieving the highest disaster damage assessment accuracy while maintaining the lowest computational cost under various evaluation scenarios.</p><p>A preliminary version of this study was published in <ref type="bibr">[26]</ref>. This journal paper is a significant extension of the previous work in the following aspects. 
First, we identify two new intrinsic challenges (i.e., dynamic optimal neural architecture searching and imperfect crowd intelligence-driven NAS) to solve the dynamic optimal NAS problem and explicitly discuss how our scheme addresses these two challenges (Section 1 and Section 4). Second, we extend the dynamic optimal architecture searching (DOAS) module in CD-NAS by developing a dynamic neural network architecture searching scheme that adaptively updates the estimation of the optimal neural network for each image through a recursive estimation framework (Section 4). Third, we extend the evaluation in the conference paper by explicitly studying the performance of all compared schemes with a diversified set of crowdsourcing settings (i.e., different numbers of crowd workers and crowd query frequencies). The new results demonstrate the effectiveness of our scheme in explicitly leveraging crowd intelligence to guide the discovery of the optimal neural network architecture under different streaming DDA application scenarios (Section 5). Fourth, we add a new study to evaluate the computational cost of all compared schemes (i.e., the average computation time required to estimate the damage severity of an image). This is motivated by the fact that the computational cost is critical in streaming DDA applications, especially in the context of massive social media data inputs. The new results demonstrate that our CD-NAS scheme takes orders of magnitude less time to accomplish the streaming DDA task compared with other baselines (Section 5).</p><p>Fifth, we compare CD-NAS with two additional deep learning and NAS baselines (i.e., DenseNet and MnasNet) and demonstrate the performance gains achieved by CD-NAS compared with all baselines (Section 5). Sixth, we add a new robustness study to evaluate the robustness of CD-NAS by varying one key parameter in our design: the size of the sliding window for streaming DDA applications (Section 5). 
Finally, we extend the related work by adding discussions on recent progress in social sensing and NAS. Both topics are closely related to the theme of this study (Section 2).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Social Sensing</head><p>Motivated by the recent advances in Internet and communication techniques (e.g., 4/5G, Internet of Everything (IoE)), as well as the proliferation of online social media (e.g., Twitter and Instagram), social sensing has emerged as a new sensing paradigm to obtain timely observations of the physical world from "human sensors" <ref type="bibr">[27]</ref>. Examples of social sensing applications include monitoring real-time traffic conditions in a metro area using mobile crowdsensing to enhance traffic safety <ref type="bibr">[3]</ref>, obtaining situational awareness in the aftermath of a disaster using online social media for rapid disaster response <ref type="bibr">[28]</ref>, and detecting infectious disease outbreaks in big cities using location-based crowd tracking services to improve public health <ref type="bibr">[5]</ref>. Several key challenges exist in the current social sensing applications. Examples include real-time guarantee, data reliability, incentive design, privacy protection, and noise reduction <ref type="bibr">[29,</ref><ref type="bibr">30,</ref><ref type="bibr">31]</ref>.</p><p>However, the crowd-driven dynamic optimal NAS problem in streaming DDA applications remains an unsolved challenge in social sensing. In this paper, we address this problem by developing a novel crowd-AI collaborative NAS framework to accurately assess the damage severity of affected areas on the fly using streaming imagery data posted on social media.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Disaster Damage Assessment</head><p>Recent advances in AI and deep learning have proved remarkably helpful in improving the performance of DDA applications <ref type="bibr">[8,</ref><ref type="bibr">7,</ref><ref type="bibr">9,</ref><ref type="bibr">10,</ref><ref type="bibr">32,</ref><ref type="bibr">33]</ref>. For example, Li et al. developed a deep domain adaptation approach to estimate the damage severity of affected areas using online social media data via adversarial transfer learning <ref type="bibr">[8]</ref>. Nguyen et al. proposed a deep convolutional network framework for disaster damage assessment of unfolding disaster events for timely disaster response <ref type="bibr">[7]</ref>. Kumar et al. proposed a deep image classification framework to identify disaster-affected cultural heritage sites from social media imagery data via an end-to-end deep image processing system design <ref type="bibr">[9]</ref>. Mouzannar et al. developed a deep neural network approach that utilizes both text and image data from social media posts for damage identification via multimodal convolutional neural networks <ref type="bibr">[10]</ref>. However, current AI-driven DDA solutions often require extensive inputs from AI experts to design an effective neural network architecture for DDA tasks. Such a manual design process is known to be both error-prone and time-consuming in the presence of massive social data inputs in streaming DDA applications <ref type="bibr">[13]</ref>. Efforts on dynamic neural networks in DDA are also relevant to our work <ref type="bibr">[17,</ref><ref type="bibr">18]</ref>. 
However, two limitations prevent them from being applied to address our problem: i) those methods require periodic model retraining, which often cannot keep up with the large dynamics in our streaming DDA application settings <ref type="bibr">[34]</ref>; ii) the performance of these models often drops significantly when they are retrained using imperfect crowd labels <ref type="bibr">[35]</ref>. In contrast, our CD-NAS framework effectively identifies the optimal neural network architecture for each image without the inputs from AI experts and in the absence of ground-truth labels of newly arrived images.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Crowd Intelligence</head><p>Our work is also related to the growing trend of utilizing pervasive and scalable human intelligence from crowdsourcing systems to solve complex real-world problems <ref type="bibr">[36,</ref><ref type="bibr">37,</ref><ref type="bibr">38,</ref><ref type="bibr">39,</ref><ref type="bibr">40]</ref>. For example, Harris et al. leveraged mobile crowdsourcing to detect defective and deteriorated urban infrastructure for smart city management <ref type="bibr">[37]</ref>. Dos Reis et al. utilized citizen scientists to segment cancer cells from breast tumors in biomedical research <ref type="bibr">[38]</ref>. Wang et al. used road traffic information reported by common citizens to monitor real-time traffic congestion in intelligent transportation <ref type="bibr">[40]</ref>. However, two fundamental limitations exist in current solutions that fully rely on human intelligence from crowdsourcing systems. First, these approaches may be too labor-intensive and costly compared to our CD-NAS, which only requires crowd labels from a small subset of studied images to guide the discovery of the optimal neural network architecture for desirable DDA performance <ref type="bibr">[12]</ref>. Second, unlike the professional annotations from disaster specialists, labels from crowd workers can be biased, noisy, and even conflicting because of the lack of sufficient expertise on disaster assessment and response <ref type="bibr">[24]</ref>. As a result, the current crowdsourcing solutions could suffer from a non-trivial DDA performance drop by using only the imperfect responses from crowd workers. 
In contrast, our CD-NAS jointly integrates the inputs from crowd workers and AI models into a novel crowd-AI collaborative model that effectively fuses intelligence from both the crowd and AI to address the imperfect crowd response challenge and identify the optimal neural network architecture in DDA applications.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Neural Architecture Searching</head><p>Our work is also related to NAS techniques, which are used to automate the neural network design process in many AI-driven real-world applications <ref type="bibr">[15,</ref><ref type="bibr">16,</ref><ref type="bibr">25,</ref><ref type="bibr">41,</ref><ref type="bibr">42]</ref>. For example, Zoph et al. developed a scheduled drop path mechanism to enable an effective neural network architecture search for semantic image segmentation <ref type="bibr">[15]</ref>. Liu et al. proposed a differentiable architecture representation mechanism to effectively refine the neural network architecture during the NAS process in natural language modeling <ref type="bibr">[16]</ref>. Tan et al. designed a lightweight NAS approach to incorporate model inference latency into the factorized hierarchical searching process for image object detection via multiobjective reinforcement learning <ref type="bibr">[25]</ref>. Mo et al. proposed a recursive NAS approach to concurrently search for the optimal network architecture on layer and network block levels to improve the NAS performance in keyword spotting on smart devices <ref type="bibr">[41]</ref>. To the best of our knowledge, CD-NAS is the first NAS solution that effectively transfers imperfect crowd intelligence to dynamic optimal neural network architecture selections in streaming DDA applications.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Problem Description</head><p>In this section, we formally define our crowd-driven dynamic NAS problem in streaming DDA applications. We first define a few key terms that will be used in the problem formulation.</p><p>Definition 1. Disaster-related social media images (X): We define X to represent the disaster-related images posted by common citizens on social media (e.g., Twitter) during a disaster event (as shown in Figure <ref type="figure">2</ref>), where each posted image captures a specific scene of the studied disaster event.</p><p>Definition 2. Social media image stream (S): We define S = {X 1 , X 2 , ..., X T } as the set of streaming social media images collected during a disaster event, where X t represents the disaster-related social media image collected from the t th timestep and T is the total number of timesteps in the studied streaming DDA application (e.g., see Figure <ref type="figure">1</ref>). Definition 4. Categories of damage severity level (K): Following a similar procedure in <ref type="bibr">[7]</ref>, the damage severity level in an image can be classified into one of the K pre-defined categories: L t &#8712; {1, 2, ..., K}. For example, we can consider three categories of damage severity levels (i.e., K = 3) that include severe damage, mild damage, and no/minor damage, as shown in Figure <ref type="figure">2</ref>. Definition 5. Neural network architecture search space (N): We define N as the set of candidate neural network architectures for the streaming DDA application (e.g., architectures 1 and 2 in Figure <ref type="figure">1</ref>). In this study, we leverage the neural network architecture design space (i.e., different configurations of adopting ImageNet-pre-trained convolutional layers for image classification tasks <ref type="bibr">[43]</ref>), which is commonly adopted in the current AI-driven DDA solutions <ref type="bibr">[7,</ref><ref type="bibr">11]</ref>. Definition 6. Damage severity estimation from AI ( L N ): We define L N as the damage severity level estimated by different neural network architectures in N . 
In particular, L Ne t represents the damage severity level estimated by the neural network architecture N e for the reported image X t . Definition 7. Dynamic optimal network architecture (N * ): We define N * as the set of optimal neural network architectures identified by our CD-NAS framework from N for different images in S. In particular, N t * represents the optimal neural network architecture that produces the most accurate damage severity estimation L N t * t for the image X t collected at the t th timestep (e.g., N t * is set to be architecture 1 at timestep 1 and architecture 2 at timestep 2 in Figure <ref type="figure">1</ref>).</p><p>The goal of our crowd-driven dynamic NAS problem is to leverage human intelligence from crowdsourcing systems to improve the performance of streaming DDA applications. In particular, our goal is to dynamically select the optimal neural network architecture for each image. We formally define our problem as follows:</p><p>This problem is challenging because of the difficulty of transferring the imperfect crowd intelligence to dynamically identify the optimal neural network architecture for streaming social media image data in the absence of ground-truth labels.</p><p>In this paper, we develop a CD-NAS system to address this problem, which is elaborated in the next section.</p></div>
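To make the notation above concrete, the core objects of the problem (the stream S, the K severity levels, the per-architecture estimates, and the per-timestep optimal architecture N*) can be sketched in Python. All names here are ours, and the estimator is a placeholder stand-in, not the paper's model:

```python
from dataclasses import dataclass

# Damage severity categories (Definition 4), K = 3.
SEVERE, MILD, NO_MINOR = 1, 2, 3

@dataclass
class Image:
    """A disaster-related social media image X_t (Definitions 1-2)."""
    timestep: int

def estimate_severity(arch_id: int, image: Image) -> int:
    """Stand-in for the per-architecture estimate (Definition 6): the
    severity level architecture N_e predicts for image X_t. A real
    system would run the corresponding CNN here."""
    return MILD  # placeholder prediction

# Social media image stream S = {X_1, ..., X_T} (Definition 2), T = 4.
stream = [Image(timestep=t) for t in range(1, 5)]

# Dynamic optimal architectures N* (Definition 7): one choice per image,
# e.g., architecture 1 at timestep 1 and architecture 2 afterwards.
optimal_arch = {1: 1, 2: 2, 3: 2, 4: 2}

# Final output: the estimate of the selected architecture per timestep.
final_output = {img.timestep: estimate_severity(optimal_arch[img.timestep], img)
                for img in stream}
print(final_output)
```

The point of the sketch is only the shape of the problem: the output is not one architecture for the whole stream, but a mapping from each timestep to the architecture judged optimal for that image.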
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Solution</head><p>In this section, we present the CD-NAS framework to address the dynamic optimal neural architecture search problem in streaming DDA applications. We first present an overview of CD-NAS and then discuss its core modules in detail.</p><p>Finally, we summarize the CD-NAS framework using pseudocode.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Overview of CD-NAS Framework</head><p>An overview of CD-NAS is shown in Figure <ref type="figure">3</ref>. In particular, it consists of two modules: 1) crowd-driven network architecture selection (CNAS) and 2) dynamic optimal architecture searching (DOAS). First, the CNAS module develops a novel crowd-AI integration model to effectively leverage imperfect crowd knowledge to facilitate the discovery of an optimal neural network architecture. Second, the DOAS module designs a dynamic neural network architecture searching scheme that adaptively updates the estimation of the optimal neural network for each image through a recursive estimation framework.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Crowd-driven Network Architecture Selection (CNAS)</head><p>In this subsection, we develop a principled crowd-AI integration model to explicitly leverage imperfect crowd intelligence to facilitate the discovery of the optimal neural network architecture in streaming DDA applications. In particular, we first define a key concept that is used in our CNAS module: Definition 8. AI-crowd fusion window (AF W ): The AF W is defined as a sliding window for the streaming DDA applications that includes the most recent I images from the social media stream S. In particular, we define AF W = {X 1 , X 2 , ..., X I }, where X i represents the i th image in the sliding window and I is the size of the AF W . We note that I is an application-specific parameter, and we study its effect in Section 5.</p><p>Similar to online video applications (e.g., YouTube) that often use a local data buffer to ensure a smooth streaming video service, the AF W here is designed to buffer a set of images in streaming DDA applications for the dynamic neural network architecture search. In particular, we add newly arrived images to the AF W until it is full. Then, we apply the first-in-first-out (FIFO) strategy to replace the oldest image in the AF W with the newly arrived image. 
The optimal neural network architecture for each image in the AF W is identified when the image is evicted from the AF W . This design ensures that our CD-NAS can recursively improve the estimation of the optimal neural network for each image in the AF W .</p><p>In our CD-NAS system, we explicitly leverage human intelligence from a crowdsourcing system to guide the discovery of the optimal neural network architecture for each image in the AF W . Hence, we further define a few concepts related to crowd intelligence as follows:</p><p>Definition 9. Crowd query (Q): We define a crowd query as a crowdsourcing task in which our system sends a subset of images in the AF W to the crowd workers to label their damage severity levels. The returned crowd labels are used to search for the optimal neural network architecture for each image, which is discussed later in this section when we formally introduce our AI-crowd collaboration model design.</p><p>Given the above definitions, the goal of our CNAS module is to select the neural network architecture in M with the highest assessment reliability as the optimal neural network architecture in our crowd-driven NAS problem. To that end, we further define P + u,k and P - u,k as the unknown probability that the member M u estimates the damage severity level of an image to be the k th level and the value other than the k th level given the ground-truth damage severity level of the image is the k th level, respectively. We formally define P + u,k and P - u,k as follows:</p><p>where L Mu i represents the estimated damage severity level by a member M u in M on an image X i in the AF W . L i is the ground-truth damage severity level for X i . 
Given the above definitions, P + u,k and P - u,k are related to the assessment reliability &#948; Mu using Bayes' theorem as follows:</p><p>where G Mu,k and &#7726; Mu,k represent the probability that a member M u estimates the k th damage severity level and values other than the k th level, respectively. d k represents the prior probability that a randomly selected image belongs to the k th damage severity level. We note that we can learn the assessment reliability score &#948; Mu if we can obtain the values for the other parameters in the above equation. To that end, we formulate a crowd-AI maximum likelihood estimation (MLE) problem to estimate the unknown assessment reliability score &#948; Mu for each member M u in the AI-crowd collaboration committee and the unknown damage severity level L as follows:</p><p>where L Mu indicates the damage severity estimated by a neural network architecture L Ne or labeled by a crowd worker L C b in M .</p><p>Given the crowd-AI MLE problem above, we further define the likelihood function L(&#952;; &#969;, Z) of our MLE problem as follows:</p><p>The above likelihood function represents the likelihood of the observed data &#969; (i.e., damage severity levels of images in the current AF W estimated by different neural network architectures and crowd workers) and the values of hidden variables Z (i.e., the actual damage severity level of an image) given the estimated parameter &#952;. Detailed explanations of the parameters in L(&#952;; &#969;, Z) are summarized in Table <ref type="table">1</ref>.</p></div>
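The paper's exact likelihood and update equations are given above; as a rough, self-contained illustration of the underlying idea of the CNAS module, here is a generic Dawid-Skene-style EM sketch in Python. It fuses noisy labels from committee members (networks and crowd workers alike) while jointly estimating each member's reliability. This is our own simplification: it uses a single scalar reliability per member rather than the per-level parameters P+ u,k and P- u,k, and all names are hypothetical:

```python
from collections import Counter

def em_fuse(labels, k_levels, iters=10):
    """Illustrative EM for fusing noisy committee labels.
    labels[m][i] is member m's severity level (1..k_levels) for image i.
    Returns (fused labels, per-member reliability estimates)."""
    members = list(labels)
    n_images = len(labels[members[0]])
    # Initialization: majority vote as the hidden true severity Z.
    fused = [Counter(labels[m][i] for m in members).most_common(1)[0][0]
             for i in range(n_images)]
    reliability = {m: 0.5 for m in members}
    for _ in range(iters):
        # M-step: reliability = agreement rate with current fused labels.
        reliability = {m: sum(labels[m][i] == fused[i]
                              for i in range(n_images)) / n_images
                       for m in members}
        # E-step: reliability-weighted vote per image.
        fused = []
        for i in range(n_images):
            score = {k: 0.0 for k in range(1, k_levels + 1)}
            for m in members:
                score[labels[m][i]] += reliability[m]
            fused.append(max(score, key=score.get))
    return fused, reliability

labels = {"net_A": [1, 2, 3, 2], "net_B": [1, 2, 1, 2],
          "worker": [1, 2, 3, 2]}
fused, rel = em_fuse(labels, k_levels=3)
print(fused, rel)
```

In this toy run, `net_B` disagrees with the consensus on one image, so its estimated reliability drops below that of the other members; the same mechanism is what lets a reliability-based scheme downweight noisy crowd responses without any ground-truth labels.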
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Dynamic Optimal Architecture Searching (DOAS)</head><p>In the previous subsection, we presented our crowd-AI MLE formulation to learn the assessment reliability for each neural network architecture in our AI-crowd collaboration committee. The next question is how to adaptively solve the formulated crowd-AI MLE problem to learn the assessment reliability on the fly so that we can dynamically identify the optimal neural network architecture for each image. To that end, we propose a recursive expectation maximization (EM) solution to solve the crowd-AI MLE problem. In estimation theory <ref type="bibr">[44]</ref>, the estimation parameter of an MLE problem can be recursively updated as follows:</p><p>where &#952; t and &#952; t+1 indicate the estimation parameters &#952; at two consecutive timesteps t and t + 1, respectively. X t+1 indicates the newly arrived image at timestep t + 1. The estimation parameter &#952; t+1 is used to calculate the updated assessment reliability for each neural network architecture in the AI-crowd collaboration committee using Equation ( <ref type="formula">4</ref>). I c (&#952; t ) -1 indicates the inverse of the Fisher information of the estimation parameter &#952; t at timestep t. &#934;(X t+1 , &#952; t ) represents the score vector of the observed data (input image X t+1 ) at timestep t + 1 given the estimation parameter &#952; t from the last timestep t. (The indicator variables used in this derivation are defined in Table 1: one is set to 1 when a member M u estimates the damage severity of a given image X i to be the k th level and 0 otherwise; the other is set to 1 when the estimate is a value other than the k th level and 0 otherwise.) 
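The recursive update described in words above corresponds to the standard recursive maximum likelihood form; a plausible reconstruction of the display equation, term by term consistent with the symbols defined in the surrounding text, is:

```latex
\hat{\theta}_{t+1} \;=\; \hat{\theta}_{t} \;+\; I_c\!\left(\hat{\theta}_{t}\right)^{-1} \Phi\!\left(X_{t+1},\, \hat{\theta}_{t}\right)
```

That is, each newly arrived image X_{t+1} contributes a score-vector correction, scaled by the inverse Fisher information, so the parameter estimate is refreshed incrementally without re-solving the full MLE problem over all past data.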
The key idea of the above streaming formulation is to provide a dynamic solution to recursively update the estimation parameter &#952; on the fly.</p><p>To obtain the Fisher information I c (&#952; t ) and score vector &#934;(X t+1 , &#952; t ), we first derive the logarithm of L(&#952;; &#969;, Z) by assuming that the hidden variable (Z t,k ) can be correctly estimated when the number of members in the AI-crowd collaboration committee is sufficient. In particular, we can derive the log-likelihood function logL(&#952;; &#969;, Z) as:</p><p>Given the log-likelihood function logL(&#952;; &#969;, Z), we can derive the inverse of the Fisher information I c (&#952; t ) -1 for our problem as follows:</p><p>In addition, we can also derive the score vector &#934;(X t+1 , &#952; t ) from logL(&#952;; &#969;, Z) as follows:</p><p>Finally, we can plug I c (&#952; t ) -1 and &#934;(X t+1 , &#952; t ) into Equation ( <ref type="formula">7</ref> ) to obtain the recursive formula to update the estimation parameters &#952; (i.e., P + u,k and P - u,k ) as follows:</p><p>In addition, we observe that Z t+1 i,k is unknown and can be estimated by its approximation &#7824; t+1 i,k as follows:</p><p>where W t+1 n,k can be computed as follows:</p><p>In summary, the above recursive approach provides a dynamic solution for learning the estimation parameter &#952; of the crowd-AI MLE problem on the fly at each timestep using the estimation from the previous timestep and the images from the current image sliding window. Finally, we can derive the assessment reliability &#948; Mu for each member M u in M dynamically by plugging the updated &#952; t+1 into Equation (4) at each timestep. 
After obtaining the assessment reliability score for each neural network architecture, we select the neural network architecture with the highest assessment reliability score as the optimal neural network architecture N i * for the image that is about to be evicted from the AFW as follows:</p><p>where X i t+1 represents the image that is about to be evicted from the AFW at timestep t + 1, and &#948; Mu t+1 represents the updated assessment reliability score at timestep t + 1. In addition, the estimated damage severity L N i * from the optimal neural network architecture N i * is taken as the final output of our CD-NAS framework for image X i t+1 .</p><p>Finally, we summarize the CD-NAS framework in Algorithm 1. The inputs to CD-NAS are the set of streaming social media images X t . The outputs are the dynamically identified optimal neural network architecture N t * and the estimated damage severity level L N t * generated by N t * for each X t .</p></div>
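To make the recursive update concrete, the following minimal sketch (our illustration, not the authors' implementation) applies one step of Equation (7) and then selects the most reliable committee member; `fisher_inv` and `score` stand in for I c (&#952; t ) -1 and &#934;(X t+1 , &#952; t ):

```python
import numpy as np

def recursive_em_step(theta_t, fisher_inv, score):
    """One recursive update: theta_{t+1} = theta_t + I_c(theta_t)^{-1} * Phi(X_{t+1}, theta_t).

    theta_t:    current estimate of the reliability parameters (1-D array)
    fisher_inv: inverse Fisher information of theta_t (2-D array)
    score:      score vector of the newly arrived image given theta_t (1-D array)
    """
    return theta_t + fisher_inv @ score

def select_architecture(reliability):
    """Return the index of the committee member with the highest assessment reliability."""
    return int(np.argmax(reliability))
```

Because the update reuses the previous estimate instead of re-solving the full MLE, each new image costs only one matrix-vector product.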
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Evaluation</head><p>In this section, we evaluate the performance of the CD-NAS framework on a real-world streaming DDA application built from an actual disaster event. The results show that CD-NAS consistently outperforms state-of-the-art AI and NAS baselines in terms of both damage assessment accuracy and computational cost under various application scenarios.</p><p>Algorithm 1 CD-NAS Framework Summary </p></div>
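The main loop of Algorithm 1 can be sketched as follows; the helper names (`members`, the crowd-query step) are placeholders for the components described in Section 4, not the authors' code:

```python
from collections import deque

def cd_nas_stream(images, members, window_size=40, query_freq=3):
    """Sketch of the CD-NAS streaming loop.

    images:      list of incoming social media images
    members:     dict mapping committee member name -> predict function
    query_freq:  every query_freq-th image goes to the crowd (beta = 1/query_freq)
    Returns one (member, estimated label) pair per image evicted from the window.
    """
    afw = deque()                             # AI-crowd fusion window (AFW)
    reliability = {m: 0.5 for m in members}   # initial assessment reliability
    outputs = []
    for t, img in enumerate(images):
        if len(afw) == window_size:
            # image about to be evicted: use the most reliable member's estimate
            evicted = afw.popleft()
            best = max(reliability, key=reliability.get)
            outputs.append((best, members[best](evicted)))
        afw.append(img)
        if t % query_freq == 0:
            # a crowd query would be issued here, followed by the recursive EM
            # update of `reliability` (Equation (7)); omitted in this sketch
            pass
    return outputs
```

The sketch highlights the framework's structure: images are labeled only when they leave the AFW, so the reliability estimates have had time to absorb the crowd responses collected while the image was buffered.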
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Dataset and Crowdsourcing Platform</head><p>Disaster Damage Assessment Dataset: In our evaluation, we used a real-world dataset on disaster damage assessment collected by <ref type="bibr">[7]</ref> <ref type="foot">foot_0</ref> . In particular, the dataset consists of social media images collected over the course of Typhoon Hagupit in the Philippines (2014). The collected social media images have diversified damage characteristics (e.g., flooding damage, building and infrastructure damage, and vehicle damage), as shown in Figure <ref type="figure">1</ref>. In the dataset, the ground-truth damage severity level of each social media image was manually classified by domain experts into three categories (i.e., severe damage, mild damage, and no/minor damage). In particular, the distributions of the different damage severity levels in our dataset were as follows: severe damage: 11.2%; mild damage: 42.2%; and no damage: 46.6%. We kept the ratio of training to testing data at 3:1, the same as in <ref type="bibr">[7]</ref>. The training dataset was used to train all compared AI models for disaster damage assessment.</p><p>Amazon Mechanical Turk Platform: To obtain the crowd intelligence, we utilized Amazon Mechanical Turk (AMT)<ref type="foot">foot_1</ref> , one of the largest crowdsourcing platforms, which provides a large number of 24/7 freelance crowd workers to complete assigned tasks with reasonable incentives. In each crowdsourcing task, we asked the crowd workers to label the damage severity level of the image in the query. To ensure the crowd label quality, we selected crowd workers who have an overall task approval rate greater than 95% and have completed at least 1000 approved tasks to participate in our crowdsourcing tasks. We paid each worker $0.20 per image in our experiment. 
In our evaluation, we study a diversified set of crowd query settings to create a challenging evaluation scenario for our CD-NAS framework. In particular, we vary the number of participating crowd workers who respond to each queried image (Definition 10) from 3 to 5 and vary the crowd query frequency &#946; (Definition 11) from 1/5 to 1/3.</p></div>
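As a concrete illustration of these settings (with hypothetical worker IDs, not the actual AMT assignment), a crowd query frequency of &#946; = 1/3 with B = 3 workers means every third streamed image is routed to three workers:

```python
def build_crowd_queries(image_ids, beta_denominator=3, num_workers=3):
    """Route every beta_denominator-th image (beta = 1/beta_denominator)
    to num_workers crowd workers; returns {image_id: [worker slots]}."""
    queries = {}
    for idx, img in enumerate(image_ids):
        if idx % beta_denominator == 0:
            queries[img] = [f"worker_{w}" for w in range(num_workers)]
    return queries
```

Raising &#946; (more frequent queries) or B (more workers per query) increases crowdsourcing cost but supplies the recursive EM update with more crowd evidence per timestep.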
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Baselines and Experiment Settings</head><p>We compared CD-NAS with a set of representative deep neural network (DNN) and neural architecture searching (NAS) baselines in streaming DDA applications.</p><p>&#8226; DNN Baselines:</p><p>1. InceptionNet <ref type="bibr">[45]</ref>: a popular deep learning model that accelerates the learning process of the DDA task through a convolution factorization mechanism.</p><p>2. DenseNet <ref type="bibr">[46]</ref>: a widely used deep neural network approach that establishes dense connections among different network layers to boost the DDA accuracy.</p><p>3. VGG <ref type="bibr">[11]</ref>: a representative deep convolutional network framework that utilizes repeated deep convolutional operations to ensure sufficient network depth for desirable DDA performance.</p><p>&#8226; NAS Baselines:</p><p>1. NasNetLarge/Mobile <ref type="bibr">[15]</ref>: a state-of-the-art NAS approach that effectively refines the neural network architecture by introducing a scheduled drop path mechanism. In addition to the standard version of NasNet (NasNetLarge), we also consider the mobile version (NasNetMobile), which achieves a better trade-off between NAS performance and computational efficiency in streaming DDA applications.</p><p>2. DARTS <ref type="bibr">[16]</ref>: a representative NAS framework that introduces a differentiable architecture representation to ensure an effective NAS process.</p><p>3. MnasNet <ref type="bibr">[25]</ref>: a lightweight NAS approach that incorporates model inference latency into the factorized hierarchical architecture searching process via multi-objective reinforcement learning.</p><p>To ensure a fair comparison, the inputs to all compared schemes were set to be the same, which included: 1) the input social media images, 2) the ground-truth labels of images in the training dataset, and 3) the labeled images from crowd workers. 
In particular, we retrained all compared baselines using the labels returned by the crowd query to ensure a fair comparison. In addition, we also consider a random baseline, which estimates the damage severity of each image by randomly selecting a damage severity level from the possible categories. In our system, we implemented our CD-NAS model using TensorFlow 2.0<ref type="foot">foot_2</ref> , and trained our model on an NVIDIA Quadro RTX 6000 GPU. In our experiment, the model parameters were optimized using the Adam optimizer <ref type="bibr">[47]</ref>. In particular, we set the learning rate to 10 -6 and the batch size to 20, and the model was trained over 300 epochs.</p><p>To evaluate the performance of all compared schemes, we adopted three metrics that are widely used to evaluate multi-class image classification tasks in image processing: 1) F1-score, 2) Cohen's kappa score (K-Score) <ref type="bibr">[48]</ref>, and 3) the Matthews correlation coefficient (MCC) <ref type="bibr">[49]</ref>. We use K-Score and MCC in our evaluation because our dataset is imbalanced, and these two metrics have been shown to be reliable for imbalanced data <ref type="bibr">[50]</ref>. Higher F1-score, K-Score, and MCC values indicate better performance.</p></div>
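As a concrete reference for one of these metrics, the K-Score (Cohen's kappa) measures chance-corrected agreement between predicted and ground-truth labels; a minimal implementation (our sketch, not tied to the paper's code) is:

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e the agreement expected by chance from the label marginals."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    p_e = sum(true_counts[l] * pred_counts[l] for l in labels) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

Because p_e grows when one class dominates (e.g., the 46.6% no-damage class here), kappa discounts accuracy that a majority-class guesser would achieve by chance, which is why it is preferred over raw accuracy on imbalanced data.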
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">Evaluation Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.1.">DDA Classification Accuracy with Different Crowdsourcing Settings</head><p>In the first set of experiments, we studied the performance of all compared schemes under different crowdsourcing settings. First, we varied the crowd query frequency &#946; (Definition 11) from 1/5 to 1/3 for all compared schemes (e.g., when &#946; is 1/3, we periodically send one out of every three images in the data stream to the crowd query) while fixing the number of participating crowd workers B (Definition 10) at 3. Second, we changed the number of participating crowd workers B in the crowd query from 3 to 5 while fixing the crowd query frequency at 1/3. We set the size of the AI-crowd fusion window AFW (Definition 8) to 40. The evaluation results are presented in Tables <ref type="table">2</ref> and <ref type="table">3</ref>. We observed that our CD-NAS consistently outperformed all compared baselines in all experimental settings. For example, the performance gains of CD-NAS over the best-performing baseline (i.e., DenseNet) when the crowd query frequency &#946; = 1/3 and B = 3 are 5.76%, 7.48%, and 6.00% in F1-Score, K-Score, and MCC, respectively. The performance gains of our scheme mainly come from the fact that it adaptively transfers the imperfect crowd intelligence to the optimal neural network selection for each image through the dynamic crowd-AI MLE design. In addition, we further evaluated the performance of our CD-NAS on additional settings of the two experimental variables (i.e., the crowd query frequency &#946; and the number of crowd workers B). We also compared the performance of CD-NAS with the best-performing baselines from the different categories (i.e., DenseNet for DNN baselines in Tables <ref type="table">2</ref> and <ref type="table">3</ref>, NasNetMobile for NAS baselines in Table <ref type="table">2</ref>, and NasNetLarge for NAS baselines in Table <ref type="table">3</ref>). 
The results are shown in Figure <ref type="figure">4</ref> and Figure <ref type="figure">5</ref>, respectively. We observed that CD-NAS consistently outperformed the best-performing baselines on different evaluation metrics for all evaluation settings.</p><p>These evaluation results demonstrate the effectiveness of our scheme in leveraging the imperfect crowd knowledge to dynamically identify the optimal neural network architecture for each newly arrived image and provide accurate DDA results across different experimental variable settings.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.2.">Computational Efficiency</head><p>In the second set of experiments, we compared the computational cost of all compared schemes (except the trivial random baseline) in the studied streaming DDA application. We define the computational cost as the average computational time required to estimate the damage severity of an image.</p><p>To ensure a fair comparison, we evaluated all schemes on the same NVIDIA Quadro RTX 6000 GPU. The evaluation results are presented in Tables <ref type="table">4</ref> and <ref type="table">5</ref>. We observe that our CD-NAS scheme takes orders of magnitude less time to accomplish the DDA task than the other baselines under different evaluation settings. This is because the compared baselines require additional computational time to retrain their models to capture the dynamics of the streaming data by leveraging the labels from crowd workers. In contrast, our CD-NAS uses a recursive expectation maximization solution that estimates the assessment reliability score of each neural network architecture on the fly without requiring any additional network retraining. In addition, we evaluated the computational cost of our CD-NAS for additional crowdsourcing settings.</p><p>Similar to the performance comparison in Section 5.3.1, we compare the performance of CD-NAS with the best-performing baselines from each category in Tables <ref type="table">2</ref> and <ref type="table">3</ref>. The results are shown in Figure <ref type="figure">6</ref> and Figure <ref type="figure">7</ref>. We observe that our CD-NAS achieves a clear performance gain over the best-performing baselines in all settings, which further demonstrates the effectiveness of the dynamic neural network architecture searching scheme in achieving the best DDA performance while maintaining the lowest computational time cost. 
These results provide a window for users of our CD-NAS scheme to select the AFW size to achieve a desirable DDA performance. In addition, we note that CD-NAS buffers too few images in the AFW when its size is too small, which often leads to suboptimal classification results. On the other hand, CD-NAS can buffer too many images in the AFW when its size is too large, which often leads to a significantly increased computation time. The actual selection of the AFW size will largely depend on the trade-off between the classification accuracy and the response time of the CD-NAS scheme that the users would like to achieve in a particular DDA application.</p><p>The results for the other scenarios are similar. Please note that we show the performance of CD-NAS from the 20th timestep onward because our CD-NAS needs to explore the imagery data in the first few timesteps to overcome the cold-start problem of the recursive EM algorithm. We observe that our CD-NAS can quickly boost the assessment performance and remain stable afterward, suggesting its effectiveness in recursively learning the optimal neural network architecture in the studied application.</p></div>
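The per-image computational cost compared in Section 5.3.2 can be measured with a simple wall-clock average; a generic sketch (the model and hardware specifics are as described in Section 5.2, and `predict_fn` is a placeholder for any of the compared schemes):

```python
import time

def average_inference_time(predict_fn, images):
    """Average wall-clock seconds needed to estimate the damage severity of one image."""
    start = time.perf_counter()
    for img in images:
        predict_fn(img)   # run the damage severity estimation for each image
    return (time.perf_counter() - start) / len(images)
```

For schemes that retrain on crowd labels, the retraining time would also be attributed to the images processed in that period, which is what drives the gap reported above.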
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>We presented a CD-NAS framework to address a crowd-driven dynamic NAS problem and improve the QoS of streaming DDA applications. Our solution consistently outperformed state-of-the-art AI and NAS baselines in terms of both assessment accuracy and computational cost. We believe that CD-NAS will provide useful insights for exploring the collective power of AI and crowd intelligence in a rich set of AI-driven streaming applications (e.g., disaster response, truth discovery, intelligent transportation).</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>https://crisisnlp.qcri.org/</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1"><p>https://www.mturk.com/</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2"><p>https://www.tensorflow.org/</p></note>
		</body>
		</text>
</TEI>
