This content will become publicly available on June 21, 2024

Title: Long-Range Social Influence in Phone Communication Networks on Offline Adoption Decisions
We use high-resolution mobile phone data with geolocation information and propose a novel technical framework to study how social influence propagates within a phone communication network and affects the offline decision to attend a performance event. Our fine-grained data are based on the universe of phone calls made in a European country between January and July 2016. We isolate social influence from observed and latent homophily by taking advantage of the rich spatial-temporal information and the social interactions available from the longitudinal behavioral data. We find that influence stemming from phone communication is significant and persists up to four degrees of separation in the communication network. Building on this finding, we introduce a new “influence” centrality measure that captures the empirical pattern of influence decay over successive connections. A validation test shows that the average influence centrality of the adopters at the beginning of each observational period can strongly predict the number of eventual adopters and has a stronger predictive power than other prevailing centrality measures such as the eigenvector centrality and state-of-the-art measures such as diffusion centrality. Our centrality measure can be used to improve optimal seeding strategies in contexts with influence over phone calls, such as targeted or viral marketing campaigns. Finally, we quantitatively demonstrate how raising the communication probability over each connection, as well as the number of initial seeds, can significantly amplify the expected adoption in the network and raise net revenue after taking into account the cost of these interventions. History: Sam Ransbotham, Senior Editor; Yan Huang, Associate Editor. Funding: Y. Leng acknowledges the support provided by the National Science Foundation [Grant IIS-2153468]. E. Moro acknowledges the support provided by the National Science Foundation [Grant 2218748]. 
Supplemental Material: The online appendices are available at https://doi.org/10.1287/isre.2023.1231.
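The abstract describes an influence centrality that follows how influence decays over successive connections, with influence persisting up to four degrees of separation. The formula itself is not given here, so the following is only a minimal sketch of one way a decay-weighted measure could be computed: each node reachable within four hops contributes a weight that depends on its hop distance. The graph `CALLS`, the weights in `DECAY`, and the function names are all hypothetical and are not taken from the paper.

```python
from collections import deque


def hop_distances(adj, source, max_hops):
    """Breadth-first search returning {node: hop distance}, capped at max_hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def decay_centrality(adj, source, decay):
    """Sum decay[d - 1] over every node at hop distance d >= 1 from source."""
    dist = hop_distances(adj, source, len(decay))
    return sum(decay[d - 1] for d in dist.values() if d >= 1)


# Hypothetical undirected call graph, stored as adjacency lists.
CALLS = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["b", "e"],
    "e": ["d", "f"],
    "f": ["e"],
}

# Illustrative weights for hops 1 to 4 (assumed values, not the paper's estimates).
DECAY = [0.5, 0.25, 0.125, 0.0625]

if __name__ == "__main__":
    for node in sorted(CALLS):
        print(node, decay_centrality(CALLS, node, DECAY))
```

In this sketch, a node such as `"a"` that reaches many others within a few hops scores higher than a peripheral node such as `"f"`. In an actual study the hop weights would be estimated from data rather than fixed by hand.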
Award ID(s):
2153468
NSF-PAR ID:
10451770
Journal Name:
Information Systems Research
ISSN:
1047-7047
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Chen, Yan (Ed.)
    Do people have well-defined social preferences waiting to be applied when making decisions? Or do they have to construct social decisions on the spot? If the latter, how are those decisions influenced by the way in which information is acquired and evaluated? These temporal dynamics are fundamental to understanding how people trade off selfishness and prosociality in organizations and societies. Here, we investigate how the temporal dynamics of the choice process shape social decisions in three studies using response times and mouse tracking. In the first study, participants made binary decisions in mini-dictator games with and without time constraints. Using mouse trajectories and a starting time drift diffusion model, we find that, regardless of time constraints, selfish participants were delayed in processing others’ payoffs, whereas the opposite was true for prosocial participants. The independent mouse trajectory and computational modeling analyses identified consistent measures of the delay between considering one’s own and others’ payoffs (self-onset delay, SOD). This measure correlated with individual differences in prosociality and predicted heterogeneous effects of time constraints on preferences. We confirmed these results in two additional studies, one a purely behavioral study in which participants made decisions by pressing computer keys, and the other a replication of the mouse-tracking study. Together, these results indicate that people preferentially process either self or others’ payoffs early in the choice process. The intrachoice dynamics are crucial in shaping social preferences and might be manipulated via nudge policies (e.g., manipulating the display order or saliency of self and others’ outcomes) for behavior in managerial or other contexts. This paper was accepted by Yan Chen, behavioral economics and decision analysis. Funding: F. Chen acknowledges support from the National Natural Science Foundation of China [Grants 71803174 and 72173113]. Z. Zhu acknowledges support from the Ministry of Science and Technology [Grant STI 2030-Major Projects 2021ZD0200409]. Q. Shen acknowledges support from the National Natural Science Foundation of China [Grants 71971199 and 71942004]. I. Krajbich acknowledges support from the U.S. National Science Foundation [Grant 2148982]. This work was also supported by the James McKeen Cattell Fund. Supplemental Material: The online appendix and data are available at https://doi.org/10.1287/mnsc.2023.4732.
  2. Obeid, I. (Ed.)
    The Neural Engineering Data Consortium (NEDC) is developing the Temple University Digital Pathology Corpus (TUDP), an open source database of high-resolution images from scanned pathology samples [1], as part of its National Science Foundation-funded Major Research Instrumentation grant titled “MRI: High Performance Digital Pathology Using Big Data and Machine Learning” [2]. The long-term goal of this project is to release one million images. We have currently scanned over 100,000 images and are in the process of annotating breast tissue data for our first official corpus release, v1.0.0. This release contains 3,505 annotated images of breast tissue including 74 patients with cancerous diagnoses (out of a total of 296 patients). In this poster, we will present an analysis of this corpus and discuss the challenges we have faced in efficiently producing high quality annotations of breast tissue. It is well known that state of the art algorithms in machine learning require vast amounts of data. Fields such as speech recognition [3], image recognition [4] and text processing [5] are able to deliver impressive performance with complex deep learning models because they have developed large corpora to support training of extremely high-dimensional models (e.g., billions of parameters). Other fields that do not have access to such data resources must rely on techniques in which existing models can be adapted to new datasets [6]. A preliminary version of this breast corpus release was tested in a pilot study using a baseline machine learning system, ResNet18 [7], that leverages several open-source Python tools. The pilot corpus was divided into three sets: train, development, and evaluation. Portions of these slides were manually annotated [1] using the nine labels in Table 1 [8] to identify five to ten examples of pathological features on each slide. 
Not every pathological feature is annotated, meaning excluded areas can include focuses particular to these labels that are not used for training. A summary of the number of patches within each label is given in Table 2. To maintain a balanced training set, 1,000 patches of each label were used to train the machine learning model. Throughout all sets, only annotated patches were involved in model development. The performance of this model in identifying all the patches in the evaluation set can be seen in the confusion matrix of classification accuracy in Table 3. The highest performing labels were background, 97% correct identification, and artifact, 76% correct identification. A correlation exists between labels with more than 6,000 development patches and accurate performance on the evaluation set. Additionally, these results indicated a need to further refine the annotation of invasive ductal carcinoma (“indc”), inflammation (“infl”), nonneoplastic features (“nneo”), normal (“norm”) and suspicious (“susp”). This pilot experiment motivated changes to the corpus that will be discussed in detail in this poster presentation. To increase the accuracy of the machine learning model, we modified how we addressed underperforming labels. One common source of error arose with how non-background labels were converted into patches. Large areas of background within other labels were isolated within a patch resulting in connective tissue misrepresenting a non-background label. In response, the annotation overlay margins were revised to exclude benign connective tissue in non-background labels. Corresponding patient reports and supporting immunohistochemical stains further guided annotation reviews. The microscopic diagnoses given by the primary pathologist in these reports detail the pathological findings within each tissue site, but not within each specific slide. 
The microscopic diagnoses informed revisions specifically targeting annotated regions classified as cancerous, ensuring that the labels “indc” and “dcis” were used only in situations where a micropathologist diagnosed it as such. Further differentiation of cancerous and precancerous labels, as well as the location of their focus on a slide, could be accomplished with supplemental immunohistochemically (IHC) stained slides. When distinguishing whether a focus is a nonneoplastic feature versus a cancerous growth, pathologists employ antigen targeting stains to the tissue in question to confirm the diagnosis. For example, a nonneoplastic feature of usual ductal hyperplasia will display diffuse staining for cytokeratin 5 (CK5) and no diffuse staining for estrogen receptor (ER), while a cancerous growth of ductal carcinoma in situ will have negative or focally positive staining for CK5 and diffuse staining for ER [9]. Many tissue samples contain cancerous and non-cancerous features with morphological overlaps that cause variability between annotators. The informative fields IHC slides provide could play an integral role in machine model pathology diagnostics. Following the revisions made on all the annotations, a second experiment was run using ResNet18. Compared to the pilot study, an increase of model prediction accuracy was seen for the labels indc, infl, nneo, norm, and null. This increase is correlated with an increase in annotated area and annotation accuracy. Model performance in identifying the suspicious label decreased by 25% due to the decrease of 57% in the total annotated area described by this label. A summary of the model performance is given in Table 4, which shows the new prediction accuracy and the absolute change in error rate compared to Table 3. The breast tissue subset we are developing includes 3,505 annotated breast pathology slides from 296 patients. The average size of a scanned SVS file is 363 MB. The annotations are stored in an XML format. 
A CSV version of the annotation file is also available; it provides a flat, or simple, annotation that is easy for machine learning researchers to access and interface with their systems. Each patient is identified by an anonymized medical reference number. Within each patient’s directory, one or more sessions are identified, also anonymized to the first of the month in which the sample was taken. These sessions are broken into groupings of tissue taken on that date (in this case, breast tissue). A deidentified patient report stored as a flat text file is also available. Within these slides there are a total of 16,971 annotated regions, with an average of 4.84 annotations per slide. Among those annotations, 8,035 are non-cancerous (normal, background, null, and artifact), 6,222 are carcinogenic signs (inflammation, nonneoplastic, and suspicious), and 2,714 are cancerous labels (ductal carcinoma in situ and invasive ductal carcinoma). The individual patients are split up into three sets: train, development, and evaluation. Of the 74 patients with cancerous diagnoses, 20 were allotted to each of the development and evaluation sets, while the remaining 34 were allotted to the training set. The remaining 222 patients were split up to preserve the overall distribution of labels within the corpus. This was done in the hope of creating control sets for comparable studies. Overall, the development and evaluation sets each have 80 patients, while the training set has 136 patients. In a related component of this project, slides from the Fox Chase Cancer Center (FCCC) Biosample Repository (https://www.foxchase.org/research/facilities/genetic-research-facilities/biosample-repository-facility) are being digitized in addition to slides provided by Temple University Hospital. This data includes 18 different types of tissue including approximately 38.5% urinary tissue and 16.5% gynecological tissue. 
These slides and the metadata provided with them are already anonymized and include diagnoses in a spreadsheet with sample and patient ID. We plan to release over 13,000 unannotated slides from the FCCC Corpus simultaneously with v1.0.0 of TUDP. Details of this release will also be discussed in this poster. Few digitally annotated databases of pathology samples like TUDP exist due to the extensive data collection and processing required. The breast corpus subset should be released by November 2021. By December 2021 we should also release the unannotated FCCC data. We are currently annotating urinary tract data as well. We expect to release about 5,600 processed TUH slides in this subset. We have an additional 53,000 unprocessed TUH slides digitized. Corpora of this size will stimulate the development of a new generation of deep learning technology. In clinical settings where resources are limited, an assistive diagnostic model could support pathologists’ workload and even help prioritize suspected cancerous cases.
ACKNOWLEDGMENTS This material is supported by the National Science Foundation under grant nos. CNS-1726188 and 1925494. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
REFERENCES
[1] N. Shawki et al., “The Temple University Digital Pathology Corpus,” in Signal Processing in Medicine and Biology: Emerging Trends in Research and Applications, 1st ed., I. Obeid, I. Selesnick, and J. Picone, Eds. New York City, New York, USA: Springer, 2020, pp. 67–104. https://www.springer.com/gp/book/9783030368432.
[2] J. Picone, T. Farkas, I. Obeid, and Y. Persidsky, “MRI: High Performance Digital Pathology Using Big Data and Machine Learning.” Major Research Instrumentation (MRI), Division of Computer and Network Systems, Award No. 1726188, January 1, 2018 – December 31, 2021. https://www.isip.piconepress.com/projects/nsf_dpath/.
[3] A. Gulati et al., “Conformer: Convolution-augmented Transformer for Speech Recognition,” in Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2020, pp. 5036–5040. https://doi.org/10.21437/interspeech.2020-3015.
[4] C.-J. Wu et al., “Machine Learning at Facebook: Understanding Inference at the Edge,” in Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA), 2019, pp. 331–344. https://ieeexplore.ieee.org/document/8675201.
[5] I. Caswell and B. Liang, “Recent Advances in Google Translate,” Google AI Blog: The latest from Google Research, 2020. [Online]. Available: https://ai.googleblog.com/2020/06/recent-advances-in-google-translate.html. [Accessed: 01-Aug-2021].
[6] V. Khalkhali, N. Shawki, V. Shah, M. Golmohammadi, I. Obeid, and J. Picone, “Low Latency Real-Time Seizure Detection Using Transfer Deep Learning,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2021, pp. 1–7. https://www.isip.piconepress.com/publications/conference_proceedings/2021/ieee_spmb/eeg_transfer_learning/.
[7] J. Picone, T. Farkas, I. Obeid, and Y. Persidsky, “MRI: High Performance Digital Pathology Using Big Data and Machine Learning,” Philadelphia, Pennsylvania, USA, 2020. https://www.isip.piconepress.com/publications/reports/2020/nsf/mri_dpath/.
[8] I. Hunt, S. Husain, J. Simons, I. Obeid, and J. Picone, “Recent Advances in the Temple University Digital Pathology Corpus,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2019, pp. 1–4. https://ieeexplore.ieee.org/document/9037859.
[9] A. P. Martinez, C. Cohen, K. Z. Hanley, and X. (Bill) Li, “Estrogen Receptor and Cytokeratin 5 Are Reliable Markers to Separate Usual Ductal Hyperplasia From Atypical Ductal Hyperplasia and Low-Grade Ductal Carcinoma In Situ,” Arch. Pathol. Lab. Med., vol. 140, no. 7, pp. 686–689, Apr. 2016. https://doi.org/10.5858/arpa.2015-0238-OA.
  3. We provide tools to analyze information design problems subject to constraints. We do so by extending an insight by Le Treust and Tomala to the case of multiple inequality and equality constraints: an information design problem subject to constraints can be represented as an unconstrained information design problem with additional states, one for each constraint. Thus, without loss of generality, optimal solutions induce as many posteriors as the number of states and constraints. We provide results that refine this upper bound. Furthermore, we provide conditions under which there is no duality gap in constrained information design, thus validating a Lagrangian approach. We illustrate our results with applications to mechanism design with limited commitment and persuasion of a privately informed receiver.

    Funding: L. Doval acknowledges the support of the National Science Foundation through [Grant SES-2131706]. V. Skreta acknowledges the support from the National Science Foundation through [Grant SES-1851729] and from the European Research Council (ERC) through consolidator [Grant 682417].

    Supplemental Material: The e-companion is available at https://doi.org/10.1287/moor.2022.1346 .

     
  4. The separability of clusters is one of the most desired properties in clustering. There is a wide range of settings in which different clusterings of the same data set appear. We are interested in applications for which there is a need for an explicit, gradual transition of one separable clustering into another one. This transition should be a sequence of simple, natural steps that upholds separability of the clusters throughout. We design an algorithm for such a transition. We exploit the intimate connection of separability and linear programming over bounded-shape partition and transportation polytopes: separable clusterings lie on the boundary of partition polytopes and form a subset of the vertices of the corresponding transportation polytopes, and circuits of both polytopes are readily interpreted as sequential or cyclical exchanges of items between clusters. This allows for a natural approach to achieve the desired transition through a combination of two walks: an edge walk between two so-called radial clusterings in a transportation polytope, computed through an adaptation of classical tools of sensitivity analysis and parametric programming, and a walk from a separable clustering to a corresponding radial clustering, computed through a tailored, iterative routine updating cluster sizes and reoptimizing the cluster assignment of items. Funding: Borgwardt gratefully acknowledges support of this work through National Science Foundation [Grant 2006183] Circuit Walks in Optimization, Algorithmic Foundations, Division of Computing and Communication Foundations; through Air Force Office of Scientific Research [Grant FA9550-21-1-0233] The Hirsch Conjecture for Totally-Unimodular Polyhedra; and through Simons Collaboration [Grant 524210] Polyhedral Theory in Data Analytics. Happach has been supported by the Alexander von Humboldt Foundation with funds from the German Federal Ministry of Education and Research. 
  5. With computing impacting almost every professional field, it has become essential to provide pathways for students other than those majoring in computer science to acquire computing knowledge and skills. Virtually all employers and graduate and professional schools seek these skills in their employees or students, regardless of discipline. Academia currently leans towards approaches such as double majors or combined majors between computer science and other non-CS disciplines, commonly referred to as “CS+X” programs. These programs tend to require rigorous courses gleaned from the institutions’ courses for computer science majors. Thus, they may not meet the needs of majors in disciplines such as the social and biological sciences, humanities, and others. The University of Maryland, Baltimore County (UMBC) is taking an approach more suitably termed “X+CS” to fulfill the computing needs of non-CS majors. As part of a National Science Foundation (NSF) grant, we are developing a “computing” minor specifically to meet their needs. To date, we have piloted the first two of the minor’s approximately six courses. The first is a variation on the existing Computer Science I course required for majors but restricted to non-majors. Both versions of the course use the Python language and cover the same programming content, but with the non-majors assigned projects with relevance to non-CS disciplines. We use the same student assessment measures of homework, projects, and examinations for both courses. After four semesters, results show that non-CS majors perform comparably to majors. Students also express increased interest in computing and satisfaction with being part of a non-CS major cohort. The second course was piloted in fall 2019. It is a new course intended to enhance and hone programming skills and introduce topics such as web scraping, HTML and CSS, web application development, data formats, and database use. 
Students again expressed increased interest in computing and were already beginning to apply the computing skills that they were learning to their non-CS courses. As a welcome side effect, we experienced a significant increase in the number of women and under-represented minorities (URMs) in these two courses when compared with CS-major specific courses. Overall, women comprised 52% of the population, with URMs following a similar upward trend. We are currently developing the third course in the computing minor and exploring options for the remaining three. Possibilities include electives from our Information Systems major. We will also be working with our science, social science, and humanities departments to utilize existing courses in those disciplines that apply computing. The student response that we have received thus far provides us with evidence that our computing minor will be popular among UMBC’s non-CS population, providing them with a more suitable and positive computing education than existing CS+X efforts. 
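The patient and annotation counts quoted in item 2 (the Temple University Digital Pathology Corpus) can be cross-checked with a little arithmetic. The sketch below uses only figures stated in that abstract. The per-set counts of patients without a cancerous diagnosis are derived here by subtraction and are not stated in the source, and the variable and function names are hypothetical.

```python
# Patient counts quoted in the abstract.
TOTAL_PATIENTS = 296
CANCER_PATIENTS = 74
CANCER_SPLIT = {"train": 34, "development": 20, "evaluation": 20}
SET_SIZES = {"train": 136, "development": 80, "evaluation": 80}


def non_cancer_split(set_sizes, cancer_split):
    """Derive per-set counts of patients without a cancerous diagnosis."""
    return {name: set_sizes[name] - cancer_split[name] for name in set_sizes}


# Derived values (not stated in the source).
NON_CANCER_SPLIT = non_cancer_split(SET_SIZES, CANCER_SPLIT)

# Annotation counts by group, as quoted in the abstract.
ANNOTATIONS = {
    "non_cancerous": 8035,
    "carcinogenic_signs": 6222,
    "cancerous": 2714,
}
ANNOTATED_SLIDES = 3505
TOTAL_ANNOTATIONS = sum(ANNOTATIONS.values())
ANNOTATIONS_PER_SLIDE = round(TOTAL_ANNOTATIONS / ANNOTATED_SLIDES, 2)

if __name__ == "__main__":
    print("Non-cancer patients per set:", NON_CANCER_SPLIT)
    print("Total annotations:", TOTAL_ANNOTATIONS)
    print("Annotations per slide:", ANNOTATIONS_PER_SLIDE)
```

The totals agree with the abstract: the three annotation groups sum to 16,971, the average is 4.84 annotations per slide, and the derived non-cancer counts sum to the 222 remaining patients.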