
Title: A Look into User Privacy and Third-party Applications in Facebook
A huge amount of personal and sensitive data is shared on Facebook, which makes it a prime target for attackers. Adversaries can exploit third-party applications connected to a user’s Facebook profile (i.e., Facebook apps) to gain access to this personal information. Users’ lack of knowledge and the varying privacy policies of these apps make them further vulnerable to information leakage. However, little has been done to identify mismatches between users’ perceptions and the privacy policies of Facebook apps. We address this challenge in our work. We conducted a lab study with 31 participants, where we collected data on how they share information on Facebook, their Facebook-related security and privacy practices, and their perceptions of the privacy aspects of 65 frequently used Facebook apps in terms of data collection, sharing, and deletion. We then compared participants’ perceptions with the privacy policy of each reported app. Participants also reported their expectations about the types of information that should not be collected or shared by any Facebook app. Our analysis reveals significant mismatches between users’ privacy perceptions and reality (i.e., the privacy policies of Facebook apps); we identified over-optimism not only in users’ perceptions of information collection, but also in their self-efficacy in protecting their information on Facebook, despite their having experienced negative incidents in the past. To the best of our knowledge, this is the first study of the gap between users’ privacy perceptions around Facebook apps and the reality. The findings from this study offer directions for future research to address that gap through designing usable, effective, and personalized privacy notices that help users make informed decisions about using Facebook apps.
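To make the comparison concrete, the following is a minimal sketch of how perception-versus-policy mismatches could be tallied per privacy dimension; the dictionary fields, the yes/no coding, and the example values are hypothetical and are not taken from the study's materials or data.

```python
# Minimal sketch (not the authors' analysis code): tallying mismatches between
# a participant's perceptions and an app's coded privacy policy.
# The field names, the three dimensions, and the example data are hypothetical.

PRIVACY_DIMENSIONS = ("collection", "sharing", "deletion")

def count_mismatches(perception: dict, policy: dict) -> dict:
    """Compare a participant's yes/no perceptions with the policy coding
    for one app, per privacy dimension (1 = mismatch, 0 = match)."""
    return {
        dim: int(perception.get(dim) != policy.get(dim))
        for dim in PRIVACY_DIMENSIONS
    }

# Example: the participant believes the app does not share data, but the policy says it does.
perception = {"collection": True, "sharing": False, "deletion": True}
policy     = {"collection": True, "sharing": True,  "deletion": False}

print(count_mismatches(perception, policy))
# {'collection': 0, 'sharing': 1, 'deletion': 1}
```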
Authors:
Editors:
Furnell, Steven
Award ID(s):
1949694
Publication Date:
NSF-PAR ID:
10250311
Journal Name:
Information and Computer Security
ISSN:
2056-497X
Sponsoring Org:
National Science Foundation
More Like this
  1. The dominant privacy framework of the information age relies on notions of “notice and consent.” That is, service providers disclose, often through privacy policies, their data collection practices, and users can then consent to their terms. However, it is unlikely that most users comprehend these disclosures, which is due in no small part to ambiguous, deceptive, and misleading statements. By comparing actual collection and sharing practices to disclosures in privacy policies, we demonstrate the scope of the problem. Through analysis of 68,051 apps from the Google Play Store, their corresponding privacy policies, and observed data transmissions, we investigated the potential misrepresentations of apps in the Designed For Families (DFF) program, inconsistencies in disclosures regarding third-party data sharing, as well as contradictory disclosures about secure data transmissions. We find that of the 8,030 DFF apps (i.e., apps directed at children), 9.1% claim that their apps are not directed at children, while 30.6% claim to have no knowledge that the received data comes from children. In addition, we observe that 10.5% of the 68,051 apps share personal identifiers with third-party service providers yet do not declare any in their privacy policies, and only 22.2% of the apps explicitly name third parties. This makes it not only difficult, but in most cases impossible, for users to establish where their personal data is being processed. Furthermore, we find that 9,424 apps do not use TLS when transmitting personal identifiers, yet 28.4% of these apps claim to take measures to secure data transfer. Ultimately, these divergences between disclosures and actual app behaviors illustrate the ridiculousness of the notice and consent framework.
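As a rough illustration of the kind of disclosure check described above, the sketch below flags observed third-party recipients of personal identifiers that are never named in a privacy policy; the domains, the disclosed-party list, and the matching rule are hypothetical simplifications, not the paper's actual analysis pipeline.

```python
# Minimal sketch (hypothetical): flag third-party hosts that receive personal
# identifiers over the network but are never named in the app's privacy policy.
# The example domains and disclosed parties are invented for illustration.

observed_recipients = {"ads.examplenetwork.com", "analytics.examplesdk.io"}
disclosed_parties = {"examplenetwork"}  # parties extracted from the policy text (hypothetical)

def undisclosed(observed: set, disclosed: set) -> set:
    """Return observed recipients whose names never appear among the disclosed parties."""
    return {host for host in observed
            if not any(party in host for party in disclosed)}

print(undisclosed(observed_recipients, disclosed_parties))
# {'analytics.examplesdk.io'}  -> identifier shared, but the recipient is never disclosed
```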
  2. Abstract
STUDY QUESTION To what extent does the use of mobile computing apps to track the menstrual cycle and the fertile window influence fecundability among women trying to conceive?
SUMMARY ANSWER After adjusting for potential confounders, use of any of several different apps was associated with increased fecundability ranging from 12% to 20% per cycle of attempt.
WHAT IS KNOWN ALREADY Many women are using mobile computing apps to track their menstrual cycle and the fertile window, including while trying to conceive.
STUDY DESIGN, SIZE, DURATION The Pregnancy Study Online (PRESTO) is a North American prospective internet-based cohort of women who are aged 21–45 years, trying to conceive and not using contraception or fertility treatment at baseline.
PARTICIPANTS/MATERIALS, SETTING, METHODS We restricted the analysis to 8363 women trying to conceive for no more than 6 months at baseline; the women were recruited from June 2013 through May 2019. Women completed questionnaires at baseline and every 2 months for up to 1 year. The main outcome was fecundability, i.e. the per-cycle probability of conception, which we assessed using self-reported data on time to pregnancy (confirmed by positive home pregnancy test) in menstrual cycles. On the baseline and follow-up questionnaires, women reported whether they used mobile computing apps to track their menstrual cycles (‘cycle apps’) and, if so, which one(s). We estimated fecundability ratios (FRs) for the use of cycle apps, adjusted for female age, race/ethnicity, prior pregnancy, BMI, income, current smoking, education, partner education, caffeine intake, use of hormonal contraceptives as the last method of contraception, hours of sleep per night, cycle regularity, use of prenatal supplements, marital status, intercourse frequency and history of subfertility. We also examined the impact of concurrent use of fertility indicators: basal body temperature, cervical fluid, cervix position and/or urine LH.
MAIN RESULTS AND THE ROLE OF CHANCE Among 8363 women, 6077 (72.7%) were using one or more cycle apps at baseline. A total of 122 separate apps were reported by women. We designated five of these apps before analysis as more likely to be effective (Clue, Fertility Friend, Glow, Kindara, Ovia; hereafter referred to as ‘selected apps’). The use of any app at baseline was associated with 20% increased fecundability, with little difference between selected apps versus other apps (selected apps FR (95% CI): 1.20 (1.13, 1.28); all other apps 1.21 (1.13, 1.30)). In time-varying analyses, cycle app use was associated with 12–15% increased fecundability (selected apps FR (95% CI): 1.12 (1.04, 1.21); all other apps 1.15 (1.07, 1.24)). When apps were used at baseline with one or more fertility indicators, there was higher fecundability than without fertility indicators (selected apps with indicators FR (95% CI): 1.23 (1.14, 1.34) versus without indicators 1.17 (1.05, 1.30); other apps with indicators 1.30 (1.19, 1.43) versus without indicators 1.16 (1.06, 1.27)). In time-varying analyses, results were similar when stratified by time trying at study entry (<3 vs. 3–6 cycles) or cycle regularity. For use of the selected apps, we observed higher fecundability among women with a history of subfertility: FR 1.33 (1.05–1.67).
LIMITATIONS, REASONS FOR CAUTION Neither regularity nor intensity of app use was ascertained.
The prospective time-varying assessment of app use was based on questionnaires completed every 2 months, which would not capture more frequent changes. Intercourse frequency was also reported retrospectively and we do not have data on timing of intercourse relative to the fertile window. Although we controlled for a wide range of covariates, we cannot exclude the possibility of residual confounding (e.g. choosing to use an app in this observational study may be a marker for unmeasured health habits promoting fecundability). Half of the women in the study received a free premium subscription for one of the apps (Fertility Friend), which may have increased the overall prevalence of app use in the time-varying analyses, but would not affect app use at baseline. Most women in the study were college educated, which may limit application of results to other populations.
WIDER IMPLICATIONS OF THE FINDINGS Use of a cycle app, especially in combination with observation of one or more fertility indicators (basal body temperature, cervical fluid, cervix position and/or urine LH), may increase fecundability (per-cycle pregnancy probability) by about 12–20% for couples trying to conceive. We did not find consistent evidence of improved fecundability resulting from use of one specific app over another.
STUDY FUNDING/COMPETING INTEREST(S) This research was supported by grants R21HD072326 and R01HD086742 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, USA. In the last 3 years, Dr L.A.W. has served as a fibroid consultant for AbbVie.com. Dr L.A.W. has also received in-kind donations from Sandstone Diagnostics, Swiss Precision Diagnostics, FertilityFriend.com and Kindara.com for primary data collection and participant incentives in the PRESTO cohort. Dr J.B.S. reports personal fees from Swiss Precision Diagnostics, outside the submitted work. The remaining authors have nothing to declare.
TRIAL REGISTRATION NUMBER N/A.
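For readers unfamiliar with the outcome measure, the fecundability ratio (FR) reported above can be written as a ratio of per-cycle conception probabilities; the notation below is a generic illustration under simplified assumptions, not the exact adjusted regression model used in PRESTO.

```latex
% Fecundability ratio: ratio of per-cycle probabilities of conception.
% Generic illustration; the study's estimates are additionally adjusted
% for the covariates listed in the abstract.
\[
  \mathrm{FR} \;=\;
  \frac{\Pr(\text{conception in a cycle} \mid \text{cycle-app use})}
       {\Pr(\text{conception in a cycle} \mid \text{no cycle-app use})},
  \qquad \mathrm{FR} = 1.20 \;\Rightarrow\; \text{20\% higher per-cycle probability of conception.}
\]
```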
  3. Abstract Smartphone location sharing is a particularly sensitive type of information disclosure that has implications for users’ digital privacy and security as well as their physical safety. To understand and predict location disclosure behavior, we developed an Android app that scraped metadata from users’ phones, asked them to grant the location-sharing permission to the app, and administered a survey. We compared the effectiveness of using self-report measures commonly used in the social sciences, behavioral data collected from users’ mobile phones, and a new type of measure that we developed, representing a hybrid of self-report and behavioral data to contextualize users’ attitudes toward their past location-sharing behaviors. This new type of measure is based on a reflective learning paradigm where individuals reflect on past behavior to inform future behavior. Based on data from 380 Android smartphone users, we found that the best predictors of whether participants granted the location-sharing permission to our app were: behavioral intention to share information with apps, the “FYI” communication style, and one of our new hybrid measures asking users whether they were comfortable sharing location with apps currently installed on their smartphones. Our novel, hybrid construct of self-reflection on past behavior significantly improves predictive power and shows the importance of combining social science and computational science approaches for improving the prediction of users’ privacy behaviors. Further, when assessing the construct validity of the Behavioral Intention construct drawn from previous location-sharing research, our data showed a clear distinction between two different types of Behavioral Intention: self-reported intention to use mobile apps versus the intention to share information with these apps. This finding suggests that users desire the ability to use mobile apps without being required to share sensitive information, such as their location. These results have important implications for cybersecurity research and system design to meet users’ location-sharing privacy needs.
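A simple way to picture the prediction task is a classifier over the three strongest predictors named above; the sketch below uses synthetic data and hypothetical feature encodings, and is not the study's actual model, variables, or dataset.

```python
# Minimal sketch (hypothetical): predicting whether a user grants the
# location-sharing permission from self-report and hybrid measures.
# Feature encodings and data are synthetic illustrations only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 380  # sample size matching the study; the data itself is synthetic
X = np.column_stack([
    rng.integers(1, 8, n),   # behavioral intention to share information with apps (1-7)
    rng.integers(1, 8, n),   # "FYI" communication style score (1-7)
    rng.integers(0, 2, n),   # hybrid measure: comfortable sharing location with installed apps
])
y = (X[:, 0] + 2 * X[:, 2] + rng.normal(0, 1, n) > 5).astype(int)  # synthetic grant/deny outcome

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")
```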
  4. This paper presents the results of an interview study with twelve TikTok users to explore user awareness, perception, and experiences with the app’s algorithm in the context of privacy. The social media entertainment app TikTok collects user data to deliver individualized video feeds based on users’ engagement with presented content, a practice governed by a complex and overly long privacy policy. Our results demonstrate that participants generally have very little knowledge of the actual privacy regulations, which they justify by the benefit of receiving free, entertaining content. However, participants experienced privacy-related downsides when algorithmically catered video content increasingly adapted to their biography, interests, or location, and they in turn realized the detail of personal data that TikTok had access to. This illustrates the tradeoff users have to make between allowing TikTok to access their personal data and having favorable video consumption experiences on the app.
  5. The Deep Learning Epilepsy Detection Challenge: design, implementation, and test of a new crowd-sourced AI challenge ecosystem
Isabell Kiral*, Subhrajit Roy*, Todd Mummert*, Alan Braz*, Jason Tsay, Jianbin Tang, Umar Asif, Thomas Schaffter, Eren Mehmet, The IBM Epilepsy Consortium◊, Joseph Picone, Iyad Obeid, Bruno De Assis Marques, Stefan Maetschke, Rania Khalaf†, Michal Rosen-Zvi†, Gustavo Stolovitzky†, Mahtab Mirmomeni†, Stefan Harrer†
* These authors contributed equally to this work. † Corresponding authors: rkhalaf@us.ibm.com, rosen@il.ibm.com, gustavo@us.ibm.com, mahtabm@au1.ibm.com, sharrer@au.ibm.com. ◊ Members of the IBM Epilepsy Consortium are listed in the Acknowledgements section. J. Picone and I. Obeid are with Temple University, USA. T. Schaffter is with Sage Bionetworks, USA. E. Mehmet is with the University of Illinois at Urbana-Champaign, USA. All other authors are with IBM Research in the USA, Israel and Australia.
Introduction
This decade has seen an ever-growing number of scientific fields benefitting from the advances in machine learning technology and tooling. More recently, this trend reached the medical domain, with applications ranging from cancer diagnosis [1] to the development of brain-machine interfaces [2]. While Kaggle has pioneered the crowd-sourcing of machine learning challenges to incentivise data scientists from around the world to advance algorithm and model design, the increasing complexity of problem statements demands that participants be expert data scientists, deeply knowledgeable in at least one other scientific domain, and competent software engineers with access to large compute resources. People who match this description are few and far between, unfortunately leading to a shrinking pool of possible participants and a loss of experts dedicating their time to solving important problems. Participation is even further restricted in the context of any challenge run on confidential use cases or with sensitive data. Recently, we designed and ran a deep learning challenge to crowd-source the development of an automated labelling system for brain recordings, aiming to advance epilepsy research. A focus of this challenge, run internally in IBM, was the development of a platform that lowers the barrier of entry and therefore mitigates the risk of excluding interested parties from participating.
The challenge: enabling wide participation
With the goal to run a challenge that mobilises the largest possible pool of participants from IBM (global), we designed a use case around previous work in epileptic seizure prediction [3]. In this “Deep Learning Epilepsy Detection Challenge”, participants were asked to develop an automatic labelling system to reduce the time a clinician would need to diagnose patients with epilepsy. Labelled training and blind validation data for the challenge were generously provided by Temple University Hospital (TUH) [4]. TUH also devised a novel scoring metric for the detection of seizures that was used as the basis for algorithm evaluation [5]. In order to provide an experience with a low barrier of entry, we designed a generalisable challenge platform under the following principles: 1. No participant should need to have in-depth knowledge of the specific domain (i.e. no participant should need to be a neuroscientist or epileptologist). 2. No participant should need to be an expert data scientist. 3. No participant should need more than basic programming knowledge (i.e.
no participant should need to learn how to process fringe data formats and stream data efficiently). 4. No participant should need to provide their own computing resources. In addition to the above, our platform should further • guide participants through the entire process from sign-up to model submission, • facilitate collaboration, and • provide instant feedback to the participants through data visualisation and intermediate online leaderboards.
The platform
The architecture of the platform that was designed and developed is shown in Figure 1. The entire system consists of a number of interacting components. (1) A web portal serves as the entry point to challenge participation, providing challenge information, such as timelines and challenge rules, and scientific background. The portal also facilitated the formation of teams and provided participants with an intermediate leaderboard of submitted results and a final leaderboard at the end of the challenge. (2) IBM Watson Studio [6] is the umbrella term for a number of services offered by IBM. Upon creation of a user account through the web portal, an IBM Watson Studio account was automatically created for each participant, which allowed users access to IBM's Data Science Experience (DSX), the analytics engine Watson Machine Learning (WML), and IBM's Cloud Object Storage (COS) [7], all of which are described in more detail below. (3) The user interface and starter kit were hosted on IBM's Data Science Experience platform (DSX) and formed the main component for designing and testing models during the challenge. DSX allows for real-time collaboration on shared notebooks between team members. A starter kit in the form of a Python notebook, supporting the popular deep learning libraries TensorFlow [8] and PyTorch [9], was provided to all teams to guide them through the challenge process. Upon instantiation, the starter kit loaded the necessary Python libraries and custom functions for the invisible integration with COS and WML. In dedicated spots in the notebook, participants could write custom pre-processing code, machine learning models, and post-processing algorithms. The starter kit provided instant feedback about participants' custom routines through data visualisations. Using the notebook only, teams were able to run the code on WML, making use of a compute cluster of IBM's resources. The starter kit also enabled submission of the final code to a data storage to which only the challenge team had access. (4) Watson Machine Learning provided access to shared compute resources (GPUs). Code was bundled up automatically in the starter kit and deployed to and run on WML. WML in turn had access to shared storage from which it requested recorded data and to which it stored the participant's code and trained models. (5) IBM's Cloud Object Storage held the data for this challenge. Using the starter kit, participants could investigate their results as well as data samples in order to better design custom algorithms. (6) Utility functions were loaded into the starter kit at instantiation. This set of functions included code to pre-process data into a more common format, to optimise streaming through the use of the NutsFlow and NutsML libraries [10], and to provide seamless access to all the IBM services used. Not captured in the diagram is the final code evaluation, which was conducted in an automated way as soon as code was submitted through the starter kit, minimising the burden on the challenge organising team.
Figure 1: High-level architecture of the challenge platform
Measuring success
The competitive phase of the "Deep Learning Epilepsy Detection Challenge" ran for 6 months. Twenty-five teams, comprising 87 scientists and software engineers from 14 global locations, participated. All participants made use of the starter kit we provided and ran algorithms on IBM's WML infrastructure. Seven teams persisted until the end of the challenge and submitted final solutions. The best-performing solutions reached seizure-detection performance that would allow a hundred-fold reduction in the time epileptologists need to annotate continuous EEG recordings. Thus, we expect the developed algorithms to aid in the diagnosis of epilepsy by significantly shortening manual labelling time. Detailed results are currently in preparation for publication. Equally important to solving the scientific challenge, however, was to understand whether we managed to encourage participation from non-expert data scientists.
Figure 2: Primary occupation as reported by challenge participants
Out of the 40 participants for whom we have occupational information, 23 reported Data Science or AI as their main job description, 11 reported being a Software Engineer, and 2 people had expertise in Neuroscience. Figure 2 shows that participants had a variety of specialisations, including some that are in no way related to data science, software engineering, or neuroscience. No participant had deep knowledge and experience in all three of data science, software engineering, and neuroscience.
Conclusion
Given the growing complexity of data science problems and increasing dataset sizes, solving these problems requires collaboration between people with different expertise, with a focus on inclusiveness and a low barrier of entry. We designed, implemented, and tested a challenge platform to address exactly this. Using our platform, we ran a deep-learning challenge for epileptic seizure detection. In total, 87 IBM employees from several business units, including but not limited to IBM Research, and with a variety of skills, including sales and design, participated in this highly technical challenge.
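To give a feel for the kind of hooks the starter kit reportedly exposed (pre-processing, model, post-processing), here is a generic, self-contained sketch; the function names, the toy data, and the thresholding "model" are invented for illustration, and the kit's actual COS/WML integration is omitted.

```python
# Minimal sketch (illustrative only, not IBM's actual starter kit): the kind of
# pre-process / predict / post-process hooks described above, which participants
# could fill in. All names and the synthetic data are hypothetical.
import numpy as np

def preprocess(raw_eeg: np.ndarray) -> np.ndarray:
    """Participant-defined pre-processing, e.g. z-scoring each channel (columns)."""
    return (raw_eeg - raw_eeg.mean(axis=0)) / (raw_eeg.std(axis=0) + 1e-8)

def predict_seizure(features: np.ndarray) -> np.ndarray:
    """Participant-defined model; a trivial per-window energy threshold as a placeholder."""
    energy = (features ** 2).mean(axis=1)
    return (energy > 1.0).astype(int)

def postprocess(labels: np.ndarray, min_run: int = 3) -> np.ndarray:
    """Participant-defined post-processing, e.g. dropping detections shorter than min_run windows."""
    out = labels.copy()
    run_start = None
    for i, v in enumerate(np.append(labels, 0)):
        if v and run_start is None:
            run_start = i
        elif not v and run_start is not None:
            if i - run_start < min_run:
                out[run_start:i] = 0
            run_start = None
    return out

# Toy end-to-end run on synthetic "recordings" (20 time windows x 4 channels).
raw = np.random.default_rng(0).normal(size=(20, 4))
print(postprocess(predict_seizure(preprocess(raw))))
```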