

Title: Occupation Modularity and the Work Ecosystem
Occupations, like many other social systems, are hierarchical. They evolve with other elements within the work ecosystem, including technology and skills. This paper investigates the relationships among these elements using an approach that combines network theory and modular systems theory. A new method of using work-related data to build occupation networks and theorize occupation evolution is proposed. Using this technique, structural properties of occupations are discovered by way of community detection on a knowledge network built from labor statistics covering more than 900 occupations and 18,000 tasks. The occupation networks are compared across the work ecosystem as well as over time to understand the interdependencies between task components and the coevolution of occupations, tasks, technology, and skills. In addition, a set of conjectures is articulated based on the observations made from comparing occupation structures and their change over time.
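As a rough illustration of the kind of pipeline this abstract describes, the sketch below builds a toy occupation-task network and applies an off-the-shelf community-detection routine. The edge list, column names, and the choice of the networkx greedy modularity algorithm are assumptions for illustration, not the paper's actual data or method.

```python
# Illustrative sketch: build an occupation-task network and detect communities.
# The edge list, column names, and community-detection algorithm are assumptions.
import pandas as pd
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical O*NET-style task statements: one row per (occupation, task) pair.
edges = pd.DataFrame({
    "occupation": ["Reporters", "Reporters", "Editors", "Editors"],
    "task": ["Write news stories", "Interview sources",
             "Write news stories", "Review copy"],
})

# Bipartite occupation-task graph.
G = nx.Graph()
G.add_nodes_from(edges["occupation"].unique(), kind="occupation")
G.add_nodes_from(edges["task"].unique(), kind="task")
G.add_edges_from(edges.itertuples(index=False, name=None))

# Project onto occupations: two occupations are linked if they share a task.
occupations = [n for n, d in G.nodes(data=True) if d["kind"] == "occupation"]
occ_net = nx.bipartite.weighted_projected_graph(G, occupations)

# Community detection reveals modular structure among occupations.
communities = greedy_modularity_communities(occ_net, weight="weight")
for i, community in enumerate(communities):
    print(f"community {i}: {sorted(community)}")
```

On real data, each detected community would group occupations that share many task components, which is the structural property the abstract refers to.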
Award ID(s):
2113906 2026583 1939088 1909803 2128906
NSF-PAR ID:
10298088
Author(s) / Creator(s):
Date Published:
Journal Name:
International Conference on Information Systems
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Background

    More details about human movement patterns are needed to evaluate relationships between daily travel and malaria risk at finer scales. A multiagent mobility simulation model was built to simulate the movements of villagers between their homes and workplaces in two townships in Myanmar.

    Methods

    An agent-based model (ABM) was built to simulate daily travel to and from work based on responses to a travel survey. Key elements for the ABM were land cover, travel time, travel mode, occupation, malaria prevalence, and a detailed road network. Most visited network segments for different occupations and for malaria-positive cases were extracted and compared. Data from a separate survey were used to validate the simulation.

    Results

    Mobility characteristics for different occupation groups showed that while certain patterns were shared among some groups, there were also patterns unique to a single occupation group. Forest workers were estimated to be the most mobile occupation group and also had the highest potential malaria exposure associated with their daily travel in Ann Township. In Singu Township, forest workers were not the most mobile group; however, they were estimated to visit regions with a higher prevalence of malaria infection than those visited by other occupation groups.

    Conclusions

    Using an ABM to simulate daily travel generated mobility patterns for different occupation groups. These spatial patterns varied by occupation. Our simulation identified occupations at a higher risk of being exposed to malaria and where these exposures were more likely to occur.
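A minimal sketch of the kind of commute-and-exposure simulation described above is shown below, assuming a toy road network with a per-segment malaria prevalence. The node names, prevalence values, and occupations are hypothetical stand-ins for the study's survey data and detailed road network.

```python
# Illustrative sketch of an agent-based commute simulation: agents travel from
# home to work along shortest paths on a road network, and potential exposure is
# tallied from an assumed per-segment malaria prevalence. All values are made up.
import networkx as nx

road = nx.Graph()
road.add_edge("village", "junction", travel_time=10, prevalence=0.01)
road.add_edge("junction", "farm", travel_time=15, prevalence=0.02)
road.add_edge("junction", "forest_camp", travel_time=30, prevalence=0.08)

agents = [
    {"id": 1, "occupation": "farmer", "home": "village", "work": "farm"},
    {"id": 2, "occupation": "forest worker", "home": "village", "work": "forest_camp"},
]

segment_visits = {}          # how often each road segment is traversed
exposure_by_occupation = {}  # summed prevalence along daily round trips

for agent in agents:
    path = nx.shortest_path(road, agent["home"], agent["work"], weight="travel_time")
    for u, v in zip(path, path[1:]):
        seg = tuple(sorted((u, v)))
        segment_visits[seg] = segment_visits.get(seg, 0) + 2  # out and back
        exposure_by_occupation[agent["occupation"]] = (
            exposure_by_occupation.get(agent["occupation"], 0.0)
            + 2 * road[u][v]["prevalence"]
        )

print("most visited segments:", sorted(segment_visits.items(), key=lambda kv: -kv[1]))
print("potential exposure by occupation:", exposure_by_occupation)
```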

     
  2. As work changes, so does technology. The two coevolve as part of a work ecosystem. This paper suggests a way of plotting this coevolution by comparing the embeddings (high-dimensional vector representations) of textual descriptions of tasks, occupations, and technologies. Tight coupling between tasks and technologies, measured by the distances between vectors, is shown to be associated with high task importance. Moreover, tasks that are more prototypical of an occupation are more important. These conclusions were reached through an analysis of the 2020 data release of the Occupational Information Network (O*NET) from the U.S. Department of Labor, covering 967 occupations and 19,533 tasks. One occupation, journalism, is analyzed in depth, and conjectures are formed about the ways technologies and tasks evolve through both design and exaptation.
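The sketch below illustrates the coupling measurement this abstract describes. TF-IDF vectors stand in for the paper's learned text embeddings, cosine distance stands in for its distance measure, and the task and technology descriptions are invented for the example.

```python
# Illustrative sketch: represent task and technology descriptions as vectors and
# measure task-technology coupling as the distance between them. TF-IDF is a
# stand-in for the high-dimensional embeddings used in the paper; texts are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_distances

tasks = [
    "Write, edit, and fact-check news stories for publication",
    "Operate cameras to record interviews and events",
]
technologies = [
    "Word processing and content management software",
    "Digital video cameras and editing software",
]

vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(tasks + technologies)
task_vecs, tech_vecs = vectors[: len(tasks)], vectors[len(tasks):]

# Smaller distance = tighter task-technology coupling.
coupling = cosine_distances(task_vecs, tech_vecs)
for i, task in enumerate(tasks):
    j = coupling[i].argmin()
    print(f"{task!r} is most tightly coupled to {technologies[j]!r} "
          f"(distance {coupling[i, j]:.2f})")
```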
  3. Automation continues to be a disruptive force in the workforce. In particular, new automated technologies are projected to replace many mid-skill jobs, potentially displacing millions of workers. Career planning agencies and other organizations can help support workers if they are able to effectively identify optimal transition occupations for displaced workers. We drew upon version 24.2 of the Occupational Information Network (O*NET) Database to conduct two related studies that identify alternate occupations for truck drivers, who are at risk of job loss due to the adoption of autonomous vehicles. In Study 1, we statistically compared jobs identified through different O*NET classification-based search methods on their similarity to the knowledge, skills, values, and interests held by truck drivers. In Study 2, we conducted a survey of truck drivers to evaluate their perceptions of the occupations identified as objectively similar to their occupation. Results indicate that optimal transition occupations may be identified by searching for occupations that share skills as well as the same work activities/industry as a given occupation. These findings hold implications for career planning organizations and policymakers seeking to ease workforce disruption due to automation.
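A minimal sketch of one way to rank candidate transition occupations by skill similarity appears below. The occupations, skill names, and importance scores are hypothetical stand-ins for O*NET skill-importance data, and cosine similarity is used only as an example of a similarity measure.

```python
# Illustrative sketch: rank candidate transition occupations by the similarity of
# their skill-importance profiles to a source occupation. Occupations, skill
# names, and scores are hypothetical stand-ins for O*NET data.
import numpy as np

skills = ["Operation and Control", "Spatial Orientation",
          "Customer Service", "Equipment Maintenance"]
profiles = {
    "Heavy and Tractor-Trailer Truck Drivers": np.array([4.5, 4.0, 2.5, 3.0]),
    "Bus Drivers":                             np.array([4.2, 3.8, 3.5, 2.8]),
    "Delivery Couriers":                       np.array([3.9, 3.6, 3.2, 2.5]),
    "Customer Service Representatives":        np.array([1.5, 1.0, 4.5, 1.0]),
}

def cosine(a, b):
    """Cosine similarity between two skill-importance vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

source = "Heavy and Tractor-Trailer Truck Drivers"
candidates = {occ: cosine(profiles[source], vec)
              for occ, vec in profiles.items() if occ != source}

# Higher similarity = more plausible transition occupation under this measure.
for occ, sim in sorted(candidates.items(), key=lambda kv: -kv[1]):
    print(f"{occ}: similarity {sim:.3f}")
```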
  4. Obeid, I.; Selesnik, I.; Picone, J. (Ed.)
    The Neuronix high-performance computing cluster allows us to conduct extensive machine learning experiments on big data [1]. This heterogeneous cluster uses innovative scheduling technology, Slurm [2], that manages a network of CPUs and graphics processing units (GPUs). The GPU farm consists of a variety of processors ranging from low-end consumer grade devices such as the Nvidia GTX 970 to higher-end devices such as the GeForce RTX 2080. These GPUs are essential to our research since they allow extremely compute-intensive deep learning tasks to be executed on massive data resources such as the TUH EEG Corpus [2]. We use TensorFlow [3] as the core machine learning library for our deep learning systems, and routinely employ multiple GPUs to accelerate the training process. Reproducible results are essential to machine learning research. Reproducibility in this context means the ability to replicate an existing experiment – performance metrics such as error rates should be identical and floating-point calculations should match closely. Three examples of ways we typically expect an experiment to be replicable are: (1) The same job run on the same processor should produce the same results each time it is run. (2) A job run on a CPU and GPU should produce identical results. (3) A job should produce comparable results if the data is presented in a different order. System optimization requires an ability to directly compare error rates for algorithms evaluated under comparable operating conditions. However, it is a difficult task to exactly reproduce the results for large, complex deep learning systems that often require more than a trillion calculations per experiment [5]. This is a fairly well-known issue and one we will explore in this poster. Researchers must be able to replicate results on a specific data set to establish the integrity of an implementation. They can then use that implementation as a baseline for comparison purposes. A lack of reproducibility makes it very difficult to debug algorithms and validate changes to the system. Equally important, since many results in deep learning research are dependent on the order in which the system is exposed to the data, the specific processors used, and even the order in which those processors are accessed, it becomes a challenging problem to compare two algorithms since each system must be individually optimized for a specific data set or processor. This is extremely time-consuming for algorithm research in which a single run often taxes a computing environment to its limits. Well-known techniques such as cross-validation [5,6] can be used to mitigate these effects, but this is also computationally expensive. These issues are further compounded by the fact that most deep learning algorithms are susceptible to the way computational noise propagates through the system. GPUs are particularly notorious for this because, in a clustered environment, it becomes more difficult to control which processors are used at various points in time. Another equally frustrating issue is that upgrades to the deep learning package, such as the transition from TensorFlow v1.9 to v1.13, can also result in large fluctuations in error rates when re-running the same experiment. Since TensorFlow is constantly updating functions to support GPU use, maintaining an historical archive of experimental results that can be used to calibrate algorithm research is quite a challenge. This makes it very difficult to optimize the system or select the best configurations. 
The overall impact of the issues described above is significant, as error rates can fluctuate by as much as 25% due to these types of computational issues. Cross-validation is one technique used to mitigate this, but it is expensive since it requires multiple runs over the data, which further taxes a computing infrastructure already running at maximum capacity. GPUs are preferred when training a large network since these systems train at least two orders of magnitude faster than CPUs [7]. Large-scale experiments are simply not feasible without using GPUs. However, there is a tradeoff to gain this performance. Since all our GPUs use the NVIDIA CUDA® Deep Neural Network library (cuDNN) [8], a GPU-accelerated library of primitives for deep neural networks, an element of randomness is added to the experiment. When a GPU is used to train a network in TensorFlow, it automatically searches for a cuDNN implementation. NVIDIA's cuDNN implementation provides algorithms that increase performance and help the model train more quickly, but they are non-deterministic algorithms [9,10]. Since our networks have many complex layers, there is no easy way to avoid this randomness. Instead of comparing each epoch, we compare the average performance of the experiment because it gives us an indication of how our model is performing per experiment and whether the changes we make are effective.
In this poster, we will discuss a variety of issues related to reproducibility and introduce ways we mitigate these effects. For example, TensorFlow uses a random number generator (RNG) that is not seeded by default. TensorFlow determines the initialization point and how certain functions execute using the RNG. The solution for this is seeding all the necessary components before training the model. This forces TensorFlow to use the same initialization point and sets how certain layers work (e.g., dropout layers). However, seeding all the RNGs will not guarantee a controlled experiment. Other variables can affect the outcome of the experiment, such as training on GPUs, allowing multi-threading on CPUs, and using certain layers. To mitigate our problems with reproducibility, we first make sure that the data is processed in the same order during training. Therefore, we save the data ordering from the last experiment to make sure the newer experiment follows the same order. If we allow the data to be shuffled, it can affect the performance due to how the model was exposed to the data. We also specify the float data type to be 32-bit since Python defaults to 64-bit. We try to avoid using 64-bit precision because the numbers produced by a GPU can vary significantly depending on the GPU architecture [11-13]. Controlling precision somewhat reduces differences due to computational noise even though technically it increases the amount of computational noise.
We are currently developing more advanced techniques for preserving the efficiency of our training process while also maintaining the ability to reproduce models. In our poster presentation we will demonstrate these issues using some novel visualization tools, present several examples of the extent to which these issues influence research results on electroencephalography (EEG) and digital pathology experiments, and introduce new ways to manage such computational issues.
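A minimal sketch of the seeding and precision control described above is shown below. It is written against the TensorFlow 2.x API rather than the v1.9/v1.13 releases discussed in the poster, and the seed value is arbitrary.

```python
# Illustrative sketch: pin the random state and numeric precision before building
# a model, as described above. Uses the TensorFlow 2.x API (the poster discusses
# TensorFlow 1.x releases); the seed value is arbitrary.
import os
import random

import numpy as np
import tensorflow as tf

SEED = 1337

# Seed every RNG that can influence initialization, shuffling, and dropout.
os.environ["PYTHONHASHSEED"] = str(SEED)
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)

# Prefer 32-bit floats so CPU and GPU runs accumulate comparable rounding noise.
tf.keras.backend.set_floatx("float32")

# Recent TensorFlow releases can also request deterministic kernels, at some
# cost in speed (this option does not exist in the 1.x versions above).
if hasattr(tf.config.experimental, "enable_op_determinism"):
    tf.config.experimental.enable_op_determinism()
```

Even with all of these controls in place, cuDNN's non-deterministic kernels and multi-threaded execution can still introduce run-to-run variation, which is why the poster also compares average performance across runs.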
  5. This study aims to investigate the collaboration processes of immigrant families as they search for online information together. Immigrant English-language learning adults of lower socioeconomic status often work collaboratively with their children to search the internet. Family members rely on each other's language and digital literacy skills in this collaborative process known as online search and brokering (OSB). While previous work has identified ecological factors that impact OSB, research has not yet distilled the specific learning processes behind such collaborations.
    Design/methodology/approach: For this study, the authors adhere to the practices of a case study examination. This study's participants included parents, grandparents and children aged 10–17 years. Most adults were born in Mexico, did not have a college degree, worked in service industries and represented a lower-SES population. This study conducted two to three separate in-home family visits per family with interviews and online search tasks.
    Findings: From a case study analysis of three families, this paper explores the funds of knowledge, resilience, ecological support and challenges that children and parents face as they engage in collaborative OSB experiences. This study demonstrates how in-home computer-supported collaborative processes are often informal, social, emotional and highly relevant to solving information challenges.
    Research limitations/implications: An intergenerational OSB process is different from the collaborative online information problem-solving that happens between classroom peers or coworkers. This study's research shows how both parents and children draw on their funds of knowledge, resilience and ecological support systems when they search collaboratively, with and for their family members, to problem solve. This is a case study of three families working in collaboration with each other; it informs analytical generalizations and theory-building rather than statistical generalizations about families.
    Practical implications: Designers need to recognize that children and youth are using the same tools as adults to seek high-level critical information. This study's model suggests that if parents and children are negotiating information seeking with the same technology tools but different funds of knowledge, experience levels and skills, the presentation of information (e.g. online search results, information visualizations) needs to accommodate different levels of understanding. This study recommends that designers work closely with marginalized communities through participatory design methods to better understand how interfaces and visuals can help accommodate youth invisible work.
    Social implications: The authors have demonstrated in this study that learning and engaging in family online searching is not only vital to the development of individual and digital literacy skills; it is also a part of family learning. While community services, libraries and schools have a responsibility to support individual digital and information literacy development, this study's model highlights the need to recognize funds of knowledge, family resiliency and asset-based learning. Schools and teachers should identify and harness youth invisible work as a form of learning at home. The authors believe educators can do this by highlighting the importance of information problem solving in homes and the role of youth in their families. Libraries and community centers also play a critical role in supporting parents and adults with technical assistance (e.g. WiFi access) and information resources.
    Originality/value: This study's work indicates new conditions fostering productive joint media engagement (JME) around OSB. This study contributes a generative understanding that promotes studying and designing for JME, where family responsibility is the focus.

     