Title: Occupation Modularity and the Work Ecosystem
Occupations, like many other social systems, are hierarchical. They evolve with other elements within the work ecosystem including technology and skills. This paper investigates the relationships among these elements using an approach that combines network theory and modular systems theory. A new method of using work-related data to build occupation networks and theorize occupation evolution is proposed. Using this technique, structural properties of occupations are discovered by way of community detection on a knowledge network built from labor statistics, based on more than 900 occupations and 18,000 tasks. The occupation networks are compared across the work ecosystem as well as over time to understand the interdependencies between task components and the coevolution of occupation, tasks, technology, and skills. In addition, a set of conjectures is articulated based on the observations made from occupation structure comparison and change over time.
Award ID(s):
2113906 2026583 1939088 1909803 2128906
Publication Date:
Journal Name:
International Conference on Information Systems
Sponsoring Org:
National Science Foundation
More Like this
  1. Background: More details about human movement patterns are needed to evaluate relationships between daily travel and malaria risk at finer scales. A multiagent mobility simulation model was built to simulate the movements of villagers between home and their workplaces in 2 townships in Myanmar. Methods: An agent-based model (ABM) was built to simulate daily travel to and from work based on responses to a travel survey. Key elements for the ABM were land cover, travel time, travel mode, occupation, malaria prevalence, and a detailed road network. Most visited network segments for different occupations and for malaria-positive cases were extracted and compared. Data from a separate survey were used to validate the simulation. Results: Mobility characteristics for different occupation groups showed that while certain patterns were shared among some groups, there were also patterns that were unique to an occupation group. Forest workers were estimated to be the most mobile occupation group, and also had the highest potential malaria exposure associated with their daily travel in Ann Township. In Singu Township, forest workers were not the most mobile group; however, they were estimated to visit regions that had higher prevalence of malaria infection over other occupation groups. Conclusions: Using an ABM to simulate daily travel generated mobility patterns for different occupation groups. These spatial patterns varied by occupation. Our simulation identified occupations at a higher risk of being exposed to malaria and where these exposures were more likely to occur.
  2. As work changes, so does technology. The two coevolve as part of a work ecosystem. This paper suggests a way of plotting this coevolution by comparing the embeddings (high-dimensional vector representations) of textual descriptions of tasks, occupations and technologies. Tight coupling between tasks and technologies, measured by the distances between vectors, is shown to be associated with high task importance. Moreover, tasks that are more prototypical in an occupation are more important. These conclusions were reached through an analysis of the 2020 data release of The Occupational Information Network (O*NET) from the U.S. Department of Labor on 967 occupations and 19,533 tasks. One occupation, journalism, is analyzed in depth, and conjectures are formed related to the ways technologies and tasks evolve through both design and exaptation.
  3. Obeid, I. ; Selesnik, I. ; Picone, J. (Ed.)
    The Neuronix high-performance computing cluster allows us to conduct extensive machine learning experiments on big data [1]. This heterogeneous cluster uses innovative scheduling technology, Slurm [2], that manages a network of CPUs and graphics processing units (GPUs). The GPU farm consists of a variety of processors ranging from low-end consumer grade devices such as the Nvidia GTX 970 to higher-end devices such as the GeForce RTX 2080. These GPUs are essential to our research since they allow extremely compute-intensive deep learning tasks to be executed on massive data resources such as the TUH EEG Corpus [2]. We use TensorFlow [3] as the core machine learning library for our deep learning systems, and routinely employ multiple GPUs to accelerate the training process. Reproducible results are essential to machine learning research. Reproducibility in this context means the ability to replicate an existing experiment – performance metrics such as error rates should be identical and floating-point calculations should match closely. Three examples of ways we typically expect an experiment to be replicable are: (1) The same job run on the same processor should produce the same results each time it is run. (2) A job run on a CPU and GPU should produce identical results. (3) A job should produce comparable results if the data is presented in a different order. System optimization requires an ability to directly compare error rates for algorithms evaluated under comparable operating conditions. However, it is a difficult task to exactly reproduce the results for large, complex deep learning systems that often require more than a trillion calculations per experiment [5]. This is a fairly well-known issue and one we will explore in this poster. Researchers must be able to replicate results on a specific data set to establish the integrity of an implementation. They can then use that implementation as a baseline for comparison purposes. 
A lack of reproducibility makes it very difficult to debug algorithms and validate changes to the system. Equally important, since many results in deep learning research are dependent on the order in which the system is exposed to the data, the specific processors used, and even the order in which those processors are accessed, it becomes a challenging problem to compare two algorithms since each system must be individually optimized for a specific data set or processor. This is extremely time-consuming for algorithm research in which a single run often taxes a computing environment to its limits. Well-known techniques such as cross-validation [5,6] can be used to mitigate these effects, but this is also computationally expensive. These issues are further compounded by the fact that most deep learning algorithms are susceptible to the way computational noise propagates through the system. GPUs are particularly notorious for this because, in a clustered environment, it becomes more difficult to control which processors are used at various points in time. Another equally frustrating issue is that upgrades to the deep learning package, such as the transition from TensorFlow v1.9 to v1.13, can also result in large fluctuations in error rates when re-running the same experiment. Since TensorFlow is constantly updating functions to support GPU use, maintaining an historical archive of experimental results that can be used to calibrate algorithm research is quite a challenge. This makes it very difficult to optimize the system or select the best configurations. The overall impact of all of these issues described above is significant as error rates can fluctuate by as much as 25% due to these types of computational issues. Cross-validation is one technique used to mitigate this, but that is expensive since you need to do multiple runs over the data, which further taxes a computing infrastructure already running at max capacity. 
GPUs are preferred when training a large network since these systems train at least two orders of magnitude faster than CPUs [7]. Large-scale experiments are simply not feasible without using GPUs. However, there is a tradeoff to gain this performance. Since all our GPUs use the NVIDIA CUDA® Deep Neural Network library (cuDNN) [8], a GPU-accelerated library of primitives for deep neural networks, it adds an element of randomness into the experiment. When a GPU is used to train a network in TensorFlow, it automatically searches for a cuDNN implementation. NVIDIA’s cuDNN implementation provides algorithms that increase the performance and help the model train quicker, but they are non-deterministic algorithms [9,10]. Since our networks have many complex layers, there is no easy way to avoid this randomness. Instead of comparing each epoch, we compare the average performance of the experiment because it gives us a hint of how our model is performing per experiment, and if the changes we make are efficient. In this poster, we will discuss a variety of issues related to reproducibility and introduce ways we mitigate these effects. For example, TensorFlow uses a random number generator (RNG) which is not seeded by default. TensorFlow determines the initialization point and how certain functions execute using the RNG. The solution for this is seeding all the necessary components before training the model. This forces TensorFlow to use the same initialization point and sets how certain layers work (e.g., dropout layers). However, seeding all the RNGs will not guarantee a controlled experiment. Other variables can affect the outcome of the experiment such as training using GPUs, allowing multi-threading on CPUs, using certain layers, etc. To mitigate our problems with reproducibility, we first make sure that the data is processed in the same order during training. 
Therefore, we save the data order from the last experiment to make sure the newer experiment follows the same order. If we allow the data to be shuffled, it can affect the performance due to how the model was exposed to the data. We also specify the float data type to be 32-bit since Python defaults to 64-bit. We try to avoid using 64-bit precision because the numbers produced by a GPU can vary significantly depending on the GPU architecture [11-13]. Controlling precision somewhat reduces differences due to computational noise even though technically it increases the amount of computational noise. We are currently developing more advanced techniques for preserving the efficiency of our training process while also maintaining the ability to reproduce models. In our poster presentation we will demonstrate these issues using some novel visualization tools, present several examples of the extent to which these issues influence research results on electroencephalography (EEG) and digital pathology experiments, and introduce new ways to manage such computational issues.
  4. Computer labs are commonly used in computing education to help students reinforce the knowledge obtained in classrooms and to gain hands-on experience on specific learning subjects. While traditional computer labs are based on physical computer centers on campus, more and more virtual computer lab systems (see, e.g., [1, 2, 3, 4]) have been developed that allow students to carry out labs on virtualized resources remotely through the internet. Virtual computer labs make it possible for students to use their own computers at home, instead of relying on computer centers on campus to work on lab assignments. However, they also make it difficult for students to collaborate, due to the fact that students work remotely and there is a lack of support for sharing and collaboration. This is in contrast to traditional computer labs where students naturally feel the presence of their peers in a physical lab room and can easily work together and help each other if needed. Funded by NSF’s Division of Undergraduate Education, this project develops a collaborative virtual computer lab (CVCL) environment to support collaborative learning in virtual computer labs. The CVCL environment leverages existing open source collaboration tools and desktop sharing technologies and adds new functions unique to virtual computer labs to make it easy for students to collaborate while working on computer labs remotely. It also implements several collaborative lab models to support different forms of collaboration in both formal and informal settings. We have developed the main functions of the CVCL environment and begun to use it in classes in the Computer Science (CS) department at Georgia State University. While the original project focuses on computer labs in its traditional sense, the issue of lack of collaboration applies to much broader learning settings where students work on tasks or assignments on computers, with or without being associated with a lab environment. 
Due to the high mobility of students in modern campuses and the fact that many learning activities are carried out over the Internet, computer-based learning increasingly happens in students’ personal spaces (e.g., homes, apartments), as opposed to public learning spaces (e.g., laboratories, libraries). In these personal spaces, it is difficult for students to get help from classmates or teaching assistants (TAs) when encountering problems. As a result, collaborative learning is difficult and rare. This is especially true for urban universities such as Georgia State University where a significant portion of students are part-time students and/or commute. To address this issue, we intend to broaden the concept of “virtual computer lab” to include general computer-based learning happening in “virtual space,” which is any location where people can meet using networked digital devices [5]. Virtual space is recognized as an increasingly important part of “learning spaces” and asks for support from both the technology aspect and learning theory aspect [5]. Collaborative learning environments that support remote collaboration in virtual computer labs would fill an important need in this broader trend.
  5. We measure the labor-demand effects of two simultaneous forms of technological change: automation of production processes and consolidation of parts. We collect detailed shop-floor data from four semiconductor firms with different levels of automation and consolidation. Using the O*NET survey instrument, we collect novel task data for operator laborers that contains process-step level skill requirements, including operations and control, near vision, and dexterity requirements. We then use an engineering process model to separate the effects of the distinct technological changes on these process tasks and operator skill requirements. Within an occupation, we show that aggregate measures of technological change can mask the opposing skill biases of multiple simultaneous technological changes. In our empirical context, automation polarizes skill demand as routine, codifiable tasks requiring low and medium skills are executed by machines instead of humans, whereas the remaining and newly created human tasks tend to require low and high skills. Consolidation converges skill demand as formerly divisible low and high skill tasks are transformed into a single indivisible task with medium skill requirements and higher cost of failure. We conclude by developing a new theory for how the separability of tasks mediates the effect of technology change on skill demand by changing the divisibility of labor.
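
Item 3 in the list above describes two mitigation steps for reproducibility: seeding every random number generator before training, and presenting the data in the same order on each run. The following is a minimal sketch of those two steps in Python, not the authors' code. The helper names `seed_everything` and `fixed_order` are hypothetical. NumPy and TensorFlow are treated as optional, and `tf.random.set_seed` is the TensorFlow 2.x call; the 1.x releases named in that abstract exposed seeding under a different name. As the abstract itself notes, seeding alone does not make GPU runs deterministic.

```python
import os
import random


def seed_everything(seed: int) -> None:
    """Seed the RNGs this sketch knows about (hypothetical helper)."""
    # Setting PYTHONHASHSEED at runtime only affects interpreters started
    # afterwards (e.g. worker subprocesses), not the current process.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)

    # NumPy is optional here; seed its global generator if it is installed.
    try:
        import numpy as np
    except ImportError:
        np = None
    if np is not None:
        np.random.seed(seed)

    # TensorFlow is optional too. tf.random.set_seed is the TF 2.x API;
    # the hasattr guard skips older releases that lack it.
    try:
        import tensorflow as tf
    except ImportError:
        tf = None
    if tf is not None and hasattr(tf.random, "set_seed"):
        tf.random.set_seed(seed)


def fixed_order(items, seed: int) -> list:
    """Return a shuffled copy whose order depends only on the seed."""
    rng = random.Random(seed)  # private generator, independent of global state
    order = list(items)
    rng.shuffle(order)
    return order


if __name__ == "__main__":
    seed_everything(1234)
    # The same seed yields the same presentation order on every run.
    print(fixed_order(range(8), seed=1234))
```

Using a private `random.Random` instance for the data order keeps it stable even if other code draws from the global generator between runs.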