
Title: Occupation Modularity and the Work Ecosystem
Occupations, like many other social systems, are hierarchical. They evolve with other elements of the work ecosystem, including technology and skills. This paper investigates the relationships among these elements using an approach that combines network theory and modular systems theory. A new method of using work-related data to build occupation networks and theorize occupation evolution is proposed. Using this technique, structural properties of occupations are discovered by way of community detection on a knowledge network built from labor statistics covering more than 900 occupations and 18,000 tasks. The occupation networks are compared across the work ecosystem as well as over time to understand the interdependencies between task components and the coevolution of occupations, tasks, technology, and skills. In addition, a set of conjectures is articulated based on the observations made from comparing occupation structures and their change over time.
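The pipeline described above (occupations linked through shared tasks, then community detection to expose modular structure) can be sketched in miniature. The occupations and task IDs below are illustrative stand-ins, not the paper's dataset, and connected components are used here as a crude proxy for the paper's modularity-based community detection:

```python
from itertools import combinations

# Toy data standing in for O*NET-style records (not the paper's dataset):
# each occupation maps to the set of task IDs it performs.
occ_tasks = {
    "reporter":       {"t1", "t2", "t3"},
    "editor":         {"t2", "t3", "t4"},
    "data_scientist": {"t5", "t6"},
    "statistician":   {"t5", "t6", "t7"},
}

# Build the occupation network: an edge whenever two occupations share a task.
adj = {occ: set() for occ in occ_tasks}
for a, b in combinations(occ_tasks, 2):
    if occ_tasks[a] & occ_tasks[b]:
        adj[a].add(b)
        adj[b].add(a)

def components(adj):
    """Group nodes by graph traversal (a crude proxy for community detection)."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                group.add(node)
                stack.extend(adj[node] - seen)
        groups.append(group)
    return groups

groups = components(adj)
print(groups)
```

On this toy network the editorial and analytical occupations fall into separate clusters; on real labor statistics, a modularity-maximizing method would be used instead of plain connectivity.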
Award ID(s):
2113906 2026583 1939088 1909803 2128906
NSF-PAR ID:
10298088
Journal Name:
International Conference on Information Systems
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Background

    More details about human movement patterns are needed to evaluate relationships between daily travel and malaria risk at finer scales. A multiagent mobility simulation model was built to simulate the movements of villagers between home and their workplaces in 2 townships in Myanmar.

    Methods

    An agent-based model (ABM) was built to simulate daily travel to and from work based on responses to a travel survey. Key elements for the ABM were land cover, travel time, travel mode, occupation, malaria prevalence, and a detailed road network. Most visited network segments for different occupations and for malaria-positive cases were extracted and compared. Data from a separate survey were used to validate the simulation.

    Results

    Mobility characteristics for different occupation groups showed that while certain patterns were shared among some groups, others were unique to a single occupation group. Forest workers were estimated to be the most mobile occupation group and also had the highest potential malaria exposure associated with their daily travel in Ann Township. In Singu Township, forest workers were not the most mobile group; however, they were estimated to visit regions with a higher prevalence of malaria infection than those visited by other occupation groups.

    Conclusions

    Using an ABM to simulate daily travel generated mobility patterns for different occupation groups. These spatial patterns varied by occupation. Our simulation identified occupations at a higher risk of being exposed to malaria and where these exposures were more likely to occur.

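The exposure estimate in the abstract above can be sketched as follows. The road segments, prevalence values, and occupation routes are invented for illustration; they are not the Myanmar survey data, and the real ABM simulates individual agents on a detailed road network rather than summing fixed routes:

```python
# Toy sketch: potential malaria exposure accumulated along a daily commute.
# Segment prevalence values and routes are illustrative assumptions.
prevalence = {"village": 0.01, "farm_road": 0.05, "forest_track": 0.20}

routes = {
    "forest_worker": ["village", "farm_road", "forest_track"],
    "farmer":        ["village", "farm_road"],
    "shopkeeper":    ["village"],
}

def daily_exposure(occupation):
    # Sum segment prevalence over the round trip (home -> work -> home).
    route = routes[occupation]
    return sum(prevalence[seg] for seg in route + route[::-1])

exposure = {occ: round(daily_exposure(occ), 2) for occ in routes}
print(exposure)
```

Even this crude accounting reproduces the abstract's qualitative finding: occupations whose routes cross high-prevalence segments (here, forest workers) accumulate the most potential exposure.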
  2. As work changes, so does technology. The two coevolve as part of a work ecosystem. This paper suggests a way of plotting this coevolution by comparing the embeddings (high-dimensional vector representations) of textual descriptions of tasks, occupations, and technologies. Tight coupling between tasks and technologies, measured by the distances between vectors, is shown to be associated with high task importance. Moreover, tasks that are more prototypical of an occupation are more important. These conclusions were reached through an analysis of the 2020 data release of the Occupational Information Network (O*NET) from the U.S. Department of Labor, covering 967 occupations and 19,533 tasks. One occupation, journalism, is analyzed in depth, and conjectures are formed about the ways technologies and tasks evolve through both design and exaptation.
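The coupling measure described above can be illustrated with cosine distance between embedding vectors. The three-dimensional vectors below are toy stand-ins (real text embeddings have hundreds of dimensions and would come from a language model, which the abstract does not specify):

```python
import math

def cosine_distance(u, v):
    # 1 - cosine similarity: small distance = tight task-technology coupling.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

task_vec   = [0.9, 0.1, 0.3]    # e.g., a "write news stories" task (toy values)
tech_close = [0.8, 0.2, 0.25]   # e.g., word processing software (toy values)
tech_far   = [0.05, 0.9, 0.1]   # e.g., an unrelated technology (toy values)

# The related technology sits much closer to the task in embedding space.
print(cosine_distance(task_vec, tech_close) < cosine_distance(task_vec, tech_far))
```

Under the paper's finding, the tightly coupled pair (small distance) would be associated with higher task importance.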
  3. Obeid, I.; Selesnick, I.; Picone, J. (Eds.)
    The Neuronix high-performance computing cluster allows us to conduct extensive machine learning experiments on big data [1]. This heterogeneous cluster uses innovative scheduling technology, Slurm [2], that manages a network of CPUs and graphics processing units (GPUs). The GPU farm consists of a variety of processors ranging from low-end consumer-grade devices such as the Nvidia GTX 970 to higher-end devices such as the GeForce RTX 2080. These GPUs are essential to our research since they allow extremely compute-intensive deep learning tasks to be executed on massive data resources such as the TUH EEG Corpus [2]. We use TensorFlow [3] as the core machine learning library for our deep learning systems, and routinely employ multiple GPUs to accelerate the training process.

    Reproducible results are essential to machine learning research. Reproducibility in this context means the ability to replicate an existing experiment: performance metrics such as error rates should be identical, and floating-point calculations should match closely. Three examples of ways we typically expect an experiment to be replicable are: (1) the same job run on the same processor should produce the same results each time it is run; (2) a job run on a CPU and a GPU should produce identical results; (3) a job should produce comparable results if the data is presented in a different order.

    System optimization requires an ability to directly compare error rates for algorithms evaluated under comparable operating conditions. However, it is difficult to exactly reproduce the results for large, complex deep learning systems that often require more than a trillion calculations per experiment [5]. This is a fairly well-known issue and one we will explore in this poster. Researchers must be able to replicate results on a specific data set to establish the integrity of an implementation. They can then use that implementation as a baseline for comparison purposes.
A lack of reproducibility makes it very difficult to debug algorithms and validate changes to the system. Equally important, since many results in deep learning research depend on the order in which the system is exposed to the data, the specific processors used, and even the order in which those processors are accessed, comparing two algorithms becomes a challenging problem, since each system must be individually optimized for a specific data set or processor. This is extremely time-consuming for algorithm research in which a single run often taxes a computing environment to its limits. Well-known techniques such as cross-validation [5,6] can be used to mitigate these effects, but this is also computationally expensive, since it requires multiple runs over the data and further taxes a computing infrastructure already running at capacity. These issues are compounded by the fact that most deep learning algorithms are susceptible to the way computational noise propagates through the system. GPUs are particularly notorious for this because, in a clustered environment, it becomes more difficult to control which processors are used at various points in time. Another equally frustrating issue is that upgrades to the deep learning package, such as the transition from TensorFlow v1.9 to v1.13, can result in large fluctuations in error rates when re-running the same experiment. Since TensorFlow is constantly updating functions to support GPU use, maintaining a historical archive of experimental results that can be used to calibrate algorithm research is quite a challenge. This makes it very difficult to optimize the system or select the best configurations. The overall impact of these issues is significant, as error rates can fluctuate by as much as 25% due to such computational effects.
GPUs are preferred when training a large network since these systems train at least two orders of magnitude faster than CPUs [7]. Large-scale experiments are simply not feasible without using GPUs. However, there is a tradeoff to gain this performance. Since all our GPUs use the NVIDIA CUDA® Deep Neural Network library (cuDNN) [8], a GPU-accelerated library of primitives for deep neural networks, an element of randomness is added to the experiment. When a GPU is used to train a network in TensorFlow, it automatically searches for a cuDNN implementation. NVIDIA's cuDNN implementation provides algorithms that increase performance and help the model train faster, but they are non-deterministic algorithms [9,10]. Since our networks have many complex layers, there is no easy way to avoid this randomness. Instead of comparing each epoch, we compare the average performance of the experiment, because it indicates how our model performs per experiment and whether the changes we make are effective. In this poster, we will discuss a variety of issues related to reproducibility and introduce ways we mitigate these effects. For example, TensorFlow uses a random number generator (RNG) which is not seeded by default. TensorFlow determines the initialization point and how certain functions execute using the RNG. The solution for this is to seed all the necessary components before training the model. This forces TensorFlow to use the same initialization point and sets how certain layers work (e.g., dropout layers). However, seeding all the RNGs will not guarantee a controlled experiment. Other variables can affect the outcome of the experiment, such as training using GPUs, allowing multi-threading on CPUs, using certain layers, etc. To mitigate our problems with reproducibility, we first make sure that the data is processed in the same order during training.
Therefore, we save the data ordering from the last experiment to make sure the newer experiment follows the same order. If we allow the data to be shuffled, it can affect performance due to how the model was exposed to the data. We also specify the float data type to be 32-bit, since Python defaults to 64-bit. We try to avoid using 64-bit precision because the numbers produced by a GPU can vary significantly depending on the GPU architecture [11-13]. Controlling precision somewhat reduces differences due to computational noise, even though technically it increases the amount of computational noise. We are currently developing more advanced techniques for preserving the efficiency of our training process while also maintaining the ability to reproduce models. In our poster presentation we will demonstrate these issues using some novel visualization tools, present several examples of the extent to which these issues influence research results on electroencephalography (EEG) and digital pathology experiments, and introduce new ways to manage such computational issues.
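The seeding strategy the poster describes can be illustrated with Python's standard-library RNG; the poster applies the same principle to TensorFlow's global and op-level seeds. The `run` function and its toy data are our own illustration, not the poster's code:

```python
import random

def run(seed):
    # Seeded RNG: the same seed reproduces the same stream of draws.
    rng = random.Random(seed)
    data = list(range(10))
    rng.shuffle(data)                         # order the model sees the data
    init = [rng.random() for _ in range(3)]   # stand-in "initialization point"
    return data, init

# Same seed -> identical data order and initialization across runs;
# different seeds generally produce different runs.
print(run(42) == run(42))
```

As the abstract notes, seeding alone does not yield a fully controlled experiment on GPUs, since cuDNN's non-deterministic kernels and thread scheduling introduce variation outside the RNG's control.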
  4. Computer labs are commonly used in computing education to help students reinforce the knowledge obtained in classrooms and to gain hands-on experience on specific learning subjects. While traditional computer labs are based on physical computer centers on campus, more and more virtual computer lab systems (see, e.g., [1, 2, 3, 4]) have been developed that allow students to carry out labs on virtualized resources remotely through the internet. Virtual computer labs make it possible for students to use their own computers at home, instead of relying on computer centers on campus, to work on lab assignments. However, they also make it difficult for students to collaborate, because students work remotely and there is a lack of support for sharing and collaboration. This is in contrast to traditional computer labs, where students naturally feel the presence of their peers in a physical lab room and can easily work together and help each other if needed. Funded by NSF's Division of Undergraduate Education, this project develops a collaborative virtual computer lab (CVCL) environment to support collaborative learning in virtual computer labs. The CVCL environment leverages existing open-source collaboration tools and desktop-sharing technologies and adds new functions unique to virtual computer labs to make it easy for students to collaborate while working on computer labs remotely. It also implements several collaborative lab models to support different forms of collaboration in both formal and informal settings. We have developed the main functions of the CVCL environment and begun to use it in classes in the Computer Science (CS) department at Georgia State University. While the original project focuses on computer labs in the traditional sense, the issue of lack of collaboration applies to much broader learning settings where students work on tasks or assignments on computers, with or without being associated with a lab environment.
Due to the high mobility of students on modern campuses and the fact that many learning activities are carried out over the Internet, computer-based learning increasingly happens in students' personal spaces (e.g., homes, apartments), as opposed to public learning spaces (e.g., laboratories, libraries). In these personal spaces, it is difficult for students to get help from classmates or teaching assistants (TAs) when encountering problems. As a result, collaborative learning is difficult and rare. This is especially true for urban universities such as Georgia State University, where a significant portion of students are part-time students and/or commute. To address this issue, we intend to broaden the concept of "virtual computer lab" to include general computer-based learning happening in "virtual space," which is any location where people can meet using networked digital devices [5]. Virtual space is recognized as an increasingly important part of "learning spaces" and asks for support from both the technology aspect and the learning theory aspect [5]. Collaborative learning environments that support remote collaboration in virtual computer labs would fill an important need in this broader trend.
  5. Beavers have established themselves as a key component of low arctic ecosystems over the past several decades. Beavers are widely recognized as ecosystem engineers, but their effects on permafrost-dominated landscapes in the Arctic remain unclear. In this study, we document the occurrence, reconstruct the timing, and highlight the effects of beaver activity on a small creek valley confined by ice-rich permafrost on the Seward Peninsula, Alaska, using multi-dimensional remote sensing analysis of satellite (Landsat-8, Sentinel-2, Planet CubeSat, and DigitalGlobe Inc./MAXAR) and unmanned aircraft systems (UAS) imagery. Beaver activity along the study reach of Swan Lake Creek appeared between 2006 and 2011 with the construction of three dams. Between 2011 and 2017, beaver dam numbers increased, with the peak occurring in 2017 (n = 9). Between 2017 and 2019, the number of dams decreased (n = 6), while the average length of the dams increased from 20 to 33 m. Between 4 and 20 August 2019, following a nine-day period of record rainfall (>125 mm), the well-established dam system failed, triggering the formation of a beaver-induced permafrost degradation feature. During the decade of beaver occupation between 2011 and 2021, the creek valley widened from 33 to 180 m (~450% increase) and the length of the stream channel network increased from ~0.6 km to more than 1.9 km (220% increase) as a result of beaver engineering and beaver-induced permafrost degradation. Comparing vegetation (NDVI) and snow (NDSI) indices derived from Sentinel-2 time-series data acquired between 2017 and 2021 for the beaver-induced permafrost degradation feature and a nearby unaffected control site showed that the disturbance lowered peak growing-season NDVI by 23% and extended the snow-cover period by 19 days.
Our analysis of multi-dimensional remote sensing data highlights several unique aspects of beaver engineering impacts on ice-rich permafrost landscapes. Our detailed reconstruction of the beaver-induced permafrost degradation event may also prove useful for identifying degradation of ice-rich permafrost in optical time-series datasets across regional scales. Future field- and remote sensing-based observations of this site, and others like it, will provide valuable information for the NSF-funded Arctic Beaver Observation Network (A-BON) and the third phase of the NASA Arctic-Boreal Vulnerability Experiment (ABoVE) Field Campaign.
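The NDVI and NDSI indices mentioned in the abstract above follow standard normalized-difference formulas; for Sentinel-2 these typically use bands B8/B4 (NIR/red) and B3/B11 (green/SWIR). The reflectance values below are illustrative, not measurements from the study site:

```python
# NDVI = (NIR - Red) / (NIR + Red)       -- vegetation greenness
# NDSI = (Green - SWIR) / (Green + SWIR) -- snow cover
def ndvi(nir, red):
    return (nir - red) / (nir + red)

def ndsi(green, swir):
    return (green - swir) / (green + swir)

# Dense vegetation reflects strongly in NIR -> high NDVI (toy reflectances).
print(round(ndvi(nir=0.45, red=0.05), 2))    # 0.8
# Snow is bright in green but dark in SWIR -> high NDSI (toy reflectances).
print(round(ndsi(green=0.60, swir=0.10), 2))  # 0.71
```

Both indices range from -1 to 1, which is what makes per-pixel time-series comparisons like the study's disturbed-versus-control contrast straightforward.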