NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).


A data science and machine learning approach to continuous analysis of Shakespeare's plays

https://doi.org/10.46298/jdmdh.10829

Swisher, Charles; Shamir, Lior (July 2023, Journal of Data Mining & Digital Humanities)

The availability of quantitative text analysis methods has provided new ways of analyzing literature in a manner that was not available in the pre-information era. Here we apply comprehensive machine learning analysis to the work of William Shakespeare. The analysis shows clear changes in the style of writing over time, with the most significant changes in sentence length, the frequency of adjectives and adverbs, and the sentiments expressed in the text. Applying machine learning to make a stylometric prediction of the year of each play yields a Pearson correlation of 0.71 between the actual and predicted years, indicating that Shakespeare's writing style, as reflected by the quantitative measurements, changed over time. The analysis also shows that the stylometrics of some of the plays are more similar to those of plays written either before or after the year in which they were written. For instance, Romeo and Juliet is dated 1596, but is more similar in stylometrics to plays written by Shakespeare after 1600. The source code for the analysis is available for free download.
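The 0.71 figure above is a Pearson correlation between actual and predicted years. A minimal sketch of how that statistic is computed, assuming the per-play predicted years are already available; the `pearson_r` name and the years below are hypothetical placeholders, not the study's data or its prediction model.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Covariance and variances, all without the common 1/n factor (it cancels).
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical placeholder years, NOT values from the study.
actual = [1590, 1594, 1596, 1600, 1604, 1608]
predicted = [1593, 1592, 1603, 1599, 1602, 1609]
print(round(pearson_r(actual, predicted), 3))
```

A value of 1 would mean the predicted years rise in perfect linear step with the actual years; a play whose predicted year sits far from its actual year (as described for Romeo and Juliet) pulls the coefficient down.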
Full Text Available
Reanalysis of the Spin Direction Distribution of Galaxy Zoo SDSS Spiral Galaxies

https://doi.org/10.1155/2023/4114004

Mcadam, Darius; Shamir, Lior (February 2023, Advances in Astronomy)
Gaite, Jose (Ed.)
The distribution of the spin directions of spiral galaxies in the Sloan Digital Sky Survey has been a topic of debate in the past two decades, with conflicting conclusions reported even in cases where the same data were used. Here, we follow one of the previous experiments by applying the SpArcFiRe algorithm to annotate the spin directions in an original dataset of Galaxy Zoo 1. The annotation of the galaxy spin directions is carried out after a first step of selecting the spiral galaxies in three different ways: manually, using Galaxy Zoo classifications; by a model-driven computer analysis; and with no selection of spiral galaxies. The results show that when spiral galaxies are selected by Galaxy Zoo volunteers, the distribution of their spin directions as determined by SpArcFiRe is not random, which agrees with previous reports. When selecting the spiral galaxies using a model-driven computer analysis or without selecting the spiral galaxies at all, the distribution is also not random. A simple binomial distribution analysis shows that the probability of the parity violation occurring by chance is lower than 0.01. Fitting the spin directions as observed from the Earth to cosine dependence exhibits a dipole axis with a statistical strength of 2.33σ to 3.97σ. Regardless of the selection mechanism and the analysis method, all experiments lead to similar conclusions. These results are aligned with previous reports using other methods and telescopes, suggesting that the spin directions of spiral galaxies as observed from the Earth form a dipole axis. Possible explanations can be related to the large-scale structure of the universe or to the internal structure of galaxies. The catalogs of annotated galaxies generated as part of this study are available.
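The "simple binomial distribution analysis" mentioned above can be sketched as follows, assuming that under the null hypothesis each galaxy is independently equally likely to spin clockwise or counterclockwise. The function names, the one-tailed convention, and the counts in the usage line are assumptions for illustration, not the authors' code or data.

```python
from math import comb


def binomial_tail_half(n, k):
    """Exact P(X >= k) for X ~ Binomial(n, 0.5)."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    c = comb(n, k)  # C(n, k)
    total = 0
    for i in range(k, n + 1):
        total += c
        c = c * (n - i) // (i + 1)  # C(n, i + 1) from C(n, i), exact in integers
    return total / 2 ** n  # big-integer ratio, rounded once to a float


def parity_p_value(n_cw, n_ccw):
    """One-tailed chance of the majority spin direction appearing at least as
    often as observed if both directions are equally likely (hypothetical helper)."""
    return binomial_tail_half(n_cw + n_ccw, max(n_cw, n_ccw))


# Hypothetical counts for illustration only, not the paper's numbers.
print(parity_p_value(5200, 4800))
```

Working with exact integers avoids floating-point underflow in the individual terms for catalogs of tens of thousands of galaxies; doubling the result gives the two-tailed variant if the direction of the asymmetry is not fixed in advance.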
Full Text Available
Neural Network Bias in Analysis of Galaxy Photometry Data

https://doi.org/10.1109/eScience55777.2022.00061

Goddard, Hunter; Shamir, Lior (October 2022, 18th IEEE International Conference on eScience)

Full Text Available
Analysis of ∼10^6 Spiral Galaxies from Four Telescopes Shows Large-Scale Patterns of Asymmetry in Galaxy Spin Directions

https://doi.org/10.1155/2022/8462363

Shamir, Lior (April 2022, Advances in Astronomy)
Frey, Sandor (Ed.)
The ability to collect unprecedented amounts of astronomical data has enabled the study of scientific questions that were impractical to study in the pre-information era. This study uses large datasets collected by four different robotic telescopes to profile the large-scale distribution of the spin directions of spiral galaxies. These datasets cover the Northern and Southern hemispheres, in addition to data acquired from space by the Hubble Space Telescope. The data were annotated automatically by a fully symmetric algorithm, as well as manually through a long, labor-intensive process, leading to a dataset of nearly 10^6 galaxies. The data show possible patterns of asymmetric distribution of the spin directions, and the patterns agree between the different telescopes. The profiles also agree when using automatic or manual annotation of the galaxies, showing very similar large-scale patterns. Combining all data from all telescopes allows the most comprehensive analysis of its kind to date in terms of both the number of galaxies and the footprint size. The results show a statistically significant profile that is consistent across all telescopes. The instruments used in this study are DECam, HST, SDSS, and Pan-STARRS. The paper also discusses possible sources of bias and analyzes the design of previous work that showed different results. Further research will be required to understand and validate these preliminary observations.
Full Text Available
Using Machine Learning to Profile Asymmetry between Spiral Galaxies with Opposite Spin Directions

https://doi.org/10.3390/sym14050934

Shamir, Lior (April 2022, Symmetry)

Spiral galaxies can spin clockwise or counterclockwise, and the spin direction of a spiral galaxy is a clear visual characteristic. Since the Universe is expected to be symmetric at a sufficiently large scale, the spin direction of a galaxy is merely the perception of the observer, and therefore galaxies that spin clockwise are expected to have the same characteristics as galaxies that spin counterclockwise. Here, machine learning is applied to study the possible morphological differences between galaxies that spin in opposite directions. The primary dataset used in this study consists of 77,840 spiral galaxies classified by their spin direction, supplemented by a smaller dataset of galaxies classified manually. A machine learning algorithm was applied to classify between images of clockwise galaxies and counterclockwise galaxies. The results show that the classifier was able to predict the spin direction of a galaxy from its image with accuracy higher than mere chance, even when the images in one of the classes were mirrored to create a dataset with consistent spin directions. This suggests that galaxies that appear to spin clockwise to an Earth-based observer are not necessarily fully symmetric to galaxies that spin counterclockwise; while further research is required, these results are aligned with previous observations of differences between galaxies based on their spin directions.
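The mirroring control described above can be sketched as a dataset-construction step. This is a minimal illustration, assuming images are stored as lists of pixel rows and that a left-right flip is used to reverse the apparent spin; the helper names are hypothetical, and the classifier itself is not shown.

```python
def mirror_horizontally(image):
    """Left-right flip of an image stored as a list of pixel rows.
    A horizontal flip reverses the apparent spin direction of a spiral."""
    return [list(reversed(row)) for row in image]


def build_consistent_dataset(cw_images, ccw_images):
    """Hypothetical helper: mirror every counterclockwise image so that both
    classes appear to spin clockwise, while keeping the original labels.
    A classifier that still beats chance on this set cannot be relying on
    the visible spin direction alone."""
    data = [(img, "cw") for img in cw_images]
    data += [(mirror_horizontally(img), "ccw") for img in ccw_images]
    return data
```

Because the labels are kept but the visible handedness is equalized, any remaining separability has to come from other image properties, which is the point of the control.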
Full Text Available
New evidence and analysis of cosmological-scale asymmetry in galaxy spin directions

https://doi.org/10.1007/s12036-022-09809-8

Shamir, Lior (January 2022, Journal of Astrophysics and Astronomy)

In the past several decades, multiple cosmological theories based on the contention that the Universe has a major axis have been proposed. Such theories can be based on the geometry of the Universe, or on multiverse theories such as black hole cosmology. The contention of a cosmological-scale axis is supported by certain evidence, such as the dipole axis formed by the CMB distribution. Here I study another form of cosmological-scale axis, based on the distribution of the spin directions of spiral galaxies. Data from four different telescopes are analyzed, showing nearly identical axis profiles when the redshift distributions of the galaxies are similar.
Full Text Available
Systematic biases when using deep neural networks for annotating large catalogs of astronomical images

https://doi.org/10.1016/j.ascom.2022.100545

Dhar, Sanchari; Shamir, Lior (January 2022, Astronomy and Computing)

Full Text Available
Analysis of the Alignment of Non-Random Patterns of Spin Directions in Populations of Spiral Galaxies

https://doi.org/10.3390/particles4010002

Shamir, Lior (March 2021, Particles)
Observations of non-random distribution of galaxies with opposite spin directions have recently attracted considerable attention. Here, a method for identifying cosine dependence in a dataset of galaxies annotated by their spin directions is described in light of different aspects that can impact the statistical analysis of the data. These aspects include the presence of duplicate objects in a dataset, errors in the galaxy annotation process, and non-random distribution of the asymmetry that does not necessarily form a dipole or quadrupole axis. The results show that duplicate objects in the dataset can artificially increase the likelihood of cosine dependence detected in the data, but a very high number of duplicate objects is required to lead to a false detection of an axis. Inaccuracy in galaxy annotations has a relatively minor impact on the identification of cosine dependence when the error is randomly distributed between clockwise and counterclockwise galaxies. However, when the error is not random, even a small bias of 1% leads to a statistically significant cosine dependence that peaks at the celestial pole. Experiments with artificial datasets in which the distribution was not random showed strong cosine dependence even when the data did not form a full dipole axis alignment. The analysis of the unmodified data shows an asymmetry profile similar to the profile shown in multiple previous studies using several different telescopes.
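The abstract does not spell out the fitting procedure. As a minimal sketch, assuming each galaxy's spin is encoded as +1 or -1 and fitted to d·cos(φ), where φ is the angular distance between the galaxy and a candidate axis, the fit for one axis and a comparison against randomly assigned spins might look like this. The χ² comparison with random spins and all function names are assumptions for illustration, not a transcription of the paper's code.

```python
import math
import random
import statistics


def cos_angular_distance(ra1, dec1, ra2, dec2):
    """Cosine of the great-circle angle between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    return (math.sin(dec1) * math.sin(dec2)
            + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))


def fit_dipole(galaxies, axis_ra, axis_dec):
    """Least-squares fit of spin = d * cos(phi) for one candidate axis.

    galaxies: sequence of (ra, dec, spin) tuples with spin = +1 or -1
    (which sign means clockwise is an arbitrary convention here).
    Returns (d, chi2), chi2 being the sum of squared residuals.
    """
    cosines = [cos_angular_distance(ra, dec, axis_ra, axis_dec) for ra, dec, _ in galaxies]
    spins = [spin for _, _, spin in galaxies]
    denom = sum(c * c for c in cosines)
    if denom == 0:
        raise ValueError("all galaxies lie 90 degrees from the candidate axis")
    d = sum(s * c for s, c in zip(spins, cosines)) / denom
    chi2 = sum((s - d * c) ** 2 for s, c in zip(spins, cosines))
    return d, chi2


def sigma_vs_random(galaxies, axis_ra, axis_dec, trials=1000, seed=0):
    """How many standard deviations the observed chi2 lies below the chi2
    obtained when the same positions receive random spins (hypothetical measure)."""
    rng = random.Random(seed)
    _, observed = fit_dipole(galaxies, axis_ra, axis_dec)
    baseline = []
    for _ in range(trials):
        randomized = [(ra, dec, rng.choice((1, -1))) for ra, dec, _ in galaxies]
        baseline.append(fit_dipole(randomized, axis_ra, axis_dec)[1])
    spread = statistics.stdev(baseline)
    if spread == 0:
        raise ValueError("random baseline has no spread; too few galaxies")
    return (statistics.mean(baseline) - observed) / spread
```

Scanning `sigma_vs_random` over a grid of candidate (RA, Dec) axes produces an axis profile. Because every galaxy enters the sums once per catalog entry, a duplicated object is counted twice, which is one way duplicates can inflate an apparent cosine dependence.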
Full Text Available
Automatic identification of outliers in Hubble Space Telescope galaxy images

https://doi.org/10.1093/mnras/staa4036

Shamir, Lior (January 2021, Monthly Notices of the Royal Astronomical Society)
Rare extragalactic objects can carry substantial information about the past, present, and future universe. Given the size of astronomical data bases in the information era, it can be assumed that very many outlier galaxies are included in existing and future astronomical data bases. However, manual search for these objects is impractical due to the required labour, and therefore the ability to detect such objects largely depends on computer algorithms. This paper describes an unsupervised machine learning algorithm for automatic detection of outlier galaxy images, and its application to several Hubble Space Telescope fields. The algorithm does not require training, and therefore is not dependent on the preparation of clean training sets. The application of the algorithm to a large collection of galaxies detected a variety of outlier galaxy images. The algorithm is not perfect in the sense that not all objects detected by the algorithm are indeed considered outliers, but it reduces the data set by two orders of magnitude to allow practical manual identification. The catalogue contains 147 objects that would be very difficult to identify without using automation.
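The abstract does not describe the internals of the outlier-detection algorithm. As a generic stand-in, and explicitly not the paper's method, an unsupervised k-nearest-neighbour distance score illustrates the same workflow: no training set, a ranking of every object, and a cut that keeps roughly 1% (two orders of magnitude fewer) for manual inspection. The feature vectors, `k`, and the 1% fraction are assumptions.

```python
import math


def knn_outlier_scores(features, k=5):
    """Score each feature vector by its mean distance to its k nearest neighbours.
    Larger scores mean the object lies farther from the rest of the data."""
    n = len(features)
    if n <= k:
        raise ValueError("need more samples than k")
    scores = []
    for i, a in enumerate(features):
        dists = sorted(math.dist(a, b) for j, b in enumerate(features) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def top_candidates(features, fraction=0.01, k=5):
    """Indices of the highest-scoring objects, e.g. the top 1% for manual review."""
    scores = knn_outlier_scores(features, k)
    n_keep = max(1, int(len(features) * fraction))
    ranked = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_keep]
```

This brute-force version is quadratic in the number of objects, which is fine for a sketch; a catalog-scale run would need a spatial index or an approximate neighbour search.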
Full Text Available
Evaluation of the benchmark datasets for testing the efficacy of deep convolutional neural networks

https://doi.org/10.1016/j.visinf.2021.10.001

Dhar, Sanchari; Shamir, Lior (January 2021, Visual Informatics)

In the past decade, deep neural networks, and specifically convolutional neural networks (CNNs), have become a primary tool in the field of biomedical image analysis, and are used extensively in other fields such as object or face recognition. CNNs have a clear advantage in their ability to provide superior performance, yet without the requirement to fully understand the image elements that reflect the biomedical problem at hand, and without designing specific algorithms for that task. The availability of easy-to-use libraries and their non-parametric nature make CNNs the most common solution to problems that require automatic biomedical image analysis. But while CNNs have many advantages, they also have certain downsides. The features determined by CNNs are complex and unintuitive, and therefore CNNs often work as a “black box”. Additionally, CNNs learn from any piece of information in the pixel data that can provide a discriminative signal, making it more difficult to control what the CNN actually learns. Here we follow common practices to test whether CNNs can classify biomedical image datasets, but instead of using the entire image we use merely parts of the images that do not have biomedical content. The experiments show that CNNs can provide high classification accuracy even when they are trained with datasets that do not contain any biomedical information, or can be systematically biased by irrelevant information in the image data. The presence of such consistent irrelevant data is difficult to identify, and can therefore lead to biased experimental results. Possible solutions to this downside of CNNs include control experiments, as well as other protective practices to validate the results and avoid biased conclusions based on CNN-generated annotations.
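The control experiment described above can be sketched as a data-preparation step. This is a minimal illustration, assuming that the top-left corner of each image holds only background with no biomedical content; the 20-pixel patch size and the helper names are hypothetical, and training the CNN itself is out of scope.

```python
from collections import Counter


def corner_patch(image, size=20):
    """Top-left size x size region of an image stored as a list of pixel rows."""
    if len(image) < size or any(len(row) < size for row in image[:size]):
        raise ValueError("image is smaller than the requested patch")
    return [row[:size] for row in image[:size]]


def build_control_set(labelled_images, size=20):
    """Replace each image by its corner patch, keeping the original label."""
    return [(corner_patch(img, size), label) for img, label in labelled_images]


def majority_baseline(labels):
    """Accuracy of always predicting the most common class: the chance level
    that a classifier trained on the control set should not clearly exceed."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)
```

If the same CNN trained on `build_control_set` output scores well above `majority_baseline`, the dataset carries a discriminative signal that is unrelated to the biomedical content, which is the kind of bias the abstract warns about.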
Full Text Available
