Title: AutoPhoto: Aesthetic Photo Capture using Reinforcement Learning
Capturing a well-composed photo is difficult and takes years of experience to master. We propose a novel pipeline for an autonomous agent to automatically capture an aesthetic photograph by navigating within a local region in a scene. Instead of classical optimization over heuristics such as the rule-of-thirds, we adopt a data-driven aesthetics estimator to assess photo quality. A reinforcement learning framework is used to optimize the model with respect to the learned aesthetics metric. We train our model in simulation with indoor scenes, and we demonstrate that our system can capture aesthetic photos in both simulation and real-world environments on a ground robot. To our knowledge, this is the first system that can automatically explore an environment to capture an aesthetic photo with respect to a learned aesthetic estimator. Source code is at https://github.com/HadiZayer/AutoPhoto
Award ID(s):
1900783
NSF-PAR ID:
10377841
Journal Name:
2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Page Range / eLocation ID:
944 to 951
Sponsoring Org:
National Science Foundation
More Like this
  1. As inborn characteristics, humans possess the ability to judge visual aesthetics, feel the emotions from the environment, and comprehend others’ emotional expressions. Many exciting applications become possible if robots or computers can be empowered with similar capabilities. Modeling aesthetics, evoked emotions, and emotional expressions automatically in unconstrained situations, however, is daunting due to the lack of a full understanding of the relationship between low-level visual content and high-level aesthetics or emotional expressions. With the growing availability of data, it is possible to tackle these problems using machine learning and statistical modeling approaches. In the talk, I provide an overview of our research in the last two decades on data-driven analyses of visual artworks and digital visual content for modeling aesthetics and emotions. First, I discuss our analyses of styles in visual artworks. Art historians have long observed the highly characteristic brushstroke styles of Vincent van Gogh and have relied on discerning these styles for authenticating and dating his works. In our work, we compared van Gogh with his contemporaries by statistically analyzing a massive set of automatically extracted brushstrokes. A novel extraction method is developed by exploiting an integration of edge detection and clustering-based segmentation. Evidence substantiates that van Gogh’s brushstrokes are strongly rhythmic. Next, I describe an effort to model the aesthetic and emotional characteristics in visual content such as photographs. By taking a data-driven approach, using the Internet as the data source, we show that computers can be trained to recognize various characteristics that are highly relevant to aesthetics and emotions. Future computer systems equipped with such capabilities are expected to help millions of users in unimagined ways. Finally, I highlight our research on automated recognition of bodily expression of emotion. We propose a scalable and reliable crowdsourcing approach for collecting in-the-wild perceived emotion data for computers to learn to recognize the body language of humans. Comprehensive statistical analysis revealed many interesting insights from the dataset. A system to model the emotional expressions based on bodily movements, named ARBEE (Automated Recognition of Bodily Expression of Emotion), has also been developed and evaluated.
  2. Placing and orienting a camera to compose aesthetically meaningful shots of a scene is a key objective not only in real-world photography and cinematography but also in virtual content creation. The framing of a camera often significantly contributes to the storytelling in movies, games, and mixed reality applications. Generating single camera poses or even contiguous trajectories either requires a significant amount of manual labor or requires solving high-dimensional optimization problems, which can be computationally demanding and error-prone. In this paper, we introduce GAIT, a Deep Reinforcement Learning (DRL) agent that learns to automatically control a camera to generate a sequence of aesthetically meaningful views for synthetic 3D indoor scenes. To generate sequences of frames with high aesthetic value, GAIT relies on a neural aesthetics estimator, which is trained on a crowd-sourced dataset. Additionally, we introduce regularization techniques for diversity and smoothness to generate visually interesting trajectories for a 3D environment, and to constrain agent acceleration in the reward function to generate a smooth sequence of camera frames. We validated our method by comparing it to baseline algorithms, based on a perceptual user study, and through ablation studies. The source code of our method will be released with the final version of our paper.
  3. A commonplace sight is seeing other people walk. Our visual system specializes in processing such actions. Notably, we are not only quick to recognize actions, but also quick to judge how elegantly (or not) people walk. What movements appear appealing, and why do we have such aesthetic experiences? Do aesthetic preferences for body movements arise simply from perceiving others’ positive emotions? To answer these questions, we showed observers different point-light walkers who expressed neutral, happy, angry, or sad emotions through their movements and measured the observers’ impressions of aesthetic appeal, emotion positivity, and naturalness of these movements. Three experiments were conducted. People showed consensus in aesthetic impressions even after controlling for emotion positivity, finding prototypical walks more aesthetically pleasing than atypical walks. This aesthetic prototype effect could be accounted for by a computational model in which walking actions are treated as a single category (as opposed to multiple emotion categories). The aesthetic impressions were affected both directly by the objective prototypicality of the movements, and indirectly through the mediation of perceived naturalness. These findings extend the boundary of category learning, and hint at possible functions for action aesthetics.
  4. Automatic Number Plate Recognition (ANPR) has been widely used in different domains, such as car park management, traffic management, tolling, and intelligent transport systems. Despite this technology’s importance, existing ANPR approaches struggle to identify number plates accurately because plates differ in size, orientation, and shape across regions worldwide. In this paper, we study these challenges by implementing a case study for smart car towing management using Machine Learning (ML) models. The developed mobile-based system uses different approaches and techniques to enhance the accuracy of recognizing number plates in real-time. First, we developed an algorithm to accurately detect the number plate’s location on the car body. Then, the bounding box of the plate is extracted and converted into a grayscale image. Second, we applied a series of filters to detect the alphanumeric characters’ contours within the grayscale image. Third, the detected alphanumeric characters’ contours are fed into a K-Nearest Neighbors (KNN) model to detect the actual number plate. Our model achieves an overall classification accuracy of 95% in recognizing number plates across different regions worldwide. The user interface is developed as an Android mobile app, allowing law-enforcement personnel to capture a photo of the towed car, which is then recorded in the car towing management system automatically in real-time. The app also allows owners to search for their cars, check the case status, and pay fines. Finally, we evaluated our system using various performance metrics such as classification accuracy, processing time, etc. We found that our model outperforms some state-of-the-art ANPR approaches in terms of the overall processing time.
  5. Any graph drawing can be characterised by a range of computational aesthetic metrics. For example, a given drawing might be described as having eight crossings, a mean angular resolution of 0.34, and an edge orthogonality value of 0.72. However, without knowing the distribution of these metrics it is hard to compare the quality of drawings of different graphs, or to know whether a given drawing is typical or an outlier within the space of all possible drawings. This paper explores the range and distribution of ten normalised graph drawing layout metrics, based on graphs created by six graph generation algorithms and drawings created by six popular layout algorithms. We include the “Rome” and “North” graph repositories in our analysis. Our exploration of the multi-dimensional aesthetics space allows for comparisons between the graph drawing algorithms, highlighting those that cover larger or smaller volumes of the aesthetics space. We calculate the correlation coefficients between the metrics, indicating those that may conflict with each other (negatively correlated), and those that may be redundant (positively correlated). Our results will be useful as the basis for simulated annealing or gradient descent layout algorithms, for identifying the best layout algorithms for producing a specified combination and range of aesthetics, and for informing experimental controls in human empirical studies.