- Home
- Search Results
- Page 1 of 1
Search for: All records
-
Total Resources2
- Resource Type
-
0002000000000000
- More
- Availability
-
20
- Author / Contributor
- Filter by Author / Creator
-
-
Fortes, Jose A. (2)
-
Alzuru, Icaro (1)
-
Matsunaga, Andrea (1)
-
Nakagami, Makoto (1)
-
Tsugawa, Mauricio (1)
-
Yamaguchi, Saneyasu (1)
-
#Tyler Phillips, Kenneth E. (0)
-
#Willis, Ciara (0)
-
& Abreu-Ramos, E. D. (0)
-
& Abramson, C. I. (0)
-
& Abreu-Ramos, E. D. (0)
-
& Adams, S.G. (0)
-
& Ahmed, K. (0)
-
& Ahmed, Khadija. (0)
-
& Aina, D.K. Jr. (0)
-
& Akcil-Okan, O. (0)
-
& Akuom, D. (0)
-
& Aleven, V. (0)
-
& Andrews-Larson, C. (0)
-
& Archibald, J. (0)
-
- Filter by Editor
-
-
& Spizer, S. M. (0)
-
& . Spizer, S. (0)
-
& Ahn, J. (0)
-
& Bateiha, S. (0)
-
& Bosch, N. (0)
-
& Brennan K. (0)
-
& Brennan, K. (0)
-
& Chen, B. (0)
-
& Chen, Bodong (0)
-
& Drown, S. (0)
-
& Ferretti, F. (0)
-
& Higgins, A. (0)
-
& J. Peters (0)
-
& Kali, Y. (0)
-
& Ruiz-Arias, P.M. (0)
-
& S. Spitzer (0)
-
& Sahin. I. (0)
-
& Spitzer, S. (0)
-
& Spitzer, S.M. (0)
-
(submitted - in Review for IEEE ICASSP-2024) (0)
-
-
Have feedback or suggestions for a way to improve these results?
!
Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
DOI 10.1109/COMPSAC.2019.10284 Abstract—Hadoop is a popular data-analytics platform based on the MapReduce model. When analyzing extremely big data, hard disk drives are commonly used and Hadoop performance can be optimized by improving I/O performance. Hard disk drives have different performance depending on whether data are placed in the outer or inner disk zones. In this paper, we propose a method that uses knowledge of job characteristics to place data in hard disk drives so that Hadoop performance is improved. Files of a job that intensively and sequentially accesses the storage device are placed in outer disk tracks which have higher sequential access speed than inner tracks. Temporary and permanent files are placed in the outer and inner zones, respectively. This enables repeated usage of the faster zones by avoiding the use of the faster zones by permanent files. Our evaluation demonstrates that the proposed method improves the performance of Hadoop jobs by 15.0% over the normal case when file placement is not used. The proposed method also outperforms a previously proposed placement approach by 9.9%.more » « less
-
Alzuru, Icaro; Matsunaga, Andrea; Tsugawa, Mauricio; Fortes, Jose A. (, 2016 IEEE 12th International Conference on e-Science (e-Science), Baltimore, MD USA)Historical data sources, like medical records or biological collections, consist of unstructured heterogeneous content: handwritten text, different sizes and types of fonts, and text overlapped with lines, images, stamps, and sketches. The information these documents can provide is important, from a historical perspective and mainly because we can learn from it. The automatic digitization of these historical documents is a complex machine learning process that usually produces poor results, requiring costly interventions by experts, who have to transcribe and interpret the content. This paper describes hybrid (Human- and Machine-Intelligent) workflows for scientific data extraction, combining machine-learning and crowdsourcing software elements. Our results demonstrate that the mix of human and machine processes has advantages in data extraction time and quality, when compared to a machine-only workflow. More specifically, we show how OCRopus and Tesseract, two widely used open source Optical Character Recognition (OCR) tools, can improve their accuracy by more than 42%, when text areas are cropped by humans prior to OCR, while the total time can increase or decrease depending on the OCR selection. The digitization of 400 images, with Entomology, Bryophyte, and Lichen specimens, is evaluated following four different approaches: processing the whole specimen image (machine-only), processing crowd cropped labels (hybrid), processing crowd cropped fields (hybrid), and cleaning the machine-only output. As a secondary result, our experiments reveal differences in speed and quality between Tesseract and OCRopus.more » « less
An official website of the United States government
