Authors: Madhyastha, Harsha V; Sun, Huanchen; Zhu, Jingyuan
Operators of web archives have two options for how to crawl pages from the web. Browser-based dynamic crawlers capture all of the resources on every page, but incur high compute overheads. Static browserless crawlers are more lightweight, but miss page resources that are fetched only when scripts are executed. In this paper, we make the case that a web archive does not have to make a binary choice between dynamic and static crawling. Instead, by using a browser for a carefully chosen small subset of crawls, an archive can significantly improve its ability to serve statically crawled pages with high fidelity. First, we show how to reuse crawled resources, both across pages and across multiple crawls of the same page over time. Second, by leveraging a dynamic crawl of a page, we show that subsequent static crawls of the page can be augmented to fetch resources without executing the scripts that request them. We estimate that, as long as 8.9% of page crawls use a browser, an archive can serve roughly 99% of the remaining statically crawled pages without any loss in fidelity, up from 55% without our techniques.
Free, publicly accessible full text available October 28, 2026.
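The second idea in the abstract, augmenting a static crawl with what an earlier dynamic crawl observed, can be illustrated with a minimal sketch. This is not the paper's implementation, which the abstract does not describe. The names `StaticLinkParser`, `static_resources`, and `augmented_fetch_list` are hypothetical, and the sketch assumes the simplest case, in which script-requested resources keep the same URLs between the dynamic crawl and the later static crawl.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class StaticLinkParser(HTMLParser):
    """Collects resource URLs that appear directly in the HTML markup.

    No scripts are executed, so anything a script would request at
    runtime is invisible to this parser.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.urls.add(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            self.urls.add(urljoin(self.base_url, attrs["href"]))


def static_resources(page_url, html):
    """Return the resource URLs a browserless crawler can find in the markup."""
    parser = StaticLinkParser(page_url)
    parser.feed(html)
    return parser.urls


def augmented_fetch_list(page_url, html, dynamic_crawl_urls):
    """Union of statically visible URLs and URLs recorded by an earlier
    browser-based crawl of the same page (assumed unchanged since then)."""
    return static_resources(page_url, html) | set(dynamic_crawl_urls)
```

The union lets the static crawler fetch script-requested resources directly, without running the scripts. A real system would also have to handle URLs that change between crawls, which this sketch ignores.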
