Search for: All records

Creators/Authors contains: "Vukotic, Ilija"

TRACER (TRACe route ExploRer): A tool to explore OSG/WLCG network route topologies

https://doi.org/10.1142/S0217751X21300052

Tretyakov, Evgeniy ; Artamonov, Alexey ; Grigorieva, Maria ; Klimentov, Alexei ; McKee, Shawn ; Vukotic, Ilija ( February 2021 , International Journal of Modern Physics A)
The experiments at the Large Hadron Collider (LHC) rely upon a complex distributed computing infrastructure (WLCG) consisting of hundreds of individual sites worldwide at universities and national laboratories, providing about half a billion computing job slots and an exabyte of storage interconnected through high speed networks. Wide Area Networking (WAN) is one of the three pillars (together with computational resources and storage) of LHC computing. More than 5 PB/day are transferred between WLCG sites. Monitoring is one of the crucial components of WAN and experiment operations. In the past years all experiments have invested significant effort to improve monitoring and integrate networking information with data management and workload management systems. All WLCG sites are equipped with perfSONAR servers to collect a wide range of network metrics. We will present the latest development to provide the 3D force-directed graph visualization for data collected by perfSONAR. The visualization package allows site admins, network engineers, scientists and network researchers to better understand the topology of our Research and Education networks and it provides the ability to identify unreliable and/or suboptimal network paths, such as those with routing loops or rapidly changing routes.
Full Text Available
Towards a NoOps Model for WLCG

https://doi.org/10.1051/epjconf/202024507024

Gardner, Robert ; Bryant, Lincoln ; Stephen, Judith ; Vukotic, Ilija ; Weaver, Christopher ; Wu, Wenjing ( November 2020 , 24th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2019))
One of the most costly factors in providing a global computing infrastructure such as the WLCG is the human effort in deployment, integration, and operation of the distributed services supporting collaborative computing, data sharing and delivery, and analysis of extreme scale datasets. Furthermore, the time required to roll out global software updates, introduce new service components, or prototype novel systems requiring coordinated deployments across multiple facilities is often increased by communication latencies, staff availability, and in many cases expertise required for operations of bespoke services. While the WLCG (and distributed systems implemented throughout HEP) is a global service platform, it lacks the capability and flexibility of a modern platform-as-a-service including continuous integration/continuous delivery (CI/CD) methods, development-operations capabilities (DevOps, where developers assume a more direct role in the actual production infrastructure), and automation. Most importantly, tooling which reduces required training, bespoke service expertise, and the operational effort throughout the infrastructure, most notably at the resource endpoints (sites), is entirely absent in the current model. In this paper, we explore ideas and questions around potential NoOps models in this context: what is realistic given organizational policies and constraints? How should operational responsibility be organized across teams and facilities? What are the technical gaps? What are the social and cybersecurity challenges? Conversely what advantages does a NoOps model deliver for innovation and for accelerating the pace of delivery of new services needed for the HL-LHC era? We will describe initial work along these lines in the context of providing a data delivery network supporting IRIS-HEP DOMA R&D.
Full Text Available
WLCG Networks: Update on Monitoring and Analytics

https://doi.org/10.1051/epjconf/202024507053

Babik, Marian ; McKee, Shawn ; Andrade, Pedro ; Bockelman, Brian Paul ; Gardner, Robert ; Fajardo Hernandez, Edgar Mauricio ; Martelli, Edoardo ; Vukotic, Ilija ; Weitzel, Derek ; Zvada, Marian ( January 2020 , EPJ Web of Conferences)
Doglioni, C. ; Kim, D. ; Stewart, G.A. ; Silvestris, L. ; Jackson, P. ; Kamleh, W. (Ed.)
WLCG relies on the network as a critical part of its infrastructure and therefore needs to guarantee effective network usage and prompt detection and resolution of any network issues including connection failures, congestion and traffic routing. The OSG Networking Area, in partnership with WLCG, is focused on being the primary source of networking information for its partners and constituents. It was established to ensure sites and experiments can better understand and fix networking issues, while providing an analytics platform that aggregates network monitoring data with higher level workload and data transfer services. This has been facilitated by the global network of the perfSONAR instances that have been commissioned and are operated in collaboration with WLCG Network Throughput Working Group. An additional important update is the inclusion of the newly funded NSF project SAND (Service Analytics and Network Diagnosis) which is focusing on network analytics. This paper describes the current state of the network measurement and analytics platform and summarises the activities taken by the working group and our collaborators. This includes the progress being made in providing higher level analytics, alerting and alarming from the rich set of network metrics we are gathering.
Full Text Available
StashCache: A Distributed Caching Federation for the Open Science Grid

https://doi.org/10.1145/3332186.3332212

Weitzel, Derek ; Zvada, Marian ; Vukotic, Ilija ; Gardner, Rob ; Bockelman, Brian ; Rynge, Mats ; Hernandez, Edgar Fajardo ; Lin, Brian ; Selmeci, Mátyás ( January 2019 , Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (learning) (PEARC ’19). ACM, New York, NY, USA, Article 58, 7 pages.)

Data distribution for opportunistic users is challenging as they own neither the computing resources they are using nor any nearby storage. Users are motivated to use opportunistic computing to expand their data processing capacity, but they require storage and fast networking to distribute data to that processing. Since it requires significant management overhead, it is rare for resource providers to allow opportunistic access to storage. Additionally, in order to use opportunistic storage at several distributed sites, users assume the responsibility to maintain their data. In this paper we present StashCache, a distributed caching federation that enables opportunistic users to utilize nearby opportunistic storage. StashCache comprises four components: data origins, redirectors, caches, and clients. StashCache has been deployed in the Open Science Grid for several years and has been used by many projects. Caches are deployed in geographically distributed locations across the U.S. and Europe. We will present the architecture of StashCache, as well as utilization information of the infrastructure. We will also present performance analysis comparing distributed HTTP Proxies vs StashCache.
Full Text Available
Managing Privilege and Access on Federated Edge Platforms

https://doi.org/10.1145/3332186.3332234

Breen, Joe ; Bryant, Lincoln ; Chen, Jiahui ; Ford, Emerson ; Gardner, Robert W. ; Glupker, Gage ; Griffith, Skyler ; Kulbertis, Ben ; McKee, Shawn ; Pierce, Rose ; et al ( January 2019 , Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (learning))

Full Text Available
Developing Edge Services for Federated Infrastructure Using MiniSLATE

https://doi.org/10.1145/3332186.3332236

Breen, Joe ; Bryant, Lincoln ; Chen, Jiahui ; Ford, Emerson ; Gardner, Robert W. ; Glupker, Gage ; Griffith, Skyler ; Kulbertis, Ben ; McKee, Shawn ; Pierce, Rose ; et al ( January 2019 , Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (Learning))

Full Text Available
Building the SLATE Platform

https://doi.org/10.1145/3219104.3219144

Breen, Joe ; McKee, Shawn ; Riedel, Benedikt ; Stidd, Jason ; Truong, Luan ; Vukotic, Ilija ; Bryant, Lincoln ; Carcassi, Gabriele ; Chen, Jiahui ; Gardner, Robert W. ; et al ( July 2018 , Proceedings of the Practice and Experience on Advanced Research Computing)

We describe progress on building the SLATE (Services Layer at the Edge) platform. The high level goal of SLATE is to facilitate creation of multi-institutional science computing systems by augmenting the canonical Science DMZ pattern with a generic, "programmable", secure and trusted underlayment platform. This platform permits hosting of advanced container-centric services needed for higher-level capabilities such as data transfer nodes, software and data caches, workflow services and science gateway components. SLATE uses best-of-breed data center virtualization and containerization components, and where available, software defined networking, to enable distributed automation of deployment and service lifecycle management tasks by domain experts. As such it will simplify creation of scalable platforms that connect research teams, institutions and resources to accelerate science while reducing operational costs and development cycle times.
Full Text Available
Operation and performance of the ATLAS semiconductor tracker in LHC Run 2

https://doi.org/10.1088/1748-0221/17/01/P01013

Aad, Georges ; Abbott, Brad ; Abbott, Dale Charles ; Abed Abud, Adam ; Abeling, Kira ; Abhayasinghe, Deshan Kavishka ; Abidi, Syed Haider ; Aboulhorma, Asmaa ; Abramowicz, Halina ; Abreu, Henso ; et al ( January 2022 , Journal of Instrumentation)

The semiconductor tracker (SCT) is one of the tracking systems for charged particles in the ATLAS detector. It consists of 4088 silicon strip sensor modules. During Run 2 (2015–2018) the Large Hadron Collider delivered an integrated luminosity of 156 fb⁻¹ to the ATLAS experiment at a centre-of-mass proton-proton collision energy of 13 TeV. The instantaneous luminosity and pile-up conditions were far in excess of those assumed in the original design of the SCT detector. Due to improvements to the data acquisition system, the SCT operated stably throughout Run 2. It was available for 99.9% of the integrated luminosity and achieved a data-quality efficiency of 99.85%. Detailed studies have been made of the leakage current in SCT modules and the evolution of the full depletion voltage, which are used to study the impact of radiation damage to the modules.
Full Text Available