Vision-language models (VLMs) have transformed generative AI by enabling systems to interpret and respond to multimodal data in real time. While advancements in edge computing have made it possible to deploy smaller large language models (LLMs) on smartphones and laptops, deploying competent VLMs on edge devices remains challenging due to their high computational demands. Furthermore, cloud-only deployments fail to utilize the evolving processing capabilities at the edge and limit responsiveness. This paper introduces a distributed architecture for VLMs that addresses these limitations by partitioning model components between edge devices and central servers. In this setup, the vision components run on edge devices for immediate processing, while language generation is handled by a centralized server, resulting in up to a 33% improvement in throughput over traditional cloud-only solutions. Moreover, our approach enhances the computational efficiency of off-the-shelf VLMs without the need for model compression techniques. This work demonstrates the scalability and efficiency of a hybrid architecture for VLM deployment and contributes to the discussion on how distributed approaches can improve VLM performance.
Index Terms—vision-language models (VLMs), edge computing, distributed computing, inference optimization, edge-cloud collaboration.
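The split the abstract describes can be illustrated with a minimal sketch: a vision encoder on the edge device produces a compact embedding, and only that embedding (not the raw image) crosses the network to the server-side language model. All class names, dimensions, and the toy stand-in models below are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

class EdgeVisionEncoder:
    """Runs on the edge device: turns an image into a compact embedding."""
    def __init__(self, img_dim=3072, embed_dim=64):
        self.proj = rng.standard_normal((img_dim, embed_dim)) / np.sqrt(img_dim)

    def encode(self, image):
        # Only this small vector needs to be sent over the network.
        return image @ self.proj

class CloudLanguageModel:
    """Runs on the central server: consumes the embedding, not raw pixels."""
    def __init__(self, embed_dim=64, vocab=1000):
        self.head = rng.standard_normal((embed_dim, vocab)) / np.sqrt(embed_dim)

    def generate(self, embedding):
        logits = embedding @ self.head
        return int(np.argmax(logits))  # toy "token" prediction

edge = EdgeVisionEncoder()
cloud = CloudLanguageModel()

image = rng.standard_normal(3072)   # flattened 32x32x3 image
embedding = edge.encode(image)      # computed locally on the device
token = cloud.generate(embedding)   # only 64 floats cross the network

# The payload shrinks from 3072 values (raw image) to 64 (embedding),
# which is one source of the bandwidth/throughput benefit.
print(image.size, embedding.size, token)
```

In a real deployment the encoder would be an actual vision backbone and the server side an LLM decoder; the point of the sketch is only the partition boundary.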
Communication-Efficient and Privacy-Preserving Edge-Cloud Framework For Smart Healthcare
The healthcare industry has experienced a remarkable digital transformation through the adoption of IoT technologies, resulting in a significant increase in the volume and variety of medical data generated. Yet challenges in processing, analyzing, and sharing healthcare data persist. Traditional cloud computing approaches, while useful for processing healthcare data, have drawbacks, including delays in data transfer, data privacy concerns, and the risk of data unavailability. In this paper, we propose a software-defined 5G and AI-enabled distributed edge-cloud collaboration platform to classify healthcare data on edge devices, facilitate real-time service delivery, and create AI/ML-based models for identifying patients' potential medical conditions. In our architecture, we incorporate a federated learning scheme based on homomorphic encryption to provide privacy in data sharing and processing. The proposed framework ensures secure and efficient data communication and processing, ultimately fostering effective collaboration among healthcare institutions. The models will be validated through a comparative time analysis, and the interplay between edge and cloud computing will be investigated to support real-time healthcare applications.
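The federated-learning core of such a platform can be sketched briefly: each institution computes a model update on its private data and only the updates are aggregated centrally. For clarity this sketch omits the homomorphic-encryption layer entirely, so the server here sees plaintext updates, whereas the paper's scheme would aggregate encrypted ones; all names, dimensions, and the toy labels are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def local_update(weights, X, y, lr=0.1):
    """One SGD step of logistic regression on a clinic's private data."""
    preds = 1.0 / (1.0 + np.exp(-X @ weights))
    grad = X.T @ (preds - y) / len(y)
    return weights - lr * grad

def federated_average(client_weights):
    """Server-side aggregation. With an additively homomorphic scheme,
    this sum could be computed over ciphertexts without decrypting any
    individual client's update."""
    return np.mean(client_weights, axis=0)

d = 5
global_w = np.zeros(d)
for rnd in range(10):
    updates = []
    for _ in range(3):  # three participating healthcare institutions
        X = rng.standard_normal((20, d))
        y = (X[:, 0] > 0).astype(float)  # toy label: sign of feature 0
        updates.append(local_update(global_w.copy(), X, y))
    global_w = federated_average(updates)

print(global_w.round(3))
```

Because the aggregation step is a plain sum, it is exactly the operation an additive homomorphic scheme supports, which is why the two techniques compose naturally.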
- Award ID(s):
- 2219741
- PAR ID:
- 10535397
- Publisher / Repository:
- IEEE
- Date Published:
- ISBN:
- 979-8-3503-7021-8
- Page Range / eLocation ID:
- 377 to 382
- Format(s):
- Medium: X
- Location:
- Kuala Lumpur, Malaysia
- Sponsoring Org:
- National Science Foundation
More Like this
-
With each passing year, state-of-the-art deep learning neural networks grow larger, requiring more computing and power resources. The high compute requirements of these large networks alienate the majority of the world's population, which lives in low-resource settings and lacks the infrastructure to benefit from these advancements in medical AI. Current state-of-the-art medical AI, even with cloud resources, is difficult to deploy in remote areas with poor internet connectivity. We demonstrate a cost-effective approach to deploying medical AI in limited-resource settings using the Edge Tensor Processing Unit (TPU). We trained and optimized a classification model on the Chest X-ray 14 dataset and a segmentation model on the Nerve ultrasound dataset using INT8 quantization-aware training. Thereafter, we compiled the optimized models for Edge TPU execution. We find that inference on Edge TPUs is 10x faster compared to other embedded devices. The optimized model is 3x and 12x smaller for classification and segmentation, respectively, compared to the full-precision model. In summary, we show the potential of Edge TPUs for two medical AI tasks with faster inference times, which could be used in low-resource settings for medical-AI-based diagnostics. We finally discuss some potential challenges and limitations of our approach for real-world deployments.
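The size reduction comes from the quantization arithmetic itself. The paper uses quantization-aware training and the Edge TPU compiler; the sketch below shows only the simpler symmetric per-tensor INT8 quantization step, as an assumed stand-in, to make the 4x storage saving of float32-to-int8 concrete.

```python
import numpy as np

rng = np.random.default_rng(2)

def quantize_int8(w):
    """Symmetric per-tensor quantization: map floats to int8 by a scale."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for comparison."""
    return q.astype(np.float32) * scale

weights = rng.standard_normal((256, 256)).astype(np.float32)
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

size_ratio = weights.nbytes / q.nbytes        # float32 -> int8 is 4x smaller
max_err = np.max(np.abs(weights - restored))  # bounded by the step size

print(size_ratio, max_err <= scale)
```

Quantization-aware training goes further by simulating this rounding during training so the network adapts to it, which is why the paper's accuracy holds up after compression.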
-
Sensor-powered devices offer safe global connections, cloud scalability and flexibility, and new business value driven by data. The constraints that have historically obstructed major innovations in technology can be addressed by advancements in artificial intelligence (AI) and machine learning (ML), cloud and quantum computing, and the ubiquitous availability of data. Edge AI (edge artificial intelligence) refers to the deployment of AI applications on edge devices near the data source rather than in a cloud computing environment. Although edge data has been used to make real-time inferences through predictive models, real-time machine learning has not yet been fully adopted. Real-time machine learning uses real-time data to learn on the go, enabling faster and more accurate real-time predictions and eliminating the need to store data, which mitigates privacy issues. In this article, we present the practical prospect of developing a physical threat detection system that uses real-time edge data from security cameras/sensors to improve the accuracy, efficiency, reliability, security, and privacy of the real-time inference model.
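The "learn on the go, store nothing" idea can be sketched with a minimal online learner: the model updates on each incoming sample and then discards it, so only the weights persist. The perceptron rule and the simulated sensor stream below are generic stand-ins, not the article's actual detection model.

```python
import numpy as np

rng = np.random.default_rng(3)
w = np.zeros(4)  # the only state the device retains

def stream_events(n):
    """Simulated sensor stream; label is +1 when feature 0 dominates."""
    for _ in range(n):
        x = rng.standard_normal(4)
        y = 1 if x[0] > x[1] else -1
        yield x, y

mistakes = 0
for x, y in stream_events(500):
    pred = 1 if w @ x >= 0 else -1
    if pred != y:
        w += y * x        # learn from the mistake...
        mistakes += 1
    # ...then drop the sample: no raw data is ever stored

print(mistakes, w.round(2))
```

Because raw frames or sensor readings never need to leave the device or be retained, this pattern addresses the privacy concern directly rather than bolting protection on afterwards.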
-
Abstract In 2020, the U.S. Department of Defense officially disclosed a set of ethical principles to guide the use of Artificial Intelligence (AI) technologies on future battlefields. Despite stark differences, there are core similarities between the military and medical service. Warriors on battlefields often face life-altering circumstances that require quick decision-making. Medical providers experience similar challenges in a rapidly changing healthcare environment, such as in the emergency department or during surgery treating a life-threatening condition. Generative AI, an emerging technology designed to efficiently generate valuable information, holds great promise. As computing power becomes more accessible and the abundance of health data, such as electronic health records, electrocardiograms, and medical images, increases, it is inevitable that healthcare will be revolutionized by this technology. Recently, generative AI has garnered a lot of attention in the medical research community, leading to debates about its application in the healthcare sector, mainly due to concerns about transparency and related issues. Meanwhile, questions around the potential exacerbation of health disparities due to modeling biases have raised notable ethical concerns regarding the use of this technology in healthcare. However, the ethical principles for generative AI in healthcare have been understudied. As a result, there are no clear solutions to address ethical concerns, and decision-makers often neglect to consider the significance of ethical principles before implementing generative AI in clinical practice. In an attempt to address these issues, we explore ethical principles from the military perspective and propose the “GREAT PLEA” ethical principles, namely Governability, Reliability, Equity, Accountability, Traceability, Privacy, Lawfulness, Empathy, and Autonomy for generative AI in healthcare. 
Furthermore, by contrasting the ethical concerns and risks of the two domains, we introduce a framework for adopting and expanding these ethical principles in a practical way, one that has proven useful in the military and can be applied to generative AI in healthcare. Ultimately, we aim to proactively address the ethical dilemmas and challenges posed by the integration of generative AI into healthcare practice.
-
Jiang, Yizhang (Ed.)
In the era of IoT and smart systems, an enormous amount of data will be generated from various IoT/smart devices in smart homes, smart cars, etc. Typically, this big data is collected and sent directly to the cloud infrastructure for processing, analyzing, and storing. However, traditional cloud infrastructure faces serious challenges when handling this massive amount of data, including insufficient bandwidth, high latency, unsatisfactory real-time response, high power consumption, and privacy protection issues. Edge-centric computing is emerging as a complementary solution to address the aforementioned issues of the cloud infrastructure. Furthermore, for many real-world IoT and smart systems, such as smart cars, real-time, in situ, and online data analysis and processing are crucial. With edge computing, data processing and analysis can be done closer to the source of the data (i.e., at the edge of the network), which in turn enables real-time and in situ data analytics and processing. As a result, edge computing will soon become the cornerstone of many IoT and smart applications. However, edge computing is still in its infancy and thus requires novel models and techniques to support real-time and in situ data processing and analysis. In this research work, we introduce novel and efficient computation models that are suitable for real-time processing and analysis on next-generation edge-computing platforms. Since most common edge-computing tasks are data analytics/mining, we focus on widely used data analytics techniques, including dimensionality reduction and classification techniques, specifically, principal component analysis (PCA) and support vector machine (SVM), respectively. This is mainly because the combination of PCA and SVM has been shown to lead to high classification accuracy in many fields.
In this paper, we introduce three different PCA+SVM models (i.e., Model 1, Model 2, and Model 3) for real-time processing and analysis (for online training and inference) on edge computing platforms. Model 1 and Model 2 are created using the same SVM algorithm but with different design/functional flows, whereas Model 3 follows the same functional flow as Model 2 but uses a modified SVM algorithm. Our experimental results and analysis demonstrate that Model 3 requires dramatically fewer iterations to produce results than the other two models, while achieving acceptable performance. Our results and analysis demonstrate that Model 3 is the most suitable computation model for real-time processing and analysis on edge computing platforms.
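The generic PCA+SVM pipeline that the three models build on can be sketched end to end in numpy: project the data onto its leading principal components, then train a linear SVM on the reduced representation. This is an assumed illustration using the Pegasos subgradient method, not a reproduction of the paper's Models 1-3; the toy data and all parameter choices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)

def pca_fit(X, k):
    """Return the mean and top-k principal components of X."""
    mean = X.mean(axis=0)
    cov = np.cov((X - mean).T)
    vals, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    comps = vecs[:, np.argsort(vals)[::-1][:k]]
    return mean, comps

def svm_train(X, y, lam=0.01, epochs=20):
    """Linear SVM via Pegasos: stochastic subgradient descent on hinge loss."""
    w, t = np.zeros(X.shape[1]), 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            if y[i] * (w @ X[i]) < 1:         # inside the margin: hinge active
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:
                w = (1 - eta * lam) * w
    return w

# Toy data in 10-D whose signal lives in a high-variance first feature,
# so PCA keeps it among the leading components.
X = rng.standard_normal((200, 10))
X[:, 0] *= 3.0
y = np.where(X[:, 0] > 0, 1, -1)

mean, comps = pca_fit(X, k=3)
Z = (X - mean) @ comps                        # reduced 3-D representation
w = svm_train(Z, y)
acc = np.mean(np.sign(Z @ w) == y)
print(acc)
```

Training in the reduced space is what makes the combination attractive on edge hardware: the SVM touches 3 features per sample instead of 10, and the same idea scales to much higher-dimensional sensor data.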