Title: All-in-SAM: from Weak Annotation to Pixel-wise Nuclei Segmentation with Prompt-based Finetuning
The Segment Anything Model (SAM) is a recently proposed prompt-based model for generic zero-shot segmentation. With this zero-shot capability, SAM achieves impressive flexibility and precision across a variety of segmentation tasks. However, the current pipeline requires manual prompts during the inference stage, which remains resource-intensive for biomedical image segmentation. In this paper, instead of using prompts at inference, we introduce a pipeline, called all-in-SAM, that utilizes SAM throughout the entire AI development workflow (from annotation generation to model finetuning) without requiring manual prompts during the inference stage. Specifically, SAM is first employed to generate pixel-level annotations from weak prompts (e.g., points, bounding boxes). These pixel-level annotations are then used to finetune the SAM segmentation model rather than training it from scratch. Our experimental results reveal two key findings: 1) the proposed pipeline surpasses state-of-the-art (SOTA) methods on a nuclei segmentation task on the public MoNuSeg dataset, and 2) finetuning SAM with weak and few annotations achieves competitive performance compared to using strong pixel-wise annotated data.
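As a rough illustration of the first stage of this pipeline, the sketch below (Python, assuming Meta's segment-anything package and a ViT-B checkpoint) turns weak box prompts into pixel-level pseudo-masks with a frozen SAM. The checkpoint path, prompt format, and mask aggregation are illustrative assumptions, not the authors' released code.

    # Sketch: stage 1 of all-in-SAM, converting weak prompts into pixel-level pseudo-labels.
    # Assumes the `segment-anything` package; paths and the per-nucleus box format are illustrative.
    import numpy as np
    from segment_anything import sam_model_registry, SamPredictor

    def weak_prompts_to_pseudo_label(image, boxes, checkpoint="sam_vit_b_01ec64.pth"):
        """image: HxWx3 uint8 RGB patch; boxes: list of [x0, y0, x1, y1], one weak box per nucleus."""
        sam = sam_model_registry["vit_b"](checkpoint=checkpoint)
        predictor = SamPredictor(sam)
        predictor.set_image(image)                      # compute the image embedding once

        pseudo_label = np.zeros(image.shape[:2], dtype=np.uint8)
        for box in boxes:
            masks, _, _ = predictor.predict(            # one binary mask per weak box prompt
                box=np.asarray(box, dtype=np.float32),
                multimask_output=False,
            )
            pseudo_label |= masks[0].astype(np.uint8)   # merge into a pixel-wise annotation
        return pseudo_label

In the second stage the abstract describes, these pseudo-labels would stand in for manual pixel-wise masks when finetuning SAM (e.g., updating the mask decoder with a Dice or cross-entropy loss).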
Award ID(s):
2040462
PAR ID:
10447850
Author(s) / Creator(s):
Date Published:
Journal Name:
Asia Conference on Computers and Communications, ACCC
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The Segment Anything Model (SAM) was released as a foundation model for image segmentation. The promptable segmentation model was trained on over 1 billion masks from 11M licensed and privacy-respecting images, and it supports zero-shot image segmentation with various segmentation prompts (e.g., points, boxes, masks). This makes SAM attractive for medical image analysis, especially for digital pathology, where training data are scarce. In this study, we evaluate the zero-shot segmentation performance of the SAM model on representative segmentation tasks in whole slide imaging (WSI), including (1) tumor segmentation, (2) non-tumor tissue segmentation, and (3) cell nuclei segmentation. Core results: the zero-shot SAM model achieves remarkable segmentation performance for large connected objects. However, it does not consistently achieve satisfactory performance for dense instance object segmentation, even with 20 prompts (clicks/boxes) per image. We also summarize the identified limitations for digital pathology: (1) image resolution, (2) multiple scales, (3) prompt selection, and (4) model fine-tuning. In the future, few-shot fine-tuning with images from downstream pathological segmentation tasks might help the model achieve better performance in dense object segmentation.
  2. Cell counting in immunocytochemistry is vital for biomedical research, supporting the diagnosis and treatment of diseases such as neurological disorders, autoimmune conditions, and cancer. However, traditional counting methods are manual, time-consuming, and error-prone, while deep learning solutions require costly labeled datasets, limiting scalability. We introduce the Immunocytochemistry Dataset Cell Counting with Segment Anything Model (IDCC-SAM), a novel application of the Segment Anything Model (SAM), designed to adapt the model for zero-shot cell counting in fluorescent microscopic immunocytochemistry datasets. IDCC-SAM leverages Meta AI’s SAM, pre-trained on 11 million images, to eliminate the need for annotations, enhancing scalability and efficiency. Evaluated on three public datasets (IDCIA, ADC, and VGG), IDCC-SAM achieved the lowest Mean Absolute Error (26, 28, 52) on VGG and ADC and the highest Acceptable Absolute Error (28%, 26%, 33%) across all datasets, outperforming state-of-the-art supervised models like U-Net and Mask R-CNN, as well as zero-shot benchmarks like NP-SAM and SAM4Organoid. These results demonstrate IDCC-SAM’s potential to improve cell-counting accuracy while reducing reliance on specialized models and manual annotations. A minimal sketch of this zero-shot counting idea appears after this list.
  3. Nuclei segmentation is a fundamental task in histopathological image analysis. Typically, such segmentation tasks require significant effort to manually generate pixel-wise annotations for fully supervised training. To alleviate this manual effort, in this paper we propose a novel approach using points-only annotation. Two types of coarse labels with complementary information are derived from the point annotations and are then used to train a deep neural network. A fully-connected conditional random field loss is utilized to further refine the model without introducing extra computational complexity during inference. Experimental results on two nuclei segmentation datasets reveal that the proposed method achieves competitive performance compared to the fully supervised counterpart and state-of-the-art methods while requiring significantly less annotation effort. Our code is publicly available.
  4. We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions, e.g. an onion is peeled, diced and cooked - where we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife, pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. In total, we publicly release 272K manual semantic masks of 257 object classes, 9.9M interpolated dense masks, 67K hand-object relations, covering 36 hours of 179 untrimmed videos. Along with the annotations, we introduce three challenges in video object segmentation, interaction understanding and long-term reasoning. For data, code and leaderboards: http://epic-kitchens.github.io/VISOR 
  5. Tumor segmentation in medical imaging is critical for diagnosis, treatment planning, and prognosis, yet remains challenging due to limited annotated data, tumor heterogeneity, and modality-specific complexities in CT, MRI, and histopathology. Although the Segment Anything Model (SAM) shows promise as a zero-shot learner, it struggles with irregular tumor boundaries and domain-specific variations. We introduce the Adaptive Unified Segmentation Anything Model (AUSAM). This novel framework extends SAM’s capabilities for multi-modal tumor segmentation by integrating an intelligent prompt module, dynamic sampling, and stage-based thresholding. Specifically, clustering-based prompt learning (DBSCAN for CT/MRI and K-means for histopathology) adaptively allocates prompts to capture challenging tumor regions, while entropy-guided sampling and dynamic thresholding systematically reduce annotation requirements and computational overhead. Validated on diverse benchmarks—LiTS (CT), FLARE 2023 (CT/MRI), ORCA, and OCDC (histopathology)—AUSAM achieves state-of-the-art Dice Similarity Coefficients (DSC) of 94.25%, 91.84%, 87.59%, and 91.84%, respectively, with significantly reduced data usage. As the first framework to adapt SAM for multi-modal tumor segmentation, AUSAM sets a new standard for precision, scalability, and efficiency. It is offered in two variants: AUSAM-Lite for resource-constrained environments and AUSAM-Max for maximum segmentation accuracy, thereby advancing medical imaging and clinical decision-making. A sketch of the clustering-based prompt idea appears after this list.
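For item 2 above, a minimal sketch of the zero-shot counting idea using SAM's automatic mask generator (Python; the grid density and area filter are illustrative assumptions, not IDCC-SAM's actual post-processing):

    # Sketch: zero-shot cell counting by treating each automatically generated SAM mask as one cell.
    from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

    def count_cells(image, checkpoint="sam_vit_h_4b8939.pth", min_area=30):
        """image: HxWx3 uint8 RGB fluorescence micrograph; returns an estimated cell count."""
        sam = sam_model_registry["vit_h"](checkpoint=checkpoint)
        generator = SamAutomaticMaskGenerator(sam, points_per_side=32)  # dense point grid, no labels needed
        masks = generator.generate(image)        # list of dicts with 'segmentation', 'area', ...
        # Discard tiny masks (assumed debris); count the rest as cells.
        return sum(1 for m in masks if m["area"] >= min_area)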
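For item 5 above, a sketch of the clustering-based prompt idea in the spirit of AUSAM: cluster candidate foreground coordinates with K-means (the histopathology branch described in the abstract) and use the cluster centers as point prompts for SAM. The coarse foreground heuristic and the number of clusters are illustrative assumptions; entropy-guided sampling and stage-based thresholding are omitted.

    # Sketch: derive point prompts from a coarse tumor-likelihood map via K-means clustering.
    import numpy as np
    from sklearn.cluster import KMeans

    def prompts_from_coarse_map(coarse_prob, k=8, threshold=0.5):
        """coarse_prob: HxW likelihood map; returns up to k (x, y) point prompts for SAM."""
        ys, xs = np.where(coarse_prob > threshold)
        coords = np.stack([xs, ys], axis=1).astype(np.float32)
        if len(coords) <= k:
            return coords                         # too few candidates; use them all
        centers = KMeans(n_clusters=k, n_init=10).fit(coords).cluster_centers_
        # Each center would be passed as a positive point prompt (label 1) to SamPredictor.predict.
        return centers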