
Foundation-Model-Driven Skin Lesion Segmentation and Classification Using SAM-Adapters and Vision Transformers

Background: The precise segmentation and classification of dermoscopic images remain prominent obstacles in automated skin cancer evaluation, owing in part to lesion variability, low-contrast borders, and background artifacts. Recent foundation models, most notably the Segment Anything Model (SAM), exhibit strong generalization potential but require domain-specific adaptation to function effectively in medical imaging. Newer architectures, particularly Vision Transformers (ViTs), broaden the means of implementing robust lesion identification; however, their strengths are limited without spatial priors. Methods: The proposed study lays out an integrated foundation-model-based framework that combines SAM-Adapter fine-tuning for lesion segmentation with a ViT-based classifier incorporating segmentation-derived lesion-specific cropping and cross-attention fusion. The SAM encoder is kept frozen while only the lightweight adapters are fine-tuned, introducing skin-surface-specific capacity. Segmentation priors are incorporated during the classification stage through fusion with image patch embeddings, enabling lesion-centric reasoning. The entire pipeline is trained with a joint multi-task approach on the ISIC 2018, HAM10000, and PH2 datasets. Results: In extensive experiments, the proposed method outperforms state-of-the-art segmentation and classification approaches across all datasets. On ISIC 2018, it achieves a Dice score of 94.27% for segmentation and a classification accuracy of 95.88%. On PH2, it reaches a Dice score of 95.62%, and on HAM10000, an accuracy of 96.37%. Ablation analyses confirm that the SAM-Adapters, lesion-specific cropping, and cross-attention fusion each contribute substantially to performance.
Paired t-tests confirm statistical significance for all the stated measures: improvements over strong baselines reach p < 0.01 for most comparisons, with large effect sizes. Conclusions: The results indicate that combining segmentation priors from foundation models with transformer-based classification consistently improves both lesion-boundary quality and diagnostic accuracy. The proposed SAM-ViT framework thus delivers robust, generalizable, lesion-centric automated dermoscopic analysis and represents a promising step toward a clinically deployable skin cancer decision-support system. Next steps include model compression, improved pseudo-mask refinement, and evaluation on real-world multi-center clinical cohorts.
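As a reminder of the headline segmentation metric, the Dice score reported above can be sketched in a few lines of plain Python (an illustration of the standard formula, not the authors' implementation):

```python
def dice_score(pred, target):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks,
    given as flat lists of 0/1 pixel labels."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * intersection / total

# Toy 4-pixel masks: one overlapping pixel, 2 predicted + 1 true pixels
print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))  # → 0.666...
```

A Dice score of 94.27% therefore corresponds to near-complete overlap between predicted and ground-truth lesion masks.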
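The significance testing mentioned above (paired t-tests on per-image metrics) reduces to the paired t statistic; a minimal plain-Python sketch, purely illustrative and independent of the authors' evaluation code:

```python
import math

def paired_t_statistic(a, b):
    """Paired t statistic for matched samples a and b
    (e.g. per-image Dice scores of two competing models)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # unbiased sample variance of the paired differences
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)

# Hypothetical per-image scores for two models (n = 3 for illustration)
t = paired_t_statistic([0.95, 0.93, 0.96], [0.91, 0.90, 0.92])  # ≈ 11.0
```

The resulting t value is then compared against the t distribution with n − 1 degrees of freedom to obtain the p-value (p < 0.01 in the abstract's comparisons).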

Related Results

Depth-aware salient object segmentation
Object segmentation is an important task which is widely employed in many computer vision applications such as object detection, tracking, recognition, and ret...
AI‐enabled precise brain tumor segmentation by integrating Refinenet and contour‐constrained features in MRI images
Abstract Background: Medical image segmentation is a fundamental task in medical image analysis and has been widely applied in multiple medical fields. The latest transformer‐based de...
Hydatid Disease of The Brain Parenchyma: A Systematic Review
Abstract Introduction: Isolated brain hydatid disease (BHD) is an extremely rare form of echinococcosis. A prompt and timely diagnosis is a crucial step in disease management. This ...
EPD Electronic Pathogen Detection v1
Electronic pathogen detection (EPD) is a non-invasive, rapid, affordable, point-of-care test for COVID-19 resulting from infection with the SARS-CoV-2 virus. EPD scanning techno...
Without the Right Adapters a Force Calibration Technician is Nothing Short of Being Called a Miracle Worker
Think about that for a minute. Would you want a surgeon to operate on you with kitchen utensils such as a serrated knife? Then why do some people (management cough) expect or ask t...
Multiple surface segmentation using novel deep learning and graph based methods
The task of automatically segmenting 3-D surfaces representing object boundaries is important in quantitative analysis of volumetric images, which plays a vital role in nu...
2AM: Weakly Supervised Tumor Segmentation in Pathology via CAM and SAM Synergy
Tumor microenvironment (TME) analysis plays an extremely important role in computational pathology. Deep learning shows tremendous potential for tumor tissue segmentation on pathol...
On the Remote Calibration of Instrumentation Transformers: Influence of Temperature
The remote calibration of instrumentation transformers is theoretically possible using synchronous measurements across a transmission line with a known impedance and a local set of...
