Results for "image-feature-extraction"

100 matches found.
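Any of the transformers-compatible checkpoints below can typically be loaded for feature extraction with `AutoImageProcessor` and `AutoModel`. A minimal sketch, assuming `facebook/dinov2-small` (the top result) and using a blank in-memory image as a stand-in for a real photo:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

# A flat gray RGB image stands in for a real photo here.
image = Image.new("RGB", (224, 224), color=(128, 128, 128))

processor = AutoImageProcessor.from_pretrained("facebook/dinov2-small")
model = AutoModel.from_pretrained("facebook/dinov2-small")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state: (batch, 1 CLS token + patch tokens, hidden_size)
features = outputs.last_hidden_state
```

The CLS token (`features[:, 0]`) is a common choice for a single image-level embedding; the patch tokens support dense tasks such as segmentation or matching.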

facebook/dinov2-small

The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fash...

πŸ”Ž image-feature-extraction 2,794,873

google/vit-base-patch16-224-in21k

The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, ...

πŸ”Ž image-feature-extraction 1,218,566

facebook/dinov2-base

The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fash...

πŸ”Ž image-feature-extraction 1,132,921

facebook/dinov2-large

The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fash...

πŸ”Ž image-feature-extraction 890,681

facebook/dinov3-vitb16-pretrain-lvd1689m

These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...

πŸ”Ž image-feature-extraction 653,727

facebook/dinov3-vitl16-pretrain-lvd1689m

These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...

πŸ”Ž image-feature-extraction 577,514

timm/vit_small_patch14_reg4_dinov2.lvd142m

Model Type: Image classification / feature backbone - Model Stats: - Params (M): 22.1 - GMACs: 29.6 - Activations (M): 57.5 - Image size: 51...

πŸ”Ž image-feature-extraction 444,794

timm/vit_base_patch14_dinov2.lvd142m

Model Type: Image classification / feature backbone - Model Stats: - Params (M): 86.6 - GMACs: 151.7 - Activations (M): 397.6 - Image size: ...

πŸ”Ž image-feature-extraction 384,381

facebook/dino-vitb16

The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fash...

πŸ”Ž image-feature-extraction 373,964

nomic-ai/nomic-embed-vision-v1.5

No description available.

πŸ”Ž image-feature-extraction 318,151

timm/vit_large_patch14_reg4_dinov2.lvd142m

Model Type: Image classification / feature backbone - Model Stats: - Params (M): 304.4 - GMACs: 416.1 - Activations (M): 305.3 - Image size:...

πŸ”Ž image-feature-extraction 173,929

timm/vit_base_patch14_reg4_dinov2.lvd142m

Model Type: Image classification / feature backbone - Model Stats: - Params (M): 86.6 - GMACs: 117.5 - Activations (M): 115.0 - Image size: ...

πŸ”Ž image-feature-extraction 158,416

timm/vit_base_patch16_clip_224.openai

The CLIP model was developed by researchers at OpenAI to learn about what contributes to robustness in computer vision tasks. The model was ...

πŸ”Ž image-feature-extraction 155,191

facebook/dinov3-vits16-pretrain-lvd1689m

These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...

πŸ”Ž image-feature-extraction 143,268

timm/vit_base_patch16_dinov3.lvd1689m

Model Type: Image Feature Encoder - Model Stats: - Params (M): 85.6 - GMACs: 23.6 - Activations (M): 34.1 - Image size: 256 x 256 - Original...

πŸ”Ž image-feature-extraction 140,987

facebook/dinov2-giant

The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fash...

πŸ”Ž image-feature-extraction 139,919

timm/vit_large_patch16_dinov3.lvd1689m

Model Type: Image Feature Encoder - Model Stats: - Params (M): 303.1 - GMACs: 82.4 - Activations (M): 90.6 - Image size: 256 x 256 - Origina...

πŸ”Ž image-feature-extraction 118,450

facebook/dinov3-vith16plus-pretrain-lvd1689m

These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...

πŸ”Ž image-feature-extraction 109,803

facebook/dino-vits16

The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fash...

πŸ”Ž image-feature-extraction 102,934

facebook/dinov2-with-registers-base

The Vision Transformer (ViT) is a transformer encoder model (BERT-like) originally introduced to do supervised image classification on Image...

πŸ”Ž image-feature-extraction 92,500

paige-ai/Virchow2

Developed by: Paige, NYC, USA and Microsoft Research, Cambridge, MA USA - Model Type: Image feature backbone - Model Stats: - Params (M): 63...

πŸ”Ž image-feature-extraction 90,934

timm/vit_large_patch14_dinov2.lvd142m

Model Type: Image classification / feature backbone - Model Stats: - Params (M): 304.4 - GMACs: 507.1 - Activations (M): 1058.8 - Image size...

πŸ”Ž image-feature-extraction 84,577

microsoft/rad-dino

RAD-DINO is described in detail in Exploring Scalable Medical Image Encoders Beyond Text Supervision (F. PΓ©rez-GarcΓ­a, H. Sharma, S. Bond-Ta...

πŸ”Ž image-feature-extraction 83,274

timm/vit_small_patch14_dinov2.lvd142m

Model Type: Image classification / feature backbone - Model Stats: - Params (M): 22.1 - GMACs: 46.8 - Activations (M): 198.8 - Image size: 5...

πŸ”Ž image-feature-extraction 77,131

timm/samvit_base_patch16.sa1b

Model Type: Image classification / feature backbone - Model Stats: - Params (M): 89.7 - GMACs: 486.4 - Activations (M): 1343.3 - Image size:...

πŸ”Ž image-feature-extraction 76,783

facebook/dinov3-vits16plus-pretrain-lvd1689m

These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...

πŸ”Ž image-feature-extraction 63,664

timm/vit_base_patch16_224.orig_in21k

Model Type: Image classification / feature backbone - Model Stats: - Params (M): 85.8 - GMACs: 16.9 - Activations (M): 16.5 - Image size: 22...

πŸ”Ž image-feature-extraction 63,415

google/vit-large-patch16-224-in21k

The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, ...

πŸ”Ž image-feature-extraction 59,525

timm/vit_small_patch16_dinov3.lvd1689m

Model Type: Image Feature Encoder - Model Stats: - Params (M): 21.6 - GMACs: 6.3 - Activations (M): 17.0 - Image size: 256 x 256 - Original:...

πŸ”Ž image-feature-extraction 55,647

prov-gigapath/prov-gigapath

Overview of Prov-GigaPath model architecture...

πŸ”Ž image-feature-extraction 54,451

timm/convnext_tiny.dinov3_lvd1689m

Model Type: Image classification / feature backbone - Model Stats: - Params (M): 27.8 - GMACs: 4.5 - Activations (M): 13.4 - Image size: 224...

πŸ”Ž image-feature-extraction 47,206

MahmoodLab/TITAN

Developed by: Mahmood Lab AI for Pathology @ Harvard/BWH - Model type: Pretrained vision-language encoders - Pretraining dataset: Mass-340K,...

πŸ”Ž image-feature-extraction 46,464

MahmoodLab/UNI2-h

Developed by: Mahmood Lab AI for Pathology @ Harvard/BWH - Model type: Pretrained vision backbone (ViT-H/14 via DINOv2) for multi-purpose ev...

πŸ”Ž image-feature-extraction 45,846

facebook/dinov3-convnext-base-pretrain-lvd1689m

These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...

πŸ”Ž image-feature-extraction 44,724

MahmoodLab/CONCH

No description available.

πŸ”Ž image-feature-extraction 43,428

timm/vit_base_patch16_224.dino

Model Type: Image classification / feature backbone - Model Stats: - Params (M): 85.8 - GMACs: 16.9 - Activations (M): 16.5 - Image size: 22...

πŸ”Ž image-feature-extraction 42,860

TTPlanet/TTPLanet_SDXL_Controlnet_Tile_Realistic

Here's a refined version of the update notes for the Tile V2: -Introducing the new Tile V2, enhanced with a vastly improved training dataset...

πŸ”Ž image-feature-extraction 41,832

google/vit-huge-patch14-224-in21k

The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, ...

πŸ”Ž image-feature-extraction 40,376

StanfordAIMI/dinov2-base-xray-224

AIMI FMs: A Collection of Foundation Models in Radiology...

πŸ”Ž image-feature-extraction 39,752

timm/vit_large_patch16_siglip_256.v2_webli

Dataset: webli - Papers: - SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Fea...

πŸ”Ž image-feature-extraction 39,702

histai/hibou-L

No description available.

πŸ”Ž image-feature-extraction 38,858

facebook/dinov3-convnext-small-pretrain-lvd1689m

These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...

πŸ”Ž image-feature-extraction 34,800

timm/vit_base_patch16_siglip_512.v2_webli

Dataset: webli - Papers: - SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Fea...

πŸ”Ž image-feature-extraction 28,390

PIA-SPACE-LAB/dinov3-vitl-pretrain-lvd1689m

These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...

πŸ”Ž image-feature-extraction 27,970

facebook/dinov3-convnext-tiny-pretrain-lvd1689m

These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...

πŸ”Ž image-feature-extraction 27,034

timm/vit_small_patch16_224.dino

Model Type: Image classification / feature backbone - Model Stats: - Params (M): 21.7 - GMACs: 4.3 - Activations (M): 8.2 - Image size: 224 ...

πŸ”Ž image-feature-extraction 26,956

timm/vit_large_patch16_dinov3.sat493m

Model Type: Image Feature Encoder - Model Stats: - Params (M): 303.1 - GMACs: 82.4 - Activations (M): 90.6 - Image size: 256 x 256 - Origina...

πŸ”Ž image-feature-extraction 22,929

timm/vit_so400m_patch16_siglip_512.v2_webli

Dataset: webli - Papers: - SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Fea...

πŸ”Ž image-feature-extraction 22,359

timm/convnextv2_tiny.fcmae

Model Type: Image classification / feature backbone - Model Stats: - Params (M): 27.9 - GMACs: 4.5 - Activations (M): 13.4 - Image size: 224...

πŸ”Ž image-feature-extraction 20,596

nvidia/RADIO-L

No description available.

πŸ”Ž image-feature-extraction 20,006

bioptimus/H-optimus-0

- image-feature-extraction - timm - pathology - histology - medical imaging - self-supervised learning - vision transformer - foundation mod...

πŸ”Ž image-feature-extraction 19,876

timm/convnext_small.dinov3_lvd1689m

Model Type: Image classification / feature backbone - Model Stats: - Params (M): 49.5 - GMACs: 8.7 - Activations (M): 21.6 - Image size: 224...

πŸ”Ž image-feature-extraction 18,813

timm/naflexvit_so400m_patch16_siglip.v2_webli

No description available.

πŸ”Ž image-feature-extraction 18,510

Lin-Chen/ShareGPT4V-7B_Pretrained_vit-large336-l12

Model type: This is the vision tower of ShareGPT4V-7B fine-tuned with our ShareGPT4V dataset. Model date: This vision tower was trained in N...

πŸ”Ž image-feature-extraction 17,755

py-feat/img2pose

img2pose uses Faster R-CNN to predict 6 Degree of Freedom Pose (DoF) for all faces in the photo. An interesting property of this model is th...

πŸ”Ž image-feature-extraction 17,476

py-feat/resmasknet

resmasknet combines residual masking with unet architecture to predict 7 facial emotion categories from images....

πŸ”Ž image-feature-extraction 16,932

timm/aimv2_large_patch14_224.apple_pt_dist

No description available.

πŸ”Ž image-feature-extraction 16,862

facebook/dinov2-with-registers-large

The Vision Transformer (ViT) is a transformer encoder model (BERT-like) originally introduced to do supervised image classification on Image...

πŸ”Ž image-feature-extraction 16,758

MahmoodLab/UNI

Developed by: Mahmood Lab AI for Pathology @ Harvard/BWH - Model type: Pretrained vision backbone (ViT-L/16 via DINOv2) for multi-purpose ev...

πŸ”Ž image-feature-extraction 16,291

timm/vit_base_patch16_224.mae

Model Type: Image classification / feature backbone - Model Stats: - Params (M): 85.8 - GMACs: 17.6 - Activations (M): 23.9 - Image size: 22...

πŸ”Ž image-feature-extraction 15,941

DAMO-NLP-SG/VL3-SigLIP-NaViT

No description available.

πŸ”Ž image-feature-extraction 15,932

microsoft/rad-dino-maira-2

RAD-DINO-MAIRA-2 is a vision transformer model trained to encode chest X-rays using the self-supervised learning method DINOv2. RAD-DINO-MAI...

πŸ”Ž image-feature-extraction 15,462

timm/vit_base_patch16_siglip_256.v2_webli

Dataset: webli - Papers: - SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Fea...

πŸ”Ž image-feature-extraction 15,189

owkin/phikon-v2

Developed by: Owkin, Inc - Model type: Pretrained vision backbone (ViT-L/16 via DINOv2) - Pretraining dataset: PANCAN-XL, sourced from publi...

πŸ”Ž image-feature-extraction 14,813

timm/vit_so400m_patch14_siglip_384.webli

No description available.

πŸ”Ž image-feature-extraction 14,479

timm/vit_small_plus_patch16_dinov3.lvd1689m

Model Type: Image Feature Encoder - Model Stats: - Params (M): 28.7 - GMACs: 8.1 - Activations (M): 21.8 - Image size: 256 x 256 - Original:...

πŸ”Ž image-feature-extraction 14,037

Xenova/dinov2-small

No description available.

πŸ”Ž image-feature-extraction 13,213

timm/convnext_base.dinov3_lvd1689m

Model Type: Image classification / feature backbone - Model Stats: - Params (M): 87.6 - GMACs: 15.4 - Activations (M): 28.8 - Image size: 22...

πŸ”Ž image-feature-extraction 12,768

facebook/dinov3-vitl16-pretrain-sat493m

These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...

πŸ”Ž image-feature-extraction 12,478

timm/vit_so400m_patch16_siglip_256.v2_webli

Dataset: webli - Papers: - SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Fea...

πŸ”Ž image-feature-extraction 12,225

timm/vit_small_patch8_224.dino

Model Type: Image classification / feature backbone - Model Stats: - Params (M): 21.7 - GMACs: 16.8 - Activations (M): 32.9 - Image size: 22...

πŸ”Ž image-feature-extraction 12,066

facebook/dinov2-with-registers-small

The Vision Transformer (ViT) is a transformer encoder model (BERT-like) originally introduced to do supervised image classification on Image...

πŸ”Ž image-feature-extraction 11,868

facebook/dinov3-vit7b16-pretrain-lvd1689m

These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...

πŸ”Ž image-feature-extraction 11,671

timm/vit_base_patch16_siglip_224.v2_webli

Dataset: webli - Papers: - SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Fea...

πŸ”Ž image-feature-extraction 10,971

timm/vit_huge_patch14_clip_224.laion2b

No description available.

πŸ”Ž image-feature-extraction 10,935

paige-ai/Virchow

Developed by: Paige, NYC, USA and Microsoft Research, Cambridge, MA USA - Model Type: Image feature backbone - Model Stats: - Params (M): 63...

πŸ”Ž image-feature-extraction 10,092

gwkrsrch2/siglip2-so400m-patch16-384

No description available.

πŸ”Ž image-feature-extraction 9,465

timm/eva02_base_patch14_224.mim_in22k

Model Type: Image classification / feature backbone - Model Stats: - Params (M): 85.8 - GMACs: 23.2 - Activations (M): 36.6 - Image size: 22...

πŸ”Ž image-feature-extraction 8,498

timm/vit_base_patch16_dinov3_qkvb.lvd1689m

Model Type: Image Feature Encoder - Model Stats: - Params (M): 85.7 - GMACs: 23.6 - Activations (M): 34.1 - Image size: 256 x 256 - Original...

πŸ”Ž image-feature-extraction 8,241

timm/vit_so400m_patch14_siglip_224.v2_webli

Dataset: webli - Papers: - SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Fea...

πŸ”Ž image-feature-extraction 8,182

timm/vit_large_patch16_224.mae

Model Type: Image classification / feature backbone - Model Stats: - Params (M): 303.3 - GMACs: 61.6 - Activations (M): 63.5 - Image size: 2...

πŸ”Ž image-feature-extraction 7,581

timm/vit_7b_patch16_dinov3.lvd1689m

Model Type: Image Feature Encoder - Model Stats: - Params (M): 6716.0 - GMACs: 1775.1 - Activations (M): 515.9 - Image size: 256 x 256 - Ori...

πŸ”Ž image-feature-extraction 7,511

timm/convnextv2_base.fcmae

Model Type: Image classification / feature backbone - Model Stats: - Params (M): 87.7 - GMACs: 15.4 - Activations (M): 28.8 - Image size: 22...

πŸ”Ž image-feature-extraction 7,313

yujiepan/tiny-random-swin-patch4-window7-224

No description available.

πŸ”Ž image-feature-extraction 6,907

timm/vit_huge_plus_patch16_dinov3.lvd1689m

Model Type: Image Feature Encoder - Model Stats: - Params (M): 840.5 - GMACs: 224.9 - Activations (M): 193.6 - Image size: 256 x 256 - Origi...

πŸ”Ž image-feature-extraction 6,557

bioptimus/H-optimus-1

- image-feature-extraction - timm - pathology - histology - medical imaging - self-supervised learning - vision transformer - foundation mod...

πŸ”Ž image-feature-extraction 6,456

timm/vit_base_patch8_224.dino

Model Type: Image classification / feature backbone - Model Stats: - Params (M): 85.8 - GMACs: 66.9 - Activations (M): 65.7 - Image size: 22...

πŸ”Ž image-feature-extraction 6,327

timm/sam2_hiera_small.fb_r896_2pt1

No description available.

πŸ”Ž image-feature-extraction 6,290

facebook/dinov3-convnext-large-pretrain-lvd1689m

These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...

πŸ”Ž image-feature-extraction 6,006

google/vit-base-patch32-224-in21k

The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, ...

πŸ”Ž image-feature-extraction 5,964

hi-wesley/gemma3-vision-encoder

No description available.

πŸ”Ž image-feature-extraction 5,598

camenduru/dinov3-vitl16-pretrain-lvd1689m

These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...

πŸ”Ž image-feature-extraction 5,212

facebook/ijepa_vith14_1k

No description available.

πŸ”Ž image-feature-extraction 4,892

timm/convnext_large_mlp.clip_laion2b_ft_soup_320

No description available.

πŸ”Ž image-feature-extraction 4,824

timm/vit_small_patch16_dinov3_qkvb.lvd1689m

Model Type: Image Feature Encoder - Model Stats: - Params (M): 21.6 - GMACs: 6.3 - Activations (M): 17.0 - Image size: 256 x 256 - Original:...

πŸ”Ž image-feature-extraction 4,709

bioptimus/H0-mini

- image-feature-extraction - timm - pathology - histology - medical imaging - self-supervised learning - vision transformer - foundation mod...

πŸ”Ž image-feature-extraction 4,671

timm/convnext_large.dinov3_lvd1689m

Model Type: Image classification / feature backbone - Model Stats: - Params (M): 196.2 - GMACs: 34.4 - Activations (M): 43.1 - Image size: 2...

πŸ”Ž image-feature-extraction 4,602

Xenova/dino-vits16

No description available.

πŸ”Ž image-feature-extraction 4,554

timm/vit_base_patch32_clip_224.laion2b

No description available.

πŸ”Ž image-feature-extraction 4,077

OpenGVLab/InternViT-300M-448px

Model Type: vision foundation model, feature backbone - Model Stats: - Params (M): 304 - Image size: 448 x 448, training with 1 - 12 tiles -...

πŸ”Ž image-feature-extraction 4,002
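Whichever backbone is chosen, the resulting embeddings are usually compared with cosine similarity for retrieval or deduplication. A self-contained sketch with random stand-in vectors (384-dimensional, matching a ViT-S backbone; real embeddings would come from one of the models above):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

rng = np.random.default_rng(0)
gallery = rng.normal(size=(5, 384))  # stand-in for 5 gallery image embeddings
query = rng.normal(size=(1, 384))    # stand-in for 1 query image embedding

scores = cosine_similarity(query, gallery)  # shape (1, 5), values in [-1, 1]
best_match = int(scores.argmax())           # index of the most similar gallery image
```

For large galleries the same dot-product-on-normalized-vectors approach carries over directly to approximate nearest-neighbor libraries.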