Results for "image-feature-extraction"
100 matches found.
facebook/dinov2-small
The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fash...
google/vit-base-patch16-224-in21k
The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, ...
facebook/dinov2-base
The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fash...
facebook/dinov2-large
The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fash...
facebook/dinov3-vitb16-pretrain-lvd1689m
These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...
facebook/dinov3-vitl16-pretrain-lvd1689m
These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...
timm/vit_small_patch14_reg4_dinov2.lvd142m
Model Type: Image classification / feature backbone - Model Stats: - Params (M): 22.1 - GMACs: 29.6 - Activations (M): 57.5 - Image size: 51...
timm/vit_base_patch14_dinov2.lvd142m
Model Type: Image classification / feature backbone - Model Stats: - Params (M): 86.6 - GMACs: 151.7 - Activations (M): 397.6 - Image size: ...
facebook/dino-vitb16
The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fash...
nomic-ai/nomic-embed-vision-v1.5
No description available.
timm/vit_large_patch14_reg4_dinov2.lvd142m
Model Type: Image classification / feature backbone - Model Stats: - Params (M): 304.4 - GMACs: 416.1 - Activations (M): 305.3 - Image size:...
timm/vit_base_patch14_reg4_dinov2.lvd142m
Model Type: Image classification / feature backbone - Model Stats: - Params (M): 86.6 - GMACs: 117.5 - Activations (M): 115.0 - Image size: ...
timm/vit_base_patch16_clip_224.openai
The CLIP model was developed by researchers at OpenAI to learn about what contributes to robustness in computer vision tasks. The model was ...
facebook/dinov3-vits16-pretrain-lvd1689m
These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...
timm/vit_base_patch16_dinov3.lvd1689m
Model Type: Image Feature Encoder - Model Stats: - Params (M): 85.6 - GMACs: 23.6 - Activations (M): 34.1 - Image size: 256 x 256 - Original...
facebook/dinov2-giant
The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fash...
timm/vit_large_patch16_dinov3.lvd1689m
Model Type: Image Feature Encoder - Model Stats: - Params (M): 303.1 - GMACs: 82.4 - Activations (M): 90.6 - Image size: 256 x 256 - Origina...
facebook/dinov3-vith16plus-pretrain-lvd1689m
These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...
facebook/dino-vits16
The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a self-supervised fash...
facebook/dinov2-with-registers-base
The Vision Transformer (ViT) is a transformer encoder model (BERT-like) originally introduced to do supervised image classification on Image...
paige-ai/Virchow2
Developed by: Paige, NYC, USA and Microsoft Research, Cambridge, MA USA - Model Type: Image feature backbone - Model Stats: - Params (M): 63...
timm/vit_large_patch14_dinov2.lvd142m
Model Type: Image classification / feature backbone - Model Stats: - Params (M): 304.4 - GMACs: 507.1 - Activations (M): 1058.8 - Image size...
microsoft/rad-dino
RAD-DINO is described in detail in Exploring Scalable Medical Image Encoders Beyond Text Supervision (F. Pérez-García, H. Sharma, S. Bond-Ta...
timm/vit_small_patch14_dinov2.lvd142m
Model Type: Image classification / feature backbone - Model Stats: - Params (M): 22.1 - GMACs: 46.8 - Activations (M): 198.8 - Image size: 5...
timm/samvit_base_patch16.sa1b
Model Type: Image classification / feature backbone - Model Stats: - Params (M): 89.7 - GMACs: 486.4 - Activations (M): 1343.3 - Image size:...
facebook/dinov3-vits16plus-pretrain-lvd1689m
These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...
timm/vit_base_patch16_224.orig_in21k
Model Type: Image classification / feature backbone - Model Stats: - Params (M): 85.8 - GMACs: 16.9 - Activations (M): 16.5 - Image size: 22...
google/vit-large-patch16-224-in21k
The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, ...
timm/vit_small_patch16_dinov3.lvd1689m
Model Type: Image Feature Encoder - Model Stats: - Params (M): 21.6 - GMACs: 6.3 - Activations (M): 17.0 - Image size: 256 x 256 - Original:...
prov-gigapath/prov-gigapath
Overview of Prov-GigaPath model architecture...
timm/convnext_tiny.dinov3_lvd1689m
Model Type: Image classification / feature backbone - Model Stats: - Params (M): 27.8 - GMACs: 4.5 - Activations (M): 13.4 - Image size: 224...
MahmoodLab/TITAN
Developed by: Mahmood Lab AI for Pathology @ Harvard/BWH - Model type: Pretrained vision-language encoders - Pretraining dataset: Mass-340K,...
MahmoodLab/UNI2-h
Developed by: Mahmood Lab AI for Pathology @ Harvard/BWH - Model type: Pretrained vision backbone (ViT-H/14 via DINOv2) for multi-purpose ev...
facebook/dinov3-convnext-base-pretrain-lvd1689m
These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...
MahmoodLab/CONCH
No description available.
timm/vit_base_patch16_224.dino
Model Type: Image classification / feature backbone - Model Stats: - Params (M): 85.8 - GMACs: 16.9 - Activations (M): 16.5 - Image size: 22...
TTPlanet/TTPLanet_SDXL_Controlnet_Tile_Realistic
Here's a refined version of the update notes for the Tile V2: -Introducing the new Tile V2, enhanced with a vastly improved training dataset...
google/vit-huge-patch14-224-in21k
The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, ...
StanfordAIMI/dinov2-base-xray-224
AIMI FMs: A Collection of Foundation Models in Radiology...
timm/vit_large_patch16_siglip_256.v2_webli
Dataset: webli - Papers: - SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Fea...
histai/hibou-L
No description available.
facebook/dinov3-convnext-small-pretrain-lvd1689m
These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...
timm/vit_base_patch16_siglip_512.v2_webli
Dataset: webli - Papers: - SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Fea...
PIA-SPACE-LAB/dinov3-vitl-pretrain-lvd1689m
These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...
facebook/dinov3-convnext-tiny-pretrain-lvd1689m
These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...
timm/vit_small_patch16_224.dino
Model Type: Image classification / feature backbone - Model Stats: - Params (M): 21.7 - GMACs: 4.3 - Activations (M): 8.2 - Image size: 224 ...
timm/vit_large_patch16_dinov3.sat493m
Model Type: Image Feature Encoder - Model Stats: - Params (M): 303.1 - GMACs: 82.4 - Activations (M): 90.6 - Image size: 256 x 256 - Origina...
timm/vit_so400m_patch16_siglip_512.v2_webli
Dataset: webli - Papers: - SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Fea...
timm/convnextv2_tiny.fcmae
Model Type: Image classification / feature backbone - Model Stats: - Params (M): 27.9 - GMACs: 4.5 - Activations (M): 13.4 - Image size: 224...
nvidia/RADIO-L
No description available.
bioptimus/H-optimus-0
- image-feature-extraction - timm - pathology - histology - medical imaging - self-supervised learning - vision transformer - foundation mod...
timm/convnext_small.dinov3_lvd1689m
Model Type: Image classification / feature backbone - Model Stats: - Params (M): 49.5 - GMACs: 8.7 - Activations (M): 21.6 - Image size: 224...
timm/naflexvit_so400m_patch16_siglip.v2_webli
No description available.
Lin-Chen/ShareGPT4V-7B_Pretrained_vit-large336-l12
Model type: This is the vision tower of ShareGPT4V-7B fine-tuned with our ShareGPT4V dataset. Model date: This vision tower was trained in N...
py-feat/img2pose
img2pose uses Faster R-CNN to predict 6 Degree of Freedom Pose (DoF) for all faces in the photo. An interesting property of this model is th...
py-feat/resmasknet
resmasknet combines residual masking with unet architecture to predict 7 facial emotion categories from images....
timm/aimv2_large_patch14_224.apple_pt_dist
No description available.
facebook/dinov2-with-registers-large
The Vision Transformer (ViT) is a transformer encoder model (BERT-like) originally introduced to do supervised image classification on Image...
MahmoodLab/UNI
Developed by: Mahmood Lab AI for Pathology @ Harvard/BWH - Model type: Pretrained vision backbone (ViT-L/16 via DINOv2) for multi-purpose ev...
timm/vit_base_patch16_224.mae
Model Type: Image classification / feature backbone - Model Stats: - Params (M): 85.8 - GMACs: 17.6 - Activations (M): 23.9 - Image size: 22...
DAMO-NLP-SG/VL3-SigLIP-NaViT
No description available.
microsoft/rad-dino-maira-2
RAD-DINO-MAIRA-2 is a vision transformer model trained to encode chest X-rays using the self-supervised learning method DINOv2. RAD-DINO-MAI...
timm/vit_base_patch16_siglip_256.v2_webli
Dataset: webli - Papers: - SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Fea...
owkin/phikon-v2
Developed by: Owkin, Inc - Model type: Pretrained vision backbone (ViT-L/16 via DINOv2) - Pretraining dataset: PANCAN-XL, sourced from publi...
timm/vit_so400m_patch14_siglip_384.webli
No description available.
timm/vit_small_plus_patch16_dinov3.lvd1689m
Model Type: Image Feature Encoder - Model Stats: - Params (M): 28.7 - GMACs: 8.1 - Activations (M): 21.8 - Image size: 256 x 256 - Original:...
Xenova/dinov2-small
No description available.
timm/convnext_base.dinov3_lvd1689m
Model Type: Image classification / feature backbone - Model Stats: - Params (M): 87.6 - GMACs: 15.4 - Activations (M): 28.8 - Image size: 22...
facebook/dinov3-vitl16-pretrain-sat493m
These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...
timm/vit_so400m_patch16_siglip_256.v2_webli
Dataset: webli - Papers: - SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Fea...
timm/vit_small_patch8_224.dino
Model Type: Image classification / feature backbone - Model Stats: - Params (M): 21.7 - GMACs: 16.8 - Activations (M): 32.9 - Image size: 22...
facebook/dinov2-with-registers-small
The Vision Transformer (ViT) is a transformer encoder model (BERT-like) originally introduced to do supervised image classification on Image...
facebook/dinov3-vit7b16-pretrain-lvd1689m
These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...
timm/vit_base_patch16_siglip_224.v2_webli
Dataset: webli - Papers: - SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Fea...
timm/vit_huge_patch14_clip_224.laion2b
No description available.
paige-ai/Virchow
Developed by: Paige, NYC, USA and Microsoft Research, Cambridge, MA USA - Model Type: Image feature backbone - Model Stats: - Params (M): 63...
gwkrsrch2/siglip2-so400m-patch16-384
No description available.
timm/eva02_base_patch14_224.mim_in22k
Model Type: Image classification / feature backbone - Model Stats: - Params (M): 85.8 - GMACs: 23.2 - Activations (M): 36.6 - Image size: 22...
timm/vit_base_patch16_dinov3_qkvb.lvd1689m
Model Type: Image Feature Encoder - Model Stats: - Params (M): 85.7 - GMACs: 23.6 - Activations (M): 34.1 - Image size: 256 x 256 - Original...
timm/vit_so400m_patch14_siglip_224.v2_webli
Dataset: webli - Papers: - SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Fea...
timm/vit_large_patch16_224.mae
Model Type: Image classification / feature backbone - Model Stats: - Params (M): 303.3 - GMACs: 61.6 - Activations (M): 63.5 - Image size: 2...
timm/vit_7b_patch16_dinov3.lvd1689m
Model Type: Image Feature Encoder - Model Stats: - Params (M): 6716.0 - GMACs: 1775.1 - Activations (M): 515.9 - Image size: 256 x 256 - Ori...
timm/convnextv2_base.fcmae
Model Type: Image classification / feature backbone - Model Stats: - Params (M): 87.7 - GMACs: 15.4 - Activations (M): 28.8 - Image size: 22...
yujiepan/tiny-random-swin-patch4-window7-224
No description available.
timm/vit_huge_plus_patch16_dinov3.lvd1689m
Model Type: Image Feature Encoder - Model Stats: - Params (M): 840.5 - GMACs: 224.9 - Activations (M): 193.6 - Image size: 256 x 256 - Origi...
bioptimus/H-optimus-1
- image-feature-extraction - timm - pathology - histology - medical imaging - self-supervised learning - vision transformer - foundation mod...
timm/vit_base_patch8_224.dino
Model Type: Image classification / feature backbone - Model Stats: - Params (M): 85.8 - GMACs: 66.9 - Activations (M): 65.7 - Image size: 22...
timm/sam2_hiera_small.fb_r896_2pt1
No description available.
facebook/dinov3-convnext-large-pretrain-lvd1689m
These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...
google/vit-base-patch32-224-in21k
The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, ...
hi-wesley/gemma3-vision-encoder
No description available.
camenduru/dinov3-vitl16-pretrain-lvd1689m
These are Vision Transformer and ConvNeXt models trained following the method described in the DINOv3 paper. 12 models are provided: - 10 mo...
facebook/ijepa_vith14_1k
No description available.
timm/convnext_large_mlp.clip_laion2b_ft_soup_320
No description available.
timm/vit_small_patch16_dinov3_qkvb.lvd1689m
Model Type: Image Feature Encoder - Model Stats: - Params (M): 21.6 - GMACs: 6.3 - Activations (M): 17.0 - Image size: 256 x 256 - Original:...
bioptimus/H0-mini
- image-feature-extraction - timm - pathology - histology - medical imaging - self-supervised learning - vision transformer - foundation mod...
timm/convnext_large.dinov3_lvd1689m
Model Type: Image classification / feature backbone - Model Stats: - Params (M): 196.2 - GMACs: 34.4 - Activations (M): 43.1 - Image size: 2...
Xenova/dino-vits16
No description available.
timm/vit_base_patch32_clip_224.laion2b
No description available.
OpenGVLab/InternViT-300M-448px
Model Type: vision foundation model, feature backbone - Model Stats: - Params (M): 304 - Image size: 448 x 448, training with 1 - 12 tiles -...
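Checkpoints tagged with this task can typically be loaded through the `transformers` image-feature-extraction pipeline. A minimal sketch, assuming `transformers`, `torch`, and `Pillow` are installed; `facebook/dinov2-small` is just one entry from the list above (any compatible checkpoint works), and its weights are downloaded from the Hub on first use:

```python
# Minimal sketch: extract image embeddings with a model from the results above.
# Assumes transformers, torch, and Pillow are installed; the checkpoint is
# fetched from the Hugging Face Hub on first use.
from PIL import Image
from transformers import pipeline

extractor = pipeline(
    task="image-feature-extraction",
    model="facebook/dinov2-small",  # swap in any checkpoint listed above
)

# Stand-in for a real image; replace with Image.open("photo.jpg") in practice.
image = Image.new("RGB", (224, 224))

# Returns nested lists of floats: [batch, num_tokens, hidden_dim].
features = extractor(image)
print(len(features[0]), len(features[0][0]))  # token count, embedding width
```

For `timm/`-prefixed checkpoints, the equivalent route is `timm.create_model(name, pretrained=True, num_classes=0)`, which drops the classifier head and returns pooled features from `forward()`.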