# Depth Anything 3: DA3-GIANT

## Model Description
DA3 Giant model for multi-view depth estimation, camera pose estimation, and 3D Gaussian estimation. This is the flagship foundation model with unified depth-ray representation.
| Property | Value |
|----------|-------|
| Model Series | Any-view Model |
| Parameters | 1.15B |
| License | CC BY-NC 4.0 |
> ⚠️ **Non-commercial use only** due to the CC BY-NC 4.0 license.
## Capabilities

- Multi-view depth estimation
- Camera pose estimation
- 3D Gaussian estimation
## Quick Start

### Installation
```bash
git clone https://github.com/ByteDance-Seed/depth-anything-3
cd depth-anything-3
pip install -e .
```
### Basic Example

```python
import torch
from depth_anything_3.api import DepthAnything3

# Load model from Hugging Face Hub
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = DepthAnything3.from_pretrained("depth-anything/da3-giant")
model = model.to(device=device)

# Run inference on images
images = ["image1.jpg", "image2.jpg"]  # List of image paths, PIL Images, or numpy arrays
prediction = model.inference(
    images,
    export_dir="output",
    export_format="glb",  # Options: glb, npz, ply, mini_npz, gs_ply, gs_video
)

# Access results
print(prediction.depth.shape)       # Depth maps: [N, H, W] float32
print(prediction.conf.shape)        # Confidence maps: [N, H, W] float32
print(prediction.extrinsics.shape)  # Camera poses (w2c): [N, 3, 4] float32
print(prediction.intrinsics.shape)  # Camera intrinsics: [N, 3, 3] float32
```
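The helper below is not part of the `depth_anything_3` API; it is a sketch of how the documented outputs could be combined downstream, assuming the shapes and conventions stated above (`depth` is `[H, W]` per view, `intrinsics` is a pinhole `[3, 3]` matrix, and `extrinsics` is a world-to-camera `[R | t]` matrix):

```python
import numpy as np

def depth_to_world_points(depth, K, w2c):
    """Back-project one depth map to world-space points.

    depth: [H, W] depth map (one slice of prediction.depth)
    K:     [3, 3] intrinsics (one slice of prediction.intrinsics)
    w2c:   [3, 4] world-to-camera extrinsics [R | t]
    Returns: [H, W, 3] points in world coordinates.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))       # pixel grid, shape [H, W]
    pix = np.stack([u, v, np.ones_like(u)], -1).astype(np.float32)
    cam = (pix @ np.linalg.inv(K).T) * depth[..., None]  # K^-1 [u, v, 1] scaled by depth
    R, t = w2c[:, :3], w2c[:, 3]
    return (cam - t) @ R                                 # x_world = R^T (x_cam - t)

# Sanity check with an identity camera at the origin and a constant depth of 2.0
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]], dtype=np.float32)
w2c = np.hstack([np.eye(3), np.zeros((3, 1))]).astype(np.float32)
pts = depth_to_world_points(np.full((48, 64), 2.0, np.float32), K, w2c)
print(pts[24, 32])  # pixel at the principal point lies on the optical axis: [0. 0. 2.]
```

The same per-view unprojection, applied across all `N` views with their respective `extrinsics`, yields a fused multi-view point cloud.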
### Command Line Interface

```bash
# Process images with auto mode
da3 auto path/to/images \
  --export-format glb \
  --export-dir output \
  --model-dir depth-anything/da3-giant

# Use backend for faster repeated inference
da3 backend --model-dir depth-anything/da3-giant
da3 auto path/to/images --export-format glb --use-backend
```
## Model Details

### Key Insights

- 💎 A single plain transformer (e.g., a vanilla DINO encoder) is sufficient as a backbone; no architectural specialization is needed.
- ✨ A single depth-ray representation obviates the need for complex multi-task learning.
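To make the depth-ray idea concrete: a camera's intrinsics already assign a ray direction to every pixel, so a depth map plus a per-pixel ray map determines the scene geometry. The snippet below is our illustration of such a ray map (the paper's exact parameterization may differ):

```python
import numpy as np

def unit_ray_map(K, H, W):
    """Per-pixel unit ray directions in camera coordinates: normalize(K^-1 [u, v, 1])."""
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], -1).astype(np.float32)
    d = pix @ np.linalg.inv(K).T           # un-normalized ray per pixel
    return d / np.linalg.norm(d, axis=-1, keepdims=True)

K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 24.0], [0.0, 0.0, 1.0]])
rays = unit_ray_map(K, 48, 64)
print(rays.shape)  # (48, 64, 3)
```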
### Performance

🏆 Depth Anything 3 significantly outperforms prior state-of-the-art models. For detailed benchmarks, please refer to our paper.
## Limitations

- Licensed for non-commercial use only (CC BY-NC 4.0).
## Citation

If you find Depth Anything 3 useful in your research or projects, please cite:
```bibtex
@article{depthanything3,
  title={Depth Anything 3: Recovering the visual space from any views},
  author={Haotong Lin and Sili Chen and Jun Hao Liew and Donny Y. Chen and Zhenyu Li and Guang Shi and Jiashi Feng and Bingyi Kang},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2025}
}
```
## Links
## Authors

Haotong Lin · Sili Chen · Jun Hao Liew · Donny Y. Chen · Zhenyu Li · Guang Shi · Jiashi Feng · Bingyi Kang
## Files & Weights

| Filename | Size |
|---|---|
| model.safetensors | 5.05 GB |