Results for "reinforcement-learning"
100 matches found.
TianheWu/VisualQuality-R1-7B
No description available.
mradermacher/HER-32B-i1-GGUF
No description available.
HumanCompatibleAI/ppo-seals-CartPole-v0
No description available.
ValueFX9507/Tifa-Deepsex-14b-CoT-Q8
No description available.
ValueFX9507/Tifa-Deepsex-14b-CoT
No description available.
mradermacher/HER-32B-absolute-heresy-i1-GGUF
No description available.
PKU-Alignment/beaver-7b-v1.0-cost
The Beaver cost model is a preference model trained using the PKU-SafeRLHF dataset. It can play a role in the safe RLHF algorithm, helping t...
PKU-Alignment/beaver-7b-v1.0-reward
The Beaver reward model is a preference model trained using the PKU-SafeRLHF dataset. It can play a role in the safe RLHF algorithm, helping...
mradermacher/Arctic-AWM-8B-i1-GGUF
No description available.
HumanCompatibleAI/ppo-Pendulum-v1
No description available.
mradermacher/MetaphorStar-32B-i1-GGUF
No description available.
mradermacher/MediX-R1-8B-i1-GGUF
No description available.
mradermacher/Clado-BrowserOS-Action-i1-GGUF
No description available.
mradermacher/MediX-R1-2B-i1-GGUF
No description available.
mradermacher/drkernel-8b-i1-GGUF
No description available.
mradermacher/StarPO-4B-i1-GGUF
No description available.
edbeeching/decision-transformer-gym-hopper-medium
Tags: deep-reinforcement-learning, reinforcement-learning, decision-transformer, gym-continous-control...
mradermacher/Arctic-AWM-14B-i1-GGUF
No description available.
mradermacher/StarPO-1.7B-i1-GGUF
No description available.
mradermacher/flawed-fictions-gemma-3-4b-i1-GGUF
No description available.
mradermacher/flawed-fictions-qwen3-4b-i1-GGUF
No description available.
mradermacher/MetaphorStar-3B-i1-GGUF
No description available.
mradermacher/flawed-fictions-olmo-3-7b-i1-GGUF
No description available.
mradermacher/MediX-R1-30B-i1-GGUF
No description available.
ValueFX9507/Tifa-Deepsex-14b-CoT-GGUF-Q4
No description available.
mradermacher/NurseSim-Triage-Llama-3.2-3B-i1-GGUF
No description available.
jasonyandell/zeb-42
Architecture: Transformer encoder (pre-LN) + policy/value heads; Parameters: 556,970; Embedding dim: ...
mradermacher/drkernel-14b-i1-GGUF
No description available.
mradermacher/SEAD-14B-GGUF
No description available.
mradermacher/ASTRA-14B-Thinking-v1-GGUF
No description available.
mradermacher/IntentRL-Ambig-Text2SQL-4B-i1-GGUF
No description available.
mradermacher/Arctic-AWM-4B-i1-GGUF
No description available.
mradermacher/inf-query-aligner-GGUF
No description available.
ValueFX9507/Tifa-DeepsexV2-7b-MGRPO-GGUF-Q8
No description available.
mradermacher/HER-32B-ACL-GGUF
No description available.
mradermacher/StarPO-1.7B-GGUF
No description available.
mradermacher/FlowSteer-8b-GGUF
No description available.
mradermacher/drkernel-8b-GGUF
No description available.
ValueFX9507/Tifa-DeepsexV2-7b-MGRPO-GGUF-Q4
No description available.
rpharale/ppo-Huggy
A trained PPO agent that plays Huggy, built with the Unity ML-Agents Library. Documentation: https://github.com/huggingface/ml...
RLinf/RLinf-OpenVLAOFT-LIBERO-130-Base-Lora
The RLinf-openvlaoft-libero series is trained on RLinf/RLinf-OpenVLAOFT-LIBERO-xxx-Base-Lora (including libero90 and libero130) and Haozhan7...
mradermacher/StarPO-4B-GGUF
No description available.
mradermacher/Arctic-AWM-14B-GGUF
No description available.
mradermacher/Rotten.Llama-3.2-1B-i1-GGUF
No description available.
mradermacher/HER-32B-absolute-heresy-GGUF
No description available.
PKU-Alignment/beaver-7b-unified-reward
The Beaver reward model is a preference model trained using the PKU-SafeRLHF dataset. It can play a role in the safe RLHF algorithm, helping...
mradermacher/IntentRL-Ambig-Text2SQL-4B-GGUF
No description available.
marcogfedozzi/ppo-LunarLander-v2
No description available.
mradermacher/Clado-BrowserOS-Action-GGUF
No description available.
mradermacher/MediX-R1-8B-GGUF
No description available.
mradermacher/MetaphorStar-3B-GGUF
No description available.
mradermacher/eubiota-planner-8b-i1-GGUF
No description available.
mradermacher/drkernel-14b-GGUF
No description available.
PKU-Alignment/beaver-7b-unified-cost
The Beaver cost model is a preference model trained using the PKU-SafeRLHF dataset. It can play a role in the safe RLHF algorithm, helping t...
mradermacher/GRiP-i1-GGUF
No description available.
mradermacher/Arctic-AWM-8B-GGUF
No description available.
mradermacher/NurseSim-Triage-Llama-3.2-3B-GGUF
No description available.
sb3/ppo-CartPole-v1
No description available.
mradermacher/flawed-fictions-qwen3-4b-GGUF
No description available.
ValueFX9507/Tifa-DeepsexV3-14b-GGUF-Q6
No description available.
TaTo69/ppo-LunarLander-v3
No description available.
mradermacher/Arctic-AWM-4B-GGUF
No description available.
mradermacher/MediX-R1-2B-GGUF
No description available.
infly/inf-query-aligner
No description available.
mradermacher/Tifa-DeepsexV2-7b-MGRPO-safetensors-i1-GGUF
No description available.
mradermacher/Rotten.Llama-3.2-1B-GGUF
No description available.
mradermacher/Tifa-Deepsex-14b-CoT-GGUF
No description available.
mradermacher/MediX-R1-30B-GGUF
No description available.
mradermacher/flawed-fictions-gemma-3-4b-GGUF
No description available.
mradermacher/eubiota-planner-8b-GGUF
No description available.
cagataydev/sac-unitree-g1-mujoco
No description available.
ThomasSimonini/ppo-SpaceInvadersNoFrameskip-v4
No description available.
mradermacher/flawed-fictions-olmo-3-7b-GGUF
No description available.
mradermacher/beaver-7b-v2.0-GGUF
No description available.
ssssmark/Aes-R1
No description available.
ValueFX9507/Tifa-DeepsexV2-7b-MGRPO-GGUF-F16
No description available.
mradermacher/BeamPERL-GGUF
No description available.
nidek/ppo-SnowballTarget
A trained PPO agent that plays SnowballTarget, built with the Unity ML-Agents Library. Documentation: https://github.com/huggi...
mradermacher/R-PRM-7B-DPO-i1-GGUF
No description available.
mradermacher/VeriReason-Qwen2.5-1.5b-RTLCoder-Verilog-GRPO-reasoning-tb-GGUF
No description available.
cagataydev/sac-unitree-go2-mujoco
No description available.
Malgesw/ppo-Huggy
No description available.
mradermacher/VeriReason-Qwen2.5-7b-RTLCoder-Verilog-GRPO-reasoning-tb-i1-GGUF
No description available.
mradermacher/arc-teacher-8b-i1-GGUF
No description available.
mradermacher/LongWriter-Zero-32B-GGUF
No description available.
edbeeching/decision-transformer-gym-hopper-expert
Tags: deep-reinforcement-learning, reinforcement-learning, decision-transformer, gym-continous-control...
sb3/dqn-PongNoFrameskip-v4
No description available.
DocPereira/PEAL_V4_LHP_Zero_Entropy_Controlled
Languages: pt, en. License name: peal-v4-sovereign. License link: https://huggingface.co/DocPereira/PEALV4LHPZeroEntropyControlled/blob/main/ identif...
mradermacher/Orsta-7B-i1-GGUF
No description available.
talebzeghmi/ppo-SnowballTarget
No description available.
mradermacher/SpatialThinker-3B-GGUF
No description available.
infly/inf-retriever-v1-pro
No description available.
zzsi/swm-dmc-expert-policies
No description available.
mradermacher/inf-retriever-v1-pro-GGUF
No description available.
Open-Reasoner-Zero/Open-Reasoner-Zero-7B
No description available.
mradermacher/SpatialThinker-7B-i1-GGUF
No description available.
mradermacher/Tifa-Deepsex-14b-CoT-i1-GGUF
No description available.
mradermacher/InfiGUI-G1-3B-i1-GGUF
No description available.
0sunfire0/poca-SoccerTwos_00
No description available.
nicklashansen/newt
We open-source 200+ model checkpoints, including a multi-task Newt agent trained on 200 tasks simultaneously. We are excited to see what the...