Results for "reinforcement-learning"

100 matches found.

TianheWu

TianheWu/VisualQuality-R1-7B

No description available.

🤖 reinforcement-learning 15,982
mradermacher

mradermacher/HER-32B-i1-GGUF

No description available.

🤖 reinforcement-learning 12,973
HumanCompatibleAI

HumanCompatibleAI/ppo-seals-CartPole-v0

No description available.

🤖 reinforcement-learning 10,099
ValueFX9507

ValueFX9507/Tifa-Deepsex-14b-CoT-Q8

No description available.

🤖 reinforcement-learning 7,787
ValueFX9507

ValueFX9507/Tifa-Deepsex-14b-CoT

No description available.

🤖 reinforcement-learning 7,486
mradermacher

mradermacher/HER-32B-absolute-heresy-i1-GGUF

No description available.

🤖 reinforcement-learning 7,394
PKU-Alignment

PKU-Alignment/beaver-7b-v1.0-cost

The Beaver cost model is a preference model trained using the PKU-SafeRLHF dataset. It can play a role in the safe RLHF algorithm, helping t...

🤖 reinforcement-learning 6,877
PKU-Alignment

PKU-Alignment/beaver-7b-v1.0-reward

The Beaver reward model is a preference model trained using the PKU-SafeRLHF dataset. It can play a role in the safe RLHF algorithm, helping...

🤖 reinforcement-learning 6,580
mradermacher

mradermacher/Arctic-AWM-8B-i1-GGUF

No description available.

🤖 reinforcement-learning 6,025
HumanCompatibleAI

HumanCompatibleAI/ppo-Pendulum-v1

No description available.

🤖 reinforcement-learning 5,845
mradermacher

mradermacher/MetaphorStar-32B-i1-GGUF

No description available.

🤖 reinforcement-learning 5,051
mradermacher

mradermacher/MediX-R1-8B-i1-GGUF

No description available.

🤖 reinforcement-learning 4,484
mradermacher

mradermacher/Clado-BrowserOS-Action-i1-GGUF

No description available.

🤖 reinforcement-learning 4,449
mradermacher

mradermacher/MediX-R1-2B-i1-GGUF

No description available.

🤖 reinforcement-learning 4,345
mradermacher

mradermacher/drkernel-8b-i1-GGUF

No description available.

🤖 reinforcement-learning 4,166
mradermacher

mradermacher/StarPO-4B-i1-GGUF

No description available.

🤖 reinforcement-learning 4,007
edbeeching

edbeeching/decision-transformer-gym-hopper-medium

Tags: deep-reinforcement-learning, reinforcement-learning, decision-transformer, gym-continous-control...

🤖 reinforcement-learning 3,978
mradermacher

mradermacher/Arctic-AWM-14B-i1-GGUF

No description available.

🤖 reinforcement-learning 3,971
mradermacher

mradermacher/StarPO-1.7B-i1-GGUF

No description available.

🤖 reinforcement-learning 2,738
mradermacher

mradermacher/flawed-fictions-gemma-3-4b-i1-GGUF

No description available.

🤖 reinforcement-learning 2,731
mradermacher

mradermacher/flawed-fictions-qwen3-4b-i1-GGUF

No description available.

🤖 reinforcement-learning 2,661
mradermacher

mradermacher/MetaphorStar-3B-i1-GGUF

No description available.

🤖 reinforcement-learning 2,650
mradermacher

mradermacher/flawed-fictions-olmo-3-7b-i1-GGUF

No description available.

🤖 reinforcement-learning 2,473
mradermacher

mradermacher/MediX-R1-30B-i1-GGUF

No description available.

🤖 reinforcement-learning 2,361
ValueFX9507

ValueFX9507/Tifa-Deepsex-14b-CoT-GGUF-Q4

No description available.

🤖 reinforcement-learning 2,337
mradermacher

mradermacher/NurseSim-Triage-Llama-3.2-3B-i1-GGUF

No description available.

🤖 reinforcement-learning 2,206
jasonyandell

jasonyandell/zeb-42

Architecture: Transformer encoder (pre-LN) + policy/value heads. Parameters: 556,970. Embedding dim: ...

🤖 reinforcement-learning 2,081
mradermacher

mradermacher/drkernel-14b-i1-GGUF

No description available.

🤖 reinforcement-learning 1,954
mradermacher

mradermacher/SEAD-14B-GGUF

No description available.

🤖 reinforcement-learning 1,920
mradermacher

mradermacher/ASTRA-14B-Thinking-v1-GGUF

No description available.

🤖 reinforcement-learning 1,859
mradermacher

mradermacher/IntentRL-Ambig-Text2SQL-4B-i1-GGUF

No description available.

🤖 reinforcement-learning 1,804
mradermacher

mradermacher/Arctic-AWM-4B-i1-GGUF

No description available.

🤖 reinforcement-learning 1,702
mradermacher

mradermacher/inf-query-aligner-GGUF

No description available.

🤖 reinforcement-learning 1,614
ValueFX9507

ValueFX9507/Tifa-DeepsexV2-7b-MGRPO-GGUF-Q8

No description available.

🤖 reinforcement-learning 1,476
mradermacher

mradermacher/HER-32B-ACL-GGUF

No description available.

🤖 reinforcement-learning 1,330
mradermacher

mradermacher/StarPO-1.7B-GGUF

No description available.

🤖 reinforcement-learning 1,296
mradermacher

mradermacher/FlowSteer-8b-GGUF

No description available.

🤖 reinforcement-learning 1,268
mradermacher

mradermacher/drkernel-8b-GGUF

No description available.

🤖 reinforcement-learning 1,185
ValueFX9507

ValueFX9507/Tifa-DeepsexV2-7b-MGRPO-GGUF-Q4

No description available.

🤖 reinforcement-learning 1,169
rpharale

rpharale/ppo-Huggy

This is a trained model of a PPO agent playing Huggy using the Unity ML-Agents Library. Documentation: https://github.com/huggingface/ml...

🤖 reinforcement-learning 991
RLinf

RLinf/RLinf-OpenVLAOFT-LIBERO-130-Base-Lora

The RLinf-openvlaoft-libero series is trained on RLinf/RLinf-OpenVLAOFT-LIBERO-xxx-Base-Lora (including libero90 and libero130) and Haozhan7...

🤖 reinforcement-learning 914
mradermacher

mradermacher/StarPO-4B-GGUF

No description available.

🤖 reinforcement-learning 913
mradermacher

mradermacher/Arctic-AWM-14B-GGUF

No description available.

🤖 reinforcement-learning 886
mradermacher

mradermacher/Rotten.Llama-3.2-1B-i1-GGUF

No description available.

🤖 reinforcement-learning 882
mradermacher

mradermacher/HER-32B-absolute-heresy-GGUF

No description available.

🤖 reinforcement-learning 864
PKU-Alignment

PKU-Alignment/beaver-7b-unified-reward

The Beaver reward model is a preference model trained using the PKU-SafeRLHF dataset. It can play a role in the safe RLHF algorithm, helping...

🤖 reinforcement-learning 853
mradermacher

mradermacher/IntentRL-Ambig-Text2SQL-4B-GGUF

No description available.

🤖 reinforcement-learning 841
marcogfedozzi

marcogfedozzi/ppo-LunarLander-v2

No description available.

🤖 reinforcement-learning 800
mradermacher

mradermacher/Clado-BrowserOS-Action-GGUF

No description available.

🤖 reinforcement-learning 782
mradermacher

mradermacher/MediX-R1-8B-GGUF

No description available.

🤖 reinforcement-learning 782
mradermacher

mradermacher/MetaphorStar-3B-GGUF

No description available.

🤖 reinforcement-learning 767
mradermacher

mradermacher/eubiota-planner-8b-i1-GGUF

No description available.

🤖 reinforcement-learning 755
mradermacher

mradermacher/drkernel-14b-GGUF

No description available.

🤖 reinforcement-learning 743
PKU-Alignment

PKU-Alignment/beaver-7b-unified-cost

The Beaver cost model is a preference model trained using the PKU-SafeRLHF dataset. It can play a role in the safe RLHF algorithm, helping t...

🤖 reinforcement-learning 730
mradermacher

mradermacher/GRiP-i1-GGUF

No description available.

🤖 reinforcement-learning 687
mradermacher

mradermacher/Arctic-AWM-8B-GGUF

No description available.

🤖 reinforcement-learning 661
mradermacher

mradermacher/NurseSim-Triage-Llama-3.2-3B-GGUF

No description available.

🤖 reinforcement-learning 649
sb3

sb3/ppo-CartPole-v1

No description available.

🤖 reinforcement-learning 645
mradermacher

mradermacher/flawed-fictions-qwen3-4b-GGUF

No description available.

🤖 reinforcement-learning 630
ValueFX9507

ValueFX9507/Tifa-DeepsexV3-14b-GGUF-Q6

No description available.

🤖 reinforcement-learning 628
TaTo69

TaTo69/ppo-LunarLander-v3

No description available.

🤖 reinforcement-learning 623
mradermacher

mradermacher/Arctic-AWM-4B-GGUF

No description available.

🤖 reinforcement-learning 621
mradermacher

mradermacher/MediX-R1-2B-GGUF

No description available.

🤖 reinforcement-learning 593
infly

infly/inf-query-aligner

No description available.

🤖 reinforcement-learning 591
mradermacher

mradermacher/Tifa-DeepsexV2-7b-MGRPO-safetensors-i1-GGUF

No description available.

🤖 reinforcement-learning 561
mradermacher

mradermacher/Rotten.Llama-3.2-1B-GGUF

No description available.

🤖 reinforcement-learning 543
mradermacher

mradermacher/Tifa-Deepsex-14b-CoT-GGUF

No description available.

🤖 reinforcement-learning 491
mradermacher

mradermacher/MediX-R1-30B-GGUF

No description available.

🤖 reinforcement-learning 491
mradermacher

mradermacher/flawed-fictions-gemma-3-4b-GGUF

No description available.

🤖 reinforcement-learning 489
mradermacher

mradermacher/eubiota-planner-8b-GGUF

No description available.

🤖 reinforcement-learning 477
cagataydev

cagataydev/sac-unitree-g1-mujoco

No description available.

🤖 reinforcement-learning 475
ThomasSimonini

ThomasSimonini/ppo-SpaceInvadersNoFrameskip-v4

No description available.

🤖 reinforcement-learning 460
mradermacher

mradermacher/flawed-fictions-olmo-3-7b-GGUF

No description available.

🤖 reinforcement-learning 460
mradermacher

mradermacher/beaver-7b-v2.0-GGUF

No description available.

🤖 reinforcement-learning 452
ssssmark

ssssmark/Aes-R1

No description available.

🤖 reinforcement-learning 443
ValueFX9507

ValueFX9507/Tifa-DeepsexV2-7b-MGRPO-GGUF-F16

No description available.

🤖 reinforcement-learning 439
mradermacher

mradermacher/BeamPERL-GGUF

No description available.

🤖 reinforcement-learning 437
nidek

nidek/ppo-SnowballTarget

This is a trained model of a PPO agent playing SnowballTarget using the Unity ML-Agents Library. Documentation: https://github.com/huggi...

🤖 reinforcement-learning 436
mradermacher

mradermacher/R-PRM-7B-DPO-i1-GGUF

No description available.

🤖 reinforcement-learning 434
mradermacher

mradermacher/VeriReason-Qwen2.5-1.5b-RTLCoder-Verilog-GRPO-reasoning-tb-GGUF

No description available.

🤖 reinforcement-learning 429
cagataydev

cagataydev/sac-unitree-go2-mujoco

No description available.

🤖 reinforcement-learning 416
Malgesw

Malgesw/ppo-Huggy

No description available.

🤖 reinforcement-learning 416
mradermacher

mradermacher/VeriReason-Qwen2.5-7b-RTLCoder-Verilog-GRPO-reasoning-tb-i1-GGUF

No description available.

🤖 reinforcement-learning 394
mradermacher

mradermacher/arc-teacher-8b-i1-GGUF

No description available.

🤖 reinforcement-learning 390
mradermacher

mradermacher/LongWriter-Zero-32B-GGUF

No description available.

🤖 reinforcement-learning 385
edbeeching

edbeeching/decision-transformer-gym-hopper-expert

Tags: deep-reinforcement-learning, reinforcement-learning, decision-transformer, gym-continous-control...

🤖 reinforcement-learning 381
sb3

sb3/dqn-PongNoFrameskip-v4

No description available.

🤖 reinforcement-learning 379
DocPereira

DocPereira/PEAL_V4_LHP_Zero_Entropy_Controlled

Languages: pt, en. License name: peal-v4-sovereign. License link: https://huggingface.co/DocPereira/PEALV4LHPZeroEntropyControlled/blob/main/ identif...

🤖 reinforcement-learning 371
mradermacher

mradermacher/Orsta-7B-i1-GGUF

No description available.

🤖 reinforcement-learning 320
talebzeghmi

talebzeghmi/ppo-SnowballTarget

No description available.

🤖 reinforcement-learning 311
mradermacher

mradermacher/SpatialThinker-3B-GGUF

No description available.

🤖 reinforcement-learning 311
infly

infly/inf-retriever-v1-pro

No description available.

🤖 reinforcement-learning 307
zzsi

zzsi/swm-dmc-expert-policies

No description available.

🤖 reinforcement-learning 287
mradermacher

mradermacher/inf-retriever-v1-pro-GGUF

No description available.

🤖 reinforcement-learning 282
Open-Reasoner-Zero

Open-Reasoner-Zero/Open-Reasoner-Zero-7B

No description available.

🤖 reinforcement-learning 280
mradermacher

mradermacher/SpatialThinker-7B-i1-GGUF

No description available.

🤖 reinforcement-learning 279
mradermacher

mradermacher/Tifa-Deepsex-14b-CoT-i1-GGUF

No description available.

🤖 reinforcement-learning 277
mradermacher

mradermacher/InfiGUI-G1-3B-i1-GGUF

No description available.

🤖 reinforcement-learning 269
0sunfire0

0sunfire0/poca-SoccerTwos_00

No description available.

🤖 reinforcement-learning 268
nicklashansen

nicklashansen/newt

We open-source 200+ model checkpoints, including a multi-task Newt agent trained on 200 tasks simultaneously. We are excited to see what the...

🤖 reinforcement-learning 259