Gemma 4 26B and 31B Uncensored Models Released

https://x.com/dealignai/status/2040216221493395831
Technical announcement and benchmark post on social media · Researched April 4, 2026

Summary

On April 4, 2026, ML researcher dealignai announced the release of uncensored versions of Google's newly-launched Gemma 4 large language models (26B and 31B parameter sizes), claiming these abliterated variants perform within 5% of the original models on MMLU benchmarks. The post, accompanied by a benchmark comparison table, represents a significant moment in the ongoing tension between AI safety and open-source model accessibility.

Gemma 4 was released by Google on April 2, 2026, as its most capable open-source model family to date, with the 31B model ranking #3 on the Arena AI leaderboard and the 26B MoE model ranking #6. The models feature advanced reasoning, agentic workflows, native multimodal support (vision and audio), 256K context windows, and—critically—built-in safety guardrails designed to prevent the models from generating harmful, illegal, or unethical content.

Dealignai's post demonstrates that these safety guardrails can be circumvented through abliteration, a technique that surgically removes the neural mechanisms underlying refusal behavior. By identifying and neutralizing specific latent directions in the model's transformer layers that encode safety responses, researchers can "uncensor" models without requiring expensive retraining. The claim that performance remains within 5% of baseline is significant because it suggests safety alignment can be removed with minimal capability loss—a finding that contradicts assumptions that safety mechanisms are deeply integrated with reasoning abilities.
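
The post itself contains no methodology, but abliteration is well documented in community write-ups. The sketch below is a minimal illustration of the core idea, not dealignai's actual pipeline: a "refusal direction" is estimated as the difference of mean activations over harmful versus harmless prompts, then projected out of weight matrices that write into the residual stream. Random tensors stand in here for activations that would normally be captured with forward hooks on the real model.

```python
import torch

def refusal_direction(harmful_acts: torch.Tensor, harmless_acts: torch.Tensor) -> torch.Tensor:
    """Difference-of-means 'refusal direction' in the residual stream.

    harmful_acts / harmless_acts: [n_prompts, d_model] hidden states captured
    at one layer (e.g. the last token position) for the two prompt sets.
    """
    direction = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return direction / direction.norm()

def ablate_direction(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove each output component that writes along `direction`.

    weight: [d_model, d_in] matrix writing into the residual stream
    (e.g. an attention or MLP output projection). Applying (I - d d^T)
    to the output side removes the model's ability to emit the direction.
    """
    d = direction / direction.norm()
    return weight - torch.outer(d, d @ weight)

# Toy stand-ins: in practice these come from forward hooks on the real model.
d_model = 8
harmful = torch.randn(32, d_model)
harmless = torch.randn(32, d_model)
direction = refusal_direction(harmful, harmless)

W = torch.randn(d_model, d_model)          # e.g. an o_proj / down_proj weight
W_ablated = ablate_direction(W, direction)

# The ablated matrix can no longer write anything along the refusal direction.
print((direction @ W_ablated).abs().max())  # ~0
```

Because the edit is a projection applied directly to existing weights, it requires no gradient updates or retraining, which is why variants can appear within hours of a release.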

The rapid emergence of uncensored variants (dealignai's alongside competitors like HauhauCS with "aggressive" uncensoring methods) within hours of Gemma 4's release demonstrates a broader community pattern: whenever Google or other labs release new open models, jailbreak and abliteration specialists immediately produce uncensored versions. This has become a reliable, predictable cycle in the open-source AI ecosystem. The post signals both technical capability and ideological commitment to unrestricted model access, with dealignai's Hugging Face profile explicitly stating an "obsession with making high quality red teaming LLM's" and rejecting "fine tuning, lora, template bs" in favor of "pure real ablation."

Key Takeaways

- Abliterated "uncensored" versions of Gemma 4 26B and 31B appeared almost immediately after Google's April 2, 2026 release.
- Dealignai claims the variants score within 5% of the original models on MMLU; the claim rests on a single posted benchmark image and has not been independently verified.
- Abliteration tooling (Heretic, CRACK, Nous Research's llm-abliteration) has made guardrail removal a routine, reproducible community practice rather than specialized knowledge.
- The episode reinforces that safety alignment in permissively licensed open-source models cannot be assumed to survive once the weights are public.

About

Author: dealignai (@dealignai)

Publication: X (formerly Twitter)

Published: 2026-04-04

Sentiment / Tone

Technical and matter-of-fact, with an undercurrent of ideological confidence. Dealignai presents the uncensored models as a straightforward technical achievement (with benchmark evidence), without rhetorical flourish, suggesting this is routine work within their community. The brevity and directness of the post—just a claim and a benchmark image—convey certainty and normalcy around the process of removing safety mechanisms. There is an implicit position that uncensoring is not controversial but rather a natural and desirable capability, with no disclaimer or discussion of potential harms. The tone reflects a researcher community that views safety guardrails as obstacles to be overcome through technical methods rather than as valuable safeguards, positioning unrestricted model access as the default good.

Related Links

Research Notes

**Author Credibility & Background:** Dealignai is an active ML researcher in the open-source jailbreaking and red-teaming community, maintaining 84+ public models on Hugging Face, many of them abliterated or "uncensored" variants of frontier models (Nemotron, Qwen, Mistral, MiniMax). The account links to a dedicated website, dealign.ai, and appears to be this researcher's primary outlet. While they have no visible academic publications, they demonstrate consistent technical competence across multiple model architectures and abliteration methods. The work is positioned as red-teaming and safety research (they maintain "Safety Across Scale" and "MoE Safety - GateBreaker Findings" research spaces), though the primary output is uncensored models rather than published findings. This places them within a broader open-source community focused on adversarial testing of AI safety mechanisms.

**Broader Context & Reactions:** The post arrives at a heated moment for Google's Gemma line. Gemma 3 was reportedly pulled from Google AI Studio after complaints from Senator Blackburn about jailbreak capabilities. Gemma 4's release has already drawn extensive community analysis: Reddit discussions show multiple jailbreak methods working within hours (basic system-prompt engineering, Heretic's ARA method, CRACK abliteration). HauhauCS released "fully unlocked" aggressive uncensored variants claiming 0/465 refusals, while other teams focused on more conservative uncensoring. The technical bar for abliteration has clearly dropped: multiple independent implementations now exist (Heretic, CRACK, Nous Research's llm-abliteration), and research papers on abliteration methods appeared on arXiv in 2025. This is no longer specialized knowledge but reproducible community practice.

**Significance & Implications:** This post exemplifies a fundamental mismatch between Google's safety intentions and technical reality: no matter how carefully a model is aligned at release, if it is open-source under a permissive license, an active community will remove those mechanisms within hours. The "within 5% of base models" claim is particularly damaging to the alignment-as-integrated-capability narrative: if safety can be surgically removed with minimal performance loss, it suggests safety was never deeply woven into reasoning but was added as a separable layer. This has implications for AI governance: safety mechanisms cannot be relied upon in open-source models once released, and purely technical approaches to alignment (fine-tuning, RLHF) are fragile against motivated modification.

**Potential Biases & Caveats:** (1) The 5% performance claim is unverified in the post itself; viewers must trust the benchmark image and dealignai's methodology. (2) The framing ignores potential harms: uncensored models can be used for disinformation, illegal content generation, social engineering, and more. (3) The "red teaming" framing is a legitimizing narrative for what is functionally model jailbreaking. (4) Dealignai's position appears ideologically driven (evidenced by the account name "dealignai", i.e. "de-align AI"), which may color their work. (5) MMLU measures general knowledge, not downstream safety behavior; it tells us nothing about whether the uncensored model will still refuse harmful requests.
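
Caveat (1) is checkable in principle: if both the baseline and abliterated weights are published on Hugging Face, anyone can rerun MMLU with EleutherAI's lm-evaluation-harness and compare. The sketch below uses hypothetical repository IDs (not dealignai's actual uploads), and the exact key under which the MMLU group score is reported can vary across harness versions.

```python
# Independent check of the "within 5% of baseline MMLU" claim.
# Repo IDs are placeholders, not dealignai's actual uploads.
import lm_eval  # EleutherAI lm-evaluation-harness

MODELS = {
    "baseline": "google/gemma-4-31b-placeholder",           # hypothetical ID
    "abliterated": "example-user/gemma-4-31b-abliterated",  # hypothetical ID
}

scores = {}
for label, repo in MODELS.items():
    out = lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={repo}",
        tasks=["mmlu"],
        batch_size=8,
    )
    # Group-level MMLU accuracy; the metric key may differ by harness version.
    scores[label] = out["results"]["mmlu"]["acc,none"]

drop = (scores["baseline"] - scores["abliterated"]) / scores["baseline"]
print(f"MMLU baseline={scores['baseline']:.3f} "
      f"abliterated={scores['abliterated']:.3f} relative drop={drop:.1%}")
```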

Topics

LLM safety and alignment removal · Abliteration and model jailbreaking · Open-source AI governance · Gemma 4 models and benchmarks · Red teaming and adversarial AI · Model quantization and MLX optimization