Summary
On April 4, 2026, ML researcher dealignai announced the release of uncensored versions of Google's newly launched Gemma 4 large language models (26B and 31B parameter sizes), claiming these abliterated variants perform within 5% of the original models on MMLU benchmarks. The post, accompanied by a benchmark comparison table, represents a significant moment in the ongoing tension between AI safety and open-source model accessibility.
Gemma 4 was released by Google on April 2, 2026, as its most capable open-source model family to date, with the 31B model ranking #3 on the Arena AI leaderboard and the 26B MoE model ranking #6. The models feature advanced reasoning, agentic workflows, native multimodal support (vision and audio), 256K context windows, and—critically—built-in safety guardrails designed to prevent the models from generating harmful, illegal, or unethical content.
Dealignai's post demonstrates that these safety guardrails can be circumvented through abliteration, a technique that surgically removes the neural mechanisms underlying refusal behavior. By identifying and neutralizing specific latent directions in the model's transformer layers that encode safety responses, researchers can "uncensor" models without requiring expensive retraining. The claim that performance remains within 5% of baseline is significant because it suggests safety alignment can be removed with minimal capability loss—a finding that contradicts assumptions that safety mechanisms are deeply integrated with reasoning abilities.
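The core operation behind abliteration can be sketched in a few lines of linear algebra. The toy example below is a minimal NumPy sketch, not dealignai's actual method: the synthetic activations, the planted `refusal_true` vector, the sample counts, and the single weight matrix are all illustrative assumptions. It estimates a refusal direction as a difference of means between activations on "harmful" and "harmless" prompts, then orthogonalizes a weight matrix against that direction:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 64

# Plant a ground-truth "refusal direction" so the recovery is checkable.
# In a real model this is unknown and must be estimated from activations.
refusal_true = rng.normal(size=d_model)
refusal_true /= np.linalg.norm(refusal_true)

# Synthetic stand-ins for residual-stream activations captured at one layer:
# activations on "harmful" prompts carry an extra component along the
# refusal direction, which is what triggers the refusal behavior.
harmless_acts = rng.normal(size=(32, d_model))
harmful_acts = rng.normal(size=(32, d_model)) + 3.0 * refusal_true

# Difference-of-means estimate of the refusal direction, then normalize.
direction = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
direction /= np.linalg.norm(direction)

def ablate(W: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Project d out of W's output space: W' = (I - d d^T) W, so that
    W' x has no component along d for any input x."""
    return W - np.outer(d, d) @ W

# Toy stand-in for one layer's output projection matrix.
W_out = rng.normal(size=(d_model, d_model))
W_ablated = ablate(W_out, direction)

x = rng.normal(size=d_model)
residual = float(direction @ (W_ablated @ x))  # component along ablated direction
print(f"cosine(estimate, true) = {float(refusal_true @ direction):.3f}")
print(f"residual along direction = {residual:.2e}")
```

Published implementations apply this projection to the output projections of many attention and MLP blocks at once, after searching over layers for the direction that best separates refusals; because the edit touches only one direction out of thousands of hidden dimensions, most other capabilities survive, which is consistent with the "within 5% on MMLU" claim.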
The rapid emergence of uncensored variants (dealignai's alongside competitors like HauhauCS with "aggressive" uncensoring methods) within hours of Gemma 4's release demonstrates a broader community pattern: whenever Google or other labs release new open models, jailbreak and abliteration specialists immediately produce uncensored versions. This has become a reliable, predictable cycle in the open-source AI ecosystem. The post signals both technical capability and ideological commitment to unrestricted model access, with dealignai's Hugging Face profile explicitly stating an "obsession with making high quality red teaming LLM's" and rejecting "fine tuning, lora, template bs" in favor of "pure real ablation."
Key Takeaways
Dealignai released uncensored Gemma 4 26B and 31B models using abliteration techniques, with performance within 5% of Google's original models on MMLU benchmarks—suggesting safety mechanisms can be removed without major capability degradation.
Abliteration works by identifying and neutralizing specific neural directions in transformer layers that encode refusal behavior, allowing researchers to surgically remove safety guardrails without retraining the full model.
Google's Gemma 4 (released April 2, 2026) ranks #3 (31B) and #6 (26B MoE) on the Arena AI leaderboard for open models, making it one of the most capable open-source options available and therefore a high-value target for uncensoring efforts.
The Gemma 4 models feature native multimodal capabilities (vision, audio, video), 256K context windows, and were released under an Apache 2.0 license—the commercial permissiveness of this license enables rapid derivative works like uncensored variants.
Uncensored variants appeared within hours of Gemma 4's release through multiple research teams (dealignai, HauhauCS using CRACK methods, Heretic using ARA methods), establishing a predictable pattern in the open-source AI ecosystem.
Dealignai's work represents an ideological stance against model alignment, as evidenced by the account name 'dealignai' (de-align) and their stated focus on 'red teaming' and removing 'template bs' in favor of producing models without restrictions.
The benchmark claim reveals a critical fragility in AI safety: safety alignment appears to be compartmentalized in specific neural mechanisms rather than deeply integrated with core reasoning, making it vulnerable to targeted removal.
Google has been concerned about this trend since Gemma 3 was reportedly 'cancelled' after users jailbroke it into generating offensive content, raising questions about whether Google took comparable precautions with Gemma 4 despite its wider release.
The MLX optimization mentioned in the post is significant—it enables efficient local inference on consumer hardware, making uncensored frontier-class models practically deployable on personal computers without cloud services.
This represents a fundamental dispute over AI governance: whether open models should come with mandatory safety mechanisms (Google's position) or whether users should have unrestricted access to modify and uncensor models they download (dealignai's position).
About
Author: dealignai (@dealignai)
Publication: X (formerly Twitter)
Published: 2026-04-04
Sentiment / Tone
Technical and matter-of-fact, with an undercurrent of ideological confidence. Dealignai presents the uncensored models as a straightforward technical achievement (with benchmark evidence), without rhetorical flourish, suggesting this is routine work within their community. The brevity and directness of the post—just a claim and a benchmark image—conveys certainty and normalcy around the process of removing safety mechanisms. There's an implicit position that uncensoring is not controversial but rather a natural and desirable capability, with no disclaimer or discussion of potential harms. The tone reflects a researcher community that views safety guardrails as obstacles to be overcome through technical methods rather than as valuable safeguards, positioning unrestricted model access as the default good.
Related Links
**Google's Official Gemma 4 Announcement.** Essential context: official specs, benchmarks, safety claims, and the Apache 2.0 license that enables derivative works like dealignai's uncensored versions.
**Dealignai's Hugging Face Profile.** Shows the author's full body of work: 84+ uncensored/abliterated models across multiple architectures, and their stated mission of making 'high quality red teaming LLM's'.
**Heretic: Automated Safety Removal Tool for LLMs.** The ARA abliteration method mentioned in Gemma 4 jailbreak discussions; demonstrates the reproducible technical toolkit now available for safety removal.
**Abliteration Technique Guide and Theory.** Accessible explanation of how abliteration works: identifying and nullifying refusal directions in transformer layers, directly relevant to understanding dealignai's approach.
**Author Credibility & Background:** Dealignai is an active ML researcher in the open-source jailbreaking and red-teaming community, maintaining 84+ public models on Hugging Face, many of them abliterated or "uncensored" variants of frontier models (Nemotron, Qwen, Mistral, MiniMax). The account links to a dedicated website, dealign.ai, and appears to be this researcher's primary body of work. While they have no visible academic publications, they demonstrate consistent technical competence across multiple model architectures and abliteration methods. The work is positioned as red-teaming and safety research (they maintain "Safety Across Scale" and "MoE Safety - GateBreaker Findings" research spaces), though the primary output is uncensored models rather than published findings. This places them within a broader open-source community focused on adversarial testing of AI safety mechanisms.
**Broader Context & Reactions:** The post arrives in a heated moment for Google's Gemma line. Gemma 3 was reportedly pulled from Google AI Studio after complaints from Senator Blackburn about jailbreak capabilities. Gemma 4's release has already drawn extensive community analysis: Reddit discussions show multiple jailbreak methods working within hours (basic system prompt engineering, Heretic's ARA method, CRACK abliteration). HauhauCS released "fully unlocked" aggressive uncensored variants claiming 0/465 refusals, while other teams focused on more conservative uncensoring. The technical bar for abliteration has clearly lowered: multiple independent implementations now exist (Heretic, CRACK, Nous Research's llm-abliteration), and research papers on abliteration methods appeared on arXiv in 2025. This is no longer specialized knowledge but reproducible community practice.
**Significance & Implications:** This post exemplifies a fundamental mismatch between Google's safety intentions and technical reality: no matter how carefully a model is aligned at release, if it's open-source under a permissive license, an active community will remove those mechanisms within hours. The "within 5% of base models" claim is particularly damaging to the alignment-as-integrated-capability narrative—if safety can be surgically removed with minimal performance loss, it suggests safety was never deeply woven into reasoning but rather added as a separable layer. This has implications for AI governance: it suggests that safety cannot be relied upon in open-source models once released, and that purely technical approaches to alignment (fine-tuning, RLHF) are fragile against motivated modification.
**Potential Biases & Caveats:** (1) The 5% performance claim is unverified in the post itself—viewers must trust the benchmark image and dealignai's methodology. (2) The framing ignores potential harms: uncensored models can be used for disinformation, illegal content generation, social engineering, etc. (3) The "red teaming" framing is a legitimizing narrative for what is functionally model jailbreaking. (4) Dealignai's position appears ideologically driven (evidenced by account name "dealignai" meaning "de-align"), which may affect their work. (5) MMLU benchmarks measure general knowledge, not downstream safety behavior—they tell us nothing about whether the uncensored model will still refuse harmful requests.
Topics
LLM safety and alignment removal
Abliteration and model jailbreaking
Open-source AI governance
Gemma 4 models and benchmarks
Red teaming and adversarial AI
Model quantization and MLX optimization