Heretic: Automatic Censorship Removal For Language Models 2025
Fully automatic censorship removal for language models
Heretic is a tool that removes censorship (aka "safety alignment") from transformer-based language models without expensive post-training. It combines an advanced implementation of directional ablation, also known as "abliteration" (Arditi et al. 2024), with a TPE-based parameter optimizer powered by Optuna.
This approach enables Heretic to work completely automatically. Heretic finds high-quality abliteration parameters by co-minimizing the number of refusals and the KL divergence from the original model. This results in a decensored model that retains as much of the original model's intelligence as possible. Using Heretic does not require an understanding of transformer internals. In fact, anyone who knows how to run a command-line program can use Heretic to decensor language models.
Running unsupervised with the default configuration, Heretic can produce decensored models that rival the quality of abliterations created manually by human experts:
The Heretic version, generated without any human effort, achieves the same level of refusal suppression as other abliterations, but at a much lower KL divergence, indicating less damage to the original model's capabilities. (You can reproduce those numbers using Heretic's built-in evaluation functionality, e.g. heretic --model google/gemma-3-12b-it --evaluate-model p-e-w/gemma-3-12b-it-heretic. Note that the exact values might be platform- and hardware-dependent. The table above was compiled using PyTorch 2.8 on an RTX 5090.)
Heretic supports most dense models, including many multimodal models, and several different MoE architectures. It does not yet support SSMs/hybrid models, models with inhomogeneous layers, and certain novel attention systems.
You can find a collection of models that have been decensored using Heretic on Hugging Face.
Prepare a Python 3.10+ environment with PyTorch 2.2+ installed as appropriate for your hardware. Then run:
Replace Qwen/Qwen3-4B-Instruct-2507 with whatever model you want to decensor.
The process is fully automatic and does not require configuration; however, Heretic has a variety of configuration parameters that can be changed for greater control. Run heretic --help to see available command-line options, or look at config.default.toml if you prefer to use a configuration file.
Source: HackerNews