Large language models (LLMs) have been widely adopted in tasks such as machine translation, question answering, and dialogue systems. Their growing use is driven by the observation that scaling up parameters and data consistently improves performance and unlocks new capabilities. However, this comes with significant computational and memory costs, resulting in higher hardware requirements, increased storage needs, and longer inference times. To address these issues, compression strategies such as quantization have been proposed to reduce resource usage while largely preserving accuracy.
While several works have explored the impact of quantization on model capabilities, its effects on other critical aspects, such as social biases, have received little attention.
In particular, a fine-grained analysis of the impact of quantization at the level of demographic categories and subgroups has been largely overlooked.
To fill this gap, in this work, we provide an extensive analysis of how quantization affects social bias.
Quantization: We employ two families of quantization strategies: weight-only quantization, using GPTQ and Activation-aware Weight Quantization (AWQ), and weight-activation quantization, using SmoothQuant (SQ).
For each strategy, we evaluate different quantization settings by varying the number of quantization bits (e.g., 3-, 4-, and 8-bit quantization for the weights, and 8-bit for the activations).
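To make this concrete, the sketch below shows how 4-bit weight-only GPTQ quantization could be applied through the Hugging Face transformers GPTQConfig integration; the model identifier, calibration dataset, and output path are illustrative choices and not necessarily the configuration used in the paper.

```python
# Minimal sketch of 4-bit weight-only quantization (GPTQ) via the
# Hugging Face transformers integration. Model ID, calibration dataset,
# and output path are illustrative; the paper's exact settings may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# bits can be set to 3, 4, or 8 to mirror the weight settings described above.
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

# Quantization is performed on the fly while loading the full-precision weights.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=gptq_config,
    device_map="auto",
)

# The quantized model can be saved and reloaded like any other checkpoint.
model.save_pretrained("llama-3.1-8b-instruct-gptq-4bit")
tokenizer.save_pretrained("llama-3.1-8b-instruct-gptq-4bit")
```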
Models: To generalize our understanding of the impact of quantization, we use four models: LLaMA-3.1-8B-Instruct, Qwen2.5-14B-Instruct, DeepSeek-R1-Distill-LLaMA-8B, and DeepSeek-R1-Distill-Qwen-14B.
This setup enables us to compare the effects of quantization across different architectures (LLaMA vs. Qwen) and across reasoning capabilities (DeepSeek-based vs. non-DeepSeek models).
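For illustration, the following sketch enumerates one possible model / strategy / bit-width grid implied by the setup above; the Hugging Face identifiers and the exact set of combinations are assumptions for this example rather than the paper's actual run configuration.

```python
from itertools import product

# Illustrative Hugging Face identifiers for the four evaluated models.
MODELS = [
    "meta-llama/Llama-3.1-8B-Instruct",
    "Qwen/Qwen2.5-14B-Instruct",
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
]

# Weight-only strategies (GPTQ, AWQ) at several weight bit-widths, plus
# weight-activation quantization (SmoothQuant) with 8-bit activations.
# The exact combinations evaluated in the paper may differ.
SETTINGS = [
    *[(s, bits, None) for s, bits in product(["gptq", "awq"], [3, 4, 8])],
    ("smoothquant", 8, 8),  # (strategy, weight bits, activation bits)
]

for model_id, (strategy, w_bits, a_bits) in product(MODELS, SETTINGS):
    print(model_id, strategy, w_bits, a_bits)
```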
Social Bias: To expose models' social biases, we employ eight benchmarks covering different social dimensions: stereotypes (StereoSet, RedditBias, and WinoBias), fairness (DiscrimEval, DiscrimEvalGen, and DecodingTrust-Fairness), toxicity (BOLD and DecodingTrust-Toxicity), and sentiment (BOLD).
We analyze model bias using both probability-based metrics (first-token probability and sentence perplexity) and generated-text-based metrics (choice generation and sentence completion). Finally, we employ the MMLU benchmark to assess the impact of quantization on model capabilities.
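As an illustration of the probability-based metrics, the sketch below scores a hypothetical stereotype/anti-stereotype sentence pair by sentence perplexity under a causal language model; the sentence pair and the scoring helper are invented for this example and are not taken from any specific benchmark.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model; in practice this would be one of the (quantized) models above.
model_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
model.eval()

def perplexity(sentence: str) -> float:
    """Sentence perplexity under the (possibly quantized) causal LM."""
    inputs = tokenizer(sentence, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # Passing labels makes the model return the mean cross-entropy loss.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return torch.exp(loss).item()

# Hypothetical stereotype/anti-stereotype pair in the spirit of StereoSet.
stereo = "The nurse said she would check on the patient."
anti_stereo = "The nurse said he would check on the patient."

# A model that systematically prefers the stereotypical variant assigns it
# lower perplexity; quantization-induced uncertainty raises both scores.
print(perplexity(stereo), perplexity(anti_stereo))
```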
Demographic Categories and Subgroups: To avoid aggregated results masking nuanced effects of quantization on bias (e.g., decreasing bias for one subgroup while increasing it for another), we perform a fine-grained analysis at both the category and subgroup levels.
In particular, we focus on the categories of Gender, Race, and Religion, along with their respective subgroups (e.g., male, female, and non-binary for the gender category).
This approach allows us to obtain a three-level understanding of the effects of quantization: global, category-level, and subgroup-level bias.
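The three levels can be made concrete by aggregating per-example bias scores at increasing granularity, as in the hypothetical pandas sketch below; the column names and score values are invented for illustration.

```python
import pandas as pd

# Hypothetical per-example results: each row is one benchmark prompt with the
# demographic category/subgroup it targets and a scalar bias score.
results = pd.DataFrame(
    {
        "category": ["gender", "gender", "gender", "race", "religion"],
        "subgroup": ["male", "female", "non-binary", "black", "muslim"],
        "bias_score": [0.12, 0.31, 0.27, 0.18, 0.22],
    }
)

# Global bias: a single number, which can hide opposite shifts across subgroups.
global_bias = results["bias_score"].mean()

# Category-level bias: e.g., Gender vs. Race vs. Religion.
category_bias = results.groupby("category")["bias_score"].mean()

# Subgroup-level bias: e.g., male vs. female vs. non-binary within Gender.
subgroup_bias = results.groupby(["category", "subgroup"])["bias_score"].mean()

print(global_bias)
print(category_bias)
print(subgroup_bias)
```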
RQ1: How do quantization and specific quantization strategies impact each bias type?
In generation-based tasks, quantization substantially reduces the model's tendency to produce toxic outputs and slightly shifts its sentiment toward neutral.
However, it also tends to marginally increase alignment with stereotypes and exacerbate unfair discriminatory behavior, particularly under aggressive compression.
In contrast, in probability-based tasks, we do not observe evidence of increased discrimination.
Instead, quantization primarily increases model uncertainty: as quantization becomes more aggressive, the model assigns lower likelihoods to both stereotypical and anti-stereotypical prompts.
Among the quantization strategies, weight-activation quantization (i.e., SQ) has the strongest impact across all dimensions, while the two weight-only methods, AWQ and GPTQ, have comparable effects across all experiments.
@article{marcuzzi2025quantizationshapesbiaslarge,
  title={How Quantization Shapes Bias in Large Language Models},
  author={Federico Marcuzzi and Xuefei Ning and Roy Schwartz and Iryna Gurevych},
  year={2025},
  eprint={2508.18088},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2508.18088}
}