# Running modern Python TTS toolchains on non-AVX2 CPUs
Notes from getting F5-TTS, StyleTTS2, kokoro/Misaki, and whisper.cpp to work on an AMD Phenom II X6 1090T (2010 K10/Family-10h architecture). The CPU has SSE/SSE2/SSE3/SSE4a, plus CX16/POPCNT/LAHF — but no SSE4.1, no SSE4.2, no AVX, no AVX2, no FMA, no F16C. That puts it below the modern x86-64-v2 baseline. A growing share of binary Python wheels in the AI ecosystem assume v2 or v3, so they SIGILL or SIGFPE at import. This is a ground-truth list of what we hit and what worked.

## Quick triage

If your CPU is below x86-64-v2 (in particular, missing SSE4.1), expect:

- pyarrow: static-init `pinsrq` SIGILL on import
- numpy 2.x: wheel SIGILL on import (numpy 1.26.4 still has a fallback path)
- torch 2.10+: wheel SIGFPE in torch._dynamo on import
- pandas: modern wheels SIGILL on tokenisation
- monotonic_align and other Cython extensions: SIGILL when built from source
- DataLoader subprocess workers SIGFPE re-importing torch

If your CPU is x86-64-v2 (Nehalem ~2008 or newer Intel; Bulldozer ~2011 or newer AMD) but missing AVX/AVX2, you'll still hit some of these, but fewer.

## Working pin-set

These are the versions empirically verified to import and run on this CPU. For a fresh install, layer the pins after the project install:
```shell
pip install --prefer-binary <project>   # whatever you actually want
pip install --prefer-binary --force-reinstall --no-deps \
    "torch==2.7.0" "torchaudio==2.7.0" \
    "transformers==4.57.3" "numpy<2"
pip uninstall -y datasets pyarrow pyarrow-hotfix pandas torchcodec
```
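Before reaching for the pins, it helps to know which tier your CPU actually reports. A minimal triage sketch, reading the kernel's flag list from `/proc/cpuinfo` (Linux-only; the feature sets below are abridged from the psABI level definitions, not anything torch itself checks):

```python
# Rough x86-64 microarchitecture-level triage from /proc/cpuinfo.
# Flag names follow the kernel's lowercase convention (e.g. "sse4_1", "avx2").
def cpu_flags():
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except FileNotFoundError:  # not Linux
        pass
    return set()

def x86_64_level(flags):
    # Abridged required-feature sets for the psABI x86-64-v2/v3 levels.
    v2 = {"cx16", "lahf_lm", "popcnt", "sse4_1", "sse4_2", "ssse3"}
    v3 = v2 | {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "movbe"}
    if v3 <= flags:
        return "x86-64-v3+"
    if v2 <= flags:
        return "x86-64-v2"
    return "below x86-64-v2"

if __name__ == "__main__":
    print(x86_64_level(cpu_flags()))
```

On the Phenom II described above (SSE4a but no SSE4.1) this reports `below x86-64-v2`, which is the tier where the full patch list applies.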
## Patches required

### Patch 1: torch._dynamo SIGFPE on int division by zero

Even after pinning to torch 2.7.0, the very first dynamo init still SIGFPEs on this CPU.

Cause: `torch._dynamo.variables.torch_function.populate_builtin_to_tensor_fn_map()` probes Python operators on dummy tensors, including `tensor // 0` (integer floor-divide by zero). Newer Intel CPUs trap this into a Python ZeroDivisionError via a signal handler. AMD Phenom II just SIGFPEs.

The function's output isn't actually needed for inference. Stub it:
```shell
F=$(python -c "import torch._dynamo.variables.torch_function as m; print(m.__file__)")
cp $F $F.orig
sed -i "0,/    global BUILTIN_TO_TENSOR_FN_MAP/s//    return  # patched: SIGFPE on Phenom II\n    global BUILTIN_TO_TENSOR_FN_MAP/" $F
```
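The sed only needs to make the function bail out before it probes `tensor // 0`; after patching, the function is equivalent to this sketch (schematic, not the actual torch source):

```python
# What the patched populate_builtin_to_tensor_fn_map reduces to: an early
# return ahead of the operator probing that SIGFPEs on Phenom II.
def populate_builtin_to_tensor_fn_map():
    return  # patched: SIGFPE on Phenom II
    global BUILTIN_TO_TENSOR_FN_MAP  # original body below is now unreachable
```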
This is non-invasive — it only affects code that uses the torch.compile() / dynamo paths, which most fine-tuning trainers don't.

### Patch 2: GPU-only mel-spectrogram computation

torch.matmul on CPU SIGFPEs on this CPU, so anything that calls torchaudio's MelSpectrogram on CPU dies. For training pipelines that compute mels in the data loader, this is fatal. Two workarounds:

a) Move the mel module to GPU (cheap audio→mel transfer per sample):
```python
to_mel = torchaudio.transforms.MelSpectrogram(...).to("cuda")

def preprocess(wave):
    wave = torch.from_numpy(wave).to("cuda")
    mel = to_mel(wave)
    return mel.cpu()  # back to CPU for the DataLoader collator
```
b) Pre-compute all mels once on GPU, save them to disk, and load them at training time (example script). Option (b) is faster overall — no per-sample audio→GPU transfer, just torch.load.

### Patch 3: num_workers=0 everywhere

DataLoader spawns subprocess workers that re-import torch and re-run the _dynamo init. Even with patch 1, the patched source isn't always picked up in the subprocess. Set num_workers=0 to keep all loading in the main process.

### Patch 4: weights_only=False for older checkpoint formats

PyTorch 2.6+ flipped the default of torch.load to weights_only=True. If you load checkpoints saved before 2.6 that contain pickled Python objects, you need torch.load(path, weights_only=False). Affected: many published TTS pretrained models (StyleTTS2's ASR/JDC/PLBERT modules, F5-TTS in some cases).

### Patch 5: Stub datasets for transformers' lazy loader

transformers.utils.import_utils._is_package_available("datasets") calls importlib.util.find_spec("datasets"), which raises ValueError if __spec__ is None. If you provide a stub datasets module via sys.modules (to avoid pulling pyarrow), it must have a real ModuleSpec:
```python
import importlib.machinery, types, sys

_stub = types.ModuleType("datasets")
_stub.__spec__ = importlib.machinery.ModuleSpec("datasets", loader=None)
_stub.Dataset = type("Dataset", (), {})
_stub.load_from_disk = lambda *a, **kw: None
sys.modules["datasets"] = _stub
```
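To see why the real ModuleSpec matters: transformers' availability probe boils down to `importlib.util.find_spec`, which raises ValueError for a module in sys.modules whose `__spec__` is None. A quick self-contained check (the stub names here are throwaway, deliberately not the real `datasets` package):

```python
import importlib.machinery
import importlib.util
import sys
import types

# Stub with a real ModuleSpec: find_spec returns the spec.
good = types.ModuleType("fake_datasets")
good.__spec__ = importlib.machinery.ModuleSpec("fake_datasets", loader=None)
sys.modules["fake_datasets"] = good
assert importlib.util.find_spec("fake_datasets") is not None

# Stub with __spec__ = None: find_spec raises the ValueError that
# transformers' _is_package_available path trips over.
bad = types.ModuleType("fake_datasets_bad")
bad.__spec__ = None
sys.modules["fake_datasets_bad"] = bad
try:
    importlib.util.find_spec("fake_datasets_bad")
    raise AssertionError("expected ValueError")
except ValueError:
    pass
```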
### Patch 6: --no-build-isolation for Cython extensions

monotonic_align (used by StyleTTS2) and similar packages build in their own ephemeral build-env via pip's build isolation. That ephemeral env re-installs numpy and cython and may pull AVX2 wheels. Use:
```shell
pip install --no-build-isolation --no-deps <package>
```

This forces the build to use your already-installed (pinned) numpy and cython.
## Per-project status

### F5-TTS

- Inference and training both work after patches 1–5.
- See the companion gist for a minimal trainer that bypasses datasets/accelerate.
- Issue filed: SWivid/F5-TTS#1292 (EMA-only checkpoint structure).

### StyleTTS2

- Inference and fine-tuning both work after patches 1, 2, 3, 4, 6.
- PRs filed: yl4579/StyleTTS2#361 (weights_only=False), #362 (drop pandas).

### kokoro

- Inference works (via the kokoro-onnx ONNX runtime path; the PyTorch path is blocked by upstream dep pinning, not by the CPU).
- Issue filed: hexgrad/kokoro#321 (broken misaki>=0.7.16 PyPI pin).

### whisper.cpp

- Works out of the box. Pure C++, no Python wheels involved. CUDA inference runs on the GPU.

## What does not work

- pyarrow source build: it succeeds eventually, but the resulting library still uses SSE4.1 in places (Apache Arrow's CMake ARROW_SIMD_LEVEL=NONE doesn't cover everything). Not worth the multi-hour build.
- numpy 2.x: even a from-source build emits AVX-needing code via the OpenBLAS bundled in the wheels. Stick with 1.26.4.
- Anything using bitsandbytes int8/int4 quantisation: those kernels hard-require AVX2.

## Worth trying if you have AVX (no AVX2)

A 2011-era Sandy Bridge or later Intel CPU has AVX but no AVX2. Most of the patches above still apply, but you may not need patch 1 (the dynamo SIGFPE), and pyarrow/datasets/pandas may install (just not the AVX2-specific code paths). Try without the uninstalls first.

## Summary

If you want to do TTS fine-tuning on hardware below x86-64-v2:

- Do inference work on the GPU. Keep CPU-side code to file I/O and JSON.
- Pin numpy 1.26 + torch 2.7 + transformers 4.57.
- Stub or uninstall datasets/pyarrow/pandas/torchcodec.
- Patch torch._dynamo once per torch install.
- Pre-compute mel-spectrograms offline.
- Train at num_workers=0.

The rig produces useful output. It's not a fast-iteration machine — every upstream upgrade re-breaks something — but for fine-tuning (which doesn't need a fast-iteration machine) it's economical: an RTX 3060 12 GB on a 2010-era CPU running real-world TTS workloads.

Originally posted at netlinux-ai.github.io/2026/05/09/non-avx2-cpu-tts-compat/.