
Running modern Python TTS toolchains on non-AVX2 CPUs

- Quick triage
- Working pin-set
- Patches required
  - Patch 1: torch._dynamo SIGFPE on int division by zero
  - Patch 2: GPU-only mel-spectrogram computation
  - Patch 3: num_workers=0 everywhere
  - Patch 4: weights_only=False for older checkpoint formats
  - Patch 5: Stub datasets for transformers' lazy loader
  - Patch 6: --no-build-isolation for Cython extensions
- Per-project status
  - F5-TTS
  - StyleTTS2
  - kokoro
  - whisper.cpp
- What does not work
- Worth trying if you have AVX (no AVX2)

Notes from getting F5-TTS, StyleTTS2, kokoro/Misaki, and whisper.cpp to work on an AMD Phenom II X6 1090T (2010 K10/Family-10h architecture). The CPU has SSE/SSE2/SSE3/SSE4a, plus CX16/POPCNT/LAHF — but no SSE4.1, no SSE4.2, no AVX, no AVX2, no FMA, no F16C. That puts it below the modern x86-64-v2 baseline. A growing share of binary Python wheels in the AI ecosystem assume v2 or v3, so they SIGILL or SIGFPE at import. This is a ground-truth list of what we hit and what worked.

Quick triage

If your CPU is below x86-64-v2 (in particular, missing SSE4.1), expect:

- pyarrow: static-init pinsrq SIGILL on import
- numpy 2.x wheel: SIGILL on import (numpy 1.26.4 still has a fallback path)
- torch 2.10+ wheel: SIGFPE in torch._dynamo on import
- pandas modern wheels: SIGILL on tokenisation
- monotonic_align and other Cython extensions: build-from-source SIGILL
- DataLoader subprocess workers: SIGFPE re-importing torch

If your CPU is x86-64-v2 (Nehalem ~2008 or newer Intel; Bulldozer ~2011 or newer AMD) but missing AVX/AVX2, you'll still hit some of these, but fewer.

Working pin-set

These are the versions empirically verified to import and run on this CPU. For a fresh install, layer the pins after the project install:

```shell
pip install --prefer-binary <project>   # whatever you actually want
pip install --prefer-binary --force-reinstall --no-deps \
    "torch==2.7.0" "torchaudio==2.7.0" \
    "transformers==4.57.3" "numpy<2"
pip uninstall -y datasets pyarrow pyarrow-hotfix pandas torchcodec
```

Patches required

Patch 1: torch._dynamo SIGFPE on int division by zero

Even after pinning to torch 2.7.0, the very first dynamo init still SIGFPEs on this CPU. Cause: torch._dynamo.variables.torch_function.populate_builtin_to_tensor_fn_map() probes Python operators on dummy tensors, including tensor // 0 (integer floor-divide by zero). Newer Intel CPUs trap this into a Python ZeroDivisionError via a signal handler; AMD Phenom II just SIGFPEs. The function's output isn't actually needed for inference. Stub it:

```shell
F=$(python -c "import torch._dynamo.variables.torch_function as m; print(m.__file__)")
cp $F $F.orig
sed -i "0,/ global BUILTIN_TO_TENSOR_FN_MAP/s// return # patched: SIGFPE on Phenom II\n global BUILTIN_TO_TENSOR_FN_MAP/" $F
```

This is non-invasive — it only affects code that uses torch.compile() / dynamo paths, which most fine-tuning trainers don't.

Patch 2: GPU-only mel-spectrogram computation

torch.matmul on CPU SIGFPEs on this CPU, so anything that calls torchaudio's MelSpectrogram on CPU dies. For training pipelines that compute mels in the data loader, this is fatal. Two fixes:

a) Move the mel module to GPU (cheap audio→mel transfer per sample):

```python
to_mel = torchaudio.transforms.MelSpectrogram(...).to("cuda")

def preprocess(wave):
    wave = torch.from_numpy(wave).to("cuda")
    mel = to_mel(wave)
    return mel.cpu()  # back to CPU for the DataLoader collator
```

b) Pre-compute all mels once on GPU, save to disk, load at training time (example script).

(b) is faster overall — no per-sample audio→GPU transfer, just torch.load.

Patch 3: num_workers=0 everywhere

DataLoader spawns subprocess workers that re-import torch and re-run _dynamo init. Even with patch 1, the patched source isn't always picked up in the subprocess. Set num_workers=0 to keep all loading in the main process.

Patch 4: weights_only=False for older checkpoint formats

PyTorch 2.6+ flipped the default. If you load checkpoints saved before 2.6 that contain pickled Python objects, you need torch.load(path, weights_only=False). Affected: many published TTS pretrained models (StyleTTS2's ASR/JDC/PLBERT modules, F5-TTS in some cases).

Patch 5: Stub datasets for transformers' lazy loader

transformers.utils.import_utils._is_package_available("datasets") calls importlib.util.find_spec("datasets"), which raises ValueError if __spec__ is None. So if you provide a stub datasets module via sys.modules (to avoid pulling in pyarrow), it must have a real ModuleSpec:

```python
import importlib.machinery, types, sys

_stub = types.ModuleType("datasets")
_stub.__spec__ = importlib.machinery.ModuleSpec("datasets", loader=None)
_stub.Dataset = type("Dataset", (), {})
_stub.load_from_disk = lambda *a, **kw: None
sys.modules["datasets"] = _stub
```

Patch 6: --no-build-isolation for Cython extensions

monotonic_align (used by StyleTTS2) and similar packages build with their own ephemeral build-env via pip's build isolation. That ephemeral env re-installs numpy and cython and may pull AVX2 wheels. Use:

```shell
pip install --no-build-isolation --no-deps <package>
```

This forces the build to use your already-installed (pinned) numpy+cython.

Per-project status

F5-TTS

- Inference and training both work after patches 1–5.
- See the companion gist for a minimal trainer that bypasses datasets/accelerate.
- Issue filed: SWivid/F5-TTS#1292 (EMA-only checkpoint structure).

StyleTTS2

- Inference and fine-tune both work after patches 1, 2, 3, 4, 6.
- PRs filed: yl4579/StyleTTS2#361 (weights_only=False), #362 (drop pandas).

kokoro

- Inference works (via the kokoro-onnx ONNX runtime path; the PyTorch path is blocked by upstream dep pinning, not by the CPU).
- Issue filed: hexgrad/kokoro#321 (broken misaki>=0.7.16 PyPI pin).

whisper.cpp

- Works out of the box. Pure C++, no Python wheels involved. CUDA inference on the GPU.

What does not work

- pyarrow source build: succeeds eventually, but the resulting library still uses SSE4.1 in places (Apache Arrow's CMake ARROW_SIMD_LEVEL=NONE doesn't cover everything). Not worth the multi-hour build.
- numpy 2.x: even a from-source build emits AVX-needing code via the bundled OpenBLAS wheels. Stick with 1.26.4.
- Anything using bitsandbytes int8/int4 quantisation: those kernels hard-require AVX2.

Worth trying if you have AVX (no AVX2)

A 2011-era Sandy Bridge or later Intel CPU has AVX but no AVX2. Most of the patches above still apply, but you may not need patch 1 (the dynamo SIGFPE), and pyarrow/datasets/pandas may install (just not the AVX2-specific code paths). Try without the uninstalls first.

Summary

If you want to do TTS fine-tuning on hardware below x86-64-v2:

- Do inference work on the GPU. Keep CPU-side code to file I/O and JSON.
- Pin numpy 1.26 + torch 2.7 + transformers 4.57.
- Stub or uninstall datasets/pyarrow/pandas/torchcodec.
- Patch torch._dynamo once per torch install.
- Pre-compute mel-spectrograms offline.
- Train at num_workers=0.

The rig produces useful output. It's not a fast-iteration machine — every upstream upgrade re-breaks something — but for fine-tuning (which doesn't need a fast-iteration machine) it's economical: an RTX 3060 12 GB on a 2010-era CPU running real-world TTS workloads.

Originally posted at netlinux-ai.github.io/2026/05/09/non-avx2-cpu-tts-compat/.
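The working pin-set can also be captured as a pip constraints file, so that later installs in the same venv cannot silently upgrade past the verified versions. This is a sketch (the filename is arbitrary); it complements, not replaces, the --force-reinstall step above, since constraints only bound versions, they don't install anything.

```
# constraints-nonavx2.txt
# Use with: pip install -c constraints-nonavx2.txt <project>
torch==2.7.0
torchaudio==2.7.0
transformers==4.57.3
numpy<2
```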
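To make the quick-triage step concrete, here is a small Linux-only sketch (assumption: /proc/cpuinfo is available and uses the kernel's lowercase flag names like sse4_2 and avx2) that classifies a machine against the x86-64 microarchitecture levels the article references. The function names are illustrative, not from any library.

```python
# Rough triage of x86-64 feature levels on Linux via /proc/cpuinfo.
# Flag spellings (sse4_1, sse4_2, avx2, fma, ...) follow the kernel's output.

def cpu_flags(path="/proc/cpuinfo"):
    """Return the set of CPU feature flags reported by the kernel."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

def simd_level(flags):
    # x86-64-v2 requires SSE4.1/SSE4.2 (plus POPCNT, CX16, SSSE3);
    # x86-64-v3 additionally requires AVX/AVX2/FMA/F16C/BMI.
    if {"avx2", "fma"} <= flags:
        return "x86-64-v3 or better: AVX2 wheels should work"
    if "sse4_2" in flags:
        return "x86-64-v2: expect occasional AVX2-only wheel breakage"
    return "below x86-64-v2: expect widespread SIGILL; pin everything"
```

On the Phenom II described above (SSE4a but no SSE4.1/4.2), this lands in the "below x86-64-v2" bucket, which is the regime where all six patches apply.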
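The ModuleSpec requirement in patch 5 can be demonstrated with the standard library alone: a bare stub module in sys.modules has __spec__ set to None, and importlib.util.find_spec raises ValueError for exactly that case, which is what trips transformers' lazy loader.

```python
import importlib.machinery
import importlib.util
import sys
import types

# Failure mode: a plain ModuleType has __spec__ = None, and find_spec
# raises ValueError for modules in sys.modules whose __spec__ is None.
stub = types.ModuleType("datasets")
sys.modules["datasets"] = stub
try:
    importlib.util.find_spec("datasets")
    raised = False
except ValueError:
    raised = True  # this is the crash inside _is_package_available

# Fix (patch 5): attach a real ModuleSpec; find_spec now returns it.
stub.__spec__ = importlib.machinery.ModuleSpec("datasets", loader=None)
spec = importlib.util.find_spec("datasets")
```

Because find_spec consults sys.modules first, the stub shadows any real datasets package, which is the whole point of the trick: transformers sees "datasets is available" without pyarrow ever being imported.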
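Option (b) of patch 2 — compute features once, reload from disk thereafter — can be sketched framework-agnostically. This is not the article's example script; it is a minimal disk-memoisation pattern with the expensive step injected as a function (in the real pipeline that would be a GPU MelSpectrogram wrapper, and you would use torch.save/torch.load instead of pickle).

```python
import os
import pickle

def cache_features(items, compute, cache_dir):
    """Compute features once per key and memoise them on disk.

    items:   iterable of (key, raw) pairs, e.g. (utterance id, waveform)
    compute: raw -> feature; the expensive step (e.g. GPU mel extraction)
    """
    os.makedirs(cache_dir, exist_ok=True)
    out = {}
    for key, raw in items:
        path = os.path.join(cache_dir, key + ".pkl")
        if os.path.exists(path):
            with open(path, "rb") as f:   # cache hit: cheap disk load
                out[key] = pickle.load(f)
        else:
            feat = compute(raw)           # cache miss: compute once
            with open(path, "wb") as f:
                pickle.dump(feat, f)
            out[key] = feat
    return out
```

The second and later epochs then pay only for deserialisation, which is why the article finds (b) faster than shuttling raw audio to the GPU per sample.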