Tools: How to Make Your Rust Tests Run Faster in CI (A Practical Guide)

Slow CI pipelines are often blamed on:

- Heavy test suites
- Complex integrations
- Rust compilation time

But in many cases, the real issue is much simpler: your tests are not fully using the CPU available in the CI runner.

Step 1 — Understand How cargo test Uses Threads

Rust's test harness runs tests in parallel by default. However, in CI environments:

- CPU limits may restrict available cores
- Containers may expose fewer threads
- The harness may default to 1 thread in constrained setups

You should never assume your CI is using all available CPUs.

Step 2 — Check How Many CPUs Your Runner Has

Inside your CI job, run:

```bash
nproc
```

Output:

```text
2
```

This means the environment has 2 logical CPUs available. If you don't explicitly configure thread usage, your tests might not use both.

Step 3 — Explicitly Set --test-threads

Suppose your pipeline runs tests per module:

```yaml
script:
  - cargo test -p my_crate module_a
  - cargo test -p my_crate module_b
  - cargo test -p my_crate module_c
  - cargo test -p my_crate module_d
```

Each invocation runs sequentially. To ensure each test run uses all available CPU cores, capture the number of CPUs dynamically:

```yaml
script:
  - THREADS=$(nproc)
  - echo "Running tests with ${THREADS} threads"
  - cargo test -p my_crate module_a -- --test-threads=${THREADS}
  - cargo test -p my_crate module_b -- --test-threads=${THREADS}
  - cargo test -p my_crate module_c -- --test-threads=${THREADS}
  - cargo test -p my_crate module_d -- --test-threads=${THREADS}
```

Why the Double Dash (--) Is Important

The -- separator is critical. Everything before -- is interpreted by cargo; everything after -- is passed to the test binary (Rust's test harness). --test-threads is a test harness argument, not a cargo argument. If you forget the separator, the flag won't work.

Step 4 — Why Use $(nproc) Instead of a Fixed Number?

You could hardcode the value:

```text
--test-threads=2
```

But that creates a hidden maintenance issue. If the runner changes from 2 CPUs to 4, your CI won't scale automatically. Using:

```bash
THREADS=$(nproc)
```

ensures:

- Automatic adaptation
- No future edits required
- Better portability between environments

Step 5 — Make Sure Your Tests Are Safe to Parallelize

Parallel test execution requires test isolation. Your tests should:

- Avoid global mutable state
- Avoid shared in-memory singletons
- Avoid reusing the same database instance
- Avoid mutating global environment variables

A safe pattern is to instantiate dependencies per test:

```rust
fn create_test_repository() -> InMemoryRepository {
    InMemoryRepository::new()
}

#[tokio::test]
async fn example_test() {
    let repo = create_test_repository();
    // test logic here
}
```

Each test gets its own isolated state.

When You Should Disable Parallelism

If a test suite depends on shared external state (for example, a real database instance), you may need to force sequential execution:

```bash
cargo test -- --test-threads=1
```

Apply this only to specific test groups that require it. Do not disable parallelism globally unless necessary.

Optional Optimization: Avoid Repeating Expensive Setup

Sometimes slow tests are caused by repeated expensive operations (for example, hashing, cryptographic setup, or large fixture generation). You can cache computed values safely using OnceLock:

```rust
use std::sync::OnceLock;

static PRECOMPUTED_VALUE: OnceLock<String> = OnceLock::new();

fn get_precomputed_value() -> String {
    PRECOMPUTED_VALUE
        .get_or_init(|| expensive_operation())
        .clone()
}

fn expensive_operation() -> String {
    // Simulate heavy work
    "computed_result".to_string()
}
```

This guarantees:

- The expensive operation runs only once
- Tests remain deterministic
- No unsafe global mutation occurs

However, always evaluate tradeoffs:

- Does it significantly reduce runtime?
- Does it add unnecessary complexity?
- Is parallelism alone sufficient?

Often, proper thread configuration already solves most CI performance issues.

If your CI was running tests effectively single-threaded on a multi-core runner, explicitly configuring --test-threads can:

- Reduce test stage time dramatically
- Improve resource utilization
- Avoid unnecessary infrastructure upgrades

In many cases, improvements of 2–3x are realistic.

If your Rust CI feels slow, verify the following:

- How many CPUs does the runner expose? (nproc)
- Are tests running in parallel?
- Is --test-threads explicitly configured?
- Are tests properly isolated?
- Are expensive operations unnecessarily repeated?

Before rewriting your test suite or scaling infrastructure, make sure you are actually using the hardware available to you.
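A footnote to the nproc check in Step 2: `nproc` reports what the OS exposes, but the number Rust itself will use can differ under container CPU quotas. A small sketch (not from the guide itself) using the standard library's `std::thread::available_parallelism`, which on recent toolchains can take Linux cgroup quotas into account:

```rust
use std::thread;

fn main() {
    // available_parallelism() is the standard library's estimate of
    // usable threads; inside a quota-limited container it may report
    // fewer cores than `nproc` does.
    let threads = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1); // fall back to 1 if the estimate is unavailable
    println!("usable test threads: {}", threads);
}
```

Comparing this value against `nproc` in your CI job is a quick way to confirm whether a container quota is silently shrinking your parallelism.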
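For the "disable parallelism" case, there is a middle ground between full parallelism and a global `--test-threads=1`: serialize only the tests that share external state. A sketch of that pattern, where `DB_LOCK`, `db_guard`, and the test names are illustrative, not part of the original guide:

```rust
use std::sync::{Mutex, MutexGuard, OnceLock};

// Process-wide lock shared by the few tests that touch a real database.
// All other tests keep running in parallel.
static DB_LOCK: OnceLock<Mutex<()>> = OnceLock::new();

fn db_guard() -> MutexGuard<'static, ()> {
    DB_LOCK
        .get_or_init(|| Mutex::new(()))
        .lock()
        // A panic in another test poisons the lock; the protected data
        // is just (), so it is safe to recover the guard and continue.
        .unwrap_or_else(|poisoned| poisoned.into_inner())
}

#[cfg(test)]
mod tests {
    use super::db_guard;

    #[test]
    fn writes_to_shared_database() {
        let _guard = db_guard(); // held until the end of the test
        // ... touch the shared database here ...
    }

    #[test]
    fn reads_from_shared_database() {
        let _guard = db_guard();
        // ... touch the shared database here ...
    }
}

fn main() {
    // Demonstration outside the test harness: the first guard is dropped
    // before the second is taken, so both acquisitions succeed.
    drop(db_guard());
    drop(db_guard());
    println!("guard acquired and released twice");
}
```

The `serial_test` crate offers the same idea as a `#[serial]` attribute if you prefer not to manage the mutex yourself.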
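Finally, the claim that OnceLock runs the expensive operation only once holds even when many tests hit the cache concurrently, which is exactly the situation a parallel suite creates. A self-contained sketch (the counter and thread fan-out are added here for demonstration) that verifies it:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;
use std::thread;

// Counts how many times the "expensive" initializer actually runs.
static CALLS: AtomicUsize = AtomicUsize::new(0);
static VALUE: OnceLock<String> = OnceLock::new();

fn get_value() -> String {
    VALUE
        .get_or_init(|| {
            CALLS.fetch_add(1, Ordering::SeqCst);
            "computed_result".to_string() // stand-in for heavy work
        })
        .clone()
}

fn main() {
    // Simulate many parallel tests requesting the cached value at once.
    let handles: Vec<_> = (0..8).map(|_| thread::spawn(get_value)).collect();
    for h in handles {
        assert_eq!(h.join().unwrap(), "computed_result");
    }
    // OnceLock guarantees the initializer ran exactly once.
    println!("initializer ran {} time(s)", CALLS.load(Ordering::SeqCst));
}
```

Concurrent callers of `get_or_init` block until the single winning initializer finishes, so the result is both computed once and identical for every test.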