Tools: The Pitfalls of Test Coverage: Introducing Mutation Testing with Stryker and Cosmic Ray

Source: Dev.to

## Overview

We often believe that high test coverage means safe code. However, it is hard to answer the question "Who tests the tests?" Tests that simply execute code without meaningful assertions still count toward coverage metrics. To escape this "coverage trap", we introduced mutation testing. A mutation testing tool makes small, deliberate changes (mutants) to the production code and checks whether at least one test fails. A mutant that no test catches "survives" and points to logic the tests do not actually verify.

- Goal: Overcome the limitations of code coverage metrics by introducing mutation testing, which verifies whether the tests actually catch errors in business logic.
- Scope: Core modules of the enterprise orchestrator project (Ochestrator), in both the frontend (TypeScript) and the backend (Python).
- Expected Results: Improve code stability and test reliability by tracking a mutation score rather than line coverage alone.

## Implementation

### 1. TypeScript Environment: Introducing Stryker Mutator

For the TypeScript environment, including the frontend and shared utilities, we chose Stryker. It integrates well with Vitest (the `vitest` test runner comes from the `@stryker-mutator/vitest-runner` plugin) and is easy to configure. We enabled the `incremental` option so that later runs reuse earlier results and only re-test what has changed.

- Tech Stack: TypeScript, Vitest, Stryker Mutator
- Key Configuration (`stryker.config.json`):

```json
{
  "testRunner": "vitest",
  "reporters": ["html", "clear-text", "progress"],
  "concurrency": 4,
  "incremental": true,
  "mutate": [
    "src/utils/**/*.ts",
    "src/services/**/*.ts"
  ]
}
```

### 2. Python Environment: Introducing Cosmic Ray

For the backend, we introduced Cosmic Ray. It generates mutants by manipulating the AST (Abstract Syntax Tree) of the Python source, taking advantage of Python's dynamic nature.

- Tech Stack: Python, Pytest, Cosmic Ray, Docker
- Execution Architecture: Mutation testing consumes significant computational resources, so we run it in parallel across multiple worker containers with Docker Compose.

```yaml
# Partial docker-compose.test.yaml
services:
  cosmic-worker-1:
    command: uv run cosmic-ray worker cosmic.sqlite
  # cosmic-worker-2 is defined the same way (omitted here)

  cosmic-runner:
    depends_on: [cosmic-worker-1, cosmic-worker-2]
    # A plain multi-line `command:` string is split into a single argument
    # list, not run line by line, so wrap both steps in a shell:
    # create the session first, then execute it.
    command:
      - sh
      - -c
      - |
        uv run cosmic-ray init cosmic-ray.toml cosmic.sqlite &&
        uv run cosmic-ray exec cosmic-ray.toml cosmic.sqlite
```

## Debugging/Challenges

### Real-world Case: Survived Mutants in VideoSplitter.ts

The most interesting case was `videoSplitter.ts`, which handles video splitting. This file had over 95% line coverage, but Stryker revealed a surprising result.

- Problem Statement: A large number of mutants survived in the logic that checks available memory.

```typescript
// Original code
if (availableMemory < requiredMemory) {
  throw new Error("Insufficient memory.");
}
```

Even when Stryker changed this condition to `if (false)` or `if (availableMemory <= requiredMemory)`, all existing tests still passed.

- Root Cause Analysis: The existing tests focused only on "whether an error occurs" and had no boundary-value tests pinning down exactly which conditions trigger the error. In other words, coverage was high, but the actual logic was not being thoroughly verified.
- Solution: To kill the surviving mutants, we reinforced the test cases using boundary value analysis.

```typescript
import { test } from "vitest";

test("Boundary value verification for memory", () => {
  // Simulate situations where available memory is exactly equal to,
  // or slightly less than, requiredMemory
  // ... reinforced test code ...
});
```
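To make the survived-mutant story concrete, here is a small self-contained sketch that replays it without Stryker. It uses a hypothetical predicate as a stand-in for the memory check in `videoSplitter.ts`, since the real module's signature is not shown in this post. The two mutants mentioned above are written by hand, whereas Stryker generates and runs mutants like these automatically. The sketch uses plain `node:assert` instead of Vitest so it can run on its own.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical stand-in for the memory guard in videoSplitter.ts: returns
// true when the call should be rejected. Each mutant swaps this predicate.
type Guard = (available: number, required: number) => boolean;
type Check = (available: number, required: number) => void;
type Suite = (check: Check) => void;

const original: Guard = (available, required) => available < required;

// Hand-written versions of the two mutants described above.
const mutants: Record<string, Guard> = {
  "available <= required": (available, required) => available <= required,
  "false": () => false,
};

// Builds the guarded function around a given predicate.
function makeCheck(shouldReject: Guard): Check {
  return (available, required) => {
    if (shouldReject(available, required)) {
      throw new Error("Insufficient memory.");
    }
  };
}

// A suite "passes" when none of its assertions throw.
function passes(suite: Suite, check: Check): boolean {
  try {
    suite(check);
    return true;
  } catch {
    return false;
  }
}

// Weak suite: evaluates the condition (so the line is covered) but only
// asserts the happy path, far away from the boundary.
const weakSuite: Suite = (check) => {
  assert.doesNotThrow(() => check(8192, 1024));
};

// Boundary suite: pins down exactly where the error starts.
const boundarySuite: Suite = (check) => {
  assert.doesNotThrow(() => check(1024, 1024)); // equal: allowed
  assert.throws(() => check(1023, 1024), /Insufficient memory/); // one below: rejected
};

// Returns the names of the mutants that the suite fails to kill.
function survivors(suite: Suite): string[] {
  if (!passes(suite, makeCheck(original))) {
    throw new Error("suite must pass against the original code");
  }
  return Object.entries(mutants)
    .filter(([, guard]) => passes(suite, makeCheck(guard)))
    .map(([name]) => name);
}

console.log(survivors(weakSuite)); // → [ 'available <= required', 'false' ]
console.log(survivors(boundarySuite)); // → []
```

The weak suite gives the guard full line coverage, yet both mutants survive it. Adding one case exactly at the boundary and one just below it is enough to kill both.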
## Results

- Achievements:
  - Discovered and killed 12 surviving mutants in core utility modules.
  - Moved the tests from simply executing the code to actually verifying it.
- Key Metrics:
  - Mutation Score: Improved from an initial 62% to 88%.
  - Reliability: Prevented potential regression bugs by running the `test:mutation` script before deployment.
- User Feedback: Team members reacted positively: "I can now refactor with confidence, trusting our tests."

## Key Takeaways

- Coverage is just the beginning: Line coverage only tells you "what is not tested", not "the quality of what is tested".
- Mutation testing is expensive but worth it: A full run takes time (up to tens of minutes), but it is essential for core business logic and complex utilities.
- Incremental Adoption: Rather than applying it to all code at once, start with core infrastructure code such as VideoSplitter and build success stories from there.
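The mutation score reported under Results is, in Stryker's definition, the share of valid mutants that the tests detected. Below is a minimal sketch of that arithmetic, plus a hypothetical pre-deployment gate of the kind a `test:mutation` step could enforce. The status strings mirror the statuses in Stryker reports. The `gate` helper and the sample counts are illustrative and do not come from the project. Stryker can also enforce a minimum score on its own through the `thresholds.break` option in its config.

```typescript
import { strict as assert } from "node:assert";

// Status names as they appear in Stryker reports (subset used here).
type MutantStatus =
  | "Killed"
  | "Timeout"
  | "Survived"
  | "NoCoverage"
  | "CompileError"
  | "RuntimeError"
  | "Ignored";

// Mutation score in percent: detected / valid * 100, where
// detected = Killed + Timeout and valid = detected + Survived + NoCoverage.
// CompileError, RuntimeError and Ignored mutants count on neither side.
function mutationScore(statuses: MutantStatus[]): number {
  let detected = 0;
  let undetected = 0;
  for (const status of statuses) {
    if (status === "Killed" || status === "Timeout") detected++;
    else if (status === "Survived" || status === "NoCoverage") undetected++;
  }
  const valid = detected + undetected;
  return valid === 0 ? NaN : (detected * 100) / valid;
}

// Hypothetical pre-deployment gate: throws when the score is below minScore.
// A NaN score (no valid mutants at all) also fails the gate.
function gate(statuses: MutantStatus[], minScore: number): void {
  const score = mutationScore(statuses);
  if (!(score >= minScore)) {
    throw new Error(`Mutation score ${score.toFixed(1)}% is below ${minScore}%`);
  }
}

// Made-up example run: 7 killed, 1 timeout, 2 survived, 1 compile error.
const sample: MutantStatus[] = [
  ...new Array<MutantStatus>(7).fill("Killed"),
  "Timeout",
  "Survived",
  "Survived",
  "CompileError",
];
console.log(mutationScore(sample)); // → 80
```

The compile error in the sample is left out of both sides, so 8 detected mutants out of 10 valid ones gives 80%. Under this definition, the "survived" mutants from the VideoSplitter case count directly against the score.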