Tools: Ultimate Guide: VoIP MOS Score Testing Tools
MOS Score Basics (The 60-Second Version)
The G.711 Ceiling
Method 1: Wireshark RTP Stream Analysis (Free)
Analysis
Estimating MOS from Wireshark Data
What "Normal" Looks Like
Method 2: Asterisk CLI (Free)
Checking Active Channel Quality
Enabling RTP Statistics Logging
Using CDR Quality Fields
Method 3: Command-Line Network Testing (Free)
iperf3 — Bandwidth and Jitter Test
mtr — Latency and Path Analysis
ping — Basic Latency Check
Method 4: PESQ and POLQA (Commercial)
How They Work
PESQ vs POLQA
Practical PESQ Testing Setup
Method 5: Continuous Monitoring (Free + Commercial)
Grafana + Asterisk CDR (Free)
Commercial Monitoring Tools
Interpreting Results: What Bad MOS Scores Mean
High Packet Loss (>1%)
High Jitter (>30ms)
High Latency (>150ms one-way)
MOS Score Testing Checklist for VICIdial

Last updated: March 2026 | Reading time: ~22 minutes

Your agents are complaining about call quality. Customers sound robotic. There's echo. Words get clipped. Your manager wants a number that proves the calls are bad, and your carrier wants a number that proves they're not responsible.

That number is the MOS score (Mean Opinion Score), and it's been the standard metric for voice quality since 1996. The problem is that most people who talk about MOS scores have never measured one. They parrot the "4.0+ is good, below 3.5 is bad" line without knowing how to produce the number in the first place.

I'm going to show you exactly how to measure MOS scores on your VoIP/VICIdial system using tools that range from free (Wireshark, Asterisk CLI) to expensive (POLQA hardware probes). You'll leave here with actual testing procedures, not marketing copy.

MOS Score Basics (The 60-Second Version)

MOS was originally defined by ITU-T P.800 as a subjective test: you play audio samples to a panel of listeners, they rate quality from 1 to 5, and you average the results. That's the "Mean Opinion Score": literally the average opinion of human listeners.

For a call center, 4.0 is your minimum target. Anything below 3.6 and your agents will struggle to understand customers, call times increase, customer satisfaction drops, and conversion rates tank. We've measured the correlation across our client base: a 0.5-point MOS drop corresponds to roughly 8-12% longer average handle times.

The G.711 Ceiling

No VoIP call can ever reach a perfect 5.0 MOS. Even under laboratory conditions with zero packet loss and zero jitter, the codec itself introduces some degradation:

If you're using G.729 to save bandwidth, your MOS ceiling is 3.92 before any network impairment. Add even 1% packet loss and you're below 3.5. For call center quality, G.711 ulaw is the standard because it gives you the most headroom before quality degrades.

Method 1: Wireshark RTP Stream Analysis (Free)

Wireshark is the most practical tool for diagnosing VoIP quality issues.
It captures the actual RTP packets and calculates jitter and packet loss for every stream in the capture, which are the inputs you need to estimate MOS.

Install Wireshark on a machine that can see VoIP traffic. This could be:

Start a capture with an RTP-focused filter:

(Adjust the port range to match your Asterisk RTP port configuration in rtp.conf:)

Analysis

After capturing, go to Telephony > RTP > RTP Streams in Wireshark. You'll see every RTP stream in the capture with:

Select a stream and click Analyze. The detailed view shows:

Estimating MOS from Wireshark Data

Wireshark doesn't directly display MOS scores, but you can calculate an estimate using the E-model (ITU-T G.107) from the captured metrics:

Step 1: Get your jitter and packet loss numbers from the RTP stream analysis.

Step 2: Calculate the R-factor:

This is a simplified version of the E-model formula. The full E-model has dozens of parameters, but for practical call center diagnostics, the simplified version gets you within 0.2 points of the full calculation.

Step 3: Convert R-factor to MOS:

Example: You measure 0.5% packet loss and 12 ms average jitter:

That's excellent quality. Contrast with 3% packet loss and 40 ms jitter:

Still acceptable but noticeably degraded.

What "Normal" Looks Like

From our deployments, here's what healthy VICIdial systems show in Wireshark:

Method 2: Asterisk CLI (Free)

Asterisk provides real-time quality metrics for active calls through the CLI. This won't give you a MOS score directly, but it shows the underlying metrics (jitter, loss, RTT) that you can plug into the E-model formula.

Checking Active Channel Quality

While a call is in progress:

Look for the RTP statistics section in the output. You'll see:

For chan_sip (legacy):

Enabling RTP Statistics Logging

You can configure Asterisk to log RTP quality metrics for every call. In rtp.conf:

Then in Asterisk CLI:

This outputs per-packet RTP statistics to the console, including jitter calculations. It's verbose, so don't leave it on in production. Use it for diagnostic sessions.

Using CDR Quality Fields

Asterisk CDRs (Call Detail Records) include quality data if your system is configured to log it.
The userfield or custom CDR variables can capture end-of-call RTP statistics. You can query these from the VICIdial database:

If quality data isn't showing up in your CDRs, you may need to add AGI or dialplan logic to capture the channel's RTP stats before hangup and write them to the CDR.

Method 3: Command-Line Network Testing (Free)

Before you blame the carrier for bad audio, verify that your network can handle VoIP traffic. These tools test the network path between your Asterisk server and the carrier without involving actual calls.

iperf3: Bandwidth and Jitter Test

Install on your Asterisk server and a remote endpoint:

The -u flag uses UDP (like RTP), -b 1M sets bandwidth to 1 Mbps (roughly 12 G.711 streams at about 80 kbps each, counting IP/UDP/RTP headers at 20 ms packetization), and -t 30 runs the test for 30 seconds.

mtr: Latency and Path Analysis

This sends 100 probes and reports per-hop latency and loss. Look for:

ping: Basic Latency Check

Fast-paced ping (50 per second) to simulate real-time traffic timing. Check:

Method 4: PESQ and POLQA (Commercial)

PESQ (ITU-T P.862) and its successor POLQA (ITU-T P.863) are the gold standard for objective voice quality measurement. They work by comparing a reference audio signal to the degraded signal after it has traveled through your VoIP system.

PESQ vs POLQA

For VICIdial deployments using G.711, PESQ is fine and much cheaper. POLQA is needed only if you're running wideband codecs or need to validate modern codec quality.

Practical PESQ Testing Setup

The ITU published the PESQ algorithm source code (P.862 reference implementation) for research purposes. You can find it referenced in academic papers and some open-source projects. It's not commercially licensed for production use, but it's useful for internal testing.

There's also the ViSQOL (Virtual Speech Quality Objective Listener) project from Google, which is open source and available on GitHub. It provides MOS predictions using a different algorithm than PESQ/POLQA but with reasonable accuracy.

For a VICIdial system, here's a practical test workflow:

This tests the full path: local network → Asterisk → MeetMe bridge → SIP trunk → carrier → return path.
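As a sanity check on the iperf3 sizing from Method 3 (1 Mbps standing in for roughly 12 G.711 calls), the per-stream bandwidth can be worked out from packet sizes. The Python sketch below is illustrative only: it assumes 20 ms packetization, IPv4 without options, and untagged Ethernet framing, none of which the article states, and the function names are made up for this example.

```python
# Rough per-stream bandwidth for G.711, one direction.
# All constants below are assumptions for illustration, not values from the article.
CODEC_BITRATE_BPS = 64_000       # G.711 payload bitrate
PACKETS_PER_SECOND = 50          # 20 ms of audio per RTP packet
RTP_UDP_IP_HEADER_BYTES = 12 + 8 + 20   # RTP + UDP + IPv4 (no options)
ETHERNET_OVERHEAD_BYTES = 18     # 14-byte header + 4-byte FCS, no VLAN tag


def per_stream_bps(include_ethernet: bool = False) -> int:
    """Bits per second consumed by one G.711 RTP stream in one direction."""
    payload_bytes = CODEC_BITRATE_BPS // 8 // PACKETS_PER_SECOND  # 160 bytes
    packet_bytes = payload_bytes + RTP_UDP_IP_HEADER_BYTES
    if include_ethernet:
        packet_bytes += ETHERNET_OVERHEAD_BYTES
    return packet_bytes * 8 * PACKETS_PER_SECOND


def streams_that_fit(link_bps: int, include_ethernet: bool = False) -> int:
    """How many whole streams fit into a link of the given capacity."""
    return link_bps // per_stream_bps(include_ethernet)


if __name__ == "__main__":
    print("IP-layer bps per stream:", per_stream_bps())
    print("Ethernet-layer bps per stream:", per_stream_bps(include_ethernet=True))
    print("Streams in 1 Mbps (IP layer):", streams_that_fit(1_000_000))
    print("Streams in 1 Mbps (Ethernet):", streams_that_fit(1_000_000, True))
```

Counting only IP, UDP, and RTP headers, twelve streams fit in 1 Mbps; once Ethernet framing is included the figure drops to eleven, so treat the 1 Mbps test as an approximation rather than an exact call count.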
Method 5: Continuous Monitoring (Free + Commercial)

One-time tests are useful for diagnostics, but VoIP quality changes throughout the day as network conditions fluctuate. Continuous monitoring catches degradation before agents start complaining.

VoIPmonitor is an open-source network packet sniffer that captures and analyzes SIP/RTP traffic. It calculates MOS scores for every call and stores them in a database for trending.

Install on your Asterisk server (or a mirror port):

Configuration in /etc/voipmonitor.conf:

VoIPmonitor produces per-call quality reports with:

Grafana + Asterisk CDR (Free)

If you're already running Grafana dashboards for VICIdial, you can add VoIP quality panels. The process:

This gives operations managers a visual dashboard showing call quality trends without needing to dig through packet captures.

Commercial Monitoring Tools

For most VICIdial call centers, VoIPmonitor (free) plus Wireshark (for deep dives) covers 90% of quality monitoring needs.

Interpreting Results: What Bad MOS Scores Mean

You've measured your MOS scores. They're bad. Now what? The root cause maps to specific metrics:

High Packet Loss (>1%)

Symptoms: Choppy audio, missing syllables, robot-voice effect.

High Jitter (>30ms)

Symptoms: Inconsistent audio quality, sometimes fine, sometimes garbled. Words arrive out of order or with gaps.

High Latency (>150ms one-way)

Symptoms: Conversation feels unnatural. People talk over each other and there are awkward pauses. Not distorted, just delayed.

MOS Score Testing Checklist for VICIdial

Before you blame the carrier, run through this checklist:

- Test during peak hours: Quality problems often only appear when your network and the carrier's network are both busy. Testing at 3 AM tells you nothing about 10 AM quality.
- Test both directions: Jitter and loss can be asymmetric. The customer-to-agent path might be fine while the agent-to-customer path is degraded.
- Test multiple carriers: If you have multiple SIP trunks, compare MOS scores across carriers. If one carrier consistently scores lower, you have your answer.
- Test at your call volume: Quality at 5 concurrent calls and quality at 100 concurrent calls are different. Bandwidth contention, Asterisk CPU load, and carrier congestion all increase with scale.
- Document your baseline: Capture MOS scores during a period of known good quality. When problems arise, you can compare against the baseline to quantify the degradation.

When MOS Testing Reveals the Real Problem

We audited a 75-agent VICIdial center that was convinced they needed a carrier change because agents reported terrible audio quality. We deployed VoIPmonitor for a week and collected MOS data for every call.

- Average MOS on the carrier trunk: 4.28 (excellent)
- Average MOS on agent-side legs: 3.21 (poor)

The carrier was fine. The problem was the agents' network: they were remote agents using consumer-grade ISPs with no QoS, WiFi connections with interference, and shared household bandwidth. Three agents were on satellite internet with 600ms latency.

The fix wasn't a carrier change. It was QoS configuration at agent home routers, requiring wired Ethernet for agent workstations, and replacing the satellite-internet agents' connections with terrestrial broadband. Total cost: about $2,000 in router upgrades versus the $3,000/month carrier switch they were planning.

That's the value of measuring MOS instead of guessing. And it's the kind of analysis we do as part of every ViciStack engagement: diagnose the real problem with real data before spending money on the wrong fix.
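To put numbers like these in context, the E-model estimate described in the Wireshark section can be sketched in Python. The R-to-MOS conversion follows the ITU-T G.107 mapping. The R-factor step uses a widely circulated simplification (effective latency = latency + 2 x jitter + 10 ms, then 2.5 R-points subtracted per percent of packet loss); that is an assumption here and may differ from the formula the article originally showed. Function names and the sample inputs are hypothetical.

```python
def r_to_mos(r: float) -> float:
    """Convert an E-model R-factor to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    mos = 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6
    return max(1.0, mos)  # the polynomial dips slightly below 1 for very small R


def estimate_r(latency_ms: float, jitter_ms: float, loss_pct: float) -> float:
    """Simplified R-factor (assumed approximation, not the full E-model)."""
    # Jitter is weighted double and 10 ms is added for codec/processing delay.
    effective_latency = latency_ms + 2 * jitter_ms + 10
    if effective_latency < 160:
        r = 93.2 - effective_latency / 40
    else:
        r = 93.2 - (effective_latency - 120) / 10
    r -= 2.5 * loss_pct  # each 1% of loss costs 2.5 R-points
    return max(0.0, min(100.0, r))


def estimate_mos(latency_ms: float, jitter_ms: float, loss_pct: float) -> float:
    """Estimated MOS from one-way latency (ms), jitter (ms), and loss (%)."""
    return r_to_mos(estimate_r(latency_ms, jitter_ms, loss_pct))


if __name__ == "__main__":
    # Hypothetical inputs: a healthy LAN leg and a high-latency satellite leg.
    print("healthy leg:", round(estimate_mos(20, 12, 0.5), 2))
    print("satellite leg:", round(estimate_mos(600, 10, 0.0), 2))
```

With these assumptions, a high-latency link lands well below the 3.6 threshold even with zero packet loss, which is consistent with the satellite agents dragging down the agent-side average. Treat the output as a rough screening number, not a substitute for VoIPmonitor or PESQ measurements.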