Tools: How We Added Silence Removal to SendRec

Source: Dev.to

Anyone who has watched a screen recording knows the feeling: the presenter pauses to think, navigates a menu slowly, or just trails off between thoughts. Those dead seconds add up fast. In v1.65.4, SendRec can detect and remove them automatically.

We already had filler word removal: detecting "um", "uh", and similar disfluencies from transcript data, presenting them as a checklist, and cutting them out. Silence removal follows the same pattern, but operates purely on audio energy rather than transcription.

## Detection: ffmpeg's silencedetect filter

ffmpeg ships a `silencedetect` audio filter that scans an audio stream and emits timestamps whenever audio drops below a configurable noise floor for at least a minimum duration.

```bash
ffmpeg -i input.mp4 -vn -af silencedetect=noise=-30dB:d=1.0 -f null -
```

The `-vn` flag tells ffmpeg to skip video decoding entirely. Silence detection only needs the audio stream, and skipping video makes the scan significantly faster on large files.

ffmpeg writes silence events to stderr:

```
[silencedetect @ 0x...] silence_start: 3.504
[silencedetect @ 0x...] silence_end: 5.200 | silence_duration: 1.696
```

We parse this with a pair of regexes:

```go
startRe := regexp.MustCompile(`silence_start:\s*([\d.]+)`)
endRe := regexp.MustCompile(`silence_end:\s*([\d.]+)`)
```

Walk the stderr lines, match each pattern, pair starts with ends. If audio is silent at the very end of a recording, ffmpeg emits a `silence_start` with no corresponding `silence_end`; we discard those unpaired starts.

The result is a list of `{start, end}` pairs returned via a new endpoint:

```
POST /api/videos/{id}/detect-silence
```

The request body accepts two optional parameters: `noiseDB` (default -30) and `minDuration` (default 1.0 seconds).

## The presigned URL optimization

The initial implementation downloaded the full video from S3 to a temp file on disk, then ran ffmpeg against the local path. This worked, but the wait before detection started scaled linearly with file size. A large recording meant waiting for the entire download before ffmpeg processed a single audio frame.

The fix: generate a presigned S3 URL and pass it directly to ffmpeg.

```go
presignedURL, err := h.storage.GenerateDownloadURL(ctx, fileKey, 15*time.Minute)

cmd := exec.Command("ffmpeg",
    "-i", presignedURL,
    "-vn",
    "-af", fmt.Sprintf("silencedetect=noise=%ddB:d=%.2f", noiseDB, minDuration),
    "-f", "null", "-",
)
```

ffmpeg handles HTTPS inputs natively. It starts streaming and processing audio immediately, with no temp file and no download wait. Combined with `-vn` skipping the video track, detection on a long recording completes in seconds rather than minutes.

## Duration clamping

ffmpeg reports timestamps as floats. Our database stores video duration as an integer (seconds). This mismatch caused a subtle bug: a video stored as 120 seconds could have ffmpeg report silence ending at 120.041 seconds, which the segment removal worker would reject with "segment end exceeds video duration."

The fix is a clamping step before returning results:

- Drop any segment whose start is at or beyond the stored duration
- Cap any segment end at the stored duration
- Drop any segment shorter than 0.1 seconds after clamping

This keeps detection results consistent with what the removal worker expects, regardless of float-to-integer rounding.

## Frontend

The UI mirrors the existing filler word removal modal. Click "Remove Silence" on a video, and a modal shows each detected pause as a checkbox entry with timestamp range and duration, all selected by default. Confirm, and the segments are handed off to the same `removeSegmentsFromVideo` worker that handles filler word removal. No new cut logic was needed.

## What we learned

The presigned URL approach is worth considering any time you run a command-line tool against a cloud-stored file. Passing a URL directly to ffmpeg eliminates the download-to-disk step, and ffmpeg's streaming behavior means processing starts almost immediately. Combined with `-vn` for audio-only analysis, the latency improvement on large files is substantial.

The float/integer duration mismatch only showed up with real recordings. ffmpeg is precise; databases round. A clamping step at the boundary prevents confusing errors downstream.

SendRec is open source. The full implementation is on GitHub.
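
To make the parsing and clamping steps concrete, here is a minimal Go sketch of how they could fit together. It is not SendRec's actual code: the `Segment` type, the `parseSilence` and `clampSegments` functions, and the `minSegment` constant are hypothetical names chosen for illustration, and the second pair of sample stderr lines is invented to reproduce the 120.041-second case. Only the two regexes and the three clamping rules come from the article.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// Segment is one detected pause, in seconds (hypothetical type for this sketch).
type Segment struct {
	Start float64
	End   float64
}

// The two patterns quoted in the article.
var (
	startRe = regexp.MustCompile(`silence_start:\s*([\d.]+)`)
	endRe   = regexp.MustCompile(`silence_end:\s*([\d.]+)`)
)

// minSegment is the shortest segment kept after clamping (0.1 s per the article).
const minSegment = 0.1

// parseSilence walks ffmpeg's stderr output and pairs each silence_start
// with the next silence_end. A trailing start with no end is discarded.
func parseSilence(stderr string) []Segment {
	var segs []Segment
	start, open := 0.0, false
	sc := bufio.NewScanner(strings.NewReader(stderr))
	for sc.Scan() {
		line := sc.Text()
		if m := startRe.FindStringSubmatch(line); m != nil {
			if v, err := strconv.ParseFloat(m[1], 64); err == nil {
				start, open = v, true
			}
			continue
		}
		if m := endRe.FindStringSubmatch(line); m != nil && open {
			if v, err := strconv.ParseFloat(m[1], 64); err == nil {
				segs = append(segs, Segment{Start: start, End: v})
			}
			open = false
		}
	}
	return segs
}

// clampSegments applies the three rules against the integer duration
// stored in the database.
func clampSegments(segs []Segment, durationSec int) []Segment {
	limit := float64(durationSec)
	out := make([]Segment, 0, len(segs))
	for _, s := range segs {
		if s.Start >= limit {
			continue // starts at or beyond the stored duration
		}
		if s.End > limit {
			s.End = limit // cap the end at the stored duration
		}
		if s.End-s.Start < minSegment {
			continue // too short to be worth cutting after clamping
		}
		out = append(out, s)
	}
	return out
}

func main() {
	// Sample stderr; the second pair is made up for this example.
	stderr := `[silencedetect @ 0x...] silence_start: 3.504
[silencedetect @ 0x...] silence_end: 5.200 | silence_duration: 1.696
[silencedetect @ 0x...] silence_start: 118.2
[silencedetect @ 0x...] silence_end: 120.041 | silence_duration: 1.841`
	for _, s := range clampSegments(parseSilence(stderr), 120) {
		fmt.Printf("%.3f-%.3f\n", s.Start, s.End)
	}
}
```

Clamping at the detection boundary, rather than relaxing the check in the removal worker, keeps the worker's validation strict for every caller. In this sketch the second segment's end is capped from 120.041 to the stored 120 seconds, so it would no longer trip the "segment end exceeds video duration" check.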