How We Built a 5x Faster NotebookLM Watermark Tool on AWS — Engineering Story
2026-02-20
*Originally published at notebooklmstudio.com*

## The Technical Problem

Every product has an origin story. Ours started with a frozen browser tab.

I was working on a research presentation, had exported a 45-page PDF from Google NotebookLM, and needed the watermark removed before a meeting. I loaded the free browser tool, uploaded the file, and watched my MacBook's fan spin up as the browser processed page by page. Three minutes later — with my browser tab completely frozen — I had my clean PDF.

Three minutes for something a computer should be able to do in seconds. That was the moment I decided to build something better.

The existing tools all shared the same architecture: upload file → run JavaScript in the browser → download result. That architecture has a fundamental ceiling. JavaScript in a browser is effectively single-threaded (Web Workers help, but you're still bound by the user's CPU cores and browser overhead). A 30-page PDF means 30 sequential canvas operations. A 60-second MP4 at 30fps means 1,800 sequential frame operations.

Yet the processing is inherently parallelizable: each page of a PDF is independent, and each frame of a video is independent. The browser just can't exploit that parallelism.

## Architecture Decision: Why Lambda

I considered several server-side approaches:

**Option A: Traditional server (EC2/VPS)**
Pros: Simple. Cons: scaling requires manual provisioning, you pay for idle capacity, and a single server is a single bottleneck.

**Option B: Container-based (ECS/Fargate)**
Pros: More control. Cons: slower cold starts for parallel fan-out, more operational overhead.

**Option C: Lambda**
Pros: Instant horizontal scale, pay-per-use, no idle costs, AWS handles all infrastructure.
Cons: Cold starts, a 15-minute max runtime (not an issue for our workloads), statelessness.

Lambda was the obvious choice. The key insight: Lambda's concurrency model maps perfectly to our parallelization need. For a 30-page PDF, I don't need one powerful server — I need 30 Lambda functions running simultaneously, each processing one page.

## Building the Pipeline

## Phase 1: Prototype (Days 1–3)

I started with a simple proof of concept (the worker and orchestrator Lambda snippets appear at the end of this post). The prototype worked but had a problem: my watermark detection was too aggressive — it was removing some content that visually resembled the watermark pattern.

## Phase 2: Watermark Detection Algorithm (Days 4–7)

NotebookLM's watermark has consistent characteristics across exports (the full list is below). I ended up combining three detection methods:

Method 1: Template matching — the watermark has a known visual pattern, and template matching finds it with ~95% accuracy.

Method 2: Frequency-domain analysis — an FFT reveals repeating patterns; the watermark's repetition shows up as distinct peaks in the frequency domain.

Method 3: Statistical anomaly detection — watermark pixels have different statistical properties (color distribution, gradient patterns) than document-content pixels.

Using all three with confidence weighting gave me 99%+ detection accuracy with near-zero false positives.

## Phase 3: The Orchestration Layer (Days 8–10)

With worker Lambdas running, I needed an orchestrator to split files, fan out the work, and reassemble the results (the full responsibility list is below). The reassembly Lambda is triggered by an SQS queue that workers publish to upon completion. When all N chunks are complete (tracked in DynamoDB), it runs the assembly step.

## Phase 4: The API Layer (Days 11–14)

I built the API layer with Next.js 14 (App Router) for the frontend and API Gateway + Lambda for the processing backend. The frontend API routes handle authentication, credit deduction, and presigned-URL generation (details below). The heavy processing never touches Next.js — it goes straight from the browser to S3 via a presigned URL, then triggers the Lambda pipeline.

## What Broke Along the Way

**Problem 1: Lambda cold starts for first batch**
When I deployed, the first request after idle always had a 2–3 second delay due to Lambda cold starts. Fixed with provisioned concurrency for the orchestrator Lambda (the workers are invoked often enough that they stay warm).

**Problem 2: Large MP4 reassembly timing out**
The reassembly Lambda was hitting Lambda's 15-minute limit on very long videos. Fixed by splitting reassembly into a tree of smaller merge steps.

**Problem 3: Memory errors on large PDFs**
100-page PDFs were causing out-of-memory errors in workers at 1024MB. Bumping worker Lambdas to 2048MB fixed it and also improved throughput, since AWS allocates CPU proportionally to memory.

**Problem 4: S3 rate limiting on batch jobs**
Submitting 50 files simultaneously caused S3 PUT throttling. Fixed with exponential backoff on uploads and by distributing objects across multiple S3 prefixes.

## Results

After two weeks of evenings and weekends, the pipeline was live. The 50-file batch benchmark still impresses me — 50 PDFs that would take 90+ minutes in a browser complete in 18 seconds. The engineering investment was ~80 hours over two weeks. For anyone building something similar, Lambda + S3 is a remarkably productive stack for a file-processing SaaS.

## Frequently Asked Questions

Q: What was the most technically challenging part?
A: The watermark detection algorithm. Getting it to reliably detect NotebookLM's specific watermark without false positives required combining multiple detection methods and tuning confidence thresholds carefully.

Q: Why Next.js for the frontend and not a simpler stack?
A: Next.js App Router gives us server-side rendering for SEO, easy API routes for the lightweight API layer, and TypeScript throughout. The file processing itself runs in Python Lambdas, so the "full-stack JS" concern doesn't apply to the heavy compute.

Q: How do you handle Lambda cold starts in production?
A: Provisioned concurrency on the orchestrator Lambda (the entry point). Worker Lambdas warm up naturally with traffic. At current scale, cold starts affect fewer than 2% of requests.

Q: What's the infrastructure cost at scale?
A: Lambda costs are genuinely low for this workload — approximately $0.003–0.005 per PDF processed. Even at 10,000 PDFs/month, compute is under $50; S3 and CloudFront add a similar amount.

Q: Would you use a different architecture if you started over?
A: Potentially Step Functions instead of custom orchestration for the fan-out coordination — it would simplify the state management. But Lambda + SQS works well and I understand it deeply, which matters for debugging.

## Try the Result

→ NotebookLM Studio — 5x faster watermark removal, free to start. 50 free credits. No credit card. Built on the architecture described above.

## Appendix: The Lambda Code
```python
# Worker Lambda (Python, 2048MB) — each invocation cleans one page
import boto3
import fitz  # PyMuPDF


def handler(event, context):
    # Get page data from S3
    s3 = boto3.client('s3')
    page_data = s3.get_object(
        Bucket=event['bucket'],
        Key=event['page_key'],
    )['Body'].read()

    # Open the single-page PDF
    doc = fitz.open(stream=page_data, filetype="pdf")
    page = doc[0]

    # Detect and remove the watermark (modifies the page in place)
    remove_watermark(page)

    # Save the cleaned page back to S3 as PDF bytes
    output_key = event['output_key']
    s3.put_object(Bucket=event['bucket'], Key=output_key, Body=doc.tobytes())

    return {'status': 'completed', 'output_key': output_key}
```
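The confidence-weighted detection from Phase 2 can be sketched in a few lines. This is a minimal illustration of two of the three methods (template matching and frequency analysis), not the production algorithm: the function names, the weights, and the threshold are all hypothetical stand-ins.

```python
import numpy as np


def template_score(region: np.ndarray, template: np.ndarray) -> float:
    """Normalized cross-correlation between a page region and the known
    watermark template (grayscale arrays, same shape). 1.0 = perfect match."""
    r = region - region.mean()
    t = template - template.mean()
    denom = np.linalg.norm(r) * np.linalg.norm(t)
    return float(np.dot(r.ravel(), t.ravel()) / denom) if denom else 0.0


def frequency_score(region: np.ndarray, peak_bin: int) -> float:
    """Fraction of row-averaged spectral energy at the watermark's known
    frequency bin; a repeating pattern concentrates energy there."""
    spectrum = np.abs(np.fft.rfft(region.mean(axis=0)))
    return float(spectrum[peak_bin] / (spectrum.sum() + 1e-9))


def is_watermark(region, template, peak_bin,
                 weights=(0.7, 0.3), threshold=0.6) -> bool:
    """Combine the scores with confidence weights (illustrative values)."""
    score = (weights[0] * template_score(region, template)
             + weights[1] * frequency_score(region, peak_bin))
    return score >= threshold
```

In the real pipeline a third, statistical check feeds the weighted sum as well, and the weights were tuned against labeled exports.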
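The Problem 4 fix (exponential backoff on throttled S3 PUTs) amounts to a small retry wrapper. This is an illustrative sketch, not the shipped code; `put_fn` stands in for whatever performs one upload attempt (e.g. a single `s3.put_object` call).

```python
import random
import time


def upload_with_backoff(put_fn, max_attempts=6, base_delay=0.25):
    """Retry a throttled upload with exponential backoff plus full jitter.

    put_fn: zero-argument callable performing one upload attempt; it should
    raise on throttling (e.g. S3 SlowDown) and return normally on success.
    """
    for attempt in range(max_attempts):
        try:
            return put_fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of retries; surface the error
            # Sleep between 0 and base * 2^attempt seconds ("full jitter")
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))
```

Full jitter keeps 50 simultaneous uploads from retrying in lockstep, which is what prolongs throttling in the first place.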
```python
# Orchestrator Lambda
import json
import os
import uuid

import boto3

BUCKET = os.environ['BUCKET']


def handler(event, context):
    job_id = str(uuid.uuid4())
    input_key = event['input_key']

    # Determine file type and split (helpers defined elsewhere)
    if input_key.endswith('.pdf'):
        chunks = split_pdf_to_pages(input_key)
    elif input_key.endswith('.mp4'):
        chunks = split_video_to_gops(input_key)
    else:
        chunks = [input_key]  # Single image

    # Dispatch workers in parallel as fire-and-forget "Event" invocations;
    # completion is tracked via SQS messages and DynamoDB, not by waiting here
    lambda_client = boto3.client('lambda')
    for i, chunk_key in enumerate(chunks):
        output_key = f"processing/{job_id}/chunk_{i:04d}"
        lambda_client.invoke(
            FunctionName='nlms-worker',
            InvocationType='Event',
            Payload=json.dumps({
                'bucket': BUCKET,
                'page_key': chunk_key,
                'output_key': output_key,
                'job_id': job_id,
            }),
        )

    # Store job state in DynamoDB
    update_job_state(job_id, 'processing', total_chunks=len(chunks))

    return {'job_id': job_id, 'total_chunks': len(chunks)}
```
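Chunk-completion tracking in DynamoDB can use an atomic counter, so the worker that records the last chunk is the one that triggers reassembly. A sketch under assumptions — the table name `nlms-jobs` and attribute names are hypothetical, not the production schema:

```python
def is_final_chunk(completed: int, total: int) -> bool:
    """True when the just-recorded chunk was the last one outstanding."""
    return completed >= total


def record_chunk_done(job_id, total_chunks, table='nlms-jobs'):
    """Atomically bump the completed-chunk counter for a job; returns True
    when this call completed the final chunk (time to reassemble)."""
    import boto3  # lazy import so the pure helper above has no AWS dependency
    dynamodb = boto3.client('dynamodb')
    resp = dynamodb.update_item(
        TableName=table,
        Key={'job_id': {'S': job_id}},
        UpdateExpression='ADD completed :one',
        ExpressionAttributeValues={':one': {'N': '1'}},
        ReturnValues='UPDATED_NEW',
    )
    return is_final_chunk(int(resp['Attributes']['completed']['N']), total_chunks)
```

The `ADD` update expression is what makes this safe under concurrency: two workers finishing simultaneously each get a distinct post-increment value, so exactly one of them sees the final count.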
**Watermark characteristics (consistent across exports):**

- Specific frequency signature (detectable via FFT)
- Consistent spatial positioning
- Fixed opacity range
- Specific text/logo pattern

**Orchestrator responsibilities:**

- Accept the uploaded file
- Split it into chunks (pages for PDF, GOP segments for MP4)
- Store chunks in S3
- Invoke N worker Lambdas simultaneously
- Wait for all workers to complete (via SQS completion messages)
- Reassemble the output
- Return a signed CloudFront URL

**Frontend API routes handle:**

- Authentication (JWT validation)
- Credit deduction (DynamoDB)
- Presigned URL generation for direct-to-S3 uploads

**Business results (first 6 weeks):**

- Users: ~180 signups
- Pro subscribers: 12 ($9.99/month)
- MRR: $119.88
- Infrastructure cost: ~$8/month at current scale
- Unit margin: healthy

**What's next:**

- Webhook support: In progress — let API users get notified on completion without polling
- S3 direct integration: Let users point to their S3 bucket directly, skip the upload step
- Higher concurrency tiers: For enterprise customers with very high volume
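For the in-progress webhook support, one plausible shape (the payload fields, header name, and HMAC scheme here are assumptions, not the shipped design) is a signed job-complete POST that receivers can verify:

```python
import hashlib
import hmac
import json


def build_webhook(job_id, output_url, secret: bytes):
    """Serialize a job-complete event and sign it so receivers can verify
    the POST came from us (HMAC-SHA256 over the exact body bytes)."""
    body = json.dumps(
        {'event': 'job.completed', 'job_id': job_id, 'output_url': output_url},
        sort_keys=True,
    ).encode()
    signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
    headers = {'Content-Type': 'application/json',
               'X-NLMS-Signature': signature}
    return body, headers
```

Signing the exact body bytes (rather than a re-serialized copy) is the important detail: the receiver recomputes the HMAC over the raw request body, so any re-encoding on either side would break verification.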