# Tools: Rotating Residential Proxy Evaluation Mini-Lab You Can Run in 90 Minutes
2026-01-22
admin
This is a runnable mini-lab (save it as `lab.py`) for evaluating rotating residential proxies for scraping and monitoring. You’ll generate evidence in 60–90 minutes: rotation proof, sticky-session proof, pool collision metrics under concurrency, a ramp-and-soak signal report, and CP1K. The deeper acceptance gates live in the hub: Rotating Residential Proxies Evaluation Playbook for Web Scraping in 2026.

Run the exact same harness against every provider you’re considering, including MaskProxy, so your results compare cleanly. Decide up front what “success” means for your pipeline (not just HTTP 200), and set a hard request budget so you don’t burn time chasing noisy data.

## Set scope, budget, and evidence fields

Pick two targets you are allowed to test:

- Easy target: a stable baseline for exit IP and latency (an IP echo endpoint works).
- Defended target: a real site that matches your production pain (price intel, availability checks, SERP monitoring), tested within policy and terms.

Write down a request budget and stop conditions:

- Stop if 403 or 429 rates stay high for 2–3 minutes.
- Stop if p95 latency doubles and stays there.
- Stop if challenge pages dominate “success.”

If you need a quick reference for the rotation modes you’ll see in the wild (rotate every request vs. sticky for N minutes), keep the definition consistent with how you set your session model: Rotating Proxies.

Log one JSON record per request. Keep the fields stable across all tests so you can diff runs and compute metrics without hand-waving:

`ts, test, target, url, status, latency_ms, exit_ip, session, bytes, retry, sig`

## Build the tiny harness with JSONL logs

Create a timestamped run folder and write one JSON line per request. This makes your lab reproducible and audit-friendly when someone asks “what exactly did we measure?”

```python
import os, json, time, uuid, asyncio
from typing import Optional, Dict, Any

import httpx

RUN_ID = time.strftime("%Y%m%d-%H%M%S") + "-" + uuid.uuid4().hex[:6]
OUTDIR = f"./runs/{RUN_ID}"
os.makedirs(OUTDIR, exist_ok=True)
LOG_PATH = f"{OUTDIR}/requests.jsonl"

EASY_URL = os.getenv("EASY_URL", "https://api.ipify.org?format=json")
DEFENDED_URL = os.getenv("DEFENDED_URL", "https://example.com/")

MAX_REQUESTS = int(os.getenv("MAX_REQUESTS", "4000"))
MAX_MINUTES = int(os.getenv("MAX_MINUTES", "90"))

PROXY_URL = os.getenv("PROXY_URL")  # http://user:pass@host:port
TIMEOUT_S = float(os.getenv("TIMEOUT_S", "20"))
MAX_RETRIES = int(os.getenv("MAX_RETRIES", "2"))

def jlog(rec: Dict[str, Any]) -> None:
    with open(LOG_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```

## Wrap requests with timing, retries, and signatures

You need three things on every request: exit IP, latency, and a lightweight signature that labels rate limiting, blocking, or challenge behavior. For HTTP semantics, RFC 9110 is the baseline reference when you’re debugging edge behavior: RFC 9110.

```python
CHALLENGE_MARKERS = ["captcha", "challenge", "cf-chl", "recaptcha", "px-captcha", "akamai"]

def classify(status: int, body_text: str) -> str:
    lower = (body_text or "").lower()
    if status == 429:
        return "rate_limited"
    if status == 403:
        return "blocked"
    if any(m in lower for m in CHALLENGE_MARKERS):
        return "soft_challenge"
    if status == 0:
        return "error"
    return "ok"

async def get_exit_ip(client: httpx.AsyncClient, session: Optional[str]) -> str:
    headers = {"User-Agent": "eval-lab/1.0"}
    if session:
        headers["X-Session"] = session  # map to your provider's sticky-session mechanism
    r = await client.get(EASY_URL, headers=headers, timeout=TIMEOUT_S)
    return r.json().get("ip", "")

async def fetch(client: httpx.AsyncClient, test: str, target: str, url: str,
                session: Optional[str] = None) -> Dict[str, Any]:
    headers = {"User-Agent": "eval-lab/1.0"}
    if session:
        headers["X-Session"] = session
    start = time.time()
    last_err = None
    for attempt in range(MAX_RETRIES + 1):
        try:
            r = await client.get(url, headers=headers, timeout=TIMEOUT_S,
                                 follow_redirects=True)
            latency_ms = int((time.time() - start) * 1000)
            body = (r.text[:2000] if "text" in (r.headers.get("content-type") or "") else "")
            sig = classify(r.status_code, body)
            rec = {
                "ts": int(time.time()), "test": test, "target": target, "url": url,
                "status": r.status_code, "latency_ms": latency_ms, "session": session,
                "bytes": len(r.content or b""), "retry": attempt, "sig": sig,
            }
            jlog(rec)
            return rec
        except Exception as e:
            last_err = repr(e)
            if attempt == MAX_RETRIES:
                rec = {
                    "ts": int(time.time()), "test": test, "target": target, "url": url,
                    "status": 0, "latency_ms": int((time.time() - start) * 1000),
                    "session": session, "bytes": 0, "retry": attempt,
                    "sig": "error", "err": last_err,
                }
                jlog(rec)
                return rec
            await asyncio.sleep(0.5 * (2 ** attempt))  # exponential backoff between retries
```

## Prove rotation and sticky sessions with a repeatable test

This test answers two questions that matter for “rotating residential proxies free trial” evaluations:

- Does the pool rotate when you do not pin a session?
- Does the exit IP stay stable when you do pin a session?

```python
def uniq(seq):
    return len(set(seq))

async def test_rotation_and_sticky():
    # httpx < 0.26 takes proxies=; newer releases rename it to proxy=
    async with httpx.AsyncClient(proxies=PROXY_URL) as client:
        rot_ips = [await get_exit_ip(client, session=None) for _ in range(30)]
        sticky_a = [await get_exit_ip(client, session="A") for _ in range(15)]
        sticky_b = [await get_exit_ip(client, session="B") for _ in range(15)]
        print("rotation unique:", uniq(rot_ips), "of", len(rot_ips))
        print("sticky A unique:", uniq(sticky_a), "of", len(sticky_a))
        print("sticky B unique:", uniq(sticky_b), "of", len(sticky_b))
        print("A vs B overlap:", len(set(sticky_a) & set(sticky_b)))
```

Rotation should produce meaningfully more unique IPs than sticky. Sticky A should be mostly stable, and sticky B should differ from sticky A most of the time. If rotation uniqueness is tiny, you’re functionally testing a small shared pool with frequent IP reuse. If you want a product-level reference point while you interpret these rotation and sticky behaviors, use: Rotating Residential Proxies.

## Measure pool collisions and IP reuse under concurrency

Collisions are the hidden killer for scraping throughput. If 100 workers share 10 exit IPs, one IP-level reputation event becomes a fleet-wide failure pattern. This is where providers can look fine in single-thread tests and fall apart under real concurrency.

Run a micro-burst at your expected in-flight concurrency (50–200). Keep it short and measurable.

```python
async def burst_collisions(concurrency=80, total=400):
    sem = asyncio.Semaphore(concurrency)
    async with httpx.AsyncClient(proxies=PROXY_URL) as client:
        async def one():
            async with sem:
                ip = await get_exit_ip(client, session=None)
                jlog({"ts": int(time.time()), "test": "burst_ip",
                      "target": "easy", "exit_ip": ip})
                return ip
        ips = await asyncio.gather(*[one() for _ in range(total)])
    uniq_ips = len(set(ips))
    collision_rate = 1 - (uniq_ips / max(1, len(ips)))
    print("total:", len(ips), "unique:", uniq_ips, "collision_rate:", round(collision_rate, 3))
```

The collision rate rises with concurrency, but it should not instantly collapse into a handful of IPs. If you see heavy top-IP concentration, treat residential proxy pool capacity as a gating risk for monitoring jobs.

Run this exact burst against MaskProxy and any other candidate pool with the same concurrency and total requests. You want to see which pool degrades first.

## Run a ramp-and-soak and collect p95, 429, 403, and challenge signals

Use a simple load shape: warm-up → ramp → soak. This makes “day-3 style” decay show up within one session.

When you interpret 429, don’t invent semantics. The definitive reference for 429 is RFC 6585. For practical status summaries, MDN is a useful quick check: MDN 429 and MDN 403.

```python
async def ramp_soak():
    # (phase name, duration in seconds, concurrency)
    phases = [
        ("warmup", 2 * 60, 20),
        ("ramp", 8 * 60, 60),
        ("soak", 15 * 60, 60),
    ]
    async with httpx.AsyncClient(proxies=PROXY_URL) as client:
        for name, seconds, conc in phases:
            end = time.time() + seconds
            while time.time() < end:
                sem = asyncio.Semaphore(conc)
                async def one():
                    async with sem:
                        return await fetch(client, name, "defended", DEFENDED_URL, session=None)
                await asyncio.gather(*[one() for _ in range(conc)])
```

What you’re looking for:

- p95 latency drift during soak suggests pool saturation, retry amplification, or target throttling.
- Sustained 429 indicates a rate-limit wall; sustained 403 indicates refusal or policy blocks.
- “soft_challenge” is a failure for most pipelines unless you explicitly solve it.

## Compute CP1K from collected numbers

CP1K is cost per 1,000 successful requests. Define success as what your pipeline needs: for many scraping workloads it’s “2xx and not a challenge page.” Start with a simple model: total cost in USD for the run window (your plan proration plus traffic charges, if applicable), divided by successes in thousands.

```python
def compute_cp1k(log_path: str, total_cost_usd: float) -> None:
    attempts = 0
    successes = 0
    with open(log_path, "r", encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            if rec.get("test") not in ("warmup", "ramp", "soak"):
                continue
            attempts += 1
            status = rec.get("status", 0)
            sig = rec.get("sig")
            if 200 <= status < 300 and sig == "ok":
                successes += 1
    cp1k = (total_cost_usd / (successes / 1000)) if successes else float("inf")
    print("attempts:", attempts, "successes:", successes, "CP1K_USD:", round(cp1k, 2))
```

When you plug in pricing inputs, use the correct unit basis (per GB vs. per request vs. plan minimum) so CP1K doesn’t lie. For a concrete pricing reference when you compute CP1K, use: Rotating Residential Proxies Pricing.

If MaskProxy yields stable soak signals but a higher CP1K than another pool, you now have a real tradeoff discussion: reliability and operability versus raw unit cost.

You should now have a run folder with JSONL evidence: rotation and sticky behavior, collision rate under concurrency, ramp-and-soak stability, and a CP1K number you can defend in a go/no-go review. If you want the decision structure that turns these signals into acceptance criteria, close the loop with the hub: Rotating Residential Proxies Evaluation Playbook for Web Scraping in 2026.
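The ramp-and-soak section asks you to watch p95 drift and sustained 429/403, but the harness only writes raw JSONL. Here is a minimal aggregation sketch, my addition rather than part of the original lab: it assumes the `requests.jsonl` schema above and uses `statistics.quantiles` for the p95 cut.

```python
import json
from collections import Counter, defaultdict
from statistics import quantiles

def phase_report(log_path: str) -> dict:
    """Per-phase p95 latency and signature rates from the JSONL log."""
    lat = defaultdict(list)      # phase -> [latency_ms, ...]
    sigs = defaultdict(Counter)  # phase -> Counter of sig labels
    with open(log_path, "r", encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            phase = rec.get("test")
            if phase not in ("warmup", "ramp", "soak"):
                continue
            lat[phase].append(rec.get("latency_ms", 0))
            sigs[phase][rec.get("sig", "error")] += 1
    report = {}
    for phase, values in lat.items():
        n = len(values)
        # quantiles(..., n=20)[18] is the 95th-percentile cut point
        p95 = quantiles(values, n=20)[18] if n >= 2 else values[0]
        report[phase] = {
            "requests": n,
            "p95_ms": round(p95, 1),
            "rates": {sig: round(c / n, 3) for sig, c in sigs[phase].items()},
        }
    return report
```

Comparing `report["warmup"]["p95_ms"]` against `report["soak"]["p95_ms"]` gives you the latency-drift signal directly, and the per-phase `rates` show whether 429/403/soft_challenge are sustained or transient.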
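The collision burst prints an aggregate collision rate, but the warning about heavy top-IP concentration is easier to gate on with a direct share metric. A small sketch (a hypothetical helper, not in the original harness) that works on the `ips` list returned by `burst_collisions`:

```python
from collections import Counter

def top_ip_share(ips, k: int = 5) -> float:
    """Fraction of all requests carried by the k most-used exit IPs."""
    counts = Counter(ips)
    top = sum(c for _, c in counts.most_common(k))
    return top / max(1, len(ips))
```

If 400 burst requests come back with `top_ip_share(ips, 5)` above roughly 0.5, five exits are carrying half your fleet, and one IP-level reputation event hits a large slice of traffic at once.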
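When computing CP1K on per-GB pricing, you can approximate the traffic charge from the logged `bytes` field. A sketch under loud assumptions: `price_per_gb` and `plan_minimum_usd` are hypothetical inputs, and response-body bytes undercount billable transfer (headers, TLS overhead, failed attempts), so treat the result as a floor:

```python
import json

def run_cost_usd(log_path: str, price_per_gb: float, plan_minimum_usd: float = 0.0) -> float:
    """Lower-bound run cost on per-GB pricing, from logged response bytes."""
    total_bytes = 0
    with open(log_path, "r", encoding="utf-8") as f:
        for line in f:
            total_bytes += json.loads(line).get("bytes", 0)
    gb = total_bytes / (1024 ** 3)
    # Plans with a monthly minimum bill at least that amount regardless of usage
    return max(plan_minimum_usd, gb * price_per_gb)
```

Feed the result into `compute_cp1k` as `total_cost_usd`; if your provider bills per request instead, swap the byte sum for a request count on the same unit basis.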