Fara-7b By Microsoft: An Agentic Small Language Model Designed For...
Fara-7B is Microsoft's first agentic small language model (SLM) designed specifically for computer use. With only 7 billion parameters, Fara-7B is an ultra-compact Computer Use Agent (CUA) that achieves state-of-the-art performance within its size class and is competitive with larger, more resource-intensive agentic systems.
Try Fara-7B locally as follows (see Installation for detailed instructions):
Hint: might need to do --tensor-parallel-size 2 with vllm command if you run out of memory
Unlike traditional chat models that generate text-based responses, Fara-7B leverages computer interfaces—mouse and keyboard—to perform multi-step tasks on behalf of users. The model:
Fara-7B is trained using a novel synthetic data generation pipeline built on the Magentic-One multi-agent framework, with 145K trajectories covering diverse websites, task types, and difficulty levels. The model is based on Qwen2.5-VL-7B and trained with supervised fine-tuning.
Fara-7B achieves state-of-the-art results across multiple web agent benchmarks, outperforming both comparable-sized models and larger systems:
Table: Online agent evaluation results showing success rates (%) across four web benchmarks. Results are averaged over 3 runs.
We are releasing WebTailBench, a new evaluation benchmark focusing on 11 real-world task types that are underrepresented or missing in existing benchmarks. The benchmark includes 609 tasks across diverse categories, with the first 8 segments testing single skills or objectives (usually on a single website), and the remaining 3 evaluating more difficult multi-step or cross-site tasks.
Table: Breakdown of WebTailBench results across all 11 segments. Success rates (%) are averaged over 3 independent runs. Fara-7B achieves the highest performance among computer-use models across all task categories.
Note: Fara-7B is an experimental release designed to invite hands-on exploration and feedback from the community. We recommend running it in a sandboxed environment, monitoring its execution, and avoiding sensitive data or high-risk domains.
Source: HackerNews