This project is no longer maintained, and some features may be non-functional. See Funnel instead.
Multi-LLM evaluation platform querying 6+ models simultaneously with synthesized consensus responses and support for 11+ APIs through a unified routing layer.
Overview
Consensus was a multi-LLM evaluation platform designed to compare concurrent model outputs in real time. It sent each query to several Gemini models at once, then automatically synthesized their responses to resolve discrepancies and extract high-confidence insights.
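The fan-out-then-synthesize flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `ModelClient`, `queryAll`, and `majorityAnswer` are hypothetical names, and the majority vote is a deliberately naive stand-in for whatever synthesis step Consensus really used.

```typescript
// Hypothetical shape: one entry per model, wrapping whatever API call reaches it.
interface ModelClient {
  name: string;
  call: (prompt: string) => Promise<string>;
}

interface ModelResult {
  model: string;
  text: string;
}

// Send the prompt to every model concurrently. A model that fails is
// dropped from the results instead of failing the whole batch.
async function queryAll(clients: ModelClient[], prompt: string): Promise<ModelResult[]> {
  const attempts = await Promise.all(
    clients.map((client) =>
      client.call(prompt).then(
        (text): ModelResult | null => ({ model: client.name, text }),
        (): ModelResult | null => null,
      ),
    ),
  );
  return attempts.filter((r): r is ModelResult => r !== null);
}

// Naive stand-in for synthesis (an assumption): return the most common
// answer after trimming and lowercasing. On a tie, the answer that first
// reached the top count wins. Returns undefined when there are no results.
function majorityAnswer(results: ModelResult[]): string | undefined {
  const counts = new Map<string, number>();
  let best: string | undefined;
  let bestCount = 0;
  for (const { text } of results) {
    const key = text.trim().toLowerCase();
    const n = (counts.get(key) || 0) + 1;
    counts.set(key, n);
    if (n > bestCount) {
      best = key;
      bestCount = n;
    }
  }
  return best;
}
```

Dropping failed calls keeps one slow or unavailable model from blocking the comparison. A real synthesis step would more likely pass the collected answers to another model than count exact matches.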
Highlights
Achieved 236 tokens/s and 2.5s p95 latency using Gemini 2.0 Flash for streaming inference.
Built a web app running 6+ Gemini LLMs simultaneously with side-by-side LaTeX results and collapsible UI.
Developed a React and Express.js stack to aggregate and merge outputs from multiple lightweight models.
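For readers unfamiliar with the metrics quoted above, the sketch below shows one common way to derive a p95 latency and a tokens-per-second figure from per-request measurements. The `RequestSample` shape, the nearest-rank percentile method, and the choice to divide total tokens by total request time are all assumptions; the source does not say how its numbers were measured.

```typescript
// Hypothetical per-request measurement.
interface RequestSample {
  latencyMs: number;
  outputTokens: number;
}

// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Aggregate throughput: total output tokens divided by total request time in seconds.
function tokensPerSecond(samples: RequestSample[]): number {
  const tokens = samples.reduce((sum, s) => sum + s.outputTokens, 0);
  const seconds = samples.reduce((sum, s) => sum + s.latencyMs, 0) / 1000;
  return tokens / seconds;
}
```

With streaming responses, throughput is often measured from the first token rather than from request start, which would give a higher figure than this aggregate.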

