Open source LLM evaluation - AI tools
-
BenchLLM The best way to evaluate LLM-powered apps
BenchLLM is a tool for evaluating LLM-powered applications. It allows users to build test suites, generate quality reports, and choose between automated, interactive, or custom evaluation strategies.
- Other
-
PromptsLabs A Library of Prompts for Testing LLMs
PromptsLabs is a community-driven platform providing copy-paste prompts to test the performance of new LLMs. Explore and contribute to a growing collection of prompts.
- Free
-
Gentrace Intuitive evals for intelligent applications
Gentrace is an LLM evaluation platform designed for AI teams to test and automate evaluations of generative AI products and agents. It facilitates collaborative development and ensures high-quality LLM applications.
- Usage Based
-
EleutherAI Empowering Open-Source Artificial Intelligence Research
EleutherAI is a research institute focused on advancing and democratizing open-source AI, particularly in language modeling, interpretability, and alignment. They train, release, and evaluate powerful open-source LLMs.
- Free
-
Conviction The Platform to Evaluate & Test LLMs
Conviction is an AI platform designed for evaluating, testing, and monitoring Large Language Models (LLMs) to help developers build reliable AI applications faster. It focuses on detecting hallucinations, optimizing prompts, and ensuring security.
- Freemium
- From 249$
-
Laminar The AI engineering platform for LLM products
Laminar is an open-source platform that enables developers to trace, evaluate, label, and analyze Large Language Model (LLM) applications with minimal code integration.
- Freemium
- From 25$
-
LLM Explorer Discover and Compare Open-Source Language Models
LLM Explorer is a comprehensive platform for discovering, comparing, and accessing over 46,000 open-source Large Language Models (LLMs) and Small Language Models (SLMs).
- Free
-
EvalsOne Evaluate LLMs & RAG Pipelines Quickly
EvalsOne is a platform for rapidly evaluating Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) pipelines using various metrics.
- Freemium
- From 19$
-
phoenix.arize.com Open-source LLM tracing and evaluation
Phoenix accelerates AI development with powerful insights, allowing seamless evaluation, experimentation, and optimization of AI applications in real time.
- Freemium
-
ModelBench No-Code LLM Evaluations
ModelBench enables teams to rapidly deploy AI solutions with no-code LLM evaluations. It allows users to compare over 180 models, design and benchmark prompts, and trace LLM runs, accelerating AI development.
- Free Trial
- From 49$
-
Gorilla Large Language Model Connected with Massive APIs
Gorilla is an open-source Large Language Model (LLM) developed by UC Berkeley, designed to interact with a wide range of services by invoking API calls.
- Free
-
Hegel AI Developer Platform for Large Language Model (LLM) Applications
Hegel AI provides a developer platform for building, monitoring, and improving large language model (LLM) applications, featuring tools for experimentation, evaluation, and feedback integration.
- Contact for Pricing
-
Transformer Lab The Open Source Platform for Training Advanced AI Models
Transformer Lab is an open-source platform enabling researchers, ML engineers, and developers to collaboratively build, train, evaluate, and deploy AI models with features like provenance, reproducibility, and transparency.
- Free
-
Axolotl AI We make fine-tuning accessible, scalable, fun
Axolotl AI is a free, open-source tool designed to make fine-tuning Large Language Models (LLMs) faster, more accessible, and scalable across various AI models and platforms.
- Free
-
Superpipe The OSS experimentation platform for LLM pipelines
Superpipe is an open-source experimentation platform designed for building, evaluating, and optimizing Large Language Model (LLM) pipelines to improve accuracy and minimize costs. It allows deployment on user infrastructure for enhanced privacy and security.
- Free
-
Rhesis AI Open-source test generation SDK for LLM applications
Rhesis AI offers an open-source SDK to generate comprehensive, context-specific test sets for LLM applications, enhancing AI evaluation, reliability, and compliance.
- Freemium
-
TheFastest.ai Reliable performance measurements for popular LLM models.
TheFastest.ai provides reliable, daily updated performance benchmarks for popular Large Language Models (LLMs), measuring Time To First Token (TTFT) and Tokens Per Second (TPS) across different regions and prompt types.
- Free
-
neutrino AI Multi-model AI Infrastructure for Optimal LLM Performance
Neutrino AI provides multi-model AI infrastructure to optimize Large Language Model (LLM) performance for applications. It offers tools for evaluation, intelligent routing, and observability to enhance quality, manage costs, and ensure scalability.
- Usage Based
-
Parea Test and Evaluate your AI systems
Parea is a platform for testing, evaluating, and monitoring Large Language Model (LLM) applications, helping teams track experiments, collect human feedback, and deploy prompts confidently.
- Freemium
- From 150$
-
OneLLM Fine-tune, evaluate, and deploy your next LLM without code.
OneLLM is a no-code platform enabling users to fine-tune, evaluate, and deploy Large Language Models (LLMs) efficiently. Streamline LLM development by creating datasets, integrating API keys, running fine-tuning processes, and comparing model performance.
- Freemium
- From 19$
-
Langfuse Open Source LLM Engineering Platform
Langfuse provides an open-source platform for tracing, evaluating, and managing prompts to debug and improve LLM applications.
- Freemium
- From 59$
Featured Tools
Join Our Newsletter
Stay updated with the latest AI tools, news, and offers by subscribing to our weekly newsletter.
Explore More
-
social media video maker AI 60 tools
-
social media team collaboration tool 27 tools
-
Photo to Ghibli animation style 30 tools
-
how to use Flux AI image generator 60 tools
-
Data analytics and visualization 37 tools
-
Video audio editing software 41 tools
-
AI homework helper extension 45 tools
-
Voice AI journey mapping tool 42 tools
-
AI fortune telling tool 25 tools
Didn't find tool you were looking for?