open-source LLM testing tool - AI tools

BenchLLM is a tool for evaluating LLM-powered applications. It allows users to build test suites, generate quality reports, and choose between automated, interactive, or custom evaluation strategies.
- Other

PromptsLabs is a community-driven platform providing copy-paste prompts to test the performance of new LLMs. Explore and contribute to a growing collection of prompts.
- Free

Conviction is an AI platform designed for evaluating, testing, and monitoring Large Language Models (LLMs) to help developers build reliable AI applications faster. It focuses on detecting hallucinations, optimizing prompts, and ensuring security.
- Freemium
- From 249$

Rhesis AI offers an open-source SDK to generate comprehensive, context-specific test sets for LLM applications, enhancing AI evaluation, reliability, and compliance.
- Freemium

Gentrace is an LLM evaluation platform designed for AI teams to test and automate evaluations of generative AI products and agents. It facilitates collaborative development and ensures high-quality LLM applications.
- Usage Based

promptfoo is an open-source LLM testing tool designed to help developers secure and evaluate their language model applications, offering features like vulnerability scanning and continuous monitoring.
- Freemium

Libretto offers comprehensive LLM monitoring, automated prompt testing, and optimization tools to ensure the reliability and performance of your AI applications.
- Freemium
- From 180$

Phoenix accelerates AI development with powerful insights, allowing seamless evaluation, experimentation, and optimization of AI applications in real time.
- Freemium

Autoblocks is a collaborative testing and evaluation platform for LLM-based products that automatically improves through user and expert feedback, offering comprehensive tools for monitoring, debugging, and quality assurance.
- Freemium
- From 1750$

Laminar is an open-source platform that enables developers to trace, evaluate, label, and analyze Large Language Model (LLM) applications with minimal code integration.
- Freemium
- From 25$

Langtail is a comprehensive testing platform that enables teams to test and debug LLM-powered applications with a spreadsheet-like interface, offering security features and integration with major LLM providers.
- Freemium
- From 99$

GPT-LLM Playground is a macOS application designed for advanced experimentation and testing with Language Learning Models (LLMs). It offers features like multi-model support, versioning, and custom endpoints.
- Free

ModelBench enables teams to rapidly deploy AI solutions with no-code LLM evaluations. It allows users to compare over 180 models, design and benchmark prompts, and trace LLM runs, accelerating AI development.
- Free Trial
- From 49$

Superpipe is an open-source experimentation platform designed for building, evaluating, and optimizing Large Language Model (LLM) pipelines to improve accuracy and minimize costs. It allows deployment on user infrastructure for enhanced privacy and security.
- Free

surf.new is a playground environment for testing various web agents powered by Large Language Models (LLMs) that can navigate and interact with websites.
- Free

Langfuse provides an open-source platform for tracing, evaluating, and managing prompts to debug and improve LLM applications.
- Freemium
- From 59$

Ottic empowers tech and non-technical teams to test LLM applications, ensuring faster product development and enhanced reliability. Streamline your QA process and gain full visibility into your LLM application's behavior.
- Contact for Pricing

RoostGPT is an AI-powered testing co-pilot that automates test case generation, providing 100% test coverage while detecting static vulnerabilities. It leverages Large Language Models to enhance software development efficiency and reliability.
- Paid
- From 25000$

LLM Explorer is a comprehensive platform for discovering, comparing, and accessing over 46,000 open-source Large Language Models (LLMs) and Small Language Models (SLMs).
- Free

EleutherAI is a research institute focused on advancing and democratizing open-source AI, particularly in language modeling, interpretability, and alignment. They train, release, and evaluate powerful open-source LLMs.
- Free

TheFastest.ai provides reliable, daily updated performance benchmarks for popular Large Language Models (LLMs), measuring Time To First Token (TTFT) and Tokens Per Second (TPS) across different regions and prompt types.
- Free

Hegel AI provides a developer platform for building, monitoring, and improving large language model (LLM) applications, featuring tools for experimentation, evaluation, and feedback integration.
- Contact for Pricing

GeneratorLLMs is a tool that creates standardized `llms.txt` files by extracting core website content. This improves how Large Language Models (LLMs) understand websites and enhances AI visibility.
- Free

Inductor enables developers to rapidly prototype, evaluate, and improve LLM applications, ensuring high-quality app delivery.
- Freemium

NeuralTrust offers a unified platform for securing, testing, monitoring, and scaling Large Language Model (LLM) applications, ensuring robust security, regulatory compliance, and operational control for enterprises.
- Contact for Pricing
Featured Tools

DeepSwaper
Free AI Face Swap Video & Photo Online
Foundor.ai
Business Planning, Supercharged by AI
SpicyGen
Turn your AI Images into Spicy Videos
SweetAI
Best NSFW AI: Free Sex Chat, Image Generator, Characters for Adults
MiriCanvas
Complete all your designs with MiriCanvas
BestFaceSwap
Change faces in videos and photos with 3 simple clicks
Search Daddie
Discover the Best NSFW AI on the Internet
Freebeat.ai
Turn Music into Viral Videos In One Click
Kindo
Enterprise-Ready Agentic Security for DevOps and SecOps AutomationDidn't find tool you were looking for?