Most QA benchmarks count bugs found. We measure whether a tool understands why a bug matters: its business impact, its workflow cascade, and its domain context.
Every industry needs a standard classification for what goes wrong. Security has one. Functional testing didn't -- until now.
The OWASP Top 10: the universal standard for classifying the most critical security vulnerabilities. Every security team, compliance framework, and pen-test report references it.
Standardized security testing globally. Referenced by PCI-DSS, SOC 2, and ISO 27001.
QEFix™ (Quality Engineering Functional Issues eXplorer) is the first standard classification of the most critical functional issues per SaaS vertical. Domain-specific, not generic. 25+ verticals covered.
Standardizing functional testing across SaaS verticals. Domain-specific bug prioritization.
Before OWASP, security testing was ad-hoc. Before QEFix™, functional testing was ad-hoc. Both bring order by defining what matters most for your specific context.
A tool that flags a missing WHERE clause is useful. A tool that also tells you that missing clause will delete reminders for every user in your system, break three downstream services, and violate your data isolation policy -- that is a different category of intelligence.
WorkflowBench™ evaluates QA tools not just on what they catch, but on how much they understand about the system they are testing.
Each bug is evaluated through three lenses: what category it belongs to, how deeply a tool understands it, and what business outcomes that understanding enables.
QEFix™ Taxonomy
QEFix™ (Quality Engineering Functional Issues eXplorer) classifies the top 10 bug categories for each SaaS vertical. A scheduling app breaks differently than an auth platform.
FQI™ Intelligence Index
FQI™ (Functional QA Intelligence) measures four dimensions of QA intelligence: requirements traceability, workflow awareness, domain knowledge, and learning from history.
Quality Outcomes
The practical impact: production bug escape rate, time to detection, workflow coverage, and false positive ratio.
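As a rough sketch, these outcome metrics can be derived from raw counts per evaluation run. The formulas below are common definitions and the field names are illustrative assumptions, not the benchmark's exact specification:

```typescript
// Illustrative outcome metrics from raw run counts.
// All names and formulas are assumptions for the sketch.
interface RunStats {
  bugsCaughtPreProd: number;  // bugs the tool caught before release
  bugsEscapedToProd: number;  // bugs that reached production
  truePositives: number;      // findings confirmed as real bugs
  falsePositives: number;     // findings rejected on review
  workflowsCovered: number;   // end-to-end workflows exercised
  workflowsTotal: number;     // end-to-end workflows in scope
}

// Share of real bugs that slipped past the tool into production.
function escapeRate(s: RunStats): number {
  return s.bugsEscapedToProd / (s.bugsCaughtPreProd + s.bugsEscapedToProd);
}

// Share of the tool's findings that were noise.
function falsePositiveRatio(s: RunStats): number {
  return s.falsePositives / (s.truePositives + s.falsePositives);
}

// Fraction of in-scope workflows the tool actually exercised.
function workflowCoverage(s: RunStats): number {
  return s.workflowsCovered / s.workflowsTotal;
}
```

A tool that catches 9 of 10 bugs with 2 bad findings out of 10 would score a 0.1 escape rate and a 0.2 false positive ratio under these definitions.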
The FQI™ (Functional QA Intelligence) Index measures how deeply a tool understands a code change -- not just whether it spots a bug, but whether it grasps the requirements, workflows, domain rules, and historical patterns involved.
FQI™ = 0.25·RI + 0.30·WI + 0.25·DI + 0.20·LI

Workflow Intelligence (WI) is weighted highest (30%) because cross-service cascade detection is the hardest capability and the most valuable in production.
Requirements Intelligence (RI): Does the tool connect code changes back to user stories and acceptance criteria?
Workflow Intelligence (WI): Does the tool identify which end-to-end user workflows are affected?
Domain Intelligence (DI): Does the tool understand the business rules and domain constraints at risk?
Learning Intelligence (LI): Does the tool learn from past incidents and prioritize by real business impact?
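The weighted sum above can be sketched in a few lines. The sub-score field names and the 0-100 scale are assumptions for illustration:

```typescript
// Sketch of the published FQI™ composite: weights are from the formula
// (RI 0.25, WI 0.30, DI 0.25, LI 0.20); the 0-100 sub-score scale is assumed.
interface FqiScores {
  requirements: number; // RI: requirements traceability
  workflow: number;     // WI: workflow awareness
  domain: number;       // DI: domain knowledge
  learning: number;     // LI: learning from history
}

function computeFqi(s: FqiScores): number {
  return 0.25 * s.requirements
       + 0.30 * s.workflow
       + 0.25 * s.domain
       + 0.20 * s.learning;
}

// 0.25*80 + 0.30*60 + 0.25*70 + 0.20*50 = 20 + 18 + 17.5 + 10 = 65.5
const example = computeFqi({ requirements: 80, workflow: 60, domain: 70, learning: 50 });
```

Because the workflow term carries the largest weight, a tool strong on cascade detection gains more per point there than on any other dimension.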
The benchmark uses 15 real production bugs from 5 open-source GitHub repositories. Each bug was traced from a fix commit back to the PR that introduced it.
Repositories span five languages and five software verticals, ensuring the framework is tested against diverse domain logic -- from scheduling workflows to authentication flows to monitoring pipelines.
Open-source scheduling infrastructure
Error tracking and performance monitoring
Monitoring and observability platform
Community discussion platform
Below is a real bug from an open-source repo. The comparison shows the difference between surface-level detection and the kind of deep, workflow-aware analysis our framework measures.
The deleteMany query in the booking cancellation handler lacks a proper WHERE clause, causing it to delete workflow reminders for ALL users instead of just the cancelling user's reminders.
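A minimal sketch of this bug class, using an in-memory store rather than a real ORM. The `Reminder` shape and the function names are hypothetical; the point is the missing user scope on the mutation:

```typescript
// Hypothetical reminder records; in the real bug these live in a database.
interface Reminder { id: number; userId: string; bookingId: number; }

let reminders: Reminder[] = [
  { id: 1, userId: "alice", bookingId: 10 },
  { id: 2, userId: "bob", bookingId: 20 },
];

// Buggy shape: no filter at all, equivalent to a deleteMany with no
// WHERE clause -- every user's reminders are wiped.
function cancelBookingUnscoped(): void {
  reminders = [];
}

// Fixed shape: the mutation is scoped to the acting user's records.
function cancelBookingScoped(userId: string): void {
  reminders = reminders.filter(r => r.userId !== userId);
}

cancelBookingScoped("alice");
// Only alice's reminder is removed; bob's record survives.
```

The surface-level finding stops at the missing filter; the deeper analysis also traces what the unscoped version would do to every other tenant's data downstream.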
Found: deleteMany call in cancellation handler has no WHERE filter on userId. Generated test: 'Verify that cancelling booking A does not delete reminders for booking B.' Did not identify downstream impacts on notification service or calendar sync.
CRITICAL: Unscoped deleteMany will cascade across all tenants. Identified 3 downstream workflow impacts: (1) notification pipeline will send false cancellations to ~all active users, (2) Google Calendar sync will remove events for unrelated bookings, (3) analytics aggregation will report incorrect cancellation metrics. Traced to user story US-142: 'Cancel booking without side effects.' Business rule BR-017 violated: 'Data mutations must be scoped to the acting user's tenant.'
We ran OrangePro's analysis against real pull requests in public repositories. Every finding is verifiable -- click through to see the trace matrix, evidence, and the exact GitHub searches.