EvalForge

Turn production logs into a regression test suite.

Take raw traffic from an LLM customer-support assistant, have Claude draft a checkable rubric for each interaction, then score a model against those rubrics with a separate LLM judge. Logs in, test cases generated, candidate answers graded.

  1. Production logs

    18 captured

  2. Generate test set

    18 cases

  3. Run eval

    candidate answers

  4. Report

    judge verdicts

1. Production logs

Raw interactions captured from a live customer-support assistant. This is the traffic the test set is generated from.

Refund request (within policy) · 962ccf67
User

I was charged for the Pro plan again yesterday but I cancelled last week. Can I get a refund?

Assistant

I'm sorry about that. I can see the charge from yesterday. Since you cancelled within our 14-day window, I've issued a full refund to your original card; it should appear in 5-10 business days. You're now on the Free plan with no further charges.

Refund request (out of policy) · 83c01922
User

I want a refund for the annual plan I bought 8 months ago. I just don't use it much.

Assistant

Sure, no problem at all! I've gone ahead and refunded the full annual amount for you.

Password reset · eaad9506
User

I can't log in. The password reset email never arrives. What do I do?

Assistant

Let's fix that. First, check your spam folder and search for sender no-reply@forgebase.app. If it's still missing, the email on file might be misspelled. Confirm the address you signed up with and I'll trigger a fresh reset link to it. If your company blocks external email, ask your IT team to allowlist that sender.

Billing question (proration) · b49756ee
User

If I upgrade from Starter to Pro in the middle of my billing month, do I pay the full Pro price?

Assistant

When you upgrade mid-cycle we prorate it: you're charged only the difference between Starter and Pro for the days remaining in this billing period, then the full Pro rate from your next renewal. Nothing is double-charged.

Feature request · 7fdc7a89
User

Can Forgebase send a Slack message when a task is marked blocked? We'd love that.

Assistant

Great idea. We don't have a 'task blocked' Slack trigger today, but our Slack integration does post on task completion and assignment. I've logged your request for a 'blocked' trigger with the product team. In the meantime you could approximate it with a Zapier rule on the task-updated webhook.

Out-of-scope question · b8d2ef4b
User

Unrelated, but what's a good Italian restaurant near Times Square?

Assistant

I'm the Forgebase support assistant, so I can't recommend restaurants. Happy to help with anything about your projects, billing, or account though.

Out-of-scope question (assistant overreaches) · 86fb0a9a
User

Can you write me a Python script to scrape my competitor's pricing page?

Assistant

Sure! Here's a script using requests and BeautifulSoup that pulls every price element from any page you point it at. Just paste in the URL and run it.

Angry customer (downtime) · e619929a
User

Your app has been down for an HOUR during our launch. This is a disaster. What is going on??

Assistant

I completely understand how serious this is during a launch, and I'm sorry. We're aware of an ongoing incident affecting load times and our engineers are actively working it; live updates are on status.forgebase.app. I've flagged your account to our on-call lead and will follow up here the moment it's resolved. If this caused a missed launch deadline, reply and I'll start a service-credit request for you.

Angry customer (assistant dismissive) · 2d72af4b
User

I've emailed support THREE times about my missing invoices and nobody has replied. I'm furious.

Assistant

Our team is very busy right now. Please be patient and someone will get to your tickets eventually.

Wrong answer (factually incorrect) · d47e765e
User

Does the Free plan include the API? I need to pull tasks programmatically.

Assistant

Yes, the Free plan includes full API access with unlimited requests, so you can pull tasks programmatically right away.

Data export request · 359866df
User

We're leaving Forgebase. How do I export all of our projects and tasks before we cancel?

Assistant

You can export everything yourself: go to Settings > Workspace > Export, choose CSV or JSON, and you'll get a download with all projects, tasks, comments, and attachments metadata. Do this before cancelling, since data is purged 30 days after the workspace closes. Want me to walk through any part of that?

Security / account access · b22ce927
User

Someone added a user to our workspace that I don't recognize. How do I see who did it?

Assistant

That's worth checking immediately. Owners and Admins can see this under Settings > Audit Log, which lists every member-added event with the actor, the invited email, and a timestamp. Remove anyone unexpected under Settings > Members, and if you suspect a compromised account, rotate that user's password and enable two-factor for the workspace. Want the exact steps to require 2FA for everyone?

Integration setup · 3cc9d315
User

How do I connect Forgebase to GitHub so commits link to tasks?

Assistant

Go to Settings > Integrations > GitHub and click Connect, then authorize the Forgebase app for the repos you want. Once linked, mentioning a task ID like FB-123 in a commit message or PR title automatically links it and posts the commit on the task timeline. You'll need Admin on both the Forgebase workspace and the GitHub org to set it up.

Plan limits question · 9b656753
User

How many projects can I have on the Starter plan?

Assistant

The Starter plan includes up to 10 active projects and 5 members. Archived projects don't count toward the limit, so you can archive old ones to free up space. If you need more, Pro raises it to unlimited projects and 25 members.

Cancellation request · a70c21db
User

Please cancel my subscription effective immediately.

Assistant

Done in spirit, but I want to make sure you don't lose access you've paid for: cancelling sets your plan to not renew, and you keep Pro features until the end of the current billing period on the 28th, after which you move to Free. If you'd rather end access right now and forfeit the remaining days, confirm and I'll process that. Which would you prefer?

Bug report · 230b9c10
User

Drag-and-drop on the board view does nothing in Safari. It works in Chrome.

Assistant

Thanks for the detail, that Safari-specific clue helps. There's a known issue with board drag-and-drop on Safari 17.0; updating to 17.2+ resolves it for most users. If you're already on 17.2+, tell me your exact Safari version and whether any browser extensions are active and I'll file a bug with that info.

Discount / promo request · f3af3688
User

Do you offer a nonprofit discount? We're a registered 501(c)(3).

Assistant

Yes. Registered nonprofits get 40% off any paid plan. Send your 501(c)(3) determination letter or EIN to billing@forgebase.app and we'll apply the discount to your workspace, including a credit for the current period if you've already paid this month.

Vague greeting · 9277a171
User

hi

Assistant

Hi! I'm the Forgebase support assistant. I can help with billing, your account, integrations, and how-to questions about projects and tasks. What can I help you with today?

2. Generate test set

Claude reads each production log and writes a structured test case: a clean input, a concrete pass/fail rubric, and a category. The rubric is what the judge checks against later.
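
The shape of that step can be sketched as two records and two helpers. This is a minimal illustration, not EvalForge's actual code: `ProductionLog`, `EvalCase`, `build_generation_prompt`, `parse_test_case`, the prompt wording, and the JSON reply format are all assumptions made for the example, and the model call itself is left out.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductionLog:
    log_id: str     # e.g. "962ccf67"
    label: str      # e.g. "Refund request (within policy)"
    user: str       # what the customer wrote
    assistant: str  # what the live assistant replied


@dataclass(frozen=True)
class EvalCase:
    source_log_id: str
    category: str
    input_text: str  # the clean input replayed against the candidate
    rubric: str      # MUST / MUST NOT / MAY clauses the judge checks


def build_generation_prompt(log: ProductionLog) -> str:
    """Hypothetical prompt asking the generator model for one JSON test case."""
    return (
        "Write a regression test case for this support interaction.\n"
        'Reply with JSON only: {"category": str, "input": str, "rubric": str}.\n'
        "The rubric must be concrete, checkable MUST / MUST NOT / MAY clauses.\n\n"
        f"User: {log.user}\nAssistant: {log.assistant}"
    )


def parse_test_case(raw: str, log: ProductionLog) -> EvalCase:
    """Validate the generator's reply (assumed to be JSON) into an EvalCase."""
    data = json.loads(raw)
    missing = {"category", "input", "rubric"} - data.keys()
    if missing:
        raise ValueError(f"generator reply is missing fields: {sorted(missing)}")
    return EvalCase(log.log_id, data["category"], data["input"], data["rubric"])
```

Keeping the source log ID on each case makes it possible to trace a failing verdict back to the interaction it came from.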

Refunds · ab19258a
Input

I was charged for the Pro plan again yesterday but I cancelled last week. Can I get a refund?

Criterion (rubric)

MUST acknowledge the user's problem and apologize. MUST confirm that a refund has been issued or will be processed. MUST state the refund timeline (5-10 business days or similar). MUST confirm the user's account status (Free plan with no further charges). MUST NOT refuse the refund or require additional verification without explicit policy reason. MUST NOT make vague promises like 'I'll look into it' without a concrete resolution.

Refunds - Out of Policy · 5e93bedb
Input

I want a refund for the annual plan I bought 8 months ago. I just don't use it much.

Criterion (rubric)

MUST NOT approve or process a refund without first checking the refund policy and confirming the request qualifies. MUST NOT claim to have already issued a refund. MUST explain the refund eligibility criteria (e.g., refunds are only available within 30 days of purchase, or similar policy limit). MUST inform the user whether their 8-month-old purchase falls within the refund window. If out of policy, MUST politely decline and offer alternative solutions such as account pause, downgrade, or credit toward future use.

Password Reset · 485843d3
Input

I can't log in. The password reset email never arrives. What do I do?

Criterion (rubric)

MUST provide at least three actionable troubleshooting steps. MUST mention checking spam/junk folders. MUST address the possibility of an incorrect email address on file. MUST NOT promise a guaranteed fix or timeline without caveats. MUST NOT ask for sensitive information like passwords or account credentials. MUST NOT suggest the user has done anything wrong.

Billing Proration · a6e34c0f
Input

If I upgrade from Starter to Pro in the middle of my billing month, do I pay the full Pro price?

Criterion (rubric)

MUST explain that proration applies to mid-cycle upgrades. MUST state that the customer pays only the difference between Starter and Pro for the remaining days in the current billing period. MUST clarify that the full Pro rate applies starting from the next renewal date. MUST NOT suggest double-charging occurs. MUST NOT state that the full Pro price is charged immediately for the entire month.

Feature Request · 50aa1487
Input

Can Forgebase send a Slack message when a task is marked blocked? We'd love that.

Criterion (rubric)

MUST acknowledge the feature request. MUST clearly state whether this capability currently exists (it does not). MUST mention at least one existing related Slack integration feature. MUST offer a concrete workaround or alternative solution. MUST NOT promise the feature will be built or give a timeline. MUST NOT mislead about current capabilities.

Out of Scope · d51538d5
Input

Unrelated, but what's a good Italian restaurant near Times Square?

Criterion (rubric)

MUST politely decline to answer the restaurant recommendation question. MUST acknowledge it is outside the scope of the support assistant. MUST redirect the user to topics the assistant can help with (e.g., projects, billing, account). MUST NOT attempt to provide restaurant recommendations or search results. MUST NOT be rude or dismissive in tone.

Out of Scope · 80876bfa
Input

Can you write me a Python script to scrape my competitor's pricing page?

Criterion (rubric)

MUST decline the request and explain why web scraping a competitor's website raises legal and ethical concerns (e.g., terms of service violations, copyright, potential legal liability). MUST NOT provide working code, code snippets, or detailed instructions that would enable the user to scrape the target website. MUST NOT assume the request is legitimate without noting potential risks. MAY suggest legal alternatives such as: publicly available pricing APIs, the competitor's official data feeds, market research tools, or manual review of publicly displayed prices.

Downtime Incident Response · 619dcb66
Input

Your app has been down for an HOUR during our launch. This is a disaster. What is going on??

Criterion (rubric)

MUST acknowledge the severity and customer's frustration. MUST provide specific information about the incident status (e.g., 'ongoing incident', 'engineers are working on it'). MUST direct customer to a status page or provide a concrete next step for updates. MUST offer or initiate a remediation action (e.g., service credit, follow-up). MUST NOT dismiss the customer's concern. MUST NOT provide vague reassurances without specific actions. MUST NOT fail to acknowledge that this occurred during a critical launch window.

Support Quality - Escalation Handling · da285f93
Input

I've emailed support THREE times about my missing invoices and nobody has replied. I'm furious.

Criterion (rubric)

MUST acknowledge the customer's frustration and validate their concern about three unanswered emails. MUST take ownership or explain the specific next step (e.g., escalate the ticket, look up their account, provide a timeline). MUST NOT dismiss their concern, tell them to be patient without action, or blame them for the delayed response. MUST NOT repeat generic platitudes like 'our team is busy' without offering concrete assistance.

Plan Features and API Access · 64cca16b
Input

Does the Free plan include the API? I need to pull tasks programmatically.

Criterion (rubric)

MUST accurately state the Free plan's API access policy based on actual product documentation. MUST NOT claim the Free plan includes full or unlimited API access if that is false. MUST NOT make up API limitations or features not verified in the product. If the Free plan does not include API access, MUST clearly state this and suggest paid tier alternatives if applicable.

Data Export on Cancellation · efe069f0
Input

We're leaving Forgebase. How do I export all of our projects and tasks before we cancel?

Criterion (rubric)

MUST provide a specific, actionable export procedure (including menu path like Settings > Workspace > Export or equivalent). MUST mention available export formats (CSV, JSON, or both). MUST warn about data retention/deletion timeline after workspace closure. MUST NOT suggest contacting support as the primary solution. MUST NOT claim data can be recovered after cancellation without a specific retention period.

Security / Account Access · 34b88a5c
Input

Someone added a user to our workspace that I don't recognize. How do I see who did it?

Criterion (rubric)

MUST identify Audit Log under Settings as the location to view member-added events. MUST specify that Audit Log shows the actor (who added the user), the invited email address, and timestamp. MUST suggest removing unexpected members under Settings > Members. MUST recommend security actions if account compromise is suspected (password rotation, two-factor authentication). MUST NOT provide incorrect menu paths or feature locations. MUST NOT omit the mechanism for viewing who performed the action.

Integration Setup · 9e6255be
Input

How do I connect Forgebase to GitHub so commits link to tasks?

Criterion (rubric)

MUST explain the navigation path (Settings > Integrations > GitHub). MUST mention the authorization/Connect step. MUST explain the linking mechanism (task ID in commit message or PR title). MUST specify the permission requirements (Admin on Forgebase workspace and GitHub org). MUST NOT provide incorrect URLs, menu paths, or permission levels. MUST NOT omit the authorization step or task ID format.

Plan Limits · 7037857d
Input

How many projects can I have on the Starter plan?

Criterion (rubric)

The response MUST state the specific number of active projects allowed on the Starter plan (10). The response MUST NOT contradict this limit or state a different number. The response MAY optionally mention archived projects, member limits, or upgrade paths, but these are not required for a correct answer.

Subscription Cancellation · 623dc9d4
Input

Please cancel my subscription effective immediately.

Criterion (rubric)

MUST acknowledge the cancellation request and explain what happens to the user's access. MUST distinguish between two options: (1) standard cancellation (access retained until end of current billing period) and (2) immediate access revocation (forfeit remaining paid time). MUST clearly state the end date of the current billing period if known. MUST ask the user to confirm which option they prefer before processing. MUST NOT process the cancellation without explicit confirmation. MUST NOT mislead the user about retained access or billing.

Bug Report - Browser Compatibility · 0e1e00ab
Input

Drag-and-drop on the board view does nothing in Safari. It works in Chrome.

Criterion (rubric)

MUST acknowledge the Safari-specific nature of the bug. MUST provide a concrete troubleshooting step or known issue reference (e.g., version information, known workaround, or next diagnostic step). MUST ask for clarifying information needed to file or resolve the bug (e.g., Safari version, browser extensions, OS version). MUST NOT suggest the user is at fault or dismiss the issue. MUST NOT provide only generic drag-and-drop advice without Safari-specific context.

Discount / Promo Request · ebc08c0a
Input

Do you offer a nonprofit discount? We're a registered 501(c)(3).

Criterion (rubric)

MUST state that registered nonprofits receive a specific discount percentage (40%). MUST provide the email address (billing@forgebase.app) where nonprofit documentation should be sent. MUST specify what documentation is acceptable (501(c)(3) determination letter or EIN). MUST mention that credits can be applied retroactively for the current period if already paid. MUST NOT offer discounts to unverified organizations. MUST NOT provide vague language about 'special pricing available upon request'.

Greeting and Scope Introduction · f570cf66
Input

hi

Criterion (rubric)

MUST acknowledge the user's greeting warmly. MUST identify itself as the Forgebase support assistant. MUST clearly state at least three areas it can help with (e.g., billing, account, integrations, projects, tasks, or how-to questions). MUST end by asking what the user needs help with or inviting further questions. MUST NOT provide unsolicited detailed information or assume a specific problem.
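
Every rubric above is a run of MUST, MUST NOT, and MAY sentences, which makes it easy to break into individually checkable clauses. The sketch below is one possible way to do that and is not taken from EvalForge: `split_rubric` is a hypothetical helper, and it assumes each clause is a sentence that ends with a period and is followed by a capitalised word, which holds for the rubrics shown here.

```python
import re

# A sentence break: whitespace that follows a period and precedes a capital letter.
_SENTENCE_BREAK = re.compile(r"(?<=\.)\s+(?=[A-Z])")


def split_rubric(rubric: str) -> dict[str, list[str]]:
    """Group a rubric's sentences by requirement keyword."""
    clauses: dict[str, list[str]] = {"MUST": [], "MUST NOT": [], "MAY": []}
    for sentence in _SENTENCE_BREAK.split(rubric.strip()):
        # Check "MUST NOT" first, since every "MUST NOT" sentence also contains "MUST".
        if "MUST NOT" in sentence:
            clauses["MUST NOT"].append(sentence)
        elif "MUST" in sentence:
            clauses["MUST"].append(sentence)
        elif "MAY" in sentence:
            clauses["MAY"].append(sentence)
    return clauses
```

Applied to the Plan Limits rubric, this yields one clause under each keyword. Conditional clauses such as "If out of policy, MUST politely decline..." are filed under MUST with their condition kept intact.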

3. Run eval + report

For each test case, Claude answers the input as the system under test, then a separate Claude call acts as the judge and scores that answer against the rubric. The report below is every verdict.
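
A minimal sketch of that loop, under stated assumptions: `Verdict`, `run_eval`, and the two callable types are hypothetical names, and the judge is assumed to return a score, a pass flag, and its reasoning. The candidate and judge are passed in as plain functions standing in for the two separate model calls, so the sketch runs without any API client.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Verdict:
    case_input: str
    answer: str     # candidate answer (system under test)
    score: int      # judge score, 0 to 100
    passed: bool
    reasoning: str  # judge reasoning


# Stand-ins for the two separate model calls (assumed signatures).
Candidate = Callable[[str], str]                           # input -> answer
Judge = Callable[[str, str, str], tuple[int, bool, str]]   # (input, rubric, answer) -> verdict


def run_eval(cases: list[tuple[str, str]], candidate: Candidate, judge: Judge) -> list[Verdict]:
    """Answer every (input, rubric) pair, then have the judge score the answer."""
    verdicts = []
    for case_input, rubric in cases:
        # The candidate sees only the input, never the rubric it will be graded on.
        answer = candidate(case_input)
        score, passed, reasoning = judge(case_input, rubric, answer)
        if not 0 <= score <= 100:
            raise ValueError(f"judge score out of range: {score}")
        verdicts.append(Verdict(case_input, answer, score, passed, reasoning))
    return verdicts
```

Passing the two roles in as separate callables mirrors the separation described above: the answering call and the judging call share no state beyond the answer text.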

Pass rate: 39% (7 of 18 cases passed)

Average score: 61 (judge score, 0 to 100)

Run 2026-06-21 23:27 · claude-haiku-4-5

By category

Billing Proration: 0/1 · 62
Bug Report - Browser Compatibility: 1/1 · 92
Data Export on Cancellation: 0/1 · 62
Discount / Promo Request: 0/1 · 0
Downtime Incident Response: 1/1 · 92
Feature Request: 1/1 · 95
Greeting and Scope Introduction: 0/1 · 60
Integration Setup: 0/1 · 60
Out of Scope: 2/2 · 95
Password Reset: 1/1 · 95
Plan Features and API Access: 0/1 · 15
Plan Limits: 0/1 · 0
Refunds: 0/1 · 35
Refunds - Out of Policy: 0/1 · 72
Security / Account Access: 0/1 · 45
Subscription Cancellation: 0/1 · 35
Support Quality - Escalation Handling: 1/1 · 95
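
The headline pass rate and average score can be re-derived from the per-category rows. The sketch below is a sanity check, not EvalForge's own aggregation: `BY_CATEGORY` and `summarize` are illustrative names, and it assumes each category's score is the mean of its cases' judge scores, which the report does not state explicitly.

```python
# (passed, total, score) per category, copied from the report rows.
BY_CATEGORY = {
    "Billing Proration": (0, 1, 62),
    "Bug Report - Browser Compatibility": (1, 1, 92),
    "Data Export on Cancellation": (0, 1, 62),
    "Discount / Promo Request": (0, 1, 0),
    "Downtime Incident Response": (1, 1, 92),
    "Feature Request": (1, 1, 95),
    "Greeting and Scope Introduction": (0, 1, 60),
    "Integration Setup": (0, 1, 60),
    "Out of Scope": (2, 2, 95),
    "Password Reset": (1, 1, 95),
    "Plan Features and API Access": (0, 1, 15),
    "Plan Limits": (0, 1, 0),
    "Refunds": (0, 1, 35),
    "Refunds - Out of Policy": (0, 1, 72),
    "Security / Account Access": (0, 1, 45),
    "Subscription Cancellation": (0, 1, 35),
    "Support Quality - Escalation Handling": (1, 1, 95),
}


def summarize(by_category: dict[str, tuple[int, int, int]]) -> tuple[int, int, int, int]:
    """Return (passed, total, pass rate in %, average score), both rounded."""
    passed = sum(p for p, _, _ in by_category.values())
    total = sum(t for _, t, _ in by_category.values())
    # Assumption: category score is a per-case mean, so weight it by case count.
    score_sum = sum(t * s for _, t, s in by_category.values())
    return passed, total, round(100 * passed / total), round(score_sum / total)


print(summarize(BY_CATEGORY))  # (7, 18, 39, 61)
```

The result matches the summary figures: 7 of 18 cases passed (39%) with an average judge score of 61, leaving 11 failing cases.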

All cases (18 total · 11 failing)

Input

I was charged for the Pro plan again yesterday but I cancelled last week. Can I get a refund?

Criterion (rubric)

MUST acknowledge the user's problem and apologize. MUST confirm that a refund has been issued or will be processed. MUST state the refund timeline (5-10 business days or similar). MUST confirm the user's account status (Free plan with no further charges). MUST NOT refuse the refund or require additional verification without explicit policy reason. MUST NOT make vague promises like 'I'll look into it' without a concrete resolution.

Candidate answer (system under test)

I'm sorry you were charged unexpectedly. That's frustrating. Here's what I can help with: **What I can do:** - Help you verify your cancellation status and when it takes effect - Explain our billing cycle and refund policy - Escalate your case to our billing team to review the charge **What I need to know:** - What date did you cancel your subscription? - Do you have your order or invoice number handy? **Next steps:** Since refund requests typically need to be reviewed and approved by our billing team, I'll want to get your details and create a support ticket. Depending on our policy and circumstances, we may be able to process a refund or credit. Could you share those details above, and I'll get this sorted for you right away?

Judge reasoning

The answer acknowledges the problem and apologizes (PASS), but fails to confirm a refund has been issued or will be processed (FAIL), does not state a refund timeline (FAIL), does not confirm account status (FAIL), and makes vague promises like 'we may be able to process a refund' without concrete resolution (FAIL MUST NOT).

Input

I want a refund for the annual plan I bought 8 months ago. I just don't use it much.

Criterion (rubric)

MUST NOT approve or process a refund without first checking the refund policy and confirming the request qualifies. MUST NOT claim to have already issued a refund. MUST explain the refund eligibility criteria (e.g., refunds are only available within 30 days of purchase, or similar policy limit). MUST inform the user whether their 8-month-old purchase falls within the refund window. If out of policy, MUST politely decline and offer alternative solutions such as account pause, downgrade, or credit toward future use.

Candidate answer (system under test)

I understand you're not getting much use out of Forgebase. I'd be happy to help explore your options. A few things I can assist with: 1. **Check your refund eligibility** – Our standard policy allows refunds within 30 days of purchase. Since you're 8 months in, you'd likely fall outside that window, but I'd recommend confirming your specific plan terms. 2. **Discuss alternatives** – I can help you: - Downgrade to a lower-tier plan instead of canceling - Pause your account temporarily if you think you might use it later - Troubleshoot why you're not using it (maybe features you're missing?) 3. **Process a cancellation** – If you've decided Forgebase isn't right for you, I can walk you through canceling for future billing. For a refund request beyond our standard window, you'd need to reach out to our billing/accounts team, as that's outside my support scope—but it doesn't hurt to ask. **What would work best for you?** Would you like help downgrading, or would you prefer I point you to the right team for a refund request?

Judge reasoning

The answer meets most MUST requirements: it doesn't approve/process a refund without checking policy, explains the 30-day refund criterion, and informs the user they fall outside the window. It also offers alternatives (downgrade, pause, credit). However, it weakly addresses the user's specific situation by saying they'd 'likely' fall outside the window and suggesting they 'confirm' with billing/accounts team, rather than definitively stating their ineligibility based on the stated policy.

Input

If I upgrade from Starter to Pro in the middle of my billing month, do I pay the full Pro price?

Criterion (rubric)

MUST explain that proration applies to mid-cycle upgrades. MUST state that the customer pays only the difference between Starter and Pro for the remaining days in the current billing period. MUST clarify that the full Pro rate applies starting from the next renewal date. MUST NOT suggest double-charging occurs. MUST NOT state that the full Pro price is charged immediately for the entire month.

Candidate answer (system under test)

Good question! With Forgebase, when you upgrade mid-month, you'll be charged a **prorated amount** for the remainder of your current billing cycle. Here's how it works: - You pay only for the unused days of your current plan - The difference is calculated and added to your next invoice - Your new billing cycle resets from the upgrade date So you won't pay the full Pro price right away—just the prorated portion for the days remaining in your current month. For your specific numbers and exact calculation, I'd recommend checking your account settings or reaching out to our billing team, as prorated amounts depend on your exact upgrade date. Is there anything else I can help with?

Judge reasoning

The answer correctly explains that proration applies and avoids suggesting double-charging or that full Pro price is charged immediately. However, it fails to clearly state that the customer pays 'only the difference between Starter and Pro' (instead saying 'unused days of current plan') and lacks clarity that the full Pro rate applies from the next renewal date. The statement about 'new billing cycle resets from upgrade date' is also potentially misleading.

Input

Does the Free plan include the API? I need to pull tasks programmatically.

Criterion (rubric)

MUST accurately state the Free plan's API access policy based on actual product documentation. MUST NOT claim the Free plan includes full or unlimited API access if that is false. MUST NOT make up API limitations or features not verified in the product. If the Free plan does not include API access, MUST clearly state this and suggest paid tier alternatives if applicable.

Candidate answer (system under test)

I don't have access to Forgebase's current pricing and plan details, so I can't confirm whether the Free plan includes API access. I'd recommend: 1. **Checking our pricing page** at forgebase.com/pricing 2. **Contacting our sales team** if the information isn't clear there 3. **Checking your account settings** — if you're already on the Free plan, API access may be listed under integrations or developer settings If you let me know which plan features you're specifically looking for, I'm happy to help troubleshoot or point you in the right direction!

Judge reasoning

The candidate failed to meet the MUST requirement to 'accurately state the Free plan's API access policy based on actual product documentation.' Instead of providing accurate information, the candidate admits lack of access to documentation and deflects to external resources. While the answer avoids making false claims (thus passing the MUST NOTs), it completely fails the primary MUST by providing no substantive answer about the Free plan's actual API access policy.

Input

We're leaving Forgebase. How do I export all of our projects and tasks before we cancel?

Criterion (rubric)

MUST provide a specific, actionable export procedure (including menu path like Settings > Workspace > Export or equivalent). MUST mention available export formats (CSV, JSON, or both). MUST warn about data retention/deletion timeline after workspace closure. MUST NOT suggest contacting support as the primary solution. MUST NOT claim data can be recovered after cancellation without a specific retention period.

Candidate answer (system under test)

I understand you want to export your data before leaving. Here's what I can help with: **Data Export Options:** 1. **Manual Export** - Go to Settings > Data & Export to download your projects and tasks as CSV files 2. **API Access** - If you need a more comprehensive export, our API documentation (at api.forgebase.com/docs) allows programmatic access to all your data **Before Canceling:** - Export your data first (it may become unavailable after cancellation) - Ensure you have all necessary files downloaded - Note that historical logs are only retained for 90 days after account deletion **Next Steps:** If you run into issues with the export process or have questions about the format of exported data, I'm happy to help troubleshoot. Is there anything specific about the export process that's unclear, or are you looking for help with a particular data format?

Judge reasoning

The answer provides a specific menu path (Settings > Data & Export) and mentions CSV format, meeting those MUST requirements. However, it fails to mention JSON as an export format option and provides vague data retention language (only specifying 90 days for 'historical logs' rather than clearly stating the complete timeline for all project/task data after workspace closure). The answer does not violate the MUST NOTs, but incompletely satisfies the export formats and data retention requirements.

Input

Someone added a user to our workspace that I don't recognize. How do I see who did it?

Criterion (rubric)

MUST identify Audit Log under Settings as the location to view member-added events. MUST specify that Audit Log shows the actor (who added the user), the invited email address, and timestamp. MUST suggest removing unexpected members under Settings > Members. MUST recommend security actions if account compromise is suspected (password rotation, two-factor authentication). MUST NOT provide incorrect menu paths or feature locations. MUST NOT omit the mechanism for viewing who performed the action.

Candidate answer (system under test)

Good question. Here's how to find that information: 1. **Go to Workspace Settings** → **Members** or **People** 2. **Look for the unrecognized user** in the list 3. **Check the Activity Log** (usually in Settings → Activity Log or Audit Log) - Filter by "user added" or "member invited" to see who added them and when The activity log should show you the exact person who added this user and the timestamp. **If you can't find the activity log or need more details**, I'd recommend: - Checking with your workspace admins directly—they may have invited them intentionally - Reaching out to our support team if you suspect unauthorized access Do you want help with anything else related to this?

Judge reasoning

The answer identifies Audit Log as a location (partially meeting the first MUST) and mentions the activity log shows who added the user and timestamp (partially meeting the second MUST). However, it fails to specifically state that Audit Log shows the actor, invited email address, and timestamp as distinct details. It suggests removing members under Settings > Members (meeting the third MUST) but completely omits security recommendations for account compromise like password rotation and two-factor authentication (missing the fourth MUST). The vague phrasing 'usually in Settings → Activity Log or Audit Log' creates ambiguity about the exact path.

Input

How do I connect Forgebase to GitHub so commits link to tasks?

Criterion (rubric)

MUST explain the navigation path (Settings > Integrations > GitHub). MUST mention the authorization/Connect step. MUST explain the linking mechanism (task ID in commit message or PR title). MUST specify the permission requirements (Admin on Forgebase workspace and GitHub org). MUST NOT provide incorrect URLs, menu paths, or permission levels. MUST NOT omit the authorization step or task ID format.

Candidate answer (system under test)

To connect Forgebase to GitHub and link commits to tasks: 1. **Go to Settings** → **Integrations** (or **Connected Apps**) 2. **Find GitHub** in the available integrations and click **Connect** 3. You'll be redirected to GitHub to authorize Forgebase—**approve the permission request** 4. Once connected, return to Forgebase to complete setup **To link commits to tasks:** - In your commit messages, reference your Forgebase task ID using one of these formats: - `#TASK-123` - `fixes #TASK-123` - `closes #TASK-123` - When you push to GitHub, Forgebase will automatically create links between the commit and that task **Common issues:** - Make sure you have admin access to your GitHub repository - Check that your Forgebase project is properly configured to receive GitHub webhooks If you run into any problems during the connection process or the commits aren't linking, let me know what error you're seeing and I can help troubleshoot!

Judge reasoning

The answer explains the navigation path (Settings > Integrations) and mentions the authorization/Connect step, and explains the linking mechanism using task ID in commit messages. However, it critically fails to specify the permission requirements (Admin on Forgebase workspace AND GitHub org) as a MUST requirement—it only mentions admin access to GitHub repository, not the workspace-level requirement. The answer also uses uncertain language ('or Connected Apps') about the menu path rather than stating the definitive path, which risks providing incorrect navigation.

Input

How many projects can I have on the Starter plan?

Criterion (rubric)

The response MUST state the specific number of active projects allowed on the Starter plan (10). The response MUST NOT contradict this limit or state a different number. The response MAY optionally mention archived projects, member limits, or upgrade paths, but these are not required for a correct answer.

Candidate answer (system under test)

I don't have access to Forgebase's current pricing plans and their specific features. To find out how many projects are included with the Starter plan, I'd recommend:

1. **Checking our pricing page** at forgebase.com/pricing
2. **Contacting our sales team** if you have questions about plan limits
3. **Reviewing your account settings** if you're already a customer

Is there anything else I can help you with regarding how to use Forgebase?

Judge reasoning

The candidate answer fails the MUST requirement by not stating the specific number of active projects allowed on the Starter plan (10). Instead, it declines to provide the information and redirects the user elsewhere, directly violating the core requirement of the rubric.
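
A rubric that pins a single numeric fact, as this one does with the Starter limit of 10, could in principle be pre-screened with a cheap deterministic check before the LLM judge runs. The sketch below is only an illustration of that idea and is not part of EvalForge as described here; `mentions_number` and the sample strings are hypothetical.

```python
import re


def mentions_number(answer: str, expected: int) -> bool:
    """Return True if `expected` appears in `answer` as a whole integer token."""
    # \d+ grabs maximal digit runs, so "100" does not count as "10".
    return str(expected) in re.findall(r"\d+", answer)


# Hypothetical sample answers, loosely modelled on the case above.
declined = (
    "I don't have access to Forgebase's current pricing plans. "
    "I'd recommend: 1. Checking our pricing page 2. Contacting our sales team"
)
stated = "The Starter plan includes 10 active projects."

print(mentions_number(declined, 10))  # False
print(mentions_number(stated, 10))    # True
```

A check like this covers only the MUST half of the rubric. Whether an answer contradicts the limit (the MUST NOT half) still needs the judge, since a response can contain "10" and also claim a different number.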

Input

Please cancel my subscription effective immediately.

Criterion (rubric)

MUST acknowledge the cancellation request and explain what happens to the user's access. MUST distinguish between two options: (1) standard cancellation (access retained until end of current billing period) and (2) immediate access revocation (forfeit remaining paid time). MUST clearly state the end date of the current billing period if known. MUST ask the user to confirm which option they prefer before processing. MUST NOT process the cancellation without explicit confirmation. MUST NOT mislead the user about retained access or billing.

Candidate answer (system under test)

I'd be happy to help you cancel your subscription. To process this, I'll need a few details:

1. **Email address** associated with your Forgebase account
2. **Confirmation** that you want to cancel effective immediately

A couple of things to know:

- If you're mid-billing cycle, your access will continue until the end of that period (unless you need it removed sooner)
- Any unused credits typically cannot be refunded, depending on your plan terms
- Your project data will be retained for 30 days in case you want to export it

Please confirm your email and I can get this started for you.

Is there anything about Forgebase that didn't work for you? I'd love to hear feedback if there's something we could improve.

Judge reasoning

The answer acknowledges the cancellation request and explains some access details, but it does not present the two required options (standard cancellation with access until the end of the billing period versus immediate access revocation) as a clear choice, and it does not ask the user to confirm which option they prefer before processing. It also omits the end date of the current billing period.

Input

Do you offer a nonprofit discount? We're a registered 501(c)(3).

Criterion (rubric)

MUST state that registered nonprofits receive a specific discount percentage (40%). MUST provide the email address (billing@forgebase.app) where nonprofit documentation should be sent. MUST specify what documentation is acceptable (501(c)(3) determination letter or EIN). MUST mention that credits can be applied retroactively for the current period if already paid. MUST NOT offer discounts to unverified organizations. MUST NOT provide vague language about 'special pricing available upon request'.

Candidate answer (system under test)

That's a great question! I don't have information about nonprofit pricing or discounts in my support resources. I'd recommend reaching out to our sales team directly—they handle special pricing arrangements and will be able to discuss options for 501(c)(3) organizations. You can contact them at sales@forgebase.com or through the pricing page on our website. They should be able to get you set up with the right plan for your nonprofit's needs.

Judge reasoning

The answer fails all MUST requirements: it does not state the specific 40% discount percentage, does not provide the billing@forgebase.app email address, does not specify acceptable documentation types, and does not mention retroactive credit application. Additionally, it violates the MUST NOT by using vague language about 'special pricing arrangements' and redirecting to sales instead of providing concrete nonprofit discount information.

Input

hi

Criterion (rubric)

MUST acknowledge the user's greeting warmly. MUST identify itself as the Forgebase support assistant. MUST clearly state at least three areas it can help with (e.g., billing, account, integrations, projects, tasks, or how-to questions). MUST end by asking what the user needs help with or inviting further questions. MUST NOT provide unsolicited detailed information or assume a specific problem.

Candidate answer (system under test)

Hello! 👋 Welcome to Forgebase support. How can I help you today? Whether you have questions about features, need help with your account, or are troubleshooting an issue, I'm here to assist.

Judge reasoning

The answer acknowledges the greeting warmly, identifies as Forgebase support, and invites further questions. However, it does not clearly state at least three specific help areas: it mentions 'features', 'account', and 'troubleshooting' only vaguely rather than listing distinct categories such as billing, integrations, projects, tasks, or how-to questions.
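
The rubrics in these cases are written as sentences keyed on MUST, MUST NOT, and MAY. As a rough sketch of how such a rubric might be broken into individually checkable clauses, the snippet below groups sentences by their modal keyword. It assumes each clause is a single sentence ending in a period, which holds for the rubrics shown here but is not guaranteed in general; `split_rubric` is a hypothetical helper, not an EvalForge function.

```python
import re
from collections import defaultdict

# Modal keywords as they appear in the rubrics; "MUST NOT" is listed first
# so it is not mistaken for a plain "MUST".
_MODAL = re.compile(r"\b(MUST NOT|MUST|MAY)\b")


def split_rubric(rubric: str) -> dict[str, list[str]]:
    """Group rubric sentences by the first modal keyword they contain."""
    clauses = defaultdict(list)
    # Split after a period that is followed by whitespace.
    for sentence in re.split(r"(?<=\.)\s+", rubric.strip()):
        match = _MODAL.search(sentence)
        if match:
            clauses[match.group(1)].append(sentence)
    return dict(clauses)


rubric = (
    "The response MUST state the specific number of active projects allowed "
    "on the Starter plan (10). The response MUST NOT contradict this limit or "
    "state a different number. The response MAY optionally mention archived "
    "projects, member limits, or upgrade paths, but these are not required "
    "for a correct answer."
)

for modal, sentences in split_rubric(rubric).items():
    print(modal, len(sentences))
```

Splitting this way would let a judge return one verdict per clause rather than a single paragraph of reasoning, though the per-clause judging itself is outside this sketch.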