Technical Deep Dive
The 'Earned vs. Burned' skill operates as a meta-layer on top of Claude's existing capabilities. It does not modify the underlying model architecture but instead introduces a structured prompt template and output parser that guides the user through a cost-benefit analysis. The core mechanism is a two-step process: first, the user defines a set of 'earned' metrics—these can be quantitative (e.g., 'number of customer support tickets resolved,' 'lines of code generated,' 'dollars in sales attributed') or qualitative (e.g., 'improved user satisfaction score,' 'reduced error rate'). Second, the user specifies 'burned' metrics, which typically include API token usage (input + output tokens), compute time (in seconds or GPU-hours), and any ancillary costs like human review time or API call failures.
Under the hood, the skill leverages Claude's ability to parse structured data and perform arithmetic. It likely uses a system prompt that instructs the model to extract the user's defined metrics, calculate totals, and then compute a net value score using a user-defined weighting system. For example, a user might assign a weight of $10 per resolved ticket and $0.003 per 1,000 tokens consumed. The skill then outputs a summary table like:
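| Metric | Quantity | Earned | Burned | Net |
|---|---|---|---|---|
| Tickets Resolved | 150 at $10 each | $1,500.00 | n/a | $1,500.00 |
| Token Cost | 45,000 tokens at $0.003 per 1,000 | n/a | $0.135 | -$0.135 |
| Human Review Time | 2 hours (implied $20/hour) | n/a | $40.00 | -$40.00 |
| Net Value | | $1,500.00 | $40.135 | $1,459.87 |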
*Data Takeaway: This table demonstrates the skill's core function: converting abstract AI contributions into a concrete financial ledger. The net value figure becomes the single metric for assessing whether an AI deployment is worth continuing.*
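The ledger arithmetic behind the table can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: the `LedgerInputs` dataclass, the `net_value` function, and all field names are hypothetical, and the $20/hour review rate is inferred from the table's 2 hours costing $40.

```python
from dataclasses import dataclass


@dataclass
class LedgerInputs:
    """User-supplied figures. Field names are illustrative, not part of the skill."""
    tickets_resolved: int
    value_per_ticket: float      # dollars credited per resolved ticket
    tokens_used: int
    price_per_1k_tokens: float   # dollars per 1,000 tokens
    review_hours: float
    review_rate_per_hour: float  # assumed; the table implies $20/hour


def net_value(x: LedgerInputs) -> dict:
    """Return earned, burned, and net totals in dollars."""
    earned = x.tickets_resolved * x.value_per_ticket
    token_cost = x.tokens_used / 1000 * x.price_per_1k_tokens
    review_cost = x.review_hours * x.review_rate_per_hour
    burned = token_cost + review_cost
    return {
        "earned": earned,
        "token_cost": token_cost,
        "review_cost": review_cost,
        "burned": burned,
        "net": earned - burned,
    }


# Inputs taken from the example table above.
ledger = net_value(LedgerInputs(150, 10.0, 45_000, 0.003, 2.0, 20.0))
print(f"Earned ${ledger['earned']:,.2f}, "
      f"burned ${ledger['burned']:,.3f}, "
      f"net ${ledger['net']:,.3f}")
```

Keeping every weight as an explicit input mirrors the skill's premise that the user, not the tool, decides what a ticket or a review hour is worth.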
The skill's design is intentionally minimalistic and avoids complex integrations. It does not require external APIs or databases; all data is provided by the user in natural language. This lowers the barrier to entry but also limits its utility for automated, real-time monitoring. For that, developers would need to build custom pipelines that feed usage logs into the skill. The open-source community has already started exploring this. A GitHub repository named 'claude-roi-tracker' (340 stars) provides a Python wrapper that logs Claude API calls and automatically generates 'Earned vs. Burned' reports. Another repo, 'llm-cost-calculator' (1,200 stars), offers a more granular breakdown of token costs across different providers, which could be integrated to make the skill's cost figures more accurate.
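A custom pipeline of the kind described here might look roughly like the sketch below. It does not reproduce the code of either repository and makes no real API calls: the `UsageRecord` schema, the `summarize` and `render` helpers, and the sample log entries are all hypothetical, and the single blended token price follows the article's earlier $0.003 per 1,000 tokens example rather than any provider's actual price list.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class UsageRecord:
    """One logged model call. The schema is hypothetical."""
    input_tokens: int
    output_tokens: int
    outcome_achieved: bool  # did this call produce something the team counts as earned?


def summarize(records: Iterable[UsageRecord],
              value_per_outcome: float,
              price_per_1k_tokens: float) -> dict:
    """Aggregate logged calls into earned, burned, and net dollar figures."""
    records = list(records)
    outcomes = sum(1 for r in records if r.outcome_achieved)
    tokens = sum(r.input_tokens + r.output_tokens for r in records)
    earned = outcomes * value_per_outcome
    burned = tokens / 1000 * price_per_1k_tokens
    return {"outcomes": outcomes, "tokens": tokens,
            "earned": earned, "burned": burned, "net": earned - burned}


def render(summary: dict) -> str:
    """Format the summary as plain text that could be pasted into a prompt."""
    return (
        f"Earned: {summary['outcomes']} outcomes worth ${summary['earned']:,.2f}\n"
        f"Burned: {summary['tokens']:,} tokens costing ${summary['burned']:,.4f}\n"
        f"Net value: ${summary['net']:,.4f}"
    )


# Illustrative log entries, not real usage data.
log = [
    UsageRecord(1200, 300, True),
    UsageRecord(800, 200, False),
    UsageRecord(1500, 500, True),
]
summary = summarize(log, value_per_outcome=10.0, price_per_1k_tokens=0.003)
print(render(summary))
```

The rendered text is deliberately plain so it can be handed to the skill as ordinary natural-language input, which is the only interface the skill is described as having.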
Key Players & Case Studies
The 'Earned vs. Burned' skill is a first-party creation by Anthropic, the company behind Claude. This is significant because it shows Anthropic is actively thinking about enterprise adoption barriers. While OpenAI has focused on raw capability improvements (e.g., GPT-4o's multimodal speed) and Google DeepMind on research breakthroughs (e.g., Gemini's long-context windows), Anthropic is differentiating on accountability and trust. The skill aligns with their broader 'constitutional AI' philosophy, extending it from safety to business value.
Early adopters include a mid-sized e-commerce company that used the skill to audit its customer service chatbot. The company defined 'earned' as the number of successfully resolved queries (with a 90%+ satisfaction rate) and 'burned' as API costs plus human escalation time. The audit revealed that while the chatbot handled 70% of queries, the cost per resolved query was $0.12, compared to $2.50 for human agents—a 95% cost reduction. However, the audit also flagged that queries involving refunds had a 40% escalation rate, indicating a need for model fine-tuning on that specific domain.
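The cost comparison in this audit is simple arithmetic, sketched below under stated assumptions. The function names are hypothetical, and the 25% escalation threshold is an illustrative choice that the case study does not specify; only the $0.12, $2.50, and 40% figures come from the audit.

```python
def cost_reduction(ai_cost_per_query: float, human_cost_per_query: float) -> float:
    """Fractional saving of the AI path relative to the human path."""
    if human_cost_per_query <= 0:
        raise ValueError("human_cost_per_query must be positive")
    return (human_cost_per_query - ai_cost_per_query) / human_cost_per_query


def flag_categories(escalation_rates: dict, threshold: float) -> list:
    """Return query categories whose escalation rate exceeds the threshold."""
    return sorted(name for name, rate in escalation_rates.items() if rate > threshold)


reduction = cost_reduction(0.12, 2.50)
print(f"Cost reduction: {reduction:.1%}")

# Threshold of 0.25 is an assumption for illustration only.
flagged = flag_categories({"refunds": 0.40}, threshold=0.25)
print(f"Categories needing attention: {flagged}")
```

Splitting the ledger by query category is what surfaces the refund problem; a single blended cost figure would hide it.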
Another case involves a software development team using Claude to generate unit tests. They set 'earned' as the number of tests passing on the first run and 'burned' as the time spent reviewing and fixing generated tests. The initial net value was negative because the model generated many false positives. After adjusting the prompt to include more context about the codebase, the net value turned positive, with a 3:1 ratio of time saved versus time spent on review.
A comparison of how major AI providers address ROI measurement:
| Provider | ROI Tool/Method | Key Feature | Limitation |
|---|---|---|---|
| Anthropic (Claude) | 'Earned vs. Burned' skill | User-defined metrics, transparent ledger | Manual input, no real-time tracking |
| OpenAI | Usage dashboard + cost calculator | Automated token tracking, per-model cost | No earned metrics; only shows cost, not value |
| Google (Vertex AI) | Model Garden + cost monitoring | Integration with GCP billing, pre-built templates | Complex setup, requires cloud infrastructure |
| Open-source (LangChain) | Callbacks + custom evaluators | Highly customizable, code-level control | Requires significant engineering effort |
*Data Takeaway: Of the four approaches compared, Anthropic's skill is the only one that directly asks users to define 'earned' value, making it the most business-oriented. However, its manual nature limits scalability compared to automated dashboards from OpenAI and Google.*
Industry Impact & Market Dynamics
The 'Earned vs. Burned' skill arrives at a critical juncture. Enterprise AI spending is projected to reach $150 billion by 2027 (Gartner), but a 2024 survey by a major consulting firm found that only 15% of organizations have a formal process for measuring AI ROI. This disconnect creates a massive opportunity for tools that bridge the gap between technical capability and business value.
The skill's emergence could catalyze several market shifts:
1. Rise of AI Auditors: A new role—AI Value Auditor—may emerge, similar to how cloud cost optimization created the FinOps specialist. These auditors would use tools like 'Earned vs. Burned' to regularly assess AI deployments and recommend optimizations.
2. Product Differentiation: AI-as-a-service platforms (e.g., Jasper, Copy.ai) will likely incorporate similar value-tracking features into their dashboards. Startups that fail to provide ROI transparency may lose enterprise contracts.
3. Pricing Model Evolution: API pricing may shift from pure token-based to value-based models. For example, Anthropic could offer a 'net value guarantee' where customers only pay if the earned value exceeds burned costs. This would be a powerful competitive moat.
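The 'net value guarantee' in the third point is speculative, and no such offering is described as existing. As a sketch of the rule it implies, assuming a flat fee that is waived whenever earned value fails to exceed burned cost (the function name and the choice to waive the fee entirely are hypothetical):

```python
def guaranteed_charge(earned: float, burned: float, fee: float) -> float:
    """Charge the fee only when earned value exceeds burned cost; otherwise waive it."""
    if fee < 0:
        raise ValueError("fee must be non-negative")
    return fee if earned > burned else 0.0


# Positive net value: the customer pays the fee.
print(guaranteed_charge(earned=1500.0, burned=40.135, fee=100.0))
# Negative net value: the fee is waived.
print(guaranteed_charge(earned=10.0, burned=40.0, fee=100.0))
```

Any real contract would also have to settle who measures 'earned', which is the hard part of such a guarantee.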
Market growth data:
| Year | Global AI Software Revenue (USD) | % of Enterprises with Formal AI ROI Process |
|---|---|---|
| 2023 | $62.5 billion | 8% |
| 2024 | $85.0 billion | 12% |
| 2025 (est.) | $110.0 billion | 18% |
| 2027 (proj.) | $150.0 billion | 30% |
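*Data Takeaway: Formal ROI measurement is growing, but from a very low base. Even if adoption reaches the projected 30% in 2027, about 70% of enterprises would still lack a formal process while AI software revenue rises from $62.5 billion to $150 billion. This gap represents both a risk (wasteful spending) and an opportunity (for tools that close the measurement gap).*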
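The table can be read in two ways: by relative growth and by the share of enterprises still without a formal process. The short sketch below computes both from the table's figures; the helper names are hypothetical, and the numbers are the article's own estimates and projections rather than independently verified data.

```python
# (label, revenue in billions of USD, % of enterprises with a formal AI ROI process)
rows = [
    ("2023", 62.5, 8),
    ("2024", 85.0, 12),
    ("2025 (est.)", 110.0, 18),
    ("2027 (proj.)", 150.0, 30),
]


def growth_multiple(first: float, last: float) -> float:
    """How many times larger the last value is than the first."""
    return last / first


def share_without_process(share_with: float) -> float:
    """Percentage of enterprises lacking a formal ROI process."""
    return 100 - share_with


revenue_multiple = growth_multiple(rows[0][1], rows[-1][1])
adoption_multiple = growth_multiple(rows[0][2], rows[-1][2])
unmeasured = [(label, share_without_process(share)) for label, _, share in rows]

print(f"Revenue multiple 2023 to 2027: {revenue_multiple:.2f}x")
print(f"Adoption multiple 2023 to 2027: {adoption_multiple:.2f}x")
for label, share in unmeasured:
    print(f"{label}: {share}% without a formal process")
```

Relative growth and absolute coverage tell different stories here, so it helps to report both when citing these figures.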
Risks, Limitations & Open Questions
Despite its promise, the 'Earned vs. Burned' skill has significant limitations:
1. Subjectivity of 'Earned' Metrics: The skill relies entirely on user-defined metrics, which can be gamed. A team might inflate 'earned' values (e.g., counting trivial tasks as high-value) to justify continued AI use. Without external validation, the ledger can become a self-serving document.
2. Ignoring Indirect Costs and Benefits: The framework only captures direct, measurable costs and benefits. It does not account for intangible factors like employee morale (AI reducing tedious work) or brand reputation (AI customer service failures). A narrow focus on net value could lead to underinvestment in transformative but hard-to-measure AI applications.
3. Lack of Standardization: Without industry-wide standards for what constitutes 'earned' value, comparisons across teams or companies are meaningless. A 'resolved ticket' for one company might be a simple password reset; for another, it could be a complex technical support issue. This limits benchmarking.
4. Potential for Misuse: Managers could use the skill to justify layoffs by showing that AI 'earns' more than human workers on a per-task basis, ignoring the broader context of human judgment, creativity, and oversight.
Ethical concerns also arise. If AI systems are optimized solely for 'earned vs. burned' metrics, they may be tuned to maximize short-term, quantifiable outputs at the expense of quality, safety, or long-term value. For example, a content-generation AI might be incentivized to produce high volumes of low-quality articles that generate ad revenue (earned) while ignoring the reputational damage (unmeasured).
AINews Verdict & Predictions
The 'Earned vs. Burned' skill is a deceptively simple but profoundly important innovation. It is not a technical breakthrough but a conceptual one: it forces the AI industry to confront the question of value. For too long, the conversation has been dominated by what AI *can* do, not what it *should* do for a business. This skill flips the script.
Our predictions:
1. Within 12 months, every major AI model provider will offer a similar ROI-tracking feature, either natively or through partnerships. Anthropic's first-mover advantage here is real but temporary.
2. The 'Earned vs. Burned' framework will become a standard chapter in AI procurement contracts. Enterprises will demand that vendors provide a value ledger as part of their service-level agreements.
3. A new category of AI Value Management (AVM) software will emerge, combining usage tracking, cost allocation, and outcome measurement. Startups in this space will attract significant venture capital—expect at least $500 million in funding to flow into AVM tools by 2026.
4. The biggest losers will be AI applications that cannot demonstrate clear, positive net value. Chatbots for low-stakes tasks (e.g., simple FAQ bots) will be replaced by more targeted, high-value AI agents. The era of 'AI for AI's sake' is ending.
Our verdict: The 'Earned vs. Burned' skill is a wake-up call. It tells developers, product managers, and executives that the honeymoon is over. AI must now earn its keep. Those who embrace this mindset will build sustainable, value-driven AI operations. Those who ignore it will find their budgets slashed. The future of AI is not about the biggest model—it's about the best net value.