Technical Deep Dive
The Anthropic study is not just a survey; it's a novel dataset for training 'expectation models.' The technical methodology likely involved a multi-stage process: initial unstructured interviews to discover latent needs, followed by structured surveys to quantify them, and finally, clustering algorithms to identify core archetypes of user desire (e.g., "The Efficiency Maximizer," "The Creative Collaborator," "The Cautious Skeptic"). The real innovation lies in how this data feeds back into model development.
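The clustering step described above is inferred, not documented by Anthropic; as a purely illustrative sketch, a minimal k-means over numerically encoded survey responses shows the shape of such an archetype-discovery pass (the two "desire dimensions" and user vectors here are invented):

```python
# Hypothetical sketch of the archetype-clustering step the article infers.
# Assumes survey answers are already encoded as numeric preference vectors.
import random

def kmeans(points, k, iters=20, seed=0):
    """Tiny 2-D k-means; returns centroids and per-point cluster labels."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # recompute centroid as the mean of its members
                centroids[i] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    labels = [min(range(k),
                  key=lambda c: (p[0] - centroids[c][0]) ** 2
                              + (p[1] - centroids[c][1]) ** 2)
              for p in points]
    return centroids, labels

# Two invented "desire" axes (0-1): efficiency emphasis vs. creativity emphasis.
users = [(0.9, 0.1), (0.85, 0.2), (0.1, 0.9), (0.15, 0.85)]
centroids, labels = kmeans(users, k=2)
```

On this toy data the two efficiency-leaning users and the two creativity-leaning users land in separate clusters, mirroring how archetypes like "The Efficiency Maximizer" would fall out of real response vectors.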
Traditional reinforcement learning from human feedback (RLHF) or Constitutional AI relies on human raters evaluating specific model outputs. This study provides a higher-order signal: it defines the *objectives* for those outputs. For instance, a strong user desire for "AI that understands my unique context" directly informs research into persistent memory and user-state modeling. This moves beyond simple chat history recall toward architectures that maintain a dynamic, updatable user profile—a technical challenge that companies like Google (with its 'Gemini memory' research) and startups like MemGPT are actively tackling. The MemGPT GitHub repository (github.com/cpacker/MemGPT), which implements a virtual context management system for LLMs to function like operating systems with long-term memory, has seen rapid adoption, amassing over 13,000 stars as developers seek to bridge this exact gap.
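To make the "dynamic, updatable user profile" idea concrete, here is a deliberately simplified sketch of a persistent user-state layer; this is not Anthropic's or MemGPT's actual API, and real systems would use embedding-based retrieval rather than the keyword overlap shown here:

```python
# Illustrative user-state memory: store facts once, retrieve them later by
# crude keyword overlap. Names and scoring are assumptions for illustration.
from dataclasses import dataclass, field

@dataclass
class UserMemory:
    # Each fact is stored as (original text, set of lowercase tokens).
    facts: list = field(default_factory=list)

    def remember(self, text: str) -> None:
        self.facts.append((text, set(text.lower().split())))

    def recall(self, query: str, top_k: int = 2) -> list:
        """Return the top_k stored facts sharing the most tokens with query."""
        q = set(query.lower().split())
        scored = sorted(self.facts, key=lambda f: len(q & f[1]), reverse=True)
        return [text for text, _ in scored[:top_k]]

mem = UserMemory()
mem.remember("prefers vegetarian restaurants")
mem.remember("works remotely from Lisbon")
hits = mem.recall("recommend restaurants")
```

The point is architectural, not algorithmic: the profile persists across sessions and is consulted per request, rather than being reconstructed from a finite chat-history window.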
Another clear technical directive is the move toward agentic frameworks. The demand for AI that can "take care of things for me" necessitates moving from a single, monolithic model to an orchestrated system of specialized tools. Frameworks like AutoGPT, LangChain, and CrewAI are early responses to this, but the study suggests users want this complexity hidden. The future is 'invisible agency'—systems that decompose high-level user intent ("Plan my vacation") into a sequence of actions across apps and services without requiring step-by-step prompting. This requires breakthroughs in reliable planning, tool use, and verification.
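The plan-then-execute loop behind 'invisible agency' can be sketched in a few lines. This is a toy under stated assumptions: `plan()` stands in for an LLM planner, the tool names and registry are invented, and the only "verification" is failing closed on unknown steps:

```python
# Toy agent loop: decompose intent -> execute tools -> collect results.
# plan(), TOOLS, and the step names are hypothetical stand-ins.

def plan(intent: str) -> list:
    # A real system would ask an LLM to decompose the intent.
    canned = {"plan my vacation": ["search_flights", "book_hotel", "build_itinerary"]}
    return canned.get(intent.lower(), [])

TOOLS = {
    "search_flights": lambda: "flights: LIS->BCN",
    "book_hotel": lambda: "hotel: confirmed",
    "build_itinerary": lambda: "itinerary: 3 days",
}

def run_agent(intent: str) -> list:
    results = []
    for step in plan(intent):
        tool = TOOLS.get(step)
        if tool is None:  # verification gate: never execute an unknown step
            raise ValueError("no tool registered for step: " + step)
        results.append(tool())
    return results

outcome = run_agent("Plan my vacation")
```

The hard research problems the article names—reliable planning, tool use, and verification—live precisely in the two stubs here: making `plan()` trustworthy on open-ended intents and making the verification gate richer than a dictionary lookup.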
| User Expectation Cluster | Implied Technical Challenge | Emerging Technical Response |
| :--- | :--- | :--- |
| Personal Context Awareness | Persistent, secure, updatable user state memory beyond limited context windows. | MemGPT architectures, vector database integration, fine-tuning on user data silos. |
| Proactive, Multi-Step Assistance | Robust planning, reliable tool use, and self-verification in open-world environments. | Agent frameworks (CrewAI), LLM-powered OS (Microsoft Copilot runtime), verification models. |
| Transparent & Aligned Reasoning | Moving from black-box responses to auditable reasoning traces and value-weighted decisions. | Chain-of-Thought distillation, Constitutional AI enforcement layers, scalable oversight. |
| Low-Latency, Always-Available | Extreme model optimization for throughput and latency without sacrificing capability. | Mixture-of-Experts (MoE) models (like Mixtral), speculative decoding, model distillation. |
Data Takeaway: The table reveals a fundamental shift from optimizing for static benchmarks (MMLU, HellaSwag) to optimizing for dynamic, user-centric capabilities. The next generation of model evaluation will need to include metrics for personalization fidelity, task completion rate in multi-step workflows, and user trust scores.
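One of the proposed user-centric metrics—task completion rate over multi-step workflows—is simple to pin down. A minimal sketch, assuming each workflow is recorded as a list of per-step success flags and counted as complete only if every step succeeds:

```python
# Task completion rate for multi-step workflows: a workflow counts only
# if all of its steps succeeded (strict all-or-nothing scoring, an assumption).

def task_completion_rate(workflows: list) -> float:
    """Fraction of workflows in which every step succeeded."""
    if not workflows:
        return 0.0
    completed = sum(1 for steps in workflows if all(steps))
    return completed / len(workflows)

runs = [
    [True, True, True],   # completed end-to-end
    [True, False, True],  # failed mid-workflow
    [True, True],         # completed end-to-end
]
rate = task_completion_rate(runs)  # 2 of 3 workflows fully completed
```

Unlike MMLU-style accuracy, this metric penalizes a single dropped step in an otherwise capable run, which matches how users actually experience agentic failures.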
Key Players & Case Studies
The Anthropic study creates a new competitive axis: alignment with democratically sourced human expectation. Companies will now be judged not just on what their AI *can* do, but on how well it fulfills the nuanced needs revealed by this data.
Anthropic/Claude: The study is a direct input into Claude's development. Expect future versions to heavily emphasize Constitutional AI refinements based on the expressed ethical concerns, and features that enable longer, more consistent personal interactions. Claude's characteristically careful and detailed response style may evolve to become more proactively agentic, but within strictly defined user-set boundaries—a direct response to the privacy and control anxieties documented in the research.
OpenAI: OpenAI's strength has been in creating broadly capable, user-delighting products like ChatGPT. The study's findings challenge them to deepen platform integration (a la ChatGPT's upcoming macOS integration) and personalization. OpenAI's "GPT" customization features are a first step, but the study suggests users want this to be seamless and automatic. OpenAI's partnership with Figure AI for humanoid robots also aligns with the expectation of AI moving into physical, everyday assistance.
Google DeepMind: Google's strength in research and its vast ecosystem of products (Search, Workspace, Android) positions it uniquely to build the "invisible assistant" users crave. The study validates Google's strategy of embedding Gemini across its product suite. The key for Google will be unifying the user experience across these touchpoints—creating a coherent, context-aware persona, rather than a disjointed set of AI features.
Emerging Startups: The study is a goldmine for startups targeting specific expectation clusters. Hume AI is directly tackling the emotional intelligence dimension with its empathic large language model (eLLM), measuring vocal tones to better understand user sentiment. Inflection AI (before its pivot) aimed squarely at the companion/co-pilot archetype with Pi. Startups like Rewind AI build personalized, local memory systems, directly addressing the privacy-centric desire for context without cloud exposure.
| Company/Product | Primary Strategy | Alignment with Study Findings | Potential Vulnerability |
| :--- | :--- | :--- | :--- |
| Anthropic Claude | Safety-first, principle-driven development. | High alignment on trust, transparency, and ethical reasoning. | May lag in seamless, proactive multi-tool agency. |
| OpenAI ChatGPT/Copilot | Ubiquitous integration, maximum capability and adoption. | Strong on utility and creative collaboration. | Weaker on persistent personal memory and transparent reasoning. |
| Google Gemini Ecosystem | Embedding AI into existing mass-market services (Search, Android). | Best positioned for "invisible assistant" due to ecosystem depth. | Struggles with unified personality and clear privacy narrative. |
| Meta Llama / Open Source | Democratizing access, enabling community-driven customization. | Excellent for niche personalization and transparency (auditable code). | Lacks the cohesive, polished user experience of integrated products. |
Data Takeaway: No single player is perfectly positioned. The competitive battlefield will fragment: Anthropic and open-source models may win on trust and customization; OpenAI and Google will battle for seamless daily utility; and a new wave of startups will capture verticals (therapy, coding, personal memory) where deep personalization is paramount.
Industry Impact & Market Dynamics
This large-scale demand signal will catalyze a second wave of AI commercialization, moving from a technology-push to a demand-pull market.
Business Model Evolution: The dominant subscription model (ChatGPT Plus, Claude Pro) will be pressured. The study indicates users view AI as a utility—they expect it to be context-aware and helpful, not a separate app they "use." This drives AI toward an "AI-as-a-Service" layer embedded within other software. The business model becomes B2B2C: companies like Salesforce, Adobe, and Microsoft pay for AI capabilities to enhance their own products, which are then offered to users. We will also see the rise of "outcome-based" pricing for AI agents—e.g., a fee per successfully completed customer service ticket or per marketing campaign designed, aligning cost directly with user-defined utility.
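The outcome-based pricing idea reduces to billing only verified successes. A hedged illustration—the task types and per-outcome rates below are invented, not any vendor's actual pricing:

```python
# Outcome-based billing sketch: charge per successful outcome only.
# RATES and the event format are assumptions for illustration.

RATES = {"support_ticket": 0.50, "campaign_design": 25.00}

def invoice(events: list) -> float:
    """Sum fees for (task_type, success) events; failures cost nothing."""
    return sum(RATES[task] for task, ok in events if ok)

month = [
    ("support_ticket", True),
    ("support_ticket", False),  # failed ticket: not billed
    ("campaign_design", True),
]
total = invoice(month)  # only the two successful outcomes are billed
```

The interesting engineering problem is upstream of this arithmetic: defining and auditing what counts as a "successfully completed" outcome in the first place.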
Market Creation: The study explicitly identifies greenfield markets. The desire for AI in personalized education and tutoring will accelerate companies like Khanmigo and create new ones. The expressed need for non-judgmental emotional support and companionship legitimizes and expands the market for tools like Woebot Health, moving them from niche mental health tech toward general wellness.
Investment & Funding Shift: Venture capital will flow away from pure foundational model development (a capital-intensive arena now dominated by tech giants) and toward application-layer companies that solve specific, high-frustration problems identified in the study. Startups that build robust "AI employee" workflows for small businesses, or hyper-personalized health coaches, will attract significant funding. The data also justifies investment in the AI trust and safety stack—explainability tools, audit trails, and bias detection services.
| Market Segment | Pre-Study Focus | Post-Study Implied Growth Area | Projected Catalyst |
| :--- | :--- | :--- | :--- |
| Enterprise AI | Code generation, document summarization. | Department-specific agents (HR, legal, marketing) with deep process knowledge and autonomy. | Integration with SaaS platforms (Salesforce, ServiceNow). |
| Consumer AI | Chatbots, image generation. | Integrated life assistants managing schedules, communications, and personal projects across apps. | Native OS-level integration (Windows Copilot, Apple AI). |
| AI Infrastructure | GPU clouds, model training. | Personalization engines & memory layers, agent orchestration platforms. | Demand for secure, user-owned data pods. |
| AI for Social Good | Limited, project-based. | Scalable, personalized education and mental health first-response tools. | Government and NGO procurement based on efficacy studies. |
Data Takeaway: The market is pivoting from selling AI *capability* to selling AI *outcomes*. The most valuable companies will be those that own the user relationship and context, using AI as the enabling technology, not the primary product.
Risks, Limitations & Open Questions
The study, while invaluable, also surfaces profound risks and unresolved tensions.
The Homogenization Risk: Optimizing AI for the aggregate preferences of 81,000 users risks creating a homogenized, "lowest common denominator" AI that pleases the majority but stifles minority viewpoints, eccentric creativity, or challenging but necessary conversations. If AI becomes a mirror of the average expectation, it could dampen intellectual diversity and serendipity.
The Manipulation Feedback Loop: The more personalized and context-aware AI becomes, the more powerful it is as a persuasion engine. The study notes user anxiety about manipulation, but the very features users demand (understanding their unique psychology, proactively guiding them) are the building blocks of hyper-effective influence. This creates a paradox that current alignment techniques are ill-equipped to solve.
Data Sovereignty & Exclusion: The vision of a deeply personal AI requires vast amounts of intimate data. This could exacerbate the digital divide, creating a class of "AI-rich" individuals with powerful digital twins and an "AI-poor" class with shallow, generic models. It also raises critical questions: Who owns the derived model of a user's personality? Can it be ported between companies, or does it create permanent lock-in?
The Expectation-Ability Chasm: The study might raise expectations faster than technology can deliver. A user's desire for an AI that "understands my complex emotional state and gives perfect advice" is decades away from safe, reliable implementation. The disillusionment from this gap could trigger a significant "AI winter" in public sentiment, even as technology advances.
Open Questions: Can value alignment be scaled to accommodate millions of individual user constitutions, not just one corporate set of principles? How do we technically implement a "right to be forgotten" by an AI that has deeply learned from a user's data? Does the pursuit of an invisible, agentic AI inherently reduce human agency and skill acquisition?
AINews Verdict & Predictions
The Anthropic 81,000-person study is a watershed moment. It marks the end of AI's era of technology-driven exploration and the beginning of its era of human-centric utility. It provides the most coherent roadmap to date for what comes after the large language model.
Our editorial judgment is that this study will have three concrete effects:
1. The Personal Context Engine will be the next major platform battleground. Within 18 months, every major tech company will announce a proprietary, cloud-synced "Personal Context Engine" that sits between the user and various AI models, providing memory, preferences, and history. The fight over the standards and portability of this data will be as consequential as the browser wars or mobile OS battles.
2. A new wave of regulation will be inspired by these documented public anxieties. Legislators will use findings about privacy and manipulation fears to craft laws mandating "AI transparency modes" (a digital explainer for every AI decision) and strict limits on emotional profiling without explicit consent. Anthropic's own Constitutional AI approach will become a de facto regulatory template.
3. The most successful AI product of 2026 will not be a chatbot. It will be an integrated agent that successfully operates across 3-5 common consumer applications (e.g., email, calendar, travel booking, project management) to complete a complex task with minimal human intervention, precisely fulfilling the top-tier expectations revealed in this study. It will be measured by time saved and tasks completed, not by benchmark scores.
What to Watch Next: Monitor how Anthropic's Claude evolves—its next major version will be the first test of this feedback loop. Watch for startups emerging from stealth with a focus on user-controlled data pods. Finally, observe the hiring trends at major AI labs; a surge in roles for behavioral psychologists, UX researchers, and ethicists will confirm that the industry has truly internalized the study's core lesson: the future of AI is not just about more data, but about better understanding the humans who create it.