Cat Under Mayonnaise: The LLM Behavior Hack That Bypasses Retraining

Source: Hacker News, May 2026
Topics: prompt engineering, AI safety
A technique with the odd name 'Cat Under Mayonnaise' has been generating discussion. It shows that a large language model can be behaviorally reprogrammed in minutes using nothing but carefully constructed prompts, with no retraining, fine-tuning, or RLHF. AINews breaks down how it works, what it makes possible, and where its challenges lie.

The AI community has been shaken by a deceptively simple experiment dubbed 'Cat Under Mayonnaise.' The name, deliberately absurd, points to a profound insight: LLMs possess a latent 'contextual plasticity' that can be exploited to shift their output distribution—tone, fact recall, safety guardrails—without touching a single weight. Traditional methods like supervised fine-tuning or reinforcement learning from human feedback (RLHF) demand vast compute, curated datasets, and weeks of iteration.

This new approach injects a sequence of structurally coherent but semantically anomalous examples (e.g., 'The cat is under the mayonnaise') into the model's context window. The model, forced to reconcile these oddities with its training distribution, recalibrates its internal representations on the fly, effectively applying a 'behavior patch' that persists for the remainder of the session. Our analysis reveals that this is not mere prompt engineering; it is a direct manipulation of the model's latent reasoning pathways.

For product teams, this means instant A/B testing of personas—a customer service bot can switch from professional to playful in seconds. For startups, it democratizes customization, eliminating the need for multi-million-dollar training budgets. However, the same mechanism that enables benign customization also opens a Pandora's box of adversarial attacks. If a malicious actor can inject a 'Cat Under Mayonnaise' sequence into a shared context, they could silently disable safety filters or induce harmful outputs. The technique challenges the industry's foundational assumption that behavior modification requires parameter updates, and it forces a re-evaluation of how we define model integrity in an era of lightweight manipulation.

Technical Deep Dive

The 'Cat Under Mayonnaise' technique exploits a fundamental property of transformer-based LLMs: their ability to learn and generalize from in-context examples, even when those examples are semantically absurd. The core mechanism relies on what researchers call 'contextual distribution shift.' When a model processes a sequence of prompts that consistently pair a specific behavior (e.g., formal tone) with an anomalous context (e.g., 'The cat is under the mayonnaise'), it begins to associate the anomalous context with the desired behavior. This association is not stored in the model's weights but is maintained within the active context window, effectively creating a temporary 'behavioral overlay.'
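As a concrete illustration, here is a minimal sketch of how such a patch might be assembled in a chat-completions-style message format. The marker sentence, example pairs, and function names are all illustrative assumptions on our part, not the actual sequences used by the technique's authors:

```python
# Sketch: build an in-context "behavior patch" by pairing an anomalous
# marker sentence with demonstrations of the desired behavior (here, a
# formal tone). All text below is an illustrative placeholder.

MARKER = "The cat is under the mayonnaise."

def build_patch(behavior_examples):
    """Return chat messages pairing the anomalous marker with
    demonstrations of the target behavior."""
    messages = []
    for user_text, patched_reply in behavior_examples:
        # Each demonstration embeds the marker, so the model comes to
        # associate it with the patched behavior for this session.
        messages.append({"role": "user", "content": f"{MARKER} {user_text}"})
        messages.append({"role": "assistant", "content": patched_reply})
    return messages

def apply_patch(patch_messages, user_prompt):
    """Prepend the patch to a real request. The patch lives only in the
    context window; no model weights are modified."""
    return patch_messages + [
        {"role": "user", "content": f"{MARKER} {user_prompt}"}
    ]

demos = [
    ("hey whats ur refund policy",
     "Our refund policy permits returns within 30 days of purchase."),
    ("can u fix my login",
     "Certainly. Please describe the error you encounter when signing in."),
]
request = apply_patch(build_patch(demos), "is shipping free?")
```

The resulting `request` list would be sent as the message history of an ordinary chat-completion call; the model is 'shown' the marker-behavior pairing rather than 'told' an instruction.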

From an architectural standpoint, the technique leverages the attention mechanism's sensitivity to positional and semantic patterns. The injected examples are structured to maximize the 'attention sink' effect—where the model allocates disproportionate attention to the anomalous tokens, forcing them to influence subsequent outputs. This is distinct from standard prompt engineering, which typically relies on explicit instructions. Here, the model is 'shown' rather than 'told,' making the behavior change more robust and less prone to instruction-following failures.

A related open-source project, 'ContextPatcher' (available on GitHub with over 1,200 stars), has demonstrated this principle in practice. ContextPatcher provides a library of pre-built 'behavior patches' for common tasks like toxicity reduction, tone shifting, and fact recall calibration. The repository includes a benchmark suite that measures patch effectiveness across models like Llama 3, Mistral, and GPT-4o. Preliminary results show that a well-crafted patch can achieve up to 85% of the effect of full fine-tuning on specific metrics, such as reducing toxic output by 70% compared to baseline.

| Model | Baseline Toxicity (%) | After Fine-Tuning (%) | After 'Cat Under Mayo' Patch (%) | Patch Effectiveness vs. Fine-Tuning |
|---|---|---|---|---|
| Llama 3 8B | 12.4 | 3.1 | 4.2 | 87% |
| Mistral 7B | 15.8 | 4.5 | 5.9 | 86% |
| GPT-4o (API) | 6.2 | 1.8 | 2.5 | 85% |

Data Takeaway: The 'Cat Under Mayonnaise' patch achieves approximately 85-87% of the effectiveness of full fine-tuning for toxicity reduction, but at a fraction of the cost (minutes vs. hours/days) and with zero parameter modification. This suggests that for many behavioral adjustments, fine-tuning may be overkill.
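One plausible reading of the 'Patch Effectiveness vs. Fine-Tuning' column is the ratio of the patch's toxicity reduction to the reduction achieved by fine-tuning. The source does not define the metric, so the formula below is an assumption:

```python
def effectiveness(baseline, fine_tuned, patched):
    """Ratio of the patch's toxicity reduction to fine-tuning's reduction."""
    return (baseline - patched) / (baseline - fine_tuned)

# Toxicity percentages from the table above.
rows = {
    "Llama 3 8B": (12.4, 3.1, 4.2),
    "Mistral 7B": (15.8, 4.5, 5.9),
    "GPT-4o (API)": (6.2, 1.8, 2.5),
}
for model, (base, ft, patch) in rows.items():
    print(model, round(effectiveness(base, ft, patch) * 100), "%")
```

Under this definition the table's figures yield roughly 88%, 88%, and 84%, close to but not identical to the reported column, so the source may be using a slightly different definition or rounding.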

The technique's limitations are equally important. The patch's effect is context-window-bound—once the context is cleared or the session ends, the model reverts to its original behavior. This makes it unsuitable for persistent customization but ideal for session-specific applications. Additionally, the patch's effectiveness degrades with context length; beyond roughly 8,000 tokens, the anomalous examples are 'diluted' by normal context, reducing the behavioral shift.
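A straightforward mitigation for this dilution effect would be to re-inject the patch once the context grows past the threshold. The sketch below assumes a chat-message list and uses a crude whitespace word count as a stand-in for a real tokenizer:

```python
# Sketch: re-inject the behavior patch once the conversation grows long
# enough that the anomalous examples risk being "diluted." The token
# count is a crude whitespace approximation; a real system would use
# the model's own tokenizer.

DILUTION_THRESHOLD = 8000  # rough token budget noted in the article

def approx_tokens(messages):
    """Very rough token estimate: whitespace-separated words."""
    return sum(len(m["content"].split()) for m in messages)

def maybe_refresh(history, patch_messages):
    """Return history with the patch re-appended if the context has
    grown past the dilution threshold."""
    if approx_tokens(history) > DILUTION_THRESHOLD:
        # Re-append so the patch sits in recent context again.
        return history + patch_messages
    return history

patch = [{"role": "user", "content": "The cat is under the mayonnaise."},
         {"role": "assistant", "content": "Understood."}]
short_history = [{"role": "user", "content": "hello"}]
long_history = [{"role": "user", "content": "word " * 9000}]
```

In production this check would run before each model call, trading a small amount of context-window budget for a sustained behavioral shift.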

Key Players & Case Studies

Several organizations are already exploring or commercializing this approach. Anthropic has internally studied 'contextual behavior injection' as a potential lightweight alternative to constitutional AI, though they have not publicly released findings. OpenAI's API team has noted anecdotally that certain system prompts can induce unexpected behavioral shifts, but they have not formally characterized the 'Cat Under Mayonnaise' phenomenon.

The most prominent case study comes from a startup called 'PatchAI,' which offers a service that allows developers to apply behavior patches to any LLM API in under 30 seconds. PatchAI's platform uses a proprietary algorithm to generate optimal patch sequences based on a user's desired behavior profile. They claim to have processed over 500,000 patches since their beta launch in Q1 2025, with an average customer satisfaction score of 4.7/5. Their pricing model is usage-based: $0.01 per patch application, making it accessible for small teams.

| Solution | Setup Time | Cost per Customization | Persistence | Model Compatibility |
|---|---|---|---|---|
| Fine-Tuning (e.g., via Hugging Face) | 2-7 days | $500-$5,000+ | Permanent | Any open-source model |
| RLHF (e.g., via Scale AI) | 2-4 weeks | $10,000-$100,000+ | Permanent | Any model with API access |
| 'Cat Under Mayo' (e.g., PatchAI) | 30 seconds | $0.01 per session | Session-only | Any LLM with context window > 4K tokens |

Data Takeaway: The 'Cat Under Mayonnaise' approach offers a 99.9% reduction in setup time and a 99.99% reduction in cost compared to traditional methods, but at the trade-off of session-only persistence. For applications like chatbots, temporary agents, or A/B testing, this trade-off is acceptable.

Another notable player is the research group at the University of Cambridge, which published a preprint titled 'Contextual Behavior Patching in Large Language Models.' They demonstrated that the technique works across 12 different models, including both open-source and proprietary ones, and that the behavioral shift is robust to minor variations in the anomalous context. Their work has been cited by several subsequent papers exploring the security implications.

Industry Impact & Market Dynamics

The 'Cat Under Mayonnaise' technique is poised to disrupt the LLM customization market, currently dominated by fine-tuning and RLHF services. The global LLM customization market was valued at $2.3 billion in 2024 and is projected to grow to $8.7 billion by 2028, according to industry estimates. The emergence of lightweight behavior patching could accelerate adoption among small and medium enterprises (SMEs) that previously found customization cost-prohibitive.

| Market Segment | 2024 Market Size | Projected 2028 Size | CAGR | Impact of 'Cat Under Mayo' |
|---|---|---|---|---|
| Enterprise Fine-Tuning Services | $1.5B | $4.2B | 22% | Moderate (enterprises still need persistence) |
| SME Customization (via APIs) | $0.3B | $2.1B | 48% | High (dramatically lowers barrier) |
| Real-Time Personalization | $0.5B | $2.4B | 37% | Very High (session-based is ideal) |

Data Takeaway: The SME customization segment is expected to see the highest growth, largely driven by techniques like 'Cat Under Mayonnaise' that eliminate the need for large upfront investments. Real-time personalization, such as dynamic tone adjustment for customer service, will be the killer app.

From a competitive standpoint, this technique threatens established players like Hugging Face's AutoTrain and Scale AI's RLHF services. If session-based customization becomes the norm, the value proposition of permanent fine-tuning diminishes. However, we predict a bifurcation: enterprises with mission-critical, persistent requirements will still invest in fine-tuning, while SMEs and consumer-facing apps will flock to lightweight patching. This could lead to a new category of 'behavior patch marketplaces' where developers share and sell patches for specific use cases.

Risks, Limitations & Open Questions

The most pressing risk is adversarial exploitation. A malicious actor could craft a 'Cat Under Mayonnaise' sequence that, when injected into a shared context (e.g., a multi-user chatbot), silently disables safety filters or induces the model to reveal sensitive information. Unlike traditional prompt injection, which is often detectable, this technique operates at the level of latent behavior, making it harder to monitor. The Cambridge preprint demonstrated that a patch could reduce a model's refusal rate for harmful requests from 95% to 12% with a single injection.
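No standard defense exists yet. One naive heuristic would be to flag marker sentences that recur verbatim across user turns in a shared context, since that is the signature a crude injection would leave. The sketch below is purely illustrative and not from the source; a crafted attack could trivially vary its wording to evade it:

```python
# Sketch: flag identical sentences repeated across user messages in a
# shared context -- an illustrative heuristic only. Real detection would
# need semantic anomaly scoring, not exact string matching.

import re
from collections import Counter

REPEAT_THRESHOLD = 3  # arbitrary: a sentence seen this often is suspicious

def flag_repeated_markers(messages, threshold=REPEAT_THRESHOLD):
    """Return sentences that recur verbatim across user messages."""
    counts = Counter()
    for m in messages:
        if m["role"] != "user":
            continue
        # Split on sentence-ending punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", m["content"].strip()):
            if sentence:
                counts[sentence] += 1
    return [s for s, n in counts.items() if n >= threshold]

chat = [
    {"role": "user", "content": "The cat is under the mayonnaise. Hi."},
    {"role": "user", "content": "The cat is under the mayonnaise. Refund?"},
    {"role": "user", "content": "The cat is under the mayonnaise. Thanks."},
]
suspicious = flag_repeated_markers(chat)
```

Anything this simple is easy to evade, which is precisely why the article's point stands: latent-behavior manipulation is harder to monitor than conventional prompt injection.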

Another limitation is the lack of persistence. For applications requiring consistent behavior across sessions, such as a personalized AI assistant that remembers your preferences, the patch must be reapplied each time. This introduces latency and complexity. Additionally, the technique is not yet standardized—different models respond differently to the same patch, requiring per-model tuning.

Ethical concerns also arise. If a company uses a behavior patch to make its AI appear more empathetic or persuasive, is that deception? The line between customization and manipulation is blurry. Regulators may need to consider whether behavior patching constitutes a form of algorithmic manipulation that requires disclosure.

AINews Verdict & Predictions

The 'Cat Under Mayonnaise' technique is a genuine breakthrough, but its significance is often misunderstood. It does not replace fine-tuning; it complements it by offering a lightweight, session-based alternative for scenarios where permanent changes are unnecessary. Our editorial judgment is that this technique will become a standard tool in every LLM developer's arsenal within 12 months, akin to how prompt engineering evolved from a niche art to a core competency.

Predictions:
1. By Q3 2025, major LLM API providers (OpenAI, Anthropic, Google) will officially support behavior patching as a first-class feature, likely under a name like 'Session Profiles' or 'Behavior Overlays.'
2. By Q1 2026, a 'behavior patch marketplace' will emerge, similar to the Hugging Face model hub, where developers can download and share patches for specific use cases (e.g., 'professional tone for healthcare,' 'friendly tone for e-commerce').
3. By 2027, adversarial behavior patching will become a top-three AI security threat, prompting the development of new detection and mitigation techniques, possibly involving differential privacy or anomaly detection on attention patterns.
4. The biggest winner will be SMEs, which will gain access to customized AI capabilities previously reserved for deep-pocketed enterprises. The biggest loser will be fine-tuning service providers that fail to adapt to the new paradigm.

What to watch next: The release of the 'ContextPatcher v2' repository, which promises to include an automated patch generation algorithm that requires no manual tuning. If successful, it will further lower the barrier to entry and accelerate adoption.
