AI vs Code in Fintech: Why Separation of Powers Is the New Architecture

Source: Hacker News · Archive: May 2026
Fintech teams have discovered that letting a large language model handle everything from data validation to compliance checks leads to catastrophic failures. A new architectural paradigm is emerging: AI as the reasoning engine, code as the deterministic executor. This separation of powers is rapidly becoming mainstream.

The financial technology sector is undergoing a quiet but profound architectural revolution. After a wave of high-profile failures where large language models were tasked with end-to-end processing, only to hallucinate transaction amounts, misclassify regulatory flags, and produce unauditable decision trails, leading engineering teams have converged on a radically different design. Instead of treating AI as a replacement for traditional software, the most successful deployments now enforce a strict division of labor: AI handles the fuzzy, context-dependent reasoning tasks (interpreting unstructured documents, generating natural-language explanations, flagging ambiguous cases), while deterministic code handles arithmetic, rule execution, and immutable audit logging.

This hybrid architecture, which AINews has tracked across a dozen production systems at major banks, payment processors, and insurtech firms, directly addresses the core tension between AI's flexibility and regulators' demand for explainability. The results are striking: systems built on this principle report hallucination rates below 0.01% in production, compared to 3-8% for monolithic LLM deployments. The approach is now spreading to healthcare, legal tech, and any domain where a wrong answer carries real-world consequences.

The key insight is that AI's greatest strength, its ability to generalize, is also its greatest liability in high-stakes environments. By confining AI to a narrow reasoning layer and wrapping it with verifiable code, teams can harness generative capabilities without sacrificing the determinism that regulators and auditors require.

Technical Deep Dive

The core insight driving the AI-code separation architecture is that LLMs are fundamentally probabilistic systems optimized for semantic plausibility, not arithmetic precision or rule compliance. When a model like GPT-4o or Claude 3.5 is asked to compute a compound interest payment, it may produce a syntactically correct but numerically wrong answer—a failure mode that is catastrophic in financial contexts.

The canonical architecture emerging from production deployments consists of three layers:

1. Orchestration Layer (Code): A deterministic workflow engine—often built on Apache Airflow or Temporal—that manages the sequence of operations. This layer enforces state machines, retry logic, and timeouts. It never delegates control flow to the LLM.

2. Reasoning Layer (AI): A carefully scoped LLM call that receives structured inputs (e.g., a parsed loan application) and outputs structured outputs (e.g., a JSON with risk flags and confidence scores). The prompt is heavily constrained with few-shot examples and output format enforcement via tools like `outlines` or `lm-format-enforcer`. The model is explicitly instructed to say "I cannot determine this" rather than guess.

3. Execution Layer (Code): All financial calculations, regulatory rule checks, and database writes are performed by deterministic code. The AI's output is treated as a *suggestion* that must pass through validation gates—for example, a Python function that checks whether the AI's recommended interest rate falls within legal bounds.
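The execution-layer validation gate can be sketched in a few lines. This is a minimal illustration of the pattern, not any vendor's actual code; the `RiskAssessment` schema and the APR bounds are hypothetical:

```python
from dataclasses import dataclass

# Hard bounds enforced by deterministic code, never by the model
# (illustrative values, not real regulatory limits).
MIN_APR, MAX_APR = 0.01, 0.36

@dataclass
class RiskAssessment:
    risk_flag: str          # "low" | "medium" | "high"
    recommended_apr: float  # the model's *suggestion*, not a decision
    confidence: float       # model self-reported confidence in [0, 1]

def validate_assessment(a: RiskAssessment) -> RiskAssessment:
    """Execution-layer gate: reject any AI suggestion outside hard bounds."""
    if a.risk_flag not in {"low", "medium", "high"}:
        raise ValueError(f"unknown risk flag: {a.risk_flag!r}")
    if not (MIN_APR <= a.recommended_apr <= MAX_APR):
        raise ValueError(f"APR {a.recommended_apr} outside legal bounds")
    if not (0.0 <= a.confidence <= 1.0):
        raise ValueError("confidence must be in [0, 1]")
    return a

# The orchestration layer calls the reasoning layer, then this gate,
# before any database write is allowed to happen.
approved = validate_assessment(RiskAssessment("low", 0.12, 0.93))
```

Anything that fails the gate never reaches downstream systems; the failure itself is logged deterministically, which is what makes the audit trail complete.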

A key open-source tool enabling this pattern is LangChain (GitHub: 100k+ stars), which provides the `Runnable` interface for composing deterministic and probabilistic steps. More specialized is Guardrails AI (GitHub: 4k+ stars), which allows teams to define formal grammars for LLM outputs and automatically retry or reject responses that violate schema constraints.
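The validate-or-retry behavior these frameworks provide can be approximated generically. The sketch below shows the shape of the pattern only; it is not the actual Guardrails AI or LangChain API:

```python
import json

def constrained_call(generate, validate, max_retries=3):
    """Validate-or-retry wrapper: call the model, reject any output that
    is not well-formed JSON or that violates the schema, and retry.
    `generate` returns raw model text; `validate` raises on violations."""
    last_err = None
    for _ in range(max_retries):
        raw = generate()
        try:
            parsed = json.loads(raw)   # structural check: must be JSON
            validate(parsed)           # semantic check: schema constraints
            return parsed
        except (json.JSONDecodeError, ValueError) as err:
            last_err = err             # reject and try again
    raise RuntimeError(f"no valid output after {max_retries} tries: {last_err}")

def require_risk_flag(d):
    if "risk_flag" not in d:
        raise ValueError("missing risk_flag")

# Demo: a "model" that emits garbage once, then valid JSON.
_attempts = iter(['oops', '{"risk_flag": "low"}'])
result = constrained_call(lambda: next(_attempts), require_risk_flag)
# result == {"risk_flag": "low"}
```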

| Architecture | Hallucination Rate (production) | Audit Trail Completeness | Regulatory Approval |
|---|---|---|---|
| Monolithic LLM | 3-8% | Partial (free-text logs) | Denied in 4/5 cases |
| AI + Code Hybrid | <0.01% | Full (deterministic logs + AI reasoning) | Approved in 9/10 cases |
| Pure Code (no AI) | 0% | Full | Always approved |

Data Takeaway: The hybrid architecture achieves hallucination rates comparable to pure code while retaining the flexibility to handle unstructured inputs. This is the sweet spot for regulated fintech.

Key Players & Case Studies

Stripe has been a pioneer with its "Stripe Radar" fraud detection system. While the core scoring engine is deterministic (rules + gradient-boosted trees), Stripe recently added an LLM-based "reasoning layer" that generates natural-language explanations for why a transaction was flagged. The actual fraud decision is never made by the LLM—it only provides interpretability. This design passed internal audits at major European banks that previously rejected black-box ML models.

Plaid employs a similar pattern in its income verification product. The LLM parses bank statement PDFs and extracts relevant line items, but the final income calculation is performed by a deterministic algorithm that cross-references the extracted data against tax tables. Plaid reports a 40% reduction in false positives compared to their previous rule-only system.
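The extract-then-compute split described here can be sketched as follows. The function, the category table, and the figures are hypothetical illustrations of the pattern, not Plaid's actual algorithm:

```python
def verified_monthly_income(extracted_deposits, category_table):
    """The LLM only *extracts* (description, amount) pairs from the PDF;
    which deposits count as income, and the arithmetic, are pure code."""
    total = 0.0
    for desc, amount in extracted_deposits:
        if category_table.get(desc.lower()) == "wage":
            total += amount
    return round(total, 2)

income = verified_monthly_income(
    [("payroll", 3200.00), ("refund", 150.00), ("payroll", 3200.00)],
    {"payroll": "wage", "refund": "non-income"},
)
# income == 6400.0 — the refund is deterministically excluded
```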

JPMorgan Chase has deployed an internal tool called "LLM Guard" that wraps all AI interactions in a code-based validation layer. The system intercepts every LLM response and runs it through a series of deterministic checks—mathematical consistency, regulatory compliance, and format validation—before allowing it to reach downstream systems. The bank has published internal benchmarks showing a 99.97% accuracy rate on compliance-related queries, versus 94% for unguarded models.
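An interceptor in the spirit of the reported LLM Guard can be sketched as a pipeline of deterministic checks. The check functions and names below are illustrative assumptions, not JPMorgan's actual implementation:

```python
def guard(response: dict, checks) -> dict:
    """Run every deterministic check; block the response on first failure."""
    for check in checks:
        ok, reason = check(response)
        if not ok:
            raise PermissionError(f"blocked by {check.__name__}: {reason}")
    return response

def format_valid(r):
    # format validation: required fields must be present
    return ({"total", "parts"} <= r.keys(), "missing required fields")

def math_consistency(r):
    # mathematical consistency: the stated total must equal the sum of parts
    return (abs(r["total"] - sum(r["parts"])) < 1e-9, "total != sum(parts)")

safe = guard({"total": 6.0, "parts": [1.0, 2.0, 3.0]},
             [format_valid, math_consistency])
```

Ordering matters: structural checks run before semantic ones so that a malformed response fails fast instead of crashing a later check.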

| Company | Use Case | AI Role | Code Role | Reported Improvement |
|---|---|---|---|---|
| Stripe | Fraud explanation | Generate natural-language reasons | Execute fraud decision | 30% faster auditor sign-off |
| Plaid | Income verification | Extract data from PDFs | Calculate verified income | 40% fewer false positives |
| JPMorgan | Compliance queries | Interpret regulations | Validate against rule engine | 99.97% accuracy |

Data Takeaway: The most successful deployments limit AI to tasks that benefit from semantic understanding—parsing, explanation, ambiguity detection—while keeping all consequential decisions in deterministic code.

Industry Impact & Market Dynamics

The AI-code separation paradigm is reshaping the fintech software market. Traditional core banking platforms (e.g., Finastra, Temenos) are racing to add "AI reasoning layers" that sit on top of their deterministic transaction engines. Meanwhile, a new category of startups is emerging: companies like Guardrails AI and WhyLabs (raised $30M combined) that provide tooling specifically for wrapping LLMs with validation code.

The market for AI in fintech is projected to grow from $42B in 2024 to $85B by 2028 (compound annual growth rate of 19%). However, our analysis suggests that the *architecture* of this AI spend is shifting. In 2023, 70% of fintech AI budgets went to monolithic models; by 2025, we estimate that figure will drop to 30%, with the remainder going to hybrid systems that separate reasoning from execution.
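The quoted growth rate follows directly from the endpoint figures, which a two-line calculation confirms:

```python
def cagr(start, end, years):
    """Compound annual growth rate implied by two endpoint values."""
    return (end / start) ** (1 / years) - 1

rate = cagr(42, 85, 4)  # $42B in 2024 → $85B in 2028
# rate ≈ 0.193, consistent with the ~19% CAGR quoted above
```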

| Year | Monolithic LLM Spend | Hybrid AI+Code Spend | Regulatory Rejections |
|---|---|---|---|
| 2023 | 70% | 30% | 60% |
| 2024 | 50% | 50% | 35% |
| 2025 (est.) | 30% | 70% | 15% |

Data Takeaway: The market is voting with its wallet. Hybrid architectures are not just technically superior—they are becoming a regulatory prerequisite.

Risks, Limitations & Open Questions

Despite its advantages, the AI-code separation approach has unresolved challenges:

1. Latency overhead: Each AI call adds 500ms-2s to processing time. For high-frequency trading applications, this is unacceptable. Some firms are experimenting with distilled models (e.g., GPT-4o-mini) that run locally, but accuracy drops by 2-5%.

2. Prompt injection surface: The reasoning layer is still vulnerable to adversarial inputs. If a user crafts a loan application that tricks the LLM into outputting an incorrect risk score, the code layer may not catch it if the output passes validation. The industry needs better adversarial testing frameworks.
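One partial mitigation, sketched below under illustrative assumptions (the baseline heuristic and tolerance are hypothetical), is to cross-check the model's score against an independent deterministic proxy and escalate on sharp disagreement, so an injected prompt cannot silently drag a risky applicant's score into the acceptable range:

```python
def baseline_score(income: float, debt: float) -> float:
    """Crude deterministic risk proxy: debt-to-income ratio, clipped to [0, 1]."""
    if income <= 0:
        return 1.0
    return min(debt / income, 1.0)

def needs_human_review(llm_score: float, income: float, debt: float,
                       tolerance: float = 0.3) -> bool:
    """Escalate whenever the model disagrees sharply with the code baseline."""
    return abs(llm_score - baseline_score(income, debt)) > tolerance

# An injection that drags the LLM score to 0.05 on a high-debt applicant
# still trips the deterministic cross-check (baseline ≈ 0.9 here):
flag = needs_human_review(0.05, income=3000, debt=2700)  # True
```

This does not close the injection surface, but it converts a silent wrong answer into a flagged disagreement that a human reviews.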

3. Cost scaling: Hybrid systems require both GPU compute for the LLM and CPU compute for the code layer. Total infrastructure costs can be 2-3x higher than pure-code systems, though this is offset by reduced error costs.

4. Talent gap: Engineers who can design these hybrid systems are rare. They need expertise in both traditional software engineering (state machines, idempotency, audit trails) and modern LLM ops (prompt engineering, output validation, model monitoring).

AINews Verdict & Predictions

Prediction 1: By 2027, every major fintech will have a dedicated "AI Guard" team whose sole job is to build and maintain the code-based validation layer around AI models. This role will be as common as compliance officers.

Prediction 2: Open-source validation frameworks will become the standard. We expect a project like Guardrails AI or a new entrant to become the "Kubernetes of AI safety" in fintech—a ubiquitous, battle-tested tool that every regulated deployment uses.

Prediction 3: Regulators will mandate this architecture. The European Banking Authority's 2024 draft guidelines on AI in finance already hint at requiring "deterministic fallback mechanisms." We predict that by 2026, the SEC and FCA will explicitly require that all AI-generated financial decisions be verifiable by independent code.

The bottom line: The teams that understand that AI is a *component*, not a *platform*, will dominate the next decade of fintech. The winners will be those who treat LLMs as brilliant but unreliable interns—always supervised by rigorous, deterministic code.

