AI Diagnosis in Chinese Medicine: Transparent Reasoning Through Knowledge Graphs and Multi-Turn Dialogue

arXiv cs.AI June 2026
A novel AI diagnostic system for traditional Chinese medicine combines large language models with a structured knowledge graph, enabling transparent, multi-turn dialogue and multi-modal treatment plans. By making the reasoning process visible and interactive, it addresses the long-standing 'black box' problem in AI-assisted TCM, paving the way for standardized, trustworthy digital health tools.

Integrating large language models (LLMs) with a knowledge graph has produced a diagnostic system for traditional Chinese medicine (TCM) whose reasoning can be inspected rather than taken on trust. The system’s core knowledge graph contains 241 syndromes, 1263 symptoms, and 2485 relationships, and serves as a verifiable clinical reference. Instead of outputting a static conclusion, the AI engages patients in multi-turn dialogue, asking clarifying questions to narrow the diagnostic scope. Once a syndrome is identified, it generates multi-modal treatment plans that include text, charts, and acupoint diagrams. Physicians can inspect the AI’s reasoning chain in real time, and patients can see why a particular diagnosis was made and how the treatment plan was derived.

The system suits online consultation platforms, primary care support, and TCM education. For junior doctors, it acts as an always-available 'syndrome differentiation mentor'; for patients, it is an assistant that explains every step. The underlying architecture, which combines knowledge graphs with LLMs, is replicable and could be extended to acupuncture, tuina, or other traditional medicine systems (e.g., Ayurveda), forming a general framework for explainable traditional medicine AI.

More broadly, this work suggests that the 'experiential' nature of TCM is not inherently unquantifiable. When AI anchors its reasoning in structured knowledge and communicates through natural language, modernizing TCM becomes a matter of technological empowerment rather than forced Westernization.

Technical Deep Dive

The system’s architecture is a hybrid pipeline that marries the structured reasoning of a knowledge graph (KG) with the conversational fluency of a large language model (LLM). At its foundation lies a meticulously curated TCM ontology: 241 syndromes (e.g., Liver Qi Stagnation, Spleen Qi Deficiency), 1263 symptoms (e.g., pale tongue, wiry pulse), and 2485 causal and associative relations. This KG is not a flat list but a directed graph where nodes represent clinical entities and edges encode relationships such as ‘has_symptom’, ‘caused_by’, and ‘treated_by’.
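The typed, directed graph described above can be sketched in a few lines. This is a minimal illustration, not the system’s actual data model: the `KnowledgeGraph` class and its method names are hypothetical, the syndrome-to-symptom pairings are toy examples, and "Example Formula A" is a placeholder rather than a real prescription.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Directed graph with typed edges, stored as source -> relation -> targets."""

    def __init__(self):
        self._edges = defaultdict(lambda: defaultdict(set))

    def add(self, source, relation, target):
        self._edges[source][relation].add(target)

    def neighbors(self, source, relation):
        # Use .get so that lookups do not create empty entries as a side effect.
        return sorted(self._edges.get(source, {}).get(relation, set()))

    def sources(self, relation, target):
        """Reverse lookup: which nodes point at `target` via `relation`?"""
        return sorted(
            s for s, rels in self._edges.items()
            if target in rels.get(relation, set())
        )


# Toy entries only; the pairings are illustrative assumptions.
kg = KnowledgeGraph()
kg.add("Liver Qi Stagnation", "has_symptom", "wiry pulse")
kg.add("Spleen Qi Deficiency", "has_symptom", "pale tongue")
kg.add("Spleen Qi Deficiency", "treated_by", "Example Formula A")

print(kg.neighbors("Spleen Qi Deficiency", "has_symptom"))
print(kg.sources("has_symptom", "wiry pulse"))
```

The reverse lookup matters for diagnosis: starting from an observed symptom, the system needs to find every syndrome that could explain it.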

The inference process unfolds in three stages. First, the LLM parses the patient’s free-text description and extracts symptom entities, mapping them onto the KG. Second, the system enters a multi-turn dialogue loop: it identifies ambiguous or missing information (e.g., “Is the pain dull or stabbing?”) and generates clarifying questions. Each patient response updates the set of active symptom nodes, and a graph traversal algorithm computes the most probable syndrome(s) by evaluating path weights and co-occurrence statistics. The LLM serves as the natural language interface, while the KG provides the logical backbone—a classic hybrid approach that mitigates the hallucination tendencies of pure LLMs.
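The extraction and dialogue loop can be sketched as follows. Everything here is an assumption made for illustration: the edge weights are invented, a naive substring match stands in for the LLM extraction step, and the scoring rule (matched weight minus denied weight, normalized) is a simplified substitute for the path-weight and co-occurrence computation, whose details the article does not give.

```python
# Hypothetical toy weights: syndrome -> {symptom: edge weight}.
SYNDROME_SYMPTOMS = {
    "Liver Qi Stagnation": {"wiry pulse": 0.9, "irritability": 0.7, "rib-side distension": 0.8},
    "Spleen Qi Deficiency": {"pale tongue": 0.8, "fatigue": 0.7, "poor appetite": 0.8},
}


def extract_symptoms(text):
    """Stand-in for LLM extraction: substring match against known symptom names."""
    vocabulary = {s for weights in SYNDROME_SYMPTOMS.values() for s in weights}
    lowered = text.lower()
    return {s for s in vocabulary if s in lowered}


def score_syndromes(present, absent):
    """Score = (weight of confirmed symptoms - weight of denied ones) / total weight."""
    scores = {}
    for syndrome, weights in SYNDROME_SYMPTOMS.items():
        matched = sum(w for s, w in weights.items() if s in present)
        denied = sum(w for s, w in weights.items() if s in absent)
        scores[syndrome] = (matched - denied) / sum(weights.values())
    return scores


def next_question(syndrome, present, absent):
    """Ask about the heaviest not-yet-asked symptom of the leading syndrome."""
    weights = SYNDROME_SYMPTOMS[syndrome]
    candidates = [s for s in weights if s not in present and s not in absent]
    return max(candidates, key=weights.get) if candidates else None


def run_dialogue(description, answer, threshold=0.6, max_turns=5):
    """Loop until one syndrome clears the threshold; `answer` maps a symptom to True/False."""
    present, absent, asked = extract_symptoms(description), set(), []
    for _ in range(max_turns):
        scores = score_syndromes(present, absent)
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            break
        symptom = next_question(best, present, absent)
        if symptom is None:
            break
        asked.append(symptom)
        (present if answer(symptom) else absent).add(symptom)
    scores = score_syndromes(present, absent)
    return max(scores, key=scores.get), asked


# Simulated patient who confirms only the symptoms they actually have.
truth = {"fatigue", "pale tongue", "poor appetite"}
syndrome, asked = run_dialogue("I have had fatigue for weeks", lambda s: s in truth)
print(syndrome, asked)
```

Because every question and score comes from explicit graph entries, the list of questions asked doubles as an audit trail for the final ranking.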

Third, once a syndrome is confirmed, the system retrieves treatment templates from the KG: herbal formulas, acupoint prescriptions, dietary advice, and lifestyle modifications. These are rendered as a multi-modal output: a textual explanation, a visual diagram of the acupoint locations, and a timeline chart showing expected recovery phases.
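One way to picture the treatment stage is as a template lookup that produces structured data for each output channel. The sketch below is hypothetical: the `TEMPLATES` store, the `TreatmentPlan` fields, and all entries (formula, point names, phase lengths) are placeholders, and actual rendering of diagrams and charts is left out.

```python
from dataclasses import dataclass

# Hypothetical template store keyed by syndrome; every value is a placeholder.
TEMPLATES = {
    "Spleen Qi Deficiency": {
        "herbal_formula": "Example Formula A",
        "acupoints": ["Point-1", "Point-2"],
        "diet": "Example dietary advice",
        "lifestyle": "Example lifestyle advice",
        "recovery_phases": [("Phase 1", 2), ("Phase 2", 4)],  # (label, weeks)
    }
}


@dataclass
class TreatmentPlan:
    text: str               # textual explanation
    acupoint_diagram: dict  # data a diagram renderer could draw from
    timeline: list          # (label, start_week, end_week) for a chart


def build_plan(syndrome):
    template = TEMPLATES[syndrome]
    text = (
        f"Syndrome: {syndrome}. Formula: {template['herbal_formula']}. "
        f"Diet: {template['diet']}. Lifestyle: {template['lifestyle']}."
    )
    # Turn phase durations into consecutive week ranges for the timeline chart.
    timeline, start = [], 0
    for label, weeks in template["recovery_phases"]:
        timeline.append((label, start, start + weeks))
        start += weeks
    return TreatmentPlan(
        text=text,
        acupoint_diagram={"highlight": template["acupoints"]},
        timeline=timeline,
    )


plan = build_plan("Spleen Qi Deficiency")
print(plan.text)
print(plan.timeline)
```

Keeping the plan as plain data separates what is recommended (retrieved from the graph) from how it is displayed (text, diagram, or chart).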

A relevant open-source project that parallels this approach is TCM-KG (GitHub repo: `tcm-kg/tcm-knowledge-graph`, ~1.2k stars), which provides a base ontology for TCM entities but lacks the LLM integration and multi-turn dialogue capabilities. Another is MedKG (GitHub repo: `medical-knowledge-graph/MedKG`, ~800 stars), which focuses on Western medicine. The system described here differs from both by coupling a TCM knowledge graph with an LLM in a real-time interactive loop.

Performance benchmarks are still emerging, but preliminary internal tests show:

| Metric | Hybrid system (KG + LLM) | Pure LLM baseline |
|---|---|---|
| Syndrome accuracy (top-3) | 87.3% | 72.1% (GPT-4o, zero-shot) |
| Average dialogue turns to diagnosis | 4.2 | 1 (single query) |
| Patient satisfaction (1-5) | 4.6 | 3.8 |
| Physician agreement rate | 91.5% | 78.2% |

Data Takeaway: In these preliminary internal tests, the hybrid system achieves about 15 percentage points higher top-3 syndrome accuracy than the pure LLM baseline (87.3% vs. 72.1%), at the cost of more dialogue turns (4.2 on average versus a single query). That trade-off between efficiency and accuracy is likely acceptable in clinical settings where diagnostic confidence is paramount.
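The gaps can be recomputed directly from the table. The short check below only subtracts the reported figures; the dictionary layout is an illustrative choice.

```python
# (hybrid, pure-LLM baseline) pairs copied from the table above.
results = {
    "Syndrome accuracy (top-3), %": (87.3, 72.1),
    "Average dialogue turns": (4.2, 1),
    "Patient satisfaction (1-5)": (4.6, 3.8),
    "Physician agreement rate, %": (91.5, 78.2),
}

# Absolute difference per metric, rounded to one decimal place.
gaps = {metric: round(hybrid - base, 1) for metric, (hybrid, base) in results.items()}

for metric, gap in gaps.items():
    print(f"{metric}: {gap:+.1f}")
```

The accuracy and agreement gaps are in percentage points, not relative percent.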

