How Rust and WASM Are Breaking Korea's Document Monopoly with the rhwp Project

Source: GitHub · April 2026
Stars: 1,341 (+264 in the past day)
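rhwp, an HWP viewer and editor built on Rust and WebAssembly, is mounting a pivotal challenge to Korea's long-standing dependence on a single document format. Developer Edward Kim's project applies modern systems programming and web standards to offer, for the first time, a viable path to true cross-platform compatibility.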

The GitHub repository `edwardkim/rhwp` represents a significant technical and cultural intervention in the world of document processing. HWP, the proprietary format of Hancom's Hangul Word Processor, has dominated South Korea's government, academic, and corporate sectors for decades, creating a persistent platform lock-in that ties users to Windows and specific software. The rhwp project directly confronts this by implementing an HWP parser, viewer, and editor in Rust, compiled to WebAssembly for browser execution. This approach delivers three core advantages: performance and memory safety from Rust, universal accessibility because WASM runs in any modern browser, and liberation from operating system dependencies.

The project's rapid GitHub traction, surpassing 1,300 stars with notable daily growth, signals strong pent-up demand. Its significance extends beyond a simple utility. It challenges the economic and technical assumptions that have allowed a single, closed format to maintain its grip on a national digital workflow. For developers and organizations outside Korea, rhwp provides a clear technical pathway to parse and interact with HWP files without reverse-engineering the format themselves or relying on unstable converters. While still in early development, its pure Rust implementation suggests a foundation for robust, embeddable libraries that could power future document conversion services, archival systems, and cross-platform office suites. The project is not merely building a tool; it is constructing an open bridge into a previously walled garden of Korean digital content.

Technical Deep Dive

The rhwp project's architecture applies modern systems programming to a legacy format problem. At its core, it is a pure Rust implementation of the HWP binary format specification. Rust brings more than trendiness: zero-cost abstractions, fearless concurrency, and compile-time memory safety. These guarantees matter when parsing complex, potentially malformed binary files, because memory-safety bugs in parsers are a common attack vector.
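As an illustration of the memory-safety point, here is a minimal sketch of bounds-checked record parsing in safe Rust. It is not taken from rhwp: the names `RecordHeader`, `ParseError`, and `parse_record_header` are hypothetical, and the bit layout (10-bit tag, 10-bit level, 12-bit size, with `0xFFF` signalling an extra 32-bit size field) is an assumption based on Hancom's published HWP 5.0 format description.

```rust
/// Hypothetical decoded record header (illustrative names, not rhwp's API).
#[derive(Debug, PartialEq)]
pub struct RecordHeader {
    pub tag_id: u16,
    pub level: u16,
    pub size: u32,
    /// Bytes occupied by the header itself (4, or 8 with the extended size).
    pub header_len: usize,
}

#[derive(Debug, PartialEq)]
pub enum ParseError {
    /// The input ended before the header or the declared payload was complete.
    Truncated,
}

/// Read a little-endian u32 at `offset`, returning an error instead of
/// panicking or reading out of bounds when the input is too short.
fn read_u32_le(data: &[u8], offset: usize) -> Result<u32, ParseError> {
    let end = offset.checked_add(4).ok_or(ParseError::Truncated)?;
    let b = data.get(offset..end).ok_or(ParseError::Truncated)?;
    Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Decode one record header under the assumed layout and verify that the
/// declared payload actually fits in the remaining input.
pub fn parse_record_header(data: &[u8]) -> Result<RecordHeader, ParseError> {
    let word = read_u32_le(data, 0)?;
    let tag_id = (word & 0x3FF) as u16; // bits 0..=9
    let level = ((word >> 10) & 0x3FF) as u16; // bits 10..=19
    let mut size = word >> 20; // bits 20..=31
    let mut header_len = 4;
    if size == 0xFFF {
        // Assumed escape value: the real size follows as a separate u32.
        size = read_u32_le(data, 4)?;
        header_len = 8;
    }
    // A malformed file may claim more payload than exists; reject it here.
    if size as usize > data.len() - header_len {
        return Err(ParseError::Truncated);
    }
    Ok(RecordHeader { tag_id, level, size, header_len })
}
```

Every slice access goes through `get`, so a truncated or hostile file produces an `Err` rather than an out-of-bounds read.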

The technical stack follows a layered approach:
1. Core Parser (`rhwp-core`): A low-level library that reads the HWP file structure, including its OLE (Object Linking and Embedding) Compound File binary container, streams, and sectors. This layer decodes the file's internal directory and extracts compressed text, paragraph, and style information.
2. Model Layer: Constructs an in-memory, structured representation of the document—paragraphs, characters, sections, and embedded objects—transforming the raw binary data into a manipulable data model.
3. Rendering Engine: This component, targeting both native and web, takes the document model and calculates layout, font metrics, and positioning. For the web target, this logic is compiled alongside the core library into WebAssembly.
4. WASM Bindings & Frontend: Using `wasm-bindgen`, the Rust functions are exposed to JavaScript. The provided web demo uses a lightweight frontend (potentially Vanilla JS or a minimal framework) to orchestrate file uploads, call into the WASM module for parsing and rendering, and display the results on an HTML5 canvas or via DOM manipulation.
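The model layer described above can be pictured with a small sketch. The types below (`Document`, `Section`, `Block`, `CharRun`) are hypothetical stand-ins chosen for illustration, not rhwp's actual data structures; the point is only that once the binary data has been lifted into a structured model, downstream consumers such as a renderer or a text extractor can walk it without touching the file format.

```rust
/// A run of characters sharing the same formatting (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct CharRun {
    pub text: String,
    pub bold: bool,
}

/// Block-level content inside a section (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub enum Block {
    Paragraph(Vec<CharRun>),
    /// Rows of cell text; real tables would carry spans and nested blocks.
    Table(Vec<Vec<String>>),
    Image { alt: String },
}

pub struct Section {
    pub blocks: Vec<Block>,
}

pub struct Document {
    pub sections: Vec<Section>,
}

impl Document {
    /// Walk the model and flatten it to plain text, one line per paragraph
    /// or table row. Formatting such as `bold` is deliberately ignored.
    pub fn to_plain_text(&self) -> String {
        let mut lines: Vec<String> = Vec::new();
        for section in &self.sections {
            for block in &section.blocks {
                match block {
                    Block::Paragraph(runs) => {
                        lines.push(runs.iter().map(|r| r.text.as_str()).collect::<String>());
                    }
                    Block::Table(rows) => {
                        for row in rows {
                            lines.push(row.join("\t"));
                        }
                    }
                    Block::Image { alt } => lines.push(format!("[image: {}]", alt)),
                }
            }
        }
        lines.join("\n")
    }
}
```

A rendering engine would consume the same model but keep the formatting fields, which is why separating parsing from the model pays off.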

The use of WebAssembly is particularly ingenious. It allows the computationally intensive parsing and layout work to run at near-native speed within the browser's sandbox, bypassing JavaScript performance bottlenecks. This creates a user experience where a complex HWP document can be viewed without any server-side processing, enhancing privacy and reducing latency.
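A minimal sketch of what a browser-facing export might look like follows. The function names `looks_like_compound_file` and `sector_size` are hypothetical and not rhwp's API. The `cfg` guards are one common way to keep the same source compiling natively (for tests) and for `wasm32` (where the `wasm-bindgen` crate would be a dependency). The 8-byte signature and the sector-shift field at byte offset 30 come from the Compound File Binary format that OLE containers use.

```rust
// Only pull in wasm-bindgen when building for the browser; native builds
// compile this file without the dependency.
#[cfg(target_arch = "wasm32")]
use wasm_bindgen::prelude::*;

/// Signature found at the start of every OLE Compound File.
pub const CFB_MAGIC: [u8; 8] = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1];

/// Cheap pre-check a page could run on an uploaded file before handing it
/// to the full parser: is the 512-byte header present with the right magic?
#[cfg_attr(target_arch = "wasm32", wasm_bindgen)]
pub fn looks_like_compound_file(bytes: &[u8]) -> bool {
    bytes.len() >= 512 && bytes.starts_with(&CFB_MAGIC)
}

/// Sector size in bytes, read from the header's sector-shift field.
/// Returns `None` for non-compound files or unexpected shift values.
#[cfg_attr(target_arch = "wasm32", wasm_bindgen)]
pub fn sector_size(bytes: &[u8]) -> Option<u32> {
    if !looks_like_compound_file(bytes) {
        return None;
    }
    // Indexing is safe here: the length check above guarantees 512 bytes.
    let shift = u16::from_le_bytes([bytes[30], bytes[31]]);
    match shift {
        9 | 12 => Some(1u32 << shift), // 512-byte or 4096-byte sectors
        _ => None,
    }
}
```

When built with wasm-bindgen for the `wasm32` target, a `&[u8]` parameter is exposed to JavaScript as a `Uint8Array`, so the bytes of a user-selected file can be passed in directly and never leave the browser.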

A key challenge the project must overcome is the sheer complexity of the HWP format. Unlike open standards like ODF or even the more structured DOCX, HWP is a monolithic binary format with decades of feature accretion. Full support requires implementing:
- Text Layout: Korean's mixed-script composition (Hangul, Hanja, Latin) with complex line-breaking rules.
- Paragraph and Character Styles: A deep hierarchy of formatting properties.
- Page Layout: Headers, footers, margins, and columns.
- Embedded Objects: Tables, images, and equations.
- Legacy Features: Support for older versions of the format.
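To give a feel for the text layout item, here is a deliberately simplified greedy line breaker for mixed Hangul, Hanja, and Latin text. It is a sketch under stated assumptions, not rhwp's algorithm: wide characters are assumed to occupy two cells and to allow a break on either side, Latin words break only at spaces, and the punctuation rules and word-level break modes that real Korean typesetting needs are omitted. All function names are hypothetical.

```rust
/// Hangul syllables and CJK unified ideographs are treated as "wide".
fn is_wide(c: char) -> bool {
    matches!(c as u32, 0xAC00..=0xD7A3 | 0x4E00..=0x9FFF)
}

/// Assumed display width: two cells for wide characters, one otherwise.
fn cell_width(c: char) -> usize {
    if is_wide(c) { 2 } else { 1 }
}

/// Split text into unbreakable units: each wide character, each space,
/// and each run of other characters (for example a Latin word).
fn split_units(text: &str) -> Vec<String> {
    let mut units: Vec<String> = Vec::new();
    let mut word = String::new();
    for c in text.chars() {
        if is_wide(c) || c == ' ' {
            if !word.is_empty() {
                units.push(std::mem::take(&mut word));
            }
            units.push(c.to_string());
        } else {
            word.push(c);
        }
    }
    if !word.is_empty() {
        units.push(word);
    }
    units
}

/// Greedily fill lines up to `max_cells`. A unit wider than the limit is
/// placed on its own line rather than split.
pub fn break_lines(text: &str, max_cells: usize) -> Vec<String> {
    let mut lines: Vec<String> = Vec::new();
    let mut line = String::new();
    let mut width = 0usize;
    for unit in split_units(text) {
        let w: usize = unit.chars().map(cell_width).sum();
        if width + w > max_cells && !line.is_empty() {
            lines.push(line.trim_end().to_string());
            line.clear();
            width = 0;
        }
        if line.is_empty() && unit == " " {
            continue; // never start a line with a space
        }
        line.push_str(&unit);
        width += w;
    }
    if !line.trim_end().is_empty() {
        lines.push(line.trim_end().to_string());
    }
    lines
}
```

Even this toy version shows why layout fidelity is hard: the break opportunities depend on the script of each character, and a production engine would measure real font metrics instead of fixed cell widths.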

The project's progress can be benchmarked against the official Hancom Viewer. While Hancom's solution has full fidelity, it is a closed, Windows/macOS-native application.

| Feature | rhwp (WASM) | Hancom Office Viewer | LibreOffice (via external filter) |
|---|---|---|---|
| Platform | Any modern browser | Windows, macOS | Windows, Linux, macOS |
| Installation | Zero (web) / WASM module | Native install | Native install + plugin |
| Fidelity (Est.) | Medium (improving) | High | Low-Medium (unstable) |
| Editing | Basic (goal) | View-only | Limited via import/export |
| Performance | Fast parsing, slower complex render | Fast | Slow, prone to crashes |
| License | Open Source (MIT/Apache) | Proprietary | Open Source (MPL/GPL) |

Data Takeaway: The table reveals rhwp's unique value proposition: browser-native, zero-install access with growing fidelity. It occupies a niche distinct from both the official proprietary viewer and the patchy support in general-purpose open-source suites.

Key Players & Case Studies

The development of rhwp exists within a broader ecosystem of entities grappling with the HWP problem.

Hancom Inc. is the incumbent, whose Hangul Word Processor holds over 70% market share in the Korean word processor sector. Their strategy has historically been one of vertical integration within the Korean market, with deep ties to government procurement and education. Their response to cross-platform demand has been the release of Hancom Office 2024 for iOS/iPadOS and Android, and a web-based "Hancom Office Online," but these remain within their proprietary ecosystem. The existence of rhwp represents a direct, open-source counterpoint to their walled garden.

The Korean Government and Public Institutions are the ultimate decision-makers. Past initiatives, like the 2007 attempt to mandate Open Document Format (ODF), faltered due to compatibility issues and inertia. However, the National Archives of Korea and other bodies have a long-term interest in digital preservation, for which open, well-documented formats are essential. Projects like rhwp could become crucial tools in archival workflows, ensuring HWP documents remain readable decades from now without dependency on a single company's software.

Open Source Communities are the other key player. The `pyhwp` project on GitHub is a Python-based HWP text extractor, but it lacks rendering and editing capabilities. LibreOffice has intermittently worked on HWP support through an external filter, but progress has been slow and unstable. rhwp, with its robust Rust foundation, has the potential to become the reference open-source implementation. Developer Edward Kim (the project lead) has positioned it not as a competitor to these efforts but as a potential core engine that could be integrated elsewhere. For instance, the rhwp parser compiled to a C-compatible library could one day feed into LibreOffice's native import filter, dramatically improving its reliability.

Case Study: Academic Paper Submission. Many Korean universities still require thesis submissions in HWP format. A foreign researcher or journal using Linux-based systems faces a significant barrier. An integration of rhwp's WASM module into a university's submission portal could allow for inline preview and basic validation of uploaded HWP files directly in the browser, solving a real-world cross-platform pain point without forcing the submitter to find a Windows machine.

Industry Impact & Market Dynamics

rhwp's emergence is a symptom of a larger shift: the erosion of proprietary document format dominance by open standards and web-native tooling. Its impact will unfold across several axes.

1. Democratization of Access: The primary impact is breaking the platform lock. Developers worldwide can now build HWP support into their web applications—think cloud storage previews (like Dropbox or Google Drive), document management systems, or e-discovery platforms—without negotiating licenses or deploying fragile conversion servers. This opens the Korean document corpus to global digital workflows.

2. Catalyst for Standardization Pressure: A high-quality open-source implementation raises the public's expectations. It becomes a living reference that questions why certain features are opaque or undocumented. This can increase pressure on Hancom to participate more actively in standardization efforts or to better document its format, benefiting the entire ecosystem, including their own developers.

3. New Business Model Enabler: While rhwp itself is open-source, it enables commercial services around it. Startups could offer:
- High-fidelity, API-based HWP-to-PDF/DOCX conversion services using rhwp as the core engine.
- SaaS platforms for collaborative annotation of HWP documents in mixed-OS teams.
- Plugins for popular web frameworks (React, Vue) for embedding HWP viewers.

The market for document format conversion and compatibility is substantial. Specific data for HWP is scarce, but the global document management systems market is projected to grow from ~$6.5 billion in 2023 to over $11 billion by 2028, which implies a compound annual growth rate of roughly 11%.

| Potential Service | Target Market | rhwp's Role |
|---|---|---|
| Cloud Preview API | Cloud Storage, CMS | Core parsing/rendering engine |
| Batch Conversion Service | Enterprises, Archives | Reliable, auditable conversion pipeline |
| Embedded Viewer SDK | Software Vendors | WASM library for in-app viewing |
| Accessibility Tooling | Government, Education | Text extraction for screen readers |

Data Takeaway: The project's true economic value lies not in direct monetization but in its role as infrastructure that unlocks downstream commercial and institutional use cases previously blocked by technical and legal hurdles.

4. Long-term Preservation: For the National Archives of Korea and similar bodies, proprietary formats are a preservation risk. rhwp, as an open-source implementation of the format, becomes an insurance policy. Its code doubles as documentation of the format, so future generations can recover the content of HWP files even if Hancom as a company ceases to exist.

Risks, Limitations & Open Questions

Despite its promise, rhwp faces substantial hurdles.

Technical Limitations: The project is in its early stages. Full support for HWP's advanced features—complex tables, mathematical equations, dynamic fields, macros, and revision tracking—is a multi-year engineering undertaking. The rendering fidelity, especially for documents using esoteric fonts or layout tricks, will likely lag behind Hancom's official software for a long time. Performance of the WASM module for very large documents (hundreds of pages) in the browser is also an unproven area.

Legal and Specification Risks: HWP is a proprietary format. While reverse-engineering for interoperability is generally protected under laws in many jurisdictions (like the EU's Software Directive), the legal landscape is complex. The project's health depends on a clean-room implementation mindset. Furthermore, Hancom could update the format in ways that are difficult to reverse-engineer, forcing the open-source project into a perpetual game of catch-up.

Adoption and Sustainability Risk: The project relies on the sustained effort of a lead developer and a growing community. If Edward Kim were to step away without a clear maintainer succession plan, momentum could stall. Gaining the trust of conservative institutions (like government agencies) to rely on an open-source tool for critical document viewing will require not just technical maturity but also professional support channels, which are currently absent.

Open Questions:
1. Will Hancom see this as a threat or an opportunity? They could attempt to litigate, ignore it, or—most beneficially—engage by providing official, non-binding documentation to improve interoperability.
2. Can rhwp achieve "good enough" fidelity for 95% of documents? This is the crucial threshold for widespread utility.
3. What is the optimal integration path for larger open-source projects? Should the LibreOffice project adopt rhwp's core as a library, or should rhwp remain a standalone, web-focused tool?
4. How will the project handle security vulnerabilities? As a parser of complex binary data, it will be a target. A robust security disclosure and patching process needs to be established.

AINews Verdict & Predictions

The rhwp project is more than a clever piece of code; it is a strategic breach in the wall surrounding one of the world's last major proprietary document fortresses. Its technical choices—Rust for safety and performance, WASM for universal delivery—are exemplary and position it for long-term success where previous attempts have floundered.

Our editorial verdict is bullish, with cautious medium-term expectations. rhwp will not replace Hancom Office for native Korean users creating complex documents in the next 3-5 years. However, it will absolutely become the de facto standard for cross-platform HWP consumption and processing within the next 18-24 months. Its growth on GitHub is a leading indicator of massive latent demand.

Specific Predictions:
1. Within 12 months: A major international cloud service (for example, the file-preview component of a large cloud provider) will quietly integrate a fork of rhwp's engine to offer HWP previews, marking its first major commercial adoption.
2. By end of 2025: The project will see its first significant corporate sponsorship or grant, likely from a Korean IT service company or a global cloud player looking to solidify its position in the Korean market.
3. The "LibreOffice Integration" will happen, but indirectly: Instead of a full merge, the LibreOffice project will develop a bridge that uses rhwp's core as an external, standalone conversion service, significantly improving its HWP import filter by 2026.
4. Hancom's Response: Hancom will not sue. Instead, they will accelerate development of their own cloud APIs and emphasize services beyond mere file format compatibility, attempting to stay ahead by moving up the value stack.

The key metric to watch is not star count, but the diversity of contributors and the list of projects that declare a dependency on the `rhwp-core` library. When it appears in the dependency tree of a major document processing suite or a commercial SaaS platform, its role as critical open infrastructure will be cemented. Edward Kim's rhwp has lit a fuse; the explosion it triggers will finally connect Korea's digital document history to the open web.
