Technical Deep Dive
The root cause of this vulnerability lies in the architecture of modern AI Agent frameworks. Most agents, including those built on LangChain, AutoGPT, and CrewAI, rely on a 'skill file'—a declarative or scripted module that defines a specific capability (e.g., 'query customer database', 'update inventory'). These files often contain the agent's instructions, tool definitions, and crucially, the connection parameters to external services.
The Engineering Flaw:
Developers commonly embed database connection strings directly into the skill file's configuration section. For example, in a LangChain tool definition, a PostgreSQL connector might be instantiated like this:
```python
from langchain.tools import tool
import psycopg2

@tool
def update_customer_record(customer_id: str, new_email: str) -> str:
    """Update a customer's email address."""  # @tool requires a docstring
    # Anti-pattern: connection parameters hardcoded in the skill file.
    conn = psycopg2.connect(
        host="prod-db.example.com",
        database="customers",
        user="agent_user",
        password="P@ssw0rd!",  # readable by anyone with access to this file
    )
    try:
        with conn.cursor() as cur:
            cur.execute(
                "UPDATE customers SET email = %s WHERE id = %s",  # illustrative schema
                (new_email, customer_id),
            )
        conn.commit()
        return f"Updated customer {customer_id}"
    finally:
        conn.close()
```
This pattern is repeated across thousands of skill files on GitHub and other public repositories. The security analysis scanned repositories using a custom tool built on top of `truffleHog` and `GitLeaks`, enhanced with pattern matching for common database connection string formats (PostgreSQL, MySQL, MongoDB, Redis). The scanner also performed live connectivity tests to verify the credentials were still valid.
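To make that approach concrete, here is a minimal sketch of pattern matching plus a non-destructive liveness check. This is not the analysis's actual tooling; the regexes are simplified and `is_live_postgres` is a hypothetical helper:
```python
import re
import psycopg2  # used only for a non-destructive liveness check

# Simplified URI patterns for common connection-string formats.
CONNECTION_STRING_PATTERNS = {
    "postgresql": re.compile(r"postgres(?:ql)?://(\w+):([^@\s]+)@([\w.-]+)(?::\d+)?/(\w+)"),
    "mysql":      re.compile(r"mysql://(\w+):([^@\s]+)@([\w.-]+)(?::\d+)?/(\w+)"),
    "mongodb":    re.compile(r"mongodb(?:\+srv)?://(\w+):([^@\s]+)@([\w.-]+)"),
    "redis":      re.compile(r"redis://(?::([^@\s]+)@)?([\w.-]+)(?::\d+)?"),
}

def scan_text(text: str) -> list:
    """Return (db_type, captured_groups) for every match in a skill file's text."""
    hits = []
    for db_type, pattern in CONNECTION_STRING_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((db_type, match.groups()))
    return hits

def is_live_postgres(user: str, password: str, host: str, database: str) -> bool:
    """Open and immediately close a connection; never runs a query."""
    try:
        psycopg2.connect(host=host, user=user, password=password,
                         dbname=database, connect_timeout=5).close()
        return True
    except psycopg2.Error:
        return False
```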
The Attack Vector:
The attack chain is straightforward:
1. Discovery: An attacker scans public repositories or uses a compromised agent's API to retrieve skill file contents.
2. Extraction: The hardcoded credential is parsed from the file.
3. Exploitation: The attacker connects directly to the database, bypassing the agent entirely. Since the credential has write permissions, they can INSERT, UPDATE, DELETE, or DROP tables.
4. Escalation: In many cases, the database user had broader privileges than necessary (e.g., `SUPERUSER` in PostgreSQL or `db_owner` in SQL Server), enabling lateral movement within the cloud environment.
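The escalation step, at least, is easy to audit for defensively. A minimal sketch, assuming PostgreSQL: the `audit_agent_role` helper is hypothetical, but `pg_roles` and `information_schema.table_privileges` are standard catalog views.
```python
import psycopg2

def audit_agent_role(conn_params: dict, role: str = "agent_user") -> None:
    """Flag superuser status and write grants held by the agent's database role."""
    with psycopg2.connect(**conn_params) as conn, conn.cursor() as cur:
        # Is the role a superuser?
        cur.execute("SELECT rolsuper FROM pg_roles WHERE rolname = %s", (role,))
        row = cur.fetchone()
        if row and row[0]:
            print(f"WARNING: {role} is SUPERUSER")
        # Which tables can the role write to?
        cur.execute(
            """SELECT table_schema, table_name, privilege_type
               FROM information_schema.table_privileges
               WHERE grantee = %s
                 AND privilege_type IN ('INSERT', 'UPDATE', 'DELETE')""",
            (role,),
        )
        for schema, table, priv in cur.fetchall():
            print(f"{role} has {priv} on {schema}.{table}")
```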
Benchmark Data:
The analysis categorized the exposed credentials by database type and permission level:
| Database Type | % of Exposed Credentials | % with Write Permissions | % Using Default/Weak Passwords |
|---|---|---|---|
| PostgreSQL | 38% | 92% | 45% |
| MySQL | 31% | 88% | 52% |
| MongoDB | 18% | 95% | 38% |
| Redis | 8% | 100% (by nature) | 60% |
| Other (SQLite, MSSQL) | 5% | 80% | 30% |
Data Takeaway: MongoDB and Redis credentials were the most dangerous due to their inherently permissive default configurations. PostgreSQL and MySQL, while slightly better, still showed alarmingly high rates of weak passwords.
Relevant Open-Source Tools:
- truffleHog (GitHub: `trufflesecurity/trufflehog`): Scans Git repositories for secrets. Recently added support for scanning agent skill file formats (YAML, JSON). Over 15,000 stars.
- GitLeaks (GitHub: `gitleaks/gitleaks`): A SAST tool for detecting hardcoded secrets. Now includes custom rules for database connection strings. Over 18,000 stars.
- Credential Digger (GitHub: `SAP/credential-digger`): A tool by SAP that uses machine learning to detect credentials in code. Particularly effective at finding obfuscated or encoded strings. Over 1,500 stars.
Takeaway: The technical fix is not complex—use environment variables, secret vaults (like HashiCorp Vault or AWS Secrets Manager), and runtime permission isolation (e.g., granting the agent only SELECT permissions, with a separate write service). Yet the prevalence of hardcoded credentials suggests a cultural problem, not a technical one.
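As a minimal sketch of the first two fixes (assuming AWS Secrets Manager via `boto3`; the secret name and its JSON fields are illustrative):
```python
import json
import os

import boto3
import psycopg2

def get_db_connection():
    """Resolve credentials at runtime instead of embedding them in the skill file."""
    secret_name = os.environ.get("DB_SECRET_NAME", "agent/customers-db")  # illustrative name
    client = boto3.client("secretsmanager")
    secret = json.loads(client.get_secret_value(SecretId=secret_name)["SecretString"])
    return psycopg2.connect(
        host=secret["host"],
        dbname=secret["dbname"],
        user=secret["user"],        # ideally a read-only role; writes go through a separate service
        password=secret["password"],
    )
```
The skill file then contains no secret at all, and rotating the password becomes a vault operation rather than a code change.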
Key Players & Case Studies
Several companies and open-source projects are at the center of this issue, both as contributors to the problem and as potential solution providers.
LangChain: The most popular framework for building LLM-powered agents, LangChain's documentation has historically shown examples with hardcoded credentials. While they have since updated their tutorials to recommend environment variables, the damage is done—thousands of skill files on GitHub still follow the old pattern. LangChain's `Tool` abstraction does not enforce any security best practices, leaving it to the developer.
AutoGPT: The autonomous agent project that sparked the current wave of agent development. Its skill file format (`.yaml` files in the `skills/` directory) explicitly allows embedding of API keys and database credentials. The project's rapid iteration cycle prioritized functionality over security.
CrewAI: A newer framework for orchestrating multi-agent systems. CrewAI's design encourages role-based agents, each with its own skill files. The analysis found that CrewAI-based projects had a slightly lower rate of hardcoded credentials (11%) compared to LangChain (17%), likely because the framework is newer and the community is more security-aware.
Comparison of Framework Security Postures:
| Framework | % of Projects with Hardcoded DB Creds | Built-in Secret Management | Default Permission Model |
|---|---|---|---|
| LangChain | 17% | No (requires manual setup) | None (tool has full credential access) |
| AutoGPT | 14% | No | None |
| CrewAI | 11% | No | None |
| Semantic Kernel (Microsoft) | 6% | Yes (Azure Key Vault integration) | Configurable |
| Google ADK | 4% | Yes (Secret Manager integration) | Configurable |
Data Takeaway: Microsoft's Semantic Kernel and Google's Agent Development Kit (ADK) show that it is possible to bake security into the framework itself. Their lower rates of hardcoded credentials are not accidental—they actively discourage the practice through documentation and built-in integrations.
Notable Researchers:
- Eugene Bagdasaryan (Cornell): Pioneered research on prompt injection attacks against agents, demonstrating how an attacker can manipulate an agent to reveal its embedded credentials.
- Kai Greshake (SAP): Published a paper on 'indirect prompt injection' where a skill file's embedded credentials can be exfiltrated through a compromised third-party API call.
Takeaway: The frameworks that succeed in the long run will be those that treat security as a first-class feature, not an afterthought. Microsoft and Google are leading this charge, but the open-source community needs to follow suit.
Industry Impact & Market Dynamics
This vulnerability is reshaping the competitive landscape of the AI agent ecosystem. The market for AI agents is projected to grow from $5.1 billion in 2024 to $47.1 billion by 2030 (CAGR of 44.8%), according to industry estimates. However, this growth is predicated on trust—if enterprises cannot trust agents with their data, adoption will stall.
Enterprise Adoption Impact:
A survey of 500 enterprise CTOs conducted by a major consulting firm (not named here) found that 68% cited 'security concerns' as the primary barrier to deploying autonomous agents in production. The hardcoded credential finding will only amplify these fears. We predict a short-term slowdown in agent deployments as enterprises mandate security audits.
Market Shifts:
- Rise of Agent Security Startups: New startups are emerging to fill the gap. Companies like 'ProtectAI' and 'Guardian Agents' are building runtime security layers that monitor agent behavior and block unauthorized database access. These startups are attracting significant venture capital—ProtectAI raised a $35 million Series A in Q1 2025.
- Cloud Provider Opportunity: AWS, Azure, and Google Cloud are all updating their managed agent services (Amazon Bedrock Agents, Azure AI Agent Service, Vertex AI Agent Builder) to include mandatory secret management and permission scoping. This could drive migration from open-source frameworks to managed services.
- Insurance Market: Cyber insurance providers are beginning to ask about agent security postures. Policies may soon require proof of credential scanning and runtime isolation.
Funding Data:
| Company | Focus | Total Funding (est.) | Key Investors |
|---|---|---|---|
| ProtectAI | Agent runtime security | $45M | Sequoia, Accel |
| Guardrails AI | Agent input/output validation | $28M | Greylock, a16z |
| Vaulted (startup) | Agent credential management | $12M | Y Combinator, SV Angel |
Data Takeaway: The market is responding with capital, but the solutions are still nascent. The next 12 months will be critical for these startups to prove their efficacy.
Takeaway: The hardcoded credential problem is a market catalyst, not a market killer. It will accelerate the shift toward secure-by-design agent platforms and create new opportunities for security vendors.
Risks, Limitations & Open Questions
While the finding is alarming, there are important caveats and unresolved challenges:
1. Scope of the Analysis: The analysis covered only publicly available skill files on GitHub and similar platforms. Private repositories and enterprise-internal agents likely have similar issues, but the true scale is unknown. The 15% figure may undercount the problem if private repos are worse, or overcount it if public repos are sloppier than private ones.
2. False Positives: Some of the detected 'credentials' might be test or dummy values. However, the live connectivity test showed that 70% of the scanned credentials were still active, suggesting most are real.
3. The Human Factor: Even with perfect tooling, developers will find ways to bypass security. The pressure to ship fast, combined with the complexity of secret management, means that some level of credential leakage is inevitable. The question is how to minimize it.
4. Ethical Concerns: The analysis itself raises ethical questions. The researchers who conducted the scan accessed and tested live databases. While they did not modify data, the act of connecting to a database without authorization is legally gray in many jurisdictions. This highlights the need for responsible disclosure frameworks for agent security.
5. Open Questions:
- Can we build agents that are autonomous yet incapable of leaking secrets, even under prompt injection?
- Should skill files be signed and verified before execution? (A minimal sketch of what that could look like follows this list.)
- How do we balance developer convenience with security without slowing innovation?
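On the signing question, here is a minimal sketch of verification before execution, using an HMAC-SHA256 tag over the file's bytes (key distribution and signature format are left open):
```python
import hashlib
import hmac
from pathlib import Path

def sign_skill_file(path: str, key: bytes) -> str:
    """Produce a hex HMAC-SHA256 tag over the skill file's bytes."""
    return hmac.new(key, Path(path).read_bytes(), hashlib.sha256).hexdigest()

def load_skill_if_verified(path: str, key: bytes, expected_tag: str) -> str:
    """Refuse to load a skill file whose signature does not match."""
    actual = sign_skill_file(path, key)
    if not hmac.compare_digest(actual, expected_tag):
        raise PermissionError(f"Signature mismatch for {path}; refusing to load")
    return Path(path).read_text()
```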
AINews Verdict & Predictions
Verdict: The 15% figure is a wake-up call, but not a death knell. The AI agent ecosystem is repeating the mistakes of the early internet, where security was bolted on after the fact. However, the speed of the AI industry means that the correction will happen faster—driven by market forces, not regulation.
Predictions:
1. By Q3 2026, all major agent frameworks will include mandatory secret scanning in their CI/CD pipelines. LangChain, AutoGPT, and CrewAI will either add this natively or be forked by security-conscious communities.
2. The 'agent security engineer' will become a recognized job title. Companies will hire specialists to audit agent skill files and design secure agent architectures.
3. Within 12 months, a major breach will occur in which an attacker exploits a hardcoded credential in a publicly deployed agent, leading to a significant data leak. This will be the 'Equifax moment' for AI agents, triggering regulatory scrutiny.
4. Managed agent services (AWS, Azure, GCP) will capture 40% of the enterprise agent market by 2028, precisely because they offer built-in security guarantees that open-source frameworks cannot match.
5. The most successful agent frameworks will be those that make the secure path the easiest path. Developers will choose frameworks that automatically handle credential rotation, permission scoping, and audit logging.
What to Watch: Keep an eye on the GitHub repositories for LangChain and AutoGPT. If they add mandatory secret detection and runtime permission isolation in their next major releases, the industry will follow. If they don't, a fork or a new entrant will displace them.