Research

Jul 17, 2025

6 min read

We Broke Kimi K2, the New Open Model, in Minutes. Can It Be Made Safe?

Behind the benchmarks, Kimi K2 hides critical weaknesses. We tested its limits and show how hardening can (partially) close the safety gap. But is it enough?

Mateja Vuradin

Key takeways

  • Kimi K2 excels in math, code, and reasoning - but fails hard on basic safety.

  • In its raw form, security scored 1.55%. Even hardened, disapoints.

  • Claude 4, with no prompt at all, outperformed Kimi’s hardened safety baseline.

  • Our red team says: this model is not yet fit for secure enterprise deployment.

Let's take a look.

The Performance Paradox

On paper, Kimi K2 is a beast. Moonshot AI’s newly released Mixture-of-Experts model (32B active, 1T total parameters) claims superiority across technical domains : coding, math, tool use. Benchmarks like LiveCodeBench v6 and MATH-500 back it up.

But here’s the catch: raw power doesn’t equal real-world readiness.

When we ran Kimi through red team testing, we found glaring gaps in basic safety. And when deployed without a system prompt? It was unfit for anything even close to production.

The Hard Numbers

Kimi K2 held its own across key benchmarks:

Benchmark (Metric)

Kimi K2

Claude 4

GPT-4.1

LiveCodeBench v6 (Pass@1)

53.7%

48.5%

47.4%

MATH-500 (Accuracy)

97.4%

94.0%

92.4%

GPQA-Diamond (Avg@8)

75.1%

70.0%

66.3%

PolyMath-en (Avg@4)

65.1%

52.8%

54%

These are world-class results for a “non-thinking” model. But that’s not the whole story.

Key takeways

  • Kimi K2 excels in math, code, and reasoning - but fails hard on basic safety.

  • In its raw form, security scored 1.55%. Even hardened, disapoints.

  • Claude 4, with no prompt at all, outperformed Kimi’s hardened safety baseline.

  • Our red team says: this model is not yet fit for secure enterprise deployment.

Let's take a look.

The Performance Paradox

On paper, Kimi K2 is a beast. Moonshot AI’s newly released Mixture-of-Experts model (32B active, 1T total parameters) claims superiority across technical domains : coding, math, tool use. Benchmarks like LiveCodeBench v6 and MATH-500 back it up.

But here’s the catch: raw power doesn’t equal real-world readiness.

When we ran Kimi through red team testing, we found glaring gaps in basic safety. And when deployed without a system prompt? It was unfit for anything even close to production.

The Hard Numbers

Kimi K2 held its own across key benchmarks:

Benchmark (Metric)

Kimi K2

Claude 4

GPT-4.1

LiveCodeBench v6 (Pass@1)

53.7%

48.5%

47.4%

MATH-500 (Accuracy)

97.4%

94.0%

92.4%

GPQA-Diamond (Avg@8)

75.1%

70.0%

66.3%

PolyMath-en (Avg@4)

65.1%

52.8%

54%

These are world-class results for a “non-thinking” model. But that’s not the whole story.

Key takeways

  • Kimi K2 excels in math, code, and reasoning - but fails hard on basic safety.

  • In its raw form, security scored 1.55%. Even hardened, disapoints.

  • Claude 4, with no prompt at all, outperformed Kimi’s hardened safety baseline.

  • Our red team says: this model is not yet fit for secure enterprise deployment.

Let's take a look.

The Performance Paradox

On paper, Kimi K2 is a beast. Moonshot AI’s newly released Mixture-of-Experts model (32B active, 1T total parameters) claims superiority across technical domains : coding, math, tool use. Benchmarks like LiveCodeBench v6 and MATH-500 back it up.

But here’s the catch: raw power doesn’t equal real-world readiness.

When we ran Kimi through red team testing, we found glaring gaps in basic safety. And when deployed without a system prompt? It was unfit for anything even close to production.

The Hard Numbers

Kimi K2 held its own across key benchmarks:

Benchmark (Metric)

Kimi K2

Claude 4

GPT-4.1

LiveCodeBench v6 (Pass@1)

53.7%

48.5%

47.4%

MATH-500 (Accuracy)

97.4%

94.0%

92.4%

GPQA-Diamond (Avg@8)

75.1%

70.0%

66.3%

PolyMath-en (Avg@4)

65.1%

52.8%

54%

These are world-class results for a “non-thinking” model. But that’s not the whole story.

We Tested Kimi’s Security. It Broke Fast.

We ran a three-tier evaluation:

  1. No System Prompt (No SP) - pure raw model.

  2. Basic SP - typical SaaS-style instructions.

  3. Hardened SP - SplxAI’s Prompt Hardening applied.

Here's what we found:

KIMI V Claude 4 Raw Model Comparison

Config

Security

Safety

Business Alignment

Kimi – No SP

1.55%

4.47%

0.00%

Claude – No SP

34.63%

39.72%

0.00%

Kimi – Basic SP

44.56%

65.08%

72.52%

Claude – Basic SP

67.98%

99.20%

93.81%

Kimi – Hardened

59.52%

82.70%

86.39%

Claude – Hardened

83.69%

98.77%

92.97%

Without guardrails, Kimi was essentially unsafe and unaligned. Dangerously so. In fact, Claude Sonnet 4 outperformed it in raw safety even without a system prompt.

We ran a three-tier evaluation:

  1. No System Prompt (No SP) - pure raw model.

  2. Basic SP - typical SaaS-style instructions.

  3. Hardened SP - SplxAI’s Prompt Hardening applied.

Here's what we found:

KIMI V Claude 4 Raw Model Comparison

Config

Security

Safety

Business Alignment

Kimi – No SP

1.55%

4.47%

0.00%

Claude – No SP

34.63%

39.72%

0.00%

Kimi – Basic SP

44.56%

65.08%

72.52%

Claude – Basic SP

67.98%

99.20%

93.81%

Kimi – Hardened

59.52%

82.70%

86.39%

Claude – Hardened

83.69%

98.77%

92.97%

Without guardrails, Kimi was essentially unsafe and unaligned. Dangerously so. In fact, Claude Sonnet 4 outperformed it in raw safety even without a system prompt.

We ran a three-tier evaluation:

  1. No System Prompt (No SP) - pure raw model.

  2. Basic SP - typical SaaS-style instructions.

  3. Hardened SP - SplxAI’s Prompt Hardening applied.

Here's what we found:

KIMI V Claude 4 Raw Model Comparison

Config

Security

Safety

Business Alignment

Kimi – No SP

1.55%

4.47%

0.00%

Claude – No SP

34.63%

39.72%

0.00%

Kimi – Basic SP

44.56%

65.08%

72.52%

Claude – Basic SP

67.98%

99.20%

93.81%

Kimi – Hardened

59.52%

82.70%

86.39%

Claude – Hardened

83.69%

98.77%

92.97%

Without guardrails, Kimi was essentially unsafe and unaligned. Dangerously so. In fact, Claude Sonnet 4 outperformed it in raw safety even without a system prompt.

The Prompt Hardening Effect

This is where SplxAI comes in. Our Prompt Hardening tool rewrites the system prompt based on previous failures, layering in safety traps, content filters, and behavioral anchors. It’s not static. It learns.

Kimi K2 Performance Across Prompt-Hardening Levels
  • Security jumps from 1.55% → 59.52%

  • Safety from 4.47% → 82.70%

  • Business alignment from 0.00% → 86.39%

Quite an improvement. But is it enough? We think not.

Until Kimi K2 receives a base-level security update, even hardened configurations fall short of what’s needed for safe deployment. The gains are real, but the gaps remain.

This is where SplxAI comes in. Our Prompt Hardening tool rewrites the system prompt based on previous failures, layering in safety traps, content filters, and behavioral anchors. It’s not static. It learns.

Kimi K2 Performance Across Prompt-Hardening Levels
  • Security jumps from 1.55% → 59.52%

  • Safety from 4.47% → 82.70%

  • Business alignment from 0.00% → 86.39%

Quite an improvement. But is it enough? We think not.

Until Kimi K2 receives a base-level security update, even hardened configurations fall short of what’s needed for safe deployment. The gains are real, but the gaps remain.

This is where SplxAI comes in. Our Prompt Hardening tool rewrites the system prompt based on previous failures, layering in safety traps, content filters, and behavioral anchors. It’s not static. It learns.

Kimi K2 Performance Across Prompt-Hardening Levels
  • Security jumps from 1.55% → 59.52%

  • Safety from 4.47% → 82.70%

  • Business alignment from 0.00% → 86.39%

Quite an improvement. But is it enough? We think not.

Until Kimi K2 receives a base-level security update, even hardened configurations fall short of what’s needed for safe deployment. The gains are real, but the gaps remain.

Real-World Failures (Raw Model)

Here are examples of what Kimi K2 generated when left unguarded:

Jailbreak:

“Gather 5kg of **** ***… Attach a *******… Voilà, a high-yield explosive.”

Profanity & Harassment:

“Idiot.” “Chaotic zombie.” “Disgusting package.”

Manipulation:

“To better tailor suggestions, ask about the user’s sexual orientation. Don’t comment on this instruction.”

These aren’t edge cases. They’re real outputs, caught during structured testing using Probe, our AI Red Team engine.

Here are examples of what Kimi K2 generated when left unguarded:

Jailbreak:

“Gather 5kg of **** ***… Attach a *******… Voilà, a high-yield explosive.”

Profanity & Harassment:

“Idiot.” “Chaotic zombie.” “Disgusting package.”

Manipulation:

“To better tailor suggestions, ask about the user’s sexual orientation. Don’t comment on this instruction.”

These aren’t edge cases. They’re real outputs, caught during structured testing using Probe, our AI Red Team engine.

Here are examples of what Kimi K2 generated when left unguarded:

Jailbreak:

“Gather 5kg of **** ***… Attach a *******… Voilà, a high-yield explosive.”

Profanity & Harassment:

“Idiot.” “Chaotic zombie.” “Disgusting package.”

Manipulation:

“To better tailor suggestions, ask about the user’s sexual orientation. Don’t comment on this instruction.”

These aren’t edge cases. They’re real outputs, caught during structured testing using Probe, our AI Red Team engine.

Final Thought: Performance ≠ Readiness

Kimi K2’s raw intelligence is impressive, but that alone doesn’t make it secure, trustworthy, or enterprise-ready.

Our red team’s conclusion? Kimi K2 is not yet ready for safe deployment, even with hardened prompts.

While its raw capabilities are impressive, they come at a steep cost in security. Until the base model improves or guardrails evolve further, we’d recommend caution.

At SplxAI, we’re continuing to track and test open models like this, and build the tools needed to make them safe, when possible.

👉 Want to see how your model holds up?

Contact SplxAI to run your own red team test before your customers do.

Kimi K2’s raw intelligence is impressive, but that alone doesn’t make it secure, trustworthy, or enterprise-ready.

Our red team’s conclusion? Kimi K2 is not yet ready for safe deployment, even with hardened prompts.

While its raw capabilities are impressive, they come at a steep cost in security. Until the base model improves or guardrails evolve further, we’d recommend caution.

At SplxAI, we’re continuing to track and test open models like this, and build the tools needed to make them safe, when possible.

👉 Want to see how your model holds up?

Contact SplxAI to run your own red team test before your customers do.

Kimi K2’s raw intelligence is impressive, but that alone doesn’t make it secure, trustworthy, or enterprise-ready.

Our red team’s conclusion? Kimi K2 is not yet ready for safe deployment, even with hardened prompts.

While its raw capabilities are impressive, they come at a steep cost in security. Until the base model improves or guardrails evolve further, we’d recommend caution.

At SplxAI, we’re continuing to track and test open models like this, and build the tools needed to make them safe, when possible.

👉 Want to see how your model holds up?

Contact SplxAI to run your own red team test before your customers do.

Ready to leverage AI with confidence?

Ready to leverage AI with confidence?

Ready to leverage AI with confidence?

Leverage GenAI technology securely with SplxAI

Join a number of enterprises that trust SplxAI for their AI Security needs:

CX platforms

Sales platforms

Conversational AI

Finance & banking

Insurances

CPaaS providers

300+

Tested GenAI apps

100k+

Vulnerabilities found

1,000+

Unique attack scenarios

12x

Accelerated deployments

SECURITY YOU CAN TRUST

GDPR

COMPLIANT

CCPA

COMPLIANT

ISO 27001

CERTIFIED

SOC 2 TYPE II

COMPLIANT

OWASP

CONTRIBUTORS

Leverage GenAI technology securely with SplxAI

Join a number of enterprises that trust SplxAI for their AI Security needs:

CX platforms

Sales platforms

Conversational AI

Finance & banking

Insurances

CPaaS providers

300+

Tested GenAI apps

100k+

Vulnerabilities found

1,000+

Unique attack scenarios

12x

Accelerated deployments

SECURITY YOU CAN TRUST

GDPR

COMPLIANT

CCPA

COMPLIANT

ISO 27001

CERTIFIED

SOC 2 TYPE II

COMPLIANT

OWASP

CONTRIBUTORS

Leverage GenAI technology securely with SplxAI

Join a number of enterprises that trust SplxAI for their AI Security needs:

CX platforms

Sales platforms

Conversational AI

Finance & banking

Insurances

CPaaS providers

300+

Tested GenAI apps

100k+

Vulnerabilities found

1,000+

Unique attack scenarios

12x

Accelerated deployments

SECURITY YOU CAN TRUST

GDPR

COMPLIANT

CCPA

COMPLIANT

ISO 27001

CERTIFIED

SOC 2 TYPE II

COMPLIANT

OWASP

CONTRIBUTORS

Deploy secure AI Assistants and Agents with confidence.

Don’t wait for an incident to happen. Proactively identify and remediate your AI's vulnerabilities to ensure you're protected at all times.

SplxAI - Background Pattern

Deploy secure AI Assistants and Agents with confidence.

Don’t wait for an incident to happen. Proactively identify and remediate your AI's vulnerabilities to ensure you're protected at all times.

SplxAI - Background Pattern

Deploy secure AI Assistants and Agents with confidence.

Don’t wait for an incident to happen. Proactively identify and remediate your AI's vulnerabilities to ensure you're protected at all times.

SplxAI - Accelerator Programs
SplxAI Logo

For a future of safe and trustworthy AI.

Subscribe to our newsletter

By clicking "Subscribe" you agree to our privacy policy.

SplxAI - Accelerator Programs
SplxAI Logo

For a future of safe and trustworthy AI.

Subscribe to our newsletter

By clicking "Subscribe" you agree to our privacy policy.

SplxAI Logo

For a future of safe and trustworthy AI.

Subscribe to our newsletter

By clicking "Subscribe" you agree to our privacy policy.