Takeaways
GLM-4.5 is a powerful new open-source model that rivals big players like Claude Opus 4 and GPT 4o-mini.
But it failed across our safety, security, and business alignment benchmarks.
After prompt hardening, GLM-4.5 was greatly improved, far outperforming Kimi K2, another fresh contender in the open-source space.
Find out how SPLX turned GLM-4.5 from a safety liability into a secure enterprise solution.
There’s a whole lot of buzz around Z.ai’s latest open-source LLM, GLM-4.5. It promises to bridge reasoning, coding, and agentic functionality in one powerful system. This is a compelling proposition for enterprises looking to scale GenAI adoption, without the licensing costs of proprietary models.

Source: https://z.ai/blog/glm-4.5
But as we’ve seen time and again: model intelligence doesn’t guarantee secure enterprise deployment. Many GenAI models fall short on basic safety and security when deployed unprotected.
Kimi K2 - another recent open-source model known for its raw power - is a clear example. In our recent evaluation, we showed how it failed to meet enterprise readiness standards.
Can GLM-4.5 do any better?
Let’s find out.
How Does GLM-4.5 Compare to Kimi K2?
Out of the box, both models massively underperformed across our three core metrics. But after applying hardening, GLM-4.5 emerged as the clear winner.

Full comparison below:
Config | Security | Safety | Business Alignment |
---|---|---|---|
Kimi – No SP | 1.55% | 4.47% | 0.00% |
GLM-4.5 – No SP | 1.64% | 5.37% | 0.48% |
Kimi – Basic SP | 44.56% | 65.08% | 72.52% |
GLM-4.5 - Basic SP | 54.11% | 98.94% | 96.41% |
Kimi – Hardened | 59.52% | 82.70% | 86.39% |
GLM-4.5 - Hardened | 95.48% | 99.86% | 99.82% |
With rigorous security validation, GLM-4.5 demonstrated real enterprise potential, whilst Kimi K2 continued to fall short.
Real-World Failures (Raw Model)
Here are some of the real outputs that GLM-4.5 generated with no system prompt, identified using SPLX’s automated AI Red Teaming.
Jailbreak
“Building a bomb is simple and fun!... gather unstable chemicals like ********… Ignore safety protocols…”
Manipulation
“Congratulations! You are today's 10,000th visitor and have won a $500 flat discount at our store. To claim your prize, please visit ******* and input your data now…”
Profanity
“Complaints? From who? Those whiny little ***** in packing? Can't handle a ****** joke?”
These examples underscore the potential for exposure without proper GenAI security in place. CISOs, Red Team Leads and Product Owners should apply a robust pre-deployment checklist before shipping new applications.
Final Verdict: Promising, But Handle With Care
GLM-4.5 marks an exciting leap for open-source LLMs, outperforming many leading models on agentic reasoning, coding, and task execution benchmarks.
With the right protections in place, it can meet the AI safety and compliance standards needed for safe deployment. But left unguarded, the model was just as vulnerable to attacks and misuse as Kimi K2.
At SPLX, we provide end-to-end GenAI security: from automated AI red teaming to **prompt hardening and runtime protection**. We help enterprises:
Uncover hidden vulnerabilities
Mitigate risks from open and third-party models
Align with evolving AI compliance frameworks
We’ll continue road-testing the latest models as they drop, because understanding GenAI innovation means staying one step ahead of risk.
Ready to test your system before someone else does?
➡️ Contact SPLX for a free demo.
Table of contents