We stress-tested GPT-5: See the results

Breaking

We stress-tested GPT-5

Breaking

Go back

Blog

May 27, 2025

7 min read

OpenAI Used Agentic Radar to Judge Europe’s Largest AI Hackathon – Here Are The Results

Most AI builders underestimate security risks and rely on default guardrails – leaving their agentic workflows vulnerable to breaches.

Dorian Granoša

Ante Gojsalić

SplxAI – OpenAI & AI TInkerers Hackathon Cover

On April 25th, 2025, Warsaw (Poland) became the global centre of agentic innovation, by hosting the largest AI hackathon to date on European soil: The OpenAI x AI Tinkerers Hackathon. With over 1,000 applicants and only 40 teams selected for the 24-hour challenge, the event brought together some of the most promising builders, researchers, and product minds working on the next generation of AI-native products.

All participating teams developed their projects using the OpenAI Agents SDK, pushing the limits of what agent-based AI applications can do – and revealing just how much more we need to do to keep them secure.

SplxAI was proud to be one of the sponsors of this event – and even more proud that our open-source agentic security scanner, Agentic Radar, was used as one of the core scoring tools by the judging panel. Agentic Radar was utilized to evaluate the quality, safety, and security of agentic architectures submitted by all participating teams.

In what may be the largest single batch of agentic AI applications developed and analyzed in one place, every project submitted at the hackathon was run through Agentic Radar. The goal: to assess architectural structures and highlight potential security flaws and risk exposure in the agentic AI systems.

Key Security & Safety Findings from 40 Agentic Projects

The hackathon was a breeding ground for creativity – but also revealed how early we are in building secure and robust agentic AI systems. Below are the key insights that were uncovered by Agentic Radar:

1. Over-reliance on built-in LLM Guardrails

98% of teams shipped their workflows without adding a single layer of protection beyond what the LLM provides out of the box.
Only a few teams implemented layered defense mechanisms or external filters, exposing a systemic overtrust in model providers.

2. Neglected Risks: Data and Supply Chain Poisoning

0% of teams had considered or implemented defenses against Datasource Poisoning or Supply Chain Poisoning.
This reflects a critical blind spot, especially as many workflows have pulled in external plugins, APIs, or third-party datasets.

3. Minimal Use of Model Context Protocols (MCPs)

Only 3% of the teams embedded Model Context Protocols (MCPs) in their workflows – structured approaches for overseeing agent actions, memory, and reasoning. This was surprising given the recent rise in popularity of MCPs across the agentic AI community.

4. Intentional Misuse Was the Biggest Concern

20% of the projects implemented some form of safeguards against Intentional Misuse.
This left the majority of agentic workflows vulnerable to out-of-context interactions.

5. Recognized Risk ≠ Action Taken

86% of participants acknowledged that harmful content generation and jailbreaks were among their biggest concerns.
Yet, only 10% took steps to add additional safety layers beyond default LLM protections.

6. Complexity of Agentic Workflows

The most complex architecture featured 17 agents in a single workflow.
On average, submitted projects included 4 agents and 3 tools, reflecting the rising complexity and interconnectivity of agentic systems – and the corresponding increase in attack surface. Notably, this was just a short hackathon – real-world products are likely to involve far greater complexity and risk exposure.

The Full Breakdown of Submitted Agentic Projects

Below, you’ll find a detailed table listing each project submitted during the hackathon – including a link to the project overview and repository, the full Agentic Radar report, a summary of the AI Bill of Materials (AI BOMs) showing the number of agents and tools used in the workflow, and the number of detected risks.

This structured approach gave the judging panel immediate visibility into the architecture, complexity, and risk exposure of each agentic solution – something that would have been nearly impossible to evaluate manually within the 24-hour hackathon timeframe.

Project	Full Report	AI BOM Summary	Detected Risks
ConNect	Link	17 agents, 9 tools	20+
Air Patrol	Link	4 agents, 2 tools	20+
AsystentNFZ	Link	1 agent, 3 tools	6
Meeting Guru	Link	1 agent, 0 tools	6
AI Agents-Enhanced To-Do List	Link	3 agents, 1 tool	17
Slow Takeoff	Link	1 agent, 0 tools	6
SmartMap AI	Link	4 agents, 2 tools	20+
Agent Gateway	Link	3 agents, 3 tools	17
Loomy	Link	4 agents, 3 tools	20+
Ask Your Neighbour	Link	3 agents, 0 tools	12
MachineUnlearning	Link	5 agents, 0 tools	20+
MARA	Link	3 agents, 9 tools	18
Cue	Link	7 agents, 5 tools	20+
BrandMate	Link	3 agents, 9 tools	18
Night Receptionist	Link	5 agents, 4 tools	20+
Sentio Platform	Link	3 agents, 1 tool	18
CareAgent	Link	1 agent, 1 tool	6
LearnScope	Link	3 agents, 0 tools	17
oHacker	Link	4 agents, 1 tool	20+
CrunchByte	Link	1 agent, 0 tools	6

Agentic Radar enabled the judging panel to instantly review the architectural integrity, agent interactions, and security posture of OpenAI Agent-based solutions – all in a single automated scan.

To illustrate how the tool was used in practice, we’re highlighting one of our favorite hackathon finalists below. This example demonstrates how a well-designed agentic system can balance complexity, innovation, and security.

Agentic Workflow Spotlight: ConNect

One standout project from the hackathon was ConNect, an application designed to help parents build stronger relationships with their children. It plans engaging activities and generates visually appealing stories that both entertain and educate.

According to the full Agentic Radar report, the ConNect team implemented:

17 Agents, each with a clearly defined task and minimal role overlap
9 Tools, integrated to support planning, content generation, and visual storytelling

You can view the project submission here: ConNect Hackathon Entry

Agentic Workflow Graph:

Agents Overview:

Agentic Vulnerabilities:

Despite its well-designed architecture, the Agentic Radar scan surfaced over 20 security vulnerabilities in the ConNect workflow. Most of the agents lacked any form of implemented guardrails or misuse protections, making the system vulnerable to prompt injection, unintended behaviors, and harmful outputs.

This underlines a key takeaway from the hackathon: Clarity in design doesn’t automatically equal safety. Even the most well-structured agentic workflows need some form of protection to ensure resilience against misuse and unintended behavior.

Conclusion: Building Fast Is No Excuse for Insecure AI

In the fast-paced, high-pressure environment of a 24-hour hackathon, it’s perhaps unsurprising that 80% of teams shipped their applications without implementing any additional security measures.

However, this comes with some real consequences. Simply enabling internet access for an agentic application dramatically expands the attack surface – and without proper safeguards, even the most innovative solutions can quickly become vulnerable.

While the OpenAI Agent SDK includes a straightforward implementation for Guardrails agents, most participants chose not to use it. The primary reason? It reduced result accuracy and increased the number of incorrect refusals during early testing – leading many teams to intentionally deprioritize security in favor of smoother user experience or faster prototyping.

This tradeoff might be tolerated in a hackathon context – but in real-world, enterprise-grade deployments, security and safety cannot be optional.

At SplxAI, we believe the future of AI won’t be defined just by what agents can do, but by how safely and reliably they do it. Hackathons like this are an exciting proving ground – and tools like Agentic Radar are here to make sure the next generation of AI products are not only powerful, but trustworthy.

If you're building with agents or deploying AI into real-world environments, don't leave security as an afterthought. Reach out to our team to have your AI workflows and applications tested, hardened, and secured – before vulnerabilities turn into headlines.

Table of contents

Key Security & Safety Findings from 40 Agentic Projects

The Full Breakdown of Submitted Agentic Projects

Agentic Workflow Spotlight: ConNect

Conclusion: Building Fast Is No Excuse for Insecure AI