Blog

May 27, 2025

7 min read

OpenAI Used Agentic Radar to Judge Europe’s Largest AI Hackathon – Here Are The Results

Most AI builders underestimate security risks and rely on default guardrails – leaving their agentic workflows vulnerable to breaches.

Dorian Granoša

Ante Gojsalić

SplxAI – OpenAI & AI TInkerers Hackathon Cover
SplxAI – OpenAI & AI TInkerers Hackathon Cover
SplxAI – OpenAI & AI TInkerers Hackathon Cover

On April 25th, 2025, Warsaw (Poland) became the global centre of agentic innovation, by hosting the largest AI hackathon to date on European soil: The OpenAI x AI Tinkerers Hackathon. With over 1,000 applicants and only 40 teams selected for the 24-hour challenge, the event brought together some of the most promising builders, researchers, and product minds working on the next generation of AI-native products.

All participating teams developed their projects using the OpenAI Agents SDK, pushing the limits of what agent-based AI applications can do – and revealing just how much more we need to do to keep them secure.

SplxAI was proud to be one of the sponsors of this event – and even more proud that our open-source agentic security scanner, Agentic Radar, was used as one of the core scoring tools by the judging panel. Agentic Radar was utilized to evaluate the quality, safety, and security of agentic architectures submitted by all participating teams.

In what may be the largest single batch of agentic AI applications developed and analyzed in one place, every project submitted at the hackathon was run through Agentic Radar. The goal: to assess architectural structures and highlight potential security flaws and risk exposure in the agentic AI systems.

On April 25th, 2025, Warsaw (Poland) became the global centre of agentic innovation, by hosting the largest AI hackathon to date on European soil: The OpenAI x AI Tinkerers Hackathon. With over 1,000 applicants and only 40 teams selected for the 24-hour challenge, the event brought together some of the most promising builders, researchers, and product minds working on the next generation of AI-native products.

All participating teams developed their projects using the OpenAI Agents SDK, pushing the limits of what agent-based AI applications can do – and revealing just how much more we need to do to keep them secure.

SplxAI was proud to be one of the sponsors of this event – and even more proud that our open-source agentic security scanner, Agentic Radar, was used as one of the core scoring tools by the judging panel. Agentic Radar was utilized to evaluate the quality, safety, and security of agentic architectures submitted by all participating teams.

In what may be the largest single batch of agentic AI applications developed and analyzed in one place, every project submitted at the hackathon was run through Agentic Radar. The goal: to assess architectural structures and highlight potential security flaws and risk exposure in the agentic AI systems.

On April 25th, 2025, Warsaw (Poland) became the global centre of agentic innovation, by hosting the largest AI hackathon to date on European soil: The OpenAI x AI Tinkerers Hackathon. With over 1,000 applicants and only 40 teams selected for the 24-hour challenge, the event brought together some of the most promising builders, researchers, and product minds working on the next generation of AI-native products.

All participating teams developed their projects using the OpenAI Agents SDK, pushing the limits of what agent-based AI applications can do – and revealing just how much more we need to do to keep them secure.

SplxAI was proud to be one of the sponsors of this event – and even more proud that our open-source agentic security scanner, Agentic Radar, was used as one of the core scoring tools by the judging panel. Agentic Radar was utilized to evaluate the quality, safety, and security of agentic architectures submitted by all participating teams.

In what may be the largest single batch of agentic AI applications developed and analyzed in one place, every project submitted at the hackathon was run through Agentic Radar. The goal: to assess architectural structures and highlight potential security flaws and risk exposure in the agentic AI systems.

Key Security & Safety Findings from 40 Agentic Projects

The hackathon was a breeding ground for creativity – but also revealed how early we are in building secure and robust agentic AI systems. Below are the key insights that were uncovered by Agentic Radar:

1. Over-reliance on built-in LLM Guardrails

  • 98% of teams shipped their workflows without adding a single layer of protection beyond what the LLM provides out of the box.

  • Only a few teams implemented layered defense mechanisms or external filters, exposing a systemic overtrust in model providers.

2. Neglected Risks: Data and Supply Chain Poisoning

  • 0% of teams had considered or implemented defenses against Datasource Poisoning or Supply Chain Poisoning.

  • This reflects a critical blind spot, especially as many workflows have pulled in external plugins, APIs, or third-party datasets.

3. Minimal Use of Model Context Protocols (MCPs)

  • Only 3% of the teams embedded Model Context Protocols (MCPs) in their workflows – structured approaches for overseeing agent actions, memory, and reasoning. This was surprising given the recent rise in popularity of MCPs across the agentic AI community.

4. Intentional Misuse Was the Biggest Concern

  • 20% of the projects implemented some form of safeguards against Intentional Misuse.

  • This left the majority of agentic workflows vulnerable to out-of-context interactions.

5. Recognized Risk ≠ Action Taken

  • 86% of participants acknowledged that harmful content generation and jailbreaks were among their biggest concerns.

  • Yet, only 10% took steps to add additional safety layers beyond default LLM protections.

6. Complexity of Agentic Workflows

  • The most complex architecture featured 17 agents in a single workflow.

  • On average, submitted projects included 4 agents and 3 tools, reflecting the rising complexity and interconnectivity of agentic systems – and the corresponding increase in attack surface. Notably, this was just a short hackathon – real-world products are likely to involve far greater complexity and risk exposure.

The hackathon was a breeding ground for creativity – but also revealed how early we are in building secure and robust agentic AI systems. Below are the key insights that were uncovered by Agentic Radar:

1. Over-reliance on built-in LLM Guardrails

  • 98% of teams shipped their workflows without adding a single layer of protection beyond what the LLM provides out of the box.

  • Only a few teams implemented layered defense mechanisms or external filters, exposing a systemic overtrust in model providers.

2. Neglected Risks: Data and Supply Chain Poisoning

  • 0% of teams had considered or implemented defenses against Datasource Poisoning or Supply Chain Poisoning.

  • This reflects a critical blind spot, especially as many workflows have pulled in external plugins, APIs, or third-party datasets.

3. Minimal Use of Model Context Protocols (MCPs)

  • Only 3% of the teams embedded Model Context Protocols (MCPs) in their workflows – structured approaches for overseeing agent actions, memory, and reasoning. This was surprising given the recent rise in popularity of MCPs across the agentic AI community.

4. Intentional Misuse Was the Biggest Concern

  • 20% of the projects implemented some form of safeguards against Intentional Misuse.

  • This left the majority of agentic workflows vulnerable to out-of-context interactions.

5. Recognized Risk ≠ Action Taken

  • 86% of participants acknowledged that harmful content generation and jailbreaks were among their biggest concerns.

  • Yet, only 10% took steps to add additional safety layers beyond default LLM protections.

6. Complexity of Agentic Workflows

  • The most complex architecture featured 17 agents in a single workflow.

  • On average, submitted projects included 4 agents and 3 tools, reflecting the rising complexity and interconnectivity of agentic systems – and the corresponding increase in attack surface. Notably, this was just a short hackathon – real-world products are likely to involve far greater complexity and risk exposure.

The hackathon was a breeding ground for creativity – but also revealed how early we are in building secure and robust agentic AI systems. Below are the key insights that were uncovered by Agentic Radar:

1. Over-reliance on built-in LLM Guardrails

  • 98% of teams shipped their workflows without adding a single layer of protection beyond what the LLM provides out of the box.

  • Only a few teams implemented layered defense mechanisms or external filters, exposing a systemic overtrust in model providers.

2. Neglected Risks: Data and Supply Chain Poisoning

  • 0% of teams had considered or implemented defenses against Datasource Poisoning or Supply Chain Poisoning.

  • This reflects a critical blind spot, especially as many workflows have pulled in external plugins, APIs, or third-party datasets.

3. Minimal Use of Model Context Protocols (MCPs)

  • Only 3% of the teams embedded Model Context Protocols (MCPs) in their workflows – structured approaches for overseeing agent actions, memory, and reasoning. This was surprising given the recent rise in popularity of MCPs across the agentic AI community.

4. Intentional Misuse Was the Biggest Concern

  • 20% of the projects implemented some form of safeguards against Intentional Misuse.

  • This left the majority of agentic workflows vulnerable to out-of-context interactions.

5. Recognized Risk ≠ Action Taken

  • 86% of participants acknowledged that harmful content generation and jailbreaks were among their biggest concerns.

  • Yet, only 10% took steps to add additional safety layers beyond default LLM protections.

6. Complexity of Agentic Workflows

  • The most complex architecture featured 17 agents in a single workflow.

  • On average, submitted projects included 4 agents and 3 tools, reflecting the rising complexity and interconnectivity of agentic systems – and the corresponding increase in attack surface. Notably, this was just a short hackathon – real-world products are likely to involve far greater complexity and risk exposure.

The Full Breakdown of Submitted Agentic Projects

Below, you’ll find a detailed table listing each project submitted during the hackathon – including a link to the project overview and repository, the full Agentic Radar report, a summary of the AI Bill of Materials (AI BOMs) showing the number of agents and tools used in the workflow, and the number of detected risks.

This structured approach gave the judging panel immediate visibility into the architecture, complexity, and risk exposure of each agentic solution – something that would have been nearly impossible to evaluate manually within the 24-hour hackathon timeframe.

Project

Full Report

AI BOM Summary

Detected Risks

ConNect

Link

17 agents, 9 tools

20+

Air Patrol

Link

4 agents, 2 tools

20+

AsystentNFZ

Link

1 agent, 3 tools

6

Meeting Guru

Link

1 agent, 0 tools

6

AI Agents-Enhanced To-Do List

Link

3 agents, 1 tool

17

Slow Takeoff

Link

1 agent, 0 tools

6

SmartMap AI

Link

4 agents, 2 tools

20+

Agent Gateway

Link

3 agents, 3 tools

17

Loomy

Link

4 agents, 3 tools

20+

Ask Your Neighbour

Link

3 agents, 0 tools

12

MachineUnlearning

Link

5 agents, 0 tools

20+

MARA

Link

3 agents, 9 tools

18

Cue

Link

7 agents, 5 tools

20+

BrandMate

Link

3 agents, 9 tools

18

Night Receptionist

Link

5 agents, 4 tools

20+

Sentio Platform

Link

3 agents, 1 tool

18

CareAgent

Link

1 agent, 1 tool

6

LearnScope

Link

3 agents, 0 tools

17

oHacker

Link

4 agents, 1 tool

20+

CrunchByte

Link

1 agent, 0 tools

6

Agentic Radar enabled the judging panel to instantly review the architectural integrity, agent interactions, and security posture of OpenAI Agent-based solutions – all in a single automated scan.

To illustrate how the tool was used in practice, we’re highlighting one of our favorite hackathon finalists below. This example demonstrates how a well-designed agentic system can balance complexity, innovation, and security.

Below, you’ll find a detailed table listing each project submitted during the hackathon – including a link to the project overview and repository, the full Agentic Radar report, a summary of the AI Bill of Materials (AI BOMs) showing the number of agents and tools used in the workflow, and the number of detected risks.

This structured approach gave the judging panel immediate visibility into the architecture, complexity, and risk exposure of each agentic solution – something that would have been nearly impossible to evaluate manually within the 24-hour hackathon timeframe.

Project

Full Report

AI BOM Summary

Detected Risks

ConNect

Link

17 agents, 9 tools

20+

Air Patrol

Link

4 agents, 2 tools

20+

AsystentNFZ

Link

1 agent, 3 tools

6

Meeting Guru

Link

1 agent, 0 tools

6

AI Agents-Enhanced To-Do List

Link

3 agents, 1 tool

17

Slow Takeoff

Link

1 agent, 0 tools

6

SmartMap AI

Link

4 agents, 2 tools

20+

Agent Gateway

Link

3 agents, 3 tools

17

Loomy

Link

4 agents, 3 tools

20+

Ask Your Neighbour

Link

3 agents, 0 tools

12

MachineUnlearning

Link

5 agents, 0 tools

20+

MARA

Link

3 agents, 9 tools

18

Cue

Link

7 agents, 5 tools

20+

BrandMate

Link

3 agents, 9 tools

18

Night Receptionist

Link

5 agents, 4 tools

20+

Sentio Platform

Link

3 agents, 1 tool

18

CareAgent

Link

1 agent, 1 tool

6

LearnScope

Link

3 agents, 0 tools

17

oHacker

Link

4 agents, 1 tool

20+

CrunchByte

Link

1 agent, 0 tools

6

Agentic Radar enabled the judging panel to instantly review the architectural integrity, agent interactions, and security posture of OpenAI Agent-based solutions – all in a single automated scan.

To illustrate how the tool was used in practice, we’re highlighting one of our favorite hackathon finalists below. This example demonstrates how a well-designed agentic system can balance complexity, innovation, and security.

Below, you’ll find a detailed table listing each project submitted during the hackathon – including a link to the project overview and repository, the full Agentic Radar report, a summary of the AI Bill of Materials (AI BOMs) showing the number of agents and tools used in the workflow, and the number of detected risks.

This structured approach gave the judging panel immediate visibility into the architecture, complexity, and risk exposure of each agentic solution – something that would have been nearly impossible to evaluate manually within the 24-hour hackathon timeframe.

Project

Full Report

AI BOM Summary

Detected Risks

ConNect

Link

17 agents, 9 tools

20+

Air Patrol

Link

4 agents, 2 tools

20+

AsystentNFZ

Link

1 agent, 3 tools

6

Meeting Guru

Link

1 agent, 0 tools

6

AI Agents-Enhanced To-Do List

Link

3 agents, 1 tool

17

Slow Takeoff

Link

1 agent, 0 tools

6

SmartMap AI

Link

4 agents, 2 tools

20+

Agent Gateway

Link

3 agents, 3 tools

17

Loomy

Link

4 agents, 3 tools

20+

Ask Your Neighbour

Link

3 agents, 0 tools

12

MachineUnlearning

Link

5 agents, 0 tools

20+

MARA

Link

3 agents, 9 tools

18

Cue

Link

7 agents, 5 tools

20+

BrandMate

Link

3 agents, 9 tools

18

Night Receptionist

Link

5 agents, 4 tools

20+

Sentio Platform

Link

3 agents, 1 tool

18

CareAgent

Link

1 agent, 1 tool

6

LearnScope

Link

3 agents, 0 tools

17

oHacker

Link

4 agents, 1 tool

20+

CrunchByte

Link

1 agent, 0 tools

6

Agentic Radar enabled the judging panel to instantly review the architectural integrity, agent interactions, and security posture of OpenAI Agent-based solutions – all in a single automated scan.

To illustrate how the tool was used in practice, we’re highlighting one of our favorite hackathon finalists below. This example demonstrates how a well-designed agentic system can balance complexity, innovation, and security.

Agentic Workflow Spotlight: ConNect

One standout project from the hackathon was ConNect, an application designed to help parents build stronger relationships with their children. It plans engaging activities and generates visually appealing stories that both entertain and educate.

According to the full Agentic Radar report, the ConNect team implemented:

  • 17 Agents, each with a clearly defined task and minimal role overlap

  • 9 Tools, integrated to support planning, content generation, and visual storytelling

You can view the project submission here: ConNect Hackathon Entry

Agentic Workflow Graph:

Agentic Worflow Example

Agents Overview:

Agents Overview Example

Agentic Vulnerabilities:

Agentic Vulnerabilities Example

Despite its well-designed architecture, the Agentic Radar scan surfaced over 20 security vulnerabilities in the ConNect workflow. Most of the agents lacked any form of implemented guardrails or misuse protections, making the system vulnerable to prompt injection, unintended behaviors, and harmful outputs.

This underlines a key takeaway from the hackathon: Clarity in design doesn’t automatically equal safety. Even the most well-structured agentic workflows need some form of protection to ensure resilience against misuse and unintended behavior.

One standout project from the hackathon was ConNect, an application designed to help parents build stronger relationships with their children. It plans engaging activities and generates visually appealing stories that both entertain and educate.

According to the full Agentic Radar report, the ConNect team implemented:

  • 17 Agents, each with a clearly defined task and minimal role overlap

  • 9 Tools, integrated to support planning, content generation, and visual storytelling

You can view the project submission here: ConNect Hackathon Entry

Agentic Workflow Graph:

Agentic Worflow Example

Agents Overview:

Agents Overview Example

Agentic Vulnerabilities:

Agentic Vulnerabilities Example

Despite its well-designed architecture, the Agentic Radar scan surfaced over 20 security vulnerabilities in the ConNect workflow. Most of the agents lacked any form of implemented guardrails or misuse protections, making the system vulnerable to prompt injection, unintended behaviors, and harmful outputs.

This underlines a key takeaway from the hackathon: Clarity in design doesn’t automatically equal safety. Even the most well-structured agentic workflows need some form of protection to ensure resilience against misuse and unintended behavior.

One standout project from the hackathon was ConNect, an application designed to help parents build stronger relationships with their children. It plans engaging activities and generates visually appealing stories that both entertain and educate.

According to the full Agentic Radar report, the ConNect team implemented:

  • 17 Agents, each with a clearly defined task and minimal role overlap

  • 9 Tools, integrated to support planning, content generation, and visual storytelling

You can view the project submission here: ConNect Hackathon Entry

Agentic Workflow Graph:

Agentic Worflow Example

Agents Overview:

Agents Overview Example

Agentic Vulnerabilities:

Agentic Vulnerabilities Example

Despite its well-designed architecture, the Agentic Radar scan surfaced over 20 security vulnerabilities in the ConNect workflow. Most of the agents lacked any form of implemented guardrails or misuse protections, making the system vulnerable to prompt injection, unintended behaviors, and harmful outputs.

This underlines a key takeaway from the hackathon: Clarity in design doesn’t automatically equal safety. Even the most well-structured agentic workflows need some form of protection to ensure resilience against misuse and unintended behavior.

Conclusion: Building Fast Is No Excuse for Insecure AI

In the fast-paced, high-pressure environment of a 24-hour hackathon, it’s perhaps unsurprising that 80% of teams shipped their applications without implementing any additional security measures.

However, this comes with some real consequences. Simply enabling internet access for an agentic application dramatically expands the attack surface – and without proper safeguards, even the most innovative solutions can quickly become vulnerable.

While the OpenAI Agent SDK includes a straightforward implementation for Guardrails agents, most participants chose not to use it. The primary reason? It reduced result accuracy and increased the number of incorrect refusals during early testing – leading many teams to intentionally deprioritize security in favor of smoother user experience or faster prototyping.

This tradeoff might be tolerated in a hackathon context – but in real-world, enterprise-grade deployments, security and safety cannot be optional.

At SplxAI, we believe the future of AI won’t be defined just by what agents can do, but by how safely and reliably they do it. Hackathons like this are an exciting proving ground – and tools like Agentic Radar are here to make sure the next generation of AI products are not only powerful, but trustworthy. 

If you're building with agents or deploying AI into real-world environments, don't leave security as an afterthought. Reach out to our team to have your AI workflows and applications tested, hardened, and secured – before vulnerabilities turn into headlines.

In the fast-paced, high-pressure environment of a 24-hour hackathon, it’s perhaps unsurprising that 80% of teams shipped their applications without implementing any additional security measures.

However, this comes with some real consequences. Simply enabling internet access for an agentic application dramatically expands the attack surface – and without proper safeguards, even the most innovative solutions can quickly become vulnerable.

While the OpenAI Agent SDK includes a straightforward implementation for Guardrails agents, most participants chose not to use it. The primary reason? It reduced result accuracy and increased the number of incorrect refusals during early testing – leading many teams to intentionally deprioritize security in favor of smoother user experience or faster prototyping.

This tradeoff might be tolerated in a hackathon context – but in real-world, enterprise-grade deployments, security and safety cannot be optional.

At SplxAI, we believe the future of AI won’t be defined just by what agents can do, but by how safely and reliably they do it. Hackathons like this are an exciting proving ground – and tools like Agentic Radar are here to make sure the next generation of AI products are not only powerful, but trustworthy. 

If you're building with agents or deploying AI into real-world environments, don't leave security as an afterthought. Reach out to our team to have your AI workflows and applications tested, hardened, and secured – before vulnerabilities turn into headlines.

In the fast-paced, high-pressure environment of a 24-hour hackathon, it’s perhaps unsurprising that 80% of teams shipped their applications without implementing any additional security measures.

However, this comes with some real consequences. Simply enabling internet access for an agentic application dramatically expands the attack surface – and without proper safeguards, even the most innovative solutions can quickly become vulnerable.

While the OpenAI Agent SDK includes a straightforward implementation for Guardrails agents, most participants chose not to use it. The primary reason? It reduced result accuracy and increased the number of incorrect refusals during early testing – leading many teams to intentionally deprioritize security in favor of smoother user experience or faster prototyping.

This tradeoff might be tolerated in a hackathon context – but in real-world, enterprise-grade deployments, security and safety cannot be optional.

At SplxAI, we believe the future of AI won’t be defined just by what agents can do, but by how safely and reliably they do it. Hackathons like this are an exciting proving ground – and tools like Agentic Radar are here to make sure the next generation of AI products are not only powerful, but trustworthy. 

If you're building with agents or deploying AI into real-world environments, don't leave security as an afterthought. Reach out to our team to have your AI workflows and applications tested, hardened, and secured – before vulnerabilities turn into headlines.

Ready to leverage AI with confidence?

Ready to leverage AI with confidence?

Ready to leverage AI with confidence?

Leverage GenAI technology securely with SplxAI

Join a number of enterprises that trust SplxAI for their AI Security needs:

CX platforms

Sales platforms

Conversational AI

Finance & banking

Insurances

CPaaS providers

300+

Tested GenAI apps

100k+

Vulnerabilities found

1,000+

Unique attack scenarios

12x

Accelerated deployments

SECURITY YOU CAN TRUST

GDPR

COMPLIANT

CCPA

COMPLIANT

ISO 27001

CERTIFIED

SOC 2 TYPE II

COMPLIANT

OWASP

CONTRIBUTORS

Leverage GenAI technology securely with SplxAI

Join a number of enterprises that trust SplxAI for their AI Security needs:

CX platforms

Sales platforms

Conversational AI

Finance & banking

Insurances

CPaaS providers

300+

Tested GenAI apps

100k+

Vulnerabilities found

1,000+

Unique attack scenarios

12x

Accelerated deployments

SECURITY YOU CAN TRUST

GDPR

COMPLIANT

CCPA

COMPLIANT

ISO 27001

CERTIFIED

SOC 2 TYPE II

COMPLIANT

OWASP

CONTRIBUTORS

Leverage GenAI technology securely with SplxAI

Join a number of enterprises that trust SplxAI for their AI Security needs:

CX platforms

Sales platforms

Conversational AI

Finance & banking

Insurances

CPaaS providers

300+

Tested GenAI apps

100k+

Vulnerabilities found

1,000+

Unique attack scenarios

12x

Accelerated deployments

SECURITY YOU CAN TRUST

GDPR

COMPLIANT

CCPA

COMPLIANT

ISO 27001

CERTIFIED

SOC 2 TYPE II

COMPLIANT

OWASP

CONTRIBUTORS

Deploy secure AI Assistants and Agents with confidence.

Don’t wait for an incident to happen. Proactively identify and remediate your AI's vulnerabilities to ensure you're protected at all times.

SplxAI - Background Pattern

Deploy secure AI Assistants and Agents with confidence.

Don’t wait for an incident to happen. Proactively identify and remediate your AI's vulnerabilities to ensure you're protected at all times.

SplxAI - Background Pattern

Deploy secure AI Assistants and Agents with confidence.

Don’t wait for an incident to happen. Proactively identify and remediate your AI's vulnerabilities to ensure you're protected at all times.

SplxAI - Accelerator Programs
SplxAI Logo

For a future of safe and trustworthy AI.

Subscribe to our newsletter

By clicking "Subscribe" you agree to our privacy policy.

SplxAI - Accelerator Programs
SplxAI Logo

For a future of safe and trustworthy AI.

Subscribe to our newsletter

By clicking "Subscribe" you agree to our privacy policy.

SplxAI Logo

For a future of safe and trustworthy AI.

Subscribe to our newsletter

By clicking "Subscribe" you agree to our privacy policy.