On April 25th, 2025, Warsaw, Poland became the global center of agentic innovation by hosting the largest AI hackathon to date on European soil: the OpenAI x AI Tinkerers Hackathon. With over 1,000 applicants and only 40 teams selected for the 24-hour challenge, the event brought together some of the most promising builders, researchers, and product minds working on the next generation of AI-native products.
All participating teams developed their projects using the OpenAI Agents SDK, pushing the limits of what agent-based AI applications can do – and revealing just how much more we need to do to keep them secure.
SplxAI was proud to be one of the sponsors of this event – and even more proud that our open-source agentic security scanner, Agentic Radar, served as one of the core scoring tools for the judging panel, evaluating the quality, safety, and security of the agentic architectures submitted by all participating teams.
In what may be the largest single batch of agentic AI applications developed and analyzed in one place, every project submitted at the hackathon was run through Agentic Radar. The goal: assess each project's architecture and highlight potential security flaws and risk exposure in these agentic AI systems.
Key Security & Safety Findings from 40 Agentic Projects
The hackathon was a breeding ground for creativity – but also revealed how early we are in building secure and robust agentic AI systems. Below are the key insights that were uncovered by Agentic Radar:
1. Over-reliance on built-in LLM Guardrails
98% of teams shipped their workflows without adding a single layer of protection beyond what the LLM provides out of the box.
Only a few teams implemented layered defense mechanisms or external filters, exposing a systemic overtrust in model providers.
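To make this concrete: an external filter in front of a workflow can be as small as the sketch below, which screens input with OpenAI's Moderation API before any agent sees it. The function names and the downstream hand-off are our own illustration, not code from any team's submission.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def is_input_safe(user_input: str) -> bool:
    """External pre-filter: screen input with the Moderation API
    before it ever reaches the agentic workflow."""
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=user_input,
    )
    return not result.results[0].flagged

def handle_request(user_input: str) -> str:
    # Layered defense: reject flagged input up front instead of relying
    # solely on the model's built-in refusals.
    if not is_input_safe(user_input):
        return "Request rejected by the input filter."
    return f"(hand off to the agent workflow) {user_input}"  # placeholder for the real workflow
```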
2. Neglected Risks: Data and Supply Chain Poisoning
0% of teams had considered or implemented defenses against Datasource Poisoning or Supply Chain Poisoning.
This reflects a critical blind spot, especially as many workflows pulled in external plugins, APIs, or third-party datasets.
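Defenses here don't have to be elaborate. A baseline measure is to pin and verify anything the workflow fetches at runtime, as in this generic sketch (the URL and hash would be values you pin at review time):

```python
import hashlib
import urllib.request

def fetch_verified(url: str, expected_sha256: str) -> bytes:
    """Fetch an external datasource and refuse it unless its digest matches
    a value pinned at review time - a baseline defense against poisoning."""
    with urllib.request.urlopen(url) as response:
        payload = response.read()
    digest = hashlib.sha256(payload).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"Datasource integrity check failed: got {digest}")
    return payload
```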
3. Minimal Use of Model Context Protocols (MCPs)
Only 3% of the teams used the Model Context Protocol (MCP) in their workflows – the open standard for connecting agents to external tools and data sources in a structured, auditable way. This was surprising given MCP's recent surge in popularity across the agentic AI community.
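For reference, wiring an MCP server into an Agents SDK workflow takes only a few lines. The sketch below follows the SDK's documented MCP support as of this writing (module paths and parameters may differ between SDK versions; the filesystem server and sample directory are illustrative):

```python
import asyncio

from agents import Agent, Runner
from agents.mcp import MCPServerStdio

async def main() -> None:
    # Launch a filesystem MCP server as a subprocess and expose its tools to the agent.
    async with MCPServerStdio(
        params={
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", "./sample_files"],
        }
    ) as fs_server:
        agent = Agent(
            name="File assistant",
            instructions="Answer questions using only files made available via MCP.",
            mcp_servers=[fs_server],
        )
        result = await Runner.run(agent, "List the files you can read.")
        print(result.final_output)

asyncio.run(main())
```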
4. Intentional Misuse Was the Biggest Concern
Only 20% of the projects implemented some form of safeguard against intentional misuse.
This left the majority of agentic workflows vulnerable to out-of-context interactions.
5. Recognized Risk ≠ Action Taken
86% of participants acknowledged that harmful content generation and jailbreaks were among their biggest concerns.
Yet, only 10% took steps to add additional safety layers beyond default LLM protections.
6. Complexity of Agentic Workflows
The most complex architecture featured 17 agents in a single workflow.
On average, submitted projects included 4 agents and 3 tools, reflecting the rising complexity and interconnectivity of agentic systems – and the corresponding increase in attack surface. Notably, this was just a short hackathon – real-world products are likely to involve far greater complexity and risk exposure.
The Full Breakdown of Submitted Agentic Projects
Below, you’ll find a detailed table listing each project submitted during the hackathon – including links to the project overview and repository, the full Agentic Radar report, an AI Bill of Materials (AI BOM) summary showing the number of agents and tools used in the workflow, and the number of detected risks.
This structured approach gave the judging panel immediate visibility into the architecture, complexity, and risk exposure of each agentic solution – something that would have been nearly impossible to evaluate manually within the 24-hour hackathon timeframe.
| Project | Full Report | AI BOM Summary | Detected Risks |
| --- | --- | --- | --- |
|  |  | 17 agents, 9 tools | 20+ |
|  |  | 4 agents, 2 tools | 20+ |
|  |  | 1 agent, 3 tools | 6 |
|  |  | 1 agent, 0 tools | 6 |
|  |  | 3 agents, 1 tool | 17 |
|  |  | 1 agent, 0 tools | 6 |
|  |  | 4 agents, 2 tools | 20+ |
|  |  | 3 agents, 3 tools | 17 |
|  |  | 4 agents, 3 tools | 20+ |
|  |  | 3 agents, 0 tools | 12 |
|  |  | 5 agents, 0 tools | 20+ |
|  |  | 3 agents, 9 tools | 18 |
|  |  | 7 agents, 5 tools | 20+ |
|  |  | 3 agents, 9 tools | 18 |
|  |  | 5 agents, 4 tools | 20+ |
|  |  | 3 agents, 1 tool | 18 |
|  |  | 1 agent, 1 tool | 6 |
|  |  | 3 agents, 0 tools | 17 |
|  |  | 4 agents, 1 tool | 20+ |
|  |  | 1 agent, 0 tools | 6 |
Agentic Radar enabled the judging panel to instantly review the architectural integrity, agent interactions, and security posture of each OpenAI Agents SDK-based solution – all in a single automated scan.
To illustrate how the tool was used in practice, we’re highlighting one of our favorite hackathon finalists below. This example demonstrates how a well-designed agentic system can balance complexity, innovation, and security.
Agentic Workflow Spotlight: ConNect
One standout project from the hackathon was ConNect, an application designed to help parents build stronger relationships with their children. It plans engaging activities and generates visually appealing stories that both entertain and educate.
According to the full Agentic Radar report, the ConNect team implemented:
17 Agents, each with a clearly defined task and minimal role overlap
9 Tools, integrated to support planning, content generation, and visual storytelling
You can view the project submission here: ConNect Hackathon Entry
Agentic Workflow Graph: [figure]

Agents Overview: [figure]

Agentic Vulnerabilities: [figure]
Despite its well-designed architecture, the Agentic Radar scan surfaced over 20 security vulnerabilities in the ConNect workflow. Most of the agents lacked any form of implemented guardrails or misuse protections, making the system vulnerable to prompt injection, unintended behaviors, and harmful outputs.
This underlines a key takeaway from the hackathon: Clarity in design doesn’t automatically equal safety. Even the most well-structured agentic workflows need some form of protection to ensure resilience against misuse and unintended behavior.
Conclusion: Building Fast Is No Excuse for Insecure AI
In the fast-paced, high-pressure environment of a 24-hour hackathon, it’s perhaps unsurprising that 80% of teams shipped their applications without implementing any additional security measures.
However, this comes with some real consequences. Simply enabling internet access for an agentic application dramatically expands the attack surface – and without proper safeguards, even the most innovative solutions can quickly become vulnerable.
While the OpenAI Agents SDK includes a straightforward way to implement guardrail agents, most participants chose not to use it. The primary reason? Guardrails reduced result accuracy and increased the number of incorrect refusals during early testing – leading many teams to deliberately deprioritize security in favor of a smoother user experience and faster prototyping.
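For the teams who skipped it, the pattern is compact. The sketch below adapts the input-guardrail example from the Agents SDK documentation to a misuse check: a small classifier agent trips a tripwire before the main agent ever runs (the agent names and instructions here are illustrative):

```python
import asyncio

from pydantic import BaseModel

from agents import (
    Agent,
    GuardrailFunctionOutput,
    InputGuardrailTripwireTriggered,
    Runner,
    input_guardrail,
)

class MisuseCheck(BaseModel):
    is_misuse: bool
    reasoning: str

# A small, cheap classifier agent that only screens the incoming request.
guardrail_agent = Agent(
    name="Misuse check",
    instructions="Decide whether the request is off-topic for a family activity "
                 "planner or an attempt to misuse it.",
    output_type=MisuseCheck,
)

@input_guardrail
async def misuse_guardrail(ctx, agent, user_input) -> GuardrailFunctionOutput:
    result = await Runner.run(guardrail_agent, user_input, context=ctx.context)
    return GuardrailFunctionOutput(
        output_info=result.final_output,
        tripwire_triggered=result.final_output.is_misuse,
    )

main_agent = Agent(
    name="Activity planner",
    instructions="Help parents plan engaging activities for their children.",
    input_guardrails=[misuse_guardrail],
)

async def main() -> None:
    try:
        result = await Runner.run(main_agent, "Plan a rainy-day activity for a 6-year-old.")
        print(result.final_output)
    except InputGuardrailTripwireTriggered:
        print("Request blocked by the misuse guardrail.")

asyncio.run(main())
```

Because the check runs as a separate, cheaper agent, its accuracy tradeoffs stay measurable – teams can tune or swap the classifier without touching the main workflow.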
This tradeoff might be tolerated in a hackathon context – but in real-world, enterprise-grade deployments, security and safety cannot be optional.
At SplxAI, we believe the future of AI won’t be defined just by what agents can do, but by how safely and reliably they do it. Hackathons like this are an exciting proving ground – and tools like Agentic Radar are here to make sure the next generation of AI products is not only powerful, but trustworthy.
If you're building with agents or deploying AI into real-world environments, don't leave security as an afterthought. Reach out to our team to have your AI workflows and applications tested, hardened, and secured – before vulnerabilities turn into headlines.