
Breaking Into Vuln-Bank: A Human-AI Hacking Adventure
⚠️ Important Context & Limitations
Vulnerable by Design: Vuln-Bank is an intentionally vulnerable application created for educational and training purposes. It contains deliberately implemented security flaws that would never exist in a properly designed production banking system. The extreme number and severity of vulnerabilities found (21 critical issues) reflect this intentional design rather than typical real-world applications.
AI Evolution: This assessment represents AI capabilities as of May 2025 using Claude Sonnet 4. As AI technology continues to advance rapidly, future versions will likely demonstrate even more sophisticated reasoning, pattern recognition, and vulnerability discovery capabilities. What you see here is the worst AI will ever be at cybersecurity - it will only get better from here.
Methodology Focus: The primary value of this assessment lies not in the specific vulnerabilities found, but in demonstrating the revolutionary potential of human-AI collaboration in cybersecurity. The strategic prompting framework, systematic execution approach, and evidence collection methodology represent reproducible techniques applicable to real-world security assessments.
Real-World Application: While Vuln-Bank's vulnerabilities are extreme, the human-AI collaboration methodology demonstrated here has been successfully applied to actual production systems, revealing genuine security issues in enterprise environments with significantly more sophisticated defenses.
The Genesis: A NaijaSecForce Conversation
It all started two weeks ago during one of our regular NaijaSecForce group discussions. We were deep into exploring AI use cases across our different companies when the conversation turned to cybersecurity applications. As we shared experiences about integrating AI into our security workflows, I found myself describing a concept that had been brewing in my mind.
"Think about it," I said to the group, "while you're intercepting requests with Burp Suite, why not let AI go loose on the APIs and applications? But not in the traditional way - not just throwing a series of payloads blindly at targets."
I explained my "Plan-to-Exploit" methodology - a controlled approach I'd been playing around with for the past 8 months. The concept was simple: AI first enumerates the target, understands its architecture, then comes up with a detailed testing plan including exclusions and specific focus areas. This plan then becomes custom instructions for everything the AI needs to do during the assessment.
In some cases, I feed Cline a Postman collection file to work with directly. In others, I let Burp Suite handle the crawling to map out all the endpoints and capture the requests and responses. I then extract the relevant request paths and feed them into Cline for focused testing.
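To make the "extract the relevant request paths" step concrete, here is a minimal sketch of pulling method and URL pairs out of a Postman collection before handing them to the AI. It assumes the Postman Collection v2.x layout (nested `item` folders, with `request.url` given either as a string or as an object with a `raw` field). The sample collection and its `/login` and `/transfer` paths are hypothetical placeholders, not the real Vuln-Bank endpoint list.

```python
def extract_endpoints(items, found=None):
    """Recursively walk Postman 'item' entries and collect (method, url) pairs."""
    found = [] if found is None else found
    for item in items:
        if "item" in item:
            # A folder: recurse into its children
            extract_endpoints(item["item"], found)
        elif "request" in item:
            request = item["request"]
            if isinstance(request, str):
                # The format also allows a bare URL string, which implies GET
                found.append(("GET", request))
                continue
            url = request.get("url", "")
            raw = url.get("raw", "") if isinstance(url, dict) else url
            found.append((request.get("method", "GET"), raw))
    return found

# Hypothetical sample; a real run would use json.load() on the exported file
collection = {
    "info": {"name": "Vuln-Bank (hypothetical sample)"},
    "item": [
        {"name": "Auth", "item": [
            {"name": "Login",
             "request": {"method": "POST",
                         "url": {"raw": "http://localhost:5050/login"}}},
        ]},
        {"name": "Transfer",
         "request": {"method": "POST",
                     "url": "http://localhost:5050/transfer"}},
    ],
}

endpoints = extract_endpoints(collection["item"])
for method, url in endpoints:
    print(method, url)
```

The resulting list is what gets pasted into the testing plan, so the AI works from a fixed scope instead of guessing at endpoints.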
The group was intrigued. "So instead of automated chaos, you get systematic intelligence?" one member asked. Exactly. The AI becomes your strategic partner, not just another tool.
But let me be honest - this journey hadn't been smooth. Over the past 8 months, I'd experimented with various AI models, each presenting unique challenges:
- Claude 3 Sonnet: My first serious attempt, but constantly hit walls with "I can't perform penetration testing without proper authorization"
- Claude 3 Opus: More capable, but still overly cautious about security testing
- Uncensored models: Explored alternatives to bypass constraints, but lacked the reasoning sophistication needed
- Claude 3.5 Sonnet: Better reasoning, but still had authorization concerns
- Claude Sonnet 4: The breakthrough - finally found the perfect balance of advanced reasoning, capability, and cooperation
The key wasn't just finding the right model - it was developing the right prompting strategies, the right context setting, and the right way to frame security testing as legitimate research rather than malicious activity. Claude Sonnet 4 proved to be the game-changer with its superior reasoning capabilities and willingness to engage in complex security scenarios when properly contextualized.
That conversation with NaijaSecForce reminded me I had the perfect test case sitting right there: Vuln-Bank, a deliberately vulnerable banking application developed by Al Amir Badmus (available at github.com/Commando-X/vuln-bank). Time to put 8 months of methodology refinement to the ultimate test with Claude Sonnet 4 as my AI partner.
The Target: A Digital Bank's Worst Nightmare
🎯 TARGET ACQUIRED
Application: Vuln-Bank
URL: http://localhost:5050
Type: Complete Black-Box Assessment
Knowledge Level: Zero (URL only)
Mission: Find and exploit every vulnerability
Complete Vulnerability Inventory
Here's a comprehensive list of all 21 critical vulnerabilities discovered during our assessment:
Bill Payment Vulnerabilities
- Negative Bill Payments: The application accepts negative payment amounts, enabling financial fraud
- Transaction History Exposure: The application exposes payment history of other users, enabling unauthorized access to sensitive financial information
- Race Conditions in Payment Processing: The application processes concurrent payment requests without proper isolation, allowing balance checks to be bypassed
- Missing Payment Limits: The application has no rate limiting or transaction count restrictions, enabling automated attacks
- Predictable Reference Numbers: The application uses sequential reference numbers for bill payments, enabling payment enumeration and forgery
- BOLA in Payment History Access: The application allows accessing payment details of other users through query parameters
File Operation Vulnerabilities
- Unrestricted File Upload: The application allows uploading files with dangerous extensions, enabling remote code execution
- No File Type Validation: The application accepts any file type without validation, enabling malware distribution
- No File Size Limits: The application has no file size restrictions, enabling denial of service attacks
- Path Traversal Vulnerabilities: The application inadequately sanitizes filenames with directory traversal sequences
- Unsafe File Naming: The application uses simple replacement for special characters in filenames, potentially enabling XSS
Authentication Vulnerabilities
- SQL Injection in Login Endpoint: The login endpoint is vulnerable to SQL injection, allowing complete authentication bypass
- Trivial Password Reset Bypass: Password resets use only 3-digit PINs with no rate limiting, enabling account takeover
- JWT Secret Exposure: The application exposes JWT secrets in debug logs, enabling token forgery
Transaction Vulnerabilities
- Negative Amount Transfers: The application accepts negative transfer amounts, allowing unlimited fund generation
- No Validation on Recipient Accounts: The application doesn't validate recipient accounts, enabling money laundering
- Race Conditions in Transfers: The application is vulnerable to race conditions, allowing transfers that exceed the available balance
- No Transaction Limits: The application has no transaction limits or rate limiting, enabling automated attacks
- Transaction History Information Disclosure: The application exposes transaction history and is vulnerable to SQL injection
Other Critical Vulnerabilities
- Complete Payment Card Data Exposure: The application returns complete, unmasked card numbers and CVV codes in API responses
- Debug Mode Enabled: The application runs with debug mode enabled, exposing sensitive information in logs
Picture this: You're handed a single URL - http://localhost:5050 - and told it's a banking application called "Vuln-Bank." That's it. No source code, no documentation, no insider knowledge. Just a web address and the challenge to uncover its secrets.
This is the essence of black-box penetration testing - approaching a target with the same level of knowledge as a real-world attacker. But this time, I wasn't going in alone. I had a secret weapon: Cline, an AI assistant powered by Claude Sonnet 4, ready to help me systematically tear apart this digital fortress.
Meet My AI Partner: Cline (Claude Sonnet 4)
Cline isn't your typical security scanner. While traditional tools mindlessly throw payloads at applications, Cline thinks, adapts, and learns. Powered by Claude Sonnet 4's advanced reasoning capabilities, it's like having a brilliant senior penetration tester who never gets tired, never misses a detail, and can execute complex attack chains with surgical precision.
The beauty of human-AI collaboration in pentesting isn't about replacing human expertise - it's about amplifying it. I provide the strategic direction, the business context, and the ethical boundaries. Cline provides the systematic execution, the tireless attention to detail, and the ability to process vast amounts of information without fatigue.
First Contact: Reconnaissance
With that simple command, Cline sprang into action. Within minutes, it had mapped the entire application structure, identified the technology stack, and discovered something that made my blood run cold:
Debug mode enabled in production? That's like leaving the bank vault door wide open with a sign saying "Free Money Inside." But this was just the beginning of our digital heist.
The Human-AI Dance: Strategic Prompting
The key to effective AI-powered pentesting lies in strategic prompting. It's not about giving the AI a list of vulnerabilities to check - it's about guiding it to think like an attacker while maintaining the systematic approach of a professional security assessment.
I guided Cline to understand the application's purpose, technology stack, and basic functionality.
Together, we developed a comprehensive testing plan covering 80+ test cases across 8 major vulnerability categories.
Cline systematically executed each test case while I provided strategic guidance and validation.
We chained vulnerabilities together to achieve maximum impact and demonstrate real-world attack scenarios.
The First Crack: SQL Injection Gold Mine
Every penetration tester knows that feeling when you find your first vulnerability. But what Cline discovered wasn't just a vulnerability - it was the master key to the entire kingdom.
Within seconds, Cline had crafted the perfect payload:
The response was immediate and devastating. Not only did we bypass authentication completely, but the debug logs revealed something that made my jaw drop:
CRITICAL: Complete System Exposure
Exposed in plain text: Admin password ('hacked123'), account balance ($999,800), reset PIN ('393'), complete database structure, and a valid JWT token!
In a real-world scenario, this single vulnerability would have given us complete control over the bank's systems. But we were just getting started.
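The exact payload Cline used isn't reproduced here, so the following is only a generic illustration of the class of flaw: a login query built by string concatenation, and the parameterized version that closes the hole. It runs against an in-memory SQLite database. The table layout, column names, and the `admin' --` payload are my assumptions for the sketch, not Vuln-Bank's actual schema or the payload from the assessment.

```python
import sqlite3

# In-memory stand-in for the application's user table (hypothetical schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('admin', 'not-the-real-password', 1)")

def login_vulnerable(username, password):
    # User input is pasted straight into the SQL text
    query = (
        "SELECT username FROM users "
        f"WHERE username = '{username}' AND password = '{password}'"
    )
    return conn.execute(query).fetchone()

def login_parameterized(username, password):
    # Placeholders keep user input as data, never as SQL syntax
    query = "SELECT username FROM users WHERE username = ? AND password = ?"
    return conn.execute(query, (username, password)).fetchone()

# Classic comment-based bypass: the password check is commented out
payload = "admin' --"
print(login_vulnerable(payload, "anything"))     # ('admin',)
print(login_parameterized(payload, "anything"))  # None
```

With the concatenated query, everything after `--` becomes a comment, so the password condition never runs. With placeholders, the same input is just an unusual username that matches nothing.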
The Money Printer: Business Logic Nightmare
With admin access secured, Cline turned its attention to the core banking functionality. What it discovered next defied belief.
The results were catastrophic. Cline discovered that the application would accept negative transfer amounts, effectively turning the transfer function into a money printing machine:
💰 UNLIMITED MONEY GENERATION CONFIRMED
With this vulnerability, an attacker could generate infinite funds, causing unlimited financial damage to the institution.
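As a rough sketch of why a negative amount turns a transfer into a money printer, consider a handler whose only check is "does the sender have enough?". How Vuln-Bank books a transfer internally isn't shown in this article, so the debit-then-credit logic, the account names, and the balances below are assumptions made purely for illustration.

```python
from decimal import Decimal

def transfer_vulnerable(balances, sender, recipient, amount):
    # The only check is sufficiency, and a negative amount always passes it
    if balances[sender] < amount:
        raise ValueError("insufficient funds")
    balances[sender] -= amount      # subtracting a negative adds money
    balances[recipient] += amount   # adding a negative removes money

def transfer_validated(balances, sender, recipient, amount):
    if amount <= 0:
        raise ValueError("amount must be positive")
    if recipient not in balances:
        raise ValueError("unknown recipient account")
    if balances[sender] < amount:
        raise ValueError("insufficient funds")
    balances[sender] -= amount
    balances[recipient] += amount

balances = {"attacker": Decimal("100"), "victim": Decimal("500")}
transfer_vulnerable(balances, "attacker", "victim", Decimal("-1000"))
print(balances["attacker"], balances["victim"])  # 1100 -500

try:
    transfer_validated(balances, "attacker", "victim", Decimal("-1000"))
except ValueError as exc:
    print("rejected:", exc)  # rejected: amount must be positive
```

The validated version also rejects unknown recipients, which covers the missing recipient check from the inventory above.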
The Transaction Nightmare: More Financial Exploits
Continuing our methodical approach, Cline discovered several more critical vulnerabilities in the transaction system:
CRITICAL: No Validation on Recipient Accounts
CRITICAL: Race Conditions in Transfers
CRITICAL: Transaction History Information Disclosure
HIGH: No Transaction Limits
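The race condition finding comes down to a check-then-act gap: the balance check and the debit are not one atomic step. The sketch below reproduces that gap deterministically with a `threading.Barrier`, which forces every simulated request to finish its balance check before any of them debits. It is a toy model with a hypothetical `Account` class, not the application's code. In a real database-backed service the equivalent fix would typically be a transaction with row locking.

```python
import threading

class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()

def withdraw_racy(account, amount, barrier):
    has_funds = account.balance >= amount  # check
    barrier.wait()                         # every request checks before any debit
    if has_funds:
        with account.lock:                 # lock guards the write only, not the check
            account.balance -= amount      # act on a stale check

def withdraw_atomic(account, amount, barrier):
    barrier.wait()
    with account.lock:                     # check and debit in one critical section
        if account.balance >= amount:
            account.balance -= amount

def run(worker, requests=5):
    """Fire several concurrent 100-unit withdrawals at an account holding 100."""
    account = Account(100)
    barrier = threading.Barrier(requests)
    threads = [
        threading.Thread(target=worker, args=(account, 100, barrier))
        for _ in range(requests)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance

print(run(withdraw_racy))    # -400: all five requests passed the check
print(run(withdraw_atomic))  # 0: only one request succeeds
```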
The Bill Payment Disaster: Financial Controls Bypass
Continuing our systematic approach, Cline turned its attention to the bill payment functionality. What it discovered was yet another set of critical vulnerabilities that could be exploited for financial fraud.
The results were shocking. Cline discovered multiple critical vulnerabilities in the bill payment system:
CRITICAL: Negative Bill Payments
CRITICAL: Transaction History Exposure
CRITICAL: Race Conditions & No Payment Limits
CRITICAL: Predictable Reference Numbers
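To show why sequential reference numbers matter, here is a small sketch comparing a counter-based generator with one backed by Python's `secrets` module. The `BILL` prefix and the starting value are made up for the example. The article doesn't show Vuln-Bank's real reference format.

```python
import secrets
from itertools import count

_counter = count(1001)  # hypothetical starting value

def sequential_reference():
    # Predictable: seeing one reference reveals its neighbours
    return f"BILL{next(_counter)}"

def random_reference():
    # 8 random bytes (64 bits) from the OS CSPRNG, hex-encoded
    return "BILL" + secrets.token_hex(8).upper()

observed = sequential_reference()
guess = f"BILL{int(observed[4:]) + 1}"
actual_next = sequential_reference()
print(observed, guess, actual_next)  # BILL1001 BILL1002 BILL1002

print(random_reference())  # different on every run
```

An unguessable reference is not a substitute for an ownership check on the payment record, but it removes the cheap enumeration path.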
The Card Catastrophe: PCI DSS Nightmare
As if unlimited money generation wasn't enough, Cline's systematic approach uncovered another devastating flaw in the virtual card system.
The results violated every principle of payment card security:
CRITICAL: Complete Payment Card Data Exposure
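For contrast, this is roughly what a safer API response could look like: the card number masked down to its last four digits and the CVV left out entirely. The field names and the card record are hypothetical (the number is the widely used 4111... test value), and this is a sketch of the idea rather than a statement of everything PCI DSS requires.

```python
def mask_card(card):
    """Return a response-safe view of a card record (hypothetical field names)."""
    digits = card["number"].replace(" ", "")
    return {
        "holder": card["holder"],
        "number": "*" * (len(digits) - 4) + digits[-4:],  # keep last four only
        "expiry": card["expiry"],
        # CVV intentionally omitted: it should never be echoed back
    }

record = {
    "holder": "Test User",
    "number": "4111 1111 1111 1111",  # well-known test card number
    "expiry": "12/30",
    "cvv": "123",
}

print(mask_card(record))
# {'holder': 'Test User', 'number': '************1111', 'expiry': '12/30'}
```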
The AI Advantage: Systematic Destruction
What made this assessment truly remarkable wasn't just the vulnerabilities we found, but how we found them. Cline's systematic approach ensured we didn't miss anything:
While I provided strategic direction and business context, Cline executed with machine-like precision:
- Authentication Testing: SQL injection, weak passwords, session management
- Authorization Testing: Privilege escalation, access control bypasses
- Input Validation: Injection attacks, business logic flaws
- Data Security: Information disclosure, encryption weaknesses
- Session Management: Token manipulation, session hijacking
- Business Logic: Financial transaction flaws, workflow bypasses
- Bill Payment Processing: Amount validation, transaction limits, reference number generation
- File Operations: Upload validation, file type checking, path traversal
The Complete Compromise: Chaining Attacks
The true power of human-AI collaboration became evident when we started chaining vulnerabilities together. What began as individual security flaws became a complete system takeover:
SQL injection bypass → Admin JWT token → System access
Negative transfers → Unlimited money generation → Virtual card creation
Admin panel access → Complete user database → Payment card data
Password reset bypass → Account takeover → Permanent access
The Hidden Admin Panel: Secret Backdoor
Just when we thought we'd seen everything, Cline made another shocking discovery:
This hidden endpoint exposed the complete user database, including account balances, personal information, and administrative status - a treasure trove for any attacker.
The Weak Link: 3-Digit PIN Catastrophe
As if the application couldn't get any worse, Cline discovered that password resets used only 3-digit PINs. But what happened next showcased the true power of AI-driven security testing.
Within seconds, Cline had automatically generated and executed a complete brute force attack:
CRITICAL: Trivial Password Reset Bypass
PIN Range: 000-999 (only 1,000 possibilities)
Rate Limiting: None
Auto-Generated Payloads: 1,000 combinations in milliseconds
Time to Crack: 0.8 seconds (PIN '393' found on attempt 394)
Impact: Complete account takeover for any user
The beauty of AI-powered testing was evident here. While a human tester might manually try a few common PINs or write a custom script, Cline instantly recognized the vulnerability pattern, auto-generated the complete payload set, and executed a systematic brute force attack - all within seconds of identifying the weakness.
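The arithmetic behind those numbers is easy to check offline. The sketch below enumerates the 3-digit PIN space against a local stand-in for the reset check (no network calls, and the checker function is hypothetical), then shows how even a basic lockout stops the same loop. Counting from 000, PIN 393 is indeed the 394th candidate.

```python
def make_reset_checker(secret_pin):
    """Local stand-in for the server-side PIN comparison (hypothetical)."""
    def check(pin):
        return pin == secret_pin
    return check

def with_lockout(check, max_failures=5):
    """Wrap a checker so it refuses to answer after too many wrong guesses."""
    failures = 0
    def guarded(pin):
        nonlocal failures
        if failures >= max_failures:
            raise PermissionError("too many failed attempts")
        ok = check(pin)
        if not ok:
            failures += 1
        return ok
    return guarded

def brute_force(check):
    # Walk the full space 000..999 in order, counting attempts from 1
    for attempt, n in enumerate(range(1000), start=1):
        pin = f"{n:03d}"
        if check(pin):
            return pin, attempt
    return None, 1000

pin, attempts = brute_force(make_reset_checker("393"))
print(pin, attempts)  # 393 394

try:
    brute_force(with_lockout(make_reset_checker("393")))
except PermissionError as exc:
    print("blocked:", exc)  # blocked: too many failed attempts
```

A longer, random reset token combined with attempt limiting would address both halves of the finding.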
The File Upload Fiasco: Remote Code Execution
As we continued our systematic assessment, Cline discovered yet another critical vulnerability in the profile picture upload functionality.
The results were alarming. Cline discovered that the application had no protection against malicious file uploads:
CRITICAL: Unrestricted File Upload
CRITICAL: No File Type Validation
HIGH: No File Size Limits
HIGH: Path Traversal Vulnerabilities
MEDIUM: Unsafe File Naming
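Most of these upload findings can be addressed at a single choke point. The sketch below is one possible shape for it, assuming a profile-picture feature that only needs common image types: enforce a size cap, strip directory components, allowlist the extension, and store the file under a server-generated name. The extension list and the 2 MB limit are illustrative choices of mine, not values taken from Vuln-Bank.

```python
import os
import uuid

ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif"}  # assumed allowlist
MAX_BYTES = 2 * 1024 * 1024                             # assumed 2 MB cap

def safe_upload_name(original_name, data):
    """Validate an upload and return a server-generated storage name."""
    if len(data) > MAX_BYTES:
        raise ValueError("file too large")
    # Normalize Windows separators, then drop any directory components
    base = os.path.basename(original_name.replace("\\", "/"))
    ext = os.path.splitext(base)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"extension {ext!r} not allowed")
    # Never reuse the client-supplied name on disk
    return uuid.uuid4().hex + ext

print(safe_upload_name("../../static/avatar.PNG", b"\x89PNG..."))  # <32 hex chars>.png

for bad in ("shell.php", "../../etc/passwd"):
    try:
        safe_upload_name(bad, b"data")
    except ValueError as exc:
        print("rejected:", exc)
```

An extension check alone does not prove the content is an image, so a production version would likely also inspect the file's bytes and serve uploads from a location where nothing is executed.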
The Debug Disaster: System Confession
Throughout our entire assessment, the application was confessing its sins in real-time through debug logs:
Every action we took was logged with complete technical details, exposing:
- Database queries and results
- Plaintext passwords and PINs
- JWT secrets and file paths
- Card numbers and CVVs
- System architecture details
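Of everything in that list, the JWT secret is the item that converts a log leak directly into access. As a minimal sketch, assuming the tokens are signed with HS256 (the algorithm isn't stated in this article), anyone who knows the secret can mint a token with whatever claims they like, and the server cannot tell it from a genuine one. The secret value and claim names below are hypothetical.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def _signature(signing_input: str, secret: str) -> str:
    digest = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return b64url(digest)

def sign_hs256(claims: dict, secret: str) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    ]
    signing_input = ".".join(parts)
    return signing_input + "." + _signature(signing_input, secret)

def verify_hs256(token: str, secret: str) -> dict:
    signing_input, _, signature = token.rpartition(".")
    if not hmac.compare_digest(signature, _signature(signing_input, secret)):
        raise ValueError("bad signature")
    payload = signing_input.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))

leaked_secret = "secret-seen-in-debug-log"  # hypothetical value
forged = sign_hs256({"username": "admin", "is_admin": True}, leaked_secret)

# The server-side check accepts the forged token as genuine
print(verify_hs256(forged, leaked_secret))  # {'username': 'admin', 'is_admin': True}
```

This is why a leaked signing secret has to be rotated, not just removed from the logs: every token signed with it stays forgeable until the key changes.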
The Business Impact
This wasn't just a technical exercise - it was a demonstration of how a real-world attack could devastate a financial institution. The combination of unlimited money generation, complete data exposure, and regulatory violations would have been catastrophic for any real bank.
The AI Revolution in Pentesting
This assessment proved that human-AI collaboration represents the future of cybersecurity testing. Here's what made it so effective:
What Cline Brought to the Table:
- Systematic Coverage: Never missed a test case or vulnerability category
- Pattern Recognition: Quickly identified vulnerable code patterns
- Contextual Understanding: Understood how vulnerabilities could be chained
- Real-time Adaptation: Adjusted strategies based on discoveries
- Evidence Collection: Automatically documented every finding with proof
- Business Impact Analysis: Calculated precise financial risk
What Human Expertise Provided:
- Strategic Direction: Focused AI efforts on high-impact areas
- Business Context: Understood real-world implications
- Quality Validation: Ensured findings were accurate and relevant
- Ethical Boundaries: Maintained responsible disclosure practices
- Creative Thinking: Guided AI to consider novel attack vectors
The Aftermath: Lessons Learned
This black-box assessment of Vuln-Bank revealed more than just vulnerabilities - it demonstrated the transformative potential of human-AI collaboration in cybersecurity. In just 4 hours, we achieved what would traditionally take weeks:
- Complete system compromise through systematic vulnerability discovery
- Detailed business impact analysis with precise financial calculations
- Comprehensive remediation guidance with implementation examples
- Professional-grade documentation suitable for all stakeholders
The Future of Security Testing
As I reflect on this assessment, one thing is clear: the future of penetration testing lies not in replacing human security professionals, but in augmenting their capabilities with AI tools that enable faster, more thorough, and more accurate security assessments.
The combination of human intuition and AI precision creates a powerful synergy that can uncover vulnerabilities that might be missed by either approach alone. Strategic prompting, systematic execution, and continuous validation create a methodology that is both efficient and effective.
Conclusion: The Digital Heist Complete
Our black-box assessment of Vuln-Bank at http://localhost:5050 provided valuable insights into both the application's vulnerabilities and the potential of human-AI collaboration in penetration testing. Starting with nothing but a URL, Cline assessed every security control, exposed every vulnerability, and demonstrated complete system compromise.
The 21 critical vulnerabilities we discovered, the potential business impact, and the complete regulatory compliance failures paint a picture of an application that was fundamentally broken from a security perspective.
But more importantly, this assessment proved that the future of cybersecurity lies in human-AI collaboration. By combining strategic human guidance with AI systematic execution, we can achieve unprecedented efficiency and thoroughness in vulnerability discovery.
As cyber threats continue to evolve, so too must our approaches to defending against them. The methodology demonstrated in this assessment - strategic prompting, systematic execution, and continuous validation - represents a new paradigm in cybersecurity that will help organizations stay ahead of attackers in an increasingly complex digital landscape.
PS: Another interesting fact - this article was drafted by Cline after the assessment, which explains the slightly enthusiastic tone.
About the Author
Rotimi Akinyele is a cybersecurity leader, ethical hacker, and AI security pioneer. As Nigeria's first OSCE/OSCP and a Harvard-trained cybersecurity strategist, he leads security for a global fintech platform protecting over 3 million customers and processing $600+ billion monthly.
This assessment represents his approach to human-AI collaboration in cybersecurity, demonstrating how strategic prompting and systematic instruction of AI assistants can revolutionize penetration testing effectiveness.
Connect: NaijaSecForce.com | LinkedIn
Note: This assessment was conducted on a controlled test environment for educational purposes. All vulnerabilities and techniques described are for professional development and security awareness.
© 2025 Rotimi Akinyele. Educational and professional use authorized.