01
AI Dev Workshop · Wallbox 2026

Uncovering
better ways of
developing

...yes, again.

Jorge Castro & Joan Leon

Questions about your day to day
Are you up for it?
QR Mentimeter
menti.com
4889 5833
Scan the QR or go to menti.com and enter the code
02
The key question

Our job is to solve problems,
not to write code.

🪚
The carpenter who rejects the milling machine
He is not more skilled. He is just slower.
⌨️
Devs in 2026
Will we say the same about agents?

The question is not whether to use the milling machine.
It is how to use it with mastery.

Manifesto for Agile Software Development

We are uncovering better ways of developing
software by doing it and helping others do it.
Through this work we have come to value:

Individuals and interactions over processes and tools

Working software over comprehensive documentation

Customer collaboration over contract negotiation

Responding to change over following a plan

That is, while there is value in the items on
the right, we value the items on the left more.

Kent Beck
Mike Beedle
Arie van Bennekum
Alistair Cockburn
Ward Cunningham
Martin Fowler
James Grenning
Jim Highsmith
Andrew Hunt
Ron Jeffries
Jon Kern
Brian Marick
Robert C. Martin
Steve Mellor
Ken Schwaber
Jeff Sutherland
Dave Thomas

Twelve Principles of Agile Software

04
Agile Manifesto · 2001

"We are UNCOVERING
better ways of
developing software"

— Agile Manifesto, 2001

It does not say "we have uncovered". It says "we are uncovering". A process that never ends.

05
METR Study · July 2025
16

senior developers

+5

years working on their own repos

246

real tasks from their own projects

Perception: 20% faster
Reality: 19% SLOWER

February 2026 — Claude Opus 4.6 completes tasks that would take an expert human ~15 hours, succeeding about one time in two.

Source: METR — Early 2025 AI-Experienced OS Dev Study

06
The pattern we need to break

It is not the AI. It is how we use it.

01
Vague prompts

Without clear context, the AI guesses. And when it guesses, it hallucinates.

02
No Memory

Every session starts from zero. It is like hiring someone new every day.

03
No iterative workflow

One single session to research, plan and implement. The context gets polluted.

04
No human review

Code goes to production without review. Nobody can review everything the AI generates.

AI amplifies what you give it. If you give it a disciplined process, it amplifies your productivity. If you give it chaos, it amplifies the chaos.

07
The map

Where are we in automation?

L1 · Code-level Completion
Tools: Copilot (inline), Tabby, Codeium, Supermaven
Autocomplete in the editor. The human writes, the AI suggests.
Maturity: Mainstream

L2 · Task-level Generation · Prompt to UI
Tools: ChatGPT, Claude (chat), Cursor, Windsurf, Aider, Cline, bolt.new, Lovable, v0
The human defines a task, the AI generates code/UI. The human approves each step.
Maturity: Adopted

Now · We are here · Task-level with Optional Autonomy
Tools: Claude Code, Codex CLI, Cursor (Agent Mode), Copilot Edits, Windsurf (Cascade)
L2 by default, configurable to chain autonomous actions (headless mode).
Maturity: Adopted (requires setup)

L3 · Ticket to PR · Self-healing CI
Tools: Claude Code (headless + CI), Codex (cloud agent), Copilot Coding Agent, Devin, Codegen
The agent receives an issue, generates code, creates a PR, iterates on CI failures. Escalates when out of scope.
Maturity: Early Adoption

L4 · AI Software Engineer
Tools: Devin (enterprise), Factory.ai, Genie
Full cycle: requirements → code → deploy → monitoring → rollback.
Maturity: Emerging

L5 · AI Development Teams
Tools: AutoDev, MetaGPT, MGX
Multiple specialized agents collaborating autonomously.
Maturity: Experimental
08
The 4 problems

What AI does wrong

01
No context

At the start of each session, the AI knows nothing about your code. Zero.
And it does not know when it does not know enough.

02
Hallucinations

It invents APIs, methods and dependencies that do not exist... and tells you with full confidence.

03
Not deterministic

The same prompt can produce different results every time you run it.

04
Hard to review

We generate code faster than our cognitive capacity allows us to review.

09
The flow we already know

Separating the phases is not optional with AI

01
Research
Context, constraints, decisions
02
Plan
Design, small steps, criteria
03
Implement in phases
Clean session, precise context
04
Validate
Tests, review, real feedback

Each phase needs a clean context window.
Keep the context clean or you will have problems.

10
We already know what works

The bottleneck has never been
lines of code per minute

01
Story Splitting

Split work into deployable vertical slices. With AI you can generate a lot in little time. If the scope is not cut, the chaos scales just as fast.

02
Hamburger Method

Deliver value end-to-end continuously. Cut the feature into layers, implement the thinnest slice.

03
Small safe steps

Each step must be reversible. Just because the agent can generate a lot very fast does not mean we should push it all to production at once.

04
Advanced testing

Mutation testing, acceptance testing, architectural testing. All of them strengthen the feedback loop, and now there are no excuses.

More generated code = more risk if you do not have a solid feedback loop.

11
Design for oversight

You cannot review every line.
Design systems that catch the errors.

Automated

Technical guardrails

  • Hooks: automatic linter on every tool use
  • Architecture tests: verify structure, not just functionality
  • Acceptance Testing: expected behavior defined upfront
  • Mature CI/CD: if it is not reliable with humans, it will be chaos with agents
  • Agent permissions: define what it can do without approval and what it cannot
Human judgment

Pre-merge checklist

  • Domain: the code reflects domain concepts, not technical jargon
  • Quality: do the tests fail when the behavior fails?
  • Ops: is there traceability if this fails in prod?
  • Debt: could I delete this and nobody would notice?
12
Agent security

Set up permissions.
Calibrate autonomy to risk.

The agent does without asking
  • Edit code
  • Run tests
  • Generate documentation
  • Create branches
  • Open draft PRs
vs
Always with your approval
  • Push to main
  • Modify CI/CD pipelines
  • Touch IAM / Terraform / infra
  • Access secrets or credentials
  • Self-approve its own PRs
Low risk → more autonomy. Start without asking permission.
High risk → more oversight. The harder to undo, the more human eyes needed.
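In Claude Code, this split can be expressed in `.claude/settings.json`. The fragment below is a sketch of the idea using allow / ask / deny rules; the exact schema and matcher syntax may differ between versions, so check the current documentation before copying it.

```json
{
  "permissions": {
    "allow": [
      "Edit",
      "Bash(npm test:*)",
      "Bash(git checkout -b *)"
    ],
    "ask": [
      "Bash(git push:*)"
    ],
    "deny": [
      "Read(.env)",
      "Bash(terraform *)"
    ]
  }
}
```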
13
Autonomy without oversight

Real example

ME
"Del plan .claude/plans/pending/2026… tengo un par de dudas"
(Spanish: "About the plan .claude/plans/pending/2026… I have a couple of questions")
CC
Action executed
Done, file deleted.
ME
Why did you delete the file?
CC
I made a mistake. I read "del" as the English delete command, but "del plan" is actually the Spanish preposition: you were referring to the plan, not asking me to delete it.

I am very sorry. Do you have the file's contents somewhere else, or is there any way to recover it?
14
How much autonomy?

Calibrate autonomy to risk

Internal prototype · if it fails: we lose time
Autonomy: you design, the AI implements. Do not worry about the code.
Before merging: unit + integration tests. Straight to main.

Product with users · if it fails: users are impacted
Autonomy: you design, the AI implements, you oversee the architecture.
Before merging: architecture + acceptance tests. Observability + AI-reviewed PR.

Business core · if it fails: money or trust
Autonomy: you design and review the code in critical modules.
Before merging: mutation testing + security scan. PR with human review.

Physical world · if it fails: real harm
Autonomy: you decide and oversee at a low level; the AI only assists.
Before merging: tests on real hardware. Mandatory safety review.
15
The transformation

Tools change.
Responsibility does not.

Changes
Who writes the code
The speed of generation
The format of knowledge
Stays the same
Simple architecture
Small safe steps
Define before you implement
16
Summary

4 ideas to take with you today

1

Your value is in understanding the problem, not in typing the solution.

2

AI does not improve your process. It amplifies it. If it is good, it goes faster. If it is bad, it fails faster.

3

An agent without configuration is a junior without onboarding. Give it context, rules and a way to verify its work.

4

Share what you learn — let us keep uncovering together.
