The 10x AI Developer Myth: What Actually Gets Automated (And What Doesn't)
"AI will replace developers" makes for great headlines. It also makes for bad strategy if you actually believe it.
We've spent months building an AI software engineer. Here's what we've learned about what it can do, what it can't, and why the nuance matters more than the hype.
What Vince Handles Well
Let's start with the wins. These are the tasks where AI consistently delivers value:
Bug Fixes with Clear Reproduction
✅ "Login button doesn't respond on mobile Safari"
✅ "NullPointerException in UserService.getProfile when user has no avatar"
✅ "Date picker shows wrong timezone for users in Australia"
When you can describe the bug clearly—what happens, what should happen, where it happens—AI can usually find and fix it. These are the tasks that eat hours of developer time but don't require deep system knowledge.
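To make that concrete, here's a minimal sketch of the kind of fix a well-described crash report produces, written in TypeScript against a hypothetical user-to-profile mapper. The types, the function, and the default avatar constant are illustrative, not taken from any real codebase; the pattern is what matters: the crash came from reading a property on a missing avatar, and the fix is a guard plus a sensible default.

```typescript
// Hypothetical sketch: guarding against a missing avatar in a profile lookup.
// User, Profile, and toProfile are illustrative, not real project code.
interface User {
  id: string;
  name: string;
  avatar?: { url: string }; // avatar may be absent for new accounts
}

interface Profile {
  id: string;
  name: string;
  avatarUrl: string;
}

const DEFAULT_AVATAR_URL = "/static/default-avatar.png";

function toProfile(user: User): Profile {
  return {
    id: user.id,
    name: user.name,
    // Before the fix this read `user.avatar.url` and threw when avatar was undefined.
    avatarUrl: user.avatar?.url ?? DEFAULT_AVATAR_URL,
  };
}
```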
Test Coverage Expansion
✅ "Add unit tests for the PaymentService class"
✅ "Increase coverage on src/utils/ to 80%"
✅ "Add integration tests for the checkout flow"
AI is surprisingly good at writing tests. It can read existing patterns in your test suite, understand your assertion style, and generate comprehensive coverage. This is often the first thing teams automate.
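For instance, here's the shape of test an AI tends to produce when pointed at a class like the PaymentService above. This is an illustrative Jest-style sketch; the class, its tax-rounding behavior, and the assertions are assumptions, not code from a real project.

```typescript
// Illustrative only: the PaymentService below is a stand-in so the test is self-contained.
class PaymentService {
  constructor(private readonly taxRate: number) {}

  totalWithTax(amountCents: number): number {
    if (amountCents < 0) throw new Error("amount must be non-negative");
    return Math.round(amountCents * (1 + this.taxRate));
  }
}

describe("PaymentService.totalWithTax", () => {
  const service = new PaymentService(0.1);

  it("applies the tax rate to the amount", () => {
    expect(service.totalWithTax(1000)).toBe(1100);
  });

  it("rounds to the nearest cent", () => {
    expect(service.totalWithTax(333)).toBe(366);
  });

  it("rejects negative amounts", () => {
    expect(() => service.totalWithTax(-1)).toThrow("non-negative");
  });
});
```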
Refactoring and Code Cleanup
✅ "Extract the validation logic into a separate module"
✅ "Convert this class component to a functional component with hooks"
✅ "Remove deprecated API calls and replace with v2 endpoints"
Mechanical transformations—where the "what" is clear but the "doing" is tedious—are AI sweet spots. The code changes are often large but low-risk.
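As an example of the second item above, here's roughly what the class-to-hooks conversion looks like on a toy counter component. Both versions are illustrative; the point is that the transformation is mechanical and behavior-preserving.

```tsx
// Before/after sketch of the "class component to hooks" conversion.
// The Counter component is hypothetical, not from any real codebase.
import React, { useState } from "react";

// Before: a class component carrying state through this.state / this.setState.
class CounterClass extends React.Component<{}, { count: number }> {
  state = { count: 0 };
  render() {
    return (
      <button onClick={() => this.setState({ count: this.state.count + 1 })}>
        Clicked {this.state.count} times
      </button>
    );
  }
}

// After: the same behavior as a functional component with the useState hook.
function Counter() {
  const [count, setCount] = useState(0);
  return (
    <button onClick={() => setCount(count + 1)}>
      Clicked {count} times
    </button>
  );
}
```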
Documentation Generation
✅ "Add JSDoc comments to all public methods in src/api/"
✅ "Generate README for the authentication module"
✅ "Update API documentation after the recent changes"
AI can read code and explain what it does. It won't write your architecture decision records, but it can absolutely document your function signatures.
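For example, the output of a JSDoc pass tends to look like this. The fetchUsers helper below is hypothetical, included only to show the shape of the generated comments.

```typescript
// Hypothetical helper, shown only to illustrate generated JSDoc.

/**
 * Fetches a page of users from the backend API.
 *
 * @param page - One-based page index to fetch.
 * @param pageSize - Number of users per page (defaults to 25).
 * @returns A promise resolving to the parsed JSON response body.
 * @throws Error if the server responds with a non-2xx status.
 */
export async function fetchUsers(page: number, pageSize = 25): Promise<unknown> {
  const response = await fetch(`/api/users?page=${page}&pageSize=${pageSize}`);
  if (!response.ok) {
    throw new Error(`Failed to fetch users: ${response.status}`);
  }
  return response.json();
}
```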
Dependency Updates
✅ "Update React from 17 to 18 and fix breaking changes"
✅ "Migrate from moment.js to date-fns"
✅ "Update TypeScript to 5.x and resolve type errors"
Following migration guides, updating import statements, fixing type errors—this is grunt work that AI handles competently.
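As a sketch of the moment.js-to-date-fns item above: the migration is mostly one-to-one call replacement. The dueDate helper is hypothetical; parseISO, addDays, and format are real date-fns functions, but treat the exact mapping as illustrative.

```typescript
// Before/after sketch of a moment.js -> date-fns migration (hypothetical helper).
import { format, addDays, parseISO } from "date-fns";

// Before (moment.js):
//   const due = moment(isoString).add(7, "days").format("YYYY-MM-DD");

// After (date-fns): same behavior, tree-shakeable functions instead of a mutable wrapper.
export function dueDate(isoString: string): string {
  const due = addDays(parseISO(isoString), 7);
  return format(due, "yyyy-MM-dd");
}
```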
What Vince Declines
Here's where it gets interesting. A good AI engineer knows what it shouldn't attempt. These tasks get declined with an explanation:
Vague Requirements
❌ "Make the app faster"
❌ "Improve the user experience"
❌ "Fix the authentication issues"
If a human would need to ask clarifying questions before starting, so would the AI. Vague tasks get a NEEDS_CLARITY response, not a guess.
Architectural Decisions
❌ "Should we use GraphQL or REST for the new API?"
❌ "Redesign the database schema for better performance"
❌ "Set up the microservices infrastructure"
These require understanding business context, future requirements, team preferences, and tradeoffs that aren't in the code. AI lacks the context to make these calls.
Security-Sensitive Changes
❌ "Implement the new authentication flow"
❌ "Add encryption for user data"
❌ "Fix the SQL injection vulnerability"
Security code needs expert review from the start, not just at the PR stage. The cost of getting it wrong is too high to automate.
Large-Scale Changes
❌ "Migrate the monolith to microservices"
❌ "Rewrite the frontend in a new framework"
❌ "Implement the new feature epic"
Tasks touching 50+ files, spanning multiple systems, or requiring multi-day effort are out of scope. These need human orchestration.
Anything Requiring External Coordination
❌ "Integrate with the new payment provider" (needs API keys, contracts)
❌ "Fix the production outage" (needs access, monitoring, coordination)
❌ "Update the third-party SDK" (needs vendor communication)
If the task requires talking to people, getting approvals, or accessing systems not in the codebase, it's not automatable.
The NEEDS_CLARITY State
Between "accept" and "decline" is a third option: asking for clarification. This is where Vince acts like a thoughtful junior developer rather than a blind task executor.
💬 Vince commented on BACKEND-789:

> I want to make sure I implement this correctly. The ticket says "add caching to the API," but I have a few questions:
>
> 1. Which endpoints should be cached? All of them or specific ones?
> 2. What cache backend should I use? Redis is configured, but I see mentions of memcached in some comments.
> 3. What's the expected TTL? I see different values in different parts of the codebase.
>
> Could you clarify these points?
This prevents the AI from making assumptions that waste everyone's time. A wrong implementation is worse than no implementation.
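If it helps to picture the mechanics, the three outcomes can be modeled as a simple discriminated union. This is an illustrative TypeScript sketch, not Vince's actual API or response format.

```typescript
// Illustrative model of the three triage outcomes, not Vince's actual API.
type TriageResult =
  | { status: "ACCEPTED"; plan: string }              // clear task: proceed with a plan
  | { status: "DECLINED"; reason: string }            // out of scope: explain why
  | { status: "NEEDS_CLARITY"; questions: string[] }; // ambiguous: ask before coding

// Example: the caching ticket above comes back as questions, not a guess.
const cachingTicket: TriageResult = {
  status: "NEEDS_CLARITY",
  questions: [
    "Which endpoints should be cached?",
    "Should caching use Redis or memcached?",
    "What TTL is expected?",
  ],
};
```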
The 80/20 of Developer Work
Here's the mental model that actually matches reality:
| Work Type | % of Time | AI Capability |
|---|---|---|
| Routine implementation | ~40% | ✅ High |
| Bug fixes and debugging | ~20% | ✅ Medium-High |
| Code review and PR feedback | ~15% | 🔄 Augmented |
| Architecture and design | ~15% | ❌ Human |
| Meetings and coordination | ~10% | ❌ Human |
AI doesn't replace the whole job. It handles the ~60% that's mechanical, freeing you for the ~40% that requires judgment.
Where Humans Remain Essential
Let's be explicit about what AI can't do:
Understanding Business Context
The code says `if (user.plan === 'enterprise')`. Why? What's the business reason? What happens if we change it? The AI sees syntax. Humans see strategy.
Making Tradeoff Decisions
"Should we optimize for performance or readability here?" "Is this technical debt worth taking on?" "Do we build or buy?" These require values and priorities that exist outside the codebase.
Communicating with Stakeholders
"The feature will take longer than expected because..." "We need to change the approach due to..." "The security review found..." AI can write code, but humans navigate organizations.
Handling Ambiguity
Real requirements are messy. They contradict each other. They assume context that isn't written down. Humans resolve ambiguity through conversation; AI needs it resolved before starting.
Taking Responsibility
When the AI's code causes a production incident, who's accountable? The human who approved the PR. The human who designed the system. The human who decided to use AI in the first place. Accountability doesn't automate.
Realistic Expectations
Here's what actually happens when you deploy an AI software engineer:
Week 1: You assign simple bug fixes. Some work perfectly. Some need minor corrections in PR review. You learn what task descriptions work best.
Week 4: You've developed intuition for what to assign. The AI handles 20-30% of your backlog items autonomously. Review time per PR drops because you trust the patterns.
Week 12: The AI is a normal part of your workflow. Routine work gets queued for it automatically. Your team focuses on the hard problems. Throughput is up, but headcount hasn't changed.
What doesn't happen: Mass layoffs. Complete automation. The AI "taking over" the codebase.
The Multiplier Effect
The right framing isn't "AI replaces developers." It's "AI multiplies developers."
A senior engineer who spends 40% of their time on routine implementation now spends 10%. That 30% goes to:
- Mentoring junior developers
- Improving architecture
- Reducing technical debt
- Actually thinking about hard problems
The team doesn't shrink. It levels up.
The Honest Pitch
If you're evaluating AI coding tools, here's the honest pitch:
What you get:
- Faster throughput on routine tasks
- More consistent code style
- Better test coverage
- Fewer context switches for developers
What you don't get:
- Replacement for engineering judgment
- Solution for unclear requirements
- Magic fix for organizational problems
- Elimination of code review
AI software engineers are a tool. A powerful one, but still a tool. They work best when humans define the work clearly, review the output critically, and stay responsible for the result.
The 10x developer was always a myth. The 10x team—where humans and AI each do what they're best at—is starting to look achievable.
Want to see what Vince can handle for your codebase? Start with your most tedious backlog item and see what happens.