LLM Productivity Verdict: Grading AI's Real-World Impact
This series has explored one central question: can Large Language Models (LLMs) deliver measurable productivity value to real-world product development teams? To provide a definitive answer, I undertook Mission Control, a solo, timeboxed 200-hour project. In this experiment, I assumed every critical role—Product Manager, UX Designer, Engineer, and QA—leveraging LLMs at every stage to accelerate progress.
The verdict is in: yes, LLMs can provide value to real-world teams. However, the magnitude of that value is directly proportional to the expertise of the person guiding them. Throughout this project, LLMs acted as brilliant, tireless interns. They can research, draft, and build at an incredible pace, but they require expert direction to produce high-quality work and avoid costly mistakes. Without strategic oversight, their output can be generic, flawed, or entirely misaligned with project goals.
As clarified in a previous post, my level of expertise varies across the roles I assumed for this project. I am an expert in Product Management. I have intermediate Software Engineering and QA Engineering skills (where I started my career). I have very little experience with DevOps and UX/Design. As you'll see, the productivity grades I highlight in the next section correlate directly with the level of experience and expertise I was able to apply when guiding or managing my LLM "interns."
LLM Productivity Grades by Functional Area
To quantify the impact, I graded the productivity gains observed across each discipline involved in building Mission Control. The results were not uniform, highlighting how much LLM performance relies on the level of expertise guiding them.
Product Management: A- (Solid 2–5x Gain)
LLMs proved to be a powerful ally in the foundational stages of product definition for Mission Control. Tasks like framing the product vision, drafting initial Product Requirements Documents (PRDs), and identifying core value propositions were significantly accelerated. The ability to synthesize market research, summarize competitor analyses, and brainstorm user personas in minutes instead of days was a clear win.
However, LLMs struggle without domain-specific context. Left unguided, their suggestions were often over-generalized and lacked strategic nuance. The real breakthrough came from using expert-level prompts that embedded constraints, business goals, and a deep understanding of the target user. I drew on my 20+ years of experience in Product Management and leadership roles to prompt the LLM toward a deeper level of product strategy.
Architectural Design & Tech Stack Selection: A+ (Significant 10–20x Gain)
Architectural decisions carry long-term consequences, and LLMs were instrumental in de-risking this process. I utilized them to conduct deep comparisons of backend infrastructure options, database technologies, service layer patterns, and frontend frameworks. Generating comprehensive tables that detailed the pros, cons, and tradeoffs of different choices—complete with implementation considerations—was remarkably efficient.
The primary caveat here is the LLM's tendency to "hallucinate" or present outdated information. It occasionally cited documentation for platforms that had recently gone through major updates. Success required a strict validation workflow: use the LLM to generate options and structure the analysis, but always verify every claim against official documentation and trusted community sources. Human oversight was critical to separate insightful recommendations from plausible-sounding falsehoods.
At the end of the day, I ended up with a very solid architectural design, and the Mission Control tech stack is one I would confidently use as the foundation for any new product venture.
UX Design: B (Solid 2–5x Gain in Narrowly Focused Areas)
In UX design, LLMs demonstrated solid utility in translating requirements into tangible structures. I tasked them with generating wireframe concepts and basic mockups based on product requirements, which served as an excellent starting point for iteration. They were also effective at structuring frontend components that could be leveraged across the application, helping maintain consistency. This workflow accelerated the initial stages of UX design and frontend implementation.
The limitations became apparent in areas requiring visual nuance and complex information hierarchy. LLMs can arrange elements on a page, but they lack a true designer's eye for balance, flow, and emotional resonance. The most effective workflow involved pairing the LLM's structural speed with human judgment. I used it to quickly scaffold layouts and then applied my own expertise to refine the details, polish the user journey, and ensure the final design was not just functional, but intuitive and engaging.
In narrowly focused areas such as visual mock creation and frontend component translation, the LLMs provided a solid productivity boost. I grade this area a 'B' because I wish I could have leveraged LLMs for much more of the UX wireframing and complex flow design.
Software Engineering: A+ (Exponential 10–50x Gain)
This is where LLMs delivered their most transformative impact. Across both frontend and backend development, the productivity gains were exponential. LLMs excelled at scaffolding new features, writing comprehensive unit and integration tests, refactoring complex code blocks, and generating database migration scripts. The ability to instantly connect to APIs, handle data transformations, and generate boilerplate for entire modules turned hours of tedious work into minutes of prompting and review.
All of this came with risk that I experienced firsthand and had to learn to mitigate or remediate. LLMs can introduce subtle logic bugs, lose important context about recent changes, rely on outdated or insecure libraries, or produce code that is functional but not optimal. To mitigate this, a rigorous process is non-negotiable. I found success by enforcing a "technical design, phased build and documentation, manual test" cycle:
- Technical Design: Design the feature with the LLM as a design partner, leveraging product requirements and targeted prompts to arrive not only at a sound technical approach but also at an understanding of the tradeoffs and risks. This technical design let me fully understand and sign off on the approach, and provided an up-front opportunity to break the "mini project" into phases or chunks.
- Phased Build & Documentation Iterations: I would move into an iterative rhythm with the LLM, building smaller chunks of the "mini project." I also got into the good habit of having developer and testing documentation created along the way.
- Manual Testing: Leveraging the big-picture technical design and the iterative developer and test-case documentation, I would do a manual testing and verification pass on that phase of work before moving the LLM on to the next.
For the most part, software engineering on the backend and frontend with LLMs was incredibly productive. I consider myself to have intermediate developer skills, but that was just enough to "keep the LLMs honest" as we developed Mission Control.
The one exception to this trend was DevOps (Grade: C, Modest 1.5–2x Gain). While the LLM could generate configuration files and suggest auth and user/system permission changes, the ecosystem's complexity, frequent version mismatches, and subtle dependency conflicts demanded heavy manual iteration and verification. I hit a lot of frustrating, mystifying blocks along my DevOps journey. To be fair, I have very little DevOps expertise, so in effect I was a poor "LLM intern" manager in this area.
QA Engineering: A+ (Exponential 10–50x Gain)
For quality assurance, LLMs were a force multiplier. I leveraged them to generate exhaustive test plans from product requirements, discover obscure edge cases, and create detailed checklists for exploratory testing sessions. They were also highly effective at writing SQL queries for data validation and parsing dense server logs to quickly identify the root cause of an error. This freed up significant time to focus on higher-level testing strategy and actual bug-fixing.
The key guardrail was ensuring traceability and avoiding false positives. Prompts needed to be precise to ensure the generated test cases directly mapped back to specific user stories or acceptance criteria. For log analysis, the LLM’s findings always required cross-validation with the source data to confirm its interpretations were accurate.
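To illustrate the log-analysis side of this workflow, here is a minimal sketch of the kind of parsing helper an LLM would draft for me. The log format, service names, and function name are hypothetical; in practice I would cross-validate any summary it produced against the raw log data.

```python
import re
from collections import Counter

# Hypothetical log line format:
# "2024-05-01T12:00:01 ERROR payment-service Timeout calling gateway"
LOG_LINE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<service>\S+)\s+(?P<msg>.*)$"
)

def summarize_errors(log_text: str) -> dict:
    """Count ERROR lines per service and capture the first error message seen."""
    counts = Counter()
    first_error = None
    for line in log_text.splitlines():
        m = LOG_LINE.match(line)
        if not m or m.group("level") != "ERROR":
            continue
        counts[m.group("service")] += 1
        if first_error is None:
            first_error = m.group("msg")
    return {"counts": dict(counts), "first_error": first_error}
```

A helper like this turns a dense log dump into a ranked list of failing services plus the earliest error message, which is usually the best clue to the root cause. The same cross-validation rule applies: spot-check the counts against the source logs before trusting them.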
This is another functional area where I have experience and expertise from a "past professional life," and I was able to apply that expertise to get the most out of the LLMs during the Mission Control project.
Putting It All Together: LLM Interns Are Worth It
Across the entire product development lifecycle, the net result was overwhelmingly positive. The crucial caveat remains: LLMs perform best when treated as high-leverage interns. They need clear instructions, contextual awareness, and expert oversight to stay on track and produce work that meets professional standards.
The overall impact on the Mission Control project was profound. A solo 200-hour timebox produced a production-quality application built on a modern, strategic tech stack. By my estimation, this same project would have required close to 2,000 hours of effort if tackled alone without LLM assistance. The models compressed the timeline by an order of magnitude.
For teams evaluating the ROI of integrating LLMs, consider this framework:
- Candidate Tasks: Identify repetitive, template-driven, or research-intensive work.
- Risk Profile: Assess the cost of an error for each task. Low-risk tasks (e.g., drafting internal docs) are great starting points.
- Oversight Model: Define who reviews the AI's output and what the review criteria are.
- Measurement: Track metrics like cycle time, code review throughput, and defect escape rates to quantify impact.
- Change Management: Prepare your team with training and clear guidelines for effective and responsible use.
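To make the Measurement step concrete, here is a minimal sketch of how a team might compute two of those metrics before and after adopting LLMs. The metric definitions and function names are my own assumptions, not an industry standard.

```python
from datetime import datetime

def defect_escape_rate(escaped_to_prod: int, caught_in_qa: int) -> float:
    """Share of defects that slipped past review/QA into production."""
    total = escaped_to_prod + caught_in_qa
    return escaped_to_prod / total if total else 0.0

def avg_cycle_time_days(items: list[tuple[str, str]]) -> float:
    """Average days from work start to completion, given (start, done) ISO dates."""
    spans = [
        (datetime.fromisoformat(done) - datetime.fromisoformat(start)).days
        for start, done in items
    ]
    return sum(spans) / len(spans)
```

Capturing a baseline for these numbers before rolling out LLM assistance is what makes the comparison meaningful; without it, "we feel faster" is the only evidence you have.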
Conclusion
My recommendation for product leaders, founders, and engineering teams is to adopt LLMs through a phased "crawl, walk, run" approach. The potential for productivity gains is too significant to ignore. There are numerous low-hanging-fruit opportunities uncovered in this project that can deliver immediate value with minimal risk.
Start with tasks like synthesizing user feedback, drafting initial PRDs, generating comparison tables of new technology candidates, scaffolding boilerplate code, writing unit tests, and creating data validation queries. By integrating LLMs in a focused and measured way, you can begin unlocking their power while building the internal expertise needed for more advanced applications.
In the final post of this series, we will look ahead. I will discuss the next steps for the Mission Control project, outline key areas for deeper exploration in AI-assisted development, and share my plans for continuing these experiments with emerging LLM technologies.