Upholding Production Standards with LLMs

Sep 17, 2025

Welcome to the seventh post in our series, "Harnessing LLMs for Real-World Product Development." The central question for my "Mission Control" project was this: Can LLMs truly deliver value in real-world development scenarios while adhering to the stringent production standards that define professional software?

This post will detail how I maintained those standards while building the universal features every product team encounters. I'll break down my approach, the specific actions I took, and the critical lessons learned. The results confirm that, with the right strategy, LLMs can be powerful accelerators, not just novelties.

Defining "Real-World" Production Standards

To test the viability of LLMs, I established a non-negotiable set of production-grade standards from day one. These weren't watered-down requirements; they were the same benchmarks any professional team would set for a public-facing product.

My core commitments were:

  1. Infrastructure Excellence: The system had to be scalable, performant, and reliable. No compromises.
  2. Modern Technology Stack: I selected a stack that offered flexibility and extensibility for future development.
  3. Universal Capabilities: The application needed to include common B2B SaaS features to simulate authentic development challenges.
  4. Workflow Best Practices: I followed modern software development lifecycle (SDLC) best practices, from source control to DevOps security practices.

These standards formed the foundation of the Mission Control challenge. The goal was not just to build an app but to build it right.

How I Applied Production Standards with LLMs

Adhering to my standards required a deliberate and methodical approach. It involved more than just asking an LLM to write code. I needed to integrate the model into a professional workflow, treating it as a specialized intern that required precise direction and rigorous oversight.

Data and Asset Load Performance

Performance was a critical metric. My architectural decisions, which included a Hasura GraphQL layer over a PostgreSQL database and asset storage in Google Cloud Storage, were only part of the solution. The implementation had to be efficient.

I developed a highly specific prompting strategy to guide the LLM in constructing performant data queries and asset retrieval calls. For example, when fetching data, I instructed the model to design GraphQL queries that minimized data transfer and optimized for speed. Similarly, for image assets, the prompts specified efficient loading practices from Google Cloud Storage. Every integration built with the LLM was subjected to performance validation to ensure it met my speed and efficiency benchmarks.
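
To make that prompting target concrete, here is a minimal sketch of the kind of query the LLM was steered toward: request only the fields the view renders, and page the results so payloads stay small. The endpoint, table, and field names are hypothetical, not Mission Control's actual schema; the snippet uses Dart with the `http` package, since the app itself is Flutter.

```dart
import 'dart:convert';
import 'package:http/http.dart' as http;

// Fetch one page of content, selecting only the fields the list view
// renders. Endpoint, table, and field names are hypothetical.
Future<List<dynamic>> fetchContentPage(int offset, {int limit = 20}) async {
  const query = r'''
    query ContentPage($limit: Int!, $offset: Int!) {
      content(limit: $limit, offset: $offset, order_by: {updated_at: desc}) {
        id
        title
        thumbnail_url
      }
    }
  ''';
  final response = await http.post(
    Uri.parse('https://hasura.example.com/v1/graphql'),
    headers: {'Content-Type': 'application/json'},
    body: jsonEncode({
      'query': query,
      'variables': {'limit': limit, 'offset': offset},
    }),
  );
  return (jsonDecode(response.body)
      as Map<String, dynamic>)['data']['content'] as List<dynamic>;
}
```

The same principle applied to assets: the hypothetical `thumbnail_url` field stands in for serving appropriately sized images from Google Cloud Storage rather than full-resolution originals.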

I ended up refactoring certain capabilities because their load times fell short of my requirements. For example, the first implementation of Mission Control's AI-generated smart tagging feature was not performant: content would load, but the smart tagging module in the dashboard took several seconds to appear. I went through an iterative refactoring process using the GitHub Copilot integration in VS Code, following these steps:

  1. I gave the LLM a high-level load-time expectation for the Smart Tags Widget, with a quantified target: under 0.5 seconds. I then had the LLM write a technical proposal for the smart tagging capability covering how it planned to reduce load times and the pros and cons of each option.
  2. I chose an approach and defined a phased refactoring project to limit the scope of changes at any given time. I directed the LLM to implement the changes phase by phase, pausing after each phase so I could manually verify the work was not straying from the big-picture goal.
  3. Ultimately, the refactoring reduced the smart tag module's load time from ~5 seconds to under 0.5 seconds, a successful outcome. (A sketch of the kind of change involved follows this list.)
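
The post doesn't prescribe the specific fix, so the snippet below is a hypothetical sketch of one representative change: caching smart-tag results in memory so the widget renders instantly on revisits instead of re-querying every time.

```dart
// Hypothetical sketch: cache smart-tag results per content item so the
// dashboard widget renders from memory on revisits instead of re-fetching.
// The fetch function is injected; names are illustrative.
class SmartTagCache {
  SmartTagCache(this._fetchTags);

  final Future<List<String>> Function(String contentId) _fetchTags;
  final Map<String, List<String>> _cache = {};

  Future<List<String>> tagsFor(String contentId) async {
    // Serve from memory when available; otherwise fetch once and remember.
    return _cache[contentId] ??= await _fetchTags(contentId);
  }
}
```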

Building Universal B2B SaaS Features

The Mission Control app was designed as a testbed for building features common to most enterprise software. This allowed me to simulate the challenges a typical product team faces. With the LLM's assistance, I successfully designed and built:

  • User Authentication & Management: Standard create account, login, and user profile management with preference settings.
  • Content Management: A hybrid CMS driven by both Directus for structured content and a PostgreSQL/GCP Storage combination for dynamic data and assets.
  • Metadata-Based Search: Functionality to search content based on associated metadata.
  • Collaboration Tools: Features like team notes, user-to-user requests, and event scheduling to facilitate teamwork.
  • Data Obfuscation: Backend systems designed to enforce strict data separation between different customers, a cornerstone of SaaS architecture. (A sketch of this pattern follows the list.)
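
To illustrate the data-separation point, here is the general shape of tenant scoping with Hasura. The real enforcement belongs server-side, in row-level select permissions bound to a session variable; the client-side filter shown is belt-and-suspenders. Table, field, and variable names are hypothetical, not Mission Control's actual schema.

```dart
// Hypothetical sketch of per-tenant scoping. In Hasura, the enforcement
// lives server-side: a select permission such as
//   {"company_id": {"_eq": "X-Hasura-Company-Id"}}
// bound to a session variable means a client can never read another
// tenant's rows, even if it tampers with the query below.
const tenantScopedNotesQuery = r'''
  query TeamNotes($companyId: uuid!) {
    team_notes(where: {company_id: {_eq: $companyId}}) {
      id
      body
      created_at
    }
  }
''';
```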

For each feature, I provided the LLM with strict requirements grounded in B2B SaaS expectations. The prompts demanded solutions that were performant, reliable, and delivered a high-quality user experience. Each feature required an approved technical proposal, a phased project definition, and an iterative workflow: the LLM "intern" built out the changes for a phase, then paused to document them and write test case recommendations that I could run manually before signing off. This workflow succeeded because it was methodical, well planned, and followed real-world product development best practices: strong, clear requirements, iterative design decisions, solid documentation, and testing.

Adhering to Modern Development Workflows

Building a product is as much about process as it is about code. I integrated the LLM into established, modern workflows.

  • DevOps and Permissions: I configured my Google Cloud Platform environment with standard user and system roles, restricting access based on the principle of least privilege. This area proved challenging. With no prior DevOps experience, I was in the "blind leading the blind" scenario I've discussed before: the LLM helped me complete the tasks, but the cautious, iterative approach the work required kept the productivity gains modest. Still, with virtually no DevOps background, I got the job done with LLM assistance.
  • Containerization: I used Docker to containerize the stack, including the PostgreSQL database, the Hasura API layer, and the Flutter application itself. This ensured consistency across development and deployment environments.
  • Source Control: I used GitHub in VS Code from the very beginning. This was an absolute necessity. It provided a safety net, allowing me to experiment with LLM-generated solutions and easily roll back changes that didn't meet my standards.
  • Robust Debugging: Like any professional development effort, I built a comprehensive debug panel into the app. Hidden behind a feature flag and tied to my debug configuration, it let me test user profiles, generate test data (using LLMs), modify company attributes, and validate my AI integrations like smart tagging and search. (A sketch of the gating pattern follows this list.)
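
As a sketch of that gating pattern (the flag and route names are hypothetical, and Mission Control's actual wiring may differ), a Flutter debug-panel entry point might look like this:

```dart
import 'package:flutter/foundation.dart' show kDebugMode;
import 'package:flutter/material.dart';

// Hypothetical gate for a hidden debug panel: the entry point only renders
// when the build is a debug build AND a build-time flag is set, so it can
// never surface in a release build. Flag and route names are illustrative.
const bool kDebugPanelEnabled =
    bool.fromEnvironment('ENABLE_DEBUG_PANEL', defaultValue: false);

Widget debugPanelEntry(BuildContext context) {
  if (!kDebugMode || !kDebugPanelEnabled) {
    return const SizedBox.shrink(); // Invisible to normal users.
  }
  return IconButton(
    icon: const Icon(Icons.bug_report),
    tooltip: 'Open debug panel',
    onPressed: () => Navigator.of(context).pushNamed('/debug'),
  );
}
```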

One additional benefit I realized along the way: with an LLM "intern," generating solid developer documentation and test cases became easy, and both proved extremely valuable. I'll write further about the QA benefits I saw from LLM assistance in a future post.

Key Learnings and Observations

The Mission Control project proved that it is entirely possible to build a product to real-world standards using LLMs as a core part of the development team. However, success depends on a strategic shift in how you manage the work.

I observed dramatic productivity gains, in some cases between 10x and 50x, across most software engineering tasks. The notable exception was specialized areas like DevOps, where my lack of foundational knowledge limited the LLM's effectiveness as a force multiplier.

Ultimately, I built Mission Control: a functional simulation of a real-world SaaS platform, developed with the same rigor as a commercial product. The experiment validated my hypothesis that LLMs can be integrated into professional workflows to accelerate development without sacrificing quality.

Success, however, isn't automatic. It requires a different type of oversight than managing human engineers. It hinges on deliberate planning, clear and explicit prompting, robust testing and validation protocols, and leveraging the LLM to create thorough documentation for future reference. When you treat the LLM as a highly capable but literal-minded team member, you unlock its true potential to build better products, faster.

Next up: I'll delve into best practices for recovering when AI fails, along with the preventative measures that are must-haves to put in place before diving into development.