When AI Fails: How to Avoid Pitfalls and Recover
Harnessing LLMs for Real-World Product Development: Post 8
Introduction
Throughout the Mission Control project—a 200-hour challenge to build a production-ready app solo using LLMs—there were several occasions where the AI went completely off the rails. These experiences were invaluable, teaching me how to recognize when a project was heading for trouble and how to recover effectively when failures occurred.
This post shares the key lessons learned from those challenges. For any founder or developer integrating LLMs into their product development workflow, these insights can help you avoid common pitfalls, manage your AI "intern" more effectively, and execute a swift recovery when things go wrong.
Early Warning Signs: Preventing Failure Before It Starts
Recognizing the early indicators of an impending AI failure is the most effective strategy for mitigating risk. Here are a few crucial warning signs I identified and the preventative measures that proved most effective.
1. An Inadequate Development Environment
One of my first and most critical recommendations is to set up a professional-grade development environment (which is why this was one of the "north star" production standards I set at the outset). Attempting to manage AI-generated code without robust safeguards is a recipe for disaster. The most essential preventative tool is source control.
For Mission Control, I used a combination of GitHub and its integration with Visual Studio Code. This setup, paired with GitHub Copilot, provided the control needed to manage code changes with confidence. Knowing I could discard file changes or roll back to a previous commit at any time offered peace of mind. It created a safety net that allowed me to experiment with the LLM's suggestions without risking the entire project.
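To make that safety net concrete, here is the kind of everyday Git usage it enabled. This is a minimal sketch rather than a record of the exact commands from the project; the file path and commit hash are placeholders.

```bash
# Review exactly what the LLM changed before committing anything
git status
git diff

# Discard changes to a single file that went sideways
git restore path/to/file

# Or roll the whole working tree back to the last known-good commit
git log --oneline            # find the hash of the last good commit
git reset --hard <commit>    # discard everything after it

# Safer alternative if the bad work has already been pushed or shared
git revert <commit>
```

Committing after each successfully tested phase keeps that last known-good state only ever a few minutes behind you.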
2. Working in Low-Data Domains
LLMs are only as knowledgeable as their training data. If you are working in a niche technical area, using a newer technology, or tackling a domain with sparse documentation, you must temper your expectations. The LLM will not be as reliable or accurate.
I encountered this twice with Mission Control:
- DevOps in Google Cloud: The world of DevOps can be esoteric, with much of the knowledge siloed in the minds of experienced professionals rather than in clear, accessible documentation. When using an LLM to navigate Google Cloud's infrastructure, I frequently ran in circles dealing with user permissions, security settings, and role configurations. My own lack of deep expertise in this area compounded the problem, making it difficult to guide the LLM toward a correct solution.
- Directus CMS: I intentionally chose Directus, an emerging headless CMS, for its powerful capabilities and cost-effectiveness. However, its newer status meant less training data for the LLM. With less public community support and a smaller documentation footprint, the productivity boost from the AI was noticeably diminished.
Knowing you are entering a less-supported domain allows you to adjust your strategy. Rely less on the LLM for flawless execution and more for brainstorming and foundational code, while preparing for more manual oversight.
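To give a flavor of where the Google Cloud circles happened, most of the trouble came down to IAM role bindings like the one sketched below. The project ID, service account, and role are placeholders, not Mission Control's actual configuration.

```bash
# Inspect who currently holds which roles on the project
gcloud projects get-iam-policy my-project-id

# Grant a service account a specific role (the role shown is illustrative)
gcloud projects add-iam-policy-binding my-project-id \
  --member="serviceAccount:app-runner@my-project-id.iam.gserviceaccount.com" \
  --role="roles/run.invoker"
```

The commands themselves are simple; the hard part is knowing which role a failing request actually needs, and that tacit knowledge is exactly what is thin in an LLM's training data.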
3. Giving the LLM Too Much Leeway
I quickly developed a "spidey sense" for when things were about to go wrong, and it almost always fired when I got greedy with my requests. Treating the LLM as a talented but inexperienced intern is the correct mental model. You wouldn't ask an intern to build a complex feature from scratch without supervision; the same principle applies here.
The most effective preventative measure was a structured, phased approach to feature development:
- Collaborative Design: First, I worked with the LLM in a conversational manner to outline a technical design and implementation strategy for the new feature.
- Phased Implementation: Once the design was solid, I broke the work into small, manageable phases. For each phase, I provided a highly specific prompt detailing the exact requirements and explicitly stating that no other code should be changed.
- Validation and Documentation: Each prompt concluded with a directive to stop development and await manual testing. I also required the LLM to document its changes against my technical design and provide test cases for me to execute.
This bite-sized approach was the single most effective technique for preventing major failures and keeping the project on track.
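To make the phased prompts concrete, here is a sketch of what a typical one looked like. The wording is illustrative rather than a verbatim prompt from the project.

```text
We are now implementing Phase 2 of the feature, per the technical design
agreed above.

- Implement only the items listed under Phase 2 of the design.
- Do not change any code outside the files named for Phase 2.
- When Phase 2 is complete, stop. Do not start Phase 3.
- Document your changes, mapped against the technical design.
- List the manual test cases I should run to validate this phase.
```

The explicit instruction to stop and wait mattered as much as the scope restriction; it kept every phase small enough to validate by hand before moving on.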
Failure Recovery: Pivoting and Rebuilding
Despite the best preventative measures, failures will still happen. When they do, having a clear recovery plan is essential.
1. When the LLM Gets Stuck: Pivot Your Approach
There were instances where, even with a phased approach, the LLM simply could not resolve a bug or implement a feature correctly. The typical pattern was this: I'd complete a phase, but the feature would have persistent bugs that the LLM was unable to fix. At its most frustrating, the LLM would start repeating its previous, ineffective recommendations—a sure sign you're in a pointless loop.
One memorable example was a refactoring project to make the left-hand navigation in Mission Control static. I got most of the way there, but a gnarly bug caused the navigation to revert to its old behavior during certain page transitions. The LLM was stumped.
To break the cycle, I pivoted my approach entirely. Instead of asking for a fix, I instructed the LLM to write a detailed technical document outlining the code path during the problematic page transition. Using this LLM-generated documentation, I manually debugged the code. The process quickly uncovered the root cause: the LLM was confusing two similarly named functions and occasionally calling the old, incorrect one. This was a "head-scratcher" moment that reinforced the "talented intern" model—the AI lacked the higher-level awareness to see its own mistake. By changing the task from "fix this" to "explain this," I found the solution.
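The shape of that pivot prompt is worth spelling out. This is an illustrative reconstruction rather than the exact prompt I used.

```text
Do not change any code.

Write a technical document that traces the exact code path executed during
the page transition where the navigation reverts to its old behavior. List
every function called, in order, the file it lives in, and how it affects
the navigation state.
```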
2. Recovering from Catastrophic Failures
Of course, there were times when the LLM went completely off the rails, leading to catastrophic code changes. These are the scenarios that can be most harmful to a project if the right safeguards aren't in place.
Early in the development of Mission Control, I got ambitious and tasked the LLM with implementing content obfuscation based on a user's company—a core feature for a B2B SaaS application. I should have broken this down, but instead, I asked for a major change to the object model to associate users with a new "company" entity. The LLM made incorrect, sweeping changes that corrupted some of the user data.
This was an anxiety-inducing moment, but recovery was manageable for two reasons:
- Source Control: Rolling back the code to its last known good state was simple thanks to my GitHub setup.
- Test Data: The corrupted data was minimal and only for testing purposes. However, it was a stark reminder of the potential damage in a production environment.
After reverting the code and manually fixing the data, I re-approached the feature with the LLM using a much more granular, phased strategy. I eventually succeeded, but the experience underscored the absolute necessity of source control, developer documentation, and clear test cases.
Conclusion
The 200-hour Mission Control project taught me a great deal about leveraging LLMs for real-world development: in particular, how to prevent failures, pivot when the LLM gets stuck, and recover from catastrophic errors. It was a learning process conducted in a controlled environment, and the insights gained have proven invaluable.
For anyone integrating LLMs into their development process, I hope these lessons provide tangible value. By treating your LLM as a capable intern and implementing the right safeguards, you can avoid major pitfalls and harness its power to accelerate your product development.
In my next post, I will explore how LLMs can be used to boost productivity and effectiveness in the QA process.