Vibe Coding with Claude

The future, it seems, is here, and I’m wasting it making vibe-coded video games.

Shown above: Jason’s Game of Life (aka Land Grab Simulator). Check out the demo, or, if you’re brave, the 100% Claude-authored code.

The proof-of-concept game was vibe-coded in a handful of hours with incredible ease. The experience was so simple that I truly believe anyone can make demo-level software now, which is incredible. Don’t believe me? Just look at the Claude conversation logs for the project shown above.

I’ve recently ventured into AI-assisted software engineering for the first time, and I thought I’d transcribe my early experiences here before I forget my initial concerns and surprises.

I’m using Claude Code with the Sonnet 4.6 model inside a Docker container while experimenting. Trekhleb’s claude-pod simplified getting Claude going in minutes, even though I’d never directly used Docker before.

The only hangup I experienced while using claude-pod was that you need to have Docker’s desktop app installed and running on your machine before you attempt to start the container via the prescribed claude-pod instructions. I had assumed the command-line instructions would automatically spin up the overarching necessary Docker services on demand, but that was not the case. Oh, also, invoking the claude-pod script occasionally required two executions to get the container running. You’ll know the container’s not running if you try to run claude-pod, then run the claude command in the terminal and nothing happens.

My initial experience working with Claude Code has been extremely impressive in many regards. Not only can Claude create (mostly) legible software, the software itself works. Furthermore, Claude can bug fix, enhance, and much more with surprising accuracy.

For my very first experiment, I conversationally vibe-coded a small Python utility while carefully reviewing Claude’s output. I then began a second experiment: carefully creating and refining a spec-based PRD (product requirements document) that I’ll ultimately try two approaches with. First, I’ll see how Claude performs with a one-shot generation using the Sonnet model versus the current Opus model. Second, I’ll see how the one-shot output compares to an iterative Ralph Loop approach. Briefly, a Ralph Loop is a method of AI orchestration that limits your AI tooling’s context to performing one task at a time before the session ends.
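As a rough illustration of that one-task-per-session idea, here is a minimal sketch. It is not how any particular Ralph Loop tool is implemented: `runRalphLoop`, `startSession`, and the stub session are all hypothetical names invented for this example, and the stub only simulates an agent so the shape of the loop is visible.

```javascript
// Hypothetical sketch of a "Ralph Loop": every task gets a brand-new session,
// so no conversational context carries over from one task to the next.
// `startSession` is an assumed stand-in for however the AI tool is launched;
// it is not a real Claude Code API.
function runRalphLoop(tasks, startSession) {
	const results = [];
	for (const task of tasks) {
		// Fresh context for this task only.
		const session = startSession();
		// Perform exactly one task, then throw the context away.
		const outcome = session.perform(task);
		session.end();
		results.push({ task, outcome });
	}
	return results;
}

// Stub session used only to show the shape of the loop.
let sessionsStarted = 0;
function startStubSession() {
	sessionsStarted++;
	const seen = [];
	return {
		perform(task) {
			seen.push(task);
			return `done: ${task} (tasks in context: ${seen.length})`;
		},
		end() {
			seen.length = 0;
		},
	};
}

const results = runRalphLoop(['write parser', 'add tests'], startStubSession);
console.log(results);
```

Each task reports a context of exactly one task, which is the property the loop is meant to preserve.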

Continuation of those larger-scoped projects will be discussed in future posts, but for now, a high-level overview of my Claude experience thus far. First, let’s discuss privacy and security concerns.

Privacy Concerns

Everything the AI accesses will almost certainly be fed to the algorithm.

Code and other personal data that’s scanned and transmitted to an AI provider’s servers can (and likely will) be fed into the algorithm’s ongoing learning cycle. An organization’s proprietary information should, as much as possible, never be entrusted to another entity, regardless of the entity being AI or not.

Given the current AI landscape in terms of profitability and the industry’s track record of training models without consent, I am skeptical of claims that data transmitted to a vendor’s servers is not actively being fed or will not be used to feed the algorithm in the future. Conspiratorially, I wonder if data transmitted to the vendor is stored and not yet fed to the algorithm. When the inevitable large-scale mergers develop over the coming years, will our data not be sold to the highest bidder?

Security Concerns

#1: AI tooling can wreck your computer.

I was, and still am, extremely skeptical of Anthropic’s current commitment to Claude not running amok on my personal laptop. There have been several headline-grabbing incidents where Claude has deleted production databases and wiped hard drives accidentally. Presumably, these incidents occurred due to a lack of security hygiene, but it seems many such issues can be mitigated by limiting the tooling to the confines of a container such as Docker, Kubernetes, or a VM of some sort.

It’s not a great idea to give any non-deterministic software unhindered control of a computer, even if the software is warrantied to operate only within certain folders. Currently, I believe invasive AI tooling with alarming amounts of control of a computer such as OpenClaw should not be trusted. Though, perhaps with more experience, I’ll trust the advertised guard rails in time.

#2: AI Poisoning is very real.

When an AI model learns (or worse, continually learns) from poisoned data, the model will repeat what it has learned. Domains with small amounts of sample data are especially vulnerable to these attacks.

A hypothetical, overly simplified AI poisoning example: a relatively obscure SaaS company provides an API that allows removing your database. If a malicious actor is able to publicly publish sample code that’s advertised as the solution to some other problem but in reality calls the delete-database API, a model trained on that data could take the malicious sample as fact. If that were to happen, a simple, unreviewed auto-completed bit of code in your product might unintentionally wipe your database when an end-user clicks a button that was meant to perform a different operation.
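To make the scenario concrete, here is a small sketch of what such a misleading sample could look like. Everything in it is invented for illustration: `FakeVendorClient`, `exportMonthlyReport`, and the method names do not belong to any real SDK, and the fake client only records calls in memory.

```javascript
// Hypothetical illustration only: `FakeVendorClient` stands in for an obscure
// SaaS SDK and records calls in memory instead of touching a real service.
class FakeVendorClient {
	constructor() {
		this.calls = [];
	}
	exportReport(dbName) {
		this.calls.push(`export:${dbName}`);
	}
	deleteDatabase(dbName) {
		this.calls.push(`delete:${dbName}`);
	}
}

// What a poisoned "sample" might look like: the name and the comment promise
// an export, but the body calls the destructive endpoint.
function exportMonthlyReport(client, dbName) {
	// "Exports the monthly report"
	client.deleteDatabase(dbName);
}

// A button handler that trusts the auto-completed helper without review.
const client = new FakeVendorClient();
exportMonthlyReport(client, 'customers');
console.log(client.calls); // [ 'delete:customers' ]
```

Nothing about the helper’s name or comment hints at the destructive call, which is why reading the body before accepting a suggestion matters.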

My Current Security Methodology: Zero Trust

#1: Use containers.

Running Claude Code inside a Docker container with claude-pod as mentioned earlier mitigates the risk of Claude feeding data to the algorithm that I would not want it to.

#2: Back up, often.

If Claude can remove or change a file, consider it lost. Utilizing source control such as Git along with a backup to a remote provider such as GitHub can mitigate this concern. But if you’re overly paranoid or not a source control wizard, simply zip up the entire Docker container’s data mount folder from time to time and store the zip file somewhere Claude cannot get to.

#3: Assume zero privacy.

Thus far, I’ve been careful to work with Claude on greenfield projects that have been exclusively designed to be fully open-sourced. This includes all specifications, the CLAUDE.md file, and more – every single file Claude can see, I assume, can or will be fed to future models for learning. I also consider every bit of conversation with the bot to be equally fair game for data harvesting, public disclosure, and/or model fodder.

#4: Always verify results.

I am not (yet) running Claude in the possibly destructive --dangerously-skip-permissions mode, and thus, any time Claude wants to execute a command or change a file, it asks for permission to do so. These human-in-the-loop permission requests can be bypassed with the skip-permissions mode, but for my experimentation thus far, I’ve enjoyed cautiously glancing at Claude’s intentions before agreeing to the changes.

Getting Started With Claude Code

Net Ninja’s Claude Code video tutorials and Dan Vega’s YouTube videos are great sources for picking up the basics of Claude, but beyond getting Claude to a running state, working with it truly is as simple as having a conversation.

Generally speaking, Claude Code uses a .claude folder in your home directory and an optional CLAUDE.md file that you create within a project’s root folder. These folders and files are where Claude stores notes for itself.

In my case, because Claude is running in a Docker container, anything it attempted to write outside of the project folder would either be forbidden by the container or be lost when the Docker container was recreated or restarted, so I asked Claude itself to create a boilerplate CLAUDE.md file in the root folder of my project and directed it to write notes it’d want to know later in the file.

Ordinarily, one might expect configuration for any piece of software to be carefully formatted and unlikely to be interpreted by a human, but here, that’s not the case. Consider a portion of the CLAUDE.md file from the project we’re discussing here. After telling Claude that it was running inside a container, I asked if its memory store would be compromised if the container was restarted or recreated from scratch, and it (perhaps incorrectly) answered that even a container restart could impact its memory, so I asked it to make a note to itself in CLAUDE.md that memory stores should be within the project itself, in a misc subfolder. Here’s what Claude wrote into the CLAUDE.md file regarding this:

## Session startup — read these files first
- `misc/memory/MEMORY.md` — persistent memory index, read this and any linked files that seem relevant

As you can see, not only is CLAUDE.md perfectly legible and editable, but the directives themselves are loosely formatted and written in a very casual manner.

The contents of misc/memory/MEMORY.md are similarly disarming:

# Memory Index

- [User identity](user-identity.md) — Jason Baker, jason@onejasonforsale.com, use for git config

For another project I’m working on, when asked to store anything that may be needed for future coding sessions, Claude created a file related to previously communicated instructions for git commits:

---
name: Git commit discipline
description: Only create git commits when Jason explicitly asks — do not auto-commit after edits
type: feedback
---

Only create git commits when Jason explicitly asks for one. Do not commit automatically after making file edits.

**Why:** Jason doesn't want a cluttered commit history from auto-commits after every change.

**How to apply:** Make edits freely, but hold off on `git commit` until he says so (e.g. "commit this", "make a commit", "squash: ...").

With little previous experience with LLMs in general, I found the simplicity in configuring and customizing with simple jargon-filled markdown files both fascinating and charming.

(Not) Executing Code With Claude

I am not letting Claude run code, for complicated reasons. One: my initial larger-scoped project is a Python project, and the as-thin-as-can-be claude-pod Docker container has minimal content, essentially just Node.js and Claude, so having Claude execute Python code on its own wasn’t initially possible. Two: the project discussed here (land grab simulator) is a vanilla JavaScript and HTML project with fairly complicated JavaScript, and verifying behavior in this case is not as simple as executing a program and checking its output.

For the first project, rather than rebuild my Docker container to include Python for Claude, one of my first experimental curiosities was to see how flexible Claude was without typical verification methods available. My initial thought was to rigorously instruct Claude to create a Python-based test harness to execute the utility in question, with a suspicion that Claude would somehow only be compatible with popular testing standards from the community, but to my surprise, Claude was much more flexible than I imagined.

> okay, hey, let's make this easier for you to debug, augment run.sh to pipe the output to the 'tee' command, and make tee output to a file called 'lastrun.txt' for me.                                                                        
                                                                                                                        
Read 1 file (ctrl+o to expand)                                                                                        
                                                                                                                      
Update(run.sh)                                                                                                        
  2 +python3.11 src/processaudio/process_audio.py 2>&1 | tee lastrun.txt
                                                                                                                      
Done. The 2>&1 also captures any errors into the file, not just standard output — so if something crashes you'll have the full picture in lastrun.txt. Also worth adding that file to .gitignore.

... later ...

> thanks, okay, there's another new error, look at lastrun.txt now

Read 1 file (ctrl+o to expand)                                                                                        

Same pattern — another missing transitive dependency. Add tensorflow to pyproject.toml the same way: ...

As you can see above, I simply invented a crude workaround method of collaboration. As we made progress, I ran the utility that we were working on and piped the output from the utility to a file called lastrun.txt that Claude had access to. Then, Claude reviewed the output when needed.

Claude immediately understood our working relationship for the project from then on. Further, I often purposely tested Claude’s abilities by referring very vaguely to the designated lastrun.txt file that Claude was using to see the utility’s output. I used phrasing such as “check the output again?” or “I ran it, look at the last run file.” In all of these cases, Claude never once made a mistake in understanding my directions no matter how sloppy or vague my communication became – incredible.

Claude’s Impressive Abilities

#1: Conversational Style.

Like other LLMs, Claude’s conversational style seemingly adapts to match your own. In particular, perhaps due to the way I initially conversed from my side, Claude was not overly congratulatory as we worked together. Instead, it seemingly silently adopted my communication style and conversed with me in a friendly and professional tone that I have not experienced with other LLMs.

#2: Bug fixing.

Claude possesses remarkable capabilities in terms of hands-off bug fixing. Even with intentionally vague bug descriptions such as “the Genghis bot sometimes seems to get stuck”, Claude only proposed and attempted changes when it was as certain as it could be that the change would fix the issue. This was particularly impressive when working with the JavaScript-based game that Claude had no visual or logged output to interpret.

On rare occasions, Claude scanned a number of files more than once as it changed its mind on the way to attack a given issue, but Claude always verbosely described the issue just before it proposed code changes. Additionally, there were very few issues that were not fixed on the first try, and regressions or new issues were rare as well.

#3: Non-trivial feature changes.

While developing the land grab game, I purposely led Claude down an overly simplistic vibe-coded path to quickly produce a working proof of concept. As we worked, I rarely even glanced at the code that Claude was creating. I followed this approach because I was incredulous that Claude would be able to handle anything as complicated as a simple game.

As we worked through the task of me sloppily describing next steps, Claude easily delivered a working proof of concept without issue. The proof of concept hinged on a single game mode: a simplistic mode where the number of tiles owned by a given bot would determine the winner. Once the so-called ‘boring’ game mode was working, I worked with Claude to add numerous bots with differing behaviors along with powerups, board size, and game time options.

After I was satisfied with Claude’s delivery of the minimal version of the game, I instructed it to add a tournament mode:

> okay, let's implement the other game modes, first, tournament mode is a new mode, in this mode we keep track of howmany wins a bot has and we run the game multiple times. when a game round ends, we increment the win count of a given bot. after each round a popup is shown showing the current leaderboard which shows the number of wins that bot has, and cumulative score. the game is over when a bot wins 3 rounds.

Before diving in, a few quick questions to make sure I get this right:

User answered Claude's questions:

How is the round winner determined? → Highest score (most cells)

What happens on a tie for round winner? → Both get a win

After the final round (3-win threshold reached), does the popup look different? → Yes — declare a tournament winner

What does New Game do during an in-progress tournament? → Resets the tournament (starts fresh)

Good, I have everything I need. Here's the plan:

 - start() gets a resetTournament param — New Game passes true, Next Round passes false
 - endGame() branches into handleTournamentRoundEnd() when in tournament mode
 - A single tournament overlay handles both the between-round popup and the final winner screen 
 - The all-bots-dead path in tick() gets simplified to just call endGame() so tournament popups fire in that case too
 - Score panel title becomes "round N" during tournament mode                                                          
                                                                                                                        
Starting the edits now.

In making the edits for the additional tournament mode, Claude burned through 10% of the current token window, but had the new game mode properly implemented with very few issues within minutes.
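The rules settled in that exchange (highest score wins the round, tied bots each get a win, the tournament ends when a bot reaches three wins) can be sketched as a small function. This is not the game’s actual code: `recordTournamentRound`, `standings`, and the other names are hypothetical, and the leaderboard sort order is an assumption, since the conversation only says the popup shows wins and cumulative score.

```javascript
// Hypothetical sketch of the tournament rules described above; the function
// and field names are illustrative and are not taken from the game's source.
const WINS_TO_TAKE_TOURNAMENT = 3;

// `standings` maps bot name -> { wins, totalScore }.
// `roundScores` maps bot name -> cells owned when the round ended.
function recordTournamentRound(standings, roundScores) {
	const bestScore = Math.max(...Object.values(roundScores));
	for (const [bot, score] of Object.entries(roundScores)) {
		const entry = standings[bot] ?? (standings[bot] = { wins: 0, totalScore: 0 });
		entry.totalScore += score;
		// Ties: every bot that matches the best score is credited with a win.
		if (score === bestScore) entry.wins++;
	}
	// Assumed ordering for the between-round popup: wins first, then total score.
	const leaderboard = Object.entries(standings)
		.map(([bot, stats]) => ({ bot, ...stats }))
		.sort((a, b) => b.wins - a.wins || b.totalScore - a.totalScore);
	const champions = leaderboard.filter((row) => row.wins >= WINS_TO_TAKE_TOURNAMENT);
	return { leaderboard, tournamentOver: champions.length > 0, champions };
}

const standings = {};
recordTournamentRound(standings, { Genghis: 40, Lawnmower: 25 });
// A tied round: both bots receive a win.
recordTournamentRound(standings, { Genghis: 30, Lawnmower: 30 });
const result = recordTournamentRound(standings, { Genghis: 35, Lawnmower: 20 });
console.log(result.tournamentOver, result.champions.map((row) => row.bot));
```

Keeping the rule in one function that takes plain objects also makes it checkable without a browser, which is relevant given that Claude could not run the game itself.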

#4: Claude’s high tolerance for bullshit.

Claude can withstand an incredible amount of imprecise communication. As our vibe-coding session continued delivering consistently and impressively, I began occasionally testing Claude’s inference abilities. Where I had started days before with an incredibly polite, precise, and friendly demeanor, I wondered how careful my communication really had to be. Some examples that Claude handled perfectly:

> change the game mode slider to be a radio button for the three options

> lawnmower should work like a lawnmore that's slowly cutting grass inward, when it is on a track that is fully it's color, it should move inward one column and row.

> commit, and then suggest some new powerup types we could do

> when the game is in battle royale mode, make the title of the score card say "Battle Royale"

> now the settings panel is too tall and i can't scroll up and down to see it all, can we make it where wcrolling the page will make it possible to go up and down the settings panel view? right now its like the page behind the settings panel moves up and down when i drag with my finger but the settings panel is floating above the page never moving

> make the score part of the score card on mobile be two columns, left side is the top N bots by score, right side is the bottom N, where each column has roughly half of the bot names

> for some reason the powerup sprites are not scaling with the canvas now? they're sometimes off the board?

> update spec and commit

Impressively, Claude was not once confused by any directive I communicated, and it would often ask clarifying questions if my directives contradicted or complicated previous instructions. Misspelling did not matter. Punctuation did not matter. Casually referring to a “spec” without explicitly saying “misc/specifications.md” did not affect Claude’s excellent accuracy in any case. In short, communicating imprecisely with Claude seemed to work better than it does with humans.

Issues with Claude

While working with Claude, I’ve encountered a few unexpected issues, several of which are probably easily remedied with very little continued effort.

Issue #1: Sloppy code.

I suppose if one is vibe-coding a proof of concept, quality of the underlying code should not matter too much, but in the case of the game, I had Claude code on its own and the spaghetti quickly became, in my opinion, unacceptable. While much of the code produced is legible, occasional impenetrable job-security passages such as the following from drunk-jezzball.js were generated as well:

chooseNewAxis() {
	const axes = [
		[[-1, 0], [1, 0]],
		[[0, -1], [0, 1]],
		[[-1,-1], [1, 1]],
		[[-1, 1], [1,-1]],
	];
	const pair = axes[Math.floor(Math.random() * axes.length)];
	if (Math.random() < 0.5) {
		[this.dc, this.dr] = pair[0];
		[this.partner.dc, this.partner.dr] = pair[1];
	} else {
		[this.dc, this.dr] = pair[1];
		[this.partner.dc, this.partner.dr] = pair[0];
	}
	this.stuck = false;
	this.partner.stuck = false;
}

Perhaps to a proper game engineer, the matrix shown toward the top of the code block is standard boilerplate of some sort, but to me it may as well be written in a foreign language. Further, what is a ‘dc’ or a ‘dr’? The generated code often favored very succinct variable names such as these that’d be intermixed with variables that were named much more clearly.
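For comparison, here is a sketch of how the same logic might read with more explicit names. It rests on a guess that `dc` and `dr` mean “column delta” and “row delta”, which the source does not confirm; `OPPOSING_DIRECTION_PAIRS`, `columnDelta`, `rowDelta`, and the injectable `random` parameter are all hypothetical, and the function is written standalone rather than as a class method.

```javascript
// Assumption: each entry is [columnDelta, rowDelta], and each axis lists the
// two opposite directions of travel along it. All names here are hypothetical.
const OPPOSING_DIRECTION_PAIRS = [
	{ axis: 'horizontal', directions: [[-1, 0], [1, 0]] },
	{ axis: 'vertical', directions: [[0, -1], [0, 1]] },
	{ axis: 'diagonal', directions: [[-1, -1], [1, 1]] },
	{ axis: 'anti-diagonal', directions: [[-1, 1], [1, -1]] },
];

// Picks a random axis, then sends the bot one way along it and its partner
// the opposite way. `random` is injectable so the choice can be tested.
function chooseNewAxis(bot, random = Math.random) {
	const index = Math.floor(random() * OPPOSING_DIRECTION_PAIRS.length);
	const { directions } = OPPOSING_DIRECTION_PAIRS[index];
	// Coin flip decides which of the two opposite directions the bot takes.
	const [mine, partners] = random() < 0.5 ? directions : [directions[1], directions[0]];
	[bot.columnDelta, bot.rowDelta] = mine;
	[bot.partner.columnDelta, bot.partner.rowDelta] = partners;
	bot.stuck = false;
	bot.partner.stuck = false;
}

const bot = { stuck: true, partner: { stuck: true } };
chooseNewAxis(bot);
console.log(bot.columnDelta, bot.rowDelta, bot.partner.columnDelta, bot.partner.rowDelta);
```

Whichever axis is chosen, the partner’s deltas are the negation of the bot’s, so the two always head in opposite directions.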

Another source of frustration that surfaced as we added new game modes to the project was that Claude’s implementation became sloppier. I would not be surprised if this issue could have been avoided had I given Claude a well-defined, holistic specification to work from at the very start of the session rather than randomly springing the idea of significant changes in the middle of the project’s completion. To me, these hacks look not at all dissimilar to not only code I’ve witnessed, but code I myself have written when previously unknown scope balloons while time constraints remain unchanged. An excerpt from the project’s game.js file, where 90% of the game’s code resides:

if (this.gameMode === 1) {
	this.tournamentRound++;
}

const scorePanelTitle = document.querySelector('.score-panel-title');
if (scorePanelTitle) {
	scorePanelTitle.textContent = this.gameMode === 1 ? `round ${this.tournamentRound}` : this.gameMode === 2 ? 'Battle Royale' : 'Scores';
}

if (this.gameMode === 2) {
	this.activeBounds = { minCol: 0, maxCol: COLS - 1, minRow: 0, maxRow: ROWS - 1 };
	this.shrinkInterval = setInterval(() => this.shrinkPerimeter(), 10000);
} else {
	this.activeBounds = null;
}

Here, if it were me, at a minimum, gameMode constants would be well-named, and I wouldn’t personally nest multiple ternary operators for the score panel’s title. There are many such strange decisions in the code that I’m inclined to believe are so nonsensical that only a human could have written them and Claude simply copy/pasted.
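As a minimal sketch of that preference, the excerpt’s title logic could be expressed with named constants and a `switch`. The excerpt only shows that mode 1 increments the tournament round and mode 2 is Battle Royale; treating 0 as the default mode is a guess, and `GameMode` and `scorePanelTitleFor` are hypothetical names.

```javascript
// Assumed mapping: 1 and 2 are inferred from the excerpt above; 0 as the
// default mode is a guess. Every name here is hypothetical.
const GameMode = Object.freeze({ STANDARD: 0, TOURNAMENT: 1, BATTLE_ROYALE: 2 });

// Returns the score panel title for a mode, replacing the nested ternary.
function scorePanelTitleFor(gameMode, tournamentRound) {
	switch (gameMode) {
		case GameMode.TOURNAMENT:
			return `round ${tournamentRound}`;
		case GameMode.BATTLE_ROYALE:
			return 'Battle Royale';
		default:
			return 'Scores';
	}
}

console.log(scorePanelTitleFor(GameMode.TOURNAMENT, 2)); // round 2
console.log(scorePanelTitleFor(GameMode.BATTLE_ROYALE, 0)); // Battle Royale
console.log(scorePanelTitleFor(GameMode.STANDARD, 0)); // Scores
```

The checks elsewhere in the excerpt would then read `this.gameMode === GameMode.TOURNAMENT` rather than `this.gameMode === 1`.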

I presume some of the issues I have with this code likely could have been addressed by directing Claude to make the variable naming more explicit and/or add comments inline, but for the sake of unadulterated purity, I’ve left these issues alone in the final product in case other engineers are curious about the default behaviors shown here.

If an experiment in having Claude elongate variable names and add verbose commentary to the code was successful, I would then ask Claude to add notes about these preferences into the project’s CLAUDE.md file for future automatic reference when a new Claude session is started.

A closing thought regarding the code’s structure: I explicitly required Claude to create the project using vanilla JavaScript rather than a more appropriate choice such as TypeScript, as my preference for the end product was that someone can simply copy/paste the project’s source files and use them without installing Node.js or some other such unnecessarily complicated tooling for this use case.

My working theory / curiosity is that Claude’s software engineering is a product of the industry’s output that it feeds upon. That is, vanilla JavaScript isn’t all that popular, video game developers are notorious for dense, hard-to-follow code, and engineers such as myself who prefer vanilla JavaScript when possible are typically doing so out of a characteristic bout of laziness. Therefore, Claude creating somewhat half-assed and hard to read code isn’t all that surprising given the corpus the output is based on.

My theory that the code is a product of the sources the model fed upon is further bolstered by the anecdotal example of a small project my friend created with Claude. His project, a React-based kitchen planning app, benefits from a clear and decisive CLAUDE.md file that contains explicit directions for Claude regarding the placement and content of various source files, shorthand descriptions of database table layout, and a number of smart default directives preventing Claude from getting too far off the script, such as ‘Do not generate exact grocery quantities’.

Perhaps the land of vanilla JavaScript tech demos is not as mature or elegant as the mountains of properly formatted and documented React source code in the world, and thus Claude’s contributions to the proof-of-concept game were less impressive than its work in other projects, but a part of me wonders if Claude inferred my tolerance for trashy code in this instance from the sloppy conversational style that I began the game project with.

Issue #2: Specifications are non-ideal.

To Claude’s credit, Claude will write specifications and allow me to conversationally work through the spec with it, which is more than I can say for my own typical work preferences, as it is very hard to write software specifications – and it’s boring, too.

Ultimately, I’m incredibly impressed with Claude’s ability to write very precise specifications with very little error for a fairly small project, but in my spec-first experiment on my previous project, I found that Claude needs to be repeatedly asked to refine the spec and check for errors.

As we refined the spec, it became increasingly clear that Claude was having trouble keeping the whole picture in mind, as it’d occasionally be confused about something it’d written clearly in the document earlier. For clarity, in each of these cases, after working through an iteration of improvements, I’d ask Claude quite simply to re-read the specification and suggest revisions. Perhaps that experience was an artifact of the project at hand’s complexity, but given my lack of experience thus far, I’m not sure.

Issue #3: Complexity.

As mentioned previously, while vibe-coding this relatively simple and tiny JavaScript game, I burned through an astonishing number of tokens. The very simple project consumed one full five-hour token allowance, and 40% of a second five-hour window. Further, on my limited subscription, the project ate 70% of my entire weekly token allowance as well.

I’m currently on what Anthropic is calling the ‘Pro’ account plan that runs $20 per month here in the before times, while AI vendors are underpricing their offerings in hopes of adoption. Currently, other plans exist with 5x ($100) and 20x ($200) as many tokens as my current plan, allegedly.

The definition of a ‘token’ and the nebulous, undisclosed number of tokens available for any given Claude session seem a bit hand-wavy currently, but I presume this is either a measure taken to obscure the true cost of Claude usage, or perhaps proper per-usage observability for LLMs is simply too expensive to implement. Conspiratorially, given the zeitgeist’s current witch hunt, it’s probably in Anthropic’s best interest to not tell me the precise number of kilowatt-hours I’m burning when I tell the robot, “thank you.”

Regardless of the reason, the ‘Pro’ subscription I’m currently on is so limited that I cannot realistically expect to do anything more complex than incredibly simplistic tech demos and limited experiments, and I’m not sure that upgrading to the 5x $100 plan would work well for me either, as a simple Ralph Loop agentically chugging through a task list with dozens of tasks for a complex project would likely chew through a 5x allotment within a single day.

Enterprise-level subscription plans presumably exist for Claude, though I’ve not done the research to determine if that’s true, mainly because I would not be able to independently afford a multi-thousand-dollar usage-based bill should a Ralph Loop develop a mind of its own while I step out for a coffee break.

Still, even with an unlimited number of tokens at my disposal, as the complexity of the game increased, Claude’s token usage and thinking time seemed to grow larger and larger, though I haven’t truly dug in and graphed the data to prove this perception. Regardless, Claude taking a moment to process a game with fewer than two thousand lines of (admittedly more complex than usual) code doesn’t seem to bode well for much larger projects.

Issue #4: Software Engineering has permanently changed.

I don’t have anything useful to add to the prevalent lamentation here, except to say that I personally very quickly experienced not only the excitement at AI’s potential to make my job easier, more efficient, and less error-prone, but I also experienced the whiplash.

The software engineering world as we’ve previously understood it is a relic of the past. Like our mentors who cut their teeth on debugging literal ones and zeros in memory during the assembly days, the amount of hands-on-keyboard time required to get shit done has now decreased dramatically. I’m here for it, personally.

I think the AI revolution for our industry, just the software engineering aspect alone, is going to bring a future that was millions of years away to our present-day doorstep with great ease, just like Excel and Windows did. The jury’s still out on precisely measuring engineering productivity increases with scientific accuracy, but with my limited experience already, I am excited to see creation of simple and even complex software become democratized and accessible to anyone with an idea.

As to the argument of privacy, I’m guessing that we’ll be running present-day frontier models of AI on our own commodity consumer-grade hardware soon. Unless I’m seriously underestimating the amount of hardware Anthropic is subsidizing with my monthly $20 subscription, I think we’ll all be vibe-coding very complex software alarmingly soon. Once the model is running on our personal computers, assuming they work without a suspicious internet connection requirement, privacy will be a non-issue.

After decades of seeing various forms of WYSIWYG editors and other such snake oil come and go, I was inclined to dismiss the hype, but after my limited experience so far, I think vibe coding is here to stay. In the near future, this means less time spent hand-crafting proof-of-concept non-functional screenshots and UML graphics in PowerPoint, and instead we will be casually building fully-functional software prototypes to show at a single afternoon meeting before discarding the project entirely. I can’t wait.