Like everyone else, we have been using AI in various forms for a while now: asking ChatGPT to write a function, asking it to explain an error, then graduating to running it on our code in the IDE, and finally to full-blown independent coding agents.
Recently, we shifted into a much higher gear, rolling it out across most of the teams at RavenDB. I want to talk specifically about what that looks like in practice in real production software.
RavenDB is a mature codebase, with about 18 years of history behind it. The core team is a few dozen developers working on this full-time. We also care very deeply about correctness, performance, and maintainability.
With all the noise about Claude, Codex, and their ilk recently, we decided to run some experiments to see how we can leverage them to help us build RavenDB.
The numbers that got my attention
We started with features that were relatively self-contained — ambitious enough to be real work, but isolated enough that an AI agent could take them end-to-end without stepping on core aspects of RavenDB.
The first one was estimated at about a month of work for a senior developer. We completed it in two days. To be fair, a significant portion of that time was spent learning how to work effectively with Claude as an agent: the ropes, the discipline, and the workflows, not just the task itself.
The second was estimated at roughly three months for an initial version. It was delivered in about a week. And we didn't just hit the target — we significantly exceeded the planned feature set.
In terms of efficiency, we are talking about a proper leap beyond anything we could previously expect.
This isn't vibe coding
I want to be direct about something: this is not "prompt it and ship it." There is real discipline required here. The AI can move very fast, explore a lot of ground, and generate code that looks right but isn't. Code ownership and engineering responsibility don't go away; they become much more demanding.
I personally sat and read 30,000 lines of code. I had to understand what was there, push back on decisions, redirect the approach, and enforce the standards that RavenDB has built up over many years.
Those 30,000 lines of code didn't appear out of thin air. They were the final result of a lot of planning, back and forth with the agent, and incremental steps in the right direction (along with plenty of wrong ones).
To be fair, 30,000 lines of code sounds like a lot, right? About 60% of that is actually tests, and roughly half of the remaining code is boilerplate infrastructure that we need to have but that isn't really interesting.
The juicy parts come to only around 5,000 lines or so.
In many respects, this isn’t prompt-and-go but feels a lot more like a pair programming session on steroids.
What AI agents give you is the ability to explore the problem space cheaply and quickly. After we had something built, I had a different idea about how to implement it. So I asked the agent to try that approach, and it gave me something concrete that I could actually explore.
Being able to evaluate multiple different approaches to a solution is crazy valuable. It is transformative for architectural decisions.
Having said that, handing all the boilerplate work to the coding agent meant that I was able to focus on the “fun parts”: the pieces that actually add the most value, rather than everything else I would otherwise need to do to get to that part.
What this means going forward
AI agents are going to amplify your existing engineering culture, for better or worse.
A lot of the cost of writing good software is going to move from actually writing code to reviewing it. For many people, the act of writing the code was also the part where they thought about it most deeply.
The thinking now moves either upfront, to the planning phase, or to the end, when you look at the pull request. Previously, when reading a pull request, you could reasonably expect to see code that had already been reasoned about and properly tamed.
With an agent, the review is sometimes the first time a human actually walks through the whole thing. To ensure proper quality, you need to shift much of your focus to that stage.
The bottleneck for good software is going to be the review cycle, the architectural approach, and an experienced team that can actually evaluate the output and ensure consistent high quality.
Without that, just generating code quickly is a losing proposition. You'll go very fast, straight into a painful collision with a wall.
We are still settling in and trying to work out the best approach to take, but I have to say that this experiment was a major success.
