I’m trying to reason about the behavior of this code, and I can’t decide if this is a stroke of genius or if I’m suffering from a stroke. Take a look at the code, and then I’ll discuss what I’m trying to do below:
The idea is pretty simple: I’m opening the same file twice. Once in buffered mode, mapping that memory for both reads & writes. The problem is that to flush the data to disk, I have to either wait for the OS or call FlushViewOfFile() and FlushFileBuffers() to explicitly flush it to disk.
The problem with this approach is that FlushFileBuffers() has undesirable side effects. So I’m opening the file again, this time for unbuffered I/O. I write to the memory map, and then use that same mapped memory as the buffer for an unbuffered write to the file itself. On Windows, unbuffered I/O goes through a separate path (and may lose coherence with the memory map).
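To make the setup concrete, here is a minimal sketch of the dual-handle idea (my illustration, not the code the post refers to; the file name and sizes are made up, and FILE_FLAG_NO_BUFFERING requires sector-aligned offsets, lengths, and buffer addresses):

#include <windows.h>

int main(void) {
    // Handle #1: regular buffered I/O, used only for the memory mapping.
    HANDLE hBuffered = CreateFileA("R:/data.bin", GENERIC_READ | GENERIC_WRITE,
        FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
        OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

    // Handle #2: the same file, opened for unbuffered write-through I/O.
    HANDLE hDirect = CreateFileA("R:/data.bin", GENERIC_READ | GENERIC_WRITE,
        FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
        OPEN_EXISTING, FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH, NULL);

    HANDLE hMap = CreateFileMapping(hBuffered, NULL, PAGE_READWRITE, 0, 0, NULL);
    char* view = MapViewOfFile(hMap, FILE_MAP_WRITE, 0, 0, 0);

    view[0]++; // modify the data through the mapping...

    // ...then write the same bytes back through the unbuffered handle,
    // instead of calling FlushViewOfFile() + FlushFileBuffers().
    DWORD written;
    WriteFile(hDirect, view, 4096, &written, NULL);
    return 0;
}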
The idea here is that since I’m writing from the same location, I can’t lose coherence. I either get the value from the file or from the memory map, and they are both the same. At least, that is what I hope will happen.
For the purpose of discussion, I can ensure that there is no one else writing to this file while I’m abusing the system in this manner. What do you think Windows will do in this case?
I believe that when I’m writing using unbuffered I/O in this manner, I’m forcing the OS to drop the mapping and refresh from the disk. That is likely the reason why it may lose coherence: there may already be reads in flight that aren’t served from main memory, or something like that.
This isn’t an approach that I would actually take for production usage, but it is a damn interesting thing to speculate on. If you have any idea what will actually happen, I would love to have your input.
I would really love to have a better understanding of what is going on here!
If you format a 32 MB disk using NTFS, you’ll get the following result:
So about 10 MB are taken for NTFS metadata. I guess that makes sense, and giving up 10 MB isn’t generally a big deal these days, so I wouldn’t worry about it.
I write a 20 MB file and punch a hole in it between 6 MB and 18 MB (12 MB in total), so we have:
And in terms of disk space, we have:
The numbers match, awesome! Let’s create a new 12 MB file, like so:
And the disk is:
And now I’m running the following code, which maps the first file (with the hole punched in it) and writes 4 MB to it using memory-mapped I/O:
HANDLE hMapFile = CreateFileMapping(hFile, NULL, PAGE_READWRITE, 0, 0, NULL);
if (hMapFile == NULL) {
    fprintf(stderr, "Could not create file mapping object: %x\n", GetLastError());
    exit(__LINE__);
}
char* lpMapAddress = MapViewOfFile(hMapFile, FILE_MAP_WRITE, 0, 0, 0);
if (lpMapAddress == NULL) {
    fprintf(stderr, "Could not map view of file: %x\n", GetLastError());
    exit(__LINE__);
}
for (i = 6 * MB; i < 10 * MB; i++) {
    ((char*)lpMapAddress)[i]++;
}
if (!FlushViewOfFile(lpMapAddress, 0)) {
    fprintf(stderr, "Could not flush view of file: %x\n", GetLastError());
    exit(__LINE__);
}
if (!FlushFileBuffers(hFile)) {
    fprintf(stderr, "Could not flush file buffers: %x\n", GetLastError());
    exit(__LINE__);
}
The end result for this file is:
So with the other file, we have a total of 24 MB in use on a 32 MB disk. And here is the state of the disk itself:
The problem is that there used to be 9.78 MB that were busy when we had a newly formatted disk. And now we are using at least some of that disk space for storing file data somehow.
I’m getting the same behavior when I use normal file I/O:
moveAmount.QuadPart = 6 * MB;
SetFilePointerEx(hFile, moveAmount, NULL, FILE_BEGIN);
for (i = 6; i < 10; i++) {
    if (!WriteFile(hFile, buffer, MB, &bytesWritten, NULL)) {
        fprintf(stderr, "WriteFile failed on iteration %d: %x\n", i, GetLastError());
        exit(__LINE__);
    }
}
So somehow in this sequence of operations, we get more disk space. On the other hand, if I try to write just 22 MB into a single file, it fails. See:
hFile = CreateFileA("R:/original_file.bin", GENERIC_READ | GENERIC_WRITE, 0, NULL,
                    CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
if (hFile == INVALID_HANDLE_VALUE) {
    printf("Error creating file: %d\n", GetLastError());
    exit(__LINE__);
}
for (int i = 0; i < 22; i++) {
    if (!WriteFile(hFile, buffer, MB, &bytesWritten, NULL)) {
        fprintf(stderr, "WriteFile failed on iteration %d: %x\n", i, GetLastError());
        exit(__LINE__);
    }
}
You can find the full source here. I would love to understand what exactly is happening and how we suddenly get more usable disk space in this scenario.
Today I set out to figure out an answer to a very specific question. What happens at the OS level when you try to allocate disk space for a sparse file and there is no additional disk space?
Sparse files are a fairly advanced feature of file systems. They allow you to define a file whose size is 10GB, but that only takes 2GB of actual disk space. The rest is sparse (takes no disk space and on read will return just zeroes). The OS will automatically allocate additional disk space for you if you write to the sparse ranges.
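As a quick illustration (mine, not from the post), this is what “sparse” looks like at the API level on Linux: the logical size comes from st_size, while the actual allocation shows up in st_blocks:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("/tmp/sparse.bin", O_CREAT | O_RDWR | O_TRUNC, 0644);

    // Declare a 1 GB file without allocating any data blocks.
    ftruncate(fd, 1024L * 1024 * 1024);

    // Write 1 MB somewhere in the middle; only that range gets real blocks.
    char buf[4096];
    memset(buf, 'x', sizeof(buf));
    for (long i = 0; i < 256; i++)
        pwrite(fd, buf, sizeof(buf), 512L * 1024 * 1024 + i * sizeof(buf));

    struct stat st;
    fstat(fd, &st);
    printf("logical size: %lld bytes, allocated: %lld bytes\n",
           (long long)st.st_size, (long long)st.st_blocks * 512);
    close(fd);
    return 0;
}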
This leads to an interesting question, what happens when you write to a sparse file if there is no additional disk space?
Let’s look at the problem on Linux first. We define a RAM disk with 32MB, like so:
As expected, this code will fail on the 5th write (since there is no disk space left to allocate on the disk). The error would be:
Write error: errno = 28 (No space left on device)
Here is what the file system reports:
$ du -h /mnt/ramdisk/*
4.0M    /mnt/ramdisk/anotherfile
28M     /mnt/ramdisk/fullfile

$ ll -h /mnt/ramdisk/
total 33M
drwxrwxrwt 2 root   root     80 Jan  9 10:43 ./
drwxr-xr-x 6 root   root   4.0K Jan  9 10:30 ../
-rw-r--r-- 1 ayende ayende 4.0M Jan  9 10:43 anotherfile
-rw-r--r-- 1 ayende ayende  32M Jan  9 10:43 fullfile
As you can see, du reports a total of 32 MB of actual usage, but ll shows files whose combined size is larger than that (because of the hole punching).
What would happen if we were to run this using memory-mapped I/O? Here is the code:
This will lead to an interesting scenario. We need to allocate disk space for the pages being written (note that we are writing into the hole), and this code will fail with a segmentation fault.
It will fail inside the loop, by the way: as part of the page fault that brings the memory in, the file system needs to allocate the disk space. If there is no disk space available, the fault fails. The only thing the OS can do in this case is fail the write, which shows up as a segmentation fault.
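Roughly, the memory-mapped variant looks like this (a sketch, not the exact code from the post; the offsets are illustrative). The loop is where the process gets killed, because the page fault is where the file system has to find space for the hole:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define MB (1024 * 1024)

int main(void) {
    int fd = open("/mnt/ramdisk/fullfile", O_RDWR);
    if (fd < 0) { perror("open"); exit(1); }

    char* map = mmap(NULL, 32L * MB, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); exit(1); }

    // Writing into the punched hole forces the file system to allocate
    // blocks during the page fault; with no free space left, this is the
    // point where the process dies with a fatal signal.
    for (long i = 6L * MB; i < 18L * MB; i++)
        map[i]++;

    msync(map, 32L * MB, MS_SYNC);
    return 0;
}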
I also tried that on Windows. I defined a virtual disk like so:
This creates a 32MB disk and assigns it the letter R. Note that we are using NTFS, which has its own metadata overhead, so we have roughly 21MB or so of usable disk space to play with here.
Here is the Windows code that simulates the same behavior as the Linux code above:
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>

#define MB (1024 * 1024)

int main() {
    HANDLE hFile, hFile2;
    DWORD bytesWritten;
    LARGE_INTEGER fileSize, moveAmount;
    char* buffer = malloc(MB);
    int i;

    DeleteFileA("R:\\original_file.bin");
    DeleteFileA("R:\\another_file.bin");

    hFile = CreateFileA("R:/original_file.bin", GENERIC_READ | GENERIC_WRITE, 0, NULL,
                        CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE) {
        printf("Error creating file: %d\n", GetLastError());
        exit(__LINE__);
    }

    for (int i = 0; i < 20; i++) {
        if (!WriteFile(hFile, buffer, MB, &bytesWritten, NULL)) {
            fprintf(stderr, "WriteFile failed on iteration %d: %x\n", i, GetLastError());
            exit(__LINE__);
        }
        if (bytesWritten != MB) {
            fprintf(stderr, "Failed to write full buffer on iteration %d\n", i);
            exit(__LINE__);
        }
    }

    FILE_ZERO_DATA_INFORMATION zeroDataInfo;
    zeroDataInfo.FileOffset.QuadPart = 6 * MB;
    zeroDataInfo.BeyondFinalZero.QuadPart = 18 * MB;

    if (!DeviceIoControl(hFile, FSCTL_SET_SPARSE, NULL, 0, NULL, 0, NULL, NULL) ||
        !DeviceIoControl(hFile, FSCTL_SET_ZERO_DATA, &zeroDataInfo, sizeof(zeroDataInfo), NULL, 0, NULL, NULL)) {
        printf("Error setting zero data: %d\n", GetLastError());
        exit(__LINE__);
    }

    // Create another file of size 4 MB
    hFile2 = CreateFileA("R:/another_file.bin", GENERIC_READ | GENERIC_WRITE, 0, NULL,
                         CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile2 == INVALID_HANDLE_VALUE) {
        printf("Error creating second file: %d\n", GetLastError());
        exit(__LINE__);
    }

    for (int i = 0; i < 4; i++) {
        if (!WriteFile(hFile2, buffer, MB, &bytesWritten, NULL)) {
            fprintf(stderr, "WriteFile 2 failed on iteration %d: %x\n", i, GetLastError());
            exit(__LINE__);
        }
        if (bytesWritten != MB) {
            fprintf(stderr, "Failed to write full buffer 2 on iteration %d\n", i);
            exit(__LINE__);
        }
    }

    moveAmount.QuadPart = 12 * MB;
    SetFilePointerEx(hFile, moveAmount, NULL, FILE_BEGIN);
    for (i = 0; i < 8; i++) {
        if (!WriteFile(hFile, buffer, MB, &bytesWritten, NULL)) {
            printf("Error writing to file: %d\n", GetLastError());
            exit(__LINE__);
        }
    }
    return 0;
}
And that gives us the exact same behavior as in Linux. One of these writes will fail because there is no more disk space for it. What about when we use memory-mapped I/O?
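The memory-mapped variant (again, a sketch rather than the post’s own code) follows the same pattern as the buffered test above; by analogy with the Linux behavior, the expectation is that the failure happens during the page fault and surfaces as a structured exception rather than a failed WriteFile() call:

HANDLE hMapFile = CreateFileMapping(hFile, NULL, PAGE_READWRITE, 0, 0, NULL);
char* lpMapAddress = MapViewOfFile(hMapFile, FILE_MAP_WRITE, 0, 0, 0);
for (long long i = 12LL * MB; i < 20LL * MB; i++) {
    lpMapAddress[i]++; // the disk space has to be allocated here, on the page fault
}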
I didn’t bother checking Mac or BSD, but I’m assuming that they behave in the same manner. I can’t conceive of anything else that they could reasonably do.
RavenDB is a database, a transactional one. This means that we have to reach the disk and wait for it to complete persisting the data to stable storage before we can confirm a transaction commit. That represents a major challenge for ensuring high performance because disks are slow.
I’m talking about disks of all kinds: rate-limited cloud disks, HDDs, SSDs, or even NVMe drives. From the perspective of the database, all of them are slow. RavenDB spends a lot of time and effort making the system run fast, even though the disk is slow.
An interesting problem we routinely encounter is that our test suite would literally cause disks to fail because we stress them beyond warranty limits. We actually keep a couple of those around, drives that have been stressed to the breaking point, because it lets us test unusual I/O patterns.
We recently ran into strange benchmark results, and during the investigation, we realized we are actually running on one of those burnt-out drives. Here is what the performance looks like when writing 100K documents as fast as we can (10 active threads):
As you can see, there is a huge variance in the results. To understand exactly why, we need to dig a bit deeper into how RavenDB handles I/O. You can observe this in the I/O Stats tab in the RavenDB Studio:
There are actually three separate (and concurrent) sets of I/O operations that RavenDB uses:
Blue - journal writes - unbuffered direct I/O - in the critical path for transaction performance because this is how RavenDB ensures that the D(urability) in ACID is maintained.
Green - flushes - where RavenDB writes the modified data to the data file (until the flush, the modifications are kept in scratch buffers).
Red - sync - forcing the data to reside in a persistent medium using fsync().
The writes to the journal (blue) are the most important ones for performance, since we must wait for them to complete successfully before we can acknowledge that the transaction was committed. The other two ensure that the data actually reached the file and that we have safely stored it.
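This is not RavenDB’s actual code, but in Linux terms the three streams above have roughly this shape (assuming the journal handle was opened with O_DIRECT | O_DSYNC, which also requires aligned buffers and sizes):

#include <fcntl.h>
#include <unistd.h>

// Blue: journal write. With O_DIRECT | O_DSYNC the call returns only once the
// bytes are durable, so the transaction cannot be acknowledged before it completes.
void write_journal(int journal_fd, const void* buf, size_t len) {
    write(journal_fd, buf, len);
}

// Green: flush. Copy modified pages from the scratch buffers into the data
// file. Regular buffered I/O; the pages may linger in the page cache.
void flush_to_data_file(int data_fd, const void* buf, size_t len, off_t offset) {
    pwrite(data_fd, buf, len, offset);
}

// Red: sync. Force whatever the flushes wrote to reach stable storage.
void sync_data_file(int data_fd) {
    fsync(data_fd);
}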
It turns out that there is an interesting interaction between those three types. Both flushes (green) and syncs (red) can run concurrently with journal writes. But on bad disks, we may end up saturating the entire I/O bandwidth for the journal writes while we are flushing or syncing.
In other words, the background work will impact the system performance. That only happens when you reach the physical limits of the hardware, but it is actually quite common when running in the cloud.
To handle this scenario, RavenDB does a number of what I can only describe as shenanigans. Conceptually, here is how RavenDB works:
def txn_merger(self):
    while self._running:
        with self.open_tx() as tx:
            while tx.total_size < MAX_TX_SIZE and tx.time < MAX_TX_TIME:
                curOp = self._operations.take()
                if curOp is None:
                    break  # no more operations
                curOp.exec(tx)
            tx.commit()
            # here we notify the operations that we are done
            tx.notify_ops_completed()
The idea is that you submit the operation to the transaction merger, which can significantly improve performance by merging multiple operations into a single disk write. The actual operations wait to be notified (which happens after the transaction successfully commits).
If you want to know more about this, I have a full blog post on the topic. There is a lot of code to handle all sorts of edge cases, but that is basically the story.
Notice that processing a transaction is actually composed of two steps. First, there is the execution of the transaction operations (which reside in the _operations queue), and then there is the actual commit(), where we write to the disk. It is the commit portion that takes a lot of time.
Here is what the timeline will look like in this model:
We execute the transaction, then wait for the disk. This means that we are unable to saturate either the disk or the CPU. That is a waste.
To address that, RavenDB supports async commits (sometimes called early lock release). The idea is that while we are committing the previous transaction, we execute the next one. The code for that is something like this:
def txn_merger(self):
    prev_txn = completed_txn()
    while self._running:
        executedOps = []
        with self.open_tx() as tx:
            while tx.total_size < MAX_TX_SIZE and tx.time < MAX_TX_TIME:
                curOp = self._operations.take()
                if curOp is None:
                    break  # no more operations
                executedOps.append(curOp)
                curOp.exec(tx)
                if prev_txn.completed:
                    break
            # verify success of previous commit
            prev_txn.end_commit()
            # only here we notify the operations that we are done
            prev_txn.notify_ops_completed()
            # start the commit in async manner
            prev_txn = tx.begin_commit()
The idea is that we start writing to the disk, and while that is happening, we are already processing the operations in the next transaction. In other words, this allows both writing to the disk and executing the transaction operations to happen concurrently. Here is what this looks like:
This change has a huge impact on overall performance. Especially because it can smooth out a slow disk by allowing us to process the operations in the transactions while waiting for the disk. I wrote about this as well in the past.
So far, so good, this is how RavenDB has behaved for about a decade or so. So what is the performance optimization?
This deserves an explanation. What this piece of code does is determine whether the transaction will complete in a synchronous or asynchronous manner. It used to do that based on whether there were more operations to process in the queue. If we completed a transaction and needed to decide whether to complete it asynchronously, we would check if there are additional operations in the queue (currentOperationsCount).
The change modifies the logic so that we complete in an async manner if we executed any operation. The change is minor but has a really important effect on the system. The idea is that if we are going to write to the disk (since we have operations to commit), we’ll always complete in an async manner, even if there are no more operations in the queue.
The result is that the next operation starts processing immediately, instead of waiting for the commit to complete and only then starting to process. It is such a small change, but it had a huge impact on the system performance.
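To make the change concrete, here is the decision rule in pseudocode (C syntax, invented names; an illustration of the rule, not the actual RavenDB source):

#include <stdbool.h>

// Old rule: complete asynchronously only if more operations are already queued.
bool should_complete_async_old(int queued_operations) {
    return queued_operations > 0;
}

// New rule: complete asynchronously whenever this transaction executed anything,
// so the next batch starts executing while the commit is still being written.
bool should_complete_async_new(int executed_operations) {
    return executed_operations > 0;
}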
Here you can see the effect of this change when writing 100K docs with 10 threads. We tested it on both a good disk and a bad one, and the results are really interesting.
The bad disk chokes when we push a lot of data through it (gray line), and you can see it struggling to pick up. On the same disk, using the async version (yellow line), you can see it still struggles (because eventually, you need to hit the disk), but it is able to sustain much higher numbers and complete far more quickly (the yellow line ends before the gray one).
On the good disk, which is able to sustain the entire load, we are still seeing an improvement (Blue is the new version, Orange is the old one). We aren’t sure yet why the initial stage is slower (maybe just because this is the first test we ran), but even with the slower start, it was able to complete more quickly because its throughput is higher.
In RavenDB, we really care about performance. That means that our typical code does not follow idiomatic C# code. Instead, we make use of everything that the framework and the language give us to eke out that additional push for performance. Recently we ran into a bug that was quite puzzling. Here is a simple reproduction of the problem:
using System.Runtime.InteropServices;

var counts = new Dictionary<int, int>();
var totalKey = 10_000;

ref var total = ref CollectionsMarshal.GetValueRefOrAddDefault(
    counts, totalKey, out _);

for (int i = 0; i < 4; i++)
{
    var key = i % 32;
    ref var count = ref CollectionsMarshal.GetValueRefOrAddDefault(
        counts, key, out _);
    count++;
    total++;
}

Console.WriteLine(counts[totalKey]);
What would you expect this code to output? We are using two important features of C# here:
Value types (in this case, an int, but the real scenario was with a struct)
CollectionsMarshal.GetValueRefOrAddDefault()
The latter method is a way to avoid performing two dictionary lookups: one to get the value if it exists, and another to add or modify it.
If you run the code above, it will output the number 2.
That is not expected, but when I sat down and thought about it, it made sense.
We are keeping track of the reference to a value in the dictionary, and we are mutating the dictionary.
The documentation for the method very clearly explains that this is a Bad Idea. It is an easy mistake to make, but still a mistake. The challenge here is figuring out why this is happening. Can you give it a minute of thought and see if you can figure it out?
A dictionary is basically an array that you access using an index (computed via a hash function), that is all. So if we strip everything away, the code above can be seen as:
var buffer = new int[2];
ref var total = ref buffer[0];
We simply have a reference to the first element in the array, that’s what this does behind the scenes. And when we insert items into the dictionary, we may need to allocate a bigger backing array for it, so this becomes:
var buffer = new int[2];
ref var total = ref buffer[0];
var newBuffer = new int[4];
buffer.CopyTo(newBuffer, 0);
buffer = newBuffer;
total = 1;
var newTotal = buffer[0]; // still 0 - the write through `total` went to the old array
In other words, the total variable is pointing to the first element in the two-element array, but we allocated a new array (and copied all the values). That is the reason why the code above gives the wrong result. Makes perfect sense, and yet, was quite puzzling to figure out.
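The same failure mode can be stated in plain C (an analogy, not the C# internals verbatim; here the old array is deliberately kept around, the way the GC keeps it alive as long as something references it): a pointer into the old backing array keeps working, but it no longer points at the live data.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    int* buffer = calloc(2, sizeof(int));
    int* total = &buffer[0];            // a "ref" into the current backing array

    // The "resize": allocate a bigger array and copy, just like the dictionary.
    int* bigger = calloc(4, sizeof(int));
    memcpy(bigger, buffer, 2 * sizeof(int));
    buffer = bigger;                    // the old array still exists, but is stale

    *total = 1;                         // lands in the old array, not in `buffer`
    printf("%d\n", buffer[0]);          // prints 0 - the update is lost
    return 0;
}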
I wanted to test low-level file-system behavior in preparation for a new feature for RavenDB. Specifically, I wanted to look into hole punching - where you can give low-level instructions to the file system to indicate that you’re giving up disk space, but without actually reducing the size of the file.
This can be very helpful in space management. If I have a section in the file that is full of zeroes, I can just tell the file system that, and it can skip storing that range of zeros on the disk entirely. This is an advanced feature for file systems. I haven't actually used that in the past, so I needed to gain some expertise with it.
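On Linux, the hole punching itself is a single call. Here is a sketch (with a made-up path and offsets): the file keeps its logical size while the punched range is returned to the file system:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define MB (1024L * 1024)

int main(void) {
    int fd = open("/tmp/datafile.bin", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    // Punch a hole: give 16 MB back to the file system, keep st_size as-is.
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 64 * MB, 16 * MB) != 0) {
        perror("fallocate");
        return 1;
    }
    close(fd);
    return 0;
}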
The code for Windows is here if you want to see it. I tested the feature on both Windows & Linux, and it worked. I could see that while the file size was 128MB, I was able to give back 16MB to the operating system without any issues. I turned the code above into a test and called it a day.
And then the CI build broke. But that wasn’t possible since I tested that. And there had been CI runs that did work on Linux. So I did the obvious thing and started running the code above in a loop.
I found something really annoying. This code worked, sometimes. And sometimes it just didn’t.
In order to get the size, I need to run this code:
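The original snippet isn’t reproduced here, but at the syscall level this usually comes down to stat() and st_blocks (which is reported in 512-byte units, regardless of the file system’s block size); something along these lines:

#include <stdio.h>
#include <sys/stat.h>

long long allocated_bytes(const char* path) {
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_blocks * 512; // actual allocation, not st_size
}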
I’m used to weirdness from file systems at this point, but this is really simple. All the data is 4KB aligned (in fact, all the data is 16MB aligned). There shouldn’t be any weirdness here.
As you can see, I’m already working at the level of Linux syscalls, but I used strace to check if there is something funky going on. Nope, there was a 1:1 mapping between the code and the actual system calls issued.
That means that I have to debug deeper if I want to understand what is going on. This involves debugging the Linux Kernel, which is a Big Task. Take a look at the code in the relevant link. I’m fairly certain that the issue is in those lines. The problem is that this cannot be, since both offset & length are aligned to 4KB.
I got out my crystal ball and thinking hat and meditated on this. If you’ll note, the difference between the expected and actual values is exactly 4KB. It almost looks like the file itself is not aligned on a 4KB boundary, but the holes must be.
Given that I just want to release this space to the operating system and 4KB is really small, I can adjust that as a fudge factor for the test. I would love to understand exactly what is going on, but so far the “file itself is not 4KB aligned, but holes are” is a good working hypothesis (even though my gut tells me it might be wrong).
If you know the actual reason for this, I would love to hear it.
And don't get me started on what happened with sparse files in macOS. There, the OS will randomly decide to mark some parts of your file as holes, making any deterministic testing really hard.
I’m currently deep in the process of modifying the internals of Voron, trying to eke out more performance out of the system. I’m making great progress, but I’m also touching parts of the code that haven’t even been looked at for a long time.
In other words, I’m mucking about with the most stable and most critical portions of the storage engine. It’s a lot of fun, and I’m actually seeing some great results, but it is also nerve-wracking.
We have enough tests that I have great confidence I would catch any actual stability issues, but the drive back toward a fully green build has been a slog.
The process is straightforward:
Change something.
Verify that it works better than before.
Run the entire test suite (upward of 30K tests) to see if there are any breaks.
The last part can be frustrating because it takes a while to run this sort of test suite. That would be bad enough, but some of the changes I made were things like marking a piece of memory that used to be read/write as read-only. Now any access to that memory would result in an access violation.
I fixed those in the code, of course, but we have a lot of tests, including some tests that intentionally corrupt data to verify that RavenDB behaves properly under those conditions.
One such test writes garbage to the RavenDB file, using read-write memory. The idea is to verify that the checksum matches on read and abort early. Because that test directly modifies what is now read-only memory, it generates a crash due to a memory access violation. That doesn’t just result in a test failure, it takes the whole process down.
I’ve gotten pretty good at debugging those sorts of issues (--blame-crash is fantastic) and was able to knock quite a few of them down and get them fixed.
And then there was this test, which uses encryption-at-rest. That test started to fail after my changes, and I was pretty confused about exactly what was going on. When trying to read data from disk, it would follow a pointer to an invalid location. That is not supposed to happen, obviously.
Looks like I have a little data corruption issue on my hands. The problem is that this shouldn’t be possible. Remember how we validate the checksum on read? When using encryption-at-rest, we are using a mechanism called AEAD (Authenticated Encryption with Associated Data). That means that in order to successfully decrypt a page of data from disk, it must have been cryptographically verified to be valid.
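As a standalone illustration of what AEAD gives you (using libsodium’s XChaCha20-Poly1305; this is not RavenDB’s code, just the same family of primitives), flipping a single byte makes decryption fail outright instead of returning garbage, which is exactly why “decrypts fine but is garbage” was so surprising:

#include <sodium.h>
#include <stdio.h>

int main(void) {
    if (sodium_init() < 0) return 1;

    unsigned char key[crypto_aead_xchacha20poly1305_ietf_KEYBYTES];
    unsigned char nonce[crypto_aead_xchacha20poly1305_ietf_NPUBBYTES];
    crypto_aead_xchacha20poly1305_ietf_keygen(key);
    randombytes_buf(nonce, sizeof nonce);

    const unsigned char msg[] = "a page of data";
    unsigned char cipher[sizeof msg + crypto_aead_xchacha20poly1305_ietf_ABYTES];
    unsigned long long cipher_len;
    crypto_aead_xchacha20poly1305_ietf_encrypt(cipher, &cipher_len,
        msg, sizeof msg, NULL, 0, NULL, nonce, key);

    cipher[0] ^= 0x01; // corrupt a single byte of the ciphertext

    unsigned char decrypted[sizeof msg];
    unsigned long long decrypted_len;
    int rc = crypto_aead_xchacha20poly1305_ietf_decrypt(decrypted, &decrypted_len,
        NULL, cipher, cipher_len, NULL, 0, nonce, key);

    printf("decrypt result: %d\n", rc); // -1: verification failed, no plaintext returned
    return 0;
}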
My test results showed, pretty conclusively, that I was generating valid data and then encrypting it. The next stage was to decrypt the data (verifying that it was valid), at which point I ended up with complete garbage.
RavenDB trusts that since the data was properly decrypted, it is valid and tries to use it. Because the data is garbage, that leads to… excitement. Once I realized what was going on, I was really confused. I’m pretty sure that I didn’t break 256-bit encryption, but I had a very clear chain of steps that led to valid data being decrypted (successfully!) to garbage.
It was also quite frustrating to track because any small-stage test that I wrote would return the expected results. It was only when I ran the entire system and stressed it that I got this weird scenario.
I started practicing for my Fields Medal acceptance speech while digging deeper. Something here had to be wrong. It took me a while to figure out what was going on, but eventually I tracked it down to registering for the TransactionCommit event when we open a new file.
The idea is that when we commit the transaction, we’ll encrypt all the data buffers and then write them to the file. We register for an event to handle that, and we used to do that on a per-file basis. My changes, among other things, moved that logic to apply globally.
As long as we were writing to a single file, everything just worked. When we had enough workload to need a second file, we would encrypt the data twice and then write it to the file. Upon decryption, we would successfully strip one layer of encryption, but end up with data that was still encrypted (looking like random fluff).
The fix was simply moving the event registration to the transaction level, not the file level. I committed my changes and went back to the unexciting life of bug-fixing, rather than encryption-breaking and math-defying hacks.
I’m trying to pay a SaaS bill online, and I ran into the following issue: I have insufficient permissions to pay the invoice on the account. Not insufficient funds, which is something you’ll routinely run into when dealing with payment processing, but insufficient permissions!
Is… paying an act that requires permissions? Is that something that happens? Can I get more vulnerabilities like that, where I get people to drive-by pay my bills?
I can’t think of a scenario where you would want to prevent someone from paying the provider. That is… weird.
And now I’m in this “nice” position where I have to chase after the provider to give them money, because otherwise they’ll close the account.
RavenDB is a .NET application, written in C#. It also has a non-trivial amount of unmanaged memory usage. We absolutely need that to get the level of performance that we require.
When managing memory manually, there is also the possibility that we’ll mess it up. We ran into one such case: when running our full test suite (over 10,000 tests), we would get random crashes due to heap corruption. Those issues are nasty, because there is a big separation between the root cause and the point where the problem manifests.
I recently learned that you can use the gflags tool on .NET executables. We were able to narrow the problem to a single scenario, but we still had no idea where the problem really occurred. So I installed the Debugging Tools for Windows and then executed:
What this does is enable a special debug heap at the executable level, which applies to all operations (managed and native memory alike).
With that enabled, I ran the scenario in question:
PS C:\Work\ravendb-6.0\test\Tryouts> C:\Work\ravendb-6.0\test\Tryouts\bin\release\net7.0\Tryouts.exe
42896
Starting to run 0
Max number of concurrent tests is: 16
Ignore request for setting processor affinity. Requested cores: 3. Number of cores on the machine: 32.
To attach debugger to test process (x64), use proc-id: 42896. Url http://127.0.0.1:51595
Ignore request for setting processor affinity. Requested cores: 3. Number of cores on the machine: 32.
License limits: A: 3/32. Total utilized cores: 3. Max licensed cores: 1024
http://127.0.0.1:51595/studio/index.html#databases/documents?&database=Should_correctly_reduce_after_updating_all_documents_1&withStop=true&disableAnalytics=true
Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at Sparrow.Server.Compression.Encoder3Gram`1[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].Encode(System.ReadOnlySpan`1<Byte>, System.Span`1<Byte>)
   at Sparrow.Server.Compression.HopeEncoder`1[[Sparrow.Server.Compression.Encoder3Gram`1[[System.__Canon, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], Sparrow.Server, Version=6.0.0.0, Culture=neutral, PublicKeyToken=37f41c7f99471593]].Encode(System.ReadOnlySpan`1<Byte> ByRef, System.Span`1<Byte> ByRef)
   at Voron.Data.CompactTrees.PersistentDictionary.ReplaceIfBetter[[Raven.Server.Documents.Indexes.Persistence.Corax.CoraxDocumentTrainEnumerator, Raven.Server, Version=6.0.0.0, Culture=neutral, PublicKeyToken=37f41c7f99471593],[Raven.Server.Documents.Indexes.Persistence.Corax.CoraxDocumentTrainEnumerator, Raven.Server, Version=6.0.0.0, Culture=neutral, PublicKeyToken=37f41c7f99471593]](Voron.Impl.LowLevelTransaction, Raven.Server.Documents.Indexes.Persistence.Corax.CoraxDocumentTrainEnumerator, Raven.Server.Documents.Indexes.Persistence.Corax.CoraxDocumentTrainEnumerator, Voron.Data.CompactTrees.PersistentDictionary)
   at Raven.Server.Documents.Indexes.Persistence.Corax.CoraxIndexPersistence.Initialize(Voron.StorageEnvironment)
That pinpointed things, so I was able to tell exactly where we were messing up.
I was also able to reproduce the behavior on the debugger:
This saved me hours or days of trying to figure out where the problem actually is.