Metrics hunt in RavenDB
You might have noticed that we are paying a lot of attention to operational concerns in RavenDB 3.0. This is especially true because we moved away from performance counters to Metrics.NET, which means that it is much easier and more lightweight to add metrics to RavenDB.
As a result, we are adding a lot of stuff that will be very useful for ops teams, from monitoring the duration of queries to the bandwidth available for replication to a host of other things.
What I wanted to ask is what kind of things do you want us to track?
Comments
One of the questions we're trying to answer at the moment is whether to make some big changes to our indexes. I'd like to answer this question: Over the last hour, how much CPU time, memory and I/O was spent on indexing (and ideally by which indexes)?
This way when I change my index and put it under some load, I can see whether my improved index is going to reduce memory/CPU/I/O or increase it.
(If there's a way to do this currently in Raven I'd love to know)
A report of which are the most frequent queries, which are the most expensive queries, and what operations are taking the most juice over a period of time would be great.
How about more info regarding long-running background file operations, like the status when deleting a large index?
More importantly though, query/index efficiency (or inefficiencies). Like if the entire document is being loaded to satisfy a query when a projection could (or should) have been requested/used instead. Or metrics to know if/when transformers are loading related documents. Or how many CPU and IO ops it takes to satisfy an index map/reduce result (to find indexes that are doing more work than expected).
Paul, You can see something pretty close to it in the indexing /stats. There is a performance option that lists the input and duration for the indexing. Calculating memory / CPU is pretty hard, because we don't really have access to it.
Afif, Queries in RavenDB tend to be pretty short, as far as actually querying the db. Most of the "expensive" queries we have seen are actually getting large amounts of data, and thus take a lot of time to send over the network.
Mufasa, We have added query details that will let you know how much time was spent executing the query on the index, how much loading the data from the database, etc. You can add this using ShowTimings() on the query.
@Ayende Not sure if you are interested, but I've created another .NET port of the Java Metrics library (the reasons are in the readme, but the main one is that Daniel's port is not actively developed anymore).
The code is available here: https://github.com/etishor/Metrics.NET
The docs are in the wiki: https://github.com/etishor/Metrics.NET/wiki
The NuGet package: Metrics.NET
My main focus was to provide the simplest api possible for the consumers of the library.
I would appreciate it if you could take a look and share your opinions.
Iulian, Take a look at my original post about Metrics.NET. A lot of the same issues apply to the code you have there.
Ayende,
When doing the port I actually used parts of your post as inspiration :)
The code targets .NET 4.5, 4.5.1 and Mono, and there is a separate branch with 4.0 support, so there is no thread sleep or thread waiting.
A metric does not depend on a type, it only has a string name. There is an overload method that can take a type parameter that is used to build the name based on the type, but that is completely optional. Since I've been using the lib I've mostly preferred explicit metric names that don't depend on the type name.
Also, there is no assumption of a static, fixed set of metrics. New metrics can be added to a registry at any time (take a look at how metrics for each request are added the first time the request is made in the NancyFx adapter). It is true that metrics can't be removed from a registry, but since metrics are cheap I can't see why you would remove a metric.
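To make the two points above concrete (metrics keyed only by a string name, and metrics registered lazily rather than declared up front), here is a minimal sketch. It is written in Python for brevity and is not the Metrics.NET API; `MetricsRegistry`, `Timer` and `handle_request` are hypothetical names invented for illustration.

```python
import threading
import time


class Timer:
    """Records how many times something ran and the total time spent."""

    def __init__(self):
        self.count = 0
        self.total_seconds = 0.0
        self._lock = threading.Lock()

    def record(self, seconds):
        with self._lock:
            self.count += 1
            self.total_seconds += seconds


class MetricsRegistry:
    """Hypothetical registry: metrics are keyed by a plain string name."""

    def __init__(self):
        self._timers = {}
        self._lock = threading.Lock()

    def timer(self, name):
        # Create the metric the first time the name is seen, then reuse it.
        with self._lock:
            if name not in self._timers:
                self._timers[name] = Timer()
            return self._timers[name]

    def names(self):
        with self._lock:
            return sorted(self._timers)


registry = MetricsRegistry()


def handle_request(route):
    # No up-front declaration: the per-route timer appears on first use.
    started = time.perf_counter()
    try:
        pass  # the real request handling would go here
    finally:
        registry.timer("requests." + route).record(time.perf_counter() - started)


handle_request("GET /docs")
handle_request("GET /docs")
handle_request("PUT /docs")
```

Nothing is ever removed from the registry in this sketch, which mirrors the trade-off described above: each entry is a small object, so keeping it around is cheap.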
For convenience a static class is used to configure metrics and one default metrics registry. This is because this is the most common use case.
I have not really targeted multi-registry scenarios (yet) but there should be nothing stopping you from using multiple registries that can be created or destroyed at any time. I've added a sample of how multi registry would be done here: https://github.com/etishor/Metrics.NET/blob/dev/Samples/Metrics.Samples/MultiRegistryMetrics.cs
It is on my todo list to improve the way you would manage multiple metric "sets" by providing utility classes or apis and also to improve the reporters/visualizers to account for multiple metric sets.
Don't get me wrong, I'm not saying you should use my lib, and I'm not trying to convince you of anything. But considering the scenario where you need metrics (RavenDB), and also considering your experience in building developer-friendly stuff, I'm very interested in fixing or improving anything you would consider an issue.
Thanks, iulian