ADO.Net Entity Framework: Round II
After my last post about the ADO.Net Entity Framework, I got a long comment from Pablo Castro, the ADO.NET Technical Lead. I took my time answering it, mainly because I wanted to consider my words carefully. Again, I'm probably not an impartial party in this matter, with my strong NHibernate bias, but I do have quite a bit of experience with Object Relational Mappers and how they map to real-world applications.
My previous post was mainly about extensibility in the framework, and how much I can extend it to fit the needs of the applications. So, without further ado :-), let me get to the points Pablo brought up:
- we are working on making sure that the data model itself is extensible so services can be built on top of the model and have service-specific metadata that's accessible through the metadata API and at the same time doesn't interfere with the code runtime that depends on it.
- we're also working finishing the details of the provider model so everyone can plug-in their stores into the entity framework.
I read this paragraph several times, and I am afraid that I'm lacking the background to understand exactly what he means here. The provider model, as I understand it, will allow using Oracle, MySQL, etc. as the database engine underlying the technology. The services are something I'm puzzled about; I'm not sure what a service is in this particular instance. Is the LINQ wrapper on the EDM a service? Or does it mean an application-level service?
- many to many relationships: it's not in the current bits, but it's not ruled out either as of yet. The system models certain things as link-tables already (which doesn't necessarily mean you have to use link tables for 1-n relationships in the actual store, of course, you can use the regular 2-table PK/FK story), although not all the details are in place for specifying and updating m-m relationships. We'll see were we land on this.
This worries me. Many to many is a very common idiom in databases and in domain models. Users and groups is a very simple example of this, but in any model there are many such relationships. I am aware of the amount of work that this feature requires, but that is the reason we get those frameworks in the first place, so the developers won't have to handle this.
Sets match the database model very nicely, but as a developer, I often have other needs from collections. For instance, a set of filtering rules where the ordering of the rules is important maps very nicely to a list (indexed by the position). A set of attributes that can relate to an object is modeled as an indexed collection of key and value pairs. Frankly, I don't see much of a difference between dictionaries and indexed collections at a high level.
Those are the simple things, by the way. What about cases where the collection's key has a valid business meaning? A good example is checking valid date ranges. My house's lease has a contract for a specific period only. This naturally maps to an object model in which the relation between the house and the current leaser is a dictionary of date ranges and customers. Add a couple more freakish requirements, and you have to have support for those issues. As much as I would like it to be otherwise, using simple sets is often just not possible; too much information goes away this way.
I wrote my own collection classes for NHibernate that will do the fixups for the relationships, so I fully understand the need and how nice it is to have this. That said, please don't try to protect me from myself. If writing a custom collection is something really hard, document it with big red giant letters, and let me feel the pain of doing it myself. I will need to do this, period.
I understand the issues with exploding test matrices and scenarios that you can't support because, while you may have an infinite supply of resources, you don't have an infinite supply of time :-). But I would much rather have a "Here Be Dragons" sign than a locked door.
For the specific scenario, you can take a look at this post, where I discussed a bit how I solved the issue. Note that all the History collections are of IDictionary<DateRange, T> type.
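To make the lease example concrete, here is a minimal sketch of a collection whose key carries business meaning. The DateRange, Customer and House types and the LeaserAt method are hypothetical names invented for illustration; they are not taken from the post linked above, nor from NHibernate or the Entity Framework. The range is assumed to be half-open (the end date is excluded).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value type: a half-open date range [Start, End).
public readonly struct DateRange : IEquatable<DateRange>
{
    public DateTime Start { get; }
    public DateTime End { get; }

    public DateRange(DateTime start, DateTime end)
    {
        if (end <= start) throw new ArgumentException("End must be after Start.");
        Start = start;
        End = end;
    }

    // True when the given moment falls inside the range (End excluded).
    public bool Contains(DateTime moment) => moment >= Start && moment < End;

    // Value equality, so the struct behaves correctly as a dictionary key.
    public bool Equals(DateRange other) => Start == other.Start && End == other.End;
    public override bool Equals(object obj) => obj is DateRange other && Equals(other);
    public override int GetHashCode() => Start.GetHashCode() ^ (End.GetHashCode() * 397);
}

public class Customer
{
    public string Name { get; set; }
}

public class House
{
    // The key is the period during which the lease holds; the value is the leaser.
    public IDictionary<DateRange, Customer> Leases { get; } =
        new Dictionary<DateRange, Customer>();

    // Returns the customer leasing the house at the given moment, or null if none.
    public Customer LeaserAt(DateTime moment)
    {
        foreach (KeyValuePair<DateRange, Customer> lease in Leases)
        {
            if (lease.Key.Contains(moment)) return lease.Value;
        }
        return null;
    }
}
```

A plain set of customers would lose exactly the piece of information that LeaserAt depends on, which is the period each lease is valid for.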
It took me a moment to figure out what you meant here. To the readers who aren't versed in OR/M implementation details, the issue is this:
sale.Customer = currentCustomer;
context.SubmitChanges();
The OR/M needs to figure out what the Id of the currentCustomer is, so it can save it to the database. I'm not sure that I understand the problem here, though.
You have the instance of the related object, and you know the object model. It is very simple to get from those two pieces of information to the value that should go to the database. I understand that you are working slightly differently in your model, using EntityRef<T> to explicitly hold the ID value, but I don't see this as an unsolvable issue, or even a very hard one. The simplest solution here is to dictate that you need this value, and fail if it is not there. If I do my custom thing, I should play nice with the rest of the framework.
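As a rough illustration of "dictate that you need this value, and fail if it is not there", the sketch below reads the key of a related object through reflection. It assumes a convention that every entity exposes a public property named Id; that convention, the ForeignKeyResolver class and the sample Customer and Sale types are all hypothetical. A real OR/M would take the key property from its mapping metadata rather than from a naming convention.

```csharp
using System;
using System.Reflection;

public static class ForeignKeyResolver
{
    // Assumed convention: every entity exposes a public instance property named "Id".
    public static object GetForeignKeyValue(object relatedEntity)
    {
        // No related object means the foreign key column is simply NULL.
        if (relatedEntity == null) return null;

        PropertyInfo idProperty = relatedEntity.GetType().GetProperty("Id");
        if (idProperty == null)
        {
            // Fail loudly instead of guessing: the custom type did not play by the rules.
            throw new InvalidOperationException(
                relatedEntity.GetType().Name +
                " has no Id property, so it cannot take part in a relationship.");
        }

        return idProperty.GetValue(relatedEntity, null);
    }
}

// Hypothetical entities mirroring the sale.Customer example above.
public class Customer
{
    public int Id { get; set; }
}

public class Sale
{
    public Customer Customer { get; set; }
}
```

With these types, ForeignKeyResolver.GetForeignKeyValue(sale.Customer) yields the value that would be written to the sale's customer column, and an object without an Id is rejected up front.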
You'll probably regret saying this :-) but I will try.
First, the main assumption that I am making here is that the ADO.Net Entity Framework is supposed to be used in both new projects and existing ones, and that it is supposed to be a fully featured package, and not one that is targeted at the simple scenarios. If I am wrong, then most of my comments are invalid by definition, but the way I see the ADO.Net Entity Framework presented seems to support this assumption.
- Legacy support
This is a major issue if you decide that you want to support existing projects and not just new ones. In this case, you need to support some fairly crazy database schemas, from the sane to the fully de-normalized ones to the Let Us Put This Data Somewhere approach that some people prefer.
To give you a few examples:
- A PK/FK relationship where the FK is found via substring(5, PK) in the parent table.
- A boolean flag that has null / not null mapped to true or false, with completely random values as the non null values.
- A table where each row contains several entities, and the relations between them.
- Inheritance model
From the documentation that I have seen so far, the inheritance models supported are table per hierarchy (discriminator) and table per sub-class. Is there support for table per class, and how well does it play with the rest of the system?
- Splitting a value from a table
This is related to too much information in a row, but I may want to break a part of the row into a value object that is associated with the object.
- Caching
What is the caching story? I know that you have an Identity Map in place, but what about application-wide caching? I assume that you will use ASP.Net's cache, but what happens in a web farm scenario?
- Master / slave scenarios
What happens when I want to scale by making all my writes to a single server, and replicate from there?
- Connection Control
What happens if I want to explicitly control the connection lifetime and behavior?
For that matter, how much control do I have over where exactly the data goes? Can I decide to move to a different database for a save, and then move back?
- Change tracking
Change tracking on entities is usually done by comparing them to their original values. When this is done on large object graphs, it can be a significant performance hit, especially if I'm only updating a few values but need to read a lot of data. Can I take action in this case?
- SQL Functionality
What happens if I have some specific functionality in the database that I need to use? UDF is one example, but my SQL Functions come to mind as well. Is it possible? How well does it integrate into the rest of the system?
- Composite Keys
Can I use them? How well do they play with relations? What happens if the relation is based on only part of the composite key?
- Custom Types
I mentioned the null bool scenario, but there are many other cases where I want to get involved with the way the framework is creating and persisting properties in my objects.
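The null bool scenario from the legacy examples is a good picture of what getting involved means. Below is a stand-alone sketch of the conversion logic only; the NullFlagConverter class is a hypothetical name, it is not wired into any OR/M's extension API, and writing 1 for true is an arbitrary assumption, since the legacy schema accepts any non-null value.

```csharp
using System;

public static class NullFlagConverter
{
    // Reading: any non-null column value means true, whatever it happens to contain.
    public static bool FromDatabase(object columnValue)
    {
        return columnValue != null && !(columnValue is DBNull);
    }

    // Writing: false becomes NULL; true needs some non-null marker.
    // The value 1 is an arbitrary choice made for this sketch.
    public static object ToDatabase(bool flag)
    {
        return flag ? (object)1 : DBNull.Value;
    }
}
```

The point is less the few lines of logic than the hook: the framework has to let me run code like this at the moment a property is read from, or written to, a column.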
Customer scenarios - This is here because of a post by Clemens Vasters about how much less effective he became in influencing the direction of WCF since he joined the team. Now he doesn't have the justification of a customer need to do it.
Here are a few examples of scenarios that I personally ran into:
- Timed objects - I discussed how I used OR/M to make sense of a really complicated business model. This approach requires customized collections and quite a bit of flexibility on the side of the OR/M when defining the way the data is transformed from the database to the object graph.
- Xml Extended Attributes - I have a customer who wants to keep a list of extended attributes inside the table itself, and has decided to use an XML column in order to do so. The challenge here is to get the data from the XML column into an object property and persist it back to the XML. This allows extending the table for child objects without modifying the object structure.
- Handling a legacy schema such as this one. I know how I can map it cleanly so that the object model has no idea what the database looks like.
I posted another list of features that I constantly use in OR/M here.
The issue that I have with this approach is that it targets the simple scenarios, those that probably wouldn't get that much benefit from this. The amount of effort that Microsoft is putting into OR/M is proof that the DataSet model doesn't really scale for complex applications. The decision to make this accessible only via DataAdapter punishes anyone who decides that they can do better by going lower in the stack and building from there. I am used to having more options the lower I go, not the other way around.
In ASP.Net, for instance, I may not be able to change the Page Controller architecture at the WebForms level, but I can go down a couple of levels and use a Front Controller architecture, and the only functionality that I lose is the functionality specific to WebForms.
But this is crying over spilled milk, I am afraid. The most pressing question at the moment is: will this be fixed in .Net 3.0? And yes, I am talking about the former WinFX release. I want and need this ability. This is the closest release, and the next one after that is sometime in 2008, which is way too long.