ADO.Net Entity Framework: Round II


After my last post about the ADO.Net Entity Framework, I got a long comment from Pablo Castro, the ADO.NET Technical Lead. I took my time answering it, mainly because I wanted to consider my words carefully. Again, I am probably not an impartial party in this matter, given my strong NHibernate bias, but I do have quite a bit of experience with Object Relational Mappers and how they map to real world applications.

My previous post was mainly about extensibility in the framework, and how much I can extend it to fit the needs of the applications. So, without further ado :-), let me get to the points Pablo brought up:

We're not ignoring extensibility :), we're looking at some aspects of this right now. A couple of examples:
  - we are working on making sure that the data model itself is extensible so services can be built on top of the model and have service-specific metadata that's accessible through the metadata API and at the same time doesn't interfere with the code runtime that depends on it.
  - we're also working on finishing the details of the provider model so everyone can plug in their stores into the entity framework.

I read this paragraph several times, and I am afraid that I lack the background to understand exactly what he means here. The provider model, as I understand it, will allow the use of Oracle, MySQL, etc. as the database engine underlying the technology. The services part is something I'm puzzled about; I'm not sure what a service is in this particular instance. Is the LINQ wrapper on the EDM a service? Or does it mean application-level services?

A few comments on your specific concerns:
- many to many relationships: it's not in the current bits, but it's not ruled out either as of yet. The system models certain things as link-tables already (which doesn't necessarily mean you have to use link tables for 1-n relationships in the actual store, of course, you can use the regular 2-table PK/FK story), although not all the details are in place for specifying and updating m-m relationships. We'll see where we land on this.

This worries me. Many to many is a very common idiom, both in databases and in domain models. Users and groups is a very simple example, but in any model there are many such relationships. I am aware of the amount of work that this feature requires, but that is the reason we get these frameworks in the first place, so that developers won't have to handle it themselves.
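
To make the users and groups example concrete, here is a minimal sketch of the kind of model I have in mind (the classes are hypothetical): in the database this is a single link table, but in the object model it should be nothing more than a plain collection on each side.

using System.Collections.Generic;

// Users and groups: the classic many to many association. The database has a
// UsersGroups link table; the object model just has two collections, and it is
// the mapping layer's job to bridge the two.
public class User
{
    public string Name;
    public readonly IList<Group> Groups = new List<Group>();
}

public class Group
{
    public string Name;
    public readonly IList<User> Users = new List<User>();
}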

- Indexed collections. The current collections are not indexable, but I'll take the feedback and look into it. As for dictionary, I acknowledge that there are scenarios for it, although we currently don't have plans for doing it. Again, feedback taken.

Sets match the database model very nicely, but as a developer, I often have other needs from my collections. For instance, a set of filtering rules where the ordering of the rules is important maps very nicely to a list (indexed by position). A set of attributes that can be related to an object is modeled as an indexed collection of key/value pairs. Frankly, I don't see much of a difference between dictionaries and indexed collections at a high level.
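
A minimal sketch of the ordered case, with a made-up FilterRule type; the only point is that position carries business meaning, which a plain set cannot express:

using System.Collections.Generic;

public class FilterRule
{
    public readonly string Description;

    public FilterRule(string description)
    {
        Description = description;
    }
}

public static class FilterPipeline
{
    public static IList<FilterRule> Build()
    {
        // The order of the rules is significant: rule 0 runs before rule 1.
        IList<FilterRule> rules = new List<FilterRule>();
        rules.Add(new FilterRule("reject blacklisted senders"));
        rules.Add(new FilterRule("flag messages from external senders"));
        // Inserting at a specific position is exactly what a set cannot give me.
        rules.Insert(0, new FilterRule("always allow the administrator"));
        return rules;
    }
}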

Those are the simple things, by the way. What about cases where the collection's key has a real business meaning? A good example is checking valid date ranges. My house's lease is a contract for a specific period only, and this naturally maps to an object model where the relation between the house and the current leaser is a dictionary of date ranges and customers. Add a couple more freakish requirements, and you have to have support for those scenarios. As much as I would like it to be, using simple sets is often just not possible; too much information is lost that way.
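
Here is roughly what I mean on the object side, mirroring the IDictionary<DateRange, T> collections I mention below; DateRange, House, and Customer are illustration-only types, not anything from a real framework:

using System;
using System.Collections.Generic;

public class Customer
{
    public string Name;
}

public class DateRange
{
    public readonly DateTime Start;
    public readonly DateTime End;

    public DateRange(DateTime start, DateTime end)
    {
        Start = start;
        End = end;
    }

    public bool Contains(DateTime date)
    {
        return date >= Start && date < End;
    }
}

public class House
{
    // The key itself carries business meaning: which customer leased the
    // house during which period. A plain set of customers loses that.
    public readonly IDictionary<DateRange, Customer> Leasers =
        new Dictionary<DateRange, Customer>();

    public Customer LeaserAt(DateTime date)
    {
        foreach (KeyValuePair<DateRange, Customer> lease in Leasers)
        {
            if (lease.Key.Contains(date))
                return lease.Value;
        }
        return null; // nobody leased the house on that date
    }
}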

- Custom collections: we have had *really long* discussions over this among ourselves...right now we are focusing on the scenarios where our own classes are used for collections; that allows us to do relationship fixups when the ends of a relationship change. It also makes it straightforward to do change-tracking over related sets. With custom collections we can't fix up the ends of relationships, so let's say you have a Customer object and a SalesOrder object; when you add the SalesOrder to the Customer.Orders collection, you'll expect SalesOrder.Customer to point to the right customer...we do that by hooking into the collection. It results in fewer surprises, particularly for users that don't have a lot of experience with the tricky areas of object layers on top of databases. As for your particular scenario, is that something you can embed in the CLR type that represents the entity (e.g. as a property in the user part of the partial class)?

I wrote my own collection classes for NHibernate that will do the fixups for the relationships, so I fully understand the need and how nice it is to have this. That said, please don't try to protect me from myself. If writing a custom collection is something really hard, document it with big red giant letters, and let me feel the pain of doing it myself. I will need to do this, period.
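
For readers who haven't run into fixups before, this is roughly the behavior in question, in a stripped-down sketch; the Customer and SalesOrder classes here are hypothetical, and this is not the Entity Framework's or NHibernate's actual implementation:

using System.Collections.Generic;
using System.Collections.ObjectModel;

public class SalesOrder
{
    public Customer Customer; // the "one" end of the association
}

public class Customer
{
    private readonly OrderCollection orders;

    public Customer()
    {
        orders = new OrderCollection(this);
    }

    public ICollection<SalesOrder> Orders
    {
        get { return orders; }
    }

    // A collection that fixes up the other end of the relationship: adding an
    // order to customer.Orders also makes order.Customer point back at us.
    private class OrderCollection : Collection<SalesOrder>
    {
        private readonly Customer owner;

        public OrderCollection(Customer owner)
        {
            this.owner = owner;
        }

        protected override void InsertItem(int index, SalesOrder item)
        {
            base.InsertItem(index, item);
            item.Customer = owner; // the fixup Pablo is describing
        }

        protected override void RemoveItem(int index)
        {
            this[index].Customer = null; // and the reverse on removal
            base.RemoveItem(index);
        }
    }
}

With this in place, customer.Orders.Add(order) leaves order.Customer pointing at the right customer without the developer doing anything, which is exactly the behavior Pablo wants to guarantee.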

I understand the issues with exploding test matrices and scenarios that you can't support, because while you may have an infinite supply of resources, you don't have an infinite supply of time :-). But I would much rather have a "Here Be Dragons" sign than a locked door.

For the specific scenario, you can take a look at this post, where I discussed how I solved the issue. Note that all the History collections are of the IDictionary<DateRange, T> type.

- The tricky part of this one is update. When you change the value in your entity from false (null) to true (not-null) what value do you put in the database? "not-null" is not specific enough.

It took me a moment to figure out what you meant here. To the readers who aren't versed in OR/M implementation details, the issue is this:

sale.Customer = currentCustomer;
context.SubmitChanges();

The OR/M needs to figure out what the Id of currentCustomer is, so it can save it to the database. I'm not sure that I understand the problem here, though.

You have the instance of the related object, and you know the object model. It is very simple to get from those two pieces of information to the value that should go into the database. I understand that you are working slightly differently in your model, using EntityRef<T> to explicitly hold the ID value, but I don't see this as an unsolvable issue, or even a very hard one. The simplest approach here is to dictate that you need this value, and fail if it is not there. If I do my own custom thing, I should play nice with the rest of the framework.
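
In rough pseudo-implementation terms, this is all I'm asking the framework to do when it persists the sale; the reflection lookup of an "Id" property stands in for whatever the mapping metadata would really say, so treat it as an illustration rather than a proposal:

using System;
using System.Reflection;

public static class ForeignKeyResolver
{
    // Resolve the value that should go into the FK column from the related
    // object. The reflection call is a stand-in for the mapping metadata the
    // framework already has.
    public static object ResolveForeignKeyValue(object relatedEntity)
    {
        if (relatedEntity == null)
            return DBNull.Value; // the association simply isn't set

        PropertyInfo idProperty = relatedEntity.GetType().GetProperty("Id");
        object id = idProperty == null ? null : idProperty.GetValue(relatedEntity, null);

        if (id == null)
            throw new InvalidOperationException(
                "Cannot save: the related entity has no identifier yet.");

        return id; // this is the value written to the FK column
    }
}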

Thoughts and further questions are welcome :)

You'll probably regret saying this :-) but I will try.

First, the main assumption that I am making here is that the ADO.Net Entity Framework is supposed to be used in both new projects and existing ones, and that it is supposed to be a fully featured package, not one that is targeted only at the simple scenarios. If I am wrong, then most of my comments are invalid by definition, but the way the ADO.Net Entity Framework is being presented seems to support this assumption.

  • Legacy support
    This is a major issue if you decide that you want to support existing projects and not just new ones. In this case, you need to support some fairly crazy database schemas, from the sane, to the fully de-normalized ones, to the Let Us Put This Data Somewhere approach that some people prefer.
    To give you a few examples:
    • A PK/FK relationship where the FK is found via substring(5, PK) in the parent table.
    • A boolean flag that has null / not null mapped to true or false, with completely random values as the non-null values.
    • A table where each row contains several entities, and the relations between them.
  • Inheritance model
    From the documentation that I have seen so far, the inheritance models supported are table per hierarchy (discriminator) and table per sub-class. Is there support for table per class, and how well does it play with the rest of the system?
  • Splitting a value from a table
    This is related to having too much information in a row; I may want to break part of the row out into a value object that is associated with the object.
  • Caching
    What is the caching story? I know that you have an Identity Map in place, but what about application-wide caching? I assume that you will use ASP.Net's cache, but what happens in a web farm scenario?
  • Master / slave scenarios
    What happens when I want to scale by sending all my writes to a single server, and replicating from there?
  • Connection Control
    What happens if I want to explicitly control the connection lifetime and behavior?
    For that matter, how much control do I have for where exactly the data goes? Can I decide to move to a different database for a save, and then move back?
  • Change tracking
    Change tracking on entities is usually done by comparing them to their original values; when this is done on large object graphs, it can be a significant performance hit, especially if I'm only updating a few values but need to read a lot of data. Can I take action in this case?
  • SQL Functionality
    What happens if I have some specific functionality in the database that I need to use? UDFs are one example, but my own SQL functions come to mind as well. Is it possible? How well does it integrate with the rest of the system?
  • Composite Keys
    Can I use them? How well do they play with relations? What happens if the relation is based on only part of the composite key?
  • Custom Types
    I mentioned the null bool scenario, but there are many other cases where I want to get involved in the way the framework creates and persists the properties of my objects (see the sketch just after this list).
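
To make the Custom Types point concrete, here is a hedged sketch of the null bool conversion I keep running into; the class and method names are made up (NHibernate exposes this kind of hook through IUserType), and the interesting part is exactly the question Pablo raised: when the flag flips from false to true, the mapping itself has to decide which non-null value gets written.

using System;

// A sketch of a custom type conversion: null in the column means false,
// any non-null value means true. The names here are hypothetical.
public class NullMeansFalseType
{
    // The value to store when the flag becomes true; the mapping declares
    // it up front instead of guessing at save time.
    private readonly object valueForTrue;

    public NullMeansFalseType(object valueForTrue)
    {
        this.valueForTrue = valueForTrue;
    }

    // Reading: any non-null value in the column counts as true.
    public bool FromColumn(object columnValue)
    {
        return columnValue != null && columnValue != DBNull.Value;
    }

    // Writing: false goes back as NULL, true goes back as the declared value.
    public object ToColumn(bool value)
    {
        return value ? valueForTrue : (object)DBNull.Value;
    }
}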

Customer scenarios - This is here because of a post by Clemens Vasters about how much less effective he became at influencing the direction of WCF since he joined the team. Now he doesn't have the justification of a customer need to back him up.

Here are a few examples of scenarios that I have personally run into:

  • Timed objects - I discussed how I used an OR/M to make sense of a really complicated business model. This approach requires customized collections and quite a bit of flexibility on the side of the OR/M when defining the way the data is transformed from the database to the object graph.
  • Xml Extended Attributes - I have got a customer who wants to keep a list of extended attributes inside the table itself, and has decided to use an XML Column in order to do so. The challenge here is to get the data from the XML Column into an object property and persist it back to the XML. This allows extending the table for child objects without modifying the object structure (a sketch of this conversion follows this list).
  • Handling legacy schemas such as this one - I know how I can map it cleanly, so that the object model has no idea what the database looks like.
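
For the XML Column scenario, here is a hedged sketch of the shape of the conversion; the element and type names are made up, and a real mapping layer would call something like this during load and save instead of the developer doing it by hand:

using System.Collections.Generic;
using System.Xml;

public static class ExtendedAttributesMapper
{
    // Load: turn the contents of the XML column into a dictionary property.
    public static IDictionary<string, string> FromXml(string xmlColumnValue)
    {
        IDictionary<string, string> attributes = new Dictionary<string, string>();
        if (string.IsNullOrEmpty(xmlColumnValue))
            return attributes;

        XmlDocument document = new XmlDocument();
        document.LoadXml(xmlColumnValue);
        foreach (XmlNode node in document.DocumentElement.ChildNodes)
        {
            if (node.NodeType == XmlNodeType.Element)
                attributes[node.Name] = node.InnerText;
        }
        return attributes;
    }

    // Save: persist the dictionary back into the XML column.
    // Assumes the attribute names are valid XML element names.
    public static string ToXml(IDictionary<string, string> attributes)
    {
        XmlDocument document = new XmlDocument();
        XmlElement root = document.CreateElement("attributes");
        document.AppendChild(root);
        foreach (KeyValuePair<string, string> pair in attributes)
        {
            XmlElement element = document.CreateElement(pair.Key);
            element.InnerText = pair.Value;
            root.AppendChild(element);
        }
        return document.OuterXml;
    }
}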

I posted another list of features that I constantly use in an OR/M here.

off-topic: regarding data-adapter batching, I made that call early in the Whidbey cycle; it was painful, but it was unrelated to wanting/not-wanting extensibility. It was a scoping decision. At some point you have to decide where to cut a release, and it forces you to prioritize scenarios, and that includes painful decisions. I completely understand your frustration, but you can be assured that it wasn't, and isn't now in the Entity Framework design, a lack of interest in extensibility.

The issue that I have with this approach is that it targets the simple scenarios, the ones that probably wouldn't get that much benefit from it anyway. The amount of effort that Microsoft is putting into its OR/M efforts is proof that the DataSet model doesn't really scale for complex applications. The decision to make batching accessible only via the DataAdapter punishes anyone who decides that they can do better by going lower in the stack and building from there. I am used to having more options the lower I go, not the other way around.

In ASP.Net, for instance, I may not be able to change the Page Controller architecture at the WebForms level, but I can go down a couple of levels and use a Front Controller architecture, and the only functionality that I lose is the functionality specific to WebForms.

But this is crying over spilled milk, I am afraid. The most pressing question at the moment is: will this be fixed in .Net 3.0? And yes, I am talking about the former WinFX release. I want and need this ability. This is the closest release, and the next one after that is sometime in 2008, which is way too long.