How to quickly judge the quality of a database

time to read 3 min | 567 words

When I start using a new database, I usually run the following queries to find out how good or bad the project is going to be:

select count(*) as [Number Of Tables] from information_schema.tables

Anything bigger than a couple of hundred tables and I start to feel really nervous. Ideally, it is around 20 - 50 main tables, and maybe an additional dozen tables for constants. I have worked on databases where the numbers were in the thousands (and yes, I did touch every table).
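On engines that don't expose information_schema (SQLite, for instance), the same count can be pulled from the catalog table instead. A minimal sketch, using sqlite3 from Python; the table names here are mine, just for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table customers (id integer)")
conn.execute("create table orders (id integer)")

# sqlite_master plays the role of information_schema.tables here
(count,) = conn.execute(
    "select count(*) from sqlite_master where type = 'table'").fetchone()
print(count)  # -> 2
```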

select table_name, count(*) as [Number Of Columns In Table] from information_schema.columns
group by table_name
order by count(*) desc

If you see a table with more than 25 - 30 columns, this is a sign of a big problem. Either the database is not normalized, the data model is completely weird, or you are looking at a "the whole database in a single table" design.
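The wide-table check can also be scripted directly. A rough sketch against SQLite (the threshold, function name, and demo tables are my own, not from the post); it walks the catalog and flags any table over the column limit:

```python
import sqlite3

WIDE_TABLE_THRESHOLD = 30  # columns; per the rule of thumb above

def wide_tables(conn, threshold=WIDE_TABLE_THRESHOLD):
    """Return (table, column_count) pairs for tables over the threshold."""
    tables = [r[0] for r in conn.execute(
        "select name from sqlite_master where type = 'table'")]
    result = []
    for table in tables:
        # pragma table_info returns one row per column
        cols = conn.execute(f"pragma table_info('{table}')").fetchall()
        if len(cols) > threshold:
            result.append((table, len(cols)))
    return sorted(result, key=lambda t: -t[1])

# Demo: one deliberately wide table, one normal one
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (id integer, customer_id integer)")
conn.execute("create table kitchen_sink (%s)" %
             ", ".join(f"col{i} text" for i in range(40)))
print(wide_tables(conn))  # -> [('kitchen_sink', 40)]
```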

The following is my number one criterion for the quality of the database:

select column_name, count(*) as [Number Of Columns Named This Way] from information_schema.columns
where data_type = 'DATETIME'
group by column_name
order by count(*) desc

The important things to note here are:

  • The number of rows returned: the more there are, the worse off you are.
  • The commonly named columns are very important. I often see things (in Hebrew) that look like this: Taarich / Taarih / Tarich / Taaizh / Taar - different ways to transliterate Date from Hebrew to English. When I see those proliferate, I know that I am in trouble.

A variant of the above is to run it over all columns regardless of data type and check for common names, but I found that focusing on the dates is fairly accurate.
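Spotting the transliteration variants can be mechanized too. A hedged sketch (the similarity cutoff and sample names are my own guesses, not anything from the post) that greedily buckets column names by string similarity, so spellings like Taarich / Tarich / Taarih land in one cluster:

```python
from difflib import SequenceMatcher

def cluster_names(names, cutoff=0.7):
    """Greedily group column names whose spellings are close matches."""
    clusters = []
    for name in names:
        for cluster in clusters:
            # Compare against the first name seen in each cluster
            if SequenceMatcher(None, name.lower(),
                               cluster[0].lower()).ratio() >= cutoff:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

names = ["Taarich", "Tarich", "Taarih", "OrderDate", "ShipDate"]
print(cluster_names(names))
# -> [['Taarich', 'Tarich', 'Taarih'], ['OrderDate'], ['ShipDate']]
```

A big cluster of near-identical date names is exactly the smell the query above surfaces; the script just saves you eyeballing the list.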