Bloom Lab: SQL data reliability

Data Integrity

The following categories of the data integrity exist with each RDBMS:

Entity Integrity: There are no duplicate rows in a table.
Domain Integrity: Enforces valid entries for a given column by restricting the type, the format, or the range of values.
Referential integrity: Rows cannot be deleted, which are used by other records.
User-Defined Integrity: Enforces some specific business rules that do not fall into entity, domain or referential integrity.

Constraints

Rules enforced on data columns that ensures the accuracy and reliability of the DB data.

Constrains could be column or table level.

All existing Constraint types will be described below: NOT NULL, DEFAULT, UNIQUE, CHECK, PRIMARY KEY, FOREIGN KEY.

Constraints Naming

Ensure that each column has a descriptive name!

Doing so makes DB more self-documenting and easy to understand for those who didn't develop it.

Consider adding surrogate keys – columns that have no business meaning but can uniquely identify each row. Identity columns are a great example of surrogate keys.

If you do use surrogate keys be sure that their name includes the table/entity name.
For example, do not add a column called ID to each table. Instead use customer_id, supplier_id, store_id and so forth.

There are a lot of conventions and holy wars regarding the Constraints naming.
Just choose the most understandable one and follow it.

I use the following Constraint prefixes:

Name prefix	Statement type
`PK_`	`PRIMARY KEY`
`FK_`	`FOREIGN KEY`
`DF_`	`DEFAULT`
`UQ_`	`UNIQUE`
`CHK_`	`CHECK`
`IDX_`	`INDEX`

I prefer UQ_ over UK_ or UN_ which u can find in some SQL standarts, cause UQ_ isn't intercepted with others abbreviations.
The same for CHK_ instead of CC_ or CK_ and IDX_ instead of IX_ - cause it's more precise & understandable.

Whatever, if u work in a team, just consider the team guidelines and don't mess it up with your own keys!

----
## Create Constraints
U can specify constraints at once during the table creation via `CREATE TABLE`:
 
      ---> CREATE TABLE users(
        -> user_id INT NOT NULL,
        -> username VARCHAR(50) NOT NULL,
        -> password VARCHAR(30),
        -> first_name VARCHAR(50),
        -> last_name VARCHAR(50),
        -> email VARCHAR(50),
        -> sex CHAR(1),
        -> birth DATETIME,
        ->
        -> CONSTRAINT PK_USER_ID PRIMARY KEY (user_id)
        -> );  
 
*U might need some extra fields like: `created, updated/modified, active, phone, address` or `country, city, region, zip, etc.`*
  
U can add constraints to the existing table via using `ALTER TABLE`:

ALTER TABLE users
    ADD CONSTRAINT UQ_USERNAME UNIQUE (username);
 
----       
## Remove Constrains
U can remove constraint from the existing table via using `ALTER TABLE`:

*SQL Server / Oracle / MS Access:*

ALTER TABLE table_name
    DROP CONSTRAINT constraint_name;

*MySQL - Constraint type specific shit:*

mysql> ALTER TABLE users
        -> DROP INDEX UQ_USERNAME;

----                                   
## `NOT NULL` Constraint
The `NOT NULL` Constraint enforces a column to NOT accept NULL values.  
This means that you cannot insert a new record, or update a record without adding a value to this field.

`NOT NULL` Constrains usually have no names and are added during the table creation.

> Attempt to configure column null-ability correctly.   
> If the column should always have a value, then configure it as NOT NULL.   
> Using default constraints is more efficient than having columns allowing NULL values.

----
## `DEFAULT` Constraint
The `DEFAULT` constraint is used to insert a default value into a column.  
The default value will be added to all new records, if no other value is specified.

> Consider using `BIT` data type columns for Boolean values, as opposed to storing `true/false`, `yes/no` or other character strings. 
> SQL Server can store up to 8 columns with `BIT` data type in a single byte. 
> One scenario where this could be handy is when you're using *soft deletes*.
> Instead of physically removing a record from a table, you simply tag it as deleted, 
> by flipping the bit value of the *deleted* column to 1.

*Define `DEFAULT` Constraint during the table creation or adding the new column:*

ALTER TABLE users ADD active BIT DEFAULT 1;

*Add or update `DEFAULT` Constraint during the exiting column update:*

ALTER TABLE users MODIFY active BIT DEFAULT 0;

----
## `UNIQUE` Constraint
The `UNIQUE` Constraint prevents two records from having identical values in a particular column.

----
## `PRIMARY KEY` Constraint
A `PRIMARY KEY` Constraint automatically has a `UNIQUE` & `NOT NULL` constraints defined on it.
U can have only one `PRIMARY KEY` Constraint per table.

> Primary keys should be as small as necessary.   
> Prefer a numeric type because numeric types are stored in a much more compact format than 
> character formats.   
> This is because most primary keys will be foreign keys in another table as well as used in multiple indexes.   
> The smaller your key, the smaller the index, the less pages in the cache you will use.

> **Primary keys should never change!**   
> Updating a primary key should always be out of the question.   
> This is because it is most likely to be used in multiple indexes and used as a foreign key.   
> Updating a single primary key could cause of ripple effect of changes.

> Do NOT use *your problem primary key* as your logic model primary key.   
> For example: *passport number*, *social security number*, or *employee contract number* as these *primary key* can change for real world 
situations.

Some practical rules regarding *surrogate* vs *natural* keys:
 
* If the *natural* key is small and will never change it can be used as a *primary* key.
   
* If the *natural* key is large or likely to change It's better to use *surrogate* keys.
 
* If there is no primary key it's still better to make a *surrogate* key,   
because you will always add tables to your schema and wish you'd put a *primary* key in place.

----
## `FOREIGN KEY` Constraint
The `FOREIGN KEY` Constraint is a column or list of columns which points to the `PRIMARY KEY` of another table
to ensure the referential integrity of the data in one table to match values in another table.

For each row in the referencing table*(table contains FOREIGN KEY)*, the foreign key must match an existing primary key in the referenced table*(table contains PRIMARY KEY)*.   
This enforcement of `FOREIGN KEY` called the *Referential Integrity*.

The `FOREIGN KEY` constraint is used to prevent actions that would destroy links between tables.

The `FOREIGN KEY` constraint also prevents invalid data from being inserted into the foreign key column, 
because it has to be one of the values contained in the table it points to.

---> CREATE TABLE orders (
        -> order_id INT NOT NULL,
        -> order_date DATETIME NOT NULL,
        -> user_id INT(11) NOT NULL,
        ->
        -> CONSTRAINT PK_ORDER_ID PRIMARY KEY(order_id),
        -> CONSTRAINT FK_USER_ID FOREIGN KEY(user_id) REFERENCES users(user_id)
        -> );
        
        +------------+----------+------+-----+---------+-------+
        | Field      | Type     | Null | Key | Default | Extra |
        +------------+----------+------+-----+---------+-------+
        | order_id   | int(11)  | NO   | PRI | NULL    |       |
        | order_date | datetime | NO   |     | NULL    |       |
        | user_id    | int(11)  | NO   | MUL | NULL    |       |
        +------------+----------+------+-----+---------+-------+

----
## `CHECK` Constraint
The `CHECK` Constraint is used to limit the value range that can be placed in a column.

> Is not a good thing to store Age in a database.    
> Age changes as time passes, which means you'd have to update your records often.   
> Instead, it is better to store a fixed value such as date of birth.   
> Then, whenever you need age, you can calculate it as the difference between the current date and the birth date 
> and u'll even be able to send birthday greetings to your users:)
 
    ALTER TABLE users
    ADD CONSTRAINT CHK_AGE CHECK (TIMESTAMPDIFF(YEAR,birth,CURDATE()) >= 18);

According to [MySQL Reference Manual](http://dev.mysql.com/doc/refman/5.1/en/create-table.html):

> **The `CHECK` clause is parsed but ignored by all storage engines.**

So, u have to use `CREATE TRIGGER` as workaround to check condition before insertion.

---- 
## List of Constraints from MySQL Database
Use the `information_schema.table_constraints` table to get the names of the constraints defined on each table:

SELECT * FROM information_schema.table_constraints
    WHERE constraint_schema = 'db_name';

Use the `information_schema.key_column_usage` table to get the fields in each one of those constraints:  
 
    SELECT * FROM information_schema.key_column_usage
    WHERE constraint_schema = 'db_name';

To check the foreign key constraints, use `information_schema.referential_constraints`:        
 
    SELECT * FROM information_schema.referential_constraints
    WHERE constraint_schema = 'db_name'
                                  
----
## see Also
* [ISO/IEC 11179 Naming principles](http://metadata-standards.org/11179/#A5)
* [SQL Server Naming Conventions](http://www.isbe.state.il.us/ilds/pdf/sql_server_standards.pdf) 
* [Oracle Naming Conventions](http://oracle-base.com/articles/misc/naming-conventions.php)
* [SQL Server Best Practices](http://searchsqlserver.techtarget.com/tip/Creating-SQL-Server-columns-A-best-practices-guide)

Bloom Lab

Pages

Monday, March 16, 2015

SQL data reliability - CONSTRAINTS usage

Data Integrity

Constraints

Constraints Naming

No comments:

Post a Comment