Monday, March 16, 2015

SQL data reliability - CONSTRAINTS usage

Data Integrity

The following categories of the data integrity exist with each RDBMS:

  • Entity Integrity: There are no duplicate rows in a table.
  • Domain Integrity: Enforces valid entries for a given column by restricting the type, the format, or the range of values.
  • Referential integrity: Rows cannot be deleted, which are used by other records.
  • User-Defined Integrity: Enforces some specific business rules that do not fall into entity, domain or referential integrity.

Constraints

Rules enforced on data columns that ensures the accuracy and reliability of the DB data.
Constrains could be column or table level.

All existing Constraint types will be described below: NOT NULL, DEFAULT, UNIQUE, CHECK, PRIMARY KEY, FOREIGN KEY.

Constraints Naming

Ensure that each column has a descriptive name!
Doing so makes DB more self-documenting and easy to understand for those who didn't develop it.

Consider adding surrogate keys – columns that have no business meaning but can uniquely identify each row. Identity columns are a great example of surrogate keys.

If you do use surrogate keys be sure that their name includes the table/entity name.
For example, do not add a column called ID to each table. Instead use customer_id, supplier_id, store_id and so forth.

There are a lot of conventions and holy wars regarding the Constraints naming.
Just choose the most understandable one and follow it.

I use the following Constraint prefixes:

Name prefix Statement type
PK_ PRIMARY KEY
FK_ FOREIGN KEY
DF_ DEFAULT
UQ_ UNIQUE
CHK_ CHECK
IDX_ INDEX
I prefer UQ_ over UK_ or UN_ which u can find in some SQL standarts, cause UQ_ isn't intercepted with others abbreviations.
The same for CHK_ instead of CC_ or CK_ and IDX_ instead of IX_ - cause it's more precise & understandable.
Whatever, if u work in a team, just consider the team guidelines and don't mess it up with your own keys!

Create Constraints

U can specify constraints at once during the table creation via CREATE TABLE:

  ---> CREATE TABLE users(
    -> user_id INT NOT NULL,
    -> username VARCHAR(50) NOT NULL,
    -> password VARCHAR(30),
    -> first_name VARCHAR(50),
    -> last_name VARCHAR(50),
    -> email VARCHAR(50),
    -> sex CHAR(1),
    -> birth DATETIME,
    ->
    -> CONSTRAINT PK_USER_ID PRIMARY KEY (user_id)
    -> );  

U might need some extra fields like: created, updated/modified, active, phone, address or country, city, region, zip, etc.

U can add constraints to the existing table via using ALTER TABLE:

ALTER TABLE users
ADD CONSTRAINT UQ_USERNAME UNIQUE (username);

Remove Constrains

U can remove constraint from the existing table via using ALTER TABLE:

SQL Server / Oracle / MS Access:

ALTER TABLE table_name
DROP CONSTRAINT constraint_name;

MySQL - Constraint type specific shit:

mysql> ALTER TABLE users
    -> DROP INDEX UQ_USERNAME;

NOT NULL Constraint

The NOT NULL Constraint enforces a column to NOT accept NULL values.
This means that you cannot insert a new record, or update a record without adding a value to this field.

NOT NULL Constrains usually have no names and are added during the table creation.

Attempt to configure column null-ability correctly.
If the column should always have a value, then configure it as NOT NULL.
Using default constraints is more efficient than having columns allowing NULL values.


DEFAULT Constraint

The DEFAULT constraint is used to insert a default value into a column.
The default value will be added to all new records, if no other value is specified.

Consider using BIT data type columns for Boolean values, as opposed to storing true/false, yes/no or other character strings. SQL Server can store up to 8 columns with BIT data type in a single byte. One scenario where this could be handy is when you're using soft deletes. Instead of physically removing a record from a table, you simply tag it as deleted, by flipping the bit value of the deleted column to 1.

Define DEFAULT Constraint during the table creation or adding the new column:

ALTER TABLE users ADD active BIT DEFAULT 1;

Add or update DEFAULT Constraint during the exiting column update:

ALTER TABLE users MODIFY active BIT DEFAULT 0; 

UNIQUE Constraint

The UNIQUE Constraint prevents two records from having identical values in a particular column.


PRIMARY KEY Constraint

A PRIMARY KEY Constraint automatically has a UNIQUE & NOT NULL constraints defined on it. U can have only one PRIMARY KEY Constraint per table.

Primary keys should be as small as necessary.
Prefer a numeric type because numeric types are stored in a much more compact format than character formats.
This is because most primary keys will be foreign keys in another table as well as used in multiple indexes.
The smaller your key, the smaller the index, the less pages in the cache you will use.

Primary keys should never change!
Updating a primary key should always be out of the question.
This is because it is most likely to be used in multiple indexes and used as a foreign key.
Updating a single primary key could cause of ripple effect of changes.

Do NOT use your problem primary key as your logic model primary key.
For example: passport number, social security number, or employee contract number as these primary key can change for real world situations.

Some practical rules regarding surrogate vs natural keys:

  • If the natural key is small and will never change it can be used as a primary key.

  • If the natural key is large or likely to change It's better to use surrogate keys.

  • If there is no primary key it's still better to make a surrogate key,
    because you will always add tables to your schema and wish you'd put a primary key in place.


FOREIGN KEY Constraint

The FOREIGN KEY Constraint is a column or list of columns which points to the PRIMARY KEY of another table to ensure the referential integrity of the data in one table to match values in another table.

For each row in the referencing table*(table contains FOREIGN KEY), the foreign key must match an existing primary key in the referenced table(table contains PRIMARY KEY)*.
This enforcement of FOREIGN KEY called the Referential Integrity.

The FOREIGN KEY constraint is used to prevent actions that would destroy links between tables.

The FOREIGN KEY constraint also prevents invalid data from being inserted into the foreign key column, because it has to be one of the values contained in the table it points to.

  ---> CREATE TABLE orders (
    -> order_id INT NOT NULL,
    -> order_date DATETIME NOT NULL,
    -> user_id INT(11) NOT NULL,
    ->
    -> CONSTRAINT PK_ORDER_ID PRIMARY KEY(order_id),
    -> CONSTRAINT FK_USER_ID FOREIGN KEY(user_id) REFERENCES users(user_id)
    -> );
    
    +------------+----------+------+-----+---------+-------+
    | Field      | Type     | Null | Key | Default | Extra |
    +------------+----------+------+-----+---------+-------+
    | order_id   | int(11)  | NO   | PRI | NULL    |       |
    | order_date | datetime | NO   |     | NULL    |       |
    | user_id    | int(11)  | NO   | MUL | NULL    |       |
    +------------+----------+------+-----+---------+-------+

CHECK Constraint

The CHECK Constraint is used to limit the value range that can be placed in a column.

Is not a good thing to store Age in a database.
Age changes as time passes, which means you'd have to update your records often.
Instead, it is better to store a fixed value such as date of birth.
Then, whenever you need age, you can calculate it as the difference between the current date and the birth date and u'll even be able to send birthday greetings to your users:)

ALTER TABLE users
ADD CONSTRAINT CHK_AGE CHECK (TIMESTAMPDIFF(YEAR,birth,CURDATE()) >= 18);

According to MySQL Reference Manual:

The CHECK clause is parsed but ignored by all storage engines.

So, u have to use CREATE TRIGGER as workaround to check condition before insertion.


List of Constraints from MySQL Database

Use the information_schema.table_constraints table to get the names of the constraints defined on each table:

SELECT * FROM information_schema.table_constraints
WHERE constraint_schema = 'db_name';

Use the information_schema.key_column_usage table to get the fields in each one of those constraints:

SELECT * FROM information_schema.key_column_usage
WHERE constraint_schema = 'db_name';

To check the foreign key constraints, use information_schema.referential_constraints:

SELECT * FROM information_schema.referential_constraints
WHERE constraint_schema = 'db_name'

see Also

No comments:

Post a Comment