History


This is a short overview of the history of databases, ending with the invention of the relational database concept by E. F. Codd, which will be the main focus of this series.

- [Databases](#sec-1)
  - [History](#sec-1-1)
    - [File based](#sec-1-1-1)
    - [Hierarchical databases](#sec-1-1-2)
    - [Network](#sec-1-1-3)
    - [Relational databases](#sec-1-1-4)

This post gives a short overview of what a database is, its history, how it has evolved, and where the field stands today. Like most other literature on the subject, we will eventually home in on the relational database concept, as defined by E. F. Codd in 1970 [1].

History

Databases are not a new concept by any stretch of the imagination. In fact, storing and retrieving information has been done in one form or another since long before electronic computers were a thing. Think of libraries, or other paper-based methods of keeping track of information. Accountants will surely nod their heads in agreement here. However, our interest is in computers, so that is the aspect of databases we will delve into.

File based

The first databases were file based. This means that a program stored its information in a file, and the same program (or several programs) would read and write that file. The knowledge of how the data was laid out in the file lived in the programs themselves. Support for this even came built in to some programming languages, such as COBOL [2].

IDENTIFICATION DIVISION.
. . .
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT filename ASSIGN TO assignment-name
    ORGANIZATION IS org ACCESS MODE IS access
    FILE STATUS IS file-status
    . . .
DATA DIVISION.
FILE SECTION.
FD  filename
01  recordname
    nn . . . fieldlength & type
    nn . . . fieldlength & type
    . . .
WORKING-STORAGE SECTION.
01  file-status    PIC 99.
    . . .
PROCEDURE DIVISION.
    OPEN iomode filename
    . . .
    READ filename
    . . .
    WRITE recordname
    . . .
    CLOSE filename
    STOP RUN.
         

Above we see that the program itself defines the record names and the length and type of each field. The definition acts as a template that the program projects onto the file, which makes storage and retrieval easy, at least from this one program. If multiple programs are to use the same file, however, their definitions must be kept in sync. If the data schema of the file changes, every program that uses it has to be changed to conform to the new schema. As computer usage and program complexity grew, this was no longer tenable for more complex use cases. This takes us to the first "real" databases, which separate the definition of the structure of the data (the schema) from the data and its retrieval.
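To make the sync problem concrete, here is a minimal sketch in Python rather than COBOL. The record layout, the field widths and the names `WRITER_LAYOUT` and `STALE_READER_LAYOUT` are made up for illustration. One program writes a record with a 10-byte name field, while a second program still has the old 8-byte layout compiled in.

```python
import struct

# Hypothetical layout used by the writer: 10-byte name, 4-byte quantity.
WRITER_LAYOUT = struct.Struct(">10sI")
# A reader that was never updated and still assumes an 8-byte name.
STALE_READER_LAYOUT = struct.Struct(">8sI")


def write_record(name, quantity):
    """Pack one record the way the writer program lays it out."""
    return WRITER_LAYOUT.pack(name.encode("ascii").ljust(10), quantity)


def read_record(layout, raw):
    """Decode the start of a raw record using the layout a program has built in."""
    name, quantity = layout.unpack_from(raw)
    return name.decode("ascii").rstrip(), quantity


raw = write_record("bolt", 500)
good = read_record(WRITER_LAYOUT, raw)         # same layout as the writer
stale = read_record(STALE_READER_LAYOUT, raw)  # outdated layout, same bytes

print(good)
print(stale)
```

The stale reader does not fail. It still finds the name, but it reads padding bytes as part of the quantity and silently returns a wrong number, which is the kind of error that becomes hard to manage once many programs share one file.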

Hierarchical databases

The first hierarchical databases were used in industry, for instance to keep track of a Bill of Materials. This data is largely hierarchical in nature, so it was organised in a tree, with relations going from one node to multiple child nodes. This works great for that single use case, but has some inherent weaknesses, which we will look at more closely. The strength of the system, however, was that it solved the biggest problems of the file-based approach. The schema was now separated from the programs and from the data, and could be used by the database and the programs alike to get the structure of the stored data. The weakness was the limitation on the relations in the tree: they were 1:many at every node. As such, a lot of modelling domains did not lend themselves to this model.
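The 1:many limitation can be sketched in a few lines of Python. This is only an illustration of the tree shape, not how any real hierarchical database was implemented, and the `Part` class and the bicycle example are invented for the purpose.

```python
class Part:
    """A node in a tree-shaped model: exactly one parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def path_from_root(part):
    """Walk the single parent chain up to the root and return the names top-down."""
    names = []
    while part is not None:
        names.append(part.name)
        part = part.parent
    return list(reversed(names))


bicycle = Part("bicycle")
front_wheel = Part("front wheel", bicycle)
rear_wheel = Part("rear wheel", bicycle)

# A spoke is used by both wheels, but a node can only have one parent,
# so the same kind of part has to be stored twice.
front_spoke = Part("spoke", front_wheel)
rear_spoke = Part("spoke", rear_wheel)

print(path_from_root(front_spoke))
print(path_from_root(rear_spoke))
```

Because every node has a single `parent`, shared parts end up duplicated, which is one way the model fails to fit domains that are not strictly tree shaped.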

Network

Next in line is the network model. It extends the hierarchical model to let the user model many:many relationships. There is still a requirement that the model is hierarchical, however. A child node can have multiple parents, but it cannot relate to a grandparent node in any way.


              .o.
           ..      ..
          .          ..
        .o.           ..
      ..   ..         .o..
     ..     . .      ..  .
    ..         ...   .    ..
    o            o..       o...

            

The network databases also came with a separate domain-specific language for defining the database schema and sub-schemas (which we will talk more about later).
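The many:many structure of the network model described above can be sketched like this in Python. The `Record` class and the `link` helper are hypothetical names used only for illustration, and the sketch says nothing about how actual network databases stored their links.

```python
class Record:
    """A node that may have several parents as well as several children."""

    def __init__(self, name):
        self.name = name
        self.parents = []
        self.children = []


def link(parent, child):
    """Register a parent/child relationship in both directions."""
    parent.children.append(child)
    child.parents.append(parent)


front_wheel = Record("front wheel")
rear_wheel = Record("rear wheel")
spoke = Record("spoke")

# One spoke record is shared by both wheels instead of being duplicated.
link(front_wheel, spoke)
link(rear_wheel, spoke)

print([p.name for p in spoke.parents])
```

Compared with a pure tree, the only change is that a child keeps a list of parents instead of a single one, so shared records no longer need to be copied.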

Relational databases

Finally we arrive at relational databases, which will be our main focus for this series. Relational databases came to be with E. F. Codd's paper [1]. This was the first type of database with a purely mathematical basis, building the database design and structure around the concepts of `relations`, `tuples` and `attributes`. Relational databases came with the added benefit that they can model relationships which are not strictly hierarchical. This opens the door for the database designer to model vastly more complex systems. The design also allowed for a formally defined language that is used both to create the database schema and to run queries against the existing database to retrieve information (read: SQL, or SEQUEL as it was initially named). So hold tight: in the next part of this series we are going to look more closely at relational databases and the theory behind them.
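As a small taste of what is to come, here is a minimal sketch using Python's built-in `sqlite3` module. The table names `part` and `component`, their columns and the sample rows are all invented for illustration. Each table plays the role of a relation, each row is a tuple and each column is an attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema is declared in SQL, separately from any program that uses the data.
conn.executescript("""
CREATE TABLE part (
    part_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE component (
    parent_id INTEGER NOT NULL REFERENCES part(part_id),
    child_id  INTEGER NOT NULL REFERENCES part(part_id),
    quantity  INTEGER NOT NULL,
    PRIMARY KEY (parent_id, child_id)
);
INSERT INTO part VALUES
    (1, 'bicycle'), (2, 'front wheel'), (3, 'rear wheel'), (4, 'spoke');
-- The last row links 'spoke' directly to 'bicycle' (spare spokes),
-- a relationship that skips a level of the hierarchy.
INSERT INTO component VALUES
    (1, 2, 1), (1, 3, 1), (2, 4, 32), (3, 4, 32), (1, 4, 2);
""")

# Which parts contain spokes, and how many?
rows = conn.execute("""
    SELECT p.name, c.quantity
    FROM component AS c
    JOIN part AS p  ON p.part_id  = c.parent_id
    JOIN part AS ch ON ch.part_id = c.child_id
    WHERE ch.name = 'spoke'
    ORDER BY p.name
""").fetchall()

for name, quantity in rows:
    print(name, quantity)

conn.close()
```

The relationships live in an ordinary table, so a part can relate to any other part, including one several levels up, without changing the schema.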