Database Fundamentals

1. What is a Database?

A database is a container and designed to efficiently store, retrieve, and manipulate large amounts organized of data.

The term "database" is frequently misused to refer to the database software being used. However, this is an incorrect usage and often leads to confusion. The correct term for the software is the Database Management System (DBMS). The database itself is the container that is created and manipulated through the DBMS. The nature and structure of the database can vary between different databases.

Following information is for Relational Databases, which will be discussed in Database Management Systems section.

1.1. Database Types

1.1.1. Online Transactional Processing (OLTP) vs Online Analytical Processing (OLAP)

Online Transactional Processing (OLTP) systems are designed to support everyday transaction processing needs. They focus on handling numerous short online transactions such as insert, update, and delete operations. OLTP systems prioritize speed and efficiency in processing a large number of transactions, ensuring data integrity and quick query processing to support daily operations like order entry, financial transactions, and customer information management. The data in OLTP databases is highly normalized to reduce redundancy.

Online Analytical Processing (OLAP) systems, in contrast, are intended for complex data analysis and querying tasks. These systems allow users to perform multi-dimensional analysis of business data, supporting activities like business reporting, forecasting, and decision-making. OLAP is optimized for reading large volumes of data, aggregating information, and identifying trends and patterns over time. Unlike OLTP, OLAP databases are often denormalized to facilitate faster query performance and support complex operations like data mining and business intelligence.

1.1.2. Differences between OLTP and OLAP

Aspect

OLTP (Online Transaction Processing)

OLAP (Online Analytical Processing)

Purpose

Designed for managing transactional data.

Designed for complex queries and data analysis.

Data Type

Real-time, detailed, operational data (e.g., sales transactions).

Aggregated, historical, summary data for analysis.

Data Consistency

ACID (Atomicity, Consistency, Isolation, Durability) properties are critical.

Data may be slightly less consistent as it is designed for analysis.

Data Structure

Relational tables, normalized to reduce redundancy.

Star/snowflake schemas, denormalized for fast access.

Volume of Data

Smaller datasets (typically 1000s of rows).

Larger datasets (typically millions of rows).

Transaction Volume

High transaction volume, quick insertions, updates, and deletions.

Lower transaction volume, mostly read-heavy operations.

Update Frequency

Real-time or frequent updates with high concurrency.

Periodic updates, often in batches.

Query Complexity

Simple, quick queries with specific data retrieval.

Complex queries involving large data sets and multidimensional analysis.

Performance Focus

Transaction speed and data integrity.

Query performance and aggregation speed.

Example Use Case

Banking systems, e-commerce websites, order management systems.

Business intelligence, data mining, market research, financial analysis.

2. What is a Schema?

In SQL, the database schema defines the structure and characteristics of the database objects within a database system. It specifies the names of the objects, the columns or fields they contain, the data types of those columns, and any constraints or rules that govern the data. For instance, the 'year' feature of almost all tables is restricted to hold only Integer values, while a 'name' column may contain only string values. This predefined structure enables a database to maintain efficiency and consistency, even when storing millions or billions of rows.

3. What is a Table?

A table in a database represents a structured collection of related data. It is a two-dimensional arrangement of rows and columns, where each row represents a specific record or instance of data, and each column represents a particular attribute or field of that data, and has an associated datatype, i.e. type of data the column can contain, defined in the schema. Every table in a database has a unique name that identifies it.

Hint: Breaking Up Data

It is extremely important to break data into multiple columns correctly. For example, city, state, and ZIP code should always be separate columns. By breaking these out, it becomes possible to sort or filter data by specific columns (for example, to find all customers in a particular state or in a particular city). If city and state are combined into one column, it would be difficult to sort or filter by state.

3.1. Primary Key (PK)

In a table, it is essential for each row to have a column (or set of columns) that uniquely identifies it. This ensures that no two rows have identical identification values. For instance, a table storing customer information may utilize a customer ID column as the unique identifier, while an orders table may employ an order ID.

The column (or set of columns) responsible for uniquely identifying each row in a table is referred to as the primary key (PK). It serves as a means to reference and distinguish individual rows. When a primary key is absent, updating or deleting specific rows becomes challenging, as there is no reliable method to accurately identify the intended rows to be affected.

Having a primary key in a table guarantees the ability to precisely reference and manipulate rows, ensuring the integrity and reliability of the data within the table.

The primary key in a table can be chosen from any column, provided that it adheres to the following conditions:

No two rows can share the same primary key value.
Each row must have a PK value, meaning primary key columns cannot have NULL values.
Modifying or updating primary key values are not allowed
Primary key values should not be reused. If a row is deleted, its primary key should not be assigned to any new rows in the future.

While primary keys are commonly defined on a single column, it is also possible to use multiple columns as a composite primary key. In such cases, the aforementioned rules apply to all the columns, and the combination of values from these columns must be unique (individual columns do not need to have unique values).

Hint: Always Assign Primary Keys

It is crucial to establish a primary key in a table to ensure the uniqueness, integrity, and reliable identification of rows.

3.2. Foreign Key (FK)

A foreign key is a column in a table whose values must be listed in a primary key in another table.

4. What is a View?

A view is a virtual table that is derived from the data in one or more underlying tables. It does not store any data itself but presents the data in a structured and organized format, allowing users to query and manipulate it as if it were a physical table.

Views offer several advantages:

Simplify complex queries: Views can abstract away the complexity of joining multiple tables or applying filters by encapsulating them into a single virtual table.
Data security and access control: Views can be used to provide restricted access to certain columns or rows, allowing users to see only the data they are authorized to access.
Data abstraction and consistency: Views can present a subset or combination of columns from different tables, providing a unified and consistent view of the data.
Query reusability: Views can encapsulate frequently used queries, enabling them to be reused by different users or applications without the need to rewrite the underlying query.

Materialized View:

There is also another type of view called Materialized View. Unlike a standard view, a materialized view stores the result of the underlying query on disk. It is a snapshot of the data that can be periodically refreshed, providing faster read performance at the cost of data freshness.

4.1. Main Differences Between Tables and Views

Table

View

Definition

Data Storage Object

Saved Query

Purpose

To store structured data persistently

To simplify complex queries, present data in a specific format, enforce security, and provide abstraction

Data Storage

Data is physically stored and occupy disk space

Does not store data on its own

Modification of Data

You can insert, update, or delete data directly

Generally read-only, cannot modify data

4.2 Check the Existing Views in PostgreSQL

4.2.1. Using `\dv` in psql (PostgreSQL command-line interface):

If you are using the psql command-line tool, you can list all the views with the following command:

\dv

This will show all the views in the current database along with their schema and type.

To see views in a specific schema:
```
\dv schema_name.*
```

4.2.2. Querying the system catalog (`pg_views`):

You can also use a SQL query to check for existing views. The pg_views system catalog contains metadata about all the views in the database.

SELECT viewname, schemaname
FROM pg_views
WHERE schemaname NOT IN ('pg_catalog', 'information_schema');

This query retrieves the names of all views, filtering out system views from pg_catalog and information_schema schemas.

4.2.3. Using `information_schema.views`:

Another option is to query the information_schema.views table, which stores information about all views in a database.

SELECT table_name AS view_name, table_schema
FROM information_schema.views
WHERE table_schema NOT IN ('pg_catalog', 'information_schema');

This will return a list of view names and the schemas they belong to, while excluding system views.

 view_name       | table_schema
-----------------+--------------
 customer_view   | public
 sales_view      | analytics

In summary, tables are the foundational data storage objects in a database, while views are saved SQL queries that present data from one or more tables in a specific way. When you query a view, the database runs the underlying query to fetch the data.

For Table and View Creation please check out Creating and Modifying Tables/Views section.

Last updated 6 months ago

Was this helpful?