PostgreSQL Questions

1-What is a postgresql database cluster and database?

A database cluster is a collection of databases managed by a PostgreSQL server. ‘Database cluster’ in PostgreSQL does not mean ‘a group of database servers’. A PostgreSQL server runs on a single host and manages a single database cluster.

2-What is tablespace in postgresql?

While PostgreSQL supports tablespaces, the meaning of the term is different from other RDBMS. A tablespace in PostgreSQL is one directory that contains some data outside of the base directory. A tablespace in PostgreSQL is an additional data area outside the base directory. The tablespace directory is addressed by a symbolic link from the pg_tblspc subdirectory, and the link name is the same as the OID value of tablespace

3-What is TOAST in postgres?

Heap files composed of 8KB pages .Postgres does not allow tuples to span multiple pages. Therefore, it is not possible to store very large field values directly. To overcome this limitation, large field values are compressed and/or broken up into multiple physical rows. The technique is affectionately known as TOAST.

Inside pages Tuples(real data) are placed. if heap tuple size is greater than about 2 KB (about 1/4 of 8 KB) then it is stored and managed using a method called TOAST (The Oversized-Attribute Storage Technique).

The TOAST infrastructure is also used to improve handling of large data values in-memory.

Only certain data types support TOAST.

This scheme has a number of advantages compared to a more straightforward approach such as allowing row values to span pages. Assuming that queries are usually qualified by comparisons against relatively small key values, most of the work of the executor will be done using the main row entry. The big values of TOASTed attributes will only be pulled out (if selected at all) at the time the result set is sent to the client. Thus, the main table is much smaller and more of its rows fit in the shared buffer cache than would be the case without any out-of-line storage. Sort sets shrink also, and sorts will more often be done entirely in memory.

4-What is the difference between TUPLE and ROW in postgres?

There’s a difference between the two: a single row might exist on-disk as more than one tuple at any time, with only one of them visible to any single transaction.

The transaction doing an update now sees the new version of the row, the new tuple just inserted on-disk. As long as this transaction has yet to commit then the rest of the world still sees the previous version of the row, which is another tuple on-disk.

While in some contexts tuples and rows are equivalent, we must be careful to use them in the right context.

5- what is the difference between B-Tree index scan and Sequential Scan in postgres?

Sequential scan – All tuples in all pages are sequentially read by scanning all line pointers in each page.
B-tree index scan – An index file contains index tuples, each of which is composed of an index key and a TID pointing to the target heap tuple. If the index tuple with the key that you are looking for has been found, PostgreSQL reads the desired heap tuple using the obtained TID value.

6-What are the main access methods in Postgres?

PostgreSQL supports Sequential scan , B-tree index scan, TID-Scan, Bitmap-Scan, and Index-Only-Scan.

7-Can you explain Process Architectures of PostgreSQL?

PostgreSQL is an object oriented and client/server type relational database management system with the multi-process architecture and runs on a single host.

It is fully ACID compliant and implements transactions isolation to handle the concurrency issues.

Multiple processes managing one database cluster is usually referred to as a ‘PostgreSQL server’(postmaster) and it contains the following types of processes:

A postgres server process is a parent of all processes related to a database cluster management.
Each backend process handles all queries and statements issued by a connected client.
Various background processes perform processes of each feature (e.g., VACUUM and CHECKPOINT processes) for database management.
In the replication associated processes, they perform the streaming replication. The details are described in
In the background worker process supported from version 9.3, it can perform any processing implemented by users. As not going into detail here, refer to the official document.

8- Can you mention about Memory Architecture of PostgreSQL?

Memory architecture in PostgreSQL can be classified into two broad categories:

Local memory area – allocated by each backend process for its own use.

work_mem	Executor uses this area for sorting tuples by ORDER BY and DISTINCT operations, and for joining tables by merge-join and hash-join operations.
maintenance_work_mem	Some kinds of maintenance operations (e.g., VACUUM, REINDEX) use this area.
temp_buffers	Executor uses this area for storing temporary tables.

Shared memory area – used by all processes of a PostgreSQL server.

shared buffer pool	PostgreSQL loads pages within tables and indexes from a persistent storage to here, and operates them directly.
WAL buffer	To ensure that no data has been lost by server failures, PostgreSQL supports the WAL mechanism. WAL data (also referred to as XLOG records) are transaction log in PostgreSQL; and WAL buffer is a buffering area of the WAL data before writing to a persistent storage.
commit log	Commit Log(CLOG) keeps the states of all transactions (e.g., in_progress, committed, aborted) for Concurrency Control (CC) mechanism.

9-Can you explain transaction isolation in PostgreSQL?

There are four isolation levels defined by the standard: read uncommitted, read committed, repeatable read, and serializable. PostgreSQL doesn’t implement read uncommitted, which allows dirty reads, and instead defaults to read committed.

The definition of those isolation levels says that read committed disallows dirty read anomalies, repeatable read disallows dirty read and nonrepeatable read, and serializable disallows all anomalies.

REPEATABLE READ as defined in the ANSI SQL-92 standard allows Phantom Reads. However, PostgreSQL’s implementation does not allow them. In principle, SI does not allow Phantom Reads.

PostgreSQL’s concurrency control based on SI uses the first-updater-win scheme. In contrast, PostgreSQL’s SSI uses the first-committer-win scheme.

Isolation Level	Dirty Reads	Non-repeatable Read	Phantom Read	Serialization Anomaly
READ COMMITTED	Not possible	Possible	Possible	Possible
REPEATABLE READ	Not possible	Not possible	Not possible in PG; (Possible in ANSI SQL)	Possible
SERIALIZABLE	Not possible	Not possible	Not possible	Not possible

10-What is the default isolation level and transaction snapshot in PostgreSQL?

Read Committed is the default isolation level in PostgreSQL. Transaction snapshots are provided by the transaction manager. In the READ COMMITTED isolation level, the transaction obtains a snapshot whenever an SQL command is executed; otherwise (REPEATABLE READ or SERIALIZABLE), the transaction only gets a snapshot when the first SQL command is executed. The obtained transaction snapshot is used for a visibility check of tuples.

When using the obtained snapshot for the visibility check, active transactions in the snapshot must be treated as in progress even if they have actually been committed or aborted. This rule is important because it causes the difference in the behavior between READ COMMITTED and REPEATABLE READ (or SERIALIZABLE).

11-What are conflicts in Transactions and how PostgreSQL approach them?

Conceptually, there are three types of conflicts:

wr-conflicts (Dirty Reads),

ww-conflicts (Lost Updates)

rw-conflicts.

However, wr- and ww-conflicts do not need to be considered because, PostgreSQL prevents such conflicts. Thus, Serializable Snapshot Isolation implementation in PostgreSQL only needs to consider rw-conflicts.

PostgreSQL takes the following strategy for the SSI (Serializable Snapshot Isolation) implementation:

Record all objects (tuples, pages, relations) accessed by transactions as SIREAD locks.
Detect rw-conflicts using SIREAD locks whenever any heap or index tuple is written.
Abort the transaction if a serialization anomaly is detected by checking detected rw-conflicts.

12- Can you explain Wraparound problem and how PostgreSQL solves it?

Assume that tuple Tuple_1 is inserted with a txid of 100, i.e. the t_xmin of Tuple_1 is 100. The server has been running for a very long period and Tuple_1 has not been modified. The current txid is 2.1 billion + 100 and a SELECT command is executed. At this time, Tuple_1 is visible because txid 100 is in the past. Then, the same SELECT command is executed; thus, the current txid is 2.1 billion + 101. However, Tuple_1 is no longer visible because txid 100 is in the future (Fig. 5.20). This is the so called transaction wraparound problem in PostgreSQL.

To deal with this problem, PostgreSQL introduced a concept called frozen txid, and implemented a process called FREEZE.

In PostgreSQL, a frozen txid, which is a special reserved txid 2, is defined such that it is always older than all other txids. In other words, the frozen txid is always inactive and visible.

The freeze process is invoked by the vacuum process. The freeze process scans all table files and rewrites the t_xmin of tuples to the frozen txid if the t_xmin value is older than the current txid minus the vacuum_freeze_min_age (the default is 50 million).

The XMIN_FROZEN bit is set to the t_infomask field of tuples.

13-How do you access other databases remotely in PostgreSQL?

we can use the FDW feature, we need to install the appropriate extension and execute setup commands, such as CREATE FOREIGN TABLE, CREATE SERVER and CREATE USER MAPPING.

postgres_fdw extension is used to access a remote PostgreSQL server. Others: oracle_fdw, mysql_fdw, firebird_fdw, sqlite_fdw, tds_fdw (sysbase and mssql server) etc…

After providing the appropriate setting, the functions defined in the extension are invoked during query processing to access the foreign tables.

14-what is a transaction in DBMS?

A transaction is a set of operations performed by an application that transfers a database from one correct state to another correct state (Consistency), provided that the transaction is completed (Atomicity) and without interference from other transactions (Isolation). Together with Durability these comprises ACID rules acronym.

15-What is Vacuum Processing in PostgreSQL?

Vacuum processing performs the following tasks for specified tables or all tables in the database.

Removing dead tuples
- Remove dead tuples and defragment live tuples for each page.
- Remove index tuples that point to dead tuples.
Freezing old txids
- Freeze old txids of tuples if necessary.
- Update frozen txid related system catalogs (pg_database and pg_class).
- Remove unnecessary parts of the clog (Commit Log) if possible.
Others
- Update the FSM (Free Space Map) and VM (Visibility Map) of processed tables.
- Update several statistics (pg_stat_all_tables, etc).

The VACUUM command has supported the PARALLEL option since version 13, it processes vacuuming index and cleaning index up phases in parallel if set the option and there are multiple indexes which are created.

Note that this feature is valid in only the VACUUM command, autovacuum does not support it.

16- What is Visibility Map in postgreSQL?

Vacuum processing is costly; thus, the VM has been introduced in version 8.4 to reduce this cost.

The basic concept of the VM is simple. Each table has an individual visibility map that holds the visibility of each page in the table file. The visibility of pages determines whether each page has dead tuples. Vacuum processing can skip a page that does not have dead tuples.

Figure 6.2 shows how the VM is used. Suppose that the table consists of three pages, and the 0th and 2nd pages contain dead tuples and the 1st page does not. The VM of this table holds information about which pages contain dead tuples. In this case, vacuum processing skips the 1st page by referring to the VM’s information.

Each VM is composed of one or more 8 KB pages, and this file is stored with the ‘vm’ suffix. As an example, one table file whose relfilenode is 18751 with FSM (18751_fsm) and VM (18751_vm) files are matched.

17- Can You define Autovacuuming process in PostgreSQL?

Vacuum processing has been automated with the autovacuum daemon; thus, the operation of PostgreSQL has become extremely easy.

The autovacuum daemon periodically invokes several autovacuum_worker processes. By default, it wakes every 1 min (defined by autovacuum_naptime), and invokes three workers (defined by autovacuum_max_works).

The autovacuum workers invoked by the autovacuum perform vacuum processing concurrently for respective tables gradually with minimal impact on database activity.

18-What can a developer do during the logical and physical design of a database to help ensure that their database and application perform well?

A developer must investigate volumes of data (capacity planning), what types of information will be stored, and how that data will be accessed. If you are dealing with an upgrade to an existing system, analyzing the present data and where existing data volumes occur, how that data is accessed and where the current response bottlenecks are occurring, can help you search for problem areas in the design.

A new system would require a thorough investigation of what data will be captured, and looking at volumes of data held in other formats also will aid design. Knowing your data is just as important as knowing the constituents of your data. Also, constantly revisit your design. As your system is built, check relationships, volumes of data, and indexes to ensure that the physical design is still at its optimum.

19-What are the 3 Types of Data Models?

Conceptual data models, logical data models and physical data models make up the three types of data model. While they require different approaches to build, each type of data model conveys the same information, from different perspectives. Leveraging all three types of data models helps address different stakeholders’ needs and levels of expertise.

The models are used in different stages of the development process to foster the alignment of business goals and requirements with how data resources are used.

Conceptual data models are used to communicate business structures and concepts at a high level of abstraction. These models are constructed without taking system constraints into account and are usually developed by business stakeholders and data architects to define and organize the information that is needed to develop a system.

Logical data models are concerned with the types, attributes, and relationships of the entities that will inhabit the system. A logical model is often created by a data architect and used by business analysts. The goal is to develop a platform-independent representation of the entities and their relationships. This stage of data modeling provides organizations with insight pertaining to the limitations of their current technologies.

Physical data models are used to define the implementation of logical data models employing a particular database management system (DBMS). They are built with the current – or expected to be – technological capabilities. Database developers and analysts work with physical data models to enact the ideas and processes refined by conceptual and logical models.

Concepts and system requirements are refined at each step as they move from conceptual models to logical models and are solidified in physical models.

DBA Notes

PostgreSQL Questions

Leave a Reply

Related posts:

Leave a Reply