Introduction to PostgreSQL - Configuration.

How to configure PostgreSQL to get the most out of this database in its version 8.2.x.

As shown in our previous article, PostgreSQL can start using without having to configure, just finished installing and then starting our cluster. But if we are going to use PostgreSQL for something important and true volume of data and users to configure it is essential for such work.

It is not the first time that any protest or super user worried about what this evil and its slow works cluster PostgreSQL database on a server last model with massive memory. Usually the problem is that PostgreSQL has not been configured to work with the volume of data to users and that we're using. Not a great help to have a server with several GBytes of RAM if we told PostgreSQL, for example, no longer than 32MBytes.

We also have to say that any database you are using an active, not just PostgreSQL, is a dynamic and alive, in which the data are constantly changing and where the size of the stored data is usually grown with time. This means that a configuration that works well with certain value today may not work so well after a year of use and need to adjust to work optimal.

Settings

PostgreSQL's behavior in our system can be controlled by three configuration files that are in the data directory where PostgreSQL initialize our cluster (in our case / var / pgsql / data). These three files are:

     * Pg_hba.conf: This file is used to define the different types of access a user has in the cluster.
     * Pg_ident.conf: This file is used to define the necessary information to use in case of an access type in pg_hba.conf ident.
     * Postgresql.conf: This file can change all the settings that affect the functioning and performance of PostgreSQL on your machine.

We turn now to explain the most important changes we can make some of these files.
pg_hba.conf

This file is used to define how, where and from which site a user can use our cluster PostgreSQL. All lines that begin with the character # are interpreted as comments. The rest must have the following format:

[Connection type] [database] [user] [IP] [Netmask] [authentication type] [options]

Depending on the type of connection and the type of authentication, [IP], [Netmask] and [option] may be optional. We will explain a bit like defining access rules. The type of connection may have the following values, local host, hostssl and hostnossl. The kind of method can have the following values, trust, reject, md5, crypt, password, krb5, identity, or pam ldap

A number of examples will help us better understand how we can set different access to the cluster PostgreSQL.

Example 1 .- Access by TCP / IP (network) to the database test001, test user from a computer with IP 10.0.0.100 and md5 authentication method:

test001 test 10.0.0.100 host 255.255.255.255 md5

The same entry could also write with netmask in CIDR notation:

test001 host test 10.0.0.100/32 md5

Example 2 .- Access by TCP / IP (network) to the database test001, user test from all the computers on the network 10.0.0.0 with netmask 255.255.255.0 (254 computers in total) and authentication method md5:

test001 host test 10.0.0.0 255.255.255.0 md5

The same entry could also write with netmask in CIDR notation:

10.0.0.0/24 md5 host test001 test

Example 3 .- Access by TCP / IP (network), encryption, all the databases of our cluster, as a user test from the computer with IP 10.0.0.100 and 10.1.1.100 and computer authentication method md5 (we need two entries in our pg_hba.conf file:

hostssl all test 10.0.0.100 255.255.255.255 md5
hostssl all md5 test 10.1.1.100 255,255,255,255

Example 4 .- Deny access to all databases on our test cluster to the user, from every computer on the network 10.0.0.0/24 and provide access to the rest of the world with the md5 method:

10.0.0.0/24 reject all host test
host all all 0.0.0.0 / 0 md5

So we could keep playing with all the possibilities offered by this configuration file. Of course, users and databases used in this file must exist in our cluster for everything to work and some of the parameters can only be used if we compiled with the relevant options in the installation process (eg hostssl, pam, krb5)

Production in order to change this file will have to tell PostgreSQL to re-read it. Just a simple 'reload' (/ usr / local / bin / pg_ctl-D / var / pgsql / data reload) from the command line or pg_reload_conf function () as a user from postgres psql, the PostgreSQL client.

[postgres @ server] # / usr / local / bin / psql

Welcome to psql 8.2.4, the PostgreSQL interactive terminal.

Type: \ copyright for distribution terms
        \ h for help with SQL commands
        \? for help with psql commands
        \ g or terminate with semicolon to execute query
        \ q to quit

postgres = # SELECT pg_reload_conf ();

  pg_reload_conf
----------------
  t
(1 row)

postgres = #

For a detailed documentation on the file pg_hba.con, go through the Chapter 20 section. Client Authentication of official documentation of PostgreSQL.
postgresql.conf

The changes we make in this file affect all the databases that we have identified in our cluster PostgreSQL. Most of the changes can be put into production with a simple 'reload' (/ usr / local / bin / pg_ctl-D / var / pgsql / data reload), other changes needed we start our new cluster (/ usr / local / bin / pg_ctl-D / var / pgsql / data restart).

More information on all the parameters we can change in this file, as they affect and can be put into production can be found in section 17. Server Configuration of the official documentation of PostgreSQL.

Then we will see the most important parameters that should change if we start to use PostgreSQL for a purpose if we are serious and make the most of our machines. There are many more parameters that can be over time and must be adjusted, here we will focus on the most important and which we should change before you start using PostgreSQL seriously.

max_connections: maximum number of clients connected to both our databases. This value should increase in proportion to the number of concurrent clients in our cluster PostgreSQL. A good start is the value to 100:

max_connections = 100

shared_buffers: This parameter is important and defines the size of the buffer memory used by PostgreSQL. Not to increase this value will have much better response. In a dedicated server we can start with 25% of our total memory. Never more than 1 / 3 (33%) of the total. For example, a server with 4Gbytes memory 1024MB can use as initial value.

shared_buffers = 1024MB

work_mem: Used in operations containing ORDER BY, DISTINCT, joins .... In a dedicated server can use a 2-4% of our minds if we take only a few sessions (clients) large. As initial value we can use 8 Mbytes.

work_mem = 8MB

maintenance_work_mem: Used in operations such VACUUM, ANALYZE, CREATE INDEX, ALTER TABLE ADD FOREIGN KEY. Its value depends much on the size of our databases. For example, a server with 4Gbytes memory, 256MB can be used as initial value.

maintenance_work_mem = 256MB

effective_cache_size: parameters used for the 'query planner' in our database engine to optimize the reading of data. In a dedicated server we can start with 50% of our total memory. At most about 2 / 3 (66%) of the total. For example, a server with 4Gbytes memory 2048MB can use as initial value.

effective_cache_size = 2048MB

checkpoint_segments: This parameter is very important in many databases with write operations (insert, update, delete). To begin with we can start with a value of 64. In large databases with many Gbytes of data can be written to increase this value to 128-256.

checkpoint_segments = 64

It is very important to keep in mind that increasing the default values for many of these parameters, we must increase the default values of some parameters of the kernel of our system. Detailed information on how to do this is in Section 16.4. Managing Kernel Resources for official documentation of PostgreSQL