Feed aggregator

Chris Travers: PostgreSQL vs Hadoop

Planet PostgreSQL - 19 August, 2016 - 08:43
So one of the folks I do work with is moving a large database from PostgreSQL to Hadoop. The reasons are sound: volume and velocity are major issues for them, PostgreSQL is not going away in their data center, and in their industry there is a lot more Hadoop usage and tooling than there is PostgreSQL tooling for life science analytics (Hadoop is likely to replace both PostgreSQL and, hopefully, a massive amount of data on NFS). However, this has provided an opportunity to think about big data problems and solutions and their implications. At the same time, I have seen as many people moving from Hadoop to PostgreSQL as the other way around. No, LedgerSMB is not likely ever to use Hadoop as a backend. It is definitely not the right solution to any of our problems.

Big data problems tend to fall into three categories: managing an ever-increasing volume of data, managing an increasing velocity of data, and dealing with a greater variety of data structure. It's worth noting that these are categories of problems, not specific problems themselves, and the problems within the categories are varied enough that there is no single solution for everyone. Moreover, these solutions are hardly without their own significant costs. All too often I have seen tools like Hadoop pushed as a general solution without attention to these costs, and the result is usually something that is overly complex, hard to maintain, possibly slow, and that doesn't work very well.

So the first point worth noting is that big data solutions are specialist solutions, while relational database solutions for OLTP and analytics are generalist solutions. Smart teams usually start with the generalist solutions and move to specialist solutions only when they know from the outset that a specialist solution addresses a specific problem they actually have. No, Hadoop does not make a great general ETL platform...

One of the key things to note is that Hadoop is built to solve all three problems simultaneously.  This means that you effectively buy into a lot of other costs if you are trying to solve only one of the V problems with it.

The single largest cost comes from the solutions to the variety-of-data issues. PostgreSQL and other relational data solutions provide very good guarantees on the data because they enforce a lack of variety. You force a schema on write, and if a row violates it, you throw an error. Hadoop enforces a schema on read, so you can store data, then try to read it and get a lot of null answers back because the data didn't fit your expectations. Ouch. On the other hand, schema on read is very helpful when trying to make sense of a lot of non-structured data.
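As a minimal sketch of what schema-on-write means in practice (the table and values here are invented for illustration), PostgreSQL rejects a non-conforming row the moment you try to store it:

CREATE TABLE measurement (
    sample_id int         NOT NULL,
    taken_at  timestamptz NOT NULL,
    value     numeric     NOT NULL CHECK (value >= 0)
);

-- The bad row is rejected on write and never reaches storage.
INSERT INTO measurement (sample_id, taken_at, value)
VALUES (1, now(), -5);
-- ERROR:  new row for relation "measurement" violates check constraint "measurement_value_check"

A schema-on-read system happily stores the same record and only surfaces the mismatch, often as nulls, when something later tries to read it.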

Now, if you are faced with volume and velocity problems, the solutions to check out first include Postgres-XL and similar shard/clustering solutions, but these really require good data partitioning criteria. If your data set is highly interrelated, it may not be a good fit because cross-node joins are expensive. You also wouldn't use these for smallish data sets, certainly not anything under a TB, since the complexity cost of these solutions is not lightly undertaken either.

Premature optimization is the root of all evil, and big data solutions have their place. However, don't use them just because they are cool, new, or resume-building. They are specialist tools, and overuse creates more problems than underuse.

Simon Riggs: Postgres-BDR: 2 Years in Production

Planet PostgreSQL - 18 August, 2016 - 18:30

Postgres-BDR has now reached 1.0 production status.

Over the last 2 years, Postgres-BDR has been used daily for mission critical production systems.

As you might imagine, it’s been improved by both bug fixes and feature enhancements that allow it to be used smoothly, so it’s mature, robust and feature-rich.

The BDR Project introduced logical replication for PostgreSQL, now available as pglogical. In addition, it introduced replication slots, background workers and many other features. But is it still relevant?

Postgres-BDR delivers all of these features, which aren’t yet in PostgreSQL 9.6 and likely won’t all be in PostgreSQL 10.0 either:

  • Automatic replication of DDL, reducing maintenance costs for DBAs
  • Automatic replication of sequences
  • Conflict resolution and logging

Ernst-Georg Schmid: Hexastores are easy

Planet PostgreSQL - 18 August, 2016 - 15:10
Did you know that you can make a Hexastore from an RDF triple in just one line of SQL? (This needs PostgreSQL 9.4 or better, because of the multi-array unnest.)

CREATE OR REPLACE FUNCTION hexify(
    IN sub text,
    IN pred text,
    IN obj text)
  RETURNS TABLE(ord text, a text, b text, c text) AS
$$
  select A.t || B.t || C.t as ord, A.v, B.v, C.v
    from (select * from unnest(ARRAY[sub, pred, obj], ARRAY['s', 'p', 'o'])) as A(v, t)
   cross join (select * from unnest(ARRAY[sub, pred, obj], ARRAY['s', 'p', 'o'])) as B(v, t)
   cross join (select * from unnest(ARRAY[sub, pred, obj], ARRAY['s', 'p', 'o'])) as C(v, t)
   where A.v != B.v and A.v != C.v and B.v != C.v
   order by ord desc
$$
  LANGUAGE sql IMMUTABLE STRICT
  COST 100
  ROWS 6;

SELECT * FROM hexify('subject','predicate','object');
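For reference, a call like that should return the six permutations below, ordered by ord descending (the psql formatting here is just illustrative):

 ord |     a     |     b     |     c
-----+-----------+-----------+-----------
 spo | subject   | predicate | object
 sop | subject   | object    | predicate
 pso | predicate | subject   | object
 pos | predicate | object    | subject
 osp | object    | subject   | predicate
 ops | object    | predicate | subject
(6 rows)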

Sometimes, PostgreSQL SQL is just awesome...

More on Hexastores here and here.

Colin Copeland: Postgres Present and Future (PyCon 2016 Must-See Talk: 6/6)

Planet PostgreSQL - 18 August, 2016 - 15:00

Part six of six in our annual PyCon Must-See Series, a weekly highlight of talks our staff especially loved at PyCon. With so many fantastic talks, it’s hard to know where to start, so here’s our short list.

Coming from a heavy database admin background, I found Craig Kerstiens’s “Postgres Present and Future” to be incredibly well organized and engaging. Of particular interest to me, because I am somewhat new to Postgres (my background is more in MS SQL), was the deep dive into indexes in Postgres.

Check out 5:44-8:39 to find out when to use different types of indexes beyond the standard B-Tree. For instance, GIN indexes are helpful when searching multiple values for a single column, e.g. an array field or a JSONB field.
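As a quick hedged example (table and column names are made up), a GIN index on a JSONB column supports containment queries:

CREATE TABLE events (payload jsonb);
CREATE INDEX events_payload_gin ON events USING gin (payload);

-- The planner can use the GIN index for containment searches like this:
SELECT count(*) FROM events WHERE payload @> '{"status": "error"}';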

Click over to 17:22-19:33 to learn about the new bloom filter extension in Postgres 9.6, which is coming out in a few months. This extension seems like it will be incredibly useful for speeding up queries on wide tables that get filtered on many different combinations of columns.
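A rough sketch of how the bloom extension is used (the table and columns are hypothetical):

CREATE EXTENSION bloom;
CREATE TABLE wide_table (c1 int, c2 int, c3 int, c4 int, c5 int);
CREATE INDEX wide_table_bloom ON wide_table USING bloom (c1, c2, c3, c4, c5);

-- A single bloom index can help equality filters on arbitrary combinations of the indexed columns.
SELECT * FROM wide_table WHERE c2 = 7 AND c5 = 42;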


More in the annual PyCon Must-See Talks Series.

Gulcin Yildirim: Evolution of Fault Tolerance in PostgreSQL: Synchronous Commit

Planet PostgreSQL - 17 August, 2016 - 12:22

PostgreSQL is an awesome project and it evolves at an amazing rate. In this series of blog posts we focus on the evolution of fault tolerance capabilities in PostgreSQL throughout its versions. This is the fourth post of the series, and we’ll talk about synchronous commit and its effects on the fault tolerance and dependability of PostgreSQL.

If you would like to witness the evolution progress from the beginning, please check the first three blog posts of the series below. Each post is independent, so you don’t actually need to read one to understand another.

  1. Evolution of Fault Tolerance in PostgreSQL 
  2. Evolution of Fault Tolerance in PostgreSQL: Replication Phase 
  3. Evolution of Fault Tolerance in PostgreSQL: Time Travel
Synchronous Commit

By default, PostgreSQL implements asynchronous replication, where data is streamed out whenever convenient for the server. This can mean data loss in case of failover. It’s possible to ask Postgres to require one (or more) standbys to acknowledge replication of the data prior to commit; this is called synchronous replication (synchronous commit).

With synchronous replication, the replication delay directly affects the elapsed time of transactions on the master. With asynchronous replication, the master may continue at full speed.

Synchronous replication guarantees that data is written to at least two nodes before the user or application is told that a transaction has committed.

The user can select the commit mode of each transaction, so that it is possible to have both synchronous and asynchronous commit transactions running concurrently.

This allows flexible trade-offs between performance and certainty of transaction durability.

Configuring Synchronous Commit

To set up synchronous replication in Postgres, we need to configure the synchronous_commit parameter in postgresql.conf.

The parameter specifies whether transaction commit will wait for WAL records to be written to disk before the command returns a success indication to the client. Valid values are on, remote_apply, remote_write, local, and off. We’ll discuss how things work in terms of synchronous replication when we set the synchronous_commit parameter to each of these values.
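As a rough postgresql.conf sketch (the standby name is hypothetical):

# postgresql.conf on the master
synchronous_commit = on                  # on | remote_apply | remote_write | local | off
synchronous_standby_names = 'standby1'   # if empty, commits wait only for the local WAL flush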

Let’s start with Postgres documentation (9.6):

The default, and safe, setting is on. When off, there can be a delay between when success is reported to the client and when the transaction is really guaranteed to be safe against a server crash. (The maximum delay is three times wal_writer_delay.) Unlike fsync, setting this parameter to off does not create any risk of database inconsistency: an operating system or database crash might result in some recent allegedly-committed transactions being lost, but the database state will be just the same as if those transactions had been aborted cleanly. So, turning synchronous_commit off can be a useful alternative when performance is more important than exact certainty about the durability of a transaction.

This captures the concept of synchronous commit as described in the introduction of the post: you’re free to set up synchronous replication, but if you don’t, there is always a risk of losing data (though without the risk of creating database inconsistency, unlike turning fsync off, which is a topic for another post). Lastly, we conclude that if we don’t want to lose any data to replication delays and want to be sure the data is written to at least two nodes before the user/application is informed that the transaction has committed, we need to accept losing some performance.

Let’s see how the different settings work for different levels of synchronisation. Before we start, let’s talk about how a commit is processed in PostgreSQL replication. Clients execute queries on the master node; the changes are written to a transaction log (WAL) and copied over the network to the WAL on the standby node. The recovery process on the standby node then reads the changes from the WAL and applies them to the data files, just like during crash recovery. If the standby is in hot standby mode, clients may issue read-only queries on the node while this is happening. For more details about how replication works, you can check out the replication blog post in this series.

 

Fig.1 How replication works

synchronous_commit = off

When we set synchronous_commit = off, the COMMIT does not wait for the transaction record to be flushed to the disk. This is highlighted in Fig.2 below.

Fig.2 synchronous_commit = off

synchronous_commit = local

When we set synchronous_commit = local,  the COMMIT waits until the transaction record is flushed to the local disk. This is highlighted in Fig.3 below.

Fig.3 synchronous_commit = local

synchronous_commit = on (default)

When we set synchronous_commit = on, the COMMIT will wait until the server(s) specified by synchronous_standby_names confirm receipt of the transaction record (meaning that they have the data in memory). This is highlighted in Fig.4 below.

Note: When synchronous_standby_names is empty, this setting behaves the same as synchronous_commit = local.

Fig.4 synchronous_commit = on

synchronous_commit = remote_write

When we set synchronous_commit = remote_write, the COMMIT will wait until the server(s) specified by synchronous_standby_names confirm write of the transaction record to the disk. This is highlighted in Fig.5 below.

Fig.5 synchronous_commit = remote_write

synchronous_commit = remote_apply

When we set synchronous_commit = remote_apply, the COMMIT will wait until the server(s) specified by synchronous_standby_names confirm that the transaction record was applied to the database. This is highlighted in Fig.6 below.

Fig.6 synchronous_commit = remote_apply

Now, let’s look at the synchronous_standby_names parameter in detail; it is referred to above when setting synchronous_commit to on, remote_apply or remote_write.

synchronous_standby_names = ‘standby_name [, …]’

The synchronous commit will wait for a reply from one of the standbys listed, in order of priority. This means that if the first standby is connected and streaming, the synchronous commit will always wait for its reply even if the second standby has already replied. The special value * can be used as standby_name and will match any connected standby.

synchronous_standby_names = ‘num (standby_name [, …])’

The synchronous commit will wait for replies from at least num of the standbys listed, in order of priority. The same rules as above apply. So, for example, setting synchronous_standby_names = '2 (*)' will make synchronous commit wait for replies from any 2 standby servers.

synchronous_standby_names is empty

If this parameter is empty, then setting synchronous_commit to on, remote_write or remote_apply behaves the same as local (i.e. the COMMIT will only wait for flushing to the local disk).
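To summarise with a few example settings (the standby names are invented):

synchronous_standby_names = 'standby1, standby2'                # wait for the first connected standby, by priority
synchronous_standby_names = '2 (standby1, standby2, standby3)'  # wait for replies from 2 standbys, chosen by priority
synchronous_standby_names = ''                                  # commits behave as synchronous_commit = local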

Note: The synchronous_commit parameter can be changed at any time; the behaviour for any one transaction is determined by the setting in effect when it commits. It is therefore possible, and useful, to have some transactions commit synchronously and others asynchronously. For example, to make a single multi-statement transaction commit asynchronously when the default is the opposite, issue SET LOCAL synchronous_commit TO OFF within the transaction.
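For instance, the per-transaction override mentioned in the note could look like this (audit_log is just a hypothetical table):

BEGIN;
SET LOCAL synchronous_commit TO OFF;  -- only this transaction commits asynchronously
INSERT INTO audit_log (message) VALUES ('low-value event');
COMMIT;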

Conclusion

In this blog post, we discussed synchronous replication and described different levels of protection which are available in Postgres. We’ll continue with logical replication in the next blog post.

References

Special thanks to my colleague Petr Jelinek for giving me the idea for illustrations.

PostgreSQL Documentation
PostgreSQL 9 Administration Cookbook – Second Edition

Joshua Drake: Rich in the Jungle: An AWS to Softlayer comparison for PostgreSQL

Planet PostgreSQL - 16 August, 2016 - 21:28

I have updated my Rich in the Jungle presentation with new pricing for AWS vs. Softlayer. Things haven't changed much: in terms of raw performance per dollar (which is not the only qualifier), Softlayer is clearly the winner.

Kaarel Moppel: Insert-only data modeling with PostgreSQL?

Planet PostgreSQL - 16 August, 2016 - 12:28

During recent years there has been quite a lot of buzz about “insert-only” approaches, and some database-like products (Datomic, Apache Samza) have emerged, inspired by the idea of having an immutable datastore. In light of cheap storage and powerful hardware, I see the idea as definitely having potential for certain use cases. So why not […]

The post Insert-only data modeling with PostgreSQL? appeared first on Cybertec - The PostgreSQL Database Company.

Tatsuo Ishii: Importing PostgreSQL 9.6's SQL parser

Planet PostgreSQL - 16 August, 2016 - 11:42
In almost every release of Pgpool-II, we import the latest version of PostgreSQL's SQL parser (more precisely, the raw parser). This time, our new Pgpool-II developer ran into an interesting problem while importing PostgreSQL 9.6's parser.

In PostgreSQL, the SQL parser is written in bison, a general-purpose parser generator. The particular file containing the SQL grammar rules is named "gram.y". gram.y used to include "scan.l", a lexical scanner written in flex.

In reality, gram.y is translated into a C source file by bison and then compiled; the same goes for scan.l, which is translated by flex. So the main part of the SQL parser source was a single big file consisting of gram.y and scan.l.

From PostgreSQL 9.6, however, the PostgreSQL developers decided to keep gram.y and scan.l separate:

Build backend/parser/scan.l and interfaces/ecpg/preproc/pgc.l standalone.

This caused enough confusion for the developer in charge of the work, and it took some time to realize what had changed. I would say that's a fun part of working on an OSS project :-)  However, I cannot help saying that it would be nice if the SQL parser were exported as a separate library, so that we did not need this kind of work for every release of Pgpool-II.

Chris Travers: Forthcoming new scalable job queue extension

Planet PostgreSQL - 14 August, 2016 - 10:48
So, for those of you who know me, I now spend most of my time doing more general PostgreSQL consulting and a fair bit of time still on LedgerSMB. One of my major projects lately has been a large scientific computing platform currently run on PostgreSQL but, due to the volume and velocity of data, being moved to Hadoop (the client maintains other fairly large PostgreSQL instances with no intention of moving them, by the way). However this has provided an opportunity to think about big data problems and solutions and their implications.

With this client's permission I have decided to take a lot of the work I have done optimizing their job queue system and turn it into a PostgreSQL extension. The job queue currently runs tens of millions of jobs per day (meaning twice that number of write queries, and a fair number of read queries too) and is one of the most heavily optimized parts of the system, so this will be based on a large number of lessons learned about what is a surprisingly hard problem.

It is worth contrasting this with pg_message_queue, of which I am also the author. pg_message_queue is intended as a light-weight, easy-to-use message queue extension that one can plug into other programs to solve common problems where notification and message transfer are the main concerns. This project will be an industrial-scale job queuing system aimed at massive concurrency. As a result, simplicity and ease of use take second place to raw power and performance under load. In other words, here I am not afraid to assume the DBA and programming teams know what they are doing and have the expertise to read the manual and implement appropriately.

The first version (1.x) will support all supported versions of PostgreSQL and make the following guarantees:


  1. Massively multiparallel, non-blocking performance (we currently run it with 600+ worker-process connections to PostgreSQL)
  2. Partitioning, coalescing, and cancelling of jobs similar in some ways to TheSchwartz
  3. Exponential pushback based on number of times a job has failed
  4. Jobs may be issued again after deletion, but this can always be detected and bad jobs pruned
  5. Optional job table partitioning.
The first client written will rely on hand-coded SQL along with DBIx::Class's schema objects.  This client will guarantee that:
  1. Work modules always succeed or fail in a transaction
  2. A job notifier class will be shown
  3. Pruning of completed jobs will be provided via the Perl module and a second query.
The history here is that this came from a major client's use of TheSchwartz, which they outgrew for scalability reasons. While the basic approach is thus compatible, the following changes have been made:
  1. Job arguments are in JSON format rather than Storable format in bytea columns
  2. Highly optimized performance on PostgreSQL
  3. Coalesce is replaced by a single integer cancellation column
  4. Jobs may be requested by batches of various sizes
2.x will support 9.5+ and dispense with the need for both advisory locks and rechecking. I would like to support some sort of graph management as well (i.e. a graph link from one job type to another which specifies "for each x create a job for y" semantics). That is still all in design.
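None of this is the extension itself, but here is a minimal sketch of the kind of dequeue pattern that 9.5+ enables via FOR UPDATE SKIP LOCKED, with invented table and column names:

CREATE TABLE job_queue (
    job_id    bigserial   PRIMARY KEY,
    job_class text        NOT NULL,
    args      jsonb       NOT NULL,
    failures  int         NOT NULL DEFAULT 0,
    run_after timestamptz NOT NULL DEFAULT now()
);

-- Each worker claims one runnable job without blocking on rows other workers already hold.
WITH next_job AS (
    SELECT job_id
      FROM job_queue
     WHERE run_after <= now()
     ORDER BY job_id
     LIMIT 1
       FOR UPDATE SKIP LOCKED
)
DELETE FROM job_queue
 WHERE job_id IN (SELECT job_id FROM next_job)
RETURNING job_id, job_class, args;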

Shaun M. Thomas: PG Phriday: Inevitable Interdiction

Planet PostgreSQL - 12 August, 2016 - 19:48

“Hey! That row shouldn’t be in that table! How the heck did that get there!? Alright, who wrote the application client filters, because you’re fired!”

Good application developers know never to trust client input, but not all realize that a single app is rarely the only vector into a database. Databases don’t just preserve data with various levels of paranoia, they’re also the central nexus of a constellation of apps, scripts, APIs, GUIs, BMIs, HMOs, and STDs. As such, unless every single one of those share a common ancestor that sanitizes, boils, renders, and formats content before storage, there’s bound to be inconsistencies. That’s just how things work.

One of the major benefits of using an RDBMS is that engines like Postgres provide several mechanisms for ensuring data integrity beyond crash durability and transaction control. Continuing our discussion of database objects from last week, we vaguely referred to other types of constraint. So, what other mechanisms are available aside from primary keys and unique constraints?

Let’s start with foreign keys, since they illustrate how tables are related and enforce that relation to reject invalid content. Here are two simply related tables and a couple of rows:

CREATE TABLE pet_type (
  type_id SERIAL PRIMARY KEY,
  animal  VARCHAR UNIQUE NOT NULL
);

CREATE TABLE pet (
  pet_id     SERIAL PRIMARY KEY,
  type_id    INT NOT NULL,
  pet_name   VARCHAR NOT NULL,
  owner_name VARCHAR NOT NULL
);

ALTER TABLE pet ADD CONSTRAINT fk_pet_pet_type
  FOREIGN KEY (type_id) REFERENCES pet_type (type_id);

INSERT INTO pet_type (animal) VALUES ('cat'), ('dog');

INSERT INTO pet (type_id, pet_name, owner_name)
VALUES (1, 'Meow Meow Fuzzyface', 'Cedric'),
       (2, 'Mr. Peanutbutter', 'Paul');

Foreign keys provide a buffer between pending modifications by enforcing the relationship. In this case, we can’t remove “dog” as a pet type because at least one pet references it. We also can’t insert a pet fish, because there’s no corresponding type.

The other job the foreign key fills is to normalize the type names. If we had used a VARCHAR column in the pet table instead, we could have types of “Dog”, “dog”, “GDo”, or any number of typos preserved for eternity and forever complicating searches. This helps illustrate and sanitize the relationship and all affected data. Let’s see it in action:

DELETE FROM pet_type WHERE animal = 'dog';

ERROR:  update or delete on table "pet_type" violates foreign key constraint "fk_pet_pet_type" on table "pet"

INSERT INTO pet (type_id, pet_name, owner_name) VALUES (3, 'Wanda', 'Lisa');

ERROR:  insert or update on table "pet" violates foreign key constraint "fk_pet_pet_type"

Foreign keys do have other modes of operation. We could, for instance, declare the constraint as ON DELETE CASCADE. That would enable us to delete from the parent table, but a delete cascade means every row in any table that references the deleted value would also be removed. That's a fairly dangerous operation, and a pretty good reason it's not the default in Postgres.
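For comparison, a cascading version of the earlier constraint would look like this; it is shown only as an illustration, since the article's example deliberately keeps the default behaviour:

ALTER TABLE pet DROP CONSTRAINT fk_pet_pet_type;
ALTER TABLE pet ADD CONSTRAINT fk_pet_pet_type
  FOREIGN KEY (type_id) REFERENCES pet_type (type_id)
  ON DELETE CASCADE;

-- Now removing a pet type silently deletes every pet that references it.
DELETE FROM pet_type WHERE animal = 'dog';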

Foreign keys are fairly limited in application, however. Beyond describing a relationship and enforcing its integrity, there isn’t much left. If we wanted to impose actual rules on the data itself, we need to go a bit further into the toolbox.

For example, suppose we want to ensure pet birth dates are firmly established in the past. None of the previously mentioned constraints will let us apply arbitrary rules on column values, right? That’s where CHECK constraints come in!

ALTER TABLE pet ADD birth_date DATE;

ALTER TABLE pet ADD CONSTRAINT ck_pet_no_future
  CHECK (CURRENT_DATE - birth_date > 0);

INSERT INTO pet (type_id, pet_name, owner_name, birth_date)
VALUES (1, 'Princess Carolyn', 'Amy', '2017-08-12');

ERROR:  new row for relation "pet" violates check constraint "ck_pet_no_future"

Not bad, eh? Check constraints are exceptionally useful when there are very simple rules we want to enforce. Maybe prices should always be positive. Perhaps invoice line items should be positive unless they're a credit. There is a lot of potential here, but there's also room for abuse. There's technically no limit on the number of conditionals a check constraint enforces, or the number of checks we prescribe, so we must be judicious or risk performance degradation.

Still, preventing critical data flaws prior to insert is a stupendous capability. Can we go further, though? Of course we can! The final constraint type is for data exclusion. Imagine in our examples that pets can change owners, but can’t be owned by two people simultaneously. Well, we can’t use check constraints for that since they only operate on the current row, and a unique constraint won’t work either.

Let’s watch EXCLUDE handle the situation with ease:

CREATE EXTENSION btree_gist;

ALTER TABLE pet ADD owner_range TSRANGE;

ALTER TABLE pet ADD CONSTRAINT ex_owner_overlap
  EXCLUDE USING GIST (pet_name WITH =, owner_range WITH &&);

INSERT INTO pet_type (animal) VALUES ('horse');

INSERT INTO pet (type_id, pet_name, owner_name, birth_date, owner_range)
VALUES (3, 'Bojack', 'Will', '1964-06-12', '[1964-06-12,2014-09-07)');

INSERT INTO pet (type_id, pet_name, owner_name, birth_date, owner_range)
VALUES (3, 'Bojack', 'Arnett', '1964-06-12', '[2013-09-07,)');

ERROR:  conflicting key value violates exclusion constraint "ex_owner_overlap"

There are a couple prerequisites for using exclusion this way, of course. Since the gist index type wasn’t designed to handle types like INT or VARCHAR natively, we need to give it B-Tree capability first with an extension.

Beyond that oddity, we merely added the new ownership date range and then added the constraint itself. The exclusion syntax is to list the column and then the type of operator that should be applied. For ranges, that operator is && to indicate overlap. For our particular example, no pet with the same name can have overlapping ownership ranges. This is a tiny universe indeed!

Exclusion constraints work better for things like scheduling, preventing archive overlaps, and other operations that would be awkward or impossible otherwise. With other database engines, an application might have to search for a date range and other parameters and self-verify that a record is safe to insert. Under this paradigm, any tertiary data vector that isn’t so diligent would be free to ignore scheduling conflicts.

But no application, script, or API can ignore rules the database itself enforces. That’s what constraints are for: those times when being a shared resource is a liability. Judicious application of various constraints can protect as well as describe the data, and make things easier (and safer) for everyone.

And if that isn’t the Postgres motto, maybe it should be.

Craig Ringer: BDR 1.0

Planet PostgreSQL - 12 August, 2016 - 08:22

I’m pleased to say that we’ve just released Postgres-BDR 1.0, based on PostgreSQL 9.4.9.

This release contains significant improvements to DDL replication locking, global sequences, documentation, performance, and more. It also removes the deprecated UDR component in favour of pglogical.

It’s taken a lot of work to get to this point. This release sets the foundation to port BDR to PostgreSQL 9.6 and to enhance its high-availability capabilities, and I’m excited to be pushing BDR forward.

gabrielle roth: PDXPUG: August meeting in two days

Planet PostgreSQL - 12 August, 2016 - 02:10

When: 6-8pm Thursday August 18, 2016
Where: iovation
Who: Brian Panulla, Marty Zajac, Gabrielle Roth
What: ETL Throwdown (or up)

For August, we’re having a panel discussion on various ETL tools we’ve tried.
Brian: moving from bulk ETL to near real time ETL with Tungsten Replicator to replicate from MySQL to PostgreSQL and Solr
Marty: Pentaho
Gabrielle: CloverETL, home-grown solution using postgres_fdw, and a brief rant about Informatica if we have time.


If you have a job posting or event you would like me to announce at the meeting, please send it along. The deadline for inclusion is 5pm the day before the meeting.

Our meeting will be held at iovation, on the 32nd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry!

iovation provides us a light dinner (usually sandwiches or pizza).

Elevators open at 5:45 and building security closes access to the floor at 6:30.

See you there!


Kaarel Moppel: Logging of data modifications and the “log_statement” configuration parameter

Planet PostgreSQL - 11 August, 2016 - 09:36

PostgreSQL has a bagful of server configuration parameters (249 by my count for version 9.5) at your disposal, which is mostly a good thing as it enables you to get the maximum out of your hardware if you’re willing to put in the necessary time. But some of the parameters might leave the door open […]

The post Logging of data modifications and the “log_statement” configuration parameter appeared first on Cybertec - The PostgreSQL Database Company.

Postgres-BDR 9.4 1.0 Released

PostgreSQL News - 11 August, 2016 - 03:00

Oxford, United Kingdom - August 11, 2016

Bi-Directional Replication for PostgreSQL (Postgres-BDR, or BDR) is the first open source multi-master replication system for PostgreSQL to reach full production status, developed by 2ndQuadrant and assisted by a keen user community. BDR is specifically designed for use in geographically distributed clusters, using highly efficient asynchronous logical replication, supporting anything from 2-48 nodes in a distributed database.

WHAT'S NEW?

Significant improvements related to stability and bug fixes have been made in this release. The highlights of this release are:

  • Smoother handling of schema change (DDL) statements, allowing increased operational stability and reduced maintenance
  • Various bug fixes for operational issues, demonstrating a high level of maturity
  • Performance tuning, especially of global sequence handling
  • Removal of the now deprecated UDR
  • Extensive documentation improvements based upon user feedback

For a more comprehensive list of release notes, please click here: http://bdr-project.org/docs/stable/release-1.0.0.html

SUPPORT & RELEASE SCHEDULE

Postgres-BDR 9.4 1.0 will be supported until December 2019, when PostgreSQL 9.4 goes EOL. BDR continues to be developed, and BDR 9.6 will be released within 6 months. The BDR project has already delivered the majority of its code into PostgreSQL core, though it will likely be about 2-3 more years before this functionality is available in PostgreSQL core.

IS Postgres-BDR RIGHT FOR YOU?

BDR is well suited for databases where:

  • Data is distributed globally
  • Majority of data is written to from only one node at a time (For example, the US node mostly writes changes to US customers, each branch office writes mostly to branch-office-specific data, and so on.)
  • There is a need for SELECTs over the complete data set (lookups and consolidation)
  • There are OLTP workloads with many smaller transactions
  • Transactions mostly touch non-overlapping sets of data
  • There is partition and latency tolerance

However, this is not a comprehensive list and use cases for BDR can vary based on database type and functionality.

In addition, BDR aids business continuity by providing increased availability during network faults. Applications can be closer to the data and more responsive for users, allowing for a much more satisfying end-user experience.

PROFESSIONAL SUPPORT FOR BDR

BDR is developed and supported by 2ndQuadrant. 2ndQuadrant provides professional support for BDR, as well as consultation and training. With our deep knowledge of distributed databases, we will ensure that you are able to evaluate your use case and applications and determine the most suitable product or solution.

BDR is also available in 2ndQuadrant Cloud, initially on AWS.

For more information, please visit the homepage: https://2ndquadrant.com/en/resources/bdr/.

2016-08-11 Security Update Release

PostgreSQL News - 11 August, 2016 - 03:00

The PostgreSQL Global Development Group has released an update to all supported versions of our database system, including 9.5.4, 9.4.9, 9.3.14, 9.2.18 and 9.1.23. This release fixes two security issues. It also patches a number of other bugs reported over the last three months. Users who rely on security isolation between database users should update as soon as possible. Other users should plan to update at the next convenient downtime.

Security Issues

Two security holes have been closed by this release:

  • CVE-2016-5423: certain nested CASE expressions can cause the server to crash.
  • CVE-2016-5424: database and role names with embedded special characters can allow code injection during administrative operations like pg_dumpall.

The fix for the second issue also adds an option, -reuse-previous, to psql's \connect command. pg_dumpall will also refuse to handle database and role names containing line breaks after the update. For more information on these issues and how they affect backwards-compatibility, see the Release Notes.

Bug Fixes and Improvements

This update also fixes a number of bugs reported in the last few months. Some of these issues affect only version 9.5, but many affect all supported versions:

  • Fix misbehaviors of IS NULL/IS NOT NULL with composite values
  • Fix three areas where INSERT ... ON CONFLICT failed to work properly with other SQL features.
  • Make INET and CIDR data types properly reject bad IPv6 values
  • Prevent crash in "point ## lseg" operator for NaN input
  • Avoid possible crash in pg_get_expr()
  • Fix several one-byte buffer over-reads in to_number()
  • Don't needlessly plan query if WITH NO DATA is specified
  • Avoid crash-unsafe state in expensive heap_update() paths
  • Fix hint bit update during WAL replay of row locking operations
  • Avoid unnecessary "could not serialize access" with FOR KEY SHARE
  • Avoid crash in postgres -C when the specified variable is a null string
  • Fix two issues with logical decoding and subtransactions
  • Ensure that backends see up-to-date statistics for shared catalogs
  • Prevent possible failure when vacuuming multixact IDs in an upgraded database
  • When a manual ANALYZE specifies columns, don't reset changes_since_analyze
  • Fix ANALYZE's overestimation of n_distinct for columns with nulls
  • Fix bug in b-tree mark/restore processing
  • Fix building of large (bigger than shared_buffers) hash indexes
  • Prevent infinite loop in GiST index build with NaN values
  • Fix possible crash during a nearest-neighbor indexscan
  • Fix "PANIC: failed to add BRIN tuple" error
  • Prevent possible crash during background worker shutdown
  • Many fixes for issues in parallel pg_dump and pg_restore
  • Make pg_basebackup accept -Z 0 as no compression
  • Make regression tests safe for Danish and Welsh locales

The libpq client library has also been updated to support future two-part PostgreSQL version numbers. This update also contains tzdata release 2016f, with updates for Kemerovo, Novosibirsk, Azerbaijan, Belarus, and Morocco.

EOL Warning for Version 9.1

PostgreSQL version 9.1 will be End-of-Life in September 2016. The project expects to only release one more update for that version. We urge users to start planning an upgrade to a later version of PostgreSQL as soon as possible. See our Versioning Policy for more information.

Updating

All PostgreSQL update releases are cumulative. As with other minor releases, users are not required to dump and reload their database or use pg_upgrade in order to apply this update release; you may simply shut down PostgreSQL and update its binaries. Users who have skipped one or more update releases may need to run additional, post-update steps; please see the release notes for earlier versions for details.

Links: Download | Release Notes | Security Page | Versioning Policy

PostgreSQL 9.6 Beta 4 Released

PostgreSQL News - 11 August, 2016 - 03:00

The PostgreSQL Global Development Group announces today that the fourth beta release of PostgreSQL 9.6 is available for download. This release contains previews of all of the features which will be available in the final release of version 9.6, including fixes to many of the issues found in the first and second betas. Users are encouraged to continue testing their applications against 9.6 beta 4.

Changes Since Beta 3

9.6 Beta 4 includes the security fixes in the 2016-08-11 Security Update, as well as the general bug fixes offered for stable versions. Additionally, it contains fixes for the following beta issues reported since the last beta:

  • Change minimum max_worker_processes from 1 to 0
  • Make array_to_tsvector() sort and de-duplicate the given strings
  • Fix ts_delete(tsvector, text[]) to cope with duplicate array entries
  • Fix hard to hit race condition in heapam's tuple locking code
  • Prevent "snapshot too old" from trying to return pruned TOAST tuples
  • Make INSERT-from-multiple-VALUES-rows handle targetlist indirection
  • Do not let PostmasterContext survive into background workers
  • Add missing casts in information schema
  • Fix assorted problems in recovery tests
  • Block interrupts during HandleParallelMessages()
  • Remove unused arguments from pg_replication_origin_xact_reset function
  • Correctly handle owned sequences with extensions
  • Many fixes for tsqueue.c
  • Eliminate a few more user-visible "cache lookup failed" errors
  • Teach parser to transform "x IS [NOT] DISTINCT FROM NULL" to a NullTest
  • Allow functions that return sets of tuples to return simple NULLs
  • Repair damage done by citext--1.1--1.2.sql
  • Correctly set up aggregate FILTER expression in partial-aggregation plans

This beta also includes many documentation updates and improvements.

Due to changes in system catalogs, a pg_upgrade or pg_dump and restore will be required for users migrating databases from earlier betas.

Note that some known issues remain unfixed. Before reporting a bug in the beta, please check the Open Items page.

Beta Schedule

This is the fourth beta release of version 9.6. The PostgreSQL Project will release additional betas as required for testing, followed by one or more release candidates, until the final release in late 2016. For further information please see the Beta Testing page.

Links

9.5.4

PostgreSQL versions - 11 August, 2016 - 03:00
9.5.4 is the latest release in the 9.5 series.

9.4.9

PostgreSQL versions - 11 August, 2016 - 03:00
9.4.9 is the latest release in the 9.4 series.

9.3.14

PostgreSQL versions - 11 August, 2016 - 03:00
9.3.14 is the latest release in the 9.3 series.

9.2.18

PostgreSQL versions - 11 August, 2016 - 03:00
9.2.18 is the latest release in the 9.2 series.