Feed aggregator

Andrew Dunstan: pgbouncer enhancements

Planet PostgreSQL - 2 May, 2014 - 17:20
A couple of customers have recently asked for enhancements of pgbouncer, and I have provided them.

One that's been working for a while now, puts the address and port of the actual client (i.e. the program that connects to the proxy) into the session's application_name setting. That means that if you want to see where the client is that's running some query that's gone rogue, it's no longer hidden from you by the fact that all connections appear to be coming from the pgbouncer host.You can see it appearing in places like pg_stat_activity.

It only works when a client connects, so if the client itself sets application_name then the setting gets overridden. But few clients do this, and the original requester has found it useful. I've submitted this to the upsteam repo, as can be seen at  https://github.com/markokr/pgbouncer-dev/pull/23.

The other enhancement is the ability to include files in the config file. This actually involves a modification to the library pgbouncer uses as a git submodule, libusual. With this enhancement, a line that has "%include filename" causes the contents of that file to be included in place of the directive. Includes can be nested up to 10 deep. The pull request for this is at  https://github.com/markokr/libusual/pull/7. This one too sems to be working happily at the client's site.

There is one more enhancement on the horizon, which involves adding in host based authentication control similar to that used by Postgres. That's a rather larger bit of work, but I hope to get to it in the next month or two.

Josh Berkus: New Finding Unused Indexes Query

Planet PostgreSQL - 2 May, 2014 - 04:07
As long as we're overhauling standard monitoring queries for PostgreSQL, here's another one.  This query helps you find indexes which are relatively unused, and as a result could probably be dropped.  I wrote a more complex (yes, really) version for our internal Performance Health Check suite, but the linked version is usable by anyone.

The query is also an example of why CTEs, otherwise known as "WITH statements", are a life-saver for working with complex queries.  I've only tested it on 9.2 and 9.3; I don't know if it'll work on older versions.

Before you use it, you need to check how long you've been collecting data into pg_stat_user_indexes and friends.  The default is since you created the database, but some people reset stats on a daily or monthly basis.  So it's important to know what you're looking at.   Don't make the mistake of dropping the indexes which are needed for the month-end reports!

The query divides seldom-used indexes into four groups:

Indexes Which Aren't Scanned At All: these indexes have no scans during the stats period.  These pretty much certainly can be dropped, except those on really small tables which you expect to grow later.

Seldom Used Indexes on Heavily Written Tables:  as a general rule, if you're not using an index twice as often as it's written to, you should probably drop it.  This query is a little more conservative than that.

Really Large, Seldom-Used Indexes: these indexes get used, but not that often, and they're taking up a lot of RAM and storage space.   Consider dropping them after some thought about why they were added.

Large Non-BTree Indexes On Busy Tables:  as a rule, non-BTree indexes like GiST and GIN don't accurately report usage stats.  As a result, we can't check how often they're used, just how big they are and if they're attached to tables which get a lot of writes.  This list of indexes should be very judiciously pruned.

Happy pruning!

Andreas Scherbaum: GSoC 2014: Implementing clustering algorithms in MADlib

Planet PostgreSQL - 2 May, 2014 - 02:22

Andreas 'ads' Scherbaum

Happy to announce that Maxence Ahlouche has been accepted by both the PostgreSQL Project and by Google to implement clustering algorithms in MADlib during the Google Summer of Code 2014.

This project is mentored by Atri Sharma (former GSoC student) and me, with technical help from Pivotal (product owner of MADlib).

Looking forward to another successful GSoC year!

Marko Tiikkaja: Why I consider USING harmful

Planet PostgreSQL - 1 May, 2014 - 03:31
Often on the topic of the different JOIN syntaxes in SQL I mention that I consider ON the only reasonable way to specify JOIN clauses in production code, and I get asked for details.  It always takes me a while to recall the specific problem and to come up with a plausible scenario, so I figured it would be easier to document this once. Suppose you have a database containing information about

Gurjeet Singh: Postgres Hibernator: Reduce Planned Database Down Times

Planet PostgreSQL - 30 April, 2014 - 20:23

TL;DR: Reduce planned database down times by about 97%, by using Postgres Hibernator.

DBAs are often faced with the task of performing some maintenance on their database server(s) which requires shutting down the database. The maintenance may involve anything from a database minor-version upgrade, to a hardware upgrade. One ugly side-effect of restarting the database server/service is that all the data currently in database server’s memory will be all lost, which was painstakingly fetched from disk and put there in response to application queries over time. And this data will have to be rebuilt as applications start querying database again. The query response times will be very high until all the “hot” data is fetched from disk and put back in memory again.

People employ a few tricks to get around this ugly truth, which range from running a select * from app_table;, to dd if=table_file ..., to using specialized utilities like pgfincore to prefetch data files into OS cache. Wouldn’t it be ideal if the database itself could save and restore its memory contents across restarts!

The Postgres Hibernator extension for Postgres performs the automatic save and restore of database buffers, integrated with database shutdown and startup, hence reducing the durations of database maintenance windows, in effect increasing the uptime of your applications.

Postgres Hibernator automatically saves the list of shared buffers to the disk on database shutdown, and automatically restores the buffers on database startup. This acts pretty much like your Operating System’s hibernate feature, except, instead of saving the contents of the memory to disk, Postgres Hibernator saves just a list of block identifiers. And it uses that list after startup to restore the blocks from data directory into Postgres' shared buffers.

As explained in my earlier post, this extension is a set-it-and-forget-it solution, so, to get the benefits of this extension there’s not much a DBA has to do, except install it.

Ideal database installations that would benefit from this extension would be the ones with a high cache-hit ratio. With Postgres Hibernator enabled, your database would start cranking pre-maintenance TPS (Transactions Per Second) within first couple of minutes after a restart.

As can be seen in the chart below, the database ramp-up time drops dramatically when Postgres Hibernator is enabled. The sooner the database TPS can reach the steady state, the faster your applications can start performing at full throttle.

The ramp-up time is even shorter if you wait for the Postgres Hibernator processes to end, before starting your applications.

Sample Runs

As is quite evident, waiting for Postgres Hibernator to finish loading the data blocks before starting the application yeilds a 97% impprovement in database ramp-up time (2300 seconds to get to 122k TPS without Postgres Hibernator vs. 70 seconds).


Please note that this is not a real benchmark, just something I developed to showcase this extension at its sweet spot.

The full source of this mini benchmark is available with the source code of the Postgres Hibernator, at its Git repo.

Hardware: MacBook Pro 9,1 OS Distribution: Ubuntu 12.04 Desktop OS Kernel: Linux 3.11.0-19-generic RAM: 8 GB Physical CPU: 1 CPU Count: 4 Core Count: 8 pgbench scale: 260 (~ 4 GB database)

Before every test run, except the last (‘DB-only restart; No Hibernator’), the Linux OS caches are dropped to simulate an OS restart.

In ‘First Run’, the Postgres Hibernator is enabled, but since this is the first ever run of the database, Postgres Hibernator doesn’t kick in until shutdown, to save the buffer list.

In ‘Hibernator w/ App’, the application (pgbench) is started right after database restart. The Postgres Hibernator is restoring the data blocks to shared buffers while the application is also querying the database.

In the ‘App after Hibernator’, the application is started after the Postgres Hibernator has finished reading database blocks. This took 70 seconds for reading the ~4 GB database.

In ‘DB-only restart; No Hibernator` run, the OS caches are not dropped, but just the database service is restarted. This simulates database minor version upgrades, etc.

It’s interesting to monitor the bi column in vmstat 10 output, while these tests are running. In ‘First Run’ and ‘DB-only restart’ cases this column’s values stayed between 2000 and 5000 until all data was in shared buffers, and then it dropped to zero (meaning that all data has been read from disk into shared buffers). In ‘Hibernator w/ app’ case, this column’s value ranges from 15,000 to 65,000, with an average around 25,000. it demonstrates that Postgres Hibernator’s Block Reader process is aggressively reading the blocks from data directory into the shared buffers, but apparently not fast enough because the applicaion’s queries are causing random reads from disk, which interfere with the sequential scans that Postgres Hibernator is trying to perform.

And finally, in ‘App after Hibernator’ case, this column consistently shows values between 60,000 and 65,000, implying that in absence of simultaneous application load, the Block Reader can read data into shared buffers much faster.

Tim van der Linden: PostgreSQL: A full text search engine - Part 1

Planet PostgreSQL - 30 April, 2014 - 17:00

PostgreSQL, the database of miracles, the RDBMS of wonders.

People who have read my stuff before know that I am a fan of the blue-ish elephant and I greatly entrust it with my data. For reasons why, I invite you to read the "Dolphin ass-whopping" part of the second chapter of my mail server setup series.

But what some of you may not know is that PostgreSQL is capable of much more then simply storing and retrieving your data. Well, that is actually not entirely correct...you are always storing and retrieving data. A more correct way to say it is that PostgreSQL is capable of storing all kinds of data and gives you all kinds of ways to retrieve it. It is not limited to storing boring stuff like "VARCHAR" or "INT". Neither is it limited to retrieving and comparing with boring operators like "=", "ILIKE" or "~".

For instance, are you familiar with PostgreSQL's "tsvector" data type? Or the "tsquery" type? Or what these two represent? No? Well, diddlydangeroo, then by all means, keep reading, because that is exactly what this series is all about!

In the following three chapters I would like to show you how you can configure PostgreSQL to be a batteries included, blazing fast, competition crunching, full text search engine.

But, I can already search strings of text with PostgreSQL!

Hmm, that is very correct. But the basic operators you have at your disposal are limited.

Let me demonstrate.

Imagine we would have a table, called "phraseTable" containing thousands of strings, all saved in a regular, old VARCHAR column named "phrase". Now we would like to find the string "An elephant a day keeps the dolphins at bay.". We do not fully remember the above string, but we do remember it had the word "elephant" in it. With regular SQL you could use the "LIKE" operator to try and find a matching substring. The resulting query would look something like this:

SELECT phrase FROM phraseTable WHERE phrase LIKE '%elephant%';

It would work, you render any index on the table mute when using front and back wildcards, but it would work. Now imagine a humble user would like to find the same string but their memory is bad, they thought the word elephant was capitalized, because it may refer to PostgreSQL, of course. The query would become this:

SELECT phrase FROM phraseTable WHERE phrase LIKE '%Elephant%';

And as a result, you get back zero records.

"But wait!", you shout, "I am a smart ass, there is a solution to this!". And you are correct: the ILIKE operator. The "I" stands for Insensitive...as in Case Insensitive. So you change the query:

SELECT phrase FROM phraseTable WHERE phrase ILIKE '%Elephant%';

And now you will get back a result. Good for you.

A day goes by and the same user comes back and wishes to find this string again. But, his memory still being bad and all, he thought there where multiple elephants keeping the dolphins at bay, because, you know, pun. So the query, you altered yesterday, now reads:

SELECT phrase FROM phraseTable WHERE phrase ILIKE '%Elephants%';

And...now the query will return zero results.

"Ha!", you shout in my general direction. "I am a master of Regular Expressions! I shall fix thay query!".

No, you shall not fix my query. Never, ever go smart on my derrière by throwing a regular expression in the mix to solve a database lookup problem. It is unreadable, un-scalable and fits only one solution perfectly-ish. And, not to forget, is slow as hell for it not only ignores any index you have set, it also asks more of the database engine then a LIKE or ILIKE.

Let me put an end to this and tell you that I am afraid there are no more (scalable) smart ass tricks left to perform and the possibilities to search text with regular, build-in operators are exhausted.

You agree? Yes? Good! So, enter "full text search"!

Full text search?

But before we delve into the details of the PostgreSQL implementation, let us take a step back and first see what exactly a full text search engine is.

Short version: A full text search engine is a system that can retrieve documents, or parts of documents, based on natural language searching.

Natural language means the living, breathing language we humans use. And as you know, human language can be complex and above all ambiguous.

Consider yourself in the situation where you knew, for sure, that you have read an interesting article about elephants in the latest edition of "Your Favorite Elephant Magazine". You liked it so much that you want to show it to your best friend, who happens to be an elephant lover too. The only bummer is, you cannot remember the title, but you do remember it has an interesting sentence in it.

So what do you do? First you quote the sentence in your mind: "The best elephants have a blue skin color.". Next, you pick up the latest edition and you start searching, flipping through the pages, skimming for that sentence.

After a minute or two you shout: "Dumbo!, I have found the article!". You read the sentence out loud: "The best Elephants bear a blue skin tone.". You are happy with yourself, call up your friend and tell him that you will be over right away to show him that specific article.

One thing you forgot to notice was that the sentence in your head, and the sentence that was actually printed where different, but your brain (which is trained in this natural stuff), sees them as the same. How did that work? Well, your brain used its internal synonym list and thesaurus to link the different words together, making them the same thing, just written differently:

  • "elephants" is the same as "Elephants"
  • "have" is the same as "bear"
  • "skin color" is the same "skin tone"

Without noticing it, you have just completed a full text search using natural language algorithms, your magazine as the database, your brain as the engine.

But how does such a natural language lookup work...on an unnatural computer?

What a perfectly timed question, I was just getting to that.

Now that you have a basic understanding of what natural language searching is, how does one port this idea to a stupid, binary ticking tin box? By dissecting the process we do in our brains, lay it out in several programmable steps and concepts. Such a process, run by computers, will never be as good on the natural part as our brains are, but it is certainly a lot faster with flipping and skimming through the magazine pages.

Let us look at how a computer, regardless of which program, platform or engine you use, would go about being "natural" when searching for strings of text.

To speed up the search process, a full text search engine will never search through the actual document itself. That is how we humans would do it, and that is slow and (for our eyes) error prone. Before a document can be searched through with a full text search engine, it has to be parsed into a list of words first. The parsing is where the magic happens, this is our attempt at programming the natural language process. Once the document is parsed, the parsed state is saved. Depending on your database model, you can save the parsed state together with a reference to the original document for later retrieval.

Note that a document, in this context, is simply a big collection of words contained within a file. The engine does not care, and most of the time does not know, about what kind of file (plain text, LibreOffice document, HTML file, ...) it is handling or what the files structure is. It simply looks at all the readable words inside of the file.

So how does the parsing work? Parsing, in this regard, is all about compressing the text found in a document. Cutting down the word count to the least possible, so later, when a user searches, the engine has to plow through fewer words. This compressing, in most engines, is done in roughly three steps.

Eliminate casing

The first step in the compression process is the elimination of casing - keeping only the lower case versions of a word. If you would keep a search case sensitive, then "The ElEphAnt" would not match "the elephant", but generally you do want a match to happen. The user will many times not care (or not know) about casing in a full text search.

Remove stop words

The following step is the removal of words that do not add any searchable value to the text and are seldom searched for. These words are known as "stop words", a term first coined by Hans Peter Luhn, a renowned IBM computer scientist who specialized in the retrieval and indexing of information stored in computer systems.

The list of stop words is not limited to simply ones like "and" or "the". There is an extensive list of hundreds and hundreds of words which are generally considered to be of little value in a search context. A (very) short list of stop words: her, him, the, also, each, was, we, after, been, they, would, up, from, only, you, while, ... .

Remove synonyms, employing a thesaurus and perform stemming

The last part in the compacting of our to-be-indexed document is removing words that have the same meaning and perform stemming. Synonym lookups are used for removing words of the same meaning where as thesaurus lookups are used to compact whole phrases with similar meaning.

Only one instance of all the synonyms, thesaurus phrases and case eliminations is stored, the surviving word is referred to as a lexeme, the smallest, meaningful word. The lexemes that are stored usually (depending on the engine you use) get an accompanying list of (alpha)numeric values stored alongside. Two types of (alpha)numeric values can be stored in case of PostgreSQL:

  • The first type are pure numerical and represent pointer(s) to where the word occurs in the original document.
  • The second type is pure alphabetical (actually only capital A,B,C,D) and represent the weight a certain lexeme has.

Do not worry to much about these two (alpha)numerical values for now, we will get to that later.

Next, let us get practical and start to actually use PostgreSQL to see how all of this works.

The tsvector

As PostgreSQL is an extendable database engine, two new data types where added to make full text search possible, as you have seen in the beginning. One of them is called tsvector, "ts" for text search and "vector", which is analogous with the generic programming data type "vector". It is the container in which the result of the parsing is eventually stored.

Let me show you an example of such a tsvector, as presented by PostgreSQL on querying. Imagine a document with the following string of text inside: "The big blue elephant jumped over the crippled blue dolphin.". A perfectly normal sentence, elephants jump over dolphins all the time.

Without bothering about how to do it, if we let PostgreSQL parse this string, we will get the following tsvector stored in our record:

'big':2 'blue':3,9 'crippl':8 'dolphin':10 'eleph':4 'jump':5

You will notice a few things about this vector, let me go over them one by one.

  • First, you recognize the structure of a vector-ish data type. Hence the name "tsvector".
  • Next, the numbers behind the lexemes themselves, like I said before, represent the pointer(s) to that word. Notice the word "blue" in particular, it has two pointers for the two occurrences in the string.
  • And last, notice how some lexemes do not even look like English words at all. The lexeme "crippl" or "eleph" do not mean anything, to us humans anyway. These are the surviving lexemes of "cripple" and "elephant". PostgreSQL has stemmed and reduced the words to match all possible variants. The lexeme "crippl", for example, matches "cripple", "crippled", "crippling", "cripples", ... .

Note that the above example is the simplest of full text search parsing results, we did not add any weights nor did we employ a thesaurus (or an advanced dictionary) to get back a more efficient compressing.

Now that we are dwelling inside of PostgreSQL, I can elaborate a bit more about how the parsing works exactly. As we have seen above, it happens in roughly three steps. But I intentionally neglected to say that with PostgreSQL, there is an intermediate state between the word and the resulting lexeme.

When PostgreSQL parses the string of text it goes over them and first categorizes each word into sections like "stop words", "plural words", "synonyms", ... . Once the words are broken down into categories, we refer to them as tokens. This is the intermediate state. For a token to become a lexeme, PostgreSQL will consult a set of defined dictionaries for each category to try and find a match. If a match is found, the dictionary will propose a lexeme. This lexeme is the one that will finally be put in the vector as the parsed result.

If the dictionaries did not find a match, the word is discarded. The one exception to this are the "stop words", if a word matches a stop word, it will be discarded instead of kept.

Let us now get our hands dirty and setup a quick testing database and rig it up with the phraseTable table we have been using in our journey so far. But instead of a varchar column, this table will contain a tsvector type for we will unleash to power of Full Text Search!

Note: I am assuming you have at least PostgreSQL 9.1 or higher. This post was written with PostgreSQL 9.3 in mind.

So connect to your PostgreSQL install and create the database:


Do not worry to much about the ownership of this database nor the ownership of its tables, you can discard it whole later. Now, switch over to the database:

\c phrases

And create the phraseTable table:

CREATE TABLE phrases (phrase tsvector NOT NULL);

Okay, simple enough. We now have a tiny database, with a table containing one column of type tsvector.

Let us insert a parsed vector into the table. Again, without employing a thesaurus or any other tools, we only use the built-in, default configuration to parse a string and save it as a vector.

Let us insert the vector, shall we?

INSERT INTO phraseTable VALUES (to_tsvector('english','The big blue elephant jumped over the crippled blue dolphin.'));

That was easy enough. Most of what you see is simple, regular SQL with one new kid on the block: "to_tsvector". The latter is a function that is shipped with PostgreSQL's Full Text Search extension and it does what its name suggests: it takes a string of text and converts it into a tsvector.

As a first argument to this function you can optionally input the full text search configuration you wish the parser to use. The default is "english", so I could have omitted it from the argument list. This configuration holds everything that PostgreSQL will employ to do all of the parsing, including a basic dictionary, stop word list, ... . PostgreSQL has some default settings, which many times are good enough. The 'english' configuration is such an example.

In the next chapter we will delve deep into creating our own configuration, for now just take it for granted.

If we query the result, with a simple select, it will return our newly created vector:

SELECT phrase from phraseTable

Will return:

'big':2 'blue':3,9 'crippl':8 'dolphin':10 'eleph':4 'jump':5

Now remember that I talked about the second kind of value we could store alongside the numeric pointers, the weights? Let us take a deeper look into that now.

First, weights are not mandatory and only give you an extra tool for ranking the results afterwards. They are nothing more then a label you can put on a lexeme to group it together. With weights you could, for example, reflect the structure the original document had. You may wish to put a higher weight on lexemes that come from a title element and a lower weight on those from the body text.

PostgreSQL knows four weight labels A, B, C, D. The lowest in rank being D. In fact, if you do not define any weights to the lexemes inside a tsvector, all of them will implicitly get a D assigned. If all the lexemes in a tsvector carry a D, it is omitted from display when printing the tsvector, simply for readability. The above query result could thus also be written as:

'big':2D 'blue':3D,9D 'crippl':8D 'dolphin':10D 'eleph':4D 'jump':5D

It is exactly the same result, but unnecessarily verbose.

I told you, in the very beginning, that a full text engine does not know or care about the structure of a document, it only sees the words. So how can it then put labels on lexemes based on a document structure that it does not know?

It cannot.

It is your job to provide PostgreSQL with label information when building the tsvector. Up until now we have been working with simple text strings, which contain no hierarchy. If you wish to reflect your original document structure by using weights, you will have to preprocess the document and construct your to_tsvector query manually.

Just for demonstration purposes, we could, of course, assign weights to the lexemes inside a simple text string. The process of weight assignment is trivial. PostgreSQL gives you the appropriately named setweight function for this. This function accepts a tsvector as the first argument and a weight label as the second.

To demonstrate, let me update our record and give all the lexemes in our famous sentence a A weight label:

UPDATE phraseTable SET phrase = setweight(phrase,'A');

If we now query this table, the result will be this:

'big':2A 'blue':3A,9A 'crippl':8A 'dolphin':10A 'eleph':4A 'jump':5A

Simple, right?

One more for fun. What if you wanted to assign different weights to the lexemes? For this, you have to concatenate several setweight functions together. An example query would look something like this:

update phraseTable set phrase= setweight(to_tsvector('the big blue elephant'), 'A') || setweight(to_tsvector('jumped over the'), 'B') || setweight(to_tsvector('crippled blue dolphin.'), 'C');

The result:

'big':2A 'blue':3A,7C 'crippl':6C 'dolphin':8C 'eleph':4A 'jump':5B

Not very usefull, but it demonstrates the principle.

If the documents you wish to index have a fixed structure, many times the table that will hold the tsvectors for these documents will reflect that structure with appropriately named columns. For example, if your document would always have a title, body text and a footer, you could create a table which contains three tsvector type columns, named after each structure type. When you parse the document and construct the query, you could assign all lexemes that will be stored in the title column with an A label, in the body column with a B and in the footer column with a C.

Okay, that is enough about weights. Simply remember that they give you extra power to influence the search result ranking, if needed.

We now have a table with a decent tsvector inside. The data is in, so to speak. But what can we do with it now?

Well, let us try to retrieve and compare it, shall we!

The tsquery

You could, of course, simply retrieve the results stored in a tsvector by doing a SELECT on the column. However, you have no way of filtering out the results using the operators we have seen before (LIKE, ILIKE). Even if you could use them, you would still run into the same kind of problems as before. You would still have a user who will search for a synonym or search for a plural form of the stemmed lexeme actually stored in the vector.

So how do we query it?

Step in tsquery. What is it? It is a data type that gives us extra tools to query the full text search vector.

Pay attention to the fact that we do not call tsquery a set of extra operators but we call it a data type. This is very important to understand. With tsquery we can construct search predicates, which can search through a tsvector type and can employ specially designed indexes to speed up the process.

Let me throw a query at you that will try to find the word "elephants" in our favorite string using tsquery:

SELECT phrase FROM phraseTable WHERE phrase @@ to_tsquery('elephants');

Try it out, this will give you back the same result set we had before. Let me explain what just happened.

As you can see, there is again a new function introduced: to_tsquery and it is almost identical to its to_tsvector counterpart. The function to_tsquery takes one argument, a string containing the tokens (not the words, not the lexemes) you wish to search for.

Let us first look a bit more at this one. Say, for instance, you wish to find two tokens of the sentence inside your database. Your first instinct would be to query this the following way:

SELECT phrase FROM phraseTable WHERE phrase @@ to_tsquery('elephants blue');

Unfortunately, this will throw an error stating there is a syntax error. Why? Because the string your provided as an argument is malformed. The to_tsquery helper function does not accept a simple string of text, it needs a string of tokens separated by operators. The operators at your disposal are the regular & (AND), | (OR) and ! (NOT). Note that the ! operator needs the & or the | operator.

It then goes and creates a true tsquery to retrieve the results. Let us try this query again, but with correct syntax this time:

SELECT phrase FROM phraseTable WHERE phrase @@ to_tsquery('elephants & blue');

Perfect! This will return, once more, the same result as before. You could even use parenthesis inside your string argument to enforce grouping if desired. Like I said before, what this helper function does is translate its input (the tokens in the string) into actual lexemes. After that, it tries to match this result with the lexemes present in the tsvector.

We still have a problem if we would let a user type her or his search string into an interface search box and feed it to to_tsquery, for a user does not know about the operators they need to use. Luckily for us, there is another help function, the plainto_tsquery which takes care of exactly that problem: convert an arbitrary string of text into lexemes.

Let me demonstrate:

SELECT phrase FROM phraseTable WHERE phrase @@ plainto_tsquery('elephants blue');

Notice we did not separate the words with operators, now it is a simple search string. In fact, plainto_tsquery converts it to a list of lexemes separated by an & (AND) operator. The only drawback is that this function can only separate the lexemes with an & operator. If you wish to have something other then the & operator, you will have to stick to to_tsquery.

A word of caution though, the plainto_tsvector may seem interesting, but is most of the time not a general solution for building a higher level search interface. When you are building, say, a web application that contains a full text search box, there are a few more steps between the string entered in that box, and the final query that will be preformed.

Building a web application and safely handling user input that travels to the database is a separate story and way beyond the scope of this post, but you will have to build your own parser that sits between the user input and the query.

If you would play dumb and accept the fact that your interface would only allow to enter a string in the search box (no operators, no grouping, ...) then you still need to send over the user input using query parameters and you need to make sure that the parameter sent over is a string. This, of course, is not really a parser, this is more basic, sane database handling on the web.

As tempting (and simple) it might seem to be to build a query like that, it will probably frustrate your end users. The reason why is because as I mentioned before, the plainto_tsquery accepts a string, but will chop the string into separate lexemes and put the & operator between them. This means that all the words entered by the user (or at least their resulting lexemes) must be found in the string.

Many times, this may not be what you want. Users expect to have their search string interpreted as | (OR) separated lexemes, or users may want the ability to define these operators themselves on the interface.

So, one way or the other, you will have to write your own parser if you want a non-database user to work with your application. This parser looks at the options your present on your search form and will crawl over the user entered string to interpret certain characters not as words to search but as operators or grouping tokens to build your final query.

But enough about web applications, that is not our focus now. Let us continue.

The next, new item you will see in the last few queries is the @@ operator. This operator (also referred to as text-search-matching operator) is also specific to a full text search context. It allows you to match a ts_vector against the results of a ts_query. In our queries we matched the result of a ts_query against a column, but you could also match against a ts_vector on the fly:

SELECT to_tsvector('The blue elephant.') @@ to_tsquery('blue & the');

A nice little detail about the @@ operator is that it can also match against a TEXT or VARCHAR data type, giving you a poor-mans full text capability. Let me demonstrate:

SELECT 'The blue elephant.' :: VARCHAR @@ to_tsquery('blue & the');

This 'on-the-fly' query will generate a VARCHAR string (by using the :: or cast operator) and try to match the tokens blue and the. The result will be t, meaning that a match is found.

Before I continue, it is nice to know that you can always test the result of a ts_query, meaning, test the output of what it will use to find lexemes in the ts_vector. To see that output, you simply call it with the helper function, the same way we called the to_tsvector a while ago:

SELECT to_tsquery('elephants & blue');

This will result in:

'eleph' & 'blue'

It is also important to note that to_tsquery (and plainto_tsquery) too uses a configuration of the same kind to_tsvector uses, for it too has to do the same parsing to find the lexemes of the string or tokens you feed it. So the first, optional argument to to_tsquery is the configuration, this also defaults to "english". This means we could rewrite the query as such:

SELECT to_tsquery('english','elephants & blue');

And we would get back the same results.

Okay, I think this is enough to take in for now. You have got a basic understanding of what full text search means, you know how to construct a vector containing lexemes, pointers and weights. You also know how to build a query data type and perform basic matching to retrieve the text you desire.

In part 2 we will look at how we can dig deeper and setup our own full text search configuration. We will cover fun stuff like:

  • Looking deeper into PostgreSQL's guts
  • Defining dictionaries
  • Building Stop word lists
  • Mapping token categories to our dictionaries
  • Defining our own, super awesome full text configuration
  • And, of course, more dolphin pun...

In the last part we will break open yet another can of full text search goodness and look at:

  • Creating special, full text search indexes
  • Ranking search results
  • Highlighting word inside search results
  • Setting up update triggers for ts_vector records

Hang in there!

And as always...thanks for reading!

Michael Paquier: Postgres 9.4 feature highlight: Creating an output plugin for logical replication

Planet PostgreSQL - 30 April, 2014 - 16:15

Continuing on the series of posts about logical replication and after looking at some basics and the concept of REPLICA IDENTITY for a table, now is time to begin more serious things by having a look at how to develop an output plugin for the logical decoding facility.

Do you remember? Logical decoding is made of roughly two major parts:

  • A decoder generating the database changes, that reads and translate WAL files into a format given by a custom plugin that can be defined when a logical replication slot is created. An example of that in core is the contribution module called test_decoding. This is linked with a replication slot.
  • A receiver, that can consume the changes decoder has created. pg_recvlogical as well as the SQL functions pg_logical_slot_[get|peek]_changes as good examples of that.

In short, in PostgreSQL architecture a logical decoder is a set of callback functions that are loaded from a library listed in shared_preload_libraries when server starts. Those callbacks functions need to be defined through a function called _PG_output_plugin_init and here is their list:

  • startup_cb, called when a replication slot is created or when changes are requested. Put in here things like dedicated memory contexts or option management for example.
  • shutdown_cb, called when a replication slot is not used anymore. You should do here the necessary clean-up actions for things created at startup phase.
  • begin_cb and commit_cb, called when respectively a transaction BEGIN or COMMIT has been decoded. Transaction context is available here so you use this data in the decoding stream as well.
  • change_cb, called when a change on database has occurred. By far this is the most important one to be aware of in my opinion. With this callback is available information about the relation changed in the shape of a Relation entry and data: the old and new tuple as some HeapTupleData and HeapTupleHeaderData entries.

The role of the decoder plugin is to use all the data available and put it in a shape that will be useful for a receiver when it is streamed. Its flexibility is actually what makes this facility really powerful and useful already for many things like rolling upgrades, audit functions, replication, etc.

Note as well that a decoder can also use _PG_init to perform initialization actions when it is loaded. By the way, when looking at implementing your own decoder, the best advice givable is to base your implementation on the existing plugin in PostgreSQL core called test_decoding. This module does not do much in itself, but it is a gold mine for beginners and will save you hours and a lot of tears. It is however preferable to have some knowledge of the APIs of PostgreSQL before jumping in the wild. And this is actually what I did to implement my own decoder, called decoder_raw available in a personal repository on github called pg_plugins (that's actually the repository I used to create a set of templates for backgroud workers when studying them). This plugin is able to decode WAL entries and generate raw SQL queries that stream receivers can use as-is. The code is a bit longer than test_decoding as there is some logic in the WHERE clause of UPDATE and DELETE queries caused by the logic of REPLICA IDENTITY for the DEFAULT and INDEX cases to enforce tuple selectivity on a table's primary key (DEFAULT case) or a not null unique index (INDEX case).

Here is for example how the callback functions are defined:

void _PG_output_plugin_init(OutputPluginCallbacks *cb) { AssertVariableIsOfType(&_PG_output_plugin_init, LogicalOutputPluginInit); cb->startup_cb = decoder_raw_startup; cb->begin_cb = decoder_raw_begin_txn; cb->change_cb = decoder_raw_change; cb->commit_cb = decoder_raw_commit_txn; cb->shutdown_cb = decoder_raw_shutdown; }

Then, without entering in the details, here a couple of things that you need to be aware of even if implementation is based on the structure of test_decoding.

First, the relation information in structure Relation contains a new OID field called rd_replidindex that defines the OID if the REPLICA IDENTITY index if it exists. This is either a PRIMARY KEY or a user-defined index. Use and abuse of it! It is really helpful. A common usage of rd_replidindex is to open an index relation on it and then scan indnatts to find the list of column attributes the index refers to. Useful to define a list of keys that can uniquely define a tuple on a remote source for an UPDATE or DELETE change. Here is a way to do that:

int key; Relation indexRel = index_open(relation->rd_replidindex, ShareLock); for (key = 0; key < indexRel->rd_index->indnatts; key++) { int relattr = indexRel->rd_index->indkey.values[key - 1]; /* * Perform an action with the attribute number of parent relation * and tuple data. */ do_action_using_attribute_number(relattr, tupledata); } index_close(indexRel, NoLock);

Then, use a common way to generate the relation name for all the change types (INSERT, UPDATE, DELETE). Something like that will generate a complete relation name with both namespace and table name:

Relation rel = blabla; Form_pg_class class_form = RelationGetForm(rel); appendStringInfoString(s, quote_qualified_identifier( get_namespace_name( get_rel_namespace(RelationGetRelid(rel))), NameStr(class_form->relname)));

Now let's see how this works, with for example the following SQL sequence:

=# -- Create slot =# SELECT slotname FROM pg_create_logical_replication_slot('custom_slot', 'decoder_raw'); slotname ------------- custom_slot (1 row) =# -- A table using DEFAULT as REPLICA IDENTITY and some operations =# CREATE TABLE aa (a int primary key, b text); CREATE TABLE =# INSERT INTO aa VALUES (1, 'aa'), (2, 'bb'); INSERT 0 2 =# UPDATE aa SET b = 'cc' WHERE a = 1; UPDATE 1 =# DELETE FROM aa WHERE a = 1; DELETE 1

And the following output is generated by the plugin.

=# SELECT data FROM pg_logical_slot_peek_changes('custom_slot', NULL, NULL, 'include-transaction', 'on'); data ---------------------------------------------------- BEGIN; COMMIT; BEGIN; INSERT INTO public.aa (a, b) VALUES (1, 'aa'); INSERT INTO public.aa (a, b) VALUES (2, 'bb'); COMMIT; BEGIN; UPDATE public.aa SET a = 1, b = 'cc' WHERE a = 1 ; COMMIT; BEGIN; DELETE FROM public.aa WHERE a = 1 ; COMMIT; (12 rows)

Pretty handy, no? Note that the rows of UPDATE and DELETE queries are identified using the primary key of relation.

Consuming such changes shapped like that is straight-forward: simply use them on another PostgreSQL node that has schema and data consistent with the node decoding the changes before replication slot was marked as active. Here is for example a command using pg_logical_slot_get_changes running periodically that is able to replicate all the changes on an other node listening to port 5433:

psql -At -c "SELECT pg_logical_slot_get_changes('custom_slot', NULL, NULL)" | \ psql -p 5433

This way, the second node replicates all the changes occuring on first node doing the decoding effort. The code of decoder_raw has been released under the PostgreSQL license and is available here. Feel free to use it, feedback is welcome as well.

Joel Jacobson: Garbage Collection of Unused PostgreSQL Tables and Columns

Planet PostgreSQL - 30 April, 2014 - 12:11

Over the last five years, our database at Trustly have kept growing in number of tables, columns and functions, where some of the tables and columns are being used by any database functions any longer. Getting rid of them is important, as otherwise people working with the database will be confused and annoyed. Database developers should always be able to rely on the data model being relevant and up to date.

In our system, no applications access the database tables directly, instead everything goes through stored procedures.

This means, if a table or column name is not present anywhere in any function’s source code, it’s very likely the table/column is not being used by anything. The only exception is if you have dynamically crafted queries executed using EXECUTE, where the table/column names are constructed from different parts. In our system, we thankfully only have a few such cases.

SELECT Tables.TableName FROM ( SELECT DISTINCT regexp_replace(pg_catalog.pg_class.relname,'s$','') FROM pg_catalog.pg_class INNER JOIN pg_catalog.pg_namespace ON pg_catalog.pg_namespace.oid = pg_catalog.pg_class.relnamespace WHERE pg_catalog.pg_class.relkind = 'r' AND pg_catalog.pg_namespace.nspname NOT IN ('information_schema','pg_catalog') ) Tables(TableName) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.pg_proc WHERE pg_catalog.pg_proc.prosrc ~* Tables.TableName ) ORDER BY Tables.TableName

This query returned quite a lot of table names with about half of them being false positives,
but still a managable list to go through manually.

50 minutes of manual work later:

92 files changed, 1114 deletions(-) DROP TABLE: 16 DROP VIEW: 6 DROP SEQUENCES: 7 DROP FK_CONSTRAINTS: 5 DROP CONSTRAINTS: 17 False positives: 14

Then I moved on to the task of finding unused table columns.
The query below excludes any false positives found in the previous query.

SELECT DISTINCT Columns.ColumnName FROM ( SELECT regexp_replace(pg_catalog.pg_attribute.attname,'id$','') FROM pg_catalog.pg_class INNER JOIN pg_catalog.pg_namespace ON pg_catalog.pg_namespace.oid = pg_catalog.pg_class.relnamespace INNER JOIN pg_catalog.pg_attribute ON pg_catalog.pg_attribute.attrelid = pg_catalog.pg_class.oid WHERE pg_catalog.pg_class.relkind = 'r' AND pg_catalog.pg_attribute.attnum > 0 AND NOT pg_catalog.pg_attribute.attisdropped AND pg_catalog.pg_namespace.nspname NOT IN ('information_schema','pg_catalog') -- Exclude columns in tables we know are unused by the stored procedures, but we want to keep around anyway: AND pg_catalog.pg_class.relname !~* '^(alexacountrie|bankersworldonline|bindgroup|clearinghousecurrencie|dailystat|geoiporganization|hourlystat|usercommitment|polishbank|polishbanknumber|swedishpostalcode|testbasedata|testperlmodule|useraccesslogarchive)' ) Columns(ColumnName) WHERE NOT EXISTS ( SELECT 1 FROM pg_catalog.pg_proc WHERE pg_catalog.pg_proc.prosrc ~* Columns.ColumnName ) ORDER BY Columns.ColumnName

It took me two hours to go through all code. It was an interesting journey in time with lots of memories.

50 files changed, 5396 insertions(+), 5917 deletions(-) ALTER TABLE DROP COLUMN: 30 False positives: 87

The reason why there were so many insertions and deletions, was because the dropped columns affected some of the base tables with reference data, which had to be regenerated, thus affecting all lines in those files.

In summary, the ROI on those three hours of time invested is enormous. Developers can now feel confident all tables and columns fulfill a purpose in the system. This exercise will of course need to be repeated in the future though.

Jeff Frost: How to start PostgreSQL the Debian way with pg_ctl

Planet PostgreSQL - 30 April, 2014 - 04:04
Sometimes I find myself wanting to start a PostgreSQL cluster with pg_ctl on a Debian or Ubuntu system.  If you've ever tried this, you know that something like this doesn't work:

vagrant@vagrant-ubuntu-trusty-64:/vagrant$ sudo -u postgres /usr/lib/postgresql/9.3/bin/pg_ctl -D /var/lib/postgresql/9.3/main/ start
server starting
vagrant@vagrant-ubuntu-trusty-64:/vagrant$ postgres cannot access the server configuration file "/var/lib/postgresql/9.3/main/postgresql.conf": No such file or directory

because the Debian packages place the config files in /etc/postgresql/<version>/<cluster name>

So, I used to think that this was the way to start it:

vagrant@vagrant-ubuntu-trusty-64:/vagrant$ sudo -u postgres /usr/lib/postgresql/9.3/bin/pg_ctl -D /etc/postgresql/9.3/main/ start
server starting
vagrant@vagrant-ubuntu-trusty-64:/vagrant$ 2014-04-30 00:59:32 UTC LOG:  database system was shut down at 2014-04-30 00:57:49 UTC
2014-04-30 00:59:32 UTC LOG:  database system is ready to accept connections
2014-04-30 00:59:32 UTC LOG:  autovacuum launcher started

And that does generally work, but the proper way to do it is this:

vagrant@vagrant-ubuntu-trusty-64:/vagrant$ sudo -u postgres /usr/lib/postgresql/9.3/bin/pg_ctl -D /var/lib/postgresql/9.3/main/ -o "-c config_file=/etc/postgresql/9.3/main/postgresql.conf" start
server starting
vagrant@vagrant-ubuntu-trusty-64:/vagrant$ 2014-04-30 01:00:22 UTC LOG:  database system was shut down at 2014-04-30 01:00:16 UTC
2014-04-30 01:00:22 UTC LOG:  database system is ready to accept connections
2014-04-30 01:00:22 UTC LOG:  autovacuum launcher started

The real Debian way is to use pg_ctlcluster like so:

vagrant@vagrant-ubuntu-trusty-64:~$ sudo -u postgres pg_ctlcluster 9.3 main start

Devrim GÜNDÜZ: Support for Oracle Enterprise Linux 6 is now available

Planet PostgreSQL - 29 April, 2014 - 15:05
Recently, there is some demand in supporting Oracle Enterprise Linux in the PostgreSQL RPM repository.

Finally I found time to create repository RPMs for Oracle Linux. We support all active PostgreSQL releases (9.3 to 8.4).

Announcement page is here.

Raghavendra Rao: "WARNING: Mismatch found between sl_table and pg_class." in Slony-I

Planet PostgreSQL - 29 April, 2014 - 05:41
WARNING: Mismatch found between sl_table and pg_class. Slonik command REPAIR CONFIG may be useful to rectify this.
2014-04-26 07:32:54 PDT FATAL slon_node_health_check() returned false - fatal health problem!
REPAIR CONFIG may be helpful to rectify this problemYou see this WARNING message in logs and immediate stop of replication, if Slony has observed a mismatch of pg_class.oid and sl_table.tabreloid of a replicating table in a node. Because, by architecture slony holds all replicating objects OID information in its catalogs captured at configure time from pg_class.oid.

In which case pg_class.oid != sl_table.tabreloid ?

Most cases, a node moved its place using pg_dump/pg_restore by causing objects OID to change.

To mimic the above WARNING message, I have used two node replication setup between two database on same cluster[5432] for few tables. (Refer here on how to setup Slony replication). Here's the current OID information on slave node(demo database) for one of the object 'dtest':
demo=# select oid,relfilenode,relname from pg_class where relname='dtest';
oid | relfilenode | relname
26119 | 26119 | detest
(1 row)
demo=# select tab_id,tab_reloid,tab_relname from _rf.sl_table where tab_relname='dtest';
tab_id | tab_reloid | tab_relname
2 | 26119 | dtest
(1 row)
Ok, 'dtest' OID 26119 stored in slony catalog in sl_table.tabreloid.(Slony schema _rf). Take the logical backup and restore of same demo database simply to change the object OID like below: (Remember, Slon process are stopped at this moment)
-bash-4.1$ pg_dump -Fc -p 5432 -U postgres demo >/tmp/demo93.dmp
-bash-4.1$ psql -c "alter database demo rename to demo_bk;"
-bash-4.1$ psql -c "create database demo;"
-bash-4.1$ pg_restore -Fc -p 5432 -U postgres -d demo /tmp/demo93.dmp
-bash-4.1$ psql -c "select oid,relfilenode,relname from pg_class where relname='dtest';"
oid | relfilenode | relname
26640 | 26640 | dtest
(1 row)
-bash-4.1$ psql -c "select tab_id,tab_reloid,tab_relname from _rf.sl_table where tab_relname='dtest';"
tab_id | tab_reloid | tab_relname
2 | 26119 | dtest
(1 row)
Now, pg_class.oid of 'dtest' has changed to 26640 whereas sl_table.tab_reloid still reflects the old OID 26119. At this stage if we start slon process it essentially stops with WARNING message on mismatch of OID by running a query pg_class.oid = sl_table.tabreloid. On returning false result it won't move ahead until its fixed. We can also call the function slon_node_health_check() explicitly for verification :
demo=# select _rf.slon_node_health_check();
WARNING: table [id,nsp,name]=[1,a,public] - sl_table does not match pg_class/pg_namespace
WARNING: table [id,nsp,name]=[2,dtest,public] - sl_table does not match pg_class/pg_namespace
WARNING: table [id,nsp,name]=[3,movepage,public] - sl_table does not match pg_class/pg_namespace
WARNING: Mismatch found between sl_table and pg_class. Slonik command REPAIR CONFIG may be useful to rectify this.
(1 row)
We can fix this in two ways.

  1. Using Slonik command line utility with preamble script REPAIR CONFIG or 
  2. Using Slony catalog function updatereloid() within psql terminal.

Method 1: Create preamble script as below and execute with slonik command. I would be using second method, its just for reference.
demo=# \o /tmp/repair_conf.slonik
demo=# select 'REPAIR CONFIG ( SET ID = '||set_id||', EVENT NODE = 1 );' FROM _rf.sl_set;
demo=# \o

Add nodes information at the beginning of the file "/tmp/repair_conf.slonik"

cluster name = rf;
node 1 admin conninfo = 'host=localhost dbname=postgres user=postgres port=5432 password=postgres';
node 2 admin conninfo = 'host=localhost dbname=demo user=postgres port=5432 password=postgres';


-bash-4.1$ slonik /tmp/repair_conf.slonik
Method 2: Pass the table-set id and node information to a function:
demo=# select _rf.updatereloid(tab_set,2) from _rf.sl_table ;
(3 rows)
Cool, lets check the OID information now on slave node (demo database) from pg_class and _slonycatalog.sl_table
-bash-4.1$ psql -d demo -c "select oid,relfilenode,relname from pg_class where relname='dtest';"
oid | relfilenode | relname
26119 | 26119 | dtest
(1 row)

-bash-4.1$ psql -d demo -c "select tab_id,tab_reloid,tab_relname from _rf.sl_table where tab_relname='dtest';"
tab_id | tab_reloid | tab_relname
2 | 26119 | dtest
(1 row)
After the update, slony will begin syncing without any issues.
Thanks to Slony-I team.


Greg Sabino Mullane: Custom plans prepared statements in PostgreSQL 9.2

Planet PostgreSQL - 28 April, 2014 - 13:00

Image by Flickr user Brett Neilson

Someone was having an issue on the #postgresql channel with a query running very fast in psql, but very slow when using DBD::Pg. The reason for this, of course, is that DBD::Pg (and most other clients) uses prepared statements in the background. Because Postgres cannot know in advance what parameters a statement will be called with, it needs to devise the most generic plan possible that will be usable with all potential parameters. This is the primary reason DBD::Pg has the variable pg_server_prepare. By setting that to 0, you can tell DBD::Pg to avoid using prepared statements and thus not incur the "generic plan" penalty. However, that trick will not be needed much longer for most people: version 9.2 of Postgres added a great feature. From the release notes:

Allow the planner to generate custom plans for specific parameter values even when using prepared statements.

Because the original IRC question involved a LIKE clause, let's use one in our example as well. The system table pg_class makes a nice sample table: it's available everywhere, and it has a text field that has a basic B-tree index. Before we jump into the prepared statements, let's see the three cases of LIKE queries we want to try out: no wildcards, a trailing wildcard, and a leading wildcard. The relname column of the pg_class table is used in the index pg_class_relname_nsp_index. Yes, there is a touch of Hungarian notation in those system catalogs! (Technically relname is type "name", not type "text", but they are identical as far as this example goes).

The first case is a LIKE with no wildcards. When Postgres sees this, it converts it to a simple equality clause, as if the LIKE was an equal sign. Thus, it is able to quite easily use the B-tree index:

test# EXPLAIN SELECT 1 FROM pg_class WHERE relname LIKE 'foobar' QUERY PLAN -------------------------------------------------------------- Index Only Scan using pg_class_relname_nsp_index on pg_class Index Cond: (relname = 'foobar'::name) Filter: (relname ~~ 'foobar'::text)

Now consider the case in which we only know the first part of the word, so we put a wildcard on the end:

test# EXPLAIN SELECT 1 FROM pg_class WHERE relname LIKE 'foo%' QUERY PLAN ---------------------------------------------------------------------- Index Only Scan using pg_class_relname_nsp_index on pg_class Index Cond: ((relname >= 'foo'::name) AND (relname < 'fop'::name)) Filter: (relname ~~ 'foo%'::text)

As we know how the string starts, there is no problem in using the index. Notice how Postgres is smart enough to change the foo% into a range check for anything between foo and fop!

Finally, the most interesting one: the case where we only know the end of the relname, so the wildcard goes in the front:

test# EXPLAIN SELECT 1 FROM pg_class WHERE relname LIKE '%bar' QUERY PLAN ------------------------------------- Seq Scan on pg_class Filter: (relname ~~ '%bar'::text)

In this case, Postgres falls back to a sequential scan of the main table, and does not use the index at all, for it offers no gain. The B-tree is useless, and the entire table must be walked through (this can be worked around by clever use of a reverse clause)

So those are the three potential variations of LIKE. When a prepared statement is created, the argument is unknown and left as a placeholder. In other words, Postgres does not know in advance if we are going to search for 'foobar', 'foo%', '%bar', or something else. Watch what happens when we create a basic prepared statement based on the queries above:

test# PREPARE zz(TEXT) AS SELECT 1 FROM pg_class WHERE relname LIKE $1 PREPARE

The $1 is the parameter that will be passed to this statement when it is executed. Because Postgres has no way of knowing what will be passed in, it must create a plan that can work with all possible inputs. This means using a sequential scan, for as we've seen above, a wildcard at the start of the input requires ones. All the examples using indexes can safely fall back to a sequential scan as well. We can use EXPLAIN EXECUTE to see the plan in action:

test# EXPLAIN EXECUTE zz('%bar'); QUERY PLAN --------------------------- Seq Scan on pg_class Filter: (relname ~~ $1)

As expected, this plan is the only one available for the query given, as the index cannot be used with a leading wildcard. Now for the fun part. Let's put the wildcard on the end, and see what happens on Postgres version 9,1:

test# SELECT substring(version() from '(.+?) on'); PostgreSQL 9.1.13 # EXPLAIN EXECUTE zz('foo%'); QUERY PLAN --------------------------- Seq Scan on pg_class Filter: (relname ~~ $1)

That's really not a good plan! It gets worse:

# EXPLAIN EXECUTE zz('foobar'); QUERY PLAN --------------------------- Seq Scan on pg_class Filter: (relname ~~ $1)

Before version 9.2, the prepared statement's plan was locked in place. This was the cause of many woes, and the reason why programs and functions were "slow" but the same queries were fast on the command line. Enter Tom Lane's commit from September 2011:

Redesign the plancache mechanism for more flexibility and efficiency. Rewrite plancache.c so that a "cached plan" (which is rather a misnomer at this point) can support generation of custom, parameter-value-dependent plans, and can make an intelligent choice between using custom plans and the traditional generic-plan approach. The specific choice algorithm implemented here can probably be improved in future, but this commit is all about getting the mechanism in place, not the policy.

Yes, you read that correctly - new plans can be generated to match the parameters! (In case you were wondering, things have been improved since this commit, as hoped for in the last sentence.) Let's see what happens when we run the exact same prepared statements above, but on Postgres version 9.3:

# SELECT substring(version() from '(.+?) on'); PostgreSQL 9.3.4 test# EXPLAIN EXECUTE zz('%bar'); QUERY PLAN ------------------------------------- Seq Scan on pg_class Filter: (relname ~~ '%bar'::text) test# EXPLAIN EXECUTE zz('foo%'); QUERY PLAN ---------------------------------------------------------------------- Index Only Scan using pg_class_relname_nsp_index on pg_class Index Cond: ((relname >= 'foo'::name) AND (relname < 'fop'::name)) Filter: (relname ~~ 'foo%'::text) test# EXPLAIN EXECUTE zz('foobar'); QUERY PLAN -------------------------------------------------------------- Index Only Scan using pg_class_relname_nsp_index on pg_class Index Cond: (relname = 'foobar'::name) Filter: (relname ~~ 'foobar'::text)

Tada! We have three different plans for the same prepared statement, If you look close, you will see that even the first plan is now a "custom" one, as it has the exact parameter string rather than just $1 as before. The moral of the story: don't settle for anything less than version 9.2 of Postgres!

Josh Berkus: Second Unconference Day at PgCon

Planet PostgreSQL - 28 April, 2014 - 02:02
We will be having our second annual PostgreSQL Unconference on the Saturday after pgCon.  For those of you who weren't there last year, the wiki explains what you do on an unconference day.  We are able to hold this again thanks to the generous sponsorship of Salesforce.com, who continue to support PostgreSQL development.

While the Unconference will be mostly organized at 10am that day, you can get a jump on other session proposers by adding your session proposals to the wiki.  These session ideas will be allowed to first during the pitch session.  I also urge folks who will be attending the Unconference to put your names by session ideas you'd like to attend.

I also need the help of two volunteers to help me with organizing the Unconference on the day of.  This will not prevent you from attending and/or leading sessions; I just need some folks to help me put up signs, organize the schedule, and make sure that notes get taken in each session.  Contact me if you are available to help with this.

Unfortunately, if you weren't already planning to attend the unconference, it's probably too late to change your travel plans; Saturday night rooms in Ottawa are pretty much unavailable due to the marathon.

P.S. if you're into replication and clustering, don't miss the 2014 Clustering Summit and the PostgresXC Pizza Demo on Tuesday of pgCon week!  (the Pizza demo will be at 7pm, location TBD).

Marko Tiikkaja: SQL gotcha of the week

Planet PostgreSQL - 26 April, 2014 - 15:44
SELECT 'foo' IN ('foo' 'bar'); ?column? ---------- f (1 row) Be careful out there.

Oleg Bartunov: New GIN opclass for hstore (Faster and smaller) !

Planet PostgreSQL - 25 April, 2014 - 16:26
We feel a bit uncomfortable after cancellation of nested hstore project for jsonb sake,so we decide to release new hash-based GIN-opclass (from jsonb) for hstore, which greatly improves the performance of @> operator (we expect an order of magnitude). We expect it should works for 9.2+ postgres, but you can test it for older releases too. Patches are welcome. Please, download it from https://github.com/akorotkov/hstore_ops and sent us feedback, so we could officially release it. This is a separate extension, so you need contrib/hstore already installed.
Here is a short example:

create extension hstore;
select hstore(t) as h into proc from pg_proc t;
create index hstore_gin_idx on proc using gin(h);
=# explain analyze select count(*) from proc where h @> 'proargtypes => "2275"';
Aggregate (cost=30.97..30.98 rows=1 width=0) (actual time=0.201..0.201 rows=1 loops=1)
-> Bitmap Heap Scan on proc (cost=20.02..30.96 rows=3 width=0) (actual time=0.094..0.196 rows=76 loops=1)
Recheck Cond: (h @> '"proargtypes"=>"2275"'::hstore)
Rows Removed by Index Recheck: 93
Heap Blocks: exact=50
-> Bitmap Index Scan on hstore_gin_idx (cost=0.00..20.02 rows=3 width=0) (actual time=0.082..0.082 rows=169 loops=1)
Index Cond: (h @> '"proargtypes"=>"2275"'::hstore)
Planning time: 0.067 ms
Execution time: 0.222 ms
(9 rows)

Now, we install hstore_ops

create extension hstore_ops;
create index hstore_gin_hash_idx on proc using gin(h gin_hstore_hash_ops);
=# explain analyze select count(*) from proc where h @> 'proargtypes => "2275"';
Aggregate (cost=18.97..18.98 rows=1 width=0) (actual time=0.008..0.008 rows=1 loops=1)
-> Bitmap Heap Scan on proc (cost=8.02..18.96 rows=3 width=0) (actual time=0.006..0.006 rows=0 loops=1)
Recheck Cond: (h @> '"proargtypes"=>"2275"'::hstore)
Heap Blocks:
-> Bitmap Index Scan on hstore_gin_hash_idx (cost=0.00..8.02 rows=3 width=0) (actual time=0.005..0.005 rows=0 loops=1)
Index Cond: (h @> '"proargtypes"=>"2275"'::hstore)
Planning time: 0.067 ms
Execution time: 0.032 ms
(8 rows)

One can see considerable improvement from 0.222ms to 0.032 ms, but the table proc has only 2744 rows and we need more results from hstore users.

Also, we added support of ?, ?|, ?& operators to gin_hstore_hash_ops (jsonb lacks them), but we didn't tested it and, again, we need your help.

Here is size comparison:

=# \dt+ proc
List of relations
Schema | Name | Type | Owner | Size | Description
public | proc | table | postgres | 1616 kB |
=# \di+
List of relations
Schema | Name | Type | Owner | Table | Size | Description
public | hstore_gin_hash_idx | index | postgres | proc | 232 kB |
public | hstore_gin_idx | index | postgres | proc | 616 kB |
(2 rows)

Notice, that new opclass is not just faster, but also several times less !

Leo Hsu and Regina Obe: Using HStore for Archiving

Planet PostgreSQL - 25 April, 2014 - 07:24

I'm not a big proponent of schemaless designs, but they have their place. One particular place where I think they are useful is for archiving of data where even though the underlying table structure of the data you need to archive is changing, you want the archived record to have the same fields as it did back then. This is a case where I think Hstore and the way PostgreSQL has it implemented works pretty nicely.

Side note: one of the new features of PostgreSQL 9.4 is improved GIN indexes (faster and smaller) which is very often used with hstore data (and the new jsonb type). We're really looking forward to the GIN improvements more so than the jsonb feature. We're hoping to test out this improved index functionality with OpenStreetMap data soon and compare with our existing PostgreSQL 9.3. OpenStreetMap pbf and osm extract loaders (osm2pgsql, imposm) provide option for loading tagged data into PostgreSQL hstore fields, in addition to PostGIS geometry and other attribute fields. So 9.4 enhancements should be a nice gift for OSM data users. More on that later.

Continue reading "Using HStore for Archiving"

gabrielle roth: PDXPUG Lab – Streaming Rep Saturday Part 3 happens in two weeks.

Planet PostgreSQL - 25 April, 2014 - 02:08

When: Thu May 8, 6pm-9pm (dinner provided)
Where: Emma, B-side 6 Building, 5th floor

Yeah, we’re not quite finished yet. But I expect that this will be the last one & we’ll move on to something else next time.

The general plan:
- review of setup
- try this: https://github.com/darkixion/pg_rep_test
- look into the monitoring
- alternatives to the ‘cp’ archiving command example
- failover & recovery

As before, you are welcome to come to this one, even if you missed the first two.

Please come prepared with a laptop with two Postgres clusters of the same version installed. You can do this by having a couple of VMs, linux containers, or just run two instances of Pg on different ports. If you don’t understand how to do that, or you don’t have a laptop of your own, please let me know when you sign up, and we’ll take care of that for you.

Space is limited, so please sign up in advance.

Dinner provided by PDXPUG.
Space provided by Emma.

Michael Paquier: Postgres 9.4 feature highlight: REPLICA IDENTITY and logical replication

Planet PostgreSQL - 24 April, 2014 - 16:53

Among the many things to say about logical replication features added in PostgreSQL 9.4, REPLICA IDENTITY is a new table-level parameter that can be used to control the information written to WAL to identify tuple data that is being deleted or updated (an update being a succession of an insert and a delete in MVCC).

This parameter has 4 modes:

  • USING INDEX index
  • FULL

First let's set up an environment using some of the instructions in a previous post dealing with some basics of logical decoding to set up a server using test_decoding in a replication slot.

=# SELECT * FROM pg_create_logical_replication_slot('my_slot', 'test_decoding'); slotname | xlog_position ----------+--------------- my_slot | 0/16CB0F8 (1 row)

The replication slot used here will be used in combination with pg_logical_slot_get_changes to consume each change of the slot (to compare with pg_logical_slot_peek_changes that can be used to view the changes but not consume them).

In the case of DEFAULT, old tuple data is only identified with the primary key of the table. This data is written into WAL only when at least one column of the primary key is updated. Columns that are not part of the primary key do not have their old value written.

=# CREATE TABLE aa (a int, b int, c int, PRIMARY KEY (a, b)); CREATE TABLE =# INSERT INTO aa VALUES (1,1,1); INSERT 0 1 =# [ ... Clean up of slot information up to now ... ] =# UPDATE aa SET c = 3 WHERE (a, b) = (1, 1); UPDATE 1 =# SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL); location | xid | data -----------+------+----------------------------------------------------------------- 0/1728D50 | 1013 | BEGIN 1013 0/1728D50 | 1013 | table public.aa: UPDATE: a[integer]:1 b[integer]:1 c[integer]:3 0/1728E70 | 1013 | COMMIT 1013 (3 rows) =# UPDATE aa SET a = 2 WHERE (a, b) = (1, 1); UPDATE 1 =# SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL); location | xid | data -----------+------+--------------------------------------------------------------------------------------------------------------- 0/1728EA8 | 1014 | BEGIN 1014 0/1728EA8 | 1014 | table public.aa: UPDATE: old-key: a[integer]:1 b[integer]:1 new-tuple: a[integer]:2 b[integer]:1 c[integer]:3 0/1728FF0 | 1014 | COMMIT 1014 (3 rows)

Ît is important to know that REPLICA IDENTITY can only be changed using ALTER TABLE, and that the parameter value is only viewable with '\d+' only if default behavior is not used. Also, after creating a table, REPLICA IDENTITY is set to DEFAULT (Surprise!).

=# \d+ aa Table "public.aa" Column | Type | Modifiers | Storage | Stats target | Description --------+---------+-----------+---------+--------------+------------- a | integer | not null | plain | | b | integer | not null | plain | | c | integer | | plain | | Indexes: "aa_pkey" PRIMARY KEY, btree (a, b) =# ALTER TABLE aa REPLICA IDENTITY FULL; ALTER TABLE =# \d+ aa Table "public.aa" Column | Type | Modifiers | Storage | Stats target | Description --------+---------+-----------+---------+--------------+------------- a | integer | not null | plain | | b | integer | not null | plain | | c | integer | | plain | | Indexes: "aa_pkey" PRIMARY KEY, btree (a, b) Replica Identity: FULL =# [ ... Replication slot changes are consumed here ... ]

In the case of FULL, all the column values are written to WAL all the time. This is the most verbose, and as well the most resource-consuming mode. Be careful here particularly for heavily-updated tables.

=# UPDATE aa SET c = 4 WHERE (a, b) = (2, 1); UPDATE 1 =# SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL); location | xid | data -----------+------+---------------------------------------------------------------------------------------------------------------------------- 0/172EC70 | 1016 | BEGIN 1016 0/172EC70 | 1016 | table public.aa: UPDATE: old-key: a[integer]:2 b[integer]:1 c[integer]:3 new-tuple: a[integer]:2 b[integer]:1 c[integer]:4 0/172EE00 | 1016 | COMMIT 1016

On the contrary, NOTHING prints... Nothing. (Note: operation done after an appropriate ALTER TABLE and after consuming replication slot information).

=# UPDATE aa SET c = 4 WHERE (a, b) = (2, 1); UPDATE 1 =# SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL); location | xid | data -----------+------+----------------------------------------------------------------- 0/1730F58 | 1018 | BEGIN 1018 0/1730F58 | 1018 | table public.aa: UPDATE: a[integer]:2 b[integer]:1 c[integer]:4 0/1731100 | 1018 | COMMIT 1018

Finally, there is USING INDEX, which writes to WAL the values of the index defined with this option. The index needs to be unique, cannot contain expressions and must contain NOT NULL columns.

=# ALTER TABLE aa ALTER COLUMN c SET NOT NULL; ALTER TABLE =# CREATE unique INDEX aai on aa(c); CREATE INDEX =# ALTER TABLE aa REPLICA IDENTITY USING INDEX aai; ALTER TABLE =# [ ... Consuming all information from slot ... ] =# UPDATE aa SET c = 5 WHERE (a, b) = (2, 1); UPDATE 1 =# SELECT * FROM pg_logical_slot_get_changes('my_slot', NULL, NULL); location | xid | data -----------+------+-------------------------------------------------------------------------------------------------- 0/1749A68 | 1029 | BEGIN 1029 0/1749A68 | 1029 | table public.aa: UPDATE: old-key: c[integer]:4 new-tuple: a[integer]:2 b[integer]:1 c[integer]:5 0/1749D40 | 1029 | COMMIT 1029 (3 rows)

Note that in this case the primary key information is not decoded, only the NOT NULL column c that the index covers.

REPLICA IDENTITY should be chosen carefully for each table of a given application, knowing that for example FULL generates an extra amount of WAL that may not be necessary, NOTHING may forget about essential information. In most of the cases, DEFAULT provides a good cover though.

Hans-Juergen Schoenig: The power of response times

Planet PostgreSQL - 24 April, 2014 - 11:41
The goal of the PostgreSQL optimizer is to provide you with a plan, which executes as fast as possible and returns all the data as rapidly as possible. In short: The game is about overall execution time of queries. However, there are cases in which you are not interested in receiving the entire result as […]

Dinesh Kumar: N-Node Mutlimaster Replication With Bucardo...!

Planet PostgreSQL - 24 April, 2014 - 11:12
Our team recently got  a problem, which is to solve the N-Node multi master replication in PostgreSQL.

We all know that, there are some other db engines like Postgres-XC which works in this way. But, we don't have any tool available in PostgreSQL, except Bucardo.

Bucardo is the nice solution for 2-Nodes. Is there a way we can exceed this limitation from 2 to N..?

As an initial step on this, I have done with 3 Nodes, which I believe, we can extend this upto N. { I might be wrong here.}

Please follow the below steps to set up the 1 - 1 multi master replication.

1. Follow the below steps to get all the pre-requisites for the Bucardo.

yum install perl-DBIx-Safe
apt-get install libdbix-safe-perl

Install the below components from CPAN.

boolean (Bucardo 5.0 and higher)

Download the latest tarball from here.

tar xvfz Bucardo-4.4.8.tar.gz
cd Bucardo-4.4.8
perl Makefile.PL
sudo make install
2. We need to create plperl extension in db. For this, download the required active perl to set up or simply do as below.
apt-get install postgresql-plperl-9.3
yum install postgresql-plperl-9.3

Make a copy of Old $PGLIBPATH/plperl.so
Move the new plperl.so to $PGLIBPATH/plperl.so
3. plperl extension.
Create extension plperl;
4. Create 3 databases like "node1", "node2", "node3".
5. Execute below statements on 2 databases (node1, node2).
INSERT INTO test1 VALUES(generate_series(1, 10));
6. Install Bucardo catalog database using the below command.
postgres@dinesh-VirtualBox:~/Bucardo-4.5.0$ ./bucardo_ctl install
Postgres version is: 9.3
Attempting to create and populate the bucardo database and schema
Database creation is complete
Connecting to database 'bucardo' as user 'bucardo'
Updated configuration setting "piddir"
Installation is now complete.
7. Adding databases, those will be part of mutli-master replication.
postgres@dinesh-VirtualBox:~/Bucardo-4.5.0$ ./bucardo_ctl add db db1 dbname="node1"
Added database "db1"
postgres@dinesh-VirtualBox:~/Bucardo-4.5.0$ ./bucardo_ctl add db db2 dbname="node2"
Added database "db2"
8. Creating a herd of db1 tables.
postgres@dinesh-VirtualBox:~/Bucardo-4.5.0$ ./bucardo_ctl add all tables db=db1 herd=db1_herd
Creating herd: db1_herd
New tables added: 1
Already added: 0
9. Creating a sync of this herd from db1->db2.
postgres@dinesh-VirtualBox:~/Bucardo-4.5.0$ ./bucardo_ctl add sync db1_to_db2 type=pushdelta source=db1_herd targetdb=db2
Added sync "db1_to_db2"
10. Creating a herd of db2 tables.
postgres@dinesh-VirtualBox:~/Bucardo-4.5.0$ ./bucardo_ctl add all tables db=db2 herd=db2_herd
Creating herd: db2_herd
New tables added: 1
Already added: 0
11. Creating a sync of this herd from db2->db1.
postgres@dinesh-VirtualBox:~/Bucardo-4.5.0$ ./bucardo_ctl add sync db2_to_db1 type=pushdelta source=db2_herd targetdb=db1
Added sync "db2_to_db2"
12. Start the Bucardo process. Default location for the Bucardo instance is SYSLOG.
postgres@dinesh-VirtualBox:~/Bucardo-4.5.0$ ./bucardo_ctl start
Checking for existing processes
Removing /tmp/fullstopbucardo
Starting Bucardo

postgres@dinesh-VirtualBox:~/Bucardo-4.5.0$ ps -ef|grep bucardo
postgres 2740 1128 0 21:26 ? 00:00:00 postgres: bucardo bucardo [local] idle
postgres 2743 1128 0 21:26 ? 00:00:00 postgres: bucardo node1 [local] idle
13. Check the DML activities between these two nodes.
node1=# INSERT INTO test1 VALUES(-1);
node1=# \c node2
node2=# SELECT * FROM test1 WHERE t=-1;
(1 row)

node2=# INSERT INTO test1 VALUES(-2)
node2=# \c node1
node1=# SELECT * FROM test1 WHERE t=-2;
(1 row)
14. Check the status from the below command.
postgres@dinesh-VirtualBox:~/Bucardo-4.5.0$ ./bucardo_ctl status
Days back: 3  User: bucardo  Database: bucardo  PID of Bucardo MCP: 2739
Name       Type  State PID  Last_good Time  I/U/D Last_bad Time
db1_to_db2| P   |idle |2746|7m49s    |0s   |1/0/0|unknown |   
db2_to_db1| P   |idle |2745|6m11s    |0s   |1/0/0|unknown |   
It seems everything is perfect, and data is replicating from node1 to node2 and vice-versa. Now, let's try to add a new node, "node3" as part of this mutli-master replication. 15. Add new replication database to Bucardo.
postgres@dinesh-VirtualBox:~/Bucardo-4.5.0$ ./bucardo_ctl add db db3 dbname="node3"
Added database "db3"
16. Dump and restore of test1 table from either node1 or node2. 17. Create a herd of db3 tables
postgres@dinesh-VirtualBox:~/Bucardo-4.5.0$ ./bucardo_ctl add all tables db=db3 herd=db3_herd
Creating herd: db3_herd
New tables added: 1
Already added: 0
18. Create a new sync from db2->db3, db3->db2.
postgres@dinesh-VirtualBox:~/Bucardo-4.5.0$ ./bucardo_ctl add sync db2_to_db3 type=pushdelta source=db2_herd targetdb=db3
Added sync "db2_to_db3"
postgres@dinesh-VirtualBox:~/Bucardo-4.5.0$ ./bucardo_ctl add sync db3_to_db2 type=pushdelta source=db3_herd targetdb=db2
Added sync "db3_to_db2"
19. Restart the Bucardo to check for the latest node status.
postgres@dinesh-VirtualBox:~/Bucardo-4.5.0$ ./bucardo_ctl stop
Creating /tmp/fullstopbucardo ... Done

postgres@dinesh-VirtualBox:~/Bucardo-4.5.0$ ./bucardo_ctl start
Checking for existing processes
Removing /tmp/fullstopbucardo
Starting Bucardo

postgres@dinesh-VirtualBox:~/Bucardo-4.5.0$ ./bucardo_ctl status
Days back: 3  User: bucardo  Database: bucardo  PID of Bucardo MCP: 2553
Name       Type  State PID  Last_good Time  I/U/D Last_bad Time
db1_to_db2| P   |idle |2563|1s       |0s   |0/0/0|unknown |   
db2_to_db1| P   |idle |2560|1s       |0s   |0/0/0|unknown |   
db2_to_db3| P   |idle |2561|0s       |0s   |1/0/1|unknown |   
db3_to_db2| P   |idle |2562|0s       |0s   |0/0/0|unknown |   
20. Here is the small hack, which makes you to embed this db3 into the multi master replication. As Burcardo replication system works based on triggers, we need to make all the triggers what it created on nodes to "ENABLE ALWAYS" for each table.
node1=# \d test1
     Table "public.test1"
 Column |  Type   | Modifiers
 t      | integer | not null
    "test1_pkey" PRIMARY KEY, btree (t)
    bucardo_add_delta AFTER INSERT OR DELETE OR UPDATE ON test1 FOR EACH ROW EXECUTE PROCEDURE bucardo.bucardo_add_delta_t()
    bucardo_triggerkick_db1_to_db2 AFTER INSERT OR DELETE OR UPDATE OR TRUNCATE ON test1 FOR EACH STATEMENT EXECUTE PROCEDURE bucardo.bucardo_triggerkick_db1_to_db2()

node1=# ALTER TABLE test1 ENABLE ALWAYS TRIGGER bucardo_add_delta;
node1=# ALTER TABLE test1 ENABLE ALWAYS TRIGGER bucardo_triggerkick_db1_to_db2;

Do "ENABLE ALWAYS" these two triggers on this table. Similarly, do the same activity on all nodes. Testing Multi-Master replication.
node3=# INSERT INTO test1 VALUES(-1010);
node3=# \c node2
node2=# SELECT * FROM test1 WHERE t=-1010;
(1 row)

node2=# \c node1
node1=# SELECT * FROM test1 WHERE t=-1010;
(1 row)

node2=# INSERT INTO test1 VALUES(-2020); 
node2=# \c node3
node3=# SELECT * FROM test1 WHERE t=-2020;
(1 row)

node3=# \c node1
node1=# SELECT * FROM test1 WHERE t=-2020;
(1 row)Seems Bucardo is doing 3 node mulit master replication. Thanks to Bucardo Team. :)

--Dinesh Kumar
Syndicate content