Feed aggregator

Ian Barwick: A Foreign Data Wrapper (FDW) for Firebird

Planet PostgreSQL - 1 January, 2014 - 13:28

Well I've finally fulfilled one of my 2013 New Year's resolutions, which was to write some kind of foreign data wrapper, so with no further ado here is the initial implementation. It is basically working but is still very much in the proof-of-concept/unstable/may-eat-your-data stage. Having said that, if anyone has a practical use for this I'd be very interested to hear about it - my main motivation for creating this FDW is to learn more about PostgreSQL internals.

more...

gabrielle roth: Streaming Rep Saturday

Planet PostgreSQL - 30 December, 2013 - 23:00

We’re having a lab day!

When: Sat Jan 25, 10am-2pm (lunch provided)
Where: Emma, B-side 6 Building, 5th floor

Come experiment with streaming rep in a friendly environment with a delicious lunch and beverages, instead of in a crisis environment with your boss breathing down your neck.

Agenda, as much as we ever have one:
- Discussion of the various configurations (streaming rep, hot standby, etc)
- Actual setup
- Is it working? (a couple of quick checks are sketched just after this list)
- Discussion of results
- If time allows, we’ll come up with creative ways to break it and look at what’s required to recover.
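For the "Is it working?" item, here's a rough sketch of the kind of checks involved (assuming a 9.x-era setup; adjust names for your version):

-- On the primary: one row per connected standby
SELECT client_addr, state, sent_location, replay_location
FROM pg_stat_replication;

-- On the standby:
SELECT pg_is_in_recovery();             -- true while the standby is replaying WAL
SELECT pg_last_xlog_replay_location();  -- how far replay has progressed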

As with previous labs, we have limited space. Please sign up here.

Please come prepared with a laptop with two VMs, both with the latest version of Postgres installed. If you don’t have a laptop, we can set up VMs for you on the community equipment – please indicate that when you sign up, because we need some lead time to do that.

If you have any tips, tricks, questions, or horror stories, please bring those, too!

Lunch provided by EnterpriseDB.
Space provided by Emma.


Vasilis Ventirozos: pgreplay-ing logfiles

Planet PostgreSQL - 30 December, 2013 - 20:50
Recently I wanted to find something to rerun a workload (from logfiles) in Postgres. The reasons? Benchmarking, testing, troubleshooting.
My colleague Denish reminded me of a project called pgreplay. pgreplay reads a PostgreSQL logfile, extracts the SQL statements and executes them in the same order, with (or without) the original timing, against a PG database.

I downloaded it, compiled it, and went straight to testing.

The setup:

log_min_messages = error  (or more)
   (if you know that you have no cancel requests, 'log' will do)
log_min_error_statement = log  (or more)
log_connections = on
log_disconnections = on
log_line_prefix = '%m|%u|%d|%c|'  (if you don't use CSV logging)
log_statement = 'all'
lc_messages must be set to English (the encoding does not matter)
bytea_output = escape  (from version 9.0 on, only if you want to replay
                        the log on 8.4 or earlier)

It is highly recommended that you use CSV logging, because anything that
the PostgreSQL server or any loaded modules write to standard error will
be written to the stderr log and might confuse the parser.


For small and simple workloads it seemed to work fine. Below is the output from a simple workload:

Replay statistics
=================

Speed factor for replay: 1.000
Total run time: 29.700 seconds
Maximum lag behind schedule: 0 seconds
Calls to the server: 8
(0.269 calls per second)
Total number of connections: 1
Maximum number of concurrent connections: 1
Average number of concurrent connections: 1.000
Average session idle percentage: 99.680%
SQL statements executed: 6
(0 or 0.000% of these completed with error)
Maximum number of concurrent SQL statements: 1
Average number of concurrent SQL statements: 0.003
Average SQL statement duration: 0.016 seconds
Maximum SQL statement duration: 0.038 seconds
Statement duration histogram:
0 to 0.02 seconds: 83.333%
0.02 to 0.1 seconds: 16.667%
0.1 to 0.5 seconds: 0.000%
0.5 to 2 seconds: 0.000%
over 2 seconds: 0.000%


So far so good. I wanted to see what would happen when concurrency occurred, so I tried with pgbench. With 2 concurrent clients everything was OK, so I tried to scale it up a bit to 16 concurrent transactions, and then these nice messages started showing up:
"Execution is 30 minutes behind schedule"

The reason? Locks. Lots and lots of locks! But let's look closer at what happened...
While pgreplay was running, trying to handle my logfile the best way it could, a simple lock-investigating query was enough to show me that I would never get the successful prompt that I wanted:

monkey=#   SELECT bl.pid     AS blocked_pid,
monkey-#          a.usename  AS blocked_user,
monkey-#          kl.pid     AS blocking_pid,
monkey-#          ka.usename AS blocking_user,
monkey-#          a.query    AS blocked_statement
monkey-#    FROM  pg_catalog.pg_locks         bl
monkey-#     JOIN pg_catalog.pg_stat_activity a  ON a.pid = bl.pid
monkey-#     JOIN pg_catalog.pg_locks         kl ON kl.transactionid = bl.transactionid AND kl.pid != bl.pid
monkey-#     JOIN pg_catalog.pg_stat_activity ka ON ka.pid = kl.pid
monkey-#    WHERE NOT bl.granted;

 blocked_pid | blocked_user | blocking_pid | blocking_user |                          blocked_statement                          
-------------+--------------+--------------+---------------+----------------------------------------------------------------------
        7398 | vasilis      |         7403 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7398 | vasilis      |         7399 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7398 | vasilis      |         7404 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7398 | vasilis      |         7402 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7402 | vasilis      |         7398 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7402 | vasilis      |         7403 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7402 | vasilis      |         7399 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7402 | vasilis      |         7404 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7403 | vasilis      |         7398 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7403 | vasilis      |         7399 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7403 | vasilis      |         7404 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7403 | vasilis      |         7402 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7404 | vasilis      |         7398 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7404 | vasilis      |         7403 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7404 | vasilis      |         7399 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7404 | vasilis      |         7402 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7409 | vasilis      |         7404 | vasilis       | UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;

Deadlock. The reason?
(quoting the pgreplay documentation):

While pgreplay makes sure that commands are sent to the server in the order in which they were originally executed, there is no way to guarantee that they will be executed in the same order during replay: Network delay, processor contention and other factors may cause a later command to "overtake" an earlier one. While this does not matter if the commands don't affect each other, it can lead to SQL statements hitting locks unexpectedly, causing replay to deadlock and "hang". This is particularly likely if many different sessions change the same data repeatedly in short intervals.

And that was exactly my case: lots of processes were changing the same data.
For the record, I also tried it with a simple troubleshooting project of mine that runs on Postgres; with just 2 users doing their own thing, pgreplay worked fine.
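As a minimal illustration of the kind of interleaving that bites here (a constructed example, not taken from the actual replay), two sessions only need to update the same two rows in opposite order:

-- session 1
BEGIN;
UPDATE pgbench_branches SET bbalance = bbalance + 10 WHERE bid = 1;

-- session 2, replayed concurrently
BEGIN;
UPDATE pgbench_branches SET bbalance = bbalance + 10 WHERE bid = 2;

-- session 1: blocks, waiting for session 2's lock on bid = 2
UPDATE pgbench_branches SET bbalance = bbalance + 10 WHERE bid = 2;

-- session 2: now waits for session 1's lock on bid = 1 -> deadlock detected
UPDATE pgbench_branches SET bbalance = bbalance + 10 WHERE bid = 1;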

Conclusion:
OK, pgreplay didn't do the job for me; I'm pretty sure that if I got the logs from a production DB I would run into deadlocks. Still, messing with it a bit was worth it: I could use it for replaying simple workload logfiles against a different machine, or simply because it's a cool project that does what it does, and you never know when you might need something like this.



Hans-Juergen Schoenig: Fed up with long WHERE-clauses?

Planet PostgreSQL - 30 December, 2013 - 15:50
SQL is a very easy-to-use language and it is pretty easy to make things work out of the box. In fact, from my point of view simplicity is one of the core advantages of SQL in general. But what if you want to compare dozens of columns with each other? It is not […]
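The full article is not reproduced here, but one standard SQL feature in this direction is row-value comparison, which collapses many ANDed predicates into one (a generic sketch, not necessarily the approach the article takes):

-- instead of: WHERE a.c1 = b.c1 AND a.c2 = b.c2 AND a.c3 = b.c3 ...
SELECT *
FROM a
JOIN b ON (a.c1, a.c2, a.c3) = (b.c1, b.c2, b.c3);

-- row comparison also works with inequalities, e.g. for keyset pagination:
SELECT * FROM t WHERE (col1, col2) > (10, 20) ORDER BY col1, col2;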

Magnus Hagander: Nordic PGDay 2014 - come join us up north!

Planet PostgreSQL - 28 December, 2013 - 16:55

It's been discussed for several years, so I'm very happy to finally be able to announce that next year there will be a Nordic PGDay, in Stockholm, Sweden, on March 20th. It will follow a pattern familiar to those of you who have already visited other PostgreSQL events - a combination of great talks from PostgreSQL insiders, and a hallway track to meet and talk with other PostgreSQL people. As this is our first instance in the Nordics, it will be a single-track, single-day event, but we hope to grow it in the future.

Our main target audience is in the Nordics, and we hope to see a lot of visitors not just from Sweden but also from our neighboring countries. The official language of the conference will be English, so we of course also welcome visitors and speakers from the rest of Europe and the world!

We have not yet opened our call for papers, but plan to do so shortly. We have also not opened registration yet, as we do not know exactly what the entrance cost will be. This will depend on how many sponsors we can sign up for the event - which we have also not opened up yet (a pattern is emerging here). But the date and venue are set, so mark your calendars and start moving other events around so you can be sure to make it...

If your company is interested in sponsoring the event, please let us know at sponsors@nordicpgday.org, and we'll be sure to send you our sponsorship information as soon as it's available!

For the rest of you, keep up with our website or follow us on Twitter for updates!

See you in Stockholm in March next year!

Pavel Stehule: new PostgreSQL extension - plpgsql_check

Planet PostgreSQL - 27 December, 2013 - 21:54
Three years ago I worked on a complex project that was written almost entirely in PL/pgSQL. I wrote regression tests, but these tests were not able to find all errors due to missing coverage of some code paths. So I wrote a simple extension that ensured plan preparation for all embedded SQL at function start. This extension helped me find 99% of the bugs. I cleaned this extension up and published it as plpgsql_lint. As a plpgsql developer I am sure this functionality is missing in core, so I started on an in-core implementation. Two years later I am still at the start. I have not found a way to merge the requested functionality with the current code cleanly (so that committers will be happy) - probably we have to do some significant plpgsql refactoring first. And it is now clear that this code will not be in the upcoming PostgreSQL 9.4 release. Although the code is not good enough for committers, it can be useful for ordinary plpgsql developers.

I have started a new GitHub project - plpgsql_check. It is plpgsql_lint, next generation. plpgsql_lint checked functions at startup; plpgsql_check comes with a plpgsql_check_function function, so you can check any plpgsql function without executing it. It can be combined with an event trigger to ensure an extended check after a function is created.

examples:
postgres=# LOAD 'plpgsql';
LOAD
postgres=# CREATE EXTENSION plpgsql_check;
CREATE EXTENSION
postgres=# CREATE TABLE t1(a int, b int);
CREATE TABLE

postgres=#
CREATE OR REPLACE FUNCTION public.f1()
RETURNS void
LANGUAGE plpgsql
AS $function$
DECLARE r record;
BEGIN
FOR r IN SELECT * FROM t1
LOOP
RAISE NOTICE '%', r.c; -- there is bug - table t1 missing "c" column
END LOOP;
END;
$function$;

CREATE FUNCTION

postgres=# select f1(); -- execution doesn't find the bug because table t1 is empty
f1
────

(1 row)

postgres=# \x
Expanded display is on.
postgres=# select * from plpgsql_check_function_tb('f1()');
─[ RECORD 1 ]───────────────────────────
functionid │ f1
lineno │ 6
statement │ RAISE
sqlstate │ 42703
message │ record "r" has no field "c"
detail │ [null]
hint │ [null]
level │ error
position │ 0
query │ [null]

postgres=# \sf+ f1
CREATE OR REPLACE FUNCTION public.f1()
RETURNS void
LANGUAGE plpgsql
1 AS $function$
2 DECLARE r record;
3 BEGIN
4 FOR r IN SELECT * FROM t1
5 LOOP
6 RAISE NOTICE '%', r.c; -- there is bug - table t1 missing "c" column
7 END LOOP;
8 END;
9 $function$

select * from plpgsql_check_function('f1()', fatal_errors := false);
plpgsql_check_function
------------------------------------------------------------------------
error:42703:4:SQL statement:column "c" of relation "t1" does not exist
Query: update t1 set c = 30
-- ^
error:42P01:7:RAISE:missing FROM-clause entry for table "r"
Query: SELECT r.c
-- ^
error:42601:7:RAISE:too few parameters specified for RAISE
(7 rows)

postgres=# select * from plpgsql_check_function('fx()', format:='xml');
plpgsql_check_function
────────────────────────────────────────────────────────────────
 <Function oid="16400">                                        ↵
   <Issue>                                                     ↵
     <Level>error</Level>                                      ↵
     <Sqlstate>42P01</Sqlstate>                                ↵
     <Message>relation "foo111" does not exist</Message>       ↵
     <Stmt lineno="3">RETURN</Stmt>                            ↵
     <Query position="23">SELECT (select a from foo111)</Query>↵
   </Issue>                                                    ↵
 </Function>
(1 row)

Michael Paquier: Postgres 9.4 feature highlight: delayed standbys

Planet PostgreSQL - 25 December, 2013 - 10:20
Postgres 9.4 has added a new feature allowing WAL replay on standbys, or recovery in general, to be delayed by a given amount of time. It was introduced by this commit: commit 36da3cfb457b77a55582f68208d815f11ac1399e Author: Simon Riggs Date: Thu Dec 12 10:53:20 2013 +0000   Allow time delayed standbys and recovery   Set min_recovery_apply_delay to [...]

Hans-Juergen Schoenig: PostgreSQL shutdown modes

Planet PostgreSQL - 23 December, 2013 - 16:34
When doing trainings or consulting sessions I get the feeling that most people are not fully aware of PostgreSQL's shutdown modes. However, knowing about this functionality is pretty important: without this information sysadmins may cause unintended damage and issues, which can easily be avoided. 3 shutdown modes When doing pg_ctl --help the system will reveal […]

Hubert 'depesz' Lubaczewski: OmniPITR 1.3.2

Planet PostgreSQL - 22 December, 2013 - 20:58
Otto Bretz reported a bug in OmniPITR. The bug was that when using -dr (remote destinations for backups) you couldn't use --skip-xlogs. An obvious oversight on my side. The fix was trivial, and so version 1.3.2 was born.

Leo Hsu and Regina Obe: Using wget directly from PostgreSQL using COPY FROM PROGRAM

Planet PostgreSQL - 22 December, 2013 - 07:12

One of the features new to PostgreSQL 9.3 is the COPY FROM PROGRAM SQL construct and, in psql, the equivalent \copy from program. Michael Paquier covered COPY TO/FROM PROGRAM in his Postgres 9.3 feature highlight: COPY TO/FROM PROGRAM. Depesz covered the companion psql construct in Support for piping copy to from an external program. Michael demonstrated an example using curl. I wanted to try something similar using wget, since I have wget readily available on all my Linux and Unix boxes. For this example I'll demonstrate doing it on Windows, but doing it on Linux is much the same and simpler, since wget and curl are generally already in the default path on Linux.
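The post continues below; as a quick sketch of the construct itself (the table name and URL here are made up for illustration):

CREATE TABLE price_feed (symbol text, price numeric, quoted_at timestamptz);

-- server-side: the backend runs wget and reads its standard output
COPY price_feed
  FROM PROGRAM 'wget -q -O - http://example.com/prices.csv'
  WITH (FORMAT csv, HEADER true);

-- client-side psql equivalent (wget runs on the client machine):
-- \copy price_feed from program 'wget -q -O - http://example.com/prices.csv' with (format csv, header true)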


Continue reading "Using wget directly from PostgreSQL using COPY FROM PROGRAM"

Hans-Juergen Schoenig: Tracking changes in PostgreSQL

Planet PostgreSQL - 20 December, 2013 - 12:41
Database security is always an essential issue in any database application. Especially when critical data is stored, it might be interesting to know who has changed which data when and how. To track those changes made to tables in PostgreSQL you can write yourself a generic changelog trigger. The easiest way to do that is […]
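The article itself is only excerpted above; as a rough sketch of the general idea of a generic changelog trigger (all names here are illustrative, not the author's code):

CREATE TABLE changelog (
    id          bigserial   PRIMARY KEY,
    changed_at  timestamptz NOT NULL DEFAULT now(),
    changed_by  text        NOT NULL DEFAULT current_user,
    table_name  text        NOT NULL,
    operation   text        NOT NULL,
    old_row     json,
    new_row     json
);

CREATE OR REPLACE FUNCTION log_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO changelog (table_name, operation, new_row)
        VALUES (TG_TABLE_NAME, TG_OP, row_to_json(NEW));
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO changelog (table_name, operation, old_row, new_row)
        VALUES (TG_TABLE_NAME, TG_OP, row_to_json(OLD), row_to_json(NEW));
    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO changelog (table_name, operation, old_row)
        VALUES (TG_TABLE_NAME, TG_OP, row_to_json(OLD));
    END IF;
    RETURN NULL;  -- AFTER trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;

-- attach it to any table you want to audit
CREATE TRIGGER t_log_change
    AFTER INSERT OR UPDATE OR DELETE ON some_table
    FOR EACH ROW EXECUTE PROCEDURE log_change();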

Pavel Stehule: some basic PostgreSQL orchestration from Bash

Planet PostgreSQL - 20 December, 2013 - 12:32
I had to design some scripts for migrating, upgrading and rebuilding PostgreSQL. Some fragments may be useful for others:

Prolog:
#!/bin/bash

if [ ! -n "$PG_DATA" ]; then
    PG_DATA=/usr/local/pgsql/data
fi

Ensure a stop of PostgreSQL, continue when server is down:
su postgres -c "cd; pg_ctl status -D $PG_DATA"
if [ $? != 1 ]; then
    su postgres -c "cd; pg_ctl stop -D $PG_DATA -m fast"
    STATUS=$?
    if [ $STATUS != 1 -a $STATUS != 0 ]; then
        su postgres -c "cd; pg_ctl status -D $PG_DATA"
        while [ $? != 1 ]; do
            sleep 1
            su postgres -c "cd; pg_ctl status -D $PG_DATA"
        done
    fi
fi

Start server and continue when server is ready to accept connection:
/etc/init.d/pgsql start
/usr/local/pgsql/bin/pg_isready
while [ $? != 0 ]; do
    sleep 1
    /usr/local/pgsql/bin/pg_isready
done

Hubert 'depesz' Lubaczewski: Waiting for 9.4 – Allow time delayed standbys and recovery

Planet PostgreSQL - 20 December, 2013 - 05:08
On 12th of December, Simon Riggs committed patch: Allow time delayed standbys and recovery   Set min_recovery_apply_delay to force a delay in recovery apply for commit and restore point WAL records. Other records are replayed immediately. Delay is measured between WAL record time and local standby time.   Robert Haas, Fabrízio de Royes Mello and […]

Joshua Tolley: Database federation performance showdown

Planet PostgreSQL - 19 December, 2013 - 22:02

Flickr user garryknight

The PostgreSQL Foreign Data Wrapper has gotten a fair bit of attention since its release in PostgreSQL version 9.3. Although it does much the same thing the dblink contrib module has long done, it is simpler to implement for most tasks and reuses the same foreign data wrapper infrastructure employed by several other contrib modules. It allows users to "federate" distinct PostgreSQL databases; that is, it allows them to work in combination as though they were one database. This topic of database federation has interested me for some time -- I wrote about it a couple years ago -- and when postgres_fdw came out I wanted to see how it compared to the solution I used back then.

First, some background. The key sticking point of database federation that I'm focused on is transaction management. Transactions group a series of steps, so either they all complete in proper sequence, or none of them does. While lots of databases, and other technologies like messaging servers, can handle transactions that involve only one service (one database or one messaging server instance, for example), federation aims to allow transactions to span multiple services. If, for instance, given a transaction involving multiple databases, one database fails to commit, all the other databases in the transaction roll back automatically. See my post linked above for a more detailed example and implementation details. In that post I talked about the Bitronix transaction manager, whose job is to coordinate the different databases and other services in a transaction, and make sure they all commit or roll back correctly, even in the face of system failures and other misbehavior. There are other standalone transaction managers available. I used Bitronix simply because a knowledgeable friend recommended it, and it proved sufficient for the testing I had in mind.

So much for introduction. I wanted to see how Bitronix compared to postgres_fdw, and to get started I took the simple sequence of queries used by default by pgbench, and created a test database with pgbench, and then made three identical copies of it (named, of course, athos, porthos, aramis, and dartagnan -- I wasn't energetic enough to include the apostrophe in the name of the fourth database). The plan was to federate athos and porthos with Bitronix, and aramis and dartagnan with postgres_fdw. More precisely, the pgbench test schema consists of a
small set of tables representing a simple banking scenario. In its default benchmark, pgbench selects from, inserts into, and updates these tables with a few simple queries, shown below. Like pgbench, my test script replaces identifiers starting with a ":" character with values selected randomly for each iteration.

UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);

I decided to configure my test as though pgbench's "accounts" table was in one database, and the "tellers", "branches", and "history" tables were in another. For the Bitronix test I can simply connect to both databases and ignore the tables that aren't applicable, but for testing postgres_fdw I need to set up dartagnan's pgbench_accounts table as a foreign table in the aramis database, like this:

aramis=# drop table pgbench_accounts;
DROP TABLE
aramis=# create server dartagnan foreign data wrapper postgres_fdw options (dbname 'dartagnan');
CREATE SERVER
aramis=# create user mapping for josh server dartagnan options (user 'josh');
CREATE USER MAPPING
aramis=# create foreign table pgbench_accounts (aid integer not null, bid integer, abalance integer, filler character(84)) server dartagnan;
CREATE FOREIGN TABLE

The test script I wrote has two modes: Bitronix mode, and postgres_fdw mode. For each, it repeats the pgbench test queries a fixed number of times, grouping a certain number of these iterations into a single transaction. It then changes the number of iterations per transaction, and repeats the test. In the end, it gave me the following results, which I found very interesting:

The results show that for small transactions, postgres_fdw performs much better. But when the transactions get large, Bitronix catches up and takes the lead. The graph shows a curve that may be part of an interesting trend, but it didn't seem worthwhile to test larger numbers of iterations per single transaction, because the larger transactions in the test are already very large compared to typical real-life workloads. It's difficult to see exactly what's going on in the center of the graph; here's a log rescaling of the data to make it clear what the numbers are up to.

All in all, it doesn't surprise me that postgres_fdw would be faster than Bitronix for small and medium-sized transactions. Being more tightly coupled to PostgreSQL, it has a faster path to get done what it wants to do, and in particular, isn't restricted to using two-phase commit, which is generally considered slow. I was surprised, however, to see that Bitronix managed to catch up for very large transactions.
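For reference, the two-phase commit that a transaction manager like Bitronix drives boils down to a pair of SQL commands on each participating database (this requires max_prepared_transactions > 0; the transaction identifier below is made up):

BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + 10 WHERE aid = 42;
PREPARE TRANSACTION 'tx-example-1';  -- phase one: the work is durable but not yet visible
-- ... the coordinator waits until every participant has prepared successfully ...
COMMIT PREPARED 'tx-example-1';      -- phase two (or ROLLBACK PREPARED on failure)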

Hubert 'depesz' Lubaczewski: Waiting for 9.4 – Improve EXPLAIN to print the grouping columns in Agg and Group nodes.

Planet PostgreSQL - 19 December, 2013 - 04:56
On 12th of December, Tom Lane committed patch: Improve EXPLAIN to print the grouping columns in Agg and Group nodes.   Per request from Kevin Grittner. Well, the description is pretty simple, so let's just see it in some example: Before: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- GroupAggregate (cost=63239.58..65800.66 ROWS=102443 width=887) (actual TIME=3749.183..19549.198 ROWS=117 loops=1) -> Sort (cost=63239.58..63495.69 […]

Vasilis Ventirozos: A Problem Geeked Into Big Data

Planet PostgreSQL - 19 December, 2013 - 02:12
Today my article got published in OmniTI Seeds. It was a very interesting project; I had a lot of fun while doing it, and even if it's not exactly Postgres related, I wouldn't have been able to do it without Postgres. It began with the basic everyday operation that all DBAs do (log parsing / analysis) but it got to a whole different level. I hope you like it.

A Problem Geeked Into Big Data

Jim Mlodgenski: PGConf NYC 2014 Call for Proposals

Planet PostgreSQL - 18 December, 2013 - 21:43

PGConf NYC 2014 will be on April 3rd to 4th, 2014 at the New York Marriott Downtown.

See http://nyc.pgconf.us/2014

PGConf NYC 2014 focuses on PostgreSQL usage – how do you make optimal use of PostgreSQL in your daily life? We want you to share your experiences using PostgreSQL – your use cases, experiences, deployment strategies, tools you’ve used in and around PostgreSQL – anything that you feel you would like to highlight to the PostgreSQL community!

Some topics to consider:

  • How to leverage new and/or innovative PostgreSQL features
  • Developing applications with PostgreSQL
  • Strategies for running PostgreSQL in production
  • Case studies of PostgreSQL deployments
  • Managing your development process with PostgreSQL
  • How to use PostgreSQL extensions, e.g. PostGIS
  • PostgreSQL integrations with other databases, e.g. leveraging foreign data wrappers
  • PostgreSQL Scalability + Replication
  • Accessible Insights into PostgreSQL Internals
  • Scalability strategies with PostgreSQL

Our call for proposals is open until January 10th, 2014, 12:00pm EST.

You can submit your proposal at:
http://nyc.pgconf.us/2014/submit/

Vasilis Ventirozos: cooking with fdw

Planet PostgreSQL - 18 December, 2013 - 20:45
Back in 9.1 a feature called foreign data wrappers was introduced; you could basically connect a PostgreSQL server to remote data stores (other DBs, CSV files, etc.). Today I plan to play with this FDW support and see what we can use it for and how.

First of all, there are many FDWs; a list can be found here.

For the sake of this post I used the following data wrappers:

     Name     | Version |   Schema   |                    Description                    
--------------+---------+------------+----------------------------------------------------
 file_fdw     | 1.0     | public     | foreign-data wrapper for flat file access
 postgres_fdw | 1.0     | public     | foreign-data wrapper for remote PostgreSQL servers
 redis_fdw    | 1.0     | public     | Foreign data wrapper for querying a Redis server

file_fdw and postgres_fdw are both available in contrib; redis_fdw can be found at the following link. You will also need to compile and install hiredis, which can be found here. I won't go into much detail on how to install these, but there is plenty of documentation available.

The plan is to connect a Redis server, a remote PostgreSQL server and a CSV file, and interact with everything.

The first thing that has to be done after the extensions have been created is to create foreign servers and tables; the following example shows what I did with the remote PostgreSQL server:

test_redis=# CREATE SERVER node_2
FOREIGN DATA WRAPPER postgres_fdw  
OPTIONS (host '192.168.0.3', dbname 'postgres');     
CREATE SERVER
test_redis=# CREATE USER MAPPING FOR PUBLIC SERVER node_2;
CREATE USER MAPPING
test_redis=# CREATE FOREIGN TABLE test_pg (id int)
SERVER node_2 OPTIONS (table_name 'test_pg');
CREATE FOREIGN TABLE
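For comparison, a file_fdw setup for the csv file would look roughly like this (the column list and options are illustrative, not copied from the original session):

CREATE EXTENSION file_fdw;

CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE pgcsv (
    id     int,
    key    text,
    date   date,
    "time" time,
    num    int
) SERVER csv_files
  OPTIONS (filename '/home/vasilis/lala.csv', format 'csv');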

The csv and redis foreign tables were really similar to create; when I was done I had this:

 public | db15       | foreign table | vasilis
 public | pgcsv      | foreign table | vasilis
 public | test       | table         | vasilis
 public | test2_pg   | foreign table | vasilis

One local table and one foreign table for each of my wrappers. So, let's start playing around.

test_redis=# begin;
BEGIN
test_redis=# insert into test2_pg values ('koko','lala');
INSERT 0 1
test_redis=# commit;
COMMIT


Yep, I could insert into a remote PG server, in a transactional way. Goodbye dblink? (Oh yes!)
Unfortunately this wouldn't work on Redis, but only because the FDW doesn't support it (yet?).
Now let's join them all and see how that works.

test_redis=# select * from db15,pgcsv,test2_pg,test where db15.key=test2_pg.key and pgcsv.key=db15.key and db15.key=test.key and test2_pg.key=db15.key;
 key  | value | id | key  |    date    |   time   | num | key  | value | key  | value 
------+-------+----+------+------------+----------+-----+------+-------+------+--------
 koko | koko1 |  1 | koko | 2013-12-01 | 00:01:01 | 234 | koko | lala  | koko | ninini

The statement is not something that is actually usable, but the execution plan demonstrates that all the foreign tables have been accessed:

test_redis=# explain analyze select * from db15,pgcsv,test2_pg,test where db15.key=test2_pg.key and pgcsv.key=db15.key and db15.key=test.key and test2_pg.key=db15.key;
                                                          QUERY PLAN                                                          
-------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=111.24..156.43 rows=1 width=244) (actual time=0.981..1.031 rows=1 loops=1)
   Join Filter: (test2_pg.key = db15.key)
   Rows Removed by Join Filter: 2
   ->  Nested Loop  (cost=101.24..143.39 rows=1 width=180) (actual time=0.857..0.862 rows=1 loops=1)
         ->  Hash Join  (cost=101.11..142.55 rows=5 width=116) (actual time=0.833..0.835 rows=1 loops=1)
               Hash Cond: (test2_pg.key = pgcsv.key)
               ->  Foreign Scan on test2_pg  (cost=100.00..137.90 rows=930 width=64) (actual time=0.767..0.768 rows=1 loops=1)
               ->  Hash  (cost=1.10..1.10 rows=1 width=52) (actual time=0.049..0.049 rows=3 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 1kB
                     ->  Foreign Scan on pgcsv  (cost=0.00..1.10 rows=1 width=52) (actual time=0.033..0.042 rows=3 loops=1)
                           Foreign File: /home/vasilis/lala.csv
                           Foreign File Size: 98
         ->  Index Scan using kokoako on test  (cost=0.12..0.16 rows=1 width=64) (actual time=0.019..0.021 rows=1 loops=1)
               Index Cond: (key = test2_pg.key)
   ->  Foreign Scan on db15  (cost=10.00..13.00 rows=3 width=64) (actual time=0.071..0.162 rows=3 loops=1)
         Foreign Redis Table Size: 3
 Total runtime: 2.184 ms
(17 rows)
This is awesome from many aspects. Imagine that you have a heterogeneous environment with MongoDB, Redis, Postgres, Oracle, CSV logs, etc.: you can put a PostgreSQL instance in the middle, acting like a bridge, and create supersets, subsets, aggregations, views, temp tables, report tables and so many other things. You could even limit users to accessing only the "bridge" Postgres, so they can see whatever you want from many different systems. You can use SQL on Redis, since Redis commands are so limited. And because I am starting to get a bit paranoid, I will leave it up to you how to use this :)
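One concrete way to use it as that kind of bridge (a sketch; the view and role names are hypothetical) is to wrap the foreign tables in views and grant access only to those:

CREATE VIEW all_sources AS
SELECT d.key,
       d.value AS redis_value,
       c.num   AS csv_num,
       p.value AS remote_value
FROM db15 d
JOIN pgcsv    c ON c.key = d.key
JOIN test2_pg p ON p.key = d.key;

GRANT SELECT ON all_sources TO reporting_role;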


Thanks for reading
-- Vasilis





Hubert 'depesz' Lubaczewski: Waiting for 9.4 – PL/Perl: Add event trigger support

Planet PostgreSQL - 18 December, 2013 - 05:43
On 11th of December, Peter Eisentraut committed patch: PL/Perl: Add event trigger support   From: Dimitri Fontaine <dimitri@2ndQuadrant.fr> In PostgreSQL 9.3 we got EVENT triggers. To remind – these are triggers that get executed not on a change to a table, but on a specific event related to the schema in general: creation of a new object, altering the state of […]

Hubert 'depesz' Lubaczewski: OmniPITR 1.3.1

Planet PostgreSQL - 18 December, 2013 - 01:35
Right after releasing 1.3.0 I realized that I had forgotten about one thing. If you're using the ext3 (and possibly other, I'm not sure) file system, removal of a large file can cause problems due to heavy IO traffic. We hit this problem earlier at one of our client sites, and devised a way to remove large files by […]