
Monday, June 11, 2012

SQL Query Optimization Techniques

1) A SQL query runs faster if you select only the columns you actually need instead of selecting all columns (*).
2) The HAVING clause filters rows only after all rows have been selected and grouped; it acts as a final filter. Do not use the HAVING clause for conditions that belong in the WHERE clause.
For Example: Write the query as
SELECT subject, count(subject) FROM student_details WHERE subject != 'Science' AND subject != 'Maths' GROUP BY subject;
Instead of:
SELECT subject, count(subject) FROM student_details GROUP BY subject HAVING subject != 'Science' AND subject != 'Maths';
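A quick way to see that both forms return the same groups, while the WHERE version discards rows before grouping, is a small sketch using Python's built-in sqlite3 (table contents are made up for illustration):

```python
import sqlite3

# In-memory table mirroring the student_details example above
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_details (subject TEXT)")
conn.executemany("INSERT INTO student_details VALUES (?)",
                 [("Science",), ("Maths",), ("History",), ("History",)])

# Filter with WHERE: rows are discarded before grouping
fast = conn.execute(
    "SELECT subject, COUNT(subject) FROM student_details "
    "WHERE subject != 'Science' AND subject != 'Maths' "
    "GROUP BY subject").fetchall()

# Filter with HAVING: every group is built first, then discarded
slow = conn.execute(
    "SELECT subject, COUNT(subject) FROM student_details "
    "GROUP BY subject "
    "HAVING subject != 'Science' AND subject != 'Maths'").fetchall()

assert fast == slow  # same result, but WHERE prunes rows earlier
```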


3) Try to minimize the number of subquery blocks in your query. 
For Example: Write the query as
SELECT name FROM employee WHERE (salary, age ) = (SELECT MAX (salary), MAX (age) FROM employee_details) AND dept = 'Electronics'; 
Instead of:
SELECT name FROM employee WHERE salary = (SELECT MAX(salary) FROM employee_details) AND age = (SELECT MAX(age) FROM employee_details) AND emp_dept = 'Electronics';


4) Use the EXISTS and IN operators and table joins appropriately in your query. 
    a) Usually IN has the slowest performance. 
    b) IN is efficient when most of the filter criteria are in the sub-query. 
    c) EXISTS is efficient when most of the filter criteria are in the main query.
For Example: Write the query as
Select * from product p where EXISTS 
(select * from order_items o where o.product_id = p.product_id)
Instead of:
Select * from product p where product_id IN (select product_id from order_items);
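Both forms return the same rows; the difference is how the subquery is evaluated. A sketch with sqlite3 (sample data invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE order_items (product_id INTEGER)")
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [(1, "pen"), (2, "ink"), (3, "clip")])
conn.executemany("INSERT INTO order_items VALUES (?)", [(1,), (1,), (3,)])

# EXISTS: the subquery is correlated and can stop at the first match
with_exists = conn.execute(
    "SELECT name FROM product p WHERE EXISTS "
    "(SELECT 1 FROM order_items o WHERE o.product_id = p.product_id)"
).fetchall()

# IN: the subquery result is produced first, then each row is probed against it
with_in = conn.execute(
    "SELECT name FROM product WHERE product_id IN "
    "(SELECT product_id FROM order_items)").fetchall()

assert sorted(with_exists) == sorted(with_in) == [("clip",), ("pen",)]
```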


5) Use EXISTS instead of DISTINCT when using joins which involve tables having a one-to-many relationship. For Example: Write the query as
SELECT d.dept_id, d.dept FROM dept d 
WHERE EXISTS ( SELECT 'X' FROM employee e WHERE e.dept = d.dept);
Instead of:
SELECT DISTINCT d.dept_id, d.dept FROM dept d, employee e WHERE e.dept = d.dept;
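The DISTINCT version first multiplies rows through the join and then removes duplicates; EXISTS never produces the duplicates. A sketch with sqlite3 (sample data invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (dept_id INTEGER, dept TEXT)")
conn.execute("CREATE TABLE employee (dept TEXT)")
conn.executemany("INSERT INTO dept VALUES (?, ?)",
                 [(1, "HR"), (2, "IT"), (3, "Legal")])
conn.executemany("INSERT INTO employee VALUES (?)",
                 [("HR",), ("IT",), ("IT",)])  # one-to-many: IT appears twice

# EXISTS: one probe per dept row, no duplicate elimination needed
with_exists = conn.execute(
    "SELECT d.dept_id, d.dept FROM dept d WHERE EXISTS "
    "(SELECT 'X' FROM employee e WHERE e.dept = d.dept)").fetchall()

# DISTINCT: the join produces IT twice, then dedupes
with_distinct = conn.execute(
    "SELECT DISTINCT d.dept_id, d.dept FROM dept d, employee e "
    "WHERE e.dept = d.dept").fetchall()

assert sorted(with_exists) == sorted(with_distinct) == [(1, "HR"), (2, "IT")]
```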


6) Use of WHERE clause.
Comparison operators: For Example: Write the query as
SELECT id, first_name, age FROM student_details WHERE age > 10;
Instead of:
SELECT id, first_name, age FROM student_details WHERE age != 10; 


Wildcard vs sub-string: Write the query as:
SELECT id, first_name, age FROM student_details WHERE first_name LIKE 'Chan%';
Instead of:
SELECT id, first_name, age FROM student_details WHERE SUBSTR(first_name,1,3) = 'Cha';


Write the query as:
SELECT id, first_name, age FROM student_details WHERE first_name LIKE NVL ( :name, '%');
Instead of:
SELECT id, first_name, age FROM student_details WHERE first_name = NVL ( :name, first_name);

Write the query as:
SELECT product_id, product_name FROM product 
WHERE unit_price BETWEEN (SELECT MIN(unit_price) FROM product) AND (SELECT MAX(unit_price) FROM product);
Instead of:
SELECT product_id, product_name FROM product 
WHERE unit_price >= (SELECT MIN(unit_price) FROM product) 
AND unit_price <= (SELECT MAX(unit_price) FROM product);

Write the query as:
SELECT id, name, salary FROM employee WHERE dept = 'Electronics' AND location = 'Bangalore';
Instead of:
SELECT id, name, salary FROM employee WHERE dept || location= 'Electronics Bangalore';

Use non-column expressions on one side of the comparison because they will be processed earlier.
Write the query as:
SELECT id, name, salary FROM employee WHERE salary < 25000;
Instead of:
SELECT id, name, salary FROM employee WHERE salary + 10000 < 35000;

Write the query as:
SELECT id, first_name, age FROM student_details WHERE age > 10;
Instead of:
SELECT id, first_name, age FROM student_details WHERE NOT (age = 10);



7) Use UNION instead of OR
Indexes lose their speed advantage when used in OR conditions.
Write the query as:
SELECT * FROM TABLE WHERE COLUMN_A = 'value'
UNION
SELECT * FROM TABLE WHERE COLUMN_B = 'value'
Instead of:
SELECT * FROM TABLE WHERE COLUMN_A = 'value' OR COLUMN_B = 'value'

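Both forms return the same rows (UNION additionally removes duplicates, so a row matching both predicates still appears once). A sketch with sqlite3 (table and data invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column_a TEXT, column_b TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("value", "x"), ("y", "value"), ("y", "z")])

with_or = conn.execute(
    "SELECT * FROM t WHERE column_a = 'value' OR column_b = 'value'").fetchall()

# UNION: each branch can use its own index on column_a / column_b
with_union = conn.execute(
    "SELECT * FROM t WHERE column_a = 'value' "
    "UNION SELECT * FROM t WHERE column_b = 'value'").fetchall()

assert sorted(with_or) == sorted(with_union)
```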
8) Use DECODE to avoid scanning the same rows or joining the same table repetitively. DECODE can also be used in place of a GROUP BY or ORDER BY clause.
For Example: Write the query as
SELECT id FROM employee WHERE name LIKE 'Ramesh%' and location = 'Bangalore';
Instead of:
SELECT DECODE(location,'Bangalore', id, NULL) id FROM employee WHERE name LIKE 'Ramesh%';

9) To store large binary objects, first place them in the file system and store the file path in the database.
10) To write queries which provide efficient performance, follow the general SQL standard rules.
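A sketch of this pattern using Python's sqlite3 and a temporary directory (file names, table, and payload are illustrative):

```python
import os
import sqlite3
import tempfile

# Store the binary payload on disk and keep only its path in the database
payload = b"\x89PNG...fake image bytes"
directory = tempfile.mkdtemp()
path = os.path.join(directory, "photo_001.png")
with open(path, "wb") as f:
    f.write(payload)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, file_path TEXT)")
conn.execute("INSERT INTO photos (file_path) VALUES (?)", (path,))

# Reading it back: fetch the path from the database, then read the file system
stored_path, = conn.execute("SELECT file_path FROM photos").fetchone()
with open(stored_path, "rb") as f:
    assert f.read() == payload
```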
     a) Use a single case for all SQL verbs
     b) Begin all SQL verbs on a new line
     c) Separate all words with a single space 
     d) Right- or left-align verbs within the initial SQL verb

Tuesday, May 1, 2012

Manual Migration SQLSERVER to PostgreSQL

The following article explains a manual migration of a small SQL Server database structure to PostgreSQL.
Create table scripts 
1.       Force to lower case (Notepad++  -> edit -> convert case to -> lower)
2.       Remove all square brackets ( [ and ] ),  owner prefixes (i.e. "dbo.").
3.       Remove all non-supported keywords (i.e. "WITH NOCHECK", "WITH CHECK", "CLUSTERED", "ASC", "WITH FILLFACTOR = 90", "ON PRIMARY", "TEXTIMAGE_ON")
4.       Remove unsupported commands (entire rows) using  following regular expression (Notepad++ for small scripts)
· ^.*/\*\*\*\*\*\*.*$
· ^.*SET ANSI_NULLS.*$
· ^.*SET QUOTED_IDENTIFIER.*$

5.       Replace the T-SQL batch terminator "GO" with the PostgreSQL statement terminator ";" and manually remove any extra ";", OR remove all "GO" statements and add ";" after each SQL statement.
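Step 5 can also be scripted instead of done by hand in an editor; a sketch in Python (the regex and sample script are illustrative):

```python
import re

def replace_go(script):
    """Replace the T-SQL batch terminator GO (on a line of its own) with ';'."""
    return re.sub(r"(?mi)^\s*GO\s*$", ";", script)

tsql = "CREATE TABLE t (id int)\nGO\nCREATE INDEX i ON t (id)\nGO"
print(replace_go(tsql))
```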

6.       Replace all non-supported data types
  • Nvarchar, text, ntext -- varchar
  • Datetime, smalldatetime -- timestamp
  • Money -- numeric(19,4)
  • Bit (Boolean type) -- smallint or boolean
  • Float -- double precision
  • Decimal -- numeric
  • Tinyint -- smallint
  • Int -- integer
  • Image -- bytea
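Step 6 can be expressed as a lookup table; a sketch in Python (the mapping follows the list above, choosing boolean for bit; extend it for your own schema):

```python
# SQL Server -> PostgreSQL type mapping (a sketch; names are lowercase keys)
TYPE_MAP = {
    "nvarchar": "varchar",
    "text": "varchar",
    "ntext": "varchar",
    "datetime": "timestamp",
    "smalldatetime": "timestamp",
    "money": "numeric(19,4)",
    "bit": "boolean",          # smallint is the other option from the list
    "float": "double precision",
    "decimal": "numeric",
    "tinyint": "smallint",
    "int": "integer",
    "image": "bytea",
}

def convert_type(sqlserver_type):
    key = sqlserver_type.strip().lower()
    return TYPE_MAP.get(key, key)  # unmapped types pass through unchanged

print(convert_type("SMALLDATETIME"))
```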
7.       uniqueidentifier data type

Operation
PostgreSQL
SQL Server
Create a table that has a UUID/GUID type automatically assigned
CREATE TABLE equipment( equip_id uuid PRIMARY KEY        
DEFAULT uuid_generate_v1() ,
equip_name  varchar(255));
CREATE TABLE equipment(
    equip_id uniqueidentifier
PRIMARY KEY
DEFAULT NEWID() ,
equip_name varchar(255));

CREATE TABLE equipment(
equip_id uniqueidentifier
PRIMARY KEY ROWGUIDCOL ,
equip_name varchar(255) );
Create a table that has text representation of UUID/GUID as primary for easier transportation
CREATE TABLE equipment(
equip_id char(32) PRIMARY KEY
DEFAULT LOWER(
REPLACE(  CAST(
uuid_generate_v1() As varchar(50))  , '-','')
 ) ,  equip_name varchar(255));
CREATE TABLE equipment(
    equip_id char(32) PRIMARY KEY
DEFAULT LOWER( REPLACE(
CAST(NEWID() As varchar(50)), '-','')
) , equip_name varchar(255));


8.   Instead of int IDENTITY(1,1) for an auto-incrementing column, Postgres uses sequences. The easiest way to transform this is to use the serial (or bigserial for 8-byte numbers) data type. If you need to start the sequence at a specific seed, or have it decrement or increment by more than 1, you must create the sequence manually:
CREATE SEQUENCE <sequence_name> START <start_value> INCREMENT <step>;
and then assign the default value for the column using the nextval function:
CREATE TABLE <table_name> (<column_name> int default nextval('<sequence_name>') not null);
9. Find "DEFAULT (0)" values and add them manually at the end of the relevant field in the relevant table. Then remove the "ALTER TABLE ... CHECK CONSTRAINT ... DEFAULT (0) FOR ..." rows.
10. Remove the "ALTER TABLE ... CHECK CONSTRAINT ..." rows.
11. Change the index statements
SQL Server index
create nonclustered index ix_tblstudiesfieldstatus on studies (field_status_id );
Postgres Index
CREATE INDEX ix_tblstudiesfieldstatus ON ccadmin.studies USING btree( field_status_id );
ALTER TABLE ccadmin.studies CLUSTER ON times_idx;
12.   Replace “getdate()” with “now()”
13.   Replace “len” with “length” 


Example

SQL Server Table
PostgreSQL Table
/****** Object:  Table [dbo].[WEBINC_TRACK]    Script Date: 05/03/2012 11:07:38 ******/
SET ANSI_NULLS ON GO
SET QUOTED_IDENTIFIER ON GO
SET ANSI_PADDING ON GO
CREATE TABLE [dbo].[WEBINC_TRACK](
[SESSIONID] [uniqueidentifier] NOT NULL CONSTRAINT  [DF_WEBINC_TRACK_SESSIONID]  DEFAULT (newid()),
[ID] [int] IDENTITY(1,1) NOT NULL,
[TrackDate] [smalldatetime] NOT NULL CONSTRAINT [DF_WEBINC_TRACK_TrackDate]  DEFAULT (getdate()),
 [pathinfo] [varchar](1000) NULL,
[loginpathinfo] [varchar](1000) NULL,
[EMAILSENT] [tinyint] NULL,
CONSTRAINT [PK_WEBINC_TRACK] PRIMARY KEY CLUSTERED
([ID] ASC )WITH FILLFACTOR = 90 ON [PRIMARY]
) ON [PRIMARY]
GO
Drop table if exists webinc_track;
create table webinc_track(
sessionid varchar(36) not null constraint df_webinc_track_sessionid default upper(cast(uuid_generate_v4() as varchar(36))),
id serial not null,
trackdate TIMESTAMP not null constraint df_webinc_track_trackdate  default (now()),
puin varchar(255) null,
pathinfo varchar(1000) null,
loginpathinfo varchar(1000) null,
emailsent smallint null,
 constraint pk_webinc_track primary key  (id));

SQL Server Function
PostgreSQL function
/****** Object:  UserDefinedFunction [dbo].[GetGenderFromSalutation]    Script Date: 05/09/2012 16:52:18 ******/
SET ANSI_NULLS OFF
GO
SET QUOTED_IDENTIFIER OFF
GO
CREATE FUNCTION  [dbo].[GetGenderFromSalutation](@SalutationID smallint)
RETURNS smallint  AS 
BEGIN
declare @result smallint
set @result = 0
if @SalutationID in (3,4,7,2) 
                set @result =  2
else if (@SalutationID = 1)
                set @result =  1
return @result
END
CREATE OR REPLACE FUNCTION GetGenderFromSalutation(SalutationID smallint)
RETURNS smallint  AS
$BODY$
DECLARE
                result smallint := 0 ;
BEGIN

IF (SalutationID in (3,4,7,2)) THEN
                result :=  2;
ELSIF (SalutationID = 1) THEN
                result :=  1;
END IF;
RETURN result;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
Procedures and functions
Apply the same rules for procedures and functions as applied to the table creation scripts. Additionally, follow these points.
1)      Instead of “create function” use “Create or replace function”
2)      Add “AS $BODY$ DECLARE” after returning data type at beginning
3)      Remove the individual DECLARE from each variable declaration and add ";" at the end of each variable declaration statement.
4)      Add "THEN" and "END IF;" in all IF conditional blocks. Further, replace the "else if" phrase with "ELSIF".
5)      Remove “SET” and “@” words
6)      Add “$BODY$  LANGUAGE plpgsql VOLATILE” at the end of the function

7) Avoid output parameters in functions where possible (older PostgreSQL versions did not support them).
8) Older PostgreSQL versions did not support default parameter values, so all parameters must be passed to the function (although function overloading is supported, allowing wrapper functions that provide defaults).
9) Use FOUND instead of @@FETCH_STATUS. FOUND is true if the last SELECT, SELECT INTO, or FETCH statement returned a value.
10) Instead of @@IDENTITY use currval('sequence_name'); for a serial column the sequence name defaults to <table>_<column>_seq.
11) Remove DEALLOCATE cursor statements; that function is accomplished with CLOSE, and DEALLOCATE has a different use in PostgreSQL.
12) Use PERFORM instead of EXEC to call stored procedures/functions.
13) For variable assignments, SELECT @foo = foo_column won't work. Use either SELECT INTO or the PL/pgSQL assignment operator := (as in foo := foo_column).
14) Error handling is different: when a PL/pgSQL program encounters an error, execution is immediately stopped (you can't check @@ERROR), or execution is sent to the EXCEPTION block (if it exists) after the current BEGIN block. Thus, if you want to return specific error values, you will need an EXCEPTION block.

Application level changes
1. Change "SELECT TOP 10 * FROM table" queries into the "SELECT * FROM table LIMIT 10" structure.
2. LIKE statements are case sensitive in PostgreSQL (use ILIKE for case-insensitive matching).
3. Concatenation uses ||, so "SELECT firstname + ' ' + lastname AS fullname" becomes "SELECT firstname || ' ' || lastname AS fullname".
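The LIMIT and || forms can be tried directly from Python with sqlite3, which shares this syntax with PostgreSQL (table and names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (firstname TEXT, lastname TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [("Ada", "Lovelace"), ("Alan", "Turing"), ("Grace", "Hopper")])

# TOP n becomes LIMIT n
first_two = conn.execute("SELECT firstname FROM person LIMIT 2").fetchall()
assert len(first_two) == 2

# + concatenation becomes ||
fullname, = conn.execute(
    "SELECT firstname || ' ' || lastname FROM person LIMIT 1").fetchone()
assert fullname == "Ada Lovelace"
```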

Monday, March 26, 2012

Python basic tutorial


#general
C:\Python27\python           #start the interpreter from the cmd prompt
import os                          # import a module
dir(os)                              # list all attributes of the os module
os.listdir(r"C:\users\iss")  # list the files in a folder
help(modulename)           #get the docs on all the functions at once

import aa                          #import a module (method 1)
aa.get_directory(aa.drive)
from aa import *               #import a module (method 2)
get_directory(drive)

#----------------------------------------------------------------
# String Formatting
name = "John"
age = 23
print "%s is %d years old." % (name, age)

mylist = [1,2,3]
print "A list: %s" % mylist       # List print

%s  - String (or any object with a string representation, like numbers)
%d  - Integers
%f   - Floating point numbers
%.<n>f - Floating point numbers with a fixed number of digits to the right of the dot (e.g. %.2f)
%x/%X - Integers in hex representation (lowercase/uppercase)

s = "Hey there! what should this string be?"
print "Length of s = %d" % len(s)                                                # Length = 38
print "The first occurrence of the letter a = %d" % s.index("a")  # First occurrence of "a" =13
print "a occurs %d times" % s.count("a")                                    # Number of a's =1
# Slicing the string into bits
print "The first five characters are '%s'" % s[:5]                # Start to 5
print "The next five characters are '%s'" % s[5:10]            # 5 to 10
print "The character at index 12 is '%s'" % s[12]              # Just index 12
print "The last five characters are '%s'" % s[-5:]               # 5th-from-last to end

print "String in uppercase: %s" % s.upper()                      # Convert everything to uppercase
print "String in lowercase: %s" % s.lower()                     # Convert everything to lowercase

# Check how a string starts
s = "Str Hey there! what should this string be? some"
if s.startswith("Str"):
    print "String starts with 'Str'. Good!"
# Check how a string ends
if s.endswith("ome"):
    print "String ends with 'ome'. Good!"
# Split the string into three separate strings
print "Split the words of the string: %s" % s.split(" ")

#Basic Operators-------------------------------------------------
number = 1 + 2 * 3 / 4.0
remainder = 11 % 3
squared = 7 ** 2
cubed = 2 ** 3
helloworld = "hello" + " " + "world"
lotsofhellos = "hello" * 10                               # multiplying strings to form a string
even_numbers = [2,4,6,8]
odd_numbers = [1,3,5,7]
all_numbers = odd_numbers + even_numbers #Lists can be joined
print [1,2,3] * 3                                                #repeating sequence

#Conditions------------------------------------------------------
x = 2
print x == 2  # prints True
print x == 3  # prints False
print x < 3   # prints True

name = "John"
age = 23
if name == "John" and age == 23:
    print "Your name is John, and you are also 23 years old."
if name == "John" or name == "Rick":
    print "Your name is either John or Rick."

#Loop------------------------------------------------------------
#The "for" loop - Prints out the numbers 0,1,2,3,4
for x in xrange(5):
    print x
# Prints out 3,4,5
for x in xrange(3,6):
    print x

#"while" loops - Prints out 0,1,2,3,4
count = 0
while count < 5:
    print count
    count += 1
#break is used to exit a for loop or a while loop, whereas continue is used to skip the
#current block, and return to the "for" or "while"
count = 0
while True:
    print count
    count += 1
    if count > 5:
        break
# Prints out only odd numbers - 1,3,5,7,9
for x in xrange(10):
    # Check if x is even
    if x % 2 == 0:
        continue
    print x

#Function---------------------------------------------------------
def func2(x, y, z, m):
    print x+y+z+m
func2(1,2,3,4)

#passing a function to another function
def wrapper1(func, *args): # with star: takes any number of positional args
    func(*args)
def wrapper2(func, args): # without star: takes a single list of args
    func(*args)
def func2(x, y, z, m):
    print x+y+z+m

wrapper1(func2, 1, 2, 3, 4)
wrapper2(func2, [1, 2, 3, 4])

#multiple function argument
def foo(xxx, yyy, zzz, *mmm):
    print "First: %s" % xxx
    print "Second: %s" % yyy
    print "Third: %s" % zzz
    print "And all the rest... %s" % mmm
foo(1,2,3,4)

#classes and objects-----------------------------------------------
class MyClass:
      variable = "blah" # a class variable
      def function(self): # a method
           print "This is a message inside the class."
       
myobjectx = MyClass()      # create an instance of the class
print myobjectx.variable    # access a variable within the class
myobjectx.function()        # call a method within the class

myobjecty = MyClass()
myobjecty.variable = "yackity"
print myobjecty.variable

#Dictionaries-------------------------------------------------------
#A dictionary is a data type similar to arrays, but works with keys and values instead of
#indexes. Each value stored in a dictionary can be accessed using a key, which is any type of
#object (a string, a number, a list, etc.) instead of using its index to address it

phonebook = {}
phonebook["John"] = 938477566
phonebook["Jack"] = 938377264
# or
phonebook = {
    "John" : 938477566,
    "Jack" : 938377264
}
#dictionary, unlike a list, does not keep the order of the values stored in it.
for name, number in phonebook.iteritems():
    print "Phone number of %s is %d" % (name, number)

#To remove a key, use either of the following:
del phonebook["John"]
# or
phonebook.pop("John")

#Generators----------------------------------------------------------
#Generators are simple functions which return an iterable set of items, one at a time, in a special way.
import random
def lottery():
    # returns 6 numbers between 1 and 40
    for i in xrange(6):
        yield random.randint(1, 40)
    # returns a 7th number between 1 and 15
    yield random.randint(1,15)

for random_number in lottery():
    print "And the next number is... %d!" % random_number

#List Comprehensions--------------------------------------------------
#creates a new list based on another list, in a single, readable line
sentence = "the quick brown fox jumps over the lazy dog"
words = sentence.split()
word_lengths = [len(word) for word in words if word != "the"]
print words
print word_lengths

#regular expression----------------------------------------------------
#Search and Replace: Some of the most important re methods that use regular expressions is sub.
#re.sub(pattern, repl, string, max=0)
#replace all occurrences of the RE pattern in string with repl, substituting all occurrences unless max provided.
#This method would return modified string.
import re

phone = "2004-959-559          #This is Phone Number"
num = re.sub(r'#.', "", phone)   # removes '#' and the next char -> "2004-959-559          his is Phone Number"
num = re.sub(r'#.*$', "", phone) # removes '#' and everything after it -> "2004-959-559          "
num = re.sub(r'\D', "", phone)   # removes everything except digits -> "2004959559"
print "Phone Num : ", num

#http://www.tutorialspoint.com/python/python_reg_expressions.htm
#Pattern Description
#-----------------------------------------------------------------------------------------------------
^ Matches beginning of line.
$ Matches end of line.
. Matches any single character except newline. Using m option allows it to match newline as well.
[...] Matches any single character in brackets.
[^...] Matches any single character not in brackets
re* Matches 0 or more occurrences of preceding expression.
re+ Matches 1 or more occurrence of preceding expression.
re? Matches 0 or 1 occurrence of preceding expression.
re{ n} Matches exactly n number of occurrences of preceding expression.
re{ n,} Matches n or more occurrences of preceding expression.
re{ n, m} Matches at least n and at most m occurrences of preceding expression.
a| b Matches either a or b.
(re) Groups regular expressions and remembers matched text.
(?imx) Temporarily toggles on i, m, or x options within a regular expression. If in parentheses, only that area is affected.
(?-imx) Temporarily toggles off i, m, or x options within a regular expression. If in parentheses, only that area is affected.
(?: re) Groups regular expressions without remembering matched text.
(?imx: re) Temporarily toggles on i, m, or x options within parentheses.
(?-imx: re) Temporarily toggles off i, m, or x options within parentheses.
(?#...) Comment.
(?= re) Specifies position using a pattern. Doesn't have a range.
(?! re) Specifies position using pattern negation. Doesn't have a range.
(?> re) Matches independent pattern without backtracking.
\w Matches word characters.
\W Matches nonword characters.
\s Matches whitespace. Equivalent to [ \t\n\r\f].
\S Matches nonwhitespace.
\d Matches digits. Equivalent to [0-9].
\D Matches nondigits.
\A Matches beginning of string.
\Z Matches end of string. If a newline exists, it matches just before newline.
\z Matches end of string.
\G Matches point where last match finished.
\b Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets.
\B Matches nonword boundaries.
\n, \t, etc. Matches newlines, carriage returns, tabs, etc.
\1...\9 Matches nth grouped subexpression.
\10 Matches nth grouped subexpression if it matched already. Otherwise refers to the octal representation of a character code.
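A few of the patterns above, exercised with Python's re module (the sample strings are invented):

```python
import re

assert re.search(r"^Hello", "Hello world")             # ^ anchors to start
assert re.search(r"world$", "Hello world")             # $ anchors to end
assert re.findall(r"\d+", "2004-959-559") == ["2004", "959", "559"]
assert re.search(r"a|b", "cab").group() == "a"         # alternation
assert re.match(r"(re)\1", "rere")                     # \1 backreference
assert not re.search(r"^[^#]", "# comment")            # negated set
```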


File handler and line counts
f = open('received/20000402172012MARK0OUT.DAT', 'r')
s = open('received/20000402172012MARK0.NA', 'r')
num_lines_f = sum(1 for line in f)
num_lines_s = sum(1 for line in s)
print num_lines_f + num_lines_s

Advantage database commands

Permanently delete records from a table
execute procedure sp_zaptable 
('T:\studies\82289\data_extract\extract.dbf')
http://devzone.advantagedatabase.com/dz/webhelp/advantage8.1/supported_statements/miscellaneous_functions.htm

Select column, data type and status from a table
select name, field_type, field_can_be_null from System.columns
where parent = 'rec_sample_test'
order by 2,1


Merge tables

MERGE web_sample_test p1 USING "//loki/web/secure/82124/cust/US/live/rec/study.add".sample p2 
ON ( p1.sam_no = p2.sam_no and p1.waveno =p2.waveno) 
WHEN MATCHED THEN 
UPDATE SET p1.surname = p2.surname, p1.forename = p2.forename 
WHEN NOT MATCHED THEN 
INSERT (sam_no, waveno, forename, surname) VALUES (p2.sam_no, p2.waveno, p2.forename, p2.surname)


Advantage database server index
http://devzone.advantagedatabase.com/dz/webhelp/Advantage10.1/advantage_kwindex_static.html

PostgreSQL 9.2 Volume 2 Part ii

Chapter 11. Indexes
If there are many rows in a table and only a few rows (perhaps zero or one) would be returned by a query, scanning the whole table is clearly an inefficient method. But if the system has been instructed to maintain an index on the id column, it can use a more efficient method for locating matching rows. For instance, it might only have to walk a few levels deep into a search tree.
CREATE INDEX test1_id_index ON test1 (id);
B-tree indexes can handle equality and range queries on data involved in a comparison using one of these operators: <, <=, =, >=, >, BETWEEN, IN, IS NULL, IS NOT NULL, and pattern matching (LIKE). B-tree indexes can also be used to retrieve data in sorted order. This is not always faster than a simple scan and sort, but it is often helpful.
Hash indexes can only handle simple equality comparisons, i.e. columns involved in a comparison using the = operator. 
CREATE INDEX name ON table USING hash (column);
Multi column Indexes 
CREATE INDEX test2_mm_idx ON test2 (major, minor);
http://www.postgresql.org/docs/9.1/interactive/indexes.html
Chapter 12. Full Text Search
http://www.postgresql.org/docs/9.1/interactive/index.html
Chapter 13. Concurrency Control
data consistency is maintained by using a multiversion model (Multiversion Concurrency Control, MVCC). This means that while querying a database each transaction sees a snapshot of data as it was some time ago, regardless of the current state of the underlying data. This protects the transaction from viewing inconsistent data that could be caused by (other) concurrent transaction updates on the same data rows, providing transaction isolation for each database session.
The main advantage of the MVCC model of concurrency control over locking is that in MVCC, locks acquired for querying (reading) data do not conflict with locks acquired for writing data; this holds even when providing the strictest level of transaction isolation through the use of an innovative Serializable Snapshot Isolation (SSI) level.

13.2. Transaction Isolation

The SQL standard defines four levels of transaction isolation. The most strict is Serializable. The phenomena which are prohibited at the various levels are:
Dirty read - A transaction reads data written by a concurrent uncommitted transaction.
Non-repeatable read - A transaction re-reads data it has previously read and finds that data has been modified by another transaction (that committed since the initial read).
Phantom read - A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction.
The four transaction isolation levels and the corresponding behaviors are described in Table 13-1.
Transaction Isolation Levels

Isolation Level    | Dirty Read   | Nonrepeatable Read | Phantom Read
Read uncommitted   | Possible     | Possible           | Possible
Read committed     | Not possible | Possible           | Possible
Repeatable read    | Not possible | Not possible       | Possible
Serializable       | Not possible | Not possible       | Not possible
13.3. Explicit Locking
http://www.postgresql.org/docs/9.1/interactive/explicit-locking.html 
Chapter 14. Performance Tips
14.1. Using EXPLAIN

PostgreSQL devises a query plan for each query it receives. You can use the EXPLAIN command to see what query plan the planner creates for any query. 

The structure of a query plan is a tree of plan nodes. The first line (topmost node) has the estimated total execution cost for the plan; it is this number that the planner seeks to minimize.

EXPLAIN SELECT * FROM tenk1;
                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244)
The numbers that are quoted by EXPLAIN are (left to right): 
  • Estimated start-up cost (time expended before the output scan can start, e.g., time to do the sorting in a sort node) 
  • Estimated total cost (if all rows are retrieved, though they might not be; e.g., a query with a LIMIT clause will stop short of paying the total cost of the Limit plan node's input node) 
  • Estimated number of rows output by this plan node (again, only if executed to completion) 
  • Estimated average width (in bytes) of rows output by this plan node 
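PostgreSQL's EXPLAIN needs a running server, but the same idea can be tried from Python with SQLite's EXPLAIN QUERY PLAN (a sketch; the output format is SQLite's, not PostgreSQL's, and the table is empty and invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tenk1 (unique1 INTEGER, stringu1 TEXT)")

# Without an index the planner can only scan the whole table
plan = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM tenk1 "
                    "WHERE unique1 = 42").fetchall()
assert "SCAN" in plan[0][-1].upper()

# After indexing the column, the planner switches to an index search
conn.execute("CREATE INDEX tenk1_unique1 ON tenk1 (unique1)")
plan = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM tenk1 "
                    "WHERE unique1 = 42").fetchall()
assert "INDEX" in plan[0][-1].upper()
```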

14.4. Populating a Database

One might need to insert a large amount of data when first populating a database.

** Disable Auto commit

When using multiple INSERTs, turn off autocommit and just do one commit at the end. An additional benefit of doing all insertions in one transaction is that if the insertion of one row were to fail then the insertion of all rows inserted up to that point would be rolled back, so you won't be stuck with partially loaded data.
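A sketch of the single-transaction pattern from Python, using sqlite3 for illustration (the table and row count are invented; the same idea applies to PostgreSQL drivers):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (v INTEGER)")
rows = [(i,) for i in range(10000)]

# One transaction around all the INSERTs: a single commit at the end,
# and a failure anywhere rolls the whole batch back
with conn:  # commits on success, rolls back on exception
    conn.executemany("INSERT INTO measurements VALUES (?)", rows)

count, = conn.execute("SELECT COUNT(*) FROM measurements").fetchone()
assert count == 10000
```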

** Use COPY
Use COPY to load all the rows in one command, instead of using a series of INSERT commands. The COPY command is optimized for loading large numbers of rows; it is less flexible than INSERT, but incurs significantly less overhead for large data loads. Since COPY is a single command, there is no need to disable autocommit if you use this method to populate a table.
COPY is fastest when used within the same transaction as an earlier CREATE TABLE or TRUNCATE command. In such cases no WAL needs to be written, because in case of an error, the files containing the newly loaded data will be removed anyway. However, this consideration only applies when wal_level is minimal as all commands must write WAL otherwise.
** Remove Indexes
If you are loading a freshly created table, the fastest method is to create the table, bulk load the table's data using COPY, then create any indexes needed for the table. Creating an index on pre-existing data is quicker than updating it incrementally as each row is loaded.
If you are adding large amounts of data to an existing table, it might be a win to drop the indexes, load the table, and then recreate the indexes. Of course, the database performance for other users might suffer during the time the indexes are missing. One should also think twice before dropping a unique index, since the error checking afforded by the unique constraint will be lost while the index is missing.
** Remove Foreign Key Constraints
Just as with indexes, a foreign key constraint can be checked "in bulk" more efficiently than row-by-row. So it might be useful to drop foreign key constraints, load the data, and re-create the constraints.
Also, when you load data into a table with existing foreign key constraints, each new row requires an entry in the server's list of pending trigger events (since it is the firing of a trigger that checks the row's foreign key constraint). Loading many millions of rows can cause the trigger event queue to overflow available memory, leading to intolerable swapping or even outright failure of the command. An alternative method is to split up the load operation into smaller transactions.

** Increase maintenance_work_mem

Temporarily increase the maintenance_work_mem configuration variable. This will help to speed up CREATE INDEX commands and ALTER TABLE ADD FOREIGN KEY commands. It won't do much for COPY itself, so this advice is only useful when you are using one or both of the above techniques.

** Increase checkpoint_segments

Temporarily increase the checkpoint_segments configuration variable. Loading a large amount of data into PostgreSQL will cause checkpoints to occur more often than the normal checkpoint frequency (specified by the checkpoint_timeout configuration variable). Whenever a checkpoint occurs, all dirty pages must be flushed to disk. By increasing checkpoint_segments temporarily during bulk data loads, the number of checkpoints that are required can be reduced.
** Run ANALYZE Afterwards
Whenever you have significantly altered the distribution of data within a table, running ANALYZE is strongly recommended; this ensures that the planner has up-to-date statistics about the table. With no statistics or obsolete statistics, the planner might make poor decisions during query planning. Note that if the autovacuum daemon is enabled, it might run ANALYZE automatically.



Friday, February 17, 2012

Regular Expressions

Regular Expressions
. (dot) Any single character except newline
* Zero or more occurrences of the preceding character
[...]        Any single character specified in the set
[^...]        Any single character not specified in the set
^ Anchor - beginning of the line
$ Anchor - end of line
\< Anchor - beginning of word
\> Anchor - end of word
\(...\)  Grouping - usually used to group conditions
\n Contents of nth grouping

[...]         Set Examples
[A-Z] The SET from Capital A to Capital Z
[a-z]   The SET from lowercase a to lowercase z
[0-9]         The SET from 0 to 9 (All numerals)
[./=+] The SET containing . (dot), / (slash), =, and +
[-A-F] The SET from Capital A to Capital F and the dash (dashes must be specified first)
[0-9 A-Z] The SET containing all capital letters and digits and a space
[A-Z][a-zA-Z] In the first position, the SET from Capital A to Capital Z

Regular Expression Examples
/Hello/ Matches if the line contains the value Hello
/^TEST$/   Matches if the line contains TEST by itself
/^[a-zA-Z]/ Matches if the line starts with any letter
^.*SET.*$  Select whole line which start with "SET"
/^[a-z].*/    Matches if the first character of the line is a-z and there is at least one more of any character following it
"^.* as "     Selects the phrase from the beginning of the line up to the word "as". Can be used to replace "xxxx as c_q6" with "c_q6"
/2134$/ Matches if line ends with 2134
/\(21|35\)/   Matches if the line contains 21 or 35
/[0-9]*/         Matches if there are zero or more numbers in the line
/^[^#]/        Matches if the first character is not a # in the line
\r\n[^\r\nO]  Matches a line break followed by a character that is not \r, \n, or O (i.e. the next line does not start with O)

Notes:
1. Regular expressions are case sensitive
2. Regular expressions are to be used where pattern is specified

Tuesday, February 14, 2012

Bash command with parameters to run PDI transformation

fatigue_export.sh file
#!/bin/bash
echo "Enter the name of the sample file(eg CVAR_Good_20110104):"
read sample_file   #sample_file is the variable which picks up the entered value
echo "Enter the wave (eg 1):"
read wave_no       #wave_no is the variable
/home/kettle4/pan.sh -file=/home/pdi_transforms/ibmcvar/fatigue/pdi_transforms/fatigue_export.ktr -norep -param:FILENAME=$sample_file -param:WAVE=$wave_no -XX:-UseGCOverheadLimit
PDI Transformation properties (Ctrl + T)
Define the parameters name in transformation properties and keep the default values empty
windows batch file with few validations
@ECHO OFF
ECHO  Enter your job number %1
SET /P userin=

IF "%userin%"=="" (
 ECHO ----------------------------------
 echo "You should enter a job number..."
 ECHO ----------------------------------
 pause
 exit
)
IF NOT EXIST Q:\secure\%userin% (
 ECHO %userin% Job directory doesn't exist in Q:\secure\
 ECHO -----------------------------
 ECHO Please enter valid job number
 ECHO -----------------------------
 pause
 exit
)
IF EXIST Q:\secure\%userin% (
 ECHO ------------------------------------------------- 
 ECHO Maximum sample numbers are being calculated......
 ECHO This may take few minutes. please wait...........
 L:\kettle4.2\Kitchen.bat /file="W:\Panelsample_iss\max_sample_no\max_sample_no.kjb" "-param:jobno=%userin%" > nul
 ECHO -------------------------------------------------  
 ECHO -------------------------------------------------
 ECHO Please check max_sample_nos_%userin% file inside 
 IF EXIST w:\%userin%\Sample\ToLoad (
  echo w:\%userin%\Sample\ToLoad
  pause
 ) else ( 
  echo w:\%userin%US\Sample\ToLoad directory
  pause
 )
)