Pages

Monday, January 16, 2012

Appendix F. Additional Supplied Modules

F.15. fuzzystrmatch

F.15.1. Soundex

The Soundex system is a method of matching similar-sounding names by converting them to the same code. (Soundex is not very useful for non-English names). The soundex function converts a string to its Soundex code. The difference function converts two strings to their Soundex codes and then reports the number of matching code positions. Since Soundex codes have four characters, the result ranges from zero to four, with zero being no match and four being an exact match. (Thus, the function is misnamed — similarity would have been a better name.)

SELECT soundex('Anne'), soundex('Ann'), difference('Anne', 'Ann');
     
SELECT soundex('Anne'), soundex('Andrew'), difference('Anne', 'Andrew');
SELECT soundex('Anne'), soundex('Margaret'), difference('Anne', 'Margaret');
    
F.15.3. Metaphone

Metaphone & Double Metaphone are algorithms that produce variable length keys for indexing words by their sound. definitely better compared to SOUNDEX because it is more precise compared to the fixed 4 character code of SOUNDEX. 

Metaphone, like Soundex, is based on the idea of constructing a representative code for an input string. Two strings are then deemed similar if they have the same codes.(source has to be a non-null string with a maximum of 255 characters)

SELECT metaphone('GUMBO', 4);
 metaphone
-----------
 KM
F.15.4. Double Metaphone
The Double Metaphone system computes two "sounds like" strings for a given input string — a "primary" and an "alternate". In most cases they are the same, but for non-English names especially they can be a bit different, depending on pronunciation. These functions compute the primary and alternate codes:

select dmetaphone('gumbo');
 dmetaphone
------------
 KMP

Pass Cursor data through a function

ParentFunction - pass cursor data through function
CREATE OR REPLACE FUNCTION process_original_matched_records()
  RETURNS void AS
$BODY$
DECLARE
    datarec to_process_struct%ROWTYPE;
    datarec1 to_process_struct;

BEGIN
    perform select_records_to_process();
    for datarec in select * from to_process where original_check=1 and puin is not NULL and update_status is NULL LOOP
--RAISE NOTICE 'in process_orginal_matched_records before upsert % ', datarec.puin;
        datarec1 := datarec;
        --RAISE NOTICE 'in process_orginal_matched_records after datarec1=datarec % ', datarec.puin;
        PERFORM upsert_all_tables(datarec, cast(1 as smallint), datarec.puin, cast(1 as smallint));      
    END LOOP;
END;
$BODY$
  LANGUAGE plpgsql VOLATILE;

Child Function - use cursor date from parent table

CREATE OR REPLACE FUNCTION upsert_all_tables(datarec to_process_struct, i_update_status smallint, puin_to_use character, i_statusid smallint)

  RETURNS void AS

$BODY$
DECLARE
   voicephone text; orig_statusid smallint;  update_statusid smallint;verified smallint;
BEGIN
insert into pre_update_data
select ...... m.prim_specialityid, m.sub_specialityid,m.hp_id,c.panelid,now() as update_date, datarec.studyid, datarec.sam_no, i.biz1,i.biz2,i.biz3,i.biz4,i.biz5,i.biz6
from contact c inner join address a on c.panelid=a.panelid left outer join combined_phones cp on cp.panelid=c.panelid
left outer join medical_extra m on m.panelid=c.panelid left outer join it_demogs i on i.panelid=c.panelid
where c.panelid=puin_to_use;
        if datarec.stat_type in ('WN','DS','FX') THEN voicephone := NULL;
        ELSE
            voicephone := datarec.voicephone;
        END IF;
        if datarec.studyid in (select studyid from study inner join survey on study.surveyid=survey.surveyid
inner join project on project.projectid=survey.projectid where project.projectid=424) THEN
if datarec.stat_type='CP' THEN
verified := 1;
ELSE
verified := 0;
END IF;
END IF;
        PERFORM upsert_phone_or_email(puin_to_use, datarec.fax, 5, 'phone');
        PERFORM upsert_phone_or_email(puin_to_use, datarec.deptphone, 6, 'phone');
        
select statusid from contact into orig_statusid where panelid=puin_to_use;
IF FOUND THEN
select assign_statusid(datarec.stat_type, orig_statusid) into update_statusid;
ELSE
update_statusid := i_statusid;
END IF;

PERFORM * from webinc_track where puin=puin_to_use;
IF FOUND THEN
RAISE NOTICE 'Here is a puin in upsert_all_tables before block 1 %', datarec.puin;
PERFORM upsert_contact_details(puin_to_use, datarec.salutation, datarec.forename, datarec.surname, datarec.title, cast(datarec.pxtype as integer), orig_statusid, verified);
RAISE NOTICE 'Here is a puin in upsert_all_tables after block 1 %', datarec.puin;
END IF;
        PERFORM upsert_address_details(puin_to_use, datarec.company, datarec.street, datarec.city, datarec.state, datarec.zipcode, cast(datarec.ccode as smallint));

        END IF;       
        if datarec.pxtype = 1 THEN
            PERFORM upsert_it_demogs(
            puin_to_use, 
            cast(datarec.biz1 as integer),           
        END IF;

        PERFORM upsert_contact_statistics(puin_to_use, datarec.stat_type);
        update interviewed_sample set update_status=i_update_status, update_date=current_date, panelmatchpuin=puin_to_use where studyid=datarec.studyid and sam_no=datarec.sam_no;
END;
$BODY$
  LANGUAGE plpgsql VOLATILE