iii. Replacing/Removing Characters In a String: REPLACE
SELECTdescriptionFROM film WHERE film_id =133;/*A Fateful Reflection of a Moose And a Husband who must Overcome a Monkey in Nigeria*/SELECTREPLACE(description, 'Moose', 'Rhino') ASdescriptionFROM film WHERE film_id =133/*A Fateful Reflection of a Rhino And a Husband who must Overcome a Monkey in Nigeria*/-- Remove white space in between stringsSELECT title,REPLACE(title, ' ','') AS title_formated,FROM film;/*title title_formatedChamber Italian ChamberItalianGrosse Wonderful GrosseWonderfulAirport Pollock AirportPollockBright Encounters BrightEncountersAcademy Dinosaur AcademyDinosaur...*/
iv. Removing Leading/Trailing White Spaces or any Special Characters from a String: TRIM, LTRIM, RTRIM, BTRIM
-- remove leading and trailing whitespaces from a stringSELECTLTRIM(' basketball '),RTRIM(' basketball '), BTRIM(' basketball ')/*"basketball ", " basketball", "basketball"*/-- remove special characters from a stringSELECTTRIM(LEADING'('FROM'(checked)'),TRIM(TRAILING')'FROM'(checked)'),TRIM(BOTH'()'FROM'(checked)')/*ltrim rtrim btrimchecked) (checked checked*/-- OR (following functions are faster in runtime)SELECTLTRIM('(checked)', '('),RTRIM('(checked)', ')'), BTRIM('(checked)', '()')/*ltrim rtrim btrimchecked) (checked checked*/
v. Padding Strings: LPAD, RPAD
SELECT title,-- LAPD(strinig, total_char_count, char2padwith) LPAD(title, 20, '*') AS title_formated1, RPAD(title, 20, '*') AS title_formated2FROM film;/*title title_formated1 title_formated2Chamber Italian *****Chamber Italian Chamber Italian*****Grosse Wonderful ****Grosse Wonderful Grosse Wonderful****Airport Pollock *****Airport Pollock Airport Pollock*****Bright Encounters ***Bright Encounters Bright Encounters***Academy Dinosaur ****Academy Dinosaur Academy Dinosaur****...*/
vi. Reversing Characters In a String: REVERSE
SELECT title,REVERSE(title) AS title_reversedFROM film;/*title title_reversedChamber Italian nailatI rebmahCGrosse Wonderful lufrednoW essorGAirport Pollock kcolloP tropriA...*/
B. Parsing Strings
i. Parsing Strings from Left, Right or Anywhere: LEFT, RIGHT, SUBSTR
SELECT-- LEFT(string, numberofcharacters),LEFT(description, 15),-- RIGHT(string, numberofcharacters),RIGHT(description, 15),-- SUBSTR(string, position2start, numberofcharacters) SUBSTR(description, 5, 10)FROM film;/*left right substrA Fateful Refle nkey in Nigeria teful ReflA Epic Drama of se in Australia ic Drama oA Epic Tale of n Ancient India ic Tale of...*/
ii. Splitting a String: SPLIT_PART
SELECT-- SPLIT_PART(string,'delimiter', <field_number>) title, SPLIT_PART(title,' ',1) AS title_p1, SPLIT_PART(title,' ',2) AS title_p2FROM film;/*title title_p1 title_p2Chamber Italian Chamber ItalianGrosse Wonderful Grosse WonderfulAirport Pollock Airport PollockBright Encounters Bright EncountersAcademy Dinosaur Academy Dinosaur...*/
iii. Finding the Length of a String: LENGTH, CHAR_LENGTH
/* While 'LENGTH' function measures the length in bytes, 'CHAR_LENGTH' function measures the length of a string in characters. */SELECT title,LENGTH(title) AS len1, CHAR_LENGTH(title) AS len2FROM film;/*title len1 len2Chamber Italian 15 15Grosse Wonderful 16 16Airport Pollock 15 15Bright Encounters 17 17Academy Dinosaur 16 16*/
iv. Finding the position/indexing of a Character in a String: POSITION - STRPOS
/* POSITION and STRPOS functions perform the same way, but they differ in syntax.Reason why they both exist and differ only in syntax is that POSITION(str1 IN str2) is defined by ANSI SQL standard. If PostgreSQL had only strpos() it wouldn't be able to run ANSI SQL queries and scripts */SELECT email,-- POSITION('A' IN string) POSITION('@'IN email),-- STRPOS(string, 'A') STRPOS(email,'@')FROM customer;/*email position strpostjared.ely@sakilacustomer.org 10 10mary.smith@sakilacustomer.org 11 11patricia.johnson@sakilacustomer.org 17 17linda.williams@sakilacustomer.org 15 15barbara.jones@sakilacustomer.org 14 14*/
v. Extracting Substring Upto/From a Specific Character: SUBSTRING
Note that SUBSTR()takes the length of the substring to be returned, while SUBSTRING()takes end index (excluding) for a substring.
/*The same output can be achieved using SPLIT_PART() funtion*/SELECT email, SPLIT_PART(email,'@',1), SPLIT_PART(email,'@',2)FROM customer;/*email substring substring-2jared.ely@sakilacustomer.org jared.ely sakilacustomer.orgmary.smith@sakilacustomer.org mary.smith sakilacustomer.orgpatricia.johnson@sakilacustomer.org patricia.johnson sakilacustomer.org...*/
C. Strings to Array
Adding strings to arrays (an array is a collection of elements of the same data type, arranged in a fixed-size sequence) is very useful when we would like to answer questions such as
what is the number of words in a movie's description?
SELECTdescription, ARRAY_LENGTH(STRING_TO_ARRAY(description, ','), 1) AS word_countsFROM film;/*The same result can be reached via:SELECT description, LENGTH(description) - LENGTH(REPLACE(description,',','')) + 1 AS word_countsFROM film*//* OUTPUT for the first 4 rows: description | word_counts ---------------------------------------------------------------------------------------------+------------- A Fateful Reflection of a Moose And a Husband who must Overcome a Monkey in Nigeria | 16 A Epic Drama of a Cat And a Explorer who must Redeem a Moose in Australia | 16 A Epic Tale of a Moose And a Girl who must Confront a Monkey in Ancient India | 17 A Fateful Yarn of a Lumberjack And a Feminist who must Conquer a Student in A Jet Boat | 18 */
STRING_TO_ARRAY(words, ','): This function splits the comma-separated string in the 'description' column into an array.
ARRAY_LENGTH(..., 1): It calculates the length of the array, giving you the count of words in the 'description' column.
what the is the word frequency in the description of a certain movie?
WITH words_only AS (SELECTLOWER(UNNEST(STRING_TO_ARRAY(description,' '))) AS words FROM film WHERE film_id =384)SELECT words, COUNT(*) FROM words_onlyGROUP BY1ORDER BY2DESC, 1ASC;/* OUTPUT words | word_count -----------+------------ a | 4 and | 1 australia | 1 cat | 1 drama | 1 epic | 1 explorer | 1 in | 1 moose | 1 must | 1 of | 1 redeem | 1 who | 1 */
-- Extracting numbers with 10digit longSELECT(REGEXP_MATCHES(string, '\d{10}'))[1] -- note the () and the index[1]REGEXP_SUBSTR(string, '[0-9]{10}') -- PostgreSQL version 15 and up-- String RemovalSELECTREGEXP_REPLACE('ABC12345xyz','[[:alpha:]]','','g'); --removes all the lettersREGEXP_REPLACE('abc1de2f3g432hi45j','[[:digit:]]','','g') --removes all the digitsREGEXP_REPLACE('you, are s?per smart!!!','[[:punct:]]','','g') --removes all the punctuation
3. Full-Text Search (Advanced - Postgres)
Full-text search offers a method for conducting natural language queries on text data. It utilizes stemming, fuzzy string matching to address spelling errors, and a ranking mechanism to order results based on similarity to the search string.
In order to perform Full text search in Postgres, we use two built-in functions: to_tsvector and to_tsquery, and the match operator (@@). Those functions transform text and string data into a tsvector (text search vector) data type, which represents a sorted list of words normalized into various forms of the same word, i.e. variants called lexemes.
A Fast-Paced Drama of a Student And a Crocodile who must Meet a Database Administrator in The Outback
INDIAN LOVE
A Insightful Saga of a Mad Scientist And a Mad Scientist who must Kill a Astronaut in An Abandoned Fun House
LAWRENCE LOVE
A Fanciful Yarn of a Database Administrator And a Mad Cow who must Pursue a Womanizer in Berlin
GRAFFITI LOVE
A Unbelievable Epistle of a Sumo Wrestler And a Hunter who must Build a Composer in Berlin
LOVE SUICIDES
A Brilliant Panorama of a Hunter And a Explorer who must Pursue a Dentist in An Abandoned Fun House
LOVELY JINGLE
A Fanciful Yarn of a Crocodile And a Forensic Psychologist who must Discover a Crocodile in The Outback
Note that the LIKE operator with the % wildcard at the beginning and end of the string may or may not return the same result depending on the string. For instance, in the following query we will see that the results are not the same:
SELECT title, descriptionFROM film-- WHERE to_tsvector(title) @@ to_tsquery('love');WHERE title LIKE'%LOVE%';
Output:
title
description
IDAHO LOVE
A Fast-Paced Drama of a Student And a Crocodile who must Meet a Database Administrator in The Outback
INDIAN LOVE
A Insightful Saga of a Mad Scientist And a Mad Scientist who must Kill a Astronaut in An Abandoned Fun House
LAWRENCE LOVE
A Fanciful Yarn of a Database Administrator And a Mad Cow who must Pursue a Womanizer in Berlin
GRAFFITI LOVE
A Unbelievable Epistle of a Sumo Wrestler And a Hunter who must Build a Composer in Berlin
IDENTITY LOVER
A Boring Tale of a Composer And a Mad Cow who must Defeat a Car in The Outback
LOVE SUICIDES
A Brilliant Panorama of a Hunter And a Explorer who must Pursue a Dentist in An Abandoned Fun House
LOVELY JINGLE
A Fanciful Yarn of a Crocodile And a Forensic Psychologist who must Discover a Crocodile in The Outback
LOVER TRUMAN
A Emotional Yarn of a Robot And a Boy who must Outgun a Technical Writer in A U-Boat
LOVERBOY ATTACKS
A Boring Story of a Car And a Butler who must Build a Girl in Soviet Georgia
STRANGELOVE DESIRE
A Awe-Inspiring Panorama of a Lumberjack And a Waitress who must Defeat a Crocodile in An Abandoned Amusement Park