String Operations

1. Dataset

In this section, we will be using film and customer tables from the dvdrental database ( formally known as Sakila Database) which can be accessed via: https://www.postgresqltutorial.com/postgresql-getting-started/postgresql-sample-database/.

If you like to download and the run examples in your local postgres server, you can follow the instructions here.

2. String Operations

2.1. Formatting Strings

2.1.1. Changing the Case of Strings: UPPER, LOWER, INITCAP

SELECT 
    title,
    UPPER(title) AS title_upper,
    LOWER(title) AS title_lower,
    INITCAP(title) AS title_initial_capitalized
FROM film;
/*
title	            title_upper	        title_lower	    title_initial_capitalized
Chamber Italian	    CHAMBER ITALIAN	chamber italian	    Chamber Italian
Grosse Wonderful    GROSSE WONDERFUL	grosse wonderful    Grosse Wonderful
Airport Pollock	    AIRPORT POLLOCK	airport pollock	    Airport Pollock
Bright Encounters   BRIGHT ENCOUNTERS	bright encounters   Bright Encounters
Academy Dinosaur    ACADEMY DINOSAUR	academy dinosaur    Academy Dinosaur
...
*/

2.1.2. Concatenating Strings: CONCAT, | | (using pipes)

SELECT
-- CONCAT(string1, 'separator', string2)
    CONCAT(film_id,' - ', title) AS film_id_name_concat,
-- string1 || 'separator' || string2
    film_id || ' -- ' || title AS film_id_name_pipe
FROM film;
/*
film_id_name_concat	    film_id_name_pipe
133 - Chamber Italian	    133 -- Chamber Italian
384 - Grosse Wonderful	    384 -- Grosse Wonderful
8 - Airport Pollock	    8 -- Airport Pollock
98 - Bright Encounters	    98 -- Bright Encounters
...
*/

2.1.3. Removing Leading/Trailing White Spaces or any Special Characters from a String: TRIM, LTRIM, RTRIM, BTRIM

-- remove leading and trailing whitespaces from a string
SELECT
    LTRIM('  basketball  '),
    RTRIM('  basketball  '),
    BTRIM('  basketball  ')
/*
"basketball  ",	"  basketball",	"basketball"
*/


-- remove special characters from a string
SELECT
    TRIM(LEADING '(' FROM '(checked)'),
    TRIM(TRAILING ')' FROM '(checked)'),
    TRIM(BOTH '()' FROM '(checked)')
/*
ltrim        rtrim        btrim
checked)     (checked     checked
*/

-- OR (following functions are faster in runtime)
SELECT
    LTRIM('(checked)', '('),
    RTRIM('(checked)', ')'),
    BTRIM('(checked)', '()')
/*
ltrim        rtrim        btrim
checked)     (checked     checked
*/

2.1.4. Padding Strings: LPAD, RPAD


SELECT
    title,
-- LAPD(strinig, total_char_count, char2padwith)
    LPAD(title, 20, '*') AS title_formated1,
    RPAD(title, 20, '*') AS title_formated2
FROM film;
/*
title	                title_formated1	          title_formated2
Chamber Italian	        *****Chamber Italian	  Chamber Italian*****
Grosse Wonderful	****Grosse Wonderful	  Grosse Wonderful****
Airport Pollock	        *****Airport Pollock	  Airport Pollock*****
Bright Encounters	***Bright Encounters	  Bright Encounters***
Academy Dinosaur	****Academy Dinosaur	  Academy Dinosaur****
...
*/

2.1.5. Replacing/Removing Characters In a String: REPLACE

SELECT description 
FROM film 
WHERE film_id = 133;
/*
A Fateful Reflection of a Moose And a Husband who must Overcome a Monkey in Nigeria
*/

SELECT
    REPLACE(description, 'Moose', 'Rhino') AS description
FROM film 
WHERE film_id = 133
/*
A Fateful Reflection of a Rhino And a Husband who must Overcome a Monkey in Nigeria
*/

-- Remove white space in between strings
SELECT
    title,
    REPLACE(title, ' ','') AS title_formated,
FROM film;
/*
title	                title_formated
Chamber Italian	        ChamberItalian
Grosse Wonderful	GrosseWonderful
Airport Pollock	        AirportPollock
Bright Encounters	BrightEncounters
Academy Dinosaur	AcademyDinosaur
...
*/

2.1.6. Reversing Characters In a String: REVERSE

SELECT
    title,
    REVERSE(title) AS title_reversed
FROM film;
/*
title	                title_reversed
Chamber Italian	        nailatI rebmahC
Grosse Wonderful	lufrednoW essorG
Airport Pollock	        kcolloP tropriA
...
*/

2.2. Parsing Strings

2.2.1. Parsing Strings from Left, Right or Anywhere: LEFT, RIGHT, SUBSTR

SELECT
-- LEFT(string, numberofcharacters),
    LEFT(description, 15),
-- RIGHT(string, numberofcharacters),
    RIGHT(description, 15),
-- SUBSTR(string, position2start, numberofcharacters)
    SUBSTR(description, 5, 10)
FROM film;
/*
left			right			substr
A Fateful Refle		nkey in Nigeria		teful Refl
A Epic Drama of		se in Australia		ic Drama o
A Epic Tale of 		n Ancient India		ic Tale of
...
*/

2.2.2. Splitting a String: SPLIT_PART

SELECT 
-- SPLIT_PART(string,'delimiter', <field_number>)
    title,
    SPLIT_PART(title,' ',1) AS title_p1,
    SPLIT_PART(title,' ',2) AS title_p2
FROM film;
/*
title	            title_p1	title_p2
Chamber Italian	    Chamber	Italian
Grosse Wonderful    Grosse	Wonderful
Airport Pollock	    Airport	Pollock
Bright Encounters   Bright	Encounters
Academy Dinosaur    Academy	Dinosaur
...
*/

2.2.3. Finding the Length of a String: LENGTH, CHAR_LENGTH

/* 
While 'LENGTH' function measures the length in bytes, 
'CHAR_LENGTH' function measures the length of a string in characters. 
*/

SELECT
    title,
    LENGTH(title) AS len1,
    CHAR_LENGTH(title) AS len2
FROM film;
/*
title	           len1	len2
Chamber Italian	    15	 15
Grosse Wonderful    16	 16
Airport Pollock	    15	 15
Bright Encounters   17   17
Academy Dinosaur    16	 16
*/

2.2.4. Finding the position/indexing of a Character in a String: POSITION and STRPOS

SELECT
    email,
-- POSITION('A' IN string)
    POSITION('@' IN email),
-- STRPOS(string, 'A')
    STRPOS(email,'@')
FROM customer;

/*
email	                                position    strpost
jared.ely@sakilacustomer.org	        10	    10
mary.smith@sakilacustomer.org	        11	    11
patricia.johnson@sakilacustomer.org	17	    17
linda.williams@sakilacustomer.org	15	    15
barbara.jones@sakilacustomer.org	14	    14
*/

POSITION and STRPOS functions perform the same way, but they differ in syntax.Reason why they both exist and differ only in syntax is that POSITION(str1 IN str2) is defined by ANSI SQL standard. If PostgreSQL had only strpos() it wouldn't be able to run ANSI SQL queries and scripts

2.2.5. Extracting Substring Upto/From a Specific Character: SPLIT_PART and SUBSTRING

/*The same output can be achieved using SPLIT_PART() funtion*/

SELECT
    email,
    SPLIT_PART(email,'@',1),
    SPLIT_PART(email,'@',2)
FROM customer;

/*
email	                               substring          substring-2
jared.ely@sakilacustomer.org	       jared.ely          sakilacustomer.org
mary.smith@sakilacustomer.org	       mary.smith         sakilacustomer.org
patricia.johnson@sakilacustomer.org    patricia.johnson	  sakilacustomer.org
...
*/
SELECT
    email,
    SUBSTRING(email FROM 0 FOR POSITION('@' IN email)),
    SUBSTRING(email FROM POSITION('@' IN email)+1 FOR CHAR_LENGTH(email))
FROM customer;
/*
email	                               substring          substring-2
jared.ely@sakilacustomer.org	       jared.ely          sakilacustomer.org
mary.smith@sakilacustomer.org	       mary.smith         sakilacustomer.org
patricia.johnson@sakilacustomer.org    patricia.johnson	  sakilacustomer.org
...
*/

Note that SUBSTR()takes the length of the substring, e.g. SUBSTR(string, position2start, numberofcharacters), to be returned, while SUBSTRING() takes end index (excluding) for a substring.

2.3. Regular Expressions

-- Extracting numbers with 10digit long
SELECT
(REGEXP_MATCHES(string, '\d{10}'))[1] -- note the () and the index[1]
REGEXP_SUBSTR(string, '[0-9]{10}') --  PostgreSQL version 15 and up


-- String Removal
SELECT
REGEXP_REPLACE('ABC12345xyz','[[:alpha:]]','','g'); --removes all the letters
REGEXP_REPLACE('abc1de2f3g432hi45j','[[:digit:]]','','g') --removes all the digits
REGEXP_REPLACE('you, are s?per smart!!!','[[:punct:]]','','g') --removes all the punctuation

2.4. Strings to Array

Adding strings to arrays (an array is a collection of elements of the same data type, arranged in a fixed-size sequence) is very useful when we would like to answer questions such as

  • how many words are there in a movie's description?

SELECT
  description,
  ARRAY_LENGTH(STRING_TO_ARRAY(description, ','), 1) AS word_counts
FROM film;

/*
The same result can be reached via:
SELECT 
  description,
  LENGTH(description) - LENGTH(REPLACE(description,',','')) + 1 AS word_counts
FROM film
*/

/* OUTPUT for the first 4 rows:
  description                                                                                | word_counts 
---------------------------------------------------------------------------------------------+-------------
 A Fateful Reflection of a Moose And a Husband who must Overcome a Monkey in Nigeria         |          16
 A Epic Drama of a Cat And a Explorer who must Redeem a Moose in Australia                   |          16
 A Epic Tale of a Moose And a Girl who must Confront a Monkey in Ancient India               |          17
 A Fateful Yarn of a Lumberjack And a Feminist who must Conquer a Student in A Jet Boat      |          18
 */
  1. STRING_TO_ARRAY(words, ','): This function splits the comma-separated string in the 'description' column into an array.

  2. ARRAY_LENGTH(..., 1): It calculates the length of the array, giving you the count of words in the 'description' column.

  • what the is the word frequency in the description of a certain movie?

WITH words_only AS (
SELECT LOWER(UNNEST(STRING_TO_ARRAY(description,' '))) AS words 
FROM film 
WHERE film_id = 384)
SELECT words, COUNT(*) 
FROM words_only
GROUP BY 1
ORDER BY 2 DESC, 1 ASC;

/* OUTPUT 
  words   | word_count 
-----------+------------
 a         |          4
 and       |          1
 australia |          1
 cat       |          1
 drama     |          1
 epic      |          1
 explorer  |          1
 in        |          1
 moose     |          1
 must      |          1
 of        |          1
 redeem    |          1
 who       |          1
 */
  1. UNNEST: It expands an array to a set of rows

For other array functions you can visit https://www.postgresql.org/docs/9.1/functions-array.html

2.5. Array to String

The STRING_AGG function in PostgreSQL is an aggregate function that concatenates values from multiple rows into a single string, with a specified delimiter separating the values. This function is particularly useful for creating comma-separated lists or other delimited formats from the results of a query.

Syntax

STRING_AGG(expression, delimiter)
  • expression: The column or expression whose values you want to concatenate.

  • delimiter: The string used to separate the concatenated values.

Example Usage

2.5.1. Basic Example

Let's say we have a table named fruits:

CREATE TABLE fruits (
    name TEXT
);

INSERT INTO fruits (name) VALUES
('apple'), ('banana'), ('cherry'), ('date');

To concatenate the fruit names into a single string separated by commas:

SELECT STRING_AGG(name, ', ') AS fruits_list
FROM fruits;

Result:

fruits_list

apple, banana, cherry, date

2.5.2. Grouping Results

You can use STRING_AGG to concatenate values within groups. Suppose we have another table named sales with categories:

CREATE TABLE sales (
    category TEXT,
    amount NUMERIC
);

INSERT INTO sales (category, amount) VALUES
('Accessory', 10.0), ('Accessory', 20.0), ('Beauty', 30.0), ('Beauty', 40.0), ('Beauty', 50.0);

To concatenate the sales amounts for each category:

SELECT category, STRING_AGG(amount::TEXT, ', ') AS amounts_list
FROM sales
GROUP BY category;

Result:

category
amounts_list

Accessory

10.0, 20.0

Beauty

30.0, 40.0, 50.0

2.5.3. Ordering Results

You can specify an order for the concatenated values using the ORDER BY clause:

SELECT 
    category, 
    STRING_AGG(amount::TEXT, ', ' ORDER BY amount DESC) AS amounts_list
FROM sales
GROUP BY category;

Result:

category
amounts_list

Accessory

20.0, 10.0

Beauty

50.0, 40.0, 30.0

2.5.4. Combining with Other Functions

You can combine string_agg with other functions to manipulate and format data further:

SELECT 
    category, 
    STRING_AGG(amount::TEXT || ' USD', ', ' ORDER BY amount) AS amounts_list
FROM sales
GROUP BY category;

Result:

category
amounts_list

Accessory

10.0 USD, 20.0 USD

Beauty

30.0 USD, 40.0 USD, 50.0 USD

2.6. Changing Data Type to String

TO_CHAR() is a powerful function that allows us to convert other data types into string types according to the given format. For full list of its capabilities visit: https://www.postgresql.org/docs/current/functions-formatting.html

The following queries enable us to convert different date/time numeric data types into string type:

SELECT TO_CHAR('2015-10-06 17:31:14'::timestamp,'Day') 
-- Tuesday

SELECT TO_CHAR(INTERVAL '15h 2m 12s', 'HH24:MI:SS') 
-- 15:02:12

Below is the table including the functions are used to convert a specific date type to another the data type:

-- change date format
SELECT
    create_date,
    TO_CHAR(create_date,'YYYY-MM')
FROM customer;
/*
create_date	to_char
2/14/06    	2006-02
2/14/06	    2006-02
...
*/

3. Full-Text Search (Advanced - Postgres)

Full-text search offers a method for conducting natural language queries on text data. It utilizes stemming, fuzzy string matching to address spelling errors, and a ranking mechanism to order results based on similarity to the search string.

In order to perform Full text search in Postgres, we use two built-in functions: to_tsvector and to_tsquery, and the match operator (@@). Those functions transform text and string data into a tsvector (text search vector) data type, which represents a sorted list of words normalized into various forms of the same word, i.e. variants called lexemes.

To show how full-text search using film table from the dvdrental database (formally known as Sakila Database) which can be accessed via: https://www.postgresqltutorial.com/postgresql-getting-started/postgresql-sample-database/.

SELECT title, description
FROM film
WHERE to_tsvector(title) @@ to_tsquery('love');

Output:

title

description

IDAHO LOVE

A Fast-Paced Drama of a Student And a Crocodile who must Meet a Database Administrator in The Outback

INDIAN LOVE

A Insightful Saga of a Mad Scientist And a Mad Scientist who must Kill a Astronaut in An Abandoned Fun House

LAWRENCE LOVE

A Fanciful Yarn of a Database Administrator And a Mad Cow who must Pursue a Womanizer in Berlin

GRAFFITI LOVE

A Unbelievable Epistle of a Sumo Wrestler And a Hunter who must Build a Composer in Berlin

LOVE SUICIDES

A Brilliant Panorama of a Hunter And a Explorer who must Pursue a Dentist in An Abandoned Fun House

LOVELY JINGLE

A Fanciful Yarn of a Crocodile And a Forensic Psychologist who must Discover a Crocodile in The Outback

Note that the LIKE operator with the % wildcard at the beginning and end of the string may or may not return the same result depending on the string. For instance, in the following query we will see that the results are not the same:

SELECT title, description
FROM film
-- WHERE to_tsvector(title) @@ to_tsquery('love');
WHERE title LIKE '%LOVE%';

Output:

title
description

IDAHO LOVE

A Fast-Paced Drama of a Student And a Crocodile who must Meet a Database Administrator in The Outback

INDIAN LOVE

A Insightful Saga of a Mad Scientist And a Mad Scientist who must Kill a Astronaut in An Abandoned Fun House

LAWRENCE LOVE

A Fanciful Yarn of a Database Administrator And a Mad Cow who must Pursue a Womanizer in Berlin

GRAFFITI LOVE

A Unbelievable Epistle of a Sumo Wrestler And a Hunter who must Build a Composer in Berlin

IDENTITY LOVER

A Boring Tale of a Composer And a Mad Cow who must Defeat a Car in The Outback

LOVE SUICIDES

A Brilliant Panorama of a Hunter And a Explorer who must Pursue a Dentist in An Abandoned Fun House

LOVELY JINGLE

A Fanciful Yarn of a Crocodile And a Forensic Psychologist who must Discover a Crocodile in The Outback

LOVER TRUMAN

A Emotional Yarn of a Robot And a Boy who must Outgun a Technical Writer in A U-Boat

LOVERBOY ATTACKS

A Boring Story of a Car And a Butler who must Build a Girl in Soviet Georgia

STRANGELOVE DESIRE

A Awe-Inspiring Panorama of a Lumberjack And a Waitress who must Defeat a Crocodile in An Abandoned Amusement Park

Last updated