5

I have a column in my Postgresql database named Description which contains text. For some analytics purpose, I want to find total number of words (seperated by spaces) - all words and unique words - in that description and set those counts in columns word_count and unique_word_count. For eg.

id Description
1 What a great day
2 This is a product. It is useful

I want to have following output:

id word_count unique_word_count Description
1 4 4 What a great day
2 7 6 This is a product. It is useful

The unique_word_count for id = 2 is 6 because the word is has been repeated 2 times.

Chan Guan Yu
  • 47
  • 1
  • 6
Pranjal
  • 155
  • 1
  • 2
  • 5

1 Answers1

6

EDIT:

Not being 100% happy with my answer, I awarded a bonus for a question of my own on this topic here.

Erwin Brandstetter (as usual) provided a sublime answer - please use that instead of what I wrote below - unless your use case is very simple and even then...

==============================================================

To solve your issue I did the following:

CREATE TABLE wordcount (id SERIAL NOT NULL PRIMARY KEY, description TEXT NOT NULL);

Populate it:

INSERT INTO wordcount (description) VALUES ('What a great day');
INSERT INTO wordcount (description) VALUES ('This is a product. It is useful');

Then performed the following query (fiddle available here):

WITH arranged AS
(
  SELECT id, 
  UNNEST
  (
    STRING_TO_ARRAY
    (
      REGEXP_REPLACE(description,  '[^\w\s]', '', 'g'), ' '
    )
  ) AS word, 
  description 
  FROM wordcount
)  
SELECT a.id, COUNT(a.word), COUNT(DISTINCT(a.word)), a.description
FROM arranged a
GROUP BY a.id, a.description;

Result:

id Word_Count Unique_Word_Count Description
1 4 4 What a great day
2 7 6 This is a product. It is useful

If you're new to all this, I would advise you to go through the fiddle here which shows the development of my thought processes which led to the query above.

I would advise you to put this logic into an onInsert/onUpdate trigger - it's probably fairly computationally intensive.

Vérace
  • 30,923
  • 9
  • 73
  • 85