Wordcloud substring formula?

How would I extract a new string of specific words from a “wordcloud” using word count as a filter?

For example, say the wordcloud is “pears, apples, orange, apple, apple, apples, oranges, pear, pears, cherry, cherry, mango, lemon, lemons, pineapple, kiwi, Leon, lemons, pear, apple, pineapple, cranberry, strawberry, pear, lemon, lemon, apple, pears, apples, orange, oranges, pear, pears, cherry, cherry, mango, lemon, lemons, pineapple, kiwi, pear, lemon, lemon, apple, pear, pears, cherries, pineapple, lemons, cranberry, strawberry, pear, lemon, lemon, apples, pears”

And the separate “target list” of words is, for example, “pear, cherry, lemon, apple, cranberry”

What formula could I use to extract only the three most frequent words from the target list that appear in the wordcloud (in this case, that would be pear, lemon, apple), while also accounting for plurals such as apple | apples or cherry | cherries?

Thanks!

Is the wordcloud built as a single string or as a list of texts? If it is a list of texts, you could use a nested formula with the following structure:

SLICE(ORDER<counted>(MAP<target>(targetlist, {label: target, count: COUNT(SELECT<wcloud>(wordcloudlist, wcloud==target))}), counted.count, "desc"), 0, 3)

There is a lot going on here. Quick explanation:
We want the top 3 occurrences, so we need to SLICE() items 0 to 3 (index 3 is not included) from a list of objects sorted by count in descending order. SLICE() needs a list, so we build it by ordering our counted targets; that is what ORDER() does here. ORDER() needs a list too, and that is what MAP() provides. For every target text, MAP() checks how many times it appears in the wordcloud list of texts. It does so with the help of SELECT(), which picks the wordcloud items that are equal to the current target. SELECT() returns a list, which COUNT() can count. To keep track of which target has how many matches, we build an object {label: target, count: …}. So MAP() returns a list of objects, each with a label property (the target word) and a count property (how many times that word appeared in wordcloudlist). ORDER() sorts this list, and in the end we have our top 3 objects. Note that wcloud==target is an exact comparison, so as written, plurals such as apples or cherries are not counted toward apple or cherry.
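If it helps to check the logic outside the formula editor, here is a minimal Python sketch of the same count, order, and slice idea. It is not the formula language, and the names singular and top_targets are made up for illustration. It assumes the wordcloud and the target list are comma-separated strings, and it uses a deliberately naive plural rule ("ies" becomes "y", otherwise a trailing "s" is dropped), which would mis-handle words like "glass".

```python
from collections import Counter


def singular(word):
    """Naive singular form (assumption): cherries -> cherry, apples -> apple."""
    w = word.strip().lower()
    if w.endswith("ies"):
        return w[:-3] + "y"
    if w.endswith("s"):
        return w[:-1]
    return w


def top_targets(wordcloud, targets, n=3):
    """Return the n most frequent target words as (label, count) pairs."""
    # Count every wordcloud entry once, after normalizing plurals.
    counts = Counter(singular(w) for w in wordcloud.split(",") if w.strip())
    # MAP step: pair each target with its count (0 if it never appears).
    counted = [(singular(t), counts[singular(t)])
               for t in targets.split(",") if t.strip()]
    # ORDER step: sort by count, descending (ties keep target-list order).
    counted.sort(key=lambda pair: pair[1], reverse=True)
    # SLICE step: keep items 0 up to (not including) n.
    return counted[:n]


sample_cloud = ("pears, apples, orange, apple, cherry, lemon, lemons, "
                "pear, cherries, pear, lemon, kiwi, apple, pear")
sample_targets = "pear, cherry, lemon, apple, cranberry"
print(top_targets(sample_cloud, sample_targets))
# [('pear', 4), ('lemon', 3), ('apple', 3)]
```

The sample string above is a shortened stand-in for the full wordcloud. Counting everything once with Counter avoids rescanning the wordcloud for each target, which is the main difference from the nested SELECT() approach.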

Hope it helps. If needed, I’ll come back to it later. (Sorry for the rough formatting, I’m on my phone right now.)

Sounds good, let me try that and get back to you, thanks.
