WordCloud creates a messy image

I'm trying to use the built-in WordCloud function (I'm using v10.3.0.0), but the result is not good. I'm using a list of expressions and their respective weights:

Scores={{"políticas ativas de emprego", 0.0105614189853},
{"envolvimento da comissão europeia",0.00847597097115},
{"procedimento de défice excessivo",0.00689428164087},
{"auxílios de estado", 0.00672761334329},
{"mercado interno de energia", 0.00651015550018},
{"cenário de políticas invariantes", 0.00616838495429},
{"termos _per capita_", 0.00528444675534},
{"fundo monetário internacional", 0.005156888605},
{"procedimento por défice excessivo", 0.00446366885149},
{"estado da portugal telecom", 0.00439010129432},
{"rácio de dívida pública", 0.00424150840713},
{"programa de assistência económica", 0.00421705550905},
{"_stock_ de dívida", 0.00373126508612},
{"informação da segurança social", 0.00367081223817},
{"tempo o partido socialista", 0.00363397022484},
{"caso da portugal telecom", 0.00349820427426},
{"_quantitative easing_", 0.00348106383557},
{"área da moeda única", 0.00346640013996},
{"portugal o partido socialista", 0.00334529581534},
{"introdução do quociente familiar",0.00330896979159},
{"banco espírito santo", 0.00326703397878},
{"questões do tribunal constitucional", 0.00289720906606},
{"serviço nacional de saúde", 0.00287941068182},
{"estatuto dos tribunais administrativos",0.00282062904814},
{"rendimento _per capita_", 0.00281631447877},
{"defesa da zona euro",0.002747796651},
{"exemplo o partido socialista", 0.00273072152925},
{"período na união europeia", 0.00250570307651},
{"área da fiscalidade verde",0.00248133473314},
{"governo regional dos açores", 0.00246743326021},
{"pano de fundo", 0.00245993070015},
{"dados da comissão europeia", 0.00245971711069},
{"resposta da segurança social", 0.00245027724487},
{"cimeira da zona euro", 0.00243492258733},
{"conselho de finanças públicas", 0.00234683012794},
{"empresa geral de fomento",0.00234021101473},
{"inação durante demasiados anos", 0.00229753939975},
{"caminho da espiral recessiva", 0.0022651746261},
{"aprofundamento da união económica",0.00225349666112},
{"caso do partido socialista", 0.00224751981847},
{"líder do partido socialista", 0.00222932489691},
{"endossadas pelo conselho europeu",0.00220717043396},
{"caso dos estaleiros navais", 0.0021845584449},
{"economia social de mercado", 0.00217228819124},
{"mundo cor de rosa", 0.00215947175436},
{"mobilidade geográfica dos trabalhadores", 0.00214962058393},
{"bilhetes do tesouro", 0.00214936829988},
{"projeção das contas públicas", 0.00211737006025},
{"_hub_ de lisboa", 0.0021145554114},
{"escola no decreto lei", 0.00209209131064}};

This is the result when using

WordCloud[Scores]

enter image description here

I get a "better" result with:

WordCloud[Scores, WordSpacings -> 20]

enter image description here

but in this case there is too much space between the words. So, how can I make a nice word cloud for this case?

EDIT: This is what I get when using

WordCloud[Scores, WordSpacings -> {10, 8}]

enter image description here

and this is what I wish I had:

enter image description here

It was created using http://www.wordle.net/. I want to recreate it in Mathematica because I want to control the colours, and Wordle's options are quite rigid in that respect.

Miguel

Posted 2016-03-15T15:00:23.360

Reputation: 951

Closely related. Although I'd imagine even those solutions aren't optimised for phrases this long. – Martin Ender – 2016-03-15T15:04:40.947

What about values lower than 20? WordCloud[Scores, WordSpacings -> {10, 8}] looks nice to me – Jason B. – 2016-03-15T15:04:49.937

@MartinBüttner I tried that solution before, but it took a very long time to compute any result. But thanks. – Miguel – 2016-03-15T15:54:26.100

@Miguel, this seems to be a regression: I now get messy clouds with the same code that made clean ones in 10.2, I think. Can you contact WRI tech support? – alancalvitti – 2016-04-15T20:48:41.743

Answers

I get a decent result by using the two-argument form of WordSpacings, where the first number is the spacing along the word's direction (horizontal here) and the second is the spacing perpendicular to it (vertical here). Further, if I give it the same image dimensions as the Wordle version above, the result is very similar:

WordCloud[Scores, WordSpacings -> {10, 3}, ImageSize -> {1181, 498}]

enter image description here

It is also pretty easy to apply a custom color to each word. As far as I can tell, the only argument given to ColorFunction is the weight, so we can't use that very easily, but we can use Style. Here is a list of random colors:

colorlist = RandomColor[Length@Scores]

enter image description here

Now I just apply Style to each word in the list,

WordCloud[
 MapIndexed[{Style[First@#1, colorlist[[#2]]], Last@#1} &, Scores], 
 WordSpacings -> {10, 3}, ImageSize -> {1181, 498}]

enter image description here
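
The same Style approach should carry over to a user-supplied RGB list instead of random colors. The following is only a sketch: `sample` is a three-entry excerpt of Scores, and `myRGB` is a hypothetical palette (neither name is from the original post), with one RGB triple per phrase.

```mathematica
(* Three entries taken from Scores, kept short for illustration. *)
sample = {{"auxílios de estado", 0.00672761334329},
   {"pano de fundo", 0.00245993070015},
   {"bilhetes do tesouro", 0.00214936829988}};

(* Hypothetical palette: one {r, g, b} triple per phrase, values in 0..1. *)
myRGB = {{0.85, 0.10, 0.10}, {0.10, 0.45, 0.80}, {0.10, 0.60, 0.30}};

(* Turn each triple into an RGBColor directive. *)
myColors = RGBColor @@@ myRGB;

(* Pair each phrase with its color; MapThread needs lists of equal length. *)
styled = MapThread[{Style[First[#1], #2], Last[#1]} &, {sample, myColors}];

WordCloud[styled, WordSpacings -> {10, 3}, ImageSize -> {1181, 498}]
```

For the full data set, replace `sample` with Scores and supply as many triples as there are phrases.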

Jason B.

Posted 2016-03-15T15:00:23.360

Reputation: 58 546

Ok, thanks. It is strange, because with that WordSpacings setting the result is still a bit messy. I'm going to edit the question to show this result and what I would like to get (from Wordle). – Miguel – 2016-03-15T15:21:42.253

@Miguel - it matches the Wordle result more closely now, but there's probably a different algorithm underneath. – Jason B. – 2016-03-15T15:43:04.740

Ok, thanks! Much better now. Now I need to colour each word with my own RGB list. Any idea whether that is possible? – Miguel – 2016-03-15T15:52:56.070

@Miguel - as my daughter would say, "easy peasy lemon squeezy" – Jason B. – 2016-03-15T16:06:40.847

Nice! Thank you and your daughter. :) – Miguel – 2016-03-15T16:09:51.907