Utility to unescape HTML character entities?

Posted: Mon Nov 19, 2012 10:54 am
by ottomatic
Hi,

I am building a WebResourceExtractor for Svtplay.se.

When parsing HTML documents to extract the title info for a web resource, I (naturally) get the titles in HTML format.

So, a show titled "Räksmörgås & annat" would be described as "R&auml;ksm&ouml;rg&aring;s &amp; annat".

Now, I am wondering if there is a utility in any of the Serviio / Groovy namespaces which could help me unescape the character entities?

(It is my understanding that the common way to do this in Java is to use org.apache.commons.lang.StringEscapeUtils.unescapeHtml(). But the org.apache.commons.lang package is not included in a standard Serviio install, as far as I know.)
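
For illustration, a stop-gap that needs no extra jar might look roughly like the sketch below. It is plain Java, so it may need small adjustments inside a Groovy plugin. The class name HtmlUnescape, the method unescape and the small entity table are made up for this example; they are not part of Serviio or commons-lang, and the table covers far fewer named entities than the Apache implementation would.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Serviio or commons-lang.
final class HtmlUnescape {

    // Matches named (&auml;), decimal (&#228;) and hexadecimal (&#xE4;) references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    // Deliberately tiny table (an assumption); extend it with whatever the site emits.
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
        NAMED.put("apos", "'");
        NAMED.put("nbsp", "\u00A0");
        NAMED.put("aring", "\u00E5");
        NAMED.put("Aring", "\u00C5");
        NAMED.put("auml", "\u00E4");
        NAMED.put("Auml", "\u00C4");
        NAMED.put("ouml", "\u00F6");
        NAMED.put("Ouml", "\u00D6");
    }

    private HtmlUnescape() {}

    // Replaces every recognised reference in a single pass, so "&amp;auml;"
    // becomes "&auml;" and is not unescaped a second time.
    static String unescape(String html) {
        Matcher m = ENTITY.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = decode(m.group(1));
            if (replacement == null) {
                replacement = m.group(); // unknown reference: leave it untouched
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the decoded text, or null if the reference is not recognised.
    private static String decode(String body) {
        if (body.charAt(0) != '#') {
            return NAMED.get(body);
        }
        try {
            boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
            int codePoint = hex
                    ? Integer.parseInt(body.substring(2), 16)
                    : Integer.parseInt(body.substring(1), 10);
            if (!Character.isValidCodePoint(codePoint)) {
                return null;
            }
            return new String(Character.toChars(codePoint));
        } catch (NumberFormatException e) {
            return null; // number too large to be a code point
        }
    }
}
```

With this in place, HtmlUnescape.unescape("R&auml;ksm&ouml;rg&aring;s &amp; annat") gives back the plain title "Räksmörgås & annat". References that are not in the table are passed through unchanged, which makes gaps in the table easy to spot in the output.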

Re: Utility to unescape HTML character entities?

Posted: Mon Nov 19, 2012 1:05 pm
by zip
I was looking at it recently too and didn't find anything. Does the Apache library actually deal with these advanced codes, or does it only understand special characters like ampersand, etc.?

Re: Utility to unescape HTML character entities?

Posted: Mon Nov 19, 2012 1:56 pm
by ottomatic
From the source code found at
http://www.docjar.com/html/api/org/apac ... .java.html
and
http://www.docjar.com/html/api/org/apac ... .java.html

It seems that Entities.HTML40.unescape covers just about anything.

Re: Utility to unescape HTML character entities?

Posted: Mon Nov 19, 2012 1:57 pm
by zip
Ok, I'll add the package to 1.1.

Re: Utility to unescape HTML character entities?

Posted: Mon Nov 19, 2012 2:02 pm
by ottomatic
Cool.

So, if I want to reference it in my plugin before the release of 1.1, I presume it is alright to attach a jar file together with the plugin and a proposed FFmpeg wrapper (which will also be necessary for the plugin to work properly before the comma escape bug is fixed)?

Re: Utility to unescape HTML character entities?

Posted: Mon Nov 19, 2012 2:19 pm
by zip
Yes. Obviously it'd be nice if you could try it all in the upcoming beta.

Re: Utility to unescape HTML character entities?

Posted: Mon Nov 19, 2012 2:27 pm
by ottomatic
Will do.

I'll apply for the beta tester section ASAP.

Re: Utility to unescape HTML character entities?

Posted: Mon Nov 19, 2012 3:01 pm
by ottomatic
Petr,

I have now announced the new plugin in this forum thread:
viewtopic.php?f=20&t=8062

You may mark the old plugin as obsolete if you wish.

Regards

/ O