
[Java] Removing dangerous tags from HTML



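import java.io.StringWriter;
import java.net.URI;
import java.net.URISyntaxException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.xpath.XPathAPI; // Xalan's XPathAPI (assumed; the original post shows no imports)
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/**
 * Removes dangerous tags (script and the like) from an HTML page.
 *
 * @param url URL of the HTML page to clean
 * @return the cleaned HTML text, or null if an error occurred
 */
static public String replace_unsafe_tags(String url){
    try{
        // Parsing the URL up front rejects malformed input with URISyntaxException.
        URI base_uri = new URI(url);
        // createHTMLDocument is a separate helper that is not shown in this post;
        // it is expected to load the page and return it as a DOM Document.
        Document doc = createHTMLDocument(url);

        //--- Remove dangerous tags ---//
        String[] tags = {"//SCRIPT","//BODY[@onload]",
            "//PLAINTEXT","//XMP","//XML","//LISTING","//ISINDEX","//OBJECT[@classid='#']"};
        for(int i=0; i<tags.length; i++){
            // Select every node matching the XPath expression, then detach each one.
            NodeList nl = XPathAPI.selectNodeList(doc,tags[i]);
            for(int t=0; t<nl.getLength(); t++){
                Node n = nl.item(t);
                Node pn = n.getParentNode();
                pn.removeChild(n);
            }
        }

        //--- Convert the DOM to a String and return it ---//
        TransformerFactory tfactory = TransformerFactory.newInstance();
        Transformer transformer = tfactory.newTransformer();
        StringWriter sw = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(sw));
        return sw.toString();
    }catch(URISyntaxException e){
        System.err.println("replace_unsafe_tags:URISyntaxException:"+url);
    }catch(TransformerException e){
        System.err.println("replace_unsafe_tags:TransformerException:"+url);
    }
    return null;
}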
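
Since createHTMLDocument is not shown, here is a minimal self-contained sketch of the same idea that can be run as-is. It is not the original method: it assumes the input is already well-formed markup passed in as a string, parses it with the JDK's DocumentBuilder, and uses the JDK's javax.xml.xpath API in place of XPathAPI. The class name UnsafeTagDemo and the method stripUnsafe are made up for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class UnsafeTagDemo {

    // Hypothetical helper: removes the same kinds of nodes from a
    // well-formed markup string (an assumption; real HTML usually needs
    // a tolerant HTML parser rather than an XML parser).
    static String stripUnsafe(String markup) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));

        // Same XPath expressions as in the post.
        String[] tags = {"//SCRIPT", "//BODY[@onload]",
            "//PLAINTEXT", "//XMP", "//XML", "//LISTING", "//ISINDEX",
            "//OBJECT[@classid='#']"};

        XPath xpath = XPathFactory.newInstance().newXPath();
        for (String expr : tags) {
            NodeList nl = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            for (int t = 0; t < nl.getLength(); t++) {
                Node n = nl.item(t);
                Node pn = n.getParentNode();
                if (pn != null) {
                    pn.removeChild(n); // detach the matched node and its subtree
                }
            }
        }

        // Serialize the DOM back to a String with an identity transform.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        StringWriter sw = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        String in = "<HTML><BODY><P>hello</P><SCRIPT>alert(1)</SCRIPT></BODY></HTML>";
        System.out.println(stripUnsafe(in));
    }
}
```

Two things to keep in mind: XPath name tests are case-sensitive, so the uppercase expressions only match if the parser produces uppercase element names, and "//BODY[@onload]" removes the whole BODY element together with its contents, not just the onload attribute.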

by etrojan2006 | 2007-02-14 15:27 | Java  
