I am wondering if there is a parser or library in Java for extracting the second-level domain (SLD) from a URL, or failing that, an algorithm or regex for doing the same. For example:
URI uri = new URI("http://www.mydomain.ltd.uk/blah/some/page.html");
String host = uri.getHost();
System.out.println(host);
which prints:
www.mydomain.ltd.uk
Now what I'd like to do is robustly identify the SLD ("ltd.uk") component. Any ideas?
Edit: I'm ideally looking for a general solution, so I'd match ".uk" in "police.uk", ".co.uk" in "bbc.co.uk" and ".com" in "amazon.com".
Thanks
Solution
I don't know your purpose, but the second-level domain may not mean much to you. You probably need to find the public suffix; the domain right below it is what you are looking for.
Apache HttpComponents (HttpClient 4) comes with classes to handle this:
org.apache.http.impl.cookie.PublicSuffixFilter
org.apache.http.impl.cookie.PublicSuffixListParser
You need to download the public suffix list from here,
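To make the idea concrete, here is a minimal sketch of public-suffix matching in plain Java. It does not use the HttpClient classes above. The class name PublicSuffixSketch, its two methods, and the four hard-coded rules are all made up for illustration; a real program would load the full public suffix list, which also contains wildcard and exception rules that this sketch ignores.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class PublicSuffixSketch {

    // Assumption: a tiny hand-picked sample, NOT the real public suffix list.
    private static final Set<String> SUFFIXES =
            new HashSet<>(Arrays.asList("com", "uk", "co.uk", "ltd.uk"));

    /** Returns the longest sample suffix the host ends with, or null if none matches. */
    static String publicSuffix(String host) {
        String candidate = host.toLowerCase(Locale.ROOT);
        // Try the whole host first, then drop the leftmost label each round,
        // so the first hit is the longest matching suffix.
        while (true) {
            if (SUFFIXES.contains(candidate)) {
                return candidate;
            }
            int dot = candidate.indexOf('.');
            if (dot < 0) {
                return null;
            }
            candidate = candidate.substring(dot + 1);
        }
    }

    /** Returns the suffix plus the one label to its left, or null if there is no such label. */
    static String registrableDomain(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        String suffix = publicSuffix(h);
        if (suffix == null || h.length() == suffix.length()) {
            return null;
        }
        // Strip ".suffix" from the end, then keep only the last remaining label.
        String prefix = h.substring(0, h.length() - suffix.length() - 1);
        String label = prefix.substring(prefix.lastIndexOf('.') + 1);
        return label + "." + suffix;
    }

    public static void main(String[] args) {
        String host = URI.create("http://www.mydomain.ltd.uk/blah/some/page.html").getHost();
        System.out.println(publicSuffix(host));       // ltd.uk
        System.out.println(registrableDomain(host));  // mydomain.ltd.uk
    }
}
```

With this sample rule set, "bbc.co.uk" gives "co.uk", "amazon.com" gives "com" and "police.uk" gives "uk", which matches the examples in the question. The real list may classify some of these names differently, so treat the results as a property of the sample rules only.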