The libtld is a library used to extract the TLD from a URI. This allows you to extract the exact domain name, the sub-domains, and all the TLDs (top level, second level, third level, etc.).
The problem with TLDs is that you cannot know where the domain starts. Some domains use one top-level domain, others use two, etc. However, it may be useful to know where the domain is in order to have the exact list of sub-domains. For example, if you want to force www. at the start of the domain name when no other sub-domains are specified, then you need to know exactly how many TLDs are defined in a URI.
The libtld offers one main function: tld(), which gives you a way to extract the TLD from any URI. The result is the offset where the TLD starts. This gives you enough information to extract everything else you need.
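As a rough illustration of how that offset can be used, the following C sketch locates the domain name in front of the TLD and adds www. when no sub-domain is present. It does not call libtld: the offset is assumed to have been obtained from tld() beforehand, and it is assumed to be the index of the period that starts the TLD (11 for www.example.co.uk with the TLD .co.uk). The helpers domain_start() and force_www() are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of libtld.
 * Returns the index where the domain name starts in `host`.
 * `tld_offset` is assumed to be the index of the period that starts
 * the TLD, as reported by a previous call to tld(). */
size_t domain_start(const char *host, size_t tld_offset)
{
    size_t i = tld_offset;

    assert(tld_offset <= strlen(host));

    /* walk backward until the period that ends the sub-domains */
    while (i > 0 && host[i - 1] != '.')
    {
        --i;
    }
    return i;
}

/* Hypothetical helper, not part of libtld.
 * Copies `host` to `out`, prepending "www." when the host has no
 * sub-domain. Returns 0 on success, -1 if `out` is too small. */
int force_www(const char *host, size_t tld_offset, char *out, size_t out_size)
{
    /* a domain starting at index 0 means there are no sub-domains */
    const char *prefix = domain_start(host, tld_offset) == 0 ? "www." : "";
    int n = snprintf(out, out_size, "%s%s", prefix, host);

    return n >= 0 && (size_t) n < out_size ? 0 : -1;
}
```

With the host example.co.uk and an offset of 7, force_www() writes www.example.co.uk to the output buffer. With www.example.co.uk and an offset of 11, the host is copied unchanged because domain_start() returns 4, which means the sub-domain www is already there.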
You can download the TLD Library from SourceForge.net (source, binaries for Linux in .deb, documentation).
If you want to live on the edge you can get the latest source code from git.
Do you have a problem with the library? An idea to improve it? Please post a ticket in the Support area of SourceForge.net.
The library offers:
The development environment requires CMake to generate the Makefiles.
The XML parser makes use of C++ and Qt4. Note that you may compile just the libtld library using dev/libtld-only-CMakeLists.txt, which allows you to skip the Qt4 dependency. Later we will have a better set of CMakeLists.txt files with better tests to automatically avoid dependencies when they are not available.
The library itself does not have any requirements (other than a C or C++ compiler, obviously.) It comes with one header (tld.h).
The PHP extension requires the php5-dev environment (also called Zend) to get compiled.
The libtld documentation is short since the library includes only a few functions. The documentation is available in the References section of this website.
To compile the library, you need cmake. If this is your first time with cmake, do not be afraid, it is very easy to use. Here are the few steps to get the library compiled:
    cd to/directory/with/libtld-1.4.0.tar.gz
    tar xf libtld-1.4.0.tar.gz
    mkdir build
    cd build
    cmake ../libtld-1.4.0
    make
    sudo make install
The last command (sudo make install) is not required if you do not want to install the library on your system. To change the installation path, you can use cmake this way:
cmake -DCMAKE_INSTALL_PREFIX=/absolute/path/to/somewhere/else ../libtld-1.4.0
and now the files will go under /absolute/path/to/somewhere/else.
If warnings appear, edit the main CMakeLists.txt and change the lines setting up the C and CXX flags. The file already offers alternative lines with just -O3. Note that I will look into a better solution for future versions (i.e. a cmake flag that can be used to switch the warnings ON instead of having them ON by default).
The Mozilla Foundation keeps a list of top-level domain names as a text file including comments. The project is called Public Suffix List.
I'm thinking of adding a test that uses that list to check the libtld against URLs generated from it. That way we can easily find discrepancies. From a quick look, it seems more complete, but at the same time, it looks like it may include valid URLs.
The list is used very specifically to handle cookies and to prevent users from assigning a cookie at the wrong level (i.e. you may assign snapwebsites.org as the domain of a cookie, but not just .org; see Supercookie on Wikipedia).
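The cookie rule can be expressed with the same kind of offset that tld() reports. The C sketch below is only an illustration and is part of neither libtld nor the Public Suffix List tools: cookie_domain_allowed() is a hypothetical name, and the offset is assumed to be the index of the period that starts the TLD in the cookie domain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check, not part of libtld.
 * Returns 1 when `domain` has at least one label in front of its TLD
 * and 0 when it is only a TLD (e.g. ".org").
 * `tld_offset` is assumed to be the index of the period that starts
 * the TLD in `domain`. */
int cookie_domain_allowed(const char *domain, size_t tld_offset)
{
    size_t len = strlen(domain);
    size_t first;

    /* the offset must point at a period inside the string */
    if (tld_offset >= len || domain[tld_offset] != '.')
    {
        return 0;
    }

    /* cookie domains may start with a period, as in ".example.org" */
    first = domain[0] == '.' ? 1 : 0;

    /* at least one character must sit between the start and the TLD */
    return tld_offset > first;
}
```

Here snapwebsites.org with an offset of 12 is accepted, while .org with an offset of 0 is rejected because nothing precedes the TLD.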
The idn library includes support for checking TLDs in a string. Their interface is 100% in C and it includes the possibility to have additional definitions (overrides) of the TLD data.
On an Ubuntu system you can install the development library with:
sudo apt-get install libidn11-dev
I have not tested that library at this time. However, I changed the name of the libtld library header from just tld.h to libtld/tld.h to avoid a header conflict (because libidn also calls its TLD header tld.h).
You can find the manual pages by looking up the functions declared in /usr/include/tld.h from the libidn11-dev package and then using man to display the corresponding pages. For example:
It looks like they offer a check of the characters of a domain against the rules defined for each TLD.