libtld Documentation

The libtld project is a library that gives you the capability to determine the TLD part of any Internet URI or email address.

The main function of the library, tld(), takes a URI string and a tld_info structure. From that information it computes the position where the TLD starts in the URI. For email addresses (see the tld_email_list C++ object, or the tld_email.cpp file for the C functions), it breaks down a full list of emails and verifies their syntax as defined in RFC 5322.

For C Programmers

The C functions that you are expected to use are listed here:

For C++ Programmers

For C++ users, please make use of these tld classes: tld_object and tld_email_list.

In C++, you may also call tld_version() to check the current version of the library.

To check whether the version is valid for your tool, you may look at the version handling of the libdebpackages library of the wpkg project. The libtld version is always a Debian compatible version.
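
If pulling in libdebpackages is more than your tool needs, a simplified comparison may be enough. The sketch below assumes the version string is purely numeric and dot-separated (for example "1.4.21"); that is an assumption made for illustration, and it is not the full Debian version algorithm (no epochs, tildes, or letters). The helper name compare_dotted_versions() is hypothetical and not part of libtld.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not part of libtld: compare two purely numeric,
 * dot-separated versions. Returns -1, 0 or 1.
 * Missing components count as 0, so "1.4" equals "1.4.0". */
int compare_dotted_versions(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    while(*a != '\0' || *b != '\0')
    {
        char *end_a;
        char *end_b;
        unsigned long na = strtoul(a, &end_a, 10);
        unsigned long nb = strtoul(b, &end_b, 10);

        if(na != nb)
        {
            return na < nb ? -1 : 1;
        }
        if(end_a == a && end_b == b)
        {
            /* neither side advanced: non-numeric input, stop here */
            break;
        }
        /* skip the '.' separator, if any */
        a = *end_a == '.' ? end_a + 1 : end_a;
        b = *end_b == '.' ? end_b + 1 : end_b;
    }
    return 0;
}
```

A caller could pass the string returned by tld_version() as the first argument and the minimum version it requires as the second.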

For PHP Programmers

At this point I do not have a very good environment to recompile everything for PHP. The main reason is that the library is compiled with cmake as opposed to the automake toolchain that Zend expects.

This being said, the php directory includes all you need to make use of the library under PHP. It works like a charm for me and there should be no reason for you not to be able to do the same with the library.

The way I rebuild everything for PHP:

# from within the libtld directory:
mkdir ../BUILD
(cd ../BUILD; cmake ../libtld)
make -C ../BUILD
cd php

The build script will copy the resulting file where it needs to go using sudo. Your system (Red Hat, Mandrake, etc.) may use su instead. Update the script as required.

Note that libtld is linked statically inside the resulting .so file, so you do not have to install the libtld environment to make everything work as expected.

The resulting functions added to PHP via this extension are:

  • check_tld()
  • check_uri()
  • check_email()

For information about these functions, check out the php/php_libtld.c file, which describes each function, its parameters, and its results in great detail.

Compiling on Other Platforms

We can successfully compile the library under MS-Windows with cygwin and the Microsoft IDE. To do so, we use the CMakeLists.txt file found under the dev directory. Overwrite the CMakeLists.txt file in the main directory before configuring and you'll get a library without having to first compile Qt4.

cp dev/libtld-only-CMakeLists.txt CMakeLists.txt

At this point this configuration only compiles the library. It gives you a shared (.DLL) and a static (.lib) version. With the IDE you may create a debug and a release version.

Later we'll look into having a single CMakeLists.txt so you do not have to make this copy.


We offer a file named example.c that shows you how to use the library in C. It is very simple, one main() function so it is very easy to get started with libtld.

For a C++ example, check out the src/validate_tld.cpp tool which was created as a command line tool coming with the libtld library.

/* TLD library -- TLD example
 * Copyright (C) 2011-2015 Made to Order Software Corp.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the
 * "Software"), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sublicense, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *
 * The above copyright notice and this permission notice shall be included
 * in all copies or substantial portions of the Software.
 */

#include "libtld/tld.h"
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    const char *uri = "WWW.Example.Co.Uk";
    char *uri_lowercase;
    struct tld_info info;
    enum tld_result r;

    if(argc > 1)
    {
        uri = argv[1];
    }

    // if your input may include uppercase characters and you
    // do not have an easy way to compute the lowercase before
    // calling tld(), call the tld_domain_to_lowercase() function
    uri_lowercase = tld_domain_to_lowercase(uri);
    if(uri_lowercase == NULL)
    {
        // the conversion failed, there is nothing to analyze
        return 1;
    }

    r = tld(uri_lowercase, &info);
    if(r == TLD_RESULT_SUCCESS)
    {
        // search backward for the period that precedes the domain name
        const char *s = uri_lowercase + info.f_offset - 1;
        while(s > uri_lowercase)
        {
            if(*s == '.')
            {
                ++s;
                break;
            }
            --s;
        }
        // here uri_lowercase points to your sub-domains, the length is
        // "s - uri_lowercase"
        // if uri_lowercase == s then there are no sub-domains
        // s points to the domain name, the length is "info.f_tld - s"
        // and info.f_tld points to the TLD
        //
        // When TLD_RESULT_SUCCESS is returned the domain cannot be an
        // empty string; also the TLD cannot be empty, however, there
        // may be no sub-domains.
        printf("Sub-domain(s): \"%.*s\"\n", (int)(s - uri_lowercase), uri_lowercase);
        printf("Domain: \"%.*s\"\n", (int)(info.f_tld - s), s);
        printf("TLD: \"%s\"\n", info.f_tld);
        free(uri_lowercase);
        return 0;
    }

    free(uri_lowercase);
    return 1;
}

// vim: ts=4 sw=4 et
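
The backward scan used in the example above can be isolated into a small helper, which makes it easier to test without linking against libtld. The sketch below is only an illustration: split_host() and struct host_parts are hypothetical names, and the code assumes that the caller already knows the offset of the period that begins the TLD (the role played by info.f_offset in the example).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type, not part of libtld. */
struct host_parts
{
    size_t domain_start; /* index of the first character of the domain name */
    size_t tld_start;    /* index of the '.' that begins the TLD */
};

/* Hypothetical helper, not part of libtld: given a lowercase host name and
 * the offset of the period that starts its TLD, find where the domain name
 * starts. Everything before domain_start is the sub-domain part, including
 * its trailing period, as in the example. */
struct host_parts split_host(const char *host, size_t tld_offset)
{
    struct host_parts parts;
    size_t i = tld_offset;

    assert(host != NULL);
    assert(tld_offset > 0 && tld_offset < strlen(host));
    assert(host[tld_offset] == '.');

    /* walk backward until the character before i is a period,
     * or the beginning of the string is reached */
    while(i > 0 && host[i - 1] != '.')
    {
        --i;
    }

    parts.domain_start = i;
    parts.tld_start = tld_offset;
    return parts;
}
```

When domain_start is 0 the host has no sub-domains, which matches the "uri_lowercase == s" case described in the example comments.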

Programmers & Maintainers

If you want to work on the library, there are certainly things to enhance. We could for example offer more offsets in the info string, or functions to clearly define each part of the URI.

However, the most important part of this library is the XML file which defines all the TLDs. Maintaining that file is what will help the most. It includes all the TLDs known at this point (as defined in various places, such as Wikipedia and the individual authorities in that area). The file is easy to read, so you can quickly find whether your extension is defined and, if not, let us know.

Library Requirements

  • Usage

The library doesn't need anything special. It's a few C functions.

The library also offers C++ classes. You do not need a C++ compiler to use the library, but if you do program in C++, you can use the tld_object and tld_email_list classes instead of the C functions. It makes things a lot easier!

Also if you are programming using PHP, the library includes a PHP extension so you can check URIs and emails directly from PHP without trying to create crazy regular expressions (that most often do not work right!)

  • Compiling

To compile the library, you'll need CMake, a C++ compiler (some parts are written in C++), and the Qt library, as we use QtXml and QtCore (Qt4). The QtXml library is used to parse the XML file (tld_data.xml) which defines all the TLDs, worldwide.

To regenerate the documentation we use Doxygen. It is optional, though.

  • PHP

In order to recompile the PHP extension the Zend environment is required. Under a Debian or Ubuntu system you can install the php5-dev package.

Tests Coming with the Library

We have the following tests at this time:

This test checks the tld() function the way end users of the library would call it. It checks all the existing TLDs, a few unknown TLDs, and invalid TLDs.
This test verifies that the tld_object works as expected. It is not exhaustive in regard to the tld library itself, only of the tld_object.
This test includes the tld.c directly so it can check each internal function directly. This test checks the cmp() and search() functions, with full coverage.
This test runs 100% coverage of the tld_domain_to_lowercase() function. This includes conversion of %XX encoded characters and UTF-8 to wide characters that can be case folded and saved back as %XX encoded characters. The test verifies that all characters are properly supported and that errors are properly handled.
The Mozilla foundation offers a file with a complete list of all the domain names defined throughout the world. This test reads that list and checks all the TLDs against the libtld system. Some TLDs may be checked in multiple ways. We support the TLDs that start with an asterisk (*) and those that start with an exclamation mark (!) which means all the TLDs are now being checked out as expected. This test reads the effective_tld_names.dat file which has to be available in your current directory.
A copy of the Mozilla file is included with each version of the TLD library. It is named tests/effective_tld_names.dat and should be up to date when we produce a new version for download on
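
To make the asterisk and exclamation mark rules more concrete, here is a minimal sketch of how one rule from that list can be matched against a host name, following the matching description published with the Public Suffix List. The function name suffix_rule_matches() is hypothetical, and this is not how libtld itself implements the check; it also assumes both strings are already lowercase.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libtld: return 1 when the labels of
 * `rule` match the rightmost labels of `domain`. A "*" label matches any
 * single label. A leading '!' (exception rule) is skipped here; the caller
 * decides what an exception means. */
int suffix_rule_matches(const char *rule, const char *domain)
{
    const char *r;
    const char *d;

    assert(rule != NULL && domain != NULL);

    if(*rule == '!')
    {
        ++rule;
    }
    r = rule + strlen(rule);
    d = domain + strlen(domain);

    for(;;)
    {
        /* find the start of the rightmost unprocessed label on each side */
        const char *rs = r;
        const char *ds = d;
        size_t rl;
        size_t dl;

        while(rs > rule && rs[-1] != '.')
        {
            --rs;
        }
        while(ds > domain && ds[-1] != '.')
        {
            --ds;
        }
        rl = (size_t)(r - rs);
        dl = (size_t)(d - ds);

        if(dl == 0)
        {
            return 0; /* empty label in the domain */
        }
        if(!(rl == 1 && *rs == '*')
        && (rl != dl || memcmp(rs, ds, rl) != 0))
        {
            return 0; /* labels differ and the rule label is not a wildcard */
        }
        if(rs == rule)
        {
            return 1; /* every label of the rule was matched */
        }
        if(ds == domain)
        {
            return 0; /* the domain has fewer labels than the rule */
        }
        /* step over the period to the next label on the left */
        r = rs - 1;
        d = ds - 1;
    }
}
```

With the rules "*.ck" and "!www.ck", for instance, "foo.ck" matches the wildcard rule while "www.ck" also matches the exception rule, which takes priority in the Public Suffix List algorithm.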
The library includes an advanced function that checks the validity of complete URIs, making such checks very simple in any software. The URI must include a scheme (often called protocol), a fully qualified domain (sub-domains, domain, TLD), an absolute path, variables (after the question mark), and an anchor. The test ensures that all the checks used by the parser work as expected, accepting valid URIs and rejecting invalid ones.
The libtld supports verifying emails and breaking them up into their different parts. This is done to make sure users enter valid emails (although this does not mean that the email address exists, it at least tells us when an address is clearly incorrect and should be rejected immediately). The test ensures that all the different types of invalid emails are properly caught (e.g. emails with control characters, invalid domain names, or missing parts).
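
As an illustration of the kinds of errors listed above, the sketch below implements a cheap pre-filter for a single bare address of the form local@domain. It is not an RFC 5322 parser and it is not what libtld does internally; the function name email_obviously_invalid() is hypothetical, and treating every control character as an error is an assumption of this sketch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of libtld: return 1 when the address is
 * obviously wrong (control character, no '@', empty local part or empty
 * domain), 0 otherwise. A result of 0 does NOT mean the address is valid. */
int email_obviously_invalid(const char *email)
{
    const unsigned char *c;
    const char *at;

    assert(email != NULL);

    /* reject ASCII control characters (0x00-0x1F and 0x7F) */
    for(c = (const unsigned char *) email; *c != '\0'; ++c)
    {
        if(*c < 0x20 || *c == 0x7F)
        {
            return 1;
        }
    }

    /* use the last '@' because a quoted local part may contain one */
    at = strrchr(email, '@');
    if(at == NULL)
    {
        return 1; /* missing '@' */
    }
    if(at == email)
    {
        return 1; /* missing local part */
    }
    if(at[1] == '\0')
    {
        return 1; /* missing domain */
    }
    return 0;
}
```

Addresses that pass this filter still need a full check, for example with tld_email_list, since the domain and its TLD are not verified here.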
This test checks that the versions in all the files (two CMakeLists.txt and the changelog) are equal. If one of those does not match, then the test fails.
Shell script to run against the tld_data.xml file to ensure its validity. This is a good idea any time you make changes to the file. It runs with the xmllint tool. If you do not have the tool, it won't work. The tool is part of the libxml2-utils package under Ubuntu.

This document is part of the Snap! Websites Project.

Copyright by Made to Order Software Corp.
