libtld: /home/snapwebsites/BUILD/contrib/libtld/include/libtld/tld.h File Reference

libtld 1.5.13
A library to determine the Top-Level Domain name of any URL.
tld.h File Reference

The public header of the libtld library.

#include <string>
#include <vector>
#include <stdexcept>

Classes

class  invalid_domain
 Exception thrown when querying for data of an invalid domain.

struct  tld_email
 Parts of one email.

class  tld_email_list
 The C++ side of the email list implementation.

struct  tld_email_list::tld_email_t
 Parts of one email.

struct  tld_info
 Set of information returned by the tld() function.
 
class  tld_object
 Class used to ease the use of the tld() function in C++.
 

Macros

#define LIBTLD_EXPORT
 The export API used by MS-Windows DLLs.

#define LIBTLD_VERSION   "1.5.13"
 The version of the library as a string.

#define LIBTLD_VERSION_MAJOR   1
 The major version as a number.

#define LIBTLD_VERSION_MINOR   5
 The minor version as a number.

#define LIBTLD_VERSION_PATCH   13
 The patch version as a number.

#define VALID_URI_ASCII_ONLY   0x0001
 Whether to check that the URI only includes ASCII.
 
#define VALID_URI_NO_SPACES   0x0002
 Whether to check that the URI does not include any spaces.
 

Enumerations

enum  tld_category {
  TLD_CATEGORY_INTERNATIONAL, TLD_CATEGORY_PROFESSIONALS, TLD_CATEGORY_LANGUAGE, TLD_CATEGORY_GROUPS,
  TLD_CATEGORY_REGION, TLD_CATEGORY_TECHNICAL, TLD_CATEGORY_COUNTRY, TLD_CATEGORY_ENTREPRENEURIAL,
  TLD_CATEGORY_BRAND, TLD_CATEGORY_UNDEFINED
}
 The list of categories for the different TLDs.
 
enum  tld_email_field_type {
  TLD_EMAIL_FIELD_TYPE_INVALID = -1, TLD_EMAIL_FIELD_TYPE_UNKNOWN, TLD_EMAIL_FIELD_TYPE_MAILBOX_LIST, TLD_EMAIL_FIELD_TYPE_MAILBOX,
  TLD_EMAIL_FIELD_TYPE_ADDRESS_LIST, TLD_EMAIL_FIELD_TYPE_ADDRESS_LIST_OPT
}
 Type of email as determined by the email_field_type() function.
 
enum  tld_result {
  TLD_RESULT_SUCCESS, TLD_RESULT_INVALID, TLD_RESULT_NULL, TLD_RESULT_NO_TLD,
  TLD_RESULT_BAD_URI, TLD_RESULT_NOT_FOUND
}
 The result returned by tld().
 
enum  tld_status {
  TLD_STATUS_VALID, TLD_STATUS_PROPOSED, TLD_STATUS_DEPRECATED, TLD_STATUS_UNUSED,
  TLD_STATUS_RESERVED, TLD_STATUS_INFRASTRUCTURE, TLD_STATUS_UNDEFINED, TLD_STATUS_EXCEPTION = 100
}
 Defines the current status of the TLD.
 

Functions

LIBTLD_EXPORT enum tld_result tld (const char *uri, struct tld_info *info)
 Get information about the TLD for the specified URI.

LIBTLD_EXPORT enum tld_result tld_check_uri (const char *uri, struct tld_info *info, const char *protocols, int flags)
 Check that a URI is valid.

LIBTLD_EXPORT void tld_clear_info (struct tld_info *info)
 Clear the info structure.

LIBTLD_EXPORT char * tld_domain_to_lowercase (const char *domain)
 Transform a domain with a TLD to lowercase before processing.

LIBTLD_EXPORT struct tld_email_list * tld_email_alloc ()
 Allocate a list of emails object.

LIBTLD_EXPORT int tld_email_count (struct tld_email_list *list)
 Return the number of emails found after a parse.

LIBTLD_EXPORT void tld_email_free (struct tld_email_list *list)
 Free the list of emails.

LIBTLD_EXPORT int tld_email_next (struct tld_email_list *list, struct tld_email *e)
 Retrieve the next email.

LIBTLD_EXPORT enum tld_result tld_email_parse (struct tld_email_list *list, const char *emails, int flags)
 Parse a list of emails in the email list object.

LIBTLD_EXPORT void tld_email_rewind (struct tld_email_list *list)
 Rewind the reading of the emails.

LIBTLD_EXPORT const char * tld_version ()
 Return the version of the library.
 

Detailed Description

This file declares all the functions, objects, structures, etc. publicly available from the libtld library.

Definition in file tld.h.

Macro Definition Documentation

#define LIBTLD_EXPORT

This definition is used to mark functions and classes as exported from the library. This allows other programs to automatically use functions defined in the library.

The LIBTLD_EXPORT may be set to dllexport or dllimport depending on whether you compile the library or you intend to link against it.

#define LIBTLD_VERSION   "1.5.13"

This definition represents the version of the libtld header you are compiling against. You can compare it to the returned value of the tld_version() function to make sure that everything is compatible (i.e. if the version is not the same, then the tld_info structure may have changed.)

Referenced by main(), and tld_version().

#define LIBTLD_VERSION_MAJOR   1

This definition represents the major version of the libtld header you are compiling against.

#define LIBTLD_VERSION_MINOR   5

This definition represents the minor version of the libtld header you are compiling against.

#define LIBTLD_VERSION_PATCH   13

This definition represents the patch version of the libtld header you are compiling against. Some people call this number the release number.
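Sometimes an equality check between LIBTLD_VERSION and tld_version() is not enough, for example when you need to require a minimum release. In that case, compare the three numbers numerically. A plain string comparison gets the order wrong: "1.5.13" sorts after "1.10.0" as a string.

The C sketch below does the numeric comparison with sscanf(). The helpers parse_version() and compare_versions() are hypothetical. They are not part of libtld.

```c
#include <stdio.h>

/* Hypothetical helper (not part of libtld): split a "<major>.<minor>.<patch>"
 * string, such as LIBTLD_VERSION or the value returned by tld_version(),
 * into three integers. Returns 1 on success, 0 if the shape does not match. */
static int parse_version(const char *version, int *major, int *minor, int *patch)
{
    return sscanf(version, "%d.%d.%d", major, minor, patch) == 3;
}

/* Hypothetical helper: compare two version strings field by field.
 * Returns -1, 0 or 1 (a < b, a == b, a > b), or -2 if either is malformed. */
static int compare_versions(const char *a, const char *b)
{
    int a_major, a_minor, a_patch;
    int b_major, b_minor, b_patch;

    if(!parse_version(a, &a_major, &a_minor, &a_patch)
    || !parse_version(b, &b_major, &b_minor, &b_patch))
    {
        return -2;
    }
    if(a_major != b_major)
    {
        return a_major < b_major ? -1 : 1;
    }
    if(a_minor != b_minor)
    {
        return a_minor < b_minor ? -1 : 1;
    }
    if(a_patch != b_patch)
    {
        return a_patch < b_patch ? -1 : 1;
    }
    return 0;
}
```

For example, compare_versions(LIBTLD_VERSION, tld_version()) returning 0 means the header you compiled against and the library you loaded report the same version.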

#define VALID_URI_ASCII_ONLY   0x0001

By default the tld_check_uri() function accepts any extended character (i.e. bytes 0x80 and above). This flag can be used to refuse such characters.

Referenced by tld_check_uri().

#define VALID_URI_NO_SPACES   0x0002

By default the tld_check_uri() function accepts spaces as valid characters in a URI (whether they are explicit " ", or written as "+" or "%20".) This flag can be used to refuse all spaces (i.e. this means the "+" and "%20" are also refused.)

Referenced by tld_check_uri().
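To make the two flags concrete, here is a hypothetical C helper, passes_uri_flags(). It applies only the character checks described above to a whole URI string. It is a sketch, not the code of tld_check_uri():

- It assumes %XX sequences are decoded before testing, so "%20" counts as a space and "%A0" as a non-ASCII byte.
- It treats a malformed % sequence as a literal character.
- It does not check the protocol, domain, or TLD.

The #ifndef guards copy the flag values documented here so the block compiles without tld.h.

```c
/* Copies of the documented flag values; the real header wins if included. */
#ifndef VALID_URI_ASCII_ONLY
#define VALID_URI_ASCII_ONLY 0x0001
#endif
#ifndef VALID_URI_NO_SPACES
#define VALID_URI_NO_SPACES  0x0002
#endif

/* Convert one hexadecimal digit to its value, or -1 if it is not hex. */
static int hex_value(int c)
{
    if(c >= '0' && c <= '9') return c - '0';
    if(c >= 'a' && c <= 'f') return c - 'a' + 10;
    if(c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper (not libtld code): return 1 if `uri` passes the
 * checks selected by `flags`, 0 otherwise. */
static int passes_uri_flags(const char *uri, int flags)
{
    const unsigned char *s = (const unsigned char *)uri;

    while(*s != '\0')
    {
        int c = *s;
        int step = 1;

        /* decode a %XX sequence; a lone or malformed '%' stays literal */
        if(c == '%' && hex_value(s[1]) >= 0 && hex_value(s[2]) >= 0)
        {
            c = hex_value(s[1]) * 16 + hex_value(s[2]);
            step = 3;
        }

        /* any byte with bit 7 set, verbatim or encoded, is non-ASCII */
        if((flags & VALID_URI_ASCII_ONLY) != 0 && c > 0x7F)
        {
            return 0;
        }

        /* a space may appear verbatim, as %20, or as a literal '+' */
        if((flags & VALID_URI_NO_SPACES) != 0
        && (c == ' ' || (step == 1 && c == '+')))
        {
            return 0;
        }

        s += step;
    }
    return 1;
}
```

Note that "%2B" decodes to a literal plus sign rather than a space, so the sketch lets it through even with VALID_URI_NO_SPACES set.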

Enumeration Type Documentation

enum tld_category

Defines the category of the TLD. The best-known categories are international TLDs (such as .com and .info) and country TLDs (such as .us, .uk, .fr, etc.)

IANA offers, and is working on, other extensions such as .pro for professionals and .arpa for its internal infrastructure.

Enumerator
TLD_CATEGORY_INTERNATIONAL 

International TLDs.

This category represents TLDs that can be used by anyone anywhere in the world. Some of them have limits (e.g. only a museum can register a .museum domain name.) However, the best-known international extension is .com, and it has absolutely no restrictions.

TLD_CATEGORY_PROFESSIONALS 

Professional TLDs.

This category is offered to professionals. Some countries already offer second-level domain name registrations for professionals, and either way these are not used very much. They are reserved for people such as accountants, attorneys, and doctors.

Only people who hold a license from a government can register a .pro domain name.

TLD_CATEGORY_LANGUAGE 

Language specific TLDs.

At the time of writing, there is one language extension: .cat for the Catalan language. The idea of language extensions is to give a language, rather than a country, a way to have websites that people all over the Earth can read in that language.

TLD_CATEGORY_GROUPS 

Groups specific TLDs.

The concept of groups is similar to the language grouping, but in this case it may refer to a specific group of people (though not based on anything such as ethnicity.)

Examples of groups are Kids, Gay people, Ecologists, etc. This is only proposed at this point.

TLD_CATEGORY_REGION 

Region specific TLDs.

It has been proposed, as with .eu, to have extensions based on well-defined regions such as .asia for all of Asia. We currently also have .aq for Antarctica. Some proposed regions are .africa and city names such as .paris and .wien.

This category also includes future regions and old TLDs that were assigned to countries that no longer exist (in general because the country was split in two and both new countries have different names.)

We keep old TLDs because they are still likely to show up every now and then, and this way your software can cleanly refuse them.

TLD_CATEGORY_TECHNICAL 

Technical extensions are considered internal.

These are likely valid (e.g. .arpa is valid) but are used for technical reasons and not for regular URIs. So they are present, but your software should most certainly ignore them.

To avoid returning TLD_RESULT_SUCCESS when a TLD with such a category is found, we mark these with the TLD_STATUS_INFRASTRUCTURE.

TLD_CATEGORY_COUNTRY 

A country extension.

Most of the extensions are country extensions. Country extensions are generally further broken down with second-level domain names. Some countries even have third, fourth, and fifth level domain names.

TLD_CATEGORY_ENTREPRENEURIAL 

A private extension.

Some private companies and individuals purchased domains that they then use as a TLD, reselling sub-domains of that main domain name.

For example, Blogspot offers the ".blogspot.com" domain to its users as a TLD. Because ".blogspot.com" is treated as a TLD, cookies cannot be defined at the ".blogspot.com" level, just as they cannot be defined directly under ".com". In other words, two distinct sites such as:

  • "a.blogspot.com", and
  • "b.blogspot.com"

cannot share their cookies. Yet, ".com" by itself is also a top-level domain name that anyone can use.

TLD_CATEGORY_BRAND 

The TLD is owned and represents a brand.

This category is used to mark top level domain names that are specific to one company. Note that certain TLDs are owned by companies now, but they are not automatically marked as a brand (i.e. ".lol").

TLD_CATEGORY_UNDEFINED 

The TLD was not found.

This category is used to initialize the information structure and is used to show that the TLD was not found.

Definition at line 51 of file tld.h.

enum tld_email_field_type

A string may represent various types of email data; this enumeration lists those types.

Enumerator
TLD_EMAIL_FIELD_TYPE_INVALID 

The input of email_field_type() was not valid.

An email field is expected to be valid ASCII characters. This error is returned if invalid characters are found.

TLD_EMAIL_FIELD_TYPE_UNKNOWN 

The input does not represent valid emails.

The email_field_type() function returns this value if the input field does not represent what is considered a field with email addresses. If you are parsing many email fields, you probably want to treat this as a soft error (i.e. an error saying that the field can be skipped as far as the TLD library is concerned.)

TLD_EMAIL_FIELD_TYPE_MAILBOX_LIST 

The input represents a mailbox list.

The fields FROM and RESENT-FROM are viewed as mailbox lists. These fields may include a list of email addresses.

TLD_EMAIL_FIELD_TYPE_MAILBOX 

The input represents a mailbox.

The fields SENDER and RESENT-SENDER are viewed as mailbox fields. These are expected to include only one email address.

TLD_EMAIL_FIELD_TYPE_ADDRESS_LIST 

The input represents a mandatory list of mailboxes.

The fields TO, CC, REPLY-TO, RESENT-TO, and RESENT-CC are viewed as mailbox fields. These are expected to include any number of email addresses.

TLD_EMAIL_FIELD_TYPE_ADDRESS_LIST_OPT 

The input represents an optional list of email addresses.

The fields BCC and RESENT-BCC are viewed as optional mailbox fields. They may be absent, empty, or include one or more email addresses.

Definition at line 125 of file tld.h.
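The field-to-type mapping listed above can be sketched as a table lookup. classify_field_name() below is a hypothetical helper, not the library's email_field_type(). It makes these assumptions:

- It takes a bare header name such as "Reply-To" (no colon, no value).
- It compares names case-insensitively, as RFC 5322 requires for field names.
- It reports TLD_EMAIL_FIELD_TYPE_INVALID for any byte outside printable ASCII, and for a colon.

The enumeration is copied from this header so the block stands alone.

```c
#include <stddef.h>

/* Copy of enum tld_email_field_type as listed in tld.h. */
enum tld_email_field_type
{
    TLD_EMAIL_FIELD_TYPE_INVALID = -1,
    TLD_EMAIL_FIELD_TYPE_UNKNOWN,
    TLD_EMAIL_FIELD_TYPE_MAILBOX_LIST,
    TLD_EMAIL_FIELD_TYPE_MAILBOX,
    TLD_EMAIL_FIELD_TYPE_ADDRESS_LIST,
    TLD_EMAIL_FIELD_TYPE_ADDRESS_LIST_OPT
};

/* ASCII case-insensitive equality of two field names. */
static int same_name(const char *a, const char *b)
{
    for(; *a != '\0' && *b != '\0'; ++a, ++b)
    {
        char ca = (*a >= 'a' && *a <= 'z') ? (char)(*a - 'a' + 'A') : *a;
        char cb = (*b >= 'a' && *b <= 'z') ? (char)(*b - 'a' + 'A') : *b;
        if(ca != cb)
        {
            return 0;
        }
    }
    return *a == *b;    /* equal only if both names ended together */
}

/* The field names and types described above. */
static const struct
{
    const char *name;
    enum tld_email_field_type type;
} field_types[] =
{
    { "FROM",          TLD_EMAIL_FIELD_TYPE_MAILBOX_LIST },
    { "RESENT-FROM",   TLD_EMAIL_FIELD_TYPE_MAILBOX_LIST },
    { "SENDER",        TLD_EMAIL_FIELD_TYPE_MAILBOX },
    { "RESENT-SENDER", TLD_EMAIL_FIELD_TYPE_MAILBOX },
    { "TO",            TLD_EMAIL_FIELD_TYPE_ADDRESS_LIST },
    { "CC",            TLD_EMAIL_FIELD_TYPE_ADDRESS_LIST },
    { "REPLY-TO",      TLD_EMAIL_FIELD_TYPE_ADDRESS_LIST },
    { "RESENT-TO",     TLD_EMAIL_FIELD_TYPE_ADDRESS_LIST },
    { "RESENT-CC",     TLD_EMAIL_FIELD_TYPE_ADDRESS_LIST },
    { "BCC",           TLD_EMAIL_FIELD_TYPE_ADDRESS_LIST_OPT },
    { "RESENT-BCC",    TLD_EMAIL_FIELD_TYPE_ADDRESS_LIST_OPT },
};

/* Hypothetical classifier for a bare header field name. */
static enum tld_email_field_type classify_field_name(const char *name)
{
    const char *p;
    size_t i;

    /* field names are printable ASCII (0x21 to 0x7E) without a colon */
    for(p = name; *p != '\0'; ++p)
    {
        unsigned char c = (unsigned char)*p;
        if(c < 0x21 || c > 0x7E || c == ':')
        {
            return TLD_EMAIL_FIELD_TYPE_INVALID;
        }
    }
    for(i = 0; i < sizeof(field_types) / sizeof(field_types[0]); ++i)
    {
        if(same_name(name, field_types[i].name))
        {
            return field_types[i].type;
        }
    }
    return TLD_EMAIL_FIELD_TYPE_UNKNOWN;
}
```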

enum tld_result

This enumeration defines all the possible results of the tld() function.

Only the TLD_RESULT_SUCCESS is considered to represent a valid result.

The TLD_RESULT_INVALID represents a TLD that was found but is not currently marked as valid (it may be deprecated or proposed, for example.)

Enumerator
TLD_RESULT_SUCCESS 

Success! The TLD of the specified URI is valid.

This result is returned when the URI includes a valid TLD. The function also fills the tld_info structure with the corresponding results.

You can accept this URI as valid.

TLD_RESULT_INVALID 

The TLD was found, but it is marked as invalid.

This result represents a TLD that is not valid as is for a URI, but it was defined in the TLD data. The function includes further information in the tld_info structure. There you can check the category, status, and other parameters to determine what the TLD really represents.

It may be possible to use such a TLD, although as far as web addresses are concerned, these are not considered valid. As mentioned in the status descriptions, some statuses mean that the TLD can be replaced by another one that works (e.g. after a country name change.)

TLD_RESULT_NULL 

The input URI is empty.

The tld() function returns this value whenever the input URI pointer is NULL or the empty string (""). Obviously, no TLD is found in this case.

TLD_RESULT_NO_TLD 

The input URI has no TLD defined.

This error is returned whenever the URI does not include at least one period (.). Local URIs are considered valid and generally don't include a period (e.g. "localhost", "my-computer", "johns-computer", etc.) We expect that the tld() function would not be called with such URIs.

A valid Internet URI must include a TLD.

TLD_RESULT_BAD_URI 

The URI includes characters that are not accepted by the function.

This value is returned if a character is found to be incompatible or a sequence of characters is found incompatible.

At this time, tld() returns this error if two periods (.) are found one after another. More checks may be added over time to detect invalid characters (anything outside of [-a-zA-Z0-9.%].)

Note that the URI should not start or end with a period. This error will also be returned (at some point) when the function detects such problems.

TLD_RESULT_NOT_FOUND 

The URI has a TLD that could not be determined.

The TLD of the URI was searched in the TLD data and could not be found there. This means the TLD is not a valid Internet TLD.

Definition at line 81 of file tld.h.

enum tld_status

Each TLD has a status. Most TLDs are considered valid; however, many are either proposed or deprecated.

Proposed TLDs are not yet officially accepted by the official entities taking care of those TLDs. They should be refused, but may become available later.

Deprecated TLDs were in use before but got dropped. They may be dropped because a country doesn't follow up on their Internet TLD, or because the extension is found to be boycotted.

Enumerator
TLD_STATUS_VALID 

The TLD is currently valid.

This status represents a TLD that is currently fully valid and supported by the owners.

These can be part of URIs representing valid resources.

TLD_STATUS_PROPOSED 

The TLD was proposed but not yet accepted.

The TLD is nearly valid: at least, it is in the process of getting accepted. The TLD will not work until officially accepted.

No valid URIs can include this TLD until it becomes TLD_STATUS_VALID.

TLD_STATUS_DEPRECATED 

The TLD was once in use.

This status is used by TLDs that were valid (TLD_STATUS_VALID) at some point in time and were then replaced by another TLD, rendering them useless (or incorrect in the case of a country name change.)

This status means such URIs are not to be considered valid. However, it may be possible to emit a 301 (in terms of HTTP protocol) to fix the problem.

TLD_STATUS_UNUSED 

The TLD was officially assigned but not put to use.

This special status is used for all the TLDs that were assigned to a specific entity, but never actually put to use. Many smaller countries (especially islands) are assigned this status.

Unused TLDs are not valid in any URI until marked valid.

TLD_STATUS_RESERVED 

The TLD is reserved so no one can use it.

This special case forces the specified TLDs into a "do not use" list. People who wish such a TLD were official may still use it, but that use is not considered legal.

A reserved TLD may represent a second TLD that was assigned to a specific country or other category. It may be possible to do a transfer from that TLD to the official TLD (e.g. Great Britain was assigned .gb, but uses .uk instead; URIs with .gb could be rewritten with .uk and checked for validity.)

TLD_STATUS_INFRASTRUCTURE 

These TLDs are reserved for the Internet infrastructure.

These TLDs cannot be used with standard URIs. These are used to make the Internet functional instead.

All URIs for standard resources must refuse these TLDs.

TLD_STATUS_UNDEFINED 

Special status to indicate we did not find the TLD.

The info structure is returned with an undefined status whenever the TLD could not be found in the list of existing TLDs. This means the URI is completely invalid. (The only exception would be if you support some internal TLDs.)

URIs that cannot get a TLD_STATUS_VALID should all be considered invalid, but those marked as TLD_STATUS_UNDEFINED are completely invalid. That said, you may want to make sure you passed the correct string. The URI must consist of only the sub-domains, the domain, and the TLDs. No protocol, slashes, colons, paths, query strings, or anchors are accepted in the URI.

TLD_STATUS_EXCEPTION 

Special status to indicate an exception which is not directly a TLD.

When a NIC decides to change its setup, it can generate exceptions. For example, the UK first made use of .uk and as such let a few customers register names directly under .uk. Later it decided to only offer second-level domain names such as .co.uk and .ac.uk. This leaves a few exceptions on the .uk domain name: the police.uk domain, for example, is still in use. We reference it as ".police.uk" in our XML data file, yet the TLD in that case is just ".uk".

Definition at line 65 of file tld.h.
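One way to act on these statuses in a web front end is sketched below. The policy in action_for_status() is an assumption drawn from the descriptions above:

- accept valid TLDs;
- consider a 301 for deprecated ones;
- consider translating reserved ones such as .gb to .uk;
- refuse the rest.

libtld itself only reports the status. Both action_for_status() and enum uri_action are hypothetical. The tld_status enumeration is copied from this header so the block compiles on its own.

```c
/* Copy of enum tld_status as listed in tld.h (same names and order). */
enum tld_status
{
    TLD_STATUS_VALID,
    TLD_STATUS_PROPOSED,
    TLD_STATUS_DEPRECATED,
    TLD_STATUS_UNUSED,
    TLD_STATUS_RESERVED,
    TLD_STATUS_INFRASTRUCTURE,
    TLD_STATUS_UNDEFINED,
    TLD_STATUS_EXCEPTION = 100
};

/* Hypothetical actions a web server might take for a given status. */
enum uri_action
{
    URI_ACTION_ACCEPT,      /* TLD in use: serve the request */
    URI_ACTION_REDIRECT,    /* consider a 301 to the replacement TLD */
    URI_ACTION_TRANSLATE,   /* consider mapping to the official TLD */
    URI_ACTION_REJECT       /* refuse the URI */
};

/* Hypothetical policy; libtld does not make this decision for you. */
static enum uri_action action_for_status(enum tld_status status)
{
    switch(status)
    {
    case TLD_STATUS_VALID:
        return URI_ACTION_ACCEPT;

    case TLD_STATUS_DEPRECATED:
        return URI_ACTION_REDIRECT;

    case TLD_STATUS_RESERVED:
        return URI_ACTION_TRANSLATE;

    default:
        /* proposed, unused, infrastructure, undefined, and exception
         * statuses are all refused in this sketch */
        return URI_ACTION_REJECT;
    }
}
```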

Function Documentation

LIBTLD_EXPORT enum tld_result tld (const char *uri, struct tld_info *info)

The tld() function searches for the specified URI in the TLD descriptions. The results are saved in the info parameter for later interpretation (i.e. extraction of the domain name, sub-domains and the exact TLD.)

The function extracts the last extension of the URI. For example, in the following:

example.co.uk

the function first extracts ".uk". With that extension, it searches the list of official TLDs. If not found, an error is returned and the info parameter is set to unknown.

When found, the function checks whether that TLD (".uk" in our previous example) accepts sub-TLDs (second, third, fourth and fifth level TLDs.) If so, it extracts the next TLD entry (the ".co" in our previous example) and searches for that second-level TLD. If found, it tries again with the third level, etc. until all the possible TLDs are exhausted. At that point, it returns the last TLD it found. In the case of ".co.uk", it returns the information of the ".co" second-level TLD.
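The search described above can be illustrated with a toy suffix table. toy_tld_offset() is a hypothetical sketch, not the libtld implementation. It differs from the real library in several ways:

- It uses a five-entry table instead of the tld_data.xml data.
- It stops at the first level that is not in the table, instead of checking whether a TLD accepts sub-levels.
- It ignores statuses, exceptions, and errors such as "..".
- It expects lowercase input.

It returns the offset of the period that starts the longest matching TLD, which corresponds to what tld() stores in f_offset.

```c
#include <stddef.h>
#include <string.h>

/* Toy suffix table for illustration only; libtld's real data comes from
 * tld_data.xml and carries a category and status for every entry. */
static const char *const toy_tlds[] =
{
    "uk", "co.uk", "ac.uk", "com", "blogspot.com",
};

static int is_toy_tld(const char *suffix)
{
    size_t i;

    for(i = 0; i < sizeof(toy_tlds) / sizeof(toy_tlds[0]); ++i)
    {
        if(strcmp(suffix, toy_tlds[i]) == 0)
        {
            return 1;
        }
    }
    return 0;
}

/* Hypothetical sketch: walk the periods from right to left, extending
 * the suffix one level at a time while it is still a known TLD. Returns
 * the offset of the period starting the longest match, or -1 if even
 * the last label is unknown (or there is no period at all). */
static int toy_tld_offset(const char *domain)
{
    int offset = -1;
    int i = (int)strlen(domain);

    while(i > 0)
    {
        --i;
        if(domain[i] == '.')
        {
            if(!is_toy_tld(domain + i + 1))
            {
                break;      /* this level is not a TLD: stop here */
            }
            offset = i;     /* remember the period before the match */
        }
    }
    return offset;
}
```

With this table, "example.co.uk" yields 7 and "www.example.co.uk" yields 11, the same offsets the f_offset description gives for those two URIs.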

All the comparisons are done in lowercase. This is because all the data is saved in lowercase and we expect the input of the tld() function to already be in lowercase. If your input may contain uppercase characters, make sure to call the tld_domain_to_lowercase() function first. That function makes a lowercase duplicate of your domain name. It understands %XX sequences (since the URI is expected to still be encoded) and properly handles UTF-8 characters when computing the lowercase characters of the input. Note that the function returns a newly allocated pointer that you are responsible for freeing once you are done with it.

Warning
If you call tld() with the pointer returned by tld_domain_to_lowercase(), keep in mind that the tld() function saves pointers into the input string directly in the tld_info structure. In other words, you want to free() that string AFTER you are done with the tld_info structure.

The info structure includes:

  • f_category – the category of TLD, unless set to TLD_CATEGORY_UNDEFINED, it is considered valid
  • f_status – the status of the TLD, unless set to TLD_STATUS_UNDEFINED, it was defined from the tld_data.xml file; however, only those marked as TLD_STATUS_VALID are considered to currently be in use, all the other statuses can be used by your software, one way or another, but it should not be accepted as valid in a URI
  • f_country – if the category is set to TLD_CATEGORY_COUNTRY then this pointer is set to the name of the country
  • f_tld – is set to the full TLD of your domain name; this is a pointer WITHIN your uri string so make sure you keep your URI string valid if you intend to use this f_tld string
  • f_offset – the offset to the first period within the domain name TLD (i.e. in our previous example, it would be the offset to the first period in ".co.uk", so in "example.co.uk" the offset would be 7. Assuming you prepend "www." to have the URI "www.example.co.uk" then the offset would be 11.)
Note
In our previous example, the ".uk" TLD is properly used: it includes a second level domain name (".co".) The URI "example.uk" should have returned TLD_RESULT_INVALID since .uk by itself was not supposed to be acceptable. This changed a few years ago. The good thing is that it resolves some problems as some companies were given a simple ".uk" TLD and these were exceptions the library does not need to support anymore. There are still some countries, such as ".bd", which do not accept second level names, so "example.bd" does return an error (TLD_RESULT_INVALID).

Assuming that you always get valid URIs, you should get one of those results:

  • TLD_RESULT_SUCCESS – success! the URI is valid and the TLD was properly determined; use the f_tld or f_offset to extract the TLD domain and sub-domains
  • TLD_RESULT_INVALID – known TLD, but not currently valid; this result is returned when we know that the TLD is not to be accepted

Other results are returned when the input string is considered invalid.

Note
The function only accepts a bare URI, in other words: no protocol, no path, no anchor, no query string, and still URI encoded. Also, it should not start and/or end with a period or you are likely to get an invalid response. (i.e. don't use any of ".example.co.uk.", "example.co.uk.", nor ".example.co.uk")
/* TLD library -- TLD example
* Copyright (c) 2011-2019 Made to Order Software Corp. All Rights Reserved
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice shall be included
* in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
* CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
#include "libtld/tld.h"
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    const char *uri = "WWW.Example.Co.Uk";
    char *uri_lowercase;
    struct tld_info info;
    enum tld_result r;

    if(argc > 1)
    {
        uri = argv[1];
    }

    // if your input may include uppercase characters and you
    // do not have an easy way to compute the lowercase before
    // calling tld(), call the tld_domain_to_lowercase() function
    uri_lowercase = tld_domain_to_lowercase(uri);
    if(uri_lowercase == NULL)
    {
        // the buffer could not be allocated or the input was invalid
        fprintf(stderr, "error: could not convert \"%s\" to lowercase.\n", uri);
        return 1;
    }

    r = tld(uri_lowercase, &info);
    if(r == TLD_RESULT_SUCCESS)
    {
        // walk back from the period that starts the TLD to the period
        // that separates the sub-domains from the domain name
        const char *s = uri_lowercase + info.f_offset - 1;
        while(s > uri_lowercase)
        {
            if(*s == '.')
            {
                ++s;
                break;
            }
            --s;
        }
        // here uri_lowercase points to your sub-domains, the length is
        // "s - uri_lowercase" (this includes the trailing period)
        // if uri_lowercase == s then there are no sub-domains
        // s points to the domain name, the length is "info.f_tld - s"
        // and info.f_tld points to the TLD
        //
        // When TLD_RESULT_SUCCESS is returned the domain cannot be an
        // empty string; also the TLD cannot be empty, however, there
        // may be no sub-domains.
        printf("Sub-domain(s): \"%.*s\"\n", (int)(s - uri_lowercase), uri_lowercase);
        printf("Domain: \"%.*s\"\n", (int)(info.f_tld - s), s);
        printf("TLD: \"%s\"\n", info.f_tld);

        // info.f_tld points inside uri_lowercase, so free it last
        free(uri_lowercase);
        return 0;
    }

    fprintf(stderr, "error: tld() failed with result %d.\n", (int)r);
    free(uri_lowercase);
    return 1;
}
// vim: ts=4 sw=4 et
Parameters
[in] uri  The URI to be checked.
[out] info  A pointer to a tld_info structure to save the result.
Returns
One of the TLD_RESULT_... enumeration values.

Definition at line 555 of file tld.c.

References tld_description::f_category, tld_info::f_category, tld_info::f_country, tld_description::f_country, tld_description::f_end_offset, tld_description::f_exception_apply_to, tld_description::f_exception_level, tld_info::f_offset, tld_description::f_start_offset, tld_description::f_status, tld_info::f_status, tld_info::f_tld, search(), tld_clear_info(), tld_descriptions, tld_end_offset, tld_max_level, TLD_RESULT_BAD_URI, TLD_RESULT_INVALID, TLD_RESULT_NO_TLD, TLD_RESULT_NOT_FOUND, TLD_RESULT_NULL, TLD_RESULT_SUCCESS, tld_start_offset, TLD_STATUS_EXCEPTION, and TLD_STATUS_VALID.

Referenced by cat_ext(), snap::output_tlds(), tld_email_list::tld_email_t::parse(), PHP_FUNCTION(), snap::read_tlds(), search(), tld_object::set_domain(), tld_check_uri(), and tld_encode().

LIBTLD_EXPORT enum tld_result tld_check_uri (const char *uri, struct tld_info *info, const char *protocols, int flags)

This function very quickly parses a URI to determine whether it is valid.

Note that it does not (currently) support local naming conventions which means that a host such as "localhost" will fail the test.

The protocols variable can be set to a list of protocol names that are considered valid. For example, for the HTTP protocol one could use "http,https". To accept any protocol, use an asterisk as in: "*". A protocol name must consist only of letters, digits, or underscores ([0-9A-Za-z_]+) and must be at least one character long.

The flags can be set to the following values; OR them together to set multiple flags at the same time:

  • VALID_URI_ASCII_ONLY – refuse characters outside the 7-bit ASCII range (we expect the URI to be UTF-8 encoded and any byte with bit 7 set is considered invalid if this flag is set, including encoded bytes such as %A0)
  • VALID_URI_NO_SPACES – refuse spaces whether they are encoded with + or %20 or verbatim.

The return value is generally TLD_RESULT_BAD_URI when an invalid character is found in the URI string. The TLD_RESULT_NULL is returned if the URI is a NULL pointer or an empty string. Other results may be returned by the tld() function. If a result other than TLD_RESULT_SUCCESS is returned then the info structure may or may not be updated.
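The protocols parameter described above can be sketched as follows. protocol_accepted() is a hypothetical helper, not the code of tld_check_uri(). It first checks that the name matches [0-9A-Za-z_]+. It then accepts anything when the list is "*", and otherwise looks for an exact match among the comma separated entries.

This document does not say whether libtld compares protocol names case-insensitively. The sketch assumes it does not and compares them exactly.

```c
#include <stddef.h>
#include <string.h>

/* [0-9A-Za-z_] as documented for protocol names. */
static int is_protocol_char(int c)
{
    return (c >= '0' && c <= '9')
        || (c >= 'a' && c <= 'z')
        || (c >= 'A' && c <= 'Z')
        || c == '_';
}

/* Hypothetical helper (not libtld code): return 1 if the protocol
 * `proto` of length `len` is accepted by the `protocols` list. */
static int protocol_accepted(const char *protocols, const char *proto, size_t len)
{
    const char *p = protocols;
    size_t i;

    /* the name must be one or more [0-9A-Za-z_] characters */
    if(len == 0)
    {
        return 0;
    }
    for(i = 0; i < len; ++i)
    {
        if(!is_protocol_char((unsigned char)proto[i]))
        {
            return 0;
        }
    }

    /* "*" accepts any well-formed protocol name */
    if(strcmp(protocols, "*") == 0)
    {
        return 1;
    }

    /* otherwise look for an exact match in the comma separated list */
    while(*p != '\0')
    {
        size_t entry_len = strcspn(p, ",");
        if(entry_len == len && strncmp(p, proto, len) == 0)
        {
            return 1;
        }
        p += entry_len;
        if(*p == ',')
        {
            ++p;
        }
    }
    return 0;
}
```

A caller would pass the scheme part of the URI, for example the first strcspn(uri, ":") characters of "https://example.com/".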

Parameters
[in] uri  The URI whose validity is being checked.
[out] info  The resulting information about the URI domain and TLD.
[in] protocols  List of comma separated protocols accepted.
[in] flags  A set of flags to tell the function what is valid/invalid.
Returns
The result of the operation, TLD_RESULT_SUCCESS if the URI is valid.
See also
tld()
Todo:
The following is WRONG:
  • the domain %XX are not being checked properly, as it stands the characters following % can be anything!
  • the tld() function must be called with the characters still encoded; if you look at the data, you will see that I kept the data encoded (i.e. with the %XX characters)
  • what could be checked (which I guess could be for the entire domain name) is whether the entire string represents valid UTF-8; I don't think I'm currently doing so here. (I have such functions in the tld_domain_to_lowercase() now)

Definition at line 741 of file tld.c.

References tld_info::f_offset, tld_info::f_tld, h2d(), tld(), tld_clear_info(), TLD_RESULT_BAD_URI, TLD_RESULT_NULL, VALID_URI_ASCII_ONLY, and VALID_URI_NO_SPACES.

Referenced by check_uri(), and PHP_FUNCTION().

LIBTLD_EXPORT void tld_clear_info (struct tld_info *info)

This function initializes the info structure with defaults. The different TLD functions that make use of this structure will generally call this function first to represent a failure case.

Note that by default the category and status are set to undefined (TLD_CATEGORY_UNDEFINED and TLD_STATUS_UNDEFINED). Also the country and tld pointer are set to NULL and thus they cannot be used as strings.

Parameters
[out] info  The tld_info structure to clear.

Definition at line 441 of file tld.c.

References tld_info::f_category, tld_info::f_country, tld_info::f_offset, tld_info::f_status, tld_info::f_tld, TLD_CATEGORY_UNDEFINED, and TLD_STATUS_UNDEFINED.

Referenced by tld(), and tld_check_uri().

LIBTLD_EXPORT char * tld_domain_to_lowercase (const char *domain)

This function will transform the input domain name to lowercase. You should call this function before you call the tld() function to make sure that the input data is in lowercase.

This function decodes the %XX sequences of the input into characters. It then converts UTF-8 characters to wide characters so it can determine their lowercase versions.

Warning
The function allocates a new buffer to save the result in it. You are responsible for freeing that buffer. So the following code is wrong:
struct tld_info info;
tld(tld_domain_to_lowercase(domain), &info);
// WRONG: tld_domain_to_lowercase() leaked a heap buffer

In C++ you may use an std::unique_ptr<> with free as the deleter to not have to bother with the call by hand (especially if you have possible exceptions in your code):

std::unique_ptr<char, decltype(&::free)> lowercase_domain(tld_domain_to_lowercase(domain.c_str()), &::free);
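In plain C, the same ownership rule means you keep the returned pointer until you are done with it and with any tld_info structure that points into it. ascii_domain_to_lowercase() below is a hypothetical ASCII-only stand-in used to illustrate that contract. Unlike tld_domain_to_lowercase(), it does not decode %XX sequences or lowercase UTF-8 characters.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical ASCII-only stand-in for tld_domain_to_lowercase(): same
 * ownership contract (returns a malloc()'ed copy, or NULL, that the
 * caller must free()), but no %XX decoding and no UTF-8 handling. */
static char *ascii_domain_to_lowercase(const char *domain)
{
    size_t i;
    size_t len;
    char *result;

    if(domain == NULL)
    {
        return NULL;
    }
    len = strlen(domain);
    result = malloc(len + 1);
    if(result == NULL)
    {
        return NULL;
    }
    for(i = 0; i <= len; ++i)   /* <= also copies the '\0' terminator */
    {
        char c = domain[i];
        result[i] = (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
    }
    return result;
}
```

A caller would follow these steps:

1. Store the result in a variable and pass it to tld().
2. Read info.f_tld and info.f_offset.
3. Only then call free() on the string.

Since free(NULL) is harmless, a NULL result needs no special cleanup.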
Parameters
[in] domain  The input domain to convert to lowercase.
Returns
A pointer to the resulting conversion, NULL if the buffer cannot be allocated or the input data is considered invalid.

Definition at line 492 of file tld_domain_to_lowercase.c.

References tld_mbtowc(), and tld_wctomb().

Referenced by tld_email_list::tld_email_t::parse().

LIBTLD_EXPORT struct tld_email_list * tld_email_alloc ()

This function allocates a list of emails object that can then be used to parse a string representing a list of emails and retrieve those emails with the use of the tld_email_next() function.

Note
The object is a C++ class.
Returns
A pointer to a list of emails object.
See also
tld_email_next()

Definition at line 1483 of file tld_emails.cpp.

References tld_email_list::tld_email_list().

Referenced by email_to_vstring(), and PHP_FUNCTION().

LIBTLD_EXPORT int tld_email_count (struct tld_email_list *list)

This function returns the number of emails that were found in the list of emails passed to the tld_email_parse() function.

Parameters
[in] list  The email list object.
Returns
The number of emails defined in the object, it may be zero.

Definition at line 1528 of file tld_emails.cpp.

References tld_email_list::count().

Referenced by email_to_vstring().

LIBTLD_EXPORT void tld_email_free (struct tld_email_list *list)

This function frees the list of emails as allocated by the tld_email_alloc(). Afterward the list pointer is not valid anymore.

Parameters
[in] list  The list to be freed.

Definition at line 1496 of file tld_emails.cpp.

References list().

Referenced by email_to_vstring(), and PHP_FUNCTION().

LIBTLD_EXPORT int tld_email_next (struct tld_email_list *list, struct tld_email *e)

This function retrieves the next email found when parsing the emails passed to the tld_email_parse() function. The function returns 1 when another email was defined. It returns 0 when no more emails exist, in which case the e parameter does not get set. The function can be called any number of times after it returned zero (0).

Parameters
[in] list  The list from which the email is to be read.
[out] e  The buffer where the email is to be written.
Returns
The function returns 0 if the end of the list was reached, it returns 1 if e was defined with the next email.
See also
tld_email_parse()

Definition at line 1562 of file tld_emails.cpp.

References tld_email_list::next().

Referenced by email_to_vstring(), and PHP_FUNCTION().

LIBTLD_EXPORT enum tld_result tld_email_parse (struct tld_email_list *list, char const *emails, int flags)

This function parses the emails listed in the emails parameter and saves the results, as a list of emails, in the list object.

Parameters
[in] list  The list of emails object.
[in] emails  The list of emails to be parsed.
[in] flags  The flags are used to change the behavior of the parser.
Returns
TLD_RESULT_SUCCESS if the email was parsed successfully, another TLD_RESULT_... when an error is detected

Definition at line 1514 of file tld_emails.cpp.

References tld_email_list::parse().

Referenced by email_to_vstring(), and PHP_FUNCTION().

LIBTLD_EXPORT void tld_email_rewind (struct tld_email_list *list)

This function resets the position to the start of the list. The next call to the tld_email_next() function will return the first email again.

Parameters
[in] list  The list of emails object to reset.

Definition at line 1541 of file tld_emails.cpp.

References tld_email_list::rewind().

Referenced by email_to_vstring().
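The iteration contract of tld_email_next() and tld_email_rewind() can be illustrated with a toy cursor. That contract is:

- tld_email_next() returns 1 while emails remain.
- At the end it returns 0 and leaves e untouched, no matter how many times it is called.
- tld_email_rewind() restarts from the first email.

struct toy_email_list and its functions are hypothetical stand-ins for illustration. They do not parse anything, and they are not the C++ tld_email_list class.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct tld_email_list: already-parsed
 * addresses plus a read position. */
struct toy_email_list
{
    const char *const *emails;
    size_t count;
    size_t position;
};

/* Same contract as tld_email_next(): return 1 and set *e when another
 * email exists; return 0 without touching *e at the end, every time. */
static int toy_email_next(struct toy_email_list *list, const char **e)
{
    if(list->position >= list->count)
    {
        return 0;
    }
    *e = list->emails[list->position];
    ++list->position;
    return 1;
}

/* Same contract as tld_email_rewind(): the next call to
 * toy_email_next() returns the first email again. */
static void toy_email_rewind(struct toy_email_list *list)
{
    list->position = 0;
}

/* Rewind, then walk the whole list and count the emails seen. */
static size_t toy_count_by_walking(struct toy_email_list *list)
{
    const char *e;
    size_t n = 0;

    toy_email_rewind(list);
    while(toy_email_next(list, &e))
    {
        ++n;
    }
    return n;
}

/* Sample data for the usage below. */
static const char *const sample_emails[] =
{
    "alexis@example.com",
    "doug@example.org",
};
static struct toy_email_list sample_list =
{
    sample_emails,
    sizeof(sample_emails) / sizeof(sample_emails[0]),
    0
};
```

Walking sample_list twice gives the same count both times, because toy_count_by_walking() rewinds first. Calls made past the end keep returning 0 without touching their output parameter.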

LIBTLD_EXPORT const char * tld_version ()

This function returns the version of this library. The version is defined with three numbers: <major>.<minor>.<patch>.

You should be able to use the libversion to compare different libtld versions and know which one is the newest version.

Returns
A constant string with the version of the library.

Definition at line 1043 of file tld.c.

References LIBTLD_VERSION.

Referenced by cat_ext(), main(), and tld_encode().

This document is part of the Snap! Websites Project.

Copyright by Made to Order Software Corp.


Snap! Websites
An Open Source CMS System in C++
