Snap! Websites Grammar version 1.0 working!

The Snap! system makes use of a lex & yacc-like capability so that we can include fields that accept very complex expressions. For example, we want to support fields in HTML forms where you can enter sqrt(sin(20) * 3) and get the expected result.
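To give an idea of what evaluating such a field involves, here is a minimal stand-alone sketch of a recursive-descent evaluator. It does not use the Snap! grammar classes: the expr_eval class, its member names and the short list of supported functions (sqrt, sin, cos, with angles assumed to be in radians) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical stand-alone evaluator; none of these names come from Snap!
class expr_eval
{
public:
    explicit expr_eval(std::string const & input) : f_in(input), f_pos(0) {}

    double run()
    {
        double const r = expr();
        skip_spaces();
        if(f_pos != f_in.length()) throw std::runtime_error("unexpected input");
        return r;
    }

private:
    bool at_space() const { return f_pos < f_in.length() && std::isspace(static_cast<unsigned char>(f_in[f_pos])); }
    bool at_alpha() const { return f_pos < f_in.length() && std::isalpha(static_cast<unsigned char>(f_in[f_pos])); }
    void skip_spaces() { while(at_space()) ++f_pos; }

    bool accept(char c)     // skip spaces, then consume c if it is next
    {
        skip_spaces();
        if(f_pos < f_in.length() && f_in[f_pos] == c) { ++f_pos; return true; }
        return false;
    }
    void expect(char c) { if(!accept(c)) throw std::runtime_error(std::string("expected '") + c + "'"); }

    // expr: term | expr '+' term | expr '-' term
    double expr()
    {
        double r = term();
        for(;;)
        {
            if(accept('+')) r += term();
            else if(accept('-')) r -= term();
            else return r;
        }
    }

    // term: factor | term '*' factor | term '/' factor
    double term()
    {
        double r = factor();
        for(;;)
        {
            if(accept('*')) r *= factor();
            else if(accept('/')) r /= factor();
            else return r;
        }
    }

    // factor: NUMBER | '-' factor | '(' expr ')' | IDENTIFIER '(' expr ')'
    double factor()
    {
        if(accept('-')) return -factor();
        if(accept('('))
        {
            double const r = expr();
            expect(')');
            return r;
        }
        if(at_alpha())      // accept() already skipped the leading spaces
        {
            std::string name;
            while(at_alpha()) name += f_in[f_pos++];
            expect('(');
            double const arg = expr();
            expect(')');
            if(name == "sqrt") return std::sqrt(arg);
            if(name == "sin")  return std::sin(arg);
            if(name == "cos")  return std::cos(arg);
            throw std::runtime_error("unknown function " + name);
        }
        char const * const start = f_in.c_str() + f_pos;
        char * end = nullptr;
        double const r = std::strtod(start, &end);
        if(end == start) throw std::runtime_error("number expected");
        f_pos += static_cast<std::size_t>(end - start);
        return r;
    }

    std::string f_in;
    std::size_t f_pos;
};

// Usage: the sample expression from the text gives the same value as the direct C++ computation.
inline void expr_eval_example()
{
    assert(std::fabs(expr_eval("sqrt(sin(20) * 3)").run() - std::sqrt(std::sin(20.0) * 3.0)) < 1e-12);
}
```

The left-recursive rules in the comments are written as loops here, which is the usual hand-written equivalent. A yacc-like grammar can state them directly.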

At this point though, it is used for the domain names and website names. These are complex enough to justify the grammar. I still have to finish up the grammar of the domain and website implementations, but the grammar itself works. I have a test that checks it in enough detail to prove that it is now in place.

At this point, the domain grammar looks as follows (the lexer is a C/C++ lexer; you cannot change that part at this point):

start: rule_list

rule_list: rule
         | rule_list rule
rule: IDENTIFIER '{' sub_domain_list '}' ';'

sub_domain_list: sub_domain
               | sub_domain_list sub_domain
sub_domain: optional sub_domain_var ';'
          | required sub_domain_var ';'
sub_domain_var: qualified_name '=' STRING
              | qualified_name '=' website '(' STRING ',' STRING ')'
              | qualified_name '=' flag '(' STRING [ ',' STRING ] ')'

qualified_name: IDENTIFIER
              | qualified_name '::' IDENTIFIER

I have plans to extend the qualified names with namespace definitions but at this point it is not required as the qualified name is sufficient to get the full functionality we're looking for.
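As an illustration of the qualified_name rule alone, the following stand-alone sketch splits a name such as global::sub::name into its IDENTIFIER parts. The parse_qualified_name function is hypothetical and not part of Snap!. It assumes C-like identifiers and no spaces around the '::' separator.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper following
//   qualified_name: IDENTIFIER
//                 | qualified_name '::' IDENTIFIER
// Returns false when the input does not match; parts may then be partially filled.
bool parse_qualified_name(std::string const & in, std::vector<std::string> & parts)
{
    parts.clear();
    std::size_t pos = 0;
    for(;;)
    {
        // IDENTIFIER (assumed C-like): [A-Za-z_][A-Za-z0-9_]*
        std::size_t const start = pos;
        if(pos >= in.length()
        || !(std::isalpha(static_cast<unsigned char>(in[pos])) || in[pos] == '_'))
        {
            return false;
        }
        ++pos;
        while(pos < in.length()
           && (std::isalnum(static_cast<unsigned char>(in[pos])) || in[pos] == '_'))
        {
            ++pos;
        }
        parts.push_back(in.substr(start, pos - start));

        if(pos == in.length())
        {
            return true;    // the whole input was consumed
        }

        // anything left must be '::' followed by another IDENTIFIER
        if(in.compare(pos, 2, "::") != 0)
        {
            return false;
        }
        pos += 2;
    }
}

// Usage: three identifiers separated by '::'.
inline void qualified_name_example()
{
    std::vector<std::string> parts;
    assert(parse_qualified_name("global::sub::name", parts));
    assert(parts.size() == 3 && parts[0] == "global" && parts[2] == "name");
}
```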

I suppose you can see that this is quite complex already. Note that our grammar makes use of C++ only (a bit like Spirit, except that ours happens to work.) It is compatible with Qt, and at some point we'll certainly want to make it available to the Qt community. However, right now it's not yet ready to be its own library.

One detail about the current implementation: the grammar is compiled on the fly, while the input data is being parsed, and not in a separate step beforehand. This makes initialization a lot faster and is likely to save many cycles in most cases (when the grammar is not too complex or when the input being parsed does not require all the rules).
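One way to picture this on-the-fly approach is a table of rules where each rule is only prepared the first time the parser needs it. The sketch below is an assumption about the general idea, not the Snap! implementation. The lazy_grammar class, its methods and the representation of a rule as a list of strings are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch, not the Snap! classes: each rule keeps a builder and
// only runs it the first time that rule is actually used.
class lazy_grammar
{
public:
    typedef std::vector<std::string> alternatives;      // prepared form of a rule
    typedef std::function<alternatives()> builder;

    void add_rule(std::string const & name, builder b)
    {
        f_builders[name] = b;       // cheap: nothing is prepared yet
    }

    alternatives const & use_rule(std::string const & name)
    {
        auto it = f_prepared.find(name);
        if(it == f_prepared.end())
        {
            // first use: prepare the rule now and remember the result
            // (at() throws std::out_of_range for an unknown rule name)
            it = f_prepared.emplace(name, f_builders.at(name)()).first;
        }
        return it->second;
    }

    std::size_t prepared_count() const { return f_prepared.size(); }

private:
    std::map<std::string, builder>      f_builders;
    std::map<std::string, alternatives> f_prepared;
};

// Usage: two rules are registered but only the one that gets used is prepared.
inline void lazy_grammar_example()
{
    lazy_grammar g;
    g.add_rule("rule_list", []() { return lazy_grammar::alternatives{"rule", "rule_list rule"}; });
    g.add_rule("qualified_name", []() { return lazy_grammar::alternatives{"IDENTIFIER"}; });
    assert(g.prepared_count() == 0);
    assert(g.use_rule("rule_list").size() == 2);
    assert(g.prepared_count() == 1);
}
```

With this scheme, a rule that the input never reaches costs nothing beyond its registration.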

The following is a small sample of the grammar as it appears in a C++ program. It shows how the rule_list choices are added to the start choices, how the lexer tokens are parsed (the g.parse() call), and how you get the result. g is the grammar. The lexer is initialized with a string.

    // lexer definition
    lexer l;
    l.set_input(input);

    --snip--

    // rule_list
    choices rule_list(&g, "rule_list");
    rule_list >>= rule >= set_new_rule_list
                | rule_list >> rule >= set_add_rule_list
    ;

    // start
    choices start(&g, "start");
    start >>= rule_list;

    if(!g.parse(l, start)) {
        return false;
    }

    // it worked, manage the result (check it)
    QSharedPointer<token_node> r(g.get_result());

The result depends on the functions you attach to your rules. In the sample we see set_new_rule_list and set_add_rule_list. Those functions massage the data and generally save it in a tree that can later be reused or saved. The data can also be used immediately if your grammar allows it.
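The rule_list definition in the sample reads naturally because of standard C++ operator precedence: >> binds tighter than >=, which binds tighter than |, and >>= (an assignment operator) comes last. The sketch below shows one way such operators could be overloaded so that each alternative ends up with its items and its function. The sequence, alternatives and reducer types are hypothetical stand-ins, and this choices class is a simplification that only shares its name with the one in the sample (it takes no grammar pointer and records names instead of calling functions).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins, not the Snap! classes.
struct reducer { char const * name; };              // stands for a callback function

struct sequence                                     // one alternative of a rule
{
    std::vector<std::string> items;
    std::string on_reduce;
};

struct alternatives { std::vector<sequence> list; };

struct choices                                      // a named rule
{
    explicit choices(std::string const & n) : name(n) {}

    // '>>=' has the lowest precedence of the four operators, so it runs last
    choices & operator >>= (alternatives const & a) { alts = a.list; return *this; }
    choices & operator >>= (sequence const & s) { alts.assign(1, s); return *this; }
    choices & operator >>= (choices const & c)
    {
        sequence s;
        s.items.push_back(c.name);
        alts.assign(1, s);
        return *this;
    }

    std::string name;
    std::vector<sequence> alts;
};

// '>>' concatenates items (highest precedence here)
sequence operator >> (choices const & l, choices const & r)
{
    sequence s;
    s.items.push_back(l.name);
    s.items.push_back(r.name);
    return s;
}
sequence operator >> (sequence s, choices const & r) { s.items.push_back(r.name); return s; }

// '>=' attaches the function to a finished sequence
sequence operator >= (sequence s, reducer const & f) { s.on_reduce = f.name; return s; }
sequence operator >= (choices const & c, reducer const & f)
{
    sequence s;
    s.items.push_back(c.name);
    s.on_reduce = f.name;
    return s;
}

// '|' collects the alternatives
alternatives operator | (sequence const & l, sequence const & r)
{
    alternatives a;
    a.list.push_back(l);
    a.list.push_back(r);
    return a;
}
alternatives operator | (alternatives a, sequence const & r) { a.list.push_back(r); return a; }

// Render a rule as text so the grouping can be inspected.
std::string describe(choices const & c)
{
    std::string out(c.name + ":");
    for(std::size_t i = 0; i < c.alts.size(); ++i)
    {
        out += i == 0 ? " " : " | ";
        for(std::size_t j = 0; j < c.alts[i].items.size(); ++j)
        {
            if(j != 0) out += " ";
            out += c.alts[i].items[j];
        }
        if(!c.alts[i].on_reduce.empty()) out += " -> " + c.alts[i].on_reduce;
    }
    return out;
}

// Usage: the simplest case, one rule referring to another.
inline void choices_example()
{
    choices rule_list("rule_list");
    choices start("start");
    start >>= rule_list;
    assert(describe(start) == "start: rule_list");
}
```

With these definitions, the rule_list statement from the sample groups as rule_list >>= ((rule >= set_new_rule_list) | ((rule_list >> rule) >= set_add_rule_list)), so each alternative keeps its own function without any extra parentheses.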


Comments

Replied

McweGUYGgeOd

hehe, to be honest the hardest part is handled by llvm. @thomas, yes, you are definitely right: buffer overflows are a plague. You can never be fully protected from them; at best, you have taken as much protection as possible. One thing that gives some protection in that area is that the actual library and llvm are written in C++ using the standard string, which greatly reduces string overflows compared to using char*. There is also a need for protection on access to arrays inside a Shiva/CTL program; stuff like int array[1]; array[-10000] = 10; could give access to viruses. One other level of protection is that neither CTL nor Shiva allows machine code extensions or provides system functions. But yes, security is one of the goals, and I invite anyone to find security issues and to report them to me.

Replied

Hey, that's a clever way of thinking about it.

Snap! Websites
An Open Source CMS System in C++
