Tree Transducer Language Membership

Sorry for vanishing for so long- I was trying to track down the reference for this problem, which is from a Ph.D. thesis from 1977, so was hard to get.  I probably could have (or should have) moved on to the next problem while we were working on that, but doing these problems in order is too ingrained in me to change now.

The problem: Tree Transducer Language Membership.  This is problem AL21 in the appendix.

The description: Given a description of a Top-Down Finite State Tree Transducer (see below) T and a string in its output grammar w, is w generated by some initial string by T?

A Top-Down Finite State Tree Transducer (abbreviated “t-fst” by Reiss in his thesis) defines a tree for rewriting strings into other strings.  Each rule replaces a tree (or a subtree) with a new tree (or subtree).

Example: Here’s an example Reiss uses:

What you’re seeing here is a tree that can rewrite strings of the form an into strings of the form an^2.  The bottom part shows how this set of rewritings can turn the string “aa” into the string “aaaa”.  First, we apply the first rule (on our starting “q1” tree) into the second tree.  Then we have a second rule to replace a q1 tree with a single child and a single grandchild with the same tree without the q1.  We have similar rules to remove the q2 symbols in the tree.  The final tree is a derivation for “aaaa”.

The reason the capital “A” symbols are in the trees is because these trees are parse trees for context-free grammars.  In particular, these trees come from the CFG:

A->Aa | a

Notice though that our tree rewriting rules only turn certain parse trees into other parse trees.

So, an instance of our problem is: Given a result string (such as “aaaa”) does there exist some initial string (such as “aa”) that our tree rewriting rules can generate?  Reiss calls this the “Inverting” a t-fst.

Reduction: Reiss reduces from 3SAT.  Our 3SAT instance will have m variables and r clauses.  We will assume that each variable appears at most once in a clause, and that r is an exact power of 2 (r = 2k).  We can add dummy clauses to ensure this.

First, he defines a “standard” set of tree rewriting rules.  These rules are always the same and do not depend on our SAT instance.  The rules will take a string of the form 1k\$<variable settings>\$, where <variable settings> is a string of m “t” or “f” symbols corresponding to the settings of the variables.

The output of the transformations is a string built out of one substring for each clause: 0m+7\$<m a b or c symbols>.  The substrings for each clause are concatenated together.

Our problem instance is to start with a string in the form of this output transformation and see if an input string exists (and to show that one does if and only if the SAT instance was satisfiable).  Each variable contributes an a,b, or c symbol to the clause substring as follows:

• If the variable does not appear in the clause, we choose a.
• If the variable appears positively in the clause, we choose b.
• If the variable appears negatively in the clause, we choose c.
• We also reverse the ordering of the letters (so variable 1’s letter appears last)

So, suppose we have (v1, v2, ~v3) and (~v1, v3, v4) as two clauses.  Our initial string w would be: 00000000000\$acbb00000000000bbac

We’re looking for a string like 1\$tftt:  1’s equal to the log of the # of clauses, and then the truth values of the variables

Here’s another example from the thesis.  Note that each variable appears in each clause, so there are no “a” symbols:

If the formula is satisfiable, then we have an input string (from the settings of the variables) that will hopefully generate the output string.  The way the rewriting rules work is that the 1’s at the start of the string generate a subtree for each clause (copying all of our truth values), and then from there, we generate the string for each clause.

In each clause, we need to generate m+7 0 symbols as well as the a’s b’s and c’s that correspond to the clause.  Each of the a’s eventually maps to a 0 in our rewriting, which will give us m-3 of the 0’s we need- we still need to generate 10 more.  Assigning a literal to true will get us 3 0’s and setting it to false will get us 2 0’s.  So if the clause is satisfied, we will have 7-9 0’s, and we will have 6 0’s if the clause is not satisfied.  The replacement of the \$ can generate 1-3 more 0’s.  So if the clause is satisfied, we will be able to get our 10 0’s, but if it’s not, we will not be able to.

In the other direction, if some string exists that can generate our clause string w, we know it has to start with k 1’s, then a \$, then m t or f symbols.  The same logic will show that any string of that form that does not satisfy a clause will not generate the correct number of 0’s.  So whatever initial string was created to generate w has to be a way to satisfy our formula.

Difficulty: The hardest thing is to understand what is happening with the tree replacement.  Once you have that, the whole “figure out a way to handle 1-3 (but not 0) ways to satisfy a clause” is something that we’ve seen a few times.  So I guess I’d say a 8 for the actual reduction.