Sabbatical Project Log Fall 2007 Barbara Wahl
FRI 8-24-07 Compared the texts by Linz and Sipser. Need to settle on a specific text which my lab manual will accompany. Re-examined both books and also read many reviews on the www. Linz provides more copious examples and explanation, but Sipser says more precisely in two pages what Linz covers in ten. Conclusion: Will have students buy the Sipser text, but will refer them to the Linz text (available in the library) for extra examples and explanation. Will write my lab manual to comport with the text by Michael Sipser, Introduction to the Theory of Computation, Second Edition (2006). Came across an online Turing machine simulator at http://wwwsys.informatik.fh-wiesbaden.de/weberl/turing/tm/html . May investigate this more at a later time. Read many online syllabi for Theory of Computation. Once again I’m convinced that this course is overwhelmingly being taught without programming assignments, that is, as a course in mathematics. Every syllabus I looked at based student grades on homework and exams and not much else. Printed a list of recommended books related to theory of computation; will compare with our library holdings and consider ordering some of them.
MON 8-27-07 Ran a Google search on “theory of computation” + “programming assignment”; read through the first 110 hits. Found a few courses which require programming, even fewer which provide details of the assignments. Reviewed what I did in teaching Theory (CS 335J) during Winter 2006. There were 4 programming assignments contributing a total of 10% to the course grade. 1) begin implementation of a DFA class; 2) implement “testString” method in DFA class; 3) begin implementation of a CNF (Chomsky Normal Form) class; 4) implement more of the CNF class. Students were also allowed to propose an additional programming assignment for “extra credit.” See section “win 06 labs” for the four programming assignments. These assignments assumed the students would be using C++; I need to decide which language students are most likely to use this Winter and be prepared. Emailed Joe Oldham (Centre College) at the suggestion of Michael Bradshaw. Oldham has made recent efforts to make his teaching of Theory more “hands on” and I’d like to pick his brain, show him what I’m working on. Read articles by Rodgers (FLAP), Rodgers et. al. (JFLAP), and Robinson et. al. (JCT). JFLAP is the only software package of the three that is up-to-date; see JFLAP book (2006). Downloaded JFLAP software from www.jflap.org and learned how to create and test FA. So far, easy!
Tues 8-28-07 Scanned the SIGCSE 2007 proceedings; read three relevant articles. 1) Interactive Visualization for the Active Learning Classroom has good advice about common characteristics of active learning activities: interactive; simple to understand; short time frame; creative and motivational; sometimes collaborative; relevant to the topic being studied. 2) Engagement and Frustration in Programming Projects describes an approach to gauge student engagement and frustration levels with programming projects. I should assess my own programming projects in this way. Their main conclusions: computer science is naturally engaging (use projects which apply what is being discussed in class); students enjoy a challenge, and there is no challenge without some frustration; we should recognize when an assignment was a failure and scrap it; students appreciate the chance to give feedback on projects. 3) Engaging Students in Formal Language Theory and Theory of Computation argues for the use of the Moore Method in teaching ToC. Spent more time learning JFLAP.
Tues 9-4-07: Worked on Java code for Alphabet class & method to print language in lexicographic order. Considered defining an Alphabet as an implementation of the Set interface (Java.util) but decided this would probably just complicate matters.
Weds 9-5-07: Worked on Java code for Alphabet class & method to generate language in lexicographic order. Have decided to store data for Alphabet class as a char array, and will use a Vector (java.util.Vector) of StringBuffer objects to hold languages. Instead of printing the language, Alphabet will return a Vector of all the strings it generates, in lexicographic order, up to length n. This method is now working. Drafted a “lab 1” assignment (Alphabet class) and corresponding document on vault. Started work on a “lab 2” assignment (DFA class).
Thur 9-6-07: Talked with John Collins re: problem with posting “.docx” (Word 2007) documents to website (will save documents as “.doc” as a work-around). Revised lab 1 (Alphabet class) assignment to make it more clear. Drafted “advice to instructor” document for use with lab 1. Worked more on lab 2 assignment (DFA class).
Fri 9-7-07: Read about the class java.util.Scanner for getting input from the console in Java (use in lab 2). Worked more on lab 2 (DFA class). Added “charAt(int n)” method to Alphabet class.
Mon 9-10-07: Revised Alphabet class (lab 1): decided to implement with StringBuffer instead of char array; improved pre- and post-comments; added more testing to the main method; changed “getChars” to “toString”; revised “lab 1 source code” on website accordingly. Revised DFA class (lab 2): improved comments; used “System.arraycopy” instead of for loops for copying arrays; added accessor methods & corresponding test code Started work on DFA class extensions (lab 3): testString method
Tue 9-11-07: Read up on Java’s switch statement. Revised lab 2 code: removed the 4-argument DFA constructor; made input more robust in the constructor (doesn’t crash if you misspell “true” or “false”, etc) . Finished DFA class extensions for lab 3: testString and dialogTest methods. Worked on DFA class extensions for lab 4: getLanguage method, enhancements to dialogTest method.
Wed 9-12-07: Corresponded with Joe Oldham (Centre College). Revised Alphabet class (lab 1): added new method, indexArray, to move some of the messy details in the DFA class to their appropriate place [when a DFA runs its computation on a string, it really needs the string characters’ positions in the alphabet, or their “indices”; this new method accepts a string and returns an int array representing the indices of the string’s characters; also has a boolean return type, which is set to “false” if the string contains any characters not in the alphabet]. Revised “lab 1 source code” to incorporate this new method.
Fri 9-14-07: Revised “lab 1 assignment handout”. Tested lab 1 assignment in CFA lab setting (played student). Decided to split into two labs to allow more time for reviewing/learning Java, and to allow time to have students write their own test code. Decided to save source code files using notepad (.txt format) so that Eclipse formatting is not altered when copying to-and-fro.
Mon 9-17-07: Tested Lab 1 assignment again and found problems with saving source code in .txt format. Reverted to Word (.doc). Found ways to make the formatting copy back-and-forth correctly between Eclipse and Word. Improved the comments in Lab 1 source code and tested lab 1 assignment thoroughly. Next step on lab 1 is to have someone else try it out. Began work on “source code” and “assignment handout” documents for lab 2.
Tues 9-18-07: Finished “source code” and “assignment handout” documents for lab 2 (Alphabet class, part 2). Revised DFA class to use the new indexArray method mentioned above. Got all code working for labs 3-5. Began work on “source code” and “assignment handout” documents for lab 3.
Weds 9-19-07: Started thinking about how to implement an NFA (possible lab 6). Read related articles online (optimizing regular expressions in Java; automatically building transition tables from transition diagrams for NFAs, etc.). Decided the most efficient and obvious approach (and most accessible to students) is to have a method which converts an NFA to an equivalent DFA. This also has the advantage of using the DFA class we’ve worked so hard to construct. Wrote constructor and some basic methods for NFA class (lab 6). The data fields are almost the same as for NFA class, but I had the nifty idea to represent the transition function as a 3-dimensional boolean array. TO DO: write code to convert NFA to equivalent DFA (lab 7?).
Thurs 9-20-07: Started work on conversion method (NFA to DFA). Wrote a helper method “getTransitions” for NFA class which takes a state and a symbol and returns a Vector of Integers indicating all the possible transition destinations for that (state,symbol) pair. (Am ignoring epsilon transitions for the moment.) Wrote a helper method “getE” for NFA class which takes a state “s” and returns a Vector of Integers indicating all the states which are reachable from s using only epsilon transitions – returns E({s}) in the notation of Sipser p.56. This is an interesting programming problem, the first really tricky thing I’ve come across so far in the project. First I notice that in a graph with n nodes, if it’s possible to get from node i to node j, then it must be possible to do so in n-1 or fewer transitions [proof is a nice application of the pigeonhole principle – should discuss with students in lecture]. For example, if a graph has only 3 nodes, no trip between a specified starting node and stopping node can require more than 2 transitions (unless the trip is impossible). So, a recursive procedure might work well for “getE”. As noted above, if the NFA has n states then E({s}) = E_{n-1}({s}) [use subscript on E to indicate max number of transitions allowed]. E_{n-1}({s}) is the union of E_{n-2}({t}) over all states t in E_{1}({s}). E_{n-2}({t}) is the union of E_{n-3}({u}) over all states u in E_{1}({t}), and so on, where of course E_{0}({s}) = {s} for any state s (base case for the recursion). The recursive method to calculate E_{k}({s}) first checks the base case; if k = 0, return {s}. Otherwise, the method calls itself inside of a request to form a union of all E_{k}-1({t}) where t is reachable from s in 0 or 1 epsilon transitions. Representing a set of states by an ordered Vector of Integer objects is unwieldy. Created class called “IntSet” for working with sets of integers. Big improvement! Still need to test the “getE” method and keep working on converting NFA to DFA.
Mon 9-24-07 Cleaned up / standardized method names in labs 1-6. Changed “printAlphabet” to return a String instead of calling System.out directly. Did more work on lab 6 (NFA): Decided to put epsilon transitions in a separate array (“epsilon”) rather than including them with the “regular” transitions – code is now cleaner and easier to read; added boolean method “validState” and used it to make the dialog code more robust (added to DFA class as well. Eliminated “excess” methods from all classes to the minimum needed functionality. Improved IntSet comments and main method.
Tues 9-25-07 Read about javadoc comments (java.sun.com/j2se/javadoc). Consider inserting javadoc comments in all my code. Fixed and tested recursive “getE” method for finding all states reachable from a given state via epsilon transitions (had to make the union method destructive in IntSet class). Wrote a non-recursive version of “getE” [more efficient for space and time]. With a bit of thought, wrote an O(n^{3}) algorithm for filling the array E. Added IntSet[ ] “E” as a new data field in NFA, so that E[i] is only calculated once for each NFA. Since the NFA’s are not mutable once constructed, E[i] does not change either. Improved the testing method “main” in NFA.java Improved the “printNFA” method in NFA.java Read up on the java Math class for arithmetic operations (need to calculate 2^{n} as part of the conversion method).
Wed 9-26-07 Continued work on conversion method (NFA to DFA): added a 3-argument constructor to the DFA class to be used by the conversion method; added a “makeFinal” method to the DFA class to be used by the conversion method. Thoughts on the conversion method: In the standard construction, if the NFA has n states, the equivalent DFA has 2^n states. The start state for the NFA is 0 (in our implementation), so the start state for the DFA is E(0). The alphabet for the NFA is the same as the alphabet for the DFA. The final states for the NFA can be determined from the final states for the DFA without much trouble. The difficult part of the conversion from NFA to DFA is defining the new transition function. The DFA will have a transition from R to S on symbol a iff S is the set of all states s in the NFA for which there exists a state r in R such that, in the NFA, we can transition from r to s with an a-transition in combination with any number of epsilon-transitions. Sipser claims we can FIRST consider the set of reachable states from R via a, THEN expand this set with the E( ) operation. (I am not convinced he is right – can I find a counterexample to his claim that you can always do the symbol-consuming transition before the epsilon-transition?) But, assuming for the moment that Sipser is right (hard to imagine this mistake would have survived into the 2^{nd} edition of his text!), our conversion method will have to do the following, for each symbol a in the alphabet: (1) iterate through all the states (i) in the new DFA; (2) for each i in the range 0 to 2^n – 1, know which subset R of {0,1,…,n-1} is being represented by i; (3) over all states r in R (r is a state in the original NFA), form the union of the sets reachable(r, a) to find all states reachable from R by consuming an a; (4) Let U be this union from step 3, and over all states u in U form the union of the sets E(u); (5) Let S be this union from step 4, and find the numerical representation (j) of the state in the DFA which corresponds to the subset S (S is a subset of the states in the original NFA); (6) the new transition function will take i to j on consuming symbol a. As we see in steps 2 and 5, we need an efficient translation mechanism (going both directions) between integers representing states in the new DFA and sets of integers representing sets of states in the old NFA. Obviously there is a one-to-one correspondence between the states Q’ = {0,1,…,2^n -1} in the DFA and the following related representations: n-bit binary codes. For example, if n = 3 then the original NFA has 3 states, Q = {0,1,2}. The new DFA has 2^3 = 8 states, represented by Q’ = {0,1,2,3,4,5,6,7}. To convert an element i in Q’ to a 3-bit binary code, find its binary digits (b is the conversion function): b(0) = 000, b(1) = 001, b(2) = 010, b(3) = 011, b(4) = 100, b(5) = 101, b(6) = 110, b(7) = 111. Let b(i) = be the binary digits for i. We can generate the digits with arithmetic as follows: find the remainder on integer division of i by 2 – this remainder is a_{0}; reduce i to the quotient on integer division of i by 2, then find the remainder on integer division of (this new value of ) i by 2 – this remainder is a_{1}; repeat these steps of “mod” and “div” until all n bits have been found (a simple for loop). subsets of Q = {0,1,2,…,n-1} (P(Q)). Specifically, for any i from 0 to 2^n – 1, let b(i) = be the binary digits for i. Set f (i) = {j in Q | = 1}. Thus, f(0) = { }, f(1) = {0}, f(2) = {1}, f(3) = {0,1}, f(4) = {2}, f(5) = {0,2}, f(6) = {1,2}, f(7) = {0,1,2}, f(8) = {3}, and so on (a modification of the above simple for loop). an array of length 2^n of boolean arrays. Each individual boolean array can be used to indicate a subset of Q in accordance with the function f above. For example, with n = 3 we have Q = {0,1,2} and Q’ = {0,1,2,3,4,5,6,7}. An array of 8 boolean arrays, each inner array having length 3 (that is, a 2-dimensional 8-by-3 boolean array), makes a nice representation of P(Q): -
i | Array[i][0] (0 in set?) | Array[i][1] (1 in set?) | Array[i][2] (2 in set?) | 0 | False | False | False | 1 = 2^{0} | True | False | False | 2 = 2^{1} | False | True | False | 3 = 2^{0}+2^{1} | True | True | False | 4 = 2^{2} | False | False | True | 5 = 2^{0}+2^{2} | True | False | True | 6 = 2^{1}+2^{2} | False | True | True | 7 = 2^{0}+2^{1}+2^{2} | True | True | True | Notice how simple it would be to initialize this 2-dimensional array with a double for-loop.
To accomplish step 2 above, given i in the range 0 to 2^n – 1, access the ith row of the 2-dimensional boolean array [or do appropriate arithmetic!] to know which set of states in Q is intended. For step 5, the binary codes approach is simpler. The IntSet S is converted to a binary code (for example, the subset {2} of Q = {0,1,2,3} would be converted to 0100 (= 2^{2}) while the set {0,1} in {0,1,2,3} would be converted to 0011 (= 2^{0 }+ 2^{1}) and then the binary code converted to an integer i in the proper range.
Thurs 9-27-07 Regarding the following comment made on 9-26-07: “Sipser claims we can FIRST consider the set of reachable states from R via a, THEN expand this set with the E( ) operation. (I am not convinced he is right – can I find a counterexample to his claim that you can always do the symbol-consuming transition before the epsilon-transition?)” I thought I had a counterexample, but 95% of the way through emailing Sipser about it, I realized a subtle point in his construction: the bottom line is that the DFA needs to recognize the same language as the NFA. Sipser’s construction and my construction might differ in some of the transitions and yet recognize the same language. In particular, for the counterexample I thought I had, his construction seemed to give the wrong result for transitioning from the pair ({1},a) in the DFA, but on closer examination I realized that the state {1} is actually unreachable in the DFA so transitions from there are irrelevant to the language being recognized. Thus, I am reassured that Sipser’s construction is probably correct, but will use my (less elegant, more-obviously-correct) construction in programming the conversion method. In my construction (see Sipser p.56 for notation), the DFA will transition from (R,a) to E(delta(E(R),a)) rather than from (R,a) to E(delta(R,a)) – I allow epsilon transitions either side of the symbol-consuming transition, as makes sense to anyone used to programming NFAs. Added a method to IntSet class to convert an IntSet (subset of {0,1,...,n-1}) to an integer in the range 0 to 2^n -1, as per the discussion above. Added a method to IntSet class to convert an integer in the range 0 to 2^n -1 to an IntSet (subset of {0,1,...,n-1}), as per the discussion above. Added some methods for exponents and logarithms (base 2) to IntSet. Added more testing to main method in IntSet. Changed the “findE” and “reachable” methods to work on a set of states rather than a single state. Added testing code to main for these new methods. With these helper methods in place, proceeded with implementation of steps 1 through 6 in plan above to accomplish conversion of NFA to DFA. Realized that DFA start state needs to be specifiable (because the DFA coming from an NFA might have any state as its start state). Made this change.
Fri 9-28-07 Realized that it will be much better to have “nameable” states in the DFA so that, when the DFA is coming from the conversion of an NFA, the state labels will reflect subsets of the states of the NFA. At the moment, I think it will work to have NFA states represented by whole numbers (0 to n-1) with 0 as the start state. Made this change; the DFA states are now associated with String labels provided by the user (or by the NFA class in the case of automatic DFA generation). Conversion method seems to be working!!! B^) Added “sort” method to IntSet class (insertion sort modified for Vector of Integers). The state labels for a DFA generated by the conversion method are IntSets which are always sorted into ascending order. Modified IntSet class so that all IntSet objects are maintained in ascending order. Implemented “TestString”, “GetLanguage” and “PrintLanguage” methods for NFA class. Found and fixed problem with conversion (DFA start state not being set correctly by the conversion method).
Mon 10-01-07 Did more testing of the NFA to DFA conversion method. Did more testing of the sort method in IntSet class. Started work on CNF (Chomsky Normal Form) class. Learned how to use regular expressions for pattern matching in Java. Got tired of program crashing when an integer was expected (keyboard input) but a non-integer character was entered by mistake. SO, used pattern matching and parseInt (Integer class) to get rid of all occurrences of scanner.nextInt(). Need to update source code handouts to accompany labs 1-5. [Wait until code is stabilized.]
Tues 10-02-07 Created a “CNFRule” class, and will have each CNF maintain an array of Vectors of CNFRule objects for its rules (rule[n] holds all rules with LHS variable = vn). Created methods “addRule” and “deleteRule” in CNF class. Added data field “startToEpsilon” (boolean) to CNF class. Eliminated variable labels in CNF class. Added “contains” method to Alphabet class. Finished CNF class constructor. Cleaned up IntSet comments. Cleaned up Alphabet comments.
Weds 10-3-07 Cleaned up DFA comments; reduced any overly-long methods so each method can fit on a single screen. Idea for an assignment: Have students look at a list of “regular expression” forms in Java [such as Liang p.281] and demonstrate whether each is an actual regular expression in the sense of Theory of Computation.
Thurs 10-4-07 Cleaned up NFA class comments; reduced any overly-long methods so each method can fit on a single screen. In DFA, NFA, CNF: added static data member “scanner” (for keyboard input) so the scanner only needs to be initialized once. Thought of idea for project assignment (application of DFA to programming): first, add functionality to DFA class regarding regular operations. For example, should be able to operate on instances of the DFA class via Kleene closure (*) and union. Once these functions are added, have students write a main program to (1) read a regular expression from keyboard input (perhaps not in full generality – think about this); use DFA class methods to “compile” the pattern and then test input strings to determine whether or not they match.
Fri 10-5-07 Tested the CNF constructor. Continued to develop the CNF class. To facilitate the method “printLanguage (int maxSteps)”, which prints all the strings of terminals generated by the CNF grammar in derivations of length maxSteps or less, added a new class “stringVarTerm”, for representing strings of variables and terminals (as occur in grammar derivations). Added “print language” option to CNF constructor menu.
Mon 10-8-07 Added enhancement to stringVarTerm and CNF classes: when the language of the CNF is generated, the derivation of each terminal string is also generated. When the language of the CNF is printed, user is given an option to also print the derivation of each string. Note: the “print language” command in CNF class can result in the same string being printed many times – this is not a problem, but a good thing. If the grammar is ambiguous (if there exists a string in the language with at least two different left-most derivations) then there is a large-enough value n so that printLanguage(n) demonstrates this ambiguity – some string will appear more than once. By also displaying the derivations, the user can verify that the string is repeated because the grammar is ambiguous (not because the code is working improperly). Did more testing of the CNF class. For example, ran constructor to create the following grammar for 0*10*: v0->1; v1->1; v2->0; v0->v1 v2; v0->v2 v1; v0->v2 v3; v2->v2 v2; v3->v1 v2. Works great! Nice demonstration that the grammar is ambiguous and generates the desired language L = {1,01,10,001,010,100,0001,...} There is a theorem (Sipser p. 130) that to derive a string of terminals of length n > 0 in a Chomsky Normal Form grammar, exactly 2n - 1 steps are required. (My CNF class gives a nice demonstration of this fact, by the way.) Used this idea to write a “testString” method for the CNF class: the user provides a non-empty string and the menuHandler code calls getLanguage(2n - 1) to determine whether or not the string is generated by the grammar.
Tues 10-9-07 Incorporated “testString” method into the CNF constructor menu. Can now test any non-empty string, or epsilon. Did more testing of the CNF class. Programmed the grammar for language 0*1*0^{+}. Rules: v0->0, v0->v1 v2, v0->v2 v2, v0->v3 v2, v1->v2 v3, v2->0, v2->v2 v2, v3->1, v3->v3 v3. Works great! Did more testing of the CNF class. Programmed the grammar for language 1*(001^{+})*. Plan for creating rules: (1) generate 1*; (2) generate 001^{+}; (3) generate (001^{+})^{+}; (4) generate 1^{+}(001^{+})^{+}. Rules: (1) v0->epsilon, v0->1, v0->v1 v1, v1->1, v1->v1 v1; (2) v2->0, v3->v2 v2, v0->v3 v1; (3) v4->v3 v1, v4->v4 v4, v0->v4 v4; (4) v0->v1 v4. Works great!
Weds 10-10-07 Started going method-by-method thru ALL my code, improving comments, eliminating extraneous methods, re-organizing remaining methods. Eliminated “sort” method from IntSet; instead, used methods of Vector class to efficiently add an element while maintaining the ascending order.
Thurs 10-11-07 Moved static binary math methods and “StringToInt” from IntSet to new class, Util. Moved all use of “Java.Util.Scanner” to Util. Added “runMenu” method to IntSet to make testing easier. Revised StringToInt to parse “0” as an integer.
Fri 10-12-07 Improved “runMenu” in IntSet class. Finished going method-by-method thru ALL my code, improving comments, eliminating extraneous methods, re-organizing remaining methods.
Mon 10-15-07 Revised menus for DFA, NFA, and CNF to make the interface more uniform. Found and fixed bug in “getLanguage” method (CNF). Worked through example of CNF grammar for palindromes on {0,1}. Plan: first think of a non-CNF grammar, like S -> epsilon | 0 | 1 | 0S0 | 1S1. Then, convert to CNF form: S -> epsilon | 0 | 1 | xx | yy | xz | yw; T -> 0 | 1 | xx | yy | xz | yw; x -> 0; y -> 1; z -> Tx; w -> Ty. Tested it successfully using CNF class.
Tues 10-16-07 Tested the new menu system for the NFA class. Did more work making the interface uniform across DFA, NFA, and CNF classes. Very pleasing effect, now easy to jump back and forth among the menus.
Weds 10-17-07 Improved menu-handling code for DFA, NFA. Eliminated the data field “E” for NFA; it will be best just to recalculate E whenever the equivalent DFA is found (E was becoming too much trouble to maintain in a valid state since NFAs are now mutable). Made sure that all methods which mutate the NFA also update the equivalent DFA “D”, except where clearly noted in post-conditions comment. Added method “prune” to DFA class, to delete inaccessible states from the DFA. Added method “indexOf” to IntSet class. Added “prune” option to the DFA menu and did some testing. Found problem with prune (not all states in range of the transition function are accessible) – fixed! Added methods “accessible(Intset,int)” and “accessible( )” to DFA class. Found and fixed another problem with prune (case when start state is not 0 ). Began thinking about adding “regular operations” to the DFA and/or NFA class. The regular operations are: union, concatenation, and star (Kleene closure). Union (see Sipser p.59) and concatenation (see Sipser p.61) are binary operations, and so far the interface is set up to work with just one FA at a time. So, I will start by adding the unary “star” operation (see Sipser p.62). The usual construction for “starring” an NFA is to create a new (final) start state, make a transition from new start to old start on epsilon, and add epsilon-transitions from all other final states to the old start state. If the old NFA recognized language A, the new NFA will recognize A* = {epsilon} A AA AAA ... The star-closure of an NFA is an NFA. I added a new menu option to NFA, “replace this NFA with its star closure.” Tested “starClosure” method. This can be assigned as an extra project for students (not central).
Thurs 10-18-07 Changed the “convertToDFA” method (NFA class) so that the DFA is automatically pruned to remove inaccessible states. Changed DFA class to allow user to manipulate two FAs at once, then changed it back (it was not worth the effort at this point, but could be assigned as a student project). Code for this revised DFA class is saved in my folder “Sabbatical 2007.” Had an epiphany (!) regarding Sipser’s algorithm for converting NFA to DFA (see first bullet point, 9-27-07, this log). The reason we only need to allow 0-or-more epsilon transitions after the symbol-consuming transition (rather than both before and after) is, if we needed to do epsilon-transitioning before a certain input symbol was processed, we could just do the epsilon-transitioning AFTER THE PRECEDING SYMBOL had been processed. Even the first symbol is not a problem, because the start state is given by E[0] rather than just [0], so an epsilon transition can be made from start before processing the first symbol. SO, I’ve altered my conversion method to match Sipser’s. This will make it easier to verify correct operation of my code. Eliminated “makeFinal” method from DFA and incorporated it into the 5-argument constructor (now 6 arguments).
Fri 10-19-07 Tested nfa-to-dfa conversion – seems to be working! Did more testing of the starClosure method (NFA class). Tried this example: The NFA for language {aa} has 3 states, (0,a)->1, (1,a)->2, and state 2 is final. The NFA for language {aa}* = {epsilon, aa, aaaa, aaaaaa, ...} is correctly constructed by the starClosure method, but string testing does not work right (DFA was not being updated – fixed). Tried this example: The NFA for language {aa,b} has 3 states, with (0,a)->1, (1,a)->2, (0,b)->2, where state 2 is final. The NFA for language {aa,b}* = {epsilon, aa, b, aab, baa, aaaa, bb, aabaa, ...} – works great! I’m really happy with the “prune” method which makes the equivalent DFA much simpler. Added six “predefined” (constant) objects to DFA class for user convenience (based on figures in Sipser’s text). Added “copy” method to DFA. Added DFA menu option to set current DFA to one of the predefined ones. Added a 5-argument constructor to NFA class to enable predefined objects. Added four “predefined” (constant) objects to NFA class for user convenience (based on figures in Sipser’s text). Added “copy” method to NFA.
Mon 10-22-07 Added NFA menu option to set current NFA to one of the predefined ones. Added a fully-specified constructor to CNF class to enable predefined objects. Added 5 CNF predefined objects. Added “copy” method to CNF. Added CNF menu option to set current CNF to one of the predefined ones. Thought about adding a pushdown automata class. A pushdown automaton is essentially an NFA plus a stack. (The stack allows pushdown automata to recognize some non-regular languages. PDAs are equivalent in power to context-free grammars.) The stack alphabet can be different from the input alphabet, so a PDA has two alphabets. The stack always starts out empty. If we want to detect the bottom of the stack, we can begin the computation by pushing a special symbol such as “$” onto the stack. PDA transitions are complicated. In addition to looking at the current state, the transition function will either process the next input symbol or transition on input “epsilon”, and it can pop and examine the symbol on top of the stack (or transition without manipulating that symbol). The result of the transition, based on current state, input-symbol-or-epsilon, popped-stack-symbol-or-epsilon, is a set of zero or more pairs (q,s) where q is the new state and s is the symbol to be pushed onto the stack (s = epsilon is also allowed, meaning no symbol is pushed onto the stack). If the PDA is in a final state when all the input symbols have been processed, the input string is accepted. (The stack is not required to be empty at the end of processing for the string to be accepted.) Since PDA’s have the same computational power as context-free grammars, one could “run” the PDA computation by simulating it with an equivalent CNF grammar. However, this is quite involved (have to convert PDA to CFG to CNF). The implementation of a PDA class seems too complex to be useful in this context (weekly 1-hour labs in a course which is essentially theoretical). In fact, it would be much simpler to implement a TM (Turing machine) class, since Turing machines do not require non-determinism (every nondeterministic TM has an equivalent deterministic TM). This would make a good “extra” student project. Added code in CNF class, giving user a choice – print language with possible repeats (if grammar is ambiguous, it may become apparent from examining this output) OR print language with no repeats (rewrite string-generating method to avoid generating duplicates; method will produce more-readable output). Generally works well, but in case of example #3 (Sipser G6, p.108), it is quite slow to print without repeats. Added another utility method to Util class: answerIsYes( ). Reads a String response from the keyboard, returns true if response begins with ‘y’ or ‘Y’. Used this method to eliminate repetitious code in other classes. Tried using a hashtable (Java.util.HashTable) to improve the efficiency of generating CNF language without printing repeats; now am running out of heapspace on example #3 (Sipser G6) with max derivation length = 9. Idea: instead of checking string length at point of dequeing, check it before enqueing – should keep from wasting so much time and space with “q” in getLanguage and getLanguageUnambiguous methods. This helps! However, doing the same example with repeats allowed causes program to run out of memory. (G6 is a very complicated grammar)
Tues 10-23-07 Woke up determined to improve efficiency of the getLanguage method in CNF class. Here is my 6-point action plan: (1) In StringVarTerm class, add a data field “vars” which keeps track of how many variables are in this string (in a CNF grammar, if there are k variables in a string of variables and terminals, it will take k applications of type 1 rules, var -> terminal, to create a string of terminals); (2) In CNF class, store type 0 and type 1 rules separately to make separate application of different type rules efficient (if we only want to have a maximum of 9 steps in our derivation, for example, we should apply type 0 rules no more than 4 times total, as the remaining 5 rule applications need to be type 1 to bring 5 variables to all terminals); (3) Write my own hashtable for open chaining using an array of Vectors – I really don’t like the “black box” feeling I get from using the canned hashtable (Java.util.Hashtable); can track and report load factor each time the hashtable class is used (best efficiency is expected when table size is prime and load factor is approximately 1, i.e., array length ~ number of items to be stored); (4) In the getLanguage methods in CNF class, don’t apply type 0 rules if current.numSteps + current.vars >= maxSteps (this reduces the number of extraneous strings generated in the search for terminal strings); (5) Use new Hashtable class to store “used” StringVarTerm objects in finding the CNF language unambiguously (use key = this.toPrettyString( ) so that only the actual string is relevant, not the derivation used to arrive at that string); (6) In the getLanguage methods in CNF class, “Vector” is a bad choice for implementing a queue (all that shifting takes time); by giving up some space, can save time – don’t actually remove items when dequeing, just increment a “head” counter (easier than creating a SLL class and a Queue class, which is my backup plan). Did (1) as outlined above. Did (2) as outlined above. Added an “Association” class to my project to help with implementation of hashtable (table will store associations, where an association is a key-value pair). Did (3) as outlined above. Did (4) as outlined above. Did (5) as outlined above. getLanguageUnambiguous is now running much faster than yesterday! Using example #C3 (Sipser’s G6), can now test strings of length 6 or less. With less-complicated grammars, can test much longer strings. getLanguageUnambiguous reports the maximum hashtable load factor which occurred during the computation, allowing the user to adjust the tableSize accordingly. Did (6) as outlined above – used Vector with a moving “head” index and lazy deletion to make a queue which avoids shifting elements. It’s running well! The pre-defined objects are a big time-saver in testing these classes. B^)
Weds 10-24-07 Did more testing of “getLanguageUnambiguous” in CNF class – found error (hashtable method “contains” is not working for this application).
Thurs 10-25-07 Corrected the code for “contains” in Hashtable class In CNF class, “getLanguageUnambiguous” method now working! Played around with re-arranging order of rules in pre-defined CNF objects (to see effect on order in which language strings are generated).
Mon 10-29-07 Improved “main” method for Util class (added code to test the input methods). Added a “pump” method to DFA class: given any string s (in the language of the DFA) of length >= numStates, finds the first repeated state in the computation on s and reports values for x, y, z which satisfy the conditions of the Pumping Lemma (see Sipser, p.78-79). Added “pump” method to constructorMenu in DFA class.
(Tues, Weds, Thurs: worked on Discrete Math syllabus and CS curriculum proposals)
Fri 11-2-07 Improved the messages printed to console by “pump” (DFA). Added static method “printString(String)” to Util class; when the parameter is a String of length zero, the method prints “epsilon”; otherwise, it prints the String. Added “pump” option to NFA menu & tested.
Tues 11-6-07 Idea for an “extra” student assignment: write a “prune” method for CNF class to automatically detect and eliminate “unfruitful” variables (those which will never generate strings of terminals). For example, in the grammar S -> A B | A C, A -> a, B -> b, C -> D D, the variables C and D are unfruitful and could be eliminated without changing the language of the grammar. (Their elimination would improve the efficiency of the getLanguage methods for CNF.) Added another utility method to Util, "readBoolean()"; reads a string from keyboard input, and returns "true" if string begins with 't', 'T', 'y', or 'Y'; returns "false" if string begins with 'f', 'F', 'n', 'N'; queries again if none of the above is true. Once more, went method-by-method thru ALL my code to improve clarity and formatting.
Wed 11-7-07 Created three diagrams of class dependencies to explain the relationships among the classes I've created. Mapped out a tentative plan for devising 13 labs plus extra projects (I will provide code for Util and IntSet): lab 1: Alphabet class lab 2: Alphabet class lab 3: DFA class lab 4: DFA class lab 5: DFA class lab 6: NFA class lab 7: NFA class lab 8: NFA class lab 9: CNFRule class lab 10: StringVarTerm class lab 11: CNF class lab 12: CNF class lab 13: CNF class additional ideas for programming projects
Thur 11-8-07 Added copy constructors to most of the support classes. Revised the implementation of pre-defined objects in DFA, CNF classes to eliminate need for deep copy operation. Demonstrated my software and labs for Prof. Skiadas.
Fri 11-9-07 Revised the implementation of pre-defined objects in NFA class to eliminate need for deep copy operation. Changed "derivation" field in StringVarTerm class to a Vector of String objects (don't need to reference entire StringVarTerm objects in the derivation). Implemented javadoc commenting. "The Eclipse Java compiler now processes Javadoc comments. Search reports references in doc comments, and refactoring updates these references as well. This feature is controlled from the |