Скачать 457.9 Kb.

HARDWARE ACCELERATION OF CONSTRAINED SATISFACTION AND LOGIC OPTIMIZATION PROBLEMS USING VELOCE EMULATOR Edited by Marek Perkowski Chapter 1. Introduction, Marek Perkowski. Chapter 2. Satisfiability, CHAPTER 1. INTRODUCTION Draft 1.23 Version February 11, 2011 By Marek Perkowski
It is wellknown that there are many problems in robotics, computational intelligence, artificial intelligence, computer vision, scheduling of industrial operations, planning of actions (for instance actions of an assembly line), computer aided design, computational biology, genetic engineering and other areas that require extremely fast processing speeds for specialized problems, assuming however a limited cost of the computing system – so that a supercomputer is not used for computations. These are basically logic, not arithmetic problems. Many of these problems can be formulated in a maximally simplified form as logic puzzles that you can find in high school textbooks or popular magazines. The books proposes to teach many important concepts of hardware design using a sequence of relatively simple problems. This is not a systematic textbook on logic design similar to Wakerly or a texbook on logic synthesis similar to Hachtel or Kohavi. Our approach here is pragmatic – we are interested not in design algorithms for standard designs but in intuitive methods that you can apply to some special designs, methods that require some kind of ingenuity. This way, our book can be a supplement to standard textbooks, and is especially suited to teach high school students who look for advanced digital design projects. Field Programmable Gate Arrays (FPGAs) are special electronic devices that their hardware can be programmed in arbitrary way by their user and thus they can replace standard computers in many applications. The user can program arbitrary logic gates, their structures and connections, including processors with high parallelism. You can find many FPGAs in your cars, refrigerators and daily life “embedded systems”. FPGAs are also used to build inexpensive supercomputers; specialized rather than generalpurpose ones. Using FPGAs a student can design his or her “own computer” dedicated to some task or a general purpose processor with their own list of instructions. Several FPGAbased hardware accelerators have been already built or proposed for specialized tasks that require high computing speeds. They allow to accelerate algorithms by implementing them all or their crucial parts, in hardware. Such accelerators have for instance applications in robotics, multimedia and medical diagnostics. For example, an imaging system for medical application may use multiple ASICs and FPGAs. This book is intended as a supplemental textbook for undergraduate students in Electrical Engineering and Computer Science. It shows how to design special classes of hardware accelerators, with special emphasis on intelligent robotics applications. We will formulate three classes of problems that can be hardwareaccelerated, and in our opinion, should be accelerated. As the hardware/software system to help design these accelerators we select the VELOCE hardware emulator from Mentor Graphics. Our goal is thus to teach both computer architecture of special high performance systems and the methods to emulate and verify digital designs using modern software/hardware tools, like the VELOCE emulator from Mentor. Our exercises and projects serve both of these goals. Students are encouraged to solve them individually or in groups. In our graduate classes, students solve the problems together with the professor interactively in the class, and next simulate and emulate them individually. We discuss merits and demerits of various created by variants of architectures. The book’s main original thesis is that problems that massively use logic operations need a different type of hardware than that used in contemporary standard computers, or in DSP computers, in which architectures the arithmetic operations such as multiplication are accelerated. Our interest is in problems with massive parallelism but simple processors – such processors may have no arithmetic unit and multipliers at all! Such models of computing have been proposed by several new technologies such as nano, quantum or DNA computing. An example of a logic problem that needs hardware acceleration is the Satisfiability Problem (SAT). Students will learn in this book how popular logic puzzles or robot planners can be reduced to SAT. SAT problem is formulated as follows: 1) Given is a Boolean equation (Boolean function F) with many input variables and one output variable, 2) Find if there exists an assignment of binary values to Boolean variables that function F is satisfied (has value 1) and if yes, find what is this assignment of Boolean values to variables. In other words, find a binary vector of all variable values (called a minterm) that satisfies the logic equation, if such vector exist. In similar formulation we may look for an assignment of not all, but a satisfactory number of variables. Many important and very practical problems can be reduced to the SAT problem as formulated above. For instance in robotics, these problems are usually solved with advanced software programs. Special computers for this task were however also built for CAD applications by NEC, IBM, and other companies, because speeding up SAT would enable efficient solving of important problems that are reducible to SAT. There are thousands of such problems. There are also several important problems that are not identical but similar to SAT. For instance, Tsutomu Sasao from Kyushu Institute of Technology in Japan built a logic machine to solve the logic Tautology problem. Tautology problem is to check for a given function F if F = 1 for all combinations of values of variables that are used in function F (we call them the input variables of F). Many machines to solve Boolean problems, such as solving Boolean Equations, were built in Russia and Germany in 19^{th} and 20^{th} Centuries [ref, Zakrievskij]. We believe that future computers Constraint Satisfaction Problems (CSP) or robotic problems for CAD problems, will use hardware accelerators of various kinds, including logic accelerators as those proposed in this book. These will especially include special hardware accelerators to solve the kinds of logicintensive problems that we are interested in here. This book is a speculation about the nature and technology of general purpose logic accelerators for a selected set of important problems. These problems have been never addressed in a unified way so far and as far as we know, they were never a subject of a book. Hardware emulators such as Veloce are used to emulate/simulate new digital designs, usually general purpose processors, DSP and special embedded systems. They serve thus to verify if the hardware is doing what it was expected by its designer to do. Sometimes the emulators can also help to evaluate testability or speed of the designed prototypes. In this book we want to achieve these goals, and even more. We want to check if a hardware emulator can be used as a hardware accelerator in order to obtain a significant speedup over a standard software implemented on a standard computer (such as a laptop). If yes, then Veloce will be used in the next phase to run, via internet, the realtime control software for mobile robots equipped with stereovision. Veloce box will become thus a “poor man specialized supercomputer” for our robotics projects. A complete cycle of designing a logic accelerator processor will include therefore the stages of designing, simulating, emulating and verifying for correctness and analyzing the potential speedup over standard software for a PC. Even if we will find out that the hardware emulator Veloce is not giving a sufficient speedup, these experiments will help us, for sure, to evaluate what would be a possible speedup of a VLSIbased (or FPGAbased) system that have been first prototyped using the Veloce tool. For this, we should approximately know only the delays of logic gates in the selected by us implementation technology, such as one of Xilinx or Altera FPGA architectures. The emulated design will be thus redesigned with other tools, widely available free FPGAbased software tools such as logic synthesizers and physical design tools, but with a small risk that the design will have different speed or functionality, as the circuits found and emulated on VELOCE will be very accurate. Therefore, there is no danger that the material developed in this book will be of no practical use for its readers. The readers will learn about highperformance special processors, reconfigurable computing, use of CAD tools and last but not least, many constraint satisfaction problems (such as logic puzzles or known mathematical problems) and how to solve them. The general class of CSPaccelerated algorithms should have a wide applicability to logic problems; for automated logic circuit synthesis and optimization, machine learning, robotics, vision, and directed knowledge discovery. It is so because, as it will be shown in this book, all these research areas are closely related. Although some problems and their reductions are known from classical algorithms, they may be new to our reader, so all background material will be explained to make the book selfcontained. The book assumes only the knowledge of basic logic design and machines, on the level of PSU ECE 271 class (second class in logic design). This material can be found in Wakerly’s book, or equivalent. The book is intended for undergraduate students and even smart high school teens. Therefore, dear reader of the book, please help us to improve it when something will be not clear. Grammar, interpunction, organization, examples should be improved. Please correct the files that you get from me and send them back to me. 1.2. Why Logic Accelerators are superior to classical computers when solving certain classes of problems. This book is devoted to some aspects of designing one type of future computers – computers with accelerated logic operations. We can call them the “logic accelerators”, or “logic computers” as they have been historically called. One may ask “Why such computers are of interest and why are they are expected to be more powerful than the standard computers of year 2010? “
1.3. Towards combinatorial problem solving. 1.3.1. NP problems and CSP problems. It is popularly known, even among nonspecialists, that modern computers and all integrated circuits are built using a computer equipped with ComputerAidedDesign software. Humans are just not able to deal with the enormous complexity of such designs without the use of computers in all stages of simulating, designing, optimizing, verifying, validating and testing modern digital systems. This is why CAD tools are developed for all kind of digital and analog design and will remain to be perfected and made more powerful and general. It is however less well known that several basic problems in Computer Aided Design of standard logic circuits, parts of standard software packages, are “NPhard”. NP stands for nonpolynomially, relating to the complexity that is nonpolynomial with respect to the size of the problem. If for instance the size of the problem is n and the complexity of the algorithm is 2^{n}, the problem is nonpolynomial. The problem of complexity 4n^{3}+3n^{2}+8n+99 would be polynomial. Many familiar problems known to be hard to solve (like the travelling salesman problem) are NP problems. The word NPhard means that these problems are optimization problems that are counterparts of the “NPcomplete” decision problems. NPcomplete problems are decision problems that allow verifying that S is a solution to problem P in polynomial time, but these problems need an exponential time to find the solution. The solution to an NPcomplete problem is of Yes/No type. An example of such problem is the Satisfiability Problem in which we have a Boolean formula and we have to answer a question: “does there exist an assignment of values to Boolean variables from the formula such that this formula is satisfied?” Many CAD and robotics problems can be thus reduced to SAT (satisfiability) and few other similar basic combinatorial search problems. These are called the Constraint Satisfaction Problems or CSP problems for short. CSP problems are those that we have a set variables and a set of constraints on values that can be assigned to these variables. We look for a mapping of these variables to their binary values such that all constraints are satisfied. For instance, a problem of coloring a graph with the given specific number of colors K, such that every two adjacent nodes (nodes connected by an edge) have different colors  is a CSP problem. It is an NPcomplete problem. The minimum number of such colors for a given graph G is called the chromatic number of graph G. If one is asked to find the coloring with the number of colors equal to an unknown chromatic number, then it will be the NPhard optimization problem. If the numerical value of the chromatic number is known, the coloring problem will be NPcomplete. In this book we will deal with both types of problems; NPcomplete and NPhard. 1.3.2. Building a circuit for a very simple oracle. Building SAT oracle is just to realize a circuit for the Boolean Function F from the SAT problem. This function in general can have arbitrary operators (logic gates) and arbitrary form (structure, number of logic levels). A commonly used form is the “Product of Sums of literals of variables form”. (A literal is a variable or a negated variable). An example of a SAT problem with Product of Sums (POS) formula of F is the following: F = (a’ + b’ + c) (a’ + b’ + c’) (a + b’) (b’ + c). Is there a set of literals that satisfies F? Oracle for this formula would have one AND gate with 4 inputs and four OR gates, each of OR gates would realize one of OR terms such as (a’ + b’ + c), (a’ + b’ + c’), (a + b’) and (b’ + c). Outputs of OR gates would be connected to inputs of the AND gate and the output of the AND gate would be F. We would generate all inputs to OR gates to find a solution. We leave to our reader the task of drawing the circuit and simulating it with various input combinations of variables a, b and c. The book will illustrate many important and practical problems, from robotics, CAD and other areas that can be reduced to SAT. Now we will present one more example of building a simple oracle for a problem other than SAT, just to illustrate where we are heading for in our book. The problem is this. We want to color nodes of the graph from Figure 1.1 below with as few colors as possible, so that any two nodes linked by an edge have different colors. Assuming that we have no any knowledge of the graph that is being colored other than that this graph has five nodes, we have to assume pessimistically that in the worst case the graph needs as many colors as there are nodes, which means five. Five numbers need at least 3 bits to encode them, it would be too bad to have this kind of a problem for a graph with 10,000,000 nodes which would be colorable with 2 colors, but let us make important point again that we have absolutely no information about the data (the graph properties that may help to evaluate its chromatic number) in this variant. We have only pure graph data, nothing else. However, if we would know some additional information, for instance that the graph is planar, we could solve this problem much more efficiently. For example, one can use the famous “Four Color Theorem” to know that only four colors are sufficient to color a planar graph, and thus encode the colors with only two bits. Thus, “knowledge is power” – any piece of general knowledge can help to solve the problem by designing a smarter search algorithm for it. Figure 1.1: Graph for coloring with five nodes. It is colored with red, blue and yellow colors in such a way that every two neighbor nodes have different colors. The chromatic number of this graph is 3. Assuming no knowledge of the chromatic number of the graph the encoding requires three bits for each color and is shown as in Table from Figure 1.2 below. One particular example of encoding another simpler graph is shown in Figure 1.5.
Figure 1.2: Assignment of bits to encoded colors of nodes for the graph from Figure 1.1. An inequality comparator circuit is used to compare two nodes of the graph, as shown in Figure 1.3 for nodes a and b. Such comparator is connected to encoding bits of any two nodes that are linked by an edge in the graph. If the colors of nodes a and b are the same then the output of the comparator will be zero. If the codes are different (which is good) then the output will be 1. Therefore, if oracle has such a comparator for every two nodes of the graph linked by an edge and if a global AND gate of outputs of comparators is created, the output of this AND gate will be one for a good coloring and will be a zero for a bad coloring. Even if only for one pair of neighbor nodes of the graph the proper coloring is violated, the AND gate will produce a 0. See Figure 1.6 for the standard (combinational) oracle. Figure 1.3: The inequality comparator used in Map Coloring and Graph Coloring problems. Here it compares node a with node b. Observe that the size of this comparator depends significantly on the possible maximum number of colors. The comparator produces “1” at its output if the arguments a and b are different binary vectors of width 3. The binary signal (a ≠ b) is also called a predicate. (a) Figure 1.4: (a) The inequality comparator from Figure 3.3.3 applied assuming five or more ( ≤ 8) colors in the graph. This is a schematic for the inequality operator circuit, which is used in many oracles. The classical schematics of the comparator using EXOR, NOT and AND gates is shown in Figure 1.4a.
Figure 1.5: Encoding of colors for the graph coloring oracle of another graph having 3 nodes. Figure 1.6: Principle of graph coloring applied to a simple graph from Figure 1.5. This is a classical oracle. In this and previous graph coloring problem we are not checking for a minimal solution. We look here only for a coloring that satisfies the constraint of correct coloring. Thus every proper coloring that uses any 3 of 5 colors is good (this example is trivial, but we wanted to have a simple circuit for the example). At this point our only goal was to explain the concept of an oracle for a constraints satisfaction problem different than SAT. Remember that many oracles can use various constraints similar to the constraint (A B) used here. In this example the oracle is very simple and can be designed by hand. In general, the oracle in the processor is a very complex circuit with many logic blocks; its design will require automation and the logic synthesis CAD should be used to build more complex oracles. Think about a graph coloring problem with 10,000 nodes. Think about some image processing system that uses many types of constraint operators other than the inequality constraint (A B) used above. Many such problems exist in real life. The need to design CAD tools for hardware design of large and complex oracles is a one more reason for us to use the VELOCE emulator. 