Volume I, Section I: Administrative Items Cover Sheet


2.4.5 The Lincoln Labs 3D SRAM Experience

DARPA funded the development of a pioneering 3D process at Lincoln Labs based on tungsten through-oxide vias (TOVs) through stacked FDSOI CMOS wafers, with the goal of fabricating compact sensor arrays called Vertically Integrated Sensor Arrays (VISA). In these arrays the sensor area can be extremely large, with all the signal-capture circuitry positioned in 3D fashion under the sensor tier of the stack. 3D MPW runs were offered to universities, and the RPI team designed a three-tier 3D SRAM for this process.

Figure 2.4.5-1. Cross-section of Lincoln Labs 3D Chip Stacking Process showing Tungsten TOVs.

The Lincoln Labs process uses oxide-to-oxide full-wafer-to-full-wafer bonding, and each tier uses 0.18 µm FDSOI CMOS wafers, but the process requires rework due to particulate defects. Nevertheless, it does yield working parts, especially for digital circuits. In particular, one of our chips fabricated in this process is a 3D SRAM designed by the RPI team. Another chip was submitted and fabricated to explore 3D Floating Body Cell (FBC) DRAM. If successful, this type of DRAM can be combined with SRAM in later efforts to yield any combination of speed or density desired. Ultimately, MRAM could be one of the tiers. MRAM is a magnetic memory with a possible bit density far greater than that obtainable with disks, and it eliminates the rotational and head-seek latency of conventional disks.

Figure 2.4.5-3. Three Tier 3D SRAM Designed by RPI and Fabricated by MIT Lincoln Labs using a DARPA 3D MPW. Chip dimensions 5mm x 10mm.

Figure 2.4.5-4. Measured Waveforms from the 3 Tier 3D SRAM shown in Figure 2.4.5-3.

An additional benefit of 3D stacking, employed in this design, is the ability to “fold” word lines as shown in Figure 2.4.5-5, fanning out one third of each word line on each tier of each block. This resulted in the improvement in access time shown in Figure 2.4.5-6. Additional global access lines were shorter in the x direction as a result of this block organization; this extra access-time improvement is not included in the comparison of Figure 2.4.5-6.
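The folding arithmetic above can be sketched directly. This is an illustrative calculation only, assuming the word line is split evenly across the three tiers:

```python
# Word-line "folding" arithmetic for a three-tier stack.
# Illustrative only: assumes each tier drives an equal 1/3 of the line.
tiers = 3
full_length = 1.0                    # normalized 2D word-line length
folded_length = full_length / tiers  # each tier fans out 1/3 of the line
reduction = 1.0 - folded_length      # ~66% shorter, per Figure 2.4.5-5
print(round(reduction, 2))           # -> 0.67
```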

Figure 2.4.5-5. Per-block word line organization for the memory shown in Figure 2.4.5-3, showing a 66% length reduction.

The 3D SRAM was designed with a 4 x 384b, or 1536b-wide, cache access bus. For the 0.15-micron process this provides an impressive 96 GB/s of data bandwidth.
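The bandwidth figure follows from the bus width and the cache access rate. The following is a back-of-envelope check; the 500 MHz access rate used here is our illustrative assumption, since the text states only the bus width and the resulting bandwidth:

```python
# Back-of-envelope check of the 3D SRAM cache bus bandwidth.
# The 500 MHz access rate is an assumed value for illustration only.
bus_width_bits = 4 * 384              # four 384-bit ports = 1536 bits
access_rate_hz = 500e6                # assumed cache access rate

bandwidth_gb_per_s = (bus_width_bits / 8) * access_rate_hz / 1e9
print(bandwidth_gb_per_s)             # -> 96.0 (GB/s)
```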

Figure 2.4.5-6. Comparison of Read Access Times in 2D and 3 Tier 3D. 1-Decoder, 2-Wordline + Wordline_driver, 3-Bitline, 4-Sense_Amp.

One can see from this comparison that the wire-shortening effect in three tiers is only 14%. So while a welcome improvement, wire shortening has only a limited impact; wider bus widths have a far larger one. Still wider buses are possible, of course, but with access to only one cache line address and only one task thread, wider widths have diminishing benefits. This is because instruction branching can force a jump to another cache line, which may force a cache miss, or data locality may be limited. The Dinero cache simulator can be used effectively to measure this impact. However, the number of vertical vias possible is much larger than 1536, and the natural question is how to exploit this extra bandwidth. Several strategies are possible, as suggested in Figure 2.4.5-7, which may use the three tiers differently.

Figure 2.4.5-7. Three Strategies that can Exploit More Vertical Bus Width. (left) Simultaneous Access of Different Addresses on Multiple Tiers, (middle) Simultaneous Access on Different Blocks on Different Tiers, (right) Simultaneous Access on Different Ports on Different Blocks on Different Tiers.

Investigating these strategies is one of the thrusts we propose to study. They also involve devising memory management schemes to coordinate these wider access buses. One feature that must be added to the simulations is multithreading: when one of these ultra-wide buses becomes tied up with one cache transfer, the processor can switch to another task, which in turn can access a different tier, block, or port. That access can be undertaken in a hidden fashion, so that the two accesses occur nearly simultaneously. The amount of memory provided in the 3D stack can be traded against bandwidth, as codified in Emma’s Law [11]:

This states that the number of threads that can be sustained depends on the bus bandwidth and cache size. The most powerful effect on thread support is raw cache size, but increased bandwidth provides an augmenting approach. The two operating together deliver the ultimate in thread capacity, and 3D provides both in copious abundance while permitting one to be traded for the other. The Technical Merit of these strategies is that they are needed in the presence of “cache noise” [12]: they provide additional ways to exploit the huge bit bandwidth available, permitting more aggressive use of predictive cache management. Multicore chips can have various interconnections with memory. Perhaps the simplest is sharing of an ultra-wide common bus.

Figure 2.4.5-8 (left). Multicores Shared Bus with Shared Memory. Figure 2.4.5-8 (right). Shared Ultra-wide Any-to-Any Crossbar Switch Interconnection Fabric.

The shared-bus concept in Figure 2.4.5-8 (left) unfortunately means that all processors must contend for that scarce resource, whose use must be arbitrated. If enough transistors are available from Moore’s Law scaling, the bus should be replaced by an ultra-wide crossbar switch as shown in Figure 2.4.5-8 (right). The replacement of the shared bus with an ultra-wide Any-to-Any crossbar switch preserves much of the flexibility of the shared bus but vastly reduces arbitration conflicts. Finally, by moving much of the memory into the package using 3D, the memory-access choke point can be mitigated. At the same time, pad driver power is reduced, and ESD and pad parasitics for vertical connections are eliminated. However, the number of transistors involved in the crossbar switch and the memory management needed to implement these ideas may be so large that an extra tier in the memory-over-processor stack may be required to accommodate these circuits, again using 3D.

Figure 2.4.5-9 (left). Using 3D to Move Much of the Memory into the Package for L1 and L2 Cache to Track Emma’s Law, Increasing both Memory and Bandwidth. Figure 2.4.5-9 (right). Memory-over-Processor 3D Stack (Including Crossbar and Memory Management Tiers).

Some early results on multicore CPI calculations were obtained [10] for multicore processors under very simplified assumptions, where easily decomposed threads could be accommodated in the existing software tools, which include CACTI and DINERO [13]. The code set included six benchmark programs. These results were for multicore processors with a 3D cache in which accesses to the 3D cache memory were independent, went to separate tiers, and had no memory-access conflicts.

Figure 2.4.5-10: Execution time of 6 programs when executed on a single core, multiple cores (2, 4), and the 16/32 GHz processors designed in the preceding sections (with 3D memory, BiCMOS memory, and context switching). The impact of bus width is explored in the multi-core processor case. Architectural setup as in Figure 10(a).

Figure 2.4.5-10 shows that the wide-bus effect starts to saturate at 256b per processor for multiple cores that essentially have their own independent 3D captive memory on a different tier. Of course this is not totally representative of all code. Practical systems would behave more like a shared-memory system, with different processes sharing the memory system, and one would need to implement cache-coherence protocols to maintain valid data in all the cache banks. Such a system can be simulated with a shared-memory multiprocessor simulator such as RSIM [13][14]. Figure 2.4.5-11 shows the RSIM architecture, which has been adapted to a multicore 3D memory system and which most closely resembles Figure 2.4.5-8 (left).

Figure 2.4.5-11: RSIM architecture adapted to 3D multi-core processor memory stack.

Figure 2.4.5-12 shows the improvement in performance, in terms of CPI and execution time, with memory bandwidth for a multicore system implemented with an architecture similar to the CC-NUMA distributed-memory simulator RSIM. The simulator provides the option of MESI or MSI protocols to maintain cache coherence. The simulations here use the MSI protocol to maintain coherence between the different memory banks.
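The MSI protocol used in these simulations can be summarized by its state transitions. The following is a minimal illustrative sketch of MSI (Modified/Shared/Invalid), not RSIM’s actual implementation:

```python
# Minimal sketch of MSI cache-coherence state transitions.
# Illustrative only; RSIM's internal implementation differs in detail.
MSI_TRANSITIONS = {
    # (current state, event) -> next state
    ("I", "local_read"):   "S",  # read miss: fetch a shared copy
    ("I", "local_write"):  "M",  # write miss: fetch an exclusive copy
    ("S", "local_write"):  "M",  # upgrade: other sharers are invalidated
    ("S", "remote_write"): "I",  # another core writes: invalidate
    ("M", "remote_read"):  "S",  # another core reads: downgrade to shared
    ("M", "remote_write"): "I",  # another core writes: invalidate
}

def next_state(state, event):
    """Return the next MSI state; unlisted events leave the state unchanged."""
    return MSI_TRANSITIONS.get((state, event), state)

print(next_state("I", "local_read"))    # -> S
print(next_state("M", "remote_write"))  # -> I
```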

Table I: Shared memory –multicore processor setup for 3D architecture.

Figure 2.4.5-12: A multi-core system with 3D memory, simulated to show the reduction in execution time using the RSIM simulator. For the simulations an FFT program is used, programmed as a 4-way, 16-way, and 64-way process.

The bus-width effect is severely muted in the 16-core and 64-core limit because the scarce resource is the single bus of Figure 2.4.5-8 (left), which can service only one processor transfer at a time, however wide it might be. Essentially the bus becomes the bottleneck. The algorithm’s serial (S) code may also contribute to the flattening of performance. This helps focus the Technical Merit of our research on what to model by simulation, namely removing the bottleneck of the internal bus and the S-code congestion. Figures 2.4.5-7 (middle and right), as well as Figures 2.4.5-8 (right) and 2.4.5-9 (left), which have a much richer interconnect path mix, require additional modeling (one of the proposed tasks). However, measuring overall run time for real codes on multiple cores is also influenced by the Amdahl S parameter. To understand this implication we need to examine the new Intel Nehalem processor. In the Nehalem, when cores cannot be used because the code being processed does not parallelize, they are depowered; additionally, the remaining serial “S” code is run on a still-powered core toward which the remaining power is redirected so that it can run faster. From the diagram, the amount of speedup possible is small because of the wire-resistance problem. This idea reveals at least two points. One is that depowering cores would not be worthwhile if S were less than about 4%, as one would save power only 4% of the time. It suggests that S is actually much larger, even more than 10%, since the effort to save only 10% is hardly worth the additional complexity.
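The single-bus bottleneck described above can be illustrated with a toy model. The parameters are hypothetical: assume each workload generates a fixed total amount of bus traffic occupying a fraction f of one core-time, and that all transfers serialize on the one shared bus:

```python
# Toy model of the shared-bus bottleneck of Figure 2.4.5-8 (left).
# Hypothetical parameters for illustration only.
def shared_bus_speedup(n_cores, f_bus=0.1):
    """Speedup over one core when all memory traffic serializes on one bus.

    Compute time shrinks as 1/n, but the total bus occupancy f_bus is
    fixed and cannot be parallelized, so speedup saturates at 1/f_bus.
    """
    compute_time = 1.0 / n_cores
    return 1.0 / max(compute_time, f_bus)

for n in (1, 4, 16, 64):
    print(n, shared_bus_speedup(n))   # saturates at 10x for f_bus = 0.1
```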

Figure 2.4.5-13. INTEL Nehalem Turbo Mode, De-powering Parallel Cores.

This suggests that some form of heterogeneous core be used as the processor to which executing code retreats during serial segments. This one core could be run at a much higher clock rate when running serial code, and could itself be de-powered when parallel code re-emerges to run on many cores. As further evidence that some important code operates with S closer to 33%, we present a COMSOL CPU profile. COMSOL is a Finite Element Modeling (FEM) program typical of the kind of application that has a healthy amount of parallelism, including graphics output.

Figure 2.4.5-14. CPU History for typical COMSOL Multiphysics Thermal Analysis Run.

Saturation of performance for COMSOL occurs around 3 cores; the program actually runs slower with 4 cores on two-day tasks. This makes it worthwhile to consider a heterogeneous, power-managed system in which, during serial code operation, the clock rate for that core can be higher by a factor of, say, m. The appropriately modified Amdahl figure of merit (FOM) would then be

If m = n, then this figure of merit is n/[1 + B/{n(S+P)}], which for large enough n (or small enough B) approaches a figure of merit of n. How to make S code run n times faster is the question. Evidently CMOS is not the answer: the clock race is over for CMOS. Although CMOS has been clocked at 5 GHz (a quad Opteron attained this), liquid-nitrogen temperature was required to reduce wire resistance and enhance carrier mobility. A single-core 4.7 GHz PowerPC has been achieved at room temperature, but it required 120 watts of power dissipation. If we let n go to infinity, even if m is not equal to n, this figure of merit becomes m[(S+P)/(S+B)], which is the product of the figure of merit obtained without serial-code enhancement and the clock-rate enhancement factor m. In other words, if BOTH parallelization and serial-code speed-up (by clock-rate enhancement) are employed, they complement each other. For example, if we ignore B for simplicity, the figure of merit for an infinite number of cores asymptotically approaches (1 + P/S). For P = 66% and S = 33%, this is just 3.000. But if the residual 33% of serial code is also sped up by the clock-rate improvement m, then the figure of merit is m(1 + P/S), and for the same S and P, if m is say 10, the total improvement is 30, not just 3.
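The numeric example above can be checked directly. The sketch below ignores the bus term B for simplicity, as the final example in the text does; the formula is our reading of the modified Amdahl figure of merit with the serial fraction sped up m-fold:

```python
# Check of the heterogeneous-core Amdahl argument, ignoring the bus
# term B (as in the text's final example). The formula is our reading
# of the modified figure of merit: serial code S runs m times faster,
# parallel code P is spread over n cores.
def figure_of_merit(S, P, n, m):
    return (S + P) / (S / m + P / n)

S, P = 0.33, 0.66          # COMSOL-like serial/parallel split
n = 10**9                  # effectively infinite cores

print(round(figure_of_merit(S, P, n, m=1)))   # -> 3 : parallelism alone
print(round(figure_of_merit(S, P, n, m=10)))  # -> 30: plus 10x serial clock
```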

2.4.6 Heterogeneous Integration

Industry has begun a transformation from homogeneous cores to heterogeneous cores, each tuned to the application it services. What to do about S code is the question. An interesting possibility is suggested by an often-overlooked fact: the first three Pentiums were BiCMOS, at the 0.8 µm, 0.6 µm, and 0.4 µm generations. Die size was on the order of 1.5 cm x 1 cm. Counter-intuitively, these Pentiums did not melt; in fact their power dissipations and heat sinks were modest. But their device fT was only roughly 10 GHz, only three levels of metal were available, and no insulators had low K. Solomon and Tang scaling [16] was used, which requires only that the current conducted by the bipolar devices be kept constant from generation to generation of 33% device scaling (which does require increasing current densities). With constant current and constant voltage, one has the same total power from generation to generation. All of these early Pentiums had floating point. Some of the cores currently used in multiple-core arrays are not much more complex than this early design, and all had a RISC architecture at their heart. The larger wire dimensions used for bipolar are what preserve the possibility of higher clock rates, so this one core would be larger than the CMOS cores (about 1 cm x 1 cm). This will be one of the tasks addressed in the research.

Figure 2.4.6-1. Early BiCMOS Pentium, with Floating Point, L1 TLB Cache, and Pre-fetch Buffers.

Figure 2.4.6-2. 16 GHz Test Core fabricated in IBM 8HP [17].

There are 8 (going to 9) levels of metal and low-K dielectrics in the latest processes. The availability of the HBT device in full BiCMOS offers an interesting possibility for heterogeneous integration of one extremely fast core with many slower CMOS cores, which can be deployed (powered up) for parallel computation when appropriate. Long stretches of serial code can be executed on the faster core, which itself can be de-powered when parallel code is running. Moreover, the wire dimensions used in bipolar design are much larger than in CMOS, because bipolar scales differently from CMOS, and this puts off the effect of wire resistance. Next-generation building blocks appear to run at 32 GHz. Hence we are proposing that at least one heterogeneous core could actually be BiCMOS.

Figure 2.4.6-3. Heterogeneous Integration of power managed Multiple Core Units (MCU’s), Graphics Processor Units (GPU’s), and a High Clock Rate Unit (HCRU) for Fast Serial Code (S) using a mystery technology, which could be SiGe HBT BiCMOS.

Startling though the idea is, a SiGe HBT BiCMOS HCRU for S code is probably the only way to deal with the S, or serial-code, problem. Heat is not the issue if Solomon and Tang scaling is maintained. All the early Intel Pentiums were BiCMOS, and in 1997 EXPONENTIAL, on an Apple subcontract, developed a fully bipolar CML PowerPC, which continues to be the model for our effort. Mixing circuits at dramatically different clock rates is the challenge. Incorporation of one ultrahigh-clock-rate core in the processor is one of the research goals. We will study 3D mitigation for the high-clock-rate S core, or HCRU, as well as the synchronization issues, as part of the Technical Merit and Broader Impact of the proposed study.

2.5. Statement of Work

The main thrusts of this work are

(1) Design a (non-cryo) High Clock Rate Unit (HCRU) demonstrator with a minimum clock frequency of 32 GHz at a power level of 30 watts (or lower) using IBM’s new 90nm 9HP SiGe HBT BiCMOS process. For demonstration, a stripped-down PowerPC is proposed, with a clear pathway toward technology transfer.

(2) Design a 3D memory containing SRAM and Floating Body Capacitance (FBC) DRAM suitable for 3D integration to the aforementioned HCRU using the most advanced scaling nodes possible.

(3) Fabricate the 9HP HCRU on a full wafer substrate using a collaborative IBM early access fabrication run (cost included).

(4) Fabricate the 32SO1 3D memory through the DARPA LEAP program (requires approval from MTO for such access or alternatively through a 3D tier of 90nm memory if this is not possible using the same 9HP wafer).

(5) Use dies from the LEAP fabrication run from (4) to reconstitute a template wafer suitable for 3D wafer to wafer bonding to the 9HP wafer from (3).

(6) Bond and test the 3D stack containing the HCRU (and possibly other circuitry employing standard cells) with the 3D memory.

(7) Thereby demonstrate a high clock rate processor with memory wall mitigation at modest power, and superior performance per watt.

(8) Document this accomplishment in IEEE and ACM publications as well as reports to the sponsor.

2.6. Intellectual Property

No claims of intellectual property will be asserted for the work performed here. In fact, every effort will be made to maintain the portability and publishable nature of the results, including use of popular standard-cell libraries where possible, to ensure portability into any possible product flow employing IBM technology. For all underlying semiconductor fabrication, we will employ standard off-the-shelf (but extremely new) IBM fabrication technology.

2.7 Management Plan

The proposed design, 3D assembly, and testing will be conducted at RPI. The management plan is made extremely simple through the use of the subcontract mechanism. The major subcontract is to IBM, through a MOSIS-brokered agreement, for early access to IBM’s 9HP on a fully dedicated wafer run to be executed in early 2013, when the first rollout of the process occurs. A second presumed path is through the DARPA LEAP program to obtain early access to IBM’s 32nm SOI process in the form of singulated dies, which are to be inlaid into a reconstituting template for final wafer bonding. 32nm is IBM’s latest CMOS offering, and DARPA has negotiated early access to it. Of course, the case must be made within DARPA MTO for participation of this project in one of the LEAP runs, and this only makes sense if there is an award. In the time window we are discussing, it may be that 22SO1 becomes the preferred LEAP payload; however, we have assumed 32SO1 at IBM for this discussion.

Figure 2.7-1. Organizational Chart showing interactions with Collaborators. The color red identifies the key funded partnerships.

RPI is uniquely situated in close proximity to SEMATECH in Albany, IBM at East Fishkill, and the pending new AMD/GlobalFoundries facility just 20 miles north of the campus in Malta, New York, due to open for production in 2013. Two of the country’s three major 32nm plants are in New York, and the SiGe HBT BiCMOS is fabricated in nearby Burlington, VT; in this cluster, the potential for technology transfer is excellent.

One aspect of this organizational chart is important to understand. IBM has very generously offered to provide early access to Cadence design kits for 9HP, has already provided access to its 32nm SOI kit, and has encouraged ARM to provide free cell libraries in support of this goal. There is already a 9SF ARM library, and we are close to negotiating a 32SO1 cell library. More importantly, IBM has enabled MOSIS to give us an early price quotation of $1,650,000 for a full-wafer dedicated 9HP run in 2013, among the very first expected. Additional wafers beyond the 6 resulting from this fabrication cost only $14,500 each, and 4 additional wafers are in the budget. However, not all of the 9HP reticle plate set is needed for this project. If possible, non-proprietary, compatible customers (such as other universities or DARPA projects) may be encouraged to purchase unused space on this payload, freeing up some of this funding for additional full wafers for use on this project. The attached MOSIS price quotation covers a run producing up to 6 wafers, and an additional 4 are requested. Possibly only a few 9HP wafers will be needed if success occurs early. It is realized that this large subcontract is very expensive, but that seems to be the cost of doing research right at the cutting edge.

Implicit in the budget is the assumption that LEAP funding will provide the 32SOI dies. This must be decided by DARPA through the LEAP competition. As explained earlier, the back-up plan is to flip a 9HP wafer around its centerline, at which point a 90nm 3D memory off of one 9HP run is possible.

Figure 2.7-2. MOSIS price quotation for dedicated 9HP fabrication run and price for supplemental wafers, which can be ordered subsequently to the initial wafer purchase.

In addition to this path to a dedicated full-wafer run, another approach is being explored: ablating excluded IP from an MPW at a much lower price, though more research will be needed to verify that this can be done. If it can, then substantial reductions in the third year are possible. A second possible price-lowering mechanism on MPWs might be the use of stepper blading, or of a select number of obliterated masks over the proprietary other-IP areas, to block them out. These are being explored, but the full-wafer price is requested in this budget estimate. Any cost savings may be used either to reduce the cost to DARPA or to hire additional students in support of this project. For the moment we must include the worst-case scenario, namely that we must purchase a full-wafer dedicated run for the 9HP to have guaranteed access to it.

2.8 Schedule and Milestones

2.8.1 Tasks with Milestone Targets

The main thrusts of the program involve finding ways to exploit the huge bit bandwidth of the 3D memory-over-processor architecture. Following Emma’s Law, the throughput can be expanded by increasing the number of threads processed on the multiple cores through a combination of ultra-dense 3D memory and exploitation of multiple ultra-wide, high-bandwidth buses. A three-year program is envisioned. Each task is listed below, and the successful completion of each effort is the milestone for that part of the project.

In the first year, the initial building blocks for the architecture will be developed in 9HP and 32SO1 (T1). Simulations in existing tools will be pushed as far as possible using the existing RICE, Dinero, CACTI, and RSIM tools (T2), along with the three strategies in Figure 2.4.5-7 (T3). Small building blocks will be fabricated using 8HP, MOSIS MEP, and other pathways to early fabrication (T4). Where possible, 9HP small test chips will be designed and submitted for fabrication through the variety of paths available for university-subsidized fabrication, including IBM donations as made available (T5). In addition, the 32SO1 3D memory tier will be designed (T6). Additionally, work will be carried out to develop the C2W process for eventual 3D integration of the singulated 32nm SOI dies on the 9HP wafer substrate (T7).

In the second year, the entire architecture of the demonstration processor will be assembled, and wire-loading effects will be merged with logic simulation to produce delay-realistic simulation (T9). The first multicore simulations with the proposed memory management of heterogeneous multicores with one or two HCRUs will be generated (T10). The full-wafer 9HP fabrication tapeout will be completed for the early-2013 9HP submission (T11). Designs of the 3D 32nm SOI memory die will be submitted through LEAP for fabrication (T12). The first journal articles on 9HP-fabricated test chips will be published (T13). Practice runs of the 3D C2W bonding will take place (T14).

In the third year, early in 2013, we expect the 9HP wafers to be returned from IBM. Dies from the 32nm SOI run will be processed using the C2W approach to form the template (T15). Wafer bonding of the template to the 9HP substrate will be executed, forming the 3D die stack (T16). Testing of the composite 3D stack will then be undertaken (T17). This brings us to task T18, the final report. Additionally, quarterly reports will be synchronized with the quarterly reporting trips, and an annual report will be submitted.

Most of these tasks span multiple quarters. Milestones fall at the termini of each task’s span in the attached Gantt chart shown below. Quarterly reporting on each major result will occur in the quarter after nominal completion of that subtask; otherwise, progress on these tasks will be reported. The funds consumption rate will remain reasonably flat for the entire duration of the project due to the unique manner in which university research is done (by units of graduate student tuitions and stipends, which are not severable), with the major exception of the subcontract for $1,650,000 plus four additional wafers (a total of 10), which will be committed through MOSIS in early 2013. There are a few costs, such as clean-room charges, which accumulate and are then billed at various points during such a contract.

2.8.2 One Page Graphic (Gantt chart)

MILESTONE GANTT CHART (shows quarters past the award date)

2.8.3 Project Management and Interaction Plan

Because the project design and 3D integration efforts are located at the same place (RPI in Troy, NY) and in fact on the same floor of the George M. Low Center for Innovation in Industry, project interaction is anticipated to be very good. In fact some of the grad students will be involved in both design and in clean room 3D integration efforts.

Since the only major subcontract is with IBM in Burlington, VT, we are deeply engaged with the people in charge of the progression of the new IBM 9HP process as it evolves. TAPO money is apparently involved in funding this process through a special congressional funding mandate. Some of the key contacts at IBM are:

1. Jim Dunn (manager of SiGe line at IBM at Burlington)

2. David Harame (chief scientist, IBM Fellow, IEEE Fellow at IBM Burlington)

3. Bernard Meyerson (developer of the SiGe UHCVD process and the graded base HBT, VP of IBM, Fellow of IBM and the IEEE)

4. Al Joseph (SiGe HBT circuit and device characterization Expert at IBM Burlington)

5. Ned Cahoon, (manager of the early 9HP fabrication MPW’s and in our case the full wafer run).

6. Tak Ning (IBM and IEEE Fellow at Yorktown Heights; device physics and overview of device technology for computing)

7. Paul Solomon (IBM and IEEE Fellow at Yorktown Heights and creator of the Solomon and Tang scaling)

8. Wilfried Haensch (IBM Fellow charged with 3D by wafer bonding research)

9. Chuck Webb (former RPI student, co-designer of the recent z-196 5GHz 8 core processor)

10. Phil Jacob (former RPI 3D memory wall mitigation doctoral student from this group currently at Yorktown, involved with Blue Gene)

11. Steve Carlough (former RPI CML design doctoral student at IBM Poughkeepsie)

12. Bob Philhower (former RPI CML design doctoral student at Yorktown, currently involved with CELL).

13. Jong-Ru Guo (former RPI CML design doctoral student at East Fishkill)

These contacts keep us informed about 9HP and SOI developments at IBM within proprietary bounds. We are in constant day-to-day email contact. Each year Professor McDonald attends the DARPA TAPO IBM technology review meetings at IBM’s Burlington facility. All of these IBM facilities lie within a few hours’ drive of the RPI campus.

2.9 Personnel, Qualifications, and Commitments

Prof. John F. McDonald – Principal Investigator – mcdonald@unix.cie.rpi.edu

a. Professional Preparation.

Undergraduate Institution: Massachusetts Institute of Technology, BSEE, 1963

Graduate Institution: Yale University, M.Eng., 1965

Graduate Institution: Yale University, PhD, 1969

b. Appointments.

1985- Full Professor Rensselaer Polytechnic Institute Troy, NY

1974-1985 Associate Professor Rensselaer Polytechnic Institute Troy, NY

1969-1974 Assistant Professor Yale University New Haven, CT

1968-1969 Lecturer Yale University New Haven, CT

c. Publications:

Over 290 publications; roughly one third are journal articles, and 20 are book chapters.

c.1. Select Relevant Recent Journal Articles.

{134.} "Cell Library for Current Mode Logic using an Advanced Bipolar Process," (with H. J. Greub, T. Yamaguchi, and T. Creedon), IEEE J. Sol. State Cir., Special issue on VLSI, (D. Bouldin, guest editor), Trans. on Solid State Circuits, Vol. JSSC-26 (#5), pp. 749-762, May, 1991 (birth issue of T. VLSI).

{209.} “A 2-GHz Clocked AlGaAs/GaAs HBT Byte-Slice Datapath Chip,” with S. R. Carlough, R. A. Philhower, Cliff A. Maier, S. A. Steidl, P. M. Campbell, A. Garg, K.-S. Nah, M. W. Ernest, J. R. Loy, T. W. Krawczyk, P. F. Curran, Russell P. Kraft, and H. J. Greub, IEEE Journal of Sol. State Circuits, Vol. 35(#6), June 2000, pp. 885-894.

{231.} “A 32-Word by 32b Three Port Bipolar Register File Implemented using a SiGe HBT BiCMOS Technology,” S. Steidl and J.F. McDonald, IEEE J. Sol. State Circuits, Vol. 37(#2), Feb. 2002, pp. 228-236.

{244}. "SiGe HBT serial transmitter architecture for high speed SERDES variable bit rate intercomputer networking," T.W. Krawczyk, P.F. Curran, M.W. Ernest, S.A. Steidl, S.R. Carlough, J.F. McDonald and R.P. Kraft, IEE Proc.-Circuits, Devices Syst., Vol. 151, No. 4, August 2004, pp. 215-321.

{255.} "SiGe HBT Microprocessor Core Test Vehicle," Proc. IEEE [Special Issue on SiGe Technology - Guest Editors, R. Singh, D. Harame, and B. Meyerson], P. M. Belemjian, O. Erdogan, R. P. Kraft, and J. F. McDonald, Vol. 93(#9), Sept. 2005, pp. 1669-1678.

{260.} “Predicting the Performance of a 3D Processor-Memory Chip Stack,” P. Jacob, O. Erdogan, A. Zia, P. M. Belemjian, R.P.Kraft, and J. F. McDonald, IEEE Design & Test of Computers, V. 22(#6), Nov.-Dec., 2005, pp. 540-547.

{265.} “A 52 Gb/s 16:1 transmitter in 0.13 micron SiGe BiCMOS technology,” Y.U. Yim, P.F. Curran, M. Chu, J.F. McDonald and R.P. Kraft, IET Circuits Devices Syst., Vol. 1, No. 6, December 2007, pp. 428-432.

{269.} “Mitigating Memory Wall Effects in High Clock Rate and Multi-core CMOS 3D IC’s – Processor Memory Stacks,” P. Jacob, A. Zia, O. Erdogan, P. M. Belemjian, J.-W. Kim, M. Chu, R. Kraft, Kerry Bernstein, and J. F. McDonald, Proc. IEEE (Special Issue on 3D Integration Technology), Vol. 97, No. 1, January 2009, pp. 181-122.

{280.} K. Zhou and J. F. McDonald, “Impact of Deep-Trench-Isolation-Sharing Techniques on Ultrahigh-Speed SiGe HBT Digital Systems,” IEEE Transactions on Circuits and Systems-II, Vol. 56(#10), Oct. 2009, pp. 778-784.

{284.} Michael Chu, Philip Jacob, Jin-Woo Kim, Mitchell R. LeRoy, Russell P. Kraft, and John F. McDonald, "A 40 Gs/s Time Interleaved ADC Using SiGe BiCMOS Technology," IEEE Journal of Solid-State Circuits, Vol. 45(#2), FEBRUARY 2010, pp. 380-390.

{285.} A. Zia, P. Jacob, J. W. Kim, M. Chu, J. F. McDonald, and R. Kraft, “A 3-D Cache With Ultra-Wide Data Bus for 3-D Processor-Memory Integration,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, TVLSI-18(#6), June 2010, pp. 967-977.

d. Relevant Patents

{10.} “Three Dimensional Face to Face Integration” (USPTO # 7,453,150), 10/15/2008.

{11.} “Face-to-Face 3D techniques for BEOL integration” (USPTO # 7,642,173), 1/5/2010.

e. Professional Affiliations

Institute for Electrical and Electronics Engineers (IEEE), Life Senior Member.

ACM, Member

MRS Member

Sigma Xi, Life Senior Member

Tau Beta Pi

Eta Kappa Nu

f. Synergistic Activities

Research involves SiGe HBT BiCMOS, 3D integrated circuit design, and heterogeneous integration of differing technologies. SiGe HBT BiCMOS activities include design of interleaved ADC/DAC systems, SERDES, electro-optic modulators, and novel ultra-high-speed and/or low-power digital circuits.

g. Relevant Prior and Recent Sponsorship

DARPA NGI, DARPA TEAM (Dan Radack, PM), DARPA 3DI (Mike Fritze and Dan Radack, PMs), DARPA/MARCO iFRC (Dan Radack, PM), DARPA Seedling (Mike Fritze, PM), NSF EAGER (Krishna Kant, PM).

h. Citizenship

