Papers: Multiprocessor Organization and System Design
-
Comparing the Power and Performance of Intel's SCC to State-of-the-Art CPUs and GPUs
by Ehsan Totoni, Babak Behzad, Swapnil Ghike and Josep Torrellas,
International Symposium on Performance Analysis of Systems and Software (ISPASS),
April 2012.
[Presentation slides]
-
BulkSMT: Designing SMT Processors for Atomic-Block Execution
by Xuehai Qian, Benjamin Sahelices and Josep Torrellas,
International Symposium on High-Performance Computer
Architecture (HPCA), February 2012.
[Presentation slides]
-
BulkCompactor: Optimized Deterministic Execution via Conflict-Aware
Commit of Atomic Blocks
by Yuelu Duan, Xing Zhou, Wonsun Ahn, and Josep Torrellas,
International Symposium on High-Performance Computer
Architecture (HPCA), February 2012.
[Presentation slides]
-
FlexBulk: Intelligently Forming Atomic Blocks in
Blocked-Execution Multiprocessors to Minimize Squashes
by Rishi Agarwal and Josep Torrellas,
International Symposium on Computer Architecture (ISCA), June 2011.
[Presentation slides]
-
Rebound: Scalable Checkpointing for Coherent Shared Memory
by Rishi Agarwal, Pranav Garg, and Josep Torrellas,
International Symposium on Computer Architecture (ISCA), June 2011.
[Presentation slides]
-
Cache-Only Memory Architecture
by Josep Torrellas,
Encyclopedia of Parallel Computing, Springer Science+Business Media LLC,
May 2011.
-
ScalableBulk: Scalable Cache Coherence for Atomic Blocks in a Lazy Environment
by Xuehai Qian, Wonsun Ahn, and Josep Torrellas,
International Symposium on Microarchitecture (MICRO), December 2010.
[Presentation slides]
-
The Bulk Multicore Architecture for Improved Programmability
by Josep Torrellas, Luis Ceze, James Tuck, Calin Cascaval, Pablo Montesinos,
Wonsun Ahn, and Milos Prvulovic,
Communications of the ACM (CACM), December 2009.
[Presentation slides]
-
Architectures for Extreme-Scale Computing
by Josep Torrellas,
IEEE Computer, November 2009.
[Presentation slides]
-
BulkCompiler: High-Performance Sequential Consistency through Cooperative
Compiler and Hardware Support
by Wonsun Ahn, Shanxiang Qi, Jae-Woo Lee, Marios Nicolaides,
Xing Fang, Josep Torrellas, David Wong, and Samuel Midkiff,
International Symposium on Microarchitecture (MICRO), December 2009.
[Presentation slides]
-
Hardware and Software Approaches for
Deterministic Multiprocessor Replay of Concurrent Programs
by Gilles Pokam, Cristiano Pereira, Klaus Danne, Lynda Yang, Samuel King,
and Josep Torrellas,
Intel Technology Journal, Issue on Addressing the Challenges of
Tera-Scale Computing, Vol. 13, Issue 4, December 2009.
-
Two Hardware-based Approaches for Deterministic Multiprocessor Replay
by Derek R. Hower, Pablo Montesinos, Luis Ceze, Mark D. Hill, and Josep Torrellas,
Research Highlight, Communications of the ACM (CACM), June 2009.
-
Lessons Learned During the Development of the CapoOne
Deterministic Multiprocessor Replay System
by Pablo Montesinos, Matthew Hicks, Wonsun Ahn, Samuel T. King, and Josep Torrellas,
Workshop on the Interaction between Operating Systems and Computer Architecture
(WIOSCA), June 2009.
[Presentation slides]
-
Capo: A Software-Hardware Interface for Practical Deterministic
Multiprocessor Replay
by Pablo Montesinos, Matthew Hicks, Samuel T. King, and Josep Torrellas,
14th International Conference on Architectural Support for Programming
Languages and Operating Systems (ASPLOS), March 2009.
[Presentation slides]
-
DeLorean: Recording and Deterministically Replaying
Shared-Memory Multiprocessor Execution Efficiently
by Pablo Montesinos, Luis Ceze, and Josep Torrellas,
35th Annual International Symposium on Computer Architecture (ISCA), June 2008.
[Presentation slides]
-
Concurrency Control with Data Coloring
by Luis Ceze, Christoph von Praun, Calin Cascaval, Pablo Montesinos,
and Josep Torrellas,
Workshop on Memory Systems Performance and Correctness (MSPC), March 2008.
-
Unconstrained Snoop Request Delivery in Embedded-Ring Multiprocessors
by Karin Strauss, Xiaowei Shen, and Josep Torrellas,
40th International Symposium on Microarchitecture (MICRO), December 2007.
[Presentation slides]
-
Paceline: Improving Single-Thread Performance
in Nanoscale CMPs through Core Overclocking
by Brian Greskamp and Josep Torrellas,
International Conference on Parallel Architectures and
Compilation Techniques (PACT), September 2007.
[Presentation slides]
-
BulkSC: Bulk Enforcement of Sequential Consistency
by Luis Ceze, James M. Tuck, Pablo Montesinos, and Josep Torrellas,
34th Annual International Symposium on Computer Architecture (ISCA), June 2007.
[Presentation slides]
-
Colorama: Architectural Support for Data-Centric Synchronization
by Luis Ceze, Pablo Montesinos, Christoph von Praun, and Josep Torrellas,
13th International Symposium on High-Performance Computer
Architecture (HPCA), February 2007.
[Presentation slides]
-
Flexible Snooping: Adaptive Forwarding and Filtering of Snoops
in Embedded-Ring Multiprocessors
by Karin Strauss, Xiaowei Shen, and Josep Torrellas,
33rd Annual International Symposium on Computer Architecture (ISCA), June 2006.
[Presentation slides]
- Bulk Disambiguation of
Speculative Threads in Multiprocessors
by Luis Ceze, James M. Tuck, Calin Cascaval, and Josep Torrellas,
33rd Annual International Symposium on Computer Architecture (ISCA), June 2006.
[Presentation slides]
-
Rapid Prototyping in Architecture Research Using Hardware Hooks in COTS Systems
by Smruti R. Sarangi, Brian Greskamp, and Josep Torrellas,
Workshop on Architectural Research Prototyping (WARP), June 2006.
-
SWICH: A Prototype for Efficient Cache-Level Checkpointing and Rollback
by Radu Teodorescu, Jun Nakano, and Josep Torrellas,
IEEE Micro Magazine, IEEE, Inc., vol. 26, September-October, 2006.
-
ReViveI/O: Efficient Handling of I/O in Highly-Available Rollback-Recovery Servers
by Jun Nakano, Pablo Montesinos, Kourosh Gharachorloo, and Josep Torrellas,
12th International Symposium on High-Performance Computer Architecture (HPCA), February 2006.
[Presentation slides]
- uComplexity: Estimating Processor Design Effort
by Cyrus Bazeghi, Francisco J. Mesa-Martinez, Brian Greskamp, Josep Torrellas, and Jose Renau,
Technical Report No. UIUCDCS-R-2005-2644, August 2005.
-
The Design Complexity of Program Undo Support in a General-Purpose Processor
by Radu Teodorescu and Josep Torrellas,
Workshop on Complexity-Effective Design (WCED), in conjunction with ISCA, June 2005.
[Presentation slides]
-
Prototyping Architectural Support for Program Rollback Using FPGAs
by Radu Teodorescu and Josep Torrellas,
2005 IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), April 2005.
[Presentation slides]
A one-page summary can be found as
Prototyping Architectural Support for Program Rollback:
An Application to Software Debugging
Workshop on Architecture Research using FPGA Platforms,
in conjunction with HPCA-11, February 2005.
[Presentation slides]
-
A Near-Memory Processor for Vector, Streaming and Bit Manipulation Workloads
by Mingliang Wei, Marc Snir, Josep Torrellas, and R. Brett Tremaine,
Watson Conference on Interaction between Architecture, Circuits, and Compilers
(P=AC2), September 2005.
[Presentation slides]
Additional details of the processor can be found in:
A Brief Description of the NMP ISA and Benchmarks
by Mingliang Wei, Marc Snir, Josep Torrellas, and R. Brett Tremaine,
Technical Report No. UIUCDCS-R-2005-2633, February 2005.
- High Performance Memory Systems
by Haldun Hadimioglu, David Kaeli, Jeff Kuskin, Ashwini Nanda, and Josep Torrellas, editors,
290 pages, ISBN: 0-387-00310-X, Springer Verlag, New York, 2003.
- Design Trade-offs in High-Throughput Coherence Controllers
by Anthony Nguyen and Josep Torrellas,
International Conference on Parallel Architectures and Compilation Techniques (PACT), September 2003.
[Presentation slides]
- ReVive: Cost-Effective Architectural
Support for Rollback Recovery in Shared-Memory Multiprocessors
by Milos Prvulovic, Zheng Zhang, and Josep Torrellas,
29th Annual International Symposium on Computer Architecture (ISCA), May 2002.
[Presentation slides]
The paper contains a typo in the Y-axes of Figures 9 and 10.
The corrected plots are here.
- Compiler-Assisted Software and Hardware Support for Reduction Operations
by F. Dang, M. Garzaran, M. Prvulovic, Y. Zhang, A. Jula, H. Yu, N. Amato, L. Rauchwerger, and J. Torrellas,
NSF Workshop on Next Generation Systems, April 2002.
- Architectural Support for Parallel Reductions in Scalable Shared-Memory Multiprocessors
by Maria Jesus Garzaran, Milos Prvulovic, Alin Jula, Hao Yu, Ye Zhang, Lawrence Rauchwerger, and Josep Torrellas,
International Conference on Parallel Architectures and Compilation Techniques (PACT), September 2001.
[Presentation slides]
- Cache-Only Memory Architectures
by Fredrik Dahlgren and Josep Torrellas,
IEEE Computer Magazine, June 1999.
- Improving the Performance of Bristled CC-NUMA Systems Using Virtual Channels and Adaptivity
by José F. Martínez, Josep Torrellas, and Jose Duato,
1999 ACM International Conference on Supercomputing (ICS), June 1999.
- Software Trace Cache
by Alex Ramirez, Josep-L. Larriba-Pey, Carlos Navarro, Josep Torrellas, and Mateo Valero,
1999 ACM International Conference on Supercomputing (ICS), June 1999.
- Excel-NUMA: Toward Programmability, Simplicity, and High Performance
by Zheng Zhang, Marcelo Cintra, and Josep Torrellas,
IEEE Transactions on Computers, Special Issue on Cache Memory, February 1999.
A longer version is CSRD Technical Report 1544, November 1996.
- Upcoming Architectural Advances in DSM Machines and Their Impact on Programmability
by Josep Torrellas,
9th SIAM Conference on Parallel Processing for Scientific Computing, March 1999.
- Enhancing Memory Use in Simple Coma: Multiplexed Simple Coma
by Sujoy Basu and Josep Torrellas,
Fourth International Symposium on High-Performance Computer Architecture (HPCA), February 1998.
- The Performance of the Cedar Multistage Switching Network
by Josep Torrellas and Zheng Zhang,
IEEE Transactions on Parallel and Distributed Systems, April 1997.
A shorter version appeared as
The Performance of the Cedar Multistage Switching Network
Supercomputing'94, November 1994.
- Reducing Remote Conflict Misses: NUMA with Remote Cache versus COMA
by Zheng Zhang and Josep Torrellas,
Third International Symposium on High-Performance Computer Architecture (HPCA), January 1997.
- Speeding up the Memory Hierarchy in Flat COMA Multiprocessors
by Liuxi Yang and Josep Torrellas,
Third International Symposium on High-Performance Computer Architecture (HPCA), January 1997.
- The Illinois Aggressive Coma Multiprocessor Project (i-acoma)
by Josep Torrellas and David Padua,
6th Symposium on the Frontiers of Massively Parallel Computing, October 1996.
- An Efficient Implementation of Tree-Based Multicast Routing for Distributed Shared-Memory Multiprocessors
by Manuel Perez Malumbres(*), Jose Duato(*), and Josep Torrellas,
(* Universidad Politecnica de Valencia). 1996 Symposium on Parallel and Distributed Processing (SPDP), October 1996.
- Optimizing the Primary Cache for Parallel Scientific Applications: The Pool Buffer Approach
by Liuxi Yang and Josep Torrellas,
1996 International Conference on Supercomputing (ICS), June 1996.
- Distance-Adaptive Update Protocols for Scalable Shared-Memory Multiprocessors
by Alain Raynaud, Zheng Zhang, and Josep Torrellas,
Second International Symposium on High-Performance Computer Architecture (HPCA), January 1996.
- Evaluating the Performance of Cache-Affinity Scheduling in Shared-Memory Multiprocessors,
by Josep Torrellas, Andrew Tucker and Anoop Gupta,
Journal of Parallel and Distributed Computing, February 1995.
- The Performance of the Cedar Multistage Switching Network
by Josep Torrellas and Zheng Zhang,
Supercomputing'94, November 1994.
- An Efficient Algorithm for the Run-time Parallelization of DOACROSS Loops
by Ding-Kai Chen, Josep Torrellas and Pen-Chung Yew,
Supercomputing'94, November 1994.
- Comparing the Performance and Programmability of the DASH and Cedar Multiprocessors for Scientific Loads
by Josep Torrellas, David Koufaty, and David Padua,
1994 International Conference on Parallel Processing (ICPP), August 1994.
- False Sharing and Spatial Locality in Multiprocessor Caches,
by Josep Torrellas, Monica S. Lam and John L. Hennessy,
IEEE Transactions on Computers, June 1994.
- Characterizing the Caching and Synchronization Performance of a Multiprocessor Operating System,
by Josep Torrellas, Anoop Gupta, and John Hennessy,
ASPLOS V, October 1992.
- Shared Data Placement Optimizations to
Reduce Multiprocessor Cache Miss Rates
by Josep Torrellas, Monica Lam, and John Hennessy,
1990 International Conference on Parallel Processing (ICPP), August 1990.