1.PASAP: Power Aware Structured ASIC Placement Flow
Ashutosh Chakraborty and David Z Pan
ECE Department
University of Texas at Austin
ashutosh@cerc.utexas.edu dpan@ece.utexas.edu
2.Outline
Introduction & Motivation
Architecture and power model
Placement flow for low power (PASAP)
Results
Conclusions
3.Introduction – Structured ASICs
Middle ground between ASICs and FPGAs
“Almost like ASICs, but slight configurability”
Devices and most metal layers mass produced
Slight configurability using
1 or 2 metal layers, or
Fuse top metal layer vias
Just enough to create differentiated products while sharing same implementation platform
Main problem: power consumption
Much better than FPGA, but worse than ASIC!
4.Introduction – Structured ASICs
[Figure labels: warehouse; customize top metal layers]
Power may be the Achilles heel for adoption of structured ASICs in mobile/portable devices
5.Structured ASICs – Rough usage model
Buy a structured ASIC platform (PLAT)
Say having N pre-fabricated gates
“Place” your gate level design on PLAT
Say having M gates s.t. M < N
Find pre-fabricated gates which are unused
Disconnect them from the power supply (possible at different granularities)
[Figure labels: netlist, platform, shut off]
6.Motivation – How to save more power?
Existing method:
Place gates, find unused components, shut them down
“Shut down” at some levels of hierarchy (not very fine)
Q: Is there a way to maximize such components?
For example: Power(placement 1) > Power(placement 2)
[Figure labels: VDD, netlist, Placement 1, Placement 2]
7.Key Idea / Problem Formulation
Given:
A structured ASIC platform of given capacity
A gate level design to map on platform
Granularity of power shut-off possible
Perform:
Legal placement (site, clock, and physical constraints)
While maximizing the number of components that can be powered off
Constraints:
Should work transparently with existing tools
8.Outline
Introduction & Motivation
Architecture and power model
Placement flow for low power (PASAP)
Results
Conclusions
9.Architecture of a Structured ASIC
TILE
FLOPS (DFF)
LOGIC CELLS
BLOCK RAM
REGISTERS
netlist
10.Placement constraints
All cells must be assigned to correct location type
Logic cell to red, DFF to green, etc
No more than N global clocks in platform
All global clocks routed to reach each tile
No more than T clocks can enter a tile
Constrained by routing ability within a tile
Which T of the N clocks enter a tile is configurable
[Figure: tile with a 4-to-2 clock selection; 4 global clocks, any 2 can enter the tile]
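The two clock constraints above can be checked mechanically once a placement is known. The following is a minimal sketch, not the authors' tool: the function name, the dictionary-based placement format, and the cell and clock names are all hypothetical, chosen only to mirror the "4 global clocks, any 2 can enter a tile" example.

```python
from collections import defaultdict

def check_clock_constraints(placement, cell_clock, max_global, max_per_tile):
    """Return a list of violation messages (empty if the placement is legal).

    placement:  cell name -> tile id (hypothetical format)
    cell_clock: cell name -> clock name (combinational cells are absent)
    """
    clocks_in_tile = defaultdict(set)
    for cell, tile in placement.items():
        clock = cell_clock.get(cell)
        if clock is not None:
            clocks_in_tile[tile].add(clock)

    violations = []
    # Platform-level limit: no more than N distinct global clocks.
    all_clocks = set().union(*clocks_in_tile.values())
    if len(all_clocks) > max_global:
        violations.append(f"platform uses {len(all_clocks)} clocks, limit is {max_global}")
    # Tile-level limit: no more than T distinct clocks may enter a tile.
    for tile in sorted(clocks_in_tile):
        used = len(clocks_in_tile[tile])
        if used > max_per_tile:
            violations.append(f"tile {tile} uses {used} clocks, limit is {max_per_tile}")
    return violations

# N = 4 global clocks, at most T = 2 clocks per tile.
placement = {"ff1": (0, 0), "ff2": (0, 0), "ff3": (0, 0), "ff4": (1, 0), "g1": (0, 0)}
cell_clock = {"ff1": "clkA", "ff2": "clkB", "ff3": "clkC", "ff4": "clkA"}
illegal = check_clock_constraints(placement, cell_clock, max_global=4, max_per_tile=2)

placement["ff3"] = (1, 0)  # moving one flop brings tile (0, 0) back to 2 clocks
legal = check_clock_constraints(placement, cell_clock, max_global=4, max_per_tile=2)
```

Here `illegal` reports tile (0, 0) because three clocks enter it, while `legal` is empty after the move.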
11.Clock Distribution Model (platform level)
Buffer B1 inserts clock in middle
Distributed by B2 level to vertical half trunks
Buffer B3 inserts clock in one tile
TILE
Horizontal Trunk
Vertical Half Trunk
12.Clock Distribution Model (tile level)
B4 level drives each column inside each tile
BRAM / REG may need multiple drivers
No need to drive LOGIC cells (combinational)
Column
13.Clock shutdown model
Input of any B1/B2/B3/B4 buffer can be disconnected by via programming
Clock does not propagate to next level
Saving switching power
Maximize unused vertical spines, horizontal segments, and columns
The higher the level of the disconnected buffer, the better the saving
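To make the shutdown model concrete, the sketch below counts how many spines, tiles, and columns carry no clocked cell and could therefore have their buffer input disconnected. It assumes a single clock, a regular platform in which every tile belongs to exactly one spine, and a hypothetical `(spine, tile, column)` location tuple for each clocked cell; none of these details come from the slides.

```python
def active_clock_regions(clocked_cells):
    """Collect the spines, tiles, and columns that contain at least one clocked cell."""
    spines, tiles, columns = set(), set(), set()
    for spine, tile, column in clocked_cells:
        spines.add(spine)
        tiles.add((spine, tile))
        columns.add((spine, tile, column))
    return spines, tiles, columns

def shutdown_counts(n_spines, tiles_per_spine, cols_per_tile, clocked_cells):
    """Count regions with no clocked cell at each level of the hierarchy."""
    spines, tiles, columns = active_clock_regions(clocked_cells)
    n_tiles = n_spines * tiles_per_spine
    return {
        "spines": n_spines - len(spines),
        "tiles": n_tiles - len(tiles),
        "columns": n_tiles * cols_per_tile - len(columns),
    }

# Two placements of the same three flops on a 2-spine x 2-tile x 3-column platform.
spread = shutdown_counts(2, 2, 3, [(0, 0, 0), (0, 1, 2), (1, 0, 1)])
packed = shutdown_counts(2, 2, 3, [(0, 0, 0), (0, 0, 1), (0, 0, 2)])
```

Both placements leave 9 columns unused, but only the packed one frees a whole spine and three tiles, which is where the larger saving is expected under this model.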
14.Leakage reduction model
Not related to clock distribution, but to the devices themselves
Millions of pre-fabricated gates
Any column, if totally unused, can be shut down
Need to maximize unused columns
15.Outline
Introduction & Motivation
Architecture and power model
Placement flow for low power (PASAP)
Results
Conclusions
17.Pre-emptive Blockage Insertion
Idea: Mark lightly used regions as blockages
Forces placer not to use them
Need to estimate the usage before placement
Flow: Rough Placer (whole platform) -> Usage and density analysis -> Create spine and tile blockages -> Real Placer (partial platform) -> Next stage
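The usage analysis step can be pictured as follows. This is a sketch under stated assumptions, not the published procedure: it handles tiles only, uses a hypothetical cell-count threshold, and keeps a capacity slack so that the real placer still has room after the blockages are added.

```python
from collections import Counter

def choose_tile_blockages(rough_placement, all_tiles, tile_capacity, threshold, slack=0.1):
    """Pick lightly used tiles to block before the real placement run.

    rough_placement: cell name -> tile id from the rough placer
    threshold:       a tile is a candidate if it holds at most this many cells
    slack:           fraction of extra capacity that must remain available
    """
    usage = Counter(rough_placement.values())
    n_cells = len(rough_placement)
    capacity = len(all_tiles) * tile_capacity
    blocked = []
    # Visit the emptiest tiles first.
    for tile in sorted(all_tiles, key=lambda t: (usage[t], t)):
        if usage[tile] > threshold:
            break  # every remaining tile is used more heavily
        if capacity - tile_capacity < n_cells * (1 + slack):
            break  # blocking more would leave too little room
        blocked.append(tile)
        capacity -= tile_capacity
    return blocked

# Rough placement of 18 cells on 4 tiles of 10 sites each.
rough = {f"c{i}": 0 for i in range(9)}
rough.update({f"d{i}": 1 for i in range(8)})
rough["e0"] = 2
blocked = choose_tile_blockages(rough, all_tiles=[0, 1, 2, 3], tile_capacity=10, threshold=2)
```

In this example tiles 3 and 2 become blockages, so the real placer sees only tiles 0 and 1.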
18.Spine and Tile Reduction
Find spines with few cells in constituent tiles
Find tiles with few cells in constituent columns
Try to evict a spine with just a few cells in it
Until some wirelength penalty is reached
Try to evict a tile with just a few cells in it
Until some wirelength penalty is reached
Both of the above procedures are network-flow based
19.Spine Reduction (network flow)
[Figure: flow network with dummy source and destination nodes]
Identify a lightly used spine
Make dummy source
Make dummy destination
Add flow edges
Source -> tiles in spine
Other tiles -> destination
Tile in spine -> neighbor tiles
Edge weight = Manhattan distance between tiles
Move cells if flow possible
Tile reduction is similar, see paper
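The flow network described above can be sketched with a small min-cost flow solver. The code below is an illustration, not the authors' implementation: it uses a textbook successive-shortest-path method with Bellman-Ford, connects every spine tile to every tile outside the spine (a simplification of "neighbor tiles"), and uses a hypothetical `max_cost` bound on total Manhattan distance as a stand-in for the wirelength penalty.

```python
def min_cost_flow(n, edges, s, t, demand):
    """Successive shortest paths. edges: (u, v, capacity, cost) tuples."""
    graph = [[] for _ in range(n)]
    to, cap, cost = [], [], []
    for u, v, c, w in edges:  # forward edge at even index, reverse at odd index
        graph[u].append(len(to)); to.append(v); cap.append(c); cost.append(w)
        graph[v].append(len(to)); to.append(u); cap.append(0); cost.append(-w)
    inf = float("inf")
    flow = total = 0
    while flow < demand:
        dist, prev = [inf] * n, [-1] * n
        dist[s] = 0
        for _ in range(n - 1):  # Bellman-Ford on the residual graph
            changed = False
            for u in range(n):
                if dist[u] == inf:
                    continue
                for e in graph[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[to[e]]:
                        dist[to[e]], prev[to[e]], changed = dist[u] + cost[e], e, True
            if not changed:
                break
        if dist[t] == inf:
            break  # no augmenting path left
        push, v = demand - flow, t
        while v != s:  # find the bottleneck along the path
            push, v = min(push, cap[prev[v]]), to[prev[v] ^ 1]
        v = t
        while v != s:  # apply the augmentation
            e = prev[v]
            cap[e] -= push; cap[e ^ 1] += push; v = to[e ^ 1]
        flow += push
        total += push * dist[t]
    sent = [cap[2 * i + 1] for i in range(len(edges))]  # flow on each original edge
    return flow, total, sent

def evict_spine(spine_tiles, other_tiles, cells_in, free_in, max_cost):
    """Return (cost, moves) if the spine can be emptied within max_cost, else None."""
    nodes = ["S", "T"] + list(spine_tiles) + list(other_tiles)
    idx = {name: i for i, name in enumerate(nodes)}
    edges = []
    for a in spine_tiles:
        edges.append((idx["S"], idx[a], cells_in[a], 0))
        for b in other_tiles:
            dist = abs(a[0] - b[0]) + abs(a[1] - b[1])  # Manhattan distance
            edges.append((idx[a], idx[b], cells_in[a], dist))
    for b in other_tiles:
        edges.append((idx[b], idx["T"], free_in[b], 0))
    need = sum(cells_in[a] for a in spine_tiles)
    flow, cost, sent = min_cost_flow(len(nodes), edges, idx["S"], idx["T"], need)
    if flow < need or cost > max_cost:
        return None
    moves = {(nodes[u], nodes[v]): f for (u, v, _, _), f in zip(edges, sent)
             if f > 0 and u != idx["S"] and v != idx["T"]}
    return cost, moves

# Spine at x = 1 holds 3 cells; tiles are (x, y) grid coordinates.
cells_in = {(1, 0): 2, (1, 1): 1}
free_in = {(0, 0): 1, (2, 0): 1, (0, 1): 0, (2, 1): 3}
result = evict_spine(list(cells_in), list(free_in), cells_in, free_in, max_cost=4)
```

With these numbers the spine empties at a total distance of 3; lowering `max_cost` to 2 makes `evict_spine` return `None`, which models giving up on the spine once the penalty is exceeded.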
21.Column Minimization
Until now, tried to make whole tile empty
Now, redistribute cells to reduce #columns required
Example: If each column can have at most 10 cells in it, and a tile has 21 cells, must use 3 columns. Want to move the 1 lonely cell to some other tile
Solved using greedy movement
A peephole window of W x H tiles slides over the layout
Sort tiles by their number of “lonely” cells
Move these cells to a neighboring tile that can accept them
For example: 21 + 24 cells (6 columns) -> 20 + 25 cells (5 columns)
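The column arithmetic and the greedy move can be reproduced in a few lines. This is a simplified sketch: the sliding peephole window is replaced by a hypothetical `neighbors` map, and a tile's "lonely" cells are taken to be the remainder left over after filling whole columns.

```python
def columns_needed(cells, cap):
    """Columns required for `cells` cells when each column holds `cap` cells."""
    return (cells + cap - 1) // cap

def lonely(cells, cap):
    """Cells sitting in a partially filled last column."""
    return cells % cap

def spare(cells, cap):
    """Free sites left in the columns a tile already uses."""
    return (-cells) % cap

def greedy_column_pass(tile_cells, neighbors, cap):
    """Move each tile's lonely cells into a neighbor with enough spare sites."""
    counts = dict(tile_cells)
    # Tiles with the fewest lonely cells go first; they are the cheapest to fix.
    order = sorted(counts, key=lambda t: (lonely(counts[t], cap) or cap, t))
    for tile in order:
        extra = lonely(counts[tile], cap)
        if extra == 0:
            continue
        for other in neighbors[tile]:
            if spare(counts[other], cap) >= extra:
                counts[tile] -= extra
                counts[other] += extra
                break
    return counts

cap = 10
before = {"A": 21, "B": 24}
after = greedy_column_pass(before, {"A": ["B"], "B": ["A"]}, cap)
cols_before = sum(columns_needed(c, cap) for c in before.values())
cols_after = sum(columns_needed(c, cap) for c in after.values())
```

Tile A gives its single lonely cell to tile B, which had 6 spare sites, so the pair drops from 6 columns to 5 as in the slide's example.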
23.Column Minimization
Until now, tried things external to a Tile
Now, club together cells to reduce column count
Using geometric clustering + assignment problem
Example: maximum 2 cells per column.
Cluster cells in bins of size 2
Assignment problem
assign clusters => columns
cost: wirelength increase
Swap cells to reduce WL
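The clustering and assignment steps inside one tile might look like the sketch below. It rests on several assumptions not stated in the slides: cells are clustered by sorting on their x coordinate, the assignment cost is horizontal displacement as a proxy for wirelength increase, and the assignment problem is solved by brute force over permutations, which is only practical for a handful of columns (a Hungarian-style solver would be the usual choice at scale). The final cell-swapping step is not shown.

```python
from itertools import permutations

def cluster_cells(cell_x, cap):
    """Group cells into clusters of at most `cap` by sorting on x position."""
    order = sorted(cell_x, key=lambda c: (cell_x[c], c))
    return [order[i:i + cap] for i in range(0, len(order), cap)]

def assign_clusters(clusters, cell_x, column_x):
    """Assign each cluster to a distinct column, minimizing total displacement."""
    def move_cost(cluster, col):
        return sum(abs(cell_x[c] - column_x[col]) for c in cluster)

    # Try every ordered choice of len(clusters) distinct columns.
    best = min(
        permutations(range(len(column_x)), len(clusters)),
        key=lambda cols: sum(move_cost(cl, col) for cl, col in zip(clusters, cols)),
    )
    return dict(zip(best, clusters))  # column index -> cells placed in it

# Five cells, at most 2 cells per column, four columns at x = 0, 3, 6, 9.
cell_x = {"a": 0, "b": 1, "c": 5, "d": 6, "e": 9}
clusters = cluster_cells(cell_x, cap=2)
assignment = assign_clusters(clusters, cell_x, column_x=[0, 3, 6, 9])
```

Here the five cells end up in three columns and the column at x = 3 stays empty, so it becomes a candidate for shutdown.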
24.Outline
Introduction & Motivation
Architecture and power model
Placement flow for low power (PASAP)
Results
Conclusions
25.Benchmarks
Gate level design (netlist)
Platforms Used
In next slides, easicXY = placement of easicX on platform Y
26.Results (Clock Power)
Clock power reduction
In the range of 28% to 61%
Average over all benchmarks: 38%
Clock switched network tightened significantly
Baseline: Clock disconnect applied after existing placer
27.Results (Leakage Power)
Leakage power reduction
In the range of 7% to 40%
Average over all benchmarks: 17%
Higher number of columns can be turned off
Baseline: Column disconnect applied after existing placer
28.Results (Components count)
Raw counts of components turned off
# of unused Spines increases by 58%
# of unused Tiles increases by 27%
# of unused Columns increases by 42%
CPU time increase = 29%
Wirelength penalty: 16%
29.Conclusions
Presented a placement flow that creates more opportunities to apply power reduction by selective shutdown
Reduces clock power by as much as 60% and leakage power by as much as 40% compared to an existing aggressive shutdown mechanism
Reduces active spine, tile, column count by 58%, 27% and 42% respectively.