1.Portfolio Analysis: Introduction
Office of Portfolio Analysis
Division of Program Coordination, Planning, and Strategic Initiatives
National Institutes of Health
3.Office of Portfolio Analysis
Director – Dr. George Santangelo
Established in 2011
OPA Mission Statement:
Our purpose is to enhance the impact of NIH-supported research by enabling NIH research administrators and decision makers to evaluate and prioritize current, as well as emerging, areas of research that will advance knowledge and improve human health.
4.Mission of the Office of Portfolio Analysis
Coordination of trans-NIH portfolio analysis activities
Conducting NIH-wide analyses for the NIH Director and DPCPSI Director
Planning and hosting Workshops, Symposia, and Seminars
Creating opportunities for crosstalk within the NIH community
Portfolio Analysis Interest Group (PAIG) and blog (The Analyst)
Consultation
Assisting NIH staff in the 27 Institutes and Centers (ICs) with analyses
Has resulted in collaborative development of tools, case studies, etc.
Training
Both formal classes and ad hoc sessions
OPA web site: user manuals, FAQs, instructional videos (under construction)
Developing a science of portfolio analysis
Building new tools / approaches and augmenting pre-existing ones
Primary focus is biomedical research
Building a community of experts: government, academia, private sector
5.Why do we carry out analyses?
6.Why are portfolio analyses carried out?
In response to questions from senior leadership or external requests
Strategic planning and program management
Evaluation
Exploration and discovery
7.What questions can we ask?
8.Types of Analyses
Content Analysis
What is being done?
How much is being spent?
Is there overlap?
Has the science changed?
Network Analysis
Who is working with whom?
Who is being funded by whom?
Impact Analysis
What is being published and who is citing the work?
Is there any IP (patents, licensing, etc.)?
New clinical guidelines?
9.What is the investment in a certain area?
Official NIH spending reported using RCDC
Not all topics are reportable categories
Total investment in “your favorite area”, including intramural (2007-2010 only) and extramural awards.
10.Is there overlap between agencies/ICs/divisions?
[Figure: items numbered 1 to 20 grouped under IC (a) and IC (b)]
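One simple reading of overlap is the set of projects (or topic labels) that appear in both portfolios. The sketch below assumes each portfolio is just a list of identifiers; the IDs and the function name `portfolio_overlap` are hypothetical and used for illustration only.

```python
def portfolio_overlap(projects_a, projects_b):
    """Return the shared items and the Jaccard overlap of two portfolios."""
    set_a, set_b = set(projects_a), set(projects_b)
    shared = set_a & set_b
    union = set_a | set_b
    # Jaccard index: shared items divided by all distinct items.
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard


# Hypothetical project IDs, not real NIH data.
ic_a = ["P01", "P02", "P03", "P04"]
ic_b = ["P03", "P04", "P05"]
shared, jaccard = portfolio_overlap(ic_a, ic_b)
print(sorted(shared), round(jaccard, 2))
```

A content-based comparison (shared themes rather than shared IDs) would need text clustering instead of exact matching.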
11.Evolution of Portfolios: Stem Cell Research
2009
Searched QVR for “Stem Cell” in Title and Abstract
291 Projects
12.2013
193 Projects
13.Is there collaboration in my field?
FY09 Metabolomics Co-authorship Networks
[Figure: co-authorship networks for the USA, Europe, and Japan]
14.How influential are publications?
NIH-funded research (INPUT)
Publications (OUTPUT)
Citations (INFLUENCE)
15.How influential are publications?
NIH-funded investigator studying axon guidance
Random sample of non-NIH axon guidance papers
16.How do we get started?
17.The Basics
Define the question you are trying to answer
Define the data you are going to use
Identify the tools you are going to use
18.Step 1: Define your question
The Basics: Part One
19.What is the question you are trying to answer?
Start general and then get specific
How will the analysis be used?
Who will the analysis be shown to?
ALWAYS have a question
20.Step 2: Define your datasets
The Basics: Part Two
21.What data are you going to use?
22.Gathering data
http://inside.era.nih.gov/files/Activity_Code_Book.pdf
23.iSearch
Fast
Highly tuned document indexes provide subsecond query time over millions of funded and unfunded grants, tens of millions of publications, tens of millions of patents, and hundreds of thousands of clinical trial and drug records.
Comprehensive
Data consist of over 4 million funded and unfunded NIH grant applications from 1975 to the present and approximately 3 million non-NIH grant records from over 200 agencies; 26 million publications; 11 million patents; 223,000 clinical trials; and 32,000 approved drugs.
Easy-to-use
Google-like free text queries, NIH-specific search filters, and real-time drill down make data exploration quick and accurate.
24.iSearch
Expressive
Free text search supports a full range of boolean, phrase, proximity, exact, and wildcard searches over a number of customizable search fields.
Flexible
Numerous combinations of search fields and filters make it possible to find answers to complex questions quickly. For example: search grants with approved drugs, find patents by grant number, filter publications by admin IC, limit grants by number of publications, or export search results directly to iCite.
Up-to-date
Nightly jobs clean and link the latest IMPACII data with publications and patents. Clinical trials are added daily. Publications, patents, drug approvals and RCR values are updated monthly.
25.iSearch – Grants Data
NIH, CDC, SAMHSA, AHRQ, HRSA, VA, FDA, OASH, ADAMHA, ACF
Funded and unfunded applications from IMPACII
1975 – present
Updated daily
Non-NIH grants
Approximately 3 million funded applications from ~230 agencies
1952 – present (depending on agency)
Updated monthly
Data cleaning
Remove boilerplate text (e.g., “Provided by applicant”, “In the space provided”) that interferes with content-based analyses and document clustering
Normalize non-standard characters for improved searching
Remove non-printing characters for more consistent text processing
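The three cleaning steps above can be sketched in a few lines. This is an illustration, not the iSearch pipeline: the boilerplate list contains only the two phrases quoted on the slide, and the choice of NFKC normalization and the function name `clean_text` are assumptions.

```python
import re
import unicodedata

# Only the two phrases quoted above; a real list would be longer (assumption).
BOILERPLATE = ["Provided by applicant", "In the space provided"]


def clean_text(text):
    """Strip boilerplate, normalize characters, and drop non-printing characters."""
    for phrase in BOILERPLATE:
        # Also swallow punctuation and spaces that trail the phrase.
        pattern = re.escape(phrase) + r"[\s:.\-]*"
        text = re.sub(pattern, " ", text, flags=re.IGNORECASE)
    # NFKC folds compatibility forms such as ligatures and non-breaking spaces.
    text = unicodedata.normalize("NFKC", text)
    # Drop control/format characters (Unicode categories "C*"), keeping whitespace.
    text = "".join(
        ch for ch in text
        if ch.isspace() or not unicodedata.category(ch).startswith("C")
    )
    # Collapse whitespace runs left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


raw = "Provided by applicant: Stem\u00a0cell \ufb01ndings\x00 in mice."
print(clean_text(raw))
```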
26.iSearch – Patent Data
11 million patents
USPTO
Weekly updates
Linkages
Automatically recognize grant number variants in the federal support section and description
Substantially increases the number of patents attributable to NIH grants
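A rough sketch of recognizing grant number variants in a federal support statement follows. The pattern assumes a simplified layout (optional type digit, optional three-character activity code, two-letter IC code, six-digit serial number) and normalizes each hit to IC code plus serial. The regular expression, the function name `find_grant_ids`, and the sample statement are all hypothetical; the rules iSearch actually applies are not described here.

```python
import re

# Simplified, assumed layout: [type digit] [activity code] IC code + 6-digit serial.
GRANT_RE = re.compile(
    r"\b(?:\d\s*)?"                      # optional application type, e.g. "5"
    r"(?:[A-Z][A-Z0-9]{2}[\s-]*)?"       # optional activity code, e.g. "R01"
    r"(?P<ic>[A-Z]{2})[\s-]*"            # two-letter IC code, e.g. "CA"
    r"(?P<serial>\d{6})(?!\d)"           # six-digit serial number
)


def find_grant_ids(text):
    """Return normalized IC+serial identifiers in order of first appearance."""
    ids = []
    for match in GRANT_RE.finditer(text.upper()):
        key = match.group("ic") + match.group("serial")
        if key not in ids:
            ids.append(key)
    return ids


# Made-up support statement with three spellings of two grants.
statement = (
    "This invention was made with government support under "
    "R01 CA-123456 and 5R01CA123456-02 and grant GM 098765 awarded by the NIH."
)
print(find_grant_ids(statement))
```

A pattern this loose will also match unrelated strings of two letters and six digits, so matches would still need to be checked against known grant records.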
iSearch – Publication Data
26 million publications
All of PubMed
Updated monthly
Linked to grants: SPIRES match cases 5, 4, and “3.5”
Match case 3.5
SPIRES match case 3 plus a match between the author name and the grantee name
E.g., “Willman, Cheryl L” -> “Cheryl Willman” or “CL Willman”
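The name comparison in match case 3.5 can be illustrated with a small helper that expands a grantee name into the forms an author byline might take. This is a sketch under assumptions: the set of variants and the function names are hypothetical, and the SPIRES match cases themselves are not reproduced.

```python
def name_variants(grantee):
    """Build lowercase display variants for a 'Last, First Middle' grantee name."""
    last, _, given = grantee.partition(",")
    last = last.strip()
    parts = given.replace(".", "").split()
    if not parts:
        return {last.lower()}
    first = parts[0]
    initials = "".join(p[0] for p in parts)
    variants = {
        f"{first} {last}",      # Cheryl Willman
        f"{initials} {last}",   # CL Willman
        f"{first[0]} {last}",   # C Willman
        f"{last} {initials}",   # Willman CL
    }
    return {v.lower() for v in variants}


def author_matches_grantee(author, grantee):
    """True if the author string equals one of the grantee's name variants."""
    normalized = " ".join(author.replace(".", "").replace(",", " ").split()).lower()
    return normalized in name_variants(grantee)


print(author_matches_grantee("CL Willman", "Willman, Cheryl L"))
print(author_matches_grantee("Carol Willman", "Willman, Cheryl L"))
```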
27.iSearch – Clinical Trials Data
223,000 clinical trials
ClinicalTrials.gov
Updated daily
Linked
Citations in Clinical Trials
Links in IMPACII
iSearch – Approved Drugs
32,000 approved drugs
FDA Orange Book
Updated monthly
Linked drugs to patents, patents to grants
Linked Patent Use Code to indication for easy searching
28.Who can use iSearch?
iSearch is designed for extramural staff at the NIH. NIH log-in and QVR credentials are required to access iSearch. For access to iSearch or requests for additional details, please contact isearch@od.nih.gov
29.Exercise: Searching for Publications
iSearch
Fast, interactive grant search
Export to OPA web apps to gather publication data and analyze
https://od.lexicalintelligence.com/dashboard
30.Step 3: Clean your data
Missing data
Are there data for all the fields you are interested in?
Need a minimum of Title and Abstract to do content analysis
Ambiguous data
Names
Individuals – problems with attribution of authorship
Departments – useful for defining fields?
Institutions – many ways to refer to the same place
Allow enough time to gather and clean the data
Data cleaning:
Comprehensive and accurate data
Opportunity to become familiar with the data
Approximately 90% of the time is spent on this part of the analysis
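The missing-data check is easy to automate before any content analysis. The sketch below assumes records are dictionaries with `title` and `abstract` keys; the field names, sample records, and function name are hypothetical.

```python
def split_by_completeness(records, required=("title", "abstract")):
    """Separate records that have every required field from those that do not."""
    complete, incomplete = [], []
    for record in records:
        # Treat missing keys, None, and whitespace-only strings as missing.
        has_all = all(str(record.get(field) or "").strip() for field in required)
        (complete if has_all else incomplete).append(record)
    return complete, incomplete


# Hypothetical records, for illustration only.
records = [
    {"id": 1, "title": "Stem cell niches", "abstract": "We study ..."},
    {"id": 2, "title": "Axon guidance", "abstract": ""},
    {"id": 3, "title": None, "abstract": "Missing title."},
]
complete, incomplete = split_by_completeness(records)
print(len(complete), "usable;", len(incomplete), "need attention")
```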
32.Ambiguous Names
Fire and Mello
Fire, Andrew Z
Fire, Andrew
Fire, A Z
Fire, A
33.After disambiguation
Fire and Mello
34.List of names to be disambiguated
List of disambiguated names
https://od.lexicalintelligence.com/iClean/
A tool that makes disambiguating a list of names easy
Accepts outputs from a number of data sources, e.g., SPIRES, QVR biblio report
The only requirement is that the names to disambiguate are in one column
35.List of input names
Hilderbrand, S
Hilderbrand, Scott
Hilderbrand, Scott A
Weigl, B H
Weigl, Bernhard
Weigl, Bernhard H
Gaydos, C
Gaydos, C A
Gaydos, Charlotte
Gaydos, Charlotte A
List of disambiguated names
Hilderbrand, Scott A.
Weigl, Bernhard
Gaydos, Charlotte
Co-author network before name disambiguation
Co-author network after name disambiguation
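A naive way to see how the input names collapse is to group them by surname and first initial. This is only a sketch: it is not how iClean works, the function name is hypothetical, and the rule would wrongly merge different people who share a surname and initial.

```python
from collections import defaultdict


def group_name_variants(names):
    """Group 'Last, Given' strings by (surname, first initial)."""
    groups = defaultdict(list)
    for name in names:
        last, _, given = name.partition(",")
        given = given.replace(".", "").strip()
        key = (last.strip().lower(), given[:1].lower())
        groups[key].append(name)
    return dict(groups)


names = [
    "Hilderbrand, S", "Hilderbrand, Scott", "Hilderbrand, Scott A",
    "Weigl, B H", "Weigl, Bernhard", "Weigl, Bernhard H",
    "Gaydos, C", "Gaydos, C A", "Gaydos, Charlotte", "Gaydos, Charlotte A",
]
groups = group_name_variants(names)
for key, variants in groups.items():
    print(key, variants)
```

Each group still needs a human decision on which form becomes the canonical name.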
36.Identify the tools
The Basics: Part Three
37.What tools are you going to use?
Select the tool for the job, not the other way around
Sometimes the simplest tool is the right tool
38.Bibliometric Analysis
iCite
CitNet Explorer
CiteSpace
Text Mining and Clustering
IN-SPIRE
Carrot2
Network Analysis
Sci2/Guess
Gephi
Cytoscape
NodeXL
39.Abandoning Impact Factor: a growing consensus
40.Relative Citation Ratio: how influential is an article?
Citations per year received by an article, normalized by:
Field
Year
NIH funding
“How many citations per year compared to peer articles in the same field?”
Average = 1.0
2.0 = twice as many citations per year as expected
0.5 = half as many citations per year as expected
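The interpretation above amounts to a ratio of an article's citation rate to an expected rate. The sketch below shows only that final division; the expected rate is passed in as a plain number, and deriving it from field, year, and NIH-funding benchmarks is outside this sketch. Function names and input values are hypothetical.

```python
def citations_per_year(total_citations, years_since_publication):
    """Average citations per year, counting at least one year."""
    return total_citations / max(years_since_publication, 1)


def relative_ratio(article_rate, expected_rate):
    """Article citation rate divided by the expected rate for peer articles."""
    if expected_rate <= 0:
        raise ValueError("expected_rate must be positive")
    return article_rate / expected_rate


# Hypothetical article: 24 citations over 4 years, peers expected at 3 per year.
rate = citations_per_year(24, 4)
print(relative_ratio(rate, 3.0))  # 2.0, twice as many citations per year as expected
```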
41.RCR: A scalable measure of influence well-correlated with expert opinion
RCR vs. Expert Review Scores
42.iCite: a bibliometrics dashboard for NIH staff
NIH-funded investigator studying axon guidance
Random sample of non-NIH axon guidance papers
43.Exercise: Analyzing a portfolio with iCite
Public iCite:
https://icite.od.nih.gov
Lower download limits (200 articles)
NIH-internal iCite:
http://icite-beta.od.nih.gov
High download limits (50,000)
Start from grants search in iSearch:
http://10.157.43.233:8080/iSearch
44.Text Mining and Clustering: IN-SPIRE
Developed by PNNL (Pacific Northwest National Laboratory)
Clusters free text and provides a useful overview of the scientific landscape of a portfolio
Free for government use
http://in-spire.pnnl.gov/
45.IN-SPIRE Text Processing
Extract text from documents
Create a mathematical vector for each document
Organize according to key topics
Cluster the document vectors in n-space
Present each document as a “docustar” where proximity suggests similar themes
Project the n-space clusters into a 2-D visualization
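The "vector for each document" step can be illustrated with TF-IDF weights and cosine similarity. TF-IDF is a common stand-in chosen for this sketch, not a description of IN-SPIRE's own algorithm, and the clustering and 2-D projection steps are omitted. The documents and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for c in counts for term in c)
    return [
        # Smoothed inverse document frequency keeps every weight positive.
        {t: tf * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1) for t, tf in c.items()}
        for c in counts
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical abstracts, for illustration only.
docs = [
    "stem cell differentiation in mouse embryos",
    "embryonic stem cell pluripotency",
    "axon guidance in the developing spinal cord",
]
vectors = tfidf_vectors(docs)
print(round(cosine(vectors[0], vectors[1]), 3), round(cosine(vectors[0], vectors[2]), 3))
```

Documents with a higher cosine similarity would sit closer together in a proximity-based view.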
46.IN-SPIRE Analysis and Visualization
Analysis
Thematic distribution by various metadata
Query relationships and overlap
Targeted search
Time slicing
Informed exploration and discovery
Visualization
Galaxy View permits intuitive interaction to explore the dataset
Theme View provides a 3-D representation of clusters
47.Galaxy View: 2013 “Stem Cell”
48.Highlight Groups
49.Drill Down
50.ThemeView Classic
2009
291 Projects
51.Text Mining and Clustering: Carrot2
Carrot2 is a framework for building document clustering engines
Two specialized document clustering algorithms
Ready-to-use components for fetching search results from various sources such as public search engines
http://carrotsearch.com/opensource-overview
http://search.carrot2.org/stable/search
55.Network Analysis Tools
Sci2
Supports the temporal, topical and network analysis, and visualization of scholarly datasets
Free software
https://sci2.cns.iu.edu/user/index.php
56.Is there collaboration in my field?
FY09 Co-authorship Networks
[Figure: co-authorship networks for the USA, Europe, and Japan]
57.Networks Evolve over Time
Co-author network of the portfolio of grants belonging to a particular PO, evolving with time
Panels: 2009, 2009-2010, 2009-2011, 2009-2012, 2009-2013, 2009-2014
The color and size of the nodes were adjusted to reflect degree
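Degree here means the number of distinct co-authors a person is linked to. A minimal sketch of computing it from publication author lists follows; the author labels and function name are hypothetical, and tools such as Sci2 or Gephi report the same measure without custom code.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_degree(publications):
    """Map each author to the number of distinct co-authors across all papers."""
    neighbors = defaultdict(set)
    for authors in publications:
        # Every pair of authors on the same paper shares an edge.
        for a, b in combinations(sorted(set(authors)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {author: len(linked) for author, linked in neighbors.items()}


# Hypothetical author lists, one per publication.
publications = [["A", "B"], ["A", "C", "D"], ["B"]]
degree = coauthor_degree(publications)
print(degree)
```

Running the same function on publications filtered by year range would give the degree values for each time slice.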
58.Final points
59.Take contemporaneous notes while you are carrying out your analysis
Take time to define the portfolio
Present your results in the context of the question that you posed
Make the visualizations count
Simplify, don’t complicate
Clean your data, clean your data, clean your data!
60.Contact Us
NIH
https://list.nih.gov/cgi-bin/wa.exe?A0=portfolio_analysis