Introduction to Portfolio Analysis - DPCPSI

Presentation Transcript

  • 1. Portfolio Analysis: Introduction. Office of Portfolio Analysis, Division of Program Coordination, Planning, and Strategic Initiatives, National Institutes of Health
  • 3. Office of Portfolio Analysis. Director: Dr. George Santangelo. Established in 2011. OPA Mission Statement: Our purpose is to enhance the impact of NIH-supported research by enabling NIH research administrators and decision makers to evaluate and prioritize current, as well as emerging, areas of research that will advance knowledge and improve human health.
  • 4. Mission of the Office of Portfolio Analysis. Coordination of trans-NIH portfolio analysis activities: conducting NIH-wide analyses for the NIH Director and DPCPSI Director; planning and hosting workshops, symposia, and seminars; creating opportunities for crosstalk within the NIH community through the Portfolio Analysis Interest Group (PAIG) and blog (The Analyst). Consultation: assisting NIH staff in the 27 Institutes and Centers (ICs) with analyses, which has resulted in collaborative development of tools, case studies, etc. Training: both formal classes and ad hoc sessions; OPA web site with user manuals, FAQs, and instructional videos (under construction). Developing a science of portfolio analysis: building new tools and approaches and augmenting pre-existing ones; primary focus is biomedical research; building a community of experts from government, academia, and the private sector.
  • 5. Why do we carry out analyses?
  • 6. Why are portfolio analyses carried out? In response to questions from senior leadership or external requests; for strategic planning and program management; for evaluation; for exploration and discovery.
  • 7. What questions can we ask?
  • 8. Types of Analyses. Content analysis: What is being done? How much is being spent? Is there overlap? Has the science changed? Network analysis: Who is working with whom? Who is being funded by whom? Impact analysis: What is being published, and who is citing the work? Is there any IP (patents, licensing, etc.)? Are there new clinical guidelines?
  • 9. What is the investment in a certain area? Official NIH spending is reported using RCDC, but not all topics are reportable categories. Total investment in “your favorite area”, including intramural (2007-2010 only) and extramural awards.
  • 10. Is there overlap between agencies/ICs/divisions? [Figure: items numbered 1 to 20 arranged between IC (a) and IC (b)]
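The overlap question on this slide can be approximated by comparing sets of project identifiers. The following is a minimal sketch, assuming each IC's portfolio is available as a list of project IDs; the function name `portfolio_overlap` and the sample IDs are hypothetical and are not taken from the presentation.

```python
def portfolio_overlap(projects_a, projects_b):
    """Return the shared project IDs and the Jaccard index of two portfolios."""
    a, b = set(projects_a), set(projects_b)
    shared = a & b          # projects that appear in both portfolios
    union = a | b           # every project in either portfolio
    jaccard = len(shared) / len(union) if union else 0.0
    return sorted(shared), jaccard


# Hypothetical project IDs, for illustration only.
ic_a = ["P1", "P2", "P3", "P4"]
ic_b = ["P3", "P4", "P5"]
shared, jaccard = portfolio_overlap(ic_a, ic_b)
print(shared, jaccard)
```

A Jaccard index near 0 suggests largely distinct portfolios, while a value near 1 suggests heavy overlap. Matching on identifiers only finds co-funded projects; overlap in scientific content would need a text-based comparison instead.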
  • 11. Evolution of Portfolios: Stem Cell Research. 2009: searched QVR for “Stem Cell” in title and abstract; 291 projects.
  • 12. 2013: 193 projects.
  • 13. Is there collaboration in my field? FY09 metabolomics co-authorship networks: USA, Europe, Japan.
  • 14. How influential are publications? NIH-funded research (input), publications (output), citations (influence).
  • 15. How influential are publications? [Figure: NIH-funded investigator studying axon guidance; random sample of non-NIH axon guidance papers]
  • 16. How do we get started?
  • 17. The Basics. Define the question you are trying to answer. Define the data you are going to use. Identify the tools you are going to use.
  • 18. The Basics, Part One. Step 1: Define your question.
  • 19. What is the question you are trying to answer? Start general and then get specific. How will the analysis be used? Who will the analysis be shown to? ALWAYS have a question.
  • 20. The Basics, Part Two. Step 2: Define your datasets.
  • 21. What data are you going to use?
  • 22. Gathering data. http://inside.era.nih.gov/files/Activity_Code_Book.pdf
  • 23. iSearch. Fast: Highly tuned document indexes provide subsecond query time over millions of funded and unfunded grants, tens of millions of publications, tens of millions of patents, and hundreds of thousands of clinical trial and drug records. Comprehensive: Data consist of over 4 million funded and unfunded NIH grant applications from 1975 to the present and approximately 3 million non-NIH grant records from over 200 agencies; 26 million publications; 11 million patents; 223,000 clinical trials; and 32,000 approved drugs. Easy to use: Google-like free text queries, NIH-specific search filters, and real-time drill down make data exploration quick and accurate.
  • 24. iSearch. Expressive: Free text search supports a full range of Boolean, phrase, proximity, exact, and wildcard searches over a number of customizable search fields. Flexible: Numerous combinations of search fields and filters make it possible to find answers to complex questions quickly. For example: search grants with approved drugs, find patents by grant number, filter publications by admin IC, limit grants by number of publications, and export search results directly to iCite. Up to date: Nightly jobs clean and link the latest IMPACII data with publications and patents. Clinical trials are added daily. Publications, patents, drug approvals, and RCR values are updated monthly.
  • 25. iSearch – Grants Data. NIH, CDC, SAMHSA, AHRQ, HRSA, VA, FDA, OASH, ADAMHA, ACF: funded and unfunded applications from IMPACII; 1975 – present; updated daily. Non-NIH grants: approximately 3 million funded applications from ~230 agencies; 1952 – present (depending on agency); updated monthly. Data cleaning: remove boilerplate text (e.g., “Provided by applicant”, “In the space provided”) that interferes with content-based analyses and document clustering; normalize non-standard characters for improved searching; remove non-printing characters for more consistent text processing.
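The three data-cleaning steps listed on this slide can be sketched in a few lines. This is a minimal illustration, assuming abstracts arrive as plain strings; it is not iSearch's actual pipeline. The function name `clean_abstract` is hypothetical, the phrase list contains only the two examples given on the slide, and NFKC normalization is one assumed way to "normalize non-standard characters".

```python
import re
import unicodedata

# Only the two example phrases from the slide; a real list would be longer.
BOILERPLATE_PHRASES = ["Provided by applicant", "In the space provided"]


def clean_abstract(text):
    """Apply the three cleaning steps to one abstract (illustrative sketch)."""
    # 1. Remove boilerplate phrases, ignoring case.
    for phrase in BOILERPLATE_PHRASES:
        text = re.sub(re.escape(phrase), " ", text, flags=re.IGNORECASE)
    # 2. Normalize non-standard characters (ligatures, non-breaking spaces, ...).
    text = unicodedata.normalize("NFKC", text)
    # 3. Drop non-printing characters but keep whitespace so words stay apart.
    text = "".join(ch for ch in text if ch.isprintable() or ch.isspace())
    # Collapse the runs of whitespace left behind by the steps above.
    return " ".join(text.split())


raw = "Provided by applicant Stem\u00a0cell \ufb01ndings\x00 here"
print(clean_abstract(raw))
```

Removing the boilerplate before clustering matters because phrases shared by every application would otherwise pull unrelated documents together.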
  • 26. iSearch – Patent Data. 11 million patents from USPTO; weekly updates. Linkages: automatically recognizes grant number variants in the federal support section and description, which substantially increases the number of patents attributable to NIH grants. iSearch – Publication Data. 26 million publications (all of PubMed); updated monthly. Linked to grants by SPIRES match cases 5, 4, and “3.5”. Match case 3.5 is SPIRES match case 3 plus the requirement that the author name matches the grantee name, e.g., “Willman, Cheryl L” -> “Cheryl Willman” or “CL Willman”.
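The name check that distinguishes match case 3.5 can be illustrated with the slide's own example. This is a minimal sketch, assuming grantee names are stored as "Last, First Middle" and that only the two author forms shown on the slide count as a match; the function names are hypothetical, and the actual SPIRES matching rules are not described in the presentation.

```python
def name_variants(grantee):
    """Build the two author-name forms shown on the slide from 'Last, First M'."""
    last, _, given = grantee.partition(",")
    last = last.strip()
    parts = given.split()
    if not last or not parts:
        return set()
    first = parts[0]
    initials = "".join(part[0] for part in parts)
    # "Willman, Cheryl L" -> "cheryl willman" and "cl willman"
    return {f"{first} {last}".lower(), f"{initials} {last}".lower()}


def author_matches_grantee(author, grantee):
    """True if the publication author string is a known variant of the grantee."""
    # Strip periods ("C.L." -> "CL") and normalize spacing and case.
    normalized = " ".join(author.replace(".", "").split()).lower()
    return normalized in name_variants(grantee)


print(author_matches_grantee("CL Willman", "Willman, Cheryl L"))
print(author_matches_grantee("Carl Willman", "Willman, Cheryl L"))
```

Requiring the name match on top of a weaker grant-number match is what lets an otherwise uncertain link be accepted with more confidence.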
  • 27. iSearch – Clinical Trials Data. 223,000 clinical trials from ClinicalTrials.gov; updated daily. Linked: citations in clinical trials; links in IMPACII. iSearch – Approved Drugs. 32,000 approved drugs from the FDA Orange Book; updated monthly. Linked drugs to patents and patents to grants; linked Patent Use Code to indication for easy searching.
  • 28. Who can use iSearch? iSearch is designed for extramural staff at the NIH. NIH log-in and QVR credentials are required to access iSearch. For access to iSearch or requests for additional details, please contact isearch@od.nih.gov
  • 29. Exercise: Searching for Publications. iSearch: fast, interactive grant search. Export to OPA web apps to gather and analyze publication data. https://od.lexicalintelligence.com/dashboard
  • 30. Step 3: Clean your Data. Missing data: Is there data for all the fields you are interested in? A minimum of title and abstract is needed to do content analysis. Ambiguous data: names of individuals (problems with attribution of authorship), departments (useful for defining fields?), and institutions (many ways to refer to the same place). Allow enough time to gather and clean the data. Data cleaning gives comprehensive and accurate data and is an opportunity to become familiar with the data. Approximately 90% of the time is spent on this part of the analysis.
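The missing-data check described on this slide can be sketched as a simple filter. This is a minimal example, assuming each record is a dictionary with `title` and `abstract` keys; the field names and the function name `split_by_completeness` are hypothetical and depend on how the data were exported.

```python
def split_by_completeness(records, required=("title", "abstract")):
    """Separate records that have every required field from those that do not."""
    complete, incomplete = [], []
    for record in records:
        # A field counts as missing if it is absent, None, or only whitespace.
        has_all = all(str(record.get(field) or "").strip() for field in required)
        (complete if has_all else incomplete).append(record)
    return complete, incomplete


# Hypothetical records, for illustration only.
records = [
    {"id": 1, "title": "Stem cell niche", "abstract": "We study..."},
    {"id": 2, "title": "Axon guidance", "abstract": "   "},
    {"id": 3, "title": None},
]
complete, incomplete = split_by_completeness(records)
print(len(complete), "usable for content analysis;", len(incomplete), "need follow-up")
```

Keeping the incomplete records in a separate list, rather than discarding them, makes it easier to report how much of the portfolio the content analysis actually covers.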
  • 32. Ambiguous Names. Fire and Mello: Fire, Andrew Z; Fire, Andrew; Fire, A Z; Fire, A
  • 33. After disambiguation: Fire and Mello
  • 34. https://od.lexicalintelligence.com/iClean/ is a tool that makes disambiguating a list of names easy. It accepts outputs from a number of data sources (e.g., SPIRES, QVR biblio report); the only requirement is that the list of names to disambiguate is in one column. [Figure: list of names to be disambiguated; list of disambiguated names]
  • 35. List of input names: Hilderbrand, S; Hilderbrand, Scott; Hilderbrand, Scott A; Weigl, B H; Weigl, Bernhard; Weigl, Bernhard H; Gaydos, C; Gaydos, C A; Gaydos, Charlotte; Gaydos, Charlotte A. List of disambiguated names: Hilderbrand, Scott A.; Weigl, Bernhard; Gaydos, Charlotte. [Figures: co-author network before name disambiguation; co-author network after name disambiguation]
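A naive version of this kind of name grouping can be sketched as follows. It is not the method used by the tool on the previous slide, which the presentation does not describe. The sketch assumes names are formatted "Last, Given", groups variants by surname plus first initial, and picks the longest variant as the canonical form, so its canonical names can differ from those shown on the slide. The function name is hypothetical.

```python
from collections import defaultdict


def group_name_variants(names):
    """Map each name variant to a canonical form (longest variant in its group)."""
    groups = defaultdict(list)
    for name in names:
        last, _, given = name.partition(",")
        # Grouping key: lower-cased surname plus first initial of the given name.
        key = (last.strip().lower(), given.strip()[:1].lower())
        groups[key].append(name)
    mapping = {}
    for variants in groups.values():
        canonical = max(variants, key=len)
        for variant in variants:
            mapping[variant] = canonical
    return mapping


names = [
    "Hilderbrand, S", "Hilderbrand, Scott", "Hilderbrand, Scott A",
    "Weigl, B H", "Weigl, Bernhard", "Weigl, Bernhard H",
    "Gaydos, C", "Gaydos, C A", "Gaydos, Charlotte", "Gaydos, Charlotte A",
]
for variant, canonical in group_name_variants(names).items():
    print(f"{variant!r} -> {canonical!r}")
```

This heuristic would wrongly merge two different people who share a surname and first initial, which is why real disambiguation usually also draws on affiliations, co-authors, or manual review.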
  • 36. The Basics, Part Three. Identify the tools.
  • 37. What tools are you going to use? Select the tool for the job, not the other way around. Sometimes the simplest tool is the right tool.
  • 38. Bibliometric Analysis: iCite, CitNet Explorer, CiteSpace. Text Mining and Clustering: IN-SPIRE, Carrot2. Network Analysis: Sci2/Guess, Gephi, Cytoscape, NodeXL.
  • 39. Abandoning Impact Factor: a growing consensus
  • 40. Relative Citation Ratio: how influential is an article? Citations per year received by an article, normalized by field, year, and NIH funding. “How many citations per year compared to peer articles in the same field?” Average = 1.0; 2.0 = twice as many citations per year as expected; 0.5 = half as many citations per year as expected.
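The interpretation of RCR values on this slide comes down to a ratio of two citation rates. The sketch below shows only that final arithmetic, assuming the expected citations per year for peer articles (same field and year, benchmarked against NIH-funded work) is already known; how iCite derives that expected rate is not covered by the slide, and the function and parameter names are hypothetical.

```python
def relative_citation_ratio(citations, years_since_publication,
                            expected_citations_per_year):
    """Article citation rate divided by the expected rate for peer articles."""
    if years_since_publication <= 0 or expected_citations_per_year <= 0:
        raise ValueError("years and expected rate must be positive")
    article_rate = citations / years_since_publication
    return article_rate / expected_citations_per_year


# Hypothetical numbers: peers are expected to receive 4 citations per year.
print(relative_citation_ratio(40, 5, 4.0))  # 8 citations/year vs. 4 expected
print(relative_citation_ratio(10, 5, 4.0))  # 2 citations/year vs. 4 expected
```

With these made-up inputs the first article has twice the expected citation rate and the second has half of it, matching the 2.0 and 0.5 cases on the slide.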
  • 41. RCR: a scalable measure of influence, well correlated with expert opinion. [Figure: RCR vs. expert review scores]
  • 42. iCite: a bibliometrics dashboard for NIH staff. [Figure: NIH-funded investigator studying axon guidance; random sample of non-NIH axon guidance papers]
  • 43. Exercise: Analyzing a portfolio with iCite. Public iCite: https://icite.od.nih.gov (lower download limit: 200 articles). NIH-internal iCite: http://icite-beta.od.nih.gov (higher download limit: 50,000). Start from a grants search in iSearch: http://10.157.43.233:8080/iSearch
  • 44. Text Mining and Clustering: IN-SPIRE. Developed by PNNL (Pacific Northwest National Laboratory). Clusters free text and provides a useful overview of the scientific landscape of a portfolio. Free for government use. http://in-spire.pnnl.gov/
  • 45. IN-SPIRE Text Processing. Extract text from documents; create a mathematical vector for each document; organize according to key topics; cluster the document vectors in n-space; present each document as a “docustar” where proximity suggests similar themes; project the n-space clusters into a 2-D visualization.
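The idea that each document becomes a vector, and that nearby vectors suggest similar themes, can be illustrated with TF-IDF weighting and cosine similarity. This is a generic sketch using only the standard library. IN-SPIRE's own vectorization, clustering, and projection algorithms are not described in the presentation and are not reproduced here; the function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for c in counts for term in c)
    return [
        {term: tf * (math.log(n_docs / doc_freq[term]) + 1.0)
         for term, tf in c.items()}
        for c in counts
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical project titles, for illustration only.
docs = [
    "stem cell differentiation in mouse embryos",
    "embryonic stem cell pluripotency",
    "opioid receptor signaling in neurons",
]
vectors = tfidf_vectors(docs)
print("doc 0 vs doc 1:", round(cosine(vectors[0], vectors[1]), 3))
print("doc 0 vs doc 2:", round(cosine(vectors[0], vectors[2]), 3))
```

The two stem cell titles score as more similar to each other than either does to the opioid title. A clustering step and a 2-D projection, as listed on the slide, would then operate on these vectors.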
  • 46. IN-SPIRE Analysis and Visualization. Analysis: thematic distribution by various metadata; query relationships and overlap; targeted search; time slicing; informed exploration and discovery. Visualization: Galaxy View permits intuitive interaction to explore the dataset; Theme View provides a 3-D representation of clusters.
  • 47. Galaxy View: 2013 “Stem Cell”
  • 48. Highlight Groups
  • 49. Drill Down
  • 50. ThemeView Classic: 2009, 291 projects
  • 51. Text Mining and Clustering: Carrot2. Carrot2 is a framework for building document clustering engines. It offers two specialized document clustering algorithms and ready-to-use components for fetching search results from various sources such as public search engines. http://carrotsearch.com/opensource-overview http://search.carrot2.org/stable/search
  • 55. Network Analysis Tools: Sci2. Supports temporal, topical, and network analysis and visualization of scholarly datasets. Free software. https://sci2.cns.iu.edu/user/index.php
  • 56. Is there collaboration in my field? FY09 co-authorship networks: USA, Europe, Japan.
  • 57. Networks Evolve over Time. Co-author network of the portfolio of grants belonging to a particular PO, evolving with time (panels: 2009, 2009-2010, 2009-2011, 2009-2012, 2009-2013, 2009-2014). The color and size of the nodes were adjusted to reflect degree.
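Node degree in a co-author network, the quantity used for node color and size on this slide, is the number of distinct co-authors an author has. The sketch below computes it from per-paper author lists with the standard library only; tools such as Sci2 or Gephi compute this internally, and the function name and sample data here are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_degrees(papers):
    """Return {author: number of distinct co-authors} from per-paper author lists."""
    neighbors = defaultdict(set)
    for authors in papers:
        unique = sorted(set(authors))
        # Register every author so that solo authors appear with degree 0.
        for author in unique:
            neighbors.setdefault(author, set())
        # Every pair of authors on the same paper shares an edge.
        for a, b in combinations(unique, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {author: len(linked) for author, linked in neighbors.items()}


# Hypothetical author lists, one per publication.
papers = [["A", "B"], ["A", "C", "D"], ["E"]]
print(coauthor_degrees(papers))
```

Running the same function on cumulative year ranges (2009, 2009-2010, and so on) would show how degree changes as the network grows. Because the counts depend on consistent author names, name disambiguation should come first.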
  • 58. Final points
  • 59. Take contemporaneous notes while you are carrying out your analysis. Take time to define the portfolio. Present your results in the context of the question that you posed. Make the visualizations count. Simplify, don’t complicate. Clean your data, clean your data, clean your data!
  • 60. Contact Us. NIH Office of Portfolio Analysis: https://list.nih.gov/cgi-bin/wa.exe?A0=portfolio_analysis