Unstructured Mining Approaches to Solve Complex Scientific Problems
As the volume of scientific data and literature increases exponentially, scientists need more powerful tools and methods to process and synthesize information and to formulate new hypotheses that are most likely to be both true and important. Accelerating Discovery: Mining Unstructured Information for Hypothesis Generation describes a novel approach to scientific research that uses unstructured data analysis as a generative tool for new hypotheses.
The author develops a systematic process for leveraging heterogeneous structured and unstructured data sources, data mining, and computational architectures to make the discovery process faster and more effective. This process accelerates human creativity by allowing scientists and inventors to more readily analyze and comprehend the space of possibilities, compare alternatives, and discover entirely new approaches.
Encompassing systematic and practical perspectives, the book provides the necessary motivation and strategies as well as a varied set of comprehensive, illustrative examples. It reveals the importance of heterogeneous data analytics in aiding scientific discoveries and furthers data science as a discipline.
Introduction
Why Accelerate Discovery?
Scott Spangler and Ying Chen
THE PROBLEM OF SYNTHESIS
THE PROBLEM OF FORMULATION
WHAT WOULD DARWIN DO?
THE POTENTIAL FOR ACCELERATED DISCOVERY: USING COMPUTERS TO MAP THE KNOWLEDGE SPACE
WHY ACCELERATE DISCOVERY: THE BUSINESS PERSPECTIVE
COMPUTATIONAL TOOLS THAT ENABLE ACCELERATED DISCOVERY
ACCELERATED DISCOVERY FROM A SYSTEM PERSPECTIVE
ACCELERATED DISCOVERY FROM A DATA PERSPECTIVE
ACCELERATED DISCOVERY IN THE ORGANIZATION
CHALLENGE (AND OPPORTUNITY) OF ACCELERATED DISCOVERY
Form and Function
THE PROCESS OF ACCELERATED DISCOVERY
CONCLUSION
Exploring Content to Find Entities
SEARCHING FOR RELEVANT CONTENT
HOW MUCH DATA IS ENOUGH? WHAT IS TOO MUCH?
HOW COMPUTERS READ DOCUMENTS
EXTRACTING FEATURES
FEATURE SPACES: DOCUMENTS AS VECTORS
CLUSTERING
DOMAIN CONCEPT REFINEMENT
MODELING APPROACHES
DICTIONARIES AND NORMALIZATION
COHESION AND DISTINCTNESS
SINGLE AND MULTIMEMBERSHIP TAXONOMIES
SUBCLASSING AREAS OF INTEREST
GENERATING NEW QUERIES TO FIND ADDITIONAL RELEVANT CONTENT
VALIDATION
SUMMARY
Organization
DOMAIN-SPECIFIC ONTOLOGIES AND DICTIONARIES
SIMILARITY TREES
USING SIMILARITY TREES TO INTERACT WITH DOMAIN EXPERTS
SCATTER-PLOT VISUALIZATIONS
USING SCATTER PLOTS TO FIND OVERLAPS BETWEEN NEARBY ENTITIES OF DIFFERENT TYPES
DISCOVERY THROUGH VISUALIZATION OF TYPE SPACE
Relationships
WHAT DO RELATIONSHIPS LOOK LIKE?
HOW CAN WE DETECT RELATIONSHIPS?
REGULAR EXPRESSION PATTERNS FOR EXTRACTING RELATIONSHIPS
NATURAL LANGUAGE PARSING
COMPLEX RELATIONSHIPS
EXAMPLE: P53 PHOSPHORYLATION EVENTS
PUTTING IT ALL TOGETHER
EXAMPLE: DRUG/TARGET/DISEASE RELATIONSHIP
NETWORKS
CONCLUSION
Inference
CO-OCCURRENCE TABLES
CO-OCCURRENCE NETWORKS
RELATIONSHIP SUMMARIZATION GRAPHS
HOMOGENEOUS RELATIONSHIP NETWORKS
HETEROGENEOUS RELATIONSHIP NETWORKS
NETWORK-BASED REASONING APPROACHES
GRAPH DIFFUSION
MATRIX FACTORIZATION
CONCLUSION
Taxonomies
TAXONOMY GENERATION METHODS
SNIPPETS
TEXT CLUSTERING
TIME-BASED TAXONOMIES
KEYWORD TAXONOMIES
NUMERICAL VALUE TAXONOMIES
EMPLOYING TAXONOMIES
Orthogonal Comparison
AFFINITY
COTABLE DIMENSIONS
COTABLE LAYOUT AND SORTING
FEATURE-BASED COTABLES
COTABLE APPLICATIONS
EXAMPLE: MICROBES AND THEIR PROPERTIES
ORTHOGONAL FILTERING
CONCLUSION
Visualizing the Data Plane
ENTITY SIMILARITY NETWORKS
USING COLOR TO SPOT POTENTIAL NEW HYPOTHESES
VISUALIZATION OF CENTROIDS
EXAMPLE: THREE MICROBES
CONCLUSION
Networks
PROTEIN NETWORKS
MULTIPLE SCLEROSIS AND IL7R
EXAMPLE: NEW DRUGS FOR OBESITY
CONCLUSION
Examples and Problems
PROBLEM CATALOGUE
EXAMPLE CATALOGUE
Problem: Discovery of Novel Properties of Known Entities
ANTIBIOTICS AND ANTI-INFLAMMATORIES
SOS PATHWAY FOR ESCHERICHIA COLI
CONCLUSIONS
Problem: Finding New Treatments for Orphan Diseases from Existing Drugs
IC50:IC50
Example: Target Selection Based on Protein Network Analysis
TYPE 2 DIABETES PROTEIN ANALYSIS
Example: Gene Expression Analysis for Alternative Indications
NCBI GEO DATA
CONCLUSION
Example: Side Effects
Example: Protein Viscosity Analysis Using Medline Abstracts
DISCOVERY OF ONTOLOGIES
USING ORTHOGONAL FILTERING TO DISCOVER IMPORTANT RELATIONSHIPS
Example: Finding Microbes to Clean Up Oil Spills
ENTITIES
USING COTABLES TO FIND THE RIGHT COMBINATION OF FEATURES
DISCOVERING NEW SPECIES
ORGANISM RANKING STRATEGY
CHARACTERIZING ORGANISMS
CONCLUSION
Example: Drug Repurposing
COMPOUND 1: A PDE5 INHIBITOR
PPARα/γ AGONIST
Example: Adverse Events
FENOFIBRATE
PROCESS
CONCLUSION
Example: Discovering New P53 Kinases
AN ACCELERATED DISCOVERY APPROACH BASED ON ENTITY SIMILARITY
RETROSPECTIVE STUDY
EXPERIMENTAL VALIDATION
CONCLUSION
Conclusion and Future Work
ARCHITECTURE
FUTURE WORK
ASSIGNING CONFIDENCE AND PROBABILITIES TO ENTITIES, RELATIONSHIPS, AND INFERENCES
DEALING WITH CONTRADICTORY EVIDENCE
UNDERSTANDING INTENTIONALITY
ASSIGNING VALUE TO HYPOTHESES
TOOLS AND TECHNIQUES FOR AUTOMATING THE DISCOVERY PROCESS
CROWD SOURCING DOMAIN ONTOLOGY CURATION
FINAL WORDS
References appear at the end of most chapters.