Tutorial Proposal for the Visualization Conference
INFORMATION VISUALIZATION, VISUAL DATA MINING,
AND ITS APPLICATION TO DRUG DESIGN
(3 separate parts, which can be offered individually or combined as desired)
Level of tutorial: Beginner to Advanced
(knowledge of computer systems and elementary data mining is
recommended but not required)
Keywords: Visualization, Visual Data Mining, Data Exploration,
Bioinformatics, Cheminformatics.
Course Organizers:
Georges Grinstein
Co-director - Institute for Visualization and Perception Research
Co-director - Center for Bioinformatics and Computational Biology
Professor - Department of Computer Science
University of Massachusetts at Lowell
Lowell, MA 01854
grinstein@cs.uml.edu
Daniel Keim
AT&T Shannon Research Labs,
Florham Park, NJ, USA
Professor - Department of Computer Science
University of Constance, Germany
keim@research.att.com
Matthew Ward
Professor - Department of Computer Science
Worcester Polytechnic Institute
att@owl.WPI.EDU
Description:
This tutorial will provide the background needed to understand the
issues involved in developing and using visualization integrated with
data mining and knowledge discovery systems, with some emphasis on drug
discovery. We will provide a brief history of data visualization and
data mining, discuss the impact of integration on data models in drug
discovery, and examine sample commercial and academic knowledge
discovery systems that integrate visualization and data mining. The
material will be illustrated with many slides, videotapes, and
demonstrations.
WHO SHOULD ATTEND?
This course is aimed at those who would like to acquire or strengthen
a fundamental background in visualization and in the role of visual
data exploration in modern data mining and knowledge discovery, with
some emphasis on drug discovery.
Outline (assuming a full-day tutorial; the outline will vary depending
on the time actually allocated):
PART I: Introduction to Visual Data Exploration (Beginners)
(Introduction to Information Visualization)
Chapter 1 - Background
* History of Computer Graphics and Data Visualization
* What Visualization is and is not
* Exploratory Data Analysis
* The Great Demand for Visualization
* Global Computing Applications
* Goals of Data Visualization
* The Visualization Problem
* What are the Key Data Factors?
* What are the Key Data Factors for Drug Discovery?
* History of Visualization with Videos (1940-2000)
Chapter 2 - Background - Graphics and Visualization Concepts
* Visualization Taxonomy
* Visualization Architecture Overview
* The Graphics/Visualization Pipeline
* The Visualization Pipeline
* The Visualization Process
* Visualization Components
* Visualization Interactions
* Data Presentation, Exploration, and Visualization
Chapter 3 - Visualization Techniques - Data Models and Management
* Data Models and Management
* Categories of Data
* Typical Data Classes
* 2D and 3D Scalar Data
* Vector Data
* Data Models
* The Visualization Pipeline
- Interactions with a Database Management view
* The Visualization Pipeline
- Interactions with a Machine Learning view
* Data Objects: The Role of Metadata
* A Conceptual Meta-Model for Metadata
* Visualizations and Visualization Operations (Queries)
* Articulating Interactions
* Data Probes
* Large Dataset and Database Visualizations
* Distributed Data Management
Chapter 4 - Visualization Techniques - Perception Issues
* Human Perception
* Why Study Perception?
* Human Color Vision
* How do we use Color?
* Color Scales, Curves, and Lookup Tables
* Color Models and Examples
* Key Color Issues
* Effective Color Use
* Absolute Judgment of 1-D Stimuli (Miller)
* Absolute Judgment of Multi-D Stimuli
* Errors in Visual Perception (Cleveland)
* Human Engineering
Chapter 5 - Spatial Visualization Techniques
* 1D Data (1D domain and range)
- Time Series Data (histograms, line graphs, bar graphs)
- Probe through a Volume Data Set
* 2D Data
- Geographical Data
- Image Data
* 3D Data
- Surface Data
- Volume Data
* Flow Data
* Hierarchical Data
Chapter 6 - Non-Spatial Visualization Techniques
* Taxonomy of Visualization Techniques (repeated)
* Geometric (positional and orientation)
(scatterplots, parallel coordinates, radviz, ...)
* Pixel (color and proximity)
(recursive pattern, spiral, extensions, pixel bar charts)
* Icon/Glyph (shape and texture)
(stick figure, color, ...)
* Hierarchical Techniques (Nested Dimensions)
(treemap, dendrogram, dimensional stacking (TempleMVV), trellis)
* Graph and Network
* Software Visualizations
* WWW and Information Retrieval Visualizations
PART II: Visual Data Exploration (Advanced)
(Integrating Visualization with Data Mining and Knowledge Discovery)
Chapter 1 - Data Mining Techniques
* Taxonomy of Data Mining Techniques
1. Statistics
2. Machine Learning
3. Database Queries
4. Visualizations
* What Data Mining is and is not
* Similarity and Distance Metrics
* CRISP-DM
* Pattern Recognition (Summarization, Clustering, Classification,
Association, Prediction)
* Supervised and Unsupervised Learning
* Non-parametric Techniques (Decision Trees, ...)
* Dimensionality Reduction Techniques
* Generating and Mining Association Rules
Techniques include KNN, K-Means, PCA, Neural Networks, Genetic
Algorithms, Support Vector Machines, Decision Trees, and Bayesian Models.
Chapter 2 - Data Mining Techniques - Validation and Robustness
* Validation of Results
* Comparing Algorithms
* Bias and Variance
* Jackknife and Bootstrap
* Combining Classifiers
* Dealing with Missing Values
Chapter 3 - Data Mining Systems With Minimal Visualization Support
* R and S-Plus
* SAS
* CART (Salford Systems)
* MARS (Salford Systems)
* SPSS, AnswerTree (SPSS)
* Partek
* Kxen API
* NeuroGenetic Optimizer (BioComp Systems)
* DataSMARTS (Cognition Technologies)
* DataMining Suite (Information Discovery)
* Alice (Isoft S.A.)
* Red Brick Data Mine Option (Red Brick Systems)
* C5.0, Cubist (RuleQuest Research)
Chapter 4 - Data Mining Systems With Some Visualization Support
* MATLAB (The MathWorks)
* Clementine (SPSS; Integral Solutions)
* CrossGraphs (Belmont Research)
* 4Thought (Cognos)
* SOMine (Viscovery)
Chapter 5 - Data Mining Systems With Fully Integrated Visualization
Support (High Interaction)
* MineSet (SGI)
* Intelligent Miner (IBM)
* DecisionSite (Spotfire)
* Diamond (SPSS)
* TempleMVV (Mihalisin Associates)
* OmniViz
Chapter 6 - Perspectives and Conclusions
* Where is Visualization Heading?
* Experimental Systems
* Software Visualization
* Automated Design Techniques
(Feiner, Mackinlay, Marks, Roth, Wilkinson)
* Drug Discovery
* High-Performance Computation
* What is Missing?
* Course Summary
* For More Information
PART III: Biomedical Applications of Visual Data Mining
(can be generalized to cover other application areas)
Chapter 1 - General Visualization Systems
* Taxonomy of Visualization Systems
* AVS, Spotfire, Clementine, MineSet, Diamond, XmdvTool
* NetMap, Daisy
* MATLAB and Mathematica
* XGobi and others
* S-Plus, R, SAS
* Discovery Series (Visible Decisions)
* Statistica and others
* WWW Visualization Interfaces
Chapter 2 - Specific Drug Discovery Mining Systems
We will briefly discuss a number of bioinformatics and cheminformatics
packages, including some of the following:
* Gene Linker (Molecular Mining)
* Rosetta Resolver (Rosetta)
* GeneTraffic (Iobion)
* GeneSpring (Silicon Genetics)
* Array Informatics (Packard Bioscience)
* GeneSight (BioDiscovery)
* BioMine (Gene Network Systems)
* Lassap (Gene-IT)
* I-Sight Discover (Silico Insights)
* LeadPharmer (Bioreason)
* Others (to be selected)
Chapter 3 - Case Studies
We will present one or two detailed case studies (to be determined).
COURSE ORGANIZERS
Georges Grinstein is a full-time Professor of Computer Science at the
University of Massachusetts Lowell, co-director of its Institute for
Visualization and Perception Research, co-director of its Center for
Bioinformatics and Computational Biology, and a co-founder of AnVil,
Inc. He received his B.S. from the City College of N.Y. in 1967, his
M.S. from the Courant Institute of Mathematical Sciences of New York
University in 1969, and his Ph.D. in Mathematics from the University of
Rochester in 1978.
Georges Grinstein is a member of IEEE, ACM, AAAI, and Eurographics, and
has served on the editorial boards of the journals Computers and
Graphics, Computer Graphics Forum, and Knowledge Discovery in Databases
and Data Mining. He served as a member of the executive board of ANSI
X3H3, as chair of X3H3.6, as a member of ISO, and as vice-chair of the
IFIP WG 5.10 Computer Graphics Group. He was co-chair of the IFIP 1989
Conference on Workstations for Experiments, a member of the program
committee for the International Conference on Computer Graphics (India,
1993), panels co-chair for the IEEE Visualization'90 Conference,
program co-chair for Visualization'91, and conference co-chair for
Visualization'92 and Visualization'93. He was co-chair of the IFIP 1993
International Workshop on Perceptual Issues in Visualization, of the
1993 and 1995 Workshops on Database Issues for Data Visualization, and
of the 1997 AAAI and IEEE Workshops on the Integration of Visualization
and Data Mining. He was also co-chair of the SPIE'95, SPIE'96, and
SPIE'97 Visual Data Exploration and Analysis Conferences. He has
participated in numerous panels, presentations, and seminars in the
area of data visualization and exploration, and is a co-author of the
new book on Information Visualization in Data Mining and Knowledge
Discovery.
Daniel A. Keim works in the areas of information visualization and
data mining. In the field of information visualization, he has
developed several novel techniques that use visualization technology
to explore large databases. He has published extensively on
information visualization and data mining; he has given tutorials on
related topics at several major conferences, including Visualization,
SIGMOD, VLDB, and KDD; he was program co-chair of the IEEE Information
Visualization Symposia in 1999 and 2000; he is program co-chair of the
ACM SIGKDD conference in 2002; and he is an editor of TVCG and the
Information Visualization Journal.
Daniel Keim received his Ph.D. in Computer Science from the University
of Munich in 1994. He has been an assistant professor in the CS
department of the University of Munich, an associate professor in the
CS department of the Martin-Luther-University Halle, and a full
professor in the CS department of the University of Constance. He is
currently working at AT&T Shannon Research Labs, Florham Park, NJ, USA.
Matthew O. Ward is currently a full professor in the Computer Science
Department at Worcester Polytechnic Institute (WPI). Dr. Ward received
his B.S. degree in Computer Science from WPI in 1977 and his M.S. and
Ph.D. in Computer Science from the University of Connecticut in 1979
and 1981, respectively. He was employed as a Member of the Technical
Staff in the Robotics and Computer Systems Research Laboratory at AT&T
Bell Laboratories from 1980 to 1984, and then as a research scientist
at Skantek Corporation until 1986, when he joined the faculty at WPI.
His research interests include data and information visualization,
visual data mining, scientific data management and analysis, and
knowledge-guided image analysis. He has authored or coauthored more
than 60 publications in these areas, and is actively involved in the
development of a textbook on data visualization with Drs. Keim and
Grinstein. He currently serves on the program committees of the IEEE
Information Visualization Symposium and the ACM KDD Conference. His
research has been funded by government agencies, including NSF and
DOT, and by industry, including IBM, Sun Microsystems, and SGI.
Dr. Ward is one of the primary architects of several public-domain
software packages for multivariate data visualization and exploration,
including XmdvTool, MAVIS, and SpiralGlyphics.