Tutorial Proposal for the Visualization Conference

 

     INFORMATION VISUALIZATION, VISUAL DATA MINING,

          AND ITS APPLICATION TO DRUG DESIGN

(3 separate parts which can be split or combined as desired)

 

Level of tutorial: Beginner - Advanced

(knowledge of computer systems and elementary data mining

is recommended but not necessary)

 

Keywords:  Visualization, Visual Data Mining, Data Exploration,

Bioinformatics, Cheminformatics.



Course Organizers:   

 

Georges Grinstein

Co-director - Institute for Visualization and Perception Research

Co-director - Center for Bioinformatics and Computational Biology

Professor - Department of Computer Science

University of Massachusetts at Lowell

Lowell, MA 01854

grinstein@cs.uml.edu

<http://www.cs.uml.edu/>

 

Daniel Keim

AT&T Shannon Research Labs,

Florham Park, NJ, USA

Professor - Department of Computer Science

University of Constance, Germany

keim@research.att.com

 

Matthew Ward

Professor - Department of Computer Science

Worcester Polytechnic Institute

att@owl.WPI.EDU



Description:

 

This tutorial will provide the necessary background to understand the

issues in the development and usage of visualization integrated with

data mining and knowledge discovery systems with some focus on drug

discovery. We will provide a brief history of data visualization and

data mining, discuss the impact of integration on data models in drug

discovery, and examine samples of both commercial and academic knowledge

discovery systems that integrate visualization and data mining.  Many

slides, videotapes and demonstrations will be provided.

 

WHO SHOULD ATTEND?

 

This course is aimed at those who would like to acquire or strengthen

their background in basic visualization and to understand the

role of visual data exploration in modern data mining and knowledge

discovery with some focus on drug discovery.



Outline (assuming a full-day tutorial - outline will vary depending

on the actual time allocated):

 

PART I: Introduction to Visual Data Exploration (Beginners)

        (Introduction to Information Visualization)

 

Chapter 1 - Background

 

      *       History of Computer Graphics and Data Visualization

      *       What Visualization is and is not

      *       Exploratory Data Analysis

      *       The Great Demand for Visualization

      *       Global Computing Applications

      *       Goals of Data Visualization

      *       The Visualization Problem

      *       What are the Key DATA Factors?

      *       What are the Key Data Factors for Drug Discovery?

      *       History of Visualization with videos (1940-2000)

 

Chapter 2 - Background - Graphics and Visualization Concepts

 

      *       Visualization Taxonomy

      *       Visualization Architecture Overview

      *       The Graphics/Visualization Pipeline

      *       The Visualization Pipeline

      *       The Visualization Process

      *       Visualization Components

      *       Visualization Interactions

      *       Data Presentation, Exploration, and Visualization

 

Chapter 3 - Visualization Techniques - Data Models and Management

 

      *       Data Models and Management

      *       Categories of Data

      *       Typical Data Classes

      *       2D and 3D Scalar Data

      *       Vector Data

      *       Data Models

      *       The Visualization Pipeline

              - Interactions with a Database Management view

      *       The Visualization Pipeline

              - Interactions with a Machine Learning view

      *       Data Objects: The Role of Metadata

      *       A Conceptual Meta-Model for Metadata

      *       Visualizations and Visualization Operations (queries)

      *       Articulating Interactions

      *       Data probes

      *       Large Dataset and Database Visualizations

      *       Distributed Data Management

 

Chapter 4 - Visualization Techniques - Perception Issues

 

      *       Human Perception

      *       Why Study Perception?

      *       Human Color Vision

      *       How do we use Color?

      *       Color Scales, Curves and Lookup Tables

      *       Color Models and Examples

      *       Key Color Issues

      *       Effective Color Use

      *       Absolute Judgment of 1-D Stimuli (Miller)

      *       Absolute Judgment of Multi-D Stimuli

      *       Errors in Visual Perception - (Cleveland)

      *       Human Engineering

 

Chapter 5 - Spatial Visualization Techniques

 

      *       1D Data (1D domain and range)

              - Time Series Data: histograms, line graphs, bar graphs

              - Probe through volume data set

      *       2D Data

              - Geographical Data

              - Image Data

      *       3D Data

              - Surface Data

              - Volume Data

      *       Flow Data

      *       Hierarchical Data

 

Chapter 6 - Non-Spatial Visualization Techniques

 

      *       Taxonomy of Visualization Techniques (repeated)

      *       Geometric (positional and orientation)

              (scatterplots, parallel coordinates, RadViz, ...)

      *       Pixel (color and proximity)

              (recursive pattern, spiral, extensions, pixel bar charts)

      *       Icon/Glyph (shape and texture)

              (stick figure, color, ...)

      *       Hierarchical Techniques (Nested Dimensions)

              (treemap, dendrogram, dimensional stacking

              (TempleMVV), trellis)

      *       Graph and Network

      *       Software Visualizations

      *       WWW and Information Retrieval Visualizations



PART II: Visual Data Exploration (Advanced)

         (Integrating Visualization with Data Mining and Knowledge

          Discovery)

 

Chapter 1 - Data Mining Techniques

 

      *       Taxonomy of Data Mining Techniques

                      1.      Statistics

                      2.      Machine Learning

                      3.      Database Queries

                      4.      Visualizations

      *       What Data Mining is and is not

      *       Similarity and Distance Metrics

      *       CRISP-DM

      *       Pattern Recognition (Summarization, Clustering,

              Classification, Association, Prediction)

      *       Supervised and Unsupervised Learning

      *       Non-parametric Techniques (Decision Trees, ...)

      *       Dimensional Reduction Techniques

      *       Generating and Mining Association Rules

 

Techniques include KNN, K-Means, PCA, Neural Networks, Genetic

Algorithms, Support Vector Machines, Decision Trees, and Bayesian

Models.

 

Chapter 2 - Data Mining Techniques - Validation and Robustness

 

      *       Validation of Results

      *       Comparing Algorithms

      *       Bias and Variance

      *       Jackknife and Bootstrap

      *       Combining Classifiers

      *       Dealing with Missing Values

 

Chapter 3 - Data Mining Systems With Minimal Visualization Support

 

      *       R and S-Plus

      *       SAS

      *       CART (Salford Systems)

      *       MARS (Salford Systems)

      *       SPSS, AnswerTree (SPSS)

      *       Partek

      *       Kxen API

      *       NeuroGenetic Optimizer (BioComp Systems)

      *       DataSMARTS (Cognition Technologies)

      *       DataMining Suite (Information Discovery)

      *       Alice (Isoft S.A.)

      *       Red Brick Data Mine Option (Red Brick Systems)

      *       C5.0, Cubist (RuleQuest Research)

 

Chapter 4 - Data Mining Systems With Some Visualization Support

 

      *       MATLAB (The MathWorks)

      *       Clementine (SPSS; Integral Solutions)

      *       CrossGraphs (Belmont Research)

      *       4Thought (Cognos)

      *       SOMine (Viscovery)

 

Chapter 5 - Data Mining Systems With Fully Integrated Visualization

            Support (High Interaction)

 

      *       MineSet (SGI)

      *       Intelligent Miner (IBM)

      *       DecisionSite (Spotfire)

      *       Diamond (SPSS)

      *       TempleMVV (Mihalisin Associates)

      *       OmniViz

 

Chapter 6 - Perspectives and Conclusions

 

      *       Where is Visualization Heading?

                      *       Experimental Systems

                      *       Software Visualization

                      *       Automated Design Techniques

                              (Feiner, Mackinlay, Marks, Roth, Wilkinson)

                      *       Drug Discovery

                      *       High-Performance Computation

      *       What is Missing?

      *       Course Summary

      *       For More Information



PART III: Biomedical Applications of Visual Data Mining

(can be made more general to cover different applications)

 

Chapter 1 - General Visualization Systems

 

      *       Taxonomy of Visualization Systems

      *       AVS, Spotfire, Clementine, MineSet, Diamond, XmdvTool

      *       NetMap, Daisy

      *       MATLAB and Mathematica

      *       XGobi and others

      *       S-Plus, R, SAS

      *       Discovery Series (Visible Decisions)

      *       Statistica and others

      *       WWW Visualization Interfaces

 

Chapter 2 - Specific Drug Discovery Mining Systems

 

We will briefly discuss a number of Bioinformatics and Cheminformatics

Packages. These will include some of the following:

 

      *       GeneLinker (Molecular Mining)

      *       Rosetta Resolver (Rosetta)

      *       GeneTraffic (Iobion)

      *       GeneSpring (Silicon Genetics)

      *       Array Informatics (Packard Bioscience)

      *       GeneSight (BioDiscovery)

      *       BioMine (Gene Network Systems)

      *       Lassap (Gene-IT)

      *       I-Sight Discover (Silico Insights)

      *       LeadPharmer (Bioreason)

      *       others TBS

 

Chapter 3 - Case Studies

 

We will provide one or two detailed case studies to be determined.

      

 

COURSE ORGANIZERS

 

Georges Grinstein is a full-time Professor of Computer Science at the

University of Massachusetts Lowell, co-director of its Institute for

Visualization and Perception Research, co-director of its Center for

Bioinformatics and Computational Biology, and a co-founder of AnVil,

Inc.  He received his B.S. from the City College of N.Y. in 1967, his

M.S. from the Courant Institute of Mathematical Sciences of New York

University in 1969 and his Ph.D. in Mathematics from the University of

Rochester in 1978. 

 

Georges Grinstein is a member of IEEE, ACM, AAAI, and Eurographics, and

has served on the journal editorial boards of Computers and Graphics,

Computer Graphics Forum, and Knowledge Discovery in Databases and Data

Mining.  He served as a member of the executive board of ANSI X3H3, as

chair of X3H3.6, as a member of ISO, and as vice-chair of the IFIP WG

5.10 Computer Graphics Group.  He was co-chair of the IFIP 1989

Conference on Workstations for Experiments, program committee member for

the International Conference on Computer Graphics, India 1993, panels

co-chair for the IEEE Visualization'90 Conference, program co-chair

for Visualization'91, and conference co-chair for Visualization'92 and

Visualization'93.  He was co-chair of the IFIP 1993 International

Workshop on Perceptual Issues in Visualization, of the 1993 and 1995

Workshops on Database Issues for Data Visualization, and co-chair of

the 1997 AAAI and IEEE Workshops on the Integration of Visualization

and Data Mining.  He was co-chair for the SPIE'95, SPIE'96, and

SPIE'97 Visual Data Exploration and Analysis Conferences.  He has

participated in numerous panels, presentations, and seminars in the

area of data visualization and exploration and is co-author of the new

book on Information Visualization in Data Mining and Knowledge

Discovery.



Daniel A. Keim is working in the area of information visualization and

data mining. In the field of information visualization, he developed

several novel techniques which use visualization technology for the

purpose of exploring large databases. He has published extensively on

information visualization and data mining; he has given tutorials on

related issues at several large conferences including Visualization,

SIGMOD, VLDB, and KDD; he has been program co-chair of the IEEE

Information Visualization Symposia in 1999 and 2000; he is program

co-chair of the ACM SIGKDD conference in 2002; and he is an editor

of TVCG and the Information Visualization Journal.

 

Daniel Keim received his Ph.D. in Computer Science from the University

of Munich in 1994. He has been assistant professor at the CS department

of the University of Munich, associate professor at the CS department

of the Martin-Luther-University Halle, and full professor at the CS

department of the University of Constance. Currently, he is working

at AT&T Shannon Research Labs, Florham Park, NJ, USA.



Matthew O. Ward is currently a full professor in the Computer

Science Department at Worcester Polytechnic Institute.  Dr. Ward received

his B.S. degree in Computer Science from Worcester Polytechnic Institute

in 1977 and his M.S. and Ph.D. in Computer Science from the University of

Connecticut in 1979 and 1981, respectively.  He was employed as a Member of

the Technical Staff in the Robotics and Computer Systems Research Laboratory

at AT&T Bell Laboratories between 1980 and 1984 and as a research scientist

at Skantek Corporation until 1986, when he joined the faculty at WPI. 

 

His research interests include data and information visualization, visual data

mining, scientific data management and analysis, and knowledge-guided image

analysis.  He has authored or coauthored more than 60 publications in these

areas, and is actively involved in the development of a textbook in the area

of data visualization with Drs. Keim and Grinstein.  He is currently on the

program committee of the IEEE Information Visualization Symposium and the ACM

KDD Conference.  His research has been funded by government agencies,

including NSF and DOT, and by industry, including IBM, Sun Microsystems, and

SGI.  Dr. Ward is one of the primary architects of several public-domain

software packages for multivariate data visualization and exploration,

including XmdvTool, MAVIS, and SpiralGlyphics.