Saturday, December 7, 2013
08:30-18:00 ICDM Workshops (Center Tower, 4th floor and 37th floor)

Sunday, December 8, 2013
08:30-09:00 Conference Opening by Mark W. Spong, Bhavani Thuraisingham, Diane Cook [Houston Ballroom]
09:00-10:00 KEYNOTE [Houston Ballroom] Chair: Hui Xiong. Alexander Tuzhilin at New York University
10:00-10:30 Tea/Coffee Break [Houston/San Antonio Pre-convene, 3rd floor]
| Time | Houston Ballroom A | Houston Ballroom B | Houston Ballroom C | San Antonio B |
| 10:30-12:30 | Session 1A Social Influence, Chair: Jie Tang | Session 1B Classification I, Chair: Jing Gao | Session 1C Applications I, Chair: Abdullah Mueen | Tutorial 1 Methods and Applications of Network Sampling |
12:30-13:40 Lunch [ON YOUR OWN]
| 13:40-15:40 | Session 2A Social Network Analysis, Chair: Yizhou Sun | Session 2B Clustering I, Chair: Kamal Karlapalem | Session 2C Graph and Network Mining, Chair: Hanghang Tong | ICDM Contest, Chair: Nitesh Chawla (Note: starting at 14:00) |
15:40-16:00 Tea/Coffee Break [Houston/San Antonio Pre-convene, 3rd floor]
| 16:00-18:00 | Session 3A Pattern Discovery, Chair: Xiang Zhang | Session 3B Models and Algorithms I, Chair: Francois Petitjean | Session 3C Mobile Intelligence, Chair: Wen-Chih Peng | Session 3D Data Preprocessing, Chair: Petko Bogdanov |
18:30-20:00 Reception & Poster [Houston/San Antonio Pre-convene, State Room 1 and 2, 3rd floor]

Monday, December 9, 2013
08:45-09:45 KEYNOTE [Houston Ballroom] Chair: George Karypis. Title: Predictive Healthcare Analytics under Privacy Constraints. Joydeep Ghosh at University of Texas, Austin
09:45-10:05 Tea/Coffee Break [Houston/San Antonio Pre-convene, 3rd floor]
| Time | Houston Ballroom A | Houston Ballroom B | Houston Ballroom C | San Antonio B |
| 10:05-12:05 | Session 4A Business Intelligence, Chair: Mohamed Ghalwash | Session 4B Classification II, Chair: Zhi-Hua Zhou | Session 4C Text Mining, Chair: Kyuseok Shim | Tutorial 2 Applied Matrix Analytics: Recent Advance and Case Studies |
12:05-13:15 Lunch [ON YOUR OWN]
13:15-18:30 Excursion [Board buses at Draft Media Sports Lounge Exit on Olive Street, Hotel North Tower Main Lobby]
18:30-22:00 Banquet & ICDM 13 Year Impact Award Address [Houston Ballroom] Chair: Diane Cook

Tuesday, December 10, 2013
09:00-10:00 KEYNOTE [Houston Ballroom] Chair: Diane Cook. Title: Large-scale Learning in Computational Advertising. Jianchang (JC) Mao at Microsoft
10:00-10:30 Tea/Coffee Break [Houston/San Antonio Pre-convene, 3rd floor]
| Time | Houston Ballroom A | Houston Ballroom B | Houston Ballroom C | San Antonio B |
| 10:30-12:30 | Session 5A Big Data, Chair: Feida Zhu | Session 5B Clustering II, Chair: Emmanuel Müller | Session 5C Active/Metric Learning, Chair: Jieping Ye | Tutorial 3 Social Media Mining: Fundamental Issues and Challenges |
12:30-13:40 Lunch and ICDM 2013 Community Meeting [Houston Pre-convene/Ballroom] (everyone is welcome)
| 13:40-15:40 | Session 6A Web Mining, Chair: Leman Akoglu | Session 6B Sequence/Time Series Analysis, Chair: Zhenhui (Jessie) Li | Session 6C Bioinformatics and Medical Informatics, Chair: Gaurav Pandey | Session 6D Feature Selection, Chair: Wei Ding |
15:40-16:00 Tea/Coffee Break [Houston/San Antonio Pre-convene, 3rd floor]
16:00-17:00 Session 7A [Houston Ballroom C] Models and Algorithms II, Chair: Chandan Reddy
16:00-17:10 Session 7B [Houston Ballroom B] Applications II, Chair: Rui Kuang
16:00-17:30 ICDM Panel: Data Mining with Big Data [Houston Ballroom A] Chair: Xindong Wu
17:30-18:00 CLOSING SESSION [Houston Ballroom A]

Sunday, December 8, 2013
Conference Opening (08:30-09:00)
Location: Houston Ballroom
ICDM: Keynote (09:00-10:00) Chair: Hui Xiong
Location: Houston Ballroom
Title: Opportunities and Challenges Facing Recommender Systems: Where Can We Go from Here?
Speaker: Alexander Tuzhilin
Abstract
The field of
Recommender Systems has experienced extensive growth over the last decade, both
in academia and in industry, and has established itself as a vibrant and
mature research area in data mining and other disciplines. Given all this
progress and accomplishments achieved by the field, it is a good time to ask
the where-can-we-go-from-here question in order to work on the next generation
of recommender systems that will allow us to overcome the limitations that the
current generation of these systems is facing.
This talk will
address this question and present some of the underexplored directions in the
field of recommender systems that present promising research opportunities
according to the speaker’s perspective. It will also address the challenges
that researchers and practitioners may face pursuing some of these research
directions.
Biography
Alexander Tuzhilin is a Professor of Information
Systems, the NEC Faculty Fellow and the Chair of the Department of Information,
Operations and Management Sciences at the Stern School of Business at NYU. He
received his Ph.D. in Computer Science from the Courant Institute of
Mathematical Sciences, NYU. His current research interests include data mining,
recommender systems and personalization. Dr. Tuzhilin
has published extensively on these and other topics and has served on the
organizing and program committees of numerous conferences, including as a
Program Co-Chair of the Third IEEE International Conference on Data Mining
(ICDM), as a Conference Co-Chair of the Third ACM Conference on Recommender
Systems (RecSys), and as the Chair of the Steering
Committee of the ACM Conference on Recommender Systems. He has also served on
the Editorial Boards of the IEEE Transactions on Knowledge and Data
Engineering, the Data Mining and Knowledge Discovery Journal, the ACM
Transactions on Management Information Systems, the INFORMS Journal on
Computing (as an Area Editor), the Electronic Commerce Research Journal and the
Journal of the Association of Information Systems. Results of Dr. Tuzhilin’s various academic and industrial activities have
been featured in major media publications, including The New York Times, The
Wall Street Journal, Business Week and The Financial Times.
Tea/Coffee Break (10:00-10:30)
Location: Houston/San Antonio Pre-convene, 3rd floor
Session 1A: Social Influence (Houston Ballroom A)
Session Chair: Jie Tang
Session Time: 10:30-12:30
Regular Papers:
¨ CSI: Charged System
Influence Model for Human Behavior Prediction, Yuanjun
Bi, Weili Wu, and Yuqing
Zhu
¨ Linear Computation for
Independent Social Influence, Qi Liu, Biao Xiang, Lei Zhang, Enhong Chen, and
Ji Chen
¨ Massive Influence in
Multiplex Social Networks: Model Representation and Analysis, Dung Nguyen, Soham Das, Thang Dinh, and My T. Thai
¨ UBLF: An Upper Bound
Based Approach to Discover Influential Nodes in Social Networks, Chuan Zhou, Peng Zhang, Jing Guo,
Xingquan Zhu, and Li Guo
Short Papers:
¨ Influence and Profit: Two
Sides of the Coin, Yuqing Zhu, Zaixin
Lu, Yuanjun Bi, Weili Wu, Yiwei Jiang, and Deying Li
¨ Influence-based
Network-oblivious Community Detection, Nicola Barbieri,
Francesco Bonchi, and Giuseppe Manco
¨ Validating Network Value
of Influencers by means of Explanations, Glenn Bevilacqua,
Shealen Clare, Amit Goyal,
and Laks V. S. Lakshmanan
¨ Influence
Maximization in Dynamic Social Networks, Honglei Zhuang, Yihan Sun, Jie Tang, Jialin Zhang, and Xiaoming Sun
Session 1B: Classification I (Houston Ballroom B)
Session Chair: Jing Gao
Session Time: 10:30-12:30
Regular Papers:
¨ Generative Maximum
Entropy Learning for Multiclass Classification, Ambedkar
Dukkipati, Gaurav Pandey, Debarghya
Ghoshdastidar, Paramita Koley, and D. M. V. Satya Sriram
¨ Transfer Learning across
Networks for Collective Classification, Meng Fang,
Yin Jie, and Xingquan Zhu
¨ Learning Imbalanced
Multi-class Data with Optimal Dichotomy Weights, Xu-Ying Liu, Zhi-Hua Zhou, and Qian-Qian Li
¨ Conformal Prediction
Using Decision Trees, Ulf Johansson, Henrik Bostrom,
and Tuve Lofstrom
¨ Quantification Trees, Fosca Giannotti, Letizia Milli, Anna Monreale, Dino Pedreschi, Giulio
Rossetti, and Fabrizio Sebastiani
Short Papers:
¨ Multilabel Consensus Classification,
Sihong Xie, Xiangnan Kong, Jing Gao, Wei Fan, and Philip S Yu
¨ Combating Sub-clusters
Effect in Imbalanced Classification, Abhishek Shrivastava,
and Yang Zhao
Session 1C: Applications I (Houston Ballroom C)
Session Chair: Abdullah Mueen
Session Time: 10:30-12:30
Regular Papers:
¨ Parameter-Free Audio
Motif Discovery in Large Data Archives, Yuan Hao,
Mohammad Shokoohi-Yekta, George Papageorgiou,
and Eamonn Keogh
¨ Forecasting Spatiotemporal Impact of Traffic Incidents on Road Networks, Bei Pan, Ugur Demiryurek, Chetan Gupta, and Cyrus Shahabi
¨ GRIAS: an Entity-Relation
Graph based Framework for Discovering Entity Aliases, Lili
Jiang, Ping Luo, Jianyong Wang, Yuhong
Xiong, Bingduan Lin, Min
Wang, and Ning An
¨ Identifying
Transformative Scientific Research, Yi-hung Huang, Chun-Nan Hsu, and Kristina Lerman
¨ A Parameter-free Spatio-temporal Pattern Mining Model to Catalogue Global Ocean
Dynamics, James Faghmous, Matthew Le, Muhammed Uluyol, Snigdhansu Chatterjee, and Vipin Kumar
Short Papers:
¨ From Social User
Activities to People Affiliation, Guangxiang Zeng,
Ping Luo, Enhong Chen, and Min Wang
¨ Improved Electricity Load
Forecasting via Kernel Spectral Clustering of Smart Meters, Carlos Alzate and Mathieu Sinn
Tutorial 1: Methods and Applications of Network Sampling
Speakers: Mohammad A. Hasan (IUPUI), Nesreen K. Ahmed (Purdue), and J. Neville (Purdue)
Location: San Antonio B
Tutorial Time: 10:30-12:30
Abstract:
Network data
appears in various domains, including social, communication, and information
sciences. Analysis of such data is crucial for making inferences and
predictions about these networks, and moreover, for understanding the different
processes that drive their evolution. However, a major bottleneck in performing
such an analysis is the massive size of real-life networks, which makes
modeling and analyzing these networks simply infeasible. Further,
many networks, specifically those that belong to social and communication
domains, are not visible to the public due to privacy concerns, and other
networks, such as the Web, are only accessible via crawling. Therefore,
to overcome the above challenges, researchers use network sampling
overwhelmingly as a key statistical approach to select a sub-population of
interest that can be studied thoroughly.
In this
tutorial, we aim to cover a diverse collection of methodologies and
applications of network sampling. We will begin with a discussion of the
problem setting in terms of objectives (such as sampling a representative subgraph, sampling graphlets,
etc.), population of interest (vertices, edges, motifs), and sampling
methodologies (such as Metropolis-Hastings, random walk, and snowball
sampling). We will then present a number of applications of these methods, and
will outline both the resulting opportunities and possible biases of different
methods in each application.
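As a rough illustration of one of the sampling methodologies named in the abstract, the Python sketch below walks a small undirected graph with a Metropolis-Hastings random walk whose stationary distribution is uniform over nodes. It is not taken from the tutorial material: the function name, the toy adjacency list, and the step count are all assumptions made for this example.

```python
import random


def mh_random_walk_sample(adj, start, num_steps, seed=0):
    """Return the nodes visited by a Metropolis-Hastings random walk.

    `adj` maps each node to a list of its neighbours (undirected, connected
    graph, every node has at least one neighbour). The target distribution
    is uniform over nodes; names here are illustrative only.
    """
    rng = random.Random(seed)
    current = start
    visited = [current]
    for _ in range(num_steps):
        # Propose a neighbour chosen uniformly at random.
        candidate = rng.choice(adj[current])
        # Accept with probability min(1, deg(current) / deg(candidate)),
        # which corrects the plain random walk's bias toward high-degree nodes.
        if rng.random() < len(adj[current]) / len(adj[candidate]):
            current = candidate
        visited.append(current)
    return visited


# Hypothetical toy graph used only to exercise the function.
toy_graph = {
    "a": ["b", "c", "d"],
    "b": ["a"],
    "c": ["a", "d"],
    "d": ["a", "c"],
}
walk = mh_random_walk_sample(toy_graph, "a", 1000)
```

A plain random walk would drop the acceptance test and always move to the proposed neighbour, which over-samples high-degree nodes; that difference is one of the biases the abstract alludes to.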
LUNCH (12:30-13:40)
Location: ON YOUR OWN
Session 2A: Social Network Analysis (Houston Ballroom A)
Session Chair: Yizhou Sun
Session Time: 13:40-15:40
Regular Papers:
¨ Tree-like Structure in Social
and Information Networks, Aaron Adcock, Blair Sullivan, and Michael Mahoney
¨ An Efficient Approach to Updating
Closeness Centrality and Average Path Length in Dynamic Networks, Chia-Chen
Yen, Mi-Yen Yeh, and Ming-Syan Chen
¨ Learning, Analyzing and
Predicting Object Roles on Dynamic Networks, Kang Li, Suxin
Guo, Nan Du, Jing Gao, and Aidong
Zhang
¨ Subgraph Enumeration in Dynamic Graphs,
Abhijin Adiga, Anil Vullikanti, and Dante Wiggins
Short Papers:
¨ Predicting Social Links
for New Users across Aligned Heterogeneous Social Networks, Jiawei
Zhang, Xiangnan Kong, and Philip S Yu
¨ Sampling Heterogeneous
Social Networks, Cheng-Lun Yang, Perng-Hwa
Kung, Cheng-Te Li, Chun-An Chen, and Shou-De Lin
¨ A Feature-Enhanced
Ranking-Based Classifier for Multimodal Data and Heterogeneous Information
Networks, Scott Deeann Chen, Ying-Yu Chen, Jiawei Han, and Pierre Moulin
¨ Community Detection in
Networks with Node Attributes, Jaewon Yang, Julian McAuley, and Jure Leskovec
Session 2B: Clustering I (Houston Ballroom B)
Session Chair: Kamal Karlapalem
Session Time: 13:40-15:40
Regular Papers:
¨ Mixed Membership Subspace
Clustering, Stephan Gunnemann and Christos Faloutsos
¨ Discovering Non-Redundant
Overlapping Biclusters on Gene Expression Data, Duy Tin Truong, Roberto Battiti,
and Mauro Brunato
¨ Spectral Subspace
Clustering for Graphs with Feature Vectors, Stephan Gunnemann,
Ines Farber, Sebastian Raubach, and Thomas Seidl
¨ Weighted-Object Ensemble Clustering, Yazhou Ren, Carlotta Domeniconi, Guoji Zhang, and Guoxian Yu
¨ Noise-Resistant Bicluster Recognition, Huan Sun, Gengxin Miao, and Xifeng Yan
Short Papers:
¨ Cartification: A Neighborhood
Preserving Transformation for Mining High Dimensional Data, Emin
Aksehirli, Bart Goethals, Emmanuel Muller, and Jilles Vreeken
¨ Clustering on Multiple
Incomplete Datasets via Collective Kernel Learning, Weixiang
Shao, Xiaoxiao Shi, and Philip S Yu
Session 2C: Graph and Network Mining (Houston Ballroom C)
Session Chair: Hanghang Tong
Session Time: 13:40-15:40
Regular Papers:
¨ On Pattern Preserving
Graph Generation, Hong-Han Shuai, De-Nian Yang, Philip S Yu, Chih-Ya Shen,
and Ming-Syan Chen
¨ Blocking Simple and
Complex Contagion By Edge Removal, Christopher Kuhlman, Gaurav Tuli, Samarth Swarup, Madhav Marathe, and S.S. Ravi
¨ BIG-ALIGN: Fast Bipartite
Graph Alignment, Danai Koutra,
Hanghang Tong, and David Lubensky
¨ Compression-based Graph
Mining Exploiting Structure Primitives, Jing Feng, Xiao He, Nina Hubig, Christian Bohm, and
Claudia Plant
¨ Mining Evolving Network Processes,
Misael Mongiovi, Petko Bogdanov, and Ambuj Singh
Short Papers:
¨ Structural-Context
Similarities for Uncertain Graphs, Zhaonian Zou, and Jianzhong Li
¨ Graph Partitioning Change
Detection Using Tree-Based Clustering, Sho-ichi Sato
and Kenji Yamanishi
ICDM Contest Session (14:00-15:40)
Location: San Antonio B
Session Chair: Nitesh Chawla
Tea/Coffee Break (15:40-16:00)
Location: Houston/San Antonio Pre-convene, 3rd floor
Session 3A: Pattern Discovery (Houston Ballroom A)
Session Chair: Xiang Zhang
Session Time: 16:00-18:00
Regular Papers:
¨ Permutation-based
Sequential Pattern Hiding, Robert Gwadera, Aris Gkoulalas-Divanis, and Grigorios Loukides
¨ Itemsets for Real-valued Datasets,
Nikolaj Tatti
¨ Mining Statistically
Significant Sequential Patterns, Cecile Low Kam, Chedy Raissi, Mehdi Kaytoue, and Jian Pei
¨ Dominance Programming for
Itemset Mining, Benjamin Negrevergne,
Tias Guns, Anton Dries, and Siegfried Nijssen
¨ Binary Time-Series Query
Framework for Efficient Quantitative Trait Association Study, Hongfei Wang and Xiang Zhang
Short Papers:
¨ Efficiently Mining Top-K
High Utility Sequential Patterns, Junfu Yin, Zhigang Zheng, and Longbing Cao
¨ Mining Dependent Frequent
Serial Episodes from Uncertain Sequence Data, Li Wan, Ling Chen, and Chengqi Zhang
Session 3B: Models and Algorithms I (Houston Ballroom B)
Session Chair: Francois Petitjean
Session Time: 16:00-18:00
Regular Papers:
¨ Efficient Learning for
Models with DAG-Structured Parameter Constraints, Wenliang
Zhong and James Kwok
¨ Efficient Learning on
Point Sets, Liang Xiong, Barnabas Poczos,
and Jeff Schneider
¨ Divide-and-Conquer
Anchoring for Near-separable Nonnegative Matrix Factorization and Completion in
High Dimensions, Tianyi Zhou, Wei Bian,
and Dacheng Tao
¨ Scaling Log Linear Analysis
to High-dimensional Data, Francois Petitjean,
Geoffrey Webb, and Ann Nicholson
Short Papers:
¨ Non-negative Multiple Tensor Factorization, Koh Takeuchi, Ryota Tomioka, Katsuhiko Ishiguro, Akisato Kimura, and Hiroshi Sawada
¨ Multimedia LEGO: Learning
Structured Model by Probabilistic Logic Ontology Tree, Shiyu
Chang, Guo-Jun Qi, Jinhui
Tang, Qi Tian, Yong Rui,
and Thomas Huang
¨ Nonlinear Causal
Discovery for High Dimensional Data: A Kernelized
Trace Method, Zhitang Chen, Kun Zhang, and Laiwan Chan
¨ Network Hypothesis
Testing Using Mixed Kronecker Product Graph Models, Sebastian
Moreno and Jennifer Neville
Session 3C: Mobile Intelligence (Houston Ballroom C)
Session Chair: Wen-Chih Peng
Session Time: 16:00-18:00
Regular Papers:
¨ Reconstructing Individual
Mobility from Smart Card Transactions: A Space Alignment Approach, Nicholas
Jing Yuan, Yingzi Wang, Fuzheng
Zhang, Xing Xie, and Guang-Zhong
Sun
¨ Mining Probabilistic
Frequent Spatio-Temporal Sequential Patterns with Gap
Constraints from Uncertain Databases, Yuxuan Li,
James Bailey, Lars Kulik, and Jian Pei
¨ Mining Following
Relationships in Movement Data, Zhenhui Li, and Fei Wu
¨ Focal-Test-Based Spatial
Decision Tree Learning: A Summary of Results, Zhe
Jiang, Shashi Shekhar, Xun Zhou, Joseph Knight, and Jennifer Corcoran
Short Papers:
¨ Spatio-Temporal Topic Modeling
in Mobile Social Media for Location and Time Recommendation, Bo Hu, Mohsen Jamali, and Martin Ester
¨ On the Feature Discovery
for App Usage Prediction in Smartphones, Zhung-Xun
Liao, Shou-Chung Li, Wen-Chih
Peng, Philip S Yu, and Te-Chuan Liu
¨ A Mobility Simulation
Framework of Humans with Group Behavior Modeling, Anshul
Gupta, Aurosish Mishra, Satya
Gautam Vadlamudi, P P Chakrabarti, Sudeshna Sarkar, Tridib Mukherjee,
and Nathan Gnanasambandam
¨ Hibernating Process:
Modelling Mobile Calls at Multiple Scales, Siyuan
Liu, Lei Li, and Ramayya Krishnan
Session 3D: Data Preprocessing (San Antonio B)
Session Chair: Petko Bogdanov
Session Time: 16:00-18:00
Regular Papers:
¨ Statistical Selection of
Congruent Subspaces for Outlier Detection on Attributed Graphs, Patricia
Iglesias Sanchez, Emmanuel Mueller, Fabian Laforet, Fabian
Keller, and Klemens Boehm
¨ Explaining Outliers by Subspace
Separability, Barbora Micenkova, Raymond T. Ng, Ira Assent, and Xuan-Hong Dang
¨ Min-Max Hash for Jaccard Similarity, Jianqiu Ji, Jianmin Li, Shuicheng Yan, Qi Tian, and Bo Zhang
¨ wRACOG: A Gibbs Sampling-Based
Oversampling Technique, Barnan Das, Narayanan Chatapuram Krishnan, and Diane Cook
¨ A Masking Index for
Quantifying Hidden Glitches, Laure Berti-Equille, Ji Meng Loh,
and Dasu Tamraparni
Short Papers:
¨ On Anomalous Hot Spot
Discovery in Graph Streams, Weiren Yu, Charu Aggarwal, Shuai Ma, and Haixun Wang
¨ Beyond Boolean Matrix
Decompositions: Toward Factor Analysis and Dimensionality Reduction of Ordinal
Data, Radim Belohlavek, and
Marketa Krmelova
Reception & Poster Session (18:30-20:00)
Location: Houston/San Antonio Pre-convene, State Room 1 and 2, 3rd floor
Monday, December 9, 2013
ICDM: Keynote (08:45-09:45) Chair: George Karypis
Location: Houston Ballroom
Title: Predictive Healthcare Analytics under Privacy Constraints
Speaker: Joydeep Ghosh
Abstract
The move to electronic health records is producing a
wealth of information, which has the potential of providing unprecedented
insights into the cause, prevention, treatment and management of illnesses. Analysis of such data also promises numerous opportunities
for much more effective and efficient delivery of healthcare. However, (valid)
privacy concerns and restrictions prevent unfettered access to such data. In
this talk I will first provide a perspective on the privacy vs. utility
trade-off in the context of healthcare analytics. I will then
outline two approaches that we have recently and successfully
taken that provide privacy-aware predictive modeling with little degradation in
model quality despite restrictions on what can be shared or analyzed. The first
approach focuses on extracting predictive value from data that has been
aggregated at various levels due to privacy concerns, while the second
introduces a novel, non-parametric sampler that can generate "realistic
but not real" data given a dataset that cannot be shared as is.
Biography
Joydeep Ghosh is currently the Schlumberger
Centennial Chair Professor of Electrical and Computer Engineering at the
University of Texas, Austin. He joined the UT-Austin faculty in 1988 after being
educated at, (B. Tech '83) and The University of Southern California (Ph.D’88).
He is the founder-director of IDEAL (Intelligent Data Exploration and Analysis
Lab) and a Fellow of the IEEE. Dr. Ghosh has taught graduate courses on data
mining and web analytics every year to both UT students and to industry, for
over a decade. He was voted as "Best Professor" in the Software
Engineering Executive Education Program at UT.
Dr. Ghosh's
research interests lie primarily in data mining and web mining, predictive
modeling / predictive analytics, machine learning approaches such as adaptive
multi-learner systems, and their applications to a wide variety of complex
real-world problems. He has published more than 300 refereed papers and 50 book
chapters, and co-edited over 20 books. His research has been supported by the
NSF, Yahoo!, Google, ONR, ARO, AFOSR, Intel, IBM, and several others. He has
received 14 Best Paper Awards over the years, including the 2005 Best Research
Paper Award across UT and the 1992 Darlington Award given by the IEEE Circuits
and Systems Society for the overall Best Paper in the areas of CAS/CAD. Dr.
Ghosh has been a plenary/keynote speaker on several occasions such as MICAI'12,
KDIR'10, ISIT'08, ANNIE’06 and MCS 2002, and has widely lectured on intelligent
analysis of large-scale data. He served as the Conference Co-Chair or Program
Co-Chair for several top data mining oriented conferences, including SDM'13,
SDM'12, KDD 2011, CIDM’07, ICPR'08 (Pattern Recognition Track) and SDM'06. He
was the Conf. Co-Chair for Artificial Neural Networks in Engineering (ANNIE)'93
to '96 and '99 to '03 and the founding chair of the Data Mining Tech. Committee
of the IEEE Computational Intelligence Society. He has also co-organized
workshops on high dimensional clustering, Web Analytics, Web Mining and
Parallel/ Distributed Knowledge Discovery.
Tea/Coffee Break (09:45-10:05)
Location: Houston/San Antonio Pre-convene, 3rd floor
Session 4A: Business Intelligence (Houston Ballroom A)
Session Chair: Mohamed Ghalwash
Session Time: 10:05-12:05
Regular Papers:
¨ Price
Information Patterns in Web Search Advertising: An Empirical Case Study on
Accommodation Industry, Guanting Tang, Yupin Yang, and Jian Pei
¨ Mining User Lifecycles
from Online Community Platforms and their Application to Churn Prediction, Matthew Rowe
¨ Search Behavior Based Latent Semantic User Segmentation for Advertising Targeting, Xinyu Guo, Xueqing Gong, Rong Zhang, Xiaofeng He, and Aoying Zhou
¨ A High-Dimensional Set
Top Box Ad Targeting Algorithm Including Experimental
Comparisons to
Traditional TV Algorithms, Brendan Kitts, Dyng Au,
and Brian Burdick
¨ Collective Response Spike
Prediction for Mutually Interacting Consumers, Rikiya
Takahashi, Hideyuki Mizuta, Naoki Abe, Ruby Kennedy, Vincent Jeffs, Ravi Shah, and Robert Crites
Short Papers:
¨ A Probabilistic Behavior
Model for Discovering Unrecognized Knowledge, Takeshi Kurashima,
Tomoharu Iwata, Noriko Takaya, and Hiroshi Sawada
¨ How Many Zombie Users
Around You?, Hongfu Liu, Yuchao
Zhang, and Junjie Wu
Session 4B: Classification II (Houston Ballroom B)
Session Chair: Zhi-Hua Zhou
Session Time: 10:05-12:05
Regular Papers:
¨ Classification of
Multi-Dimensional Streaming Time Series by Weighting each Classifier's Track
Record, Bing Hu, Yanping Chen, Jesin
Zakaria, Liudmila Ulanova,
and Eamonn Keogh
¨ Controlling Attribute
Effect in Linear Regression, Toon Calders,
Asim Karim, Faisal Kamiran,
Wasif Ali, and Xiangliang
Zhang
¨ Context-Aware MIML
Instance Annotation, Forrest Briggs, Xiaoli Fern, and
Raviv Raich
¨ TL-PLSA: Transfer
Learning between Domains with Different Classes, Anastasia Krithara
and George Paliouras
¨ Multi-Instance
Multi-Graph Dual Embedding Learning, Jia Wu, and Xingquan Zhu
Short Papers:
¨ Leveraging Supervised
Label Dependency Propagation for Multi-label Learning, Bin Fu, Zhihai Wang, and Guandong Xu
¨ Multiclass
Semi-Supervised Boosting Using Similarity Learning, Jafar
Tanha, Mohammad Javad Saberian, and Maarten Someren
Session 4C: Text Mining (Houston Ballroom C)
Session Chair: Kyuseok Shim
Session Time: 10:05-12:05
Regular Papers:
¨ A Novel Relational
Learning to Rank Approach for Topic-Focused Multi-Document Summarization, Yadong Zhu, Yanyan Lan, Jiafeng Guo, Pan Du and Xueqi Cheng
¨ Constructing Topical
Hierarchies in Heterogeneous Information Networks, Chi Wang, Marina Danilevsky, Jialu Liu, Nihit Desai, Heng Ji and Jiawei Han
¨ Modeling Preferences with
Availability Constraints, Bing Tian Dai and Hady W. Lauw
¨ Tag-Weighted Dirichlet Allocation, Shuangyin
Li, Guan Huang, Ruiyang Tan and Rong
Pan
¨ Mining Summaries of
Propagations, Lucrezia Macchia,
Francesco Bonchi, Francesco Gullo
and Luca Chiarandini
Short Papers:
¨ Discriminatively Enhanced
Topic Models, Snigdha Chaturvedi,
Hal Daume III and Taesun
Moon
¨ External Evaluation of
Topic Models: A Graph Mining Approach, Hau Chan and
Leman Akoglu
Tutorial 2: Applied Matrix Analytics: Recent Advance and Case Studies
Speakers: Hanghang Tong (CUNY), Fei Wang (IBM TJ Watson), and Chris Ding (UTA)
Location: San Antonio B
Tutorial Time: 10:05-12:05
Abstract:
Matrices provide
a natural representation for many kinds of real-world data, such as images, documents,
networks, etc. Matrix-based algorithms have been attracting tremendous
attention in the data mining research community because of their versatility,
neat interpretability, and broad applicability. This tutorial will review the
emerging matrix-based data mining algorithms for understanding and analyzing
human behavior. We will focus on the application of those technologies in two
high-impact application domains: social informatics and healthcare
informatics. Our emphasis will be on how recent matrix-based data
mining algorithms have been advancing these application domains, and on the new
challenges posed by these applications.
LUNCH (12:05-13:15)
Location: ON YOUR OWN
Excursion [Board buses at Draft Media Sports Lounge Exit on Olive Street, Hotel North Tower Main Lobby]
Time: 13:15-18:30
Banquet & ICDM 13 Year Impact Award Address (18:30-20:30)
Location: Lone Star A
Session Chair: Diane Cook
Tuesday, December 10, 2013
ICDM: Keynote (09:00-10:00) Chair: Diane Cook
Location: Houston Ballroom
Title: Large-scale Learning in Computational Advertising
Speaker: Jianchang (JC) Mao
Abstract
Online
Advertising is one of the fastest growing businesses on the Internet
today. Search engines, web publishers, major ad networks, and ad
exchanges are now serving billions of ad impressions per day and generating
hundreds of terabytes of user events data every day. The rapid growth of online
advertising has created enormous opportunities as well as technical challenges
that involve Big Data. Computational Advertising attempts to mine this big data
to make optimal ad-serving decisions in order to maximize a total utility
function that captures publisher revenue, user experience and return on
investment for advertisers. It has emerged as a new interdisciplinary field
that involves information retrieval, machine learning, data mining, statistics,
operations research, and micro-economics, to solve challenging problems that
arise in online advertising.
In this talk, I
will outline a number of major big data learning problems in various aspects of
computational advertising, including user/query intent understanding,
document/ad understanding, user targeting, ad selection, relevance modeling,
user response prediction, keyword recommendation, forecasting,
allocation, and marketplace optimization. Then, I will showcase our recent
solutions to some of these problems, including query clustering for auction
optimization and keyword recommendation. Query clustering for auction
optimization is based on KL-divergence between two queries represented by their
rank-score distributions under the Gaussian mixture assumption. We derived a variational EM algorithm for minimizing an upper bound of
the total within-cluster KL-divergence. These clusters are then used for
optimizing auction parameters, which yields significant improvements in
marketplace KPIs. Keyword recommendation is formulated as a supervised
multi-label random forest learning problem where labels (categories) are tens
of millions of keywords and training data is automatically generated from click
logs. Large-scale experiments conducted with 50 million webpages and 10
million keywords extracted from Bing logs showed significant gains in precision
at 10 compared to previous ranking and NLP based techniques.
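For readers unfamiliar with the evaluation metric mentioned in the abstract, precision at 10 is commonly computed as the fraction of the ten highest-ranked recommendations that are judged relevant. The Python sketch below shows that computation under this common definition; the function name and the keyword data are hypothetical and are not taken from the talk.

```python
def precision_at_k(ranked_items, relevant_items, k=10):
    """Fraction of the top-k ranked items that appear in the relevant set."""
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    # Dividing by k (not by len(top_k)) penalises rankings shorter than k.
    return hits / k


# Hypothetical ranked keyword recommendations for one webpage.
ranked_keywords = ["kw%d" % i for i in range(12)]
# Hypothetical ground-truth keywords; "kw11" falls outside the top 10.
relevant_keywords = {"kw0", "kw3", "kw7", "kw11"}

score = precision_at_k(ranked_keywords, relevant_keywords, k=10)
```

Here three of the top ten keywords are relevant, so the score is 0.3.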
Biography
Jianchang (JC) Mao is Partner & Head of
Advertising Relevance and Revenue Development in the Applications and Services
Group at Microsoft, responsible for R&D of technologies and products that
power Paid Search and Display Marketplaces. He joined Microsoft in April 2012.
Previously, Mao was Vice President and Head of Advertising Sciences at Yahoo!
Labs, overseeing the R&D of advertising technologies and products. He was
also the Science/Engineering Director responsible for the development of
back-end technologies for several Yahoo! social search products including
Yahoo! Answers. Prior to joining Yahoo!, Mao was Director of Emerging
Technologies and Principal Architect at Verity Inc., a leader in enterprise
search, from 2000 to 2004. Prior to this, he was a research staff member at the
IBM Almaden Research Center from 1994 to 2000, after
receiving his PhD degree in computer science from Michigan State University in
1994. At Yahoo!, Mao was named a Master Inventor in 2012, received the
Leadership Superstar Award (for VP and above) in 2010, and received a Superstar
Team Award in 2008. During his tenure at IBM Almaden
Research Center, he received an IBM Outstanding Technical Achievement Award and
several Research Division Awards for outstanding contributions.
Mao’s research
interests include machine learning, data mining, information retrieval,
computational advertising, social networks, pattern recognition, and image
processing. He has published more than 50 papers in journals, book chapters,
and conferences, and holds 25 U.S. patents. Mao received an Honorable
Mention Award in ACM KDD Cup 2002 (Task 1: Information Extraction from Biomedical
Articles), an IEEE Transactions on Neural Networks Outstanding Paper Award in
1996 (for his 1995 paper), and an Honorable Mention Award from the
International Pattern Recognition Society in 1993. He served as an associate
editor of the IEEE Transactions on Neural Networks (1999-2000). Mao received
the Distinguished Alumni Award from the Computer Science and Engineering
Department at Michigan State University in 2011. Mao is an IEEE Fellow.
Tea/Coffee Break (10:00-10:30)
Location: Houston/San Antonio Pre-convene, 3rd floor
Session 5A: Big Data (Houston Ballroom A)
Session Chair: Feida Zhu
Session Time: 10:30-12:30
Regular Papers:
¨ Fast Pairwise Query Selection for Large-Scale Active Learning to Rank, Buyue Qian, Xiang Wang, Jun Wang, Weifeng Zhi, Hongfei Li, and Ian Davidson
¨ Efficient Visualization
of Large-scale Data Tables through Reordering and Entropy Minimization, Nemanja Djuric, and Slobodan Vucetic
¨ Communication-Efficient
Distributed Multiple Reference Pattern Matching for M2M Systems, Jui-Pin Wang, Yu-Chen Lu, Mi-Yen Yeh, Shou-De Lin, and Phillip
Gibbons
¨ Distributed Column Subset
Selection on MapReduce, Ahmed Farahat,
Ahmed Elgohary, Ali Ghodsi,
and Mohamed Kamel
Short Papers:
¨ MLI: An API for
Distributed Machine Learning, Evan Sparks, Ameet Talwalkar, Virginia Smith, Xinghao
Pan, Joseph Gonzales, Tim Kraska, Michael Jordan, and
Michael Franklin
¨ Efficient Invariant
Search for Distributed Information Systems, Yong Ge,
and Guofei Jiang
¨ Integrity Verification of
Outsourced Frequent Itemset Mining with Deterministic
Guarantee, Boxiang Dong, Ruilin
Liu, and Wendy Hui Wang
¨ PerturBoost: Practical Confidential
Classifier Learning in the Cloud, Keke Chen, and Shumin Guo
Session 5B: Clustering II (Houston Ballroom B)
Session Chair: Emmanuel Müller
Session Time: 10:30-12:30
Regular Papers:
¨ Sparse K-Means with l_q(0<=q<=1) Penalty for High-Dimensional Data
Clustering, Yu Wang, Xiangyu Chang, Rongjian Li, and Zongben Xu
¨ Active Density-based
Clustering, Son T. Mai, Xiao He, Nina Hubig, Claudia
Plant, and Christian Boehm
¨ Power to the Points: Validating
Data Memberships in Clusterings, Parasaran
Raman, and Suresh Venkatasubramanian
¨ Stochastic Blockmodel with Cluster Overlap, Relevance Selection, and
Similarity-Based Smoothing, Joyce Jiyoung Whang, Piyush Rai,
and Inderjit Dhillon
Short Papers:
¨ Classification-Based
Clustering Evaluation, John Whissell, and Charles
Clarke
¨ Co-ClusterD:
A Distributed Framework for Data Co-Clustering with Sequential Updates, Sen Su,
Xiang Cheng, Lixin Gao, and Jiangtao
Yin
¨ Constrained Clustering:
Effective Constraints Propagation with Imperfect Oracle, Xiatian
Zhu, Chen Change Loy, and Shaogang Gong
¨ Most Clusters can be
Retrieved with Short Disjunctive Queries, Vinay Deolalikar
Session 5C: Active/Metric Learning (Houston Ballroom C)
Session Chair: Jieping Ye
Session Time: 10:30-12:30
Regular Papers:
¨ Maximizing Expected Model
Change for Active Learning in Regression, Wenbin Cai and Ya Zhang
¨ Kernel Density Metric
Learning, Yujie He, Wenlin
Chen, Yi Mao, and Yixin Chen
¨ Most-Surely vs.
Least-Surely Uncertain, Manali Sharma and Mustafa Bilgic
¨ Active Matrix Completion,
Shayok Chakraborty, Jiayu
Zhou, Vineeth Balasubramanian,
Sethuraman Panchanathan,
Ian Davidson, and Jieping Ye
Short Papers:
¨ Active Query Driven by Uncertainty
and Diversity for Incremental Multi-Label Learning, Sheng-Jun Huang and Zhi-Hua Zhou
¨ Efficient and Scalable
Information Geometry Metric Learning, Wei Wang, Baogang
Hu, and Zengfu Wang
¨ Online Active Learning
with Imbalanced Classes, Zahra Ferdowsi, Rayid Ghani, and Raffaella Settimi
¨ Accelerating Active
Learning with Transfer Learning, David Kale and Yan Liu
Tutorial 3: Social Media Mining: Fundamental Issues and Challenges
Speakers: Mohammad Ali Abbasi (ASU), Huan Liu (ASU), and Reza Zafarani (ASU)
Location: San Antonio B
Tutorial Time: 10:30 – 12:30
Abstract:
Social media generates massive amounts of user-generated content. Such data
differs from classic data and poses new challenges to data mining. This
tutorial presents fundamental issues of social media mining, ranging from
network representation to influence/diffusion modeling, elaborates on
state-of-the-art approaches to processing and analyzing social media data, and
shows how to apply the discovered patterns to real-world applications such as
recommendation and behavior analytics. The tutorial is designed for
researchers, students, and scholars interested in studying social media and
social networks. There are no prerequisites for ICDM participants who wish to
attend.
LUNCH and ICDM 2013 Community Meeting (12:30-13:40)
Location: Houston Pre-convene/Ballroom
Session 6A: Web Mining (Houston Ballroom A)
Session Chair: Leman Akoglu
Session Time: 13:40-15:40
Regular Papers:
¨ Semantic Frame-Based
Document Representation for Comparable Corpora, Hyungsul
Kim, Xiang Ren, Yizhou Sun, Chi Wang, and Jiawei Han
¨ Utilizing URLs Position
to Estimate Intrinsic Query-URL Relevance, Xiaogang
Han, Wenjun Zhou, Xing Jiang, Hengjie
Song and Toyoaki Nishida
¨ TopicSketch: Real-time Bursty Topic Detection from Twitter, Wei Xie, Feida Zhu, Jing Jiang, Ee-Peng Lim, and Ke Wang
¨ Classifying Spam Emails
using Text and Readability Features, Rushdi Shams and
Robert Mercer
Short Papers:
¨ Discriminative Link
Prediction using Local Links, Node Features and Community Structure, Abir De, Niloy Ganguly, and Soumen Chakrabarti
¨ Progression Analysis of
Community Strengths in Dynamic Networks, Nan Du, Jing Gao and Aidong Zhang
¨ A Model for Discovering
Correlations of Ubiquitous Things, Lina Yao, Quan Z
Sheng, Byron Gao, and Anne Ngu
¨ Bayesian Multi-task
Relationship Learning with Link Structure, Yingming
Li, Ming Yang, Zhongang Qi, and Zhongfei
(Mark) Zhang
Session 6B: Sequence/Time Series Analysis (Houston Ballroom B)
Session Chair: Zhenhui (Jessie) Li
Session Time: 13:40-15:40
Regular Papers:
¨ Enumeration of Time
Series Motifs, Abdullah Mueen
¨ Modeling Temporal
Adoptions Using Dynamic Matrix Factorization, Freddy Chong Tat Chua, Richard Oentaryo, and Ee-Peng Lim
¨ Online Estimation of
Discrete Densities, Michael Geilke, Andreas Karwath, Eibe Frank and Stefan
Kramer
¨ Time Series
Classification Using Compression Distance of Recurrence Plots, Diego Silva, Vinicius Souza, and Gustavo Batista
Short Papers:
¨ Efficient Online Sequence
Prediction with Side Information, Han Xiao
¨ Efficient Proper Length
Time Series Motif Discovery, Sorrachai Yingchareonthawornchai, Haemwaan Sivaraks and Chotirat Ratanamahatana
¨ Adaptive Model Tree for
Streaming Data, Anca Zimmer, Michael Kurze and Thomas Seidl
¨ SAX-VSM: Interpretable
Time Series Classification Using SAX and Vector Space Model, Pavel Senin and Sergey Malinchik
Session 6C: Bioinformatics and Medical Informatics (Houston Ballroom C)
Session Chair: Gaurav Pandey
Session Time: 13:40-15:40
Regular Papers:
¨ Regularization Paths for
Sparse Nonnegative Least Squares Problems with Applications to Life Cycle
Assessment Tree Discovery, Jingu Kim, Naren Ramakrishnan, Manish Marwah, Amip Shah, and Haesun Park
¨ A Comparative Analysis of
Ensemble Classifiers: Case Studies in Genomics, Sean Whalen, and Gaurav Pandey
¨ Cox Regression with
Correlation Based Regularization for Electronic Health Records, Bhanukiran Vinzamuri, and Chandan Reddy
¨ Extraction of
Interpretable Multivariate Patterns for Early Diagnostics, Mohamed Ghalwash, Vladan Radosavljevic, and Zoran Obradovic
Short Papers:
¨ Transfer Learning Across
Cancers on DNA Copy Number Variation Analysis, Huanan
Zhang, Ze Tian, and Rui Kuang
¨ Exploring Patient Risk
Groups with Incomplete Knowledge, Xiang Wang, Fei
Wang, Jun Wang, Buyue Qian, and Jianying
Hu
¨ Quantitative Prediction
of Glaucomatous Visual Field Loss from Few Measurements, Zeng-Han Liang, Ryota Tomioka, Hiroshi Murata,
Ryo Asaoka, and Kenji Yamanishi
¨ Statistical inference of
protein “LEGO Bricks”, Arun Konagurthu,
Arthur Lesk, David Abramson, Peter Stuckey, and Lloyd
Allison
Session 6D: Feature Selection (San Antonio B)
Session Chair: Wei Ding
Session Time: 13:40-15:40
Regular Papers:
¨ Markov Blanket Feature
Selection with Non-Faithful Data Distributions, Kui
Yu, Xindong Wu, Zan Zhang,
Yang Mu, Hao Wang, and Wei Ding
¨ Feature Transformation
with Class Conditional Decorrelation, Xu-Yao Zhang
¨ Local and Global
Discriminative Learning for Unsupervised Feature Selection, Liang Du, Zhiyong Shen, Peng Zhou, and Yi-Dong Shen
¨ An Unsupervised Algorithm
for Learning Blocking Schemes, Mayank Kejriwal and Daniel Miranker
¨ The Pairwise Gaussian
Random Field for High-Dimensional Data Imputation, Zhuhua
Cai, Christopher Jermaine, Zografoula
Vagena, Dionysios Logothetis, and Luis L. Perez
Short Papers:
¨ Group Feature Selection
with Streaming Features, Haiguang Li and Xindong Wu
¨ Multitask Learning with
Feature Selection for Groups of Related Tasks, Meenakshi
Mishra, and Jun Huan
Tea/Coffee Break (15:40-16:00)
Location: Houston/San Antonio Pre-convene, 3rd floor
Session 7A: Models and Algorithms II (Houston Ballroom C)
Session Chair: Chandan Reddy
Session Time: 16:00-17:00
Regular Papers:
¨ Efficient Algorithms for Selecting
Features with Arbitrary Group Constraints via Group Lasso, Deguang
Kong and Chris Ding
¨ Bayesian Discovery of
Multiple Bayesian Networks via Transfer Learning, Diane Oyen
and Terran Lane
Short Papers:
¨ Large Scale Elastic Net
Regularized Linear Classification SVMs and Logistic Regression, Balamurugan Palaniappan
¨ Walk 'n' Merge: A
Scalable Algorithm for Boolean Tensor Factorization, Dora Erdos,
and Pauli Miettinen
Session 7B: Applications II (Houston Ballroom B)
Session Chair: Rui Kuang
Session Time: 16:00-17:10
Regular Papers:
¨ Guiding Autonomous Agents
to Better Behaviors through Human Advice, Gautam Kunapuli, Phillip Odom, Jude Shavlik,
and Sriraam Natarajan
¨ Dynamic Pattern Detection
with Temporal Consistency and Connectivity Constraints, Skyler Speakman, Yating Zhang, and Daniel
Neill
Short Papers:
¨ On Good and Fair
Paper-Reviewer Assignment, Cheng Long, Raymond Chi-Wing Wong, Yu Peng, and Liangliang Ye
¨ Coupled Heterogeneous
Association Rule Mining (CHARM): Application toward Inference of Modulatory
Climate Relationships, Doel L. Gonzalez II, Saurabh V. Pendse, Kanchana Padmanabhan, Michael P.
Angus, Isaac K. Tetteh, Shashank
Srinivas, Andrea Villanes,
Fredrick Semazzi, Vipin
Kumar, and Nagiza F. Samatova
¨ Prominent Features of
Rumor Propagation in Online Social Media, Sejeong Kwon,
Meeyoung Cha, Kyomin Jung,
Wei Chen, and Yajun Wang
ICDM Panel: Data Mining with Big Data (16:00 – 17:30)
Chair: Xindong Wu
Location: Houston Ballroom A
Panelists:
Chris Clifton, Program Director, US National Science Foundation
Vipin Kumar (ACM and IEEE Fellow), University of Minnesota
Jian Pei (TKDE Editor-in-Chief), Simon Fraser University
Bhavani Thuraisingham (IEEE Fellow), University of Texas at Dallas
Geoff Webb (DMKD Editor-in-Chief), Monash University
Zhi-Hua Zhou (IEEE Fellow), Nanjing University
CLOSING SESSION (17:30-18:00)
Location: Houston Ballroom A