ICDM’13 Technical Program

Sunday, December 8, 2013

Conference Opening (08:30-09:00)

Location: Houston Ballroom

ICDM: Keynote (09:00-10:00) Chair: Hui Xiong

Location: Houston Ballroom

Title: Opportunities and Challenges Facing Recommender Systems: Where Can We Go from Here?

Speaker: Alexander Tuzhilin

Abstract

The field of Recommender Systems has experienced extensive growth over the last decade in both academia and industry and has established itself as a vibrant and mature research area in data mining and other disciplines. Given the progress and accomplishments achieved by the field, it is a good time to ask the where-can-we-go-from-here question in order to work on the next generation of recommender systems that will allow us to overcome the limitations that the current generation of these systems is facing.

This talk will address this question and present some of the underexplored directions in the field of recommender systems that present promising research opportunities according to the speaker’s perspective. It will also address the challenges that researchers and practitioners may face pursuing some of these research directions.

Biography
Alexander Tuzhilin is a Professor of Information Systems, the NEC Faculty Fellow and the Chair of the Department of Information, Operations and Management Sciences at the Stern School of Business at NYU. He received his Ph.D. in Computer Science from the Courant Institute of Mathematical Sciences, NYU. His current research interests include data mining, recommender systems and personalization. Dr. Tuzhilin has published extensively on these and other topics and has served on the organizing and program committees of numerous conferences, including as a Program Co-Chair of the Third IEEE International Conference on Data Mining (ICDM), as a Conference Co-Chair of the Third ACM Conference on Recommender Systems (RecSys), and as the Chair of the Steering Committee of the ACM Conference on Recommender Systems. He has also served on the Editorial Boards of the IEEE Transactions on Knowledge and Data Engineering, the Data Mining and Knowledge Discovery Journal, the ACM Transactions on Management Information Systems, the INFORMS Journal on Computing (as an Area Editor), the Electronic Commerce Research Journal and the Journal of the Association of Information Systems. Results of Dr. Tuzhilin’s various academic and industrial activities have been featured in major media publications, including The New York Times, The Wall Street Journal, Business Week and The Financial Times.

Tea/Coffee Break (10:00-10:30)

Location: Houston/San Antonio Pre-convene, 3rd floor

Session 1A: Social Influence (Houston Ballroom A)

Session Chair: Jie Tang

Session Time: 10:30-12:30

Regular Papers:

¨ CSI: Charged System Influence Model for Human Behavior Prediction, Yuanjun Bi, Weili Wu, and Yuqing Zhu

¨ Linear Computation for Independent Social Influence, Qi Liu, Biao Xiang, Lei Zhang, Enhong Chen, and Ji Chen

¨ Massive Influence in Multiplex Social Networks: Model Representation and Analysis, Dung Nguyen, Soham Das, Thang Dinh, and My T. Thai

¨ UBLF: An Upper Bound Based Approach to Discover Influential Nodes in Social Networks, Chuan Zhou, Peng Zhang, Jing Guo, Xingquan Zhu, and Li Guo

Short Papers:

¨ Influence and Profit: Two Sides of the Coin, Yuqing Zhu, Zaixin Lu, Yuanjun Bi, Weili Wu, Yiwei Jiang, and Deying Li

¨ Influence-based Network-oblivious Community Detection, Nicola Barbieri, Francesco Bonchi, and Giuseppe Manco

¨ Validating Network Value of Influencers by means of Explanations, Glenn Bevilacqua, Shealen Clare, Amit Goyal, and Laks V. S. Lakshmanan

¨ Influence Maximization in Dynamic Social Networks, Honglei Zhuang, Yihan Sun, Jie Tang, Jialin Zhang, and Xiaoming Sun

Session 1B: Classification I (Houston Ballroom B)

Session Chair: Jing Gao

Session Time: 10:30-12:30

Regular Papers:

¨ Generative Maximum Entropy Learning for Multiclass Classification, Ambedkar Dukkipati, Gaurav Pandey, Debarghya Ghoshdastidar, Paramita Koley, and D. M. V. Satya Sriram

¨ Transfer Learning across Networks for Collective Classification, Meng Fang, Yin Jie, and Xingquan Zhu

¨ Learning Imbalanced Multi-class Data with Optimal Dichotomy Weights, Xu-Ying Liu, Zhi-Hua Zhou, and Qian-Qian Li

¨ Conformal Prediction Using Decision Trees, Ulf Johansson, Henrik Bostrom, and Tuve Lofstrom

¨ Quantification Trees, Fosca Giannotti, Letizia Milli, Anna Monreale, Dino Pedreschi, Giulio Rossetti, and Fabrizio Sebastiani

Short Papers:

¨ Multilabel Consensus Classification, Sihong Xie, Xiangnan Kong, Jing Gao, Wei Fan, and Philip S Yu

¨ Combating Sub-clusters Effect in Imbalanced Classification, Abhishek Shrivastava, and Yang Zhao

Session 1C: Applications I (Houston Ballroom C)

Session Chair: Abdullah Mueen

Session Time: 10:30-12:30

Regular Papers:

¨ Parameter-Free Audio Motif Discovery in Large Data Archives, Yuan Hao, Mohammad Shokoohi-Yekta, George Papageorgiou, and Eamonn Keogh

¨ Forecasting Spatiotemporal Impact of Traffic Incidents on Road Networks, Bei Pan, Ugur Demiryurek, Chetan Gupta, and Cyrus Shahabi

¨ GRIAS: an Entity-Relation Graph based Framework for Discovering Entity Aliases, Lili Jiang, Ping Luo, Jianyong Wang, Yuhong Xiong, Bingduan Lin, Min Wang, and Ning An

¨ Identifying Transformative Scientific Research, Yi-hung Huang, Chun-Nan Hsu, and Kristina Lerman

¨ A Parameter-free Spatio-temporal Pattern Mining Model to Catalogue Global Ocean Dynamics, James Faghmous, Matthew Le, Muhammed Uluyol, Snigdhansu Chaterjee, and Vipin Kumar

Short Papers:

¨ From Social User Activities to People Affiliation, Guangxiang Zeng, Ping Luo, Enhong Chen, and Min Wang

¨ Improved Electricity Load Forecasting via Kernel Spectral Clustering of Smart Meters, Carlos Alzate and Mathieu Sinn

Tutorial 1: Methods and Applications of Network Sampling

Speakers: Mohammad A. Hasan (IUPUI), Nesreen K. Ahmed (Purdue), and J. Neville (Purdue)

Location: San Antonio B

Tutorial Time: 10:30 – 12:30

Abstract:

Network data appears in various domains, including social, communication, and information sciences. Analysis of such data is crucial for making inferences and predictions about these networks, and moreover, for understanding the different processes that drive their evolution. However, a major bottleneck in performing such an analysis is the massive size of real-life networks, which makes modeling and analyzing these networks simply infeasible. Further, many networks, specifically those that belong to social and communication domains, are not visible to the public due to privacy concerns, and other networks, such as the Web, are only accessible via crawling. Therefore, to overcome the above challenges, researchers overwhelmingly use network sampling as a key statistical approach to select a sub-population of interest that can be studied thoroughly.

In this tutorial, we aim to cover a diverse collection of methodologies and applications of network sampling. We will begin with a discussion of the problem setting in terms of objectives (such as sampling a representative subgraph or sampling graphlets), population of interest (vertices, edges, motifs), and sampling methodologies (such as Metropolis-Hastings, random walk, and snowball sampling). We will then present a number of applications of these methods, and will outline both the resulting opportunities and the possible biases of different methods in each application.

LUNCH (12:30-13:40)

Location: ON YOUR OWN

Session 2A: Social Network Analysis (Houston Ballroom A)

Session Chair: Yizhou Sun

Session Time: 13:40-15:40

Regular Papers:

¨ Tree-like Structure in Social and Information Networks, Aaron Adcock, Blair Sullivan, and Michael Mahoney

¨ An Efficient Approach to Updating Closeness Centrality and Average Path Length in Dynamic Networks, Chia-Chen Yen, Mi-Yen Yeh, and Ming-Syan Chen

¨ Learning, Analyzing and Predicting Object Roles on Dynamic Networks, Kang Li, Suxin Guo, Nan Du, Jing Gao, and Aidong Zhang

¨ Subgraph Enumeration in Dynamic Graphs, Abhijin Adiga, Anil Vullikanti, and Dante Wiggins

Short Papers:

¨ Predicting Social Links for New Users across Aligned Heterogeneous Social Networks, Jiawei Zhang, Xiangnan Kong, and Philip S Yu

¨ Sampling Heterogeneous Social Networks, Cheng-Lun Yang, Perng-Hwa Kung, Cheng-Te Li, Chun-An Chen, and Shou-De Lin

¨ A Feature-Enhanced Ranking-Based Classifier for Multimodal Data and Heterogeneous Information Networks, Scott Deeann Chen, Ying-Yu Chen, Jiawei Han, and Pierre Moulin

¨ Community Detection in Networks with Node Attributes, Jaewon Yang, Julian McAuley, and Jure Leskovec

Session 2B: Clustering I (Houston Ballroom B)

Session Chair: Kamal Karlapalem

Session Time: 13:40-15:40

Regular Papers:

¨ Mixed Membership Subspace Clustering, Stephan Gunnemann and Christos Faloutsos

¨ Discovering Non-Redundant Overlapping Biclusters on Gene Expression Data, Duy Tin Truong, Roberto Battiti, and Mauro Brunato

¨ Spectral Subspace Clustering for Graphs with Feature Vectors, Stephan Gunnemann, Ines Farber, Sebastian Raubach, and Thomas Seidl

¨ Weighted-Object Ensemble Clustering, Yazhou Ren, Carlotta Domeniconi, Guoji Zhang, and Guoxian Yu

¨ Noise-Resistant Bicluster Recognition, Huan Sun, Gengxin Miao, and Xifeng Yan

Short Papers:

¨ Cartification: A Neighborhood Preserving Transformation for Mining High Dimensional Data, Emin Aksehirli, Bart Goethals, Emmanuel Muller, and Jilles Vreeken

¨ Clustering on Multiple Incomplete Datasets via Collective Kernel Learning, Weixiang Shao, Xiaoxiao Shi, and Philip S Yu

Session 2C: Graph and Network Mining (Houston Ballroom C)

Session Chair: Hanghang Tong

Session Time: 13:40-15:40

Regular Papers:

¨ On Pattern Preserving Graph Generation, Hong-Han Shuai, De-Nian Yang, Philip S Yu, Chih-Ya Shen, and Ming-Syan Chen

¨ Blocking Simple and Complex Contagion By Edge Removal, Christopher Kuhlman, Gaurav Tuli, Samarth Swarup, Madhav Marathe, and S.S. Ravi

¨ BIG-ALIGN: Fast Bipartite Graph Alignment, Danai Koutra, Hanghang Tong, and David Lubensky

¨ Compression-based Graph Mining Exploiting Structure Primitives, Jing Feng, Xiao He, Nina Hubig, Christian Bohm, and Claudia Plant

¨ Mining Evolving Network Processes, Misael Mongiovi, Petko Bogdanov, and Ambuj Singh

Short Papers:

¨ Structural-Context Similarities for Uncertain Graphs, Zhaonian Zou, and Jianzhong Li

¨ Graph Partitioning Change Detection Using Tree-Based Clustering, Sho-ichi Sato and Kenji Yamanishi

ICDM Contest Session (14:00 – 15:40)

Location: San Antonio B

Session Chair: Nitesh Chawla

Tea/Coffee Break (15:40-16:00)

Location: Houston/San Antonio Pre-convene, 3rd floor

Session 3A: Pattern Discovery (Houston Ballroom A)

Session Chair: Xiang Zhang

Session Time: 16:00-18:00

Regular Papers:

¨ Permutation-based Sequential Pattern Hiding, Robert Gwadera, Aris Gkoulalas-Divanis, and Grigorios Loukides

¨ Itemsets for Real-valued Datasets, Nikolaj Tatti

¨ Mining Statistically Significant Sequential Patterns, Cecile Low Kam, Chedy Raissi, Mehdi Kaytoue, and Jian Pei

¨ Dominance Programming for Itemset Mining, Benjamin Negrevergne, Tias Guns, Anton Dries, and Siegfried Nijssen

¨ Binary Time-Series Query Framework for Efficient Quantitative Trait Association Study, Hongfei Wang and Xiang Zhang

Short Papers:

¨ Efficiently Mining Top-K High Utility Sequential Patterns, Junfu Yin, Zhigang Zheng, and Longbing Cao

¨ Mining Dependent Frequent Serial Episodes from Uncertain Sequence Data, Li Wan, Ling Chen, and Chengqi Zhang

Session 3B: Models and Algorithms I (Houston Ballroom B)

Session Chair: Francois Petitjean

Session Time: 16:00-18:00

Regular Papers:

¨ Efficient Learning for Models with DAG-Structured Parameter Constraints, Wenliang Zhong and James Kwok

¨ Efficient Learning on Point Sets, Liang Xiong, Barnabas Poczos, and Jeff Schneider

¨ Divide-and-Conquer Anchoring for Near-separable Nonnegative Matrix Factorization and Completion in High Dimensions, Tianyi Zhou, Wei Bian, and Dacheng Tao

¨ Scaling Log Linear Analysis to High-dimensional Data, Francois Petitjean, Geoffrey Webb, and Ann Nicholson

Short Papers:

¨ Non-negative Multiple Tensor Factorization, Koh Takeuchi, Ryota Tomioka, Katsuhiko Ishiguro, Akisato Kimura, and Hiroshi Sawada

¨ Multimedia LEGO: Learning Structured Model by Probabilistic Logic Ontology Tree, Shiyu Chang, Guo-Jun Qi, Jinhui Tang, Qi Tian, Yong Rui, and Thomas Huang

¨ Nonlinear Causal Discovery for High Dimensional Data: A Kernelized Trace Method, Zhitang Chen, Kun Zhang, and Laiwan Chan

¨ Network Hypothesis Testing Using Mixed Kronecker Product Graph Models, Sebastian Moreno and Jennifer Neville

Session 3C: Mobile Intelligence (Houston Ballroom C)

Session Chair: Wen-Chih Peng

Session Time: 16:00-18:00

Regular Papers:

¨ Reconstructing Individual Mobility from Smart Card Transactions: A Space Alignment Approach, Nicholas Jing Yuan, Yingzi Wang, Fuzheng Zhang, Xing Xie, and Guang-Zhong Sun

¨ Mining Probabilistic Frequent Spatio-Temporal Sequential Patterns with Gap Constraints from Uncertain Databases, Yuxuan Li, James Bailey, Lars Kulik, and Jian Pei

¨ Mining Following Relationships in Movement Data, Zhenhui Li, and Fei Wu

¨ Focal-Test-Based Spatial Decision Tree Learning: A Summary of Results, Zhe Jiang, Shashi Shekhar, Xun Zhou, Joseph Knight, and Jennifer Corcoran

Short Papers:

¨ Spatio-Temporal Topic Modeling in Mobile Social Media for Location and Time Recommendation, Bo Hu, Mohsen Jamali, and Martin Ester

¨ On the Feature Discovery for App Usage Prediction in Smartphones, Zhung-Xun Liao, Shou-Chung Li, Wen-Chih Peng, Philip S Yu, and Te-Chuan Liu

¨ A Mobility Simulation Framework of Humans with Group Behavior Modeling, Anshul Gupta, Aurosish Mishra, Satya Gautam Vadlamudi, P P Chakrabarti, Sudeshna Sarkar, Tridib Mukherjee, and Nathan Gnanasambandam

¨ Hibernating Process: Modelling Mobile Calls at Multiple Scales, Siyuan Liu, Lei Li, and Ramayya Krishnan

Session 3D: Data Preprocessing (San Antonio B)

Session Chair: Petko Bogdanov

Session Time: 16:00-18:00

Regular Papers:

¨ Statistical Selection of Congruent Subspaces for Outlier Detection on Attributed Graphs, Patricia Iglesias Sanchez, Emmanuel Mueller, Fabian Laforet, Fabian Keller, and Klemens Boehm

¨ Explaining Outliers by Subspace Separability, Barbora Micenkova, Raymond T. Ng, Ira Assent, and Xuan-Hong Dang

¨ Min-Max Hash for Jaccard Similarity, Jianqiu Ji, Jianmin Li, Shuicheng Yan, Qi Tian, and Bo Zhang

¨ wRACOG: A Gibbs Sampling-Based Oversampling Technique, Barnan Das, Narayanan Chatapuram Krishnan, and Diane Cook

¨ A Masking Index for Quantifying Hidden Glitches, Laure Berti-Equille, Ji Meng Loh, and Dasu Tamraparni

Short Papers:

¨ On Anomalous Hot Spot Discovery in Graph Streams, Weiren Yu, Charu Aggarwal, Shuai Ma, and Haixun Wang

¨ Beyond Boolean Matrix Decompositions: Toward Factor Analysis and Dimensionality Reduction of Ordinal Data, Radim Belohlavek, and Marketa Krmelova

Reception & Poster Session (18:30-20:00)

Location: Houston/San Antonio Pre-convene, State Room 1 and 2, 3rd floor

Monday, December 9, 2013

ICDM: Keynote (08:45-09:45) Chair: George Karypis

Location: Houston Ballroom

Title: Predictive Healthcare Analytics under Privacy Constraints

Speaker: Joydeep Ghosh

Abstract
The move to electronic health records is producing a wealth of information, which has the potential of providing unprecedented insights into the cause, prevention, treatment and management of illnesses. Analysis of such data also promises numerous opportunities for much more effective and efficient delivery of healthcare. However, (valid) privacy concerns and restrictions prevent unfettered access to such data. In this talk I will first provide a perspective on the privacy vs. utility trade-off in the context of healthcare analytics. I will then outline two approaches that we have recently and successfully taken that provide privacy-aware predictive modeling with little degradation in model quality despite restrictions on what can be shared or analyzed. The first approach focuses on extracting predictive value from data that has been aggregated at various levels due to privacy concerns, while the second introduces a novel, non-parametric sampler that can generate "realistic but not real" data given a dataset that cannot be shared as is.

Biography
Joydeep Ghosh is currently the Schlumberger Centennial Chair Professor of Electrical and Computer Engineering at the University of Texas, Austin. He joined the UT-Austin faculty in 1988 after being educated at, (B. Tech '83) and The University of Southern California (Ph.D’88). He is the founder-director of IDEAL (Intelligent Data Exploration and Analysis Lab) and a Fellow of the IEEE. Dr. Ghosh has taught graduate courses on data mining and web analytics every year to both UT students and to industry, for over a decade. He was voted as "Best Professor" in the Software Engineering Executive Education Program at UT.

Dr. Ghosh's research interests lie primarily in data mining and web mining, predictive modeling / predictive analytics, machine learning approaches such as adaptive multi-learner systems, and their applications to a wide variety of complex real-world problems. He has published more than 300 refereed papers and 50 book chapters, and co-edited over 20 books. His research has been supported by the NSF, Yahoo!, Google, ONR, ARO, AFOSR, Intel, IBM, and several others. He has received 14 Best Paper Awards over the years, including the 2005 Best Research Paper Award across UT and the 1992 Darlington Award given by the IEEE Circuits and Systems Society for the overall Best Paper in the areas of CAS/CAD. Dr. Ghosh has been a plenary/keynote speaker on several occasions such as MICAI'12, KDIR'10, ISIT'08, ANNIE’06 and MCS 2002, and has lectured widely on intelligent analysis of large-scale data. He served as the Conference Co-Chair or Program Co-Chair for several top data mining oriented conferences, including SDM'13, SDM'12, KDD 2011, CIDM’07, ICPR'08 (Pattern Recognition Track) and SDM'06. He was the Conf. Co-Chair for Artificial Neural Networks in Engineering (ANNIE)'93 to '96 and '99 to '03 and the founding chair of the Data Mining Tech. Committee of the IEEE Computational Intelligence Society. He has also co-organized workshops on high dimensional clustering, Web Analytics, Web Mining and Parallel/Distributed Knowledge Discovery.

Tea/Coffee Break (09:45-10:05)

Location: Houston/San Antonio Pre-convene, 3rd floor

Session 4A: Business Intelligence (Houston Ballroom A)

Session Chair: Mohamed Ghalwash

Session Time: 10:05-12:05

Regular Papers:

¨ Price Information Patterns in Web Search Advertising: An Empirical Case Study on Accommodation Industry, Guanting Tang, Yupin Yang, and Jian Pei

¨ Mining User Lifecycles from Online Community Platforms and their Application to Churn Prediction, Matthew Rowe

¨ Search Behavior Based Latent Semantic User Segmentation for Advertising Targeting, Xinyu Guo, Xueqing Gong, Rong Zhang, Xiaofeng He, and Aoying Zhou

¨ A High-Dimensional Set Top Box Ad Targeting Algorithm Including Experimental Comparisons to Traditional TV Algorithms, Brendan Kitts, Dyng Au, and Brian Burdick

¨ Collective Response Spike Prediction for Mutually Interacting Consumers, Rikiya Takahashi, Hideyuki Mizuta, Naoki Abe, Ruby Kennedy, Vincent Jeffs, Ravi Shah, and Robert Crites

Short Papers:

¨ A Probabilistic Behavior Model for Discovering Unrecognized Knowledge, Takeshi Kurashima, Tomoharu Iwata, Noriko Takaya, and Hiroshi Sawada

¨ How Many Zombie Users Around You?, Hongfu Liu, Yuchao Zhang, and Junjie Wu

Session 4B: Classification II (Houston Ballroom B)

Session Chair: Zhi-Hua Zhou

Session Time: 10:05-12:05

Regular Papers:

¨ Classification of Multi-Dimensional Streaming Time Series by Weighting each Classifier's Track Record, Bing Hu, Yanping Chen, Jesin Zakaria, Liudmila Ulanova, and Eamonn Keogh

¨ Controlling Attribute Effect in Linear Regression, Toon Calders, Asim Karim, Faisal Kamiran, Wasif Ali, and Xiangliang Zhang

¨ Context-Aware MIML Instance Annotation, Forrest Briggs, Xiaoli Fern, and Raviv Raich

¨ TL-PLSA: Transfer Learning between Domains with Different Classes, Anastasia Krithara and George Paliouras

¨ Multi-Instance Multi-Graph Dual Embedding Learning, Jia Wu, and Xingquan Zhu

Short Papers:

¨ Leveraging Supervised Label Dependency Propagation for Multi-label Learning, Bin Fu, Zhihai Wang, and Guandong Xu

¨ Multiclass Semi-Supervised Boosting Using Similarity Learning, Jafar Tanha, Mohammad Javad Saberian, and Maarten Someren

Session 4C: Text Mining (Houston Ballroom C)

Session Chair: Kyuseok Shim

Session Time: 10:05-12:05

Regular Papers:

¨ A Novel Relational Learning to Rank Approach for Topic-Focused Multi-Document Summarization, Yadong Zhu, Yanyan Lan, Jiafeng Guo, Pan Du and Xueqi Cheng

¨ Constructing Topical Hierarchies in Heterogeneous Information Networks, Chi Wang, Marina Danilevsky, Jialu Liu, Nihit Desai, Heng Ji and Jiawei Han

¨ Modeling Preferences with Availability Constraints, Bing Tian Dai and Hady W. Lauw

¨ Tag-Weighted Dirichlet Allocation, Shuangyin Li, Guan Huang, Ruiyang Tan and Rong Pan

¨ Mining Summaries of Propagations, Lucrezia Macchia, Francesco Bonchi, Francesco Gullo and Luca Chiarandini

Short Papers:

¨ Discriminatively Enhanced Topic Models, Snigdha Chaturvedi, Hal Daume III and Taesun Moon

¨ External Evaluation of Topic Models: A Graph Mining Approach, Hau Chan and Leman Akoglu

Tutorial 2: Applied Matrix Analytics: Recent Advance and Case Studies

Speakers: Hanghang Tong (CUNY), Fei Wang (IBM TJ Watson), and Chris Ding (UTA)

Location: San Antonio B

Tutorial Time: 10:05 – 12:05

Abstract:

Matrices provide a natural representation for many kinds of real-world data, such as images, documents, and networks. Matrix-based algorithms have been attracting tremendous attention in the data mining research community because of their versatility, neat interpretability, and broad applicability. This tutorial will review the emerging matrix-based data mining algorithms for understanding and analyzing human behavior. We will focus on the application of those technologies in two high-impact application domains: social informatics and healthcare informatics. Our emphasis will be on how recently emerged matrix-based data mining algorithms have been advancing these application domains, and on the new challenges posed by these applications.

LUNCH (12:05-13:15)

Location: ON YOUR OWN

Excursion

[Board buses at Draft Media Sports Lounge Exit on Olive Street, Hotel North Tower Main Lobby]

Time: 13:15-18:30

Banquet & ICDM 13 Year Impact Award Address (18:30-20:30)

Location: Lone Star A

Session Chair: Diane Cook

Tuesday, December 10, 2013

ICDM: Keynote (09:00-10:00) Chair: Diane Cook

Location: Houston Ballroom

Title: Large-scale Learning in Computational Advertising

Speaker: Jianchang (JC) Mao

Abstract

Online Advertising is one of the fastest growing businesses on the Internet today. Search engines, web publishers, major ad networks, and ad exchanges are now serving billions of ad impressions per day and generating hundreds of terabytes of user event data every day. The rapid growth of online advertising has created enormous opportunities as well as technical challenges that involve Big Data. Computational Advertising attempts to mine this big data to make optimal ad-serving decisions in order to maximize a total utility function that captures publisher revenue, user experience and return on investment for advertisers. It has emerged as a new interdisciplinary field that involves information retrieval, machine learning, data mining, statistics, operations research, and micro-economics, to solve challenging problems that arise in online advertising.

In this talk, I will outline a number of major big data learning problems in various aspects of computational advertising, including user/query intent understanding, document/ad understanding, user targeting, ad selection, relevance modeling, user response prediction, keyword recommendation, forecasting, allocation, and marketplace optimization. Then, I will showcase our recent solutions to some of these problems, including query clustering for auction optimization and keyword recommendation. Query clustering for auction optimization is based on KL-divergence between two queries represented by their rank-score distributions under the Gaussian mixture assumption. We derived a variational EM algorithm for minimizing an upper bound of the total within-cluster KL-divergence. These clusters are then used for optimizing auction parameters, which yields significant improvements in marketplace KPIs. Keyword recommendation is formulated as a supervised multi-label random forest learning problem where labels (categories) are tens of millions of keywords and training data is automatically generated from click logs. Large-scale experiments conducted with 50 million webpages and 10 million keywords extracted from Bing logs showed significant gains in precision at 10 compared to previous ranking and NLP based techniques.

Biography

Jianchang (JC) Mao is Partner & Head of Advertising Relevance and Revenue Development in the Applications and Services Group at Microsoft, responsible for R&D of technologies and products that power Paid Search and Display Marketplaces. He joined Microsoft in April 2012. Previously, Mao was Vice President and Head of Advertising Sciences at Yahoo! Labs, overseeing the R&D of advertising technologies and products. He was also the Science/Engineering Director responsible for the development of back-end technologies for several Yahoo! social search products including Yahoo! Answers. Prior to joining Yahoo!, Mao was Director of Emerging Technologies and Principal Architect at Verity Inc., a leader in enterprise search, from 2000 to 2004. Prior to this, he was a research staff member at the IBM Almaden Research Center from 1994 to 2000, after receiving his PhD degree in computer science from Michigan State University in 1994. At Yahoo!, Mao was a Master Inventor awarded in 2012, received the Leadership Superstar Award (for VP and above) in 2010, and received a Superstar Team Award in 2008. During his tenure at IBM Almaden Research Center, he received an IBM Outstanding Technical Achievement Award and several Research Division Awards for outstanding contributions.

Mao’s research interests include machine learning, data mining, information retrieval, computational advertising, social networks, pattern recognition, and image processing. He has published more than 50 papers in journals, book chapters, and conferences, and holds 25 U.S. patents. Mao received an Honorable Mention Award in ACM KDD Cup 2002 (Task 1: Information Extraction from Biomedical Articles), an IEEE Transactions on Neural Networks Outstanding Paper Award in 1996 (for his 1995 paper), and an Honorable Mention Award from the International Pattern Recognition Society in 1993. He served as an associate editor of the IEEE Transactions on Neural Networks (1999-2000). Mao received the Distinguished Alumni Award from the Computer Science and Engineering Department at Michigan State University in 2011. Mao is an IEEE Fellow.

Tea/Coffee Break (10:00-10:30)

Location: Houston/San Antonio Pre-convene, 3rd floor

Session 5A: Big Data (Houston Ballroom A)

Session Chair: Feida Zhu

Session Time: 10:30-12:30

Regular Papers:

¨ Fast Pairwise Query Selection for Large-Scale Active Learning to Rank, Buyue Qian, Xiang Wang, Jun Wang, Weifeng Zhi, Hongfei Li, and Ian Davidson

¨ Efficient Visualization of Large-scale Data Tables through Reordering and Entropy Minimization, Nemanja Djuric, and Slobodan Vucetic

¨ Communication-Efficient Distributed Multiple Reference Pattern Matching for M2M Systems, Jui-Pin Wang, Yu-Chen Lu, Mi-Yen Yeh, Shou-De Lin, and Phillip Gibbons

¨ Distributed Column Subset Selection on MapReduce, Ahmed Farahat, Ahmed Elgohary, Ali Ghodsi, and Mohamed Kamel

Short Papers:

¨ MLI: An API for Distributed Machine Learning, Evan Sparks, Ameet Talwalkar, Virginia Smith, Xinghao Pan, Joseph Gonzales, Tim Kraska, Michael Jordan, and Michael Franklin

¨ Efficient Invariant Search for Distributed Information Systems, Yong Ge, and Guofei Jiang

¨ Integrity Verification of Outsourced Frequent Itemset Mining with Deterministic Guarantee, Boxiang Dong, Ruilin Liu, and Wendy Hui Wang

¨ PerturBoost: Practical Confidential Classifier Learning in the Cloud, Keke Chen, and Shumin Guo

Session 5B: Clustering II (Houston Ballroom B)

Session Chair: Emmanuel Müller

Session Time: 10:30-12:30

Regular Papers:

¨ Sparse K-Means with l_q(0<=q<=1) Penalty for High-Dimensional Data Clustering, Yu Wang, Xiangyu Chang, Rongjian Li, and Zongben Xu

¨ Active Density-based Clustering, Son T. Mai, Xiao He, Nina Hubig, Claudia Plant, and Christian Boehm

¨ Power to the Points: Validating Data Memberships in Clusterings, Parasaran Raman, and Suresh Venkatasubramanian

¨ Stochastic Blockmodel with Cluster Overlap, Relevance Selection, and Similarity-Based Smoothing, Joyce Jiyoung Whang, Piyush Rai, and Inderjit Dhillon

Short Papers:

¨ Classification-Based Clustering Evaluation, John Whissell, and Charles Clarke

¨ Co-ClusterD: A Distributed Framework for Data Co-Clustering with Sequential Updates, Sen Su, Xiang Cheng, Lixin Gao, and Jiangtao Yin

¨ Constrained Clustering: Effective Constraints Propagation with Imperfect Oracle, Xiatian Zhu, Chen Change Loy, and Shaogang Gong

¨ Most Clusters can be Retrieved with Short Disjunctive Queries, Vinay Deolalikar

Session 5C: Active/Metric Learning (Houston Ballroom C)

Session Chair: Jieping Ye

Session Time: 10:30-12:30

Regular Papers:

¨ Maximizing Expected Model Change for Active Learning in Regression, Wenbin Cai and Ya Zhang

¨ Kernel Density Metric Learning, Yujie He, Wenlin Chen, Yi Mao, and Yixin Chen

¨ Most-Surely vs. Least-Surely Uncertain, Manali Sharma and Mustafa Bilgic

¨ Active Matrix Completion, Shayok Chakraborty, Jiayu Zhou, Vineeth Balasubramanian, Sethuraman Panchanathan, Ian Davidson, and Jieping Ye

Short Papers:

¨ Active Query Driven by Uncertainty and Diversity for Incremental Multi-Label Learning, Sheng-Jun Huang and Zhi-Hua Zhou

¨ Efficient and Scalable Information Geometry Metric Learning, Wei Wang, Baogang Hu, and Zengfu Wang

¨ Online Active Learning with Imbalanced Classes, Zahra Ferdowsi, Rayid Ghani, and Raffaella Settimi

¨ Accelerating Active Learning with Transfer Learning, David Kale and Yan Liu

Tutorial 3: Social Media Mining: Fundamental Issues and Challenges

Speakers: Mohammad Ali Abbasi (ASU), Huan Liu (ASU), and Reza Zafarani (ASU)

Location: San Antonio B

Tutorial Time: 10:30 – 12:30

Abstract:

Social media generates massive amounts of user-generated content. Such data differs from classic data and poses new challenges to data mining. This tutorial presents fundamental issues of social media mining, ranging from network representation to influence/diffusion modeling, elaborates on state-of-the-art approaches to processing and analyzing social media data, and shows how to apply patterns to real-world applications, such as recommendation and behavior analytics. The tutorial is designed for researchers, students and scholars interested in studying social media and social networks. No prerequisites are required for ICDM participants to attend this tutorial.

LUNCH and ICDM 2013 Community Meeting (12:30-13:40)

Location: Houston Pre-convene/Ballroom

Session 6A: Web Mining (Houston Ballroom A)

Session Chair: Leman Akoglu

Session Time: 13:40-15:40

Regular Papers:

¨ Semantic Frame-Based Document Representation for Comparable Corpora, Hyungsul Kim, Xiang Ren, Yizhou Sun, Chi Wang, and Jiawei Han

¨ Utilizing URLs Position to Estimate Intrinsic Query-URL Relevance, Xiaogang Han, Wenjun Zhou, Xing Jiang, Hengjie Song and Toyoaki Nishida

¨ TopicSketch: Real-time Bursty Topic Detection from Twitter, Wei Xie, Feida Zhu, Jing Jiang, Ee-Peng Lim, and Ke Wang

¨ Classifying Spam Emails using Text and Readability Features, Rushdi Shams and Robert Mercer

Short Papers:

¨ Discriminative Link Prediction using Local Links, Node Features and Community Structure, Abir De, Niloy Ganguly, and Soumen Chakrabarti

¨ Progression Analysis of Community Strengths in Dynamic Networks, Nan Du, Jing Gao and Aidong Zhang

¨ A Model for Discovering Correlations of Ubiquitous Things, Lina Yao, Quan Z Sheng, Byron Gao, and Anne Ngu

¨ Bayesian Multi-task Relationship Learning with Link Structure, Yingming Li, Ming Yang, Zhongang Qi, and Zhongfei (Mark) Zhang

Session 6B: Sequence/Time Series Analysis (Houston Ballroom B)

Session Chair: Zhenhui (Jessie) Li

Session Time: 13:40-15:40

Regular Papers:

¨ Enumeration of Time Series Motifs, Abdullah Mueen

¨ Modeling Temporal Adoptions Using Dynamic Matrix Factorization, Freddy Chong Tat Chua, Richard Oentaryo, and Ee-Peng Lim

¨ Online Estimation of Discrete Densities, Michael Geilke, Andreas Karwath, Eibe Frank and Stefan Kramer

¨ Time Series Classification Using Compression Distance of Recurrence Plots, Diego Silva, Vinicius Souza, and Gustavo Batista

Short Papers:

¨ Efficient Online Sequence Prediction with Side Information, Han Xiao

¨ Efficient Proper Length Time Series Motif Discovery, Sorrachai Yingchareonthawornchai, Haemwaan Sivaraks and Chotirat Ratanamahatana

¨ Adaptive Model Tree for Streaming Data, Anca Zimmer, Michael Kurze and Thomas Seidl

¨ SAX-VSM: Interpretable Time Series Classification Using SAX and Vector Space Model, Pavel Senin and Sergey Malinchik

Session 6C: Bioinformatics and Medical Informatics (Houston Ballroom C)

Session Chair: Gaurav Pandey

Session Time: 13:40-15:40

Regular Papers:

¨ Regularization Paths for Sparse Nonnegative Least Squares Problems with Applications to Life Cycle Assessment Tree Discovery, Jingu Kim, Naren Ramakrishnan, Manish Marwah, Amip Shah, and Haesun Park

¨ A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics, Sean Whalen and Gaurav Pandey

¨ Cox Regression with Correlation Based Regularization for Electronic Health Records, Bhanukiran Vinzamuri and Chandan Reddy

¨ Extraction of Interpretable Multivariate Patterns for Early Diagnostics, Mohamed Ghalwash, Vladan Radosavljevic, and Zoran Obradovic

Short Papers:

¨ Transfer Learning Across Cancers on DNA Copy Number Variation Analysis, Huanan Zhang, Ze Tian, and Rui Kuang

¨ Exploring Patient Risk Groups with Incomplete Knowledge, Xiang Wang, Fei Wang, Jun Wang, Buyue Qian, and Jianying Hu

¨ Quantitative Prediction of Glaucomatous Visual Field Loss from Few Measurements, Zeng-Han Liang, Ryota Tomioka, Hiroshi Murata, Ryo Asaoka, and Kenji Yamanishi

¨ Statistical inference of protein “LEGO Bricks”, Arun Konagurthu, Arthur Lesk, David Abramson, Peter Stuckey, and Lloyd Allison

Session 6D: Feature Selection (San Antonio B)

Session Chair: Wei Ding

Session Time: 13:40-15:40

Regular Papers:

¨ Markov Blanket Feature Selection with Non-Faithful Data Distributions, Kui Yu, Xindong Wu, Zan Zhang, Yang Mu, Hao Wang, and Wei Ding

¨ Feature Transformation with Class Conditional Decorrelation, Xu-Yao Zhang

¨ Local and Global Discriminative Learning for Unsupervised Feature Selection, Liang Du, Zhiyong Shen, Peng Zhou, and Yi-Dong Shen

¨ An Unsupervised Algorithm for Learning Blocking Schemes, Mayank Kejriwal and Daniel Miranker

¨ The Pairwise Gaussian Random Field for High-Dimensional Data Imputation, Zhuhua Cai, Christopher Jermaine, Zografoula Vagena, Dionysios Logothetis, and Luis L. Perez

Short Papers:

¨ Group Feature Selection with Streaming Features, Haiguang Li and Xindong Wu

¨ Multitask Learning with Feature Selection for Groups of Related Tasks, Meenakshi Mishra and Jun Huan

Tea/Coffee Break (15:40-16:00)

Location: Houston/San Antonio Pre-convene, 3rd floor

Session 7A: Models and Algorithms II (Houston Ballroom C)

Session Chair: Chandan Reddy

Session Time: 16:00-17:00

Regular Papers:

¨ Efficient Algorithms for Selecting Features with Arbitrary Group Constraints via Group Lasso, Deguang Kong and Chris Ding

¨ Bayesian Discovery of Multiple Bayesian Networks via Transfer Learning, Diane Oyen and Terran Lane

Short Papers:

¨ Large Scale Elastic Net Regularized Linear Classification SVMs and Logistic Regression, Balamurugan Palaniappan

¨ Walk 'n' Merge: A Scalable Algorithm for Boolean Tensor Factorization, Dora Erdos and Pauli Miettinen

Session 7B: Applications II (Houston Ballroom B)

Session Chair: Rui Kuang

Session Time: 16:00-17:10

Regular Papers:

¨ Guiding Autonomous Agents to Better Behaviors through Human Advice, Gautam Kunapuli, Phillip Odom, Jude Shavlik, and Sriraam Natarajan

¨ Dynamic Pattern Detection with Temporal Consistency and Connectivity Constraints, Skyler Speakman, Yating Zhang, and Daniel Neill

Short Papers:

¨ On Good and Fair Paper-Reviewer Assignment, Cheng Long, Raymond Chi-Wing Wong, Yu Peng, and Liangliang Ye

¨ Coupled Heterogeneous Association Rule Mining (CHARM): Application toward Inference of Modulatory Climate Relationships, Doel L. Gonzalez II, Saurabh V. Pendse, Kanchana Padmanabhan, Michael P. Angus, Isaac K. Tetteh, Shashank Srinivas, Andrea Villanes, Fredrick Semazzi, Vipin Kumar, and Nagiza F. Samatova

¨ Prominent Features of Rumor Propagation in Online Social Media, Sejeong Kwon, Meeyoung Cha, Kyomin Jung, Wei Chen, and Yajun Wang

ICDM Panel: Data Mining with Big Data (16:00 – 17:30)

Chair: Xindong Wu

Location: Houston Ballroom A

Panelists:
Chris Clifton, Program Director, US National Science Foundation
Vipin Kumar (ACM and IEEE Fellow), University of Minnesota
Jian Pei (TKDE Editor-in-Chief), Simon Fraser University
Bhavani Thuraisingham (IEEE Fellow), University of Texas at Dallas
Geoff Webb (DMKD Editor-in-Chief), Monash University
Zhi-Hua Zhou (IEEE Fellow), Nanjing University

CLOSING SESSION (17:30-18:00)

Location: Houston Ballroom A
