An Overview of On-going Projects

EAGER: Collaborative Research: Sequential Recommender Systems in Mobile and Pervasive Environments

Sponsored by National Science Foundation (NSF), 2012 - 2014

http://datamining.rutgers.edu/project/eager.htm

Individuals on the move, e.g., tourists on a sightseeing trip in an unfamiliar city, often find themselves overwhelmed by the challenges of coping with unfamiliar environments. This creates a need for tools and methods that guide them with useful recommendations while they are "on the move." Recent advances in mobile and sensor-based technologies have made it possible to collect and process location traces across many different mobile applications. Such data, when combined with other spatio-temporal, contextual, and user-specific information, can, in principle, be used to generate useful recommendations for individuals on the move. This exploratory research project formulates and explores a novel variant of recommender systems, namely, mobile sequential recommender systems, in which each recommendation takes into account the user's trajectory and the history of past recommendations. The recommendation problem is formulated as one of selecting a sequence of locations to recommend under a set of spatio-temporal, contextual, and privacy constraints. More details ...
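To make the idea of selecting a sequence of locations under constraints more concrete, the following is a minimal sketch and not the project's actual method. It assumes a simple greedy heuristic, straight-line travel time, a single time budget as the only spatio-temporal constraint, and a set of previously recommended locations to skip. The function name recommend_sequence, the candidate tuple layout, and all values are hypothetical. Contextual and privacy constraints are not modeled.

```python
import math

def recommend_sequence(start, candidates, history, time_budget, speed=1.0):
    """Greedy sketch (an assumption, not the project's algorithm).

    candidates maps a location name to (xy, score, visit_time).
    history is the set of locations already recommended, which are skipped.
    At each step, pick the location with the best score per unit of time
    (travel plus visit) that still fits in the remaining time budget.
    """
    position, remaining, route = start, time_budget, []
    pool = {name: info for name, info in candidates.items() if name not in history}
    while pool:
        best_name, best_ratio, best_cost = None, 0.0, 0.0
        for name, (xy, score, visit_time) in pool.items():
            # Travel time from the current position plus time spent on site.
            cost = max(math.dist(position, xy) / speed + visit_time, 1e-9)
            if cost <= remaining and score / cost > best_ratio:
                best_name, best_ratio, best_cost = name, score / cost, cost
        if best_name is None:
            break  # nothing else fits in the remaining budget
        route.append(best_name)
        remaining -= best_cost
        position = pool.pop(best_name)[0]  # continue from the chosen location
    return route

# Hypothetical toy input: name -> ((x, y), score, visit_time)
candidates = {
    "museum": ((1.0, 0.0), 8.0, 2.0),
    "park": ((0.0, 2.0), 5.0, 1.0),
    "tower": ((5.0, 5.0), 9.0, 1.0),
    "cafe": ((1.0, 1.0), 3.0, 0.5),
}
route = recommend_sequence((0.0, 0.0), candidates, history={"cafe"}, time_budget=7.0)
print(route)
```

Because each choice depends on the current position and on what has already been recommended, the output is an ordered route rather than an unordered top-k list, which is the distinction the sequential formulation draws.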


Enhancing the Capacity for Information Assurance Education Through Interdisciplinary Collaboration

Sponsored by National Science Foundation (NSF), 2012 - 2014

http://datamining.rutgers.edu/project/due.htm

This project is increasing Rutgers University's capacity to produce highly trained information assurance (IA) professionals by developing new interdisciplinary degree programs at both the graduate and undergraduate levels. A unique aspect of the effort is that it addresses the dependability of information and information services, as well as of the big data and cloud computing infrastructure, in an integrated manner. More details ...


MILAN: Multi-Modal Passive Intrusion Learning in Pervasive Wireless Environments

Sponsored by National Science Foundation (NSF), 2010 - 2013

http://datamining.rutgers.edu/project/milan.htm

The widespread deployment of wireless communication systems creates unprecedented opportunities to impact our daily lives. Regardless of whether wireless infrastructures are used just for communication or as the basis for actual responses, large-scale wireless data provide increasing opportunities for detecting environmental changes caused by moving objects. Indeed, existing wireless infrastructure and sensing data are expected to make it possible to track moving objects that do not carry radio devices and may not even be aware of being tracked. However, these wireless data are dynamic and have complex characteristics: they are multi-scale, multi-source, and multi-modal. As these data become larger and more detailed, new challenges are emerging for intrusion learning. This project aims to develop effective and scalable multi-modal passive intrusion learning techniques that can detect and track device-free moving objects in pervasive wireless environments through collaborative, adaptive learning. More details ...


Financial Fraud Detection with Data Mining Techniques

Recent years have witnessed increased interest in financial fraud detection and prevention. This is driven by the ever-worsening financial crisis and an increased awareness of the importance of financial risk management. Indeed, financial losses due to fraudulent financial statements are very significant. A number of high-profile companies, such as Enron, Lucent, Xerox, and WorldCom, were charged with fraud by the U.S. Securities and Exchange Commission (SEC). It is critical to develop an effective and efficient financial fraud detection framework in the best interest of investors, auditors, regulators, and governments.

The wide availability of fine-grained financial data, such as financial statements and stock transactions, creates unprecedented opportunities to change the computing paradigm for financial fraud detection and prevention. However, as these financial data become more detailed and multi-dimensional, it becomes ever more difficult for analysts to sift through the data, even though it may contain valuable information. Data mining holds great promise for addressing this challenge by providing efficient techniques to uncover useful information hidden in large data repositories. Along this line, in this project, we investigate the characteristics of misstating firms by exploiting their financial statements. The results of this investigation will lead to a set of financial fraud indicators, which will in turn be used for building financial fraud prediction models.


Energy-Efficient Knowledge Discovery in Location Traces

The increasing availability of large-scale location traces and car sensing data creates unprecedented opportunities to change the paradigm for knowledge discovery in transportation systems. A particularly promising area is the extraction of energy-efficient transportation patterns (green knowledge), which can be used as guidance for reducing inefficiencies in the energy consumption of the transportation sector. However, extracting green knowledge from location traces is not a trivial task. Conventional data analysis tools might not be suitable for handling the massive volume and the complex, dynamic, and distributed nature of location traces. To that end, in this project, we propose to develop an analytical foundation for extracting energy-efficient transportation patterns from location traces. Specifically, we initially focus on the following challenging tasks. First, we will profile driver behaviors according to the driving patterns identified from driving traces. Second, we will find correlations between road topologies and energy use. Third, we will identify seasonally adjusted, frequently used segments of trajectories. Finally, we will exploit data analysis techniques to identify abnormal traffic discontinuities/gaps.
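As a rough illustration of the third task, the sketch below counts how many trajectories contain each road segment and keeps the segments that meet a minimum support. It is only a sketch under assumptions: trajectories are taken to be ordered lists of road-node identifiers, a segment is a pair of consecutive nodes, and the function name frequent_segments and the min_support parameter are hypothetical. Seasonal adjustment is not modeled here.

```python
from collections import Counter

def frequent_segments(trajectories, min_support):
    """Return the segments that occur in at least min_support trajectories.

    Each trajectory is an ordered list of node IDs (an assumed encoding).
    A segment is a pair of consecutive nodes. Each segment is counted at
    most once per trajectory, so the count is the number of supporting
    trajectories rather than the number of traversals.
    """
    counts = Counter()
    for path in trajectories:
        counts.update(set(zip(path, path[1:])))
    return {segment: n for segment, n in counts.items() if n >= min_support}

# Hypothetical toy traces over nodes A..E
trajectories = [
    ["A", "B", "C", "D"],
    ["B", "C", "D"],
    ["A", "B", "C"],
    ["C", "D", "E"],
]
frequent = frequent_segments(trajectories, min_support=3)
print(frequent)
```

One way to approximate a seasonal view with this sketch would be to run the same count separately on the traces from each season and compare the resulting supports, although the project description does not say how the adjustment is actually done.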


Customer Service Support with Multi-focal Learning

In this study, we formalize a multi-focal learning problem, where training data are partitioned into several focal groups and a prediction model is learned within each focal group. The multi-focal learning problem is motivated by numerous real-world learning applications. For instance, for the same type of problem encountered in a customer service center, the problem descriptions from different customers can be quite different. Experienced customers usually give more precise and focused descriptions of the problem. In contrast, inexperienced customers usually provide more diverse descriptions. In this case, examples from the same class in the training data can naturally fall into different focal groups. As a result, it is necessary to identify those natural focal groups and exploit them for learning at different focuses. The key challenge is how to identify those focal groups in the training data. As a case study, we exploit multi-focal learning for profiling problems in customer service centers. The results show that multi-focal learning can significantly boost the accuracy of existing learning algorithms, such as Support Vector Machines (SVMs), for classifying customer problems.
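The partition-then-learn structure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's method: the focal-group rule (short descriptions are treated as "focused", long ones as "diverse") is a hypothetical stand-in for the real group-identification step, and a simple word-count scorer is used in place of an SVM so that the example needs only the standard library. All function names and example texts are made up.

```python
from collections import Counter

def tokenize(text):
    return text.lower().split()

def assign_focus(text, max_focused_tokens=6):
    # Hypothetical focal-group rule: short descriptions count as "focused".
    return "focused" if len(tokenize(text)) <= max_focused_tokens else "diverse"

def train_multi_focal(examples):
    """Build one word-count model per focal group: {group: {label: Counter}}."""
    models = {}
    for text, label in examples:
        group_model = models.setdefault(assign_focus(text), {})
        group_model.setdefault(label, Counter()).update(tokenize(text))
    return models

def predict(models, text):
    """Route the text to its focal group, then score labels within that group."""
    group_model = models.get(assign_focus(text))
    if not group_model:
        return None  # no training data was seen for this focal group
    tokens = tokenize(text)

    def score(label):
        counts = group_model[label]
        # Fraction of the label's training words that match the query tokens.
        return sum(counts[t] for t in tokens) / max(1, sum(counts.values()))

    return max(group_model, key=score)

# Hypothetical customer-service descriptions
examples = [
    ("printer offline", "printer"),
    ("printer paper jam", "printer"),
    ("password reset", "account"),
    ("account locked", "account"),
    ("hi my printer thing at home stopped working and shows some light blinking", "printer"),
    ("hello i cannot get into my account it says something about wrong login", "account"),
]
models = train_multi_focal(examples)
print(predict(models, "printer jam"))
print(predict(models, "my account shows a message about wrong login when i try it"))
```

The point of the sketch is the routing step: a query is compared only against training examples from its own focal group, so a terse expert description is not scored against the vocabulary of long, diffuse descriptions, and vice versa.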