An Overview of On-going Projects

IIS: Collaborative Research: Harnessing Big Data for Improving Career Mobility

Sponsored by National Science Foundation (NSF), 2020 - 2023

http://datamining.rutgers.edu/project/iis_2.htm

U.S. college students face critical challenges in career development and job mobility, both of which are vital to long-term career success, especially during a global pandemic. Students are often unsure which career path to pursue next, how to update their skills for future jobs, and which learning opportunities to take. These challenges have been increasingly observed among students across majors and socioeconomic backgrounds at many universities. This project collects and analyzes academic curriculum and student career data, discovers useful patterns in college curricula and students’ career development, studies students’ career choices, and develops solutions to improve their career mobility. The study contributes to the fields of data mining, machine learning, and education and career data analytics. Its results can offer new ways of understanding and improving college graduates’ career success, provide insights and tools that help students make career development decisions, and augment the service capability of college career and academic advising offices. The project integrates research with education through new course module development, the involvement of graduate and undergraduate students in research, and research showcases for local K-12 students. More details ...


EAGER: Collaborative Research: Substructure-aware Spatiotemporal Representation Learning

Sponsored by National Science Foundation (NSF), 2020 - 2022

http://datamining.rutgers.edu/project/iis_1.htm

Spatiotemporal networked data are an essential representation of information about critical infrastructures such as transportation networks, power grids, and social networks. The evolving vehicle mobility of a transportation network can help locate the source of traffic jams. The dynamic retweet keywords of a social network may signal a novel disease outbreak. This project will develop novel techniques that equip machines with automated and precise characterization of spatiotemporal networks. The main novelty of this project lies in its ability to preserve substructure patterns when characterizing spatiotemporal networked data. By recognizing and characterizing substructure patterns such as a subnetwork of traffic jams or a subnetwork of overloads or outages, computers can better extract semantics, forecast trends, and detect anomalies, which are important for the operation, management, and defense of critical infrastructures. In transportation operations, the developed techniques have the potential to change how civil engineers identify the behavioral factors and surrounding features that are precursors to crashes, fatalities, and accidents. For researchers in power grid management, the automated and precise characterization approaches can help inform the countermeasures for, and characteristics of, outage events such as generation loss, large load, series capacitor fault, and line trip. In public health and pandemics, substructure awareness will enable fast and early detection of novel diseases from subtle natural language patterns in social networks. More details ...


IIS: A Multi-source Data Driven Optimization Framework for Interconnected Express Delivery System Design and Inventory Rebalance

Sponsored by National Science Foundation (NSF), 2018 - 2020

http://datamining.rutgers.edu/project/IIS.htm

Inter-connected express delivery systems are in high demand for many emerging applications, such as public bike rental, electric car sharing, and fresh product delivery. Their successful deployment can greatly improve transportation, energy saving, food supply, and urban sustainability. Compared with traditional delivery systems, the inter-connected express delivery system has the following unique characteristics: (1) each station covers a small service area; (2) all stations are internally connected because they can act as inventories or suppliers to each other. There are two fundamental research challenges in developing such a system: how to decide the station locations for a given area, and how to rebalance the inventories among stations in a timely manner. Addressing these challenges is essential to making the inter-connected express delivery system more effective, efficient, and sustainable. This project aims to develop a data-driven solution to these challenges. The study will advance the field of inter-connected express delivery systems, expand the curricular content of data mining and optimization, and train undergraduate and graduate students. More details ...


Enhancing the Capacity for Information Assurance Education Through Interdisciplinary Collaboration

Sponsored by National Science Foundation (NSF), 2012 - 2014

http://datamining.rutgers.edu/project/due.htm

This project is increasing Rutgers University's capacity to produce highly trained information assurance (IA) professionals by developing new interdisciplinary degree programs at both the graduate and undergraduate levels. A unique aspect of the effort is that it addresses the dependability of information and information services, as well as the big data and cloud computing infrastructure, in an integrated manner. More details ...


MILAN: Multi-Modal Passive Intrusion Learning in Pervasive Wireless Environments

Sponsored by National Science Foundation (NSF), 2010 - 2013

http://datamining.rutgers.edu/project/milan.htm

The widespread deployment of wireless communication systems creates unprecedented opportunities to impact our daily lives. Regardless of whether wireless infrastructures are used just for communication or as the basis for actual responses, large-scale wireless data provide increasing opportunities for detecting environmental changes caused by moving objects. Indeed, the goal is to use existing wireless infrastructure and sensing data to track moving objects that do not carry radio devices and may not even be aware of being tracked. However, these wireless data are dynamic and have complex characteristics: they are multi-scale, multi-source, and multi-modal. As these data become larger and more detailed, new challenges are emerging for intrusion learning. This project aims to develop effective and scalable multi-modal passive intrusion learning techniques that can detect and track device-free moving objects in pervasive wireless environments through adaptive, collaborative learning. More details ...


Financial Fraud Detection with Data Mining Techniques

Recent years have witnessed increased interest in financial fraud detection and prevention. This is driven by the ever-worsening financial crisis and an increased awareness of the importance of financial risk management. Indeed, financial losses due to fraudulent financial statements are very significant. A number of high-profile companies, such as Enron, Lucent, Xerox, and WorldCom, were charged with fraud by the U.S. Securities and Exchange Commission (SEC). It is critical to develop an effective and efficient financial fraud detection framework in the best interest of investors, auditors, regulators, and governments.

The wide availability of fine-grained financial data, such as financial statements and stock transactions, creates unprecedented opportunities to change the computing paradigm for financial fraud detection and prevention. However, as these financial data become more detailed and multi-dimensional, it becomes ever more difficult for analysts to sift through the data, even though it may contain valuable information. Data mining holds great promise for addressing this challenge by providing efficient techniques to uncover useful information hidden in large data repositories. Along this line, in this project, we investigate the characteristics of misstating firms by exploiting their financial statements. The results of this investigation will lead to a set of financial fraud indicators, which will in turn be used to build financial fraud prediction models.


Energy-Efficient Knowledge Discovery in Location Traces

The increasing availability of large-scale location traces and car sensing data creates unprecedented opportunities to change the paradigm for knowledge discovery in transportation systems. A particularly promising area is the extraction of energy-efficient transportation patterns (green knowledge), which can be used as guidance for reducing inefficiencies in the energy consumption of the transportation sector. However, extracting green knowledge from location traces is not a trivial task. Conventional data analysis tools might not be suitable for handling the massive volume and the complex, dynamic, and distributed nature of location traces. To that end, in this project, we propose to develop an analytical foundation for extracting energy-efficient transportation patterns from location traces. Specifically, we initially focus on the following challenging tasks. First, we will profile driver behaviors according to the driving patterns identified from driving traces. Second, we will find correlations between road topologies and energy use. Third, we will identify frequently used segments of trajectories, with seasonal adjustment. Finally, we will exploit data analysis techniques to identify abnormal traffic discontinuities/gaps.


Customer Service Support with Multi-focal Learning

In this study, we formalize a multi-focal learning problem, where training data are partitioned into several different focal groups and a prediction model is learned within each focal group. The multi-focal learning problem is motivated by numerous real-world learning applications. For instance, for the same type of problem encountered in a customer service center, the problem descriptions from different customers can be quite different. Experienced customers usually give more precise and focused descriptions of the problem. In contrast, inexperienced customers usually provide more diverse descriptions. In this case, examples from the same class in the training data can naturally fall into different focal groups. As a result, it is necessary to identify those natural focal groups and exploit them for learning at different focuses. The key developmental challenge is how to identify those focal groups in the training data. As a case study, we exploit multi-focal learning for profiling problems in customer service centers. The results show that multi-focal learning can significantly boost the classification accuracy of existing learning algorithms, such as Support Vector Machines (SVMs), for classifying customer problems.
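
The general shape of multi-focal learning described above can be illustrated with a minimal sketch. It is not the project's actual method: it assumes, purely for illustration, that focal groups can be found with plain k-means on numeric feature vectors and that each group's model is a nearest-centroid classifier (standing in for an SVM). All function names, the toy data, and the labels "billing" and "network" are hypothetical.

```python
from collections import defaultdict
from math import dist


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: dist(point, centers[i]))


def find_focal_groups(X, k, iters=20):
    """Assumed grouping step: plain k-means seeded with the first k points."""
    centers = [list(x) for x in X[:k]]
    for _ in range(iters):
        buckets = defaultdict(list)
        for x in X:
            buckets[nearest(x, centers)].append(x)
        # Keep the old center when a group receives no points.
        centers = [mean(buckets[i]) if buckets[i] else centers[i]
                   for i in range(k)]
    return centers


def fit_multifocal(X, y, k):
    """Learn one nearest-centroid classifier inside each focal group."""
    centers = find_focal_groups(X, k)
    per_group = defaultdict(lambda: defaultdict(list))
    for x, label in zip(X, y):
        per_group[nearest(x, centers)][label].append(x)
    models = {g: {label: mean(pts) for label, pts in by_label.items()}
              for g, by_label in per_group.items()}
    return centers, models


def predict_multifocal(x, centers, models):
    """Route x to the closest non-empty focal group, then classify there."""
    g = min(models, key=lambda i: dist(x, centers[i]))
    class_centroids = models[g]
    return min(class_centroids, key=lambda c: dist(x, class_centroids[c]))


# Toy data: two focal groups (near the origin and near (11, 11)),
# each containing examples of both hypothetical problem classes.
X = [[0.0, 0.0], [10.0, 10.0], [0.2, 0.1], [1.0, 1.0],
     [1.1, 0.9], [10.5, 9.0], [12.0, 12.0], [11.5, 13.0]]
y = ["billing", "network", "billing", "network",
     "network", "network", "billing", "billing"]

centers, models = fit_multifocal(X, y, k=2)
print(predict_multifocal([0.1, 0.0], centers, models))
print(predict_multifocal([12.0, 12.5], centers, models))
```

In this toy setup the same class sits in different regions depending on the focal group, so a single global classifier would blur the two; routing each example to its own group's model keeps them apart. In practice the grouping step and the per-group learner would be replaced by whatever the application calls for.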