Spatial Data Management

Overview

Professor Shahabi and the InfoLAB conduct pioneering research in areas related to data management (including query processing and analysis), data integration, data mining and machine learning, geospatial data management, and large-scale rendering and visualization.

Selected Research in Transportation

Traffic Forecasting

A spatiotemporal network is a spatial network (e.g., a road network) along with a corresponding time-dependent weight (e.g., travel time) for each edge of the network. The design and analysis of policies and plans on spatiotemporal networks require realistic models that accurately represent the temporal behavior of such networks. We built a traffic modeling framework for road networks that enables:

  1. generating an accurate temporal model from archived temporal data collected from a spatiotemporal network (so as to be able to publish the temporal model of the spatiotemporal network without having to release the real data)
  2. augmenting any given spatial network model with a corresponding realistic temporal model custom-built for that specific spatial network (in order to be able to generate a spatiotemporal network model from a solely spatial network model).

We used the proposed framework to generate the temporal model of the Los Angeles County freeway network and publish it for public use.
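
As a minimal illustration of the definition above (not the framework itself), a spatiotemporal network can be sketched as a graph whose edges carry a piecewise-constant travel-time profile over the day. The class name, the minute-of-day encoding, and the example profiles are assumptions made for this sketch.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class TimeDependentEdge:
    # Hypothetical profile: (start_minute_of_day, travel_time_minutes) pairs,
    # sorted by start minute, with the first pair starting at minute 0.
    profile: list

    def travel_time(self, minute_of_day):
        """Return the travel time in effect at the given minute of the day."""
        starts = [start for start, _ in self.profile]
        idx = bisect_right(starts, minute_of_day % 1440) - 1
        return self.profile[idx][1]


def path_travel_time(network, path, depart_minute):
    """Total travel time along `path`; each edge is entered at its arrival time."""
    t = depart_minute
    for u, v in zip(path, path[1:]):
        t += network[(u, v)].travel_time(t)
    return t - depart_minute


# Illustrative data only: both edges are slower between 7:00 and 10:00.
network = {
    ("A", "B"): TimeDependentEdge([(0, 10), (420, 25), (600, 12)]),
    ("B", "C"): TimeDependentEdge([(0, 5), (420, 15), (600, 6)]),
}
print(path_travel_time(network, ["A", "B", "C"], depart_minute=410))
```

Because the second edge is entered at its arrival time rather than at the departure time, a trip that starts just before the rush hour still pays the rush-hour cost on later edges.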

We have built technologies for solving traffic prediction problems using the traffic sensor datasets from the IMSC TransDec platform. Because the road network in Los Angeles is thoroughly instrumented with sensors, and because auxiliary commodity sensors from which traffic information can be derived (e.g., CCTV cameras and GPS devices) are widely available, a large volume of real-time and historical traffic data at very high spatial and temporal resolutions has become available. Mining valuable information from these data is therefore an important problem. We have piloted studies of traffic prediction for individual road segments using such large datasets. We utilized the spatiotemporal behaviors of rush hours and events to accurately predict both short-term and long-term average speed on road segments, even in the presence of infrequent events (e.g., accidents). By utilizing both the topology of the road network and the sensor dataset, we overcame the sparsity of our sensor dataset and extended the prediction task to the entire road network. We also addressed problems related to the impact of traffic incidents by developing a set of methods to predict how that impact evolves over time.

We then studied the online traffic prediction problem. One key challenge in traffic prediction is how much to rely on prediction models constructed from historical data when the real-time traffic situation may differ from the historical data and can change over time. To overcome this challenge, we proposed a novel online framework that learns from the current traffic situation (context) in real time and predicts future traffic by matching the current situation to the most effective prediction model trained on historical data. As real-time traffic data arrive, the traffic context space is adaptively partitioned to efficiently estimate the effectiveness of each base predictor in different situations.
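
The idea of matching the current context to the most effective base predictor can be sketched as follows. This is a simplified illustration, not the published framework: it uses a fixed uniform grid over the context space instead of an adaptive partition, and the class name, the `cell_width` parameter, and the constant base predictors are all assumptions.

```python
from collections import defaultdict


class ContextualSelector:
    """Tracks, per context cell, the mean absolute error of each base predictor."""

    def __init__(self, predictors, cell_width=10.0):
        self.predictors = predictors      # name -> callable(context) -> prediction
        self.cell_width = cell_width      # assumption: fixed, not adaptive, cells
        self.err_sum = defaultdict(float)
        self.count = defaultdict(int)

    def _cell(self, context):
        # Map a tuple of numeric context features to a grid cell.
        return tuple(int(x // self.cell_width) for x in context)

    def predict(self, context):
        """Use the predictor with the lowest observed error in this cell."""
        cell = self._cell(context)

        def mean_err(name):
            n = self.count[(cell, name)]
            # Predictors with no observations in this cell are treated optimistically.
            return self.err_sum[(cell, name)] / n if n else 0.0

        best = min(self.predictors, key=mean_err)
        return best, self.predictors[best](context)

    def update(self, context, actual):
        """Once the true value is observed, score every base predictor."""
        cell = self._cell(context)
        for name, predictor in self.predictors.items():
            self.err_sum[(cell, name)] += abs(predictor(context) - actual)
            self.count[(cell, name)] += 1


# Hypothetical base predictors standing in for models trained on historical data.
selector = ContextualSelector({
    "free_flow": lambda ctx: 65.0,
    "rush_hour": lambda ctx: 25.0,
})
selector.update((22.0, 8.0), actual=27.0)   # a congested context
print(selector.predict((24.0, 9.0)))
```

An adaptive partition would refine cells that receive many observations, so that the error estimates become more specific where data are plentiful.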

Traffic Forecasting Selected Publications

  • Yu, R., Li, Y., Shahabi, C., Demiryurek, U., & Liu, Y. (2017). Deep learning: A generic approach for extreme condition traffic forecasting. In Proceedings of the 2017 SIAM International Conference on Data Mining (pp. 777-785). Society for Industrial and Applied Mathematics. (Best paper Runner-up)
  • Deng, D., Shahabi, C., Demiryurek, U., Zhu, L. (2017). Situation Aware Multi-Task Learning for Traffic Prediction. In Proceedings of the 2017 IEEE International Conference on Data Mining series (ICDM).
  • Deng, D., Shahabi, C., Demiryurek, U., Zhu, L., Yu, R., & Liu, Y. (2016, August). Latent Space Model for Road Networks to Predict Time-Varying Traffic. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1525-1534).

Path Planning

In path planning, the turn-by-turn directions provided by existing navigation applications are derived exclusively from the underlying road network topology. The directions are therefore little more than a metric translation of the physical world (e.g., distance or time to the next turn) into spoken language. Such a translation, which ignores human cognition of geographic space, is often verbose and redundant for drivers who know the area. We built a Personalized RoutE Guidance System, dubbed PaRE, whose goal is to generate more customized and intuitive directions based on user-generated content. PaRE uses a wealth of user-generated historical trajectory data to extract “landmarks” (e.g., points of interest or intersections) and the frequently visited routes between them from the road network. The extracted information is used to produce cognitive, customized directions for each user. We formalized this task as the problem of finding the optimal partition of a given route that maximizes familiarity while minimizing the number of segments in the partition, and we developed two efficient algorithms to solve it. We applied our solution to both real and synthetic trajectory datasets to evaluate the performance and effectiveness of PaRE.
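
To make the partition problem concrete, the sketch below solves one possible formulation with dynamic programming. It is not one of the two PaRE algorithms: the combined objective (total familiarity minus a fixed penalty per segment), the `familiarity` callback, and the example data are assumptions chosen for illustration.

```python
def partition_route(route, familiarity, segment_penalty=1.0):
    """Split `route` (a list of nodes) into consecutive segments.

    Maximizes sum(familiarity(segment)) - segment_penalty * number_of_segments,
    an assumed way of trading familiarity against the number of segments.
    Consecutive segments share their boundary node.
    """
    n = len(route)
    best = [float("-inf")] * n   # best[j]: best score for the prefix ending at node j
    back = [0] * n               # back[j]: start index of the last segment
    best[0] = 0.0
    for j in range(1, n):
        for i in range(j):
            score = best[i] + familiarity(tuple(route[i:j + 1])) - segment_penalty
            if score > best[j]:
                best[j], back[j] = score, i
    # Walk the back-pointers to recover the segments in order.
    segments, j = [], n - 1
    while j > 0:
        i = back[j]
        segments.append(route[i:j + 1])
        j = i
    return segments[::-1]


# Hypothetical familiarity scores mined from a user's past trajectories.
known_routes = {("home", "a", "b", "office"): 3.0}
route = ["home", "a", "b", "office", "c", "d"]
print(partition_route(route, lambda seg: known_routes.get(seg, 0.0)))
```

With these scores, the familiar stretch from home to the office is kept as a single segment, which a guidance system could announce as one instruction instead of several turn-by-turn steps.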

Path Planning Selected Publications

  • Li, Y., Su, H., Demiryurek, U., Zheng, B., He, T., & Shahabi, C. (2017, April). PaRE: A System for Personalized Route Guidance. In Proceedings of the 26th International Conference on World Wide Web (WWW) (pp. 637-646).
  • Demiryurek, U., Banaei-Kashani, F., & Shahabi, C. (2010, November). A case for time-dependent shortest path computation in spatial networks. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems (pp. 474-477).

Selected Research in Health

Quantifying Performance Status with Wearables and Mobility Sensors

Performance Status (PS) of a patient is a clinical assessment that is key to understanding the patient’s fitness for the next treatment and is a critical determinant of how a patient responds to treatment. However, current PS evaluation is subjective, and patient-physician disagreement is associated with a 16% increase in the risk of death.

The Analytical Tools to Objectively Measure Human Performance (ATOM-HP) is a pilot project aimed at quantifying PS to complement the current clinical assessment. The goal is to obtain a more objective evaluation of PS by tracking routine activities with consumer-grade biometric sensors and cameras and translating them into quantitative performance measures. In addition, an app designed for daily patient symptom reporting from a cell phone will collect daily ratings of fatigue, appetite, physical activity, sleep disturbance, and body weight.

The primary objectives of ATOM-HP are to: (1) identify predictors of hospitalization, urgent interventions, or dose delays (Mission Failure), (2) identify predictors of performance status score, and (3) identify predictors of fatigue score.

Selected Publications

  • Minh N.B. Nguyen, Zaki Hasnain, Sanjay Purushotham, Luciano Nocera, Paul K. Newton, and Cyrus Shahabi, Mining Human Mobility to Quantify Performance Status, The 17th IEEE International Conference on Data Mining (ICDM2017) (demonstration), New Orleans, LA, USA, November 18-21, 2017.
  • Minh Nguyen, Sanjay Purushotham, Hien To, and Cyrus Shahabi, m-TSNE: A Framework for Visualizing High-Dimensional Multivariate Time Series, VAHC2016 Workshop on Visual Analytics in Healthcare in conjunction with AMIA 2016, Chicago, IL, USA, November 12 – 16, 2016.
  • Minh Nguyen, Liyue Fan, and Cyrus Shahabi, Activity Recognition Using Wrist-Worn Sensors for Human Performance Evaluation, The Sixth Workshop on Biological Data Mining and its Applications in Healthcare in conjunction with the 14th IEEE International Conference on Data Mining (ICDM 2015), Atlantic City, New Jersey, USA, November 14 – 17, 2015.

Point of Care Mobility Monitoring

Reliable mobility assessment is essential for diagnosing mobility disorders, such as musculo-skeletal disorders, and for optimizing the treatment of persons affected by them.

With the Point of Care Mobility Monitoring (PoCM2) project, we have built a system to automatically assess mobility using a single 3D sensor (Microsoft Kinect). PoCM2 aimed to (1) validate the system’s ability to assess mobility and (2) predict the medication state of Parkinson’s disease patients using a relatively small number of motion tasks.

A key component of the PoCM2 system is a graph-based feature extraction technique that captures the dynamic coordination between parts of the body while providing results that are easier to interpret than those obtained with other data-driven approaches. The PoCM2 system was able to differentiate between medication states (on or off medication) on a dual-task walking action, i.e., walking in a figure-of-eight pattern while counting backward, for which the Kinect sensor seems to be the most reliable.

Selected Publications

  • Jiun-Yu Kao, Minh Nguyen, Luciano Nocera, Cyrus Shahabi, Antonio Ortega, Carolee Winstein, Ibrahim Sorkhoh, Yu-chen Chung, Yi-an Chen, and Helen Bacon, Validation of Automated Mobility Assessment using a Single 3D Sensor, ACVR2016-Fourth International Workshop on Assistive Computer Vision and Robotics in conjunction with ECCV2016, Amsterdam, The Netherlands, October 9th, 2016.
  • Banaei-Kashani F, Medioni G, Nguyen K, et al. PoCM2: Monitoring Mobility Disorders At Home Using 3D Visual Sensors and Mobile Sensors. In: Wireless Health; Baltimore, MD; 2013.
  • Wang R, Medioni G, Winstein C, Blanco C. Home Monitoring Musculo-Skeletal Disorders with a Single 3D Sensor. In: International Workshop on Human Activity Understanding from 3D Data in conjunction with IEEE Conference on Computer Vision and Pattern Recognition (CVPR); Portland, OR; 2013.

*The complete list of publications can be found here.

Selected Research in Media

Geo-Tagged Mobile Media Modeling

More and more visual data (images, video, AR) are tied to geographical locations. For example, surveillance camera footage may have little meaning without its associated location information. Combining visual data with its location coordinates can provide an effective way to index and search videos, especially when a repository handles an extensive amount of data. DSRC has studied how to utilize the spatial properties of visual data for efficient data management.

We define “geo-tagged media content” as images and videos with geo-references (all data related to the geospatial properties of the content). In particular, our approach represents media content by the geospatial properties of the region it covers, so that large video collections can be indexed and searched effectively. We refer to this region as the viewable scene in 2D (or viewable space in 3D) of the video scene. We model the viewable space of a scene with parameters such as the camera location, the angle of view, and the camera direction. The camera’s viewable scene changes when the camera moves or rotates. This dynamic scene information has to be acquired from sensor-equipped cameras (e.g., a smartphone with the MediaQ app), stored within an appropriate catalog or schema, and indexed for efficient querying and retrieval (e.g., on the MediaQ server). This model effectively converts a challenging computer vision problem in image/video search into a known spatio-temporal database problem, which greatly enhances the performance of image/video data management. By considering related spatial metadata, more relevant and precisely delimited search results can be obtained. Using these models, we develop novel approaches for querying videos based on spatial properties, which are a critical search criterion in many applications.
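
A 2D viewable scene built from these parameters can be sketched as a circular sector, and a point query then reduces to a geometric test. The sketch below is illustrative only: the `FieldOfView` class, the planar coordinates, the counterclockwise angle convention, and the `max_dist` visibility limit are assumptions, not the MediaQ schema.

```python
import math
from dataclasses import dataclass


@dataclass
class FieldOfView:
    x: float               # camera location in planar coordinates (assumed)
    y: float
    direction_deg: float   # camera direction, counterclockwise from the +x axis
    angle_deg: float       # full viewable angle
    max_dist: float        # assumed maximum visible distance

    def covers(self, px, py):
        """Return True if the point lies inside the sector seen by the camera."""
        dx, dy = px - self.x, py - self.y
        dist = math.hypot(dx, dy)
        if dist > self.max_dist:
            return False
        if dist == 0:
            return True
        bearing = math.degrees(math.atan2(dy, dx))
        # Signed angular difference, normalized to [-180, 180).
        diff = (bearing - self.direction_deg + 180.0) % 360.0 - 180.0
        return abs(diff) <= self.angle_deg / 2.0


def frames_covering(frames, px, py):
    """Return the ids of the video frames whose viewable scene contains the point."""
    return [frame_id for frame_id, fov in frames.items() if fov.covers(px, py)]


# Two hypothetical frames from a camera that rotated from facing +y to facing +x.
frames = {
    "frame_1": FieldOfView(0.0, 0.0, 90.0, 60.0, 100.0),
    "frame_2": FieldOfView(0.0, 0.0, 0.0, 60.0, 100.0),
}
print(frames_covering(frames, 0.0, 50.0))
```

A linear scan like `frames_covering` is only for illustration; for large collections the sectors would be stored in a spatial index so that a query does not have to test every frame.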

Selected Publications in Geo-Tagged Mobile Media Modeling

  • Ying Lu and Cyrus Shahabi. Efficient Indexing and Querying of Geo-tagged Aerial Videos. In Proceedings of the 25th ACM SIGSPATIAL GIS, 2017
  • Ying Lu, Cyrus Shahabi, and Seon Ho Kim. Efficient indexing and retrieval of large-scale geo-tagged video databases. GeoInformatica, Volume 20, Issue 4, pp 829-857, October 2016.

Selected Research in Smart Cities

Spatial Crowdsourcing

The increases in communication and computational performance of mobile devices, coupled with advances in sensor technology, have led to exponential growth in data collection and sharing by smartphones. In exploiting the mobility of such a large volume of potential users, a new mechanism for efficient and scalable data collection has emerged, namely, spatial crowdsourcing (SC). In this paradigm, requesters outsource their spatiotemporal tasks (tasks associated with location and time) to a set of workers, who perform the tasks by physically traveling to the tasks’ locations. SC has applications in numerous domains such as journalism, tourism, security, disaster response, environmental monitoring, and urban planning.

We coined the term “Spatial Crowdsourcing” in a paper published at ACM SIGSPATIAL in 2012. With this award, we expanded SC research to the point that the term is now commonplace; for example, there is an entry for spatial crowdsourcing in the International Encyclopedia of Geography, published in 2017 by John Wiley & Sons. Moreover, several companies, such as Microsoft, now have spatial crowdsourcing research groups, and startups in this area, such as Uber, TaskRabbit, and Waze, have proven commercially successful. Since 2013, the number of papers published on this topic has grown exponentially, and, recently, the flagship conference in databases, the VLDB (Very Large Data Bases) Conference, added a tutorial on SC.

In this research project, we focused on mitigating two major impediments to the success of SC: scalability and trust. For scalability, task assignment and scheduling were identified as the main bottlenecks of the system, and spatial aspects of the tasks were exploited to reduce the complexity of assignment. Our initial solution was to batch the tasks, assign them to workers using a weighted bipartite matching mechanism, and then apply dynamic programming solutions to the travelling salesman problem to schedule each worker’s tasks. However, we found that the matching step becomes time-consuming as the number of tasks and workers grows. Therefore, we investigated a cloud-based, distributed approach to the matching problem. Our final result in scalable task assignment and scheduling for SC is the design of an online, auction-based framework for assigning tasks to workers, which covers both matching tasks to workers and computing a schedule for each worker. Our initial approaches could not scale because either task matching or task scheduling became a bottleneck. In the auction-based framework, by contrast, upon the arrival of a new task, each worker locally performs task scheduling for itself and submits a bid to the server. The server then chooses the optimal bid and assigns the task to the corresponding worker. Consequently, the assignment responsibility is split between the workers (for scheduling) and the server (for matching), thereby removing both bottlenecks.
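
The division of labor in the auction-based framework can be sketched as follows. This is a simplified illustration under stated assumptions, not the published design: the bid is taken to be the extra travel distance of the cheapest insertion into the worker's current schedule, locations are planar points, and the `Worker` class and `assign` function are hypothetical names.

```python
import math


def route_length(start, stops):
    """Length of the path that starts at `start` and visits `stops` in order."""
    total, here = 0.0, start
    for stop in stops:
        total += math.dist(here, stop)
        here = stop
    return total


class Worker:
    def __init__(self, name, location):
        self.name = name
        self.location = location
        self.schedule = []   # task locations in visiting order

    def bid(self, task):
        """Local scheduling: cheapest insertion cost and position for the task."""
        base = route_length(self.location, self.schedule)
        best_cost, best_pos = None, None
        for pos in range(len(self.schedule) + 1):
            trial = self.schedule[:pos] + [task] + self.schedule[pos:]
            cost = route_length(self.location, trial) - base
            if best_cost is None or cost < best_cost:
                best_cost, best_pos = cost, pos
        return best_cost, best_pos


def assign(task, workers):
    """Server side: collect one bid per worker and award the task to the lowest."""
    bids = [(worker.bid(task), worker) for worker in workers]
    (cost, pos), winner = min(bids, key=lambda item: item[0][0])
    winner.schedule.insert(pos, task)
    return winner.name, cost


workers = [Worker("w1", (0.0, 0.0)), Worker("w2", (10.0, 0.0))]
for task in [(1.0, 0.0), (9.0, 0.0), (2.0, 0.0)]:
    print(task, assign(task, workers))
```

The server only compares one number per worker, while the more expensive scheduling work runs on each worker independently, which is the split of responsibility described above.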

For the second objective, trust, we first focused on designing techniques that let requesters trust the tasks performed by the workers. To build this trust, we developed reputation mechanisms for the workers and allowed a requester to ask for a task to be performed redundantly by multiple (trusted) workers to increase the trustworthiness of the results. We later concentrated on the more interesting aspect of the problem: workers and requesters trusting the spatial crowdsourcing platform itself. Our task-assignment solutions require the locations of the workers and/or the tasks to be disclosed to untrusted parties (the SC server) for effective assignment of tasks to workers. We first devised approaches based on differential privacy to protect workers’ locations during task assignment while maximizing the number of assigned tasks. Our final result in assuring privacy in SC is the design of a three-stage framework for matching a set of arriving tasks to workers in an online manner (consistent with our online, auction-based framework) while protecting the privacy of the locations of both tasks and workers. We protect this privacy by perturbing location data using geo-indistinguishability principles.
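
A common way to achieve geo-indistinguishability is to add planar Laplace noise to a location before it is reported. The sketch below shows that generic mechanism under stated assumptions; it is not the three-stage framework, and the function name, the planar coordinates in meters, and the example `epsilon` are illustrative choices.

```python
import math
import random


def perturb_location(x, y, epsilon, rng=random):
    """Report a noisy location drawn from a planar Laplace centered on (x, y).

    The direction is uniform and the radius follows a Gamma distribution with
    shape 2 and scale 1/epsilon, which is the radial distribution of the
    planar Laplace. A smaller epsilon gives more noise and more privacy.
    Coordinates are assumed to be planar (e.g., meters in a local projection).
    """
    theta = rng.uniform(0.0, 2.0 * math.pi)
    radius = rng.gammavariate(2.0, 1.0 / epsilon)
    return x + radius * math.cos(theta), y + radius * math.sin(theta)


# With epsilon = 0.01 per meter, the expected displacement is 2 / epsilon = 200 m.
rng = random.Random(42)
print(perturb_location(1000.0, 2000.0, epsilon=0.01, rng=rng))
```

Because the server sees only perturbed locations, any task assignment built on top of this has to tolerate distances that are off by an amount governed by `epsilon`.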

Selected Publications in Spatial Crowdsourcing

  • Hien To, Gabriel Ghinita, Liyue Fan, and Cyrus Shahabi. Differentially Private Location Protection for Worker Datasets in Spatial Crowdsourcing. IEEE Transactions on Mobile Computing (TMC 2016), 2016
  • Hien To, Liyue Fan, Luan Tran, and Cyrus Shahabi. Real-Time Task Assignment in Hyperlocal Spatial Crowdsourcing under Budget Constraints. IEEE International Conference on Pervasive Computing and Communications (PerCom 2016), 2016
  • Leyla Kazemi, Cyrus Shahabi, and Lei Chen. GeoTruCrowd: Trustworthy Query Answering with Spatial Crowdsourcing. International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS 2013), 2013

Cyrus Shahabi

Professor of Computer Science, Electrical Engineering and Spatial Sciences and the Director of the Information Laboratory (InfoLAB) at the Computer Science Department

Information Laboratory

The mission of InfoLab is to investigate new approaches to the management of unconventional data types within atypical architectures.

IMSC is a research center that focuses on data-driven solutions for real-world applications by applying multidisciplinary research in the area of data science.