Neural Databases

Description

Neural Databases (NeuroDBs) improve data access efficiency and accuracy by using neural networks to store data. We have focused on designing NeuroDB to answer range aggregate queries (RAQs), which return an aggregate of an attribute (e.g., average sales) over the records that match a range predicate (e.g., a given time period and location). For example, the figure below (left) shows a dataset of location signals of individuals, together with the duration each individual stayed at that location (color represents visit duration in hours). The middle figure shows that the query for the average visit duration in a 50m by 50m rectangle, whose bottom-left corner is at a user-specified geo-coordinate, can be represented by a query function: it takes the geo-coordinate of the rectangle as input and outputs the average visit duration of the data points inside that rectangle. The figure on the right shows a neural network trained to approximate the query function. Given query geo-coordinates, NeuroDB then uses this learned neural network to answer the query.



  • Dataset of locations


  • True query function


  • Learned NeuroDB
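
The idea of a query function and a network that approximates it can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the NeuroDB implementation: the synthetic dataset, the fixed rectangle side `SIDE`, the helper names `query_function` and `predict`, and the small one-hidden-layer network trained with plain gradient descent in NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: 2,000 points in the unit square, each with a
# visit duration in hours that grows from west to east.
points = rng.random((2000, 2))
durations = 1.0 + 4.0 * points[:, 0] + rng.normal(0.0, 0.1, size=2000)

SIDE = 0.2  # side length of the query rectangle (assumed fixed)

def query_function(corner):
    """Exact answer: average duration of the points inside the square
    whose bottom-left corner is `corner`."""
    inside = np.all((points >= corner) & (points <= corner + SIDE), axis=1)
    return durations[inside].mean() if inside.any() else 0.0

# Training set: sample query corners and label them with exact answers.
train_q = rng.random((1000, 2)) * (1.0 - SIDE)
train_y = np.array([query_function(q) for q in train_q])

# One-hidden-layer network, trained by full-batch gradient descent
# on the squared error between its output and the exact answers.
H = 32
W1 = rng.normal(0.0, 1.0, (2, H))
b1 = np.zeros(H)
W2 = rng.normal(0.0, 0.1, (H, 1))
b2 = np.array([train_y.mean()])

def predict(q):
    """Approximate answers for an array of query corners, shape (n, 2)."""
    hidden = np.tanh(q @ W1 + b1)
    return (hidden @ W2 + b2).ravel()

lr = 0.05
for _ in range(2000):
    hidden = np.tanh(train_q @ W1 + b1)
    err = (hidden @ W2 + b2).ravel() - train_y
    grad_out = err[:, None] / len(train_y)               # gradient at the output
    grad_hidden = (grad_out @ W2.T) * (1.0 - hidden**2)  # back through tanh
    W2 -= lr * (hidden.T @ grad_out)
    b2 -= lr * grad_out.sum(axis=0)
    W1 -= lr * (train_q.T @ grad_hidden)
    b1 -= lr * grad_hidden.sum(axis=0)

# Answering a query now needs only the network, not the data points.
approx = predict(np.array([[0.3, 0.5]]))[0]
exact = query_function(np.array([0.3, 0.5]))
```

Once trained, `predict` answers a query with a fixed number of arithmetic operations regardless of dataset size, whereas `query_function` scans every point. That difference is the efficiency argument in its simplest form.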


We have studied this problem in two settings:

 

  • We studied how to improve the efficiency of answering RAQs. The goal here is to speed up query answering in real-world database systems, where RAQs are a common building block. We show that answering RAQs with a learned neural network can provide significant practical advantages, and we theoretically study the benefits of this approach. We also extend our theoretical analysis to learned indexes to understand why and when they perform well.
  • We studied how to accurately answer spatial count queries (SCQs), a subset of RAQs (e.g., the number of people at a location), while preserving differential privacy. Because such queries ask for information about the locations of individuals, they must be answered without violating the privacy of those individuals. The goal here is to improve the accuracy of the answers while preserving users' privacy. We have studied this problem in both spatial and spatiotemporal settings.
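
For readers unfamiliar with differentially private count queries, the following minimal sketch shows the standard Laplace mechanism applied to a grid histogram. It is a textbook baseline and not the method of the papers listed below. The synthetic data, the values of `EPSILON` and `GRID`, the helper name `private_count`, and the assumption that each individual contributes exactly one location are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: one location per individual, in the unit square.
locations = rng.random((10_000, 2))

EPSILON = 1.0  # privacy budget (assumed value)
GRID = 16      # number of cells per axis (assumed granularity)

# Count of individuals per grid cell. Adding or removing one individual
# changes exactly one cell by 1, so the L1 sensitivity of the histogram is 1.
counts, _, _ = np.histogram2d(
    locations[:, 0], locations[:, 1], bins=GRID, range=[[0, 1], [0, 1]]
)

# Laplace mechanism: add noise with scale sensitivity / epsilon to each cell.
noisy_counts = counts + rng.laplace(0.0, 1.0 / EPSILON, size=counts.shape)

def private_count(x_lo, x_hi, y_lo, y_hi):
    """Answer a count query over the grid cells with indices in
    [x_lo, x_hi) by [y_lo, y_hi), using only the noisy histogram."""
    return noisy_counts[x_lo:x_hi, y_lo:y_hi].sum()

# Example: noisy count for a 4-by-6 block of cells.
estimate = private_count(2, 6, 3, 9)
```

Because queries are answered only from `noisy_counts`, any number of them can be issued without spending more privacy budget. The cost is accuracy: noise accumulates over the cells a query covers, and a coarser grid trades that noise for discretization error.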

Papers

  • S. Zeighami, R. Ahuja, G. Ghinita, and C. Shahabi. A Neural Database for Differentially Private Spatial Range Queries. Proceedings of the VLDB Endowment 15 (5), 2022. [link]
  • S. Zeighami, C. Shahabi, and V. Sharan. NeuroSketch: Fast and Approximate Evaluation of Range Aggregate Queries with Neural Networks. Proceedings of the ACM on Management of Data, 2023. [link]
  • R. Ahuja, S. Zeighami, G. Ghinita, and C. Shahabi. A Neural Approach to Spatio-Temporal Data Release with User-Level Differential Privacy. Proceedings of the ACM on Management of Data, 2023. [link]
  • S. Zeighami and C. Shahabi. On Distribution Dependent Sub-Logarithmic Query Time of Learned Indexing. Proceedings of the 40th International Conference on Machine Learning, 2023. [link]

People

Students



  • Sepanta Zeighami


  • Raghav Seshadri


  • Ritesh Ahuja (Graduated Aug/2022, now at Oracle)


Collaborators



  • Gabriel Ghinita

Principal Investigator



  • Cyrus Shahabi

Sponsors

 

IMSC is a research center that focuses on data-driven solutions for real-world applications by applying multidisciplinary research in the area of data science.