Agora

Description

Agora is a novel distributed end-to-end system that efficiently computes and serves location popularity within Los Angeles County (LAC) in real time. Agora partitions LAC into fine-grained grid cells, analyzes millions of records from spatial data streams drawn from diverse online sources, and computes the popularity of each cell. The popularity measure is based on three metrics: frequency, the total number of visits to a location across all system users; diversity, the number of unique users who visited the location; and location entropy.
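
As an illustration of the three metrics, the following is a minimal Python sketch, not Agora's actual implementation. The function name popularity_metrics and the (user_id, cell_id) input format are hypothetical. The description above does not define location entropy, so the sketch assumes the commonly used definition: the Shannon entropy (natural logarithm) of the distribution of a cell's visits across users.

```python
import math
from collections import Counter, defaultdict


def popularity_metrics(visits):
    """Compute per-cell popularity metrics.

    visits: iterable of (user_id, cell_id) pairs, one pair per visit.
    Returns a dict mapping cell_id to its frequency, diversity and entropy.
    """
    # For every cell, count how many times each user visited it.
    per_cell = defaultdict(Counter)
    for user_id, cell_id in visits:
        per_cell[cell_id][user_id] += 1

    metrics = {}
    for cell_id, user_counts in per_cell.items():
        frequency = sum(user_counts.values())  # total visits to the cell
        diversity = len(user_counts)           # unique visitors of the cell
        # Assumed definition: Shannon entropy of the visit distribution
        # over users, sum of p * ln(1 / p) with p = user visits / all visits.
        entropy = sum(
            (n / frequency) * math.log(frequency / n)
            for n in user_counts.values()
        )
        metrics[cell_id] = {
            "frequency": frequency,
            "diversity": diversity,
            "entropy": entropy,
        }
    return metrics


if __name__ == "__main__":
    sample = [("u1", "c1"), ("u1", "c1"), ("u2", "c1"), ("u3", "c2")]
    for cell, values in popularity_metrics(sample).items():
        print(cell, values)
```

Under this assumed definition, a cell visited by a single user has entropy 0 no matter how often that user returns, while a cell whose visits are spread evenly over many users scores high. This is why entropy complements the raw frequency and diversity counts.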

A subcomponent of the system collects, filters, and analyzes geotagged social media data. Currently, Agora is integrated with four data sources (Twitter, Flickr, Yelp, and Gowalla) to gather historical and real-time streaming social data. To compute the popularity of different locations, each data source must provide:

  • real location data, i.e., longitude and latitude. These coordinates are enough to find the spatial cell a record belongs to and to update the frequency of that cell.
  • user information that uniquely identifies a user, i.e., a user ID. Diversity depends on this, since it counts unique users.
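
The two requirements above can be sketched in Python as follows. This is only an illustration under stated assumptions: the bounding box, the 0.01-degree cell size, the function names cell_for and is_usable, and the record field names are all hypothetical, since the description does not give Agora's actual grid resolution or record schema.

```python
from typing import Optional, Tuple

# Hypothetical, approximate bounding box around Los Angeles County and a
# hypothetical cell size; Agora's real grid parameters are not stated.
MIN_LAT, MAX_LAT = 33.70, 34.85
MIN_LON, MAX_LON = -118.95, -117.65
CELL_SIZE_DEG = 0.01


def cell_for(lat: float, lon: float) -> Optional[Tuple[int, int]]:
    """Return the (row, col) grid cell containing a point, or None if the
    point lies outside the assumed bounding box."""
    if not (MIN_LAT <= lat < MAX_LAT and MIN_LON <= lon < MAX_LON):
        return None
    row = int((lat - MIN_LAT) / CELL_SIZE_DEG)
    col = int((lon - MIN_LON) / CELL_SIZE_DEG)
    return (row, col)


def is_usable(record: dict) -> bool:
    """Check that a record carries coordinates and a user ID.

    The keys "lat", "lon" and "user_id" are hypothetical field names.
    """
    return all(record.get(key) is not None for key in ("lat", "lon", "user_id"))


if __name__ == "__main__":
    record = {"lat": 34.0522, "lon": -118.2437, "user_id": "u1"}
    if is_usable(record):
        print(cell_for(record["lat"], record["lon"]))
```

A fixed-size degree grid keeps the lookup to two divisions per record, which suits a streaming setting. A production system might instead use a projected coordinate system or a spatial index so that cells cover equal ground area.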

The data is stored in our IMSC MongoDB cluster. Within six months of operation, Agora collected the following data within LAC:

  • 16,239,239 Twitter records
  • 1,188,660 Flickr records
  • 191,651 Yelp business records
  • 7,113 Gowalla records

IMSC is a research center that focuses on data-driven solutions for real-world applications by applying multidisciplinary research in the area of data science.