A scalable machine-learning platform for predicting billboard impressions
The Neuron
The Neuron provides a programmatic bidding platform to plan, buy and manage digital out-of-home ads in real time. They asked us to predict the expected number of impressions for digital advertising on billboards in a scalable and efficient way.
Our approach
Our work consisted of three parts: setting up a data lake in AWS, processing data, and developing, training and implementing a machine-learning model.
Data Lake in AWS
We started by setting up a data lake, which can store large amounts of both structured and unstructured data. This matters because organisations now use a wide variety of applications that generate large volumes of data in various formats.
For The Neuron, we set up a data lake with S3 for data storage, the AWS Glue Data Catalog for metadata management and Glue Jobs (managed Apache Spark jobs) for data processing. The data lake is divided into three layers:
- Bronze for raw, unprocessed data.
- Silver for processed data.
- Gold for fully processed and enriched data.
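As a rough illustration of how the three layers might be laid out in S3, the sketch below builds a storage prefix per layer. The bucket name, the dataset name and the date-partitioned folder scheme are all assumptions made for this example; the case study does not describe the actual layout.

```python
from datetime import date

# Hypothetical bucket name; the real bucket is not named in this case study.
BUCKET = "example-data-lake"
LAYERS = ("bronze", "silver", "gold")


def layer_prefix(layer: str, dataset: str, day: date) -> str:
    """Return an S3 URI prefix for one dataset on one day in a given layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    # Hive-style partition folders (year=/month=/day=) are a common convention
    # for date-partitioned data; using them here is an assumption.
    return (
        f"s3://{BUCKET}/{layer}/{dataset}/"
        f"year={day.year}/month={day.month:02d}/day={day.day:02d}/"
    )


if __name__ == "__main__":
    for layer in LAYERS:
        print(layer_prefix(layer, "camera_counts", date(2023, 1, 15)))
```

Keeping the same dataset name and partition scheme in every layer makes it easy to trace a gold record back to the raw file it came from.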
By using serverless AWS components, we ensured good scalability and stability of the platform while keeping operational costs low. All infrastructure was defined and deployed as Infrastructure as Code (IaC) using Terraform.
Collect data from cameras
Each billboard is equipped with a camera. The continuous stream of images is collected by a third-party service that uses object detection algorithms to measure the number of people and vehicles passing by.
The raw data is extracted from this service every 5 minutes and stored as CSV in the bronze layer of the data lake. The data is then processed and stored as Apache Parquet in the silver layer. As a final step, the data is aggregated so that it is ready for use by the machine-learning model, and the aggregated data is stored in the gold layer. All processing steps are performed using Apache Spark.
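The flow from bronze to gold can be sketched in a few lines. This is a simplified, plain-Python stand-in for the Spark jobs, so that it runs without a cluster. The column names, the sample rows, the hourly aggregation window and the choice to sum people and vehicles into one count are assumptions for illustration only.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical bronze-layer extract: one row per billboard per 5-minute window.
BRONZE_CSV = """billboard_id,window_start,people,vehicles
bb-001,2023-01-15T08:00:00,12,30
bb-001,2023-01-15T08:05:00,9,27
bb-001,2023-01-15T09:00:00,15,41
bb-002,2023-01-15T08:00:00,3,8
"""


def to_silver(raw_csv: str) -> list[dict]:
    """Parse raw CSV text into typed records (the 'silver' step)."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        rows.append(
            {
                "billboard_id": row["billboard_id"],
                "window_start": datetime.fromisoformat(row["window_start"]),
                "people": int(row["people"]),
                "vehicles": int(row["vehicles"]),
            }
        )
    return rows


def to_gold(silver_rows: list[dict]) -> dict[tuple[str, datetime], int]:
    """Sum people and vehicles per billboard per hour (the 'gold' step)."""
    totals: dict[tuple[str, datetime], int] = defaultdict(int)
    for row in silver_rows:
        # Truncate the window start to the hour to form the aggregation key.
        hour = row["window_start"].replace(minute=0, second=0, microsecond=0)
        totals[(row["billboard_id"], hour)] += row["people"] + row["vehicles"]
    return dict(totals)


if __name__ == "__main__":
    for key, total in sorted(to_gold(to_silver(BRONZE_CSV)).items()):
        print(key, total)
```

In a Spark job the same two steps would typically be a typed read followed by a group-by aggregation, with the silver and gold results written out as Parquet.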
Model development, training and implementation
The project's objective was to predict the number of billboard impressions over a certain period. To increase accuracy, each billboard was given its own trained version of the model, which meant training 140 separate models, one per billboard.
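The one-model-per-billboard idea can be sketched as a dictionary of independently fitted models keyed by billboard ID. The model below is a deliberately simple stand-in (the mean count per hour of day), not the model used in the project, which the case study does not describe. The class name, function name and input shape are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean


class HourlyMeanModel:
    """Stand-in model: predicts the mean count observed for each hour of day."""

    def __init__(self) -> None:
        self.hour_means: dict[int, float] = {}
        self.fallback = 0.0

    def fit(self, samples: list[tuple[int, float]]) -> "HourlyMeanModel":
        """Fit on (hour_of_day, count) pairs; assumes samples is non-empty."""
        by_hour: dict[int, list[float]] = defaultdict(list)
        for hour, count in samples:
            by_hour[hour].append(count)
        self.hour_means = {hour: mean(counts) for hour, counts in by_hour.items()}
        # Overall mean, used for hours that never appeared in training.
        self.fallback = mean([count for _, count in samples])
        return self

    def predict(self, hour: int) -> float:
        return self.hour_means.get(hour, self.fallback)


def train_per_billboard(
    history: dict[str, list[tuple[int, float]]],
) -> dict[str, HourlyMeanModel]:
    """Fit one independent model per billboard ID."""
    return {bid: HourlyMeanModel().fit(samples) for bid, samples in history.items()}


if __name__ == "__main__":
    models = train_per_billboard(
        {"bb-001": [(8, 70.0), (8, 90.0), (9, 50.0)], "bb-002": [(8, 10.0)]}
    )
    print(models["bb-001"].predict(8), models["bb-002"].predict(8))
```

Because each model depends only on its own billboard's history, the fits are independent of one another and can run in parallel.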
Result
We used Amazon SageMaker to train multiple machine-learning models simultaneously. The resulting models were stored in S3 and exposed via a REST API, from which the bidding platform retrieves predictions.
Expected impressions for the next twenty minutes are predicted for each billboard and presented to potential buyers via the exchange platform.
Future
Besides integrating data from cameras, we set up data pipelines to process and publish data on weather conditions around the billboards. In the future, this data can help further improve the accuracy of the impression predictions.
Want to know more?
Joachim will be happy to talk to you about what we can do for you and your organisation as a data partner.
Business Manager
+31 (0)20 308 43 90
+31 (0)6 23 59 83 71
joachim.vanbiemen@digital-power.com