Overview of Data Lake Architecture
- Build a data lake that collects data from servers in different countries, IoT devices, social media, clickstreams, and logs, and run analytics on it to research product discovery, product recommendations, and new product requirements.
- Build a data lake and data warehouse for real-time and batch data processing, covering social media analytics, IoT analytics, image analytics, a clickstream-based recommendation system, and data warehouse ETL operations.
Challenges for Building the Data Pipeline
- Server data must be pulled in a specified format: a DCD file that consists of 16 XML files.
- Monitoring stores and refrigerators with IoT devices, using a data pipeline that collects data from the devices, runs analytics, and detects anomalies in the data received from the sensors.
- Setting up a data pipeline to collect real-time data from social media by hashtag for sentiment and intent analytics.
- Collecting clickstream data from the web and mobile applications for the recommendation system.
- Scraping data for product search and discovery.
- Ingesting data from the ERP solution for their vendors.
Solution Offered for Building the Analytics Platform
The solutions for building the analytics platform are the following:
Real-Time Social Media Analytics
Real-time tweets are collected from the Twitter API and by scraping APIs, filtered by specific keyword, hashtag, language, and location. A Python application collects the data through the Twitter APIs and transfers it to Google Cloud Pub/Sub; the application is deployed on Google App Engine. Cloud Dataflow consumes the data from Pub/Sub, cleans and transforms it further, and sends it to the BigQuery data lake.
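The filter-and-forward step described above can be sketched in Python as follows. This is a minimal illustration, not the project's actual collector: the tweet dictionary shape (`text`, `hashtags`, `lang`, `location`) and the function names are assumptions, and `publish` stands in for whatever call hands bytes to Pub/Sub (for example, a thin wrapper around the Pub/Sub client's publish method).

```python
import json
from typing import Callable, Iterable


def matches_filter(tweet: dict, *, keywords=frozenset(), hashtags=frozenset(),
                   languages=frozenset(), locations=frozenset()) -> bool:
    """Return True if the tweet passes every non-empty filter.

    The tweet layout used here is hypothetical; an empty filter set
    means "no constraint" for that field.
    """
    text = tweet.get("text", "").lower()
    tags = {t.lower() for t in tweet.get("hashtags", [])}
    if keywords and not any(k.lower() in text for k in keywords):
        return False
    if hashtags and not ({h.lower() for h in hashtags} & tags):
        return False
    if languages and tweet.get("lang") not in languages:
        return False
    if locations and tweet.get("location") not in locations:
        return False
    return True


def forward_tweets(tweets: Iterable[dict], publish: Callable[[bytes], object],
                   **filters) -> int:
    """Publish each matching tweet as UTF-8 JSON bytes; return the count sent."""
    sent = 0
    for tweet in tweets:
        if matches_filter(tweet, **filters):
            # Message payloads are sent as bytes, so serialize before publishing.
            publish(json.dumps(tweet, sort_keys=True).encode("utf-8"))
            sent += 1
    return sent
```

Passing the publisher in as a callable keeps the filtering logic testable without cloud credentials; in a deployment, the callable would wrap the real Pub/Sub publisher.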
Real-Time IoT Analytics Platform
Sensor data is collected from IoT devices at different warehouses, where refrigerators are installed at various locations. The IoT devices are connected to Google Cloud IoT Core through its MQTT bridge. Google Cloud Pub/Sub serves as the messaging queue, and Google Cloud Dataflow handles transformation and cleaning. The cleaned data is sent to the BigQuery data lake for further analytics.
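As an illustration of the cleaning step, the sketch below parses one sensor message and flags out-of-range temperatures. The payload fields (`device_id`, `timestamp`, `temperature_c`) and the safe range are assumptions made for the example; the source does not specify the message schema or the anomaly detection method, and a simple range check is used here only as a stand-in.

```python
import json
from typing import Optional

# Hypothetical safe range for a refrigerator, in degrees Celsius.
TEMP_RANGE_C = (0.0, 8.0)


def clean_reading(payload: bytes) -> Optional[dict]:
    """Parse one message payload into a flat row, or return None if malformed."""
    try:
        raw = json.loads(payload.decode("utf-8"))
        row = {
            "device_id": str(raw["device_id"]),
            "timestamp": str(raw["timestamp"]),
            "temperature_c": float(raw["temperature_c"]),
        }
    except (ValueError, KeyError, TypeError):
        # Covers bad UTF-8, invalid JSON, missing fields, and non-numeric values.
        return None
    low, high = TEMP_RANGE_C
    # Flag readings outside the assumed safe range.
    row["is_anomaly"] = not (low <= row["temperature_c"] <= high)
    return row
```

In a Dataflow pipeline, a function like this could be applied per element, with `None` results filtered out before the rows are written to BigQuery.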
Real-Time Clickstream Analytics
Clickstream analytics feeds the product recommendation system. Real-time clickstream data is captured by a Google Cloud Function with an HTTP request as its trigger, and the collected data is sent to Google Cloud Pub/Sub. Cloud Dataflow cleans and transforms the data before analytics are performed in BigQuery.
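The core of such an HTTP-triggered function might look like the sketch below. It is written as a plain function so it can run anywhere: the event schema (`user_id`, `item_id`, `event_type`), the `handle_click` name, and the injected `publish` callable are all assumptions for illustration, not the deployed function's interface.

```python
import json
import time
from typing import Callable, Tuple

# Hypothetical minimal schema for one click event.
REQUIRED_FIELDS = ("user_id", "item_id", "event_type")


def handle_click(body, publish: Callable[[bytes], object],
                 now: Callable[[], float] = time.time) -> Tuple[dict, int]:
    """Validate one click event and pass it to the publisher.

    Returns a (response body, HTTP status code) pair.
    """
    if not isinstance(body, dict):
        return {"error": "expected a JSON object"}, 400
    missing = [f for f in REQUIRED_FIELDS if f not in body]
    if missing:
        return {"error": "missing fields: " + ", ".join(missing)}, 400
    # Keep only the known fields and stamp the arrival time.
    event = {f: body[f] for f in REQUIRED_FIELDS}
    event["received_at"] = now()
    publish(json.dumps(event, sort_keys=True).encode("utf-8"))
    return {"status": "queued"}, 202
```

Rejecting malformed events at the edge keeps obviously bad records out of the queue, while the heavier cleaning still happens downstream in Dataflow.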
Sales Analytics Platform
Store managers upload data files in DCD format through a portal. The backend converts each file to CSV and publishes the data to Cloud Pub/Sub for further processing. Cloud Dataflow performs the data cleaning and the necessary transformations. After these transformations, the data is sent to BigQuery and to Bigtable (as a cache).
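The XML-to-CSV conversion on the backend could be approached along the following lines. The actual DCD layout is not described here, so this sketch assumes a hypothetical XML document in which each `<record>` element holds one row and its child elements are the columns; the tag names and the `xml_records_to_csv` helper are illustrative only.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_records_to_csv(xml_text: str, record_tag: str = "record") -> str:
    """Flatten every <record_tag> element into one CSV row.

    Column names are the child tag names, in first-seen order;
    a record that lacks a column gets an empty cell.
    """
    root = ET.fromstring(xml_text)
    rows = [
        {child.tag: (child.text or "").strip() for child in rec}
        for rec in root.iter(record_tag)
    ]
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

For example, a document with two records, the second missing a `qty` element, yields a three-column CSV whose last row ends in an empty cell.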
Technology Stack
- Cloud App Engine
- Cloud Pub/Sub
- Cloud IoT Core
- Cloud Functions
- Cloud Dataflow
- BigQuery
- Cloud Bigtable
- Cloud Datalab
- Data Studio
Real-Time IoT Recommendation
A real-time recommendation system extracts live clickstreams and processes them with Apache Spark or Apache Kafka streaming through revamp, refine, and supervise stages. It then executes queries against a NoSQL store and produces live recommendations. Batch processing of historical clickstream data feeds live recommendations in the same way. Combining IoT data with recommendations gives a better analysis of user engagement. The role of a recommendation engine is to filter content and display the items that appeal to a particular user. Recommendations fall into two categories:
- Characteristic-based recommendations, which use keywords and categories.
- User-based recommendations, which use ratings, likes, and followers.
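The two categories can be illustrated with a small sketch. The similarity measures chosen here (Jaccard overlap for keyword sets, cosine similarity for ratings) are common options picked for the example, not necessarily what this platform uses, and all function names and data shapes are hypothetical.

```python
import math


def keyword_similarity(a: set, b: set) -> float:
    """Characteristic-based: Jaccard overlap of two items' keyword sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rating_similarity(u: dict, v: dict) -> float:
    """User-based: cosine similarity of two users' item -> rating dicts."""
    shared = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend_for_user(target: str, ratings: dict, top_n: int = 3) -> list:
    """Rank items the target has not rated by similarity-weighted ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        sim = rating_similarity(ratings[target], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]
```

A user whose ratings closely match another user's is recommended the items that the other user rated highly but the first user has not yet seen.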
A product recommendation system analyzes user data and shows items that the user would likely purchase. Recommendation systems are now widespread, and applications offer suggestions based on content-based, knowledge-based, hybrid, demographic, and utility-based filtering. Recommendation is a form of personalization, but not all personalization is recommendation.
Examples of real-time IoT recommendation include:
- Social Media Recommendations covering Facebook, YouTube, Instagram
- Music and E-Commerce Sites and Applications
- Google Search and Voice Application
- IoT Sensor Devices
- Google Maps and Cab Applications