What is Smart Video Analytics?

To understand what makes video analytics “smart”, let us first look at video analytics itself. As the name suggests, video analytics is the extraction of meaningful, actionable information from pre-recorded videos or live video streams. It can be broadly divided into two classes:

  • Video Surveillance: The process of ensuring compliance with certain rules and regulations, e.g. CCTV surveillance used for traffic law enforcement.
  • Video Insights: The extraction of meaningful insights from videos, primarily for commercial gain, e.g. biometric attendance and determination of customer demographics.

Video analytics can, in itself, also be done manually by hiring a group of people to watch thousands of hours of live or recorded feed and spot anomalies or insights. But is that the best and most cost-effective solution? The answer is “No”, for two main reasons:

  • Human Attention: Human beings are bad at sustained attention to detail. A person is very likely to lose track of the subject in question after a long stretch of continuous manual video analytics.
  • Scale: Assume one person can perform manual video analytics on at most two feeds simultaneously. If the feeds run 24-7, at least three shifts of people are needed. To scale up to a hundred feeds, we would need to hire 50×3=150 people just to stare at videos all day long. This is simply not a viable option for large-scale analytics.
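The staffing arithmetic above can be written as a small calculation. This is only an illustrative sketch: the function name `staff_needed` and its default parameters (two feeds per person, three shifts per day) are assumptions taken from the example in the text, not measured figures.

```python
import math


def staff_needed(feeds: int, feeds_per_person: int = 2, shifts_per_day: int = 3) -> int:
    """Number of people required to watch `feeds` video feeds around the clock."""
    # People needed at any one moment, rounded up so no feed is left unwatched.
    watchers_per_shift = math.ceil(feeds / feeds_per_person)
    # Every shift needs its own set of watchers.
    return watchers_per_shift * shifts_per_day


print(staff_needed(100))  # 50 watchers per shift x 3 shifts = 150
```

The head count grows linearly with the number of feeds, which is exactly why manual monitoring does not scale.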

The solution to the above problem is Computer Driven Smart Video Analytics. With the recent growth in available data, improvements in computation hardware, and Artificial Intelligence-based algorithms for image and video analytics, Smart Video Analytics is the most cost-effective solution to the given problem.

Let us discuss the different use-cases of Smart Video Analytics and how they can drastically influence product development based on real-time insights from video data.

Facial Recognition based Biometric Attendance

One of the key areas where Artificial Intelligence and Computer Vision have outdone human beings is in the field of biometric attendance.

Traditionally, tracking attendance meant manually entering or signing a register, which is clumsy, difficult to maintain, and easy to tamper with. A few common challenges are discussed below:

  • More often than not, employees may misreport their timings, which may go unverified due to human error.
  • Keeping track of physical registers containing the attendance of all the employees of a large organization is difficult. Manual digitization of the same would require a separate workforce just to maintain the records, which is a challenge in itself.

As a solution to this, biometric attendance came into the picture, more specifically fingerprint recognition and verification. State-of-the-art models provide an accuracy of up to 98.60% (source). This improved on most of the above problems, but it brought to light a new set of challenges, as mentioned below:

  • Fingerprint verification requires physical contact with the sensor, which can raise hygiene concerns since every employee is required to touch the same device.
  • Fingerprint verification is an intrusive process: the person in question has to go to the device and present a fingerprint to mark attendance. In large organizations this can lead to the formation of queues.
  • Most fingerprint verification devices fail if the person's fingertips are dirty or wet. This is a major challenge during the rainy season.

Hence, it can be concluded that even though fingerprint verification is a big improvement over traditional methods, it comes with its own set of challenges. The next big improvement over fingerprint-based biometrics is Facial Recognition based biometrics. Over recent years, state-of-the-art Facial Recognition algorithms have shown accuracy as high as 99.63%±0.09. This figure is reported for FaceNet, Google’s Facial Recognition algorithm. With such high levels of accuracy, let us look at how Facial Recognition Technology improves over previous attempts:

  • It is completely digitized, and hence can easily be tracked and maintained in a digital database.
  • Since the entire process is automated, it is difficult to tamper with and is not susceptible to human error.
  • There is no physical contact with the equipment. Hence it does not raise any hygiene concerns.
  • It is a non-intrusive process. A camera mounted at an appropriate angle at the entrance can easily capture the face of the person during entry. The employees are not required to queue up to mark their attendance.
  • State of the art facial recognition algorithms are robust enough for them to correctly identify the person in question even with variations in facial hair patterns.
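To make the idea concrete, here is a minimal sketch of how attendance could be marked once a face recognition model has turned each face into a numeric embedding. FaceNet-style models compare embeddings by Euclidean distance, but everything else here is an assumption for illustration: the toy 3-dimensional vectors (real embeddings have many more dimensions), the employee IDs, the function names, and the distance threshold of 1.0, which would have to be tuned on real data.

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(probe, enrolled, threshold=1.0):
    """Return the enrolled ID closest to `probe`, or None if nobody is within `threshold`."""
    best_id, best_dist = None, float("inf")
    for employee_id, embedding in enrolled.items():
        dist = euclidean(probe, embedding)
        if dist < best_dist:
            best_id, best_dist = employee_id, dist
    return best_id if best_dist <= threshold else None


# Hypothetical enrolled employees with toy embeddings.
enrolled = {
    "emp_001": [0.9, 0.1, 0.4],
    "emp_002": [0.1, 0.8, 0.6],
}

attendance = {}


def mark_attendance(probe, timestamp):
    """Record the first time each recognized employee is seen; ignore unknown faces."""
    employee_id = identify(probe, enrolled)
    if employee_id is not None:
        attendance.setdefault(employee_id, timestamp)
    return employee_id


mark_attendance([0.88, 0.12, 0.41], "09:02")  # close to emp_001
mark_attendance([5.0, 5.0, 5.0], "09:03")     # far from everyone, ignored
print(attendance)
```

Because the record is written by the pipeline rather than by the employee, there is no register to sign, touch, or tamper with.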

Customer Insights

One key new application of Smart Analytics is the ability to gain useful information on customers who walk into a physical retail store, restaurant, hotel, etc. This information can then be used for the following:

  • Improvement of a service based on customer emotion and demographics
  • Targeted advertising and product placement
  • Reorganization of store commodities based on customer interest
  • Determine Quality of Service (QoS) by calculating queue wait-time or service time
  • Combined customer analytics for retail kiosks

These services require 24-7 monitoring and are based on the determination of emotion, age, gender, ethnicity, movement heatmap, calculation of wait-time, etc. Almost all of these components are nearly impossible to achieve through manual video monitoring.

Improvement of a service based on customer emotion and demographics

Customer demographics refer to the information related to the age, gender, ethnicity, etc. of the customer. Deep Learning algorithms trained on a large dataset of people can easily generate this information from live video feeds of the customer where his/her face is clearly visible.

Let us look at how the information on demographics can be used for Business Development:

  • Age: For certain types of businesses, identifying the most frequently visiting age group can be significantly useful for promotions and targeted advertisements.
  • Gender: Similar to age, gender plays a role in knowing the customer base and developing products accordingly.
  • Emotion: A customer’s emotional response and its change over time allow the store manager to determine how well the staff is serving the customers. It can serve as an indirect measure of staff performance, and it makes it easier for the supervisor to respond to a bad customer experience.

Targeted advertising and product placement

Product placement and store layout are of paramount importance in a retail scenario. By themselves, they can persuade customers to buy more. The location of product advertisements across the store is also quite important. Let us see how these can be tuned with Artificial Intelligence based Video Analytics.

Smart Video Analytics can provide the store owner with heatmaps on an hourly, daily, monthly or even yearly basis, which show the regions that customers tend to visit more often and stay in longer. This allows the store manager to improve sales by doing the following:

  • Tune the product placement so that higher priority items are placed nearer to the hotspots.
  • Target advertisements so that more people see them and are influenced by them.
  • As product demand changes with the seasons, modify the product stack based on demand and location as shown by the heatmap.
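A heatmap of this kind can be built by dividing the camera frame into a grid and counting how often tracked customers appear in each cell. The sketch below assumes that a person tracker (not shown) already produces one `(x, y)` pixel position per customer per sampled frame; the function names and the 50-pixel cell size are hypothetical choices for illustration.

```python
from collections import Counter


def build_heatmap(positions, cell_size=50):
    """Count person-samples per grid cell.

    positions: iterable of (x, y) pixel coordinates, one per tracked
               customer per sampled frame.
    Returns a Counter mapping (column, row) grid cells to sample counts.
    A customer who lingers contributes many samples, so dwell time is
    reflected in the count as well as visit frequency.
    """
    heat = Counter()
    for x, y in positions:
        heat[(int(x // cell_size), int(y // cell_size))] += 1
    return heat


def hotspots(heat, top_n=3):
    """Return the `top_n` busiest grid cells, busiest first."""
    return [cell for cell, _ in heat.most_common(top_n)]


# Toy positions: three samples near the entrance, two near a shelf, one elsewhere.
samples = [(10, 10), (20, 30), (40, 45), (120, 60), (130, 70), (300, 300)]
heat = build_heatmap(samples)
print(hotspots(heat, top_n=2))
```

Building one such counter per hour, day or month gives the time-based views mentioned above.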

Determine Quality of Service (QoS) by calculating queue wait-time or service time

Quality of Service (QoS) is one of the key factors in customer retention. Traditionally, however, there has been no clear way to measure the average time required to complete the billing and checkout process at a store. This problem can be tackled with real-time Artificial Intelligence-based Video Analytics. Let us see how that can be achieved:

  • State-of-the-art Object Detection models can detect people standing in the queue in real-time with very high precision. Object Detection models like YOLOv3 provide a well-balanced detector for objects.
  • Once a person is detected, he/she can be tracked using Deep Learning based Object Trackers. Each tracked customer can be given an ID.
  • A Region of Interest (ROI) can be defined for the region in front of the service desk/area.
  • The queue wait-time or service time can then be calculated algorithmically from the duration for which the tracked object stays inside the selected ROI.
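The last two steps above can be sketched as follows. The code assumes that detection and tracking have already been done and that each observation arrives as `(timestamp_seconds, track_id, (x, y))` in time order; that record layout, the rectangular ROI, and the function names are all assumptions made for this example.

```python
def inside(roi, point):
    """True if `point` (x, y) lies within the rectangle roi = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = roi
    x, y = point
    return x1 <= x <= x2 and y1 <= y <= y2


def wait_times(observations, roi):
    """Seconds each tracked customer spent inside the ROI.

    observations: iterable of (timestamp_seconds, track_id, (x, y)),
                  sorted by timestamp.
    Returns {track_id: last_time_inside - first_time_inside}.
    """
    first_seen, last_seen = {}, {}
    for timestamp, track_id, point in observations:
        if inside(roi, point):
            first_seen.setdefault(track_id, timestamp)  # keep the earliest sighting
            last_seen[track_id] = timestamp             # keep updating the latest
    return {tid: last_seen[tid] - first_seen[tid] for tid in first_seen}


# Toy data: customer 1 waits 10 s, customer 2 waits 60 s in front of the desk.
service_roi = (0, 0, 10, 10)
observations = [
    (0, 1, (5, 5)),
    (5, 2, (7, 7)),
    (10, 1, (6, 5)),
    (20, 1, (50, 50)),  # customer 1 has left the ROI
    (65, 2, (8, 8)),
]
times = wait_times(observations, service_roi)
average_wait = sum(times.values()) / len(times)
print(times, average_wait)
```

This simple version measures the span between the first and last sighting inside the ROI, so a customer who steps out and comes back is counted as one continuous wait.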

Combined customer analytics for retail kiosks

Retail kiosks primarily display targeted advertisements or serve products. They are a significant investment for the owner. Hence, it is important to justify their effectiveness in sales. This can be achieved by a combination of all of the discussed video analytics tools:

  • The age and demographics data of the customer can be used to determine which age-group or gender-group to target.
  • The emotion of the customer can indicate what kind of experience he/she had while transacting at the kiosk.
  • The wait-time will show how long the customer spends in front of the kiosk.
  • Heatmaps of the area around the kiosks will show the peak times when there are more customers. They will also show what fraction of those customers actually take an interest in the kiosk.

Traffic Video Surveillance

Another potentially untapped application of Smart Video Analytics is Traffic Video Surveillance. Enforcing traffic rules is a big challenge. It requires large manpower and even then falls short of being properly enforced. Some mundane parts of these enforcements can easily be done by applying Artificial Intelligence based algorithms on Traffic CCTV feed.

Let’s discuss two of the most common applications of Smart Video Analytics for traffic law enforcement.

  • Enforcement of Traffic Stop at a Stop sign or Red Traffic Light
  • Enforcement of wearing of Helmet by two-wheeler riders

Enforcement of Traffic Stop at a Stop sign or Red Traffic Light

By law, Stop signs require the vehicle to slow down and stop before moving forward. They are usually placed at smaller, unmanned intersections, where there is a chance of a vehicle not stopping and hence breaking the law. Since these intersections are rarely monitored, such violations may go unnoticed and may lead to accidents.

Artificial Intelligence can help enforce a traffic stop at a stop sign by determining whether the vehicle slowed down and stopped:

  • A range of techniques, such as motion detection with object-area filtering or real-time vehicle detection using YOLOv3 (or similar) Object Detection algorithms, can be used to detect a moving car.
  • After the object is detected, it has to be tracked using a Deep Learning based tracking algorithm, as discussed earlier for person detection and tracking in the determination of wait-time.
  • With detection and tracking in place, pixel mapping and displacement can be used to determine the relative speed of the vehicle. From that, we can determine whether the vehicle has stopped at the Stop sign or not.
  • Additionally, this same series of algorithms can be used to determine if a vehicle is speeding.
  • After determining that the driver has violated the stop sign, we can, if needed, recognize the license plate of the vehicle as well.
  • There are several state-of-the-art license plate recognition solutions available, such as OpenALPR and Plate Recognizer. However, if you have access to the data, developing a license plate recognition model is not too challenging. It can be roughly done using the following steps:

      ○ First, detection and localization of the license plate: Any object detection model can be used for this purpose, and as we have used before, our preference would be to use the YOLOv3 Object Detection Model.
      ○ After the plate is successfully detected, the Region of Interest (ROI) can be extracted and with the use of image processing techniques along with a deep learning-based classifier, the License Plate numbers can be extracted and be sent directly to the concerned authorities.
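The speed and stop check from the list above can be sketched in a few lines. The code assumes that a tracker already yields the vehicle's centroid in pixels for every frame. The speed is therefore relative (pixels per second); converting it to km/h would need camera calibration, which is not shown. The function names and both thresholds (`stop_speed`, `min_stop_seconds`) are hypothetical values for illustration.

```python
import math


def speeds(track, fps):
    """Relative speed in pixels/second between consecutive frames.

    track: list of (x, y) centroid positions, one per frame.
    """
    return [
        math.hypot(b[0] - a[0], b[1] - a[1]) * fps
        for a, b in zip(track, track[1:])
    ]


def came_to_stop(track, fps, stop_speed=2.0, min_stop_seconds=1.0):
    """True if the vehicle stayed below `stop_speed` for `min_stop_seconds` in a row."""
    frames_needed = max(1, math.ceil(min_stop_seconds * fps))
    consecutive = 0
    for speed in speeds(track, fps):
        consecutive = consecutive + 1 if speed < stop_speed else 0
        if consecutive >= frames_needed:
            return True
    return False


fps = 10
# A vehicle that rolls through at a constant 5 pixels per frame.
rolling = [(i * 5, 0) for i in range(30)]
# A vehicle that approaches, stands still for 15 frames, then drives on.
stopping = (
    [(i * 5, 0) for i in range(10)]
    + [(45, 0)] * 15
    + [(45 + i * 5, 0) for i in range(1, 10)]
)
print(came_to_stop(rolling, fps), came_to_stop(stopping, fps))
```

Requiring the low speed to hold for a minimum duration avoids counting a single noisy frame as a stop. The same `speeds` output, once calibrated, could be compared against a limit to flag speeding.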

Enforcing the wearing of Helmet by two-wheeler riders

Another interesting application of traffic surveillance is to enforce the wearing of helmets by two-wheeler riders (motorcycle riders and bikers, for example).

As the graph below shows, the number of motorcycle-related accidents has been increasing over the years.

An estimated 148,000 deaths were recorded between 1966 and 2008. In 2008, the rider was wearing a helmet in only 59% of fatal accident cases. In other words, in 41% of those fatal accidents the rider was not wearing a helmet, and according to NHTSA, wearing a helmet leads to a 37% increase in the chances of surviving a possibly fatal accident. (source: Wikipedia & NHTSA)

Hence, proper enforcement of such laws is the need of the hour. Due to the sheer scale, it is not possible to have human traffic enforcement agents guarding every street 24×7, so automatic detection and reporting of such violations using Artificial Intelligence based Video Analytics is a potential savior. Let us see in brief how it can be achieved:

  • Again, using state-of-the-art Object Detection models, our first task would be to detect the two-wheeler rider, his/her helmet, and License Plate.
  • If a helmet is detected, then well and good. He/she is following the traffic norms and no further action is to be taken.
  • In case a helmet is not detected, then the next step is to extract the license plate information.
  • To achieve this, we may choose to use existing state-of-the-art models, like the ones that have already been mentioned (OpenALPR and Plate Recognizer), or we may choose to build it ourselves.
  • Finally, once the license plate information has been extracted, it can be forwarded to the concerned authority, and further action can be taken against the defaulter(s).
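The decision logic in the steps above can be sketched as follows. It assumes an object detector (not shown) that returns `(label, box)` pairs with the hypothetical labels `"rider"`, `"helmet"` and `"plate"`, and it assumes the rider box encloses both the person and the vehicle, so that a helmet or plate belonging to that rider has its center inside the rider box. Reading the characters on the plate is left to a separate recognition step.

```python
def center(box):
    """Center point of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def contains(outer, point):
    """True if `point` (x, y) lies inside the box `outer`."""
    x1, y1, x2, y2 = outer
    x, y = point
    return x1 <= x <= x2 and y1 <= y <= y2


def riders_without_helmet(detections):
    """Return (rider_box, plate_box or None) for every rider with no helmet detected.

    detections: list of (label, box) with labels 'rider', 'helmet', 'plate'.
    """
    riders = [box for label, box in detections if label == "rider"]
    helmets = [box for label, box in detections if label == "helmet"]
    plates = [box for label, box in detections if label == "plate"]

    violations = []
    for rider in riders:
        # Helmet found for this rider: no further action needed.
        if any(contains(rider, center(h)) for h in helmets):
            continue
        # No helmet: look up the plate that belongs to this rider, if visible.
        plate = next((p for p in plates if contains(rider, center(p))), None)
        violations.append((rider, plate))
    return violations


# Toy frame: the first rider wears a helmet, the second does not.
detections = [
    ("rider", (0, 0, 100, 200)),
    ("helmet", (30, 5, 70, 40)),
    ("rider", (200, 0, 300, 200)),
    ("plate", (230, 170, 270, 190)),
]
violations = riders_without_helmet(detections)
print(violations)
```

Only the plate boxes returned here would need to be passed on to license plate recognition, so compliant riders are never processed further.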


Retail/Bank Surveillance – Weapons and Robbery

Finally, one of the key applications of Artificial Intelligence-based Video Analytics is that it can silently detect an event and take some pre-configured actions. One key area where this is of immense importance is in detecting an ongoing armed robbery or detecting an individual with a gun threatening another individual.

Early or real-time detection of such events can allow a silent alarm to be triggered which informs the nearest law enforcement officers immediately. Additionally, in certain cases, like that of bank robbery, a loud alarm can also be triggered.

There are a couple of methods to solve this problem with Artificial Intelligence:

  • One approach is to detect the presence of any weapon or any identity-concealing mask/helmet in the video feed.
  • The other approach is to use advanced networks to detect violent or suspicious behavior.


Detecting the presence of Guns

Although this might sound like a simple problem statement from a technology point of view, it is exceptionally challenging in practice. This is primarily because guns (handguns) are quite small and are not clearly visible in CCTV feeds, so the algorithm may find them difficult to detect.

However, recent developments in artificially generated synthetic datasets by companies like Edgecase.ai have shown significant promise in improving the accuracy of such weapon detection models.

As for algorithms, Detectron 2 by Facebook and MaskRCNN, originally by Facebook’s AI research team, have shown great promise in such applications.

Detection of Violent or Suspicious Behaviour

Being able to detect Violent or Suspicious Behaviour can potentially save lives by preventing incidents before they happen. This is still an emerging field in Deep Learning and is currently under research.

Practical implementations are somewhat limited due to comparatively low model accuracy, high computational requirements, and a high False Positive rate. However, with time, this can develop into a valuable asset towards Smart Video Surveillance.


In this writeup, we have just scratched the surface of what is possible using Artificial Intelligence-based Video Analytics, and the possibilities are endless. With computational power increasing rapidly and widespread research on Computer Vision, state-of-the-art algorithms are becoming more accurate and cheaper to implement than hiring people for the same task.
