Realtime Human Pose Estimation via Webcam: About “VisionPose” and How it Works

Human pose estimation is a fascinating field in computer vision, and it is set to have a great impact on our daily lives, be it in security or in simplifying work tasks.

In the following, I would like to introduce our in-house AI pose estimation engine “VisionPose” as one way to bring human pose estimation into applications. I will go into detail on its different versions and technical specifics, as well as on how our customers have applied the technology so far.

By the end of this article, I believe you will be as excited about the possibilities of this technology as we are!

What is VisionPose?

Human pose estimation is a computer vision task in which keypoints of the human body are detected and associated with one another to describe a person's pose. This is exactly what VisionPose does:

The AI engine uses computer vision to detect human skeletal information with nothing more than a simple webcam, without relying on depth-sensor cameras or any other special equipment. VisionPose can perform pose estimation on real-time video streams, still images, and video files, and detects up to 30 keypoints on the human body: 25 joints and 5 facial keypoints. The detailed analysis results can be exported in CSV format.
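The article does not document the column layout of VisionPose's CSV export. As a rough picture of what per-frame keypoint data can look like, here is a minimal Python sketch that writes detections to CSV with the standard `csv` module. The `Keypoint` structure, the column names, and the keypoint names are all hypothetical, not the SDK's actual format.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Keypoint:
    """One detected keypoint (hypothetical structure, not the VisionPose SDK type)."""
    name: str      # e.g. "nose" (illustrative name)
    x: float       # pixel column in the image
    y: float       # pixel row in the image
    score: float   # detection confidence in [0, 1]


def keypoints_to_csv(frames):
    """Serialize {frame_index: [[Keypoint, ...] per person]} to CSV text, one row per keypoint."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["frame", "person", "keypoint", "x", "y", "score"])
    for frame_index in sorted(frames):
        for person_index, person in enumerate(frames[frame_index]):
            for kp in person:
                writer.writerow([frame_index, person_index, kp.name,
                                 f"{kp.x:.1f}", f"{kp.y:.1f}", f"{kp.score:.2f}"])
    return buffer.getvalue()


frames = {
    0: [[Keypoint("nose", 320.0, 120.5, 0.98), Keypoint("neck", 321.2, 180.0, 0.95)]],
}
print(keypoints_to_csv(frames))
```

One row per keypoint (a "long" layout) keeps the file valid even when some keypoints are missing for a person in a given frame.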

VisionPose was developed and trained in-house by NEXT-SYSTEM. It uses a bottom-up method, so that analysis speed does not drop during real-time detection even when multiple people are in the frame. It is provided as an SDK (software development kit) that includes everything software engineers need to implement the technology in their own products. To make VisionPose usable in a wide range of fields, it is available for several platforms, including Windows (C# and C++), Linux, and Unity.
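The article does not describe how VisionPose groups keypoints into people. In bottom-up methods generally, all keypoint candidates in the image are detected in a single pass and then assembled into individuals, so the expensive network pass runs once per image rather than once per person. A common assembly step, used for example by part-affinity-field methods such as OpenPose, greedily connects candidate pairs in order of a learned limb score. The sketch below shows only that greedy matching step for one limb type, with a toy distance-based score standing in for the learned one; `pair_score` and `distance_score` are hypothetical.

```python
from itertools import product


def greedy_limb_matching(parents, children, pair_score, min_score=0.1):
    """Greedily pair candidate keypoints of two joint types (e.g. necks and noses).

    parents, children: lists of (x, y) candidates found anywhere in the image.
    pair_score: function rating how likely two candidates form one limb
                (a stand-in for a learned affinity; hypothetical here).
    Returns (parent_index, child_index) pairs; each candidate is used at most once.
    """
    scored = [
        (pair_score(p, c), i, j)
        for (i, p), (j, c) in product(enumerate(parents), enumerate(children))
    ]
    scored.sort(reverse=True)  # best-scoring connections first
    used_parents, used_children, limbs = set(), set(), []
    for score, i, j in scored:
        if score < min_score:
            break  # remaining pairs are too unlikely to be real limbs
        if i in used_parents or j in used_children:
            continue
        used_parents.add(i)
        used_children.add(j)
        limbs.append((i, j))
    return limbs


def distance_score(p, c):
    """Toy affinity: closer candidates score higher (illustrative only)."""
    dist = ((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2) ** 0.5
    return 1.0 / (1.0 + dist / 50.0)


necks = [(100, 200), (400, 210)]  # two people in the frame
noses = [(405, 150), (102, 140)]
print(greedy_limb_matching(necks, noses, distance_score))  # [(0, 1), (1, 0)]
```

Each neck ends up linked to the nose directly above it, which is how detections from one shared pass get split into separate people.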

Let me sum up VisionPose’s most distinctive points for you:

Detects 30 keypoints of the human body in real time at up to 60 FPS
3D inference possible with two cameras or with a single camera
Two ready-to-use apps included
1) Real-time pose estimation displayed on the video (source code included)
2) Pose estimation using electronic files of videos or still images
No usage restrictions, including commercial use
Multi-platform

Real-time keypoint detection runs at up to 60 FPS, and thanks to its AI-based image analysis, VisionPose performs pose estimation with high accuracy and low noise. It can also detect small children and people outdoors in direct sunlight, which is difficult for conventional infrared cameras.

Comparison of Pose Estimation with VisionPose (right) and a Depth Sensor Camera (left)
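At 60 FPS, capture, inference, and drawing together must fit into roughly 1000 / 60 ≈ 16.7 ms per frame. The following sketch computes that budget and measures the throughput of a processing loop. The `time.sleep` call is a hypothetical stand-in for an actual pose-estimation call and is not part of the SDK.

```python
import time


def frame_budget_ms(target_fps):
    """Time available per frame, in milliseconds, to sustain target_fps."""
    return 1000.0 / target_fps


def measure_fps(process_frame, frames):
    """Run process_frame on every frame and return the achieved frames per second."""
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")


print(f"{frame_budget_ms(60):.1f} ms per frame at 60 FPS")  # 16.7 ms per frame at 60 FPS
print(f"{frame_budget_ms(30):.1f} ms per frame at 30 FPS")  # 33.3 ms per frame at 30 FPS

# Hypothetical stand-in for pose estimation: sleep 5 ms per frame.
fps = measure_fps(lambda frame: time.sleep(0.005), frames=list(range(20)))
print(f"achieved about {fps:.0f} FPS")
```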

VisionPose is designed for general-purpose use, so it is trained on postures that people take in daily life. NEXT-SYSTEM took around 7,000 pictures of our employees in all kinds of poses and from all kinds of angles in front of a green screen, and then created tens of thousands of training images by adding further variations, such as synthesized backgrounds. Keypoints were then annotated on all of these images by hand, which made a high level of accuracy possible.

Shooting Data for Additional Learning
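Compositing a green-screen shot onto a new background replaces only background pixels, so the person's pose and position are identical in every synthesized variant of a shot. Here is a minimal chroma-key sketch on a tiny image stored as rows of RGB tuples. The "green dominates red and blue by a margin" rule is a simplifying assumption, not NEXT-SYSTEM's actual pipeline.

```python
def is_green_screen(pixel, margin=40):
    """Treat a pixel as green screen if green clearly dominates red and blue (simple heuristic)."""
    r, g, b = pixel
    return g > r + margin and g > b + margin


def composite(foreground, background, margin=40):
    """Replace green-screen pixels in foreground with the matching background pixels.

    Both images are lists of rows of (R, G, B) tuples with identical dimensions.
    Person pixels are copied unchanged, so the person's pose and position are
    the same in every composited variant.
    """
    return [
        [bg if is_green_screen(fg, margin) else fg for fg, bg in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(foreground, background)
    ]


GREEN, SKIN, SKY = (0, 255, 0), (224, 172, 105), (135, 206, 235)
shot = [[GREEN, SKIN], [SKIN, GREEN]]  # 2x2 "photo" of a person on green
backdrop = [[SKY, SKY], [SKY, SKY]]    # new background to synthesize
print(composite(shot, backdrop))
# [[(135, 206, 235), (224, 172, 105)], [(224, 172, 105), (135, 206, 235)]]
```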

Let me now get to the different versions of the VisionPose series.

Introducing the VisionPose Lineup

The VisionPose SDK comes in two versions: “VisionPose Standard,” which performs 3D inference with two cameras, and “VisionPose Single3D,” which performs 3D inference with a single camera.

Below, I will briefly summarize the features of the two versions.

VisionPose Standard SDK

VisionPose Standard is, as the name suggests, the standard version of VisionPose. It detects up to 30 skeletal keypoints for each detected body, including the tips of the hands and feet, so very detailed results can be obtained. 2D inference needs only one camera, while 3D inference uses two cameras. As of April 2022, NEXT-SYSTEM has released VisionPose Standard for Windows C#, Windows C++, and Linux.

30 keypoints that can be detected by VisionPose Standard

For more details, including pricing, see the VisionPose Standard official website.
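The article does not explain how VisionPose Standard combines its two camera views. A standard technique for recovering a 3D point from two calibrated cameras is linear (DLT-style) triangulation: each camera's 3×4 projection matrix plus the 2D keypoint it observes yields two linear equations in the unknown 3D position, and the four equations are solved by least squares. The sketch below assumes the projection matrices are already known from calibration; the rig values are made up for illustration.

```python
def triangulate(P1, uv1, P2, uv2):
    """Linear (DLT-style) triangulation of one keypoint seen by two calibrated cameras.

    P1, P2: 3x4 projection matrices (lists of rows), assumed known from calibration.
    uv1, uv2: the keypoint's (u, v) image coordinates in each camera.
    Returns the (x, y, z) point that minimizes the algebraic error by least squares.
    """
    rows = []
    for P, (u, v) in ((P1, uv1), (P2, uv2)):
        # From u = (P[0]·X) / (P[2]·X) and v = (P[1]·X) / (P[2]·X) with X = (x, y, z, 1):
        rows.append([u * P[2][k] - P[0][k] for k in range(4)])
        rows.append([v * P[2][k] - P[1][k] for k in range(4)])
    # Each row r gives r[0]x + r[1]y + r[2]z = -r[3]; build normal equations A^T A p = A^T b.
    M = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(-r[3] * r[i] for r in rows) for i in range(3)]
    return solve3(M, rhs)


def solve3(M, b):
    """Solve a 3x3 linear system with Gaussian elimination and partial pivoting."""
    A = [row[:] + [bi] for row, bi in zip(M, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, 3):
            factor = A[r][col] / A[col][col]
            A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (A[i][3] - sum(A[i][j] * x[j] for j in range(i + 1, 3))) / A[i][i]
    return tuple(x)


def project(P, X):
    """Project a 3D point with a 3x4 matrix and return (u, v)."""
    h = [sum(P[i][k] * c for k, c in enumerate((*X, 1.0))) for i in range(3)]
    return h[0] / h[2], h[1] / h[2]


# Hypothetical rig: identical normalized cameras, the second shifted 1 unit along x.
P_left = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
P_right = [[1, 0, 0, -1], [0, 1, 0, 0], [0, 0, 1, 0]]
wrist = (0.2, 0.1, 3.0)
estimate = triangulate(P_left, project(P_left, wrist), P_right, project(P_right, wrist))
print([round(c, 6) for c in estimate])  # [0.2, 0.1, 3.0]
```

With noise-free projections the original point is recovered exactly; with real detections the least-squares solution absorbs small disagreements between the two views. OpenCV provides `cv2.triangulatePoints` for the same task.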

VisionPose Single3D

Using only one camera, VisionPose Single3D detects up to 17 skeletal keypoints of the whole body in 3D and 30 keypoints in 2D. This makes it easier to use for game development, VR/AR development, and any other kind of motion capture project. As of April 2022, VisionPose Single3D has been released for Windows Unity only.

For more details, see the VisionPose Single3D for Windows Unity official website.

Introducing the Included Sample Apps

Both versions of VisionPose include two sample applications, which can be used immediately after activation of the SDK:

BodyAndColor, a sample application for real-time camera image analysis, and VP Analyzer, an application for analyzing still image and video files. Let me briefly summarize what they can do.

BodyAndColor

BodyAndColor is a real-time skeleton visualization application. Using the skeletal coordinate data obtained from webcam images, it draws each skeleton with color-coded lines for each part of the body.

Visualization of Skeletons as Color Lines Through BodyAndColor
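A skeleton view like this boils down to turning keypoint coordinates into line segments, with one color per body part. The keypoint names, bone list, colors, and confidence threshold below are hypothetical and not taken from BodyAndColor; the resulting segments could be passed to any drawing routine, such as OpenCV's `cv2.line`.

```python
# Hypothetical subset of bones grouped by body part; names and colors are illustrative,
# not the keypoint set or palette BodyAndColor actually uses.
BONES = {
    "torso": [("neck", "right_hip"), ("neck", "left_hip")],
    "right_arm": [("right_shoulder", "right_elbow"), ("right_elbow", "right_wrist")],
    "left_arm": [("left_shoulder", "left_elbow"), ("left_elbow", "left_wrist")],
}
COLORS = {"torso": (255, 255, 0), "right_arm": (255, 0, 0), "left_arm": (0, 0, 255)}


def skeleton_segments(keypoints, min_score=0.3):
    """Turn {name: (x, y, score)} into colored line segments ready for drawing.

    A bone is kept only if both endpoint keypoints were detected with at least
    min_score confidence, so missing or uncertain joints leave gaps.
    """
    segments = []
    for part, bones in BONES.items():
        for start, end in bones:
            a, b = keypoints.get(start), keypoints.get(end)
            if a is None or b is None or min(a[2], b[2]) < min_score:
                continue
            segments.append(((a[0], a[1]), (b[0], b[1]), COLORS[part]))
    return segments


detected = {
    "neck": (320, 180, 0.95), "right_hip": (300, 330, 0.90), "left_hip": (340, 330, 0.88),
    "right_shoulder": (280, 185, 0.92), "right_elbow": (260, 250, 0.85),
    "right_wrist": (255, 310, 0.20),  # low confidence: this bone is skipped
}
for start, end, color in skeleton_segments(detected):
    print(start, end, color)
```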

In VisionPose Single3D, BodyAndColor comes with model data for MICHICO, a 3DCG character created and designed in-house by NEXT-SYSTEM. The app can therefore be used right away to transfer the motion captured by the camera onto the 3DCG character in real time.

Motion Capture Sample App “BodyAndColor with MICHICO”

VP Analyzer

VP Analyzer is a file analysis application that detects skeletal keypoints in videos and still images. The user can obtain highly accurate skeletal data from pre-recorded video and still image files, and the results can be exported as a video and in CSV format.

File Analysis Application “VP Analyzer”

Still image file analysis is possible for JPG, PNG, BMP files, and the result output is available as JPG or PNG. For video files, the user can input AVI, MP4, WMV, MOV files, and will receive the result file in AVI format.
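These format rules can be expressed as a small lookup. The following helper is a hypothetical illustration, not part of VP Analyzer; in particular, the `_result` file-name suffix is an assumption.

```python
from pathlib import Path

# Input and output formats as listed for VP Analyzer; the helper itself is hypothetical.
IMAGE_INPUTS = {".jpg", ".png", ".bmp"}
VIDEO_INPUTS = {".avi", ".mp4", ".wmv", ".mov"}


def output_path(input_file, image_format="png"):
    """Map an input file to a result file name following VP Analyzer's format rules.

    Still images (JPG, PNG, BMP) produce a JPG or PNG result; videos
    (AVI, MP4, WMV, MOV) always produce an AVI result.
    """
    path = Path(input_file)
    suffix = path.suffix.lower()
    if suffix in IMAGE_INPUTS:
        if image_format not in ("jpg", "png"):
            raise ValueError("still-image results are JPG or PNG")
        return path.with_name(f"{path.stem}_result.{image_format}")
    if suffix in VIDEO_INPUTS:
        return path.with_name(f"{path.stem}_result.avi")
    raise ValueError(f"unsupported input format: {suffix or '(none)'}")


print(output_path("squat.MOV"))        # squat_result.avi
print(output_path("pose.bmp", "jpg"))  # pose_result.jpg
```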

Who is using VisionPose and for what?

Now that you know about VisionPose's technical background, let's move on to the practical side of the topic and look at how the SDK has been used so far.

With application fields ranging from motion analysis in sports and fitness, through workflow analysis and hazard detection in factories and safety monitoring in childcare and nursing care, to motion capture in entertainment and gaming, VisionPose is currently used by a total of 250 clients, including several major Japanese companies. Here are a few of the most outstanding examples.

Toyota Motor Corporation

Toyota Motor Corporation used VisionPose in its rehabilitation robot “Welwalk WW-2000”, which uses pose estimation to assist the rehabilitation of lower limbs paralyzed by strokes or other injuries. The Welwalk displays information, including a gait analysis guidance function, on a large monitor in front of the patient. Patients can practice a natural walking style with the aim of walking unaided again, while physiotherapists are relieved of physically demanding tasks such as lifting patients and catching them when they fall.

Toyota Motor Corporation’s Welwalk WW-2000

You can also find more detailed information on this exciting example in the Toyota Times!

Sports Science Laboratory

The lab used VisionPose to analyze its data on over 4,000 baseball players in order to identify the relationship between pitching motion and pitching-related disorders. From this analysis it built a large-scale video database, which was then implemented in an app that helps athletes prevent injuries and improve their performance. This database and technology are expected to be indispensable for future sports medicine research.

NEC Solution Innovators, Ltd.

NEC Solution Innovators, Ltd. implemented VisionPose in its face recognition software package “Bio-IDiom KAOATO”. The system compares a person captured by a video camera with a registered face image and determines whether they are the same person. VisionPose helped to increase authentication accuracy and to strengthen the anti-spoofing functions.

Apart from these examples, the VisionPose SDK has also been applied in fields of musculoskeletal analysis, video motion analysis systems, and even in arcade games and dance evaluation!

By the way, NEXT-SYSTEM has also used VisionPose for in-house development, for example to create the motion capture app “MICHICON-Plus” and the workout counter app “IETORE”, both of which are available for free on the iOS App Store.

The app “IETORE” uses VisionPose pose estimation to automatically count the number of repetitive movements, such as squats.
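The article does not say how IETORE counts repetitions. One simple approach, sketched here as an assumption, is to compute the knee angle from the hip, knee, and ankle keypoints in each frame and count a squat each time the angle drops below a lower threshold and then rises above an upper one. The threshold values are illustrative, not tuned.

```python
import math


def joint_angle(a, b, c):
    """Angle at point b in degrees, formed by segments b->a and b->c (e.g. hip-knee-ankle)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding errors


def count_squats(knee_angles, down_below=100.0, up_above=160.0):
    """Count repetitions: knee goes below down_below, then back above up_above."""
    reps, is_down = 0, False
    for angle in knee_angles:
        if not is_down and angle < down_below:
            is_down = True
        elif is_down and angle > up_above:
            is_down = False
            reps += 1
    return reps


# Standing straight: hip, knee, and ankle on one vertical line -> 180 degrees.
print(round(joint_angle((0, 0), (0, 50), (0, 100))))  # 180
# Simulated knee angles over two squats, with some jitter around the bottom.
angles = [175, 150, 120, 95, 102, 90, 130, 165, 170, 110, 85, 140, 168]
print(count_squats(angles))  # 2
```

Using two thresholds instead of one (hysteresis) keeps per-frame keypoint jitter around a single threshold from being counted as extra repetitions.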

As you can see, VisionPose is a true all-rounder, used in a vast range of fields and industries.

From here on out

This type of skeletal detection and tracking technology will be used even more widely in the future. Improvements to the technology have made it possible to implement it in many different ways, and it is attracting attention as a tool that will play an important role in the development of our society from here on. Here are a few examples of fields where rapid development is underway:

Athletes’ movement analysis during competition, such as form check and scoring assistance
Analysis of consumer behavior by analyzing movements inside the store
Detecting suspicious behavior and shoplifting through pose estimation via surveillance cameras
Accident detection and work supervision in manufacturing sites

The last one is actually something NEXT-SYSTEM has been working on in-house: “VP-Motion”, a behavior recognition and analysis system that uses VisionPose pose estimation to identify a person's behavior. Users can train the system by labeling the behaviors to be identified, which makes it highly adaptable to a wide range of work environments.

The system can help to spot accidents (e.g. a person falling down), monitor routine workflows (e.g. in factories) by detecting behavior that differs from routines, as well as detect suspicious behavior (e.g. shoplifting).

VP-Motion Introduction Video

The system is currently still under development, but a release is planned for the near future.
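VP-Motion's model is not described in the article. To illustrate what "user-trainable by labeling" can mean in its simplest form, here is a nearest-centroid classifier: it averages labeled feature vectors per behavior and assigns a new frame to the closest average. The two features and all values are hypothetical; a practical system would use richer features and the temporal sequence of poses.

```python
import math
from collections import defaultdict


def train_centroids(samples):
    """Average the feature vectors of each user-supplied label.

    samples: list of (label, feature_vector) pairs, e.g. features derived from keypoints.
    """
    sums, counts = {}, defaultdict(int)
    for label, features in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(centroids, features):
    """Return the label whose centroid is closest (Euclidean distance) to features."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical per-frame features: (hip height above the bottom of the frame as a
# fraction of frame height, torso tilt from vertical in degrees divided by 90).
labeled = [
    ("standing", [0.55, 0.05]), ("standing", [0.52, 0.10]),
    ("bending", [0.45, 0.55]), ("bending", [0.48, 0.60]),
    ("fallen", [0.10, 0.95]), ("fallen", [0.12, 0.90]),
]
centroids = train_centroids(labeled)
print(classify(centroids, [0.15, 0.85]))  # fallen
```

Adding a new behavior only requires labeling a few more examples and retraining, which mirrors the adaptability described above in the simplest possible form.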

As you can see, VisionPose has laid the groundwork for the general implementation of AI pose estimation, and we are excited for everything that is about to come.

Find more information about VisionPose on the official website.

And for fellow developers: NEXT-SYSTEM provides a 30-day Free Trial of the VisionPose SDK, with all functions included. You can apply for it on the VisionPose website!

About NEXT-SYSTEM Co., Ltd.
NEXT-SYSTEM is a technology company founded in 2002 in Fukuoka City, Japan. Since then, it has focused on researching behavior analysis with AI technology, developing ergonomic systems and cutting-edge xR (AR/VR/MR) systems, and developing and selling its pose estimation AI engine “VisionPose” and AR signage system “Kinesys”.

Company Website

Published by NEXT-SYSTEM on 02/07/2022
