What is this?

This is a driver for optical hand tracking. The code was mostly written by Moses Turner, with tons of help from Marcus Edel, Jakob Bornecrantz, Ryan Pavlik, and Christoph Haag. Jakob Bornecrantz and Marcus Edel were the main people who gathered training data for the initial Collabora models.

Currently, it works with the Valve Index. In the past, it was tested with a Luxonis 1090ffc, and in the future it should work fine with devices like the T265, the Leap Motion Controller (with LeapUVC), or the PS4/PS5 camera, if there's enough interest in any of those.

Under good lighting, I'd say it's roughly as good as the Oculus Quest 2's hand tracking. I'm not trying to make any formal claims; that's just what I'd honestly tell somebody wondering whether it's worth trying out.

How to get started

Get dependencies

Get OpenCV

Each distro has its own way to get OpenCV, and it can change at any time; there's no specific reason to trust this documentation over anything else.

That said, on Ubuntu it would look something like:

sudo apt install libopencv-dev libopencv-contrib-dev

Or you could build it from source, or get it from one of the other 1000s of package managers. Whatever floats your boat.

Get ONNXRuntime

I followed the instructions here: https://onnxruntime.ai/docs/how-to/build/inferencing.html#linux

Then I had to run:

cd build/Linux/RelWithDebInfo/
sudo make install

Get the ML models

Make sure you have git-lfs installed, then run ./scripts/get-ht-models.sh. It should work fine.

Building the driver

Once onnxruntime is installed, you should be able to build like normal with CMake or Meson.

If everything was found properly, CMake should say

-- Found ONNXRUNTIME: /usr/local/include/onnxruntime

[...]

-- #    DRIVER_HANDTRACKING: ON

and Meson should say

Run-time dependency libonnxruntime found: YES 1.8.2

[...]

Message: Configuration done!
Message:     drivers:  [...] handtracking, [...]

Running the driver

Currently, it's only set up to work on Valve Index.

So, you have two options:

  • Use the survive driver with both controllers off. It should automagically start hand tracking when it doesn't find any controllers.
  • Use the vive driver with VIVE_USE_HANDTRACKING=ON set in the environment. It should work the same as the survive driver.

You can see if the driver is working with openxr-simple-playground, StereoKit, or any other app you know of. Poke me (Moses) if you find any other cool hand-tracking apps; I'm always looking for more!

Tips and tricks

This tracking likes to be in a bright, evenly-lit room with multiple light sources. Turn all the lights on, see if you can find any lamps. If the ML models can see well, the tracking quality can get surprisingly nice.

Sometimes, the tracking fails when it can see more than one hand. As the tracking gets better (we train better ML models and squash more bugs), this should happen less often or not at all. If it does happen, put one of your hands down, and it should resume tracking the remaining hand just fine.

Future improvements

  • Get more training data; train better ML models.
  • Improve the tracking math
    • Be smarter about keeping tracking lock on a hand
    • Try predicting the next bounding box from the estimated keypoints of the last few frames instead of blindly trusting the detection model, so that the detection model doesn't have to run on every single frame.
    • Instead of directly doing disparity on the observed keypoints, fit a kinematic model of the hand to the 2D observations; this should get rid of a lot of jitter and look better to the end user when the ML models fail.
    • Make something that also works with non-stereo camera setups (mono, trinocular, or N cameras).
  • Optionally run the ML models on the GPU; currently, everything is CPU-bound, which could be dumb under some circumstances.
  • Write a lot of generic code so that you can run this on any stereo camera
  • More advanced prediction/interpolation code that doesn't care at all about the input frame cadence. One-euro filters are pretty good about this, but we can get better!
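The list above mentions "directly doing disparity on the observed keypoints". For readers unfamiliar with the idea, here is a minimal sketch of classic rectified-stereo triangulation of a single keypoint, assuming the two camera images are already rectified so that both share one focal length and principal point and matching points lie on the same row. The struct and function names (`RectifiedStereo`, `Point3`, `triangulate`) are hypothetical and not taken from the driver; real parameters would come from the headset's camera calibration.

```cpp
#include <cassert>
#include <optional>

// Hypothetical parameters of an already-rectified stereo pair: after
// rectification both images share one focal length and principal point,
// and matching points lie on the same image row.
struct RectifiedStereo {
    double focal_px;   // focal length, pixels
    double cx;         // principal point x, pixels
    double cy;         // principal point y, pixels
    double baseline_m; // distance between the two camera centres, metres
};

struct Point3 {
    double x, y, z; // metres, in the left camera's frame
};

// Triangulate one keypoint observed at column u_left in the left image,
// column u_right in the right image, and row v in both.
// Returns std::nullopt when the disparity is not positive (point at
// infinity, behind the cameras, or a bad keypoint match).
std::optional<Point3> triangulate(const RectifiedStereo &cam, double u_left,
                                  double u_right, double v)
{
    assert(cam.focal_px > 0.0 && cam.baseline_m > 0.0);
    const double disparity = u_left - u_right;
    if (disparity <= 0.0) {
        return std::nullopt;
    }
    // Depth from similar triangles: z = f * B / d.
    const double z = cam.focal_px * cam.baseline_m / disparity;
    // Back-project the left-image pixel out to that depth.
    return Point3{(u_left - cam.cx) * z / cam.focal_px,
                  (v - cam.cy) * z / cam.focal_px, z};
}
```

With f = 400 px, principal point (320, 240), B = 0.064 m, and a fingertip at column 352 in the left image and 320 in the right (row 240 in both), the disparity is 32 px, giving z = 400 · 0.064 / 32 = 0.8 m and x = 32 · 0.8 / 400 = 0.064 m. Because each keypoint is triangulated independently, any per-keypoint error from the ML models goes straight into the 3D result, which is why fitting a kinematic hand model instead is expected to reduce jitter.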
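The last item in the list above mentions one-euro filters. As a reference point, here is a textbook scalar one-euro filter (Casiez, Roussel & Vogel, CHI 2012) in C++. It is a minimal sketch, not the driver's own filtering code; the class name and the parameter names `min_cutoff_hz`, `beta`, and `d_cutoff_hz` are illustrative. It takes the time since the previous sample on every call, which is what lets it tolerate an uneven input frame cadence.

```cpp
#include <cassert>
#include <cmath>

// Textbook scalar one-euro filter: a first-order low-pass filter whose
// cutoff frequency rises with the (smoothed) speed of the signal.
class OneEuroFilter
{
public:
    OneEuroFilter(double min_cutoff_hz, double beta, double d_cutoff_hz)
        : min_cutoff_(min_cutoff_hz), beta_(beta), d_cutoff_(d_cutoff_hz)
    {}

    // Filter sample x that arrived dt seconds after the previous one.
    // Passing dt per call lets the filter cope with uneven frame timing.
    double filter(double x, double dt)
    {
        assert(dt > 0.0);
        if (!initialized_) {
            initialized_ = true;
            x_prev_ = x;
            dx_prev_ = 0.0;
            return x; // nothing to smooth against yet
        }
        // Low-pass-filtered estimate of the signal's speed.
        const double dx = (x - x_prev_) / dt;
        const double a_d = alpha(d_cutoff_, dt);
        const double dx_hat = a_d * dx + (1.0 - a_d) * dx_prev_;

        // Slow motion -> low cutoff -> strong smoothing (less jitter);
        // fast motion -> high cutoff -> weak smoothing (less lag).
        const double cutoff = min_cutoff_ + beta_ * std::fabs(dx_hat);
        const double a = alpha(cutoff, dt);
        const double x_hat = a * x + (1.0 - a) * x_prev_;

        x_prev_ = x_hat;
        dx_prev_ = dx_hat;
        return x_hat;
    }

private:
    // Smoothing factor of an exponential low-pass filter with this cutoff.
    static double alpha(double cutoff_hz, double dt)
    {
        constexpr double kPi = 3.14159265358979323846;
        const double tau = 1.0 / (2.0 * kPi * cutoff_hz);
        return 1.0 / (1.0 + tau / dt);
    }

    double min_cutoff_;
    double beta_;
    double d_cutoff_;
    bool initialized_ = false;
    double x_prev_ = 0.0;  // previous filtered value
    double dx_prev_ = 0.0; // previous filtered speed
};
```

This version estimates speed from the previous filtered value; implementations differ on that detail. One way to use it is to keep one instance per coordinate of each keypoint and pass the measured time between camera frames as `dt`. Raising `beta` trades jitter for lower lag during fast hand motion.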