YulongWuSYEEM-5468 asked YulongWuSYEEM-5468 commented

Audio and color image are not synced

Hi there, we have purchased many Azure Kinect DK devices to make recordings that include both audio and video. I use this library to record the RGB stream and the audio stream. However, the resulting audio and video are not strictly synced. For a 1.5-hour recording, the video is generally about 2 seconds shorter than the audio.

Based on this, I suspect that the audio and video are generated from different clocks. Am I right?

I also checked the endpoint descriptors of the Kinect's USB connection to find out the data transfer types, and I found that:
- Audio uses the isochronous transfer type, more specifically asynchronous isochronous.
- The RGB image stream uses both isochronous and bulk transfers.

I think I am probably right; otherwise the output audio and video should be well synced.

Looking forward to your reply.

Thanks so much!



azure-kinect-dk

Please help answer, as it is quite urgent!


Hi @YulongWuSYEEM-5468 ,

I am not sure whether this GitHub repo (https://github.com/bibigone/k4a.net) is useful for Azure Kinect DK. Please correct me if I am missing any details here.



Have you referred to this documentation?: https://docs.microsoft.com/en-us/azure/kinect-dk/azure-kinect-recorder




Hi @SatishBoddu-MSFT, thanks so much for your response. In response to your comment:
1. I believe the issue described has nothing to do with this library, for two reasons:
1) I only use this library to record the data from the color camera, and it uses the native k4arecord.dll to do so. The library is simply a wrapper.
2) Even without this library, I ran the following experiment. With the official Kinect DK recorder, I made a recording with a duration of 1:30:34.23 (real elapsed time); the recording has 162965 frames. At 30 fps, the recording itself has a duration of 1:30:32.17 (162965/30 seconds), which is more than 2 seconds shorter than the true duration.
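The duration arithmetic above can be checked with a short sketch. The function and variable names are illustrative only and are not part of any Kinect SDK; the script converts the frame count into a duration at the nominal 30 fps, compares it with the elapsed time quoted above, and expresses the shortfall as a relative rate error.

```python
def hms_to_seconds(hours, minutes, seconds):
    """Convert an h:m:s duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def nominal_duration(frame_count, fps):
    """Duration implied by the frame count at the nominal frame rate."""
    return frame_count / fps


# Figures taken from the experiment described above.
elapsed_s = hms_to_seconds(1, 30, 34.23)   # real elapsed time
video_s = nominal_duration(162965, 30)     # duration implied by the frames

shortfall_s = elapsed_s - video_s
# Relative error in parts per million, assuming no frames were dropped.
shortfall_ppm = shortfall_s / elapsed_s * 1e6

print(f"elapsed: {elapsed_s:.2f} s, video: {video_s:.2f} s")
print(f"shortfall: {shortfall_s:.2f} s ({shortfall_ppm:.0f} ppm)")
```

A shortfall of this size could come from a camera clock that runs slightly slow, from dropped frames, or from both; the arithmetic alone cannot tell these apart.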

Yes, I have read almost all of the articles related to Azure Kinect. The official recorder cannot record audio, but we need to record audio and the color image at the same time, so we cannot use this recorder.


Thanks again for your response, and looking forward to your follow-up.


Thanks for the information, I have escalated it internally, will get back to you soon.


Hi @YulongWuSYEEM-5468 , Below is the response from the product team.

The mic array and camera subsystems are entirely separate pieces of hardware with no synchronization support. You may be interested in developing your own synchronization code on the host PC.

Please refer to this GitHub issue: The saved data of the mic array is not synced with that of the color camera.

Please comment in the below section for further help on this topic.


Hi Satish, thanks so much for your reply. Actually, the GitHub issue was also submitted by me.

In addition, as we have to do the synchronization ourselves, we need to consider some other situations:

  1. Is it possible that some video frames are dropped inside the device? If frames are dropped but the host computer does not know about it, then we cannot do the synchronization. The same question applies to the audio samples.

  2. The audio samples and video frames appear to be evenly distributed along the timeline. However, every clock has its own characteristics, and its frequency will certainly change over time. I know that PLLs are used to stabilize the clock frequency, but I believe some frequency drift still remains. How much could this drift affect the production of video frames and audio samples? I think I may need the specifications of each clock.
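As a rough way to measure such drift on the host, one option is to fit a straight line between device timestamps and host arrival times and read the slope as the relative clock rate. The sketch below is only an illustration on synthetic data: `fit_clock` is a hypothetical helper, the 380 ppm figure is made up for the demo, and real host arrival times would also contain USB and OS scheduling jitter that this example ignores.

```python
def fit_clock(device_ts, host_ts):
    """Least-squares fit of host = rate * device + offset.

    Returns (rate, offset). A rate above 1.0 means the device clock
    runs slow relative to the host clock.
    """
    n = len(device_ts)
    mean_d = sum(device_ts) / n
    mean_h = sum(host_ts) / n
    var_d = sum((d - mean_d) ** 2 for d in device_ts)
    cov_dh = sum((d - mean_d) * (h - mean_h)
                 for d, h in zip(device_ts, host_ts))
    rate = cov_dh / var_d
    offset = mean_h - rate * mean_d
    return rate, offset


# Synthetic example (assumed values): 1000 frames at a nominal 30 fps,
# with the host clock starting 0.5 s later and running 380 ppm faster.
device_ts = [i / 30 for i in range(1000)]
host_ts = [0.5 + t * (1 + 380e-6) for t in device_ts]

rate, offset = fit_clock(device_ts, host_ts)
drift_ppm = (rate - 1.0) * 1e6
print(f"rate: {rate:.6f}, offset: {offset:.3f} s, drift: {drift_ppm:.0f} ppm")
```

Running the same fit separately for the audio and video streams would give two rates against one common host clock, which is one possible basis for resampling either stream onto the other's timeline.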

Thanks so much and looking forward to your reply.




Oh thank you for confirming the Github issue!

On the first point, I need to check as I too had the same question!
Let me see if I can find any information on the "frame dropping issue".

QuentinMiller-3866 answered YulongWuSYEEM-5468 commented

The audio is not synced with the RGB, depth, and IMU. It is infeasible to detect the audio frame drops by the timestamp system applied to RGB, depth, and IMU. Because the IMU sample rate is much higher than RGB and depth, it would be possible to use the IMU stream to detect RGB and/or depth frame drops.


Thanks @QuentinMiller-3866. Just for your reference, every RGB frame carries a timestamp that the Kinect assigns when the frame is generated. I have checked that the timestamps are essentially evenly spaced along the timeline, so I think it is possible to detect dropped video frames by checking the timestamp increments.
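A minimal sketch of that check, assuming the device timestamps are available as a list of microsecond values and the nominal frame rate is known. `find_dropped_frames` is a hypothetical helper, not an SDK function, and the sample timestamps are invented for illustration.

```python
def find_dropped_frames(timestamps_us, fps=30):
    """Return (index, missing_count) for each gap in the timestamps.

    `index` is the position of the first frame after the gap, and
    `missing_count` is how many nominal frame periods were skipped.
    """
    period_us = 1_000_000 / fps
    gaps = []
    for i in range(1, len(timestamps_us)):
        delta = timestamps_us[i] - timestamps_us[i - 1]
        # Round to the nearest whole number of frame periods so that
        # small timestamp jitter is not reported as a drop.
        missing = round(delta / period_us) - 1
        if missing >= 1:
            gaps.append((i, missing))
    return gaps


# Invented example: one frame is missing between the 3rd and 4th entries.
timestamps_us = [0, 33_333, 66_666, 133_333, 166_666]
print(find_dropped_frames(timestamps_us))
```

This only finds drops that leave a visible gap in the device timestamps; it cannot say anything about the audio stream, which does not share this timestamp system.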


This is true for RGB but not true for depth if you sync the RGB and depth cameras. The time the depth frame is taken depends on the exposure time of the RGB camera so that the exposure of the depth camera is approximately 1/3 into the exposure of the RGB camera. Auto exposure of RGB will move this around based on changes to the lighting conditions.


Will auto exposure of RGB make the fps vary with time?


In addition, may I ask how long the exposure time of the color camera is? My concern is this: when the host computer receives a video frame, when was that frame actually exposed? 33 ms earlier than its arrival at the host PC, or less than 33 ms?

QuentinMiller-3866 answered QuentinMiller-3866 edited

@YulongWuSYEEM-5468 the RGB fps is not affected by auto exposure. The maximum exposure is capped at the time between frames permitted by the selected fps. The time between depth frames may vary because, when the RGB and depth cameras are synced, the center of exposure of the depth camera is placed 1/3 into the RGB exposure time. The average fps will be the selected fps.
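To illustrate the two rules described here, the following sketch computes the exposure cap for a given fps and the depth center-of-exposure offset for a given RGB exposure. The function names and the example exposure values are assumptions made for illustration; they are not taken from the SDK or from device specifications.

```python
def max_exposure_us(fps):
    """Upper bound on RGB exposure: the time between frames at this fps."""
    return 1_000_000 / fps


def depth_center_offset_us(rgb_exposure_us):
    """Depth center of exposure, measured from the start of RGB exposure."""
    return rgb_exposure_us / 3


fps = 30
cap_us = max_exposure_us(fps)

# Two made-up auto-exposure values, both below the cap.
short_us, long_us = 8_000, 30_000
shift_us = depth_center_offset_us(long_us) - depth_center_offset_us(short_us)

print(f"exposure cap at {fps} fps: {cap_us:.0f} us")
print(f"depth timing shift between the two exposures: {shift_us:.0f} us")
```

When auto exposure moves between the two values, the depth capture point moves by the printed amount, which is why the spacing between individual depth frames can vary while the average rate stays at the selected fps.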
