Locatable camera

HoloLens includes a world-facing camera mounted on the front of the device, which enables apps to see what the user sees. Developers have access to and control of the camera, just as they would for color cameras on smartphones, portables, or desktops. The same Universal Windows media capture and Windows Media Foundation APIs that work on mobile and desktop work on HoloLens. Unity has also wrapped these Windows APIs to abstract simple usage of the camera on HoloLens for tasks such as taking regular photos and videos (with or without holograms) and locating the camera's position in and perspective on the scene.

Device camera information

HoloLens (first-generation)

  • Fixed-focus photo/video (PV) camera with auto white balance, auto exposure, and full image processing pipeline.

  • A white privacy LED facing the world will illuminate whenever the camera is active.

  • The camera supports the following modes (all modes have a 16:9 aspect ratio) at 30, 24, 20, 15, and 5 fps:

    | Video     | Preview   | Still     | Horizontal Field of View (H-FOV) | Suggested usage                                          |
    |-----------|-----------|-----------|----------------------------------|----------------------------------------------------------|
    | 1280x720  | 1280x720  | 1280x720  | 45deg                            | Default mode with video stabilization                    |
    | N/A       | N/A       | 2048x1152 | 67deg                            | Highest resolution still image                           |
    | 1408x792  | 1408x792  | 1408x792  | 48deg                            | Overscan (padding) resolution before video stabilization |
    | 1344x756  | 1344x756  | 1344x756  | 67deg                            | Large FOV video mode with overscan                       |
    | 896x504   | 896x504   | 896x504   | 48deg                            | Low power / low resolution mode for image processing tasks |

HoloLens 2

  • Auto-focus photo/video (PV) camera with auto white balance, auto exposure, and full image processing pipeline.

  • A white privacy LED facing the world will illuminate whenever the camera is active.

  • HoloLens 2 supports different camera profiles. Learn how to discover and select camera capabilities.

  • The camera supports the following profiles and resolutions (all video modes have a 16:9 aspect ratio):

    | Profile                                      | Video     | Preview   | Still     | Frame rates | Horizontal Field of View (H-FOV) | Suggested usage                             |
    |----------------------------------------------|-----------|-----------|-----------|-------------|----------------------------------|---------------------------------------------|
    | Legacy,0 BalancedVideoAndPhoto,100           | 2272x1278 | 2272x1278 |           | 15,30       | 64.69                            | High quality video recording                |
    | Legacy,0 BalancedVideoAndPhoto,100           | 896x504   | 896x504   |           | 15,30       | 64.69                            | Preview stream for high quality photo capture |
    | Legacy,0 BalancedVideoAndPhoto,100           |           |           | 3904x2196 |             | 64.69                            | High quality photo capture                  |
    | BalancedVideoAndPhoto,120                    | 1952x1100 | 1952x1100 | 1952x1100 | 15,30       | 64.69                            | Long duration scenarios                     |
    | BalancedVideoAndPhoto,120                    | 1504x846  | 1504x846  |           | 15,30       | 64.69                            | Long duration scenarios                     |
    | VideoConferencing,100                        | 1952x1100 | 1952x1100 | 1952x1100 | 15,30,60    | 64.69                            | Video conferencing, long duration scenarios |
    | VideoConferencing,100                        | 1504x846  | 1504x846  |           | 5,15,30,60  | 64.69                            | Video conferencing, long duration scenarios |
    | VideoConferencing,100 BalancedVideoAndPhoto,120 | 1920x1080 | 1920x1080 | 1920x1080 | 15,30    | 64.69                            | Video conferencing, long duration scenarios |
    | VideoConferencing,100 BalancedVideoAndPhoto,120 | 1280x720  | 1280x720  | 1280x720  | 15,30    | 64.69                            | Video conferencing, long duration scenarios |
    | VideoConferencing,100 BalancedVideoAndPhoto,120 | 1128x636  |           |           | 15,30    | 64.69                            | Video conferencing, long duration scenarios |
    | VideoConferencing,100 BalancedVideoAndPhoto,120 | 960x540   |           |           | 15,30    | 64.69                            | Video conferencing, long duration scenarios |
    | VideoConferencing,100 BalancedVideoAndPhoto,120 | 760x428   |           |           | 15,30    | 64.69                            | Video conferencing, long duration scenarios |
    | VideoConferencing,100 BalancedVideoAndPhoto,120 | 640x360   |           |           | 15,30    | 64.69                            | Video conferencing, long duration scenarios |
    | VideoConferencing,100 BalancedVideoAndPhoto,120 | 500x282   |           |           | 15,30    | 64.69                            | Video conferencing, long duration scenarios |
    | VideoConferencing,100 BalancedVideoAndPhoto,120 | 424x240   |           |           | 15,30    | 64.69                            | Video conferencing, long duration scenarios |


Customers can use mixed reality capture to take videos or photos of your app that include holograms and video stabilization.

As a developer, there are considerations you should take into account when creating your app if you want it to look as good as possible when a customer captures content. You can also enable (and customize) mixed reality capture from directly within your app. Learn more at mixed reality capture for developers.

Locating the Device Camera in the World

When HoloLens takes photos and videos, the captured frames include the location of the camera in the world, as well as the lens model of the camera. This allows applications to reason about the position of the camera in the real world for augmented imaging scenarios. Developers can creatively roll their own scenarios using their favorite image processing or custom computer vision libraries.

"Camera" elsewhere in the HoloLens documentation may refer to the "virtual game camera" (the frustum the app renders to). Unless denoted otherwise, "camera" on this page refers to the real-world RGB color camera.

Using Unity

To go from the 'CameraIntrinsics' and 'CameraCoordinateSystem' to your application/world coordinate system, follow the instructions in the Locatable camera in Unity article. CameraToWorldMatrix is automatically provided by the PhotoCaptureFrame class, so you don't need to worry about the CameraCoordinateSystem transforms discussed below.

Using MediaFrameReference

These instructions apply if you are using the MediaFrameReference class to read image frames from the camera.

Each image frame (whether photo or video) includes a SpatialCoordinateSystem rooted at the camera at the time of capture, which can be accessed using the CoordinateSystem property of your MediaFrameReference. In addition, each frame contains a description of the camera lens model, which can be found in the CameraIntrinsics property. Together, these transforms define for each pixel a ray in 3D space representing the path taken by the photons that produced the pixel. These rays can be related to other content in the app by obtaining the transform from the frame's coordinate system to some other coordinate system (for example, from a stationary frame of reference). To summarize, each image frame provides a coordinate system for the camera at the time of capture and a lens model (the camera intrinsics).
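
To make the lens model concrete: under the pinhole assumption, a pixel maps to a view-space ray through the focal lengths and principal point. The sketch below uses a hypothetical intrinsics struct for illustration (real APIs such as CameraIntrinsics expose equivalent fields under their own names):

```cpp
#include <cmath>

// Minimal pinhole intrinsics: focal lengths and principal point, in pixels.
// (Hypothetical struct for illustration; real APIs expose equivalent fields.)
struct PinholeIntrinsics {
    float fx, fy;  // focal lengths
    float cx, cy;  // principal point
};

struct Ray { float x, y, z; };

// Unproject a pixel (u, v) into a normalized camera-space ray.
// Assumes the frame is already undistorted, as on HoloLens.
Ray UnprojectPixel(const PinholeIntrinsics& k, float u, float v) {
    float x = (u - k.cx) / k.fx;
    float y = (v - k.cy) / k.fy;
    float z = 1.0f;
    float len = std::sqrt(x * x + y * y + z * z);
    return Ray{ x / len, y / len, z / len };
}
```

The principal point unprojects straight down the optical axis; combining such rays with the frame's camera-to-world transform gives the world-space rays described above.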

The HolographicFaceTracking sample shows the fairly straightforward way to query for the transform between the camera's coordinate system and your own application coordinate systems.

Using Media Foundation

If you are using Media Foundation directly to read image frames from the camera, you can use each frame's MFSampleExtension_CameraExtrinsics attribute and MFSampleExtension_PinholeCameraIntrinsics attribute to locate camera frames relative to your application's other coordinate systems, as shown in this sample code:

#include <winrt/windows.perception.spatial.preview.h>
#include <mfapi.h>
#include <mfidl.h>
#include <optional>

using namespace winrt::Windows::Foundation;
using namespace winrt::Windows::Foundation::Numerics;
using namespace winrt::Windows::Perception;
using namespace winrt::Windows::Perception::Spatial;
using namespace winrt::Windows::Perception::Spatial::Preview;

class CameraFrameLocator
{
public:
    struct CameraFrameLocation
    {
        SpatialCoordinateSystem CoordinateSystem;
        float4x4 CameraViewToCoordinateSystemTransform;
        MFPinholeCameraIntrinsics Intrinsics;
    };

    std::optional<CameraFrameLocation> TryLocateCameraFrame(IMFSample* pSample)
    {
        MFCameraExtrinsics cameraExtrinsics;
        MFPinholeCameraIntrinsics cameraIntrinsics;
        UINT32 sizeCameraExtrinsics = 0;
        UINT32 sizeCameraIntrinsics = 0;
        UINT64 sampleTimeHns = 0;

        // Query the sample for calibration data and validate it.
        if (FAILED(pSample->GetUINT64(MFSampleExtension_DeviceTimestamp, &sampleTimeHns)) ||
            FAILED(pSample->GetBlob(MFSampleExtension_CameraExtrinsics, (UINT8*)&cameraExtrinsics, sizeof(cameraExtrinsics), &sizeCameraExtrinsics)) ||
            FAILED(pSample->GetBlob(MFSampleExtension_PinholeCameraIntrinsics, (UINT8*)&cameraIntrinsics, sizeof(cameraIntrinsics), &sizeCameraIntrinsics)) ||
            (sizeCameraExtrinsics != sizeof(cameraExtrinsics)) ||
            (sizeCameraIntrinsics != sizeof(cameraIntrinsics)) ||
            (cameraExtrinsics.TransformCount == 0))
        {
            return std::nullopt;
        }

        // Compute the extrinsic transform from the camera to the dynamic node.
        const auto& calibratedTransform = cameraExtrinsics.CalibratedTransforms[0];
        const GUID& dynamicNodeId = calibratedTransform.CalibrationId;
        const float4x4 cameraToDynamicNode =
            make_float4x4_from_quaternion(quaternion{ calibratedTransform.Orientation.x, calibratedTransform.Orientation.y, calibratedTransform.Orientation.z, calibratedTransform.Orientation.w }) *
            make_float4x4_translation(calibratedTransform.Position.x, calibratedTransform.Position.y, calibratedTransform.Position.z);

        // Update the locator cache for the dynamic node.
        if (dynamicNodeId != m_currentDynamicNodeId || !m_locator)
        {
            m_locator = SpatialGraphInteropPreview::CreateLocatorForNode(dynamicNodeId);
            if (!m_locator)
            {
                return std::nullopt;
            }
            m_frameOfReference = m_locator.CreateAttachedFrameOfReferenceAtCurrentHeading();
            m_currentDynamicNodeId = dynamicNodeId;
        }

        // Locate the dynamic node at the sample's timestamp.
        auto timestamp = PerceptionTimestampHelper::FromSystemRelativeTargetTime(TimeSpan{ sampleTimeHns });
        auto coordinateSystem = m_frameOfReference.GetStationaryCoordinateSystemAtTimestamp(timestamp);
        auto location = m_locator.TryLocateAtTimestamp(timestamp, coordinateSystem);
        if (!location)
        {
            return std::nullopt;
        }

        const float4x4 dynamicNodeToCoordinateSystem = make_float4x4_from_quaternion(location.Orientation()) * make_float4x4_translation(location.Position());
        return CameraFrameLocation{ coordinateSystem, cameraToDynamicNode * dynamicNodeToCoordinateSystem, cameraIntrinsics };
    }

private:
    GUID m_currentDynamicNodeId{ GUID_NULL };
    SpatialLocator m_locator{ nullptr };
    SpatialLocatorAttachedFrameOfReference m_frameOfReference{ nullptr };
};

Distortion Error

On HoloLens, the video and still image streams are undistorted in the system's image processing pipeline before the frames are made available to the application (the preview stream contains the original distorted frames). Because only the CameraIntrinsics are made available, applications must assume the image frames represent a perfect pinhole camera.

On HoloLens (first-generation), the undistortion function in the image processor may still leave an error of up to 10 pixels when using the CameraIntrinsics in the frame metadata. In many use cases this error won't matter, but if you are aligning holograms to real-world posters or markers, for example, and you notice an offset of less than 10 px (roughly 11 mm for holograms positioned 2 meters away), this distortion error could be the cause.
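
The parenthetical figure can be sanity-checked with basic trigonometry: convert the pixel error to an angle via the horizontal field of view, then to a metric offset at the hologram distance. A sketch, assuming the 2048-pixel-wide still mode with its 67deg H-FOV from the table above:

```cpp
#include <cmath>

// Metric offset produced by a given pixel error, for a camera with the given
// image width and horizontal field of view, viewed at the given distance.
double DistortionOffsetMeters(double imageWidthPx, double hfovDeg,
                              double pixelError, double distanceM) {
    const double pi = 3.141592653589793;
    double errDeg = pixelError * hfovDeg / imageWidthPx;  // degrees per pixel * pixels
    double errRad = errDeg * pi / 180.0;
    return distanceM * std::tan(errRad);
}
```

For 10 px of error in the 2048x1152 mode at 2 meters this yields about 0.0114 m, in line with the roughly 11 mm quoted above.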

Locatable Camera Usage Scenarios

Show a photo or video in the world where it was captured

The device camera frames come with a "camera to world" transform that can be used to show exactly where the device was when the image was taken. For example, you could position a small holographic icon at this location (CameraToWorld.MultiplyPoint(Vector3.zero)) and even draw a little arrow in the direction that the camera was facing (CameraToWorld.MultiplyVector(Vector3.forward)).
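
The distinction between MultiplyPoint and MultiplyVector matters here: a point (w = 1) picks up the camera's translation, while a direction (w = 0) only picks up its rotation. A minimal sketch of the underlying 4x4 math, using hypothetical helper types rather than the Unity API:

```cpp
#include <array>

using Mat4 = std::array<std::array<float, 4>, 4>;  // row-major 4x4 matrix
struct Vec3 { float x, y, z; };

// Transform a position: the translation column applies (w = 1).
Vec3 MultiplyPoint(const Mat4& m, Vec3 p) {
    return { m[0][0] * p.x + m[0][1] * p.y + m[0][2] * p.z + m[0][3],
             m[1][0] * p.x + m[1][1] * p.y + m[1][2] * p.z + m[1][3],
             m[2][0] * p.x + m[2][1] * p.y + m[2][2] * p.z + m[2][3] };
}

// Transform a direction: the translation column is ignored (w = 0).
Vec3 MultiplyVector(const Mat4& m, Vec3 v) {
    return { m[0][0] * v.x + m[0][1] * v.y + m[0][2] * v.z,
             m[1][0] * v.x + m[1][1] * v.y + m[1][2] * v.z,
             m[2][0] * v.x + m[2][1] * v.y + m[2][2] * v.z };
}
```

With a camera-to-world matrix whose translation is the camera position, transforming the zero point recovers the capture location, while transforming the forward vector yields the gaze direction unaffected by translation.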

Tag / Pattern / Poster / Object Tracking

Many mixed reality applications use a recognizable image or visual pattern to create a trackable point in space. This is then used to render objects relative to that point or to create a known location. Some uses for HoloLens include finding a real-world object tagged with fiducials (for example, a TV monitor with a QR code), placing holograms over fiducials, and visually pairing with non-HoloLens devices like tablets that have been set up to communicate with HoloLens over Wi-Fi.

To recognize a visual pattern and then place that object in the application's world space, you'll need a few things:

  1. An image pattern recognition toolkit, such as QR codes, AR tags, face finder, circle trackers, OCR, and so on.
  2. Collect image frames at runtime, and pass them to the recognition layer.
  3. Unproject their image locations back into world positions, or likely world rays.
  4. Position your virtual models over these world locations.


Keeping an interactive application frame rate is critical, especially when dealing with long-running image recognition algorithms. For this reason, we commonly use the following pattern:

  1. Main thread: manages the camera object
  2. Main thread: requests new frames (async)
  3. Main thread: passes new frames to the tracking thread
  4. Tracking thread: processes the image to collect key points
  5. Main thread: moves the virtual model to match the found key points
  6. Main thread: repeat from step 2

Some image marker systems only provide a single pixel location (others provide the full transform, in which case this section isn't needed), which equates to a ray of possible locations. To get to a single 3D location, we can leverage multiple rays and find the final result by their approximate intersection. To do this, you'll need to:

  1. Get a loop going that collects multiple camera images
  2. Find the associated feature points, and their world rays
  3. When you have a dictionary of features, each with multiple world rays, you can use the following code to solve for the intersection of those rays:
public static Vector3 ClosestPointBetweenRays(
   Vector3 point1, Vector3 normalizedDirection1,
   Vector3 point2, Vector3 normalizedDirection2) {
   float directionProjection = Vector3.Dot(normalizedDirection1, normalizedDirection2);
   if (directionProjection == 1) {
     return point1; // parallel lines
   }
   float projection1 = Vector3.Dot(point2 - point1, normalizedDirection1);
   float projection2 = Vector3.Dot(point2 - point1, normalizedDirection2);
   float distanceAlongLine1 = (projection1 - directionProjection * projection2) / (1 - directionProjection * directionProjection);
   float distanceAlongLine2 = (projection2 - directionProjection * projection1) / (directionProjection * directionProjection - 1);
   Vector3 pointOnLine1 = point1 + distanceAlongLine1 * normalizedDirection1;
   Vector3 pointOnLine2 = point2 + distanceAlongLine2 * normalizedDirection2;
   return Vector3.Lerp(pointOnLine2, pointOnLine1, 0.5f);
}

Given two or more tracked tag locations, you can position a modelled scene to fit the user's current scenario. If you can't assume gravity, then you'll need three tag locations. In many cases, we use a simple color scheme where white spheres represent real-time tracked tag locations and blue spheres represent modelled tag locations. This allows the user to visually gauge the alignment quality. We assume the following setup in all our applications:

  • Two or more modelled tag locations
  • One 'calibration space' which in the scene is the parent of the tags
  • A camera feature identifier
  • A behavior that moves the calibration space to align the modelled tags with the real-time tags (we're careful to move the parent space, not the modelled markers themselves, because other content is positioned relative to them).
// In the two tags case:
 Vector3 idealDelta = (realTags[1].EstimatedWorldPos - realTags[0].EstimatedWorldPos);
 Vector3 curDelta = (modelledTags[1].transform.position - modelledTags[0].transform.position);
 if (IsAssumeGravity) {
   idealDelta.y = 0;
   curDelta.y = 0;
 }
 Quaternion deltaRot = Quaternion.FromToRotation(curDelta, idealDelta);
 trans.rotation = Quaternion.LookRotation(deltaRot * trans.forward, trans.up);
 trans.position += realTags[0].EstimatedWorldPos - modelledTags[0].transform.position;

Track or identify tagged stationary or moving real-world objects/faces using LEDs or other recognizer libraries


  • Industrial robots with LEDs (or QR codes for slower-moving objects)
  • Identify and recognize objects in the room
  • Identify and recognize people in the room (for example, place holographic contact cards over faces)

See also