Perspective Image#

A perspective image is created by central projection: light rays from 3D points pass through one projection center and intersect the image plane.

In weitsicht, a perspective image combines:

  • a camera model (interior orientation + distortion model),

  • an exterior orientation (camera position and rotation),

  • an image/pixel coordinate system.

This page describes the full concept, required inputs, and practical implications for mapping. For practical usage in weitsicht, see Images Guide.

Conceptual Split in weitsicht#

For implementation and reasoning, two tightly coupled parts are separated:

  • Camera model: defines intrinsics and distortion behavior.

  • Perspective image: combines the image data with georeferencing context (pose, CRS relation, mapping usage).

They are mathematically coupled and must be used together for accurate geo-referenced mapping.

Coordinate Systems Involved#

A perspective image workflow usually involves four coordinate systems:

  • 3D point CRS: coordinates of terrain/object points, tie points, or reconstructed geometry.

  • Perspective image CRS: CRS in which image exterior orientation is expressed.

  • Camera CRS: local frame fixed to the camera body/optical system.

  • Pixel/image CRS: 2D coordinate system of image pixels.

See also: Camera Coordinate System and Pixel Coordinate System.

Geo-reference Definition#

Within this package, a perspective image is considered geo-referenced when at least:

  • a camera model is available,

  • exterior orientation is available,

  • the relation to a spatial CRS is defined.

If any of these is missing, geometric operations may still run, but the results are not reliable for map-accurate measurements or GIS integration.

Exterior Orientation#

Exterior orientation (EO, EOR, XOR, extrinsics) describes camera pose:

  • camera center position,

  • camera orientation.

Operationally, EOR defines the location and orientation of the camera CRS within the perspective image CRS. Getting this convention right is critical: axis directions, handedness, rotation order, and transform direction must all agree between the EOR source and weitsicht.

In weitsicht:

  • \(\mathbf{P}_0\) is the camera center position in the perspective image CRS.

  • \(R\) is stored as camera-to-world rotation (\(\mathbf{v}_{world} = R\,\mathbf{v}_{cam}\)), see Image Pose, Attitude, and Rotation Angles.

Therefore, transforming a 3D point from the perspective image CRS into the camera CRS is done with:

\[\mathbf{X}_{cam} = R^{T} (\mathbf{X}_{world} - \mathbf{P}_0)\]
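The transform above can be sketched in a few lines of plain Python. This is an illustrative sketch, not weitsicht API: the helper names (`transpose`, `mat_vec`, `world_to_camera`) and the example pose are assumptions made for this page.

```python
def transpose(m):
    """Transpose a 3x3 matrix stored as nested lists."""
    return [list(col) for col in zip(*m)]


def mat_vec(m, v):
    """Multiply a 3x3 matrix with a 3-vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def world_to_camera(R, P0, X_world):
    """Apply X_cam = R^T (X_world - P0), with R stored as camera-to-world."""
    offset = [x - p for x, p in zip(X_world, P0)]
    return mat_vec(transpose(R), offset)


# Hypothetical pose: camera axes rotated 90 degrees about the world Z axis.
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
P0 = [10.0, 20.0, 100.0]

X_cam = world_to_camera(R, P0, [10.0, 25.0, 0.0])
print(X_cam)
```

In this example the camera x axis points along world Y, so a point offset by 5 units in world Y ends up 5 units along camera x, and 100 units along negative camera z.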

If EOR comes from another software stack, confirm and convert:

  • world-to-camera vs camera-to-world representation,

  • Euler order and angle sign,

  • quaternion convention,

  • units (degrees/radians, meters/millimeters).
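As a minimal sketch of the first item in the list above, the following converts a world-to-camera pair into the camera-to-world rotation and camera center used on this page. It relies only on \(R = R_{w2c}^{T}\) and \(\mathbf{P}_0 = -R_{w2c}^{T}\,t_{w2c}\); the function name and the sample values are hypothetical, and Euler or quaternion conventions of the source software still need to be checked separately.

```python
import math


def transpose(m):
    """Transpose a 3x3 matrix stored as nested lists."""
    return [list(col) for col in zip(*m)]


def mat_vec(m, v):
    """Multiply a 3x3 matrix with a 3-vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def pose_from_world_to_camera(R_w2c, t_w2c):
    """Return (R, P0): camera-to-world rotation and camera center."""
    R = transpose(R_w2c)                    # R = R_w2c^T
    P0 = [-c for c in mat_vec(R, t_w2c)]    # P0 = -R_w2c^T t_w2c
    return R, P0


# Hypothetical extrinsics exported by another tool in world-to-camera form.
R_w2c = [[0.0, 1.0, 0.0],
         [-1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0]]
t_w2c = [-20.0, 10.0, -100.0]

R, P0 = pose_from_world_to_camera(R_w2c, t_w2c)

# Angles delivered in degrees must be converted before building rotations.
yaw_rad = math.radians(90.0)
```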

Interior Orientation (Camera Intrinsics)#

Interior orientation defines the ideal pinhole mapping parameters:

  • focal length (or focal lengths in x/y),

  • principal point,

  • optional skew term.

Real cameras require distortion correction/modeling in addition to ideal intrinsics. Distortion terms depend on the camera model (radial, tangential, and possibly higher-order terms). For details on model types, parameter meaning, and distortion validity, see Camera Model.

In weitsicht, camera calibration parameters are defined for a calibration image size. When an image is resampled (e.g. downscaled for speed), pixel coordinates are internally scaled between the current image size and the calibration size so the same camera model can be reused consistently.
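The idea of scaling between the two image sizes can be illustrated as follows. This is a simplified sketch under stated assumptions, not the weitsicht implementation: the function name is hypothetical, sizes are given as (width, height), and any half-pixel offset implied by the pixel-center convention is ignored.

```python
def to_calibration_pixels(u, v, image_size, calib_size):
    """Scale pixel coordinates from the current image size to the calibration size.

    image_size and calib_size are (width, height) tuples in pixels.
    """
    scale_x = calib_size[0] / image_size[0]
    scale_y = calib_size[1] / image_size[1]
    return u * scale_x, v * scale_y


# Hypothetical case: calibrated at 6000x4000, image downscaled to 1500x1000.
uv_calib = to_calibration_pixels(750.0, 500.0, (1500, 1000), (6000, 4000))
print(uv_calib)
```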

Projection Model#

A standard perspective model is often written as:

\[\begin{split}s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \left[ R_{w2c} \mid t_{w2c} \right] \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}\end{split}\]

where:

  • \((X, Y, Z)\) is a point expressed in the perspective image CRS (or transformed into it),

  • \(R_{w2c}, t_{w2c}\) map perspective/world CRS → camera CRS,

  • \(K\) is the intrinsic matrix,

  • \((u, v)\) are pixel/image coordinates.
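For reference, a common way to write the intrinsic matrix is given here. The symbols \(f_x, f_y\) (focal lengths in pixels), \(c_x, c_y\) (principal point in pixels) and \(\gamma\) (skew) follow a widespread convention and are assumed for illustration; the parameterization of a specific camera model is described on the Camera Model page.

\[\begin{split}K = \begin{bmatrix} f_x & \gamma & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}\end{split}\]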

With the weitsicht convention (Image Pose, Attitude, and Rotation Angles), the corresponding world-to-camera extrinsics are:

\[R_{w2c} = R^{T}, \qquad t_{w2c} = -R^{T}\mathbf{P}_0\]

so you may also see:

\[\begin{split}s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \left[ R^{T} \mid -R^{T}\mathbf{P}_0 \right] \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}\end{split}\]
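The matrix form above can be evaluated numerically as in the following sketch. It is not weitsicht API: the helper names and sample values are hypothetical, distortion is ignored, and the camera is assumed to look along its positive z axis (as in OpenCV-style models), so points with non-positive depth are rejected.

```python
def transpose(m):
    """Transpose a 3x3 matrix stored as nested lists."""
    return [list(col) for col in zip(*m)]


def mat_vec(m, v):
    """Multiply a 3x3 matrix with a 3-vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def project(K, R, P0, X_world):
    """Evaluate s [u, v, 1]^T = K [R^T | -R^T P0] [X, Y, Z, 1]^T."""
    offset = [x - p for x, p in zip(X_world, P0)]
    X_cam = mat_vec(transpose(R), offset)   # equals R^T X - R^T P0
    h = mat_vec(K, X_cam)                   # homogeneous image point, h[2] = s
    if h[2] <= 0.0:
        return None                         # not in front of the camera
    return h[0] / h[2], h[1] / h[2]


# Hypothetical intrinsics and pose (identity rotation, camera at the origin).
K = [[1000.0, 0.0, 960.0],
     [0.0, 1000.0, 540.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
P0 = [0.0, 0.0, 0.0]

uv = project(K, R, P0, [2.0, 4.0, 8.0])
behind = project(K, R, P0, [0.0, 0.0, -5.0])
print(uv, behind)
```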

Collinearity in vector form (similarity notation)#

In classic photogrammetry, the collinearity condition can be expressed in a compact vector form that looks like a spatial similarity transformation:

\[(\mathbf{p} - \mathbf{p}_0) = s \, R^{T} \, (\mathbf{P} - \mathbf{P}_0)\]

Here the observation is an image point:

  • \(\mathbf{p}\) is the observed point in the image coordinate system (often embedded as a 3D point on the image plane),

  • \(\mathbf{p}_0\) defines the interior orientation (principal point and focal length, e.g. \(x_0, y_0, c\)),

  • \(\mathbf{P}\) is the observed 3D point in the superior/world system (the perspective image CRS),

  • \(\mathbf{P}_0\) and \(R\) define the exterior orientation (camera center and attitude),

  • \(s\) is an observation-specific scale factor (it depends on the point depth; it is not a single global scale).

One common convention is to model the image point on the image plane as:

\[\begin{split}\mathbf{p} = \begin{bmatrix} x \\ y \\ 0 \end{bmatrix}, \qquad \mathbf{p}_0 = \begin{bmatrix} x_0 \\ y_0 \\ c \end{bmatrix}\end{split}\]

so that:

\[\begin{split}\mathbf{p} - \mathbf{p}_0 = \begin{bmatrix} x-x_0 \\ y-y_0 \\ -c \end{bmatrix}\end{split}\]

In weitsicht, the interior orientation parameters (principal point, focal length, distortion) live in the camera model, while \(\mathbf{P}_0\) and \(R\) are the pose. The exact sign conventions for \(x, y, c\) depend on the chosen image coordinate system and axis directions (see Camera Coordinate System and Pixel Coordinate System).

Collinearity equations (classic photogrammetry form)#

Using the elements \(r_{ij}\) of the rotation matrix \(R\) and a projection center \(P_0=(X_0, Y_0, Z_0)\), the collinearity equations are commonly written as:

\[\frac{x - x_0}{-c} = \frac{(X - X_0)\,r_{00} + (Y - Y_0)\,r_{10} + (Z - Z_0)\,r_{20}} {(X - X_0)\,r_{02} + (Y - Y_0)\,r_{12} + (Z - Z_0)\,r_{22}}\]
\[\frac{y - y_0}{-c} = \frac{(X - X_0)\,r_{01} + (Y - Y_0)\,r_{11} + (Z - Z_0)\,r_{21}} {(X - X_0)\,r_{02} + (Y - Y_0)\,r_{12} + (Z - Z_0)\,r_{22}}\]

In this formulation, the camera coordinates are \((x_{cam}, y_{cam}, z_{cam})^{T} = R^{T}\,(\mathbf{P}-\mathbf{P}_0)\) with \(\mathbf{P} = (X, Y, Z)^{T}\), so that \(x = x_0 - c\,x_{cam}/z_{cam}\) and \(y = y_0 - c\,y_{cam}/z_{cam}\).

In practice, lens distortion is modeled in addition to this ideal, distortion-free projection.
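The collinearity equations can be written out term by term as in the sketch below, using the same index convention (\(r_{ij}\) is `R[i][j]`). The function name and the numbers are hypothetical; the sign convention is the one of this section, where the camera looks along its negative z axis, and \(c\), \(x_0\), \(y_0\) share one unit (for example millimeters).

```python
def collinearity(R, P0, c, x0, y0, P):
    """Ideal image coordinates (x, y) of object point P, without distortion."""
    dX, dY, dZ = P[0] - P0[0], P[1] - P0[1], P[2] - P0[2]
    num_x = dX * R[0][0] + dY * R[1][0] + dZ * R[2][0]   # x_cam
    num_y = dX * R[0][1] + dY * R[1][1] + dZ * R[2][1]   # y_cam
    den = dX * R[0][2] + dY * R[1][2] + dZ * R[2][2]     # z_cam
    if den == 0.0:
        raise ValueError("point lies in the plane through the projection center")
    return x0 - c * num_x / den, y0 - c * num_y / den


# Hypothetical nadir image: identity rotation, camera 100 m above the point plane.
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
xy = collinearity(R, [0.0, 0.0, 100.0], 50.0, 0.0, 0.0, [25.0, 50.0, 0.0])
print(xy)
```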

Forward and Inverse Mapping#

Common operations:

  • Forward projection: 3D point → image pixel.

  • Inverse projection (ray casting): pixel → 3D ray from camera center.

  • 3D intersection: ray + surface model (plane/DEM/mesh) → 3D point.

This single-image ray/surface intersection step is also often referred to as Monoplotting (see Monoplotting).

Inverse projection alone does not produce a unique 3D point; an additional geometric constraint or surface model is required.
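A minimal sketch of inverse projection followed by an intersection with a horizontal plane illustrates why the extra constraint is needed. It does not use the weitsicht mapper classes: all names and values are hypothetical, distortion is ignored, and the camera is assumed to look along its positive z axis.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix with a 3-vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def pixel_to_world_ray(u, v, fx, fy, cx, cy, R):
    """Ray direction in the world frame for pixel (u, v); R is camera-to-world."""
    d_cam = [(u - cx) / fx, (v - cy) / fy, 1.0]
    return mat_vec(R, d_cam)


def intersect_horizontal_plane(P0, direction, z_plane):
    """Intersect the ray P0 + t * direction (t > 0) with the plane Z = z_plane."""
    if direction[2] == 0.0:
        return None                      # ray is parallel to the plane
    t = (z_plane - P0[2]) / direction[2]
    if t <= 0.0:
        return None                      # plane is behind the camera
    return [p + t * d for p, d in zip(P0, direction)]


# Hypothetical nadir pose: camera z axis points along world -Z, 100 m above Z = 0.
R = [[1.0, 0.0, 0.0],
     [0.0, -1.0, 0.0],
     [0.0, 0.0, -1.0]]
P0 = [0.0, 0.0, 100.0]

ray = pixel_to_world_ray(1210.0, 1040.0, 1000.0, 1000.0, 960.0, 540.0, R)
ground_point = intersect_horizontal_plane(P0, ray, 0.0)
print(ground_point)
```

Every point `P0 + t * ray` projects to the same pixel; only the plane fixes `t` and therefore the 3D point.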

When working with undistorted pixels, mask invalid areas first (see the distortion validity border on Camera Model). In code, weitsicht.ImagePerspective.image_points_inside() can be used to test whether undistorted pixel coordinates are valid for the current camera model.

In weitsicht, this intersection step is implemented by mapper classes.

GSD estimation (mapping results)#

Mapping methods on weitsicht.ImagePerspective (map_points, map_center_point, map_footprint) return a MappingResultSuccess that can include GSD estimates:

  • gsd: mean GSD over valid mapped points

  • gsd_per_point: per-point GSD (aligned with the input order; use mask to filter valid entries)

For a perspective camera, GSD is not a single global constant: it varies with range (distance to the mapped 3D intersection point) and with viewing geometry.

Range / focal length approximation#

Because a single pixel subtends only a small angle, the GSD at a mapped point can be approximated as:

\[\mathrm{GSD} \approx \frac{R}{f_{px}}\]

where \(R\) is here the distance (range) from the camera center to the mapped 3D point, not the rotation matrix of the exterior orientation, and \(f_{px}\) is a focal length expressed in pixels. In weitsicht, \(f_{px}\) is taken from weitsicht.CameraBasePerspective.focal_length_for_gsd_in_pixel (for the OpenCV camera model this is the mean of fx and fy).
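The approximation follows from small-angle geometry: one pixel subtends an angle of roughly \(1/f_{px}\) radians, and at range \(R\) this angle spans an arc of length \(R \cdot (1/f_{px})\). As a worked example with assumed values (not taken from any dataset), a range of 100 m and \(f_{px} = 4000\) give:

\[\mathrm{GSD} \approx \frac{100\ \mathrm{m}}{4000\ \mathrm{px}} = 0.025\ \mathrm{m/px}\]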

Incidence angle correction#

If a surface normal \(n\) is available, weitsicht applies an incidence-angle correction so that oblique views produce larger on-surface footprints:

\[\mathrm{GSD}_{surface} \approx \frac{R}{f_{px}\,\cos(i)}, \qquad \cos(i) = |\hat{n} \cdot \hat{v}|\]

with \(\hat{v}\) the unit viewing direction (from surface point towards the camera).
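The corrected formula can be evaluated as in the following sketch. The function name, the cut-off for grazing angles, and the sample geometry are assumptions for illustration and are not taken from weitsicht.

```python
import math


def gsd_on_surface(P0, X, normal, f_px, min_cos=1e-6):
    """Approximate on-surface GSD: range / (f_px * cos(i))."""
    view = [p - x for p, x in zip(P0, X)]          # surface point -> camera
    rng = math.sqrt(sum(c * c for c in view))
    n_len = math.sqrt(sum(c * c for c in normal))
    if rng == 0.0 or n_len == 0.0:
        raise ValueError("degenerate viewing vector or normal")
    cos_i = abs(sum(v * n for v, n in zip(view, normal))) / (rng * n_len)
    if cos_i < min_cos:
        return math.inf                            # grazing view, footprint unbounded
    return rng / (f_px * cos_i)


# Hypothetical setup: camera 100 m above flat ground, f_px = 4000.
P0 = [0.0, 0.0, 100.0]
up = [0.0, 0.0, 1.0]

gsd_nadir = gsd_on_surface(P0, [0.0, 0.0, 0.0], up, 4000.0)
gsd_oblique = gsd_on_surface(P0, [100.0, 0.0, 0.0], up, 4000.0)
print(gsd_nadir, gsd_oblique)
```

In this setup the point viewed at 45 degrees has twice the GSD of the nadir point: the range grows by a factor of \(\sqrt{2}\) and \(\cos(i)\) shrinks by the same factor.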

Neighbour-ray refinement#

To capture local effects from the camera model (including distortion and non-uniform ray spacing), weitsicht refines per-point GSD using neighbouring pixel rays (a 1-pixel step in x and y):

  • best effort: compute a chord-length estimate on the range sphere from ray-direction differences, and

  • when normals are valid: intersect the neighbour rays with the local tangent plane through the mapped point and measure the resulting surface distance per pixel step.

The refinement runs without additional mapper calls; it reuses the mapped 3D point and the camera model.
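The chord-length idea can be sketched as follows. This is an illustration of the principle only, not the weitsicht implementation: `ray_fn` stands in for whatever pixel-to-ray function the camera model provides, a distortion-free pinhole is used as a placeholder, and averaging the x and y steps is an assumption.

```python
import math


def unit(v):
    """Normalize a 3-vector."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def pinhole_ray(u, v, fx, fy, cx, cy):
    """Unit ray direction in the camera frame for an ideal pinhole (placeholder)."""
    return unit([(u - cx) / fx, (v - cy) / fy, 1.0])


def chord_gsd(ray_fn, u, v, rng):
    """Chord length on the range sphere for a 1-pixel step in x and in y."""
    d0 = ray_fn(u, v)
    chord_x = rng * math.dist(d0, ray_fn(u + 1.0, v))
    chord_y = rng * math.dist(d0, ray_fn(u, v + 1.0))
    return 0.5 * (chord_x + chord_y)


def ray(u, v):
    # Hypothetical camera: f_px = 4000, principal point at (960, 540).
    return pinhole_ray(u, v, 4000.0, 4000.0, 960.0, 540.0)


gsd_center = chord_gsd(ray, 960.0, 540.0, 100.0)
gsd_corner = chord_gsd(ray, 0.0, 0.0, 100.0)
print(gsd_center, gsd_corner)
```

At the principal point the result agrees closely with the range / focal length approximation; away from it the angular spacing of neighbouring rays changes, which is the effect the refinement is meant to capture.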

Note

Normals are provided by the mapper backend and their meaning depends on the mapper. See Surface Normals. GSD is expressed in the linear unit of the image/mapping CRS; use a metric CRS (meters) for meaningful values.

Data Requirements for Reliable Use#

For robust geospatial results, provide:

  • calibrated camera model (intrinsics + distortion),

  • accurate per-image EOR,

  • clearly defined CRS metadata,

  • synchronized image and navigation timestamps,

  • consistent units and angle conventions.

Optional but often necessary for higher quality:

  • lever-arm and boresight calibration,

  • ground control/check points,

  • a suitable elevation/surface model.

Quality and Accuracy Considerations#

Main error sources:

  • wrong CRS or mixed datums,

  • wrong rotation convention or axis interpretation,

  • poor GNSS/IMU quality or unsynchronized timestamps,

  • outdated/inaccurate camera calibration,

  • weak geometry (insufficient overlap, no cross-view diversity),

  • inappropriate surface model for ray intersection.

Recommended checks:

  • reproject known control points and inspect residuals,

  • verify that footprints and view directions are physically plausible,

  • compare multiple images for consistency over shared ground features.
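The first check can be sketched as a small residual computation. The names, the stand-in projection function, and the control points are hypothetical; in practice `project_fn` would wrap the full forward projection including pose and distortion.

```python
import math


def reprojection_residuals(project_fn, control_points):
    """Pixel distance between projected and observed image points."""
    residuals = []
    for X_world, (u_obs, v_obs) in control_points:
        u, v = project_fn(X_world)
        residuals.append(math.hypot(u - u_obs, v - v_obs))
    return residuals


def rmse(values):
    """Root mean square of a list of residuals."""
    return math.sqrt(sum(r * r for r in values) / len(values))


def project_fn(X):
    # Stand-in pinhole at the origin looking along +z (f = 1000 px, pp = 960, 540).
    return 1000.0 * X[0] / X[2] + 960.0, 1000.0 * X[1] / X[2] + 540.0


# Hypothetical control points: (world coordinates, observed pixel).
control_points = [
    ((2.0, 4.0, 8.0), (1210.0, 1040.0)),
    ((0.0, 0.0, 10.0), (963.0, 544.0)),
]

residuals = reprojection_residuals(project_fn, control_points)
print(residuals, rmse(residuals))
```

Residuals that are large or share a common direction across all points usually indicate a convention or CRS problem rather than measurement noise.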

Practical Checklist#

Before processing a dataset, verify:

  • camera CRS definition matches your EOR source,

  • pixel convention is consistent (Pixel Coordinate System),

  • EOR angles and units are converted correctly,

  • transform direction is explicit and tested,

  • image identifiers and EO records are correctly linked.