ABSTRACT
Video cameras are finding increasing use in the study and analysis of bird flight over short ranges. However, reconstruction of flight trajectories in three dimensions typically requires the use of multiple cameras and elaborate calibration procedures. We present an alternative approach that uses a single video camera and a simple calibration procedure for the reconstruction of such trajectories. The technique combines prior knowledge of the bird’s wingspan with a camera calibration procedure that needs to be used only once in the system’s lifetime. The system delivers the exact 3D coordinates of the bird at the time of every full wing extension, and uses interpolated height estimates to compute the 3D positions of the bird in the video frames between successive wing extensions. The system is inexpensive, compact and portable, and can be easily deployed in the laboratory as well as the field.
Introduction
The increasing use of high-speed video cameras is offering new opportunities as well as challenges for tracking three-dimensional motions of humans and animals, and of their body parts (e.g. Shelton et al., 2014; Straw et al., 2011; Fontaine et al., 2009; Dakin et al., 2016; Ros et al., 2017; Troje, 2002; de Margerie et al., 2015; Jackson et al., 2016; Macfarlane et al., 2015; Deetjen et al., 2017).
Stereo-based approaches that use two (or more) cameras are popular; however, they require (a) synchronisation of the cameras; (b) elaborate calibration procedures (e.g. Hedrick, 2008; Hartley and Zisserman, 2003; Theriault et al., 2014; Jackson et al., 2016); (c) collection of large amounts of data, particularly when using high frame rates; and (d) substantial post-processing that entails frame-by-frame tracking of individual features in all of the video sequences, and establishing the correct correspondences between these features across the video sequences (e.g. Cavagna et al., 2008). This is particularly complicated when tracking highly deformable objects, such as flying birds.
Vicon-based stereo trackers simplify the problem of feature tracking by using special reflective markers or photodiodes attached to the tracked animal (e.g. Ros et al., 2017; Goller and Altshuler, 2014; Tobalske et al., 2007; Troje, 2002). However, these markers can potentially disturb natural movement and behaviour, especially when used on small animals.
A novel recent approach uses structured light illumination produced by a laser system in combination with a high-speed video camera to reconstruct the wing kinematics of a freely flying parrotlet at 3200 frames/second (Deetjen et al., 2017). However, this impressive capability comes at the cost of some complexity, and works best if the bird possesses a highly reflective plumage of a single colour (preferably white).
GPS-based tracking methods (e.g. Bouten et al., 2013) are useful for mapping long-range flights of birds, for example, but are not feasible in indoor laboratory settings, where GPS signals are typically unavailable or do not provide sufficiently accurate positioning. Furthermore, they require the animal to carry a GPS receiver, which can affect the flight of a small animal.
A simple technique for reconstructing 3D flight trajectories of insects from a single overhead video camera involves tracking the position of the insect as well as the shadow that it casts on the ground (e.g. Zeil, 1993; Srinivasan et al., 2000). However, this technique requires the presence of the unobscured sun in the sky, or a strong artificial indoor light, which in itself could affect the animal’s behaviour. (The latter problem could be overcome, in principle, by using an infrared source of light and an infrared-sensitive camera).
This paper presents a simple, inexpensive, compact, field-deployable technique for reconstructing the flight trajectories of birds in 3D, using a single video camera. The procedure for calibrating the camera is uncomplicated, and is an exercise that needs to be carried out only once in the lifetime of the lens/camera combination, irrespective of where the system is used in subsequent applications.
The system was used in a recent study of bird flight (Vo et al., 2016) but that paper provided only a cursory description of the technique. This paper provides a comprehensive description of the underlying technique and procedure, which will enable it to be used in other laboratories and field studies.
Methodology
Derivation of method
Our method uses a single, downward-looking camera positioned at the ceiling of the experimental arena in which the birds are filmed. The camera must have a field of view that is large enough to cover the entire volume of space within which the bird’s flight trajectories are to be reconstructed.
Essentially, the approach involves combining knowledge of the bird’s wingspan (which provides a scale factor that determines the absolute distance of the bird from the camera) with a calibration of the camera that uses a grid of known geometry drawn on the floor. This calibration provides a means of accounting for all of the imaging distortions that are introduced by the wide-angle optics of the camera lens.
A square grid of known mesh dimensions is laid out on the floor. The 2D locations (X,Y) of each of the intersection points are therefore known. Figure 1 illustrates, schematically, a camera view of the grid on the floor, and of a bird in flight above it, as imaged in a video frame in which the wings are fully extended. In general, the image of the grid will not be square, but distorted by the non-linear off-axis imaging produced by the wide-angle lens, as shown in the real image of Figure 3. The intersection points of the grid in the camera image are digitised (manually, or by using specially developed image analysis software), and their pixel locations are recorded. Thus, each grid location (Xi,Yi) on the floor is tagged with its corresponding pixel co-ordinates (pxi,pyi) in the image. This data is used to compute a function that characterises a two-dimensional mapping between the grid locations on the floor and their corresponding pixel co-ordinates in the image.
Video footage of a bird flying in the chamber, as captured by the overhead camera, is then analysed to reconstruct the bird’s 3D flight trajectory, as described below. Two examples of such footage are provided in the Supplementary videos SV1 and SV2. The positions of the wingtips are digitised in every frame in which the wings are fully extended, i.e. when the distance between the wingtips is equal to the wingspan, and attains a maximum in the video image. In the Budgerigar this occurs once during each wingbeat cycle, roughly halfway through the downstroke. We denote the pixel co-ordinates of the wingtips in these frames, which we call the Wex frames, by (pxL,pyL) (left wingtip) and (pxR,pyR) (right wingtip). The projected locations of the two wingtips on the floor are determined by using the mapping function, described above, to carry out an interpolation. Essentially, the projected location of a wingtip on the floor is obtained by computing the position of the point on the floor that has the same location, relative to its four surrounding grid points, as does the position of the wingtip (in image pixel co-ordinates) in relation to the positions of the four surrounding grid locations (in image pixel co-ordinates). Thus, in the case of the left wingtip, for example, this computation effectively uses the locations of the four grid points 1, 2, 3 and 4 (see Figure 1), with locations (X1,Y1), (X2,Y2), (X3,Y3) and (X4,Y4) on the floor, and their corresponding image pixel co-ordinates (px1,py1), (px2,py2), (px3,py3) and (px4,py4) respectively, to interpolate the projected position of the pixel co-ordinate (pxL,pyL) on the floor. A similar procedure is used to project the position of the right wingtip (pxR,pyR) on the floor. The construction of the two-dimensional mapping function, and the interpolation, are accomplished by using the Matlab function TriScatteredInterp (equivalent custom code could be written in any language).
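The floor-projection step can be sketched in a few lines. The following Python sketch is illustrative only: scipy’s griddata performs the same piecewise-linear scattered-data interpolation as Matlab’s TriScatteredInterp, and the four calibration points shown here are hypothetical stand-ins for the full set of digitised grid intersections.

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical calibration data: pixel co-ordinates of digitised grid
# intersections, and their known (X, Y) locations on the floor (metres).
# A real calibration would include every intersection of the floor grid.
pixel_pts = np.array([[100.0, 120.0], [340.0, 118.0],
                      [ 98.0, 360.0], [338.0, 362.0]])
floor_pts = np.array([[0.0, 0.0], [0.2, 0.0],
                      [0.0, 0.2], [0.2, 0.2]])

def pixel_to_floor(px, py):
    """Project an image point onto the floor plane by interpolating
    within the calibrated grid (piecewise-linear over a triangulation
    of the digitised intersection points)."""
    X = griddata(pixel_pts, floor_pts[:, 0], (px, py), method='linear')
    Y = griddata(pixel_pts, floor_pts[:, 1], (px, py), method='linear')
    return float(X), float(Y)

# Project a digitised wingtip pixel position onto the floor:
XL, YL = pixel_to_floor(220.0, 240.0)
```

Points outside the convex hull of the digitised grid yield NaN with this method, which is one reason the grid must cover the full working area (see the limitations discussed later).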
Once the positions of the two wingtips have been projected on to the floor, this information can be used to determine the instantaneous position of the bird in three dimensions, as illustrated in Figure 2. In Figure 2, the 3D positions of the left and right wingtips are denoted by M, with co-ordinates (xL,yL,z), and N, with co-ordinates (xR,yR,z), respectively. Their projected points on the floor are denoted by C, with co-ordinates (XL,YL,0), and D, with co-ordinates (XR,YR,0), respectively.
The height of the bird above the floor is established by determining the ratio between the known wingspan of the bird (w) and the projection of its wingspan on the floor, which we denote by W. W, which is equal to the distance between points C and D in Figure 2, is given by

$$W = \sqrt{(X_L - X_R)^2 + (Y_L - Y_R)^2} \qquad (1)$$
We denote the ratio (W/w) by Q.
From the geometrical similarity of the triangles OCD and OMN, and triangles OEF and ORG, we can write

$$Q = \frac{W}{w} = \frac{H}{u} \qquad (2)$$

where H is the height of the ceiling (assumed to be known), and u is the distance of the bird below the ceiling. The height h of the bird above the floor, equal to (H − u), is then computed from (2) as

$$h = H - u = H\left(1 - \frac{1}{Q}\right) \qquad (3)$$

where h is the height of the two wingtips above the floor. The (x,y) co-ordinates of the left and right wingtips can also be computed from the wingspan ratio Q, as follows.
From the similarity of triangles ODF and ONG, and OEF and ORG, we have:

$$\frac{X_R}{x_R} = \frac{Y_R}{y_R} = \frac{H}{u} \qquad (4)$$

which can be rewritten as

$$\frac{X_R}{x_R} = \frac{Y_R}{y_R} = Q \qquad (5)$$
This implies that the (x,y) position co-ordinates of the left wingtip are given by

$$x_L = \frac{X_L}{Q}, \quad y_L = \frac{Y_L}{Q} \qquad (6)$$

and the (x,y) position co-ordinates of the right wingtip are

$$x_R = \frac{X_R}{Q}, \quad y_R = \frac{Y_R}{Q} \qquad (7)$$
Thus, the 3D position co-ordinates of the left and right wingtips are (xL,yL,h) and (xR,yR,h). If we assume that the centre of the bird (the approximate position of its centre of gravity) is located midway between the extended wingtips, then the 3D co-ordinates of the centre of the bird (xc,yc,zc) can be computed as

$$x_c = \frac{x_L + x_R}{2}, \quad y_c = \frac{y_L + y_R}{2}, \quad z_c = h \qquad (8)$$
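Equations (1)–(8) can be collected into a short routine. The following Python sketch is illustrative (the function and variable names are ours, not from the paper); it assumes the wingtip floor projections have already been obtained from the calibration mapping.

```python
import math

def bird_position(XL, YL, XR, YR, wingspan, H):
    """Recover the 3D positions of the wingtips and body centre from the
    floor projections (XL, YL) and (XR, YR) of the fully extended
    wingtips, the known wingspan w, and the ceiling height H (metres)."""
    # Eq. (1): wingspan as projected on the floor
    W = math.hypot(XL - XR, YL - YR)
    # Eq. (2): magnification ratio Q = W / w = H / u
    Q = W / wingspan
    # Eq. (3): height of the wingtips above the floor
    h = H * (1.0 - 1.0 / Q)
    # Eqs. (6)-(7): true (x, y) co-ordinates of the wingtips
    left  = (XL / Q, YL / Q, h)
    right = (XR / Q, YR / Q, h)
    # Eq. (8): centre of the bird, midway between the wingtips
    centre = ((left[0] + right[0]) / 2, (left[1] + right[1]) / 2, h)
    return left, right, centre

# A bird with a 0.30 m wingspan whose projected wingspan is 0.60 m
# under a 2.44 m ceiling is halfway up: Q = 2, h = 1.22 m.
left, right, centre = bird_position(1.0, 0.5, 1.0, 1.1, 0.30, 2.44)
```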
However, computing the centre of the bird in this way is valid only at the instants when the wings are fully extended. At other times the wings would be pointing either forward or backward, and this calculation would yield an incorrect result. Another way to define the centre of the bird would be as the centre of its thorax, but the thorax can be difficult to segment and track in the images. In addition, the thorax can execute pitch and roll movements, thus introducing variability in the measurements. During flight, the head is the most stable part of the bird’s anatomy-it maintains a horizontal orientation that is largely independent of the pitch and roll attitude of the body (Warrick et al., 2002; Frost, 2009; Bhagavatula, 2011). It is also a highly visible part of the bird that can be tracked reliably - either manually through frame-by-frame digitisation, or by software algorithms that employ relatively simple heuristics. Moreover, the head carries the bird’s primary sense organs, including the eyes. Therefore, reconstructing the 3D trajectory of the head can be useful for determining the visual stimuli that the bird experiences during its flight.
The 3D position co-ordinates of the head can be calculated for each frame as follows. The pixel co-ordinates of the head are determined in every frame (either through manual digitisation or an automated tracking algorithm). The head pixel co-ordinates are projected on to the floor, using the same interpolation procedure that was applied to the wingtips. We denote the floor co-ordinates of the head by (XH,YH) (not shown in Figure 2). Then, using the same geometrical reasoning as above, the (x,y) position co-ordinates of the head are given by

$$x_H = \frac{X_H}{Q}, \quad y_H = \frac{Y_H}{Q} \qquad (9)$$

and the full 3D co-ordinates of the head are given by

$$(x_H,\; y_H,\; h) \qquad (10)$$
We note that the height of the head (h) is directly calculable only in the frames in which the wings are fully extended, because the bird’s wingspan is the known metric that enables determination of the height. The heights in other frames are estimated through temporal interpolation, assuming that the height varies approximately linearly between successive wing extensions. This is a reasonable assumption for most birds: typically, the height of flight varies slowly and smoothly across several wingbeat cycles. The (x,y) co-ordinates of the head (xH,yH), on the other hand, are calculated independently for each frame of the video sequence, from the digitised pixel co-ordinates of the head in that frame and the temporally interpolated height for that frame.
In summary, our method delivers a sample of the bird’s height at every frame in which the wings are extended. These samples are interpolated in time to obtain a height profile of the head for the entire video sequence. This height profile is then used in combination with the pixel co-ordinates of the head in each frame to obtain the 3D co-ordinates of the bird for each frame of the video sequence.
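The per-frame reconstruction summarised above can be sketched as follows. This is a minimal illustration: the ceiling height, Wex frame numbers, height samples and head projections are all hypothetical placeholders.

```python
import numpy as np

H = 2.44  # ceiling height in metres (assumed known)

# Hypothetical inputs: frame indices of the Wex frames, the heights
# computed there from the wingspan ratio, and the floor-projected
# head co-ordinates (XH, YH) digitised in every frame.
wex_frames  = np.array([0, 12, 24, 36])
wex_heights = np.array([1.20, 1.22, 1.25, 1.24])
frames = np.arange(37)
XH = np.linspace(0.0, 1.8, 37)   # floor projection of the head, per frame
YH = np.full(37, 0.6)

# Linearly interpolate the height between successive wing extensions
h = np.interp(frames, wex_frames, wex_heights)

# Eq. (9): per-frame (x, y) of the head, using Q = H / (H - h)
Q = H / (H - h)
xH, yH = XH / Q, YH / Q

# Eq. (10): the reconstructed 3D head trajectory, one row per frame
trajectory = np.column_stack([xH, yH, h])
```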
In Budgerigars, the wings are fully outstretched only once during each wingbeat cycle–roughly halfway through the downstroke, as we have noted above. This also appears to be the case in pigeons and magpies (Tobalske and Dial, 1996). It is possible that in certain other species, which move their wings in the same plane during the upstroke and the downstroke, without folding them, there are two Wex frames per wingbeat cycle - one occurring during the upstroke, the other during the downstroke. In such cases we can obtain two height estimates per wingbeat cycle, and therefore reconstruct the height profile at twice the temporal resolution.
In the above analysis, we have assumed that the head of the bird is at the same height as that of its extended wingtips. If the head is at a different height - as may be evinced from prior knowledge or from side-view images of bird flight in wind tunnels - this known height offset can be added to the wingtip height to obtain the true height of the head.
Procedural steps
Based on the theory described above, the step-by-step procedure for reconstructing the 3D trajectory of the head of a bird from a video sequence captured by a single overhead camera can be described as follows:
(i) Construct the floor grid and acquire an image of the grid from the video camera. An example is shown in Figure 3. The grid is used only once for the camera calibration, and does not need to be present in the experiments.
(ii) Digitise the pixel co-ordinates of the grid locations in the camera image, to obtain a one-to-one mapping between the real co-ordinates of the grid locations on the floor and their corresponding pixel coordinates in the image.
(iii) Acquire knowledge of the bird’s wingspan, either from published data for the species, or, preferably, from direct measurement of the actual individual (because the wingspan can vary from individual to individual due to age and other factors).
(iv) Acquire video footage of the bird during flight in the chamber.
(v) Select the frames in the video sequence in which the wings are fully extended. The selection can be done either manually, or through custom-written software. The wing-extension frames are denoted by Wex.
(vi) Digitise the pixel positions of the left and right wingtips of the bird in each of the Wex frames, as shown in the illustrative example of Figure 4.
(vii) Determine the height of the extended wingtips (and therefore the head) in each of the Wex frames from equns (1–3).
(viii) Obtain the height profile of the head for the entire video sequence by temporally interpolating the heights calculated for the Wex frames.
(ix) Digitise the pixel position of the head in each frame of the video sequence.
(x) Compute the 3D position of the head for each frame from equns (9) and (10).
Test of accuracy
The precision of the 3D trajectory reconstruction procedure was evaluated by placing a small test target at 44 different, known 3D locations within the tunnel, of which 39 were within the boundary of the grid. The test target was a model bird with a calibrated wingspan of 30 cm. The head was assumed to be midway between the wingtips, and at the same height as the wingtips. This assumption does not affect the generality of the results, as discussed above. The standard deviations of the errors along the X, Y and height (H) directions were 2.1 cm (X), 0.6 cm (Y) and 2.6 cm (H). A detailed compilation of the errors is given in Table S1 of the SI.
Ethics Statement
All experiments were carried out in accordance with Australian law on the protection and welfare of laboratory animals, and with the approval of the Animal Experimentation Ethics Committees of the University of Queensland, Brisbane, Australia.
Results
Examples of flight tracking and reconstruction
Here we show some examples of reconstruction of 3D trajectories of flights of Budgerigars through an indoor tunnel, of dimensions 7.28 m (length) × 1.36 m (width) × 2.44 m (height). The birds were trained to fly from a perch at one end of the tunnel to a bird cage at the other. A downward-facing video camera, placed at the centre of the ceiling of the tunnel, was used to film the flights and reconstruct the trajectories in 3D. A grid, of check size 20 cm × 20 cm (as in Figure 3), was drawn on the floor to calibrate the camera using the procedure described above. The reconstructed 3D trajectories do not include the take-off and landing phases of the flight. They show only the section of the trajectory extending from about 1.75 m ahead of the aperture to about 0.25 m behind it, which can be viewed as a ‘cruise’ phase in which the bird has completed take-off and not yet commenced landing.
Flights through the tunnel were filmed with the tunnel either empty (devoid of any obstacles) or carrying a narrow, vertically oriented aperture (a slit) at the halfway point, through which the birds had to fly to get to the other end. To prevent injuries to the birds, the aperture was created by suspending two cloth panels that reached from the ceiling to the floor. Two aperture widths were tested: in one set of tests, the aperture was 5 cm wider than the bird’s wingspan; in the other set, it was 5 cm narrower than the bird’s wingspan. It has been shown in earlier studies (Schiffner et al., 2014; Vo et al., 2016) that Budgerigars are acutely aware of their wingspan: when negotiating a narrow aperture, they fold their wings back briefly only when the aperture is narrower than their wingspan, and fly through without interrupting their wingbeat cycle if the aperture is wider than their wingspan.
A plan view of a reconstructed flight is shown in Figure 4. In this example, the bird (Four) has a wingspan of 29 cm and it flies through a 34 cm aperture, which is 5 cm wider than the wingspan. The figure shows the (X,Y) positions of the two wingtips at the time of each wing extension, the centre of the body (defined as the midpoint of the line joining the wingtips), and the position of the head. It is evident that the bird flies through the aperture without interrupting its wingbeat cycle, as the wingbeat extensions are equally spaced. This is also clear from Figure 5, which shows two 3D views of the same flight trajectory, where the blue circles represent the centre of the body at each wing extension and the red curve shows the reconstructed 3D position of the head for every frame, as described in the text above and in the legend. The lateral view of the trajectory (Figure 5, right hand panel) shows that the bird maintains its height while passing through the aperture, because the wingbeat cycle is not interrupted.
Figure 6 shows two 3D views of a trajectory of the same bird during flight through an aperture that is 5 cm narrower than its wingspan. Here it is clear that the wingbeat cycle is interrupted when the bird passes through the aperture – the distance between successive wing extensions is dramatically larger during the passage. This is also clear from the lateral view of the trajectory (Figure 6, right hand panel), which shows that the bird loses altitude while passing through the aperture, because the wingbeat cycle is interrupted.
Figure 7 shows two 3D views of a trajectory of the same bird during flight through the tunnel when there is no aperture. In this case – as in Figure 5 – the wingbeat cycle is not interrupted anywhere in the flight. This is also clear from the lateral view of the trajectory (Figure 7, right hand panel), which shows that the bird maintains a constant wingbeat cycle and does not lose altitude abruptly anywhere along the trajectory.
It is clear from Figs. 4–7 that bird Four interrupts its wingbeat cycle only when it confronts an aperture that is narrower than its wingspan, and not when the aperture is wider than the wingspan or is not present in the tunnel. A loss of altitude occurs only when the wingbeat cycle is interrupted, and not otherwise.
Figure 8 shows plan views of the reconstructed 3D trajectories of the head for the three conditions. In each case, the asterisks mark the locations of the head at the times of full wing extension. Other details are given in the figure legend. In the case of the narrow aperture (red track), the bird temporarily interrupts its wing beat cycle while passing through the aperture. The final wing extension prior to passing the aperture occurs at a point approximately 0.35 m ahead of the aperture. The wing beat cycle resumes after passage through the aperture, with the first wing extension occurring at a point approximately 0.5 m beyond the aperture. In the wide aperture and the no aperture conditions, the wing beat cycle continues uninterrupted throughout the flight. These observations are in agreement with those of Schiffner et al. (2014), who report an exquisite ability of these birds to gauge the width of oncoming passages in relation to their wingspan. However, their study only recorded the frequency and timing of wing closures, and did not reconstruct the birds’ trajectories in 3D.
Figure 9 shows reconstructed profiles of the forward flight speed (speed along the X axis of the tunnel) for the flights of bird Four in the narrow aperture, wide aperture and no-aperture conditions. These profiles were constructed using three different procedures, the details of which are described in the legend. The three procedures yield consistent results. The principal observation is that the forward speed is more or less constant throughout the flight and is independent of the flight condition, as observed in Vo et al. (2016). Interestingly, the interruption of the wing beat cycle during the flight through the narrow aperture does not significantly reduce the forward speed.
In the SI (Figures S1–S6) we show results for another bird (Nemo), corresponding to those shown above for bird Four.
Discussion
This study has described a simple, inexpensive method for reconstructing the flight trajectories of birds in 3D, using a single video camera. The advantages of the method are:
i) The technique does not use a conventional stereo-based approach. Therefore, it does not require complex calibration procedures involving capturing views of a checkerboard at various positions and orientations, which does not always guarantee accurate localisation in all regions of the experimental space.
ii) The technique does not need feature correspondences to be determined across video frames from two or more cameras.
iii) The grid marker on the floor provides a calibration of the camera geometry and accounts for all of the distortions in the camera optics. There is no need to assume that the camera can be approximated by a pinhole camera, or by any other specific optical geometry. This calibration is a one-off procedure that can be used for the rest of the lifetime of the camera/lens combination, provided the optics are not altered.
iv) Once the calibration has been performed, the calibration grid can be removed or covered (if this is necessary to prevent its potential influence on the behaviour of the birds in the experiments).
v) When a bird glides with its wings outstretched, its height (and therefore the 3D coordinates of the wingtips and the head) can be reconstructed in every frame without requiring any interpolation.
vi) Moving the camera to a different location does not require recalibration. 3D trajectories of birds can continue to be reconstructed with reference to the new optical axis of the camera and the new plane of the (internally stored) calibration grid, as long as the lines joining the wingtips of the bird at each wing extension are parallel to the plane of the calibration grid, as was the case during the original calibration. Thus, in principle, the camera that was calibrated in our experiments using the calibration grid on the floor, can also be used to reconstruct the trajectories of birds in outdoor flight by facing the camera upwards and performing the reconstruction relative to the same calibration grid. Trajectory reconstruction is possible even if a bird is located on the opposite side of the calibration grid – the geometry underlying the reconstruction will be the same, but will involve extrapolation rather than interpolation.
vii) Because the method is computationally simple, it can be applied in closed-loop experimental paradigms in which visual stimuli need to be modified in real time in response to the bird’s flight, as is now being done with some animals (e.g. Stowers et al., 2017).
viii) The system is compact, portable, and easily deployed in the field.
The limitations of the method are:
i) We have assumed that the wings are fully extended at each extension, and that the tip-to-tip distance at these extensions is always equal to the measured wingspan. Variability in the wingtip distances from extension to extension (which may occur during certain manoeuvres) will introduce errors in the reconstruction of the 3D trajectory.
ii) The height estimates are accurate only when the lines joining the extended wingtips are parallel to the calibration plane. This may not be the case when a bird rolls during a tight turn. The pixel distance between the extended wingtips may then be shorter in the camera image, leading to an underestimate of the bird’s height (increase in the estimated distance of the bird from the camera).
iii) The calibration grid on the floor must cover a sufficiently large area to enable projection of the wingtips on to the floor at all possible bird positions. This could be a problem when the bird is flying close to the ceiling or to one of the walls of the tunnel (or chamber), as it would require extrapolation of the grid beyond the floor of the chamber. Grid extrapolation can be carried out, but it requires assumptions to be made about the unknown optical distortions in the extrapolated regions of the grid.
iv) The method requires selection of the Wex frames in the video sequence, determination of the pixel co-ordinates of the left and right wingtips in each of the Wex frames, and determination of the pixel co-ordinates of the head in each frame of the video sequence. While we have carried out all of these operations manually, they are tedious and time-consuming. Automated tracking and digitisation of the wingtips and the head in the video sequence can be incorporated as an additional ‘front end’ to the system, which we are currently exploring.
v) The technique delivers true altitude measurements only at each full wing extension. Altitudes at the intermediate frames can be obtained by linear (or spline-based) interpolation. These interpolated heights can be combined with the digitized image position of the head in each frame to obtain a continuous, frame-by-frame 3D trajectory of the bird’s head. The result should be reasonably accurate, provided that the bird’s altitude varies smoothly between successive wing extensions. This is very likely to be the case in cruising flight, but may not apply during flight in densely cluttered environments which may entail abrupt changes of altitude as well as variations in the wing kinematics.
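As a sketch of this interpolation step, the linear and spline-based variants can be compared on hypothetical height samples; both pass exactly through the heights measured at the Wex frames, and differ only in how they bridge the frames in between.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical height samples at the Wex frames (one per wingbeat cycle)
wex_frames  = np.array([0.0, 10.0, 20.0, 30.0])
wex_heights = np.array([1.20, 1.23, 1.21, 1.25])
frames = np.arange(31)

# Linear interpolation between successive wing extensions
h_linear = np.interp(frames, wex_frames, wex_heights)

# Spline-based alternative, smoother between the samples
h_spline = CubicSpline(wex_frames, wex_heights)(frames)
```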
Potential future applications of the method presented in this paper include:
i) Tracking of birds in natural outdoor environments by using an upward-facing wide-angle camera, as discussed briefly above. The species of the bird would have to be known, however, in order to use an estimate of its wingspan.
ii) Reconstruction of 3D flight trajectories of airplanes. In such an application, the 3D coordinates of the airplane can be estimated accurately in every frame without any need for interpolation, because the wingspan is constant (as in a gliding bird). Again, the model of the aircraft would need to be known or identified, in order to use an estimate of its wingspan.