CONTENTS
1. PROJECT BACKGROUND
1.1 Abstract
1.2 Motivation
1.3 Goals
2. LITERATURE
2.1 Pedestrian Extraction & Classification
2.2 Movement Recognition
BIBLIOGRAPHY
The full report can be downloaded here.
1. PROJECT BACKGROUND
1.1 Abstract
Image processing is often used in urban surveillance for automation, logistics and security. These operations monitor people, events and traffic in order to maintain safety and order. However, most urban surveillance still relies on many human operators monitoring CCTV feeds 24 hours a day. This report describes a system designed and implemented to help understand the requirements and techniques for automating this procedure, so that pedestrians are monitored and human operators are alerted only when necessary. This requires a combination of movement detection and tracking to determine the behaviour of pedestrians in a scene, followed by analysis to locate suspicious behaviour. The adopted approach is a two-tier system: image analysis via background removal and blob tracking, followed by analysis of the resulting movement data. The approach combines shape analysis within the tracking framework, shown here to good effect. This contribution will assist the future development of surveillance systems, which may adopt the combination of methods tested in this report in order to create more robust implementations.
1.2 Motivation
Typically, human operators monitor Closed Circuit Television (CCTV) placed in towns and cities, alerting authorities when they spot pedestrians in danger, acting suspiciously, or behaving in an unacceptable manner. This process uses very little automation. CCTV operators are often employed to continually monitor a number of CCTV streams, perhaps up to twenty or thirty cameras each. The routine nature of this task decreases the concentration of the operators, leading to events that are either missed or ignored due to ‘video-blindness’, possibly at the expense of others. These problems are not helped by the fact that most CCTV data are discarded: the only segments saved are those that an operator has deemed worth recording, or that have been picked up by very basic movement-detection software. As a result, operators may miss data of interest, and any data recorded automatically still requires human interpretation at a later date.
Automated surveillance would help streamline this process by identifying video streams that an operator may be interested in. A system that could observe CCTV feeds and flag events of interest would be of huge value to security operations. While unable to interpret and act on the images to the same degree of accuracy as a human operator, such a system would provide a safety net, potentially notifying an operator of behaviour they would otherwise have missed.
1.3 Goals
The goal of this project is to identify new methods and create a test system capable of extracting pedestrian movement information from video similar to that collected by CCTV, such that any movement can be analysed to spot suspicious activity. This process involves collecting data, then researching and developing appropriate methods to extract pedestrian motion from video, and finally analysing this data further to identify abnormal motion. The secondary goals are to identify the limitations of the approach as well as the system and data requirements for the techniques to work effectively. The implementation should lay down a generic framework that could be used to further improve the techniques, based on the approaches identified and the results of their application. Specifically, the project goals are:
2. LITERATURE
There is plenty of image processing research capable of, for example, calculating the density or motion of large crowds. There are also studies that determine the movement of a single individual by analysing their orientation and limb positions. Unfortunately, there is less research covering small groups of people, such as pedestrians, and even less about interpreting such data. However, the image processing studies that do exist are very useful: they describe effective methods and approaches that will be needed to extract data from the video.
2.1 Pedestrian Extraction & Classification
There are many ways to extract information describing the position of pedestrians in a scene, but not all of them are suitable for tracking movement under changing light and weather conditions. Papageorgiou and Poggio developed a system trained to recognise human figures [17] based on pixel similarities with a large training set of figures under various light and weather conditions. The similarity of matches across consecutive frames is then analysed to determine the movement of the matched figures. This approach works well when the training set is large, but is computationally intensive, taking many minutes per frame of video. The study does, however, show that accurate recognition can be done with coarse image data: the training set of figures consisted of thousands of low-resolution images.
S. A. Velastin et al. investigate the analysis of movement in crowd scenes very similar to those of this project, outlining several methods and their effectiveness when combined with textural analysis [13]. In particular, the ‘Gray Level Dependence Matrix’ (a co-occurrence matrix) is compared with other approaches to textural analysis. The results demonstrate the effectiveness of simple statistical analysis of texture versus neural networks when measuring crowd density. Unfortunately, this research only examines static images and hence does not cover crowd motion, but the techniques it describes could potentially be combined with motion estimation techniques.
A study conducted by Heisele and Woehler uses image segmentation [12]: raw data is filtered to split the image into segments, and those segments that match particular shapes are analysed further. This exploits the fact that colour and luminance commonly stay relatively constant across an entire patch (such as clothes), allowing a figure to be accurately identified.
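The co-occurrence matrix mentioned above in connection with textural analysis can be illustrated with a minimal sketch. This is not the implementation used in [13]; the function names, the pixel offset `(dx, dy)`, and the choice of contrast as the summary statistic are assumptions made purely for illustration. The image is assumed to be a list of rows of integer grey levels in the range `0..levels-1`.

```python
def cooccurrence_matrix(image, levels, dx=1, dy=0):
    """Count pairs of grey levels separated by the offset (dx, dy).

    counts[i][j] is the number of pixels with grey level i whose
    neighbour at (x + dx, y + dy) has grey level j.
    """
    counts = [[0] * levels for _ in range(levels)]
    height = len(image)
    width = len(image[0]) if height else 0
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            # Skip pairs whose neighbour falls outside the image.
            if 0 <= ny < height and 0 <= nx < width:
                counts[image[y][x]][image[ny][nx]] += 1
    return counts


def contrast(matrix):
    """Contrast statistic: mean squared grey-level difference over all pairs."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return 0.0
    weighted = sum(
        (i - j) ** 2 * value
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
    )
    return weighted / total


# A tiny 3x3 image with three grey levels (hypothetical data).
patch = [
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 2],
]
glcm = cooccurrence_matrix(patch, levels=3)
print(glcm)
print(contrast(glcm))
```

A smooth patch concentrates its counts on the diagonal of the matrix and gives a low contrast, whereas a finely textured patch spreads counts away from the diagonal. Statistics of this kind are what allow texture to be summarised by a handful of numbers.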
Optical flow has also been applied to the problem, in particular in obstacle-avoidance systems [9], where pixel intensities are analysed for patterns that signify a shape. Velastin thoroughly researched approaches with specific regard to urban crowd movement and image processing: his papers [6] and [5] investigate the movement of pedestrians, specifically with CCTV footage. Velastin describes some general methods for detecting motion after background removal, including optical flow, which is popular due to its effectiveness, but only when certain prerequisites are met (such as a high frame rate). Velastin also details a related but far more effective algorithm operating on pixel intensities, at the apparent cost of greater processing requirements.
The majority of systems use a different approach, instead pre-processing the video to remove irrelevant data such as the background. A technique by Latzel and Tsotsos operates on the processed images, observing motion captured across many successive images, allowing various styles of motion to be detected [3]. Similarly, the technique developed by Vannoorenberghe et al. [10] detects changes in pixel intensities that match a predetermined model, such as the period over which pixel intensities change in a particular manner. Both of these systems are effective and can work in real-time, but require the application of complex modelling and simulation.
‘Blob analysis’ raises the level of abstraction by extracting areas of interest in the original video: ‘blobs’ representing objects not usually present in the scene. The shape and dimensions of the blobs are analysed to determine patterns. This is the approach used by Masoud [1], implemented in real-time using specialist hardware, in which the data is captured by a single camera. It proves to be exceptionally accurate when tracking blobs, even when they are partially or totally occluded.
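The combination of background removal and blob extraction described above can be sketched as follows. This is a minimal illustration and not the method of [1]: it assumes a static, known background image, greyscale frames stored as lists of rows of integers, a fixed intensity threshold, and 4-connected flood fill for grouping pixels. The names `foreground_mask`, `extract_blobs`, `threshold` and `min_area` are hypothetical.

```python
from collections import deque


def foreground_mask(frame, background, threshold=30):
    """Mark pixels that differ from the background by more than threshold."""
    return [
        [abs(pixel - reference) > threshold
         for pixel, reference in zip(frame_row, background_row)]
        for frame_row, background_row in zip(frame, background)
    ]


def extract_blobs(mask, min_area=1):
    """Group 4-connected foreground pixels into blobs.

    Returns one dict per blob with its pixel area, bounding box
    (min_x, min_y, max_x, max_y) and centroid (x, y).
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_area:
                ys = [p[0] for p in pixels]
                xs = [p[1] for p in pixels]
                blobs.append({
                    "area": len(pixels),
                    "box": (min(xs), min(ys), max(xs), max(ys)),
                    "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
                })
    return blobs


# Hypothetical 4x5 scene: uniform background with two bright objects.
background = [[10] * 5 for _ in range(4)]
frame = [row[:] for row in background]
for y, x in ((0, 0), (0, 1), (2, 3), (3, 3), (3, 4)):
    frame[y][x] = 200

blobs = extract_blobs(foreground_mask(frame, background))
for blob in blobs:
    print(blob)
```

Raising `min_area` discards small blobs, which is one simple way to suppress noise. A practical system would also need to update the background over time to cope with changing light, which this sketch deliberately leaves out.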
Masoud's approach operates by identifying not only the blobs present in each frame, but also the pixel differences between frames in order to identify movement. Yonemoto et al. use a similar method to support 3D object tracking via two cameras, in which limbs can be accurately identified and tracked [14].
2.2 Movement Recognition
Spatio-temporal analysis has in the past been used to recognise walking persons, where subspaces in the video are treated as spatio-temporal volumes [15]. Applying a Fourier transform to this data can then identify data relating to movement across the volume. This approach has the advantage of operating with relatively low computational intensity on a variety of entities, allowing trajectories to be accurately reconstructed from a video. However, this method has not been documented working on multiple entities simultaneously.
The common approach to detecting movement is to produce difference images (an image representing the details that differ between two images), since this is computationally efficient [1]. The difference image can then be analysed further to extract movement vectors that describe the motion of the blobs captured in the respective images.
Murakami and Wada demonstrate another approach, discarding the difference frame and instead comparing the properties of blobs identified in consecutive frames [4]. A blob that is close to the position of a blob in a previous frame, and shares similar dimensions, is likely to refer to the same figure. Motion vectors are also used to segment blobs, which are subsequently merged or separated for the purpose of analysis. The same approach is applied to a 2D image to determine movement in 3D space. Extrapolating the movement of pedestrians in 3D space from a 2D image allows a far greater understanding of the interactions between entities, but does require exceptional calibration of equipment for complete accuracy. Murakami and Wada’s approach works on low-quality video streams thanks to the frame-differencing algorithm and some trigonometry. Determining 3D motion does require precise knowledge of the angle and position of the camera, in addition to the basic topology of the scene being analysed, but even without these details, 2D paths are easy to identify.
BIBLIOGRAPHY
[1] Masoud, O. and Papanikolopoulos, N. P., A novel method for tracking and counting pedestrians in real-time using a single camera, IEEE Transactions on Vehicular Technology, vol. 50, n. 5, p 1267-78, Sept. 2001.
[2] Seki, M., Fujiwara, H. and Sumi, K., A Robust Background Subtraction Method for Changing Background, Proceedings IEEE Workshop on Applications of Computer Vision, p 207-213, 2000.
[3] Latzel, M. and Tsotsos, J. K., A robust motion detection and estimation filter for video signals, Proceedings 2001 International Conference on Image Processing, vol. 1, pt. 1, p 381-4, 2001.
[4] Murakami, S. and Wada, A., An automatic extraction and display method of walking persons' trajectories, Proceedings 15th International Conference on Pattern Recognition, vol. 4, pt. 4, p 611-14.
[5] Velastin, S. A., Analysis of Crowd Movements and Densities in Built-up Environments using Image Processing, IEE Colloquium 1993/236, p 8/1-6.
[6] Velastin, S. A., Automated Measurement of Crowd Density and Motion using Image Processing, Seventh International Conference on 'Road Traffic Monitoring and Control' 1994, p 127-32.
[7] Cunado, D., Nixon, M. S. and Carter, J. N., Automatic Gait Recognition via Model-Based Evidence Gathering, Proceedings IEEE Workshop on Identification Advanced Technologies, p 27-30, 1999.
[8] Hosoi, R., Ishijima, S. and Kojima, A., Dynamical model of a pedestrian in a crowd, Proceedings 5th IEEE International Workshop on Robot and Human Communication, p 44-9, 1996.
[9] Kai-Tai Song and Jui-Hsiang Huang, Fast optical flow estimation and its application to real-time obstacle avoidance, Proceedings 2001 IEEE International Conference on Robotics and Automation, vol. 3, pt. 3, p 2891-6, 2001.
[10] Vannoorenberghe, P., Motamed, C., Blosseville, J. M. and Postaire, J. G., Monitoring pedestrians in a uncontrolled urban environment by matching low-level features, 1996 IEEE International Conference on Systems, Man and Cybernetics, vol. 3, pt. 3, p 2259-64, 1996.
[11] Boghossian, B. A. and Velastin, S. A., Motion-based Machine Vision Techniques for the Management of Large Crowds, Sixth IEEE International Conference on Electronics, Circuits and Systems 1999, Proceedings, vol. 2, p 961-4.
[12] Heisele, B. and Woehler, C., Motion-based recognition of pedestrians, Proceedings Fourteenth International Conference on Pattern, vol. 2, pt. 2, p 1325-30, 1998.
[13] Marana, A. N., Costa, L. F., Lotufo, R. A. and Velastin, S. A., On the Efficacy of Texture Analysis for Crowd Monitoring, SIBGRAPI'98 1998, Proceedings, p 354-61.
[14] Yonemoto, S., Nakano, H. and Taniguchi, R., Real-time human figure control using tracked blobs, Proceedings 12th International Conference on Image Analysis and Processing, p 127-32, 2003.
[15] Ricquebourg, Y. and Bouthemy, P., Real-time tracking of moving persons by exploiting spatio-temporal image slices, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, n. 8, p 797-808, Aug. 2000.
[16] Cheung, S. C. and Kamath, C., Robust techniques for background subtraction in urban traffic video, Video Communications and Image Processing, SPIE Electronic Imaging, San Jose, Jan. 2004.
[17] Papageorgiou, C. and Poggio, T., Trainable Pedestrian Detection, Proceedings 1999 International Conference on Image Processing, vol. 4, pt. 4, p 35-9, 1999.
[18] Curio, C., Edelbrunner, J., Kalinke, T., Tzomakas, C. and von Seelen, W., Walking pedestrian recognition, Proceedings IEEE Conference on Intelligent Transportation Systems, p 292-297, 1999.
[19] Rourke, A. and Bell, M. G. H., Wide Area Pedestrian Monitoring using Video Image Processing, International Conference on Image Processing and its Applications, p 563-6, 1992.