Publications (10 of 78)
Wallenberg, M. & Forssén, P.-E. (2017). Attentional Masking for Pre-trained Deep Networks. In: Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS17): . Paper presented at The 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017), September 24–28, Vancouver, Canada. Institute of Electrical and Electronics Engineers (IEEE)
Attentional Masking for Pre-trained Deep Networks
2017 (English). In: Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS17), Institute of Electrical and Electronics Engineers (IEEE), 2017. Conference paper, Published paper (Refereed).
Abstract [en]

The ability to direct visual attention is a fundamental skill for seeing robots. Attention comes in two flavours: the gaze direction (overt attention) and attention to a specific part of the current field of view (covert attention), of which the latter is the focus of the present study. Specifically, we study the effects of attentional masking within pre-trained deep neural networks for the purpose of handling ambiguous scenes containing multiple objects. We investigate several variants of attentional masking on partially pre-trained deep neural networks and evaluate the effects on classification performance and sensitivity to attention mask errors in multi-object scenes. We find that a combined scheme consisting of multi-level masking and blending provides the best trade-off between classification accuracy and insensitivity to masking errors. This proposed approach is denoted multilayer continuous-valued convolutional feature masking (MC-CFM). For reasonably accurate masks it can suppress the influence of distracting objects and reach comparable classification performance to unmasked recognition in cases without distractors.
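
A minimal sketch of the masking idea described above is given below. It is not the authors' MC-CFM implementation; it only illustrates, with assumed function and parameter names (mask_feature_maps, alpha), how a continuous-valued attention mask could be resized to each feature-map resolution and blended into the responses at several layers.

```python
import numpy as np

def mask_feature_maps(feature_maps, attention_mask, alpha=0.5):
    """Blend a continuous-valued attention mask into conv feature maps.

    feature_maps  : list of arrays of shape (H_l, W_l, C_l), one per layer
    attention_mask: array in [0, 1] of shape (H, W), at image resolution
    alpha         : blending weight (alpha=1 hard masking, alpha=0 no masking)
    Illustrative sketch only; names and the blending rule are assumptions.
    """
    masked = []
    for fmap in feature_maps:
        h, w = fmap.shape[:2]
        # Nearest-neighbour resize of the mask to this layer's resolution.
        rows = np.linspace(0, attention_mask.shape[0] - 1, h).astype(int)
        cols = np.linspace(0, attention_mask.shape[1] - 1, w).astype(int)
        m = attention_mask[np.ix_(rows, cols)][..., None]   # (h, w, 1)
        # Blend between the unmasked and the masked response.
        masked.append((1.0 - alpha) * fmap + alpha * m * fmap)
    return masked
```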

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2017
National Category
Computer Vision and Robotics (Autonomous Systems); Computer Systems
Identifiers
urn:nbn:se:liu:diva-142061 (URN); 10.1109/IROS.2017.8206516 (DOI); 000426978205110 (); 978-1-5386-2682-5 (ISBN)
Conference
The 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017), September 24–28, Vancouver, Canada
Note

Funding agencies: Swedish Research Council [2014-5928]; Linköping University

Available from: 2017-10-20 Created: 2017-10-20 Last updated: 2018-04-11. Bibliographically approved
Ogniewski, J. & Forssén, P.-E. (2017). Pushing the Limits for View Prediction in Video Coding. In: 12th International Conference on Computer Vision Theory and Applications (VISAPP’17): . Paper presented at 12th International Conference on Computer Vision Theory and Applications (VISAPP'17), 27 February-1 March, Porto, Portugal. Scitepress Digital Library
Pushing the Limits for View Prediction in Video Coding
2017 (English). In: 12th International Conference on Computer Vision Theory and Applications (VISAPP’17), Scitepress Digital Library, 2017. Conference paper, Published paper (Refereed).

Place, publisher, year, edition, pages
Scitepress Digital Library, 2017
National Category
Computer Vision and Robotics (Autonomous Systems); Computer Engineering
Identifiers
urn:nbn:se:liu:diva-142063 (URN)
Conference
12th International Conference on Computer Vision Theory and Applications (VISAPP'17), 27 February-1 March, Porto, Portugal
Available from: 2017-10-20 Created: 2017-10-20 Last updated: 2018-01-13. Bibliographically approved
Ogniewski, J. & Forssén, P.-E. (2017). What is the best depth-map compression for Depth Image Based Rendering?. In: Michael Felsberg, Anders Heyden and Norbert Krüger (Ed.), Computer Analysis of Images and Patterns: 17th International Conference, CAIP 2017, Ystad, Sweden, August 22-24, 2017, Proceedings, Part II. Paper presented at 17th International Conference, CAIP 2017, Ystad, Sweden, August 22-24 (pp. 403-415). Springer, 10425
What is the best depth-map compression for Depth Image Based Rendering?
2017 (English). In: Computer Analysis of Images and Patterns: 17th International Conference, CAIP 2017, Ystad, Sweden, August 22-24, 2017, Proceedings, Part II / [ed] Michael Felsberg, Anders Heyden and Norbert Krüger, Springer, 2017, Vol. 10425, p. 403-415. Conference paper, Published paper (Refereed).
Abstract [en]

Many of the latest smart phones and tablets come with integrated depth sensors that make depth-maps freely available, thus enabling new forms of applications like rendering from different view points. However, efficient compression exploiting the characteristics of depth-maps as well as the requirements of these new applications is still an open issue. In this paper, we evaluate different depth-map compression algorithms, with a focus on tree-based methods and view projection as the application.

The contributions of this paper are the following: 1. extensions of existing geometric compression trees, 2. a comparison of a number of different trees, 3. a comparison of them to a state-of-the-art video coder, 4. an evaluation using ground-truth data that considers both depth-maps and predicted frames with arbitrary camera translation and rotation.

Despite our best efforts, and contrary to earlier results, current video depth-map compression outperforms tree-based methods in most cases. The reason for this is likely that previous evaluations focused on low-quality, low-resolution depth maps, while high-resolution depth (as needed in the DIBR setting) has been ignored up until now. We also demonstrate that PSNR on depth-maps is not always a good measure of their utility.
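
For reference, the remark that PSNR on depth-maps can be a poor utility measure refers to the standard PSNR definition, sketched below for depth maps. The helper name depth_psnr and the handling of invalid depths are illustrative assumptions, not code from the paper.

```python
import numpy as np

def depth_psnr(depth_ref, depth_test, max_depth=None):
    """Peak signal-to-noise ratio (dB) between two depth maps.

    Standard PSNR definition; max_depth defaults to the peak of the
    reference map, and invalid (zero) reference depths are ignored.
    """
    valid = depth_ref > 0
    if max_depth is None:
        max_depth = depth_ref[valid].max()
    mse = np.mean((depth_ref[valid] - depth_test[valid]) ** 2)
    return 10.0 * np.log10(max_depth ** 2 / mse)
```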

Place, publisher, year, edition, pages
Springer, 2017
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 10425
Keywords
Depth map compression; Quadtree; Triangle tree; 3DVC; View projection
National Category
Computer Vision and Robotics (Autonomous Systems); Computer Systems
Identifiers
urn:nbn:se:liu:diva-142064 (URN); 10.1007/978-3-319-64698-5_34 (DOI); 000432084600034 (); 2-s2.0-85028463006 (Scopus ID); 9783319646978 (ISBN); 9783319646985 (ISBN)
Conference
17th International Conference, CAIP 2017, Ystad, Sweden, August 22-24
Funder
Swedish Research Council, 2014-5928
Note

VR Project: Learnable Camera Motion Models, 2014-5928

Available from: 2017-10-20 Created: 2017-10-20 Last updated: 2018-06-01. Bibliographically approved
Ovrén, H. & Forssén, P.-E. (2015). Gyroscope-based video stabilisation with auto-calibration. In: 2015 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA): . Paper presented at 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26-30 May, 2015 (pp. 2090-2097).
Gyroscope-based video stabilisation with auto-calibration
2015 (English). In: 2015 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2015, p. 2090-2097. Conference paper, Published paper (Refereed).
Abstract [en]

We propose a technique for joint calibration of a wide-angle rolling shutter camera (e.g. a GoPro) and an externally mounted gyroscope. The calibrated parameters are time scaling and offset, relative pose between gyroscope and camera, and gyroscope bias. The parameters are found using non-linear least squares minimisation with the symmetric transfer error as cost function. The primary contribution is methods for robust initialisation of the relative pose and time offset, which are essential for convergence. We also introduce a robust error norm to handle outliers. This results in a technique that works with general video content and does not require any specific setup or calibration patterns. We apply our method to stabilisation of videos recorded by a rolling shutter camera, with a rigidly attached gyroscope. After recording, the gyroscope and camera are jointly calibrated using the recorded video itself. The recorded video can then be stabilised using the calibrated parameters. We evaluate the technique on video sequences with varying difficulty and motion frequency content. The experiments demonstrate that our method can be used to produce high-quality stabilised videos even under difficult conditions, and that the proposed initialisation ends up within the basin of attraction. We also show that a residual based on the symmetric transfer error is more accurate than residuals based on the recently proposed epipolar plane normal coplanarity constraint.
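
One generic way to obtain the kind of time-offset initialisation discussed above is to cross-correlate the gyroscope angular speed with an image-based rotation speed. The sketch below illustrates this idea only and is not necessarily the initialisation method used in the paper; the names gyro_rate, image_rate and dt are assumptions.

```python
import numpy as np

def init_time_offset(gyro_rate, image_rate, dt):
    """Coarse gyro-to-camera time-offset initialisation by cross-correlation.

    gyro_rate : 1D array of gyroscope angular speed magnitudes, sampled every dt
    image_rate: 1D array of image-based rotation speed magnitudes (same rate)
    Returns the lag (in seconds) that best aligns the two signals.
    Generic illustration only; the paper's initialisation may differ.
    """
    g = gyro_rate - gyro_rate.mean()
    v = image_rate - image_rate.mean()
    corr = np.correlate(g, v, mode="full")
    lag = int(np.argmax(corr)) - (len(v) - 1)
    return lag * dt
```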

Series
IEEE International Conference on Robotics and Automation ICRA, ISSN 1050-4729
Keywords
Calibration, Cameras, Cost function, Gyroscopes, Robustness, Synchronization
National Category
Electrical Engineering, Electronic Engineering, Information Engineering; Signal Processing
Identifiers
urn:nbn:se:liu:diva-120182 (URN); 10.1109/ICRA.2015.7139474 (DOI); 000370974902014 (); 978-1-4799-6922-7 (ISBN); 978-1-4799-6923-4 (ISBN)
Conference
2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26-30 May, 2015
Projects
LCMM, VPS
Funder
Swedish Research Council, 2014-5928; Swedish Foundation for Strategic Research, IIS11-0081
Available from: 2015-07-13 Created: 2015-07-13 Last updated: 2018-06-19. Bibliographically approved
Ovrén, H., Forssén, P.-E. & Törnqvist, D. (2015). Improving RGB-D Scene Reconstruction using Rolling Shutter Rectification. In: Yu Sun, Aman Behal & Chi-Kit Ronald Chung (Ed.), New Development in Robot Vision: (pp. 55-71). Springer Berlin/Heidelberg
Improving RGB-D Scene Reconstruction using Rolling Shutter Rectification
2015 (English). In: New Development in Robot Vision / [ed] Yu Sun, Aman Behal & Chi-Kit Ronald Chung, Springer Berlin/Heidelberg, 2015, p. 55-71. Chapter in book (Refereed).
Abstract [en]

Scene reconstruction, i.e. the process of creating a 3D representation (mesh) of some real world scene, has recently become easier with the advent of cheap RGB-D sensors (e.g. the Microsoft Kinect).

Many such sensors use rolling shutter cameras, which produce geometrically distorted images when they are moving. To mitigate these rolling shutter distortions we propose a method that uses an attached gyroscope to rectify the depth scans. We also present a simple scheme to calibrate the relative pose and time synchronization between the gyro and a rolling shutter RGB-D sensor.

For scene reconstruction we use the Kinect Fusion algorithm to produce meshes. We create meshes from both raw and rectified depth scans, and these are then compared to a ground truth mesh. The types of motion we investigate are: pan, tilt and wobble (shaking) motions.

As our method relies on gyroscope readings, the amount of computations required is negligible compared to the cost of running Kinect Fusion.

This chapter is an extension of a paper at the IEEE Workshop on Robot Vision [10]. Compared to that paper, we have improved the rectification to also correct for lens distortion, and use a coarse-to-fine search to find the time shift more quickly. We have extended our experiments to also investigate the effects of lens distortion, and to use more accurate ground truth. The experiments demonstrate that correction of rolling shutter effects yields a larger improvement of the 3D model than correction for lens distortion.
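
The coarse-to-fine time-shift search mentioned above can be illustrated with a generic grid-refinement sketch; the helper names (cost_fn, coarse_to_fine_time_shift) are assumptions and the chapter's own search may differ in its details.

```python
import numpy as np

def coarse_to_fine_time_shift(cost_fn, t_min, t_max, levels=4, samples=11):
    """Coarse-to-fine 1D search for a gyro-to-camera time shift.

    cost_fn maps a candidate time shift to an alignment cost (lower is better).
    Generic grid-refinement search used here only to illustrate the idea.
    """
    lo, hi = t_min, t_max
    best = lo
    for _ in range(levels):
        candidates = np.linspace(lo, hi, samples)
        costs = [cost_fn(t) for t in candidates]
        best = candidates[int(np.argmin(costs))]
        step = (hi - lo) / (samples - 1)
        lo, hi = best - step, best + step   # refine around the current best
    return best
```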

Place, publisher, year, edition, pages
Springer Berlin/Heidelberg, 2015
Series
Cognitive Systems Monographs, ISSN 1867-4925 ; 23
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-114344 (URN); 10.1007/978-3-662-43859-6_4 (DOI); 978-3-662-43858-9 (ISBN); 978-3-662-43859-6 (ISBN)
Projects
Learnable Camera Motion Models
Available from: 2015-02-19 Created: 2015-02-19 Last updated: 2018-06-19. Bibliographically approved
Ringaby, E. & Forssén, P.-E. (2014). A Virtual Tripod for Hand-held Video Stacking on Smartphones. In: 2014 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL PHOTOGRAPHY (ICCP): . Paper presented at IEEE International Conference on Computational Photography (ICCP 2014), May 2-4, 2014, Intel, Santa Clara, USA. IEEE
A Virtual Tripod for Hand-held Video Stacking on Smartphones
2014 (English). In: 2014 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL PHOTOGRAPHY (ICCP), IEEE, 2014. Conference paper, Published paper (Refereed).
Abstract [en]

We propose an algorithm that can capture sharp, low-noise images in low-light conditions on a hand-held smartphone. We make use of the recent ability to acquire bursts of high resolution images on high-end models such as the iPhone5s. Frames are aligned, or stacked, using rolling shutter correction, based on motion estimated from the built-in gyro sensors and image feature tracking. After stacking, the images may be combined, using e.g. averaging to produce a sharp, low-noise photo. We have tested the algorithm on a variety of different scenes, using several different smartphones. We compare our method to denoising, direct stacking, as well as a global-shutter based stacking, with favourable results.
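
A simplified, global-shutter-style version of the stacking step could look as follows. This is an illustrative sketch only: the function names and the OpenCV dependency are assumptions, and the paper additionally applies rolling shutter correction from gyro data and feature tracks before combining the frames.

```python
import numpy as np
import cv2  # OpenCV is an assumed dependency for the warping step

def stack_frames(frames, homographies):
    """Align a burst of frames to the first one and average them.

    frames      : list of float32 images, all of the same size
    homographies: list of 3x3 arrays mapping each frame onto frame 0
                  (e.g. estimated from gyro data and feature tracks)
    Global-shutter stacking only; per-row rolling shutter correction,
    as used in the paper, is not shown here.
    """
    h, w = frames[0].shape[:2]
    acc = np.zeros_like(frames[0], dtype=np.float64)
    for frame, H in zip(frames, homographies):
        acc += cv2.warpPerspective(frame, np.asarray(H, dtype=np.float64), (w, h))
    return (acc / len(frames)).astype(frames[0].dtype)
```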

Place, publisher, year, edition, pages
IEEE, 2014
Series
IEEE International Conference on Computational Photography, ISSN 2164-9774
National Category
Engineering and Technology; Electrical Engineering, Electronic Engineering, Information Engineering; Signal Processing
Identifiers
urn:nbn:se:liu:diva-108109 (URN); 10.1109/ICCPHOT.2014.6831799 (DOI); 000356494100001 (); 978-1-4799-5188-8 (ISBN)
Conference
IEEE International Conference on Computational Photography (ICCP 2014), May 2-4, 2014, Intel, Santa Clara, USA
Projects
VPS
Available from: 2014-06-25 Created: 2014-06-25 Last updated: 2015-12-10. Bibliographically approved
Lesmana, M., Landgren, A., Forssén, P.-E. & Pai, D. K. (2014). Active Gaze Stabilization. In: A. G. Ramakrishnan (Ed.), Proceedings of the 2014 Indian Conference on Computer Vision Graphics and Image Processing: . Paper presented at The ninth Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP’14), December 14-17, Bangalore, Karnataka, India (pp. 81:1-81:8). ACM Digital Library
Active Gaze Stabilization
2014 (English). In: Proceedings of the 2014 Indian Conference on Computer Vision Graphics and Image Processing / [ed] A. G. Ramakrishnan, ACM Digital Library, 2014, p. 81:1-81:8. Conference paper, Published paper (Refereed).
Abstract [en]

We describe a system for active stabilization of cameras mounted on highly dynamic robots. To focus on careful performance evaluation of the stabilization algorithm, we use a camera mounted on a robotic test platform that can have unknown perturbations in the horizontal plane, a commonly occurring scenario in mobile robotics. We show that the camera can be effectively stabilized using an inertial sensor and a single additional motor, without a joint position sensor. The algorithm uses an adaptive controller based on a model of the vertebrate Cerebellum for velocity stabilization, with additional drift correction. We have also developed a resolution-adaptive retinal slip algorithm that is robust to motion blur.

We evaluated the performance quantitatively using another high-speed robot to generate repeatable sequences of large and fast movements that a gaze stabilization system can attempt to counteract. Thanks to the high-accuracy repeatability, we can make a fair comparison of algorithms for gaze stabilization. We show that the resulting system can reduce camera image motion to about one pixel per frame on average even when the platform is rotated at 200 degrees per second. As a practical application, we also demonstrate how the common task of face detection benefits from active gaze stabilization.
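
As a rough illustration of inertial gaze stabilisation (not the paper's adaptive, Cerebellum-inspired controller), a basic feed-forward plus slip-feedback law might look like this; all gains and names are assumptions.

```python
def stabilisation_command(gyro_rate, retinal_slip, drift_estimate,
                          k_slip=0.5, k_drift=0.05, dt=1.0 / 300.0):
    """One control step of a simple inertial gaze-stabilisation law.

    gyro_rate     : platform angular velocity from the inertial sensor (rad/s)
    retinal_slip  : residual image motion measured in the camera (rad/s)
    drift_estimate: slowly integrated slip, used to cancel sensor bias drift
    Returns (motor velocity command, updated drift estimate). Gains and names
    are assumptions; the paper uses an adaptive controller, not fixed gains.
    """
    drift_estimate += k_drift * retinal_slip * dt
    command = -gyro_rate - k_slip * retinal_slip - drift_estimate
    return command, drift_estimate
```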

Place, publisher, year, edition, pages
ACM Digital Library, 2014
Keywords
Gaze stabilization, active vision, Cerebellum, VOR, adaptive control
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-114318 (URN); 10.1145/2683483.2683565 (DOI); 978-1-4503-3061-9 (ISBN)
Conference
The ninth Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP’14), December 14-17, Bangalore, Karnataka, India
Projects
Learnable Camera Motion Models
Funder
Swedish Research Council
Available from: 2015-02-18 Created: 2015-02-18 Last updated: 2018-01-11. Bibliographically approved
Ringaby, E., Forssén, P.-E., Friman, O., Olsvik Opsahl, T., Vegard Haavardsholm, T. & Kåsen, I. (2014). Anisotropic Scattered Data Interpolation for Pushbroom Image Rectification. IEEE Transactions on Image Processing, 23(5), 2302-2314
Anisotropic Scattered Data Interpolation for Pushbroom Image Rectification
2014 (English). In: IEEE Transactions on Image Processing, ISSN 1057-7149, E-ISSN 1941-0042, Vol. 23, no 5, p. 2302-2314. Article in journal (Refereed), Published.
Abstract [en]

This article deals with fast and accurate visualization of pushbroom image data from airborne and spaceborne platforms. A pushbroom sensor acquires images in a line-scanning fashion, and this results in scattered input data that needs to be resampled onto a uniform grid for geometrically correct visualization. To this end, we model the anisotropic spatial dependence structure caused by the acquisition process. Several methods for scattered data interpolation are then adapted to handle the induced anisotropic metric and compared for the pushbroom image rectification problem. A trick that exploits the semi-ordered line structure of pushbroom data to improve the computational complexity by several orders of magnitude is also presented.
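
The anisotropic-metric idea can be illustrated with a brute-force inverse-distance-weighting sketch. The function anisotropic_idw and its parameters are assumptions; the article itself adapts several established interpolators and exploits the pushbroom line structure for speed.

```python
import numpy as np

def anisotropic_idw(sample_xy, sample_values, query_xy, metric, p=2.0, eps=1e-9):
    """Inverse-distance weighting under an anisotropic metric.

    sample_xy    : (N, 2) scattered sample positions (e.g. pushbroom pixels)
    sample_values: (N,) sample values
    query_xy     : (M, 2) positions on the uniform output grid
    metric       : (2, 2) positive-definite matrix, d(a, b)^2 = (a-b)^T metric (a-b)
    Brute-force illustration of the anisotropic-metric idea only.
    """
    diff = query_xy[:, None, :] - sample_xy[None, :, :]        # (M, N, 2)
    d2 = np.einsum('mni,ij,mnj->mn', diff, metric, diff)       # squared distances
    w = 1.0 / (d2 ** (p / 2.0) + eps)                          # IDW weights
    return (w @ sample_values) / w.sum(axis=1)
```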

Place, publisher, year, edition, pages
IEEE, 2014
Keywords
pushbroom, rectification, hyperspectral, interpolation, anisotropic, scattered data
National Category
Engineering and Technology; Electrical Engineering, Electronic Engineering, Information Engineering; Signal Processing
Identifiers
urn:nbn:se:liu:diva-108105 (URN); 10.1109/TIP.2014.2316377 (DOI); 000350284400001 ()
Available from: 2014-06-25 Created: 2014-06-25 Last updated: 2017-12-05. Bibliographically approved
Ovrén, H., Forssén, P.-E. & Törnqvist, D. (2013). Why Would I Want a Gyroscope on my RGB-D Sensor?. In: : . Paper presented at IEEE Workshop on Robot Vision 2013, Clearwater Beach, Florida, USA, January 16-17, 2013 (pp. 68-75). IEEE
Why Would I Want a Gyroscope on my RGB-D Sensor?
2013 (English). Conference paper, Oral presentation only (Refereed).
Abstract [en]

Many RGB-D sensors, e.g. the Microsoft Kinect, use rolling shutter cameras. Such cameras produce geometrically distorted images when the sensor is moving. To mitigate these rolling shutter distortions we propose a method that uses an attached gyroscope to rectify the depth scans. We also present a simple scheme to calibrate the relative pose and time synchronization between the gyro and a rolling shutter RGB-D sensor. We examine the effectiveness of our rectification scheme by coupling it with the Kinect Fusion algorithm. By comparing Kinect Fusion models obtained from raw sensor scans and from rectified scans, we demonstrate improvement for three classes of sensor motion: panning motions cause slant distortions, and tilt motions cause vertically elongated or compressed objects. For wobble we also observe a loss of detail, compared to the reconstruction using rectified depth scans. As our method relies on gyroscope readings, the amount of computations required is negligible compared to the cost of running Kinect Fusion.
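
A simplified version of gyro-based rolling shutter rectification of a depth scan is sketched below, assuming a constant angular velocity during the frame readout (the paper uses the actual gyroscope samples instead); all names are illustrative.

```python
import numpy as np

def rodrigues(axis_angle):
    """Rotation matrix from an axis-angle vector (Rodrigues' formula)."""
    theta = np.linalg.norm(axis_angle)
    if theta < 1e-12:
        return np.eye(3)
    k = axis_angle / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def rectify_rows(points_per_row, omega, t_row, t_ref):
    """Undo per-row camera rotation in a rolling-shutter depth scan.

    points_per_row: list of (N_r, 3) arrays of back-projected 3D points, per row
    omega         : (3,) angular velocity (rad/s), assumed constant over readout
    t_row         : iterable of readout times, one per row
    t_ref         : reference time (e.g. the middle row of the frame)
    Constant-velocity sketch only; the paper interpolates the recorded
    gyroscope samples rather than assuming constant angular velocity.
    """
    rectified = []
    for pts, t in zip(points_per_row, t_row):
        R = rodrigues(omega * (t - t_ref))   # camera rotation accumulated since t_ref
        rectified.append(pts @ R.T)          # map each point into the t_ref camera frame
    return rectified
```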

Place, publisher, year, edition, pages
IEEE, 2013
Keywords
RGB-D sensor, rolling shutter, Kinect Fusion, Kinect, calibration
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:liu:diva-87751 (URN); 10.1109/WORV.2013.6521916 (DOI); 978-1-4673-5647-3 (ISBN); 978-1-4673-5646-6 (ISBN)
Conference
IEEE Workshop on Robot Vision 2013, Clearwater Beach, Florida, USA, January 16-17, 2013
Projects
Embodied Visual Object Recognition
Funder
Swedish Research Council, Embodied Visual Object Recognition
Available from: 2013-02-08 Created: 2013-01-22 Last updated: 2015-12-10. Bibliographically approved
Ringaby, E. & Forssén, P.-E. (2012). Efficient Video Rectification and Stabilisation for Cell-Phones. International Journal of Computer Vision, 96(3), 335-352
Efficient Video Rectification and Stabilisation for Cell-Phones
2012 (English). In: International Journal of Computer Vision, ISSN 0920-5691, E-ISSN 1573-1405, Vol. 96, no 3, p. 335-352. Article in journal (Refereed), Published.
Abstract [en]

This article presents a method for rectifying and stabilising video from cell-phones with rolling shutter (RS) cameras. Due to size constraints, cell-phone cameras have constant, or near constant, focal length, making them an ideal application for calibrated projective geometry. In contrast to previous RS rectification attempts that model distortions in the image plane, we model the 3D rotation of the camera. We parameterise the camera rotation as a continuous curve, with knots distributed across a short frame interval. Curve parameters are found using non-linear least squares over inter-frame correspondences from a KLT tracker. By smoothing a sequence of reference rotations from the estimated curve, we can, at a small extra cost, obtain a high-quality image stabilisation. Using synthetic RS sequences with associated ground-truth, we demonstrate that our rectification improves over two other methods. We also compare our video stabilisation with the methods in iMovie and Deshaker.
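
The per-row rectification implied by the rotation-only camera model can be illustrated with the standard pure-rotation homography. The sketch below is generic (the helpers row_homography and rectify_pixel are assumptions) and does not show how the rotation curve itself is estimated in the article.

```python
import numpy as np

def row_homography(K, R_row, R_ref):
    """Pure-rotation homography mapping pixels imaged under R_row to R_ref.

    K     : 3x3 intrinsic camera matrix
    R_row : camera rotation when the pixel's image row was read out
    R_ref : reference rotation (e.g. from the smoothed rotation sequence)
    H = K R_ref R_row^T K^-1 is the standard warp for a purely rotating
    camera; estimating R_row (a spline of knot rotations in the article)
    is not shown here.
    """
    return K @ R_ref @ R_row.T @ np.linalg.inv(K)

def rectify_pixel(H, x, y):
    """Apply the homography H to one pixel coordinate."""
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]
```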

Place, publisher, year, edition, pages
Springer Verlag (Germany), 2012
Keywords
Cell-phone, Rolling shutter, CMOS, Video stabilisation
National Category
Engineering and Technology
Identifiers
urn:nbn:se:liu:diva-75277 (URN); 10.1007/s11263-011-0465-8 (DOI); 000299769400005 ()
Note
Funding agencies: CENIIT organisation at Linköping Institute of Technology; Swedish Research Council

Available from: 2012-02-27 Created: 2012-02-24 Last updated: 2017-12-07
Identifiers
ORCID iD: orcid.org/0000-0002-5698-5983