
2022 : Volume 1, Issue 1

Pallet Localization Techniques of Forklift Robot: A Review of Recent Progress

Author(s): Yongyao Li 1,2, Xiaohe Chen 1,2, Guanyu Ding 3, Chao Li 3, Sen Wang 1,2, Qinglei Zhao 4 and Qi Song 2,3

1 School of Electronic and Information Engineering, Changchun University of Science and Technology, China

2 Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, China

3 Pilot AI Company, China

4 Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, China

J Robot Mech Eng

Article Type : Review Article

DOI : https://doi.org/10.53996/2770-4122.jrme.1000107

 

Abstract

Pallets are used intensively in warehouses and retail stores, and the automation of pallet detection and localization is highly desired and widely studied for forklift robots and pallet-picking instruments. Because pallet types vary greatly in practice, it is extremely difficult to develop a single solution that detects all of them. This article presents a general review of pallet identification and localization techniques for industrial forklift robots and pallet-picking instruments. Some modern computer-vision techniques are reviewed and compared. In particular, Deep Neural Network (DNN) methods are usually applied to detect and locate the pallet in RGB images. The Point Cloud method is used to label the region of interest (RoI) in 2D range data, from which the pallet's features are extracted; this method can provide precise localization of the pallets. Here, the pallet identification and localization algorithm (PILA) strategy is introduced. This approach delivers a highly precise orientation angle and centric location of the pallet without any artificial assistance, utilizing RGB images and Point Cloud data to balance localization precision and running time on low-cost hardware. The experimental results show that the pallet can be located with a 3D localization accuracy of 1 cm and an angle resolution of 0.4 degree at a distance of 3 m, with a running time of less than 700 ms. PILA is a promising solution for autonomous pallet-picking instruments and self-driving forklift applications.

Keywords: Pallet Recognition; Pallet Localization; Deep Neural Network; RGBD camera


Introduction

In recent decades, unmanned industrial automation techniques have attracted considerable attention, especially in logistics applications. As the COVID-19 pandemic continues to spread all over the world, pallet picking by unmanned forklift robots and AGVs has become much more desired [1]. The major challenges of pallet detection and localization are:
1.    Estimating the x, y and z values of the pallet center and the orientation angle (see the sketch after this list).
2.    Meeting the "real-time" operation requirement to guarantee smooth pallet picking.
3.    Handling pallet types and sizes that may vary dramatically in practice, while typical model-based pallet localization methods are not capable of handling all the cases properly.
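To make challenge 1 concrete, the output a pallet localizer must produce can be sketched as a small structure. This is an illustrative Python definition of ours, not a structure from the original paper:

```python
from dataclasses import dataclass

@dataclass
class PalletPose:
    """Localization output for one pallet; names are hypothetical."""
    x: float        # lateral position of the pallet center (m)
    y: float        # vertical position of the pallet center (m)
    z: float        # distance to the pallet front face (m)
    yaw_deg: float  # orientation angle of the front face (degrees)
```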


The pallet detection and localization problem has been investigated since the 1980s. In the very early stage, infrared sensors or RFID were used to provide the distance to the pallet, and only point-to-point measurement could be implemented on a forklift [2]. As vision techniques matured alongside embedded hardware, object detection algorithms were used to locate artificial features for more precise pallet identification and positioning. However, this method is difficult to deploy in a warehouse since it requires significant modification and increases the cost dramatically [3, 4]. A 2D laser rangefinder or a 3D depth camera is another approach to locating pallets; however, it can be challenging to capture enough features from 2D depth information [5-8]. Alternatively, plane segmentation on 3D Point Cloud data can deliver more precise results with the template matching method [9]. Unfortunately, this method is limited by detection speed, and the recognition accuracy is seriously affected by the pallet type and the Point Cloud data quality, which may impose strict requirements on the depth-imaging hardware and computing unit. To the best of our knowledge, all existing pallet recognition and localization methods that use a single data source, such as RGB images or Point Cloud data, either lead to a high probability of false positioning or consume a lot of computing power and raise the cost dramatically. In the last part we introduce a third approach, based on 2D image object detection and 3D Point Cloud processing, which delivers precise location data. This pipeline strategy, which we call the pallet identification and localization algorithm (PILA), uses low-cost hardware and requires only modest computing resources. In this pipeline, deep neural network (DNN) methods [10, 11] are used to detect the pallet in RGB images, and Point Cloud data are aligned to the region of interest (RoI) in the RGB image. The pallet's location and angle are then extracted from the geometric features of the Point Cloud data. To sum up, the DNN method is designed to recognize pallets at a high rate, and the Point Cloud data deliver precise localization results with less computing time and fewer resources. The results show excellent performance on pallet recognition, with a 3D localization error below 1 cm and a pose estimation error below 0.4 degree.

In this paper, Section 2 introduces visual detection techniques for pallets and some neural network models such as R-CNN, Fast R-CNN, SSD and YOLO. The pallet training dataset is also described in detail in this part. Section 3 explains the Point Cloud approach with some examples. Section 4 presents the pipeline strategy of the PILA model, and experimental results show that PILA outperforms the other two approaches in several aspects (Figure 1).
 


Figure 1: The front-surface view of a normal pallet used in warehouses.

Pallet Dataset

As shown in Figure 1, pallets used in warehouses normally include ISO pallets, European pallets and North American pallets, with sizes ranging from 800×600 mm to 2240×2740 mm. Plastic and wood pallets are the most common ones in practice. As Figure 2 shows, more than five types of pallets are collected in the dataset for different scenarios and conditions, including cases on the rack, on the ground, with a card box, or at small angles. Furthermore, pallet images under different lighting conditions, floor conditions and partial occlusion are also included to match the real warehouse environment. Generally, there are two-way and four-way pocket pallets, which allow a forklift to pick from two or four directions. We have collected both types in the dataset to make the model more generic. Pallet assembly information is shown in Table 1. The pallet dataset contains more than 1000 pictures.
 


Figure 2: Various types of pallets used in the training dataset, including pallets on the ground, pallets with card boxes, pallets on racks, and tilted pallets.

 

Pallet material | Color | Dimensions (W×L×H) mm | Recognition rate (%)
Wood | Wooden | 700×1400×130 | 98
Plastic | White | 914×1200×150 | 99
Plastic | Blue | 1000×1200×150 | 98.5


Table 1: Information about some pallet types in the dataset.

Visual-Based Pallet Detection

Object detection has been one of the most popular topics in computer vision for decades, and intense research activity has been devoted to it. Traditional model-based target detection techniques require manually designing the strategy to segment the geometry of the pallet and recognize each block. This process involves a lot of human effort, such as picking feature descriptors like Haar-like features and using the AdaBoost algorithm to cascade multiple classifiers [12]. In contrast, hierarchical feature generation by deep-learning-based target detectors is an automatic process, which shows great potential in recognition and classification compared with other methods. There are two major object detection architectures in general. One is the one-stage detector, such as the Single Shot MultiBox Detector (SSD) and You Only Look Once (YOLO). The other is the two-stage detector based on the Region Proposal Network (RPN), such as R-CNN and Faster R-CNN [13]. The one-stage detector contains a single feed-forward fully convolutional network that directly predicts object classes and bounding boxes. The two-stage detector normally consists of separate region proposal and classification stages. In particular, reference anchors are used to locate proposed regions of interest (RoI) for multiple object candidates; the content of each RoI is then categorized and its location fine-tuned in the second stage. The two-stage architecture is able to deliver more precise results but needs longer running time [14].

Neural Network Architecture for Pallet Recognition
As a typical single-stage detector, YOLO treats the whole object detection problem as a regression problem. The input image is divided into a set of grid cells. Each grid cell predicts a fixed number of bounding boxes with a confidence score, which is calculated by multiplying the object detection probability by the intersection over union (IoU), where IoU is the overlap ratio between the predicted bounding box and the ground-truth bounding box; the class probability of a bounding box is finally derived from the IoU score. As shown in Equation (1), if the IoU score is greater than 0.5, the matching value m is 1, indicating a positive match; otherwise m is 0, indicating a negative match (the object is not detected):

m = 1 if IoU > 0.5, otherwise m = 0        (1)
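As an illustration, the IoU score and the matching rule of Equation (1) can be computed as follows. This is a generic sketch, with boxes given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_value(pred_box, gt_box, threshold=0.5):
    """Equation (1): m = 1 (positive match) if IoU > threshold, else 0."""
    return 1 if iou(pred_box, gt_box) > threshold else 0
```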

Different from YOLO, SSD receives a whole image as input and passes it through multiple convolution layers, and the convolutional feature maps are utilized to predict the bounding boxes. The model generates a vector of object class probabilities for the predicted bounding boxes. The architecture used in this case is shown in Figure 3: a VGG-16 model pre-trained on ImageNet for image classification. A feed-forward convolutional network is used to generate a fixed-size set of bounding boxes, and scores are given for the object class instances present in these boxes. Instead of predicting a score value for a potential object, the SSD model directly gives the likelihood of a class being present in the bounding box.
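For instance, the scale of the default boxes on each SSD feature map follows the linear rule from the original SSD paper. A minimal sketch with that paper's default parameters (not values from the review under discussion):

```python
def ssd_scales(num_maps=6, s_min=0.2, s_max=0.9):
    """Default-box scales s_k = s_min + (s_max - s_min)(k - 1)/(m - 1)
    for feature maps k = 1..m, as in the original SSD paper."""
    return [s_min + (s_max - s_min) * k / (num_maps - 1)
            for k in range(num_maps)]

print(ssd_scales())  # ~[0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
```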

Faster R-CNN is a two-stage architecture that utilizes a multi-task learning process to address the detection problem by combining classification and bounding-box regression. The system comprises two stages, a region proposal network and a Fast R-CNN header network, and employs a convolutional backbone to extract high-level features from input pictures. Faster R-CNN replaces the Selective Search method of the original algorithm with the RPN [15]. In the first stage, to produce proposals, the RPN employs a sliding window over the feature maps generated by the backbone network. Multi-scale anchors are used on the feature map to predict multiple candidate boxes. The anchors are defined with various scales and aspect ratios to identify arbitrary objects. A judge function decides whether each anchor is foreground or background and then refines it using boundary regression to obtain a precise region proposal. The top-ranked object candidates are next cropped using an RoI pooling layer applied to the feature extractor's intermediate layer, which copes with the issue of feeding varied sizes of feature maps into the fully connected layers. In the second stage, each proposal undergoes a final classification and box-refinement procedure [16, 17].
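To show how multi-scale anchors are laid out, here is a minimal sketch following the standard Faster R-CNN recipe (3 scales × 3 aspect ratios = 9 anchors per position). The parameter values are typical defaults, not values from the paper under review:

```python
import numpy as np

def generate_anchors(base_size=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Anchors (x1, y1, x2, y2) centered at the origin.

    For each aspect ratio the base area is preserved, then the box is
    scaled, following the standard Faster R-CNN anchor recipe.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            w = base_size * scale * np.sqrt(1.0 / ratio)  # ratio = h / w
            h = base_size * scale * np.sqrt(ratio)
            anchors.append((-w / 2.0, -h / 2.0, w / 2.0, h / 2.0))
    return np.array(anchors)

print(generate_anchors().shape)  # (9, 4): 9 anchors per sliding position
```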

Broad results have shown that Faster R-CNN and SSD deliver better detection accuracy than YOLO; however, YOLO is faster than both SSD and Faster R-CNN.

 



Figure 3: The diagram of SSD architecture.
 


The recognition rates of three types of pallets are shown in Table 2. The average rate is above 98 percent, which is robust enough for warehouse operation. The pallet detection results with labeled RoIs of pallets are presented in Figure 4. Multiple pallets in the scene, as well as tilted pallets, are identified well regardless of the presence of a card box.
 

Pallet material | Color | Dimensions (W×L×H) mm | Recognition rate (%)
Wood | Wooden | 700×1400×130 | 98
Plastic | White | 914×1200×150 | 99
Plastic | Blue | 1000×1200×150 | 98.5


Table 2: Pallet Recognition Results of SSD Model.
 


Figure 4: Pallet images (a) The scene with multiple pallets in the field of vision during detection, (b) The tilted wooden pallet, (c) The tilted plastic pallet.

Point Cloud Based Pallet Shape Detection

2D laser rangefinders (LRFs) are mostly used for mobile robot SLAM. As they are extensively used in unmanned robot navigation, several methods have been developed to detect and localize pallets based on LRF devices. In contrast to visual-based solutions, this approach does not suffer from imaging distortion, illumination conditions or scaling issues, which can lead to false detections or missed features. In early work, laser scan data was used for scene segmentation, object detection and object recognition, and a method was presented for detecting and classifying 3D objects based on 3D Point Cloud data [18]. However, a 3D solution imposes more stringent conditions on hardware and algorithms and increases the cost dramatically.

In order to utilize well-developed object-detection techniques and also obtain fast processing, the 2D range data is converted to a 2D image so that DNN techniques can be employed [19, 20]. The pallet detection pipeline with 2D range data [21] is depicted in Figure 5. It consists of three phases: data preparation, training and testing, and pallet tracking. The data preparation phase converts the 2D laser scanner data into a 2D image. The training and testing phase then takes the 2D images as input. Once the model is fine-tuned and verified, the tracking phase is executed to detect and keep tracking all possible pallets in the scene. The 2D laser scanner used to acquire the range data is shown in Figure 6 (a), and the RoI of pallet tracking with the range data is shown in Figure 6 (b). The range data is converted to a bitmap after acquisition and processed by the trained model; if the detection confidence is greater than a certain threshold, the region is identified as a pallet.
 


Figure 5: The pallet detection pipeline with 2D range data.
 


Figure 6: (a) S3000 industrial 2D laser scanner and (b) 2D range data containing a pallet pattern.

Algorithm 1 describes on-line image creation as follows: the 2D range data is read, and the X and Y range data is converted to a 2D image for image processing. The 2D image datasets are collected and trained to identify possible pallet patterns in the 2D range data (a Python sketch of the conversion step follows the pseudocode).

Algorithm 1: On-line Image Creation

1: function Read frame

2:  Subscribe to ROS Laser topic

3:  Receive range data as a ROS message

4:  Convert laser scan ranges to Cartesian coordinates

5:  Convert the X and Y point cloud into 2D image

6:  if Training Phase

7:   Define ROIs in the image

8:   Generate artificial data by rotating the image by 90° and −90°

9:  else

10:  Break

11: end if

12: end function
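A minimal Python sketch of the conversion in lines 4-5 of Algorithm 1, with the ROS subscription replaced by a plain array of ranges. The grid resolution and image size are illustrative assumptions:

```python
import numpy as np

def scan_to_image(ranges, angle_min, angle_increment,
                  resolution=0.02, size=512):
    """Convert a 2D laser scan to a binary bitmap for the detector.

    ranges, angle_min and angle_increment follow the ROS LaserScan
    message convention; resolution (m/pixel) and size are assumptions.
    """
    ranges = np.asarray(ranges, dtype=float)
    angles = angle_min + np.arange(len(ranges)) * angle_increment
    valid = np.isfinite(ranges) & (ranges > 0)
    # Polar -> Cartesian coordinates in the sensor frame
    x = ranges[valid] * np.cos(angles[valid])
    y = ranges[valid] * np.sin(angles[valid])
    # Metric coordinates -> pixel indices, sensor at the image center
    col = np.round(x / resolution).astype(int) + size // 2
    row = np.round(y / resolution).astype(int) + size // 2
    img = np.zeros((size, size), dtype=np.uint8)
    inside = (row >= 0) & (row < size) & (col >= 0) & (col < size)
    img[row[inside], col[inside]] = 255
    return img
```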

Pallet Detection Technique Based on RGB and Depth Images

PILA Description
The two-stage architecture of the PILA pipeline strategy is introduced in this part. Figure 7 shows the pipeline flowchart of PILA. A deep neural network is utilized to recognize possible pallets in the RGB images of available scenes. The model is generated by off-line training, and the transferred model is used for on-line detection from the camera. The algorithm is divided into three functional stages. In the first stage, the pallet is detected and a confidence score for the detection is given. In the second stage, the RGB and depth images are used to align the pallet RoI in the RGB image to the depth image. In the third stage, the Point Cloud data is used to extract the pallet front-face plane, and line segments are extracted to locate a "T-shape" at the pallet center. In particular, the horizontal (x) and vertical (y) line segments at the pallet's edges are detected according to the pallet shape, which may vary across different pallet types; the decision rule used here is designed to find the "T-section" at the pallet center as a more universal and loose solution. Finally, the x, y and z values of the centric location and the orientation angle of the pallet facet are obtained.
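A sketch of the second stage, back-projecting the depth pixels inside the detected RoI to 3D points, assuming the depth image is already registered to the RGB frame. The pinhole intrinsics (fx, fy, cx, cy) and the millimeter depth scale are assumptions for illustration:

```python
import numpy as np

def roi_to_points(depth, box, fx, fy, cx, cy, depth_scale=0.001):
    """Back-project depth pixels inside an RGB RoI to 3D points.

    depth: HxW depth image registered to the RGB frame (uint16, mm).
    box:   (x1, y1, x2, y2) pallet bounding box from the detector.
    Returns an (N, 3) array of (X, Y, Z) camera-frame points.
    """
    x1, y1, x2, y2 = box
    z = depth[y1:y2, x1:x2].astype(float) * depth_scale
    v, u = np.mgrid[y1:y2, x1:x2]          # pixel coordinates of the RoI
    valid = z > 0                          # drop missing depth readings
    x = (u[valid] - cx) * z[valid] / fx    # pinhole back-projection
    y = (v[valid] - cy) * z[valid] / fy
    return np.column_stack([x, y, z[valid]])
```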


Figure 7: Flowchart of pallet localization, which consists of the DNN training module, on-line detection module, fusion module and Point Cloud processing module.

Point Cloud Processing
In this part, the Point Cloud data is processed for more precise pallet positioning [22], as outlined in Algorithm 2. The Point Cloud data becomes cleaner after a series of operations such as filtering, segmentation and extraction, and this algorithm design efficiently improves the computing speed and positioning accuracy.

Algorithm 2: Point Cloud processing strategy

1. Convert the depth image data to Point Cloud data

2. Remove out-of-range and scattered Point Cloud data as outliers

3. Down-sample the Point cloud data

4. Segment front surface planes from Point Cloud data

5. Extract horizontal (x) and vertical (y) line groups using selective rules

6. Choose the best x and y line candidates to form the "T-shape"

7. Determine the triangular centric points of the pallet's front face

 

The Point Cloud data is filtered, and horizontal (x) and vertical (y) line segments are extracted from the smoothed data to locate the pallet pocket section. First, the Point Cloud data is passed through a pass-through filter to keep all points whose Z value (distance) is between 0.5 m and 3 m. The purpose of this step is to avoid cases of heavy occlusion by the forklift arms, pedestrians or ground reflections of the laser beam. Then outliers are removed and the surface data is averaged out. Finally, down-sampling is applied to reduce the computation cost. Since the forklift robot always handles pallets sitting on the ground or on a rack, the vertical plane needs to be extracted as the frontal surface of the pallet. The pipeline for extracting the vertical plane is shown in Figure 8. After Point Cloud segmentation, one or several 2D planes are extracted by projecting the filtered inliers along the Z direction, and the most probable plane is found based on the centroid score. The centroid calculation of the Point Cloud data is the key to determining the most probable pallet plane. Prior to this step, the 2D Point Cloud is down-sampled to speed up the computation.
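A sketch of steps 1-4 of Algorithm 2 using the Open3D library. The 0.5-3 m pass-through range follows the text, while the X/Y bounds and the outlier, voxel and RANSAC parameters are illustrative assumptions:

```python
import numpy as np
import open3d as o3d

def preprocess_and_segment(pcd):
    """Filter and down-sample the cloud, then extract a front-face plane."""
    # Pass-through filter: keep points with 0.5 m < Z < 3 m (from the text);
    # the X/Y bounds here are generous illustrative limits.
    bbox = o3d.geometry.AxisAlignedBoundingBox(
        min_bound=(-5.0, -5.0, 0.5), max_bound=(5.0, 5.0, 3.0))
    pcd = pcd.crop(bbox)
    # Remove scattered outliers (parameters are assumptions).
    pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
    # Down-sample to reduce computation cost.
    pcd = pcd.voxel_down_sample(voxel_size=0.01)
    # RANSAC plane segmentation: candidate for the vertical front face.
    (a, b, c, d), inliers = pcd.segment_plane(
        distance_threshold=0.01, ransac_n=3, num_iterations=1000)
    yaw_deg = np.degrees(np.arctan2(a, c))  # rotation about the vertical axis
    return pcd.select_by_index(inliers), yaw_deg
```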

 

Lastly, the x and y lines at the pockets are found to form the "T-shape", and the pallet center is located. As the decisive part of this approach, a universal decision rule based on the general geometric relations of a pallet is proposed. The "T-shape" is found from the combination of the bottom line of the pallet top (x line) and the outside boundaries of the middle post (y lines). The pipeline of line extraction and pallet location decision is shown in Figure 9. The horizontal and vertical boundary points in the x and y directions are extracted. Then x and y lines are extracted by the KdTree search method, and the number of points in each x-line and y-line segment must be larger than a threshold. After sorting all x and y lines, the x line and y lines closest to the rough center point are found to form the "T-shape" in Figure 10 (d). This brings generality and robustness to the pallet-center decision regardless of any specific pallet geometry. At last, points A and B (intersection points) and C are determined, as shown in Figure 10 (d). Figure 10 is a graphic presentation of the four primary steps of PILA: (a) and (b) are the RGB image and Point Cloud data, while (c) and (d) are generated through pallet identification and Point Cloud processing to locate the pallet center.
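A simplified sketch of the line grouping, using a 1-D KdTree on the shared coordinate to collect points that fall on the same horizontal or vertical line. The tolerance and minimum segment length are assumptions, and the paper's actual decision rule is more general:

```python
import numpy as np
from scipy.spatial import cKDTree

def extract_lines(points, axis, tol=0.005, min_points=30):
    """Group 2D boundary points into axis-aligned line candidates.

    points: (N, 2) boundary points of the segmented front face.
    axis:   1 groups points sharing a y value (x lines),
            0 groups points sharing an x value (y lines).
    """
    tree = cKDTree(points[:, [axis]])   # 1-D tree on the shared coordinate
    used = np.zeros(len(points), dtype=bool)
    lines = []
    for i, p in enumerate(points):
        if used[i]:
            continue
        idx = [j for j in tree.query_ball_point(p[[axis]], r=tol)
               if not used[j]]
        if len(idx) >= min_points:      # reject segments with too few points
            used[idx] = True
            lines.append(points[idx])
    return lines
```

With the x line and the two y lines sorted by distance to the rough center, the "T-shape" intersection points A and B and the center C between them follow directly.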
 

 


Figure 8: Pipeline for extracting the vertical plane. After Point Cloud segmentation, the filtered inliers of the Point Cloud are projected and the 2D Point Cloud is generated. Finally, the 2D Point Cloud is down-sampled and then the vertical plane is generated.
 


Figure 9: The pipeline of line extraction and pallet location. The horizontal and vertical boundary points in the x and y directions are extracted first. Then x and y lines are extracted by the KdTree search method, and one x line and two y lines closest to the center are picked.
 


Figure 10: Graphic presentation of four primary steps of PILA. (a) The RGB image of pallet, (b) The raw Point Cloud data converted from depth image, (c) The filtered Point Cloud data according to pallet recognition, (d) Final Point Cloud data for pallet location.

The experimental results show that PILA is able to identify and locate pallets in 3D with average absolute errors on the order of 1 cm and within 0.65 degree, as shown in Table 3. The average time spent on each positioning is as fast as 700 ms, so the "real-time" requirements of forklift robot operation can be met. Experimentally, the accuracy and speed of PILA are much better than those of methods using an exclusive data source such as RGB images or depth data for pallet positioning [23-25]. Table 3 compares PILA with primary commercial solutions on the industrial robot market, and PILA shows decent performance in speed, precision and working distance.
 

Method | Speed (ms) | Accuracy (mm/degree) | Distance (m)
Depth image [6] | 900 | 30/NA | 3.5
Range and look pallet finder (RLPF) [25] | NA | 10.8/0.78 | 4
2D images [26] | ~50 | 15/0.57 | 4.5
PILA | 700 | 9.9/0.4 | 3


Table 3: Comparison of commercial solutions.

Conclusion and Discussion

This paper reviews pallet identification techniques, including visual-based pallet detection, the Point Cloud method, and the pallet identification and localization algorithm (PILA) based on the combination of object-recognition and point-cloud correlation techniques. For the visual approach, we give a brief review of RGB image recognition by DNN methods, with YOLO, SSD and Faster R-CNN described in detail as typical examples. Then the combination of RGB and depth image techniques is introduced to build a 3D pallet color model for more accurate positioning performance. PILA has been implemented and verified on practical pallet-picking instruments and autonomous forklift applications and has demonstrated excellent efficiency in terms of precision, speed and working distance. Results have shown that the positive pallet recognition rate is as high as 90 percent, and the pallet center localization error and orientation angle error are less than 1 cm and 0.4 degree, respectively. In addition, compared with most solutions based on computer vision or depth images, PILA has proved feasible for forklift robot applications in practical logistics warehouses.

Funding

This work was supported by the National Natural Science Foundation of China (#61975228) and Dalian Science and Technology Covid-19 Emergency Fund.

References

1.    Song Q, Zhao Q, Wang S, et al. Dynamic Path Planning for Unmanned Vehicles Based on Fuzzy Logic and Improved Ant Colony Optimization. IEEE Access. 2020;8:62107-62115.
2.    Aref MM, Ghabcheloo R, Mattila J. A Macro-Micro Controller for Pallet Picking By an Articulated-Frame-Steering Hydraulic Mobile Machine. IEEE Int Conf Rob Autom. 2014:6816-6822.
3.    Wang S, Chen XH, Ding GY, et al. A Lightweight Localization Strategy for LiDAR-Guided Autonomous Robots with Artificial Landmarks. Sens. 2021;21:4479.
4.    Seelinger M, Yoder JD. Automatic Pallet Engagment by A Vision Guided Forklift. Proceedings of the 2005 IEEE International Conference on Robotics and Automation. 2005:4068-4073.
5.    Mohamed IS, Capitanelli A, Mastrogiovanni F, et al. Detection, Localization and Tracking of Pallets Using Machine Learning Techniques and 2D Range Data. Neural Comput Appl. 2019;32:1-18.
6.    Molter B, Fottner J. Real-Time Pallet Localization with 3d Camera Technology for Forklifts in Logistic Environments. 2018 IEEE International Conference on Service Operations and Logistics, and Informatics (SOLI). 2018:297-302.
7.    Bellomo N, Marcuzzi E, Baglivo L, et al. Pallet Pose Estimation with LIDAR and Vision for Autonomous Forklifts. IFAC Proc Volumes. 2009;42:612-617.
8.    Varga R, Nedevschi S. Robust Pallet Detection for Automated Logistics Operations. VISIGRAPP. 2016;4:470-477.
9.    Xiao J, Lu H, Zhang L, et al. Pallet Recognition and Localization Using an RGB-D Camera. Int J Adv Rob Syst. 2017;14(6):1729881417737799.
10.    Canziani A, Paszke A, Culurciello E. An Analysis of Deep Neural Network Models for Practical Applications. arXiv preprint arXiv:1605.07678. 2016:1-7.
11.    Li T, Huang B, Li C, et al. Application of Convolution Neural Network Object Detection Algorithm in Logistics Warehouse. J Eng. 2019;23:9053-9058.
12.    Syu JL, Li HT, Chiang JS, et al. A Computer Vision Assisted System for Autonomous Forklift Vehicles in Real Factory Environment. Multimed Tools Appl. 2016. 
13.    Zou Z, Shi Z, Guo Y, et al. Object Detection in 20 Years: A Survey. arXiv preprint arXiv:1905.05055. 2019:1-39.
14.    Carranza-García M, Torres-Mateo J, Lara-Benítez P, et al. On the Performance of One-Stage and Two-Stage Object Detectors in Autonomous Vehicles Using Camera Data. Remote Sens. 2021;13: 89.
15.    Ren S, He K, Girshick R, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans Pattern Anal Mach Intell. 2017;39:1137-49.
16.    Sultana F, Sufian A, Dutta P. A Review of Object Detection Models Based on Convolutional Neural Network. Intelligent Computing: Image Proc Based Appl. 2020:1-16.
17.    Du L, Zhang R, et al. Overview of Two-Stage Object Detection Algorithms. J Phys Conf Ser. 2020;1544.
18.    Monica R, Aleotti J, Rizzini DL. Detection of Parcel Boxes for Pallet Unloading Using a 3D Time-of-Flight Industrial Sensor. 2020 Fourth IEEE International Conference on Robotic Computing (IRC). 2020.
19.    Mohamed IS. Detection and Tracking of Pallets using a Laser Rangefinder and Machine Learning Techniques. Rob. 2017.
20.    Oh JY, Choi HS, Jung SH, et al. Development of Pallet Recognition System using Kinect Camera. Int J Multimedia and Ubiquitous Eng. 2014;9:227-232.
21.    He Z, Wang Y, Yu H. Feature-to-Feature Based Laser Scan Matching in Polar Coordinates with Application to Pallet Recognition. Procedia Eng. 2011;15:4800-4804.
22.    Rusu RB, Cousins S. 3D is here: Point Cloud Library (PCL). 2011 IEEE international conference on robotics and automation. 2011:1-4.
23.    Cui G, Lu L, He Z, et al. A Robust Autonomous Mobile Forklift Pallet Recognition. 2nd International Asia Conference on Informatics in Control, Automation and Robotics (CAR 2010). 2010;3:286-290.
24.    Mohamed IS, Capitanelli A, Mastrogiovanni F, et al. 2D Laser Rangefinder Scans Dataset of Standard EUR Pallets. Data Brief. 2019;24:103837.
25.    Baglivo L, Biasi N, Biral F, et al. Autonomous Pallet Localization and Picking for Industrial Forklifts: A Robust Range and Look Method. Meas Sci Technol. 2011;22:085502.
26.    Casado F, Losada DP, Santana-Alonso A. Pose Estimation and Object Tracking Using 2d Images. Procedia M. 2017;11:63-71.

CORRESPONDENCE & COPYRIGHT

Corresponding Authors: Dr. Guanyu Ding, Pilot AI Company, China.
Dr. Qi Song, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Pilot AI Company, China.

Copyright: © 2021 All copyrights are reserved by Guanyu Ding and Qi Song, published by Coalesce Research Group. This work is licensed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.
