We present ShapeFormer, a pure transformer-based architecture that efficiently predicts missing regions from partially complete input point clouds. Prior point cloud completion methods still produce samples of inferior visual quality, particularly near smooth regions, sharp corners, and thin structures. To address these problems, we carefully design the encoder and decoder of ShapeFormer to: (1) encode the partial input point cloud using a memory-efficient Local Context Transformer, (2) predict missing regions from the overall shape representation using Folding Blocks, (3) guide the completion procedure with geometric cues from the partial input via a Skip Context Transformer, and (4) group points into regions based on their semantic similarity using learnable Region Grouping layers. Our experiments demonstrate that ShapeFormer accurately predicts complete point clouds of high visual quality, achieving competitive results on the Completion3D benchmark and outperforming state-of-the-art methods on the Multi-View Partial Point Cloud benchmark (↓10% CD). We also introduce Completion3D-C, a benchmark for evaluating the robustness of point cloud completion methods, on which ShapeFormer achieves the best performance across various unseen transformations (↓14% CD on average). Furthermore, our method generalizes well to out-of-domain samples from both seen and unseen categories. These results bring us one step closer to using transformers as a ``universal modelling tool'' for point clouds.