Object detection has recently been widely used in power systems to assist fault diagnosis of transmission lines. However, it still faces great challenges because targets of widely varying sizes coexist in a single inspection image. Current state-of-the-art object detection pipelines, such as Faster R-CNN, perform well on large objects with high resolution, but usually fail to detect small objects due to their low resolution and poor representation. Existing detectors that address this problem typically exploit feature pyramids, multi-scale image inputs, etc., which can attain high accuracy but are computation- and memory-intensive. In this paper, we propose an improved cascaded Faster R-CNN framework that reduces the computational cost while maintaining high detection accuracy for multi-size object detection in high-resolution inspection images: the first-stage Faster R-CNN detects large objects, and the second-stage Faster R-CNN detects small objects relative to the large ones. We further merge the two stages into a single network by sharing convolutional features; using the semantic context between multi-size targets, the first stage tells the second where to look. For the "tell" step, we simply map the bounding-box coordinates of the large objects detected in the first stage onto the feature maps of the VGG16 network, crop the corresponding feature maps, and feed them to the second stage. Experiments on the test datasets demonstrate that our method achieves a detection mAP of 87.6% at 5 FPS on an NVIDIA Titan X, higher than that of the one-stage Faster R-CNN.
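The "tell" step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the shared feature map comes from the last convolutional block of VGG16 with a total stride of 16 (the usual Faster R-CNN setting), that boxes are given as (x1, y1, x2, y2) in image pixels, and that the feature map is a NumPy array laid out as (channels, height, width). The function names `box_to_feature_coords` and `crop_features` are hypothetical.

```python
import math
import numpy as np


def box_to_feature_coords(box, feat_h, feat_w, stride=16):
    """Map an image-space box (x1, y1, x2, y2) to feature-map index bounds.

    stride=16 is an assumption (VGG16 conv5 features); the box is rounded
    outward so that the crop covers the whole detected object.
    """
    x1, y1, x2, y2 = box
    fx1 = int(math.floor(x1 / stride))
    fy1 = int(math.floor(y1 / stride))
    fx2 = int(math.ceil(x2 / stride))
    fy2 = int(math.ceil(y2 / stride))
    # Clip to the feature map and keep at least one cell in each dimension.
    fx1 = min(max(fx1, 0), feat_w - 1)
    fy1 = min(max(fy1, 0), feat_h - 1)
    fx2 = min(max(fx2, fx1 + 1), feat_w)
    fy2 = min(max(fy2, fy1 + 1), feat_h)
    return fx1, fy1, fx2, fy2


def crop_features(feature_map, box, stride=16):
    """Crop the shared feature map (C, H, W) to the region under `box`."""
    _, feat_h, feat_w = feature_map.shape
    fx1, fy1, fx2, fy2 = box_to_feature_coords(box, feat_h, feat_w, stride)
    return feature_map[:, fy1:fy2, fx1:fx2]


# Illustrative usage with made-up sizes: a 512-channel feature map and one
# large-object box from the first stage.
shared = np.zeros((512, 38, 50), dtype=np.float32)
large_object_box = (160, 96, 480, 320)
second_stage_input = crop_features(shared, large_object_box)
```

In this sketch the cropped tensor would be passed to the second-stage detector, so the convolutional features are computed once and reused rather than recomputed on an image crop.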