The field of social robotics has grown considerably in recent years, and social and collaborative robots have entered the consumer market. However, communicative aspects such as the timing of utterances and the correct interpretation of actions remain a major challenge for social robots. In this position paper, I argue that to build collaborative robotic systems that act in socially and interactionally appropriate ways, we need to focus on humans as "the other" in robot-human interaction, for whom robotic utterances should be designed. I present multimodal conversation analysis (CA), a video-based approach that focuses on how participants interpret actions in the context of the ongoing interaction. Identifying three different scales at which CA can be applied, I demonstrate how this approach can support various stages of robot interaction design, making social robots easier to collaborate with from a human perspective.