This paper introduces SPIN, an efficient LLM inference serving system that leverages speculative decoding. SPIN addresses the limitations of current approaches through three techniques: dynamic selection among heterogeneous small speculative models (SSMs), batch-optimized request decomposition during verification, and pipelined coordination of speculation and verification execution on the GPU. Together, these techniques achieve a 2.28× performance improvement over state-of-the-art methods.
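The summary above does not spell out the selection policy, so as a rough illustration only, the sketch below shows one plausible way dynamic SSM selection could work: track each SSM's draft-token acceptance rate online and pick, per speculation step, the model with the best expected accepted-tokens-per-second of drafting time. The epsilon-greedy policy, the `SSMStats`/`select_ssm` names, and the model pool are all hypothetical illustrations, not SPIN's actual algorithm.

```python
import random
from dataclasses import dataclass


@dataclass
class SSMStats:
    """Running statistics for one small speculative model (hypothetical)."""
    draft_latency: float  # avg seconds to draft `gamma` tokens
    accepted: int = 1     # accepted draft tokens (optimistic prior)
    proposed: int = 1     # proposed draft tokens

    @property
    def acceptance_rate(self) -> float:
        return self.accepted / self.proposed


def select_ssm(stats: dict[str, SSMStats], gamma: int, epsilon: float = 0.1) -> str:
    """Epsilon-greedy choice of the SSM maximizing expected accepted tokens
    per second of drafting time (a proxy for speculation goodput).
    This policy is an assumption, not necessarily what SPIN uses."""
    if random.random() < epsilon:
        return random.choice(list(stats))  # explore a random SSM
    return max(
        stats,
        key=lambda name: (stats[name].acceptance_rate * gamma)
        / stats[name].draft_latency,
    )  # exploit the best-scoring SSM


def update_stats(stats: dict[str, SSMStats], name: str,
                 proposed: int, accepted: int) -> None:
    """Fold verification feedback back into the selector's statistics."""
    stats[name].proposed += proposed
    stats[name].accepted += accepted


if __name__ == "__main__":
    # Hypothetical pool of heterogeneous SSMs with different draft costs.
    pool = {
        "ssm-68m": SSMStats(draft_latency=0.004),
        "ssm-160m": SSMStats(draft_latency=0.009),
        "ssm-410m": SSMStats(draft_latency=0.020),
    }
    gamma = 4  # draft tokens per speculation step
    for _ in range(100):
        chosen = select_ssm(pool, gamma)
        # Stand-in for real drafting plus target-model verification:
        accepted = random.randint(0, gamma)
        update_stats(pool, chosen, proposed=gamma, accepted=accepted)
```

In this toy setup the selector converges toward whichever SSM offers the best acceptance-rate-to-latency trade-off for the current workload, which captures the intuition behind choosing among heterogeneous SSMs dynamically rather than fixing one draft model for all requests.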