Videos uploaded to Meta's Family of Apps are transcoded into multiple bitstreams with different codec formats, resolutions, and quality levels to provide the best video quality across a wide variety of devices and connection bandwidth constraints. On Facebook alone, there are more than 4 billion video views per day. To process video at this scale, we needed a solution that delivers the best possible video quality with the shortest encoding time, all while being energy efficient, programmable, and scalable. In this paper, we present the Meta Scalable Video Processor (MSVP), which performs video processing at quality on par with software (SW) solutions but at a small fraction of the compute time and energy. Each MSVP ASIC offers a peak SIMO (Single Input Multiple Output) transcoding performance of 4K at 15 fps at the highest quality configuration and can scale up to 4K at 60 fps at the standard quality configuration. This performance is achieved at ~10 W of PCIe module power. We achieved a throughput gain of ~9x for H.264 when compared against libx264 SW encoding. For VP9, we achieved a throughput gain of ~50x when compared with the libvpx speed 2 preset. Key components of MSVP transcoding include video decoding, scaling, encoding, and quality metric computation. In this paper, we go over the ASIC architecture of MSVP and the design of its individual components, and we compare performance per watt (perf/W) versus quality against industry-standard SW encoders.
Video consumption across social platforms has increased at a rapid pace. Video processing is a compute-heavy workload, and domain-specific accelerators (ASICs) allow more efficient scaling than general-purpose CPUs. One of the challenges for video ASIC adoption is that videos ingested in datacenters are user-generated content and have a long-tail distribution of uncommon features. A software stack can handle these outliers gracefully, but an ASIC may produce undesirable effects on edge cases that it does not support or handle. To avoid such effects in production, it is critical to harden the system against long-tail conditions early in the ASIC development cycle. Similarly, critical signals such as BD-rate quality and outlier detection are needed from production traffic early in the product cycle. To address these needs, we propose an extensible framework that enables a continuous development strategy using production traffic, through progressive evaluation in the various phases of the video ASIC development cycle. A similar framework would benefit other ASIC accelerator programs by reducing the time to deploy on large-scale platforms.
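
To make the BD-rate signal mentioned above concrete, the following is a minimal sketch of how an average bitrate difference at equal quality might be computed from two rate-quality curves. It is not the framework described in the abstract. The function names (`bd_rate`, `_mean_log_rate`, `_interp`) and all sample numbers are hypothetical, and the sketch uses piecewise-linear interpolation of log-bitrate over quality, a simplification of the cubic polynomial fit used in the original Bjøntegaard method. It assumes each curve has distinct quality values.

```python
import math


def _interp(x, xs, ys):
    """Linearly interpolate y at x; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x is outside the interpolation range")


def _mean_log_rate(points, lo, hi, steps=1000):
    """Mean of log(bitrate) over the quality interval [lo, hi] (trapezoid rule)."""
    pts = sorted(points, key=lambda p: p[1])  # sort by quality
    qualities = [q for _, q in pts]
    log_rates = [math.log(r) for r, _ in pts]
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = min(lo + k * h, hi)  # guard against floating-point overshoot
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * _interp(x, qualities, log_rates)
    return total * h / (hi - lo)


def bd_rate(ref, test):
    """Average bitrate change (%) of `test` versus `ref` at equal quality.

    Each curve is a list of (bitrate, quality) pairs, e.g. (kbps, PSNR in dB).
    A negative result means `test` needs fewer bits for the same quality.
    """
    lo = max(min(q for _, q in ref), min(q for _, q in test))
    hi = min(max(q for _, q in ref), max(q for _, q in test))
    if hi <= lo:
        raise ValueError("the two curves have no overlapping quality range")
    diff = _mean_log_rate(test, lo, hi) - _mean_log_rate(ref, lo, hi)
    return (math.exp(diff) - 1.0) * 100.0


# Made-up illustration data: the test encoder reaches the same quality
# as the reference at 90% of the bitrate, at every operating point.
reference = [(1000.0, 34.0), (2000.0, 37.0), (4000.0, 40.0), (8000.0, 43.0)]
candidate = [(0.9 * rate, quality) for rate, quality in reference]
print(f"BD-rate: {bd_rate(reference, candidate):.2f}%")  # prints "BD-rate: -10.00%"
```

In an evaluation on production traffic, a per-video value like this could be aggregated across many inputs, with unusually large positive values flagged for outlier inspection. That usage is an illustration, not a detail taken from the paper.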