Auto-curation of sports highlights for Wimbledon and the US Open 2017.

Sports highlights help people keep track of the significant moments of a match without spending hours watching video footage. Manually constructing such highlights, however, demands considerable time and effort. Motivated by this, we propose an end-to-end multi-modal curation system that creates sports highlights. Our work combines audio and visual cues from video footage in real time to detect celebration moments with higher accuracy. Our system was deployed for the Wimbledon and US Open tennis tournaments in 2017 and has attracted tremendous interest from the media. This work is the result of cooperation between the IBM Watson Research Center and IBM iX Business Strategy and Experience Design.

We introduce: a first-of-a-kind system for automatically extracting tennis highlights using multi-modal excitement measures from the players; novel techniques for learning multi-modal classifiers without costly training-data annotation; a classifier fusion framework for player celebration recognition that learns complementary information; and official highlights at two major tennis tournaments, processing video clips and extracting highlights from multiple courts over consecutive days.

Our system was deployed on Wimbledon's and the US Open's official websites in 2017. For both tournaments, hundreds of tennis players participate each year. The tournaments last from 13 to 20 days, and on busy days matches can occupy around 20 different courts. This results in hundreds of hours of footage. The work has generated a lot of interest and media coverage and has been a tremendous success.

This is the architecture of the entire system. It involves multiple modules that process all the metadata, both visual and audio information. Many sensors collect a wide variety of data, such as the positions of the ball and the players. Statisticians also record things like who wins which points. These data are comprehensively used to screen the footage and
provide potential candidates for highlights. The scoring system then scores each candidate video based on visual and audio cues, that is, by recognizing cheering sounds and celebration actions from the players. We also use the Watson Visual Recognition API to detect handshakes, which mark the end of a match. Everything is combined, together with the constraints of some business rules that govern the structure of the video, to build a highlight video.

Thank you for listening.
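The final curation step described above can be sketched in a few lines: each candidate clip carries per-modality classifier scores (crowd cheering from audio, player celebration from video); a late fusion combines them, and the top clips are assembled under a duration budget. This is a minimal illustrative sketch; the function names, fusion weights, and greedy selection policy are assumptions for exposition, not the deployed system's actual components.

```python
def fuse_excitement(audio_score, visual_score, w_audio=0.5):
    """Late fusion: weighted average of per-clip classifier scores in [0, 1].
    The equal weighting is an illustrative assumption."""
    return w_audio * audio_score + (1.0 - w_audio) * visual_score


def build_highlight_reel(clips, max_seconds):
    """Greedily pick the highest-scoring clips that fit the duration budget,
    then reorder them chronologically so the reel follows match order."""
    for clip in clips:
        clip["score"] = fuse_excitement(clip["audio"], clip["visual"])
    chosen, used = [], 0.0
    for clip in sorted(clips, key=lambda c: c["score"], reverse=True):
        if used + clip["duration"] <= max_seconds:
            chosen.append(clip)
            used += clip["duration"]
    return sorted(chosen, key=lambda c: c["start"])


# Example: three candidate clips from one match (times in seconds).
clips = [
    {"start": 10, "duration": 20, "audio": 0.9, "visual": 0.8},
    {"start": 300, "duration": 25, "audio": 0.2, "visual": 0.3},
    {"start": 600, "duration": 15, "audio": 0.7, "visual": 0.9},
]
reel = build_highlight_reel(clips, max_seconds=40)
# keeps the two most exciting clips (starting at 10s and 600s), in match order
```

In the deployed system the business-rule constraints are richer than a single duration budget (e.g., required clips such as match point, balance across players), but the same pattern of score-then-select applies.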