My Summer at Souscout 

Jinze Wang, SEAS Master of Science in Engineering, 2024

This summer I worked at Souscout as a data engineering intern. Souscout is a start-up that provides consulting services to soccer clubs, built on its own software and database. I am a huge soccer fan, and I also want to start my own business in the future. The data engineering role matched my skill set and coursework, which made this internship a very good fit for me. I learned a great deal this summer.

The main project I worked on this summer was building a data pipeline for Souscout. The project consisted of two parts. First, I needed to scrape soccer player data from different websites. I used Python and Selenium for this, and along the way I solved problems such as how to get past Cloudflare detection and how to speed up the scraping process.

The second part was combining the data I collected from the different sources. Each platform has its own way of identifying players; in other words, the same player has a different player ID on each website. I therefore needed a way to match these player IDs so that two datasets could be joined on the matched ID. It sounds simple, but there are many problems to deal with: some websites record a player’s full name while others only have a short name, a player’s height and weight may change because young players are still growing, and a player’s listed nationality can also change. Because I could not rely on a single attribute to match IDs, I designed a metric that assesses the similarity between two player records. It takes general attributes of each player, such as name, date of birth, height, and nationality, and outputs a similarity score for each pair of player profiles. This score then helps me find the best match across data sources. If the best match is not similar enough, I call the OpenAI API to help decide whether the two records are the same player. It worked great!
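The matching step described above can be sketched roughly as follows. This is a minimal illustration, not the code used at Souscout: the `Player` fields, the weights, the 10 cm height tolerance, the 0.8 threshold, and the function names are all assumptions chosen for the example, and name similarity is approximated with Python’s standard `difflib`.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass
class Player:
    player_id: str
    name: str
    dob: Optional[date] = None
    height_cm: Optional[float] = None
    nationality: Optional[str] = None


# Hypothetical weights; they sum to 1 so the score stays in [0, 1].
WEIGHTS = {"name": 0.4, "dob": 0.4, "height": 0.1, "nationality": 0.1}
HEIGHT_TOLERANCE_CM = 10.0  # assumed: a 10 cm gap or more counts as no match
NEUTRAL = 0.5               # assumed score when an attribute is missing


def name_similarity(a: str, b: str) -> float:
    """Average of whole-string and last-token similarity.

    The last-token term helps when one site stores "L. Messi"
    and another stores "Lionel Messi".
    """
    a, b = a.lower().strip(), b.lower().strip()
    if not a or not b:
        return 0.0
    full = SequenceMatcher(None, a, b).ratio()
    last = SequenceMatcher(None, a.split()[-1], b.split()[-1]).ratio()
    return (full + last) / 2


def similarity(a: Player, b: Player) -> float:
    """Weighted similarity between two player records, in [0, 1]."""
    if a.dob is None or b.dob is None:
        dob = NEUTRAL
    else:
        dob = 1.0 if a.dob == b.dob else 0.0

    if a.height_cm is None or b.height_cm is None:
        height = NEUTRAL
    else:
        gap = abs(a.height_cm - b.height_cm)
        height = 1.0 - min(gap / HEIGHT_TOLERANCE_CM, 1.0)

    if a.nationality is None or b.nationality is None:
        nationality = NEUTRAL
    else:
        same = a.nationality.lower() == b.nationality.lower()
        nationality = 1.0 if same else 0.0

    return (
        WEIGHTS["name"] * name_similarity(a.name, b.name)
        + WEIGHTS["dob"] * dob
        + WEIGHTS["height"] * height
        + WEIGHTS["nationality"] * nationality
    )


def best_match(
    record: Player, candidates: List[Player], threshold: float = 0.8
) -> Optional[Tuple[Player, float, bool]]:
    """Return (best candidate, score, confident) or None if no candidates.

    When `confident` is False, the pair would be passed to a second
    check (in the post, a call to the OpenAI API) instead of being
    accepted automatically.
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda c: similarity(record, c))
    score = similarity(record, best)
    return best, score, score >= threshold
```

Calling `best_match(record, candidates)` for each player in one dataset gives a candidate ID from the other dataset plus a flag saying whether the score cleared the threshold. Spreading the weight across several attributes means a changed height or nationality lowers the score without ruling out the match on its own.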

Through this project, I learned many new things about data science in soccer. For example, many websites provide datasets about soccer players or clubs, but each one only has a subset of the information. Some websites provide players’ in-game stats, such as shooting and pace, while others show players’ transfer market value and contract information. No single website combines them all. What’s more, some of the data is out of date. For instance, some players are still young and growing, so the height listed on a website may not be current. The underlying reason for all of this is that demand for this data has been limited, so there is great potential for data science in the soccer field.

I also learned some valuable lessons about working at a start-up, such as how to find demand in the market, how to make your product stand out, and how to get your first customer.

To sum up, although it was an unpaid job, I learned many things here that are more valuable than a salary. It was a great experience.

This is part of a series of posts by recipients of the 2023 GAPSA Summer Internship Funding Program that is coordinated by Penn Career Services. We’ve asked funding recipients to reflect on their summer experiences and talk about the industries in which they spent their summer. You can read the entire series here.

By Career Services