How Data Science is Improving E-Commerce at Etsy


– Hello and welcome to Experian's weekly Data Talk, a show where we talk to data science leaders from around the world. Today we are talking about how data science is improving e-commerce, and we are super honored and excited to have Dr. Liangjie Hong, who is the head of data science at Etsy, which is one of my favorite online stores. He was previously a senior manager of research at Yahoo. He received his PhD in computer science from Lehigh University, and it's just an honor, Dr. Hong, to have you today.

– Yeah, thank you for having me.

– So, can you share with our community the path that led you to start working in data science?

– Yeah, sure, absolutely. I first studied machine learning and data mining in graduate school, and I slowly developed an interest in machine learning and how it can be applied to real-world problems. At that time, about 10 years ago, social networks were very popular, so the majority of my dissertation work was about how to apply data science to social networks. Then I came to Yahoo Research, where I spent quite a bit of time applying cutting-edge machine learning techniques to a wider range of problems. Then I came to Etsy, where I have spent a lot of time studying those problems for e-commerce, as well as how to work with product designers and product managers.

– Very cool. So did you always know… Because I remember, even ten years ago, the term “data scientist”
wasn't really around. I don't remember hearing it.

– Right.

– So I'm kinda curious… You were in love with computer science.

– Right, right.

– You were in school. Were you planning on teaching, or did you wanna get out of academia and actually apply what you were learning?

– Right. So, “data scientist,” this buzzword, actually came around in 2011 or 2012, and that's funny, because during that time I was an intern at LinkedIn, and I believe LinkedIn was one of the first couple of places where people coined the term “data scientist.” So, yes, before that I was, in general, interested in data mining and machine learning. Those lean more toward the algorithm side, right? Like, “Oh, this algorithm, that algorithm. This model and that model.” I think the beauty and the passion of data science is really that we bring models and real-world problems together and try to help out the business, and so on and so forth.

– So, now that you're head of data science at Etsy, can you talk a little bit about the work you're doing now?

– Yeah, absolutely. So, we have roughly 15 data
scientists on the team. Half of the team is in San Francisco and half is in Brooklyn, New York, which is our headquarters. We are part of the engineering organization, and the goal of our team is to build engineering-quality, end-to-end machine learning solutions for a lot of our products inside Etsy. For example, we build machine learning solutions for search ranking: you type a keyword, and we want to return the most relevant results to you. We also develop algorithms and solutions for our recommendations. So, if you come to Etsy, you see different recommendation modules, and the question is how we can recommend the most relevant results to you. So this is a mixture of engineering, plus we work very closely with our product managers, designers, and UX researchers to really flesh out the best user experiences to present to users.

– Yeah, I mean… For those who have never gone to Etsy.com, you definitely gotta check it out. There are just so many amazing things. For me and my wife, when we had kids, we were shopping on Etsy every week because there are so many cute products there.

– Right.

– And I was even browsing yesterday, looking at the home and living section, and at the top it showed how many items are in that section. It said there were over nine million items just in the home and living section. It boggles my mind.

– Yep, yep, yeah. So, we have about 39 million active listings, more than 40 million active buyers, and 3 million active sellers, so it is really a very, very large-scale marketplace, I would say. Of course, compared to Amazon or eBay we're still sort of small, but in terms of unique, handcrafted goods, we're definitely a very large marketplace.

– Yeah. So, with so many millions of items, like you said, 39 million, plus the hundreds of thousands that are being added every single day, how does machine learning help people find what they need? Because there are so many things that you could possibly find there.

– Yeah, exactly. So, this is an ongoing challenge of ours. First, when we process
the hundreds of thousands of new listings per day, like you mentioned, we have to tag them, give them labels, categorize them into different categories, and make sure we understand, “Oh, this is a women's small-size wedding dress” versus “this is a ring,” right? So we use a lot of machine learning techniques to process the data. A lot of the data is not perfect; in fact, the majority of it is very, very noisy, so we have to put a lot of effort there. After that, we ingest it into our search engine or recommendation engines, where it is a really challenging job to find the things that you would like to interact with in the future, right? Typically, in either case, search or recommendations, we need to search among millions of listings and narrow them down to a candidate pool of, let's say, one or two thousand. Then we use more advanced machine learning to rank those candidates and make sure that we recommend the top five or six items for you. So, it is a really challenging problem.

– I'm kinda curious,
because I know that if you're trying to sell something, you can write your own descriptions and maybe categorize and tag it. How well do humans tag versus machine learning? Because sometimes people might mistag, because we make errors.

– Yep. So, human tagging is definitely very useful, because you need to bootstrap the data. Users need to give us the input: “Oh, this is this size. That is that category.” Or, “What's the material of this product?” Machine learning is there to solve the scalability issue, because only a very, very tiny portion of the data can be tagged by human beings. In general, for that portion, the human quality is very, very high, and, of course, machine learning algorithms can only be as good as the data, right? That's the human part. But for the vast majority of the data, the non-human-tagged world, that's where the machine learning models come into play. That's why we use these models. It's not necessarily that they're more accurate; it's more that they can be applied at large scale.

– You mentioned that there's a lot of noisy data. Can you explain what you mean by that?

– Well, first, from our side, we definitely want to know more about your product, right? So you upload a product listing to Etsy, and we want to know what kind of materials you used, what the color is, what the size is, where some of the raw materials come from, whether there is a brand… All kinds of aspects, right? In fact, we probably have hundreds of such aspects, or attributes…

– Wow.

– But imagine you are a seller, right? You just want to upload a photo and list it on the website.

– Yeah.

– Filling in all of that is a very cumbersome process. So, we need a balance between user experience and the quality of the data. Usually, we ask for some key elements that you need to fill in, but you can leave the rest blank, and then we can ask crowdsourcing services to help us tag. Now, with crowdsourcing, because the workers are not the owners, not the sellers of those products, they may misunderstand things. So, the noisy data comes from multiple places.

– Mm. Yeah, I can't even imagine the amount of data that you're having to collect to help make the user experience better, especially with all the human involvement, people categorizing their own content and writing descriptions, and then, on top of that, the various machine learning algorithms helping to sort through all of that. For data scientists who are interested in getting involved in e-commerce, can you talk about some popular machine learning algorithms or techniques that are used in e-commerce?

– Yeah, that's a great question. So, I always tell
interview candidates that machine learning for e-commerce is extremely challenging, and the interesting part is that there aren't many off-the-shelf algorithms or models you can use. That's actually the beauty of the work: you keep exploring. A lot of things you can, of course, borrow from traditional domains, but there are a lot of things you need to innovate on, right? I usually give this example: say you are at Netflix, and you want to recommend movies to users. Let's say you watch House of Cards season 1, and the system says, “Oh, let me recommend House of Cards season 2, season 3.” That's okay for Netflix; a typical recommendation system would do that. But imagine that's e-commerce, right? You just bought a camera, right?

– Right.

– And then we start to show you other cameras, and people are going to complain. In fact, we had a situation where a customer from Britain purchased a wedding dress, and then we kept showing wedding dresses, right? So this person actually complained and wrote an email to us. (laughter)

– Yeah, “I bought my wedding dress.”

– “Stop showing me wedding dresses.” You buy chairs, and we show you chairs. So, that's the phenomenon of machine learning for e-commerce: there are a lot of problems we need to rethink and repurpose for the e-commerce domain.

– Yeah, I can see that. What a huge challenge, because for certain items, like a wedding dress, you just want one, right?

– Yeah, for a very short time.

– Yeah, and you don't need to see any more after you choose one. That's a huge challenge, and so, I guess, for certain types of products there have to be rules in place, like if someone buys this, it's probably not a good idea to show other similar items.

– Well, I mean, even coming up with those rules is challenging, right? We're talking about 40 million buyers, and all buyers are very different, right?

– Yeah.

– Some of them might be resellers. In fact, we definitely have very highly engaged, high-volume buyers. Not wedding dresses, per se, but they keep buying wedding-related stuff.

– Mm.

– You can imagine that's mainly for resale or other purposes, right? So, if you have hard-coded logic that says, “You purchased wedding-related things, so we'll stop showing them to you for the next two weeks or the next two months,” those users may come to us saying, “Hey, what's wrong with your algorithm? I want to see similar things. I think I've shown you my personal preference strongly enough. Why don't you take my personal
preference into account?” So, it's an extremely difficult problem to solve.

– Yeah, I could see that. I can just see how complex the work is for you and your team.

– Right. And a lot of the challenging problems in e-commerce are also about the sparsity of the data, right? A lot of users go to Amazon or eBay on a daily basis, so they give those sites a lot of opportunities to learn their personal preferences, and so forth. At Etsy, a lot of people come here to buy gifts, to buy something special for a special occasion. So we have a lot of buyers who show up around, say, Thanksgiving or the holiday season, then disappear for the whole year, and show up again in the next holiday season.

– Okay, yeah.

– Right, so then you can imagine, “Oh, we don't have too many data points,” right? “The last batch was last year.” Are you willing to
utilize those data points? Or do you say, “Those data points are outdated, so we don't have much information about these people,” right? So it's a very difficult situation, where we still need to provide personalized and engaging experiences for these users.

– Mm. I'm also curious: obviously there are people who troll and do inappropriate things. How do you help protect and care for the Etsy community by making sure there's no offensive content, you know, pictures being uploaded?

– Yeah, that's a great question. We have dedicated teams who vet a lot of the shops, sellers, and content that we have on-site. We also have machine learning algorithms that scan for fraud, and even money laundering.

– Mm.

– Yeah, so it's a mixture of a lot of human investment as well as machine learning algorithms.

– Can you talk a little bit about how machine learning is used to prevent fraud? I'm curious about that.

– Right. Basically, we have teams that, over the years, have been trained to look at user activities and at how people try to exploit the site, or exploit a lot of the rules that we put in place, and we use those behaviors to train our models so that we can detect those things. It's a never-ending process, because people change their behavior. They invent new games, and then we have to catch up on those frauds. But yes, we use that approach for a lot of fraud-detection problems.

– You know, I'm also curious about how machine learning is helping shoppers… I think more and more people are browsing on their mobile devices, and I think behavior is
sometimes different on mobile devices versus desktop computers.

– Yep. Actually, half of the traffic to Etsy is from mobile devices, and we also understand that behavior on mobile devices is very different from desktop. One thing that's probably special to Etsy is that people tend to browse and explore on their mobile devices but eventually check out on desktop machines. One reason is that a lot of items are actually quite expensive. They're not commodities, right? A painting from the UK is probably going to be $70 or $80, so a lot of people want to make sure the transaction and all the details are correct, and that's where they use their desktop. But mobile devices are definitely driving more and more traffic nonetheless.

– Mm. Yeah, I see that when I watch my wife. She loves to browse; she'll browse Etsy on her mobile device and add things that look interesting to her cart, but then she'll go to her desktop to actually make the purchase. Is that what you see a lot?

– Yes, yes, that's a very common pattern. On the other side, we are trying to improve the checkout process so that people feel comfortable checking out on their mobile devices.

– Yeah, okay. And also, it's funny, because you were talking about how everybody's so different in the way that they shop. For example, when I
shop at an online store, I usually do my research, look at product reviews, and then buy within about 30 minutes. I shop fast, and if I go to a physical store, I'm set on what I wanna buy. My wife, on the other hand, likes taking her time. She will spend a lot of time thinking before she actually buys something. I'm kind of curious, how does machine learning adjust the things that are shown, or say, “Hey, it's time to buy this”?

– Yeah, that's a very, very good question. Generally, what I say to a lot of candidates, and to a lot of people who think about machine learning for e-commerce, especially recommendations or search, is this: let's say you go to a shopping mall, right? You go to a department store. Not everybody there is ready to buy something. A lot of people are exploring and just walking around, and they still enjoy the atmosphere, the environment. From the shop's perspective, they also understand that not everybody is interested in buying instantly. They want to inspire you, so that maybe you'll purchase next time. That's very normal in the offline shopping experience. I think the challenge is: how can we mimic that experience online? I think we're doing a reasonably good job for the folks who have a very, very strong shopping intent. They know exactly what they want to buy, they have a specific keyword in mind, they just type that in, and then they check out, like you said, right?

– Yeah.

– They go right to checkout, and, you know, it's all very straightforward. But we have challenges, and I think that's not only true for Etsy but for e-commerce across the board: how can we really model a discovery process, right? Say I come to the site with ten minutes to kill. I don't have a very strong intent in mind, so what is the inspirational process through which we can inspire people to purchase things? That's where, I think,
machine learning can really come into play, and a lot of innovation in machine learning for e-commerce should happen. I think right now we are at the very, very early stages of this, because nobody has defined it yet. You search the papers, you search the blogs, and there's no such thing as an e-commerce discovery model or an e-commerce inspiration model. That, I think, will really change the way people shop online.

– Mm. It's fascinating hearing about all the different ways you're leveraging machine learning for e-commerce, and all the challenges involved. I mean, so many challenges that you and your team are working through…

– Yeah.

– …daily. I was on your personal website, and I saw you spoke at a Big Data meetup recently about optimizing gross merchandise value in e-commerce. Can you talk a little bit about what you mean by that?

– Right. That's one example of what I mentioned a little earlier: we need to adapt traditional models to the e-commerce domain, right? In traditional information retrieval, or traditional search, a classic example is Google, where the goal is to optimize static relevance. Say you search for Barack Obama: Wikipedia probably jumps to the top, then you have some other sites, and that ranking is basically the same for every single person. The notion of relevance is built in, or generic. But e-commerce is different. First, let's say you search for Harry Potter because you want to buy a magical sticker or something, and I search for Harry Potter because I want to buy a T-shirt. So the notion of relevance in e-commerce search is, in general, personalized. Second, on the e-commerce side, relevance is one way to look at things, but we optimize revenue, which is measured as gross merchandise value, or GMV. You can think of it as your revenue. When people search, we don't only want to provide the most relevant results; we also want results that generate the most revenue. So we need to model how likely you are to click on an item, and, after you click on it, how likely you are to purchase it, and we also need to take the price into account. Do we recommend items that have a higher conversion rate but a lower price? Or a lower conversion rate but a very high price? So you see all these trade-offs and compromises that we need to make as we adapt the traditional model to optimize incremental revenue
in the e-commerce setup.

– Mm. That's amazing. I never even thought about (laughter) how you're placing products, or recommending them to different people, based on “This item has a higher conversion rate but a lower price, but that item generates more revenue for the company with a lower conversion rate.” Are you just doing a lot of testing?

– Yeah, we run hundreds of A/B tests, and we also do a lot of offline testing, to make sure that all the algorithms and models we put out there have measurable effects, right? For every single one we put out there, we know the incremental revenue it's generating, the incremental user engagement, that kind of thing.

– So, one of the questions I always love to ask data science leaders, because we have a lot of people in our community who are looking to get jobs in data science: when you're hiring someone for your team, what skill sets, and maybe even personality types, are important to you for someone who's going to do well on a machine learning team specialized in e-commerce?

– Right. I always get such questions at meetups and conferences and so on. I want to emphasize something that's probably not emphasized enough. One is the ability to formulate real-world problems into
machine learning setups. A lot of students, a lot of people who are very interested in the field, tend to think machine learning or data science is a basket of models, a basket of techniques: “I need to learn these 20 models, I need to learn these five programming languages,” and so on and so forth. Those are definitely very important. Those are hard skills that you need to have, right? But for us, one very, very important thing is this: we talk to product managers and designers, who don't necessarily have machine learning or data science backgrounds. So we have to translate, if you like that word, their requirements and the way they think into machine learning setups. This is actually a very, very difficult skill, because there are so many possibilities. One scenario can be translated into five different setups, and those five setups might mean different things and have different consequences, and so on and so forth. So, how you think about this is very, very important for us. I think it's key for data scientists, because this is where the science part really takes place. So, that's one very, very important skill set that we are looking for. The other part is closely related: communication skills, right? So, again, say you invented
this fantastic model, or you discovered a really good solution, but how do you communicate it to the stakeholders? Again, we are talking about stakeholders from very diverse backgrounds: product managers, designers, company executives, people from all sides, and so on and so forth. How can you make sure the things you put out there can be summarized in plain English, right? This is a very, very important skill as data scientists grow, and I think it's going to help them along the way.

– Yeah, I think the soft skills are so key, right? Because if you can't communicate well, you're not gonna get buy-in, or it's not gonna be very easy to sell your work within the organization. So, I'm glad you touched on that, because a lot of times people focus on the hard skills: the models, the background in stats, or the different programming languages. But, to your point, to get anything done in an organization, you gotta have that soft side.

– Yeah, so, right now
the hard side is already emphasized enough, and I think we broadly agree on what the hard skills are. But the way I look at it, the more successful data scientists I see have much more mature soft skills. They can maneuver inside the organization and really make data science and machine learning a driving force in it. That's why I emphasize the soft skills.

– Before we end, I always like to ask a series of questions, and the first one is: what is your favorite programming language?

– Right. (laughter)

– It's like choosing between your children.

– Right, right, right. I like Python quite a bit. I think it's a very flexible and very good tool for data science.

– Okay. And the last question is: what advice do you have for members of our community who are interested in getting started in a data science career?

– Yeah. One piece of advice I would have is to be patient and keep learning. I want to share a short example. We recently had a candidate apply for our full-time data scientist job, and that person is actually from the Juilliard School.

– Really?

– Yes. He's actually getting a master's in piano performance, and all his reference letters are from Lincoln Center. So, I sent this person an email that said, “Look, obviously you're not a fit for the data scientist role we're looking for right now, but if you really think you have a passion for it…” Because this person also included his GitHub repository in his resume, and obviously he has a bimodal kind of interest: during the day, he's probably a musician, but in his free time he's a data scientist. So I sent him a personal email encouraging him to pursue that path. That's the advice I want to give folks: even though today you may not be able to break into this industry, just keep your interest alive, and one day, I think, there
will be a good outcome.

– That's a cool story. I can't believe it. Somebody who is going to Juilliard for a master's in piano, that's on another level.

– Yeah, exactly.

– A super smart, complex thinker, and then actually wanting to pursue data science. That is amazing. (laughter)

– I was shocked when I looked at that resume.

– That's brilliant. Okay, before we end, where can everyone learn more about you?

– Where?

– Yeah, what website… If someone wants to connect with you.

– I have a personal website, so just search my name, and it's basically the top Google result. There you can check out what we're looking for, like the job descriptions, and we also list a bunch of papers and blog posts that we've put out there.

– Awesome. I'll make sure to put links to your LinkedIn profile, so people can follow you there, and to your website on our blog. For those who are listening to the podcast, the short URL is just ex.pn/datatalk40, and that'll bring you over to the website, where we'll have this interview in video format along with the podcast episode, a full transcription, and links to where you can connect with Liangjie. So, thank you so much, Dr. Hong.

– No problem.

– Fascinating talking with you, and I hope you have a great week.

– Sure, thank you.

– Alright, take care. We'll see you all next week on Data Talk.
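
The gross-merchandise-value ranking described in the interview, which weighs how likely a shopper is to click, how likely they are to buy after clicking, and the price, can be illustrated with a small Python sketch. This is a minimal example under stated assumptions, not Etsy's system: the listings, relevance scores, and probabilities are hypothetical, and the ranking score is assumed to be the simple product P(click) × P(purchase | click) × price. The first stage mirrors narrowing millions of listings down to a small candidate pool by relevance; the second re-ranks that pool by expected GMV.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Candidate:
    """A hypothetical listing with model-estimated scores (illustrative values only)."""

    listing_id: str
    relevance: float               # assumed score from a cheap first-stage retriever
    p_click: float                 # assumed estimate of P(click | shown)
    p_purchase_given_click: float  # assumed estimate of P(purchase | click)
    price: float                   # listing price in dollars


def expected_gmv(c: Candidate) -> float:
    """Expected GMV from showing the listing once: P(click) * P(purchase | click) * price."""
    return c.p_click * c.p_purchase_given_click * c.price


def two_stage_rank(candidates: Sequence[Candidate], pool_size: int, top_k: int) -> List[str]:
    """Stage 1 keeps the pool_size most relevant listings;
    stage 2 re-ranks that pool by expected GMV and returns the top_k listing ids."""
    pool = sorted(candidates, key=lambda c: c.relevance, reverse=True)[:pool_size]
    ranked = sorted(pool, key=expected_gmv, reverse=True)
    return [c.listing_id for c in ranked[:top_k]]


if __name__ == "__main__":
    listings = [
        Candidate("sticker", 0.9, 0.20, 0.10, 5.0),    # cheap, converts well
        Candidate("painting", 0.8, 0.10, 0.02, 75.0),  # expensive, converts rarely
        Candidate("t-shirt", 0.7, 0.15, 0.05, 25.0),   # mid-priced
        Candidate("mug", 0.1, 0.30, 0.20, 15.0),       # barely relevant to the query
    ]
    for item in listings:
        print(item.listing_id, round(expected_gmv(item), 4))
    print(two_stage_rank(listings, pool_size=3, top_k=2))
```

With these made-up numbers, the relevance stage drops the mug even though its expected GMV is the highest, and the re-ranking stage puts the t-shirt (expected GMV 0.1875) ahead of the painting (0.15) and the sticker (0.10), so the last line prints `['t-shirt', 'painting']`. The high-conversion, low-price sticker loses to items with lower conversion but higher prices, which is the kind of trade-off the interview describes; a real system would also need to balance relevance, personalization, and engagement rather than GMV alone.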

One thought on “How Data Science is Improving E-Commerce at Etsy”

  1. Liangjie Hong is Head of Data Science at Etsy Inc., managing a group of data scientists to deliver cutting-edge scientific solutions for Search and Discovery, Personalization and Recommendation, and Computational Advertising. Previously, he was Senior Manager of Research at Yahoo Research from 2013 to 2016, leading science efforts for Personalization and Search Sciences.

    Liangjie has published papers at all the major international conferences in data mining, machine learning, and information retrieval, such as SIGIR, WWW, KDD, CIKM, AAAI, WSDM, RecSys, and ICML, winning the WWW 2011 Best Poster Paper Award, a WSDM 2013 Best Paper nomination, and the RecSys 2014 Best Paper Award. He has also served as a program committee member for KDD, WWW, SIGIR, WSDM, AAAI, EMNLP, ICWSM, ACL, CIKM, IJCAI, and several workshops.
