In this talk at Scale by the Bay, Rajesh Muppalla, co-founder of Indix, describes his journey learning how to build domain-specific NLP pipelines. The first part of the talk covers the evolution of the architecture, building blocks, and algorithms of the NLP pipeline. The second part focuses on how we fine-tuned the e-commerce NLP pipeline and transferred our learnings from the e-commerce domain to the Tax Compliance domain.
At Indix (acquired by Avalara), our goal was to build the "Google of Products". The product catalog currently holds 3+ billion products, amassed by crawling 5000+ retailer and brand websites. Naturally, we needed a robust NLP pipeline to make sense of unstructured text data at this scale. The building blocks I will cover are language models, word embeddings, and the knowledge graph. The algorithms I will cover are classification, entity extraction, document similarity, and query understanding (for the e-commerce domain). After the acquisition by Avalara, the team was tasked with making sense of unstructured text data in the Tax Compliance domain, where only limited data was available.