- The client, Repugen, is a US-based start-up specializing in online reputation management for healthcare providers such as hospitals and private practices.
- Repugen aggregates reviews that patients post on social media and in online directories, and analyzes them for sentiment. Attributes relating to physicians, nurses, support staff, the hospital facility, and so on are extracted from these reviews, and actionable insights are provided to stakeholders to improve patient experience.
- Trained a machine learning sentiment classifier on historical data to score entities as positive, negative, or neutral.
- Built a custom NLP pipeline to identify and extract hidden entities in the review text and extract the sentences associated with the entities.
- The text related to the hidden entities is scored using the trained classifier.
- Trained a model to detect and extract the most common positive and negative attributes that have the highest correlation with review sentiment.
- The entities are ranked across these common positive and negative attributes.
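
The entity extraction and scoring steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the keyword lists in `ENTITY_KEYWORDS`, the tiny word lexicon, and all function names are hypothetical, and the lexicon lookup merely stands in for the trained sentiment classifier so that the example runs on its own.

```python
import re
from collections import defaultdict

# Hypothetical entity vocabulary; the real pipeline's entity list is not given here.
ENTITY_KEYWORDS = {
    "physician": {"doctor", "dr", "physician", "surgeon"},
    "nurse": {"nurse", "nurses"},
    "support staff": {"receptionist", "staff", "front desk"},
    "facility": {"parking", "waiting room", "clinic", "hospital"},
}

# Toy lexicon standing in for the trained sentiment classifier (an assumption).
POSITIVE = {"kind", "friendly", "helpful", "clean", "caring", "great"}
NEGATIVE = {"rude", "dirty", "slow", "late", "long", "unhelpful"}


def split_sentences(text):
    """Split review text into sentences on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def entities_in(sentence):
    """Return the set of entities whose keywords appear in the sentence."""
    lowered = sentence.lower()
    found = set()
    for entity, keywords in ENTITY_KEYWORDS.items():
        if any(re.search(rf"\b{re.escape(k)}\b", lowered) for k in keywords):
            found.add(entity)
    return found


def score_sentence(sentence):
    """Label a sentence by counting positive and negative lexicon words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def score_review(text):
    """Map each detected entity to its (sentence, label) pairs."""
    results = defaultdict(list)
    for sentence in split_sentences(text):
        label = score_sentence(sentence)
        for entity in entities_in(sentence):
            results[entity].append((sentence, label))
    return dict(results)


if __name__ == "__main__":
    review = "The doctor was kind and caring. The receptionist was rude. Parking was fine."
    for entity, scored in score_review(review).items():
        print(entity, scored)
```

Scoring at the sentence level rather than the review level is what lets a single mixed review contribute a positive signal for one entity and a negative signal for another.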
Web scraping is a domain that has recently gained traction across industries and businesses, and it is likely to remain an important business area in the years to come.
The volume of unstructured data on the web is huge, and this data explosion presents enormous opportunities for companies that can extract, manage, and analyze it.
Data scraping is the process of extracting data from a website using scripts or automation tools. In this demo, we scrape reviews and information about doctors from various medical websites using Scrapy and Selenium.
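
The extraction step can be illustrated with a small sketch. The demo itself uses Scrapy and Selenium; to keep this example dependency-free, it uses Python's standard-library `html.parser` as a stand-in for Scrapy's selectors, and it parses an inline HTML snippet rather than fetching a live page. The markup, the CSS class names (`review`, `doctor`, `text`), and the doctor names are all made up for illustration.

```python
from html.parser import HTMLParser

# Made-up sample markup; real review sites will use different structure and class names.
SAMPLE_HTML = """
<div class="review"><span class="doctor">Dr. A. Example</span>
<p class="text">Very attentive and explained everything clearly.</p></div>
<div class="review"><span class="doctor">Dr. B. Sample</span>
<p class="text">Long wait, but the staff were friendly.</p></div>
"""


class ReviewParser(HTMLParser):
    """Collect one dict per review block, with 'doctor' and 'text' fields."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._field = None  # field currently being filled, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class == "review":
            # A new review block starts: open an empty record.
            self.reviews.append({"doctor": "", "text": ""})
        elif css_class in ("doctor", "text") and self.reviews:
            self._field = css_class

    def handle_endtag(self, tag):
        # Leaving a span or p element ends the current field.
        if tag in ("span", "p"):
            self._field = None

    def handle_data(self, data):
        if self._field and self.reviews:
            self.reviews[-1][self._field] += data.strip()


def parse_reviews(html):
    """Return a list of {'doctor': ..., 'text': ...} dicts found in the HTML."""
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    return parser.reviews


if __name__ == "__main__":
    for review in parse_reviews(SAMPLE_HTML):
        print(review)
```

In a Scrapy spider the same fields would typically be pulled out with CSS or XPath selectors inside the spider's `parse` callback, with Selenium used for pages that only render their reviews after JavaScript runs.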