Application of Natural Language Processing (NLP) and text mining for process improvement.

Fair Wear

  • Customer case
  • Data Science
Erik van der Kooij
Commercial Manager
3 min
22 Feb 2019

Fair Wear is a non-profit organisation that aims to improve the working conditions of employees in garment factories. The NGO has collected a lot of documentation about its activities in recent years, for example in the form of reports from a complaint line for factory employees, reports of audits that check whether factories comply with the guidelines, and reports of training for factory employees. This information is stored as typed text, usually in Word or PDF format.

The expectation was that highly relevant, unique insights could be obtained from the data. However, because the data was available only as unstructured typed text, Fair Wear was unable to analyse it. The NGO asked our Data Scientists to create models that extract all relevant data from the audit and training reports and automatically classify complaints by subject. With the new insights, Fair Wear can improve its own procedures and share best practices with other NGOs, governments and interest groups.

Our approach

We started with the complaint data. We identified relevant and recurring topics through qualitative research: we interviewed the stakeholders within Fair Wear and analysed the complaint data qualitatively. We then linked the results of the interviews to the recurring topics in the data and manually labelled the complaint data. This dataset formed the foundation for developing an automatic classification model.

After the qualitative research, we started working with Natural Language Processing (NLP). We trained an NLP model on the hand-labelled data, with the aim of reproducing the manually assigned topics as closely as possible. We then applied text mining: queries place complaint texts that contain terms such as 'beaten', 'physical' or 'pain' under the topic 'physical abuse'.
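To make the text mining step concrete, here is a minimal sketch of such a keyword query in Python. It is an illustration under assumptions, not the queries used in the project: only the 'physical abuse' terms come from the text above, while the second topic, the function name `match_topics` and the whole-word matching rule are invented for this example.

```python
import re

# Keyword lists per topic. Only the 'physical abuse' terms are taken from
# the text above; the 'unpaid wages' entry is a hypothetical example.
TOPIC_KEYWORDS = {
    "physical abuse": ["beaten", "physical", "pain"],
    "unpaid wages": ["unpaid", "wages", "salary"],
}


def match_topics(text, topic_keywords=TOPIC_KEYWORDS):
    """Return the sorted topics that have at least one keyword in the text.

    Matching is case-insensitive and on whole words only, so 'painting'
    does not trigger the keyword 'pain'.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(
        topic
        for topic, keywords in topic_keywords.items()
        if words & set(keywords)
    )


print(match_topics("The supervisor has beaten a worker."))
```

Whole-word matching keeps false positives down, but it also misses inflected forms such as 'physically'. A real query set would probably need stemming or explicit variants for each term.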

By combining NLP with text mining, we made the classification as reliable as possible. This is essential when interpreting this type of sensitive information.

The result

The NLP model for the complaint data is a supervised multi-label topic model, trained on human-validated data. Multi-label means that several topics (labels) can be assigned to a single complaint or audit. The model is fully tailored to the specific wishes of Fair Wear.
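The idea of a supervised multi-label model can be sketched in a few lines of Python. The example below uses one common approach, binary relevance: a separate yes/no classifier per topic, here a small naive Bayes classifier written from scratch. This is an assumption for illustration only; the text does not say which algorithm the project used, and the training texts and topic names are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class BinaryNaiveBayes:
    """Multinomial naive Bayes for one yes/no label, with add-one smoothing."""

    def fit(self, docs, targets):
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = {True: 0, False: 0}
        for doc, target in zip(docs, targets):
            self.counts[target].update(tokenize(doc))
            self.n_docs[target] += 1
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def _log_score(self, tokens, cls):
        total = sum(self.counts[cls].values()) + len(self.vocab)
        prior = (self.n_docs[cls] + 1) / (sum(self.n_docs.values()) + 2)
        score = math.log(prior)
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((self.counts[cls][tok] + 1) / total)
        return score

    def predict(self, doc):
        tokens = tokenize(doc)
        return self._log_score(tokens, True) > self._log_score(tokens, False)


def train_multilabel(docs, label_sets):
    """Train one binary classifier per topic (binary relevance)."""
    labels = sorted(set().union(*label_sets))
    return {
        label: BinaryNaiveBayes().fit(docs, [label in s for s in label_sets])
        for label in labels
    }


def predict_labels(models, doc):
    """Return every topic whose classifier answers 'yes' for this text."""
    return sorted(label for label, model in models.items() if model.predict(doc))


# Invented, hand-labelled toy data; one text may carry several topics.
TRAIN_DOCS = [
    "worker was beaten by supervisor",
    "wages not paid this month",
    "worker was beaten and wages not paid",
    "forced overtime every week",
]
TRAIN_LABELS = [
    {"physical abuse"},
    {"wages"},
    {"physical abuse", "wages"},
    {"overtime"},
]

models = train_multilabel(TRAIN_DOCS, TRAIN_LABELS)
print(predict_labels(models, "beaten and wages not paid"))
```

Because each topic gets its own classifier, a complaint can receive zero, one or several topics. With only four training texts the predictions are of course not reliable; the sketch shows the structure, not the accuracy.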

The output consists of a structured dataset in which all past complaints and audits are classified by topic and variable. New complaints are classified automatically and reliably by our NLP model. In addition, we built specific models that store the data from audit and training reports in a structured way.

In total we defined about 40 relevant topics for the complaint data, of which 8 can be fully determined with NLP. For the other topics, too little data is available to train a reliable model, so we apply text mining to find them. With these topics, Fair Wear employees gain insight into the problems that affect factory employees. The audit and training reports are summarised in datasets that can be analysed.
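The split between topics that can be modelled and topics that are left to text mining can be illustrated with a small Python sketch. It counts the hand-labelled examples per topic and compares the count with a minimum. Both the threshold value and the function name `split_topics` are assumptions made for this example; the text does not state how the project decided which topics had enough data.

```python
from collections import Counter

# Hypothetical minimum number of labelled examples needed to train a model.
MIN_EXAMPLES = 50


def split_topics(label_sets, min_examples=MIN_EXAMPLES):
    """Split topics into (modelled, keyword_only) by labelled example count.

    label_sets holds one set of topics per hand-labelled complaint.
    """
    counts = Counter(label for labels in label_sets for label in labels)
    modelled = sorted(t for t, n in counts.items() if n >= min_examples)
    keyword_only = sorted(t for t, n in counts.items() if n < min_examples)
    return modelled, keyword_only


# Invented toy labels with a low threshold so the split is visible.
toy_labels = [{"wages", "overtime"}, {"wages"}, {"physical abuse"}]
print(split_topics(toy_labels, min_examples=2))
```

Topics in the first list would go to the trained model, while topics in the second list would be handled with keyword queries.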

The foundation for a new reporting system

The results of the NLP and text mining projects also serve as the foundation for setting up a new reporting system. In the future, free-text fields can be replaced by, for example, dropdowns and checkboxes. This makes reporting complaints, audits and training easier, resulting in a lower workload for Fair Wear employees. Fair Wear now has insight into the topics of incoming complaints and into which solutions do and do not work. Based on this, the NGO can improve its processes. The learning points and best practices are shared with other NGOs, governments and interest groups.

Want to know more?

Commercial Manager Erik will be happy to talk to you about what we can do for you and your organisation as a data partner.
