
Improve neural network training with Saimple

Improving the training set saves significant time and effort in reaching the reliability your AI business cases need.

Streamline your neural network training process

Training a neural network relies on a training database, and the performance of the network depends on the quality of that database. Checking its quality by means of statistical methods alone is not enough. Now you can also understand how the neural network will learn from this database.

Saimple helps you check what a network has learned and how to improve its learning. Right from the start you have better control over the learning process, and you save time in building a solid training set.

Visualize what your AI has actually learned

Testing measures performance, but knowing what your neural network has actually learned tells you why it performs well or poorly. The earlier you have this information, the more time you save downstream.

Training can be time-consuming, and the usual way to find out whether it was successful is to measure accuracy on a test dataset. But nothing prevents an AI from having good accuracy while being entirely biased. To prevent bias you need to know more precisely what the neural network has actually learned.

With Saimple you can visualize the impact of every input on the decision made by the network. This lets you see how each input affects the network at every step of inference, so you can discover unintended behavior and learn how to improve both your training dataset and your neural network architecture.
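To make the idea of input attribution concrete, here is a minimal gradient-saliency sketch in Python (PyTorch). It is a generic illustration of attributing a decision to inputs, not Saimple's own analysis or API; the model architecture and input are placeholders.

```python
# Minimal gradient-saliency sketch: attribute a classifier's decision to its
# input pixels. Illustrative only; this is not Saimple's method or API.
import torch
import torch.nn as nn

# Placeholder classifier and input (stand-ins for your trained model and data).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()
x = torch.rand(1, 1, 28, 28, requires_grad=True)

logits = model(x)
predicted = logits.argmax(dim=1).item()

# Back-propagate the predicted-class score to the input: large gradients mark
# the pixels with the most influence on the decision.
logits[0, predicted].backward()
saliency = x.grad.abs().squeeze()

print("predicted class:", predicted)
print("most influential pixel:", saliency.max().item())
```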

Prevent the risk of reliability failures

Classical measures of performance are often not enough to understand clearly how the system will operate in real conditions. You need to go beyond the data you know and discover how robust your system will be. Systems have in their specification an intended domain of use on which the product is supposed to operate reliably. But to cover that domain, you can only rely on the data you have for testing. Therefore, if your data does not cover enough ground, you might miss something within your domain of use.

To cover more ground and ensure better reliability of the system, you need to test not only isolated inputs but whole input domains. You do not need to go through each straw in every haystack to check whether there is one example that does not work. With Saimple you can ensure that each haystack is correct right away.
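As a rough illustration of the difference between testing single points and testing a whole neighbourhood, here is a sampling-based sketch in Python (PyTorch). Saimple covers entire input domains; the sampling below only approximates that idea, and all names are placeholders.

```python
# Sampling-based approximation of domain testing (illustration only; it is not
# a proof and not Saimple's analysis, which covers the whole input domain).
import torch

def locally_stable(model, x, epsilon=0.05, n_samples=200):
    """Check whether the predicted class stays unchanged for random
    perturbations of x within an L-infinity ball of radius epsilon."""
    with torch.no_grad():
        reference = model(x).argmax(dim=1)
        for _ in range(n_samples):
            noise = (torch.rand_like(x) * 2 - 1) * epsilon   # uniform in [-eps, eps]
            perturbed = (x + noise).clamp(0.0, 1.0)
            if (model(perturbed).argmax(dim=1) != reference).any():
                return False
    return True
```

A formal tool answers the same question for every point in the neighbourhood at once, rather than for the finite sample checked here.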

Improve your dataset

Constituting a good training dataset is crucial for the performance of your neural network. "The more data the better" is not sufficient: to get good results you need good data! Improving your training dataset will improve the quality of your product and save you a lot of trouble during training.

Knowing how bias or unbalanced data diversity affects your training is a first step. But then you need to take corrective action on the dataset. For that, your data scientists need guidance on how to adapt your large-scale database efficiently.

Saimple does not only discover issues; it can also help you correct them (see the sketch after the list below) by identifying:

  • What part of the data is causing bias;
  • Which data labels are being confused;
  • Which data labels are not robust enough and require more data, and more.
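As a sketch of the "confused labels" point above, the snippet below ranks the label pairs a model mixes up most often using a confusion matrix on a held-out test set. It is a generic illustration with made-up labels and predictions, not Saimple's reporting.

```python
# Rank the label pairs a model confuses most often (generic illustration with
# made-up labels and predictions; Saimple reports this kind of insight directly).
import numpy as np
from sklearn.metrics import confusion_matrix

def most_confused_pairs(y_true, y_pred, top_k=3):
    cm = confusion_matrix(y_true, y_pred)
    np.fill_diagonal(cm, 0)                     # keep only the misclassifications
    pairs = [(cm[i, j], i, j) for i in range(cm.shape[0]) for j in range(cm.shape[1])]
    return sorted(pairs, reverse=True)[:top_k]  # (error count, true class, predicted class)

y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 1, 2, 2, 0]
print(most_confused_pairs(y_true, y_pred))      # class 0 is mistaken for class 1 most often
```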

Improving the training dataset by understanding the direction in which you want to go can benefit data scientists, as it helps them avoid unnecessary correction steps later in the process.

Understand the types of bias in machine learning

According to Telus International, there are seven distinct biases in machine learning:

Sample bias

Sample bias occurs when a dataset does not reflect the realities of the environment in which a model will run. An example of this is certain facial recognition systems trained primarily on images of white men. These models have considerably lower levels of accuracy with women and people of different ethnicities. Another name for this bias is selection bias.

Exclusion bias

Exclusion bias is most common at the data preprocessing stage. Most often it’s a case of deleting valuable data thought to be unimportant. However, it can also occur due to the systematic exclusion of certain information. For example, imagine you have a dataset of customer sales in America and Canada. 98% of the customers are from America, so you choose to delete the location data thinking it is irrelevant. However, this means your model will not pick up on the fact that your Canadian customers spend twice as much.
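A worked version of this example, with hypothetical numbers, shows what is lost when the location column is dropped:

```python
# Hypothetical customer sales data: 98% US customers, 2% Canadian customers who
# spend twice as much per order. Dropping "country" as irrelevant hides this.
import pandas as pd

sales = pd.DataFrame({
    "country": ["US"] * 98 + ["CA"] * 2,
    "order_value": [50.0] * 98 + [100.0] * 2,
})

print(sales["country"].value_counts(normalize=True))   # US 0.98, CA 0.02
print(sales.groupby("country")["order_value"].mean())  # CA 100.0, US 50.0
```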

Measurement bias

This type of bias occurs when the data collected for training differs from that collected in the real world, or when faulty measurements result in data distortion. A good example of this bias occurs in image recognition datasets, where the training data is collected with one type of camera, but the production data is collected with a different camera. Measurement bias can also occur due to inconsistent annotation during the data labeling stage of a project.

Recall bias

This is a kind of measurement bias, and is common at the data labeling stage of a project. Recall bias arises when you label similar types of data inconsistently. This results in lower accuracy. For example, let’s say you have a team labeling images of phones as damaged, partially-damaged, or undamaged. If someone labels one image as damaged, but a similar image as partially damaged, your data will be inconsistent.

Observer bias

Also known as confirmation bias, observer bias is the effect of seeing what you expect to see or want to see in data. This can happen when researchers go into a project with subjective thoughts about their study, either conscious or unconscious. We can also see this when labelers let their subjective thoughts control their labeling habits, resulting in inaccurate data.

Racial bias

Though not data bias in the traditional sense, this still warrants mentioning due to its prevalence in AI technology of late. Racial bias occurs when data skews in favor of particular demographics. This can be seen in facial recognition and automatic speech recognition technology which fails to recognize people of color as accurately as it does Caucasians.

Association bias

This bias occurs when the data for a machine learning model reinforces and/or multiplies a cultural bias. Your dataset may have a collection of jobs in which all men are doctors and all women are nurses. This does not mean that women cannot be doctors, and men cannot be nurses. However, as far as your machine learning model is concerned, female doctors and male nurses do not exist. Association bias is best known for creating gender bias.
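A quick way to surface this kind of association is a cross-tabulation of the two attributes. The sketch below uses made-up records reproducing the doctor/nurse example, and simply shows the empty cells a model could never learn from:

```python
# Hypothetical records: the cross-tabulation shows zero female doctors and zero
# male nurses, so the model never sees those combinations during training.
import pandas as pd

records = pd.DataFrame({
    "gender":     ["M", "M", "M", "F", "F", "F"],
    "occupation": ["doctor", "doctor", "doctor", "nurse", "nurse", "nurse"],
})

print(pd.crosstab(records["gender"], records["occupation"]))
```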
