What is Ground Truth? (2024)

If you’re new to Transkribus, or to machine learning in general, then you are probably also new to the term “Ground Truth”. In short, Ground Truth is the accurate and verified data which is used to train machine learning models, such as those used for automatic transcriptions in Transkribus. This data is crucial to the success of your model, because machine learning is essentially a tool that statistically replicates the data you supply. Therefore, the better your Ground Truth data, the better your model will be.

In this post, we’re going to take a further look at what Ground Truth is, why it is so important for handwritten text recognition software, and how you can best prepare Ground Truth data on Transkribus.

A basic Ground Truth definition

Used in statistics and machine learning, Ground Truth is data that we assume to be true. For example, imagine you have two images: one depicts a dog, and the other a cat. We know this to be true because we, as humans, have the ability to recognise different animals. If you asked a thousand people which picture contains the dog, they would overwhelmingly point to the same picture.

But for a computer, this task is a lot harder. A computer does not automatically know which animal is which; it has to be taught how to do this. And this is where Ground Truth comes in. If you wanted to train a computer to recognise which photos contain dogs and which contain cats, you would first have to provide it with a large dataset of images, each labelled as either “photo with dog” or “photo with cat”. From these Ground Truth labels, the computer can learn what images with dogs look like and what images with cats look like, and create a model containing this information.

Once that model is sufficiently trained with enough data, it can be presented with a brand new image and should be able to say whether that image contains a cat or a dog, just like a human would. This is why it is called “artificial intelligence”: it is training computers to do intelligent tasks that humans do naturally. And Ground Truth is the basis for this whole process.
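To make the idea of learning from labels more concrete, here is a minimal sketch in Python. It is not how real image models work (those learn from pixels, usually with neural networks); it assumes each photo has already been reduced to two made-up measurements, and the function names `train` and `predict` are purely illustrative. The "model" is simply the average measurement for each label.

```python
from collections import defaultdict

# Toy Ground Truth: each example is (features, label).
# The two features (body weight in kg, snout length in cm) and all
# values are invented purely for illustration.
GROUND_TRUTH = [
    ((4.0, 2.0), "cat"),
    ((5.0, 2.5), "cat"),
    ((3.5, 1.8), "cat"),
    ((20.0, 9.0), "dog"),
    ((30.0, 11.0), "dog"),
    ((12.0, 7.0), "dog"),
]


def train(examples):
    """Learn one average feature vector (centroid) per label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: tuple(total / counts[label] for total in sums[label])
        for label in sums
    }


def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))

    return min(model, key=lambda label: squared_distance(model[label]))


model = train(GROUND_TRUTH)
print(predict(model, (4.5, 2.2)))    # a small animal the model has never seen
print(predict(model, (25.0, 10.0)))  # a large animal the model has never seen
```

If some of the labels in `GROUND_TRUTH` were wrong, the averages would shift and the predictions would get worse, which is the point the rest of this post keeps coming back to.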

Male tabby cat. Alvesgaspar. Public domain, via Wikimedia Commons

Canis lupus familiaris (perro) en Monfero. Fernando Losada Rodríguez. Public domain, via Wikimedia Commons

Distinguishing between cats and dogs is easy for a human, but almost impossible for an untrained computer.

Is Ground Truth just used for training models?

No, it’s also used for testing models. Let’s say you have already created your model for distinguishing between photos of cats and dogs. Now, you want to test how well that model actually works. You can do this by presenting the model with Ground Truth data for which you already have a “correct answer”, and seeing whether the model comes up with that same answer.

With the cat/dog model, that means you would show the model a series of images which have already been labelled as either “photo with dog” or “photo with cat” and count how often the model assigns the correct label to a photo. This way, you can see how well your model performs.
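The counting described above can be sketched in a few lines of Python. The labels and predictions below are made up for illustration, and `accuracy` is a hypothetical helper name rather than part of any particular library.

```python
def accuracy(predictions, ground_truth):
    """Share of items where the model's label matches the Ground Truth label."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    if not ground_truth:
        raise ValueError("at least one labelled example is required")
    correct = sum(p == t for p, t in zip(predictions, ground_truth))
    return correct / len(ground_truth)


# Invented test set: the labels we know to be right, and what a model said.
ground_truth = ["cat", "dog", "dog", "cat", "dog"]
predictions = ["cat", "dog", "cat", "cat", "dog"]

print(f"{accuracy(predictions, ground_truth):.0%}")  # 4 of 5 correct, prints 80%
```

The test images should be ones the model did not see during training; otherwise the score says more about memory than about how the model will cope with new photos.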

How is Ground Truth data used in text recognition models?

What if you didn’t want a model that could distinguish between cats and dogs in photos, but one that could read and transcribe historical documents? These kinds of models are the core technology behind all handwritten text recognition platforms, and they are trained with Ground Truth in exactly the same way. In this case, the Ground Truth data isn’t images of cats and dogs but images of texts with accurate transcriptions. Using machine learning, the computer learns from the data which shapes in the image correspond to which characters in the transcription. Then, just like before, it takes this information and uses it to create a model.

Some text recognition platforms only allow you to use models that have been trained by the creators of the platform. What sets Transkribus apart is that it allows users to create their own models and train them to read a particular type of document. Because these custom models are trained on very specific Ground Truth data, they tend to be more accurate at transcribing documents similar to those in the Ground Truth dataset. This is ideal if you have very specific documents, such as letters by a small group of people, a handwritten diary, or notarial documents from a certain time period.
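For text, "more accurate" is usually expressed as a character error rate (CER): the number of single-character insertions, deletions and substitutions needed to turn the model's output into the Ground Truth transcription, divided by the length of the Ground Truth. The sketch below is a plain-Python illustration of that idea with an invented example line; it makes no claim about how Transkribus itself computes its figures.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def character_error_rate(recognised, ground_truth):
    """Edits needed to reach the Ground Truth, per Ground Truth character."""
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    return edit_distance(recognised, ground_truth) / len(ground_truth)


# Invented example: the model misread the final letter of an 8-character line.
print(character_error_rate("Dear Sit", "Dear Sir"))  # 1 edit / 8 characters = 0.125
```

A lower value is better: 0.0 means the model's output matches the Ground Truth exactly.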


Ground Truth is known as “Training data” within Transkribus. © Transkribus

How do I prepare Ground Truth for a Transkribus model?

As you might have guessed already, training your own custom model requires creating Ground Truth data to train it on. In most cases, you will need at least 10,000 words of transcribed handwritten text or 5,000 words of transcribed printed text to train your first model. However, this varies depending on the type of material and model.
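If you keep your transcriptions as plain text while you work, a rough word count can tell you how far you are from these starting points. The sketch below assumes one string per transcribed page and counts words by splitting on whitespace, which is only an approximation; the names `MIN_WORDS`, `count_words` and `words_still_needed` are hypothetical, and the thresholds are simply the guideline figures quoted above.

```python
# Guideline minimums from the paragraph above; they vary with material and model.
MIN_WORDS = {"handwritten": 10_000, "printed": 5_000}


def count_words(transcriptions):
    """Approximate word count: whitespace-separated tokens across all pages."""
    return sum(len(page.split()) for page in transcriptions)


def words_still_needed(transcriptions, material):
    """How many more words are needed to reach the guideline for this material."""
    return max(0, MIN_WORDS[material] - count_words(transcriptions))


# Invented sample pages.
pages = [
    "Dear Sir, I write to you today",
    "with news from Vienna",
]
print(count_words(pages))
print(words_still_needed(pages, "handwritten"))
```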

There are three main ways of obtaining suitable Ground Truth data for models in Transkribus:

  1. You can manually transcribe documents. This is done by uploading images of the documents to Transkribus and then manually typing out the transcription in the text editor. You can find out more about manual transcription in our Help Center.
  2. You can find pre-transcribed examples of texts. For example, if a colleague has already been using Transkribus to transcribe documents similar to the ones you are working on, they can share these directly with your Transkribus collection and you can then save them as Ground Truth.
  3. You can take a public model as a base. Transcribe all your documents with the most suitable public model and then correct those transcriptions to make them more accurate and re-save them as Ground Truth. This will tailor the model to your specific documents, and save a lot of time in the process.

Using a public model as a base for your custom model can save a lot of time and effort. © Transkribus

What is important to remember when preparing Ground Truth?

The success of your model depends on the quality of your Ground Truth data. And when it comes to quality, the most important aspects are accuracy and consistency.

Firstly, your transcriptions should be as accurate as possible. In machine learning, the model automatically assumes that the Ground Truth data it has been given is true. That means that if there are inaccuracies in your Ground Truth, the model will treat them as correct and reproduce them, which lowers the accuracy of every transcription the model goes on to produce.

Secondly, your transcription should be consistent. There are many different ways to transcribe linguistic features such as diacritics, ligatures, or S-characters. The way you choose to transcribe these features in your Ground Truth dictates the way that your model will go on to transcribe them. Therefore, it makes sense to consistently transcribe your Ground Truth documents in exactly the way you want the rest of your documents to be transcribed so that the model can learn your system and apply it to later documents.
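One way to spot inconsistencies before training is to scan your transcriptions for characters that can be written in more than one way. The sketch below is only an illustration under stated assumptions: it expects a dictionary of page names and plain-text transcriptions, and both helper names are hypothetical. It checks two things: which pages use a given special character (here the long s, "ſ"), and whether any page encodes accented letters as a base letter plus a separate combining mark instead of a single precomposed character.

```python
import unicodedata


def pages_using(pages, character):
    """Names of the pages whose transcription contains the given character."""
    return sorted(name for name, text in pages.items() if character in text)


def has_decomposed_characters(text):
    """True if the text changes under Unicode NFC normalisation, for example
    because 'é' was typed as 'e' followed by a combining acute accent."""
    return unicodedata.normalize("NFC", text) != text


# Invented sample: one transcriber kept the long s, the other modernised it.
pages = {
    "page_1": "Wa\u017f\u017fer",  # written with two long s characters
    "page_2": "Wasser",            # same word written with round s
    "page_3": "cafe\u0301",        # 'e' plus combining accent, not precomposed
}

print(pages_using(pages, "\u017f"))
print([name for name, text in pages.items() if has_decomposed_characters(text)])
```

A character appearing on some pages but not others is only a hint, since a page may simply not contain that letter; the report is a starting point for a manual check, not a verdict.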

You can find out more about linguistic features to pay particular attention to on this page.

Where can I find out more about preparing Ground Truth and training models in Transkribus?

Preparing Ground Truth and models in Transkribus is an extensive topic, and it is worth doing some research before embarking on your first model. Here are some resources where you can find out more about training text recognition models with Ground Truth data:

  • Our Help Center is a mine of information on all aspects of Transkribus. You can check out the section about training text recognition models here.
  • The FAQs on our website provide answers to the most commonly asked questions about models and training data.
  • We have also prepared the following video as a user-friendly guide to training models in Transkribus:
