type/token ratios (2024)


If a text is 1,000 words long, it is said to have 1,000 "tokens". But a lot of these words will be repeated, and there may be only, say, 400 different words in the text. "Types", therefore, are the different words.

The ratio of types to tokens in this example would be 400 / 1,000, or 40%.
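
As a minimal sketch of that arithmetic (not WordSmith's own code), the ratio can be computed in Python as below. The `tokenize` helper and its lowercase, regex-based idea of a "word" are assumptions made for illustration only.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9'-]+", text.lower())

def type_token_ratio(tokens):
    """Return types / tokens as a percentage; 0.0 for an empty list."""
    if not tokens:
        return 0.0
    return 100.0 * len(set(tokens)) / len(tokens)

# A text of 1,000 tokens drawn from only 400 different words.
tokens = [f"word{i % 400}" for i in range(1000)]
print(len(tokens), len(set(tokens)), type_token_ratio(tokens))
# prints: 1000 400 40.0
```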

But this type/token ratio (TTR) varies very widely in accordance with the length of the text -- or corpus of texts -- which is being studied. A 1,000-word article might have a TTR of 40%; a shorter one might reach 70%; 4 million words will probably give a type/token ratio of about 2%, and so on. Such type/token information is rather meaningless in most cases, though it is supplied in a WordList statistics display. The conventional TTR is informative, of course, if you're dealing with a corpus comprising lots of equal-sized text segments (e.g. the LOB and Brown corpora). But in the real world, especially if your research focus is the text as opposed to the language, you will probably be dealing with texts of different lengths and the conventional TTR will not help you much.
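
The length effect can be seen even on a toy token stream. The sketch below is illustrative only: the function name `ttr_at_lengths` and the five-word vocabulary are made up, and real texts fall off far less abruptly than this artificial example.

```python
def ttr_at_lengths(tokens, lengths):
    """Return {length: TTR percent} for each prefix of the token list."""
    result = {}
    for length in lengths:
        prefix = tokens[:length]
        if length == 0 or len(prefix) < length:
            continue  # skip prefixes longer than the text itself
        result[length] = 100.0 * len(set(prefix)) / length
    return result

# Toy token stream: a five-word vocabulary cycled over and over.
toy = ["cloud", "rain", "mist", "fog", "sun"] * 20
print(ttr_at_lengths(toy, [5, 10, 50, 100]))
# prints: {5: 100.0, 10: 50.0, 50: 10.0, 100: 5.0}
```

The vocabulary stops growing while the token count keeps rising, so the plain TTR can only fall as the text gets longer.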

WordList therefore uses a different strategy for computing this. The standardised type/token ratio (STTR) is computed every n words as WordList goes through each text file. By default, n = 1,000. In other words the ratio is calculated for the first 1,000 running words, then calculated afresh for the next 1,000, and so on to the end of your text or corpus. A running average is computed, which means that you get an average type/token ratio based on consecutive 1,000-word chunks of text. (Texts with fewer than n words, i.e. fewer than 1,000 by default, will get a standardised type/token ratio of 0.)
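
The chunk-by-chunk procedure can be sketched roughly as follows. This is an illustration, not WordSmith's actual implementation: the function name `standardised_ttr` is hypothetical, and dropping a final partial chunk is an assumption, since the description above does not say how leftover words at the end are handled.

```python
def standardised_ttr(tokens, n=1000):
    """Mean TTR (percent) over consecutive full n-token chunks.

    Assumption: a trailing chunk shorter than n is ignored.
    A text with fewer than n tokens gets 0.0.
    """
    full_chunks = len(tokens) // n
    if full_chunks == 0:
        return 0.0
    ratios = []
    for i in range(full_chunks):
        chunk = tokens[i * n:(i + 1) * n]
        # TTR is calculated afresh for each chunk.
        ratios.append(100.0 * len(set(chunk)) / n)
    return sum(ratios) / len(ratios)

# Two 4-token chunks: the first has 2 types (50%), the second 4 (100%).
demo = ["a", "b", "a", "b", "c", "d", "e", "f"]
print(standardised_ttr(demo, n=4))
# prints: 75.0
```

Because every chunk has the same length, the averaged figure does not shrink merely because the text is longer.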

Setting the n boundary

Adjust the n number in Minimum & Maximum Settings to any number between 100 and 20,000.

What STTR actually counts

Note: the ratio is computed (a) counting every different form as a word (so say and says are two types), (b) using only the words which are not in a stop-list, (c) using only those which are within the length you have specified, and (d) taking your preferences about numbers and hyphens into account.
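
A rough sketch of that kind of filtering, applied to a token list before any ratio is computed, might look like this. Every name and default here (`filter_tokens`, `stop_list`, `min_len`, `max_len`, `keep_numbers`, `split_hyphens`) is hypothetical and does not correspond to actual WordSmith settings.

```python
def filter_tokens(tokens, stop_list=frozenset(), min_len=1, max_len=50,
                  keep_numbers=False, split_hyphens=False):
    """Return the tokens that would count towards the ratio.

    No lemmatisation is done, so 'say' and 'says' stay separate types.
    """
    kept = []
    for tok in tokens:
        # Optionally treat a hyphen as a word separator.
        parts = tok.split("-") if split_hyphens else [tok]
        for part in parts:
            if not part:
                continue  # empty piece left by a leading or trailing hyphen
            if part in stop_list:
                continue
            if not (min_len <= len(part) <= max_len):
                continue
            if not keep_numbers and part.isdigit():
                continue
            kept.append(part)
    return kept

words = ["say", "says", "the", "2024", "well-known", "a"]
print(filter_tokens(words, stop_list={"the", "a"}))
# prints: ['say', 'says', 'well-known']
```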

The number shown is a percentage of new types for every n tokens. That way you can compare type/token ratios across texts of differing lengths. This method contrasts with that of Tuldava (1995:131-50) who relies on a notion of 3 stages of accumulation. The WordSmith method of computing STTR was my own invention but parallels one of the methods devised by the mathematician David Malvern working with Brian Richards (University of Reading).

Further discussion

TTR and STTR are both pretty crude measures, even if they are often assumed to imply something about "lexical density". Suppose you had a text which spent 1,000 words discussing ELEPHANT, LION, TIGER etc., then 1,000 discussing MADONNA, ELVIS etc., then 1,000 discussing CLOUD, RAIN, SUNSHINE. If you set the STTR boundary at 1,000 and happened to get say 48% or so for each section, the statistic in itself would not tell you there was a change involving Africa, Music, Weather. If the boundary between Africa and Music came at word 650 instead of at word 1,000, I guess there'd be little or no difference in the statistic. But what would make a difference? A text which discussed clouds and was written by a person who distinguished a lot between types of cloud might also use MIST, FOG, CUMULUS, CUMULO-NIMBUS. This would be higher in STTR than one written by a child who kept referring to CLOUD but used adjectives like HIGH, LOW, HEAVY, DARK, THIN, VERY THIN to describe the clouds, and who repeated DARK, THIN etc. a lot in describing them.

(NB. Shakespeare is well known to have used a rather limited vocabulary in terms of measures like these!)
