The FIDO2 standards contain some special requirements on the PIN. One constraint is that the PIN must be supplied as "... the UTF-8 representation of" the "Unicode characters in Normalization Form C". Another constraint is that the PIN must be a minimum length measured in "code points" (the standard declares, "This specification attempts to count code points as an approximation of Unicode characters"), and a maximum length measured in bytes (described further below).
What does that mean? How does one build such a PIN?
Unicode characters
First, let's look at "Unicode characters". The Unicode standard specifies a number for each character supported. For example, the number for cap-A is U+0041 or 0x000041. The number for the lower case Greek letter pi (π) is U+03C0. There is no logical limit to numbers, but currently the maximum Unicode number is 0x10FFFF (21 bits, which fits in 3 bytes).
Unfortunately, it is also possible to create "combinations". There is a "block" of Unicode numbers that are "combining diacritical marks", meaning that when they appear in an array of characters, software that can render Unicode will know to combine them with the previous character. For example, the Unicode for lower case e is U+0065, and the Unicode for "acute accent" is U+0301 (the acute accent is a small, diagonal line above a letter, sort of like a single quote or forward slash). Combine the two
char[] eWithAcute = new char[] { '\u0065', '\u0301' };
and the result is a lower case e with an acute accent: é.
There is also a Unicode number for an e with an acute accent: U+00E9. In other words, there are two ways to represent this letter in Unicode.
char[] eWithAcute = new char[] { '\u0065', '\u0301' };
char[] sameCharacter = new char[] { '\u00E9' };
Normalization
In order to use a PIN, there has to be one and only one way to encode the characters. Otherwise, someone could enter the correct PIN and, if the underlying platform encodes it differently than the original one, it would not authenticate. So the second element of the PIN is normalization. There is a standard that specifies how to "convert" most of the combinations into single numbers. For example, normalization can convert 0065 0301 into 00E9.
Hence if your PIN is normalized, then there is only one set of numbers to represent it. The standard specifies a number of ways to normalize, and FIDO2 has chosen the technique described as "Form C".
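As a small illustration of the point above, the following sketch compares the two representations of é before and after normalization. The class and method names are made up for this example, and it uses the string class only because the sample data is not sensitive.

```csharp
using System;
using System.Linq;
using System.Text;

public static class NormalizationDemo
{
    // Returns true when both sequences have the same Normalization Form C value.
    // Illustration only: the inputs are copied into strings, which is
    // acceptable here because they are not secrets.
    public static bool SameAfterFormC(char[] first, char[] second)
    {
        string a = new string(first).Normalize(NormalizationForm.FormC);
        string b = new string(second).Normalize(NormalizationForm.FormC);
        return string.Equals(a, b, StringComparison.Ordinal);
    }

    // Returns true when the raw char sequences are identical.
    public static bool SameRaw(char[] first, char[] second)
    {
        return first.SequenceEqual(second);
    }
}
```

With { '\u0065', '\u0301' } and { '\u00E9' } as inputs, SameRaw reports a mismatch while SameAfterFormC reports a match, which is exactly why an un-normalized PIN could fail to authenticate.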
UTF-8
Once the PIN has been normalized, it is in essence an array of Unicode numbers. It would be possible to specify that each character in the PIN be a 3-byte (big endian) number. It would also be possible to specify that only 16-bit characters be allowed in a PIN and encode it as an array of 2-byte values. However, the standard specifies encoding it as UTF-8. In this encoding scheme, many characters can be expressed as a single byte, rather than two or three. In addition, UTF-8 never produces a 00 byte for any character other than U+0000 itself. For example, cap-C is U+0043 and in UTF-8, it is 0x43. The letter pi is U+03C0, and is encoded in UTF-8 as 0xCF80. In this way, it is possible to save space by "eliminating" many of the 00 bytes.
Actually, the encoding scheme is efficient only in that it treats ASCII characters as single bytes. There are non-ASCII Unicode characters whose numbers fit in one byte (U+0080 to U+00FF) but are UTF-8 encoded as two bytes, some two-byte Unicode numbers that are encoded using three bytes, and three-byte Unicode numbers that are encoded in four bytes. However, because ASCII characters are the most-used characters, the efficiencies usually outweigh the inefficiencies.
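To see these encodings concretely, here is a minimal sketch that prints the UTF-8 bytes of a few characters as hex. The class and method names are illustrative only.

```csharp
using System;
using System.Text;

public static class Utf8Demo
{
    // Encodes the chars as UTF-8 and returns the bytes as
    // hyphen-separated upper case hex, e.g. "CF-80".
    public static string ToUtf8Hex(char[] chars)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(chars);
        return BitConverter.ToString(utf8);
    }
}
```

Calling ToUtf8Hex with cap-C (U+0043) gives "43", with pi (U+03C0) gives "CF-80", with é (U+00E9) gives "C3-A9", and with the euro sign (U+20AC) gives "E2-82-AC". The last two show a one-byte number growing to two bytes and a two-byte number growing to three.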
C# and Unicode
Your PIN collection code will likely include some code that does something like this.
while (someCheck)
{
    ConsoleKeyInfo currentKeyInfo = Console.ReadKey();
    if (currentKeyInfo.Key == ConsoleKey.Enter)
    {
        break;
    }
    inputData = AppendChar(currentKeyInfo.KeyChar, inputData, ref dataLength);
}
You read each character in the PIN as a char and append it to a char[]. You could use the string class, but Microsoft recommends not using the string class to hold sensitive data. This is because:
System.String instances are immutable, operations that appear to modify an existing instance actually create a copy of it to manipulate. Consequently, if a String object contains sensitive information such as a password, credit card number, or personal data, there is a risk the information could be revealed after it is used because your application cannot delete the data from computer memory.
By reading each PIN character as a char, you are limiting the characters you support to those that can be represented as a 16-bit number in the Unicode space. You would not support U+10000 to U+10FFFF. This will almost certainly be no problem, because these numbers almost exclusively represent emojis and other figures (e.g. U+1F994 is a hedgehog: 🦔), along with rare alphabets (e.g. U+14400 to U+14646 are for Anatolian hieroglyphs).
You now have a char array to represent the PIN.
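The collection loop above calls an AppendChar method without showing it. One possible shape for such a helper is sketched below. The signature is inferred from the call site, and the growth policy and the PinBuffer class name are assumptions made for this example. Because the buffer is a char[], the old copy can be overwritten when the buffer grows, which is the advantage over string.

```csharp
using System;

public static class PinBuffer
{
    // Hypothetical helper matching the AppendChar call in the loop above.
    // Appends value at index length, growing the buffer when it is full.
    // Assumes 0 <= length <= buffer.Length.
    public static char[] AppendChar(char value, char[] buffer, ref int length)
    {
        char[] target = buffer;
        if (length >= buffer.Length)
        {
            // Grow, copy what has been collected so far, then overwrite
            // the old buffer so no stray copy of the PIN is left behind.
            target = new char[Math.Max(8, buffer.Length * 2)];
            Array.Copy(buffer, target, length);
            Array.Clear(buffer, 0, buffer.Length);
        }

        target[length] = value;
        length++;
        return target;
    }
}
```

The caller keeps using the returned array and the updated length, and should clear that array as well once the PIN is no longer needed.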
C# and Normalization
At this point, you need to normalize. For example, suppose that someone has a German keyboard and originally set a FIDO2 PIN that included a lower case u with an umlaut (ü). That keyboard represented the character as U+00FC. But now this person is using a keyboard that has no umlaut so uses the keystrokes Option-U followed by u. Maybe the platform reads it as U+00FC, but maybe it reads it as U+0075, U+0308.
If the char array is normalized, U+00FC will stay U+00FC, but U+0075, U+0308 will be converted to U+00FC.
How does one normalize in C#? Unfortunately there are no good solutions. Here are three possibilities: ignore the problem and assume no one will use a PIN that really needs normalization, write your own normalization code (or obtain something from a vendor), or use the String.Normalize method which would store the PIN in a new immutable string instance.
Assume PINs will not need normalization
This might not be unsafe. While it is possible to have a PIN that when entered is not the same as the normalized version, it is not likely.
First of all, a PIN that consists of only ASCII characters is normalized. Second, most people will choose a PIN that does not contain unusual characters. And third, there is a good chance that the keyboard or PIN-reading software will return the normalized version of a character even if some other form is possible.
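If you take this route, you may still want to know when the assumption holds. Since an ASCII-only PIN is already normalized, a check like the following sketch can tell you whether a collected PIN falls into that safe category without ever copying it into a string. The PinChecks class and IsAllAscii method are names invented for this example.

```csharp
using System;

public static class PinChecks
{
    // Returns true when every char is in the ASCII range (U+0000 to U+007F).
    // An ASCII-only PIN is already in Normalization Form C.
    public static bool IsAllAscii(ReadOnlySpan<char> pin)
    {
        foreach (char c in pin)
        {
            if (c > '\u007F')
            {
                return false;
            }
        }

        return true;
    }
}
```

A char[] converts implicitly to ReadOnlySpan<char>, so the collected array can be passed directly. What to do when the check fails (reject the PIN, or fall back to one of the other approaches) is an application decision.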
Write your own normalization code
To do so, you will likely reference the Unicode standard along with the Normalization Annex to develop some class that can read a char array and convert those values to the normalized form C. For example, your program might read all the characters and determine if there are any characters from the "combining diacritical marks" block. If so, combine them with the appropriate prior character and map to the normalized value.
Alternatively, you might want to use some Open Source normalization code or find some other vendor with some module that can perform the appropriate operations.
char[] pinChars = CollectPin();
char[] normalizedPinChars = PerformNormalization(pinChars);
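The first step described above, detecting characters from the "combining diacritical marks" block, might look something like this sketch. It assumes the block range U+0300 to U+036F and uses invented names. It is only a detection step, not a normalizer: other blocks also contain combining characters, and some single characters change under Form C as well (for example, U+212B, the Angstrom sign, becomes U+00C5), so a complete implementation needs the mapping data from the Unicode standard.

```csharp
using System;

public static class CombiningMarkCheck
{
    // Returns true when any char falls in the "Combining Diacritical Marks"
    // block (U+0300 to U+036F). A hit means the PIN may not be normalized.
    // A miss does NOT prove the PIN is in Form C.
    public static bool ContainsCombiningDiacritic(ReadOnlySpan<char> pin)
    {
        foreach (char c in pin)
        {
            if (c >= '\u0300' && c <= '\u036F')
            {
                return true;
            }
        }

        return false;
    }
}
```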
Normalization using the string class
As we saw above, holding sensitive data in a string carries some risk. Whether or not this is an acceptable risk for your application is something that you will need to determine. If your application's risk profile would allow the use of the string class, here's what you can do.
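char[] pinChars = CollectPin();
char[] normalizedPinChars = PerformNormalization(pinChars);
 . . .
public char[] PerformNormalization(char[] pinChars)
{
    // Both strings below are immutable copies of the PIN that cannot be cleared.
    string pinAsString = new string(pinChars);
    // String.Normalize() with no argument uses Normalization Form C.
    string normalizedPin = pinAsString.Normalize();
    return normalizedPin.ToCharArray();
}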
C# and UTF-8
Once you have an array of characters, you can convert that into UTF-8 using the C# Encoding class.
byte[] utf8Pin = Encoding.UTF8.GetBytes(normalizedPinChars);
This byte array is what you pass to the SetPinCommand.
If you are using the string class to normalize, your code could look something like this.
char[] pinChars = CollectPin();
string pinAsString = new string(pinChars);
string normalizedPin = pinAsString.Normalize();
byte[] utf8Pin = Encoding.UTF8.GetBytes(normalizedPin);
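If you keep the PIN in arrays rather than strings, you can overwrite those arrays once they are no longer needed. The sketch below shows one way to do that around the UTF-8 conversion. The PinEncoding class and its method names are invented for this example, while Encoding.UTF8.GetBytes, Array.Clear, and CryptographicOperations.ZeroMemory are standard .NET calls.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class PinEncoding
{
    // Converts the normalized PIN to UTF-8, then overwrites the char array
    // whether or not the conversion succeeded.
    public static byte[] ToUtf8AndClear(char[] normalizedPinChars)
    {
        try
        {
            return Encoding.UTF8.GetBytes(normalizedPinChars);
        }
        finally
        {
            Array.Clear(normalizedPinChars, 0, normalizedPinChars.Length);
        }
    }

    // Overwrites the UTF-8 PIN bytes. Call this once the command that
    // needed the PIN has been sent.
    public static void ClearUtf8(byte[] utf8Pin)
    {
        CryptographicOperations.ZeroMemory(utf8Pin);
    }
}
```

This does not help with any string copies created during normalization, which is the trade-off discussed earlier.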
Length restrictions
The standard specifies that a PIN must be at least four code points. Remember, the standard declares, "This specification attempts to count code points as an approximation of Unicode characters".
The standard also specifies that a PIN can be no more than 63 bytes. That means after the PIN has been converted to "... the UTF-8 representation of" the "Unicode characters in Normalization Form C", it is a byte array. That byte array's length must be less than or equal to 63.
It is possible a YubiKey can be manufactured with a longer minimum length (that is allowed by the standard), and it is possible on some YubiKeys to programmatically increase the minimum length. You can find the minimum PIN length on any YubiKey in the AuthenticatorInfo's MinimumPinLength property.
The standard does not allow increasing or decreasing the maximum PIN length.
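Putting the two limits together, a pre-check before sending the PIN could look like the following sketch. The class and method names are invented for this example, and it assumes the input has already been normalized. The minimum is a parameter because a particular YubiKey may require more than four code points.

```csharp
using System;
using System.Text;

public static class PinLengthCheck
{
    private const int MaxUtf8Bytes = 63;

    // Returns true when the normalized PIN has at least minCodePoints code
    // points and its UTF-8 encoding is no longer than 63 bytes.
    public static bool IsValidLength(char[] normalizedPinChars, int minCodePoints = 4)
    {
        int codePoints = 0;
        for (int i = 0; i < normalizedPinChars.Length; i++)
        {
            // A high surrogate followed by a low surrogate is one code point.
            if (char.IsHighSurrogate(normalizedPinChars[i])
                && i + 1 < normalizedPinChars.Length
                && char.IsLowSurrogate(normalizedPinChars[i + 1]))
            {
                i++;
            }

            codePoints++;
        }

        int byteCount = Encoding.UTF8.GetByteCount(normalizedPinChars);
        return codePoints >= minCodePoints && byteCount <= MaxUtf8Bytes;
    }
}
```

Note that the two limits use different units: 63 ASCII characters fit, but 32 copies of pi (two bytes each in UTF-8) do not.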