Binary Files
In a sense, all files are "binary" in that they are just a collection of bytes stored in an operating system construct called a file. However, when we talk about binary files, we are really referring to the way VB opens and processes the file.
The other file types (sequential and random) have a definitive structure, and there are mechanisms built into the language to read and write these files based on that structure. For example, the Input # statement reads a sequential comma-delimited file field-by-field, the Line Input statement reads a sequential file line by line, etc.
On the other hand, it is necessary to process a file in binary mode when that file does not have a simple line-based or record-based structure. For example, an Excel "xls" file contains a series of complex data structures to manage worksheets, formulas, charts, etc. If you really wanted to process an "xls" file at a very low level, you could open the file in binary mode and move to certain byte locations within the file to access data contained in the various internal data structures.
Fortunately, in the case of Excel, Microsoft provides us with the Excel object model, which makes it a relatively simple matter to process xls files in VB applications. But the concept should be clear: to process a file that does not contain simple line-oriented or record-oriented data, the binary mode needs to be used and you must traverse or parse through the file to get at the data that you need.
The Open Statement
We have seen partial syntax for the Open statement in the first topic on sequential files. The full syntax for the Open statement, taken from MSDN, is:
Open pathname For mode [Access access] [lock] As [#]filenumber [Len=reclength]
The Open statement syntax has these parts:
Part | Description |
pathname | Required. String expression that specifies a file name — may include directory or folder, and drive. |
mode | Required. Keyword specifying the file mode: Append, Binary, Input, Output, or Random. If unspecified, the file is opened for Random access. |
access | Optional. Keyword specifying the operations permitted on the open file: Read, Write, or Read Write. |
lock | Optional. Keyword specifying the operations restricted on the open file by other processes: Shared, Lock Read, Lock Write, and Lock Read Write. |
filenumber | Required. A valid file number in the range 1 to 511, inclusive. Use the FreeFile function to obtain the next available file number. |
reclength | Optional. Number less than or equal to 32,767 (bytes). For files opened for random access, this value is the record length. For sequential files, this value is the number of characters buffered. |
Remarks
You must open a file before any I/O operation can be performed on it. Open allocates a buffer for I/O to the file and determines the mode of access to use with the buffer.
If the file specified by pathname doesn't exist, it is created when a file is opened for Append, Binary, Output, or Random modes.
If the file is already opened by another process and the specified type of access is not allowed, the Open operation fails and an error occurs.
The Len clause is ignored if mode is Binary.
Important: In Binary, Input, and Random modes, you can open a file using a different file number without first closing the file. In Append and Output modes, you must close a file before opening it with a different file number.
(End of MSDN definition)
Given the information above, we would not use the optional Len clause when opening a file in binary mode, as it does not apply. In the sample programs to follow, the optional lock entry is not used either.
Thus, in the sample programs to follow, the following syntax will be used to open a binary file for input:
Open filename For Binary Access Read As #filenumber
and to open a binary file for output:
Open filename For Binary Access Write As #filenumber
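As a minimal sketch of the input form of this syntax, the helper below wraps the Open statement together with FreeFile and a Dir$ existence check. The function name OpenBinaryForRead and the convention of returning 0 for a missing file are assumptions made for illustration; they are not part of the sample programs.

```vb
' Hypothetical helper (not part of the original samples): opens an existing
' file for binary reading and returns its file number, or 0 if no file
' matches the path.
Public Function OpenBinaryForRead(ByVal strPath As String) As Integer
    Dim intFileNbr As Integer

    ' Treat an empty path as "not found".
    If Len(strPath) = 0 Then Exit Function

    ' Dir$ returns an empty string when no file matches the path.
    If Len(Dir$(strPath)) = 0 Then Exit Function

    ' FreeFile supplies a file number that is not already in use.
    intFileNbr = FreeFile
    Open strPath For Binary Access Read As #intFileNbr

    OpenBinaryForRead = intFileNbr
End Function
```

A caller would test the result for 0 before reading, and would remain responsible for issuing Close on the returned file number when finished.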
The Get Statement
The Get statement is used to read data from a file opened in binary mode. The syntax, as it applies to binary files, is:
Get [#]filenumber, [byte position], varname
The filenumber is any valid filenumber as defined above.
Byte position is the byte position within the file at which the reading begins. The byte position is "one-based", meaning the first byte position in the file is 1, the second position is 2, and so on. You can omit this entry, in which case reading begins at the byte following the last Get or Put statement. If you omit the byte position entry, you must still include the delimiting commas in the Get statement, for example:
Get #intMyFile, , strData
Varname is a string variable into which the data will be read. This string variable is often referred to as a "buffer" when processing binary files. It is important to note that the length, or size, of this string variable determines how many bytes of data from the file will be read. Thus, it is necessary to set the length of the string variable prior to issuing the Get statement. This is commonly done by using the String$ function to pad the string variable with a number of blank spaces equal to the number of bytes you want to read at a given time.
For example, the following statement pads the string variable strData with 10,000 blank spaces:
strData = String$(10000, " ")
Now that VB "knows" how big "strData" is, the following Get statement will read the first (or next) 10,000 bytes from file number "intMyFile" and overlay strData with that file data:
Get #intMyFile, , strData
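The byte position entry can also be supplied explicitly. The sketch below reads a given number of bytes starting at a given one-based position, sizing the buffer with String$ first. The function name ReadBytesAt and its clamping behavior are illustrative assumptions, and the file is assumed to exist. The LOF function, which returns the length of an open file in bytes, is used to avoid asking for data past the end of the file.

```vb
' Hypothetical helper: returns up to lngCount bytes of a file, starting at
' the one-based byte position lngStart. Assumes the file exists.
Public Function ReadBytesAt(ByVal strPath As String, _
                            ByVal lngStart As Long, _
                            ByVal lngCount As Long) As String
    Dim intFileNbr As Integer
    Dim strData As String
    Dim lngAvailable As Long

    intFileNbr = FreeFile
    Open strPath For Binary Access Read As #intFileNbr

    ' Number of bytes from lngStart through the end of the file.
    lngAvailable = LOF(intFileNbr) - lngStart + 1

    If lngStart >= 1 And lngAvailable > 0 And lngCount > 0 Then
        ' Clamp the request so the buffer never extends past the end of file.
        If lngCount > lngAvailable Then lngCount = lngAvailable

        ' The size of the buffer tells Get how many bytes to read.
        strData = String$(lngCount, " ")
        Get #intFileNbr, lngStart, strData
    End If

    Close #intFileNbr
    ReadBytesAt = strData
End Function
```

For a file containing the ten characters ABCDEFGHIJ, a call such as ReadBytesAt(strPath, 3, 4) would return "CDEF".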
Depending on the application, it is sometimes necessary to process the file in "chunks". Recall that you can omit the "byte position" entry, in which case VB will "keep track" of where it is in the file. For example, the first time the above Get statement is executed, bytes 1 through 10000 will be read; the second time the above Get statement is executed, bytes 10001 through 20000 will be read; and so on.
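The arithmetic behind chunked reading can be sketched with two small functions: one gives the number of Get calls needed for a file of a given size, the other gives the starting byte position of a given chunk. The names ChunkCount and ChunkStart are hypothetical, and the sketch assumes a positive chunk size and a file size small enough that the addition does not overflow a Long.

```vb
' Hypothetical helper: number of chunks needed to cover lngFileSize bytes.
' Uses integer division (\) rounded up. Assumes lngChunkSize > 0.
Public Function ChunkCount(ByVal lngFileSize As Long, _
                           ByVal lngChunkSize As Long) As Long
    ChunkCount = (lngFileSize + lngChunkSize - 1) \ lngChunkSize
End Function

' Hypothetical helper: one-based byte position at which chunk number
' lngChunkNbr begins (chunk numbers start at 1).
Public Function ChunkStart(ByVal lngChunkNbr As Long, _
                           ByVal lngChunkSize As Long) As Long
    ChunkStart = (lngChunkNbr - 1) * lngChunkSize + 1
End Function
```

With a chunk size of 10,000, ChunkStart(2, 10000) gives 10001, matching the byte ranges described above, and a file of 60,001 bytes needs ChunkCount(60001, 10000) = 7 reads.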
Because a VB string variable can in theory hold roughly 2 billion characters (available memory is the practical limit), it is reasonable in most cases to read in the whole file in "one shot", as opposed to reading it in "chunks" as described above. To do this, you can set the length of the "buffer" string variable to the size of the file using the LOF (length of file) function as the first argument of the String$ function. The LOF function takes the filenumber of the file to be processed as its argument, and returns the length of the file in bytes. Thus, the following statement will fill the variable "strData" with a number of blank spaces equal to the size of the file:
strData = String$(LOF(intMyFile), " ")
Then, when the subsequent Get statement is executed, the entire contents of the file will be stored in strData:
Get #intMyFile, , strData
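Put together, the "one shot" technique might look like the sketch below. The function name ReadWholeFile is an assumption for illustration, the file is assumed to exist, and the check for an empty file is a precaution added here rather than something the sample programs do.

```vb
' Hypothetical helper: returns the entire contents of a file as a string.
' Assumes the file exists.
Public Function ReadWholeFile(ByVal strPath As String) As String
    Dim intFileNbr As Integer
    Dim strData As String

    intFileNbr = FreeFile
    Open strPath For Binary Access Read As #intFileNbr

    ' Skip the read for an empty file; otherwise size the buffer to the
    ' length of the file so that a single Get reads everything.
    If LOF(intFileNbr) > 0 Then
        strData = String$(LOF(intFileNbr), " ")
        Get #intFileNbr, , strData
    End If

    Close #intFileNbr
    ReadWholeFile = strData
End Function
```

Wrapping the Open, Get, and Close in one function keeps the file number local, so the caller only deals with the resulting string.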
The Input Function
The Input function (not to be confused with the Input # or Line Input statements) can be used as an alternative to the Get statement. The syntax is:
varname = Input(number, [#] filenumber)
where varname is the string variable into which the file data will be stored, number is the number of characters to be read, and filenumber is a valid filenumber identifying the file from which you want to read.
The following table contains examples that contrast the Get statement and Input function as ways of reading data from a binary file:
String Setup and Get Statement | Input Function |
strData = String$(10000, " ") Get #intMyFile, , strData | strData = Input(10000, #intMyFile) |
strData = String$(LOF(intMyFile), " ") Get #intMyFile, , strData | strData = Input(LOF(intMyFile), #intMyFile) |
The Put Statement
The Put statement is used to write data to a file opened in binary mode. The syntax, as it applies to binary files, is:
Put [#]filenumber, [byte position], varname
The filenumber is any valid filenumber as defined above.
Byte position is the byte position within the file at which the writing begins. The byte position is "one-based", meaning the first byte position in the file is 1, the second position is 2, and so on. You can omit this entry, in which case writing begins at the byte following the last Get or Put statement. If you omit the byte position entry, you must still include the delimiting commas in the Put statement, for example:
Put #intMyFile, , strData
Varname is a string variable from which the data will be written. This string variable is often referred to as a "buffer" when processing binary files. It is important to note that the length, or size, of this string variable determines how many bytes of data will be written to the file.
For example, the following statements cause 1 byte of data to be written to file number "intMyFile":
strCharacter = Mid$(strData, lngCurrentPos, 1)
Put #intMyFile, , strCharacter
Recall that you can omit the "byte position" entry, in which case VB will "keep track" of where it is in the file. For example, the first time the above Put statement is executed, byte 1 will be written; the second time the above Put statement is executed, byte 2 will be written; and so on.
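Put is not limited to one byte at a time: because the length of the string determines how much is written, a whole string can be written with a single Put. The sketch below is one way to do that. The name WriteWholeFile is hypothetical. The sketch deletes any existing file first, on the understanding that opening an existing file For Binary does not discard its old contents, so leftover bytes would otherwise remain after a shorter write.

```vb
' Hypothetical helper: writes strData to a file, replacing any existing file.
Public Sub WriteWholeFile(ByVal strPath As String, ByVal strData As String)
    Dim intFileNbr As Integer

    ' Remove any old copy so that no stale bytes remain past the new data.
    If Len(Dir$(strPath)) > 0 Then Kill strPath

    intFileNbr = FreeFile
    Open strPath For Binary Access Write As #intFileNbr

    ' The length of strData determines how many bytes Put writes.
    If Len(strData) > 0 Then Put #intFileNbr, , strData

    Close #intFileNbr
End Sub
```

Writing "Hello" and then "Hi" to the same path with this routine leaves a 2-byte file, since the first version is deleted before the second is written.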
Sample Programs
Three sample "Try It" programs will now be presented, using the statements and functions described above. All three read in the same input file and write out the same output file; the difference is in how the input file is read. The first sample program uses the Get statement to process the file in "chunks", the second uses the Get statement to process the file all at once, and the third uses the Input function to process the file all at once.
The job of the sample programs is to read in an HTML file, strip out all tags (i.e., everything between the "less than" and "greater than" angle brackets as well as the brackets themselves), and write out the remaining text.
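The tag-stripping rule itself can be sketched independently of any file handling. The function below applies the same "inside a tag or not" flag that the sample programs use, but to a string in memory, returning the stripped text instead of writing it out with Put. The name StripTags and the pre-sized output buffer are choices made for this illustration only.

```vb
' Hypothetical helper: returns strHTML with everything between "<" and ">"
' (and the brackets themselves) removed.
Public Function StripTags(ByVal strHTML As String) As String
    Dim strOut As String
    Dim strCurrentChar As String
    Dim blnTagPending As Boolean
    Dim lngX As Long
    Dim lngOutLen As Long

    ' The result can never be longer than the input, so size the output once.
    strOut = Space$(Len(strHTML))

    For lngX = 1 To Len(strHTML)
        strCurrentChar = Mid$(strHTML, lngX, 1)
        Select Case strCurrentChar
            Case "<"
                blnTagPending = True
            Case ">"
                blnTagPending = False
            Case Else
                If Not blnTagPending Then
                    ' Outside of a tag: copy the character to the output.
                    lngOutLen = lngOutLen + 1
                    Mid$(strOut, lngOutLen, 1) = strCurrentChar
                End If
        End Select
    Next lngX

    ' Trim the unused portion of the pre-sized buffer.
    StripTags = Left$(strOut, lngOutLen)
End Function
```

For example, StripTags("<p>Hi <b>there</b></p>") returns "Hi there". Note that this simple rule treats every "<" as the start of a tag, so it is not a general HTML parser.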
The figure below shows excerpts of both the HTML input file and the plain text output file. In the HTML excerpt on the left, the text that was extracted out (i.e., the "non-tag" data) is shown in bold for greater clarity.
HTML Input File (excerpt) | Plain Text Output File (excerpt) |
<html> <head> <meta http-equiv=Content-Type content="text/html; charset=windows-1252"> <meta name=Generator content="Microsoft Word 10 (filtered)"> <title>Working with Files</title> <style> . . . <p class=MsoNormal align=center style='text-align:center'><b><span style='font-size:12.0pt;font-family:Arial'>Working with Files – Part 1</span></b></p> <p class=MsoNormal align=center style='text-align:center'><b><span style='font-size:12.0pt;font-family:Arial'>Sequential File Processing Statements and Functions</span></b></p> <p class=MsoNormal align=center style='text-align:center'><b><span style='font-size:12.0pt;font-family:Arial'>Processing a Comma-Delimited File</span></b></p> <p class=MsoNormal align=center style='text-align:center'><span style='font-size:12.0pt;font-family:Arial'> </span></p> <p class=MsoNormal><span style='font-size:12.0pt;font-family:Arial'>Visual Basic provides the capability of processing three types of files:</span></p> <p class=MsoNormal><span style='font-size:12.0pt;font-family:Arial'> </span></p> <p class=MsoNormal style='margin-left:2.0in;text-indent:-1.5in'><b><span style='font-size:12.0pt;font-family:Arial'>sequential files </span></b><span style='font-size:12.0pt;font-family:Arial'>Files that must be read in the same order in which they were written – one after the other with no skipping around</span></p> <p class=MsoNormal style='margin-left:2.0in;text-indent:-1.5in'><b><span style='font-size:12.0pt;font-family:Arial'> </span></b></p> <p class=MsoNormal style='margin-left:2.0in;text-indent:-1.5in'><b><span style='font-size:12.0pt;font-family:Arial'>binary files </span></b><span style='font-size:12.0pt;font-family:Arial'>"unstructured" files which are read from or written to as series of bytes, where it is up to the programmer to specify the format of the file</span></p> <p class=MsoNormal style='margin-left:.5in'><span style='font-size:12.0pt; font-family:Arial'> </span></p> <p class=MsoNormal 
style='margin-left:1.0in;text-indent:-.5in'><b><span style='font-size:12.0pt;font-family:Arial'>random files </span></b><span style='font-size:12.0pt;font-family:Arial'>files which support "direct access" by record number</span></p> . . . | Working with Files Working with Files – Part 1 Sequential File Processing Statements and Functions Processing a Comma-Delimited File Visual Basic provides the capability of processing three types of files: sequential files Files that must be read in the same order in which they were written – one after the other with no skipping around binary files "unstructured" files which are read from or written to as series of bytes, where it is up to the programmer to specify the format of the file random files files which support "direct access" by record number These three file types are "native" to Visual Basic and its predecessors (QBasic, GW-BASIC, etc.). The next several topics address VB's sequential file processing capabilities. Binary and Random files will be covered in later topics. The following sequential file-related statements and functions will be discussed: Open Prepares a file to be processed by the VB program. App.Path Supplies the path of your application FreeFile Supplies a file number that is not already in use Input # Reads fields from a comma-delimited sequential file . . . |
Note: The sample programs use the Dir$ function and the Kill statement for the purpose of deleting the output file if it exists, prior to creating it anew. Dir$ and Kill are covered in the later topic of "File System Commands and Functions".
Sample Program 1 – Using the Get Statement to Read a Binary File In "Chunks"
The first sample program uses the technique of reading and processing a binary file one "chunk" at a time (in this case 10,000 bytes at a time) using the Get statement. Since the file size is a little over 60,000 bytes, you will see that it took seven passes to read through the file. The code listed below is heavily commented to aid in the understanding of how the program works.
"Try It" Code:
Private Sub cmdTryIt_Click()
Dim strHTMFileName As String
Dim strTextFileName As String
Dim strBackSlash As String
Dim intHTMFileNbr As Integer
Dim intTextFileNbr As Integer
Dim strBuffer As String
Dim strCurrentChar As String * 1
Dim blnTagPending As Boolean
Dim lngX As Long
Dim lngBytesRemaining As Long
Dim lngCurrentBufferSize As Long
Const lngMAX_BUFFER_SIZE As Long = 10000
' Prepare the file names ...
strBackSlash = IIf(Right$(App.Path, 1) = "\", "", "\")
strHTMFileName = App.Path & strBackSlash & "Files_Lesson1.htm"
strTextFileName = App.Path & strBackSlash & "TestOut.txt"
Print "Opening files ..."
' Open the input file ...
intHTMFileNbr = FreeFile
Open strHTMFileName For Binary Access Read As #intHTMFileNbr
' If the file we want to open for output already exists, delete it ...
If Dir$(strTextFileName) <> "" Then
Kill strTextFileName
End If
' Open the output file ...
intTextFileNbr = FreeFile
Open strTextFileName For Binary Access Write As #intTextFileNbr
' Initialize the "bytes remaining" variable to the length of the input file ...
lngBytesRemaining = LOF(intHTMFileNbr)
' Set up a loop which will process the file in "chunks" of 10,000 bytes at a time.
' We will keep track of how many bytes we have remaining to process, and
' the loop will continue as long as there are bytes remaining.
Do While lngBytesRemaining > 0
Print "Processing 'chunk' ..."
' Note: The "buffer" is simply a string variable into which the "current
' chunk" of the file will be read.
' Set the current buffer size to the maximum size (10,000) as
' long as there are at least 10,000 bytes remaining. If there are fewer (as
' there would be the last time through the loop), set the buffer size
' equal to the number of bytes remaining.
If lngBytesRemaining >= lngMAX_BUFFER_SIZE Then
lngCurrentBufferSize = lngMAX_BUFFER_SIZE
Else
lngCurrentBufferSize = lngBytesRemaining
End If
' Because the Get statement relies on the size of the string variable (the
' "buffer") into which the data will be read to know how many bytes to read
' from the file, we fill the buffer string variable with a number of blank
' spaces - where the number of blank spaces was determined in the statement
' above.
strBuffer = String$(lngCurrentBufferSize, " ")
' The Get statement now reads the next chunk of data from the input file
' and stores it in the strBuffer variable.
Get #intHTMFileNbr, , strBuffer
' The For loop below now processes the current chunk of data character by
' character, writing out only the characters that are NOT enclosed in the
' HTML tags (i.e., it is skipping every character between a pair of angle
' brackets "<" and ">") ...
For lngX = 1 To lngCurrentBufferSize
strCurrentChar = Mid$(strBuffer, lngX, 1)
Select Case strCurrentChar
Case "<"
blnTagPending = True
Case ">"
blnTagPending = False
Case Else
If Not blnTagPending Then
' The current character is outside of the tag brackets, so
' write it out ...
Put #intTextFileNbr, , strCurrentChar
End If
End Select
Next
' Adjust the "bytes remaining" variable by subtracting the current buffer size
' from it ...
lngBytesRemaining = lngBytesRemaining - lngCurrentBufferSize
Loop
Print "Closing files ..."
' Close the input and output files ...
Close #intHTMFileNbr
Close #intTextFileNbr
Print "Done."
End Sub
After the cmdTryIt_Click event procedure has run, the form should look like the screenshot below, and the output plain-text file should be present in the project directory.
Download the VB project code for the example above here.
Sample Program 2 – Using the Get Statement to Read a Binary File All At Once
The second sample program uses the technique of reading and processing a binary file all at once, using the Get statement in conjunction with the LOF function. The code listed below is heavily commented to aid in the understanding of how the program works.
"Try It" Code:
Private Sub cmdTryIt_Click()
Dim strHTMFileName As String
Dim strTextFileName As String
Dim strBackSlash As String
Dim intHTMFileNbr As Integer
Dim intTextFileNbr As Integer
Dim strBuffer As String
Dim strCurrentChar As String * 1
Dim lngX As Long
Dim blnTagPending As Boolean
' Prepare the file names ...
strBackSlash = IIf(Right$(App.Path, 1) = "\", "", "\")
strHTMFileName = App.Path & strBackSlash & "Files_Lesson1.htm"
strTextFileName = App.Path & strBackSlash & "TestOut.txt"
Print "Opening files ..."
' Open the input file ...
intHTMFileNbr = FreeFile
Open strHTMFileName For Binary Access Read As #intHTMFileNbr
' If the file we want to open for output already exists, delete it ...
If Dir$(strTextFileName) <> "" Then
Kill strTextFileName
End If
' Open the output file ...
intTextFileNbr = FreeFile
Open strTextFileName For Binary Access Write As #intTextFileNbr
Print "Reading input file ..."
' Note: The "buffer" is simply a string variable into which the contents
' of the file will be read.
' Because the Get statement relies on the size of the string variable (the
' "buffer") into which the data will be read to know how many bytes to read
' from the file, we fill the buffer string variable with a number of blank
' spaces - where the number of blank spaces is equal to the size of the
' entire file (as determined by the LOF function) ...
strBuffer = String$(LOF(intHTMFileNbr), " ")
' The Get statement now reads the entire contents of the input file
' and stores it in the strBuffer variable.
Get #intHTMFileNbr, , strBuffer
Print "Generating output file ..."
' The For loop below now processes the contents of the file character by
' character, writing out only the characters that are NOT enclosed in the
' HTML tags (i.e., it is skipping every character between a pair of angle
' brackets "<" and ">") ...
For lngX = 1 To Len(strBuffer)
strCurrentChar = Mid$(strBuffer, lngX, 1)
Select Case strCurrentChar
Case "<"
blnTagPending = True
Case ">"
blnTagPending = False
Case Else
If Not blnTagPending Then
' The current character is outside of the tags, so write it out ...
Put #intTextFileNbr, , strCurrentChar
End If
End Select
Next
Print "Closing files ..."
' Close the input and output files ...
Close #intHTMFileNbr
Close #intTextFileNbr
Print "Done."
End Sub
After the cmdTryIt_Click event procedure has run, the form should look like the screen shot below, and the output plain-text file should be present in the project directory.
Download the VB project code for the example above here.
Sample Program 3 – Using the Input Function to Read a Binary File All At Once
The third sample program uses the technique of reading and processing a binary file all at once, using the Input function in conjunction with the LOF function. The code listed below is heavily commented to aid in the understanding of how the program works.
"Try It" Code:
Private Sub cmdTryIt_Click()
Dim strHTMFileName As String
Dim strTextFileName As String
Dim strBackSlash As String
Dim intHTMFileNbr As Integer
Dim intTextFileNbr As Integer
Dim strBuffer As String
Dim strCurrentChar As String * 1
Dim lngX As Long
Dim blnTagPending As Boolean
' Prepare the file names ...
strBackSlash = IIf(Right$(App.Path, 1) = "\", "", "\")
strHTMFileName = App.Path & strBackSlash & "Files_Lesson1.htm"
strTextFileName = App.Path & strBackSlash & "TestOut.txt"
Print "Opening files ..."
' Open the input file ...
intHTMFileNbr = FreeFile
Open strHTMFileName For Binary Access Read As #intHTMFileNbr
' If the file we want to open for output already exists, delete it ...
If Dir$(strTextFileName) <> "" Then
Kill strTextFileName
End If
' Open the output file ...
intTextFileNbr = FreeFile
Open strTextFileName For Binary Access Write As #intTextFileNbr
Print "Reading input file ..."
' Note: The "buffer" is simply a string variable into which the contents
' of the file will be read.
' The Input function reads a number of bytes from a file. The first argument
' of the function specifies how many bytes to read, which in this case is
' the size of the entire file (as determined by the LOF function). The second
' argument specifies the file number of the file from which the data is to be
' read. The resulting data is stored in the "strBuffer" variable.
strBuffer = Input(LOF(intHTMFileNbr), #intHTMFileNbr)
Print "Generating output file ..."
' The For loop below now processes the contents of the file character by
' character, writing out only the characters that are NOT enclosed in the
' HTML tags (i.e., it is skipping every character between a pair of angle
' brackets "<" and ">") ...
For lngX = 1 To Len(strBuffer)
strCurrentChar = Mid$(strBuffer, lngX, 1)
Select Case strCurrentChar
Case "<"
blnTagPending = True
Case ">"
blnTagPending = False
Case Else
If Not blnTagPending Then
' The current character is outside of the tags, so write it out ...
Put #intTextFileNbr, , strCurrentChar
End If
End Select
Next
Print "Closing files ..."
' Close the input and output files ...
Close #intHTMFileNbr
Close #intTextFileNbr
Print "Done."
End Sub
After the cmdTryIt_Click event procedure has run, the form should look like the screenshot below, and the output plain-text file should be present in the project directory.
Download the VB project code for the example above here.