MOLB 550: Bioinformatics and Genome Analysis
Operating systems, File management, email and File Transfer
Lecture 3: January 28, 1998
Topics: Operating Systems, File management and transfer, and email
An Operating System (OS) essentially provides a human interface to the cold, inanimate wires, chips and other electronic components that make up a computer. In practice, operating systems comprise at least two "layers": a hardware layer and an interface layer. The hardware layer provides a mechanism for software to control hardware, such as disk drives, video and a mouse or trackball, and is usually buried deep among a myriad of little files and commands with cryptic names. The human interface layer provides a command line or graphical user interface (GUI) to the hardware layer, and forms the basic environment that users associate with particular machines. The hardware layer may be supplied as separate components (drivers) bundled with the OS and almost invisible to the average user, or it may form an integral part of the OS itself. It is not normally possible or necessary for a user to adjust the hardware layer on any computer, but, as almost all computer users have experienced, when problems occur it is often the fault of the hardware layer. The user interface, on the other hand, is designed to be customized and adjusted.
There are three major operating systems commonly used in bioinformatics, and several minor ones that may become important in the future. The primary OS used in bioinformatics today is Unix, followed closely by Windows NT and the MacOS. In 1998, Unix clearly still leads the other two, although Windows NT is gaining ground and may well become the dominant OS, while the MacOS is being relegated to end-user laboratories and specialty applications. An understanding of the important features of each operating system is essential for productive, trouble-free use of bioinformatics programs and tools.
Unix - mature OS in three main flavors: BSD, SVR4, Linux
- pre-emptive multitasking and multithreaded
- process-based and file-based
MacOS - System 7 and MacOS 8
- cooperative multitasking only (no pre-emptive multitasking) in both System 7 and MacOS 8
Windows NT - 3.51 and 4.0
- pre-emptive multitasking and multithreaded
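Pre-emptive multitasking depends on the OS kernel interrupting programs on a timer, which cannot be shown in a few lines, but the cooperative style can. The sketch below is a toy round-robin scheduler written in Python, not how any of these operating systems is actually implemented; the names task and run_cooperative are hypothetical. Each task runs until it voluntarily yields control, so a task that never yields would freeze all the others, which is the main weakness of cooperative multitasking.

```python
from collections import deque
from typing import Generator, Iterable, List


def task(name: str, steps: int, log: List[str]) -> Generator[None, None, None]:
    """A toy task: does one unit of work, then voluntarily gives up control."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # cooperative: the task itself decides when to yield


def run_cooperative(tasks: Iterable[Generator[None, None, None]]) -> None:
    """Round-robin scheduler: run each task until it yields, then move on."""
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)
        except StopIteration:
            continue  # task finished; drop it from the queue
        queue.append(current)  # task yielded; put it at the back of the line


log: List[str] = []
run_cooperative([task("A", 3, log), task("B", 2, log)])
print(log)  # ['A0', 'B0', 'A1', 'B1', 'A2']
```

The two tasks interleave only because each one yields after every step; under pre-emptive multitasking the OS would switch tasks whether or not they cooperated.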
The basic unit that computers operate upon is the file. Unix takes this to a logical extreme and treats every element of the computer, including hardware devices, as a file. Files are kept in directories (folders).
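On a Unix system the idea that hardware is "just files" can be checked directly. The sketch below assumes a Unix-like system with the conventional /dev directory; the helper name describe is illustrative only. It asks the OS what kind of file a path is: /dev/null is a character device (a data sink), not an ordinary file, yet it lives in the file system alongside everything else.

```python
import os
import stat


def describe(path: str) -> str:
    """Classify a path by the file type the OS records for it (Unix)."""
    mode = os.stat(path).st_mode
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"


for p in ("/dev/null", "/dev", "/etc/passwd"):
    if os.path.exists(p):  # skip paths that this particular system lacks
        print(p, "->", describe(p))
```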
The advantage of treating everything as a file in Unix is that ordinary file commands can do many jobs that require special commands in other OSs. For example, to print a file you can simply copy it to the file that represents the printer.
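A minimal sketch of "printing by copying", assuming Python 3. The function name copy_to_device is hypothetical, and the printer's device path differs between Unix flavors (for example /dev/lp0 for the first parallel port on Linux), so the usage below copies to os.devnull, the null device, instead of a real printer. Nothing in the function is printer-specific: it only opens, reads and writes files.

```python
import os
import tempfile


def copy_to_device(src_path: str, device_path: str, chunk_size: int = 64 * 1024) -> int:
    """Copy src_path to device_path using plain open/read/write calls.

    Returns the number of bytes written.
    """
    copied = 0
    with open(src_path, "rb") as src, open(device_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # reached the end of the source file
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied


# Usage: write a small FASTA record to a temporary file, then "print" it
# to the null device (a stand-in for a real printer device file).
with tempfile.NamedTemporaryFile("wb", suffix=".fa", delete=False) as tmp:
    tmp.write(b">seq1\nACGTACGT\n")
n = copy_to_device(tmp.name, os.devnull)
os.remove(tmp.name)
print(n, "bytes sent to", os.devnull)
```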
Thankfully, file transfer is becoming easier thanks to better standards (TCP/IP, HTTP and HTML), but it is not yet completely foolproof, and some old tools are still needed, such as ftp and uucp (including uuencode and uudecode in mail messages).
Files come in two basic configurations: text (also called ASCII or flat files) and binary (images, proprietary word-processor formats).
It used to be that ASCII files were 7-bit files and binary files were 8-bit files, but no longer; in reality, ASCII (American Standard Code for Information Interchange) is just an agreed standard for information interchange.
When a standard 8-bit byte represents one character, you can have 256 distinct values (2^8 = 256). However, basic ASCII (US-ASCII, standard ASCII or NVT-ASCII) uses only 7 bits and so has 128 values (2^7 = 128); this is the network version of ASCII.
In fact, even NVT ASCII is sent in 8-bit bytes, but the 8th (high-order) bit is not used and is always set to zero.
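The 7-bit rule is easy to test: a byte is plain ASCII exactly when its high-order bit is clear, i.e. its value is below 128. The helper is_seven_bit below is a hypothetical name used only for this illustration.

```python
def is_seven_bit(data: bytes) -> bool:
    """True if every byte has its 8th (high-order) bit clear, i.e. value < 128."""
    return all(b & 0x80 == 0 for b in data)


text = "GATTACA"
encoded = text.encode("ascii")
print(list(encoded))           # [71, 65, 84, 84, 65, 67, 65]
print(is_seven_bit(encoded))   # True
print(2 ** 7, 2 ** 8)          # 128 256
```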
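Extended ASCII (ASCII as used off the network, with the 8th bit in use) and the ISO character sets for foreign characters, such as ISO-8859-1, use all 256 (8-bit) values. Some ISO sets, such as ISO-2022-JP, instead keep to 7-bit bytes and use escape sequences to switch to multibyte Japanese characters.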
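The difference between 7-bit and 8-bit character sets can be seen with Python's built-in codecs ("iso-8859-1" and "iso2022_jp" are Python's standard codec names; the helper encodes_in is hypothetical). A Latin-1 accented letter needs a byte value above 127, while ISO-2022-JP text stays within 7-bit bytes even though it encodes Japanese.

```python
def encodes_in(text: str, codec: str) -> bool:
    """Return True if text can be represented in the given character set."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


word = "café"
latin1 = word.encode("iso-8859-1")
print(list(latin1))               # [99, 97, 102, 233]: 'é' needs the 8th bit
print(encodes_in(word, "ascii"))  # False: no 7-bit ASCII code for 'é'

jp = "日本".encode("iso2022_jp")
print(all(b < 128 for b in jp))   # True: escape sequences, but only 7-bit bytes
```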
Binary files are those that use a different (usually proprietary) coding for characters and control characters, often because they need to go beyond the 256-character limit. Because binary data uses all 8 bits, and 7-bit channels such as Internet mail can strip or alter the 8th bit, THESE FILES MUST BE ENCODED BEFORE TRANSFER.
Use MIME, BinHex, uuencode, mailer encoding or ftp-binary transfer.
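As an illustration of what encoding does, the sketch below uuencodes arbitrary binary data with Python's binascii module, which turns each block of up to 45 input bytes into one line of printable 7-bit characters. The helper names are hypothetical, and the "begin"/"end" header and trailer lines of a complete uuencoded file are deliberately omitted to keep the example short. needs_encoding is only a rough heuristic, not a reliable text/binary detector.

```python
import binascii


def uuencode_body(data: bytes) -> bytes:
    """Encode raw bytes as uuencoded lines (at most 45 input bytes per line)."""
    lines = [binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45)]
    return b"".join(lines)


def uudecode_body(encoded: bytes) -> bytes:
    """Decode the lines produced by uuencode_body back to the original bytes."""
    return b"".join(binascii.a2b_uu(line) for line in encoded.splitlines())


def needs_encoding(data: bytes) -> bool:
    """Rough heuristic: NUL or 8-bit bytes are unsafe on a 7-bit channel."""
    return any(b == 0 or b > 127 for b in data)


payload = bytes(range(256))  # every possible byte value, like a binary file
body = uuencode_body(payload)
print(needs_encoding(payload), needs_encoding(body))  # True False
print(uudecode_body(body) == payload)                 # True
```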
Good email technique is now essential to all scientists.
Mail packages:
Pine and mail → Unix services
Eudora, Pegasus, NETCentra, FreeMail → client only
MS Internet Mail, Netscape Communicator → browser + mail (server available)
Lotus Notes, Novell NetWare, MS Exchange → full-featured proprietary systems
AOL, CompuServe, MSN → complete Internet services
Attachments:
Compression vs. encoding (compression methods: Lempel-Ziv, Huffman)
First compress then encode, if using ftp!
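The order matters because encoding inflates data (base64, the MIME encoding, turns every 3 bytes into 4 characters), so compressing the raw bytes first means the expansion applies to the smaller, compressed data. The sketch below uses Python's zlib, whose DEFLATE format combines Lempel-Ziv (LZ77) matching with Huffman coding, then base64; the function names are hypothetical.

```python
import base64
import zlib


def pack_for_mail(data: bytes) -> bytes:
    """Compress first (DEFLATE: LZ77 + Huffman coding), then base64-encode."""
    return base64.b64encode(zlib.compress(data, 9))


def unpack_from_mail(text: bytes) -> bytes:
    """Reverse the steps: decode first, then decompress."""
    return zlib.decompress(base64.b64decode(text))


seq = b"ACGT" * 1000  # 4000 bytes of highly repetitive sequence
packed = pack_for_mail(seq)
print(len(seq), len(packed))            # packed is far smaller than 4000
print(unpack_from_mail(packed) == seq)  # True
# Encoding alone, with no compression, inflates the data by about 4/3:
print(len(base64.b64encode(seq)))       # 5336
```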
Encoding (default on pine)
MIME (Multipurpose Internet Mail Extensions), BinHex, uuencode, or standard Unix encoding schemes.
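A mail package performs the MIME encoding of an attachment automatically. The sketch below builds such a message with Python's standard email package to show the result: the binary attachment is base64-encoded so the whole message is plain ASCII text. The addresses, subject and filename are hypothetical placeholders, and nothing is actually sent.

```python
from email.message import EmailMessage


def build_message(body: str, attachment: bytes, filename: str) -> EmailMessage:
    """Build a MIME message with a binary attachment (not sent anywhere)."""
    msg = EmailMessage()
    msg["From"] = "student@example.edu"    # hypothetical addresses
    msg["To"] = "instructor@example.edu"
    msg["Subject"] = "Sequence data"
    msg.set_content(body)                  # the plain-text part
    msg.add_attachment(attachment, maintype="application",
                       subtype="octet-stream", filename=filename)
    return msg


msg = build_message("Reads attached.", bytes(range(256)), "reads.bin")
wire = msg.as_string()                               # what goes over the network
print("Content-Transfer-Encoding: base64" in wire)   # True
print(wire.isascii())                                 # True: safe on 7-bit mail
```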
Compression:
Packages: WinZip, StuffIt (SIT files), WinPack
z - pack
Z - compress
zip - pkzip
gz - GNU zip (gzip)
zoo - booz
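tar - Unix archiver (also GNU tar, gtar); bundles files into one archive without compressing them, so it is usually combined with compress or gzip (*.tar.Z, *.tar.gz)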
Self-extracting archives (*.exe) - zip files that can extract themselves.
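The split between archiving and compressing can be seen with Python's tarfile and zipfile modules. In the sketch below (the sample files and function names are hypothetical), tar bundles the files and gzip compresses the bundle in a separate layer, giving a .tar.gz, whereas zip archives and compresses each member in one step. Self-extracting .exe archives are not reproduced here.

```python
import io
import tarfile
import zipfile

files = {"seq1.fa": b">seq1\nACGTACGT\n", "seq2.fa": b">seq2\nGGCCTTAA\n"}


def make_tar_gz(members: dict) -> bytes:
    """tar bundles the files into one archive; the 'gz' layer compresses it."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def make_zip(members: dict) -> bytes:
    """zip archives and compresses (DEFLATE) each member in a single step."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


tgz = make_tar_gz(files)
print(tgz[:2] == b"\x1f\x8b")  # True: every gzip file starts with these bytes
with tarfile.open(fileobj=io.BytesIO(tgz), mode="r:gz") as tar:
    tar_names = tar.getnames()
with zipfile.ZipFile(io.BytesIO(make_zip(files))) as zf:
    zip_names = zf.namelist()
print(tar_names, zip_names)    # ['seq1.fa', 'seq2.fa'] ['seq1.fa', 'seq2.fa']
```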