Tuesday, May 29, 2007

.NET Assembly

http://en.wikipedia.org

For the counterpart to assembly language in the Microsoft .NET framework, see Common Intermediate Language.

In the Microsoft .NET framework, an assembly is a partially compiled code library used for deployment, versioning and security. In the Microsoft Windows implementation of .NET, an assembly is a PE (portable executable) file. There are two types: process assemblies (EXE) and library assemblies (DLL). A process assembly represents a process that uses classes defined in library assemblies. In version 1.1 of the CLR, classes can only be exported from library assemblies; in version 2.0 this restriction is relaxed. The compiler has a switch that determines whether the assembly is a process or a library, and it sets a flag in the PE file accordingly. .NET does not use the file extension to determine whether the file is a process or a library, so a library may have either .dll or .exe as its extension.
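As a rough illustration of why the extension does not matter, the following Python sketch reads the flag directly from the PE header. It assumes the flag in question is the IMAGE_FILE_DLL bit (0x2000) of the COFF Characteristics field, which the text above does not name explicitly. The helper make_minimal_image is hypothetical and exists only so that the check can be tried without a real assembly.

```python
import struct

IMAGE_FILE_DLL = 0x2000  # assumed flag: COFF Characteristics bit marking a library image


def is_library_image(pe_bytes: bytes) -> bool:
    """Return True if the PE image is flagged as a library, whatever its file name."""
    if pe_bytes[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # Offset 0x3C of the DOS header holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", pe_bytes, 0x3C)
    if pe_bytes[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file: missing PE signature")
    # The 20-byte COFF header follows the 4-byte signature; Characteristics is
    # its last 2-byte field, at offset 18 within that header.
    (characteristics,) = struct.unpack_from("<H", pe_bytes, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_DLL)


def make_minimal_image(characteristics: int) -> bytes:
    """Hypothetical helper: build just enough header bytes to exercise the check."""
    dos_header = bytearray(0x40)
    dos_header[:2] = b"MZ"
    struct.pack_into("<I", dos_header, 0x3C, 0x40)  # PE signature follows directly
    coff_header = bytearray(20)
    struct.pack_into("<H", coff_header, 18, characteristics)
    return bytes(dos_header) + b"PE\x00\x00" + bytes(coff_header)
```

For a real file, the bytes could be obtained with open(path, "rb").read(); the result depends only on the header flag and not on whether the file is named .dll or .exe.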

The code in an assembly is compiled into MSIL, which is then compiled into machine language at runtime by the CLR.

An assembly can consist of one or more files. Code files are called modules, and an assembly can contain more than one of them. Because each code module can be written in a different language, it is technically possible to use several languages to create a single assembly. In practice this rarely happens, principally because Visual Studio only allows developers to create assemblies that consist of a single code module.

Assembly names

The name of an assembly consists of four parts:
  1. The short name. On Windows this is the name of the PE file without the extension.
  2. The culture. This is an RFC 1766 identifier of the locale for the assembly. In general, library and process assemblies should be culture neutral; the culture should only be used for satellite assemblies.
  3. The version. This is a dotted number made up of four values: major, minor, build and revision. The version is only used if the assembly has a strong name (see below).
  4. A public key token. This is a 64-bit hash of the public key which corresponds to the private key used to sign[1] the assembly. A signed assembly is said to have a strong name.
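The four parts are usually written together as a single display name. The Python sketch below splits such a string into its parts. It assumes the comma-separated form "Name, Version=..., Culture=..., PublicKeyToken=...", and the AssemblyName type and parse_display_name function are names invented for this illustration.

```python
from typing import NamedTuple, Optional, Tuple


class AssemblyName(NamedTuple):
    short_name: str
    version: Optional[Tuple[int, ...]]   # (major, minor, build, revision)
    culture: Optional[str]               # None means culture neutral
    public_key_token: Optional[str]      # None means no strong name


def parse_display_name(display_name: str) -> AssemblyName:
    """Split 'Name, Version=..., Culture=..., PublicKeyToken=...' into its parts."""
    short_name, *pairs = [part.strip() for part in display_name.split(",")]
    fields = dict(pair.split("=", 1) for pair in pairs)
    version = fields.get("Version")
    culture = fields.get("Culture")
    token = fields.get("PublicKeyToken")
    return AssemblyName(
        short_name=short_name,
        version=tuple(int(n) for n in version.split(".")) if version else None,
        culture=None if culture in (None, "neutral") else culture,
        public_key_token=None if token in (None, "null") else token.lower(),
    )
```

A token of "null" is mapped to None here, which matches the idea that an unsigned assembly has no strong name.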

The public key token is used to make the assembly name unique. Thus, two strong named assemblies can have the same PE file name and yet .NET will recognize them as different assemblies. The Windows file system (FAT32 and NTFS) only recognizes the PE file name, so two assemblies with the same PE file name (but different culture, version or public key token) cannot exist in the same Windows folder. To solve this issue .NET introduces something called the GAC (Global Assembly Cache) which is treated as a single folder by the .NET CLR, but is actually implemented using nested NTFS (or FAT32) folders.
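To make the 64-bit token concrete, here is a small Python sketch of how it can be derived. It assumes the commonly described scheme in which the token is the last eight bytes of the SHA-1 hash of the public key, written in reverse order; the text above only says that it is a 64-bit hash, so treat the details as an assumption.

```python
import hashlib


def public_key_token(public_key: bytes) -> str:
    """Derive a 64-bit token from a public key blob (assumed scheme, see above)."""
    digest = hashlib.sha1(public_key).digest()   # 20-byte SHA-1 hash
    return digest[-8:][::-1].hex()               # last 8 bytes, reversed, as hex
```

Two publishers with different keys get different tokens, which is what lets .NET tell apart two assemblies that share a PE file name.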

To prevent spoofing attacks, where a cracker would try to pass off an assembly as something else, the assembly is signed with a private key. The developer of the genuine assembly keeps the private key secret, so a cracker has no access to it and cannot produce a signature that matches the publisher's public key. Thus the cracker cannot make a rogue assembly impersonate the genuine one.

Signing the assembly involves taking a hash of important parts of the assembly and then encrypting the hash with the private key. The signed hash is stored in the assembly along with the public key, which can decrypt the signed hash. When the CLR loads a strongly named assembly it generates a hash from the assembly and compares it with the decrypted hash. If the two match, the public key in the file (and hence the public key token) is associated with the private key used to sign the assembly. This means that the public key in the assembly is the public key of the assembly publisher, and hence a spoofing attack is thwarted.
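The sign-then-verify idea can be tried out with a toy example. The Python sketch below uses textbook RSA with a deliberately tiny key built from two Mersenne primes, and it hashes the whole byte string rather than selected parts of an assembly. The key, the function names and the use of SHA-1 are all assumptions made for illustration; this is not the CLR's actual signature format and is far too weak for real use.

```python
import hashlib

# Toy RSA key built from two Mersenne primes; illustrative only.
P, Q = 2**61 - 1, 2**31 - 1
N = P * Q                                  # public modulus
E = 65537                                  # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))          # private exponent, kept secret (Python 3.8+)


def digest_as_int(image: bytes) -> int:
    """Hash the image and reduce it so that it fits the toy modulus."""
    return int.from_bytes(hashlib.sha1(image).digest(), "big") % N


def sign(image: bytes) -> int:
    """Publisher side: encrypt the hash with the private key."""
    return pow(digest_as_int(image), D, N)


def verify(image: bytes, signature: int) -> bool:
    """Loader side: decrypt with the public key and compare with a fresh hash."""
    return pow(signature, E, N) == digest_as_int(image)
```

Changing even one byte of the image after signing changes the fresh hash, so verification fails unless the attacker can re-sign with the private key.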

Assemblies and .NET security

.NET Code Access Security is based on assemblies and evidence. Evidence can be anything deduced from the assembly, but typically it is created from the source of the assembly — whether the assembly was downloaded from the Internet, an intranet, or installed on the local machine (if the assembly is downloaded from another machine it will be stored in a sandboxed location within the GAC and hence is not treated as being installed locally). Permissions are applied to entire assemblies, and an assembly can specify the minimum permissions it requires through custom attributes (see .NET metadata). When the assembly is loaded the CLR will use the evidence for the assembly to create a permission set of one or more code access permissions. The CLR will then check to make sure that this permission set contains the required permissions specified by the assembly.
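The load-time check described above amounts to a subset test. The following Python sketch models it with a made-up policy table; the zone names and permission names are hypothetical and do not correspond to the real .NET permission classes.

```python
# Hypothetical policy: evidence (the zone an assembly came from) -> granted permissions.
ZONE_PERMISSIONS = {
    "MyComputer": {"Execution", "FileIO", "UI", "Network"},
    "Intranet": {"Execution", "UI", "Network"},
    "Internet": {"Execution", "UI"},
}


def grant_permissions(zone: str) -> set:
    """Build the permission set for an assembly from its evidence."""
    return set(ZONE_PERMISSIONS.get(zone, set()))


def can_load(zone: str, minimum_required: set) -> bool:
    """The assembly loads only if the granted set contains its stated minimum."""
    return minimum_required <= grant_permissions(zone)
```

In this model an assembly that requires "FileIO" loads from the local machine but is refused when it arrives from the Internet zone.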

.NET code can perform a code access security demand. This means that the code will perform some privileged action only if all of the assemblies of all of the methods in the call stack have the specified permission. If one assembly does not have the permission a security exception is thrown.

.NET code can also perform a link demand. In this case the CLR checks only one method, the immediate caller at the top of the call stack, for the specified permission. The stack walk is limited to that single method, and the CLR assumes that all the other methods in the call stack have the specified permission.
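The difference between the two kinds of demand can be shown with a small model of a call stack. In this Python sketch each frame is a pair of an assembly name and the permissions granted to it, with the immediate caller as the last element; the types, names and the "FileIO" permission are invented for the example.

```python
from typing import Callable, List, Set, Tuple

Frame = Tuple[str, Set[str]]  # (assembly name, permissions granted to it)


class SecurityError(Exception):
    """Stands in for the security exception mentioned in the text."""


def demand(call_stack: List[Frame], permission: str) -> None:
    """Full demand: the assembly of every method on the stack must hold the permission."""
    for assembly, granted in call_stack:
        if permission not in granted:
            raise SecurityError(f"{assembly} lacks {permission}")


def link_demand(call_stack: List[Frame], permission: str) -> None:
    """Link demand: only the immediate caller (last frame) is checked."""
    assembly, granted = call_stack[-1]
    if permission not in granted:
        raise SecurityError(f"{assembly} lacks {permission}")


def is_allowed(check: Callable[[List[Frame], str], None],
               call_stack: List[Frame], permission: str) -> bool:
    """Convenience wrapper that turns the exception into a boolean."""
    try:
        check(call_stack, permission)
        return True
    except SecurityError:
        return False
```

With an untrusted application calling through a trusted library, the full demand fails while the link demand succeeds, which is why link demands are cheaper but need more care.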

Private and shared assemblies

When a developer compiles code the compiler will put the name of every library assembly it uses in the compiled assembly's .NET metadata. When the CLR executes the code in the assembly it will use this metadata to locate the assembly using a technology called Fusion. If the called assembly does not have a strong name, then Fusion will only use the short name (the PE file name) to locate the library. In effect this means that the assembly can only exist in the application folder, or in a subfolder, and hence it is called a private assembly because it can only be used by a specific application. Versioning is switched off for assemblies that do not have strong names, and so this means that it is possible for a different version of an assembly to be loaded than the one that was used to create the calling assembly.

The compiler stores the complete name (including the version) of a strongly named assembly in the metadata of the calling assembly. When the called assembly is loaded, Fusion ensures that only an assembly with the exact name, including the version, is loaded. Fusion is configurable, so you can provide an application configuration file that tells Fusion to use a specific version of a library when another version is requested.
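The version rules from the last two paragraphs can be condensed into a few lines. This Python sketch is a simplified model: the redirects mapping stands in for whatever the application configuration file specifies, and resolve_version is a name chosen for the example.

```python
from typing import Dict, Optional, Tuple

Version = Tuple[int, int, int, int]


def resolve_version(requested: Version,
                    strongly_named: bool,
                    redirects: Optional[Dict[Version, Version]] = None) -> Optional[Version]:
    """Return the exact version the loader must find, or None if any version will do."""
    if not strongly_named:
        return None  # versioning is switched off without a strong name
    # A configured redirect replaces the requested version; otherwise it must match exactly.
    return (redirects or {}).get(requested, requested)
```

A result of None models the private-assembly case, where whatever file is found under the short name gets loaded.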

Shared assemblies are stored in the GAC. This is a system-wide cache, and all applications on the machine can use any assembly in it. To the casual user the GAC appears to be a single folder. However, it is actually implemented using nested FAT32 or NTFS folders, which means that there can be multiple versions (or cultures) of the same assembly.
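One way to picture the nested folders is to encode the rest of the assembly name in a folder name, so that files with the same PE file name never collide. The Python sketch below does this with a hypothetical layout of root, short name, then a "version_culture_token" folder; the real GAC layout may differ in detail, and cache_path is an invented name.

```python
from pathlib import PureWindowsPath


def cache_path(root: str, short_name: str, version: str,
               culture: str, token: str) -> PureWindowsPath:
    """Place each (version, culture, token) combination in its own folder."""
    # The folder name carries the parts of the name that the file name cannot hold.
    identity = f"{version}_{culture}_{token}"
    return PureWindowsPath(root) / short_name / identity / f"{short_name}.dll"
```

Two versions of lib.dll then live side by side in different folders while keeping the same file name. An empty culture string is used here for a culture-neutral assembly.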

Satellite assemblies

In general, assemblies should only contain culture-neutral resources. If you want to localize your assembly (for example, to use different strings for different locales) you should use satellite assemblies, which are special, resource-only assemblies. Satellites are not loaded by Fusion and so they should not contain code. As the name suggests, a satellite is associated with an assembly called the main assembly. That assembly (say, lib.dll) contains the neutral resources (which Microsoft says is International English, but implies to be US English). Each satellite has the name of the associated library with .resources appended (for example lib.resources.dll). The satellite is given a non-neutral culture name, but because the culture is ignored by existing Windows file systems (FAT32 and NTFS), several satellites would end up with the same PE file name in one folder. Since this is not possible, satellites must be stored in subfolders under the application folder. For example, a satellite with the UK English resources will have a .NET name of "lib.resources, Version=0.0.0.0, Culture=en-GB, PublicKeyToken=null", a PE file name of lib.resources.dll, and will be stored in a subfolder called en-GB.

Satellites are loaded by a .NET class called System.Resources.ResourceManager. The developer has to provide the name of the resource and information about the main assembly (with the neutral resources). The ResourceManager class will read the locale of the machine and use this information and the name of the main assembly to get the name of the satellite and the name of the subfolder that contains it. ResourceManager can then load the satellite and obtain the localized resource.
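The name and folder calculation can be sketched as follows. This Python example yields the satellite path for the requested culture and then for its parent culture (en-GB, then en). The parent-culture fallback is an assumption added for illustration and is not described in the text above, and satellite_candidates is an invented name, not part of ResourceManager.

```python
from pathlib import PurePosixPath
from typing import Iterator


def satellite_candidates(app_dir: str, main_assembly: str,
                         culture: str) -> Iterator[PurePosixPath]:
    """Yield satellite paths from the most specific culture to its parent."""
    satellite_file = f"{main_assembly}.resources.dll"
    parts = culture.split("-")
    while parts:
        # The subfolder is named after the culture; the file name never changes.
        yield PurePosixPath(app_dir) / "-".join(parts) / satellite_file
        parts.pop()
```

If none of the candidates exists, the neutral resources in the main assembly would be the natural last resort.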

Fusion

File systems in common use by Windows (FAT32, NTFS, CDFS, etc.) are restrictive because file names do not include information such as versioning or localization. This means that two different versions of a file cannot exist in the same folder unless their names carry versioning information. Fusion is the Windows loader technology that allows versioning and culture information to be used in the name of a .NET assembly that is stored on these file systems. Although Fusion is the exclusive system for loading a managed assembly into a process, it is also currently used to load Win32 assemblies, independently of managed assembly loading.

Fusion uses a specific search order when it looks for an assembly.

  1. If the assembly is strongly named it will first look in the GAC.
  2. Fusion will then look for redirection information in the application's configuration file. If the library is strongly named then this can specify that another version should be loaded, or it can specify an absolute address of a folder on the local hard disk, or the URL of a file on a web server. If the library is not strongly named, then the configuration file can specify a subfolder beneath the application folder to be used in the search path.
  3. Fusion will then look for the assembly in the application folder with either the extension .exe or .dll.
  4. Fusion will look for a subfolder with the same name as the short name (PE file name) of the assembly and then look for the assembly in that folder with either the extension .exe or .dll.
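The search order above can be written down as a function that lists the places to try. This Python sketch is a simplification: the GAC is represented by a placeholder string, the order of the two extensions is an assumption (the list only says "either"), and subfolders named in the configuration file are appended after the default locations, which is also an assumption.

```python
from pathlib import PurePosixPath
from typing import List, Sequence

EXTENSIONS = (".dll", ".exe")  # assumed order; the text only says "either"


def probe_candidates(app_dir: str, short_name: str, strongly_named: bool,
                     config_subfolders: Sequence[str] = ()) -> List[str]:
    """List the locations to try, in order, for one referenced assembly."""
    candidates = []
    if strongly_named:
        candidates.append(f"<GAC>/{short_name}")  # step 1: placeholder for the GAC
    base = PurePosixPath(app_dir)
    # Step 3: the application folder; step 4: a subfolder named after the
    # assembly; then any subfolders named by the configuration file (step 2).
    folders = [base, base / short_name, *(base / sub for sub in config_subfolders)]
    for folder in folders:
        for extension in EXTENSIONS:
            candidates.append(str(folder / f"{short_name}{extension}"))
    return candidates
```

Version redirection from the configuration file is left out here, since it changes which version is requested rather than where Fusion looks.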


If Fusion cannot find the assembly, the assembly image is bad, or if the reference to the assembly doesn't match the version of the assembly found, it will throw an exception. In addition, information about the name of the assembly, and the paths that it checked, will be stored. This information may be viewed by using the Fusion log viewer (fuslogvw), or if a custom location is configured, directly from the HTML log files generated.

Referencing assemblies

One can reference an executable code library by using the /reference flag of the C# compiler.

Delay signing of an assembly

Shared assemblies need a strong name so that they can be uniquely identified among the applications that share them. The strong name consists of the PE file name, culture, version and public key token. While a shared assembly is still under development it can be delay signed: the build uses only the public key, and the private key is not needed at that time. The assembly is signed with the private key only when it is ready to be deployed.

Language of an assembly

The code in an assembly consists of MSIL, which is a stack-based, assembly-like intermediate language. The compiler translates the high-level language code into MSIL. If we have a program that prints "Hello World", the equivalent MSIL code is:

.method private hidebysig static void Main(string[] args) cil managed {
    .entrypoint
    .custom instance void [mscorlib]System.STAThreadAttribute::.ctor() = ( 01 00 00 00 )
    // Code size 11 (0xb)
    .maxstack 1
    IL_0000: ldstr "Hello World"
    IL_0005: call void [mscorlib]System.Console::WriteLine(string)
    IL_000a: ret
} // end of method Class1::Main

The code first loads the string onto the evaluation stack. It then calls the WriteLine method, which takes the string from the stack as its argument. Finally, ret returns control to the caller of Main.
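The stack behaviour of these three instructions can be mimicked with a very small interpreter. The Python sketch below is a toy model that understands only ldstr, call (treated as a one-argument WriteLine) and ret; it is not how the CLR executes MSIL, which is compiled to machine code first.

```python
from typing import List, Optional, Tuple

Instruction = Tuple[str, Optional[str]]  # (opcode, operand)

PROGRAM: List[Instruction] = [
    ("ldstr", "Hello World"),
    ("call", "System.Console::WriteLine"),
    ("ret", None),
]


def run(instructions: List[Instruction]) -> Tuple[List[str], List[str]]:
    """Execute the toy program; return (lines written, final evaluation stack)."""
    stack: List[str] = []
    written: List[str] = []
    for opcode, operand in instructions:
        if opcode == "ldstr":
            stack.append(operand)        # push the string onto the stack
        elif opcode == "call":
            written.append(stack.pop())  # WriteLine pops its single argument
        elif opcode == "ret":
            break                        # return to the caller
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    return written, stack
```

The stack never holds more than one item during the run, which matches the .maxstack 1 directive in the listing.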
