Some pictures from the Office TLC at TechEd 09 North America in Los Angeles
Crazy Big E (Where’s the cowboy hat?)
Scot the Knife
Performance monitoring and tuning are topics which most professionals know or care little about until performance becomes a problem. They don’t come up frequently enough to drive much interest in understanding how performance works, how the pieces fit together, and what to do about it. However, there is value in understanding the fundamentals of performance monitoring on Windows-based systems and knowing what to do based on what you find.
The primary activity in performance monitoring is seeking to understand the bottlenecks in the system which either are already causing performance issues or have the potential to cause them in the future. Because we’re seeking out bottlenecks, we’re looking primarily for metrics and counters which tell us the relative amount of capacity that has been used and how much remains.
Because of this the performance metrics that we’re going to gravitate to are those which are expressed as a percentage. The percent of disk time, percent of CPU time, and percent of network usage are good examples of the kinds of metrics that we’ll want to focus on when evaluating performance at a macro level. They are not, however, an exhaustive list of metrics. They are only the metrics that are easiest to understand and extract value from quickly.
Even with counters that report status on a percentage of available resources there are still challenges to face. The first challenge is determining when there’s a problem because of a sustained lack of available resources and when it’s a momentary blip on the radar.
The primary consideration in performance monitoring is over what interval of time can you accept performance challenges? What level of performance is acceptable and which is not? Is it important that the CPU have some availability every second? In most cases the answer to that question is no. However, the question becomes more difficult as you ask the question over a one minute interval. Most users tolerate the occasional slow down that is over within a minute. However, hours of performance problems are a different story.
So when evaluating what is a performance problem and what isn’t a performance problem consider how long your users would be willing to accept a slow down and then ignore or temper your response to momentary spikes in whatever counter you’re looking at. Momentary spikes are a normal occurrence and simply mean that the system is pouring all of its resources into fulfilling the requests that the users have made.
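To make the difference between a momentary spike and a sustained problem concrete, here is a minimal sketch. It assumes you already have counter samples collected at a fixed interval; the function name, the 90% threshold, and the 60-sample window are illustrative choices of mine, not recommendations from any tool.

```python
from typing import Sequence


def sustained_breach(samples: Sequence[float], threshold: float, window: int) -> bool:
    """Return True if `window` consecutive samples are at or above `threshold`."""
    run = 0
    for value in samples:
        # Extend the current run of high samples, or reset it on a low one.
        run = run + 1 if value >= threshold else 0
        if run >= window:
            return True
    return False


# Hypothetical data: one sample per second of a percentage counter.
spike = [20.0] * 30 + [100.0] * 3 + [20.0] * 30   # three-second blip
busy = [20.0] * 10 + [95.0] * 60                  # a full minute of load

print(sustained_breach(spike, 90.0, 60))  # False
print(sustained_breach(busy, 90.0, 60))   # True
```

The window is where your tolerance goes: if your users will put up with a minute of slowness, a window of 60 one-second samples ignores anything shorter.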
Performance monitoring on a Windows system requires an understanding of the way that Windows breaks down counters. On a Windows system performance monitoring starts with an object. An object is a broad representation of something, such as memory. This broad topic groups a set of related counters. Each counter is an individual measure in that category. For the memory object, page faults/sec, pages/sec, and committed bytes are all examples of counters. Each counter may measure the object in a different way but all of them relate to the object to which the counter belongs.
For each counter there may be multiple instances. An instance is a copy of the counter for a different part of the system. For instance, if a system has two processors, the counter for % processor time will have three instances; one for each processor and one for a total (or average) between the two processors. In most cases each instance needs to be viewed separately from the others to identify specific areas where problems may occur.
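The object, counter, and instance pieces come together in the counter path that Windows tools display, in the general shape \Object(Instance)\Counter. As a rough sketch only, the following splits such a path into its parts. The regular expression is my own simplification: it ignores an optional leading computer name and does not handle every path the system can produce.

```python
import re
from typing import NamedTuple, Optional


class CounterPath(NamedTuple):
    obj: str
    instance: Optional[str]   # None when the counter has no instances
    counter: str


# Simplified pattern: \Object(Instance)\Counter, with the instance optional.
_PATH = re.compile(r"^\\(?P<obj>[^\\(]+)(?:\((?P<instance>[^)]*)\))?\\(?P<counter>.+)$")


def parse_counter_path(path: str) -> CounterPath:
    match = _PATH.match(path)
    if match is None:
        raise ValueError(f"not a recognizable counter path: {path!r}")
    return CounterPath(match["obj"], match["instance"], match["counter"])


# On a two-processor system you would typically see three instances.
for instance in ("0", "1", "_Total"):
    print(parse_counter_path(rf"\Processor({instance})\% Processor Time"))

print(parse_counter_path(r"\Memory\Pages/sec"))
```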
You’ll find that for most purposes there are only four areas of performance monitoring that you care about. They are: memory, disk, processor, and network. These are the key metrics because they are the core system components that are most likely to be the source of the bottlenecks.
One of the challenges in performance monitoring is the interdependence of these key subsystems on one another. A bottleneck in one area can quickly become a bottleneck in another area. Thus the order in which you evaluate the performance of these subsystems is important to reaching the right conclusion.
The first characteristic to evaluate is memory because it has the greatest potential to impact the other metrics. A memory shortage will, in fact, often show up as a disk performance problem. Often this disk problem becomes apparent before the memory issue is fully understood.
In today’s operating systems when memory is exhausted the hard disk is used as a substitute. This is a great idea since hard drives are substantially larger than memory on a server. However, it has the disadvantage that hard drives are orders of magnitude slower than memory. As a result what might be a relatively light load on memory will quickly tax a hard disk and bring both the disk and the system to their knees.
One way to mitigate this is to minimize or eliminate the virtual memory settings in Windows to prevent Windows from using the hard drive as if it were memory. This setting can prevent a memory bottleneck from impacting the hard drives, but it raises the potential for the programs running on the server to not be able to get the memory that they need. This is generally an acceptable trade-off for making sure that you’re aware of the true root cause of an issue.
The memory counter to watch is the pages per second (pages/sec) counter. This counter tracks the number of times that data was requested from memory but actually had to be read from disk. This counter, above all others, helps to identify when the available memory doesn’t meet the demands of the system. A small number, say less than 100, is a normal consequence of a running system. Sustained numbers larger than 100, however, may indicate a need to add more memory. If you’re seeing a situation where you need more memory, you cannot evaluate the disk performance reliably since the system will be using the disk to accommodate the shortage of memory.
The primary counter for monitoring disk time is the ‘% Disk Time’ counter. This counter represents the average number of pending disk requests to a disk over the interval, multiplied by 100 to express it as a percentage. This calculation method leads to some confusion when the disk driver can accept multiple concurrent requests, as with SCSI and Fibre Channel disks. It is possible for the instances measuring these types of disks to have a % Disk Time above 100%.
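A little arithmetic shows why the number can exceed 100%. This sketch assumes the calculation is exactly as described above (average pending requests over the interval, times 100); the sample queue depths are made up for illustration.

```python
from statistics import fmean
from typing import Sequence


def percent_disk_time(pending_requests: Sequence[float]) -> float:
    """Average number of pending requests over the interval, times 100."""
    return fmean(pending_requests) * 100.0


# Hypothetical queue depths sampled across one interval.
one_at_a_time = [0, 1, 1, 0]   # a disk that services a single request at once
concurrent = [2, 3, 4, 3]      # a disk whose driver accepts several requests

print(percent_disk_time(one_at_a_time))  # 50.0
print(percent_disk_time(concurrent))     # 300.0
```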
One of the choices to be made when selecting disk counters is whether to select Logical disk counters or Physical disk counters. Logical disk counters measure the performance relative to the partition or logical volume rather than by the physical disks involved. In other words, Logical disk counters are based on drive letter mappings rather than on the disks involved. The physical disk option shows instances for each of the hard drives that the operating system sees. These may either be physical drives or in the case of RAID controllers and SAN devices, the logical representation of the physical drive.
In general, the best approach for measuring disk performance is to use physical disk counters. This will allow you to see which hard disk drives are busier and which ones are not. Of course, if there’s a one-to-one relationship between your logical drives (partitions) and the physical drives (that the operating system sees) then either logical or physical disk counters are fine. However, only the physical disk counters are turned on by default. If you decide to use logical disk counters, you’ll need to run the DISKPERF command to enable them and then reboot the system.
The % Disk Time counter should be evaluated from the perspective of how long a performance slowdown you can tolerate. In most cases, disk performance is the easiest to fix, by adding additional drives. So it’s an easy target if you’re seeing sustained % Disk Time values above 100%. If you’re on a RAID array or a SAN, consider evaluating % Disk Time against 100% times the number of effective drives in the array. For RAID 1 and RAID 1+0, that’s one half the number of disks. For RAID 5, it’s the number of disks minus one.
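The effective-drive rule of thumb above works out as follows. This is only a sketch of the arithmetic in this article; the function names are mine, and it covers just the three RAID levels mentioned.

```python
def effective_drives(raid_level: str, disks: int) -> int:
    """Number of drives that count toward throughput, per the rule of thumb."""
    if raid_level in ("1", "1+0"):
        return disks // 2      # half the disks hold mirror copies
    if raid_level == "5":
        return disks - 1       # one disk's worth of capacity holds parity
    raise ValueError(f"no rule of thumb here for RAID {raid_level}")


def disk_time_threshold(raid_level: str, disks: int) -> float:
    """The % disk time level to compare sustained readings against."""
    return 100.0 * effective_drives(raid_level, disks)


print(disk_time_threshold("5", 6))    # 500.0
print(disk_time_threshold("1+0", 8))  # 400.0
```

So on a six-disk RAID 5 array, a sustained reading of 300% is still well under the 500% mark, while the same reading on a single disk would be a clear problem.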
Since the dawn of computing, people have been watching processing time and the processes which are consuming it. We’ve all seen the performance graphs that are created by Task Manager and watched in amazement at the jagged mountain range that it creates. Despite the emphasis on processor time for overall performance, it’s one of the last indicators to review for performance bottlenecks. This is because it’s rarely the core problem facing computers today. In some scientific applications and others with intense processing requirements it may truly be the bottleneck, but everyone seems to know which applications those are. For most applications processor speed just isn’t the key issue.
The most useful measure of a processor’s availability is the % processor time. This indicates the percentage of time that the processor (or processors) was busy. This is useful because, taken over a period of time, it indicates the average amount of capacity that is left.
Improving processing speed isn’t an option for most servers. The application will need to be split up, optimized, or a new server installed to replace the existing one. It is for this reason that when processing bottlenecks occur they are some of the most expensive to address.
Until recently not much thought was given to the network as a potential bottleneck but with the advent of super-sized servers with four or more processors and terabytes of disk space it has to be a consideration. Network performance monitoring is a measure of how much of the bandwidth available on the networks is actually being consumed.
This is a tricky proposition since the connected network speed may not be the total effective speed. For instance, suppose a super-server is connected through a 1 Gbps connection to a switch which has eight 100 Mbps connections. The server will assume that 1 Gbps of data can flow through the network that it is connected to. However, in reality only 800 Mbps at the most is truly available to be consumed.
Another consideration is that many network drivers even today are less than stellar at reporting performance information. More than a few network card drivers have failed to properly report what they’re doing.
In general, network performance monitoring should be done from the perspective of understanding whether the network is a possible bottleneck: estimate what the maximum effective throughput of the network is likely to be, then determine what percentage of that limit is actually being used. It is reasonable to assume that about 60% utilization is all that is really achievable on Ethernet.
The guidelines here may not be enough to completely diagnose a performance problem and identify a specific course of action to resolve it; in many cases, however, they will be. In those cases where the problem can’t be resolved by looking at the high level indicators that were mentioned here, you’ll have to dive through the other counters and identify which ones will help you isolate the problem and illuminate potential solutions.
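Here is the switch example from above worked through as a sketch. It assumes all link speeds are given in the same unit and that the only limits are the server link and the combined downstream ports; the function names and the default of 60 are simply this article’s rule of thumb written down.

```python
from typing import Sequence


def effective_bandwidth(server_link: float, port_links: Sequence[float]) -> float:
    """The server can push no more than the downstream ports can carry."""
    return min(server_link, sum(port_links))


def usable_bandwidth(server_link: float, port_links: Sequence[float],
                     utilization_percent: int = 60) -> float:
    """Apply a utilization ceiling to the effective bandwidth."""
    return effective_bandwidth(server_link, port_links) * utilization_percent / 100


# One gigabit-class server link feeding eight 100-megabit-class ports.
print(effective_bandwidth(1000, [100] * 8))  # 800
print(usable_bandwidth(1000, [100] * 8))     # 480.0
```

Measured throughput should then be compared against the usable figure rather than against the speed the network card reports.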
Most CorasWorks web parts have a hidden property called Display which can be added to the DWP file to change the basic display behavior of the web part so that it emits the specific output that you’re looking for. Because the Display property is hidden, it’s not available from the tool pane and must be added to the DWP file directly.
The first step is to configure the CorasWorks component so that it is returning the right data and then export that web part as a DWP so there will be a file that is correctly configured. To do this, make sure that the title bar for the web part is shown. If the title bar isn’t shown, enter Design mode by clicking on Modify Shared Page–Design this Page. Next, click the down arrow on the right hand side of the web part title bar. On the context menu select Export… and save the file when prompted.
The next step in customizing the output is to create the format for the replacement. The basic format for the replacement is a set of HTML fragments each terminated with a <END> "tag". The order of these elements is header, item (non-selected), footer, and (when appropriate) a final section for selected items. The Special Site Navigation component is the only component which has a section for selected items at this point.
In each of these sections there are several replacement strings which will be replaced with the value contained in the associated field or property. An example of the content to make a spreadsheet roll up look like a linked list appears below:
<table border="0" width="100%">
<tr><td style="padding-bottom: 5px" class="ms-vb"><img src="/_layouts/images/square.gif"></td><td style="padding-bottom: 5px" class="ms-vb"><a HREF="<%Link%>"><%Display%></a></td></tr>
For a list the replacement strings are the field names surrounded by a <% and %>. You can see this in the above example of <%Link%> and <%Display%> — these are both fields in the lists being rolled up by CorasWorks’ web part.
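To illustrate how this kind of replacement behaves, here is a small sketch that expands <%Field%> tokens in an item fragment. It is not CorasWorks’ implementation, which I haven’t seen. The function name is hypothetical, and it assumes field names made only of letters, digits, and underscores, with unknown fields rendered as empty strings.

```python
import re
from typing import Mapping

# Matches tokens such as <%Link%> or <%Display%>.
_TOKEN = re.compile(r"<%(\w+)%>")


def render_item(template: str, fields: Mapping[str, object]) -> str:
    """Replace each <%Name%> token with the value of the field called Name."""
    return _TOKEN.sub(lambda m: str(fields.get(m.group(1), "")), template)


item = '<a HREF="<%Link%>"><%Display%></a>'
row = {"Link": "/sites/team/default.aspx", "Display": "Team Site"}

print(render_item(item, row))
# <a HREF="/sites/team/default.aspx">Team Site</a>
```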
For special site navigation, there are only three replacement strings that are valid:
Adding the tag to the XML is simple, but the details matter. The value must be encoded or placed in a CDATA section so that it is not interpreted as a part of the XML. A CDATA section is the better choice because the Display property remains readable.
First, add a new <Display></Display> tag set prior to the closing </WebPart> tag in the DWP file. Next copy the xmlns attribute from the last tag prior to the new display tag you just added and add that attribute to the display tag. Each web part uses its own namespace for the Display tag. You must provide this xmlns attribute for the Display node or it will not work.
In the middle of the <Display></Display> tag set add a CDATA node by adding <![CDATA[ ]]>. In the middle of the two brackets ‘[ ]’ add the content that you created above.
Save the DWP file.
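If you would rather script the edit than make it by hand, the same change can be sketched with Python’s standard xml.dom.minidom module. Treat this as an illustration only: the function name is mine, and the namespace value in the example is a placeholder that you must replace with the xmlns copied from your own exported DWP file.

```python
from xml.dom import minidom


def add_display(dwp_xml: str, namespace: str, content: str) -> str:
    """Append <Display xmlns="..."><![CDATA[...]]></Display> inside <WebPart>."""
    doc = minidom.parseString(dwp_xml)
    web_part = doc.documentElement                 # the root <WebPart> element
    display = doc.createElement("Display")
    display.setAttribute("xmlns", namespace)       # copied from the last existing tag
    display.appendChild(doc.createCDATASection(content))
    web_part.appendChild(display)                  # lands just before </WebPart>
    return doc.toxml()


# A stripped-down stand-in for an exported DWP file.
exported = ('<WebPart xmlns="http://schemas.microsoft.com/WebPart/v2">'
            '<Title>Links</Title></WebPart>')
fragment = '<a HREF="<%Link%>"><%Display%></a><END>'

# "urn:example:placeholder" is NOT a real web part namespace.
print(add_display(exported, "urn:example:placeholder", fragment))
```

Because the content goes in as a CDATA section, the HTML and the <% %> markers stay readable in the saved file.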
The next step is to import the modified DWP file. From the Modify Shared Page menu select Add Web Parts and Import. Click the browse button and locate the DWP file that you modified. Click the Upload button to upload the control. Drag the control on to the page from the tool pane.
The display of the links or navigation should reflect the updated HTML that you provided.
If for some reason you don’t see any modified display make sure that your <Display> tag has the correct xmlns attribute. It should match the other xmlns attributes in the file.
Adding a reference to a shared library from a web part is not as simple as using Visual Studio to add a reference to a project output or a fixed DLL on the file system. In addition to adding the reference to the project itself, you must add the DLL to the CAB file and modify the manifest.xml so that the referenced DLL is deployed with the web part. This How To shows you what must be done for the web part to deploy correctly when referencing another assembly.
Adding the reference to the project can be done with the following steps:
Now you have added the reference to the project. Next is adding the DLL to the CAB file.
The process of adding the referenced DLL to the CAB file is easy. Simply follow these steps:
Now that you’ve added the DLL to the CAB file, it’s time to add the file to the manifest.xml.
The final step is to add the file to manifest.xml so that STSADM will deploy the DLL for you when the web part is deployed. You can do this by following these steps:
Now you have completed the changes necessary for the web part project to deploy the referenced DLL along with your web part.
A strong name is a cryptographic public/private key pair that is applied to an assembly in order to prove that it has not been tampered with after compilation. The process of strong naming an assembly has three components: creating the key, adding the key to the project, and adding the key to the assembly. We’ll look at each part in turn.
The process of creating the key is relatively straightforward as Visual Studio ships with a utility that creates the key for you. Follow these steps to create the key:
You now have a strong name key which can be added to the project.
Now that you have the key, you need to add it to the project. This is done to ensure that Visual Studio will manage the check in and check out process for you. Follow these steps to add the key to the project:
You’ve now added the key to the project. It will be checked in and out when you select those commands in Visual Studio.
The final step in the creation of a strong name is to add the key to the assemblyinfo.cs file. This file was already added to your project for you. Here’s how to add the key to the assembly:
You’ve successfully added a strong name to an assembly.
[Note: I wrote this a year ago, mainly as therapy for the frustration. I thought it might be interesting for others to see. Since the events of more than a year ago things have substantially improved and I like my service except for the occasional quirk. I’m just glad I don’t have to work through problems every day. –rlb]
The plane lands and turns onto the taxiway. The flight attendant politely but firmly lets you know that you should sit in your seat until the aircraft comes to a complete stop and the pilot turns off the fasten seatbelt sign. Then she informs you that cellular phones can now be used.
You turn on your phone to the startup tone. You look down at the signal bar. No signal. You try to check your voice mail and receive a message of no service. You look out the window and confirm you are in a large city. Looking back down again you realize that this is more than just some added delay in getting the phone to wake up.
A situation quite similar happened to me recently. I landed in Seattle, WA which is just about as far from my home in Indianapolis, IN as you can get without leaving the continental United States. Although concerned, I felt comfortable that this was just a minor glitch that could be easily resolved. I made it to the hotel, missing the opportunity to call my wife and let her know that I landed safely before she went to bed.
I picked up the phone at the hotel to call Cingular, the wireless carrier I had switched to only three weeks beforehand. I was greeted with a prompt that told me that customer service was closed, but there was, luckily, an after hours support number.
After a brief interaction with a technical support person who couldn’t locate the problem, I was told to turn the phone off until the morning and things would probably work themselves out.
Three weeks prior when trying to activate the service I learned to be somewhat skeptical as I went through a week and a half of trying to get things working. I ordered the phone from TeleSales and activated service for two lines. One for my wife and the other to support my needs. I decided to port my telephone number from Sprint PCS to Cingular in no small part so I could see how bad portability really was. Sometimes writers have to do things that they know are wrong so they can remind others not to do it – or so I’ve been told by my similarly masochistic friends.
Things were broken from the start as my wife’s phone arrived on time two days after I ordered it but mine was nowhere to be found. A phone call or two later I was told that the phone hadn’t shipped because they were waiting on the portability department to OK the phone’s shipment. Over the course of the next few phone calls and countless hours wasted I realized that the process for shipping new service on a ported phone was completely insane: TeleSales was waiting on portability, and portability was waiting on me to have the device so they could activate it in the system.
When two more days passed and I still hadn’t seen the device I called back in only to find out that TeleSales couldn’t track the phone. However, they were able to finally confirm that the phone shipped – a day after it was supposed to. By the time the phone arrived it was nearly a full week later. The device was ordered on a Saturday and arrived the Friday following. When I called in to activate the phone they couldn’t get the system to take it. They explained that something was wrong but they’d put in a ticket which might take two business days to get an answer to.
On a whim I called back on Monday and received an activation for the phone. While on the phone I made a quick outbound call. Success! At least that’s what I thought until I went to call the phone back and set up my voicemail. Neither one of them worked.
Another call back and a few more wasted hours got me nowhere. Over the next two days I logged literally hours on the phone trying to get someone who would resolve the problem. The first day I was told that the phone wasn’t “fully” in the system and that I’d have to wait until Tuesday. Tuesday came without voice mail or inbound calls.
I finally found a supervisor willing to commit to resolving my issues. A few hours later I had inbound calls. A few more calls from me finally got my voice mailbox set up correctly since apparently the device wasn’t set up right before it was shipped and there were some other configuration issues with the phone.
For the time after these problems and before my journey to Seattle all was well. I was enjoying the extended coverage area and service.
When the next morning of my Seattle trip came I still had no service. When I called back in I was told that they couldn’t see the device and would have to put a roaming ticket in and that those took two to three days to resolve. I was also told that it was most likely an equipment problem and that the phone should be looked at.
Over the next three days I wasted three or so hours trying to get them to identify a store that I could go into to get a resolution. It seems that the multi-mode (GSM/TDMA) phones that I had weren’t popular in the area so none of the local stores sold them or could diagnose them. The store which was less than a block from the convention center where I was going to be was useless because they didn’t have the ability to resolve the issue.
I was offered the ability to order a new device from TeleSales – which couldn’t send devices any faster than two day air and I already had experience with them not shipping the same day. Ultimately this meant that I’d be on my way home before a replacement device could get to me.
The end result was four days in Seattle without the cellular coverage I was paying for.
Perhaps the most frustrating part of the whole situation was the fact that there weren’t any quick solutions to the problems I was sharing. Everything was a two or three day wait. Every solution was presented with the caveat that it probably wouldn’t fix my problem.
Each solution that I presented was met unceremoniously with a reason why it wouldn’t work. They couldn’t reprogram the SIM card in my phone because I wasn’t in my home market. They couldn’t get me a SIM card with a Seattle local number because I didn’t live in Seattle and Indiana and Seattle use different billing systems.
It’s just too bad I can’t live without my cell phone.
So, at Crowe we run Lotus Notes. (I hate Lotus Notes.) Sorry, it’s an automatic response.
Anyway, I wanted to export my contacts to sync with my Exchange infrastructure here at the home office. No problem. Notes will export to vCards. Ok, there is a problem. Notes exports to one huge vCard file with multiple vCards. Outlook only recognizes files with one vCard. The net result is that you can only import the first record in the Notes vCard export.
So I wrote a utility which bursts a single vCard file into multiple vCard files so you can double click the ones you want and add them to Outlook. It’s really quick and fairly slick.
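For the curious, the core idea fits in a few lines. The following is a sketch in Python rather than the utility itself. It assumes each card sits between BEGIN:VCARD and END:VCARD lines and that cards are not nested inside one another; the function name is just for illustration.

```python
from typing import List


def split_vcards(text: str) -> List[str]:
    """Split a file holding many vCards into one string per card."""
    cards: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        marker = line.strip().upper()
        if marker == "BEGIN:VCARD":
            current = [line]                 # start collecting a new card
        elif current:
            current.append(line)
            if marker == "END:VCARD":
                # vCard lines are conventionally terminated with CRLF.
                cards.append("\r\n".join(current) + "\r\n")
                current = []
    return cards


# A made-up two-card export.
export = (
    "BEGIN:VCARD\nVERSION:2.1\nFN:Ann Example\nEND:VCARD\n"
    "BEGIN:VCARD\nVERSION:2.1\nFN:Bob Example\nEND:VCARD\n"
)

cards = split_vcards(export)
print(len(cards))  # 2
```

Writing each returned string to its own .vcf file gives you one file per contact.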
I’m working on getting a license created that I can use to distribute free stuff like this, but in the meantime send me an email if you need a copy of the utility (or the code) and I’ll send it to you. The standard disclaimers apply… I wrote it for me, it may or may not work for you.
I’ve been doing a series at developer.com which describes each of the roles in the software development process — the latest article on the role of a developer has posted.
I’ve always avoided FrontPage like the plague, in part because I don’t want to unghost my pages but also in part because I just dislike the tool. However, that’s changed…
The ability to create data view web parts and, more importantly, to convert list view web parts into data view web parts (with a simple right click, I might add) is very cool. Add the list view web part to your page, right click in FrontPage to convert it to a data view web part, and you’re done.
Of course using FrontPage will unghost the page you’re working with. However, you can always export the DWP file and use it other places.
Another useful tool…