Sunday, October 25, 2015

Pros and cons of External Cloud Services

Nowadays, the cloud is everywhere. Every software vendor and provider offers some form of cloud brokering through its datacenters, platforms or software. So what are the real cloud drivers? We often talk about cloud acceleration, but we rarely talk about the issues that arise when you start playing in the cloud...

The good points with the cloud :
  • It is fast and flexible (from a service ramp up / down perspective)
  • It usually implements the very latest technology and software platforms
  • It is usually well suited to geographically dispersed solutions, allowing resilience and regional / local implementation of global solutions
  • You don't have to handle capacity reporting and planning of the environment... just pay as you go
  • For some very specific requirements, you can have your environment online only a few hours per day (when you really need it). The rest of the time, you turn it off and limit the cost, which is much harder to achieve cost-efficiently on assets you own
  • You run on a standard operated environment
  • There is a huge variety of services (IaaS, PaaS, SaaS) to subscribe to out there. Cloud solutions are usually financially efficient when you buy PaaS or SaaS. IaaS will usually leave you managing the integration layer between your in-house architecture and the cloud infrastructure, which adds complexity and overhead for your teams

Now the drawbacks of these kinds of services : 
  • As soon as you have a certain volume and sustained growth, public cloud is usually more expensive than in-house
  • If you don't run standard "off the shelf" applications, the cloud can be a real issue for you. Even though upgrades are planned and predictable, there is no room for blocking version upgrades. Should your application not support that intense patching roadmap, you will be in real trouble
  • Service transfer out of cloud services can be a real pain point. A very simple example I went through was when using SAP in an "as a service" mode. When attempting to exit that service, and without any clear contractual statement, I was told that the SAP image itself was the intellectual property of the vendor. As a result, a complete replatforming of the SAP environment had to be done, followed by a data synchronization.
  • Data reversibility is an extension of the issue above. Be sure, when contracting a cloud service, that you have the right conditions for a transition out. Vendor "lock in" is one of the most common issues on the cloud. 
  • Most cloud services come with limited to no SLA. Cloud providers prefer to commit on means rather than on results. Be aware of this before putting any critical service in the cloud
  • Cloud is usually not recommended for critical data. Indeed, it is always complicated to impose data location on cloud providers. Moreover, cloud providers are usually seen as a big regulatory risk since the staff managing the cloud platforms is geographically dispersed and you usually have no control over this
Cloud services continue to progress and to win over more and more companies. They remain perfect for either temporary workload hosting (IaaS) or complete end-to-end solution hosting (PaaS or SaaS), where limited integration is required with your core company services. 
Keep in mind that cloud services are extremely standardized but always come with limited SLA commitments and a very limited contractual framework. As such, either you will need to be a major player to get an appropriate contract, or you will have to accept the risks on the services you contract.

To conclude: if you have sustained growth, an internal private cloud with capacity on demand will be the best solution. For other cases, or for off-the-shelf platforms, hybrid or public cloud can greatly enhance your capabilities, but keep in mind that getting strong commitments on SLAs and on legal / auditability aspects will remain a challenge for you. 

Friday, October 9, 2015

RFP General Principles - RFP for hardware / software with an integration project

Many people would like a complete RFP document sample, which would let them almost copy-paste the document, adapt it a little, change the fonts and header and reissue it. Before providing such examples, it is key to understand the best way to break down an RFP so that it is easy for you and the vendors to analyze, and so that your expectations come across clearly.

The document structure I suggest is the following :

1. Overview 

- 1.1 Company Presentation : this gives an overview of your company, with global figures (employees, business units, sales or revenue figures, key growth drivers...) 
- 1.2 RFP Context : this is often omitted but more important than you think. It provides the general context of the RFP and lets suppliers understand whether you are just replacing a deprecated platform or planning something scalable to absorb strong business growth or increase the flexibility of your services


2. Project Expectations
In this section, add one chapter per expectation. Depending on whether you expect the RFP to be service / SLA driven or a very technical design based on a BOM (bill of materials), you can include several sections. These sections can relate to availability, resilience, flexibility, capability to scale up or scale out the solution... but also to technical requirements such as :
  • your standards regarding the management network (do you impose a non-routed OOB management network?)
  • security access (should everything be authenticated against your corporate Active Directory?)
  • specific connectivity & protocols (e.g. on a Nexus core stack, you would have LACP network connections, meaning specific teaming settings on network cards)
  • specific plugins with other platforms
  • reporting and capacity management features (capacity and performance trending...)

3. Instructions to bidders

- 3.1 : Key contacts (name, function / role, email, phone): indicate the SPOC (single point of contact) for the project. Usually, on big projects, you have a project manager, the project sponsor or team leader and someone from procurement
- 3.2 General information for the bidders : in all RFPs, you should give global guidelines to the bidders to avoid any surprises when attempting to close the deal at the end of the RFP cycle. Such general information includes :
  • Proposal acceptance requirements : indicate the language in which you expect the bidder's answer, the currency in which prices must be given, and whether you impose a fixed exchange rate against standard currencies such as the dollar (for example because your budget is based on a hedged currency)
  • Proposal validity period
  • Non disclosure information
  • Non compensation conditions : indicate that no compensation of any kind will be provided for the manpower and costs associated with participating in this RFP

- 3.3 : Timelines : it is compulsory to provide timelines with your RFP, so that bidders know precisely what is expected and when. This is usually summed up in a small table with the key dates (date, milestone, action owner, communication medium)
- 3.4 : Questions for clarifications : be clear about your rules on questions :
  • Best practice is to have a milestone in section 3.3 Timelines indicating the hard stop for questions & answers
  • In this section, you should also indicate to all bidders whether you allow yourself to disclose their questions anonymously to the other bidders, should a question be interesting to share

- 3.5 Evaluation criteria : be very clear on how you will assess their response :
  • Is your project TCO (total cost of ownership) driven? 
  • Are you going to check the compliance of the bidder's answer with your RFP requirements?
  • Are you looking for a solution with proven stability & flexibility?
  • Will you expect reference calls and reference presentations to prove the bidders' professionalism?
  • Will you be expecting a very clear vision of the project, resources to engage, efforts expected on your side, planning...?

- 3.6 Proposal Structure : this is where you impose a certain level of structure on the bidders' responses. It is key to get this part neat. Indeed, if all proposals are aligned in a "template" format, parsing the answers and comparing the bidders will be very simple. The usual information I expect in a response to an RFP is the following :
  • Executive summary
  • Bidder's presentation (I usually supply an excel sheet to impose the information I am requesting : turnover, operating income, best in class resellers and integrators, similar project references...)
  • Technical solution : this is where you have to turn verbose mode on, indicating all RFP technical deliverables (high level design, low level design, power consumption, maximum capacity and performance of the solution...)
  • Project Organization : explain all mandatory phases you expect the bidder / integrator to perform during the integration project (design workshops, activities, onboarding of your internal resources, reporting details and frequency)
  • Financial breakdown of the proposal : here again, provide an Excel sheet the bidder should fill in, with all the fields you expect (unit public price, unit proposed price, quantity, total price, taxes, non recoverable taxes, project costs, training and handover costs...)
  • Terms and conditions : this is usually an internal document provided by your legal or procurement team 

Saturday, August 1, 2015

Web Application Performance - Tuning from Client to Server, going through Citrix optimization


This first post will give you all the inputs to understand how the HTTP protocol works and how performance can suffer due to poor application coding, configuration issues on the client or server side and / or latency on network links. We will go through different steps in order to understand the complete stack and the options available to developers, architects and operational teams in charge of running the service. 

Bear in mind that we are going to look here at optimizing the performance of applications that are laggy because the HTTP protocol is chatty and degrades performance on low speed or high latency links. If your application is performing like crap for other reasons (e.g. poor database queries lasting hours), this topic won't help make it better :-)



0. Understanding HTTP : 

Without going into the details, HTTP is a standard protocol that was defined in order to standardize exchanges on the internet. It is pretty straightforward and independent of the web server and client you are using. It always starts very simply : the client queries the server to download an HTML page. In the URL, you will see all sorts of extensions on the file which is loaded (aspx, php, html, jar...), but these are simply script or compiled code that will produce HTML content.

Once the client downloads the HTML file, that file contains a list of tags which define the web page structure but also all the media to load. As your browser parses the HTML file, it triggers more requests to the web server, this time to download media (javascript files, css style sheets, images, video resources, sounds...)
So to summarize, this is pretty simple... You always start with one file which defines the page structure and all the associated media. When your browser receives the HTML file, it parses it and starts loading all resources defined in the HTML file...


1. A simple test case : 

The easiest way to illustrate and understand the basic behaviour of a web client communicating with its server is to build a very basic web page and load it. In order to emphasize the issues, I have created a 1.5 MB PNG image and copied it 20 times (a0 to a9 and b0 to b9) so that the page loads slowly enough to illustrate the breakdown of the activity. 

Let's start with the code, which is very basic. As you can see below, this HTML file has a list of 20 PNG images
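A minimal sketch of such a page (using the a0 to a9 and b0 to b9 file names described above) would be :

<!DOCTYPE html>
<html>
  <head><title>test</title></head>
  <body>
    <!-- 20 copies of the same 1.5 MB PNG, named a0.png to a9.png and b0.png to b9.png -->
    <img src="a0.png"/> <img src="a1.png"/> <img src="a2.png"/> <img src="a3.png"/> <img src="a4.png"/>
    <img src="a5.png"/> <img src="a6.png"/> <img src="a7.png"/> <img src="a8.png"/> <img src="a9.png"/>
    <img src="b0.png"/> <img src="b1.png"/> <img src="b2.png"/> <img src="b3.png"/> <img src="b4.png"/>
    <img src="b5.png"/> <img src="b6.png"/> <img src="b7.png"/> <img src="b8.png"/> <img src="b9.png"/>
  </body>
</html>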


And an overview of the page shown in the browser

I used a freeware tool to take very basic measurements from Firefox & Internet Explorer: HttpWatch (Please be aware you will need another tool for similar measurements from Chrome). This tool simply displays graphically the loading of a web page from your browser. When loading the page above, you get the image below : 


Now, what do we see here ?

  • The very first item loaded from the site is test.html (as shown in the first line above). The loading of this page is very quick since the HTML file is very small (724 bytes). Note that nothing else happens while the test.html file is loaded. This is simply due to the fact that the browser needs to load the HTML file and parse it to know what other files it has to load from the web server (the HTML file defines the page structure in addition to all the resources to load such as javascript files, css style sheets, images...)
  • The following part is the interesting one : you would expect that, as soon as the HTML file is parsed by your browser, all images would be loaded together. This is not the case. Worse, this is never the case. Indeed, there is a set of restrictions on servers and browsers that limits the maximum number of parallel transfers between a client and a server. I used Firefox version 3 for the test above, which has a limit of 6 maximum connections per server (controlled by the network.http.max-persistent-connections-per-server setting as explained here)
Now, why do these limitations exist? This is more of a "gentlemen's agreement" on the Internet. Indeed, if users completely removed this limit, they would very quickly hit the limit of maximum connections on the web servers and penalize the service for other users. Be aware that increasing this value above 10 maximum connections per server risks getting your IP blacklisted... so make sure you play with this in private. Finally, RFC 2616, which defines the HTTP 1.1 standard, initially stated that only 2 persistent connections per server should be allowed... but this was in the old days of the internet, when pages were lightweight and didn't host so much media...

If we go a little further in the analysis, we can note that the screenshot above shows an HTTP GET returning a 200 code, which corresponds in HTTP to an "OK" (meaning you successfully executed the GET request and downloaded the file from the server). Note that if we reload the page, we get another HTTP code (304 Not Modified) : 

This time, you can notice the loading time is way faster. This is simply due to the fact that your browser has a local cache of the files. Still, you will notice the browser still queries the server to check whether the locally cached file matches the file stored on the web server... The impact is not noticeable here, because the web server is local, but if you imagine a web server at 200 ms round-trip latency from you (e.g. a server in Asia contacted from a client in Europe), and the fact that you can only run 6 queries in parallel, you need 4 groups of queries (6 + 6 + 6 + 2) to revalidate all the files against the cache, which adds 800 milliseconds to the loading time of your page... If this is not clear at this stage, don't worry, we will check this out in detail later in this post.
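To make the revalidation mechanism concrete, here is roughly what such an exchange looks like on the wire (a simplified sketch, most headers omitted; the host name is just a placeholder) :

GET /a0.png HTTP/1.1
Host: testserver.local
If-Modified-Since: Sat, 01 Aug 2015 10:00:00 GMT

HTTP/1.1 304 Not Modified
Date: Sat, 01 Aug 2015 10:05:00 GMT

The 304 response carries no body : the server only confirms that the cached copy is still valid, but each of these confirmations still costs a full network round trip.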

2. Fixing the performance issue through client configuration : 

There are several places you can work on in order to solve the issue. The ones presented in this section are only configuration changes. Please note this will enhance performance, but it can be combined with the solutions proposed later in this document to really boost the end user experience... 

The first obvious change is on the browser side : increase the maximum number of connections : 
  • In Firefox, type about:config in the URL bar and change the value of the network.http.max-persistent-connections-per-server setting



  • In Internet Explorer, the value is changed in the Operating System registry. Click Start => Run => Regedit, look for the key HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings and change the values of MaxConnectionsPer1_0Server and MaxConnectionsPerServer (see the sketch after this list).


  • On Chrome, this setting seems to be hard coded and tied to the user profile... meaning fine tuning for a single end user (hence a single profile) will be tricky...
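For the Internet Explorer case, the registry change can also be scripted with the standard reg.exe tool; a minimal sketch (the value 10 is just an example) :

reg add "HKCU\Software\Microsoft\Windows\CurrentVersion\Internet Settings" /v MaxConnectionsPerServer /t REG_DWORD /d 10 /f
reg add "HKCU\Software\Microsoft\Windows\CurrentVersion\Internet Settings" /v MaxConnectionsPer1_0Server /t REG_DWORD /d 10 /f

You will typically need to restart the browser for the new values to be taken into account.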

You can find the default maximum connections per server of each browser on the Browserscope web site.

3. Fixing the performance issue through server / application configuration : 

If client configuration is not sufficient, you can start working on the server or application side. The server will allow the same tuning as the client, working on the maximum TCP connections it will accept but also the maximum number of connections per client.
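To give an idea, on an Apache web server the equivalent knobs look roughly like this (a sketch only : directive names vary with the Apache version and MPM, and limiting connections per client requires an additional module such as mod_qos) :

# Maximum simultaneous connections the server will process (MaxClients on Apache 2.2)
MaxRequestWorkers 400
# Let clients reuse a TCP connection for several HTTP requests
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 5
# Per-client connection cap, only available through an extra module such as mod_qos
# QS_SrvMaxConnPerIP 10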

The next changes are made directly in the application stack, where several adjustments can improve the end user experience. Here are a few easy ways of improving your solution (this part will be detailed later)

Adding expiration header to your media:
This solution is pretty straightforward: you simply replace every call to a resource by a small piece of code that informs the client of the expiration date of your resource. This can be done in several ways. The first solution is to build a small script page that adds expiration header information before transferring the resource to the client.

a) Hard coding expiration 

A simple example would be that, instead of referring to your images directly in your HTML files, you call a home-made script that informs the client of a default expiration date, as sketched below :
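For instance, with a small PHP wrapper (the script name getimage.php and the 30-day lifetime are arbitrary choices for this sketch), <img src="a0.png"/> would become <img src="getimage.php?img=a0.png"/> and the script would look like this :

<?php
// Hypothetical wrapper : send expiration headers, then stream the image
$img = basename($_GET['img']);   // basename() avoids path traversal
header('Content-Type: image/png');
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + 30 * 24 * 3600) . ' GMT');
readfile($img);
?>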


b) Global Configuration of expiration via .htaccess

Rather than handle every image individually, an easy solution is to set expiration directly in your .htaccess file in order to have general control over file expiration, as in the sketch below.
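For example, assuming mod_expires is enabled on the server, a minimal .htaccess could contain (the lifetimes are arbitrary examples) :

# Enable expiration headers (requires mod_expires on the server)
ExpiresActive On
# Images rarely change : cache them for a month
ExpiresByType image/png "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"
# Stylesheets and scripts : one week
ExpiresByType text/css "access plus 1 week"
ExpiresByType application/javascript "access plus 1 week"
# Everything else : one day
ExpiresDefault "access plus 1 day"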
c) Global Configuration of expiration via web server
Several web servers will let you configure media expiration directly in their configuration files. The most common web server being Apache, let's look at how to do the change there. A little module named mod_expires will give you control over the expiration headers of media in your web pages. Here again, the configuration lets you set expiration based on media types :
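A minimal sketch in the main Apache configuration could look like this (the module path and directory are examples that depend on your distribution; the lifetimes are arbitrary) :

# Load mod_expires, then scope the rules to the document root
LoadModule expires_module modules/mod_expires.so
<Directory "/var/www/html">
    ExpiresActive On
    ExpiresDefault "access plus 1 day"
    # Static images can be cached much longer
    ExpiresByType image/png "access plus 1 month"
    ExpiresByType image/jpeg "access plus 1 month"
</Directory>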

Now, this is usually not sufficient... so the last section of this article will show information that will make a huge difference...

4. Leveraging other technologies to change user experience : 

Before looking into WAN acceleration devices, let's just compare 2 solutions :

  • A first solution which remains our initial use case : I connected to a central Sharepoint web site located in Europe, from my work computer, when I was travelling in Brazil. You will see below the large blue sections which represent download of media through a WAN link

  • A second attempt, when I was on the same site (and connection) in Brazil, but where I no longer used my browser but a browser hosted on a Citrix platform in Europe, just beside the Sharepoint farm I want to connect to. Here, transfer times are reduced to almost nothing and web experience is completely changed
So what is the main difference here? The first difference is the page loading time : with my local browser, loading time was 14 seconds. When going through Citrix, it dropped to 5 seconds. Indeed, from my browser, due to the chatty HTTP protocol and the latency coupled with the persistent connections limit with the web server (5 in that setup), I spend most of my time on the red and blue parts, which correspond to waiting for the server response (250 ms round trip latency) and waiting for the data to come through the pipe (data transfer). 
In the second case, with Citrix, waiting for the server and transfers are minimal since all the data is transferred between 2 servers in the same datacenter. The only transfer to my local PC in Brazil is Citrix data, which, as you will see later, can also be optimized / cached with specific WAN acceleration appliances. 




Sunday, February 22, 2015

MSSQL - Simplify and Automate Refresh Operations

One of the most recurring tasks of an MSSQL DBA is refreshing databases. This task can quickly get boring as you connect to one instance, dump the database, copy it across the network and then restore it on the other side, manually remapping data and log file logical names to the new physical file structure on the target database.

The script below aims at making your life easier by scripting both steps of a refresh process, leveraging a network file share to host the dump file :

  • Section 1 will export the Database to a defined network share
  • Section 2 will load the Database dump file, analyze the file structure, and remap logical data and log file paths to restore the database in the right place

0. Pre-requisites

The only pre-requisite of this script is to have a shared folder on the network on which the dump will be performed. For security reasons, it might be good to restrict access to that folder and only grant access to service accounts running your MSSQL instances in addition to your DBA accounts.

1. Dumping the Source Database

  • Connect to the source instance with a MSSQL Management Studio
  • Execute the following query on the source instance, replacing the parts in blue with the correct database name and share path
DECLARE @sourceDB AS varchar(128) = 'sourceDatabaseName';
DECLARE @exportPath AS varchar(128) = '\\mySharedFolderPath\' + @sourceDB + '.bak'; 
BACKUP DATABASE @sourceDB TO DISK = @exportPath;


  • Check the MSSQL query log to make sure there are no errors


Should you have an error message, this could be linked to the fact the MSSQL instance service account is not part of the authorized groups that can connect to the CIFS file share you set up.
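Optionally, before moving on to the restore, you can also check that the dump file is complete and readable directly from T-SQL (a small sketch reusing the same variables as the backup query) :

DECLARE @sourceDB AS varchar(128) = 'sourceDatabaseName';
DECLARE @exportPath AS varchar(128) = '\\mySharedFolderPath\' + @sourceDB + '.bak';
-- Reads the whole backup set and reports errors without restoring anything
RESTORE VERIFYONLY FROM DISK = @exportPath;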


2. Restoring the Dump to the Target Database

Connect to the target instance with a MSSQL Management Studio and open a new query. Copy / paste the code below replacing the parts in blue with the correct database names and execute the query.

DECLARE @SourceDB AS varchar(128) = 'sourceDatabaseName'; 
DECLARE @TargetDB AS varchar(128) = 'targetDatabaseName'; 
DECLARE @DumpPath AS varchar(128) = '\\mySharedFolderPath\'+@SourceDB+'.bak';
DECLARE @TargetDBData AS varchar(512); 
DECLARE @TargetDBLog AS varchar(512); 
-- Locate the current data and log files of the target database
SET @TargetDBData = (select physical_name from sys.master_files where database_id = (select database_id from sys.databases where name = @TargetDB) and type_desc = 'ROWS'); 
SET @TargetDBLog = (select physical_name from sys.master_files where database_id = (select database_id from sys.databases where name = @TargetDB) and type_desc = 'LOG'); 
print 'Target Data File Path : ' + @TargetDBData; 
print 'Target Log File Path : ' + @TargetDBLog;
-- Temporary table matching the output of RESTORE FILELISTONLY
CREATE TABLE #tmp (
    LogicalName nvarchar(128) NOT NULL, PhysicalName nvarchar(260) NOT NULL, Type char(1) NOT NULL,
    FileGroupName nvarchar(120) NULL, Size numeric(20, 0) NOT NULL, MaxSize numeric(20, 0) NOT NULL,
    FileID bigint NULL, CreateLSN numeric(25,0) NULL, DropLSN numeric(25,0) NULL, UniqueID uniqueidentifier NULL,
    ReadOnlyLSN numeric(25,0) NULL, ReadWriteLSN numeric(25,0) NULL, BackupSizeInBytes bigint NULL,
    SourceBlockSize int NULL, FileGroupID int NULL, LogGroupGUID uniqueidentifier NULL,
    DifferentialBaseLSN numeric(25,0) NULL, DifferentialBaseGUID uniqueidentifier NULL,
    IsReadOnly bit NULL, IsPresent bit NULL, TDEThumbprint varbinary(32) NULL
);
-- Read the logical file names contained in the dump
INSERT #tmp EXEC ('restore filelistonly from disk = ''' + @DumpPath + '''');
DECLARE @DataFileName AS varchar(128);
DECLARE @LogFileName AS varchar(128);
SET @DataFileName = (select LogicalName from #tmp where Type = 'D');
SET @LogFileName = (select LogicalName from #tmp where Type = 'L');
print 'Data File Name : ' + @DataFileName;
print 'Log File Name : ' + @LogFileName;
DROP TABLE #tmp;
-- Set DB in single user mode to kill other connections and revert
USE master;
EXEC('ALTER DATABASE ' + @TargetDB + ' SET SINGLE_USER WITH ROLLBACK IMMEDIATE');
EXEC('ALTER DATABASE ' + @TargetDB + ' SET MULTI_USER');
-- Start the restore, remapping logical files to the target paths
RESTORE DATABASE @TargetDB FROM DISK = @DumpPath WITH REPLACE,
    MOVE @DataFileName TO @TargetDBData,
    MOVE @LogFileName TO @TargetDBLog;

The script is designed to perform the following operations :

  • Analyze source database dump file to identify data & log names
  • Analyze target database structure to define the path to the MDF / LDF files that need to be replaced
  • Restrict the target database to SINGLE_USER mode in order to kill all other connections
  • Perform the refresh operation

Once the refresh has completed, you should get the following log : 

An updated version of this script should soon provide the missing step of the process : export account mappings and remap accounts after the refresh is complete...
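In the meantime, if only a handful of accounts are concerned, a minimal sketch of that remapping step would be the following (the database, user and login names are placeholders to adapt) :

USE targetDatabaseName;
-- List database users whose SID no longer matches a server login (orphaned users)
EXEC sp_change_users_login 'Report';
-- Re-attach an orphaned database user to the corresponding server login
ALTER USER [appUser] WITH LOGIN = [appUser];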

MSSQL List Tables, Row Count and Table Size

As part of the basic toolset for MSSQL databases, you will find below a simple query to list all tables of a DB with their row count and disk usage :
SELECT
    t.NAME AS TableName,
    p.rows AS RowCounts,
    SUM(a.total_pages) * 8 AS TotalSpaceKB,
    SUM(a.used_pages) * 8 AS UsedSpaceKB,
    (SUM(a.total_pages) - SUM(a.used_pages)) * 8 AS UnusedSpaceKB
FROM sys.tables t
INNER JOIN sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN sys.allocation_units a ON p.partition_id = a.container_id
WHERE t.NAME NOT LIKE 'dt%' AND t.is_ms_shipped = 0 AND i.OBJECT_ID > 255
GROUP BY t.Name, p.Rows
ORDER BY t.Name