Tuesday, December 10, 2013

BizTalk 2013–Integration with Amazon S3 storage using the WebHttp Adapter

I recently encountered a requirement where we had to integrate a legacy Document Management system with Amazon S3 in order to support a Mobile Field-Worker application.  The core requirement is that when a document reaches a certain state within the Document Management System, we need to publish the file to an S3 instance where it can be accessed from a mobile device.  We will do so using a RESTful PUT call.

Introduction to Amazon S3 SDK for .Net

Entering this solution I knew very little about Amazon S3.  I did know that it supported REST and therefore felt pretty confident that BizTalk 2013 could integrate with it using the WebHttp adapter.

The first thing that I needed to do was to create a Developer account on the Amazon platform. Once I created my account I downloaded the Amazon S3 SDK for .Net. Since I will be using REST, this SDK is technically not required; however, there is a beneficial tool called the AWS Toolkit for Microsoft Visual Studio.  Within this toolkit we can manage our various AWS services, including our S3 instance.  We can create, read, update and delete documents using this tool, and we can also use it in our testing to verify that a message has reached S3 successfully.


Another benefit of downloading the SDK is that we can use the managed libraries to manipulate S3 objects and get familiar with some of the terminology and functionality that is available.  A side benefit is that we can fire up Fiddler while using the SDK and see how Amazon forms its REST calls, under the hood, when communicating with S3.
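If you just want to poke at the API before wiring up BizTalk, a minimal console sketch along the following lines will push a test file into a bucket. This is a rough sketch only: it assumes the SDK's AmazonS3Client and PutObjectRequest types, the bucket name, key, file path and region are placeholders, and exact property names can vary between SDK versions.

    using Amazon;
    using Amazon.S3;
    using Amazon.S3.Model;

    class S3PutSmokeTest
    {
        static void Main()
        {
            // Credentials come from your Amazon account (see the next section)
            var client = new AmazonS3Client("<your_keyId>", "<your_SecretKey>", RegionEndpoint.USWest2);

            var request = new PutObjectRequest
            {
                BucketName = "<your_bucket>",
                Key = "310531500150800.PDF",
                FilePath = @"C:\Drop\310531500150800.PDF",   // placeholder local file
                ContentType = "application/x-pdf"
            };

            // Run this with Fiddler capturing and you can inspect the PUT,
            // the x-amz-* headers and the Authorization signature the SDK builds.
            client.PutObject(request);
        }
    }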

Amazon S3 Accounts

When you sign up for an S3 account you will receive an Amazon Access Key ID and a Secret Access Key. These are the two pieces of data that you will need in order to access your S3 services.  You can think of these credentials much like the ones you use when accessing Windows Azure Services.


BizTalk Solution

To keep this solution as simple as possible for this Blog Post, I have stripped some of the original components of the solution so that we can strictly focus on what is involved in getting the WebHttp Adapter to communicate with Amazon S3.

For the purpose of this blog post the following events will take place:

  1. We will receive a message that will be of type System.Xml.XmlDocument.  Don’t let this mislead you; we can receive pretty much any type of message using this message type, including text documents, images and PDF documents.
  2. We will then construct a new instance of the message that we just received in order to manipulate some Adapter Context properties. You may now be asking – why would I want to manipulate Adapter Context properties?  The reason is that we want to change some of our HTTP Header properties at runtime, and that requires a Dynamic Send Port, as identified by Ricardo Marques.


    The most challenging part of this Message Assignment Shape was populating the WCF.HttpHeaders context property.  In C# if you want to populate headers you have a Header collection that you can populate in a very clean manner:

    headers.Add("x-amz-date", httpDate);

    However, when populating this property in BizTalk it isn’t as clean.  You need to construct a single string, appending all of the related headers together and separating each header attribute onto a new line with “\n”.

    Tip: Don’t try to build this string in a helper method.  The \n characters will be encoded and the resulting values will not be accepted by Amazon, which is why I have built the string inside an Expression Shape.
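    To make that concrete, here is a rough sketch of what the assignment could look like inside the Message Assignment shape. The helper class name AmazonS3Helper and the variable httpDate are my own illustrative names; the helper methods themselves are shown later in this post.

    // httpDate is a string variable in the orchestration
    httpDate = AmazonS3Helper.SetHeaderDate();

    // Build the raw header string in the shape itself so the "\n" separators survive
    msgS3Request(WCF.HttpHeaders) =
        "x-amz-acl: " + AmazonS3Helper.SetAmzACL() + "\n" +
        "x-amz-storage-class: " + AmazonS3Helper.SetStorageClass() + "\n" +
        "x-amz-date: " + httpDate + "\n" +
        "Authorization: " + AmazonS3Helper.SetHttpAuth(httpDate) + "\n" +
        "Content-Type: application/x-pdf";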

    After I send a message (which I have tracked in BizTalk), I should see an HTTP header that looks like the following:

    <Property Name="HttpHeaders" Namespace="http://schemas.microsoft.com/BizTalk/2006/01/Adapters/WCF-properties" Value=

    "x-amz-acl: bucket-owner-full-control
    x-amz-storage-class: STANDARD
    x-amz-date: Tue, 10 Dec 2013 23:25:43 GMT
    Authorization: AWS <AmazonKeyID>:<EncryptedSignature>
    Content-Type: application/x-pdf
    Expect: 100-continue
    Connection: Keep-Alive"/>

    For the meaning of each of these headers I will refer you to the Amazon documentation.  However, the one header that does warrant some additional discussion here is the Authorization header.  This is how we authenticate with the S3 service, and constructing this value requires some additional understanding.  To simplify its population I have created the following helper method, which was adapted from the following post on StackOverflow:

    public static string SetHttpAuth(string httpDate)
    {
        // Requires: using System.Text; and using System.Security.Cryptography;
        string AWSAccessKeyId = "<your_keyId>";
        string AWSSecretKey = "<your_SecretKey>";

        // Canonical string layout:
        // VERB \n Content-MD5 (empty) \n Content-Type \n Date (empty, x-amz-date is used instead)
        // \n canonicalized x-amz-* headers \n /bucket/resource
        string canonicalString = "PUT\n\napplication/x-pdf\n\nx-amz-acl:bucket-owner-full-control\nx-amz-date:" + httpDate + "\nx-amz-storage-class:STANDARD\n/<your_bucket>/310531500150800.PDF";

        // now encode the canonical string
        Encoding ae = new UTF8Encoding();
        // create a hashing object; the secret key is the hash key
        HMACSHA1 signature = new HMACSHA1();
        signature.Key = ae.GetBytes(AWSSecretKey);
        byte[] bytes = ae.GetBytes(canonicalString);
        byte[] moreBytes = signature.ComputeHash(bytes);
        // convert the hash byte array into a base64 encoding
        string encodedCanonical = Convert.ToBase64String(moreBytes);
        // finally, this is the Authorization header value
        string AuthHeader = "AWS " + AWSAccessKeyId + ":" + encodedCanonical;

        return AuthHeader;
    }

    The most important part of this method is the following line(s) of code:

    string canonicalString = "PUT\n\napplication/x-pdf\n\nx-amz-acl:bucket-owner-full-control\nx-amz-date:" + httpDate + "\nx-amz-storage-class:STANDARD\n/<your_bucket>/310531500150800.PDF";
                

    The best way to describe what is occurring is to borrow the following from the Amazon documentation.

    The Signature element is the RFC 2104 HMAC-SHA1 of selected elements from the request, and so the Signature part of the Authorization header will vary from request to request. If the request signature calculated by the system matches the Signature included with the request, the requester will have demonstrated possession of the AWS secret access key. The request will then be processed under the identity, and with the authority, of the developer to whom the key was issued.

    Essentially we build up a string that reflects the various aspects of our REST call (headers, date, resource) and then create a hash of it using our Amazon Secret Key.  Since Amazon also knows our Secret Key, they can compute the same signature on their side and compare it to the one we provided.  If they match – we are golden.  If not, we can expect an error like the following:

    A message sent to adapter "WCF-WebHttp" on send port "SendToS3" with URI http://<bucketname>.s3-us-west-2.amazonaws.com/ is suspended.
    Error details: System.Net.WebException: The HTTP request was forbidden with client authentication scheme 'Anonymous'.
    <?xml version="1.0" encoding="UTF-8"?>
    <Error><Code>SignatureDoesNotMatch</Code><Message>The request signature we calculated does not match the signature you provided. Check your key and signing method.</Message><StringToSignBytes>50 55 54 0a 0a 61 70 70 6c 69 63 61 74 69 6f 6e 2f 78 2d 70 64 66 0a 0a 78 2d 61 6d 7a 2d 61 63 6c 3a 62 75 63 6b 65 74 2d 6f 77 6e 65 72 2d 66 75 6c 6c 2d 63 6f 6e 74 72 20 44 65 63 20 32 30 31 33 20 30 34 3a 35 37 3a 34 35 20 47 4d 54 0a 78 2d 61 6d 7a 2d 73 74 6f 72 61 67 65 2d 63 6c 61 73 73 3a 53 54 41 4e 44 41 52 44 0a 2f 74 72 61 6e 73 61 6c 74 61 70 6f 63 2f 33 31 30 35 33 31 35 30 30 31 35 30 38 30 30 2e 50 44 46</StringToSignBytes><RequestId>6A67D9A7EB007713</RequestId><HostId>BHkl1SCtSdgDUo/aCzmBpPmhSnrpghjA/L78WvpHbBX2f3xDW</HostId><SignatureProvided>SpCC3NpUkL0Z0hE9EI=</SignatureProvided><StringToSign>PUT

    application/x-pdf

    x-amz-acl:bucket-owner-full-control
    x-amz-date:Thu, 05 Dec 2013 04:57:45 GMT
    x-amz-storage-class:STANDARD
    /<bucketname>/310531500150800.PDF</StringToSign><AWSAccessKeyId><your_key></AWSAccessKeyId></Error>

    Tip: Pay attention to these error messages as they really give you a hint as to what you need to include in your “canonicalString”.  I discounted these error messages early on and didn’t take the time to really understand what Amazon was looking for.

    For completeness I will include the other three helper methods that are being used in the Expression Shape.  In my actual solution these values come from a configuration store, but for the simplicity of this blog post I have hard-coded them.

    public static string SetAmzACL()
    {
        return "bucket-owner-full-control";
    }

    public static string SetStorageClass()
    {
        return "STANDARD";
    }

    public static string SetHeaderDate()
    {
        // Use GMT time and ensure that it is within 15 minutes of the time on Amazon’s servers
        return DateTime.UtcNow.ToString("ddd, dd MMM yyyy HH:mm:ss ") + "GMT";
    }
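    One thing to watch with SetHeaderDate: the “ddd, dd MMM yyyy” custom format uses the current thread culture, so on a server with a non-English locale the day and month names will not be the English ones Amazon expects. A safer variant (my own tweak, not part of the original solution) is to use the culture-invariant RFC 1123 format specifier:

    // Returns e.g. "Tue, 10 Dec 2013 23:25:43 GMT" regardless of the server's regional settings
    public static string SetHeaderDate()
    {
        return DateTime.UtcNow.ToString("R", System.Globalization.CultureInfo.InvariantCulture);
    }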

  3. The next part of the Message Assignment shape is setting the standard context properties for the WebHttp Adapter.  Remember, since we are using a Dynamic Send Port we will not be able to manipulate these values through the BizTalk Admin Console.

    msgS3Request(WCF.BindingType)="WCF-WebHttp";
    msgS3Request(WCF.SecurityMode)="None";
    msgS3Request(WCF.HttpMethodAndUrl) = "PUT";  //Writing to Amazon S3 requires a PUT
    msgS3Request(WCF.OpenTimeout)= "00:10:00";
    msgS3Request(WCF.CloseTimeout)= "00:10:00";
    msgS3Request(WCF.SendTimeout)= "00:10:00";
    msgS3Request(WCF.MaxReceivedMessageSize)= 2147483647;

    Lastly we need to set the URI that we want to send our message to and also specify that we want to use the WCF-WebHttp adapter.

    Port_SendToS3(Microsoft.XLANGs.BaseTypes.Address)="http://<bucketname>.s3-us-west-2.amazonaws.com/310531500150800.PDF";
    Port_SendToS3(Microsoft.XLANGs.BaseTypes.TransportType)="WCF-WebHttp";

    Note: the last part of my URI, 310531500150800.PDF, represents my Resource.  In this case I have hard-coded a file name.  This is obviously something that you will want to make dynamic, perhaps using the FILE.ReceivedFileName context property, as sketched below.
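    As a rough illustration only (msgReceived and fileName are assumed names from my orchestration), deriving the resource name from the received file could look like this:

    // Strip the folder path off the received file name and use it as the S3 resource
    fileName = System.IO.Path.GetFileName(msgReceived(FILE.ReceivedFileName));
    Port_SendToS3(Microsoft.XLANGs.BaseTypes.Address) =
        "http://<bucketname>.s3-us-west-2.amazonaws.com/" + fileName;

    Keep in mind that whatever resource name you use here must also appear at the end of the “canonicalString” that gets signed in SetHttpAuth, otherwise you will receive the SignatureDoesNotMatch error shown earlier.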

  4. Once we have assembled our S3 message we will go ahead and send it through our Dynamic Solicit-Response Port.  The message that we send to Amazon and receive back is once again of type System.Xml.XmlDocument.
  5. One thing to note is that the response you receive back from Amazon won’t actually have a message body (this is in line with REST).  However, even though we receive an empty message body, we will still find some valuable Context Properties.  The two properties of interest are:

    InboundHttpStatusCode

    InboundHttpStatusDescription
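    If you want the orchestration to react to the outcome, these properties can be read in an Expression or Decide shape along the following lines (message and variable names are assumed; a successful S3 PUT comes back as an empty 200 OK):

    // statusCode and statusDescription are System.String variables; Convert.ToString
    // avoids worrying about how the adapter types the promoted properties
    statusCode = System.Convert.ToString(msgS3Response(WCF.InboundHttpStatusCode));
    statusDescription = System.Convert.ToString(msgS3Response(WCF.InboundHttpStatusDescription));

    A Decide shape branch of statusCode == "200" can then route the happy path.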


  6. The last step in the process is simply to write our Amazon response to disk.  As we learned in the previous point, the message body will be empty, but it does give me an indicator that the process is working (in a Proof of Concept environment).

Overall the Orchestration is very simple.  The complexity really exists in the Message Assignment shape. 


Testing

Not that watching files move is super exciting, but I have created a quick Vine video that will demonstrate the message being consumed by the FILE Adapter and then sent off to Amazon S3.

 https://vine.co/v/hQ2WpxgLXhJ

Conclusion

This was a pretty fun and frustrating solution to put together.  The area that caused me the most grief was easily the Authorization Header.  There is some documentation out there related to Amazon “PUT”s but each call is different depending upon what type of data you are sending and the related headers.  For each header that you add, you really need to include the related value in your “canonicalString”.  You also need to include the complete path to your resource (/bucketname/resource) in this string even though the convention is a little different in the URI.

It is also worth mentioning that /n Software has created a third party S3 Adapter that abstracts some of the complexity in this solution.  While I have not used this particular /n Software Adapter, I have used others and have been happy with the experience. Michael Stephenson has blogged about his experiences with this adapter here.

Sunday, December 1, 2013

BizTalk Summit 2013 Wrap-up

On November 21st and 22nd I had the opportunity to spend a couple days at the 2nd annual BizTalk Summit held by Microsoft in Seattle.  At this summit there were approximately 300 Product Group members, MVPs, Partners and Customers.  It was great to see a lot of familiar faces from the BizTalk community and talk shop with people who live and breathe integration.

Windows Azure BizTalk Services reaches GA

The Summit started off with a bang when Scott Gu announced that Windows Azure BizTalk Services has reached General Availability (GA)!!!   What this means is that you can receive production-level support from Microsoft with a 99.9% uptime SLA.

 


During the preview period, Microsoft was offering a 50% discount on Windows Azure BizTalk Services (WABS).  This preview pricing ends at the end of the year.  So if you have any Proof of Concept (POC) apps running in the cloud that you aren’t actively using, please be aware of any potential billing implications.

Release Cadence

The next exciting piece of news coming from Microsoft is the release cadence update for the BizTalk Server product line.  As you have likely noticed, there is usually a BizTalk release shortly after the General Availability of platform updates.  So when a new version of Windows Server, SQL Server or Visual Studio is launched, a BizTalk Server release usually closely follows.  Something that is changing within the software industry is the accelerated release cadence adopted by Microsoft and its competitors.  A recent example is Windows 8.1, Windows Server 2012 R2 and Visual Studio 2013, which were released much sooner than they have been in the past.  As a result of these new accelerated timelines the BizTalk Product Group has stepped up, committing to a BizTalk release every year!  These releases will alternate between R2 releases and major releases.  For 2014 we can expect a BizTalk 2013 R2 and in 2015 we can expect a full release.

BizTalk Server 2013 R2

So what can we expect in the upcoming release?

  • Platform alignment (Windows, SQL Server, Visual Studio) and industry specification updates (SWIFT).
  • Adapter enhancements including support for JSON (Yay!), proxy support for SFTP and authorization enhancements for Windows Azure Service Bus.  One request I do have for the product team: please include support for Windows Server Service Bus as well.
  • Healthcare Accelerator improvements.  Interestingly, this is the fastest-growing vertical for BizTalk Server, which justifies the additional investment.


Hybrid Cloud Burst

There were a lot of good sessions but one that I found extremely interesting was the session put on by Manufacturing, Supply Chain, and Information Services (MSCIS).  This group builds solutions for the Manufacturing and Supply Chain business units within Microsoft. You may have heard of a “little” franchise in Microsoft called XBOX.  The XBOX franchise heavily relies upon Manufacturing and Supply chain processes and therefore MSCIS needs to provide solutions that address the business needs of these units.  As you are probably aware, Microsoft has recently launched XBOX One which is sold out pretty much everywhere.  As you can imagine building solutions to address the demands of a product such as XBOX would be pretty challenging.  Probably the biggest hurdle would be building a solution that supports the scale needed to satisfy the messaging requirements that many large Retailers, Manufacturers and online customers introduce.

In a traditional IT setting you throw more servers at the problem.  The issue with this is that it is horribly inefficient.  You essentially are building for the worst case (or most profitable) but when things slow down you have spent a lot of money and you have poor utilization of your resources.  This leads to a high total cost of ownership (TCO). 

Another challenge in this solution is that an ERP is involved in the overall solution.  In this case it is SAP (but this would apply to any ERP), and you cannot expect an ERP to provide the performance to support ‘cloud scale’, at least not in a cost-competitive way. If you have built the system in an asynchronous manner you can throttle your messaging and therefore not overwhelm your ERP system.

MSCIS has addressed both of these major concerns by building out a Hybrid solution. By leveraging Windows Azure BizTalk Services and Windows Azure Service Bus Queues/Topics in the cloud they can address the elasticity requirements that a high demand product like XBOX One creates. As demand increases, additional BizTalk Services Units can be deployed so that Manufacturers, Retailers and Customers are receiving proper messaging acknowledgements.  Then On-Premise you can keep your traditional capacity for tools and applications like BizTalk Server 2013 and SAP without introducing significant infrastructure that will not be fully utilized all the time.

Our good friend Mandi Ohlinger, a technical writer with the BizTalk team, worked with MSCIS to document the solution.  You can read more about the solution on the BizTalk Dev Center.  I have included a pic of the high-level architecture below.

(image: high-level architecture of the hybrid solution)

While Microsoft is a large software company (ok, a Devices and Services company), what we often lose sight of is that Microsoft is also a very large enterprise (>100,000 employees) and they have enterprise problems just like any other company does.  It was great to see how Microsoft uses their own software to address real-world needs.  Sharing these types of experiences is something that I would really like to see more of.

Symmetry

(These are my own thoughts and do not necessarily reflect Microsoft’s exact roadmap)

If you have evaluated Windows Azure BizTalk Services you have likely realized that there is not currently symmetry between BizTalk Services and BizTalk Server.  BizTalk Server has had around 14 years (or more) of investment, whereas BizTalk Services, in comparison, is relatively new.  Within Services we are still without core EAI capabilities like Business Process Management (BPM)/Orchestration/Workflow, Business Activity Monitoring (BAM), a Business Rules Engine (BRE), a comprehensive set of adapters and a complete management solution.

With BizTalk Server we have a mature, stable, robust integration platform.  The current problem with this is that it was built well before people started thinking about cloud scale.  Characteristics such as MSDTC and even the MessageBox have contributed to BizTalk being what it is today (a good platform), but they do not necessarily lend themselves to new cloud-based platforms.  If you look under the hood in BizTalk Services you will find neither of these technologies in place.  I don’t necessarily see this as a bad thing.

A goal of most, if not all, products that Microsoft is putting in the cloud is symmetry between on-premise and cloud-based offerings.  This puts the BizTalk team in a tough position.  Do they try to take a traditional architecture like BizTalk Server and push it into the cloud, or build an architecture on technologies that better lend themselves to the cloud and then bring it back on-premise? The approach, going forward, is innovating in the cloud and then bringing those investments back on-premise in the future.

Every business has a budget and priorities have to be set.  I think Microsoft is doing the right thing by investing in the future instead of making a lot of investments in the On-Premise offering that we know will be replaced by the next evolution of BizTalk.  There were many discussions between the MVPs during this week in Seattle on this subject with mixed support across both approaches. With the explosion of Cloud and SaaS applications we need an integration platform that promotes greater agility, reduces complexity and addresses scale in a very efficient manner instead of fixing some of the deficiencies that exist in the current Server platform. I do think the strategy is sound, however it will not be trivial to execute and will likely take a few years as well.

Adapter Eco-system

Based upon some of the sessions at the BizTalk Summit, it looks like Microsoft will be looking to create a larger ISV eco-system around BizTalk Services, more specifically in the Adapter space.  The reality is that the current adapter footprint in BizTalk Services is lacking compared to some other competing offerings.  One way to address this gap is to leverage trusted 3rd parties to build adapters and make them available through some sort of marketplace. I think this is a great idea provided there is some rigor applied to the process of submitting adapters.  I would not be entirely comfortable running mission-critical processes on an adapter that someone built as a hobby.  However, I would not have an issue purchasing an adapter in this fashion from established BizTalk ISV partners like BizTalk360 or /n Software.

Conclusion

All in all it was a good summit.  It was encouraging to see the BizTalk team take BizTalk Services across the goal line and make it GA.  It was also great to see that they have identified the need for an accelerated release cadence and shared some news about the upcoming R2 release.  Lastly, it was great to connect with so many familiar faces within the BizTalk community.  It is not a huge community, but it is definitely international, so it was nice to chat in person with people you are used to interacting with over Twitter, blogs and LinkedIn.

In the event you still have doubts about the future of BizTalk, rest assured the platform is alive and well!