Tim Jeanes: TechEd 2008 - Day 3

PDC201 - A Lap Around Cloud Services

Wow! I'd heard a little about cloud computing and how it was the way forward, but I didn't expect it to be as different as this!

Windows Azure is the operating system that handles it, but it's not an OS you can buy and install. It's just what runs in the Microsoft cloud. You buy computing time and storage space (somehow - they haven't decided how to charge for it just yet) and work within that. To you it looks like you're running on 64-bit Windows Server 2008, fully patched, with .NET 3.5 SP1 on IIS7. You don't have many privileges there, though: you certainly can't install anything.

The biggest change is how storage works. The file system's not at all what you're used to. Instead of files you have blobs, tables and queues.

Blobs are the most like files, and can be kept in containers, which are pretty much like folders. If you purchase only storage and no computing power, this is all you need to store all your files in the cloud.

Tables are nothing like SQL tables. They contain entities, which are collections of properties, and you can mix types of entities within a table. Each entity needs a partition key and a row key. Together these give you instant access to an entity; every other query results in a full table scan. Partitioning splits your data into more manageable chunks, so if your query specifies the partition, only that partition is scanned. If you don't know which partition your data is in, you can run one asynchronous query per partition in parallel to speed the process up. This seems really weird to me - that there's no indexing at all - but we're assured that it's all designed to scale to billions of rows.
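To get my head round the partitioning, here's a toy in-memory model in Python. It's my own illustration, not the Azure SDK: the class and method names are made up, and real table storage spreads this across many machines. A lookup with both keys goes straight to the entity. A query with a known partition only scans that partition. A query without one fans out across every partition in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy in-memory model of a partitioned table (my illustration, not the Azure SDK).
class PartitionedTable:
    def __init__(self):
        self.partitions = {}  # partition_key -> {row_key: entity}

    def insert(self, partition_key, row_key, entity):
        self.partitions.setdefault(partition_key, {})[row_key] = entity

    def get(self, partition_key, row_key):
        # Both keys known: a direct lookup, no scanning at all.
        return self.partitions.get(partition_key, {}).get(row_key)

    def scan_partition(self, partition_key, predicate):
        # Partition known: only that partition's entities are examined.
        rows = self.partitions.get(partition_key, {})
        return [e for e in rows.values() if predicate(e)]

    def scan_all(self, predicate):
        # Partition unknown: scan every partition concurrently and combine the results.
        with ThreadPoolExecutor() as pool:
            chunks = pool.map(lambda pk: self.scan_partition(pk, predicate),
                              list(self.partitions))
            return [e for chunk in chunks for e in chunk]

table = PartitionedTable()
table.insert("UK", "c1", {"name": "John Smith", "city": "Norwich"})
table.insert("UK", "c2", {"name": "Jane Doe", "city": "London"})
table.insert("US", "c3", {"name": "John Smith", "phone": "555-0100"})  # different properties
print(table.get("UK", "c1"))
print(table.scan_all(lambda e: e["name"] == "John Smith"))
```

The entities here are plain dicts, so nothing stops two entities in the same table from having different properties. That is the "mix types of entities" point in miniature.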

Queues let you specify work items. You can buy large numbers of websites and worker processes, but as they can't talk to each other directly, they can only communicate via your storage. Queues are the way of handling this.

There's an SDK available at www.azure.com that lets you run a local mock of the cloud and develop against it. Once you're happy with your service, deployment is a doddle. You build your app, upload the compiled file, upload the Service Model file (that tells the cloud how many instances to run, etc.) and you're done.

Initially your service is deployed to a staging area for checking. Getting it uploaded and running there takes a while, but swapping it with your live application is instant, so you have zero downtime, even during upgrades.
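The swap can be instant because nothing gets redeployed at that point. The live address just switches from one already-running deployment to the other. Here is a toy Python model of that idea. It's my simplification, not how Azure actually implements it.

```python
# Toy model of a staging/production swap (a simplification, not Azure's implementation).
class HostedService:
    def __init__(self, live_build):
        self.production = live_build
        self.staging = None

    def deploy_to_staging(self, build):
        # The slow part in real life: uploading and starting the new build, off to one side.
        self.staging = build

    def swap(self):
        # The instant part: exchange which deployment the live address points at.
        if self.staging is None:
            raise RuntimeError("nothing in staging to swap in")
        self.production, self.staging = self.staging, self.production

    def handle_request(self):
        return f"served by {self.production}"

service = HostedService("v1")
service.deploy_to_staging("v2")
print(service.handle_request())  # served by v1
service.swap()
print(service.handle_request())  # served by v2
```

In this model the old build ends up sitting in staging after the swap, so swapping again would roll back.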


DAT308 - Answering The Queries Your Users Really Want To Ask: Full-Text Search in SQL Server 2008

We were debating in the office the other day. When a user searches the customer table from one of our search pages and types John Smith into the name field, do they mean Name='John Smith', Name LIKE '%John Smith%', Name LIKE '%John%' AND Name LIKE '%Smith%', or Name LIKE '%John%' OR Name LIKE '%Smith%'? Well, Full-Text searching opens up a whole load of new possibilities.
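To make the four readings concrete, here they are as Python predicates run over a handful of made-up names. Plain substring tests stand in for the LIKE patterns. Unlike a typical SQL Server collation, these tests are case-sensitive.

```python
def exact(name, q):      # Name = 'John Smith'
    return name == q

def phrase(name, q):     # Name LIKE '%John Smith%'
    return q in name

def all_words(name, q):  # Name LIKE '%John%' AND Name LIKE '%Smith%'
    return all(w in name for w in q.split())

def any_word(name, q):   # Name LIKE '%John%' OR Name LIKE '%Smith%'
    return any(w in name for w in q.split())

names = ["John Smith", "Mr John Smith Jr", "Smith, John",
         "Johnny Smithers", "John Brown", "Ann Smith"]
for rule in (exact, phrase, all_words, any_word):
    print(rule.__name__, [n for n in names if rule(n, "John Smith")])
```

The four readings return 1, 2, 4 and 6 of the six names respectively. The looser two happily match "Johnny Smithers", because LIKE knows nothing about word boundaries.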

Full-Text searching understands word boundaries. It understands different inflections of a word (you can search for 'file' and get back 'filing'), and it even uses a thesaurus (you can search for 'customer' and get back 'client'). It can be configured for different languages. It can open documents (Word, PDF, etc.) that you hold in binary fields in your database, pull the text out of them and search that too. If you hold documents of your own custom type, containing text in Klingon, you can write your own add-ons to cope with those as well.
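Roughly speaking, what this buys you over LIKE is word-level matching plus query expansion. Here is a toy Python sketch of the idea. The tiny INFLECTIONS and THESAURUS dictionaries are hypothetical stand-ins for SQL Server's real language-specific word breakers, stemmers and thesaurus files.

```python
import re

# Hypothetical, hand-written stand-ins for real linguistic resources.
INFLECTIONS = {"file": {"file", "files", "filed", "filing"}}
THESAURUS = {"customer": {"client"}}

def expand(term):
    # The term's inflected forms, plus each thesaurus synonym and its inflected forms.
    forms = set(INFLECTIONS.get(term, {term}))
    for synonym in THESAURUS.get(term, set()):
        forms |= INFLECTIONS.get(synonym, {synonym})
    return forms

def words(text):
    # Word breaking: split on anything that isn't a letter, ignoring case.
    return set(re.findall(r"[a-z]+", text.lower()))

def matches(text, term):
    return bool(words(text) & expand(term))

print(matches("Filing deadline is Friday", "file"))  # True
print(matches("Our biggest client", "customer"))     # True
print(matches("Profile updated", "file"))            # False: 'file' is only part of a word
```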

What's also nice is that the indexing all happens in the background, so your searches run quickly when you come to query the data.

We saw a lot of this at a previous TechEd, but immediately wrote it off as a bad job when we found out how broken it was in SQL Server 2005. The searching was done by a separate search service rather than by the SQL Server engine itself. So if you searched on a text field and an ID field simultaneously, the search service would scan the text of every single record while SQL Server pulled out the one record with that ID; the two results were then compared and the intersection kept. Also, backing up the database didn't back up the Full Text Index, so restoring the database later would leave you with an index that was worse than useless.

Fortunately, this has all been fixed in SQL Server 2008: it does all its own searching, and the indexes are included in the backups. It all just works.

You also get a rank for each row a full-text query returns. So if you search for 'John NEAR Smith', you'll get every record containing both words, but the further apart they are, the lower the rank.
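Here is a rough Python sketch of proximity ranking in this spirit. The 1/distance score is my own assumption for illustration. SQL Server calculates its ranks its own way, so don't read anything into the actual numbers.

```python
import re

def near_rank(text, a, b):
    # Scoring rule (1 / word distance) is an illustrative assumption, not SQL Server's.
    tokens = re.findall(r"[a-z]+", text.lower())
    pos_a = [i for i, t in enumerate(tokens) if t == a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == b.lower()]
    if not pos_a or not pos_b:
        return 0.0  # both words must be present for the row to match at all
    distance = min(abs(i - j) for i in pos_a for j in pos_b)
    return 1.0 / max(distance, 1)  # closer together gives a higher rank

rows = ["John Brown", "Smith wrote to John about the invoice", "John A. Smith", "John Smith"]
for row in sorted(rows, key=lambda r: near_rank(r, "John", "Smith"), reverse=True):
    print(f"{near_rank(row, 'John', 'Smith'):.2f}  {row}")
```

A zero score here stands for "not returned at all", since a real NEAR query only returns rows containing both words.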

Still, I feel we're a step away from the ideal. There are so many options for searching text data. That's fine if you're writing a bespoke SQL statement and understand what the user's actually looking for. Unfortunately, what users want is a text box on a web page that acts like Google.

We could run a few different full-text search algorithms and merge the results. Unfortunately, ranks from different queries aren't comparable, so merging them is meaningless. I guess we'll just have to pick one search algorithm and run with that. The jury's still out.


DAT307 - ADO.NET Entity Framework In Depth

Oh dear. What a disappointing session. We've been using LINQ to SQL for almost a year now, and were keen to see how we can upgrade to LINQ to Entities and what advantages it would give us.

Unfortunately, we only skimmed over the new features: the examples we saw may as well have been written in LINQ to SQL. In passing, they mentioned that the Entity Framework supports one entity spanning more than one database table, and that the entity-to-table mapping allows polymorphism. They gave no more detail than that, and certainly no demonstration. I knew this already: I wanted to know what it looks like and how to use it.

One new feature we saw in passing was this:
    var products = from p in myContext.Products.Include("Category")
                   select p;
... ensuring that the Category object for each Product is built when the query is executed. Now that's a whole lot less restrictive than the LoadOptions we used in LINQ to SQL!
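The benefit is mostly about round trips. Here's a Python sketch with a fake database that counts queries; FakeDb and the two loader functions are hypothetical. Fetching each product's category separately afterwards costs one query for the products plus one per product. Loading them together in a single joined query, which is what Include is for, costs one.

```python
# Counting round trips with a fake database (all names here are hypothetical).
class FakeDb:
    def __init__(self, products, categories):
        self.products = products
        self.categories = categories
        self.queries = 0

    def all_products(self):
        self.queries += 1
        return list(self.products)

    def category(self, category_id):
        self.queries += 1
        return self.categories[category_id]

    def products_with_categories(self):
        self.queries += 1  # one joined query brings back both
        return [(p, self.categories[p["category_id"]]) for p in self.products]

def load_separately(db):
    # One query for the products, then one more per product for its category.
    return [(p, db.category(p["category_id"])) for p in db.all_products()]

def load_together(db):
    # Everything in a single query, the effect Include("Category") is after.
    return db.products_with_categories()

products = [{"name": "Tea", "category_id": 1},
            {"name": "Coffee", "category_id": 1},
            {"name": "Mug", "category_id": 2}]
categories = {1: "Drinks", 2: "Crockery"}

for loader in (load_separately, load_together):
    db = FakeDb(products, categories)
    loader(db)
    print(loader.__name__, db.queries)  # 4, then 1
```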

We saw a bit more of using LINQ against ADO.NET Data Services than we saw on Monday. Given a service that returns records, you can write LINQ against its result set. The clever thing is that the query gets put into the query string of the request, so the server only returns the records you want.
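Roughly, the LINQ query ends up as options in the request URL, such as $filter, $orderby and $top, so the filtering happens on the server. Here is a Python sketch that builds such a URL by hand. The service address and the Products/UnitPrice/ProductName names are hypothetical. In practice the client library generates the query string for you from the LINQ expression.

```python
from urllib.parse import quote, urlencode

def build_request_url(base, entity_set, filter_expr=None, order_by=None, top=None):
    # Only options that were actually asked for end up in the query string.
    options = {}
    if filter_expr:
        options["$filter"] = filter_expr
    if order_by:
        options["$orderby"] = order_by
    if top is not None:
        options["$top"] = str(top)
    # Keep the '$' prefixes readable and encode spaces as %20.
    query = urlencode(options, safe="$", quote_via=quote)
    return f"{base}/{entity_set}" + (f"?{query}" if query else "")

url = build_request_url("http://example.com/Northwind.svc", "Products",
                        filter_expr="UnitPrice gt 20", order_by="ProductName", top=10)
print(url)
```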

Still, if I'd spent the time on Google I think I would have learnt more, but bafflingly there's no wireless network in the seminar rooms this year. Sigh.


PDC209 - .NET Services Drilldown - Part 2 (Access Control and Workflow)

OK, to be honest this pretty much went over my head. I was tempted to cut and paste from Wikipedia and see if anyone noticed, but there wasn't enough material on there to make it convincing.

The key points, though, were that Microsoft are moving away from role-based authentication and towards claims-based authentication. Their offering to help with this is currently called Geneva, which has a number of components to handle the various parts of the authentication process.

When securing a cloud application, the user calls the app, which requests claims from a trusted claims provider. The claims can give details about the user's age, location, employment status, etc., which the cloud app can then map to its own application-specific claims. Those claims can then be checked at the appropriate points in the application to ensure that the user has the rights to perform various functions. Microsoft are working to the open standard for claims-based authentication, so the claims can come from anywhere, provided the application trusts that claims source.
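Here is a minimal Python sketch of that flow, with a made-up claim issuer, claim types and mapping rules. Only claims from a trusted issuer are accepted, and they're mapped to application-specific claims. Each protected operation then checks for the one it needs. This is just the shape of the idea, not Geneva's API.

```python
TRUSTED_ISSUERS = {"https://idp.example.com"}  # hypothetical trusted claims provider

def map_claims(issuer, claims):
    # Only accept claims from a source the application trusts.
    if issuer not in TRUSTED_ISSUERS:
        raise PermissionError(f"untrusted claims issuer: {issuer}")
    # Map the provider's claims about the user to application-specific claims.
    app_claims = set()
    if claims.get("age", 0) >= 18:
        app_claims.add("can_place_orders")
    if claims.get("employer") == "Contoso":
        app_claims.add("can_view_staff_pages")
    return app_claims

def require(app_claims, needed):
    # Called at each point in the app that needs a particular right.
    if needed not in app_claims:
        raise PermissionError(f"missing claim: {needed}")

user = map_claims("https://idp.example.com",
                  {"age": 34, "location": "UK", "employer": "Contoso"})
require(user, "can_place_orders")  # passes silently
print(sorted(user))
```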


PDC210 - Developing and Deploying Your First Cloud Service

This session went into a little more detail about writing cloud services than this morning's.

Configuring an application is a little different to the web.config file we're used to. Instead, there's a ServiceDefinition file that tells Azure what your service looks like and which settings it requires. Alongside it, a ServiceConfiguration file gives the values of those settings for your particular installation.
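The split is a bit like declaring parameters versus passing arguments. Here is a Python sketch with plain dicts standing in for the two files; the real ones are XML, and I'm not reproducing their schema. The role and setting names are hypothetical. The definition lists the settings, the configuration supplies values and an instance count, and a missing value is an error.

```python
# Settings declared by the definition vs. values supplied by a configuration.
SERVICE_DEFINITION = {
    "WebRole": ["StorageAccountName", "SupportEmail"],
}

SERVICE_CONFIGURATION = {
    "WebRole": {
        "instances": 2,
        "settings": {"StorageAccountName": "myaccount",
                     "SupportEmail": "support@example.com"},
    },
}

def resolve_settings(definition, configuration):
    resolved = {}
    for role, declared in definition.items():
        role_config = configuration[role]
        # Every setting the definition declares must have a value in the configuration.
        missing = [name for name in declared if name not in role_config["settings"]]
        if missing:
            raise ValueError(f"{role}: no value supplied for {', '.join(missing)}")
        resolved[role] = {
            "instances": role_config["instances"],
            "settings": {name: role_config["settings"][name] for name in declared},
        }
    return resolved

print(resolve_settings(SERVICE_DEFINITION, SERVICE_CONFIGURATION))
```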

In this session we also saw (briefly) how blob storage can be secured so that you need to provide credentials for read or write access. We also saw how you can keep certain blobs public: public blobs can be accessed directly by URL, and are typically what you'd use for the pictures on your website.
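The access rule boils down to something like this Python sketch. The container names are hypothetical, and deciding public access per container is my simplification. A real request has to authenticate properly rather than pass a True flag, so treat this purely as an illustration of the public/private split.

```python
PUBLIC_CONTAINERS = {"images"}  # hypothetical container holding website pictures

def can_access(container, operation, has_credentials):
    if operation == "write":
        return has_credentials  # writing always needs credentials
    if operation == "read":
        # Reads are open for public containers; credentials are needed otherwise.
        return container in PUBLIC_CONTAINERS or has_credentials
    raise ValueError(f"unknown operation: {operation}")

print(can_access("images", "read", has_credentials=False))    # True: e.g. a logo on your site
print(can_access("invoices", "read", has_credentials=False))  # False
```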

We saw how to code against table storage. You use ADO.NET Data Services to talk to table storage, which means you can query it with LINQ. You can make a DataContext for accessing your tables by deriving it from TableStorageDataServiceContext. Its constructor takes a StorageAccountInfo, which is all it needs to start accessing the tables. Your classes that describe the entities in the tables inherit from TableStorageEntity, forcing you to implement the necessary row key and partition key.
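Here is a Python analogue of the "you must supply the keys" pattern, using abstract properties. It mirrors the idea of TableStorageEntity rather than its actual API, and the Customer class and its choice of keys are hypothetical.

```python
from abc import ABC, abstractmethod

class TableEntity(ABC):
    # Subclasses cannot be instantiated until both keys are implemented.
    @property
    @abstractmethod
    def partition_key(self): ...

    @property
    @abstractmethod
    def row_key(self): ...

class Customer(TableEntity):
    def __init__(self, country, customer_id, name):
        self.country = country
        self.customer_id = customer_id
        self.name = name

    @property
    def partition_key(self):
        return self.country  # e.g. one partition per country

    @property
    def row_key(self):
        return str(self.customer_id)  # unique within its partition

c = Customer("UK", 42, "John Smith")
print(c.partition_key, c.row_key)
```

Forgetting either key makes the class impossible to instantiate (Python raises TypeError). That is roughly the guarantee the base class gives you.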

Though table storage is what's really used to store data in the cloud, I really hope it isn't the closest thing you get to proper SQL-like table access. I found myself wondering about the most efficient way to store our business entities in this framework, how to construct further tables to index them and make them searchable in a reasonable timeframe. However I pretty soon realised that someone else will do a much better job of it than I can, and, just like SQL Server does that for me in conventional data storage, SQL Services will do it for me in the cloud. Accessing it from our application may be a little different, but actually there's a chance it may be exactly the same.

Finally we saw a (somewhat rushed) example of writing Worker Roles. These are the only bits of executable code you get to put in the cloud other than Web Roles (i.e. the processes that run your web site or web services). These can't make any connections of their own to the outside world, and can only read data from your data storage area.

Each Worker you make has to run indefinitely: basically you deliberately trap it in an infinite loop, with suitable Thread.Sleep calls to ensure it doesn't gobble up CPU time (which you'll most likely have to pay for).

The worker picks up jobs left for it in the queue (most likely by your Web Role) and does whatever you specify with them. I guess to me this is most like the scheduled clean-up jobs we have running on our web servers, combined with the long-running jobs our web apps sometimes spin up additional threads to execute.
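Here is the worker's loop, sketched in Python: poll the queue, process whatever's there, and sleep when it's empty. queue.Queue stands in for cloud queue storage. The stop event only exists so the example can finish; a real worker would loop for ever.

```python
import queue
import threading
import time

def worker_loop(jobs, handle, stop, idle_sleep=1.0):
    # Runs until stop is set; a real worker role would simply loop for ever.
    while not stop.is_set():
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            time.sleep(idle_sleep)  # nothing to do: sleep rather than spin the CPU
            continue
        handle(job)

# Demo: the web role's side is just putting messages on the queue.
jobs = queue.Queue()
for message in ("resize photo 17", "send welcome email 42"):
    jobs.put(message)

done = []
stop = threading.Event()

def handle(job):
    done.append(job)
    if jobs.empty():
        stop.set()  # demo only: stop once the queue has been drained

worker_loop(jobs, handle, stop, idle_sleep=0.01)
print(done)
```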

All in all, cloud computing is the future whether we like it or not. Windows Azure looks to make it pretty manageable and shockingly reliable and fast. It's a bit of a jump in the learning curve, but not as big as it could have been. I feel sure that here at Compsoft we're extremely well positioned to start developing applications in the cloud and I just can't wait to get started!