JENS MALMGREN I create, that is my hobby.

Porting my blog for the second time, deployment part 2

This is post #54 of my series about how I port this blog from BlogEngine.NET 2.5 ASPX on a Windows Server 2003 to a Linux Ubuntu server with Apache2, MySQL and PHP, a so-called LAMP stack. The introduction to this project can be found in this blog post: /post/Porting-my-blog-for-the-second-time-Project-can-start.

The domain name malmgren.nl points to the server I host at home, right now when I write this. When I deploy my new blog I will log into Argeweb where I got my domain registered and there I will point it to the new Virtual Private Server.

I could do that right away, and that would be a really quick site migration. The bad thing is that all links to pages on my old blog would not work properly on the new server. I would lose all the traffic to these old links and all pages indexed by Google would point to the wrong location. This is not a good thing.

Before Google there was this thing called "perma-links". A permanent link to a page. Often it was a cryptic code that would bring up the correct page. As long as a webmaster held on to the perma-link code people would find those pages forever. It would not even matter what the page was called. But it does not work like that anymore. All links are permanent links these days. If you make a page on the Internet with any value then it is supposed to be found until the end of time. Kind of. Another issue with the perma-link is that it brings the user to the same page with a totally different url. Is it a duplicate? Yes it is. Duplicate content is worthless. If you have several ways to get to the same page then probably the page is worthless. Just my humble opinion.

So how do we solve this? When a page that no longer exists is requested on the new server, we let the server reply with a redirect. There are two important flavors of redirects:

  • Permanent: Status 301 means that the page moved permanently to a new location. The browser should not attempt to request the original location but use the new location from now on. This is what I will use for my transition.
  • Temporary: Status 302 means that the page is temporarily located somewhere else, and the browser should continue requesting the original url. This is useful if you have pages that you "delete" and then later you need the page to come back. Suppose you have a special offer type of page during the summer. Perhaps in the winter it is deleted and next summer it comes back again. I don’t have that so I don’t need that.
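To make the permanent flavor concrete, here is a minimal Python sketch of a server that answers a retired path with status 301 and a Location header. It is only an illustration: the mapping, the function names and the port are made up, and the actual redirects in this post are handled by Apache, not by Python.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping from retired paths to new locations (illustration only).
MOVED_PERMANENTLY = {
    "/retiredpage.aspx": "https://www.malmgren.nl/post/newpage",
}


def lookup_redirect(path):
    """Return (status, location) for a retired path, or (404, None)."""
    target = MOVED_PERMANENTLY.get(path)
    if target is None:
        return 404, None
    return 301, target


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, location = lookup_redirect(self.path)
        self.send_response(status)
        if location is not None:
            # The Location header tells the browser where the page lives now.
            self.send_header("Location", location)
        self.end_headers()


def serve(port=8080):
    """Start the demo server; the port is an arbitrary choice for local testing."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

Calling serve() and requesting /retiredpage.aspx from a browser should show the address bar jump to the new location. Swapping 301 for 302 in lookup_redirect would turn it into the temporary flavor.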

A server can provide many more replies. If you are interested you could read more about it here: https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html

So that was the theory, now some workshop but wait a moment! In my previous post I was setting up the temporary website at www2.malmgren.nl. That was actually not necessary in my case. My old blog had this url: http://www.jens.malmgren.nl and now I decided that the new url will be https://www.malmgren.nl and these two can be live at the same time. So I could have been working on setting up the new web server behind the scenes without disturbing the live site. The first thing I did now was to correct this and get rid of www2 and make it www.malmgren.nl. With this sorted I could continue with the plan for deploying the new blog.

There is this thing with moving from http to https. Why would you need https? One reason is that apparently you get better ranking by Google with https, but this might be an anecdote. Another is that your site could load faster with HTTP/2, but that could be an anecdote as well. Anyway, I would like to try it so let’s do that first. Then I come back to the redirection of the individual pages later.

My first idea was to go to Argeweb and buy the Transport Layer Security (TLS) stuff there since I am already a happy customer of Argeweb. That was impossible. Argeweb doesn’t want me as a customer of TLS certificates. If you want one you need to buy it as part of a webhosting package. But can a webhosting package be combined with my new and shiny Virtual Private Server? No. Up until now I had only great things to say about Argeweb, so this was not anticipated. Years back I bought my domain name at Argeweb. A few weeks ago I got my VPS there. Apparently for a TLS certificate you need to come in from another angle to be served, and then you get all sorts of other extra stuff that I might or might not need. It is probably possible to spend hours and days with customer support to knit it all together, but I am not that kind of person. My spare time is too valuable; if your services don’t work my way then I move myself to another place.

And that place is StartCom. They have a different business model: you get your certificate for free, and if you lose it or mess up then you pay. I am fine with that. And by the way, TLS is the successor to SSL.

You might have already guessed that I use several web browsers. It is a really good idea to pay attention to what browser you are using when you sign up to StartCom because they will provide you with a certificate to ensure that you are you. Next time you come with another browser they are like "Who are you?".

To get this to work they carry out a couple of tests with you. They wanted me to put a file in the root of the web server. Then they wanted to send an email to the postmaster of the web server. Well first I had that disabled. Not to mention that I am in a transition so all these things I did on my old server. But that went fine. StartCom accepted me as me.

They also gave me a program ’startcomtool.exe’ to generate something. At first I had no clue what I was supposed to do with it. It was simple enough. I had to click on ’Generate CSR’ and then on ’Copy’ and then paste it into a field at StartCom and Submit and that was it.

I was rewarded with a zip archive with various files in folders. I concentrated on the Apache2 files. Then I had many many questions.

And StartCom had a PDF with answers. With these answers I was jigsawing a puzzle where the pieces fitted each other nicely.

#sudo a2enmod ssl

#service apache2 restart

At this point I got the green padlock icon, but PHP would not run and give me the page.

I found this advice at http://www.debianadmin.com/install-and-configure-apache2-with-php5-and-ssl-support-in-debian-etch.html:

By default the server will listen for incoming HTTP requests on port 80 -- and not SSL connections on port 443. So you need to enable SSL support by entering the following entry to the file /etc/apache2/ports.conf save and exit the file.

Listen 443

So I tried that. Now I could call post.php and it would render through the PHP engine, but my rewrite routine was not activated. I looked around and figured out that there are several ways of doing this. I tried this:

This worked for me!

I assume that the standard way of turning on TLS/SSL is to do it for all sites at once in the web server. It is an assumption. In my case I want some sites to have the padlock and others not. So now that I got this working I am pleased, even if it might not be the same way everybody else is doing it.

Now with https in place it is about time to think about the redirection. In the .htaccess file I need to add lines like these:

Redirect 301 /retiredpage.aspx https://www.malmgren.nl/post/newpage

So to create this list I need a complete list of the old pages and the new pages. Remember that in some cases I changed the slug of the page so I need to keep the old to be able to make the transition to the new page. So to make this possible I will need to go back to the import and make sure that during import I save the original path so that in the end I can make a redirection list. Oh well, where do I store the original path? In the post record obviously. But currently there is no field for that. So first we add one.

mysql> ALTER TABLE Post ADD COLUMN `Oldpath` VARCHAR(256) NULL;
Query OK, 468 rows affected (0.03 sec)
Records: 468  Duplicates: 0  Warnings: 0

Easy enough. Did it work?

mysql> desc Post;
+-------------------+--------------+------+-----+---------+----------------+
| Field             | Type         | Null | Key | Default | Extra          |
+-------------------+--------------+------+-----+---------+----------------+
| ID                | int(11)      | NO   | PRI | NULL    | auto_increment |
| NextID            | int(11)      | YES  | MUL | NULL    |                |
| PrevID            | int(11)      | YES  | MUL | NULL    |                |
| PositionType      | int(11)      | NO   |     | 0       |                |
| Author            | varchar(15)  | NO   |     | NULL    |                |
| Title             | varchar(256) | NO   |     | NULL    |                |
| Description       | varchar(256) | NO   |     | NULL    |                |
| Content           | longblob     | YES  |     | NULL    |                |
| IsPublished       | tinyint(1)   | NO   |     | NULL    |                |
| IsDeleted         | tinyint(1)   | NO   |     | NULL    |                |
| IsCommentsEnabled | tinyint(1)   | NO   |     | NULL    |                |
| PublishedOn       | datetime     | NO   |     | NULL    |                |
| ModifiedOn        | datetime     | NO   |     | NULL    |                |
| Slug              | varchar(80)  | NO   | UNI | NULL    |                |
| Oldpath           | varchar(256) | YES  |     | NULL    |                |
+-------------------+--------------+------+-----+---------+----------------+
15 rows in set (0.00 sec)

Yes it worked. So now back to the drawing board. Make changes to Analyze.pl

# http://www.jens.malmgren.nl/post/Porting-my-blog-for-the-second-time-deployment-part-2.aspx
my $strOldpath = "/" . $dictStringFieldToValue{"slug"} . ".aspx";

Then when saving the Post I had to add the old path:

$dbh->do(
	'INSERT INTO Post (Author,Title,Description,Slug,Content,' .
	'IsPublished,IsDeleted,IsCommentsEnabled,' .
	'PublishedOn,ModifiedOn,Oldpath) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)', undef,
	$dictStringFieldToValue{"author"},
	$dictStringFieldToValue{"title"},
	$dictStringFieldToValue{"description"},
	$dictStringFieldToValue{"slug"},
	$dictStringFieldToValue{"content"},
	$dictBoolFieldToValue{"ispublished"},
	$dictBoolFieldToValue{"isdeleted"},
	$dictBoolFieldToValue{"iscommentsenabled"},
	$dictStringFieldToValue{"pubDate"},
	$dictStringFieldToValue{"lastModified"},
	$strOldpath
	);

While I was at it I created the category 2016 and connected the Posts from this year to it. Like so:

# http://www.jens.malmgren.nl/post/Porting-my-blog-for-the-second-time-deployment-part-2.aspx
$dbh->do("INSERT INTO Category (Slug, Title, Description, TypeID) VALUES ('2016', '2016', 'Things I made 2016', 2)", undef);
my $intCategoryId = $dbh->{mysql_insertid};
$dbh->do("INSERT INTO PostCategory (PostID, CategoryID) SELECT Post.ID, $intCategoryId FROM Post WHERE PublishedOn > '2016-01-01';", undef);

When running Analyze.pl this time the Oldpath field was filled properly, the category 2016 was created and all Posts from this year were connected to it. Then it was just a matter of extracting the list of old and new. Like so:

mysql> SELECT Oldpath, Slug FROM Post INTO OUTFILE '/tmp/redirectsource.txt';
Query OK, 468 rows affected (0.00 sec)

The file created has 468 rows. Each row starts with the old path, followed by a tab character and then the new slug. To make this into redirect rules I converted the list in Notepad++ with one convenient regular expression replacement.

I pasted the result into the .htaccess file. It worked.
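The same conversion, from the exported list to redirect rules, can also be sketched in a few lines of Python. This is not the Notepad++ replacement I used, only an equivalent under a couple of assumptions: each row is "old path, tab, slug" as described above, and the new pages live under https://www.malmgren.nl/post/ as in the example rule earlier. The function name is made up.

```python
def redirect_rules(tsv_text, new_base="https://www.malmgren.nl/post/"):
    """Turn 'Oldpath<TAB>Slug' rows into Apache 'Redirect 301' lines.

    new_base is an assumption taken from the example rule in this post.
    """
    rules = []
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # skip blank rows
        old_path, slug = line.split("\t", 1)
        rules.append(f"Redirect 301 {old_path} {new_base}{slug}")
    return rules


def rules_from_file(path="/tmp/redirectsource.txt"):
    """Read the file written by SELECT ... INTO OUTFILE and convert it."""
    with open(path, encoding="utf-8") as handle:
        return redirect_rules(handle.read())
```

Printing "\n".join(rules_from_file()) gives a block that can be pasted into .htaccess. Old paths containing spaces would need quotes around them, which this sketch does not add.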

Here at this point I need to be really honest about how theory and practice got in the way of each other. Some parts of the redirections worked but not all of them and absolutely not all redirects at the same time. So what happened here?

One day I would like to decommission my old website. Because of this I had to create a new website on the new webserver replacing the old webserver. That website replies to the http://www.jens.malmgren.nl web requests. On that website it was possible to redirect to the new website with the list of old and new slugs I created above.

The configuration for http://www.jens.malmgren.nl on the VPS looks like this:

<VirtualHost *:80>
    <Directory /var/www/www.jens.malmgren.nl/public_html>
        Options Indexes FollowSymLinks
        AllowOverride All
        Order allow,deny
        allow from all
    </Directory>
    ServerAdmin jens.malmgren@xs4all.nl
    ServerName malmgren.nl
    ServerAlias www.jens.malmgren.nl
    DocumentRoot /var/www/www.jens.malmgren.nl/public_html
    DirectoryIndex index.html
    ErrorLog /var/www/www.jens.malmgren.nl/log/error.log
    CustomLog /var/www/www.jens.malmgren.nl/log/access.log combined
    RewriteEngine on

    RewriteRule ^/category/(.*).aspx$ https://www.malmgren.nl/category/$1? [NC]
    RewriteRule ^/[0-9]{4}/[0-9]{2}/default.aspx https://www.malmgren.nl? [NC]

    RewriteCond %{SERVER_PORT} !^443$
    RewriteRule ^/?(.*) https://www.malmgren.nl [NC]
</VirtualHost>

As you can see I turn on the rewrite engine in the configuration. The first rule converts an old category link into the category link on the new server. They look the same so that was easy.

The next rule takes all links like 2016/06/default.aspx and crunches them into a link to the main page of the new site. Too bad, but I don't have an elegant way of doing this.

The last two lines check whether the port is not 443 and, if so, just forward anything to the new website.
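To check my understanding of the three rules, the patterns can be modelled in a few lines of Python. This is only a rough sketch under stated assumptions: the first matching rule wins, the port condition is ignored, the dot before aspx is escaped, and the function name is made up. Apache itself evaluates rewrite rules with more machinery than this.

```python
import re

NEW_SITE = "https://www.malmgren.nl"

# (pattern, replacement) pairs tried in order; in this model the first match wins.
RULES = [
    # Old category page -> same category on the new site.
    (re.compile(r"^/category/(.*)\.aspx$", re.IGNORECASE), NEW_SITE + r"/category/\1"),
    # Monthly archive such as /2016/06/default.aspx -> front page.
    (re.compile(r"^/[0-9]{4}/[0-9]{2}/default\.aspx", re.IGNORECASE), NEW_SITE),
    # Anything else -> front page.
    (re.compile(r"^/?(.*)"), NEW_SITE),
]


def rewrite(path):
    """Return the URL this model redirects the given request path to."""
    for pattern, replacement in RULES:
        match = pattern.match(path)
        if match:
            return match.expand(replacement)
    return None
```

For example, rewrite("/2016/06/default.aspx") returns the address of the front page, and a path under /category/ keeps its category name.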

So how do you find ideas for what pages you should redirect? Well that is easy, just search for "site:www.jens.malmgren.nl" in Google to find out what is out there.

Next up was redirecting images I placed manually on the Windows server in a folder I called "Images".

For this directory I copied the directory listing from the command prompt and pasted it into Notepad++ and then I did this regular expression transformation:

Search: ^(.*?)$
Replace: Redirect 301 "/Images/\1" "https://www.malmgren.nl/media/\1"

Please notice that I used spaces in these filenames, so in this case I had to use quotes as well. These were not all the images, though. I pasted these rules into the .htaccess file of the www.jens.malmgren.nl site.
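The search and replace step can be mirrored with Python's re.sub, which makes it easy to try out on a few names first. A small sketch, assuming a listing with one filename per line; the function name is made up, and the pattern uses .+? instead of .*? so that blank lines do not produce empty rules.

```python
import re


def image_redirects(listing):
    """Wrap every filename in a quoted Redirect 301 rule, one rule per line."""
    # Normalise Windows line endings so "\r" never ends up inside a rule.
    listing = listing.replace("\r\n", "\n")
    return re.sub(
        r"^(.+?)$",
        r'Redirect 301 "/Images/\1" "https://www.malmgren.nl/media/\1"',
        listing,
        flags=re.MULTILINE,
    )
```

The quotes around both arguments are what keeps a filename with a space in it together as one argument.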

So far everything worked well. One thing failed horribly whatever I threw at it, and that was this construction:

http://www.jens.malmgren.nl/image.axd?picture=2015%2f11%2fEmiliaCostaLipsRightSize.jpg

The same image in the new blog looks like this:

https://www.malmgren.nl/media/EmiliaCostaLipsRightSize.jpg

So I decided to come back to that issue later.
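One likely reason the plain Redirect lines fail here is that the Redirect directive matches on the path of the URL, while the interesting part of this link sits in the query string. The mapping itself is simple enough, and can be sketched in Python. This is only an illustration of the mapping, not a fix for the server; the function name is made up, and it assumes that the new site keeps only the file name under /media/, as in the example above.

```python
import posixpath
from urllib.parse import parse_qs, urlsplit


def new_media_url(old_url, media_base="https://www.malmgren.nl/media/"):
    """Map an old image.axd?picture=... URL to the new media URL, or None."""
    parts = urlsplit(old_url)
    if not parts.path.lower().endswith("/image.axd"):
        return None
    pictures = parse_qs(parts.query).get("picture")
    if not pictures:
        return None
    # parse_qs decodes %2f to "/", leaving something like "2015/11/name.jpg";
    # only the file name is kept for the new location.
    filename = posixpath.basename(pictures[0])
    if not filename:
        return None
    return media_base + filename
```

Fed with the old link above, the function returns the new link shown above.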

I noticed that whenever the redirection is not supertight, directory browsing could happen on the server. That is not nice. So I disabled that by removing the option Indexes from the VirtualHost in the conf files of Apache. Done.

With this the website was live and the time was half past 12 AM on 21 June 2016. This project started on 5 September 2015. Wohoo! For me this is a big moment. It feels great.

I was born 1967 in Stockholm, Sweden. I grew up in the small village Vågdalen in north Sweden. 1989 I moved to Umeå to study Computer Science at University of Umeå. 1995 I moved to the Netherlands where I live in Almere not far from Amsterdam.

Here on this site I let you see my creations.
