Sunday, August 25, 2013

How to Setup a Tornado Server, with Web Sockets Support, on Production, Behind NGINX


Ubuntu 13.04, Digital Ocean



I am not an expert on server setup, so if you see anything I could have done better, please leave a comment and I'll try it out on my server and improve this tutorial. So far this setup works very well for me. OK, let's get started.

Step 1 - Create User

First things first: let's ssh into the server. Because this is the first time, I will do it as the root user:

$ ssh root@<your server ip>


Say yes to continue connecting, enter your password and we are in.

On Digital Ocean, the password you receive is a generated default, so the first thing we need to do is change it:

$ passwd

Next, I will create a user for myself:

$ adduser dev

Define a new password for the user; you can leave everything else blank or fill it in, as you wish.
The next step is to give my new user root privileges:

$ visudo

Find the section called 'User privilege specification'; it will look something like this:

# User privilege specification
root ALL=(ALL:ALL) ALL

Under it, add your user, granting all root privileges:

dev ALL=(ALL:ALL) ALL

Press Ctrl+X to exit and save the file.

Step 2 - Configure SSH

Although this step is optional, it is highly advisable for security reasons.
Caution! Getting the following settings wrong can lock you out of your own server!
That said, we are all grown-ups, and if you can't keep your login credentials secure and memorized (or securely backed up somewhere), then this is no business for you.

Open the configuration file:


$ nano /etc/ssh/sshd_config

Find the following settings and change them as follows:


Port 5050
Protocol 2
PermitRootLogin no


1. Port - the default is 22. You can change this to anything between 1025 and 65535. Whatever you choose, remember it (!): you will use it to log in from now on.
2. PermitRootLogin - change this from 'yes' to 'no'. This disallows root login over SSH. From now on, you will be able to log in with your own user only. I also found out that on Digital Ocean you can still log in as root through the terminal on digitalocean.com.

Next, add the following lines at the bottom of the config file:


UseDNS no
AllowUsers dev


AllowUsers permits only the listed users to log in to the server. Don't forget to substitute 'dev' with whatever you called your user.

Finally, reload ssh to make the changes take effect:


$ reload ssh


To test this, open a new terminal (keep the root session open for now) and try to log in with your user this time:


$ ssh -p 5050 dev@<your server ip>


If something went wrong, your root terminal was closed, and you can't log in with your user, you can go back to digitalocean.com, log in as root using their terminal, and go over the sshd_config file to fix whatever went wrong.

Step 3 - Setup SSH Login to Server

Creating a new user for yourself, setting new passwords for both root and your new user, preventing root login, changing the port, and restricting which users may log in is all well and good... BUT! There is always a but ;) Your password can still be cracked by a brute-force attack. This is where SSH keys come into the picture: they are nearly impossible to guess by brute force alone.

Make sure that you have a .ssh/ folder (chmod 700) under /home/dev/ (a.k.a. ~) on your server, with chmod 600 on .ssh/* (all files inside). Now let's make our server even more secure, and easier to access from your computer, by creating a pair of SSH keys.
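
As a rough sketch of those permission settings, here is a small Python script that applies them to a directory and reads them back. The function names secure_ssh_dir and mode_of are made up for this example, and it assumes you run it on the server as the user who owns the folder.

```python
import os
import stat


def secure_ssh_dir(ssh_dir):
    """Set mode 700 on the directory and 600 on every regular file in it."""
    os.chmod(ssh_dir, 0o700)
    for name in os.listdir(ssh_dir):
        path = os.path.join(ssh_dir, name)
        if os.path.isfile(path):
            os.chmod(path, 0o600)


def mode_of(path):
    """Return only the permission bits of path (for example 0o700)."""
    return stat.S_IMODE(os.stat(path).st_mode)
```

To apply it to your own folder you would call secure_ssh_dir(os.path.expanduser("~/.ssh")). The shell equivalent is simply chmod 700 ~/.ssh followed by chmod 600 ~/.ssh/*.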

On your computer


$ ssh-keygen -t rsa
Enter file in which to save the key (/demo/.ssh/id_rsa): ~/.ssh/tornado_server_rsa
Enter passphrase (empty for no passphrase):


I usually leave this empty, so hit enter.
Now let's add this key to the list of known identities:


$ ssh-add ~/.ssh/tornado_server_rsa


Verify that it went ok:


$ ssh-add -l


This should show the new key.
At this point you should have a pair of public/private keys for your server. What is left is to pass the public key to the server. There are two ways to do this.
First:


$ ssh-copy-id -i ~/.ssh/tornado_server_rsa.pub -p 5050 dev@<server ip>


Second:


$ cat ~/.ssh/tornado_server_rsa.pub | ssh -p 5050 dev@<server ip> 'cat >> ~/.ssh/authorized_keys'


No matter which one you choose, in both cases you should see some communication going on, asking whether you are sure you want to continue connecting and such. The usual ssh stuff.

After this step you should be able to log in to your server using SSH keys, without being prompted for a password. A much more secure and easier way.


Step 4 - Installing Necessary Packages


$ sudo apt-get install python-setuptools libcurl4-gnutls-dev libexpat1-dev gettext libz-dev libssl-dev build-essential git-core git nginx keychain



$ sudo easy_install pip
$ sudo pip install tornado
$ sudo pip install supervisor


Setting up keychain (optional)

This step is useful if you keep the code under your /home folder.
Keychain helps us manage our SSH keys. It drives both ssh-agent and ssh-add, and can maintain a single ssh-agent process across multiple login sessions. This means that you will have to enter your passphrase only once each time your machine is booted.

Add the following to your .bashrc:


#####################################################################################
### The --clear option makes sure an intruder cannot use your existing ssh-agent keys,
### i.e. only cron jobs are allowed to use passwordless login
#####################################################################################
/usr/bin/keychain --clear $HOME/.ssh/*_rsa
source $HOME/.keychain/$HOSTNAME-sh




If you haven't created a .bash_aliases file yet, now is the time to do so:


$ touch ~/.bash_aliases
$ nano ~/.bash_aliases


Add the following lines to the file:


alias ssh='eval $(/usr/bin/keychain --eval --agents ssh -Q --quiet ~/.ssh/*_rsa) && ssh'


This checks whether keychain is running and, if not, starts it, adding all the RSA keys under your .ssh folder.


alias git='eval $(/usr/bin/keychain --eval --agents ssh -Q --quiet ~/.ssh/*_rsa) && git'


Why aliases?
The aliases tack keychain onto every command that might need to use keys, in our case ssh and git. The first time you run one of those commands you will enter your passphrases, and afterwards keychain will do this for you.

Step 5 - Get the Code

Pulling the code from GitHub is much easier over SSH. So, let's create a pair of keys to use with git.


$ sudo ssh-keygen -t rsa


Enter a passphrase if you want, otherwise hit enter.
I usually save the keys in named files:


Enter file in which to save the key (/demo/.ssh/id_rsa): ~/.ssh/github_rsa


Why sudo ssh-keygen? Because I'm going to save the code under the /srv/ folder, which is writable only by root, so the usual 'git clone' will not work. You can create an SSH key without sudo, but then it won't work with the 'sudo git clone' that you will have to use to pull the code.

Now let's create a folder that will hold the code:


$ sudo mkdir /srv/www
$ sudo mkdir /srv/www/myapp


Go to the folder that will hold the code, in my case that is myapp/ :


$ cd /srv/www/myapp
$ sudo ssh-agent /bin/bash
root@server$ git clone git@....

Press Ctrl+D to exit.

Step 6 - Setup NGINX

Create a user for nginx:


$ sudo adduser --system --no-create-home --disabled-login --disabled-password --group nginx
$ sudo nano /etc/nginx/nginx.conf


Configure the file according to this gist.

Alternatively, the server configuration that goes under http {} can be placed in
/etc/nginx/sites-available/myapp (you will have to create it first); then run:


$ sudo ln -s /etc/nginx/sites-available/myapp /etc/nginx/sites-enabled/myapp


In both cases, delete the default server block:


$ sudo rm -r /etc/nginx/sites-enabled/default
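
For orientation, the part of the nginx configuration that matters for WebSockets can be sketched as below. This is not the content of the gist, just a minimal example under my own assumptions: the upstream name tornado_backends, the server name example.com and the helper render_nginx_conf are all made up, and the ports match the ones used for Supervisor in the next step. The script only prints a config fragment; you would still paste the result under http {} yourself. Note that WebSocket proxying needs nginx 1.3.13 or newer, so check nginx -v first.

```python
def render_nginx_conf(server_name, ports, upstream="tornado_backends"):
    """Return an nginx config fragment (meant for inside http {}) as a string."""
    servers = "\n".join("    server 127.0.0.1:%d;" % port for port in ports)
    template = """upstream %(upstream)s {
%(servers)s
}

server {
    listen 80;
    server_name %(server_name)s;

    location / {
        proxy_pass http://%(upstream)s;
        # WebSockets need HTTP/1.1 plus the Upgrade and Connection headers.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
"""
    return template % {
        "upstream": upstream,
        "servers": servers,
        "server_name": server_name,
    }


# Hypothetical domain; the four ports are the Tornado processes.
print(render_nginx_conf("example.com", [8001, 8002, 8003, 8004]))
```

The three proxy lines in the middle are what lets the WebSocket handshake pass through nginx; everything else is an ordinary reverse proxy over several local backends.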


Step 7 - Configure Supervisor

Supervisor will help you keep your tornado server running.


$ sudo mkdir -p /etc/supervisor/conf.d
$ sudo nano /etc/supervisor/conf.d/tornado.conf


Inside this file add the following lines:


[program:tornado-8001]
command = python /srv/www/myapp/app.py --port=8001
stderr_logfile = /var/log/supervisor/tornado-stderr.log
stdout_logfile = /var/log/supervisor/tornado-stdout.log 
autostart = true
autorestart = true


[program:tornado-8002]
command = python /srv/www/myapp/app.py --port=8002
stderr_logfile = /var/log/supervisor/tornado-stderr.log
stdout_logfile = /var/log/supervisor/tornado-stdout.log 
autostart = true
autorestart = true

[program:tornado-8003]
command = python /srv/www/myapp/app.py --port=8003
stderr_logfile = /var/log/supervisor/tornado-stderr.log
stdout_logfile = /var/log/supervisor/tornado-stdout.log 
autostart = true
autorestart = true

[program:tornado-8004]
command = python /srv/www/myapp/app.py --port=8004
stderr_logfile = /var/log/supervisor/tornado-stderr.log
stdout_logfile = /var/log/supervisor/tornado-stdout.log 
autostart = true
autorestart = true


Press Ctrl+X to save and exit the file.
Make sure that all log file directories exist; create them if need be.
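
The four program sections above differ only in the port number, so they can also be generated. Here is a small sketch of that idea; the helper render_supervisor_conf is made up for this example, and unlike the file above it gives every process its own log files (tornado-8001-stderr.log and so on), which is my own variation and not something Supervisor requires.

```python
import configparser

PROGRAM_TEMPLATE = """[program:tornado-{port}]
command = python /srv/www/myapp/app.py --port={port}
stderr_logfile = /var/log/supervisor/tornado-{port}-stderr.log
stdout_logfile = /var/log/supervisor/tornado-{port}-stdout.log
autostart = true
autorestart = true
"""


def render_supervisor_conf(ports):
    """Return one [program:...] section per port, separated by blank lines."""
    return "\n".join(PROGRAM_TEMPLATE.format(port=port) for port in ports)


conf_text = render_supervisor_conf([8001, 8002, 8003, 8004])

# Sanity check: the generated text parses as an INI file.
parser = configparser.RawConfigParser()
parser.read_string(conf_text)
print(parser.sections())
```

You would write conf_text to /etc/supervisor/conf.d/tornado.conf yourself (with sudo); the script deliberately does not touch any files.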

Step 8 - Start NGINX and Supervisor


$ sudo /etc/init.d/nginx start
$ sudo supervisord


To validate that your app is up and running, open your domain in a browser. If you only see the nginx default page, then something is wrong in your nginx configuration and nginx is not forwarding requests to your app. In that case, check the nginx error log and tornado-stderr.log.
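
Before digging through logs, it can help to check whether the Tornado processes are listening at all. A minimal sketch, using only the standard library and assuming the four ports from the Supervisor config; the function names port_is_open and check_backends are made up for this example.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_backends(ports, host="127.0.0.1"):
    """Map each port to True (something is listening) or False."""
    return {port: port_is_open(host, port) for port in ports}
```

Run on the server, check_backends([8001, 8002, 8003, 8004]) returns a dict of port to True/False. If a port is closed, the problem is on the Supervisor/Tornado side; if all ports are open but the site still shows the default page, look at the nginx configuration.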



Monday, August 12, 2013

Django Class Based Views and Inline Form Sets

Django 1.5.1, Python 2.7.5

OK, so you moved on to working with class based views, got a usual form going, and everything went smoothly, but now you want to use more than one form on the same page and, even better, one of the forms is actually a formset. Cool! Here is a real example of how I made it work on one of the projects I was working on.

For the sake of the example, a few words about what I have and what I want to achieve.
I want to let my users manage their sponsors. Simple in itself, and a basic need. My users want to be able to define a sponsor and its sponsorship dates. From this I have 2 models, Sponsor and Sponsorship, connected by a one-to-many field, aka ForeignKey.
Let's say we have defined the models, and take a look at forms.py:

from django.forms import ModelForm
from django.forms.models import inlineformset_factory

from models import Sponsor, Sponsorship


class SponsorForm(ModelForm):

    class Meta:
        model = Sponsor

 

class SponsorshipForm(ModelForm):

    class Meta:
        model = Sponsorship

SponsorShipsFormSet = inlineformset_factory(Sponsor, Sponsorship,
                                            form=SponsorshipForm, extra=2)


Notice that I defined the formset in forms.py and not inside the view.

Ok, so I have the models, I have the forms, now I need a view to work it all out.
views.py:



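from django.core.urlresolvers import reverse
from django.db import transaction
from django.views.generic import CreateView

from forms import SponsorForm, SponsorShipsFormSet


# SponsorMixin is the project's own mixin (not shown here).
class CreateSponsor(SponsorMixin, CreateView):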
    form_class = SponsorForm
    template_name = 'sponsor_form.html'

    def get_context_data(self, **kwargs):
        data = super(CreateSponsor, self).get_context_data(**kwargs)
        if self.request.POST:
            data['sponsorships'] = SponsorShipsFormSet(self.request.POST)
        else:
            data['sponsorships'] = SponsorShipsFormSet()
        return data

    def form_valid(self, form):
        context = self.get_context_data()
        sponsorships = context['sponsorships']
        with transaction.commit_on_success():
            form.instance.created_by = self.request.user
            form.instance.updated_by = self.request.user
            self.object = form.save()

        if sponsorships.is_valid():
            sponsorships.instance = self.object
            sponsorships.save()

        return super(CreateSponsor, self).form_valid(form)

    def get_success_url(self):
        return reverse('sponsors')


Let's go over what is going on here. My main form is SponsorForm, and the view takes care of it pretty much by itself. Notice get_context_data(): on GET it creates an unbound SponsorShipsFormSet, and on POST it instantiates the formset with the data in self.request.POST. If you remember, to instantiate an inline formset you normally also need, on both GET and POST, to pass an instance, which is the main model instance. You don't see it here because this is a create view: there is no instance yet to which I could bind the formset.
Next is the form_valid method. I get the context of the view and extract the formset from it, which now holds the data the user entered. I make sure that I first save the main form, SponsorForm, which creates the instance I need for the inline formset. Notice that I call is_valid() on sponsorships myself; that's because the view doesn't take care of it for me (the view class handles SponsorForm, and for that form validation happens automatically). All that is left to do is bind the formset to the new instance and save it, and we are done ;)

Here is a gist for better readability.

Sunday, July 28, 2013

How to Add an RSS Feed

Python 2.7.5  Django 1.4.5

RSS stands for Rich Site Summary or Really Simple Syndication. It is often called a feed or a channel, and it provides summarised text (in XML format) about what is going on on your website.

Here you can find Django official documentation on RSS feeds: https://docs.djangoproject.com/en/1.4/ref/contrib/syndication/.

In general, I find working with XML from Python code a pretty unpleasant experience, but there is no running away from it: almost every project will need a feed sooner or later. Django's built-in syndication feed framework takes care of a lot of those unpleasantries and, within its own limitations, provides a pretty nice and fast way to create a feed.

Now I will give a pretty straightforward example, yet one that goes a bit beyond what you will find in the official Django docs. If this is your first look at RSS feeds in Django, you might want to go over the docs first (see the link at the beginning of this article).
I wrote this feed for a project I'm working on. In this project I have users, and each user has their own RSS feed(s). Each channel has a title, a link, a description and an image. The image is the same on all channels: it is my project logo, with a link to the main website.

I like my code organised, so I start by creating a feeds.py on the same level as views.py.

In feeds.py (follow the link to view this code in a more readable way on gist):

from django.shortcuts import get_object_or_404
from django.contrib.syndication.views import Feed
from django.utils.feedgenerator import Rss201rev2Feed

from myproject.myapp.models import Model, Broadcast, \
     BroadcastSocialAccount, BroadcastSocialAccountStatus


OFFICIAL_URL = "http://mainwebsite.com/"
OFFICIAL_LOGO_URL = 'https://s3.amazonaws.com/bucketname/logo.png'


class OfficialFeed(Rss201rev2Feed):
    def add_root_elements(self, handler):
        Rss201rev2Feed.add_root_elements(self, handler)
        handler.startElement(u'image', {})
        handler.addQuickElement(u"url", OFFICIAL_LOGO_URL)
        handler.addQuickElement(u"title", u'MyProject')
        handler.addQuickElement(u"link", OFFICIAL_URL)
        handler.endElement(u"image")


class BSAFeed(Feed):

    feed_type = OfficialFeed

    def get_object(self, request, *args, **kwargs):
        user_slug = request.META['PATH_INFO'].split('/')[2]
        user = get_object_or_404(Model, slug=user_slug)
        return user

    def link(self, obj):
        return '/user/{}/broadcasts/feed/'.format(obj.slug)

    def title(self, obj):
        return "%s Broadcasts Feed" % obj.name

    def description(self, obj):
        return "Updates on %s social media activity" % obj.name

    def items(self, obj):
        return BroadcastSocialAccount.objects.filter(broadcast__user=obj,
                 status=BroadcastSocialAccountStatus.SENT).order_by('-sent_at')

    def item_title(self, item):
        return "{} posted on {}".format(item.broadcast.user.name,
                                     item.social_account.partner)

    def item_description(self, item):
        return item.broadcast.message

    # item_link is only needed if the item model has no get_absolute_url method.
    def item_link(self, item):
        return item.get_broadcast_abs_url()

    def item_pubdate(self, item):
        return item.sent_at

Let's take a look at what I've got here:

  • feed_type - this is where you can specify if this will be an RSS feed or an Atom feed. This is also where you can add elements to the channel. In my case I added a logo to all channels, which also is a link to my main website.
  • get_object - returns the object this feed is about, in my case the user whose slug appears in the URL. If you are familiar with Django class based views, this shouldn't be surprising.
  • link - returns a relative URI to this feed (channel). Relative to what? Relative to your project's Site instance. This is also one of the limitations of the Django feed framework: working with multiple domains in the same project is a bit of a problem, especially if you don't have multiple Sites to go with those domains.
  • title - title for this feed.
  • description - a description of the feed.
  • items - this is the content of the feed. The built-in machinery works per Model. If you want to syndicate more than one Model, for example if this is a feed for a car manufacturer and you want to cover both cars and currently active offers, you will either need to create 2 feeds, one for cars and another for offers, or query for both cars and offers and then combine them into one list. This method should return a list of items to present; each will have a title, a description, a link and a publication date.
  • item_title - title of an item.
  • item_description - description of an item.
  • item_link - a link to a full article, a link to where a reader can view the item. If this is a blog post, then this is a link to the post itself.
  • item_pubdate - when this item was published, helps readers see how up to date the content is.
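
To illustrate the "combine them into one list" option mentioned under items, here is a minimal sketch in plain Python. The Car and Offer tuples are stand-ins for two Django models (they are not part of my project), and combined_items is a made-up helper; in a real feed you would pass two querysets instead of lists and return the result from items().

```python
from collections import namedtuple
from datetime import datetime
from itertools import chain

# Stand-ins for two different models that both carry a publication date.
Car = namedtuple("Car", "name pubdate")
Offer = namedtuple("Offer", "title pubdate")


def combined_items(cars, offers, limit=20):
    """Merge both sequences into one list, newest first, capped at limit."""
    merged = sorted(chain(cars, offers),
                    key=lambda item: item.pubdate, reverse=True)
    return merged[:limit]


cars = [Car("Model A", datetime(2013, 7, 1)),
        Car("Model B", datetime(2013, 7, 20))]
offers = [Offer("Summer sale", datetime(2013, 7, 10))]

for item in combined_items(cars, offers):
    print(type(item).__name__, item.pubdate.date())
```

The price of this approach is that item_title, item_description and item_link then have to cope with both kinds of item, for example by checking the item's type.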

An RSS feed is actually just one of the views on a website, one that returns a response in XML format. And every view needs a URL, of course.
So now let's connect this view to a URL, in urls.py:

from django.conf.urls import url
from myproject.myapp.feeds import BSAFeed

urlpatterns = (
  ...
  url(r'^broadcasts/feed/$', BSAFeed(), name='broadcasts_feed'),
)


And that's it. Now I have a feed on my website.
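
If you are curious what such a feed looks like on the wire, the sketch below builds a document of roughly the same shape with the standard library. It is not what Django runs internally, and the real output contains a few more elements; build_channel and the sample values are made up for illustration. It only shows where the channel fields, the image block and the items end up.

```python
import xml.etree.ElementTree as ET


def build_channel(title, link, description, logo_url, site_url, items):
    """Return an RSS 2.0 document as a string: one channel, one image, N items."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for tag, text in (("title", title), ("link", link),
                      ("description", description)):
        ET.SubElement(channel, tag).text = text

    # The extra <image> block, like the one OfficialFeed adds to the channel.
    image = ET.SubElement(channel, "image")
    ET.SubElement(image, "url").text = logo_url
    ET.SubElement(image, "title").text = "MyProject"
    ET.SubElement(image, "link").text = site_url

    for item in items:
        node = ET.SubElement(channel, "item")
        for tag in ("title", "link", "description", "pubDate"):
            ET.SubElement(node, tag).text = item[tag]
    return ET.tostring(rss, encoding="unicode")


feed_xml = build_channel(
    title="Jane Broadcasts Feed",
    link="http://mainwebsite.com/user/jane/broadcasts/feed/",
    description="Updates on Jane social media activity",
    logo_url="https://s3.amazonaws.com/bucketname/logo.png",
    site_url="http://mainwebsite.com/",
    items=[{
        "title": "Jane posted on twitter",
        "link": "http://mainwebsite.com/broadcasts/1/",
        "description": "Hello world",
        "pubDate": "Sun, 28 Jul 2013 10:00:00 +0000",
    }],
)
print(feed_xml)
```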

Tuesday, July 9, 2013

Forms Submit


Django 1.4, jQuery 1.7.2, Twitter Bootstrap
I already talked about how to add forms to pages using Django, and now I want to talk about how to prevent double submission of forms.
So, I have the following template:
{% block content %}

{% trans "Submit" as submit %}

 <div class="well">
  <h1>{% trans "Report an Issue" %}</h1>
  {% if error_message %}
  <p><strong>{{ error_message }}</strong></p>
  {% endif %}
  <form id="form" enctype="multipart/form-data" method="post" class="form-horizontal">
   <div class="control-group">
    {% csrf_token %}
    {% crispy form %}
    <input type="submit" value="{{ submit }}" class="btn" />
   </div>
  </form>
 </div>

{% endblock %}
It's a usual POST; after submit the user is redirected to another page.
To prevent double submit I will disable the submit button on click, so that a double click cannot submit two identical forms, and show a spinner instead of my form. To show a spinner I will add one more div to the template, with a loading animation (a gif file) from this nice site: http://ajaxload.info/ .
And now my template will look like this:
{% block content %}

{% trans "Submit" as submit %}

 <div class="well">
  <h1>{% trans "Report an Issue" %}</h1>

  {% if error_message %}

  <p>
   <strong>{{ error_message }}</strong>
  </p>

  {% endif %}

  <form id="form" enctype="multipart/form-data" method="post" class="form-horizontal">
   <div class="control-group">
    {% csrf_token %}
    {% crispy form %}
    <input type="submit" value="{{ submit }}" class="btn" />
   </div>
  </form>
 </div>

 <div id="loader" style="display:none;">
  <p><img src="{{ STATIC_URL }}img/loader.gif" />{% trans "Please Wait" %}</p>
 </div>

{% endblock %}
As you can see, I added a new div with its style set to display:none, so that it is not shown ahead of time.
The js file for the issue form:
$(function(){
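 $('input[type=submit]').click(function(event){
  // Stop the browser's own submit so the form is sent exactly once, below.
  event.preventDefault();
  $(this).attr('disabled', 'disabled');
  $(this).parents('.well').hide();
  $('#loader').show();
  $(this).parents('#form').submit();
 });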
});
Let's take a look at what's going on here. First of all I wait for the document to be ready, which means that the DOM is ready even if images and other content have not finished loading yet. I locate my submit button in the DOM and bind a handler function to its click event. By setting the disabled attribute to 'disabled', I prevent a double click. I hide the div containing the form, show the loader div and submit the form. In my case I don't need to hide the loader after submit and show the form again, because on submit the browser will either redirect or, if the form didn't pass validation, reload the whole page.
One more thing: the CSS file. Without one, my loader div showed up right after the nav-bar div and was not centered, so to fix this small problem:
#loader {
 margin-right: 250px;
 margin-top: 150px;
 margin-bottom: 150px;
}
This will show my loader in the middle of the page.
Usually you don't need to show a spinner in such cases, because the browser shows one of its own, but it is good as an exercise and for those cases when you want to make the form submit more pronounced and obvious.


Forms in Django


Django 1.4, jQuery 1.7.2, django-crispy-forms
Want to get information from a user? News flash, you need a form :)
In this post I will show how to create a form that will have its own page, another form that won't have its own page, and finally how to add more than one form to a page.

To Each Its Own

Every form starts in forms.py, where you define what fields it will have, in other words what information you want from a user. A form is a kind of pipe between a model (and so the database) and a user. So, to create a form like this in Django you first of all need a model.
In models.py:
class SuggestedWebSite(models.Model):

    homepage_url = models.URLField(_("homepage url"))
    picture = models.ImageField(_("picture"), upload_to='user_suggested_websites', blank=True, null=True)
    added_at = models.DateTimeField(_("added at"), default=datetime.now)
    added_by = models.ForeignKey(User, related_name="suggested_websites", verbose_name=_("added by"))

    class Meta:
        verbose_name = _("suggested site")
        verbose_name_plural = _("suggested sites")

    def __unicode__(self):
        d = extract(self.homepage_url)
        return d.domain + '.' + d.tld
I use translation because this site is shown in Hebrew; the arguments that look like _("str") are used for translating field names. I will talk about the picture field and how to upload files in Django later on, in another post.
Now to forms.py:
from django.forms import ModelForm
from myproject.websites.models import SuggestedWebSite

class SuggestedWebSiteForm(ModelForm):

  class Meta:
    model = SuggestedWebSite
    fields = ('homepage_url', 'picture')
and of course views.py:
from myproject.websites.forms import SuggestedWebSiteForm
from myproject.websites.models import SuggestedWebSite
from crispy_forms.helper import FormHelper
from django.contrib.auth.decorators import login_required
from django.contrib.sites.models import RequestSite
from django.core.mail import mail_managers
from django.core.urlresolvers import reverse
from django.http import HttpResponseRedirect
from django.shortcuts import render
from django.utils.translation import ugettext as _
def abs_url(request, url):
    site = RequestSite(request)
    return "http://%s%s" % (site.domain, url)

@login_required
def suggest_website(request):
    if request.method == 'POST':
        form = SuggestedWebSiteForm(request.POST, request.FILES)
        if form.is_valid():
            form.instance.added_by = request.user
            form.save()

            mail_managers(_('User suggested a site: %s') % 
                             form.cleaned_data['homepage_url'], "\n".join([ 
                             _('url: %s') % form.cleaned_data['homepage_url'],
                            abs_url(request, reverse('admin:websites_suggestedwebsite_change', args=(form.instance.id,))),
                            abs_url(request, reverse('admin:websites_suggestedwebsite_changelist')),]))
            return HttpResponseRedirect(reverse('suggest_thankyou'))
    else:
        form = SuggestedWebSiteForm()

    form.helper = FormHelper()
    form.helper.form_tag = False

    return render(request, 'suggest_website.html', {'form': form, })
Let's take a look at what is going on here. First of all, note the @login_required decorator: only logged-in users can access this page. It checks request.user.is_authenticated() before running the view; if you don't use it and you don't want everyone to be able to submit the form, you will need to perform this check yourself before saving anything on your server.
Now the user is authenticated, follows the link, fills in the form and clicks submit, and the data is sent to the server. On the server side we check whether the method is POST, which means we are about to write data to the server. We create a bound instance of the form and check that the user entered valid data (you can find more on that on the Django website). If the data is valid, I do a few more things with it: save the form, inform the relevant people about the event, and redirect the user to a nice page that says thank you. If the data is not valid, the bound form falls through to render() and is shown again with all its errors. If the request was not a POST, I create an unbound form and present it empty; nothing gets written to the server in this case. The form.helper that you see is part of crispy-forms, which includes some very nice functionality for handling forms in Django.
In urls.py:
from django.conf.urls import patterns, url
from django.conf import settings
from django.views.generic.simple import direct_to_template

urlpatterns = patterns('',
    ...
    url(r'^suggest/$', 'myproject.websites.views.suggest_website', name='suggest_website'),
    url(r'^suggest-thankyou/$', direct_to_template,  {'template': 'suggest_thankyou.html'}, name='suggest_thankyou'),
)
and of course a template to output it all nice and easy:
{% extends "base.html" %}
{% load i18n %}
{% load crispy_forms_tags %}

{% block title %}Suggest a Website{% endblock %}

{% block content %}

<form enctype="multipart/form-data" method="post">
 {% csrf_token %}
 {% crispy form %}
 <input type="submit" value="{% blocktrans %}Submit{% endblocktrans %}" class="btn" />
</form>

{% endblock %}
Notice the enctype attribute: it's part of what it takes to allow file uploads. csrf_token is a must for every POST form you make, to protect you from cross-site request forgery attacks. The crispy tag outputs the form nice and easy, and if there are validation errors, like submitting an empty form, they are shown to the user in a friendly way, explaining the problem at hand and suggesting how to fix it.
Very nice and easy :)

Template Crisis - How to Add a Form to Another Page

So, what if I don't want to create separate template to output the form?
Have no fear, here it comes :)
Where it all begins, in models.py of course:
from django.utils.translation import ugettext_lazy as _
from datetime import datetime
from django.contrib.auth.models import User
from django.db import models

class WebSiteComment(models.Model):

    content = models.TextField(_("content"))
    picture = models.ImageField(_("picture"), blank=True, null=True,  upload_to='upics', max_length=200)
    site = models.ForeignKey(WebSite, related_name="comments", verbose_name=_("site"))
    added_by = models.ForeignKey(User, verbose_name=_("added by"))
    added_at = models.DateTimeField(_("added at"), default=datetime.now)
    approved = models.BooleanField(_("approved"), default=False)
    approved_by = models.ForeignKey(User, blank=True, null=True, related_name="sitecomments_approved", verbose_name=_("approved by"))

    class Meta:
        verbose_name = _("site comment")
        verbose_name_plural = _("site comments")

    def __unicode__(self):
        return "Comment by %s on %s" % (self.added_by, self.site)
In forms.py:
from django.forms import ModelForm
from myproject.websites.models import WebSiteComment

class AddSiteCommentForm(ModelForm):
    class Meta:
        model = WebSiteComment
        fields = ('content', 'picture')
Very simple and easy, from all the fields the user will see only the content and picture fields, with their translated names.
Where it all connects, views.py:
import datetime

from myproject.websites.forms import AddSiteCommentForm
# Assuming UserFavouriteWebsite lives in the same models module.
from myproject.websites.models import WebSite, WebSiteComment, UserFavouriteWebsite
from crispy_forms.helper import FormHelper
from django.core.urlresolvers import reverse
from django.shortcuts import render
from annoying.decorators import JsonResponse
from django.http import HttpResponseRedirect

def details(request, dname):
    w = WebSite.objects.get(domain=dname)

    comment_form = None

    if request.user.is_authenticated():
        if request.method == 'POST':
            comment = WebSiteComment(site=w, added_by=request.user, added_at=datetime.datetime.now())
            comment_form = AddSiteCommentForm(request.POST, request.FILES, instance=comment)
            if comment_form.is_valid():
                comment_form.save()
                return HttpResponseRedirect(reverse('myproject.websites.views.details', args=(dname,)))

        else:
            comment_form = AddSiteCommentForm()

        comment_form.helper = FormHelper()
        comment_form.helper.form_tag = False

    issues = [(o, o.is_affecting_user(request.user) if request.user.is_authenticated() else False) for o in w.issues.all()]

    favourite = False
    if request.user.is_authenticated():
        if UserFavouriteWebsite.objects.filter(user=request.user, site=w).exists():
            favourite = True

    return render(request, 'details.html', {
                                            'website': w,
                                            'is_favourite': favourite,
                                            'issues':issues,
                                            'comment_form': comment_form,
                                            })
Now this is a very nice example; you can see quite a few things here. As you can see, there is no login decorator, so I'm performing the authentication check myself. As before, I make sure that the POST method was used, and here is the first difference from before: first I create a comment instance, then I create a new form instance based on that comment instance, which fills in all the fields that were not presented to the user and are set automatically on the server side. Secondly, if the form was handled with no errors, I reload the page, presenting the newly added comment and showing the form again in case the user wants to add more comments. Then I continue working on the other elements that will be shown on the page, and return all the data for rendering.
Meanwhile in the template:
<div class="span6">

 <a name="comments" href="#"></a>
 <h2>{% trans "Comments" %}:</h2>
 {% for c in website.comments.all %}
 <div id="id-comments" class="well">
  <h3>{{ c.added_by }}</h3>
  <p>{{ c.content }}</p>
  {% if c.picture %}
   <img src="{{ c.picture.url }}" />
  {% endif %}
 </div>
 {% endfor %}

 <div class="well">
 {% if user.is_authenticated %}
  <form id="comment-form" enctype="multipart/form-data" method="post">
   {% csrf_token %}
   {% crispy comment_form %}
   <input type="submit" value="{{ submit }}" class="btn"/>
  </form>

 {% else %}
 <a href="{% url django.contrib.auth.views.login %}?next={{ request.path }}%23comments">{% trans "Log in to post a comment" %}</a>
 {% endif %}
</div>
This is just a part of the template that outputs the comments. Other things happen all around.

Forms Party! - the more the merrier :)

So far we always handled one form per page, per view, but what happens when you need to handle more than one form? How do you do it? Well, the main idea is to give the forms different names, a kind of hook in each form, to make it easy for the server to catch each one in turn.
No drastic changes in models.py and forms.py, except for having more models and more forms. So, let's take a look at where it really changes.
In a template:
<form id="comment-form" enctype="multipart/form-data" method="post">
 {% csrf_token %}
 {% crispy comment_form %}
 <input type="hidden" name="formtype" value="comment"/>
 <input type="submit" value="{{ submit }}" class="btn"/>
</form>

<form id="issue-form" enctype="multipart/form-data" method="post">
 {% csrf_token %}
 {% crispy issue_form %}
 <input type="hidden" name="formtype" value="issue"/>
 <input type="submit" value="{{ submit }}" class="btn"/>
</form>
And in views.py:
issue_form = None
comment_form = None

if request.method == 'POST':
    # the hidden 'formtype' input tells us which form was submitted
    formtype = request.POST.get('formtype')
    if formtype == 'issue':
        issue = Issue(site=w, added_by=request.user, added_at=datetime.datetime.now())
        issue_form = AddIssueForm(request.POST, request.FILES, instance=issue, prefix="issue")
        if issue_form.is_valid():
            issue_form.save()
            return HttpResponseRedirect(reverse('myproject.websites.views.details', args=(dname,)))
    elif formtype == 'comment':
        comment = WebSiteComment(site=w, added_by=request.user, added_at=datetime.datetime.now())
        comment_form = AddSiteCommentForm(request.POST, request.FILES, instance=comment, prefix="comment")
        if comment_form.is_valid():
            comment_form.save()
            return HttpResponseRedirect(reverse('myproject.websites.views.details', args=(dname,)))
As you can see, the whole trick is to add a hidden input tag to each form in the template that carries a name specific to that form, catch that name on the server side, and handle the matching form.
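If the number of forms on a page keeps growing, the if/elif chain can be swapped for a lookup table. Here is a minimal sketch of that idea, not the code from my project: a plain dict stands in for request.POST, and the handler functions and field names (issue-title, comment-content) are made up for illustration.

```python
# Sketch: route a submission to a handler based on the hidden 'formtype' field.
# A plain dict stands in for request.POST; handlers and field names are hypothetical.

def handle_issue(post):
    return "issue saved: %s" % post.get("issue-title", "")

def handle_comment(post):
    return "comment saved: %s" % post.get("comment-content", "")

# One entry per form on the page
HANDLERS = {
    "issue": handle_issue,
    "comment": handle_comment,
}

def dispatch(post):
    """Pick a handler by the value of 'formtype'; return None if it is unknown."""
    handler = HANDLERS.get(post.get("formtype"))
    if handler is None:
        return None
    return handler(post)

print(dispatch({"formtype": "comment", "comment-content": "Nice site"}))
print(dispatch({"formtype": "unknown"}))
```

Adding a third form then means adding one handler and one dictionary entry, while the dispatch code stays the same.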

Simultaneously Looping Techniques in Python

Looping over two lists at the same time can be a bit of a problem, and you will have to use either a custom function or one of the existing Python tools. Also, treat the terms "simultaneously" and "at the same time" a bit loosely. I don't think it's possible to run two processes/functions at exactly the same time: many other processes run on your computer and influence the timing of yours. The best we can achieve is to run them closely enough to do what we need.

Introducing zip

Given two lists of the same length:

list1 = [1, 2, 3]
list2 = [4, 5, 6]
for i, k in zip(list1, list2):
     print i, k
and the output is:
1 4
2 5
3 6
zip is a built-in Python function, so there is no need to import anything; you can use it straight away.
So, what if the lists are of different lengths? Well, that depends on what you want to do with them. If you want to do something with all the elements in both lists, you will need another tool.
list1 = [1, 2, 3]
list2 = [4, 5, 6, 7]
for i, k in zip(list1, list2):
   print i, k
and the output is:
1 4
2 5
3 6
Huh..? Same as before?! Well, yeah.
So, what happened?
zip takes two lists and returns a list of tuples as follows:
zip(list1, list2)
and the output is:
[(1, 4), (2, 5), (3, 6)]
and this is what you are looping over. When zip reaches the end of the shorter list, it washes its hands and says "Here you go, buddy. Enjoy".
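To see exactly what zip dropped, you can compare the number of pairs with the length of the longer list. This is a small sketch of my own, written so that it runs on both Python 2 and Python 3 (hence the list() call and the print() with parentheses); the variable names are only for illustration.

```python
list1 = [1, 2, 3]
list2 = [4, 5, 6, 7]

# list() makes the result a real list on Python 3 as well
pairs = list(zip(list1, list2))
print(pairs)

# Elements of the longer list that zip never reached
leftover = list2[len(pairs):]
print(leftover)
```

Here pairs has three tuples and leftover holds the single element 7 that zip silently skipped.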

Introducing itertools

itertools holds a number of useful functions for different kinds of looping techniques. In my case, only two of them are of interest.

izip

In [1]: import itertools
In [2]: list1 = [1, 2, 3]
In [3]: list2 = [4, 5, 6, 7]
In [4]: for i, k in itertools.izip(list1, list2):
...:       print i, k
...: 
1 4
2 5
3 6
In [5]: itertools.izip(list1, list2)
Out[5]: <itertools.izip at 0x27fd5a8>
In [6]: itertools.izip(list1, list2).next()
Out[6]: (1, 4)
And this is the difference between zip() and itertools.izip(): where zip() returns a list of tuples, izip() returns an iterator (itertools, duh..!).
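The practical consequence of getting an iterator is that the pairs are produced one at a time and can be consumed only once. The sketch below shows that. It is my own illustration, and the try/except import is an assumption added so the same code also runs on Python 3, where izip is gone and the built-in zip is already lazy.

```python
try:
    from itertools import izip  # Python 2
except ImportError:
    izip = zip  # Python 3: the built-in zip already returns an iterator

pairs = izip([1, 2, 3], [4, 5, 6, 7])

first = next(pairs)   # pulls just one tuple
rest = list(pairs)    # consumes whatever is left
again = list(pairs)   # nothing left: the iterator is exhausted

print(first)
print(rest)
print(again)
```

If you need to loop over the pairs more than once, wrap the iterator in list() first or call izip again.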

izip_longest

In [7]: for i, k in itertools.izip_longest(list1, list2, fillvalue='None'):
...:       print i, k
...: 
1 4
2 5
3 6
None 7
By default, fillvalue is None (the object; the example above passes the string 'None', which prints the same). You can use whatever other value suits you better. izip_longest returns an iterator, same as izip.
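A fill value other than None is handy when you want to keep computing with the padded pairs. A short sketch, with the import written to work on both Python 2 (izip_longest) and Python 3 (where the same function is called zip_longest); using 0 as the fill value is just my choice for the example.

```python
try:
    from itertools import izip_longest  # Python 2
except ImportError:
    from itertools import zip_longest as izip_longest  # Python 3 name

list1 = [1, 2, 3]
list2 = [4, 5, 6, 7]

# Default: the shorter list is padded with the None object
default_fill = list(izip_longest(list1, list2))

# Custom: pad with 0 so every pair can still be added up
custom_fill = list(izip_longest(list1, list2, fillvalue=0))
sums = [a + b for a, b in custom_fill]

print(default_fill)
print(custom_fill)
print(sums)
```

With the default fill, a + b would fail on the last pair, because None cannot be added to an integer.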

And What If I Don't Want to Use itertools and My Lists are of Different Length

Here is another way to do it:
l1 = [1, 2, 3]
l2 = [4, 5, 6, 7]
def get_element(l1, i):
   try:
      return l1[i]
   except IndexError:
      return "None"
for i in xrange(max(len(l1), len(l2))):
   print get_element(l1, i), get_element(l2, i)
and the output is:
1 4
2 5
3 6
None 7
Another way of achieving "simultaneous" looping is multi-threading, which I will talk about in another post sometime soon, when I get some free time (whatever that is, lol ;)

Monday, June 24, 2013

HTMLParser for small and easy tasks

Python 2.7, Django 1.4

When I had just started learning web development, my first task ever was to scrape dozens of web sites. New language, new concepts, new tools. It took me days to complete the task, and I learned how NOT to build a web site. To complete that task I used a web scraping framework called Scrapy. Since then I have come to know lxml, Beautiful Soup and HTMLParser. For any extensive web harvesting I use Scrapy, but for some small tasks, HTMLParser is just the thing.

HTMLParser

So, what is HTMLParser and why use it?
HTMLParser is a module in the Python standard library, so if you have Python installed, you already have it. By itself, HTMLParser does nothing: if you feed it data without overriding anything, you get nothing in return. To make it tick, you override the handler methods you need, and the parser calls them as it goes. The only thing HTMLParser provides is the machinery to parse X/HTML formatted text; it is built in and you can't change it.
Before continuing, let's take a look at an html tag:
<a href="#">I am a link</a>

The first part, which comes before 'I am a link', is the start tag, and that is where all our attributes live.
'I am a link' is the data that this tag holds.
And the last part, </a>, is called the end tag; most html tags have one, and it holds no attributes.

HTMLParser Methods You Have to Override

handle_starttag(self, tag, attrs)

This is the method you want to override in most cases; it is used for extracting attributes and their values.

handle_endtag(self, tag)

As the name states, it handles end tags. It can be used to validate the html.

handle_data(self, data)

This is the method you can use to extract the text inside h, p and other tags. For example, to extract 'I am a link' from the previous example, this is the method to use:
def handle_data(self, data):
    print data

__init__ method

The Python documentation doesn't state it, but it is advisable to override this method and adapt it to your needs; just remember to call HTMLParser.__init__(self) from it, as the example below does. The first parser I wrote didn't work, and adding this method solved the matter.
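To put the four methods together, here is a minimal sketch of a parser that overrides all of them and is fed the link from the example above. The class name LinkTextParser and its attributes are made up for this illustration, and the try/except import is an assumption added so the code also runs on Python 3, where the module lives at html.parser.

```python
try:
    from HTMLParser import HTMLParser  # Python 2
except ImportError:
    from html.parser import HTMLParser  # Python 3 location of the same class

class LinkTextParser(HTMLParser):

    def __init__(self):
        HTMLParser.__init__(self)  # sets up the parser's internal state
        self.hrefs = []      # href values seen in start tags
        self.texts = []      # non-empty text found between tags
        self.open_tags = []  # tags opened but not yet closed

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        self.open_tags.append(tag)
        for name, value in attrs:
            if name == 'href':
                self.hrefs.append(value)

    def handle_endtag(self, tag):
        # very naive validation: only pop when the end tag matches the last open tag
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        if data.strip():
            self.texts.append(data.strip())

parser = LinkTextParser()
parser.feed('<p>Intro <a href="#">I am a link</a></p>')
parser.close()

print(parser.hrefs)
print(parser.texts)
print(parser.open_tags)
```

After parsing, hrefs holds '#', texts holds 'Intro' and 'I am a link', and open_tags is empty because every start tag got its matching end tag.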

Example

In my case, I needed to extract all the hrefs in a given html and validate the links. Some were relative and some absolute, and my task was to check that they all worked.
This is the parser I coded into existence:
from HTMLParser import HTMLParser
import requests
from django.core.urlresolvers import resolve
from django.http import Http404

class MyHTMLParser(HTMLParser):

    def __init__(self, fp):
        """
        fp is an input stream returned by open() or urllib2.urlopen()
        """
        HTMLParser.__init__(self)
        self.seen = {}  # holds parsed hrefs
        self.is_good = True
        self.feed(fp.read())

    def handle_starttag(self, tag, attrs):
        """
        Looking for href attributes and validating them
        """
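        for k, v in attrs:
            if k == 'href' and v not in self.seen:
                self.seen[v] = True
                try:
                    resolve(v)  # relative link: does it match a URL pattern?
                except Http404:
                    # not resolvable locally, so check it over the network;
                    # never flip is_good back to True once a link has failed
                    if not self._check_abs_url(v):
                        self.is_good = False
            if not self.is_good:
                return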

    def status(self):
        """
        Indicator if all links in current html are working.
        Returns True if no broken links found.
        """
        return self.is_good

    def _check_abs_url(self, url):
        """
        Checks if the link is broken
        """
        try:
            # HEAD is enough here; treat 4xx/5xx responses as broken too
            response = requests.head(url)
            return response.status_code < 400
        except requests.exceptions.RequestException:
            return False

And that is my parser. The only methods I override are handle_starttag and __init__. I use a Django built-in function to validate relative links, and requests for absolute links. One other thing: this parser makes a lot of requests, so to make it easier on both servers (the one that makes the request and the one that responds) I do HEAD requests.
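My parser decides between relative and absolute links by trying resolve() first and falling back to a request. A different way to make that split, shown here only as a sketch and not what the parser above does, is to look at the href itself with urlparse. The function name is_absolute and the sample hrefs are made up for illustration, and the import is written to work on both Python 2 and 3.

```python
try:
    from urlparse import urlparse  # Python 2
except ImportError:
    from urllib.parse import urlparse  # Python 3 location

def is_absolute(href):
    """True when the href carries its own scheme and host."""
    parts = urlparse(href)
    return bool(parts.scheme and parts.netloc)

hrefs = ['/about/', 'http://example.com/page', '#comments', 'contact.html']

absolute = [h for h in hrefs if is_absolute(h)]
relative = [h for h in hrefs if not is_absolute(h)]

print(absolute)
print(relative)
```

Note that this simple check treats scheme-relative links (those starting with //) and mailto: links as not absolute, so a real checker would need to decide what to do with those.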

Wednesday, June 19, 2013

Setting Up Google App Engine WebApp2 Project in Virtualenv

Python 2.7, virtualenv, Ubuntu 12.04, GAE 1.7.1
This is a short HOW-TO on how I solved my ImportError: no module google.appengine.ext while working in virtualenv.
GAE can be downloaded from here.
To install GAE on Linux, just extract the content wherever you want. I use an apps/ directory under my /home/usr directory, which means that, in my case, GAE will be found in /home/usr/apps/google_appengine.
After downloading and extracting GAE to said directory, I create a virtualenv for my project.
Project directory structure:
/ProjectA
    /bin
    /build
    /include
    /lib
    /local
    /man
    /src
        /app
        /static
            /img
            /js
            /css
        /templates
$ virtualenv ProjectA/ --no-site-packages

I'm running this command from the parent directory of ProjectA. In general you need to specify a full path to the directory that will be the virtual environment.
More info can be found here.
After creating the virtualenv, I create an src directory structure inside ProjectA that will hold all my code. As someone who comes from a Django background, I tend to keep the same architecture in webapp2 (for example, all my models are saved in models.py), as I find it a very good way of coding.
Ok, so far we have downloaded Google App Engine and created the project directory structure.
Let's activate our virtualenv:
$ cd ProjectA/
$ source bin/activate
(ProjectA)$

So now we have a clean development environment with python and pip ready and working.
As someone who practices the TDD way of coding, my first pip command is:
(ProjectA)$ pip install nose
This will install the latest nose testing framework for Python. And here is the catch: nose will search your whole project directory to find tests, but only in package directories or test directories (more info here), so either create a directory that matches nose's testMatch or make your app a package. When you use Django, this is done automatically for you, but in GAE it's not required for your app to run. Adding __init__.py to your app directory will solve it.
To link the GAE we downloaded before to this virtualenv, add gae.pth to the virtualenv's lib/python2.7/site-packages directory (ProjectA/lib/python2.7/site-packages here) with the following content:
<full path to GAE directory>
<full path to GAE directory>/google
<full path to GAE directory>/lib/antlr3
<full path to GAE directory>/webapp2
<full path to GAE directory>/lib/yaml/lib

That's it. Now you can use statements like 'from google.appengine.ext import ndb'.
Run a few tests on your code; if you see any more GAE-related ImportErrors, just add the path to the needed module to gae.pth.
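If you set up virtualenvs often, writing gae.pth by hand gets old quickly. Below is a small sketch of a helper that generates the file from the list of sub-paths given above. The function name write_gae_pth is made up for this example, and the demo writes into a temporary directory rather than a real site-packages, so you would pass your own virtualenv path instead.

```python
import os
import tempfile

# Sub-paths from the list above, relative to the GAE directory
# ('' stands for the GAE directory itself)
GAE_SUBPATHS = ['', 'google', 'lib/antlr3', 'webapp2', 'lib/yaml/lib']

def write_gae_pth(gae_dir, site_packages):
    """Write gae.pth into site_packages, one GAE path per line; return the file path."""
    lines = [os.path.join(gae_dir, sub) if sub else gae_dir for sub in GAE_SUBPATHS]
    pth_path = os.path.join(site_packages, 'gae.pth')
    with open(pth_path, 'w') as f:
        f.write('\n'.join(lines) + '\n')
    return pth_path

# Demo against a throwaway directory instead of a real site-packages
demo_dir = tempfile.mkdtemp()
pth = write_gae_pth('/home/usr/apps/google_appengine', demo_dir)

with open(pth) as f:
    print(f.read())
```

For the setup described in this post, the second argument would be the full path to ProjectA/lib/python2.7/site-packages.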