Here are the differences between HDInsight and Hortonworks HDP



I get asked a lot about the differences between Microsoft HDInsight and Hortonworks HDP. It turns out the two companies tightly coordinate their development efforts. I was surprised to see how much closer the two products are today than they were a year ago. Here is what I was able to find.

Lag in Versions

As of today, 8 May 2017, the current versions are:

  • HDP: 2.6
  • HDInsight: 3.5, based on HDP 2.5

So the first thing we see is that HDInsight lags behind the latest and greatest from Hortonworks, which results in differences in feature versions. That makes sense because Azure is a different beast from a straight IaaS deployment. Microsoft engineers need to make HDP work with WASB storage and other Azure-specific architecture (networking, security, etc.).

Feature Gaps

It would be great if all of the features were ported to Azure, but they are not. Priorities have to be set on what to support, and I imagine they are driven by customer demand. Here are the features not found in HDInsight:

  • Druid
  • Solr
  • Slider
  • Accumulo
  • Falcon
  • Atlas
  • Flume
  • Knox


I found this information on the product pages.

What Version of PHP am I Running


I was trying to install a debugger for PHP and had trouble, mostly because it turned out I had two different versions of PHP on my machine. One I installed using Homebrew and the other … well, I don’t know how it got there.

Nevertheless, this is how I found out which one I was using.

  • From the command line
    php -v
    PHP 5.5.12 (cli) (built: May 27 2014 19:40:54)
  • From the browser
    Navigate to a php page with
  • Point apache to the right location
    Edit apache config
    LoadModule php5_module /usr/local/opt/php55/libexec/apache2/
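Since the root of my trouble was two copies of PHP on one machine, here is a minimal sketch of how you might list every `php` executable on your PATH, in the order the shell would find them. The function name `find_executables` is my own invention for illustration, not part of any PHP or Homebrew tooling.

```python
import os


def find_executables(name, search_path=None):
    """Return every executable called `name` on the search path, in lookup order."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    found = []
    for directory in search_path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        # Keep regular files that the current user is allowed to execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            if candidate not in found:
                found.append(candidate)
    return found


# The first path printed is the one a bare `php` command would run.
for path in find_executables("php"):
    print(path)
```

If this prints more than one path, running each one with `-v` shows which version lives where.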

Which PHP Config

As an added bonus, I’ll show how to find which php.ini file is loaded.

From the command line you can check which php.ini the PHP CLI is running with:

php -i | grep "Loaded Configuration File"
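If you want to do the same check from a script, the sketch below pulls the path out of `php -i` text. It assumes the usual `Key => value` layout of that output; the function name `loaded_php_ini` and the sample paths are hypothetical.

```python
def loaded_php_ini(php_info_output):
    """Return the php.ini path reported in `php -i` text, or None if none is loaded."""
    for line in php_info_output.splitlines():
        key, sep, value = line.partition("=>")
        if sep and key.strip() == "Loaded Configuration File":
            value = value.strip()
            # PHP reports "(none)" when no php.ini was loaded.
            return None if value == "(none)" else value
    return None


# Hypothetical sample text; the real paths depend on your installation.
sample = (
    "Configuration File (php.ini) Path => /usr/local/etc/php/5.5\n"
    "Loaded Configuration File => /usr/local/etc/php/5.5/php.ini\n"
)
print(loaded_php_ini(sample))
```

To feed it real data, capture the output of `php -i` with `subprocess.run(["php", "-i"], capture_output=True, text=True).stdout`.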


Moving GitLab to an EBS Volume



I’ve really enjoyed using GitLab to manage my git repositories. It was really easy to get started because I used a Bitnami GitLab image for Amazon. The problem is that when I created the instance I chose an instance store over EBS. I don’t have a really good reason why I did, but now I need to move my data to its own volume so I can restart my instance with no fear of losing data.

  1. Check how much space you have
    $ df -h
    /dev/xvda1 9.9G 4.5G 4.9G 48% /

    /dev/xvda1 is the root device volume. 10 GB is the default for an ephemeral (instance store) root volume.

    I’m at 50% and would like to have more room to grow. However, the real reason is I don’t want to lose my data.

  2. Create and Attach an EBS Volume

    You need to create the volume in the same Availability Zone as the host instance in order to attach it. Also, take note of the device name when you attach it. We’ll need this name when we mount the volume.

  3. Format the Volume

    First find which volume is the new one with the list block devices command

    $ lsblk
    xvda1 202:1    0    10G  0 disk /
    xvdb  202:16   0     4G  0 disk /mnt
    xvdf  202:80   0   100G  0 disk

    xvdf is the new one because it doesn’t have a mount point associated with it. Format the drive:

    $ sudo mkfs -t ext4 /dev/xvdf
  4. Create full backup of GitLab

    First stop the application using the Bitnami control script, then move /opt to a safe location:

    $ sudo /opt/bitnami/ stop
    $ sudo mv /opt /opt2
  5. Mount the volume

    Although the volume has been attached, it hasn’t been mounted yet. You can see this by running df again:

    $ df -h
    /dev/xvda1 9.9G 4.5G 4.9G 48% /

    However the list block devices command can see the volume.

    $ lsblk
    xvda1 202:1    0    10G  0 disk /
    xvdb  202:16   0     4G  0 disk /mnt
    xvdf  202:80   0   100G  0 disk
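    • xvda1 is mounted as the root file system.
    • xvdb is 4 GB mounted as /mnt
    • xvdf is the new 100 GB volume and has no mount point yet

    Create the mount point and mount the volume:
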
    $ sudo mkdir /opt
    $ sudo mount /dev/xvdf /opt
  6. Restart GitLab

    Move the data over to the new /opt volume and restart:

    $ cp -R /opt2/bitnami /opt
    $ sudo /opt/bitnami/ start
  7. Verify the volume and data
    $ df -h
    /dev/xvda1      9.9G  1.2G  8.3G  12% /
    /dev/xvdf        99G  3.6G   90G   4% /opt

    I now have 3.6 GB moved from / to /opt, and /opt is mounted on the /dev/xvdf volume.
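The final check can also be scripted. Here is a small sketch, using only the Python standard library, that reports whether a path is a mount point and how full it is. The helper name `describe_mount` is mine, and the percentage is computed as used / (used + available), which should land close to what `df` shows but may differ by rounding.

```python
import os
import shutil


def describe_mount(path):
    """Summarize whether `path` is a mount point and how much of it is used."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    denominator = usage.used + usage.free
    return {
        "path": path,
        "is_mount_point": os.path.ismount(path),
        "total_gb": round(usage.total / gib, 1),
        "used_gb": round(usage.used / gib, 1),
        # Guard against a zero-sized file system before dividing.
        "percent_used": round(100.0 * usage.used / denominator, 1) if denominator else 0.0,
    }


# On the migrated instance you would pass "/opt"; "/" exists everywhere.
print(describe_mount("/"))
```

If `is_mount_point` comes back False for /opt, the data is still sitting on the root volume and the mount step did not take.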


Symfony denied Access

Moving my Symfony app to a QA server showed me how rude the server is. I could see the home page right after I copied the project to the new machine with no problem (except for modifying app.config.php and various file permission and ownership issues). Nevertheless, the real problem came when I tried navigating to any route besides the index. Each time I would get a 403 Access denied error.

Apparently the server “understood the request, but is refusing to fulfill it”. Thanks. I especially like how the guideline for 403 errors says

…if the server wishes to make public why the request has not been fulfilled, it SHOULD describe the reason for the refusal in the entity.

That sounds good because I really don’t know why I’m not seeing my page and I would really like to know why the server is refusing to fulfill it. In my case I am given the clear description of “Access denied.” Thanks again.

After much searching and testing I found the solution. The fix was to modify my php.ini and restart the service.

; /etc/php5/fpm/php.ini


I’m sure there is a better way to restart this service.

$ ps -eaf | grep php
root      1269     1  0 16:39 ?        00:00:00 php-fpm: master process (/etc/php5/fpm/php-fpm.conf) 
$ sudo kill 1269
$ ps -eaf | grep php
root      1436     1  6 17:49 ?        00:00:00 php-fpm: master process (/etc/php5/fpm/php-fpm.conf)  

And I’m sure there are more ways for a server to be helpful.

Logging onto EC2

I tend to forget how to log onto EC2. This morning I spun up a few servers with an Ubuntu image on EC2. As soon as I tried to log on, I received a

Permission denied (publickey).

After doing a quick search I remembered I needed to add the username on the instance

ssh -i /path/to/keypair.pem
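To make the missing piece explicit, here is a tiny sketch that assembles the full command with the username in front of the host. It assumes the stock Ubuntu images, where the default login user is `ubuntu`; the host name below is a made-up placeholder and `build_ssh_command` is just an illustrative helper.

```python
import shlex


def build_ssh_command(key_path, host, user="ubuntu"):
    """Build the ssh argument list, putting the login user in front of the host."""
    return ["ssh", "-i", key_path, f"{user}@{host}"]


# Placeholder host; substitute your instance's public DNS name or IP address.
command = build_ssh_command("/path/to/keypair.pem", "ec2-host.example.com")
print(" ".join(shlex.quote(part) for part in command))
```

Other images use other default users, so pass `user=` explicitly when you are not on Ubuntu.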

So simple.

Extract Terms from Text

I’ve been playing around with NLP. I wanted to see if I could extract terms from text. With the help of the internet, I found some answers. It boils down to:


The Python example looks like this:

from topia.termextract import extract

extractor = extract.TermExtractor()

text ="One company that successfully leverages a generic strategy is Costco Wholesale and that generic strategy is low-cost leadership. The company's mission is to provide popular products to customers at the lowest prices the market can offer. One way that Costco has been successful at this is by cutting expenses. Actual Costco stores are literally warehouses full of products. The company saves on many of the cosmetic aspects of typical retail stores. Additionally, most Costco stores are open 10 am to 8:30 pm during the week and closing earlier on the weekends. Less operating time saves money. Additionally, Costco operates on a membership program. This means that someone must be a member to enter the store and purchase the merchandise. One staff member stands at the entrance checking membership cards as members enter and other staff members stand at the exit matching receipts with purchases. This design allows the company to cut down on staffing costs by not needing as many employees wandering the large warehouses."

# Extract the terms, sort them alphabetically, and show them
taggedTerms = sorted(extractor(text))
print(taggedTerms)

The results from Python are:

  • ('8:30 pm', 1, 2),
  • ('Actual Costco stores', 1, 3),
  • ('Costco', 5, 1),
  • ('Costco stores', 1, 2),
  • ('company', 4, 1),
  • ('member', 4, 1),
  • ('membership cards', 1, 2),
  • ('membership program', 1, 2),
  • ('staff member', 1, 2),
  • ('staff members', 1, 2),
  • ('store', 4, 1)
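An alphabetical list hides which terms matter most. As a rough sketch, the snippet below re-ranks the tuples above by how often each term occurs. I am reading each tuple as (term, occurrences, number of words), which fits the output but is my interpretation, and `rank_terms` is a helper I made up for this post.

```python
# Tuples copied from the results above: (term, occurrences, number of words).
results = [
    ("8:30 pm", 1, 2),
    ("Actual Costco stores", 1, 3),
    ("Costco", 5, 1),
    ("Costco stores", 1, 2),
    ("company", 4, 1),
    ("member", 4, 1),
    ("membership cards", 1, 2),
    ("membership program", 1, 2),
    ("staff member", 1, 2),
    ("staff members", 1, 2),
    ("store", 4, 1),
]


def rank_terms(terms, min_occurrences=1):
    """Keep terms seen at least `min_occurrences` times, most frequent first."""
    kept = [t for t in terms if t[1] >= min_occurrences]
    # Sort by occurrences, then by word count, then alphabetically for stable ties.
    return sorted(kept, key=lambda t: (-t[1], -t[2], t[0]))


for term, occurrences, words in rank_terms(results, min_occurrences=2):
    print(term, occurrences, words)
```

With a threshold of 2 this keeps only Costco, company, member, and store, so frequency alone favors single words over the multi-word phrases.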

Yahoo Example

Calling Yahoo with the same text, we get:

  • costco stores
  • costco wholesale
  • generic strategy
  • cost leadership
  • cosmetic aspects
  • membership cards
  • costco
  • membership program
  • warehouses
  • popular products
  • staff member
  • retail stores
  • staff members
  • receipts
  • lowest prices
  • wholesale
  • money
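To see how much the two extractors agree, here is a quick sketch that compares the two lists as sets, ignoring case. Both lists are copied from the results in this post; nothing here calls either service.

```python
# Terms from the topia.termextract results, without the counts.
topia_terms = [
    "8:30 pm", "Actual Costco stores", "Costco", "Costco stores", "company",
    "member", "membership cards", "membership program", "staff member",
    "staff members", "store",
]

# Terms returned by Yahoo for the same text.
yahoo_terms = [
    "costco stores", "costco wholesale", "generic strategy", "cost leadership",
    "cosmetic aspects", "membership cards", "costco", "membership program",
    "warehouses", "popular products", "staff member", "retail stores",
    "staff members", "receipts", "lowest prices", "wholesale", "money",
]

# Lowercase both sides so "Costco" and "costco" count as the same term.
topia_set = {term.lower() for term in topia_terms}
yahoo_set = {term.lower() for term in yahoo_terms}

shared = sorted(topia_set & yahoo_set)
only_topia = sorted(topia_set - yahoo_set)
only_yahoo = sorted(yahoo_set - topia_set)

print("shared:", shared)
print("only topia:", only_topia)
print("only yahoo:", only_yahoo)
```

The two agree on six terms. Yahoo adds phrases such as "generic strategy" and "lowest prices", while topia keeps frequent single words such as "company" and "store".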


When I published this post, WordPress suggested I tag it with the following, so there must be a plugin somewhere; it could be topia.

  • Costco Wholesale
  • Costco
  • Costco stores
  • generic strategy
  • Actual Costco stores