Parse Azure Deployment Status

I’ve been running a lot of deployments that take a while to finish. There is a CLI command to fetch the status of the deployment operations:


azure group deployment operation list --name deploymentname --resource-group rgname

The next step was to parse the JSON response.


azure group deployment operation list --name deploymentname --resource-group rgname | python parse.py

Here is the code: Parse Azure Deployment Response.
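A minimal sketch of what such a parse.py can look like, assuming the CLI is run with its --json flag so stdin receives a JSON array of ARM operation objects — the field names below follow the ARM deployment-operations schema and are my assumptions, not a copy of the original script:

import json
import sys

# Read the deployment operation list from stdin
operations = json.load(sys.stdin)

# Print one line per operation: resource name, provisioning state, timestamp
for op in operations:
    props = op.get("properties", {})
    target = props.get("targetResource") or {}
    print("{0:<45} {1:<12} {2}".format(
        target.get("resourceName", "n/a"),
        props.get("provisioningState", "unknown"),
        props.get("timestamp", "")))

With that in place, the pipeline above prints one line per deployment operation showing its provisioning state.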


How Long to Deploy a Secure HDInsight Cluster

It takes a while to build a secure HDInsight cluster. In my case it was about an hour from start to finish.


About half of the work was setting up the VNet and configuring the domain controllers. The HDInsight cluster itself took about 30 minutes, which is about normal.

Some steps were dependent on other steps, and some ran asynchronously. I didn’t record when I started the process, so I’m using the timestamp of the first step as my zero; add a few seconds for the first step to run.

Deployment step              Time from start (min)
Availability Set              0.00
Public IP                     0.08
Load Balancer                 0.11
VNet                          0.26
BDC Nic                       0.29
PDC Nic                       0.31
Storage Account               0.31
AD BDC VM                     5.30
AD PDC VM                     5.70
Prepare BDC Script           12.80
Create AD Forest Script      23.90
Update VNet DNS1 Script      24.10
Update BDC Nic Script        24.90
Configure BDC Script         29.70
Update VNet DNS2 Script      29.90
Create Cluster               56.00
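The gaps between steps fall out of the cumulative times. A quick Python sketch that computes them from the table above (these are start-time gaps, not true durations, since some steps ran asynchronously):

steps = [
    ("Availability Set", 0.00),
    ("Public IP", 0.08),
    ("Load Balancer", 0.11),
    ("VNet", 0.26),
    ("BDC Nic", 0.29),
    ("PDC Nic", 0.31),
    ("Storage Account", 0.31),
    ("AD BDC VM", 5.30),
    ("AD PDC VM", 5.70),
    ("Prepare BDC Script", 12.80),
    ("Create AD Forest Script", 23.90),
    ("Update VNet DNS1 Script", 24.10),
    ("Update BDC Nic Script", 24.90),
    ("Configure BDC Script", 29.70),
    ("Update VNet DNS2 Script", 29.90),
    ("Create Cluster", 56.00),
]

# Gap between each step's start time and the next step's, in minutes
for (name, start), (_, nxt) in zip(steps, steps[1:]):
    print("{0:<26} {1:6.2f} min before the next step starts".format(name, nxt - start))

Read that way, the table matches the prose: the domain controller setup runs through roughly the 30-minute mark, and the cluster creation accounts for the remaining ~26 minutes.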

How I’ve deployed R Models with Microsoft R Server

Many organizations are benefiting from advanced analytics and machine learning. I work with a lot of teams who create amazing data analytics models and then need to deploy them in a way that is useful. Here is what I’ve seen, and one way to deploy models.

Deploying models generally runs into two major obstacles, technical and organizational, and any solution needs to cover both.

Create and Deploy Analytics Web Services

  • Write your models and routines in your local environment.
  • Deploy them as production-ready microservices.
  • Update your models and routines and version the microservice.

The best teams version their models using version control software just as it’s used in software development.

Step by Step

Stand up R Server in Azure

  • Navigate to Azure portal
  • Spin up an R Server
  • Open the network port

Configure Server

  • ssh onto the machine
  • Launch Admin utility and configure password and server
  • $ cd /usr/lib64/microsoft-deployr/9.0.1/
    $ sudo dotnet Microsoft.DeployR.Utils.AdminUtil/Microsoft.DeployR.Utils.AdminUtil.dll

Install RClient

  • Download R Client
  • Configure RStudio to use R Client

Create and Deploy Model

  • ## MODEL DEPLOYMENT EXAMPLE ##
    
    ##########################################################
    # Load mrsdeploy package on R Server #
    ##########################################################
    
    library(mrsdeploy)
    
    ##########################################################
    # Create & Test a Logistic Regression Model #
    ##########################################################
    
    # Use logistic regression equation of vehicle transmission
    # in the data set mtcars to estimate the probability of
    # a vehicle being fitted with a manual transmission
    # based on horsepower (hp) and weight (wt)
    
    
    # Create glm model with `mtcars` dataset
    carsModel <- glm(formula = am ~ hp + wt, data = mtcars, family = binomial)
    
    # Produce a prediction function that can use the model
    manualTransmission <- function(hp, wt) {
      newdata <- data.frame(hp = hp, wt = wt)
      predict(carsModel, newdata, type = "response")
    }
    
    # test function locally by printing results
    print(manualTransmission(120, 2.8)) # 0.6418125
    
    ##########################################################
    # Log into Microsoft R Server #
    ##########################################################
    
    # Use `remoteLogin` to authenticate with R Server using
    # the local admin account. Use session = FALSE so no
    # remote R session is started
    remoteLogin("http://localhost:12800",
                username = "admin",
                password = "{{YOUR_PASSWORD}}",
                session = FALSE)
    
    ##########################################################
    # Publish Model as a Service #
    ##########################################################
    
    # Publish as service using `publishService()` function from
    # `mrsdeploy` package. Name service "mtService" and provide
    # unique version number. Assign service to the variable `api`
    api <- publishService(
      "mtService",
      code = manualTransmission,
      model = carsModel,
      inputs = list(hp = "numeric", wt = "numeric"),
      outputs = list(answer = "numeric"),
      v = "v1.0.0"
    )
    
    ##########################################################
    # Consume Service in R #
    ##########################################################
    
    # Print capabilities that define the service holdings: service
    # name, version, descriptions, inputs, outputs, and the
    # name of the function to be consumed
    print(api$capabilities())
    
    # Consume service by calling function, `manualTransmission`
    # contained in this service
    result <- api$manualTransmission(120, 2.8)
    
    # Print response output named `answer`
    print(result$output("answer")) # 0.6418125
    
    ##########################################################
    # Get Service-specific Swagger File in R #
    ##########################################################
    
    # During this authenticated session, download the
    # Swagger-based JSON file that defines this service
    swagger <- api$swagger()
    cat(swagger, file = "swagger.json", append = FALSE)
    
    # Share Swagger-based JSON with those who need to consume it
    

Use the REST API

  • Get authorization token with Rest Client
  • Call Web service with token and parameters
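Here is a minimal sketch of those two calls in Python. The endpoint paths and response field names are assumptions based on the service published above (mtService v1.0.0 on port 12800); verify them against the swagger.json downloaded earlier.

import requests

HOST = "http://localhost:12800"

# 1. Get an authorization token using the local admin account
login = requests.post(HOST + "/login",
                      json={"username": "admin",
                            "password": "{{YOUR_PASSWORD}}"})
login.raise_for_status()
token = login.json()["access_token"]  # field name assumed; check swagger.json

# 2. Call the web service with the token and the model parameters
response = requests.post(HOST + "/api/mtService/v1.0.0",
                         headers={"Authorization": "Bearer " + token},
                         json={"hp": 120, "wt": 2.8})
response.raise_for_status()
print(response.json())  # output should include "answer" (~0.6418125)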


How I fetched data from Hive in R with HDInsight

I have an HDInsight Spark cluster and wanted to fetch some data out of a Hive table and play around with it in R. I’m using Microsoft R. Here is how I did it.

 


# Create a Spark compute context so RevoScaleR operations run on the cluster
mySparkCC <- RxSpark(executorMem = "1g",
                     driverMem = "1g",
                     executorOverheadMem = "1g",
                     numExecutors = 2,
                     idleTimeout = 600, # 3600 is the default
                     persistentRun = TRUE,
                     consoleOutput = TRUE)
rxSetComputeContext(mySparkCC)
rxGetComputeContext()

# Define a data source from a Hive query (grab 100 rows to play with)
sdf2 <- RxHiveData(query = "select * from hivesampletable LIMIT 100")

# Inspect the data and its variable metadata
rxGetInfo(sdf2, getVarInfo = TRUE)


Here are the differences between HDInsight and Hortonworks HDP

 


I get asked a lot about the differences between Microsoft HDInsight and Hortonworks HDP. It turns out there are tightly coordinated development efforts between the two companies, and I was surprised to see how much closer the two products are today compared to a year ago. Here is what I was able to find.

Lag in Versions

As of today, 8 May 2017, the current versions are:

  • HDP: 2.6
  • HDInsight: 3.5, based on HDP 2.5

So the first thing we see is that HDInsight lags the latest and greatest from Hortonworks, which mostly shows up as differences in feature versions. That makes sense, because Azure is a different beast than a straight IaaS deployment: Microsoft engineers need to make everything work with WASB storage and other Azure-specific architecture (networking, security, etc.).

Feature Gaps

It would be great if all of the features were ported to Azure, but they are not; priorities have to be set on what to support, and I imagine that is driven by customer demand. Here are the features not found in HDInsight:

  • Druid
  • Solr
  • Slider
  • Accumulo
  • Falcon
  • Atlas
  • Flume
  • Knox

References

I found this information on the product pages.


What Version of PHP am I Running


I was trying to install a debugger for PHP and had trouble, mostly because it turned out I had two different versions of PHP running on my machine. One I installed using Homebrew and the other … well, I don’t know how it got there.

Nevertheless, this is how I found out which one I was using.

  • From the command line
    php -v
    PHP 5.5.12 (cli) (built: May 27 2014 19:40:54)
    
  • From the browser: navigate to a PHP page that calls phpinfo()
    phpinfo()
    5.4
    
  • Point Apache to the right location: edit the Apache config
    /etc/apache2/httpd.conf
    
    LoadModule php5_module /usr/local/opt/php55/libexec/apache2/libphp5.so
    

Which Php Config

As an added bonus, I’ll show how to find which php.ini file is loaded.

From the command line, you can check which php.ini the PHP CLI has loaded with:

php -i | grep "Loaded Configuration File"


Moving GitLab to an EBS Volume

 


I’ve really enjoyed using GitLab to manage my git repositories. It was really easy to get started because I used a Bitnami GitLab image for Amazon. The problem is that when I created the instance I chose an instance store over EBS. I don’t have a really good reason why, but now I need to move my data to its own volume so I can restart my instance with no fear of losing data.

  1. Check how much space you have
    $ df -h
    /dev/xvda1 9.9G 4.5G 4.9G 48% /
    

/dev/xvda1 is the root device volume; 10 GB is the default for the ephemeral store.

    I’m at 50% and would like to have more room to grow. However, the real reason is I don’t want to lose my data.

  2. Create and Attach an EBS Volume

You need to create the volume in the same availability zone as the host computer in order to attach it. Also, take note of the device name when you attach it; we’ll need it when we mount the volume.

  3. Format the Volume

    First find which volume is the new one with the list block devices command

    $ lsblk
    
    xvda1 202:1    0    10G  0 disk /
    xvdb  202:16   0     4G  0 disk /mnt
    xvdf  202:80   0   100G  0 disk
    

    xvdf is the new one because it doesn’t have a mount point associated with it. Format the drive:

    $ sudo mkfs -t ext4 /dev/xvdf
    
  4. Create full backup of GitLab

First stop the application using the Bitnami ctlscript and move the existing data to a safe location:

    $ sudo /opt/bitnami/ctlscript.sh stop
    $ sudo mv /opt /opt2
    
  5. Mount the volume

    Although the volume has been attached, it hasn’t been mounted yet. You can see this by running df again:

    $ df -h
    /dev/xvda1 9.9G 4.5G 4.9G 48% /
    

However, the list block devices command can see the volume.

    $ lsblk
    
    xvda1 202:1    0    10G  0 disk /
    xvdb  202:16   0     4G  0 disk /mnt
    xvdf  202:80   0   100G  0 disk
    
    • xvda1 is mounted as the root file system.
    • xvdb is 4 GB, mounted as /mnt.

    Create the mount point and mount the new volume:

    $ sudo mkdir /opt
    $ sudo mount /dev/xvdf /opt
    
  6. Restart Gitlab

    Move the data over to the new /opt volume and restart:

    $ sudo cp -R /opt2/bitnami /opt
    $ sudo /opt/bitnami/ctlscript.sh start
    
  7. Verify the volume and data
    $ df -h
    
    /dev/xvda1      9.9G  1.2G  8.3G  12% /
    /dev/xvdf        99G  3.6G   90G   4% /opt
    

    I now have 3.6 GB moved from / to /opt, and /opt is mounted on the /dev/xvdf volume.
