
Now that we know the basics of Nutch, we can dive into our use case. We'll write scripts that do two things:

  • Ingest the various configurations
  • Execute and control crawls

This post will tackle ingesting the configs. I will specifically be using Python for the examples in this post, but the principles should apply to any language.

In our project, we had 50+ sites we wanted to crawl, all with different configuration needs.  We organized these configurations into a JSON API that we ingest.  In our examples, we will be using Python’s Requests library to get the JSON.  We’ll also need a way to create a unique UUID for each configuration, so we’ll use Python’s uuid module, which ships with the standard library.  You can use the package installer pip to get Requests:

$ pip install requests

We’re going to use a class to handle all of the processing for injection.  We’ll create a file for this, call it configInjector.py.  The beginning of the file should look something like this:

import os
import uuid
import requests
from shutil import copy2

class ConfigInjector(object):
    def __init__(self, configId, config):
        pass

We’re importing os and copy2 so we can create, edit, and copy the files we need.  Next, we want to fetch the config itself, along with an ID for each configuration node.  We’ll make a new file for this, call it inject.py. This will be the script we actually run from cron for injection.  It begins something like this:

import urllib2
import json
import argparse
import configInjector

parser = argparse.ArgumentParser(description="Ingests configs.")
parser.add_argument("configUrl", help="URL of the JSON config endpoint.")
args = parser.parse_args()

For our imports, we’ll use urllib2 to download our remote JSON and argparse to give our script an argument for where to download it (requests and uuid are used inside the ConfigInjector class we started earlier).  We’re also importing our own configInjector class file.

The argparse module allows us to pass command line arguments to the Python script.  In the code above, we instantiate the argument parser, add our argument (configUrl), and store the parsed results in args.  This lets us pass in the URL of our JSON endpoint.
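For example, the cron job (or a manual run) would invoke the script like this; the endpoint URL here is just a placeholder:

$ python inject.py http://example.com/api/crawler-configs.json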

Now that we have the foundation set up, let’s get the data. We’ll use urllib2 to grab the JSON and json.load() to read it into a variable:

response = urllib2.urlopen(args.configUrl)
configs = json.load(response)

We’ll then loop through it and call our class for each config in the JSON:

for configId in configs:
    configInjector.ConfigInjector(configId, configs[configId])
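For reference, the JSON we’re assuming from the endpoint is a map of config IDs to config objects, using the field names the class reads later (your endpoint’s exact structure will differ):

{
    "104": {
        "configTitle": "Example Site",
        "allowExternalDomains": false,
        "seedUrls": ["http://www.example.gov/"],
        "matchPatterns": ["http://www\\.example\\.gov/"],
        "notMatchPatterns": ["http://www\\.example\\.gov/archive/"]
    }
}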

Now that we are getting the configs, let’s fill out our class and process them.  We’ll use the __init__ constructor to do the majority of our data transformations.  The two major things we want to do are to process and inject the Nutch config settings and to create a regex-urlfilter file for each config.

First, we’ll do our transformations.  We want to get our config options in order to plug into Nutch, so we’ll just set them as variables in the class:

class ConfigInjector(object):
    def __init__(self, configId, config):
        self.config = config
        self.configId = configId

        # Config transformations
        self.configTitle = self.config["configTitle"]
        self.allowExternalDomains = self.config["allowExternalDomains"]
        self.uuid = str(uuid.uuid3(uuid.NAMESPACE_DNS, str(self.configId)))

We’re setting three things in this example: a config title and a UUID for reference, and a configuration state for the Nutch setting db.ignore.external.links.  We’re using the static configId to generate the UUID so that each individual configuration always gets the same UUID.
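Because uuid3 hashes its inputs, it is deterministic: the same configId always produces the same UUID.  A quick, purely illustrative sanity check:

import uuid

# The same name always yields the same UUID -- handy for idempotent re-injection.
print(uuid.uuid3(uuid.NAMESPACE_DNS, "104"))
print(uuid.uuid3(uuid.NAMESPACE_DNS, "104"))  # identical to the line above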

Next, we’ll need to create some files for our seed urls and match patterns.  We’re going to create two files, seed-XXXXXX.txt and regex-urlfilter-XXXXXX.txt, where XXXXXX is the configId.  For the seed files, we’ll create our own directory (called seeds), but the regex files must be stored in $NUTCH_HOME/runtime/local/conf in order for Nutch to find them (this is due to Nutch’s configuration of the Java CLASSPATH).  First, we’ll set the filenames based upon configId (this goes in the __init__ function):

self.regexFileName = 'regex-urlfilter-' + self.configId + '.txt'
self.seedFileName = 'seed-' + self.configId + '.txt'

We also want to call the functions we are about to write here, so that when we call the class, we immediately run all the necessary functions to inject the config (again, in the __init__ function):

# Run processes
self._makeConfigDirectories()
self._configureSeedUrlFile()
self._copyRegexUrlfilter()
self._configureRegexUrlfilter()
self._prepInjection()

Next, we’ll set up the directories (the leading underscore in the function name is just a Python convention indicating the method is intended for internal use only):

def _makeConfigDirectories(self):
    if not os.path.exists('/path/to/nutch/runtime/local/conf/'):
        os.makedirs('/path/to/nutch/runtime/local/conf/')
    if not os.path.exists('/path/to/nutch/seeds/'):
        os.makedirs('/path/to/nutch/seeds/')

This simply checks to make sure the directories are there and makes them if they aren’t.  Next, we’ll create the seed files:

def _configureSeedUrlFile(self):
    furl = open('/path/to/nutch/seeds/' + self.seedFileName, "w")
    for url in self.config["seedUrls"]:
        furl.write(url + "\n")
    furl.close()

Basically, we are opening a file (or creating it if it doesn’t exist, which is how “w” mode works), writing each url from the JSON config on its own line, and closing the file when we’re done.  We must end each url with a newline (\n) for Nutch to understand the file.
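The resulting seed file is nothing fancy; it is simply one starting url per line, something like this (the urls are placeholders):

http://www.example.gov/
http://blogs.example.gov/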

Now we’ll make the regex file.  We’ll do it in two steps so that we can take advantage of what Nutch has pre-built.  We’re going to copy Nutch’s built-in regex-urlfilter.txt so that we keep all of its defaults and can add any defaults of our own to every config.  Before we do that, we have an important edit to make to regex-urlfilter.txt: remove the "+." line from the end of the file in both /path/to/nutch/conf and /path/to/nutch/runtime/local/conf.  We’ll add it back ourselves, but if we leave it there, the filters won’t work at all because Nutch uses the first match when deciding whether to fetch a url, and "+." means “accept anything”.  For our use, we’re going to append it to the end of the file after we write our own regex patterns.

We’ll copy regex-urlfilter.txt in this function:

def _copyRegexUrlfilter(self):
    frurl = '/path/to/nutch/conf/regex-urlfilter.txt'
    fwurl = '/path/to/nutch/runtime/local/conf/' + self.regexFileName
    copy2(frurl, fwurl)

Then, we write our filters from the config to it:

def _configureRegexUrlfilter(self):
    notMatchPatterns = self.config["notMatchPatterns"]
    matchPatterns = self.config["matchPatterns"]
    regexUrlfilter = open('/path/to/nutch/runtime/local/conf/' + self.regexFileName, "a")
    if notMatchPatterns:
        for url in notMatchPatterns:
            regexUrlfilter.write("-^" + url + "\n")
    if matchPatterns:
        for url in matchPatterns:
            regexUrlfilter.write("+^" + url + "\n")
    regexUrlfilter.write("+.\n")
    regexUrlfilter.close()

A few things are going on here: we open the file we just copied in append mode (that’s what “a” does), then write each “do not match” pattern to it, followed by the match patterns.  As we said before, Nutch uses the first regex match it gets, so the exclusions need to come first to avoid conflicts.  We then write "+." so that Nutch accepts anything else; you can leave that line off if you would prefer Nutch exclude anything not matched, which is its default behavior.
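To make the ordering concrete, the tail end of a generated regex-urlfilter-XXXXXX.txt might look something like this (the patterns are hypothetical; everything above them is Nutch’s copied defaults):

# exclusions first, because Nutch stops at the first match
-^http://www\.example\.gov/archive/
# then the patterns we do want crawled
+^http://www\.example\.gov/
# finally, accept anything else
+.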

As a quick side note, it is important to mention that designing it this way means that each time we inject our configuration into Nutch, we will wipe out and recreate these files.  This is the easiest pathway we found for implementation, and its only real drawback is that you cannot manually edit these files in any permanent way.  Just be aware.

Now that we have our files in place, the last thing we have to do is inject the configuration into Nutch itself. This will be our first use of the Nutchserver API.  If you have not already, open a console on the server that hosts Nutch and run:

$ nutch nutchserver

Optionally, you can add a --port argument to specify the port, but we’ll use the default: 8081.  Then we’ll prep the data for injection into the API:

def _prepInjection(self):
    config = {}

    # Custom config values
    config["meta.config.configId"] = self.configId
    config["meta.config.configTitle"] = self.configTitle
    config["meta.config.seedFile"] = '/path/to/nutch/seeds/' + self.seedFileName

    # Crawl metadata
    config["nutch.conf.uuid"] = self.uuid

    # Crawl config
    config["urlfilter.regex.file"] = self.regexFileName
    config["db.ignore.external.links"] = self.allowExternalDomains

    self._injectConfig(config)

Note that we are creating both our own custom variables for later use (we named them meta.config.X) and setting actual Nutch configuration settings.  Another note: urlfilter.regex.file takes the filename only, as a string.  You CANNOT specify a path for this setting, which is why we store the regex files in /path/to/nutch/runtime/local/conf, where the CLASSPATH already points.

Lastly, we’ll do the actual injection. The self._injectConfig(config) at the end of the _prepInjection function starts injection:

def _injectConfig(self, config):
    job = {"configId": self.uuid, "force": "true", "params": config}
    r = requests.post('http://localhost:8081/config/' + self.uuid, json=job)
    return r

All we do here is set up the JSON to push to the API and then inject.  Every configuration we send to the API must have a UUID as its configId (which we will reference later when creating crawl jobs).  We set force to true so that configurations get overwritten when they change upstream, and then we pass in our configuration parameters.

We then use the requests Python module to make the actual injection, which is significantly easier than using something like curl.  We post to a url containing the UUID and send the JSON as the body (requests has a handy json argument that converts Python dictionaries to JSON before adding it to the body).  Lastly, we return the post response for later use if needed.

And that’s it!  We have successfully posted our dynamic custom configuration to nutchserver and created the relevant files.  In the next post, we’ll show you how to crawl a site using these configurations.


Let’s be honest, the documentation for Apache Nutch is scarce.  Doing anything more complicated than a single-configuration crawl requires hours of prowling Stack Overflow and a plethora of sick Google-fu moves.  Thankfully, I’ve already suffered for you!

A recent project involved configuring Nutch to crawl 50+ different sites, all in different states of web standard conformity, all with different configuration settings.  These had to be dynamically added and needed to account for changing configurations.  In the following few posts, I’ll share the steps we took to achieve this task.

What is Nutch?

Apache Nutch 2.x is an open-source, mature, scalable, production-ready web crawler based on Apache Hadoop (for data structures) and Apache Gora (for storage abstraction).  In these examples, we will be using MongoDB for storage and Elasticsearch for indexing; however, this guide should still be useful to those using different storage and indexing backends.

Basic Nutch Setup

The standard way of using Nutch is to set up a single configuration and then run the crawl steps from the command line.  There are two primary files to set up: nutch-site.xml and regex-urlfilter.txt.  There are several more files you can utilize (and we’ll discuss a few of them later), but for the most basic implementation, that’s all you need.

The nutch-site.xml file is where you set all your configuration options.  A mostly complete list of configuration options can be found in nutch-default.xml; just copy and paste the options you want to set and change them accordingly.  There are a few that we’ll need for our project:

  1. http.agent.name - This is the name of your crawler.  This is a required setting for every Nutch setup.  It’s good to have all of the settings for `http.agent` set, but this is the only required one.
  2. storage.data.store.class - We’ll be setting this one to org.apache.gora.mongodb.store.MongoStore for Mongo DB.
  3. Either elastic.host and elastic.port or elastic.cluster - this will point Nutch at our Elasticsearch instance.

There are other settings we will consider later, but these are the basics.
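Putting those together, a bare-bones nutch-site.xml might look something like the sketch below.  The values are placeholders for your own setup (agent name, Elasticsearch host and port), not a drop-in configuration:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>my-crawler</value>
  </property>
  <property>
    <name>storage.data.store.class</name>
    <value>org.apache.gora.mongodb.store.MongoStore</value>
  </property>
  <property>
    <name>elastic.host</name>
    <value>localhost</value>
  </property>
  <property>
    <name>elastic.port</name>
    <value>9300</value>
  </property>
</configuration>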

The next important file is regex-urlfilter.txt.  This is where you configure the crawler to include and/or exclude specific urls from your crawl.  To include urls matching a regex pattern, prepend your regex with a +. To exclude, prepend it with a -.  We’re going to take a slightly more complicated approach to this, but more on that later.
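For example, to crawl a hypothetical site while skipping its calendar pages, the relevant lines would look like this (remember that Nutch uses the first rule that matches):

-^http://www\.example\.gov/calendar/
+^http://www\.example\.gov/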

The Crawl Cycle

[Diagram: the Nutch crawl cycle]

Nutch’s crawl cycle is divided into 6 steps: Inject, Generate, Fetch, Parse, Updatedb, and Index.  Nutch takes the injected URLs, stores them in the CrawlDB, and uses those links to go out to the web and scrape each URL.  Then, it parses the scraped data into various fields and pushes any scraped hyperlinks back into the CrawlDB.  Lastly, Nutch takes those parsed fields, translates them, and injects them into the indexing backend of your choice.

How To Run A Nutch Crawl

Inject

For the inject step, we’ll need to create a seeds.txt file containing seed urls.  These urls act as a starting place for Nutch to begin crawling.  We then run:
$ nutch inject /path/to/file/seeds.txt

Generate

In the generate step, Nutch queues up urls to be fetched.  On the first run, generate only queues the urls from the seed file.  After the first crawl, generate will also use the hyperlinks extracted from parsed pages.  It has a few relevant arguments:

  • -topN will allow you to determine the number of urls crawled with each execution.
  • -noFilter and -noNorm will disable the filtering and normalization plugins respectively.

In its most basic form, running generate is simple:
$ nutch generate -topN 10

Fetch

This is where the magic happens.  During the fetch step, Nutch crawls the urls selected in the generate step.  The most important argument you need is -threads: this sets the number of fetcher threads per task.  Increasing it will make crawling faster, but setting it too high can overwhelm a site (which may then block your crawler) and eat up memory on your machine.  Run it like this:
$ nutch fetch -threads 50

Parse

Parsing is where Nutch organizes the data scraped by the fetcher.  It has two useful arguments:

  • -all: will check and parse pages from all crawl jobs
  • -force: will force parser to re-parse all pages

The parser reads content, organizes it into fields, scores the content, and figures out links for the generator.  To run it, simply:
$ nutch parse -all

Updatedb

The Updatedb step takes the output from the fetcher and parser and updates the database accordingly; this is also where urls are marked for future generate steps. Nutch 2.x supports several storage backends (MySQL, MongoDB, HBase) thanks to abstracting storage through Apache Gora.  No matter your storage backend, however, running it is the same:
$ nutch updatedb -all

Index

Indexing takes all of that hard work from Nutch and puts it into a searchable interface.  Nutch 2.x supports several indexing backends (e.g. Solr, Elasticsearch).  While we will be using Elasticsearch, the command is the same no matter which indexer you are using:
$ nutch index -all

Congrats, you have done your first crawl!  However, we’re not going to be stopping here, oh no.  Our implementation has far more moving parts than a simple crawl interface can give, so in the next post, we will be utilizing Nutch 2.3’s RESTful API to add crawl jobs and change configurations dynamically!  Stay tuned!


The ATARC Mobile Customer Experience Project Team has been examining ways that Federal agencies can better utilize customer experience technologies and techniques such as a user centered design approach to improve the internal and public-facing mobile applications that serve employees and citizens of the United States.

They have identified a number of principles that agencies should leverage to properly engage mobile audiences. Brian Lacey, CEO of Mobomo, presented the user-centered design strategy we used for the USO, a recent mobile app that launched in the spring of 2017 as part of a series of deployments to more than 200 USO centers.

USO was seeking to design, develop, and deploy a cross-platform mobile application that introduces an additional channel for United States military service members and their families to better engage with USO centers and programs.

Specifically, this mobile application needed to foster greater discoverability of USO locations and of the services and programs the USO offers through those locations and online. The initial launch was an iOS native application. It was key that our design process focus on the USO’s end users in order to create a platform people would genuinely enjoy using.

User Centered Design Process

[Diagram: user-centered design process flow]

It’s important to design for the people who use your product - always keep the user at the center of the design focus. A gorgeous app with poor UX is not a gorgeous app; it is a recipe for user frustration! Design happens at the intersection of the user, the interface, and the context.

We started the design process by defining the specific goals of the app. After working with the USO team and deciphering their goals, we were able to better define the overall strategy we would use to accomplish them.

We defined the goals through stakeholder interviews and preliminary research, which helped clarify the main purpose of the app and set a baseline for the approach we would use during the process.

As the design phase progressed, user interviews and site visits gave our team the insight needed to define the user personas: the archetypal end users who would be using the USO mobile platform.

Building these personas helped us uncover the end users’ wants and needs, on which we based the design of the app you see today.

 

[Image: USO user personas]

Each user persona has slightly different needs, and being able to narrow in on those specific needs without neglecting the other personas’ needs creates a balance in the design of the user experience. For example, the “enlisted personnel” persona wants to know what amenities are available and to be able to skip through the check-in line as quickly as possible.

The “military spouse” persona, meanwhile, wants to know what programs are offered at that specific location so they can keep their kids entertained. Having a fast check-in is vital to this persona, as they are probably juggling multiple kids who are not as patient about waiting in line as others. The “caregiver” persona is looking for support programs and events so that they can share their experiences with other caregivers.

For this project, the key USO user personas were identified as the following:

The caregiver

The enlisted personnel

The military spouse

By identifying these different personas, Mobomo was able to build the experience map and the app’s key features, which let users quickly check in and find the programs most relevant to them and their physical location.

After the experience map and user personas were created, Mobomo began to develop screen sketches, the data architecture, and the overall flow for the mobile application. By building these out early in the process, we were able to identify possible data failures early on and come up with ways to mitigate those issues down the road.

[Image: app flow diagram]

Wireframes are generated from the app architecture and the app flow diagrams as the basic development process begins. These basic designs are tested in the app and built out around the user experience. With the user experience driving the design and development, we were able to accomplish the USO’s organizational goals of capturing more user data and simplifying the check-in process.

By accomplishing both of these goals while keeping the app centered on the user, we can ensure the app’s longevity while keeping it a useful tool for both the user and the organization.


Mobomo focuses on whichever method of communication actually works best for getting feedback from end users, rather than what is assumed to be best; this is why Mobomo has been successful with human-centric design feedback and with making adjustments based on actual user feedback.

Like most organizations, the USO is big on user engagement; the surveys and direct email channels were specific to the USO.  They wanted a variety of ways to receive feedback on how they could improve and on what other services they could offer the end user.

As we were building the app, feedback from end users became an important facet for the USO as an organization. They like that there is a survey in the hands of potentially all of their end users which can help them improve their service.

The survey is specifically about the user’s experience at a particular USO. That information is fed back to that location through a different mechanism.

The user also has the option for general USO feedback, which directs them to an email address for giving feedback to the USO as an organization. Finally, there is a direct email link for the user to give feedback on the app itself and how it can improve to better serve that specific person.

 


Since launching the iOS app, the next phase will be adding more enhancements; features could include program check-in, calendar integration, event signup, push notifications, and USO news.

Aside from enhancements to the iOS app, we look forward to expanding the USO’s mobile presence with an Android application. You can find the USO app in the App Store.

 

 

 

 

 


We specialize in interface design and we take pride in our process, but there is a lot of work done behind the scenes before a design is complete. When you work as a digital interface designer, you spend your day interacting with many tools and files. Sometimes you find yourself doing repetitive tasks, which can become annoying after a while, but being methodical and organized can definitely help. We have talked about design etiquette and how keeping your files and folders organized can improve your workflow - but what about shortcuts in the tools themselves? Let’s talk about ways you can customize Photoshop to save time and improve your workflow.

The Actions Panel

‘Actions’ are one of the tools you can use to automate things in Photoshop. As Adobe Support puts it: “An action is a series of tasks that you play back on a single file or a batch of files —menu commands, panel options, tool actions, and so on. For example, you can create an action that changes the size of an image, applies an effect to the image, and then saves the file in the desired format.” There are many ways you can take advantage of this tool. For example, instead of having to manually copy and paste the style of a layer, you can attach that step to a keyboard shortcut in an action. Not all of the tools in Photoshop have keyboard shortcuts, and actions let you assign a shortcut to a specific tool so you can use it later. Actions can be run manually, in ‘Batch’, or via Droplets: small applications that automatically process any files dragged onto their icon. (More about actions: Adobe Support.)

Template files and the CC library

Artboards make working with multiple files easier. In combination with smart objects linked to CC libraries, you can have everything you need without keeping extra .psd or .psb files on your computer. You can access them from the cloud, no matter where you are and no matter which device or Adobe application you are using. This is helpful when dealing with images that will be reused across different social media sites.

Scripts

You can have an event, such as opening, saving, or exporting a file in Photoshop, trigger a JavaScript or a Photoshop action. Photoshop provides several default events, or you can have any scriptable Photoshop event trigger the script or action; see the Photoshop Scripting Guide for more information on scriptable events (source: Adobe Support). Scripts are similar to actions, but they can access elements that actions cannot, which brings more flexibility and automation to some tools. There are many useful scripts out there, such as Template Generator and Lighten / Darken Color; just search for “Photoshop Scripts” and you’ll get plenty of results, or create your own! Related: https://www.ps-scripts.com/

Plugins and Extensions

Add-ons allow complex tasks to be done with a single click: you can add special effects to a picture, modify layer names in batch, and more. There are many paid and free plugins available from Adobe’s add-ons page. Do you have suggestions, or is there a particular tool you would like to learn more about to improve your workflow? Reach out!



Automated testing has become one of those terms we hear all the time, but what does it mean exactly? Is it actually necessary in order to be successful? And what are the common objections to automated testing?

What is automated testing?

Test automation is the use of special software (separate from the software being tested) to control the execution of tests and the comparison of actual outcomes with predicted outcomes. Test automation can take over repetitive but essential tasks in a formalized testing process already in place, or perform additional testing that would be difficult to do manually. It is critical for continuous delivery and continuous testing. The biggest question many people ask is why automated testing is necessary; see some of our reasoning below.

Automated Testing Saves Time and Money

Software tests have to be often repeated during development cycles to ensure quality. For each release of the software, it may be tested on all supported operating systems and hardware configurations. Manually repeating these tests is costly and time-consuming. Automated tests can be run over and over again at no additional cost, and they are much faster than manual tests.

Vastly Increases Test Coverage

Tests are written as YOU define them. If you put in the time and effort to write a lengthy test covering things you wouldn’t normally check manually, you can run it unattended, ensuring the product behaves as expected while keeping your sanity.

Testing Improves Accuracy

Even the most robotic humans make mistakes; automated tests will run exactly as specified every time they are run.

Automation Does What Manual Testing Cannot

For example, automated testing can scale: it can simulate thousands of users hitting a web application, or test all 200,000-plus pages of one.

Automated QA Testing Helps Both Dev and QA

Simply put, you catch bugs quicker and are notified in real time.

Morale Improves Across Team

Automating repetitive processes allows team members to focus on more challenging problems which can be more rewarding.  

Some common objections to automated testing:

Writing tests will take me more time, making me less productive!

Initially, this may slow you down from moving the application forward at the rate you’re comfortable with, but automated testing will keep you from constantly having to go back and revisit what you’ve already built.

It won’t catch the tricky bugs

It may or may not, but writing automated tests frees up your time to hunt for the tricky bugs so you can push fixes.

Writing tests is boring!

Boring? I agree, writing tests isn’t always the most fun; if you want, you can try writing your tests as puns.

I have no idea where to start!

We will cover a roadmap that is open-ended to facilitate discussion. We are always here to help!  Want to read more about QA and automated testing? Check out how automated testing saves QA jobs.


WordPress and Drupal

President of Mobomo, Ken Fang, recently sat down with Clutch for a Q and A about all things WordPress and Drupal.

What should people consider when choosing a CMS or a website platform?

They should probably consider ease of use. We like open-source because of the pricing, and pricing is another thing they should take into account. Finally, for us, a lot of it revolves around how popular that particular type of technology is. Being able to find developers or even content editors that are used to that technology or CMS is important.

Could you speak about what differentiates Drupal and WordPress from each other?

Both of them are open-source platforms, and they’re probably the most popular CMS’s out there. WordPress is probably the most popular, with Drupal running a close second. Drupal is more popular in our federal space. I think the main difference is that WordPress started off more as a blogging platform, so it was typically for smaller sites. Whereas Drupal was considered to be more enterprise-grade, and therefore a lot of the larger commercial clients and larger federal clients would go with Drupal implementation.

They’ve obviously both grown a lot over the years. We’re now finding that both of the platforms are pretty comparable. WordPress has built a lot of enterprise functionality, and Drupal has built in a lot more ease of use. They’re getting closer and closer together. We still see that main segregation, with WordPress being for smaller sites, easier to use, and then Drupal for more enterprise-grade.

Could you describe the ideal client for each platform? What type of client would you recommend each platform for?

Definitely on the federal side, Drupal is a much more popular platform. Federal and enterprise clients should move to the Drupal platform, especially if they have other systems they want to integrate with, or more complex workflow and capability. WordPress we see much more on the commercial side, smaller sites. The nice thing about WordPress is that it’s pretty quick to get up and running. It’s a lot easier for the end user because of its limited capability. If you want to get something up more cost-effectively, that’s pretty simple, WordPress is a good way to go.

Could you speak about the importance of technical coding knowledge when building a website on either platform, from a client’s perspective?

Most of these main CMS’s are actually built in PHP, and most of them have a technology stack that requires different skillsets. So, on the frontend side, both of them require theming. It’s not only knowing HTML, CSS, and JavaScript, but it’s also understanding how each of the content management systems incorporate that into a theme. You usually start off with a base theme, and then you customize it as each client wants. As such, you need either WordPress or Drupal themers to do that frontend work. For any backend development, you do need PHP developers. For Drupal, it’s called modules. There are open-source modules that people contribute that you can just use, you can customize them, or you can even build your own custom modules from scratch. For WordPress, they’re called plugins, but it’s a very similar process. You can incorporate a plugin, customize it, or write your own custom plugin.

In between all of this, because it is a content management framework and platform, there are site builders or site configurators. The nice part about that is that you can literally fire up a Drupal website and not have to know any PHP coding or whatever. If you’re just doing a plain vanilla website, you can get everything up and running through the administrative interface. A Drupal or WordPress site builder can basically do that, provided they are savvy with how the system actually works from an administration standpoint. So, those are the technical skills that we typically see, that clients would need to have. In many cases, we’ll build out a website and they’ll want to maintain it. They’ll need somebody in-house, at least a Drupal site builder or a themer, or something like that.

Do you have any terms or any codes that clients should be aware of or should know prior to trying to launch a project in Drupal or WordPress?

PHP is definitely the main language they should know, and then HTML, JavaScript, and CSS for the frontend stuff. Drupal 8 has some newer technologies. Twig is used for theming as an example, so there’s a set of technologies associated with Drupal 8 they need to know as well.

Is there a particular feature of WordPress or Drupal that impressed you and potential users should know about?

I’m going to lean a little more into the Drupal world because a lot of people are starting to move to Drupal 8, which was a big rewrite. There are now a lot of sites starting to use that in production. They did quite a bit of overhaul on it. It is more API-driven now. Everything you do in Drupal 8 can be published as a web service. You can even do a lot of what they call headless Drupal implementations. That means you can use some of the more sexy frameworks, like Angular or React, to build out more intricate frontends, and still use Drupal as a CMS, but really as a web service.

Are there any features of the two platforms that could be improved to make it a better CMS?

I think they’re pretty evolved CMS’s. On both of them, platforms are getting into place to build right on the CMS’s without having to install them. Platforms like Acquia, WordPress.com, Automattic. These platforms are profitable because, from an enterprise standpoint right now, it is hard doing multisite implementations at that scale, managing all of the architecture, and stuff like that. From a technical standpoint, if you get an enterprise client who says they want to be able to run a thousand sites on a single platform, that becomes difficult to do. They both have the ability to support multisite implementations, but advancements that make those types of implementations easier to use and deploy would be a significant step forward for both platforms.

What should companies and clients expect in terms of cost for setting up a website, maintaining it, and adding new features?

For a very basic site, where you’re just taking things off the shelf – implementing the site with a theme that’s already built, and using basic content – I would say a customer can get up and running anywhere from two to six weeks, $20,000-30,000. Typically, those implementations are for very small sites. We’ve seen implementations that have run into the millions, that are pretty complex. These are sites that receive millions of hits a day; they have award-winning user experience and design, custom theming, integration with a lot of backend systems, etc. Those can take anywhere from six to twelve months, and $500,000 to $1 million to get up and running.

Can you give some insight into SEO and security when building a website?

The nice thing about Drupal and WordPress is that there are a lot of modules and plugins that will manage that, from Google Analytics to HubSpot, all sorts of SEO engines. You can pretty much plug and play those things. It doesn’t replace the need for your traditional content marketing, analyzing those results and then making sure your pages have the appropriate content and keywords driving traffic into them, or whatever funnel you want. All your analytic tools usually have some sort of module or plugin, whether it’s Google, Salesforce, Pardot, or whatever. A lot of those things are already pretty baked in. You can easily get it up and running. That’s the nice thing about the SEO portion of it.

The other nice thing about it being open-source is that there are constant updates on sort of security. Using these CMS systems, because they tie to all the open-source projects, if you download a module, anytime there’s a security update for it, you’ll get alerted within your administrative interface. It’s usually just a one-click installation to install that upgrade for security patches. That’s nice, as you’re literally talking hundreds of thousands of modules and millions of users. They’re usually found and patched pretty quickly. As long as you stay on that security patching cycle, you should be okay. You could still do stupid stuff as an administrator. You could leave the default password, and somebody could get in, so you still have to manage those things. From a software perspective, as long as you’re using highly-active, contributed modules and the core, security patches and findings come out pretty regularly on those things.

As a company, because we do stuff with some regulated industries like banking and federal agencies, we usually have to go a level above on security. Take a WordPress site or whatever: we would actually remove that from the public so it couldn’t be hit from outside of a VPN or internal network, and then have it publish out actual content and static pages so the outside just doesn’t even connect to the back-end system. That does take some custom programming and specialty to do. Most people just implement your regular website with the appropriate security controls, and it’s not a big issue.

Are there any additional aspects of building a website or dealing with a CMS that you’d like to mention? Or any other CMS platforms you’d like to give some insight on?

For us, because we are such a big mobile player, we typically would say that, whatever you build in your CMS, obviously focus on user experience. Most people are doing a good job of that these days. One of the areas that is still a little weak is this whole idea of content syndication. There’s still a big push where the content editors build webpages, and they want to control the layout, pages, etc. They get measured by the number of visitors to the website and all that stuff. I’m not saying that’s not important; however, we’re trying to push an idea of web service content syndication. So, how do you use these CMS’s to do that, so your content gets syndicated worldwide? It doesn’t necessarily have to be measured by how many people hit your website. It should be measured by the number of impressions.

For instance, with the work we’ve done at NASA, they announced the TRAPPIST-1 discovery of potential Earth-like planets. That drove a huge amount of traffic to the website, probably close to nine million hits that day. If you look at the actual reach of that content and NASA’s message – through the CMS’s integration with social media, with API’s that other websites were taking, with Flickr, that sort of thing – it hit over 2.5 billion social media posts. That’s an important thing to measure. How are you using your content management system more as a content syndication platform, opposed to just building webpages? USGS has also done a really solid job of this ‘create once, publish everywhere’ philosophy. I think people should be looking at content management systems as content management systems, not as website management systems.

We ask that you rate Drupal and WordPress on a scale of 1 - 5, with 5 being the best score.

How would you rate them for their functionalities and available features?

Drupal – 5 – We have a bias towards Drupal because it’s more enterprise-grade. It fits what a lot of our clients need. I think they’ve come a long way with both the 7 and 8 versions and have really brought down the cost of implementation and improved the ease of use.

WordPress – 4 – I think it’s fantastic. It’s obviously extremely popular and very easy to set up and use. I give it a 4 and not a 5 because it’s not as easy to extend to enterprise-grade implementations. For some functionalities, you still have to dig into core, and nobody wants to be modifying core modules.

How would you rate them for ease of use and ease of implementation?

Drupal – 4.5 for ease of use, because it’s not as easy as WordPress, and 4.5 for ease of installation.

WordPress – 5 for ease of use, and 4 for ease of implementation. If you want to go out of the box, it’s a little more difficult. Configuring multisite is a real difficulty in WordPress.

How would you rate them for support, as in the response of their team and the helpfulness of available online resources?

Drupal – 4

WordPress – 4

Being open-source projects, there are a ton of people contributing. They’re very active, so you usually can get your answers. In many cases, to get something embedded into core, it does have to get reviewed by the organization, which is a bunch of volunteers for the most part. Because of that, it does take a while for things to get embedded.

How likely are you to recommend each platform for a client?

Drupal – 5

WordPress – 5

I think they’re the strongest CMS’s out there for the price.

How likely are you to recommend each platform for a user to build their own DIY website?

Drupal – 3

WordPress – 4  

If you’re going to build your own website, and you have zero technical skills, you might want to look into a Weebly, Wix, or something like that. There is a need to know how to do site-building if you use Drupal or WordPress. Somebody has to configure it and understand it.

How would you rate your overall satisfaction collaborating with each platform?

Drupal – 5

WordPress – 5

We implement on both of them regularly, and they’re really great. They solve the need for a lot of our clients to migrate from much more expensive legacy systems.

Clutch.co interview: https://clutch.co/website-builders/expert-interview/interview-mobomo-drupal-wordpress



InnovateIT Awards Announced

“The 2017 AFCEA Bethesda InnovateIT Awards recognizes the best in Government-wide InITiatives celebrating individuals or groups whose contributions in information technology have significance beyond their organizations. These contributions represent achievements that advance business and citizen interaction, leading to improved effectiveness, cost-savings and leadership that meet national priorities and serve as a model of excellence government-wide.”

Mobomo has been working alongside many federal agencies over the years, helping to integrate the latest technologies to ensure the federal government has the most secure systems while being cost effective, saving millions each year. We are pleased to acknowledge that two of our long-term partners have been named as finalists and award winners for this year’s 2017 InnovateIT Awards. Tim Woods, Web Re-Engineering Project Lead from USGS, won the Technology Trailblazer Award, and Ian Sturken, Web and Cloud Services Manager and Enterprise Application Architecture Co-Lead from NASA, won the Mission Excellence Enabler Award.

Tim Woods, the Web Re-Engineering Project Lead, along with his team (WRET), is the visionary with a unique capability to combine deep technical understanding with executive stakeholder and business needs. He was able to work with executive leadership to define the vision and business needs of the new system, unite stakeholders and end users across multiple USGS regions and offices, and lead a user-centric design and agile software development team to deliver on a very aggressive project timeline.

The challenge faced by the US Geological Survey’s (USGS’s) Web Re-engineering Team (WRET) was to make the vast amount of scientific data and research easily accessible and searchable for the general public through an agency-wide website. This information included natural hazards, natural resources, ecosystems and the environment. The information impacts important business decisions, from infrastructure (e.g. impact of water erosion on bridges) to agriculture (e.g. predicting droughts or floods in specific locations) to hazard response and mitigation (e.g. improving ability to predict tornados, earthquakes and other natural hazards).

Previously, this information was stored in siloed databases, servers, spreadsheets and other resources, and there was no true understanding or inventory of what was available and no way for anyone outside of the agency to find or use this information. Making this information centralized and more readily available allows citizens, industry, and other government agencies to make informed decisions about the world around them and to develop innovative solutions for preparing for potential threats and changes that impact human lives. We needed to launch the site within 6 months.

USGS WRET worked with a team of experts across the agency to develop the content, technical and mission-driven strategies for meeting this challenge. The website and CMS were built using agile methodologies, open-source software (Drupal) and hosted in an Amazon Web Services (AWS) cloud environment. USGS worked with an AWS-certified vendor that also provided Certified ScrumMasters (CSMs) for project and program management. The team launched USGS.gov in under five months, and the team continues to deploy new features in regular intervals based on USGS requirements using Scrum. An information architecture was designed to organize science information from hundreds of sources within one website and navigation system. A robust taxonomy structure allows content managers to use a “Create Once Publish Everywhere” (COPE) philosophy facilitating content distribution throughout the site.

To free up scientists to focus on science and minimize website-related tasks, the following tools were built: 1) an automated migration tool that allows a new microsite to be set up rapidly by completing a simple form in the CMS, and 2) a custom ElasticSearch, LogStash, Kibana (ELK) module that ingests data from multiple internal and external sources and can be configured within the Drupal CMS. The USGS team delivers in-person and online training and maintains a training website with additional materials, updates and staff access.

This project aligned with the agency mission and with what USGS was trying to accomplish as a whole. It provides a forward-thinking approach that not only makes science provided by USGS easily accessible and searchable to the public, but also uses advanced technical solutions for keeping content updated and, in some cases, providing near real-time data for natural hazard events.

The cloud-based, Drupal framework solution has allowed multiple internal USGS departments to reduce costs on maintaining websites and data repositories that are now managed through the centralized content management system and AWS cloud infrastructure on the USGS.gov website. As additional science centers move into the new framework, those cost savings will increase.

Additionally, science centers are able to focus both funding and resources on their important research and initiatives and to more rapidly make this information accessible to the public from a central, easily searchable website. This information enables innovative solutions to problems that impact lives and livelihoods and empowers citizens to better understand the Earth and its processes from global, regional and very local perspectives.

For a complete list of award winners visit AFCEA.

 


Washington Business Journal

Best Places to Work Announced!

We are ecstatic to have been named one of the Best Places to Work by the Washington Business Journal! On June 22nd, the Journal had their annual event to honor and announce the rankings of the companies that were named as a best place to work in the greater Washington, D.C. area. Mobomo was named as the 12th best place to work, out of 85 total companies.

The Washington Business Journal's 11th annual Best Places to Work program honors 85 Greater Washington companies that scored highest among hundreds of employers that participated in Omaha, Nebraska-based Quantum Workplace’s annual employee engagement survey.

The Best Places to Work results are quantitative, based on survey responses from employees themselves, rather than a panel of outside judges.

“The Washington Business Journal is owned and operated by American City Business Journals, the nation’s largest publisher of metropolitan business newspapers. American City Business Journals also includes Bizjournals, the new media division, which operates the Web sites for each of the company’s 43 business journal markets.

The Washington Business Journal has been Greater Washington’s leading source of business news and information for 30 years, providing over 150,000 business executives with comprehensive news on local people and their companies, as well as industry trends, tips and strategies and award-winning critical analysis. For more information, please visit www.washingtonbusinessjournal.com.”

Here at Mobomo, it’s fair to say that no two people are the same. We encourage creativity and thinking outside the box, and we love hiring folks from different backgrounds and experiences; that’s what makes our culture ours.

We hire people who bring awesome to everything that they do, which in turn makes Mobomo awesome. Each employee brings something to the company which makes Mobomo a great place to work.

 



NASA Wins Webby Award

NASA.gov, the agency's primary website, received its ninth People's Voice Award in the Government & Civil Innovation category!

NASA.gov, led by Brian Dunbar, NASA’s Internet Services Manager, continues to incorporate cutting-edge technology solutions to communicate the excitement of exploration to the global online public. NASA’s commitment to innovation has been the foundation for NASA.gov’s continued success and solidifies its position as one of the most visited websites in the federal government.

NASA's Office of Communications has managed NASA.gov, the agency's primary home on the web since 1994, setting a high standard for government online communications. The site won Webby Awards in 2003, 2012 and 2014, and visitors to NASA.gov have voted it the winner of the People's Voice award eight times since 2002.

The site receives an average of more than 300,000 visits a day, and surges with major announcements, such as the discovery of the first known system of seven Earth-size planets around a single star, which brought in 6.7 million visits in a week.

Mobomo has been thrilled to be part of an award-winning team at NASA - we are excited to see what’s in store for the future.

 


The Mobomo Way

You can think of your project manager as your liaison and main point of contact. Here’s what you can expect from them in terms of communication:

As project managers, one of the key concepts we discuss with our clients during the project kickoff is the “iron triangle”, or “triple constraint.” Both of these terms directly relate to the scope, cost, and schedule of a project. Think about the scope, cost, and schedule as the three edges, surrounding quality. If any one of these edges falls short, the entire quality of the project is affected. Thus, the importance of project management to keep all sides of your project on track!

Another key concept we embrace at Mobomo is Scrum (and we’re all Certified Scrum Masters!). If you’re unsure of what we mean by Scrum, we’re not talking rugby here.

Scrum is an Agile framework for completing complex projects. Scrum originally was formalized for software development projects, but it works well for any complex, innovative scope of work. The possibilities are endless.

Generally speaking, Scrum allows us to…

While it might sound too good to be true, Scrum is deceptively simple! Here are some key pieces of the Scrum process you’ll hear about and be involved in (if you’d like)...

Now that we’ve talked about the key pieces of our process and methodology, let’s chat tools! The following are tools that you’ll have access to with us throughout your project…

[Images: JIRA, Basecamp, and Slack]

Do you have a project or a question about our project management process? Get in touch so we can get started!

 

 
