Resume of Mark Kerzner (mark-at-elephantscale.com)
Hands-on Software Architect, Writer, Trainer, Data Scientist
Skills & Technologies
Skills: Distributed and grid computing, AWS (S3, EC2), Spark, Hadoop,
MapReduce, MapR, Hive, Pig, Sqoop, Flume, HBase, Cassandra,
high-performance multi-threaded applications, data mining, text
analytics, mathematical optimization, linear and dynamic programming, visualization (Tableau).
JSP, Servlets, PHP, Messaging (JMS, Tibco), Web Services, JBoss,
Weblogic, .NET, C#, VB.NET, Visual C++, ASP.
Environment: Linux, Windows, Mac, MySQL, SQLServer, Oracle.
Business domains: eDiscovery, Legal, Energy, Trading.
Performed Spark/Hadoop/Cloud consulting projects for Deloitte, Cognizant, Intel, Cerner, Sutter, Deutsche Telekom,
T-Mobile, GHX healthcare and a number of startups. Total number of Hadoop clusters set up so far: 200+.
Creator of an eDiscovery/Enterprise Search solution, FreeEed (Hadoop, Lucene, Solr, HBase, EC2, S3, text analytics, document classification).
Co-author, trainer for ElephantScale LLC
Co-author, HBase Design Patterns
Co-author, Hadoop Illuminated
Organizer, presenter and hands-on trainer at Houston Hadoop Meetup.
Big Data book reviewer for Manning and Packt
Certified Cloudera Hadoop Administrator
Certified AWS Architect
Elephant Scale, Houston/San Jose,
Managing partner, trainer, software architect. Involved in a number of Big Data /
Hadoop projects in eDiscovery, search, marketing, and training.
Clients included Intuit, Deloitte, Cerner, Cognizant, Intel, Sutter, Bank of America.
Technologies used on these projects: Java, Hadoop, Hive, Pig, HBase, R, Sqoop,
Flume, Maven, Git, ZooKeeper/Exhibitor/Curator, YARN, Storm.
Some of the accomplishments were:
- Advising and implementing customer data collection with Spark/Hadoop analytics
- Project for a quant startup: architected Spark/Hadoop/HBase infrastructure, assisted with
data analysis, such as similarity-set, dimensionality reduction (PCA),
grouping and clustering (K-means), ranking, regressions, correlations, and Tableau visualization of results.
- Smart meter data processing for Texas utilities using Big Data technologies (S3/Hadoop/Spark/Hive/Sqoop/Cascading), with Tableau visualization of results.
- Large healthcare project: architecture overview, project
contributions for HBase, Hadoop, ZooKeeper/Curator/Exhibitor, Chef,
Storm, Java, Clojure.
- Created Cassandra Advanced Data Modeling course for DataStax with Patrick McFadin.
- Created a number of Hadoop and Big Data training courses.
- Created Big Data training for Intel Hadoop Distribution.
11/11 - 07/12
Cision, Chicago, IL
Big Data consultant
Big Data system with Hadoop, HBase, and Cassandra, Lucene, Solr, Java,
R, text analytics (feature extraction, document grouping) using
MapR clusters on EC2. Designs complete architecture, assuring
Nor1, Sunnyvale, CA
Big Data consultant
- Assisted with the addition of Hadoop processing to the IT infrastructure.
- Used Sqoop, Flume, Hive, R for analysis of web site traffic. Data mining: feature extraction, clustering (k-means).
- Designed and implemented the database layer with Java, Spring, MyBatis, Maven, and Dozer.
GHX, Louisville, CO
Big Data consultant
Designed and prototyped Track and Trace for pharmaceuticals, using Scala,
Cassandra, Hadoop, XML, REST, fine-grained access control with
certificates, with a capacity of 1,000-10,000 transactions per second,
with background processes to verify chain of custody and fraud
prevention. Tasks accomplished:
01/11 - 03/11
- Refactored Cassandra-access code, to allow either Hector or
Thrift access (Factory design pattern), replacing the original Thrift
code interspersed throughout the application.
- Designed Hadoop jobs to verify chain-of-custody and look for fraud indications.
- Prepared multi-cluster test harness on EC2 to exercise the system for performance and failover.
ChooChee, Mountain View, CA
Big Data consultant
08/10 - 12/10
- Reviewed the HDFS usage and system design for future scalability and fault-tolerance;
- Reviewed the HBase NameNode/AvatarNode design for failover;
- Wrote MapReduce/HBase jobs.
ExtremeTix, Houston, TX
High performance applications consultant
Developed high-performance cache, making the site stable and improving
Implemented complex discount logic using Drools.
12/08 - 08/10.
HighGate, New Jersey
Designed and developed a high-performance cloud-based eDiscovery system
(Java, Hadoop, SimpleDB, RDS). This amounted to a sophisticated legal search application and included the following:
01/10 - 07/10
- Designed and architected the complete concept from scratch, based on my study of eDiscovery and on my JD work;
- Prototyped the proof-of-concept with Hadoop in two months;
- Created a complete processing engine, based on Cloudera's
distribution, enhanced to include a custom Amazon Machine Image (AMI),
which served as a release unit and contained all the necessary custom
- Created an operator console to start and manage eDiscovery
clusters on EC2; created an operator GUI application, which worked
inside the private cloud and automated all Hadoop and S3 related
The project was later completely re-implemented from
scratch as the open-source project FreeEed, this time based on
Hadoop, and a choice of NoSQL database (HBase, Cassandra, S3, or
Architect at Quiz Revolution,
10/08-08/09 - Senior Developer
Exobox, Houston, TX.
Develops text analysis and business intelligence applications, based on
include web scraping, document conversion, search index creation,
automatic categorization, duplicate detection, using Java technologies
and open-source projects. Later ported complete infrastructure into the
UBS, Houston, TX.
Consultant Developer for Commodities Trading
Develops high-performance trading applications, with the
high-reliability, high-performance, multi-threaded framework based on
- Signal Suite - real-time, high-volume (100k+ messages
per second) data analysis for uses ranging from algorithmic trading to
system monitoring. Based on Esper for ESP (Event Stream Processing) and
CEP (Complex Event Processing).
Merrill Lynch Commodity Trading, Houston, TX.
Senior Developer for eConnect.
- re-engineer the system to improve performance and to bring the GUI to
today's look and feel;
- ICE/eConnect integration;
- integration of third-party trade systems.
- intensive testing, bug detection and fixes; using Swing, Java,
Weblogic, JMS, SQLServer, Hibernate, Linux, Windows.
BaseBase Corporation, Houston, TX.
Senior Software Engineer / Architect.
- a multi-media sharing site with distributed architecture;
- social network with AJAX interactions and multilayer Google Maps
JBoss, JMS, MySQL, Hibernate, AJAX, Linux, Windows.
HyperAlert, Houston, Texas.
defined and implemented new features, improved stability, scalability,
and reliability, until HyperAlert became a leading communications
platform for contacting people by phone, email, web, with real-time
response tracking. Used open-standards architecture with Linux, JBoss,
EJB, JSP, AJAX, VoiceXML, MySQL.
Lateral Data, Houston, Texas.
Architect and Lead Developer.
Designed and implemented a software system
for eDiscovery - unique, massively parallel, scalable.
ODS Petrodata, Houston, Texas.
and improves various aspects of the ODS Petrodata commercial websites.
The sites are used by subscription by oil companies and energy
operators to plan and execute offshore drilling programs. Technologies
used are Java, Weblogic, XML, JSP, Servlets, and SQLServer.
- integrated site search and indexing using open source Lucene,
SHMSoft, Houston, Texas.
Director, lead developer.
Suggests, designs, and implements new software products and
improvements. These include:
Translink, an energy trading optimization planner (VB/Access, C++ advanced
optimization), for Energistics, LLP, http://energisticsllc.com
software package for delivery services with scheduling, dispatching,
payroll, accounting, and web order entry. Currently used by dozens of
people in 4 cities.
07/2001 – 01/2002
Structure Consulting Group, Houston, Texas.
Designs and develops applications for deregulated energy markets.
Java architect/lead developer for the Trade Manager, which keeps track
of energy trading contracts, energy consumption measurements and
financial settlements, and controls risk management. The technologies
used are Java, Swing, J2EE, Oracle, Tibco, PL/SQL.
07/2000 - 07/2001
Coral Energy, Houston, Texas.
Designs and develops applications for on-line energy trading, using
Java/Swing, J2EE, EJB, Weblogic/Oracle, Tibco, Endur.
02/2000 - 07/2000
Emerging.com, Houston, Texas.
Builds commercial B2C and B2B websites. Tasks accomplished:
- www.ashford.com rewrite using Java, Servlets, JSPs, WebLogic, WLCS.
Enron Energy Services, Houston, Texas.
Enron's Common Data Platform (CDP) which brings together all enterprise
data. CDP is based on EJB (the Enterprise JavaBeans specification) and
comprises Java and C++ servers, with C++ and Java clients, ObjectStore
database, communicating through CORBA and XML.
Dresser-Rand, Houston, Texas.
"Global Access", an Internet-based system of remote control over
Shell Oil (BTC), Houston, Texas.
Develops applications for processing and 3-D modeling and visualization
of exploration data (123DI, Spir3DVIP) on UNIX.
Mincom, Pty., Houston, Texas.
OpenWorks/Geolog data server (Java, CORBA, PC, UNIX). Suggests and
develops innovative graphical user interface to database objects. The
interface is based on the JGO++ library, and is used to graphically
configure database mapping.
Petrophysical Solutions, Inc., Houston, Texas
Develops complete novel well log data processing applications in Java,
PC, UNIX, and databases.
12/1995-08/1998 (after 04/1996 continuing part time, at 30 hours/week).
Applied Training Resources, Houston, Texas.
Procedure Maker, a multimedia information management system for
Western Atlas International, Houston, Texas.
Designed and implemented applications for database storage of well log
data. C++/MFC, VB, Windows, UNIX, WIND/U.
Oilware, Inc., Houston, Texas.
Designed and implemented a C++ library of 100+ classes for a new data exchange
standard (RP66 and DLIS). The 20,000-line code base was
designed, implemented, and tested in 1.5 years.
Halliburton Logging Services, Inc., Houston, Texas.
- designed and implemented a prototype for an object-oriented
- implemented parts of client-custom server for multi-user access of
the above database;
- designed and implemented new computer applications using AI and image
Dresser Atlas, Inc., Houston, Texas.
as Systems Analyst, left as Senior Computer Research Specialist.
Received Dresser Industries Golden Creativity Award in 1984.
- new computer applications for log analysis;
- systems for log processing, databases, interactive and hard copy
Hadoop bootcamp, Redwood City, CA, by ScaleUnlimited, 2009
MapR training with Zaloni, Chicago, IL, 2012
School of Law
St. Petersburg University, Russia.
MS in Math, 1978.
St. Petersburg Electrical Engineering Institute.
MS in Computer Science, 1978.
St. Petersburg 239 Lyceum
Certified Hadoop Administrator.
Java Programmer Certification, SUN.
MCSD (C++, VB path) Certification (Microsoft).
Publications & Misc
Reviewer for "Hadoop Operations and Cluster Management Cookbook", 2013, Packt.
Reviewer for "Big Data Analytics with R and Hadoop", 2013, Packt.
Reviewer for "Securing Hadoop", 2014, Packt.
Reviewer for "Practical Data Analysis", 2013, Packt.
Reviewer for "Learning Cassandra for Administrators", 2014, Packt.
Reviewer for "Cassandra Design Patterns", 2013, Packt.
"Professional Java E-Commerce", WROX, 2002.
"Image Processing in Well Log Analysis", Prentice Press, 1985, reprint 2014.
Three US Patents for computer software/well log analysis.
Mensa Member since 1983
IEEE Member since 1980
ABA Member since 2013