Wednesday, October 4, 2017
Thursday, May 25, 2017
Applying machine learning to study and do capacity planning
ML algorithms and libraries have evolved, and the range of problems they can be applied to has grown a lot. Here I try to apply one such technique to study performance metrics and then estimate the capacity required at a future load.
Let's take a simple dataset of page views and heap usage.
Example:
page views, heap usage (MB)
105,637
110,638
115,640
120,642
125,644
130,646
135,648
140,650
145,652
Since heap usage is a continuous value rather than a class label, this is a regression problem, not a classification one, so regression models can be used. scikit-learn (sklearn), one of the coolest ML libraries, provides several of them.
Let's take the LinearRegression model and try to fit it to the dataset.
The logic to implement the model is:
- load the dataset into arrays (the features are used as-is, without transformations such as squaring)
- split it into training and validation sets, holding out 20% for validation
- choose LinearRegression as the model
- fit the model on the training set so that it learns the trend
- get the score (R²) to verify how well the model fits the data
- get the coefficient and intercept of the fitted model
- predict the validation set and compare the predictions with the actual values
- if all is good, predict the heap usage for a future, higher load
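The steps above can be sketched without scikit-learn as a hand-rolled ordinary least squares fit in Java. This is an illustrative stand-in, not the code behind the UI: the class HeapRegression and its methods are hypothetical, and the validation split simply holds out the last 20% of rows instead of shuffling them randomly.

```java
import java.util.Arrays;

/**
 * Single-feature linear regression (ordinary least squares) following the steps above.
 * NOT scikit-learn: a plain-Java stand-in. Class and method names are hypothetical.
 */
public class HeapRegression {

    /** Fitted line: heapMb = intercept + coef * pageViews. */
    static final class Model {
        final double coef, intercept;
        Model(double coef, double intercept) { this.coef = coef; this.intercept = intercept; }
        double predict(double pv) { return intercept + coef * pv; }
    }

    /** Least-squares fit of y = intercept + coef * x. */
    static Model fit(double[] x, double[] y) {
        double mx = Arrays.stream(x).average().orElse(0);
        double my = Arrays.stream(y).average().orElse(0);
        double sxy = 0, sxx = 0;
        for (int i = 0; i < x.length; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
        }
        double coef = sxy / sxx;
        return new Model(coef, my - coef * mx);
    }

    /** R^2, the measure that sklearn's LinearRegression.score() reports. */
    static double score(Model m, double[] x, double[] y) {
        double my = Arrays.stream(y).average().orElse(0);
        double ssRes = 0, ssTot = 0;
        for (int i = 0; i < x.length; i++) {
            double r = y[i] - m.predict(x[i]);
            ssRes += r * r;
            ssTot += (y[i] - my) * (y[i] - my);
        }
        return 1 - ssRes / ssTot;
    }

    public static void main(String[] args) {
        double[] pv   = {105, 110, 115, 120, 125, 130, 135, 140, 145};
        double[] heap = {637, 638, 640, 642, 644, 646, 648, 650, 652};

        // Hold out 20% (rounded up: 2 of 9 rows) for validation.
        // Assumption: the last rows are held out; sklearn's train_test_split shuffles by default.
        int nVal = (int) Math.ceil(pv.length * 0.2);
        int nTrain = pv.length - nVal;
        double[] xTrain = Arrays.copyOfRange(pv, 0, nTrain);
        double[] yTrain = Arrays.copyOfRange(heap, 0, nTrain);

        Model m = fit(xTrain, yTrain);
        System.out.printf("train score (R^2): %.4f%n", score(m, xTrain, yTrain));
        System.out.printf("coef: %.4f MB per page view, intercept: %.2f MB%n", m.coef, m.intercept);

        // Compare predictions with the held-out actual values.
        for (int i = nTrain; i < pv.length; i++) {
            System.out.printf("pv=%.0f actual=%.0f predicted=%.2f%n", pv[i], heap[i], m.predict(pv[i]));
        }

        // Extrapolate to a hypothetical future load of 200 page views.
        System.out.printf("predicted heap at 200 pv: %.1f MB%n", m.predict(200));
    }
}
```

On this data the fitted slope is 265/700, roughly 0.379 MB per additional page view, and the two held-out points are predicted within about half a megabyte. Extrapolating far beyond the observed range assumes the trend stays linear, which heap usage often does not.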
Once this was done, I built wrappers and a simple UI.
UI Page 1: Uploads the feature file, i.e. the metrics file.
UI Page 2: Shows the results: the correlation between the metrics, the validation set alongside the predicted values, and finally the predicted output, i.e. the heap memory required for the given page views.
Although this is a simple demonstration, more features can be added.
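The post does not say which correlation measure UI Page 2 uses; assuming Pearson's r, a minimal Java sketch over the sample data looks like this (MetricCorrelation is a hypothetical name). A value close to 1 means heap usage rises almost linearly with page views, which is what makes a linear model a reasonable choice here.

```java
/**
 * Pearson correlation between two metrics. Assumption: this is the kind of
 * "correlation between the metrics" UI Page 2 shows. Class name is hypothetical.
 */
public class MetricCorrelation {

    /** Returns Pearson's r for two equal-length series. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two series of equal length >= 2");
        }
        double mx = 0, my = 0;
        for (int i = 0; i < x.length; i++) { mx += x[i]; my += y[i]; }
        mx /= x.length;
        my /= y.length;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;   // co-variation
            sxx += dx * dx;   // variation of x
            syy += dy * dy;   // variation of y
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] pv   = {105, 110, 115, 120, 125, 130, 135, 140, 145};
        double[] heap = {637, 638, 640, 642, 644, 646, 648, 650, 652};
        System.out.printf("pearson r(pv, heap) = %.4f%n", pearson(pv, heap));
    }
}
```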
Sunday, April 23, 2017
Some useful Linux commands
Below are some useful Linux commands for diagnosing issues.
To see processes, their parent/child hierarchy and their resource usage, we can use pstree, ps, top -H, etc.
Example:
pstree
init─┬─Xvnc
     ├─crond
     ├─firefox───9*[{firefox}]
     ├─gnome-terminal─┬─bash
     │               ├─gnome-pty-helpe
     │               └─{gnome-terminal}
     ├─gnome-terminal─┬─bash───su───bash───startWebLogic.s───java───103*[{java}]
ps -e f
132 ? Sl 0:01 gnome-terminal
137 ? S 0:00 \_ gnome-pty-helper
138 pts/1 Ss 0:00 \_ bash
353 pts/1 S 0:00 | \_ su
357 pts/1 S+ 0:00 | \_ bash
460 pts/1 S 0:00 | \_ /bin/sh ./startWebLogic.sh
510 pts/1 Sl 26:44 | \_ /...../jdk/bin/java -server -Xms256m -Xmx1024m -Dwe
993 pts/0 Ss+ 0:00 \_ bash
while ps -ef shows the full command line of each process, along with its parent process ID (PPID) and CPU usage (the C column).
For memory figures such as the RSS size, use ps -eF or ps -eo pid,ppid,%cpu,rss,cmd.
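The PID/PPID relationship that ps -ef and pstree display can also be read from inside a JVM. The sketch below assumes Java 9 or later (for the ProcessHandle API) and prints roughly the PID, PPID and command columns; ProcList is a hypothetical class name, and the command of another user's process may show as "?" because the JVM is not allowed to read it.

```java
/**
 * Prints PID, parent PID and command for every process visible to this JVM,
 * a rough Java 9+ analogue of some "ps -ef" columns. Class name is hypothetical.
 */
public class ProcList {

    /** Formats one process as "PID PPID COMMAND"; unknown values print as "?". */
    static String describe(ProcessHandle ph) {
        String ppid = ph.parent().map(p -> Long.toString(p.pid())).orElse("?");
        String cmd = ph.info().command().orElse("?");
        return ph.pid() + " " + ppid + " " + cmd;
    }

    public static void main(String[] args) {
        System.out.println("PID PPID COMMAND");
        ProcessHandle.allProcesses()
                .sorted((a, b) -> Long.compare(a.pid(), b.pid()))
                .forEach(ph -> System.out.println(describe(ph)));
    }
}
```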
To list all the files a process currently has open, use lsof:
/usr/sbin/lsof -p 510
Here the WebLogic server, which is a JVM, had 2K+ files open. An example entry is below:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 510 root cwd DIR 202,2 4096 5657417 .../DefaultDomain
FD is the file descriptor number, or one of these codes:
cwd current working directory;
mem memory-mapped file;
mmap memory-mapped device;
pd parent directory;
rtd root directory;
tr kernel trace file
TYPE is the file type, e.g. REG (regular file) or DIR (directory).
NODE is the inode number.
DEVICE shows the device numbers as major,minor (202,2 in the example above).
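On Linux, the numbered file descriptors that lsof reports are also visible as symbolic links under /proc/<pid>/fd. The sketch below (hypothetical class OpenFds) lists them for a given PID. It does not reproduce lsof's cwd, rtd or mem entries, and inspecting another user's process, such as a WebLogic JVM running as root, needs matching privileges.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Lists the open file descriptors of a process via /proc/<pid>/fd (Linux only).
 * Covers numbered descriptors only, not lsof's cwd/rtd/mem rows. Class name is hypothetical.
 */
public class OpenFds {

    /** Prints each fd and its target, and returns how many fds were found. */
    static int listFds(String pid) throws IOException {
        Path fdDir = Paths.get("/proc", pid, "fd");
        int count = 0;
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path fd : fds) {
                String target;
                try {
                    target = Files.readSymbolicLink(fd).toString();
                } catch (IOException e) {
                    target = "?"; // the fd may have been closed while we were listing
                }
                System.out.println(fd.getFileName() + " -> " + target);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to this JVM itself; pass a PID (e.g. 510) to inspect another process.
        String pid = args.length > 0 ? args[0] : "self";
        System.out.println("open fds: " + listFds(pid));
    }
}
```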
There are a couple of other useful commands for debugging issues in a JVM, such as top -H -p <pid>, which shows per-thread CPU usage for the given process.
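top -H -p <pid> shows each thread's native ID in decimal. One common follow-up, not covered in this post, is to convert that ID to hexadecimal and match it against the nid=0x... field printed for each thread in a Java thread dump (for example from jstack <pid>). A minimal helper for the conversion (TopThreadId is a hypothetical name):

```java
/**
 * Converts a decimal thread ID from "top -H -p <pid>" into the "nid=0x..." form
 * used in Java thread dumps. Class and method names are hypothetical.
 */
public class TopThreadId {

    static String toNid(long tid) {
        return "nid=0x" + Long.toHexString(tid);
    }

    public static void main(String[] args) {
        // Pass one or more thread IDs copied from the PID column of "top -H -p <pid>".
        for (String arg : args) {
            System.out.println(arg + " -> " + toNid(Long.parseLong(arg)));
        }
    }
}
```

For example, a busy thread shown by top -H as 12345 appears in the thread dump as nid=0x3039.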
strace is another useful command for seeing what a process is doing at the OS level: it traces the system calls the process makes, such as socket connections, reads and writes.
netstat with options such as -nap shows the connections, their states and owning processes, along with the Recv-Q and Send-Q sizes. A Recv-Q that stays non-zero means the program is not reading its data fast enough, while a Send-Q that stays non-zero means the remote side is not acknowledging data, which points to a slow network or peer.
Disk space commands such as df -h and du -sh . are useful for checking sizes and free space on local disks and network shares.
free -g shows how much physical memory and swap is in use, and how much is held in buffers/cache.
sar (from the sysstat package) is another great collection of metrics, ranging from processes, memory, swap activity, CPU and load average to disk and network.
To flush the page cache and buffers, write to /proc/sys/vm/drop_caches, as in the example below.
free -m
total used free shared buffers cached
Mem: 15500 15330 169 0 249 11778
-/+ buffers/cache: 3302 12197
Swap: 10047 1704 8342
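echo 1 > /proc/sys/vm/drop_caches --- run as root; this drops the page cache, which is why buffers and cached fall in the output below (echo 2 drops dentries and inodes, echo 3 drops both; running sync first writes out dirty pages so that more of the cache can be freed).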
free -m
total used free shared buffers cached
Mem: 15500 3367 12132 0 0 635
-/+ buffers/cache: 2731 12768
Swap: 10047 1704 8342
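The -/+ buffers/cache line in this free -m output is derived from the Mem: line: used minus buffers and cached gives the memory applications are really using, and free plus buffers and cached gives what is effectively available to them. A small sketch of that arithmetic (FreeMath is a hypothetical name):

```java
/**
 * Reproduces the "-/+ buffers/cache" line of the free -m output above from its Mem: line.
 * Values are in MB as printed by free -m. Class name is hypothetical.
 */
public class FreeMath {

    /** Memory actually used by applications: used minus buffers minus cache. */
    static long usedExcludingCache(long used, long buffers, long cached) {
        return used - buffers - cached;
    }

    /** Memory effectively available to applications: free plus buffers plus cache. */
    static long freeIncludingCache(long free, long buffers, long cached) {
        return free + buffers + cached;
    }

    public static void main(String[] args) {
        // Mem: line before drop_caches, taken from the first free -m output.
        long used = 15330, free = 169, buffers = 249, cached = 11778;
        System.out.println("-/+ buffers/cache: "
                + usedExcludingCache(used, buffers, cached) + " "
                + freeIncludingCache(free, buffers, cached));
    }
}
```

With the numbers from the first output it prints 3303 and 12196, one off from free's 3302 and 12197, most likely because free works from kilobyte values and rounds each figure to MB separately.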
This is useful when you want data to be loaded from disk into memory again (for example after some runtime changes), and when testing something like disk speed, so that reads are not served from the cache.
To check disk speed, the dd command is very useful.
Example write test (a file-to-file copy, so it also reads the source file):
date; time dd if=ip/test.txt of=op/test.txt bs=1024k count=1000
Sun Apr 23 06:46:17 PDT 2017
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 3.15715 seconds, 332 MB/s
real 0m3.164s
user 0m0.000s
sys 0m1.104s
-- date prints the current date and time, and time measures the command. dd copies ip/test.txt to op/test.txt with a block size of 1024k (1 MiB), 1000 times, i.e. 1,048,576,000 bytes (about 1 GB) in total. 'real' is the elapsed time, 3.16 s; the system-mode CPU time was 1.1 s and the user-mode time was essentially zero, so most of the elapsed time was spent waiting on I/O. That gives the reported throughput of 332 MB/s.
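The throughput dd reports is simply the bytes copied divided by the elapsed time, with MB meaning 10^6 bytes. A small sketch that recomputes it from the numbers above (DdThroughput is a hypothetical name):

```java
/**
 * Recomputes dd's reported throughput: bytes copied divided by elapsed seconds,
 * reported in MB of 10^6 bytes. Class name is hypothetical.
 */
public class DdThroughput {

    static double mbPerSecond(long bytes, double seconds) {
        return bytes / seconds / 1_000_000.0;
    }

    public static void main(String[] args) {
        long blockSize = 1024L * 1024L;   // bs=1024k -> 1 MiB per block
        long count = 1000;                // count=1000
        long bytes = blockSize * count;   // 1,048,576,000 bytes
        double seconds = 3.15715;         // elapsed time reported by dd
        System.out.printf("%d bytes in %.5f s = %.0f MB/s%n", bytes, seconds, mbPerSecond(bytes, seconds));
    }
}
```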
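To measure read and write speeds in isolation, one can use /dev/zero, a special character file: reading from it returns an endless stream of zero bytes, and anything written to it is discarded. It can therefore be the input of a pure write test or the output of a pure read test (drop the caches first so that the read really comes from disk):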
date; time dd of=/dev/zero if=test.txt bs=1024k count=1000 --- read speed
date; time dd if=/dev/zero of=test.txt bs=1024k count=1000 --- write speed
Other commands such as ping and traceroute (tracert on Windows) can be used to check network latency and network paths.
Friday, March 3, 2017
Test JMS Connections
Below is a simple program to test a JMS connection. It times the InitialContext creation and the queue connection steps. You can get the queue and connection factory details from the WebLogic console under Services > Messaging > JMS Modules. As prerequisites, you need to create a JMS server, a JMS module, a connection factory and a queue.
import java.util.Hashtable;
import javax.jms.*;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;
public class TestJMS
{
    public final static String JNDI_FACTORY = "weblogic.jndi.WLInitialContextFactory";
    public final static String JMS_FACTORY = "testFactory"; // JNDI name of the connection factory
    public final static String QUEUE = "testQ";             // JNDI name of the queue
    private QueueConnectionFactory qconFactory;
    private QueueConnection qcon;
    private Queue queue;
    public void initialize(Context ctx, String queueName) throws NamingException, JMSException
    {
        long t1 = System.currentTimeMillis();
        qconFactory = (QueueConnectionFactory) ctx.lookup(JMS_FACTORY);
        qcon = qconFactory.createQueueConnection();
        long t2 = System.currentTimeMillis();
        System.out.println("conntime:" + (t2 - t1)); // how long the factory lookup and connection take (ms)
        queue = (Queue) ctx.lookup(queueName); // also verify that the queue's JNDI name resolves
    }
    public void close() throws JMSException {
        if (qcon != null) {
            qcon.close();
        }
    }
    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("Usage: java TestJMS t3://host:port");
            return;
        }
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, JNDI_FACTORY);
        env.put(Context.PROVIDER_URL, args[0]);
        long t1 = System.currentTimeMillis();
        InitialContext ic = new InitialContext(env);
        long t2 = System.currentTimeMillis();
        System.out.println("inittime:" + (t2 - t1)); // how long creating the initial context takes (ms)
        TestJMS qs = new TestJMS();
        try {
            qs.initialize(ic, QUEUE);
        } finally {
            qs.close(); // close the connection even if a lookup fails
            ic.close();
        }
    }
}
Before executing this program, the environment variables (such as CLASSPATH) need to be set. This can be done by sourcing setDomainEnv.sh (. ./setDomainEnv.sh) in the domain's bin directory under the WebLogic home (user_projects/domains/.../bin), and then running the Java code from the same shell, passing the t3 URL as the argument.
Wednesday, February 1, 2017
Java EE 7 APIs
EJB
Session beans (stateful and stateless)
Message-driven beans
Asynchronous local session beans in EJB Lite
Non-persistent timers in EJB Lite
Java Servlets
Non-blocking I/O - for scalability
HTTP protocol upgrade
JSF
The UI framework for web apps; it supports many features such as
rendering, input validation, event handling, data conversion, page navigation,
EL, and newer ones such as HTML5, Faces Flows and resource library contracts.
JSP
JSTL (tag libraries)
Java Persistence API (JPA)
JAX-RS - Java API for RESTful Web Services
Managed Beans
Dependency Injection for Java and Java EE
Bean Validation
JMS
Java EE Connector Architecture
JavaMail API
JACC - Java Authorization Contract for Containers
JASPIC - Java Authentication Service Provider Interface for Containers
Java API for WebSocket
JSON-P - Java API for JSON processing
Concurrency Utilities for Java EE
Batch Applications for the Java Platform
JDBC
JNDI
JavaBeans Activation Framework
JAXP
JAXB (Java Architecture for XML Binding)
JAX-WS
SAAJ - SOAP with Attachments API for Java
JAAS - Java Authentication and Authorization Service
Annotations