T.J. Langford
Wright Lab & YCRC
October 28, 2020
numpy and matplotlib

numpy

import numpy as np
NumPy is the fundamental package for scientific computing with Python. It contains among other things:

- a powerful N-dimensional array object
- sophisticated broadcasting functions
- useful linear algebra, Fourier transform, and random number capabilities

Arrays can be created in a variety of ways. The most common are either empty:
a = np.zeros(10)
a
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
or you can create them from an existing list:
b = np.array([0,1,2,3,4,5,6,7,8,9])
b
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
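A few other common constructors are also worth knowing; a brief sketch:

np.ones(3)            # array([1., 1., 1.])
np.arange(0, 10, 2)   # evenly spaced integers: array([0, 2, 4, 6, 8])
np.linspace(0, 1, 5)  # evenly spaced floats: array([0.  , 0.25, 0.5 , 0.75, 1.  ])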
Arrays have a few key properties:
print(a.dtype)
print(b.dtype)
float64
int64
a.shape
(10,)
c = np.array([[0,1,2,3],[4,5,6,7]])
c
array([[0, 1, 2, 3],
       [4, 5, 6, 7]])
c.shape
(2, 4)
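These properties can also be changed after creation; a quick sketch:

b.astype('float64')   # a float copy of b
b.reshape(2, 5)       # the same data viewed with shape (2, 5)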
Arrays are indexed (starting at 0) and we can slice them:
b
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
b[2:4]
array([2, 3])
b[0:-2]
array([0, 1, 2, 3, 4, 5, 6, 7])
b[::2]
array([0, 2, 4, 6, 8])
We also have "fancy indexing" for n-dimensional arrays:
c[:,2]
array([2, 6])
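Strictly speaking, "fancy indexing" also covers indexing with integer arrays or boolean masks; a short sketch using b from above:

b[[0, 3, 5]]   # integer-array indexing: array([0, 3, 5])
b[b > 5]       # boolean mask: array([6, 7, 8, 9])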
Arrays can be manipulated in-place, without creating a second copy:
b[3] = 10
b
array([ 0, 1, 2, 10, 4, 5, 6, 7, 8, 9])
You can assign a range of values at once, either to a single value or from another array:
a[0:2] = 9
a[3:7] = b[0:4]
a
array([ 9., 9., 0., 0., 1., 2., 10., 0., 0., 0.])
The assignment isn't linked, so changing b now doesn't change a:
b[3] = 10
a
array([ 9., 9., 0., 0., 1., 2., 10., 0., 0., 0.])
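By contrast, a bare slice is a view, not a copy, so it stays linked to the original array; a small sketch with a fresh array:

v = np.arange(5)
w = v[1:4]        # w is a view into v
w[0] = 99
v                 # array([ 0, 99,  2,  3,  4])
v[1:4].copy()     # use .copy() when an independent copy is needed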
We can also act on these arrays with specific operations:
np.multiply(a, b)
array([ 0., 9., 0., 0., 4., 10., 60., 0., 0., 0.])
np.max(b)
10
np.mean(a)
3.1
np.median(a)
0.5
np.sum(c, axis=1)
array([ 6, 22])
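Many of these functions also have operator shorthand, and reductions accept an axis argument; a quick sketch with the arrays defined above:

a * b               # elementwise, equivalent to np.multiply(a, b)
np.sum(c, axis=0)   # sum down the columns: array([ 4,  6,  8, 10])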
Numpy has two main ways of importing and exporting data:
np.savetxt('test.txt', c, fmt='%f', delimiter=',', header='My favorite array')
cat test.txt
# My favorite array
0.000000,1.000000,2.000000,3.000000
4.000000,5.000000,6.000000,7.000000
np.save('test.npy', c)
ls -l test*
-rw-r--r--  1 langford  staff  192 Oct 21 11:22 test.npy
-rw-r--r--  1 langford  staff   92 Oct 21 11:22 test.txt
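The matching import functions read these files back in; a minimal sketch using the files just written:

c_txt = np.loadtxt('test.txt', delimiter=',')   # lines starting with '#' are skipped by default
c_npy = np.load('test.npy')                     # restores the original dtype and shape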
Random numbers in numpy

Numpy has a full suite of tools for generating random numbers, which is very helpful for Monte Carlo simulations or toy data.
Here we will generate 100k random floats from a normal distribution with mean = 2.0 and sigma = 1.0.
r = np.random.normal(loc=2, scale=1, size=100000)
print(r[0:10])
[1.36524821 1.84659083 1.65146673 1.72326769 2.44126153 3.32639604 0.89162056 1.80902509 4.09851638 0.60734383]
We can randomly select elements from an array:
np.random.choice(a, size=2)
array([0., 1.])
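For reproducible "random" numbers you can seed the generator; a small sketch (np.random.default_rng requires numpy >= 1.17):

np.random.seed(42)                  # seed the legacy global generator
rng = np.random.default_rng(42)     # or use the newer Generator API
rng.normal(loc=2, scale=1, size=3)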
All the "heavy lifting" is done in C, so numpy-based work can be very fast.
Calculating pi

We can perform a Monte Carlo-based simulation to calculate pi using two uniform random number generators: points drawn uniformly in the unit square fall inside the quarter circle of radius 1 with probability pi/4, so the fraction that lands inside, multiplied by 4, estimates pi.
def mc_pi(num_trials):
    x = np.random.uniform(low=0.0, high=1.0, size=num_trials)
    y = np.random.uniform(low=0.0, high=1.0, size=num_trials)
    r = x**2 + y**2
    return len(r[r<1])*4/num_trials

for n in [10,100,1000,10000,100000,1000000]:
    print(f"{n}: {mc_pi(n)}")
10: 3.2
100: 3.16
1000: 3.1
10000: 3.1336
100000: 3.14468
1000000: 3.14098
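The statistical error of the estimate shrinks roughly as 1/sqrt(N), which matches the slow improvement seen above; a quick check (output varies run to run):

for n in [100, 10000, 1000000]:
    print(f"{n}: |error| = {abs(mc_pi(n) - np.pi):.5f}")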
matplotlib

import matplotlib.pyplot as plt
x = np.array([0,1,2,3,4])
y = np.power(x, 2)
plt.plot(x, y, marker='o', label='Square');
plt.xlabel('X-values');
plt.ylabel('Y-values');
plt.legend(loc=2);
x = np.array([0,2,4,6,8,10])
y = np.power(x, 2)
r = np.power(x, 3)
plt.scatter(x, y, s=r)
plt.xlabel('X-values'); plt.ylabel('Y-values');
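To keep a figure for later use, it can be written to disk; a minimal sketch (the filename is arbitrary):

plt.savefig('scatter.png', dpi=150)   # write the current figure to a PNG file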
Advanced Numpy

Now that we have a basic familiarity with numpy, we will attempt to process some low-level data produced by a PMT (photomultiplier tube) connected to a scintillator cell.
A few key details about the data file:

- each event begins with a header of four 32-bit unsigned integers, including the channel number, trigger flag, and time stamp
- the header is followed by 300 samples of the digitized trace, stored as little-endian 16-bit unsigned integers
data_file = '../data/180822-EJ309-1700V-Cf252.dat'
f = open(data_file, 'rb')
header = np.fromfile(f, dtype='I', count=4)
print(f"Channel: {header[1]}, Trigger: {header[2]}, TimeStamp: {header[3]}")
trace = np.fromfile(f, dtype='<H', count=300)
plt.plot(trace)
plt.xlabel('Samples'); plt.ylabel('ADC')
Channel: 1, Trigger: 0, TimeStamp: 294763
Text(0, 0.5, 'ADC')
def data_reader(file_name, num_ev=10000):
    header_size = 4
    window_size = 300
    with open(file_name, 'rb') as d:
        # define data structure
        dtype = [('time_stamp','f4'), ('en','f4'), ('psd', 'f4')]
        data = np.zeros(shape=num_ev, dtype=dtype)
        for i in range(num_ev):
            header = np.fromfile(d, dtype='I', count=header_size)
            data['time_stamp'][i] = header[3]
            trace = np.fromfile(d, dtype='<H', count=window_size)
            # Average the first 30 samples to determine what "zero" is
            bl = np.average(trace[0:30])
            # Subtract off the baseline
            trace = np.subtract(trace, bl)
            # Find the minimum of the pulse
            peak_loc = np.argmin(trace)
            # Integrate the full pulse
            data['en'][i] = -np.sum(trace[peak_loc-30:peak_loc+100])
            # Integrate the tail of the pulse, then take the ratio to the full
            data['psd'][i] = -np.sum(trace[peak_loc+10:peak_loc+100])
            data['psd'][i] /= data['en'][i]
    return data
data = data_reader(data_file, num_ev=30000)
<ipython-input-36-a2efc4a279d1>:31: RuntimeWarning: divide by zero encountered in float_scalars
  data['psd'][i] /= data['en'][i]
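The warning comes from events where the full integral en is zero, making the tail ratio undefined. One simple way to handle this after the fact is to mask those events out; a small sketch using the structured array from above:

good = data['en'] > 0   # boolean mask of events with a nonzero pulse integral
data = data[good]       # keep only the well-defined events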
plt.plot(data['time_stamp'][0:100], marker=',')
plt.xlabel('Event Counter');
plt.ylabel('Time Stamp');
plt.hist(data['en'], bins=200, range=(0,100000), histtype='step');
plt.yscale('log');
plt.xlabel('Pulse Integral (ADC)');
plt.ylabel('Counts/bin');
from matplotlib.colors import LogNorm
plt.hist2d(data['en'], data['psd'], bins=50, range=((0,50000),(0,0.5)), norm=LogNorm());
plt.xlabel('Pulse Integral (ADC)');
plt.ylabel('Tail-fraction (%)');
Online Resources:
Further reading (O'Reilly Books available through Yale Library):