Artificial Neural Networks

Artificial Neural Networks (ANNs) are a key part of machine learning. We can see how they work by building a toy example.

This post shows two variations of ANNs, in two languages. First in Python (with the NumPy library) and then in J. These implementations are based on the code in this post.

Both of these languages are high-level, highly dynamic languages. Python's strengths include its extensive set of libraries, and the ease of extending the language both in terms of syntax and by binding to compiled C code. J's major strengths include its fundamental array-oriented paradigm, and the ease of composing its large set of primitive operations.

one-layer

One-layer networks can estimate linearly separable functions. However, they cannot estimate functions that aren't linearly separable.

Linearly separable functions include most of the two-input boolean functions:

input   and  or  x←y
0 0      0   0    1
0 1      0   1    0
1 0      0   1    1
1 1      1   1    1
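"Linearly separable" means a single weighted sum followed by a threshold can compute the function. As a minimal sketch, the weights below are hand-picked (not learned) to implement OR, using the same bias-column layout as the training code that follows:

```python
import numpy as np

# Inputs with a constant bias column, as in the training code below.
X = np.array([[0,0,1], [0,1,1], [1,0,1], [1,1,1]])

# Hand-picked weights: x + y - 0.5 is positive exactly when x OR y is 1.
w_or = np.array([1.0, 1.0, -0.5])

print((X @ w_or > 0).astype(int))  # → [0 1 1 1]
```

Training replaces the hand-picking: gradient descent finds weights like these automatically.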
import numpy as np
X = np.array([[0,0,1], [0,1,1], [1,0,1], [1,1,1]])  # inputs, with a constant bias column
y = np.array([[0,1,1,1]]).T                         # targets: the OR column
np.random.seed(1)
w0 = 2*np.random.random((3,1)) - 1                  # random weights in [-1, 1)
for i in range(10000):
    l1 = 1/(1+np.exp(-(X @ w0)))                    # forward pass: sigmoid of the weighted sum
    l1_err = y - l1                                 # error against the targets
    l1_del = l1_err * l1 * (1 - l1)                 # error scaled by the sigmoid's derivative
    w0 += X.T @ l1_del                              # weight update
print(l1)
[[0.01627589]
 [0.98974255]
 [0.98974254]
 [0.99999822]]
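The `l1 * (1 - l1)` factor in the update is the derivative of the sigmoid: if sig(x) = 1/(1+e^-x), then sig'(x) = sig(x)·(1 - sig(x)). A quick finite-difference check of that identity:

```python
import numpy as np

def sig(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-5, 5, 11)
eps = 1e-6

# Central-difference estimate of the derivative vs. the closed form.
numeric = (sig(x + eps) - sig(x - eps)) / (2 * eps)
analytic = sig(x) * (1 - sig(x))

print(np.max(np.abs(numeric - analytic)))  # a tiny number
```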
input =: 4 3 $ 0 0 1  0 1 1  1 0 1  1 1 1
target =: 4 1 $ 0 1 1 1
dot =: +/ .*          NB. matrix product
sig =: {{ %1+^-y }}   NB. sigmoid: reciprocal of 1 + e^-y
train =: {{
    l1 =. sig input dot y              NB. forward pass; y holds the weights
    l1_err =. target - l1              NB. error against the targets
    l1_del =. l1_err * l1 * 1 - l1     NB. error scaled by the sigmoid's derivative
    y + (|:input) dot l1_del }}        NB. return the updated weights
5j3": sig input dot train^:10000 <:+:?.3 1$0   NB. 10000 updates from random weights in [_1,1)

two-layer

Two-layer networks are more versatile, and can estimate nonlinear functions. The classic example of this is the boolean XOR function:

input   xor
0 0      0
0 1      1
1 0      1
1 1      0

A single neuron can't estimate this function, no matter how hard it tries. However, two layers of neurons in sequence handle it just fine.
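To see the failure concretely, here is a sketch that trains the one-layer network from the previous section on XOR targets. No choice of weights can separate XOR with one linear boundary, so at least one output always ends up off by 0.5 or more:

```python
import numpy as np

X = np.array([[0,0,1], [0,1,1], [1,0,1], [1,1,1]])
y = np.array([[0,1,1,0]]).T   # XOR targets
np.random.seed(1)
w0 = 2*np.random.random((3,1)) - 1
for i in range(10000):
    l1 = 1/(1+np.exp(-(X @ w0)))           # same one-layer network as before
    w0 += X.T @ ((y - l1) * l1 * (1 - l1)) # same delta-rule update
print(np.round(l1, 3))  # at least one output remains far from its target
```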

import numpy as np
X = np.array([[0,0,1], [0,1,1], [1,0,1], [1,1,1]])  # inputs, with a constant bias column
y = np.array([[0,1,1,0]]).T                         # targets: the XOR column
np.random.seed(1)
w0 = 2*np.random.random((3,4)) - 1                  # weights into a hidden layer of 4 neurons
w1 = 2*np.random.random((4,1)) - 1                  # weights into the output neuron
for j in range(10000):
    l1 = 1/(1+np.exp(-(X @ w0)))                    # hidden layer
    l2 = 1/(1+np.exp(-(l1 @ w1)))                   # output layer
    l2_error = y - l2
    l2_delta = l2_error * l2 * (1 - l2)             # output error * sigmoid derivative
    l1_error = l2_delta @ w1.T                      # backpropagate the error to the hidden layer
    l1_delta = l1_error * l1 * (1 - l1)
    w0 += X.T @ l1_delta
    w1 += l1.T @ l2_delta
print(np.round(l2,3))
[[0.007]
 [0.991]
 [0.992]
 [0.01 ]]
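Once trained, using the network takes only the forward pass; the deltas and updates exist purely to find the weights. A sketch that repeats the training above and then wraps prediction in a hypothetical `predict` helper:

```python
import numpy as np

def sig(x):
    return 1 / (1 + np.exp(-x))

def predict(X, w0, w1):
    # Forward pass only: hidden layer, then output layer.
    return sig(sig(X @ w0) @ w1)

X = np.array([[0,0,1], [0,1,1], [1,0,1], [1,1,1]])
y = np.array([[0,1,1,0]]).T
np.random.seed(1)
w0 = 2*np.random.random((3,4)) - 1
w1 = 2*np.random.random((4,1)) - 1
for j in range(10000):                       # same training loop as above
    l1 = sig(X @ w0)
    l2 = sig(l1 @ w1)
    l2_delta = (y - l2) * l2 * (1 - l2)
    l1_delta = (l2_delta @ w1.T) * l1 * (1 - l1)
    w0 += X.T @ l1_delta
    w1 += l1.T @ l2_delta

print(np.round(predict(X, w0, w1)))  # rounds to the XOR targets
```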
input =: 4 3 $ 0 0 1  0 1 1  1 0 1  1 1 1
target =: 4 1 $ 0 1 1 0
dot =: +/ .*          NB. matrix product
sig =: {{ %>:^-y }}   NB. same sigmoid; >: increments, so this is % 1+^-y
train =: {{
    'ignore_me w0 w1' =. y             NB. unpack the boxed state; the first box (last output) is unused
    l1 =. sig input dot w0             NB. hidden layer
    l2 =. sig l1 dot w1                NB. output layer
    l2_error =. target - l2
    l2_delta =. l2_error * l2 * 1 - l2
    l1_error =. l2_delta dot |: w1     NB. backpropagate the error to the hidden layer
    l1_delta =. l1_error * l1 * 1 - l1
    w0 =. w0 + (|:input) dot l1_delta
    w1 =. w1 + (|:l1) dot l2_delta
    l2;w0;w1 }}                        NB. box up the new state
5j3":0{::train^:10000 {{<:+:?.y$0}} each 1 ; 3 4 ; 4 1   NB. 10000 updates, then show the output box
0.009
0.990
0.990
0.011