When we designed our network, we set out to identify every potential single point of failure and build resilience around it. We addressed the following areas:
- Geographical redundancy - All of our systems are replicated in a second site.
- Switch redundancy - We have 2 separate switches.
- Interconnect failures - All major interconnects, including BT, connect over at least 2 separate links.
- PI space - We hold Provider Independent (PI) address space and run our own BGP routers, which means we control how our IP addresses are reached over the internet.
- IP links - We have 3 separate Gigabit links into the internet.
- Power - All of our racks are dual fed and all equipment has dual PSUs.
- Database - Our database is replicated across 2 separate arrays and split over 2 sites.
The first and most important part of our design strategy was to run our own BGP sessions. This means that the IP addresses Bellingham uses are advertised by our own Border Gateway Protocol (BGP) routers to the other BGP routers on the internet. Each BGP router then decides which of its IP links to use as its first or second choice route to reach our network. If a link fails along the way, BGP automatically routes around it.
Most internet-connected businesses leave BGP to their upstream suppliers and simply purchase transit bandwidth. This leaves them vulnerable to problems on the supplier's BGP sessions or on the physical links between their network and the IP transit provider. Running our own BGP gives us a fully resilient IP bandwidth solution. Another benefit is that we can optimise the IP routing to reach any destination on the internet. If, for example, we are receiving an intermittent service from one of our peers, we can promote a different peer to carry the traffic to its destination. This is only possible if you run your own BGP.
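The failover and peer promotion described above can be sketched in a few lines of Python. This is a simplified illustration only: real BGP best-path selection has many more steps, and the sketch uses just two of them (highest local preference, then shortest AS path). The peer names, preference values and the `Route` and `best_route` names are hypothetical and do not describe our actual configuration.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    peer: str             # hypothetical peer name
    local_pref: int       # higher value is preferred
    as_path_len: int      # shorter path is preferred
    link_up: bool = True  # False when the link to this peer has failed


def best_route(routes: List[Route]) -> Optional[Route]:
    """Return the preferred usable route, or None if every link is down."""
    usable = [r for r in routes if r.link_up]
    if not usable:
        return None
    # Highest local preference wins; ties are broken by the shorter AS path.
    return max(usable, key=lambda r: (r.local_pref, -r.as_path_len))


routes = [
    Route("peer-a", local_pref=100, as_path_len=3),
    Route("peer-b", local_pref=100, as_path_len=4),
    Route("peer-c", local_pref=100, as_path_len=5),
]

# Normal operation: the shortest AS path is chosen.
normal = best_route(routes)

# Link failure: traffic moves to the next best peer automatically.
after_failure = best_route([replace(routes[0], link_up=False)] + routes[1:])

# Promotion: raising the local preference of one peer makes it first choice.
after_promotion = best_route(routes[:2] + [replace(routes[2], local_pref=200)])
```

In this sketch, promoting a peer is simply a matter of raising its local preference, which is the kind of control that is only available when you run your own BGP.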
If one of our datacentres were to go offline, due to a natural disaster or complete power failure, we would still be able to operate all of our services and interconnects from the surviving datacentre. Some services would run at slightly restricted capacity during peak times until the issue was resolved, but no services would go offline.
Rather than taking an N+1 approach to geographical redundancy, we chose to cap each datacentre at 66% of available capacity at peak times. This reduces our maximum normal working load by 34%, but it means that in the unlikely event of one of our datacentres going offline we would be able to run at 80% of peak load capacity. Based on the traffic loading across our network, this would only impact services for about 90 minutes per day. In terms of the percentage of calls that would be rejected over a 24 hour period, this represents about 5%.
This also has the added benefit of allowing us to take on short-term projects that require a peak load capacity of up to 40% of our maximum capacity. TV voting is a typical example of this sort of project.
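As a rough illustration of the arithmetic behind this approach, the sketch below models two equally sized sites, each capped at 66% of its own capacity at peak. The function names and the equal-size assumption are ours for illustration and are not the model behind the figures quoted above, so the result is only expected to land in the same region as those figures.

```python
def surviving_fraction(sites: int, peak_utilisation: float) -> float:
    """Fraction of network-wide peak load the remaining sites can carry
    after losing one site, assuming all sites are the same size."""
    if sites < 2:
        raise ValueError("need at least two sites for geographical redundancy")
    if not 0 < peak_utilisation <= 1:
        raise ValueError("peak_utilisation must be in the range (0, 1]")
    peak_load = sites * peak_utilisation  # in units of one site's capacity
    remaining_capacity = sites - 1        # one site has been lost
    return min(1.0, remaining_capacity / peak_load)


def headroom(peak_utilisation: float) -> float:
    """Spare capacity per site at peak, as a fraction of that site's capacity."""
    return 1.0 - peak_utilisation


print(f"{surviving_fraction(2, 0.66):.0%}")  # 76%
print(f"{headroom(0.66):.0%}")               # 34%
```

Under these simplifying assumptions the surviving site carries roughly three quarters of the combined peak load, and each site keeps about a third of its capacity free in normal operation.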
All of our equipment has 2 Power Supply Units (PSUs), each independently fed from a separate Power Distribution Unit (PDU). The power entering the datacentres is a diverse feed, meaning that it comes from 2 separate substations which are in turn fed from separate power stations. Once inside the datacentre, the power is fed into separate distribution boards and then into the Uninterruptible Power Supply (UPS). The UPS is a large array of batteries designed to run the entire datacentre for around 30 minutes. The UPS is in turn backed up by diesel generators that typically kick in within 2 minutes of a power outage.
The power network is configured as N+1, meaning there is a spare for each element in the power network. The power supply is tested on a monthly basis, with the generators run for 1 hour and different PDUs tested to ensure smooth operation. In the event of an extended power supply issue, the fuel holding tanks carry sufficient diesel to run for 7 days. There are also 24 hour fuel resupply contracts in place to cover us.
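The timings above can be laid out as a simple timeline. The sketch below uses only the approximate figures quoted in the text (around 30 minutes of UPS runtime, generators typically starting within 2 minutes, 7 days of fuel on site); the constant and function names are hypothetical, and the model deliberately ignores the N+1 spares and the resupply contracts.

```python
UPS_RUNTIME_MIN = 30      # approximate battery runtime quoted above
GENERATOR_START_MIN = 2   # typical generator start time quoted above
FUEL_RUNTIME_DAYS = 7     # diesel held in the on-site tanks


def ups_margin_minutes(ups_runtime: float = UPS_RUNTIME_MIN,
                       generator_start: float = GENERATOR_START_MIN) -> float:
    """Battery runtime left over once the generators have started."""
    return ups_runtime - generator_start


def power_source(minutes_since_outage: float) -> str:
    """Which element carries the load at a given point in a mains outage."""
    if minutes_since_outage < 0:
        raise ValueError("minutes_since_outage must not be negative")
    if minutes_since_outage < GENERATOR_START_MIN:
        return "ups"
    if minutes_since_outage < FUEL_RUNTIME_DAYS * 24 * 60:
        return "generator"
    return "generator, dependent on fuel resupply"
```

With these figures the UPS has roughly 28 minutes of runtime in hand when the generators take over, and the on-site fuel covers 168 hours of generator running before resupply is needed.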
Capacity Planning / Network Improvement
Any network or technical platform is a work in progress, and we are constantly evaluating new technologies to find ways to improve ours. We combine this with capacity planning, where we evaluate current usage patterns and make sure we have sufficient capacity for current clients whilst always keeping a little in reserve for new business.
Our account managers regularly meet clients to discuss their traffic forecasts for the next 6 and 12 months so we can build these into our network expansion programme. We aim to run our network at no more than 90% of design capacity at peak times. We do of course have a further 20% capacity in reserve for short-term projects such as TV voting, by utilising our geographical redundancy.
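A minimal sketch of how a forecast could be checked against the 90% target is shown below. The function names, the units (for example concurrent calls) and the sample numbers are hypothetical and are given purely to illustrate the calculation, not to describe our planning tools.

```python
TARGET_UTILISATION = 0.90  # run at no more than 90% of design capacity at peak


def needs_expansion(forecast_peak: float, design_capacity: float,
                    target: float = TARGET_UTILISATION) -> bool:
    """True if the forecast peak would push utilisation above the target."""
    if design_capacity <= 0:
        raise ValueError("design_capacity must be positive")
    return forecast_peak / design_capacity > target


def required_capacity(forecast_peak: float,
                      target: float = TARGET_UTILISATION) -> float:
    """Design capacity needed to carry the forecast peak at the target."""
    return forecast_peak / target


# Hypothetical 6 and 12 month forecasts against a design capacity of 1000.
six_month = needs_expansion(forecast_peak=850, design_capacity=1000)
twelve_month = needs_expansion(forecast_peak=950, design_capacity=1000)
```

In this example the 6 month forecast fits within the target, while the 12 month forecast would trigger an expansion.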
ASR - Answer Seizure Ratio
BER - Bit Error Rate
BGP - Border Gateway Protocol
BPS - Bits Per Second
BW - Bandwidth
CDR - Call Data Record
DB - Database
DID - Direct Inward Dialing
DTMF - Dual-Tone Multi-Frequency
FTP - File Transfer Protocol
HTML - HyperText Markup Language
IN - Intelligent Network
IP - Internet Protocol
ISDN - Integrated Services Digital Network
ISP - Internet Service Provider
IVR - Interactive Voice Response System
LAN - Local Area Network
NGN - Non Geographic Number
PIN - Personal Identification Number
POP - Point of Presence
POTS - Plain Old Telephone Service
PSU - Power Supply Unit
SIP - Session Initiation Protocol
SS7 - Signalling System 7
UPS - Uninterruptible Power Supply
URL - Uniform Resource Locator
VoIP - Voice over IP
WAN - Wide Area Network