I haven't been posting much because I've been deep in the code-woods working on a new lead-gen service that I am partnering with a client on. This is a rewrite of a previous system I had designed. I've been able to pick and choose whatever technologies I wish to work with and that has made all the difference in the world. Building this project got me to thinking about BuSo a lot and how I can best contribute.
We talk a lot here about things we've built. Just take a look around at all the success stories, and sometimes failure stories, that are shared here. Many of them come from some of the best-of-breed marketers out there. We talk about the SEO side and the marketing aspect, which is honestly the point of all of our code and hours spent. I'll be the first to say I don't have a ton to offer as far as SEO and marketing knowledge drops go. Most of you really have that covered in a very eloquent and succinct way. Thank you for that; it's much appreciated and helps me be more effective in my own projects.
With this in mind, what I feel I can offer BuSo is a bit over 30 years of experience as a programmer and "how" I build some of the things I do. I don't have any code-ego issues that I'm aware of and I'm always happy to talk with fellow devs about what languages and techniques they are using. Being at this for 30 years doesn't make me any better than anyone else, but it has definitely given me a lot more time to make a lot more mistakes and learn from them.
Let's Talk about "HOW"
So, I see these great stories of WHAT you've built and WHAT you're doing to market that product/idea, but it's rare I see something about HOW it's built. I think a lot of folks would like even a bit of insight into what it takes to get more than a "handful of scripts" project off and running and maybe even a glimpse into the mind of the developers that build these systems.
I'd like to share with BuSo the general architecture of a system I built and pretty much where my mind is at as a developer these days. I know we have quite a few experienced devs here and you might find this boring. If so, no worries. I'm just hoping to share some insight here and challenge those who might not be doing this type of thing to expand their knowledge, to "up the game of the collective" so to speak.
TLDR Warning!!!
Yes, this is long. Also, this may be very boring to most, especially if you're not into the code side of things. Some of you may be working with the things I describe below and if so, spare yourself and turn back now, lol. For those who continue, I hope this can at least challenge the way you think about a medium/large scale SaaS-type project and how the same mindset can apply to practically any project you can think of.
A bit of background
This is a SaaS that does one thing in life. It provides access to a database of data that is aggregated and scrubbed from about 20 different sites. The data is oftentimes incomplete, so we use services like Pipl and outsourced data companies to help us fill in the blanks, as our customers are interested in complete names, addresses, phone numbers, etc. The scraped data goes through a series of "filters" so that the end result is clean data (kinda like the way Google works, only this is pretty effective). The system is currently in use in-house and has been running for about 5 years with no show-stoppers. I designed the original system in C#, primarily because the only way I could really drive a browser was with either the junk IE component for .NET or Awesomium, so I went with the latter.
And it worked, and worked quite well, but there's a difference between something that "just works" and something I feel good about maintaining, scaling and supporting. It was time to say goodbye to what we had and refactor the whole deal into something web-based. As developers, I feel we should always take a "how can I refactor this" stance with our projects, even if it means going with a completely different approach. Just spiking away at code to get something that barely works, then abandoning it to do the same thing with other features, is going to leave you with a crap project held together with the duct tape of "at least it works" code and a mile-long todo list. By the time you realize it, you'll be doing a complete rewrite the right way anyway. Spike to get it going, then refactor. Refactoring doesn't mean "make everything perfect", it means "make this a house I'd live in". So while we're on the subject of metaphors...
Smoketree and C# - Today on divorce court
So, this was designed "the right way" but hell, let's face it, it was a WinForms app... The front-end was a WinForms app that also "talked" to the back-end (I used RabbitMQ for process interop). I gotta admit, it works well and has never crashed, but times change. We are no longer in a "WinForms" world and haven't been for years, too many to feel that pain anymore. I did what I had to at the time because it was the best compromise between the path of least resistance and providing value to my client. I never enjoyed working with C# and WinForms, although I am pretty proficient at it. I "get" it, I just don't get why I have to write so much code to do so little. This was the last client I was supporting a C# app for (most of my work is web-based), so it was time to have a talk about the future of this project and moving away from any dependency on M$ tech. It was time for a change.
Project "Binary Phoenix"
It was decided that we would expand to a SaaS in 2015. I was once again offered the opportunity to design the system from scratch, using what I felt was best. I have been working with Rails since the beginning and Ruby since about 2003 or so, so naturally I picked Rails for the base framework. I'm a programming language addict. If it's out there, my skill level with it is anywhere in the range of "I've messed with it" to "I'd consider my knowledge of it solid". I have stuck with Ruby because I'm happy with it. It's a joy to code in and I can go from idea to implementation with code so succinct tears are shed.
Let's take a walk-through, shall we?
Front-end:
- Rails 4.2 and all the extra niceties that come with it (CoffeeScript/JavaScript, jQuery, Sass, etc.).
- Bootstrap & Font Awesome
- Jasny and Fuel-UX for certain components. For instance, Jasny makes it easy to do the "off canvas" thing.
- Paloma gem for per-page JavaScript
- Several gems for role-based access control and authentication (devise, pundit and royce).
- Moment.js for date handling
- Websocket-rails gem for push/pull notifications. Will also use this to implement basic chat functionality with customers without having to use bold chat or something like that.
Back-end:
- Linux (Ubuntu): I've used pretty much every distro of Linux, going all the way back to the "Slackware" days. Hell, I got my first taste of UNIX with HP-UX and SCO. Out of all the distros I've worked with, Ubuntu has been the one that gives me the fewest headaches and gets well out of my way when I need to do things differently. I use Ubuntu for pretty much everything. If I need to do something that is all secret squirrel secure, I'll go with CentOS or the like, but I need a damn good reason for it.
- Apache with Phusion Passenger to support Rails: (http://httpd.apache.org/ and https://www.phusionpassenger.com/) This is relatively painless, and I've been working with Apache for quite a few years and am comfortable with it. I'm not entirely sold on Apache for this use case and am considering nginx with Passenger or a combo of nginx and Puma.
- Git for repository: I don't really use GitHub much for storing my code. It's a great thing and I don't mind contributing to the code of others to help the community, but really, I just don't want my code on a server I don't have root access to. Sorry, but that's just how I roll. I have a few "gitolite" servers (https://github.com/sitaramc/gitolite) set up to store my source code. Each time I edit my code and it works, I just commit it to the central repository. When I deploy, my deploy scripts (Capistrano) get the most recent codebase that I have committed and use that for deployment. If you aren't storing pretty much your entire life as a developer in some kind of revision control system, it's time to up your game cause you're doing something horribly wrong.
- Capistrano for deployment: (http://capistranorb.com/) Please don't tell me you still deploy software projects with SFTP, SCP, etc.? Isn't it much nicer to just go to a command line and run "bundle exec cap production deploy" and have your whole project deployed to your server, the server restarted/reloaded (if need be), file permissions straightened out and any other commands you have to run taken care of? Yes it is. It's lovely, so look into Capistrano. You can deploy pretty much anything you want; it doesn't just work with Rails. Just please, don't use the SFTP/SCP route to deploy your stuff anymore. We've already partied like it's 1999, those days are gone.
- Supervisor for process monitoring: (http://supervisord.org/) Supervisor does what it says on the label: it monitors processes and restarts them when they shit the bed. You don't need to write specialized daemon processes unless you have a good reason or you're a masochist. All you have to do is write a script that does the infinite loop thing and sends its output to stdout. Supervisor will take care of logging stdout and stderr and will keep track of the PID for you.
- Ruby processes that take care of scraping: (https://www.ruby-lang.org/en/) Since the sites we're scraping make use of JavaScript calls, I'm using the poltergeist gem, a driver for Capybara that allows me to easily use PhantomJS with Ruby. Some of you might be familiar with PhantomJS from using it in other contexts, such as with CasperJS. It's basically all PhantomJS underneath, just a different wrapper. Find out more about PhantomJS here (http://phantomjs.org/) and poltergeist here (https://github.com/teampoltergeist/poltergeist).
- Compiled Go code for classifying and general maintenance: I seriously dislike working with C and C++ code for things that require speed, I always have. It's not because I don't understand it, it's just because it's really not "pretty" code no matter what. With Go, I can get pretty close to the speed of C/C++, the code is compiled to machine code and I can also do things with concurrency patterns that you wouldn't want to touch with threads. (https://golang.org/). Go is used when I need that shot of nitro.
- MySQL for database: Nothing much to be said here. A good solid database. I shift between MySQL and Postgres mostly. If I'm doing something small, like an app that runs on a Raspberry Pi, I'll use SQLite. (http://www.mysql.com/)
- Redis for all queues and to facilitate process interop: (http://redis.io/) Here's what I hope you take with you, if nothing else. How many times do you have scripts/processes that need to "talk" to each other? If you've tried basic RPC patterns in other languages, you'll know it's seriously a PITA to do this and, yeah, it'll work for the language you wrote it in, but what if you need to communicate with processes written in other languages? Like, in a VERY simple way with no BS flaming hoops to jump through? I'd like to challenge your thinking a bit. How about only having to worry about 2 things? They are:
- Can my language communicate with redis?
- Can my language allow me to easily work with the JSON format?
- Each process initializes and registers itself in redis where the information is stored in hashes. Every process maintains its own state and just passes on the part of its state it wishes to share to redis. This is updated anywhere from every second to every minute, depending on the process and how chatty it needs to be.
- Each process creates an open channel to itself using the pub/sub capabilities of redis. If I want to know the state of another process or operation, I just communicate with the process directly via its channel. The processes never communicate directly with each other, they only need to worry about redis.
- There are many other queues and lists that are used to hold scraping tasks, progress of said tasks and a plethora of other objects of interest.
- Most messages are passed in JSON format. A typical "scrape command" might look something like this:
{"command":"do_scrape","task_id":5150,"listen_after_processed":1}
If a process is asked for its state, the reply looks something like:
{"process_id":"6f3011fc-d386-47ce-872e-8a6adfe81d26","current_action":"sleeping","last_checkin":"1422121755"}
- Since every language I'm using understands the above JSON format and can also communicate with redis, I have a very easy way to communicate with all working parts of the system. So maybe for some reason I have amnesia, forget the pain of PHP code and I decide to write a scraper or some other part of the system in PHP. No problem. My PHP process doesn't really care about the other parts of the system and what language they are written in. The code is only worried about whether it can talk to redis and send/parse the return messages and know how to react.
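To make the message format above a bit more concrete, here's a minimal Ruby sketch of building and parsing those two messages with nothing but the standard library. The field names come straight from the examples; the helper names (build_scrape_command, build_state, handle_message) and the "report_state" command are made up for illustration and are not lifted from the real system.

```ruby
require "json"
require "securerandom"

# Each process gets its own id; a UUID matches the shape of the example state message.
PROCESS_ID = SecureRandom.uuid

# Build a "scrape command" like the one shown in the post.
def build_scrape_command(task_id, listen_after_processed: true)
  JSON.generate({
    "command" => "do_scrape",
    "task_id" => task_id,
    "listen_after_processed" => listen_after_processed ? 1 : 0
  })
end

# Build the state message a process would share when asked.
def build_state(current_action, now: Time.now)
  JSON.generate({
    "process_id" => PROCESS_ID,
    "current_action" => current_action,
    "last_checkin" => now.to_i.to_s
  })
end

# Parse an incoming message and decide how to react.
# "report_state" is a hypothetical command name, not one from the post.
def handle_message(raw)
  msg = JSON.parse(raw)
  case msg["command"]
  when "do_scrape"
    task_id = msg["task_id"]
    "scraping task #{task_id}"
  when "report_state"
    build_state("sleeping")
  else
    "ignored unknown command"
  end
rescue JSON::ParserError
  "ignored malformed message"
end

puts build_scrape_command(5150)
puts handle_message(build_scrape_command(5150))
```

In a live setup these strings would travel through redis (pushed onto a queue or published on a process's channel) rather than being passed around directly. That part is left out here so the sketch runs without a server, and any language with a JSON library could sit on the other end.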
With redis I can just "pop" a task off the queue and rest assured that it won't be given to another process that just happened to ask for the data at the exact same time (yes it happens, albeit rarely). With pub/sub (publish/subscribe) I have a very lightweight way to communicate with any other part of the system at any time. I can have tables in MySQL and sync them with objects in redis that contain the same data. Any time I have to read data, I hit redis first because the lookup is pretty much instant. In short, redis holds it all together and does a beautiful job at it.
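Here's a rough Ruby sketch of the "pop a task off the queue" idea, assuming tasks are stored as JSON strings in a redis list and process state lives in a hash. So it runs without a server, it uses a tiny in-memory stand-in (FakeRedis) that mimics only rpush, lpop, hset and hget; the key names and the drain_queue helper are hypothetical. With a real client, the server runs each LPOP atomically, which is what keeps two workers from getting the same task.

```ruby
require "json"

# In-memory stand-in so the sketch runs without a Redis server.
# It mimics only the commands used below and is NOT thread-safe;
# a real Redis server executes each command atomically.
class FakeRedis
  def initialize
    @lists = Hash.new { |hash, key| hash[key] = [] }
    @hashes = Hash.new { |hash, key| hash[key] = {} }
  end

  def rpush(key, value)
    @lists[key].push(value).length
  end

  def lpop(key)
    @lists[key].shift # nil when the list is empty
  end

  def hset(key, field, value)
    @hashes[key][field] = value.to_s
  end

  def hget(key, field)
    @hashes[key][field]
  end
end

QUEUE_KEY = "scrape:tasks"      # hypothetical key name
STATE_KEY = "process:worker-1"  # hypothetical key name

# Pop tasks until the queue is empty and return the handled task ids.
# A long-running worker under Supervisor would sleep and retry
# instead of returning; output goes to stdout for Supervisor to log.
def drain_queue(redis, out: $stdout)
  handled = []
  while (raw = redis.lpop(QUEUE_KEY))
    task = JSON.parse(raw)
    task_id = task["task_id"]
    redis.hset(STATE_KEY, "current_action", "scraping")
    out.puts "handling task #{task_id}"
    handled << task_id
  end
  redis.hset(STATE_KEY, "current_action", "sleeping")
  handled
end

redis = FakeRedis.new
redis.rpush(QUEUE_KEY, JSON.generate({ "command" => "do_scrape", "task_id" => 5150 }))
redis.rpush(QUEUE_KEY, JSON.generate({ "command" => "do_scrape", "task_id" => 5151 }))
drain_queue(redis)
```

Swapping FakeRedis for a real client object that responds to the same four methods should be the only change needed, though I'd treat that as an assumption to verify against whichever redis library you use.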
This system has been running in pre-production for about 2 weeks now and I don't anticipate anything tragic happening. I've done my homework and built something I feel is ready for prime time. This system will be generating around 10k a month out of the gate, as it's just a rebuild of an old system that has been generating revenue for a while now.
So to end this highly long-winded post, I really hope that some of the above may give some of my fellow devs a few new ideas or even a completely different mindset. Nothing I'm sharing here is particularly new or very innovative, it's just the patterns I've been using the past few years. In other words, I didn't invent any of this, I'm just using the tools above for what I feel is the best purpose. Also I'm sure you "can do this in language X" too. That's what makes things interesting and I'd love to hear about how you're using your favorite tools to build your projects.
So I ask my fellow builders, how do you build what you build?