DAY 1 – I find the architectural side of databases and the logic of organising data very engaging, so I’m looking forward to this class. Our first day was a gentle introduction to the various aspects and key features of NoSQL, and a chance to interact with Seifedine on his first day. I appreciate his practical background in the subject and his engaging lecturing style. The changes to our reflective journal assessments seem positive, further encouraging us to dive into the details that will help us in our final class exam. Doing some research for day one’s tutorial task has helped me grasp the overall concept of NoSQL and how important it is for modern data that is so variable in type and so large in scale.
DAY 2 – This was very much a day of going over the foundational skills of databases and reviewing some of our SQL work from first year. What I remember so clearly from first year is how easy some tasks appear, yet how easy it is for a mistake to go unnoticed. I found myself always triple-checking answers for errors such as selecting the wrong join type and ending up with duplicate rows. Today’s task was simple yet valuable for remembering the syntax.
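To make the join-type pitfall concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The customers and orders tables and their rows are hypothetical. The point is only that an inner join repeats a customer once per matching order, a LEFT JOIN keeps customers with no orders, and DISTINCT collapses the repeats.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 35.0), (12, 2, 15.0);
""")

# INNER JOIN: one row per matching order, so Ada appears twice and Linus not at all.
joined = conn.execute("""
    SELECT c.name FROM customers c
    JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

# LEFT JOIN: keeps customers without orders; their order columns come back as NULL (None).
left_joined = conn.execute("""
    SELECT c.name, o.id FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

# DISTINCT removes the repeated customer rows produced by the join.
distinct_names = conn.execute("""
    SELECT DISTINCT c.name FROM customers c
    JOIN orders o ON o.customer_id = c.id
    ORDER BY c.name
""").fetchall()

print(joined)          # [('Ada',), ('Ada',), ('Grace',)]
print(left_joined)     # [('Ada', 10), ('Ada', 11), ('Grace', 12), ('Linus', None)]
print(distinct_names)  # [('Ada',), ('Grace',)]
```

Counting customers straight off the inner join would report three instead of two, which is exactly the kind of quiet error that is easy to miss.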
DAY 3 – Today was a very technically oriented day, further highlighting the differences between SQL and NoSQL whilst also going into the motivations behind their design and real-world functionality. I really appreciated the time spent defining why the relational model fails as a total solution in so many modern use cases, yet still has a massive role to play and often forms part of a multi-database architecture in the enterprise world. With an eye on ‘business logic’ and the value of data such as bank transactions, it was really surprising to learn that many NoSQL systems rely instead on the BASE philosophy rather than ACID guarantees. Interestingly, this article (link) states that MongoDB 4.0 has a new ACID transaction feature.
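As a point of contrast with BASE, the sketch below shows the atomicity part of ACID on a bank-style transfer. It uses SQLite through Python’s sqlite3 module purely because it needs no server. It is not MongoDB’s transaction API, and the accounts table is hypothetical. The credit is applied before the debit on purpose, so a failed debit shows the credit being rolled back too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical accounts table; the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst atomically: both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            # Credit first, so a failing debit demonstrates the rollback of the credit.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the earlier credit was undone as well.
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

print(transfer(conn, "alice", "bob", 30))   # True
print(balances(conn))                       # {'alice': 70, 'bob': 80}
print(transfer(conn, "bob", "alice", 500))  # False: bob cannot go negative
print(balances(conn))                       # {'alice': 70, 'bob': 80}
```

Using the connection as a context manager commits if the block finishes and rolls back if an exception escapes, which is what keeps the two updates all-or-nothing.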
DAY 4 – A focus on Redis today gave me a far better understanding of the nuances of its feature set. The associated slides that Seifedine presented had lots of detail on the command set, which will be a great reference whilst getting up to speed. I ran through the official tutorial (Try Redis, try.redis.io), a browser-based interactive walkthrough of the main features. Insightful extra material I discovered for the day was Brad Traversy’s video on using Redis as a cache database, which seems to be a common use case (source). Personally, I find that real-world use cases always help me visualise and get to grips with a new technology. Looking forward to week two!
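The cache use case from the video usually follows a cache-aside pattern. The application reads from the cache, falls back to the slower source on a miss, then stores the result with an expiry. The sketch below uses a plain Python dictionary as a stand-in for Redis (in Redis the write would be a SET with an expiry). SlowSource is a hypothetical placeholder for a database or API. The clock is injectable so expiry can be demonstrated without waiting.

```python
import time

class SlowSource:
    """Hypothetical slow backing store (e.g. a database); counts how often it is hit."""
    def __init__(self):
        self.calls = 0

    def load(self, user_id):
        self.calls += 1
        return {"id": user_id, "name": f"user{user_id}"}

class CacheAside:
    """Cache-aside: try the cache, fall back to the loader on a miss, then populate the cache."""
    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self.loader = loader
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (value, expires_at); a dict standing in for Redis

    def get(self, key):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit
        value = self.loader(key)  # miss or expired entry: go to the source
        self._store[key] = (value, now + self.ttl)
        return value

fake_now = [0.0]  # manual clock so expiry can be shown instantly
source = SlowSource()
cache = CacheAside(source.load, ttl_seconds=60, clock=lambda: fake_now[0])

first = cache.get(1)
second = cache.get(1)
print(source.calls)  # 1: the second read was served from the cache
fake_now[0] = 61.0
cache.get(1)
print(source.calls)  # 2: the entry had expired, so the source was queried again
```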
WEEK 2 – REFLECTIVE JOURNAL
Really excited for this second week of NoSQL databases. After reading some documentation on installing Redis on a MacBook, I decided to use a familiar tool, the Homebrew package manager, to install and launch the server. This seemed simpler than downloading the .tar file, as Homebrew takes care of a lot of macOS-specific details. I enjoyed the practical nature of the lecture, which gave me the chance to code alongside Seifedine and try to ingrain the logic and syntax of this database. I’m conscious it’s been almost a year since our SQL course! Although I had done some background reading on Redis last week, two new pieces of information really impressed me: firstly, it can perform 100,000 writes a second, and secondly, a single key (such as a list, set or hash) can hold over four billion elements. This is ‘big data’ I guess, but the scale is hard to comprehend. The activity appeared straightforward, but it took me some time to get the syntax correct and produce output in the console. Having done this and read some further material, I’ve come to understand that it’s much more efficient to run code only when it’s needed rather than have the system constantly scanning for changes. The tutorial I used also made use of the EXPIRE command, which I assume is popular in security scenarios such as changing the passwords of online accounts.
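The EXPIRE behaviour can be modelled in a few lines. This is a toy in-memory sketch, not Redis itself. It mimics the documented return values of TTL (-2 for a missing key, -1 for a key without a timeout) and the fact that a plain SET discards an existing timeout. It only removes expired keys lazily when they are accessed, whereas Redis also expires keys actively in the background. The reset_token:alice key is hypothetical.

```python
import time

class ExpiringStore:
    """Toy key-value store imitating Redis SET / GET / EXPIRE / TTL semantics (lazy expiry only)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.data = {}
        self.deadlines = {}  # key -> absolute expiry time

    def _purge_if_expired(self, key):
        deadline = self.deadlines.get(key)
        if deadline is not None and self.clock() >= deadline:
            self.data.pop(key, None)
            self.deadlines.pop(key, None)

    def set(self, key, value):
        self.data[key] = value
        self.deadlines.pop(key, None)  # a plain SET discards any previous timeout

    def get(self, key):
        self._purge_if_expired(key)
        return self.data.get(key)

    def expire(self, key, seconds):
        self._purge_if_expired(key)
        if key not in self.data:
            return 0  # no such key, nothing to expire
        self.deadlines[key] = self.clock() + seconds
        return 1

    def ttl(self, key):
        self._purge_if_expired(key)
        if key not in self.data:
            return -2  # key does not exist
        if key not in self.deadlines:
            return -1  # key exists but has no timeout
        return int(self.deadlines[key] - self.clock())

now = [0.0]  # manual clock for the demo
store = ExpiringStore(clock=lambda: now[0])
store.set("reset_token:alice", "abc123")
no_timeout = store.ttl("reset_token:alice")              # -1
expire_result = store.expire("reset_token:alice", 900)   # 1
now[0] = 300.0
remaining = store.ttl("reset_token:alice")               # 600
now[0] = 900.0
after_expiry = store.get("reset_token:alice")            # None
```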
I’d dismissed the discussion on Discord between students about the difficulty of installing Cassandra – unfortunately they were right! It took me a long time to identify the problem, but as so often happens, the fix looks simple in hindsight. I’m not sure how specific it is to macOS, but I needed to change the default Java JDK version for Cassandra. Overall it was a super valuable lecture for getting familiar with the column-family model of databases. My initial impression is that its use of “data replication among the nodes in a cluster to ensure no single point of failure” is a very important feature.
A beneficial class today working through interactions with the Cassandra database. I think practical walkthroughs on your own machine give you a more intuitive feel for the material. As Seifedine mentioned, the query language (CQL) is very similar to SQL, yet there are limitations on how data can be retrieved. I liked the activity’s explanation that these limits exist for performance reasons, such as avoiding a million redundant checks of a column. The implications of scale really stand out in these NoSQL environments.
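The reason for Cassandra’s retrieval limits can be sketched with a toy model. Rows are grouped by partition key, so a query that names the partition key reads one group, while a filter on any other column has to scan everything. In CQL that second kind of query is rejected unless you add ALLOW FILTERING (or have a suitable index). The allow_filtering flag below only imitates that opt-in and is not a real driver parameter. The table layout and data are hypothetical.

```python
from collections import defaultdict

class ToyPartitionedTable:
    """Toy model of a Cassandra-style table whose rows are grouped by partition key."""

    def __init__(self, partition_key):
        self.partition_key = partition_key
        self.partitions = defaultdict(list)
        self.rows_examined = 0  # crude cost counter

    def insert(self, row):
        self.partitions[row[self.partition_key]].append(row)

    def select(self, where, allow_filtering=False):
        if self.partition_key in where:
            # Partition key given: read just that one partition.
            candidates = self.partitions.get(where[self.partition_key], [])
        elif allow_filtering:
            # No partition key: every row in every partition must be checked.
            candidates = [row for rows in self.partitions.values() for row in rows]
        else:
            raise ValueError("partition key not restricted; pass allow_filtering=True to scan")
        self.rows_examined += len(candidates)
        return [r for r in candidates if all(r.get(k) == v for k, v in where.items())]

table = ToyPartitionedTable("user_id")
for i in range(1000):
    table.insert({"user_id": i % 10, "event": "login" if i % 2 else "logout"})

by_key = table.select({"user_id": 3})
key_cost = table.rows_examined      # 100 rows read

table.rows_examined = 0
by_event = table.select({"event": "login"}, allow_filtering=True)
scan_cost = table.rows_examined     # 1000 rows read

try:
    table.select({"event": "login"})
    rejected = False
except ValueError:
    rejected = True                 # refused, like CQL without ALLOW FILTERING
```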
WEEK 3 – REFLECTIVE JOURNAL
I’m writing this reflection a lot later in the evening than planned – what appeared to be a gentle exercise involved much more than I anticipated! I made the mistake of not prettifying my JSON when composing search queries, so I was getting visually mixed up in four levels of brackets whilst also deciphering a new and unfamiliar syntax. After a while I decided to compose my queries in VSCode rather than directly in the Mongo command line, so that I could take advantage of the syntax highlighting and spot errors immediately (I’ve included a screenshot in image 5 in the activity section). In summary, I’m super thankful for a practical day, after which I definitely feel I’ve taken the first baby step towards some fluency in retrieving data and understanding the Mongo structure. I had a little mix-up when trying to grasp the text index required for text searching. This source required a few re-readings!
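A text index is essentially an inverted index from words to documents. The toy version below illustrates the idea. It lowercases, drops a few stop words, ORs the query terms together (as MongoDB’s $text does for space-separated terms) and ranks documents by term hits. MongoDB’s real text index also applies stemming and its own scoring, which this sketch does not attempt. The documents are hypothetical.

```python
import re
from collections import defaultdict

STOP_WORDS = {"a", "an", "the", "and", "of", "in"}  # tiny illustrative list

def tokenize(text):
    """Lowercase, split into alphanumeric words, drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]

def build_index(docs):
    """Map each term to {doc_id: occurrences}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def search(index, query):
    """OR the query terms together and rank documents by total term hits (ties by id)."""
    scores = defaultdict(int)
    for term in tokenize(query):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    return sorted(scores, key=lambda d: (-scores[d], d))

docs = {
    1: "Coffee shop in the city centre",
    2: "Tea and coffee, coffee all day",
    3: "Bakery with fresh bread",
}
index = build_index(docs)
print(search(index, "coffee bread"))  # [2, 1, 3]
```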
Having eventually conquered yesterday’s tutorial, I found the lecture went smoothly, as many points were reiterations or reframings of the commands we needed to use. For me the implicit AND operator is a little hard to recognise, but I’m sure ‘power users’ appreciate the brevity it gives their queries. Of particular note were the $regex and method-chaining examples.
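To see why the implicit AND is so terse, the toy evaluator below applies a MongoDB-style filter to plain Python dictionaries. Every top-level field in the filter must match, so listing two fields means both conditions must hold. It supports only equality, $regex and $gte, and ignores details such as matching inside arrays. The restaurant documents are hypothetical.

```python
import re

def matches(doc, query):
    """Toy MongoDB-style filter: every top-level condition must hold (implicit AND)."""
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            if "$regex" in condition:
                if not isinstance(value, str) or re.search(condition["$regex"], value) is None:
                    return False
            if "$gte" in condition:
                if value is None or value < condition["$gte"]:
                    return False
        elif value != condition:
            return False
    return True

restaurants = [
    {"name": "Pizza Roma", "cuisine": "Italian", "stars": 4},
    {"name": "Pizza Express", "cuisine": "Fast Food", "stars": 3},
    {"name": "Trattoria Bella", "cuisine": "Italian", "stars": 5},
]

pizza_italian = [r["name"] for r in restaurants
                 if matches(r, {"cuisine": "Italian", "name": {"$regex": "^Pizza"}})]
top_italian = [r["name"] for r in restaurants
               if matches(r, {"cuisine": "Italian", "stars": {"$gte": 5}})]
print(pizza_italian)  # ['Pizza Roma']
print(top_italian)    # ['Trattoria Bella']
```

In the shell the first filter would be written as {cuisine: "Italian", name: {$regex: "^Pizza"}}, which is equivalent to wrapping the two conditions in an explicit $and.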
In the end I spent even more time on the activities than yesterday. Importing the dataset took longer than intended, as I learned a lot about the mongoimport tool, and which arguments the command needs, through trial and error. Much of my issue was getting conceptually mixed up between the four instances I have available locally and experimented with (my Docker container for work, another via brew install, an unzipped manual archive and the Compass Cluster!).
After that I endured the familiar, ugly learning process of trial and error with the syntax and structure needed to write much more specific queries against a larger dataset. The official documentation is good, but the associated Compass documentation was especially helpful, as it contained SQL examples to relate to, as in image 12 below. It took a while, but now, writing this reflection, I feel comfortable targeting embedded documents: first using the $unwind stage, which unpacks an array into one document per element, followed by the $match and $sort stages. Today just proves to me how important it is to spend time trying a few different approaches; there is no shortcut! I also take a little pride in discovering the GitHub dataset author’s question on Stack Overflow (source)!
Image 12 – MongoDB Compass documentation – helpful SQL comparison (source)
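The $unwind, $match, $sort sequence can be mimicked with plain Python generators to show what each stage does to the documents. This illustrates the stage semantics only, not how MongoDB executes a pipeline. The restaurant data is hypothetical, and mongo_pipeline holds the equivalent aggregation stages purely as data for comparison. As with $unwind’s default behaviour, documents with an empty or missing array produce no output.

```python
def unwind(docs, field):
    """Like $unwind: one output document per element of the array in `field`."""
    for doc in docs:
        for item in doc.get(field) or []:
            yield {**doc, field: item}

def match(docs, predicate):
    """Like $match: keep documents satisfying the predicate."""
    return (d for d in docs if predicate(d))

def sort_by(docs, key, descending=False):
    """Like $sort on a single key."""
    return sorted(docs, key=key, reverse=descending)

# Hypothetical documents with an embedded array of grades.
restaurants = [
    {"name": "Luigi", "grades": [{"grade": "A", "score": 9}, {"grade": "B", "score": 14}]},
    {"name": "Bao House", "grades": [{"grade": "A", "score": 5}]},
    {"name": "Cafe Nil", "grades": []},
]

# The same stages in MongoDB aggregation syntax, kept here only as data.
mongo_pipeline = [
    {"$unwind": "$grades"},
    {"$match": {"grades.grade": "A"}},
    {"$sort": {"grades.score": -1}},
]

stages = unwind(restaurants, "grades")
stages = match(stages, lambda d: d["grades"]["grade"] == "A")
stages = sort_by(stages, key=lambda d: d["grades"]["score"], descending=True)
result = [(d["name"], d["grades"]["score"]) for d in stages]
print(result)  # [('Luigi', 9), ('Bao House', 5)]
```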
Although there was no activity, I appreciated another practical demonstration session, which I was able to follow along with in the evening after a busy day of work. I was a touch frustrated, as I always try to join live when I can, especially since Seifedine engages with students more than lecturers on any other course I’ve taken. I’m conscious of always making an extra effort when studying ‘remotely’, as the experience is very different from sitting in a classroom, where you can’t press pause! In short, I found it beneficial to work through these examples with a focus on editing and updating the dataset rather than just querying it. I feel I have a better understanding of the ‘no schema’ value proposition offered by NoSQL, mainly appreciating that one must still use it with some discipline or order; otherwise you could spend a lot of time writing custom queries for each new business case. I’m very curious to see what type of exam we will be set next week!
WEEK 4 – REFLECTIVE JOURNAL
It was very rewarding to start the week off with a good grade from last week’s activity. I feel continuous assessment is more beneficial to one’s progress, as you get a better sense of where you are in the learning journey. That being said, the lecture started with Seifedine mentioning that nobody had managed to organise the collections correctly in the Wednesday activity – all of us had simply loaded the data into one collection. Clearly on such a small dataset this wasn’t a problem, but I feel I now appreciate the logic and semantics of our errors for when we are faced with the challenge of bigger data! The automatic scalability (and sharding for databases) promised by the cloud is awesome, and I would love to learn more about it. We concluded with a short discussion of tools, where Seifedine pointed out something I’m continually amazed at: there is an endless sea of tools, arising, I guess, from the detailed custom needs of large projects. This (source) article was a good complement to today’s class. I enjoyed setting up some basic schema-like data validation for the activity. I assume it prevents badly structured documents (for example, missing fields or wrong value types) from being inserted, or acts as a layer of insurance for any business logic that may be passed in.
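The schema-like validation can be sketched as follows. The validator dictionary is a hypothetical example written in MongoDB’s $jsonSchema form (required, bsonType, minimum). The validate function is a toy that checks only that small subset in Python, to show which documents a rule like this would reject. It is not how the server evaluates validators.

```python
# Hypothetical validator in MongoDB's $jsonSchema form.
validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "email", "age"],
        "properties": {
            "name": {"bsonType": "string"},
            "email": {"bsonType": "string"},
            "age": {"bsonType": "int", "minimum": 0},
        },
    }
}

# Rough mapping from the BSON type names used above to Python types (toy assumption).
PY_TYPES = {"string": str, "int": int, "object": dict}

def validate(doc, schema):
    """Toy check of a tiny $jsonSchema subset: required, bsonType and minimum."""
    errors = []
    for field in schema.get("required", []):
        if field not in doc:
            errors.append(f"missing required field '{field}'")
    for field, rules in schema.get("properties", {}).items():
        if field not in doc:
            continue
        value = doc[field]
        expected = PY_TYPES[rules["bsonType"]]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, expected) or isinstance(value, bool):
            errors.append(f"'{field}' should be {rules['bsonType']}")
        elif "minimum" in rules and value < rules["minimum"]:
            errors.append(f"'{field}' is below the minimum of {rules['minimum']}")
    return errors

schema = validator["$jsonSchema"]
good = {"name": "Ada", "email": "ada@example.com", "age": 36}
bad = {"name": "Bob", "age": "thirty"}
print(validate(good, schema))  # []
print(validate(bad, schema))   # ["missing required field 'email'", "'age' should be int"]
```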
Luckily for me on a Mac, the Homebrew package manager automatically installed the database tools, as shown below.
Image – Tools that come with the Mac install of Community Edition 4.4
Image – successful import of the data from the lecture – screenshot from Compass
I appreciated the discussion of these tools, which are perhaps a bit advanced for our current stage but probably used every day by professionals. As additional reading, I enjoyed this article (link) about GridFS and the chunking of data, which appears to be quite a clever way to manage big data. We touched a little on the topic of denormalization in class and the comparison with SQL, which was valuable for my conceptual model. Further reading of the documentation (link) was necessary, and it introduced what they call ‘pipeline’ queries. I’m really beginning to be won over by the main arguments for the document model and the nested and embedded patterns (link), with their inherent advantages of speed and simpler queries. The avoidance of redundancy seemed brilliant in SQL, but since our course last year I’ve definitely come to appreciate simpler queries!
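GridFS’s chunking idea is simple to sketch. The file’s bytes are split into fixed-size pieces, and each piece is stored as a document carrying the parent file id (files_id) and a sequence number (n). The sketch below uses GridFS’s default chunk size of 255 KiB and hypothetical in-memory data. A real GridFS bucket also keeps a separate metadata document per file, which is omitted here.

```python
CHUNK_SIZE = 255 * 1024  # GridFS default chunk size: 255 KiB = 261,120 bytes

def to_chunks(file_id, data, chunk_size=CHUNK_SIZE):
    """Split bytes into GridFS-style chunk documents: files_id, sequence number n, data."""
    return [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]

def reassemble(chunks):
    """Rebuild the original bytes by concatenating chunks in order of n."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))

payload = bytes(600 * 1024)  # hypothetical 600 KiB file of zero bytes
chunks = to_chunks("file-1", payload)
print([c["n"] for c in chunks])          # [0, 1, 2]
print([len(c["data"]) for c in chunks])  # [261120, 261120, 92160]
```

Because each chunk carries its own n, the chunks can be fetched in any order (or only partly, for range reads) and still be stitched back together correctly.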
I didn’t expect today’s topic of geospatial data, but it was very thought-provoking. I’ve not worked with location data before, but on reflection it’s obvious how vital it is to everyday life with our mobile devices. In the context of big data and advertising, I’ve read about how this data can link people and behaviours, for example. Seifedine gave a full demonstration and walked us through the specific features MongoDB provides for this data, such as the need to create a “2dsphere” index on it. Towards the end we had some time to discuss Hackolade and its extensive feature set, and tomorrow’s full discussion will help us complete the rest of the activity sheet. I noticed in the start-up tips (image below) that it can auto-generate ERDs, which is a massive time saver! Another use case they gave was running the tool from a script as a background process to report on changes to a data structure, which seems like a big help for a dev team.
Image – auto generated structures from Hackolade
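The geometry behind the 2dsphere queries from the geospatial demonstration can be illustrated with the haversine formula. MongoDB itself would answer such a query with a 2dsphere index and operators such as $near or $geoWithin. The sketch below just computes great-circle distances in Python over GeoJSON-style points (coordinates are [longitude, latitude]), assuming a mean Earth radius of 6,371 km. The place names and positions are hypothetical, placed on the equator so the distances are easy to check.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in metres

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two (longitude, latitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def near(places, point, max_distance_m):
    """Names and rounded distances of places within max_distance_m of point, nearest first."""
    lon, lat = point
    hits = []
    for place in places:
        p_lon, p_lat = place["location"]["coordinates"]  # GeoJSON order: [longitude, latitude]
        d = haversine_m(lon, lat, p_lon, p_lat)
        if d <= max_distance_m:
            hits.append((place["name"], round(d)))
    return sorted(hits, key=lambda h: h[1])

places = [
    {"name": name, "location": {"type": "Point", "coordinates": [lon, 0.0]}}
    for name, lon in [("cafe", 0.0), ("library", 0.5), ("station", 1.0), ("airport", 2.0)]
]
hits = near(places, (0.0, 0.0), max_distance_m=120_000)
print(hits)  # [('cafe', 0), ('library', 55597), ('station', 111195)]
```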
Although I experimented a little with data entry validation last week, the breadth of options and the quality of the interface offered by Hackolade appear very valuable in speeding up the build of a very technical database. Clearly NoSQL’s inherently schemaless design can easily lead to data structure errors, so this element of Hackolade would appear to be a big help where enforcing structure in application-side code isn’t desirable. I enjoyed thinking about normalized data again, but I feel I really need some more practice!
Neo4j has become a familiar tool for me in the last few months, as it is central to our studio project where we map related Wikipedia articles in a web app. Having said that, I appreciate we are just scratching the surface of what it can do in terms of valuable output and process. Overall it just seems like graph data in general is absolutely vital to understanding larger data. Phrased another way, it seems an amazingly unique feature that it can show you patterns or connections no human could ever handle. I enjoyed our discrete math course last year with its focus on graph algorithms, so I definitely see myself spending more time in this area in the future. Looking forward to next week.
WEEK 5 – REFLECTIVE JOURNAL
A nice lecture today to fully introduce Neo4j and its possibilities. Seifedine’s enthusiasm for this database type is something I’ve come across from many developers, which suggests there really must be some value to it over its historical predecessors. Perhaps it’s the speed of certain queries compared with an RDBMS? I appreciated the example given in the lecture, as that was something I didn’t really expect! As I mentioned in last week’s journal, our studio project involves this graph database, so it’s very topical for me. As Seifedine also mentioned, I have found their documentation very helpful. I would also say their developer relations live streams and the attached repos on GitHub have been helpful in our project. One thing I haven’t spent much time on, and hope to, is writing more database queries in Cypher, the query language specific to Neo4j.
With limited space available to reflect, I’ll dedicate this entry to the activity process. A bit like last week, creating six queries took me much longer than I first assumed! As mentioned above, I made some initial mistakes whilst I took my time to become familiar with the dataset. After that I found it helpful to write out the logic in English and then match it with a search through the documentation. I also found the answers of the Neo4j developer advocate (M Hunger: source) on Stack Overflow helpful. Some of the queries seem a little verbose, but once I was more familiar with the relationship syntax, I could see that it helps prevent some of the duplicates you can easily get with a query that’s close but incorrect.
Following a run-through of some examples and the details of the query language, we tackled the Neo4j certification for our activity. It was a really well-structured process and something I wish I had done before yesterday’s activity! I really liked the following description they gave: “The nouns are nodes of the graph, the verbs are the relationships in the graph, and the adjectives and adverbs are the properties.” Also, a clipping (image 11) highlighted how we should optimize queries on larger data.
Image 11 – optimizing queries
Image 12 – Very helpful autocomplete
Some of the language design made intuitive sense rather quickly, as it borrows heavily from other programming paradigms. Two personal examples would be the ability to use regular expressions and the * symbol for variable-length relationships, (nodeA)-[:RELTYPE*]->(nodeB), which I’m familiar with from CSS. All in all, a very productive and useful certification to achieve.
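The * in a pattern such as (nodeA)-[:RELTYPE*]->(nodeB) means “follow one or more relationships of this type”. A breadth-first search gives a rough sketch of which end nodes such a pattern reaches. The graph and names below are hypothetical. Unlike Cypher, which returns every matching path, this only collects the set of reachable nodes, and max_hops loosely mirrors an upper bound such as [:KNOWS*..2].

```python
from collections import deque

# Hypothetical graph stored as (start, relationship type, end) triples.
edges = [
    ("Alice", "KNOWS", "Bob"),
    ("Bob", "KNOWS", "Carol"),
    ("Carol", "KNOWS", "Dave"),
    ("Alice", "WORKS_WITH", "Eve"),
]

def reachable(edges, start, rel_type, max_hops=None):
    """Nodes reachable from start via one or more rel_type relationships (breadth-first)."""
    adjacency = {}
    for src, rel, dst in edges:
        if rel == rel_type:
            adjacency.setdefault(src, []).append(dst)
    seen = {start}
    found = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_hops is not None and depth >= max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, depth + 1))
    return sorted(found)

print(reachable(edges, "Alice", "KNOWS"))              # ['Bob', 'Carol', 'Dave']
print(reachable(edges, "Alice", "KNOWS", max_hops=2))  # ['Bob', 'Carol']
```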
Today was the final lecture of our class and also where we were introduced to next week’s exam. Although we had no activity due, we worked through an introduction to some topics I’ve often read about as being important to industry: Hadoop and MapReduce. There was a lot to squeeze into the hour, including the Neo4j Python driver, a social network analysis demo and further Neo4j plugins. The data science library is relevant to us in our studio project, as we are considering one of its search algorithms – hopefully we can get something working with it next month! To sum up the course, I’ve learned more than I could have hoped, and I’ve enjoyed the change to continuous assessment: prioritising practice has given me extra confidence.
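The MapReduce model introduced in the lecture can be sketched with the classic word count. A map step emits (word, 1) pairs, a shuffle groups them by key, and a reduce step sums each group. In Hadoop these steps run in parallel across many machines. Here they run sequentially in one Python process, purely to show the data flow. The input documents are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(mapped_pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def map_reduce(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

counts = map_reduce(["graph data is big", "big data big ideas"])
print(counts)  # {'graph': 1, 'data': 2, 'is': 1, 'big': 3, 'ideas': 1}
```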