chithra

Saturday, April 29, 2006

THE VECTOR PROCESSING MODEL

The vector processing model:


Meaning of vector:

A vector is a physical quantity that has both magnitude and direction.

Meaning of model:

A model is a schematic description of a system, theory or phenomenon that accounts for its known or inferred properties.

Various mathematical models have been proposed to represent information retrieval systems and procedures. One of them is “The vector processing model”, which represents documents and queries by term sets and compares global similarities between queries and documents.


The vector processing model assumes that a single available term set is used to represent both the stored records and the information requests; each record or request is then expressed as a vector of term weights, called a term vector.

Consider a collection of documents in which each document is characterized by one or more index terms. Thus, the documents are the objects in the collection, each of which is represented by a number of index terms. The similarity between two objects is normally computed as a function of the number of properties that are assigned to both objects. Substantially similar methods can be used both for determining the structure of the collection (by comparing documents with one another) and for retrieving information, by comparing the query vectors with the vectors representing the stored items and retrieving the items that are found to be similar to the queries.
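As a minimal sketch of "similarity as a function of shared properties", assume binary indexing, so each document or query is simply a set of index terms, and take the raw count of shared terms as the similarity function. The document identifiers and term names below are hypothetical.

```python
def shared_terms(doc_a: set[str], doc_b: set[str]) -> int:
    """Number of index terms assigned to both objects (binary indexing)."""
    return len(doc_a & doc_b)

# Hypothetical collection: each document is a set of index terms.
collection = {
    "D1": {"retrieval", "vector", "model"},
    "D2": {"retrieval", "boolean", "query"},
    "D3": {"classification", "canon"},
}
query = {"vector", "retrieval"}

# Compare the query with every stored item, most shared terms first.
matches = sorted(
    ((doc_id, shared_terms(query, terms)) for doc_id, terms in collection.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

# Retrieve only the items that share at least one term with the query.
print([pair for pair in matches if pair[1] > 0])  # [('D1', 2), ('D2', 1)]
```

The same `shared_terms` function could compare two documents instead of a query and a document, which is the sense in which one method serves both collection structuring and retrieval.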


Consider two documents, DOCi and DOCj. Let TERMik represent the weight of the (property) term k assigned to document i. In a binary system TERMik is either zero or one; otherwise the weight may range from zero up to some maximum value (say four or six). The two document vectors may then be represented as

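$$\mathrm{DOC}_i = (\mathrm{TERM}_{i1}, \mathrm{TERM}_{i2}, \mathrm{TERM}_{i3}, \ldots, \mathrm{TERM}_{it})$$
$$\mathrm{DOC}_j = (\mathrm{TERM}_{j1}, \mathrm{TERM}_{j2}, \mathrm{TERM}_{j3}, \ldots, \mathrm{TERM}_{jt})$$

where t terms (i.e., properties) have been assigned to characterize each document (i.e., object).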
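A minimal sketch of how such vectors might be built, assuming each document's index terms are available as a term-to-weight mapping. Every vector is laid out over one shared, ordered term set, so position k always refers to the same term k, and terms not assigned to a document get weight zero. The vocabulary and weights are hypothetical.

```python
# Hypothetical shared term set (t = 4); position k always means the same term.
VOCABULARY = ["retrieval", "vector", "index", "query"]

def to_vector(term_weights: dict[str, int], vocabulary: list[str]) -> tuple[int, ...]:
    """Return (TERM_1, ..., TERM_t) for one document; missing terms weigh 0."""
    return tuple(term_weights.get(term, 0) for term in vocabulary)

doc_i = to_vector({"retrieval": 3, "vector": 2}, VOCABULARY)
doc_j = to_vector({"retrieval": 1, "query": 1}, VOCABULARY)

print(doc_i)  # (3, 2, 0, 0)
print(doc_j)  # (1, 0, 0, 1)
```

Using one vocabulary for stored records and queries alike is what makes the component-by-component comparisons that follow meaningful.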

The following vector functions are used to compute the similarity between two given vectors:
(1) $$\sum_{k=1}^{t} \mathrm{TERM}_{ik}$$

This denotes the sum of the weights of all the properties included in a given vector;


(2) $$\sum_{k=1}^{t} \mathrm{TERM}_{ik} \cdot \mathrm{TERM}_{jk}$$

which denotes the component-by-component vector product, that is, the sum of the products of the corresponding term weights of the two vectors;

(3) $$\sum_{k=1}^{t} \min(\mathrm{TERM}_{ik}, \mathrm{TERM}_{jk})$$

which denotes the sum of the minimum weights of the corresponding components of the two vectors; and

(4) $$\sqrt{\sum_{k=1}^{t} \mathrm{TERM}_{ik}^{2}}$$

which denotes the length of the property vector (here, for the document DOCi) when the property vectors are considered as ordinary vectors.

These functions can be illustrated with the following example. Suppose the two document vectors are represented as

DOCi = (3, 2, 1, 0, 0, 0, 1, 1)
DOCj = (1, 1, 1, 0, 0, 1, 0, 0)

where each vector has eight components, one per index term (t = 8). The four vector functions then evaluate as follows:


(1) $$\sum_{k=1}^{t} \mathrm{TERM}_{ik} = 3+2+1+0+0+0+1+1 = 8$$



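(2) $$\begin{aligned}
\sum_{k=1}^{t} \mathrm{TERM}_{ik} \cdot \mathrm{TERM}_{jk} &= (3\cdot1)+(2\cdot1)+(1\cdot1)+(0\cdot0)+(0\cdot0)+(0\cdot1)+(1\cdot0)+(1\cdot0) \\
&= 3+2+1+0+0+0+0+0 = 6
\end{aligned}$$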



(3) $$\begin{aligned}
\sum_{k=1}^{t} \min(\mathrm{TERM}_{ik}, \mathrm{TERM}_{jk}) &= \min(3,1)+\min(2,1)+\min(1,1)+\min(0,0) \\
&\quad + \min(0,0)+\min(0,1)+\min(1,0)+\min(1,0) \\
&= 1+1+1+0+0+0+0+0 = 3
\end{aligned}$$

(4) $$\sqrt{\sum_{k=1}^{t} \mathrm{TERM}_{ik}^{2}} = \sqrt{(3\cdot3)+(2\cdot2)+(1\cdot1)+(0\cdot0)+(0\cdot0)+(0\cdot0)+(1\cdot1)+(1\cdot1)} = \sqrt{16} = 4$$
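The four worked values above can be checked with a short sketch. The function names are illustrative, not taken from any library; the vectors are the example DOCi and DOCj from the text.

```python
import math

# Example vectors from the text (t = 8 index terms).
DOC_I = (3, 2, 1, 0, 0, 0, 1, 1)
DOC_J = (1, 1, 1, 0, 0, 1, 0, 0)

def weight_sum(doc):
    """(1) Sum of the term weights of a single vector."""
    return sum(doc)

def inner_product(doc_i, doc_j):
    """(2) Sum of the products of corresponding term weights."""
    if len(doc_i) != len(doc_j):
        raise ValueError("both vectors must cover the same t terms")
    return sum(a * b for a, b in zip(doc_i, doc_j))

def sum_of_minima(doc_i, doc_j):
    """(3) Sum of the smaller weight in each pair of corresponding components."""
    return sum(min(a, b) for a, b in zip(doc_i, doc_j))

def vector_length(doc):
    """(4) Euclidean length: square root of the sum of squared weights."""
    return math.sqrt(sum(w * w for w in doc))

print(weight_sum(DOC_I))            # 8
print(inner_product(DOC_I, DOC_J))  # 6
print(sum_of_minima(DOC_I, DOC_J))  # 3
print(vector_length(DOC_I))         # 4.0
```

For binary vectors functions (2) and (3) always agree, because for weights of 0 and 1 the product and the minimum coincide; they differ only when weights exceed 1, as in this example (6 versus 3).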


Several similarity coefficients can be built from these functions; Salton and McGill [7] list five of them, shown below together with their values for the example vectors.


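1. The Dice coefficient:

$$\mathrm{SIM}(\mathrm{DOC}_i, \mathrm{DOC}_j) = \frac{2\sum_{k=1}^{t} (\mathrm{TERM}_{ik} \cdot \mathrm{TERM}_{jk})}{\sum_{k=1}^{t} \mathrm{TERM}_{ik} + \sum_{k=1}^{t} \mathrm{TERM}_{jk}} = \frac{2(6)}{8+4} = 1$$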



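2. The Jaccard coefficient:

$$\mathrm{SIM}(\mathrm{DOC}_i, \mathrm{DOC}_j) = \frac{\sum_{k=1}^{t} (\mathrm{TERM}_{ik} \cdot \mathrm{TERM}_{jk})}{\sum_{k=1}^{t} \mathrm{TERM}_{ik} + \sum_{k=1}^{t} \mathrm{TERM}_{jk} - \sum_{k=1}^{t} (\mathrm{TERM}_{ik} \cdot \mathrm{TERM}_{jk})} = \frac{6}{8+4-6} = 1$$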

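3. The cosine coefficient, which measures the cosine of the angle between two object vectors in a space of t dimensions:

$$\mathrm{SIM}(\mathrm{DOC}_i, \mathrm{DOC}_j) = \frac{\sum_{k=1}^{t} (\mathrm{TERM}_{ik} \cdot \mathrm{TERM}_{jk})}{\sqrt{\sum_{k=1}^{t} \mathrm{TERM}_{ik}^{2} \cdot \sum_{k=1}^{t} \mathrm{TERM}_{jk}^{2}}} = \frac{6}{\sqrt{16 \cdot 4}} = \frac{6}{8} = 0.75$$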

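4. The overlap coefficient:

$$\mathrm{SIM}(\mathrm{DOC}_i, \mathrm{DOC}_j) = \frac{\sum_{k=1}^{t} (\mathrm{TERM}_{ik} \cdot \mathrm{TERM}_{jk})}{\min\left(\sum_{k=1}^{t} \mathrm{TERM}_{ik}, \sum_{k=1}^{t} \mathrm{TERM}_{jk}\right)} = \frac{6}{\min(8, 4)} = \frac{6}{4} = 1.5$$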

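5. The asymmetric coefficient:

$$\mathrm{SIM}(\mathrm{DOC}_i, \mathrm{DOC}_j) = \frac{\sum_{k=1}^{t} \min(\mathrm{TERM}_{ik}, \mathrm{TERM}_{jk})}{\sum_{k=1}^{t} \mathrm{TERM}_{ik}} = \frac{3}{8} = 0.375$$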
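The five coefficients, in exactly the forms given above, can be sketched as plain Python functions and checked against the example values. This is an illustrative sketch only: it does not guard against all-zero vectors, which would raise `ZeroDivisionError`.

```python
import math

# Example vectors from the text.
DOC_I = (3, 2, 1, 0, 0, 0, 1, 1)
DOC_J = (1, 1, 1, 0, 0, 1, 0, 0)

def _inner(x, y):
    """Sum of the products of corresponding weights (function 2)."""
    return sum(a * b for a, b in zip(x, y))

def dice(x, y):
    return 2 * _inner(x, y) / (sum(x) + sum(y))

def jaccard(x, y):
    inner = _inner(x, y)
    return inner / (sum(x) + sum(y) - inner)

def cosine(x, y):
    return _inner(x, y) / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

def overlap(x, y):
    return _inner(x, y) / min(sum(x), sum(y))

def asymmetric(x, y):
    """Sum of minima (function 3) divided by the weight sum of x only."""
    return sum(min(a, b) for a, b in zip(x, y)) / sum(x)

for name, coefficient in [("Dice", dice), ("Jaccard", jaccard), ("cosine", cosine),
                          ("overlap", overlap), ("asymmetric", asymmetric)]:
    print(name, coefficient(DOC_I, DOC_J))

# Swapping the arguments changes only the asymmetric coefficient.
print(asymmetric(DOC_J, DOC_I))  # 0.75
```

Because these forms divide by plain weight sums rather than sums of squares, they are not bounded by 1 when weights exceed 1: the overlap coefficient gives 1.5 here, and Dice and Jaccard both reach 1 even though DOCi and DOCj differ. The cosine coefficient stays within 0 to 1 for non-negative weights.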



Advantages of the vector processing model:

1. Term weighting improves retrieval quality.
2. It allows approximate (partial) matching.
3. It ranks documents by their similarity to the query (e.g., with the cosine formula).
4. It is simple and fast.
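Partial matching and ranking by similarity can be illustrated with a small sketch, assuming the cosine coefficient as the similarity function. DOCi and DOCj are the example vectors from the text; DOCk and the query vector are hypothetical.

```python
import math

def cosine(q, d):
    """Cosine coefficient; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(q, d))
    norm = math.sqrt(sum(a * a for a in q) * sum(b * b for b in d))
    return dot / norm if norm else 0.0

def rank(query, collection):
    """Return (doc_id, score) pairs sorted by decreasing cosine similarity."""
    scores = [(doc_id, cosine(query, vec)) for doc_id, vec in collection.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

COLLECTION = {
    "DOCi": (3, 2, 1, 0, 0, 0, 1, 1),
    "DOCj": (1, 1, 1, 0, 0, 1, 0, 0),
    "DOCk": (0, 0, 0, 2, 1, 0, 0, 0),  # hypothetical document
}
QUERY = (1, 1, 0, 0, 0, 0, 0, 0)       # hypothetical query on terms 1 and 2

for doc_id, score in rank(QUERY, COLLECTION):
    print(doc_id, round(score, 3))
```

Neither DOCi nor DOCj matches the query exactly, yet both receive non-zero scores and come out in a graded order (DOCi, then DOCj), while DOCk, which shares no terms with the query, scores 0.0. A Boolean system would only answer match or no match.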

Disadvantages of the vector processing model:

1. It assumes that index terms are independent of one another.
2. Queries cannot use logical (Boolean) expressions.

Sunday, November 27, 2005

SEMINAR

Seminar on:

Canon of consistent succession:

First, let us understand the meaning of the phrase “Canon of consistent succession”. The word ‘canon’ means a body of principles, rules or standards; here, canons are useful for designing a scheme of classification. The word ‘consistent’ means ‘regular’ or ‘firm’. The word ‘succession’ means “the act or process of following in order or sequence”, or ‘lineage’.

So the meaning of “consistent succession” is “firmness in the act of following in sequence”.

The canon of consistent succession is one of the canons for succession of characteristics.

It states that “The succession of the characteristics in the associated scheme of characteristics should be consistently adhered to, so long as there is no change in the purpose of classification”.

That is, once a citation order of facets has been established for a classification system, it should not be modified unless there is a change in the purpose, subject or scope of the system. In other words, the succession of characteristics chosen for classifying a specific universe of subjects should be followed consistently.

This canon requires consistency not only in the characteristics used but also in the sequence in which they are used. It is obvious that lack of consistency will lead to chaos and defeat the purpose of classification.



For Example:

(i) For the universe of subjects going with the main class ‘History’, DDC has chosen the Geographical and the Period characteristics as the only necessary ones. It has also decided their succession as “Geographical and then Period”. This decision should not be changed from time to time; DDC should adhere to it consistently, i.e., the sequence should not change from edition to edition, otherwise chaos will result.

(ii) For the same universe of subjects, CC has chosen four characteristics instead of two: community, organ of the state, attribute of the organ, and period. CC has decided that this is the most relevant succession, so it should adhere consistently to this succession of the four characteristics; otherwise chaos will result.

(iii) For the universe of subjects going with the main class ‘Literature’, DDC has chosen language, form, period and so on as the succession of characteristics. This sequence should not be changed from time to time; DDC should adhere to it consistently.

The demand of this canon is quite simple, but it is useful for avoiding confusion and chaos. This canon is important because it ensures a degree of consistency and predictability in the structure of a classification system.


BIBLIOGRAPHY:

Ranganathan, S.R. 1967. Prolegomena to library classification. 3rd ed. Bombay: Asia Publishing House.

Dhyani, Pushpa. 1998. Library classification: Theory and principles. New Delhi: H.S. Poplai.

Parkhi, R.S. 1964. Decimal classification and Colon classification in perspective. Bombay: Asia Publishing House.

Tuesday, November 15, 2005

LAST DAY OF THE COMPETITION

Hi everyone!

Yesterday I was not able to blog because of my illness. I had a fever, but I really wanted to blog, so I got up at 8.30 pm and went to my computer. My mom scolded me like anything, so I silently went back to bed. So sad! I missed blogging for the first time in the past 15 days, and I felt very bad.

Hmm! Today is the last day of the competition, but that does not mean it is the end of blogging. I remember one of the great sayings: “The ending of one thing is the beginning of another”. I really enjoyed blogging all these days. In the early days of the competition I was very slow at typing, but nowadays I feel I have improved a lot, and I have also found a very good friend: none other than my blog. A diary used to be a personal thing, kept secretly, but a blog is just the opposite. We can also learn the viewpoints of others on different topics, which I find very interesting. I am very proud to say that I know blogging, and many of my friends learnt blogging from me. All the credit goes to our sir, N S Harinarayana. Thank you, sir.
OK friends, bye, good night, sweet dreams, take care, BYE BYE!!

SHAYARI

When we met yesterday,
my heart made a sound.
And when we meet today, you say
your file not found!
-----------------------------------------------------
What has been happening for ages,
I will repeat...
If I don't get you, my life
I will ctrl+alt+delete...
-----------------------------------------------------
Perhaps you forgot
to taste my love...
You cut me from your heart so well
that you forgot to paste...
-----------------------------------------------------
There may be lakhs in your sight,
but sometimes pick me too...
On the icon of my love,
double-click at least once...
-----------------------------------------------------
Every morning I lovingly
wish her good morning...
She glares at me as if I were
0 errors and 5 warnings...
-----------------------------------------------------
It's not that
I don't like your face.
But in my heart's storage
there's no more disk space.
-----------------------------------------------------
When you stepped out of the house
wearing a silk gown,
who knows how many hearts
had their server down.
-----------------------------------------------------
Ever since into my life
there came a female,
I have forgotten everything,
the mailbox and the e-mail.
-----------------------------------------------------
From my heart I am creating
an application of love.
Debug it with love;
I am waiting.
-----------------------------------------------------
While waiting for you
I grew sleepy and fell asleep.
Look, my connection
has timed out..
-----------------------------------------------------
There are many in sight,
and perhaps they are lonely...
The problem is that they
are now read only...

OVERWORKED

I'm tired because I'm overworked.

The population of this country is 90 crores.
17 crores are retired.
That leaves 73 crores to do the work.
There are 24 crores in school, which leaves 49 crores to do the work.

Of this, 20 crores are employed by the Central Government, leaving 29 crores to do the work.
3 crores are in the Armed Forces, which leaves 26 crores to do the work.

Take from the total the 18 crore people who work for State and City Governments, and that leaves 8 crores to do the work.

There are 6 crores unemployed, which leaves 2 crores to do the work.

At any given time there are 1.2 crore people in hospitals, leaving 80,00,000 to do the work.

Now, there are 79,99,998 people in prisons.

That leaves just two people to do the work.

You and me.

And you're sitting at your computer reading jokes and junk mail, leaving me alone to do all the work!!!

Sunday, November 13, 2005

HELLO FRIENDS

Hmm! Today I am feeling a little better than yesterday.
Today is the festival of Goddess Tulasi. This is a festival of lights. It is the day Goddess Tulasi married Lord Krishna. We celebrated it very well; my mom had prepared delicious food, but I was not in a position to eat it because of my fever, so sad!!! In the evening we lit candles outside our house, and they were very nice to look at. OK, that's it for today. Bye, good night, sweet dreams.

DEFT DEFINITIONS:

FATHER:
A banker provided by nature.

BOSS:
Someone who is early when you are late and late when you are early.

POLITICIAN:
One who shakes your hand before elections and your confidence after.

DOCTOR:
A person who kills your ills by pills and kills you with his bills.

DIPLOMAT:
A person who tells you to go to hell in such a way that you actually look forward to the trip.

PESSIMIST:
A person who says that O is the last letter in ZERO, instead of the first letter in the word OPPORTUNITY.

OPTIMIST:
A person who starts taking a bath if he accidentally falls into a river.

RUMOR:
News that travels at the speed of sound.

DICTIONARY:
The only place where divorce comes before marriage.

OFFICE:
A place where you can relax after your strenuous home life.

YAWN:
The only time some married men ever get to open their mouth.

ETC….
A sign to make others believe that you know more than you actually do.

EXPERIENCE:
The name men give to their mistakes.

TEARS:
The hydraulic force by which masculine power is defeated by feminine power.

ATOM BOMB:
An invention to end all inventions.

Saturday, November 12, 2005

hi

Hi, today I am not feeling well. I am suffering from a fever, so I am not in a position to blog anything today. OK friends, bye.

Friday, November 11, 2005

VALUABLE GIFTS

The best gifts to give:


To your friend ……..
Loyalty.


To your enemy……..
Forgiveness.


To your boss ……..
Service.


To your child ……..
A good example.


To your mate ……..
Love.


To your parents ……..
Gratitude and devotion.


To GOD……..
Your life.

Thursday, November 10, 2005

SWAMY VIVEKANANDA


Inspiring words of Swamy Vivekananda:


One man ought to live in this world like a lotus leaf,
which grows in water
but is never moistened by water;
so a man ought to live in this world, his heart to GOD
and his hands to WORK.

Strength is life; weakness is death.

Be not weak, either physically, mentally,
morally or spiritually.

BE A HERO. ALWAYS SAY “I HAVE NO FEAR”.

Awake! Arise! Stop not till the goal is reached.

Take up one idea, make that idea your life;
think of it, dream of it, live on that idea.
Let the brain, muscles, nerves,
every part of your body,
be full of that idea, and leave every other idea alone.
This is the way to success ……

Never lose faith in God.
Never lose faith in yourself;
you can do anything in the universe.

The more the power of concentration, the more the knowledge
acquired, because this is the only method of acquiring knowledge.

HAVE NO MOTIVE EXCEPT “GOD”.

Hold your money merely as custodian for what is GOD’s.

Look at the ‘Ocean’ and not the ‘wave’.